03 Apr, 2015

1 commit

  • Thomas Schlichter reports the following issue on his Samsung NC20:

    "The C-states C1 and C2 to the OS when connected to AC, and additionally
    provides the C3 C-state when disconnected from AC. However, the number
    of C-states shown in sysfs is fixed to the number of C-states present
    at boot.
    If I boot with AC connected, I always only see the C-states up to C2
    even if I disconnect AC.

    The reason is commit 130a5f692425 (ACPI / cpuidle: remove dev->state_count
    setting). It removes the update of dev->state_count, but sysfs uses
    exactly this variable to show the C-states.

    The fix is to use drv->state_count in sysfs. As this is currently the
    last user of dev->state_count, this variable can be completely removed."

    Remove dev->state_count as per the above.

    Reported-by: Thomas Schlichter
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Acked-by: Daniel Lezcano
    Cc: 3.14+ # 3.14+
    [ rjw: Changelog ]
    Signed-off-by: Rafael J. Wysocki

    Bartlomiej Zolnierkiewicz
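
    As an illustration of the fix described above, here is a minimal, hypothetical
    sketch of a sysfs show routine iterating over drv->state_count rather than the
    removed dev->state_count; it is not the verbatim kernel code, and the function
    name is made up:

    /* Sketch: sysfs iterates over the driver's state count, which reflects
     * C-states that appear after boot (e.g. C3 once AC is unplugged), instead
     * of the boot-time snapshot that used to live in dev->state_count. */
    static ssize_t show_states_sketch(struct cpuidle_device *dev,
                                      struct cpuidle_driver *drv, char *buf)
    {
            ssize_t len = 0;
            int i;

            for (i = 0; i < drv->state_count; i++)
                    len += sprintf(buf + len, "%s ", drv->states[i].name);

            return len;
    }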
     

06 Mar, 2015

1 commit

  • Commit 381063133246 (PM / sleep: Re-implement suspend-to-idle handling)
    overlooked the fact that entering some sufficiently deep idle states
    by CPUs may cause their local timers to stop and in those cases it
    is necessary to switch over to a broadcast timer prior to entering
    the idle state. If the cpuidle driver in use does not provide
    the new ->enter_freeze callback for any of the idle states, that
    problem affects suspend-to-idle too, but it is not taken into account
    after the changes made by commit 381063133246.

    Fix that by changing the definition of cpuidle_enter_freeze() and
    re-arranging of the code in cpuidle_idle_call(), so the former does
    not call cpuidle_enter() any more and the fallback case is handled
    by cpuidle_idle_call() directly.

    Fixes: 381063133246 (PM / sleep: Re-implement suspend-to-idle handling)
    Reported-and-tested-by: Lorenzo Pieralisi
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
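
    A rough sketch of the rearranged flow described above, with simplified names
    and the governor-driven path elided; it is meant to show the fallback, not
    the exact mainline code:

    /* Sketch: if no idle state provides ->enter_freeze, suspend-to-idle falls
     * back to the normal cpuidle path, which already handles the switch-over
     * to the broadcast timer for states that stop the local timer. */
    static void idle_call_sketch(struct cpuidle_driver *drv,
                                 struct cpuidle_device *dev)
    {
            if (idle_should_freeze()) {
                    /* try a state that provides ->enter_freeze */
                    if (cpuidle_enter_freeze(drv, dev) > 0)
                            return;
                    /*
                     * No such state: fall through to the regular path below,
                     * which goes through cpuidle_enter() and therefore handles
                     * the broadcast-timer switch-over for deep states.
                     */
            }

            /* regular path: governor selection + cpuidle_enter() (omitted) */
    }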
     

16 Feb, 2015

1 commit

  • The efficiency of suspend-to-idle depends on being able to keep CPUs
    in the deepest available idle states for as much time as possible.
    Ideally, they should only be brought out of idle by system wakeup
    interrupts.

    However, timer interrupts occurring periodically prevent that from
    happening and it is not practical to chase all of the "misbehaving"
    timers in a whack-a-mole fashion. A much more effective approach is
    to suspend the local ticks for all CPUs and the entire timekeeping
    along the lines of what is done during full suspend, which also
    helps to keep suspend-to-idle and full suspend reasonably similar.

    The idea is to suspend the local tick on each CPU executing
    cpuidle_enter_freeze() and to make the last of them suspend the
    entire timekeeping. That should prevent timer interrupts from
    triggering until an IO interrupt wakes up one of the CPUs. It
    needs to be done with interrupts disabled on all of the CPUs,
    though, because otherwise the suspended clocksource might be
    accessed by an interrupt handler which might lead to fatal
    consequences.

    Unfortunately, the existing ->enter callbacks provided by cpuidle
    drivers generally cannot be used for implementing that, because some
    of them re-enable interrupts temporarily and some idle entry methods
    cause interrupts to be re-enabled automatically on exit. Also some
    of these callbacks manipulate local clock event devices of the CPUs
    which really shouldn't be done after suspending their ticks.

    To overcome that difficulty, introduce a new cpuidle state callback,
    ->enter_freeze, that will be guaranteed (1) to keep interrupts
    disabled all the time (and return with interrupts disabled) and (2)
    not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
    look for the deepest available idle state with ->enter_freeze present
    and to make the CPU execute that callback with suspended tick (and the
    last of the online CPUs to execute it with suspended timekeeping).

    Suggested-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
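
    A hedged, driver-side sketch of what an ->enter_freeze callback looks like
    under the contract above; the prototype is assumed to mirror ->enter, and
    my_hw_enter_state() is a made-up platform hook:

    static void my_hw_enter_state(int index);   /* made-up low-level idle entry */

    static void my_state_enter_freeze(struct cpuidle_device *dev,
                                      struct cpuidle_driver *drv, int index)
    {
            /* interrupts are disabled on entry and must stay disabled */
            my_hw_enter_state(index);
            /* do not touch the local clock event device; return with irqs off */
    }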
     

14 Feb, 2015

1 commit

  • In preparation for adding support for quiescing timers in the final
    stage of suspend-to-idle transitions, rework the freeze_enter()
    function making the system wait on a wakeup event, the freeze_wake()
    function terminating the suspend-to-idle loop and the mechanism by
    which deep idle states are entered during suspend-to-idle.

    First of all, introduce a simple state machine for suspend-to-idle
    and make the code in question use it.

    Second, prevent freeze_enter() from losing wakeup events due to race
    conditions and ensure that the number of online CPUs won't change
    while it is being executed. In addition to that, make it force
    all of the CPUs re-enter the idle loop in case they are in idle
    states already (so they can enter deeper idle states if possible).

    Next, drop cpuidle_use_deepest_state() and replace use_deepest_state
    checks in cpuidle_select() and cpuidle_reflect() with a single
    suspend-to-idle state check in cpuidle_idle_call().

    Finally, introduce cpuidle_enter_freeze() that will simply find the
    deepest idle state available to the given CPU and enter it using
    cpuidle_enter().

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)

    Rafael J. Wysocki
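
    A simplified sketch of the "deepest available state" selection that
    cpuidle_enter_freeze() is described as doing (illustrative only, not the
    exact kernel code):

    /* Sketch: states are ordered from shallowest to deepest, so the last
     * usable one wins; the result is then handed to cpuidle_enter(). */
    static int find_deepest_state_sketch(struct cpuidle_driver *drv,
                                         struct cpuidle_device *dev)
    {
            int i, deepest = 0;

            for (i = 1; i < drv->state_count; i++) {
                    if (drv->states[i].disabled ||
                        dev->states_usage[i].disable)
                            continue;
                    deepest = i;
            }
            return deepest;
    }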
     

17 Dec, 2014

1 commit

  • CPUIDLE_FLAG_TIME_INVALID is no longer checked
    by menu or ladder cpuidle governors, so don't
    bother setting or defining it.

    It was originally invented to account for the fact that
    acpi_safe_halt() enables interrupts to invoke HLT.
    That would allow interrupt service routines to be included
    in the last_idle duration measurements made in cpuidle_enter_state(),
    potentially returning a duration much larger than reality.

    But menu and ladder can gracefully handle erroneously large duration
    intervals without checking for CPUIDLE_FLAG_TIME_INVALID.
    Further, if they don't check CPUIDLE_FLAG_TIME_INVALID, they
    can also benefit from the instances when the duration interval
    is not erroneously large.

    Signed-off-by: Len Brown
    Acked-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Len Brown
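
    A sketch of the measurement referred to above, to show where interrupt
    service time can leak into the interval when an ->enter method runs with
    interrupts enabled (simplified and approximate, not the literal
    cpuidle_enter_state() code):

    /* Sketch: anything that runs between the two ktime_get() calls (including
     * ISRs serviced before HLT returns) inflates the recorded residency. */
    static int enter_and_measure_sketch(struct cpuidle_device *dev,
                                        struct cpuidle_driver *drv, int index)
    {
            ktime_t start = ktime_get();
            int entered = drv->states[index].enter(dev, drv, index);

            dev->last_residency = (int)ktime_to_us(ktime_sub(ktime_get(), start));
            return entered;
    }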
     

13 Nov, 2014

1 commit

  • The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
    method is not set. Otherwise for all the drivers, the time can be correctly
    measured.

    Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
    for all the states, just invert the logic by replacing it by the flag
    CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
    driver, remove the former flag from all the drivers and invert the logic with
    this flag in the different governor.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
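
    A conceptual sketch of the inverted check in a governor (not the exact menu
    or ladder code): only states explicitly flagged as INVALID distrust the
    measured residency, so the common case needs no flag at all.

    static unsigned int residency_us_sketch(struct cpuidle_driver *drv,
                                            struct cpuidle_device *dev,
                                            int idx, unsigned int predicted_us)
    {
            if (drv->states[idx].flags & CPUIDLE_FLAG_TIME_INVALID)
                    return predicted_us;    /* measurement not trustworthy */
            return dev->last_residency;
    }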
     

10 Jun, 2014

1 commit

  • Pull MIPS updates from Ralf Baechle:
    - three fixes for 3.15 that didn't make it in time
    - limited Octeon 3 support
    - paravirtualization support
    - improvements to platform support for Netlogic SOCs
    - add support for powering down the Malta eval board in software
    - add many instructions to the in-kernel microassembler
    - add support for the BPF JIT
    - minor cleanups of the BCM47xx code
    - large cleanup of math emu code resulting in significant code size
      reduction, better readability of the code and more accurate
      emulation
    - improvements to the MIPS CPS code
    - support C3 power status for the R4k count/compare clock device
    - improvements to the GIO support for older SGI workstations
    - increase number of supported CPUs to 256; this can be reached on
      certain embedded multithreaded ccNUMA configurations
    - various small cleanups, updates and fixes

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (173 commits)
    MIPS: IP22/IP28: Improve GIO support
    MIPS: Octeon: Add twsi interrupt initialization for OCTEON 3XXX, 5XXX, 63XX
    DEC: Document the R4k MB ASIC mini interrupt controller
    DEC: Add self as the maintainer
    MIPS: Add microMIPS MSA support.
    MIPS: Replace calls to obsolete strict_strto call with kstrto* equivalents.
    MIPS: Replace obsolete strict_strto call with kstrto
    MIPS: BFP: Simplify code slightly.
    MIPS: Call find_vma with the mmap_sem held
    MIPS: Fix 'write_msa_##' inline macro.
    MIPS: Fix MSA toolchain support detection.
    mips: Update the email address of Geert Uytterhoeven
    MIPS: Add minimal defconfig for mips_paravirt
    MIPS: Enable build for new system 'paravirt'
    MIPS: paravirt: Add pci controller for virtio
    MIPS: Add code for new system 'paravirt'
    MIPS: Add functions for hypervisor call
    MIPS: OCTEON: Add OCTEON3 to __get_cpu_type
    MIPS: Add function get_ebase_cpunum
    MIPS: Add minimal support for OCTEON3 to c-r4k.c
    ...

    Linus Torvalds
     

01 May, 2014

1 commit

  • Since both cpuidle_enabled() and cpuidle_select() are only called by
    cpuidle_idle_call(), it is not really useful to keep them separate
    and combining them will help to avoid complicating cpuidle_idle_call()
    even further if governors are changed to return error codes sometimes.

    This code modification shouldn't lead to any functional changes.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
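
    A sketch of the combined entry point, close in spirit to the description
    above; the 'off'/'initialized' globals and cpuidle_curr_governor are
    cpuidle.c internals, reproduced here only for illustration:

    static int off, initialized;        /* stand-ins for cpuidle.c state */

    /* Sketch: the old cpuidle_enabled() checks are folded into the governor
     * call, so the caller deals with a single (possibly negative) result. */
    static int select_sketch(struct cpuidle_driver *drv, struct cpuidle_device *dev)
    {
            if (off || !initialized)
                    return -ENODEV;
            if (!drv || !dev || !dev->enabled)
                    return -EBUSY;

            return cpuidle_curr_governor->select(drv, dev);
    }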
     

11 Mar, 2014

3 commits

  • Now that we have the main cpuidle function in idle.c, move some code from
    the idle mainloop to this function for the sake of clarity.

    That removes if/then/else indentation that was difficult to follow when
    looking at the code. This patch does not change the current behavior.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: tglx@linutronix.de
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-3-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
    cpuidle_idle_call() does nothing more than call the three individual
    functions and is no longer used by any arch-specific code, only by the
    cpuidle framework code.

    We can move this function into the idle task code to ensure better
    proximity to the scheduler code.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-2-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
     
    In order to allow better integration between the cpuidle framework and the
    scheduler, reduce the distance between these two sub-components by moving
    part of the cpuidle code into the idle task file; because idle.c is in the
    sched directory, the code there has access to the scheduler's private
    structures.

    This patch splits the cpuidle_idle_call main entry function into 3 calls
    to a newly added API:

    1. select the idle state
    2. enter the idle state
    3. reflect the idle state

    The cpuidle_idle_call calls these three functions to implement the main
    idle entry function.

    Signed-off-by: Daniel Lezcano
    Acked-by: Nicolas Pitre
    Signed-off-by: Peter Zijlstra
    Cc: rjw@rjwysocki.net
    Cc: preeti@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1393832934-11625-1-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Lezcano
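
    The resulting shape of the main idle entry point, sketched with the three
    new calls (simplified, error handling omitted):

    static void idle_entry_sketch(struct cpuidle_driver *drv,
                                  struct cpuidle_device *dev)
    {
            int next_state, entered_state;

            next_state = cpuidle_select(drv, dev);                /* 1. select  */
            entered_state = cpuidle_enter(drv, dev, next_state);  /* 2. enter   */
            cpuidle_reflect(dev, entered_state);                  /* 3. reflect */
    }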
     

15 Jul, 2013

2 commits

  • Add missing forward declarations of struct cpuidle_state_kobj and
    struct cpuidle_driver_kobj in cpuidle.h.

    [rjw: Changelog]
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
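
    For reference, the addition amounts to no more than the two forward
    declarations named above, roughly:

    struct cpuidle_state_kobj;
    struct cpuidle_driver_kobj;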
     
    The cpuidle sysfs code is designed to have a single instance of the
    per-CPU cpuidle directory. It is not possible to remove the sysfs entry
    and create it again. This is not a problem with the current code, but
    future changes will add CPU hotplug support to enable/disable the
    device, so it will need to remove the sysfs entry like other
    subsystems do. That won't be possible without this change, because
    the kobj is a static object which can't be reused for
    kobject_init_and_add().

    Add cpuidle_device_kobj to be allocated dynamically when
    adding/removing a sysfs entry which is consistent with the other
    cpuidle's sysfs entries.

    An added benefit is that the sysfs code is now more self-contained
    and the includes needed for sysfs can be moved from cpuidle.h
    directly into sysfs.c so as to reduce the total number of headers
    dragged along with cpuidle.h.

    [rjw: Changelog]
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
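
    A hedged sketch of the dynamic-allocation pattern this enables; names are
    approximate and error handling is trimmed:

    struct cpuidle_device_kobj {            /* lives in sysfs.c after the change */
            struct cpuidle_device *dev;
            struct kobject kobj;
            /* completion etc. omitted */
    };

    /* Sketch: a freshly kzalloc'd kobject can go through repeated
     * kobject_init_and_add()/kobject_put() cycles on CPU hotplug, which the
     * old static kobj embedded in struct cpuidle_device could not. */
    static int add_sysfs_sketch(struct cpuidle_device *dev)
    {
            struct cpuidle_device_kobj *kdev;

            kdev = kzalloc(sizeof(*kdev), GFP_KERNEL);
            if (!kdev)
                    return -ENOMEM;
            kdev->dev = dev;

            return kobject_init_and_add(&kdev->kobj, &ktype_cpuidle,
                                        &get_cpu_device(dev->cpu)->kobj,
                                        "cpuidle");
    }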
     

11 Jun, 2013

1 commit

  • Commit bf4d1b5 (cpuidle: support multiple drivers) introduced support
    for using multiple cpuidle drivers at the same time. It added a
    couple of new APIs to register the driver per CPU, but that led to
    some unnecessary code complexity related to the kernel config options
    deciding whether or not the multiple driver support is enabled. The
    code has to work as it did before when the multiple driver support is
    not enabled and the multiple driver support has to be compatible with
    the previously existing API.

    Remove the new API, not used by any driver in the tree yet (but
    needed for the HMP cpuidle drivers that will be submitted soon), and
    add a new cpumask pointer to the cpuidle driver structure that will
    point to the mask of CPUs handled by the given driver. That will
    allow the cpuidle_[un]register_driver() API to be used for the
    multiple driver support along with the cpuidle_[un]register()
    functions added recently.

    [rjw: Changelog]
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
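
    A sketch of the shape this gives the driver structure and its registration,
    per the description above; the struct is trimmed to the relevant field and
    the per-cpu storage name is illustrative:

    struct cpuidle_driver_sketch {
            const char      *name;
            struct cpumask  *cpumask;       /* CPUs handled by this driver */
            /* states, state_count, ... as before */
    };

    static DEFINE_PER_CPU(struct cpuidle_driver_sketch *, drivers_sketch);

    static int register_driver_sketch(struct cpuidle_driver_sketch *drv)
    {
            int cpu;

            /* NULL cpumask keeps the old "one driver for all CPUs" behaviour */
            for_each_cpu(cpu, drv->cpumask ? drv->cpumask : cpu_possible_mask)
                    per_cpu(drivers_sketch, cpu) = drv;
            return 0;
    }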
     

23 Apr, 2013

2 commits

  • The usual scheme to initialize a cpuidle driver on a SMP is:

    cpuidle_register_driver(drv);
    for_each_possible_cpu(cpu) {
            device = &per_cpu(cpuidle_dev, cpu);
            cpuidle_register_device(device);
    }

    This code is duplicated in each cpuidle driver.

    On UP systems, it is done this way:

    cpuidle_register_driver(drv);
    device = &per_cpu(cpuidle_dev, cpu);
    cpuidle_register_device(device);

    On UP, the macro 'for_each_cpu' does one iteration:

    #define for_each_cpu(cpu, mask) \
            for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

    Hence, the initialization loop is the same for UP as for SMP.

    Besides, we saw various bugs, mis-initializations and unchecked return
    codes in the different drivers; the code is duplicated, bugs included.
    After fixing all of them, it appears the initialization pattern is the
    same for everyone.

    Please note that some drivers do dev->state_count = drv->state_count. This
    is not necessary because it is done by the cpuidle_enable_device() function
    in the cpuidle framework, as long as the states are the same for all your
    devices. Otherwise, the 'low level' API should be used instead, with the
    driver-specific initialization.

    Let's add a wrapper function doing this initialization with a cpumask parameter
    for the coupled idle states and use it for all the drivers.

    That will save a lot of LOC, consolidate the code, and the modifications in the
    future could be done in a single place. Another benefit is the consolidation of
    the cpuidle_device variable which is now in the cpuidle framework and no longer
    spread across the different arch-specific drivers.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
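
    A hedged sketch of such a wrapper, following the duplicated pattern quoted
    above; unwinding on failure is omitted, and the coupled_cpus assignment only
    applies when coupled idle is configured:

    int cpuidle_register_sketch(struct cpuidle_driver *drv,
                                const struct cpumask *coupled_cpus)
    {
            struct cpuidle_device *device;
            int cpu, ret;

            ret = cpuidle_register_driver(drv);
            if (ret)
                    return ret;

            for_each_possible_cpu(cpu) {
                    device = &per_cpu(cpuidle_dev, cpu);
                    device->cpu = cpu;
                    if (coupled_cpus)
                            device->coupled_cpus = *coupled_cpus; /* coupled idle only */
                    ret = cpuidle_register_device(device);
                    if (ret)
                            return ret;     /* sketch: no unwinding here */
            }
            return 0;
    }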
     
    The en_core_tk_irqen flag is set in all the cpuidle drivers, which
    means it is not necessary to specify this flag.

    Remove the flag and the code related to it.

    Signed-off-by: Daniel Lezcano
    Acked-by: Kevin Hilman # for mach-omap2/*
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

01 Apr, 2013

2 commits

    Commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced the
    CPUIDLE_FLAG_TIMER_STOP flag, with which we specify that a given idle
    state stops the local timer.

    Now use this flag to check at init time if one state will need
    the broadcast timer and, in this case, setup the broadcast timer
    framework. That prevents multiple code duplication in the drivers.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
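
    An illustrative sketch of the init-time check; the notifier-style
    clockevents call shown below was the API of that era, but treat the details
    as approximate rather than the exact framework code:

    static void broadcast_on_sketch(void *unused)
    {
            int cpu = smp_processor_id();

            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ON, &cpu);
    }

    static void setup_broadcast_sketch(struct cpuidle_driver *drv)
    {
            int i;

            /* one state with the flag is enough to require a broadcast timer */
            for (i = 0; i < drv->state_count; i++) {
                    if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) {
                            on_each_cpu(broadcast_on_sketch, NULL, 1);
                            break;
                    }
            }
    }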
     
  • When a cpu enters a deep idle state, the local timers are stopped and
    the time framework falls back to the timer device used as a broadcast
    timer.

    The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
    when the idle state stops the local timer.

    Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
    drivers. If the flag is set, the cpuidle core code takes care of the
    notification on behalf of the driver to avoid pointless code duplication.

    Signed-off-by: Daniel Lezcano
    Reviewed-by: Thomas Gleixner
    Acked-by: Santosh Shilimkar
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
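
    A sketch of the core doing the ENTER/EXIT notification on the driver's
    behalf when the flag is set, simplified from the behaviour described above:

    static int enter_state_sketch(struct cpuidle_device *dev,
                                  struct cpuidle_driver *drv, int index)
    {
            int cpu = dev->cpu, ret;
            bool tick_stops = drv->states[index].flags & CPUIDLE_FLAG_TIMER_STOP;

            if (tick_stops)
                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);

            ret = drv->states[index].enter(dev, drv, index);

            if (tick_stops)
                    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);

            return ret;
    }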
     

15 Jan, 2013

1 commit

    We realized that the power usage field is never filled in and, when it
    is filled in for tegra, the power_specified flag is not set, causing all
    of these values to be reset when the driver is initialized with
    set_power_state().

    However, the power_specified flag can be simply removed under the
    assumption that the states are always backward sorted, which is the
    case with the current code.

    This change allows the menu governor select function and the
    cpuidle_play_dead() to be simplified. Moreover, the
    set_power_states() function can be removed as it does not make sense
    any more.

    Drop the power_specified flag from struct cpuidle_driver and make
    the related changes as described above.

    As a consequence, this also fixes the bug where on the dynamic
    C-states system, the power fields are not initialized.

    [rjw: Changelog]
    References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
    References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
    References: https://lkml.org/lkml/2012/10/16/518
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
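
    A sketch of the governor simplification that the backward-sorted assumption
    allows: with deeper always meaning lower power, there is nothing to compare,
    so the deepest state satisfying the constraints wins (illustrative only):

    static int pick_state_sketch(struct cpuidle_driver *drv,
                                 unsigned int predicted_us,
                                 unsigned int latency_req)
    {
            int i, best = 0;

            for (i = 1; i < drv->state_count; i++) {
                    struct cpuidle_state *s = &drv->states[i];

                    if (s->target_residency > predicted_us)
                            continue;
                    if (s->exit_latency > latency_req)
                            continue;
                    best = i;       /* later (deeper) states use less power */
            }
            return best;
    }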
     

15 Nov, 2012

3 commits

    With the new tegra3 and big.LITTLE [1] architectures, several cpus
    with different characteristics (latencies and states) can co-exist on the
    system.

    The cpuidle framework has the limitation of handling only identical cpus.

    This patch removes this limitation by introducing the multiple driver support
    for cpuidle.

    This option is configurable at compile time and should be enabled for the
    architectures mentioned above. So there is no impact for the other platforms
    if the option is disabled. The option defaults to 'n'. Note that the multiple
    driver support is also compatible with the existing drivers: even if just one
    driver is needed, all the cpus will be tied to this driver, using an extra
    small chunk of processor memory.

    The multiple driver support uses a per-cpu driver pointer instead of a global
    variable, and accesses to this variable are done from a cpu context.

    In order to keep the compatibility with the existing drivers, the function
    'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register
    the specified driver for all the cpus.

    The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver
    remains the same except the driver name will be related to the current cpu.

    The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added,
    allowing the per-cpu driver name to be read.

    [1] http://lwn.net/Articles/481055/

    Signed-off-by: Daniel Lezcano
    Acked-by: Peter De Schrijver
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
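
    A sketch of the per-cpu storage and accessor described above; names are
    approximate, and this code is only meaningful when the multiple-driver
    option is enabled:

    static DEFINE_PER_CPU(struct cpuidle_driver *, cpu_drivers_sketch);

    static struct cpuidle_driver *get_cpu_driver_sketch(int cpu)
    {
            return per_cpu(cpu_drivers_sketch, cpu);
    }

    static struct cpuidle_driver *driver_for_dev_sketch(struct cpuidle_device *dev)
    {
            /* accessed from the cpu's own context, as noted above */
            return dev ? get_cpu_driver_sketch(dev->cpu) : NULL;
    }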
     
    We want to support different cpuidle drivers co-existing.
    In this case we should move the refcount to the cpuidle_driver
    structure to handle several drivers at a time.

    Signed-off-by: Daniel Lezcano
    Acked-by: Peter De Schrijver
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     
  • The structure cpuidle_state_kobj is not used anywhere except
    in the sysfs.c file. The definition of this structure is not
    needed in the cpuidle header file. This patch moves it to the
    sysfs.c file in order to encapsulate the code a bit more.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

23 Aug, 2012

1 commit

  • The new omap4 cpuidle implementation currently requires
    ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP.

    This patch makes it possible to build a non-SMP kernel
    for that platform. This is not normally desired for
    end-users but can be useful for testing.

    Without this patch, building rand-0y2jSKT results in:

    drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke':
    drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration]

    It's not clear if this patch is the best solution for
    the problem at hand. I have made sure that we can now
    build the kernel in all configurations, but that does
    not mean it will actually work on an OMAP44xx.

    Signed-off-by: Arnd Bergmann
    Acked-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Cc: Kevin Hilman
    Cc: Tony Lindgren

    Arnd Bergmann
     

27 Jul, 2012

1 commit

  • Pull ACPI & power management update from Len Brown:
    "Re-write of the turbostat tool.
    Lower overhead was necessary for measuring very large systems when
    they are very idle.

    IVB support in intel_idle
    It's what I run on my IVB, others should be able to also:-)

    ACPICA core update
    We have found some bugs due to divergence between Linux and the
    upstream ACPICA base. Most of these patches are to reduce that
    divergence to reduce the risk of future bugs.

    Some cpuidle updates, mostly for non-Intel
    More will be coming, as they depend on this part.

    Some thermal management changes needed by non-ACPI systems.

    Some _OST (OS Status Indication) updates for ACPI hot-plug."

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
    Thermal: Documentation update
    Thermal: Add Hysteresis attributes
    Thermal: Make Thermal trip points writeable
    ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
    tools/power: turbostat: fix large c1% issue
    tools/power: turbostat v2 - re-write for efficiency
    ACPICA: Update to version 20120711
    ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
    ACPICA: Update header files copyrights to 2012
    ACPICA: Add new ACPI table load/unload external interfaces
    ACPICA: Split file: tbxface.c -> tbxfload.c
    ACPICA: Add PCC address space to space ID decode function
    ACPICA: Fix some comment fields
    ACPICA: Table manager: deploy new firmware error/warning interfaces
    ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
    ACPICA: Split exception code utilities to a new file, utexcep.c
    ACPI: acpi_pad: tune round_robin_time
    ACPICA: Update to version 20120620
    ACPICA: Add support for implicit notify on multiple devices
    ACPICA: Update comments; no functional change
    ...

    Linus Torvalds
     

19 Jul, 2012

1 commit

  • * pm-domains:
    PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
    PM / Domains: Replace plain integer with NULL pointer in domain.c file
    PM / Domains: Add missing static storage class specifier in domain.c file
    PM / Domains: Allow device callbacks to be added at any time
    PM / Domains: Add device domain data reference counter
    PM / Domains: Add preliminary support for cpuidle, v2
    PM / Domains: Do not stop devices after restoring their states
    PM / Domains: Use subsystem runtime suspend/resume callbacks by default

    Rafael J. Wysocki
     

11 Jul, 2012

1 commit

    On certain BIOSes, resume hangs if cpus are allowed to enter idle states
    during suspend [1].

    This was fixed in the acpi idle driver [2]. But the intel_idle driver does
    not have this fix. Thus, instead of replicating the fix in both the idle
    drivers, or in more platform-specific idle drivers if needed, the
    more general cpuidle infrastructure could handle this.

    A suspend callback in cpuidle_driver could handle this fix. But
    a cpuidle_driver provides only basic functionality, like platform idle
    state detection capability and mechanisms to support entry into and exit
    from CPU idle states. All other cpuidle functions are found in the
    generic cpuidle infrastructure, for the good reason that all cpuidle
    drivers, irrespective of their platforms, will support these functions.

    One option therefore would be to register a suspend callback in cpuidle
    which handles this fix. This could be called through a PM_SUSPEND_PREPARE
    notifier. But this is too generic a notifier for a driver to handle.

    Also, ideally the job of cpuidle is not to handle side effects of suspend.
    It should expose the interfaces which "handle cpuidle 'during' suspend"
    or any other operation, which the subsystems call during that respective
    operation.

    The fix demands that during suspend, no cpus should be allowed to enter
    deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
    ensures that. Not just that it also kicks all the cpus which are already
    in idle out of their idle states which was being done during cpu hotplug
    through a CPU_DYING_FROZEN callbacks.

    Now the question arises of when, during suspend,
    cpuidle_uninstall_idle_handler() should be called. Since we are dealing with
    drivers, it seems best to call this function during dpm_suspend().
    Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
    before cpu_hotplug_begin() to avoid race conditions with cpu hotplug
    operations. In dpm_suspend_noirq(), it would be wise to place this call
    before suspend_device_irqs() to avoid ugly interactions with the same.

    Analogously, during resume.

    References:
    [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
    [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2

    Reported-and-tested-by: Dave Hansen
    Signed-off-by: Preeti U Murthy
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Preeti U Murthy
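
    A sketch of the resulting hook points; cpuidle_uninstall_idle_handler() and
    the cpuidle lock are existing cpuidle names, while the function names ending
    in _sketch and the dpm function body are only a rough outline:

    void cpuidle_pause_sketch(void)
    {
            mutex_lock(&cpuidle_lock);
            cpuidle_uninstall_idle_handler();  /* also kicks CPUs already in idle */
            mutex_unlock(&cpuidle_lock);
    }

    /* in dpm_suspend_noirq(), roughly: pause cpuidle before device irqs go away */
    static int dpm_suspend_noirq_sketch(pm_message_t state)
    {
            cpuidle_pause_sketch();
            suspend_device_irqs();
            /* ... then run the devices' noirq suspend callbacks as before ... */
            return 0;
    }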
     

06 Jul, 2012

1 commit

    When the system is booted with some cpus offline, the idle
    driver is not initialized. When a cpu is set online, the
    acpi code calls the intel_idle init function. Unfortunately,
    this code introduces a dependency between intel_idle and acpi.

    This patch is intended to remove this dependency by using the
    notifier of intel_idle. This patch has the benefit of
    encapsulating the intel_idle driver and removing some exported
    functions.

    Signed-off-by: Daniel Lezcano
    Acked-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
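
    A generic, hedged sketch of the notifier-based approach using the old-style
    CPU notifier API of that era; this is not the actual intel_idle code, and
    register_device_sketch() is a made-up helper:

    static void register_device_sketch(int cpu);    /* made-up helper */

    static int cpu_notify_sketch(struct notifier_block *nb,
                                 unsigned long action, void *hcpu)
    {
            int cpu = (unsigned long)hcpu;

            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_ONLINE:
                    register_device_sketch(cpu);
                    break;
            }
            return NOTIFY_OK;
    }

    static struct notifier_block cpu_nb_sketch = {
            .notifier_call = cpu_notify_sketch,
    };
    /* registered once at driver init with register_cpu_notifier(&cpu_nb_sketch) */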
     

04 Jul, 2012

3 commits

  • On some systems there are CPU cores located in the same power
    domains as I/O devices. Then, power can only be removed from the
    domain if all I/O devices in it are not in use and the CPU core
    is idle. Add preliminary support for that to the generic PM domains
    framework.

    First, the platform is expected to provide a cpuidle driver with one
    extra state designated for use with the generic PM domains code.
    This state should be initially disabled and its exit_latency value
    should be set to whatever time is needed to bring up the CPU core
    itself after restoring power to it, not including the domain's
    power on latency. Its .enter() callback should point to a procedure
    that will remove power from the domain containing the CPU core at
    the end of the CPU power transition.

    The remaining characteristics of the extra cpuidle state, referred to
    as the "domain" cpuidle state below, (e.g. power usage, target
    residency) should be populated in accordance with the properties of
    the hardware.

    Next, the platform should execute genpd_attach_cpuidle() on the PM
    domain containing the CPU core. That will cause the generic PM
    domains framework to treat that domain in a special way such that:

    * When all devices in the domain have been suspended and it is about
    to be turned off, the states of the devices will be saved, but
    power will not be removed from the domain. Instead, the "domain"
    cpuidle state will be enabled so that power can be removed from
    the domain when the CPU core is idle and the state has been chosen
    as the target by the cpuidle governor.

    * When the first I/O device in the domain is resumed and
    __pm_genpd_poweron() is called for the first time after
    power has been removed from the domain, the "domain" cpuidle
    state will be disabled to avoid subsequent surprise power removals
    via cpuidle.

    The effective exit_latency value of the "domain" cpuidle state
    depends on the time needed to bring up the CPU core itself after
    restoring power to it as well as on the power on latency of the
    domain containing the CPU core. Thus the "domain" cpuidle state's
    exit_latency has to be recomputed every time the domain's power on
    latency is updated, which may happen every time power is restored
    to the domain, if the measured power on latency is greater than
    the latency stored in the corresponding generic_pm_domain structure.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Kevin Hilman

    Rafael J. Wysocki
     
  • Add a reference counter for the cpuidle driver, so that it can't
    be unregistered when it is in use.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Andrew J. Schorr raises a question: when he changes the disable setting on
    a single CPU, it affects all the other CPUs. Basically, the disable field
    is currently per-driver instead of per-cpu, so all the C-states of the
    same driver are shared by all CPUs in the same machine.

    The patch changes the `disable' field to per-cpu, so we can set it
    separately for each cpu.

    Signed-off-by: ShuoX Liu
    Reported-by: Andrew J.Schorr
    Reviewed-by: Yanmin Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Rafael J. Wysocki

    ShuoX Liu
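
    A sketch of what "per-cpu disable" means structurally; field placement
    follows the description above and the exact types are approximate. The knob
    moves from the shared cpuidle_state into the per-device usage bookkeeping,
    so governors check it for the CPU at hand:

    struct state_usage_sketch {
            unsigned long long      usage;
            unsigned long long      time;
            int                     disable;        /* now settable per cpu via sysfs */
    };

    static bool state_disabled_here_sketch(struct cpuidle_device *dev, int idx)
    {
            return dev->states_usage[idx].disable != 0;
    }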
     

02 Jun, 2012

2 commits

  • Adds cpuidle_coupled_parallel_barrier, which can be used by coupled
    cpuidle state enter functions to handle resynchronization after
    determining if any cpu needs to abort. The normal use case will
    be:

    static bool abort_flag;
    static atomic_t abort_barrier;

    int arch_cpuidle_enter(struct cpuidle_device *dev, ...)
    {
            if (arch_turn_off_irq_controller()) {
                    /*
                     * returns an error if an irq is pending and would be lost
                     * if idle continued and turned off power
                     */
                    abort_flag = true;
            }

            cpuidle_coupled_parallel_barrier(dev, &abort_barrier);

            if (abort_flag) {
                    /* One of the cpus didn't turn off its irq controller */
                    arch_turn_on_irq_controller();
                    return -EINTR;
            }

            /* continue with idle */
            ...
    }

    This will cause all cpus to abort idle together if one of them needs
    to abort.

    Reviewed-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Colin Cross
    Signed-off-by: Len Brown

    Colin Cross
     
  • On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
    cpus cannot be independently powered down, either due to
    sequencing restrictions (on Tegra 2, cpu 0 must be the last to
    power down), or due to HW bugs (on OMAP4460, a cpu powering up
    will corrupt the gic state unless the other cpu runs a work
    around). Each cpu has a power state that it can enter without
    coordinating with the other cpu (usually Wait For Interrupt, or
    WFI), and one or more "coupled" power states that affect blocks
    shared between the cpus (L2 cache, interrupt controller, and
    sometimes the whole SoC). Entering a coupled power state must
    be tightly controlled on both cpus.

    The easiest solution to implementing coupled cpu power states is
    to hotplug all but one cpu whenever possible, usually using a
    cpufreq governor that looks at cpu load to determine when to
    enable the secondary cpus. This causes problems, as hotplug is an
    expensive operation, so the number of hotplug transitions must be
    minimized, leading to very slow response to loads, often on the
    order of seconds.

    This file implements an alternative solution, where each cpu will
    wait in the WFI state until all cpus are ready to enter a coupled
    state, at which point the coupled state function will be called
    on all cpus at approximately the same time.

    Once all cpus are ready to enter idle, they are woken by an smp
    cross call. At this point, there is a chance that one of the
    cpus will find work to do, and choose not to enter idle. A
    final pass is needed to guarantee that all cpus will call the
    power state enter function at the same time. During this pass,
    each cpu will increment the ready counter, and continue once the
    ready counter matches the number of online coupled cpus. If any
    cpu exits idle, the other cpus will decrement their counter and
    retry.

    To use coupled cpuidle states, a cpuidle driver must:

    Set struct cpuidle_device.coupled_cpus to the mask of all
    coupled cpus, usually the same as cpu_possible_mask if all cpus
    are part of the same cluster. The coupled_cpus mask must be
    set in the struct cpuidle_device for each cpu.

    Set struct cpuidle_device.safe_state to a state that is not a
    coupled state. This is usually WFI.

    Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
    state that affects multiple cpus.

    Provide a struct cpuidle_state.enter function for each state
    that affects multiple cpus. This function is guaranteed to be
    called on all cpus at approximately the same time. The driver
    should ensure that the cpus all abort together if any cpu tries
    to abort once the function is called.

    update1:

    cpuidle: coupled: fix count of online cpus

    online_count was never incremented on boot, and was also counting
    cpus that were not part of the coupled set. Fix both issues by
    introducing a new function that counts online coupled cpus, and
    call it from register as well as the hotplug notifier.

    update2:

    cpuidle: coupled: fix decrementing ready count

    cpuidle_coupled_set_not_ready sometimes refuses to decrement the
    ready count in order to prevent a race condition. This makes it
    unsuitable for use when finished with idle. Add a new function
    cpuidle_coupled_set_done that decrements both the ready count and
    waiting count, and call it after idle is complete.

    Cc: Amit Kucheria
    Cc: Arjan van de Ven
    Cc: Trinabh Gupta
    Cc: Deepthi Dharwar
    Reviewed-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Colin Cross
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Colin Cross
     

30 Mar, 2012

1 commit