11 Jun, 2013

1 commit

  • Commit bf4d1b5 (cpuidle: support multiple drivers) introduced support
    for using multiple cpuidle drivers at the same time. It added a
    couple of new APIs to register the driver per CPU, but that led to
    some unnecessary code complexity related to the kernel config options
    deciding whether or not the multiple driver support is enabled. The
    code has to work as it did before when the multiple driver support is
    not enabled and the multiple driver support has to be compatible with
    the previously existing API.

    Remove the new API, not used by any driver in the tree yet (but
    needed for the HMP cpuidle drivers that will be submitted soon), and
    add a new cpumask pointer to the cpuidle driver structure that will
    point to the mask of CPUs handled by the given driver. That will
    allow the cpuidle_[un]register_driver() API to be used for the
    multiple driver support along with the cpuidle_[un]register()
    functions added recently.
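
    A minimal sketch of how a driver might use that cpumask, assuming the new
    field is simply called 'cpumask' (the exact name is not spelled out above)
    and reusing the existing registration API:

    #include <linux/cpuidle.h>
    #include <linux/cpumask.h>
    #include <linux/module.h>

    static struct cpumask big_cluster_cpus;

    static struct cpuidle_driver big_idle_driver = {
            .name  = "big_idle",
            .owner = THIS_MODULE,
            /* ... idle states of the "big" cluster ... */
    };

    static int __init big_idle_init(void)
    {
            /* CPUs 0 and 1 are assumed to form the "big" cluster here. */
            cpumask_set_cpu(0, &big_cluster_cpus);
            cpumask_set_cpu(1, &big_cluster_cpus);

            /* Tell the core which CPUs this driver handles... */
            big_idle_driver.cpumask = &big_cluster_cpus;

            /* ...and register through the usual, unchanged API. */
            return cpuidle_register_driver(&big_idle_driver);
    }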

    [rjw: Changelog]
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

24 Apr, 2013

1 commit


23 Apr, 2013

2 commits

  • The usual scheme to initialize a cpuidle driver on a SMP is:

    cpuidle_register_driver(drv);
    for_each_possible_cpu(cpu) {
            device = &per_cpu(cpuidle_dev, cpu);
            cpuidle_register_device(device);
    }

    This code is duplicated in each cpuidle driver.

    On UP systems, it is done this way:

    cpuidle_register_driver(drv);
    device = &per_cpu(cpuidle_dev, cpu);
    cpuidle_register_device(device);

    On UP, the macro 'for_each_cpu' does one iteration:

    #define for_each_cpu(cpu, mask) \
            for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

    Hence, the initialization loop is the same for UP as for SMP.

    Besides, we have seen various bugs, mis-initializations and unchecked return
    codes in the different drivers; the code is duplicated, bugs included. After
    fixing all of them, it turns out the initialization pattern is the same for
    every driver.

    Please note that some drivers do dev->state_count = drv->state_count. This is
    not necessary because it is done by the cpuidle_enable_device() function in
    the cpuidle framework. That holds as long as all the devices use the same
    states; otherwise, the 'low level' API should be used instead, with the
    driver-specific initialization.

    Let's add a wrapper function doing this initialization with a cpumask parameter
    for the coupled idle states and use it for all the drivers.

    That will save a lot of lines of code, consolidate the code, and allow future
    modifications to be made in a single place. Another benefit is the
    consolidation of the cpuidle_device variable, which now lives in the cpuidle
    framework and is no longer spread across the different arch-specific drivers.
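
    As an illustration, the duplicated pattern above then collapses into a
    single call; this assumes the wrapper is named cpuidle_register() and takes
    the driver plus an optional cpumask for coupled idle states:

    #include <linux/cpuidle.h>
    #include <linux/module.h>

    static struct cpuidle_driver my_soc_idle_driver = {
            .name  = "my_soc_idle",
            .owner = THIS_MODULE,
            /* ... states ... */
    };

    static int __init my_soc_idle_init(void)
    {
            int ret;

            /*
             * Replaces cpuidle_register_driver() plus the per-CPU
             * cpuidle_register_device() loop shown above.  The second
             * argument is only needed for coupled idle states; pass
             * NULL otherwise.
             */
            ret = cpuidle_register(&my_soc_idle_driver, NULL);
            if (ret)
                    pr_err("failed to register cpuidle driver: %d\n", ret);

            return ret;
    }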

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     
  • The en_core_tk_irqen flag is set in all the cpuidle drivers, which
    means it is no longer necessary to specify this flag.

    Remove the flag and the code related to it.

    Signed-off-by: Daniel Lezcano
    Acked-by: Kevin Hilman # for mach-omap2/*
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

01 Apr, 2013

1 commit

  • When a cpu enters a deep idle state, the local timers are stopped and
    the time framework falls back to the timer device used as a broadcast
    timer.

    The different cpuidle drivers call clockevents_notify(ENTER/EXIT) when an
    idle state stops the local timer.

    Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
    drivers. If the flag is set, the cpuidle core code takes care of the
    notification on behalf of the driver to avoid pointless code duplication.
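
    For example, a driver can tag such a state as sketched below and drop its
    own clockevents_notify() calls; the state values and the enter callback are
    made up for the sake of the example:

    #include <linux/cpuidle.h>
    #include <linux/module.h>

    static int my_c1_enter(struct cpuidle_device *dev,
                           struct cpuidle_driver *drv, int index)
    {
            /* ... platform-specific entry into the cluster power-down state ... */
            return index;
    }

    static struct cpuidle_driver my_idle_driver = {
            .name  = "my_idle",
            .owner = THIS_MODULE,
            .states[0] = {
                    .name             = "C1",
                    .desc             = "Cluster power down",
                    .exit_latency     = 300,
                    .target_residency = 1000,
                    /*
                     * The local timer stops in this state: let the cpuidle
                     * core issue the broadcast ENTER/EXIT notifications
                     * instead of doing it in the .enter() callback.
                     */
                    .flags            = CPUIDLE_FLAG_TIMER_STOP,
                    .enter            = my_c1_enter,
            },
            .state_count = 1,
    };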

    Signed-off-by: Daniel Lezcano
    Reviewed-by: Thomas Gleixner
    Acked-by: Santosh Shilimkar
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

26 Jan, 2013

1 commit

  • The text in Documentation said it would be removed in 2.6.41;
    the text in the Kconfig said removal in the 3.1 release. Either
    way you look at it, we are well past both, so push it off a cliff.

    Note that the POWER_CSTATE and the POWER_PSTATE are part of the
    legacy tracing API. Remove all tracepoints which use these flags.
    As can be seen from context, most already have a trace entry via
    trace_cpu_idle anyways.

    Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
    compared to the CSTATE ones which all have a clear start/stop.
    As part of this, the trace_power_frequency also becomes orphaned,
    so it too is deleted.

    Signed-off-by: Paul Gortmaker
    Acked-by: Steven Rostedt
    Signed-off-by: Rafael J. Wysocki

    Paul Gortmaker
     

15 Jan, 2013

1 commit

  • We realized that the power usage field is never filled in and, when it is
    filled in for tegra, the power_specified flag is not set, causing all of
    these values to be reset when the driver is initialized with
    set_power_state().

    However, the power_specified flag can be simply removed under the
    assumption that the states are always backward sorted, which is the
    case with the current code.

    This change allows the menu governor's select function and
    cpuidle_play_dead() to be simplified. Moreover, the set_power_states()
    function can be removed, as it no longer makes sense.

    Drop the power_specified flag from struct cpuidle_driver and make
    the related changes as described above.

    As a consequence, this also fixes the bug where, on systems with dynamic
    C-states, the power fields are not initialized.

    [rjw: Changelog]
    References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
    References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
    References: https://lkml.org/lkml/2012/10/16/518
    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

03 Jan, 2013

1 commit

  • Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
    of -1) to initialize the local copies, so that functions that try to find
    cpuidle states with minimum power usage work correctly even if they use
    non-negative values.
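
    Roughly, the pattern being fixed looks like the sketch below (a simplified
    stand-in for the governor and cpuidle_play_dead() code paths): a search for
    the lowest-power state has to start from INT_MAX, because with a local copy
    initialized to -1 no non-negative power value can ever win the comparison.

    #include <linux/kernel.h>   /* INT_MAX */
    #include <linux/cpuidle.h>

    static int find_lowest_power_state(struct cpuidle_driver *drv)
    {
            int i, lowest = -1;
            int min_power = INT_MAX;    /* previously -1, which broke the test below */

            for (i = 0; i < drv->state_count; i++) {
                    if (drv->states[i].power_usage < min_power) {
                            min_power = drv->states[i].power_usage;
                            lowest = i;
                    }
            }
            return lowest;
    }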

    Signed-off-by: Sivaram Nair
    Reviewed-by: Rik van Riel
    Signed-off-by: Rafael J. Wysocki

    Sivaram Nair
     

27 Nov, 2012

1 commit

  • Many cpuidle drivers measure their time spent in an idle state by
    reading the wallclock time before and after idling and calculating the
    difference. This leads to erroneous results when the wallclock time gets
    updated by another processor in the meantime, adding that clock
    adjustment to the idle state's time counter.

    If the clock adjustment was negative, the result is even worse due to an
    erroneous cast from int to unsigned long long of the last_residency
    variable. The negative 32 bit integer will zero-extend and result in a
    forward time jump of roughly four billion milliseconds or 1.3 hours on
    the idle state residency counter.

    This patch changes all affected cpuidle drivers to either use the
    monotonic clock for their measurements or make use of the generic time
    measurement wrapper in cpuidle.c, which was already working correctly.
    Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
    should always already be disabled before entering the idle function, and
    not get reenabled until the generic wrapper has performed its second
    measurement). It also removes the erroneous cast, making sure that
    negative residency values are applied correctly even though they should
    not appear anymore.
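
    The gist of the fix, sketched for a driver that does its own measurement
    (the generic wrapper in cpuidle.c does the equivalent): use the monotonic
    clock, so that settimeofday()/NTP adjustments made on another CPU cannot
    leak into the measured residency.

    #include <linux/ktime.h>
    #include <linux/cpuidle.h>

    static int my_state_enter(struct cpuidle_device *dev,
                              struct cpuidle_driver *drv, int index)
    {
            ktime_t start, end;

            start = ktime_get();    /* monotonic, unaffected by wallclock updates */

            /* ... architecture-specific idle entry ... */

            end = ktime_get();

            /* last_residency is in microseconds; keep it a plain int */
            dev->last_residency = (int)ktime_to_us(ktime_sub(end, start));

            return index;
    }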

    Signed-off-by: Julius Werner
    Reviewed-by: Preeti U Murthy
    Tested-by: Daniel Lezcano
    Acked-by: Daniel Lezcano
    Acked-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Julius Werner
     

15 Nov, 2012

4 commits

  • With the new tegra3 and big.LITTLE [1] architectures, several cpus with
    different characteristics (latencies and states) can co-exist on the same
    system.

    The cpuidle framework has the limitation of handling only identical cpus.

    This patch removes this limitation by introducing multiple driver support
    for cpuidle.

    This option is configurable at compile time and should be enabled for the
    architectures mentioned above, so there is no impact on the other platforms
    when the option is disabled. The option defaults to 'n'. Note that multiple
    driver support is also compatible with the existing drivers: even if just
    one driver is needed, all the cpus will be tied to this driver, at the cost
    of a small extra chunk of per-cpu memory.

    The multiple driver support uses a per-cpu driver pointer instead of a
    global variable, and accesses to this variable are done from a cpu context.

    In order to keep compatibility with the existing drivers, the functions
    'cpuidle_register_driver' and 'cpuidle_unregister_driver' register the
    specified driver for all the cpus.

    The semantics of the output of /sys/devices/system/cpu/cpuidle/current_driver
    remain the same, except that the driver name is now the one associated with
    the current cpu.

    The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added,
    allowing the per-cpu driver name to be read.

    [1] http://lwn.net/Articles/481055/
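
    Conceptually, the per-cpu pointer can be pictured as in the sketch below;
    the names are illustrative only, since the actual accessors are not quoted
    in this log:

    #include <linux/cpuidle.h>
    #include <linux/cpumask.h>
    #include <linux/percpu.h>

    /* One driver pointer per CPU instead of a single global pointer. */
    static DEFINE_PER_CPU(struct cpuidle_driver *, cpuidle_curr_driver);

    /* Meant to be called from the CPU whose driver is wanted. */
    static struct cpuidle_driver *cpuidle_get_cpu_driver_local(void)
    {
            return __this_cpu_read(cpuidle_curr_driver);
    }

    /* cpuidle_register_driver() keeps its old meaning: same driver everywhere. */
    static void cpuidle_set_driver_all_cpus(struct cpuidle_driver *drv)
    {
            int cpu;

            for_each_possible_cpu(cpu)
                    per_cpu(cpuidle_curr_driver, cpu) = drv;
    }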

    Signed-off-by: Daniel Lezcano
    Acked-by: Peter De Schrijver
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     
  • When the cpuidle governor chooses a C-state for an idle CPU but then
    notices that a task is waiting to run, the CPU does not actually enter the
    target C-state and goes back to running tasks instead.

    In this situation, the residency of the previously entered target C-state
    is reused, which is obviously not correct.

    Fix this by setting the target C-state residency to 0 when the state was
    not actually entered.
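
    In governor terms the fix amounts to something like this sketch (the
    function and parameter names are illustrative):

    #include <linux/cpuidle.h>
    #include <linux/types.h>

    static void reflect_idle_result(struct cpuidle_device *dev,
                                    bool state_was_entered)
    {
            /*
             * Woken up (e.g. by a runnable task) before actually entering
             * the chosen C-state: report zero residency instead of reusing
             * the residency of a previously entered state.
             */
            if (!state_was_entered)
                    dev->last_residency = 0;

            /* ... the normal statistics update then uses dev->last_residency ... */
    }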

    Signed-off-by: Rik van Riel
    Signed-off-by: Youquan Song
    Signed-off-by: Rafael J. Wysocki

    Youquan Song
     
  • Move the kobj initialization and completion into sysfs.c to better
    encapsulate the code.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     
  • The function needs the cpuidle_device, which is initially available to the
    caller.

    The current code gets the struct device from the struct cpuidle_device and
    passes it to the cpuidle_add_sysfs() function, which then calls
    per_cpu(cpuidle_devices, cpu) to get the cpuidle_device back.

    Pass the cpuidle_device directly instead and simplify the code.
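
    In other words, the helper's parameter changes roughly as sketched below;
    the exact prototypes are an assumption based on the description above:

    #include <linux/cpuidle.h>

    /* Before (sketch): only the struct device was passed, so cpuidle_add_sysfs()
     * had to recover the cpuidle_device via per_cpu(cpuidle_devices, cpu):
     *
     *     int cpuidle_add_sysfs(struct device *cpu_dev);
     *
     * After (sketch): the caller passes the cpuidle_device it already holds, and
     * the struct device is reachable from it:
     */
    int cpuidle_add_sysfs(struct cpuidle_device *dev);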

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Rafael J. Wysocki

    Daniel Lezcano
     

09 Oct, 2012

1 commit

  • On a KVM guest, when a CPU is taken offline and brought back online, we hit
    the following NULL pointer dereference:

    [ 45.400843] Unregister pv shared memory for cpu 1
    [ 45.412331] smpboot: CPU 1 is now offline
    [ 45.529894] SMP alternatives: lockdep: fixing up alternatives
    [ 45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
    [ 45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
    [ 45.571370] KVM setup async PF for cpu 1
    [ 45.572331] kvm-stealtime: cpu 1, msr 7d0e040
    [ 45.575031] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 45.576017] IP: [] cpuidle_disable_device+0x18/0x80
    [ 45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
    [ 45.576017] Oops: 0000 [#1] SMP
    [ 45.576017] Modules linked in:
    [ 45.576017] CPU 0
    [ 45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
    [ 45.576017] RIP: 0010:[] [] cpuidle_disable_device+0x18/0x80
    [ 45.576017] RSP: 0018:ffff880005d93ce8 EFLAGS: 00010286
    [ 45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
    [ 45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
    [ 45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
    [ 45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
    [ 45.576017] FS: 00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
    [ 45.576017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
    [ 45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
    [ 45.576017] Stack:
    [ 45.576017] ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
    [ 45.576017] ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
    [ 45.576017] ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
    [ 45.576017] Call Trace:
    [ 45.576017] [] acpi_processor_hotplug+0x55/0x97
    [ 45.576017] [] acpi_cpu_soft_notify+0x93/0xce
    [ 45.576017] [] notifier_call_chain+0x5d/0x110
    [ 45.576017] [] __raw_notifier_call_chain+0xe/0x10
    [ 45.576017] [] __cpu_notify+0x20/0x40
    [ 45.576017] [] cpu_notify+0x15/0x20
    [ 45.576017] [] _cpu_up+0xee/0x137
    [ 45.576017] [] cpu_up+0x49/0x59
    [ 45.576017] [] store_online+0x9d/0xe0
    [ 45.576017] [] dev_attr_store+0x18/0x30
    [ 45.576017] [] sysfs_write_file+0xe0/0x150
    [ 45.576017] [] vfs_write+0xac/0x180
    [ 45.576017] [] sys_write+0x52/0xa0
    [ 45.576017] [] system_call_fastpath+0x16/0x1b
    [ 45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
    [ 45.576017] RIP [] cpuidle_disable_device+0x18/0x80
    [ 45.576017] RSP
    [ 45.576017] CR2: 0000000000000000
    [ 45.656079] ---[ end trace 433d6c9ac0b02cef ]---

    Analysis:
    Commit 3d339dc (cpuidle / ACPI: move cpuidle_device field out of the
    acpi_processor_power structure) made the allocation of the dev structure
    (struct cpuidle_device) of a CPU dynamic, whereas previously it was
    statically allocated. This dynamic allocation occurs in
    acpi_processor_power_init() only if pr->flags.power evaluates to non-zero.

    On KVM guests, pr->flags.power evaluates to zero, hence dev is never
    allocated. This causes the NULL pointer (dev) dereference in
    cpuidle_disable_device() during a subsequent CPU online operation. Fix this
    by ensuring that dev is non-NULL before dereferencing.
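
    The fix boils down to an early guard along these lines (a sketch; the
    exact condition in the patch may differ):

    #include <linux/cpuidle.h>

    void cpuidle_disable_device(struct cpuidle_device *dev)
    {
            /*
             * On KVM guests pr->flags.power is zero, so no cpuidle device was
             * ever allocated for this CPU; bail out instead of dereferencing
             * a NULL pointer.
             */
            if (!dev || !dev->enabled)
                    return;

            /* ... disable the device as before ... */
    }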

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Len Brown

    Srivatsa S. Bhat
     

27 Jul, 2012

1 commit

  • Pull ACPI & power management update from Len Brown:
    "Re-write of the turbostat tool.
    Lower overhead was necessary for measuring very large systems when
    they are very idle.

    IVB support in intel_idle
    It's what I run on my IVB, others should be able to also:-)

    ACPICA core update
    We have found some bugs due to divergence between Linux and the
    upstream ACPICA base. Most of these patches are to reduce that
    divergence to reduce the risk of future bugs.

    Some cpuidle updates, mostly for non-Intel
    More will be coming, as they depend on this part.

    Some thermal management changes needed by non-ACPI systems.

    Some _OST (OS Status Indication) updates for ACPI hot-plug."

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
    Thermal: Documentation update
    Thermal: Add Hysteresis attributes
    Thermal: Make Thermal trip points writeable
    ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
    tools/power: turbostat: fix large c1% issue
    tools/power: turbostat v2 - re-write for efficiency
    ACPICA: Update to version 20120711
    ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
    ACPICA: Update header files copyrights to 2012
    ACPICA: Add new ACPI table load/unload external interfaces
    ACPICA: Split file: tbxface.c -> tbxfload.c
    ACPICA: Add PCC address space to space ID decode function
    ACPICA: Fix some comment fields
    ACPICA: Table manager: deploy new firmware error/warning interfaces
    ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
    ACPICA: Split exception code utilities to a new file, utexcep.c
    ACPI: acpi_pad: tune round_robin_time
    ACPICA: Update to version 20120620
    ACPICA: Add support for implicit notify on multiple devices
    ACPICA: Update comments; no functional change
    ...

    Linus Torvalds
     

26 Jul, 2012

1 commit


19 Jul, 2012

1 commit

  • * pm-domains:
    PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
    PM / Domains: Replace plain integer with NULL pointer in domain.c file
    PM / Domains: Add missing static storage class specifier in domain.c file
    PM / Domains: Allow device callbacks to be added at any time
    PM / Domains: Add device domain data reference counter
    PM / Domains: Add preliminary support for cpuidle, v2
    PM / Domains: Do not stop devices after restoring their states
    PM / Domains: Use subsystem runtime suspend/resume callbacks by default

    Rafael J. Wysocki
     

11 Jul, 2012

1 commit

  • On certain BIOSes, resume hangs if cpus are allowed to enter idle states
    during suspend [1].

    This was fixed in the acpi idle driver [2], but the intel_idle driver does
    not have this fix. Thus, instead of replicating the fix in both idle
    drivers, or in more platform-specific idle drivers if needed, the more
    general cpuidle infrastructure could handle this.

    A suspend callback in cpuidle_driver could handle this fix. But a
    cpuidle_driver provides only basic functionality, like platform idle state
    detection and the mechanisms to support entry into and exit from CPU idle
    states. All other cpuidle functions live in the generic cpuidle
    infrastructure, for the good reason that all cpuidle drivers, irrespective
    of their platform, will support these functions.

    One option therefore would be to register a suspend callback in cpuidle
    which handles this fix. This could be called through a PM_SUSPEND_PREPARE
    notifier. But this is too generic a notifier for a driver to handle.

    Also, ideally the job of cpuidle is not to handle side effects of suspend.
    It should expose the interfaces which "handle cpuidle 'during' suspend"
    or any other operation, which the subsystems call during that respective
    operation.

    The fix demands that, during suspend, no cpus should be allowed to enter
    deep C-states. The cpuidle_uninstall_idle_handler() interface in cpuidle
    ensures that. Not just that: it also kicks all the cpus which are already
    in idle out of their idle states, which used to be done during cpu hotplug
    through the CPU_DYING_FROZEN callbacks.

    Now the question arises about when, during suspend,
    cpuidle_uninstall_idle_handler() should be called. Since we are dealing with
    drivers, it seems best to call this function during dpm_suspend().
    Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
    before cpu_hotplug_begin(), to avoid race conditions with cpu hotplug
    operations. In dpm_suspend_noirq(), it would be wise to place this call
    before suspend_device_irqs() to avoid ugly interactions with the same.

    Analogously, during resume.

    References:
    [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
    [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2
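
    The resulting hooks look roughly like the sketch below, assuming the
    install/uninstall interfaces are wrapped in helpers named cpuidle_pause()
    and cpuidle_resume() (names not quoted in the text above):

    #include <linux/cpuidle.h>
    #include <linux/pm.h>

    /* Keep CPUs out of deep C-states for the whole noirq phase of suspend. */
    static int dpm_suspend_noirq_sketch(pm_message_t state)
    {
            cpuidle_pause();        /* uninstall the idle handler, kick idle CPUs */
            /* ... suspend_device_irqs() and the rest of the noirq phase ... */
            return 0;
    }

    static void dpm_resume_noirq_sketch(pm_message_t state)
    {
            /* ... resume_device_irqs() and the rest of the noirq phase ... */
            cpuidle_resume();       /* re-install the idle handler */
    }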

    Reported-and-tested-by: Dave Hansen
    Signed-off-by: Preeti U Murthy
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Preeti U Murthy
     

04 Jul, 2012

2 commits

  • On some systems there are CPU cores located in the same power
    domains as I/O devices. Then, power can only be removed from the
    domain if all I/O devices in it are not in use and the CPU core
    is idle. Add preliminary support for that to the generic PM domains
    framework.

    First, the platform is expected to provide a cpuidle driver with one
    extra state designated for use with the generic PM domains code.
    This state should be initially disabled and its exit_latency value
    should be set to whatever time is needed to bring up the CPU core
    itself after restoring power to it, not including the domain's
    power on latency. Its .enter() callback should point to a procedure
    that will remove power from the domain containing the CPU core at
    the end of the CPU power transition.

    The remaining characteristics of the extra cpuidle state, referred to
    as the "domain" cpuidle state below, (e.g. power usage, target
    residency) should be populated in accordance with the properties of
    the hardware.

    Next, the platform should execute genpd_attach_cpuidle() on the PM
    domain containing the CPU core. That will cause the generic PM
    domains framework to treat that domain in a special way such that:

    * When all devices in the domain have been suspended and it is about
    to be turned off, the states of the devices will be saved, but
    power will not be removed from the domain. Instead, the "domain"
    cpuidle state will be enabled so that power can be removed from
    the domain when the CPU core is idle and the state has been chosen
    as the target by the cpuidle governor.

    * When the first I/O device in the domain is resumed and
    __pm_genpd_poweron() is called for the first time after
    power has been removed from the domain, the "domain" cpuidle
    state will be disabled to avoid subsequent surprise power removals
    via cpuidle.

    The effective exit_latency value of the "domain" cpuidle state
    depends on the time needed to bring up the CPU core itself after
    restoring power to it as well as on the power on latency of the
    domain containing the CPU core. Thus the "domain" cpuidle state's
    exit_latency has to be recomputed every time the domain's power on
    latency is updated, which may happen every time power is restored
    to the domain, if the measured power on latency is greater than
    the latency stored in the corresponding generic_pm_domain structure.
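
    Put together, platform code might wire this up roughly as follows; the
    state index argument and the error handling are assumptions, as only the
    existence of genpd_attach_cpuidle() is stated above:

    #include <linux/pm_domain.h>
    #include <linux/printk.h>

    #define MY_DOMAIN_STATE_IDX     2   /* assumed index of the extra "domain" state */

    static int my_platform_attach_domain_state(struct generic_pm_domain *genpd)
    {
            int ret;

            /*
             * The cpuidle state at MY_DOMAIN_STATE_IDX was registered disabled,
             * with exit_latency covering only the CPU core bring-up; its
             * .enter() callback removes power from the domain.
             */
            ret = genpd_attach_cpuidle(genpd, MY_DOMAIN_STATE_IDX);
            if (ret)
                    pr_err("failed to attach cpuidle state to PM domain: %d\n", ret);

            return ret;
    }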

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Kevin Hilman

    Rafael J. Wysocki
     
  • Andrew J. Schorr raised a question: when he changes the disable setting on
    a single CPU, it affects all the other CPUs. Basically, currently, the
    disable field is per-driver instead of per-cpu, so all the C-states of the
    same driver are shared by all the CPUs in the same machine.

    The patch changes the `disable' field to per-cpu, so it can be set
    separately for each cpu.

    Signed-off-by: ShuoX Liu
    Reported-by: Andrew J.Schorr
    Reviewed-by: Yanmin Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Rafael J. Wysocki

    ShuoX Liu
     

02 Jun, 2012

5 commits

  • On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
    cpus cannot be independently powered down, either due to
    sequencing restrictions (on Tegra 2, cpu 0 must be the last to
    power down), or due to HW bugs (on OMAP4460, a cpu powering up
    will corrupt the gic state unless the other cpu runs a work
    around). Each cpu has a power state that it can enter without
    coordinating with the other cpu (usually Wait For Interrupt, or
    WFI), and one or more "coupled" power states that affect blocks
    shared between the cpus (L2 cache, interrupt controller, and
    sometimes the whole SoC). Entering a coupled power state must
    be tightly controlled on both cpus.

    The easiest solution to implementing coupled cpu power states is
    to hotplug all but one cpu whenever possible, usually using a
    cpufreq governor that looks at cpu load to determine when to
    enable the secondary cpus. This causes problems, as hotplug is an
    expensive operation, so the number of hotplug transitions must be
    minimized, leading to very slow response to loads, often on the
    order of seconds.

    This file implements an alternative solution, where each cpu will
    wait in the WFI state until all cpus are ready to enter a coupled
    state, at which point the coupled state function will be called
    on all cpus at approximately the same time.

    Once all cpus are ready to enter idle, they are woken by an smp
    cross call. At this point, there is a chance that one of the
    cpus will find work to do, and choose not to enter idle. A
    final pass is needed to guarantee that all cpus will call the
    power state enter function at the same time. During this pass,
    each cpu will increment the ready counter, and continue once the
    ready counter matches the number of online coupled cpus. If any
    cpu exits idle, the other cpus will decrement their counter and
    retry.

    To use coupled cpuidle states, a cpuidle driver must:

    Set struct cpuidle_device.coupled_cpus to the mask of all
    coupled cpus, usually the same as cpu_possible_mask if all cpus
    are part of the same cluster. The coupled_cpus mask must be
    set in the struct cpuidle_device for each cpu.

    Set struct cpuidle_device.safe_state to a state that is not a
    coupled state. This is usually WFI.

    Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
    state that affects multiple cpus.

    Provide a struct cpuidle_state.enter function for each state
    that affects multiple cpus. This function is guaranteed to be
    called on all cpus at approximately the same time. The driver
    should ensure that the cpus all abort together if any cpu tries
    to abort once the function is called.
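
    Pulling those requirements together, the device setup in a coupled cpuidle
    driver might look like the sketch below (field names such as coupled_cpus
    and safe_state_index are assumed; the driver's state table would mark the
    deep state with CPUIDLE_FLAG_COUPLED and supply its .enter() callback):

    #include <linux/cpuidle.h>
    #include <linux/cpumask.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(struct cpuidle_device, my_cpuidle_dev);

    static int __init my_coupled_devices_init(void)
    {
            int cpu, ret;

            for_each_possible_cpu(cpu) {
                    struct cpuidle_device *dev = &per_cpu(my_cpuidle_dev, cpu);

                    dev->cpu = cpu;
                    /* every CPU of the cluster takes part in the coupled state */
                    cpumask_copy(&dev->coupled_cpus, cpu_possible_mask);
                    /* state 0 is WFI: safe to enter without coordination */
                    dev->safe_state_index = 0;

                    ret = cpuidle_register_device(dev);
                    if (ret)
                            return ret;
            }
            return 0;
    }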

    update1:

    cpuidle: coupled: fix count of online cpus

    online_count was never incremented on boot, and was also counting
    cpus that were not part of the coupled set. Fix both issues by
    introducing a new function that counts online coupled cpus, and
    call it from register as well as the hotplug notifier.

    update2:

    cpuidle: coupled: fix decrementing ready count

    cpuidle_coupled_set_not_ready sometimes refuses to decrement the
    ready count in order to prevent a race condition. This makes it
    unsuitable for use when finished with idle. Add a new function
    cpuidle_coupled_set_done that decrements both the ready count and
    waiting count, and call it after idle is complete.

    Cc: Amit Kucheria
    Cc: Arjan van de Ven
    Cc: Trinabh Gupta
    Cc: Deepthi Dharwar
    Reviewed-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Colin Cross
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Colin Cross
     
  • Fix the error handling in __cpuidle_register_device to include
    the missing list_del. Move it to a label, which will simplify
    the error handling when coupled states are added.

    Reviewed-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Colin Cross
    Reviewed-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Colin Cross
     
  • Split the code to enter a state and update the stats into a helper
    function, cpuidle_enter_state, and export it. This function will
    be called by the coupled state code to handle entering the safe
    state and the final coupled state.
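
    The exported helper presumably has a prototype along these lines (an
    assumption consistent with the description above):

    #include <linux/cpuidle.h>

    /*
     * Enter the idle state at 'index', measure the time spent there, update
     * the usage statistics and return the index of the state actually entered.
     */
    int cpuidle_enter_state(struct cpuidle_device *dev,
                            struct cpuidle_driver *drv, int index);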

    Reviewed-by: Santosh Shilimkar
    Tested-by: Santosh Shilimkar
    Reviewed-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Signed-off-by: Colin Cross
    Reviewed-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Colin Cross
     
  • The existing check for dev == NULL in __cpuidle_register_device() is
    rendered useless because dev is dereferenced before the check itself.
    Moreover, correctly speaking, it is the job of the callers of this
    function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
    also happen to be exported functions) to ensure that
    __cpuidle_register_device() is called with a non-NULL dev.

    So add the necessary dev == NULL checks in the two callers and remove the
    (useless) check from __cpuidle_register_device().
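
    In other words, the guard moves to the exported entry points, roughly as in
    this sketch (not the actual diff):

    #include <linux/cpuidle.h>
    #include <linux/errno.h>

    int cpuidle_register_device(struct cpuidle_device *dev)
    {
            if (!dev)       /* check here, where external callers come in */
                    return -EINVAL;

            /* ... locking, __cpuidle_register_device(dev), sysfs, enable ... */
            return 0;
    }

    int cpuidle_enable_device(struct cpuidle_device *dev)
    {
            if (!dev)
                    return -EINVAL;

            /* ... */
            return 0;
    }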

    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Daniel Lezcano
    Signed-off-by: Andrew Morton
    Signed-off-by: Len Brown

    Srivatsa S. Bhat
     
  • commit 9a6558371bcd01c2973b7638181db4ccc34eab4f
    Author: Arjan van de Ven
    Date: Sun Nov 9 12:45:10 2008 -0800

    regression: disable timer peek-ahead for 2.6.28

    It's showing up as regressions; disabling it very likely just papers
    over an underlying issue, but time is running out for 2.6.28, lets get
    back to this for 2.6.29

    Many years have passed since 2008, so it seems OK to remove the whole `#if 0' block.

    Signed-off-by: Sergey Senozhatsky
    Cc: Kevin Hilman
    Cc: Trinabh Gupta
    Cc: Deepthi Dharwar
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Len Brown

    Sergey Senozhatsky
     

08 May, 2012

1 commit

  • kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
    which is copied all over the arch code.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de

    Thomas Gleixner
     

07 Apr, 2012

2 commits


31 Mar, 2012

1 commit

  • Pull ACPI & Power Management changes from Len Brown:
    - ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
    - cpuidle evolving, more ARM use
    - thermal sub-system evolving, ditto
    - assorted other PM bits

    Fix up conflicts in various cpuidle implementations due to ARM cpuidle
    cleanups (ARM at91 self-refresh and cpu idle code rewritten into
    "standby" in asm conflicting with the consolidation of cpuidle time
    keeping), trivial SH include file context conflict and RCU tracing fixes
    in generic code.

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
    ACPI throttling: fix endian bug in acpi_read_throttling_status()
    Disable MCP limit exceeded messages from Intel IPS driver
    ACPI video: Don't start video device until its associated input device has been allocated
    ACPI video: Harden video bus adding.
    ACPI: Add support for exposing BGRT data
    ACPI: export acpi_kobj
    ACPI: Fix logic for removing mappings in 'acpi_unmap'
    CPER failed to handle generic error records with multiple sections
    ACPI: Clean redundant codes in scan.c
    ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
    ACPI: consistently use should_use_kmap()
    PNPACPI: Fix device ref leaking in acpi_pnp_match
    ACPI: Fix use-after-free in acpi_map_lsapic
    ACPI: processor_driver: add missing kfree
    ACPI, APEI: Fix incorrect APEI register bit width check and usage
    Update documentation for parameter *notrigger* in einj.txt
    ACPI, APEI, EINJ, new parameter to control trigger action
    ACPI, APEI, EINJ, limit the range of einj_param
    ACPI, APEI, Fix ERST header length check
    cpuidle: power_usage should be declared signed integer
    ...

    Linus Torvalds
     

30 Mar, 2012

3 commits

  • Currently when a CPU is off-lined it enters either MWAIT-based idle or,
    if MWAIT is not desired or supported, HLT-based idle (which places the
    processor in C1 state). This patch allows processors without MWAIT
    support to stay in states deeper than C1.

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Len Brown

    Boris Ostrovsky
     
  • If the state_count is not initialized for the device, use the driver's
    state count as the default. That avoids having to set it manually in each
    cpuidle driver's initialization routine and saves us duplicated lines of
    code.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Len Brown

    Daniel Lezcano
     
  • Some C-states of a new CPU might not work correctly, one reason being that
    the BIOS might configure them incorrectly. To help developers root-cause
    this quickly, the patch adds a new sysfs entry, so developers can disable a
    specific C-state manually.

    In addition, C-states can have a large impact on performance tuning, as it
    takes time to enter/exit C-states, which might delay interrupt processing.
    With the new debug option, developers can check whether a deep C-state
    impacts performance and how much impact it causes.

    Also add this option in Documentation/cpuidle/sysfs.txt.

    [akpm@linux-foundation.org: check kstrtol return value]
    Signed-off-by: ShuoX Liu
    Reviewed-by: Yanmin Zhang
    Reviewed-and-Tested-by: Deepthi Dharwar
    Signed-off-by: Andrew Morton
    Signed-off-by: Len Brown

    ShuoX Liu
     

21 Mar, 2012

1 commit

  • Make the necessary changes to implement timekeeping and irq enabling in
    the core cpuidle code. This will allow the removal of these functionalities
    from the various platform cpuidle implementations whose timekeeping and irq
    enabling follow the form used in this common code.
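
    A driver that opts in sets a flag in its cpuidle_driver and drops its own
    ktime measurement and local_irq_enable() handling; a sketch using the
    en_core_tk_irqen field mentioned elsewhere in this log:

    #include <linux/cpuidle.h>
    #include <linux/module.h>

    static struct cpuidle_driver my_idle_driver = {
            .name             = "my_idle",
            .owner            = THIS_MODULE,
            /*
             * Let the cpuidle core do the timekeeping and re-enable
             * interrupts on idle exit, instead of duplicating that in
             * every .enter() callback.
             */
            .en_core_tk_irqen = 1,
            /* ... states ... */
    };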

    Signed-off-by: Robert Lee
    Tested-by: Jean Pihet
    Tested-by: Amit Daniel
    Tested-by: Robert Lee
    Reviewed-by: Kevin Hilman
    Reviewed-by: Daniel Lezcano
    Reviewed-by: Deepthi Dharwar
    Acked-by: Jean Pihet
    Signed-off-by: Len Brown

    Robert Lee
     

13 Feb, 2012

1 commit


22 Dec, 2011

1 commit

  • This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Userspace relies on events and generic sysfs subsystem infrastructure
    from sysdev devices, which are made available with this conversion.

    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Tigran Aivazian
    Cc: Len Brown
    Cc: Zhang Rui
    Cc: Dave Jones
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: "Rafael J. Wysocki"
    Cc: "Srivatsa S. Bhat"
    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

08 Nov, 2011

1 commit

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    cpuidle: Single/Global registration of idle states
    cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
    cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
    cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
    ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
    ACPI: Export FADT pm_profile integer value to userspace
    thermal: Prevent polling from happening during system suspend
    ACPI: Drop ACPI_NO_HARDWARE_INIT
    ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
    PNPACPI: Simplify disabled resource registration
    ACPI: Fix possible recursive locking in hwregs.c
    ACPI: use kstrdup()
    mrst pmu: update comment
    tools/power turbostat: less verbose debugging

    Linus Torvalds
     

07 Nov, 2011

4 commits

  • This patch makes the cpuidle_states structure global (a single copy)
    instead of per-cpu. The statistics needed on a per-cpu basis by the
    governor are kept per-cpu. This simplifies the cpuidle subsystem, as state
    registration is done by a single cpu only. Having a single copy of
    cpuidle_states saves memory. The rare case of asymmetric C-states can be
    handled within the cpuidle driver, and architectures such as POWER do not
    have asymmetric C-states.

    With single/global registration of all the idle states, dynamic C-state
    transitions on x86 are handled by the boot cpu: the boot cpu disables all
    the devices, re-populates the states and later enables all the devices,
    irrespective of which cpu receives the notification first.

    Reference:
    https://lkml.org/lkml/2011/4/25/83

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • This is the first step towards global registration of cpuidle states. The
    statistics used primarily by the governor are per-cpu and have to be split
    from the rest of the fields inside cpuidle_state, which will be made
    global, i.e. a single copy. The driver_data field is also per-cpu and is
    moved as well.

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • The cpuidle_device->prepare() mechanism causes updates to the
    cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE to tell the
    governor not to choose a state on a per-cpu basis at run-time. State
    demotion is now handled by the driver, which returns the actual state
    entered, so this mechanism is not required any more. This also removes the
    per-cpu flags from cpuidle_state, enabling it to be made global.

    Reference:
    https://lkml.org/lkml/2011/3/25/52

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Acked-by: Arjan van de Ven
    Reviewed-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • Cpuidle governor only suggests the state to enter using the
    governor->select() interface, but allows the low level driver to
    override the recommended state. The actual entered state
    may be different because of software or hardware demotion. Software
    demotion is done by the back-end cpuidle driver and can be accounted
    correctly. Current cpuidle code uses the last_state field to capture the
    actual state entered and, based on that, updates the statistics for that
    state.

    Ideally the driver enter routine should update the counters,
    and it should return the state actually entered rather than the time
    spent there. The generic cpuidle code should simply handle where
    the counters live in the sysfs namespace, not updating the counters.

    Reference:
    https://lkml.org/lkml/2011/3/25/52
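
    Under that convention, a driver's enter routine looks roughly like the
    sketch below (prototype details of that era aside): it records the
    residency itself and returns the index of the state it really entered.

    #include <linux/ktime.h>
    #include <linux/cpuidle.h>

    static int my_state_enter(struct cpuidle_device *dev,
                              struct cpuidle_driver *drv, int index)
    {
            ktime_t start = ktime_get();
            int entered = index;

            /* ... hardware may demote to a shallower state; record which one ... */

            dev->last_residency = (int)ktime_to_us(ktime_sub(ktime_get(), start));

            return entered;     /* state actually entered, not the time spent */
    }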

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar