08 Nov, 2011

1 commit

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    cpuidle: Single/Global registration of idle states
    cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
    cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
    cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
    ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
    ACPI: Export FADT pm_profile integer value to userspace
    thermal: Prevent polling from happening during system suspend
    ACPI: Drop ACPI_NO_HARDWARE_INIT
    ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
    PNPACPI: Simplify disabled resource registration
    ACPI: Fix possible recursive locking in hwregs.c
    ACPI: use kstrdup()
    mrst pmu: update comment
    tools/power turbostat: less verbose debugging

    Linus Torvalds
     

07 Nov, 2011

3 commits

  • This patch makes the cpuidle_states structure global (single copy)
    instead of per-cpu. The statistics needed on a per-cpu basis
    by the governor are kept per-cpu. This simplifies the cpuidle
    subsystem, as state registration is done by a single CPU only.
    Having a single copy of cpuidle_states saves memory. The rare case
    of asymmetric C-states can be handled within the cpuidle driver,
    and architectures such as POWER do not have asymmetric C-states.

    With single/global registration of all the idle states,
    dynamic C-state transitions on x86 are handled by
    the boot CPU. Here, the boot CPU disables all the devices,
    re-populates the states and later re-enables all the devices,
    irrespective of which CPU receives the notification first.

    Reference:
    https://lkml.org/lkml/2011/4/25/83
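
    For illustration, the registration flow after this series looks roughly
    like the sketch below. This is a schematic, not the exact upstream code:
    the driver name "my_idle", the enter hook "my_enter", the per-cpu device
    "my_idle_dev" and the field values are made up, and only the overall
    shape (one global state table, one per-cpu device) reflects the change
    described above.

    #include <linux/cpuidle.h>
    #include <linux/cpumask.h>
    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(struct cpuidle_device, my_idle_dev);

    static int my_enter(struct cpuidle_device *dev,
                        struct cpuidle_driver *drv, int index)
    {
            /* enter the hardware idle state for 'index', update
             * dev->last_residency, and return the state entered */
            return index;
    }

    static struct cpuidle_driver my_idle_driver = {
            .name        = "my_idle",
            .owner       = THIS_MODULE,
            .state_count = 1,
            .states = {
                    {
                            .name         = "C1",
                            .exit_latency = 1,          /* us */
                            .enter        = my_enter,
                    },
            },
    };

    static int __init my_idle_init(void)
    {
            int cpu, ret;

            /* one global copy of the state table ... */
            ret = cpuidle_register_driver(&my_idle_driver);
            if (ret)
                    return ret;

            /* ... but still one device, with per-cpu statistics, per CPU */
            for_each_online_cpu(cpu) {
                    struct cpuidle_device *dev = &per_cpu(my_idle_dev, cpu);

                    dev->cpu = cpu;
                    cpuidle_register_device(dev);
            }
            return 0;
    }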

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • This is the first step towards global registration of cpuidle
    states. The statistics used primarily by the governor are per-cpu
    and have to be split from the rest of the fields inside cpuidle_state,
    which will be made global, i.e. a single copy. The driver_data field
    is also per-cpu and is moved as well.
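
    A sketch of the split (field names abbreviated, so treat this as
    illustrative rather than the exact upstream layout): the read-mostly
    description stays in struct cpuidle_state, while the counters and
    driver_data move into a per-cpu usage structure kept in
    struct cpuidle_device, with the statedata helpers now taking that
    usage structure:

    /* per-cpu, kept in cpuidle_device's states_usage[] array */
    struct cpuidle_state_usage {
            void                    *driver_data;
            unsigned long long      usage;
            unsigned long long      time;   /* in us */
    };

    static inline void *cpuidle_get_statedata(struct cpuidle_state_usage *st_usage)
    {
            return st_usage->driver_data;
    }

    static inline void
    cpuidle_set_statedata(struct cpuidle_state_usage *st_usage, void *data)
    {
            st_usage->driver_data = data;
    }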

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • The cpuidle governor only suggests the state to enter, using the
    governor->select() interface, but allows the low-level driver to
    override the recommended state. The actual entered state
    may be different because of software or hardware demotion. Software
    demotion is done by the back-end cpuidle driver and can be accounted for
    correctly. The current cpuidle code uses the last_state field to capture
    the actual state entered and, based on that, updates the statistics for
    that state.

    Ideally the driver enter routine should update the counters,
    and it should return the state actually entered rather than the time
    spent there. The generic cpuidle code should simply handle where
    the counters live in the sysfs namespace, not update the counters itself.

    Reference:
    https://lkml.org/lkml/2011/3/25/52
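
    In code, the shape of a driver enter routine after this change is
    roughly the following. This is a sketch: "do_hw_idle()" is a
    hypothetical stand-in for the real mwait/halt plumbing, and the
    time-keeping is simplified.

    #include <linux/cpuidle.h>
    #include <linux/ktime.h>

    static int do_hw_idle(int index);   /* hypothetical low-level idle entry */

    static int my_enter(struct cpuidle_device *dev,
                        struct cpuidle_driver *drv, int index)
    {
            ktime_t before, after;

            before = ktime_get();

            /* hardware or software demotion may land us in a shallower
             * state; the driver adjusts 'index' if it knows that happened */
            index = do_hw_idle(index);

            after = ktime_get();

            /* the driver, not the generic code, records the residency ... */
            dev->last_residency = (int)ktime_to_us(ktime_sub(after, before));

            /* ... and reports which state was actually entered */
            return index;
    }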

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     

01 Nov, 2011

1 commit


01 Mar, 2011

1 commit

  • Userspace apps might have to cut off parts of the
    idle state name for display reasons.
    Switch NHM-C1 to C1-NHM (and others) so that a cut-off
    name is unique and makes sense to the user.

    Signed-off-by: Thomas Renninger
    CC: lenb@kernel.org
    Signed-off-by: Len Brown

    Thomas Renninger
     

18 Feb, 2011

2 commits

  • Just as we had to disable auto-demotion for NHM/WSM,
    we need to do the same for Atom (Lincroft version).

    In particular, auto-demotion will prevent Lincroft
    from entering the S0i3 idle power saving state.

    https://bugzilla.kernel.org/show_bug.cgi?id=25252

    Signed-off-by: Len Brown

    Len Brown
     
  • Hardware C-state auto-demotion is a mechanism whereby the HW overrides
    the OS C-state request and demotes to a shallower state, which is
    less expensive, but saves less power.

    Modern Linux should generally get exactly the states it requests.
    In particular, when a CPU is taken off-line, it must not be demoted, else
    it can prevent the entire package from reaching deep C-states.

    https://bugzilla.kernel.org/show_bug.cgi?id=25252
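
    Mechanically, disabling auto-demotion amounts to clearing the relevant
    enable bits in the package C-state configuration MSR on every CPU,
    along the lines of the sketch below. The MSR number and bit positions
    are shown for illustration only; intel_idle selects the flags to clear
    based on the detected CPU model.

    #include <linux/smp.h>
    #include <asm/msr.h>

    #define PKG_CST_CONFIG_CONTROL          0xe2            /* illustrative */
    #define C1_AUTO_DEMOTION_ENABLE         (1ULL << 26)    /* illustrative */
    #define C3_AUTO_DEMOTION_ENABLE         (1ULL << 25)    /* illustrative */

    static unsigned long long auto_demotion_disable_flags;  /* set per model */

    static void auto_demotion_disable(void *dummy)
    {
            unsigned long long msr_bits;

            rdmsrl(PKG_CST_CONFIG_CONTROL, msr_bits);
            msr_bits &= ~auto_demotion_disable_flags;        /* clear enables */
            wrmsrl(PKG_CST_CONFIG_CONTROL, msr_bits);
    }

    /* called as on_each_cpu(auto_demotion_disable, NULL, 1) at init,
     * and again for CPUs that come online later */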

    Signed-off-by: Len Brown

    Len Brown
     

25 Jan, 2011

1 commit

  • Fix a shutdown regression caused by 2a2d31c8dc6f ("intel_idle: open
    broadcast clock event"). The clockevent framework can automatically
    shut down broadcast timers for hot-removed CPUs, and shutting down the
    broadcast timer for a hot-removed CPU is what caused the regression, so
    just delete that code.

    Also fix some section mismatches.

    Reported-by: Ari Savolainen
    Signed-off-by: Shaohua Li
    Tested-by: Linus Torvalds
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

13 Jan, 2011

7 commits

  • Len Brown
     
  • Len Brown
     
  • … from the cpuidle layer

    Currently the intel_idle and acpi_idle drivers show double cpu_idle "exit idle"
    events -> this patch fixes that and makes throwing cpu_idle events less complex.

    It also introduces cpu_idle events for all architectures which use
    the cpuidle subsystem, namely:
    - arch/arm/mach-at91/cpuidle.c
    - arch/arm/mach-davinci/cpuidle.c
    - arch/arm/mach-kirkwood/cpuidle.c
    - arch/arm/mach-omap2/cpuidle34xx.c
    - drivers/acpi/processor_idle.c (for all cases, not only mwait)
    - arch/x86/kernel/process.c (did throw events before, but was a mess)
    - drivers/idle/intel_idle.c (did throw events before)

    The convention should be:
    fire cpu_idle events inside the current pm_idle function (not somewhere
    down the callee tree) to keep things easy.

    The current possible pm_idle functions on X86 are:
    c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
    -> so this is really easy now.

    This affects userspace:
    the type field of the cpu_idle power event can now directly be
    mapped to:
    /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
    instead of throwing very CPU/mwait-specific values.
    This change is not visible for the intel_idle driver.
    For the acpi_idle driver it should only be visible if the vendor
    leaves out C-states in the BIOS.
    Another (perf timechart) patch reads out cpuidle info for cpu_idle
    events from:
    /sys/.../cpuidle/stateX/*; then the cpuidle events are mapped
    to the correct C-/cpuidle state again, even if e.g. vendors leave
    out C-states in their BIOS and for example only export C1 and C3.
    -> everything is fine.
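
    The resulting convention, sketched for one pm_idle-style function
    (simplified; "enter_idle_hw()" is a hypothetical stand-in for the real
    low-level idle call):

    #include <linux/smp.h>
    #include <trace/events/power.h>

    static void enter_idle_hw(int state);   /* hypothetical low-level idle */

    static void my_pm_idle(void)
    {
            /* 'state' is the index that userspace can map to
             * /sys/devices/system/cpu/cpuX/cpuidle/stateN/... */
            int state = 1;

            trace_cpu_idle(state, smp_processor_id());

            enter_idle_hw(state);

            trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
    }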

    Signed-off-by: Thomas Renninger <trenn@suse.de>
    CC: Robert Schoene <robert.schoene@tu-dresden.de>
    CC: Jean Pihet <j-pihet@ti.com>
    CC: Arjan van de Ven <arjan@linux.intel.com>
    CC: Ingo Molnar <mingo@elte.hu>
    CC: Frederic Weisbecker <fweisbec@gmail.com>
    CC: linux-pm@lists.linux-foundation.org
    CC: linux-acpi@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    CC: linux-perf-users@vger.kernel.org
    CC: linux-omap@vger.kernel.org
    Signed-off-by: Len Brown <len.brown@intel.com>

    Thomas Renninger
     
  • The intel_idle driver uses CLOCK_EVT_NOTIFY_BROADCAST_ENTER and
    CLOCK_EVT_NOTIFY_BROADCAST_EXIT
    for broadcast clock events. But _ENTER/_EXIT doesn't really turn broadcast clock
    events on; please see processor_idle.c for an example. In some situations this
    will cause a boot hang, because some CPUs enter idle but the local APIC timer stalls.
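
    The distinction, roughly (a simplified use of the clockevents API of
    that era): _ON actually switches a CPU's broadcast handling on, as
    processor_idle.c does at init time, while _ENTER/_EXIT only mark that
    an already-enabled CPU is entering or leaving a state in which its
    local APIC timer stops.

    #include <linux/clockchips.h>
    #include <linux/smp.h>

    static void setup_broadcast_timer(void *arg)
    {
            int cpu = smp_processor_id();

            /* once per CPU at driver init / CPU online:
             * turn broadcast handling on for this CPU */
            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ON, &cpu);
    }

    static void enter_deep_cstate(void)
    {
            int cpu = smp_processor_id();

            /* around each deep idle entry: hand timer duty to the
             * broadcast device while the local APIC timer is stopped */
            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
            /* ... mwait into the deep C-state ... */
            clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
    }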

    Reported-and-tested-by: Yan Zheng
    Signed-off-by: Shaohua Li
    cc: stable@kernel.org
    Signed-off-by: Len Brown

    Shaohua Li
     
  • Signed-off-by: Len Brown

    Len Brown
     
  • Having four variables for the same thing:
    idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
    is rather confusing and unnecessarily complex.

    If the idle= boot param is passed, only set up one variable:
    boot_option_idle_overrides

    This introduces the following functional changes/fixes:
    - The intel_idle driver does not register if any idle=xy
    boot param is passed.
    - processor_idle.c will also not register a cpuidle driver
    and get active if idle=halt is passed.
    Before, a cpuidle driver with one (C1, halt) state got registered.
    Now the default_idle function will be used, which finally uses
    the same idle call to enter the sleep state (safe_halt()), but
    without registering a whole cpuidle driver.

    That means the idle= param will always prevent cpuidle drivers from
    registering, with one exception (same behavior as before):
    idle=nomwait
    may still register the acpi_idle cpuidle driver, but C1 will not use
    mwait, but hlt instead. This can be a workaround for IO-based deeper sleep
    states where C1 mwait causes problems.
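
    Conceptually, the consolidation looks like the sketch below (enum
    values and parsing simplified; the real code lives in
    arch/x86/kernel/process.c, and the exact identifier spellings there
    may differ from this illustration):

    #include <linux/init.h>
    #include <linux/string.h>

    enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_HALT,
                              IDLE_NOMWAIT, IDLE_POLL, IDLE_FORCE_MWAIT };

    unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;

    static int __init idle_setup(char *str)
    {
            if (!strcmp(str, "halt"))
                    boot_option_idle_override = IDLE_HALT;
            else if (!strcmp(str, "nomwait"))
                    boot_option_idle_override = IDLE_NOMWAIT;
            else if (!strcmp(str, "poll"))
                    boot_option_idle_override = IDLE_POLL;
            else if (!strcmp(str, "mwait"))
                    boot_option_idle_override = IDLE_FORCE_MWAIT;
            return 0;
    }
    early_param("idle", idle_setup);

    /* cpuidle drivers then need only one check, e.g.:
     *
     *      if (boot_option_idle_override != IDLE_NO_OVERRIDE)
     *              return -ENODEV;
     */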

    Signed-off-by: Thomas Renninger
    cc: x86@kernel.org
    Signed-off-by: Len Brown

    Thomas Renninger
     
  • Signed-off-by: Len Brown

    Len Brown
     

04 Jan, 2011

2 commits

  • Add these new power trace events:

    power:cpu_idle
    power:cpu_frequency
    power:machine_suspend

    The old C-state/idle accounting events:
    power:power_start
    power:power_end

    now have a replacement (but we are still keeping the old
    tracepoints for compatibility):

    power:cpu_idle

    and
    power:power_frequency

    is replaced with:
    power:cpu_frequency

    power:machine_suspend is newly introduced.

    Jean Pihet has a patch integrated into the generic layer
    (kernel/power/suspend.c) which will make use of it.

    The type= field got removed from both; it was never
    used, and the type is distinguished by the event type itself.

    The perf timechart userspace tool gets adjusted in a separate patch.
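
    For reference, the producer side of the new events looks roughly like
    this (a sketch; "example_call_sites()" is just a container for
    illustration, and the real call sites live in the idle, cpufreq and
    suspend code):

    #include <linux/smp.h>
    #include <trace/events/power.h>

    static void example_call_sites(unsigned int state,
                                   unsigned int new_freq_khz,
                                   unsigned int cpu)
    {
            /* idle entry/exit (intel_idle, acpi_idle, arch idle loops) */
            trace_cpu_idle(state, smp_processor_id());
            trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());

            /* frequency change, in kHz (cpufreq) */
            trace_cpu_frequency(new_freq_khz, cpu);

            /* suspend entry and resume (kernel/power/suspend.c) */
            trace_machine_suspend(state);
            trace_machine_suspend(PWR_EVENT_EXIT);
    }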

    Signed-off-by: Thomas Renninger
    Signed-off-by: Ingo Molnar
    Acked-by: Arjan van de Ven
    Acked-by: Jean Pihet
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: rjw@sisk.pl
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Thomas Renninger
     
  • power_frequency moved to drivers/cpufreq/cpufreq.c, which has
    to be compiled in, so there is no need to export it.

    intel_idle can be a module, though...
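
    The mechanism behind this is the tracepoint export: a tracepoint fired
    only from built-in code needs no export, while one fired from a module
    must be exported next to its definition. A sketch of the pattern (which
    tracepoint this particular patch exports or un-exports is not spelled
    out above, so the symbol name here is illustrative):

    /* in the one file that defines the power tracepoints
     * (kernel/trace/power-traces.c in this era): */
    #define CREATE_TRACE_POINTS
    #include <trace/events/power.h>

    EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);   /* fired from intel_idle.ko */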

    Signed-off-by: Thomas Renninger
    Signed-off-by: Ingo Molnar
    Acked-by: Jean Pihet
    Cc: Jean Pihet
    Cc: Arjan van de Ven
    Cc: rjw@sisk.pl
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Thomas Renninger
     

02 Dec, 2010

1 commit


27 Oct, 2010

2 commits


23 Oct, 2010

3 commits


22 Oct, 2010

1 commit


16 Oct, 2010

2 commits


15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====
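
    The net effect on a typical converted driver is a one-line addition to
    its file_operations, for example (an illustrative fragment; "foo" and
    its handlers are made up):

    static const struct file_operations foo_fops = {
            .owner   = THIS_MODULE,
            .open    = foo_open,        /* calls nonseekable_open() */
            .read    = foo_read,
            .write   = foo_write,
            .llseek  = no_llseek,       /* added by the semantic patch */
    };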

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

09 Oct, 2010

1 commit


01 Oct, 2010

1 commit

  • Avoid TLB flush IPIs for cores in deeper C-states by a voluntary leave_mm()
    before entering that state. CPUs tend to flush the TLB in those C-states
    anyway.

    acpi_idle does this for C3-type states, but it was not carried over
    when intel_idle was introduced. intel_idle can apply it
    to C-states in addition to those that ACPI might export as C3...
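
    In the driver this becomes a small check on the way into the deeper
    states, along these lines (a sketch; the helper name is made up,
    leave_mm() is the existing x86 helper that drops the lazy mm, and the
    flag marks states whose entry flushes the TLB):

    #include <linux/cpuidle.h>
    #include <linux/smp.h>

    extern void leave_mm(int cpu);          /* x86 TLB/lazy-mm helper */

    static void prepare_deep_cstate(struct cpuidle_state *state)
    {
            int cpu = smp_processor_id();

            /* this state flushes the TLB anyway, so drop the lazy mm
             * reference now and skip a TLB-flush IPI to this core later */
            if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
                    leave_mm(cpu);
    }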

    Signed-off-by: Suresh Siddha
    Signed-off-by: Len Brown

    Suresh Siddha
     

29 Sep, 2010

2 commits


18 Sep, 2010

1 commit


16 Aug, 2010

1 commit


15 Aug, 2010

3 commits


05 Aug, 2010

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
    [CPUFREQ] Remove pointless printk from p4-clockmod.
    [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
    [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
    [CPUFREQ] Fix section mismatch for longrun_cpu_init.
    [CPUFREQ] powernow-k8: Fix misleading variable naming
    [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
    [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
    [CPUFREQ] fix brace coding style issue.
    [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
    [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
    [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present
    [CPUFREQ] unexport (un)lock_policy_rwsem* functions
    [CPUFREQ] ondemand: Refactor frequency increase code
    [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
    [CPUFREQ] powernow-k8: Limit Pstate transition latency check
    [CPUFREQ] Fix PCC driver error path
    [CPUFREQ] fix double freeing in error path of pcc-cpufreq
    [CPUFREQ] pcc driver should check for pcch method before calling _OSC
    [CPUFREQ] fix memory leak in cpufreq_add_dev
    [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

    Manually fix up non-data merge conflict introduced by new calling
    conventions for trace_power_start() in commit 6f4f2723d085 ("x86
    cpufreq: Make trace_power_frequency cpufreq driver independent"), which
    didn't update the intel_idle native hardware cpuidle driver.

    Linus Torvalds
     

27 Jul, 2010

1 commit


24 Jul, 2010

1 commit

  • The idea behind power policy was that it would start off as a modparam,
    and then hook into the new "global" in-kernel power vs energy tunable.
    But that tunable isn't happening, so delete the hook here.

    With the policy hook gone, the sub-state choice functions
    do not do anything useful, so delete them from the critical path.

    To handle sub-states in the future, we will advertise them
    with dedicated cpuidle_state entries. That is necessary
    because some of the sub-states will have substantially different
    properties than their peer sub-states.

    Signed-off-by: Len Brown

    Len Brown