09 Jun, 2009

1 commit


27 May, 2009

3 commits

  • * Rafael J. Wysocki (rjw@sisk.pl) wrote:
    > This message has been generated automatically as a part of a report
    > of regressions introduced between 2.6.28 and 2.6.29.
    >
    > The following bug entry is on the current list of known regressions
    > introduced between 2.6.28 and 2.6.29. Please verify if it still should
    > be listed and let me know (either way).
    >
    >
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
    > Subject : cpufreq timer teardown problem
    > Submitter : Mathieu Desnoyers
    > Date : 2009-04-23 14:00 (24 days old)
    > References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
    > Handled-By : Mathieu Desnoyers
    > Patch : http://patchwork.kernel.org/patch/19754/
    > http://patchwork.kernel.org/patch/19753/
    >

    (updated changelog)

    cpufreq fix timer teardown in ondemand governor

    The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
    use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
    workqueue handler to exit.

    The ondemand governor does not seem to be affected because the
    "if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
    immediately without rescheduling the work. The conservative governor in
    2.6.30-rc has the same check as the ondemand governor, which makes things
    usually run smoothly. However, if the governor is quickly stopped and then
    started, this could lead to the following race :

    dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
    This is why a synchronized teardown is required.

    The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

    Depends on patch
    cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

    Signed-off-by: Mathieu Desnoyers
    CC: Andrew Morton
    CC: gregkh@suse.de
    CC: stable@kernel.org
    CC: cpufreq@vger.kernel.org
    CC: Ingo Molnar
    CC: rjw@sisk.pl
    CC: Ben Slusky
    Signed-off-by: Dave Jones

    Mathieu Desnoyers
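
The difference between an asynchronous and a synchronized teardown can be sketched in user space. This is only an illustration with POSIX threads, not the kernel workqueue API: struct dbs_sim, dbs_handler(), dbs_start() and dbs_stop_sync() are hypothetical names, and pthread_join() stands in for the waiting behaviour that the changelog attributes to cancel_delayed_work_sync().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the governor state; not the kernel structure. */
struct dbs_sim {
    atomic_bool enable;   /* plays the role of the "enable" check in the handler */
    atomic_int running;   /* number of handlers currently alive */
    atomic_int peak;      /* highest number of handlers ever alive at once */
    pthread_t worker;
};

/* Stand-in for the work handler: keeps going for as long as it is enabled. */
static void *dbs_handler(void *arg)
{
    struct dbs_sim *s = arg;
    int now = atomic_fetch_add(&s->running, 1) + 1;
    int seen = atomic_load(&s->peak);

    /* Record the peak concurrency observed so far. */
    while (now > seen && !atomic_compare_exchange_weak(&s->peak, &seen, now))
        ;
    while (atomic_load(&s->enable))
        sched_yield();    /* one "sampling period" */
    atomic_fetch_sub(&s->running, 1);
    return NULL;
}

static void dbs_start(struct dbs_sim *s)
{
    int err;

    atomic_store(&s->enable, true);
    err = pthread_create(&s->worker, NULL, dbs_handler, s);
    assert(err == 0);
    (void)err;
}

/*
 * Synchronized teardown: clear the flag AND wait for the handler to exit.
 * Clearing the flag alone would let a quick restart set it again before
 * the old handler has noticed, leaving two handlers running.
 */
static void dbs_stop_sync(struct dbs_sim *s)
{
    atomic_store(&s->enable, false);
    pthread_join(s->worker, NULL);
}
```

Because every stop waits for the handler, a rapid stop/start sequence never has more than one handler alive, which is the property the patch is after.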
     
  • * Rafael J. Wysocki (rjw@sisk.pl) wrote:
    > This message has been generated automatically as a part of a report
    > of regressions introduced between 2.6.28 and 2.6.29.
    >
    > The following bug entry is on the current list of known regressions
    > introduced between 2.6.28 and 2.6.29. Please verify if it still should
    > be listed and let me know (either way).
    >
    >
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
    > Subject : cpufreq timer teardown problem
    > Submitter : Mathieu Desnoyers
    > Date : 2009-04-23 14:00 (24 days old)
    > References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
    > Handled-By : Mathieu Desnoyers
    > Patch : http://patchwork.kernel.org/patch/19754/
    > http://patchwork.kernel.org/patch/19753/
    >

    (re-send with updated changelog)

    cpufreq fix timer teardown in conservative governor

    The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
    use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
    workqueue handler to exit.

    The ondemand governor does not seem to be affected because the
    "if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
    immediately without rescheduling the work. The conservative governor in
    2.6.30-rc has the same check as the ondemand governor, which makes things
    usually run smoothly. However, if the governor is quickly stopped and then
    started, this could lead to the following race :

    dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
    This is why a synchronized teardown is required.

    Depends on patch
    cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

    The following patch applies to 2.6.30-rc2. Stable kernels have a similar
    issue which should also be fixed, but the code changed between 2.6.29
    and 2.6.30, so this patch only applies to 2.6.30-rc.

    Signed-off-by: Mathieu Desnoyers
    CC: Andrew Morton
    CC: gregkh@suse.de
    CC: stable@kernel.org
    CC: cpufreq@vger.kernel.org
    CC: Ingo Molnar
    CC: rjw@sisk.pl
    CC: Ben Slusky
    Signed-off-by: Dave Jones

    Mathieu Desnoyers
     
  • * Rafael J. Wysocki (rjw@sisk.pl) wrote:
    > This message has been generated automatically as a part of a report
    > of regressions introduced between 2.6.28 and 2.6.29.
    >
    > The following bug entry is on the current list of known regressions
    > introduced between 2.6.28 and 2.6.29. Please verify if it still should
    > be listed and let me know (either way).
    >
    >
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
    > Subject : cpufreq timer teardown problem
    > Submitter : Mathieu Desnoyers
    > Date : 2009-04-23 14:00 (24 days old)
    > References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
    > Handled-By : Mathieu Desnoyers
    > Patch : http://patchwork.kernel.org/patch/19754/
    > http://patchwork.kernel.org/patch/19753/

    The patches linked above depend on the following patch to remove
    circular locking dependency :

    cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

    (The following issue was encountered when using cancel_delayed_work_sync()
    in the timer teardown, which fixes a race.)

    * KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
    > Hi
    >
    > my box output following warnings.
    > it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
    >
    > A: work -> do_dbs_timer() -> cpu_policy_rwsem
    > B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
    >
    >

    Hrm, I think it must be due to my attempt to fix the timer teardown race
    in ondemand governor mixed with new locking behavior in 2.6.30-rc.

    The rwsem seems to be taken around the whole call to
    cpufreq_governor_dbs(), when it should be taken only around accesses to
    the locked data, and especially *not* around the call to
    dbs_timer_exit().

    Reverting my fix attempt would put the teardown race back in place
    (replacing the cancel_delayed_work_sync by cancel_delayed_work).
    Instead, a proper fix would imply modifying this critical section :

    cpufreq.c: __cpufreq_remove_dev()
    ...
    if (cpufreq_driver->target)
            __cpufreq_governor(data, CPUFREQ_GOV_STOP);

    unlock_policy_rwsem_write(cpu);

    This would ensure that the __cpufreq_governor() callback is not called
    with the rwsem held, allowing cancel_delayed_work_sync() to execute
    without being nested within the rwsem.

    Applies on top of the 2.6.30-rc5 tree.

    Required to remove the circular dependency in the teardown of both the
    conservative and ondemand governors so they can use
    cancel_delayed_work_sync(). CPUFREQ_GOV_STOP does not modify the policy,
    so this locking seemed unneeded.

    Signed-off-by: Mathieu Desnoyers
    CC: KOSAKI Motohiro
    Cc: Greg KH
    CC: Ingo Molnar
    CC: "Rafael J. Wysocki"
    CC: Ben Slusky
    CC: Chris Wright
    CC: Andrew Morton
    Signed-off-by: Dave Jones

    Mathieu Desnoyers
     

27 Mar, 2009

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
    [CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
    [CPUFREQ] Make cpufreq-nforce2 less obnoxious
    [CPUFREQ] p4-clockmod reports wrong frequency.
    [CPUFREQ] powernow-k8: Use a common exit path.
    [CPUFREQ] Change link order of x86 cpufreq modules
    [CPUFREQ] conservative: remove 10x from def_sampling_rate
    [CPUFREQ] conservative: fixup governor to function more like ondemand logic
    [CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
    [CPUFREQ] conservative: amend author's email address
    [CPUFREQ] Use swap() in longhaul.c
    [CPUFREQ] checkpatch cleanups for acpi-cpufreq
    [CPUFREQ] powernow-k8: Only print error message once, not per core.
    [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
    [CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
    [CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
    [CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
    [CPUFREQ] checkpatch cleanups for powernow-k8
    [CPUFREQ] checkpatch cleanups for ondemand governor.
    [CPUFREQ] checkpatch cleanups for powernow-k7
    [CPUFREQ] checkpatch cleanups for speedstep related drivers.
    ...

    Linus Torvalds
     

10 Mar, 2009

1 commit

  • This reverts commit e088e4c9cdb618675874becb91b2fd581ee707e6.

    Removing the sysfs interface for p4-clockmod was flagged as a
    regression in bug 12826.

    Course of action:
    - Find out the remaining causes of overheating, and fix them
      if possible. ACPI should be doing the right thing automatically.
      If it isn't, we need to fix that.
    - mark p4-clockmod ui as deprecated
    - try again with the removal in six months.

    It's not really feasible to printk about the deprecation, because
    it needs to happen at all the sysfs entry points, which means adding
    a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.

    Signed-off-by: Dave Jones

    Dave Jones
     

25 Feb, 2009

13 commits


06 Feb, 2009

1 commit


10 Jan, 2009

1 commit

  • * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    [IA64] fix typo in cpumask_of_pcibus()
    x86: fix x86_32 builds for summit and es7000 arch's
    cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
    cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
    cpumask: use cpumask_var_t in acpi-cpufreq.c
    cpumask: use work_on_cpu in acpi/cstate.c
    cpumask: convert struct cpufreq_policy to cpumask_var_t
    cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
    x86: cleanup remaining cpumask_t ops in smpboot code
    cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
    cpumask: update local_cpus_show to use new cpumask API
    ia64: cpumask fix for is_affinity_mask_valid()

    Linus Torvalds
     

06 Jan, 2009

2 commits


06 Dec, 2008

2 commits

  • Previously, driver resume would always set the current policy min/max to
    the cpuinfo min/max stored in user_policy.min/max. This reset the policy
    settings on resume from suspend whenever policy.min/max != cpuinfo.min/max.
    Now user_policy is saved from the policy instead of from cpuinfo, to
    preserve what the user actually set.

    Signed-off-by: Mike Chan
    Signed-off-by: Dave Jones

    Mike Chan
     
  • p4-clockmod has a long history of abuse. It pretends to be a CPU
    frequency scaling driver, even though it doesn't actually change
    the CPU frequency, but instead just modulates the frequency with
    wait-states.
    The biggest misconception is that when running at the lower 'frequency'
    p4-clockmod is saving power. This isn't the case, as workloads running
    slower take longer to complete, preventing the CPU from entering deep C states.

    However, p4-clockmod does have a purpose: it can prevent overheating.
    Hooking it up to the cpufreq interfaces is the wrong way to achieve
    cooling, though. It should instead be hooked up to ACPI.

    This diff introduces a means for a cpufreq driver to register with the
    cpufreq core, but not present a sysfs interface.

    Signed-off-by: Matthew Garrett
    Signed-off-by: Dave Jones

    Matthew Garrett
     

10 Oct, 2008

10 commits

  • Use get_cpu()/put_cpu() in cpufreq_ondemand init routine, instead of
    smp_processor_id() to avoid the following BUG:

    [ 35.313118] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/4952
    [ 35.313132] caller is cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
    [ 35.313140] Pid: 4952, comm: modprobe Not tainted 2.6.27-rc5-mm1 #23
    [ 35.313145] Call Trace:
    [ 35.313158] [] debug_smp_processor_id+0xd7/0xe0
    [ 35.313167] [] cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
    [ 35.313176] [] _stext+0x3b/0x160
    [ 35.313185] [] __mutex_unlock_slowpath+0xe5/0x190
    [ 35.313195] [] trace_hardirqs_on_caller+0xca/0x140
    [ 35.313205] [] sys_init_module+0xdc/0x210
    [ 35.313212] [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Andrea Righi
    Signed-off-by: Dave Jones

    Andrea Righi
     
  • We don't need to export the governors for use as the default governor,
    because the default governor will be built-in anyway and we can access
    the symbol directly.

    This also fixes the following sparse warnings:

    drivers/cpufreq/cpufreq_conservative.c:578:25: warning: symbol 'cpufreq_gov_conservative' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_ondemand.c:582:25: warning: symbol 'cpufreq_gov_ondemand' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_performance.c:39:25: warning: symbol 'cpufreq_gov_performance' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_powersave.c:38:25: warning: symbol 'cpufreq_gov_powersave' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_userspace.c:190:25: warning: symbol 'cpufreq_gov_userspace' was not declared. Should it be static?

    Signed-off-by: Sven Wegener
    Signed-off-by: Dave Jones

    Sven Wegener
     
  • Use get_cpu_idle_time_us() to get micro-accounted idle information.
    This enables ondemand to get more accurate idle and busy timings
    than the jiffy-based calculation. As a result, we can tighten
    the ondemand safety guard band from 80-10 to 95-3.

    Results in more aggressive power savings.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Use a parameter for down differential, instead of hardcoded 10%. Follow-on
    patch changes the down-differential dynamically, based on whether
    we are using idle micro-accounting or not.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
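
A minimal sketch of how an up threshold and a down differential could interact. It assumes that the "80-10" and "95-3" pairs quoted in the neighbouring idle micro-accounting changelog mean (up threshold, down differential) in percent. The names struct dbs_tuners, enum dbs_action and dbs_decide() are made up for this illustration and are not taken from the governor source.

```c
#include <assert.h>

/* Hypothetical tunables, both expressed in percent of CPU load. */
struct dbs_tuners {
    unsigned int up_threshold;
    unsigned int down_differential;
};

enum dbs_action { DBS_KEEP, DBS_RAISE, DBS_LOWER };

/*
 * Raise the frequency above up_threshold, lower it below
 * (up_threshold - down_differential), and otherwise stay put.
 * The gap between the two limits is the hysteresis band.
 */
static enum dbs_action dbs_decide(const struct dbs_tuners *t, unsigned int load)
{
    assert(t->down_differential < t->up_threshold);

    if (load > t->up_threshold)
        return DBS_RAISE;
    if (load < t->up_threshold - t->down_differential)
        return DBS_LOWER;
    return DBS_KEEP;
}
```

With the 80/10 pair a load of 85% asks for a higher frequency, while with the 95/3 pair the same load allows a lower one, which is where the more aggressive power savings would come from.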
     
  • Preparatory changes for doing idle micro-accounting in ondemand governor.
    get_cpu_idle_time() gets an extra parameter and returns both the idle time
    and the wall time that corresponds to the idle time measurement.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Change the load calculation algorithm in ondemand to work well with software
    coordination of frequency across the dependent cpus.

    Multiply each CPU's utilization by the average freq of that logical CPU
    during the measurement interval (using the getavg call), then find the
    maximum CPU utilization expressed in terms of CPU freq. That number is
    then used to get to the target freq for the next sampling interval.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
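
The calculation described above can be sketched as follows. This is a user-space illustration only: struct cpu_sample, max_load_freq() and next_freq_khz() are hypothetical names, and dividing by a target load percentage is an assumption about how the maximum might be turned into a target frequency, not a statement of what the governor does.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement per logical CPU in the policy (hypothetical layout). */
struct cpu_sample {
    unsigned int load_pct;      /* busy share of the interval, 0..100 */
    unsigned int avg_freq_khz;  /* average frequency over the interval */
};

/* Maximum of load * average frequency across all sampled CPUs. */
static unsigned long max_load_freq(const struct cpu_sample *s, size_t n)
{
    unsigned long max = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long lf = (unsigned long)s[i].load_pct * s[i].avg_freq_khz;

        if (lf > max)
            max = lf;
    }
    return max;
}

/* Assumed mapping: frequency at which the busiest CPU would sit at target_pct. */
static unsigned int next_freq_khz(unsigned long max_lf, unsigned int target_pct)
{
    assert(target_pct > 0 && target_pct <= 100);
    return (unsigned int)(max_lf / target_pct);
}
```

Weighting load by the average frequency makes a CPU that was 50% busy at 2 GHz count for more than one that was 90% busy at 1 GHz, which is what makes the comparison meaningful when dependent CPUs have been running at different speeds.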
     
  • Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
    cpufreq coordination where policy->cpu may not be same as the CPU on which we
    want to getavg frequency.

    A follow-on patch will use this parameter to getavg freq from all cpus
    in policy->cpus.

    Change since the last patch: fix the offline/online and suspend/resume
    oops reported by Youquan Song.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Venki Pallipadi made a similar change to the ondemand governor a while
    back (in commit 28287033e12463c8ff89f1ea8038783d0360391c). It seems to
    work just as well in the conservative governor, leading to fewer wakeups
    as reported by powertop.

    Signed-off-by: Ben Slusky
    Signed-off-by: Dave Jones

    Ben Slusky
     
  • After calling cpufreq_cpu_get, error handling code should call
    cpufreq_cpu_put.

    The semantic match that finds this problem is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    //
    @r@
    expression x,E;
    statement S;
    position p1,p2,p3;
    @@

    (
    if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
    |
    x = cpufreq_cpu_get@p1(...)
    ... when != x
    if (x == NULL || ...) S
    )

    (
    return x;
    |
    return 0;
    |
    x = E
    |
    E = x
    |
    cpufreq_cpu_put(x)
    )

    @exists@
    position r.p1,r.p2,r.p3;
    expression x;
    int ret != 0;
    statement S;
    @@

    * x = cpufreq_cpu_get@p1(...)

    * return@p2 \(NULL\|ret\);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Dave Jones

    Julia Lawall
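
The rule stated at the top of this changelog (every successful get must be paired with a put, including on error paths) can be illustrated with a small user-space mock. struct policy_sim, policy_get(), policy_put() and read_policy() are hypothetical stand-ins and do not reproduce the signatures of cpufreq_cpu_get() or cpufreq_cpu_put().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a cpufreq policy. */
struct policy_sim {
    int refcount;
    int valid;
};

static struct policy_sim *policy_get(struct policy_sim *p)
{
    if (p == NULL)
        return NULL;
    p->refcount++;
    return p;
}

static void policy_put(struct policy_sim *p)
{
    p->refcount--;
}

/* Returns 0 on success and -1 on failure; the refcount is balanced either way. */
static int read_policy(struct policy_sim *src, int *out)
{
    struct policy_sim *p = policy_get(src);
    int ret = 0;

    if (p == NULL)
        return -1;      /* no reference was taken, so nothing to release */
    if (!p->valid) {
        ret = -1;
        goto out_put;   /* the error path must still drop the reference */
    }
    *out = p->refcount; /* use the object while the reference is held */
out_put:
    policy_put(p);
    return ret;
}
```

Routing both the success and the failure path through a single label is one common way to keep the put from being forgotten when new error checks are added later.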
     
  • Add error handling for cpufreq_register_governor() failure

    Signed-off-by: Akinobu Mita
    Cc: cpufreq@lists.linux.org.uk
    Signed-off-by: Dave Jones

    Akinobu Mita
     

09 Aug, 2008

1 commit


31 Jul, 2008

1 commit

  • Ingo Molnar provided a fix to not call _PPC at processor driver
    initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
    commit e4233dec749a3519069d9390561b5636a75c7579)

    But it can still happen that _PPC is called at processor driver
    initialization time.

    This patch should make sure that this is not possible anymore.

    Signed-off-by: Thomas Renninger
    Cc: Andi Kleen
    Cc: Len Brown
    Cc: Dave Jones
    Cc: Ingo Molnar
    Cc: Venkatesh Pallipadi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Renninger
     

24 Jul, 2008

1 commit

  • * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
    cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
    NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
    NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
    cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
    cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
    cpumask: Provide a generic set of CPUMASK_ALLOC macros
    cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
    cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
    cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
    cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
    cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
    cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
    Revert "cpumask: introduce new APIs"
    cpumask: make for_each_cpu_mask a bit smaller
    net: Pass reference to cpumask variable in net/sunrpc/svc.c
    ...

    Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually

    Linus Torvalds
     

22 Jul, 2008

1 commit


20 Jul, 2008

1 commit

  • * Replace arrays sized by NR_CPUS with percpu variables.

    Prior reference: http://marc.info/?l=linux-kernel&m=120251421825989&w=4
    Subject: [PATCH 1/4] cpufreq: change cpu freq tables to per_cpu variables
    From: Mike Travis
    Date: 2008-02-08 23:37:39

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis