17 Mar, 2011

4 commits


26 Jan, 2011

1 commit

  • With cmwq, there's no reason for cpufreq drivers to use separate
    workqueues. Remove the dedicated workqueues from cpufreq_conservative
    and cpufreq_ondemand and use system_wq instead. The work items are
    already sync canceled on stop, so it's already guaranteed that no work
    is running on module exit.

    Signed-off-by: Tejun Heo
    Acked-by: Dave Jones
    Cc: cpufreq@vger.kernel.org

    Tejun Heo
     

22 Oct, 2010

1 commit

  • Adds a new global tunable, sampling_down_factor. Set to 1 it makes no
    changes from existing behavior, but set to greater than 1 (e.g. 100)
    it acts as a multiplier for the scheduling interval for reevaluating
    load when the CPU is at its top speed due to high load. This improves
    performance by reducing the overhead of load evaluation and helping
    the CPU stay at its top speed when truly busy, rather than shifting
    back and forth in speed. This tunable has no effect on behavior at
    lower speeds/lower CPU loads.

    This patch is against 2.6.36-rc6.

    This patch should help solve kernel bug 19672 "ondemand is slow".

    Signed-off-by: David Niemi
    Acked-by: Venkatesh Pallipadi
    CC: Daniel Hollocher
    Signed-off-by: Dave Jones

    David C Niemi
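
The effect of sampling_down_factor described above can be sketched in a few lines of userspace C. This is only an illustration of the multiplier idea, not the governor's code: the function name, its parameters, and the boolean flag are assumptions made for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel symbol): returns the delay, in
 * microseconds, before load is evaluated again.
 *
 * The multiplier only applies while the CPU sits at its top speed because
 * of high load; at lower speeds the plain sampling rate is used.
 */
unsigned int next_sample_delay_us(unsigned int sampling_rate_us,
                                  unsigned int sampling_down_factor,
                                  bool at_top_speed_due_to_load)
{
    unsigned int mult = 1;

    /* A factor of 1 (or a CPU below top speed) leaves behavior unchanged. */
    if (at_top_speed_due_to_load && sampling_down_factor > 1)
        mult = sampling_down_factor;

    return sampling_rate_us * mult;
}
```

With a 10000us sampling rate and a factor of 100, a busy CPU at top speed would be re-evaluated once per second in this sketch, while the same CPU at a lower speed would still be sampled every 10ms.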
     

04 Aug, 2010

2 commits


18 May, 2010

1 commit

  • * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, hypervisor: add missing
    Modify the VMware balloon driver for the new x86_hyper API
    x86, hypervisor: Export the x86_hyper* symbols
    x86: Clean up the hypervisor layer
    x86, HyperV: fix up the license to mshyperv.c
    x86: Detect running on a Microsoft HyperV system
    x86, cpu: Make APERF/MPERF a normal table-driven flag
    x86, k8: Fix build error when K8_NB is disabled
    x86, cacheinfo: Disable index in all four subcaches
    x86, cacheinfo: Make L3 cache info per node
    x86, cacheinfo: Reorganize AMD L3 cache structure
    x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
    x86, cacheinfo: Unify AMD L3 cache index disable checking
    cpufreq: Unify sysfs attribute definition macros
    powernow-k8: Fix frequency reporting
    x86, cpufreq: Add APERF/MPERF support for AMD processors
    x86: Unify APERF/MPERF support
    powernow-k8: Add core performance boost support
    x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo

    Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and
    drivers/cpufreq/cpufreq_ondemand.c

    Linus Torvalds
     

10 May, 2010

2 commits

  • Pavel Machek pointed out that not all CPUs have an efficient
    idle at high frequency. Specifically, older Intel and various
    AMD CPUs would use more power when copying files from
    USB.

    Mike Chan pointed out that the same is true for various ARM
    chips as well.

    Thomas Renninger suggested making this a sysfs tunable with a
    reasonable default.

    This patch adds a sysfs tunable for the new behavior, and uses
    a very simple function to determine a reasonable default,
    depending on the CPU vendor/type.

    Signed-off-by: Arjan van de Ven
    Acked-by: Rik van Riel
    Acked-by: Pavel Machek
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    [ minor tidyup ]
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • The ondemand cpufreq governor uses CPU busy time (i.e. not-idle
    time) as a measure for scaling the CPU frequency up or down.
    If the CPU is busy, the CPU frequency scales up; if it's idle,
    the CPU frequency scales down. Effectively, it uses the CPU busy
    time as a proxy variable for the more nebulous "how critical is
    performance right now" question.

    This algorithm falls flat on its face in the light of workloads
    where you're alternately disk and CPU bound, such as the ever
    popular "git grep", but also things like startup of programs and
    maildir-using email clients... much to the chagrin of Andrew
    Morton.

    This patch changes the ondemand algorithm to count iowait time
    as busy, not idle, time. As shown in the breakdown cases above,
    iowait is often performance critical, and by counting iowait,
    the proxy variable becomes a more accurate representation of the
    "how critical is performance" question.

    The problem and fix are both verified with the "perf timechart"
    tool.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
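
To make the iowait change above concrete, here is a minimal userspace sketch of a busy-time load calculation. The function and parameter names are illustrative assumptions, and all times are deltas over one sampling interval in the same unit; the governor's real accounting is more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical load calculation (names are not kernel symbols).
 *
 * wall:   elapsed time in the sampling interval
 * idle:   time the CPU was idle with no I/O outstanding
 * iowait: time the CPU was idle while waiting for I/O
 *
 * When count_iowait_as_busy is false, iowait is added to idle time (the
 * old behavior described above); when true, it counts as busy time.
 */
unsigned int load_percent(uint64_t wall, uint64_t idle, uint64_t iowait,
                          bool count_iowait_as_busy)
{
    uint64_t effective_idle = idle;

    if (!count_iowait_as_busy)
        effective_idle += iowait;

    /* Guard against an empty interval or inconsistent counters. */
    if (wall == 0 || effective_idle > wall)
        return 0;

    return (unsigned int)(100 * (wall - effective_idle) / wall);
}
```

For an interval of 1000 units with 200 idle and 600 in iowait, the sketch reports a load of 20% when iowait counts as idle and 80% when it counts as busy, which is the difference that lets a disk-bound phase keep the frequency up.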
     

10 Apr, 2010

1 commit

  • Multiple modules defined sysfs attribute macros with identical
    functionality, needlessly replicated among the different cpufreq
    drivers. Push them into the header and remove the duplication.

    Signed-off-by: Borislav Petkov
    Reviewed-by: Thomas Renninger
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
     

13 Jan, 2010

1 commit

  • Dominik said:
    target_freq cannot be below policy->min or above policy->max.
    If it were, the whole cpufreq subsystem is broken.

    But (answer):
    I think the "ondemand" governor can ask for a target frequency that is
    below policy->min.
    ...
    A patch such as below may be needed to sanitize the target frequency
    requested by "ondemand". The "conservative" governor already has this check:

    Signed-off-by: Thomas Renninger
    Signed-off-by: Dave Jones

    # diff -bur x/drivers/cpufreq/cpufreq_ondemand.c.orig y/drivers/cpufreq/cpufreq_ondemand.c

    Nagananda.Chumbalkar@hp.com
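
The sanity check discussed above amounts to clamping the requested frequency into the policy limits. A minimal sketch follows; the function name and the plain integer parameters are assumptions for illustration and stand in for policy->min and policy->max.

```c
/*
 * Hypothetical helper: force a requested target frequency (in kHz) into
 * the range [policy_min, policy_max] before handing it to the driver.
 */
unsigned int clamp_target_freq(unsigned int target,
                               unsigned int policy_min,
                               unsigned int policy_max)
{
    if (target < policy_min)
        return policy_min;
    if (target > policy_max)
        return policy_max;
    return target;
}
```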
     

18 Nov, 2009

1 commit

  • ondemand and conservative governors are messing up time units in the
    code path where NO_HZ is not enabled and ignore_nice is set. The walltime
    idletime stored is in jiffies and nice time calculation is happening in
    microseconds.

    The problem was reported and diagnosed by Alexander here.
    http://marc.info/?l=linux-kernel&m=125752550404513&w=2

    The patch below fixes this thinko.

    Reported-by: Alexander Miller
    Tested-by: Alexander Miller
    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Pallipadi, Venkatesh
     

19 Sep, 2009

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
    [CPUFREQ] Fix NULL ptr regression in powernow-k8
    [CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module.
    [CPUFREQ] Powernow-k8: Enable more than 2 low P-states
    [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
    [CPUFREQ] ondemand - Use global sysfs dir for tuning settings
    [CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq
    [CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created
    [CPUFREQ] Factor out policy setting from cpufreq_add_dev
    [CPUFREQ] Factor out interface creation from cpufreq_add_dev
    [CPUFREQ] Factor out symlink creation from cpufreq_add_dev
    [CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev
    [CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev
    [CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq

    Linus Torvalds
     

02 Sep, 2009

1 commit

  • Ondemand has only global variables for userspace tunings via sysfs.
    But they were exposed per CPU which wrongly implies to the user that
    his settings are applied per cpu. Also locking sysfs against concurrent
    access won't be necessary anymore after deprecation time.

    This means the ondemand config dir is moved:
    /sys/devices/system/cpu/cpu*/cpufreq/ondemand ->
    /sys/devices/system/cpu/cpufreq/ondemand

    The old files will still exist, but reading or writing to them will
    result in one (printk_once) deprecation msg to syslog per file.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Dave Jones

    Thomas Renninger
     

14 Aug, 2009

1 commit

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in core and arch percpu codes are mostly from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
    num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
    the first chunk allocators into mm/percpu.c, the changes are moved
    from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

07 Jul, 2009

2 commits

  • Redesign the locking inside the ondemand driver. Make dbs_mutex handle all
    the global state changes inside the driver and introduce a new percpu mutex
    to serialize the percpu timer and frequency limit changes.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Commit b14893a62c73af0eca414cfed505b8c09efc613c, although it was very
    much needed to properly clean up the ondemand timer, opened up a can of
    worms related to locking dependencies in cpufreq.

    This patch defines the need for dbs_mutex and cleans up its usage in the
    ondemand governor. It also resolves the lockdep warnings reported here:

    http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
    http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

    and a few others.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     

24 Jun, 2009

1 commit

  • Percpu variable definition is about to be updated such that all percpu
    symbols including the static ones must be unique. Update percpu
    variable definitions accordingly.

    * as,cfq: rename ioc_count uniquely

    * cpufreq: rename cpu_dbs_info uniquely

    * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

    * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
    rename it

    * ipv4,6: rename cookie_scratch uniquely

    * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
    pmc_irq_entry and nmi_entry to pmc_nmi_entry

    * perf_counter: rename disable_count to perf_disable_count

    * ftrace: rename test_event_disable to ftrace_test_event_disable

    * kmemleak: rename test_pointer to kmemleak_test_pointer

    * mce: rename next_interval to mce_next_interval

    [ Impact: percpu usage cleanups, no duplicate static percpu var names ]

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Cc: Ivan Kokshaysky
    Cc: Jens Axboe
    Cc: Dave Jones
    Cc: Jeremy Fitzhardinge
    Cc: linux-mm
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Catalin Marinas
    Cc: Andi Kleen

    Tejun Heo
     

15 Jun, 2009

2 commits

  • Update the documentation accordingly.
    Cleanup and use printk_once.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Dave Jones

    Thomas Renninger
     
  • With this patch you have the following minimal sampling rate restrictions:

    Kernel restrictions:
    If CONFIG_NO_HZ is set, the limit is 10ms fixed.
    If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
    limits depend on the CONFIG_HZ option:
    HZ=1000: min=20000us (20ms)
    HZ=250: min=80000us (80ms)
    HZ=100: min=200000us (200ms)

    HW restrictions:
    Do not sample/poll more often than HW latency * 100 exported by the low
    level cpufreq HW driver

    The higher of the above restrictions is the minimal sampling rate
    that can be set (and can be seen via the ondemand/sampling_rate_min sysfs
    file).

    The default sampling rate is still HW latency * 1000, but this will now end
    up in lower values on the latest (Intel and AMD) hardware, as these can switch
    really fast and the sampling rate was mostly limited to 80ms or 200ms
    (depending on whether HZ=250 or HZ=100 is used).

    Signed-off-by: Thomas Renninger
    Cc: Pallipadi Venkatesh
    Signed-off-by: Dave Jones

    Thomas Renninger
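
The restrictions listed above can be worked through with a short userspace sketch. The 20-tick figure is inferred from the HZ table in the message (20000us at HZ=1000, 80000us at HZ=250, 200000us at HZ=100); the function names, and the assumption that the default is raised to the minimum when HW latency * 1000 falls below it, are illustrative only.

```c
#include <stdbool.h>

#define NO_HZ_MIN_SAMPLING_US 10000u /* fixed 10ms limit with CONFIG_NO_HZ */
#define MIN_SAMPLING_TICKS    20u    /* inferred from the HZ table above */

/* Kernel-side lower bound in microseconds; hz must be non-zero. */
unsigned int kernel_min_sampling_us(bool no_hz, unsigned int hz)
{
    if (no_hz)
        return NO_HZ_MIN_SAMPLING_US;
    return MIN_SAMPLING_TICKS * (1000000u / hz);
}

/* Effective minimum: the higher of the kernel and hardware limits. */
unsigned int min_sampling_us(bool no_hz, unsigned int hz,
                             unsigned int hw_latency_us)
{
    unsigned int kernel_min = kernel_min_sampling_us(no_hz, hz);
    unsigned int hw_min = hw_latency_us * 100u;

    return kernel_min > hw_min ? kernel_min : hw_min;
}

/* Default rate: HW latency * 1000, assumed never below the minimum. */
unsigned int default_sampling_us(bool no_hz, unsigned int hz,
                                 unsigned int hw_latency_us)
{
    unsigned int min = min_sampling_us(no_hz, hz, hw_latency_us);
    unsigned int def = hw_latency_us * 1000u;

    return def > min ? def : min;
}
```

For hardware with a 10us transition latency, this sketch gives a default of 10000us with CONFIG_NO_HZ and 80000us on a HZ=250 kernel without it, matching the limits quoted in the message.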
     

27 May, 2009

1 commit

  • * Rafael J. Wysocki (rjw@sisk.pl) wrote:
    > This message has been generated automatically as a part of a report
    > of regressions introduced between 2.6.28 and 2.6.29.
    >
    > The following bug entry is on the current list of known regressions
    > introduced between 2.6.28 and 2.6.29. Please verify if it still should
    > be listed and let me know (either way).
    >
    >
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
    > Subject : cpufreq timer teardown problem
    > Submitter : Mathieu Desnoyers
    > Date : 2009-04-23 14:00 (24 days old)
    > References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
    > Handled-By : Mathieu Desnoyers
    > Patch : http://patchwork.kernel.org/patch/19754/
    > http://patchwork.kernel.org/patch/19753/
    >

    (updated changelog)

    cpufreq fix timer teardown in ondemand governor

    The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
    use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
    workqueue handler to exit.

    The ondemand governor does not seem to be affected because the
    "if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
    immediately without rescheduling the work. The conservative governor in
    2.6.30-rc has the same check as the ondemand governor, which makes things
    usually run smoothly. However, if the governor is quickly stopped and then
    started, this could lead to the following race:

    dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
    This is why a synchronized teardown is required.

    The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

    Depends on patch
    cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

    Signed-off-by: Mathieu Desnoyers
    CC: Andrew Morton
    CC: gregkh@suse.de
    CC: stable@kernel.org
    CC: cpufreq@vger.kernel.org
    CC: Ingo Molnar
    CC: rjw@sisk.pl
    CC: Ben Slusky
    Signed-off-by: Dave Jones

    Mathieu Desnoyers
     

25 Feb, 2009

3 commits


06 Feb, 2009

1 commit


06 Jan, 2009

1 commit

  • Impact: use new cpumask API to reduce memory usage

    This is part of an effort to reduce structure sizes for machines
    configured with large NR_CPUS. cpumask_t gets replaced by
    cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
    struct cpumask * (large NR_CPUS).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Acked-by: Dave Jones
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

10 Oct, 2008

8 commits

  • Use get_cpu()/put_cpu() in cpufreq_ondemand init routine, instead of
    smp_processor_id() to avoid the following BUG:

    [ 35.313118] BUG: using smp_processor_id() in preemptible [00000000] code=: modprobe/4952
    [ 35.313132] caller is cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
    [ 35.313140] Pid: 4952, comm: modprobe Not tainted 2.6.27-rc5-mm1 #23
    [ 35.313145] Call Trace:
    [ 35.313158] [] debug_smp_processor_id+0xd7/0xe0
    [ 35.313167] [] cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
    [ 35.313176] [] _stext+0x3b/0x160
    [ 35.313185] [] __mutex_unlock_slowpath+0xe5/0x190
    [ 35.313195] [] trace_hardirqs_on_caller+0xca/0x140
    [ 35.313205] [] sys_init_module+0xdc/0x210
    [ 35.313212] [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Andrea Righi
    Signed-off-by: Dave Jones

    Andrea Righi
     
  • We don't need to export the governors for use as the default governor,
    because the default governor will be built-in anyway and we can access
    the symbol directly.

    This also fixes the following sparse warnings:

    drivers/cpufreq/cpufreq_conservative.c:578:25: warning: symbol 'cpufreq_gov_conservative' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_ondemand.c:582:25: warning: symbol 'cpufreq_gov_ondemand' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_performance.c:39:25: warning: symbol 'cpufreq_gov_performance' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_powersave.c:38:25: warning: symbol 'cpufreq_gov_powersave' was not declared. Should it be static?
    drivers/cpufreq/cpufreq_userspace.c:190:25: warning: symbol 'cpufreq_gov_userspace' was not declared. Should it be static?

    Signed-off-by: Sven Wegener
    Signed-off-by: Dave Jones

    Sven Wegener
     
  • Use get_cpu_idle_time_us() to get micro-accounted idle information.
    This enables ondemand to get more accurate idle and busy timings
    than the jiffy based calculation. As a result, we can decrease
    the ondemand safety guard band from 80-10 to 95-3.

    Results in more aggressive power savings.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
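
One way to read the "80-10 to 95-3" guard band above is as an up threshold paired with a down differential. The sketch below illustrates that reading under stated assumptions: the function name and parameters are hypothetical, load is a percentage of the current frequency, and the scale-down formula is a simplification rather than the governor's exact code.

```c
/*
 * Hypothetical decision function (not a kernel symbol).
 *
 * - Above up_threshold: jump to the maximum frequency.
 * - Below (up_threshold - down_differential): pick a lower frequency that
 *   would put the same work just under the lowered threshold.
 * - In between: keep the current frequency.
 *
 * Assumes up_threshold > down_differential and frequencies in kHz small
 * enough that load_percent * cur_freq fits in an unsigned int.
 */
unsigned int pick_next_freq(unsigned int load_percent,
                            unsigned int cur_freq,
                            unsigned int max_freq,
                            unsigned int up_threshold,
                            unsigned int down_differential)
{
    unsigned int down_threshold = up_threshold - down_differential;

    if (load_percent > up_threshold)
        return max_freq;

    if (load_percent < down_threshold)
        return load_percent * cur_freq / down_threshold;

    return cur_freq;
}
```

With the older 80/10 pair, a 90% load at 1000000 kHz jumps straight to the maximum; with 95/3 the same load is below the 92% down threshold, so this sketch picks a slightly lower frequency instead, which is where the more aggressive power savings come from.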
     
  • Use a parameter for down differential, instead of hardcoded 10%. A follow-on
    patch changes the down differential dynamically, based on whether
    we are using idle micro-accounting or not.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Preparatory changes for doing idle micro-accounting in the ondemand governor.
    get_cpu_idle_time() gets an extra parameter and returns the idle time and also
    the wall time that corresponds to the idle time measurement.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Change the load calculation algorithm in ondemand to work well with software
    coordination of frequency across the dependent cpus.

    Multiply each CPU's utilization by the average freq of that logical CPU
    during the measurement interval (using the getavg call), then find the max
    CPU utilization number in terms of CPU freq. That number is then used to
    get to the target freq for the next sampling interval.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
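
The load calculation described above can be sketched as a maximum over per-CPU products of utilization and average frequency. The struct, field names, and function name below are assumptions for illustration; in the governor the average frequency comes from the getavg call.

```c
#include <stddef.h>

/* Hypothetical per-CPU sample for one measurement interval. */
struct cpu_sample {
    unsigned int load_percent; /* utilization, 0..100 */
    unsigned int avg_freq_khz; /* average frequency over the interval */
};

/*
 * Returns the largest (utilization * average frequency) product across
 * the CPUs that share a policy. Returns 0 for an empty set.
 */
unsigned long max_load_freq(const struct cpu_sample *cpus, size_t n)
{
    unsigned long max = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long load_freq =
            (unsigned long)cpus[i].load_percent * cpus[i].avg_freq_khz;

        if (load_freq > max)
            max = load_freq;
    }
    return max;
}
```

Weighting by average frequency means a CPU that was 50% busy at 2000000 kHz counts for more than one that was 90% busy at 1000000 kHz, since the first one actually did more work in the interval.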
     
  • Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
    cpufreq coordination where policy->cpu may not be the same as the CPU on which
    we want to getavg frequency.

    A follow-on patch will use this parameter to getavg freq from all cpus
    in policy->cpus.

    Change since last patch: fix the offline/online and suspend/resume
    oops reported by Youquan Song.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    venkatesh.pallipadi@intel.com
     
  • Add error handling for cpufreq_register_governor() error

    Signed-off-by: Akinobu Mita
    Cc: cpufreq@lists.linux.org.uk
    Signed-off-by: Dave Jones

    Akinobu Mita
     

24 May, 2008

1 commit


18 Jan, 2008

1 commit

  • When the cpufreq driver starts up at boot time, it calls into the default
    governor which might not be initialised yet. This hurts when the
    governor's worker function relies on memory that is not yet set up by its
    init function.

    This migrates all governors from module_init() to fs_initcall() when being
    the default, as was already done in cpufreq_performance when it was the
    only possible choice. The performance governor is always initialized early
    because it might be used as fallback even when not being the default.

    Fixes at least one actual oops where ondemand is the default governor and
    cpufreq_governor_dbs() uses the uninitialised kondemand_wq work-queue
    during boot-time.

    Signed-off-by: Johannes Weiner
    Cc: Dave Jones
    Cc: "Rafael J. Wysocki"
    Cc: Venkatesh Pallipadi
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

05 Oct, 2007

1 commit

  • Depending on the transition latency of the HW for cpufreq switches, the
    ondemand or conservative governor cannot be used with certain cpufreq
    drivers. Still, ondemand should be the default governor on a wide range
    of systems. This patch allows this and lets the governor fall back to the
    performance governor at cpufreq driver load time, if the driver does not
    support fast enough frequency switching.

    The main benefit is that on installation systems, or other systems without
    userspace support, working dynamic cpufreq support can be achieved on most
    systems by simply loading the cpufreq driver. This is especially essential
    for recent x86(_64) laptop hardware which may rely on working dynamic
    cpufreq OS support.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Venkatesh Pallipadi
    Cc: Russell King
    Cc: Bryan Wu
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones

    Thomas Renninger
     

22 Jun, 2007

1 commit

  • With a tickless kernel and software coordination of P-states, ondemand
    can look at wrong idle statistics. This can happen when ondemand sampling
    is happening on CPU 0 and, due to software coordination, sampling also looks
    at the utilization of CPU 1. If CPU 1 is in tickless state at that moment, its
    idle statistics will not be up to date and CPU 0 thinks CPU 1 has been idle
    for less time than it actually has.

    This can be resolved by looking at all the busy times of CPUs, which are
    accurate even with tickless, and using them to determine idle time in a
    roundabout way (total time - busy time).

    Thanks to Arjan for originally reporting the ondemand bug on
    Lenovo T61.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Venki Pallipadi
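
The "total time - busy time" approach above can be illustrated with a small helper. The function name and parameters are assumptions for the example; both values are deltas over the same interval and in the same unit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive idle time from wall time and accumulated
 * busy time instead of reading a (possibly stale) idle counter.
 */
uint64_t idle_from_busy(uint64_t wall_delta, uint64_t busy_delta)
{
    /* Clamp to zero if rounding makes busy time exceed wall time. */
    if (busy_delta >= wall_delta)
        return 0;

    return wall_delta - busy_delta;
}
```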