22 Jun, 2007

2 commits

  • With a tickless kernel and software coordination of P-states, ondemand
    can look at wrong idle statistics. This can happen when ondemand sampling
    is happening on CPU 0 and, due to software coordination, sampling also looks at
    the utilization of CPU 1. If CPU 1 is in tickless state at that moment, its idle
    statistics will not be up to date, and CPU 0 will think CPU 1 has been idle for
    less time than it actually has.

    This can be resolved by looking at the busy times of all CPUs, which are
    accurate even with tickless, and using them to determine idle time in a
    roundabout way (total time - busy time).

    Thanks to Arjan for originally reporting the ondemand bug on
    Lenovo T61.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Venki Pallipadi
     
  • Due to rounding and inexact jiffy accounting, idle_ticks can sometimes
    be higher than total_ticks. Make sure those cases are handled as the
    zero-load case.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Venki Pallipadi
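Both 22 Jun 2007 changes concern how the load for one sampling interval is derived. The following is a minimal userspace sketch, not the kernel code: the struct cpu_sample type and the load_pct/interval_load helpers are hypothetical, and the counters are assumed to be cumulative per-CPU times in microseconds. Idle time is derived indirectly as elapsed time minus busy time, and an idle figure larger than the elapsed total is treated as zero load.

```c
#include <stdint.h>

/* Hypothetical cumulative per-CPU counters, e.g. in microseconds. */
struct cpu_sample {
    uint64_t wall;  /* total elapsed time */
    uint64_t busy;  /* user + system + irq + ... time */
};

/* Load in percent for one interval. Inexact accounting can report more idle
 * than elapsed time; that case is treated as zero load. */
static unsigned int load_pct(uint64_t total, uint64_t idle)
{
    if (total == 0 || idle > total)
        return 0;
    return (unsigned int)(100 * (total - idle) / total);
}

/* Idle time derived the roundabout way: elapsed minus busy. Busy time stays
 * accurate on a tickless CPU even when its idle counter lags behind. */
static unsigned int interval_load(const struct cpu_sample *prev,
                                  const struct cpu_sample *now)
{
    uint64_t total = now->wall - prev->wall;
    uint64_t busy = now->busy - prev->busy;
    uint64_t idle = busy >= total ? 0 : total - busy;

    return load_pct(total, idle);
}
```

For samples {1000, 200} and {2000, 700}, 500 of the 1000 elapsed units were busy, so interval_load() reports 50.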
     

09 May, 2007

1 commit

  • Add a new deferrable delayed work init. This can be used to schedule work
    that is 'unimportant' when the CPU is idle and can be run later, when the CPU
    eventually comes out of idle.

    Use this init in cpufreq ondemand governor.

    Signed-off-by: Venkatesh Pallipadi
    Cc: Dave Jones
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

21 Feb, 2007

1 commit


11 Feb, 2007

4 commits

  • Signed-off-by: Dave Jones

    Dave Jones
     
  • Eliminate flush_workqueue in the cpufreq_governor(STOP) callpath. Using flush
    there has deadlock potential, as described in

    http://uwsg.iu.edu/hypermail/linux/kernel/0611.3/1223.html

    Also, clean up the locking issues with the do_dbs_timer delayed_work callback. As
    it changes the CPU frequency using __cpufreq_target, it needs to hold
    policy_rwsem in write mode, which also protects it from hotplug.

    Signed-off-by: Venkatesh Pallipadi
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones

    Venkatesh Pallipadi
     
  • Restructure the delayed_work callback in ondemand.

    This eliminates the need for smp_processor_id in the callback function and
    also helps in proper locking and avoiding flush_workqueue when stopping the
    governor (done in subsequent patch).

    Signed-off-by: Venkatesh Pallipadi
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones

    Venkatesh Pallipadi
     
  • The hotplug CPU locking in cpufreq is horrendous. No-one seems to care
    enough to fix it, so just remove it so that the 99.9% of the real world
    users of this code can use cpufreq without being bothered by warnings.

    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones

    Dave Jones
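The "Restructure the delayed_work callback" commit above removes the callback's need for smp_processor_id(). A common way to do that is to embed the work item in the per-CPU state and recover that state from the work pointer with container_of. The sketch below models this in userspace under stated assumptions: struct cpu_dbs_info, its fields, and dbs_timer() are hypothetical stand-ins, and the work structs are simplified versions of the kernel's, not its real definitions.

```c
#include <stddef.h>

/* container_of as commonly defined: recover the enclosing struct from a
 * pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for the kernel's work structures. */
struct work_struct { void (*func)(struct work_struct *); };
struct delayed_work { struct work_struct work; long delay; };

/* Hypothetical per-CPU governor state that records its own CPU number. */
struct cpu_dbs_info {
    unsigned int cpu;
    unsigned int samples_taken;
    struct delayed_work work;
};

/* The callback locates its per-CPU data from the work pointer it is handed,
 * instead of asking which CPU it happens to be running on. */
static void dbs_timer(struct work_struct *w)
{
    struct cpu_dbs_info *info =
        container_of(w, struct cpu_dbs_info, work.work);

    info->samples_taken++;  /* info->cpu identifies the CPU being sampled */
}
```

Because the CPU number travels with the data, the callback stays correct even if it runs somewhere other than the CPU it samples.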
     

13 Dec, 2006

1 commit


22 Nov, 2006

1 commit


07 Nov, 2006

1 commit


21 Oct, 2006

1 commit


16 Oct, 2006

1 commit

  • Enable the ondemand governor and acpi-cpufreq to use the IA32_APERF and IA32_MPERF
    MSRs to get feedback on the active frequency over the last sampling interval.
    This lets ondemand make the right frequency decisions when hardware
    coordination of frequency is going on.

    Without APERF/MPERF, ondemand can make wrong decisions at times due
    to underlying hardware coordination or TM2.
    Example:
    * CPU 0 and CPU 1 are hardware coordinated.
    * CPU 1 is running at the highest frequency.
    * CPU 0 was running at the highest freq. Now ondemand reduces it to
    some intermediate frequency based on utilization.
    * Due to underlying hardware coordination with CPU 1, CPU 0 continues to
    run at the highest frequency (as long as CPU 1 is at the highest).
    * When ondemand samples CPU 0 again next time, without actual frequency
    feedback from APERF/MPERF, it will think that the previous frequency change
    was successful and can go to the wrong target frequency. This is because it
    thinks that the utilization it measured in this sampling interval was at the
    intermediate frequency, rather than at the actual highest frequency.

    More information about the IA32_APERF and IA32_MPERF MSRs:
    Refer to IA-32 Intel® Architecture Software Developer's Manual at
    http://developer.intel.com

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Venkatesh Pallipadi
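The scenario above is about measuring the frequency a CPU really ran at. This is a minimal sketch, assuming MPERF advances at the maximum (reference) frequency and APERF at the frequency actually delivered, so the ratio of their deltas scales the maximum. The struct perf_counters type and the measured_khz/demand_khz helpers are hypothetical names, not the acpi-cpufreq code.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters for one CPU. */
struct perf_counters {
    uint64_t aperf;  /* assumed to advance at the delivered frequency */
    uint64_t mperf;  /* assumed to advance at the maximum frequency */
};

/* Average frequency actually delivered over the interval, in kHz:
 * max_khz scaled by delta(APERF) / delta(MPERF). */
static unsigned int measured_khz(const struct perf_counters *prev,
                                 const struct perf_counters *now,
                                 unsigned int max_khz)
{
    uint64_t da = now->aperf - prev->aperf;
    uint64_t dm = now->mperf - prev->mperf;

    if (dm == 0)
        return max_khz;  /* nothing measured; arbitrary fallback for the sketch */
    return (unsigned int)((uint64_t)max_khz * da / dm);
}

/* Work demanded in the interval, expressed as a frequency in kHz:
 * the busy fraction times the frequency the CPU really ran at. */
static unsigned int demand_khz(unsigned int load_pct, unsigned int freq_khz)
{
    return (unsigned int)((uint64_t)load_pct * freq_khz / 100);
}
```

In the example above, if CPU 0 is 50% busy while ondemand believes it ran at 1.2GHz but it really stayed at 2.4GHz, demand_khz() gives 600000 from the assumed frequency but 1200000 from the measured one. The measured figure is the one that avoids the wrong target.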
     

06 Sep, 2006

1 commit


14 Aug, 2006

1 commit


12 Aug, 2006

2 commits

  • ondemand selects the minimum frequency that can retire
    a workload with negligible idle time -- ideally resulting in the highest
    performance/power efficiency with negligible performance impact.

    But on some systems and some workloads, this algorithm
    is more performance biased than necessary, and
    de-tuning it a bit to allow some performance impact
    can save measurable power.

    This patch adds a "powersave_bias" tunable to ondemand
    to allow it to reduce its target frequency by a specified percent.

    By default, the powersave_bias is 0 and has no effect.
    powersave_bias is in units of 0.1%, so it has an effective range
    of 1 through 1000, resulting in 0.1% to 100% impact.

    In practice, users will not be able to detect a difference between
    0.1% increments, but 1.0% increments turned out to be too large.
    Also, the max value of 1000 (100%) would simply peg the system
    in its deepest power saving P-state, unless the processor really has
    a hardware P-state at 0Hz:-)

    For example, if ondemand requests 2.0GHz based on utilization,
    and powersave_bias=100, this code will knock 10% off the target
    and seek a target of 1.8GHz instead of 2.0GHz until the
    next sampling. If 1.8GHz is an exact match with a hardware frequency,
    we use it; otherwise we split our time between the next frequency
    higher than 1.8GHz and the next lower one, so that the average is 1.8GHz.

    Note that a user or administrative program can change powersave_bias
    at run-time depending on how they expect the system to be used.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Alexey Starikovskiy
    Signed-off-by: Dave Jones

    Alexey Starikovskiy
     
  • Try to make dbs_check_cpu() call on all CPUs at the same jiffy.
    This will help when multiple cores share P-states via Hardware Coordination.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Alexey Starikovskiy
    Signed-off-by: Dave Jones

    Alexey Starikovskiy
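Both 12 Aug 2006 commits describe small pieces of arithmetic. This is a minimal sketch under stated assumptions: the frequency table, struct bias_plan, apply_powersave_bias() and aligned_delay() are hypothetical. The first helper reduces the request by powersave_bias (units of 0.1%) and splits one sampling period between the table entries on either side of the reduced target. The second computes a delay that lands on the next multiple of the sampling period, which is one way to make per-CPU samples fall on the same jiffy.

```c
#include <stddef.h>

/* Hypothetical frequency table in kHz, sorted ascending. */
static const unsigned int freq_table[] = { 1000000, 1600000, 2000000, 2400000 };
#define NFREQ (sizeof(freq_table) / sizeof(freq_table[0]))

/* How to spend one sampling period: freq_hi for jiffies_hi, then freq_lo. */
struct bias_plan {
    unsigned int freq_hi, freq_lo;
    unsigned int jiffies_hi, jiffies_lo;
};

/* Reduce the request by powersave_bias / 1000, then split the period so
 * that the time-weighted average frequency lands on the reduced target. */
static struct bias_plan apply_powersave_bias(unsigned int freq_req,
                                             unsigned int powersave_bias,
                                             unsigned int jiffies_total)
{
    struct bias_plan p;
    unsigned int freq_avg = freq_req -
        (unsigned int)((unsigned long long)freq_req * powersave_bias / 1000);
    size_t i;

    p.freq_lo = freq_table[0];          /* highest entry <= freq_avg */
    for (i = 0; i < NFREQ; i++)
        if (freq_table[i] <= freq_avg)
            p.freq_lo = freq_table[i];

    p.freq_hi = freq_table[NFREQ - 1];  /* lowest entry >= freq_avg */
    for (i = NFREQ; i-- > 0;)
        if (freq_table[i] >= freq_avg)
            p.freq_hi = freq_table[i];

    if (p.freq_hi == p.freq_lo) {       /* exact match or out of range */
        p.jiffies_hi = jiffies_total;
        p.jiffies_lo = 0;
        return p;
    }

    /* Share of the period at freq_hi, rounded to the nearest jiffy. */
    p.jiffies_hi = (unsigned int)(((unsigned long long)(freq_avg - p.freq_lo) *
                                   jiffies_total + (p.freq_hi - p.freq_lo) / 2) /
                                  (p.freq_hi - p.freq_lo));
    p.jiffies_lo = jiffies_total - p.jiffies_hi;
    return p;
}

/* Delay to the next multiple of the period, so per-CPU timers with the
 * same period tend to expire on the same jiffy. */
static unsigned long aligned_delay(unsigned long now_jiffies, unsigned long period)
{
    return period - now_jiffies % period;
}
```

With the 2.0GHz request, powersave_bias=100 and a 10-jiffy period, this yields 5 jiffies at 2.0GHz and 5 at 1.6GHz, averaging 1.8GHz. With powersave_bias=1000 the target is 0, so the result is the lowest table entry for the whole period.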
     

26 Jul, 2006

1 commit

  • The patch below moves the cpu hotplugging higher up in the cpufreq
    layering; this is needed to avoid recursive taking of the cpu hotplug
    lock and to otherwise detangle the mess.

    The new rules are:
    1. you must do lock_cpu_hotplug() around the following functions:
    __cpufreq_driver_target
    __cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
    __cpufreq_set_policy
    2. governor methods (.governor) must NOT take the lock_cpu_hotplug()
    lock in any way; they are called with the lock taken already
    3. if your governor spawns a thread that does things, like calling
    __cpufreq_driver_target, your thread must honor rule #1.
    4. the policy lock and other cpufreq internal locks nest within
    the lock_cpu_hotplug() lock.

    I'm not entirely happy about how the __cpufreq_governor rule ended up
    (conditional locking rule depending on the argument) but basically all
    callers pass this as a constant so it's not too horrible.

    The patch also removes the cpufreq_governor() function, since during the
    locking audit it turned out to be entirely unused (so no need to fix it).

    The patch works on my testbox, but it could use more testing
    (otoh... it can't be much worse than the current code)

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

24 Jul, 2006

1 commit


30 Jun, 2006

3 commits


23 Jun, 2006

1 commit

  • drivers/cpufreq/cpufreq_ondemand.c: In function 'do_dbs_timer':
    drivers/cpufreq/cpufreq_ondemand.c:374: warning: implicit declaration of function 'lock_cpu_hotplug'
    drivers/cpufreq/cpufreq_ondemand.c:381: warning: implicit declaration of function 'unlock_cpu_hotplug'
    drivers/cpufreq/cpufreq_conservative.c: In function 'do_dbs_timer':
    drivers/cpufreq/cpufreq_conservative.c:425: warning: implicit declaration of function 'lock_cpu_hotplug'
    drivers/cpufreq/cpufreq_conservative.c:432: warning: implicit declaration of function 'unlock_cpu_hotplug'

    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

22 Jun, 2006

1 commit

  • Root-caused the bug to a deadlock in cpufreq and ondemand, due to the lack of
    ordering between the cpu_hotplug lock and dbs_mutex. Basically, a race condition
    between cpu_down() and do_dbs_timer().

    cpu_down() flow:
    * cpu_down() call for CPU 1
    * Takes hot plug lock
    * Calls pre down notifier
    * cpufreq notifier handler calls cpufreq_driver_target(), which takes the
    cpu_hotplug lock again. This is OK, as the cpu_hotplug lock is recursive
    within the same process context
    * CPU 1 goes down
    * Calls post down notifier
    * cpufreq notifier handler calls ondemand event stop which takes dbs_mutex

    So, the cpu_hotplug lock is taken before dbs_mutex in this flow.

    do_dbs_timer is triggered by a periodic timer event.
    It first takes dbs_mutex and then takes the cpu_hotplug lock in
    cpufreq_driver_target().
    Note the reverse order here compared to the flow above. So, if this timer
    event happens at just the wrong moment during cpu_down, the system will deadlock.

    Attached patch fixes the issue for both ondemand and conservative.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones

    Venkatesh Pallipadi
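The deadlock above is a classic AB-BA lock-ordering problem, and the usual cure is to make every path take the two locks in the same order. This is a minimal pthreads sketch of that idea, not the patch itself: the lock and function names are hypothetical stand-ins, the kernel's recursive re-take of the hotplug lock inside cpufreq_driver_target() is not modelled, and it assumes the timer path is fixed by taking the hotplug lock before dbs_mutex.

```c
#include <pthread.h>

/* Userspace stand-ins for the kernel's cpu_hotplug lock and dbs_mutex. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dbs_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long events;  /* updated only with both locks held */

/* Models the cpu_down() flow: hotplug lock, then dbs_mutex in the notifier. */
static void *cpu_down_path(void *arg)
{
    unsigned long n = *(const unsigned long *)arg;

    for (unsigned long i = 0; i < n; i++) {
        pthread_mutex_lock(&hotplug_lock);
        pthread_mutex_lock(&dbs_mutex);
        events++;
        pthread_mutex_unlock(&dbs_mutex);
        pthread_mutex_unlock(&hotplug_lock);
    }
    return NULL;
}

/* Models a fixed do_dbs_timer: hotplug lock first, then dbs_mutex, the same
 * order as above. Taking dbs_mutex first (the old order) would let each path
 * hold one lock while waiting forever for the other. */
static void *dbs_timer_path(void *arg)
{
    unsigned long n = *(const unsigned long *)arg;

    for (unsigned long i = 0; i < n; i++) {
        pthread_mutex_lock(&hotplug_lock);
        pthread_mutex_lock(&dbs_mutex);
        events++;
        pthread_mutex_unlock(&dbs_mutex);
        pthread_mutex_unlock(&hotplug_lock);
    }
    return NULL;
}

/* Runs both paths concurrently and returns how many sections completed. */
static unsigned long run_lock_order_demo(unsigned long iterations)
{
    pthread_t down, timer;

    events = 0;
    pthread_create(&down, NULL, cpu_down_path, &iterations);
    pthread_create(&timer, NULL, dbs_timer_path, &iterations);
    pthread_join(down, NULL);
    pthread_join(timer, NULL);
    return events;
}
```

Because both threads acquire hotplug_lock first, neither can hold dbs_mutex while waiting for hotplug_lock, so the demo always completes.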
     

09 May, 2006

1 commit

  • Taking the cpu hotplug semaphore in a normal events workqueue
    is unsafe because other tasks can wait for any workqueue while
    holding it. This results in a deadlock.

    Move the DBS timer into its own work queue which is not
    affected by other work queue flushes to avoid this.

    Has been acked by Venkatesh.

    Cc: venkatesh.pallipadi@intel.com
    Cc: cpufreq@lists.linux.org.uk
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

26 Mar, 2006

3 commits


28 Feb, 2006

1 commit


19 Jan, 2006

1 commit


01 Dec, 2005

1 commit

  • The use of the 'ignore_nice' sysfs file is confusing to anyone using it.
    This removes the sysfs file 'ignore_nice' and in its place creates an
    'ignore_nice_load' entry that defaults to '0', meaning nice'd processes
    _are_ counted towards the 'busyness' calculation.

    WARNING: this obviously breaks any userland tools that expected 'ignore_nice'
    to exist. To draw attention to this fact, it was concluded on the mailing
    list that the entry should be removed altogether, so that userland apps break
    and their authors can build a simple-to-detect workaround. Having said that,
    very few tools currently seem to make use of this functionality; all
    I could find was a Gentoo Wiki entry.

    Signed-off-by: Alexander Clouter
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones

    Alexander Clouter
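The effect of ignore_nice_load on the load figure can be sketched in a few lines. This is a minimal model under stated assumptions: struct cpu_times and busy_pct() are hypothetical, and the inputs are assumed to be per-interval deltas of user, nice, system and idle time in jiffies. With the default of 0, nice time counts as busy; with 1 it is folded into idle.

```c
#include <stdint.h>

/* Hypothetical per-interval CPU time deltas, e.g. in jiffies. */
struct cpu_times {
    uint64_t user, nice, system, idle;
};

/* Busy percentage for one interval. ignore_nice_load == 0 (the default)
 * counts nice time as busy; ignore_nice_load == 1 treats it as idle. */
static unsigned int busy_pct(const struct cpu_times *d, int ignore_nice_load)
{
    uint64_t total = d->user + d->nice + d->system + d->idle;
    uint64_t idle = d->idle + (ignore_nice_load ? d->nice : 0);

    if (total == 0)
        return 0;
    return (unsigned int)(100 * (total - idle) / total);
}
```

For an interval with 20 user, 50 nice, 10 system and 20 idle jiffies, the load is 80% by default and 30% with ignore_nice_load set. A long-running niced job therefore keeps the frequency up only when the entry is left at 0.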
     

21 Sep, 2005

1 commit

  • The problem is that the ondemand governor periodically measures the CPU
    usage. This CPU usage is updated by the scheduler after every
    tick (basically, by adding 1 either to "idle" or to "user" or to
    "system"). So if the governor samples too frequently, the stats
    will be meaningless (as mostly no numbers will have changed).

    So this patch checks that the measurements are separated by at least 10
    ticks. It means that by default, stats will have about 5% error (20
    ticks). Of course those numbers can be argued but, IMHO, they look sane.
    The patch also includes a small clean-up to check more explicitly for the
    result of the conversion from ns to µs being zero.

    Note that (on x86) this was never really needed before 2.6.13,
    because HZ was always 1000. Now that HZ can be 100, some CPUs might be
    affected by this problem. For instance, when HZ=100, the Centrino, which
    has a 10µs transition latency, would lead the governor to read the stats
    every tick (10ms)!

    Signed-off-by: Eric Piel
    Signed-off-by: Dave Jones

    Dave Jones
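The 21 Sep 2005 fix amounts to a floor on the sampling period. This is a minimal sketch under stated assumptions. The factor of 1000 between transition latency and sampling period is inferred from the Centrino example (10µs latency giving one 10ms tick at HZ=100), and LATENCY_MULTIPLIER, MIN_STAT_TICKS and sampling_rate_us() are hypothetical names. Since tick-based counters are off by about one tick per measurement, a period of N ticks carries roughly 1/N error, which is where the 5% at 20 ticks comes from.

```c
/* Hypothetical constants for this sketch. */
#define LATENCY_MULTIPLIER 1000u /* sampling period = latency x this factor */
#define MIN_STAT_TICKS 10u       /* at least 10 ticks between measurements */

/* Pick a sampling period in microseconds from the driver's transition
 * latency (in nanoseconds), never shorter than MIN_STAT_TICKS ticks. */
static unsigned int sampling_rate_us(unsigned int latency_ns, unsigned int hz)
{
    unsigned int latency_us = latency_ns / 1000;
    unsigned int tick_us = 1000000u / hz;
    unsigned int rate, min_rate;

    if (latency_us == 0)  /* sub-microsecond latency converts to zero */
        latency_us = 1;
    rate = latency_us * LATENCY_MULTIPLIER;
    min_rate = MIN_STAT_TICKS * tick_us;
    return rate < min_rate ? min_rate : rate;
}
```

For the Centrino case at HZ=100, the unclamped period would be 10000µs (one tick). The floor raises it to 100000µs. At HZ=1000 the same CPU already gets 10 ticks, so nothing changes.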
     

01 Jun, 2005

8 commits