23 Jul, 2012

2 commits

  • Pull timer core changes from Ingo Molnar:
    "Continued cleanups of the core time and NTP code, plus more nohz work
    preparing for tick-less userspace execution."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Rework timekeeping functions to take timekeeper ptr as argument
    time: Move xtime_nsec adjustment underflow handling timekeeping_adjust
    time: Move arch_gettimeoffset() usage into timekeeping_get_ns()
    time: Refactor accumulation of nsecs to secs
    time: Condense timekeeper.xtime into xtime_sec
    time: Explicitly use u32 instead of int for shift values
    time: Whitespace cleanups per Ingo's requests
    nohz: Move next idle expiry time record into idle logic area
    nohz: Move ts->idle_calls incrementation into strict idle logic
    nohz: Rename ts->idle_tick to ts->last_tick
    nohz: Make nohz API agnostic against idle ticks cputime accounting
    nohz: Separate idle sleeping time accounting from nohz logic
    timers: Improve get_next_timer_interrupt()
    timers: Add accounting of non deferrable timers
    timers: Consolidate base->next_timer update
    timers: Create detach_if_pending() and use it

    Linus Torvalds
     
  • Pull RCU changes from Ingo Molnar:
    "Quoting from Paul, the major features of this series are:

    1. Preventing latency spikes of more than 200 microseconds for
    kernels built with NR_CPUS=4096, which is reportedly becoming the
    default for some distros. This is a first step, as it does not
    help with systems that actually -have- 4096 CPUs (work on this case
    is in progress, but is not yet ready for mainline).

    This category also includes improving concurrency of rcu_barrier(),
    placed here due to conflicts. Posted to LKML at:

    https://lkml.org/lkml/2012/6/22/381

    Note that patches 18-22 of that series have been deferred to 3.7, as
    they have not yet proven themselves to be mainline-ready (and yes,
    these are the ones intended to get rid of RCU's latency spikes for
    systems that actually have 4096 CPUs).

    2. Updates to documentation and rcutorture fixes, the latter category
    including improvements to rcu_barrier() testing. Posted to LKML at

    http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/04094.html.

    3. Miscellaneous fixes posted to LKML at:

    https://lkml.org/lkml/2012/6/22/500

    with the exception of the last commit, which was posted here:

    http://www.gossamer-threads.com/lists/linux/kernel/1561830

    4. RCU_FAST_NO_HZ fixes and improvements. Posted to LKML at:

    http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/00006.html
    http://www.gossamer-threads.com/lists/linux/kernel/1561833

    The first four patches of the first series went into 3.5 to fix a
    regression.

    5. Code-style fixes. These were posted to LKML at

    http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01180.html
    http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01181.html"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
    rcu: Fix broken strings in RCU's source code.
    rcu: Fix code-style issues involving "else"
    rcu: Introduce check for callback list/count mismatch
    rcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter
    rcu: Fix qlen_lazy breakage
    rcu: Round FAST_NO_HZ lazy timeout to nearest second
    rcu: The rcu_needs_cpu() function is not a quiescent state
    rcu: Dump only the current CPU's buffers for idle-entry/exit warnings
    rcu: Add check for CPUs going offline with callbacks queued
    rcu: Disable preemption in rcu_blocking_is_gp()
    rcu: Prevent uninitialized string in RCU CPU stall info
    rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU
    rcu: Split RCU core processing out of __call_rcu()
    rcu: Prevent __call_rcu() from invoking RCU core on offline CPUs
    rcu: Make __call_rcu() handle invocation from idle
    rcu: Remove function versions of __kfree_rcu and __is_kfree_rcu_offset
    rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations
    rcu: Remove return value from rcu_assign_pointer()
    key: Remove extraneous parentheses from rcu_assign_keypointer()
    rcu: Remove return value from RCU_INIT_POINTER()
    ...

    Linus Torvalds
     

19 Jul, 2012

1 commit


18 Jul, 2012

1 commit


17 Jul, 2012

1 commit

  • The leap second rework unearthed another issue of inconsistent data.

    On timekeeping_resume() the timekeeper data is updated, but nothing
    calls timekeeping_update(), so now the update code in the timer
    interrupt sees stale values.

    This has been the case before those changes, but then the timer
    interrupt was using stale data as well so this went unnoticed for quite
    some time.

    Add the missing update call, so all the data is consistent everywhere.

    Reported-by: Andreas Schwab
    Reported-and-tested-by: "Rafael J. Wysocki"
    Reported-and-tested-by: Martin Steigerwald
    Cc: LKML
    Cc: Linux PM list
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

15 Jul, 2012

10 commits

  • As part of cleaning up the timekeeping code, this patch converts
    a number of internal functions to take a timekeeper ptr as an
    argument, so that the internal functions don't access the global
    timekeeper structure directly. This allows for further optimizations
    to reduce lock hold time later.

    This patch has been updated to include more consistent usage of the
    timekeeper value, by making sure it is always passed as an argument
    to non top-level functions.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-9-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • When we make adjustments speeding up the clock, it's possible
    for xtime_nsec to underflow. We already handle this properly,
    but we do so from update_wall_time() instead of the more logical
    timekeeping_adjust(), where the possible underflow actually
    occurs.

    Thus, move the correction logic to timekeeping_adjust(), which
    is the function that causes the issue, making update_wall_time()
    more readable.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-8-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
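
    A minimal userspace sketch of the kind of underflow correction the
    entry above describes. It assumes the correction clamps a negative
    xtime_nsec to zero and folds the shortfall into an NTP error term; the
    entry itself only says the correction logic moved. The Timekeeper class,
    its fields, and fix_underflow() are illustrative names, not kernel code.

```python
from dataclasses import dataclass


@dataclass
class Timekeeper:
    """Hypothetical, heavily reduced model of the timekeeper state."""
    xtime_nsec: int = 0        # shifted nanoseconds; may go negative after an adjustment
    ntp_error: int = 0         # accumulated error, in a finer-grained unit
    ntp_error_shift: int = 0   # converts xtime_nsec units to ntp_error units


def fix_underflow(tk: Timekeeper) -> None:
    """Clamp a negative xtime_nsec to zero and record the shortfall as error."""
    if tk.xtime_nsec < 0:
        neg = -tk.xtime_nsec
        tk.xtime_nsec = 0
        tk.ntp_error += neg << tk.ntp_error_shift
```

    Keeping this check next to the adjustment that can cause the underflow
    is the readability point the entry makes.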
     
  • Since we call arch_gettimeoffset() in all the accessor
    functions, move arch_gettimeoffset() calls into
    timekeeping_get_ns() and timekeeping_get_ns_raw() to simplify
    the code.

    This also makes the code easier to maintain as we don't have to
    worry about forgetting the arch_gettimeoffset() as has happened
    in the past.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-7-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • We do the exact same logic moving nsecs to secs in the
    timekeeper in multiple places, so condense this into a
    single function.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-6-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
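
    The condensed helper might look roughly like the sketch below. It
    assumes the sub-second part is stored as nanoseconds shifted left by
    `shift`, as a neighbouring entry describes; the class and the function
    name accumulate_nsecs_to_secs() are illustrative, and leap second
    handling is deliberately left out.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass
class Timekeeper:
    xtime_sec: int    # whole seconds
    xtime_nsec: int   # sub-second part, stored as nanoseconds << shift
    shift: int


def accumulate_nsecs_to_secs(tk: Timekeeper) -> int:
    """Carry whole seconds out of xtime_nsec; return how many were carried."""
    nsec_per_sec_shifted = NSEC_PER_SEC << tk.shift
    carried = 0
    while tk.xtime_nsec >= nsec_per_sec_shifted:
        tk.xtime_nsec -= nsec_per_sec_shifted
        tk.xtime_sec += 1
        carried += 1
    return carried
```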
     
  • The timekeeper struct has an xtime_nsec, which keeps the
    sub-nanosecond remainder. This ends up being somewhat
    duplicative of the timekeeper.xtime.tv_nsec value, and we
    have to do extra work to keep them apart, copying the full
    nsec portion out and back in over and over.

    This patch simplifies some of the logic by taking the timekeeper
    xtime value and splitting it into timekeeper.xtime_sec and
    reuses the timekeeper.xtime_nsec for the sub-second portion
    (stored in higher res shifted nanoseconds).

    This simplifies some of the accumulation logic and will
    allow for more accurate timekeeping once the vsyscall code
    is updated to use the shifted nanosecond remainder.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
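
    To see why keeping the shifted remainder matters, here is a small
    sketch. It assumes the usual clocksource conversion
    ns = (cycles * mult) >> shift; the mult and shift values used in the
    comments are made up so that one cycle equals 1.5 ns, and both function
    names are hypothetical.

```python
def accumulate_truncating(steps: int, cycles_per_step: int,
                          mult: int, shift: int) -> int:
    """Convert to whole nanoseconds at every step, dropping the fraction."""
    nsec = 0
    for _ in range(steps):
        nsec += (cycles_per_step * mult) >> shift
    return nsec


def accumulate_shifted(steps: int, cycles_per_step: int,
                       mult: int, shift: int) -> int:
    """Accumulate in shifted nanoseconds and convert once at the end."""
    shifted_nsec = 0
    for _ in range(steps):
        shifted_nsec += cycles_per_step * mult
    return shifted_nsec >> shift
```

    With mult=384 and shift=8, four one-cycle steps give 4 ns when
    truncating each step and 6 ns when the remainder is carried along.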
     
  • Ingo noted that using a u32 instead of int for shift values
    would be better to make sure the compiler doesn't unnecessarily
    use complex signed arithmetic.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • Ingo noted a number of places where there is inconsistent
    use of whitespace. This patch tries to address the main
    culprits.

    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1342156917-25092-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • Reason: Update to upstream changes to avoid further conflicts.
    Fixup a trivial merge conflict in kernel/time/tick-sched.c

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I
    introduced a bug that kept the STA_INS or STA_DEL bit
    from being cleared from time_status via adjtimex()
    without forcing STA_PLL first.

    Usually once the STA_INS is set, it isn't cleared
    until the leap second is applied, so it's unlikely this
    affected anyone. However during testing I noticed it
    took some effort to cancel a leap second once STA_INS
    was set.

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    CC: stable@vger.kernel.org # 3.4
    Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • …t-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull RCU, perf, and scheduler fixes from Ingo Molnar.

    The RCU fix is a revert for an optimization that could cause deadlocks.

    One of the scheduler commits (164c33c6adee "sched: Fix fork() error path
    to not crash") is correct but not complete (some architectures like Tile
    are not covered yet) - the resulting additional fixes are still WIP and
    Ingo did not want to delay these pending fixes. See this thread on
    lkml:

    [PATCH] fork: fix error handling in dup_task()

    The perf fixes are just trivial oneliners.

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf kvm: Fix segfault with report and mixed guestmount use
    perf kvm: Fix regression with guest machine creation
    perf script: Fix format regression due to libtraceevent merge
    ring-buffer: Fix accounting of entries when removing pages
    ring-buffer: Fix crash due to uninitialized new_pages list head

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    MAINTAINERS/sched: Update scheduler file pattern
    sched/nohz: Rewrite and fix load-avg computation -- again
    sched: Fix fork() error path to not crash

    Linus Torvalds
     

12 Jul, 2012

3 commits

  • To finally fix the infamous leap second issue and other race windows
    caused by functions which change the offsets between the various time
    bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
    function which atomically gets the current monotonic time and updates
    the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
    overhead. The previous patch which provides ktime_t offsets allows us
    to make this function almost as cheap as ktime_get() which is going to
    be replaced in hrtimer_interrupt().

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Signed-off-by: John Stultz
    Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • We need to update the hrtimer clock offsets from the hrtimer interrupt
    context. To avoid conversions from timespec to ktime_t maintain a
    ktime_t based representation of those offsets in the timekeeper. This
    puts the conversion overhead into the code which updates the
    underlying offsets and provides fast accessible values in the hrtimer
    interrupt.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The timekeeping code misses an update of the hrtimer subsystem after a
    leap second happened. Due to that timers based on CLOCK_REALTIME are
    either expiring a second early or late depending on whether a leap
    second has been inserted or deleted until an operation is initiated
    which causes that update. Unless the update happens by some other
    means this discrepancy between the timekeeping and the hrtimer data
    stays forever and timers are expired either early or late.

    The reported immediate workaround - $ date -s "`date`" - is causing a
    call to clock_was_set() which updates the hrtimer data structures.
    See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

    Add the missing clock_was_set() call to update_wall_time() in case of
    a leap second event. The actual update is deferred to softirq context
    as the necessary smp function call cannot be invoked from hard
    interrupt context.

    Signed-off-by: John Stultz
    Reported-by: Jan Engelhardt
    Reviewed-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
    Signed-off-by: Thomas Gleixner

    John Stultz
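
    A toy model of the stale-offset problem described above, under the
    assumption that the hrtimer side caches the CLOCK_REALTIME offset and
    only refreshes it when notified. HrtimerBase, Timekeeping and the
    notify flag are hypothetical; notify=True stands in for the added
    clock_was_set() call.

```python
NSEC_PER_SEC = 1_000_000_000


class HrtimerBase:
    """Hypothetical stand-in for the hrtimer view of CLOCK_REALTIME."""

    def __init__(self, offset_ns: int) -> None:
        self.offset_ns = offset_ns   # cached (realtime - monotonic) offset

    def expired(self, mono_now_ns: int, expiry_realtime_ns: int) -> bool:
        return mono_now_ns + self.offset_ns >= expiry_realtime_ns


class Timekeeping:
    def __init__(self, offset_ns: int, base: HrtimerBase) -> None:
        self.offset_ns = offset_ns   # authoritative offset
        self.base = base

    def insert_leap_second(self, notify: bool) -> None:
        # An inserted leap second steps CLOCK_REALTIME back by one second.
        self.offset_ns -= NSEC_PER_SEC
        if notify:
            # Stands in for the clock_was_set() call the commit adds.
            self.base.offset_ns = self.offset_ns
```

    Without the notification the cached offset stays one second too
    large, so a CLOCK_REALTIME timer fires one second early after an
    inserted leap second.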
     

06 Jul, 2012

1 commit

  • Thanks to Charles Wang for spotting the defects in the current code:

    - If we go idle during the sample window -- after sampling, we get a
    negative bias because we can negate our own sample.

    - If we wake up during the sample window we get a positive bias
    because we push the sample to a known active period.

    So rewrite the entire nohz load-avg muck once again, now adding
    copious documentation to the code.

    Reported-and-tested-by: Doug Smythies
    Reported-and-tested-by: Charles Wang
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
    [ minor edits ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jul, 2012

1 commit

  • If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
    also disable itself. This commit therefore checks for tick_nohz_enabled
    being zero, disabling rcu_prepare_for_idle() if so. This commit assumes
    that tick_nohz_enabled can change at runtime: If this is not the case,
    then a simpler approach suffices.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

16 Jun, 2012

1 commit

  • Pull core updates (RCU and locking) from Ingo Molnar:
    "Most of the diffstat comes from the RCU slow boot regression fixes,
    but there are also debuggability improvements/fixes."

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    memblock: Document memblock_is_region_{memory,reserved}()
    rcu: Precompute RCU_FAST_NO_HZ timer offsets
    rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
    rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
    rcu: RCU_FAST_NO_HZ detection of callback adoption
    spinlock: Indicate that a lockup is only suspected
    kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
    panic: Make panic_on_oops configurable

    Linus Torvalds
     

12 Jun, 2012

5 commits

  • The next idle expiry time record and idle sleeps tracking are
    statistics that only concern idle.

    Since we want the nohz APIs to become usable beyond idle
    context, let's pull up the handling of these statistics to the
    callers in idle.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Since we want to prepare for making the nohz API work beyond
    the idle case, we need to pull the ts->idle_calls increment up to
    the callers in idle.

    To perform this, we split tick_nohz_stop_sched_tick() in two parts:
    a first one that checks if we can really stop the tick for idle,
    and another that actually stops it. Then from the callers in idle,
    we check if we can stop the tick and only then we increment idle_calls
    and finally relay to the nohz API that won't care about these details
    anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner

    Frederic Weisbecker
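
    The split described above can be sketched as a check step, an
    idle-only statistics step, and a generic stop step. The conditions in
    can_stop_idle_tick() are placeholders, since the entry does not list
    the real ones, and every name below except idle_calls is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TickSched:
    idle_calls: int = 0
    tick_stopped: bool = False


def can_stop_idle_tick(cpu_online: bool, need_resched: bool) -> bool:
    """Placeholder checks; the real conditions are not given in the entry."""
    return cpu_online and not need_resched


def stop_tick(ts: TickSched) -> None:
    """Generic part: stops the tick and knows nothing about idle statistics."""
    ts.tick_stopped = True


def idle_enter(ts: TickSched, cpu_online: bool, need_resched: bool) -> None:
    """Idle caller: check first, count the call, then hand over to stop_tick()."""
    if can_stop_idle_tick(cpu_online, need_resched):
        ts.idle_calls += 1
        stop_tick(ts)
```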
     
  • Now that the idle and nohz logic are going to be independent of each other,
    ts->idle_tick becomes too biased a name to describe the field that
    saves the last scheduled tick on top of which we re-calculate the next
    tick to schedule when the timer is restarted.

    We want to reuse this even to stop the tick outside idle cases. So let's
    rename it to some more generic name: ts->last_tick.

    This slightly changes the timer list stat export, so we need to increase
    its version.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • When the timer tick fires, it accounts the new jiffy as either part
    of system, user or idle time. This is how we record the cputime
    statistics.

    But when the tick is stopped from the idle task, we still need
    to record the number of jiffies spent tickless until we restart
    the tick and fall back to traditional tick-based cputime accounting.

    To do this, we take a snapshot of jiffies when the tick is stopped
    and compute the difference against the new value of jiffies when
    the tick is restarted. Then we account this whole difference to
    the idle cputime.

    However we are preparing to be able to stop the tick from other places
    than idle. So this idle time accounting needs to be performed from
    the callers of nohz APIs, not from the nohz APIs themselves because
    we now want them to be agnostic about the places that stop/restart the tick.

    Therefore, we pull the tickless idle time accounting out of generic
    nohz helpers up to idle entry/exit callers.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner

    Frederic Weisbecker
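
    The snapshot-and-difference accounting described above, as a small
    sketch. It assumes a 32-bit jiffies counter so that wraparound can be
    shown; the class and method names are illustrative only.

```python
JIFFIES_MASK = 0xFFFFFFFF  # assumes a 32-bit jiffies counter for this sketch


class IdleCputime:
    def __init__(self) -> None:
        self.idle_jiffies = 0    # snapshot taken when the tick is stopped
        self.idle_cputime = 0    # total jiffies accounted as idle

    def idle_enter(self, jiffies: int) -> None:
        self.idle_jiffies = jiffies & JIFFIES_MASK

    def idle_exit(self, jiffies: int) -> None:
        # Modular subtraction keeps the difference right across counter wrap.
        self.idle_cputime += (jiffies - self.idle_jiffies) & JIFFIES_MASK
```

    In this model the two methods belong to the idle entry/exit callers,
    which is where the entry moves the accounting.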
     
  • As we plan to be able to stop the tick outside the idle task, we
    need to prepare for separating nohz logic from idle. As a start,
    this pulls the idle sleeping time accounting out of the tick
    stop/restart API to the callers on idle entry/exit.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

11 Jun, 2012

1 commit

  • …ck/linux-rcu into core/urgent

    Merge RCU fixes from Paul E. McKenney:

    " This series has four patches, the major point of which is to eliminate
    some slowdowns (including boot-time slowdowns) resulting from some
    RCU_FAST_NO_HZ changes. The issue with the changes is that posting timers
    from the idle loop has no effect if the CPU has entered dyntick-idle
    mode because the CPU has already computed its wakeup time, and posting
    a timer does not cause it to be recomputed. The short-term fix is for
    RCU to precompute the timeout value so that the CPU's calculation is
    correct. "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

09 Jun, 2012

1 commit


07 Jun, 2012

1 commit

  • When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
    calls rcu_needs_cpu() to see if RCU needs that CPU, and, if not, computes the
    next wakeup time based on the timer wheels. Only later, when actually
    entering the idle loop, rcu_prepare_for_idle() will be invoked. In some
    cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
    But all for naught: The next wakeup time for the CPU has already been
    computed, and posting a timer afterwards does not force that wakeup
    time to be recomputed. This means that rcu_prepare_for_idle()'s timers
    have no effect.

    This is not a problem on a busy system because something else will wake
    up the CPU soon enough. However, on lightly loaded systems, the CPU
    might stay asleep for a considerable length of time. If that CPU has
    a callback that the rest of the system is waiting on, the system might
    run very slowly or (in theory) even hang.

    This commit avoids this problem by having rcu_needs_cpu() give
    tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
    to wake back up, which tick_nohz_stop_sched_tick() takes into account
    when programming the CPU's wakeup time. An alternative approach is
    for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
    but timers are much more efficient than are hrtimers for frequently
    and repeatedly posting and cancelling a given timer, which is exactly
    what RCU_FAST_NO_HZ does.

    Reported-by: Pascal Chapperon
    Reported-by: Heiko Carstens
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
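
    A sketch of how the wakeup time could take the RCU estimate into
    account, assuming the estimate arrives as a jiffies delta (or None
    when RCU has nothing pending). The function and parameter names are
    hypothetical.

```python
from typing import Optional


def next_wakeup_jiffies(last_jiffies: int,
                        next_timer_jiffies: int,
                        rcu_delta_jiffies: Optional[int]) -> int:
    """Return the earlier of the timer wheel's next expiry and RCU's deadline."""
    if rcu_delta_jiffies is None:
        return next_timer_jiffies
    return min(next_timer_jiffies, last_jiffies + rcu_delta_jiffies)
```

    Because the RCU deadline is known before the wakeup is programmed, no
    timer has to be posted after the fact.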
     

06 Jun, 2012

1 commit

  • Pull scheduler fixes from Ingo Molnar.

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Remove NULL assignment of dattr_cur
    sched: Remove the last NULL entry from sched_feat_names
    sched: Make sched_feat_names const
    sched/rt: Fix SCHED_RR across cgroups
    sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'
    sched: Make sure to not re-read variables after validation
    sched: Fix SD_OVERLAP
    sched: Don't try allocating memory from offline nodes
    sched/nohz: Fix rq->cpu_load calculations some more
    sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask

    Linus Torvalds
     

05 Jun, 2012

1 commit

  • Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
    leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
    wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

    Adjust wall_to_monotonic when NTP inserted a leapsecond.

    Reported-by: Richard Cochran
    Signed-off-by: John Stultz
    Tested-by: Richard Cochran
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
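
    The invariant behind this fix can be shown in a few lines, assuming
    CLOCK_MONOTONIC is computed as wall time plus wall_to_monotonic and
    that an inserted leap second is represented as leap = -1. The Clock
    class and apply_leap() are illustrative, with seconds only.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    xtime_sec: int              # CLOCK_REALTIME seconds
    wall_to_monotonic_sec: int  # offset added to wall time to get monotonic

    def monotonic(self) -> int:
        return self.xtime_sec + self.wall_to_monotonic_sec


def apply_leap(clock: Clock, leap: int) -> None:
    """Step wall time by `leap` seconds while leaving monotonic time unchanged."""
    clock.xtime_sec += leap
    clock.wall_to_monotonic_sec -= leap
```

    Dropping the second assignment in apply_leap() reproduces the
    discontinuity the entry describes.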
     

30 May, 2012

1 commit

  • Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
    calculations") since while that fixed the busy case it regressed the
    mostly idle case.

    Add a callback from the nohz exit to also age the rq->cpu_load[]
    array. This closes the hole where either there was no nohz load
    balance pass during the nohz, or there was a 'significant' amount of
    idle time between the last nohz balance and the nohz exit.

    So we'll update unconditionally from the tick to not insert any
    accidental 0 load periods while busy, and we try and catch up from
    nohz idle balance and nohz exit. Both these are still prone to missing
    a jiffy, but that has always been the case.

    Signed-off-by: Peter Zijlstra
    Cc: pjt@google.com
    Cc: Venkatesh Pallipadi
    Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 May, 2012

4 commits

  • commit 5307c95 (tick: Add tick skew boot option) broke the
    !CONFIG_HIGH_RES_TIMERS build.

    Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section.

    Reported-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: Mike Galbraith

    Thomas Gleixner
     
  • Make clockevents_config() into a global symbol to allow it to be used
    by compiled-in clockevent drivers. This is needed by drivers that want
    to update the timer frequency after registration time.

    Signed-off-by: Magnus Damm
    Tested-by: Simon Horman
    Cc: arnd@arndb.de
    Cc: johnstul@us.ibm.com
    Cc: rjw@sisk.pl
    Cc: lethal@linux-sh.org
    Cc: gregkh@linuxfoundation.org
    Cc: olof@lixom.net
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520
    Signed-off-by: Thomas Gleixner

    Magnus Damm
     
  • Let the user decide whether power consumption or jitter is the
    more important consideration for their machines.

    Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867:

    "Historically, Linux has tried to make the regular timer tick on the
    various CPUs not happen at the same time, to avoid contention on
    xtime_lock.

    Nowadays, with the tickless kernel, this contention no longer happens
    since time keeping and updating are done differently. In addition,
    this skew is actually hurting power consumption in a measurable way on
    many-core systems."

    Problems:

    - Contrary to the above, systems do encounter contention on both
    xtime_lock and RCU structure locks when the tick is synchronized.

    - Moderate sized RT systems suffer intolerable jitter due to the tick
    being synchronized.

    - SGI reports the same for their large systems.

    - Fully utilized systems reap no power saving benefit from skew removal,
    but do suffer from resulting induced lock contention.

    - 0209f649 rcu: limit rcu_node leaf-level fanout
    This patch was born to combat lock contention which testing showed
    to have been _induced by_ skew removal. Skew the tick, contention
    disappeared virtually completely.

    Signed-off-by: Mike Galbraith
    Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
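
    One way to skew the tick is to delay each CPU by an equal slice of
    half a tick period, so that no two CPUs tick at the same instant. The
    sketch below assumes exactly that scheme; the entry above does not
    spell out the formula, and the function name is hypothetical.

```python
def tick_skew_offset_ns(cpu: int, num_cpus: int, tick_period_ns: int) -> int:
    """Per-CPU tick delay: cpu * (half a tick period / number of CPUs)."""
    if not 0 <= cpu < num_cpus:
        raise ValueError("cpu out of range")
    return (tick_period_ns // 2 // num_cpus) * cpu
```

    With a 1 ms tick and four CPUs this spreads the ticks 125 us apart,
    trading some power savings for less lock contention and jitter.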
     
  • Pull timer updates from Thomas Gleixner.

    Various trivial conflict fixups in arch Kconfig due to addition of
    unrelated entries nearby. And one slightly more subtle one for sparc32
    (new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas.

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    timekeeping: Fix a few minor newline issues.
    time: remove obsolete declaration
    ntp: Fix a stale comment and a few stray newlines.
    ntp: Correct TAI offset during leap second
    timers: Fixup the Kconfig consolidation fallout
    x86: Use generic time config
    unicore32: Use generic time config
    um: Use generic time config
    tile: Use generic time config
    sparc: Use: generic time config
    sh: Use generic time config
    score: Use generic time config
    s390: Use generic time config
    openrisc: Use generic time config
    powerpc: Use generic time config
    mn10300: Use generic time config
    mips: Use generic time config
    microblaze: Use generic time config
    m68k: Use generic time config
    m32r: Use generic time config
    ...

    Linus Torvalds
     

22 May, 2012

4 commits