02 Oct, 2013

1 commit

  • commit 7bd36014460f793c19e7d6c94dab67b0afcfcb7f upstream.

    Gerlando Falauto reported that when HRTICK is enabled, it is
    possible to trigger system deadlocks. These were hard to
    reproduce, as HRTICK has been broken in the past, but seemed
    to be connected to the timekeeping_seq lock.

    Since seqlock/seqcount's aren't supported w/ lockdep, I added
    some extra spinlock based locking and triggered the following
    lockdep output:

    [ 15.849182] ntpd/4062 is trying to acquire lock:
    [ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [] __queue_work+0x145/0x480
    [ 15.850051]
    [ 15.850051] but task is already holding lock:
    [ 15.850051] (timekeeper_lock){-.-.-.}, at: [] do_adjtimex+0x7f/0x100

    [ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
    [ 15.850051] Possible unsafe locking scenario:
    [ 15.850051]
    [ 15.850051]  CPU0                         CPU1
    [ 15.850051]  ----                         ----
    [ 15.850051]  lock(timekeeper_lock);
    [ 15.850051]                               lock(&p->pi_lock);
    [ 15.850051]                               lock(timekeeper_lock);
    [ 15.850051]  lock(&(&pool->lock)->rlock);
    [ 15.850051]
    [ 15.850051] *** DEADLOCK ***

    The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping:
    Hold timekeepering locks in do_adjtimex and hardpps") in 3.10.

    This patch avoids this deadlock, by moving the call to
    schedule_delayed_work() outside of the timekeeper lock
    critical section.
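
    In code terms the fix follows this pattern (a simplified sketch, not
    the exact upstream diff; the seqcount and work item names follow the
    3.10-era timekeeping/NTP code):

    raw_spin_lock_irqsave(&timekeeper_lock, flags);
    write_seqcount_begin(&timekeeper_seq);
    /* ... update timekeeping/NTP state ... */
    write_seqcount_end(&timekeeper_seq);
    raw_spin_unlock_irqrestore(&timekeeper_lock, flags);

    /* Only now, with timekeeper_lock dropped, is it safe to take the
     * workqueue pool lock via schedule_delayed_work(): */
    schedule_delayed_work(&sync_cmos_work, 0);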

    Reported-by: Gerlando Falauto
    Tested-by: Lin Ming
    Signed-off-by: John Stultz
    Cc: Mathieu Desnoyers
    Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    John Stultz
     

08 Sep, 2013

1 commit

  • commit 84a78a6504f5c5394a8e558702e5b54131f01d14 upstream.

    Correct an issue with /proc/timer_list reported by Holger.

    When reading from the proc file with a sufficiently small buffer (2k,
    so not really that small), one could get hung trying to read the
    file a chunk at a time.

    The timer_list_start function failed to account for the possibility
    that the offset was adjusted outside of timer_list_next().
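
    A minimal userspace sketch of the repro (illustrative; any
    sufficiently small read size will do):

    #include <stdio.h>

    int main(void)
    {
            char buf[2048]; /* the "not really that small" 2k buffer */
            size_t n, total = 0;
            FILE *f = fopen("/proc/timer_list", "r");

            if (!f)
                    return 1;
            /* On pre-fix kernels this loop could get stuck instead of
             * reaching EOF, because ->start() did not account for the
             * offset advanced by ->next() on the previous chunk. */
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    total += n;
            fclose(f);
            printf("read %zu bytes\n", total);
            return 0;
    }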

    Signed-off-by: Nathan Zimmer
    Reported-by: Holger Hans Peter Freyther
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Berke Durak
    Cc: Jeff Layton
    Tested-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Nathan Zimmer
     

12 Aug, 2013

1 commit

  • commit 148519120c6d1f19ad53349683aeae9f228b0b8d upstream.

    Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting performance of variety of workloads.
    The simplest way to reproduce is using netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel       reverts
    -----------------------------------------------------------
    1) vanilla   upstream (no reverts)
    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c
    3) test      reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
                 and e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit
    e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender:
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     

26 Jul, 2013

2 commits

  • commit 1f73a9806bdd07a5106409bbcab3884078bd34fe upstream.

    When the system switches from periodic to oneshot mode, the broadcast
    logic causes a possibility that a CPU which has not yet switched to
    oneshot mode puts its own clock event device into oneshot mode without
    updating the state and the timer handler.

    CPU0                                CPU1
                                        per cpu tickdev is in periodic mode
                                        and switched to broadcast

    Switch to oneshot mode
     tick_broadcast_switch_to_oneshot()
      cpumask_copy(tick_broadcast_oneshot_mask,
                   tick_broadcast_mask);

      broadcast device mode = oneshot

                                        Timer interrupt

                                        irq_enter()
                                         tick_check_oneshot_broadcast()
                                          dev->set_mode(ONESHOT);

                                         tick_handle_periodic()
                                          if (dev->mode == ONESHOT)
                                            dev->next_event += period;
                                            FAIL.

    We fail, because dev->next_event contains KTIME_MAX, if the device was
    in periodic mode before the uncontrolled switch to oneshot happened.

    We must copy the broadcast bits over to the oneshot mask, because
    otherwise a CPU which relies on the broadcast would not be woken up
    anymore after the broadcast device switched to oneshot mode.

    So we need to verify in tick_check_oneshot_broadcast() whether the CPU
    has already switched to oneshot mode. If not, leave the device
    untouched and let the CPU switch controlled into oneshot mode.
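
    A sketch of the resulting guard in tick_check_oneshot_broadcast()
    (close to the upstream change, though details may differ):

    static void tick_check_oneshot_broadcast(int cpu)
    {
            if (cpumask_test_cpu(cpu, tick_broadcast_oneshot_mask)) {
                    struct tick_device *td = &per_cpu(tick_cpu_device, cpu);

                    /* If this CPU still runs in periodic mode, leave
                     * the device alone; the CPU will switch into
                     * oneshot mode in a controlled fashion later. */
                    if (td->mode == TICKDEV_MODE_ONESHOT)
                            clockevents_set_mode(td->evtdev,
                                                 CLOCK_EVT_MODE_ONESHOT);
            }
    }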

    This is a long standing bug, which was never noticed, because the
    main user of the broadcast, x86, cannot run into that scenario,
    AFAICT. The
    nonarchitected timer mess of ARM creates a gazillion of differently
    broken abominations which trigger the shortcomings of that broadcast
    code, which better had never been necessary in the first place.

    Reported-and-tested-by: Stehle Vincent-B46079
    Reviewed-by: Stephen Boyd
    Cc: John Stultz
    Cc: Mark Rutland
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 07bd1172902e782f288e4d44b1fde7dec0f08b6f upstream.

    The recent implementation of a generic dummy timer resulted in a
    different registration order of per cpu local timers which made the
    broadcast control logic go belly up.

    If the dummy timer is the first clock event device which is registered
    for a CPU, then it is installed, the broadcast timer is initialized
    and the CPU is marked as broadcast target.

    If a real clock event device is installed after that, we can fail to
    take the CPU out of the broadcast mask. In the worst case we end up
    with two periodic timer events firing for the same CPU. One from the
    per cpu hardware device and one from the broadcast.

    Now the problem is that we have no way to distinguish whether the
    system is in a state which makes broadcasting necessary or the
    broadcast bit was set due to the nonfunctional dummy timer
    installment.

    To solve this we need to keep track of the system state separately
    and provide a more detailed decision logic for whether we keep the
    CPU in broadcast mode or not.

    The old decision logic only clears the broadcast mode, if the newly
    installed clock event device is not affected by power states.

    The new logic clears the broadcast mode if one of the following is
    true:

    - The new device is not affected by power states.

    - The system is not in a power state affected mode

    - The system has switched to oneshot mode. The oneshot broadcast is
    controlled from the deep idle state. The CPU is not in idle at
    this point, so it's safe to remove it from the mask.

    If we clear the broadcast bit for the CPU when a new device is
    installed, we also shutdown the broadcast device when this was the
    last CPU in the broadcast mask.

    If the broadcast bit is kept, then we leave the new device in shutdown
    state and rely on the broadcast to deliver the timer interrupts via
    the broadcast ipis.
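
    Condensed into a sketch (the helper name is hypothetical; the
    upstream patch tracks the power-state-affected system state in a
    separate tick_broadcast_on mask):

    static bool can_clear_broadcast_bit(struct clock_event_device *newdev,
                                        int cpu)
    {
            if (!(newdev->features & CLOCK_EVT_FEAT_C3STOP))
                    return true;  /* device works across power states */
            if (!cpumask_test_cpu(cpu, tick_broadcast_on))
                    return true;  /* not in a power state affected mode */
            if (tick_broadcast_device.mode == TICKDEV_MODE_ONESHOT)
                    return true;  /* oneshot broadcast runs from deep idle */
            return false;
    }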

    Reported-and-tested-by: Stehle Vincent-B46079
    Reviewed-by: Stephen Boyd
    Cc: John Stultz
    Cc: Mark Rutland
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Jun, 2013

1 commit

  • The recent modification in the cpuidle framework consolidated the
    timer broadcast code across the different drivers by setting a new
    flag in the idle state. It tells the cpuidle core code to enter/exit
    the broadcast mode for the cpu when entering a deep idle state. The
    broadcast timer enter/exit is no longer handled by the back-end
    driver.

    This change caused the local interrupts to be enabled *before*
    calling CLOCK_EVENT_NOTIFY_EXIT.

    On a Tegra114, a four-core system, when the flag was introduced in
    the driver, the following warning appeared:

    WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control
    CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15
    [] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [] (tick_notify+0x240/0x40c)
    [] (tick_notify+0x240/0x40c) from [] (notifier_call_chain+0x44/0x84)
    [] (notifier_call_chain+0x44/0x84) from [] (raw_notifier_call_chain+0x18/0x20)
    [] (raw_notifier_call_chain+0x18/0x20) from [] (clockevents_notify+0x28/0x170)
    [] (clockevents_notify+0x28/0x170) from [] (cpuidle_idle_call+0x11c/0x168)
    [] (cpuidle_idle_call+0x11c/0x168) from [] (arch_cpu_idle+0x8/0x38)
    [] (arch_cpu_idle+0x8/0x38) from [] (cpu_startup_entry+0x60/0x134)
    [] (cpu_startup_entry+0x60/0x134) from [] (0x804fe9a4)

    I don't have the hardware, so I wasn't able to reproduce the warning
    but after looking a while at the code, I deduced the following:

    1. the CPU2 enters a deep idle state and sets the broadcast timer

    2. the timer expires, the tick_handle_oneshot_broadcast function is
    called, setting the tick_broadcast_pending_mask and waking up the
    idle cpu CPU2

    3. CPU2 exits idle, handles the interrupt and then invokes
    tick_broadcast_oneshot_control with CLOCK_EVENT_NOTIFY_EXIT, which
    runs the following code:

    [...]
    if (dev->next_event.tv64 == KTIME_MAX)
            goto out;

    if (cpumask_test_and_clear_cpu(cpu,
                                   tick_broadcast_pending_mask))
            goto out;
    [...]

    So if there is no next event scheduled for CPU2, we fulfil the
    first condition and jump out without clearing the
    tick_broadcast_pending_mask.

    4. CPU2 goes to deep idle again and calls
    tick_broadcast_oneshot_control with CLOCK_NOTIFY_EVENT_ENTER but
    with the tick_broadcast_pending_mask set for CPU2, triggering the
    warning.

    The issue only surfaced due to the modifications of the cpuidle
    framework, which resulted in interrupts being enabled before the call
    to the clockevents code. If the call happens before interrupts have
    been enabled, the warning cannot trigger, because there is still the
    event pending which caused the broadcast timer expiry.

    Move the check for the next event below the check for the pending bit,
    so the pending bit gets cleared whether an event is scheduled on the
    cpu or not.
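
    That is, the two checks from the snippet above end up in the
    opposite order (sketch):

    [...]
    if (cpumask_test_and_clear_cpu(cpu,
                                   tick_broadcast_pending_mask))
            goto out;

    if (dev->next_event.tv64 == KTIME_MAX)
            goto out;
    [...]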

    [ tglx: Massaged changelog ]

    Signed-off-by: Daniel Lezcano
    Reported-and-tested-by: Joseph Lo
    Cc: Stephen Warren
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linaro-kernel@lists.linaro.org
    Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Thomas Gleixner

    Daniel Lezcano
     

31 May, 2013

2 commits

    Since 7300711e ("clockevents: broadcast fixup possible waiters"),
    the timekeeping duty is assigned to the CPU that handles the tick
    broadcast clock device when it is set in oneshot mode.

    This is an issue in full dynticks mode where the timekeeping duty
    must stay handled by the boot CPU for now. Otherwise it prevents
    secondary CPUs from offlining and this breaks
    suspend/shutdown/reboot/...

    As it appears there is no reason for this timekeeping duty to be
    moved to the broadcast CPU, and since nothing prevents it from being
    re-assigned to another target later, let's simply remove it.

    Signed-off-by: Jiri Bohac
    Reported-by: Steven Rostedt
    Acked-by: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Jiri Bohac
     
    In tick_nohz_cpu_down_callback(), if the cpu is the one handling
    timekeeping, we must return something that stops the CPU_DOWN_PREPARE
    notifiers and then starts notifying CPU_DOWN_FAILED on the already
    called notifier callbacks.

    However traditional errno values are not handled by the notifier
    unless they are encapsulated using notifier_from_errno().

    Hence the current -EINVAL is misinterpreted and converted to junk
    after notifier_to_errno(), leaving the notifier subsystem open to
    random behaviour such as eventually allowing the cpu to go down.

    Fix this by using the standard NOTIFY_BAD instead.
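
    A sketch of the resulting callback (abbreviated; mask and variable
    names approximate the 3.10-era code):

    static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
                                           unsigned long action, void *hcpu)
    {
            unsigned int cpu = (unsigned long)hcpu;

            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_DOWN_PREPARE:
                    /* The timekeeping CPU must not go down. Return
                     * NOTIFY_BAD, not -EINVAL: notifiers don't take
                     * raw errno values. */
                    if (have_nohz_full_mask && cpu == tick_do_timer_cpu)
                            return NOTIFY_BAD;
                    break;
            }
            return NOTIFY_OK;
    }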

    Signed-off-by: Li Zhong
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Li Zhong
     

29 May, 2013

3 commits

  • Thomas Gleixner
     
    Since commit 31ade30692dc9680bfc95700d794818fa3f754ac, timekeeping_init()
    checks for the presence of a persistent clock by attempting to read a
    non-zero time value. This is an issue on platforms where the
    persistent clock is implemented as a free-running counter (instead
    of an RTC) starting from zero on each boot and running during
    suspend. Examples are some ARM platforms (e.g. PandaBoard).

    An attempt to read such a clock during timekeeping_init() may return
    a zero value and falsely declare the persistent clock as missing.
    Additionally, in
    the above case suspend times may be accounted twice (once from
    timekeeping_resume() and once from rtc_resume()), resulting in a gradual
    drift of system time.

    This patch does a run-time correction of the issue by doing the same check
    during timekeeping_suspend().

    A better long-term solution would be to return an error when trying
    to read a non-existent clock and zero when trying to read an
    uninitialized clock, but that would require changing all
    persistent_clock implementations.

    This patch addresses the immediate breakage, for now.
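
    A sketch of the run-time check added to timekeeping_suspend()
    (approximate; the flag name follows the timekeeping code of that
    era):

    static int timekeeping_suspend(void)
    {
            ...
            read_persistent_clock(&timekeeping_suspend_time);

            /* A free-running persistent clock reads zero at init time
             * but non-zero once it has run for a while; if we see a
             * valid value now, declare the persistent clock present: */
            if (timekeeping_suspend_time.tv_sec ||
                timekeeping_suspend_time.tv_nsec)
                    persistent_clock_exist = true;
            ...
    }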

    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Feng Tang
    Cc: stable@vger.kernel.org
    Signed-off-by: Zoran Markovic
    [jstultz: Tweaked commit message and subject]
    Signed-off-by: John Stultz

    Zoran Markovic
     
  • kernel/time/ntp.c: In function ‘__hardpps’:
    kernel/time/ntp.c:877: warning: unused variable ‘flags’

    commit a076b2146fabb0894cae5e0189a8ba3f1502d737 ("ntp: Remove ntp_lock,
    using the timekeeping locks to protect ntp state") removed its users,
    but not the actual variable.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: John Stultz

    Geert Uytterhoeven
     

28 May, 2013

1 commit

  • commit 26517f3e (tick: Avoid programming the local cpu timer if
    broadcast pending) added a warning if the cpu enters broadcast mode
    again while the pending bit is still set. Meelis reported that the
    warning triggers. There are two corner cases which have been not
    considered:

    1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
    twice. That can result in the following scenario

    CPU0                          CPU1
                                  cpuidle_idle_call()
                                   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                    set cpu in tick_broadcast_oneshot_mask

    broadcast interrupt
     event expired for cpu1
     set pending bit

                                   acpi_idle_enter_simple()
                                   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                    WARN_ON(pending bit)

    Move the WARN_ON into the section where we enter broadcast mode so
    it won't provide false positives on the second call.

    2) safe_halt() enables interrupts, so a broadcast interrupt can be
    delivered before the broadcast mode is disabled. That sets the
    pending bit for the CPU which receives the broadcast
    interrupt. Since the interrupt is handled right away by the
    broadcast handler, the pending bit is left stale.

    Clear the pending bit for the current cpu in the broadcast handler.
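
    Sketch of the handler-side fix (illustrative):

    static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
    {
            ...
            /* A broadcast interrupt that slipped in via safe_halt()
             * was already handled here; don't leave its pending bit
             * stale: */
            cpumask_clear_cpu(smp_processor_id(),
                              tick_broadcast_pending_mask);
            ...
    }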

    Reported-and-tested-by: Meelis Roos
    Cc: Len Brown
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Rafael J. Wysocki
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 May, 2013

1 commit

  • Pull timer fixes from Thomas Gleixner:

    - Cure for not using zalloc in the first place, which leads to random
    crashes with CPUMASK_OFF_STACK.

    - Revert a user space visible change which broke udev

    - Add a missing cpu_online early return introduced by the new full
    dyntick conversions

    - Plug a long standing race in the timer wheel cpu hotplug code.
    Sigh...

    - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
    up.

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
    timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
    tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
    tick: Cleanup NOHZ per cpu data on cpu down
    tick: Use zalloc_cpumask_var for allocating offstack cpumasks

    Linus Torvalds
     

15 May, 2013

1 commit

    Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which
    enables some minor compile time optimization to avoid unnecessary
    code in mostly the suspend/resume path, could cause problems for
    userland.

    In particular, the dependency for RTC_HCTOSYS on
    !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
    twice and simplifies suspend/resume, has the side effect
    of causing the /sys/class/rtc/rtcN/hctosys flag to always be
    zero, and this flag is commonly used by udev to setup the
    /dev/rtc symlink to /dev/rtcN, which can cause pain for
    older applications.

    While the udev rules could use some work to be less fragile,
    breaking userland should be strongly avoided. Additionally the
    compile time optimizations are fairly minor, and the code being
    optimized is likely to be reworked in the future, so let's revert
    this change.
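
    For reference, the reverted dependency looked roughly like this
    (illustrative Kconfig fragment):

    config RTC_HCTOSYS
            bool "Set system time from RTC on startup and resume"
            depends on !ALWAYS_USE_PERSISTENT_CLOCK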

    Reported-by: Kay Sievers
    Signed-off-by: John Stultz
    Cc: stable #3.9
    Cc: Feng Tang
    Cc: Jason Gunthorpe
    Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     

14 May, 2013

1 commit

    commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
    idle logic) moved code out of tick_nohz_stop_sched_tick() and failed
    to bail out when the cpu is offline. That's causing subsequent
    failures, as an offline CPU is supposed to die and not to fiddle
    with nohz magic.

    Return false in can_stop_idle_tick() if the cpu is offline.
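
    Sketch of the added guard (close to the actual change):

    static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
    {
            /* An offline CPU is about to die; don't let it fiddle with
             * nohz state, and drop the timekeeping duty if it happened
             * to hold it: */
            if (unlikely(!cpu_online(cpu))) {
                    if (cpu == tick_do_timer_cpu)
                            tick_do_timer_cpu = TICK_DO_TIMER_NONE;
                    return false;
            }
            ...
    }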

    Reported-and-tested-by: Jiri Kosina
    Reported-and-tested-by: Prarit Bhargava
    Cc: Frederic Weisbecker
    Cc: Borislav Petkov
    Cc: Tony Luck
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

12 May, 2013

1 commit

  • Prarit reported a crash on CPU offline/online. The reason is that on
    CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
    up. If at cpu online an interrupt happens before the per cpu tick
    device is registered the irq_enter() check potentially sees stale data
    and dereferences a NULL pointer.

    Cleanup the data after the cpu is dead.
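
    Sketch of the cleanup, hooked into the tick shutdown path for the
    dead cpu (illustrative):

    void tick_cancel_sched_timer(int cpu)
    {
            struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

    #ifdef CONFIG_HIGH_RES_TIMERS
            if (ts->sched_timer.base)
                    hrtimer_cancel(&ts->sched_timer);
    #endif
            /* Wipe the per cpu state so a later cpu online plus an
             * early irq_enter() cannot see stale pointers: */
            memset(ts, 0, sizeof(*ts));
    }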

    Reported-by: Prarit Bhargava
    Cc: stable@vger.kernel.org
    Cc: Mike Galbraith
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     

05 May, 2013

1 commit

  • commit b352bc1cbc (tick: Convert broadcast cpu bitmaps to
    cpumask_var_t) broke CONFIG_CPUMASK_OFFSTACK in a very subtle way.

    Instead of allocating the cpumasks with zalloc_cpumask_var it uses
    alloc_cpumask_var, so we can get random data there, which of course
    confuses the logic completely and causes random failures.
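
    The fix pattern (illustrative):

    /* Buggy: with CPUMASK_OFFSTACK the bits start as random heap data */
    alloc_cpumask_var(&tick_broadcast_mask, GFP_NOWAIT);

    /* Fixed: zeroed allocation, no phantom CPUs in the mask */
    zalloc_cpumask_var(&tick_broadcast_mask, GFP_NOWAIT);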

    Reported-and-tested-by: Dave Jones
    Reported-and-tested-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305032015060.2990@ionos
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

04 May, 2013

2 commits

  • The scheduler doesn't yet fully support environments
    with a single task running without a periodic tick.

    In order to ensure we still maintain the duties of scheduler_tick(),
    keep at least 1 tick per second.

    This makes sure that we keep the progression of various scheduler
    accounting and background maintenance even with a very low granularity.
    Examples include cpu load, sched average, CFS entity vruntime,
    avenrun and events such as load balancing, amongst other details
    handled in sched_class::task_tick().

    This limitation will be removed in the future once we get
    these individual items to work in full dynticks CPUs.
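
    Sketch of the scheduler-side helper enforcing the 1-second cap
    (close to the actual code; the nohz code queries it when computing
    the next tick deadline):

    u64 scheduler_tick_max_deferment(void)
    {
            struct rq *rq = this_rq();
            unsigned long next, now = ACCESS_ONCE(jiffies);

            /* Defer the tick by at most one second: */
            next = rq->last_sched_tick + HZ;

            if (time_before_eq(next, now))
                    return 0;

            return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
    }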

    Suggested-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Commit 0637e029392386e6996f5d6574aadccee8315efa
    ("nohz: Select wide RCU nocb for full dynticks") intended
    to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
    enabled.

    However this option is part of a choice menu and Kconfig's
    "select" instruction has no effect on such targets.

    Fix this by using reverse dependencies on the targets we
    don't want instead.
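
    Illustrative Kconfig sketch of the reverse-dependency trick: rather
    than "select"-ing one entry of the choice, the unwanted entries are
    made unavailable under full dynticks:

    choice
            prompt "Build-forced no-CBs CPUs"
            depends on RCU_NOCB_CPU

    config RCU_NOCB_CPU_NONE
            bool "No build_forced no-CBs CPUs"
            depends on !NO_HZ_FULL

    config RCU_NOCB_CPU_ZERO
            bool "CPU 0 is a build_forced no-CBs CPU"
            depends on !NO_HZ_FULL

    config RCU_NOCB_CPU_ALL
            bool "All CPUs are build_forced no-CBs CPUs"

    endchoice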

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

29 Apr, 2013

1 commit

  • I saw following error when testing the latest nohz code on
    Power:

    [ 85.295384] BUG: using smp_processor_id() in preemptible [00000000] code: rsyslogd/3493
    [ 85.295396] caller is .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295402] Call Trace:
    [ 85.295408] [c0000001fababab0] [c000000000012dc4] .show_stack+0x110/0x25c (unreliable)
    [ 85.295420] [c0000001fababba0] [c0000000007c4b54] .dump_stack+0x20/0x30
    [ 85.295430] [c0000001fababc10] [c00000000044eb74] .debug_smp_processor_id+0xf4/0x124
    [ 85.295438] [c0000001fababca0] [c0000000000d7594] .tick_nohz_task_switch+0x1c/0xb8
    [ 85.295447] [c0000001fababd20] [c0000000000b9748] .finish_task_switch+0x13c/0x160
    [ 85.295455] [c0000001fababdb0] [c0000000000bbe50] .schedule_tail+0x50/0x124
    [ 85.295463] [c0000001fababe30] [c000000000009dc8] .ret_from_fork+0x4/0x54

    The fix moves the test into the local_irq_save/restore section to
    avoid the above complaint, as sketched below.
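
    A sketch of that change (close to the actual fix):

    void tick_nohz_task_switch(struct task_struct *tsk)
    {
            unsigned long flags;

            local_irq_save(flags);

            /* smp_processor_id() is stable with irqs disabled: */
            if (!tick_nohz_full_cpu(smp_processor_id()))
                    goto out;

            if (tick_nohz_tick_stopped() && !can_stop_full_tick())
                    tick_nohz_full_kick();
    out:
            local_irq_restore(flags);
    }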

    Signed-off-by: Li Zhong
    Acked-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Link: http://lkml.kernel.org/r/1367119558.6391.34.camel@ThinkPad-T5421.cn.ibm.com
    Signed-off-by: Ingo Molnar

    Li Zhong
     

27 Apr, 2013

2 commits

  • …rederic/linux-dynticks into timers/nohz

    Pull more full-dynticks updates from Frederic Weisbecker:

    * Get rid of the passive dependency on VIRT_CPU_ACCOUNTING_GEN (finally!)
    * Preparation patch to remove the dependency on CONFIG_64BITS

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
    to an active one.

    The full dynticks Kconfig is currently hidden behind the full dynticks
    cputime accounting, which is an awkward and counter-intuitive layout:
    the user first has to select the dynticks cputime accounting in order
    to make the full dynticks feature to be visible.

    We definitely want it the other way around. The usual way to perform
    this kind of active dependency is to use "select" on the depended-on
    target. But we can't use the Kconfig "select" instruction when the
    target is a "choice".

    So this patch takes inspiration from how the RCU subsystem Kconfig
    interacts with its dependencies on SMP and PREEMPT: we make sure
    that cputime accounting can't propose another option than
    VIRT_CPU_ACCOUNTING_GEN when NO_HZ_FULL is selected, by using the
    right "depends on" instruction for each cputime accounting choice.

    v2: Keep full dynticks cputime accounting available even without
    full dynticks, as per Paul McKenney's suggestion.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Christoph Lameter
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

26 Apr, 2013

1 commit

    One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle
    routine, which apparently can break out of its idle loop rather
    frequently, at a high rate.

    In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant
    context switching, because tick_nohz_stop_sched_tick() will, if
    delta_jiffies == 0, mis-identify this as a timer event - activating the
    TIMER_SOFTIRQ, which wakes up ksoftirqd.

    Fix this by treating delta_jiffies == 0 the same way we treat other short
    wakeups, delta_jiffies == 1.
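
    In code, the fix amounts to relaxing a comparison (illustrative):

    /* Was: delta_jiffies == 1. Treat an immediate expiry the same way
     * as a one-jiffy one: keep the tick and don't raise TIMER_SOFTIRQ
     * just to wake up ksoftirqd. */
    if (!ts->tick_stopped && delta_jiffies <= 1)
            goto out;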

    Cc: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Apr, 2013

2 commits

  • Vitaliy reported that a per cpu HPET timer interrupt crashes the
    system during hibernation. What happens is that the per cpu HPET timer
    gets shut down when the nonboot cpus are stopped. When the nonboot
    cpus are onlined again the HPET code sets up the MSI interrupt which
    fires before the clock event device is registered. The event handler
    is still set to hrtimer_interrupt, which then crashes the machine due
    to highres mode not being active.

    See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

    There is no real good way to avoid that in the HPET code. The HPET
    code already has a mechanism to detect spurious interrupts when
    event handler == NULL, for a similar reason.

    We can handle that in the clockevent/tick layer and replace the
    previous functional handler with a dummy handler like we do in
    tick_setup_new_device().

    The original clockevents code did this in clockevents_exchange_device(),
    but that got removed by commit 7c1e76897 (clockevents: prevent
    clockevent event_handler ending up handler_noop) which forgot to fix
    it up in tick_shutdown(). Same issue with the broadcast device.
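
    Sketch of the fix in tick_shutdown() (the broadcast device gets the
    same treatment):

    static void tick_shutdown(unsigned int *cpup)
    {
            ...
            if (dev) {
                    /* Park the handler on a noop so a premature
                     * interrupt cannot end up in hrtimer_interrupt
                     * while highres mode is inactive: */
                    dev->event_handler = clockevents_handle_noop;
                    td->evtdev = NULL;
            }
            ...
    }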

    Reported-by: Vitaliy Fillipov
    Cc: Ben Hutchings
    Cc: stable@vger.kernel.org
    Cc: 700333@bugs.debian.org
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Reason: Get upstream fixes before adding conflicting code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

24 Apr, 2013

1 commit

  • Remove the dependency on (TREE_RCU || TREE_PREEMPT_RCU). The full
    dynticks option already depends on SMP which implies
    (whatever flavour of) RCU tree config anyway.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

23 Apr, 2013

7 commits

    It's not obvious to figure out why the full dynticks subsystem
    doesn't always stop the tick: whether this is due to kthreads,
    posix timers, perf events, etc.

    These new tracepoints are here to help the user diagnose
    the failures and test this feature.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
    It makes testing and implementation much easier as we
    know in advance that all CPUs are RCU nocbs.

    Also this prepares for removing the dynamic check that the
    nohz_full= boot mask is a subset of rcu_nocbs=.

    Eventually this should also help remove the requirement
    for the boot CPU to be outside the full dynticks range.

    Suggested-by: Christoph Lameter
    Suggested-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • When a task is scheduled in, it may have some properties
    of its own that could make the CPU reconsider the need for
    the tick: posix cpu timers, perf events, ...

    So notify the full dynticks subsystem when a task gets
    scheduled in and re-check the tick dependency at this
    stage. This is done through a self IPI to avoid messing
    up with any current lock scenario.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
    Interrupt exit is a natural place to stop the tick: it happens
    after all events which occurred before and during the irq, and
    which are liable to update the tick dependency, have been handled.
    It also makes sure that any check on the tick dependency is well
    ordered against dynticks kick IPIs.

    Bring in the infrastructure that performs the tick dependency
    checks on irq exit and shut it down if these checks show that we
    can do it safely.
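
    Sketch of the irq-exit hook (approximate; helper names follow the
    changelog terminology):

    static void tick_nohz_full_stop_tick(struct tick_sched *ts)
    {
            int cpu = smp_processor_id();

            if (!tick_nohz_full_cpu(cpu) || is_idle_task(current))
                    return;
            if (!can_stop_full_tick())
                    return;

            tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
    }

    void tick_nohz_irq_exit(void)
    {
            struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

            if (ts->inidle)
                    __tick_nohz_idle_enter(ts);
            else
                    tick_nohz_full_stop_tick(ts);
    }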

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Implement the full dynticks kick that is performed from
    IPIs sent by various subsystems (scheduler, posix timers, ...)
    when they want to notify about a new event that may
    reconsider the dependency on the tick.

    Most of the time, such an event ends up restarting the tick.

    (Part of the design with subsystems providing *_can_stop_tick()
    helpers suggested by Peter Zijlstra a while ago).
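
    Sketch of the kick: the IPI handler only queues an irq_work, and
    the tick re-evaluation runs from there (names approximate):

    static void nohz_full_kick_work_func(struct irq_work *work)
    {
            /* Re-check whether the tick can stay stopped and restart
             * it if a new dependency showed up: */
            tick_nohz_full_check();
    }

    static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
            .func = nohz_full_kick_work_func,
    };

    void tick_nohz_full_kick(void)
    {
            if (tick_nohz_full_cpu(smp_processor_id()))
                    irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
    }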

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • commit 7ec98e15aa (timekeeping: Delay update of clock->cycle_last)
    forgot to update tk->cycle_last in the resume path. This results in a
    stale value versus clock->cycle_last and prevents resume in the worst
    case.
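
    The fix is essentially a one-liner in timekeeping_resume(), keeping
    the timekeeper's shadow value in sync (illustrative):

    /* Was: clock->cycle_last = cycle_now;  (tk->cycle_last went stale) */
    tk->cycle_last = clock->cycle_last = cycle_now;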

    Reported-by: Jiri Slaby
    Reported-and-tested-by: Borislav Petkov
    Acked-by: John Stultz
    Cc: Linux-pm mailing list
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304211648150.21884@ionos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The scheduler IPI is used by the scheduler to kick
    full dynticks CPUs asynchronously when more than one
    task are running or when a new timer list timer is
    enqueued. This way the destination CPU can decide
    to restart the tick to handle this new situation.

    Now let's call that kick in the scheduler IPI.

    (Reusing the scheduler IPI rather than implementing
    a new IPI was suggested by Peter Zijlstra a while ago)
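
    Sketch of the scheduler_ipi() hookup (heavily abbreviated):

    void scheduler_ipi(void)
    {
            if (llist_empty(&this_rq()->wake_list)
                            && !tick_nohz_full_cpu(smp_processor_id())
                            && !got_nohz_idle_kick())
                    return;

            irq_enter();
            /* Re-evaluate the tick on this full dynticks CPU: */
            tick_nohz_full_check();
            ...
            irq_exit();
    }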

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

19 Apr, 2013

4 commits

    Provide a new kernel config that defaults all CPUs to be part
    of the full dynticks range, except the boot one for timekeeping.

    This default setting is overridden by the nohz_full= boot option
    if passed by the user.

    This is helpful for those who don't need a fine-grained range
    of full dynticks CPUs, and also for automated testing.

    Suggested-by: Ingo Molnar
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
    We need full dynticks CPUs to also be RCU nocbs, so
    that we don't have to keep the tick to handle RCU
    callbacks.

    Make sure the range passed to the nohz_full= boot
    parameter is a subset of rcu_nocbs=.

    The CPUs that fail to meet this requirement will be
    excluded from the nohz_full range. This is checked
    early in boot time, before any CPU has the opportunity
    to stop its tick.
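
    Sketch of the early boot-time check (approximate):

    /* In the nohz init code, before any CPU can stop its tick: */
    if (!cpumask_subset(nohz_full_mask, rcu_nocb_mask)) {
            pr_warning("NO_HZ: CPUs not in RCU nocb range, "
                       "shrinking nohz_full range\n");
            cpumask_and(nohz_full_mask, nohz_full_mask, rcu_nocb_mask);
    }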

    Suggested-by: Steven Rostedt
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
    The timekeeping job must be able to run early on boot
    because there may be some pre-SMP (and thus pre-initcall)
    components that rely on it. The IO-APIC is one such user,
    as it tests the timer health by watching jiffies progression.

    Given that it happens before we know the initial online
    set, we can't rely on it to select a timekeeper. We need
    one before SMP time, otherwise we simply crash on boot.

    To fix this and keep things simple for now, force the boot CPU
    outside of the full dynticks range in any case and do this early
    on kernel parameter parsing time.

    We might want a trickier solution later, especially for aSMP
    architectures that need to assign housekeeping tasks to arbitrary
    low power CPUs.

    But it's still first pass KISS time for now.
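
    Sketch of the boot parameter parsing with the boot CPU forced out
    of the range (close to the actual setup code):

    static int __init tick_nohz_full_setup(char *str)
    {
            int cpu;

            alloc_bootmem_cpumask_var(&nohz_full_mask);
            if (cpulist_parse(str, nohz_full_mask) < 0) {
                    pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
                    return 1;
            }

            cpu = smp_processor_id();
            if (cpumask_test_cpu(cpu, nohz_full_mask)) {
                    pr_warning("NO_HZ: Clearing %d from nohz_full range "
                               "for timekeeping\n", cpu);
                    cpumask_clear_cpu(cpu, nohz_full_mask);
            }
            have_nohz_full_mask = true;
            return 1;
    }
    __setup("nohz_full=", tick_nohz_full_setup);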

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     
  • Provide two new helpers in order to notify the full dynticks CPUs about
    some internal system changes against which they may reconsider the state
    of their tick. Some practical examples include: posix cpu timers, perf tick
    and sched clock tick.

    For now the notifying handler, implemented through IPIs, is a stub
    that will be implemented when we get the tick stop/restart infrastructure
    in.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker