17 Feb, 2017

1 commit

  • This reverts commit 24b91e360ef521a2808771633d76ebc68bd5604b and commit
    7bdb59f1ad47 ("tick/nohz: Fix possible missing clock reprog after tick
    soft restart") that depends on it.

    Pavel reports that it causes occasional boot hangs for him that seem to
    depend on just how the machine was booted. In particular, his machine
    hangs at around the PCI fixups of the EHCI USB host controller, but only
    hangs from cold boot, not from a warm boot.

    Thomas Gleixner suspects it's a CPU hotplug interaction, particularly
    since Pavel also saw suspend/resume issues that seem to be related.
    We're reverting for now while trying to figure out the root cause.

    Reported-bisected-and-tested-by: Pavel Machek
    Acked-by: Frederic Weisbecker
    Cc: Wanpeng Li
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: stable@kernel.org # reverted commits were marked for stable
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Jan, 2017

1 commit

  • When the tick is stopped and an interrupt occurs afterward, we check on
    that interrupt exit if the next tick needs to be rescheduled. If it
    doesn't need any update, we don't want to do anything.

    In order to check if the tick needs an update, we compare it against the
    clockevent device deadline. Now that's a problem because the clockevent
    device is at a lower level than the tick itself if it is implemented
    on top of hrtimer.

    Every hrtimer shares this clockevent device. So comparing the next tick
    deadline against the clockevent device deadline is wrong because the
    device may be programmed for another hrtimer whose deadline collides
    with the tick. As a result we may accidentally end up not reprogramming
    the tick.

    In a worst case scenario under full dynticks mode, the tick stops firing
    at the 1 Hz rate it is supposed to keep, leaving /proc/stat stalled:

    Task in a full dynticks CPU
    ----------------------------

    * hrtimer A is queued 2 seconds ahead
    * the tick is stopped, scheduled 1 second ahead
    * tick fires 1 second later
    * on tick exit, nohz schedules the tick 1 second ahead but sees that
      the clockevent device is already programmed to that deadline.
      Fooled by hrtimer A, it doesn't reschedule the tick.
    * hrtimer A is cancelled before its deadline
    * tick never fires again until an interrupt happens...

    In order to fix this, store the next tick deadline in the tick_sched
    local structure and reuse that value later to check whether we need to
    reprogram the clock after an interrupt.
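
    The difference between the two checks can be sketched in userspace C.
    This is only an illustration of the scenario above, not the kernel
    code: struct clock_device, struct tick_state and both function names
    are hypothetical stand-ins, and the field next_tick_ns merely plays
    the role of the deadline stored in tick_sched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared clockevent device. */
struct clock_device {
    int64_t next_event_ns;  /* earliest deadline of ANY queued hrtimer */
};

/* Hypothetical stand-in for the per-CPU tick state. */
struct tick_state {
    bool    tick_stopped;
    int64_t next_tick_ns;   /* deadline last programmed for the tick itself */
};

/*
 * Old check: trusts the shared device deadline. Another hrtimer with the
 * same deadline makes this report "nothing to do" by mistake.
 */
bool needs_reprogram_old(const struct clock_device *dev, int64_t wanted_ns)
{
    return dev->next_event_ns != wanted_ns;
}

/*
 * New check: compares against the tick's own recorded deadline, so other
 * hrtimers sharing the device cannot influence the decision.
 */
bool needs_reprogram_new(const struct tick_state *ts, int64_t wanted_ns)
{
    assert(wanted_ns >= 0);
    return !(ts->tick_stopped && ts->next_tick_ns == wanted_ns);
}
```

    With hrtimer A queued at 2 s and the tick having just fired at 1 s,
    the old check sees the device already set to 2 s and skips the
    reprogramming, while the new check notices that the tick's own
    deadline is still 1 s and reprograms it.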

    On the other hand, ts->sleep_length still wants to know about the next
    clock event and not just the tick, so we want to improve the related
    comment to avoid confusion.

    Reported-by: James Hartsock
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Wanpeng Li
    Acked-by: Peter Zijlstra
    Acked-by: Rik van Riel
    Link: http://lkml.kernel.org/r/1483539124-5693-1-git-send-email-fweisbec@gmail.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Frederic Weisbecker
     

29 Mar, 2016

1 commit

  • The tick dependency mask was initially unsigned long because this is the
    type on which clear_bit() operates and fetch_or() accepts it.

    But now that we have atomic_fetch_or(), we can instead use
    atomic_andnot() to clear the bit. This consolidates the type of our
    tick dependency mask, reduces its size in structures, and benefits from
    possible architecture optimizations on atomic_t operations.
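
    A rough userspace analogue of the set/clear pattern, using C11
    <stdatomic.h> rather than the kernel's atomic_t API. C11 has no
    atomic_andnot(), so the sketch clears a bit with atomic_fetch_and()
    on the complement. The bit names and the dep_set()/dep_clear()
    helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical dependency bits, named for illustration only. */
enum { DEP_POSIX_TIMER = 1 << 0, DEP_PERF_EVENTS = 1 << 1 };

/*
 * Atomically set a dependency bit. Returns true when the mask was empty
 * before, i.e. when this caller is the one that has to restart the tick.
 */
bool dep_set(atomic_int *mask, int bit)
{
    assert(bit != 0);
    int prev = atomic_fetch_or(mask, bit);
    return prev == 0;
}

/* Atomically clear a dependency bit (and-not expressed as and-with-complement). */
void dep_clear(atomic_int *mask, int bit)
{
    atomic_fetch_and(mask, ~bit);
}
```

    Returning the previous state from the fetch-or lets the caller decide
    whether a kick is needed without taking a lock.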

    Suggested-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1458830281-4255-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

02 Mar, 2016

1 commit

  • The tick dependency is evaluated on every IRQ and context switch. This
    consists of a batch of checks which determine whether it is safe to
    stop the tick or not. These checks are often split into many details:
    posix cpu timers, scheduler, sched clock, perf events, each of which
    is made of smaller details: posix cpu timers involve checking process
    wide timers then thread wide timers. Perf involves checking freq events
    then more per cpu details.

    Checking this information asynchronously every time we update the full
    dynticks state brings avoidable overhead and a messy layout.

    Let's introduce instead tick dependency masks: one for system wide
    dependency (unstable sched clock, freq based perf events), one for CPU
    wide dependency (sched, throttling perf events), and task/signal level
    dependencies (posix cpu timers). The subsystems are responsible
    for setting and clearing their dependency through a set of APIs that will
    take care of concurrent dependency mask modifications and kick targets
    to restart the relevant CPU tick whenever needed.

    This new dependency engine stays beside the old one until all subsystems
    having a tick dependency are converted to it.
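
    The layering can be pictured with a small single-threaded sketch.
    It deliberately leaves out the concurrency handling and the CPU
    kicks described above. struct dep_levels, the DEP_* bits and
    can_stop_tick() are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dependency bits; one bit per subsystem that may need the tick. */
enum {
    DEP_POSIX_TIMER    = 1 << 0,
    DEP_PERF_EVENTS    = 1 << 1,
    DEP_SCHED          = 1 << 2,
    DEP_CLOCK_UNSTABLE = 1 << 3
};

/* One mask per level: system wide, per CPU, per task, per signal struct. */
struct dep_levels {
    unsigned int global;
    unsigned int cpu;
    unsigned int task;
    unsigned int signal;
};

/* The tick may stop only when no level holds any dependency bit. */
bool can_stop_tick(const struct dep_levels *d)
{
    assert(d != NULL);
    return (d->global | d->cpu | d->task | d->signal) == 0;
}
```

    The point of the design is that the check collapses to a few word
    reads, while each subsystem pays the cost only when its own
    dependency changes.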

    Suggested-by: Thomas Gleixner
    Suggested-by: Peter Zijlstra
    Reviewed-by: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Chris Metcalf
    Cc: Ingo Molnar
    Cc: Luiz Capitulino
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

08 Jul, 2015

1 commit

  • Currently the broadcast busy check, which prevents the idle code from
    going into deep idle, works only in one shot mode.

    If NOHZ and HIGHRES are off (config or command line) there is no
    sanity check at all, so under certain conditions cpus are allowed to
    go into deep idle, where the local timer stops, and are not woken up
    again because there is no broadcast timer installed or a hrtimer based
    broadcast device is not evaluated.

    Move tick_broadcast_oneshot_control() into the common code and provide
    proper subfunctions for the various config combinations.

    The common check in tick_broadcast_oneshot_control() is for the C3STOP
    misfeature flag of the local clock event device. If it's not set, idle
    can proceed. If set, further checks are necessary.

    Provide checks for the trivial cases:

    - If broadcast is disabled in the config, then return busy

    - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
      busy if the broadcast device is hrtimer based.

    - If oneshot mode is enabled in the config, call the original
      tick_broadcast_oneshot_control() function. That function needs
      extra checks which will be implemented in separate patches.
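
    The decision order described above, as a plain C sketch. In the
    kernel the config cases are selected at build time; here they are
    modelled as runtime flags for readability. struct bc_config,
    BC_OK/BC_BUSY and broadcast_control() are hypothetical, and
    oneshot_result stands for whatever the original oneshot function
    would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime model of the build-time config combinations. */
struct bc_config {
    bool broadcast_enabled;   /* broadcast support built in */
    bool oneshot_enabled;     /* NOHZ/HIGHRES built in */
    bool bc_dev_is_hrtimer;   /* broadcast device is hrtimer based */
};

enum { BC_OK = 0, BC_BUSY = 1 };  /* BC_BUSY stands in for a "busy" error */

int broadcast_control(const struct bc_config *cfg, bool local_dev_c3stop,
                      int oneshot_result)
{
    assert(cfg != NULL);

    /* Common check: a local timer that keeps running needs no broadcast. */
    if (!local_dev_c3stop)
        return BC_OK;

    /* No broadcast support at all: deep idle would never be woken up. */
    if (!cfg->broadcast_enabled)
        return BC_BUSY;

    /* Periodic mode: only an hrtimer based broadcast device is a problem. */
    if (!cfg->oneshot_enabled)
        return cfg->bc_dev_is_hrtimer ? BC_BUSY : BC_OK;

    /* Oneshot mode: defer to the original oneshot control path. */
    return oneshot_result;
}
```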

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     

22 Apr, 2015

1 commit

  • The evaluation of the next timer in the nohz code is based on jiffies
    while all the tick internals are nanoseconds based. We also have to
    convert hrtimer nanoseconds to jiffies in the !highres case. That's
    just wrong and introduces interesting corner cases.

    Turn it around and convert the next timer wheel timer expiry and the
    rcu event to clock monotonic and base all calculations on
    nanoseconds. That identifies the case where no timer is pending
    clearly with an absolute expiry value of KTIME_MAX.

    Makes the code more readable and gets rid of the jiffies magic in the
    nohz code.
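
    A small sketch of the idea: convert the jiffies based expiry to an
    absolute nanosecond value once, then take the minimum over all
    sources, with a maximum sentinel meaning "nothing pending". The
    helper names and KTIME_MAX_NS are hypothetical; the sketch assumes
    the tick period is NSEC_PER_SEC / HZ with integer division.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define KTIME_MAX_NS INT64_MAX   /* sentinel: no event pending */

/* Convert an absolute jiffies expiry into an absolute nanosecond expiry. */
int64_t jiffies_to_abs_ns(uint64_t expires_jiffies, uint64_t now_jiffies,
                          int64_t now_ns, unsigned int hz)
{
    assert(hz > 0 && expires_jiffies >= now_jiffies);
    int64_t tick_ns = NSEC_PER_SEC / hz;
    return now_ns + (int64_t)(expires_jiffies - now_jiffies) * tick_ns;
}

/* Earliest of the timer wheel, RCU and hrtimer expiries, all in ns. */
int64_t next_event_ns(int64_t wheel_ns, int64_t rcu_ns, int64_t hrtimer_ns)
{
    int64_t next = wheel_ns;
    if (rcu_ns < next)
        next = rcu_ns;
    if (hrtimer_ns < next)
        next = hrtimer_ns;
    return next;  /* stays KTIME_MAX_NS when nothing is pending */
}
```

    Because every input already shares one unit, the "no timer pending"
    case needs no special handling beyond the sentinel.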

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Paul E. McKenney
    Acked-by: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Viresh Kumar
    Cc: Marcelo Tosatti
    Cc: Frederic Weisbecker
    Cc: Josh Triplett
    Cc: Lai Jiangshan
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

01 Apr, 2015

2 commits

  • Use the new tick_suspend/resume_local() and get rid of the
    homebrewed implementation of these in the ARM bL switcher. The
    check for the cpumask is completely pointless. There is no harm
    in suspending a per cpu tick device unconditionally. If that's a
    real issue then we fix it properly at the core level and not with
    some completely undocumented hacks in some random core code.

    Move the tick internals to the core code, now that this nuisance
    is gone.

    Signed-off-by: Thomas Gleixner
    [ rjw: Rebase, changelog ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Nicolas Pitre
    Cc: Peter Zijlstra
    Cc: Russell King
    Link: http://lkml.kernel.org/r/1655112.Ws17YsMfN7@vostro.rjw.lan
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • No point to expose everything to the world. People just believe
    such functions can be abused for whatever purposes. Sigh.

    Signed-off-by: Thomas Gleixner
    [ Rebased on top of 4.0-rc5 ]
    Signed-off-by: Rafael J. Wysocki
    Cc: Nicolas Pitre
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan
    [ Merged to latest timers/core ]
    Signed-off-by: Ingo Molnar

    Thomas Gleixner