29 Oct, 2018

1 commit

  • When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
    hrtimer_reprogram -> clockevents_program_event ->
    clockevents_program_min_delta will not retry if the clock event driver
    returns -ETIME.

    If the driver could not satisfy the program_min_delta for any reason, the
    lack of a retry means the CPU may not receive a tick interrupt, potentially
    until the counter does a full period. This leads to rcu_sched timeout
    messages as the stalled CPU is detected by other CPUs, and other issues if
    the CPU is holding locks or other resources at the point at which it
    stalls.

    There have been a couple of observed mechanisms through which a clock event
    driver could not satisfy the requested min_delta and return -ETIME.

    With the MIPS GIC driver, the shared execution resource within MT cores
    means inconvenient latency due to execution of instructions from other
    hardware threads in the core, within gic_next_event, can result in an event
    being set in the past.

    Additionally under virtualisation it is possible to get unexpected latency
    during a clockevent device's set_next_event() callback which can make it
    return -ETIME even for a delta based on min_delta_ns.

    It isn't appropriate to use MIN_ADJUST in the virtualisation case as
    occasional hypervisor induced high latency will cause min_delta_ns to
    quickly increase to the maximum.

    Instead, borrow the retry pattern from the MIN_ADJUST case, but without
    making adjustments. Retry up to 10 times, each time increasing the
    attempted delta by min_delta, before giving up.

    [ Matt: Reworked the loop and made retry increase the delta. ]

    Signed-off-by: James Hogan
    Signed-off-by: Matt Redfearn
    Signed-off-by: Thomas Gleixner
    Cc: linux-mips@linux-mips.org
    Cc: Daniel Lezcano
    Cc: "Martin Schwidefsky"
    Cc: James Hogan
    Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com

    James Hogan
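
The retry scheme described in this commit message can be sketched in plain C. This is a minimal userspace model under stated assumptions, not the kernel implementation: `struct fake_clockevent`, `fake_set_next_event()` and `program_min_delta()` are hypothetical names, and the "device" simply fails a configurable number of times before accepting a delta.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_RETRIES 10 /* retry budget named in the commit message */

/* Hypothetical stand-in for a clock event device. */
struct fake_clockevent {
	uint64_t min_delta;   /* smallest delta the device claims to support */
	int failures_left;    /* how many more programming attempts return -ETIME */
	uint64_t programmed;  /* last delta the "hardware" accepted */
};

/* Models a set_next_event() callback that sometimes misses the deadline. */
static int fake_set_next_event(struct fake_clockevent *dev, uint64_t delta)
{
	if (dev->failures_left > 0) {
		dev->failures_left--;
		return -ETIME;
	}
	dev->programmed = delta;
	return 0;
}

/*
 * Retry without adjusting min_delta itself: each attempt is min_delta
 * larger than the previous one. Give up with -ETIME after MAX_RETRIES.
 */
static int program_min_delta(struct fake_clockevent *dev)
{
	uint64_t delta = 0;
	int i;

	for (i = 0; i < MAX_RETRIES; i++) {
		delta += dev->min_delta;
		if (fake_set_next_event(dev, delta) == 0)
			return 0;
	}
	return -ETIME;
}
```

With `min_delta = 100` and three simulated failures, the fourth attempt programs a delta of 400. Because `min_delta` itself is never modified, one slow hypervisor exit does not permanently raise the minimum.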
     

04 Oct, 2018

3 commits

  • [ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ]

    The posix timer overrun handling is broken because the forwarding functions
    can return a huge number of overruns which does not fit in an int. As a
    consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
    random number generators.

    The k_clock::timer_forward() callbacks return a 64 bit value now. Make
    k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
    accounting is correct. Remove the temporary (int) casts.

    Add a helper function which clamps the overrun value returned to user space
    via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
    between 0 and INT_MAX. INT_MAX is an indicator for user space that the
    overrun value has been clamped.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
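
The clamping helper mentioned in this commit message can be illustrated with a small userspace sketch. `clamp_overrun()` is a hypothetical name and signature, not the kernel helper; it only shows the idea of mapping a 64-bit count onto the range 0..INT_MAX that user space can observe.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Clamp a 64-bit overrun count to the range user space can see.
 * INT_MAX doubles as the "value was clamped" indicator.
 */
static int clamp_overrun(int64_t overrun)
{
	if (overrun < 0)
		return 0;
	if (overrun > (int64_t)INT_MAX)
		return INT_MAX;
	return (int)overrun;
}
```

A plain `(int)` cast of the same 64-bit value would keep only the low bits, which is how the reported "random" overrun values arise.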
     
  • [ Upstream commit 6fec64e1c92d5c715c6d0f50786daa7708266bde ]

    The posix timer ti_overrun handling is broken because the forwarding
    functions can return a huge number of overruns which does not fit in an
    int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
    into random number generators.

    As a first step to address that let the timer_forward() callbacks return
    the full 64 bit value.

    Cast it to (int) temporarily until k_itimer::ti_overrun is converted to
    64bit and the conversion to user space visible values is sanitized.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • [ Upstream commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef ]

    Air Icy reported:

    UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
    signed integer overflow:
    1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
    Call Trace:
    alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
    __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
    __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
    __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
    do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

    alarm_timer_nsleep() uses ktime_add() to add the current time and the
    relative expiry value. ktime_add() has no sanity checks so the addition
    can overflow when the relative timeout is large enough.

    Use ktime_add_safe() which has the necessary sanity checks in place and
    limits the result to the valid range.

    Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
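
The difference between an unchecked and a range-limited addition can be sketched as follows. `add_ns_saturating()` is a hypothetical userspace helper on `int64_t` nanosecond values, assuming that the valid range tops out at `INT64_MAX`; it is not the kernel's ktime_add_safe().

```c
#include <stdint.h>

/*
 * Add two signed 64-bit nanosecond values without ever overflowing.
 * The result saturates at the ends of the int64_t range instead of
 * invoking undefined behaviour.
 */
static int64_t add_ns_saturating(int64_t lhs, int64_t rhs)
{
	if (rhs > 0 && lhs > INT64_MAX - rhs)
		return INT64_MAX;
	if (rhs < 0 && lhs < INT64_MIN - rhs)
		return INT64_MIN;
	return lhs + rhs;
}
```

Fed the two operands from the UBSAN report above, the helper returns `INT64_MAX` rather than overflowing.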
     

29 Sep, 2018

1 commit

  • Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
    inline softirq") got backported to stable trees and now causes the NOHZ
    softirq pending warning to trigger. It's not an upstream issue as the NOHZ
    update logic has been changed there.

    The problem is when a softirq disabled section gets interrupted and on
    return from interrupt the tick/nohz state is evaluated, which then can
    observe pending soft interrupts. These soft interrupts are legitimately
    pending because they cannot be processed as long as soft interrupts are
    disabled and the interrupted code will correctly process them when soft
    interrupts are reenabled.

    Add a check for softirqs disabled to the pending check to prevent the
    warning.

    Reported-by: Grygorii Strashko
    Reported-by: John Crispin
    Signed-off-by: Thomas Gleixner
    Tested-by: Grygorii Strashko
    Tested-by: John Crispin
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 2d898915ccf4838c ("nohz: Fix missing tick reprogram when interrupting an inline softirq")
    Acked-by: Frederic Weisbecker
    Tested-by: Geert Uytterhoeven

    Thomas Gleixner
     

20 Sep, 2018

1 commit

  • [ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

    timer_base::must_forward_clk is indicating that the base clock might be
    stale due to a long idle sleep.

    The forwarding of the base clock takes place in the timer softirq or when a
    timer is enqueued to a base which is idle. If the enqueue of timer to an
    idle base happens from a remote CPU, then the following race can happen:

    CPU0                            CPU1
    run_timer_softirq               mod_timer

                                    base = lock_timer_base(timer);
    base->must_forward_clk = false
                                    if (base->must_forward_clk)
                                            forward(base); -> skipped

                                    enqueue_timer(base, timer, idx);
                                    -> idx is calculated high due to
                                       stale base
                                    unlock_timer_base(timer);
    base = lock_timer_base(timer);
    forward(base);

    The root cause is that timer_base::must_forward_clk is cleared outside the
    timer_base::lock held region, so the remote queuing CPU observes it as
    cleared, but the base clock is still stale. This can cause large
    granularity values for timers, i.e. the accuracy of the expiry time
    suffers.

    Prevent this by clearing the flag with timer_base::lock held, so that the
    forwarding takes place before the cleared flag is observable by a remote
    CPU.

    Signed-off-by: Gaurav Kohli
    Signed-off-by: Thomas Gleixner
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Gaurav Kohli
     

09 Aug, 2018

1 commit

  • commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.

    local_timer_softirq_pending() checks whether the timer softirq is
    pending with: local_softirq_pending() & TIMER_SOFTIRQ.

    This is wrong because TIMER_SOFTIRQ is the softirq number and not a
    bitmask. So the test checks for the wrong bit.

    Use BIT(TIMER_SOFTIRQ) instead.

    Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Paul E. McKenney
    Reviewed-by: Daniel Bristot de Oliveira
    Acked-by: Frederic Weisbecker
    Cc: bigeasy@linutronix.de
    Cc: peterz@infradead.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Anna-Maria Gleixner
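
The number-versus-bitmask mix-up described in this commit message is easy to reproduce in isolation. The sketch below uses a local enum and a local `BIT()` macro; the function names are hypothetical, and only the first three softirq numbers are listed.

```c
#include <stdbool.h>

/* Softirq numbers are bit positions, not masks. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ = 1, NET_TX_SOFTIRQ = 2 };

#define BIT(nr) (1UL << (nr))

/* Buggy test: TIMER_SOFTIRQ is 1, so this looks at bit 0 (HI_SOFTIRQ). */
static bool timer_softirq_pending_buggy(unsigned long pending)
{
	return (pending & TIMER_SOFTIRQ) != 0;
}

/* Fixed test: convert the softirq number to its bitmask first. */
static bool timer_softirq_pending_fixed(unsigned long pending)
{
	return (pending & BIT(TIMER_SOFTIRQ)) != 0;
}
```

When only the timer softirq is pending, the buggy variant reports nothing pending. When only the high-priority softirq is pending, it reports a false positive.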
     

22 Jul, 2018

1 commit

  • commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream.

    A number of places rely on list_empty(&cs->wd_list), however the
    list_head does not get initialized. Do so upon registration, such that
    thereafter it is possible to rely on list_empty() correctly reflecting
    the list membership status.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Diego Viola
    Reviewed-by: Rafael J. Wysocki
    Cc: stable@vger.kernel.org
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: rui.zhang@intel.com
    Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
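
The following sketch shows why list_empty() cannot be trusted on a list head that was never initialized. It is a minimal userspace re-implementation of a circular doubly linked list head; `init_list_head()` is a local stand-in written for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, empty head points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Emptiness is defined as "next points back at the head". */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

A zero-filled head has `next == NULL`, which is not the head's own address, so `list_empty()` reports the list as non-empty until the head is initialized.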
     

03 Jul, 2018

1 commit

  • commit abcbcb80cd09cd40f2089d912764e315459b71f7 upstream.

    For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of
    1000, jiffies_to_msecs() never returns zero when passed a non-zero time
    period.

    However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or
    1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero
    for small non-zero time periods. This may break code that relies on
    receiving back a non-zero value.

    jiffies_to_usecs() does not need such a fix: one jiffy can only be less
    than one µs if HZ > 1000000, and such large values of HZ are already
    rejected at build time, twice:

    - include/linux/jiffies.h does #error if HZ >= 12288,
    - kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC).

    Broken since forever.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Arnd Bergmann
    Cc: John Stultz
    Cc: Stephen Boyd
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
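
The rounding problem described in this commit message can be shown with plain integer arithmetic. This is a simplified sketch: both helpers are hypothetical and take HZ as a run-time parameter, whereas the kernel works with compile-time constants and precomputed multipliers.

```c
#include <stdint.h>

#define MSEC_PER_SEC 1000U

/* Truncating conversion: one jiffy at HZ=1024 is 0.976 ms and becomes 0. */
static unsigned int jiffies_to_msecs_truncating(unsigned long j, unsigned int hz)
{
	/* Assumes j is small enough that j * 1000 fits in 64 bits. */
	return (unsigned int)(((uint64_t)j * MSEC_PER_SEC) / hz);
}

/* Rounding up guarantees a non-zero result for any non-zero input. */
static unsigned int jiffies_to_msecs_round_up(unsigned long j, unsigned int hz)
{
	return (unsigned int)(((uint64_t)j * MSEC_PER_SEC + hz - 1) / hz);
}
```

For one jiffy at HZ=1024 or HZ=1200, the truncating version returns 0 and the rounding version returns 1. Exact multiples such as 1024 jiffies at HZ=1024 still convert to 1000 ms.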
     

23 May, 2018

1 commit

  • commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.

    for_each_cpu() unintuitively reports CPU0 as set independent of the actual
    cpumask content on UP kernels. This causes an unexpected PIT interrupt
    storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
    a result, the virtual machine can suffer from a strange random delay of 1~20
    minutes during boot-up, and sometimes it can hang forever.

    Protect against it by checking whether the cpumask is empty before
    entering the for_each_cpu() loop.

    [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]

    Signed-off-by: Dexuan Cui
    Signed-off-by: Thomas Gleixner
    Cc: Josh Poulson
    Cc: "Michael Kelley (EOSG)"
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: stable@vger.kernel.org
    Cc: Rakib Mullick
    Cc: Jork Loeser
    Cc: Greg Kroah-Hartman
    Cc: Andrew Morton
    Cc: KY Srinivasan
    Cc: Linus Torvalds
    Cc: Alexey Dobriyan
    Cc: Dmitry Vyukov
    Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
    Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
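
The UP behaviour described in this commit message can be modelled in a few lines. Everything here is a hypothetical userspace model: `struct cpumask` is reduced to one word, and `for_each_cpu_up()` imitates an iterator that visits CPU 0 once without looking at the mask.

```c
#include <stdbool.h>

/* Hypothetical single-word cpumask for the sketch. */
struct cpumask {
	unsigned long bits;
};

/* UP-style iterator: runs the body once for CPU 0 and ignores the mask. */
#define for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

static bool cpumask_empty(const struct cpumask *mask)
{
	return mask->bits == 0;
}

/* Without a guard, the loop body runs even for an empty mask. */
static int count_visits_unguarded(const struct cpumask *mask)
{
	int cpu, visits = 0;

	for_each_cpu_up(cpu, mask)
		visits++;
	return visits;
}

/* With the guard, an empty mask results in no iterations. */
static int count_visits_guarded(const struct cpumask *mask)
{
	int cpu, visits = 0;

	if (cpumask_empty(mask))
		return 0;
	for_each_cpu_up(cpu, mask)
		visits++;
	return visits;
}
```

The guarded variant matches what a caller would expect from an SMP iterator: nothing happens for an empty mask.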
     

02 May, 2018

1 commit

  • commit 1f71addd34f4c442bec7d7c749acc1beb58126f2 upstream.

    Kaike reported that in tests rdma hrtimers occasionally stopped working. He
    did great debugging, which provided enough context to decode the problem.

    CPU 3                                     CPU 2

    idle
    start sched_timer expires = 712171000000
     queue->next = sched_timer
                                              start rdmavt timer. expires = 712172915662
                                              lock(baseof(CPU3))
    tick_nohz_stop_tick()
    tick = 716767000000                       timerqueue_add(tmr)

    hrtimer_set_expires(sched_timer, tick);
      sched_timer->expires = 716767000000
                                              if (tmr->expires < queue->next->expires)
    hrtimer_start(sched_timer)                        queue->next = tmr;
    lock(baseof(CPU3))
                                              unlock(baseof(CPU3))
    timerqueue_remove()
    timerqueue_add()

    ts->sched_timer is queued and queue->next is pointing to it, but then
    ts->sched_timer.expires is modified.

    This not only corrupts the ordering of the timerqueue RB tree, it also
    makes CPU2 see the new expiry time of timerqueue->next->expires when
    checking whether timerqueue->next needs to be updated. So CPU2 sees that
    the rdma timer is earlier than timerqueue->next and sets the rdma timer as
    new next.

    Depending on whether it had also seen the new time at RB tree enqueue, it
    might have queued the rdma timer at the wrong place and then after removing
    the sched_timer the RB tree is completely hosed.

    The problem was introduced with a commit which tried to solve inconsistency
    between the hrtimer in the tick_sched data and the underlying hardware
    clockevent. It split out hrtimer_set_expires() to store the new tick time
    in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
    the NOHZ + HIGHRES case the hrtimer might still be queued.

    Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case which sets
    timer->expires after canceling the timer and move the hrtimer_set_expires()
    invocation into the NOHZ only code path which is not affected as it merely
    uses the hrtimer as next event storage so code paths can be shared with
    the NOHZ + HIGHRES case.

    Fixes: d4af6d933ccf ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
    Reported-by: "Wan Kaike"
    Signed-off-by: Thomas Gleixner
    Acked-by: Frederic Weisbecker
    Cc: "Marciniszyn Mike"
    Cc: Anna-Maria Gleixner
    Cc: linux-rdma@vger.kernel.org
    Cc: "Dalessandro Dennis"
    Cc: "Fleck John"
    Cc: stable@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: "Weiny Ira"
    Cc: "linux-rdma@vger.kernel.org"
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 Apr, 2018

1 commit

  • commit bd03143007eb9b03a7f2316c677780561b68ba2a upstream.

    syzbot reported the following debugobjects splat:

    ODEBUG: object is on stack, but not annotated
    WARNING: CPU: 0 PID: 4185 at lib/debugobjects.c:328

    RIP: 0010:debug_object_is_on_stack lib/debugobjects.c:327 [inline]
    debug_object_init+0x17/0x20 lib/debugobjects.c:391
    debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
    debug_init kernel/time/hrtimer.c:458 [inline]
    hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1259
    alarm_init kernel/time/alarmtimer.c:339 [inline]
    alarm_timer_nsleep+0x164/0x4d0 kernel/time/alarmtimer.c:787
    SYSC_clock_nanosleep kernel/time/posix-timers.c:1226 [inline]
    SyS_clock_nanosleep+0x235/0x330 kernel/time/posix-timers.c:1204
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    This happens because the hrtimer for the alarm nanosleep is on stack, but
    the code does not use the proper debug objects initialization.

    Split out the code for the allocated use cases and invoke
    hrtimer_init_on_stack() for the nanosleep related functions.

    Reported-by: syzbot+a3e0726462b2e346a31d@syzkaller.appspotmail.com
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: syzkaller-bugs@googlegroups.com
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1803261528270.1585@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Mar, 2018

1 commit

  • commit 19b558db12f9f4e45a22012bae7b4783e62224da upstream.

    The clockid argument of clockid_to_kclock() comes straight from user space
    via various syscalls and is used as index into the posix_clocks array.

    Protect it against spectre v1 array out of bounds speculation. Remove the
    redundant check for !posix_clock[id] as this is another source for
    speculation and does not provide any advantage over the return
    posix_clock[id] path which returns NULL in that case anyway.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Dan Williams
    Cc: Rasmus Villemoes
    Cc: Greg KH
    Cc: stable@vger.kernel.org
    Cc: Linus Torvalds
    Cc: David Woodhouse
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802151718320.1296@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
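
The bounds-check-plus-clamp pattern described in this commit message can be sketched in userspace C. The mask computation below is modelled on the branch-free generic approach; all names (`index_mask_nospec()`, `index_nospec()`, `clock_lookup()`, `struct k_clock_stub`) are hypothetical. It assumes index and size are both at most LONG_MAX and that right-shifting a negative `long` is arithmetic, as it is with GCC and Clang. A plain C sketch also cannot stop a compiler from reintroducing a branch, which is why real implementations may rely on architecture-specific code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, computed without a
 * conditional branch on index.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG - 1));
}

/* Clamp index to 0 when it is out of bounds. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Hypothetical placeholder for a clock descriptor. */
struct k_clock_stub {
	int id;
};

/*
 * Lookup in the style the commit describes: one architectural bounds
 * check, a clamped index for the load, and no separate NULL test since
 * an empty slot already yields NULL.
 */
static const struct k_clock_stub *
clock_lookup(const struct k_clock_stub *const table[], unsigned long nr,
	     unsigned long id)
{
	if (id >= nr)
		return NULL;
	return table[index_nospec(id, nr)];
}
```

An in-range index passes through unchanged. An out-of-range index is forced to 0, so even a mispredicted bounds check cannot load from beyond the table.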
     

09 Mar, 2018

1 commit

  • commit c52232a49e203a65a6e1a670cd5262f59e9364a0 upstream.

    On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
    live CPU. This happens from the control thread which initiated the unplug.

    If the CPU on which the control thread runs came out from a longer idle
    period then the base clock of that CPU might be stale because the control
    thread runs prior to any event which forwards the clock.

    In such a case the timers from the unplugged CPU are queued on the live CPU
    based on the stale clock which can cause large delays due to increased
    granularity of the outer timer wheels which are far away from base::clk.

    But there is a worse problem than that. The following sequence of events
    illustrates it:

    - CPU0 timer1 is queued expires = 59969 and base->clk = 59131.

    The timer is queued at wheel level 2, with resulting expiry time = 60032
    (due to level granularity).

    - CPU1 enters idle @60007, with next timer expiry @60020.

    - CPU0 is hot-unplugged @60009

    - CPU1 exits idle and runs the control thread which migrates the
    timers from CPU0

    timer1 is now queued in level 0 for immediate handling in the next
    softirq because the requested expiry time 59969 is before CPU1 base->clk
    60007

    - CPU1 runs code which forwards the base clock which succeeds because the
    next expiring timer, which was collected at idle entry time, is still set
    to 60020.

    So it forwards beyond 60007 and therefore misses to expire the migrated
    timer1. That timer gets expired when the wheel wraps around again, which
    takes between 63 and 630ms depending on the HZ setting.

    Address both problems by invoking forward_timer_base() for the control CPUs
    timer base. All other places, which might run into a similar problem
    (mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
    that.

    [ tglx: Massaged comment and changelog ]

    Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
    Co-developed-by: Neeraj Upadhyay
    Signed-off-by: Neeraj Upadhyay
    Signed-off-by: Lingutla Chandrasekhar
    Signed-off-by: Thomas Gleixner
    Cc: Anna-Maria Gleixner
    Cc: linux-arm-msm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
    Signed-off-by: Greg Kroah-Hartman

    Lingutla Chandrasekhar
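
The numbers in the first step of the sequence above (expires = 59969, base->clk = 59131, level 2, effective expiry 60032) can be reproduced with a short calculation. This sketch assumes the level layout of the non-cascading wheel: 64 buckets per level, each level 8 times coarser than the previous one, and an expiry rounded up to the level granularity. `LVL_DEPTH`, `wheel_level()` and `effective_expiry()` are illustrative choices, and timers that are already expired are not handled.

```c
#define LVL_CLK_SHIFT	3
#define LVL_BITS	6
#define LVL_SIZE	(1UL << LVL_BITS)
#define LVL_SHIFT(n)	((n) * LVL_CLK_SHIFT)
#define LVL_GRAN(n)	(1UL << LVL_SHIFT(n))
/* First delta that no longer fits into level n - 1. */
#define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
#define LVL_DEPTH	8	/* assumed number of levels for this sketch */

/* Pick the wheel level from the distance between expiry and base clock. */
static unsigned int wheel_level(unsigned long expires, unsigned long clk)
{
	unsigned long delta = expires - clk; /* assumes expires >= clk */
	unsigned int lvl;

	for (lvl = 0; lvl < LVL_DEPTH - 1; lvl++) {
		if (delta < LVL_START(lvl + 1))
			break;
	}
	return lvl;
}

/* Round the expiry up to the granularity of the chosen level. */
static unsigned long effective_expiry(unsigned long expires, unsigned long clk)
{
	unsigned int lvl = wheel_level(expires, clk);

	return ((expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl)) << LVL_SHIFT(lvl);
}
```

Here delta is 838, which lies between LVL_START(2) = 504 and LVL_START(3) = 4032, so the timer lands in level 2 with a granularity of 64 jiffies. (59969 + 64) >> 6 is 938, and 938 << 6 is 60032.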
     

03 Mar, 2018

1 commit

  • commit 48d0c9becc7f3c66874c100c126459a9da0fdced upstream.

    The POSIX specification defines that relative CLOCK_REALTIME timers are not
    affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
    to ensure POSIX compliance.

    The introduction of the additional HRTIMER_MODE_PINNED mode broke this
    requirement for pinned timers.

    There is no user space visible impact because user space timers are not
    using pinned mode, but for consistency reasons this needs to be fixed.

    Check whether the mode has the HRTIMER_MODE_REL bit set instead of
    comparing with HRTIMER_MODE_ABS.

    Signed-off-by: Anna-Maria Gleixner
    Cc: Christoph Hellwig
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: keescook@chromium.org
    Fixes: 597d0275736d ("timers: Framework for identifying pinned timers")
    Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
    Signed-off-by: Ingo Molnar
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Anna-Maria Gleixner
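
The effect of comparing against HRTIMER_MODE_ABS instead of testing the HRTIMER_MODE_REL bit can be seen with the mode values alone. The enum below lists the values assumed for this sketch (absolute 0x0, relative 0x1, pinned 0x2); the two helper functions are hypothetical.

```c
#include <stdbool.h>

/* Mode values assumed for the sketch. */
enum hrtimer_mode {
	HRTIMER_MODE_ABS	= 0x0,
	HRTIMER_MODE_REL	= 0x1,
	HRTIMER_MODE_PINNED	= 0x2,
	HRTIMER_MODE_ABS_PINNED	= 0x2,
	HRTIMER_MODE_REL_PINNED	= 0x3,
};

/* Old check: anything that is not exactly ABS counts as relative. */
static bool mode_is_relative_buggy(enum hrtimer_mode mode)
{
	return mode != HRTIMER_MODE_ABS;
}

/* New check: only the REL bit decides, the PINNED bit is ignored. */
static bool mode_is_relative_fixed(enum hrtimer_mode mode)
{
	return (mode & HRTIMER_MODE_REL) != 0;
}
```

The old check misclassifies HRTIMER_MODE_ABS_PINNED as relative, so an absolute pinned CLOCK_REALTIME timer would be moved to CLOCK_MONOTONIC. Testing the bit classifies all four combinations correctly.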
     

31 Jan, 2018

1 commit

  • commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.

    The hrtimer interrupt code contains a hang detection and mitigation
    mechanism, which prevents a long delayed hrtimer interrupt from causing a
    continuous retriggering of interrupts which prevents the system from making
    progress. If a hang is detected then the timer hardware is programmed with
    a certain delay into the future and a flag is set in the hrtimer cpu base
    which prevents newly enqueued timers from reprogramming the timer hardware
    prior to the chosen delay. The subsequent hrtimer interrupt after the delay
    clears the flag and resumes normal operation.

    If such a hang happens in the last hrtimer interrupt before a CPU is
    unplugged then the hang_detected flag is set and stays that way when the
    CPU is plugged in again. At that point the timer hardware is not armed and
    it cannot be armed because the hang_detected flag is still active, so
    nothing clears that flag. As a consequence the CPU does not receive hrtimer
    interrupts and no timers expire on that CPU which results in RCU stalls and
    other malfunctions.

    Clear the flag along with some other less critical members of the hrtimer
    cpu base to ensure starting from a clean state when a CPU is plugged in.

    Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
    root cause of that hard to reproduce heisenbug. Once understood it's
    trivial and certainly justifies a brown paperbag.

    Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
    Reported-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Anna-Maria Gleixner
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 Jan, 2018

1 commit

  • commit ed4bbf7910b28ce3c691aef28d245585eaabda06 upstream.

    When the timer base is checked for expired timers then the deferrable base
    must be checked as well. This was missed when making the deferrable base
    independent of base::nohz_active.

    Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active")
    Signed-off-by: Thomas Gleixner
    Cc: Anna-Maria Gleixner
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Paul McKenney
    Cc: rt@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

03 Jan, 2018

5 commits

  • commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.

    The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
    subsequently invokes tick_nohz_stop_sched_tick() are:

    if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

    If need_resched() is not set, but a timer softirq is pending then this is
    an indication that the softirq code punted and delegated the execution to
    softirqd. need_resched() is not true because the current interrupted task
    takes precedence over softirqd.

    Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
    timer interrupts because the timer wheel contains an expired timer, but
    softirqs are not yet executed. So it returns an immediate expiry request,
    which causes the timer to fire immediately again. Lather, rinse and
    repeat....

    Prevent that by adding a check for a pending timer soft interrupt to the
    conditions in tick_nohz_stop_sched_tick() which avoid calling
    get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
    prevents a repetitive programming of an already expired timer.

    Reported-by: Sebastian Siewior
    Signed-off-by: Thomas Gleixner
    Acked-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul McKenney
    Cc: Anna-Maria Gleixner
    Cc: Sebastian Siewior
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream.

    The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
    them with a potentially stale clk and next_expiry value, which can cause
    trouble when the CPU is plugged.

    Add a prepare callback which forwards the clock, sets next_expiry to far in
    the future and resets the control flags to a known state.

    Set base->must_forward_clk so the first timer which is queued will try to
    forward the clock to current jiffies.

    Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
    Reported-by: Paul E. McKenney
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Sebastian Siewior
    Cc: Anna-Maria Gleixner
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit fd45bb77ad682be728d1002431d77b8c73342836 upstream.

    The timer start debug function is called before the proper timer base is
    set. As a consequence the trace data contains the stale CPU and flags
    values.

    Call the debug function after setting the new base and flags.

    Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Sebastian Siewior
    Cc: rt@linutronix.de
    Cc: Paul McKenney
    Cc: Anna-Maria Gleixner
    Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream.

    During boot and before base::nohz_active is set in the timer bases, deferrable
    timers are enqueued into the standard timer base. This works correctly as
    long as base::nohz_active is false.

    Once base::nohz_active is set and a timer which was enqueued before that
    is accessed, the lock selector code chooses the lock of the deferred
    base. This causes unlocked access to the standard base and in case the
    timer is removed it does not clear the pending flag in the standard base
    bitmap which causes get_next_timer_interrupt() to return bogus values.

    To prevent that, the deferrable timers must be enqueued in the deferrable
    base, even when base::nohz_active is not set. Those deferrable timers also
    need to be expired unconditionally.

    Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: rt@linutronix.de
    Cc: Paul McKenney
    Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Anna-Maria Gleixner
     
  • commit 466a2b42d67644447a1765276259a3ea5531ddff upstream.

    Since the recent remote cpufreq callback work, it's possible that a cpufreq
    update is triggered from a remote CPU. For single policies however, the current
    code uses the local CPU when trying to determine if the remote sg_cpu entered
    idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
    idle_calls counter of the remote CPU.

    Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
    Acked-by: Viresh Kumar
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Joel Fernandes
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Joel Fernandes
     

20 Dec, 2017

1 commit

  • commit cef31d9af908243421258f1df35a4a644604efbe upstream.

    timer_create() specifies via sigevent->sigev_notify the signal delivery for
    the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
    and (SIGEV_SIGNAL | SIGEV_THREAD_ID).

    The sanity check in good_sigevent() is only checking the valid combination
    for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
    not set it accepts any random value.

    This has no real effects on the posix timer and signal delivery code, but
    it affects show_timer() which handles the output of /proc/$PID/timers. That
    function uses a string array to pretty print sigev_notify. The access to
    that array has no bound checks, so random sigev_notify cause access beyond
    the array bounds.

    Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
    masking from various code paths as SIGEV_NONE can never be set in
    combination with SIGEV_THREAD_ID.

    Reported-by: Eric Biggers
    Reported-by: Dmitry Vyukov
    Reported-by: Alexey Dobriyan
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
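
The validation described in this commit message, together with the array lookup it protects, can be sketched as follows. The `EX_SIGEV_*` constants are local copies of the Linux values, prefixed to avoid clashing with <signal.h>. `notify_is_valid()` and `notify_name()` are hypothetical helpers, not the kernel functions.

```c
/* Local copies of the Linux sigev_notify values (assumption of the sketch). */
#define EX_SIGEV_SIGNAL		0
#define EX_SIGEV_NONE		1
#define EX_SIGEV_THREAD		2
#define EX_SIGEV_THREAD_ID	4

/* Accept exactly the four notify modes listed in the commit message. */
static int notify_is_valid(int notify)
{
	switch (notify) {
	case EX_SIGEV_NONE:
	case EX_SIGEV_SIGNAL:
	case EX_SIGEV_THREAD:
	case EX_SIGEV_SIGNAL | EX_SIGEV_THREAD_ID:
		return 1;
	default:
		return 0;
	}
}

/* Pretty-print a notify mode; invalid values never index the array. */
static const char *notify_name(int notify)
{
	static const char *const names[] = {
		[EX_SIGEV_SIGNAL] = "signal",
		[EX_SIGEV_NONE]   = "none",
		[EX_SIGEV_THREAD] = "thread",
	};

	if (!notify_is_valid(notify))
		return "invalid";
	return names[notify & ~EX_SIGEV_THREAD_ID];
}
```

Once creation rejects every other value, a later table lookup can only see the indices 0, 1 and 2.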
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Sep, 2017

1 commit

  • Collection of aesthetic adjustments to various PPS-related files,
    directories and Documentation, some quite minor just for the sake of
    consistency, including:

    * Updated example of pps device tree node (courtesy Rodolfo G.)
    * "PPS-API" -> "PPS API"
    * "pps_source_info_s" -> "pps_source_info"
    * "ktimer driver" -> "pps-ktimer driver"
    * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
    * Add missing PPS-related entries to MAINTAINERS file
    * Other trivialities

    Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain
    Signed-off-by: Robert P. J. Day
    Acked-by: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

06 Sep, 2017

1 commit

  • Pull power management updates from Rafael Wysocki:
    "This time (again) cpufreq gets the majority of changes which mostly
    are driver updates (including a major consolidation of intel_pstate),
    some schedutil governor modifications and core cleanups.

    There also are some changes in the system suspend area, mostly related
    to diagnostics and debug messages plus some renames of things related
    to suspend-to-idle. One major change here is that suspend-to-idle is
    now going to be preferred over S3 on systems where the ACPI tables
    indicate to do so and provide requisite support (the Low Power Idle S0
    _DSM in particular). The system sleep documentation and the tools
    related to it are updated too.

    The rest is a few cpuidle changes (nothing major), devfreq updates,
    generic power domains (genpd) framework updates and a few assorted
    modifications elsewhere.

    Specifics:

    - Drop the P-state selection algorithm based on a PID controller from
    intel_pstate and make it use the same P-state selection method
    (based on the CPU load) for all types of systems in the active mode
    (Rafael Wysocki, Srinivas Pandruvada).

    - Rework the cpufreq core and governors to make it possible to take
    cross-CPU utilization updates into account and modify the schedutil
    governor to actually do so (Viresh Kumar).

    - Clean up the handling of transition latency information in the
    cpufreq core and untangle it from the information on which drivers
    cannot do dynamic frequency switching (Viresh Kumar).

    - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
    cpufreq driver and update its DT bindings (Sean Wang).

    - Modify the cpufreq dt-platdev driver to automatically create
    cpufreq devices for the new (v2) Operating Performance Points (OPP)
    DT bindings and update its whitelist of supported systems (Viresh
    Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
    Xiao).

    - Add support for Ux500 to the cpufreq-dt driver and drop the
    obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).

    - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
    Nguyen).

    - Fix and clean up assorted issues in the cpufreq drivers and core
    (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
    Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).

    - Update the IO-wait boost handling in the schedutil governor to make
    it less aggressive (Joel Fernandes).

    - Rework system suspend diagnostics to make it print fewer messages
    to the kernel log by default, add a sysfs knob to allow more
    suspend-related messages to be printed and add Low Power S0 Idle
    constraints checks to the ACPI suspend-to-idle code (Rafael
    Wysocki, Srinivas Pandruvada).

    - Prefer suspend-to-idle over S3 on ACPI-based systems with the
    ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
    interface present in the ACPI tables (Rafael Wysocki).

    - Update documentation related to system sleep and rename a number of
    items in the code to make it clearer that they are related to
    suspend-to-idle (Rafael Wysocki).

    - Export a variable allowing device drivers to check the target
    system sleep state from the core system suspend code (Florian
    Fainelli).

    - Clean up the cpuidle subsystem to handle the polling state on x86
    in a more straightforward way and to use %pOF instead of full_name
    (Rafael Wysocki, Rob Herring).

    - Update the devfreq framework to fix and clean up a few minor issues
    (Chanwoo Choi, Rob Herring).

    - Extend diagnostics in the generic power domains (genpd) framework
    and clean it up slightly (Thara Gopinath, Rob Herring).

    - Fix and clean up a couple of issues in the operating performance
    points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).

    - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
    (AVS) driver (David Wu).

    - Fix the usage of notifiers in CPU power management on some
    platforms (Alex Shi).

    - Update the pm-graph system suspend/hibernation and boot profiling
    utility (Todd Brandt).

    - Make it possible to run the cpupower utility without CPU0 (Prarit
    Bhargava)"

    * tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
    cpuidle: Make drivers initialize polling state
    cpuidle: Move polling state initialization code to separate file
    cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
    cpufreq: imx6q: Fix imx6sx low frequency support
    cpufreq: speedstep-lib: make several arrays static, makes code smaller
    PM: docs: Delete the obsolete states.txt document
    PM: docs: Describe high-level PM strategies and sleep states
    PM / devfreq: Fix memory leak when fail to register device
    PM / devfreq: Add dependency on PM_OPP
    PM / devfreq: Move private devfreq_update_stats() into devfreq
    PM / devfreq: Convert to using %pOF instead of full_name
    PM / AVS: rockchip-io: add io selectors and supplies for RV1108
    cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
    cpufreq: dt-platdev: Drop few entries from whitelist
    cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
    ARM: ux500: don't select CPUFREQ_DT
    cpuidle: Convert to using %pOF instead of full_name
    cpufreq: Convert to using %pOF instead of full_name
    PM / Domains: Convert to using %pOF instead of full_name
    cpufreq: Cap the default transition delay value to 10 ms
    ...

    Linus Torvalds
     

05 Sep, 2017

1 commit

  • Pull timer fixes from Thomas Gleixner:
    "A rather small update for the time(r) subsystem:

    - A new clocksource driver IMX-TPM

    - Minor fixes to the alarmtimer facility

    - Device tree cleanups for Renesas drivers

    - A new kselftest and fixes for the timer related tests

    - Conversion of the clocksource drivers to use %pOF

    - Use the proper helpers to access rlimits in the posix-cpu-timer
    code"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    alarmtimer: Ensure RTC module is not unloaded
    clocksource: Convert to using %pOF instead of full_name
    clocksource/drivers/bcm2835: Remove message for a memory allocation failure
    devicetree: bindings: Remove deprecated properties
    devicetree: bindings: Remove unused 32-bit CMT bindings
    devicetree: bindings: Deprecate property, update example
    devicetree: bindings: r8a73a4 and R-Car Gen2 CMT bindings
    devicetree: bindings: R-Car Gen2 CMT0 and CMT1 bindings
    devicetree: bindings: Remove sh7372 CMT binding
    clocksource/drivers/imx-tpm: Add imx tpm timer support
    dt-bindings: timer: Add nxp tpm timer binding doc
    posix-cpu-timers: Use dedicated helper to access rlimit values
    alarmtimer: Fix unavailable wake-up source in sysfs
    timekeeping: Use proper timekeeper for debug code
    kselftests: timers: set-timer-lat: Add one-shot timer test cases
    kselftests: timers: set-timer-lat: Tweak reporting when timer fires early
    kselftests: timers: freq-step: Fix build warning
    kselftests: timers: freq-step: Define ADJ_SETOFFSET if device has older kernel headers

    Linus Torvalds
     

04 Sep, 2017

1 commit

  • * pm-sleep:
    ACPI / PM: Check low power idle constraints for debug only
    PM / s2idle: Rename platform operations structure
    PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
    PM / s2idle: Rename freeze_state enum and related items
    PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
    ACPI / PM: Prefer suspend-to-idle over S3 on some systems
    platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
    PM / suspend: Define pr_fmt() in suspend.c
    PM / suspend: Use mem_sleep_labels[] strings in messages
    PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
    PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
    PM / core: Add error argument to dpm_show_time()
    PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
    PM / s2idle: Rearrange the main suspend-to-idle loop
    PM / timekeeping: Print debug messages when requested
    PM / sleep: Mark suspend/hibernation start and finish
    PM / sleep: Do not print debug messages by default
    PM / suspend: Export pm_suspend_target_state

    Rafael J. Wysocki
     

01 Sep, 2017

1 commit

  • When registering the rtc device to be used to handle alarm timers,
    get_device is used to ensure the device doesn't go away but the module can
    still be unloaded.

    Call try_module_get to ensure the rtc driver will not go away.

    Reported-and-tested-by: Michal Simek
    Signed-off-by: Alexandre Belloni
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Stephen Boyd
    Link: http://lkml.kernel.org/r/20170820220146.30969-1-alexandre.belloni@free-electrons.com

    Alexandre Belloni
     

26 Aug, 2017

1 commit

  • In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
    handling"), the following code got mistakenly added to the update of the
    raw timekeeper:

    /* Update the monotonic raw base */
    seconds = tk->raw_sec;
    nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
    tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

    Which adds the raw_sec value and the shifted down raw xtime_nsec to the
    base value.

    But the read function adds the shifted down tk->tkr_raw.xtime_nsec value
    another time. The result is that ktime_get_raw() users (which are all
    internal users) see the raw time move faster than it should (the rate of
    which can vary with the current size of tkr_raw.xtime_nsec), which has
    resulted in at least problems with graphics rendering performance.

    The change tried to match the monotonic base update logic:

    seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
    nsec = (u32) tk->wall_to_monotonic.tv_nsec;
    tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

    Which adds the wall_to_monotonic.tv_nsec value, but not the
    tk->tkr_mono.xtime_nsec value to the base.

    To fix this, simplify the tkr_raw.base accumulation to only accumulate the
    raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which
    will be added at read time.
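
    The double accounting can be reproduced with a small user-space model.
    This is a sketch under simplifying assumptions, not the kernel code: the
    function names are made up for illustration, and the read path leaves out
    the clocksource cycle delta that the real read function also adds.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Base as computed by the mistaken update: seconds plus shifted-down nsec. */
static uint64_t raw_base_buggy(uint64_t raw_sec, uint64_t xtime_nsec,
                               unsigned int shift)
{
    assert(shift < 64);
    return raw_sec * NSEC_PER_SEC + (xtime_nsec >> shift);
}

/* Base after the fix: whole seconds only. */
static uint64_t raw_base_fixed(uint64_t raw_sec)
{
    return raw_sec * NSEC_PER_SEC;
}

/*
 * Simplified read path: base plus the shifted-down xtime_nsec. The cycle
 * delta since the last update is deliberately omitted from this model.
 */
static uint64_t raw_read_ns(uint64_t base, uint64_t xtime_nsec,
                            unsigned int shift)
{
    assert(shift < 64);
    return base + (xtime_nsec >> shift);
}
```

    With raw_sec = 5, shift = 8 and xtime_nsec holding 250 ms (250000000 << 8),
    the buggy base makes the read return 5.5 s worth of nanoseconds, while the
    fixed base gives the expected 5.25 s.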

    Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
    Reported-and-tested-by: Chris Wilson
    Signed-off-by: John Stultz
    Signed-off-by: Thomas Gleixner
    Cc: Prarit Bhargava
    Cc: Kevin Brodsky
    Cc: Richard Cochran
    Cc: Stephen Boyd
    Cc: Will Deacon
    Cc: Miroslav Lichvar
    Cc: Daniel Mentz
    Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stultz@linaro.org

    John Stultz
     

24 Aug, 2017

1 commit

  • When a timer base is idle, it is forwarded when a new timer is added
    to ensure that granularity does not become excessive. When not idle,
    the timer tick is expected to increment the base.

    However there are several problems:

    - If an existing timer is modified, the base is forwarded only after
    the index is calculated.

    - The base is not forwarded by add_timer_on.

    - There is a window after a timer is restarted from a nohz idle, after
    it is marked not-idle and before the timer tick on this CPU, where a
    timer may be added but the ancient base does not get forwarded.

    These result in excessive granularity (a 1 jiffy timeout can blow out
    to 100s of jiffies), which causes the rcu lockup detector to trigger,
    among other things.

    Fix this by keeping track of whether the timer base has been idle
    since it was last run or forwarded, and if so then forward it before
    adding a new timer.
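
    The effect of a stale base on granularity can be sketched with a toy
    model of a hierarchical timer wheel. The constants below (64 buckets per
    level, each level 8 times coarser, 8 levels) and all names are
    assumptions made for this illustration, not a copy of the kernel code.

```c
#include <assert.h>

/* Assumed wheel shape for this model only. */
#define MODEL_LVL_SIZE   64UL
#define MODEL_CLK_SHIFT  3
#define MODEL_LVL_DEPTH  8

/* Granularity (in jiffies) of the level that a given delta falls into. */
static unsigned long wheel_granularity(unsigned long delta)
{
    unsigned long gran = 1;
    unsigned long limit = MODEL_LVL_SIZE - 1;
    int level;

    for (level = 0; level < MODEL_LVL_DEPTH - 1 && delta >= limit; level++) {
        gran <<= MODEL_CLK_SHIFT;
        limit <<= MODEL_CLK_SHIFT;
    }
    return gran;
}

/*
 * Granularity seen by a timer armed at 'now' with the given timeout when
 * the wheel clock is 'base_clk'. If forward_first is set, the base is
 * brought up to 'now' before the delta is computed.
 */
static unsigned long timer_granularity(unsigned long base_clk,
                                       unsigned long now,
                                       unsigned long timeout,
                                       int forward_first)
{
    assert(base_clk <= now);
    if (forward_first)
        base_clk = now;
    return wheel_granularity(now + timeout - base_clk);
}
```

    In this model a 1 jiffy timeout armed against a base that is 1000
    jiffies behind lands in a level with 64 jiffy granularity, whereas
    forwarding the base first keeps it at 1 jiffy.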

    There is still a case where mod_timer optimises the case of a pending
    timer mod with the same expiry time, where the timer can see excessive
    granularity relative to the new, shorter interval. A comment is added,
    but it's not changed because it is an important fastpath for
    networking.

    This has been tested and found to fix the RCU softlockup messages.

    Testing was also done with tracing to measure requested versus
    achieved wakeup latencies for all non-deferrable timers in an idle
    system (with no lockup watchdogs running). Wakeup latency relative to
    absolute latency is calculated (note this suffers from round-up skew
    at low absolute times) and analysed:

                max     avg     std
    upstream    506.0   1.20    4.68
    patched     2.0     1.08    0.15

    The bug was noticed due to the lockup detector Kconfig changes
    dropping it out of people's .configs and resulting in larger base
    clk skew. When the lockup detectors are enabled, no CPU can go idle for
    longer than 4 seconds, which limits the granularity errors.
    Sub-optimal timer behaviour is observable on a smaller scale in that
    case:

                max     avg     std
    upstream    9.0     1.05    0.19
    patched     2.0     1.04    0.11

    Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Thomas Gleixner
    Tested-by: Jonathan Cameron
    Tested-by: David Miller
    Cc: dzickus@redhat.com
    Cc: sfr@canb.auug.org.au
    Cc: mpe@ellerman.id.au
    Cc: Stephen Boyd
    Cc: linuxarm@huawei.com
    Cc: abdhalee@linux.vnet.ibm.com
    Cc: John Stultz
    Cc: akpm@linux-foundation.org
    Cc: paulmck@linux.vnet.ibm.com
    Cc: torvalds@linux-foundation.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com

    Nicholas Piggin
     

20 Aug, 2017

1 commit


18 Aug, 2017

3 commits

  • Use the rlimit() and rlimit_max() helpers instead of manually writing
    the whole chain from task to rlimit value.
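
    As a rough picture of what "the whole chain" means, the user-space
    sketch below models a task whose limits live behind a signal structure
    and wraps the lookup in small helpers. All type and function names
    carry a model_ prefix because they are illustrative stand-ins, not the
    kernel definitions.

```c
#include <assert.h>

enum { MODEL_RLIMIT_CPU, MODEL_RLIMIT_RTTIME, MODEL_RLIM_NLIMITS };

struct model_rlimit { unsigned long rlim_cur, rlim_max; };
struct model_signal { struct model_rlimit rlim[MODEL_RLIM_NLIMITS]; };
struct model_task   { struct model_signal *signal; };

/* Soft limit: hides the task -> signal -> rlim[] -> rlim_cur chain. */
static unsigned long model_task_rlimit(const struct model_task *tsk,
                                       unsigned int limit)
{
    assert(limit < MODEL_RLIM_NLIMITS);
    return tsk->signal->rlim[limit].rlim_cur;
}

/* Hard limit: same chain, ending in rlim_max. */
static unsigned long model_task_rlimit_max(const struct model_task *tsk,
                                           unsigned int limit)
{
    assert(limit < MODEL_RLIM_NLIMITS);
    return tsk->signal->rlim[limit].rlim_max;
}
```

    Callers then write model_task_rlimit(tsk, MODEL_RLIMIT_CPU) rather than
    spelling out every pointer hop at each call site.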

    Signed-off-by: Krzysztof Opasiak
    Signed-off-by: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170705172548.7911-1-k.opasiak@samsung.com

    Krzysztof Opasiak
     
  • Currently the alarmtimer registers a wake-up source unconditionally,
    regardless of the system having a (wake-up capable) RTC or not.
    Hence the alarmtimer will always show up in
    /sys/kernel/debug/wakeup_sources, even if it is not available, and thus
    cannot be a wake-up source.

    To fix this, postpone registration until a wake-up capable RTC device is
    added.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: John Stultz

    Geert Uytterhoeven
     
  • When CONFIG_DEBUG_TIMEKEEPING is enabled the timekeeping_check_update()
    function will update status like last_warning and underflow_seen on the
    timekeeper.

    If there are issues found this state is used to rate limit the warnings
    that get printed.

    This rate limiting doesn't really work if stored in real_tk as
    the shadow timekeeper is overwritten onto real_tk at the end of every
    update_wall_time() call, resetting last_warning and other statuses.

    Fix rate limiting by using the shadow_timekeeper for
    timekeeping_check_update().
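
    Why the state is lost can be shown with a minimal user-space model. It
    is a sketch only: struct model_tk and the function below are invented
    for illustration and reduce the timekeeper to two fields.

```c
#include <assert.h>
#include <string.h>

/* Two-field stand-in for a timekeeper: time state plus debug state. */
struct model_tk {
    unsigned long xtime_sec;
    unsigned long last_warning;
};

/*
 * Model one update cycle. The debug check records 'now' in either the
 * shadow copy (check_shadow != 0) or the real copy (check_shadow == 0);
 * the shadow copy is then advanced and copied over the real one.
 * Returns the last_warning value the real copy ends up with.
 */
static unsigned long last_warning_after_update(int check_shadow,
                                               unsigned long now)
{
    struct model_tk real_tk = { 0, 0 };
    struct model_tk shadow_tk = { 0, 0 };
    struct model_tk *checked = check_shadow ? &shadow_tk : &real_tk;

    assert(now != 0);
    checked->last_warning = now;      /* debug check stores its state */
    shadow_tk.xtime_sec++;            /* normal time accumulation */
    memcpy(&real_tk, &shadow_tk, sizeof(real_tk));
    return real_tk.last_warning;
}
```

    Recording into the real copy returns 0 because the copy wipes it out,
    while recording into the shadow copy lets the value survive the update.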

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Fixes: 57d05a93ada7 ("time: Rework debugging variables so they aren't global")
    Signed-off-by: Stafford Horne
    Signed-off-by: John Stultz

    Stafford Horne
     

01 Aug, 2017

1 commit

  • For example, with HZ=100, a timer 430 jiffies in the future, and a
    32-bit unsigned int, the right-hand side of the expression overflows
    unsigned int, which results in wrong values being returned.

    Type cast the multiplier to 64bit to avoid that issue.
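
    The numbers can be checked by hand: at HZ=100 one jiffy is 10,000,000 ns,
    so 430 jiffies is 4,300,000,000 ns, which is just above 2^32
    (4,294,967,296). The sketch below reproduces that in user space. The
    function names are illustrative and a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the commit message; a 32-bit unsigned int is assumed. */
#define HZ            100u
#define NSEC_PER_SEC  1000000000u
#define TICK_NSEC     (NSEC_PER_SEC / HZ)   /* 10,000,000 ns per jiffy */

static_assert(sizeof(unsigned int) == 4, "model assumes 32-bit unsigned int");

/* Buggy form: the product is computed in 32 bits and wraps before widening. */
static uint64_t jiffies_to_ns_buggy(uint32_t delta_jiffies)
{
    return delta_jiffies * TICK_NSEC;
}

/* Fixed form: widen one operand first so the multiply is done in 64 bits. */
static uint64_t jiffies_to_ns_fixed(uint32_t delta_jiffies)
{
    return (uint64_t)delta_jiffies * TICK_NSEC;
}
```

    For 430 jiffies the buggy form yields 5,032,704 ns (the product modulo
    2^32) instead of 4,300,000,000 ns. 429 jiffies still fits, so 430 is the
    first value that goes wrong at this HZ.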

    Fixes: 46c8f0b077a8 ("timers: Fix get_next_timer_interrupt() computation")
    Signed-off-by: Matija Glavinic Pecotic
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexander Sverdlin
    Cc: khilman@baylibre.com
    Cc: akpm@linux-foundation.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/a7900f04-2a21-c9fd-67be-ab334d459ee5@nokia.com

    Matija Glavinic Pecotic
     

23 Jul, 2017

1 commit

  • The messages printed by tk_debug_account_sleep_time() are basically
    useful for system sleep debugging, so print them only when the other
    debug messages from the core suspend/hibernate code are enabled.

    While at it, make it clear that the messages from
    tk_debug_account_sleep_time() are about timekeeping suspend
    duration, because in general timekeeping may be suspended and
    resumed multiple times during one system suspend-resume cycle.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

06 Jul, 2017

1 commit

  • Pull timer-related user access updates from Al Viro:
    "Continuation of timers-related stuff (there had been more, but my
    parts of that series are already merged via timers/core). This is more
    of y2038 work by Deepa Dinamani, partially disrupted by the
    unification of native and compat timers-related syscalls"

    * 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    posix_clocks: Use get_itimerspec64() and put_itimerspec64()
    timerfd: Use get_itimerspec64() and put_itimerspec64()
    nanosleep: Use get_timespec64() and put_timespec64()
    posix-timers: Use get_timespec64() and put_timespec64()
    posix-stubs: Conditionally include COMPAT_SYS_NI defines
    time: introduce {get,put}_itimerspec64
    time: add get_timespec64 and put_timespec64

    Linus Torvalds
     

04 Jul, 2017

2 commits

  • Pull timer updates from Thomas Gleixner:
    "A rather large update for timers/timekeeping:

    - compat syscall consolidation (Al Viro)

    - Posix timer consolidation (Christoph Hellwig / Thomas Gleixner)

    - Cleanup of the device tree based initialization for clockevents and
    clocksources (Daniel Lezcano)

    - Consolidation of the FTTMR010 clocksource/event driver (Linus
    Walleij)

    - The usual set of small fixes and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
    timers: Make the cpu base lock raw
    clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
    clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
    clocksource/drivers/tcb_clksrc: Make IO endian agnostic
    clocksource/drivers/sun4i: Switch to the timer-of common init
    clocksource/drivers/timer-of: Fix invalid iomap check
    Revert "ktime: Simplify ktime_compare implementation"
    clocksource/drivers: Fix uninitialized variable use in timer_of_init
    kselftests: timers: Add test for frequency step
    kselftests: timers: Fix inconsistency-check to not ignore first timestamp
    time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
    time: Clean up CLOCK_MONOTONIC_RAW time handling
    posix-cpu-timers: Make timespec to nsec conversion safe
    itimer: Make timeval to nsec conversion range limited
    timers: Fix parameter description of try_to_del_timer_sync()
    ktime: Simplify ktime_compare implementation
    clocksource/drivers/fttmr010: Factor out clock read code
    clocksource/drivers/fttmr010: Implement delay timer
    clocksource/drivers: Add timer-of common init routine
    clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
    ...

    Linus Torvalds
     
  • Pull nohz updates from Ingo Molnar:
    "The main changes in this cycle relate to fixing another bad (but
    sporadic and hard to detect) interaction between the dynticks
    scheduler tick and hrtimers, plus related improvements for better
    detection and handling of similar problems - by Frédéric Weisbecker"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    nohz: Fix spurious warning when hrtimer and clockevent get out of sync
    nohz: Fix buggy tick delay on IRQ storms
    nohz: Reset next_tick cache even when the timer has no regs
    nohz: Fix collision between tick and other hrtimers, again
    nohz: Add hrtimer sanity check

    Linus Torvalds