Eric Lee / smarc-fsl-linux-kernel

29 Oct, 2018

1 commit

3a9036886 clockevents: Retry programming min delta up to 10 times ... Browse Code »

When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
hrtimer_reprogram -> clockevents_program_event ->
clockevents_program_min_delta will not retry if the clock event driver
returns -ETIME.

If the driver could not satisfy the program_min_delta for any reason, the
lack of a retry means the CPU may not receive a tick interrupt, potentially
until the counter does a full period. This leads to rcu_sched timeout
messages as the stalled CPU is detected by other CPUs, and other issues if
the CPU is holding locks or other resources at the point at which it
stalls.

There have been a couple of observed mechanisms through which a clock event
driver could not satisfy the requested min_delta and return -ETIME.

With the MIPS GIC driver, the shared execution resource within MT cores
means inconventient latency due to execution of instructions from other
hardware threads in the core, within gic_next_event, can result in an event
being set in the past.

Additionally under virtualisation it is possible to get unexpected latency
during a clockevent device's set_next_event() callback which can make it
return -ETIME even for a delta based on min_delta_ns.

It isn't appropriate to use MIN_ADJUST in the virtualisation case as
occasional hypervisor induced high latency will cause min_delta_ns to
quickly increase to the maximum.

Instead, borrow the retry pattern from the MIN_ADJUST case, but without
making adjustments. Retry up to 10 times, each time increasing the
attempted delta by min_delta, before giving up.

[ Matt: Reworked the loop and made retry increase the delta. ]

Signed-off-by: James Hogan
Signed-off-by: Matt Redfearn
Signed-off-by: Thomas Gleixner
Cc: linux-mips@linux-mips.org
Cc: Daniel Lezcano
Cc: "Martin Schwidefsky"
Cc: James Hogan
Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com

James Hogan
2018-10-29 11:10:38 +0800

04 Oct, 2018

3 commits

3e3f075f7 posix-timers: Sanitize overrun handling ... Browse Code »

[ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ]

The posix timer overrun handling is broken because the forwarding functions
can return a huge number of overruns which does not fit in an int. As a
consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
random number generators.

The k_clock::timer_forward() callbacks return a 64 bit value now. Make
k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
accounting is correct. 3Remove the temporary (int) casts.

Add a helper function which clamps the overrun value returned to user space
via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
between 0 and INT_MAX. INT_MAX is an indicator for user space that the
overrun value has been clamped.

Reported-by: Team OWL337
Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Cc: Peter Zijlstra
Cc: Michael Kerrisk
Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-10-04 08:00:50 +0800
a05bd4ba6 posix-timers: Make forward callback return s64 ... Browse Code »

[ Upstream commit 6fec64e1c92d5c715c6d0f50786daa7708266bde ]

The posix timer ti_overrun handling is broken because the forwarding
functions can return a huge number of overruns which does not fit in an
int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
into random number generators.

As a first step to address that let the timer_forward() callbacks return
the full 64 bit value.

Cast it to (int) temporarily until k_itimer::ti_overrun is converted to
64bit and the conversion to user space visible values is sanitized.

Reported-by: Team OWL337
Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Cc: Peter Zijlstra
Cc: Michael Kerrisk
Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-10-04 08:00:50 +0800
a4dbaf7c2 alarmtimer: Prevent overflow for relative nanosleep ... Browse Code »

[ Upstream commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef ]

Air Icy reported:

UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
signed integer overflow:
1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
Call Trace:
alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
__do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
__se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
__x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

alarm_timer_nsleep() uses ktime_add() to add the current time and the
relative expiry value. ktime_add() has no sanity checks so the addition
can overflow when the relative timeout is large enough.

Use ktime_add_safe() which has the necessary sanity checks in place and
limits the result to the valid range.

Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
Reported-by: Team OWL337
Signed-off-by: Thomas Gleixner
Cc: John Stultz
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-10-04 08:00:49 +0800

29 Sep, 2018

1 commit

ed5e9462f tick/nohz: Prevent bogus softirq pending warning ... Browse Code »

Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
inline softirq") got backported to stable trees and now causes the NOHZ
softirq pending warning to trigger. It's not an upstream issue as the NOHZ
update logic has been changed there.

The problem is when a softirq disabled section gets interrupted and on
return from interrupt the tick/nohz state is evaluated, which then can
observe pending soft interrupts. These soft interrupts are legitimately
pending because they cannot be processed as long as soft interrupts are
disabled and the interrupted code will correctly process them when soft
interrupts are reenabled.

Add a check for softirqs disabled to the pending check to prevent the
warning.

Reported-by: Grygorii Strashko
Reported-by: John Crispin
Signed-off-by: Thomas Gleixner
Tested-by: Grygorii Strashko
Tested-by: John Crispin
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Anna-Maria Gleixner
Cc: stable@vger.kernel.org
Fixes: 2d898915ccf4838c ("nohz: Fix missing tick reprogram when interrupting an inline softirq")
Acked-by: Frederic Weisbecker
Tested-by: Geert Uytterhoeven

Thomas Gleixner
2018-09-29 18:06:07 +0800

20 Sep, 2018

1 commit

cb71229f6 timers: Clear timer_base::must_forward_clk with timer_base::lock held ... Browse Code »

[ Upstream commit 363e934d8811d799c88faffc5bfca782fd728334 ]

timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.

The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:

CPU0 CPU1
run_timer_softirq mod_timer

base = lock_timer_base(timer);
base->must_forward_clk = false
if (base->must_forward_clk)
forward(base); -> skipped

enqueue_timer(base, timer, idx);
-> idx is calculated high due to
stale base
unlock_timer_base(timer);
base = lock_timer_base(timer);
forward(base);

The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.

Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.

Signed-off-by: Gaurav Kohli
Signed-off-by: Thomas Gleixner
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: linux-arm-msm@vger.kernel.org
Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Gaurav Kohli
2018-09-20 04:43:39 +0800

09 Aug, 2018

1 commit

e5bcbedad nohz: Fix local_timer_softirq_pending() ... Browse Code »

commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.

local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.

Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Thomas Gleixner
Reviewed-by: Paul E. McKenney
Reviewed-by: Daniel Bristot de Oliveira
Acked-by: Frederic Weisbecker
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Anna-Maria Gleixner
2018-08-09 18:16:38 +0800

22 Jul, 2018

1 commit

16b3ae123 clocksource: Initialize cs->wd_list ... Browse Code »

commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream.

A number of places relies on list_empty(&cs->wd_list), however the
list_head does not get initialized. Do so upon registration, such that
thereafter it is possible to rely on list_empty() correctly reflecting
the list membership status.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Tested-by: Diego Viola
Reviewed-by: Rafael J. Wysocki
Cc: stable@vger.kernel.org
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org
Signed-off-by: Sudip Mukherjee
Signed-off-by: Greg Kroah-Hartman

Peter Zijlstra
2018-07-22 20:28:48 +0800

03 Jul, 2018

1 commit

88c4318d3 time: Make sure jiffies_to_msecs() preserves non-zero time periods ... Browse Code »

commit abcbcb80cd09cd40f2089d912764e315459b71f7 upstream.

For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of
1000, jiffies_to_msecs() never returns zero when passed a non-zero time
period.

However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or
1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero
for small non-zero time periods. This may break code that relies on
receiving back a non-zero value.

jiffies_to_usecs() does not need such a fix: one jiffy can only be less
than one µs if HZ > 1000000, and such large values of HZ are already
rejected at build time, twice:

- include/linux/jiffies.h does #error if HZ >= 12288,
- kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC).

Broken since forever.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Thomas Gleixner
Reviewed-by: Arnd Bergmann
Cc: John Stultz
Cc: Stephen Boyd
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org
Signed-off-by: Greg Kroah-Hartman

Geert Uytterhoeven
2018-07-03 17:24:56 +0800

23 May, 2018

1 commit

6986750cb tick/broadcast: Use for_each_cpu() specially on UP kernels ... Browse Code »

commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.

for_each_cpu() unintuitively reports CPU0 as set independent of the actual
cpumask content on UP kernels. This causes an unexpected PIT interrupt
storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
a result, the virtual machine can suffer from a strange random delay of 1~20
minutes during boot-up, and sometimes it can hang forever.

Protect if by checking whether the cpumask is empty before entering the
for_each_cpu() loop.

[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]

Signed-off-by: Dexuan Cui
Signed-off-by: Thomas Gleixner
Cc: Josh Poulson
Cc: "Michael Kelley (EOSG)"
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: stable@vger.kernel.org
Cc: Rakib Mullick
Cc: Jork Loeser
Cc: Greg Kroah-Hartman
Cc: Andrew Morton
Cc: KY Srinivasan
Cc: Linus Torvalds
Cc: Alexey Dobriyan
Cc: Dmitry Vyukov
Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Greg Kroah-Hartman

Dexuan Cui
2018-05-23 00:54:00 +0800

02 May, 2018

1 commit

a2066aa76 tick/sched: Do not mess with an enqueued hrtimer ... Browse Code »

commit 1f71addd34f4c442bec7d7c749acc1beb58126f2 upstream.

Kaike reported that in tests rdma hrtimers occasionaly stopped working. He
did great debugging, which provided enough context to decode the problem.

CPU 3 CPU 2

idle
start sched_timer expires = 712171000000
queue->next = sched_timer
start rdmavt timer. expires = 712172915662
lock(baseof(CPU3))
tick_nohz_stop_tick()
tick = 716767000000 timerqueue_add(tmr)

hrtimer_set_expires(sched_timer, tick);
sched_timer->expires = 716767000000 expires < queue->next->expires)
hrtimer_start(sched_timer) queue->next = tmr;
lock(baseof(CPU3))
unlock(baseof(CPU3))
timerqueue_remove()
timerqueue_add()

ts->sched_timer is queued and queue->next is pointing to it, but then
ts->sched_timer.expires is modified.

This not only corrupts the ordering of the timerqueue RB tree, it also
makes CPU2 see the new expiry time of timerqueue->next->expires when
checking whether timerqueue->next needs to be updated. So CPU2 sees that
the rdma timer is earlier than timerqueue->next and sets the rdma timer as
new next.

Depending on whether it had also seen the new time at RB tree enqueue, it
might have queued the rdma timer at the wrong place and then after removing
the sched_timer the RB tree is completely hosed.

The problem was introduced with a commit which tried to solve inconsistency
between the hrtimer in the tick_sched data and the underlying hardware
clockevent. It split out hrtimer_set_expires() to store the new tick time
in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
the NOHZ + HIGHRES case the hrtimer might still be queued.

Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case which sets
timer->expires after canceling the timer and move the hrtimer_set_expires()
invocation into the NOHZ only code path which is not affected as it merily
uses the hrtimer as next event storage so code pathes can be shared with
the NOHZ + HIGHRES case.

Fixes: d4af6d933ccf ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
Reported-by: "Wan Kaike"
Signed-off-by: Thomas Gleixner
Acked-by: Frederic Weisbecker
Cc: "Marciniszyn Mike"
Cc: Anna-Maria Gleixner
Cc: linux-rdma@vger.kernel.org
Cc: "Dalessandro Dennis"
Cc: "Fleck John"
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: "Weiny Ira"
Cc: "linux-rdma@vger.kernel.org"
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-05-02 03:58:26 +0800

26 Apr, 2018

1 commit

89f3232c3 alarmtimer: Init nanosleep alarm timer on stack ... Browse Code »

commit bd03143007eb9b03a7f2316c677780561b68ba2a upstream.

syszbot reported the following debugobjects splat:

ODEBUG: object is on stack, but not annotated
WARNING: CPU: 0 PID: 4185 at lib/debugobjects.c:328

RIP: 0010:debug_object_is_on_stack lib/debugobjects.c:327 [inline]
debug_object_init+0x17/0x20 lib/debugobjects.c:391
debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
debug_init kernel/time/hrtimer.c:458 [inline]
hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1259
alarm_init kernel/time/alarmtimer.c:339 [inline]
alarm_timer_nsleep+0x164/0x4d0 kernel/time/alarmtimer.c:787
SYSC_clock_nanosleep kernel/time/posix-timers.c:1226 [inline]
SyS_clock_nanosleep+0x235/0x330 kernel/time/posix-timers.c:1204
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7

This happens because the hrtimer for the alarm nanosleep is on stack, but
the code does not use the proper debug objects initialization.

Split out the code for the allocated use cases and invoke
hrtimer_init_on_stack() for the nanosleep related functions.

Reported-by: syzbot+a3e0726462b2e346a31d@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner
Cc: John Stultz
Cc: syzkaller-bugs@googlegroups.com
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1803261528270.1585@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-04-26 17:02:21 +0800

29 Mar, 2018

1 commit

f7fbe38cc posix-timers: Protect posix clock array access against speculation ... Browse Code »

commit 19b558db12f9f4e45a22012bae7b4783e62224da upstream.

The clockid argument of clockid_to_kclock() comes straight from user space
via various syscalls and is used as index into the posix_clocks array.

Protect it against spectre v1 array out of bounds speculation. Remove the
redundant check for !posix_clock[id] as this is another source for
speculation and does not provide any advantage over the return
posix_clock[id] path which returns NULL in that case anyway.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Acked-by: Dan Williams
Cc: Rasmus Villemoes
Cc: Greg KH
Cc: stable@vger.kernel.org
Cc: Linus Torvalds
Cc: David Woodhouse
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802151718320.1296@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-03-29 00:24:46 +0800

09 Mar, 2018

1 commit

6b218ed6b timers: Forward timer base before migrating timers ... Browse Code »

commit c52232a49e203a65a6e1a670cd5262f59e9364a0 upstream.

On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.

If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.

In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base:;clock.

But there is a worse problem than that. The following sequence of events
illustrates it:

- CPU0 timer1 is queued expires = 59969 and base->clk = 59131.

The timer is queued at wheel level 2, with resulting expiry time = 60032
(due to level granularity).

- CPU1 enters idle @60007, with next timer expiry @60020.

- CPU0 is hotplugged at @60009

- CPU1 exits idle and runs the control thread which migrates the
timers from CPU0

timer1 is now queued in level 0 for immediate handling in the next
softirq because the requested expiry time 59969 is before CPU1 base->clk
60007

- CPU1 runs code which forwards the base clock which succeeds because the
next expiring timer. which was collected at idle entry time is still set
to 60020.

So it forwards beyond 60007 and therefore misses to expire the migrated
timer1. That timer gets expired when the wheel wraps around again, which
takes between 63 and 630ms depending on the HZ setting.

Address both problems by invoking forward_timer_base() for the control CPUs
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.

[ tglx: Massaged comment and changelog ]

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay
Signed-off-by: Neeraj Upadhyay
Signed-off-by: Lingutla Chandrasekhar
Signed-off-by: Thomas Gleixner
Cc: Anna-Maria Gleixner
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
Signed-off-by: Greg Kroah-Hartman

Lingutla Chandrasekhar
2018-03-09 14:41:04 +0800

03 Mar, 2018

1 commit

f92679fee hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers) ... Browse Code »

commit 48d0c9becc7f3c66874c100c126459a9da0fdced upstream.

The POSIX specification defines that relative CLOCK_REALTIME timers are not
affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
to ensure POSIX compliance.

The introduction of the additional HRTIMER_MODE_PINNED mode broke this
requirement for pinned timers.

There is no user space visible impact because user space timers are not
using pinned mode, but for consistency reasons this needs to be fixed.

Check whether the mode has the HRTIMER_MODE_REL bit set instead of
comparing with HRTIMER_MODE_ABS.

Signed-off-by: Anna-Maria Gleixner
Cc: Christoph Hellwig
Cc: John Stultz
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: keescook@chromium.org
Fixes: 597d0275736d ("timers: Framework for identifying pinned timers")
Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar
Cc: Mike Galbraith
Signed-off-by: Greg Kroah-Hartman

Anna-Maria Gleixner
2018-03-03 17:24:21 +0800

31 Jan, 2018

1 commit

fdd88d753 hrtimer: Reset hrtimer cpu base proper on CPU hotplug ... Browse Code »

commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.

The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Sebastian Sewior
Cc: Anna-Maria Gleixner
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-01-31 21:03:49 +0800

24 Jan, 2018

1 commit

4db98c583 timers: Unconditionally check deferrable base ... Browse Code »

commit ed4bbf7910b28ce3c691aef28d245585eaabda06 upstream.

When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.

Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner
Cc: Anna-Maria Gleixner
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Paul McKenney
Cc: rt@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-01-24 02:58:12 +0800

03 Jan, 2018

5 commits

e798502cf nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() ... Browse Code »

commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Paul McKenney
Cc: Anna-Maria Gleixner
Cc: Sebastian Siewior
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-01-03 03:31:16 +0800
6fae6de72 timers: Reinitialize per cpu bases on hotplug ... Browse Code »

commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream.

The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Sebastian Siewior
Cc: Anna-Maria Gleixner
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-01-03 03:31:15 +0800
8f1aa64ab timers: Invoke timer_start_debug() where it makes sense ... Browse Code »

commit fd45bb77ad682be728d1002431d77b8c73342836 upstream.

The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.

Call the debug function after setting the new base and flags.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Sebastian Siewior
Cc: rt@linutronix.de
Cc: Paul McKenney
Cc: Anna-Maria Gleixner
Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2018-01-03 03:31:15 +0800
e4fb2e7e9 timers: Use deferrable base independent of base::nohz_active ... Browse Code »

commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream.

During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.

Once it base::nohz_active is set and a timer which was enqueued before that
is accessed the lock selector code choses the lock of the deferred
base. This causes unlocked access to the standard base and in case the
timer is removed it does not clear the pending flag in the standard base
bitmap which causes get_next_timer_interrupt() to return bogus values.

To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditional.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: rt@linutronix.de
Cc: Paul McKenney
Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Anna-Maria Gleixner
2018-01-03 03:31:15 +0800
0c688c288 cpufreq: schedutil: Use idle_calls counter of the remote CPU ... Browse Code »

commit 466a2b42d67644447a1765276259a3ea5531ddff upstream.

Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Joel Fernandes
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Greg Kroah-Hartman

Joel Fernandes
2018-01-03 03:31:05 +0800

20 Dec, 2017

1 commit

3df23f7ce posix-timer: Properly check sigevent->sigev_notify ... Browse Code »

commit cef31d9af908243421258f1df35a4a644604efbe upstream.

timer_create() specifies via sigevent->sigev_notify the signal delivery for
the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
and (SIGEV_SIGNAL | SIGEV_THREAD_ID).

The sanity check in good_sigevent() is only checking the valid combination
for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
not set it accepts any random value.

This has no real effects on the posix timer and signal delivery code, but
it affects show_timer() which handles the output of /proc/$PID/timers. That
function uses a string array to pretty print sigev_notify. The access to
that array has no bound checks, so random sigev_notify cause access beyond
the array bounds.

Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
masking from various code pathes as SIGEV_NONE can never be set in
combination with SIGEV_THREAD_ID.

Reported-by: Eric Biggers
Reported-by: Dmitry Vyukov
Reported-by: Alexey Dobriyan
Signed-off-by: Thomas Gleixner
Cc: John Stultz
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2017-12-20 17:10:21 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

09 Sep, 2017

1 commit

a2d818030 drivers/pps: aesthetic tweaks to PPS-related content ... Browse Code »

Collection of aesthetic adjustments to various PPS-related files,
directories and Documentation, some quite minor just for the sake of
consistency, including:

* Updated example of pps device tree node (courtesy Rodolfo G.)
* "PPS-API" -> "PPS API"
* "pps_source_info_s" -> "pps_source_info"
* "ktimer driver" -> "pps-ktimer driver"
* "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
* Add missing PPS-related entries to MAINTAINERS file
* Other trivialities

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain
Signed-off-by: Robert P. J. Day
Acked-by: Rodolfo Giometti
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2017-09-09 09:26:51 +0800

06 Sep, 2017

1 commit

439644096 Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates from Rafael Wysocki:
"This time (again) cpufreq gets the majority of changes which mostly
are driver updates (including a major consolidation of intel_pstate),
some schedutil governor modifications and core cleanups.

There also are some changes in the system suspend area, mostly related
to diagnostics and debug messages plus some renames of things related
to suspend-to-idle. One major change here is that suspend-to-idle is
now going to be preferred over S3 on systems where the ACPI tables
indicate to do so and provide requsite support (the Low Power Idle S0
_DSM in particular). The system sleep documentation and the tools
related to it are updated too.

The rest is a few cpuidle changes (nothing major), devfreq updates,
generic power domains (genpd) framework updates and a few assorted
modifications elsewhere.

Specifics:

- Drop the P-state selection algorithm based on a PID controller from
intel_pstate and make it use the same P-state selection method
(based on the CPU load) for all types of systems in the active mode
(Rafael Wysocki, Srinivas Pandruvada).

- Rework the cpufreq core and governors to make it possible to take
cross-CPU utilization updates into account and modify the schedutil
governor to actually do so (Viresh Kumar).

- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).

- Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
cpufreq driver and update its DT bindings (Sean Wang).

- Modify the cpufreq dt-platdev driver to autimatically create
cpufreq devices for the new (v2) Operating Performance Points (OPP)
DT bindings and update its whitelist of supported systems (Viresh
Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
Xiao).

- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).

- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).

- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).

- Update the IO-wait boost handling in the schedutil governor to make
it less aggressive (Joel Fernandes).

- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).

- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).

- Update documentation related to system sleep and rename a number of
items in the code to make it cleare that they are related to
suspend-to-idle (Rafael Wysocki).

- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).

- Clean up the cpuidle subsystem to handle the polling state on x86
in a more straightforward way and to use %pOF instead of full_name
(Rafael Wysocki, Rob Herring).

- Update the devfreq framework to fix and clean up a few minor issues
(Chanwoo Choi, Rob Herring).

- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).

- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).

- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).

- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).

- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).

- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava)"

* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
cpuidle: Make drivers initialize polling state
cpuidle: Move polling state initialization code to separate file
cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpuidle: Convert to using %pOF instead of full_name
cpufreq: Convert to using %pOF instead of full_name
PM / Domains: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
...

Linus Torvalds
2017-09-06 03:19:08 +0800

05 Sep, 2017

1 commit

dd90cccff Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fixes from Thomas Gleixner:
"A rather small update for the time(r) subsystem:

- A new clocksource driver IMX-TPM

- Minor fixes to the alarmtimer facility

- Device tree cleanups for Renesas drivers

- A new kselftest and fixes for the timer related tests

- Conversion of the clocksource drivers to use %pOF

- Use the proper helpers to access rlimits in the posix-cpu-timer
code"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Ensure RTC module is not unloaded
clocksource: Convert to using %pOF instead of full_name
clocksource/drivers/bcm2835: Remove message for a memory allocation failure
devicetree: bindings: Remove deprecated properties
devicetree: bindings: Remove unused 32-bit CMT bindings
devicetree: bindings: Deprecate property, update example
devicetree: bindings: r8a73a4 and R-Car Gen2 CMT bindings
devicetree: bindings: R-Car Gen2 CMT0 and CMT1 bindings
devicetree: bindings: Remove sh7372 CMT binding
clocksource/drivers/imx-tpm: Add imx tpm timer support
dt-bindings: timer: Add nxp tpm timer binding doc
posix-cpu-timers: Use dedicated helper to access rlimit values
alarmtimer: Fix unavailable wake-up source in sysfs
timekeeping: Use proper timekeeper for debug code
kselftests: timers: set-timer-lat: Add one-shot timer test cases
kselftests: timers: set-timer-lat: Tweak reporting when timer fires early
kselftests: timers: freq-step: Fix build warning
kselftests: timers: freq-step: Define ADJ_SETOFFSET if device has older kernel headers

Linus Torvalds
2017-09-05 04:06:34 +0800

04 Sep, 2017

1 commit

7b01463e5 Merge branch 'pm-sleep' ... Browse Code »

* pm-sleep:
ACPI / PM: Check low power idle constraints for debug only
PM / s2idle: Rename platform operations structure
PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
PM / s2idle: Rename freeze_state enum and related items
PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
ACPI / PM: Prefer suspend-to-idle over S3 on some systems
platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
PM / suspend: Define pr_fmt() in suspend.c
PM / suspend: Use mem_sleep_labels[] strings in messages
PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
PM / core: Add error argument to dpm_show_time()
PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
PM / s2idle: Rearrange the main suspend-to-idle loop
PM / timekeeping: Print debug messages when requested
PM / sleep: Mark suspend/hibernation start and finish
PM / sleep: Do not print debug messages by default
PM / suspend: Export pm_suspend_target_state

Rafael J. Wysocki
2017-09-04 06:06:02 +0800

01 Sep, 2017

1 commit

51218298a alarmtimer: Ensure RTC module is not unloaded ... Browse Code »

When registering the rtc device to be used to handle alarm timers,
get_device is used to ensure the device doesn't go away but the module can
still be unloaded.

Call try_module_get to ensure the rtc driver will not go away.

Reported-and-tested-by: Michal Simek
Signed-off-by: Alexandre Belloni
Signed-off-by: Thomas Gleixner
Acked-by: John Stultz
Cc: Stephen Boyd
Link: http://lkml.kernel.org/r/20170820220146.30969-1-alexandre.belloni@free-electrons.com

Alexandre Belloni
2017-09-01 03:36:45 +0800

26 Aug, 2017

1 commit

0bcdc0987 time: Fix ktime_get_raw() incorrect base accumulation ... Browse Code »

In comqit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
handling"), the following code got mistakenly added to the update of the
raw timekeeper:

/* Update the monotonic raw base */
seconds = tk->raw_sec;
nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

Which adds the raw_sec value and the shifted down raw xtime_nsec to the
base value.

But the read function adds the shifted down tk->tkr_raw.xtime_nsec value
another time, The result of this is that ktime_get_raw() users (which are
all internal users) see the raw time move faster then it should (the rate
at which can vary with the current size of tkr_raw.xtime_nsec), which has
resulted in at least problems with graphics rendering performance.

The change tried to match the monotonic base update logic:

seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
nsec = (u32) tk->wall_to_monotonic.tv_nsec;
tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

Which adds the wall_to_monotonic.tv_nsec value, but not the
tk->tkr_mono.xtime_nsec value to the base.

To fix this, simplify the tkr_raw.base accumulation to only accumulate the
raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which
will be added at read time.

Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
Reported-and-tested-by: Chris Wilson
Signed-off-by: John Stultz
Signed-off-by: Thomas Gleixner
Cc: Prarit Bhargava
Cc: Kevin Brodsky
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Will Deacon
Cc: Miroslav Lichvar
Cc: Daniel Mentz
Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stultz@linaro.org

John Stultz
2017-08-26 22:06:12 +0800

24 Aug, 2017

1 commit

2fe59f507 timers: Fix excessive granularity of new timers after a nohz idle ... Browse Code »

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
it is marked not-idle and before the timer tick on this CPU, where a
timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

max avg std
upstream 506.0 1.20 4.68
patched 2.0 1.08 0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

max avg std
upstream 9.0 1.05 0.19
patched 2.0 1.04 0.11

Fixes: Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin
Signed-off-by: Thomas Gleixner
Tested-by: Jonathan Cameron
Tested-by: David Miller
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com

Nicholas Piggin
2017-08-24 17:40:18 +0800

20 Aug, 2017

1 commit

4e2a80970 Merge branch 'fortglx/4.14/time' of https://git.linaro.org/people/john.stultz/linux into timers/core ... Browse Code »

Pull timekeepig updates from John Stultz

- kselftest improvements

- Use the proper timekeeper in the debug code

- Prevent accessing an unavailable wakeup source in the alarmtimer sysfs
interface.

Thomas Gleixner
2017-08-20 17:46:46 +0800

18 Aug, 2017

3 commits

3cf294962 posix-cpu-timers: Use dedicated helper to access rlimit values ... Browse Code »

Use rlimit() and rlimit_max() helper instead of manually writing
whole chain from task to rlimit value

Signed-off-by: Krzysztof Opasiak
Signed-off-by: Thomas Gleixner
Link: http://lkml.kernel.org/r/20170705172548.7911-1-k.opasiak@samsung.com

Krzysztof Opasiak
2017-08-18 18:44:42 +0800
47b4a457e alarmtimer: Fix unavailable wake-up source in sysfs ... Browse Code »

Currently the alarmtimer registers a wake-up source unconditionally,
regardless of the system having a (wake-up capable) RTC or not.
Hence the alarmtimer will always show up in
/sys/kernel/debug/wakeup_sources, even if it is not available, and thus
cannot be a wake-up source.

To fix this, postpone registration until a wake-up capable RTC device is
added.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Stephen Boyd
Signed-off-by: Geert Uytterhoeven
Signed-off-by: John Stultz

Geert Uytterhoeven
2017-08-18 03:15:10 +0800
a529bea8f timekeeping: Use proper timekeeper for debug code ... Browse Code »

When CONFIG_DEBUG_TIMEKEEPING is enabled the timekeeping_check_update()
function will update status like last_warning and underflow_seen on the
timekeeper.

If there are issues found this state is used to rate limit the warnings
that get printed.

This rate limiting doesn't really really work if stored in real_tk as
the shadow timekeeper is overwritten onto real_tk at the end of every
update_wall_time() call, resetting last_warning and other statuses.

Fix rate limiting by using the shadow_timekeeper for
timekeeping_check_update().

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Stephen Boyd
Fixes: commit 57d05a93ada7 ("time: Rework debugging variables so they aren't global")
Signed-off-by: Stafford Horne
Signed-off-by: John Stultz

Stafford Horne
2017-08-18 03:15:04 +0800

01 Aug, 2017

1 commit

34f41c031 timers: Fix overflow in get_next_timer_interrupt ... Browse Code »

For e.g. HZ=100, timer being 430 jiffies in the future, and 32 bit
unsigned int, there is an overflow on unsigned int right-hand side
of the expression which results with wrong values being returned.

Type cast the multiplier to 64bit to avoid that issue.

Fixes: 46c8f0b077a8 ("timers: Fix get_next_timer_interrupt() computation")
Signed-off-by: Matija Glavinic Pecotic
Signed-off-by: Thomas Gleixner
Reviewed-by: Alexander Sverdlin
Cc: khilman@baylibre.com
Cc: akpm@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/a7900f04-2a21-c9fd-67be-ab334d459ee5@nokia.com

Matija Glavinic Pecotic
2017-08-01 20:20:53 +0800

23 Jul, 2017

1 commit

cb08e0353 PM / timekeeping: Print debug messages when requested ... Browse Code »

The messages printed by tk_debug_account_sleep_time() are basically
useful for system sleep debugging, so print them only when the other
debug messages from the core suspend/hibernate code are enabled.

While at it, make it clear that the messages from
tk_debug_account_sleep_time() are about timekeeping suspend
duration, because in general timekeeping may be suspeded and
resumed for multiple times during one system suspend-resume cycle.

Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2017-07-23 06:03:43 +0800

06 Jul, 2017

1 commit

ea3b25e13 Merge branch 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull timer-related user access updates from Al Viro:
"Continuation of timers-related stuff (there had been more, but my
parts of that series are already merged via timers/core). This is more
of y2038 work by Deepa Dinamani, partially disrupted by the
unification of native and compat timers-related syscalls"

* 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
posix_clocks: Use get_itimerspec64() and put_itimerspec64()
timerfd: Use get_itimerspec64() and put_itimerspec64()
nanosleep: Use get_timespec64() and put_timespec64()
posix-timers: Use get_timespec64() and put_timespec64()
posix-stubs: Conditionally include COMPAT_SYS_NI defines
time: introduce {get,put}_itimerspec64
time: add get_timespec64 and put_timespec64

Linus Torvalds
2017-07-06 06:34:35 +0800

04 Jul, 2017

2 commits

1b044f1cf Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer updates from Thomas Gleixner:
"A rather large update for timers/timekeeping:

- compat syscall consolidation (Al Viro)

- Posix timer consolidation (Christoph Helwig / Thomas Gleixner)

- Cleanup of the device tree based initialization for clockevents and
clocksources (Daniel Lezcano)

- Consolidation of the FTTMR010 clocksource/event driver (Linus
Walleij)

- The usual set of small fixes and updates all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
timers: Make the cpu base lock raw
clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
clocksource/drivers/tcb_clksrc: Make IO endian agnostic
clocksource/drivers/sun4i: Switch to the timer-of common init
clocksource/drivers/timer-of: Fix invalid iomap check
Revert "ktime: Simplify ktime_compare implementation"
clocksource/drivers: Fix uninitialized variable use in timer_of_init
kselftests: timers: Add test for frequency step
kselftests: timers: Fix inconsistency-check to not ignore first timestamp
time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Clean up CLOCK_MONOTONIC_RAW time handling
posix-cpu-timers: Make timespec to nsec conversion safe
itimer: Make timeval to nsec conversion range limited
timers: Fix parameter description of try_to_del_timer_sync()
ktime: Simplify ktime_compare implementation
clocksource/drivers/fttmr010: Factor out clock read code
clocksource/drivers/fttmr010: Implement delay timer
clocksource/drivers: Add timer-of common init routine
clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
...

Linus Torvalds
2017-07-04 07:14:51 +0800
59b60185b Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull nohz updates from Ingo Molnar:
"The main changes in this cycle relate to fixing another bad (but
sporadic and hard to detect) interaction between the dynticks
scheduler tick and hrtimers, plus related improvements to better
detection and handling of similar problems - by Frédéric Weisbecker"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix spurious warning when hrtimer and clockevent get out of sync
nohz: Fix buggy tick delay on IRQ storms
nohz: Reset next_tick cache even when the timer has no regs
nohz: Fix collision between tick and other hrtimers, again
nohz: Add hrtimer sanity check

Linus Torvalds
2017-07-04 04:33:57 +0800