Eric Lee / smarc-fsl-linux-kernel

05 Mar, 2020

2 commits

1fabae5c8 lib/vdso: Update coarse timekeeper unconditionally ... Browse Code »

commit 9f24c540f7f8eb3a981528da9a9a636a5bdf5987 upstream.

The low resolution parts of the VDSO, i.e.:

clock_gettime(CLOCK_*_COARSE), clock_getres(), time()

can be used even if there is no VDSO capable clocksource.

But if an architecture opts out of the VDSO data update then this
information becomes stale. This affects ARM when there is no architected
timer available. The lack of update causes userspace to use stale data
forever.

Make the update of the low resolution parts unconditional and only skip
the update of the high resolution parts if the architecture requests it.

Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200114185946.765577901@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-03-05 23:43:49 +0800
91ebef861 lib/vdso: Make __arch_update_vdso_data() logic understandable ... Browse Code »

commit 9a6b55ac4a44060bcb782baf002859b2a2c63267 upstream.

The function name suggests that this is a boolean checking whether the
architecture asks for an update of the VDSO data, but it works the other
way round. To spare further confusion invert the logic.

Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-03-05 23:43:49 +0800

24 Feb, 2020

1 commit

251c53a92 alarmtimer: Make alarmtimer platform device child of RTC device ... Browse Code »

[ Upstream commit c79108bd19a8490315847e0c95ac6526fcd8e770 ]

The alarmtimer_suspend() function will fail if an RTC device is on a bus
such as SPI or i2c and that RTC device registers and probes after
alarmtimer_init() registers and probes the 'alarmtimer' platform device.

This is because system wide suspend suspends devices in the reverse order
of their probe. When alarmtimer_suspend() attempts to program the RTC for a
wakeup it will try to program an RTC device on a bus that has already been
suspended.

Move the alarmtimer device registration to happen when the RTC which is
used for wakeup is registered. Register the 'alarmtimer' platform device as
a child of the RTC device too, so that it can be guaranteed that the RTC
device won't be suspended when alarmtimer_suspend() is called.

Reported-by: Douglas Anderson
Signed-off-by: Stephen Boyd
Signed-off-by: Thomas Gleixner
Reviewed-by: Douglas Anderson
Link: https://lore.kernel.org/r/20200124055849.154411-2-swboyd@chromium.org
Signed-off-by: Sasha Levin

Stephen Boyd
2020-02-24 15:36:57 +0800

11 Feb, 2020

2 commits

d1318034e clocksource: Prevent double add_timer_on() for watchdog_timer ... Browse Code »

commit febac332a819f0e764aa4da62757ba21d18c182b upstream.

Kernel crashes inside QEMU/KVM are observed:

kernel BUG at kernel/time/timer.c:1154!
BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on().

At the same time another cpu got:

general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in:

__hlist_del at include/linux/list.h:681
(inlined by) detach_timer at kernel/time/timer.c:818
(inlined by) expire_timers at kernel/time/timer.c:1355
(inlined by) __run_timers at kernel/time/timer.c:1686
(inlined by) run_timer_softirq at kernel/time/timer.c:1699

Unfortunately kernel logs are badly scrambled, stacktraces are lost.

Printing the timer->function before the BUG_ON() pointed to
clocksource_watchdog().

The execution of clocksource_watchdog() can race with a sequence of
clocksource_stop_watchdog() .. clocksource_start_watchdog():

expire_timers()
detach_timer(timer, true);
timer->entry.pprev = NULL;
raw_spin_unlock_irq(&base->lock);
call_timer_fn
clocksource_watchdog()

clocksource_watchdog_kthread() or
clocksource_unbind()

spin_lock_irqsave(&watchdog_lock, flags);
clocksource_stop_watchdog();
del_timer(&watchdog_timer);
watchdog_running = 0;
spin_unlock_irqrestore(&watchdog_lock, flags);

spin_lock_irqsave(&watchdog_lock, flags);
clocksource_start_watchdog();
add_timer_on(&watchdog_timer, ...);
watchdog_running = 1;
spin_unlock_irqrestore(&watchdog_lock, flags);

spin_lock(&watchdog_lock);
add_timer_on(&watchdog_timer, ...);
BUG_ON(timer_pending(timer) || !timer->function);
timer_pending() -> true
BUG()

I.e. inside clocksource_watchdog() watchdog_timer could be already armed.

Check timer_pending() before calling add_timer_on(). This is sufficient as
all operations are synchronized by watchdog_lock.

Fixes: 75c5158f70c0 ("timekeeping: Update clocksource with stop_machine")
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
Signed-off-by: Greg Kroah-Hartman

Konstantin Khlebnikov
2020-02-11 20:35:54 +0800
ad2707341 alarmtimer: Unregister wakeup source when module get fails ... Browse Code »

commit 6b6d188aae79a630957aefd88ff5c42af6553ee3 upstream.

The alarmtimer_rtc_add_device() function creates a wakeup source and then
tries to grab a module reference. If that fails the function returns early
with an error code, but fails to remove the wakeup source.

Cleanup this exit path so there is no dangling wakeup source, which is
named 'alarmtime' left allocated which will conflict with another RTC
device that may be registered later.

Fixes: 51218298a25e ("alarmtimer: Ensure RTC module is not unloaded")
Signed-off-by: Stephen Boyd
Signed-off-by: Thomas Gleixner
Reviewed-by: Douglas Anderson
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200109155910.907-2-swboyd@chromium.org
Signed-off-by: Greg Kroah-Hartman

Stephen Boyd
2020-02-11 20:35:20 +0800

23 Jan, 2020

1 commit

2523c2693 tick/sched: Annotate lockless access to last_jiffies_update ... Browse Code »

commit de95a991bb72e009f47e0c4bbc90fc5f594588d5 upstream.

syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64():

BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64

write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1:
tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73
tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138
tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292
__run_hrtimer kernel/time/hrtimer.c:1514 [inline]
__hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576
hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline]
kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436
check_access kernel/kcsan/core.c:466 [inline]
__tsan_read1 kernel/kcsan/core.c:593 [inline]
__tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593
kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79
kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170
insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline]
debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256
full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225
__vfs_write+0x67/0xc0 fs/read_write.c:494
vfs_write fs/read_write.c:558 [inline]
vfs_write+0x18a/0x390 fs/read_write.c:542
ksys_write+0xd5/0x1b0 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__x64_sys_write+0x4c/0x60 fs/read_write.c:620
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0:
tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62
tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline]
tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline]
tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274
irq_enter+0x4f/0x60 kernel/softirq.c:354
entering_irq arch/x86/include/asm/apic.h:517 [inline]
entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1af/0x280 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
rest_init+0xec/0xf6 init/main.c:452
arch_call_rest_init+0x17/0x37
start_kernel+0x838/0x85e init/main.c:786
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Use READ_ONCE() and WRITE_ONCE() to annotate this expected race.

Reported-by: syzbot
Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20191205045619.204946-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-01-23 15:22:55 +0800

05 Jan, 2020

2 commits

bfa2e0cd3 ptp: fix the race between the release of ptp_clock and cdev ... Browse Code »

[ Upstream commit a33121e5487b424339636b25c35d3a180eaa5f5e ]

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[ 48.010809] general protection fault: 0000 [#1] SMP
[ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[ 48.019470] ... ^^^ a slub poison
[ 48.023854] Call Trace:
[ 48.024050] __fput+0x21f/0x240
[ 48.024288] task_work_run+0x79/0x90
[ 48.024555] do_exit+0x2af/0xab0
[ 48.024799] ? vfs_write+0x16a/0x190
[ 48.025082] do_group_exit+0x35/0x90
[ 48.025387] __x64_sys_exit_group+0xf/0x10
[ 48.025737] do_syscall_64+0x3d/0x130
[ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.026479] RIP: 0033:0x7f53b12082f6
[ 48.026792] ...
[ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[ 48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{ ...
if (file->f_op->release)
file->f_op->release(inode, file); <<< cdev is kfree'd here
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
posix_clock_release()
kref_put(&clk->kref, delete_clock) <<< the last reference
delete_clock()
delete_ptp_clock()
kfree(ptp) <<< cdev is embedded in ptp
cdev_put
module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7fda ("chardev: add
helper function to register char devs with a struct device").

Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston
Analyzed-by: Vern Lovejoy
Signed-off-by: Vladis Dronov
Acked-by: Richard Cochran
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Vladis Dronov
2020-01-05 02:18:48 +0800
2cd7c5f23 hrtimer: Annotate lockless access to timer->state ... Browse Code »

commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream.

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
__tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
__sys_sendto+0x21f/0x320 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0x89/0xb0 net/socket.c:1960
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot
Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2020-01-05 02:18:41 +0800

13 Dec, 2019

1 commit

05ee6e46a time: Zero the upper 32-bits in __kernel_timespec on 32-bit ... Browse Code »

commit 7b8474466ed97be458c825f34a85f2c2b84c3f95 upstream.

On compat interfaces, the high order bits of nanoseconds should be zeroed
out. This is because the application code or the libc do not guarantee
zeroing of these. If used without zeroing, kernel might be at risk of using
timespec values incorrectly.

Originally it was handled correctly, but lost during is_compat_syscall()
cleanup. Revert the condition back to check CONFIG_64BIT.

Fixes: 98f76206b335 ("compat: Cleanup in_compat_syscall() callers")
Reported-by: Ben Hutchings
Signed-off-by: Dmitry Safonov
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191121000303.126523-1-dima@arista.com
Signed-off-by: Greg Kroah-Hartman

Dmitry Safonov
2019-12-13 15:42:18 +0800

12 Nov, 2019

1 commit

2f5841349 ntp/y2038: Remove incorrect time_t truncation ... Browse Code »

A cast to 'time_t' was accidentally left in place during the
conversion of __do_adjtimex() to 64-bit timestamps, so the
resulting value is incorrectly truncated.

Remove the cast so the 64-bit time gets propagated correctly.

Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Arnd Bergmann
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de

Arnd Bergmann
2019-11-12 15:13:44 +0800

05 Nov, 2019

1 commit

52338415c timekeeping/vsyscall: Update VDSO data unconditionally ... Browse Code »

The update of the VDSO data is depending on __arch_use_vsyscall() returning
True. This is a leftover from the attempt to map the features of various
architectures 1:1 into generic code.

The usage of __arch_use_vsyscall() in the actual vsyscall implementations
got dropped and replaced by the requirement for the architecture code to
return U64_MAX if the global clocksource is not usable in the VDSO.

But the __arch_use_vsyscall() check in the update code stayed which causes
the VDSO data to be stale or invalid when an architecture actually
implements that function and returns False when the current clocksource is
not usable in the VDSO.

As a consequence the VDSO implementations of clock_getres(), time(),
clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
information.

Remove the __arch_use_vsyscall() check from the VDSO update function and
update the VDSO data unconditionally.

[ tglx: Massaged changelog and removed the now useless implementations in
asm-generic/ARM64/MIPS ]

Fixes: 44f57d788e7deecb50 ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Huacai Chen
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Vincenzo Frascino
Cc: Arnd Bergmann
Cc: Paul Burton
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com

Huacai Chen
2019-11-05 06:02:53 +0800

23 Oct, 2019

2 commits

7f2cbcbca posix-cpu-timers: Fix two trivial comments ... Browse Code »

Recent changes modified the function arguments of
thread_group_sample_cputime() and task_cputimers_expired(), but forgot to
update the comments. Fix it up.

[ tglx: Changed the argument name of task_cputimers_expired() as the pointer
points to an array of samples. ]

Fixes: b7be4ef1365d ("posix-cpu-timers: Switch thread group sampling to array")
Fixes: 001f7971433a ("posix-cpu-timers: Make expiry checks array based")
Signed-off-by: Yi Wang
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/1571643852-21848-1-git-send-email-wang.yi59@zte.com.cn

Yi Wang
2019-10-23 20:48:24 +0800
086ee46b0 timers/sched_clock: Include local timekeeping.h for missing declarations ... Browse Code »

Include the timekeeping.h header to get the declaration of the
sched_clock_{suspend,resume} functions. Fixes the following sparse
warnings:

kernel/time/sched_clock.c:275:5: warning: symbol 'sched_clock_suspend' was not declared. Should it be static?
kernel/time/sched_clock.c:286:6: warning: symbol 'sched_clock_resume' was not declared. Should it be static?

Signed-off-by: Ben Dooks (Codethink)
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20191022131226.11465-1-ben.dooks@codethink.co.uk

Ben Dooks (Codethink)
2019-10-23 20:48:23 +0800

14 Oct, 2019

1 commit

ff229eee3 hrtimer: Annotate lockless access to timer->base ... Browse Code »

Followup to commit dd2261ed45aa ("hrtimer: Protect lockless access
to timer->base")

lock_hrtimer_base() fetches timer->base without lock exclusion.

Compiler is allowed to read timer->base twice (even if considered dumb)
which could end up trying to lock migration_base and return
&migration_base.

base = timer->base;
if (likely(base != &migration_base)) {

/* compiler reads timer->base again, and now (base == &migration_base)

raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
if (likely(base == timer->base))
return base; /* == &migration_base ! */

Similarly the write sides must use WRITE_ONCE() to avoid store tearing.

Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20191008173204.180879-1-edumazet@google.com

Eric Dumazet
2019-10-14 21:51:49 +0800

27 Sep, 2019

2 commits

b9023b91d tick: broadcast-hrtimer: Fix a race in bc_set_next ... Browse Code »

When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.

The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings

Here is a depiction of the race condition

CPU #1 (Running timer callback) CPU #2 (Enter idle
and subscribe to
tick broadcast)
--------------------- ---------------------

__run_hrtimer() tick_broadcast_enter()

bc_handler() __tick_broadcast_oneshot_control()

tick_handle_oneshot_broadcast()

raw_spin_lock(&tick_broadcast_lock);

dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock
//next_event for tick broadcast clock
set to KTIME_MAX since no other cores
subscribed to tick broadcasting

raw_spin_unlock(&tick_broadcast_lock);

if (dev->next_event == KTIME_MAX)
return HRTIMER_NORESTART
// callback function exits without
restarting the hrtimer //tick_broadcast_lock acquired
raw_spin_lock(&tick_broadcast_lock);

tick_broadcast_set_event()

clockevents_program_event()

dev->next_event = expires;

bc_set_next()

hrtimer_try_to_cancel()
//returns -1 since the timer
callback is active. Exits without
restarting the timer
cpu_base->running = NULL;

The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.

Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner
Signed-off-by: Balasubramani Vivekanandan
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com

Balasubramani Vivekanandan
2019-09-27 20:45:55 +0800
da05b5ea1 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fix from Ingo Molnar:
"Fix a timer expiry bug that would cause spurious delay of timers"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Read jiffies once when forwarding base clk

Linus Torvalds
2019-09-27 06:53:17 +0800

19 Sep, 2019

1 commit

e430d802d timer: Read jiffies once when forwarding base clk ... Browse Code »

The timer delayed for more than 3 seconds warning was triggered during
testing.

Workqueue: events_unbound sched_tick_remote
RIP: 0010:sched_tick_remote+0xee/0x100
...
Call Trace:
process_one_work+0x18c/0x3a0
worker_thread+0x30/0x380
kthread+0x113/0x130
ret_from_fork+0x22/0x40

The reason is that the code in collect_expired_timers() uses jiffies
unprotected:

if (next_event > jiffies)
base->clk = jiffies;

As the compiler is allowed to reload the value base->clk can advance
between the check and the store and in the worst case advance farther than
next event. That causes the timer expiry to be delayed until the wheel
pointer wraps around.

Convert the code to use READ_ONCE()

Fixes: 236968383cf5 ("timers: Optimize collect_expired_timers() for NOHZ")
Signed-off-by: Li RongQing
Signed-off-by: Liang ZhiCheng
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com

Li RongQing
2019-09-19 23:50:11 +0800

18 Sep, 2019

2 commits

77dcfe2b9 Merge tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates from Rafael Wysocki:
"These include a rework of the main suspend-to-idle code flow (related
to the handling of spurious wakeups), a switch over of several users
of cpufreq notifiers to QoS-based limits, a new devfreq driver for
Tegra20, a new cpuidle driver and governor for virtualized guests, an
extension of the wakeup sources framework to expose wakeup sources as
device objects in sysfs, and more.

Specifics:

- Rework the main suspend-to-idle control flow to avoid repeating
"noirq" device resume and suspend operations in case of spurious
wakeups from the ACPI EC and decouple the ACPI EC wakeups support
from the LPS0 _DSM support (Rafael Wysocki).

- Extend the wakeup sources framework to expose wakeup sources as
device objects in sysfs (Tri Vo, Stephen Boyd).

- Expose system suspend statistics in sysfs (Kalesh Singh).

- Introduce a new haltpoll cpuidle driver and a new matching governor
for virtualized guests wanting to do guest-side polling in the idle
loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

- Fix the menu and teo cpuidle governors to allow the scheduler tick
to be stopped if PM QoS is used to limit the CPU idle state exit
latency in some cases (Rafael Wysocki).

- Increase the resolution of the play_idle() argument to microseconds
for more fine-grained injection of CPU idle cycles (Daniel
Lezcano).

- Switch over some users of cpuidle notifiers to the new QoS-based
frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
policy notifier events (Viresh Kumar).

- Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

- Add support for MT8183 and MT8516 to the mediatek cpufreq driver
(Andrew-sh.Cheng, Fabien Parent).

- Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
Huang).

- Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

- Update the qcom cpufreq driver (among other things, to make it
easier to extend and to use kryo cpufreq for other nvmem-based
SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
RAILLARD, Sibi Sankar, Sricharan R).

- Fix assorted issues and make assorted minor improvements in the
cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
Gustavo Silva, Hariprasad Kelam).

- Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
Bergmann).

- Add new Exynos PPMU events to devfreq events and extend that
mechanism (Lukasz Luba).

- Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

- Improve devfreq documentation and governor code, fix spelling typos
in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
MyungJoo Ham, Gaël PORTAY).

- Add regulators enable and disable to the OPP (operating performance
points) framework (Kamil Konieczny).

- Update the OPP framework to support multiple opp-suspend properties
(Anson Huang).

- Fix assorted issues and make assorted minor improvements in the OPP
code (Niklas Cassel, Viresh Kumar, Yue Hu).

- Clean up the generic power domains (genpd) framework (Ulf Hansson).

- Clean up assorted pieces of power management code and documentation
(Akinobu Mita, Amit Kucheria, Chuhong Yuan).

- Update the pm-graph tool to version 5.5 including multiple fixes
and improvements (Todd Brandt).

- Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
Sébastien Szymanski)"

* tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
cpuidle-haltpoll: do not set an owner to allow modunload
cpuidle-haltpoll: return -ENODEV on modinit failure
cpuidle-haltpoll: set haltpoll as preferred governor
cpuidle: allow governor switch on cpuidle_register_driver()
PM: runtime: Documentation: add runtime_status ABI document
pm-graph: make setVal unbuffered again for python2 and python3
powercap: idle_inject: Use higher resolution for idle injection
cpuidle: play_idle: Increase the resolution to usec
cpuidle-haltpoll: vcpu hotplug support
cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
cpufreq: qcom: Add support for qcs404 on nvmem driver
cpufreq: qcom: Refactor the driver to make it easier to extend
cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
Documentation: cpufreq: Update policy notifier documentation
cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
PM / Domains: Simplify genpd_lookup_dev()
...

Linus Torvalds
2019-09-18 10:15:14 +0800
7f2444d38 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core timer updates from Thomas Gleixner:
"Timers and timekeeping updates:

- A large overhaul of the posix CPU timer code which is a preparation
for moving the CPU timer expiry out into task work so it can be
properly accounted on the task/process.

An update to the bogus permission checks will come later during the
merge window as feedback was not complete before heading of for
travel.

- Switch the timerqueue code to use cached rbtrees and get rid of the
homebrewn caching of the leftmost node.

- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
single function

- Implement the separation of hrtimers to be forced to expire in hard
interrupt context even when PREEMPT_RT is enabled and mark the
affected timers accordingly.

- Implement a mechanism for hrtimers and the timer wheel to protect
RT against priority inversion and live lock issues when a (hr)timer
which should be canceled is currently executing the callback.
Instead of infinitely spinning, the task which tries to cancel the
timer blocks on a per cpu base expiry lock which is held and
released by the (hr)timer expiry code.

- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
resulting in faster access to timekeeping functions.

- Updates to various clocksource/clockevent drivers and their device
tree bindings.

- The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
posix-cpu-timers: Fix permission check regression
posix-cpu-timers: Always clear head pointer on dequeue
hrtimer: Add a missing bracket and hide `migration_base' on !SMP
posix-cpu-timers: Make expiry_active check actually work correctly
posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
tick: Mark sched_timer to expire in hard interrupt context
hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
posix-cpu-timers: Utilize timerqueue for storage
posix-cpu-timers: Move state tracking to struct posix_cputimers
posix-cpu-timers: Deduplicate rlimit handling
posix-cpu-timers: Remove pointless comparisons
posix-cpu-timers: Get rid of 64bit divisions
posix-cpu-timers: Consolidate timer expiry further
posix-cpu-timers: Get rid of zero checks
rlimit: Rewrite non-sensical RLIMIT_CPU comment
posix-cpu-timers: Respect INFINITY for hard RTTIME limit
posix-cpu-timers: Switch thread group sampling to array
posix-cpu-timers: Restructure expiry array
posix-cpu-timers: Remove cputime_expires
...

Linus Torvalds
2019-09-18 03:35:15 +0800

17 Sep, 2019

1 commit

d28170636 Merge branch 'pm-sleep' ... Browse Code »

* pm-sleep: (29 commits)
ACPI: PM: s2idle: Always set up EC GPE for system wakeup
ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily
PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}()
PM / wakeup: Register wakeup class kobj after device is added
PM / wakeup: Fix sysfs registration error path
PM / wakeup: Show wakeup sources stats in sysfs
PM / wakeup: Use wakeup_source_register() in wakelock.c
PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare()
PM: sleep: Replace strncmp() with str_has_prefix()
PM: suspend: Fix platform_suspend_prepare_noirq()
intel-hid: Disable button array during suspend-to-idle
intel-hid: intel-vbtn: Avoid leaking wakeup_mode set
ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices
ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message
ACPI: EC: PM: Consolidate some code depending on PM_SLEEP
ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events()
ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend
ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter
ACPI: PM: s2idle: Rearrange lps0_device_attach()
PM/sleep: Expose suspend stats in sysfs
...

Rafael J. Wysocki
2019-09-17 15:36:34 +0800

10 Sep, 2019

1 commit

77b4b5420 posix-cpu-timers: Fix permission check regression ... Browse Code »

The recent consolidation of the three permission checks introduced a subtle
regression. For timer_create() with a process wide timer it returns the
current task if the lookup through the PID which is encoded into the
clockid results in returning current.

That's broken because it does not validate whether the current task is the
group leader.

That was caused by the two different variants of permission checks:

- posix_cpu_timer_get() allowed access to the process wide clock when the
looked up task is current. That's not an issue because the process wide
clock is in the shared sighand.

- posix_cpu_timer_create() made sure that the looked up task is the group
leader.

Restore the previous state.

Note, that these permission checks are more than questionable, but that's
subject to follow up changes.

Fixes: 6ae40e3fdcd3 ("posix-cpu-timers: Provide task validation functions")
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1909052314110.1902@nanos.tec.linutronix.de

Thomas Gleixner
2019-09-10 19:13:07 +0800

06 Sep, 2019

1 commit

f18ddc13a alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP ... Browse Code »

ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.

On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
"524 Unknown error 524"

Replace it with EOPNOTSUPP which results in the expected "95 Operation not
supported" error.

Fixes: 1c6b39ad3f01 (alarmtimers: Return -ENOTSUPP if no RTC device is present)
Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190903171802.28314-1-cascardo@canonical.com

Thadeu Lima de Souza Cascardo
2019-09-06 03:19:26 +0800

05 Sep, 2019

1 commit

5d2295f3a hrtimer: Add a missing bracket and hide `migration_base' on !SMP ... Browse Code »

The recent change to avoid taking the expiry lock when a timer is currently
migrated missed to add a bracket at the end of the if statement leading to
compile errors. Since that commit the variable `migration_base' is always
used but it is only available on SMP configuration thus leading to another
compile error. The changelog says "The timer base and base->cpu_base
cannot be NULL in the code path", so it is safe to limit this check to SMP
configurations only.

Add the missing bracket to the if statement and hide `migration_base'
behind CONFIG_SMP bars.

[ tglx: Mark the functions inline ... ]

Fixes: 68b2c8c1e4210 ("hrtimer: Don't take expiry_lock when timer is currently migrated")
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20190904145527.eah7z56ntwobqm6j@linutronix.de

Sebastian Andrzej Siewior
2019-09-05 16:39:06 +0800

29 Aug, 2019

1 commit

a2ed4fd68 posix-cpu-timers: Make expiry_active check actually work correctly ... Browse Code »

The state tracking changes broke the expiry active check by not writing to
it and instead sitting timers_active, which is already set.

That's not a big issue as the actual expiry is protected by sighand lock,
so concurrent handling is not possible. That means that the second task
which invokes that function executes the expiry code for nothing.

Write to the proper flag.

Also add a check whether the flag is set into check_process_timers(). That
check had been missing in the code before the rework already. The check for
another task handling the expiry of process wide timers was only done in
the fastpath check. If the fastpath check returns true because a per task
timer expired, then the checking of process wide timers was done in
parallel which is as explained above just a waste of cycles.

Fixes: 244d49e30653 ("posix-cpu-timers: Move state tracking to struct posix_cputimers")
Signed-off-by: Thomas Gleixner
Cc: Frederic Weisbecker

Thomas Gleixner
2019-08-29 18:52:28 +0800

28 Aug, 2019

16 commits

71fed982d tick: Mark sched_timer to expire in hard interrupt context ... Browse Code »

sched_timer must be initialized with the _HARD mode suffix to ensure expiry
in hard interrupt context on RT.

The previous conversion to HARD expiry mode missed on one instance in
tick_nohz_switch_to_nohz(). Fix it up.

Fixes: 902a9f9c50905 ("tick: Mark tick related hrtimers to expiry in hard interrupt context")
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20190823113845.12125-3-bigeasy@linutronix.de

Sebastian Andrzej Siewior
2019-08-28 19:01:26 +0800
60bda037f posix-cpu-timers: Utilize timerqueue for storage ... Browse Code »

Using a linear O(N) search for timer insertion affects execution time and
D-cache footprint badly with a larger number of timers.

Switch the storage to a timerqueue which is already used for hrtimers and
alarmtimers. It does not affect the size of struct k_itimer as it.alarm is
still larger.

The extra list head for the expiry list will go away later once the expiry
is moved into task work context.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908272129220.1939@nanos.tec.linutronix.de

Thomas Gleixner
2019-08-28 17:50:43 +0800
244d49e30 posix-cpu-timers: Move state tracking to struct posix_cputimers ... Browse Code »

Put it where it belongs and clean up the ifdeffery in fork completely.

Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de

Thomas Gleixner
2019-08-28 17:50:42 +0800
8991afe26 posix-cpu-timers: Deduplicate rlimit handling ... Browse Code »

Both thread and process expiry functions have the same functionality for
sending signals for soft and hard RLIMITs duplicated in 4 different
ways.

Split it out into a common function and cleanup the callsites.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.653276779@linutronix.de

Thomas Gleixner
2019-08-28 17:50:42 +0800
dd6702241 posix-cpu-timers: Remove pointless comparisons ... Browse Code »

The soft RLIMIT expiry code checks whether the soft limit is greater than
the hard limit. That's pointless because if the soft RLIMIT is greater than
the hard RLIMIT then that code cannot be reached as the hard RLIMIT check
is before that and already killed the process.

Remove it.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.548747613@linutronix.de

Thomas Gleixner
2019-08-28 17:50:42 +0800
8ea1de90a posix-cpu-timers: Get rid of 64bit divisions ... Browse Code »

Instead of dividing A to match the units of B it's more efficient to
multiply B to match the units of A.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.458286860@linutronix.de

Thomas Gleixner
2019-08-28 17:50:41 +0800
1cd07c0b9 posix-cpu-timers: Consolidate timer expiry further ... Browse Code »

With the array based samples and expiry cache, the expiry function can use
a loop to collect timers from the clock specific lists.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.365469982@linutronix.de

Thomas Gleixner
2019-08-28 17:50:41 +0800
2bbdbdae0 posix-cpu-timers: Get rid of zero checks ... Browse Code »

Deactivation of the expiry cache is done by setting all clock caches to
0. That requires to have a check for zero in all places which update the
expiry cache:

if (cache == 0 || new < cache)
cache = new;

Use U64_MAX as the deactivated value, which allows to remove the zero
checks when updating the cache and reduces it to the obvious check:

if (new < cache)
cache = new;

This also removes the weird workaround in do_prlimit() which was required
to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing
in 0 to the posix CPU timer code would have effectively disarmed it.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de

Thomas Gleixner
2019-08-28 17:50:40 +0800
fe0517f89 posix-cpu-timers: Respect INFINITY for hard RTTIME limit ... Browse Code »

The RTIME limit expiry code does not check the hard RTTIME limit for
INFINITY, i.e. being disabled. Add it.

While this could be considered an ABI breakage if something would depend on
this behaviour. Though it's highly unlikely to have an effect because
RLIM_INFINITY is at minimum INT_MAX and the RTTIME limit is in seconds, so
the timer would fire after ~68 years.

Adding this obvious correct limit check also allows further consolidation
of that code and is a prerequisite for cleaning up the 0 based checks and
the rlimit setter code.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.078293002@linutronix.de

Thomas Gleixner
2019-08-28 17:50:39 +0800
b7be4ef13 posix-cpu-timers: Switch thread group sampling to array ... Browse Code »

That allows more simplifications in various places.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.988426956@linutronix.de

Thomas Gleixner
2019-08-28 17:50:39 +0800
87dc64480 posix-cpu-timers: Restructure expiry array ... Browse Code »

Now that the abused struct task_cputime is gone, it's more natural to
bundle the expiry cache and the list head of each clock into a struct and
have an array of those structs.

Follow the hrtimer naming convention of 'bases' and rename the expiry cache
to 'nextevt' and adapt all usage sites.

Generates also better code .text size shrinks by 80 bytes.

Suggested-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908262021140.1939@nanos.tec.linutronix.de

Thomas Gleixner
2019-08-28 17:50:39 +0800
46b883995 posix-cpu-timers: Remove cputime_expires ... Browse Code »

The last users of the magic struct cputime based expiry cache are
gone. Remove the leftovers.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.790209622@linutronix.de

Thomas Gleixner
2019-08-28 17:50:38 +0800
001f79714 posix-cpu-timers: Make expiry checks array based ... Browse Code »

The expiry cache is an array indexed by clock ids. The new sample functions
allow to retrieve a corresponding array of samples.

Convert the fastpath expiry checks to make use of the new sample functions
and do the comparisons on the sample and the expiry array.

Make the check for the expiry array being zero array based as well.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.695481430@linutronix.de

Thomas Gleixner
2019-08-28 17:50:38 +0800
b0d524f77 posix-cpu-timers: Provide array based sample functions ... Browse Code »

Instead of using task_cputime and doing the addition of utime and stime at
all call sites, it's way simpler to have a sample array which allows
indexed based checks against the expiry cache array.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.590362974@linutronix.de

Thomas Gleixner
2019-08-28 17:50:38 +0800
c02b078e6 posix-cpu-timers: Switch check_*_timers() to array cache ... Browse Code »

Use the array based expiry cache in check_thread_timers() and convert the
store in check_process_timers() for consistency.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.408222378@linutronix.de

Thomas Gleixner
2019-08-28 17:50:36 +0800
1b0dd96d0 posix-cpu-timers: Simplify set_process_cpu_timer() ... Browse Code »

The expiry cache can now be accessed as an array. Replace the per clock
checks with a simple comparison of the clock indexed array member.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192921.303316423@linutronix.de

Thomas Gleixner
2019-08-28 17:50:36 +0800