05 Mar, 2020

2 commits

  • commit 9f24c540f7f8eb3a981528da9a9a636a5bdf5987 upstream.

    The low resolution parts of the VDSO, i.e.:

    clock_gettime(CLOCK_*_COARSE), clock_getres(), time()

    can be used even if there is no VDSO capable clocksource.

    But if an architecture opts out of the VDSO data update then this
    information becomes stale. This affects ARM when there is no architected
    timer available. The lack of update causes userspace to use stale data
    forever.

    Make the update of the low resolution parts unconditional and only skip
    the update of the high resolution parts if the architecture requests it.

    Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200114185946.765577901@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 9a6b55ac4a44060bcb782baf002859b2a2c63267 upstream.

    The function name suggests that this is a boolean checking whether the
    architecture asks for an update of the VDSO data, but it works the other
    way round. To spare further confusion invert the logic.

    Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 Feb, 2020

1 commit

  • [ Upstream commit c79108bd19a8490315847e0c95ac6526fcd8e770 ]

    The alarmtimer_suspend() function will fail if an RTC device is on a bus
    such as SPI or i2c and that RTC device registers and probes after
    alarmtimer_init() registers and probes the 'alarmtimer' platform device.

    This is because system wide suspend suspends devices in the reverse order
    of their probe. When alarmtimer_suspend() attempts to program the RTC for a
    wakeup it will try to program an RTC device on a bus that has already been
    suspended.

    Move the alarmtimer device registration to happen when the RTC which is
    used for wakeup is registered. Register the 'alarmtimer' platform device as
    a child of the RTC device too, so that it can be guaranteed that the RTC
    device won't be suspended when alarmtimer_suspend() is called.

    Reported-by: Douglas Anderson
    Signed-off-by: Stephen Boyd
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Douglas Anderson
    Link: https://lore.kernel.org/r/20200124055849.154411-2-swboyd@chromium.org
    Signed-off-by: Sasha Levin

    Stephen Boyd
     

11 Feb, 2020

2 commits

  • commit febac332a819f0e764aa4da62757ba21d18c182b upstream.

    Kernel crashes inside QEMU/KVM are observed:

    kernel BUG at kernel/time/timer.c:1154!
    BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on().

    At the same time another cpu got:

    general protection fault: 0000 [#1] SMP PTI of poison pointer 0xdead000000000200 in:

    __hlist_del at include/linux/list.h:681
    (inlined by) detach_timer at kernel/time/timer.c:818
    (inlined by) expire_timers at kernel/time/timer.c:1355
    (inlined by) __run_timers at kernel/time/timer.c:1686
    (inlined by) run_timer_softirq at kernel/time/timer.c:1699

    Unfortunately kernel logs are badly scrambled, stacktraces are lost.

    Printing the timer->function before the BUG_ON() pointed to
    clocksource_watchdog().

    The execution of clocksource_watchdog() can race with a sequence of
    clocksource_stop_watchdog() .. clocksource_start_watchdog():

    expire_timers()
     detach_timer(timer, true);
      timer->entry.pprev = NULL;
     raw_spin_unlock_irq(&base->lock);
     call_timer_fn
      clocksource_watchdog()

                        clocksource_watchdog_kthread() or
                        clocksource_unbind()

                        spin_lock_irqsave(&watchdog_lock, flags);
                        clocksource_stop_watchdog();
                         del_timer(&watchdog_timer);
                         watchdog_running = 0;
                        spin_unlock_irqrestore(&watchdog_lock, flags);

                        spin_lock_irqsave(&watchdog_lock, flags);
                        clocksource_start_watchdog();
                         add_timer_on(&watchdog_timer, ...);
                         watchdog_running = 1;
                        spin_unlock_irqrestore(&watchdog_lock, flags);

       spin_lock(&watchdog_lock);
       add_timer_on(&watchdog_timer, ...);
        BUG_ON(timer_pending(timer) || !timer->function);
         timer_pending() -> true
         BUG()

    I.e. inside clocksource_watchdog() watchdog_timer could be already armed.

    Check timer_pending() before calling add_timer_on(). This is sufficient as
    all operations are synchronized by watchdog_lock.
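
    The guard can be illustrated with a toy timer in userspace C. The names
    struct toy_timer and toy_rearm_if_idle() are hypothetical, and the caller
    is assumed to hold the lock that serializes all arm and disarm paths, as
    watchdog_lock does in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy timer: only tracks whether it is currently armed. */
struct toy_timer {
    bool pending;
    int arm_count;  /* how many times the timer was really armed */
};

/*
 * Re-arm the timer only if no other path armed it in the meantime.
 * Assumption: the caller holds the lock that protects the timer state.
 * Returns true if this call armed the timer.
 */
static bool toy_rearm_if_idle(struct toy_timer *t)
{
    assert(t != NULL);

    if (t->pending)
        return false;  /* already armed elsewhere, nothing to do */

    t->pending = true;
    t->arm_count++;
    return true;
}
```

    Arming twice in a row is then harmless: the second call sees the pending
    flag and backs off instead of tripping an assertion.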

    Fixes: 75c5158f70c0 ("timekeeping: Update clocksource with stop_machine")
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • commit 6b6d188aae79a630957aefd88ff5c42af6553ee3 upstream.

    The alarmtimer_rtc_add_device() function creates a wakeup source and then
    tries to grab a module reference. If that fails the function returns early
    with an error code, but fails to remove the wakeup source.

    Clean up this exit path so that no dangling wakeup source, which is
    named 'alarmtimer', is left allocated. A leftover source would conflict
    with another RTC device that may be registered later.
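
    The shape of such an error path, reduced to a userspace sketch. All names
    here (toy_ws_register(), toy_ws_unregister(), toy_add_device() and the
    module_get_ok flag) are hypothetical stand-ins; the -1 return value is an
    assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_wakeup_source {
    const char *name;
};

static int toy_live_sources;  /* registered sources not yet released */

static struct toy_wakeup_source *toy_ws_register(const char *name)
{
    struct toy_wakeup_source *ws = malloc(sizeof(*ws));

    if (!ws)
        return NULL;
    ws->name = name;
    toy_live_sources++;
    return ws;
}

static void toy_ws_unregister(struct toy_wakeup_source *ws)
{
    if (!ws)
        return;
    toy_live_sources--;
    free(ws);
}

/*
 * Acquire resource A (the wakeup source), then resource B (modelled by
 * module_get_ok). If B fails, release A before returning the error.
 */
static int toy_add_device(bool module_get_ok, struct toy_wakeup_source **out)
{
    struct toy_wakeup_source *ws;

    assert(out != NULL);

    ws = toy_ws_register("toy-wakeup");
    if (!ws)
        return -1;

    if (!module_get_ok) {
        toy_ws_unregister(ws);  /* the step the buggy path forgot */
        return -1;
    }

    *out = ws;
    return 0;
}
```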

    Fixes: 51218298a25e ("alarmtimer: Ensure RTC module is not unloaded")
    Signed-off-by: Stephen Boyd
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Douglas Anderson
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200109155910.907-2-swboyd@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     

23 Jan, 2020

1 commit

  • commit de95a991bb72e009f47e0c4bbc90fc5f594588d5 upstream.

    syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64():

    BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64

    write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1:
    tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73
    tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138
    tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292
    __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
    __hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576
    hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638
    local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
    smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
    arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline]
    kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436
    check_access kernel/kcsan/core.c:466 [inline]
    __tsan_read1 kernel/kcsan/core.c:593 [inline]
    __tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593
    kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79
    kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170
    insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline]
    debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256
    full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225
    __vfs_write+0x67/0xc0 fs/read_write.c:494
    vfs_write fs/read_write.c:558 [inline]
    vfs_write+0x18a/0x390 fs/read_write.c:542
    ksys_write+0xd5/0x1b0 fs/read_write.c:611
    __do_sys_write fs/read_write.c:623 [inline]
    __se_sys_write fs/read_write.c:620 [inline]
    __x64_sys_write+0x4c/0x60 fs/read_write.c:620
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0:
    tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62
    tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline]
    tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline]
    tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274
    irq_enter+0x4f/0x60 kernel/softirq.c:354
    entering_irq arch/x86/include/asm/apic.h:517 [inline]
    entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
    smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
    native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
    arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
    default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
    cpuidle_idle_call kernel/sched/idle.c:154 [inline]
    do_idle+0x1af/0x280 kernel/sched/idle.c:263
    cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
    rest_init+0xec/0xf6 init/main.c:452
    arch_call_rest_init+0x17/0x37
    start_kernel+0x838/0x85e init/main.c:786
    x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
    x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
    secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Use READ_ONCE() and WRITE_ONCE() to annotate this expected race.
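
    A rough userspace analogue of such an annotated lockless check, using C11
    relaxed atomics in place of READ_ONCE() and WRITE_ONCE(). The two are not
    identical, and toy_last_update, toy_update_needed() and
    toy_record_update() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared timestamp that one side writes and others poll. */
static _Atomic int64_t toy_last_update;

/*
 * Lockless early check: a single marked load, so the compiler can neither
 * tear the access nor silently repeat it.
 */
static bool toy_update_needed(int64_t now, int64_t period)
{
    int64_t last;

    assert(period > 0);
    last = atomic_load_explicit(&toy_last_update, memory_order_relaxed);
    return now - last >= period;
}

/* Writer side: a single marked store (done under a lock in real code). */
static void toy_record_update(int64_t now)
{
    atomic_store_explicit(&toy_last_update, now, memory_order_relaxed);
}
```

    The race itself stays; the marked accesses only document that it is
    intentional and keep the compiler from transforming the loads and stores.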

    Reported-by: syzbot
    Signed-off-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191205045619.204946-1-edumazet@google.com
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

05 Jan, 2020

2 commits

  • [ Upstream commit a33121e5487b424339636b25c35d3a180eaa5f5e ]

    In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
    device is removed, closing this file leads to a race. This reproduces
    easily in a kvm virtual machine:

    ts# cat openptp0.c
    int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
    ts# uname -r
    5.5.0-rc3-46cf053e
    ts# cat /proc/cmdline
    ... slub_debug=FZP
    ts# modprobe ptp_kvm
    ts# ./openptp0 &
    [1] 670
    opened /dev/ptp0, sleeping 10s...
    ts# rmmod ptp_kvm
    ts# ls /dev/ptp*
    ls: cannot access '/dev/ptp*': No such file or directory
    ts# ...woken up
    [ 48.010809] general protection fault: 0000 [#1] SMP
    [ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
    [ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
    [ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80
    [ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
    [ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
    [ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
    [ 48.019470] ... ^^^ a slub poison
    [ 48.023854] Call Trace:
    [ 48.024050] __fput+0x21f/0x240
    [ 48.024288] task_work_run+0x79/0x90
    [ 48.024555] do_exit+0x2af/0xab0
    [ 48.024799] ? vfs_write+0x16a/0x190
    [ 48.025082] do_group_exit+0x35/0x90
    [ 48.025387] __x64_sys_exit_group+0xf/0x10
    [ 48.025737] do_syscall_64+0x3d/0x130
    [ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 48.026479] RIP: 0033:0x7f53b12082f6
    [ 48.026792] ...
    [ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
    [ 48.045001] Fixing recursive fault but reboot is needed!

    This happens in:

    static void __fput(struct file *file)
    {   ...
        if (file->f_op->release)
            file->f_op->release(inode, file); <<< cdev is kfree'd here
        if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
                     !(mode & FMODE_PATH))) {
            cdev_put(inode->i_cdev); <<< cdev fields are accessed here

    Namely:

    __fput()
      posix_clock_release()
        kref_put(&clk->kref, delete_clock) <<< the last reference
          delete_clock()
            delete_ptp_clock()
              kfree(ptp) <<< cdev is embedded in ptp
      cdev_put
        module_put(p->owner) <<< *p is kfree'd, bang!

    Here cdev is embedded in posix_clock which is embedded in ptp_clock.
    The race happens because ptp_clock's lifetime is controlled by two
    refcounts: kref and cdev.kobj in posix_clock. This is wrong.

    Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
    created especially for such cases. This way the parent device with its
    ptp_clock is not released until all references to the cdev are released.
    This adds a requirement that an initialized but not exposed struct
    device should be provided to posix_clock_register() by a caller instead
    of a simple dev_t.
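
    The lifetime rule can be modelled with a single counter that every user
    of the object pins. This is a userspace toy, not the cdev or struct device
    API: struct toy_clock and its helpers are hypothetical, and "released"
    merely records the point at which a real object would be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object whose lifetime is governed by exactly one refcount. */
struct toy_clock {
    int refs;
    bool released;  /* set when the last reference is dropped */
};

/* The creator (the driver in this model) holds the initial reference. */
static void toy_clock_init(struct toy_clock *c)
{
    assert(c != NULL);
    c->refs = 1;
    c->released = false;
}

/* Every additional user, such as an open file, pins the object. */
static void toy_clock_get(struct toy_clock *c)
{
    assert(c != NULL && c->refs > 0);
    c->refs++;
}

/* Dropping the last reference is the only way the object is released. */
static void toy_clock_put(struct toy_clock *c)
{
    assert(c != NULL && c->refs > 0);
    if (--c->refs == 0)
        c->released = true;
}
```

    In this model, removing the driver while a file is still open drops one
    reference but leaves the object alive until the file is closed too.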

    This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix
    the race between the release of watchdog_core_data and cdev"). See
    details of the implementation in the commit 233ed09d7fda ("chardev: add
    helper function to register char devs with a struct device").

    Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
    Analyzed-by: Stephen Johnston
    Analyzed-by: Vern Lovejoy
    Signed-off-by: Vladis Dronov
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vladis Dronov
     
  • commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream.

    syzbot reported various data races caused by hrtimer_is_queued() reading
    timer->state. A READ_ONCE() is required there to silence the warning.

    Also add the corresponding WRITE_ONCE() when timer->state is set.

    In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
    loading timer->state twice.

    KCSAN reported these cases:

    BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

    write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
    __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
    __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
    __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
    hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
    __do_softirq+0x115/0x33f kernel/softirq.c:292
    run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
    smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
    tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
    tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
    tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
    tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
    tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
    tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
    sk_backlog_rcv include/net/sock.h:945 [inline]
    __release_sock+0x135/0x1e0 net/core/sock.c:2435
    release_sock+0x61/0x160 net/core/sock.c:2951
    sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
    tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
    tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
    inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0x9f/0xc0 net/socket.c:657

    BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

    write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
    __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
    __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
    __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
    hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
    __do_softirq+0x115/0x33f kernel/softirq.c:292
    invoke_softirq kernel/softirq.c:373 [inline]
    irq_exit+0xbb/0xe0 kernel/softirq.c:413
    exiting_irq arch/x86/include/asm/apic.h:536 [inline]
    smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

    read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
    __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
    tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
    tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
    sk_backlog_rcv include/net/sock.h:945 [inline]
    __release_sock+0x135/0x1e0 net/core/sock.c:2435
    release_sock+0x61/0x160 net/core/sock.c:2951
    sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
    tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
    tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
    inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0x9f/0xc0 net/socket.c:657
    __sys_sendto+0x21f/0x320 net/socket.c:1952
    __do_sys_sendto net/socket.c:1964 [inline]
    __se_sys_sendto net/socket.c:1960 [inline]
    __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    [ tglx: Added comments ]

    Reported-by: syzbot
    Signed-off-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

13 Dec, 2019

1 commit

  • commit 7b8474466ed97be458c825f34a85f2c2b84c3f95 upstream.

    On compat interfaces, the high order bits of nanoseconds should be zeroed
    out. This is because neither the application code nor the libc guarantees
    that these bits are zero. If they are used without zeroing, the kernel is
    at risk of interpreting timespec values incorrectly.

    Originally this was handled correctly, but it got lost during the
    in_compat_syscall() cleanup. Revert the condition back to check CONFIG_64BIT.
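
    The zeroing step itself is a simple mask. A minimal userspace sketch,
    assuming a 64-bit nanoseconds field whose upper half may contain garbage;
    struct toy_timespec64, toy_sanitize_compat_nsec() and the compat_caller
    flag are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit timespec as seen after copying from userspace. */
struct toy_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/*
 * For callers that only fill in 32 bits of tv_nsec, clear the upper
 * 32 bits so that leftover padding cannot be read as part of the value.
 */
static void toy_sanitize_compat_nsec(struct toy_timespec64 *ts,
                                     bool compat_caller)
{
    assert(ts != NULL);

    if (compat_caller)
        ts->tv_nsec &= INT64_C(0xFFFFFFFF);
}
```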

    Fixes: 98f76206b335 ("compat: Cleanup in_compat_syscall() callers")
    Reported-by: Ben Hutchings
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20191121000303.126523-1-dima@arista.com
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Safonov
     

12 Nov, 2019

1 commit

  • A cast to 'time_t' was accidentally left in place during the
    conversion of __do_adjtimex() to 64-bit timestamps, so the
    resulting value is incorrectly truncated.

    Remove the cast so the 64-bit time gets propagated correctly.
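
    The effect of such a leftover cast can be reproduced in a few lines. This
    sketch uses an unsigned 32-bit stand-in type so that the narrowing is well
    defined in ISO C; a real 32-bit time_t is signed. toy_narrow_t and both
    helper functions are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 32-bit stand-in for a narrow time type. */
typedef uint32_t toy_narrow_t;

/* Buggy shape: the 64-bit value is squeezed through the narrow type. */
static int64_t toy_copy_sec_narrowed(int64_t sec)
{
    return (int64_t)(toy_narrow_t)sec;
}

/* Fixed shape: no cast, the full 64-bit value is propagated. */
static int64_t toy_copy_sec(int64_t sec)
{
    return sec;
}
```

    Any value that does not fit in 32 bits loses its upper half in the first
    helper and survives intact in the second.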

    Fixes: ead25417f82e ("timex: use __kernel_timex internally")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de

    Arnd Bergmann
     

05 Nov, 2019

1 commit

    The update of the VDSO data depends on __arch_use_vsyscall() returning
    true. This is a leftover from the attempt to map the features of various
    architectures 1:1 into generic code.

    The usage of __arch_use_vsyscall() in the actual vsyscall implementations
    got dropped and replaced by the requirement for the architecture code to
    return U64_MAX if the global clocksource is not usable in the VDSO.

    But the __arch_use_vsyscall() check in the update code stayed, which causes
    the VDSO data to be stale or invalid when an architecture actually
    implements that function and returns false when the current clocksource is
    not usable in the VDSO.

    As a consequence the VDSO implementations of clock_getres(), time(),
    clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
    information.

    Remove the __arch_use_vsyscall() check from the VDSO update function and
    update the VDSO data unconditionally.

    [ tglx: Massaged changelog and removed the now useless implementations in
    asm-generic/ARM64/MIPS ]

    Fixes: 44f57d788e7deecb50 ("timekeeping: Provide a generic update_vsyscall() implementation")
    Signed-off-by: Huacai Chen
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Vincenzo Frascino
    Cc: Arnd Bergmann
    Cc: Paul Burton
    Cc: linux-mips@vger.kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com

    Huacai Chen
     

23 Oct, 2019

2 commits

  • Recent changes modified the function arguments of
    thread_group_sample_cputime() and task_cputimers_expired(), but forgot to
    update the comments. Fix it up.

    [ tglx: Changed the argument name of task_cputimers_expired() as the pointer
    points to an array of samples. ]

    Fixes: b7be4ef1365d ("posix-cpu-timers: Switch thread group sampling to array")
    Fixes: 001f7971433a ("posix-cpu-timers: Make expiry checks array based")
    Signed-off-by: Yi Wang
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1571643852-21848-1-git-send-email-wang.yi59@zte.com.cn

    Yi Wang
     
  • Include the timekeeping.h header to get the declaration of the
    sched_clock_{suspend,resume} functions. Fixes the following sparse
    warnings:

    kernel/time/sched_clock.c:275:5: warning: symbol 'sched_clock_suspend' was not declared. Should it be static?
    kernel/time/sched_clock.c:286:6: warning: symbol 'sched_clock_resume' was not declared. Should it be static?

    Signed-off-by: Ben Dooks (Codethink)
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20191022131226.11465-1-ben.dooks@codethink.co.uk

    Ben Dooks (Codethink)
     

14 Oct, 2019

1 commit

  • Followup to commit dd2261ed45aa ("hrtimer: Protect lockless access
    to timer->base")

    lock_hrtimer_base() fetches timer->base without lock exclusion.

    The compiler is allowed to read timer->base twice (even if considered
    dumb), which could end up trying to lock migration_base and return
    &migration_base.

        base = timer->base;
        if (likely(base != &migration_base)) {

            /* compiler reads timer->base again, and now (base == &migration_base) */

            raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
            if (likely(base == timer->base))
                return base; /* == &migration_base ! */

    Similarly the write sides must use WRITE_ONCE() to avoid store tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20191008173204.180879-1-edumazet@google.com

    Eric Dumazet
     

27 Sep, 2019

2 commits

  • When a cpu requests broadcasting, before starting the tick broadcast
    hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
    using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
    the required synchronization when the callback is active on another core.

    The callback could have already executed tick_handle_oneshot_broadcast()
    and could have also returned. But still there is a small time window where
    the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
    without doing anything, but the next_event of the tick broadcast clock
    device is already set to a timeout value.

    In the race condition diagram below, CPU #1 is running the timer callback
    and CPU #2 is entering idle state and so calls bc_set_next().

    In the worst case, the next_event will contain an expiry time, but the
    hrtimer will not be started, which happens when the racing callback returns
    HRTIMER_NORESTART. The hrtimer might never recover if all further requests
    from the CPUs to subscribe to tick broadcast have a timeout greater than
    the next_event of the tick broadcast clock device. This leads to cascading
    failures, which are finally noticed as RCU stall warnings.

    Here is a depiction of the race condition:

    CPU #1 (Running timer callback)         CPU #2 (Enter idle
                                            and subscribe to
                                            tick broadcast)
    ---------------------                   ---------------------

    __run_hrtimer()                         tick_broadcast_enter()

      bc_handler()                            __tick_broadcast_oneshot_control()

        tick_handle_oneshot_broadcast()

          raw_spin_lock(&tick_broadcast_lock);

          dev->next_event = KTIME_MAX;        //wait for tick_broadcast_lock
          //next_event for tick broadcast clock
          set to KTIME_MAX since no other cores
          subscribed to tick broadcasting

          raw_spin_unlock(&tick_broadcast_lock);

        if (dev->next_event == KTIME_MAX)
          return HRTIMER_NORESTART
        // callback function exits without
           restarting the hrtimer             //tick_broadcast_lock acquired
                                              raw_spin_lock(&tick_broadcast_lock);

                                              tick_broadcast_set_event()

                                                clockevents_program_event()

                                                  dev->next_event = expires;

                                                  bc_set_next()

                                                    hrtimer_try_to_cancel()
                                                    //returns -1 since the timer
                                                    callback is active. Exits without
                                                    restarting the timer
      cpu_base->running = NULL;

    The comment that hrtimer cannot be armed from within the callback is
    wrong. It is fine to start the hrtimer from within the callback. Also it is
    safe to start the hrtimer from the enter/exit idle code while the broadcast
    handler is active. The enter/exit idle code and the broadcast handler are
    synchronized using tick_broadcast_lock. So there is no need for the
    existing try to cancel logic. All this can be removed which will eliminate
    the race condition as well.

    Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
    Originally-by: Thomas Gleixner
    Signed-off-by: Balasubramani Vivekanandan
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com

    Balasubramani Vivekanandan
     
  • Pull timer fix from Ingo Molnar:
    "Fix a timer expiry bug that would cause spurious delay of timers"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timer: Read jiffies once when forwarding base clk

    Linus Torvalds
     

19 Sep, 2019

1 commit

    The "timer delayed for more than 3 seconds" warning was triggered during
    testing.

    Workqueue: events_unbound sched_tick_remote
    RIP: 0010:sched_tick_remote+0xee/0x100
    ...
    Call Trace:
    process_one_work+0x18c/0x3a0
    worker_thread+0x30/0x380
    kthread+0x113/0x130
    ret_from_fork+0x22/0x40

    The reason is that the code in collect_expired_timers() uses jiffies
    unprotected:

        if (next_event > jiffies)
            base->clk = jiffies;

    As the compiler is allowed to reload the value, base->clk can advance
    between the check and the store and, in the worst case, advance farther
    than the next event. That causes the timer expiry to be delayed until the
    wheel pointer wraps around.

    Convert the code to use READ_ONCE().
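
    The snapshot pattern can be sketched in userspace C with a volatile
    counter, where every textual read is a separate load. toy_jiffies,
    struct toy_base and toy_forward_clk() are hypothetical names; only the
    compare-then-store shape is taken from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tick counter that another context may advance at any time. */
static volatile unsigned long toy_jiffies;

struct toy_base {
    unsigned long clk;
};

/*
 * Read the counter exactly once and use that snapshot for both the
 * comparison and the store, so the stored value can never be newer
 * than the value that passed the check.
 * Returns true if base->clk was forwarded.
 */
static bool toy_forward_clk(struct toy_base *base, unsigned long next_event)
{
    unsigned long now;

    assert(base != NULL);
    now = toy_jiffies;  /* the single read */

    if (next_event > now) {
        base->clk = now;
        return true;
    }
    return false;
}
```

    Because the comparison and the assignment share one snapshot, base->clk
    can never end up beyond next_event in this model.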

    Fixes: 236968383cf5 ("timers: Optimize collect_expired_timers() for NOHZ")
    Signed-off-by: Li RongQing
    Signed-off-by: Liang ZhiCheng
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com

    Li RongQing
     

18 Sep, 2019

2 commits

  • Pull power management updates from Rafael Wysocki:
    "These include a rework of the main suspend-to-idle code flow (related
    to the handling of spurious wakeups), a switch over of several users
    of cpufreq notifiers to QoS-based limits, a new devfreq driver for
    Tegra20, a new cpuidle driver and governor for virtualized guests, an
    extension of the wakeup sources framework to expose wakeup sources as
    device objects in sysfs, and more.

    Specifics:

    - Rework the main suspend-to-idle control flow to avoid repeating
    "noirq" device resume and suspend operations in case of spurious
    wakeups from the ACPI EC and decouple the ACPI EC wakeups support
    from the LPS0 _DSM support (Rafael Wysocki).

    - Extend the wakeup sources framework to expose wakeup sources as
    device objects in sysfs (Tri Vo, Stephen Boyd).

    - Expose system suspend statistics in sysfs (Kalesh Singh).

    - Introduce a new haltpoll cpuidle driver and a new matching governor
    for virtualized guests wanting to do guest-side polling in the idle
    loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

    - Fix the menu and teo cpuidle governors to allow the scheduler tick
    to be stopped if PM QoS is used to limit the CPU idle state exit
    latency in some cases (Rafael Wysocki).

    - Increase the resolution of the play_idle() argument to microseconds
    for more fine-grained injection of CPU idle cycles (Daniel
    Lezcano).

    - Switch over some users of cpufreq notifiers to the new QoS-based
    frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
    policy notifier events (Viresh Kumar).

    - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

    - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
    (Andrew-sh.Cheng, Fabien Parent).

    - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
    Huang).

    - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

    - Update the qcom cpufreq driver (among other things, to make it
    easier to extend and to use kryo cpufreq for other nvmem-based
    SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
    RAILLARD, Sibi Sankar, Sricharan R).

    - Fix assorted issues and make assorted minor improvements in the
    cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
    Gustavo Silva, Hariprasad Kelam).

    - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
    Bergmann).

    - Add new Exynos PPMU events to devfreq events and extend that
    mechanism (Lukasz Luba).

    - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

    - Improve devfreq documentation and governor code, fix spelling typos
    in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
    MyungJoo Ham, Gaël PORTAY).

    - Add regulators enable and disable to the OPP (operating performance
    points) framework (Kamil Konieczny).

    - Update the OPP framework to support multiple opp-suspend properties
    (Anson Huang).

    - Fix assorted issues and make assorted minor improvements in the OPP
    code (Niklas Cassel, Viresh Kumar, Yue Hu).

    - Clean up the generic power domains (genpd) framework (Ulf Hansson).

    - Clean up assorted pieces of power management code and documentation
    (Akinobu Mita, Amit Kucheria, Chuhong Yuan).

    - Update the pm-graph tool to version 5.5 including multiple fixes
    and improvements (Todd Brandt).

    - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
    Sébastien Szymanski)"

    * tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
    cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
    cpuidle-haltpoll: do not set an owner to allow modunload
    cpuidle-haltpoll: return -ENODEV on modinit failure
    cpuidle-haltpoll: set haltpoll as preferred governor
    cpuidle: allow governor switch on cpuidle_register_driver()
    PM: runtime: Documentation: add runtime_status ABI document
    pm-graph: make setVal unbuffered again for python2 and python3
    powercap: idle_inject: Use higher resolution for idle injection
    cpuidle: play_idle: Increase the resolution to usec
    cpuidle-haltpoll: vcpu hotplug support
    cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
    cpufreq: qcom: Add support for qcs404 on nvmem driver
    cpufreq: qcom: Refactor the driver to make it easier to extend
    cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
    dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
    dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
    Documentation: cpufreq: Update policy notifier documentation
    cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
    PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
    PM / Domains: Simplify genpd_lookup_dev()
    ...

    Linus Torvalds
     
  • Pull core timer updates from Thomas Gleixner:
    "Timers and timekeeping updates:

    - A large overhaul of the posix CPU timer code which is a preparation
    for moving the CPU timer expiry out into task work so it can be
    properly accounted on the task/process.

    An update to the bogus permission checks will come later during the
    merge window as feedback was not complete before heading off for
    travel.

    - Switch the timerqueue code to use cached rbtrees and get rid of the
    homebrewed caching of the leftmost node.

    - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
    single function

    - Implement the separation of hrtimers to be forced to expire in hard
    interrupt context even when PREEMPT_RT is enabled and mark the
    affected timers accordingly.

    - Implement a mechanism for hrtimers and the timer wheel to protect
    RT against priority inversion and live lock issues when a (hr)timer
    which should be canceled is currently executing the callback.
    Instead of infinitely spinning, the task which tries to cancel the
    timer blocks on a per cpu base expiry lock which is held and
    released by the (hr)timer expiry code.

    - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
    resulting in faster access to timekeeping functions.

    - Updates to various clocksource/clockevent drivers and their device
    tree bindings.

    - The usual small improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    posix-cpu-timers: Fix permission check regression
    posix-cpu-timers: Always clear head pointer on dequeue
    hrtimer: Add a missing bracket and hide `migration_base' on !SMP
    posix-cpu-timers: Make expiry_active check actually work correctly
    posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
    tick: Mark sched_timer to expire in hard interrupt context
    hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
    x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
    posix-cpu-timers: Utilize timerqueue for storage
    posix-cpu-timers: Move state tracking to struct posix_cputimers
    posix-cpu-timers: Deduplicate rlimit handling
    posix-cpu-timers: Remove pointless comparisons
    posix-cpu-timers: Get rid of 64bit divisions
    posix-cpu-timers: Consolidate timer expiry further
    posix-cpu-timers: Get rid of zero checks
    rlimit: Rewrite non-sensical RLIMIT_CPU comment
    posix-cpu-timers: Respect INFINITY for hard RTTIME limit
    posix-cpu-timers: Switch thread group sampling to array
    posix-cpu-timers: Restructure expiry array
    posix-cpu-timers: Remove cputime_expires
    ...

    Linus Torvalds
     

17 Sep, 2019

1 commit

  • * pm-sleep: (29 commits)
    ACPI: PM: s2idle: Always set up EC GPE for system wakeup
    ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily
    PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}()
    PM / wakeup: Register wakeup class kobj after device is added
    PM / wakeup: Fix sysfs registration error path
    PM / wakeup: Show wakeup sources stats in sysfs
    PM / wakeup: Use wakeup_source_register() in wakelock.c
    PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare()
    PM: sleep: Replace strncmp() with str_has_prefix()
    PM: suspend: Fix platform_suspend_prepare_noirq()
    intel-hid: Disable button array during suspend-to-idle
    intel-hid: intel-vbtn: Avoid leaking wakeup_mode set
    ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices
    ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message
    ACPI: EC: PM: Consolidate some code depending on PM_SLEEP
    ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events()
    ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend
    ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter
    ACPI: PM: s2idle: Rearrange lps0_device_attach()
    PM/sleep: Expose suspend stats in sysfs
    ...

    Rafael J. Wysocki
     

10 Sep, 2019

1 commit

  • The recent consolidation of the three permission checks introduced a subtle
    regression. For timer_create() with a process wide timer it returns the
    current task if the lookup through the PID which is encoded into the
    clockid results in returning current.

    That's broken because it does not validate whether the current task is the
    group leader.

    That was caused by the two different variants of permission checks:

    - posix_cpu_timer_get() allowed access to the process wide clock when the
    looked up task is current. That's not an issue because the process wide
    clock is in the shared sighand.

    - posix_cpu_timer_create() made sure that the looked up task is the group
    leader.

    Restore the previous state.

    Note that these permission checks are more than questionable, but that's
    subject to follow up changes.

    Fixes: 6ae40e3fdcd3 ("posix-cpu-timers: Provide task validation functions")
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1909052314110.1902@nanos.tec.linutronix.de

    Thomas Gleixner
     

06 Sep, 2019

1 commit

  • ENOTSUPP is not supposed to be returned to userspace. This was found on an
    OpenPower machine, where the RTC does not support set_alarm.

    On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
    "524 Unknown error 524"

    Replace it with EOPNOTSUPP which results in the expected "95 Operation not
    supported" error.
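A minimal userspace sketch of the distinction follows. The value 524 is taken from the error string quoted above; the helper `sanitize_errno()` is hypothetical and only illustrates the mapping. The patch itself simply returns -EOPNOTSUPP at the source instead of translating afterwards.

```c
#include <errno.h>

/* Kernel-internal error code, value taken from the "524" quoted above.
 * Userspace headers do not define it, so strerror() cannot name it. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: turn a negative kernel return code into one that
 * userspace can interpret. Everything except -524 passes through. */
static inline int sanitize_errno(int err)
{
    if (err == -KERNEL_ENOTSUPP)
        return -EOPNOTSUPP;
    return err;
}
```

With the replacement in place, strerror() on the resulting code yields the "Operation not supported" text mentioned above rather than an unknown-error string.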

    Fixes: 1c6b39ad3f01 (alarmtimers: Return -ENOTSUPP if no RTC device is present)
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190903171802.28314-1-cascardo@canonical.com

    Thadeu Lima de Souza Cascardo
     

05 Sep, 2019

1 commit

  • The recent change to avoid taking the expiry lock when a timer is currently
    migrated omitted the closing bracket of the if statement, leading to
    compile errors. Since that commit the variable `migration_base' is always
    used, but it is only available on SMP configurations, leading to another
    compile error. The changelog says "The timer base and base->cpu_base
    cannot be NULL in the code path", so it is safe to limit this check to SMP
    configurations only.

    Add the missing bracket to the if statement and hide `migration_base'
    behind CONFIG_SMP bars.

    [ tglx: Mark the functions inline ... ]

    Fixes: 68b2c8c1e4210 ("hrtimer: Don't take expiry_lock when timer is currently migrated")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190904145527.eah7z56ntwobqm6j@linutronix.de

    Sebastian Andrzej Siewior
     

29 Aug, 2019

1 commit

  • The state tracking changes broke the expiry active check by not writing to
    it and instead setting timers_active, which is already set.

    That's not a big issue as the actual expiry is protected by sighand lock,
    so concurrent handling is not possible. That means that the second task
    which invokes that function executes the expiry code for nothing.

    Write to the proper flag.

    Also add a check whether the flag is set into check_process_timers(). That
    check had been missing in the code before the rework already. The check for
    another task handling the expiry of process wide timers was only done in
    the fastpath check. If the fastpath check returns true because a per task
    timer expired, then the checking of process wide timers was done in
    parallel which is, as explained above, just a waste of cycles.

    Fixes: 244d49e30653 ("posix-cpu-timers: Move state tracking to struct posix_cputimers")
    Signed-off-by: Thomas Gleixner
    Cc: Frederic Weisbecker

    Thomas Gleixner
     

28 Aug, 2019

16 commits

  • sched_timer must be initialized with the _HARD mode suffix to ensure expiry
    in hard interrupt context on RT.

    The previous conversion to HARD expiry mode missed one instance in
    tick_nohz_switch_to_nohz(). Fix it up.

    Fixes: 902a9f9c50905 ("tick: Mark tick related hrtimers to expiry in hard interrupt context")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190823113845.12125-3-bigeasy@linutronix.de

    Sebastian Andrzej Siewior
     
  • Using a linear O(N) search for timer insertion affects execution time and
    D-cache footprint badly with a larger number of timers.

    Switch the storage to a timerqueue which is already used for hrtimers and
    alarmtimers. It does not affect the size of struct k_itimer as it.alarm is
    still larger.

    The extra list head for the expiry list will go away later once the expiry
    is moved into task work context.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908272129220.1939@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • Put it where it belongs and clean up the ifdeffery in fork completely.

    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de

    Thomas Gleixner
     
  • Both thread and process expiry functions have the same functionality for
    sending signals for soft and hard RLIMITs duplicated in 4 different
    ways.

    Split it out into a common function and clean up the call sites.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.653276779@linutronix.de

    Thomas Gleixner
     
  • The soft RLIMIT expiry code checks whether the soft limit is greater than
    the hard limit. That's pointless because if the soft RLIMIT is greater than
    the hard RLIMIT then that code cannot be reached as the hard RLIMIT check
    is before that and already killed the process.

    Remove it.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.548747613@linutronix.de

    Thomas Gleixner
     
  • Instead of dividing A to match the units of B it's more efficient to
    multiply B to match the units of A.
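As an illustration of the idea, the sketch below compares a runtime in nanoseconds against a limit in seconds. The function names and the choice of units are assumptions made for the example, not taken from the patch. Both variants give the same answer for unsigned integers, provided the multiplication does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Division variant: scale the sample (A) down to the units of the limit (B).
 * This costs a 64-bit division on every check. */
static inline bool limit_reached_div(uint64_t runtime_ns, uint64_t limit_sec)
{
    return runtime_ns / NSEC_PER_SEC >= limit_sec;
}

/* Multiplication variant: scale the limit (B) up to the units of the sample
 * (A). Assumes limit_sec * NSEC_PER_SEC fits into 64 bits. */
static inline bool limit_reached_mul(uint64_t runtime_ns, uint64_t limit_sec)
{
    return runtime_ns >= limit_sec * NSEC_PER_SEC;
}
```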

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.458286860@linutronix.de

    Thomas Gleixner
     
  • With the array based samples and expiry cache, the expiry function can use
    a loop to collect timers from the clock specific lists.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.365469982@linutronix.de

    Thomas Gleixner
     
  • Deactivation of the expiry cache is done by setting all clock caches to
    0. That requires a check for zero in all places which update the
    expiry cache:

        if (cache == 0 || new < cache)
            cache = new;

    Use U64_MAX as the deactivated value, which allows removing the zero
    checks when updating the cache and reduces it to the obvious check:

        if (new < cache)
            cache = new;

    This also removes the weird workaround in do_prlimit() which was required
    to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing
    in 0 to the posix CPU timer code would have effectively disarmed it.
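The difference between the two sentinels can be sketched in plain C. The function names are hypothetical and UINT64_MAX stands in for the kernel's U64_MAX. Note how a requested expiry of 0 is indistinguishable from "deactivated" with the zero sentinel, which is the reason for the workaround described above.

```c
#include <stdint.h>

/* Old scheme: 0 means "cache deactivated", so every update has to test
 * for zero first. Returns the new cache value. */
static inline uint64_t cache_update_zero(uint64_t cache, uint64_t new_expiry)
{
    if (cache == 0 || new_expiry < cache)
        cache = new_expiry;
    return cache;
}

/* New scheme: UINT64_MAX means "cache deactivated". Any real expiry time
 * is smaller, so a plain comparison is sufficient. */
static inline uint64_t cache_update_max(uint64_t cache, uint64_t new_expiry)
{
    if (new_expiry < cache)
        cache = new_expiry;
    return cache;
}
```

With the second variant, an expiry of 0 stored into a deactivated cache yields 0, which differs from the deactivated value and therefore still means "expire immediately".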

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de

    Thomas Gleixner
     
  • The RTTIME limit expiry code does not check the hard RTTIME limit for
    INFINITY, i.e. being disabled. Add it.

    This could be considered an ABI breakage if something depended on
    this behaviour, but it is highly unlikely to have an effect because
    RLIM_INFINITY is at minimum INT_MAX and the RTTIME limit is in seconds, so
    the timer would fire after ~68 years.

    Adding this obviously correct limit check also allows further consolidation
    of that code and is a prerequisite for cleaning up the 0 based checks and
    the rlimit setter code.
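The ~68 years estimate can be checked with a few lines of arithmetic. The sketch takes the message's premise that the limit counts seconds, assumes a 32-bit int, and uses a 365-day year; the helper name is made up for the example.

```c
#include <limits.h>
#include <stdint.h>

/* Seconds in a 365-day year; leap days are ignored for this estimate. */
#define SECONDS_PER_YEAR (365ULL * 24 * 60 * 60)

/* Whole years covered by the given number of seconds. */
static inline uint64_t seconds_to_years(uint64_t seconds)
{
    return seconds / SECONDS_PER_YEAR;
}
```

Calling `seconds_to_years(INT_MAX)` with a 32-bit int gives 68, matching the figure in the message.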

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192922.078293002@linutronix.de

    Thomas Gleixner
     
  • That allows more simplifications in various places.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.988426956@linutronix.de

    Thomas Gleixner
     
  • Now that the abused struct task_cputime is gone, it's more natural to
    bundle the expiry cache and the list head of each clock into a struct and
    have an array of those structs.

    Follow the hrtimer naming convention of 'bases' and rename the expiry cache
    to 'nextevt' and adapt all usage sites.

    This also generates better code: .text size shrinks by 80 bytes.

    Suggested-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908262021140.1939@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • The last users of the magic struct cputime based expiry cache are
    gone. Remove the leftovers.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.790209622@linutronix.de

    Thomas Gleixner
     
  • The expiry cache is an array indexed by clock ids. The new sample functions
    allow retrieving a corresponding array of samples.

    Convert the fastpath expiry checks to make use of the new sample functions
    and do the comparisons on the sample and the expiry array.

    Make the check for the expiry array being zero array based as well.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.695481430@linutronix.de

    Thomas Gleixner
     
  • Instead of using task_cputime and doing the addition of utime and stime at
    all call sites, it's way simpler to have a sample array which allows
    index based checks against the expiry cache array.
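A rough userspace sketch of the approach follows. The enum, the function names, and the mapping of the second and third slot (user time only, scheduler runtime) are assumptions made for illustration; only the sum of utime and stime for the first slot is stated above. UINT64_MAX is used here to mean "no timer armed".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock indices shared by the sample and expiry arrays. */
enum { CLK_PROF, CLK_VIRT, CLK_SCHED, CLK_MAX };

/* Fill the sample array once, so the utime + stime addition happens in a
 * single place instead of at every call site. */
static inline void store_samples(uint64_t *samples, uint64_t stime,
                                 uint64_t utime, uint64_t rtime)
{
    samples[CLK_PROF]  = stime + utime;
    samples[CLK_VIRT]  = utime;
    samples[CLK_SCHED] = rtime;
}

/* Index based check: a clock has expired when its sample reached the
 * cached expiry time stored at the same index. */
static inline bool task_expired(uint64_t stime, uint64_t utime,
                                uint64_t rtime, const uint64_t *expiry)
{
    uint64_t samples[CLK_MAX];

    store_samples(samples, stime, utime, rtime);
    for (int i = 0; i < CLK_MAX; i++) {
        if (samples[i] >= expiry[i])
            return true;
    }
    return false;
}
```

Because both arrays share the same index, adding another clock only means extending the enum and the store function; the comparison loop stays unchanged.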

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.590362974@linutronix.de

    Thomas Gleixner
     
  • Use the array based expiry cache in check_thread_timers() and convert the
    store in check_process_timers() for consistency.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.408222378@linutronix.de

    Thomas Gleixner
     
  • The expiry cache can now be accessed as an array. Replace the per clock
    checks with a simple comparison of the clock indexed array member.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190821192921.303316423@linutronix.de

    Thomas Gleixner