18 Apr, 2019

1 commit

  • Commit 0a0e0829f990 ("nohz: Fix missing tick reprogram when interrupting an
    inline softirq") got backported to stable trees and now causes the NOHZ
    softirq pending warning to trigger. It's not an upstream issue as the NOHZ
    update logic has been changed there.

    The problem is when a softirq disabled section gets interrupted and on
    return from interrupt the tick/nohz state is evaluated, which then can
    observe pending soft interrupts. These soft interrupts are legitimately
    pending because they cannot be processed as long as soft interrupts are
    disabled and the interrupted code will correctly process them when soft
    interrupts are reenabled.

    Add a check for softirqs disabled to the pending check to prevent the
    warning.

    Reported-by: Grygorii Strashko
    Reported-by: John Crispin
    Signed-off-by: Thomas Gleixner
    Tested-by: Grygorii Strashko
    Tested-by: John Crispin
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Anna-Maria Gleixner
    Cc: Greg Kroah-Hartman
    Cc: stable@vger.kernel.org
    Acked-by: Frederic Weisbecker
    Tested-by: Geert Uytterhoeven

    Signed-off-by: Leonard Crestez
    Acked-by: Jason Liu
    Signed-off-by: Arulpandiyan Vadivel

    Thomas Gleixner
     

17 Apr, 2019

1 commit

  • commit 07d7e12091f4ab869cc6a4bb276399057e73b0b3 upstream.

    To calculate a remaining time, it's required to subtract the current time
    from the expiration time. In alarm_timer_remaining() the arguments of
    ktime_sub are swapped.
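
    As a minimal sketch of the operand order (not the kernel code itself):
    ns_time_t and ns_sub() below are hypothetical stand-ins for ktime_t and
    ktime_sub(), assuming plain signed nanoseconds.

```c
#include <stdint.h>

/* Hypothetical stand-in for ktime_t: signed nanoseconds. */
typedef int64_t ns_time_t;

/* Same argument order as ktime_sub(lhs, rhs): returns lhs - rhs. */
static ns_time_t ns_sub(ns_time_t lhs, ns_time_t rhs)
{
	return lhs - rhs;
}

/* Intended computation: time left is expiration time minus current time. */
static ns_time_t remaining_fixed(ns_time_t expires, ns_time_t now)
{
	return ns_sub(expires, now);
}

/* Swapped operands: the result has the right magnitude but the wrong sign. */
static ns_time_t remaining_swapped(ns_time_t expires, ns_time_t now)
{
	return ns_sub(now, expires);
}
```

    With expires = 5000 and now = 2000 the first helper yields 3000 and the
    swapped one yields -3000.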

    Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback")
    Signed-off-by: Andrei Vagin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mukesh Ojha
    Cc: Stephen Boyd
    Cc: John Stultz
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190408041542.26338-1-avagin@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Andrei Vagin
     

13 Feb, 2019

1 commit

  • [ Upstream commit ce10a5b3954f2514af726beb78ed8d7350c5e41c ]

    tk_core.seq is initialized open coded, which fails to initialize the
    lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq
    consequently lack a name and are hard to read.

    Use the proper initializer which takes care of the lockdep map
    initialization.

    [ tglx: Massaged changelog ]

    Signed-off-by: Bart Van Assche
    Signed-off-by: Thomas Gleixner
    Cc: peterz@infradead.org
    Cc: tj@kernel.org
    Cc: johannes.berg@intel.com
    Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
    Signed-off-by: Sasha Levin

    Bart Van Assche
     

31 Jan, 2019

1 commit

  • commit 93ad0fc088c5b4631f796c995bdd27a082ef33a6 upstream.

    The recent commit which prevented a division by 0 issue in the alarm timer
    code broke posix CPU timers as an unwanted side effect.

    The reason is that the common rearm code checks for timer->it_interval
    being 0 now. What went unnoticed is that the posix cpu timer setup does not
    initialize timer->it_interval as it stores the interval in CPU timer
    specific storage. The reason for the separate storage is historical as the
    posix CPU timers always had a 64bit nanoseconds representation internally
    while timer->it_interval is type ktime_t which used to be a modified
    timespec representation on 32bit machines.

    Instead of reverting the offending commit and fixing the alarmtimer issue
    in the alarmtimer code, store the interval in timer->it_interval at CPU
    timer setup time so the common code check works. This also repairs the
    existing inconsistency of the posix CPU timer code which kept a single shot
    timer armed despite the interval being 0.

    The separate storage can be removed in mainline, but that needs to be a
    separate commit as the current one has to be backported to stable kernels.

    Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug")
    Reported-by: H.J. Lu
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Dec, 2018

1 commit

  • commit 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 upstream.

    The signal delivery path of posix-timers can try to rearm the timer even if
    the interval is zero. That's handled for the common case (hrtimer) but not
    for alarm timers. In that case the forwarding function raises a division by
    zero exception.

    The handling for hrtimer based posix timers is wrong because it marks the
    timer as active despite the fact that it is stopped.

    Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure
    both issues.
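
    The idea can be sketched in user-space C as follows. This is an
    illustration under assumptions, not the kernel implementation: struct
    sketch_timer, sketch_forward() and sketch_rearm() are hypothetical names,
    and times are plain nanosecond integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timer state for the sketch. */
struct sketch_timer {
	int64_t expires;   /* next expiry in ns */
	int64_t interval;  /* period in ns; 0 means single shot */
	bool    active;
};

/*
 * Clock specific forwarding: move the expiry past 'now' and return the
 * number of periods that were added. It divides by the interval, so it
 * must never be reached with interval == 0.
 */
static int64_t sketch_forward(struct sketch_timer *t, int64_t now)
{
	int64_t delta = now - t->expires;
	int64_t overruns;

	if (delta < 0)
		return 0;

	overruns = delta / t->interval + 1;
	t->expires += overruns * t->interval;
	return overruns;
}

/*
 * Common rearm path: the zero interval check sits here, in front of every
 * clock specific forwarding function, and a single shot timer is left
 * marked as stopped.
 */
static int64_t sketch_rearm(struct sketch_timer *t, int64_t now)
{
	if (t->interval == 0) {
		t->active = false;
		return 0;
	}

	t->active = true;
	return sketch_forward(t, now);
}
```

    Placing the check in the shared path means no per-clock forwarding
    function has to repeat it.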

    Reported-by: syzbot+9d38bedac9cc77b8ad5e@syzkaller.appspotmail.com
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: sboyd@kernel.org
    Cc: stable@vger.kernel.org
    Cc: syzkaller-bugs@googlegroups.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1812171328050.1880@nanos.tec.linutronix.de
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Sep, 2018

1 commit

  • It turns out that the silly spawn kthread from worker was actually needed.

    clocksource_watchdog_kthread() cannot be called directly from
    clocksource_watchdog_work(), because clocksource_select() calls
    timekeeping_notify() which uses stop_machine(). One cannot use
    stop_machine() from a workqueue() due to lock inversions wrt CPU hotplug.

    Revert the patch but add a comment that explains why we jump through such
    apparently silly hoops.

    Fixes: 7197e77abcb6 ("clocksource: Remove kthread")
    Reported-by: Siegfried Metz
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Niklas Cassel
    Tested-by: Kevin Shanahan
    Tested-by: viktor_jaegerskuepper@freenet.de
    Tested-by: Siegfried Metz
    Cc: rafael.j.wysocki@intel.com
    Cc: len.brown@intel.com
    Cc: diego.viola@gmail.com
    Cc: rui.zhang@intel.com
    Cc: bjorn.andersson@linaro.org
    Link: https://lkml.kernel.org/r/20180905084158.GR24124@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

22 Aug, 2018

1 commit

  • …iederm/user-namespace

    Pull core signal handling updates from Eric Biederman:
    "It was observed that a periodic timer in combination with a
    sufficiently expensive fork could prevent fork from ever completing.
    This contains the changes to remove the need for that restart.

    This set of changes is split into several parts:

    - The first part makes PIDTYPE_TGID a proper pid type instead of
    something only for very special cases. The part starts using
    PIDTYPE_TGID enough so that in __send_signal where signals are
    actually delivered we know if the signal is being sent to a group
    of processes or just a single process.

    - With that prep work out of the way the logic in fork is modified so
    that fork logically makes signals received while it is running
    appear to be received after the fork completes"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
    signal: Don't send signals to tasks that don't exist
    signal: Don't restart fork when signals come in.
    fork: Have new threads join on-going signal group stops
    fork: Skip setting TIF_SIGPENDING in ptrace_init_task
    signal: Add calculate_sigpending()
    fork: Unconditionally exit if a fatal signal is pending
    fork: Move and describe why the code examines PIDNS_ADDING
    signal: Push pid type down into complete_signal.
    signal: Push pid type down into __send_signal
    signal: Push pid type down into send_signal
    signal: Pass pid type into do_send_sig_info
    signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
    signal: Pass pid type into group_send_sig_info
    signal: Pass pid and pid type into send_sigqueue
    posix-timers: Noralize good_sigevent
    signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
    pid: Implement PIDTYPE_TGID
    pids: Move the pgrp and session pid pointers from task_struct to signal_struct
    kvm: Don't open code task_pid in kvm_vcpu_ioctl
    pids: Compute task_tgid using signal->leader_pid
    ...

    Linus Torvalds
     

14 Aug, 2018

3 commits

  • Pull parisc updates from Helge Deller:

    - parisc now uses the generic dma_noncoherent_ops implementation
    (Christoph Hellwig)

    - further memory barrier and spinlock improvements (John David Anglin)

    - prepare removal of current_text_addr() functions (Nick Desaulniers)

    - improve kernel stack unwinding on parisc (me)

    - drop ENOTSUP which was defined on parisc only (me)

    * 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Fix and improve kernel stack unwinding
    parisc: Remove unnecessary barriers from spinlock.h
    parisc: Remove ordered stores from syscall.S
    parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions
    parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature
    parisc: Drop architecture-specific ENOTSUP define
    parisc: use generic dma_noncoherent_ops
    parisc: always use flush_kernel_dcache_range for DMA cache maintainance
    parisc: merge pcx_dma_ops and pcxl_dma_ops

    Linus Torvalds
     
  • Pull x86 timer updates from Thomas Gleixner:
    "Early TSC based time stamping to allow better boot time analysis.

    This comes with a general cleanup of the TSC calibration code which
    grew warts and duct taping over the years and removes 250 lines of
    code. Initiated and mostly implemented by Pavel with help from various
    folks"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    x86/kvmclock: Mark kvm_get_preset_lpj() as __init
    x86/tsc: Consolidate init code
    sched/clock: Disable interrupts when calling generic_sched_clock_init()
    timekeeping: Prevent false warning when persistent clock is not available
    sched/clock: Close a hole in sched_clock_init()
    x86/tsc: Make use of tsc_calibrate_cpu_early()
    x86/tsc: Split native_calibrate_cpu() into early and late parts
    sched/clock: Use static key for sched_clock_running
    sched/clock: Enable sched clock early
    sched/clock: Move sched clock initialization and merge with generic clock
    x86/tsc: Use TSC as sched clock early
    x86/tsc: Initialize cyc2ns when tsc frequency is determined
    x86/tsc: Calibrate tsc only once
    ARM/time: Remove read_boot_clock64()
    s390/time: Remove read_boot_clock64()
    timekeeping: Default boot time offset to local_clock()
    timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
    s390/time: Add read_persistent_wall_and_boot_offset()
    x86/xen/time: Output xen sched_clock time from 0
    x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "The timers department more or less proudly presents:

    - More Y2038 timekeeping work mostly in the core code. The work is
    slowly, but steadily targeting the actual syscalls.

    - Enhanced timekeeping suspend/resume support by utilizing
    clocksources which do not stop during suspend, but are otherwise
    not the main timekeeping clocksources.

    - Make NTP adjustments more accurate and immediate when the frequency
    is set directly and not incrementally.

    - Sanitize the overrun handling of posix timers

    - A new timer driver for Mediatek SoCs

    - The usual pile of fixes and updates all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
    clockevents: Warn if cpu_all_mask is used as cpumask
    tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer
    clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage
    clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag
    timers: Clear timer_base::must_forward_clk with timer_base::lock held
    clocksource/drivers/sprd: Register one always-on timer to compensate suspend time
    clocksource/drivers/timer-mediatek: Add support for system timer
    clocksource/drivers/timer-mediatek: Convert the driver to timer-of
    clocksource/drivers/timer-mediatek: Use specific prefix for GPT
    clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek
    clocksource/drivers/timer-mediatek: Add system timer bindings
    clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask
    time: Introduce one suspend clocksource to compensate the suspend time
    time: Fix extra sleeptime injection when suspend fails
    timekeeping/ntp: Constify some function arguments
    ntp: Use kstrtos64 for s64 variable
    ntp: Remove redundant arguments
    timer: Fix coding style
    ktime: Provide typesafe ktime_to_ns()
    hrtimer: Improve kernel message printing
    ...

    Linus Torvalds
     

13 Aug, 2018

1 commit

  • parisc is the only Linux architecture which has defined a value for ENOTSUP.
    All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers.

    Having a separate value for ENOTSUP that differs from EOPNOTSUPP often causes
    problems with userspace programs which expect both to be the same. One such
    example is a build error in the libuv package, as can be seen in
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237.

    Since we dropped HP-UX support, there is no real benefit in keeping a separate
    value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the
    kernel sources. glibc needs no patch, it reuses the exported headers.

    Signed-off-by: Helge Deller

    Helge Deller
     

02 Aug, 2018

3 commits

  • Using cpu_all_mask in clockevents cpumask may result in issues while
    comparing multiple clockevent devices to choose the preferred one.

    On one of the platforms with 2 system (i.e. non per-CPU) timers with
    different ratings, having cpu_all_mask for one of the devices resulted in a
    boot hang due to an endless loop in clockevents_notify_released() as both
    devices were selected as preferred.

    In order to prevent such issues in the future, warn if any clockevent
    driver sets cpu_all_mask as its cpumask and just override it to use
    cpu_possible_mask. All the existing occurrences of cpu_all_mask are already
    replaced with cpu_possible_mask.
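
    A rough user-space sketch of the warn-and-override behaviour described
    above. Every name here (sketch_cpumask, sketch_all_mask,
    sketch_possible_mask, sketch_register_device) is a hypothetical stand-in
    and not the kernel's API; the mask values are arbitrary.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct cpumask. */
struct sketch_cpumask {
	unsigned long bits;
};

/* Stand-ins for cpu_all_mask and cpu_possible_mask (values are arbitrary). */
static const struct sketch_cpumask sketch_all_mask = { ~0UL };
static const struct sketch_cpumask sketch_possible_mask = { 0xfUL };

/* Hypothetical, stripped-down clockevent device. */
struct sketch_clockevent {
	const char *name;
	const struct sketch_cpumask *cpumask;
};

/*
 * Registration hook: if the driver passed the "all CPUs" mask, complain
 * and replace it with the "possible CPUs" mask.
 * Returns 1 when the mask had to be overridden, 0 otherwise.
 */
static int sketch_register_device(struct sketch_clockevent *dev)
{
	if (dev->cpumask == &sketch_all_mask) {
		fprintf(stderr,
			"%s: cpumask is the all-CPUs mask, using the possible-CPUs mask instead\n",
			dev->name);
		dev->cpumask = &sketch_possible_mask;
		return 1;
	}
	return 0;
}
```

    The check compares pointers rather than mask contents, which is enough
    to catch a driver that passes the all-CPUs mask object directly.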

    Signed-off-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: linux-arm-kernel@lists.infradead.org
    Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com

    Sudeep Holla
     
  • This is the last instance of cpu_all_mask usage in the core framework.

    Replace it with cpu_possible_mask like all other instances in the
    clockevent drivers. This makes it possible to add a warning in the core
    clockevents_register_device on usage of cpu_all_mask from any clockevent
    drivers in the future.

    Signed-off-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: linux-arm-kernel@lists.infradead.org
    Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com

    Sudeep Holla
     
  • timer_base::must_forward_clk is indicating that the base clock might be
    stale due to a long idle sleep.

    The forwarding of the base clock takes place in the timer softirq or when a
    timer is enqueued to a base which is idle. If the enqueue of timer to an
    idle base happens from a remote CPU, then the following race can happen:

    CPU0                                   CPU1
    run_timer_softirq                      mod_timer

                                           base = lock_timer_base(timer);
    base->must_forward_clk = false
                                           if (base->must_forward_clk)
                                               forward(base); -> skipped

                                           enqueue_timer(base, timer, idx);
                                           -> idx is calculated high due to
                                              stale base
                                           unlock_timer_base(timer);
    base = lock_timer_base(timer);
    forward(base);

    The root cause is that timer_base::must_forward_clk is cleared outside the
    timer_base::lock held region, so the remote queuing CPU observes it as
    cleared, but the base clock is still stale. This can cause large
    granularity values for timers, i.e. the accuracy of the expiry time
    suffers.

    Prevent this by clearing the flag with timer_base::lock held, so that the
    forwarding takes place before the cleared flag is observable by a remote
    CPU.

    Signed-off-by: Gaurav Kohli
    Signed-off-by: Thomas Gleixner
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org

    Gaurav Kohli
     

01 Aug, 2018

1 commit

  • local_timer_softirq_pending() checks whether the timer softirq is
    pending with: local_softirq_pending() & TIMER_SOFTIRQ.

    This is wrong because TIMER_SOFTIRQ is the softirq number and not a
    bitmask. So the test checks for the wrong bit.

    Use BIT(TIMER_SOFTIRQ) instead.
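
    The difference between a softirq number and its bitmask can be shown
    with a small sketch. The enum values below are assumed to mirror the
    first entries of the kernel's softirq enumeration, and the two helper
    functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Softirq numbers (indices), assumed to match the kernel's ordering. */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ = 1,
	NET_TX_SOFTIRQ = 2
};

/* Turn a bit number into a single-bit mask. */
#define BIT(nr) (1UL << (nr))

/* Buggy test: uses the softirq number itself as the mask (tests bit 0). */
static bool timer_pending_buggy(unsigned long pending)
{
	return (pending & TIMER_SOFTIRQ) != 0;
}

/* Fixed test: converts the number into a bitmask first (tests bit 1). */
static bool timer_pending_fixed(unsigned long pending)
{
	return (pending & BIT(TIMER_SOFTIRQ)) != 0;
}
```

    With only the timer softirq pending, the buggy test reports nothing
    pending; with only HI_SOFTIRQ pending, it wrongly reports the timer
    softirq as pending.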

    Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Paul E. McKenney
    Reviewed-by: Daniel Bristot de Oliveira
    Acked-by: Frederic Weisbecker
    Cc: bigeasy@linutronix.de
    Cc: peterz@infradead.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de

    Anna-Maria Gleixner
     

31 Jul, 2018

1 commit

  • On arches with no persistent clock a message like this is printed during
    boot:

    [ 0.000000] Persistent clock returned invalid value

    The value is not invalid: Zero means that no persistent clock is available
    and the absence of persistent clock should be quietly accepted.

    Fixes: 3eca993740b8 ("timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()")
    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: sboyd@kernel.org
    Cc: john.stultz@linaro.org
    Link: https://lkml.kernel.org/r/20180725200018.23722-1-pasha.tatashin@oracle.com

    Pavel Tatashin
     

21 Jul, 2018

3 commits

  • Make the code more maintainable by performing more of the signal
    related work in send_sigqueue.

    A quick inspection of do_timer_create will show that this code path
    does not look up a thread group by a thread's pid, making it safe
    to find the task pointed to by it_pid with "pid_task(it_pid, type)".

    This supports the changes needed in fork to tell if a signal was sent
    to a single process or a group of processes.

    Having the pid to task transition in signal.c will also make it easier
    to sort out races with de_thread and the thread group leader
    exiting when it comes time to address that.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • In good_sigevent directly compute the default return value as
    "task_tgid(current)". This is exactly the same as
    "task_pid(current->group_leader)" but written more clearly.

    In the thread case first compute the thread's pid. Then verify that
    attached to that pid is a thread of the current thread group.

    This has the net effect of making the code a little clearer, and
    making it obvious that posix timers never look up a process by the
    pid of a thread.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Everywhere except in the pid array we distinguish between a task's pid and
    a task's tgid (thread group id). Even in the enumeration we want that
    distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid
    we almost have an implementation of PIDTYPE_TGID in struct signal_struct.

    Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
    into the pids array. Then remove the __PIDTYPE_TGID special case and the
    leader_pid in signal_struct.

    The net size increase is just an extra pointer added to struct pid and
    an extra pair of pointers of an hlist_node added to task_struct.

    The effect on code maintenance is the removal of a number of special
    cases today and the potential to remove many more special cases as
    PIDTYPE_TGID gets used to its fullest. The long term potential
    is allowing zombie thread group leaders to exit, which will remove
    a lot more special cases in the code.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Jul, 2018

9 commits

  • …/linux into timers/core

    Pull the second set of timekeeping things for 4.19 from John Stultz

    * NTP argument cleanups and constification from Ondrej Mosnacek
    * Fix to avoid RTC injecting sleeptime when suspend fails from
    Mukesh Ojha
    * Broadening suspend-timing to include non-stop clocksources that
    aren't currently used for timekeeping from Baolin Wang

    Thomas Gleixner
     
  • On some hardware with multiple clocksources, we have coarse grained
    clocksources that support the CLOCK_SOURCE_SUSPEND_NONSTOP flag, but
    which are less than ideal for timekeeping whereas other clocksources
    can be better candidates but halt on suspend.

    Currently, the timekeeping core only supports timing suspend using
    CLOCK_SOURCE_SUSPEND_NONSTOP clocksources if that clocksource is the
    current clocksource for timekeeping.

    As a result, some architectures try to implement read_persistent_clock64()
    using those non-stop clocksources, but that isn't ideal because it
    introduces more duplicate code. To fix this, provide logic to allow a
    registered SUSPEND_NONSTOP clocksource, which isn't the current
    clocksource, to be used to calculate the suspend time.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Cc: Daniel Lezcano
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Daniel Lezcano
    Suggested-by: Thomas Gleixner
    Signed-off-by: Baolin Wang
    [jstultz: minor tweaks to merge with previous resume changes]
    Signed-off-by: John Stultz

    Baolin Wang
     
  • Currently, there is a corner case when there is only one clocksource,
    e.g. RTC, and the system fails to enter suspend mode. On resume,
    rtc_resume() injects the sleeptime because timekeeping_rtc_skipresume()
    returns 'false' (the default value of sleeptime_injected), which
    results in mismatched timestamps.

    This issue can also occur on a system where more than one clocksource
    is present and the very first suspend fails.

    Success case:
    ------------
    {sleeptime_injected=false}
    rtc_suspend() => timekeeping_suspend() => timekeeping_resume() =>

    (sleeptime injected)
    rtc_resume()

    Failure case:
    ------------
    {failure in sleep path} {sleeptime_injected=false}
    rtc_suspend() => rtc_resume()

    {sleeptime injected again which was not required as the suspend failed}

    Fix this by handling the boolean logic properly.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Originally-by: Thomas Gleixner
    Signed-off-by: Mukesh Ojha
    Signed-off-by: John Stultz

    Mukesh Ojha
     
  • Add 'const' to some function arguments and variables to make it easier
    to read the code.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Signed-off-by: Ondrej Mosnacek
    [jstultz: Also fixup pre-existing checkpatch warnings for
    prototype arguments with no variable name]
    Signed-off-by: John Stultz

    Ondrej Mosnacek
     
  • sched_clock_postinit() initializes a generic clock on systems where no
    other clock is provided. This function may be called only after
    timekeeping_init().

    Rename sched_clock_postinit() to generic_sched_clock_init() and call it
    from sched_clock_init(). Move the call to sched_clock_init() to after
    time_init().

    Suggested-by: Peter Zijlstra
    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-23-pasha.tatashin@oracle.com

    Pavel Tatashin
     
  • read_persistent_wall_and_boot_offset() is called during boot to read
    both the persistent clock and also return the offset between the boot time
    and the value of persistent clock.

    Change the default boot_offset from zero to local_clock() so that
    architectures which do not have a dedicated boot_clock but have an early
    sched_clock(), such as SPARCv9, x86, and possibly more, will benefit from
    this change by getting a better and more consistent estimate of the boot
    time without the need for an arch specific implementation.

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-17-pasha.tatashin@oracle.com

    Pavel Tatashin
     
  • If an architecture does not support exact boot time, it is challenging to
    estimate boot time without having a reference to the current persistent
    clock value. Yet, it cannot read the persistent clock time again, because
    this may lead to math discrepancies with the caller of read_boot_clock64()
    which has read the persistent clock at a different time.

    This is why it is better to provide two values simultaneously: the
    persistent clock value, and the boot time.

    Replace read_boot_clock64() with:
    read_persistent_wall_and_boot_offset(wall_time, boot_offset)

    Where wall_time is returned by read_persistent_clock() and boot_offset is
    wall_time - boot time, which defaults to 0.
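
    A small sketch of why returning both values from a single read keeps
    the arithmetic consistent. All names (sketch_boot_times,
    sketch_read_persistent, sketch_read_wall_and_boot_offset,
    sketch_boot_time) are hypothetical, and the fake persistent clock simply
    advances on each read to mimic time passing between two reads.

```c
#include <stdint.h>

/* Both values captured from one read of the persistent clock. */
struct sketch_boot_times {
	int64_t wall_time_ns;   /* persistent clock reading */
	int64_t boot_offset_ns; /* wall_time - boot time */
};

/* Hypothetical persistent clock: every read returns a later value. */
static int64_t sketch_persistent_ns = 1000;

static int64_t sketch_read_persistent(void)
{
	sketch_persistent_ns += 7;
	return sketch_persistent_ns;
}

/*
 * Read the persistent clock exactly once and pair it with the offset
 * since boot. Passing 0 models the default boot_offset.
 */
static struct sketch_boot_times
sketch_read_wall_and_boot_offset(int64_t since_boot_ns)
{
	struct sketch_boot_times t;

	t.wall_time_ns = sketch_read_persistent();
	t.boot_offset_ns = since_boot_ns;
	return t;
}

/* Boot time follows from the pair: wall_time - boot_offset. */
static int64_t sketch_boot_time(const struct sketch_boot_times *t)
{
	return t->wall_time_ns - t->boot_offset_ns;
}
```

    Because both fields come from the same read, a caller never mixes a
    wall time taken at one instant with an offset derived at another.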

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-16-pasha.tatashin@oracle.com

    Pavel Tatashin
     
  • ...instead of kstrtol with a dirty cast.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: John Stultz

    Ondrej Mosnacek
     
  • The 'ts' argument of process_adj_status() and process_adjtimex_modes()
    is unused and can be safely removed.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: John Stultz

    Ondrej Mosnacek
     

19 Jul, 2018

1 commit

  • The call to wake_up_nohz_cpu() is incorrectly indented. Remove the surplus TAB.

    Signed-off-by: Yi Wang
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Jiang Biao
    Cc: john.stultz@linaro.org
    Cc: sboyd@kernel.org
    Cc: zhong.weidong@zte.com.cn
    Cc: Anna-Maria Gleixner
    Link: https://lkml.kernel.org/r/1531721337-30284-1-git-send-email-wang.yi59@zte.com.cn

    Yi Wang
     

13 Jul, 2018

2 commits


11 Jul, 2018

2 commits

  • This reverts commit 1332a90558013ae4242e3dd7934bdcdeafb06c0d.

    The original issue was not because of incorrect checking of cpumask for
    both new and old tick device. It was incorrectly analysed due to a
    misunderstanding of the comment and misinterpretation of the return value
    from tick_check_preferred. The main issue is with the clockevent driver
    that sets the cpumask to cpu_all_mask instead of cpu_possible_mask.

    Signed-off-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Tested-by: Kevin Hilman
    Tested-by: Martin Blumenstingl
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Marc Zyngier
    Link: https://lkml.kernel.org/r/1531151136-18297-1-git-send-email-sudeep.holla@arm.com

    Sudeep Holla
     
  • When the NTP frequency is set directly from userspace using the
    ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the
    timekeeper's multiplier instead of waiting for the next tick.

    This removes a hidden non-deterministic delay in setting of the
    frequency and allows an extremely tight control of the system clock
    with update rates close to or even exceeding the kernel HZ.

    The update is limited to archs using modern timekeeping
    (!ARCH_USES_GETTIMEOFFSET).

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Signed-off-by: Miroslav Lichvar
    Signed-off-by: John Stultz

    Miroslav Lichvar
     

02 Jul, 2018

3 commits

  • Air Icy reported:

    UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
    signed integer overflow:
    1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
    Call Trace:
    alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
    __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
    __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
    __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
    do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

    alarm_timer_nsleep() uses ktime_add() to add the current time and the
    relative expiry value. ktime_add() has no sanity checks so the addition
    can overflow when the relative timeout is large enough.

    Use ktime_add_safe() which has the necessary sanity checks in place and
    limits the result to the valid range.
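
    A saturating addition along these lines can be sketched in portable C.
    This is an illustration, not the kernel's ktime_add_safe(): ns_add_safe()
    is a hypothetical name, values are plain int64_t nanoseconds, and the
    clamp for two negative operands is a choice made for this sketch only.

```c
#include <stdint.h>

/*
 * Add two nanosecond values without signed overflow.
 * A sum that would exceed INT64_MAX is clamped to INT64_MAX.
 */
static int64_t ns_add_safe(int64_t lhs, int64_t rhs)
{
	/* Two non-negative operands: check the headroom before adding. */
	if (lhs >= 0 && rhs >= 0 && lhs > INT64_MAX - rhs)
		return INT64_MAX;

	/* Two negative operands: clamp at the lower bound (sketch's choice). */
	if (lhs < 0 && rhs < 0 && lhs < INT64_MIN - rhs)
		return INT64_MIN;

	/* Mixed signs can never overflow. */
	return lhs + rhs;
}
```

    Feeding it the two operands from the UBSAN report above returns
    INT64_MAX instead of overflowing.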

    Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers")
    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • The posix timer overrun handling is broken because the forwarding functions
    can return a huge number of overruns which does not fit in an int. As a
    consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
    random number generators.

    The k_clock::timer_forward() callbacks return a 64 bit value now. Make
    k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
    accounting is correct. 3Remove the temporary (int) casts.

    Add a helper function which clamps the overrun value returned to user space
    via timer_getoverrun(2) or siginfo::si_overrun to the range 0 to INT_MAX.
    INT_MAX is an indicator for user space that the overrun value has been
    clamped.
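
    A minimal sketch of such a clamping helper is shown below. The name
    sketch_overrun_to_int() and its signature are hypothetical and chosen
    for illustration only; the helper added by this commit may take
    different arguments.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Clamp a 64-bit kernel-internal overrun count to the int range that
 * user space can see. Negative inputs are mapped to 0 and anything
 * above INT_MAX is reported as INT_MAX (assumption for this sketch).
 */
static int sketch_overrun_to_int(int64_t overrun)
{
	if (overrun < 0)
		return 0;
	if (overrun > (int64_t)INT_MAX)
		return INT_MAX;

	return (int)overrun;
}
```

    A caller that receives INT_MAX cannot tell whether the true count was
    exactly INT_MAX or larger, which is why the value doubles as the
    "clamped" indicator.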

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de

    Thomas Gleixner
     
  • The posix timer ti_overrun handling is broken because the forwarding
    functions can return a huge number of overruns which does not fit in an
    int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
    into random number generators.

    As a first step to address that let the timer_forward() callbacks return
    the full 64 bit value.

    Cast it to (int) temporarily until k_itimer::ti_overrun is converted to
    64bit and the conversion to user space visible values is sanitized.

    Reported-by: Team OWL337
    Signed-off-by: Thomas Gleixner
    Acked-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Michael Kerrisk
    Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de

    Thomas Gleixner
     

24 Jun, 2018

3 commits

  • timer_set/gettime and timerfd_set/get APIs use struct itimerspec at the
    user interface layer. struct itimerspec is not y2038-safe. Change these
    interfaces to use y2038-safe struct __kernel_itimerspec instead. This will
    help define new syscalls when 32bit architectures select CONFIG_64BIT_TIME.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Thomas Gleixner
    Cc: arnd@arndb.de
    Cc: viro@zeniv.linux.org.uk
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: y2038@lists.linaro.org
    Link: https://lkml.kernel.org/r/20180617051144.29756-4-deepa.kernel@gmail.com

    Deepa Dinamani
     
  • This will aid in enabling the compat syscalls on 32-bit architectures later
    on.

    Also move compat_itimerspec and related defines to compat_time.h. The
    compat_time.h file will eventually be deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Thomas Gleixner
    Cc: arnd@arndb.de
    Cc: viro@zeniv.linux.org.uk
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: y2038@lists.linaro.org
    Link: https://lkml.kernel.org/r/20180617051144.29756-3-deepa.kernel@gmail.com

    Deepa Dinamani
     
  • struct itimerspec is not y2038-safe.

    Introduce a new struct __kernel_itimerspec based on the kernel internal
    y2038-safe struct itimerspec64.

    The definition of struct __kernel_itimerspec includes two struct
    __kernel_timespec.

    Since struct __kernel_timespec has the same representation in native and
    compat modes, so does struct __kernel_itimerspec. This helps have a common
    entry point for syscalls using struct __kernel_itimerspec.
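
    The layout argument can be illustrated with a userspace sketch. The
    sketch_ type names are hypothetical stand-ins for struct
    __kernel_timespec and struct __kernel_itimerspec, and the use of
    fixed 64-bit fields is an assumption made to show why the
    representation does not depend on the native word size.

```c
#include <stdint.h>

/* Stand-in for struct __kernel_timespec: both fields are 64 bits wide. */
struct sketch_kernel_timespec {
	int64_t   tv_sec;	/* seconds, 64-bit even on 32-bit targets */
	long long tv_nsec;	/* nanoseconds */
};

/* Stand-in for struct __kernel_itimerspec: two timespecs back to back. */
struct sketch_kernel_itimerspec {
	struct sketch_kernel_timespec it_interval;	/* timer period */
	struct sketch_kernel_timespec it_value;		/* next expiration */
};
```

    Because every member has a fixed 64-bit width, a 32-bit and a 64-bit
    caller would pass the same 32 bytes, so one syscall entry point can
    serve both.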

    New y2038-safe syscalls will use this new type. Since most of the new
    syscalls are just an update to the native syscalls with the type update,
    place the new definition under CONFIG_64BIT_TIME. This helps architectures
    that do not support the above config to keep using the old definition of
    struct itimerspec.

    Also change the get/put_itimerspec64 to use struct __kernel_itimerspec.
    This will help 32 bit architectures to use the new syscalls when
    architectures select CONFIG_64BIT_TIME.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Thomas Gleixner
    Cc: arnd@arndb.de
    Cc: viro@zeniv.linux.org.uk
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Cc: y2038@lists.linaro.org
    Link: https://lkml.kernel.org/r/20180617051144.29756-2-deepa.kernel@gmail.com

    Deepa Dinamani
     

22 Jun, 2018

1 commit

  • For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of
    1000, jiffies_to_msecs() never returns zero when passed a non-zero time
    period.

    However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or
    1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero
    for small non-zero time periods. This may break code that relies on
    receiving back a non-zero value.
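
    The arithmetic can be checked with a small sketch for HZ = 1024. Both
    function names and the SKETCH_HZ constant are hypothetical. Rounding
    up is shown as one way to avoid a zero result; it is an assumption
    for illustration and not necessarily the code used in this commit.

```c
#define SKETCH_HZ 1024U		/* assumed tick rate, not a multiple of 1000 */
#define SKETCH_MSEC_PER_SEC 1000U

/* Truncating conversion: one jiffy at 1024 Hz becomes 0 ms. */
static unsigned int sketch_jiffies_to_msecs_trunc(unsigned long j)
{
	return (unsigned int)((j * SKETCH_MSEC_PER_SEC) / SKETCH_HZ);
}

/* Rounding-up conversion: any non-zero jiffy count gives at least 1 ms. */
static unsigned int sketch_jiffies_to_msecs_roundup(unsigned long j)
{
	return (unsigned int)((j * SKETCH_MSEC_PER_SEC + SKETCH_HZ - 1) /
			      SKETCH_HZ);
}
```

    One jiffy at 1024 Hz is about 0.98 ms, so truncation yields 0 while
    rounding up yields 1. The sketch ignores multiplication overflow for
    very large jiffy counts.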

    jiffies_to_usecs() does not need such a fix: one jiffy can only be less
    than one µs if HZ > 1000000, and such large values of HZ are already
    rejected at build time, twice:

    - include/linux/jiffies.h does #error if HZ >= 12288,
    - kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC).

    Broken since forever.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Arnd Bergmann
    Cc: John Stultz
    Cc: Stephen Boyd
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org

    Geert Uytterhoeven