21 Jun, 2017

2 commits

  • CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago
    to allow a transition from the old vsyscall implementations to
    the new method (which simplified internal accounting and made
    timekeeping more precise).

    However, PPC and IA64 have yet to make the transition, despite
    in some cases me sending test patches to try to help it along.

    http://patches.linaro.org/patch/30501/
    http://patches.linaro.org/patch/35412/

    If it's helpful, my last pass at the patches can be found here:
    https://git.linaro.org/people/john.stultz/linux.git dev/oldvsyscall-cleanup

    So I think it's time to set a deadline and make it clear this
    is going away. So this patch adds warnings about this
    functionality being dropped. Likely to be in v4.15.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Marcelo Tosatti
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Tony Luck
    Cc: Michael Ellerman
    Cc: Fenghua Yu
    Signed-off-by: John Stultz

    John Stultz
     
  • Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
    remove the duplicative tk->raw_time.tv_nsec, which can be
    stored in tk->tkr_raw.xtime_nsec (similarly to how it's handled
    for monotonic time).

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Stephen Boyd
    Cc: Kevin Brodsky
    Cc: Will Deacon
    Cc: Daniel Mentz
    Tested-by: Daniel Mentz
    Signed-off-by: John Stultz

    John Stultz
     

20 Jun, 2017

2 commits

  • Due to how the MONOTONIC_RAW accumulation logic was handled,
    there is the potential for a 1ns discontinuity when we do
    accumulations. This small discontinuity has for the most part
    gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
    in their vDSO clock_gettime implementation, we've seen failures
    with the inconsistency-check test in kselftest.

    This patch addresses the issue by using the same sub-ns
    accumulation handling that CLOCK_MONOTONIC uses, which avoids
    the issue for in-kernel users.

    Since the ARM64 vDSO implementation has its own clock_gettime
    calculation logic, this patch reduces the frequency of errors,
    but failures are still seen. The ARM64 vDSO will need to be
    updated to include the sub-nanosecond xtime_nsec values in its
    calculation for this issue to be completely fixed.
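
    A minimal sketch of why keeping the sub-nanosecond remainder matters,
    assuming a hypothetical clocksource with mult = 384 and shift = 8
    (1.5 ns per cycle). The names read_whole_ns() and read_sub_ns() are
    illustrative only and are not kernel functions:

```c
#include <stdint.h>

#define EX_SHIFT 8u
#define EX_MULT  384u	/* hypothetical: 384 / 2^8 = 1.5 ns per cycle */

/*
 * Whole-ns scheme: the accumulated base is stored in whole nanoseconds,
 * so the fractional part of each accumulated interval is discarded.
 */
static uint64_t read_whole_ns(uint64_t base_ns, uint64_t delta_cycles)
{
	return base_ns + ((delta_cycles * EX_MULT) >> EX_SHIFT);
}

/*
 * Sub-ns scheme: the base stays in shifted units and the shift is applied
 * once, after the base and the scaled delta have been summed.
 */
static uint64_t read_sub_ns(uint64_t base_shifted_ns, uint64_t delta_cycles)
{
	return (base_shifted_ns + delta_cycles * EX_MULT) >> EX_SHIFT;
}
```

    With a 3-cycle accumulation interval, a read at cycle 4 gives 6 ns
    before the accumulation and 5 ns after it in the whole-ns scheme; in
    the sub-ns scheme both reads give 6 ns.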

    Signed-off-by: John Stultz
    Tested-by: Daniel Mentz
    Cc: Prarit Bhargava
    Cc: Kevin Brodsky
    Cc: Richard Cochran
    Cc: Stephen Boyd
    Cc: Will Deacon
    Cc: "stable #4 . 8+"
    Cc: Miroslav Lichvar
    Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     
  • In tests which exercise switching of clocksources, a NULL
    pointer dereference can be observed on ARM64 platforms in the
    clocksource read() function:

    u64 clocksource_mmio_readl_down(struct clocksource *c)
    {
            return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
    }

    This is called from the core timekeeping code via:

    cycle_now = tkr->read(tkr->clock);

    tkr->read is the cached tkr->clock->read() function pointer.
    When the clocksource is changed then tkr->clock and tkr->read
    are updated sequentially. The code above results in a sequential
    load operation of tkr->read and tkr->clock as well.

    If the store to tkr->clock hits between the loads of tkr->read
    and tkr->clock, then the old read() function is called with the
    new clock pointer. As a consequence the read() function
    dereferences a different data structure and the resulting 'reg'
    pointer can point anywhere including NULL.

    This problem was introduced when the timekeeping code was
    switched over to use struct tk_read_base. Before that, it was
    theoretically possible as well when the compiler decided to
    reload clock in the code sequence:

    now = tk->clock->read(tk->clock);

    Add a helper function which avoids the issue by reading
    tk_read_base->clock once into a local variable clk and then issuing
    the read function via clk->read(clk). This guarantees that the
    read() function always gets the proper clocksource pointer handed
    in.

    Since there is now no use for the tkr.read pointer, this patch
    also removes it, and to address stopping the fast timekeeper
    during suspend/resume, it introduces a dummy clocksource to use
    rather than just a dummy read function.
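
    The helper described above could look roughly like the following
    userspace sketch. The struct layouts are reduced to the fields needed
    here, tk_clock_read_sketch() and example_read() are hypothetical
    names, and the volatile cast stands in for the kernel's READ_ONCE():

```c
#include <stdint.h>

struct clocksource {
	uint64_t (*read)(struct clocksource *cs);
	uint64_t mask;
	uint64_t counter;	/* stand-in for a hardware counter register */
};

struct tk_read_base {
	struct clocksource *clock;
};

/*
 * Load the clock pointer exactly once, then use that single value both to
 * look up read() and as its argument, so the two can never disagree even
 * if tkr->clock is replaced concurrently.
 */
static uint64_t tk_clock_read_sketch(struct tk_read_base *tkr)
{
	struct clocksource *clk = *(struct clocksource * volatile *)&tkr->clock;

	return clk->read(clk);
}

/* Example read() callback: return the masked counter value. */
static uint64_t example_read(struct clocksource *cs)
{
	return cs->counter & cs->mask;
}
```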

    Signed-off-by: John Stultz
    Acked-by: Ingo Molnar
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Stephen Boyd
    Cc: stable
    Cc: Miroslav Lichvar
    Cc: Daniel Mentz
    Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     

31 Mar, 2017

1 commit


02 Mar, 2017

2 commits


07 Jan, 2017

1 commit

  • The last caller to timekeeping_set_tai_offset() was in commit
    0b5154fb9040 (timekeeping: Simplify tai updating from
    do_adjtimex, 2013-03-22) and the last caller to
    timekeeping_get_tai_offset() was in commit 76f4108892d9 (hrtimer:
    Cleanup hrtimer accessors to the timekepeing state, 2014-07-16).
    Remove these unused functions now that we handle TAI offsets
    differently.

    Cc: John Stultz
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz

    Stephen Boyd
     

26 Dec, 2016

1 commit

  • ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant for 32-bit machines. The Y2038 cleanup removed the
    timespec variant and switched everything to scalar nanoseconds. The
    union remained, but became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     

25 Dec, 2016

1 commit


09 Dec, 2016

4 commits

  • The resume code must deal with a clocksource delta which is potentially big
    enough to overflow the 64bit mult.

    Replace the open coded handling with the proper function.
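
    The commit text does not name the function it switches to. As an
    illustration of the general technique only, a multiply-and-shift that
    does not lose the high bits can be built by splitting the 64-bit
    operand into 32-bit halves. mul_shr_sketch() is a hypothetical name,
    and the sketch assumes shift <= 32:

```c
#include <stdint.h>

/*
 * Compute (a * mul) >> shift without truncating the intermediate product
 * to 64 bits. Assumes shift <= 32 and that the final result fits in 64
 * bits.
 */
static uint64_t mul_shr_sketch(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint32_t ah = (uint32_t)(a >> 32);	/* high half of a */
	uint32_t al = (uint32_t)a;		/* low half of a */
	uint64_t ret;

	ret = ((uint64_t)al * mul) >> shift;
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}
```

    For a = 2^40, mul = 2^30 and shift = 10, the plain 64-bit expression
    (a * mul) >> shift wraps to 0, while the split version returns 2^60.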

    Signed-off-by: Thomas Gleixner
    Reviewed-by: David Gibson
    Acked-by: Peter Zijlstra (Intel)
    Cc: Parit Bhargava
    Cc: Laurent Vivier
    Cc: "Christopher S. Hall"
    Cc: Chris Metcalf
    Cc: Richard Cochran
    Cc: Liav Rehana
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • cycle_t is defined as u64, so casting it to u64 is a pointless and
    confusing exercise. cycle_t should simply go away and be replaced with a
    plain u64 to avoid further confusion.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: David Gibson
    Acked-by: Peter Zijlstra (Intel)
    Cc: Parit Bhargava
    Cc: Laurent Vivier
    Cc: "Christopher S. Hall"
    Cc: Chris Metcalf
    Cc: Richard Cochran
    Cc: Liav Rehana
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Propagating an unsigned value through signed variables and functions makes
    absolutely no sense and is just prone to (re)introduce subtle signed
    vs. unsigned issues as happened recently.

    Clean it up.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: David Gibson
    Acked-by: Peter Zijlstra (Intel)
    Cc: Parit Bhargava
    Cc: Laurent Vivier
    Cc: "Christopher S. Hall"
    Cc: Chris Metcalf
    Cc: Richard Cochran
    Cc: Liav Rehana
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The clocksource delta to nanoseconds conversion is using signed math, but
    the delta is unsigned. This makes the conversion space smaller than
    necessary and in case of a multiplication overflow the conversion can
    become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

    Shifting a signed integer right obviously preserves the sign, which has
    interesting consequences:

    - Time jumps backwards

    - __iter_div_u64_rem() which is used in one of the calling code paths
    will take forever to piecewise calculate the seconds/nanoseconds part.

    This has been reported by several people with different scenarios:

    David observed that when stopping a VM with a debugger:

    "It was essentially the stopped by debugger case. I forget exactly why,
    but the guest was being explicitly stopped from outside, it wasn't just
    scheduling lag. I think it was something in the vicinity of 10 minutes
    stopped."

    When lifting the stop the machine went dead.

    The stopped by debugger case is not really interesting, but nevertheless it
    would be a good thing not to die completely.

    But this was also observed on a live system by Liav:

    "When the OS is too overloaded, delta will get a high enough value for the
    msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
    after the shift the nsec variable will gain a value similar to
    0xffffffffff000000."

    Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
    ("time: Add cycles to nanoseconds translation"). It had been fixed a year
    ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
    timekeeping_get_ns()").

    Though it's not surprising that the issue has been reintroduced because the
    function itself and the whole call chain uses s64 for the result and the
    propagation of it. The change in this recent commit is subtle:

    s64 nsec;

    - nsec = (d * m + n) >> s;
    + nsec = d * m + n;
    + nsec >>= s;

    d being type of cycle_t adds another level of obfuscation.

    This wouldn't have happened if the previous change to unsigned computation
    would have made the 'nsec' variable u64 right away and a follow up patch
    had cleaned up the whole call chain.

    There have been patches submitted which basically did a revert of the above
    patch leaving everything else unchanged as signed. Back to square one. This
    spawned an admittedly pointless discussion about potential users which rely
    on the unsigned behaviour until someone pointed out that it had been fixed
    before. The changelogs of said patches added further confusion as they
    ultimately made false claims about the consequences for eventual users
    which expect signed results.

    Despite delta being cycle_t, aka. u64, it's very well possible to hand in
    a signed negative value and the signed computation will happily return the
    correct result. But nobody actually sat down and analyzed the code which
    was added as user after the probably unintended signed conversion.

    Though in sensitive code like this it's better to analyze it properly and
    make sure that nothing relies on this than hunting the subtle wreckage half
    a year later. After analyzing all call chains it stands that no caller can
    hand in a negative value (which actually would work due to the s64 cast)
    and rely on the signed math to do the right thing.

    Change the conversion function to unsigned math. The conversion of all call
    chains is done in a follow up patch.
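
    A small sketch of the effect, assuming an arithmetic right shift for
    negative signed values (as implemented by GCC and Clang). The function
    names are illustrative; the multiplication is done unsigned in both
    variants so the example itself stays free of undefined behavior:

```c
#include <stdint.h>

/*
 * Signed variant: once bit 63 of the product is set, the value is
 * negative, and the right shift keeps it negative.
 */
static int64_t delta_to_ns_signed(uint64_t delta, uint32_t mult, uint32_t shift)
{
	int64_t nsec = (int64_t)(delta * mult);

	return nsec >> shift;
}

/* Unsigned variant: the full 64-bit range is usable before overflow. */
static uint64_t delta_to_ns_unsigned(uint64_t delta, uint32_t mult, uint32_t shift)
{
	return (delta * mult) >> shift;
}
```

    With delta = 2^40, mult = 2^23 and shift = 24 the product is exactly
    2^63: the signed variant returns a negative value, the unsigned one
    returns 2^39.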

    This solves the starvation issue, which was caused by the negative result,
    but it does not solve the underlying problem. It merely postpones
    it. When the timekeeper update is deferred long enough that the unsigned
    multiplication overflows, then time going backwards is observable again.

    Nor does it solve the issue of clocksources with a small counter width
    which will wrap around possibly several times and cause random time stamps
    to be generated. But those are usually not found on systems used for
    virtualization, so this is likely a non issue.

    I took the liberty to claim authorship for this simply because
    analyzing all callsites and writing the changelog took substantially
    more time than just making the simple s/s64/u64/ change and ignore the
    rest.

    Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
    Reported-by: David Gibson
    Reported-by: Liav Rehana
    Signed-off-by: Thomas Gleixner
    Reviewed-by: David Gibson
    Acked-by: Peter Zijlstra (Intel)
    Cc: Parit Bhargava
    Cc: Laurent Vivier
    Cc: "Christopher S. Hall"
    Cc: Chris Metcalf
    Cc: Richard Cochran
    Cc: John Stultz
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

30 Nov, 2016

1 commit

  • This boot clock can be used as a tracing clock and will account for
    suspend time.

    To keep it NMI safe since we're accessing from tracing, we're not using a
    separate timekeeper with updates to monotonic clock and boot offset
    protected with seqlocks. This has the following minor side effects:

    (1) It's possible for a timestamp to be taken after the boot offset is updated
    but before the timekeeper is updated. If this happens, the new boot offset
    is added to the old timekeeping making the clock appear to update slightly
    earlier:
    CPU 0                                        CPU 1
    timekeeping_inject_sleeptime64()
    __timekeeping_inject_sleeptime(tk, delta);
                                                 timestamp();
    timekeeping_update(tk, TK_CLEAR_NTP...);

    (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
    partially updated. Since the tk->offs_boot update is a rare event, this
    should be a rare occurrence which postprocessing should be able to handle.
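
    The lockless read described above can be pictured with the sketch
    below. The struct and function names are hypothetical and reduce the
    timekeeper to the two values involved:

```c
#include <stdint.h>

struct boot_clock_sketch {
	uint64_t mono_ns;	/* monotonic time, does not count suspend */
	uint64_t offs_boot_ns;	/* accumulated suspend time */
};

/*
 * Lockless read: monotonic time plus the boot offset. Because the two
 * fields are updated at different moments, a reader may combine a new
 * offset with an old monotonic value, which is side effect (1) above.
 */
static uint64_t boot_fast_ns_sketch(const struct boot_clock_sketch *c)
{
	return c->mono_ns + c->offs_boot_ns;
}
```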

    Signed-off-by: Joel Fernandes
    Signed-off-by: John Stultz
    Reviewed-by: Thomas Gleixner
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    Joel Fernandes
     

05 Oct, 2016

1 commit

  • In commit 27727df240c7 ("Avoid taking lock in NMI path with
    CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
    the timekeeping_get_ns() function, but I forgot to include
    the unit conversion from cycles to nanoseconds, breaking the
    function's output, which impacts users like perf.

    This results in bogus perf timestamps like:
    swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

    Instead of more reasonable expected timestamps like:
    swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
    swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

    Add the proper use of timekeeping_delta_to_ns() to convert
    the cycle delta to nanoseconds as needed.
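
    The shape of the mistake and of the fix, as a hedged sketch: the
    struct and both function names are made up for illustration, and only
    the scaling step mirrors what the text describes:

```c
#include <stdint.h>

struct fast_base_sketch {
	uint64_t base_ns;	/* time at the last update, in ns */
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
};

/* Buggy shape: the masked cycle delta is added as if it were nanoseconds. */
static uint64_t fast_ns_missing_conversion(const struct fast_base_sketch *b,
					   uint64_t now)
{
	return b->base_ns + ((now - b->cycle_last) & b->mask);
}

/* Fixed shape: scale the cycle delta by mult / 2^shift before adding it. */
static uint64_t fast_ns_converted(const struct fast_base_sketch *b,
				  uint64_t now)
{
	uint64_t delta = (now - b->cycle_last) & b->mask;

	return b->base_ns + ((delta * b->mult) >> b->shift);
}
```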

    Thanks to Brendan and Alexei for finding this quickly after
    the v4.8 release. Unfortunately the problematic commit has
    landed in some -stable trees so they'll need this fix as
    well.

    Many apologies for this mistake. I'll be looking to add a
    perf-clock sanity test to the kselftest timers tests soon.

    Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
    Reported-by: Brendan Gregg
    Reported-by: Alexei Starovoitov
    Tested-and-reviewed-by: Mathieu Desnoyers
    Signed-off-by: John Stultz
    Cc: Peter Zijlstra
    Cc: stable
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     

24 Aug, 2016

1 commit

  • When I added some extra sanity checking in timekeeping_get_ns() under
    CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
    method was using timekeeping_get_ns().

    Thus the locking added to the debug checks broke the NMI-safety of
    __ktime_get_fast_ns().

    This patch open-codes the timekeeping_get_ns() logic for
    __ktime_get_fast_ns(), so it can avoid any deadlocks in NMI.

    Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed"
    Reported-by: Steven Rostedt
    Reported-by: Peter Zijlstra
    Signed-off-by: John Stultz
    Cc: stable
    Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner

    John Stultz
     

26 Jul, 2016

1 commit

  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel in almost 20
    years since Finn implemented it.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     

01 Jul, 2016

1 commit

  • EXPORT_SYMBOL() get_monotonic_coarse64 for new IIO timestamping clock
    selection usage. This provides user apps the ability to request a
    particular IIO device to timestamp samples using a monotonic coarse clock
    granularity.

    Signed-off-by: Gregor Boirie
    Signed-off-by: Jonathan Cameron

    Gregor Boirie
     

21 Jun, 2016

1 commit

  • The user notices the problem in a raw and real time drift, calling
    clock_gettime with CLOCK_REALTIME / CLOCK_MONOTONIC_RAW on a system
    with no ntp correction taking place (no ntpd or ptp stuff running).

    The problem is that old_vsyscall_fixup adds an extra 1ns even though
    xtime_nsec is already held in full nsecs and the remainder in this
    case is 0. Do the rounding up business only if needed.
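
    A simplified sketch of the corrected rounding. round_up_shifted_ns()
    is a hypothetical helper that operates on a bare shifted-nanosecond
    value and leaves out any error accounting the real function may do:

```c
#include <stdint.h>

/*
 * Round a shifted-nanosecond value up to the next whole nanosecond, but
 * only when a fractional part is actually present.
 */
static uint64_t round_up_shifted_ns(uint64_t xtime_nsec, uint32_t shift)
{
	uint64_t remainder = xtime_nsec & ((1ULL << shift) - 1);

	if (remainder == 0)
		return xtime_nsec;	/* already whole: nothing to add */

	return xtime_nsec - remainder + (1ULL << shift);
}
```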

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Thomas Graziadei
    Signed-off-by: John Stultz

    Thomas Graziadei
     

18 Mar, 2016

1 commit


08 Mar, 2016

1 commit

  • Newer GCC versions trigger the following warning:

    kernel/time/timekeeping.c: In function ‘get_device_system_crosststamp’:
    kernel/time/timekeeping.c:987:5: warning: ‘clock_was_set_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (discontinuity) {
    ^
    kernel/time/timekeeping.c:1045:15: note: ‘clock_was_set_seq’ was declared here
    unsigned int clock_was_set_seq;
    ^

    GCC clearly is unable to recognize that the 'do_interp' boolean tracks
    the initialization status of 'clock_was_set_seq'.

    The GCC version used was:

    gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)

    Work around it by initializing clock_was_set_seq to 0. Compilers that
    are able to recognize the code flow will eliminate the unnecessary
    initialization.
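
    The pattern that confuses the compiler, reduced to a hypothetical
    standalone function: a boolean guards both the assignment and the
    later use, and the explicit initializer is what silences the warning:

```c
#include <stdbool.h>

static unsigned int seq_if_interpolating(bool do_interp, unsigned int current_seq)
{
	unsigned int clock_was_set_seq = 0;	/* explicit init avoids the warning */

	if (do_interp)
		clock_was_set_seq = current_seq;

	/* ... unrelated work happens here in the real code ... */

	if (do_interp)
		return clock_was_set_seq;	/* only read when it was assigned */

	return 0;
}
```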

    Acked-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Mar, 2016

5 commits

  • Another representative use case of time sync and the correlated
    clocksource (in addition to PTP noted above) is PTP synchronized
    audio.

    In a streaming application, as an example, samples will be sent and/or
    received by multiple devices with a presentation time that is in terms
    of the PTP master clock. Synchronizing the audio output on these
    devices requires correlating the audio clock with the PTP master
    clock. The more precise this correlation is, the better the audio
    quality (i.e. out of sync audio sounds bad).

    From an application standpoint, to correlate the PTP master clock with
    the audio device clock, the system clock is used as an intermediate
    timebase. The transforms such an application would perform are:

    System Clock <-> Audio clock
    System Clock <-> Network Device Clock <-> [ PTP Master Clock]

    Modern Intel platforms can perform a more accurate cross timestamp in
    hardware (ART,audio device clock). The audio driver requires
    ART->system time transforms -- the same as required for the network
    driver. These platforms offload audio processing (including
    cross-timestamps) to a DSP which, to ensure uninterrupted audio
    processing, communicates with and responds to the host only once
    every millisecond. As a result it takes up to a millisecond for the
    DSP to receive a request; the request is then processed by the DSP,
    the audio output hardware is polled for completion, the result is
    copied into shared memory, and the host is notified. All of these
    operations occur on a millisecond cadence. This transaction requires
    about 2 ms, but under heavier workloads it may take up to 4 ms.

    Adding a history allows these slow devices the option of providing an
    ART value outside of the current interval. In this case, the callback
    provided is an accessor function for the previously obtained counter
    value. If get_device_system_crosststamp() receives a counter value
    previous to cycle_last, it consults the history provided as an
    argument in history_ref and interpolates the realtime and monotonic
    raw system time using the provided counter value. If there are any
    clock discontinuities, e.g. from calling settimeofday(), the monotonic
    raw time is interpolated in the usual way, but the realtime clock time
    is adjusted by scaling the monotonic raw adjustment.

    When an accessor function is used a history argument *must* be
    provided. The history is initialized using ktime_get_snapshot(),
    which must be called before the counter values are read.

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: kevin.b.stanton@intel.com
    Cc: kevin.j.clarke@intel.com
    Cc: hpa@zytor.com
    Cc: jeffrey.t.kirsher@intel.com
    Cc: netdev@vger.kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christopher S. Hall
    [jstultz: Fixed up cycles_t/cycle_t type confusion]
    Signed-off-by: John Stultz

    Christopher S. Hall
     
  • ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner.
    It has changed considerably and any mistakes are mine.

    The precision with which events on multiple networked systems can be
    synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
    by the precision of the cross timestamps between the system clock and
    the device (timestamp) clock. Precision here is the degree of
    simultaneity when capturing the cross timestamp.

    Currently the PTP cross timestamp is captured in software using the
    PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
    interleaved with reads of the realtime clock. At best, the precision
    of this cross timestamp is on the order of several microseconds due to
    software latencies. Sub-microsecond precision is required for
    industrial control and some media applications. To achieve this level
    of precision hardware supported cross timestamping is needed.

    The function get_device_system_crosststamp() allows device drivers
    to return a cross timestamp with system time properly scaled to
    nanoseconds. The realtime value is needed to discipline that clock
    using PTP and the monotonic raw value is used for applications that
    don't require a "real" time, but need an unadjusted clock time. The
    get_device_system_crosststamp() code calls back into the driver to
    ensure that the system counter is within the current timekeeping
    update interval.

    Modern Intel hardware provides an Always Running Timer (ART) which is
    exactly related to TSC through a known frequency ratio. The ART is
    routed to devices on the system and is used to precisely and
    simultaneously capture the device clock with the ART.

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: kevin.b.stanton@intel.com
    Cc: kevin.j.clarke@intel.com
    Cc: hpa@zytor.com
    Cc: jeffrey.t.kirsher@intel.com
    Cc: netdev@vger.kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christopher S. Hall
    [jstultz: Reworked to remove extra structures and simplify calling]
    Signed-off-by: John Stultz

    Christopher S. Hall
     
  • The code in ktime_get_snapshot() is a superset of the code in
    ktime_get_raw_and_real(). Further, ktime_get_raw_and_real() is
    called only by the PPS code, pps_get_ts(). Consolidate the
    pps_get_ts() code into a single function calling ktime_get_snapshot()
    and eliminate ktime_get_raw_and_real(). A side effect of this is that
    the raw and real results of pps_get_ts() correspond to exactly the
    same clock cycle. Previously these values represented separate reads
    of the system clock.

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: kevin.b.stanton@intel.com
    Cc: kevin.j.clarke@intel.com
    Cc: hpa@zytor.com
    Cc: jeffrey.t.kirsher@intel.com
    Cc: netdev@vger.kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christopher S. Hall
    Signed-off-by: John Stultz

    Christopher S. Hall
     
  • In the current timekeeping code there isn't any interface to
    atomically capture the current relationship between the system counter
    and system time. ktime_get_snapshot() returns this triple (counter,
    monotonic raw, realtime) in the system_time_snapshot struct.

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: kevin.b.stanton@intel.com
    Cc: kevin.j.clarke@intel.com
    Cc: hpa@zytor.com
    Cc: jeffrey.t.kirsher@intel.com
    Cc: netdev@vger.kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christopher S. Hall
    [jstultz: Moved structure definitions around to clean things up,
    fixed cycles_t/cycle_t confusion.]
    Signed-off-by: John Stultz

    Christopher S. Hall
     
  • The timekeeping code does not currently provide a way to translate
    externally provided clocksource cycles to system time. The cycle count
    is always provided by the result of the clocksource read() method
    internal to the timekeeping code. The added function
    timekeeping_cycles_to_ns() calculates a nanosecond value from a cycle
    count that can be added to the tk_read_base.base value, yielding the
    current system time. This allows clocksource cycle values external to
    the timekeeping code to provide a cycle count that can be transformed
    to system time.

    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: kevin.b.stanton@intel.com
    Cc: kevin.j.clarke@intel.com
    Cc: hpa@zytor.com
    Cc: jeffrey.t.kirsher@intel.com
    Cc: netdev@vger.kernel.org
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Christopher S. Hall
    Signed-off-by: John Stultz

    Christopher S. Hall
     

15 Feb, 2016

1 commit


17 Dec, 2015

2 commits

  • It's been occasionally noted that users have seen
    confusing warnings like:

    Adjusting tsc more than 11% (5941981 vs 7759439)

    We try to limit the maximum total adjustment to 11% (10% tick
    adjustment + 0.5% frequency adjustment). But this is done by
    bounding the requested adjustment values, and the internal
    steering that is done by tracking the error from what was
    requested and what was applied, does not have any such limits.

    This is usually not problematic, but in some cases has a risk
    that an adjustment could cause the clocksource mult value to
    overflow, so it's an indication things are outside of what is
    expected.

    It turns out that most of the reports of this 11% warning are on systems
    using chrony, which utilizes the adjtimex() ADJ_TICK interface
    (which allows a +-10% adjustment). The original rationale for
    ADJ_TICK is unclear to me, but my assumption is that it was originally
    added to allow broken systems to get a big constant correction at boot
    (see adjtimex userspace package for an example) which would allow
    the system to work w/ ntpd's 0.5% adjustment limit.

    Chrony uses ADJ_TICK to make very aggressive short term corrections
    (usually right at startup), which push us close enough to the max
    bound that a few late ticks can cause the internal steering to push
    past the max adjust value (tripping the warning).

    Thus this patch adds some extra logic to enforce the max adjustment
    cap in the internal steering.

    Note: This has the potential to slow corrections when the ADJ_TICK
    value is furthest away from the default value. So it would be good to
    get some testing from folks using chrony, to make sure we don't
    cause any troubles there.

    Cc: Miroslav Lichvar
    Cc: Thomas Gleixner
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Andy Lutomirski
    Tested-by: Miroslav Lichvar
    Reported-by: Andy Lutomirski
    Signed-off-by: John Stultz

    John Stultz
     
  • In order to fix Y2038 issues in the ntp code we will need to replace
    get_seconds() with ktime_get_real_seconds(), but as the ntp code uses
    the timekeeping lock, which is also used by ktime_get_real_seconds(),
    we need a version without locking.
    Add a new function __ktime_get_real_seconds() in timekeeping to
    do this.

    Reviewed-by: John Stultz
    Signed-off-by: DengChao
    Signed-off-by: John Stultz

    DengChao
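
The adjustment cap discussed in the first 17 Dec commit above (the "more than 11%" warning) amounts to keeping the steered multiplier inside a window around the clocksource's base mult. The sketch below only illustrates that bounding idea; it is not the logic of the patch itself, and all names in it are hypothetical.

```c
#include <stdint.h>

/* Largest allowed deviation from the base multiplier: 11% of mult. */
static uint32_t sketch_max_adjustment(uint32_t mult)
{
    return (uint32_t)(((uint64_t)mult * 11) / 100);
}

/*
 * Clamp a proposed multiplier to [base - max, base + max] so that
 * internal error steering cannot push past the cap that already
 * bounds the requested adjustments.
 */
static int64_t sketch_cap_mult(int64_t proposed, uint32_t base_mult)
{
    int64_t max = sketch_max_adjustment(base_mult);
    int64_t lo = (int64_t)base_mult - max;
    int64_t hi = (int64_t)base_mult + max;

    if (proposed < lo)
        return lo;
    if (proposed > hi)
        return hi;
    return proposed;
}
```

For a base mult of 1000 the window is 890 to 1110, so a proposed value of 1200 would be held at 1110 while 1050 passes through unchanged.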
     

11 Dec, 2015

1 commit

  • For adjtimex()'s ADJ_SETOFFSET, make sure the tv_usec value is
    sane. We might multiply it later, which can cause an overflow
    and undefined behavior.

    This patch introduces new helper functions to simplify the
    checking code and adds comments to clarify.

    Originally this patch was by Sasha Levin, but I've basically
    rewritten it, so he should get credit for finding the issue
    and I should get the blame for any mistakes made since.

    Also, credit to Richard Cochran for the phrasing used in the
    comment for what is considered valid here.

    Cc: Sasha Levin
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Reported-by: Sasha Levin
    Signed-off-by: John Stultz

    John Stultz
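
A validity check of the kind the ADJ_SETOFFSET commit above describes might look like the sketch below. The struct, the helper name, and the exact accepted range (0 up to but not including one second's worth of microseconds) are assumptions for illustration; the commit text only says that tv_usec must be sane.

```c
#include <stdbool.h>

#define SKETCH_USEC_PER_SEC 1000000L

/* Hypothetical stand-in for the offset passed with ADJ_SETOFFSET. */
struct offset_sketch {
    long long tv_sec;
    long tv_usec;
};

/*
 * Reject offsets whose microsecond part is out of range before any
 * later arithmetic (such as multiplying it) has a chance to overflow.
 * tv_sec is deliberately not checked: a negative offset is legitimate.
 */
static bool sketch_inject_offset_valid(const struct offset_sketch *tv)
{
    if (tv->tv_usec < 0 || tv->tv_usec >= SKETCH_USEC_PER_SEC)
        return false;
    return true;
}
```

Checking the value once at the entry point keeps the later conversion code free of overflow guards.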
     

08 Dec, 2015

1 commit

  • 1e75fa8 "time: Condense timekeeper.xtime into xtime_sec" replaced a call to
    clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version
    of the same logic to avoid keeping a semi-redundant struct timespec
    in struct timekeeper.

    However, the commit also introduced a subtle semantic change - where
    clocksource_cyc2ns() uses purely unsigned math, the new version introduces
    a signed temporary, meaning that if (delta * tk->mult) has a 63-bit
    overflow the following shift will still give a negative result. The
    choice of 'maxsec' in __clocksource_updatefreq_scale() means this will
    generally happen if there's a ~10 minute pause in examining the
    clocksource.

    This can be triggered on a powerpc KVM guest by stopping it from qemu for
    a bit over 10 minutes. After resuming, time has jumped backwards several
    minutes, causing numerous problems (jiffies does not advance, msleep()s can
    be extended by minutes, ...). It doesn't happen on x86 KVM guests, because
    the guest TSC is effectively frozen while the guest is stopped, which is
    not the case for the powerpc timebase.

    Obviously an unsigned (64 bit) overflow will only take twice as long as a
    signed, 63-bit overflow. I don't know the time code well enough to know
    if that will still cause incorrect calculations, or if a 64-bit overflow
    is avoided elsewhere.

    Still, an incorrect forwards clock adjustment will cause less trouble than
    time going backwards. So, this patch removes the potential for
    intermediate signed overflow.

    Cc: stable@vger.kernel.org (3.7+)
    Suggested-by: Laurent Vivier
    Tested-by: Laurent Vivier
    Signed-off-by: David Gibson
    Signed-off-by: John Stultz

    David Gibson
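
The semantic change described in the commit above can be reproduced in isolation. The two helpers below are hypothetical and only mimic the shape of the calculation: one stores the product in a signed temporary, the other keeps the math unsigned. The signed variant relies on behaviour that C leaves implementation-defined (conversion of an out-of-range value and right shift of a negative value); the comments assume the usual two's-complement, arithmetic-shift behaviour of gcc and clang.

```c
#include <stdint.h>

/* Signed temporary: once the product reaches bit 63 the result goes negative. */
static int64_t sketch_ns_signed_temp(uint64_t delta, uint32_t mult,
                                     uint32_t shift)
{
    int64_t nsec = (int64_t)(delta * mult); /* wraps to a negative value */

    return nsec >> shift; /* arithmetic shift keeps the sign bit */
}

/* Purely unsigned math, as clocksource_cyc2ns() is described to use. */
static uint64_t sketch_ns_unsigned(uint64_t delta, uint32_t mult,
                                   uint32_t shift)
{
    return (delta * mult) >> shift;
}
```

With delta = 2^40, mult = 2^23 and shift = 8 the product is exactly 2^63: the signed version returns a negative nanosecond count, while the unsigned version returns 2^55.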
     

10 Nov, 2015

1 commit

  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

04 Nov, 2015

1 commit

  • Pull timer updates from Thomas Gleixner:
    "The timer department provides:

    - More y2038 work in the area of ntp and pps.

    - Optimization of posix cpu timers

    - New time related selftests

    - Some new clocksource drivers

    - The usual pile of fixes, cleanups and improvements"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    timeconst: Update path in comment
    timers/x86/hpet: Type adjustments
    clocksource/drivers/armada-370-xp: Implement ARM delay timer
    clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
    clocksource/drivers/imx: Allow timer irq affinity change
    clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
    clocksource/drivers/h8300_*: Remove unneeded memset()s
    clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
    clocksource/drivers/em_sti: Remove unneeded memset()s
    clocksource/drivers/mediatek: Use GPT as sched clock source
    clockevents/drivers/mtk: Fix spurious interrupt leading to crash
    posix_cpu_timer: Reduce unnecessary sighand lock contention
    posix_cpu_timer: Convert cputimer->running to bool
    posix_cpu_timer: Check thread timers only when there are active thread timers
    posix_cpu_timer: Optimize fastpath_timer_check()
    timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
    timers: Use __fls in apply_slack()
    clocksource: Remove return statement from void functions
    net: sfc: avoid using timespec
    ntp/pps: use y2038 safe types in pps_event_time
    ...

    Linus Torvalds
     

20 Oct, 2015

1 commit


16 Oct, 2015

1 commit

  • timekeeping_init() can set the wall time offset, so we need to
    increment the clock_was_set_seq counter. That way hrtimers will pick
    up the early offset immediately. Otherwise on a machine which does not
    set wall time later in the boot process the hrtimer offset is stale at
    0 and wall time timers are going to expire with a delay of 45 years.

    Fixes: 868a3e915f7f "hrtimer: Make offset update smarter"
    Reported-and-tested-by: Heiko Carstens
    Signed-off-by: Thomas Gleixner
    Cc: Stefan Liebler
    Cc: Peter Zijlstra
    Cc: John Stultz

    Thomas Gleixner
     

02 Oct, 2015

2 commits

  • There is exactly one caller of getnstime_raw_and_real in the kernel,
    which is the pps_get_ts function. This changes the caller and
    the implementation to work on timespec64 types rather than timespec,
    to avoid the time_t overflow on 32-bit architectures.

    For consistency with the other new functions (ktime_get_seconds,
    ktime_get_real_*, ...), I'm renaming the function to
    ktime_get_raw_and_real_ts64.

    We still need to convert from the internal 64-bit type to 32 bit
    types in the caller, but this conversion is now pushed out from
    getnstime_raw_and_real to pps_get_ts. A follow-up patch changes
    the remaining pps code to completely avoid the conversion.

    Acked-by: Richard Cochran
    Acked-by: David S. Miller
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Arnd Bergmann
    Signed-off-by: John Stultz

    Arnd Bergmann
     
  • There is only one user of the hardpps function in the kernel, so
    it makes sense to atomically change it over to using 64-bit
    timestamps for y2038 safety. In the hardpps implementation,
    we also need to change the pps_normtime structure, which is
    similar to struct timespec and also requires a 64-bit
    seconds portion.

    This introduces two temporary variables in pps_kc_event() to
    do the conversion; they will be removed again in the next step,
    which seemed preferable to having a larger patch changing it
    all at the same time.

    Acked-by: Richard Cochran
    Acked-by: David S. Miller
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Arnd Bergmann
    Signed-off-by: John Stultz

    Arnd Bergmann
     

22 Sep, 2015

1 commit

  • Signed-off-by: Zhen Lei
    Cc: Hanjun Guo
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tianhong Ding
    Cc: Viresh Kumar
    Cc: Xinwei Hu
    Cc: Xunlei Pang
    Cc: Zefan Li
    Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com
    [ Fixed yet another typo in one of the sentences fixed. ]
    Signed-off-by: Ingo Molnar

    Zhen Lei
     

13 Sep, 2015

1 commit

  • The internal clocksteering done for fine-grained error
    correction uses a logarithmic approximation, so any time
    adjtimex() adjusts the clock steering, timekeeping_freqadjust()
    quickly approximates the correct clock frequency over a series
    of ticks.

    Unfortunately, the logic in timekeeping_freqadjust(), introduced
    in commit:

    dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

    used the abs() function with a s64 error value to calculate the
    size of the approximated adjustment to be made.

    Per include/linux/kernel.h:

    "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

    Thus on 32-bit platforms, this caused the clocksteering to
    take a quite dampened random walk while trying to converge on the
    proper frequency, which made the adjustments much
    slower than intended (most easily observed when large
    adjustments are made).

    This patch fixes the issue by using abs64() instead.

    Reported-by: Nuno Gonçalves
    Tested-by: Nuno Goncalves
    Signed-off-by: John Stultz
    Cc: # v3.17+
    Cc: Linus Torvalds
    Cc: Miroslav Lichvar
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
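
The effect of applying a narrow abs() to an s64 value, as described in the commit above, can be imitated on any platform by truncating to 32 bits first. Both helpers are hypothetical illustrations and not the kernel macros; the narrowing cast is implementation-defined in C and is assumed here to keep the low 32 bits, as gcc and clang do.

```c
#include <stdint.h>

/* Imitates an abs() that narrows its argument to int before taking the magnitude. */
static int64_t sketch_abs_truncating(int64_t x)
{
    int32_t low = (int32_t)x; /* upper 32 bits are discarded */

    return low < 0 ? -(int64_t)low : (int64_t)low;
}

/* A full-width absolute value, in the spirit of abs64(). */
static int64_t sketch_abs64(int64_t x)
{
    return x < 0 ? -x : x;
}
```

An error of 2^32 + 5 is reported as 5 by the truncating version, which would make a large error look tiny and is consistent with the slow convergence the commit reports.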