21 Jun, 2017
2 commits
-
CONFIG_GENERIC_TIME_VSYSCALL_OLD was introduced five years ago
to allow a transition from the old vsyscall implementations to
the new method (which simplified internal accounting and made
timekeeping more precise).However, PPC and IA64 have yet to make the transition, despite
in some cases me sending test patches to try to help it along.http://patches.linaro.org/patch/30501/
http://patches.linaro.org/patch/35412/If its helpful, my last pass at the patches can be found here:
https://git.linaro.org/people/john.stultz/linux.git dev/oldvsyscall-cleanupSo I think its time to set a deadline and make it clear this
is going away. So this patch adds warnings about this
functionality being dropped. Likely to be in v4.15.Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Marcelo Tosatti
Cc: Paul Mackerras
Cc: Anton Blanchard
Cc: Benjamin Herrenschmidt
Cc: Tony Luck
Cc: Michael Ellerman
Cc: Fenghua Yu
Signed-off-by: John Stultz -
Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Stephen Boyd
Cc: Kevin Brodsky
Cc: Will Deacon
Cc: Daniel Mentz
Tested-by: Daniel Mentz
Signed-off-by: John Stultz
20 Jun, 2017
2 commits
-
Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.Signed-off-by: John Stultz
Tested-by: Daniel Mentz
Cc: Prarit Bhargava
Cc: Kevin Brodsky
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: Will Deacon
Cc: "stable #4 . 8+"
Cc: Miroslav Lichvar
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner -
In tests, which excercise switching of clocksources, a NULL
pointer dereference can be observed on AMR64 platforms in the
clocksource read() function:u64 clocksource_mmio_readl_down(struct clocksource *c)
{
return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}This is called from the core timekeeping code via:
cycle_now = tkr->read(tkr->clock);
tkr->read is the cached tkr->clock->read() function pointer.
When the clocksource is changed then tkr->clock and tkr->read
are updated sequentially. The code above results in a sequential
load operation of tkr->read and tkr->clock as well.If the store to tkr->clock hits between the loads of tkr->read
and tkr->clock, then the old read() function is called with the
new clock pointer. As a consequence the read() function
dereferences a different data structure and the resulting 'reg'
pointer can point anywhere including NULL.This problem was introduced when the timekeeping code was
switched over to use struct tk_read_base. Before that, it was
theoretically possible as well when the compiler decided to
reload clock in the code sequence:now = tk->clock->read(tk->clock);
Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issue
the read function via clk->read(clk). This guarantees that the
read() function always gets the proper clocksource pointer handed
in.Since there is now no use for the tkr.read pointer, this patch
also removes it, and to address stopping the fast timekeeper
during suspend/resume, it introduces a dummy clocksource to use
rather then just a dummy read function.Signed-off-by: John Stultz
Acked-by: Ingo Molnar
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Stephen Boyd
Cc: stable
Cc: Miroslav Lichvar
Cc: Daniel Mentz
Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
31 Mar, 2017
1 commit
-
interp_forward is type bool so assignment from a logical operation directly
is sufficient.Signed-off-by: Nicholas Mc Guire
Cc: "Christopher S. Hall"
Cc: John Stultz
Link: http://lkml.kernel.org/r/1490382215-30505-1-git-send-email-der.herr@hofr.at
Signed-off-by: Thomas Gleixner
02 Mar, 2017
2 commits
-
We are going to move softlockup APIs out of , which
will have to be picked up from other headers and a couple of .c files.already includes .
Include the header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar -
We are going to split out of , which
will have to be picked up from a couple of .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
07 Jan, 2017
1 commit
-
The last caller to timekeeping_set_tai_offset() was in commit
0b5154fb9040 (timekeeping: Simplify tai updating from
do_adjtimex, 2013-03-22) and the last caller to
timekeeping_get_tai_offset() was in commit 76f4108892d9 (hrtimer:
Cleanup hrtimer accessors to the timekepeing state, 2014-07-16).
Remove these unused functions now that we handle TAI offsets
differently.Cc: John Stultz
Signed-off-by: Stephen Boyd
Signed-off-by: John Stultz
26 Dec, 2016
1 commit
-
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
25 Dec, 2016
1 commit
-
There is no point in having an extra type for extra confusion. u64 is
unambiguous.Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;@fix@
typedef cycle_t;
@@
-cycle_t
+u64Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: John Stultz
09 Dec, 2016
4 commits
-
The resume code must deal with a clocksource delta which is potentially big
enough to overflow the 64bit mult.Replace the open coded handling with the proper function.
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
Signed-off-by: Thomas Gleixner -
cycle_t is defined as u64, so casting it to u64 is a pointless and
confusing exercise. cycle_t should simply go away and be replaced with a
plain u64 to avoid further confusion.Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de
Signed-off-by: Thomas Gleixner -
Propagating a unsigned value through signed variables and functions makes
absolutely no sense and is just prone to (re)introduce subtle signed
vs. unsigned issues as happened recently.Clean it up.
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: Liav Rehana
Cc: John Stultz
Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de
Signed-off-by: Thomas Gleixner -
The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:- Time jumps backwards
- __iter_div_u64_rem() which is used in one of the calling code pathes
will take forever to piecewise calculate the seconds/nanoseconds part.This has been reported by several people with different scenarios:
David observed that when stopping a VM with a debugger:
"It was essentially the stopped by debugger case. I forget exactly why,
but the guest was being explicitly stopped from outside, it wasn't just
scheduling lag. I think it was something in the vicinity of 10 minutes
stopped."When lifting the stop the machine went dead.
The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.But this was also observed on a live system by Liav:
"When the OS is too overloaded, delta will get a high enough value for the
msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
after the shift the nsec variable will gain a value similar to
0xffffffffff000000."Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:s64 nsec;
- nsec = (d * m + n) >> s:
+ nsec = d * m + n;
+ nsec >>= s;d being type of cycle_t adds another level of obfuscation.
This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson
Reported-by: Liav Rehana
Signed-off-by: Thomas Gleixner
Reviewed-by: David Gibson
Acked-by: Peter Zijlstra (Intel)
Cc: Parit Bhargava
Cc: Laurent Vivier
Cc: "Christopher S. Hall"
Cc: Chris Metcalf
Cc: Richard Cochran
Cc: John Stultz
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner
30 Nov, 2016
1 commit
-
This boot clock can be used as a tracing clock and will account for
suspend time.To keep it NMI safe since we're accessing from tracing, we're not using a
separate timekeeper with updates to monotonic clock and boot offset
protected with seqlocks. This has the following minor side effects:(1) Its possible that a timestamp be taken after the boot offset is updated
but before the timekeeper is updated. If this happens, the new boot offset
is added to the old timekeeping making the clock appear to update slightly
earlier:
CPU 0 CPU 1
timekeeping_inject_sleeptime64()
__timekeeping_inject_sleeptime(tk, delta);
timestamp();
timekeeping_update(tk, TK_CLEAR_NTP...);(2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
partially updated. Since the tk->offs_boot update is a rare event, this
should be a rare occurrence which postprocessing should be able to handle.Signed-off-by: Joel Fernandes
Signed-off-by: John Stultz
Reviewed-by: Thomas Gleixner
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
05 Oct, 2016
1 commit
-
In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.This results in bogus perf timestamps like:
swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])Instead of more reasonable expected timestamps like:
swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg
Reported-by: Alexei Starovoitov
Tested-and-reviewed-by: Mathieu Desnoyers
Signed-off-by: John Stultz
Cc: Peter Zijlstra
Cc: stable
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
24 Aug, 2016
1 commit
-
When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt
Reported-by: Peter Zijlstra
Signed-off-by: John Stultz
Cc: stable
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner
26 Jul, 2016
1 commit
-
Pull timer updates from Thomas Gleixner:
"This update provides the following changes:- The rework of the timer wheel which addresses the shortcomings of
the current wheel (cascading, slow search for next expiring timer,
etc). That's the first major change of the wheel in almost 20
years since Finn implemted it.- A large overhaul of the clocksource drivers init functions to
consolidate the Device Tree initialization- Some more Y2038 updates
- A capability fix for timerfd
- Yet another clock chip driver
- The usual pile of updates, comment improvements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
tick/nohz: Optimize nohz idle enter
clockevents: Make clockevents_subsys static
clocksource/drivers/time-armada-370-xp: Fix return value check
timers: Implement optimization for same expiry time in mod_timer()
timers: Split out index calculation
timers: Only wake softirq if necessary
timers: Forward the wheel clock whenever possible
timers/nohz: Remove pointless tick_nohz_kick_tick() function
timers: Optimize collect_expired_timers() for NOHZ
timers: Move __run_timers() function
timers: Remove set_timer_slack() leftovers
timers: Switch to a non-cascading wheel
timers: Reduce the CPU index space to 256k
timers: Give a few structs and members proper names
hlist: Add hlist_is_singular_node() helper
signals: Use hrtimer for sigtimedwait()
timers: Remove the deprecated mod_timer_pinned() API
timers, net/ipv4/inet: Initialize connection request timers as pinned
timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
timers, drivers/tty/metag_da: Initialize the poll timer as pinned
...
01 Jul, 2016
1 commit
-
EXPORT_SYMBOL() get_monotonic_coarse64 for new IIO timestamping clock
selection usage. This provides user apps the ability to request a
particular IIO device to timestamp samples using a monotonic coarse clock
granularity.Signed-off-by: Gregor Boirie
Signed-off-by: Jonathan Cameron
21 Jun, 2016
1 commit
-
The user notices the problem in a raw and real time drift, calling
clock_gettime with CLOCK_REALTIME / CLOCK_MONOTONIC_RAW on a system
with no ntp correction taking place (no ntpd or ptp stuff running).The problem is, that old_vsyscall_fixup adds an extra 1ns even though
xtime_nsec is already held in full nsecs and the remainder in this
case is 0. Do the rounding up buisness only if needed.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Thomas Graziadei
Signed-off-by: John Stultz
18 Mar, 2016
1 commit
-
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
drivers/rtc: broken link fix
drm/i915 Fix typos in i915_gem_fence.c
Docs: fix missing word in REPORTING-BUGS
lib+mm: fix few spelling mistakes
MAINTAINERS: add git URL for APM driver
treewide: Fix typo in printk
08 Mar, 2016
1 commit
-
Newer GCC versions trigger the following warning:
kernel/time/timekeeping.c: In function ‘get_device_system_crosststamp’:
kernel/time/timekeeping.c:987:5: warning: ‘clock_was_set_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (discontinuity) {
^
kernel/time/timekeeping.c:1045:15: note: ‘clock_was_set_seq’ was declared here
unsigned int clock_was_set_seq;
^GCC clearly is unable to recognize that the 'do_interp' boolean tracks
the initialization status of 'clock_was_set_seq'.The GCC version used was:
gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
Work it around by initializing clock_was_set_seq to 0. Compilers that
are able to recognize the code flow will eliminate the unnecessary
initialization.Acked-by: Thomas Gleixner
Cc: John Stultz
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
03 Mar, 2016
5 commits
-
Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as a intermediate
timebase. The transforms such an application would perform are:System Clock Audio clock
System Clock Network Device Clock [ PTP Master Clock]Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART,audio device clock). The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which to ensure uninterrupted audio
processing, communicates and response to the host only once every
millsecond. As a result is takes up to a millisecond for the DSP to
receive a request, the request is processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operation occur
on a millisecond cadence. This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_system_device_crosststamp() receives a counter value
previous to cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.When an accessor function is used a history argument *must* be
provided. The history is initialized using ktime_get_snapshot() and
must be called before the counter values are read.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner
Signed-off-by: Christopher S. Hall
[jstultz: Fixed up cycles_t/cycle_t type confusion]
Signed-off-by: John Stultz -
ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner
. It has changed considerably and any mistakes are
mine.The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision hardware supported cross timestamping is needed.The function get_device_system_crosstimestamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds. The realtime value is needed to discipline that clock
using PTP and the monotonic raw value is used for applications that
don't require a "real" time, but need an unadjusted clock time. The
get_device_system_crosstimestamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner
Signed-off-by: Christopher S. Hall
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz -
The code in ktime_get_snapshot() is a superset of the code in
ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is
called only by the PPS code, pps_get_ts(). Consolidate the
pps_get_ts() code into a single function calling ktime_get_snapshot()
and eliminate ktime_get_raw_and_real(). A side effect of this is that
the raw and real results of pps_get_ts() correspond to exactly the
same clock cycle. Previously these values represented separate reads
of the system clock.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner
Signed-off-by: Christopher S. Hall
Signed-off-by: John Stultz -
In the current timekeeping code there isn't any interface to
atomically capture the current relationship between the system counter
and system time. ktime_get_snapshot() returns this triple (counter,
monotonic raw, realtime) in the system_time_snapshot struct.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner
Signed-off-by: Christopher S. Hall
[jstultz: Moved structure definitions around to clean things up,
fixed cycles_t/cycle_t confusion.]
Signed-off-by: John Stultz -
The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculated a nanosecond value from a cycle count that can be added to
tk_read_base.base value yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner
Signed-off-by: Christopher S. Hall
Signed-off-by: John Stultz
15 Feb, 2016
1 commit
-
This patch fix spelling typos found in printk and Kconfig.
Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina
17 Dec, 2015
2 commits
-
Thus its been occasionally noted that users have seen
confusing warnings like:Adjusting tsc more than 11% (5941981 vs 7759439)
We try to limit the maximum total adjustment to 11% (10% tick
adjustment + 0.5% frequency adjustment). But this is done by
bounding the requested adjustment values, and the internal
steering that is done by tracking the error from what was
requested and what was applied, does not have any such limits.This is usually not problematic, but in some cases has a risk
that an adjustment could cause the clocksource mult value to
overflow, so its an indication things are outside of what is
expected.It ends up most of the reports of this 11% warning are on systems
using chrony, which utilizes the adjtimex() ADJ_TICK interface
(which allows a +-10% adjustment). The original rational for
ADJ_TICK unclear to me but my assumption it was originally added
to allow broken systems to get a big constant correction at boot
(see adjtimex userspace package for an example) which would allow
the system to work w/ ntpd's 0.5% adjustment limit.Chrony uses ADJ_TICK to make very aggressive short term corrections
(usually right at startup). Which push us close enough to the max
bound that a few late ticks can cause the internal steering to push
past the max adjust value (tripping the warning).Thus this patch adds some extra logic to enforce the max adjustment
cap in the internal steering.Note: This has the potential to slow corrections when the ADJ_TICK
value is furthest away from the default value. So it would be good to
get some testing from folks using chrony, to make sure we don't
cause any troubles there.Cc: Miroslav Lichvar
Cc: Thomas Gleixner
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Andy Lutomirski
Tested-by: Miroslav Lichvar
Reported-by: Andy Lutomirski
Signed-off-by: John Stultz -
In order to fix Y2038 issues in the ntp code we will need replace
get_seconds() with ktime_get_real_seconds() but as the ntp code uses
the timekeeping lock which is also used by ktime_get_real_seconds(),
we need a version without locking.
Add a new function __ktime_get_real_seconds() in timekeeping to
do this.Reviewed-by: John Stultz
Signed-off-by: DengChao
Signed-off-by: John Stultz
11 Dec, 2015
1 commit
-
For adjtimex()'s ADJ_SETOFFSET, make sure the tv_usec value is
sane. We might multiply them later which can cause an overflow
and undefined behavior.This patch introduces new helper functions to simplify the
checking code and adds comments to clarifyOrginally this patch was by Sasha Levin, but I've basically
rewritten it, so he should get credit for finding the issue
and I should get the blame for any mistakes made since.Also, credit to Richard Cochran for the phrasing used in the
comment for what is considered valid here.Cc: Sasha Levin
Cc: Richard Cochran
Cc: Thomas Gleixner
Reported-by: Sasha Levin
Signed-off-by: John Stultz
08 Dec, 2015
1 commit
-
1e75fa8 "time: Condense timekeeper.xtime into xtime_sec" replaced a call to
clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version
of the same logic to avoid keeping a semi-redundant struct timespec
in struct timekeeper.However, the commit also introduced a subtle semantic change - where
clocksource_cyc2ns() uses purely unsigned math, the new version introduces
a signed temporary, meaning that if (delta * tk->mult) has a 63-bit
overflow the following shift will still give a negative result. The
choice of 'maxsec' in __clocksource_updatefreq_scale() means this will
generally happen if there's a ~10 minute pause in examining the
clocksource.This can be triggered on a powerpc KVM guest by stopping it from qemu for
a bit over 10 minutes. After resuming time has jumped backwards several
minutes causing numerous problems (jiffies does not advance, msleep()s can
be extended by minutes..). It doesn't happen on x86 KVM guests, because
the guest TSC is effectively frozen while the guest is stopped, which is
not the case for the powerpc timebase.Obviously an unsigned (64 bit) overflow will only take twice as long as a
signed, 63-bit overflow. I don't know the time code well enough to know
if that will still cause incorrect calculations, or if a 64-bit overflow
is avoided elsewhere.Still, an incorrect forwards clock adjustment will cause less trouble than
time going backwards. So, this patch removes the potential for
intermediate signed overflow.Cc: stable@vger.kernel.org (3.7+)
Suggested-by: Laurent Vivier
Tested-by: Laurent Vivier
Signed-off-by: David Gibson
Signed-off-by: John Stultz
10 Nov, 2015
1 commit
-
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.Cc: Michal Nazarewicz
Cc: John Stultz
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Nov, 2015
1 commit
-
Pull timer updates from Thomas Gleixner:
"The timer departement provides:- More y2038 work in the area of ntp and pps.
- Optimization of posix cpu timers
- New time related selftests
- Some new clocksource drivers
- The usual pile of fixes, cleanups and improvements"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
timeconst: Update path in comment
timers/x86/hpet: Type adjustments
clocksource/drivers/armada-370-xp: Implement ARM delay timer
clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
clocksource/drivers/imx: Allow timer irq affinity change
clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
clocksource/drivers/h8300_*: Remove unneeded memset()s
clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
clocksource/drivers/em_sti: Remove unneeded memset()s
clocksource/drivers/mediatek: Use GPT as sched clock source
clockevents/drivers/mtk: Fix spurious interrupt leading to crash
posix_cpu_timer: Reduce unnecessary sighand lock contention
posix_cpu_timer: Convert cputimer->running to bool
posix_cpu_timer: Check thread timers only when there are active thread timers
posix_cpu_timer: Optimize fastpath_timer_check()
timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
timers: Use __fls in apply_slack()
clocksource: Remove return statement from void functions
net: sfc: avoid using timespec
ntp/pps: use y2038 safe types in pps_event_time
...
20 Oct, 2015
1 commit
-
Time updates from John Stultz:
- More 2038 work from Arnd Bergmann around ntp and pps
16 Oct, 2015
1 commit
-
timekeeping_init() can set the wall time offset, so we need to
increment the clock_was_set_seq counter. That way hrtimers will pick
up the early offset immediately. Otherwise on a machine which does not
set wall time later in the boot process the hrtimer offset is stale at
0 and wall time timers are going to expire with a delay of 45 years.Fixes: 868a3e915f7f "hrtimer: Make offset update smarter"
Reported-and-tested-by: Heiko Carstens
Signed-off-by: Thomas Gleixner
Cc: Stefan Liebler
Cc: Peter Zijlstra
Cc: John Stultz
02 Oct, 2015
2 commits
-
There is exactly one caller of getnstime_raw_and_real in the kernel,
which is the pps_get_ts function. This changes the caller and
the implementation to work on timespec64 types rather than timespec,
to avoid the time_t overflow on 32-bit architectures.For consistency with the other new functions (ktime_get_seconds,
ktime_get_real_*, ...), I'm renaming the function to
ktime_get_raw_and_real_ts64.We still need to convert from the internal 64-bit type to 32 bit
types in the caller, but this conversion is now pushed out from
getnstime_raw_and_real to pps_get_ts. A follow-up patch changes
the remaining pps code to completely avoid the conversion.Acked-by: Richard Cochran
Acked-by: David S. Miller
Reviewed-by: Thomas Gleixner
Signed-off-by: Arnd Bergmann
Signed-off-by: John Stultz -
There is only one user of the hardpps function in the kernel, so
it makes sense to atomically change it over to using 64-bit
timestamps for y2038 safety. In the hardpps implementation,
we also need to change the pps_normtime structure, which is
similar to struct timespec and also requires a 64-bit
seconds portion.This introduces two temporary variables in pps_kc_event() to
do the conversion, they will be removed again in the next step,
which seemed preferable to having a larger patch changing it
all at the same time.Acked-by: Richard Cochran
Acked-by: David S. Miller
Reviewed-by: Thomas Gleixner
Signed-off-by: Arnd Bergmann
Signed-off-by: John Stultz
22 Sep, 2015
1 commit
-
Signed-off-by: Zhen Lei
Cc: Hanjun Guo
Cc: John Stultz
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tianhong Ding
Cc: Viresh Kumar
Cc: Xinwei Hu
Cc: Xunlei Pang
Cc: Zefan Li
Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com
[ Fixed yet another typo in one of the sentences fixed. ]
Signed-off-by: Ingo Molnar
13 Sep, 2015
1 commit
-
The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.Per include/linux/kernel.h:
"abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).This patch fixes the issue by using abs64() instead.
Reported-by: Nuno Gonçalves
Tested-by: Nuno Goncalves
Signed-off-by: John Stultz
Cc: # v3.17+
Cc: Linus Torvalds
Cc: Miroslav Lichvar
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar