24 Jul, 2014
40 commits
-
This adds some documentation about clock sources, clock events,
the weak sched_clock() function and delay timers that answers
questions that repeatedly arise on the mailing lists.

Cc: Thomas Gleixner
Cc: Nicolas Pitre
Cc: Colin Cross
Cc: John Stultz
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Linus Walleij
Acked-by: Nicolas Pitre
Signed-off-by: John Stultz -
By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accumulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rationale.

Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Reported-by: Miroslav Lichvar
Signed-off-by: John Stultz -
The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, it's not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors being corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction of small errors, and insufficient correction of large
errors.

Inspired by Miroslav's patch, I've refactored the code to address the
correction in two steps:

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.
Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

Cc: Miroslav Lichvar
Cc: Richard Cochran
Cc: Prarit Bhargava
Reported-by: Miroslav Lichvar
Signed-off-by: John Stultz -
In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
we take the tk_xtime() value, which returns a timespec64, and
store it in a timespec.

This luckily is ok, since the only architectures that use
GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
64 bit systems where timespec64 is the same as a timespec.

Even so, for cleanliness reasons, use the conversion function
to assign the proper type.

Signed-off-by: John Stultz
-
Expose the new NMI safe accessor to clock monotonic to the tracer.
Signed-off-by: Thomas Gleixner
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Mathieu Desnoyers
Signed-off-by: John Stultz -
Tracers want a correlated time between the kernel instrumentation and
user space. We really do not want to export sched_clock() to user
space, so we need to provide something sensible for this.

Using separate data structures with a non-blocking sequence count
based update mechanism allows us to do that. The data structure
required for the readout has a sequence counter and two copies of the
timekeeping data.

On the update side:

    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[0], tk);
    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[1], tk);

On the reader side:

    do {
        seq = tkf->seq;
        smp_rmb();
        idx = seq & 0x01;
        now = now(tkf->base[idx]);
        smp_rmb();
    } while (seq != tkf->seq);

So if an NMI hits the update of base[0] it will use base[1] which is
still consistent, but this timestamp is not guaranteed to be monotonic
across an update.

The timestamp is calculated by:

    now = base_mono + clock_delta * slope

So if the update lowers the slope, readers who are forced to the
not yet updated second array are still using the old steeper slope.

tmono
^
| o n
| o n
| u
| o
|o
|12345678---> reader order

o = old slope
u = update
n = new slope

So reader 6 will observe time going backwards versus reader 5.
While other CPUs are likely to be able to observe that, the only way
for a CPU local observation is when an NMI hits in the middle of
the update. Timestamps taken from that NMI context might be ahead
of the following timestamps. Callers need to be aware of that and
deal with it.

V2: Got rid of clock monotonic raw and reorganized the data
structures. Folded in the barrier fix from Mathieu.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Signed-off-by: John Stultz -
For NMI safe access to clock monotonic we use the seqcount LSB as
index of a timekeeper array. The update sequence looks like this:

    smp_wmb();

Cc: John Stultz
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
raw_read_seqcount opens a read critical section of the given seqcount
without any lockdep checking and without checking or masking the
LSB. Calling code is responsible for handling that.

Preparatory patch to provide an NMI safe clock monotonic accessor
function.

Signed-off-by: Thomas Gleixner
Cc: John Stultz
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Signed-off-by: John Stultz -
All the function needs is in the tk_read_base struct. No functional
change for the current code, just a preparatory patch for the NMI safe
accessor to clock monotonic which will use struct tk_read_base as well.

Signed-off-by: Thomas Gleixner
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Mathieu Desnoyers
Signed-off-by: John Stultz -
The members of the new struct are the required ones for the new NMI
safe accessor to clock monotonic. In order to reuse the existing
timekeeping code and to make the update of the fast NMI safe
timekeepers a simple memcpy, use the struct for the timekeeper as well
and convert all users.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
Signed-off-by: John Stultz -
Access to time requires touching two cachelines at minimum:

1) The timekeeper data structure

2) The clocksource data structure

The access to the clocksource data structure can be avoided, as almost
all clocksource implementations ignore the argument to the read
callback, which is a pointer to the clocksource.

But the core needs to touch it to access the members @read and @mask.
So we are better off by copying the @read function pointer and the
@mask from the clocksource to the core data structure itself.

For the most used ktime_get() access, all required data including the
@read and @mask copies fits together with the sequence counter into a
single 64 byte cacheline.

For the other time access functions we touch three cache lines in the
worst case in the current code. But with the clocksource data copies we
can reduce that to two adjacent cachelines, which is more efficient
than disjunct cache lines.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
cycle_last was added to the clocksource to support the TSC
validation. We moved that to the core code, so we can get rid of the
extra copy.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
The only user of the cycle_last validation is the x86 TSC. In order to
provide NMI safe accessor functions for clock monotonic and
monotonic_raw, we need to do that in the core.

We can't do the TSC specific

    if (now < cycle_last)
        now = cycle_last;

for the other wrapping-around clocksources, but TSC has
CLOCKSOURCE_MASK(64), which actually does not mask out anything, so if
now is less than cycle_last the subtraction will give a negative
result. So we can check for that in clocksource_delta() and return 0
for that case.

Implement and enable it for x86.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
We want to move the TSC sanity check into core code to make NMI safe
accessors to clock monotonic[_raw] possible. For this we need to
sanity check the delta calculation. Create a helper function and
convert all sites to use it.

[ Build fix from jstultz ]
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
We have interfaces. Remove the open coded cruft. Reduces text size
along with the code.

Signed-off-by: Thomas Gleixner
Cc: QCA ath9k Development
Cc: John W. Linville
Signed-off-by: John Stultz -
No point in converting timespecs back and forth.
Signed-off-by: Thomas Gleixner
Cc: Thomas Hellstrom
Signed-off-by: John Stultz -
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.

Signed-off-by: Thomas Gleixner
Acked-by: Daniel Vetter
Signed-off-by: John Stultz -
Provide a ktime_t based interface for raw monotonic time.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
There is no point in having an S390 private implementation and there is
no point in using the raw monotonic time. The NTP frequency
adjustment of CLOCK_MONOTONIC is really not doing any harm for the
hang check timer.

Use ktime_get_ns() for everything and get rid of the timespec
conversions.

V2: Drop the raw monotonic and the S390 special case
Signed-off-by: Thomas Gleixner
Cc: Arnd Bergmann
Cc: Greg Kroah-Hartman
Cc: Heiko Carstens
Acked-by: Greg Kroah-Hartman
Signed-off-by: John Stultz -
timekeeping_clocktai() is not used in fast paths, so the extra
timespec conversion is not problematic.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
No more users. Remove it.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Subtracting plain nsec values and converting to timespec is simpler
than the whole timespec math. Not really fastpath code, so the
division is not an issue.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
get_monotonic_boottime() is not used in fast paths, so the extra
timespec conversion is not problematic.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
No more users.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Convert the relevant base data right away to nanoseconds instead of
doing the conversion on every readout. Reduces text size by 160 bytes.

Signed-off-by: Thomas Gleixner
Cc: Gleb Natapov
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini
Signed-off-by: John Stultz -
Use the new nanoseconds based interface and get rid of the timespec
conversion dance.

Signed-off-by: Thomas Gleixner
Cc: Gleb Natapov
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini
Signed-off-by: John Stultz -
Use the nanoseconds based interface instead of converting from a
timespec.

Signed-off-by: Thomas Gleixner
Cc: Russell King
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: John Stultz -
No idea why iio needs wall clock based time stamps, but we can avoid
the timespec conversion dance by using the new interfaces.

Signed-off-by: Thomas Gleixner
Acked-by: Jonathan Cameron
Signed-off-by: John Stultz -
Using the wall clock time for delta time calculations is wrong to
begin with because wall clock time can be set from userspace and NTP.
Such data wants to be based on clock monotonic.

The calculations are also done on a nanosecond basis. Use the
nanoseconds based interface right away.

Signed-off-by: Thomas Gleixner
Cc: Jean Delvare
Acked-by: Jean Delvare
Signed-off-by: John Stultz -
Replace the ever recurring:

    ts = ktime_get_ts();
    ns = timespec_to_ns(&ts);

with

    ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner
Acked-by: Trond Myklebust
Cc: "J. Bruce Fields"
Signed-off-by: John Stultz -
This code is beyond silly:

    struct timespec ts = ktime_get_ts();
    ktime_t ktime = timespec_to_ktime(ts);

Further down the code builds the delta of two ktime_t values and
converts the result to nanoseconds.

Use ktime_get_ns() and replace all the nonsense.
Signed-off-by: Thomas Gleixner
Cc: Eli Cohen
Signed-off-by: John Stultz -
Replace the ever recurring:
ts = ktime_get_ts();
ns = timespec_to_ns(&ts);
with
ns = ktime_get_ns();Signed-off-by: Thomas Gleixner
Acked-by: Arnd Bergmann
Acked-by: Greg Kroah-Hartman
Signed-off-by: John Stultz -
Replace the ever recurring:

    ts = ktime_get_ts();
    ns = timespec_to_ns(&ts);

with

    ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner
Acked-by: Lee Jones
Signed-off-by: John Stultz -
Replace the ever recurring:

    ts = ktime_get_ts();
    ns = timespec_to_ns(&ts);

with

    ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner
Cc: Evgeniy Polyakov
Signed-off-by: John Stultz -
Replace the ever recurring:

    ts = ktime_get_ts();
    ns = timespec_to_ns(&ts);

with

    ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner
Acked-by: Arnd Bergmann
Signed-off-by: John Stultz -
Converting cputime to timespec and timespec to nanoseconds makes no
sense. Use cputime_to_ns() and be done with it.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Kill the timespec juggling and calculate with plain nanoseconds.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Simplify the timespec to nsec/usec conversions.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Simplify the only user of this data by removing the timespec
conversion.

Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz -
Required for moving drivers to the nanosecond based interfaces.
Signed-off-by: Thomas Gleixner
Signed-off-by: John Stultz