24 Jul, 2014

40 commits

  • This adds some documentation about clock sources, clock events,
    the weak sched_clock() function and delay timers that answers
    questions that repeatedly arise on the mailing lists.

    Cc: Thomas Gleixner
    Cc: Nicolas Pitre
    Cc: Colin Cross
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Linus Walleij
    Acked-by: Nicolas Pitre
    Signed-off-by: John Stultz

    Linus Walleij
     
  • By caching the ntp_tick_length() when we correct the frequency error,
    and then using that cached value to accumulate error, we avoid large
    initial errors when the tick length is changed.

    This makes convergence happen much faster in the simulator, since the
    initial error doesn't have to be slowly whittled away.

    This initially seems like an accounting error, but Miroslav pointed out
    that ntp_tick_length() can change mid-tick, so when we apply it in the
    error accumulation, we are applying any recent change to the entire tick.

    This approach applies changes to the ntp_tick_length() only from the
    next tick onward, which allows us to calculate the freq correction
    before using the new tick length, and avoids accumulating error.
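    The idea can be sketched in miniature (the names and values below are
    illustrative, not the kernel's):

```c
#include <stdint.h>

/* Illustrative only: sample the (possibly changing) tick length once,
 * when the frequency correction is computed, and use that cached value
 * for this tick's error accumulation.  A mid-tick change of the NTP
 * tick length then only takes effect on the next tick. */
static uint64_t ntp_tick_len = 1000000000ULL;	/* live NTP tick length */
static uint64_t cached_tick_len;

static void correct_frequency(void)
{
	cached_tick_len = ntp_tick_len;		/* cache for this tick */
}

static uint64_t accumulate_error(uint64_t accumulated)
{
	return accumulated + cached_tick_len;	/* use the cached value */
}
```

    A change to ntp_tick_len between correct_frequency() calls is not
    applied retroactively to the tick in progress.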

    Credit to Miroslav for pointing this out and providing the original
    patch this functionality has been pulled out from, along with the
    rationale.

    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Reported-by: Miroslav Lichvar
    Signed-off-by: John Stultz

    John Stultz
     
    The existing timekeeping_adjust logic has always been complicated
    to understand. Further, since it was developed prior to NOHZ becoming
    common, it's not surprising that it performs poorly when NOHZ is
    enabled.

    Since Miroslav pointed out the problematic nature of the existing code
    in the NOHZ case, I've tried to refactor the code to perform better.

    The problem with the previous approach was that it tried to adjust
    for the total cumulative error using a scaled damping factor. This
    resulted in large errors being corrected slowly, while small errors
    were corrected quickly. With NOHZ the timekeeping code doesn't know
    how far out the next tick will be, so this results in bad
    over-correction of small errors and insufficient correction of large
    errors.

    Inspired by Miroslav's patch, I've refactored the code to try to
    address the correction in two steps.

    1) Check the future freq error for the next tick, and if the frequency
    error is large, try to make sure we correct it so it doesn't cause
    much accumulated error.

    2) Then make a small single unit adjustment to correct any cumulative
    error that has collected over time.

    This method performs fairly well in the simulator Miroslav created.
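    As a rough sketch of that two-step shape (the function name and
    thresholds here are hypothetical, not the kernel code):

```c
#include <stdint.h>

/* Hypothetical miniature of the two-step correction:
 * 1) if the per-tick frequency error is large, correct it up front so
 *    it cannot accumulate;
 * 2) then make a single unit adjustment against whatever cumulative
 *    error remains. */
static int32_t adjust_mult(int32_t mult, int64_t freq_err, int64_t cum_err)
{
	/* step 1: large frequency error -> correct it directly */
	if (freq_err > 1 || freq_err < -1)
		mult += (int32_t)freq_err;

	/* step 2: single unit nudge against accumulated error */
	if (cum_err > 0)
		mult += 1;
	else if (cum_err < 0)
		mult -= 1;

	return mult;
}
```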

    Major credit to Miroslav for pointing out the issue, providing the
    original patch to resolve this, a simulator for testing, as well as
    helping debug and resolve issues in my implementation so that it
    performed closer to his original implementation.

    Cc: Miroslav Lichvar
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Reported-by: Miroslav Lichvar
    Signed-off-by: John Stultz

    John Stultz
     
  • In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
    we take the tk_xtime() value, which returns a timespec64, and
    store it in a timespec.

    Luckily this is OK, since the only architectures that use
    GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
    64 bit systems where timespec64 is the same as a timespec.

    Even so, for cleanliness reasons, use the conversion function
    to assign the proper type.

    Signed-off-by: John Stultz

    John Stultz
     
  • Expose the new NMI safe accessor to clock monotonic to the tracer.

    Signed-off-by: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Tracers want a correlated time between the kernel instrumentation and
    user space. We really do not want to export sched_clock() to user
    space, so we need to provide something sensible for this.

    Using separate data structures with a non-blocking sequence count
    based update mechanism allows us to do that. The data structure
    required for the readout has a sequence counter and two copies of the
    timekeeping data.

    On the update side:

    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[0], tk);
    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[1], tk);

    On the reader side:

    do {
            seq = tkf->seq;
            smp_rmb();
            idx = seq & 0x01;
            now = now(tkf->base[idx]);
            smp_rmb();
    } while (seq != tkf->seq);

    So if a NMI hits the update of base[0] it will use base[1] which is
    still consistent, but this timestamp is not guaranteed to be monotonic
    across an update.

    The timestamp is calculated by:

    now = base_mono + clock_delta * slope

    So if the update lowers the slope, readers who are forced to the
    not yet updated second array are still using the old steeper slope.

    tmono
    ^
    |    o  n
    |   o  n
    |  u
    | o
    |o
    |12345678---> reader order

    o = old slope
    u = update
    n = new slope

    So reader 6 will observe time going backwards versus reader 5.

    While other CPUs are likely to be able to observe that, the only way
    for a CPU local observation is when an NMI hits in the middle of
    the update. Timestamps taken from that NMI context might be ahead
    of the following timestamps. Callers need to be aware of that and
    deal with it.
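    The dual-copy indexing can be modeled in a single-threaded sketch
    (names are illustrative; the smp_wmb()/smp_rmb() barriers from the
    sequences above are only noted in comments):

```c
#include <stdint.h>

/* Single-threaded model of the latch: the writer increments the
 * sequence around each copy update; a reader indexes by the LSB of the
 * sequence it sampled, which selects the copy that is NOT being
 * written at that point. */
struct fast_tk {
	unsigned int seq;
	uint64_t base[2];
};

static void latch_update(struct fast_tk *tkf, uint64_t val)
{
	tkf->seq++;			/* smp_wmb() before and after */
	tkf->base[0] = val;
	tkf->seq++;			/* smp_wmb() before and after */
	tkf->base[1] = val;
}

static uint64_t latch_read(struct fast_tk *tkf)
{
	unsigned int seq;
	uint64_t now;

	do {
		seq = tkf->seq;		/* smp_rmb() after this load */
		now = tkf->base[seq & 0x01];
	} while (seq != tkf->seq);	/* smp_rmb() before the re-check */
	return now;
}
```

    A reader that samples an odd sequence (update of base[0] in flight)
    lands in base[1], and vice versa.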

    V2: Got rid of clock monotonic raw and reorganized the data
    structures. Folded in the barrier fix from Mathieu.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • For NMI safe access to clock monotonic we use the seqcount LSB as
    index of a timekeeper array. The update sequence looks like this:

    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[0], tk);
    smp_wmb();
    tkf->seq++;
    smp_wmb();
    update(tkf->base[1], tk);
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Mathieu Desnoyers
     
  • raw_read_seqcount opens a read critical section of the given seqcount
    without any lockdep checking and without checking or masking the
    LSB. Calling code is responsible for handling that.

    Preparatory patch to provide a NMI safe clock monotonic accessor
    function.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • All the function needs is in the tk_read_base struct. No functional
    change for the current code, just a preparatory patch for the NMI safe
    accessor to clock monotonic which will use struct tk_read_base as well.

    Signed-off-by: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    The members of the new struct are the required ones for the new NMI
    safe accessor to clock monotonic. In order to reuse the existing
    timekeeping code and to make the update of the fast NMI safe
    timekeepers a simple memcpy use the struct for the timekeeper as well
    and convert all users.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Access to the time requires touching two cachelines at minimum:

    1) The timekeeper data structure

    2) The clocksource data structure

    The access to the clocksource data structure can be avoided as almost
    all clocksource implementations ignore the argument to the read
    callback, which is a pointer to the clocksource.

    But the core needs to touch it to access the members @read and @mask.

    So we are better off by copying the @read function pointer and the
    @mask from the clocksource to the core data structure itself.

    For the most used ktime_get() access all required data including the
    @read and @mask copies fits together with the sequence counter into a
    single 64 byte cacheline.

    For the other time access functions we currently touch three
    cachelines in the worst case. But with the clocksource data copies we
    can reduce that to two adjacent cachelines, which is more efficient
    than disjoint cachelines.
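    An illustrative layout along those lines (field names follow this
    description; the real kernel struct may differ):

```c
#include <stdint.h>

typedef uint64_t cycle_t;
struct clocksource;	/* opaque here */

/* Illustrative only: @read and @mask are copied from the clocksource
 * so that the ktime_get() hot path, together with the sequence
 * counter, fits into a single 64 byte cacheline. */
struct tk_read_base {
	struct clocksource	*clock;
	cycle_t			(*read)(struct clocksource *cs);
	cycle_t			mask;
	cycle_t			cycle_last;
	uint32_t		mult;
	uint32_t		shift;
	uint64_t		xtime_nsec;
	uint64_t		base_mono;	/* ktime_t in the real code */
};
```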

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • cycle_last was added to the clocksource to support the TSC
    validation. We moved that to the core code, so we can get rid of the
    extra copy.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • The only user of the cycle_last validation is the x86 TSC. In order to
    provide NMI safe accessor functions for clock monotonic and
    monotonic_raw we need to do that in the core.

    We can't do the TSC specific

    if (now < cycle_last)
            now = cycle_last;

    for the other wrapping clocksources, but TSC has
    CLOCKSOURCE_MASK(64), which actually does not mask out anything, so
    if now is less than cycle_last the subtraction gives a negative
    result. So we can check for that in clocksource_delta() and return 0
    in that case.
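    A sketch of such a delta helper (the clamp is only valid for a full
    64 bit mask like the TSC's; names are illustrative):

```c
#include <stdint.h>

typedef uint64_t cycle_t;

/* With CLOCKSOURCE_MASK(64) nothing is masked out, so now < last
 * produces a "negative" delta (top bit set after the unsigned
 * subtraction); clamp it to 0.  For genuinely wrapping clocksources
 * with a smaller mask this clamp would be wrong, which is why it must
 * stay conditional in real code. */
static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
{
	cycle_t ret = (now - last) & mask;

	return (int64_t)ret < 0 ? 0 : ret;
}
```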

    Implement and enable it for x86.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • We want to move the TSC sanity check into core code to make NMI safe
    accessors to clock monotonic[_raw] possible. For this we need to
    sanity check the delta calculation. Create a helper function and
    convert all sites to use it.

    [ Build fix from jstultz ]

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • We have interfaces. Remove the open coded cruft. Reduces text size
    along with the code.

    Signed-off-by: Thomas Gleixner
    Cc: QCA ath9k Development
    Cc: John W. Linville
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No point in converting timespecs back and forth.

    Signed-off-by: Thomas Gleixner
    Cc: Thomas Hellstrom
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Use ktime_get_raw_ns() and get rid of the back and forth timespec
    conversions.

    Signed-off-by: Thomas Gleixner
    Acked-by: Daniel Vetter
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Provide a ktime_t based interface for raw monotonic time.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    There is no point in having an S390 private implementation and there
    is no point in using the raw monotonic time. The NTP frequency
    adjustment of CLOCK_MONOTONIC really does not do any harm to the
    hang check timer.

    Use ktime_get_ns() for everything and get rid of the timespec
    conversions.

    V2: Drop the raw monotonic and the S390 special case

    Signed-off-by: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    timekeeping_clocktai() is not used in fast paths, so the extra
    timespec conversion is not problematic.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    No more users. Remove it.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Subtracting plain nsec values and converting to timespec is simpler
    than the whole timespec math. Not really fastpath code, so the
    division is not an issue.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    get_monotonic_boottime() is not used in fast paths, so the extra
    timespec conversion is not problematic.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No more users.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Convert the relevant base data right away to nanoseconds instead of
    doing the conversion on every readout. Reduces text size by 160 bytes.

    Signed-off-by: Thomas Gleixner
    Cc: Gleb Natapov
    Cc: kvm@vger.kernel.org
    Acked-by: Paolo Bonzini
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Use the new nanoseconds based interface and get rid of the timespec
    conversion dance.

    Signed-off-by: Thomas Gleixner
    Cc: Gleb Natapov
    Cc: kvm@vger.kernel.org
    Acked-by: Paolo Bonzini
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Use the nanoseconds based interface instead of converting from a
    timespec.

    Signed-off-by: Thomas Gleixner
    Cc: Russell King
    Cc: linux-arm-kernel@lists.infradead.org
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • No idea why iio needs wall clock based time stamps, but we can avoid
    the timespec conversion dance by using the new interfaces.

    Signed-off-by: Thomas Gleixner
    Acked-by: Jonathan Cameron
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Using the wall clock time for delta time calculations is wrong to
    begin with because wall clock time can be set from userspace and NTP.
    Such data wants to be based on clock monotonic.

    The calculations also are done on a nanosecond basis. Use the
    nanoseconds based interface right away.

    Signed-off-by: Thomas Gleixner
    Cc: Jean Delvare
    Acked-by: Jean Delvare
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Replace the ever recurring:
        ktime_get_ts(&ts);
        ns = timespec_to_ns(&ts);
    with
        ns = ktime_get_ns();

    Signed-off-by: Thomas Gleixner
    Acked-by: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • This code is beyond silly:

    struct timespec ts;
    ktime_get_ts(&ts);
    ktime_t ktime = timespec_to_ktime(ts);

    Further down the code builds the delta of two ktime_t values and
    converts the result to nanoseconds.

    Use ktime_get_ns() and replace all the nonsense.

    Signed-off-by: Thomas Gleixner
    Cc: Eli Cohen
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Replace the ever recurring:
        ktime_get_ts(&ts);
        ns = timespec_to_ns(&ts);
    with
        ns = ktime_get_ns();

    Signed-off-by: Thomas Gleixner
    Acked-by: Arnd Bergmann
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Replace the ever recurring:
        ktime_get_ts(&ts);
        ns = timespec_to_ns(&ts);
    with
        ns = ktime_get_ns();

    Signed-off-by: Thomas Gleixner
    Acked-by: Lee Jones
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Replace the ever recurring:
        ktime_get_ts(&ts);
        ns = timespec_to_ns(&ts);
    with
        ns = ktime_get_ns();

    Signed-off-by: Thomas Gleixner
    Cc: Evgeniy Polyakov
    Signed-off-by: John Stultz

    Thomas Gleixner
     
    Replace the ever recurring:
        ktime_get_ts(&ts);
        ns = timespec_to_ns(&ts);
    with
        ns = ktime_get_ns();

    Signed-off-by: Thomas Gleixner
    Acked-by: Arnd Bergmann
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Converting cputime to timespec and timespec to nanoseconds makes no
    sense. Use cputime_to_ns() and be done with it.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Kill the timespec juggling and calculate with plain nanoseconds.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Simplify the timespec to nsec/usec conversions.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Simplify the only user of this data by removing the timespec
    conversion.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner
     
  • Required for moving drivers to the nanosecond based interfaces.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: John Stultz

    Thomas Gleixner