27 Mar, 2015

6 commits

  • Trivial cleanups, to improve the readability of the generic sched_clock() code:

    - Improve and standardize comments
    - Standardize the coding style
    - Use vertical spacing where appropriate
    - etc.

    No code changed:

    md5:
    19a053b31e0c54feaeff1492012b019a sched_clock.o.before.asm
    19a053b31e0c54feaeff1492012b019a sched_clock.o.after.asm

    Cc: Catalin Marinas
    Cc: Daniel Thompson
    Cc: John Stultz
    Cc: Peter Zijlstra (Intel)
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Stephen Boyd
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Currently it is possible for an NMI (or FIQ on ARM) to come in
    and read sched_clock() whilst update_sched_clock() has locked
    the seqcount for writing. This results in the NMI handler
    locking up when it calls raw_read_seqcount_begin().

    This patch fixes the NMI safety issues by providing banked clock
    data. This is a similar approach to the one used in Thomas
    Gleixner's 4396e058c52e ("timekeeping: Provide fast and NMI safe
    access to CLOCK_MONOTONIC").

    Suggested-by: Stephen Boyd
    Signed-off-by: Daniel Thompson
    Signed-off-by: John Stultz
    Reviewed-by: Stephen Boyd
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1427397806-20889-6-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Thompson
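
    The banked-data approach described above can be sketched roughly as
    follows. The structure and function names mirror the upstream
    kernel/time/sched_clock.c after this series, and cyc_to_ns() is the
    file's local (cyc * mult) >> shift helper; the snippet is an
    illustrative reduction, not the full implementation:

        struct clock_read_data {
                u64 epoch_ns;
                u64 epoch_cyc;
                u64 sched_clock_mask;
                u64 (*read_sched_clock)(void);
                u32 mult;
                u32 shift;
        };

        struct clock_data {
                seqcount_t seq;
                struct clock_read_data read_data[2];    /* two banks */
        };

        static struct clock_data cd;

        /* NMI-safe reader: the sequence LSB selects the stable bank. */
        unsigned long long notrace sched_clock(void)
        {
                u64 cyc, res;
                unsigned long seq;
                struct clock_read_data *rd;

                do {
                        seq = raw_read_seqcount(&cd.seq);
                        rd  = cd.read_data + (seq & 1);
                        cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
                              rd->sched_clock_mask;
                        res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
                } while (read_seqcount_retry(&cd.seq, seq));

                return res;
        }

        /*
         * Writer: update the odd bank, steer readers onto it, then
         * update the even bank and steer them back.
         */
        static void update_clock_read_data(struct clock_read_data *rd)
        {
                cd.read_data[1] = *rd;
                raw_write_seqcount_latch(&cd.seq);
                cd.read_data[0] = *rd;
                raw_write_seqcount_latch(&cd.seq);
        }

    An NMI that interrupts the writer therefore always finds one
    complete bank to read and never spins on the seqcount.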
     
  • Currently update_sched_clock() is marked as notrace, but this
    function is not called by ftrace. This is trivially fixed by
    removing the markup.

    Signed-off-by: Daniel Thompson
    Signed-off-by: John Stultz
    Reviewed-by: Stephen Boyd
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1427397806-20889-5-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Thompson
     
  • Currently cd.read_data.suspended is read by the hotpath function
    sched_clock(). This variable need not be accessed on the
    hotpath. In fact, once it is removed, we can remove the
    conditional branches from sched_clock() and install a dummy
    read_sched_clock function to suspend the clock.

    A new master copy of the function pointer
    (actual_read_sched_clock) is introduced and is used for all
    reads of the clock hardware except those within sched_clock()
    itself.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Daniel Thompson
    Signed-off-by: John Stultz
    Reviewed-by: Stephen Boyd
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1427397806-20889-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Thompson
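
    A rough sketch of the shape this takes, using the names mentioned
    above (suspended_sched_clock_read() and the hrtimer details follow
    the upstream file, but treat the snippet as illustrative only):

        /* Read function installed while suspended: freeze at the epoch. */
        static u64 notrace suspended_sched_clock_read(void)
        {
                return cd.read_data.epoch_cyc;
        }

        static int sched_clock_suspend(void)
        {
                struct clock_read_data *rd = &cd.read_data;

                update_sched_clock();
                hrtimer_cancel(&sched_clock_timer);
                rd->read_sched_clock = suspended_sched_clock_read;
                return 0;
        }

        static void sched_clock_resume(void)
        {
                struct clock_read_data *rd = &cd.read_data;

                rd->epoch_cyc = cd.actual_read_sched_clock();
                hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
                rd->read_sched_clock = cd.actual_read_sched_clock;
        }

    sched_clock() itself keeps calling rd->read_sched_clock()
    unconditionally; suspension is expressed purely by which function
    that pointer holds.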
     
  • Currently sched_clock(), a very hot code path, is not optimized
    to minimise its cache profile. In particular:

    1. cd is not ____cacheline_aligned,

    2. struct clock_data does not distinguish between hotpath and
    coldpath data, reducing locality of reference in the hotpath,

    3. Some hotpath data is missing from struct clock_data and is marked
    __read_mostly (which more or less guarantees it will not share a
    cache line with cd).

    This patch corrects these problems by extracting all hotpath
    data into a separate structure and using ____cacheline_aligned
    to ensure the hotpath uses a single (64 byte) cache line.

    Signed-off-by: Daniel Thompson
    Signed-off-by: John Stultz
    Reviewed-by: Stephen Boyd
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1427397806-20889-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Thompson
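
    The resulting split looks roughly like this; the field layout
    follows the upstream structures and is shown here only to
    illustrate the hotpath/coldpath separation:

        /* Hotpath data: everything sched_clock() reads, in one place. */
        struct clock_read_data {
                u64 epoch_ns;
                u64 epoch_cyc;
                u64 sched_clock_mask;
                u64 (*read_sched_clock)(void);
                u32 mult;
                u32 shift;
                bool suspended;
        };

        /* Coldpath data is only touched by updates and registration. */
        struct clock_data {
                seqcount_t seq;
                struct clock_read_data read_data;
                ktime_t wrap_kt;
                unsigned long rate;
        };

        static struct clock_data cd ____cacheline_aligned = {
                .read_data = { .mult = NSEC_PER_SEC / HZ },
        };

    The read-side fields add up to well under 64 bytes, so the
    alignment keeps all of them on a single cache line.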
     
  • Currently the scope of the raw_write_seqcount_begin/end() in
    sched_clock_register() far exceeds the scope of the read section
    in sched_clock(). This gives the impression of safety during
    cursory review but achieves little.

    Note that this is likely to be a latent issue at present, because
    sched_clock_register() is typically called before we enable
    interrupts; however, it does risk bugs being needlessly
    introduced as the code evolves.

    This patch fixes the problem by increasing the scope of the read
    locking performed by sched_clock() to cover all data modified by
    sched_clock_register().

    We also improve clarity by moving writes to struct clock_data
    that do not impact sched_clock() outside of the critical
    section.

    Signed-off-by: Daniel Thompson
    [ Reworked it slightly to apply to tip/timers/core]
    Signed-off-by: John Stultz
    Reviewed-by: Stephen Boyd
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1427397806-20889-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    Daniel Thompson
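
    With the read section widened as described, sched_clock()
    re-samples every field that the registration path may rewrite.
    A condensed sketch (cyc_to_ns() is the file's local
    (cyc * mult) >> shift helper; illustrative only):

        unsigned long long notrace sched_clock(void)
        {
                u64 cyc, res;
                unsigned long seq;

                do {
                        seq = raw_read_seqcount_begin(&cd.seq);

                        res = cd.epoch_ns;
                        if (!cd.suspended) {
                                cyc = read_sched_clock();
                                cyc = (cyc - cd.epoch_cyc) & sched_clock_mask;
                                res += cyc_to_ns(cyc, cd.mult, cd.shift);
                        }
                } while (read_seqcount_retry(&cd.seq, seq));

                return res;
        }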
     

12 Mar, 2015

2 commits

  • In order to facilitate clocksource validation, add a
    'max_cycles' field to the clocksource structure which
    will hold the maximum cycle value that can safely be
    multiplied without potentially causing an overflow.

    Signed-off-by: John Stultz
    Cc: Dave Jones
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Stephen Boyd
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
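
    One plausible shape for this, based on the description above; the
    arithmetic follows the kernel's clocks_calc_max_nsecs() helper,
    but the exact details should be treated as illustrative:

        /*
         * Largest cycle delta that (mult + maxadj) can scale without
         * overflowing 64 bits, further capped by the counter's mask.
         */
        u64 max_cycles = ULLONG_MAX;

        do_div(max_cycles, mult + maxadj);
        max_cycles = min(max_cycles, mask);

        cs->max_cycles = max_cycles;    /* kept for later validation */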
     
  • The clocksource logic has a number of places where we try to
    include a safety margin. Most of these are 12.5% safety margins,
    but they are inconsistently applied and sometimes are applied
    on top of each other.

    Additionally, in the previous patch, we corrected an issue
    where we unintentionally in effect created a 50% safety margin,
    which these 12.5% margins were then added to.

    So to simplify the logic here, this patch removes the various
    12.5% margins, and consolidates adding the margin in one place:
    clocks_calc_max_nsecs().

    Additionally, Linus prefers a 50% safety margin, as it allows
    bad clock values to be more easily caught. This should really
    have no net effect, due to the issue corrected earlier, which
    caused greater than 50% margins to be used without issue.

    Signed-off-by: John Stultz
    Acked-by: Stephen Boyd (for the sched_clock.c bit)
    Cc: Dave Jones
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
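
    The consolidation amounts to computing the theoretical maximum
    deferment once and then halving it at the end of
    clocks_calc_max_nsecs(); a sketch of that tail (variable names as
    in the kernel, details illustrative):

        /* Convert the largest safe cycle delta to nanoseconds ... */
        max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);

        /*
         * ... and hand back only half of it. This single 50% margin
         * replaces the scattered 12.5% margins and makes bad clock
         * values easier to catch.
         */
        max_nsecs >>= 1;

        return max_nsecs;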
     

24 Jul, 2014

1 commit

  • During suspend we call sched_clock_poll() to update the epoch and
    accumulated time and reprogram the sched_clock_timer to fire
    before the next wrap-around time. Unfortunately,
    sched_clock_poll() doesn't restart the timer; instead it relies
    on the hrtimer layer to do that, and during suspend we aren't
    calling that function from the hrtimer layer. Instead, we're
    reprogramming the expires time while the hrtimer is enqueued,
    which can cause the hrtimer tree to be corrupted. Furthermore, we
    restart the timer during suspend but we update the epoch during
    resume which seems counter-intuitive.

    Let's fix this by saving the accumulated state and canceling the
    timer during suspend. On resume we can update the epoch and
    restart the timer similar to what we would do if we were starting
    the clock for the first time.

    Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz
    Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
    Cc: Ingo Molnar
    Cc: stable
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
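
    The fix boils down to the following shape, matching the
    sched_clock.c of that time (sched_clock_timer and cd.wrap_kt are
    the existing hrtimer and its period; illustrative sketch):

        static int sched_clock_suspend(void)
        {
                update_sched_clock();               /* fold elapsed time into the epoch */
                hrtimer_cancel(&sched_clock_timer); /* stop the wrap timer cleanly */
                cd.suspended = true;
                return 0;
        }

        static void sched_clock_resume(void)
        {
                cd.epoch_cyc = read_sched_clock();  /* fresh epoch, as on first start */
                hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
                cd.suspended = false;
        }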
     

23 Apr, 2014

1 commit


20 Feb, 2014

1 commit

  • The generic sched_clock registration function was previously
    done lockless, due to the fact that it was expected to be called
    only once. However, now there are systems that may register
    multiple sched_clock sources, for which the lack of locking has
    caused problems:

    If two sched_clock sources are registered we may end up in a
    situation where a call to sched_clock() may be accessing the
    epoch cycle count for the old counter and the cycle count for the
    new counter. This can lead to confusing results where
    sched_clock() values jump and then are reset to 0 (due to the way
    the registration function forces the epoch_ns to be 0).

    Fix this by reorganizing the registration function to hold the
    seqlock for as short a time as possible while we update the
    clock_data structure for a new counter. We also put any
    accumulated time into epoch_ns instead of resetting the time to
    0 so that the clock doesn't reset after each successful
    registration.

    [jstultz: Added extra context to the commit message]

    Reported-by: Will Deacon
    Signed-off-by: Stephen Boyd
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Josh Cartwright
    Link: http://lkml.kernel.org/r/1392662736-7803-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: John Stultz
    Signed-off-by: Thomas Gleixner

    Stephen Boyd
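
    In outline, the reorganized registration computes the carried-over
    time first and only then publishes everything inside a short
    write-side critical section (names follow the entry above; the
    snippet is illustrative):

        new_epoch = read();                    /* first reading of the new counter */
        cyc = read_sched_clock();              /* last reading of the old counter  */
        ns = cd.epoch_ns +
             cyc_to_ns((cyc - cd.epoch_cyc) & sched_clock_mask,
                       cd.mult, cd.shift);

        raw_write_seqcount_begin(&cd.seq);
        read_sched_clock = read;
        sched_clock_mask = new_mask;
        cd.mult = new_mult;
        cd.shift = new_shift;
        cd.epoch_cyc = new_epoch;
        cd.epoch_ns = ns;                      /* carry the accumulated time forward */
        raw_write_seqcount_end(&cd.seq);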
     

12 Jan, 2014

1 commit

  • Unfortunately the seqlock lockdep enablement can't be used
    in sched_clock(), since the lockdep infrastructure eventually
    calls into sched_clock(), which causes a deadlock.

    Thus, this patch changes all generic sched_clock() usage
    to use the raw_* methods.

    Acked-by: Linus Torvalds
    Reviewed-by: Stephen Boyd
    Reported-by: Krzysztof Hałasa
    Signed-off-by: John Stultz
    Cc: Uwe Kleine-König
    Cc: Willy Tarreau
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
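
    The change is mechanical: the lockdep-instrumented accessors are
    swapped for their raw_ counterparts so that sched_clock() never
    re-enters lockdep. For example (illustrative):

        /* read side: was read_seqcount_begin(), which calls into lockdep */
        seq = raw_read_seqcount_begin(&cd.seq);

        /* write side: was write_seqcount_begin() / write_seqcount_end() */
        raw_write_seqcount_begin(&cd.seq);
        raw_write_seqcount_end(&cd.seq);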
     

10 Oct, 2013

1 commit


31 Jul, 2013

3 commits

  • The ARM architected system counter has at least 56 usable bits.
    Add support for counters with more than 32 bits to the generic
    sched_clock implementation so we can increase the time between
    wakeups due to dealing with wrap-around on these devices while
    benefiting from the irqtime accounting and suspend/resume
    handling that the generic sched_clock code already has. On my
    system using 56 bits over 32 bits changes the wraparound time
    from a few minutes to an hour. For faster running counters (GHz
    range) this is even more important because we may not be able to
    execute the timer in time to deal with the wraparound if only 32
    bits are used.

    We choose a maxsec value of 3600 seconds because we assume no
    system will go idle for more than an hour. In the future we may
    need to increase this value.

    Note: All users should switch over to the 64-bit read function so
    we can remove setup_sched_clock() in favor of sched_clock_register().

    Cc: Russell King
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz

    Stephen Boyd
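
    The 64-bit registration interface referred to in the note has this
    shape; the caller shown is hypothetical, and the mult/shift sizing
    line is a simplified view of what the register path does:

        /* New 64-bit registration interface (setup_sched_clock() will go away): */
        void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate);

        /*
         * Inside the register path, mult/shift are sized so the scaling
         * stays valid for up to an hour (maxsec = 3600) of idle time:
         */
        clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);

        /* Hypothetical caller: a driver exposing a 56-bit counter at 50 MHz. */
        sched_clock_register(my_counter_read, 56, 50000000);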
     
  • In the next patch we're going to increase the number of bits that
    the generic sched_clock can handle to be greater than 32. With
    more than 32 bits the wraparound time can be larger than what can
    fit into the units that msecs_to_jiffies takes (unsigned int).
    Luckily, the wraparound is initially calculated in nanoseconds
    which we can easily use with hrtimers, so switch to using an
    hrtimer.

    Cc: Russell King
    Signed-off-by: Stephen Boyd
    [jstultz: Fixup hrtimer initialization order issue]
    Signed-off-by: John Stultz

    Stephen Boyd
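
    Because the wrap interval is already known in nanoseconds, it can
    be handed to an hrtimer directly; roughly (illustrative, following
    the description above, with the safety margin of that era kept):

        static struct hrtimer sched_clock_timer;

        static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
        {
                update_sched_clock();
                hrtimer_forward_now(hrt, cd.wrap_kt);   /* re-arm before the wrap */
                return HRTIMER_RESTART;
        }

        /*
         * at registration time: wrap interval kept as ktime_t, no
         * msecs_to_jiffies() truncation
         */
        cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));   /* stay below the true wrap */
        hrtimer_init(&sched_clock_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        sched_clock_timer.function = sched_clock_poll;
        hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);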
     
  • We're going to increase the cyc value to 64 bits in the near
    future. Doing that is going to break the custom seqcount
    implementation in the sched_clock code because 64 bit numbers
    aren't guaranteed to be atomic. Replace the cyc_copy with a
    seqcount to avoid this problem.

    Cc: Russell King
    Acked-by: Will Deacon
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz

    Stephen Boyd
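
    The replacement swaps the hand-rolled epoch_cyc_copy double-read
    for a standard seqcount; in outline (illustrative):

        /* writer (update_sched_clock() and registration) */
        write_seqcount_begin(&cd.seq);
        cd.epoch_ns = ns;
        cd.epoch_cyc = cyc;     /* may be 64-bit; no single atomic store needed */
        write_seqcount_end(&cd.seq);

        /* reader: retries instead of comparing against an epoch_cyc_copy */
        do {
                seq = read_seqcount_begin(&cd.seq);
                epoch_cyc = cd.epoch_cyc;
                epoch_ns = cd.epoch_ns;
        } while (read_seqcount_retry(&cd.seq, seq));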
     

18 Jun, 2013

1 commit

  • There is a small race between when the cycle count is read from
    the hardware and when the epoch stabilizes. Consider this
    scenario:

    CPU0                               CPU1
    ----                               ----
    cyc = read_sched_clock()
    cyc_to_sched_clock()
                                       update_sched_clock()
                                        ...
                                        cd.epoch_cyc = cyc;
     epoch_cyc = cd.epoch_cyc;
     ...
     epoch_ns + cyc_to_ns((cyc - epoch_cyc)

    The cyc on CPU0 was read before the epoch changed. But we
    calculate the nanoseconds based on the new epoch by subtracting
    the new epoch from the old cycle count. Since epoch is most likely
    larger than the old cycle count we calculate a large number that
    will be converted to nanoseconds and added to epoch_ns, causing
    time to jump forward too much.

    Fix this problem by reading the hardware after the epoch has
    stabilized.

    Cc: Russell King
    Signed-off-by: Stephen Boyd
    Signed-off-by: John Stultz

    Stephen Boyd
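
    The fix moves the hardware read to after the epoch snapshot loop,
    so the cycle count can never predate the epoch it is paired with;
    roughly (this is the 32-bit, pre-seqcount code of that era, with
    cyc_to_ns() the usual (cyc * mult) >> shift helper):

        u32 cyc, epoch_cyc;
        u64 epoch_ns;

        do {                                    /* wait for a stable epoch pair */
                epoch_cyc = cd.epoch_cyc;
                smp_rmb();
                epoch_ns = cd.epoch_ns;
                smp_rmb();
        } while (epoch_cyc != cd.epoch_cyc_copy);

        cyc = read_sched_clock();               /* read the hardware only now */
        return epoch_ns + cyc_to_ns((cyc - epoch_cyc) & sched_clock_mask,
                                    cd.mult, cd.shift);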
     

13 Jun, 2013

1 commit