09 Jun, 2010

1 commit

  • For code that would otherwise have to write cpu_clock(smp_processor_id()),
    there is now local_clock().

    Also, as per Andrew's suggestion, provide some documentation on
    the various clock interfaces, and minimize the unsigned long long vs
    u64 mess.
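
    As a minimal before/after sketch of the interface (the surrounding variable
    is illustrative):

    u64 now;

    /* Before: callers had to name the local CPU explicitly. */
    now = cpu_clock(smp_processor_id());

    /* After: the same reading, spelled directly. */
    now = local_clock();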

    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Apr, 2010

1 commit

  • After merging the block tree, 20100414's linux-next build (x86_64
    allmodconfig) failed like this:

    ERROR: "get_gendisk" [block/blk-cgroup.ko] undefined!
    ERROR: "sched_clock" [block/blk-cgroup.ko] undefined!

    This happens because the two symbols aren't exported and hence aren't available
    when the blk-cgroup code is built as a module. I've tried to stay consistent
    with the EXPORT_SYMBOL / EXPORT_SYMBOL_GPL choice made for the other symbols
    in the respective files.
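
    A sketch of the kind of change involved; which export flavor each symbol
    actually received is an assumption here:

    /* kernel/sched_clock.c */
    EXPORT_SYMBOL_GPL(sched_clock);

    /* block/genhd.c */
    EXPORT_SYMBOL(get_gendisk);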

    Signed-off-by: Divyesh Shah
    Acked-by: Gui Jianfeng
    Signed-off-by: Jens Axboe

    Divyesh Shah
     

15 Dec, 2009

1 commit

  • Relax stable-sched-clock architectures to not save/disable/restore
    hardirqs in cpu_clock().

    The background is that I was trying to resolve a sparc64 perf
    issue when I discovered this problem.

    On sparc64 I implement pseudo-NMIs by simply running the kernel
    at IRQ level 14 when local_irq_disable() is called; this allows
    performance counter events to still come in at IRQ level 15.

    This doesn't work if any code in an NMI handler does
    local_irq_save() or local_irq_disable(), since the "disable" will
    kick us back to cpu IRQ level 14, letting NMIs back in, and
    we recurse.

    The only path that does that in the perf event IRQ
    handling code is the code supporting frequency-based events. It
    uses cpu_clock().

    cpu_clock() simply invokes sched_clock() with IRQs disabled.

    And that's a fundamental bug all on its own, particularly for
    the HAVE_UNSTABLE_SCHED_CLOCK case: NMIs can thus get into the
    sched_clock() code and interrupt its locally-IRQ-disabled code
    sections.

    Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
    disabling done by cpu_clock() is just pure overhead and
    completely unnecessary.

    So the core problem is that sched_clock() is not NMI safe, but
    we are invoking it from NMI contexts in the perf events code
    (via cpu_clock()).

    A less important issue is the overhead of IRQ disabling when it
    isn't necessary in cpu_clock().

    CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
    affected by this patch.
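
    A rough sketch of the relaxed path for stable-clock architectures
    (illustrative, not the exact kernel code):

    /*
     * !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK: sched_clock() is stable and NMI safe,
     * so there is no need to save/disable/restore hardirqs around it.
     */
    unsigned long long cpu_clock(int cpu)
    {
            return sched_clock();
    }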

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    David Miller
     

01 Oct, 2009

1 commit

  • Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
    cmpxchg() of 64bit values was available on X86_32.

    That is not so - and it causes some subtle scheduler misbehavior due
    to incorrect timestamps that can be off by up to ~4 seconds.

    Two symptoms are known right now:

    - interactivity problems seen by Arjan: up to 600 msecs
    latencies instead of the expected 20-40 msecs. These
    latencies are very visible on the desktop.

    - incorrect CPU stats: occasionally too high percentages in 'top',
    and crazy CPU usage stats.
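
    The fix, roughly, is to make the 64-bit compare-and-swap explicit; a sketch,
    assuming the scd->clock update described elsewhere in this log:

    /* cmpxchg() cannot do a 64-bit compare-and-swap on X86_32:   */
    /*         cmpxchg(&scd->clock, old_clock, clock);            */
    /* cmpxchg64() can (it maps to cmpxchg8b on 32-bit x86):      */
    cmpxchg64(&scd->clock, old_clock, clock);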

    Reported-by: Martin Schwidefsky
    Signed-off-by: Eric Dumazet
    Signed-off-by: Arjan van de Ven
    Acked-by: Linus Torvalds
    Cc: John Stultz
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     

19 Sep, 2009

1 commit

  • Arjan complained about the suckyness of TSC on modern machines, and
    asked if we could do something about that for PERF_SAMPLE_TIME.

    Make cpu_clock() NMI safe by removing the spinlock and using
    cmpxchg. This also makes it smaller and more robust.

    Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
    and x86.
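
    A sketch of the lockless update pattern this describes (structure and helper
    names are assumptions):

    static u64 sched_clock_local(struct sched_clock_data *scd)
    {
            u64 old_clock, clock, now;

            do {
                    now = sched_clock();
                    old_clock = scd->clock;
                    /* never publish a value below what was last returned */
                    clock = max(old_clock, now);
            } while (cmpxchg64(&scd->clock, old_clock, clock) != old_clock);

            return clock;
    }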

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Dec, 2008

1 commit

  • Redo:

    5b7dba4: sched_clock: prevent scd->clock from moving backwards

    which had to be reverted due to s2ram hangs:

    ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards"

    ... this time with resume restoring GTOD later in the sequence
    taken into account as well.

    The "timekeeping_suspended" flag is not very nice but we cannot call into
    GTOD before it has been properly resumed and the scheduler will run very
    early in the resume sequence.
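
    A hypothetical sketch of the guard being described (where exactly the check
    lives, and the helper names, are assumptions):

    extern int timekeeping_suspended;

    void sched_clock_tick(void)
    {
            struct sched_clock_data *scd = this_scd();

            /*
             * Early in resume GTOD has not been restored yet, so don't feed
             * a stale gtod value into the clamp window.
             */
            if (unlikely(timekeeping_suspended))
                    return;

            scd->tick_gtod = ktime_to_ns(ktime_get());
            scd->tick_raw  = sched_clock();
    }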

    Cc:
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

15 Dec, 2008

1 commit

  • This reverts commit 5b7dba4ff834259a5623e03a565748704a8fe449, which
    caused a regression in hibernate, reported and bisected by Fabio
    Comolli.

    This revert fixes

    http://bugzilla.kernel.org/show_bug.cgi?id=12155
    http://bugzilla.kernel.org/show_bug.cgi?id=12149

    Bisected-by: Fabio Comolli
    Requested-by: Rafael J. Wysocki
    Acked-by: Dave Kleikamp
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Oct, 2008

1 commit

  • When sched_clock_cpu() couples the clocks between two cpus, it may
    increment scd->clock beyond the GTOD tick window that __update_sched_clock()
    uses to clamp the clock. A later call to __update_sched_clock() may move
    the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
    monotonic property.

    This patch ensures that scd->clock will not be set backward.
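
    A minimal sketch of the invariant being enforced (illustrative only):

    static void set_scd_clock(struct sched_clock_data *scd, u64 clock)
    {
            /* scd->clock must never move backwards, even after cross-CPU coupling. */
            scd->clock = max(scd->clock, clock);
    }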

    Signed-off-by: Dave Kleikamp
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Dave Kleikamp
     

25 Aug, 2008

1 commit

  • This patch fixes 3 issues:

    a) it removes the dependency on jiffies, because jiffies are incremented
    by a single CPU, and the tick is not synchronized between CPUs. Therefore
    relying on it to calculate a window to clip whacky TSC values doesn't work
    as it can drift around.

    So instead use [GTOD, GTOD+TICK_NSEC) as the window.

    b) __update_sched_clock() did (roughly speaking):

    delta = sched_clock() - scd->tick_raw;
    clock += delta;

    Which gives exponential growth, instead of linear.

    c) allows the sched_clock_cpu() value to wrap around the u64 without breaking.
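
    A sketch of what a) and b) imply for the update path (names follow the log;
    the real code also handles the u64 wrap from c), which is omitted here):

    static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
    {
            u64 delta     = now - scd->tick_raw;          /* raw time since the last tick */
            u64 clock     = scd->tick_gtod + delta;       /* rebase on GTOD, not on clock */
            u64 min_clock = scd->tick_gtod;               /* the [GTOD, GTOD+TICK_NSEC)   */
            u64 max_clock = scd->tick_gtod + TICK_NSEC;   /* window from a)               */

            return clamp(clock, min_clock, max_clock);
    }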

    The results are more reliable sched_clock() deltas. Each line below shows,
    left to right: before the patch, after the patch, and sched_clock() itself:

    cpu_clock: 15750 51312 51488
    cpu_clock: 59719 51052 50947
    cpu_clock: 15879 51249 51061
    cpu_clock: 1 50933 51198
    cpu_clock: 1 50931 51039
    cpu_clock: 1 51093 50981
    cpu_clock: 1 51043 51040
    cpu_clock: 1 50959 50938
    cpu_clock: 1 50981 51011
    cpu_clock: 1 51364 51212
    cpu_clock: 1 51219 51273
    cpu_clock: 1 51389 51048
    cpu_clock: 1 51285 51611
    cpu_clock: 1 50964 51137
    cpu_clock: 1 50973 50968
    cpu_clock: 1 50967 50972
    cpu_clock: 1 58910 58485
    cpu_clock: 1 51082 51025
    cpu_clock: 1 50957 50958
    cpu_clock: 1 50958 50957
    cpu_clock: 1006128 51128 50971
    cpu_clock: 1 51107 51155
    cpu_clock: 1 51371 51081
    cpu_clock: 1 51104 51365
    cpu_clock: 1 51363 51309
    cpu_clock: 1 51107 51160
    cpu_clock: 1 51139 51100
    cpu_clock: 1 51216 51136
    cpu_clock: 1 51207 51215
    cpu_clock: 1 51087 51263
    cpu_clock: 1 51249 51177
    cpu_clock: 1 51519 51412
    cpu_clock: 1 51416 51255
    cpu_clock: 1 51591 51594
    cpu_clock: 1 50966 51374
    cpu_clock: 1 50966 50966
    cpu_clock: 1 51291 50948
    cpu_clock: 1 50973 50867
    cpu_clock: 1 50970 50970
    cpu_clock: 998306 50970 50971
    cpu_clock: 1 50971 50970
    cpu_clock: 1 50970 50970
    cpu_clock: 1 50971 50971
    cpu_clock: 1 50970 50970
    cpu_clock: 1 51351 50970
    cpu_clock: 1 50970 51352
    cpu_clock: 1 50971 50970
    cpu_clock: 1 50970 50970
    cpu_clock: 1 51321 50971
    cpu_clock: 1 50974 51324

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Aug, 2008

1 commit

  • Some architectures can't handle sched_clock() being called too early - delay
    this until sched_clock_init() has been called.
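
    A minimal sketch of the guard (the flag name is an assumption based on the
    description):

    static int sched_clock_running;        /* set to 1 by sched_clock_init() */

    u64 sched_clock_cpu(int cpu)
    {
            /* Too early: some architectures can't do sched_clock() yet. */
            if (unlikely(!sched_clock_running))
                    return 0ULL;

            return sched_clock();
    }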

    Reported-by: Bill Gatliff
    Signed-off-by: Peter Zijlstra
    Tested-by: Nishanth Aravamudan
    CC: Russell King - ARM Linux
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Jul, 2008

5 commits

  • When taking the time of a remote CPU, use the opportunity to
    couple (sync) the clocks to each other, in a monotonic way.
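
    A sketch of the coupling idea (illustrative; locking and the surrounding
    function are omitted):

    /*
     * With both CPUs' data held, take the larger of the two clocks as the new
     * time for both, so neither clock ever moves backwards.
     */
    static u64 couple_clocks(struct sched_clock_data *my_scd,
                             struct sched_clock_data *remote_scd)
    {
            u64 clock = max(my_scd->clock, remote_scd->clock);

            my_scd->clock     = clock;
            remote_scd->clock = clock;

            return clock;
    }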

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Mike Galbraith

    Ingo Molnar
     
  • Return the current clock instead of letting callers
    fetch it from scd->clock.

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Mike Galbraith

    Ingo Molnar
     
  • Eliminate prev_raw and use tick_raw instead.

    It's enough to base the current time on the scheduler tick timestamp
    alone - the monotonicity and maximum checks will prevent any damage.

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Mike Galbraith

    Ingo Molnar
     
  • Simplify the remote clock rebasing.

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Mike Galbraith

    Ingo Molnar
     
  • Found an interactivity problem on a quad core test-system - simple
    CPU loops would occasionally delay the system in an unacceptable way.

    After much debugging with Peter Zijlstra it turned out that the problem
    was caused by the string of sched_clock() changes - they caused the CPU
    clock to jump backwards a bit - which confuses the scheduler arithmetic
    (which is unsigned for performance reasons).

    So revert:

    # c300ba2: sched_clock: and multiplier for TSC to gtod drift
    # c0c8773: sched_clock: only update deltas with local reads.
    # af52a90: sched_clock: stop maximum check on NO HZ
    # f7cce27: sched_clock: widen the max and min time

    This solves the interactivity problems.

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Mike Galbraith

    Ingo Molnar
     

28 Jul, 2008

1 commit

  • Move sched_clock() up to stop the warning: a weak declaration of `sched_clock'
    after its first use results in unspecified behavior (with -fno-unit-at-a-time).
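
    For reference, the weak fallback looks roughly like this (a sketch; the exact
    body is an assumption) - the point of the move is that the weak definition
    must appear before its first use in the file:

    unsigned long long __attribute__((weak)) sched_clock(void)
    {
            /* jiffies-based fallback for architectures without a better clock */
            return (unsigned long long)(jiffies - INITIAL_JIFFIES)
                                            * (NSEC_PER_SEC / HZ);
    }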

    Signed-off-by: Hugh Dickins
    Cc: Mike Travis
    Cc: Ben Herrenschmidt
    Cc: Linuxppc-dev@ozlabs.org
    Signed-off-by: Ingo Molnar

    Hugh Dickins
     

11 Jul, 2008

7 commits

  • The sched_clock code currently tries to keep the CPU clocks of all CPUs
    somewhat in sync. At every clock tick it records the gtod clock and
    uses that, jiffies and the TSC to calculate a CPU clock that tries to
    stay in sync with all the other CPUs.

    ftrace depends heavily on this timer and it detects when this timer
    "jumps". One problem is that the TSC and the gtod also drift.
    When the TSC is 0.1% faster or slower than the gtod it is very noticeable
    in ftrace. To help compensate for this, I've added a multiplier that
    tries to keep the CPU clock updating at the same rate as the gtod.

    I've tried various ways to get it to be in sync and this ended up being
    the most reliable. At every scheduler tick we calculate the new multiplier:

    multi = delta_gtod / delta_TSC

    This means we perform a 64 bit divide at every scheduler tick (HZ times a
    second). A shift is used to handle the accuracy.

    Other methods that failed due to dynamic HZ are:

    (not used) multi += (gtod - tsc) / delta_gtod
    (not used) multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod

    as well as other variants.

    This code still allows for a slight drift between TSC and gtod, but
    it keeps the damage down to a minimum.
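
    A sketch of the fixed-point form implied by the formula above (the shift
    width and names are assumptions):

    #define MULTI_SHIFT 15          /* fixed-point fraction bits for the rate */

    /* Recomputed once per scheduler tick: multi = delta_gtod / delta_TSC. */
    static u64 compute_multi(u64 delta_gtod, u64 delta_tsc)
    {
            if (!delta_tsc)
                    return 1ULL << MULTI_SHIFT;     /* identity rate */

            return (delta_gtod << MULTI_SHIFT) / delta_tsc;
    }

    /* A raw TSC delta is later scaled as: (delta_tsc * multi) >> MULTI_SHIFT */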

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: john stultz
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • To read the gtod we need to grab the xtime lock for read. Reading the TSC
    before the gtod can cause a bigger gap if the xtime lock is contended, since
    the gtod value then reflects the (later) time at which the lock was finally
    acquired.

    This patch simply reverses the order so the TSC is read after the gtod.
    The locking in the reading of the gtod handles any barriers one might
    think are needed.

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: john stultz
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Reading the CPU clock should try to stay accurate within the CPU.
    Reading the CPU clock from another CPU and updating the deltas there can
    cause unneeded jumps in later reads from the local CPU.

    This patch changes the code to update the last read TSC only when read
    from the local CPU.
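
    A sketch of the resulting split; cpu_sdc(), __update_sched_clock() and the
    remote_clock() helper are assumptions here:

    u64 sched_clock_cpu(int cpu)
    {
            struct sched_clock_data *scd = cpu_sdc(cpu);
            u64 now = sched_clock();

            if (cpu != raw_smp_processor_id())
                    return remote_clock(scd, now);  /* rebase on that CPU's tick data */

            /* Only a local read folds the new raw reading into the stored deltas. */
            return __update_sched_clock(scd, now);
    }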

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: john stultz
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The algorithm to calculate the 'now' of another CPU is not correct.
    At each scheduler tick, each CPU records the last sched_clock and
    gtod (tick_raw and tick_gtod respectively). If the TSCs of the two CPUs
    run at roughly the same speed, the algorithm would be:

    tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)

    To calculate now2 we would have:

    now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1

    Currently the algorithm is:

    now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1

    This solves most of the rest of the issues I've had with timestamps in
    ftrace.
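
    The corrected rebasing, written out (variable names follow the equations
    above):

    /* Derive CPU2's 'now' from CPU1's current reading and both tick snapshots. */
    static u64 now_on_cpu2(u64 now1, u64 tick_raw1, u64 tick_gtod1,
                           u64 tick_raw2, u64 tick_gtod2)
    {
            return (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1;
    }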

    Signed-off-by: Steven Rostedt
    Cc: Andrew Morton
    Cc: john stultz
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Working with ftrace I would get large jumps of 11 millisecs or more with
    the clock tracer. This killed the latency timings of ftrace and also
    caused the irqoff self tests to fail.

    What was happening is that with NO_HZ the idle path would stop the jiffy
    counter, and until the jiffy counter was updated again the sched_clock code
    would use a bad delta-jiffies value when comparing against the gtod-based
    maximum.

    The jiffies would stop and the last scheduler tick would record the last gtod.
    On wakeup, the sched clock update would compare gtod + delta jiffies
    (which would be zero) against the TSC. The TSC would have
    correctly (with a stable TSC) moved forward several jiffies. But because
    jiffies had not been updated yet, the clock would be prevented from moving
    forward because it would appear that the TSC had jumped too far ahead.

    The clock would then virtually stop, until the jiffies are updated. Then
    the next sched clock update would see that the clock was very much behind
    since the delta jiffies is now correct. This would then jump the clock
    forward by several jiffies.

    This caused ftrace to report several milliseconds of interrupts off
    latency at every resume from NO_HZ idle.

    This patch adds hooks into the nohz code to disable the checking of the
    maximum clock update when nohz is in effect. It resumes the max check
    when nohz has updated the jiffies again.
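
    A sketch of the hook pairing described (function, helper and field names are
    assumptions):

    /* Entering nohz idle: jiffies may go stale, so stop clamping against them. */
    void sched_clock_tick_stop(int cpu)
    {
            cpu_sdc(cpu)->check_max = 0;
    }

    /* nohz has brought jiffies back up to date: resume the max check. */
    void sched_clock_tick_start(int cpu)
    {
            cpu_sdc(cpu)->check_max = 1;
    }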

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Keeping the max and min sched time within one jiffy of the gtod clock
    was too tight. Just before a scheduler tick the max could easily be hit,
    and just after a scheduler tick the min could be hit. This caused the
    clock to jump around by a jiffy.

    This patch widens the minimum to
    last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS

    and the maximum to
    last gtod + (2 + delta_jiffies) * TICK_NSECS

    This keeps the minimum at gtod, or one jiffy short of the full delta once
    jiffies have advanced, and the maximum 2 jiffies ahead of gtod. This may make
    unstable TSCs a bit more sporadic, but it helps keep a clock with a stable
    TSC working well.
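
    The widened window, written as code (a sketch using the kernel's TICK_NSEC
    constant; the helper itself is illustrative):

    static void sched_clock_window(u64 tick_gtod, unsigned long delta_jiffies,
                                   u64 *min_clock, u64 *max_clock)
    {
            u64 behind = delta_jiffies ? delta_jiffies - 1 : 0;

            *min_clock = tick_gtod + behind * TICK_NSEC;
            *max_clock = tick_gtod + (u64)(2 + delta_jiffies) * TICK_NSEC;
    }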

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The sched_clock code tries to stay within one tick (jiffy) of the gtod time.
    The current code mistakenly keeps track of the delta jiffies between
    updates of the clock, where the delta is used to compare against the
    number of jiffies that have passed since an update of the gtod. The gtod is
    updated at each scheduler tick, not at each sched_clock update. After one
    jiffy passes the clock is updated fine. But the delta is taken from the
    last update, so if the next update happens before the next tick the delta
    jiffies used will be incorrect.

    This patch changes the code to track the delta of jiffies between ticks,
    not between updates, to match the comparison of the updates with the gtod.
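
    A minimal sketch of the change (struct layout and field name are assumptions
    based on the description):

    struct sched_clock_data {
            unsigned long tick_jiffies;     /* jiffies snapshot taken at the tick */
            u64           tick_raw;
            u64           tick_gtod;
            u64           clock;
    };

    /* Measure against the tick snapshot, not against the last clock update. */
    static unsigned long jiffies_since_tick(struct sched_clock_data *scd)
    {
            return jiffies - scd->tick_jiffies;
    }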

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

29 Jun, 2008

1 commit

  • Vegard Nossum reported:

    > WARNING: at kernel/lockdep.c:2738 check_flags+0x142/0x160()

    which happens due to:

    unsigned long long cpu_clock(int cpu)
    {
            unsigned long long clock;
            unsigned long flags;

            raw_local_irq_save(flags);

    Since lower-level functions can take locks, we must not use the raw variant
    here; use the proper lockdep-annotated irq save/restore instead.
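
    A sketch of the fix implied above, using the lockdep-aware variants (the
    exact body is illustrative):

    unsigned long long cpu_clock(int cpu)
    {
            unsigned long long clock;
            unsigned long flags;

            local_irq_save(flags);
            clock = sched_clock_cpu(cpu);
            local_irq_restore(flags);

            return clock;
    }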

    Reported-by: Vegard Nossum
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 May, 2008

1 commit

  • This replaces the rq->clock stuff (and possibly cpu_clock()).

    - architectures that have an 'imperfect' hardware clock can set
    CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

    - the 'jiffie' window might be superfluous when we update tick_gtod
    before the __update_sched_clock() call in sched_clock_tick()

    - cpu_clock() might be implemented as:

    sched_clock_cpu(smp_processor_id())

    if the accuracy proves good enough - how far can TSC drift in a
    single jiffie when considering the filtering and idle hooks?

    [ mingo@elte.hu: various fixes and cleanups ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra