13 May, 2015

1 commit

  • Critical tracepoint hooks should never call anything that takes a lock,
    so they are unable to call getrawmonotonic() or ktime_get().

    Export the rest of the tracing clock functions so they can be used
    in tracepoint hooks.

    Background: We have a customer who loads their own module and
    registers a tracepoint hook on sched_wakeup. They were using
    ktime_get() as a time source, but it takes a seqlock, which caused
    a deadlock.
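
    A minimal sketch of such a hook (an editor's illustration, not the
    customer's code; the module and per-cpu variable are hypothetical,
    and the probe signature assumes the sched_wakeup prototype of this
    era, which still carried a 'success' argument):

      #include <linux/module.h>
      #include <linux/percpu.h>
      #include <linux/sched.h>
      #include <linux/trace_clock.h>
      #include <trace/events/sched.h>

      static DEFINE_PER_CPU(u64, last_wakeup_ts);

      static void probe_wakeup(void *data, struct task_struct *p, int success)
      {
              /* ktime_get() takes a seqlock and can deadlock here;
               * trace_clock_local() is lock-free and safe in this context. */
              this_cpu_write(last_wakeup_ts, trace_clock_local());
      }

      static int __init wakeup_ts_init(void)
      {
              return register_trace_sched_wakeup(probe_wakeup, NULL);
      }

      static void __exit wakeup_ts_exit(void)
      {
              unregister_trace_sched_wakeup(probe_wakeup, NULL);
              tracepoint_synchronize_unregister();
      }

      module_init(wakeup_ts_init);
      module_exit(wakeup_ts_exit);
      MODULE_LICENSE("GPL");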

    Link: http://lkml.kernel.org/r/1430406624-22609-1-git-send-email-jsnitsel@redhat.com

    Signed-off-by: Jerry Snitselaar
    Signed-off-by: Steven Rostedt

    Jerry Snitselaar
     

21 Jul, 2014

1 commit

  • The "uptime" trace clock added in:

    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies

    has wraparound problems when the system has been up more
    than 1 hour 11 minutes and 34 seconds. It converts jiffies
    to nanoseconds using:
    (u64)jiffies_to_usecs(jiffy) * 1000ULL
    but since jiffies_to_usecs() only returns a 32-bit value, it
    truncates at 2^32 microseconds. An additional problem on 32-bit
    systems is that the argument is "unsigned long", so fixing the
    return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
    system).

    Avoid these problems by using jiffies_64 as our basis, and not
    converting to nanoseconds (we do convert to clock_t because the
    user-facing API must not depend on the kernel's internal HZ value).
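
    For clarity, a before/after sketch of the conversion (not verbatim
    from the patch):

      /* before: jiffies_to_usecs() returns only 32 bits, so this wraps
       * after 2^32 us; on 32-bit, the 'unsigned long' argument wraps
       * after 2^32 jiffies as well */
      u64 ns = (u64)jiffies_to_usecs(jiffy) * 1000ULL;

      /* after: 64-bit end to end, reported in clock_t so userspace
       * never depends on the kernel's HZ value */
      u64 uptime = jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);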

    Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com

    Cc: stable@vger.kernel.org # 3.10+
    Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies"
    Signed-off-by: Tony Luck
    Signed-off-by: Steven Rostedt

    Tony Luck
     

15 Mar, 2013

1 commit

  • Add a simple trace clock called "uptime" for those interested in
    the uptime of the trace. It uses jiffies because that is the safest
    method: other uptime clocks take seqlocks, which could deadlock if
    taken from an event or function tracer.
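
    A sketch of the implementation, reconstructed from the conversion
    expression quoted in the 21 Jul, 2014 entry above (where its
    wraparound was later fixed):

      u64 notrace trace_clock_jiffies(void)
      {
              u64 jiffy = jiffies - INITIAL_JIFFIES;

              /* return nsecs; reading jiffies never takes a lock, so
               * this is safe from any tracer context */
              return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
      }

    As with the other clocks, echo uptime > /debug/tracing/trace_clock
    selects it.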

    Requested-by: Mauro Carvalho Chehab
    Cc: Thomas Gleixner
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Feb, 2013

1 commit

  • Pull perf changes from Ingo Molnar:
    "There are lots of improvements, the biggest changes are:

    Main kernel side changes:

    - Improve uprobes performance by adding 'pre-filtering' support, by
    Oleg Nesterov.

    - Make some POWER7 events available in sysfs, equivalent to what was
    done on x86, from Sukadev Bhattiprolu.

    - tracing updates by Steve Rostedt - mostly misc fixes and smaller
    improvements.

    - Use perf/event tracing to report PCI Express advanced errors, by
    Tony Luck.

    - Enable northbridge performance counters on AMD family 15h, by Jacob
    Shin.

    - This tracing commit:

    tracing: Remove the extra 4 bytes of padding in events

    changes the ABI. All involved parties (PowerTop in particular)
    seem to agree that it's safe to do now with the introduction of
    libtraceevent, but the devil is in the details ...

    Main tooling side changes:

    - Add 'event group view', from Namhyung Kim:

    To use it, 'perf record' should group events when recording; 'perf
    report' then parses the saved group relation from the file header
    and prints the events together when the --group option is provided.
    You can use the 'perf evlist' command to see event group
    information:

    $ perf record -e '{ref-cycles,cycles}' noploop 1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

    $ perf evlist --group
    {ref-cycles,cycles}

    With this example, default perf report will show you each event
    separately.

    You can use --group option to enable event group view:

    $ perf report --group
    ...
    # group: {ref-cycles,cycles}
    # ========
    # Samples: 7K of event 'anon group { ref-cycles, cycles }'
    # Event count (approx.): 6876107743
    #
    # Overhead          Command  Shared Object      Symbol
    # ................  .......  .................  ..........................
        99.84%  99.76%  noploop  noploop            [.] main
         0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
         0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
         0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
         0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
         0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
         0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
         0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
         0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
         0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
         0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

    As you can see, the Overhead column now contains both ref-cycles
    and cycles, and the header line also shows the group information:
    'anon group { ref-cycles, cycles }'. The output is sorted by the
    period of the group leader first.

    - Initial GTK+ annotate browser, from Namhyung Kim.

    - Add option for runtime switching perf data file in perf report,
    just press 's' and a menu with the valid files found in the current
    directory will be presented, from Feng Tang.

    - Add support to display whole group data for raw columns, from Jiri
    Olsa.

    - Add per processor socket count aggregation in perf stat, from
    Stephane Eranian.

    - Add interval printing in 'perf stat', from Stephane Eranian.

    - 'perf test' improvements

    - Add support for wildcards in tracepoint system name, from Jiri
    Olsa.

    - Add anonymous huge page recognition, from Joshua Zhu.

    - perf build-id cache now can show DSOs present in a perf.data file
    that are not in the cache, to integrate with build-id servers being
    put in place by organizations such as Fedora.

    - perf top now shares more of the evsel config/creation routines with
    'record', paving the way for further integration like 'top'
    snapshots, etc.

    - perf top now supports DWARF callchains.

    - Fix mmap limitations on 32-bit, fix from David Miller.

    - 'perf bench numa mem' NUMA performance measurement suite

    - ... and lots of fixes, performance improvements, cleanups and other
    improvements I failed to list - see the shortlog and git log for
    details."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
    perf/x86/amd: Enable northbridge performance counters on AMD family 15h
    perf/hwbp: Fix cleanup in case of kzalloc failure
    perf tools: Fix build with bison 2.3 and older.
    perf tools: Limit unwind support to x86 archs
    perf annotate: Make it to be able to skip unannotatable symbols
    perf gtk/annotate: Fail early if it can't annotate
    perf gtk/annotate: Show source lines with gray color
    perf gtk/annotate: Support multiple event annotation
    perf ui/gtk: Implement basic GTK2 annotation browser
    perf annotate: Fix warning message on a missing vmlinux
    perf buildid-cache: Add --update option
    uprobes/perf: Avoid uprobe_apply() whenever possible
    uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
    uprobes/perf: Teach trace_uprobe/perf code to pre-filter
    uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
    uprobes: Introduce uprobe_apply()
    perf: Introduce hw_perf_event->tp_target and ->tp_list
    uprobes/perf: Always increment trace_uprobe->nhit
    uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
    uprobes/tracing: Introduce is_trace_uprobe_enabled()
    ...

    Linus Torvalds
     

31 Jan, 2013

1 commit

  • For systems with an unstable sched_clock, all cpu_clock() does is
    enable/disable local irqs around the call to sched_clock_cpu(); on
    systems with a stable sched_clock, the two are the same.

    trace_clock_global() already disables interrupts, so it can call
    sched_clock_cpu() directly.
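
    A simplified sketch of the resulting code in trace_clock_global():

      unsigned long flags;
      int this_cpu;
      u64 now;

      local_irq_save(flags);              /* irqs are off from here on */

      this_cpu = raw_smp_processor_id();
      now = sched_clock_cpu(this_cpu);    /* was: cpu_clock(this_cpu) */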

    Link: http://lkml.kernel.org/r/1356576585-28782-2-git-send-email-namhyung@kernel.org

    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
     

09 Jan, 2013

1 commit

  • The rcutorture tests need to be able to trace the time of the
    beginning of an RCU read-side critical section, and thus need
    access to trace_clock_local(). This commit therefore adds the
    needed EXPORT_SYMBOL_GPL().
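
    The change itself is a one-line export; a sketch of the modular use
    it enables (the call site shown here is illustrative):

      /* kernel/trace/trace_clock.c */
      EXPORT_SYMBOL_GPL(trace_clock_local);

      /* in a module such as rcutorture (illustrative) */
      u64 t0 = trace_clock_local();   /* lock-free timestamp */
      rcu_read_lock();                /* ... read-side critical section */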

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

19 Sep, 2011

1 commit

  • When debugging tight race conditions, it can be helpful to have a
    synchronized tracing method. Although in most cases the global
    clock provides this functionality, when exact timing is not the
    issue it is more reassuring to know the precise order in which
    events occurred.

    Instead of using a clock, add a "counter" that is simply an incrementing
    atomic 64bit counter that orders the events as they are perceived to
    happen.

    trace_clock_counter() comes from Peter Zijlstra's attempt to convert
    trace_clock_global() into such a counter. I took Peter's counter
    code, made it trace_clock_counter() instead, and added it to the
    choice of clocks. Just echo counter > /debug/tracing/trace_clock to
    activate it.
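
    The resulting clock is tiny; a sketch matching the shape of the
    code in kernel/trace/trace_clock.c:

      static atomic64_t trace_counter;

      /*
       * Not a time source at all: a global, strictly increasing
       * counter, so concurrent events get a total order.
       */
      u64 notrace trace_clock_counter(void)
      {
              return atomic64_add_return(1, &trace_counter);
      }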

    Requested-by: Thomas Gleixner
    Requested-by: Peter Zijlstra
    Reviewed-By: Valdis Kletnieks
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Aug, 2010

1 commit

  • Merge branch 'sched-core-for-linus' of
    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     

09 Jun, 2010

1 commit

  • For people who otherwise get to write: cpu_clock(smp_processor_id()),
    there is now: local_clock().

    Also, as per suggestion from Andrew, provide some documentation on
    the various clock interfaces, and minimize the unsigned long long vs
    u64 mess.
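
    In other words (sketch):

      u64 t;

      t = cpu_clock(smp_processor_id());   /* before */
      t = local_clock();                   /* after: same semantics */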

    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

04 Jun, 2010

1 commit

  • The ftrace_preempt_disable/enable functions were added to address a
    recursive race caused by the function tracer. The function tracer
    traces all functions, which makes it easily susceptible to
    recursion. One area was preempt_enable(): this would call the
    scheduler, and the scheduler would call the function tracer and
    loop. (Or so it was thought.)

    ftrace_preempt_disable/enable protected against recursion inside
    the scheduler by storing the NEED_RESCHED flag: if the flag was set
    before ftrace_preempt_disable(), then ftrace_preempt_enable() would
    not call schedule(), on the assumption that the task would already
    have scheduled by then unless it was already inside the scheduler.
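
    A sketch of the (since removed) helpers, simplified from
    kernel/trace/trace.h:

      static inline int ftrace_preempt_disable(void)
      {
              int resched = need_resched();   /* remember NEED_RESCHED */

              preempt_disable_notrace();
              return resched;
      }

      static inline void ftrace_preempt_enable(int resched)
      {
              if (resched)    /* set before the section: skip the check */
                      preempt_enable_no_resched_notrace();
              else
                      preempt_enable_notrace();
      }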

    This worked fine except in the case of SMP, where another task
    could set the NEED_RESCHED flag for a task on another CPU, and then
    kick off an IPI to trigger it. This could cause the NEED_RESCHED
    flag to be saved at ftrace_preempt_disable() but the IPI to arrive
    in the preempt-disabled section. The ftrace_preempt_enable() would
    then not call the scheduler, because the flag was already set
    before entering the section.

    This bug caused a missed preemption check, resulting in higher
    latencies.

    Investigating further, I found that the recursion caused by the
    function tracer was not due to schedule(), but due to
    preempt_schedule(). Now that preempt_schedule() is completely
    annotated with notrace, the recursion is no longer an issue.

    Reported-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Mar, 2010

1 commit

  • # echo 1 > events/enable
    # echo global > trace_clock

    ------------[ cut here ]------------
    WARNING: at kernel/lockdep.c:3162 check_flags+0xb2/0x190()
    ...
    ---[ end trace 3f86734a89416623 ]---
    possible reason: unannotated irqs-on.
    ...

    There's no reason to use the raw_local_irq_save() in trace_clock_global.
    The local_irq_save() version is fine, and does not cause the bug in lockdep.
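
    The fix is a direct substitution (sketch); the raw_ variants bypass
    lockdep's irq-state tracking, hence the "unannotated irqs-on"
    warning:

      local_irq_save(flags);       /* was: raw_local_irq_save(flags) */
      /* ... clock computation ... */
      local_irq_restore(flags);    /* was: raw_local_irq_restore(flags) */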

    Acked-by: Peter Zijlstra
    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

01 Mar, 2010

1 commit

  • trace_clock.c includes spinlock.h, which ends up including
    asm/system.h, which in turn includes linux/irqflags.h on x86.

    So the definition of raw_local_irq_save happens to be covered
    there, but this is not the case on parisc:

    tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
    tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'

    We need to include linux/irqflags.h directly from trace_clock.c
    to avoid such build errors.
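
    That is, at the top of the file (sketch):

      #include <linux/irqflags.h>   /* raw_local_irq_save()/restore() on
                                     * every architecture, not just x86 */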

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Robert Richter
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

12 Nov, 2009

1 commit

  • Disabling interrupts in trace_clock_local takes quite a performance
    hit to the recording of traces. Using perf top we see:

    ------------------------------------------------------------------------------
    PerfTop: 244 irqs/sec kernel:100.0% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
    ------------------------------------------------------------------------------

    samples   pcnt  kernel function
    _______  _____  _______________

    2842.00 - 40.4% : trace_clock_local
    1043.00 - 14.8% : rb_reserve_next_event
     784.00 - 11.1% : ring_buffer_lock_reserve
     600.00 -  8.5% : __rb_reserve_next
     579.00 -  8.2% : rb_end_commit
     440.00 -  6.3% : ring_buffer_unlock_commit
     290.00 -  4.1% : ring_buffer_producer_thread [ring_buffer_benchmark]
     155.00 -  2.2% : debug_smp_processor_id
     117.00 -  1.7% : trace_recursive_unlock
     103.00 -  1.5% : ring_buffer_event_data
      28.00 -  0.4% : do_gettimeofday
      22.00 -  0.3% : _spin_unlock_irq
      14.00 -  0.2% : native_read_tsc
      11.00 -  0.2% : getnstimeofday

    Where trace_clock_local is 40% of the tracing, and the time for recording
    a trace according to ring_buffer_benchmark is 210ns. After converting
    the interrupts to preemption disabling we have from perf top:

    ------------------------------------------------------------------------------
    PerfTop: 1084 irqs/sec kernel:99.9% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
    ------------------------------------------------------------------------------

    samples   pcnt  kernel function
    _______  _____  _______________

    1277.00 - 16.8% : native_read_tsc
    1148.00 - 15.1% : rb_reserve_next_event
     896.00 - 11.8% : ring_buffer_lock_reserve
     688.00 -  9.1% : __rb_reserve_next
     664.00 -  8.8% : rb_end_commit
     563.00 -  7.4% : ring_buffer_unlock_commit
     508.00 -  6.7% : _spin_unlock_irq
     365.00 -  4.8% : debug_smp_processor_id
     321.00 -  4.2% : trace_clock_local
     303.00 -  4.0% : ring_buffer_producer_thread [ring_buffer_benchmark]
     273.00 -  3.6% : native_sched_clock
     122.00 -  1.6% : trace_recursive_unlock
     113.00 -  1.5% : sched_clock
     101.00 -  1.3% : ring_buffer_event_data
      53.00 -  0.7% : tick_nohz_stop_sched_tick

    Where trace_clock_local drops from 40% to only 4% of the total
    time. The trace time also goes from 210ns down to 179ns (a 31ns
    saving).

    I talked with Peter Zijlstra about the impact that sched_clock may have
    without having interrupts disabled, and he told me that if a timer interrupt
    comes in, sched_clock may report a wrong time.

    Balancing a rarely incorrect timestamp against a 15% performance
    boost, I'll take the performance boost.
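
    A sketch of the converted function, simplified from
    kernel/trace/trace_clock.c:

      u64 notrace trace_clock_local(void)
      {
              u64 clock;

              /*
               * was: raw_local_irq_save()/restore() around sched_clock();
               * disabling preemption is much cheaper, at the cost of a
               * rare skewed timestamp if a timer interrupt lands here.
               */
              preempt_disable_notrace();
              clock = sched_clock();
              preempt_enable_notrace();

              return clock;
      }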

    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

23 Mar, 2009

1 commit

  • Impact: cleanup.

    This patch fixes the following sparse warnings:

    kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was
    not declared. Should it be static?

    kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local'
    was not declared. Should it be static?

    kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not
    declared. Should it be static?

    kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global'
    was not declared. Should it be static?
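
    One conventional fix for this class of warning (assumed here, not
    quoted from the patch) is to let each file see its own prototypes,
    or to mark file-local symbols static:

      /* kernel/trace/trace_clock.c: pull in the declarations so sparse
       * sees matching prototypes for the non-static clock functions */
      #include <linux/trace_clock.h>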

    Signed-off-by: Dmitri Vorobiev
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dmitri Vorobiev
     

11 Mar, 2009

1 commit

  • Using the function_graph tracer in recent kernels generates a spew of
    preemption BUGs. Fix this by not requiring trace_clock_local() users
    to disable preemption themselves.
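
    A sketch of the shape of the fix (the irq disabling here later
    became preemption disabling, per the 12 Nov, 2009 entry above):

      u64 notrace trace_clock_local(void)
      {
              unsigned long flags;
              u64 clock;

              raw_local_irq_save(flags);   /* callers no longer need to
                                            * disable preemption */
              clock = sched_clock();
              raw_local_irq_restore(flags);

              return clock;
      }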

    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Feb, 2009

1 commit

  • Impact: implement new tracing timestamp APIs

    Add three trace clock variants, with differing scalability/precision
    tradeoffs:

    - local: CPU-local trace clock
    - medium: scalable global clock with some jitter
    - global: globally monotonic, serialized clock

    Make the ring-buffer use the local trace clock internally.
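
    The resulting interface, as declared in include/linux/trace_clock.h
    (sketch):

      extern u64 notrace trace_clock_local(void);   /* fastest: CPU-local,
                                                     * may drift across CPUs */
      extern u64 notrace trace_clock(void);         /* "medium": scalable,
                                                     * small jitter */
      extern u64 notrace trace_clock_global(void);  /* monotonic across
                                                     * CPUs, serialized */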

    Acked-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Ingo Molnar