23 Apr, 2015

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "This adds three fixes for the tracing code.

    The first is a fix for a bug that occurs when ftrace_dump_on_oops is
    triggered in atomic context and the function graph tracer is the
    tracer being reported.

    The second fix is bad parsing of the trace_events from the kernel
    command line, where it would ignore specific events if the system name
    was used when defining the event (using the system name enables all
    events within that system).

    The last one is a fix to TRACE_DEFINE_ENUM(), where a check was
    missing to see if the ptr had been incremented to the end of the
    string; the loop would increment it again and could miss the NUL
    delimiter that stops processing"

    * tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix possible out of bounds memory access when parsing enums
    tracing: Fix incorrect enabling of trace events by boot cmdline
    tracing: Handle ftrace_dump() atomic context in graph_trace_open()

    Linus Torvalds
     

16 Apr, 2015

1 commit

  • graph_trace_open() can be called in atomic context from ftrace_dump().
    Use GFP_ATOMIC for the memory allocations when that's the case, in order
    to avoid the following splat.

    BUG: sleeping function called from invalid context at mm/slab.c:2849
    in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
    Backtrace:
    ..
    [] (__might_sleep) from [] (kmem_cache_alloc_trace+0x160/0x238)
    r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0
    [] (kmem_cache_alloc_trace) from [] (graph_trace_open+0x30/0xd0)
    r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8
    r4:810d16f0
    [] (graph_trace_open) from [] (trace_init_global_iter+0x50/0x9c)
    r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30
    [] (trace_init_global_iter) from [] (ftrace_dump+0x90/0x2ec)
    r4:810d2580 r3:00000000
    [] (ftrace_dump) from [] (sysrq_ftrace_dump+0x1c/0x20)
    r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a
    r4:808d5394
    [] (sysrq_ftrace_dump) from [] (return_to_handler+0x0/0x18)
    [] (__handle_sysrq) from [] (return_to_handler+0x0/0x18)
    r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210
    [] (handle_sysrq) from [] (return_to_handler+0x0/0x18)
    [] (pl011_int) from [] (return_to_handler+0x0/0x18)
    r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60
    r4:8454fe00
    [] (handle_irq_event_percpu) from [] (handle_irq_event+0x4c/0x6c)
    r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60
    r4:84f01e00
    [] (handle_irq_event) from [] (handle_fasteoi_irq+0xf0/0x1ac)
    r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000
    [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x3c/0x4c)
    r6:00000026 r5:00000000 r4:00000026 r3:8007a938
    [] (generic_handle_irq) from [] (__handle_domain_irq+0x8c/0xfc)
    r4:808c1e38 r3:0000002e
    [] (__handle_domain_irq) from [] (gic_handle_irq+0x34/0x6c)
    r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015
    r4:8880210c r3:808c7ef0
    [] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)
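
    A minimal sketch of the approach (the allocation sites in
    graph_trace_open() are paraphrased, not quoted):

    	/* graph_trace_open() can run from ftrace_dump() in atomic
    	 * context, where sleeping allocations are not allowed */
    	gfp_t gfpflags = (in_atomic() || irqs_disabled()) ?
    				GFP_ATOMIC : GFP_KERNEL;

    	iter = kzalloc(sizeof(*iter), gfpflags);	/* was GFP_KERNEL */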

    Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in
    Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 3.13+
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     

04 Feb, 2015

1 commit

  • debugfs was fine for the tracing facility as a quick way to get
    an interface. Now that tracing has matured, it should separate itself
    from debugfs so that it can be mounted on its own, without needing
    to mount all of debugfs with it. As it stands, users resist using
    tracing because it requires mounting debugfs. Giving tracing its own
    file system lets users get the features of tracing without needing
    to bring in the rest of the kernel's debug infrastructure.

    Another reason for tracefs is that debugfs does not support mkdir.
    Currently, to create instances, one does a mkdir in the tracing/instance
    directory. This is implemented via a hack that forces debugfs to do
    something it was not intended to do. By converting over to tracefs, this
    hack can be removed and mkdir can be properly implemented. This patch does
    not address that yet, but it lays the groundwork for it to be done.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

11 Dec, 2014

1 commit

  • Pull nmi-safe seq_buf printk update from Steven Rostedt:
    "This code is a fork from the trace-3.19 pull as it needed the
    trace_seq clean ups from that branch.

    This code solves the issue of performing stack dumps from NMI context.
    The issue is that printk() is not safe from NMI context: if the NMI
    triggers while a printk() is being performed, the NMI could deadlock
    on printk()'s internal locks. This has been seen in practice.

    With lots of review from Petr Mladek, this code went through several
    iterations, and we feel that it is now at a point of quality to be
    accepted into mainline.

    Here's what is contained in this patch set:

    - Creates a "seq_buf" generic buffer utility that allows a descriptor
    to be passed around where functions can write their own "printk()"
    formatted strings into it. The generic version was pulled out of
    the trace_seq() code that was made specifically for tracing.

    - The seq_buf code was changed to model the seq_file code. I have a
    patch (not included for 3.19) that converts the seq_file.c code
    over to use seq_buf.c like the trace_seq.c code does. This was
    done to make sure that seq_buf.c is compatible with seq_file.c. I
    may try to get that patch in for 3.20.

    - The seq_buf.c file was moved to lib/ to remove it from being
    dependent on CONFIG_TRACING.

    - The printk() was updated to allow for a per_cpu "override" of the
    internal calls. That is, instead of writing to the console, a call
    to printk() may do something else. This made it easier to allow
    the NMI to change what printk() does in order to call dump_stack()
    without needing to update that code as well.

    - Finally, the dump_stack from all CPUs via NMI code was converted to
    use the seq_buf code. The caller to trigger the NMI code would
    wait till all the NMIs finished, and then it would print the
    seq_buf data to the console safely from a non-NMI context

    One added bonus is that this code also makes the NMI dump stack work
    on PREEMPT_RT kernels. As printk() includes sleeping locks on
    PREEMPT_RT, printk() only writes to the console if the console does
    not use any rt_mutex-converted spin locks, which a lot do"
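
    A minimal usage sketch of the seq_buf API described above (a fragment
    under the interface as merged, with names taken from the shortlog
    below):

    	char buf[128];
    	struct seq_buf s;

    	seq_buf_init(&s, buf, sizeof(buf));	/* wrap a fixed-size buffer */
    	seq_buf_printf(&s, "cpu%d: %s\n", 0, "ok");
    	pr_info("%.*s", (int)seq_buf_used(&s), buf);	/* only what was written */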

    * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    x86/nmi: Fix use of unallocated cpumask_var_t
    printk/percpu: Define printk_func when printk is not defined
    x86/nmi: Perform a safe NMI stack trace on all CPUs
    printk: Add per_cpu printk func to allow printk to be diverted
    seq_buf: Move the seq_buf code to lib/
    seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
    tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
    tracing: Have seq_buf use full buffer
    seq_buf: Add seq_buf_can_fit() helper function
    tracing: Add paranoid size check in trace_printk_seq()
    tracing: Use trace_seq_used() and seq_buf_used() instead of len
    tracing: Clean up tracing_fill_pipe_page()
    seq_buf: Create seq_buf_used() to find out how much was written
    tracing: Add a seq_buf_clear() helper and clear len and readpos in init
    tracing: Convert seq_buf fields to be like seq_file fields
    tracing: Convert seq_buf_path() to be like seq_path()
    tracing: Create seq_buf layer in trace_seq

    Linus Torvalds
     

04 Dec, 2014

1 commit

  • Currently, the function graph tracer prints "!" or "+" just before
    the function execution time to signal function overhead, depending
    on the time. And some tracers tracing latency also print "!" or
    "+" just after the time to signal overhead, depending on the interval
    between events. Even though this is usually enough, we sometimes
    need to signal execution times much bigger than 100 microseconds.

    For example, I used the function graph tracer to detect if there is
    any case where exit_mm() takes too much time. I did the following steps
    in /sys/kernel/debug/tracing. It was easier to detect very large
    execution times with the patched kernel than with the original kernel.

    $ echo exit_mm > set_graph_function
    $ echo function_graph > current_tracer
    $ echo > trace
    $ cat trace_pipe > $LOGFILE
    ... (do something and terminate logging)
    $ grep "\\$" $LOGFILE
    3) $ 22082032 us | } /* kernel_map_pages */
    3) $ 22082040 us | } /* free_pages_prepare */
    3) $ 22082113 us | } /* free_hot_cold_page */
    3) $ 22083455 us | } /* free_hot_cold_page_list */
    3) $ 22083895 us | } /* release_pages */
    3) $ 22177873 us | } /* free_pages_and_swap_cache */
    3) $ 22178929 us | } /* unmap_single_vma */
    3) $ 22198885 us | } /* unmap_vmas */
    3) $ 22206949 us | } /* exit_mmap */
    3) $ 22207659 us | } /* mmput */
    3) $ 22207793 us | } /* exit_mm */

    And then, it was easy to find out that a schedule-out occurred within
    kernel_map_pages(), via sub_preempt_count().

    This patch can be useful for detecting very large function execution
    times caused by either problematic function implementations or
    scheduling issues.
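
    The marks and their thresholds follow roughly this mapping (a sketch;
    the helper name is illustrative and durations are in nanoseconds):

    	static char duration_mark(unsigned long long duration)
    	{
    		if (duration > 1000000000ULL)	return '$';	/* > 1 sec */
    		if (duration > 100000000ULL)	return '@';	/* > 100 msec */
    		if (duration > 10000000ULL)	return '*';	/* > 10 msec */
    		if (duration > 1000000ULL)	return '#';	/* > 1000 usecs */
    		if (duration > 100000ULL)	return '!';	/* > 100 usecs */
    		if (duration > 10000ULL)	return '+';	/* > 10 usecs */
    		return ' ';
    	}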

    Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com

    Signed-off-by: Byungchul Park
    Signed-off-by: Steven Rostedt

    Byungchul Park
     

20 Nov, 2014

3 commits

  • As the seq_buf->len will soon be +1 size when there's an overflow, we
    must use trace_seq_used() or seq_buf_used() methods to get the real
    length. This will prevent buffer overflow issues if just the len
    of the seq_buf descriptor is used to copy memory.
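
    A sketch of the safe pattern (the copy destination is illustrative):

    	int cnt = trace_seq_used(&iter->seq);	/* clamped to the buffer size */

    	memcpy(page, iter->seq.buffer, cnt);	/* safe even after an overflow */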

    Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home

    Reported-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Create a seq_buf layer that trace_seq sits on. The seq_buf will not
    be limited to page size. This allows other usages of seq_buf
    instead of the hard-set PAGE_SIZE one that trace_seq has.

    Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org
    Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org

    Tested-by: Jiri Kosina
    Acked-by: Jiri Kosina
    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Doing individual checks all over the place makes the code very
    messy. Instead, just check trace_seq_has_overflowed() at the end or
    in strategic places.

    This makes the code much cleaner and also helps with getting closer
    to removing the return values of trace_seq_printf() and friends.
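
    The pattern looks roughly like this (a sketch, not a specific
    converted site):

    	static enum print_line_t example_output(struct trace_iterator *iter)
    	{
    		struct trace_seq *s = &iter->seq;

    		trace_seq_printf(s, "%d", 42);	/* no per-call return checks */
    		trace_seq_putc(s, '\n');

    		return trace_seq_has_overflowed(s) ?
    			TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
    	}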

    Link: http://lkml.kernel.org/r/20141114011410.987913836@goodmis.org

    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

14 Nov, 2014

4 commits

  • Usually, "msecs" notation means milli-seconds, and "usecs" notation
    means micro-seconds. Since the unit used in the code is micro-seconds,
    the notation should be replaced from msecs to usecs.

    Link: http://lkml.kernel.org/r/1415171926-9782-2-git-send-email-byungchul.park@lge.com

    Signed-off-by: Byungchul Park
    Signed-off-by: Steven Rostedt

    Byungchul Park
     
  • On the function_graph tracer, the print_graph_irq() function prints a
    trace line with the flag ==========> on an irq handler entry, and the
    flag <========== on an irq handler return. But when the latency-format
    option is enabled, those lines are missing the latency-format flag
    fields, producing misaligned output like:

    0) ==========> |
    0) d... | smp_apic_timer_interrupt() {

    This patch fixes this issue by printing the latency-format flags when
    the option is enabled.

    Link: http://lkml.kernel.org/r/7c2e226dac20c940b6242178fab7f0e3c9b5ce58.1415233316.git.bristot@redhat.com

    Reviewed-by: Luis Claudio R. Goncalves
    Signed-off-by: Daniel Bristot de Oliveira
    Signed-off-by: Steven Rostedt

    Daniel Bristot de Oliveira
     
  • Printing a single character to a seqfile might as well be done with
    seq_putc instead of seq_puts; this avoids a strlen() call and a memory
    access. It also shaves another few bytes off the generated code.

    Link: http://lkml.kernel.org/r/1415479332-25944-4-git-send-email-linux@rasmusvillemoes.dk

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Steven Rostedt

    Rasmus Villemoes
     
  • Using seq_printf to print a simple string or a single character is a
    lot more expensive than it needs to be, since seq_puts and seq_putc
    exist.

    These patches do

    seq_printf(m, s) -> seq_puts(m, s)
    seq_printf(m, "%s", s) -> seq_puts(m, s)
    seq_printf(m, "%c", c) -> seq_putc(m, c)

    Subsequent patches will simplify further.

    Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Steven Rostedt

    Rasmus Villemoes
     

17 Jul, 2014

1 commit

  • ftrace_stop() is going away as it disables parts of function tracing
    that affect users who should not be affected. But ftrace_graph_stop()
    is built on ftrace_stop(); here is another example of killing all of
    function tracing because something went wrong with function graph
    tracing.

    Instead of disabling all users of function tracing on function graph
    error, disable only function graph tracing.

    A new function is created called ftrace_graph_is_dead(). This is called
    in strategic paths to prevent function graph from doing more harm and
    allowing at least a warning to be printed before the system crashes.

    NOTE: ftrace_stop() is still used until all the archs are converted over
    to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.
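
    The strategic paths then bail out early, along these lines:

    	/* do no more harm once graph tracing has hit a fatal error */
    	if (unlikely(ftrace_graph_is_dead()))
    		return;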

    Reviewed-by: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

21 May, 2014

2 commits

  • In the function-graph tracer, add a funcgraph_tail option
    to print the function name on all } lines, not just
    functions whose first line is no longer in the trace
    buffer.

    If a function calls other traced functions, its total
    time appears on its } line. This change allows grep
    to be used to determine the function to which a
    line corresponds.

    Update Documentation/trace/ftrace.txt to describe
    this new option.

    Link: http://lkml.kernel.org/p/20140520221041.8359.6782.stgit@beardog.cce.hp.com

    Signed-off-by: Robert Elliott
    Signed-off-by: Steven Rostedt

    Robert Elliott
     
  • Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
    in trace_functions_graph.c that are already in
    trace.h.

    Add TRACE_GRAPH_PRINT_IRQS to trace.h, which is
    the only one that is missing.

    Link: http://lkml.kernel.org/p/20140520221031.8359.24733.stgit@beardog.cce.hp.com

    Signed-off-by: Robert Elliott
    Signed-off-by: Steven Rostedt

    Robert Elliott
     

30 Apr, 2014

1 commit

  • The ring buffer now has a built-in way to wake up readers when
    there's data, using irq_work so that it is safe to do in any context.
    But the tracers were still using the old "poor man's" wait polling
    that checks every 1/10 of a second to see if it should wake up a
    waiter. That makes the latency for a wake up excruciatingly long.
    There is no need to do that anymore.

    Completely remove the different wait_poll types from the tracers
    and have them all use the default one now.

    Reported-by: Johannes Berg
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

07 Nov, 2013

1 commit

  • The duration field of print_graph_duration() can also be used
    to do the space filling by passing an enum in it:

    DURATION_FILL_FULL
    DURATION_FILL_START
    DURATION_FILL_END

    The problem is that these are enums and defined as negative,
    but the duration field is unsigned long long. Most archs are
    fine with this but blackfin fails to compile because of it:

    kernel/built-in.o: In function `print_graph_duration':
    kernel/trace/trace_functions_graph.c:782: undefined reference to `__ucmpdi2'

    Overloading an unsigned long long with a signed enum is just
    bad in principle. We can accomplish the same thing by using
    part of the flags field instead.
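
    The flags-based replacement looks roughly like this (names assumed
    from the fix):

    	enum {
    		FLAGS_FILL_FULL  = 1 << TRACE_GRAPH_PRINT_FILL_SHIFT,
    		FLAGS_FILL_START = 2 << TRACE_GRAPH_PRINT_FILL_SHIFT,
    		FLAGS_FILL_END   = 3 << TRACE_GRAPH_PRINT_FILL_SHIFT,
    	};

    	/* callers pass the fill mode in flags; duration stays unsigned */
    	print_graph_duration(0, s, flags | FLAGS_FILL_FULL);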

    Cc: Mike Frysinger
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

06 Nov, 2013

1 commit

  • The trace event filters are still tied to event calls rather than
    event files, which means you don't get what you'd expect when using
    filters in the multibuffer case:

    Before:

    # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # mkdir /sys/kernel/debug/tracing/instances/test1
    # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 2048
    # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    bytes_alloc > 2048

    Setting the filter in tracing/instances/test1/events shouldn't affect
    the same event in tracing/events as it does above.

    After:

    # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # mkdir /sys/kernel/debug/tracing/instances/test1
    # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
    bytes_alloc > 8192
    # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
    bytes_alloc > 2048

    We'd like to just move the filter directly from ftrace_event_call to
    ftrace_event_file, but there are a couple cases that don't yet have
    multibuffer support and therefore have to continue using the current
    event_call-based filters. For those cases, a new USE_CALL_FILTER bit
    is added to the event_call flags, whose main purpose is to keep the
    old behavior for those cases until they can be updated with
    multibuffer support; at that point, the USE_CALL_FILTER flag (and the
    new associated call_filter_check_discard() function) can go away.

    The multibuffer support also made filter_current_check_discard()
    redundant, so this change removes that function as well and replaces
    it with filter_check_discard() (or call_filter_check_discard() as
    appropriate).
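
    At an event's commit site, the per-file check then looks roughly like
    this (a sketch; the exact signature for this kernel is assumed):

    	/* discard the event if this instance's file-level filter says so */
    	if (filter_check_discard(ftrace_file, entry, buffer, event))
    		return;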

    Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     

19 Oct, 2013

1 commit

  • The set_graph_notrace filter is analogous to set_ftrace_notrace and
    can be used to eliminate uninteresting parts of the function graph
    trace output. It also works nicely with set_graph_function.

    # cd /sys/kernel/debug/tracing/
    # echo do_page_fault > set_graph_function
    # perf ftrace live true
    2) | do_page_fault() {
    2) | __do_page_fault() {
    2) 0.381 us | down_read_trylock();
    2) 0.055 us | __might_sleep();
    2) 0.696 us | find_vma();
    2) | handle_mm_fault() {
    2) | handle_pte_fault() {
    2) | __do_fault() {
    2) | filemap_fault() {
    2) | find_get_page() {
    2) 0.033 us | __rcu_read_lock();
    2) 0.035 us | __rcu_read_unlock();
    2) 1.696 us | }
    2) 0.031 us | __might_sleep();
    2) 2.831 us | }
    2) | _raw_spin_lock() {
    2) 0.046 us | add_preempt_count();
    2) 0.841 us | }
    2) 0.033 us | page_add_file_rmap();
    2) | _raw_spin_unlock() {
    2) 0.057 us | sub_preempt_count();
    2) 0.568 us | }
    2) | unlock_page() {
    2) 0.084 us | page_waitqueue();
    2) 0.126 us | __wake_up_bit();
    2) 1.117 us | }
    2) 7.729 us | }
    2) 8.397 us | }
    2) 8.956 us | }
    2) 0.085 us | up_read();
    2) + 12.745 us | }
    2) + 13.401 us | }
    ...

    # echo handle_mm_fault > set_graph_notrace
    # perf ftrace live true
    1) | do_page_fault() {
    1) | __do_page_fault() {
    1) 0.205 us | down_read_trylock();
    1) 0.041 us | __might_sleep();
    1) 0.344 us | find_vma();
    1) 0.069 us | up_read();
    1) 4.692 us | }
    1) 5.311 us | }
    ...

    Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org

    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
     

19 Jul, 2013

2 commits

  • The selftests for the function and function graph tracers are defined
    as __init, as they are only executed at boot up. The "tracer" structs
    that are associated with those tracers are not set up as __init, as
    they are used after boot. To stop mismatch warnings, those structures
    need to be annotated with __ref_data.

    Currently, the tracer structures are defined __read_mostly, as they
    do not really change. But in the future they should be converted to
    const. That will take a little work, because they have a "next"
    pointer that gets updated when they are registered, so it will have
    to wait till the next major release.

    Link: http://lkml.kernel.org/r/1373596735.17876.84.camel@gandalf.local.home

    Reported-by: kbuild test robot
    Reported-by: Chen Gang
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • For string without format specifiers, use trace_seq_puts()
    or trace_seq_putc().

    Link: http://lkml.kernel.org/r/51E3B3AC.1000605@huawei.com

    Signed-off-by: zhangwei(Jovi)
    [ fixed a trace_seq_putc(s, " ") to trace_seq_putc(s, ' ') ]
    Signed-off-by: Steven Rostedt

    zhangwei(Jovi)
     

15 Mar, 2013

2 commits

  • Currently, the way the latency tracers and snapshot feature works
    is to have a separate trace_array called "max_tr" that holds the
    snapshot buffer. For latency tracers, this snapshot buffer is used
    to swap the running buffer with this buffer to save the current max
    latency.

    The only items needed for the max_tr are really just a copy of the buffer
    itself, the per_cpu data pointers, the time_start timestamp that states
    when the max latency was triggered, and the cpu that the max latency
    was triggered on. All other fields in trace_array are unused by the
    max_tr, making the max_tr mostly bloat.

    This change removes the max_tr completely, and adds a new structure
    called trace_buffer, that holds the buffer pointer, the per_cpu data
    pointers, the time_start timestamp, and the cpu where the latency occurred.

    The trace_array now has two trace_buffers, one for the normal trace and
    one for the max trace or snapshot. By doing this, not only do we remove
    the bloat from the max_tr, but trace instances can now use their own
    snapshot feature, rather than only the top-level global_trace having
    the snapshot feature and latency tracers.
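
    A sketch of the resulting layout (field names assumed from the
    description above):

    	struct trace_buffer {
    		struct ring_buffer		*buffer;	/* the ring buffer */
    		struct trace_array_cpu __percpu	*data;		/* per-cpu data */
    		u64				time_start;	/* when max latency hit */
    		int				cpu;		/* cpu that triggered it */
    	};

    	struct trace_array {
    		struct trace_buffer	trace_buffer;	/* normal trace */
    		struct trace_buffer	max_buffer;	/* max/snapshot trace */
    		/* remaining trace_array fields unchanged */
    	};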

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The global and max-tr currently use static per_cpu arrays for the CPU data
    descriptors. But in order to support newly allocated trace_arrays, they
    need to be allocated per_cpu arrays. Instead of using the static arrays,
    switch the global and max-tr to use allocated data.
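
    A sketch of the switch for the global trace (assuming the per-cpu
    allocation helper of this era):

    	/* was: static arrays of struct trace_array_cpu sized by NR_CPUS */
    	global_trace.data = alloc_percpu(struct trace_array_cpu);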

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Jan, 2013

1 commit

  • While debugging the virtual cputime with the function graph tracer
    with a max_depth of 1 (most common use of the max_depth so far),
    I found that I was missing kernel execution because of a race condition.

    The code for the return side of the function has a slight race:

    ftrace_pop_return_trace(&trace, &ret, frame_pointer);
    trace.rettime = trace_clock_local();
    ftrace_graph_return(&trace);
    barrier();
    current->curr_ret_stack--;

    The ftrace_pop_return_trace() initializes the trace structure for
    the callback. The ftrace_graph_return() uses the trace structure
    for its own use as that structure is on the stack and is local
    to this function. Then the curr_ret_stack is decremented which
    is what the trace.depth is set to.

    If an interrupt comes in after the ftrace_graph_return() but
    before the curr_ret_stack is decremented, then the called function
    will get a depth of 2. If max_depth is set to 1, this function will
    be ignored.

    The problem is that the trace has already been called, and the
    timestamp for that trace will not reflect the time the function
    was about to re-enter userspace. Calls to the interrupt will not
    be traced because the max_depth has prevented this.

    To solve this issue, the ftrace_graph_return() can safely be
    moved after the current->curr_ret_stack has been updated.
    This way the timestamp for the return callback will reflect
    the actual time.

    If an interrupt comes in between the curr_ret_stack update and
    ftrace_graph_return(), it will be traced. It may look a little
    confusing to see it within the other function, but at least
    it will not be lost.
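
    With the fix, the return path is reordered roughly as follows:

    	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
    	trace.rettime = trace_clock_local();
    	barrier();
    	current->curr_ret_stack--;
    	/* run the callback only after the depth has been decremented */
    	ftrace_graph_return(&trace);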

    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

22 Jan, 2013

1 commit

  • Add the file max_graph_depth to the debug tracing directory that lets
    the user define the depth of the function graph.

    A very useful operation is to set the depth to 1. Then it traces only
    the first function that is called when entering the kernel. This can
    be used to determine what system operations interrupt a process.

    For example, to work on NOHZ processes (single tasks running without
    a timer tick), if any interrupt goes off and preempts that task, this
    code will show it happening.

    # cd /sys/kernel/debug/tracing
    # echo 1 > max_graph_depth
    # echo function_graph > current_tracer
    # cat per_cpu/cpu<N>/trace

    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

01 Nov, 2012

2 commits

  • Whenever an event is registered, the comms of tasks are saved at
    every task switch instead of being saved at every event. But if
    an event isn't executed much, the comm cache will be filled up
    by tasks that did not record the event, and you lose out on the comms
    that did.

    Here's an example, if you enable the following events:

    echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
    echo 1 > /debug/tracing/events/net/net_dev_xmit/enable

    Note, there's no kvm running on this machine so the first event will
    never be triggered, but because it is enabled, the storing of comms
    will continue. If we now disable the network event:

    echo 0 > /debug/tracing/events/net/net_dev_xmit/enable

    and look at the trace:

    cat /debug/tracing/trace
    sshd-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    sshd-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
    sshd-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
    sshd-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
    sshd-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

    We see that process 2672 which triggered the events has the comm "sshd".
    But if we run hackbench for a bit and look again:

    cat /debug/tracing/trace
    <...>-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
    <...>-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
    <...>-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
    <...>-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
    <...>-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

    The stored "sshd" comm has been flushed out and we get a useless "".

    But by only storing comms after a trace event occurred, we can run
    hackbench all day and still get the same output.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • There are times during debugging when it is helpful to see traces of
    early boot functions. But the tracers are initialized at device_initcall(),
    which is quite late in the boot process. Setting the kernel command
    line parameter ftrace=function will not show anything until the function
    tracer is initialized, which prevents tracing functions before
    device_initcall().

    There's no reason that the tracers need to be initialized so late in the
    boot process. Move them up to core_initcall() as they still need to come
    after early_initcall() which initializes the tracing buffers.
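
    For the function_graph tracer, for instance, the change is roughly:

    	/* register the tracer earlier in boot (was device_initcall) */
    	core_initcall(init_graph_trace);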

    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

23 Aug, 2012

1 commit

  • The function graph has a test to check if the frame pointer is
    corrupted, which can happen with various options of gcc with mcount.
    But this is not an issue with -mfentry as -mfentry does not need nor use
    frame pointers for function graph tracing.
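
    The test can then simply be compiled out when fentry is in use,
    roughly:

    	/* skip the frame pointer test when the compiler uses -mfentry */
    	#if defined(CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST) && !defined(CC_USING_FENTRY)
    		/* frame pointer sanity check lives here */
    	#endif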

    Link: http://lkml.kernel.org/r/20120807194059.773895870@goodmis.org

    Acked-by: H. Peter Anvin
    Acked-by: Ingo Molnar
    Cc: Andi Kleen
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

29 Jun, 2012

1 commit

  • Replace the NR_CPUS array of buffer_iter in the trace_iterator
    with an allocated array. This creates an array for just the
    possible CPUs instead of the configured maximum (NR_CPUS).

    The use of NR_CPUS in that array caused allocation failures on
    machines that were tight on memory. This did not cause any failures
    of the system itself (no crashes), but caused unnecessary failures
    when reading the trace files.

    Added a helper function called 'trace_buffer_iter()' that returns
    the buffer_iter item or NULL if it is not defined or the array was
    not allocated. Some routines do not require the array
    (tracing_open_pipe() for one).
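
    A sketch of the helper, matching the description above:

    	static inline struct ring_buffer_iter *
    	trace_buffer_iter(struct trace_iterator *iter, int cpu)
    	{
    		if (iter->buffer_iter)
    			return iter->buffer_iter[cpu];
    		return NULL;
    	}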

    Reported-by: Dave Jones
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jul, 2011

1 commit

  • If a function is set to be traced by set_graph_function, but the
    funcgraph-irqs option is zero, and the traced function happens to be
    called from an interrupt, it will not be traced.

    The point of funcgraph-irqs is to not trace interrupts when we are
    preempted by an irq, not to skip functions we want to trace that
    happen to be *in* an irq.

    Luckily, the current->trace_recursion element is a perfect place for
    a flag that lets us trace functions within an interrupt even when we
    are not tracing the interrupts that preempt the trace.
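
    A sketch of the idea, assuming a TRACE_IRQ_BIT recursion flag as
    described:

    	static inline int ftrace_graph_ignore_irqs(void)
    	{
    		/* keep tracing irq context if we entered it from a
    		 * function that is itself being traced */
    		if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT))
    			return 0;

    		return in_irq();
    	}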

    Reported-by: Heiko Carstens
    Tested-by: Heiko Carstens
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

15 Jun, 2011

4 commits

  • The function_graph tracer does not follow the global context-info
    option. Add a TRACE_ITER_CONTEXT_INFO trace_flags check to honor it.

    With the following commands:
    # echo function_graph > ./current_tracer
    # echo 0 > options/context-info
    # cat trace

    This is what it looked like before:
    # tracer: function_graph
    #
    # TIME CPU DURATION FUNCTION CALLS
    # | | | | | | | |
    1) 0.079 us | } /* __vma_link_rb */
    1) 0.056 us | copy_page_range();
    1) | security_vm_enough_memory() {
    ...

    This is what it looks like now:
    # tracer: function_graph
    #
    } /* update_ts_time_stats */
    timekeeping_max_deferment();
    ...
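
    The check itself is roughly (a sketch of honoring the flag in the
    graph output paths):

    	/* honor the global context-info option */
    	if (!(trace_flags & TRACE_ITER_CONTEXT_INFO))
    		return;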

    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/1307113131-10045-6-git-send-email-jolsa@redhat.com
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • The lock_depth field was removed in commit
    e6e1e25 ("tracing: Remove lock_depth from event entry").

    Remove the lock_depth info from the function_graph latency header
    as well.

    With the following commands:
    # echo function_graph > ./current_tracer
    # echo 1 > options/latency-format
    # cat trace

    This is what it looked like before:
    # tracer: function_graph
    #
    # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
    # --------------------------------------------------------------------
    # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
    # -----------------
    # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / _-=> lock-depth
    # |||| /
    # CPU||||| DURATION FUNCTION CALLS
    # | ||||| | | | | | |
    0) .... 0.068 us | } /* __rcu_read_unlock */
    ...

    This is what it looks like now:
    # tracer: function_graph
    #
    # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
    # --------------------------------------------------------------------
    # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
    # -----------------
    # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # CPU|||| DURATION FUNCTION CALLS
    # | |||| | | | | | |
    0) ..s. 1.641 us | } /* __rcu_process_callbacks */
    ...

    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/1307113131-10045-5-git-send-email-jolsa@redhat.com
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • The functions print_graph_overhead() and print_graph_duration() display
    data for one field - DURATION.

    I merged them into a single function, print_graph_duration(),
    and added a way to display the empty parts of the field.

    This way the print_graph_irq() function can use this column to display
    the IRQ signs if needed, and the DURATION field details stay inside
    the print_graph_duration() function.

    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/1307113131-10045-3-git-send-email-jolsa@redhat.com
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • The display of the absolute time and duration fields is based on the
    latency field. This was added during the irqsoff/wakeup tracers
    graph support changes.

    It's causing confusion about which fields will be displayed for the
    function_graph tracer itself. So I'm removing this dependency, and
    adding the absolute time and duration fields to the preemptirqsoff,
    preemptoff, irqsoff and wakeup tracers.

    With the following commands:
    # echo function_graph > ./current_tracer
    # cat trace

    This is what it looked like before:
    # tracer: function_graph
    #
    # TIME CPU DURATION FUNCTION CALLS
    # | | | | | | | |
    0) 0.068 us | } /* page_add_file_rmap */
    0) | _raw_spin_unlock() {
    ...

    This is what it looks like now:
    # tracer: function_graph
    #
    # CPU DURATION FUNCTION CALLS
    # | | | | | | |
    0) 0.068 us | } /* add_preempt_count */
    0) 0.993 us | } /* vfsmount_lock_local_lock */
    ...

    For the preemptirqsoff, preemptoff, irqsoff and wakeup tracers,
    this is what it looked like before:
    SNIP
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / _-=> lock-depth
    # |||| /
    # CPU TASK/PID ||||| DURATION FUNCTION CALLS
    # | | | ||||| | | | | | |
    1) -0 | d..1 0.000 us | acpi_idle_enter_simple();
    ...

    This is what it looks like now:
    SNIP
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
    # | | | | |||| | | | | | |
    19.847735 | 1) -0 | d..1 0.000 us | acpi_idle_enter_simple();
    ...

    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Steven Rostedt

    Jiri Olsa