15 Feb, 2017

1 commit

  • We get a lot of harmless warnings about this header file at W=1 level
    because of an unusual function declaration:

    kernel/trace/trace.h:766:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]

    This moves the 'inline' keyword to where it normally belongs, avoiding
    the warning (a minimal illustration follows this entry).

    Link: http://lkml.kernel.org/r/20170123122521.3389010-1-arnd@arndb.de

    Fixes: 4046bf023b06 ("ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Steven Rostedt (VMware)

    Arnd Bergmann
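
    A minimal illustration of what gcc's -Wold-style-declaration flags and how
    moving 'inline' fixes it. This is a generic example, not the actual
    trace.h declaration:

    /* example.c -- build with: gcc -Wold-style-declaration -c example.c */

    /* warns: 'inline' is not at beginning of declaration */
    static int inline bad_helper(void)  { return 0; }

    /* fixed: the 'inline' specifier comes before the return type */
    static inline int good_helper(void) { return 0; }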
     

11 Feb, 2017

1 commit


21 Jan, 2017

2 commits

  • Use ftrace_hash instead of a static array of a fixed size. This is
    useful when a graph filter pattern matches a large number of
    functions. Hash lookup is now done with preemption disabled to protect
    against the hash being changed or freed.

    Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org

    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt (VMware)

    Namhyung Kim
     
  • Expose ftrace_hash_empty() and ftrace_lookup_ip(); they will be used
    when checking graph filter hashes later (a usage sketch follows this
    entry).

    Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org

    Signed-off-by: Namhyung Kim
    [ Moved ftrace_hash declarations and functions outside of FUNCTION_GRAPH define ]
    Signed-off-by: Steven Rostedt (VMware)

    Namhyung Kim
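
    A minimal sketch of how the helpers exposed by these two commits might be
    used for a graph filter check. The function and variable names are
    illustrative only; this is not the exact kernel code:

    /* Assumes the kernel/trace/trace.h declarations are in scope. */
    static int ip_matches_graph_filter(struct ftrace_hash *graph_hash,
                                       unsigned long ip)
    {
            int ret;

            /* Disable preemption so the hash cannot be changed/freed
             * while we look it up. */
            preempt_disable_notrace();
            /* As is typical for ftrace filters, an empty hash is treated
             * as "match everything". */
            ret = ftrace_hash_empty(graph_hash) ||
                  ftrace_lookup_ip(graph_hash, ip) != NULL;
            preempt_enable_notrace();

            return ret;
    }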
     

25 Dec, 2016

1 commit


09 Dec, 2016

1 commit

  • Currently neither the wakeup nor the irqsoff tracer handles set_graph_notrace
    well. The ftrace infrastructure will ignore the return paths of all
    functions, leaving them hanging without an end:

    # echo '*spin*' > set_graph_notrace
    # cat trace
    [...]
    _raw_spin_lock() {
      preempt_count_add() {
        do_raw_spin_lock() {
          update_rq_clock();

    Where the '*spin*' functions should have looked like this:

    _raw_spin_lock() {
      preempt_count_add();
      do_raw_spin_lock();
    }
    update_rq_clock();

    Instead, have the wakeup and irqsoff tracers ignore the functions listed
    in set_graph_notrace, as the function_graph tracer does. Move the logic
    from the function_graph tracer into a header to allow the wakeup and
    irqsoff tracers to use it as well.

    Cc: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

24 Nov, 2016

1 commit

  • The function __buffer_unlock_commit() is called in a few places outside of
    trace.c. But for the most part, it should really be inlined, as it is in the
    hot path of the trace_events. For the callers outside of trace.c, create a
    new function trace_buffer_unlock_commit_nostack(), as the reason it was used
    was to avoid the stack tracing that trace_buffer_unlock_commit() could do.

    Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

    Reported-by: Andi Kleen
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

16 Nov, 2016

1 commit

  • A new file is created:

    /sys/kernel/debug/tracing/trace_marker_raw

    This allows applications to create data structures and write the binary
    data directly into it, and then read the trace data out of trace_pipe_raw
    into the same type of data structure. This avoids the conversion of numbers
    into ASCII that trace_marker would require (a minimal user-space sketch
    follows this entry).

    Suggested-by: Olof Johansson
    Signed-off-by: Steven Rostedt

    Steven Rostedt
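
    A minimal user-space sketch of writing a binary record into the new file.
    The record layout is entirely application-defined, and the tracefs mount
    point may differ on other systems:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Application-defined record; the kernel just stores the raw bytes. */
    struct my_mark {
            uint32_t id;
            uint64_t value;
    } __attribute__((packed));

    int main(void)
    {
            struct my_mark m = { .id = 1, .value = 42 };
            int fd = open("/sys/kernel/debug/tracing/trace_marker_raw", O_WRONLY);

            if (fd < 0)
                    return 1;
            /* One write() per record; read it back through trace_pipe_raw. */
            if (write(fd, &m, sizeof(m)) != sizeof(m)) {
                    close(fd);
                    return 1;
            }
            close(fd);
            return 0;
    }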
     

15 Nov, 2016

1 commit

  • Use glob_match() to support flexible glob wildcards (*, ?)
    and character classes ([) for ftrace.
    Since full glob matching is slower than the current
    partial matching routines (*pat, pat*, *pat*), this keeps
    those routines and just adds MATCH_GLOB for complex glob
    expressions.

    e.g.
    ----
    [root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
    [root@localhost tracing]# cat set_ftrace_filter
    sched_free_group
    sched_change_group
    sched_create_group
    sched_online_group
    sched_destroy_group
    sched_offline_group
    [root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
    [root@localhost tracing]# head set_ftrace_filter
    sys_arch_prctl
    sys_rt_sigreturn
    sys_ioperm
    SyS_iopl
    sys_modify_ldt
    SyS_mmap
    SyS_set_thread_area
    SyS_get_thread_area
    SyS_set_tid_address
    sys_fork
    ----

    Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox

    Acked-by: Namhyung Kim
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     

12 Sep, 2016

1 commit


03 Sep, 2016

1 commit

  • The hardware latency tracer has been in the PREEMPT_RT patch for some time.
    It is used to detect possible SMIs or any other hardware interruptions that
    the kernel is unaware of. Note that NMIs may also be detected; that is
    worth knowing as well.

    The logic is pretty simple. It simply creates a thread that spins on a
    single CPU for a specified amount of time (width) within a periodic window
    (window). These numbers may be adjusted via their corresponding files in

    /sys/kernel/tracing/hwlat_detector/

    The defaults are window = 1000000 us (1 second) and
    width = 500000 us (1/2 second).

    The loop consists of:

    t1 = trace_clock_local();
    t2 = trace_clock_local();

    Where trace_clock_local() is a variant of sched_clock().

    The difference t2 - t1 is recorded as the "inner" timestamp, and
    t1 - prev_t2 is recorded as the "outer" timestamp. If either of
    these differences is greater than the time denoted in
    /sys/kernel/tracing/tracing_thresh, the event is recorded (see the
    sketch after this entry).

    When this tracer is started, and tracing_thresh is zero, it changes to the
    default threshold of 10 us.

    The hwlat tracer in the PREEMPT_RT patch was originally written by
    Jon Masters. I have modified it quite a bit and turned it into a
    tracer.

    Based-on-code-by: Jon Masters
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
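
    The sampling loop boils down to something like the following sketch. The
    function, the record_sample() helper, and the parameter names are
    illustrative; this is not the exact tracer code:

    /* thresh and width_ns: configured threshold and width, in ns. */
    static void hwlat_busy_loop(u64 thresh, u64 width_ns)
    {
            u64 t_start, t1, t2, prev_t2 = 0;
            u64 inner, outer;

            t_start = trace_clock_local();
            do {
                    t1 = trace_clock_local();
                    t2 = trace_clock_local();

                    inner = t2 - t1;                      /* gap within one pass    */
                    outer = prev_t2 ? t1 - prev_t2 : 0;   /* gap between two passes */
                    prev_t2 = t2;

                    if (inner > thresh || outer > thresh)
                            record_sample(inner, outer);  /* illustrative helper */
            } while (t2 - t_start < width_ns);            /* spin for the width */
    }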
     

06 Jul, 2016

1 commit

  • Currently the ftrace_graph_ent{,_entry} and ftrace_graph_ret{,_entry} structs
    can have padding bytes at the end due to the alignment of 64-bit data types.
    As this data is recorded so frequently, the padding wastes
    non-negligible space. As the ring buffer maintains alignment properly
    for each architecture, just remove the extra padding using the 'packed'
    attribute (a generic illustration follows this entry).

    ftrace_graph_ent_entry: 24 -> 20
    ftrace_graph_ret_entry: 48 -> 44

    Also I moved the 'overrun' field in struct ftrace_graph_ret to minimize
    the padding in the middle.

    Tested on x86_64 only.

    Link: http://lkml.kernel.org/r/1467197808-13578-1-git-send-email-namhyung@kernel.org

    Cc: Ingo Molnar
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
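
    A generic illustration of where the tail padding comes from (not the
    actual ftrace structs): a trailing 32-bit field after a 64-bit field gets
    padded out to the structure's 8-byte alignment unless 'packed' is used:

    #include <stdint.h>
    #include <stdio.h>

    struct unpacked_entry {         /* 8-byte alignment -> size 16 */
            uint64_t calltime;
            uint32_t depth;
    };

    struct packed_entry {           /* no tail padding -> size 12 */
            uint64_t calltime;
            uint32_t depth;
    } __attribute__((packed));

    int main(void)
    {
            /* Typically prints "16 12" on x86_64. */
            printf("%zu %zu\n", sizeof(struct unpacked_entry),
                   sizeof(struct packed_entry));
            return 0;
    }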
     

05 Jul, 2016

1 commit

  • Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events
    do") placed ftrace_init_tracefs into the instance creation, and encapsulated
    the top level updating with an if conditional, as the top level only gets
    updated at boot up. Unfortunately, this triggers section mismatch errors as
    the init functions are called from a function that can be called later, and
    the section mismatch logic is unaware of the if conditional that would
    prevent it from happening at run time.

    To make everyone happy, create a separate ftrace_init_tracefs_toplevel()
    routine that only gets called by init functions, and this will be what calls
    other init functions for the toplevel directory.

    Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home

    Reported-by: kbuild test robot
    Reported-by: Arnd Bergmann
    Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

20 Jun, 2016

4 commits


04 May, 2016

1 commit

  • Filtering of events requires the data to be written to the ring buffer
    before it can be decided whether to filter it out or not. This is because
    the parameters of the filter are based on the result that is written to the
    ring buffer and not on the parameters that are passed into the trace
    functions.

    The ftrace ring buffer is optimized for writing into the ring buffer and
    committing. The discard procedure, used when the filter decides an event
    should be discarded, is much more heavyweight. Thus, using a temporary
    buffer when filtering events can speed things up drastically.

    Without a temp buffer we have:

    # trace-cmd start -p nop
    # perf stat -r 10 hackbench 50
    0.790706626 seconds time elapsed ( +- 0.71% )

    # trace-cmd start -e all
    # perf stat -r 10 hackbench 50
    1.566904059 seconds time elapsed ( +- 0.27% )

    # trace-cmd start -e all -f 'common_preempt_count==20'
    # perf stat -r 10 hackbench 50
    1.690598511 seconds time elapsed ( +- 0.19% )

    # trace-cmd start -e all -f 'common_preempt_count!=20'
    # perf stat -r 10 hackbench 50
    1.707486364 seconds time elapsed ( +- 0.30% )

    The first run above is without any tracing, just to get a base figure.
    hackbench takes ~0.79 seconds to run on the system.

    The second run enables tracing all events where nothing is filtered. This
    increases the time by 100% and hackbench takes 1.57 seconds to run.

    The third run filters all events where the preempt count will equal "20"
    (this should never happen) thus all events are discarded. This takes 1.69
    seconds to run. This is 10% slower than just committing the events!

    The last run enables all events and filters where the filter will commit all
    events, and this takes 1.70 seconds to run. The filtering overhead is
    approximately 10%. Thus, the discard and commit of an event from the ring
    buffer may be about the same time.

    With this patch, the numbers change:

    # trace-cmd start -p nop
    # perf stat -r 10 hackbench 50
    0.778233033 seconds time elapsed ( +- 0.38% )

    # trace-cmd start -e all
    # perf stat -r 10 hackbench 50
    1.582102692 seconds time elapsed ( +- 0.28% )

    # trace-cmd start -e all -f 'common_preempt_count==20'
    # perf stat -r 10 hackbench 50
    1.309230710 seconds time elapsed ( +- 0.22% )

    # trace-cmd start -e all -f 'common_preempt_count!=20'
    # perf stat -r 10 hackbench 50
    1.786001924 seconds time elapsed ( +- 0.20% )

    The first run is again the base with no tracing.

    The second run is all tracing with no filtering. It is a little slower, but
    that may be well within the noise.

    The third run shows that discarding all events only took 1.3 seconds. This
    is a speed up of 23%! The discard is much faster than even the commit.

    The one downside is shown in the last run. Events that are not discarded by
    the filter will take longer to add; this is due to the extra copy of the
    event.

    Cc: Alexei Starovoitov
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

30 Apr, 2016

3 commits


27 Apr, 2016

2 commits


20 Apr, 2016

3 commits

  • Named triggers are sets of triggers that share a common set of trigger
    data. An example of functionality that could benefit from this type
    of capability would be a set of inlined probes that would each
    contribute event counts, for example, to a shared counter data
    structure.

    The first named trigger registered with a given name owns the common
    trigger data that the others subsequently registered with the same
    name will reference. The functions defined here allow users to add,
    delete, and find named triggers.

    It also adds functions to pause and unpause named triggers; since
    named triggers act upon common data, they should also be paused and
    unpaused as a group.

    Link: http://lkml.kernel.org/r/c09ff648360f65b10a3e321eddafe18060b4a04f.1457029949.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Similar to enable_event/disable_event triggers, these triggers enable
    and disable the aggregation of events into maps rather than enabling
    and disabling their writing into the trace buffer.

    They can be used to automatically start and stop hist triggers based
    on a matching filter condition.

    If there's a paused hist trigger on system:event, the following would
    start it when the filter condition was hit:

    # echo enable_hist:system:event [ if filter] > event/trigger

    And the following would disable a running system:event hist trigger:

    # echo disable_hist:system:event [ if filter] > event/trigger

    See Documentation/trace/events.txt for real examples.

    Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • 'hist' triggers allow users to continually aggregate trace events,
    which can then be viewed afterwards by simply reading a 'hist' file
    containing the aggregation in a human-readable format.

    The basic idea is very simple and boils down to a mechanism whereby
    trace events, rather than being exhaustively dumped in raw form and
    viewed directly, are automatically 'compressed' into meaningful tables
    completely defined by the user.

    This is done strictly via single-line command-line commands and
    without the aid of any kind of programming language or interpreter.

    A surprising number of typical use cases can be accomplished by users
    via this simple mechanism. In fact, a large number of the tasks that
    users typically do using the more complicated script-based tracing
    tools, at least during the initial stages of an investigation, can be
    accomplished by simply specifying a set of keys and values to be used
    in the creation of a hash table.

    The Linux kernel trace event subsystem happens to provide an extensive
    list of keys and values ready-made for such a purpose in the form of
    the event format files associated with each trace event. By simply
    consulting the format file for field names of interest and by plugging
    them into the hist trigger command, users can create an endless number
    of useful aggregations to help with investigating various properties
    of the system. See Documentation/trace/events.txt for examples.

    hist triggers are implemented on top of the existing event trigger
    infrastructure, and as such are consistent with the existing triggers
    from a user's perspective as well.

    The basic syntax follows the existing trigger syntax. Users start an
    aggregation by writing a 'hist' trigger to the event of interest's
    trigger file:

    # echo hist:keys=xxx [ if filter] > event/trigger

    Once a hist trigger has been set up, by default it continually
    aggregates every matching event into a hash table using the event key
    and a value field named 'hitcount'.

    To view the aggregation at any point in time, simply read the 'hist'
    file in the same directory as the 'trigger' file:

    # cat event/hist

    The detailed syntax provides additional options for user control, and
    is described exhaustively in Documentation/trace/events.txt and in the
    virtual tracing/README file in the tracing subsystem.

    Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     

19 Apr, 2016

2 commits

  • Add the infrastructure needed to have the PIDs in set_event_pid
    automatically include the PIDs of the children of the tasks whose PIDs are
    in set_event_pid. This will also remove PIDs from set_event_pid when a task
    exits.

    This is implemented by adding hooks into the fork and exit tracepoints. On
    fork, the PIDs are added to the list, and on exit, they are removed.

    Add a new option called event_fork. When it is set, PIDs in set_event_pid
    automatically get their children's PIDs added when they fork, and any
    task that exits has its PID removed from set_event_pid.

    This works for instances as well.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • In order to add the ability to let tasks that are filtered by the events
    have their children also be traced on fork (and then not traced on exit),
    convert the array into a pid bitmask. Most of the time the maximum number of
    pids is 32768, which amounts to a 4k bitmask; that is the same size as the
    default list currently is, and the list could grow if more pids are listed
    (see the sanity check after this entry).

    This also greatly simplifies the code.

    Suggested-by: "H. Peter Anvin"
    Signed-off-by: Steven Rostedt

    Steven Rostedt
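
    A quick sanity check of the size claim, assuming the default pid_max of
    32768:

    #include <stdio.h>

    int main(void)
    {
            unsigned int pid_max = 32768;               /* default PID space */
            unsigned int bitmask_bytes = pid_max / 8;   /* one bit per PID   */

            /* Prints: 32768 PIDs -> 4096 bytes (4 KiB) */
            printf("%u PIDs -> %u bytes (%u KiB)\n",
                   pid_max, bitmask_bytes, bitmask_bytes / 1024);
            return 0;
    }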
     

23 Mar, 2016

1 commit

  • The latency tracer format has a nice column to indicate IRQ state, but
    this is not able to tell us about NMI state.

    When tracing perf interrupt handlers (which often run in NMI context)
    it is very useful to see how the events nest.

    Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     

09 Mar, 2016

9 commits

  • gcc isn't known for handling bool well in structures. Instead of using
    bool, use an integer mask with bit flags.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add a new unreg_all() callback that can be used to remove all
    command-specific triggers from an event and arrange to have it called
    whenever a trigger file is opened with O_TRUNC set.

    Commands that don't want truncate semantics, or existing commands that
    don't implement this function simply do nothing and their triggers
    remain intact.

    Link: http://lkml.kernel.org/r/2b7d62854d01f28c19185e1bbb8f826f385edfba.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a new needs_rec flag for triggers that require unconditional
    access to trace records in order to function.

    Normally a trigger requires access to the contents of a trace record
    only if it has a filter associated with it (since filters need the
    contents of a record in order to make a filtering decision). Some
    types of triggers, such as 'hist' triggers, require access to trace
    record contents independent of the presence of filters, so add a new
    flag for those triggers.

    Link: http://lkml.kernel.org/r/7be8fa38f9b90fdb6c47ca0f98d20a07b9fd512b.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a simple per-trigger 'paused' flag, allowing individual triggers
    to pause. We could leave it to individual triggers that need this
    functionality to do it themselves, but we also want to allow other
    events to control pausing, so add it to the trigger data.

    Link: http://lkml.kernel.org/r/fed37e4879684d7dcc57fe00ce0cbf170032b06d.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Add a utility function to grab the syscall name from the syscall
    metadata, given a syscall id.

    Link: http://lkml.kernel.org/r/be26a8dfe3f15e16a837799f1c1e2b4d62742843.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Some triggers may need access to the trace event, so pass it in. Also
    fix up the existing trigger funcs and their callers.

    Link: http://lkml.kernel.org/r/543e31e9fc445ef61077421ab219033401c39846.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make various event trigger utility functions available outside of
    trace_events_trigger.c so that new triggers can be defined outside of
    that file.

    Link: http://lkml.kernel.org/r/4a40c1695dd43cac6cd475d72e13ffe30ba84bff.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • Make is_string_field() and is_function_field() accessible outside of
    trace_event_filters.c for other users of ftrace_event_fields.

    Link: http://lkml.kernel.org/r/2d3f00d3311702e556e82eed7754bae6f017939f.1449767187.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Reviewed-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • While updating the ftrace_stress test of LTP, I encountered a strange
    phenomenon. Execute the following steps:

    echo nop > /sys/kernel/debug/tracing/current_tracer
    echo 0 > /sys/kernel/debug/tracing/options/funcgraph-cpu
    bash: echo: write error: Invalid argument

    check dmesg:
    [ 1024.903855] nop_test_refuse flag set to 0: we refuse.Now cat trace_options to see the result

    The reason is that the trace option test will randomly set up trace
    options under tracing/options no matter what the current_tracer is,
    but set_tracer_option always uses the set_flag callback from the
    current_tracer. This patch adds a pointer to tracer_flags and makes it
    point to the tracer it belongs to. When the option is set, the set_flag
    of the right tracer will be used no matter what the current_tracer is
    (see the sketch after this entry).

    The old dummy_tracer_flags was used for all the tracers that don't have a
    tracer_flags of their own, which makes it impossible to use it to save a
    pointer back to a tracer. So remove it and use a dynamically allocated
    dummy tracer_flags for tracers that need one; as a result, no tracers
    share a tracer_flags, so the check code can be removed.

    Saving the current tracer in trace_option_dentry did not seem like a good
    option either, as it may waste memory when the debug/trace fs is mounted
    more than once.

    Link: http://lkml.kernel.org/r/1457444222-8654-1-git-send-email-chuhu@redhat.com

    Signed-off-by: Chunyu Hu
    [ Fixed up function tracer options to work with the change ]
    Signed-off-by: Steven Rostedt

    Chunyu Hu
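
    Conceptually the change looks like this. The field layout is an
    illustrative sketch, not necessarily the exact kernel definition:

    struct tracer_flags {
            u32                     val;    /* current option bits          */
            struct tracer_opt       *opts;  /* the options themselves       */
            struct tracer           *trace; /* new: tracer owning the flags */
    };

    With the back-pointer in place, toggling an option under tracing/options
    can use the owning tracer's set_flag() callback instead of whatever the
    current_tracer happens to be.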
     

24 Dec, 2015

2 commits

  • The file tracing_enable is obsolete and does not exist anymore. Replace
    the comment that references it with the proper tracing_on file.

    Link: http://lkml.kernel.org/r/1450787141-45544-1-git-send-email-chuhu@redhat.com

    Signed-off-by: Chuyu Hu
    Signed-off-by: Steven Rostedt

    Chuyu Hu
     
  • Currently perf has its own list function within the ftrace infrastructure
    that seems to be used only to allow for it to have per-cpu disabling as well
    as a check to make sure that it's not called while RCU is not watching. It
    uses something called the "control_ops" which is used to iterate over ops
    under it with the control_list_func().

    The problem is that this control_ops and control_list_func unnecessarily
    complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
    (FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
    that is special with the control ops and add the needed checks within the
    generic ftrace_list_func().

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)