19 Sep, 2020

1 commit

  • Currently, tracing_init_dentry() returns a dentry pointer, which is not
    necessary. The function returns NULL on success or an error value on
    failure, which means it never returns a valid dentry pointer.

    Let's return 0 on success and a negative value on error.
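
    A minimal sketch of a caller under the new convention (the caller name
    is hypothetical; only the 0/negative return contract comes from the
    change described above):

      static int my_tracefs_setup(void)        /* hypothetical caller */
      {
              int ret;

              ret = tracing_init_dentry();
              if (ret < 0)                     /* negative value: no tracefs */
                      return ret;

              /* tracefs is ready; create files here */
              return 0;
      }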

    Link: https://lkml.kernel.org/r/20200712011036.70948-5-richard.weiyang@linux.alibaba.com

    Signed-off-by: Wei Yang
    Signed-off-by: Steven Rostedt (VMware)

    Wei Yang
     

20 Mar, 2020

1 commit

  • When the ring buffer was first created, the iterator followed the normal
    producer/consumer operations where it had both a peek() operation, that just
    returned the event at the current location, and a read(), that would return
    the event at the current location and also increment the iterator such that
    the next peek() or read() will return the next event.

    The only current use of ring_buffer_read() is to move the iterator to
    the next location; nothing actually reads the event it returns. Rename
    this function after its actual use case, ring_buffer_iter_advance(),
    which also adds the "iter" part to the name, making it more meaningful.
    As the timestamp returned by ring_buffer_read() was never used, there's
    no reason the new version should bother returning it, so it also
    becomes a void function.
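
    A hedged sketch of the iterator usage before and after the rename,
    assuming the signatures described above (ring_buffer_iter_peek() shown
    as the companion peek operation):

      struct ring_buffer_event *event;
      u64 ts;

      /* Before: read returned the event and timestamp, then advanced,
       * but callers ignored both return values. */
      event = ring_buffer_read(iter, &ts);

      /* After: peek if the event is needed; advancing is separate and void. */
      event = ring_buffer_iter_peek(iter, &ts);
      ring_buffer_iter_advance(iter);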

    Link: http://lkml.kernel.org/r/20200317213416.018928618@goodmis.org

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

14 Jan, 2020

2 commits

  • As there are two struct ring_buffers in the kernel, it causes some
    confusion. The other one is the perf ring buffer. It was agreed that, as
    neither of the ring buffers is generic enough to be used globally, they
    should be renamed as:

    perf's ring_buffer -> perf_buffer
    ftrace's ring_buffer -> trace_buffer

    This implements the changes to the ring buffer that ftrace uses.

    Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As we are working to remove the generic "ring_buffer" name that is used by
    both tracing and perf, the ring_buffer name for tracing will be renamed to
    trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

    As there already exists a trace_buffer that is used by the trace_arrays,
    it first needs to be renamed to array_buffer.

    Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

31 Jul, 2019

1 commit

  • We have already tested it earlier, so the second, redundant test should
    be removed. With this change, performance should improve slightly.

    Link: http://lkml.kernel.org/r/20190730140850.7927-1-changbin.du@gmail.com

    Cc: stable@vger.kernel.org
    Fixes: 9cd2992f2d6c ("fgraph: Have set_graph_notrace only affect function_graph tracer")
    Signed-off-by: Changbin Du
    Signed-off-by: Steven Rostedt (VMware)

    Changbin Du
     

07 Feb, 2019

2 commits

  • Don't mix context flags with function duration info.

    Instead of this:

    # tracer: wakeup_rt
    #
    # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+
    # --------------------------------------------------------------------
    # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
    # -----------------
    # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
    # -----------------
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
    # | | | | |||| | | | | | |
    0 us | 0) -0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */
    2 us | 0) -0 | dNh5 0.000 us | (null)();
    4 us | 0) -0 | dNh4 | _raw_spin_unlock() {
    4 us | 0) -0 | dNh4 0.304 us | preempt_count_sub();
    5 us | 0) -0 | dNh3 1.063 us | }
    5 us | 0) -0 | dNh3 0.266 us | ttwu_stat();
    6 us | 0) -0 | dNh3 | _raw_spin_unlock_irqrestore() {
    6 us | 0) -0 | dNh3 0.273 us | preempt_count_sub();
    6 us | 0) -0 | dNh2 0.818 us | }

    Show this:

    # tracer: wakeup
    #
    # wakeup latency trace v1.1.5 on 4.20.0+
    # --------------------------------------------------------------------
    # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0)
    # -----------------
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
    # | | | | |||| | | | | | |
    0 us | 0) -0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */
    3 us | 0) -0 | dNs. | 0.000 us | (null)();
    67 us | 0) -0 | dNs. | 0.721 us | ttwu_stat();
    69 us | 0) -0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore();
    71 us | 0) -0 | .Ns. | 0.598 us | _raw_spin_lock_irq();
    72 us | 0) -0 | .Ns. | 0.584 us | _raw_spin_lock_irq();
    73 us | 0) -0 | dNs. | + 11.118 us | __next_timer_interrupt();
    75 us | 0) -0 | dNs. | | call_timer_fn() {
    76 us | 0) -0 | dNs. | | delayed_work_timer_fn() {
    76 us | 0) -0 | dNs. | | __queue_work() {
    ...

    Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com

    Signed-off-by: Changbin Du
    Signed-off-by: Steven Rostedt (VMware)

    Changbin Du
     
  • When function_graph is used for latency tracers, a relative timestamp is
    more straightforward than the absolute timestamp that the function
    tracer uses. This change adds relative timestamp support to
    function_graph and applies it to the latency tracers (wakeup and
    irqsoff).

    Instead of:

    # tracer: irqsoff
    #
    # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test
    # --------------------------------------------------------------------
    # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
    # -----------------
    # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: __schedule
    # => ended at: _raw_spin_unlock_irq
    #
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
    # | | | | |||| | | | | | |
    124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule();
    124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() {
    124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs();
    124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs();
    124.974310 | 2) systemd-693 | d..1 2.301 us | }
    [..]
    124.974826 | 2) -0 | d..2 | finish_task_switch() {
    124.974826 | 2) -0 | d..2 | _raw_spin_unlock_irq() {
    124.974827 | 2) -0 | d..2 0.000 us | _raw_spin_unlock_irq();
    124.974828 | 2) -0 | d..2 0.000 us | tracer_hardirqs_on();
    -0 2d..2 552us :
    => __schedule
    => schedule_idle
    => do_idle
    => cpu_startup_entry
    => start_secondary
    => secondary_startup_64

    Show:

    # tracer: irqsoff
    #
    # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+
    # --------------------------------------------------------------------
    # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
    # -----------------
    # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: __schedule
    # => ended at: _raw_spin_unlock_irq
    #
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| /
    # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
    # | | | | |||| | | | | | |
    0 us | 7) sshd-1704 | d..1 0.000 us | __schedule();
    1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() {
    1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs();
    2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs();
    3 us | 7) sshd-1704 | d..1 2.599 us | }
    [..]
    509 us | 7) -0 | d..2 | finish_task_switch() {
    510 us | 7) -0 | d..2 | _raw_spin_unlock_irq() {
    510 us | 7) -0 | d..2 0.000 us | _raw_spin_unlock_irq();
    512 us | 7) -0 | d..2 0.000 us | tracer_hardirqs_on();
    -0 7d..2 543us :
    => __schedule
    => schedule_idle
    => do_idle
    => cpu_startup_entry
    => start_secondary
    => secondary_startup_64

    Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com

    Signed-off-by: Changbin Du
    Signed-off-by: Steven Rostedt (VMware)

    Changbin Du
     

09 Dec, 2018

4 commits

  • Move the function ftrace_graph_ret_addr() to fgraph.c, as the management
    of the curr_ret_stack is going to change, and all the accesses to
    ret_stack need to be done in fgraph.c.

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Currently, registering the function graph tracer means passing in an
    entry and a return function. We need a way to associate those functions
    together so that the entry can determine whether to run the return hook.
    Having a structure that contains both functions will facilitate
    converting the code to be able to do so.

    This is similar to the way function hooks are enabled (it passes in
    ftrace_ops). Instead of passing in the functions to use, a single structure
    is passed in to the registering function.

    The unregister function is now passed in the fgraph_ops handle. When we
    allow more than one callback to the function graph hooks, this will let the
    system know which one to remove.
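
    A sketch of the new registration style, assuming a struct fgraph_ops
    that bundles the two callbacks (treat the exact field and function
    signatures as assumptions):

      static struct fgraph_ops my_gops = {
              .entryfunc = my_entry,   /* decides whether the return hook runs */
              .retfunc   = my_return,  /* paired return callback */
      };

      register_ftrace_graph(&my_gops);    /* was: pass the two functions */
      /* ... */
      unregister_ftrace_graph(&my_gops);  /* handle says which one to remove */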

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • When the function profiler is not configured, the "graph_time" option is
    meaningless, as the function profiler is the only thing that makes use of
    it. Do not expose it if the profiler is not configured.

    Link: http://lkml.kernel.org/r/20181123061133.GA195223@google.com

    Reported-by: Joel Fernandes
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The curr_ret_stack is no longer set to a negative value when a function is
    not to be traced by the function graph tracer. Remove the usage of
    FTRACE_NOTRACE_DEPTH, as it is no longer needed.

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

30 Nov, 2018

4 commits

  • In order to make the function graph infrastructure more generic, there
    can not be code specific to the function_graph tracer in the generic
    code. This includes the set_graph_notrace logic, which stops all graph
    calls when a function in set_graph_notrace is hit.

    By using the trace_recursion mask, we can use a bit in the current
    task_struct to implement the notrace code, move the logic out of
    fgraph.c and into trace_functions_graph.c, and keep it affecting only
    the tracer and not all call graph callbacks.
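
    A hedged sketch of the tracer-side check, assuming a recursion bit
    (here called TRACE_GRAPH_NOTRACE_BIT) and the existing
    trace_recursion_set()/trace_recursion_test() helpers:

      /* In the function_graph tracer's entry callback: */
      if (ftrace_graph_notrace_addr(trace->func)) {
              /* Suppress output until this function returns. */
              trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
              return 1;   /* keep the return hook so the bit gets cleared */
      }

      if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
              return 0;   /* inside a notrace'd function: record nothing */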

    Acked-by: Namhyung Kim
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As the function graph infrastructure can be used by things other than
    tracing, moving the code out of trace_functions_graph.c and into its own
    file makes more sense.

    The fgraph.c file will only contain the infrastructure required to hook
    into functions and their returns.

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Commit 588ca1786f2dd ("function_graph: Use new curr_ret_depth to manage
    depth instead of curr_ret_stack") removed a parameter from the call to
    ftrace_push_return_trace(), which made the entire call fit under 80
    characters, but it did not remove the line break. There's no reason to
    break that line up, so make it a single line.

    Link: http://lkml.kernel.org/r/20181122100322.GN2131@hirez.programming.kicks-ass.net

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The tracefs file set_graph_function is used to function-graph trace only
    the functions listed in that file (or all functions if the file is
    empty). The way this is implemented is that the function graph tracer
    looks at every function; if the current depth is zero and the function
    matches something in the file, it will trace that function. When other
    functions are called from it, the depth will be greater than zero
    (because the original function is at depth zero), and all functions are
    traced while the depth is greater than zero.
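
    A sketch of that depth-based selection (function_in_set_graph() is a
    hypothetical stand-in for the set_graph_function lookup):

      static int entry_filter(struct ftrace_graph_ent *trace)
      {
              if (trace->depth == 0)
                      /* top level: trace only if listed in the file */
                      return function_in_set_graph(trace->func);

              /* depth > 0: assumed to be inside a traced call chain */
              return 1;
      }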

    The issue is that when a function is first entered, and the handler that
    checks this logic is called, the depth is set to zero. If an interrupt
    comes in and a function in the interrupt handler is traced, its depth
    will be greater than zero and it will automatically be traced, even if
    the original function was not. Because the logic only looks at the
    depth, interrupts may be traced when they should not be.

    The recent design change of the function graph tracer to fix other bugs
    caused the depth to be zero for a longer time while the function graph
    callback handler is being called, widening the window for this race.
    The bug was actually there for a long time, but because the race window
    was so small it seldom happened. The Fixes tag below is for the commit
    that widened the race window, because that commit belongs to a series
    that will also help fix the original bug.

    Cc: stable@kernel.org
    Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
    Reported-by: Joe Lawrence
    Tested-by: Joe Lawrence
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

28 Nov, 2018

4 commits

  • The function graph profiler uses the ret_stack to store the "subtime",
    which is reused by nested functions and on return. But the current logic
    has the profiler callback called before the ret_stack is updated, so it
    modifies a ret_stack entry that will only be allocated later (it's just
    lucky that the "subtime" is not touched when it is allocated).

    This could also cause a crash if we are at the end of the ret_stack when
    this happens.

    By reversing the order, so that the ret_stack entry is allocated before
    the callbacks attached to the traced function are called, the ret_stack
    entry is no longer used before it is allocated.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • In the past, curr_ret_stack had two functions. One was to denote the
    depth of the call graph; the other was to keep track of where on the
    ret_stack the data in use is located. Although they may be slightly
    related, there are two cases where they need to be used differently.

    One case is that it keeps the ret_stack data from being corrupted by an
    interrupt coming in and overwriting data still in use. The other is just
    to know the current depth of the stack.

    The function profiler uses the ret_stack to save a "subtime" variable that
    is part of the data on the ret_stack. If curr_ret_stack is modified too
    early, then this variable can be corrupted.

    The "max_depth" option, when set to 1, will record the first functions going
    into the kernel. To see all top functions (when dealing with timings), the
    depth variable needs to be lowered before calling the return hook. But by
    lowering the curr_ret_stack, it makes the data on the ret_stack still being
    used by the return hook susceptible to being overwritten.

    Now that there are two variables to handle both cases (curr_ret_stack
    and the new curr_ret_depth), each can be updated at the location where
    it handles its case correctly.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Currently, the depth of the ret_stack is determined by the
    curr_ret_stack index. The issue is that there's a race between the
    setting of curr_ret_stack and the calling of the callback attached to
    the return of the function.

    Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
    trace return callback") moved the calling of the callback to after the
    setting of the curr_ret_stack, even stating that it was safe to do so, when
    in fact, it was the reason there was a barrier() there (yes, I should have
    commented that barrier()).

    Not only does the curr_ret_stack keep track of the current call graph depth,
    it also keeps the ret_stack content from being overwritten by new data.

    The function profiler uses the "subtime" variable of the ret_stack
    structure, and moving curr_ret_stack allows interrupts to use the same
    structure that is still in use, corrupting the data and breaking the
    profiler.

    To fix this, there needs to be two variables to handle the call stack depth
    and the pointer to where the ret_stack is being used, as they need to change
    at two different locations.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As all architectures now call function_graph_enter() to do the entry work,
    no architecture should ever call ftrace_push_return_trace(). Make it static.

    This is needed to prepare for a fix of a design bug on how the curr_ret_stack
    is used.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

27 Nov, 2018

1 commit

  • Currently all the architectures do basically the same thing in preparing the
    function graph tracer on entry to a function. This code can be pulled into a
    generic location and then this will allow the function graph tracer to be
    fixed, as well as extended.

    Create a new function graph helper, function_graph_enter(), that will
    call the hook function (ftrace_graph_entry) and the shadow stack
    operation (ftrace_push_return_trace), and remove the need for the
    architecture code to manage the shadow stack.
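
    A hedged sketch of what such a helper looks like (argument names and
    exact error handling are assumptions; the point is that the entry hook
    and the shadow-stack push now live in one generic function):

      int function_graph_enter(unsigned long ret, unsigned long func,
                               unsigned long frame_pointer, unsigned long *retp)
      {
              struct ftrace_graph_ent trace;

              trace.func = func;
              trace.depth = current->curr_ret_stack + 1;

              /* Only trace if the calling function expects to. */
              if (!ftrace_graph_entry(&trace))
                      return -EBUSY;

              /* Push the return address onto the shadow stack. */
              return ftrace_push_return_trace(ret, func, frame_pointer, retp);
      }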

    This is needed to prepare for a fix of a design bug on how the curr_ret_stack
    is used.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

04 Jul, 2018

1 commit

  • The function_graph tracer does not show the interrupt return marker for
    the leaf entry. On leaf entries, we see an unbalanced interrupt marker
    (the interrupt was entered, but never left).

    Before:
    1) | SyS_write() {
    1) | __fdget_pos() {
    1) 0.061 us | __fget_light();
    1) 0.289 us | }
    1) | vfs_write() {
    1) 0.049 us | rw_verify_area();
    1) + 15.424 us | __vfs_write();
    1) ==========> |
    1) 6.003 us | smp_apic_timer_interrupt();
    1) 0.055 us | __fsnotify_parent();
    1) 0.073 us | fsnotify();
    1) + 23.665 us | }
    1) + 24.501 us | }

    After:
    0) | SyS_write() {
    0) | __fdget_pos() {
    0) 0.052 us | __fget_light();
    0) 0.328 us | }
    0) | vfs_write() {
    0) 0.057 us | rw_verify_area();
    0) | __vfs_write() {
    0) ==========> |
    0) 8.548 us | smp_apic_timer_interrupt();
    0)
    Signed-off-by: Steven Rostedt (VMware)

    Changbin Du
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset
    of the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to a
    file was done in a spreadsheet of side-by-side results of the output of
    two independent scanners (ScanCode & Windriver) producing SPDX tag:value
    files, created by Philippe Ombredanne. Philippe prepared the base
    worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained
      >5 lines of source.
    - File already had some variant of a license header in it (even if <5
      lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Sep, 2017

1 commit

  • First, the number of CPUs can't be negative.

    Second, different signedness leads to suboptimal code in the following
    cases:

    1)
    kmalloc(nr_cpu_ids * sizeof(X));

    "int" has to be sign extended to size_t.

    2)
    while (*pos < nr_cpu_ids)  /* where pos is a loff_t pointer */

    MOVSXD is 1 byte longer than the same MOV.

    Other cases exist as well. Basically, the compiler is told that
    nr_cpu_ids can't be negative, which can't be deduced if it is "int".

    Code savings on allyesconfig kernel: -3KB

    add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
    function old new delta
    coretemp_cpu_online 450 512 +62
    rcu_init_one 1234 1272 +38
    pci_device_probe 374 399 +25

    ...

    pgdat_reclaimable_pages 628 556 -72
    select_fallback_rq 446 369 -77
    task_numa_find_cpu 1923 1807 -116

    Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 Dec, 2016

2 commits

  • Currently both the wakeup and irqsoff tracers do not handle
    set_graph_notrace well. The ftrace infrastructure will ignore the return
    paths of these functions, leaving them hanging without an end:

    # echo '*spin*' > set_graph_notrace
    # cat trace
    [...]
    _raw_spin_lock() {
    preempt_count_add() {
    do_raw_spin_lock() {
    update_rq_clock();

    Where the '*spin*' functions should have looked like this:

    _raw_spin_lock() {
    preempt_count_add();
    do_raw_spin_lock();
    }
    update_rq_clock();

    Instead, have the wakeup and irqsoff tracers ignore the functions that are
    set by the set_graph_notrace like the function_graph tracer does. Move
    the logic in the function_graph tracer into a header to allow wakeup and
    irqsoff tracers to use it as well.

    Cc: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Both the wakeup and irqsoff tracers can use the function graph tracer when
    the display-graph option is set. The problem is that they ignore the notrace
    file, and record the entry of functions that would be ignored by the
    function_graph tracer. This causes the trace->depth to be recorded into the
    ring buffer. The set_graph_notrace logic uses a trick of adding a large
    negative number to trace->depth when a graph function is to be ignored.

    On trace output, the graph function uses the depth to record a stack of
    functions. But since the depth is negative, it accesses the array with a
    negative number and causes an out of bounds access that can cause a kernel
    oops or corrupt data.

    Have the print functions handle cases where a tracer still records functions
    even when they are in set_graph_notrace.

    Also add warnings if the depth is below zero before accessing the array.

    Note, the function graph logic will still prevent the return of these
    functions from being recorded, which means that they will be left hanging
    without a return. For example:

    # echo '*spin*' > set_graph_notrace
    # echo 1 > options/display-graph
    # echo wakeup > current_tracer
    # cat trace
    [...]
    _raw_spin_lock() {
    preempt_count_add() {
    do_raw_spin_lock() {
    update_rq_clock();

    Where it should look like:

    _raw_spin_lock() {
    preempt_count_add();
    do_raw_spin_lock();
    }
    update_rq_clock();

    Cc: stable@vger.kernel.org
    Cc: Namhyung Kim
    Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter")
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

24 Nov, 2016

1 commit

  • The function __buffer_unlock_commit() is called in a few places outside
    of trace.c. But for the most part, it should really be inlined, as it is
    in the hot path of the trace_events. For the callers outside of trace.c,
    create a new function, trace_buffer_unlock_commit_nostack(), as the
    reason for using it was to avoid the stack tracing that
    trace_buffer_unlock_commit() could do.
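
    A sketch of the resulting split (signatures assumed to mirror the
    existing trace_buffer_unlock_commit() naming):

      /* Inside trace.c the hot path keeps the inlined commit: */
      __buffer_unlock_commit(buffer, event);

      /* Callers outside trace.c that must skip stack tracing now use: */
      trace_buffer_unlock_commit_nostack(buffer, event);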

    Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

    Reported-by: Andi Kleen
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

07 Oct, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "This release cycle is rather small. Just a few fixes to tracing.

    The big change is the addition of the hwlat tracer. It not only
    detects SMIs, but also other latency that's caused by the hardware. I
    have detected some latency from large boxes having bus contention"

    * tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Call traceoff trigger after event is recorded
    ftrace/scripts: Add helper script to bisect function tracing problem functions
    tracing: Have max_latency be defined for HWLAT_TRACER as well
    tracing: Add NMI tracing in hwlat detector
    tracing: Have hwlat trace migrate across tracing_cpumask CPUs
    tracing: Add documentation for hwlat_detector tracer
    tracing: Added hardware latency tracer
    ftrace: Access ret_stack->subtime only in the function profiler
    function_graph: Handle TRACE_BPUTS in print_graph_comment
    tracing/uprobe: Drop isdigit() check in create_trace_uprobe

    Linus Torvalds
     

02 Sep, 2016

1 commit

  • The subtime is used only for function profiler with function graph
    tracer enabled. Move the definition of subtime under
    CONFIG_FUNCTION_PROFILER to reduce the memory usage. Also move the
    initialization of subtime into the graph entry callback.
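
    A sketch of the resulting layout, assuming the field lives in
    struct ftrace_ret_stack:

      struct ftrace_ret_stack {
              unsigned long ret;
              unsigned long func;
              unsigned long long calltime;
      #ifdef CONFIG_FUNCTION_PROFILER
              /* Only the function profiler reads this; don't pay otherwise. */
              unsigned long long subtime;
      #endif
              /* ... */
      };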

    Link: http://lkml.kernel.org/r/20160831025529.24018-1-namhyung@kernel.org

    Cc: Ingo Molnar
    Cc: Josh Poimboeuf
    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
     

01 Sep, 2016

1 commit

  • The print_graph_comment() code missed handling TRACE_BPUTS, so messages
    recorded by trace_bputs() are shown with symbol info unnecessarily.

    You can see it with the trace_printk sample code:

    # cd /sys/kernel/tracing/
    # echo sys_sync > set_graph_function
    # echo 1 > options/sym-offset
    # echo function_graph > current_tracer

    Note that the sys_sync filter was there to prevent recording other
    functions, and the sym-offset option was needed since the first message
    was emitted from a module init function, so kallsyms doesn't have the
    symbol and it is omitted from the output.

    # cd ~/build/kernel
    # insmod samples/trace_printk/trace-printk.ko

    # cd -
    # head trace

    Before:

    # tracer: function_graph
    #
    # CPU DURATION FUNCTION CALLS
    # | | | | | | |
    1) | /* 0xffffffffa0002000: This is a static string that will use trace_bputs */
    1) | /* This is a dynamic string that will use trace_puts */
    1) | /* trace_printk_irq_work+0x5/0x7b [trace_printk]: (irq) This is a static string that will use trace_bputs */
    1) | /* (irq) This is a dynamic string that will use trace_puts */
    1) | /* (irq) This is a static string that will use trace_bprintk() */
    1) | /* (irq) This is a dynamic string that will use trace_printk */

    After:

    # tracer: function_graph
    #
    # CPU DURATION FUNCTION CALLS
    # | | | | | | |
    1) | /* This is a static string that will use trace_bputs */
    1) | /* This is a dynamic string that will use trace_puts */
    1) | /* (irq) This is a static string that will use trace_bputs */
    1) | /* (irq) This is a dynamic string that will use trace_puts */
    1) | /* (irq) This is a static string that will use trace_bprintk() */
    1) | /* (irq) This is a dynamic string that will use trace_printk */

    Link: http://lkml.kernel.org/r/20160901024354.13720-1-namhyung@kernel.org

    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
     

24 Aug, 2016

4 commits

  • When function graph tracing is enabled for a function, ftrace modifies
    the stack by replacing the original return address with the address of a
    hook function (return_to_handler).

    Stack unwinders need a way to get the original return address. Add an
    arch-independent helper function for that named ftrace_graph_ret_addr().

    This adds two variations of the function: one depends on
    HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and the other relies on an index state
    variable.

    The former is recommended because, in some cases, the latter can cause
    problems when the unwinder skips stack frames. It can get out of sync
    with the ret_stack index and wrong addresses can be reported for the
    stack trace.

    Once all arches have been ported to use
    HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, we can get rid of the distinction.
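
    A hedged sketch of an unwinder calling the helper (signature per the
    description above; the index argument is the state used by the
    fallback variant):

      unsigned long real_addr;
      int graph_idx = 0;    /* state for the index-based fallback */

      /* For each return address found while walking the stack: */
      real_addr = ftrace_graph_ret_addr(task, &graph_idx, addr, addr_ptr);

      /* With HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, addr_ptr (where the return
       * address lives on the stack) is matched against the saved pointer,
       * so skipped frames cannot desynchronize the lookup. */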

    Signed-off-by: Josh Poimboeuf
    Acked-by: Steven Rostedt
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Byungchul Park
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Nilay Vaish
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/36bd90f762fc5e5af3929e3797a68a64906421cf.1471607358.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Storing this value will help prevent unwinders from getting out of sync
    with the function graph tracer ret_stack. Now instead of needing a
    stateful iterator, they can compare the return address pointer to find
    the right ret_stack entry.

    Note that an array of 50 ftrace_ret_stack structs is allocated for every
    task. So when an arch implements this, it will add either 200 or 400
    bytes of memory usage per task (depending on whether it's a 32-bit or
    64-bit platform).

    Signed-off-by: Josh Poimboeuf
    Acked-by: Steven Rostedt
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Byungchul Park
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Nilay Vaish
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • This saves some memory when HAVE_FUNCTION_GRAPH_FP_TEST isn't defined.
    On x86_64 with newer versions of gcc which have -mfentry, it saves 400
    bytes per task.

    Signed-off-by: Josh Poimboeuf
    Acked-by: Steven Rostedt
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Byungchul Park
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Nilay Vaish
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/5c7747d9ea7b5cb47ef0a8ce8a6cea6bf7aa94bf.1471607358.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent of
    kconfig. This removes some config file pollution and simplifies the
    checking for the fp test.

    Suggested-by: Steven Rostedt
    Signed-off-by: Josh Poimboeuf
    Acked-by: Steven Rostedt
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Byungchul Park
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Nilay Vaish
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

28 Jun, 2016

1 commit

  • The function graph tracer currently ignores filters if tracing_thresh is
    set. For example, even if set_ftrace_pid is set, it is ignored when
    tracing_thresh is set, resulting in all processes being traced.

    To fix this, we reuse the same entry function as when tracing_thresh is
    not set, and do everything as in the regular case except for writing the
    function entry to the ring buffer.

    Link: http://lkml.kernel.org/r/1466228694-2677-1-git-send-email-agnel.joel@gmail.com

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Joel Fernandes
    Signed-off-by: Steven Rostedt

    Joel Fernandes
     

26 Mar, 2016

1 commit

  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to <linux/interrupt.h> so that the
    users don't need to pull in <linux/ftrace.h>. Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions into the .softirqentry.text section.
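
    A sketch of how the new macro is meant to be used (the handler name is
    illustrative):

      /* Placed in .softirqentry.text so KASAN can spot the boundary. */
      static void __softirq_entry my_softirq_action(struct softirq_action *h)
      {
              /* ... */
      }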

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

23 Mar, 2016

1 commit

  • Use the more common logging method with the eventual goal of removing
    pr_warning altogether.

    Miscellanea:

    - Realign arguments
    - Coalesce formats
    - Add missing space between a few coalesced formats

    Signed-off-by: Joe Perches
    Acked-by: Rafael J. Wysocki [kernel/power/suspend.c]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

08 Nov, 2015

1 commit

  • Since the ring buffer is lockless, there is no need to disable ftrace
    per CPU, and nothing does so anymore: after commit 68179686ac67cb
    ("tracing: Remove ftrace_disable/enable_cpu()"), ftrace_cpu_disabled
    stays the same after initialization; nothing changes it.
    ftrace_cpu_disabled shouldn't be used by any external module since it
    disables only function and graph_function tracers but not any other
    tracer.

    Link: http://lkml.kernel.org/r/1446836846-22239-1-git-send-email-0x7f454c46@gmail.com

    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steven Rostedt

    Dmitry Safonov
     

01 Oct, 2015

2 commits

  • In preparation to make trace options per instance, the global trace_flags
    needs to be moved from being a global variable to a field within the trace
    instance trace_array structure.

    There's still more work to do, as there are some functions that use
    trace_flags without passing in a way to get to the current_trace array. For
    those, the global_trace is used directly (from trace.c). This includes
    setting and clearing the trace_flags. This means that when a new instance is
    created, it just gets the trace_flags of the global_trace and will not be
    able to modify them. Depending on the functions that have access to the
    trace_array, the flags of an instance may not affect parts of its trace,
    where the global_trace is used. These will be fixed in future changes.
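
    A sketch of the move, assuming the flags simply become a field of
    struct trace_array:

      struct trace_array {
              /* ... */
              unsigned int trace_flags;   /* was a single global variable */
      };

      /* Checks change from (trace_flags & X) to the per-instance form: */
      if (tr->trace_flags & TRACE_ITER_PRINTK)
              /* ... */;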

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The sleep-time and graph-time options are only for the function graph
    tracer and are not used by anything else. As tracer options are now
    visible even when the tracer is not activated, it's better to move the
    function graph specific tracer options into the function graph tracer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)