30 Dec, 2020

1 commit

  • commit 60efe21e5976d3d4170a8190ca76a271d6419754 upstream.

    Disable ftrace selftests when any tracer (kernel command line options
    like ftrace=, trace_events=, kprobe_events=, and boot-time tracing)
    starts running, because the selftests can disturb it.

    Currently ftrace= and trace_events= are checked, but kprobe_events
    uses a different flag, and boot-time tracing is not checked at all.
    Unify the disabled flag and have all of those boot-time tracing
    features set it.

    This also fixes warnings in the kprobe-event selftest
    (CONFIG_FTRACE_STARTUP_TEST=y and CONFIG_KPROBE_EVENTS=y) with boot-time
    tracing (ftrace.event.kprobes.EVENT.probes) like below:

    [ 59.803496] trace_kprobe: Testing kprobe tracing:
    [ 59.804258] ------------[ cut here ]------------
    [ 59.805682] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:1987 kprobe_trace_self_tests_ib
    [ 59.806944] Modules linked in:
    [ 59.807335] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7+ #172
    [ 59.808029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/204
    [ 59.808999] RIP: 0010:kprobe_trace_self_tests_init+0x5f/0x42b
    [ 59.809696] Code: e8 03 00 00 48 c7 c7 30 8e 07 82 e8 6d 3c 46 ff 48 c7 c6 00 b2 1a 81 48 c7 c7 7
    [ 59.812439] RSP: 0018:ffffc90000013e78 EFLAGS: 00010282
    [ 59.813038] RAX: 00000000ffffffef RBX: 0000000000000000 RCX: 0000000000049443
    [ 59.813780] RDX: 0000000000049403 RSI: 0000000000049403 RDI: 000000000002deb0
    [ 59.814589] RBP: ffffc90000013e90 R08: 0000000000000001 R09: 0000000000000001
    [ 59.815349] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffef
    [ 59.816138] R13: ffff888004613d80 R14: ffffffff82696940 R15: ffff888004429138
    [ 59.816877] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000
    [ 59.817772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 59.818395] CR2: 0000000001a8dd38 CR3: 0000000002222000 CR4: 00000000000006a0
    [ 59.819144] Call Trace:
    [ 59.819469] ? init_kprobe_trace+0x6b/0x6b
    [ 59.819948] do_one_initcall+0x5f/0x300
    [ 59.820392] ? rcu_read_lock_sched_held+0x4f/0x80
    [ 59.820916] kernel_init_freeable+0x22a/0x271
    [ 59.821416] ? rest_init+0x241/0x241
    [ 59.821841] kernel_init+0xe/0x10f
    [ 59.822251] ret_from_fork+0x22/0x30
    [ 59.822683] irq event stamp: 16403349
    [ 59.823121] hardirqs last enabled at (16403359): [] console_unlock+0x48e/0x580
    [ 59.824074] hardirqs last disabled at (16403368): [] console_unlock+0x3f6/0x580
    [ 59.825036] softirqs last enabled at (16403200): [] __do_softirq+0x33a/0x484
    [ 59.825982] softirqs last disabled at (16403087): [] asm_call_irq_on_stack+0x10
    [ 59.827034] ---[ end trace 200c544775cdfeb3 ]---
    [ 59.827635] trace_kprobe: error on probing function entry.

    Link: https://lkml.kernel.org/r/160741764955.3448999.3347769358299456915.stgit@devnote2

    Fixes: 4d655281eb1b ("tracing/boot: Add kprobe event support")
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     

02 Nov, 2020

2 commits

  • When an interrupt or NMI comes in and switches the context, there's a delay
    before preempt_count() reflects the update. As preempt_count() is used to
    detect recursion, each context has its own bit that gets set when tracing
    starts; if that bit is already set, the call is considered a recursion and
    the function exits. But if this happens in the window where the context has
    changed but preempt_count() has not yet been updated, the call will be
    incorrectly flagged as a recursion.

    To handle this case, create another bit called TRANSITION and test it if
    the current context bit is already set. Flag the call as a recursion if the
    TRANSITION bit is already set; if not, set it and continue. The TRANSITION
    bit will be cleared normally on the return of the function that set it, or,
    if the current context bit is clear, set that bit and clear the TRANSITION
    bit to allow for another transition between the current context and an even
    higher one.
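
    A minimal user-space model of the logic (bit names and layout are
    assumed for illustration; the in-kernel helpers differ):

    enum { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL, CTX_TRANSITION };

    static int recursion_try_acquire(unsigned long *recursion, int ctx)
    {
        int bit = ctx;

        if (*recursion & (1UL << bit)) {
            /* The context bit is set, but an interrupt/NMI may have
             * arrived before preempt_count() was updated: allow one
             * extra level through the TRANSITION bit. */
            bit = CTX_TRANSITION;
            if (*recursion & (1UL << bit))
                return -1;              /* real recursion */
        } else {
            /* Cleanly in a new context: free the TRANSITION bit for
             * the next context-switch window. */
            *recursion &= ~(1UL << CTX_TRANSITION);
        }
        *recursion |= 1UL << bit;
        return bit;
    }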

    Cc: stable@vger.kernel.org
    Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The code that checks recursion works to only do the recursion check once
    when checks are nested. The top one does the check, and the nested checks
    see that recursion was already checked and return zero for their "bit". On
    the return side, nothing is done when the "bit" is zero.

    The problem is that zero is also returned as the "good" bit when in NMI
    context. This sets the bit for NMIs, making it look like *all* NMI tracing
    is recursing, and prevents tracing of anything in NMI context!

    The simple fix is to return "bit + 1" and subtract 1 on the exit side to
    get the real bit.
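
    A user-space sketch of the fix: NMI is context bit 0, so a plain "bit"
    return could not distinguish a successful acquire in NMI context from
    the nested-check case:

    static int recursion_acquire(unsigned long *recursion, int ctx)
    {
        if (*recursion & (1UL << ctx))
            return 0;           /* an outer check already holds the bit */
        *recursion |= 1UL << ctx;
        return ctx + 1;         /* never 0, even for NMI context (bit 0) */
    }

    static void recursion_release(unsigned long *recursion, int bit)
    {
        if (!bit)
            return;             /* nested caller: nothing to clear */
        *recursion &= ~(1UL << (bit - 1));  /* subtract 1 for the real bit */
    }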

    Cc: stable@vger.kernel.org
    Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.
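
    A hedged before/after sketch of the macro change (the variables are
    hypothetical):

    /* before: the # operator stringized the bare argument */
    #define __section(S) __attribute__((__section__(#S)))
    static int early_data __section(.init.data);

    /* after: callers pass a string literal directly */
    #undef __section
    #define __section(section) __attribute__((__section__(section)))
    static int early_data2 __section(".init.data");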

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

16 Oct, 2020

2 commits

  • is_good_name() is useful for other trace infrastructure, such as
    synthetic events, so make it available via trace.h.
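
    The check amounts to validating a C identifier; a sketch of its
    long-standing form:

    #include <linux/ctype.h>

    static inline bool is_good_name(const char *name)
    {
        if (!isalpha(*name) && *name != '_')
            return false;
        while (*++name != '\0') {
            if (!isalpha(*name) && !isdigit(*name) && *name != '_')
                return false;
        }
        return true;
    }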

    Link: https://lkml.kernel.org/r/cc6d6a2d7da6957fcbe1e2922e76d18d2bb459b4.1602598160.git.zanussi@kernel.org

    Acked-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     
  • s/wihin/within/
    s/retrieven/retrieved/
    s/suppport/support/
    s/wil/will/
    s/accidently/accidentally/
    s/if the if the/if the/

    Link: https://lkml.kernel.org/r/20201010140924.3809-1-hqjagain@gmail.com

    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

26 Sep, 2020

1 commit

  • Initialize the per-instance event list at early boot (before the
    instance directory is created on tracefs). This fixes boot-time
    tracing so it correctly handles per-instance settings.

    Link: https://lkml.kernel.org/r/160096560826.182763.17110991546046128881.stgit@devnote2

    Fixes: 4114fbfd02f1 ("tracing: Enable creating new instance early boot")
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

22 Sep, 2020

1 commit

  • Enable creating a new trace_array instance in the early boot stage.
    If the instances directory has not been created yet, postpone instance
    creation until tracefs is initialized.

    Link: https://lkml.kernel.org/r/159974154763.478751.6289753509587233103.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

19 Sep, 2020

1 commit

  • Currently tracing_init_dentry() returns a d_entry pointer, which is not
    necessary. The function returns NULL on success or an error value on
    failure, so no valid d_entry pointer is ever returned.

    Let's return 0 on success and a negative value on error.
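
    A before/after sketch of the signature change:

    /* before: the pointer was never a usable dentry */
    struct dentry *tracing_init_dentry(void);

    /* after: a plain status code */
    int tracing_init_dentry(void);   /* 0 on success, negative on error */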

    Link: https://lkml.kernel.org/r/20200712011036.70948-5-richard.weiyang@linux.alibaba.com

    Signed-off-by: Wei Yang
    Signed-off-by: Steven Rostedt (VMware)

    Wei Yang
     

04 Aug, 2020

1 commit

  • I was attempting to use pid filtering with function_graph, but it wasn't
    allowing anything to make it through. Turns out ftrace_trace_task
    returns false if ftrace_ignore_pid is not empty, which isn't correct
    anymore. We're now setting it to FTRACE_PID_IGNORE if we need to ignore
    that pid, otherwise it's set to the pid (which is weird considering the
    name) or to FTRACE_PID_TRACE. Fix the check to test for !=
    FTRACE_PID_IGNORE. With this we can now use function_graph with pid
    filtering.
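
    A hedged sketch of the corrected check (field names assumed):

    static bool ftrace_trace_task(struct trace_array *tr)
    {
        /* The per-CPU value may hold a real pid or FTRACE_PID_TRACE,
         * both of which mean "trace"; only FTRACE_PID_IGNORE does not. */
        return this_cpu_read(tr->array_buffer.data->ftrace_ignore_pid) !=
               FTRACE_PID_IGNORE;
    }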

    Link: https://lkml.kernel.org/r/20200725005048.1790-1-josef@toxicpanda.com

    Fixes: 717e3f5ebc82 ("ftrace: Make function trace pid filtering a bit more exact")
    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Rostedt (VMware)

    Josef Bacik
     

01 Jul, 2020

1 commit

  • If a process has the trace_pipe open on a trace_array, the current tracer
    for that trace array should not be changed. This was originally enforced by
    a global lock, but when instances were introduced, the counter was moved to
    the current_trace. But this structure is shared by all instances, while a
    trace_pipe is for a single instance. There's no reason that a process that
    has trace_pipe open on one instance should prevent another instance from
    changing its current tracer. Move the reference counter to the trace_array
    instead.

    This is marked as "Fixes" but is more of a clean up than a true fix.
    Backport if you want, but it's not critical.

    Fixes: cf6ab6d9143b1 ("tracing: Add ref count to tracer for when they are being read by pipe")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

17 Jun, 2020

1 commit

  • When using trace-cmd on 5.6-rt for the function graph tracer, the output was
    corrupted. It gave output like this:

    funcgraph_entry: func=0xffffffff depth=38982
    funcgraph_entry: func=0x1ffffffff depth=16044
    funcgraph_exit: func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084
    funcgraph_exit: func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702
    funcgraph_entry: func=0xffffffff depth=85798
    funcgraph_entry: func=0x1ffffffff depth=12044

    The reason was that the tracefs/events/ftrace/funcgraph_entry/exit format
    file was incorrect. The -rt kernel adds more common fields to the trace
    events. Namely, common_migrate_disable and common_preempt_lazy_count. Each
    is one byte in size. This changes the alignment of the normal payload. Most
    events are aligned normally, but the function and function graph events are
    defined with a "PACKED" macro that packs their payload. As the offsets
    displayed in the format files are now calculated from an aligned field, the
    alignment used for the function and function graph events should be 1, not
    their normal alignment.

    With natural alignment assumed, the format file for the funcgraph_entry
    event showed:

    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;
    field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
    field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;

    field:unsigned long func; offset:16; size:8; signed:0;
    field:int depth; offset:24; size:4; signed:1;

    But the actual (packed) layout is:

    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;
    field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
    field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;

    field:unsigned long func; offset:12; size:8; signed:0;
    field:int depth; offset:20; size:4; signed:1;

    Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home

    Cc: stable@vger.kernel.org
    Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

01 Jun, 2020

1 commit

  • Add a new "hist_debug" file for each trace event, which when read will
    dump out a bunch of internal details about the hist triggers defined
    on that event.

    This is normally off but can be enabled by saying 'y' to the new
    CONFIG_HIST_TRIGGERS_DEBUG config option.

    This is in support of the new Documentation file describing histogram
    internals, Documentation/trace/histogram-design.rst, which was
    requested by developers trying to understand the internals when
    extending or making use of the hist triggers for higher-level tools.

    The histogram-design.rst documentation refers to the hist_debug files
    and demonstrates their use with output in the test examples.

    Link: http://lkml.kernel.org/r/77914c22b0ba493d9783c53bbfbc6087d6a7e1b1.1585941485.git.zanussi@kernel.org

    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     

28 Mar, 2020

4 commits

  • There's currently a way to select a task that should only have its events
    traced, but there's no way to select a task not to have its events traced.
    Add a set_event_notrace_pid file that acts the same as set_event_pid (and
    is also affected by event-fork), but the task pids in this file will not be
    traced even if they are listed in the set_event_pid file. This makes it
    easy for tools like trace-cmd to "hide" themselves from being traced by
    events when recording other tasks.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • There's currently a way to select a task that should only be traced by the
    function tracer, but there's no way to select a task not to be traced by
    it. Add a set_ftrace_notrace_pid file that acts the same as set_ftrace_pid
    (and is also affected by function-fork), but the task pids in this file
    will not be traced even if they are listed in the set_ftrace_pid file. This
    makes it easy for tools like trace-cmd to "hide" themselves from the
    function tracer when recording other tasks.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The set_ftrace_pid file is used to filter function tracing to only trace
    tasks that are listed in that file. Instead of testing the pids listed in
    that file (it's a bitmask) at each function trace event, the logic is done
    via a sched_switch hook. A flag is set when the next task to run is in the
    list of pids in the set_ftrace_pid file. But the sched_switch hook is not at
    the exact location of when the task switches, and the flag gets set before
    the task to be traced actually runs. This leaves a residue of traced
    functions that do not belong to the pid that should be filtered on.

    Change the logic slightly: instead of having a boolean flag to test, record
    the pid that should be traced, with special values for "never trace" and
    "always trace". Then at each function call, a check is made to see if the
    function should be ignored, or if the current pid matches the one that
    should be traced, and only trace if it matches (or if it has the special
    value to always trace).
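
    A hedged sketch of the sched_switch side under the new scheme
    (identifiers assumed from the description):

    static void ftrace_filter_pid_sched_switch(struct trace_array *tr,
                                               struct task_struct *next)
    {
        struct trace_pid_list *pid_list;
        int pid;

        pid_list = rcu_dereference_sched(tr->function_pids);
        if (!pid_list)
            pid = FTRACE_PID_TRACE;         /* no filtering: trace all */
        else if (trace_find_filtered_pid(pid_list, next->pid))
            pid = next->pid;                /* trace only this exact pid */
        else
            pid = FTRACE_PID_IGNORE;        /* filtered out: never trace */

        this_cpu_write(tr->array_buffer.data->ftrace_ignore_pid, pid);
    }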

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • When opening the "trace" file, it is no longer necessary to disable tracing.

    Note, a new option called "pause-on-trace" is created; when set, it causes
    the trace file to emulate its original behavior.

    Link: http://lkml.kernel.org/r/20200317213416.903351225@goodmis.org

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

04 Mar, 2020

1 commit

  • Commit 567cd4da54ff ("ring-buffer: User context bit recursion checking")
    added the TRACE_BUFFER bits to be used in the current task's trace_recursion
    field. But the final submission of the logic removed the use of those bits
    while never removing the bits themselves (they were never used in upstream
    Linux). These can be safely removed.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

06 Feb, 2020

5 commits

  • Pull tracing updates from Steven Rostedt:

    - Added new "bootconfig".

    This looks for a file appended to initrd to add boot config options,
    and has been discussed thoroughly at Linux Plumbers.

    Very useful for adding kprobes at bootup.

    Only enabled if "bootconfig" is on the real kernel command line.

    - Created dynamic event creation.

    Merges common code between creating synthetic events and kprobe
    events.

    - Rename perf "ring_buffer" structure to "perf_buffer"

    - Rename ftrace "ring_buffer" structure to "trace_buffer"

    Had to rename existing "trace_buffer" to "array_buffer"

    - Allow trace_printk() to work within (some) tracing code.

    - Sorted the tracing configs to be a little better organized

    - Fixed bug where ftrace_graph hash was not being protected properly

    - Various other small fixes and clean ups

    * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
    bootconfig: Show the number of nodes on boot message
    tools/bootconfig: Show the number of bootconfig nodes
    bootconfig: Add more parse error messages
    bootconfig: Use bootconfig instead of boot config
    ftrace: Protect ftrace_graph_hash with ftrace_sync
    ftrace: Add comment to why rcu_dereference_sched() is open coded
    tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
    tracing: Annotate ftrace_graph_hash pointer with __rcu
    bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
    tracing: Use seq_buf for building dynevent_cmd string
    tracing: Remove useless code in dynevent_arg_pair_add()
    tracing: Remove check_arg() callbacks from dynevent args
    tracing: Consolidate some synth_event_trace code
    tracing: Fix now invalid var_ref_vals assumption in trace action
    tracing: Change trace_boot to use synth_event interface
    tracing: Move tracing selftests to bottom of menu
    tracing: Move mmio tracer config up with the other tracers
    tracing: Move tracing test module configs together
    tracing: Move all function tracing configs together
    tracing: Documentation for in-kernel synthetic event API
    ...

    Linus Torvalds
     
  • As the function_graph tracer can run when RCU is not "watching", it can not
    be protected by synchronize_rcu(); it requires running a task on each CPU
    before it can be freed. schedule_on_each_cpu(ftrace_sync) needs to be used.
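
    A hedged sketch of the pattern (the freeing helper is hypothetical):

    static void ftrace_sync(struct work_struct *work)
    {
        /* Intentionally empty: once this has run on every CPU, no CPU
         * can still be inside a function_graph callback that saw the
         * old hash. */
    }

    static void release_graph_hash(struct ftrace_hash *old_hash)
    {
        schedule_on_each_cpu(ftrace_sync);  /* barrier across all CPUs */
        free_ftrace_hash(old_hash);         /* now safe to free */
    }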

    Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

    Cc: stable@vger.kernel.org
    Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
    Reported-by: "Paul E. McKenney"
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Because the function graph tracer can execute in sections where RCU is not
    "watching", the rcu_dereference_sched() for the hash needs to be open coded.
    This is fine because the RCU "flavor" of the ftrace hash is protected by
    its own RCU handling (it does its own little synchronization on every CPU
    and does not rely on RCU sched).

    Acked-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Fix the following instances of sparse errors:
    kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
    kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
    kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
    kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

    Use rcu_dereference_protected to dereference the newly annotated pointer.
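
    A hedged sketch of the pattern applied (lock name assumed):

    hash = rcu_dereference_protected(ftrace_graph_hash,
                                     lockdep_is_held(&graph_lock));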

    Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

    Signed-off-by: Amol Grover
    Signed-off-by: Steven Rostedt (VMware)

    Amol Grover
     
  • Fix the following instances of sparse errors:
    kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
    kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
    kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
    kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

    Use rcu_dereference_protected to access the __rcu annotated pointer.

    Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Amol Grover
    Signed-off-by: Steven Rostedt (VMware)

    Amol Grover
     

30 Jan, 2020

1 commit

  • Add a new trace_array_find() function that can be used to find a trace
    array given the instance name, and replace existing code that does the
    same thing with it. Also add trace_array_find_get() which does the
    same but returns the trace array after upping its refcount.

    Also make both available for use outside of trace.c.
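
    A hedged sketch of the lookup (the real version must be called with the
    appropriate lock held; list name assumed):

    struct trace_array *trace_array_find(const char *instance)
    {
        struct trace_array *tr;

        list_for_each_entry(tr, &ftrace_trace_arrays, list) {
            if (tr->name && strcmp(tr->name, instance) == 0)
                return tr;
        }
        return NULL;
    }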

    Link: http://lkml.kernel.org/r/cb68528c975eba95bee4561ac67dd1499423b2e5.1580323897.git.zanussi@kernel.org

    Acked-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     

25 Jan, 2020

1 commit

  • As warnings can trigger panics, especially when "panic_on_warn" is set,
    memory failure warnings can cause panics and fail fuzz testers that are
    stressing memory.

    Create a MEM_FAIL() macro to use instead of WARN() in the tracing code
    (perhaps this should be a kernel wide macro?), and use that for memory
    failure issues. This should stop fuzz tests from failing due to warnings.
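
    A hedged sketch of such a macro: report once via pr_err() instead of
    WARN(), and still hand the condition back to the caller:

    #define MEM_FAIL(condition, fmt, ...) ({                        \
            static bool __warned;                                   \
            int __ret = !!(condition);                              \
                                                                    \
            if (unlikely(__ret && !__warned)) {                     \
                    __warned = true;                                \
                    pr_err("ERROR: " fmt, ##__VA_ARGS__);           \
            }                                                       \
            unlikely(__ret);                                        \
    })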

    Link: https://lore.kernel.org/r/CACT4Y+ZP-7np20GVRu3p+eZys9GPtbu+JpfV+HtsufAzvTgJrg@mail.gmail.com

    Suggested-by: Dmitry Vyukov
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

14 Jan, 2020

2 commits

  • As there are two struct ring_buffers in the kernel, it causes some
    confusion. The other one is the perf ring buffer. It was agreed upon that
    as neither of the ring buffers is generic enough to be used globally, they
    should be renamed as:

    perf's ring_buffer -> perf_buffer
    ftrace's ring_buffer -> trace_buffer

    This implements the changes to the ring buffer that ftrace uses.

    Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As we are working to remove the generic "ring_buffer" name that is used by
    both tracing and perf, the ring_buffer name for tracing will be renamed to
    trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

    As there already exists a trace_buffer that is used by the trace_arrays, it
    needs to be first renamed to array_buffer.

    Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

03 Dec, 2019

1 commit

  • We have been trying to use rasdaemon to monitor hardware errors like
    correctable memory errors. rasdaemon uses trace events to monitor
    various hardware errors. In order to test it, we have to inject some
    hardware errors, unfortunately not all of them provide error
    injections. MCE does provide a way to inject MCE errors, but errors
    like PCI error and devlink error don't, it is not easy to add error
    injection to each of them. Instead, it is relatively easier to just
    allow users to inject trace events in a generic way so that all trace
    events can be injected.

    This patch introduces trace event injection, where a new 'inject' file is
    added to each tracepoint directory. Users can write key=value pairs into
    this file to specify the values of the trace event's fields; all
    unspecified fields are set to zero by default.

    For example, for the net/net_dev_queue tracepoint, we can inject:

    INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
    echo "" > $INJECT
    echo "name='test'" > $INJECT
    echo "name='test' len=1024" > $INJECT
    cat /sys/kernel/debug/tracing/trace
    ...
    -614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
    -614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
    -614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024

    Triggers can be triggered as usual too:

    echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
    echo "len=1025" > $INJECT
    cat /sys/kernel/debug/tracing/trace
    ...
    bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
    bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
    bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
    bash-614 [001] .N.1 284.236349:
    => event_inject_write
    => vfs_write
    => ksys_write
    => do_syscall_64
    => entry_SYSCALL_64_after_hwframe

    The only thing that can't be injected is string pointers, as they require
    constant strings; this can't be done at run time.

    Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com

    Cc: Ingo Molnar
    Signed-off-by: Cong Wang
    Signed-off-by: Steven Rostedt (VMware)

    Cong Wang
     

27 Nov, 2019

1 commit

  • Rework event_create_dir() to use an array of static data instead of
    function pointers where possible.

    The problem is that it would call the function pointer on module load
    before parse_args(), possibly even before jump_labels were initialized.
    Luckily the generated functions don't use jump_labels but it still seems
    fragile. It also gets in the way of changing when we make the module map
    executable.

    The generated functions basically call trace_define_field() with a bunch
    of static arguments. So instead of a function, capture these arguments in
    a static array, avoiding the function call.

    Now there are a number of cases where the fields are dynamic (syscall
    arguments, kprobes, and uprobes), in which case a static array does not
    work; for these, preserve the function call. Luckily all these cases are
    unrelated to modules, so we can retain the function call for them.
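
    A hedged sketch of the data-driven shape (struct layout assumed): a
    static table covers most events, with a callback kept for the dynamic
    ones:

    struct trace_event_fields {
        const char *type;
        union {
            struct {
                const char *name;
                const int   size;
                const int   align;
                const int   is_signed;
                const int   filter_type;
            };
            int (*define_fields)(struct trace_event_call *);
        };
    };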

    Also fix up all broken tracepoint definitions that now generate a
    compile error.

    Tested-by: Alexei Starovoitov
    Tested-by: Steven Rostedt (VMware)
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Acked-by: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Nov, 2019

1 commit

  • Adding 2 new functions -
    1) struct trace_array *trace_array_get_by_name(const char *name);

    Returns a pointer to the trace array with the given name. If it does not
    exist, create and return a pointer to a new trace array.

    2) int trace_array_set_clr_event(struct trace_array *tr,
    const char *system, const char *event, bool enable);

    Enable/disable events in this trace array.

    Additionally,
    - To handle reference counters, export trace_array_put()
    - Due to the introduction of the above 2 new functions, we no longer need
    to export the ftrace_set_clr_event & trace_array_create APIs.
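
    A hedged usage sketch from a module (instance and event names are for
    illustration only):

    static int my_module_start_tracing(void)
    {
        struct trace_array *tr;

        tr = trace_array_get_by_name("my_instance");
        if (!tr)
            return -ENODEV;

        /* enable sched:sched_switch in this instance only */
        trace_array_set_clr_event(tr, "sched", "sched_switch", true);

        /* ... trace ... then drop the reference taken above */
        trace_array_put(tr);
        return 0;
    }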

    Link: http://lkml.kernel.org/r/1574276919-11119-2-git-send-email-divya.indi@oracle.com

    Signed-off-by: Divya Indi
    Reviewed-by: Aruna Ramakrishna
    Signed-off-by: Steven Rostedt (VMware)

    Divya Indi
     

13 Nov, 2019

3 commits

  • Declare the newly introduced and exported APIs in the header file
    include/linux/trace.h, and move previous declarations from
    kernel/trace/trace.h to include/linux/trace.h.

    Link: http://lkml.kernel.org/r/1565805327-579-2-git-send-email-divya.indi@oracle.com

    Signed-off-by: Divya Indi
    Signed-off-by: Steven Rostedt (VMware)

    Divya Indi
     
  • This patch makes the tracing_max_latency file, e.g.
    /sys/kernel/debug/tracing/tracing_max_latency, receive notifications
    through the fsnotify framework when a new max latency is available.

    One particularly interesting use of this facility is when enabling
    threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh,
    together with the preempt/irqsoff tracers. This makes it possible to
    implement a user space program that can, with equal probability,
    obtain traces of latencies that occur immediately after each other in
    spite of the fact that the preempt/irqsoff tracers operate in overwrite
    mode.

    This facility works with the hwlat, preempt/irqsoff, and wakeup
    tracers.

    The tracers may call the latency_fsnotify() from places such as
    __schedule() or do_idle(); this makes it impossible to call
    queue_work() directly without risking a deadlock. The same would
    happen with a softirq, kernel thread or tasklet. For this reason we
    use the irq_work mechanism to call queue_work().

    This patch creates a new workqueue. The reason for doing this is that
    I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was
    that WQ_UNBOUND might help with the latency in some important cases.

    If we use:

    queue_work(system_highpri_wq, &tr->fsnotify_work);

    then the work will (almost) always execute on the same CPU but if we are
    unlucky that CPU could be too busy while there could be another CPU in
    the system that would be able to process the work soon enough.

    queue_work_on() could be used to queue the work on another CPU but it
    seems difficult to select the right CPU.
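
    A hedged sketch of the two-stage deferral (field names assumed):

    void latency_fsnotify(struct trace_array *tr)
    {
        if (!fsnotify_wq)
            return;
        /* May run from __schedule()/do_idle(): queue_work() could
         * deadlock here, but irq_work_queue() is always safe. */
        irq_work_queue(&tr->fsnotify_irqwork);
    }

    static void latency_fsnotify_irqwork(struct irq_work *iwork)
    {
        struct trace_array *tr = container_of(iwork, struct trace_array,
                                              fsnotify_irqwork);

        /* IRQ context: safe to hand off to the unbound workqueue. */
        queue_work(fsnotify_wq, &tr->fsnotify_work);
    }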

    Link: http://lkml.kernel.org/r/20191008220824.7911-2-viktor.rosendahl@gmail.com

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Viktor Rosendahl (BMW)
    [ Added max() to have one compare for max latency ]
    Signed-off-by: Steven Rostedt (VMware)

    Viktor Rosendahl (BMW)
     
  • While looking for ways to shrink the size of the dyn_ftrace structure,
    knowing how many pages and how many groups of those pages are allocated is
    useful in working out the best ways to save memory.

    This adds one info print of how many groups of pages were used to allocate
    the ftrace dyn_ftrace structures, and also shows the number of pages and
    groups in the dyn_ftrace_total_info file (which is used for debugging).

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

13 Oct, 2019

2 commits

  • Currently, most files in the tracefs directory test if tracing_disabled is
    set. If so, it should return -ENODEV. The tracing_disabled is called when
    tracing is found to be broken. Originally it was done in case the ring
    buffer was found to be corrupted, and we wanted to prevent reading it from
    crashing the kernel. But it's also called if a tracing selftest fails on
    boot. It's a one way switch. That is, once it is triggered, tracing is
    disabled until reboot.

    As most tracefs files can also be used by instances in the tracefs
    directory, they need to be handled carefully. Each instance has a
    trace_array associated with it, and when the instance is removed, the
    trace_array is freed. But if an instance is opened with a reference to the
    trace_array, then it requires looking up the trace_array to get its ref
    counter (as there could be a race between the open and the instance being
    deleted). Once it is found, a reference is added to prevent the instance
    from being removed (and the trace_array associated with it freed).

    Combine the two checks (tracing_disabled and trace_array_get()) into a
    single helper function. This will also make it easier to add lockdown to
    tracefs later.
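
    A hedged sketch of the combined helper (a name like
    tracing_check_open_get_tr() is assumed):

    int tracing_check_open_get_tr(struct trace_array *tr)
    {
        if (tracing_disabled)
            return -ENODEV;

        if (tr && trace_array_get(tr) < 0)
            return -ENODEV;

        return 0;
    }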

    Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Instead of having the trace events system open call open code the taking of
    the trace_array descriptor (with trace_array_get()) and then calling
    tracing_open_generic(), have it use the tracing_open_generic_tr() that does
    the combination of the two. This requires making tracing_open_generic_tr()
    global.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

29 Sep, 2019

1 commit

  • After r372664 in clang, the IF_ASSIGN macro causes a couple hundred
    warnings along the lines of:

    kernel/trace/trace_output.c:1331:2: warning: converting the enum
    constant to a boolean [-Wint-in-bool-context]
    kernel/trace/trace.h:409:3: note: expanded from macro
    'trace_assign_type'
    IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,
    ^
    kernel/trace/trace.h:371:14: note: expanded from macro 'IF_ASSIGN'
    WARN_ON(id && (entry)->type != id); \
    ^
    264 warnings generated.

    This warning can catch issues with constructs like:

    if (state == A || B)

    where the developer really meant:

    if (state == A || state == B)

    This is currently the only occurrence of the warning in the kernel
    tree across defconfig, allyesconfig, allmodconfig for arm32, arm64,
    and x86_64. Add the implicit '!= 0' to the WARN_ON statement to fix
    the warnings and find potential issues in the future.
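
    The change, per the description above:

    /* before: enum constant used in a boolean context */
    WARN_ON(id && (entry)->type != id);

    /* after: explicit comparison silences -Wint-in-bool-context */
    WARN_ON(id != 0 && (entry)->type != id);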

    Link: https://github.com/llvm/llvm-project/commit/28b38c277a2941e9e891b2db30652cfd962f070b
    Link: https://github.com/ClangBuiltLinux/linux/issues/686
    Link: http://lkml.kernel.org/r/20190926162258.466321-1-natechancellor@gmail.com

    Reviewed-by: Nick Desaulniers
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Steven Rostedt (VMware)

    Nathan Chancellor
     

01 Sep, 2019

1 commit