05 Mar, 2020

2 commits

  • [ Upstream commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 ]

    KASAN is reporting that __blk_add_trace() has a use-after-free issue
    when accessing q->blk_trace. Indeed the switching of block tracing (and
    thus eventual freeing of q->blk_trace) is completely unsynchronized with
    the currently running tracing and thus it can happen that the blk_trace
    structure is being freed just while __blk_add_trace() works on it.
    Protect accesses to q->blk_trace by RCU during tracing and make sure we
    wait for the end of RCU grace period when shutting down tracing. Luckily
    that is rare enough event that we can afford that. Note that postponing
    the freeing of blk_trace to an RCU callback should better be avoided as
    it could have unexpected user visible side-effects as debugfs files
    would be still existing for a short while block tracing has been shut
    down.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
    CC: stable@vger.kernel.org
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Ming Lei
    Tested-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Reported-by: Tristan Madani
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Jan Kara
     
  • commit 78041c0c9e935d9ce4086feeff6c569ed88ddfd4 upstream.

    The tracing seftests checks various aspects of the tracing infrastructure,
    and one is filtering. If trace_printk() is active during a self test, it can
    cause the filtering to fail, which will disable that part of the trace.

    To keep the selftests from failing because of trace_printk() calls,
    trace_printk() checks the variable tracing_selftest_running, and if set, it
    does not write to the tracing buffer.

    As some tracers were registered earlier in boot, the selftest they triggered
    would fail because not all the infrastructure was set up for the full
    selftest. Thus, some of the tests were post poned to when their
    infrastructure was ready (namely file system code). The postpone code did
    not set the tracing_seftest_running variable, and could fail if a
    trace_printk() was added and executed during their run.

    Cc: stable@vger.kernel.org
    Fixes: 9afecfbb95198 ("tracing: Postpone tracer start-up tests till the system is more robust")
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

24 Feb, 2020

5 commits

  • [ Upstream commit 6722b23e7a2ace078344064a9735fb73e554e9ef ]

    if seq_file .next fuction does not change position index,
    read after some lseek can generate unexpected output.

    Without patch:
    # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
    dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
    n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
    # Available triggers:
    # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
    6+1 records in
    6+1 records out
    206 bytes copied, 0.00027916 s, 738 kB/s

    Notice the printing of "# Available triggers:..." after the line.

    With the patch:
    # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
    dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
    n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
    2+1 records in
    2+1 records out
    88 bytes copied, 0.000526867 s, 167 kB/s

    It only prints the end of the file, and does not restart.

    Link: http://lkml.kernel.org/r/3c35ee24-dd3a-8119-9c19-552ed253388a@virtuozzo.com

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Vasily Averin
     
  • [ Upstream commit e4075e8bdffd93a9b6d6e1d52fabedceeca5a91b ]

    if seq_file .next fuction does not change position index,
    read after some lseek can generate unexpected output.

    Without patch:
    # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
    dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
    id
    no pid
    2+1 records in
    2+1 records out
    10 bytes copied, 0.000213285 s, 46.9 kB/s

    Notice the "id" followed by "no pid".

    With the patch:
    # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
    dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
    id
    0+1 records in
    0+1 records out
    3 bytes copied, 0.000202112 s, 14.8 kB/s

    Notice that it only prints "id" and not the "no pid" afterward.

    Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com

    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Vasily Averin
     
  • [ Upstream commit b527b638fd63ba791dc90a0a6e9a3035b10df52b ]

    In the process of adding better error messages for sorting, I realized
    that strsep was being used incorrectly and some of the error paths I
    was expecting to be hit weren't and just fell through to the common
    invalid key error case.

    It also became obvious that for keyword assignments, it wasn't
    necessary to save the full assignment and reparse it later, and having
    a common empty-assignment check would also make more sense in terms of
    error processing.

    Change the code to fix these problems and simplify it for new error
    message changes in a subsequent patch.

    Link: http://lkml.kernel.org/r/1c3ef0b6655deaf345f6faee2584a0298ac2d743.1561743018.git.zanussi@kernel.org

    Fixes: e62347d24534 ("tracing: Add hist trigger support for user-defined sorting ('sort=' param)")
    Fixes: 7ef224d1d0e3 ("tracing: Add 'hist' event trigger command")
    Fixes: a4072fe85ba3 ("tracing: Add a clock attribute for hist triggers")
    Reported-by: Masami Hiramatsu
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Tom Zanussi
     
  • [ Upstream commit dfb6cd1e654315168e36d947471bd2a0ccd834ae ]

    Looking through old emails in my INBOX, I came across a patch from Luis
    Henriques that attempted to fix a race of two stat tracers registering the
    same stat trace (extremely unlikely, as this is done in the kernel, and
    probably doesn't even exist). The submitted patch wasn't quite right as it
    needed to deal with clean up a bit better (if two stat tracers were the
    same, it would have the same files).

    But to make the code cleaner, all we needed to do is to keep the
    all_stat_sessions_mutex held for most of the registering function.

    Link: http://lkml.kernel.org/r/1410299375-20068-1-git-send-email-luis.henriques@canonical.com

    Fixes: 002bb86d8d42f ("tracing/ftrace: separate events tracing and stats tracing engine")
    Reported-by: Luis Henriques
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Steven Rostedt (VMware)
     
  • [ Upstream commit afccc00f75bbbee4e4ae833a96c2d29a7259c693 ]

    tracing_stat_init() was always returning '0', even on the error paths. It
    now returns -ENODEV if tracing_init_dentry() fails or -ENOMEM if it fails
    to created the 'trace_stat' debugfs directory.

    Link: http://lkml.kernel.org/r/1410299381-20108-1-git-send-email-luis.henriques@canonical.com

    Fixes: ed6f1c996bfe4 ("tracing: Check return value of tracing_init_dentry()")
    Signed-off-by: Luis Henriques
    [ Pulled from the archeological digging of my INBOX ]
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Luis Henriques
     

11 Feb, 2020

7 commits

  • [ Upstream commit 54a16ff6f2e50775145b210bcd94d62c3c2af117 ]

    As function_graph tracer can run when RCU is not "watching", it can not be
    protected by synchronize_rcu() it requires running a task on each CPU before
    it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

    Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

    Cc: stable@vger.kernel.org
    Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
    Reported-by: "Paul E. McKenney"
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Steven Rostedt (VMware)
     
  • [ Upstream commit 16052dd5bdfa16dbe18d8c1d4cde2ddab9d23177 ]

    Because the function graph tracer can execute in sections where RCU is not
    "watching", the rcu_dereference_sched() for the has needs to be open coded.
    This is fine because the RCU "flavor" of the ftrace hash is protected by
    its own RCU handling (it does its own little synchronization on every CPU
    and does not rely on RCU sched).

    Acked-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Steven Rostedt (VMware)
     
  • [ Upstream commit fd0e6852c407dd9aefc594f54ddcc21d84803d3b ]

    Fix following instances of sparse error
    kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
    kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
    kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
    kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

    Use rcu_dereference_protected to dereference the newly annotated pointer.

    Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

    Signed-off-by: Amol Grover
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Amol Grover
     
  • [ Upstream commit 24a9729f831462b1d9d61dc85ecc91c59037243f ]

    Fix following instances of sparse error
    kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
    kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
    kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
    kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

    Use rcu_dereference_protected to access the __rcu annotated pointer.

    Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Amol Grover
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Amol Grover
     
  • [ Upstream commit d380dcde9a07ca5de4805dee11f58a98ec0ad6ff ]

    The patch 'tracing: Fix histogram code when expression has same var as
    value' added code to return an existing variable reference when
    creating a new variable reference, which resulted in var_ref_vals
    slots being reused instead of being duplicated.

    The implementation of the trace action assumes that the end of the
    var_ref_vals array starting at action_data.var_ref_idx corresponds to
    the values that will be assigned to the trace params. The patch
    mentioned above invalidates that assumption, which means that each
    param needs to explicitly specify its index into var_ref_vals.

    This fix changes action_data.var_ref_idx to an array of var ref
    indexes to account for that.

    Link: https://lore.kernel.org/r/1580335695.6220.8.camel@kernel.org

    Fixes: 8bcebc77e85f ("tracing: Fix histogram code when expression has same var as value")
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Tom Zanussi
     
  • commit 64ae572bc7d0060429e40e1c8d803ce5eb31a0d6 upstream.

    Reading the sched_cmdline_ref and sched_tgid_ref initial state within
    tracing_start_sched_switch without holding the sched_register_mutex is
    racy against concurrent updates, which can lead to tracepoint probes
    being registered more than once (and thus trigger warnings within
    tracepoint.c).

    [ May be the fix for this bug ]
    Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com

    Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com

    Cc: stable@vger.kernel.org
    CC: Steven Rostedt (VMware)
    CC: Joel Fernandes (Google)
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Paul E. McKenney
    Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
    Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Desnoyers
     
  • commit 20279420ae3a8ef4c5d9fedc360a2c37a1dbdf1b upstream.

    Thomas Richter reported:

    > Test case 66 'Use vfs_getname probe to get syscall args filenames'
    > is broken on s390, but works on x86. The test case fails with:
    >
    > [root@m35lp76 perf]# perf test -F 66
    > 66: Use vfs_getname probe to get syscall args filenames
    > :Recording open file:
    > [ perf record: Woken up 1 times to write data ]
    > [ perf record: Captured and wrote 0.004 MB /tmp/__perf_test.perf.data.TCdYj\
    > (20 samples) ]
    > Looking at perf.data file for vfs_getname records for the file we touched:
    > FAILED!
    > [root@m35lp76 perf]#

    The root cause was the print_fmt of the kprobe event that referenced the
    "ustring"

    > Setting up the kprobe event using perf command:
    >
    > # ./perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring"
    >
    > generates this format file:
    > [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/probe/\
    > vfs_getname/format
    > name: vfs_getname
    > ID: 1172
    > format:
    > field:unsigned short common_type; offset:0; size:2; signed:0;
    > field:unsigned char common_flags; offset:2; size:1; signed:0;
    > field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    > field:int common_pid; offset:4; size:4; signed:1;
    >
    > field:unsigned long __probe_ip; offset:8; size:8; signed:0;
    > field:__data_loc char[] pathname; offset:16; size:4; signed:1;
    >
    > print fmt: "(%lx) pathname=\"%s\"", REC->__probe_ip, REC->pathname

    Instead of using "__get_str(pathname)" it referenced it directly.

    Link: http://lkml.kernel.org/r/20200124100742.4050c15e@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
    Acked-by: Masami Hiramatsu
    Reported-by: Thomas Richter
    Tested-by: Thomas Richter
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

06 Feb, 2020

1 commit

  • [ Upstream commit b61387cb732cf283d318b2165c44913525fe545f ]

    Commit 99c9a923e97a ("tracing/uprobe: Fix double perf_event
    linking on multiprobe uprobe") moved trace_uprobe_filter on
    trace_probe_event. However, since it introduced a flexible
    data structure with char array and type casting, the
    alignment of trace_uprobe_filter can be broken.

    This changes the type of the array to trace_uprobe_filter
    data strucure to fix it.

    Link: http://lore.kernel.org/r/20200120124022.GA14897@hirez.programming.kicks-ass.net
    Link: http://lkml.kernel.org/r/157966340499.5107.10978352478952144902.stgit@devnote2

    Fixes: 99c9a923e97a ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe")
    Suggested-by: Peter Zijlstra
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     

29 Jan, 2020

4 commits

  • commit 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 upstream.

    While working on a tool to convert SQL syntex into the histogram language of
    the kernel, I discovered the following bug:

    # echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
    # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
    # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

    Would not display any histograms in the sched_switch histogram side.

    But if I were to swap the location of

    "delta=common_timestamp-$start" with "start2=$start"

    Such that the last line had:

    # echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

    The histogram works as expected.

    What I found out is that the expressions clear out the value once it is
    resolved. As the variables are resolved in the order listed, when
    processing:

    delta=common_timestamp-$start

    The $start is cleared. When it gets to "start2=$start", it errors out with
    "unresolved symbol" (which is silent as this happens at the location of the
    trace), and the histogram is dropped.

    When processing the histogram for variable references, instead of adding a
    new reference for a variable used twice, use the same reference. That way,
    not only is it more efficient, but the order will no longer matter in
    processing of the variables.

    From Tom Zanussi:

    "Just to clarify some more about what the problem was is that without
    your patch, we would have two separate references to the same variable,
    and during resolve_var_refs(), they'd both want to be resolved
    separately, so in this case, since the first reference to start wasn't
    part of an expression, it wouldn't get the read-once flag set, so would
    be read normally, and then the second reference would do the read-once
    read and also be read but using read-once. So everything worked and
    you didn't see a problem:

    from: start2=$start,delta=common_timestamp-$start

    In the second case, when you switched them around, the first reference
    would be resolved by doing the read-once, and following that the second
    reference would try to resolve and see that the variable had already
    been read, so failed as unset, which caused it to short-circuit out and
    not do the trigger action to generate the synthetic event:

    to: delta=common_timestamp-$start,start2=$start

    With your patch, we only have the single resolution which happens
    correctly the one time it's resolved, so this can't happen."

    Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
    Reviewed-by: Tom Zanuss
    Tested-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit bf24daac8f2bd5b8affaec03c2be1d20bcdd6837 upstream.

    When trace_clock option is not set and unstable clcok detected,
    tracing_set_default_clock() sets trace_clock(ThinkPad A285 is one of
    case). In that case, if lockdown is in effect, null pointer
    dereference error happens in ring_buffer_set_clock().

    Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com

    Cc: stable@vger.kernel.org
    Fixes: 17911ff38aa58 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488
    Signed-off-by: Masami Ichikawa
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Ichikawa
     
  • commit 99c9a923e97a583a38050baa92c9377d73946330 upstream.

    Fix double perf_event linking to trace_uprobe_filter on
    multiple uprobe event by moving trace_uprobe_filter under
    trace_probe_event.

    In uprobe perf event, trace_uprobe_filter data structure is
    managing target mm filters (in perf_event) related to each
    uprobe event.

    Since commit 60d53e2c3b75 ("tracing/probe: Split trace_event
    related data from trace_probe") left the trace_uprobe_filter
    data structure in trace_uprobe, if a trace_probe_event has
    multiple trace_uprobe (multi-probe event), a perf_event is
    added to different trace_uprobe_filter on each trace_uprobe.
    This leads a linked list corruption.

    To fix this issue, move trace_uprobe_filter to trace_probe_event
    and link it once on each event instead of each probe.

    Link: http://lkml.kernel.org/r/157862073931.1800.3800576241181489174.stgit@devnote2

    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: "David S . Miller"
    Cc: Namhyung Kim
    Cc: =?utf-8?q?Toke_H=C3=B8iland-J?= =?utf-8?b?w7hyZ2Vuc2Vu?=
    Cc: Jean-Tsung Hsiao
    Cc: Jesper Dangaard Brouer
    Cc: stable@vger.kernel.org
    Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe")
    Link: https://lkml.kernel.org/r/20200108171611.GA8472@kernel.org
    Reported-by: Arnaldo Carvalho de Melo
    Tested-by: Arnaldo Carvalho de Melo
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit aeed8aa3874dc15b9d82a6fe796fd7cfbb684448 upstream.

    With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
    when I ran ftracetest trigger testcases.

    -----
    # dmesg -c > /dev/null
    # ./ftracetest test.d/trigger
    ...
    # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
    kernel/trace/trace_events_hist.c:6070
    kernel/trace/trace_events_hist.c:1760
    kernel/trace/trace_events_hist.c:5911
    kernel/trace/trace_events_trigger.c:504
    kernel/trace/trace_events_hist.c:1810
    kernel/trace/trace_events_hist.c:3158
    kernel/trace/trace_events_hist.c:3105
    kernel/trace/trace_events_hist.c:5518
    kernel/trace/trace_events_hist.c:5998
    kernel/trace/trace_events_hist.c:6019
    kernel/trace/trace_events_hist.c:6044
    kernel/trace/trace_events_trigger.c:1500
    kernel/trace/trace_events_trigger.c:1540
    kernel/trace/trace_events_trigger.c:539
    kernel/trace/trace_events_trigger.c:584
    -----

    I investigated those warnings and found that the RCU-list
    traversals in event trigger and hist didn't need to use
    RCU version because those were called only under event_mutex.

    I also checked other RCU-list traversals related to event
    trigger list, and found that most of them were called from
    event_hist_trigger_func() or hist_unregister_trigger() or
    register/unregister functions except for a few cases.

    Replace these unneeded RCU-list traversals with normal list
    traversal macro and lockdep_assert_held() to check the
    event_mutex is held.

    Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

    Cc: stable@vger.kernel.org
    Fixes: 30350d65ac567 ("tracing: Add variable support to hist triggers")
    Reviewed-by: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     

18 Jan, 2020

1 commit

  • commit eb1b66887472eaa7342305b7890ae510dd9d1a79 upstream.

    Convert the bpf_probe_write_user() helper to probe_user_write() such that
    writes are not attempted under KERNEL_DS anymore which is buggy as kernel
    and user space pointers can have overlapping addresses. Also, given we have
    the access_ok() check inside probe_user_write(), the helper doesn't need
    to do it twice.

    Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

15 Jan, 2020

2 commits

  • commit b8299d362d0837ae39e87e9019ebe6b736e0f035 upstream.

    On some archs with some configurations, MCOUNT_INSN_SIZE is not defined, and
    this makes the stack tracer fail to compile. Just define it to zero in this
    case.

    Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

    Cc: stable@vger.kernel.org
    Fixes: 4df297129f622 ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
    Reported-by: kbuild test robot
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 50f9ad607ea891a9308e67b81f774c71736d1098 upstream.

    In the function, if register_trace_sched_migrate_task() returns error,
    sched_switch/sched_wakeup_new/sched_wakeup won't unregister. That is
    why fail_deprobe_sched_switch was added.

    Link: http://lkml.kernel.org/r/20191231133530.2794-1-pilgrimtao@gmail.com

    Cc: stable@vger.kernel.org
    Fixes: 478142c39c8c2 ("tracing: do not grab lock in wakeup latency function tracing")
    Signed-off-by: Kaitao Cheng
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Kaitao Cheng
     

12 Jan, 2020

1 commit

  • commit a356646a56857c2e5ad875beec734d7145ecd49a upstream.

    If lockdown is disabling tracing on boot up, it prevents the tracing files
    from even bering created. But when that happens, there's several places that
    will give a warning that the files were not created as that is usually a
    sign of a bug.

    Add in strategic locations where a check is made to see if tracing is
    disabled by lockdown, and if it is, do not go further, and fail silently
    (but print that tracing is disabled by lockdown, without doing a WARN_ON()).

    Cc: Matthew Garrett
    Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

09 Jan, 2020

5 commits

  • commit e31f7939c1c27faa5d0e3f14519eaf7c89e8a69d upstream.

    The ftrace_profile->counter is unsigned long and
    do_div truncates it to 32 bits, which means it can test
    non-zero and be truncated to zero for division.
    Fix this issue by using div64_ul() instead.

    Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

    Cc: stable@vger.kernel.org
    Fixes: e330b3bcd8319 ("tracing: Show sample std dev in function profiling")
    Fixes: 34886c8bc590f ("tracing: add average time in function to function profiler")
    Signed-off-by: Wen Yang
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Wen Yang
     
  • commit fe6e096a5bbf73a142f09c72e7aa2835026eb1a3 upstream.

    At least on PA-RISC and s390 synthetic histogram triggers are failing
    selftests because trace_event_raw_event_synth() always writes a 64 bit
    values, but the reader expects a field->size sized value. On little endian
    machines this doesn't hurt, but on big endian this makes the reader always
    read zero values.

    Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com

    Cc: stable@vger.kernel.org
    Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events")
    Acked-by: Tom Zanussi
    Signed-off-by: Sven Schnelle
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Sven Schnelle
     
  • commit 106f41f5a302cb1f36c7543fae6a05de12e96fa4 upstream.

    The compare functions of the histogram code would be specific for the size
    of the value being compared (byte, short, int, long long). It would
    reference the value from the array via the type of the compare, but the
    value was stored in a 64 bit number. This is fine for little endian
    machines, but for big endian machines, it would end up comparing zeros or
    all ones (depending on the sign) for anything but 64 bit numbers.

    To fix this, first derference the value as a u64 then convert it to the type
    being compared.

    Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home

    Cc: stable@vger.kernel.org
    Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
    Acked-by: Tom Zanussi
    Reported-by: Sven Schnelle
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 79e65c27f09683fbb50c33acab395d0ddf5302d2 upstream.

    When failing in the allocation of filter_item, process_system_preds()
    goes to fail_mem, where the allocated filter is freed.

    However, this leads to memory leak of filter->filter_string and
    filter->prog, which is allocated before and in process_preds().
    This bug has been detected by kmemleak as well.

    Fix this by changing kfree to __free_fiter.

    unreferenced object 0xffff8880658007c0 (size 32):
    comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
    hex dump (first 32 bytes):
    63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30 common_pid > 10
    00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00 ........es......
    backtrace:
    [] kstrdup+0x2d/0x60
    [] apply_subsystem_event_filter+0x378/0x932
    [] subsystem_filter_write+0x5a/0x90
    [] vfs_write+0xe1/0x240
    [] ksys_write+0xb4/0x150
    [] do_syscall_64+0x6d/0x1e0
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    unreferenced object 0xffff888060c22d00 (size 64):
    comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff ...........A....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] process_preds+0x243/0x1820
    [] apply_subsystem_event_filter+0x3be/0x932
    [] subsystem_filter_write+0x5a/0x90
    [] vfs_write+0xe1/0x240
    [] ksys_write+0xb4/0x150
    [] do_syscall_64+0x6d/0x1e0
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    unreferenced object 0xffff888041d7e800 (size 512):
    comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
    hex dump (first 32 bytes):
    70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00 p...............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] process_preds+0x71a/0x1820
    [] apply_subsystem_event_filter+0x3be/0x932
    [] subsystem_filter_write+0x5a/0x90
    [] vfs_write+0xe1/0x240
    [] ksys_write+0xb4/0x150
    [] do_syscall_64+0x6d/0x1e0
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Link: http://lkml.kernel.org/r/20191211091258.11310-1-keitasuzuki.park@sslab.ics.keio.ac.jp

    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Fixes: 404a3add43c9c ("tracing: Only add filter list when needed")
    Signed-off-by: Keita Suzuki
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Keita Suzuki
     
  • commit 3a53acf1d9bea11b57c1f6205e3fe73f9d8a3688 upstream.

    Task T2 Task T3
    trace_options_core_write() subsystem_open()

    mutex_lock(trace_types_lock) mutex_lock(event_mutex)

    set_tracer_flag()

    trace_event_enable_tgid_record() mutex_lock(trace_types_lock)

    mutex_lock(event_mutex)

    This gives a circular dependency deadlock between trace_types_lock and
    event_mutex. To fix this invert the usage of trace_types_lock and
    event_mutex in trace_options_core_write(). This keeps the sequence of
    lock usage consistent.

    Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com

    Cc: stable@vger.kernel.org
    Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
    Signed-off-by: Prateek Sood
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Prateek Sood
     

31 Dec, 2019

2 commits

  • [ Upstream commit c7411a1a126f649be71526a36d4afac9e5aefa13 ]

    Check whether the non-suffixed symbol is notrace, since suffixed
    symbols are generated by the compilers for optimization. Based on
    these suffixed symbols, notrace check might not work because
    some of them are just a partial code of the original function.
    (e.g. cold-cache (unlikely) code is separated from original
    function as FUNCTION.cold.XX)

    For example, without this fix,
    # echo p device_add.cold.67 > /sys/kernel/debug/tracing/kprobe_events
    sh: write error: Invalid argument

    # cat /sys/kernel/debug/tracing/error_log
    [ 135.491035] trace_kprobe: error: Failed to register probe event
    Command: p device_add.cold.67
    ^
    # dmesg | tail -n 1
    [ 135.488599] trace_kprobe: Could not probe notrace function device_add.cold.67

    With this,
    # echo p device_add.cold.66 > /sys/kernel/debug/tracing/kprobe_events
    # cat /sys/kernel/debug/kprobes/list
    ffffffff81599de9 k device_add.cold.66+0x0 [DISABLED]

    Actually, kprobe blacklist already did similar thing,
    see within_kprobe_blacklist().

    Link: http://lkml.kernel.org/r/157233790394.6706.18243942030937189679.stgit@devnote2

    Fixes: 45408c4f9250 ("tracing: kprobes: Prohibit probing on notrace function")
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit 6ee40511cb838f9ced002dff7131bca87e3ccbdd ]

    Fail to allocate memory for tgid_map, because it requires order-6 page.
    detail as:

    c3 sh: page allocation failure: order:6,
    mode:0x140c0c0(GFP_KERNEL), nodemask=(null)
    c3 sh cpuset=/ mems_allowed=0
    c3 CPU: 3 PID: 5632 Comm: sh Tainted: G W O 4.14.133+ #10
    c3 Hardware name: Generic DT based system
    c3 Backtrace:
    c3 [] (dump_backtrace) from [](show_stack+0x18/0x1c)
    c3 [] (show_stack) from [](dump_stack+0x84/0xa4)
    c3 [] (dump_stack) from [](warn_alloc+0xc4/0x19c)
    c3 [] (warn_alloc) from [](__alloc_pages_nodemask+0xd18/0xf28)
    c3 [] (__alloc_pages_nodemask) from [](kmalloc_order+0x20/0x38)
    c3 [] (kmalloc_order) from [](kmalloc_order_trace+0x24/0x108)
    c3 [] (kmalloc_order_trace) from [](set_tracer_flag+0xb0/0x158)
    c3 [] (set_tracer_flag) from [](trace_options_core_write+0x7c/0xcc)
    c3 [] (trace_options_core_write) from [](__vfs_write+0x40/0x14c)
    c3 [] (__vfs_write) from [](vfs_write+0xc4/0x198)
    c3 [] (vfs_write) from [](SyS_write+0x6c/0xd0)
    c3 [] (SyS_write) from [](ret_fast_syscall+0x0/0x54)

    Switch to use kvcalloc to avoid unexpected allocation failures.

    Link: http://lkml.kernel.org/r/1571888070-24425-1-git-send-email-chunyan.zhang@unisoc.com

    Signed-off-by: Yuming Han
    Signed-off-by: Chunyan Zhang
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Yuming Han
     

22 Oct, 2019

1 commit

  • A race condition exists while initialiazing perf_trace_buf from
    perf_trace_init() and perf_kprobe_init().

    CPU0 CPU1
    perf_trace_init()
    mutex_lock(&event_mutex)
    perf_trace_event_init()
    perf_trace_event_reg()
    total_ref_count == 0
    buf = alloc_percpu()
    perf_trace_buf[i] = buf
    tp_event->class->reg() //fails perf_kprobe_init()
    goto fail perf_trace_event_init()
    perf_trace_event_reg()
    fail:
    total_ref_count == 0

    total_ref_count == 0
    buf = alloc_percpu()
    perf_trace_buf[i] = buf
    tp_event->class->reg()
    total_ref_count++

    free_percpu(perf_trace_buf[i])
    perf_trace_buf[i] = NULL

    Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
    causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
    getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
    event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
    fix this race.

    The race caused the following bug:

    Unable to handle kernel paging request at virtual address 0000003106f2003c
    Mem abort info:
    ESR = 0x96000045
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    Data abort info:
    ISV = 0, ISS = 0x00000045
    CM = 0, WnR = 1
    user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
    [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
    Internal error: Oops: 96000045 [#1] PREEMPT SMP
    Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
    pstate: 80400005 (Nzcv daif +PAN -UAO)
    pc : __memset+0x20/0x1ac
    lr : memset+0x3c/0x50
    sp : ffffffc09319fc50

    __memset+0x20/0x1ac
    perf_trace_buf_alloc+0x140/0x1a0
    perf_trace_sys_enter+0x158/0x310
    syscall_trace_enter+0x348/0x7c0
    el0_svc_common+0x11c/0x368
    el0_svc_handler+0x12c/0x198
    el0_svc+0x8/0xc

    Ramdumps showed the following:
    total_ref_count = 3
    perf_trace_buf = (
    0x0 -> NULL,
    0x0 -> NULL,
    0x0 -> NULL,
    0x0 -> NULL)

    Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org

    Cc: stable@vger.kernel.org
    Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
    Acked-by: Song Liu
    Signed-off-by: Prateek Sood
    Signed-off-by: Steven Rostedt (VMware)

    Prateek Sood
     

19 Oct, 2019

1 commit

  • In the format of synthetic events, the "gfp_t" is shown as "signed:1",
    but in fact the "gfp_t" is "unsigned", should be shown as "signed:0".

    The issue can be reproduced by the following commands:

    echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
    cat /sys/kernel/debug/tracing/events/synthetic/memlatency/format

    name: memlatency
    ID: 2233
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:u64 lat; offset:8; size:8; signed:0;
    field:unsigned int order; offset:16; size:4; signed:0;
    field:gfp_t gfp_flags; offset:24; size:4; signed:1;
    field:int migratetype; offset:32; size:4; signed:1;

    print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype

    Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com

    Reviewed-by: Tom Zanussi
    Signed-off-by: Zhengjun Xing
    Signed-off-by: Steven Rostedt (VMware)

    Zhengjun Xing
     

13 Oct, 2019

8 commits

  • A customer reported the following softlockup:

    [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
    [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] Kernel panic - not syncing: softlockup: hung tasks
    [899688.160002] RIP: 0010:up_write+0x1a/0x30
    [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
    [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
    [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
    [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
    [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
    [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
    [899688.160002] tracing_read_pipe+0x336/0x3c0
    [899688.160002] __vfs_read+0x26/0x140
    [899688.160002] vfs_read+0x87/0x130
    [899688.160002] SyS_read+0x42/0x90
    [899688.160002] do_syscall_64+0x74/0x160

    It caught the process in the middle of trace_access_unlock(). There is
    no loop. So, it must be looping in the caller tracing_read_pipe()
    via the "waitagain" label.

    Crashdump analyze uncovered that iter->seq was completely zeroed
    at this point, including iter->seq.seq.size. It means that
    print_trace_line() was never able to print anything and
    there was no forward progress.

    The culprit seems to be in the code:

    /* reset all but tr, trace, and overruns */
    memset(&iter->seq, 0,
    sizeof(struct trace_iterator) -
    offsetof(struct trace_iterator, seq));

    It was added by the commit 53d0aa773053ab182877 ("ftrace:
    add logic to record overruns"). It was v2.6.27-rc1.
    It was the time when iter->seq looked like:

    struct trace_seq {
    unsigned char buffer[PAGE_SIZE];
    unsigned int len;
    };

    There was no "size" variable and zeroing was perfectly fine.

    The solution is to reinitialize the structure after or without
    zeroing.

    Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com

    Signed-off-by: Petr Mladek
    Signed-off-by: Steven Rostedt (VMware)

    Petr Mladek
     
  • max_latency is intended to record the maximum ever observed hardware
    latency, which may occur in either part of the loop (inner/outer). So
    we need to also consider the outer-loop sample when updating
    max_latency.

    Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu

    Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
    Cc: stable@vger.kernel.org
    Signed-off-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Steven Rostedt (VMware)

    Srivatsa S. Bhat (VMware)
     
  • nmi_total_ts is supposed to record the total time spent in *all* NMIs
    that occur on the given CPU during the (active portion of the)
    sampling window. However, the code seems to be overwriting this
    variable for each NMI, thereby only recording the time spent in the
    most recent NMI. Fix it by accumulating the duration instead.

    Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu

    Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
    Cc: stable@vger.kernel.org
    Signed-off-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Steven Rostedt (VMware)

    Srivatsa S. Bhat (VMware)
     
  • Added various checks on open tracefs calls to see if tracefs is in lockdown
    mode, and if so, to return -EPERM.

    Note, the event format files (which are basically standard on all machines)
    as well as the enabled_functions file (which shows what is currently being
    traced) are not lockde down. Perhaps they should be, but it seems counter
    intuitive to lockdown information to help you know if the system has been
    modified.

    Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com

    Suggested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Currently, most files in the tracefs directory test if tracing_disabled is
    set. If so, it should return -ENODEV. The tracing_disabled is called when
    tracing is found to be broken. Originally it was done in case the ring
    buffer was found to be corrupted, and we wanted to prevent reading it from
    crashing the kernel. But it's also called if a tracing selftest fails on
    boot. It's a one way switch. That is, once it is triggered, tracing is
    disabled until reboot.

    As most tracefs files can also be used by instances in the tracefs
    directory, they need to be carefully done. Each instance has a trace_array
    associated to it, and when the instance is removed, the trace_array is
    freed. But if an instance is opened with a reference to the trace_array,
    then it requires looking up the trace_array to get its ref counter (as there
    could be a race with it being deleted and the open itself). Once it is
    found, a reference is added to prevent the instance from being removed (and
    the trace_array associated with it freed).

    Combine the two checks (tracing_disabled and trace_array_get()) into a
    single helper function. This will also make it easier to add lockdown to
    tracefs later.

    Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Instead of having the trace events system open call open code the taking of
    the trace_array descriptor (with trace_array_get()) and then calling
    trace_open_generic(), have it use the tracing_open_generic_tr() that does
    the combination of the two. This requires making tracing_open_generic_tr()
    global.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As instances may have different tracers available, we need to look at the
    trace_array descriptor that shows the list of the available tracers for the
    instance. But there's a race between opening the file and an admin
    deleting the instance. The trace_array_get() needs to be called before
    accessing the trace_array.

    Cc: stable@vger.kernel.org
    Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
    an instance now. They need to take a reference to the instance otherwise
    there could be a race between accessing the files and deleting the instance.

    It wasn't until the :mod: caching where these file operations started
    referencing the trace_array directly.

    Cc: stable@vger.kernel.org
    Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)