06 Apr, 2019
1 commit
-
[ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
BUG for "sleeping function called from invalid context".kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
atomic context. A very simple solution for this is to add allocation
flags to ring_buffer_read_prepare() so kdb can call it without
triggering the allocation error. This patch does that.Note that in the original email thread about this, it was suggested
that perhaps the solution for kdb was to either preallocate the buffer
ahead of time or create our own iterator. I'm hoping that this
alternative of adding allocation flags to ring_buffer_read_prepare()
can be considered since it means I don't need to duplicate more of the
core trace code into "trace_kdb.c" (for either creating my own
iterator or re-preparing a ring allocator whose memory was already
allocated).NOTE: another option for kdb is to actually figure out how to make it
reuse the existing ftrace_dump() function and totally eliminate the
duplication. This sounds very appealing and actually works (the "sr
z" command can be seen to properly dump the ftrace buffer). The
downside here is that ftrace_dump() fully consumes the trace buffer.
Unless that is changed I'd rather not use it because it means "ftdump
| grep xyz" won't be very useful to search the ftrace buffer since it
will throw away the whole trace on the first grep. A future patch to
dump only the last few lines of the buffer will also be hard to
implement.[1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org
Reported-by: Brian Norris
Signed-off-by: Douglas Anderson
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Sasha Levin
24 Mar, 2019
3 commits
-
commit 83540fbc8812a580b6ad8f93f4c29e62e417687e upstream.
The first version of this method was missing the check for
`ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
on error, so there was still a small memory leak in the error case.
Fix it by using strndup_user() instead of open-coding it.Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Fixes: 0eadcc7a7bc0 ("perf/core: Fix perf_uprobe_init()")
Reviewed-by: Masami Hiramatsu
Acked-by: Song Liu
Signed-off-by: Jann Horn
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit e7f0c424d0806b05d6f47be9f202b037eb701707 upstream.
Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
pipe files") use the current tracer instead of the copy in
tracing_open_pipe(), but it forget to remove the freeing sentence in
the error path.There's an error path that can call kfree(iter->trace) after the iter->trace
was assigned to tr->current_trace, which would be bad to free.Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com
Cc: stable@vger.kernel.org
Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files")
Signed-off-by: zhangyi (F)
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit 9f0bbf3115ca9f91f43b7c74e9ac7d79f47fc6c2 upstream.
Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key. This results in multiple histogram entries for the
'same' string key.So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.Before, using the gdbus entries in the following hist trigger as an
example:# echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist...
{ comm: ImgDecoder #4 } hitcount: 203
{ comm: gmain } hitcount: 213
{ comm: gmain } hitcount: 216
{ comm: StreamTrans #73 } hitcount: 221
{ comm: mozStorage #3 } hitcount: 230
{ comm: gdbus } hitcount: 233
{ comm: StyleThread#5 } hitcount: 253
{ comm: gdbus } hitcount: 256
{ comm: gdbus } hitcount: 260
{ comm: StyleThread#4 } hitcount: 271...
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
51After:
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
1Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com
Cc: Namhyung Kim
Cc: stable@vger.kernel.org
Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
Signed-off-by: Tom Zanussi
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
14 Mar, 2019
1 commit
-
[ Upstream commit e16ec34039c701594d55d08a5aa49ee3e1abc821 ]
Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[ 13.007000] WARNING: possible circular locking dependency detected
[ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[ 13.008124] ------------------------------------------------------
[ 13.008624] test_progs/246 is trying to acquire lock:
[ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[ 13.009770]
[ 13.009770] but task is already holding lock:
[ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.010877]
[ 13.010877] which lock already depends on the new lock.
[ 13.010877]
[ 13.011532]
[ 13.011532] the existing dependency chain (in reverse order) is:
[ 13.012129]
[ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
[ 13.012582] perf_event_query_prog_array+0x9b/0x130
[ 13.013016] _perf_ioctl+0x3aa/0x830
[ 13.013354] perf_ioctl+0x2e/0x50
[ 13.013668] do_vfs_ioctl+0x8f/0x6a0
[ 13.014003] ksys_ioctl+0x70/0x80
[ 13.014320] __x64_sys_ioctl+0x16/0x20
[ 13.014668] do_syscall_64+0x4a/0x180
[ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.015469]
[ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[ 13.015910] perf_event_init_cpu+0x5a/0x90
[ 13.016291] perf_event_init+0x1b2/0x1de
[ 13.016654] start_kernel+0x2b8/0x42a
[ 13.016995] secondary_startup_64+0xa4/0xb0
[ 13.017382]
[ 13.017382] -> #2 (pmus_lock){+.+.}:
[ 13.017794] perf_event_init_cpu+0x21/0x90
[ 13.018172] cpuhp_invoke_callback+0xb3/0x960
[ 13.018573] _cpu_up+0xa7/0x140
[ 13.018871] do_cpu_up+0xa4/0xc0
[ 13.019178] smp_init+0xcd/0xd2
[ 13.019483] kernel_init_freeable+0x123/0x24f
[ 13.019878] kernel_init+0xa/0x110
[ 13.020201] ret_from_fork+0x24/0x30
[ 13.020541]
[ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[ 13.021051] static_key_slow_inc+0xe/0x20
[ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
[ 13.021891] perf_trace_event_init+0x11f/0x250
[ 13.022297] perf_trace_init+0x6b/0xa0
[ 13.022644] perf_tp_event_init+0x25/0x40
[ 13.023011] perf_try_init_event+0x6b/0x90
[ 13.023386] perf_event_alloc+0x9a8/0xc40
[ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
[ 13.024173] do_syscall_64+0x4a/0x180
[ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.024968]
[ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
[ 13.025434] __mutex_lock+0x86/0x970
[ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
[ 13.026215] bpf_probe_register+0x40/0x60
[ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.027042] __do_sys_bpf+0x94f/0x1a90
[ 13.027389] do_syscall_64+0x4a/0x180
[ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.028171]
[ 13.028171] other info that might help us debug this:
[ 13.028171]
[ 13.028807] Chain exists of:
[ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[ 13.028807]
[ 13.029666] Possible unsafe locking scenario:
[ 13.029666]
[ 13.030140] CPU0 CPU1
[ 13.030510] ---- ----
[ 13.030875] lock(bpf_event_mutex);
[ 13.031166] lock(&cpuctx_mutex);
[ 13.031645] lock(bpf_event_mutex);
[ 13.032135] lock(tracepoints_mutex);
[ 13.032441]
[ 13.032441] *** DEADLOCK ***
[ 13.032441]
[ 13.032911] 1 lock held by test_progs/246:
[ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.033909]
[ 13.033909] stack backtrace:
[ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 13.035657] Call Trace:
[ 13.035859] dump_stack+0x5f/0x8b
[ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
[ 13.036526] __lock_acquire+0x1158/0x1350
[ 13.036852] ? lock_acquire+0x98/0x190
[ 13.037154] lock_acquire+0x98/0x190
[ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.037876] __mutex_lock+0x86/0x970
[ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.039028] ? __mutex_lock+0x86/0x970
[ 13.039337] ? __mutex_lock+0x24a/0x970
[ 13.039649] ? bpf_probe_register+0x1d/0x60
[ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
[ 13.041325] bpf_probe_register+0x40/0x60
[ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.042068] ? __might_fault+0x3e/0x90
[ 13.042374] __do_sys_bpf+0x94f/0x1a90
[ 13.042678] do_syscall_64+0x4a/0x180
[ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.043382] RIP: 0033:0x7f23b10a07f9
[ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
10 Mar, 2019
1 commit
-
commit 6a072128d262d2b98d31626906a96700d1fc11eb upstream.
Then tracing syscall exit event it is extremely useful to filter exit
codes equal to some negative value, to react only to required errors.
But negative numbers does not work:[root@snorch sys_exit_read]# echo "ret == -1" > filter
bash: echo: write error: Invalid argument
[root@snorch sys_exit_read]# cat filter
ret == -1
^
parse_error: Invalid value (did you forget quotes)?Similar thing happens when setting triggers.
These is a regression in v4.17 introduced by the commit mentioned below,
testing without these commit shows no problem with negative numbers.Link: http://lkml.kernel.org/r/20180823102534.7642-1-ptikhomirov@virtuozzo.com
Cc: stable@vger.kernel.org
Fixes: 80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Pavel Tikhomirov
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
27 Feb, 2019
1 commit
-
commit 9e7382153f80ba45a0bbcd540fb77d4b15f6e966 upstream.
The following commit
441dae8f2f29 ("tracing: Add support for display of tgid in trace output")
removed the call to print_event_info() from print_func_help_header_irq()
which results in the ftrace header not reporting the number of entries
written in the buffer. As this wasn't the original intent of the patch,
re-introduce the call to print_event_info() to restore the orginal
behaviour.Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com
Acked-by: Joel Fernandes
Cc: stable@vger.kernel.org
Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Quentin Perret
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
20 Feb, 2019
1 commit
-
commit 0722069a5374b904ec1a67f91249f90e1cfae259 upstream.
When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.This is best explained in an example:
Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
/sys/kernel/debug/tracing/uprobe_eventsWhen the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:[...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu
Signed-off-by: Andreas Ziegler
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Masami Hiramatsu
Signed-off-by: Greg Kroah-Hartman
15 Feb, 2019
1 commit
-
commit ea6eb5e7d15e1838de335609994b4546e2abcaaf upstream.
The subsystem-specific message prefix for uprobes was also
"trace_kprobe: " instead of "trace_uprobe: " as described in
the original commit message.Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu
Fixes: 7257634135c24 ("tracing/probe: Show subsystem name in messages")
Signed-off-by: Andreas Ziegler
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
20 Dec, 2018
3 commits
-
commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream.
The following commands will cause a memory leak:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo schedule > instance/foo/set_ftrace_filter
# rmdir instances/fooThe reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit 3cec638b3d793b7cacdec5b8072364b41caeb0e1 upstream.
When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).But because the error would just exit, filter was left hanging and
nothing could free it.Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
Reviewed-by: Tom Zanussi
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit b61c19209c2c35ea2a2fe502d484703686eba98c upstream.
The create_filter() calls create_filter_start() which allocates a
"parse_error" descriptor, but fails to call create_filter_finish() that
frees it.The op_stack and inverts in predicate_parse() were also not freed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Reviewed-by: Tom Zanussi
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
17 Dec, 2018
1 commit
-
[ Upstream commit 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f ]
A format string consisting of "%p" or "%s" followed by an invalid
specifier (e.g. "%p%\n" or "%s%") could pass the check which
would make format_decode (lib/vsprintf.c) to warn.Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
Signed-off-by: Martynas Pumputis
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
08 Dec, 2018
1 commit
-
commit 5cf99a0f3161bc3ae2391269d134d6bf7e26f00e upstream.
The tracefs file set_graph_function is used to only function graph functions
that are listed in that file (or all functions if the file is empty). The
way this is implemented is that the function graph tracer looks at every
function, and if the current depth is zero and the function matches
something in the file then it will trace that function. When other functions
are called, the depth will be greater than zero (because the original
function will be at depth zero), and all functions will be traced where the
depth is greater than zero.The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. But because the logic only looks at depth it may trace
interrupts when it should not be.The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the race of this happening. This
bug was actually there for a longer time, but because the race window was so
small it seldom happened. The Fixes tag below is for the commit that widen
the race window, because that commit belongs to a series that will also help
fix the original bug.Cc: stable@kernel.org
Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence
Tested-by: Joe Lawrence
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
06 Dec, 2018
6 commits
-
commit 7c6ea35ef50810aa12ab26f21cb858d980881576 upstream.
The function graph profiler uses the ret_stack to store the "subtime" and
reuse it by nested functions and also on the return. But the current logic
has the profiler callback called before the ret_stack is updated, and it is
just modifying the ret_stack that will later be allocated (it's just lucky
that the "subtime" is not touched when it is allocated).This could also cause a crash if we are at the end of the ret_stack when
this happens.By reversing the order of the allocating the ret_stack and then calling the
callbacks attached to a function being traced, the ret_stack entry is no
longer used before it is allocated.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit 552701dd0fa7c3d448142e87210590ba424694a0 upstream.
In the past, curr_ret_stack had two functions. One was to denote the depth
of the call graph, the other is to keep track of where on the ret_stack the
data is used. Although they may be slightly related, there are two cases
where they need to be used differently.The one case is that it keeps the ret_stack data from being corrupted by an
interrupt coming in and overwriting the data still in use. The other is just
to know where the depth of the stack currently is.The function profiler uses the ret_stack to save a "subtime" variable that
is part of the data on the ret_stack. If curr_ret_stack is modified too
early, then this variable can be corrupted.The "max_depth" option, when set to 1, will record the first functions going
into the kernel. To see all top functions (when dealing with timings), the
depth variable needs to be lowered before calling the return hook. But by
lowering the curr_ret_stack, it makes the data on the ret_stack still being
used by the return hook susceptible to being overwritten.Now that there's two variables to handle both cases (curr_ret_depth), we can
move them to the locations where they can handle both cases.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit b1b35f2e218a5b57d03bbc3b0667d5064570dc60 upstream.
The profiler uses trace->depth to find its entry on the ret_stack, but the
depth may not match the actual location of where its entry is (if an
interrupt were to preempt the processing of the profiler for another
function, the depth and the curr_ret_stack will be different).Have it use the curr_ret_stack as the index to find its ret_stack entry
instead of using the depth variable, as that is no longer guaranteed to be
the same.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit 39eb456dacb543de90d3bc6a8e0ac5cf51ac475e upstream.
Currently, the depth of the ret_stack is determined by curr_ret_stack index.
The issue is that there's a race between setting of the curr_ret_stack and
calling of the callback attached to the return of the function.Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
trace return callback") moved the calling of the callback to after the
setting of the curr_ret_stack, even stating that it was safe to do so, when
in fact, it was the reason there was a barrier() there (yes, I should have
commented that barrier()).Not only does the curr_ret_stack keep track of the current call graph depth,
it also keeps the ret_stack content from being overwritten by new data.The function profiler, uses the "subtime" variable of ret_stack structure
and by moving the curr_ret_stack, it allows for interrupts to use the same
structure it was using, corrupting the data, and breaking the profiler.To fix this, there needs to be two variables to handle the call stack depth
and the pointer to where the ret_stack is being used, as they need to change
at two different locations.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit d125f3f866df88da5a85df00291f88f0baa89f7c upstream.
As all architectures now call function_graph_enter() to do the entry work,
no architecture should ever call ftrace_push_return_trace(). Make it static.This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman -
commit 8114865ff82e200b383e46821c25cb0625b842b5 upstream.
Currently all the architectures do basically the same thing in preparing the
function graph tracer on entry to a function. This code can be pulled into a
generic location and then this will allow the function graph tracer to be
fixed, as well as extended.Create a new function graph helper function_graph_enter() that will call the
hook function (ftrace_graph_entry) and the shadow stack operation
(ftrace_push_return_trace), and remove the need of the architecture code to
manage the shadow stack.This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
21 Nov, 2018
1 commit
-
[ Upstream commit 59158ec4aef7d44be51a6f3e7e17fc64c32604eb ]
Current kprobe event doesn't checks correctly whether the
given event is on unloaded module or not. It just checks
the event has ":" in the name.That is not enough because if we define a probe on non-exist
symbol on loaded module, it allows to define that (with
warning message)To ensure it correctly, this searches the module name on
loaded module list and only if there is not, it allows to
define it. (this event will be available when the target
module is loaded)Link: http://lkml.kernel.org/r/153547309528.26502.8300278470528281328.stgit@devbox
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
14 Nov, 2018
1 commit
-
commit 18858511fd8a877303cc34c06efa461b26a0e070 upstream.
Return -ENOENT error if there is no target synthetic event.
This notices an operation failure to user as below;# echo 'wakeup_latency u64 lat; pid_t pid;' > synthetic_events
# echo '!wakeup' >> synthetic_events
sh: write error: No such file or directoryLink: http://lkml.kernel.org/r/154013449986.25576.9487131386597290172.stgit@devbox
Acked-by: Tom Zanussi
Tested-by: Tom Zanussi
Cc: Shuah Khan
Cc: Rajvi Jingar
Cc: stable@vger.kernel.org
Fixes: 4b147936fa50 ('tracing: Add support for 'synthetic' events')
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Greg Kroah-Hartman
20 Oct, 2018
2 commits
-
Fix synthetic event to allow independent semicolon at end.
The synthetic_events interface accepts a semicolon after the
last word if there is no space.# echo "myevent u64 var;" >> synthetic_events
But if there is a space, it returns an error.
# echo "myevent u64 var ;" > synthetic_events
sh: write error: Invalid argumentThis behavior is difficult for users to understand. Let's
allow the last independent semicolon too.Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox
Cc: Shuah Khan
Cc: Tom Zanussi
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware) -
Fix synthetic event to accept unsigned modifier for its field type
correctly.Currently, synthetic_events interface returns error for "unsigned"
modifiers as below;# echo "myevent unsigned long var" >> synthetic_events
sh: write error: Invalid argumentThis is because argv_split() breaks "unsigned long" into "unsigned"
and "long", but parse_synth_field() doesn't expected it.With this fix, synthetic_events can handle the "unsigned long"
correctly like as below;# echo "myevent unsigned long var" >> synthetic_events
# cat synthetic_events
myevent unsigned long varLink: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox
Cc: Shuah Khan
Cc: Tom Zanussi
Cc: stable@vger.kernel.org
Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
18 Oct, 2018
1 commit
-
The preemptirq_delay_test module is used for the ftrace selftest code that
tests the latency tracers. The problem is that it uses ktime for the delay
loop, and then checks the tracer to see if the delay loop is caught, but the
tracer uses trace_clock_local() which uses various different other clocks to
measure the latency. As ktime uses the clock cycles, and the code then
converts that to nanoseconds, it causes rounding errors, and the preemptirq
latency tests are failing due to being off by 1 (it expects to see a delay
of 500000 us, but the delay is only 499999 us). This is happening due to a
rounding error in the ktime (which is totally legit). The purpose of the
test is to see if it can catch the delay, not to test the accuracy between
trace_clock_local() and ktime_get(). Best to use apples to apples, and have
the delay loop use the same clock as the latency tracer does.Cc: stable@vger.kernel.org
Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Acked-by: Joel Fernandes (Google)
Signed-off-by: Steven Rostedt (VMware)
18 Sep, 2018
1 commit
-
When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com
Cc: stable@vger.kernel.org
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer
Signed-off-by: Vaibhav Nagarnaik
Signed-off-by: Steven Rostedt (VMware)
24 Aug, 2018
1 commit
-
Pull tracing fixes from Steven Rostedt:
"Masami found an off by one bug in the code that keeps "notrace"
functions from being traced by kprobes. During my testing, I found
that there's places that we may want to add kprobes to notrace, thus
we may end up changing this code before 4.19 is released.The history behind this change is that we found that adding kprobes to
various notrace functions caused the kernel to crashed. We took the
safe route and decided not to allow kprobes to trace any notrace
function.But because notrace is added to functions that just cause weird side
effects to the function tracer, but are still safe, preventing kprobes
for all notrace functios may be too much of a big hammer.One such place is __schedule() is marked notrace, to keep function
tracer from doing strange recursive loops when it gets traced with
NEED_RESCHED set. With this change, one can not add kprobes to the
scheduler.Masami also added code to use gcov on ftrace"
* tag 'trace-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Fix to check notrace function with correct range
tracing: Allow gcov profiling on only ftrace subsystem
23 Aug, 2018
1 commit
-
Pull more block updates from Jens Axboe:
- Set of bcache fixes and changes (Coly)
- The flush warn fix (me)
- Small series of BFQ fixes (Paolo)
- wbt hang fix (Ming)
- blktrace fix (Steven)
- blk-mq hardware queue count update fix (Jianchao)
- Various little fixes
* tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits)
block/DAC960.c: make some arrays static const, shrinks object size
blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
blk-mq: init hctx sched after update ctx and hctx mapping
block: remove duplicate initialization
tracing/blktrace: Fix to allow setting same value
pktcdvd: fix setting of 'ret' error return for a few cases
block: change return type to bool
block, bfq: return nbytes and not zero from struct cftype .write() method
block, bfq: improve code of bfq_bfqq_charge_time
block, bfq: reduce write overcharge
block, bfq: always update the budget of an entity when needed
block, bfq: readd missing reset of parent-entity service
blk-wbt: fix IO hang in wbt_wait()
block: don't warn for flush on read-only device
bcache: add the missing comments for smp_mb()/smp_wmb()
bcache: remove unnecessary space before ioctl function pointer arguments
bcache: add missing SPDX header
bcache: move open brace at end of function definitions to next line
bcache: add static const prefix to char * array declarations
bcache: fix code comments style
...
21 Aug, 2018
3 commits
-
Fix within_notrace_func() to check notrace function correctly.
Since the ftrace_location_range(start, end) function checks
the range inclusively (start
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware) -
Add GCOV_PROFILE_FTRACE to allow gcov profiling on only files in ftrace
subsystem. This config option will be used for checking kselftest/ftrace
coverage.Link: http://lkml.kernel.org/r/153483647755.32472.4746349899604275441.stgit@devbox
Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware) -
Pull tracing updates from Steven Rostedt:
- Restructure of lockdep and latency tracers
This is the biggest change. Joel Fernandes restructured the hooks
from irqs and preemption disabling and enabling. He got rid of a lot
of the preprocessor #ifdef mess that they caused.He turned both lockdep and the latency tracers to use trace events
inserted in the preempt/irqs disabling paths. But unfortunately,
these started to cause issues in corner cases. Thus, parts of the
code was reverted back to where lockdep and the latency tracers just
get called directly (without using the trace events). But because the
original change cleaned up the code very nicely we kept that, as well
as the trace events for preempt and irqs disabling, but they are
limited to not being called in NMIs.- Have trace events use SRCU for "rcu idle" calls. This was required
for the preempt/irqs off trace events. But it also had to not allow
them to be called in NMI context. Waiting till Paul makes an NMI safe
SRCU API.- New notrace SRCU API to allow trace events to use SRCU.
- Addition of mcount-nop option support
- SPDX headers replacing GPL templates.
- Various other fixes and clean ups.
- Some fixes are marked for stable, but were not fully tested before
the merge window opened.* tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
tracing: Fix SPDX format headers to use C++ style comments
tracing: Add SPDX License format tags to tracing files
tracing: Add SPDX License format to bpf_trace.c
blktrace: Add SPDX License format header
s390/ftrace: Add -mfentry and -mnop-mcount support
tracing: Add -mcount-nop option support
tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
tracing: Handle CC_FLAGS_FTRACE more accurately
Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Uprobes: Simplify uprobe_register() body
tracepoints: Free early tracepoints after RCU is initialized
uprobes: Use synchronize_rcu() not synchronize_sched()
tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
ftrace: Remove unused pointer ftrace_swapper_pid
tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
tracing/irqsoff: Handle preempt_count for different configs
tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
tracing: irqsoff: Account for additional preempt_disable
trace: Use rcu_dereference_raw for hooks from trace-event subsystem
tracing/kprobes: Fix within_notrace_func() to check only notrace functions
...
17 Aug, 2018
5 commits
-
The Linux kernel adopted the SPDX License format headers to ease license
compliance management, and uses the C++ '//' style comments for the SPDX
header tags. Some files in the tracing directory used the C style /* */
comments for them. To be consistent across all files, replace the /* */
C style SPDX tags with the C++ // SPDX tags.Signed-off-by: Steven Rostedt (VMware)
-
Add the SPDX License header to ease license compliance management.
Signed-off-by: Steven Rostedt (VMware)
-
Add the SPDX License header to ease license compliance management.
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Steven Rostedt (VMware) -
Masami Hiramatsu reported:
Current trace-enable attribute in sysfs returns an error
if user writes the same setting value as current one,
e.g.# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
bash: echo: write error: Invalid argument
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
bash: echo: write error: Device or resource busyBut this is not a preferred behavior, it should ignore
if new setting is same as current one. This fixes the
problem as below.# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enableLink: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
Cc: Ingo Molnar
Cc: Jens Axboe
Cc: linux-block@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu
Tested-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt (VMware)
Signed-off-by: Jens Axboe -
Add the SPDX License header to ease license compliance management.
Acked-by: Jens Axboe
Signed-off-by: Steven Rostedt (VMware)
16 Aug, 2018
2 commits
-
-mcount-nop gcc option generates the calls to the profiling functions
as nops which allows to avoid patching mcount jump with NOP instructions
initially.-mcount-nop gcc option will be activated if platform selects
HAVE_NOP_MCOUNT and gcc actually supports it.
In addition to that CC_USING_NOP_MCOUNT is defined and could be used by
architectures to adapt ftrace patching behavior.Link: http://lkml.kernel.org/r/patch-3.thread-aa7b8d.git-e02ed2dc082b.your-ad-here.call-01533557518-ext-9465@work.hours
Signed-off-by: Vasily Gorbik
Signed-off-by: Steven Rostedt (VMware) -
Pull printk updates from Petr Mladek:
- Different vendors have a different expectation about a console
quietness. Make it configurable to reduce bike-shedding about the
upstream default- Decide about the message visibility when the message is stored. It
avoids races caused by a delayed console handling- Always store printk() messages into the per-CPU buffers again in NMI.
The only exception is when flushing trace log in panic(). There the
risk of loosing messages is worth an eventual reordering- Handle invalid %pO printf modifiers correctly
- Better handle %p printf modifier tests before crng is initialized
- Some clean up
* tag 'printk-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Do not handle %pO[^F] as %px
printk: Fix warning about unused suppress_message_printing
printk/nmi: Prevent deadlock when accessing the main log buffer in NMI
printk: Create helper function to queue deferred console handling
printk: Split the code for storing a message into the log buffer
printk: Clean up syslog_print_all()
printk: Remove unnecessary kmalloc() from syslog during clear
printk: Make CONSOLE_LOGLEVEL_QUIET configurable
printk: make sure to print log on console.
lib/test_printf.c: accept "ptrval" as valid result for plain 'p' tests
15 Aug, 2018
2 commits
-
Pull documentation update from Jonathan Corbet:
"This was a moderately busy cycle for docs, with the usual collection
of small fixes and updates.We also have new ktime_get_*() docs from Arnd, some kernel-doc fixes,
a new set of Italian translations (non so se vale la pena, ma non fa
male - speriamo bene), and some extensive early memory-management
documentation improvements from Mike Rapoport"* tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits)
Documentation: corrections to console/console.txt
Documentation: add ioctl number entry for v4l2-subdev.h
Remove gendered language from management style documentation
scripts/kernel-doc: Escape all literal braces in regexes
docs/mm: add description of boot time memory management
docs/mm: memblock: add overview documentation
docs/mm: memblock: add kernel-doc description for memblock types
docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
docs/mm: memblock: update kernel-doc comments
mm/memblock: add a name for memblock flags enumeration
docs/mm: bootmem: add overview documentation
docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data'
docs/mm: bootmem: fix kernel-doc warnings
docs/mm: nobootmem: fixup kernel-doc comments
mm/bootmem: drop duplicated kernel-doc comments
Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description.
doc:it_IT: translation for kernel-hacking
docs: Fix the reference labels in Locking.rst
doc: tracing: Fix a typo of trace_stat
mm: Introduce new type vm_fault_t
... -
Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.This pull request contains:
- Fix for a thundering heard issue in the wbt block code (Anchal
Agarwal)- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log suppor for NVMe target (Chaitanya Kulkarni)
* Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)- Series improving how we handle sense information on the stack
(Kees)- Lightnvm fixes and updates/improvements (Mathias/Javier et al)
- Zoned device support for null_blk (Matias)
- AIX partition fixes (Mauricio Faria de Oliveira)
- DIF checksum code made generic (Max Gurtovoy)
- Add support for discard in iostats (Michael Callahan / Tejun)
- Set of updates for BFQ (Paolo)
- Removal of async write support for bsg (Christoph)
- Bio page dirtying and clone fixups (Christoph)
- Set of bcache fix/changes (via Coly)
- Series improving blk-mq queue setup/teardown speed (Ming)
- Series improving merging performance on blk-mq (Ming)
- Lots of other fixes and cleanups from a slew of folks"
* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...