10 Mar, 2019
1 commit
-
…ux/kernel/git/acme/linux into perf/urgent
Pull perf/core changes from Arnaldo Carvalho de Melo:
perf bpf:
Arnaldo Carvalho de Melo:
- Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
tools such as 'bpftool map dump' can pretty print map keys and values.
perf c2c:
Jiri Olsa:
- Fix report for empty NUMA node.
perf diff:
Jin Yao:
- Support --time, --cpu, --pid and --tid filter options.
perf probe:
Arnaldo Carvalho de Melo:
- Clarify error message about not finding kernel modules debuginfo.
perf record:
Jiri Olsa:
- Fixup probing for max attr.precise_ip.
perf trace:
Arnaldo Carvalho de Melo:
- Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.
perf annotate:
Arnaldo Carvalho de Melo:
- Calculate the max instruction name length and align the column to it, removing the
hardcoded max of 6 chars so that instructions with longer names,
such as vpmovmskb, vpcmpeqb, etc., are handled.
kernel:
Song Liu:
- Consider events with attr.bpf_event set as side-band.
Gustavo A. R. Silva:
- Mark expected switch fall-through in perf_event_parse_addr_filter().
Libraries:
Jiri Olsa:
- Fix leaks and double frees on error paths.
libtraceevent:
Tony Jones:
- Fix buffer overflow in arg_eval().
python scripting:
Tony Jones:
- More python3 fixes.
Trivial:
Yang Wei:
- Remove needless extra semicolon in clang C++ glue code.
Intel PT/BTS:
Adrian Hunter:
- Improve auxtrace address filter error message when there is no DSO.
- Fix divide by zero when TSC is not available.
- Further improvements to the export to sqlite/postgresql python scripts
and to the GUI sqlviewer, exporting 'parent_id' to enable
the creation of call trees.
Andi Kleen:
- Generalize function to copy from thread addr space from intel-bts code.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
07 Mar, 2019
8 commits
-
Make sure data->file.path is zeroed on the perf_data__open error path
and in perf_data__close, so we don't double free it in case someone calls
it twice.
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jonas Rabenstein
Cc: Nageswara R Sastry
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Link: http://lkml.kernel.org/r/20190305152536.21035-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
We can't call perf_data__close and subsequently perf_session__delete,
because the latter calls perf_data__close again and causes a double free of
data->file.path:
$ perf report -i .
incompatible file format (rerun with -v to learn more)
free(): double free detected in tcache 2
Aborted (core dumped)
In fact we don't need to call perf_data__close at all, because by the
time the 'goto out_close' is reached, session->data is already initialized,
so the perf_data__close call will be triggered from
perf_session__delete.
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jonas Rabenstein
Cc: Nageswara R Sastry
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Fixes: 2d4f27999b88 ("perf data: Add global path holder")
Link: http://lkml.kernel.org/r/20190305152536.21035-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Currently we probe for precise_ip with the user-specified perf_event_attr,
which might fail because of unsupported kernel features that would get
disabled at open time anyway.
Switch the probe to use simple hw cycles, so the following
record sets the proper precise_ip:
# perf record -e cycles:P ls
# perf evlist -v
cycles:P: size: 112, ... precise_ip: 3, ...
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jonas Rabenstein
Cc: Nageswara R Sastry
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Link: http://lkml.kernel.org/r/20190305152536.21035-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Read the caps/max_precise value and store it in struct perf_pmu, to be
used when setting the maximum precise_ip field in the following patch.
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jonas Rabenstein
Cc: Nageswara R Sastry
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Link: http://lkml.kernel.org/r/20190305152536.21035-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
We can't allocate he->srcline unconditionally, only when a new hist_entry
is created. Move the he->srcline allocation into the hist_entry__init
function.
Original-patch-by: Jonas Rabenstein
Suggested-by: Namhyung Kim
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Nageswara R Sastry
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Link: http://lkml.kernel.org/r/20190305152536.21035-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add an error path to hist_entry__init to unify error handling, so
that each new member does not need to free everything else.
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jonas Rabenstein
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Cc: nageswara r sastry
Link: http://lkml.kernel.org/r/20190305152536.21035-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add a utility function to fetch executable code. Convert one
user over to it. There are more places doing that, but they
do significantly different actions, so they are not
easy to fit into a single library function.
Committer changes:
. No need to cast around, make 'buf' be a void pointer.
. Rename it to thread__memcpy() to reflect the fact it is about copying
a chunk of memory from a thread, i.e. from its address space.
. No need to have it in a separate object file, move it to thread.[ch]
. Check the return of map__load(), the original code didn't do it, but
since we're moving this around, check that as well.
Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/r/20190305144758.12397-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo
-
We were hardcoding '6' as the max instruction name length, and we have lots
that are longer than that; see the diff between two 'P'-printed TUI
annotations for a libc function that uses instructions with long names,
such as 'vpmovmskb' with its 9 chars:
--- __strcmp_avx2.annotation.before 2019-03-06 16:31:39.368020425 -0300
+++ __strcmp_avx2.annotation 2019-03-06 16:32:12.079450508 -0300
@@ -2,284 +2,284 @@
Event: cycles:ppp
Percent endbr64
- 0.10 mov %edi,%eax
+ 0.10 mov %edi,%eax
- xor %edx,%edx
+ xor %edx,%edx
- 3.54 vpxor %ymm7,%ymm7,%ymm7
+ 3.54 vpxor %ymm7,%ymm7,%ymm7
- or %esi,%eax
+ or %esi,%eax
- and $0xfff,%eax
+ and $0xfff,%eax
- cmp $0xf80,%eax
+ cmp $0xf80,%eax
- ↓ jg 370
+ ↓ jg 370
- 27.07 vmovdqu (%rdi),%ymm1
+ 27.07 vmovdqu (%rdi),%ymm1
- 7.97 vpcmpeqb (%rsi),%ymm1,%ymm0
+ 7.97 vpcmpeqb (%rsi),%ymm1,%ymm0
- 2.15 vpminub %ymm1,%ymm0,%ymm0
+ 2.15 vpminub %ymm1,%ymm0,%ymm0
- 4.09 vpcmpeqb %ymm7,%ymm0,%ymm0
+ 4.09 vpcmpeqb %ymm7,%ymm0,%ymm0
- 0.43 vpmovmskb %ymm0,%ecx
+ 0.43 vpmovmskb %ymm0,%ecx
- 1.53 test %ecx,%ecx
+ 1.53 test %ecx,%ecx
- ↓ je b0
+ ↓ je b0
- 5.26 tzcnt %ecx,%edx
+ 5.26 tzcnt %ecx,%edx
- 18.40 movzbl (%rdi,%rdx,1),%eax
+ 18.40 movzbl (%rdi,%rdx,1),%eax
- 7.09 movzbl (%rsi,%rdx,1),%edx
+ 7.09 movzbl (%rsi,%rdx,1),%edx
- 3.34 sub %edx,%eax
+ 3.34 sub %edx,%eax
2.37 vzeroupper
← retq
nop
- 50: tzcnt %ecx,%edx
+ 50: tzcnt %ecx,%edx
- movzbl 0x20(%rdi,%rdx,1),%eax
+ movzbl 0x20(%rdi,%rdx,1),%eax
- movzbl 0x20(%rsi,%rdx,1),%edx
+ movzbl 0x20(%rsi,%rdx,1),%edx
- sub %edx,%eax
+ sub %edx,%eax
vzeroupper
← retq
- data16 nopw %cs:0x0(%rax,%rax,1)
+ data16 nopw %cs:0x0(%rax,%rax,1)
Reported-by: Travis Downs
LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
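A minimal C sketch of the alignment technique described in the commit above: find the longest mnemonic first, then pad with printf's "%-*s" instead of a fixed width of 6. The struct insn_line type and both helpers are hypothetical stand-ins, not perf's annotate code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified disassembly line: mnemonic plus operands. */
struct insn_line {
	const char *name;	/* e.g. "vpmovmskb" */
	const char *ops;	/* e.g. "%ymm0,%ecx" */
};

/* Length of the longest mnemonic, used instead of a hardcoded 6. */
static int max_insn_name_width(const struct insn_line *lines, size_t nr)
{
	int max = 0;

	for (size_t i = 0; i < nr; i++) {
		int len = (int)strlen(lines[i].name);

		if (len > max)
			max = len;
	}
	return max;
}

/* Format one line with the mnemonic left-aligned in a 'width' column. */
static int format_insn(char *buf, size_t size, const struct insn_line *line,
		       int width)
{
	return snprintf(buf, size, "%-*s %s", width, line->name, line->ops);
}
```

With "mov", "vpmovmskb" and "test" the column becomes 9 wide, so the operands of every line start in the same column.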
06 Mar, 2019
2 commits
-
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
  - Intel PT updates
  - ARM CoreSight updates
  - vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
-
Delete a superfluous semicolon in getBPFObjectFromModule().
Signed-off-by: Yang Wei
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Yang Wei
Link: http://lkml.kernel.org/r/1551710174-3349-1-git-send-email-albin_yang@163.com
Signed-off-by: Arnaldo Carvalho de Melo
02 Mar, 2019
3 commits
-
The call_path can be used to find the parent symbol for a call but not
the exact parent call. To do that, add parent_id to the call_return
export. This enables the creation of a call tree from the exported data.
Signed-off-by: Adrian Hunter
Cc: Jiri Olsa
Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
When TSC is not available, "timeless" decoding is used, but a divide by
zero occurs if perf_time_to_tsc() is called.
Ensure the divisor is not zero.
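A sketch of the guard, assuming a perf_time_to_tsc()-style conversion that divides by time_mult (named after the time_zero/time_mult/time_shift fields of the kernel's perf_event_mmap_page). The struct and function names are hypothetical, and treating a zero multiplier as 1 is only one way to keep the divisor non-zero, not necessarily the committed fix.

```c
#include <stdint.h>

/* Hypothetical copy of the time conversion parameters. */
struct tsc_conv {
	uint64_t time_zero;
	uint32_t time_mult;
	uint16_t time_shift;
};

/* Convert perf time (ns) back to TSC ticks.  time_mult is the divisor;
 * with no TSC ("timeless" decoding) it may be 0, so use 1 instead. */
static uint64_t time_to_tsc(uint64_t ns, const struct tsc_conv *tc)
{
	uint64_t mult = tc->time_mult ? tc->time_mult : 1;
	uint64_t t = ns - tc->time_zero;
	uint64_t quot = t / mult;
	uint64_t rem = t % mult;

	/* Split into quotient and remainder to limit overflow of t << shift. */
	return (quot << tc->time_shift) + (rem << tc->time_shift) / mult;
}
```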
Signed-off-by: Adrian Hunter
Cc: Jiri Olsa
Cc: stable@vger.kernel.org # v4.9+
Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
The message does not indicate the possibility that the symbol is not
found because the file does not exist.
Before:
$ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
Symbol 'strcmp' not found.
Note that symbols must be functions.
Failed to parse address filter: 'filter strcmp / strcpy @ foo '
Filter format is: filter|start|stop|tracestop [/ ] [@]
Where multiple filters are separated by space or comma.
After:
$ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
File 'foo' not found or has no symbols.
Symbol 'strcmp' not found.
Note that symbols must be functions.
Failed to parse address filter: 'filter strcmp / strcpy @ foo '
Filter format is: filter|start|stop|tracestop [/ ] [@]
Where multiple filters are separated by space or comma.
Reported-by: Alexander Shishkin
Signed-off-by: Adrian Hunter
Tested-by: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Link: https://lkml.kernel.org/n/tip-dvngzxd0jkplzw1ary69dilb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
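To illustrate the filter syntax in the example above (action, start symbol, optional '/ end', optional '@ file'), here is a simplified C sketch that splits one filter into its parts. Keeping the '@ file' part separate is what lets a tool report a missing file before complaining about the symbol. struct addr_filter and parse_addr_filter() are hypothetical; perf's real parser also handles addresses, sizes and comma-separated lists.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of splitting one address filter. */
struct addr_filter {
	char action[16];	/* filter, start, stop or tracestop */
	char start[64];		/* start symbol or address */
	char end[64];		/* optional, after '/' */
	char file[128];		/* optional, after '@' */
};

static bool is_filter_action(const char *s)
{
	return !strcmp(s, "filter") || !strcmp(s, "start") ||
	       !strcmp(s, "stop") || !strcmp(s, "tracestop");
}

/* Split e.g. "filter strcmp / strcpy @ foo".  Simplified: '/' and '@'
 * must be separate tokens.  Uses strtok(), so it is not reentrant. */
static bool parse_addr_filter(const char *str, struct addr_filter *f)
{
	char buf[256];
	char *tok, *dst = NULL;
	size_t dst_size = 0;

	memset(f, 0, sizeof(*f));
	snprintf(buf, sizeof(buf), "%s", str);

	tok = strtok(buf, " ");
	if (!tok || !is_filter_action(tok))
		return false;
	snprintf(f->action, sizeof(f->action), "%s", tok);

	tok = strtok(NULL, " ");
	if (!tok)
		return false;
	snprintf(f->start, sizeof(f->start), "%s", tok);

	while ((tok = strtok(NULL, " ")) != NULL) {
		if (!dst && !strcmp(tok, "/") && !f->end[0] && !f->file[0]) {
			dst = f->end;		/* next token is the end */
			dst_size = sizeof(f->end);
		} else if (!dst && !strcmp(tok, "@") && !f->file[0]) {
			dst = f->file;		/* next token is the file */
			dst_size = sizeof(f->file);
		} else if (dst) {
			snprintf(dst, dst_size, "%s", tok);
			dst = NULL;
		} else {
			return false;		/* unexpected token */
		}
	}
	return dst == NULL;	/* a trailing '/' or '@' is an error */
}
```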
01 Mar, 2019
3 commits
-
Jiri points out that we don't need any time checking and time string
parsing if the --time option is not set. That makes sense.
This patch refactors the time range parsing code, moves the duplicated
code from perf report and perf script to time_utils, and checks if the --time
option is set before parsing the time string. No logic change is
expected, so the usage of --time is the same as before.
For example:
Select the first and second 10% time slices:
perf report --time 10%/1,10%/2
perf script --time 10%/1,10%/2
Select the slices from 0% to 10% and from 30% to 40%:
perf report --time 0%-10%,30%-40%
perf script --time 0%-10%,30%-40%
Select the time slices from timestamp 3971 to 3973:
perf report --time 3971,3973
perf script --time 3971,3973
Committer testing:
Using the above examples, check before and after to see if the output remains
the same:
$ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
$
$ perf report --time 10%/1,10%/2 > /tmp/report.before.1
$ perf script --time 10%/1,10%/2 > /tmp/script.before.1
$ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
$ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
$ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
$ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3
For example, the 3rd test produces this slice:
$ cat /tmp/script.before.3
cat 3147 180457.375844: 2143 cycles:uppp: 7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
cat 3147 180457.375986: 2245 cycles:uppp: 558b70f3d86e [unknown] (/usr/bin/cat)
cat 3147 180457.376012: 2164 cycles:uppp: 7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
cat 3147 180457.376140: 2921 cycles:uppp: 558b70f3a554 [unknown] (/usr/bin/cat)
cat 3147 180457.376296: 2844 cycles:uppp: 7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
cat 3147 180457.376431: 2717 cycles:uppp: 558b70f3b0ca [unknown] (/usr/bin/cat)
cat 3147 180457.376667: 2630 cycles:uppp: 558b70f3d86e [unknown] (/usr/bin/cat)
cat 3147 180457.376795: 2442 cycles:uppp: 7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
cat 3147 180457.376927: 2376 cycles:uppp: ffffffff9aa00163 [unknown] ([unknown])
cat 3147 180457.376954: 2307 cycles:uppp: 7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
cat 3147 180457.377116: 3091 cycles:uppp: 7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
cat 3147 180457.377362: 2945 cycles:uppp: 558b70f3a3b0 [unknown] (/usr/bin/cat)
cat 3147 180457.377517: 2727 cycles:uppp: 558b70f3a9aa [unknown] (/usr/bin/cat)
$
Install 'coreutils-debuginfo' to see cat's guts (symbols). As is, the
above chunk translates into this 'perf report' output:
$ cat /tmp/report.before.3
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
# Event count (approx.): 33552
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ......................
#
17.69% cat libc-2.28.so [.] malloc
14.53% cat cat [.] 0x000000000000586e
13.33% cat libc-2.28.so [.] _int_malloc
8.78% cat cat [.] 0x00000000000023b0
8.71% cat cat [.] 0x0000000000002554
8.13% cat cat [.] 0x00000000000029aa
8.10% cat cat [.] 0x00000000000030ca
7.28% cat libc-2.28.so [.] read
7.08% cat [unknown] [k] 0xffffffff9aa00163
6.39% cat libc-2.28.so [.] cfree@GLIBC_2.2.5
#
# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
#
$
Now let's see the output after applying this patch; nothing should change:
$ perf report --time 10%/1,10%/2 > /tmp/report.after.1
$ perf script --time 10%/1,10%/2 > /tmp/script.after.1
$ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
$ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
$ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
$ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
$ diff -u /tmp/report.before.1 /tmp/report.after.1
$ diff -u /tmp/script.before.1 /tmp/script.after.1
$ diff -u /tmp/report.before.2 /tmp/report.after.2
--- /tmp/report.before.2 2019-03-01 11:01:53.526094883 -0300
+++ /tmp/report.after.2 2019-03-01 11:09:18.231770467 -0300
@@ -352,5 +352,5 @@
#
-# (Tip: Generate a script for your data: perf script -g )
+# (Tip: Treat branches as callchains: perf report --branch-history)
#
$ diff -u /tmp/script.before.2 /tmp/script.after.2
$ diff -u /tmp/report.before.3 /tmp/report.after.3
--- /tmp/report.before.3 2019-03-01 11:03:08.890045588 -0300
+++ /tmp/report.after.3 2019-03-01 11:09:40.660224002 -0300
@@ -22,5 +22,5 @@
#
-# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
+# (Tip: List events using substring match: perf list )
#
$ diff -u /tmp/script.before.3 /tmp/script.after.3
$
Cool, just the 'perf report' tips changed, QED.
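A rough C sketch of how the percent forms of --time can map onto a time range, assuming the recorded samples span [first, last]: "N%/M" selects the M-th slice of N percent, "A%-B%" selects a range, and a missing --time skips parsing and keeps everything. parse_percent_time() and struct time_range are hypothetical, not the time_utils code; the absolute-timestamp form and comma-separated lists are not handled.

```c
#include <stdint.h>
#include <stdio.h>

struct time_range {
	uint64_t start;
	uint64_t end;
};

/* Turn one percent-style --time value into [start, end] within the
 * recorded span [first, last].  Returns 0 on success, -1 on error. */
static int parse_percent_time(const char *str, uint64_t first, uint64_t last,
			      struct time_range *r)
{
	double span = (double)(last - first);
	double pct, from, to;
	int nth;

	if (!str || !*str) {		/* --time not given: no parsing, keep all */
		r->start = first;
		r->end = last;
		return 0;
	}

	if (sscanf(str, "%lf%%/%d", &pct, &nth) == 2) {
		/* "N%/M": the M-th slice, each N percent wide */
		if (pct <= 0.0 || nth < 1 || pct * nth > 100.0)
			return -1;
		from = pct * (nth - 1);
		to = pct * nth;
	} else if (sscanf(str, "%lf%%-%lf%%", &from, &to) == 2) {
		/* "A%-B%": from A percent to B percent */
		if (from < 0.0 || to > 100.0 || from >= to)
			return -1;
	} else {
		return -1;
	}

	r->start = first + (uint64_t)(span * from / 100.0);
	r->end = first + (uint64_t)(span * to / 100.0);
	return 0;
}
```

With samples spanning 1000 to 2000, "10%/2" yields [1100, 1200] and "30%-40%" yields [1300, 1400].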
Signed-off-by: Jin Yao
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
-
For historical reasons the helper to loop over maps in an object
is called bpf_map__for_each while it really should be called
bpf_object__for_each_map. Rename it and add a define with the old
name for backward compatibility.
Switch all in-tree users to the correct name (Quentin).
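The rename-plus-alias pattern can be sketched in plain C as below, reusing the two macro names from the commit message. The demo_* types and demo_map__next() are stand-ins (libbpf's real iterator works on its own struct bpf_map and struct bpf_object); the point is that the old name simply expands to the new one, so existing callers keep compiling.

```c
#include <stddef.h>

/* Stand-in types; libbpf's real structures differ. */
struct demo_map {
	const char *name;
};

struct demo_object {
	struct demo_map *maps;
	size_t nr_maps;
};

/* Map after 'prev' in 'obj', or the first map when prev is NULL. */
static struct demo_map *demo_map__next(struct demo_map *prev,
				       struct demo_object *obj)
{
	if (!prev)
		return obj->nr_maps ? &obj->maps[0] : NULL;
	if ((size_t)(prev - obj->maps) + 1 >= obj->nr_maps)
		return NULL;
	return prev + 1;
}

/* New, correctly named iterator: loops over the maps of an object. */
#define bpf_object__for_each_map(pos, obj)		\
	for ((pos) = demo_map__next(NULL, (obj));	\
	     (pos) != NULL;				\
	     (pos) = demo_map__next((pos), (obj)))

/* Old name kept for backward compatibility. */
#define bpf_map__for_each bpf_object__for_each_map
```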
Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Signed-off-by: Daniel Borkmann
-
'perf probe' supports using either just the kernel module name, which
works only when the module is loaded, or the full pathname of the
file with the DWARF debug info, but the warning was cryptic:
Before:
# perf probe -m cls_flower -L fl_change
Failed to find the path for cls_flower: No such file or directory
Error: Failed to show lines.
#
After:
# perf probe -m cls_flower -L fl_change
Module cls_flower is not loaded, please specify its full path name.
Error: Failed to show lines.
# perf probe -m /lib/modules/5.0.0-rc7+/kernel/net/sched/cls_flower.ko -L fl_change | head -7
0 static int fl_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
void **arg, bool ovr, struct netlink_ext_ack *extack)
4 {
5 struct cls_fl_head *head = rtnl_dereference(tp->root);
#
The behaviour doesn't change when the module is loaded:
# modprobe cls_flower
# perf probe -m cls_flower -L fl_change | head -7
0 static int fl_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
void **arg, bool ovr, struct netlink_ext_ack *extack)
4 {
5 struct cls_fl_head *head = rtnl_dereference(tp->root);
#
Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Marcelo Ricardo Leitner
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-q4njvk9mshra00jacqjbzfn5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
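A hypothetical C sketch of the decision behind the new message, assuming loaded modules are listed one per line with the name in the first column, as in /proc/modules. module_is_loaded() and module_arg_error() are illustrative names, not perf's; the modules file path is a parameter so the check can be exercised against a sample file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if 'name' is the first field of a line in a /proc/modules-style file. */
static bool module_is_loaded(const char *modules_path, const char *name)
{
	char line[512], mod[256];
	bool found = false;
	FILE *fp = fopen(modules_path, "r");

	if (!fp)
		return false;
	while (!found && fgets(line, sizeof(line), fp))
		found = sscanf(line, "%255s", mod) == 1 && !strcmp(mod, name);
	fclose(fp);
	return found;
}

/* Message for a '-m <arg>' argument, or NULL when the argument is usable:
 * a path (contains '/') or the name of a loaded module. */
static const char *module_arg_error(const char *modules_path, const char *arg,
				    char *buf, size_t size)
{
	if (strchr(arg, '/') || module_is_loaded(modules_path, arg))
		return NULL;
	snprintf(buf, size,
		 "Module %s is not loaded, please specify its full path name.", arg);
	return buf;
}
```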
25 Feb, 2019
8 commits
-
Also convert one existing user.
Signed-off-by: Andi Kleen
Acked-by: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224153722.27020-9-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add perf_data__open_dir_data to open files inside 'struct perf_data'
path directory:
static int perf_data__open_dir(struct perf_data *data);
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add perf_data__create_dir() to create nr files inside 'struct perf_data'
path directory:
int perf_data__create_dir(struct perf_data *data, int nr);
and function to close that data:
void perf_data__close_dir(struct perf_data *data);
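A minimal sketch of what creating and closing "nr files inside the path directory" can look like. The demo_data_dir type is hypothetical, and so is the data.<i> file naming; the real struct perf_data keeps more state per file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, cut-down stand-in for directory-backed perf data. */
struct demo_data_dir {
	const char *path;	/* directory that holds the data files */
	int *fds;		/* one descriptor per created file */
	int nr;			/* number of open descriptors */
};

/* Close every descriptor and forget them; safe to call twice. */
static void demo_data__close_dir(struct demo_data_dir *d)
{
	for (int i = 0; i < d->nr; i++)
		close(d->fds[i]);
	free(d->fds);
	d->fds = NULL;
	d->nr = 0;
}

/* Create 'nr' files named data.<i> inside d->path. */
static int demo_data__create_dir(struct demo_data_dir *d, int nr)
{
	char name[4096];

	if (nr <= 0)
		return -1;
	d->fds = calloc((size_t)nr, sizeof(*d->fds));
	if (!d->fds)
		return -1;
	d->nr = 0;
	for (int i = 0; i < nr; i++) {
		int fd;

		snprintf(name, sizeof(name), "%s/data.%d", d->path, i);
		fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
		if (fd < 0) {
			demo_data__close_dir(d);	/* close what was opened so far */
			return -1;
		}
		d->fds[d->nr++] = fd;
	}
	return 0;
}
```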
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
And display the error message from removing the old data file:
$ perf record ls
Can't remove old data: Permission denied (perf.data.old)
Perf session creation failed.
$ perf record ls
Can't remove old data: Unknown file found (perf.data.old)
Perf session creation failed.
Not sure how to make the rename fail (after we successfully remove the
destination file/dir) to show the message, but let's have it there anyway.
Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Change check_backup() to call rm_rf_perf_data() instead of unlink() to
work over directory paths.
Also move the call earlier in the code, before we fork for file/dir, so
it can also back up directory data.
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
To remove perf.data including the directory, checking for the expected
files and that there are no other directories inside.
Signed-off-by: Jiri Olsa
Suggested-by: Andi Kleen
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add a pattern argument to rm_rf_depth() (and rename it to rm_rf_depth_pat())
to specify the name pattern that files inside the directory need to match.
The function fails if it finds a file with a different name to remove.
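A simplified C sketch of the pattern rule: check every entry against fnmatch(3) patterns first, and only remove anything if they all match. rm_dir_pat() is hypothetical, handles a single directory level of plain files only, and is not perf's rm_rf_depth_pat().

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Does 'name' match any pattern in the NULL-terminated list 'pat'? */
static int name_matches(const char *name, const char **pat)
{
	for (; *pat; pat++)
		if (fnmatch(*pat, name, 0) == 0)
			return 1;
	return 0;
}

/* Remove directory 'path' and the files directly inside it, but only if
 * every entry matches one of 'pat'; otherwise remove nothing, return -1. */
static int rm_dir_pat(const char *path, const char **pat)
{
	int ret = 0;

	/* Pass 0 only checks the names, pass 1 unlinks the files. */
	for (int pass = 0; pass < 2 && ret == 0; pass++) {
		DIR *dir = opendir(path);
		struct dirent *ent;

		if (!dir)
			return -1;
		while (ret == 0 && (ent = readdir(dir)) != NULL) {
			size_t len;
			char *full;

			if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
				continue;
			if (pass == 0) {
				if (!name_matches(ent->d_name, pat))
					ret = -1;	/* unexpected file: refuse */
				continue;
			}
			len = strlen(path) + strlen(ent->d_name) + 2;
			full = malloc(len);
			if (!full) {
				ret = -1;
				continue;
			}
			snprintf(full, len, "%s/%s", path, ent->d_name);
			ret = unlink(full);
			free(full);
		}
		closedir(dir);
	}
	return ret ? ret : rmdir(path);
}
```

For a perf.data directory, patterns such as { "data", "data.*", NULL } (an assumption about the file names) would let the removal go ahead while refusing to touch a directory that holds anything else.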
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add a depth argument to rm_rf (and rename it to rm_rf_depth) to
specify the depth we will go searching for files to remove.
It will be used to specify a single depth for perf.data directory removal
in the following patch.
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190224190656.30163-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
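A sketch of a depth-limited recursive removal in C. rm_rf_depth_sketch() is hypothetical and its depth semantics are an assumption: depth counts how many more directory levels may be entered, and a subdirectory found once the limit is reached is left in place, so the final rmdir() fails. Passing INT_MAX would give an effectively unlimited rm -rf.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove 'path' and its contents, entering at most 'depth' further levels
 * of subdirectories.  A subdirectory met when depth is 0 is left alone,
 * which makes the final rmdir() of 'path' fail. */
static int rm_rf_depth_sketch(const char *path, int depth)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int ret = 0;

	if (!dir)
		return -1;
	while (ret == 0 && (ent = readdir(dir)) != NULL) {
		struct stat st;
		size_t len;
		char *full;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		len = strlen(path) + strlen(ent->d_name) + 2;
		full = malloc(len);
		if (!full) {
			ret = -1;
			continue;
		}
		snprintf(full, len, "%s/%s", path, ent->d_name);
		if (lstat(full, &st) < 0)
			ret = -1;
		else if (S_ISDIR(st.st_mode))
			ret = depth > 0 ? rm_rf_depth_sketch(full, depth - 1) : 0;
		else
			ret = unlink(full);
		free(full);
	}
	closedir(dir);
	return ret ? ret : rmdir(path);
}
```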
23 Feb, 2019
3 commits
-
Add a 'path' member to 'struct perf_data'. It will keep the configured
path for the data (const char *). The path in struct perf_data_file is
now dynamically allocated (duped) from it.
This scheme is useful/used in the following patches, where struct
perf_data::path holds the configured directory path and struct
perf_data_file::path holds the allocated path for specific files.
It also makes the code a little simpler.
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
[ Fixup data-convert-bt.c missing conversion ]
Signed-off-by: Arnaldo Carvalho de Melo
-
We are about to add support for multiple files, so we need each file to
keep its size.
Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20190221094145.9151-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.
This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
	return -1;
}

int foo(void)
{
	return bar() + 1;
}

__attribute__((indirect_branch("thunk"))) int main()
{
	int (*volatile fn)(void) = foo;

	fn();
	return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
After:
main
-> foo
-> bar
Signed-off-by: Adrian Hunter
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym->name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo
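One way to picture the effect on the call graph is a filter that drops default retpoline thunk frames by name, sketched below in C. It only shows the name check and the resulting view; it does not reproduce how perf adjusts its call/return tracking. The "__x86_indirect_thunk_" prefix covers only GCC's default register thunks, in line with the limitation stated in the commit, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for GCC's default indirect-branch thunks, e.g. __x86_indirect_thunk_rax. */
static bool is_retpoline_thunk(const char *name)
{
	static const char prefix[] = "__x86_indirect_thunk_";

	return name && !strncmp(name, prefix, sizeof(prefix) - 1);
}

/* Drop thunk frames from a call chain in place and return the new length.
 * Only this view changes; the recorded branches are not touched. */
static size_t strip_retpoline_thunks(const char **chain, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++)
		if (!is_retpoline_thunk(chain[i]))
			chain[out++] = chain[i];
	return out;
}
```

Applied to the "Before" chain above (main, two thunk frames, foo, bar), it leaves main, foo and bar, which is the "After" view.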
22 Feb, 2019
2 commits
-
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack, i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
	return -1;
}

int foo(void)
{
	return bar() + 1;
}

__attribute__((indirect_branch("thunk"))) int main()
{
	int (*volatile fn)(void) = foo;

	fn();
	return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> bar
After:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
Committer testing:
Choose "Reports", then "Context-Sensitive Call Graph", and then go on
expanding:
Before:
simple-retpolin
PID:PID
_start
_start
__libc_start_main
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
bar
After:
Remove the "simple-retpoline.db" file, run the 'perf script' line again
to regenerate the .db file, and run exported-sql-viewer.py again to
get the same all the way to 'main'; then, from there, including 'main':
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
foo
bar
Signed-off-by: Adrian Hunter
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
-
The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33
("perf annotate: No need to calculate notes->start twice") removed the notes->start
assignment in symbol__calc_lines(). Since then, the lookup in
find_address_in_section(), reached from symbol__tty_annotate(), fails because
a2l->addr is wrong, so the annotate summary doesn't report the source code
line numbers correctly.
Before fix:
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
void hotspot_1(void)
{
	volatile int i;

	for (i = 0; i < 0x10000000; i++);
	for (i = 0; i < 0x10000000; i++);
	for (i = 0; i < 0x10000000; i++);
}

int main(void)
{
	hotspot_1();

	return 0;
}
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------
19.30 common_while_1[32]
19.03 common_while_1[4e]
19.01 common_while_1[16]
5.04 common_while_1[13]
4.99 common_while_1[4b]
4.78 common_while_1[2c]
4.77 common_while_1[10]
4.66 common_while_1[2f]
4.59 common_while_1[51]
4.59 common_while_1[35]
4.52 common_while_1[19]
4.20 common_while_1[56]
0.51 common_while_1[48]
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa <hotspot_1>:
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1[10] 4.77 : 60a: add $0x1,%eax
common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp)
common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax
common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607
: for (i = 0; i < 0x10000000; i++);
...
After fix:
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
----------------------------------------------
33.34 common_while_1.c:5
33.34 common_while_1.c:6
33.32 common_while_1.c:7
Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
-----------------------------------------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000000005fa :
: hotspot_1():
: void hotspot_1(void)
: {
0.00 : 5fa: push %rbp
0.00 : 5fb: mov %rsp,%rbp
: volatile int i;
:
: for (i = 0; i < 0x10000000; i++);
0.00 : 5fe: movl $0x0,-0x4(%rbp)
0.00 : 605: jmp 610
0.00 : 607: mov -0x4(%rbp),%eax
common_while_1.c:5 4.70 : 60a: add $0x1,%eax
4.89 : 60d: mov %eax,-0x4(%rbp)
common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax
common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax
0.00 : 618: jle 607
: for (i = 0; i < 0x10000000; i++);
0.00 : 61a: movl $0x0,-0x4(%rbp)
0.00 : 621: jmp 62c
0.00 : 623: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 626: add $0x1,%eax
4.73 : 629: mov %eax,-0x4(%rbp)
common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax
common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax
...

Signed-off-by: Wei Li
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Jin Yao
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice")
Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo
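The before/after output shows what the fix changes in the sorted summary: with "dso[offset]" keys every sampled instruction is its own entry, while "file:line" keys let samples from several instructions collapse into one entry per source line. The sketch below illustrates that aggregation step only. struct srcline_pct, srcline__merge() and srcline__summary() are hypothetical names, not perf's annotate code, and the sample values are not meant to reproduce the report above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary entry: a source location key and its share of samples. */
struct srcline_pct {
	const char *srcline;	/* e.g. "common_while_1.c:5" or "common_while_1[32]" */
	double percent;
};

/*
 * Sum the percentages of entries that share a srcline key.
 * Unique entries are compacted to the front; returns how many remain.
 */
static size_t srcline__merge(struct srcline_pct *ents, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t j;

		for (j = 0; j < out; j++) {
			if (strcmp(ents[j].srcline, ents[i].srcline) == 0) {
				ents[j].percent += ents[i].percent;
				break;
			}
		}
		if (j == out)		/* first time this key is seen */
			ents[out++] = ents[i];
	}
	return out;
}

/* qsort() comparator: highest percentage first. */
static int srcline__cmp_desc(const void *a, const void *b)
{
	const struct srcline_pct *l = a, *r = b;

	if (l->percent < r->percent)
		return 1;
	if (l->percent > r->percent)
		return -1;
	return 0;
}

/* Build a "sorted summary": merge entries by key, then sort descending. */
static size_t srcline__summary(struct srcline_pct *ents, size_t nr)
{
	size_t unique = srcline__merge(ents, nr);

	qsort(ents, unique, sizeof(*ents), srcline__cmp_desc);
	return unique;
}
```

With per-address keys nothing merges and the summary has one line per instruction, which matches the shape of the "before" report; with per-line keys the same samples fold into a handful of entries.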
21 Feb, 2019
5 commits
-
Let rm_rf() remove a file when the given path points to one, not just
directories.

Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190220122800.864-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
So it does not screw up single -v verbose output.
Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190220122800.864-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add a missing newline to the pr_debug() call in perf_event__synthesize_bpf_events(),
so that the error message does not screw up the verbose output.

Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Link: http://lkml.kernel.org/r/20190220122800.864-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Force sample_type setup for slave events in group leader sessions.

We don't get samples for slave events; we create them when delivering the
group leader sample. Set the slave event to follow the master's sample_type
to ease up reporting.

Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190220122800.864-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
There's no reason to deliver a sample with a zero period: it means there
was no value for the slave event since its last group leader sample.

Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190220122800.864-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
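The rule in this commit can be pictured as follows: a member's period is the growth of its counter since the previous group leader sample, and a zero delta means there is nothing to deliver. This is a hypothetical sketch; struct member_counter and deliver_group_sample() are illustrative names, not perf's session code, and for simplicity every member is treated the same way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-member state: the counter value seen at the last leader sample. */
struct member_counter {
	uint64_t prev;
};

/*
 * For one group leader sample, compute each member's period as the counter
 * delta since the previous leader sample and store it in periods_out.
 * Members whose period is zero are skipped. Returns the number delivered.
 */
static size_t deliver_group_sample(struct member_counter *members,
				   const uint64_t *values, size_t nr,
				   uint64_t *periods_out)
{
	size_t delivered = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t period = values[i] - members[i].prev;

		members[i].prev = values[i];
		if (period == 0)
			continue;	/* no new value since the last leader sample */
		periods_out[delivered++] = period;
	}
	return delivered;
}
```

Skipping at this point means downstream consumers never see zero-period entries, so they do not need a special case for them.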
20 Feb, 2019
1 commit
-
At some point I'll suggest moving this to libbpf; for now I'll
experiment with ways to dump BPF maps set by events in 'perf trace',
starting with a very basic dumper for the current, very limited needs
of the augmented_raw_syscalls code: dumping booleans.

Having functions that apply to the map keys and values and do table
lookups, such as syscall id to string tables, should come next.

Cc: Adrian Hunter
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Jiri Olsa
Cc: Martin KaFai Lau
Cc: Namhyung Kim
Cc: Yonghong Song
Link: https://lkml.kernel.org/n/tip-lz14w0esqyt1333aon05jpwc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
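A very basic boolean dumper of the kind described here might look like the sketch below. It is hypothetical throughout: a plain in-memory array stands in for the BPF map (real code would iterate the map through libbpf), and the optional key_name callback, demonstrated with a made-up demo_syscall_name() table, illustrates the syscall-id-to-string lookup the commit lists as a next step.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Optional hook turning a key into a name (e.g. a syscall id into its name). */
typedef const char *(*key_name_fn)(int key);

/* Hypothetical id -> name table used only for the demonstration. */
static const char *demo_syscall_name(int id)
{
	static const char *const names[] = { "read", "write", "open" };

	return id >= 0 && id < 3 ? names[id] : NULL;
}

/* Append formatted text at bf + len; returns the new length, capped at size - 1. */
static size_t append(char *bf, size_t size, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	if (len + 1 >= size)	/* buffer already full (or empty) */
		return len;
	va_start(args, fmt);
	n = vsnprintf(bf + len, size - len, fmt, args);
	va_end(args);
	if (n < 0)
		return len;
	return (size_t)n < size - len ? len + (size_t)n : size - 1;
}

/* Dump one "key: true/false" line per map entry into bf; returns the length. */
static size_t bool_map__dump(char *bf, size_t size, const bool *values,
			     size_t nr_entries, key_name_fn key_name)
{
	size_t len = 0;

	if (size)
		bf[0] = '\0';
	for (size_t key = 0; key < nr_entries; key++) {
		const char *name = key_name ? key_name((int)key) : NULL;
		const char *val = values[key] ? "true" : "false";

		if (name)
			len = append(bf, size, len, "%s: %s\n", name, val);
		else
			len = append(bf, size, len, "%zu: %s\n", key, val);
	}
	return len;
}
```

Calling bool_map__dump(bf, sizeof(bf), map, 3, NULL) prints numeric keys; passing demo_syscall_name instead prints names, which is the direction the commit message sketches.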
19 Feb, 2019
4 commits
-
We can't assume inlined symbols with the same name are equal, because
their address ranges may differ. Treating them as equal causes symbols
with different addresses to be shadowed when added to the hist entry,
which leads to an ERANGE error when the symbol address is checked during
sample parsing, since the addr must be within the range [sym.start, sym.end].

The error message looks like: "0x36aea60 [0x8]: failed to process type: 68".

The second parameter of symbol__new() is the length of the fake symbol
for the inline frame, which is the difference between the end and start
addresses of base_sym.

Signed-off-by: He Kuang
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: aa441895f7b4 ("perf report: Compare symbol name for inlined frames when sorting")
Link: http://lkml.kernel.org/r/20190219130531.15692-1-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo
-
Use sysfs__mountpoint() when reading sysfs files to obtain the cpu/numa
topologies.

Also use scnprintf() instead of sprintf(), as suggested by Namhyung.
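For context on the scnprintf() change: scnprintf() returns the number of characters actually written to the buffer (excluding the trailing NUL), while snprintf() returns the length the output would have had. A standalone sketch of those semantics follows, under the hypothetical name scnprintf_sketch() so it is not mistaken for the tools/lib implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Like snprintf(), but return the number of characters actually stored in
 * buf (not counting the trailing NUL), so the result never exceeds size - 1.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (i < 0)
		return 0;			/* encoding error: report nothing written */
	if ((size_t)i < size)
		return i;			/* everything fit */
	return size ? (int)(size - 1) : 0;	/* output was truncated */
}
```

The bounded return value makes the common `len += scnprintf(buf + len, size - len, ...)` pattern safe: len can never run past the end of the buffer, whereas adding up snprintf() return values after a truncation can push len beyond size and make `size - len` wrap around.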
Signed-off-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: Alexander Shishkin
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190219095815.15931-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Add the numa_topology object to return the list of numa nodes together
with their cpus. It will replace the numa code in header.c and will be
used from 'perf record' in the following patches.

Add the following interface functions to load numa details:

  struct numa_topology *numa_topology__new(void);
  void numa_topology__delete(struct numa_topology *tp);

Also replace the current (copied) local interface; there are no
functional changes.

Signed-off-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: Alexander Shishkin
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190219095815.15931-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
-
Make struct cpu_topo global and rename it to 'struct cpu_topology', so
that it can be used from the 'perf record' command in the following
patches.

Add the following interface functions to load/free cpu topology details:

  struct cpu_topology *cpu_topology__new(void);
  void cpu_topology__delete(struct cpu_topology *tp);

Move it to a separate source file, cputopo.c, which will also hold the
numa related object in the following patches.

No functional change; the new interface will be used in upcoming changes.
Signed-off-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: Alexander Shishkin
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20190219095815.15931-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
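The cpu_topology__new()/cpu_topology__delete() pair above follows a constructor/destructor naming pattern: the constructor returns NULL on failure and the destructor releases everything the object owns. A hypothetical sketch of that pattern follows. Unlike the real cpu_topology__new(void), which reads sysfs, this version takes the per-CPU core sibling list strings as arguments so it can run anywhere, and it keeps each distinct list only once (an assumption made for the illustration). struct cpu_topology_sketch and its functions are illustrative names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct cpu_topology. */
struct cpu_topology_sketch {
	size_t core_sib;	/* number of distinct core sibling lists */
	char **core_siblings;	/* e.g. "0-1", "2-3" */
};

/* Free the object and every string it owns; accepts NULL like free(). */
static void cpu_topology_sketch__delete(struct cpu_topology_sketch *tp)
{
	if (!tp)
		return;
	for (size_t i = 0; i < tp->core_sib; i++)
		free(tp->core_siblings[i]);
	free(tp->core_siblings);
	free(tp);
}

/* Copy a string without relying on POSIX strdup(). */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Record each distinct core sibling list once; returns NULL on allocation failure. */
static struct cpu_topology_sketch *
cpu_topology_sketch__new(const char *const *core_lists, size_t nr_cpus)
{
	struct cpu_topology_sketch *tp = calloc(1, sizeof(*tp));

	if (!tp)
		return NULL;
	tp->core_siblings = calloc(nr_cpus ? nr_cpus : 1, sizeof(char *));
	if (!tp->core_siblings) {
		free(tp);
		return NULL;
	}
	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		size_t i;

		for (i = 0; i < tp->core_sib; i++)
			if (strcmp(tp->core_siblings[i], core_lists[cpu]) == 0)
				break;
		if (i < tp->core_sib)
			continue;	/* list already recorded */
		tp->core_siblings[tp->core_sib] = dup_str(core_lists[cpu]);
		if (!tp->core_siblings[tp->core_sib]) {
			cpu_topology_sketch__delete(tp);
			return NULL;
		}
		tp->core_sib++;
	}
	return tp;
}
```

Because the destructor tolerates NULL and partially filled objects, the constructor can reuse it on its error path instead of duplicating cleanup code.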