10 May, 2019
1 commit
-
Pull arch/csky perf update from Guo Ren:
"Add support for perf unwind-libdw"* tag 'csky-for-linus-5.2-perf-unwind-libdw' of git://github.com/c-sky/csky-linux:
csky: Add support for perf unwind-libdw
09 May, 2019
1 commit
-
This patch add support for DWARF register mappings and libdw registers
initialization, which is used by perf callchain analyzing, eg:perf record --call-graph=dwarf
Here is elfutils csky backend patch set:
https://sourceware.org/ml/elfutils-devel/2019-q2/msg00007.htmlSigned-off-by: Mao Han
Signed-off-by: Guo Ren
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Arnd Bergmann
07 May, 2019
1 commit
-
Pull perf updates from Ingo Molnar:
"The main kernel changes were:- add support for Intel's "adaptive PEBS v4" - which embedds LBS data
in PEBS records and can thus batch up and reduce the IRQ (NMI) rate
significantly - reducing overhead and making call-graph profiling
less intrusive.- add Intel CPU core and uncore support updates for Tremont, Icelake,
- extend the x86 PMU constraints scheduler with 'constraint ranges'
to better support Icelake hw constraints,- make x86 call-chain support work better with CONFIG_FRAME_POINTER=y
- misc other changes
Tooling changes:
- updates to the main tools: 'perf record', 'perf trace', 'perf
stat'- updated Intel and S/390 vendor events
- libtraceevent updates
- misc other updates and fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER
watchdog: Fix typo in comment
perf/x86/intel: Add Tremont core PMU support
perf/x86/intel/uncore: Add Intel Icelake uncore support
perf/x86/msr: Add Icelake support
perf/x86/intel/rapl: Add Icelake support
perf/x86/intel/cstate: Add Icelake support
perf/x86/intel: Add Icelake support
perf/x86: Support constraint ranges
perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
perf/x86/intel: Support adaptive PEBS v4
perf/x86/intel/ds: Extract code of event update in short period
perf/x86/intel: Extract memory code PEBS parser for reuse
perf/x86: Support outputting XMM registers
perf/x86/intel: Force resched when TFA sysctl is modified
perf/core: Add perf_pmu_resched() as global function
perf/headers: Fix stale comment for struct perf_addr_filter
perf/core: Make perf_swevent_init_cpu() static
perf/x86: Add sanity checks to x86_schedule_events()
perf/x86: Optimize x86_schedule_events()
...
03 May, 2019
8 commits
-
We were including sys/syscall.h and asm/unistd.h, since sys/syscall.h
includes asm/unistd.h, sometimes this leads to the redefinition of
defines, breaking the build.Noticed on ARC with uCLibc.
Cc: Adrian Hunter
Cc: Arnaldo Carvalho de Melo
Cc: Arnd Bergmann
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Rich Felker
Cc: Vineet Gupta
Link: https://lkml.kernel.org/n/tip-xjpf80o64i2ko74aj2jih0qg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Thomas Backlund reported that the perf build was failing on the Mageia 7
distro, that is because it uses:cat /tmp/build/perf/feature/test-disassembler-four-args.make.output
/usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin':
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243:
undefined reference to `dlopen'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271:
undefined reference to `dlsym'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256:
undefined reference to `dlclose'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246:
undefined reference to `dlerror'
as we allow dynamic linking and loadingMageia 7 uses these linker flags:
$ rpm --eval %ldflags
-Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtagsSo add -ldl to this feature LDFLAGS.
Reported-by: Thomas Backlund
Tested-by: Thomas Backlund
Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Song Liu
Link: https://lkml.kernel.org/r/20190501173158.GC21436@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.Reported-by: Robert Walker
Suggested-by: Robert Walker
Signed-off-by: Leo Yan
Tested-by: Robert Walker
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mathieu Poirier
Cc: Mike Leach
Cc: Namhyung Kim
Cc: Suzuki K Poulouse
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet")
Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet")
Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo -
Since cs_etm_queue::prev_packet is allocated for all cases, it will
never be NULL pointer; now validity checking prev_packet is pointless,
remove all of them.Signed-off-by: Leo Yan
Tested-by: Robert Walker
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mathieu Poirier
Cc: Mike Leach
Cc: Namhyung Kim
Cc: Suzuki K Poulouse
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190428083228.20246-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo -
An -ENOMEM error is not reported in the GTK GUI. Instead this error
message pops up on the screen:[root@m35lp76 perf]# ./perf report -i perf.data.error68-1
Processing events... [974K/3M]
Error:failed to process sample0xf4198 [0x8]: failed to process type: 68
However when I use the same perf.data file with --stdio it works:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \
| head -12# Total Lost Samples: 0
#
# Samples: 76K of event 'cycles'
# Event count (approx.): 99056160000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .........
#
8.81% find [kernel.kallsyms] [k] ftrace_likely_update
8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update
8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update
2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_updateThe sample precentage is a bit low.....
The GUI always fails in the FINISHED_ROUND event (68) and does not
indicate the reason why.When happened is the following. Perf report calls a lot of functions and
down deep when a FINISHED_ROUND event is processed, these functions are
called:perf_session__process_event()
+ perf_session__process_user_event()
+ process_finished_round()
+ ordered_events__flush()
+ __ordered_events__flush()
+ do_flush()
+ ordered_events__deliver_event()
+ perf_session__deliver_event()
+ machine__deliver_event()
+ perf_evlist__deliver_event()
+ process_sample_event()
+ hist_entry_iter_add() --> only called in GUI case!!!
+ hist_iter__report__callback()
+ symbol__inc_addr_sample()Now this functions runs out of memory and
returns -ENOMEM. This is reported all the way up
until functionperf_session__process_event() returns to its caller, where -ENOMEM is
changed to -EINVAL and processing stops:if ((skip = perf_session__process_event(session, event, head)) < 0) {
pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
head, event->header.size, event->header.type);
err = -EINVAL;
goto out_err;
}This occurred in the FINISHED_ROUND event when it has to process some
10000 entries and ran out of memory.This patch indicates the root cause and displays it in the status line
of ther perf report GUI.Output before (on GUI status line):
0xf4198 [0x8]: failed to process type: 68
Output after:
0xf4198 [0x8]: failed to process type: 68 [not enough memory]
Committer notes:
the 'skip' variable needs to be initialized to -EINVAL, so that when the
size is less than sizeof(struct perf_event_attr) we avoid this valid
compiler warning:util/session.c: In function ‘perf_session__process_events’:
util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
err = skip;
~~~~^~~~~~
util/session.c:1874:6: note: ‘skip’ was declared here
s64 skip;
^~~~
cc1: all warnings being treated as errorsSigned-off-by: Thomas Richter
Reviewed-by: Hendrik Brueckner
Reviewed-by: Jiri Olsa
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Link: http://lkml.kernel.org/r/20190423105303.61683-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo -
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.Cc: Arnd Bergmann
Cc: Jiri Olsa
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim
Cc: Vineet Gupta
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Commit 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") adds
support for BPF programs annotations but the new code does not build on 32-bit.Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Song Liu
Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs")
Link: http://lkml.kernel.org/r/20190403194452.10845-1-cascardo@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo -
In perf_env__find_btf(), we're returning without unlocking
"env->bpf_progs.lock". There may be cause lockdep issue.Detected by CoversityScan, CID# 1444762:(program hangs(LOCK))
Signed-off-by: Bo YU
Acked-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Martin KaFai Lau
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Cc: Yonghong Song
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Fixes: 2db7b1e0bd49d: (perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf())
Link: http://lkml.kernel.org/r/20190422080138.10088-1-tsu.yubo@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
18 Apr, 2019
5 commits
-
We don't return NULL when we don't find the bpf_prog_info_node, fix
that.Signed-off-by: Jiri Olsa
Reported-by: Song Liu
Acked-by: Song Liu
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: 3792cb2ff43b ("perf bpf: Save BTF in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190417145539.11669-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.However if there's already same map name, we currently don't bump the
reference and can crash, like:Program received signal SIGABRT, Aborted.
0x00007ffff75e60f5 in raise () from /lib64/libc.so.6(gdb) bt
#0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6
#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
#6 map__put (map=0x1224df0) at util/map.c:299
#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
#9 0x00000000004f7d8a in map_groups__remove (map=, mg=) at util/map_groups.h:65
#10 machine__process_ksymbol_unregister (sample=, event=0x7ffff7279670, machine=) at util/machine.c:728
#11 machine__process_ksymbol (machine=, event=0x7ffff7279670, sample=) at util/machine.c:741
#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=) at util/ordered-events.c:322
#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
...Add the map to the list and getting the reference event if we find the
map with same name.Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Andi Kleen
Cc: Daniel Borkmann
Cc: Eric Saint-Etienne
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Fixes: 1e6285699b30 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Current perf_evlist__poll_thread() code could finish without draining
the data. Adding the logic that makes sure we won't finish before the
drain.Signed-off-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Andi Kleen
Cc: Daniel Borkmann
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: http://lkml.kernel.org/r/20190416160127.30203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
As reported by Jiri Olsa in:
"[BUG] perf: intel_pt won't display kernel function"
https://lore.kernel.org/lkml/20190403143738.GB32001@kravaRecent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT
broke --kallsyms option. This is because it broke test __map__is_kmodule.This patch fixes this by adding check for bpf program, so that these maps
are not mistaken as kernel modules.Signed-off-by: Song Liu
Reported-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Andi Kleen
Cc: Andrii Nakryiko
Cc: Daniel Borkmann
Cc: Martin KaFai Lau
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Yonghong Song
Link: http://lkml.kernel.org/r/20190416160127.30203-8-jolsa@kernel.org
Fixes: 76193a94522f ("perf, bpf: Introduce PERF_RECORD_KSYMBOL")
Signed-off-by: Jiri Olsa
Signed-off-by: Arnaldo Carvalho de Melo -
We currently don't return NULL in case we don't find the
bpf_prog_info_node, fixing that.Signed-off-by: Jiri Olsa
Acked-by: Song Liu
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: e4378f0cb90b ("perf bpf: Save bpf_prog_info in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190416134151.15282-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
16 Apr, 2019
5 commits
-
Bastian reported broken 'perf top -p PID' command, it won't display any
data.The problem is that for -p option we monitor single thread, so we don't
enable time in samples, because it's not needed.However since commit 16c66bc167cc we use ordered queues to stash data
plus later commits added logic for dropping samples in case there's big
load and we don't keep up. All this needs timestamp for sample. Enabling
it unconditionally for perf top.Reported-by: Bastian Beischer
Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: bastian beischer
Fixes: 16c66bc167cc ("perf top: Add processing thread")
Link: http://lkml.kernel.org/r/20190415125333.27160-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
On 32-bits platform with more than 32 registers, the 64 bits mask is
truncate to the lower 32 bits and the return value of hweight_long will
always smaller than 32. When kernel outputs more than 32 registers, but
the user perf program only counts 32, there will be a data mismatch
result to overflow check fail.Signed-off-by: Mao Han
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Fixes: 6a21c0b5c2ab ("perf tools: Add core support for sampling intr machine state regs")
Fixes: d03f2170546d ("perf tools: Expand perf_event__synthesize_sample()")
Fixes: 0f6a30150ca2 ("perf tools: Support user regs and stack in sample parsing")
Link: http://lkml.kernel.org/r/29ad7947dc8fd1ff0abd2093a72cc27a2446be9f.1554883878.git.han_mao@c-sky.com
Signed-off-by: Arnaldo Carvalho de Melo -
Arnaldo reported assertion in perf stat record:
assertion failed at util/header.c:875
There's no support for this in the 'perf state record' command, disable
the feature for that case.Reported-by: Arnaldo Carvalho de Melo
Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Peter Zijlstra
Fixes: 258031c017c3 ("perf header: Add DIR_FORMAT feature to describe directory data")
Link: http://lkml.kernel.org/r/20190409100156.20303-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Fix following error using calls_view:
Query failed: ambiguous column name: parent_id Unable to execute statement
Signed-off-by: Adrian Hunter
Cc: Jiri Olsa
Fixes: 8ce9a7251d11 ("perf scripts python: export-to-sqlite.py: Export calls parent_id")
Link: http://lkml.kernel.org/r/20190409062557.26138-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Fix lock/unlock imbalances by refactoring the code a bit and adding
calls to up_write() before return.Signed-off-by: Gustavo A. R. Silva
Acked-by: Song Liu
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Addresses-Coverity-ID: 1444315 ("Missing unlock")
Addresses-Coverity-ID: 1444316 ("Missing unlock")
Fixes: a70a1123174a ("perf bpf: Save BTF information as headers to perf.data")
Fixes: 606f972b1361 ("perf bpf: Save bpf_prog_info information as headers to perf.data")
Link: http://lkml.kernel.org/r/20190408173355.GA10501@embeddedor
[ Simplified the exit path to have just one up_write() + return ]
Signed-off-by: Arnaldo Carvalho de Melo
02 Apr, 2019
19 commits
-
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Update all the Intel JSON metrics from Ahmad Yasin's TMAM 3.5
for Intel big core from Sandy Bridge to Cascade Lake.This has many improvements and new metircs
- New TopDownL1_SMT group that provides a per SMT thread version
of --topdown that does not require -a anymore. The drawback is
increased multiplexing though since L1 TopDown does not fit into
4 generic counters anymore.- Added SMT aware versions of other metrics
- Split SMT aware metrics into separate metrics to avoid
unnecessary event collections- New metrics for better branch analysis:
Estimated Branch_Mispredict_Costs, Instructions per taken Branch,
Branch Instructions per Taken Branch, etc.- Instruction mix metrics:
Instructions per load, Instructions per store, Instructions per Branch,
Instructions per Call- New Cache metrics:
Bandwidth to L1/L2/L3 caches. L1/L2/L3 misses per kilo instructions.
memory level parallelism- New memory controller metrics:
Normalized memory bandwidth in interval mode, Average memory latency,
Average number of parallel read requests,- 3DXP persistent memory metrics for Cascade Lake:
3dxp read latency, 3dxp read/write bandwidth- Some other useful metrics like Instruction Level Parallelism,
- Various other improvements.
Not all metrics are available on all CPUs. Skylake has best coverage.
Signed-off-by: Andi Kleen
Cc: Kan Liang
Cc: Jiri Olsa
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte what means every time trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gccThe option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying data chunk size to compress. Also at some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:# perf trace -m 2048 --call-graph dwarf -e write -- perf record
101.142 ( 0.004 ms): perf/25821 write(fd: 3, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:107.561 ( 0.005 ms): perf/25821 write(fd: 3, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3, buf: 0x7f1befc83150, count: 184984) = 184984
12920.094 ( 0.155 ms): perf/25821 write(fd: 3, buf: 0x7f1befc02150, count: 261816) = 261816
12920.253 ( 0.093 ms): perf/25821 write(fd: 3, buf: 0x7f1befb81120, count: 170832) = 170832
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':mmap flush: 132096
mmap size 528384BWith that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3, buf: 0x7ff9476c6000, count: 4029760) = 4029760Was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704BA warning about that would be good to have but can be added later,
something like:"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is hereSigned-off-by: Alexey Budankov
Reviewed-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Implement libzstd feature check, NO_LIBZSTD and LIBZSTD_DIR defines to
override Zstd library sources or disable the feature from the command
line:$ make -C tools/perf LIBZSTD_DIR=/path/to/zstd/sources/ clean all
$ make -C tools/perf NO_LIBZSTD=1 clean allAuto detection feature status is reported just before compilation
starts. If your system has some version of the zstd library
preinstalled then the build system finds and uses it during the build.If you still prefer to compile with some other version of zstd library
you have capability to refer the compilation to that version using
LIBZSTD_DIR define.Signed-off-by: Alexey Budankov
Reviewed-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/9b4cd8b0-10a3-1f1e-8d6b-5922a7ca216b@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo