Eric Lee / smarc-fsl-linux-kernel

27 Mar, 2017

1 commit

5dfa210e4 perf report: Enable sorting by srcline as key ... Browse Code »

Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.

Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.

The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:

~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g address
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--60.31%--hypot +20
| | |
| | |--8.52%--__hypot_finite +273
| | |
| | |--7.32%--__hypot_finite +411
...
--35.34%--_start +4194346
__libc_start_main +241
|
|--6.65%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--2.70%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--1.69%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~

With this patch and `-g srcline` we instead get the following output:

~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g srcline
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--64.02%--hypot
| | |
| | --59.81%--__hypot_finite
| |
| --0.53%--cabs
|
--35.34%--_start
__libc_start_main
|
|--12.48%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~

Signed-off-by: Milian Wolff
Cc: Jiri Olsa
Cc: Yao Jin
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo

Milian Wolff
2017-03-27 23:13:28 +0800

02 Feb, 2017

1 commit

aa33b9b9a perf callchain: Reference count maps ... Browse Code »

If dso__load_kcore frees all of the existing maps, but one has already
been attached to a callchain cursor node, then we can get a SIGSEGV in
any function that happens to try to use this invalid cursor. Use the
existing map refcount mechanism to forestall cleanup of a map until the
cursor iterates past the node.

Signed-off-by: Krister Johansen
Tested-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: stable@kernel.org
Fixes: 84c2cafa2889 ("perf tools: Reference count struct map")
Link: http://lkml.kernel.org/r/20170106062331.GB2707@templeofstupid.com
Signed-off-by: Arnaldo Carvalho de Melo

Krister Johansen
2017-02-02 22:39:09 +0800

07 Dec, 2016

1 commit

571f1eb9b perf callchain: Introduce callchain_cursor__copy() ... Browse Code »

The callchain_cursor__copy() function is to save current callchain
captured by a cursor. It'll be used to keep callchains when switching
to idle task for each cpu.

Signed-off-by: Namhyung Kim
Cc: Andi Kleen
Cc: David Ahern
Cc: Jiri Olsa
Cc: Minchan Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20161206034010.6499-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2016-12-07 23:00:33 +0800

15 Nov, 2016

2 commits

3dd029ef9 perf report: Calculate and return the branch flag counting ... Browse Code »

Create some branch counters in per callchain list entry. Each counter
is for a branch flag. For example, predicted_count counts all the
*predicted* branches. The counters get updated by processing the
callchain cursor nodes.

It also provides functions to retrieve or print the values of counters
in callchain list.

Besides the counting for branch flags, it also counts and returns the
average number of iterations.

Signed-off-by: Yao Jin
Acked-by: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin
Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2016-11-15 00:25:58 +0800
410024dbb perf report: Add branch flag to callchain cursor node ... Browse Code »

Since the branch ip has been added to call stack for easier browsing,
this patch adds more branch information. For example, add a flag to
indicate if this ip is a branch, and also add with the branch flag.

Then we can know if the cursor node represents a branch and know what
the branch flag it has.

The branch history code has a loop detection pass that removes loops. It
would be nice for knowing how many loops were removed then in next
steps, we can compute out the average number of iterations.

For example:

Before remove_loops(),
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800

After remove_loops()
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800

The original entry2 and entry3 are removed. So the number of iterations
(from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1).

iterations = removed number + 1;
average iteractions = Sum(iteractions) / number of samples

This formula ignores other cases, for example, iterations cross multiple
buffers and one buffer contains 2+ loops. Because in practice, it's good
enough.

Signed-off-by: Yao Jin
Acked-by: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin
Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
[ Renamed 'iter' to 'nr_loop_iter' for clarity ]
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2016-11-15 00:15:56 +0800

08 Nov, 2016

1 commit

c56cb33b5 perf callchain: Fixup help/config for no-unwinding ... Browse Code »

Since 841e3558b2d ("perf callchain: Recording 'dwarf' callchains do not
need DWARF unwinding support"), --call-graph dwarf is allowed in 'perf
record' even without unwind support. A couple of other places don't
reflect this yet though: the help text should list dwarf as a valid
record mode and the dump_size config should be respected too.

Signed-off-by: Rabin Vincent
Cc: He Kuang
Fixes: 841e3558b2de ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support")
Link: http://lkml.kernel.org/r/1470837148-7642-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo

Rabin Vincent
2016-11-08 09:13:47 +0800

05 Jul, 2016

1 commit

347ca8780 perf tests: Fix hist accumulation test ... Browse Code »

User's values from .perfconfig could overload the default callchain
setup and cause this test to fail. Making sure the test is using
default callchain_param values.

Signed-off-by: Jiri Olsa
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1467634583-29147-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-07-05 06:39:01 +0800

30 May, 2016

1 commit

792d48b4c perf tools: Per event max-stack settings ... Browse Code »

The tooling counterpart, now it is possible to do:

# perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
# perf evlist -v
sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Using just /max-stack=N/ means /call-graph=fp,max-stack=N/, that should
be further configurable by means of some .perfconfig knob.

Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-05-30 23:41:44 +0800

16 Apr, 2016

1 commit

0883e820a perf record: Export record_opts based callchain parsing helper ... Browse Code »

To be able to call it outside option parsing, like when setting a
default --call-graph parameter in 'perf trace' when just --min-stack is
used.

Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-xay69plylwibpb3l4isrpl1k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-04-16 03:37:17 +0800

15 Apr, 2016

1 commit

91d7b2de3 perf callchain: Start moving away from global per thread cursors ... Browse Code »

The recent perf_evsel__fprintf_callchain() move to evsel.c added several
new symbol requirements to the python binding, for instance:

# perf test -v python
16: Try 'import perf' in python, checking link problems :
--- start ---
test child forked, pid 18030
Traceback (most recent call last):
File "", line 1, in
ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
callchain_cursor
test child finished with -1
---- end ----
Try 'import perf' in python, checking link problems: FAILED!
#

This would require linking against callchain.c to access to the global
callchain_cursor variables.

Since lots of functions already receive as a parameter a
callchain_cursor struct pointer, make that be the case for some more
function so that we can start phasing out usage of yet another global
variable.

Cc: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-djko3097eyg2rn66v2qcqfvn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-04-15 01:48:07 +0800

24 Mar, 2016

1 commit

3938bad44 perf tools: Remove needless 'extern' from function prototypes ... Browse Code »

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-w246stf7ponfamclsai6b9zo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-03-24 02:06:35 +0800

08 Jan, 2016

1 commit

42b276a23 perf top: Decay periods in callchains ... Browse Code »

It missed to decay periods in callchains when decaying hist entries.
This resulted in more than 100 percent overhead in callchains in the
fractal style output.

Reported-by: Arnaldo Carvalho de Melo
Signed-off-by: Namhyung Kim
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1451963160-17196-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2016-01-08 23:37:51 +0800

24 Nov, 2015

1 commit

646a6e846 perf callchain: Add missing parent_val initialization ... Browse Code »

Adding missing parent_val callchain_node initialization.
It's causing segfault in perf top:

$ sudo perf top -g
perf: Segmentation fault
-------- backtrace --------
free_callchain_node(+0x29) in perf [0x4a4b3e]
free_callchain(+0x29) in perf [0x4a5a83]
hist_entry__delete(+0x126) in perf [0x4c6649]
hists__delete_entry(+0x6e) in perf [0x4c66dc]
hists__decay_entries(+0x7d) in perf [0x4c6776]
perf_top__sort_new_samples(+0x7c) in perf [0x436a78]
hist_browser__run(+0xf2) in perf [0x507760]
perf_evsel__hists_browse(+0x1da) in perf [0x507c8d]
perf_evlist__tui_browse_hists(+0x3e) in perf [0x5088cf]
display_thread_tui(+0x7f) in perf [0x437953]
start_thread(+0xc5) in libpthread-2.21.so [0x7f7068fbb555]
__clone(+0x6d) in libc-2.21.so [0x7f7066fc3b9d]
[0x0]

Reported-and-Tested-by: Arnaldo Carvalho de Melo
Signed-off-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: Masami Hiramatsu
Cc: Wang Nan
Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
Link: http://lkml.kernel.org/r/20151121102355.GA17313@krava.local
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2015-11-24 05:31:25 +0800

20 Nov, 2015

5 commits

4b3a32122 perf hists browser: Support flat callchains ... Browse Code »

The flat callchain mode is to print all chains in a single, simple
hierarchy so make it easy to see.

Currently perf report --tui doesn't show flat callchains properly. With
flat callchains, only leaf nodes are added to the final rbtree so it
should show entries in parent nodes. To do that, add parent_val list to
struct callchain_node and show them along with the (normal) val list.

For example, consider following callchains with '-g graph'.

$ perf report -g graph
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
- cpu_startup_entry
28.63% start_secondary
- 11.30% rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel

Before:
$ perf report -g flat
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
28.63% start_secondary
- 11.30% rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel

After:
$ perf report -g flat
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
- 28.63% intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
- 11.30% intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_kernel
x86_64_start_reservations
x86_64_start_kernel

Signed-off-by: Namhyung Kim
Tested-by: Arnaldo Carvalho de Melo
Tested-by: Brendan Gregg
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1447047946-1691-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-11-20 00:19:24 +0800
f2af00869 perf report: Add callchain value option ... Browse Code »

Now -g/--call-graph option supports how to display callchain values.
Possible values are 'percent', 'period' and 'count'. The percent is
same as before and it's the default behavior. The period displays the
raw period value rather than the percentage. The count displays the
number of occurrences.

$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init

$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init

$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idel
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init

Signed-off-by: Namhyung Kim
Acked-by: Brendan Gregg
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1447047946-1691-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-11-20 00:19:23 +0800
5e47f8ff4 perf callchain: Add count fields to struct callchain_node ... Browse Code »

It's to track the count of occurrences of the callchains.

Signed-off-by: Namhyung Kim
Acked-by: Brendan Gregg
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1447047946-1691-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-11-20 00:19:23 +0800
5ab250caf perf callchain: Abstract callchain print function ... Browse Code »

This is a preparation to support for printing other type of callchain
value like count or period.

Signed-off-by: Namhyung Kim
Tested-by: Brendan Gregg
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1447047946-1691-4-git-send-email-namhyung@kernel.org
[ renamed new _sprintf_ operation to _scnprintf_ ]
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-11-20 00:19:22 +0800
26e779245 perf report: Support folded callchain mode on --stdio ... Browse Code »

Add new call chain option (-g) 'folded' to print callchains in a line.
The callchains are separated by semicolons, and preceded by (absolute)
percent values and a space.

For example, the following 20 lines can be printed in 3 lines with the
folded output mode:

$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary

5.88%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel

$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel

This mode is supported only for --stdio now and intended to be used by
some scripts like in FlameGraphs[1]. Support for other UI might be
added later.

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Requested-and-Tested-by: Brendan Gregg
Signed-off-by: Namhyung Kim
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1447047946-1691-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-11-20 00:19:22 +0800

23 Oct, 2015

4 commits

76a26549e perf tools: Improve call graph documents and help messages ... Browse Code »

The --call-graph option is complex so we should provide better guide for
users. Also change help message to be consistent with config option
names. Now perf top will show help like below:

$ perf top --call-graph
Error: option `call-graph' requires a value

Usage: perf top []

--call-graph
setup and enables call-graph (stack chain/backtrace):

record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording ()
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold ()
print_limit: maximum number of call graph entry ()
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)

Default: fp,graph,0.5,caller,function

Requested-by: Ingo Molnar
Signed-off-by: Namhyung Kim
Acked-by: Frederic Weisbecker
Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: Brendan Gregg
Cc: Chandler Carruth
Cc: David Ahern
Cc: Jiri Olsa
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-10-23 03:23:19 +0800
792aeafa8 perf tools: Defaults to 'caller' callchain order only if --children is enabled ... Browse Code »

The caller callchain order is useful with --children option since it can
show 'overview' style output, but other commands which don't use
--children feature like 'perf script' or even 'perf report/top' without
--children are better to keep callee order.

Signed-off-by: Namhyung Kim
Acked-by: Brendan Gregg
Acked-by: Frederic Weisbecker
Acked-by: Ingo Molnar
Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: Chandler Carruth
Cc: David Ahern
Cc: Jiri Olsa
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1445499946-29817-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-10-23 02:40:11 +0800
a2c10d39a perf top: Support call-graph display options also ... Browse Code »

Currently 'perf top --call-graph' option is same as 'perf record'. But
'perf top' also need to receive display options in 'perf report'. To do
that, change parse_callchain_report_opt() to allow record options too.

Now perf top can receive display options like below:

$ perf top --call-graph
Error: option `call-graph' requires a value

Usage: perf top []

--call-graph

setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf lbr, output_type (graph, flat,
fractal, or none), min percent threshold, optional
print limit, callchain order, key (function or
address), add branches

$ perf top --call-graph callee,graph,fp

Signed-off-by: Namhyung Kim
Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: Brendan Gregg
Cc: Chandler Carruth
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1445495330-25416-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-10-23 02:40:02 +0800
21cf62847 perf tools: Move callchain help messages to callchain.h ... Browse Code »

These messages will be used by 'perf top' in the next patch.

Signed-off-by: Namhyung Kim
Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: Brendan Gregg
Cc: Chandler Carruth
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1445495330-25416-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-10-23 02:39:51 +0800

09 Aug, 2015

1 commit

076a30c41 perf callchain: Move option parsing code to util.c ... Browse Code »

Move callchain option parse related code to util.c, to avoid dragging
more object files into the python binding.

Signed-off-by: Kan Liang
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1438890294-33409-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Kan Liang
2015-08-09 01:16:49 +0800

06 Aug, 2015

1 commit

c3a6a8c40 perf tools: Refine parse/config callchain functions ... Browse Code »

Pass global callchain_param into parse_callchain_record_opt and
perf_evsel__config_callgraph as parameter. So we can reuse these
functions to parse/config local param for callchain.

Signed-off-by: Kan Liang
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1438677022-34296-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Kan Liang
2015-08-06 03:42:11 +0800

06 May, 2015

1 commit

3698dab1c perf tools: Move TUI-specific fields out of map_symbol ... Browse Code »

The has_children and unfolded fields don't belong to the struct
map_symbol since they're used by the TUI only. Move those fields out of
map_symbol since the struct is also used by other places.

This will also help to compact the sizeof struct hist_entry.

Signed-off-by: Namhyung Kim
Cc: David Ahern
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1429687101-4360-11-git-send-email-namhyung@kernel.org
Link: http://lkml.kernel.org/r/1430837746-5439-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-05-06 05:13:24 +0800

19 Feb, 2015

1 commit

aad2b21c1 perf tools: Enable LBR call stack support ... Browse Code »

Currently, there are two call chain recording options, fp and dwarf.

Haswell has a new feature that utilizes the existing LBR facility to
record call chains. Kernel side LBR support code provides this as a
third option to record call chains. This patch enables the lbr call
stack support on the tooling side.

LBR call stack has some limitations:

- It reuses current LBR facility, so LBR call stack and branch record
can not be enabled at the same time.

- It is only available for user-space callchains.

However, it also offers some advantages:

- LBR call stack can work on user apps which don't have frame-pointers
or dwarf debug info compiled. It is a good alternative when nothing
else works.

Tested-by: Jiri Olsa
Signed-off-by: Kan Liang
Signed-off-by: Peter Zijlstra (Intel)
Cc: Adrian Hunter
Cc: Anshuman Khandual
Cc: Arnaldo Carvalho de Melo
Cc: Cody P Schafer
Cc: David Ahern
Cc: Don Zickus
Cc: Jacob Shin
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Masanari Iida
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Rodrigo Campos
Cc: Stephane Eranian
Cc: Sukadev Bhattiprolu
Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar

Kan Liang
2015-02-19 00:16:17 +0800

08 Jan, 2015

1 commit

d114960c4 perf callchain: Free callchains when hist entries are deleted ... Browse Code »

Markus reported that "perf top -g" can leak ~300MB per second on his
machine. This is partly because it missed to free callchains when hist
entries are deleted. Fix it.

Reported-by: Markus Trippelsdorf
Signed-off-by: Namhyung Kim
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Markus Trippelsdorf
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20141230053813.GD6081@sejong
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-01-08 22:56:35 +0800

02 Dec, 2014

1 commit

8b7bad58e perf callchain: Support handling complete branch stacks as histograms ... Browse Code »

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.

This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.

Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
c = a / b;
}

__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record or anywhere
else.

This adds the basic code to report:

- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.

The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.

- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:

- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-12-02 07:00:31 +0800

25 Nov, 2014

1 commit

23f0981bb perf callchain: Enable printing the srcline in the history ... Browse Code »

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-11-25 05:03:46 +0800

19 Nov, 2014

1 commit

2989ccaac perf callchain: Use a common function to resolve symbol or name ... Browse Code »

Refactor the duplicated code to resolve the symbol name or
the address of a symbol into a single function.

Used in next patch to add common functionality.

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-11-19 23:33:47 +0800

29 Oct, 2014

1 commit

bb871a9c8 perf tools: A thread's machine can be found via thread->mg->machine ... Browse Code »

So stop passing both machine and thread to several thread methods,
reducing function signature length.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2014-10-29 20:32:46 +0800

15 Oct, 2014

1 commit

8f651eae1 perf callchain: Move the callchain_param extern to callchain.h ... Browse Code »

It was lost in hist.h, move it to where it belongs, callchain.h, as
there are places that gets hist.h by means of evsel.h, and since evsel.h
is being untangled from hist.h...

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-0rg7ji1jnbm6q6gj35j37jby@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2014-10-15 04:32:51 +0800

26 Sep, 2014

3 commits

2b9240caf perf tools: Introduce perf_callchain_config() ... Browse Code »

This patch adds support for following config options to ~/.perfconfig file.

[call-graph]
record-mode = dwarf
dump-size = 8192
print-type = fractal
order = callee
threshold = 0.5
print-limit = 128
sort-key = function

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:43:24 +0800
f7f084f4d perf callchain: Move some parser functions to callchain.c ... Browse Code »

And rename record_callchain_parse() to parse_callchain_record_opt() in
accordance to parse_callchain_report_opt().

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:41:57 +0800
72a128aa0 perf tools: Move callchain config from record_opts to callchain_param ... Browse Code »

So that all callchain config parameters can be read/written to a single
place. It's a preparation to consolidate handling of all callchain
options.

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:40:33 +0800

27 Jun, 2014

1 commit

a60335ba3 perf tools powerpc: Adjust callchain based on DWARF debug info ... Browse Code »

When saving the callchain on Power, the kernel conservatively saves excess
entries in the callchain. A few of these entries are needed in some cases
but not others. We should use the DWARF debug information to determine
when the entries are needed.

Eg: the value in the link register (LR) is needed only when it holds the
return address of a function. At other times it must be ignored.

If the unnecessary entries are not ignored, we end up with duplicate arcs
in the call-graphs.

Use the DWARF debug information to determine if any callchain entries
should be ignored when building call-graphs.

Callgraph before the patch:

14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--61.12%-- __random
| |
| |--97.15%-- rand
| | do_my_sprintf
| | main
| | generic_start_main.isra.0
| | __libc_start_main
| | 0x0
| |
| --2.85%-- do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--38.88%-- rand
|
|--94.01%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--5.99%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0

Callgraph after the patch:

14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--95.93%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--4.07%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0

TODO: For split-debug info objects like glibc, we can only determine
the call-frame-address only when both .eh_frame and .debug_info
sections are available. We should be able to determin the CFA
even without the .eh_frame section.

Fix suggested by Anton Blanchard.

Thanks to valuable input on DWARF debug information from Ulrich Weigand.

Reported-by: Maynard Johnson
Tested-by: Maynard Johnson
Signed-off-by: Sukadev Bhattiprolu
Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
Signed-off-by: Jiri Olsa

Sukadev Bhattiprolu
2014-06-27 17:14:51 +0800

01 Jun, 2014

2 commits

be1f13e30 perf callchain: Add callchain_cursor_snapshot() ... Browse Code »

The callchain_cursor_snapshot() is for saving current status of the
callchain. It'll be used to accumulate callchain information for each node.

Signed-off-by: Namhyung Kim
Tested-by: Arun Sharma
Tested-by: Rodrigo Campos
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1401335910-16832-9-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa

Namhyung Kim
2014-06-01 20:34:59 +0800
c7405d85d perf tools: Update cpumode for each cumulative entry ... Browse Code »

The cpumode and level in struct addr_localtion was set for a sample
and but updated as cumulative callchains were added. This led to have
non-matching symbol and cpumode in the output.

Update it accordingly based on the fact whether the map is a part of
the kernel or not. This is a reverse of what thread__find_addr_map()
does.

Signed-off-by: Namhyung Kim
Tested-by: Arun Sharma
Tested-by: Rodrigo Campos
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1401335910-16832-7-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa

Namhyung Kim
2014-06-01 20:34:58 +0800

05 May, 2014

1 commit

2c83bc08e perf tools: Move perf_call_graph_mode enum from perf.h ... Browse Code »

Into util/callchain.h header where all callchain related
structures should be.

Acked-by: Arnaldo Carvalho de Melo
Acked-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: Corey Ashford
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1399293219-8732-8-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa

Jiri Olsa
2014-05-05 23:48:10 +0800

22 Apr, 2014

1 commit

cff6bb46d perf callchain: Add generic report parse callchain callback function ... Browse Code »

This takes the parse_callchain_opt function and copies it into the
callchain.c file. Now the c2c tool can use it too without duplicating.

Update perf-report to use the new routine too.

Signed-off-by: Don Zickus
Reviewed-by: Namhyung Kim
Link: http://lkml.kernel.org/r/1396896924-129847-5-git-send-email-dzickus@redhat.com
[ Adding missing braces to multiline if condition ]
Signed-off-by: Jiri Olsa

Don Zickus
2014-04-22 23:39:24 +0800