Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

08 Jan, 2015

1 commit

d114960c4 perf callchain: Free callchains when hist entries are deleted ... Browse Code »

Markus reported that "perf top -g" can leak ~300MB per second on his
machine. This is partly because it missed to free callchains when hist
entries are deleted. Fix it.

Reported-by: Markus Trippelsdorf
Signed-off-by: Namhyung Kim
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Markus Trippelsdorf
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20141230053813.GD6081@sejong
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-01-08 22:56:35 +0800

02 Dec, 2014

1 commit

8b7bad58e perf callchain: Support handling complete branch stacks as histograms ... Browse Code »

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.

This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.

Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
c = a / b;
}

__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record or anywhere
else.

This adds the basic code to report:

- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.

The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.

- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:

- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-12-02 07:00:31 +0800

25 Nov, 2014

1 commit

23f0981bb perf callchain: Enable printing the srcline in the history ... Browse Code »

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-11-25 05:03:46 +0800

19 Nov, 2014

1 commit

2989ccaac perf callchain: Use a common function to resolve symbol or name ... Browse Code »

Refactor the duplicated code to resolve the symbol name or
the address of a symbol into a single function.

Used in next patch to add common functionality.

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2014-11-19 23:33:47 +0800

29 Oct, 2014

1 commit

bb871a9c8 perf tools: A thread's machine can be found via thread->mg->machine ... Browse Code »

So stop passing both machine and thread to several thread methods,
reducing function signature length.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2014-10-29 20:32:46 +0800

15 Oct, 2014

1 commit

8f651eae1 perf callchain: Move the callchain_param extern to callchain.h ... Browse Code »

It was lost in hist.h, move it to where it belongs, callchain.h, as
there are places that gets hist.h by means of evsel.h, and since evsel.h
is being untangled from hist.h...

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-0rg7ji1jnbm6q6gj35j37jby@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2014-10-15 04:32:51 +0800

26 Sep, 2014

3 commits

2b9240caf perf tools: Introduce perf_callchain_config() ... Browse Code »

This patch adds support for following config options to ~/.perfconfig file.

[call-graph]
record-mode = dwarf
dump-size = 8192
print-type = fractal
order = callee
threshold = 0.5
print-limit = 128
sort-key = function

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:43:24 +0800
f7f084f4d perf callchain: Move some parser functions to callchain.c ... Browse Code »

And rename record_callchain_parse() to parse_callchain_record_opt() in
accordance to parse_callchain_report_opt().

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:41:57 +0800
72a128aa0 perf tools: Move callchain config from record_opts to callchain_param ... Browse Code »
13

So that all callchain config parameters can be read/written to a single
place. It's a preparation to consolidate handling of all callchain
options.

Reviewed-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1411434104-5307-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-09-26 23:40:33 +0800

27 Jun, 2014

1 commit

a60335ba3 perf tools powerpc: Adjust callchain based on DWARF debug info ... Browse Code »

When saving the callchain on Power, the kernel conservatively saves excess
entries in the callchain. A few of these entries are needed in some cases
but not others. We should use the DWARF debug information to determine
when the entries are needed.

Eg: the value in the link register (LR) is needed only when it holds the
return address of a function. At other times it must be ignored.

If the unnecessary entries are not ignored, we end up with duplicate arcs
in the call-graphs.

Use the DWARF debug information to determine if any callchain entries
should be ignored when building call-graphs.

Callgraph before the patch:

14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--61.12%-- __random
| |
| |--97.15%-- rand
| | do_my_sprintf
| | main
| | generic_start_main.isra.0
| | __libc_start_main
| | 0x0
| |
| --2.85%-- do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--38.88%-- rand
|
|--94.01%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--5.99%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0

Callgraph after the patch:

14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--95.93%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--4.07%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0

TODO: For split-debug info objects like glibc, we can only determine
the call-frame-address only when both .eh_frame and .debug_info
sections are available. We should be able to determin the CFA
even without the .eh_frame section.

Fix suggested by Anton Blanchard.

Thanks to valuable input on DWARF debug information from Ulrich Weigand.

Reported-by: Maynard Johnson
Tested-by: Maynard Johnson
Signed-off-by: Sukadev Bhattiprolu
Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
Signed-off-by: Jiri Olsa

Sukadev Bhattiprolu
2014-06-27 17:14:51 +0800

01 Jun, 2014

2 commits

be1f13e30 perf callchain: Add callchain_cursor_snapshot() ... Browse Code »

The callchain_cursor_snapshot() is for saving current status of the
callchain. It'll be used to accumulate callchain information for each node.

Signed-off-by: Namhyung Kim
Tested-by: Arun Sharma
Tested-by: Rodrigo Campos
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1401335910-16832-9-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa

Namhyung Kim
2014-06-01 20:34:59 +0800
c7405d85d perf tools: Update cpumode for each cumulative entry ... Browse Code »

The cpumode and level in struct addr_localtion was set for a sample
and but updated as cumulative callchains were added. This led to have
non-matching symbol and cpumode in the output.

Update it accordingly based on the fact whether the map is a part of
the kernel or not. This is a reverse of what thread__find_addr_map()
does.

Signed-off-by: Namhyung Kim
Tested-by: Arun Sharma
Tested-by: Rodrigo Campos
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1401335910-16832-7-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa

Namhyung Kim
2014-06-01 20:34:58 +0800

05 May, 2014

1 commit

2c83bc08e perf tools: Move perf_call_graph_mode enum from perf.h ... Browse Code »

Into util/callchain.h header where all callchain related
structures should be.

Acked-by: Arnaldo Carvalho de Melo
Acked-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: Corey Ashford
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1399293219-8732-8-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa

Jiri Olsa
2014-05-05 23:48:10 +0800

22 Apr, 2014

1 commit

cff6bb46d perf callchain: Add generic report parse callchain callback function ... Browse Code »

This takes the parse_callchain_opt function and copies it into the
callchain.c file. Now the c2c tool can use it too without duplicating.

Update perf-report to use the new routine too.

Signed-off-by: Don Zickus
Reviewed-by: Namhyung Kim
Link: http://lkml.kernel.org/r/1396896924-129847-5-git-send-email-dzickus@redhat.com
[ Adding missing braces to multiline if condition ]
Signed-off-by: Jiri Olsa

Don Zickus
2014-04-22 23:39:24 +0800

16 Jan, 2014

1 commit

2dc9fb1a7 perf tools: Factor out sample__resolve_callchain() ... Browse Code »

The report__resolve_callchain() can be shared with perf top code as it
doesn't really depend on the perf report code. Factor it out as
sample__resolve_callchain(). The same goes to the hist_entry__append_
callchain() too.

Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: Arun Sharma
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Rodrigo Campos
Link: http://lkml.kernel.org/r/1389677157-30513-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-01-16 02:32:43 +0800

20 Dec, 2013

1 commit

b40067964 perf tools: Rename 'perf_record_opts' to 'record_opts ... Browse Code »

Reduce typing, functions use class__method convention, so unlikely to
clash with other libraries.

This actually was discussed in the "Link:" referenced message below.

Cc: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/20131112113427.GA4053@ghostprotocols.net
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2013-12-20 01:43:45 +0800

29 Oct, 2013

2 commits

aac898548 Merge branch 'perf/urgent' into perf/core ... Browse Code »

Conflicts:
tools/perf/builtin-record.c
tools/perf/builtin-top.c
tools/perf/util/hist.h

Ingo Molnar
2013-10-29 18:23:32 +0800
09b0fd45f perf record: Split -g and --call-graph ... Browse Code »
13

Splitting -g and --call-graph for record command, so we could use '-g'
with no option.

The '-g' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.

It will be possible to configure unwind method via config file in
upcoming patches.

All current '-g' arguments is overtaken by --call-graph option.

Signed-off-by: Jiri Olsa
Tested-by: David Ahern
Tested-by: Ingo Molnar
Reviewed-by: David Ahern
Acked-by: Ingo Molnar
Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Corey Ashford
Cc: David Ahern
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
[ reordered -g/--call-graph on --help and expanded the man page
according to comments by David Ahern and Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2013-10-29 03:05:59 +0800

22 Oct, 2013

1 commit

e369517ce perf callchain: Convert children list to rbtree ... Browse Code »

Current collapse stage has a scalability problem which can be reproduced
easily with a parallel kernel build.

This is because it needs to traverse every children of callchains
linearly during the collapse/merge stage.

Converting it to a rbtree reduced the overhead significantly.

On my 400MB perf.data file which recorded with make -j32 kernel build:

$ time perf --no-pager report --stdio > /dev/null

before:
real 6m22.073s
user 6m18.683s
sys 0m0.706s

after:
real 0m20.780s
user 0m19.962s
sys 0m0.689s

During the perf report the overhead on append_chain_children went down
from 96.69% to 18.16%:

- 18.16% perf perf [.] append_chain_children
- append_chain_children
- 77.48% append_chain_children
+ 69.79% merge_chain_branch
- 22.96% append_chain_children
+ 67.44% merge_chain_branch
+ 30.15% append_chain_children
+ 2.41% callchain_append
+ 7.25% callchain_append
+ 12.26% callchain_append
+ 10.22% merge_chain_branch
+ 11.58% perf perf [.] dso__find_symbol
+ 8.02% perf perf [.] sort__comm_cmp
+ 5.48% perf libc-2.17.so [.] malloc_consolidate

Reported-by: Linus Torvalds
Signed-off-by: Namhyung Kim
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381468543-25334-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2013-10-22 04:33:23 +0800

30 Aug, 2013

1 commit

07940293b perf callchain: Remove unnecessary validation ... Browse Code »

Now that the sample parsing correctly checks data sizes there is no
reason for it to be done again for callchains.

Signed-off-by: Adrian Hunter
Acked-by: Namhyung Kim
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1377591794-30553-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter
2013-08-30 02:11:29 +0800

22 Jul, 2013

1 commit

99571ab3d perf tools: Support callchain sorting based on addresses ... Browse Code »

With programs with very large functions it can be useful to distinguish
the callgraph nodes on more than just function names. So for example if
you have multiple calls to the same function, it ends up being separate
nodes in the chain.

This patch adds a new key field to the callgraph options, that allows
comparing nodes on functions (as today, default) and addresses.

Longer term it would be nice to also handle src lines, but that would
need more changes and address is a reasonable proxy for it today.

I right now reference the global params, as there was no simple way to
register a params pointer.

Signed-off-by: Andi Kleen
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-0uskktybf0e7wrnoi5e9b9it@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2013-07-22 23:42:18 +0800

12 Dec, 2012

1 commit

75d9a1085 perf record: Export the callchain parsing routine and help ... Browse Code »

Will be used by perf top, that will first setup the symbol system to
deal with callchains and then call these routines to ask the kernel
for callchains.

Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-jg0dh8rmlx7x11e7u7mnasvd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2012-12-12 04:22:14 +0800

01 Sep, 2012

1 commit

3fd44cd40 tools: perf: Fix typo in tools/perf ... Browse Code »

Correct spelling typo in tools/perf.

Signed-off-by: Masanari iida
Signed-off-by: Jiri Kosina

Masanari Iida
2012-09-01 23:49:34 +0800

31 May, 2012

1 commit

472606458 perf callchain: Make callchain cursors TLS ... Browse Code »

perf top -G has a race on callchain cursor between main thread and
display thread. Since the callchain cursors are used locally make them
thread-local data would solve the problem.

Signed-off-by: Namhyung Kim
Reported-by: Sunjin Yang
Suggested-by: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Sunjin Yang
Link: http://lkml.kernel.org/r/1338443007-24857-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2012-05-31 21:47:12 +0800

28 Nov, 2011

1 commit

d20deb64e perf tools: Pass tool context in the the perf_event_ops functions ... Browse Code »

So that we don't need to have that many globals.

Next steps will remove the 'session' pointer, that in most cases is
not needed.

Then we can rename perf_event_ops to 'perf_tool' that better describes
this class hierarchy.

Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-11-28 20:38:56 +0800

30 Jun, 2011

1 commit

d797fdc5c perf tools: Add inverted call graph report support. ... Browse Code »

Add "caller/callee" option to support inverted butterfly report,
in the inverted report (with caller option), the call graph start
from the callee's ancestor. Users can use such view to catch system's
performance bottleneck from a sysprof like view. Using this option
with specified sort order like pid gives us high level view of call
graph statistics.

Also add "-G" alias for inverted call graph.

Signed-off-by: Sam Liao
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: David Ahern
Signed-off-by: Frederic Weisbecker

Sam Liao
2011-06-30 06:24:30 +0800

30 Jan, 2011

1 commit

8115d60c3 perf tools: Kill event_t typedef, use 'union perf_event' instead ... Browse Code »

And move the event_t methods to the perf_event__ too.

No code changes, just namespace consistency.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-30 02:25:37 +0800

23 Jan, 2011

4 commits

529363b76 perf callchain: Don't give arbitrary gender to callchain tree nodes ... Browse Code »

Some little callchain tree nodes shyly asked me if they can have
sisters.

How cute!

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Frederic Weisbecker
Signed-off-by: Arnaldo Carvalho de Melo

Frederic Weisbecker
2011-01-23 05:56:31 +0800
16537f135 perf callchain: Rename register_callchain_param into callchain_register_param ... Browse Code »

To make the callchain API naming more consistent.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Frederic Weisbecker
Signed-off-by: Arnaldo Carvalho de Melo

Frederic Weisbecker
2011-01-23 05:56:31 +0800
f08c3154a perf callchain: Rename cumul_hits into callchain_cumul_hits ... Browse Code »

That makes the callchain API naming more consistent and
reduce potential naming clashes.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Frederic Weisbecker
Signed-off-by: Arnaldo Carvalho de Melo

Frederic Weisbecker
2011-01-23 05:56:31 +0800
1b3a0e959 perf callchain: Feed callchains into a cursor ... Browse Code »

The callchains are fed with an array of a fixed size.
As a result we iterate over each callchains three times:

- 1st to resolve symbols
- 2nd to filter out context boundaries
- 3rd for the insertion into the tree

This also involves some pairs of memory allocation/deallocation
everytime we insert a callchain, for the filtered out array of
addresses and for the array of symbols that comes along.

Instead, feed the callchains through a linked list with persistent
allocations. It brings several pros like:

- Merge the 1st and 2nd iterations in one. That was possible before
but in a way that would involve allocating an array slightly taller
than necessary because we don't know in advance the number of context
boundaries to filter out.

- Much lesser allocations/deallocations. The linked list keeps
persistent empty entries for the next usages and is extendable at
will.

- Makes it easier for multiple sources of callchains to feed a
stacktrace together. This is deemed to pave the way for cfi based
callchains wherein traditional frame pointer based kernel
stacktraces will precede cfi based user ones, producing an overall
callchain which size is hardly predictable. This requirement
makes the static array obsolete and makes a linked list based
iterator a much more flexible fit.

Basic testing on a big perf file containing callchains (~ 176 MB)
has shown a throughput gain of about 11% with perf report.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Frederic Weisbecker
Signed-off-by: Arnaldo Carvalho de Melo

Frederic Weisbecker
2011-01-23 05:56:31 +0800

27 Aug, 2010

2 commits

98ee74a75 Merge branch 'perf/urgent' into perf/core ... Browse Code »

Conflicts:
tools/perf/util/callchain.h

Merge reason:
Fix a non-trivial conflict with latest fixes

Frederic Weisbecker
2010-08-27 08:30:07 +0800
5225c4589 perf: Initialize callchains roots's childen hits ... Browse Code »

Each histogram entry has a callchain root that stores the
callchain samples. However we forgot to initialize the
tracking of children hits of these roots, which then got
random values on their creation.

The root children hits is multiplied by the minimum percentage
of hits provided by the user, and the result becomes the minimum
hits expected from children branches. If the random value due
to the uninitialization is big enough, then this minimum number
of hits can be huge and eventually filter every children branches.

The end result was invisible callchains. All we need to
fix this is to initialize the children hits of the root.

Reported-by: Christoph Hellwig
Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: 2.6.32.x-2.6.35.y

Frederic Weisbecker
2010-08-27 07:51:36 +0800

23 Aug, 2010

3 commits

612d4fd7d perf: Support for callchains merge ... Browse Code »

If we sort the histograms by comm, which is the default,
we need to merge some of them, typically different thread
histograms of a same process, or just same comm. But during
this merge, we forgot to merge callchains.

So imagine we have three threads (tids: 1000, 1001, 1002) that
belong to comm "foo".

tid 1000 got 100 events
tid 1001 got 10 events
tid 1002 got 3 events

Once we merge these histograms to get a per comm result, we'll
finally get:

"foo" got 113 events

The problem is if we merge 1000 and 1001 histograms into 1002, then
the end merge result, wrt callchains, will be only callchains that
belong to 1002.
This is because we haven't handled callchains in the merge. Only those
from one of the threads inside a common comm survive.

It means during this merge, we can lose a lot of callchains.

Fix this by implementing callchains merge and apply it on histograms
that collapse.

Reported-by: Christoph Hellwig
Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras

Frederic Weisbecker
2010-08-23 03:10:35 +0800
6cb8e5616 perf: Rename append_callchain into callchain_append ... Browse Code »

Do that to start a consistant callchain API namespace.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Christoph Hellwig

Frederic Weisbecker
2010-08-23 02:43:51 +0800
d2009c513 perf: Keep track of the max depth of a callchain ... Browse Code »

In order to implement callchains collapsing, we need to keep
track of the maximum depth in a histogram tree of callchains.
This way we'll avoid allocating an arbitrary temporary buffer
size on callchain merge time.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Christoph Hellwig

Frederic Weisbecker
2010-08-23 02:43:17 +0800

22 Jul, 2010

1 commit

9dcdbf7a3 Merge branch 'linus' into perf/core ... Browse Code »

Merge reason: Pick up the latest perf fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2010-07-22 03:43:06 +0800

08 Jul, 2010

2 commits

108553e1f perf: Sync callchains with period based hits ... Browse Code »

Hists have their hits increased by the event period. And this
period based counting is the foundation of all the stats in
perf report.

But callchains still use the raw number of hits, without taking
the period into account. So when we compute the percentage,
absolute based percentages are totally broken, and relative ones
too in the first parent level. Because we pass the number of events
muliplied by their period as the total number of hits to the
callchain filtering, while callchains expect this number to be
the number of raw hits.

perf report -g graph was simply not working, showing no graph unless
the min percent was zero. And even there the percentage of the
branches was always 0. And may be fractal filtering was broken on
the first branch level too.

flat also was broken, but it was hidden because of other breakages.

Anyway fix this by counting using periods on callchains.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras

Frederic Weisbecker
2010-07-08 12:26:56 +0800
97aa10527 perf: Resurrect flat callchains ... Browse Code »

Initialize the callchain radix tree root correctly.

When we walk through the parents, we must stop after the root, but
since it wasn't well initialized, its parent pointer was random.

Also the number of hits was random because uninitialized, hence it
was part of the callchain while the root doesn't contain anything.

This fixes segfaults and percentages followed by empty callchains
while running:

perf report -g flat

Reported-by: Ingo Molnar
Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: 2.6.31.x-2.6.34.x

Frederic Weisbecker
2010-07-08 12:20:15 +0800

05 Jun, 2010

1 commit

41a37e201 perf tools: Make event__preprocess_sample parse the sample ... Browse Code »

Simplifying the tools that were using both in sequence and allowing
upcoming simplifications, such as Arun's patch to sort by cpus.

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2010-06-05 20:35:19 +0800