27 Mar, 2017

1 commit

  • Often it is interesting to know how costly a given source line is in
    total. Previously, one had to build these sums manually based on all
    addresses that pointed to the same source line. This patch introduces
    srcline as a sort key, which will do the aggregation for us.

    Paired with the recent addition of showing inline frames, this makes
    perf report much more useful for many C++ workloads.

    The following shows the new feature in action. First, let's show the
    status quo output when we sort by address. The result contains many hist
    entries that generate the same output:

    ~~~~~~~~~~~~~~~~
    $ perf report --stdio --inline -g address
    # Children Self Command Shared Object Symbol
    # ........ ........ ............ ................... .........................................
    #
    99.89% 35.34% cpp-inlining cpp-inlining [.] main
    |
    |--64.55%--main complex:655
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/complex:664 (inline)
    | |
    | |--60.31%--hypot +20
    | | |
    | | |--8.52%--__hypot_finite +273
    | | |
    | | |--7.32%--__hypot_finite +411
    ...
    --35.34%--_start +4194346
    __libc_start_main +241
    |
    |--6.65%--main random.tcc:3326
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
    |
    |--2.70%--main random.tcc:3326
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
    |
    |--1.69%--main random.tcc:3326
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
    ...
    ~~~~~~~~~~~~~~~~

    With this patch and `-g srcline` we instead get the following output:

    ~~~~~~~~~~~~~~~~
    $ perf report --stdio --inline -g srcline
    # Children Self Command Shared Object Symbol
    # ........ ........ ............ ................... .........................................
    #
    99.89% 35.34% cpp-inlining cpp-inlining [.] main
    |
    |--64.55%--main complex:655
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/complex:664 (inline)
    | |
    | |--64.02%--hypot
    | | |
    | | --59.81%--__hypot_finite
    | |
    | --0.53%--cabs
    |
    --35.34%--_start
    __libc_start_main
    |
    |--12.48%--main random.tcc:3326
    | /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
    | /usr/include/c++/6.3.1/bits/random.h:185 (inline)
    ...
    ~~~~~~~~~~~~~~~~
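
The aggregation that the srcline sort key performs can be sketched roughly as follows. This is a Python illustration, not perf's C code: the addresses are made up, and only the three random.tcc:3326 costs visible in the address-sorted output above are used (the real output elides further entries, which is why it reports 12.48% rather than the sum computed here).

```python
from collections import defaultdict

# (address, source line, cost in percent). Addresses are hypothetical;
# the costs are copied from the entries shown in the output above.
samples = [
    (0x400A10, "random.tcc:3326", 6.65),
    (0x400A1C, "random.tcc:3326", 2.70),
    (0x400A2B, "random.tcc:3326", 1.69),
    (0x400B40, "complex:655", 64.55),
]


def aggregate(samples, key):
    """Sum the cost of all samples that share the same sort key."""
    totals = defaultdict(float)
    for address, srcline, cost in samples:
        totals[srcline if key == "srcline" else address] += cost
    return dict(totals)


by_address = aggregate(samples, "address")   # one entry per address
by_srcline = aggregate(samples, "srcline")   # one entry per source line

print(len(by_address), len(by_srcline))
print(round(by_srcline["random.tcc:3326"], 2))
```

With the address key every address stays a separate entry; with the srcline key the three entries collapse into one.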

    Signed-off-by: Milian Wolff
    Cc: Jiri Olsa
    Cc: Yao Jin
    Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Milian Wolff
     

02 Feb, 2017

1 commit

  • If dso__load_kcore frees all of the existing maps, but one has already
    been attached to a callchain cursor node, then we can get a SIGSEGV in
    any function that happens to try to use this invalid cursor. Use the
    existing map refcount mechanism to forestall cleanup of a map until the
    cursor iterates past the node.
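
A minimal sketch of the idea, assuming a simple get/put reference count. The class and method names below are hypothetical Python stand-ins, not perf's actual struct map API:

```python
class Map:
    """Hypothetical stand-in for a reference-counted map."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1        # reference held by the owner of the map list
        self.freed = False

    def get(self):
        self.refcnt += 1
        return self

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # stands in for the real cleanup


class CursorNode:
    """Callchain cursor node that pins the map it points at."""

    def __init__(self, ip, map_):
        self.ip = ip
        self.map = map_.get()

    def release(self):
        self.map.put()
        self.map = None


kmap = Map("[kernel.kallsyms]")
node = CursorNode(0xFFFFFFFF81000000, kmap)

kmap.put()                       # the owner drops its maps, as in the kcore case
still_valid = not kmap.freed     # the node's reference keeps the map alive

node.release()                   # the cursor has iterated past the node
```

Only after the node releases its reference does the count reach zero and the map get cleaned up.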

    Signed-off-by: Krister Johansen
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: stable@kernel.org
    Fixes: 84c2cafa2889 ("perf tools: Reference count struct map")
    Link: http://lkml.kernel.org/r/20170106062331.GB2707@templeofstupid.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Krister Johansen
     

07 Dec, 2016

1 commit

  • The callchain_cursor__copy() function saves the current callchain
    captured by a cursor. It will be used to keep callchains when switching
    to the idle task on each CPU.

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Minchan Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161206034010.6499-3-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

15 Nov, 2016

2 commits

  • Create branch counters in each callchain list entry, one counter per
    branch flag. For example, predicted_count counts all the *predicted*
    branches. The counters are updated while processing the callchain
    cursor nodes.

    Also provide functions to retrieve or print the counter values in a
    callchain list.

    Besides counting branch flags, also count and return the average
    number of iterations.

    Signed-off-by: Yao Jin
    Acked-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linux-kernel@vger.kernel.org
    Cc: Yao Jin
    Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • Since the branch ip has been added to the call stack for easier
    browsing, this patch adds more branch information: a flag that
    indicates whether the ip is a branch, plus the branch flags themselves.

    With that we can tell whether a cursor node represents a branch and
    which branch flags it has.

    The branch history code has a loop detection pass that removes loops.
    It would be nice to know how many loops were removed, so that in later
    steps we can compute the average number of iterations.

    For example:

    Before remove_loops(),
    entry0: from = 0x100, to = 0x200
    entry1: from = 0x300, to = 0x250
    entry2: from = 0x300, to = 0x250
    entry3: from = 0x300, to = 0x250
    entry4: from = 0x700, to = 0x800

    After remove_loops()
    entry0: from = 0x100, to = 0x200
    entry1: from = 0x300, to = 0x250
    entry2: from = 0x700, to = 0x800

    The original entry2 and entry3 are removed, so the number of iterations
    for (from = 0x300, to = 0x250) equals the number of removed entries + 1
    (2 + 1 = 3).

    iterations = removed number + 1;
    average iterations = Sum(iterations) / number of samples

    This formula ignores other cases, for example iterations that span
    multiple buffers, or one buffer that contains 2+ loops, because in
    practice it is good enough.
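
The counting rule can be illustrated with a small Python sketch. It is deliberately simplified: it only collapses immediately repeated identical (from, to) entries, which is enough to reproduce the example above, and it is not the loop detection that perf itself implements. The function names mirror the text but are otherwise assumptions.

```python
def remove_loops(entries):
    """Collapse runs of identical (from, to) branch entries.

    Returns the de-duplicated entries and, for each kept entry, the number
    of iterations it stands for (removed repeats + 1).
    """
    kept, iterations = [], []
    for entry in entries:
        if kept and kept[-1] == entry:
            iterations[-1] += 1      # one more repeat was removed
        else:
            kept.append(entry)
            iterations.append(1)
    return kept, iterations


def average_iterations(per_sample_iterations):
    """Sum(iterations) / number of samples."""
    return sum(per_sample_iterations) / len(per_sample_iterations)


before = [
    (0x100, 0x200),
    (0x300, 0x250),
    (0x300, 0x250),
    (0x300, 0x250),
    (0x700, 0x800),
]
after, iterations = remove_loops(before)
```

Here `after` holds the three entries listed under "After remove_loops()", and the entry (0x300, 0x250) carries an iteration count of 3.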

    Signed-off-by: Yao Jin
    Acked-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linux-kernel@vger.kernel.org
    Cc: Yao Jin
    Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
    [ Renamed 'iter' to 'nr_loop_iter' for clarity ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

08 Nov, 2016

1 commit

  • Since 841e3558b2d ("perf callchain: Recording 'dwarf' callchains do not
    need DWARF unwinding support"), --call-graph dwarf is allowed in 'perf
    record' even without unwind support. A couple of other places don't
    reflect this yet though: the help text should list dwarf as a valid
    record mode and the dump_size config should be respected too.

    Signed-off-by: Rabin Vincent
    Cc: He Kuang
    Fixes: 841e3558b2de ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support")
    Link: http://lkml.kernel.org/r/1470837148-7642-1-git-send-email-rabin.vincent@axis.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Rabin Vincent
     

05 Jul, 2016

1 commit

  • User's values from .perfconfig could override the default callchain
    setup and cause this test to fail. Make sure the test uses the
    default callchain_param values.

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1467634583-29147-3-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

30 May, 2016

1 commit

  • This is the tooling counterpart. It is now possible to do:

    # perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
    # perf evlist -v
    sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
    cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
    cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
    # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

    Using just /max-stack=N/ means /call-graph=fp,max-stack=N/; this should
    be further configurable by means of some .perfconfig knob.
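
The per-event term syntax and the fp default can be illustrated with a small Python sketch. The parser below is hypothetical and far simpler than perf's real event parser; it only shows how a bare max-stack term implies call-graph=fp, as stated above.

```python
def parse_event(spec):
    """Split 'name/term=value,.../' into (name, terms)."""
    name, _, rest = spec.partition("/")
    terms = {}
    for term in rest.rstrip("/").split(","):
        if term:
            key, _, value = term.partition("=")
            terms[key] = value
    # Per the text above: max-stack on its own implies frame-pointer chains.
    if "max-stack" in terms and "call-graph" not in terms:
        terms["call-graph"] = "fp"
    return name, terms


switch = parse_event("sched:sched_switch/max-stack=10/")
cycles = parse_event("cycles/call-graph=dwarf,max-stack=4/")
```

The first event ends up with call-graph=fp added, while the second keeps its explicit dwarf setting.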

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

16 Apr, 2016

1 commit


15 Apr, 2016

1 commit

  • The recent perf_evsel__fprintf_callchain() move to evsel.c added several
    new symbol requirements to the python binding, for instance:

    # perf test -v python
    16: Try 'import perf' in python, checking link problems :
    --- start ---
    test child forked, pid 18030
    Traceback (most recent call last):
    File "", line 1, in
    ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
    callchain_cursor
    test child finished with -1
    ---- end ----
    Try 'import perf' in python, checking link problems: FAILED!
    #

    This would require linking against callchain.c to access the global
    callchain_cursor variable.

    Since lots of functions already receive a callchain_cursor struct
    pointer as a parameter, do the same for some more functions so that we
    can start phasing out usage of yet another global variable.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-djko3097eyg2rn66v2qcqfvn@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

24 Mar, 2016

1 commit


08 Jan, 2016

1 commit

  • Periods in callchains were not decayed when hist entries were decayed.
    This resulted in more than 100 percent overhead in callchains in the
    fractal style output.
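
Why the percentages exceed 100 can be seen in a small Python sketch. The 7/8 decay factor and the period values are assumptions chosen for illustration, and the class is a hypothetical stand-in for a hist entry with callchain periods:

```python
def decay(value):
    # Assumed decay factor for this sketch: keep 7/8 of the old value.
    return value * 7 // 8


class Entry:
    """Hypothetical hist entry with the periods of its callchain branches."""

    def __init__(self, period, chain_periods):
        self.period = period
        self.chain_periods = list(chain_periods)

    def decay_entry_only(self):
        # Behaviour before the fix: callchain periods keep their old values.
        self.period = decay(self.period)

    def decay_with_callchains(self):
        # Behaviour after the fix: callchains decay together with the entry.
        self.period = decay(self.period)
        self.chain_periods = [decay(p) for p in self.chain_periods]

    def chain_percent(self):
        return 100.0 * sum(self.chain_periods) / self.period


buggy = Entry(800, [500, 300])
buggy.decay_entry_only()

fixed = Entry(800, [500, 300])
fixed.decay_with_callchains()
```

In the first case the callchains still sum to 800 against an entry period of 700, so they add up to more than 100 percent; in the second case both sides shrink together.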

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1451963160-17196-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

24 Nov, 2015

1 commit

  • Add the missing parent_val callchain_node initialization.
    Its absence causes a segfault in perf top:

    $ sudo perf top -g
    perf: Segmentation fault
    -------- backtrace --------
    free_callchain_node(+0x29) in perf [0x4a4b3e]
    free_callchain(+0x29) in perf [0x4a5a83]
    hist_entry__delete(+0x126) in perf [0x4c6649]
    hists__delete_entry(+0x6e) in perf [0x4c66dc]
    hists__decay_entries(+0x7d) in perf [0x4c6776]
    perf_top__sort_new_samples(+0x7c) in perf [0x436a78]
    hist_browser__run(+0xf2) in perf [0x507760]
    perf_evsel__hists_browse(+0x1da) in perf [0x507c8d]
    perf_evlist__tui_browse_hists(+0x3e) in perf [0x5088cf]
    display_thread_tui(+0x7f) in perf [0x437953]
    start_thread(+0xc5) in libpthread-2.21.so [0x7f7068fbb555]
    __clone(+0x6d) in libc-2.21.so [0x7f7066fc3b9d]
    [0x0]

    Reported-and-Tested-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jiri Olsa
    Acked-by: Namhyung Kim
    Cc: Masami Hiramatsu
    Cc: Wang Nan
    Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
    Link: http://lkml.kernel.org/r/20151121102355.GA17313@krava.local
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

20 Nov, 2015

5 commits

  • The flat callchain mode prints all chains in a single, simple
    hierarchy so that they are easy to read.

    Currently perf report --tui doesn't show flat callchains properly. With
    flat callchains, only leaf nodes are added to the final rbtree, so each
    leaf should also show the entries of its parent nodes. To do that, add a
    parent_val list to struct callchain_node and show it along with the
    (normal) val list.

    For example, consider the following callchains with '-g graph'.

    $ perf report -g graph
    - 39.93% swapper [kernel.vmlinux] [k] intel_idle
    intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    - cpu_startup_entry
    28.63% start_secondary
    - 11.30% rest_init
    start_kernel
    x86_64_start_reservations
    x86_64_start_kernel

    Before:
    $ perf report -g flat
    - 39.93% swapper [kernel.vmlinux] [k] intel_idle
    28.63% start_secondary
    - 11.30% rest_init
    start_kernel
    x86_64_start_reservations
    x86_64_start_kernel

    After:
    $ perf report -g flat
    - 39.93% swapper [kernel.vmlinux] [k] intel_idle
    - 28.63% intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    start_secondary
    - 11.30% intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    start_kernel
    x86_64_start_reservations
    x86_64_start_kernel
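
The idea of carrying parent values down to each leaf can be sketched in Python as follows. The Node class is a hypothetical stand-in for struct callchain_node, and the tree is the one from the '-g graph' listing above:

```python
class Node:
    """Hypothetical callchain node: own symbols plus child nodes."""

    def __init__(self, val, percent=None, children=()):
        self.val = list(val)          # symbols stored in this node
        self.percent = percent        # only set on leaves in this sketch
        self.children = list(children)


def flatten(node, parent_val=()):
    """Yield (percent, full chain) per leaf, prepending the parents' values."""
    chain = tuple(parent_val) + tuple(node.val)
    if not node.children:
        yield node.percent, list(chain)
    for child in node.children:
        yield from flatten(child, chain)


root = Node(
    ["intel_idle", "cpuidle_enter_state", "cpuidle_enter",
     "call_cpuidle", "cpu_startup_entry"],
    children=[
        Node(["start_secondary"], 28.63),
        Node(["rest_init", "start_kernel",
              "x86_64_start_reservations", "x86_64_start_kernel"], 11.30),
    ],
)

flat = list(flatten(root))
for percent, chain in flat:
    print("{:.2f}% {}".format(percent, " <- ".join(chain)))
```

Each leaf now yields a complete chain that starts at intel_idle instead of only its own symbols.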

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Brendan Gregg
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1447047946-1691-8-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The -g/--call-graph option now supports choosing how callchain values
    are displayed. Possible values are 'percent', 'period' and 'count'.
    'percent' is the same as before and remains the default. 'period'
    displays the raw period value rather than the percentage. 'count'
    displays the number of occurrences.

    $ perf report --no-children --stdio -g percent
    ...
    39.93% swapper [kernel.vmlinux] [k] intel_idle
    |
    ---intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    |
    |--28.63%-- start_secondary
    |
    --11.30%-- rest_init

    $ perf report --no-children --show-total-period --stdio -g period
    ...
    39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
    |
    ---intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    |
    |--9334403-- start_secondary
    |
    --3684302-- rest_init

    $ perf report --no-children --show-nr-samples --stdio -g count
    ...
    39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
    |
    ---intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    |
    |--57-- start_secondary
    |
    --23-- rest_init

    Signed-off-by: Namhyung Kim
    Acked-by: Brendan Gregg
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1447047946-1691-6-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • This tracks the number of occurrences of each callchain.

    Signed-off-by: Namhyung Kim
    Acked-by: Brendan Gregg
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1447047946-1691-5-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • This is preparation for printing other types of callchain values,
    such as count or period.

    Signed-off-by: Namhyung Kim
    Tested-by: Brendan Gregg
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1447047946-1691-4-git-send-email-namhyung@kernel.org
    [ renamed new _sprintf_ operation to _scnprintf_ ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Add new call chain option (-g) 'folded' to print callchains in a line.
    The callchains are separated by semicolons, and preceded by (absolute)
    percent values and a space.

    For example, the following 20 lines can be printed in 3 lines with the
    folded output mode:

    $ perf report -g flat --no-children | grep -v ^# | head -20
    60.48% swapper [kernel.vmlinux] [k] intel_idle
    54.60%
    intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    start_secondary

    5.88%
    intel_idle
    cpuidle_enter_state
    cpuidle_enter
    call_cpuidle
    cpu_startup_entry
    rest_init
    start_kernel
    x86_64_start_reservations
    x86_64_start_kernel

    $ perf report -g folded --no-children | grep -v ^# | head -3
    60.48% swapper [kernel.vmlinux] [k] intel_idle
    54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
    5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel

    This mode is supported only for --stdio now and intended to be used by
    some scripts like in FlameGraphs[1]. Support for other UI might be
    added later.

    [1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
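
The folded line format described above (percent value, a space, then the chain joined by semicolons) is easy to produce and to parse back in a script. A minimal Python sketch, with hypothetical helper names:

```python
def fold(percent, chain):
    """Format one callchain as a single folded line."""
    return "{:.2f}% {}".format(percent, ";".join(chain))


def unfold(line):
    """Parse a folded line back into (percent, chain)."""
    percent, _, chain = line.partition(" ")
    return float(percent.rstrip("%")), chain.split(";")


chain = ["intel_idle", "cpuidle_enter_state", "cpuidle_enter",
         "call_cpuidle", "cpu_startup_entry", "start_secondary"]
line = fold(54.60, chain)
print(line)
```

The printed line matches the 54.60% line of the folded output shown above, and `unfold(line)` recovers the percentage and the list of symbols.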

    Requested-and-Tested-by: Brendan Gregg
    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1447047946-1691-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

23 Oct, 2015

4 commits

  • The --call-graph option is complex, so we should provide a better guide
    for users. Also change the help message to be consistent with the
    config option names. Now perf top will show help like below:

    $ perf top --call-graph
    Error: option `call-graph' requires a value

    Usage: perf top []

    --call-graph
    setup and enables call-graph (stack chain/backtrace):

    record_mode: call graph recording mode (fp|dwarf|lbr)
    record_size: if record_mode is 'dwarf', max size of stack recording ()
    default: 8192 (bytes)
    print_type: call graph printing style (graph|flat|fractal|none)
    threshold: minimum call graph inclusion threshold ()
    print_limit: maximum number of call graph entry ()
    order: call graph order (caller|callee)
    sort_key: call graph sort key (function|address)
    branch: include last branch info to call graph (branch)

    Default: fp,graph,0.5,caller,function
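
How such a comma-separated value maps onto the fields listed in the help text can be sketched in Python. This is a simplified, hypothetical parser and not perf's implementation; in particular, the rules for telling the numeric fields apart (a number right after 'dwarf' is record_size, the first other number is threshold, the next one is print_limit) are assumptions of this sketch.

```python
RECORD_MODES = {"fp", "dwarf", "lbr"}
PRINT_TYPES = {"graph", "flat", "fractal", "none"}
ORDERS = {"caller", "callee"}
SORT_KEYS = {"function", "address"}

# Defaults taken from the help text: fp,graph,0.5,caller,function
DEFAULTS = {
    "record_mode": "fp",
    "print_type": "graph",
    "threshold": 0.5,
    "order": "caller",
    "sort_key": "function",
}


def parse_call_graph(value):
    """Classify each comma-separated token into one of the help-text fields."""
    opts = dict(DEFAULTS)
    prev = None
    seen_threshold = False
    for tok in value.split(","):
        if tok in RECORD_MODES:
            opts["record_mode"] = tok
        elif tok in PRINT_TYPES:
            opts["print_type"] = tok
        elif tok in ORDERS:
            opts["order"] = tok
        elif tok in SORT_KEYS:
            opts["sort_key"] = tok
        elif tok == "branch":
            opts["branch"] = True
        elif prev == "dwarf":
            opts["record_size"] = int(tok)      # stack dump size in bytes
        elif not seen_threshold:
            opts["threshold"] = float(tok)
            seen_threshold = True
        else:
            opts["print_limit"] = int(tok)
        prev = tok
    return opts


default_opts = parse_call_graph("fp,graph,0.5,caller,function")
dwarf_opts = parse_call_graph("dwarf,8192,flat,1.5,10,callee")
```

Unknown tokens raise a ValueError from the numeric conversion, which is a crude but sufficient error path for a sketch.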

    Requested-by: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Acked-by: Frederic Weisbecker
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Brendan Gregg
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The caller callchain order is useful with the --children option since
    it can show 'overview' style output. Commands that don't use the
    --children feature, such as 'perf script' or even 'perf report/top'
    without --children, are better off keeping the callee order.

    Signed-off-by: Namhyung Kim
    Acked-by: Brendan Gregg
    Acked-by: Frederic Weisbecker
    Acked-by: Ingo Molnar
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445499946-29817-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Currently the 'perf top --call-graph' option is the same as in 'perf
    record'. But 'perf top' also needs to accept the display options of
    'perf report'. To do that, change parse_callchain_report_opt() to allow
    record options too.

    Now perf top can receive display options like below:

    $ perf top --call-graph
    Error: option `call-graph' requires a value

    Usage: perf top []

    --call-graph

    setup and enables call-graph (stack chain/backtrace)
    recording: fp dwarf lbr, output_type (graph, flat,
    fractal, or none), min percent threshold, optional
    print limit, callchain order, key (function or
    address), add branches

    $ perf top --call-graph callee,graph,fp

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Brendan Gregg
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445495330-25416-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • These messages will be used by 'perf top' in the next patch.

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Brendan Gregg
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445495330-25416-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

09 Aug, 2015

1 commit

  • Move the callchain option parsing code to util.c, to avoid dragging
    more object files into the python binding.

    Signed-off-by: Kan Liang
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1438890294-33409-1-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

06 Aug, 2015

1 commit

  • Pass the global callchain_param into parse_callchain_record_opt and
    perf_evsel__config_callgraph as a parameter, so that we can reuse these
    functions to parse/configure a local callchain param.

    Signed-off-by: Kan Liang
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1438677022-34296-3-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

06 May, 2015

1 commit

  • The has_children and unfolded fields don't belong to the struct
    map_symbol since they're used by the TUI only. Move those fields out of
    map_symbol since the struct is also used by other places.

    This will also help to compact the sizeof struct hist_entry.

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1429687101-4360-11-git-send-email-namhyung@kernel.org
    Link: http://lkml.kernel.org/r/1430837746-5439-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

19 Feb, 2015

1 commit

  • Currently, there are two call chain recording options, fp and dwarf.

    Haswell has a new feature that utilizes the existing LBR facility to
    record call chains. Kernel side LBR support code provides this as a
    third option to record call chains. This patch enables the lbr call
    stack support on the tooling side.

    LBR call stack has some limitations:

    - It reuses current LBR facility, so LBR call stack and branch record
    can not be enabled at the same time.

    - It is only available for user-space callchains.

    However, it also offers some advantages:

    - LBR call stack can work on user apps which don't have frame-pointers
    or dwarf debug info compiled. It is a good alternative when nothing
    else works.

    Tested-by: Jiri Olsa
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Adrian Hunter
    Cc: Anshuman Khandual
    Cc: Arnaldo Carvalho de Melo
    Cc: Cody P Schafer
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Jacob Shin
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masanari Iida
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Rodrigo Campos
    Cc: Stephane Eranian
    Cc: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     

08 Jan, 2015

1 commit

  • Markus reported that "perf top -g" can leak ~300MB per second on his
    machine. This is partly because callchains were not freed when hist
    entries are deleted. Fix it.

    Reported-by: Markus Trippelsdorf
    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Markus Trippelsdorf
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20141230053813.GD6081@sejong
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

02 Dec, 2014

1 commit

  • Currently branch stacks can be only shown as edge histograms for
    individual branches. I never found this display particularly useful.

    This implements an alternative mode that creates histograms over
    complete branch traces, instead of individual branches, similar to how
    normal callgraphs are handled. This is done by putting it in front of
    the normal callgraph and then using the normal callgraph histogram
    infrastructure to unify them.

    This way in complex functions we can understand the control flow that
    led to a particular sample, and may even see some control flow in the
    caller for short functions.

    Example (simplified; for such simple code this is usually not needed).
    Please run this after the whole patchkit is in: at this point in the
    patch order there is no --branch-history yet, it will be added in a
    patch after this one:

    tcall.c:

    volatile a = 10000, b = 100000, c;

    __attribute__((noinline)) f2()
    {
    c = a / b;
    }

    __attribute__((noinline)) f1()
    {
    f2();
    f2();
    }
    main()
    {
    int i;
    for (i = 0; i < 1000000; i++)
    f1();
    }

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --no-children --branch-history
    ...
    54.91% tcall.c:6 [.] f2 tcall
    |
    |--65.53%-- f2 tcall.c:5
    | |
    | |--70.83%-- f1 tcall.c:11
    | | f1 tcall.c:10
    | | main tcall.c:18
    | | main tcall.c:18
    | | main tcall.c:17
    | | main tcall.c:17
    | | f1 tcall.c:13
    | | f1 tcall.c:13
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:12
    | | f1 tcall.c:12
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:11
    | |
    | --29.17%-- f1 tcall.c:12
    | f1 tcall.c:12
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:11
    | f1 tcall.c:10
    | main tcall.c:18
    | main tcall.c:18
    | main tcall.c:17
    | main tcall.c:17
    | f1 tcall.c:13
    | f1 tcall.c:13
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:12

    The default output is unchanged.

    This is only implemented in perf report, no change to record or anywhere
    else.

    This adds the basic code to report:

    - add a new "branch" option to the -g option parser to enable this mode
    - when the flag is set include the LBR into the callstack in machine.c.

    The rest of the history code is unchanged and doesn't know the
    difference between LBR entry and normal call entry.

    - detect overlaps with the callchain
    - remove small loop duplicates in the LBR

    Current limitations:

    - The LBR flags (mispredict etc.) are not shown in the history
    and LBR entries have no special marker.
    - It would be nice if annotate marked the LBR entries somehow
    (e.g. with arrows)

    v2: Various fixes.
    v3: Merge further patches into this one. Fix white space.
    v4: Improve manpage. Address review feedback.
    v5: Rename functions. Better error message without -g. Fix crash without
    -b.
    v6: Rebase
    v7: Rebase. Use NO_ENTRY in memset.
    v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

25 Nov, 2014

1 commit

  • For lbr-as-callgraph we need to see the line number in the history,
    because many LBR entries can be in a single function, and just
    showing the same function name many times is not useful.

    When the history code is configured to sort by address, also try to
    resolve the address to a file:srcline and display this in the browser.
    If that doesn't work, still display the address.

    This can also be useful without LBRs for understanding which call in a
    large function (or in which inlined function) called something else.
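
The fallback order described above can be sketched in Python. The lookup table stands in for a real address-to-srcline resolver and is purely hypothetical, as are the function name and the exact output formats:

```python
def entry_name(symbol, ip, sort_key, srclines):
    """Pick the text shown for one callchain entry.

    srclines is a hypothetical dict mapping an address to 'file:line'.
    """
    if symbol is None:
        return hex(ip)                          # nothing resolved: raw address
    if sort_key == "address":
        srcline = srclines.get(ip)
        if srcline:
            return "{} {}".format(symbol, srcline)
        return "{} {:#x}".format(symbol, ip)    # no srcline: keep the address
    return symbol                               # sorted by function: name only


srclines = {0x400A10: "main.cpp:39"}

with_srcline = entry_name("main", 0x400A10, "address", srclines)
without_srcline = entry_name("hypot", 0x14, "address", srclines)
by_function = entry_name("main", 0x400A10, "function", srclines)
unresolved = entry_name(None, 0x4A4B3E, "function", srclines)
```

Only entries sorted by address attempt the srcline lookup, so the default function-sorted output stays unchanged.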

    Contains fixes from Namhyung Kim

    v2: Refactor code into common function
    v3: Fix GTK build
    v4: Rebase

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

19 Nov, 2014

1 commit

  • Refactor the duplicated code to resolve the symbol name or
    the address of a symbol into a single function.

    Used in next patch to add common functionality.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-6-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

29 Oct, 2014

1 commit

  • So stop passing both machine and thread to several thread methods,
    reducing function signature length.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2014

1 commit

  • It was lost in hist.h; move it to where it belongs, callchain.h, as
    there are places that get hist.h by means of evsel.h, and since evsel.h
    is being untangled from hist.h...

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0rg7ji1jnbm6q6gj35j37jby@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

26 Sep, 2014

3 commits

  • This patch adds support for the following config options in the
    ~/.perfconfig file.

    [call-graph]
    record-mode = dwarf
    dump-size = 8192
    print-type = fractal
    order = callee
    threshold = 0.5
    print-limit = 128
    sort-key = function

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-5-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • And rename record_callchain_parse() to parse_callchain_record_opt() in
    accordance with parse_callchain_report_opt().

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-4-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • So that all callchain config parameters can be read/written to a single
    place. It's a preparation to consolidate handling of all callchain
    options.

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
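
As a rough illustration of how the `[call-graph]` keys listed in the first commit above could be collected into a single parameter structure, here is a minimal sketch. It is not perf's config code: `struct cg_config`, `copy_value()`, and `cg_config_set()` are hypothetical, only a subset of the keys is handled, and values are stored without validation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a few of the [call-graph] keys. */
struct cg_config {
	char record_mode[16];       /* e.g. "dwarf" */
	unsigned long dump_size;    /* e.g. 8192 */
	char order[16];             /* e.g. "callee" */
	double threshold;           /* e.g. 0.5 */
	unsigned long print_limit;  /* e.g. 128 */
};

/* Copy a string value into a fixed-size field, always NUL-terminated. */
static void copy_value(char *dst, size_t len, const char *value)
{
	strncpy(dst, value, len - 1);
	dst[len - 1] = '\0';
}

/*
 * Store one "key = value" pair that the caller has already split.
 * Returns 0 on success, -1 for a key this sketch does not handle.
 */
static int cg_config_set(struct cg_config *cfg, const char *key,
			 const char *value)
{
	if (!strcmp(key, "record-mode"))
		copy_value(cfg->record_mode, sizeof(cfg->record_mode), value);
	else if (!strcmp(key, "dump-size"))
		cfg->dump_size = strtoul(value, NULL, 0);
	else if (!strcmp(key, "order"))
		copy_value(cfg->order, sizeof(cfg->order), value);
	else if (!strcmp(key, "threshold"))
		cfg->threshold = strtod(value, NULL);
	else if (!strcmp(key, "print-limit"))
		cfg->print_limit = strtoul(value, NULL, 0);
	else
		return -1;
	return 0;
}
```

Routing every key through one setter that writes into one structure is the same idea as the third commit above: all callchain parameters live in a single place, whether they come from the command line or from the config file.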
     

27 Jun, 2014

1 commit

  • When saving the callchain on Power, the kernel conservatively saves excess
    entries in the callchain. A few of these entries are needed in some cases
    but not others. We should use the DWARF debug information to determine
    when the entries are needed.

    E.g., the value in the link register (LR) is needed only when it holds
    the return address of a function. At other times it must be ignored.

    If the unnecessary entries are not ignored, we end up with duplicate arcs
    in the call-graphs.

    Use the DWARF debug information to determine if any callchain entries
    should be ignored when building call-graphs.

    Callgraph before the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
            |
            --- __random
               |
               |--61.12%-- __random
               |          |
               |          |--97.15%-- rand
               |          |          do_my_sprintf
               |          |          main
               |          |          generic_start_main.isra.0
               |          |          __libc_start_main
               |          |          0x0
               |          |
               |           --2.85%-- do_my_sprintf
               |                     main
               |                     generic_start_main.isra.0
               |                     __libc_start_main
               |                     0x0
               |
                --38.88%-- rand
                          |
                          |--94.01%-- rand
                          |          do_my_sprintf
                          |          main
                          |          generic_start_main.isra.0
                          |          __libc_start_main
                          |          0x0
                          |
                           --5.99%-- do_my_sprintf
                                     main
                                     generic_start_main.isra.0
                                     __libc_start_main
                                     0x0

    Callgraph after the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
            |
            --- __random
               |
               |--95.93%-- rand
               |          do_my_sprintf
               |          main
               |          generic_start_main.isra.0
               |          __libc_start_main
               |          0x0
               |
                --4.07%-- do_my_sprintf
                          main
                          generic_start_main.isra.0
                          __libc_start_main
                          0x0

    TODO: For split-debug info objects like glibc, we can determine
    the call-frame-address only when both .eh_frame and .debug_info
    sections are available. We should be able to determine the CFA
    even without the .eh_frame section.

    Fix suggested by Anton Blanchard.

    Thanks to valuable input on DWARF debug information from Ulrich Weigand.

    Reported-by: Maynard Johnson
    Tested-by: Maynard Johnson
    Signed-off-by: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
    Signed-off-by: Jiri Olsa

    Sukadev Bhattiprolu
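
The effect on the callchain can be illustrated with a small sketch. It assumes that some DWARF-based check has already decided which entry, if any, is redundant (for example the LR slot when LR does not hold a return address); that check is the hard part and is not shown here. `drop_entry()` and its `skip_idx` parameter are hypothetical names, not the patch's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Remove the entry at skip_idx from a callchain of nr addresses,
 * shifting later entries down by one. A negative or out-of-range
 * skip_idx means "nothing to skip". Returns the new entry count.
 */
static size_t drop_entry(uint64_t *ips, size_t nr, int skip_idx)
{
	if (skip_idx < 0 || (size_t)skip_idx >= nr)
		return nr;

	for (size_t i = (size_t)skip_idx; i + 1 < nr; i++)
		ips[i] = ips[i + 1];

	return nr - 1;
}
```

With the redundant slot removed before the chain is merged into the call graph, a frame such as `rand` is no longer counted twice, which is what collapses the duplicate arcs seen in the "before" output.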
     

01 Jun, 2014

2 commits

  • The callchain_cursor_snapshot() is for saving current status of the
    callchain. It'll be used to accumulate callchain information for each node.

    Signed-off-by: Namhyung Kim
    Tested-by: Arun Sharma
    Tested-by: Rodrigo Campos
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1401335910-16832-9-git-send-email-namhyung@kernel.org
    Signed-off-by: Jiri Olsa

    Namhyung Kim
     
  • The cpumode and level in struct addr_location were set for the sample
    but not updated as cumulative callchains were added. This led to
    non-matching symbol and cpumode in the output.

    Update them based on whether the map is part of the kernel or not.
    This is the reverse of what thread__find_addr_map() does.

    Signed-off-by: Namhyung Kim
    Tested-by: Arun Sharma
    Tested-by: Rodrigo Campos
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1401335910-16832-7-git-send-email-namhyung@kernel.org
    Signed-off-by: Jiri Olsa

    Namhyung Kim
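
One way to picture the cursor snapshot from the first commit above is a copy that starts at the source cursor's current position, so that each node sees only the part of the chain that has not been consumed yet. The sketch below uses a hypothetical array-backed `struct cursor`; perf's real callchain cursor is laid out differently, so the field names and helpers here are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array-backed cursor; not perf's struct callchain_cursor. */
struct cursor {
	const uint64_t *ips;  /* first entry visible through this cursor */
	size_t nr;            /* number of entries visible */
	size_t pos;           /* index of the current entry */
};

/* Advance to the next entry, if there is one. */
static void cursor_advance(struct cursor *c)
{
	if (c->pos < c->nr)
		c->pos++;
}

/*
 * Save the current state of src into dest: dest begins at src's
 * current entry and covers only the entries not yet consumed.
 * src itself is left untouched.
 */
static void cursor_snapshot(struct cursor *dest, const struct cursor *src)
{
	dest->ips = src->ips + src->pos;
	dest->nr = src->nr - src->pos;
	dest->pos = 0;
}
```

Because the snapshot is a separate value, the caller can keep walking the original cursor while the saved copy is attached to a node and accumulated later.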
     

05 May, 2014

1 commit

  • Into util/callchain.h header where all callchain related
    structures should be.

    Acked-by: Arnaldo Carvalho de Melo
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1399293219-8732-8-git-send-email-jolsa@kernel.org
    Signed-off-by: Jiri Olsa

    Jiri Olsa
     

22 Apr, 2014

1 commit

  • This takes the parse_callchain_opt function and copies it into the
    callchain.c file. Now the c2c tool can use it too without duplicating it.

    Update perf-report to use the new routine too.

    Signed-off-by: Don Zickus
    Reviewed-by: Namhyung Kim
    Link: http://lkml.kernel.org/r/1396896924-129847-5-git-send-email-dzickus@redhat.com
    [ Adding missing braces to multiline if condition ]
    Signed-off-by: Jiri Olsa

    Don Zickus