08 Jan, 2015

1 commit

  • Markus reported that "perf top -g" can leak ~300MB per second on his
    machine. This is partly because it failed to free callchains when hist
    entries are deleted. Fix it.
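
    A minimal sketch of the idea, not the code from this patch: when an
    entry is deleted, the callchain tree hanging off it has to be freed
    recursively. The struct and function names below (cc_node,
    cc_node_new, cc_node_free) are hypothetical and much simpler than
    perf's real callchain nodes.

```c
#include <stdlib.h>

/* Hypothetical, simplified callchain node: each node owns its children. */
struct cc_node {
	struct cc_node *first_child;
	struct cc_node *next_sibling;
	unsigned long ip;
};

/* Allocate a node and link it as the first child of parent (may be NULL). */
static struct cc_node *cc_node_new(struct cc_node *parent, unsigned long ip)
{
	struct cc_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->ip = ip;
	if (parent) {
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
	return n;
}

/* Free a node and everything below it; returns the number of nodes freed. */
static size_t cc_node_free(struct cc_node *node)
{
	size_t freed = 0;
	struct cc_node *child, *next;

	if (!node)
		return 0;
	for (child = node->first_child; child; child = next) {
		next = child->next_sibling; /* read before the child is freed */
		freed += cc_node_free(child);
	}
	free(node);
	return freed + 1;
}
```

    Whatever deletes the owning entry would call the free routine on the
    callchain root first; skipping that call is the kind of leak
    described above.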

    Reported-by: Markus Trippelsdorf
    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Markus Trippelsdorf
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20141230053813.GD6081@sejong
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

02 Dec, 2014

1 commit

  • Currently branch stacks can only be shown as edge histograms for
    individual branches. I never found this display particularly useful.

    This implements an alternative mode that creates histograms over
    complete branch traces, instead of individual branches, similar to how
    normal callgraphs are handled. This is done by putting it in front of
    the normal callgraph and then using the normal callgraph histogram
    infrastructure to unify them.

    This way in complex functions we can understand the control flow that
    led to a particular sample, and may even see some control flow in the
    caller for short functions.

    Example (simplified; for such simple code this is usually not
    needed). Please run this after the whole patchkit is in: at this
    point in the patch order there is no --branch-history yet, it is
    added in a later patch:

    tcall.c:

    volatile int a = 10000, b = 100000, c;

    __attribute__((noinline)) void f2(void)
    {
            c = a / b;      /* volatile access keeps the division in place */
    }

    __attribute__((noinline)) void f1(void)
    {
            f2();
            f2();
    }

    int main(void)
    {
            int i;
            for (i = 0; i < 1000000; i++)
                    f1();
            return 0;
    }

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --no-children --branch-history
    ...
    54.91% tcall.c:6 [.] f2 tcall
    |
    |--65.53%-- f2 tcall.c:5
    | |
    | |--70.83%-- f1 tcall.c:11
    | | f1 tcall.c:10
    | | main tcall.c:18
    | | main tcall.c:18
    | | main tcall.c:17
    | | main tcall.c:17
    | | f1 tcall.c:13
    | | f1 tcall.c:13
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:12
    | | f1 tcall.c:12
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:11
    | |
    | --29.17%-- f1 tcall.c:12
    | f1 tcall.c:12
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:11
    | f1 tcall.c:10
    | main tcall.c:18
    | main tcall.c:18
    | main tcall.c:17
    | main tcall.c:17
    | f1 tcall.c:13
    | f1 tcall.c:13
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:12

    The default output is unchanged.

    This is only implemented in perf report, no change to record or anywhere
    else.

    This adds the basic code to report:

    - add a new "branch" option to the -g option parser to enable this mode
    - when the flag is set include the LBR into the callstack in machine.c.

    The rest of the history code is unchanged and doesn't know the
    difference between LBR entry and normal call entry.

    - detect overlaps with the callchain
    - remove small loop duplicates in the LBR
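
    To illustrate the "remove small loop duplicates" step: a minimal
    sketch, assuming the branch history is available as a flat array of
    addresses. This is not the algorithm used in the patch; the function
    name remove_loops_sketch and the max_len window are hypothetical. It
    collapses immediately repeated runs of up to max_len entries so that
    a tight loop shows up once in the history.

```c
#include <stddef.h>
#include <string.h>

/*
 * Collapse back-to-back repetitions of short sequences in v[0..n).
 * Whenever v[i..i+len) is immediately followed by an identical copy,
 * the copy is dropped. Returns the new number of entries.
 */
static size_t remove_loops_sketch(unsigned long *v, size_t n, size_t max_len)
{
	size_t i = 0;

	while (i < n) {
		int removed = 0;

		for (size_t len = 1; len <= max_len && i + 2 * len <= n; len++) {
			if (memcmp(&v[i], &v[i + len], len * sizeof(*v)) == 0) {
				/* Shift the tail over the duplicated copy. */
				memmove(&v[i + len], &v[i + 2 * len],
					(n - i - 2 * len) * sizeof(*v));
				n -= len;
				removed = 1;
				break;
			}
		}
		if (!removed)
			i++; /* nothing repeats at this position, move on */
	}
	return n;
}
```

    With max_len = 4, the sequence 1 2 3 2 3 2 3 4 collapses to 1 2 3 4:
    the repeated "2 3" loop body is kept only once.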

    Current limitations:

    - The LBR flags (mispredict etc.) are not shown in the history
    and LBR entries have no special marker.
    - It would be nice if annotate marked the LBR entries somehow
    (e.g. with arrows)

    v2: Various fixes.
    v3: Merge further patches into this one. Fix white space.
    v4: Improve manpage. Address review feedback.
    v5: Rename functions. Better error message without -g. Fix crash without
    -b.
    v6: Rebase
    v7: Rebase. Use NO_ENTRY in memset.
    v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

25 Nov, 2014

1 commit

  • For lbr-as-callgraph we need to see the line number in the history,
    because many LBR entries can be in a single function, and just
    showing the same function name many times is not useful.

    When the history code is configured to sort by address, also try to
    resolve the address to a file:srcline and display this in the browser.
    If that doesn't work still display the address.
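
    The fallback order described above can be sketched as follows. This
    is an illustration only: format_chain_entry and its parameters are
    hypothetical, and the srcline string is assumed to have been
    resolved elsewhere (NULL when resolution failed).

```c
#include <stdio.h>

/*
 * Format one callchain entry for display when sorting by address:
 *   symbol + srcline   when the address resolved to file:line,
 *   symbol + address   when only the symbol is known,
 *   address            when nothing resolved.
 * Returns what snprintf() returns.
 */
static int format_chain_entry(char *buf, size_t size, const char *sym,
			      const char *srcline, unsigned long addr)
{
	if (sym && srcline)
		return snprintf(buf, size, "%s %s", sym, srcline);
	if (sym)
		return snprintf(buf, size, "%s %#lx", sym, addr);
	return snprintf(buf, size, "%#lx", addr);
}
```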

    This can also be useful without LBRs for understanding which call in a large
    function (or in which inlined function) called something else.

    Contains fixes from Namhyung Kim

    v2: Refactor code into common function
    v3: Fix GTK build
    v4: Rebase

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

19 Nov, 2014

1 commit

  • Refactor the duplicated code to resolve the symbol name or
    the address of a symbol into a single function.

    Used in next patch to add common functionality.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-6-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

29 Oct, 2014

1 commit

  • So stop passing both machine and thread to several thread methods,
    reducing function signature length.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2014

1 commit

  • It was lost in hist.h, move it to where it belongs, callchain.h, as
    there are places that get hist.h by means of evsel.h, and since evsel.h
    is being untangled from hist.h...

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0rg7ji1jnbm6q6gj35j37jby@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

26 Sep, 2014

3 commits

  • This patch adds support for the following config options to the ~/.perfconfig file.

    [call-graph]
    record-mode = dwarf
    dump-size = 8192
    print-type = fractal
    order = callee
    threshold = 0.5
    print-limit = 128
    sort-key = function

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-5-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • And rename record_callchain_parse() to parse_callchain_record_opt() in
    accordance with parse_callchain_report_opt().

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-4-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • So that all callchain config parameters can be read/written to a single
    place. It's a preparation to consolidate handling of all callchain
    options.

    Reviewed-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1411434104-5307-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

27 Jun, 2014

1 commit

  • When saving the callchain on Power, the kernel conservatively saves excess
    entries in the callchain. A few of these entries are needed in some cases
    but not others. We should use the DWARF debug information to determine
    when the entries are needed.

    Eg: the value in the link register (LR) is needed only when it holds the
    return address of a function. At other times it must be ignored.

    If the unnecessary entries are not ignored, we end up with duplicate arcs
    in the call-graphs.

    Use the DWARF debug information to determine if any callchain entries
    should be ignored when building call-graphs.

    Callgraph before the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
    |
    --- __random
    |
    |--61.12%-- __random
    | |
    | |--97.15%-- rand
    | | do_my_sprintf
    | | main
    | | generic_start_main.isra.0
    | | __libc_start_main
    | | 0x0
    | |
    | --2.85%-- do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --38.88%-- rand
    |
    |--94.01%-- rand
    | do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --5.99%-- do_my_sprintf
    main
    generic_start_main.isra.0
    __libc_start_main
    0x0

    Callgraph after the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
    |
    --- __random
    |
    |--95.93%-- rand
    | do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --4.07%-- do_my_sprintf
    main
    generic_start_main.isra.0
    __libc_start_main
    0x0

    TODO: For split-debug info objects like glibc, we can determine
    the call-frame-address only when both .eh_frame and .debug_info
    sections are available. We should be able to determine the CFA
    even without the .eh_frame section.

    Fix suggested by Anton Blanchard.

    Thanks to valuable input on DWARF debug information from Ulrich Weigand.

    Reported-by: Maynard Johnson
    Tested-by: Maynard Johnson
    Signed-off-by: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
    Signed-off-by: Jiri Olsa

    Sukadev Bhattiprolu
     

01 Jun, 2014

2 commits

  • The callchain_cursor_snapshot() is for saving current status of the
    callchain. It'll be used to accumulate callchain information for each node.
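
    A minimal sketch of what such a snapshot could look like, assuming a
    cursor that walks a singly linked list of entries. The types and
    field names here are hypothetical and simplified; only the name
    callchain_cursor_snapshot() comes from the text above. The snapshot
    starts at the source cursor's current position and covers only the
    entries that remain.

```c
#include <stddef.h>

struct cursor_node {
	unsigned long ip;
	struct cursor_node *next;
};

struct cursor {
	size_t nr;                 /* entries reachable from first */
	size_t pos;                /* entries already consumed */
	struct cursor_node *first; /* start of this cursor's view */
	struct cursor_node *curr;  /* next entry to be consumed */
};

/* Consume one entry, if any is left. */
static void cursor_advance(struct cursor *c)
{
	if (c->curr) {
		c->curr = c->curr->next;
		c->pos++;
	}
}

/* Save the remaining part of src into dest without copying any nodes. */
static void cursor_snapshot(struct cursor *dest, const struct cursor *src)
{
	*dest = *src;
	dest->first = src->curr;
	dest->nr = src->nr - src->pos;
	dest->pos = 0;
}
```

    Because the snapshot shares the underlying nodes, it stays valid
    only as long as the source list is not refilled.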

    Signed-off-by: Namhyung Kim
    Tested-by: Arun Sharma
    Tested-by: Rodrigo Campos
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1401335910-16832-9-git-send-email-namhyung@kernel.org
    Signed-off-by: Jiri Olsa

    Namhyung Kim
     
  • The cpumode and level in struct addr_location were set for a sample
    but not updated as cumulative callchains were added. This led to
    non-matching symbol and cpumode in the output.

    Update them based on whether the map is part of the kernel or not.
    This is the reverse of what thread__find_addr_map()
    does.
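
    A minimal sketch of deriving the mode from the map instead of from
    the sample. Everything here is hypothetical: the enum, the struct
    and the level characters ('k', '.', 'g', 'u') are assumptions made
    for illustration, not taken from this patch.

```c
#include <stdbool.h>

enum cpumode_sketch {
	MODE_KERNEL,
	MODE_USER,
	MODE_GUEST_KERNEL,
	MODE_GUEST_USER,
};

struct entry_loc {
	enum cpumode_sketch cpumode;
	char level;
};

/*
 * Set cpumode and level for one callchain entry from what is known
 * about the map it resolved to, rather than inheriting the values
 * that were computed for the sampled address.
 */
static void entry_update_cpumode(struct entry_loc *al, bool map_is_kernel,
				 bool is_host)
{
	if (map_is_kernel) {
		al->cpumode = is_host ? MODE_KERNEL : MODE_GUEST_KERNEL;
		al->level = is_host ? 'k' : 'g';
	} else {
		al->cpumode = is_host ? MODE_USER : MODE_GUEST_USER;
		al->level = is_host ? '.' : 'u';
	}
}
```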

    Signed-off-by: Namhyung Kim
    Tested-by: Arun Sharma
    Tested-by: Rodrigo Campos
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1401335910-16832-7-git-send-email-namhyung@kernel.org
    Signed-off-by: Jiri Olsa

    Namhyung Kim
     

05 May, 2014

1 commit

  • Into util/callchain.h header where all callchain related
    structures should be.

    Acked-by: Arnaldo Carvalho de Melo
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1399293219-8732-8-git-send-email-jolsa@kernel.org
    Signed-off-by: Jiri Olsa

    Jiri Olsa
     

22 Apr, 2014

1 commit

  • This takes the parse_callchain_opt function and copies it into the
    callchain.c file. Now the c2c tool can use it too without duplicating.

    Update perf-report to use the new routine too.

    Signed-off-by: Don Zickus
    Reviewed-by: Namhyung Kim
    Link: http://lkml.kernel.org/r/1396896924-129847-5-git-send-email-dzickus@redhat.com
    [ Adding missing braces to multiline if condition ]
    Signed-off-by: Jiri Olsa

    Don Zickus
     

16 Jan, 2014

1 commit

  • The report__resolve_callchain() can be shared with perf top code as it
    doesn't really depend on the perf report code. Factor it out as
    sample__resolve_callchain(). The same goes for
    hist_entry__append_callchain().

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Arun Sharma
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rodrigo Campos
    Link: http://lkml.kernel.org/r/1389677157-30513-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

20 Dec, 2013

1 commit

  • Reduce typing, functions use class__method convention, so unlikely to
    clash with other libraries.

    This actually was discussed in the "Link:" referenced message below.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20131112113427.GA4053@ghostprotocols.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

29 Oct, 2013

2 commits

  • Conflicts:
    tools/perf/builtin-record.c
    tools/perf/builtin-top.c
    tools/perf/util/hist.h

    Ingo Molnar
     
  • Splitting -g and --call-graph for record command, so we could use '-g'
    with no option.

    The '-g' option now takes NO argument and enables the configured unwind
    method, which is currently the frame pointers method.

    It will be possible to configure unwind method via config file in
    upcoming patches.

    All current '-g' arguments are taken over by the --call-graph option.

    Signed-off-by: Jiri Olsa
    Tested-by: David Ahern
    Tested-by: Ingo Molnar
    Reviewed-by: David Ahern
    Acked-by: Ingo Molnar
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
    [ reordered -g/--call-graph on --help and expanded the man page
    according to comments by David Ahern and Namhyung Kim ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

22 Oct, 2013

1 commit

  • Current collapse stage has a scalability problem which can be reproduced
    easily with a parallel kernel build.

    This is because it needs to traverse all children of the callchains
    linearly during the collapse/merge stage.

    Converting it to a rbtree reduced the overhead significantly.
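
    To show why an ordered tree helps here: a minimal sketch of a
    find-or-insert keyed by address. This uses a plain, unbalanced
    binary search tree as a stand-in, not a red-black tree and not the
    kernel-style rbtree API that perf uses; the names are hypothetical.
    The point is only that lookup follows one path down the tree
    instead of scanning every sibling.

```c
#include <stdlib.h>

struct child_node {
	unsigned long ip;
	unsigned long hits;
	struct child_node *left, *right;
};

/*
 * Return the child with this ip, inserting it if it does not exist.
 * Returns NULL only on allocation failure.
 */
static struct child_node *child_find_or_add(struct child_node **root,
					    unsigned long ip)
{
	struct child_node **p = root;

	while (*p) {
		if (ip == (*p)->ip)
			return *p;
		p = ip < (*p)->ip ? &(*p)->left : &(*p)->right;
	}
	*p = calloc(1, sizeof(**p));
	if (*p)
		(*p)->ip = ip;
	return *p;
}
```

    A red-black tree adds rebalancing on top of this so that the path
    length stays logarithmic even for sorted input.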

    On my 400MB perf.data file, which was recorded with a make -j32 kernel build:

    $ time perf --no-pager report --stdio > /dev/null

    before:
    real 6m22.073s
    user 6m18.683s
    sys 0m0.706s

    after:
    real 0m20.780s
    user 0m19.962s
    sys 0m0.689s

    During the perf report the overhead on append_chain_children went down
    from 96.69% to 18.16%:

    - 18.16% perf perf [.] append_chain_children
    - append_chain_children
    - 77.48% append_chain_children
    + 69.79% merge_chain_branch
    - 22.96% append_chain_children
    + 67.44% merge_chain_branch
    + 30.15% append_chain_children
    + 2.41% callchain_append
    + 7.25% callchain_append
    + 12.26% callchain_append
    + 10.22% merge_chain_branch
    + 11.58% perf perf [.] dso__find_symbol
    + 8.02% perf perf [.] sort__comm_cmp
    + 5.48% perf libc-2.17.so [.] malloc_consolidate

    Reported-by: Linus Torvalds
    Signed-off-by: Namhyung Kim
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381468543-25334-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

30 Aug, 2013

1 commit

  • Now that the sample parsing correctly checks data sizes there is no
    reason for it to be done again for callchains.

    Signed-off-by: Adrian Hunter
    Acked-by: Namhyung Kim
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1377591794-30553-4-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

22 Jul, 2013

1 commit

  • With programs with very large functions it can be useful to distinguish
    the callgraph nodes on more than just function names. So for example if
    you have multiple calls to the same function, it ends up being separate
    nodes in the chain.

    This patch adds a new key field to the callgraph options, that allows
    comparing nodes on functions (as today, default) and addresses.
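
    A minimal sketch of such a key-dependent comparison. The enum,
    struct and function names are hypothetical, and symbols are compared
    by name here purely for illustration; the patch itself may compare
    them differently.

```c
#include <stdbool.h>
#include <string.h>

enum chain_key_sketch { KEY_FUNCTION, KEY_ADDRESS };

struct chain_entry {
	const char *sym;  /* resolved function name, or NULL */
	unsigned long ip; /* sampled address */
};

/* Decide whether two callchain entries belong to the same node. */
static bool chain_entry_match(const struct chain_entry *a,
			      const struct chain_entry *b,
			      enum chain_key_sketch key)
{
	if (key == KEY_FUNCTION && a->sym && b->sym)
		return strcmp(a->sym, b->sym) == 0;
	/* KEY_ADDRESS, or no symbol to compare: fall back to the address. */
	return a->ip == b->ip;
}
```

    With the address key, two calls from different places inside one
    large function end up in separate nodes; with the function key they
    are merged.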

    Longer term it would be nice to also handle src lines, but that would
    need more changes and address is a reasonable proxy for it today.

    Right now I reference the global params, as there was no simple way to
    register a params pointer.

    Signed-off-by: Andi Kleen
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-0uskktybf0e7wrnoi5e9b9it@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

12 Dec, 2012

1 commit

  • Will be used by perf top, that will first setup the symbol system to
    deal with callchains and then call these routines to ask the kernel
    for callchains.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-jg0dh8rmlx7x11e7u7mnasvd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

01 Sep, 2012

1 commit


31 May, 2012

1 commit

  • perf top -G has a race on the callchain cursor between the main thread and
    the display thread. Since the callchain cursors are used locally, making them
    thread-local data solves the problem.

    Signed-off-by: Namhyung Kim
    Reported-by: Sunjin Yang
    Suggested-by: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Sunjin Yang
    Link: http://lkml.kernel.org/r/1338443007-24857-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

28 Nov, 2011

1 commit

  • So that we don't need to have that many globals.

    Next steps will remove the 'session' pointer, that in most cases is
    not needed.

    Then we can rename perf_event_ops to 'perf_tool' that better describes
    this class hierarchy.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Jun, 2011

1 commit

  • Add a "caller/callee" option to support an inverted butterfly report.
    In the inverted report (with the caller option), the call graph starts
    from the callee's ancestor. Users can use such a view to catch the system's
    performance bottlenecks from a sysprof-like view. Using this option
    with a specified sort order like pid gives a high-level view of call
    graph statistics.
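
    A minimal sketch of the two orders, assuming a sampled chain is
    stored with the innermost frame (the callee) first. The names are
    hypothetical; the sketch only shows that the caller order walks the
    same chain from the outermost ancestor down.

```c
#include <stddef.h>

enum chain_order_sketch { ORDER_CALLEE, ORDER_CALLER };

/*
 * Copy chain[0..n) into out[0..n) in the requested order.
 * chain[] is assumed to hold the innermost frame first.
 */
static void chain_order_copy(const unsigned long *chain, size_t n,
			     enum chain_order_sketch order,
			     unsigned long *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = order == ORDER_CALLER ? chain[n - 1 - i] : chain[i];
}
```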

    Also add a "-G" alias for the inverted call graph.

    Signed-off-by: Sam Liao
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Ahern
    Signed-off-by: Frederic Weisbecker

    Sam Liao
     

30 Jan, 2011

1 commit


23 Jan, 2011

4 commits

  • Some little callchain tree nodes shyly asked me if they can have
    sisters.

    How cute!

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • To make the callchain API naming more consistent.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • That makes the callchain API naming more consistent and
    reduces potential naming clashes.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • The callchains are fed with an array of a fixed size.
    As a result we iterate over each callchain three times:

    - 1st to resolve symbols
    - 2nd to filter out context boundaries
    - 3rd for the insertion into the tree

    This also involves some pairs of memory allocation/deallocation
    every time we insert a callchain, for the filtered out array of
    addresses and for the array of symbols that comes along.

    Instead, feed the callchains through a linked list with persistent
    allocations. This brings several benefits:

    - Merge the 1st and 2nd iterations into one. That was possible before
    but in a way that would involve allocating an array slightly larger
    than necessary because we don't know in advance the number of context
    boundaries to filter out.

    - Far fewer allocations/deallocations. The linked list keeps
    persistent empty entries for the next usages and is extendable at
    will.

    - Makes it easier for multiple sources of callchains to feed a
    stacktrace together. This is deemed to pave the way for cfi based
    callchains wherein traditional frame pointer based kernel
    stacktraces will precede cfi based user ones, producing an overall
    callchain whose size is hardly predictable. This requirement
    makes the static array obsolete and makes a linked list based
    iterator a much more flexible fit.
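
    The persistent linked list can be sketched as follows. This is a
    simplified illustration with hypothetical names, not the code from
    this patch: resetting the cursor keeps the nodes around, and a later
    append reuses them before allocating anything new. The allocs
    counter exists only to make the reuse visible.

```c
#include <stdlib.h>

struct cnode {
	unsigned long ip;
	struct cnode *next;
};

struct ccursor {
	size_t nr;           /* entries valid in the current chain */
	struct cnode *first; /* head of the persistent list */
	struct cnode **last; /* slot where the next entry is reused or linked */
	size_t allocs;       /* number of allocations performed so far */
};

static void ccursor_init(struct ccursor *c)
{
	c->nr = 0;
	c->first = NULL;
	c->last = &c->first;
	c->allocs = 0;
}

/* Start a new chain; previously allocated nodes stay on the list. */
static void ccursor_reset(struct ccursor *c)
{
	c->nr = 0;
	c->last = &c->first;
}

/* Append one address, reusing an old node when one is available. */
static int ccursor_append(struct ccursor *c, unsigned long ip)
{
	struct cnode *node = *c->last;

	if (!node) {
		node = calloc(1, sizeof(*node));
		if (!node)
			return -1;
		*c->last = node;
		c->allocs++;
	}
	node->ip = ip;
	c->nr++;
	c->last = &node->next;
	return 0;
}
```

    Readers of the list must stop after nr entries, since stale nodes
    from a longer earlier chain remain linked behind the valid ones.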

    Basic testing on a big perf file containing callchains (~ 176 MB)
    has shown a throughput gain of about 11% with perf report.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     

27 Aug, 2010

2 commits

  • Conflicts:
    tools/perf/util/callchain.h

    Merge reason:
    Fix a non-trivial conflict with latest fixes

    Frederic Weisbecker
     
  • Each histogram entry has a callchain root that stores the
    callchain samples. However we forgot to initialize the
    tracking of children hits of these roots, which then got
    random values on their creation.

    The root children hits is multiplied by the minimum percentage
    of hits provided by the user, and the result becomes the minimum
    hits expected from children branches. If the random value due
    to the uninitialization is big enough, then this minimum number
    of hits can be huge and eventually filter out every child branch.

    The end result was invisible callchains. All we need to
    fix this is to initialize the children hits of the root.
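
    A minimal sketch of the failure mode and the fix, with hypothetical
    names. The threshold is derived from the root's children hits, so a
    root that is not zeroed at creation yields an arbitrary threshold.

```c
#include <stdint.h>
#include <string.h>

struct chain_root {
	uint64_t hit;
	uint64_t children_hit;
};

/* Zero every counter so that later thresholds start from a known state. */
static void chain_root_init(struct chain_root *root)
{
	memset(root, 0, sizeof(*root));
}

/* Minimum hits a child branch needs in order to be displayed. */
static uint64_t chain_min_hit(const struct chain_root *root,
			      double min_percent)
{
	return (uint64_t)(root->children_hit * (min_percent / 100.0));
}
```

    With children_hit left uninitialized, the returned minimum can
    exceed the hits of every real branch, which matches the invisible
    callchains described above.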

    Reported-by: Christoph Hellwig
    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: 2.6.32.x-2.6.35.y

    Frederic Weisbecker
     

23 Aug, 2010

3 commits

  • If we sort the histograms by comm, which is the default,
    we need to merge some of them, typically different thread
    histograms of the same process, or just the same comm. But during
    this merge, we forgot to merge callchains.

    So imagine we have three threads (tids: 1000, 1001, 1002) that
    belong to comm "foo".

    tid 1000 got 100 events
    tid 1001 got 10 events
    tid 1002 got 3 events

    Once we merge these histograms to get a per comm result, we'll
    finally get:

    "foo" got 113 events

    The problem is if we merge 1000 and 1001 histograms into 1002, then
    the end merge result, wrt callchains, will be only callchains that
    belong to 1002.
    This is because we haven't handled callchains in the merge. Only those
    from one of the threads inside a common comm survive.

    It means during this merge, we can lose a lot of callchains.

    Fix this by implementing callchain merging and applying it to histograms
    that collapse.

    Reported-by: Christoph Hellwig
    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras

    Frederic Weisbecker
     
  • Do that to start a consistent callchain API namespace.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Christoph Hellwig

    Frederic Weisbecker
     
  • In order to implement callchains collapsing, we need to keep
    track of the maximum depth in a histogram tree of callchains.
    This way we'll avoid allocating an arbitrary temporary buffer
    size at callchain merge time.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Christoph Hellwig

    Frederic Weisbecker
     

22 Jul, 2010

1 commit


08 Jul, 2010

2 commits

  • Hists have their hits increased by the event period. And this
    period based counting is the foundation of all the stats in
    perf report.

    But callchains still use the raw number of hits, without taking
    the period into account. So when we compute the percentage,
    absolute percentages are totally broken, and relative ones
    too at the first parent level, because we pass the number of events
    multiplied by their period as the total number of hits to the
    callchain filtering, while callchains expect this number to be
    the number of raw hits.

    perf report -g graph was simply not working, showing no graph unless
    the min percent was zero. And even there the percentage of the
    branches was always 0. And maybe fractal filtering was broken at
    the first branch level too.

    flat also was broken, but it was hidden because of other breakages.

    Anyway fix this by counting using periods on callchains.
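
    A minimal sketch of the mismatch, with hypothetical names. Each
    sample adds its period to the branch, and the percentage divides
    period by period. Dividing the raw sample count by a period-based
    total instead gives a value close to zero, which is the breakage
    described above.

```c
#include <stdint.h>

struct chain_stat {
	uint64_t nr_samples; /* raw number of hits */
	uint64_t period;     /* sum of the event periods of those hits */
};

/* Account one sample with the given event period. */
static void chain_stat_add(struct chain_stat *s, uint64_t period)
{
	s->nr_samples++;
	s->period += period;
}

/* Share of a branch relative to a period-based total, in percent. */
static double chain_percent(const struct chain_stat *branch,
			    uint64_t total_period)
{
	if (!total_period)
		return 0.0;
	return 100.0 * (double)branch->period / (double)total_period;
}
```

    For a branch with two samples of periods 1000 and 3000 under a
    total period of 16000, the period-based share is 25%, whereas the
    raw count would give 2 / 16000.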

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras

    Frederic Weisbecker
     
  • Initialize the callchain radix tree root correctly.

    When we walk through the parents, we must stop after the root, but
    since it wasn't well initialized, its parent pointer was random.

    Also the number of hits was random because it was uninitialized, hence it
    was part of the callchain while the root doesn't contain anything.

    This fixes segfaults and percentages followed by empty callchains
    while running:

    perf report -g flat

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: 2.6.31.x-2.6.34.x

    Frederic Weisbecker
     

05 Jun, 2010

1 commit