17 Jan, 2015

1 commit

  • When thread__init_map_groups() fails, the new thread should be removed
    from the rbtree, since it is going to be freed. Also, update the
    last-match cache only if the function succeeded.
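    The fix comes down to two rules: undo the tree insertion before freeing the new thread, and update the last-match cache only on success. A minimal C sketch of those rules follows. It is a hypothetical model, not perf's code: a plain binary search tree keyed by tid stands in for the rbtree, and init_map_groups() is a stub for thread__init_map_groups() whose failure can be forced through a test flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified thread node in a binary tree keyed by tid. */
struct thread {
	int tid;
	struct thread *left, *right;
};

struct machine {
	struct thread *threads;     /* tree root (stand-in for the rbtree) */
	struct thread *last_match;  /* one-entry lookup cache */
	bool fail_init;             /* test knob: make initialization fail */
};

/* Stand-in for thread__init_map_groups(); returns 0 on success. */
static int init_map_groups(struct machine *m, struct thread *t)
{
	(void)t;
	return m->fail_init ? -1 : 0;
}

struct thread *machine_findnew_thread(struct machine *m, int tid)
{
	struct thread **link = &m->threads;
	struct thread *t;

	if (m->last_match && m->last_match->tid == tid)
		return m->last_match;

	while (*link) {
		if ((*link)->tid == tid) {
			m->last_match = *link;
			return *link;
		}
		link = tid < (*link)->tid ? &(*link)->left : &(*link)->right;
	}

	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->tid = tid;
	*link = t;	/* the new thread is in the tree before it is initialized */

	if (init_map_groups(m, t) < 0) {
		/* The new node is still a leaf, so clearing its link removes it.
		 * Unlink before freeing so the tree never points at freed
		 * memory, and leave last_match untouched. */
		*link = NULL;
		free(t);
		return NULL;
	}

	m->last_match = t;	/* cache only a successfully set-up thread */
	return t;
}
```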

    Reported-by: David Ahern
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1420763892-15535-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

09 Dec, 2014

1 commit

  • Use a flag to distinguish between branch_history and a normal callchain.

    Move the cpumode handling into the add_callchain_ip() function.

    No change in behavior.

    Signed-off-by: Kan Liang
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1417532814-26208-3-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

02 Dec, 2014

1 commit

  • Currently branch stacks can be only shown as edge histograms for
    individual branches. I never found this display particularly useful.

    This implements an alternative mode that creates histograms over
    complete branch traces, instead of individual branches, similar to how
    normal callgraphs are handled. This is done by putting it in front of
    the normal callgraph and then using the normal callgraph histogram
    infrastructure to unify them.

    This way, in complex functions we can understand the control flow that
    led to a particular sample, and may even see some control flow in the
    caller for short functions.

    Example (simplified; for such simple code this is of course usually not
    needed). Please run this after the whole patchkit is in: at this point in
    the patch order there is no --branch-history option yet, since it is
    added in a later patch:

    tcall.c:

    volatile int a = 10000, b = 100000, c;

    __attribute__((noinline)) void f2(void)
    {
        c = a / b;
    }

    __attribute__((noinline)) void f1(void)
    {
        f2();
        f2();
    }
    int main(void)
    {
        int i;
        for (i = 0; i < 1000000; i++)
            f1();
    }

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --no-children --branch-history
    ...
    54.91% tcall.c:6 [.] f2 tcall
    |
    |--65.53%-- f2 tcall.c:5
    | |
    | |--70.83%-- f1 tcall.c:11
    | | f1 tcall.c:10
    | | main tcall.c:18
    | | main tcall.c:18
    | | main tcall.c:17
    | | main tcall.c:17
    | | f1 tcall.c:13
    | | f1 tcall.c:13
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:12
    | | f1 tcall.c:12
    | | f2 tcall.c:7
    | | f2 tcall.c:5
    | | f1 tcall.c:11
    | |
    | --29.17%-- f1 tcall.c:12
    | f1 tcall.c:12
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:11
    | f1 tcall.c:10
    | main tcall.c:18
    | main tcall.c:18
    | main tcall.c:17
    | main tcall.c:17
    | f1 tcall.c:13
    | f1 tcall.c:13
    | f2 tcall.c:7
    | f2 tcall.c:5
    | f1 tcall.c:12

    The default output is unchanged.

    This is only implemented in perf report, no change to record or anywhere
    else.

    This adds the basic code to report:

    - add a new "branch" option to the -g option parser to enable this mode
    - when the flag is set, include the LBR entries in the callstack in machine.c.

    The rest of the history code is unchanged and does not distinguish
    between LBR entries and normal call entries.

    - detect overlaps with the callchain
    - remove small loop duplicates in the LBR
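    The "remove small loop duplicates" step can be illustrated with a short C sketch. It assumes the branch trace has already been flattened into an array of addresses, and it finds an earlier occurrence of each address with a linear scan; the function name and that simplification are assumptions, not the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Drop repeated iterations of a loop from a branch trace. If addr[i]
 * already occurred at an earlier index j and the next (i - j) entries
 * repeat addr[j..i) exactly, the repeat is removed. Returns the new
 * number of entries.
 */
size_t remove_loops(uint64_t *addr, size_t nr)
{
	size_t i = 0;

	while (i < nr) {
		size_t j = 0;

		/* Linear search for the first earlier occurrence of addr[i]. */
		while (j < i && addr[j] != addr[i])
			j++;

		if (j < i) {
			size_t period = i - j;

			if (i + period <= nr &&
			    memcmp(&addr[j], &addr[i], period * sizeof(*addr)) == 0) {
				memmove(&addr[i], &addr[i + period],
					(nr - i - period) * sizeof(*addr));
				nr -= period;
				continue;	/* re-check whatever moved into slot i */
			}
		}
		i++;
	}
	return nr;
}
```

    For example, the trace 1 2 3 1 2 3 4 collapses to 1 2 3 4, while 5 1 2 5 9 is left alone because the candidate loop does not repeat in full.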

    Current limitations:

    - The LBR flags (mispredict etc.) are not shown in the history
    and LBR entries have no special marker.
    - It would be nice if annotate marked the LBR entries somehow
    (e.g. with arrows)

    v2: Various fixes.
    v3: Merge further patches into this one. Fix white space.
    v4: Improve manpage. Address review feedback.
    v5: Rename functions. Better error message without -g. Fix crash without
    -b.
    v6: Rebase
    v7: Rebase. Use NO_ENTRY in memset.
    v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

25 Nov, 2014

1 commit

  • Jiri reported that commit 96d78059d6d9 ("perf tools: Make vmlinux
    short name more like kallsyms short name") causes a segfault in perf script.

    When processing a kernel mmap event, it should access the 'kernel'
    variable, because sometimes no matching dso can be found in the build-id
    table, so 'dso' might be invalid.

    Reported-by: Jiri Olsa
    Tested-by: Jiri Olsa
    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1416285028-30572-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

19 Nov, 2014

2 commits

  • Use the relative address; this finally makes get_srcline() work
    correctly.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-4-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Move the code to resolve and add a new callchain entry into a new
    add_callchain_ip function. This will be used in the next patches to add
    LBRs too.

    No change in behavior.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1415844328-4884-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

05 Nov, 2014

2 commits

  • The previous patch changed the kernel dso name from '[kernel.kallsyms]' to
    'vmlinux'. However, this might confuse long-time users accustomed to the
    old name, so change the short name to '[kernel.vmlinux]' to reduce such
    confusion.

    Before:
    # Overhead  Command         Shared Object            Symbol
    # ........  ..............  .......................  ...............................
    #
         9.83%  swapper         vmlinux                  [k] intel_idle
         4.10%  awk             libc-2.20.so             [.] __strcmp_sse2
         1.86%  sed             libc-2.20.so             [.] __strcmp_sse2
         1.78%  netctl-auto     libc-2.20.so             [.] __strcmp_sse2
         1.23%  netctl-auto     libc-2.20.so             [.] __mbrtowc
         1.21%  firefox         libxul.so                [.] 0x00000000024b62bd
         1.20%  swapper         vmlinux                  [k] cpuidle_enter_state
         1.03%  sleep           vmlinux                  [k] copy_user_generic_unrolled

    After:
    # Overhead  Command         Shared Object            Symbol
    # ........  ..............  .......................  ...............................
    #
         9.83%  swapper         [kernel.vmlinux]         [k] intel_idle
         4.10%  awk             libc-2.20.so             [.] __strcmp_sse2
         1.86%  sed             libc-2.20.so             [.] __strcmp_sse2
         1.78%  netctl-auto     libc-2.20.so             [.] __strcmp_sse2
         1.23%  netctl-auto     libc-2.20.so             [.] __mbrtowc
         1.21%  firefox         libxul.so                [.] 0x00000000024b62bd
         1.20%  swapper         [kernel.vmlinux]         [k] cpuidle_enter_state
         1.03%  sleep           [kernel.vmlinux]         [k] copy_user_generic_unrolled

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1415063674-17206-9-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • There is a problem finding the correct kernel symbols when perf report
    runs on a different kernel. Although part of the problem was solved by
    the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
    version when finding vmlinux"), a problem still remains.

    When perf records samples, it synthesizes the kernel map using
    machine__mmap_name() and ref_reloc_sym, like "[kernel.kallsyms]_text".
    You can easily see this using the 'perf report -D' command.

    After recording finishes, it goes through the recorded events to find
    the maps/dsos actually used, and then records their build-id info.

    During this process, it needs to load the symbols of a dso, and it
    calls dso__load_vmlinux_path() since the default value of
    symbol_conf.try_vmlinux_path is true. However, this changes
    dso->long_name to the real path of the vmlinux file (e.g.
    /lib/modules/3.16.4/build/vmlinux) if one is running a custom kernel.

    As a result, perf report reads the build-id of the vmlinux but cannot
    use it, since it only knows about the [kernel.kallsyms] map. It then
    silently falls back to possible vmlinux paths, using either the recorded
    kernel version (with recent versions of perf) or the running kernel.

    Even with recent tools, this can still break the result. Since the
    build directory is a symbolic link, if one builds a new kernel in the
    same directory with a different source/config, the old link to vmlinux
    will point to the new file. So it is essential to use the build-id when
    finding a kernel image.

    This patch changes it to search for a kernel dso in the existing dso
    list, which was constructed during build-id table parsing, so it will
    always have a build-id. If none is found, search for "[kernel.kallsyms]".
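    The lookup order described above can be sketched in C as follows. The struct dso fields, the list layout and find_kernel_dso() are hypothetical simplifications for illustration, not perf's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dso record in a singly linked list. */
struct dso {
	const char *long_name;
	bool kernel;        /* dso represents the kernel image */
	bool has_build_id;  /* build-id came from the perf.data header */
	struct dso *next;
};

/*
 * Prefer a kernel dso that already carries a build-id (i.e. one created
 * while parsing the build-id table); otherwise fall back to the dso
 * named "[kernel.kallsyms]". Returns NULL if neither exists.
 */
struct dso *find_kernel_dso(struct dso *head)
{
	struct dso *d;

	for (d = head; d; d = d->next)
		if (d->kernel && d->has_build_id)
			return d;

	for (d = head; d; d = d->next)
		if (strcmp(d->long_name, "[kernel.kallsyms]") == 0)
			return d;

	return NULL;
}
```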

    Before:

    $ perf report
    # Children      Self  Command  Shared Object      Symbol
    # ........  ........  .......  .................  ...............................
    #
        72.15%     0.00%  swapper  [kernel.kallsyms]  [k] set_curr_task_rt
        72.15%     0.00%  swapper  [kernel.kallsyms]  [k] native_calibrate_tsc
        72.15%     0.00%  swapper  [kernel.kallsyms]  [k] tsc_refine_calibration_work
        71.87%    71.87%  swapper  [kernel.kallsyms]  [k] module_finalize
    ...

    After (for the same perf.data):

        72.15%     0.00%  swapper  vmlinux            [k] cpu_startup_entry
        72.15%     0.00%  swapper  vmlinux            [k] arch_cpu_idle
        72.15%     0.00%  swapper  vmlinux            [k] default_idle
        71.87%    71.87%  swapper  vmlinux            [k] native_safe_halt
    ...

    Signed-off-by: Namhyung Kim
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/20140924073356.GB1962@gmail.com
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1415063674-17206-8-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

04 Nov, 2014

1 commit

  • This patch adds basic support for handling compressed kernel modules,
    as some distros (such as Arch Linux) now ship them. The actual work using
    a compression library will be added later.
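    Recognizing a compressed module usually starts with its file name. The sketch below assumes detection by suffix; the suffixes (.ko.gz, .ko.xz) and the helper names are assumptions for illustration, and, as in the patch, actual decompression is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum module_kind { NOT_MODULE, MODULE_PLAIN, MODULE_COMPRESSED };

/* Returns true if s ends with suffix. */
static bool has_suffix(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Classify a file name as a plain module, a compressed module, or neither. */
enum module_kind classify_module(const char *path)
{
	static const char *const compressed[] = { ".ko.gz", ".ko.xz" };
	size_t i;

	if (has_suffix(path, ".ko"))
		return MODULE_PLAIN;
	for (i = 0; i < sizeof(compressed) / sizeof(compressed[0]); i++)
		if (has_suffix(path, compressed[i]))
			return MODULE_COMPRESSED;
	return NOT_MODULE;
}
```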

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1415063674-17206-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

29 Oct, 2014

4 commits

  • unwind__get_entries() already receives the thread parameter, from which it
    can obtain the matching machine structure, so shorten the signature.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-isjc6bm8mv4612mhi6af64go@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Shorten the function signature length here too: since a thread's machine
    can be obtained from thread->mg->machine, there is no need to pass both
    thread and machine.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-5wb6css280ty0cel5p0zo2b1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Since a thread's machine can be obtained via thread->mg->machine, stop
    passing both machine and thread to several thread methods, reducing
    function signature length.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • We were setting map_groups->machine only in machine__init(), i.e. for the
    map_groups that holds the kernel module maps, not for the one used for a
    thread's executable mmaps.

    Now we are sure that we can obtain the machine where a thread is by going
    via thread->mg->machine, thus we can, in the following patch, make all
    codepaths that receive machine _and_ thread, drop the machine one.
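    A hypothetical, minimal C model of the relationship described above: every map_groups carries a back-pointer to its machine, so a function that receives a thread can reach the machine without a separate parameter. The struct layouts and function names are simplified stand-ins, not perf's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct machine;

/* Minimal map_groups: only the back-pointer matters here. */
struct map_groups {
	struct machine *machine;
};

struct thread {
	int tid;
	struct map_groups *mg;
};

struct machine {
	int id;
	struct map_groups kmaps;   /* kernel module maps */
};

/* Set the back-pointer for every map_groups, not just the kernel one. */
void map_groups_init(struct map_groups *mg, struct machine *machine)
{
	mg->machine = machine;
}

/* Shorter signatures become possible: derive the machine from the thread. */
struct machine *thread_machine(const struct thread *thread)
{
	return thread->mg->machine;
}
```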

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-y6zgaqsvhrf04v57u15e4ybm@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2014

1 commit

  • A segfault happens in 'perf test hists_link' because we end up using a
    struct machines on the stack, and machines__init() was not initializing
    the newly introduced rb_root, just the existing list_head.

    When we introduced struct dsos, to group the two ways to store dsos,
    i.e. the linked list and the rbtree, we didn't turn the initialization
    done in:

    machines__init(machines->host) ->
    machine__init() ->
    INIT_LIST_HEAD

    into a dsos__init() that keeps initializing the list_head but _also_
    initializes the rb_root, oops.

    Everything worked because outside 'perf test' we probably zalloc() the
    whole thing, which ends up initializing it to NULL.

    So the problem looks contained to 'perf test' that uses it on stack,
    etc.
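    The missing piece is an initializer that covers both members. A minimal sketch, using simplified stand-ins for the kernel's list_head and rb_root; dsos_init() is a hypothetical name for the dsos__init() helper the text calls for.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's list_head and rb_root. */
struct list_head { struct list_head *next, *prev; };
struct rb_node;
struct rb_root { struct rb_node *rb_node; };

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Both ways of storing dsos, grouped together. */
struct dsos {
	struct list_head head;
	struct rb_root root;
};

/* Initialize both members, so a struct dsos living on the stack
 * does not start out with a garbage rb_root. */
void dsos_init(struct dsos *dsos)
{
	init_list_head(&dsos->head);
	dsos->root.rb_node = NULL;
}
```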

    Reported-by: Jiri Olsa
    Acked-by: Waiman Long
    Cc: Adrian Hunter
    Cc: Don Zickus
    Cc: Douglas Hatch
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/20141014180353.GF3198@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

02 Oct, 2014

1 commit

  • With a workload that spawns and destroys many threads and processes, it
    was found that perf-mem could take a long time to post-process the perf
    data after the target workload had completed its operation.

    The performance bottleneck was found to be the lookup and insertion of
    the new DSO structures (thousands of them in this case).

    On a dual-socket Ivy Bridge E7-4890 v2 machine (30 cores, 60 threads),
    the perf profile below shows what perf was doing after the profiled AIM7
    shared workload completed:

    - 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
    - __strcmp_sse42
    - 99.82% map__new
    machine__process_mmap_event
    perf_session_deliver_event
    perf_session__process_event
    __perf_session__process_events
    cmd_record
    cmd_mem
    run_builtin
    main
    __libc_start_main
    - 13.17% perf perf [.] __dsos__findnew
    __dsos__findnew
    map__new
    machine__process_mmap_event
    perf_session_deliver_event
    perf_session__process_event
    __perf_session__process_events
    cmd_record
    cmd_mem
    run_builtin
    main
    __libc_start_main

    So about 97% of the CPU time was spent in the map__new() function trying
    to insert new DSO entries into the DSO linked list. The whole
    post-processing step took about 9 minutes.

    The DSO structures are currently searched linearly, so each of the n
    insertions may scan up to n existing entries, and the total processing
    time is proportional to n^2.

    To overcome this performance problem, the DSO code is modified to also
    put the DSO structures in an RB tree sorted by long name, in addition
    to the simple linked list. With this change, the processing time becomes
    proportional to n*log(n), which is much quicker for large n. However,
    the short name is still searched using the old linear method. With this
    patch in place, the same perf-mem post-processing step took less than 30
    seconds to complete.
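    A C sketch of the long-name index. For brevity it uses a plain, unbalanced binary search tree, which gives O(log n) lookups on average but lacks the worst-case guarantee of the red-black tree the patch uses; the struct layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dso: kept both in a list and in a tree keyed by name. */
struct dso {
	char *long_name;
	struct dso *left, *right;	/* tree links, ordered by long_name */
	struct dso *next;		/* linked list, as before */
};

struct dsos {
	struct dso *list;
	struct dso *root;
};

/* Walk down the tree: O(depth) string compares instead of O(n). */
struct dso *dsos_find(const struct dsos *dsos, const char *name)
{
	struct dso *d = dsos->root;

	while (d) {
		int cmp = strcmp(name, d->long_name);

		if (cmp == 0)
			return d;
		d = cmp < 0 ? d->left : d->right;
	}
	return NULL;
}

/* Return the dso called name, adding it to both structures if new. */
struct dso *dsos_findnew(struct dsos *dsos, const char *name)
{
	struct dso **link = &dsos->root;
	struct dso *d;
	size_t len;

	while (*link) {
		int cmp = strcmp(name, (*link)->long_name);

		if (cmp == 0)
			return *link;
		link = cmp < 0 ? &(*link)->left : &(*link)->right;
	}

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	len = strlen(name) + 1;
	d->long_name = malloc(len);
	if (!d->long_name) {
		free(d);
		return NULL;
	}
	memcpy(d->long_name, name, len);

	*link = d;		/* insert into the tree... */
	d->next = dsos->list;	/* ...and keep the list up to date */
	dsos->list = d;
	return d;
}
```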

    Signed-off-by: Waiman Long
    Cc: Adrian Hunter
    Cc: Don Zickus
    Cc: Douglas Hatch
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Waiman Long
     

30 Sep, 2014

1 commit

  • This is a precursor patch to enable long name searching of DSOs using
    an rbtree.

    In this patch, a new dsos structure is created which contains only a
    list head structure for the moment.

    The new dsos structure is used, in turn, in the machine structure for
    the user_dsos and kernel_dsos fields.

    Only the following 3 dsos functions are modified to accept the new dsos
    structure parameter instead of list_head:

    - dsos__add()
    - dsos__find()
    - __dsos__findnew()

    Signed-off-by: Waiman Long
    Cc: Adrian Hunter
    Cc: Don Zickus
    Cc: Douglas Hatch
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Link: http://lkml.kernel.org/r/1412021249-19201-2-git-send-email-Waiman.Long@hp.com
    [ Move struct dsos to dso.h to reduce the dso methods' dependency on machine.h ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Waiman Long
     

23 Aug, 2014

3 commits

  • As we run "perf c2c" on more applications, we noticed we're missing
    significant samples from a common customer's application. Looking at
    the /proc//maps file for the app, we see "rwxs" and "rwxp"
    permissions on many of the shared memory & heap regions, and on all the
    thread stacks.

    Because those regions have the "x" bit set, perf marks them with the
    MAP__FUNCTION type. Hence ip__resolve_data() never finds load or store
    events coming from them.

    We fixed this by calling thread__find_addr_location() again with
    MAP__FUNCTION when the map is NULL, as a last-ditch effort to map the
    sample before giving up and dropping it.
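    The fallback can be sketched as follows. The map array, find_map() and resolve_data() are hypothetical simplifications; only the MAP__FUNCTION and MAP__VARIABLE enum names follow perf's map types, and trying the variable maps first is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum map_type { MAP__FUNCTION, MAP__VARIABLE };

/* Hypothetical map: an address range tagged with a type. */
struct map {
	uint64_t start, end;
	enum map_type type;
};

/* First map of the given type whose range contains addr, or NULL. */
const struct map *find_map(const struct map *maps, size_t n,
			   enum map_type type, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].type == type &&
		    addr >= maps[i].start && addr < maps[i].end)
			return &maps[i];
	return NULL;
}

/*
 * Resolve a data address: try the variable maps first and, as a last
 * ditch effort, retry with MAP__FUNCTION, since writable and executable
 * regions (e.g. "rwxp" heaps and stacks) are typed as functions.
 */
const struct map *resolve_data(const struct map *maps, size_t n,
			       uint64_t addr)
{
	const struct map *m = find_map(maps, n, MAP__VARIABLE, addr);

	if (!m)
		m = find_map(maps, n, MAP__FUNCTION, addr);
	return m;
}
```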

    Reported-by: Joe Mario
    Tested-by: Joe Mario
    Signed-off-by: Don Zickus
    Acked-by: Jiri Olsa
    Cc: Jiri Olsa
    Cc: Joe Mario
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1408591511-57884-1-git-send-email-dzickus@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Don Zickus
     
  • Add a function to determine if an address is in the kernel. This is
    based on the kernel function kernel_ip().
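    In the kernel, kernel_ip() on x86-64 essentially tests whether an address lies in the upper half of the address space. A user-space tool can instead compare against the kernel start address it has recorded; the sketch below assumes that address is already known, and its struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical machine with a known kernel start address. */
struct machine {
	uint64_t kernel_start;
};

/* Treat an address as kernel if it lies at or above the kernel start. */
bool machine_kernel_ip(const struct machine *machine, uint64_t ip)
{
	return ip >= machine->kernel_start;
}
```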

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1408129739-17368-5-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Rename machine__get_kernel_start_addr() to
    machine__get_running_kernel_start() so that a new function, with a
    similar name to the original name, can be added that gets the kernel
    start address from the kernel map.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1408129739-17368-4-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

14 Aug, 2014

2 commits

  • Add machine__thread_exec_comm() to return the comm that matches the last
    exec, if the comm_exec flag is present, or the last comm otherwise.
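    A sketch of the selection logic, under stated assumptions: the comm records form a newest-first list, each carries an exec flag, and when no exec comm exists the sketch falls back to the most recent comm (that fallback is a choice made here, not taken from the patch). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comm record; the list is ordered newest first. */
struct comm {
	const char *str;
	bool exec;          /* set when the comm event came from an exec */
	struct comm *next;
};

/*
 * Return the comm matching the last exec when the recording provides
 * the comm_exec flag; otherwise just the most recent comm.
 */
const struct comm *thread_exec_comm(const struct comm *comms, bool comm_exec)
{
	if (!comm_exec)
		return comms;

	for (const struct comm *c = comms; c; c = c->next)
		if (c->exec)
			return c;

	/* No exec seen: fall back to the most recent comm. */
	return comms;
}
```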

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1406786474-9306-3-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • This is for grouping together all the data from a single execution,
    which is needed for pairing calls and returns; e.g. any calls still
    outstanding when a process execs will never return.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1406786474-9306-2-git-send-email-adrian.hunter@intel.com
    [ Remove testing if comm->exec is false before setting it to true ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

24 Jul, 2014

3 commits

  • The thread will be needed to determine the VDSO type.

    Reviewed-by: Jiri Olsa
    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1406035081-14301-52-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • The VDSO temporary file is unlinked when a session is deleted. That
    does not allow for the possibility that there is no session, or that
    there is more than one session.

    The vdso properly belongs to the machine, so put the information in
    'struct machine' and get rid of the global variables.

    Reviewed-by: Jiri Olsa
    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/53CF9B14.7040408@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • This is preparation for removing the global variables used in vdso.c and
    thereby fixing the lifetime of the VDSO temporary file.

    Reviewed-by: Jiri Olsa
    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1406035081-14301-45-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

23 Jul, 2014

1 commit

  • Add an array to struct machine to store the current tid running on each
    cpu.

    Add machine functions to get / set the tid for a cpu.

    This will be used to determine the tid when decoding a per-cpu
    Instruction Trace.
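    A minimal sketch of a per-cpu tid table with get/set helpers. It assumes -1 marks an unknown tid and that the array grows on demand when a higher cpu number is seen; the names and the growth policy are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical machine holding the tid currently running on each cpu. */
struct machine {
	int *current_tid;   /* indexed by cpu; -1 means unknown */
	int nr_cpus;
};

/* Record that tid is now running on cpu, growing the array on demand.
 * Returns 0 on success, -1 on error. */
int machine_set_current_tid(struct machine *m, int cpu, int tid)
{
	if (cpu < 0)
		return -1;
	if (cpu >= m->nr_cpus) {
		int n = cpu + 1;
		int *arr = realloc(m->current_tid, n * sizeof(*arr));

		if (!arr)
			return -1;
		for (int i = m->nr_cpus; i < n; i++)
			arr[i] = -1;	/* new slots start out unknown */
		m->current_tid = arr;
		m->nr_cpus = n;
	}
	m->current_tid[cpu] = tid;
	return 0;
}

/* Return the tid last set for cpu, or -1 if unknown. */
int machine_get_current_tid(const struct machine *m, int cpu)
{
	if (cpu < 0 || cpu >= m->nr_cpus)
		return -1;
	return m->current_tid[cpu];
}
```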

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1406035081-14301-17-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

17 Jul, 2014

3 commits

  • __machine__findnew_thread() creates a 'struct thread' but does not free
    it on the error path. Fix it.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1405495184-20441-3-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Events like sched_switch do not provide a pid (tgid), which can result
    in threads with an unknown pid. If the pid is later discovered, join the
    map groups.

    Note the thread's map groups should be empty because they are populated
    by MMAP events which do provide the pid and tid.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1405498033-23817-1-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • The value used for unknown pids cannot be zero because that is used by
    the "idle" task.

    Use -1 instead. Also handle the unknown pid case when creating map
    groups.

    Note that threads with an unknown pid should not occur, because fork (or
    synthesized) events precede the thread's existence.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1405332185-4050-2-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

27 Jun, 2014

1 commit

  • When saving the callchain on Power, the kernel conservatively saves excess
    entries in the callchain. A few of these entries are needed in some cases
    but not others. We should use the DWARF debug information to determine
    when the entries are needed.

    E.g. the value in the link register (LR) is needed only when it holds the
    return address of a function. At other times it must be ignored.

    If the unnecessary entries are not ignored, we end up with duplicate arcs
    in the call-graphs.

    Use the DWARF debug information to determine if any callchain entries
    should be ignored when building call-graphs.

    Callgraph before the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
    |
    --- __random
    |
    |--61.12%-- __random
    | |
    | |--97.15%-- rand
    | | do_my_sprintf
    | | main
    | | generic_start_main.isra.0
    | | __libc_start_main
    | | 0x0
    | |
    | --2.85%-- do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --38.88%-- rand
    |
    |--94.01%-- rand
    | do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --5.99%-- do_my_sprintf
    main
    generic_start_main.isra.0
    __libc_start_main
    0x0

    Callgraph after the patch:

    14.67% 2234 sprintft libc-2.18.so [.] __random
    |
    --- __random
    |
    |--95.93%-- rand
    | do_my_sprintf
    | main
    | generic_start_main.isra.0
    | __libc_start_main
    | 0x0
    |
    --4.07%-- do_my_sprintf
    main
    generic_start_main.isra.0
    __libc_start_main
    0x0

    TODO: For split-debug-info objects like glibc, we can determine the
    call-frame address only when both the .eh_frame and .debug_info
    sections are available. We should be able to determine the CFA
    even without the .eh_frame section.

    Fix suggested by Anton Blanchard.

    Thanks to valuable input on DWARF debug information from Ulrich Weigand.

    Reported-by: Maynard Johnson
    Tested-by: Maynard Johnson
    Signed-off-by: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
    Signed-off-by: Jiri Olsa

    Sukadev Bhattiprolu
     

20 Jun, 2014

1 commit

  • The function machine__get_kernel_start_addr() was taking the first symbol
    of kallsyms as the start address. This is incorrect in certain cases
    where the first symbol is something at 0, while the actual kernel
    functions begin at a later point (e.g. 0x80200000).

    This patch fixes machine__get_kernel_start_addr() to search for the
    symbol "_text" or "_stext", which marks the beginning of kernel mapping.
    This was already being done in machine__create_kernel_maps(). Thus, this
    patch is just a refactor, to move that code into
    machine__get_kernel_start_addr().
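    A sketch of the lookup in C. It parses kallsyms-formatted text ("address type name" per line) from a string rather than reading /proc/kallsyms, and tries "_text" before "_stext"; that preference order and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find symbol sym in kallsyms-formatted text and store its address. */
static bool find_symbol(const char *kallsyms, const char *sym, uint64_t *addr)
{
	for (const char *line = kallsyms; line && *line; ) {
		unsigned long long a;
		char type, name[128];

		if (sscanf(line, "%llx %c %127s", &a, &type, name) == 3 &&
		    strcmp(name, sym) == 0) {
			*addr = a;
			return true;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return false;
}

/* Kernel start: the address of "_text", or of "_stext" as a fallback,
 * instead of whatever symbol happens to come first. */
bool kernel_start_addr(const char *kallsyms, uint64_t *start)
{
	static const char *const names[] = { "_text", "_stext" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (find_symbol(kallsyms, names[i], start))
			return true;
	return false;
}
```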

    Signed-off-by: Simon Que
    Link: http://lkml.kernel.org/r/1402943529-13244-1-git-send-email-sque@chromium.org
    Signed-off-by: Jiri Olsa

    Simon Que
     

09 Jun, 2014

1 commit


01 May, 2014

1 commit


30 Apr, 2014

1 commit

  • Modules installed outside of the kernel's build system should go into
    "%s/lib/modules/%s/extra", but at present, perf will only look at them
    when they are in "%s/lib/modules/%s/kernel". Let's encourage good
    citizenship by relaxing this requirement to "%s/lib/modules/%s". This
    way open source modules that are out-of-tree have no incentive to start
    populating a directory reserved for in-kernel modules and I can stop
    hex-editing my system's perf binary when profiling OSS out-of-tree
    modules.

    Feedback from Namhyung Kim correctly revealed that the hex-edits that I
    had been doing meant that perf was also traversing the build and source
    symlinks in %s/lib/modules/%s. That is undesirable, so we explicitly
    exclude them from traversal with a minor tweak to the traversal routine.
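    The traversal rule (search all of %s/lib/modules/%s, but do not descend into the top-level build and source symlinks) can be sketched as a recursive POSIX opendir()/readdir() walk. count_modules() is hypothetical and only counts .ko files to make the skip rule observable.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count "*.ko" files under dir, recursing into subdirectories but never
 * descending into the top-level "build" and "source" entries. */
int count_modules(const char *dir, int depth)
{
	DIR *d = opendir(dir);
	struct dirent *ent;
	int count = 0;

	if (!d)
		return 0;

	while ((ent = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;
		size_t len = strlen(ent->d_name);

		if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
			continue;
		/* build/ and source/ point into the kernel tree: skip them. */
		if (depth == 0 && (strcmp(ent->d_name, "build") == 0 ||
				   strcmp(ent->d_name, "source") == 0))
			continue;

		snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
		if (stat(path, &st) != 0)
			continue;

		if (S_ISDIR(st.st_mode))
			count += count_modules(path, depth + 1);
		else if (len > 3 && strcmp(ent->d_name + len - 3, ".ko") == 0)
			count++;
	}
	closedir(d);
	return count;
}
```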

    Signed-off-by: Richard Yao
    Acked-by: Namhyung Kim
    Link: http://lkml.kernel.org/r/1398532675-13684-1-git-send-email-ryao@gentoo.org
    Signed-off-by: Jiri Olsa

    Richard Yao
     

28 Apr, 2014

1 commit

  • Share map groups among all threads of a process. This way there is only
    one copy of the mmap info, and it is reachable from any thread within the
    process.
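    One common way to share a structure between threads is reference counting. The sketch below assumes the thread group leader (pid == tid) allocates the map groups and every other thread takes a reference to the leader's copy; the struct fields and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted map_groups shared by a process. */
struct map_groups {
	int refcnt;
	int nr_maps;        /* stand-in for the real mmap information */
};

struct thread {
	int pid, tid;
	struct map_groups *mg;
};

struct map_groups *map_groups_get(struct map_groups *mg)
{
	mg->refcnt++;
	return mg;
}

void map_groups_put(struct map_groups *mg)
{
	if (mg && --mg->refcnt == 0)
		free(mg);
}

/* The leader allocates the map_groups; other threads share the leader's.
 * Returns 0 on success, -1 on error. */
int thread_init_map_groups(struct thread *t, struct thread *leader)
{
	if (t->pid == t->tid) {
		t->mg = calloc(1, sizeof(*t->mg));
		if (!t->mg)
			return -1;
		t->mg->refcnt = 1;
		return 0;
	}
	if (!leader || !leader->mg)
		return -1;
	t->mg = map_groups_get(leader->mg);
	return 0;
}
```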

    Original-patch-by: Arnaldo Carvalho de Melo
    Acked-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1397490723-1992-5-git-send-email-jolsa@redhat.com
    Signed-off-by: Jiri Olsa

    Jiri Olsa
     

19 Mar, 2014

2 commits

  • Now that we can properly synthesize threads system-wide, make sure the
    mmap and mmap2 events use tids instead of pids to locate their maps.

    Signed-off-by: Don Zickus
    Cc: Jiri Olsa
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1393429527-167840-3-git-send-email-dzickus@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Don Zickus
     
  • Turn the addr_location->filtered member from a boolean into a u8
    bitmap, reusing (and extending) the hist_filter enum for that.

    This patch doesn't change the logic at all, as it keeps al->filtered != 0
    meaning that the entry _was_ filtered, so no change in how this value is
    interpreted needs to be done at this point.

    This will soon be used in upcoming patches.
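    A minimal sketch of the boolean-to-bitmap change: each filter reason gets its own bit, and a non-zero value still means "filtered", so existing checks keep working. The enum values and helper names are illustrative only, not the hist_filter enum itself.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative filter reasons, one bit each. */
enum filter_reason {
	FILTER_DSO,
	FILTER_THREAD,
	FILTER_PARENT,
	FILTER_SYMBOL,
};

struct addr_location {
	uint8_t filtered;   /* bitmap of filter_reason bits; 0 = not filtered */
};

void mark_filtered(struct addr_location *al, enum filter_reason why)
{
	al->filtered |= (uint8_t)(1u << why);
}

int filtered_by(const struct addr_location *al, enum filter_reason why)
{
	return (al->filtered & (1u << why)) != 0;
}
```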

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-89hmfgtr9t22sky1lyg7nw7l@git.kernel.org
    [ yanked this out of a previous patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

15 Mar, 2014

2 commits

  • Force the code to always search for a thread by its pid/tid pair.

    The PID value will be needed in future to determine the process thread
    leader for map groups sharing.

    Signed-off-by: Jiri Olsa
    Acked-by: Adrian Hunter
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1394805606-25883-3-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • It is one level above thread__find_addr_location(): it looks in different
    domains for a sample (user, kernel, hypervisor, etc.).

    Will soon be used by a patchkit by Andi Kleen.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-so6nxkh7xj48bc5kq4jpj991@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

11 Mar, 2014

1 commit


10 Mar, 2014

1 commit

  • When trying to map a bunch of instruction addresses to their respective
    threads, I kept getting a lot of bogus entries [I forget the exact
    reason as I patched my code months ago].

    Looking through ip__resolve_ams, I noticed the check for

    if (al.sym)

    and realized that most of the time I have an al.map, but sometimes
    al.sym is undefined. In the cases where al.sym is undefined, the loop
    keeps going even though a valid al.map exists.

    Modify this check to use the more reliable al.map. This fixed my bogus
    entries.
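    The effect of the fix can be illustrated with a small C sketch that tries each cpumode in turn and stops at the first one yielding a map, even if that map has no symbol for the address. The cpumode list, the resolver callback and all names are hypothetical stand-ins for the loop in ip__resolve_ams.

```c
#include <assert.h>
#include <stddef.h>

enum cpumode { CPUMODE_USER, CPUMODE_KERNEL, CPUMODE_HV, NR_CPUMODES };

/* Hypothetical result of resolving an address in one cpumode. */
struct addr_location {
	const void *map;   /* NULL if no mapping covers the address */
	const void *sym;   /* NULL if the map has no symbol for it */
};

typedef struct addr_location (*resolve_fn)(enum cpumode, unsigned long ip);

/* Stop at the first cpumode with a map; testing sym instead would skip
 * valid maps that merely lack symbols and keep searching other modes. */
struct addr_location resolve_any_mode(resolve_fn resolve, unsigned long ip,
				      enum cpumode *mode)
{
	struct addr_location al = { NULL, NULL };

	for (int m = 0; m < NR_CPUMODES; m++) {
		al = resolve((enum cpumode)m, ip);
		if (al.map) {
			*mode = (enum cpumode)m;
			break;
		}
	}
	return al;
}

/* Example resolver: a stripped user binary mapped below 0x800000. */
static int dummy_map;
struct addr_location example_resolve(enum cpumode m, unsigned long ip)
{
	struct addr_location al = { NULL, NULL };

	if (m == CPUMODE_USER && ip < 0x800000UL)
		al.map = &dummy_map;	/* map found, but no symbol */
	return al;
}
```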

    Signed-off-by: Don Zickus
    Cc: Jiri Olsa
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1393386227-149412-2-git-send-email-dzickus@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Don Zickus