08 Oct, 2011

1 commit

  • The goal of this patch is to include more information about the host
    environment in the perf.data file so it is more self-descriptive. Over
    time, profiles are captured on various machines and it becomes hard to
    track what was recorded, on what machine and when.

    This patch provides a way to solve this by extending the perf.data file
    with basic information about the host machine. To add those extensions,
    we leverage the feature bits capabilities of the perf.data format. The
    change is backward compatible with existing perf.data files.

    We define the following useful new extensions:
    - HEADER_HOSTNAME: the hostname
    - HEADER_OSRELEASE: the kernel release number
    - HEADER_ARCH: the hw architecture
    - HEADER_CPUDESC: generic CPU description
    - HEADER_NRCPUS: number of online/avail cpus
    - HEADER_CMDLINE: perf command line
    - HEADER_VERSION: perf version
    - HEADER_TOPOLOGY: cpu topology
    - HEADER_EVENT_DESC: full event description (attrs)
    - HEADER_CPUID: easy-to-parse low level CPU identification

    The small granularity for the entries is to make it easier to extend
    without breaking backward compatibility. Many entries are provided as
    ASCII strings.

    Perf report/script have been modified to print the basic information as
    easy-to-parse ASCII strings. Extended information about CPU and NUMA
    topology may be requested with the -I option.

    Thanks to David Ahern for reviewing and testing the many versions of
    this patch.

    $ perf report --stdio
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # HEADER_CPU_TOPOLOGY info available, use -I to display
    # HEADER_NUMA_TOPOLOGY info available, use -I to display
    # ========
    #
    ...

    $ perf report --stdio -I
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # sibling cores : 0-3
    # sibling threads : 0
    # sibling threads : 1
    # sibling threads : 2
    # sibling threads : 3
    # node0 meminfo : total = 8320608 kB, free = 7571024 kB
    # node0 cpu list : 0-3
    # ========
    #
    ...
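Since the header is printed as `# key : value` lines, it can be consumed by a very small script. The following is only a sketch under that assumption; `parse_perf_header` is a hypothetical helper, not part of perf, and the sample text is an abridged copy of the output shown above.

```python
def parse_perf_header(text):
    """Collect '# key : value' lines from 'perf report --stdio' output.

    Every key maps to a list of values, because some keys
    (for example 'sibling threads') appear more than once.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        body = line[1:].strip()
        # Skip separators such as '# ========' and bare '#' lines.
        if " : " not in body:
            continue
        key, value = body.split(" : ", 1)
        info.setdefault(key.strip(), []).append(value.strip())
    return info


sample = """\
# ========
# captured on : Mon Sep 26 15:22:14 2011
# hostname : quad
# arch : x86_64
# nrcpus online : 4
# sibling threads : 0
# sibling threads : 1
# ========
#
"""
header = parse_perf_header(sample)
print(header["hostname"])         # ['quad']
print(header["sibling threads"])  # ['0', '1']
```

Splitting on the first " : " keeps values that contain colons of their own, such as the capture timestamp, in one piece.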

    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
    Signed-off-by: Stephane Eranian
    [ committer notes: Use --show-info in the tools as was in the docs, rename
    perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
    conflict with f69b64f7 "perf: Support setting the disassembler style" ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

15 Mar, 2011

1 commit

  • [root@emilia ~]# perf record -a -e sched:* -e timer:timer* sleep 5
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.172 MB perf.data (~7530 samples) ]
    [root@emilia ~]# perf evlist
    sched:sched_kthread_stop
    sched:sched_kthread_stop_ret
    sched:sched_wakeup
    sched:sched_wakeup_new
    sched:sched_switch
    sched:sched_migrate_task
    sched:sched_process_free
    sched:sched_process_exit
    sched:sched_wait_task
    sched:sched_process_wait
    sched:sched_process_fork
    sched:sched_stat_wait
    sched:sched_stat_sleep
    sched:sched_stat_iowait
    sched:sched_stat_runtime
    sched:sched_pi_setprio
    timer:timer_init
    timer:timer_start
    timer:timer_expire_entry
    timer:timer_expire_exit
    timer:timer_cancel
    [root@emilia ~]#

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

17 Nov, 2010

1 commit


03 May, 2010

1 commit

  • Currently, perf 'live mode' writes build-ids at the end of the
    session, which isn't actually useful for processing live mode events.

    What would be better would be to have the build-ids sent before any of
    the samples that reference them, which can be done by processing the
    event stream and retrieving the build-ids on the first hit. Doing
    that in perf-record itself, however, is off-limits.

    This patch introduces perf-inject, which does the same job while
    leaving perf-record untouched. Normal mode perf still records the
    build-ids at the end of the session as it should, but for live mode,
    perf-inject can be injected in between the record and report steps
    e.g.:

    perf record -o - ./hackbench 10 | perf inject -v -b | perf report -v -i -

    perf-inject reads a perf-record event stream and repipes it to stdout.
    At any point the processing code can inject other events into the
    event stream - in this case build-ids (-b option) are read and
    injected as needed into the event stream.

    Build-ids are just the first user of perf-inject - potentially
    anything that needs userspace processing to augment the trace stream
    with additional information could make use of this facility.
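The repipe-and-inject idea described above can be sketched as a stream filter. This is only an illustration: the `(kind, dso)` tuples, the `inject_build_ids` name, and the short id strings are made up for the example and do not reflect perf's binary event format or its implementation.

```python
def inject_build_ids(events, build_ids):
    """Repipe events, emitting a build-id event before the first sample per DSO.

    events:    iterable of (kind, dso) tuples, a made-up stand-in for
               perf's real binary event records.
    build_ids: dict mapping a DSO path to its build-id string (hypothetical).
    """
    seen = set()
    for kind, dso in events:
        if kind == "sample" and dso not in seen:
            seen.add(dso)
            if dso in build_ids:
                # Inject the extra event ahead of the sample that needs it.
                yield ("build_id", dso, build_ids[dso])
        # Every original event is passed through unchanged.
        yield (kind, dso)


stream = [("sample", "/bin/a"), ("sample", "/bin/b"), ("sample", "/bin/a")]
ids = {"/bin/a": "aa", "/bin/b": "bb"}
for event in inject_build_ids(stream, ids):
    print(event)
```

Because the filter is a generator, it processes the stream incrementally, which is the property live mode needs: the consumer sees each build-id before the first sample that refers to it.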

    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Frédéric Weisbecker
    LKML-Reference:
    Signed-off-by: Tom Zanussi
    Signed-off-by: Arnaldo Carvalho de Melo

    Tom Zanussi
     

30 Apr, 2010

1 commit

  • First an example with the first internal test:

    [acme@doppio linux-2.6-tip]$ perf test
    1: vmlinux symtab matches kallsyms: Ok

    So it ran just one test, that is "vmlinux symtab matches kallsyms", and it was
    successful.

    If we run it in verbose mode, we'll see details about errors and extra warnings
    for non-fatal problems:

    [acme@doppio linux-2.6-tip]$ perf test -v
    1: vmlinux symtab matches kallsyms:
    --- start ---
    Looking at the vmlinux_path (5 entries long)
    No build_id in vmlinux, ignoring it
    No build_id in /boot/vmlinux, ignoring it
    No build_id in /boot/vmlinux-2.6.34-rc4-tip+, ignoring it
    Using /lib/modules/2.6.34-rc4-tip+/build/vmlinux for symbols
    Maps only in vmlinux:
    ffffffff81cb81b1-ffffffff81e1149b 0 [kernel].init.text
    ffffffff81e1149c-ffffffff9fffffff 0 [kernel].exit.text
    ffffffffff600000-ffffffffff6000ff 0 [kernel].vsyscall_0
    ffffffffff600100-ffffffffff6003ff 0 [kernel].vsyscall_fn
    ffffffffff600400-ffffffffff6007ff 0 [kernel].vsyscall_1
    ffffffffff600800-ffffffffffffffff 0 [kernel].vsyscall_2
    Maps in vmlinux with a different name in kallsyms:
    ffffffffff600000-ffffffffff6000ff 0 [kernel].vsyscall_0 in kallsyms as [kernel].0
    ffffffffff600100-ffffffffff6003ff 0 [kernel].vsyscall_fn in kallsyms as:
    *ffffffffff600100-ffffffffff60012f 0 [kernel].2
    ffffffffff600400-ffffffffff6007ff 0 [kernel].vsyscall_1 in kallsyms as [kernel].6
    ffffffffff600800-ffffffffffffffff 0 [kernel].vsyscall_2 in kallsyms as [kernel].8
    Maps only in kallsyms:
    ffffffffff600130-ffffffffff6003ff 0 [kernel].4
    ---- end ----
    vmlinux symtab matches kallsyms: Ok
    [acme@doppio linux-2.6-tip]$

    In the above case we only know the names of the non-contiguous kernel ranges
    in the address space when reading the symbol information from the ELF symtab
    in vmlinux.

    The /proc/kallsyms file lacks this; we only notice they are separate because
    there are modules after the kernel and after that more kernel functions, so we
    need to have a module rbtree backed by the module .ko path to get symtabs in
    the vmlinux case.

    The tool uses it to match by address and emit an appropriate warning, but
    doesn't consider this fatal.

    The .init.text and .exit.text ones, of course, aren't in kallsyms, so I left
    these cases just as extra info in verbose mode.

    The ends of the sections also aren't in kallsyms, so the symbols layer does
    another pass and sets the end addresses as the next map start minus one, which
    sometimes pads, causing harmless mismatches.
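The "next map start minus one" pass can be illustrated with a few lines of Python. This is a sketch, not the symbols-layer code: `fill_map_ends` and its `last_end` default are assumptions made for the example, and the start addresses are borrowed from the listing above.

```python
def fill_map_ends(starts, last_end=0xFFFFFFFFFFFFFFFF):
    """Return (start, end) pairs where each end is the next start minus one."""
    ordered = sorted(starts)
    ranges = []
    for i, start in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1] - 1
        else:
            # Assumption: the last map runs to the top of the address space.
            end = last_end
        ranges.append((start, end))
    return ranges


starts = [
    0xFFFFFFFFFF600000,
    0xFFFFFFFFFF600100,
    0xFFFFFFFFFF600400,
    0xFFFFFFFFFF600800,
]
for start, end in fill_map_ends(starts):
    print(f"{start:016x}-{end:016x}")
```

A map whose real contents stop short of the next start gets padded up to it, which is how the harmless mismatches mentioned above can arise.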

    But at least the symbols match, tested it by copying /proc/kallsyms to
    /tmp/kallsyms and doing changes to see if they were detected.

    This first test also should serve as a first stab at documenting the
    symbol library by providing a self contained example that exercises it
    together with comments about what is being done.

    More tests to check if actions done on a monitored app, like doing mmaps, etc,
    make the kernel generate the expected events should be added next.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

19 Apr, 2010

1 commit


31 Jan, 2010

1 commit


21 Jan, 2010

1 commit

  • For now it just has operations to examine a given file, find its
    build-id and add or remove it to/from the cache.

    Useful, for instance, when adding binaries sent together with a
    perf.data file, so that we can add them to the cache and have
    the tools find it when resolving symbols.

    It'll also manage the size of the cache like 'ccache' does.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

15 Dec, 2009

1 commit

  • I guess it is enough to show some examples:

    [root@doppio linux-2.6-tip]# rm -f perf.data*
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    ls: cannot access perf.data*: No such file or directory
    [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    -rw------- 1 root root 74440 2009-12-14 20:03 perf.data
    [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ]
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    -rw------- 1 root root 74280 2009-12-14 20:03 perf.data
    -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old
    [root@doppio linux-2.6-tip]# perf diff | head -5
    1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 -15307806 [kernel.kallsyms] __kmalloc
    3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc
    5 +7 +38538813 [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -p | head -5
    1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 [kernel.kallsyms] __kmalloc
    3 +1 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 /lib64/libc-2.10.1.so _int_malloc
    5 +7 -1.00% [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -v | head -5
    1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc
    3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc
    5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -vp | head -5
    1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 3.00% 3.00% [kernel.kallsyms] __kmalloc
    3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove
    4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc
    5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]#

    This should be enough for diffs where the system is
    non-volatile, i.e. when one doesn't update binaries.
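The delta column in the `perf diff -v` output above is consistent with "new value minus baseline value" for each symbol. The sketch below reproduces that arithmetic for two of the rows; `diff_profiles` is a hypothetical helper, and treating the two numeric columns as baseline (perf.data.old) and new (perf.data) is an assumption read off the sample output rather than taken from the implementation.

```python
def diff_profiles(baseline, new):
    """Return {symbol: new - baseline} for symbols present in either profile."""
    symbols = set(baseline) | set(new)
    return {s: new.get(s, 0) - baseline.get(s, 0) for s in symbols}


# Values copied from the 'perf diff -v' rows shown above.
baseline = {"_IO_vfprintf_internal": 361449551, "__kmalloc": 151009241}
new = {"_IO_vfprintf_internal": 326454971, "__kmalloc": 135701435}

deltas = diff_profiles(baseline, new)
print(deltas["_IO_vfprintf_internal"])  # -34994580
print(deltas["__kmalloc"])              # -15307806
```

Symbols that appear in only one profile are treated as zero in the other, which is a choice made for this sketch.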

    For volatile environments, stay tuned for the next perf tool
    feature: a buildid cache populated by 'perf record', managed by
    'perf buildid-cache' a-la ccache, and used by all the report
    tools.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: "Paul E. McKenney"
    Cc: Stephen Hemminger
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Paul E. McKenney
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

20 Nov, 2009

1 commit

  • This tool is mostly a perf version of kmemtrace-user.

    The following information is provided by this tool:

    - the total amount of memory allocated and fragmentation per
      call-site

    - the total amount of memory allocated and fragmentation per
      allocation

    - total memory allocated and fragmentation in the collected
      dataset

    - ...

    Sample output:

    # ./perf kmem record
    ^C
    # ./perf kmem --stat caller --stat alloc -l 10

    ------------------------------------------------------------------------------
    Callsite | Total_alloc/Per | Total_req/Per | Hit | Fragmentation
    ------------------------------------------------------------------------------
    0xc052f37a | 790528/4096 | 790528/4096 | 193 | 0.000%
    0xc0541d70 | 524288/4096 | 524288/4096 | 128 | 0.000%
    0xc051cc68 | 481600/200 | 481600/200 | 2408 | 0.000%
    0xc0572623 | 297444/676 | 297440/676 | 440 | 0.001%
    0xc05399f1 | 73476/164 | 73472/164 | 448 | 0.005%
    0xc05243bf | 51456/256 | 51456/256 | 201 | 0.000%
    0xc0730d0e | 31844/497 | 31808/497 | 64 | 0.113%
    0xc0734c4e | 17152/256 | 17152/256 | 67 | 0.000%
    0xc0541a6d | 16384/128 | 16384/128 | 128 | 0.000%
    0xc059c217 | 13120/40 | 13120/40 | 328 | 0.000%
    0xc0501ee6 | 11264/88 | 11264/88 | 128 | 0.000%
    0xc04daef0 | 7504/682 | 7128/648 | 11 | 5.011%
    0xc04e14a3 | 4216/191 | 4216/191 | 22 | 0.000%
    0xc05041ca | 3524/44 | 3520/44 | 80 | 0.114%
    0xc0734fa3 | 2104/701 | 1620/540 | 3 | 23.004%
    0xc05ec9f1 | 2024/289 | 2016/288 | 7 | 0.395%
    0xc06a1999 | 1792/256 | 1792/256 | 7 | 0.000%
    0xc0463b9a | 1584/144 | 1584/144 | 11 | 0.000%
    0xc0541eb0 | 1024/16 | 1024/16 | 64 | 0.000%
    0xc06a19ac | 896/128 | 896/128 | 7 | 0.000%
    0xc05721c0 | 772/12 | 768/12 | 64 | 0.518%
    0xc054d1e6 | 288/57 | 280/56 | 5 | 2.778%
    0xc04b562e | 157/31 | 154/30 | 5 | 1.911%
    0xc04b536f | 80/16 | 80/16 | 5 | 0.000%
    0xc05855a0 | 64/64 | 36/36 | 1 | 43.750%
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
    Alloc Ptr | Total_alloc/Per | Total_req/Per | Hit | Fragmentation
    ------------------------------------------------------------------------------
    0xda884000 | 1052672/4096 | 1052672/4096 | 257 | 0.000%
    0xda886000 | 262144/4096 | 262144/4096 | 64 | 0.000%
    0xf60c7c00 | 16512/128 | 16512/128 | 129 | 0.000%
    0xf59a4118 | 13120/40 | 13120/40 | 328 | 0.000%
    0xdfd4b2c0 | 11264/88 | 11264/88 | 128 | 0.000%
    0xf5274600 | 7680/256 | 7680/256 | 30 | 0.000%
    0xe8395000 | 5948/594 | 5464/546 | 10 | 8.137%
    0xe59c3c00 | 5748/479 | 5712/476 | 12 | 0.626%
    0xf4cd1a80 | 3524/44 | 3520/44 | 80 | 0.114%
    0xe5bd1600 | 2892/482 | 2856/476 | 6 | 1.245%
    ... | ... | ... | ... | ...
    ------------------------------------------------------------------------------

    SUMMARY
    =======
    Total bytes requested: 2333626
    Total bytes allocated: 2353712
    Total bytes wasted on internal fragmentation: 20086
    Internal fragmentation: 0.853375%
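The summary figures are consistent with fragmentation being computed as wasted bytes (allocated minus requested) divided by allocated bytes. A short worked check, with `fragmentation` as a hypothetical helper name:

```python
def fragmentation(allocated, requested):
    """Return (wasted_bytes, fragmentation_percent) for one row or the total."""
    wasted = allocated - requested
    percent = 100.0 * wasted / allocated if allocated else 0.0
    return wasted, percent


# Totals from the SUMMARY section above.
wasted, percent = fragmentation(2353712, 2333626)
print(wasted)             # 20086
print(f"{percent:.6f}%")  # 0.853375%

# One row from the call-site table: 0xc05855a0, 64 allocated vs 36 requested.
print(fragmentation(64, 36))  # (28, 43.75)
```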

    TODO:
    - show sym+offset in 'callsite' column
    - show cross node allocation stats
    - collect more useful stats?
    - ...

    Signed-off-by: Li Zefan
    Acked-by: Pekka Enberg
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Eduard - Gabriel Munteanu
    Cc: linux-mm@kvack.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

17 Nov, 2009

2 commits

  • Resolved merge conflict in tools/perf/Makefile

    Merge reason: we want to queue up a dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • With this we can list the buildids in a perf.data file so that
    we can pipe them to other, distro specific tools that from the
    buildids can figure out separate packages (foo-debuginfo) where
    we can find the matching symtabs so that perf report can do its
    job.

    E.g:

    [acme@doppio linux-2.6-tip]$ perf buildid-list | head -5
    8e08b117e5458ad3f85da16d42d0fc5cd21c5869
    520c2387a587cc5acfcf881e27dba1caaeab4b1f
    ec8dd400904ddfcac8b1c343263a790f977159dc
    7caedbca5a6d8ab39a7fe44bd28c07d3e14a3f3f
    379bb828fd08859dbea73279f04abefabc95a6a3
    [acme@doppio linux-2.6-tip]$ perf buildid-list -v | head -5
    8e08b117e5458ad3f85da16d42d0fc5cd21c5869 /sbin/init
    520c2387a587cc5acfcf881e27dba1caaeab4b1f /lib64/ld-2.10.1.so
    ec8dd400904ddfcac8b1c343263a790f977159dc /lib64/libc-2.10.1.so
    7caedbca5a6d8ab39a7fe44bd28c07d3e14a3f3f /sbin/udevd
    379bb828fd08859dbea73279f04abefabc95a6a3 /lib64/libdl-2.10.1.so
    [acme@doppio linux-2.6-tip]$
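A downstream tool that consumes this output could start by turning the `-v` listing into a lookup table. The following is a sketch assuming the two-column "build-id path" layout shown above; `parse_buildid_list` is a hypothetical helper, and the step that maps a build-id to a debuginfo package is distro specific and left out.

```python
def parse_buildid_list(text):
    """Map each build-id to its path from 'perf buildid-list -v' style output."""
    mapping = {}
    for line in text.splitlines():
        # Split on the first run of whitespace so paths with spaces survive.
        parts = line.split(None, 1)
        if len(parts) == 2:
            build_id, path = parts
            mapping[build_id] = path.strip()
    return mapping


listing = """\
8e08b117e5458ad3f85da16d42d0fc5cd21c5869 /sbin/init
520c2387a587cc5acfcf881e27dba1caaeab4b1f /lib64/ld-2.10.1.so
"""
table = parse_buildid_list(listing)
print(table["8e08b117e5458ad3f85da16d42d0fc5cd21c5869"])  # /sbin/init
```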

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

08 Nov, 2009

1 commit


13 Oct, 2009

1 commit

  • Add perf probe subcommand that implements a kprobe-event setup helper
    to the perf command.
    This allows user to define kprobe events using C expressions (C line
    numbers, C function names, and C local variables).

    Usage
    -----
    perf probe [<options>] -P 'PROBEDEF' [-P 'PROBEDEF' ...]

    -k, --vmlinux   vmlinux/module pathname
    -P, --probe     probe point definition, where
        p:    kprobe probe
        r:    kretprobe probe
        GRP:  Group name (optional)
        NAME: Event name
        FUNC: Function name
        OFFS: Offset from function entry (in bytes)
        SRC:  Source code path
        LINE: Line number
        ARG:  Probe argument (local variable name or
              kprobe-tracer argument format is supported.)

    Changes in v4:
    - Add _GNU_SOURCE macro for strndup().

    Changes in v3:
    - Remove -r option because perf is always used on the online kernel.
    - Check malloc/calloc results.

    Changes in v2:
    - Check synthesized string length.
    - Rename perf kprobe to perf probe.
    - Use spaces for separator and update usage comment.
    - Check error paths in parse_probepoint().
    - Check optimized-out variables.

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Christoph Hellwig
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Frank Ch. Eigler
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Masami Hiramatsu
     

19 Sep, 2009

1 commit

  • timechart is a tool to visualize what is going on in the system.

    The user makes a trace of what is going on with

    > perf record --timechart /usr/bin/some_command

    and then can turn the output of this into an svg file

    > perf timechart

    which then can be viewed with any SVG viewer; inkscape works well
    enough for me.

    The idea behind timechart is to create an "infinitely zoomable"
    picture; something that has high level information on a 1:1 zoom
    level, but which exposes more details every time you zoom into a
    specific area.

    Signed-off-by: Arjan van de Ven
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

13 Sep, 2009

1 commit

  • This turn-key tool allows scheduler measurements to be
    conducted and the results to be displayed numerically.

    First baby step towards that goal: clone the new command off of
    perf trace.

    Fix a few other details along the way:

    - add (minimal) perf trace documentation

    - reorder a few places

    - list perf trace in the mainporcelain list as well
    as it's a very useful utility.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

12 Aug, 2009

1 commit

  • Factorize multiple definitions of high level dso helpers into the
    symbol source file.

    The side effect is a general export of the verbose and eprintf
    debugging helpers into a new file dedicated to debugging purposes.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Brice Goglin

    Frederic Weisbecker
     

07 Jun, 2009

1 commit