23 Jul, 2009

10 commits

  • …nel/git/peterz/linux-2.6-perf

    * 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits)
    perf_counter tools: Give perf top inherit option
    perf_counter tools: Fix vmlinux symbol generation breakage
    perf_counter: Detect debugfs location
    perf_counter: Add tracepoint support to perf list, perf stat
    perf symbol: C++ demangling
    perf: avoid structure size confusion by using a fixed size
    perf_counter: Fix throttle/unthrottle event logging
    perf_counter: Improve perf stat and perf record option parsing
    perf_counter: PERF_SAMPLE_ID and inherited counters
    perf_counter: Plug more stack leaks
    perf: Fix stack data leak
    perf_counter: Remove unused variables
    perf_counter: Make call graph option consistent
    perf_counter: Add perf record option to log addresses
    perf_counter: Log vfork as a fork event
    perf_counter: Synthesize VDSO mmap event
    perf_counter: Make sure we dont leak kernel memory to userspace
    perf_counter tools: Fix index boundary check
    perf_counter: Fix the tracepoint channel to perfcounters
    perf_counter, x86: Extend perf_counter Pentium M support
    ...

    Linus Torvalds
     
  • Currently, perf top -p only tracks the pid provided, which isn't very useful
    for watching forky loads, so give it an inherit option.

    Signed-off-by: Mike Galbraith
    Cc: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Mike Galbraith
     
  • vmlinux meets the criteria for symbol adjustment, which breaks vmlinux generated symbols.
    Fix this by exempting vmlinux. This is a bit fragile in that someone could change the
    kernel dso's name, but currently that name is also hardwired.

    Signed-off-by: Mike Galbraith
    Cc: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Mike Galbraith
     
  • If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs
    filesystem in /proc/mounts. Also allow the user to specify
    '--debugfs-dir=blah' or to set the environment variable 'PERF_DEBUGFS_DIR'.
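
    As a rough illustration of the /proc/mounts search, the sketch below
    scans mount-table lines for a debugfs entry. The function names are
    hypothetical and not taken from the patch; it only assumes the usual
    "device mountpoint fstype ..." line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/mounts-style line ("device mountpoint fstype options ...").
 * Copies the mount point into `out` and returns 0 if the filesystem type is
 * debugfs, otherwise returns -1. Hypothetical helper, for illustration only.
 */
int debugfs_dir_from_line(const char *line, char *out, size_t outlen)
{
	char dev[128], dir[256], type[64];

	if (sscanf(line, "%127s %255s %63s", dev, dir, type) != 3)
		return -1;
	if (strcmp(type, "debugfs") != 0)
		return -1;
	snprintf(out, outlen, "%s", dir);
	return 0;
}

/* Scan an open mount table and report the first debugfs mount point. */
int find_debugfs(FILE *mounts, char *out, size_t outlen)
{
	char line[1024];

	while (fgets(line, sizeof(line), mounts))
		if (debugfs_dir_from_line(line, out, outlen) == 0)
			return 0;
	return -1;
}
```

    A caller would typically try the command-line option and the
    environment variable first and fall back to find_debugfs() on
    fopen("/proc/mounts", "r") only when neither is set.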

    Signed-off-by: Jason Baron
    [ also made it probe "/debug" by default ]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Jason Baron
     
  • Add support to 'perf list' and 'perf stat' for kernel tracepoints. The
    implementation creates a 'for_each_subsystem' and 'for_each_event' for
    easy iteration over the tracepoints.

    Signed-off-by: Jason Baron
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Jason Baron
     
  • [acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
    2.21% [.] nsDeque::Push(void*)
    1.78% [.] GraphWalker::DoWalk(nsDeque&)
    1.30% [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
    1.27% [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
    1.18% [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
    1.13% [.] nsDeque::PopFront()
    1.11% [.] nsGlobalWindow::RunTimeout(nsTimeout*)
    0.97% [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
    0.95% [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
    0.95% [.] nsCOMPtr_base::~nsCOMPtr_base()
    [acme@doppio ~]$

    Cc: Pekka Enberg
    Cc: Vegard Nossum
    Cc: Paul Mackerras
    Cc: Frédéric Weisbecker
    Suggested-by: Clark Williams
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Arnaldo Carvalho de Melo
     
  • For some reason, this structure gets compiled as 36 bytes in some files
    (the ones that allocate it) but 40 bytes in others (the ones that use it).
    The cause is an off_t type that gets a different size in different
    compilation units for some yet-to-be-explained reason.

    But the effect is disastrous; the size/offset members of the struct
    are at different offsets, and result in mostly complete garbage.
    The parser in perf is so robust that this all gets hidden, and after
    skipping a certain number of samples, it recovers.... so this bug
    is not normally noticed.

    .... except when you want every sample to be exact.

    Fix this by just using an explicitly sized type.
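
    A minimal sketch of the idea, using a hypothetical header layout (the
    struct and field names below are illustrative, not the actual perf
    definitions): with fixed-width types the layout no longer depends on
    how off_t happens to be configured in each compilation unit, and a
    compile-time check catches any disagreement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical file section descriptor; names are illustrative only. */
struct file_section {
	uint64_t offset;	/* always 8 bytes, unlike off_t */
	uint64_t size;
};

/* Hypothetical header embedding two sections. */
struct file_header {
	uint32_t magic;
	uint32_t version;
	struct file_section attrs;
	struct file_section data;
};

/* C11 compile-time guards: every compilation unit must agree on these. */
_Static_assert(sizeof(struct file_section) == 16, "section layout changed");
_Static_assert(offsetof(struct file_header, attrs) == 8, "attrs moved");
_Static_assert(offsetof(struct file_header, data) == 24, "data moved");
_Static_assert(sizeof(struct file_header) == 40, "header layout changed");
```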

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Arjan van de Ven
     
  • perf stat and perf record currently look for all options on the command
    line. This can lead to some confusion:

    # perf stat ls -l
    Error: unknown switch `l'

    While we can work around this by adding '--' before the command, the git
    option parsing code can stop at the first non option:

    # perf stat ls -l
    Performance counter stats for 'ls -l':
    ....
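
    The behaviour can be sketched as follows. This is a hand-rolled
    illustration, not the git option parser itself; first_non_option() is
    a hypothetical helper, and it assumes for simplicity that no option
    takes a separate value argument.

```c
#include <string.h>

/*
 * Return the index of the first argument that is not an option, i.e. where
 * the command to be measured starts. argv[0] is the subcommand name.
 * Everything from the returned index on is left untouched.
 */
int first_non_option(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			return i + 1;	/* explicit end of options */
		if (argv[i][0] != '-' || argv[i][1] == '\0')
			return i;	/* first non-option: stop parsing */
	}
	return argc;
}
```

    With this rule "stat ls -l" stops at "ls", so "-l" is passed to ls
    instead of being rejected as an unknown perf switch.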

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Anton Blanchard
     
  • Anton noted that for inherited counters the counter-id as provided by
    PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
    because each inherited counter gets its own id.

    His suggestion was to always return the parent counter id, since that
    is the primary counter id as exposed. However, these inherited
    counters have a unique identifier so that events like
    PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
    counter gets modified, which is important when trying to normalize the
    sample streams.

    This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
    which is more useful anyway, since changing periods became a lot more
    common than initially thought -- rendering PERF_EVENT_PERIOD the less
    useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
    value, since it reports the value used to trigger the overflow,
    whereas PERF_EVENT_PERIOD simply reports the requested period changed,
    which might only take effect on the next cycle).

    This still leaves us PERF_EVENT_THROTTLE to consider, but since that
    _should_ be a rare occurrence, and linking it to a primary id is the
    most useful bit to diagnose the problem, we introduce a
    PERF_SAMPLE_STREAM_ID, for those few cases where the full
    reconstruction is important.

    [Does change the ABI a little, but I see no other way out]

    Suggested-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Peter Zijlstra
     
  • Peter Zijlstra
     

18 Jul, 2009

3 commits


13 Jul, 2009

1 commit


12 Jul, 2009

5 commits

  • [acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17
    21.94% 32101 [.] _int_malloc
    20.10% 29402 [.] __GI_strcmp
    16.77% 24533 [.] __tsearch
    12.61% 18450 [.] malloc_consolidate
    6.42% 9394 [.] _int_free
    6.28% 9191 [.] __tfind
    4.56% 6678 [.] __GI___libc_free
    4.46% 6520 [.] _IO_vfprintf_internal
    2.59% 3786 [.] __malloc
    1.17% 1716 [.] __GI_memcpy
    [acme@doppio pahole]$

    Signed-off-by: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • So we need to get the richer .symtab from the debuginfo
    packages but the PLT info from the original DSO where we have
    just the leaner .dynsym symtab.

    Example:

    | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > before
    | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > after
    | [acme@doppio pahole]$ diff -U1 before after
    | --- before 2009-07-11 11:04:22.688595741 -0300
    | +++ after 2009-07-11 11:04:33.380595676 -0300
    | @@ -80,3 +80,2 @@
    | 0.07% pahole ./build/pahole [.] pahole_stealer
    | - 0.06% pahole /usr/lib64/libdw-0.141.so [.] 0x00000000007140
    | 0.06% pahole /usr/lib64/libdw-0.141.so [.] __libdw_getabbrev
    | @@ -91,2 +90,3 @@
    | 0.06% pahole [kernel] [k] free_hot_cold_page
    | + 0.06% pahole /usr/lib64/libdw-0.141.so [.] tfind@plt
    | 0.05% pahole ./build/libdwarves.so.1.0.0 [.] ftype__add_parameter
    | @@ -242,2 +242,3 @@
    | 0.01% pahole [kernel] [k] account_group_user_time
    | + 0.01% pahole /usr/lib64/libdw-0.141.so [.] strlen@plt
    | 0.01% pahole ./build/pahole [.] strcmp@plt
    | [acme@doppio pahole]$

    Signed-off-by: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • When we filter by column content we may end up with a column
    that has the same value for all the lines. So remove that
    column and print its unique value at the top, as a comment.

    Example:

    [acme@doppio pahole]$ perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15
    # dso: ./build/libdwarves.so.1.0.0
    # comm: pahole
    # Samples: 58409
    #
    # Overhead Symbol
    # ........ ......
    #
    20.93% [.] tag__recode_dwarf_type
    14.94% [.] namespace__recode_dwarf_types
    10.38% [.] cu__table_add_tag
    6.69% [.] __die__process_tag
    5.05% [.] die__process_function
    4.70% [.] list__for_all_tags
    3.68% [.] tag__init
    3.48% [.] die__create_new_parameter
    [acme@doppio pahole]$

    Signed-off-by: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • The strlist__entry method allows accessing strlists like an
    array; it will be used in 'perf report' to access the first
    entry.

    We now keep nr_entries so that we can check whether we have just
    one entry; this will be used in 'perf report' to improve the output
    by showing a value only once, at the top, when we have just, say,
    one DSO.

    While at it, use nr_entries to optimize strlist__is_empty by not
    using the far more costly rb_first based implementation.
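
    The counting idea can be sketched with a simplified stand-in. The
    code below uses a singly linked list instead of the rbtree, and all
    names are hypothetical; it only shows how a maintained counter makes
    the emptiness check and bounds-checked indexed access cheap to write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in: a singly linked list instead of an rbtree. */
struct str_node {
	const char *s;
	struct str_node *next;
};

struct strlist {
	struct str_node *first;
	unsigned int nr_entries;	/* maintained on every add */
};

/* Append a node at the tail and keep the counter up to date. */
void strlist_add(struct strlist *l, struct str_node *n)
{
	struct str_node **p = &l->first;

	while (*p)
		p = &(*p)->next;
	n->next = NULL;
	*p = n;
	l->nr_entries++;
}

/* O(1): no need to look at the container itself. */
bool strlist_is_empty(const struct strlist *l)
{
	return l->nr_entries == 0;
}

/* Array-like access; returns NULL when idx is out of range. */
const char *strlist_entry(const struct strlist *l, unsigned int idx)
{
	const struct str_node *n;

	if (idx >= l->nr_entries)
		return NULL;
	for (n = l->first; idx > 0; idx--)
		n = n->next;
	return n->s;
}
```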

    Signed-off-by: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Always print the level info, i.e. whether the sample is in the kernel,
    hypervisor or userspace, as that is in the hist_entry.

    Signed-off-by: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

11 Jul, 2009

2 commits

  • Auto-adjust column width of perf report output to the
    longest occurring string length.

    Example:

    [acme@doppio pahole]$ perf report --sort comm,dso,symbol | head -13

    12.79% pahole /usr/lib64/libdw-0.141.so [.] __libdw_find_attr
    8.90% pahole /lib64/libc-2.10.1.so [.] _int_malloc
    8.68% pahole /usr/lib64/libdw-0.141.so [.] __libdw_form_val_len
    8.15% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp
    6.80% pahole /lib64/libc-2.10.1.so [.] __tsearch
    5.54% pahole ./build/libdwarves.so.1.0.0 [.] tag__recode_dwarf_type
    [acme@doppio pahole]$

    [acme@doppio pahole]$ perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10

    21.92% pahole /lib64/libc-2.10.1.so [.] _int_malloc
    20.08% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp
    16.75% pahole /lib64/libc-2.10.1.so [.] __tsearch
    [acme@doppio pahole]$

    Also add these extra options to control the new behaviour:

    -w, --field-width

    Force each column width to the provided list, for large terminal
    readability.

    -t, --field-separator:

    Use a special separator character and don't pad with spaces, replacing
    all occurrences of this separator in symbol names (and other output) with
    a '.' character, so that '.' is the only invalid separator.
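
    Both behaviours are simple to sketch. The helpers below are
    hypothetical illustrations rather than the functions added by the
    patch: one derives a column width from the longest entry, the other
    replaces the separator inside a field with '.'.

```c
#include <stddef.h>
#include <string.h>

/* Width of a column = length of its longest entry. */
size_t column_width(const char *entries[], size_t n)
{
	size_t i, len, width = 0;

	for (i = 0; i < n; i++) {
		len = strlen(entries[i]);
		if (len > width)
			width = len;
	}
	return width;
}

/* In separator mode, replace the separator inside a field with '.'. */
void sanitize_field(char *field, char sep)
{
	for (; *field; field++)
		if (*field == sep)
			*field = '.';
}
```

    The computed width can then be passed to printf() as a "%-*s" field
    width so that every row is padded to the same column.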

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf report: Add "Fractal" mode output - support callchains with relative overhead rate
    perf_counter tools: callchains: Manage the cumul hits on the fly
    perf report: Change default callchain parameters
    perf report: Use a modifiable string for default callchain options
    perf report: Warn on callchain output request from non-callchain file
    x86: atomic64: Inline atomic64_read() again
    x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
    x86: atomic64: Improve atomic64_xchg()
    x86: atomic64: Export APIs to modules
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
    x86: atomic64: Fix unclean type use in atomic64_xchg()
    x86: atomic64: Make atomic_read() type-safe
    x86: atomic64: Reduce size of functions
    x86: atomic64: Improve atomic64_add_return()
    x86: atomic64: Improve cmpxchg8b()
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
    x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
    perf report: Annotate variable initialization
    ...

    Linus Torvalds
     

10 Jul, 2009

2 commits

  • Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
    enable/disable both its counters. We use this for the
    global enable/disable, and clear all config bits (except EN)
    to disable individual counters.

    Actual ia32 hardware doesn't support lfence, so use a locked
    op without side-effect to implement a full barrier.

    perf stat and perf record seem to function correctly.

    [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • The cache events contain '$' which will hit shell variable
    expansion. To avoid confusion change this to 'cache', ie
    L1-d$-loads becomes L1-dcache-loads.

    Signed-off-by: Anton Blanchard
    Cc: Roland Dreier
    Cc: Jaswinder Singh Rajput
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

05 Jul, 2009

6 commits

  • The current callchain displays the overhead rates as absolute:
    relative to the total overhead.

    This patch provides relative overhead percentage, in which each
    branch of the callchain tree is an independent instrumented object.

    This provides a 'fractal' view of the call-chain profile: each
    sub-graph looks like a profile in itself - relative to its parent.

    You can produce such output by using the "fractal" mode
    that you can abbreviate via f, fr, fra, frac, etc...

    ./perf report -s sym -c fractal

    Example:

    8.46% [k] copy_user_generic_string
    |
    |--52.01%-- generic_file_aio_read
    |          do_sync_read
    |          vfs_read
    |          |
    |          |--97.20%-- sys_pread64
    |          |          system_call_fastpath
    |          |          pread64
    |          |
    |           --2.81%-- sys_read
    |                     system_call_fastpath
    |                     __read
    |
    |--39.85%-- generic_file_buffered_write
    |          __generic_file_aio_write_nolock
    |          generic_file_aio_write
    |          do_sync_write
    |          reiserfs_file_write
    |          vfs_write
    |          |
    |          |--97.05%-- sys_pwrite64
    |          |          system_call_fastpath
    |          |          __pwrite64
    |          |
    |           --2.95%-- sys_write
    |                     system_call_fastpath
    |                     __write_nocancel
    [...]
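
    The difference between the two rates can be sketched as below. The
    struct and function names are hypothetical; the only assumption is
    that each branch knows its parent and its cumulative hit count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified callchain branch. */
struct branch {
	struct branch *parent;	/* NULL for the root of a profile entry */
	uint64_t cumul_hits;	/* own hits plus hits of all descendants */
};

/* Absolute rate: branch weight against the total number of samples. */
double percent_absolute(const struct branch *b, uint64_t total_hits)
{
	if (!total_hits)
		return 0.0;
	return 100.0 * (double)b->cumul_hits / (double)total_hits;
}

/* Fractal rate: branch weight against its parent's cumulative hits. */
double percent_relative(const struct branch *b)
{
	if (!b->parent || !b->parent->cumul_hits)
		return 100.0;
	return 100.0 * (double)b->cumul_hits / (double)b->parent->cumul_hits;
}
```

    In the relative view each sub-graph reads like its own profile: a
    branch with 520 of its parent's 1000 cumulative hits is shown as 52%
    no matter how small the parent is compared to the whole run.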

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The cumul hits are the number of hits of all the children of a node
    plus the hits of the current node, required for percentage
    computing of a branch.

    These numbers are calculated during the sorting of the branches of
    the callchain tree using a depth first postfix traversal, so that
    cumulative hits are propagated in the right order.

    But if we plan to implement percentages relative to the parent and not
    absolute percentages (relative to the whole overhead), we need to know
    the cumulative hits of the parent before computing the children
    because the relative minimum acceptable number of entries (ie: minimum
    rate against the cumulative hits from the parent) is the basis to
    filter the children against a given rate.

    So we need to handle the cumul hits on the fly to prepare the
    implementation of relative overhead rates.
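
    One way to keep the cumulative counts current while samples are being
    added is sketched below. It is an illustration under the assumption
    that each node has a parent pointer; the names are hypothetical and
    the patch itself may maintain the counts differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified callchain node. */
struct chain_node {
	struct chain_node *parent;
	uint64_t hits;		/* samples ending exactly at this node */
	uint64_t cumul_hits;	/* hits of this node plus all descendants */
};

/*
 * Record one sample at `leaf` and update the cumulative count of the node
 * and of every ancestor, so the parent's total is already known when its
 * children are later filtered against it.
 */
void chain_node_hit(struct chain_node *leaf)
{
	struct chain_node *n;

	leaf->hits++;
	for (n = leaf; n; n = n->parent)
		n->cumul_hits++;
}
```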

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The default callchain parameters are set to use the flat mode and never
    filter backtraces by an overhead threshold.

    But flat mode is boring compared to graph mode.
    Also the number of callchains may be very high if none is
    filtered.

    Let's change this to set the graph view and a minimum overhead of 0.5%
    as default parameters.

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • If the user doesn't provide options to tune his callchain output
    (ie: if he uses -c without arguments) then the default value passed
    in the OPT_CALLBACK_DEFAULT() macro is used.

    But it's parsed later by strtok() which will replace comma separators
    with a zero byte. This may segfault as we are using a read-only string.

    Use a modifiable one instead, and also fix the "100%" default
    minimum threshold value by turning it into a 0 (output every callchain)
    as was originally intended.
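
    The pitfall and the fix can be sketched as follows. The variable and
    function names are hypothetical, and "flat,0" is only an example
    value; the point is that strtok() writes into its argument, so the
    default must live in a writable array rather than a string literal.

```c
#include <stdlib.h>
#include <string.h>

/* A char array is writable; a pointer to a string literal is not. */
char default_callchain_opts[] = "flat,0";

/*
 * Split "mode,min_percent" in place. strtok() overwrites each ',' with
 * '\0', so `opts` must point to writable storage.
 */
int split_opts(char *opts, const char **mode, double *min_percent)
{
	char *tok = strtok(opts, ",");

	if (!tok)
		return -1;
	*mode = tok;

	/* The threshold is optional and defaults to 0 (show everything). */
	tok = strtok(NULL, ",");
	*min_percent = tok ? strtod(tok, NULL) : 0.0;
	return 0;
}
```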

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • perf report segfaults while trying to handle callchains from a non
    callchain data file.

    Instead of a segfault, print a useful message to the user.

    Reported-by: Jens Axboe
    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (27 commits)
    parisc: use generic atomic64 on 32-bit
    parisc: superio: fix build breakage
    parisc: Fix PCI resource allocation on non-PAT SBA machines
    parisc: perf: wire up sys_perf_counter_open
    parisc: add task_pt_regs macro
    parisc: wire sys_perf_counter_open to sys_ni_syscall
    parisc: inventory.c, fix bloated stack frame
    parisc: processor.c, fix bloated stack frame
    parisc: fix compile warning in mm/init.c
    parisc: remove dead code from sys_parisc32.c
    parisc: wire up rt_tgsigqueueinfo
    parisc: ensure broadcast tlb purge runs single threaded
    parisc: fix "delay!" timer handling
    parisc: fix mismatched parenthesis in memcpy.c
    parisc: Fix gcc 4.4 warning in lba_pci.c
    parisc: add parameter to read_cr16()
    parisc: decode_exc.c should include kernel.h
    parisc: remove obsolete hw_interrupt_type
    parisc: fix irq compile bugs in arch/parisc/kernel/irq.c
    parisc: advertise PCI devs after "assign_resources"
    ...

    Manually fixed up trivial conflicts in tools/perf/perf.h due to addition
    of SH vs HPPA perf-counter support.

    Linus Torvalds
     

03 Jul, 2009

9 commits

  • Certain versions of GCC don't see the initialization that is done here:

    builtin-report.c: In function ‘__cmd_report’:
    builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function

    So annotate it with a NULL initialization.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Ingo Molnar wrote:

    > i just bisected a 'perf report' bug that would cause us to not
    > resolve all user-space symbols in a 'git gc' run to:
    >
    > f5812a7a336fb952d819e4427b9a2dce02368e82 is first bad commit
    > commit f5812a7a336fb952d819e4427b9a2dce02368e82
    > Author: Arnaldo Carvalho de Melo
    > Date: Tue Jun 30 11:43:17 2009 -0300
    >
    > perf_counter tools: Adjust only prelinked symbol's addresses

    Rename ->prelinked to ->adjust_symbols and apply what was done
    only for prelinked libraries to ET_EXEC binaries as well, such as
    /usr/bin/git:

    [acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
    Type: EXEC (Executable file)
    [acme@doppio pahole]$

    And after installing the 'git-debuginfo' package, I get correct results:

    [acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20

    #
    # (1139614 samples)
    #
    # Overhead Command Shared Object Symbol
    # ........ ................ ......................... ......
    #
    34.98% git /usr/bin/git [.] send_sideband
    33.39% git /usr/bin/git [.] enter_repo
    6.81% git /usr/bin/git [.] diff_opt_parse
    4.95% git /usr/bin/git [.] is_repository_shallow
    3.24% git /usr/bin/git [.] odb_mkstemp
    1.39% git /usr/bin/git [.] output
    1.34% git /usr/bin/git [.] xmmap
    1.25% git /usr/bin/git [.] receive_pack_config
    1.16% git /usr/bin/git [.] git_pathdup
    0.90% git /usr/bin/git [.] read_object_with_reference
    0.86% git /usr/bin/git [.] show_patch_diff
    0.85% git /usr/bin/git 0x00000000095e2e
    0.69% git /usr/bin/git [.] display
    [acme@doppio linux-2.6-tip]$

    I'll check later what the remaining cases are where we can't resolve
    symbols, like this 0x00000000095e2e.

    And I guess this will fix the problems Mike was seeing too:

    [acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
    Type: EXEC (Executable file)
    [acme@doppio linux-2.6-tip]$

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Kyle McMartin

    Kyle McMartin
     
  • This adds the use of colors to signal at a glance the important
    overhead thresholds in callchains hit rates.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Among perf annotate, perf report and perf top, we can find the
    common colored printing of percents according to the following
    rules:

    High overhead = > 5%, colored in red
    Mid overhead = > 0.5%, colored in green
    Low overhead = < 0.5%, default color

    Factorize these multiple checks in a single function named
    percent_color_fprintf() and also provide a get_percent_color()
    for sites which print percentages and other things at the same
    time.
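
    A minimal sketch of such a pair of helpers is shown below. The names
    are deliberately different from the real ones and the escape
    sequences are plain ANSI codes chosen for the example; treating the
    exact boundary values (5% and 0.5%) as inclusive is an assumption.

```c
#include <stdio.h>

#define COLOR_RED	"\033[31m"
#define COLOR_GREEN	"\033[32m"
#define COLOR_NORMAL	""
#define COLOR_RESET	"\033[m"

#define MIN_RED		5.0	/* high overhead */
#define MIN_GREEN	0.5	/* mid overhead */

/* Pick the color for a percentage; usable when other text is printed too. */
const char *pick_percent_color(double percent)
{
	if (percent >= MIN_RED)
		return COLOR_RED;
	if (percent >= MIN_GREEN)
		return COLOR_GREEN;
	return COLOR_NORMAL;
}

/* Print a percentage in its color, resetting the terminal afterwards. */
int print_percent(FILE *fp, double percent)
{
	const char *color = pick_percent_color(percent);

	return fprintf(fp, "%s%6.2f%%%s", color, percent,
		       *color ? COLOR_RESET : "");
}
```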

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Callchain output may become a burden on a trace because even
    rarely hit sites are exposed. This can be too much information.

    Let the user set a threshold as a minimum percent of hits using
    the new pattern for the -c option:

    -c mode,min_percent

    Example:

    $ perf report -s sym -c flat,4

    8.25% [k] copy_user_generic_string
        4.19%
            copy_user_generic_string
            generic_file_aio_read
            do_sync_read
            vfs_read
            sys_pread64
            system_call_fastpath
            pread64

    5.39% [k] search_by_key
    4.63% 0x00000000009e0a
    2.36% [k] memcpy_c
    [...]

    $ perf report -s sym -c graph,2

    8.25% [k] copy_user_generic_string
    |
    |--4.31%-- generic_file_aio_read
    |          do_sync_read
    |          vfs_read
    |          |
    |           --4.19%-- sys_pread64
    |                     system_call_fastpath
    |                     pread64
    |
     --3.24%-- generic_file_buffered_write
               __generic_file_aio_write_nolock
               generic_file_aio_write
               do_sync_write
               reiserfs_file_write
               vfs_write
               |
                --3.14%-- sys_pwrite64
                          system_call_fastpath
                          __pwrite64

    5.39% [k] search_by_key
    |
     --2.23%-- reiserfs_update_sd_size

    4.63% 0x00000000009e0a

    2.36% [k] memcpy_c
    [...]

    You can also omit it and it will default to 0.
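
    The filtering itself can be sketched in a few lines. The helpers are
    hypothetical; they assume the percentage is converted once into a
    minimum hit count and that each branch is then compared against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the user-supplied percentage into a hit count once per report. */
uint64_t min_hits_for(uint64_t total_hits, double min_percent)
{
	return (uint64_t)((double)total_hits * min_percent / 100.0);
}

/* A branch is printed only if it reaches the threshold. */
bool branch_is_shown(uint64_t branch_hits, uint64_t min_hits)
{
	return branch_hits >= min_hits;
}
```

    With a threshold of 0 the minimum hit count is 0 and every branch is
    shown, which matches the default described above.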

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Currently, the printing of callchains is done in a single
    vertical level, this is the "flat" mode:

    8.25% [k] copy_user_generic_string
        4.19%
            copy_user_generic_string
            generic_file_aio_read
            do_sync_read
            vfs_read
            sys_pread64
            system_call_fastpath
            pread64

    This patch introduces a new "graph" mode which provides a
    hierarchical output of factorized paths recursively sorted:

    8.25% [k] copy_user_generic_string
    |
    |--4.31%-- generic_file_aio_read
    |          do_sync_read
    |          vfs_read
    |          |
    |          |--4.19%-- sys_pread64
    |          |          system_call_fastpath
    |          |          pread64
    |          |
    |           --0.12%-- sys_read
    |                     system_call_fastpath
    |                     __read
    |
    |--3.24%-- generic_file_buffered_write
    |          __generic_file_aio_write_nolock
    |          generic_file_aio_write
    |          do_sync_write
    |          reiserfs_file_write
    |          vfs_write
    |          |
    |          |--3.14%-- sys_pwrite64
    |          |          system_call_fastpath
    |          |          __pwrite64
    |          |
    |           --0.10%-- sys_write
    [...]

    The command line has then changed.

    By providing the -c option, the callchain will output in the
    flat mode by default.

    But you can override it:

    perf report -c graph

    or

    perf report -c flat

    You can also pass the abbreviated mode:

    perf report -c g

    or

    perf report -c gra

    will both make use of the graph mode.
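
    Prefix matching of this kind is commonly done with strncmp() limited
    to the length of what the user typed. The sketch below is an
    illustration with hypothetical names, not the code from the patch.

```c
#include <string.h>

enum output_mode { MODE_UNKNOWN, MODE_FLAT, MODE_GRAPH };

/* Accept any non-empty prefix of "graph" or "flat". */
enum output_mode parse_mode(const char *arg)
{
	size_t len = strlen(arg);

	if (len == 0)
		return MODE_UNKNOWN;
	if (strncmp("graph", arg, len) == 0)
		return MODE_GRAPH;
	if (strncmp("flat", arg, len) == 0)
		return MODE_FLAT;
	return MODE_UNKNOWN;
}
```

    Because the comparison length comes from the argument, "g" and "gra"
    both match "graph", while a longer string such as "graphs" does not.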

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • There is no predefined macro to create an option that can have
    a custom value or a default one if none is given.

    This patch provides a new helper OPT_CALLBACK_DEFAULT() which
    defines such kind of option.

    For example, considering an option -c, we want to get the
    default value in the following cases:

    perf command -c -d
    perf command -d -c

    And the foo value when it's given:

    perf command -c foo -d
    perf command -d -c foo

    That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
    support default values whatever the position of the option, not
    only in the end.
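
    The intended semantics can be sketched independently of the option
    parser. optional_arg() below is a hypothetical helper, not part of
    the parse-options code: it takes the following word as the value
    unless that word is missing or is itself an option.

```c
#include <stddef.h>

/*
 * Return the value for the option at argv[i], which takes an optional
 * argument: the next word if it exists and is not itself an option,
 * otherwise `defval`. `*consumed` tells the caller how many argv slots
 * were used (1 or 2).
 */
const char *optional_arg(int argc, char **argv, int i,
			 const char *defval, int *consumed)
{
	if (i + 1 < argc && argv[i + 1][0] != '-') {
		*consumed = 2;
		return argv[i + 1];
	}
	*consumed = 1;
	return defval;
}
```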

    Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    Cc: git@vger.kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Iterating through children of a node in the callchain tree
    shows something that may be quite confusing at a first glance.
    The head is the children field of the parent and the list nodes
    are in the brothers field of the children.

    This is because the children are linked to the parent as a list
    of "brothers" using the "children" list of the parent as a
    head:

     ---------------
    | Parent (head) |-------------------------------------
     ---------------                                     |
          |                                              |
       children                                          |
          |                                              |
     -----------               -----------               |
    | 1st child |---brother---| 2nd child |---brother-----
     -----------               -----------

    This makes the following strange pattern occur often:

    list_for_each_entry(child, &parent->children, brothers) {
            // do something with children
    }

    Abstract it to chain_for_each_child() to factorize and simplify
    this pattern.
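
    A self-contained sketch of the wrapper follows. The list helpers are
    simplified stand-ins for the kernel-style list.h (the iteration macro
    relies on the GCC/Clang __typeof__ extension), and the callchain_node
    layout is reduced to the fields needed for the example.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified list_for_each_entry(); uses the __typeof__ extension. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Reduced node: only the two list fields and a payload. */
struct callchain_node {
	struct list_head brothers;	/* link in the parent's children list */
	struct list_head children;	/* head of this node's own children */
	int hits;
};

/* Hide the confusing "children head, brothers link" pairing. */
#define chain_for_each_child(child, parent) \
	list_for_each_entry(child, &(parent)->children, brothers)

void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Example user: sum the hits of all direct children. */
int sum_children_hits(struct callchain_node *parent)
{
	struct callchain_node *child;
	int sum = 0;

	chain_for_each_child(child, parent)
		sum += child->hits;
	return sum;
}
```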

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

02 Jul, 2009

2 commits