23 Mar, 2012

1 commit

  • The perf diff command is broken since:
    perf hists: Threaded addition and sorting of entries
    commit 1980c2ebd7020d82c024b8c4046849b38e78e7da

    Several places were broken:
    - hists data need to be collected into opened sessions instead
    of into events
    - session's hists data need to be initialized properly when the
    session is created
    - hist_entry__pcnt_snprintf: the percentage and displacement
    buffer preparation must not use 'ret' because it's used
    as a pointer to the final buffer

    Signed-off-by: Jiri Olsa
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120322133726.GB1601@m.brq.redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

28 Nov, 2011

3 commits

  • To better reflect that it became the base class for all tools, that must
    be in each tool struct and where common stuff will be put.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Reducing the exposure of perf_session further, so that we can use the
    classes in cases where no perf.data file is created.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that we don't need to have that many globals.

    Next steps will remove the 'session' pointer, that in most cases is
    not needed.

    Then we can rename perf_event_ops to 'perf_tool' that better describes
    this class hierarchy.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

07 Oct, 2011

1 commit


24 Mar, 2011

1 commit

  • Resolving the sample->id to an evsel, since the most advanced tools,
    report and annotate, need it, and the others will too when they evolve to
    properly support multi-event perf.data files.

    Good also because it does an extra validation, checking that the ID is
    valid when present. When that is not the case, the overhead is just a
    branch + function call (perf_evlist__id2evsel).
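
    A minimal sketch of the lookup described above, written in Python rather
    than perf's C. The Evsel class, the list-based evlist and the fallback to
    the first entry when no ID is present are assumptions made for
    illustration; only the name perf_evlist__id2evsel comes from the commit
    message.

```python
from typing import List, Optional


class Evsel:
    """Hypothetical stand-in for an event selector and the sample IDs it owns."""

    def __init__(self, name: str, ids):
        self.name = name
        self.ids = set(ids)


def id2evsel(evlist: List[Evsel], sample_id: Optional[int]) -> Optional[Evsel]:
    # Cheap path: a single event, or no ID recorded in the sample.
    # Assumption: in that case the first (only) evsel is used.
    if len(evlist) == 1 or sample_id is None:
        return evlist[0]
    # Multi-event file: the ID must belong to one of the known evsels.
    for evsel in evlist:
        if sample_id in evsel.ids:
            return evsel
    return None  # extra validation: unknown ID


cycles = Evsel("cycles", [10, 11])
faults = Evsel("page-faults", [20])
```

    A caller would treat a None result as an invalid sample instead of
    silently attributing it to the wrong event.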

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Jan, 2011

2 commits


22 Dec, 2010

3 commits

  • The symfs argument allows analysis of a perf.data file using a locally accessible
    filesystem tree with debug symbols - e.g., tree created during image builds,
    sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything
    with an OS tree can be analyzed from anywhere without the need to populate a
    local data store with build-ids.

    Committer notes:

    o Fixed up symfs="/" variants handling.

    o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
    outside the symfs directory.
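
    The path prefixing can be pictured roughly as follows. This is a Python
    sketch, not the C code from the patch; symfs_path is a hypothetical helper
    name, and the way "/" variants are collapsed is an assumption about what
    "fixed up" means here.

```python
def symfs_path(symfs: str, dso_path: str) -> str:
    """Return the file to open for dso_path inside the symfs tree.

    Hypothetical helper for illustration only.
    """
    # Treat "", "/" and "//" alike: they all mean "no prefix".
    root = symfs.rstrip("/")
    if not dso_path.startswith("/"):
        dso_path = "/" + dso_path
    return root + dso_path


print(symfs_path("/mnt/image", "/usr/lib/libc.so"))
print(symfs_path("/", "/usr/lib/libc.so"))
```

    Applying the same helper to every kind of DSO lookup is what keeps the
    tool from opening files outside the symfs directory.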

    LKML-Reference:
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • This patch changes perf report to ask for the ID info on all events by
    default if recording from multiple CPUs.

    Perf report, annotate and diff will now process the events in order if
    the kernel is able to provide timestamps on all events. This ensures
    that events such as COMM and MMAP which are necessary to correctly
    interpret samples are processed prior to those samples so that they are
    attributed correctly.

    Before:
    # perf record ./cachetest
    # perf report

    # Events: 6K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ...............................
    #
    74.11% :3259 [unknown] [k] 0x4a6c
    1.50% cachetest ld-2.11.2.so [.] 0x1777c
    1.46% :3259 [kernel.kallsyms] [k] .perf_event_mmap_ctx
    1.25% :3259 [kernel.kallsyms] [k] restore
    0.74% :3259 [kernel.kallsyms] [k] ._raw_spin_lock
    0.71% :3259 [kernel.kallsyms] [k] .filemap_fault
    0.66% :3259 [kernel.kallsyms] [k] .memset
    0.54% cachetest [kernel.kallsyms] [k] .sha_transform
    0.54% :3259 [kernel.kallsyms] [k] .copy_4K_page
    0.54% :3259 [kernel.kallsyms] [k] .find_get_page
    0.52% :3259 [kernel.kallsyms] [k] .trace_hardirqs_off
    0.50% :3259 [kernel.kallsyms] [k] .__do_fault

    After:
    # perf report

    # Events: 6K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ...............................
    #
    44.28% cachetest cachetest [.] sumArrayNaive
    22.53% cachetest cachetest [.] sumArrayOptimal
    6.59% cachetest ld-2.11.2.so [.] 0x1777c
    2.13% cachetest [unknown] [k] 0x340
    1.46% cachetest [kernel.kallsyms] [k] .perf_event_mmap_ctx
    1.25% cachetest [kernel.kallsyms] [k] restore
    0.74% cachetest [kernel.kallsyms] [k] ._raw_spin_lock
    0.71% cachetest [kernel.kallsyms] [k] .filemap_fault
    0.66% cachetest [kernel.kallsyms] [k] .memset
    0.54% cachetest [kernel.kallsyms] [k] .copy_4K_page
    0.54% cachetest [kernel.kallsyms] [k] .find_get_page
    0.54% cachetest [kernel.kallsyms] [k] .sha_transform
    0.52% cachetest [kernel.kallsyms] [k] .trace_hardirqs_off
    0.50% cachetest [kernel.kallsyms] [k] .__do_fault
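
    The effect of ordered processing can be reproduced with a toy model. The
    sketch below is Python, not perf code; the tuple layout of the events and
    the attribute_samples name are invented for illustration. It shows a
    sample that sits before its COMM event in the file being attributed to
    ":3259" without reordering and to "cachetest" with it.

```python
def attribute_samples(events, ordered):
    """events: list of (timestamp, kind, pid, payload) in file order."""
    if ordered:
        # Process by timestamp so COMM is seen before later samples.
        events = sorted(events, key=lambda e: e[0])
    comms = {}
    result = []
    for _ts, kind, pid, payload in events:
        if kind == "COMM":
            comms[pid] = payload
        elif kind == "SAMPLE":
            # Unknown command: fall back to ":<pid>", as in the report above.
            result.append((comms.get(pid, ":%d" % pid), payload))
    return result


events = [
    (20, "SAMPLE", 3259, "sumArrayNaive"),   # written first, happened later
    (10, "COMM", 3259, "cachetest"),
    (30, "SAMPLE", 3259, "sumArrayOptimal"),
]

print(attribute_samples(events, ordered=False))
print(attribute_samples(events, ordered=True))
```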

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ian Munsie
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Munsie
     
  • If we are running the new perf on an old kernel without support for
    sample_id_all, we should fall back to the old unordered processing of
    events. If we didn't, then we would *always* process events without
    timestamps out of order, whether or not we hit a reordering race. In
    other words, instead of there being a chance of not attributing samples
    correctly, we would guarantee that samples would not be attributed.

    While processing all events without timestamps before events with
    timestamps may seem like an intuitive solution, it falls down as
    PERF_RECORD_EXIT events would also be processed before any samples.
    Even with a workaround for that case, samples before/after an exec would
    not be attributed correctly.

    This patch allows commands to indicate whether they need to fall back to
    unordered processing, so that commands that do not care about timestamps
    on every event will not be affected. If we do fall back, this will print
    out a warning if report -D was invoked.

    This patch adds the test in perf_session__new so that we only need to
    test once per session. Commands that do not use an event_ops (such as
    record and top) can simply pass NULL in its place.
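
    The decision described above can be summarised in a few lines. This is a
    Python sketch under stated assumptions: choose_ordering and its parameter
    names are hypothetical, and the warning text is illustrative wording, not
    the message perf prints.

```python
def choose_ordering(tool_wants_ordered, kernel_has_sample_id_all, dump_raw=False):
    """Return (ordered, warning) for a new session.

    tool_wants_ordered: the command asked for timestamp-ordered events.
    kernel_has_sample_id_all: every event carries a timestamp.
    dump_raw: models 'report -D'; only then is the warning shown.
    """
    if tool_wants_ordered and not kernel_has_sample_id_all:
        # Ordering would push timestamp-less events out of place,
        # so fall back to plain file order.
        warning = None
        if dump_raw:
            warning = "no sample_id_all support, falling back to unordered processing"
        return False, warning
    return tool_wants_ordered, None


print(choose_ordering(True, False, dump_raw=True))
```

    Doing the check once at session creation means each command only has to
    state whether it wants ordered events.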

    Acked-by: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ian Munsie
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Munsie
     

05 Dec, 2010

1 commit

  • At perf_session__process_event, so that we reduce the number of lines in each
    tool sample processing routine that now receives a sample_data pointer already
    parsed.

    This will also be useful in the next patch, where we'll allow sampling the
    identity fields in MMAP, FORK, EXIT, etc., when it will be possible to see (cpu,
    timestamp) for every event.

    Also validate callchains in perf_session__process_event, i.e. as early as
    possible, and keep a counter of the number of events discarded due to invalid
    callchains, warning the user about it if it happens.
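
    A rough Python model of validating early and counting discards follows.
    The validity rule used here (the declared number of 8-byte entries must
    fit in the bytes remaining in the event) is an assumption, as are the
    Session class and its field names.

```python
U64_SIZE = 8


def callchain_valid(nr_entries: int, bytes_left: int) -> bool:
    # Assumed rule: the chain must fit inside what is left of the event.
    return nr_entries * U64_SIZE <= bytes_left


class Session:
    """Hypothetical stand-in for the session-level event dispatcher."""

    def __init__(self):
        self.nr_invalid_chains = 0
        self.delivered = []

    def process_event(self, sample: dict) -> bool:
        if not callchain_valid(sample["nr"], sample["bytes_left"]):
            self.nr_invalid_chains += 1   # counted, then discarded
            return False
        self.delivered.append(sample)     # the tool only sees valid samples
        return True

    def warn_if_needed(self):
        if self.nr_invalid_chains:
            print("discarded %d events with invalid callchains"
                  % self.nr_invalid_chains)
```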

    There is an assumption that was kept that all events have the same sample_type,
    that will be dealt with in the future, when this preexisting limitation will be
    removed.

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

02 Dec, 2010

1 commit


27 Jul, 2010

1 commit


05 Jun, 2010

1 commit


15 May, 2010

2 commits

  • Number of samples is meaningless after we switched to auto-freq, so
    report the number of events, i.e. not the sum of the different periods,
    but the number of PERF_RECORD_SAMPLE events emitted by the kernel.

    While doing this I noticed that naming the sum of all the event periods
    "count" can be confusing, so rename it to .period, just like in
    struct sample.data, so that we become more consistent.

    This helps with the next step, which is to record in struct hist_entry
    the number of sample events for each instance. We need that because we
    use it to generate the number of events when applying filters to the
    tree of hist entries, as is done in the TUI report browser.
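
    The difference between the two counters is easy to see in a small model.
    The Python sketch below is illustrative only: HistEntry, add_sample and
    filtered_nr_events are hypothetical names, and the dictionary-based
    storage stands in for perf's rb_tree.

```python
from dataclasses import dataclass


@dataclass
class HistEntry:
    period: int = 0      # sum of the periods of the samples in this bucket
    nr_events: int = 0   # how many PERF_RECORD_SAMPLE events hit this bucket


def add_sample(entries, stats, symbol, period):
    he = entries.setdefault(symbol, HistEntry())
    he.period += period
    he.nr_events += 1
    stats["total_period"] += period
    stats["nr_events"] += 1


def filtered_nr_events(entries, keep):
    # With a per-entry event count, a filter can recompute the total
    # without going back to the raw samples.
    return sum(he.nr_events for sym, he in entries.items() if keep(sym))


entries = {}
stats = {"total_period": 0, "nr_events": 0}
add_sample(entries, stats, "__kmalloc", 1000)
add_sample(entries, stats, "__kmalloc", 4000)   # auto-freq changed the period
add_sample(entries, stats, "memset", 2500)
```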

    Suggested-by: Ingo Molnar
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The events_stats.total field is too generic, rename it to .total_period,
    and also add a comment explaining that it is the sum of all the .period
    fields in samples, that is needed because we use auto-freq to avoid
    sampling artifacts.

    Ditto for events_stats.lost, that is the sum of all lost_event.lost
    fields, i.e. the number of events the kernel dropped.

    Looking at the users, builtin-sched.c can make use of these fields and
    stop doing it again.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

11 May, 2010

1 commit

  • In cbbc79a we introduced support for multiple events by introducing a
    new "event_stat_id" struct and then made several perf_session methods
    receive a pointer to it instead of a pointer to perf_session, and kept the
    event_stats and hists rb_tree in perf_session.

    While working on the new newt based browser, I realised that it would be
    better to introduce a new class, "hists" (short for "histograms"),
    renaming the "event_stat_id" struct and the perf_session methods that
    were really "hists" methods, as they manipulate only struct hists
    members, not touching anything in the other perf_session members.

    Other optimizations, such as calculating the maximum length of a symbol
    name present in a hists instance, will be possible as we add them,
    avoiding a re-traversal just for finding that information.

    The rationale for the name "hists" to replace "event_stat_id" is that we
    may have multiple sets of hists for the same event_stat id, as, for
    instance, the 'perf diff' tool has, so event stat id is not what
    characterizes what this struct and the functions that manipulate it do.

    Cc: Eric B Munson
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

10 May, 2010

1 commit

  • And with that fix at least one bug:

    The first hit for an entry, the one that calls malloc to create a new
    instance in __perf_session__add_hist_entry, wasn't adding the count to
    the per cpumode (PERF_RECORD_MISC_USER, etc) total variable.
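
    The bug class is a familiar one: the "create" path returned before the
    shared accounting ran. A Python sketch of the corrected shape, with
    hypothetical names and a dict in place of the real tree:

```python
def add_hist_entry(entries, cpumode_totals, key, cpumode, count):
    """Add count to the entry for key and to the per-cpumode total."""
    he = entries.get(key)
    if he is None:
        # First hit: the path that allocates a new entry.
        he = {"count": 0}
        entries[key] = he
    # Both the first hit and later hits fall through to the same accounting,
    # so the per-cpumode total can no longer miss the first sample.
    he["count"] += count
    cpumode_totals[cpumode] = cpumode_totals.get(cpumode, 0) + count
    return he


entries, totals = {}, {}
add_hist_entry(entries, totals, "find:libc:strlen", "user", 5)
add_hist_entry(entries, totals, "find:libc:strlen", "user", 7)
```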

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

03 May, 2010

1 commit

  • Currently, perf 'live mode' writes build-ids at the end of the
    session, which isn't actually useful for processing live mode events.

    What would be better would be to have the build-ids sent before any of
    the samples that reference them, which can be done by processing the
    event stream and retrieving the build-ids on the first hit. Doing
    that in perf-record itself, however, is off-limits.

    This patch introduces perf-inject, which does the same job while
    leaving perf-record untouched. Normal mode perf still records the
    build-ids at the end of the session as it should, but for live mode,
    perf-inject can be injected in between the record and report steps
    e.g.:

    perf record -o - ./hackbench 10 | perf inject -v -b | perf report -v -i -

    perf-inject reads a perf-record event stream and repipes it to stdout.
    At any point the processing code can inject other events into the
    event stream - in this case build-ids (-b option) are read and
    injected as needed into the event stream.
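
    The repipe-and-inject idea can be sketched as a generator. This is a
    Python model with invented event dictionaries; inject_build_ids and
    lookup_build_id are hypothetical names, not part of perf.

```python
def inject_build_ids(events, lookup_build_id):
    """Yield every input event unchanged, preceded by a BUILD_ID event
    the first time a sample references a given DSO."""
    seen = set()
    for ev in events:
        if ev["type"] == "SAMPLE" and ev["dso"] not in seen:
            seen.add(ev["dso"])
            yield {
                "type": "BUILD_ID",
                "dso": ev["dso"],
                "build_id": lookup_build_id(ev["dso"]),
            }
        yield ev  # repipe the original event


stream = [
    {"type": "MMAP", "dso": "hackbench"},
    {"type": "SAMPLE", "dso": "hackbench"},
    {"type": "SAMPLE", "dso": "hackbench"},
    {"type": "SAMPLE", "dso": "libc"},
]
for ev in inject_build_ids(stream, lambda dso: "id-of-" + dso):
    print(ev["type"], ev["dso"])
```

    Because the generator never buffers the stream, the consumer sees each
    build-id just ahead of the first sample that needs it.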

    Build-ids are just the first user of perf-inject - potentially
    anything that needs userspace processing to augment the trace stream
    with additional information could make use of this facility.

    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Frédéric Weisbecker
    LKML-Reference:
    Signed-off-by: Tom Zanussi
    Signed-off-by: Arnaldo Carvalho de Melo

    Tom Zanussi
     

19 Apr, 2010

1 commit


14 Apr, 2010

1 commit

  • Parsing an option from the command line with OPT_BOOLEAN on a
    bool data type would not work on a big-endian machine due to the
    manner in which the boolean was being cast into an int and
    incremented. For example, running 'perf probe --list' on a
    PowerPC machine would fail to properly set the list_events bool
    and would therefore print out the usage information and
    terminate.

    This patch makes OPT_BOOLEAN work as expected with a bool
    datatype. For cases where the original OPT_BOOLEAN was
    intentionally being used to increment an int each time it was
    passed in on the command line, this patch introduces OPT_INCR
    with the old behaviour of OPT_BOOLEAN (the verbose variable is
    currently the only such example of this).

    I have reviewed every use of OPT_BOOLEAN to verify that a true
    C99 bool was passed. Where integers were used, I verified that
    they were only being used for boolean logic and changed them to
    bools to ensure that they would not be mistakenly used as ints.
    The major exception was the verbose variable which now uses
    OPT_INCR instead of OPT_BOOLEAN.
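
    The endianness problem can be modelled without a PowerPC machine. The
    Python sketch below emulates incrementing an int through a pointer that
    really points at a one-byte bool; it assumes a 4-byte int and that the
    three bytes after the bool happen to be zero. The function name is
    hypothetical.

```python
import struct


def set_via_int_cast(byteorder: str) -> bool:
    """Model `*(int *)value += 1` where value points at a 1-byte bool."""
    fmt = {"little": "<i", "big": ">i"}[byteorder]
    memory = bytearray(4)                      # the bool lives in memory[0]
    (as_int,) = struct.unpack(fmt, memory)     # read through the int view
    memory[:] = struct.pack(fmt, as_int + 1)   # write the incremented int back
    return bool(memory[0])                     # what the program later reads


print("little-endian:", set_via_int_cast("little"))
print("big-endian:   ", set_via_int_cast("big"))
```

    On little-endian the low byte of the int overlaps the bool, so the flag
    appears set; on big-endian the 1 lands three bytes further on and the
    bool stays false. Either way the write also touches memory beyond the
    bool, which is why a separate OPT_INCR for real ints is the cleaner fix.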

    Signed-off-by: Ian Munsie
    Acked-by: David S. Miller
    Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
    Cc: Git development list
    Cc: Ian Munsie
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: KOSAKI Motohiro
    Cc: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Frederic Weisbecker
    Cc: Eric B Munson
    Cc: Valdis.Kletnieks@vt.edu
    Cc: WANG Cong
    Cc: Thiago Farina
    Cc: Masami Hiramatsu
    Cc: Xiao Guangrong
    Cc: Jaswinder Singh Rajput
    Cc: Arjan van de Ven
    Cc: OGAWA Hirofumi
    Cc: Mike Galbraith
    Cc: Tom Zanussi
    Cc: Anton Blanchard
    Cc: John Kacur
    Cc: Li Zefan
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ian Munsie
     

10 Mar, 2010

2 commits

  • Now that report can store histograms for multiple events we
    need to be able to do the post processing work for each
    histogram. This patch changes the post processing functions so
    that they can be called individually for each event's histogram.

    Signed-off-by: Eric B Munson
    [ Guarantee bisectability by fixing up builtin-report.c ]
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • In order to minimize the impact of storing multiple events in a
    report this function will now take the root of the histogram
    tree so that the logic for selecting the proper tree can be
    inserted before the call.

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     

16 Jan, 2010

1 commit

  • Since they can come from another architecture with bigger
    pointers, i.e. processing a 64-bit perf.data on a 32-bit arch.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

30 Dec, 2009

2 commits

  • When we finish creating the hist_entries we _already_ have them
    sorted "by name", in fact by what is in --sort, that is exactly
    how we can find the pairs in perf_session__match_hists as
    'comm', 'dso' & 'symbol' all are strings we need to find the
    matches in the baseline session.

    So only do the sort by hits followed by a resort by --sort if we
    need to find the position for showing the --displacement of
    hist entries.

    Now all these modes work correctly:

    Example is a simple 'perf record -f find / > /dev/null' run
    twice then followed by the following commands:

    $ perf diff -f --sort comm
    # Baseline Delta Command
    # ........ .......... .......
    #
    0.00% +100.00% find
    $ perf diff -f --sort dso
    # Baseline Delta Shared Object
    # ........ .......... ..................
    #
    59.97% -0.44% [kernel]
    21.17% +0.28% libc-2.5.so
    18.49% +0.16% [ext3]
    0.37% find
    $ perf diff -f --sort symbol | head -8
    # Baseline Delta Symbol
    # ........ .......... ......
    #
    6.21% +0.36% [k] ext3fs_dirhash
    3.43% +0.41% [.] __GI_strlen
    3.53% +0.16% [k] __kmalloc
    3.17% +0.49% [k] system_call
    3.06% +0.37% [k] ext3_htree_store_dirent
    $ perf diff -f --sort dso,symbol | head -8
    # Baseline Delta Shared Object Symbol
    # ........ .......... .................. ......
    #
    6.21% +0.36% [ext3] [k] ext3fs_dirhash
    3.43% +0.41% libc-2.5.so [.] __GI_strlen
    3.53% +0.16% [kernel] [k] __kmalloc
    3.17% +0.49% [kernel] [k] system_call
    3.06% +0.37% [ext3] [k] ext3_htree_store_dirent
    $

    And we don't have to do two expensive resorts in the common, non
    --displacement case.
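
    The pairing step can be sketched as follows. Note that this Python model
    swaps in a dictionary lookup for the sorted-tree walk that the commit
    relies on, and that diff_hists and the sample periods are invented for
    illustration.

```python
def diff_hists(baseline, new):
    """baseline, new: dict mapping a --sort key to its summed period.

    Returns rows of (key, baseline percent, delta in percentage points).
    """
    base_total = sum(baseline.values())
    new_total = sum(new.values())
    rows = []
    for key, period in new.items():
        new_pct = 100.0 * period / new_total
        # Pair with the baseline entry that has the same sort key, if any.
        if key in baseline:
            base_pct = 100.0 * baseline[key] / base_total
        else:
            base_pct = 0.0
        rows.append((key, base_pct, new_pct - base_pct))
    return rows


old_session = {"[kernel]": 60, "libc-2.5.so": 40}
new_session = {"[kernel]": 50, "libc-2.5.so": 50}
for key, base_pct, delta in diff_hists(old_session, new_session):
    print("%6.2f%% %+7.2f%%  %s" % (base_pct, delta, key))
```

    Since the key is whatever --sort selected, the same pairing works for
    comm, dso, symbol or any combination of them.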

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Since we don't add histograms buckets for them, this way the sum
    of baselines should be 100%.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

28 Dec, 2009

2 commits


19 Dec, 2009

1 commit

  • Fixing this:

    [acme@doppio linux-2.6-tip]$ perf diff --hell
    Error: unknown option `hell'

    usage: perf diff [<options>] [old_file] [new_file]
    Segmentation fault
    [acme@doppio linux-2.6-tip]$

    Also go over the other such arrays to check if they all were OK,
    they are, but there were some minor changes to do like making
    one static and renaming another to match the command it refers
    to.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

17 Dec, 2009

1 commit

  • This is a more intuitive / more meaningful default:

    $ perf diff | head -8
    9.02% +1.00% libc-2.10.1.so [.] _IO_vfprintf_internal
    2.91% -1.00% [kernel] [k] __kmalloc
    2.85% -1.00% [kernel] [k] ext4_htree_store_dirent
    1.99% -1.00% [kernel] [k] _atomic_dec_and_lock
    2.44% [kernel]
    $

    Suggested-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

16 Dec, 2009

4 commits

  • That means that almost everything you can do with 'perf report'
    can be done with 'perf diff', for instance:

    $ perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
    $ perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2687 samples) ]
    $ perf diff | head -8
    9.02% +1.00% find libc-2.10.1.so [.] _IO_vfprintf_internal
    2.91% -1.00% find [kernel] [k] __kmalloc
    2.85% -1.00% find [kernel] [k] ext4_htree_store_dirent
    1.99% -1.00% find [kernel] [k] _atomic_dec_and_lock
    2.44% find [kernel] [k] half_md4_transform
    $

    So if you want to zoom into libc:

    $ perf diff --dsos libc-2.10.1.so | head -8
    37.34% find [.] _IO_vfprintf_internal
    10.34% find [.] __GI_memmove
    8.25% +2.00% find [.] _int_malloc
    5.07% -1.00% find [.] __GI_mempcpy
    7.62% +2.00% find [.] _int_free
    $

    And if there were multiple commands using libc, it is also
    possible to aggregate them all by using --sort symbol:

    $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
    37.34% [.] _IO_vfprintf_internal
    10.34% [.] __GI_memmove
    8.25% +2.00% [.] _int_malloc
    5.07% -1.00% [.] __GI_mempcpy
    7.62% +2.00% [.] _int_free
    $

    The displacement column now is off by default, to use it:

    $ perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
    37.34% [.] _IO_vfprintf_internal
    10.34% [.] __GI_memmove
    8.25% +2.00% [.] _int_malloc
    5.07% -1.00% +2 [.] __GI_mempcpy
    7.62% +2.00% -1 [.] _int_free
    $

    Using -t/--field-separator can be used for scripting:

    $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
    37.34, , ,[.] _IO_vfprintf_internal
    10.34, , ,[.] __GI_memmove
    8.25,+2.00%, ,[.] _int_malloc
    5.07,-1.00%, +2,[.] __GI_mempcpy
    7.62,+2.00%, -1,[.] _int_free
    6.99,+1.00%, -1,[.] _IO_new_file_xsputn
    1.89,-2.00%, +4,[.] __readdir64
    $
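
    For scripting, the separated output can be split into fields directly.
    The Python sketch below parses lines copied from the transcript above; it
    assumes the column order baseline, delta, displacement, symbol that -m
    with --sort symbol produces there, and parse_diff_line is a hypothetical
    helper.

```python
SAMPLE_OUTPUT = """\
37.34, , ,[.] _IO_vfprintf_internal
8.25,+2.00%, ,[.] _int_malloc
5.07,-1.00%, +2,[.] __GI_mempcpy
"""


def parse_diff_line(line, sep=","):
    # Split on the first three separators only, so a separator inside
    # the symbol name would not break the parse.
    baseline, delta, displacement, symbol = line.rstrip("\n").split(sep, 3)
    delta = delta.strip()
    displacement = displacement.strip()
    return {
        "baseline": float(baseline),
        "delta": float(delta.rstrip("%")) if delta else None,
        "displacement": int(displacement) if displacement else None,
        "symbol": symbol,
    }


rows = [parse_diff_line(line) for line in SAMPLE_OUTPUT.splitlines()]
for row in rows:
    print(row)
```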

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • So that --dsos, --comm, --symbols can be used in more tools,
    like in perf diff:

    $ perf record -f find / > /dev/null
    $ perf record -f find / > /dev/null
    $ perf diff --dsos /lib64/libc-2.10.1.so | head -5
    1 +22392124 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 +6410655 /lib64/libc-2.10.1.so __GI_memmove
    3 +1 +9192692 /lib64/libc-2.10.1.so _int_malloc
    4 -1 -15158605 /lib64/libc-2.10.1.so _int_free
    5 +45669 /lib64/libc-2.10.1.so _IO_new_file_xsputn
    $

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Will be used in perf diff too.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • This simplifies a lot of functions, less stuff to be done by
    tool writers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

15 Dec, 2009

1 commit

  • I guess it is enough to show some examples:

    [root@doppio linux-2.6-tip]# rm -f perf.data*
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    ls: cannot access perf.data*: No such file or directory
    [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    -rw------- 1 root root 74440 2009-12-14 20:03 perf.data
    [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ]
    [root@doppio linux-2.6-tip]# ls -la perf.data*
    -rw------- 1 root root 74280 2009-12-14 20:03 perf.data
    -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old
    [root@doppio linux-2.6-tip]# perf diff | head -5
    1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 -15307806 [kernel.kallsyms] __kmalloc
    3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc
    5 +7 +38538813 [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -p | head -5
    1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 [kernel.kallsyms] __kmalloc
    3 +1 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 /lib64/libc-2.10.1.so _int_malloc
    5 +7 -1.00% [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -v | head -5
    1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc
    3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove
    4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc
    5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]# perf diff -vp | head -5
    1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 3.00% 3.00% [kernel.kallsyms] __kmalloc
    3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove
    4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc
    5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup
    [root@doppio linux-2.6-tip]#

    This should be enough for diffs where the system is non
    volatile, i.e. when one doesn't update binaries.

    For volatile environments, stay tuned for the next perf tool
    feature: a buildid cache populated by 'perf record', managed by
    'perf buildid-cache' a-la ccache, and used by all the report
    tools.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: "Paul E. McKenney"
    Cc: Stephen Hemminger
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Paul E. McKenney
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo