12 Mar, 2010

1 commit

  • Newt has widespread availability and provides a rather simple
    API as can be seen by the size of this patch.

    The work needed to support it will benefit other frontends too.

    This initial patch just checks whether the output is a tty; if
    not, it falls back to the previous behaviour. The previous
    behaviour is also maintained if newt-devel/libnewt-dev is not
    installed.
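
    The fallback decision can be sketched roughly as follows. This is
    a minimal illustration, not the code from the patch: use_tui() and
    its have_newt parameter are hypothetical names, and only isatty()
    is a real library call.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: decide whether an interactive browser makes sense.
 * 'have_newt' stands in for "the tool was built with libnewt support".
 */
static bool use_tui(int fd, bool have_newt)
{
	/*
	 * Keep the previous stdio behaviour when newt is unavailable or
	 * when the output is redirected to a file or a pipe.
	 */
	return have_newt && isatty(fd) == 1;
}
```

    A caller would typically pass STDOUT_FILENO, so that piping the
    report into another program keeps the plain-text output.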

    Pressing enter on a symbol will annotate it, ESC in the
    annotation window will return to the report symbol list.

    More work will be done to remove the special casing in
    color_fprintf, stop using fmemopen/FILE in the printing of
    hist_entries, etc.

    Also, the annotation doesn't need to be done by spawning "perf
    annotate" and then browsing its output. We can do better by
    calling the builtin-annotate.c functions directly; those would
    then be moved to tools/perf/util/annotate.c and shared with perf
    top, etc.

    But let's go by baby steps: this patch already improves perf
    usability by allowing quick annotation of symbols from
    the report screen, and provides a first experiment with
    libnewt/TUI integration of tools.

    Tested on RHEL5 and Fedora12 X86_64 and on Debian PARISC64 to
    browse a perf.data file collected on a Fedora12 x86_64 box.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Avi Kivity
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

10 Mar, 2010

3 commits

  • Perf report does not handle multiple events being reported, even
    though perf record stores them properly on disk. This patch
    addresses that issue by adding the logic to perf report to use
    the event stream id that is saved by record and the new data
    structures to separate the event streams and report them
    individually.
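
    The idea of keeping one bucket per event stream id can be
    sketched as below. This is only an illustration under simplifying
    assumptions: struct stream_stats, struct stream_table and
    stream_table__get() are hypothetical names, a fixed-size array
    replaces whatever data structure the patch uses, and a sample
    counter stands in for the per-event histogram.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 8	/* arbitrary limit for this sketch */

/* Hypothetical per-event bucket; a real tool would keep a histogram here. */
struct stream_stats {
	uint64_t id;		/* event stream id saved by perf record */
	uint64_t nr_samples;	/* samples attributed to this stream */
};

struct stream_table {
	struct stream_stats streams[MAX_STREAMS];
	size_t nr;
};

/* Find the bucket for 'id', creating it on first sight; NULL when full. */
static struct stream_stats *stream_table__get(struct stream_table *t, uint64_t id)
{
	for (size_t i = 0; i < t->nr; i++)
		if (t->streams[i].id == id)
			return &t->streams[i];

	if (t->nr == MAX_STREAMS)
		return NULL;

	t->streams[t->nr] = (struct stream_stats){ .id = id, .nr_samples = 0 };
	return &t->streams[t->nr++];
}
```

    Each sample is then accounted to the bucket returned for its id,
    and every bucket is reported on its own.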

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Now that report can store histograms for multiple events we
    need to be able to do the post processing work for each
    histogram. This patch changes the post processing functions so
    that they can be called individually for each event's histogram.

    Signed-off-by: Eric B Munson
    [ Guarantee bisectability by fixing up builtin-report.c ]
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • In order to minimize the impact of storing multiple events in a
    report this function will now take the root of the histogram
    tree so that the logic for selecting the proper tree can be
    inserted before the call.

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     

29 Jan, 2010

1 commit


27 Jan, 2010

1 commit


16 Jan, 2010

1 commit

  • Since they can come from another architecture with bigger
    pointers, i.e. processing a 64-bit perf.data on a 32-bit arch.
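
    A small sketch of why a fixed 64-bit type matters here, assuming
    the values in question are addresses read from perf.data. The
    helper names are hypothetical; the second function only mimics
    what storing the value in a 32-bit 'unsigned long' would keep.

```c
#include <stdint.h>
#include <string.h>

/* Read one recorded address from a raw buffer into a fixed-width type. */
static uint64_t read_addr(const void *buf)
{
	uint64_t addr;

	memcpy(&addr, buf, sizeof(addr));
	return addr;
}

/* What a 32-bit wide host type would retain: only the low 32 bits. */
static uint32_t truncate_to_32(uint64_t addr)
{
	return (uint32_t)addr;
}
```

    With a 64-bit kernel address such as 0xffffffff81000000, the
    truncated form loses the upper half and no longer identifies the
    same location.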

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

13 Jan, 2010

1 commit

  • To avoid the funny:

    [root@doppio ~]# perf record -a -f sleep 2s
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.334 MB perf.data (~14572 samples) ]
    [root@doppio ~]# perf report --no-call-graph
    selected -g but no callchain data. Did you call perf record without -g?

    And fix the bug reported by peterz when we do indeed record with
    callchains and then ask for a report without:

    [root@doppio ~]# perf record -a -g -f sleep 2s
    [root@doppio ~]# perf report --no-call-graph
    Segmentation fault
    [root@doppio ~]#

    Reported-by: Peter Zijlstra
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

30 Dec, 2009

1 commit


28 Dec, 2009

3 commits


19 Dec, 2009

1 commit

  • Fixing this:

    [acme@doppio linux-2.6-tip]$ perf diff --hell
    Error: unknown option `hell'

    usage: perf diff [] [old_file] [new_file]
    Segmentation fault
    [acme@doppio linux-2.6-tip]$

    Also went over the other such arrays to check that they were all
    OK. They are, but there were some minor changes to make, like
    making one static and renaming another to match the command it
    refers to.
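
    For illustration, assuming the arrays in question are usage-string
    arrays that are walked until a NULL sentinel, the pattern looks
    like the sketch below. The array contents and usage__count() are
    hypothetical; without the trailing NULL the loop would run past
    the end of the array, which is one way to get a crash right after
    the usage message.

```c
#include <stddef.h>

/* Hypothetical usage array; the trailing NULL is what stops the walk. */
static const char * const diff_usage[] = {
	"perf diff [old_file] [new_file]",
	NULL,
};

/* Count entries the way a usage printer iterates: until the sentinel. */
static size_t usage__count(const char * const *usage)
{
	size_t n = 0;

	while (usage[n] != NULL)
		n++;
	return n;
}
```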

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

18 Dec, 2009

1 commit

  • Pekka Enberg reported weird percentages in perf report. It
    turns out we are overflowing 32-bit variables in struct
    events_stats on 32-bit architectures.

    Before:

    [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10
    281.96% Xorg b710a561 [.] 0x000000b710a561
    140.15% Xorg [kernel] [k] __initramfs_end
    51.56% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46
    35.12% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd
    33.84% metacity libpthread-2.9.so [.] 0x00000000007a3d

    After:

    [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10
    30.04% Xorg b710a561 [.] 0x000000b710a561
    14.93% Xorg [kernel] [k] __initramfs_end
    5.49% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46
    3.74% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd
    3.61% metacity libpthread-2.9.so [.] 0x00000000007a3d
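
    The effect can be reproduced in isolation. The sketch below is not
    the perf code: total32(), total64() and percent() are hypothetical
    helpers showing how a total kept in a 32-bit variable wraps around
    and makes individual entries exceed 100%.

```c
#include <stdint.h>

/* Sum sample weights into a 32-bit total, which wraps modulo 2^32. */
static uint32_t total32(const uint64_t *w, int n)
{
	uint32_t t = 0;

	for (int i = 0; i < n; i++)
		t += (uint32_t)w[i];
	return t;
}

/* Same sum with a 64-bit accumulator, which does not wrap here. */
static uint64_t total64(const uint64_t *w, int n)
{
	uint64_t t = 0;

	for (int i = 0; i < n; i++)
		t += w[i];
	return t;
}

/* Share of 'part' in 'total', as a percentage. */
static double percent(uint64_t part, uint64_t total)
{
	return total ? 100.0 * (double)part / (double)total : 0.0;
}
```

    With two weights of 3000000000 each, the 64-bit total gives 50%
    per entry, while the wrapped 32-bit total yields a share well
    above 100%.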

    Reported-by: Pekka Enberg
    Tested-by: Pekka Enberg
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

16 Dec, 2009

8 commits

  • That means that almost everything you can do with 'perf report'
    can be done with 'perf diff', for instance:

    $ perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
    $ perf record -f find / > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.062 MB perf.data (~2687 samples) ]
    $ perf diff | head -8
    9.02% +1.00% find libc-2.10.1.so [.] _IO_vfprintf_internal
    2.91% -1.00% find [kernel] [k] __kmalloc
    2.85% -1.00% find [kernel] [k] ext4_htree_store_dirent
    1.99% -1.00% find [kernel] [k] _atomic_dec_and_lock
    2.44% find [kernel] [k] half_md4_transform
    $

    So if you want to zoom into libc:

    $ perf diff --dsos libc-2.10.1.so | head -8
    37.34% find [.] _IO_vfprintf_internal
    10.34% find [.] __GI_memmove
    8.25% +2.00% find [.] _int_malloc
    5.07% -1.00% find [.] __GI_mempcpy
    7.62% +2.00% find [.] _int_free
    $

    And if there were multiple commands using libc, it is also
    possible to aggregate them all by using --sort symbol:

    $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
    37.34% [.] _IO_vfprintf_internal
    10.34% [.] __GI_memmove
    8.25% +2.00% [.] _int_malloc
    5.07% -1.00% [.] __GI_mempcpy
    7.62% +2.00% [.] _int_free
    $

    The displacement column is now off by default; to enable it, use -m:

    $ perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
    37.34% [.] _IO_vfprintf_internal
    10.34% [.] __GI_memmove
    8.25% +2.00% [.] _int_malloc
    5.07% -1.00% +2 [.] __GI_mempcpy
    7.62% +2.00% -1 [.] _int_free
    $

    The -t/--field-separator option can be used for scripting:

    $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
    37.34, , ,[.] _IO_vfprintf_internal
    10.34, , ,[.] __GI_memmove
    8.25,+2.00%, ,[.] _int_malloc
    5.07,-1.00%, +2,[.] __GI_mempcpy
    7.62,+2.00%, -1,[.] _int_free
    6.99,+1.00%, -1,[.] _IO_new_file_xsputn
    1.89,-2.00%, +4,[.] __readdir64
    $
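
    Output in this form is easy to consume from a small program. A
    minimal sketch, assuming the first field is a percentage and the
    second an optional signed delta ending in '%', as in the sample
    output above; struct diff_row and diff_row__parse() are
    hypothetical names and not part of perf.

```c
#include <stdbool.h>
#include <stdio.h>

struct diff_row {
	double percent;		/* first column */
	double delta;		/* second column, 0.0 when blank */
	bool has_delta;
};

/* Parse the two leading numeric fields of one comma-separated line. */
static bool diff_row__parse(const char *line, struct diff_row *row)
{
	/* A blank delta field makes the second conversion fail, so n == 1. */
	int n = sscanf(line, "%lf,%lf%%", &row->percent, &row->delta);

	if (n < 1)
		return false;

	row->has_delta = (n == 2);
	if (!row->has_delta)
		row->delta = 0.0;
	return true;
}
```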

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Introduced in:

    d599db3fc5dd4f1e8432fdbc6d899584b25f4dff
    "perf report: Generalize perf_session__fprintf_hists()"

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Those don't make sense for tools such as 'perf diff'.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Will be used in other tools such as 'perf diff'.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Pull it out of builtin-report - further changes will be made and it
    will then be reusable in 'perf diff' as well.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • So that --dsos, --comm, --symbols can be used in more tools,
    like in perf diff:

    $ perf record -f find / > /dev/null
    $ perf record -f find / > /dev/null
    $ perf diff --dsos /lib64/libc-2.10.1.so | head -5
    1 +22392124 /lib64/libc-2.10.1.so _IO_vfprintf_internal
    2 +6410655 /lib64/libc-2.10.1.so __GI_memmove
    3 +1 +9192692 /lib64/libc-2.10.1.so _int_malloc
    4 -1 -15158605 /lib64/libc-2.10.1.so _int_free
    5 +45669 /lib64/libc-2.10.1.so _IO_new_file_xsputn
    $

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Will be used in perf diff too.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • This simplifies a lot of functions, less stuff to be done by
    tool writers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

15 Dec, 2009

5 commits


14 Dec, 2009

8 commits


12 Dec, 2009

1 commit

  • That does all the initialization boilerplate, opening the file,
    reading the header, checking if it is valid, etc.

    It will also hold the threads list, the (currently global) kmap
    variable, etc., so that we can handle two (or more) perf.data files
    describing sessions to compare.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

07 Dec, 2009

1 commit

  • Currently, sample event data is parsed separately by each command,
    and each parser assumes that the data contains no other fields.
    (E.g. timechart, trace, etc. can't parse the event if it has
    PERF_SAMPLE_CALLCHAIN.)

    So, even if we record the superset of data for multiple commands
    at once, the commands can't parse it.

    To fix this, introduce a common sample event parser and use it to
    parse sample events correctly. (PERF_SAMPLE_READ is unsupported
    for now, though; it does not seem to be in use.)
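
    The general shape of such a parser is a single walk over the
    payload that consumes only the fields named in a type bitmask. The
    sketch below is a simplified illustration and not the perf
    implementation: the SKETCH_SAMPLE_* flags, the field order and all
    function names are assumptions made for the example, and the real
    PERF_SAMPLE_* layout has more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; they only mimic PERF_SAMPLE_* bits. */
enum {
	SKETCH_SAMPLE_IP        = 1u << 0,
	SKETCH_SAMPLE_TIME      = 1u << 1,
	SKETCH_SAMPLE_CALLCHAIN = 1u << 2,
};

struct sketch_sample {
	uint64_t ip;
	uint64_t time;
	uint64_t nr_chain;	/* number of callchain entries */
	const uint64_t *chain;	/* points into the payload, may be NULL */
};

/*
 * Consume the fields selected by 'type' from 'len' u64 words at 'p'.
 * Returns the number of words consumed, or -1 if the payload is too short.
 */
static int sketch_parse_sample(uint64_t type, const uint64_t *p, size_t len,
			       struct sketch_sample *out)
{
	size_t i = 0;

	*out = (struct sketch_sample){ 0 };

	if (type & SKETCH_SAMPLE_IP) {
		if (i >= len)
			return -1;
		out->ip = p[i++];
	}
	if (type & SKETCH_SAMPLE_TIME) {
		if (i >= len)
			return -1;
		out->time = p[i++];
	}
	if (type & SKETCH_SAMPLE_CALLCHAIN) {
		if (i >= len)
			return -1;
		out->nr_chain = p[i++];
		if (out->nr_chain > len - i)
			return -1;
		out->chain = &p[i];
		i += out->nr_chain;
	}
	return (int)i;
}
```

    Because every command goes through the same walk, a command that
    does not care about callchains still skips over them correctly
    instead of misreading the fields that follow.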

    Signed-off-by: OGAWA Hirofumi
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    OGAWA Hirofumi
     

28 Nov, 2009

3 commits

  • Now we have a very high level routine for simple tools to
    process IP sample events:

    int event__preprocess_sample(const event_t *self,
    struct addr_location *al,
    symbol_filter_t filter)

    It receives the event itself and will insert new threads in the
    global threads list and resolve the map and symbol, filling all
    this info into the new addr_location struct, so that tools like
    annotate and report can further process the event by creating
    hist_entries in their specific way (with or without callgraphs,
    etc).

    It in turn uses the new next layer function:

    void thread__find_addr_location(struct thread *self, u8 cpumode,
    enum map_type type, u64 addr,
    struct addr_location *al,
    symbol_filter_t filter)

    Given a thread (userspace or the kernel kthread one), this one
    will find the given type (MAP__FUNCTION now, MAP__VARIABLE
    too in the near future) at the given cpumode, taking vdsos into
    account (userspace hit, but kernel symbol), and will fill all
    these details in the addr_location given.

    Tools that need a more compact API for plain function
    resolution, like 'kmem', can use this other one:

    struct symbol *thread__find_function(struct thread *self, u64 addr,
    symbol_filter_t filter)

    So, to resolve a kernel symbol, which is all the 'kmem' tool
    needs, it's just a matter of calling:

    sym = thread__find_function(kthread, addr, NULL);

    The 'filter' parameter is needed because we do lazy
    parsing/loading of ELF symtabs or /proc/kallsyms.

    With this we remove more code duplication all around, which is
    always good, huh? :-)
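
    Why a lookup function needs the filter at all can be shown with a
    toy model of lazy loading: the symbol table is only built on the
    first lookup, so that is the only point where a filter can be
    applied. Everything below is a self-contained sketch with
    hypothetical sketch_* names, and the convention that a nonzero
    return drops the symbol is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_sym {
	uint64_t start, end;	/* covers [start, end) */
	const char *name;
};

/* Assumed convention for this sketch: return nonzero to drop the symbol. */
typedef int (*sketch_filter_t)(const struct sketch_sym *sym);

#define SKETCH_MAX_SYMS 16

struct sketch_map {
	const struct sketch_sym *source;	/* stands in for a symtab on disk */
	size_t nr_source;
	struct sketch_sym loaded[SKETCH_MAX_SYMS];
	size_t nr_loaded;
	int is_loaded;
};

/* Example filter: drop symbols whose name starts with an underscore. */
static int sketch_filter_underscore(const struct sketch_sym *sym)
{
	return sym->name[0] == '_';
}

/* Load the table on first use, applying 'filter' once, then look up 'addr'. */
static const struct sketch_sym *sketch_map__find(struct sketch_map *map,
						 uint64_t addr,
						 sketch_filter_t filter)
{
	if (!map->is_loaded) {
		for (size_t i = 0; i < map->nr_source &&
				   map->nr_loaded < SKETCH_MAX_SYMS; i++) {
			if (filter && filter(&map->source[i]))
				continue;
			map->loaded[map->nr_loaded++] = map->source[i];
		}
		map->is_loaded = 1;
	}

	for (size_t i = 0; i < map->nr_loaded; i++)
		if (addr >= map->loaded[i].start && addr < map->loaded[i].end)
			return &map->loaded[i];
	return NULL;
}
```

    In this model a filter passed on a later lookup has no effect,
    since the table has already been built by the first one.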

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: John Kacur
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • While implementing event__preprocess_sample, that will do all of
    the symbol lookup in one convenient function, I noticed that
    util/process_event.[ch] were not being used at all, then started
    looking if there were other functions that could be shared
    and...

    All those functions really don't need to receive offset + head,
    the only thing they did was common to all of them, so do it at
    one place instead.

    Stats about the number of events of each type processed are now
    kept in a central place.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: John Kacur
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Making the routines that were so far specific to the kernel maps
    useful for all threads.

    This is done by making the kernel maps be contained in a kernel
    "thread".

    This gets the kernel specific routines closer to the userspace
    counterparts, which will help in reducing the boilerplate for
    resolving a symbol, as will be demonstrated in the next patches.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo