02 Aug, 2012

1 commit

  • Since we need evsel->{attr.{sample_{id_all,type}},sample_size},
    reducing the number of parameters tools have to pass.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wdtmgak0ihgsmw1brb54a8h4@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

09 Mar, 2012

1 commit

  • This patch adds:

    - ability to parse samples with PERF_SAMPLE_BRANCH_STACK
    - sort on branches (dso_from, symbol_from, dso_to, symbol_to, mispredict)
    - build histograms on branches

    Signed-off-by: Roberto Agostino Vitillo
    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-12-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Roberto Agostino Vitillo
     

12 Dec, 2011

1 commit

  • It's the counterpart of perf_session__parse_sample.

    v2: fixed mistakes found by David Ahern.
    v3: s/data/sample/
    s/perf_event__change_sample/perf_event__synthesize_sample

    Reviewed-by: David Ahern
    Cc: Arun Sharma
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: devel@openvz.org
    Link: http://lkml.kernel.org/r/1323266161-394927-3-git-send-email-avagin@openvz.org
    Signed-off-by: Andrew Vagin
    Signed-off-by: Arnaldo Carvalho de Melo

    Andrew Vagin
     

02 Dec, 2011

1 commit

  • So that tools like 'perf test' can print the events when in verbose
    mode, for instance.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-xnovdqfi25nc48gy6604k7yp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

28 Nov, 2011

3 commits

  • To better reflect that it became the base class for all tools, that must
    be in each tool struct and where common stuff will be put.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Reducing the exposure of perf_session further, so that we can use the
    classes in cases where no perf.data file is created.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that we don't need to have that many globals.

    Next steps will remove the 'session' pointer, that in most cases is
    not needed.

    Then we can rename perf_event_ops to 'perf_tool' that better describes
    this class hierarchy.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

24 Sep, 2011

1 commit

  • Currently, when analyzing PPC data files on x86, the cpu field is always 0
    and the tid and pid are backwards. For example, analyzing a PPC file on PPC
    the pid/tid fields show:

    rsyslogd 1210/1212

    and analyzing the same PPC file using an x86 perf binary shows:

    rsyslogd 1212/1210

    The problem is that the swap_op method for samples is
    perf_event__all64_swap which assumes all elements in the sample_data
    struct are u64s. cpu, tid and pid are u32s and need to be handled
    individually. Given that the swap is done before the sample is parsed,
    the simplest solution is to undo the 64-bit swap of those elements when
    the sample is parsed and do the proper swap.

    The RAW data field is generic and perf cannot have programmatic knowledge
    of how to treat that data. Instead a warning is given to the user.

    Thanks to Anton Blanchard for providing a data file for a multi-CPU
    PPC system so I could verify the fix for the CPU fields.

    v3 -> v4:
    - fixed use of WARN_ONCE

    v2 -> v3:
    - used WARN_ONCE for message regarding raw data
    - removed struct wrapper around union
    - fixed whitespace issues

    v1 -> v2:
    - added a union for undoing the byte-swap on u64 and redoing swap on
    u32's to address compiler errors (see git commit 65014ab3)

    Cc: Anton Blanchard
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
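
    The undo-and-reswap step described above can be sketched in isolation as
    follows. This is a minimal illustration, not the code from the patch: the
    union u64_swap layout and the helpers swap32(), swap64() and
    fixup_u32_pair() are names assumed here for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/* Hypothetical view of one 64-bit sample slot holding two u32 fields. */
union u64_swap {
	uint64_t val64;
	uint32_t val32[2];
};

/*
 * 'slot' has already been through a blanket 64-bit swap. Undo that swap
 * to recover the bytes as written, then swap each 32-bit half on its own.
 */
static void fixup_u32_pair(uint64_t slot, uint32_t *first, uint32_t *second)
{
	union u64_swap u;

	u.val64 = swap64(slot);        /* back to the bytes as written */
	*first  = swap32(u.val32[0]);  /* e.g. pid */
	*second = swap32(u.val32[1]);  /* e.g. tid */
}
```

    Reading the two halves straight out of the 64-bit-swapped slot returns
    them in reversed order, which matches the 1212/1210 symptom shown in the
    commit message.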
     

03 Jun, 2011

1 commit

  • Fixes two more cases where the python binding would not load:

    . Not finding die(), which it shouldn't call anyway: it is not good to
    stop the world just because some particular perf.data file is invalid,
    so just propagate the error to the caller.

    . Not finding perf_sample_size: fix it by moving it from event.c to evsel,
    where it belongs, as most cases are moving to operate on an evsel object.

    One of the fixed problems:

    [root@emilia ~]# python
    >>> import perf
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: perf_sample_size
    >>>
    [root@emilia ~]#

    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-1hkj7b2cvgbfnoizsekjb6c9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

22 May, 2011

1 commit


11 Feb, 2011

1 commit


10 Feb, 2011

1 commit

  • Jeff Moyer reported these messages:

    Warning: ... trying to fall back to cpu-clock-ticks

    couldn't open /proc/-1/status
    couldn't open /proc/-1/maps
    [ls output]
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.008 MB perf.data (~363 samples) ]

    That led me and David Ahern to see that something was fishy in the thread
    synthesizing routines, at least for the case where the workload is started
    from 'perf record', as -1 is the default target_tid for the 'perf record --tid'
    parameter, so somehow we were trying to synthesize the PERF_RECORD_MMAP and
    PERF_RECORD_COMM events for the thread -1, a bug.

    So I investigated this and noticed that some bugs were introduced when we
    added support for recording a process and its threads using --pid. The fix
    is to stop passing target_tid to the event synthesizing routines and to
    pass the thread_map instead, which has the list of threads for a --pid or
    just the single thread for a --tid.

    Checked in the following ways:

    On an 8-way machine, run cyclictest:

    [root@emilia ~]# perf record cyclictest -a -t -n -p99 -i100 -d50
    policy: fifo: loadavg: 0.00 0.13 0.31 2/139 28798

    T: 0 (28791) P:99 I:100 C: 25072 Min: 4 Act: 5 Avg: 6 Max: 122
    T: 1 (28792) P:98 I:150 C: 16715 Min: 4 Act: 6 Avg: 5 Max: 27
    T: 2 (28793) P:97 I:200 C: 12534 Min: 4 Act: 5 Avg: 4 Max: 8
    T: 3 (28794) P:96 I:250 C: 10028 Min: 4 Act: 5 Avg: 5 Max: 96
    T: 4 (28795) P:95 I:300 C: 8357 Min: 5 Act: 6 Avg: 5 Max: 12
    T: 5 (28796) P:94 I:350 C: 7163 Min: 5 Act: 6 Avg: 5 Max: 12
    T: 6 (28797) P:93 I:400 C: 6267 Min: 4 Act: 5 Avg: 5 Max: 9
    T: 7 (28798) P:92 I:450 C: 5571 Min: 4 Act: 5 Avg: 5 Max: 9
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.108 MB perf.data (~4719 samples) ]

    [root@emilia ~]#

    This will create one extra thread per CPU:

    [root@emilia ~]# tuna -t cyclictest -CP
    thread ctxt_switches
    pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
    28825 OTHER 0 0xff 2169 671 cyclictest
    28832 FIFO 93 6 52338 1 cyclictest
    28833 FIFO 92 7 46524 1 cyclictest
    28826 FIFO 99 0 209360 1 cyclictest
    28827 FIFO 98 1 139577 1 cyclictest
    28828 FIFO 97 2 104686 0 cyclictest
    28829 FIFO 96 3 83751 1 cyclictest
    28830 FIFO 95 4 69794 1 cyclictest
    28831 FIFO 94 5 59825 1 cyclictest
    [root@emilia ~]#

    So we should expect only samples for the above 9 threads when using the
    --dump-raw-trace|-D perf report switch to look at the column with the tid:

    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    629 28825
    110 28826
    491 28827
    308 28828
    198 28829
    621 28830
    225 28831
    203 28832
    89 28833
    [root@emilia ~]#

    So it seems to work for workloads started by 'perf record'. Now for existing
    workloads, just run cyclictest first, without 'perf record':

    [root@emilia ~]# tuna -t cyclictest -CP
    thread ctxt_switches
    pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
    28859 OTHER 0 0xff 594 200 cyclictest
    28864 FIFO 95 4 16587 1 cyclictest
    28865 FIFO 94 5 14219 1 cyclictest
    28866 FIFO 93 6 12443 0 cyclictest
    28867 FIFO 92 7 11062 1 cyclictest
    28860 FIFO 99 0 49779 1 cyclictest
    28861 FIFO 98 1 33190 1 cyclictest
    28862 FIFO 97 2 24895 1 cyclictest
    28863 FIFO 96 3 19918 1 cyclictest
    [root@emilia ~]#

    and then later did:

    [root@emilia ~]# perf record --pid 28859 sleep 3
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.027 MB perf.data (~1195 samples) ]
    [root@emilia ~]#

    To collect 3 seconds worth of samples for pid 28859 and its children:

    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    15 28859
    33 28860
    19 28861
    13 28862
    13 28863
    10 28864
    11 28865
    9 28866
    255 28867
    [root@emilia ~]#

    Works. The last thing is to check whether looking at just one of those threads also works:

    [root@emilia ~]# perf record --tid 28866 sleep 3
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.006 MB perf.data (~242 samples) ]
    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    3 28866
    [root@emilia ~]#

    Works too.

    Reported-by: Jeff Moyer
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jeff Moyer
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
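
    The idea of handing the synthesizing code a list of threads, instead of a
    single target_tid that may still hold its -1 default, can be sketched as
    below. This is a hedged illustration only: struct tid_list and
    build_maps_paths() are hypothetical names, not the perf thread_map API,
    and treating every tid <= 0 as invalid is an assumption of the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_LEN 32

/* Hypothetical stand-in for a thread map: a counted list of thread ids. */
struct tid_list {
	int nr;
	const int *tids;
};

/*
 * Format the /proc/<tid>/maps path for every thread in the list.
 * Returns the number of paths written, or -1 if the list does not fit
 * or contains an invalid tid, so a stray -1 never becomes "/proc/-1/maps".
 */
static int build_maps_paths(const struct tid_list *threads,
			    char paths[][PATH_LEN], int max_paths)
{
	int i;

	if (threads->nr > max_paths)
		return -1;

	for (i = 0; i < threads->nr; i++) {
		if (threads->tids[i] <= 0)
			return -1;
		snprintf(paths[i], PATH_LEN, "/proc/%d/maps", threads->tids[i]);
	}
	return threads->nr;
}
```

    A --tid request would then be a list with one entry, and a --pid request
    a list with one entry per thread of the process.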
     

30 Jan, 2011

2 commits


24 Jan, 2011

1 commit

  • To avoid linking more stuff in the python binding I'm working on, future
    csets will make the sample type be taken from the evsel itself, but for
    that we need to first have one file per cpu and per sample_type, not a
    single perf.data file.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

09 Dec, 2010

2 commits

  • The dump code used by perf report -D is scattered all over the place.
    Move it to separate functions.

    Acked-by: Ian Munsie
    Cc: Frederic Weisbecker
    Cc: Ian Munsie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Gleixner
     
  • event__name[] is missing an entry for PERF_RECORD_FINISHED_ROUND, but we
    happily access the array from the dump code.

    Make event__name[] static and provide an accessor function, fix up all
    callers and add the missing string.

    Cc: Frederic Weisbecker
    Cc: Ian Munsie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Gleixner
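
    The pattern of hiding a name table behind an accessor can be sketched as
    follows. The enum values, the table and record_name() are hypothetical
    and chosen for the example; they are not the real PERF_RECORD_* numbers
    or the accessor added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record types; the real PERF_RECORD_* values differ. */
enum record_type {
	RECORD_MMAP = 1,
	RECORD_COMM = 3,
	RECORD_FINISHED_ROUND = 5,
	RECORD_MAX
};

/* File-local table: callers can no longer index it directly. */
static const char *record_names[RECORD_MAX] = {
	[0]                     = "TOTAL",
	[RECORD_MMAP]           = "MMAP",
	[RECORD_COMM]           = "COMM",
	[RECORD_FINISHED_ROUND] = "FINISHED_ROUND",
};

/* Bounds-checked lookup that always returns a printable string. */
const char *record_name(unsigned int id)
{
	if (id >= RECORD_MAX)
		return "INVALID";
	if (!record_names[id])
		return "UNKNOWN";
	return record_names[id];
}
```

    With the table static, an id that has no string yet yields a placeholder
    instead of an out-of-bounds read or a NULL pointer handed to printf.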
     

05 Dec, 2010

2 commits

  • So that we can use -T == --timestamp, asking for PERF_SAMPLE_TIME:

    $ perf record -aT
    $ perf report -D | grep PERF_RECORD_

    3 5951915425 0x47530 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff8138c1a2 period: 215979 cpu:3
    3 5952026879 0x47588 [0x90]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff810cb480 period: 215979 cpu:3
    3 5952059959 0x47618 [0x38]: PERF_RECORD_FORK(6853:6853):(16811:16811)
    3 5952138878 0x47650 [0x78]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff811bac35 period: 431478 cpu:3
    3 5952375068 0x476c8 [0x30]: PERF_RECORD_COMM: find:6853
    3 5952395923 0x476f8 [0x50]: PERF_RECORD_MMAP 6853/6853: [0x400000(0x25000) @ 0]: /usr/bin/find
    3 5952413756 0x47748 [0xa0]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff810d080f period: 859332 cpu:3
    3 5952419837 0x477e8 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44600000(0x21d000) @ 0]: /lib64/ld-2.5.so
    3 5952437929 0x47840 [0x48]: PERF_RECORD_MMAP 6853/6853: [0x7fff7e1c9000(0x1000) @ 0x7fff7e1c9000]: [vdso]
    3 5952570127 0x47888 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f46200000(0x218000) @ 0]: /lib64/libselinux.so.1
    3 5952623637 0x478e0 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44a00000(0x356000) @ 0]: /lib64/libc-2.5.so
    3 5952675720 0x47938 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44e00000(0x204000) @ 0]: /lib64/libdl-2.5.so
    3 5952710080 0x47990 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f45a00000(0x246000) @ 0]: /lib64/libsepol.so.1
    3 5952847802 0x479e8 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff813897f0 period: 1142536 cpu:3

    First column is the cpu and the second the timestamp.

    That way we can investigate problems in the event stream.

    If the new perf binary is run on an older kernel, it will disable this feature
    automatically.

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • At perf_session__process_event, so that we reduce the number of lines in each
    tool sample processing routine that now receives a sample_data pointer already
    parsed.

    This will also be useful in the next patch, where we'll allow sampling the
    identity fields in MMAP, FORK, EXIT, etc., so that it will be possible to see
    (cpu, timestamp) just before every event.

    Also validate callchains in perf_session__process_event, i.e. as early as
    possible, and keep a counter of the number of events discarded due to invalid
    callchains, warning the user about it if it happens.

    There is an assumption that was kept that all events have the same sample_type,
    that will be dealt with in the future, when this preexisting limitation will be
    removed.

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

04 Aug, 2010

1 commit

  • The event__process function is useful in processing /proc/<pid>/maps. All of
    the functions that are called from event__process are defined in util/event.c.
    Though it's defined in builtin-top.c, it could be reused for perf probe for
    uprobes. Hence moving it to util/event.c and exporting the function.

    LKML-Reference:
    Signed-off-by: Srikar Dronamraju
    Signed-off-by: Arnaldo Carvalho de Melo

    Srikar Dronamraju
     

05 Jun, 2010

1 commit


14 May, 2010

1 commit


11 May, 2010

1 commit

  • In cbbc79a we introduced support for multiple events by introducing a
    new "event_stat_id" struct and then made several perf_session methods
    receive a pointer to it instead of a pointer to perf_session, and kept the
    event_stats and hists rb_tree in perf_session.

    While working on the new newt based browser, I realised that it would be
    better to introduce a new class, "hists" (short for "histograms"),
    renaming the "event_stat_id" struct and the perf_session methods that
    were really "hists" methods, as they manipulate only struct hists
    members, not touching anything in the other perf_session members.

    Other optimizations, such as calculating the maximum length of a symbol
    name present in a hists instance, will be possible as we add them,
    avoiding a re-traversal just for finding that information.

    The rationale for the name "hists" to replace "event_stat_id" is that we
    may have multiple sets of hists for the same event_stat id, as, for
    instance, the 'perf diff' tool has, so event stat id is not what
    characterizes what this struct and the functions that manipulate it do.

    Cc: Eric B Munson
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

09 May, 2010

1 commit

  • In order to provide a more robust and deterministic reordering
    algorithm, we need to know when we reach a point where we have just
    done a pass through every counter buffer to read everything
    they had.

    This patch introduces a new PERF_RECORD_FINISHED_ROUND pseudo event
    that consists only of an event header and doesn't need to contain
    anything.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Tom Zanussi
    Cc: Masami Hiramatsu

    Frederic Weisbecker
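
    One way a reader could use such a round marker is sketched below. The
    commit above only introduces the pseudo event, so the flushing policy
    shown here (deliver everything up to the highest timestamp seen by the
    end of the previous round) is an assumption of the example, and
    struct reorder_queue, queue_event() and finish_round() are hypothetical
    names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 64

struct reorder_queue {
	uint64_t pending[QUEUE_MAX]; /* timestamps not yet delivered */
	size_t   nr_pending;
	uint64_t max_seen;           /* highest timestamp queued so far */
	uint64_t flush_limit;        /* max_seen as of the previous round */
};

/* Queue one event timestamp; returns 0, or -1 if the queue is full. */
static int queue_event(struct reorder_queue *q, uint64_t ts)
{
	if (q->nr_pending == QUEUE_MAX)
		return -1;
	q->pending[q->nr_pending++] = ts;
	if (ts > q->max_seen)
		q->max_seen = ts;
	return 0;
}

/*
 * Called when a round marker is read. Copies every pending timestamp that
 * is <= flush_limit into out[] in ascending order, keeps the rest queued,
 * then moves flush_limit up to max_seen. out[] must hold QUEUE_MAX entries.
 * Returns the number of timestamps delivered.
 */
static size_t finish_round(struct reorder_queue *q, uint64_t *out)
{
	size_t n = 0, kept = 0, i, j;

	for (i = 0; i < q->nr_pending; i++) {
		uint64_t ts = q->pending[i];

		if (ts > q->flush_limit) {
			q->pending[kept++] = ts; /* too recent, keep it */
			continue;
		}
		/* insertion sort into the output array */
		for (j = n; j > 0 && out[j - 1] > ts; j--)
			out[j] = out[j - 1];
		out[j] = ts;
		n++;
	}
	q->nr_pending = kept;
	q->flush_limit = q->max_seen;
	return n;
}
```

    The marker gives the reader a deterministic point at which older events
    can be sorted and released without waiting for the end of the stream.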
     

28 Apr, 2010

1 commit

  • struct kernel_info and kerninfo__ are too vague; what they really
    describe are machines, virtual ones or hosts.

    There are more changes to introduce helpers to shorten function calls
    and to make more clear what is really being done, but I left that for
    subsequent patches.

    Cc: Avi Kivity
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Zhang, Yanmin
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

19 Apr, 2010

1 commit


14 Apr, 2010

5 commits

  • Bypasses the build_id perf header code and replaces it with a
    synthesized event and processing function that accomplishes the
    same thing, used when reading/writing perf data to/from a pipe.

    Signed-off-by: Tom Zanussi
    Acked-by: Thomas Gleixner
    Cc: fweisbec@gmail.com
    Cc: rostedt@goodmis.org
    Cc: k-keiichi@bx.jp.nec.com
    Cc: acme@ghostprotocols.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • Bypasses the tracing_data perf header code and replaces it with
    a synthesized event and processing function that accomplishes
    the same thing, used when reading/writing perf data to/from a
    pipe.

    The tracing data is pretty large, and this patch doesn't attempt
    to break it down into component events. The tracing_data event
    itself doesn't actually contain the tracing data, rather it
    arranges for the event processing code to skip over it after
    it's read, using the skip return value added to the event
    processing loop in a previous patch.

    Signed-off-by: Tom Zanussi
    Acked-by: Thomas Gleixner
    Cc: fweisbec@gmail.com
    Cc: rostedt@goodmis.org
    Cc: k-keiichi@bx.jp.nec.com
    Cc: acme@ghostprotocols.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • Bypasses the event type perf header code and replaces it with a
    synthesized event and processing function that accomplishes the
    same thing, used when reading/writing perf data to/from a pipe.

    Signed-off-by: Tom Zanussi
    Acked-by: Thomas Gleixner
    Cc: fweisbec@gmail.com
    Cc: rostedt@goodmis.org
    Cc: k-keiichi@bx.jp.nec.com
    Cc: acme@ghostprotocols.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • Bypasses the attr perf header code and replaces it with a
    synthesized event and processing function that accomplishes the
    same thing, used when reading/writing perf data to/from a pipe.

    Making the attrs into events allows them to be streamed over a
    pipe along with the rest of the header data (in later patches).
    It also paves the way to allowing events to be added and removed
    from perf sessions dynamically.

    Signed-off-by: Tom Zanussi
    Acked-by: Thomas Gleixner
    Cc: fweisbec@gmail.com
    Cc: rostedt@goodmis.org
    Cc: k-keiichi@bx.jp.nec.com
    Cc: acme@ghostprotocols.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • This patch makes several changes to allow the perf event stream
    to be sent and received over a pipe:

    - adds pipe-specific versions of the header read/write code

    - adds pipe-specific version of the event processing code

    - adds a range of event types to be used for header or other
    pseudo events, above the range used by the kernel

    - checks the return value of event handlers, which they can use
    to skip over large events during event processing rather than actually
    reading them into event objects.

    - unifies the multiple do_read() functions and updates its
    users.

    Note that none of these changes affect the existing perf data
    file format or processing - this code only comes into play if
    perf output is sent to stdout (or is read from stdin).

    Signed-off-by: Tom Zanussi
    Acked-by: Thomas Gleixner
    Cc: fweisbec@gmail.com
    Cc: rostedt@goodmis.org
    Cc: k-keiichi@bx.jp.nec.com
    Cc: acme@ghostprotocols.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

08 Apr, 2010

1 commit

  • Using 'pahole --packable' I found some structs that could be reorganized
    to eliminate alignment holes, in some cases getting them to be cacheline
    multiples.

    [acme@doppio linux-2.6-tip]$ codiff perf.old ~/bin/perf
    builtin-annotate.c:
    struct perf_session | -8
    struct perf_header | -8
    2 structs changed

    builtin-diff.c:
    struct sample_data | -8
    1 struct changed
    diff__process_sample_event | -8
    1 function changed, 8 bytes removed, diff: -8

    builtin-sched.c:
    struct sched_atom | -8
    1 struct changed

    builtin-timechart.c:
    struct per_pid | -8
    1 struct changed
    cmd_timechart | -16
    1 function changed, 16 bytes removed, diff: -16

    builtin-probe.c:
    struct perf_probe_point | -8
    struct perf_probe_event | -8
    2 structs changed
    opt_add_probe_event | -3
    1 function changed, 3 bytes removed, diff: -3

    util/probe-finder.c:
    struct probe_finder | -8
    1 struct changed
    find_kprobe_trace_events | -16
    1 function changed, 16 bytes removed, diff: -16

    /home/acme/bin/perf:
    4 functions changed, 43 bytes removed, diff: -43
    [acme@doppio linux-2.6-tip]$

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
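
    The kind of alignment hole that 'pahole --packable' reports can be
    reproduced with a small example. The two structs below are hypothetical
    and unrelated to the perf structs listed in the codiff output; they only
    show how member order changes the padding the compiler inserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members on both sides of a 64-bit member force padding twice. */
struct with_holes {
	uint8_t  flag;  /* followed by padding up to addr's alignment */
	uint64_t addr;
	uint8_t  kind;  /* followed by tail padding */
};

/* Same members, largest first: the small ones share one padded tail. */
struct reordered {
	uint64_t addr;
	uint8_t  flag;
	uint8_t  kind;
};
```

    On x86-64 Linux, sizeof(struct with_holes) is 24 while
    sizeof(struct reordered) is 16, the same 8-byte saving per struct that
    the codiff output above reports.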
     

10 Mar, 2010

1 commit


14 Jan, 2010

1 commit

  • We were always looking at the running machine /proc/modules,
    even when processing a perf.data file, which only makes sense
    when we're doing 'perf record' and 'perf report' on the same
    machine, and in close succession, or if we don't use modules at
    all, right Peter? ;-)

    Now, at 'perf record' time we read /proc/modules, find the long
    path for modules, and put them as PERF_MMAP events, just like we
    did to encode the reloc reference symbol for vmlinux. Speaking of
    which, it is now encoded in .pgoff, so that we can use
    .{start,len} to store the address boundaries for the kernel so
    that when we reconstruct the kmaps tree we can do lookups right
    away, without having to fixup the end of the kernel maps like we
    did in the past (and now only in perf record).

    One more step in the 'perf archive' direction, where we'll finally
    be able to collect data on one machine and analyse it on another.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

13 Jan, 2010

2 commits

  • Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • DSOs don't have this problem because the kernel emits a
    PERF_MMAP for each new executable mapping it performs on
    monitored threads.

    To fix the kernel case we simulate the same behaviour, by having
    'perf record' synthesize a PERF_MMAP for the kernel, encoded
    like this:

    [root@doppio ~]# perf record -a -f sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.344 MB perf.data (~15038 samples) ]
    [root@doppio ~]# perf report -D | head -10

    0xd0 [0x40]: event: 1
    .
    . ... raw event: size 64 bytes
    . 0000: 01 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 ......@........
    . 0010: 00 00 00 81 ff ff ff ff 00 00 00 00 00 00 00 00 ...............
    . 0020: 00 00 00 00 00 00 00 00 5b 6b 65 72 6e 65 6c 2e ........ [kernel
    . 0030: 6b 61 6c 6c 73 79 6d 73 2e 5f 74 65 78 74 5d 00 kallsyms._text]
    . 0xd0
    [0x40]: PERF_RECORD_MMAP 0/0: [0xffffffff81000000((nil)) @ (nil)]: [kernel.kallsyms._text]

    I.e. we identify such event as having:

    .pid = 0
    .filename = [kernel.kallsyms.REFNAME]
    .start = REFNAME addr in /proc/kallsyms at 'perf record' time

    and now use a hardcoded value of '_text' for REFNAME.

    Then, later, in 'perf report', if there are any kernel hits and
    thus we need to resolve kernel symbols, we search for REFNAME
    and if its address changed, relocation happened and we thus must
    change the kernel mapping routines to one that uses .pgoff as
    the relocation to apply.

    This way we use the same mechanism used for the other DSOs and
    don't have to do two passes over all the kernel symbols.

    Reported-by: Xiao Guangrong
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: "H. Peter Anvin"
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Xiao Guangrong
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
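
    The relocation check described above comes down to comparing the address
    of the reference symbol at record time with its address at report time.
    The sketch below illustrates only that arithmetic; struct kernel_ref,
    kernel_relocated() and reloc_addr() are hypothetical names and do not
    reproduce the perf map routines or the way .pgoff is encoded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the synthesized kernel mmap record. */
struct kernel_ref {
	uint64_t recorded_addr; /* reference symbol address at record time */
};

/* Non-zero when the reference symbol moved between record and report. */
static int kernel_relocated(const struct kernel_ref *ref,
			    uint64_t current_addr)
{
	return ref->recorded_addr != current_addr;
}

/* Translate an address sampled at record time into the current layout. */
static uint64_t reloc_addr(const struct kernel_ref *ref,
			   uint64_t current_addr, uint64_t ip)
{
	return ip - ref->recorded_addr + current_addr;
}
```

    When the two addresses are equal the translation is the identity, so the
    same lookup path works whether or not the kernel was relocated.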
     

28 Dec, 2009

1 commit


18 Dec, 2009

1 commit

  • Pekka Enberg reported weird percentages in perf report. It
    turns out we are overflowing 32-bit variables in struct
    events_stats on 32-bit architectures.

    Before:

    [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10
    281.96% Xorg b710a561 [.] 0x000000b710a561
    140.15% Xorg [kernel] [k] __initramfs_end
    51.56% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46
    35.12% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd
    33.84% metacity libpthread-2.9.so [.] 0x00000000007a3d

    After:

    [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10
    30.04% Xorg b710a561 [.] 0x000000b710a561
    14.93% Xorg [kernel] [k] __initramfs_end
    5.49% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46
    3.74% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd
    3.61% metacity libpthread-2.9.so [.] 0x00000000007a3d

    Reported-by: Pekka Enberg
    Tested-by: Pekka Enberg
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
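
    Percentages above 100% are what one would expect when a running total
    wraps at 2^32 while the per-entry values it is compared against do not.
    The sketch below reproduces that effect with made-up numbers; struct
    stats32, struct stats64 and the helpers are hypothetical and are not the
    fields of struct events_stats.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical totals: one wraps at 2^32, the other does not. */
struct stats32 { uint32_t total; };
struct stats64 { uint64_t total; };

/* Accumulate a sample period into a 32-bit total (wraps silently). */
static void stats32_add(struct stats32 *s, uint64_t period)
{
	s->total = (uint32_t)(s->total + period);
}

/* Accumulate a sample period into a 64-bit total. */
static void stats64_add(struct stats64 *s, uint64_t period)
{
	s->total += period;
}

/* Share of 'hits' in 'total', as a percentage. */
static double percent(uint64_t hits, uint64_t total)
{
	return total ? 100.0 * (double)hits / (double)total : 0.0;
}
```

    Once the 32-bit total has wrapped, an entry can hold more hits than the
    total it is divided by, so its share exceeds 100%, as in the "Before"
    output.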
     

16 Dec, 2009

1 commit

  • Check build-id of vmlinux by using functions in symbol.c.
    This also exposes map__load() for getting vmlinux path,
    and removes the vmlinux path list in builtin-probe.c,
    because symbol.c already has that. Checking build-id
    prevents users from opening debuginfo that is old or
    differs from the currently running kernel.

    Signed-off-by: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Jim Keniston
    Cc: Ananth N Mavinakayanahalli
    Cc: Christoph Hellwig
    Cc: Frank Ch. Eigler
    Cc: Jason Baron
    Cc: K.Prasad
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: systemtap
    Cc: DLE
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

15 Dec, 2009

1 commit