24 Dec, 2011

1 commit

  • The default input file for perf report is not handled the same way as
    perf record does it for its output file. This leads to unexpected
    behavior of perf report, etc. E.g.:

    # perf record -a -e cpu-cycles sleep 2 | perf report | cat
    failed to open perf.data: No such file or directory (try 'perf record' first)

    While perf record writes to a fifo, perf report expects perf.data to be
    read. This patch changes this to accept fifos as input file.

    Applies to the following commands:

    perf annotate
    perf buildid-list
    perf evlist
    perf kmem
    perf lock
    perf report
    perf sched
    perf script
    perf timechart

    Also fixes char const* -> const char* type declaration for filename
    strings.

    v2:
    * Prevent potential null pointer access to input_name in
    builtin-report.c. Needed due to removal of patch "perf report: Setup
    browser if stdout is a pipe"

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1323248577-11268-5-git-send-email-robert.richter@amd.com
    Signed-off-by: Robert Richter
    Signed-off-by: Arnaldo Carvalho de Melo

    Robert Richter
     

29 Nov, 2011

1 commit

  • Since we already ask for PERF_SAMPLE_ID and use it to quickly find the
    associated evsel, add handler func + data to struct perf_evsel to avoid
    using chains of if(strcmp(event_name)) and also to avoid all the linear
    list searches via trace_event_find.

    To demonstrate the technique convert 'perf sched' to it:

    # perf sched record sleep 5m

    And then:

    Performance counter stats for '/tmp/oldperf sched lat':

    646.929438 task-clock # 0.999 CPUs utilized
    9 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    20,901 page-faults # 0.032 M/sec
    1,290,144,450 cycles # 1.994 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    1,606,158,439 instructions # 1.24 insns per cycle
    339,088,395 branches # 524.151 M/sec
    4,550,735 branch-misses # 1.34% of all branches

    0.647524759 seconds time elapsed

    Versus:

    Performance counter stats for 'perf sched lat':

    473.564691 task-clock # 0.999 CPUs utilized
    9 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    20,903 page-faults # 0.044 M/sec
    944,367,984 cycles # 1.994 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    1,442,385,571 instructions # 1.53 insns per cycle
    308,383,106 branches # 651.195 M/sec
    4,481,784 branch-misses # 1.45% of all branches

    0.474215751 seconds time elapsed

    [root@emilia ~]#

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-1kbzpl74lwi6lavpqke2u2p3@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

28 Nov, 2011

4 commits

  • To better reflect that it became the base class for all tools, that must
    be in each tool struct and where common stuff will be put.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Reducing the exposure of perf_session further, so that we can use the
    classes in cases where no perf.data file is created.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that we don't need to have that many globals.

    Next steps will remove the 'session' pointer, that in most cases is
    not needed.

    Then we can rename perf_event_ops to 'perf_tool' that better describes
    this class hierarchy.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Eventually session->sample_type will go away as we want to support
    multiple sample types per session, so use it from the evsel which is a
    step in that direction.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0vwdpjcwbjezw459lw5n3ew1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

10 Aug, 2011

2 commits

  • The 'perf sched' command usage still showing 'trace' command instead of
    the 'script' command.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110809124651.GD2056@jolsa.brq.redhat.com
    Signed-off-by: Jiri Olsa
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • The session object is released prematurely when processing events for
    latency command. The session's thread objects are used within the
    output_lat_thread function.

    Runnning following commands:

    # perf sched record
    # perf sched latency

    the latter displays incorrect data and might cause access violation.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1312837414-3819-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Jiri Olsa
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

24 Mar, 2011

1 commit

  • Resolving the sample->id to an evsel since the most advanced tools,
    report and annotate, and the others will too when they evolve to
    properly support multi-event perf.data files.

    Good also because it does an extra validation, checking that the ID is
    valid when present. When that is not the case, the overhead is just a
    branch + function call (perf_evlist__id2evsel).

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

07 Feb, 2011

1 commit

  • GCC 4.6.0 in Fedora rawhide turned up some compile errors in tools/perf
    due to the -Werror=unused-but-set-variable flag.

    I've gone through and annotated some of the assignments that had side
    effects (ie: return value from a function) with the __used annotation,
    and in some cases, just removed unused code.

    In a few cases, we were assigning something useful, but not using it in
    later parts of the function.

    kyle@dreadnought:~/src% gcc --version
    gcc (GCC) 4.6.0 20110122 (Red Hat 4.6.0-0.3)

    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Kyle McMartin
    [ committer note: Fixed up the annotation fixes, as that code moved recently ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Kyle McMartin
     

30 Jan, 2011

2 commits


23 Jan, 2011

1 commit

  • Using %L[uxd] has issues in some architectures, like on ppc64. Fix it
    by making our 64 bit integers typedefs of stdint.h types and using
    PRI[ux]64 like, for instance, git does.

    Reported by Denis Kirjanov that provided a patch for one case, I went
    and changed all cases.

    Reported-by: Denis Kirjanov
    Tested-by: Denis Kirjanov
    LKML-Reference:
    Cc: Denis Kirjanov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Pingtian Han
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Jan, 2011

1 commit


11 Jan, 2011

1 commit

  • on ppc64:
    /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 131072

    therefore following set of commands:

    gives:
    perf.2.6.37test: builtin-sched.c:493: create_tasks: Assertion `!(err)' failed.

    So make sure we do not set stack size lower than PTHREAD_STACK_MIN.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Jiri Pirko
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Pirko
     

10 Jan, 2011

1 commit


22 Dec, 2010

1 commit

  • If we are running the new perf on an old kernel without support for
    sample_id_all, we should fall back to the old unordered processing of
    events. If we didn't than we would *always* process events without
    timestamps out of order, whether or not we hit a reordering race. In
    other words, instead of there being a chance of not attributing samples
    correctly, we would guarantee that samples would not be attributed.

    While processing all events without timestamps before events with
    timestamps may seem like an intuitive solution, it falls down as
    PERF_RECORD_EXIT events would also be processed before any samples.
    Even with a workaround for that case, samples before/after an exec would
    not be attributed correctly.

    This patch allows commands to indicate whether they need to fall back to
    unordered processing, so that commands that do not care about timestamps
    on every event will not be affected. If we do fallback, this will print
    out a warning if report -D was invoked.

    This patch adds the test in perf_session__new so that we only need to
    test once per session. Commands that do not use an event_ops (such as
    record and top) can simply pass NULL in it's place.

    Acked-by: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ian Munsie
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Munsie
     

06 Dec, 2010

1 commit

  • There were a few stray calloc()'s and malloc()'s which were not having
    their return values checked for success.

    As the calling code either already coped with failure or didn't actually
    care we just return -ENOMEM at that point.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Signed-off-by: Chris Samuel
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Chris Samuel
     

05 Dec, 2010

1 commit

  • At perf_session__process_event, so that we reduce the number of lines in eache
    tool sample processing routine that now receives a sample_data pointer already
    parsed.

    This will also be useful in the next patch, where we'll allow sample the
    identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu,
    timestamp) just after before every event.

    Also validate callchains in perf_session__process_event, i.e. as early as
    possible, and keep a counter of the number of events discarded due to invalid
    callchains, warning the user about it if it happens.

    There is an assumption that was kept that all events have the same sample_type,
    that will be dealt with in the future, when this preexisting limitation will be
    removed.

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

17 Nov, 2010

1 commit


01 Jun, 2010

1 commit

  • perf sched uses event__process_comm(), which means it can resolve
    comms from:

    - tasks that have exec'ed (kernel comm events)
    - tasks that were running when perf record started the actual
    recording (synthetized comm events)

    But perf sched can't resolve the pids of tasks that were created
    after the recording started.

    To solve this, we need to inherit the comms on fork events using
    event__process_task().

    This fixes various unresolved pids in perf sched, easily visible
    with:
    perf sched record perf bench sched messaging

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Tom Zanussi
    Cc: Stephane Eranian

    Frederic Weisbecker
     

18 May, 2010

2 commits

  • OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
    tools is to set something that has an enum type, that is builtin
    compatible with unsigned int.

    Several string constifications were done to make OPT_STRING require a
    const char * type.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To avoid problems like the one fixed by Stephane Eranian in 3de29ca, now
    we'll got this instead:

    bench/sched-messaging.c:259: error: negative width in bit-field ‘’
    bench/sched-messaging.c:261: error: negative width in bit-field ‘’

    Which is rather cryptic, but is how BUILD_BUG_ON_ZERO works, so kernel
    hackers should be already used to this.

    With it in place found some problems, fixed by changing the affected
    variables to sensible types or changed some OPT_INTEGER to OPT_UINTEGER.

    Next csets will go thru converting each of the remaining OPT_ so that
    review can be made easier by grouping changes per type per patch.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 May, 2010

1 commit

  • The events_stats.total field is too generic, rename it to .total_period,
    and also add a comment explaining that it is the sum of all the .period
    fields in samples, that is needed because we use auto-freq to avoid
    sampling artifacts.

    Ditto for events_stats.lost, that is the sum of all lost_event.lost
    fields, i.e. the number of events the kernel dropped.

    Looking at the users, builtin-sched.c can make use of these fields and
    stop doing it again.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

03 May, 2010

1 commit

  • Currently, perf 'live mode' writes build-ids at the end of the
    session, which isn't actually useful for processing live mode events.

    What would be better would be to have the build-ids sent before any of
    the samples that reference them, which can be done by processing the
    event stream and retrieving the build-ids on the first hit. Doing
    that in perf-record itself, however, is off-limits.

    This patch introduces perf-inject, which does the same job while
    leaving perf-record untouched. Normal mode perf still records the
    build-ids at the end of the session as it should, but for live mode,
    perf-inject can be injected in between the record and report steps
    e.g.:

    perf record -o - ./hackbench 10 | perf inject -v -b | perf report -v -i -

    perf-inject reads a perf-record event stream and repipes it to stdout.
    At any point the processing code can inject other events into the
    event stream - in this case build-ids (-b option) are read and
    injected as needed into the event stream.

    Build-ids are just the first user of perf-inject - potentially
    anything that needs userspace processing to augment the trace stream
    with additional information could make use of this facility.

    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Frédéric Weisbecker
    LKML-Reference:
    Signed-off-by: Tom Zanussi
    Signed-off-by: Arnaldo Carvalho de Melo

    Tom Zanussi
     

24 Apr, 2010

1 commit

  • Use the new generic sample events reordering from perf sched,
    this drops the need of multiplexing the buffers on record time,
    improving the scalability of perf sched.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Tom Zanussi

    Frederic Weisbecker
     

14 Apr, 2010

1 commit

  • Parsing an option from the command line with OPT_BOOLEAN on a
    bool data type would not work on a big-endian machine due to the
    manner in which the boolean was being cast into an int and
    incremented. For example, running 'perf probe --list' on a
    PowerPC machine would fail to properly set the list_events bool
    and would therefore print out the usage information and
    terminate.

    This patch makes OPT_BOOLEAN work as expected with a bool
    datatype. For cases where the original OPT_BOOLEAN was
    intentionally being used to increment an int each time it was
    passed in on the command line, this patch introduces OPT_INCR
    with the old behaviour of OPT_BOOLEAN (the verbose variable is
    currently the only such example of this).

    I have reviewed every use of OPT_BOOLEAN to verify that a true
    C99 bool was passed. Where integers were used, I verified that
    they were only being used for boolean logic and changed them to
    bools to ensure that they would not be mistakenly used as ints.
    The major exception was the verbose variable which now uses
    OPT_INCR instead of OPT_BOOLEAN.

    Signed-off-by: Ian Munsie
    Acked-by: David S. Miller
    Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
    Cc: Git development list
    Cc: Ian Munsie
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: KOSAKI Motohiro
    Cc: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Frederic Weisbecker
    Cc: Eric B Munson
    Cc: Valdis.Kletnieks@vt.edu
    Cc: WANG Cong
    Cc: Thiago Farina
    Cc: Masami Hiramatsu
    Cc: Xiao Guangrong
    Cc: Jaswinder Singh Rajput
    Cc: Arjan van de Ven
    Cc: OGAWA Hirofumi
    Cc: Mike Galbraith
    Cc: Tom Zanussi
    Cc: Anton Blanchard
    Cc: John Kacur
    Cc: Li Zefan
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ian Munsie
     

08 Apr, 2010

1 commit

  • Using 'pahole --packable' I found some structs that could be reorganized
    to eliminate alignment holes, in some cases getting them to be cacheline
    multiples.

    [acme@doppio linux-2.6-tip]$ codiff perf.old ~/bin/perf
    builtin-annotate.c:
    struct perf_session | -8
    struct perf_header | -8
    2 structs changed

    builtin-diff.c:
    struct sample_data | -8
    1 struct changed
    diff__process_sample_event | -8
    1 function changed, 8 bytes removed, diff: -8

    builtin-sched.c:
    struct sched_atom | -8
    1 struct changed

    builtin-timechart.c:
    struct per_pid | -8
    1 struct changed
    cmd_timechart | -16
    1 function changed, 16 bytes removed, diff: -16

    builtin-probe.c:
    struct perf_probe_point | -8
    struct perf_probe_event | -8
    2 structs changed
    opt_add_probe_event | -3
    1 function changed, 3 bytes removed, diff: -3

    util/probe-finder.c:
    struct probe_finder | -8
    1 struct changed
    find_kprobe_trace_events | -16
    1 function changed, 16 bytes removed, diff: -16

    /home/acme/bin/perf:
    4 functions changed, 43 bytes removed, diff: -43
    [acme@doppio linux-2.6-tip]$

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

16 Jan, 2010

1 commit

  • Since they can come from another architecture with bigger
    pointers, i.e. processing a 64-bit perf.data on a 32-bit arch.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

28 Dec, 2009

3 commits


16 Dec, 2009

1 commit


15 Dec, 2009

1 commit


14 Dec, 2009

6 commits

  • As we'll need to sort multiple times for multiple perf sessions,
    so that we can then do a diff.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • There is still some more work to do to disentangle map creation
    from DSO loading, but this happens only for the kernel, and for
    the early adopters of perf diff, where this disentanglement
    matters most, we'll be testing different kernels, so no problem
    here.

    Further clarification: right now we create the kernel maps for
    the various modules and discontiguous kernel text maps when
    loading the DSO, we should do it as a two step process, first
    creating the maps, for multiple mappings with the same DSO
    store, then doing the dso load just once, for the first hit on
    one of the maps sharing this DSO backing store.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • So that we can process two perf.data files.

    We still need to add a O_MMAP mode for perf_session so that we
    can do all the mmap stuff in it.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • By having the cwd/cwdlen in the perf_session struct and
    full_paths in perf_event_ops.

    Now its just a matter of passing the ops.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • No need for all tools to register it and then immediately call
    perf_session__process_events.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Pass the event_ops to perf_session__process_events instead.

    Also move the event_ops definition to session.h, starting to
    move things around to their right place, trimming the many
    unneeded headers we have.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo