11 Mar, 2010

1 commit

  • At present, the perf subcommands that do system-wide monitoring
    (perf stat, perf record and perf top) don't work properly unless
    the online cpus are numbered 0, 1, ..., N-1. These tools ask
    for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
    and then try to create events for cpus 0, 1, ..., N-1.

    This creates problems for systems where the online cpus are
    numbered sparsely. For example, a POWER6 system in
    single-threaded mode (i.e. only running 1 hardware thread per
    core) will have only even-numbered cpus online.

    This fixes the problem by reading the /sys/devices/system/cpu/online
    file to find out which cpus are online. The code that does that is in
    tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
    function that sets up a cpumap[] array and returns the number of
    online cpus. If /sys/devices/system/cpu/online can't be read or
    can't be parsed successfully, it falls back to using sysconf to
    ask how many cpus are online and sets up an identity map in cpumap[].

    The perf record, perf stat and perf top code then calls
    read_cpu_map() in the system-wide monitoring case (instead of
    sysconf) and uses cpumap[] to get the cpu numbers to pass to
    perf_event_open.

    Signed-off-by: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
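A minimal C sketch of the fallback logic described above. It is not the code in tools/perf/util/cpumap.c. The names parse_cpu_list() and read_online_cpus() are hypothetical, and the caller supplies the map array instead of a global cpumap[]. It also assumes the sysfs file holds a single line in the kernel's cpu-list format, i.e. comma-separated cpu numbers and ranges such as "0,2,4-7".

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a cpu list such as "0,2,4-7\n" into map[].
 * Returns the number of cpus stored, or -1 on a parse error,
 * an empty list, or more than max entries. */
static int parse_cpu_list(const char *s, int *map, int max)
{
	int n = 0;

	while (*s && *s != '\n') {
		char *end;
		long first = strtol(s, &end, 10);
		long last = first;

		if (end == s || first < 0)
			return -1;
		s = end;
		if (*s == '-') {		/* a range "a-b" */
			const char *p = s + 1;

			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
			s = end;
		}
		for (long cpu = first; cpu <= last; cpu++) {
			if (n >= max)
				return -1;
			map[n++] = (int)cpu;
		}
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;		/* unexpected character */
	}
	return n > 0 ? n : -1;
}

/* Fill map[] from /sys/devices/system/cpu/online; if that can't be read
 * or parsed, fall back to an identity map of sysconf()'s online count. */
static int read_online_cpus(int *map, int max)
{
	char buf[4096];
	FILE *f = fopen("/sys/devices/system/cpu/online", "r");
	int n = -1;
	long nr;

	if (f) {
		if (fgets(buf, sizeof(buf), f))
			n = parse_cpu_list(buf, map, max);
		fclose(f);
	}
	if (n > 0)
		return n;

	nr = sysconf(_SC_NPROCESSORS_ONLN);
	if (nr < 1)
		nr = 1;
	if (nr > max)
		nr = max;
	for (long i = 0; i < nr; i++)
		map[i] = (int)i;		/* identity map 0..N-1 */
	return (int)nr;
}
```

For the POWER6 single-threaded case above, an online list such as "0,2,4,6" yields the cpu numbers 0, 2, 4 and 6 rather than 0-3. Each map[i] is the value to pass as the cpu argument of perf_event_open().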
     

13 Jan, 2010

1 commit


15 Nov, 2009

1 commit

    The ratio between the number of events and the elapsed time makes
    sense only if the task-clock event is counted. Otherwise it is
    simply a (confusing)

    # 0.000 M/sec

    This patch outputs the ratio only if the task-clock event is counted.
    Some test examples of before and after:

    Before:

    [lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1

    Performance counter stats for 'sleep 1':

    1367818 branch-misses # 0.000 M/sec

    1.001494325 seconds time elapsed

    After (without task-clock):

    [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1

    Performance counter stats for 'sleep 1':

    1135044 branch-misses

    1.001370775 seconds time elapsed

    After (with task-clock):

    [lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1

    Performance counter stats for 'sleep 1':

    1070111 branch-misses # 0.534 M/sec
    2002.730893 task-clock-msecs # 1.999 CPUs

    1.001640292 seconds time elapsed

    Signed-off-by: Lucas De Marchi
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lucas De Marchi
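The rule above can be sketched in C as follows. The helper names rate_mpsec() and print_counter() are hypothetical, and a task-clock time of 0 stands in for "task-clock was not counted". With the numbers from the "with task-clock" example, 1070111 events over 2002.730893 msecs gives 0.534 M/sec.

```c
#include <stdio.h>

/* Millions of events per second of task-clock time. Returns 0 (and
 * leaves *mpsec untouched) when no task-clock value was counted. */
static int rate_mpsec(unsigned long long count, double task_clock_msecs,
		      double *mpsec)
{
	if (task_clock_msecs <= 0.0)
		return 0;
	/* events per msec / 1000 == events per usec == millions per sec */
	*mpsec = (double)count / task_clock_msecs / 1000.0;
	return 1;
}

/* Print a counter line; the "# x.xxx M/sec" column only when meaningful. */
static void print_counter(FILE *out, unsigned long long count,
			  const char *name, double task_clock_msecs)
{
	double mpsec;

	fprintf(out, "%14llu  %-20s", count, name);
	if (rate_mpsec(count, task_clock_msecs, &mpsec))
		fprintf(out, " # %10.3f M/sec", mpsec);
	fputc('\n', out);
}
```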
     

19 Oct, 2009

4 commits

  • Count branches first, cache-misses second. The reason is that
    on x86 branches are not counted by all counters on all CPUs.

    Before:

    Performance counter stats for 'ls':

    0.756653 task-clock-msecs # 0.802 CPUs
    0 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    250 page-faults # 0.330 M/sec
    2375725 cycles # 3139.781 M/sec
    1628129 instructions # 0.685 IPC
    19643 cache-references # 25.960 M/sec
    4608 cache-misses # 6.090 M/sec
    342532 branches # 452.694 M/sec
    branch-misses

    0.000943356 seconds time elapsed

    After:

    Performance counter stats for 'ls':

    1.056734 task-clock-msecs # 0.859 CPUs
    0 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    259 page-faults # 0.245 M/sec
    3345932 cycles # 3166.295 M/sec
    3074090 instructions # 0.919 IPC
    616928 branches # 583.806 M/sec
    39279 branch-misses # 6.367 %
    21312 cache-references # 20.168 M/sec
    3661 cache-misses # 3.464 M/sec

    0.001230551 seconds time elapsed

    (also prettify the printout of branch misses, in case it's
    getting scaled.)

    Cc: Tim Blechmann
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    ---
    tools/perf/builtin-stat.c | 2 ++
    1 files changed, 2 insertions(+), 0 deletions(-)

    diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
    index c373683..95a55ea 100644
    --- a/tools/perf/builtin-stat.c
    +++ b/tools/perf/builtin-stat.c
    @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

    };
    ---
    tools/perf/builtin-stat.c | 20 ++++++++++----------
    1 files changed, 10 insertions(+), 10 deletions(-)

    diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
    index 95a55ea..90e0a26 100644
    --- a/tools/perf/builtin-stat.c
    +++ b/tools/perf/builtin-stat.c
    @@ -50,17 +50,17 @@

    static struct perf_event_attr default_attrs[] = {

    - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
    - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
    - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
    - { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
    -
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
    - { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
    + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
    + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
    + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
    + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
    +
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

    };

    Ingo Molnar
     
  • Clean up the array definition to be vertically aligned.

    No functional effects.

    Cc: Tim Blechmann
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    ---
    tools/perf/builtin-stat.c | 2 ++
    1 files changed, 2 insertions(+), 0 deletions(-)

    diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
    index c373683..95a55ea 100644
    --- a/tools/perf/builtin-stat.c
    +++ b/tools/perf/builtin-stat.c
    @@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
    { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
    + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

    };

    Ingo Molnar
     
  • Adds performance event information about branches
    and branch misses to the default output of perf stat.

    Signed-off-by: Tim Blechmann
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tim Blechmann
     
  • When we count both branches and branch-misses it is useful to
    print out the percentage of branch-misses:

    # perf stat -e branches -e branch-misses /bin/true

    Performance counter stats for '/bin/true':

    401684 branches # 0.000 M/sec
    23301 branch-misses # 5.801 %

    Signed-off-by: Anton Blanchard
    Cc: paulus@samba.org
    Cc: a.p.zijlstra@chello.nl
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Anton Blanchard
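The percentage in the last example is simply branch-misses divided by branches. Here is a hypothetical helper (the names are illustrative, not perf's) that prints the percentage only when a branch count is available:

```c
#include <stdio.h>

/* Percentage of branches that were mispredicted. Returns 0 when no
 * branch count is available, so no "%" column should be printed. */
static int branch_miss_pct(unsigned long long misses,
			   unsigned long long branches, double *pct)
{
	if (branches == 0)
		return 0;
	*pct = 100.0 * (double)misses / (double)branches;
	return 1;
}

static void print_branch_misses(FILE *out, unsigned long long misses,
				unsigned long long branches)
{
	double pct;

	fprintf(out, "%14llu  %-20s", misses, "branch-misses");
	if (branch_miss_pct(misses, branches, &pct))
		fprintf(out, " # %10.3f %%", pct);
	fputc('\n', out);
}
```

For the example above, 23301 / 401684 gives 5.801 %. For the "After" run in the first entry of this date, 39279 / 616928 gives 6.367 %.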
     

05 Oct, 2009

1 commit


22 Sep, 2009

1 commit

  • Before:

    0 sched:sched_switch # nan M/sec

    After:

    0 sched:sched_switch # 0.000 M/sec

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out of its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring and analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names (in an ABI-compatible fashion).

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian, who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility are not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

05 Sep, 2009

1 commit


04 Sep, 2009

4 commits

    Use the more advanced single-pass variance algorithm outlined
    on the Wikipedia page. This is numerically more stable for
    larger sample sets.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When we're computing the mean by sampling the distribution,
    then the std dev of the mean is related to the std dev of the
    sample set by:

    stddev_mean = std_dev / sqrt(N)

    Which is exactly what we want.

    This results in the error on the mean decreasing with
    increasing number of samples.

    Also fix the scaled == -1, aka not counted case.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    Since we don't need all the individual samples to calculate the
    error, remove both the limit and the storage overhead associated
    with them.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The current noise computation does:

    \Sum abs(n_i - avg(n)) * N^-1.5

    Which is (afaik) not a regular noise function, and needs the
    complete sample set available to post-process.

    Change this to use a regular stddev computation, which can be
    done by keeping two sums:

    stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )

    For which we only need to keep \Sum n_i and \Sum n_i^2.

    Signed-off-by: Peter Zijlstra
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Aug, 2009

1 commit

  • Librarize trace_event() helper so that perf trace can use it
    too. Also clean up the debug.h includes a bit.

    It's not good to have it included in perf.h, because that makes it
    inflexible with respect to other headers it may need (headers
    that can also depend on perf.h and thus create a recursive
    header dependency).

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

12 Aug, 2009

1 commit

    Factor the multiple definitions of the high-level dso helpers out
    into the symbol source file.

    The side effect is a general export of the verbose and eprintf
    debugging helpers into a new file dedicated to debugging purposes.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Brice Goglin

    Frederic Weisbecker
     

09 Aug, 2009

1 commit


23 Jul, 2009

1 commit

  • perf stat and perf record currently look for all options on the command
    line. This can lead to some confusion:

    # perf stat ls -l
    Error: unknown switch `l'

    While we can work around this by adding '--' before the command, the git
    option parsing code can instead stop at the first non-option argument:

    # perf stat ls -l
    Performance counter stats for 'ls -l':
    ....

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    Anton Blanchard
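perf uses its copy of git's option parser. The sketch below swaps in POSIX getopt() to show the same stop-at-the-first-non-option behavior. The leading '+' in the option string is a GNU getopt extension; POSIX getopt() already stops at the first non-option. The 'v' and 'e' options and the command_index() name are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Return the argv index of the command to run, or -1 on a bad switch.
 * Scanning stops at the first non-option, so in "perf stat ls -l"
 * the "-l" is left for ls. */
static int command_index(int argc, char *argv[])
{
	int c;

	optind = 1;			/* restart scanning from argv[1] */
	while ((c = getopt(argc, argv, "+ve:")) != -1) {
		switch (c) {
		case 'v':		/* e.g. verbose */
			break;
		case 'e':		/* e.g. an event name, in optarg */
			break;
		default:		/* getopt() already printed an error */
			return -1;
		}
	}
	return optind;
}
```

With argv = { "perf-stat", "-v", "ls", "-l" }, command_index() returns 2, so argv + 2 ("ls", "-l") is passed to the command unchanged.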
     

02 Jul, 2009

1 commit

  • Building builtin-stat.c reports the following errors:

    cc1: warnings being treated as errors
    builtin-stat.c: In function ‘run_perf_stat’:
    builtin-stat.c:242: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
    builtin-stat.c:255: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
    make: *** [builtin-stat.o] Error 1

    This patch handles the possible pipe read failures.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
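Here is one way to handle the pipe reads the compiler warns about. The helper name wait_on_pipe() is hypothetical. It retries when a signal interrupts the read, and it treats EOF or any other error as a failure that the caller must act on.

```c
#include <errno.h>
#include <unistd.h>

/* Wait for one byte on a synchronization pipe. Returns 0 when the byte
 * arrived, -1 on EOF (writer closed the pipe) or a read error. */
static int wait_on_pipe(int fd)
{
	char c;
	ssize_t ret;

	do {
		ret = read(fd, &c, 1);
	} while (ret < 0 && errno == EINTR);	/* interrupted: try again */

	return ret == 1 ? 0 : -1;
}
```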
     

01 Jul, 2009

2 commits


30 Jun, 2009

3 commits

  • This provides a way to mark a counter to be enabled on the next
    exec. This is useful for measuring the total activity of a
    program without including overhead from the process that
    launches it.

    This also changes the perf stat command to use this new
    facility.

    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
    Vince Weaver reported a 'perf stat' measurement overhead in the
    count of retired instructions, which can inflate the reported
    count by around 6000 instructions.

    At present, perf stat creates its counters on the perf process. Thus
    the counters count the fork and various other activity in both the
    parent and child, such as the resolver overhead for resolving PLT
    entries for any libc functions that haven't been called before, such
    as execvp.

    This reduces the overhead by creating the counters on the child process
    after the fork, using a couple of pipes to synchronize so that the
    child process waits until the parent has created the counters before
    doing the exec. To eliminate the PLT resolution overhead on calling
    execvp, this does a dummy execvp first which will always fail.

    With this, the overhead of executing a program goes down from over
    4800 instructions to about 90 instructions on powerpc (32-bit).
    This was measured with a statically-linked program written in
    assembler which only does the 3 instructions needed to call _exit(0).

    Before:

    $ perf stat -e 0:1:u ./three

    Performance counter stats for './three':

    4858 instructions

    0.001274523 seconds time elapsed

    After:

    $ perf stat -e 0:1:u ./three

    Performance counter stats for './three':

    92 instructions

    0.000468153 seconds time elapsed

    Reported-by: Vince Weaver
    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Peter expressed a strong preference for percentage based
    display of scaled values - so revert to that from the
    recently introduced multiplication-factor unit.

    Reported-by: Peter Zijlstra
    Cc: Jaswinder Singh Rajput
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
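The fork-and-pipes handshake from the second entry above can be sketched as follows. run_synchronized() and the setup_fn hook are hypothetical. The hook marks where perf would open its counters on the child's pid. With the first entry's change, such counters could also be created disabled and set to turn on at exec via the enable_on_exec bit of struct perf_event_attr. The dummy execvp("", ...) is the always-failing call described in the text.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical hook: where the counters would be opened on the child. */
typedef int (*setup_fn)(pid_t pid);

/* Fork, park the child until 'setup' has run, then exec argv.
 * Returns the child's exit status, or -1 on any failure. */
static int run_synchronized(char *const argv[], setup_fn setup)
{
	int child_ready[2], go[2];
	int status, ok = 1;
	char c = 0;
	pid_t pid;

	if (pipe(child_ready) < 0)
		return -1;
	if (pipe(go) < 0) {
		close(child_ready[0]);
		close(child_ready[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(child_ready[0]);
		close(child_ready[1]);
		close(go[0]);
		close(go[1]);
		return -1;
	}
	if (pid == 0) {
		close(child_ready[0]);
		close(go[1]);
		/* Dummy exec that always fails: resolves execvp's PLT entry
		 * now, before any counters exist. */
		execvp("", argv);
		close(child_ready[1]);		/* signal "ready" via EOF */
		if (read(go[0], &c, 1) != 1)	/* wait for the go byte */
			_exit(127);
		close(go[0]);
		execvp(argv[0], argv);
		_exit(127);			/* real exec failed */
	}

	close(child_ready[1]);
	close(go[0]);
	/* read() returns 0 (EOF) once the child has closed its end. */
	if (read(child_ready[0], &c, 1) < 0)
		ok = 0;
	if (ok && setup && setup(pid) < 0)
		ok = 0;
	if (ok && write(go[1], &c, 1) != 1)
		ok = 0;
	close(child_ready[0]);
	close(go[1]);	/* without a go byte the child sees EOF and exits */

	if (waitpid(pid, &status, 0) < 0 || !ok)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing NULL as the hook just runs the command. A real hook would run while the child is blocked in read(), so nothing after the dummy exec is counted before the counters exist.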
     

28 Jun, 2009

2 commits

    Set attrs and nr_counters if no event is selected and !null_run.

    How much of attrs is set should depend on the number of counters,
    so we need to memcpy only sizeof(default_attrs) bytes.

    Also set nr_counters to ARRAY_SIZE(default_attrs) instead of a
    hardcoded value.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
     
    Increase the size for the event name to handle longer names such as
    'L1-d$-prefetch-misses'.

    Changed scaled counters from percentage to a multiplicative
    factor because the latter is more expressive.

    Also aligned the scaling factor, otherwise sometimes it looks
    like:

    384 iTLB-load-misses (4.74x scaled)
    452029 branch-loads (8.00x scaled)
    5892 branch-load-misses (20.39x scaled)
    972315 iTLB-loads (3.24x scaled)

    Before:
    150708 L1-d$-stores (scaled from 23.57%)
    428804 L1-d$-prefetches (scaled from 23.47%)
    314446 L1-d$-prefetch-misses (scaled from 23.42%)
    252626137 L1-i$-loads (scaled from 23.24%)
    5297550 dTLB-load-misses (scaled from 23.96%)
    106992392 branch-loads (scaled from 23.67%)
    5239561 branch-load-misses (scaled from 23.43%)

    After:
    1731713 L1-d$-loads ( 14.25x scaled)
    44241 L1-d$-prefetches ( 3.88x scaled)
    21076 L1-d$-prefetch-misses ( 3.40x scaled)
    5789421 L1-i$-loads ( 3.78x scaled)
    29645 dTLB-load-misses ( 2.95x scaled)
    461474 branch-loads ( 6.52x scaled)
    7493 branch-load-misses ( 26.57x scaled)

    Reported-by: Ingo Molnar
    Signed-off-by: Jaswinder Singh Rajput
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
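The -vv debug output later in this log suggests that each counter reports a raw value, the time it was enabled, and the time it was actually running. Assuming that, both display styles discussed here come from the same two times. The sketch below is hypothetical, and its names are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Result of extrapolating a multiplexed counter to its full enabled time. */
struct scaled_count {
	uint64_t value;	/* raw * enabled / running */
	double pct;	/* share of the enabled time the counter ran, in % */
	double factor;	/* enabled / running, e.g. 4.00x */
	int scaled;	/* 0: ran all the time, 1: extrapolated, -1: never ran */
};

static struct scaled_count scale_count(uint64_t raw, uint64_t enabled,
				       uint64_t running)
{
	struct scaled_count r = { raw, 100.0, 1.0, 0 };

	if (running == 0) {		/* never scheduled: not counted */
		r.value = 0;
		r.pct = 0.0;
		r.factor = 0.0;
		r.scaled = -1;
	} else if (running < enabled) {
		r.value = (uint64_t)((double)raw * (double)enabled / (double)running);
		r.pct = 100.0 * (double)running / (double)enabled;
		r.factor = (double)enabled / (double)running;
		r.scaled = 1;
	}
	return r;
}

/* Show both styles: "scaled from N%" and "N.NNx scaled". */
static void print_scaled(FILE *out, const char *name, struct scaled_count r)
{
	if (r.scaled < 0) {
		fprintf(out, "%14s  %s\n", "<not counted>", name);
		return;
	}
	fprintf(out, "%14llu  %-24s", (unsigned long long)r.value, name);
	if (r.scaled)
		fprintf(out, " (scaled from %.2f%%) (%6.2fx scaled)", r.pct, r.factor);
	fputc('\n', out);
}
```

A counter that ran for 25% of its enabled time is shown as "scaled from 25.00%" in one style and as "4.00x scaled" in the other. The factor is simply 100 divided by the percentage.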
     

27 Jun, 2009

2 commits

  • In multi-run (-r/--repeat) printouts, print out the noise of
    the wall-clock average as well.

    Also, fix a bug in printing out scaled counters: if it was not
    scaled then we should not update the average with -1.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Allow a no-counters run. This can be useful to measure just
    elapsed wall-clock time - or to assess the raw overhead of perf
    stat itself, without running any counters.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

24 Jun, 2009

2 commits

  • Remove dead code and do some code alignment.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
     
    Error messages should go to stderr in verbose (-v) mode; otherwise
    they are lost with:

    $ ./perf stat -v > /dev/null

    For example, on AMD the bus-cycles event is not available, so now
    the output looks like:

    $ ./perf stat -v -e bus-cycles ls > /dev/null
    Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)

    Performance counter stats for 'ls':

    bus-cycles

    0.006765877 seconds time elapsed.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
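Here is a minimal sketch of the stderr fix. The helper report_open_error() is hypothetical; it takes the output stream and the saved errno value. The message text is copied from the example above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Report a failed counter open on the given stream (stderr in practice),
 * so the message survives "perf stat -v ... > /dev/null". Callers pass
 * the errno value saved right after the failed syscall. */
static void report_open_error(FILE *err, int counter, int errnum)
{
	fprintf(err, "Error: counter %d, sys_perf_counter_open() syscall "
		"returned with -1 (%s)\n", counter, strerror(errnum));
}
```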
     

20 Jun, 2009

1 commit

  • On 64-bit powerpc, __u64 is defined to be unsigned long rather than
    unsigned long long. This causes compiler warnings every time we
    print a __u64 value with %Lx.

    Rather than changing __u64, we define our own u64 to be unsigned long
    long on all architectures, and similarly s64 as signed long long.
    For consistency we also define u32, s32, u16, s16, u8 and s8. These
    definitions are put in a new header, types.h, because these definitions
    are needed in util/string.h and util/symbol.h.

    The main change here is the mechanical change of __[us]{64,32,16,8}
    to remove the "__". The other changes are:

    * Create types.h
    * Include types.h in perf.h, util/string.h and util/symbol.h
    * Add types.h to the LIB_H definition in Makefile
    * Add (u64) casts in process_overflow_event() and print_sym_table()
    to silence two remaining warnings.

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
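A sketch of what such a types.h could contain, following the list of types given above. The include-guard name and the print_u64() helper are hypothetical. Because u64 is unsigned long long on every architecture, one format string works everywhere.

```c
#include <stdio.h>

#ifndef PERF_TYPES_SKETCH_H
#define PERF_TYPES_SKETCH_H

typedef unsigned long long	u64;
typedef signed long long	s64;
typedef unsigned int		u32;
typedef signed int		s32;
typedef unsigned short		u16;
typedef signed short		s16;
typedef unsigned char		u8;
typedef signed char		s8;

#endif /* PERF_TYPES_SKETCH_H */

static void print_u64(FILE *out, u64 v)
{
	fprintf(out, "%llx\n", v);	/* no cast: u64 is unsigned long long */
}
```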
     

13 Jun, 2009

3 commits

  • If -vv (very verbose) is specified, print out raw data
    in the following format:

    $ perf stat -vv -r 3 ./loop_1b_instructions

    [ perf stat: executing run #1 ... ]
    [ perf stat: executing run #2 ... ]
    [ perf stat: executing run #3 ... ]

    debug: runtime[0]: 235871872
    debug: walltime[0]: 236646752
    debug: runtime_cycles[0]: 755150182
    debug: counter/0[0]: 235871872
    debug: counter/1[0]: 235871872
    debug: counter/2[0]: 235871872
    debug: scaled[0]: 0
    debug: counter/0[1]: 2
    debug: counter/1[1]: 235870662
    debug: counter/2[1]: 235870662
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235870437
    debug: counter/2[2]: 235870437
    debug: scaled[2]: 0
    debug: counter/0[3]: 140
    debug: counter/1[3]: 235870298
    debug: counter/2[3]: 235870298
    debug: scaled[3]: 0
    debug: counter/0[4]: 755150182
    debug: counter/1[4]: 235870145
    debug: counter/2[4]: 235870145
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001411258
    debug: counter/1[5]: 235868838
    debug: counter/2[5]: 235868838
    debug: scaled[5]: 0
    debug: counter/0[6]: 27897
    debug: counter/1[6]: 235868560
    debug: counter/2[6]: 235868560
    debug: scaled[6]: 0
    debug: counter/0[7]: 2910
    debug: counter/1[7]: 235868151
    debug: counter/2[7]: 235868151
    debug: scaled[7]: 0
    debug: runtime[0]: 235980257
    debug: walltime[0]: 236770942
    debug: runtime_cycles[0]: 755114546
    debug: counter/0[0]: 235980257
    debug: counter/1[0]: 235980257
    debug: counter/2[0]: 235980257
    debug: scaled[0]: 0
    debug: counter/0[1]: 3
    debug: counter/1[1]: 235980049
    debug: counter/2[1]: 235980049
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235979907
    debug: counter/2[2]: 235979907
    debug: scaled[2]: 0
    debug: counter/0[3]: 135
    debug: counter/1[3]: 235979780
    debug: counter/2[3]: 235979780
    debug: scaled[3]: 0
    debug: counter/0[4]: 755114546
    debug: counter/1[4]: 235979652
    debug: counter/2[4]: 235979652
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001439771
    debug: counter/1[5]: 235979304
    debug: counter/2[5]: 235979304
    debug: scaled[5]: 0
    debug: counter/0[6]: 23723
    debug: counter/1[6]: 235979050
    debug: counter/2[6]: 235979050
    debug: scaled[6]: 0
    debug: counter/0[7]: 2213
    debug: counter/1[7]: 235978820
    debug: counter/2[7]: 235978820
    debug: scaled[7]: 0
    debug: runtime[0]: 235888002
    debug: walltime[0]: 236700533
    debug: runtime_cycles[0]: 754881504
    debug: counter/0[0]: 235888002
    debug: counter/1[0]: 235888002
    debug: counter/2[0]: 235888002
    debug: scaled[0]: 0
    debug: counter/0[1]: 2
    debug: counter/1[1]: 235887793
    debug: counter/2[1]: 235887793
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235887645
    debug: counter/2[2]: 235887645
    debug: scaled[2]: 0
    debug: counter/0[3]: 135
    debug: counter/1[3]: 235887499
    debug: counter/2[3]: 235887499
    debug: scaled[3]: 0
    debug: counter/0[4]: 754881504
    debug: counter/1[4]: 235887368
    debug: counter/2[4]: 235887368
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001401731
    debug: counter/1[5]: 235887024
    debug: counter/2[5]: 235887024
    debug: scaled[5]: 0
    debug: counter/0[6]: 24212
    debug: counter/1[6]: 235886786
    debug: counter/2[6]: 235886786
    debug: scaled[6]: 0
    debug: counter/0[7]: 1824
    debug: counter/1[7]: 235886560
    debug: counter/2[7]: 235886560
    debug: scaled[7]: 0

    Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):

    235.913377 task-clock-msecs # 0.997 CPUs ( +- 0.011% )
    2 context-switches # 0.000 M/sec ( +- 0.000% )
    1 CPU-migrations # 0.000 M/sec ( +- 0.000% )
    136 page-faults # 0.001 M/sec ( +- 0.730% )
    755048744 cycles # 3200.534 M/sec ( +- 0.009% )
    1001417586 instructions # 1.326 IPC ( +- 0.001% )
    25277 cache-references # 0.107 M/sec ( +- 3.988% )
    2315 cache-misses # 0.010 M/sec ( +- 9.845% )

    0.236706075 seconds time elapsed.

    This allows the summary stats to be validated.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
    Add the --repeat feature to perf stat, which repeats a given
    command up to 100 times, collects the stats and calculates an
    average and a stddev.

    For example, the following one-liner 'perf stat' command runs hackbench
    5 times and prints a tabulated result of all metrics, with averages
    and noise levels (in percentage) printed:

    aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
    Time: 0.117
    Time: 0.108
    Time: 0.089
    Time: 0.088
    Time: 0.100

    Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

    1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% )
    47706 context-switches # 0.038 M/sec ( +- 19.706% )
    387 CPU-migrations # 0.000 M/sec ( +- 3.608% )
    17793 page-faults # 0.014 M/sec ( +- 0.354% )
    3770941606 cycles # 3031.329 M/sec ( +- 4.621% )
    1566372416 instructions # 0.415 IPC ( +- 2.703% )
    16783421 cache-references # 13.492 M/sec ( +- 5.202% )
    7128590 cache-misses # 5.730 M/sec ( +- 7.420% )

    0.118924455 seconds time elapsed.

    The goal of this feature is to let users rely on the accuracy of these
    statistics, and to know how many times a command has to be repeated
    for the noise to drop to an acceptable level.

    (The -v option can be used to see a line printed out as each run progresses.)

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
    - use IPC for the instruction normalization output
    - use 'CPUs' for the CPU utilization factor value
    - print out time elapsed like the other rows
    - tidy up the task-clocks/cpu-clocks printout

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
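The two ratios named above are sketched here as hypothetical helpers. IPC is instructions divided by cycles. The 'CPUs' figure is task-clock time divided by wall-clock time. Both helpers return 0 when the divisor is zero, which also covers the zero wall-time case from the 07 Jun entry. With the loop_1b_instructions totals above, 1001417586 / 755048744 rounds to 1.326 IPC, and 235.913377 / 236.706075 msecs rounds to 0.997 CPUs.

```c
/* Instructions per cycle; returns 0 if no cycles were counted. */
static int ipc(unsigned long long instructions, unsigned long long cycles,
	       double *out)
{
	if (cycles == 0)
		return 0;
	*out = (double)instructions / (double)cycles;
	return 1;
}

/* CPU utilization: task-clock msecs per wall-clock msec. Returns 0 when
 * wall-clock time is zero, so the caller skips the value instead of
 * printing 'inf'. */
static int cpus_utilized(double task_clock_msecs, double wall_msecs,
			 double *out)
{
	if (wall_msecs <= 0.0)
		return 0;
	*out = task_clock_msecs / wall_msecs;
	return 1;
}
```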
     

11 Jun, 2009

1 commit


10 Jun, 2009

1 commit

  • Currently report and stat catch SIGINT (and others) without altering
    their exit state. This means that things like:

    while :; do perf stat ./foo ; done

    loops become hard to interrupt, because bash never sees perf terminate
    due to the interruption. Fix this.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
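The text does not show the fix itself. One common way to preserve the exit state is sketched below with hypothetical names. The handler only remembers the signal. At exit, the default action is restored and the signal is re-raised, so the shell sees the process die from SIGINT.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t signr = -1;

/* Handler: remember the signal so the tool can finish its output. */
static void record_signal(int signo)
{
	signr = signo;
}

/* atexit() hook: re-raise with the default action, so the parent sees
 * "killed by SIGINT" rather than a normal exit. */
static void reraise_at_exit(void)
{
	if (signr == -1)
		return;
	signal(signr, SIG_DFL);
	kill(getpid(), signr);
}

static void install_signal_forwarding(void)
{
	atexit(reraise_at_exit);
	signal(SIGINT, record_signal);
	signal(SIGTERM, record_signal);
}

/* Demo: a child that receives SIGINT and then calls exit(0). */
static int interrupted_child_status(void)
{
	int status = -1;
	pid_t pid;

	fflush(NULL);			/* don't duplicate buffered output */
	pid = fork();
	if (pid == 0) {
		install_signal_forwarding();
		raise(SIGINT);		/* handler records it */
		exit(0);		/* atexit hook re-raises SIGINT */
	}
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```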
     

08 Jun, 2009

1 commit

  • Before:

    7549326754 cycles # 3201.811 M/sec
    10007594937 instructions # 4244.408 M/sec

    After:

    7542051194 cycles # 3201.996 M/sec
    10007743852 instructions # 4248.811 M/sec # 1.327 per cycle

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

07 Jun, 2009

2 commits

  • Before:

    $ perf stat ~/hackbench 5

    error: syscall returned with -1 (No such device)

    After:

    $ perf stat ~/hackbench 5
    Time: 1.640

    Performance counter stats for '/home/mingo/hackbench 5':

    6524.570382 task-clock-ticks # 3.838 CPU utilization factor
    35704 context-switches # 0.005 M/sec
    191 CPU-migrations # 0.000 M/sec
    8958 page-faults # 0.001 M/sec
    cycles
    instructions
    cache-references
    cache-misses

    Wall-clock time elapsed: 1699.999995 msecs

    Also add a -v (--verbose) option to allow the printing of failed
    counter opens.

    Plus, don't print 'inf' if wall-time is zero (due to jiffies granularity);
    instead, skip printing the CPU utilization factor.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Several people have suggested that 'perf' has become a full-fledged
    tool that should be moved out of Documentation/. Move it to the
    (new) tools/ directory.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar