17 Mar, 2012

1 commit

  • When event group is enabled for forked task (i.e. no target task was
    specified) all events were disabled and marked ->enable_on_exec.
    However they are not counted at all since only group leader will be
    enabled on exec actually. So the result looked like below:

    $ ./perf stat --group -- sleep 1

    Performance counter stats for 'sleep 1':

    0.554926 task-clock # 0.001 CPUs utilized
    context-switches
    CPU-migrations
    page-faults
    cycles
    stalled-cycles-frontend
    stalled-cycles-backend
    instructions
    branches
    branch-misses

    1.001228093 seconds time elapsed

    Fix it by disabling group leader only.

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

14 Feb, 2012

1 commit

  • Allow a user to collect events for multiple threads or processes
    using a comma separated list.

    e.g., collect data on a VM and its vhost thread:
    perf top -p 21483,21485
    perf stat -p 21483,21485 -ddd
    perf record -p 21483,21485

    or monitoring vcpu threads
    perf top -t 21488,21489
    perf stat -t 21488,21489 -ddd
    perf record -t 21488,21489

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1328718772-16688-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

07 Feb, 2012

2 commits

  • The output of cpu-clock event is controlled in nsec_printout(),
    but its alignment was broken:

    Performance counter stats for 'sleep 1':

    6,038,774 instructions # 0.00 insns per cycle
    180 faults # 0.007 K/sec [99.95%]
    1,282,201 branches # 0.053 M/sec [99.84%]
    24126.221811 cpu-clock [99.62%]
    24121.689540 task-clock # 24.098 CPUs utilized [99.52%]

    1.001001017 seconds time elapsed

    This patch fixes this:

    Performance counter stats for 'sleep 1':

    13,540,843 instructions # 0.00 insns per cycle
    180 faults # 0.007 K/sec [99.94%]
    2,875,386 branches # 0.119 M/sec [99.82%]
    24144.221137 cpu-clock [99.61%]
    24133.515366 task-clock # 24.109 CPUs utilized [99.52%]

    1.001020946 seconds time elapsed

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328514285-26232-2-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The default 'M/sec' unit is not useful if the result is small enough.

    Adjust it dynamically according to the value.

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328514285-26232-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

25 Jan, 2012

1 commit

  • The new --uid command line option will show only the tasks for a given
    user, using the proc interface to figure out the existing tasks.

    Kernel work is needed to close races at startup, but this should already
    be useful in many use cases.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-bdnspm000gw2l984a2t53o8z@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

04 Jan, 2012

1 commit

  • The get_ratio_color() returns appropriate color string based on @ratio.
    It helps reducing code duplication.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1325000151-4463-2-git-send-email-namhyung@gmail.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

21 Dec, 2011

1 commit


06 Dec, 2011

1 commit

  • perf stat is failing on PowerPC:

    Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.

    Fatal: Not all events could be opened.

    commit 370faf1dd046 (perf stat: Fail softly on unsupported events)
    added a check for failure returning ENOENT, but the POWER backend
    returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
    same.

    With the patch applied, things work as expected:

    Performance counter stats for '/bin/true':

    0.362176 task-clock # 0.623 CPUs utilized
    0 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    28 page-faults # 0.077 M/sec
    1,677,020 cycles # 4.630 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    431,220 instructions # 0.26 insns per cycle
    101,889 branches # 281.325 M/sec
    4,145 branch-misses # 4.07% of all branches

    0.000581361 seconds time elapsed

    Cc: # 3.0+
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111202093833.5fef7226@kryten
    Signed-off-by: Anton Blanchard
    Signed-off-by: Arnaldo Carvalho de Melo

    Anton Blanchard
     

29 Nov, 2011

1 commit

  • At first tools were required to do that, but while writing the python
    bindings to simplify the API I made them auto-allocate when needed.

    This just makes record, stat and top use that auto allocation,
    simplifying them a bit.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

28 Nov, 2011

1 commit


26 Oct, 2011

1 commit

  • The __perf_evsel__open routing was grouping just the threads for that
    specific events per cpu when we want to group all threads in all events
    to the first fd opened on that cpu.

    So pass the xyarray with the first event, where the other events will be
    able to get that first per cpu fd.

    At some point top and record will switch to using perf_evlist__open that
    takes care of this detail and probably will also handle the fallback
    from hw to soft counters, etc.

    Reported-by: Deng-Cheng Zhu
    Tested-by: Deng-Cheng Zhu
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ebm34rh098i9y9v4cytfdp0x@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Sep, 2011

6 commits

  • When a program crashes under perf there is no message about it, unlike
    when running it from bash. This can be confusing and lead to wrong
    actions during debugging.

    Print fatal signals in perf stat/record.

    Thanks to Furat Afram for finding the problem originally

    Link: http://lkml.kernel.org/r/1316122302-24306-1-git-send-email-andi@firstfloor.org
    Cc: Frederic Weisbecker
    Cc: Stephane Eranian
    Signed-off-by: Andi Kleen
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Link: http://lkml.kernel.org/r/1315437244-3788-6-git-send-email-jim.cromie@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jim Cromie
    Signed-off-by: Arnaldo Carvalho de Melo

    Jim Cromie
     
  • If option -x '\t' is given, convert '\t' to "\t". This makes cvs
    printing more flexible.

    Link: http://lkml.kernel.org/r/1315437244-3788-5-git-send-email-jim.cromie@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jim Cromie
    Signed-off-by: Arnaldo Carvalho de Melo

    Jim Cromie
     
  • For pretty output only (preserve column for cvs output), dont print
    std-deviation when its 0.00. Do this based upon value, instead of
    checking for --no-aggr, since the stats could conceivably be computed
    over the runs on each CPU, and theres no reason to preclude that.

    Link: http://lkml.kernel.org/r/1315437244-3788-4-git-send-email-jim.cromie@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jim Cromie
    Signed-off-by: Arnaldo Carvalho de Melo

    Jim Cromie
     
  • Without this patch, running:

    $ sudo ./perf stat -r20 --no-aggr -a perl -e '$i++ for 1..100000'

    I get computations like this:

    CPU0 12.488247 task-clock # 1.224 CPUs utilized ( +- -nan% )
    CPU1 12.488909 task-clock # 1.225 CPUs utilized ( +- -nan% )
    CPU2 12.500221 task-clock # 1.226 CPUs utilized ( +- -nan% )
    CPU3 12.481713 task-clock # 1.224 CPUs utilized ( +- -nan% )

    but with patch, I get:

    CPU0 8.233682 task-clock # 0.754 CPUs utilized ( +- 0.00% )
    CPU1 8.226318 task-clock # 0.754 CPUs utilized ( +- 0.00% )
    CPU2 8.210737 task-clock # 0.752 CPUs utilized ( +- 0.00% )
    CPU3 8.201691 task-clock # 0.751 CPUs utilized ( +- 0.00% )

    Note that without --no-aggr, I get non-0 statistics both before and after patch:

    231.986022 task-clock # 4.030 CPUs utilized ( +- 0.97% )
    212 context-switches # 0.001 M/sec ( +- 12.07% )
    9 CPU-migrations # 0.000 M/sec ( +- 25.80% )
    466 page-faults # 0.002 M/sec ( +- 3.23% )
    174,318,593 cycles # 0.751 GHz ( +- 1.06% )

    I couldnt see anything wrong in the caller, so fixed it in
    stddev_stats(). ISTM that 0.00 is better than nan, since perf stat was
    passed -A (--no-aggr) so no standard deviation should be expected, and
    nan is suggestive of a deeper error.

    When running with --no-aggr, perhaps we should suppress the statistics
    printing entirely, or do so when they are 0.00.

    Link: http://lkml.kernel.org/r/1315437244-3788-3-git-send-email-jim.cromie@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jim Cromie
    Signed-off-by: Arnaldo Carvalho de Melo

    Jim Cromie
     
  • This perf stat option emulates valgrind's --log-fd option, allowing the
    user to send perf results elsewhere, and leaving stderr for use by the
    program under test. This complements --output file option, and is
    mutually exclusive with it.

    3>results perf stat --log-fd 3 -- $cmd
    3>>results perf stat --log-fd 3 --append -- $cmd

    The perl distro's make test.valgrind target uses valgrind's --log-fd
    option, I've adapted it to invoke perf also, and tested this patch
    there.

    Link: http://lkml.kernel.org/r/1315437244-3788-2-git-send-email-jim.cromie@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jim Cromie
    Signed-off-by: Arnaldo Carvalho de Melo

    Jim Cromie
     

19 Aug, 2011

1 commit


18 Aug, 2011

2 commits

  • This patch adds an option (-o) to save the output of perf stat into a
    file. You could do this with perf record but not with perf stat.
    Instead, you had to fiddle with stderr to save the counts into a
    separate file.

    The patch also adds the --append option so that results can be
    concatenated into a single file across runs. Each run of the tool is
    clearly separated by a comment line starting with a hash mark. The -A
    option of perf record is already used by perf stat, so we only add a
    long option.

    $ perf stat -o res.txt date
    $ cat res.txt

    Performance counter stats for 'date':

    0.791306 task-clock # 0.668 CPUs utilized
    2 context-switches # 0.003 M/sec
    0 CPU-migrations # 0.000 M/sec
    197 page-faults # 0.249 M/sec
    1878143 cycles # 2.373 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    1083367 instructions # 0.58 insns per cycle
    193027 branches # 243.935 M/sec
    9014 branch-misses # 4.67% of all branches

    0.001184746 seconds time elapsed

    The option can be combined with -x to make the output file much easier
    to parse.

    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
    Signed-off-by: Stephane Eranian
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     
  • Group event scheduling command line option is missing in perf
    record/stat.

    Add it to perf record/stat, which is same as in perf top.

    Reported-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s
    Signed-off-by: Lin Ming
    Signed-off-by: Arnaldo Carvalho de Melo

    Lin Ming
     

21 Jul, 2011

1 commit

  • Moving out the option parameter from parse_events function,
    and adding new parse_events_option function instead.

    The option parameter is used only to carry "struct perf_evlist"
    pointer for chaining new events. Putting it away, enable us
    to call parse_events from other places without using the
    option parameter.

    Signed-off-by: Jiri Olsa
    Cc: acme@redhat.com
    Cc: a.p.zijlstra@chello.nl
    Cc: paulus@samba.org
    Link: http://lkml.kernel.org/r/1310635534-4013-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

01 Jul, 2011

1 commit

  • Previously, when you want perf-stat to output the statistics in
    csv mode, no information of the noise will be printed out.

    For example right now we output this --repeat information:

    ./perf stat -r3 -x, sleep 1
    1.164789,task-clock
    8,context-switches
    0,CPU-migrations
    219,page-faults
    3337800,cycles

    With this patch, the output will be appended with an additional
    entry for the noise value:

    ./perf stat -r3 -x, sleep 1
    1.164789,task-clock,3.75%
    8,context-switches,75.00%
    0,CPU-migrations,100.00%
    219,page-faults,0.00%
    3337800,cycles,3.36%

    Signed-off-by: Zhengyu He
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: Venkatesh Pallipadi
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
    Signed-off-by: Ingo Molnar

    Zhengyu He
     

03 Jun, 2011

1 commit

  • perf stat continues running even if the event list contains counters
    that are not supported. The resulting output then contains
    for those events which gets confusing as to which events are supported,
    but not counted and which are not supported.

    Before:

    perf stat -ddd -- sleep 1

    Performance counter stats for 'sleep 1':

    0.571283 task-clock # 0.001 CPUs utilized
    1 context-switches # 0.002 M/sec
    0 CPU-migrations # 0.000 M/sec
    157 page-faults # 0.275 M/sec
    1,037,707 cycles # 1.816 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    654,499 instructions # 0.63 insns per cycle
    136,129 branches # 238.286 M/sec
    branch-misses
    L1-dcache-loads
    L1-dcache-load-misses
    LLC-loads
    LLC-load-misses
    L1-icache-loads
    L1-icache-load-misses
    dTLB-loads
    dTLB-load-misses
    iTLB-loads
    iTLB-load-misses
    L1-dcache-prefetches
    L1-dcache-prefetch-misses

    1.001004836 seconds time elapsed

    After:

    perf stat -ddd -- sleep 1

    Performance counter stats for 'sleep 1':

    1.350326 task-clock # 0.001 CPUs utilized
    2 context-switches # 0.001 M/sec
    0 CPU-migrations # 0.000 M/sec
    157 page-faults # 0.116 M/sec
    11,986 cycles # 0.009 GHz
    stalled-cycles-frontend
    stalled-cycles-backend
    496,986 instructions # 41.46 insns per cycle
    138,065 branches # 102.246 M/sec
    7,245 branch-misses # 5.25% of all branches
    L1-dcache-loads
    L1-dcache-load-misses
    LLC-loads
    LLC-load-misses
    L1-icache-loads
    L1-icache-load-misses
    dTLB-loads
    dTLB-load-misses
    iTLB-loads
    iTLB-load-misses
    L1-dcache-prefetches
    L1-dcache-prefetch-misses

    1.002397333 seconds time elapsed

    v1->v2:
    changed supported type from int to bool

    v2->v3
    fixed vertical alignment of new struct element

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

19 May, 2011

2 commits

  • Print out the cache-miss percentage as well if the cache refs were
    collected, for all the generic cache event types.

    Before:

    11,103,723,230 dTLB-loads # 622.471 M/sec ( +- 0.30% )
    87,065,337 dTLB-load-misses # 4.881 M/sec ( +- 0.90% )

    After:

    11,353,713,242 dTLB-loads # 626.020 M/sec ( +- 0.35% )
    113,393,472 dTLB-load-misses # 1.00% of all dTLB cache hits ( +- 0.49% )

    Also ASCII color highlight too high percentages, them when it's executed on the console.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print even more detailed statistics if requested via perf stat -d:

    -d: detailed events, L1 and LLC data cache
    -d -d: more detailed events, dTLB and iTLB events
    -d -d -d: very detailed events, adding prefetch events

    Full output looks like this now:

    Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

    1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
    49,068 context-switches # 0.029 M/sec ( +- 16.66% )
    8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
    17,397 page-faults # 0.010 M/sec ( +- 0.46% )
    2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
    1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
    743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
    1,314,416,379 instructions # 0.56 insns per cycle
    # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
    272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
    3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
    449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
    22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
    6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
    1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
    411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
    27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
    464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
    10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
    1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
    117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
    4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
    1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]

    0.195622057 seconds time elapsed ( +- 6.84% )

    Also clean up the attribute construction code to be appending, and factor
    it out into add_default_attributes().

    Tweak the coverage percentage printout a bit, so that it's easier to view it
    alongside the +- sttddev colum.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

30 Apr, 2011

1 commit

  • Similar to perf-record, tell user about unsupported events
    that will not be counted if invoked in verbose mode.

    e.g.,

    $ perf stat -e dTLB-prefetch-misses -v -- sleep 1
    dTLB-prefetch-misses event is not supported by the kernel.
    dTLB-prefetch-misses: 0 0 0

    Performance counter stats for 'sleep 1':

    dTLB-prefetch-misses

    1.001884783 seconds time elapsed

    Signed-off-by: David Ahern
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1304114655-10600-1-git-send-email-dsahern@gmail.com
    Signed-off-by: Ingo Molnar

    David Ahern
     

29 Apr, 2011

5 commits

  • David Ahern reported this perf stat failure:

    > # /tmp/build-perf/perf stat -- sleep 1
    > Error: stalled-cycles-frontend event is not supported.
    > Fatal: Not all events could be opened.
    >
    > This is a Dell R410 with an E5620 processor.

    Fail in a softer fashion on unknown/unsupported events.

    Reported-by: David Ahern
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n006io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Triple digit percentages do not fit otherwise.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n005io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Adjust to color thresholds to better match the percentages seen in
    real workloads. Both are now a bit more sensitive.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n004io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Sample output:

    Performance counter stats for './loop_1b':

    873.691065 task-clock # 1.000 CPUs utilized
    1 context-switches # 0.000 M/sec
    1 CPU-migrations # 0.000 M/sec
    96 page-faults # 0.000 M/sec
    2,012,637,222 cycles # 2.304 GHz (66.58%)
    1,001,397,911 stalled-cycles-frontend # 49.76% frontend cycles idle (66.58%)
    7,523,398 stalled-cycles-backend # 0.37% backend cycles idle (66.76%)
    2,004,551,046 instructions # 1.00 insns per cycle
    # 0.50 stalled cycles per insn (66.80%)
    1,001,304,992 branches # 1146.063 M/sec (66.76%)
    39,453 branch-misses # 0.00% of all branches (66.64%)

    0.874046121 seconds time elapsed

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Update perf tooling to deal with front-end and back-end stalled cycles events.

    Add both the default 'perf stat' output.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Apr, 2011

2 commits

  • Instead of failing on an unknown event, when new perf stat is run on
    older kernels:

    $ ./perf stat true
    Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
    may provide additional information.

    Fatal: Not all events could be opened.

    Just ignore EINVAL and ENOSYS, we'll print the results as not counted:

    Performance counter stats for 'true':

    0.239483 task-clock # 0.493 CPUs utilized
    0 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    86 page-faults # 0.359 M/sec
    704,766 cycles # 2.943 GHz
    stalled-cycles
    381,961 instructions # 0.54 insns per cycle
    69,626 branches # 290.735 M/sec
    4,594 branch-misses # 6.60% of all branches

    0.000485883 seconds time elapsed

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • --sync will tell perf stat to run sync() before starting a command.

    This allows IO-heavy tests to be used with --repeat, without one
    iteration impacting the other.

    Elapsed time will stabilize for example:

    before: 3.971525714 seconds time elapsed ( +- 8.56% )
    after: 3.211098537 seconds time elapsed ( +- 1.52% )

    So measurements will be more accurate.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

27 Apr, 2011

7 commits

  • Before:

    |
    | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
    |
    | 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
    | 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
    |
    | 1.468002368 seconds time elapsed ( +- 1.33% )
    |

    After:

    |
    | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
    |
    | 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
    | 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
    |
    | 1.468002368 seconds time elapsed ( +- 1.33% )
    |

    The last column (stddev noise) is properly aligned, vertically.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Add the new -d/--detailed flag, which generates a pretty detailed event list:

    Performance counter stats for './hackbench 10' (10 runs):

    1514.287888 task-clock # 10.897 CPUs utilized ( +- 3.05% )
    39,698 context-switches # 0.026 M/sec ( +- 12.19% )
    8,147 CPU-migrations # 0.005 M/sec ( +- 16.55% )
    17,918 page-faults # 0.012 M/sec ( +- 0.37% )
    2,944,504,050 cycles # 1.944 GHz ( +- 3.89% ) (32.60%)
    1,043,971,283 stalled-cycles # 35.45% of all cycles are idle ( +- 5.22% ) (44.48%)
    1,655,906,768 instructions # 0.56 insns per cycle
    # 0.63 stalled cycles per insn ( +- 1.95% ) (55.09%)
    338,832,373 branches # 223.757 M/sec ( +- 1.96% ) (64.47%)
    3,892,416 branch-misses # 1.15% of all branches ( +- 5.49% ) (73.12%)
    606,410,482 L1-dcache-loads # 400.459 M/sec ( +- 1.29% ) (71.21%)
    31,204,395 L1-dcache-load-misses # 5.15% of all L1-dcache hits ( +- 3.04% ) (60.43%)
    3,922,751 LLC-loads # 2.590 M/sec ( +- 6.80% ) (46.87%)
    5,037,288 LLC-load-misses # 3.327 M/sec ( +- 3.56% ) (13.00%)

    0.138966828 seconds time elapsed ( +- 4.11% )

    This can be used "at a glance" for narrower analysis.

    -d can also be used in addition to other -e events, to further expand an event list.

    Acked-by: Peter Zijlstra
    Acked-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print out this kind of l1-dcache-misses percentage:

    Performance counter stats for './bw_tcp localhost':

    29,956,262,201 cycles # 3.002 GHz (scaled from 85.14%)
    8,255,209,558 stalled-cycles # 27.56% of all cycles are idle (scaled from 86.56%)
    1,206,130,308 l1-dcache-misses # 40.49% of all L1-dcache hits (scaled from 86.30%)
    2,978,756,779 l1-dcache-refs # 298.512 M/sec (scaled from 70.02%)
    8,861,956,159 instructions # 0.30 insns per cycle
    # 0.93 stalled cycles per insn (scaled from 84.27%)
    1,644,306,068 branches # 164.782 M/sec (scaled from 86.43%)
    74,778,443 branch-misses # 4.55% of all branches (scaled from 70.69%)
    9978.695711 task-clock # 0.693 CPUs utilized

    14.404347983 seconds time elapsed

    And color the result depending on the severity of cache-trashing.

    Acked-by: Peter Zijlstra
    Acked-by Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print the missed-branches percentage with different warning level ASCII colors,
    as the percentage passes the 5%/10%/20% thresholds.

    These thresholds are set to relatively low levels, because on most CPUs even a
    moderate percentage of branch-misses already shows up as a slowdown.

    Acked-by: Peter Zijlstra
    Acked-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-ybqukg7p86leiup7gl03ecgk@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print the stalled-cycles percentage with different warning level ASCII colors,
    as the percentage passes the 25%/50%/75% thresholds.

    Acked-by: Peter Zijlstra
    Acked-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-e25zz44rcms7mu9az4fu5zp0@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Before:

    0 CPU-migrations # 0.000 M/sec ( +- -nan% )

    After:

    0 CPU-migrations # 0.000 M/sec ( +- 0.00% )

    Also factor out the noise printing function.

    Acked-by: Peter Zijlstra
    Acked-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-z89h2v1bk1mikcbsf7e6v34q@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The new default output looks like this:

    Performance counter stats for './loop_1b_instructions':

    236.010686 task-clock # 0.996 CPUs utilized
    0 context-switches # 0.000 M/sec
    0 CPU-migrations # 0.000 M/sec
    99 page-faults # 0.000 M/sec
    756,487,646 cycles # 3.205 GHz
    354,938,996 stalled-cycles # 46.92% of all cycles are idle
    1,001,403,797 instructions # 1.32 insns per cycle
    # 0.35 stalled cycles per insn
    100,279,773 branches # 424.895 M/sec
    12,646 branch-misses # 0.013 % of all branches

    0.236902540 seconds time elapsed

    We dropped cache-refs and cache-misses and added stalled-cycles - this is a
    more generic "how well utilized is the CPU" metric.

    If the stalled-cycles ratio is too high then more specific measurements can be
    taken to figure out the source of the inefficiency.

    Acked-by: Peter Zijlstra
    Acked-by: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-pbpl2l4mn797s69bclfpwkwn@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar