Doug / smarc-fsl-linux-kernel | Embedian Git Server

17 Mar, 2012

1 commit

4c19ea453 perf stat: Fix event grouping on forked task ... Browse Code »

When event group is enabled for forked task (i.e. no target task was
specified) all events were disabled and marked ->enable_on_exec.
However they are not counted at all since only group leader will be
enabled on exec actually. So the result looked like below:

$ ./perf stat --group -- sleep 1

Performance counter stats for 'sleep 1':

0.554926 task-clock # 0.001 CPUs utilized
context-switches
CPU-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses

1.001228093 seconds time elapsed

Fix it by disabling group leader only.

Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2012-03-17 03:13:45 +0800

14 Feb, 2012

1 commit

b52956c96 perf tools: Allow multiple threads or processes in record, stat, top ... Browse Code »

Allow a user to collect events for multiple threads or processes
using a comma separated list.

e.g., collect data on a VM and its vhost thread:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485

or monitoring vcpu threads
perf top -t 21488,21489
perf stat -t 21488,21489 -ddd
perf record -t 21488,21489

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1328718772-16688-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo

David Ahern
2012-02-14 08:54:11 +0800

07 Feb, 2012

2 commits

9dac6a29e perf stat: Align scaled output of cpu-clock ... Browse Code »

The output of cpu-clock event is controlled in nsec_printout(),
but its alignment was broken:

Performance counter stats for 'sleep 1':

6,038,774 instructions # 0.00 insns per cycle
180 faults # 0.007 K/sec [99.95%]
1,282,201 branches # 0.053 M/sec [99.84%]
24126.221811 cpu-clock [99.62%]
24121.689540 task-clock # 24.098 CPUs utilized [99.52%]

1.001001017 seconds time elapsed

This patch fixes this:

Performance counter stats for 'sleep 1':

13,540,843 instructions # 0.00 insns per cycle
180 faults # 0.007 K/sec [99.94%]
2,875,386 branches # 0.119 M/sec [99.82%]
24144.221137 cpu-clock [99.61%]
24133.515366 task-clock # 24.109 CPUs utilized [99.52%]

1.001020946 seconds time elapsed

Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1328514285-26232-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2012-02-07 05:17:39 +0800
5fde2523b perf stat: Adjust print unit ... Browse Code »

The default 'M/sec' unit is not useful if the result is small enough.

Adjust it dynamically according to the value.

Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1328514285-26232-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2012-02-07 05:17:11 +0800

25 Jan, 2012

1 commit

0d37aa34f perf tools: Introduce per user view ... Browse Code »

The new --uid command line option will show only the tasks for a given
user, using the proc interface to figure out the existing tasks.

Kernel work is needed to close races at startup, but this should already
be useful in many use cases.

Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-bdnspm000gw2l984a2t53o8z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2012-01-25 05:47:37 +0800

04 Jan, 2012

1 commit

15e6392fe perf stat: Introduce get_ratio_color() helper ... Browse Code »

The get_ratio_color() returns appropriate color string based on @ratio.
It helps reducing code duplication.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1325000151-4463-2-git-send-email-namhyung@gmail.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2012-01-04 00:36:32 +0800

21 Dec, 2011

1 commit

d87f69a16 Merge commit 'v3.2-rc6' into perf/core ... Browse Code »

Merge reason: Update with the latest fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2011-12-21 03:32:11 +0800

06 Dec, 2011

1 commit

38f6ae1e1 perf stat: Failure with "Operation not supported" ... Browse Code »

perf stat is failing on PowerPC:

Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.

Fatal: Not all events could be opened.

commit 370faf1dd046 (perf stat: Fail softly on unsupported events)
added a check for failure returning ENOENT, but the POWER backend
returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
same.

With the patch applied, things work as expected:

Performance counter stats for '/bin/true':

0.362176 task-clock # 0.623 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
28 page-faults # 0.077 M/sec
1,677,020 cycles # 4.630 GHz
stalled-cycles-frontend
stalled-cycles-backend
431,220 instructions # 0.26 insns per cycle
101,889 branches # 281.325 M/sec
4,145 branch-misses # 4.07% of all branches

0.000581361 seconds time elapsed

Cc: # 3.0+
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20111202093833.5fef7226@kryten
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo

Anton Blanchard
2011-12-06 00:32:40 +0800

29 Nov, 2011

1 commit

806fb6300 perf evlist: Always do automatic allocation of pollfd and mmap structures ... Browse Code »

At first tools were required to do that, but while writing the python
bindings to simplify the API I made them auto-allocate when needed.

This just makes record, stat and top use that auto allocation,
simplifying them a bit.

Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-11-29 18:05:52 +0800

28 Nov, 2011

1 commit

50d08e47b perf evlist: Introduce perf_evlist__add_attrs ... Browse Code »

Replacing the open coded equivalents in 'perf stat'.

Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-1btwadnf2tds2g07hsccsdse@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-11-28 20:24:43 +0800

26 Oct, 2011

1 commit

727ab04ed perf evlist: Fix grouping of multiple events ... Browse Code »

The __perf_evsel__open routing was grouping just the threads for that
specific events per cpu when we want to group all threads in all events
to the first fd opened on that cpu.

So pass the xyarray with the first event, where the other events will be
able to get that first per cpu fd.

At some point top and record will switch to using perf_evlist__open that
takes care of this detail and probably will also handle the fallback
from hw to soft counters, etc.

Reported-by: Deng-Cheng Zhu
Tested-by: Deng-Cheng Zhu
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ebm34rh098i9y9v4cytfdp0x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-10-26 20:25:02 +0800

30 Sep, 2011

6 commits

33e49ea70 perf tools: Make stat/record print fatal signals of the target program ... Browse Code »

When a program crashes under perf there is no message about it, unlike
when running it from bash. This can be confusing and lead to wrong
actions during debugging.

Print fatal signals in perf stat/record.

Thanks to Furat Afram for finding the problem originally

Link: http://lkml.kernel.org/r/1316122302-24306-1-git-send-email-andi@firstfloor.org
Cc: Frederic Weisbecker
Cc: Stephane Eranian
Signed-off-by: Andi Kleen
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2011-09-30 04:09:46 +0800
61a9f3242 perf stat: Fix spelling in comment ... Browse Code »

Link: http://lkml.kernel.org/r/1315437244-3788-6-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo

Jim Cromie
2011-09-30 04:09:36 +0800
d4ffd04df perf stat: Allow tab as cvs delimiter ... Browse Code »

If option -x '\t' is given, convert '\t' to "\t". This makes cvs
printing more flexible.

Link: http://lkml.kernel.org/r/1315437244-3788-5-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo

Jim Cromie
2011-09-30 04:04:30 +0800
a1bca6cc8 perf stat: Suppress printing std-dev when its 0 ... Browse Code »

For pretty output only (preserve column for cvs output), dont print
std-deviation when its 0.00. Do this based upon value, instead of
checking for --no-aggr, since the stats could conceivably be computed
over the runs on each CPU, and theres no reason to preclude that.

Link: http://lkml.kernel.org/r/1315437244-3788-4-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo

Jim Cromie
2011-09-30 04:04:00 +0800
19f474025 perf stat: Fix +- nan% in --no-aggr runs ... Browse Code »

Without this patch, running:

$ sudo ./perf stat -r20 --no-aggr -a perl -e '$i++ for 1..100000'

I get computations like this:

CPU0 12.488247 task-clock # 1.224 CPUs utilized ( +- -nan% )
CPU1 12.488909 task-clock # 1.225 CPUs utilized ( +- -nan% )
CPU2 12.500221 task-clock # 1.226 CPUs utilized ( +- -nan% )
CPU3 12.481713 task-clock # 1.224 CPUs utilized ( +- -nan% )

but with patch, I get:

CPU0 8.233682 task-clock # 0.754 CPUs utilized ( +- 0.00% )
CPU1 8.226318 task-clock # 0.754 CPUs utilized ( +- 0.00% )
CPU2 8.210737 task-clock # 0.752 CPUs utilized ( +- 0.00% )
CPU3 8.201691 task-clock # 0.751 CPUs utilized ( +- 0.00% )

Note that without --no-aggr, I get non-0 statistics both before and after patch:

231.986022 task-clock # 4.030 CPUs utilized ( +- 0.97% )
212 context-switches # 0.001 M/sec ( +- 12.07% )
9 CPU-migrations # 0.000 M/sec ( +- 25.80% )
466 page-faults # 0.002 M/sec ( +- 3.23% )
174,318,593 cycles # 0.751 GHz ( +- 1.06% )

I couldnt see anything wrong in the caller, so fixed it in
stddev_stats(). ISTM that 0.00 is better than nan, since perf stat was
passed -A (--no-aggr) so no standard deviation should be expected, and
nan is suggestive of a deeper error.

When running with --no-aggr, perhaps we should suppress the statistics
printing entirely, or do so when they are 0.00.

Link: http://lkml.kernel.org/r/1315437244-3788-3-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo

Jim Cromie
2011-09-30 04:03:46 +0800
56f3bae70 perf stat: Add --log-fd <N> option to redirect stderr elsewhere ... Browse Code »

This perf stat option emulates valgrind's --log-fd option, allowing the
user to send perf results elsewhere, and leaving stderr for use by the
program under test. This complements --output file option, and is
mutually exclusive with it.

3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd

The perl distro's make test.valgrind target uses valgrind's --log-fd
option, I've adapted it to invoke perf also, and tested this patch
there.

Link: http://lkml.kernel.org/r/1315437244-3788-2-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo

Jim Cromie
2011-09-30 04:03:23 +0800

19 Aug, 2011

1 commit

51887c823 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core ... Browse Code »

Conflicts:
tools/perf/builtin-stat.c

Ingo Molnar
2011-08-19 03:58:46 +0800

18 Aug, 2011

2 commits

4aa9015f8 perf stat: Add -o and --append options ... Browse Code »

This patch adds an option (-o) to save the output of perf stat into a
file. You could do this with perf record but not with perf stat.
Instead, you had to fiddle with stderr to save the counts into a
separate file.

The patch also adds the --append option so that results can be
concatenated into a single file across runs. Each run of the tool is
clearly separated by a comment line starting with a hash mark. The -A
option of perf record is already used by perf stat, so we only add a
long option.

$ perf stat -o res.txt date
$ cat res.txt

Performance counter stats for 'date':

0.791306 task-clock # 0.668 CPUs utilized
2 context-switches # 0.003 M/sec
0 CPU-migrations # 0.000 M/sec
197 page-faults # 0.249 M/sec
1878143 cycles # 2.373 GHz
stalled-cycles-frontend
stalled-cycles-backend
1083367 instructions # 0.58 insns per cycle
193027 branches # 243.935 M/sec
9014 branch-misses # 4.67% of all branches

0.001184746 seconds time elapsed

The option can be combined with -x to make the output file much easier
to parse.

Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2011-08-18 18:46:13 +0800
43bece797 perf tools: Add group event scheduling option to perf record/stat ... Browse Code »

Group event scheduling command line option is missing in perf
record/stat.

Add it to perf record/stat, which is same as in perf top.

Reported-by: Andi Kleen
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s
Signed-off-by: Lin Ming
Signed-off-by: Arnaldo Carvalho de Melo

Lin Ming
2011-08-18 18:35:46 +0800

21 Jul, 2011

1 commit

f120f9d51 perf tools: De-opt the parse_events function ... Browse Code »

Moving out the option parameter from parse_events function,
and adding new parse_events_option function instead.

The option parameter is used only to carry "struct perf_evlist"
pointer for chaining new events. Putting it away, enable us
to call parse_events from other places without using the
option parameter.

Signed-off-by: Jiri Olsa
Cc: acme@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/1310635534-4013-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar

Jiri Olsa
2011-07-21 16:41:11 +0800

01 Jul, 2011

1 commit

3ae9a34d7 perf stat: Add noise output for csv mode ... Browse Code »

Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.

For example right now we output this --repeat information:

./perf stat -r3 -x, sleep 1
1.164789,task-clock
8,context-switches
0,CPU-migrations
219,page-faults
3337800,cycles

With this patch, the output will be appended with an additional
entry for the noise value:

./perf stat -r3 -x, sleep 1
1.164789,task-clock,3.75%
8,context-switches,75.00%
0,CPU-migrations,100.00%
219,page-faults,0.00%
3337800,cycles,3.36%

Signed-off-by: Zhengyu He
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: Venkatesh Pallipadi
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
Signed-off-by: Ingo Molnar

Zhengyu He
2011-07-01 18:52:40 +0800

03 Jun, 2011

1 commit

2cee77c45 perf stat: clarify unsupported events from uncounted events ... Browse Code »

perf stat continues running even if the event list contains counters
that are not supported. The resulting output then contains
for those events which gets confusing as to which events are supported,
but not counted and which are not supported.

Before:

perf stat -ddd -- sleep 1

Performance counter stats for 'sleep 1':

0.571283 task-clock # 0.001 CPUs utilized
1 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 M/sec
157 page-faults # 0.275 M/sec
1,037,707 cycles # 1.816 GHz
stalled-cycles-frontend
stalled-cycles-backend
654,499 instructions # 0.63 insns per cycle
136,129 branches # 238.286 M/sec
branch-misses
L1-dcache-loads
L1-dcache-load-misses
LLC-loads
LLC-load-misses
L1-icache-loads
L1-icache-load-misses
dTLB-loads
dTLB-load-misses
iTLB-loads
iTLB-load-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses

1.001004836 seconds time elapsed

After:

perf stat -ddd -- sleep 1

Performance counter stats for 'sleep 1':

1.350326 task-clock # 0.001 CPUs utilized
2 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
157 page-faults # 0.116 M/sec
11,986 cycles # 0.009 GHz
stalled-cycles-frontend
stalled-cycles-backend
496,986 instructions # 41.46 insns per cycle
138,065 branches # 102.246 M/sec
7,245 branch-misses # 5.25% of all branches
L1-dcache-loads
L1-dcache-load-misses
LLC-loads
LLC-load-misses
L1-icache-loads
L1-icache-load-misses
dTLB-loads
dTLB-load-misses
iTLB-loads
iTLB-load-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses

1.002397333 seconds time elapsed

v1->v2:
changed supported type from int to bool

v2->v3
fixed vertical alignment of new struct element

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo

David Ahern
2011-06-03 00:26:15 +0800

19 May, 2011

2 commits

c3305257c perf stat: Add more cache-miss percentage printouts ... Browse Code »

Print out the cache-miss percentage as well if the cache refs were
collected, for all the generic cache event types.

Before:

11,103,723,230 dTLB-loads # 622.471 M/sec ( +- 0.30% )
87,065,337 dTLB-load-misses # 4.881 M/sec ( +- 0.90% )

After:

11,353,713,242 dTLB-loads # 626.020 M/sec ( +- 0.35% )
113,393,472 dTLB-load-misses # 1.00% of all dTLB cache hits ( +- 0.49% )

Also ASCII color highlight too high percentages, them when it's executed on the console.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Steven Rostedt
Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-05-19 20:30:50 +0800
2cba3ffb9 perf stat: Add -d -d and -d -d -d options to show more CPU events ... Browse Code »

Print even more detailed statistics if requested via perf stat -d:

-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch events

Full output looks like this now:

Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

1703.674707 task-clock
49,068 context-switches
8,303 CPU-migrations
17,397 page-faults
2,345,389,239 cycles
1,884,503,527 stalled-cycles-frontend
743,919,737 stalled-cycles-backend
1,314,416,379 instructions

272,592,567 branches
3,794,846 branch-misses
449,982,778 L1-dcache-loads
22,404,961 L1-dcache-load-misses
6,204,750 LLC-loads
1,837,411 LLC-load-misses
411,440,421 L1-icache-loads
27,556,832 L1-icache-load-misses
464,067,627 dTLB-loads
10,765,648 dTLB-load-misses
1,273,080,386 iTLB-loads
117,481 iTLB-load-misses
4,590,653 L1-dcache-prefetches
1,712,660 L1-dcache-prefetch-misses # # 8.709 CPUs utilized ( +- 4.19% ) # 0.029 M/sec ( +- 16.66% ) # 0.005 M/sec ( +- 24.90% ) # 0.010 M/sec ( +- 0.46% ) # 1.377 GHz ( +- 4.61% ) [55.90%] # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] # 160.003 M/sec ( +- 1.74% ) [56.56%] # 1.39% of all branches ( +- 6.59% ) [58.50%] # 264.125 M/sec ( +- 2.47% ) [49.88%] # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] # 3.642 M/sec ( +- 8.91% ) [43.75%] # 1.078 M/sec ( +- 7.27% ) [12.07%] # 241.502 M/sec ( +- 5.60% ) [36.52%] # 16.175 M/sec ( +- 7.46% ) [46.72%] # 272.392 M/sec ( +- 4.46% ) [54.17%] # 6.319 M/sec ( +- 3.18% ) [48.68%] # 747.256 M/sec ( +- 3.38% ) [47.53%] # 0.069 M/sec ( +- 14.99% ) [47.01%] # 2.695 M/sec ( +- 4.49% ) [46.19%] 1.005 M/sec ( +- 3.75% ) [44.82%]

0.195622057 seconds time elapsed ( +- 6.84% )

Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().

Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Steven Rostedt
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-05-19 20:29:51 +0800

30 Apr, 2011

1 commit

c63ca0c01 perf stat: Tell user about unsupported events in the list ... Browse Code »

Similar to perf-record, tell user about unsupported events
that will not be counted if invoked in verbose mode.

e.g.,

$ perf stat -e dTLB-prefetch-misses -v -- sleep 1
dTLB-prefetch-misses event is not supported by the kernel.
dTLB-prefetch-misses: 0 0 0

Performance counter stats for 'sleep 1':

dTLB-prefetch-misses

1.001884783 seconds time elapsed

Signed-off-by: David Ahern
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1304114655-10600-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar

David Ahern
2011-04-30 06:18:14 +0800

29 Apr, 2011

5 commits

370faf1dd perf stat: Fail softly on unsupported events ... Browse Code »

David Ahern reported this perf stat failure:

> # /tmp/build-perf/perf stat -- sleep 1
> Error: stalled-cycles-frontend event is not supported.
> Fatal: Not all events could be opened.
>
> This is a Dell R410 with an E5620 processor.

Fail in a softer fashion on unknown/unsupported events.

Reported-by: David Ahern
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n006io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-29 22:22:33 +0800
fce3c786d perf stat: Leave more room for percentages ... Browse Code »

Triple digit percentages do not fit otherwise.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n005io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-29 22:06:09 +0800
2b427e14b perf stat: Adjust stall cycles warning percentages ... Browse Code »

Adjust to color thresholds to better match the percentages seen in
real workloads. Both are now a bit more sensitive.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n004io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-29 20:35:57 +0800
d3d1e86da perf stat: Analyze front-end and back-end stall counts ... Browse Code »

Sample output:

Performance counter stats for './loop_1b':

873.691065 task-clock # 1.000 CPUs utilized
1 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
96 page-faults # 0.000 M/sec
2,012,637,222 cycles # 2.304 GHz (66.58%)
1,001,397,911 stalled-cycles-frontend # 49.76% frontend cycles idle (66.58%)
7,523,398 stalled-cycles-backend # 0.37% backend cycles idle (66.76%)
2,004,551,046 instructions # 1.00 insns per cycle
# 0.50 stalled cycles per insn (66.80%)
1,001,304,992 branches # 1146.063 M/sec (66.76%)
39,453 branch-misses # 0.00% of all branches (66.64%)

0.874046121 seconds time elapsed

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-29 20:35:55 +0800
129c04cb8 perf tools: Add front-end and back-end stalled cycles support ... Browse Code »

Update perf tooling to deal with front-end and back-end stalled cycles events.

Add both the default 'perf stat' output.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-29 20:35:49 +0800

28 Apr, 2011

2 commits

ede702900 perf stat: Fix compatibility behavior ... Browse Code »

Instead of failing on an unknown event, when new perf stat is run on
older kernels:

$ ./perf stat true
Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
may provide additional information.

Fatal: Not all events could be opened.

Just ignore EINVAL and ENOSYS, we'll print the results as not counted:

Performance counter stats for 'true':

0.239483 task-clock # 0.493 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
86 page-faults # 0.359 M/sec
704,766 cycles # 2.943 GHz
stalled-cycles
381,961 instructions # 0.54 insns per cycle
69,626 branches # 290.735 M/sec
4,594 branch-misses # 6.60% of all branches

0.000485883 seconds time elapsed

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-28 14:48:42 +0800
f9cef0a90 perf stat: Add --sync/-S option ... Browse Code »

--sync will tell perf stat to run sync() before starting a command.

This allows IO-heavy tests to be used with --repeat, without one
iteration impacting the other.

Elapsed time will stabilize for example:

before: 3.971525714 seconds time elapsed ( +- 8.56% )
after: 3.211098537 seconds time elapsed ( +- 1.52% )

So measurements will be more accurate.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-28 14:39:39 +0800

27 Apr, 2011

7 commits

9ceb1c3d1 perf stat: Fix printout vertical alignment ... Browse Code »

Before:

|
| Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
|
| 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
| 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
|
| 1.468002368 seconds time elapsed ( +- 1.33% )
|

After:

|
| Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
|
| 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
| 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
|
| 1.468002368 seconds time elapsed ( +- 1.33% )
|

The last column (stddev noise) is properly aligned, vertically.

Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 23:48:55 +0800
c6264deff perf stat: Add -d/--detailed flag to run with a lot of events ... Browse Code »

Add the new -d/--detailed flag, which generates a pretty detailed event list:

Performance counter stats for './hackbench 10' (10 runs):

1514.287888 task-clock # 10.897 CPUs utilized ( +- 3.05% )
39,698 context-switches # 0.026 M/sec ( +- 12.19% )
8,147 CPU-migrations # 0.005 M/sec ( +- 16.55% )
17,918 page-faults # 0.012 M/sec ( +- 0.37% )
2,944,504,050 cycles # 1.944 GHz ( +- 3.89% ) (32.60%)
1,043,971,283 stalled-cycles # 35.45% of all cycles are idle ( +- 5.22% ) (44.48%)
1,655,906,768 instructions # 0.56 insns per cycle
# 0.63 stalled cycles per insn ( +- 1.95% ) (55.09%)
338,832,373 branches # 223.757 M/sec ( +- 1.96% ) (64.47%)
3,892,416 branch-misses # 1.15% of all branches ( +- 5.49% ) (73.12%)
606,410,482 L1-dcache-loads # 400.459 M/sec ( +- 1.29% ) (71.21%)
31,204,395 L1-dcache-load-misses # 5.15% of all L1-dcache hits ( +- 3.04% ) (60.43%)
3,922,751 LLC-loads # 2.590 M/sec ( +- 6.80% ) (46.87%)
5,037,288 LLC-load-misses # 3.327 M/sec ( +- 3.56% ) (13.00%)

0.138966828 seconds time elapsed ( +- 4.11% )

This can be used "at a glance" for narrower analysis.

-d can also be used in addition to other -e events, to further expand an event list.

Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 03:03:16 +0800
8bb6c79f2 perf stat: Print out miss/hit ratio for L1 data-cache events ... Browse Code »

Print out this kind of l1-dcache-misses percentage:

Performance counter stats for './bw_tcp localhost':

29,956,262,201 cycles # 3.002 GHz (scaled from 85.14%)
8,255,209,558 stalled-cycles # 27.56% of all cycles are idle (scaled from 86.56%)
1,206,130,308 l1-dcache-misses # 40.49% of all L1-dcache hits (scaled from 86.30%)
2,978,756,779 l1-dcache-refs # 298.512 M/sec (scaled from 70.02%)
8,861,956,159 instructions # 0.30 insns per cycle
# 0.93 stalled cycles per insn (scaled from 84.27%)
1,644,306,068 branches # 164.782 M/sec (scaled from 86.43%)
74,778,443 branch-misses # 4.55% of all branches (scaled from 70.69%)
9978.695711 task-clock # 0.693 CPUs utilized

14.404347983 seconds time elapsed

And color the result depending on the severity of cache-trashing.

Acked-by: Peter Zijlstra
Acked-by Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 02:32:24 +0800
c78df6c1d perf stat: Print branch misses warning colors ... Browse Code »

Print the missed-branches percentage with different warning level ASCII colors,
as the percentage passes the 5%/10%/20% thresholds.

These thresholds are set to relatively low levels, because on most CPUs even a
moderate percentage of branch-misses already shows up as a slowdown.

Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-ybqukg7p86leiup7gl03ecgk@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 02:04:58 +0800
a5d243d04 perf stat: Print stalled cycles warning colors ... Browse Code »

Print the stalled-cycles percentage with different warning level ASCII colors,
as the percentage passes the 25%/50%/75% thresholds.

Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-e25zz44rcms7mu9az4fu5zp0@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 02:04:58 +0800
f99844cb7 perf stat: Fix -nan% output in perf stat noise printouts ... Browse Code »

Before:

0 CPU-migrations # 0.000 M/sec ( +- -nan% )

After:

0 CPU-migrations # 0.000 M/sec ( +- 0.00% )

Also factor out the noise printing function.

Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-z89h2v1bk1mikcbsf7e6v34q@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 02:04:57 +0800
1fc570ad8 perf stat: Add stalled cycles to the default output ... Browse Code »

The new default output looks like this:

Performance counter stats for './loop_1b_instructions':

236.010686 task-clock # 0.996 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
99 page-faults # 0.000 M/sec
756,487,646 cycles # 3.205 GHz
354,938,996 stalled-cycles # 46.92% of all cycles are idle
1,001,403,797 instructions # 1.32 insns per cycle
# 0.35 stalled cycles per insn
100,279,773 branches # 424.895 M/sec
12,646 branch-misses # 0.013 % of all branches

0.236902540 seconds time elapsed

We dropped cache-refs and cache-misses and added stalled-cycles - this is a
more generic "how well utilized is the CPU" metric.

If the stalled-cycles ratio is too high then more specific measurements can be
taken to figure out the source of the inefficiency.

Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-pbpl2l4mn797s69bclfpwkwn@git.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2011-04-27 02:04:57 +0800