17 Mar, 2012
1 commit
-
When event group is enabled for forked task (i.e. no target task was
specified) all events were disabled and marked ->enable_on_exec.
However they are not counted at all since only group leader will be
enabled on exec actually. So the result looked like below:$ ./perf stat --group -- sleep 1
Performance counter stats for 'sleep 1':
0.554926 task-clock # 0.001 CPUs utilized
context-switches
CPU-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses1.001228093 seconds time elapsed
Fix it by disabling group leader only.
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo
14 Feb, 2012
1 commit
-
Allow a user to collect events for multiple threads or processes
using a comma separated list.e.g., collect data on a VM and its vhost thread:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485or monitoring vcpu threads
perf top -t 21488,21489
perf stat -t 21488,21489 -ddd
perf record -t 21488,21489Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1328718772-16688-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo
07 Feb, 2012
2 commits
-
The output of cpu-clock event is controlled in nsec_printout(),
but its alignment was broken:Performance counter stats for 'sleep 1':
6,038,774 instructions # 0.00 insns per cycle
180 faults # 0.007 K/sec [99.95%]
1,282,201 branches # 0.053 M/sec [99.84%]
24126.221811 cpu-clock [99.62%]
24121.689540 task-clock # 24.098 CPUs utilized [99.52%]1.001001017 seconds time elapsed
This patch fixes this:
Performance counter stats for 'sleep 1':
13,540,843 instructions # 0.00 insns per cycle
180 faults # 0.007 K/sec [99.94%]
2,875,386 branches # 0.119 M/sec [99.82%]
24144.221137 cpu-clock [99.61%]
24133.515366 task-clock # 24.109 CPUs utilized [99.52%]1.001020946 seconds time elapsed
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1328514285-26232-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo -
The default 'M/sec' unit is not useful if the result is small enough.
Adjust it dynamically according to the value.
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1328514285-26232-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo
25 Jan, 2012
1 commit
-
The new --uid command line option will show only the tasks for a given
user, using the proc interface to figure out the existing tasks.Kernel work is needed to close races at startup, but this should already
be useful in many use cases.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-bdnspm000gw2l984a2t53o8z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
04 Jan, 2012
1 commit
-
The get_ratio_color() returns appropriate color string based on @ratio.
It helps reducing code duplication.Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1325000151-4463-2-git-send-email-namhyung@gmail.com
Signed-off-by: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo
21 Dec, 2011
1 commit
-
Merge reason: Update with the latest fixes.
Signed-off-by: Ingo Molnar
06 Dec, 2011
1 commit
-
perf stat is failing on PowerPC:
Error: open_counter returned with 95 (Operation not supported). /bin/dmesg may provide additional information.
Fatal: Not all events could be opened.
commit 370faf1dd046 (perf stat: Fail softly on unsupported events)
added a check for failure returning ENOENT, but the POWER backend
returns EOPNOTSUPP. It looks like alpha, blackfin and mips do the
same.With the patch applied, things work as expected:
Performance counter stats for '/bin/true':
0.362176 task-clock # 0.623 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
28 page-faults # 0.077 M/sec
1,677,020 cycles # 4.630 GHz
stalled-cycles-frontend
stalled-cycles-backend
431,220 instructions # 0.26 insns per cycle
101,889 branches # 281.325 M/sec
4,145 branch-misses # 4.07% of all branches0.000581361 seconds time elapsed
Cc: # 3.0+
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20111202093833.5fef7226@kryten
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo
29 Nov, 2011
1 commit
-
At first tools were required to do that, but while writing the python
bindings to simplify the API I made them auto-allocate when needed.This just makes record, stat and top use that auto allocation,
simplifying them a bit.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
28 Nov, 2011
1 commit
-
Replacing the open coded equivalents in 'perf stat'.
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-1btwadnf2tds2g07hsccsdse@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
26 Oct, 2011
1 commit
-
The __perf_evsel__open routing was grouping just the threads for that
specific events per cpu when we want to group all threads in all events
to the first fd opened on that cpu.So pass the xyarray with the first event, where the other events will be
able to get that first per cpu fd.At some point top and record will switch to using perf_evlist__open that
takes care of this detail and probably will also handle the fallback
from hw to soft counters, etc.Reported-by: Deng-Cheng Zhu
Tested-by: Deng-Cheng Zhu
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ebm34rh098i9y9v4cytfdp0x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
30 Sep, 2011
6 commits
-
When a program crashes under perf there is no message about it, unlike
when running it from bash. This can be confusing and lead to wrong
actions during debugging.Print fatal signals in perf stat/record.
Thanks to Furat Afram for finding the problem originally
Link: http://lkml.kernel.org/r/1316122302-24306-1-git-send-email-andi@firstfloor.org
Cc: Frederic Weisbecker
Cc: Stephane Eranian
Signed-off-by: Andi Kleen
Signed-off-by: Arnaldo Carvalho de Melo -
Link: http://lkml.kernel.org/r/1315437244-3788-6-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo -
If option -x '\t' is given, convert '\t' to "\t". This makes cvs
printing more flexible.Link: http://lkml.kernel.org/r/1315437244-3788-5-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo -
For pretty output only (preserve column for cvs output), dont print
std-deviation when its 0.00. Do this based upon value, instead of
checking for --no-aggr, since the stats could conceivably be computed
over the runs on each CPU, and theres no reason to preclude that.Link: http://lkml.kernel.org/r/1315437244-3788-4-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo -
Without this patch, running:
$ sudo ./perf stat -r20 --no-aggr -a perl -e '$i++ for 1..100000'
I get computations like this:
CPU0 12.488247 task-clock # 1.224 CPUs utilized ( +- -nan% )
CPU1 12.488909 task-clock # 1.225 CPUs utilized ( +- -nan% )
CPU2 12.500221 task-clock # 1.226 CPUs utilized ( +- -nan% )
CPU3 12.481713 task-clock # 1.224 CPUs utilized ( +- -nan% )but with patch, I get:
CPU0 8.233682 task-clock # 0.754 CPUs utilized ( +- 0.00% )
CPU1 8.226318 task-clock # 0.754 CPUs utilized ( +- 0.00% )
CPU2 8.210737 task-clock # 0.752 CPUs utilized ( +- 0.00% )
CPU3 8.201691 task-clock # 0.751 CPUs utilized ( +- 0.00% )Note that without --no-aggr, I get non-0 statistics both before and after patch:
231.986022 task-clock # 4.030 CPUs utilized ( +- 0.97% )
212 context-switches # 0.001 M/sec ( +- 12.07% )
9 CPU-migrations # 0.000 M/sec ( +- 25.80% )
466 page-faults # 0.002 M/sec ( +- 3.23% )
174,318,593 cycles # 0.751 GHz ( +- 1.06% )I couldnt see anything wrong in the caller, so fixed it in
stddev_stats(). ISTM that 0.00 is better than nan, since perf stat was
passed -A (--no-aggr) so no standard deviation should be expected, and
nan is suggestive of a deeper error.When running with --no-aggr, perhaps we should suppress the statistics
printing entirely, or do so when they are 0.00.Link: http://lkml.kernel.org/r/1315437244-3788-3-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo -
This perf stat option emulates valgrind's --log-fd option, allowing the
user to send perf results elsewhere, and leaving stderr for use by the
program under test. This complements --output file option, and is
mutually exclusive with it.3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmdThe perl distro's make test.valgrind target uses valgrind's --log-fd
option, I've adapted it to invoke perf also, and tested this patch
there.Link: http://lkml.kernel.org/r/1315437244-3788-2-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jim Cromie
Signed-off-by: Arnaldo Carvalho de Melo
19 Aug, 2011
1 commit
-
Conflicts:
tools/perf/builtin-stat.c
18 Aug, 2011
2 commits
-
This patch adds an option (-o) to save the output of perf stat into a
file. You could do this with perf record but not with perf stat.
Instead, you had to fiddle with stderr to save the counts into a
separate file.The patch also adds the --append option so that results can be
concatenated into a single file across runs. Each run of the tool is
clearly separated by a comment line starting with a hash mark. The -A
option of perf record is already used by perf stat, so we only add a
long option.$ perf stat -o res.txt date
$ cat res.txtPerformance counter stats for 'date':
0.791306 task-clock # 0.668 CPUs utilized
2 context-switches # 0.003 M/sec
0 CPU-migrations # 0.000 M/sec
197 page-faults # 0.249 M/sec
1878143 cycles # 2.373 GHz
stalled-cycles-frontend
stalled-cycles-backend
1083367 instructions # 0.58 insns per cycle
193027 branches # 243.935 M/sec
9014 branch-misses # 4.67% of all branches0.001184746 seconds time elapsed
The option can be combined with -x to make the output file much easier
to parse.Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo -
Group event scheduling command line option is missing in perf
record/stat.Add it to perf record/stat, which is same as in perf top.
Reported-by: Andi Kleen
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s
Signed-off-by: Lin Ming
Signed-off-by: Arnaldo Carvalho de Melo
21 Jul, 2011
1 commit
-
Moving out the option parameter from parse_events function,
and adding new parse_events_option function instead.The option parameter is used only to carry "struct perf_evlist"
pointer for chaining new events. Putting it away, enable us
to call parse_events from other places without using the
option parameter.Signed-off-by: Jiri Olsa
Cc: acme@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/1310635534-4013-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar
01 Jul, 2011
1 commit
-
Previously, when you want perf-stat to output the statistics in
csv mode, no information of the noise will be printed out.For example right now we output this --repeat information:
./perf stat -r3 -x, sleep 1
1.164789,task-clock
8,context-switches
0,CPU-migrations
219,page-faults
3337800,cyclesWith this patch, the output will be appended with an additional
entry for the noise value:./perf stat -r3 -x, sleep 1
1.164789,task-clock,3.75%
8,context-switches,75.00%
0,CPU-migrations,100.00%
219,page-faults,0.00%
3337800,cycles,3.36%Signed-off-by: Zhengyu He
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: Venkatesh Pallipadi
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
Signed-off-by: Ingo Molnar
03 Jun, 2011
1 commit
-
perf stat continues running even if the event list contains counters
that are not supported. The resulting output then contains
for those events which gets confusing as to which events are supported,
but not counted and which are not supported.Before:
perf stat -ddd -- sleep 1
Performance counter stats for 'sleep 1':
0.571283 task-clock # 0.001 CPUs utilized
1 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 M/sec
157 page-faults # 0.275 M/sec
1,037,707 cycles # 1.816 GHz
stalled-cycles-frontend
stalled-cycles-backend
654,499 instructions # 0.63 insns per cycle
136,129 branches # 238.286 M/sec
branch-misses
L1-dcache-loads
L1-dcache-load-misses
LLC-loads
LLC-load-misses
L1-icache-loads
L1-icache-load-misses
dTLB-loads
dTLB-load-misses
iTLB-loads
iTLB-load-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses1.001004836 seconds time elapsed
After:
perf stat -ddd -- sleep 1
Performance counter stats for 'sleep 1':
1.350326 task-clock # 0.001 CPUs utilized
2 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
157 page-faults # 0.116 M/sec
11,986 cycles # 0.009 GHz
stalled-cycles-frontend
stalled-cycles-backend
496,986 instructions # 41.46 insns per cycle
138,065 branches # 102.246 M/sec
7,245 branch-misses # 5.25% of all branches
L1-dcache-loads
L1-dcache-load-misses
LLC-loads
LLC-load-misses
L1-icache-loads
L1-icache-load-misses
dTLB-loads
dTLB-load-misses
iTLB-loads
iTLB-load-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses1.002397333 seconds time elapsed
v1->v2:
changed supported type from int to boolv2->v3
fixed vertical alignment of new struct elementCc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo
19 May, 2011
2 commits
-
Print out the cache-miss percentage as well if the cache refs were
collected, for all the generic cache event types.Before:
11,103,723,230 dTLB-loads # 622.471 M/sec ( +- 0.30% )
87,065,337 dTLB-load-misses # 4.881 M/sec ( +- 0.90% )After:
11,353,713,242 dTLB-loads # 626.020 M/sec ( +- 0.35% )
113,393,472 dTLB-load-misses # 1.00% of all dTLB cache hits ( +- 0.49% )Also ASCII color highlight too high percentages, them when it's executed on the console.
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Steven Rostedt
Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
Signed-off-by: Ingo Molnar -
Print even more detailed statistics if requested via perf stat -d:
-d: detailed events, L1 and LLC data cache
-d -d: more detailed events, dTLB and iTLB events
-d -d -d: very detailed events, adding prefetch eventsFull output looks like this now:
Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
49,068 context-switches # 0.029 M/sec ( +- 16.66% )
8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
17,397 page-faults # 0.010 M/sec ( +- 0.46% )
2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
1,314,416,379 instructions # 0.56 insns per cycle
# 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]0.195622057 seconds time elapsed ( +- 6.84% )
Also clean up the attribute construction code to be appending, and factor
it out into add_default_attributes().Tweak the coverage percentage printout a bit, so that it's easier to view it
alongside the +- sttddev colum.Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Steven Rostedt
Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
Signed-off-by: Ingo Molnar
30 Apr, 2011
1 commit
-
Similar to perf-record, tell user about unsupported events
that will not be counted if invoked in verbose mode.e.g.,
$ perf stat -e dTLB-prefetch-misses -v -- sleep 1
dTLB-prefetch-misses event is not supported by the kernel.
dTLB-prefetch-misses: 0 0 0Performance counter stats for 'sleep 1':
dTLB-prefetch-misses
1.001884783 seconds time elapsed
Signed-off-by: David Ahern
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1304114655-10600-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar
29 Apr, 2011
5 commits
-
David Ahern reported this perf stat failure:
> # /tmp/build-perf/perf stat -- sleep 1
> Error: stalled-cycles-frontend event is not supported.
> Fatal: Not all events could be opened.
>
> This is a Dell R410 with an E5620 processor.Fail in a softer fashion on unknown/unsupported events.
Reported-by: David Ahern
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n006io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
Triple digit percentages do not fit otherwise.
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n005io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
Adjust to color thresholds to better match the percentages seen in
real workloads. Both are now a bit more sensitive.Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n004io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
Sample output:
Performance counter stats for './loop_1b':
873.691065 task-clock # 1.000 CPUs utilized
1 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
96 page-faults # 0.000 M/sec
2,012,637,222 cycles # 2.304 GHz (66.58%)
1,001,397,911 stalled-cycles-frontend # 49.76% frontend cycles idle (66.58%)
7,523,398 stalled-cycles-backend # 0.37% backend cycles idle (66.76%)
2,004,551,046 instructions # 1.00 insns per cycle
# 0.50 stalled cycles per insn (66.80%)
1,001,304,992 branches # 1146.063 M/sec (66.76%)
39,453 branch-misses # 0.00% of all branches (66.64%)0.874046121 seconds time elapsed
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
Update perf tooling to deal with front-end and back-end stalled cycles events.
Add both the default 'perf stat' output.
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar
28 Apr, 2011
2 commits
-
Instead of failing on an unknown event, when new perf stat is run on
older kernels:$ ./perf stat true
Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
may provide additional information.Fatal: Not all events could be opened.
Just ignore EINVAL and ENOSYS, we'll print the results as not counted:
Performance counter stats for 'true':
0.239483 task-clock # 0.493 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
86 page-faults # 0.359 M/sec
704,766 cycles # 2.943 GHz
stalled-cycles
381,961 instructions # 0.54 insns per cycle
69,626 branches # 290.735 M/sec
4,594 branch-misses # 6.60% of all branches0.000485883 seconds time elapsed
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
--sync will tell perf stat to run sync() before starting a command.
This allows IO-heavy tests to be used with --repeat, without one
iteration impacting the other.Elapsed time will stabilize for example:
before: 3.971525714 seconds time elapsed ( +- 8.56% )
after: 3.211098537 seconds time elapsed ( +- 1.52% )So measurements will be more accurate.
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar
27 Apr, 2011
7 commits
-
Before:
|
| Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
|
| 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
| 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
|
| 1.468002368 seconds time elapsed ( +- 1.33% )
|After:
|
| Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
|
| 71,321,607 instructions:u # 0.42 insns per cycle ( +- 0.00% )
| 168,040,009 cycles:u # 0.000 GHz ( +- 0.81% )
|
| 1.468002368 seconds time elapsed ( +- 1.33% )
|The last column (stddev noise) is properly aligned, vertically.
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.org
Signed-off-by: Ingo Molnar -
Add the new -d/--detailed flag, which generates a pretty detailed event list:
Performance counter stats for './hackbench 10' (10 runs):
1514.287888 task-clock # 10.897 CPUs utilized ( +- 3.05% )
39,698 context-switches # 0.026 M/sec ( +- 12.19% )
8,147 CPU-migrations # 0.005 M/sec ( +- 16.55% )
17,918 page-faults # 0.012 M/sec ( +- 0.37% )
2,944,504,050 cycles # 1.944 GHz ( +- 3.89% ) (32.60%)
1,043,971,283 stalled-cycles # 35.45% of all cycles are idle ( +- 5.22% ) (44.48%)
1,655,906,768 instructions # 0.56 insns per cycle
# 0.63 stalled cycles per insn ( +- 1.95% ) (55.09%)
338,832,373 branches # 223.757 M/sec ( +- 1.96% ) (64.47%)
3,892,416 branch-misses # 1.15% of all branches ( +- 5.49% ) (73.12%)
606,410,482 L1-dcache-loads # 400.459 M/sec ( +- 1.29% ) (71.21%)
31,204,395 L1-dcache-load-misses # 5.15% of all L1-dcache hits ( +- 3.04% ) (60.43%)
3,922,751 LLC-loads # 2.590 M/sec ( +- 6.80% ) (46.87%)
5,037,288 LLC-load-misses # 3.327 M/sec ( +- 3.56% ) (13.00%)0.138966828 seconds time elapsed ( +- 4.11% )
This can be used "at a glance" for narrower analysis.
-d can also be used in addition to other -e events, to further expand an event list.
Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.org
Signed-off-by: Ingo Molnar -
Print out this kind of l1-dcache-misses percentage:
Performance counter stats for './bw_tcp localhost':
29,956,262,201 cycles # 3.002 GHz (scaled from 85.14%)
8,255,209,558 stalled-cycles # 27.56% of all cycles are idle (scaled from 86.56%)
1,206,130,308 l1-dcache-misses # 40.49% of all L1-dcache hits (scaled from 86.30%)
2,978,756,779 l1-dcache-refs # 298.512 M/sec (scaled from 70.02%)
8,861,956,159 instructions # 0.30 insns per cycle
# 0.93 stalled cycles per insn (scaled from 84.27%)
1,644,306,068 branches # 164.782 M/sec (scaled from 86.43%)
74,778,443 branch-misses # 4.55% of all branches (scaled from 70.69%)
9978.695711 task-clock # 0.693 CPUs utilized14.404347983 seconds time elapsed
And color the result depending on the severity of cache-trashing.
Acked-by: Peter Zijlstra
Acked-by Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.org
Signed-off-by: Ingo Molnar -
Print the missed-branches percentage with different warning level ASCII colors,
as the percentage passes the 5%/10%/20% thresholds.These thresholds are set to relatively low levels, because on most CPUs even a
moderate percentage of branch-misses already shows up as a slowdown.Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-ybqukg7p86leiup7gl03ecgk@git.kernel.org
Signed-off-by: Ingo Molnar -
Print the stalled-cycles percentage with different warning level ASCII colors,
as the percentage passes the 25%/50%/75% thresholds.Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-e25zz44rcms7mu9az4fu5zp0@git.kernel.org
Signed-off-by: Ingo Molnar -
Before:
0 CPU-migrations # 0.000 M/sec ( +- -nan% )
After:
0 CPU-migrations # 0.000 M/sec ( +- 0.00% )
Also factor out the noise printing function.
Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-z89h2v1bk1mikcbsf7e6v34q@git.kernel.org
Signed-off-by: Ingo Molnar -
The new default output looks like this:
Performance counter stats for './loop_1b_instructions':
236.010686 task-clock # 0.996 CPUs utilized
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
99 page-faults # 0.000 M/sec
756,487,646 cycles # 3.205 GHz
354,938,996 stalled-cycles # 46.92% of all cycles are idle
1,001,403,797 instructions # 1.32 insns per cycle
# 0.35 stalled cycles per insn
100,279,773 branches # 424.895 M/sec
12,646 branch-misses # 0.013 % of all branches0.236902540 seconds time elapsed
We dropped cache-refs and cache-misses and added stalled-cycles - this is a
more generic "how well utilized is the CPU" metric.If the stalled-cycles ratio is too high then more specific measurements can be
taken to figure out the source of the inefficiency.Acked-by: Peter Zijlstra
Acked-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/n/tip-pbpl2l4mn797s69bclfpwkwn@git.kernel.org
Signed-off-by: Ingo Molnar