23 Jan, 2011
1 commit
-
Using %L[uxd] has issues in some architectures, like on ppc64. Fix it
by making our 64 bit integers typedefs of stdint.h types and using
PRI[ux]64 like, for instance, git does.

Reported by Denis Kirjanov, who provided a patch for one case; I went
and changed all cases.

Reported-by: Denis Kirjanov
Tested-by: Denis Kirjanov
LKML-Reference:
Cc: Denis Kirjanov
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Pingtian Han
Cc: Stephane Eranian
Cc: Tom Zanussi
Signed-off-by: Arnaldo Carvalho de Melo
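The PRI[ux]64 approach from the commit above can be sketched as follows. This is a minimal illustration, not the perf code: the u64 typedef and the format_count() helper are assumptions made for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical typedef: a 64-bit integer defined in terms of a stdint.h type. */
typedef uint64_t u64;

/*
 * Format a 64-bit count portably. PRIu64/PRIx64 expand to the correct
 * length modifier for uint64_t on the target, unlike a hard-coded %Lu.
 * Returns the number of characters written (as snprintf does).
 */
static int format_count(char *buf, size_t len, u64 count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64 " events (0x%" PRIx64 ")",
			count, count);
}
```

The format string is assembled by the preprocessor, so the same source compiles correctly whether uint64_t is unsigned long or unsigned long long.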
11 Jan, 2011
1 commit
-
We need to defer calling perf_evsel_list__delete() till after atexit
registered routines, because we need to traverse the events being
recorded at that time at least on 'perf record'.

This fixes the problem reported by Thomas Renninger where cmd_record
called by cmd_timechart would not write the tracing data to the perf.data
file header because the evsel_list at atexit (control+C on 'perf timechart
record') time would be empty, being already deleted by run_builtin(),
and thus 'perf timechart' when trying to process such perf.data file would
die with:

"no trace data in the file"

Problem introduced in 70d544d.
Reported-by: Thomas Renninger
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Renninger
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
10 Jan, 2011
1 commit
-
For unsupported events (e.g., H/W events when running in a VM)
perf stat currently fails with the error message:

Error: open_counter returned with 2 (No such file or directory).
/bin/dmesg may provide additional information.

Fatal: Not all events could be opened.

dmesg is of no help and it is not clear as to why it fails to
open the counter. This patch changes the error message to

Error: cache-misses event is not supported.

Fatal: Not all events could be opened.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: a.p.zijlstra@chello.nl
LPU-Reference:
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo
07 Jan, 2011
1 commit
-
Since commit 69aad6f1 (perf tools: Introduce event selectors), only
perf_event_attr::type and ::config are passed to the event selector, which
makes the perf tool not work correctly.

For example, PEBS does not work because perf_event_attr::precise_ip is
not passed to the syscall.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Lin Ming
Signed-off-by: Arnaldo Carvalho de Melo
04 Jan, 2011
8 commits
-
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
So that later, we can pass the thread_map instance instead of
(thread_num, thread_map) for things like perf_evsel__open and friends,
just like was done with cpu_map.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
So that later, we can pass the cpu_map instance instead of (nr_cpus, cpu_map)
for things like perf_evsel__open and friends.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
Abstracting away the loops needed to create the various event fd handlers.

The users have to pass a configured perf->evsel.attr field, which is already
usable after perf_evsel__new (constructor) time, using defaults.

Comes out of the ad-hoc routines in builtin-stat, that now uses it.

Fixed a small silly bug where we were die()ing before killing our
children, dysfunctional family this one 8-)

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
Making them hopefully generic enough to be used in 'perf test',
we'll see.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
Freeing all the possibly allocated resources, reducing complexity
on each tool exit path.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
-
Out of ad-hoc code and global arrays with hard coded sizes.

This is the first step on having a library that will be first
used on regression tests in the 'perf test' tool.

[acme@felicio linux]$ size /tmp/perf.before
   text    data     bss     dec     hex filename
1273776   97384 5104416 6475576  62cf38 /tmp/perf.before
[acme@felicio linux]$ size /tmp/perf.new
   text    data     bss     dec     hex filename
1275422   97416 1392416 2765254  2a31c6 /tmp/perf.new

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
02 Dec, 2010
2 commits
-
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.

Example:

$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses

Works also in non-aggregated mode:

$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses

This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.

Committer note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitly enabled or disabled -B,
and add code to disable big_num if the user didn't explicitly set --big_num when
-x is used.

Cc: David S. Miller
Cc: Frederik Weisbecker
Cc: Ingo Molnar
Cc: paulus@samba.org
Cc: Peter Zijlstra
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
-
[acme@mica linux]$ perf stat ls > /dev/null
Performance counter stats for 'ls':
1.512532 task-clock-msecs # 0.801 CPUs
2 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
241 page-faults # 0.159 M/sec
2,973,331 cycles # 1965.797 M/sec
1,460,802 instructions # 0.491 IPC
314,642 branches # 208.023 M/sec
18,475 branch-misses # 5.872 %
cache-references
cache-misses

0.001887676 seconds time elapsed

To get the previous behaviour just use --no-big-num:
[acme@mica linux]$ perf stat --no-big-num ls > /dev/null
Performance counter stats for 'ls':
1.468014 task-clock-msecs # 0.795 CPUs
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
241 page-faults # 0.164 M/sec
2900254 cycles # 1975.631 M/sec
1437991 instructions # 0.496 IPC
310905 branches # 211.786 M/sec
17912 branch-misses # 5.761 %
cache-references
cache-misses

0.001845435 seconds time elapsed
[acme@mica linux]$
Suggested-by: Ingo Molnar
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Stephane Eranian
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
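The -x/--field-separator output described in the first of the two commits above can be sketched like this. It is an illustration under assumptions: format_csv_line() and its parameters are hypothetical names, and the field layout is copied from the example output in that commit message rather than from the perf source.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one counter line using a caller-supplied
 * separator. A negative cpu means aggregated mode, so no CPU prefix is
 * printed; otherwise the line starts with "CPU<n>" as in -A mode.
 * Returns the number of characters written (as snprintf does).
 */
static int format_csv_line(char *buf, size_t len, const char *sep,
			   int cpu, uint64_t count, const char *event)
{
	assert(buf != NULL && sep != NULL && event != NULL);
	if (cpu >= 0)
		return snprintf(buf, len, "CPU%d%s%" PRIu64 "%s%s",
				cpu, sep, count, sep, event);
	return snprintf(buf, len, "%" PRIu64 "%s%s", count, sep, event);
}
```

Counts are printed without thousands grouping here, which matches the committer note that big-num output and -x cannot be combined.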
20 Nov, 2010
2 commits
-
This patch makes several changes to "perf stat":

- "perf stat" will no longer go ahead and run the application when one or
  more of the specified events could not be opened.
- Use error() and die() instead of pr_err() so that the output is more
  consistent with "perf top" and "perf record".
- Handle permission errors in a more robust way, and in a similar way to
  "perf record" and "perf top".

In addition, the sys_perf_event_open() error handling of "perf top" and "perf
record" is made more consistent and adds the following phrase when an event
doesn't open (with something other than an access or permission error):

"/bin/dmesg may provide additional information."

This is added because kernel code doesn't have a good way of expressing
detailed errors to user space, so its only avenue is to use printk's. However,
many users may not think of looking at dmesg to find out why an event is being
rejected.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Ian Munsie
Cc: Michael Ellerman
LKML-Reference:
Signed-off-by: Corey Ashford
Signed-off-by: Arnaldo Carvalho de Melo
-
This patch adds a new -A option to perf stat. If specified then perf stat does
not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
using -a. This option is not supported in per-thread mode.

Being able to get a per-cpu breakdown is useful to detect imbalances between
CPUs when running a uniform workload that spans all monitored CPUs.

The second version corrects the missing cpumap[] support, so that it works when
the -C option is used.

The third version fixes a missing cpumap[] in print_counter() and removes a
stray patch in builtin-trace.c.

Examples on a 4-way system:

# perf stat -a -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
9592808135 cycles
3490380006 instructions # 0.364 IPC
1.001584632 seconds time elapsed

# perf stat -a -A -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
CPU0 2398163767 cycles
CPU1 2398180817 cycles
CPU2 2398217115 cycles
CPU3 2398247483 cycles
CPU0 872282046 instructions # 0.364 IPC
CPU1 873481776 instructions # 0.364 IPC
CPU2 872638127 instructions # 0.364 IPC
CPU3 872437789 instructions # 0.364 IPC
1.001556052 seconds time elapsed

Cc: David S. Miller
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
05 Jun, 2010
1 commit
-
This patch adds a -C option to stat, record, top to designate a list of CPUs to
monitor. CPUs can be specified as a comma-separated list or ranges, no space
allowed.

Examples:

$ perf record -a -C0-1,4-7 sleep 1
$ perf top -C0-4
$ perf stat -a -C1,2,3,4 sleep 1

With perf record in per-thread mode with inherit mode on, samples are collected
only when the thread runs on the designated CPUs.

The -C option does not turn on system-wide mode automatically.

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
19 May, 2010
1 commit
-
It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.

$ perf stat noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':
1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)

2.001393933 seconds time elapsed

$ perf stat -B noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':
1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)

2.001391016 seconds time elapsed

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
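The locale-based grouping described in the commit above can be sketched as follows. This is an assumption-laden illustration: it uses the POSIX ' printf flag, which is one way stdio exposes LC_NUMERIC grouping, and the helper names format_big_num() and use_numeric_locale() are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a count with the ' flag, which asks printf to apply the
 * thousands grouping of the current LC_NUMERIC locale. The flag is a
 * POSIX extension, not ISO C. In the default "C" locale there is no
 * grouping, so the digits come out unchanged.
 */
static int format_big_num(char *buf, size_t len, unsigned long long count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%'llu", count);
}

/* Install a numeric locale by name; returns non-zero on success. */
static int use_numeric_locale(const char *name)
{
	return setlocale(LC_NUMERIC, name) != NULL;
}
```

Whether a separator appears depends on the locale actually installed on the machine, which is why the commit asks users to set LC_NUMERIC appropriately.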
14 May, 2010
1 commit
-
By default, event inheritance across fork and pthread_create was on, but the -i
option of stat and record, which enabled inheritance, suggested that it was off
by default.

This patch fixes this logic by inverting the meaning of the -i option. By
default inheritance is on whether you attach to a process (-p), a thread (-t)
or start a process. If you pass -i, then you turn off inheritance. Turning off
inheritance when you don't need it also helps limit perf resource usage.

The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
start the counters.

Acked-by: Frederic Weisbecker
Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
14 Apr, 2010
1 commit
-
Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.

This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).

I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.

Signed-off-by: Ian Munsie
Acked-by: David S. Miller
Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
Cc: Git development list
Cc: Ian Munsie
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: KOSAKI Motohiro
Cc: Hitoshi Mitake
Cc: Rusty Russell
Cc: Frederic Weisbecker
Cc: Eric B Munson
Cc: Valdis.Kletnieks@vt.edu
Cc: WANG Cong
Cc: Thiago Farina
Cc: Masami Hiramatsu
Cc: Xiao Guangrong
Cc: Jaswinder Singh Rajput
Cc: Arjan van de Ven
Cc: OGAWA Hirofumi
Cc: Mike Galbraith
Cc: Tom Zanussi
Cc: Anton Blanchard
Cc: John Kacur
Cc: Li Zefan
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
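The big-endian failure described in the commit above comes from incrementing an int at an address where the caller only owns a one-byte bool. The sketch below simulates that safely with a real int and inspects its first byte; all function names are hypothetical, and the two handlers at the end only illustrate the split between a boolean setter and an incrementing option.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Increment an int, then look only at its first byte, which is what a
 * one-byte bool stored at the same address would contain. On
 * little-endian the low byte comes first, so this reads 1; on
 * big-endian the first byte is still 0 and the flag looks unset.
 */
static bool first_byte_after_int_increment(void)
{
	int value = 0;
	unsigned char first;

	value++;
	memcpy(&first, &value, 1);
	return first != 0;
}

/* Detect byte order at run time, for comparison with the result above. */
static bool is_little_endian(void)
{
	const unsigned int one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}

/* Hypothetical boolean handler: writes through a bool pointer only. */
static void opt_set_bool(bool *value, bool unset)
{
	assert(value != NULL);
	*value = !unset;
}

/* Hypothetical incrementing handler: counts repeats, e.g. -v -v. */
static void opt_incr(int *value, bool unset)
{
	assert(value != NULL);
	*value = unset ? 0 : *value + 1;
}
```

Keeping the two behaviours in separate handlers means each one accesses its target through a pointer of the correct type, so the result no longer depends on byte order.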
23 Mar, 2010
1 commit
-
Before:
[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
Performance counter stats for 'sleep 1s':
task-clock-msecs
context-switches
CPU-migrations
page-faults
cycles
instructions
branches
branch-misses
cache-references
cache-misses

1.016998463 seconds time elapsed
[acme@doppio linux-2.6-tip]$
Now:
[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
No permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid.
[acme@doppio linux-2.6-tip]$

Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar
18 Mar, 2010
2 commits
-
Parameter --pid (or -p) of perf currently means a thread-wide
collection. For example, if a process whose id is 8888 has 10
threads, 'perf top -p 8888' just collects the main thread
statistics. That's misleading. Users are used to attaching to a whole
process when debugging a process with gdb. To follow normal usage
style, the patch changes --pid to process-wide collection and adds
--tid (-t) to mean a thread-wide collection.

Usage example is:

# perf top -p 8888
# perf record -p 8888 -f sleep 10
# perf stat -p 8888 -f sleep 10

The above commands collect the statistics of all threads of process
8888.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Cc: zhiteng.huang@intel.com
Cc: Zachary Amsden
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Command 'perf stat' doesn't enable counters when collecting an
existing (by -p) process or system-wide statistics. Fix the
issue.

Change the condition of fork/exec subcommand. If there is a
subcommand parameter, perf always forks/execs it. The usage
example is:

# perf stat -a sleep 10

So this command could collect statistics for 10 seconds
precisely. User still could stop it by CTRL+C. Without the new
capability, user could only use CTRL+C to stop it without
precise time clock.

Another issue is 'perf stat -a' consumes 100% time of a full
single logical cpu. It has a bad impact on running workload.

Fix it by adding a sleep(1) in the while(!done) loop in function
run_perf_stat.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Marcelo Tosatti
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Gleb Natapov
Cc: Zachary Amsden
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
11 Mar, 2010
1 commit
-
At present, the perf subcommands that do system-wide monitoring
(perf stat, perf record and perf top) don't work properly unless
the online cpus are numbered 0, 1, ..., N-1. These tools ask
for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
and then try to create events for cpus 0, 1, ..., N-1.

This creates problems for systems where the online cpus are
numbered sparsely. For example, a POWER6 system in
single-threaded mode (i.e. only running 1 hardware thread per
core) will have only even-numbered cpus online.

This fixes the problem by reading the /sys/devices/system/cpu/online
file to find out which cpus are online. The code that does that is in
tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
function that sets up a cpumap[] array and returns the number of
online cpus. If /sys/devices/system/cpu/online can't be read or
can't be parsed successfully, it falls back to using sysconf to
ask how many cpus are online and sets up an identity map in cpumap[].

The perf record, perf stat and perf top code then calls
read_cpu_map() in the system-wide monitoring case (instead of
sysconf) and uses cpumap[] to get the cpu numbers to pass to
perf_event_open.

Signed-off-by: Paul Mackerras
Cc: Anton Blanchard
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar
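The read-then-fall-back scheme described in the commit above can be sketched as follows. This is not the code in tools/perf/util/cpumap.[ch]: parse_cpu_list(), read_cpu_map_from(), MAX_CPUS and the path parameter are assumptions made so the example is self-contained. The same list syntax ("0-1,4-7") is what the -C option accepts.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_CPUS 4096	/* hypothetical limit for this sketch */

/*
 * Parse a list such as "0-1,4-7" or "0,2,4" (optionally ending in a
 * newline) into cpumap[]. Returns the number of CPUs, or -1 on error.
 */
static int parse_cpu_list(const char *s, int *cpumap, int max)
{
	int n = 0;

	assert(s != NULL && cpumap != NULL);
	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s || lo < 0 || lo >= INT_MAX)
			return -1;
		hi = lo;
		if (*end == '-') {		/* a range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo || hi >= INT_MAX)
				return -1;
		}
		for (long c = lo; c <= hi; c++) {
			if (n >= max)
				return -1;
			cpumap[n++] = (int)c;
		}
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return n > 0 ? n : -1;
}

/*
 * Fill cpumap[] from a sysfs-style file. If the file cannot be read or
 * parsed, fall back to an identity map sized by sysconf().
 */
static int read_cpu_map_from(const char *path, int *cpumap, int max)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int n = -1;

	if (f) {
		if (fgets(line, sizeof(line), f))
			n = parse_cpu_list(line, cpumap, max);
		fclose(f);
	}
	if (n > 0)
		return n;

	long online = sysconf(_SC_NPROCESSORS_ONLN);
	if (online < 1)
		online = 1;
	if (online > max)
		online = max;
	for (n = 0; n < online; n++)
		cpumap[n] = n;		/* identity map: 0, 1, ..., N-1 */
	return n;
}
```

A caller would pass "/sys/devices/system/cpu/online" as the path and then use cpumap[i], rather than i itself, as the cpu argument when opening events.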
13 Jan, 2010
1 commit
-
The current pid option doesn't work for perf stat. Change it to behave
like perf record --pid.

Signed-off-by: Liming Wang
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Nov, 2009
1 commit
-
The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)

# 0.000 M/sec

This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:

Before:
[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1367818 branch-misses # 0.000 M/sec
1.001494325 seconds time elapsed
After (without task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1
Performance counter stats for 'sleep 1':
1135044 branch-misses
1.001370775 seconds time elapsed
After (with task-clock):
[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1
Performance counter stats for 'sleep 1':
1070111 branch-misses # 0.534 M/sec
2002.730893 task-clock-msecs # 1.999 CPUs

1.001640292 seconds time elapsed
Signed-off-by: Lucas De Marchi
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar
19 Oct, 2009
4 commits
-
Count branches first, cache-misses second. The reason is that
on x86 branches are not counted by all counters on all CPUs.

Before:
Performance counter stats for 'ls':
0.756653 task-clock-msecs # 0.802 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
250 page-faults # 0.330 M/sec
2375725 cycles # 3139.781 M/sec
1628129 instructions # 0.685 IPC
19643 cache-references # 25.960 M/sec
4608 cache-misses # 6.090 M/sec
342532 branches # 452.694 M/sec
branch-misses

0.000943356 seconds time elapsed
After:
Performance counter stats for 'ls':
1.056734 task-clock-msecs # 0.859 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
259 page-faults # 0.245 M/sec
3345932 cycles # 3166.295 M/sec
3074090 instructions # 0.919 IPC
616928 branches # 583.806 M/sec
39279 branch-misses # 6.367 %
21312 cache-references # 20.168 M/sec
3661 cache-misses # 3.464 M/sec

0.001230551 seconds time elapsed
(also prettify the printout of branch misses, in case it's
getting scaled.)

Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
---
tools/perf/builtin-stat.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 95a55ea..90e0a26 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -50,17 +50,17 @@
static struct perf_event_attr default_attrs[] = {
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
-
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
+
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
-
Clean up the array definition to be vertically aligned.
No functional effects.
Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
-
Adds performance event information about branches
and branch misses to the default output of perf stat.

Signed-off-by: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
When we count both branches and branch-misses it is useful to
print out the percentage of branch-misses:

# perf stat -e branches -e branch-misses /bin/true
Performance counter stats for '/bin/true':
401684 branches # 0.000 M/sec
23301 branch-misses # 5.801 %

Signed-off-by: Anton Blanchard
Cc: paulus@samba.org
Cc: a.p.zijlstra@chello.nl
LKML-Reference:
Signed-off-by: Ingo Molnar
05 Oct, 2009
1 commit
-
If we launch the child on behalf of the user, ensure that it dies
along with ourselves when we are interrupted.

Signed-off-by: Chris Wilson
Cc: Chris Wilson
LKML-Reference:
Signed-off-by: Ingo Molnar
22 Sep, 2009
1 commit
-
Before:
0 sched:sched_switch # nan M/sec
After:
0 sched:sched_switch # 0.000 M/sec
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
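The "nan M/sec" output in the commit above is what a 0/0 division prints. A guard of the following shape avoids it. This is only a sketch of the idea: the function name, the nanosecond unit of the runtime argument and the choice to return 0 are assumptions, not the actual change.

```c
#include <assert.h>

/*
 * Events per microsecond ("M/sec") from a raw count and a task-clock
 * runtime in nanoseconds. With a zero runtime the division would
 * produce NaN (for a zero count) or infinity, so return 0 instead.
 */
static double rate_m_per_sec(double count, double runtime_nsecs)
{
	assert(count >= 0.0);
	if (runtime_nsecs <= 0.0)
		return 0.0;
	return 1000.0 * count / runtime_nsecs;
}
```

With the figures from the 15 Nov, 2009 example (1070111 branch-misses over 2002.730893 msecs of task-clock) this gives roughly 0.534, in line with the output shown there.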
21 Sep, 2009
1 commit
-
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has outgrown its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in all, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks go to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian
Acked-by: Peter Zijlstra
Acked-by: Paul Mackerras
Reviewed-by: Arjan van de Ven
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
05 Sep, 2009
1 commit
-
Remove some, now useless, global storage.
Don't calculate the stddev when not needed.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
04 Sep, 2009
4 commits
-
Use the more advanced single pass variance algorithm outlined
on the wikipedia page. This is numerically more stable for
larger sample sets.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
When we're computing the mean by sampling the distribution,
then the std dev of the mean is related to the std dev of the
sample set by:

stddev_mean = std_dev / sqrt(N)

Which is exactly what we want.
This results in the error on the mean decreasing with
increasing number of samples.

Also fix the scaled == -1, aka not counted case.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since we don't need all the individual samples to calculate the
error remove both the limit and the storage overhead associated
with that.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
The current noise computation does:

\Sum abs(n_i - avg(n)) * N^-1.5

Which is (afaik) not a regular noise function, and needs the
complete sample set available to post-process.

Change this to use a regular stddev computation which can be
done by keeping two sums:

stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )

For which we only need to keep \Sum n_i and \Sum n_i^2.

Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Aug, 2009
1 commit
-
Librarize trace_event() helper so that perf trace can use it
too. Also clean up the debug.h includes a bit.

It's not good to have it included in perf.h because it doesn't
make it flexible against other headers it may need (headers
that can also depend on perf.h and then create a recursive
header dependency).

Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
12 Aug, 2009
1 commit
-
Factorize multiple definitions of high level dso helpers into the
symbol source file.

The side effect is a general export of the verbose and eprintf
debugging helpers into a new file dedicated to debugging purposes.

Signed-off-by: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Brice Goglin