Eric Lee / smarc-fsl-linux-kernel

23 Jan, 2011

1 commit

9486aa387 perf tools: Fix 64 bit integer format strings ... Browse Code »

Using %L[uxd] has issues in some architectures, like on ppc64. Fix it
by making our 64 bit integers typedefs of stdint.h types and using
PRI[ux]64 like, for instance, git does.

Reported by Denis Kirjanov that provided a patch for one case, I went
and changed all cases.

Reported-by: Denis Kirjanov
Tested-by: Denis Kirjanov
LKML-Reference:
Cc: Denis Kirjanov
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Pingtian Han
Cc: Stephane Eranian
Cc: Tom Zanussi
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-23 09:41:57 +0800

11 Jan, 2011

1 commit

bd3bfe9ed perf evsel: Fix order of event list deletion ... Browse Code »

We need to defer calling perf_evsel_list__delete() till after atexit
registered routines, because we need to traverse the events being
recorded at that time at least on 'perf record'.

This fixes the problem reported by Thomas Renninger where cmd_record
called by cmd_timechart would not write the tracing data to the perf.data
file header because the evsel_list at atexit (control+C on 'perf timechart
record') time would be empty, being already deleted by run_builtin(),
and thus 'perf timechart' when trying to process such perf.data file would
die with:

"no trace data in the file"

Problem introduced in 70d544d.

Reported-by: Thomas Renninger
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Renninger
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-11 22:51:03 +0800

10 Jan, 2011

1 commit

5a3446bc6 perf stat: better error message for unsupported events ... Browse Code »

For unsupported events (e.g., H/W events when running in a VM)
perf stat currently fails with the error message:

Error: open_counter returned with 2 (No such file or directory).
/bin/dmesg may provide additional information.

Fatal: Not all events could be opened.

dmesg is of no help and it is not clear as to why it fails to
open the counter. This patch changes the error message to

Error: cache-misses event is not supported.
Fatal: Not all events could be opened.

Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: a.p.zijlstra@chello.nl
LPU-Reference:
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo

David Ahern
2011-01-10 21:34:53 +0800

07 Jan, 2011

1 commit

23a2f3ab4 perf tools: Pass whole attr to event selectors ... Browse Code »

Since commit 69aad6f1(perf tools: Introduce event selectors), only
perf_event_attr::type and ::config are passed to event selector, which
makes perf tool not work correctly.

For example, PEBS does not work because perf_event_attr::precise_ip is
not passed to the syscall.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Lin Ming
Signed-off-by: Arnaldo Carvalho de Melo

Lin Ming
2011-01-07 11:44:36 +0800

04 Jan, 2011

8 commits

86bd5e860 perf evsel: Use {cpu,thread}_map to shorten list of parameters ... Browse Code »

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 10:24:36 +0800
5c98d466e perf tools: Refactor all_tids to hold nr and the map ... Browse Code »

So that later, we can pass the thread_map instance instead of
(thread_num, thread_map) for things like perf_evsel__open and friends,
just like was done with cpu_map.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 10:24:16 +0800
60d567e2d perf tools: Refactor cpumap to hold nr and the map ... Browse Code »

So that later, we can pass the cpu_map instance instead of (nr_cpus, cpu_map)
for things like perf_evsel__open and friends.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 10:23:55 +0800
48290609c perf evsel: Introduce per cpu and per thread open helpers ... Browse Code »

Abstracting away the loops needed to create the various event fd handlers.

The users have to pass a confiruged perf->evsel.attr field, which is already
usable after perf_evsel__new (constructor) time, using defaults.

Comes out of the ad-hoc routines in builtin-stat, that now uses it.

Fixed a small silly bug where we were die()ing before killing our
children, dysfunctional family this one 8-)

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 10:23:27 +0800
c52b12ed2 perf evsel: Steal the counter reading routines from stat ... Browse Code »

Making them hopefully generic enough to be used in 'perf test',
well see.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 10:22:55 +0800
70d544d05 perf evsel: Delete the event selectors at exit ... Browse Code »

Freeing all the possibly allocated resources, reducing complexity
on each tool exit path.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 02:51:39 +0800
daec78a09 perf evsel: Adopt MATCH_EVENT macro from 'stat' ... Browse Code »

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 02:49:44 +0800
69aad6f1e perf tools: Introduce event selectors ... Browse Code »

Out of ad-hoc code and global arrays with hard coded sizes.

This is the first step on having a library that will be first
used on regression tests in the 'perf test' tool.

[acme@felicio linux]$ size /tmp/perf.before
text data bss dec hex filename
1273776 97384 5104416 6475576 62cf38 /tmp/perf.before
[acme@felicio linux]$ size /tmp/perf.new
text data bss dec hex filename
1275422 97416 1392416 2765254 2a31c6 /tmp/perf.new

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2011-01-04 02:39:04 +0800

02 Dec, 2010

2 commits

d7470b6af perf stat: Add csv-style output ... Browse Code »

This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.

Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses

Works also in non-aggregated mode:

$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses

This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.

Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.

Cc: David S. Miller
Cc: Frederik Weisbecker
Cc: Ingo Molnar
Cc: paulus@samba.org
Cc: Peter Zijlstra
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-12-02 05:47:41 +0800
201e0b06e perf stat: Use --big-num format by default ... Browse Code »

[acme@mica linux]$ perf stat ls > /dev/null

Performance counter stats for 'ls':

1.512532 task-clock-msecs # 0.801 CPUs
2 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
241 page-faults # 0.159 M/sec
2,973,331 cycles # 1965.797 M/sec
1,460,802 instructions # 0.491 IPC
314,642 branches # 208.023 M/sec
18,475 branch-misses # 5.872 %
cache-references
cache-misses

0.001887676 seconds time elapsed

To get the previous behaviour just use --no-big-num:

[acme@mica linux]$ perf stat --no-big-num ls > /dev/null

Performance counter stats for 'ls':

1.468014 task-clock-msecs # 0.795 CPUs
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
241 page-faults # 0.164 M/sec
2900254 cycles # 1975.631 M/sec
1437991 instructions # 0.496 IPC
310905 branches # 211.786 M/sec
17912 branch-misses # 5.761 %
cache-references
cache-misses

0.001845435 seconds time elapsed

[acme@mica linux]$

Suggested-by: Ingo Molnar
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Stephane Eranian
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2010-12-02 04:22:50 +0800

20 Nov, 2010

2 commits

d9cf837ef perf stat: Change and clean up sys_perf_event_open error handling ... Browse Code »

This patch makes several changes to "perf stat":

- "perf stat" will no longer go ahead and run the application when one or
more of the specified events could not be opened.
- Use error() and die() instead of pr_err() so that the output is more
consistent with "perf top" and "perf record".
- Handle permission errors in a more robust way, and in a similar way to
"perf record" and "perf top".

In addition, the sys_perf_event_open() error handling of "perf top" and "perf
record" is made more consistent and adds the following phrase when an event
doesn't open (with something ther than an access or permission error):

"/bin/dmesg may provide additional information."

This is added because kernel code doesn't have a good way of expressing
detailed errors to user space, so its only avenue is to use printk's. However,
many users may not think of looking at dmesg to find out why an event is being
rejected.

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Ian Munsie
Cc: Michael Ellerman
LKML-Reference:
Signed-off-by: Corey Ashford
Signed-off-by: Arnaldo Carvalho de Melo

Corey Ashford
2010-11-20 23:04:15 +0800
f5b4a9c3a perf stat: Add no-aggregation mode to -a ... Browse Code »

This patch adds a new -A option to perf stat. If specified then perf stat does
not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
using -a. This option is not supported in per-thread mode.

Being able to get a per-cpu breakdown is useful to detect imbalances between
CPUs when running a uniform workload than spans all monitored CPUs.

The second version corrects the missing cpumap[] support, so that it works when
the -C option is used.

The third version fixes a missing cpumap[] in print_counter() and removes a
stray patch in builtin-trace.c.

Examples on a 4-way system:

# perf stat -a -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
9592808135 cycles
3490380006 instructions # 0.364 IPC
1.001584632 seconds time elapsed

# perf stat -a -A -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
CPU0 2398163767 cycles
CPU1 2398180817 cycles
CPU2 2398217115 cycles
CPU3 2398247483 cycles
CPU0 872282046 instructions # 0.364 IPC
CPU1 873481776 instructions # 0.364 IPC
CPU2 872638127 instructions # 0.364 IPC
CPU3 872437789 instructions # 0.364 IPC
1.001556052 seconds time elapsed

Cc: David S. Miller
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-11-20 02:16:53 +0800

05 Jun, 2010

1 commit

c45c6ea2e perf tools: Add the ability to specify list of cpus to monitor ... Browse Code »

This patch adds a -C option to stat, record, top to designate a list of CPUs to
monitor. CPUs can be specified as a comma-separated list or ranges, no space
allowed.

Examples:
$ perf record -a -C0-1,4-7 sleep 1
$ perf top -C0-4
$ perf stat -a -C1,2,3,4 sleep 1

With perf record in per-thread mode with inherit mode on, samples are collected
only when the thread runs on the designated CPUs.

The -C option does not turn on system-wide mode automatically.

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-06-05 20:33:01 +0800

19 May, 2010

1 commit

5af52b51f perf stat: add perf stat -B to pretty print large numbers ... Browse Code »

It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.

$ perf stat noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':

1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)

2.001393933 seconds time elapsed

$ perf stat -B noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':

1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)

2.001391016 seconds time elapsed

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-05-19 10:03:22 +0800

14 May, 2010

1 commit

2e6cdf996 perf tools: change event inheritance logic in stat and record ... Browse Code »

By default, event inheritance across fork and pthread_create was on but the -i
option of stat and record, which enabled inheritance, led to believe it was off
by default.

This patch fixes this logic by inverting the meaning of the -i option. By
default inheritance is on whether you attach to a process (-p), a thread (-t)
or start a process. If you pass -i, then you turn off inheritance. Turning off
inheritance if you don't need it, helps limit perf resource usage as well.

The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
start the counters.

Acked-by: Frederic Weisbecker
Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-05-14 03:39:12 +0800

14 Apr, 2010

1 commit

c05556421 perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() ... Browse Code »

Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.

This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).

I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.

Signed-off-by: Ian Munsie
Acked-by: David S. Miller
Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
Cc: Git development list
Cc: Ian Munsie
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: KOSAKI Motohiro
Cc: Hitoshi Mitake
Cc: Rusty Russell
Cc: Frederic Weisbecker
Cc: Eric B Munson
Cc: Valdis.Kletnieks@vt.edu
Cc: WANG Cong
Cc: Thiago Farina
Cc: Masami Hiramatsu
Cc: Xiao Guangrong
Cc: Jaswinder Singh Rajput
Cc: Arjan van de Ven
Cc: OGAWA Hirofumi
Cc: Mike Galbraith
Cc: Tom Zanussi
Cc: Anton Blanchard
Cc: John Kacur
Cc: Li Zefan
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar

Ian Munsie
2010-04-14 17:26:44 +0800

23 Mar, 2010

1 commit

084ab9f86 perf stat: Better report failure to collect system wide stats ... Browse Code »

Before:

[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s

Performance counter stats for 'sleep 1s':

task-clock-msecs
context-switches
CPU-migrations
page-faults
cycles
instructions
branches
branch-misses
cache-references
cache-misses

1.016998463 seconds time elapsed

[acme@doppio linux-2.6-tip]$

Now:

[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
No permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid.
[acme@doppio linux-2.6-tip]$

Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar

Arnaldo Carvalho de Melo
2010-03-23 01:47:35 +0800

18 Mar, 2010

2 commits

d6d901c23 perf events: Change perf parameter --pid to process-wide collection instead of thread-wide ... Browse Code »

Parameter --pid (or -p) of perf currently means a thread-wide
collection. For exmaple, if a process whose id is 8888 has 10
threads, 'perf top -p 8888' just collects the main thread
statistics. That's misleading. Users are used to attach a whole
process when debugging a process by gdb. To follow normal usage
style, the patch change --pid to process-wide collection and add
--tid (-t) to mean a thread-wide collection.

Usage example is:

# perf top -p 8888
# perf record -p 8888 -f sleep 10
# perf stat -p 8888 -f sleep 10

Above commands collect the statistics of all threads of process
8888.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Cc: zhiteng.huang@intel.com
Cc: Zachary Amsden
LKML-Reference:
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2010-03-18 23:21:12 +0800
6be2850ef perf stat: Enable counters when collecting process-wide or system-wide data ... Browse Code »

Command 'perf stat' doesn't enable counters when collecting an
existing (by -p) process or system-wide statistics. Fix the
issue.

Change the condition of fork/exec subcommand. If there is a
subcommand parameter, perf always forks/execs it. The usage
example is:

# perf stat -a sleep 10

So this command could collect statistics for 10 seconds
precisely. User still could stop it by CTRL+C. Without the new
capability, user could only use CTRL+C to stop it without
precise time clock.

Another issue is 'perf stat -a' consumes 100% time of a full
single logical cpu. It has a bad impact on running workload.

Fix it by adding a sleep(1) in the while(!done) loop in function
run_perf_stat.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Marcelo Tosatti
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Gleb Natapov
Cc: Zachary Amsden
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2010-03-18 23:21:11 +0800

11 Mar, 2010

1 commit

a12b51c47 perf tools: Fix sparse CPU numbering related bugs ... Browse Code »

At present, the perf subcommands that do system-wide monitoring
(perf stat, perf record and perf top) don't work properly unless
the online cpus are numbered 0, 1, ..., N-1. These tools ask
for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
and then try to create events for cpus 0, 1, ..., N-1.

This creates problems for systems where the online cpus are
numbered sparsely. For example, a POWER6 system in
single-threaded mode (i.e. only running 1 hardware thread per
core) will have only even-numbered cpus online.

This fixes the problem by reading the /sys/devices/system/cpu/online
file to find out which cpus are online. The code that does that is in
tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
function that sets up a cpumap[] array and returns the number of
online cpus. If /sys/devices/system/cpu/online can't be read or
can't be parsed successfully, it falls back to using sysconf to
ask how many cpus are online and sets up an identity map in cpumap[].

The perf record, perf stat and perf top code then calls
read_cpu_map() in the system-wide monitoring case (instead of
sysconf) and uses cpumap[] to get the cpu numbers to pass to
perf_event_open.

Signed-off-by: Paul Mackerras
Cc: Anton Blanchard
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul Mackerras
2010-03-11 20:36:53 +0800

13 Jan, 2010

1 commit

60666c630 perf tools: Fix --pid option for stat ... Browse Code »

current pid option doesn't work for perf stat. Change it to what
perf record --pid acts as.

Signed-off-by: Liming Wang
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Liming Wang
2010-01-13 17:09:08 +0800

15 Nov, 2009

1 commit

7255fe2a4 perf stat: Do not print ratio when task-clock event is not counted ... Browse Code »

The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)

# 0.000 M/sec

This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:

Before:

[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1

Performance counter stats for 'sleep 1':

1367818 branch-misses # 0.000 M/sec

1.001494325 seconds time elapsed

After (without task-clock):

[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1

Performance counter stats for 'sleep 1':

1135044 branch-misses

1.001370775 seconds time elapsed

After (with task-clock):

[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1

Performance counter stats for 'sleep 1':

1070111 branch-misses # 0.534 M/sec
2002.730893 task-clock-msecs # 1.999 CPUs

1.001640292 seconds time elapsed

Signed-off-by: Lucas De Marchi
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Lucas De Marchi
2009-11-15 22:25:50 +0800

19 Oct, 2009

4 commits

dd86e72ab perf stat: Count branches first ... Browse Code »

Count branches first, cache-misses second. The reason is that
on x86 branches are not counted by all counters on all CPUs.

Before:

Performance counter stats for 'ls':

0.756653 task-clock-msecs # 0.802 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
250 page-faults # 0.330 M/sec
2375725 cycles # 3139.781 M/sec
1628129 instructions # 0.685 IPC
19643 cache-references # 25.960 M/sec
4608 cache-misses # 6.090 M/sec
342532 branches # 452.694 M/sec
branch-misses

0.000943356 seconds time elapsed

After:

Performance counter stats for 'ls':

1.056734 task-clock-msecs # 0.859 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
259 page-faults # 0.245 M/sec
3345932 cycles # 3166.295 M/sec
3074090 instructions # 0.919 IPC
616928 branches # 583.806 M/sec
39279 branch-misses # 6.367 %
21312 cache-references # 20.168 M/sec
3661 cache-misses # 3.464 M/sec

0.001230551 seconds time elapsed

(also prettify the printout of branch misses, in case it's
getting scaled.)

Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};
---
tools/perf/builtin-stat.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 95a55ea..90e0a26 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -50,17 +50,17 @@

static struct perf_event_attr default_attrs[] = {

- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
-
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
+
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};

Ingo Molnar
2009-10-19 19:36:32 +0800
56aab464f perf stat: Re-align the default_attrs[] array ... Browse Code »

Clean up the array definition to be vertically aligned.

No functional effects.

Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};

Ingo Molnar
2009-10-19 19:27:08 +0800
12133afff perf stat: Add branch performance events to default output ... Browse Code »

Adds performance event information about branches
and branch misses to the default output of perf stat.

Signed-off-by: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Tim Blechmann
2009-10-19 19:26:42 +0800
11018201b perf stat: Add branch performance metric ... Browse Code »

When we count both branches and branch-misses it is useful to
print out the percentage of branch-misses:

# perf stat -e branches -e branch-misses /bin/true

Performance counter stats for '/bin/true':

401684 branches # 0.000 M/sec
23301 branch-misses # 5.801 %

Signed-off-by: Anton Blanchard
Cc: paulus@samba.org
Cc: a.p.zijlstra@chello.nl
LKML-Reference:
Signed-off-by: Ingo Molnar

Anton Blanchard
2009-10-19 15:20:21 +0800

05 Oct, 2009

1 commit

933da83aa perf: Propagate term signal to child ... Browse Code »

If we launch the child on behalf of the user, ensure that it dies
along with ourselves when we are interrupted.

Signed-off-by: Chris Wilson
Cc: Chris Wilson
LKML-Reference:
Signed-off-by: Ingo Molnar

Chris Wilson
2009-10-05 01:37:39 +0800

22 Sep, 2009

1 commit

c7f7fea30 perf stat: Fix zero total printouts ... Browse Code »

Before:

0 sched:sched_switch # nan M/sec

After:

0 sched:sched_switch # 0.000 M/sec

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-22 21:01:47 +0800

21 Sep, 2009

1 commit

cdd6c482c perf: Do the big rename: Performance Counters -> Performance Events ... Browse Code »

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)

sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian
Acked-by: Peter Zijlstra
Acked-by: Paul Mackerras
Reviewed-by: Arjan van de Ven
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-21 20:28:04 +0800

05 Sep, 2009

1 commit

849abde92 perf stat: Clean up statistics calculations a bit more ... Browse Code »

Remove some, now useless, global storage.
Don't calculate the stddev when not needed.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-05 02:27:26 +0800

04 Sep, 2009

4 commits

8a02631a4 perf stat: More advanced variance computation ... Browse Code »

Use the more advanced single pass variance algorithm outlined
on the wikipedia page. This is numerically more stable for
larger sample sets.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 23:38:15 +0800
63d40deb2 perf stat: Use stddev_mean in stead of stddev ... Browse Code »

When we're computing the mean by sampling the distribution,
then the std dev of the mean is related to the std dev of the
sample set by:

stddev_mean = std_dev / sqrt(N)

Which is exactly what we want.

This results in the error on the mean decreasing with
increasing number of samples.

Also fix the scaled == -1, aka not counted case.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 23:38:14 +0800
9e9772c45 perf stat: Remove the limit on repeat ... Browse Code »

Since we don't need all the individual samples to calculate the
error remove both the limit and the storage overhead associated
with that.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 22:33:08 +0800
506d4bc8d perf stat: Change noise calculation to use stddev ... Browse Code »

The current noise computation does:

\Sum abs(n_i - avg(n)) * N^-1.5

Which is (afaik) not a regular noise function, and needs the
complete sample set available to post-process.

Change this to use a regular stddev computation which can be
done by keeping a two sums:

stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )

For which we only need to keep \Sum n_i and \Sum n_i^2.

Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 22:33:07 +0800

17 Aug, 2009

1 commit

8f28827a1 perf tools: Librarize trace_event() helper ... Browse Code »

Librarize trace_event() helper so that perf trace can use it
too. Also clean up the debug.h includes a bit.

It's not good to have it included in perf.h because it doesn't
make it flexible against other headers it may need (headers
that can also depend on perf.h and then create a recursive
header dependency).

Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2009-08-17 05:06:45 +0800

12 Aug, 2009

1 commit

cd84c2ac6 perf tools: Factorize high level dso helpers ... Browse Code »

Factorize multiple definitions of high level dso helpers into the
symbol source file.

The side effect is a general export of the verbose and eprintf
debugging helpers into a new file dedicated to debugging purposes.

Signed-off-by: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Brice Goglin

Frederic Weisbecker
2009-08-12 18:02:38 +0800