Eric Lee / smarc-fsl-linux-kernel

19 May, 2010

1 commit

5af52b51f perf stat: add perf stat -B to pretty print large numbers ... Browse Code »

It is hard to read very large numbers so provide an option to perf stat
to separate thousands using a separator. The patch leverages the locale
support of stdio. You need to set your LC_NUMERIC appropriately, for
instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this
feature. This way existing scripts parsing the output do not need to be
changed. Here is an example.

$ perf stat noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':

1998.347031 task-clock-msecs # 0.998 CPUs
61 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
118 page-faults # 0.000 M/sec
4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%)
2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%)
2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%)
40,267 branch-misses # 0.002 % (scaled from 30.04%)
2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%)
53,725 cache-misses # 0.027 M/sec (scaled from 30.02%)

2.001393933 seconds time elapsed

$ perf stat -B noploop 2
noploop for 2 seconds

Performance counter stats for 'noploop 2':

1998.297883 task-clock-msecs # 0.998 CPUs
59 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
119 page-faults # 0.000 M/sec
4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%)
2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%)
2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%)
25,650 branch-misses # 0.001 % (scaled from 30.05%)
2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%)
47,097 cache-misses # 0.024 M/sec (scaled from 30.02%)

2.001391016 seconds time elapsed

Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-05-19 10:03:22 +0800

14 May, 2010

1 commit

2e6cdf996 perf tools: change event inheritance logic in stat and record ... Browse Code »

By default, event inheritance across fork and pthread_create was on but the -i
option of stat and record, which enabled inheritance, led to believe it was off
by default.

This patch fixes this logic by inverting the meaning of the -i option. By
default inheritance is on whether you attach to a process (-p), a thread (-t)
or start a process. If you pass -i, then you turn off inheritance. Turning off
inheritance if you don't need it, helps limit perf resource usage as well.

The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not
start the counters.

Acked-by: Frederic Weisbecker
Cc: David S. Miller
Cc: Frédéric Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo

Stephane Eranian
2010-05-14 03:39:12 +0800

14 Apr, 2010

1 commit

c05556421 perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() ... Browse Code »

Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.

This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).

I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.

Signed-off-by: Ian Munsie
Acked-by: David S. Miller
Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
Cc: Git development list
Cc: Ian Munsie
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: KOSAKI Motohiro
Cc: Hitoshi Mitake
Cc: Rusty Russell
Cc: Frederic Weisbecker
Cc: Eric B Munson
Cc: Valdis.Kletnieks@vt.edu
Cc: WANG Cong
Cc: Thiago Farina
Cc: Masami Hiramatsu
Cc: Xiao Guangrong
Cc: Jaswinder Singh Rajput
Cc: Arjan van de Ven
Cc: OGAWA Hirofumi
Cc: Mike Galbraith
Cc: Tom Zanussi
Cc: Anton Blanchard
Cc: John Kacur
Cc: Li Zefan
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar

Ian Munsie
2010-04-14 17:26:44 +0800

23 Mar, 2010

1 commit

084ab9f86 perf stat: Better report failure to collect system wide stats ... Browse Code »

Before:

[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s

Performance counter stats for 'sleep 1s':

task-clock-msecs
context-switches
CPU-migrations
page-faults
cycles
instructions
branches
branch-misses
cache-references
cache-misses

1.016998463 seconds time elapsed

[acme@doppio linux-2.6-tip]$

Now:

[acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s
No permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid.
[acme@doppio linux-2.6-tip]$

Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar

Arnaldo Carvalho de Melo
2010-03-23 01:47:35 +0800

18 Mar, 2010

2 commits

d6d901c23 perf events: Change perf parameter --pid to process-wide collection instead of thread-wide ... Browse Code »

Parameter --pid (or -p) of perf currently means a thread-wide
collection. For exmaple, if a process whose id is 8888 has 10
threads, 'perf top -p 8888' just collects the main thread
statistics. That's misleading. Users are used to attach a whole
process when debugging a process by gdb. To follow normal usage
style, the patch change --pid to process-wide collection and add
--tid (-t) to mean a thread-wide collection.

Usage example is:

# perf top -p 8888
# perf record -p 8888 -f sleep 10
# perf stat -p 8888 -f sleep 10

Above commands collect the statistics of all threads of process
8888.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Cc: zhiteng.huang@intel.com
Cc: Zachary Amsden
LKML-Reference:
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2010-03-18 23:21:12 +0800
6be2850ef perf stat: Enable counters when collecting process-wide or system-wide data ... Browse Code »

Command 'perf stat' doesn't enable counters when collecting an
existing (by -p) process or system-wide statistics. Fix the
issue.

Change the condition of fork/exec subcommand. If there is a
subcommand parameter, perf always forks/execs it. The usage
example is:

# perf stat -a sleep 10

So this command could collect statistics for 10 seconds
precisely. User still could stop it by CTRL+C. Without the new
capability, user could only use CTRL+C to stop it without
precise time clock.

Another issue is 'perf stat -a' consumes 100% time of a full
single logical cpu. It has a bad impact on running workload.

Fix it by adding a sleep(1) in the while(!done) loop in function
run_perf_stat.

Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Marcelo Tosatti
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Gleb Natapov
Cc: Zachary Amsden
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2010-03-18 23:21:11 +0800

11 Mar, 2010

1 commit

a12b51c47 perf tools: Fix sparse CPU numbering related bugs ... Browse Code »

At present, the perf subcommands that do system-wide monitoring
(perf stat, perf record and perf top) don't work properly unless
the online cpus are numbered 0, 1, ..., N-1. These tools ask
for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
and then try to create events for cpus 0, 1, ..., N-1.

This creates problems for systems where the online cpus are
numbered sparsely. For example, a POWER6 system in
single-threaded mode (i.e. only running 1 hardware thread per
core) will have only even-numbered cpus online.

This fixes the problem by reading the /sys/devices/system/cpu/online
file to find out which cpus are online. The code that does that is in
tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
function that sets up a cpumap[] array and returns the number of
online cpus. If /sys/devices/system/cpu/online can't be read or
can't be parsed successfully, it falls back to using sysconf to
ask how many cpus are online and sets up an identity map in cpumap[].

The perf record, perf stat and perf top code then calls
read_cpu_map() in the system-wide monitoring case (instead of
sysconf) and uses cpumap[] to get the cpu numbers to pass to
perf_event_open.

Signed-off-by: Paul Mackerras
Cc: Anton Blanchard
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul Mackerras
2010-03-11 20:36:53 +0800

13 Jan, 2010

1 commit

60666c630 perf tools: Fix --pid option for stat ... Browse Code »

current pid option doesn't work for perf stat. Change it to what
perf record --pid acts as.

Signed-off-by: Liming Wang
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Liming Wang
2010-01-13 17:09:08 +0800

15 Nov, 2009

1 commit

7255fe2a4 perf stat: Do not print ratio when task-clock event is not counted ... Browse Code »

The ratio between the number of events and the time elapsed makes
sense only if task-clock event is counted. Otherwise it will be
simply a (confusing)

# 0.000 M/sec

This patch outputs the ratio only if task-clock event is counted.
Some test examples of before and after:

Before:

[lucas@skywalker linux.trees.git]$ sudo perf stat -e branch-misses -a -- sleep 1

Performance counter stats for 'sleep 1':

1367818 branch-misses # 0.000 M/sec

1.001494325 seconds time elapsed

After (without task-clock):

[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -a -- sleep 1

Performance counter stats for 'sleep 1':

1135044 branch-misses

1.001370775 seconds time elapsed

After (with task-clock):

[lucas@skywalker perf]$ sudo ./perf stat -e branch-misses -e task-clock -a -- sleep 1

Performance counter stats for 'sleep 1':

1070111 branch-misses # 0.534 M/sec
2002.730893 task-clock-msecs # 1.999 CPUs

1.001640292 seconds time elapsed

Signed-off-by: Lucas De Marchi
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Lucas De Marchi
2009-11-15 22:25:50 +0800

19 Oct, 2009

4 commits

dd86e72ab perf stat: Count branches first ... Browse Code »

Count branches first, cache-misses second. The reason is that
on x86 branches are not counted by all counters on all CPUs.

Before:

Performance counter stats for 'ls':

0.756653 task-clock-msecs # 0.802 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
250 page-faults # 0.330 M/sec
2375725 cycles # 3139.781 M/sec
1628129 instructions # 0.685 IPC
19643 cache-references # 25.960 M/sec
4608 cache-misses # 6.090 M/sec
342532 branches # 452.694 M/sec
branch-misses

0.000943356 seconds time elapsed

After:

Performance counter stats for 'ls':

1.056734 task-clock-msecs # 0.859 CPUs
0 context-switches # 0.000 M/sec
0 CPU-migrations # 0.000 M/sec
259 page-faults # 0.245 M/sec
3345932 cycles # 3166.295 M/sec
3074090 instructions # 0.919 IPC
616928 branches # 583.806 M/sec
39279 branch-misses # 6.367 %
21312 cache-references # 20.168 M/sec
3661 cache-misses # 3.464 M/sec

0.001230551 seconds time elapsed

(also prettify the printout of branch misses, in case it's
getting scaled.)

Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};
---
tools/perf/builtin-stat.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 95a55ea..90e0a26 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -50,17 +50,17 @@

static struct perf_event_attr default_attrs[] = {

- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES},
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
- { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
-
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
+
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};

Ingo Molnar
2009-10-19 19:36:32 +0800
56aab464f perf stat: Re-align the default_attrs[] array ... Browse Code »

Clean up the array definition to be vertically aligned.

No functional effects.

Cc: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
---
tools/perf/builtin-stat.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c373683..95a55ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,8 @@ static struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES},
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },

};

Ingo Molnar
2009-10-19 19:27:08 +0800
12133afff perf stat: Add branch performance events to default output ... Browse Code »

Adds performance event information about branches
and branch misses to the default output of perf stat.

Signed-off-by: Tim Blechmann
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Tim Blechmann
2009-10-19 19:26:42 +0800
11018201b perf stat: Add branch performance metric ... Browse Code »

When we count both branches and branch-misses it is useful to
print out the percentage of branch-misses:

# perf stat -e branches -e branch-misses /bin/true

Performance counter stats for '/bin/true':

401684 branches # 0.000 M/sec
23301 branch-misses # 5.801 %

Signed-off-by: Anton Blanchard
Cc: paulus@samba.org
Cc: a.p.zijlstra@chello.nl
LKML-Reference:
Signed-off-by: Ingo Molnar

Anton Blanchard
2009-10-19 15:20:21 +0800

05 Oct, 2009

1 commit

933da83aa perf: Propagate term signal to child ... Browse Code »

If we launch the child on behalf of the user, ensure that it dies
along with ourselves when we are interrupted.

Signed-off-by: Chris Wilson
Cc: Chris Wilson
LKML-Reference:
Signed-off-by: Ingo Molnar

Chris Wilson
2009-10-05 01:37:39 +0800

22 Sep, 2009

1 commit

c7f7fea30 perf stat: Fix zero total printouts ... Browse Code »

Before:

0 sched:sched_switch # nan M/sec

After:

0 sched:sched_switch # 0.000 M/sec

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-22 21:01:47 +0800

21 Sep, 2009

1 commit

cdd6c482c perf: Do the big rename: Performance Counters -> Performance Events ... Browse Code »

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)

sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian
Acked-by: Peter Zijlstra
Acked-by: Paul Mackerras
Reviewed-by: Arjan van de Ven
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-21 20:28:04 +0800

05 Sep, 2009

1 commit

849abde92 perf stat: Clean up statistics calculations a bit more ... Browse Code »

Remove some, now useless, global storage.
Don't calculate the stddev when not needed.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-05 02:27:26 +0800

04 Sep, 2009

4 commits

8a02631a4 perf stat: More advanced variance computation ... Browse Code »

Use the more advanced single pass variance algorithm outlined
on the wikipedia page. This is numerically more stable for
larger sample sets.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 23:38:15 +0800
63d40deb2 perf stat: Use stddev_mean in stead of stddev ... Browse Code »

When we're computing the mean by sampling the distribution,
then the std dev of the mean is related to the std dev of the
sample set by:

stddev_mean = std_dev / sqrt(N)

Which is exactly what we want.

This results in the error on the mean decreasing with
increasing number of samples.

Also fix the scaled == -1, aka not counted case.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 23:38:14 +0800
9e9772c45 perf stat: Remove the limit on repeat ... Browse Code »

Since we don't need all the individual samples to calculate the
error remove both the limit and the storage overhead associated
with that.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 22:33:08 +0800
506d4bc8d perf stat: Change noise calculation to use stddev ... Browse Code »

The current noise computation does:

\Sum abs(n_i - avg(n)) * N^-1.5

Which is (afaik) not a regular noise function, and needs the
complete sample set available to post-process.

Change this to use a regular stddev computation which can be
done by keeping a two sums:

stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )

For which we only need to keep \Sum n_i and \Sum n_i^2.

Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-09-04 22:33:07 +0800

17 Aug, 2009

1 commit

8f28827a1 perf tools: Librarize trace_event() helper ... Browse Code »

Librarize trace_event() helper so that perf trace can use it
too. Also clean up the debug.h includes a bit.

It's not good to have it included in perf.h because it doesn't
make it flexible against other headers it may need (headers
that can also depend on perf.h and then create a recursive
header dependency).

Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2009-08-17 05:06:45 +0800

12 Aug, 2009

1 commit

cd84c2ac6 perf tools: Factorize high level dso helpers ... Browse Code »

Factorize multiple definitions of high level dso helpers into the
symbol source file.

The side effect is a general export of the verbose and eprintf
debugging helpers into a new file dedicated to debugging purposes.

Signed-off-by: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Brice Goglin

Frederic Weisbecker
2009-08-12 18:02:38 +0800

09 Aug, 2009

1 commit

b26bc5a7f perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale ... Browse Code »

We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.

Signed-off-by: Brice Goglin
Cc: Peter Zijlstra
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar

Brice Goglin
2009-08-09 18:54:37 +0800

23 Jul, 2009

1 commit

a0541234f perf_counter: Improve perf stat and perf record option parsing ... Browse Code »

perf stat and perf record currently look for all options on the command
line. This can lead to some confusion:

# perf stat ls -l
Error: unknown switch `l'

While we can work around this by adding '--' before the command, the git
option parsing code can stop at the first non option:

# perf stat ls -l
Performance counter stats for 'ls -l':
....

Signed-off-by: Anton Blanchard
Signed-off-by: Peter Zijlstra
LKML-Reference:

Anton Blanchard
2009-07-23 00:05:56 +0800

02 Jul, 2009

1 commit

a92bef0f2 perf stat: Handle pipe read failures in perf stat ... Browse Code »

Building builtin-stat.c reports the following errors:

cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Erreur 1

This patch handles the possible pipe read failures.

Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Anton Blanchard
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2009-07-02 04:37:24 +0800

01 Jul, 2009

2 commits

b9ebdcc0c perf stat: Define MATCH_EVENT for easy attr checking ... Browse Code »

MATCH_EVENT is useful:

1. for multiple attrs checking
2. avoid repetition of PERF_TYPE_ and PERF_COUNT_ and save space
3. avoids line breakage

Signed-off-by: Jaswinder Singh Rajput
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Jaswinder Singh Rajput
2009-07-01 19:28:38 +0800
f37a291c5 perf_counter tools: Add more warnings and fix/annotate them ... Browse Code »

Enable -Wextra. This found a few real bugs plus a number
of signed/unsigned type mismatches/uncleanlinesses. It
also required a few annotations

All things considered it was still worth it so lets try with
this enabled for now.

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-07-01 18:49:48 +0800

30 Jun, 2009

3 commits

57e7986ed perf_counter: Provide a way to enable counters on exec ... Browse Code »

This provides a way to mark a counter to be enabled on the next
exec. This is useful for measuring the total activity of a
program without including overhead from the process that
launches it.

This also changes the perf stat command to use this new
facility.

Signed-off-by: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul Mackerras
2009-06-30 18:00:16 +0800
051ae7f73 perf_counter tools: Reduce perf stat measurement overhead/skew ... Browse Code »

Vince Weaver reported a 'perf stat' measurement overhead in the
count of retired instructions, which can amount to a +6000
instructions inflated count in the reported count.

At present, perf stat creates its counters on the perf process. Thus
the counters count the fork and various other activity in both the
parent and child, such as the resolver overhead for resolving PLT
entries for any libc functions that haven't been called before, such
as execvp.

This reduces the overhead by creating the counters on the child process
after the fork, using a couple of pipes to synchronize so that the
child process waits until the parent has created the counters before
doing the exec. To eliminate the PLT resolution overhead on calling
execvp, this does a dummy execvp first which will always fail.

With this, the overhead of executing a program goes down from over
4800 instructions to about 90 instructions on powerpc (32-bit).
This was measured with a statically-linked program written in
assembler which only does the 3 instructions needed to call _exit(0).

Before:

$ perf stat -e 0:1:u ./three

Performance counter stats for './three':

4858 instructions

0.001274523 seconds time elapsed

After:

$ perf stat -e 0:1:u ./three

Performance counter stats for './three':

92 instructions

0.000468153 seconds time elapsed

Reported-by: Vince Weaver
Signed-off-by: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul Mackerras
2009-06-30 04:38:09 +0800
210ad39fb perf stat: Use percentages for scaling output ... Browse Code »

Peter expressed a strong preference for percentage based
display of scaled values - so revert to that from the
recently introduced multiplication-factor unit.

Reported-by: Peter Zijlstra
Cc: Jaswinder Singh Rajput
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-06-30 03:50:54 +0800

28 Jun, 2009

2 commits

c3043569d perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run ... Browse Code »

Set attrs and nr_counters if no event is selected and !null_run.

Setting of attrs should depend on number of counters,
so we need to memcpy only for sizeof(default_attrs)

Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
hardcoded value.

Signed-off-by: Jaswinder Singh Rajput
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Jaswinder Singh Rajput
2009-06-28 21:22:47 +0800
6e750a8fc perf stat: Improve output ... Browse Code »

Increase size for event name to handle bigger names like
'L1-d$-prefetch-misses'

Changed scaled counters from percentage to a multiplicative
factor because the latter is more expressive.

Also aligned the scaling factor, otherwise sometimes it looks
like:

384 iTLB-load-misses (4.74x scaled)
452029 branch-loads (8.00x scaled)
5892 branch-load-misses (20.39x scaled)
972315 iTLB-loads (3.24x scaled)

Before:
150708 L1-d$-stores (scaled from 23.57%)
428804 L1-d$-prefetches (scaled from 23.47%)
314446 L1-d$-prefetch-misses (scaled from 23.42%)
252626137 L1-i$-loads (scaled from 23.24%)
5297550 dTLB-load-misses (scaled from 23.96%)
106992392 branch-loads (scaled from 23.67%)
5239561 branch-load-misses (scaled from 23.43%)

After:
1731713 L1-d$-loads ( 14.25x scaled)
44241 L1-d$-prefetches ( 3.88x scaled)
21076 L1-d$-prefetch-misses ( 3.40x scaled)
5789421 L1-i$-loads ( 3.78x scaled)
29645 dTLB-load-misses ( 2.95x scaled)
461474 branch-loads ( 6.52x scaled)
7493 branch-load-misses ( 26.57x scaled)

Reported-by: Ingo Molnar
Signed-off-by: Jaswinder Singh Rajput
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Jaswinder Singh Rajput
2009-06-28 00:39:41 +0800

27 Jun, 2009

2 commits

566747e62 perf stat: Fix multi-run stats ... Browse Code »

In multi-run (-r/--repeat) printouts, print out the noise of
the wall-clock average as well.

Also, fix a bug in printing out scaled counters: if it was not
scaled then we should not update the average with -1.

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-06-27 12:34:37 +0800
0cfb7a13b perf stat: Add -n/--null option to run without counters ... Browse Code »

Allow a no-counters run. This can be useful to measure just
elapsed wall-clock time - or to assess the raw overhead of perf
stat itself, without running any counters.

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-06-27 12:11:24 +0800

24 Jun, 2009

2 commits

3d6325958 perf stat: Remove dead code ... Browse Code »

Remove dead code and do some code alignment.

Signed-off-by: Jaswinder Singh Rajput
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Jaswinder Singh Rajput
2009-06-24 20:55:45 +0800
cca03c0ae perf stat: Fix verbose for perf stat ... Browse Code »

Error message should use stderr for verbose (-v), otherwise
message will be lost for:

$ ./perf stat -v > /dev/null

For example on AMD bus-cycles event is not available so now
it looks like:

$ ./perf stat -v -e bus-cycles ls > /dev/null
Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)

Performance counter stats for 'ls':

bus-cycles

0.006765877 seconds time elapsed.

Signed-off-by: Jaswinder Singh Rajput
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Jaswinder Singh Rajput
2009-06-24 03:58:44 +0800

20 Jun, 2009

1 commit

9cffa8d53 perf_counter tools: Define and use our own u64, s64 etc. definitions ... Browse Code »

On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long. This causes compiler warnings every time we
print a __u64 value with %Lx.

Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8. These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.

The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__". The other changes are:

* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
to kill two remaining warnings.

Signed-off-by: Paul Mackerras
Acked-by: Peter Zijlstra
Cc: benh@kernel.crashing.org
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul Mackerras
2009-06-20 00:25:47 +0800

13 Jun, 2009

2 commits

ef281a196 perf stat: Enable raw data to be printed ... Browse Code »

If -vv (very verbose) is specified, print out raw data
in the following format:

$ perf stat -vv -r 3 ./loop_1b_instructions

[ perf stat: executing run #1 ... ]
[ perf stat: executing run #2 ... ]
[ perf stat: executing run #3 ... ]

debug: runtime[0]: 235871872
debug: walltime[0]: 236646752
debug: runtime_cycles[0]: 755150182
debug: counter/0[0]: 235871872
debug: counter/1[0]: 235871872
debug: counter/2[0]: 235871872
debug: scaled[0]: 0
debug: counter/0[1]: 2
debug: counter/1[1]: 235870662
debug: counter/2[1]: 235870662
debug: scaled[1]: 0
debug: counter/0[2]: 1
debug: counter/1[2]: 235870437
debug: counter/2[2]: 235870437
debug: scaled[2]: 0
debug: counter/0[3]: 140
debug: counter/1[3]: 235870298
debug: counter/2[3]: 235870298
debug: scaled[3]: 0
debug: counter/0[4]: 755150182
debug: counter/1[4]: 235870145
debug: counter/2[4]: 235870145
debug: scaled[4]: 0
debug: counter/0[5]: 1001411258
debug: counter/1[5]: 235868838
debug: counter/2[5]: 235868838
debug: scaled[5]: 0
debug: counter/0[6]: 27897
debug: counter/1[6]: 235868560
debug: counter/2[6]: 235868560
debug: scaled[6]: 0
debug: counter/0[7]: 2910
debug: counter/1[7]: 235868151
debug: counter/2[7]: 235868151
debug: scaled[7]: 0
debug: runtime[0]: 235980257
debug: walltime[0]: 236770942
debug: runtime_cycles[0]: 755114546
debug: counter/0[0]: 235980257
debug: counter/1[0]: 235980257
debug: counter/2[0]: 235980257
debug: scaled[0]: 0
debug: counter/0[1]: 3
debug: counter/1[1]: 235980049
debug: counter/2[1]: 235980049
debug: scaled[1]: 0
debug: counter/0[2]: 1
debug: counter/1[2]: 235979907
debug: counter/2[2]: 235979907
debug: scaled[2]: 0
debug: counter/0[3]: 135
debug: counter/1[3]: 235979780
debug: counter/2[3]: 235979780
debug: scaled[3]: 0
debug: counter/0[4]: 755114546
debug: counter/1[4]: 235979652
debug: counter/2[4]: 235979652
debug: scaled[4]: 0
debug: counter/0[5]: 1001439771
debug: counter/1[5]: 235979304
debug: counter/2[5]: 235979304
debug: scaled[5]: 0
debug: counter/0[6]: 23723
debug: counter/1[6]: 235979050
debug: counter/2[6]: 235979050
debug: scaled[6]: 0
debug: counter/0[7]: 2213
debug: counter/1[7]: 235978820
debug: counter/2[7]: 235978820
debug: scaled[7]: 0
debug: runtime[0]: 235888002
debug: walltime[0]: 236700533
debug: runtime_cycles[0]: 754881504
debug: counter/0[0]: 235888002
debug: counter/1[0]: 235888002
debug: counter/2[0]: 235888002
debug: scaled[0]: 0
debug: counter/0[1]: 2
debug: counter/1[1]: 235887793
debug: counter/2[1]: 235887793
debug: scaled[1]: 0
debug: counter/0[2]: 1
debug: counter/1[2]: 235887645
debug: counter/2[2]: 235887645
debug: scaled[2]: 0
debug: counter/0[3]: 135
debug: counter/1[3]: 235887499
debug: counter/2[3]: 235887499
debug: scaled[3]: 0
debug: counter/0[4]: 754881504
debug: counter/1[4]: 235887368
debug: counter/2[4]: 235887368
debug: scaled[4]: 0
debug: counter/0[5]: 1001401731
debug: counter/1[5]: 235887024
debug: counter/2[5]: 235887024
debug: scaled[5]: 0
debug: counter/0[6]: 24212
debug: counter/1[6]: 235886786
debug: counter/2[6]: 235886786
debug: scaled[6]: 0
debug: counter/0[7]: 1824
debug: counter/1[7]: 235886560
debug: counter/2[7]: 235886560
debug: scaled[7]: 0

Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):

235.913377 task-clock-msecs # 0.997 CPUs ( +- 0.011% )
2 context-switches # 0.000 M/sec ( +- 0.000% )
1 CPU-migrations # 0.000 M/sec ( +- 0.000% )
136 page-faults # 0.001 M/sec ( +- 0.730% )
755048744 cycles # 3200.534 M/sec ( +- 0.009% )
1001417586 instructions # 1.326 IPC ( +- 0.001% )
25277 cache-references # 0.107 M/sec ( +- 3.988% )
2315 cache-misses # 0.010 M/sec ( +- 9.845% )

0.236706075 seconds time elapsed.

This allows the summary stats to be validated.

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-06-13 21:40:35 +0800
42202dd56 perf stat: Add feature to run and measure a command multiple times ... Browse Code »

Add the --repeat feature to perf stat, which repeats a given
command up to a 100 times, collects the stats and calculates an
average and a stddev.

For example, the following oneliner 'perf stat' command runs hackbench
5 times and prints a tabulated result of all metrics, with averages
and noise levels (in percentage) printed:

aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
Time: 0.117
Time: 0.108
Time: 0.089
Time: 0.088
Time: 0.100

Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% )
47706 context-switches # 0.038 M/sec ( +- 19.706% )
387 CPU-migrations # 0.000 M/sec ( +- 3.608% )
17793 page-faults # 0.014 M/sec ( +- 0.354% )
3770941606 cycles # 3031.329 M/sec ( +- 4.621% )
1566372416 instructions # 0.415 IPC ( +- 2.703% )
16783421 cache-references # 13.492 M/sec ( +- 5.202% )
7128590 cache-misses # 5.730 M/sec ( +- 7.420% )

0.118924455 seconds time elapsed.

The goal of this feature is to allow the reliance on these accurate
statistics and to know how many times a command has to be repeated
for the noise to go down to an acceptable level.

(The -v option can be used to see a line printed out as each run progresses.)

Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-06-13 21:18:57 +0800