26 Jan, 2019

1 commit

  • [ Upstream commit 51433ead1460fb3f46e1c34f68bb22fd2dd0f5d0 ]

    Some 'perf stat' options do not make sense to be negated (event,
    cgroup), and some do not have a negated path implemented (metrics). Due
    to that, it is better to disable the "no-" prefix for them, since
    otherwise the later option parsing segfaults.

    Before:

    $ perf stat --no-metrics -- ls
    Segmentation fault (core dumped)

    After:

    $ perf stat --no-metrics -- ls
    Error: option `no-metrics' isn't available
    Usage: perf stat [<options>] [<command>]
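
    As a standalone illustration of the failure mode and fix described above
    (hypothetical option table, not perf's actual parse-options code):
    options marked as non-negatable reject the "no-" form up front instead
    of falling into an unimplemented negation path.

    #include <stdio.h>
    #include <string.h>

    /* Illustration only: a tiny option table with an "allow negation" flag. */
    struct opt { const char *name; int allow_neg; };

    static const struct opt opts[] = {
            { "metrics", 0 }, { "event", 0 }, { "cgroup", 0 }, { "scale", 1 },
    };

    static int parse_opt(const char *arg)
    {
            int neg = !strncmp(arg, "no-", 3);
            const char *name = neg ? arg + 3 : arg;

            for (unsigned i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
                    if (strcmp(opts[i].name, name))
                            continue;
                    if (neg && !opts[i].allow_neg) {
                            fprintf(stderr, "Error: option `no-%s' isn't available\n", name);
                            return -1;
                    }
                    return 0;
            }
            return -1;
    }

    int main(void)
    {
            return parse_opt("no-metrics") ? 1 : 0;   /* rejected, no crash */
    }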

    Signed-off-by: Michael Petlan
    Tested-by: Arnaldo Carvalho de Melo
    LPU-Reference: 1485912065.62416880.1544457604340.JavaMail.zimbra@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Michael Petlan
     

25 Jul, 2018

2 commits

    There's no reason to have a separate function to display clock events.
    Its only purpose was to convert the nanosecond value into milliseconds.
    We do that now in generic code, if the unit and scale values are
    properly set, which this patch does for clock events.

    The output differs in that the unit field is displayed in its own column
    rather than being added as a suffix of the event name. Plus the value is
    rounded to 2 decimal places, as for any other event.

    Before:

    # perf stat -e cpu-clock,task-clock -C 0 sleep 3

    Performance counter stats for 'CPU(s) 0':

    3001.123137 cpu-clock (msec) # 1.000 CPUs utilized
    3001.133250 task-clock (msec) # 1.000 CPUs utilized

    3.001159813 seconds time elapsed

    Now:

    # perf stat -e cpu-clock,task-clock -C 0 sleep 3

    Performance counter stats for 'CPU(s) 0':

    3,001.05 msec cpu-clock # 1.000 CPUs utilized
    3,001.05 msec task-clock # 1.000 CPUs utilized

    3.001077794 seconds time elapsed

    There's a small difference in csv output, as we now output the unit
    field, which was empty before. It's in the proper spot, so there's no
    compatibility issue.

    Before:

    # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
    3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized
    3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized

    # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
    3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized
    3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized

    Add perf_evsel__is_clock to replace nsec_counter.
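
    A minimal standalone illustration of the unit/scale idea described above
    (not the perf code itself): once unit is "msec" and scale is 1e-6, the
    generic printing code turns the raw nanosecond count into the new output
    format shown above.

    #include <stdio.h>

    int main(void)
    {
            unsigned long long raw_ns = 3001050000ULL;  /* example cpu-clock count */
            const char *unit = "msec";
            double scale = 1e-6;                        /* ns -> msec */

            /* prints: 3001.05 msec cpu-clock */
            printf("%.2f %s cpu-clock\n", raw_ns * scale, unit);
            return 0;
    }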

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • The 'perf stat' command line flag -T to display transaction counters is
    currently supported for x86 only.

    Add support for s390. It is based on the metrics flag -M transaction
    using the architecture dependent JSON files. This requires a metric
    named "transaction" in the JSON files for the platform.

    Introduce a new function metricgroup__has_metric() to check for the
    existence of a metric named "transaction".

    As suggested by Andi Kleen, this is the new approach to supporting
    transaction counters. Other architectures will follow.

    Output before:

    [root@p23lp27 perf]# ./perf stat -T -- sleep 1
    Cannot set up transaction events
    [root@p23lp27 perf]#

    Output after:

    [root@s35lp76 perf]# ./perf stat -T -- ~/mytesttx 1 >/tmp/111

    Performance counter stats for '/root/mytesttx 1':

    1 tx_c_tend # 13.0 transaction
    1 tx_nc_tend
    11 tx_nc_tabort
    0 tx_c_tabort_special
    0 tx_c_tabort_no_special

    0.001070109 seconds time elapsed

    [root@s35lp76 perf]#
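
    A rough standalone sketch of the metric lookup described above, assuming
    the per-architecture JSON tables boil down to a list of metric names at
    runtime; the real metricgroup__has_metric() works on perf's pmu-events
    tables.

    #include <stdbool.h>
    #include <string.h>

    /* Illustration only: decide whether -T can fall back to the
     * "transaction" metric group on this architecture. */
    static bool has_metric(const char **metrics, int nr, const char *name)
    {
            for (int i = 0; i < nr; i++)
                    if (!strcmp(metrics[i], name))
                            return true;
            return false;
    }

    int main(void)
    {
            const char *s390_metrics[] = { "transaction" };

            return has_metric(s390_metrics, 1, "transaction") ? 0 : 1;
    }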

    Suggested-by: Andi Kleen
    Signed-off-by: Thomas Richter
    Reviewed-by: Hendrik Brueckner
    Acked-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20180626071701.58190-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     

11 Jul, 2018

1 commit

    Currently we display an extra header line, like:

    # perf stat -I 1000 -a --interval-clear
    # time counts unit events
    insn per cycle branch-misses of all branches
    2.964917103 3855.349912 cpu-clock (msec) # 3.855 CPUs utilized
    2.964917103 23,993 context-switches # 0.006 M/sec
    2.964917103 1,301 cpu-migrations # 0.329 K/sec
    ...

    Fix the condition to get the proper output:

    # perf stat -I 1000 -a --interval-clear
    # time counts unit events
    2.359048938 1432.492228 cpu-clock (msec) # 1.432 CPUs utilized
    2.359048938 7,613 context-switches # 0.002 M/sec
    2.359048938 419 cpu-migrations # 0.133 K/sec
    ...

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 9660e08ee8cb ("perf stat: Add --interval-clear option")
    Link: http://lkml.kernel.org/r/20180702134202.17745-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

08 Jun, 2018

5 commits

    Add missing error handling for parse_events() calls in the
    add_default_attributes() function. The error handler displays error
    details, like for transactions (-T):

    Before:
    $ perf stat -T
    Cannot set up transaction events

    After:
    $ perf stat -T
    Cannot set up transaction events
    event syntax error: '..cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}'
    \___ unknown term

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-8-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    The following change will introduce new metrics that don't need such
    wide hard-coded spacing. Replace the METRIC_ONLY_LEN macro usage with
    the metric_only_len variable.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-7-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Make the metric-only display aligned.

    Before:
    # perf stat --topdown -I 1000
    # time core cpus retiring bad speculation frontend bound backend bound
    1.000394323 S0-C0 2 37.4% 12.0% 31.4% 19.2%
    1.000394323 S0-C1 2 25.1% 9.2% 43.8% 21.9%
    2.001521204 S0-C0 2 36.4% 11.4% 32.4% 19.8%
    2.001521204 S0-C1 2 26.2% 9.4% 43.1% 21.3%
    3.001930208 S0-C0 2 35.1% 10.7% 33.6% 20.6%
    3.001930208 S0-C1 2 28.9% 10.0% 40.0% 21.1%

    After:
    # perf stat --topdown -I 1000
    # time core cpus retiring bad speculation frontend bound backend bound
    1.000303722 S0-C0 2 34.2% 7.6% 34.2% 24.0%
    1.000303722 S0-C1 2 33.1% 6.4% 36.9% 23.6%
    2.001281055 S0-C0 2 34.6% 6.7% 36.8% 21.8%
    2.001281055 S0-C1 2 32.8% 7.1% 38.1% 22.0%
    3.001546080 S0-C0 2 39.3% 5.5% 32.7% 22.5%
    3.001546080 S0-C1 2 37.8% 6.0% 33.1% 23.1%
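
    A trivial standalone illustration of the alignment fix: print every
    header and value with the same field width.

    #include <stdio.h>

    int main(void)
    {
            const char *hdr[] = { "retiring", "bad speculation", "frontend bound", "backend bound" };
            double val[] = { 34.2, 7.6, 34.2, 24.0 };

            for (int i = 0; i < 4; i++)
                    printf("%20s", hdr[i]);           /* fixed-width headers */
            putchar('\n');
            for (int i = 0; i < 4; i++)
                    printf("%19.1f%%", val[i]);       /* values line up underneath */
            putchar('\n');
            return 0;
    }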

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-6-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    We can call color_fprintf() also for the non-color case; it's handled
    properly. This change simplifies the following patch.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-5-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Add the --interval-clear option to clear the screen before the next interval.

    Committer testing:

    # perf stat -I 1000 --interval-clear

    And, as expected, it behaves almost like:

    # watch -n 0 perf stat -a sleep 1
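
    A minimal standalone illustration of the screen clearing (the exact
    escape sequence perf uses may differ): emit the usual ANSI
    home-and-erase sequence before printing the next interval.

    #include <stdio.h>

    static void clear_screen(void)
    {
            fputs("\033[H\033[2J", stdout);   /* cursor home + erase display */
            fflush(stdout);
    }

    int main(void)
    {
            clear_screen();
            puts("# time counts unit events");
            return 0;
    }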

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-4-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

06 Jun, 2018

1 commit

    Add support for reading rusage data once the workload is finished and
    displaying the system/user time values:

    $ perf stat --null perf bench sched pipe
    ...

    Performance counter stats for 'perf bench sched pipe':

    5.342599256 seconds time elapsed

    2.544434000 seconds user
    4.549691000 seconds sys

    It works only in non -r mode and only for the workload target.

    So as of now, for workload targets, we display 3 types of timings. The
    time we measure in perf stat from enable to disable+period:

    5.342599256 seconds time elapsed

    The time spent in user and system lands, displayed only for workload
    session/target:

    2.544434000 seconds user
    4.549691000 seconds sys

    Those times are the very same displayed by 'time' tool. They are
    returned by wait4 call via the getrusage struct interface.
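
    A standalone illustration of that mechanism using only standard
    interfaces (this is not perf's code): wait4() fills a struct rusage
    whose ru_utime/ru_stime give the user and system times.

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            pid_t pid = fork();

            if (pid < 0)
                    return 1;
            if (pid == 0) {
                    execlp("ls", "ls", (char *)NULL);   /* stand-in workload */
                    _exit(127);
            }

            int status;
            struct rusage ru;

            wait4(pid, &status, 0, &ru);
            printf("%ld.%06ld seconds user\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            printf("%ld.%06ld seconds sys\n",  (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
            return 0;
    }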

    Committer notes:

    Had to rename some variables to avoid this on older systems such as
    centos:6:

    builtin-stat.c: In function 'print_footer':
    builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
    /usr/include/time.h:297: warning: shadowed declaration is here

    Committer testing:

    # perf stat --null time perf bench sched pipe
    # Running 'sched/pipe' benchmark:
    # Executed 1000000 pipe operations between two processes

    Total time: 5.526 [sec]

    5.526534 usecs/op
    180945 ops/sec
    1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
    0inputs+0outputs (0major+606minor)pagefaults 0swaps

    Performance counter stats for 'time perf bench sched pipe':

    5.530978744 seconds time elapsed

    1.004037000 seconds user
    6.259937000 seconds sys

    #

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

26 Apr, 2018

3 commits

    Add support to display a visual aid ('length strings') to easily spot
    the biggest differences in the time table.

    $ perf stat -r 10 --table perf bench sched pipe
    ...

    Performance counter stats for './perf bench sched pipe' (5 runs):

    # Table of individual measurements:
    5.189 (-0.293) #
    5.189 (-0.294) #
    5.186 (-0.296) #
    5.663 (+0.181) ##
    6.186 (+0.703) ####

    # Final result:
    5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
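
    A standalone sketch of how such length strings can be rendered (the
    scaling factor below is arbitrary, not perf's):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
            double runs[] = { 5.189, 5.189, 5.186, 5.663, 6.186 };
            int n = sizeof(runs) / sizeof(runs[0]);
            double avg = 0;

            for (int i = 0; i < n; i++)
                    avg += runs[i] / n;

            printf("# Table of individual measurements:\n");
            for (int i = 0; i < n; i++) {
                    double d = runs[i] - avg;
                    int bars = 1 + (int)(fabs(d) / 0.2);   /* arbitrary scale */

                    printf("%.3f (%+.3f) ", runs[i], d);
                    for (int j = 0; j < bars; j++)
                            putchar('#');                  /* bar grows with deviation */
                    putchar('\n');
            }
            return 0;
    }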

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-9-jolsa@kernel.org
    [ Updated 'perf stat --table' man page entry ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add --table option to display time for each run (-r option), like:

    $ perf stat --null -r 5 --table perf bench sched pipe

    Performance counter stats for './perf bench sched pipe' (5 runs):

    # Table of individual measurements:
    5.379 (-0.176)
    5.243 (-0.311)
    5.238 (-0.317)
    5.536 (-0.019)
    6.377 (+0.823)

    # Final result:
    5.555 +- 0.213 seconds time elapsed ( +- 3.83% )

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-8-jolsa@kernel.org
    [ Document the new option in 'perf stat's man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Ingo suggested displaying the elapsed time for a multirun workload
    (perf stat -r) with precision based on the precision of the standard
    deviation.

    In his own words:

    > This output is a slightly bit misleading:

    > Performance counter stats for 'make -j128' (10 runs):
    > 27.988995256 seconds time elapsed ( +- 0.39% )

    > The 9 significant digits in the result, while only 1 is valid, suggests accuracy
    > where none exists.

    > It would be better if 'perf stat' would display elapsed time with a precision
    > adjusted to stddev, it should display at most 2 more significant digits than
    > the stddev inaccuracy.

    > I.e. in the above case 0.39% is 0.109, so we only have accuracy for 1 digit, and
    > so we should only display 3:

    > 27.988 seconds time elapsed ( +- 0.39% )

    Plus a suggestion about the output, which is small enough and connected
    with the above change that I merged both changes together.

    > Small output style nit - I think it would be nice if with --repeat the stddev was
    > also displayed in absolute values, besides percentage:
    >
    > 27.988 +- 0.109 seconds time elapsed ( +- 0.39% )

    The output is now:

    Performance counter stats for './perf bench sched pipe' (5 runs):
    SNIP
    13.3667 +- 0.0256 seconds time elapsed ( +- 0.19% )
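
    A standalone illustration of the precision rule, with the formula
    inferred from the examples above (an assumption, not necessarily perf's
    exact code): show two more digits than the leading-digit position of the
    standard deviation.

    #include <math.h>
    #include <stdio.h>

    /* Number of decimal places needed to reach the stddev's leading digit. */
    static int precision(double sd)
    {
            if (sd <= 0 || sd >= 1)
                    return 0;
            return (int)lround(ceil(-log10(sd)));
    }

    int main(void)
    {
            double avg = 27.988995256, sd = 0.109;
            int prec = precision(sd) + 2;     /* two extra significant digits */

            /* prints: 27.989 +- 0.109 seconds time elapsed ( +- 0.39% ) */
            printf("%.*f +- %.*f seconds time elapsed ( +- %.2f%% )\n",
                   prec, avg, prec, sd, 100.0 * sd / avg);
            return 0;
    }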

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-7-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

25 Apr, 2018

2 commits

  • PMU name is printed repeatedly for interval print, for example:

    perf stat --no-merge -e 'unc_m_clockticks' -a -I 1000
    # time counts unit events
    1.001053069 243,702,144 unc_m_clockticks [uncore_imc_4]
    1.001053069 244,268,304 unc_m_clockticks [uncore_imc_2]
    1.001053069 244,427,386 unc_m_clockticks [uncore_imc_0]
    1.001053069 244,583,760 unc_m_clockticks [uncore_imc_5]
    1.001053069 244,738,971 unc_m_clockticks [uncore_imc_3]
    1.001053069 244,880,309 unc_m_clockticks [uncore_imc_1]
    2.002024821 240,818,200 unc_m_clockticks [uncore_imc_4] [uncore_imc_4]
    2.002024821 240,767,812 unc_m_clockticks [uncore_imc_2] [uncore_imc_2]
    2.002024821 240,764,215 unc_m_clockticks [uncore_imc_0] [uncore_imc_0]
    2.002024821 240,759,504 unc_m_clockticks [uncore_imc_5] [uncore_imc_5]
    2.002024821 240,755,992 unc_m_clockticks [uncore_imc_3] [uncore_imc_3]
    2.002024821 240,750,403 unc_m_clockticks [uncore_imc_1] [uncore_imc_1]

    For each print, the PMU name is unconditionally appended to
    counter->name.

    Check counter->name first: if the PMU name is already appended, do
    nothing.

    Committer notes:

    Add and use perf_evsel->uniquified_name bool instead of doing the more
    expensive strstr(event->name, pmu->name).
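
    A hedged sketch of the append-once guard described in the committer
    notes; the struct below is a stand-in, not perf's evsel/pmu definitions.

    #define _GNU_SOURCE
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct counter {
            char *name;
            const char *pmu_name;
            bool uniquified_name;
    };

    static void uniquify_name(struct counter *c)
    {
            char *new_name;

            if (c->uniquified_name)
                    return;                         /* PMU name already appended */

            if (asprintf(&new_name, "%s [%s]", c->name, c->pmu_name) >= 0) {
                    free(c->name);
                    c->name = new_name;
                    c->uniquified_name = true;
            }
    }

    int main(void)
    {
            struct counter c = { strdup("unc_m_clockticks"), "uncore_imc_4", false };

            uniquify_name(&c);
            uniquify_name(&c);                      /* second call is a no-op */
            printf("%s\n", c.name);                 /* unc_m_clockticks [uncore_imc_4] */
            free(c.name);
            return 0;
    }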

    Signed-off-by: Kan Liang
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Agustin Vega-Frias
    Cc: Andi Kleen
    Cc: Ganapatrao Kulkarni
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shaokun Zhang
    Cc: Will Deacon
    Fixes: 8c5421c016a4 ("perf pmu: Display pmu name when printing unmerged events in stat")
    Link: http://lkml.kernel.org/r/1524594014-79243-5-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
    Perf doesn't support mixed events from different PMUs (except software
    events) in a group. For this case, only "<not supported>" or
    "<not counted>" are printed out. There is no hint that guides users to
    fix the issue.

    Check the PMU type of the events to determine whether they are from the
    same PMU. The check may raise a false alarm, e.g. the core PMU may have
    a different PMU type, but that should not happen often.

    The false alarm can also be tolerated, because:

    - It only happens on the error path.
    - It just provides a possible solution for the issue.

    Signed-off-by: Kan Liang
    Cc: Agustin Vega-Frias
    Cc: Andi Kleen
    Cc: Ganapatrao Kulkarni
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shaokun Zhang
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1524594014-79243-2-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

12 Apr, 2018

1 commit

    Currently the print interval for performance counter values is limited
    to 10ms, so reading the values at frequencies higher than 100Hz is not
    possible with the tool.

    This change makes 'perf stat -I' possible at frequencies up to 1KHz and,
    to some extent, brings 'perf stat -I' on par with 'perf record' sampling
    profiling.

    When running 'perf stat -I' to monitor e.g. PCIe uncore counters and at
    the same time profiling some I/O workload with 'perf record', e.g. for
    cpu-cycles and context switches, it is then possible to observe a
    consolidated CPU/OS/IO (uncore) performance picture for that workload.

    The tool overhead warning printed when specifying the -v option can be
    missed due to screen scrolling when the output goes to the console, so
    the message is moved into the help text available by running
    'perf stat -h'.

    Signed-off-by: Alexey Budankov
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/b842ad6a-d606-32e4-afe5-974071b5198e@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Alexey Budankov
     

17 Mar, 2018

1 commit

  • Executing command 'perf stat -T -- ls' dumps core on x86 and s390.

    Here is the callback chain (done on x86):

    # gdb ./perf
    ....
    (gdb) r stat -T -- ls
    ...
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
    (gdb) where
    #0 0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
    #1 0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
    #2 0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
    #3 0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930) at util/parse-events.c:1288
    #4 0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
    scanner=0xbf4210) at util/parse-events.y:234
    #5 0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
    parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
    #6 0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
    at util/parse-events.c:1713
    #7 0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
    #8 0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
    builtin-stat.c:2828
    #9 0x00000000004c8b0f in run_builtin (p=0xab01a0 , argc=4,
    argv=0x7fffffffe3b0) at perf.c:297
    #10 0x00000000004c8d7c in handle_internal_command (argc=4,
    argv=0x7fffffffe3b0) at perf.c:349
    #11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
    argv=0x7fffffffe200) at perf.c:393
    #12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
    (gdb)

    It turns out that a NULL pointer is referenced. Here are the
    function calls:

    ...
    cmd_stat()
    +---> add_default_attributes()
    +---> parse_events(evsel_list, transaction_attrs, NULL);
    3rd parameter set to NULL

    Function parse_events(xx, xx, struct parse_events_error *err) dives
    into a bison generated scanner and first creates parser state
    information for it:

    struct parse_events_state parse_state = {
            .list  = LIST_HEAD_INIT(parse_state.list),
            .idx   = evlist->nr_entries,
            .error = err,   /* NULL */
            ...
    };

    The scanner then tries to build an error message through this error
    pointer (the asprintf(... error->str, ....) call seen at the top of the
    backtrace), which references a NULL pointer and dumps core.

    Fix this by providing a pointer to the necessary error information
    instead of NULL. Technically only the else part is needed to avoid the
    core dump, but let's be safe...

    Signed-off-by: Thomas Richter
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20180308145735.64717-1-tmricht@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     

08 Mar, 2018

1 commit

    To simplify the creation of events across multiple instances of the
    same type of PMU, 'perf stat' supports two methods for creating multiple
    events from a single event specification:

    1. A prefix or glob can be used in the PMU name.
    2. Aliases, which are listed immediately after the Kernel PMU events
    by 'perf list', are used.

    When the --no-merge option is passed and these events are displayed
    individually, the PMU name is lost and it's not possible to see which
    count corresponds to which PMU:

    $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    67 l3cache/read-miss/
    67 l3cache/read-miss/
    63 l3cache/read-miss/
    60 l3cache/read-miss/

    0.001675706 seconds time elapsed

    $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    12 l3cache_read_miss
    17 l3cache_read_miss
    10 l3cache_read_miss
    8 l3cache_read_miss

    0.001661305 seconds time elapsed

    This change adds the original pmu name to the event. For dynamic pmu
    events the pmu name is restored in the event name:

    $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    63 l3cache_0_3/read-miss/
    74 l3cache_0_1/read-miss/
    64 l3cache_0_2/read-miss/
    74 l3cache_0_0/read-miss/

    0.001675706 seconds time elapsed

    For alias events the name is added after the event name:

    $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    10 l3cache_read_miss [l3cache_0_3]
    12 l3cache_read_miss [l3cache_0_1]
    10 l3cache_read_miss [l3cache_0_2]
    17 l3cache_read_miss [l3cache_0_0]

    0.001661305 seconds time elapsed

    Signed-off-by: Agustin Vega-Frias
    Acked-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Timur Tabi
    Cc: linux-arm-kernel@lists.infradead.org
    Change-Id: I8056b9eda74bda33e95065056167ad96e97cb1fb
    Link: http://lkml.kernel.org/r/1520345084-42646-3-git-send-email-agustinv@codeaurora.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Agustin Vega-Frias
     

07 Mar, 2018

1 commit


06 Mar, 2018

1 commit

    When printing stats in CSV mode, 'perf stat' appends extra separators
    when a counter is not supported:

    ,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,,

    This causes a failure when parsing fields. The number of separators
    should be the same for each line, no matter whether the counter is
    supported or not.
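
    A standalone illustration of the rule (not perf's CSV code): always emit
    the same number of fields per line, leaving fields empty when a counter
    is unsupported, so the separator count never changes.

    #include <stdio.h>

    #define NR_FIELDS 7

    static void print_csv_line(const char *fields[NR_FIELDS])
    {
            for (int i = 0; i < NR_FIELDS; i++)
                    printf("%s%s", fields[i] ? fields[i] : "",
                           i < NR_FIELDS - 1 ? "," : "\n");
    }

    int main(void)
    {
            const char *ok[NR_FIELDS]  = { "3000.80", "msec", "task-clock",
                                           "3000799550", "100.00", "1.000", "CPUs utilized" };
            const char *bad[NR_FIELDS] = { "<not supported>", NULL, "L1-dcache-store-misses",
                                           NULL, "100.00", NULL, NULL };

            print_csv_line(ok);     /* 6 commas */
            print_csv_line(bad);    /* still 6 commas, empty fields kept */
            return 0;
    }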

    Signed-off-by: Ilya Pronin
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com
    Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output")
    Signed-off-by: Cong Wang
    Signed-off-by: Arnaldo Carvalho de Melo

    Ilya Pronin
     

27 Feb, 2018

1 commit

    If we execute 'perf stat --per-thread' with a non-root account (even
    with kernel.perf_event_paranoid set to -1), it reports the error:

    jinyao@skl:~$ perf stat --per-thread
    Error:
    You may not have permission to collect system-wide stats.

    Consider tweaking /proc/sys/kernel/perf_event_paranoid,
    which controls use of the performance events system by
    unprivileged users (without CAP_SYS_ADMIN).

    The current value is 2:

    -1: Allow use of (almost) all events by all users
    Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
    >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
    Disallow raw tracepoint access by users without CAP_SYS_ADMIN
    >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
    >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

    To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

    kernel.perf_event_paranoid = -1

    Perhaps the ptrace rule doesn't allow tracing some processes, but anyway
    the global --per-thread mode had better ignore such errors and continue
    working on the other threads.

    This patch records the index of the failing thread in perf_evsel__open()
    and removes this thread before retrying.

    For example (run with non-root, kernel.perf_event_paranoid isn't set):

    jinyao@skl:~$ perf stat --per-thread
    ^C
    Performance counter stats for 'system wide':

    vmstat-3458 6.171984 cpu-clock:u (msec) # 0.000 CPUs utilized
    perf-3670 0.515599 cpu-clock:u (msec) # 0.000 CPUs utilized
    vmstat-3458 1,163,643 cycles:u # 0.189 GHz
    perf-3670 40,881 cycles:u # 0.079 GHz
    vmstat-3458 1,410,238 instructions:u # 1.21 insn per cycle
    perf-3670 3,536 instructions:u # 0.09 insn per cycle
    vmstat-3458 288,937 branches:u # 46.814 M/sec
    perf-3670 936 branches:u # 1.815 M/sec
    vmstat-3458 15,195 branch-misses:u # 5.26% of all branches
    perf-3670 76 branch-misses:u # 8.12% of all branches

    12.651675247 seconds time elapsed
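
    A hedged sketch of the retry logic described above (illustrative only,
    with made-up tids and a fake open_counter() helper): when opening the
    counter fails for one thread, drop that thread and retry with the
    remaining ones.

    #include <stdio.h>

    static int open_counter(int tid)
    {
            return tid == 4242 ? -1 : 0;    /* pretend tid 4242 is not traceable */
    }

    int main(void)
    {
            int tids[] = { 3458, 3670, 4242, 5177 };
            int nr = sizeof(tids) / sizeof(tids[0]);

            for (int i = 0; i < nr; ) {
                    if (open_counter(tids[i]) < 0) {
                            /* remove the failing thread and retry */
                            for (int j = i; j < nr - 1; j++)
                                    tids[j] = tids[j + 1];
                            nr--;
                            continue;
                    }
                    i++;
            }
            printf("monitoring %d threads\n", nr);
            return 0;
    }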

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

21 Feb, 2018

1 commit


16 Feb, 2018

2 commits

    Introduce a new option to print counts after N milliseconds and update
    the 'perf stat' documentation accordingly.

    Shown below is the output of the new option for perf stat.

    $ perf stat --time 2000 -e cycles -a
    Performance counter stats for 'system wide':

    157,260,423 cycles

    2.003060766 seconds time elapsed

    We can print the count deltas after N milliseconds with this newly
    introduced option. This option is not supported with the "-I" option.

    In addition, according to Kan Liang's patch (19afd10410957), the
    monitoring overhead for a system-wide core event could be very high if
    the interval-print parameter was below 100ms, and the lower limit is
    10ms.

    So the same warning will be displayed when the time is set between 10ms
    and 100ms, and the minimal time is limited to 10ms. Users can make a
    decision according to their specific cases.
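
    A small standalone sketch of those limits (hypothetical helper name, not
    perf's code): reject values below 10ms and warn between 10ms and 100ms.

    #include <stdio.h>

    static int check_timeout_ms(unsigned int timeout_ms)
    {
            if (timeout_ms < 10) {
                    fprintf(stderr, "timeout must be >= 10ms\n");
                    return -1;
            }
            if (timeout_ms < 100)
                    fprintf(stderr, "warning: timeout < 100ms may add noticeable overhead\n");
            return 0;
    }

    int main(void)
    {
            return check_timeout_ms(2000) ? 1 : 0;
    }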

    Committer notes:

    This actually stops the workload after the specified time, then prints
    the counts.

    So I renamed the option to --timeout and updated the documentation to
    state that it will not just print the counts after the specified time,
    but will really stop the 'perf stat' session and print the counts.

    The rename from 'time' to 'timeout' also fixes the build in systems
    where 'time' is used by glibc and can't be used as a name of a variable,
    such as centos:5 and centos:6.

    Changes since v3:
    - none.

    Changes since v2:
    - modify the time check in __run_perf_stat func to keep some consistency
    with the workload case.
    - add the warning when the time is set between 10ms to 100ms.
    - add the pr_err when the time is set below 10ms.

    Changes since v1:
    - none.

    Signed-off-by: yuzhoujian
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Kan Liang
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    yuzhoujian
     
    Introduce a new option to print counts a fixed number of times and
    update the 'perf stat' documentation accordingly.

    Shown below is the output of the new option for perf stat.

    $ perf stat -I 1000 --interval-count 2 -e cycles -a
    # time counts unit events
    1.002827089 93,884,870 cycles
    2.004231506 56,573,446 cycles

    We can print the counts just a fixed number of times with this newly
    introduced option. Its usage is a little like 'vmstat', and it should
    be used together with the "-I" option.

    $ vmstat -n 1 2
    procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    0 0 0 78270544 547484 51732076 0 0 0 20 1 1 1 0 99 0 0
    0 0 0 78270512 547484 51732080 0 0 0 16 477 1555 0 0 100 0 0

    Changes since v3:
    - merge interval_count check and times check to one line.
    - fix the wrong indent in stat.h
    - use stat_config.times instead of 'times' in cmd_stat function.

    Changes since v2:
    - none.

    Changes since v1:
    - change the name of the new option "times-print" to "interval-count".
    - keep the new option interval specifically.

    Signed-off-by: yuzhoujian
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Kan Liang
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    yuzhoujian
     

27 Dec, 2017

7 commits

  • These duplicate includes have been found with scripts/checkincludes.pl
    but they have been removed manually to avoid removing false positives.

    Signed-off-by: Pravin Shedge
    Cc: David S. Miller
    Cc: Greg Kroah-Hartman
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Pravin Shedge
     
    There are many threads reported if we enable '--per-thread'
    globally.

    1. Most of the threads are not counted or have a count of 0. This patch
    removes these threads from the output.

    2. We also re-sort the threads in the display according to the counter
    value. This makes it easy for the user to see the hottest threads.

    For example, the new results would be:

    root@skl:/tmp# perf stat --per-thread
    ^C
    Performance counter stats for 'system wide':

    perf-24165 4.302433 cpu-clock (msec) # 0.001 CPUs utilized
    vmstat-23127 1.562215 cpu-clock (msec) # 0.000 CPUs utilized
    irqbalance-2780 0.827851 cpu-clock (msec) # 0.000 CPUs utilized
    sshd-23111 0.278308 cpu-clock (msec) # 0.000 CPUs utilized
    thermald-2841 0.230880 cpu-clock (msec) # 0.000 CPUs utilized
    sshd-23058 0.207306 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/0:2-19991 0.133983 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/u16:1-18249 0.125636 cpu-clock (msec) # 0.000 CPUs utilized
    rcu_sched-8 0.085533 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/u16:2-23146 0.077139 cpu-clock (msec) # 0.000 CPUs utilized
    gmain-2700 0.041789 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/4:1-15354 0.028370 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/6:0-17528 0.023895 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/4:1H-1887 0.013209 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/5:2-31362 0.011627 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/0-11 0.010892 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/3:2-12870 0.010220 cpu-clock (msec) # 0.000 CPUs utilized
    ksoftirqd/0-7 0.008869 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/1-14 0.008476 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/7-50 0.002944 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/3-26 0.002893 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/4-32 0.002759 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/2-20 0.002429 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/6-44 0.001491 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/5-38 0.001477 cpu-clock (msec) # 0.000 CPUs utilized
    rcu_sched-8 10 context-switches # 0.117 M/sec
    kworker/u16:1-18249 7 context-switches # 0.056 M/sec
    sshd-23111 4 context-switches # 0.014 M/sec
    vmstat-23127 4 context-switches # 0.003 M/sec
    perf-24165 4 context-switches # 0.930 K/sec
    kworker/0:2-19991 3 context-switches # 0.022 M/sec
    kworker/u16:2-23146 3 context-switches # 0.039 M/sec
    kworker/4:1-15354 2 context-switches # 0.070 M/sec
    kworker/6:0-17528 2 context-switches # 0.084 M/sec
    sshd-23058 2 context-switches # 0.010 M/sec
    ksoftirqd/0-7 1 context-switches # 0.113 M/sec
    watchdog/0-11 1 context-switches # 0.092 M/sec
    watchdog/1-14 1 context-switches # 0.118 M/sec
    watchdog/2-20 1 context-switches # 0.412 M/sec
    watchdog/3-26 1 context-switches # 0.346 M/sec
    watchdog/4-32 1 context-switches # 0.362 M/sec
    watchdog/5-38 1 context-switches # 0.677 M/sec
    watchdog/6-44 1 context-switches # 0.671 M/sec
    watchdog/7-50 1 context-switches # 0.340 M/sec
    kworker/4:1H-1887 1 context-switches # 0.076 M/sec
    thermald-2841 1 context-switches # 0.004 M/sec
    gmain-2700 1 context-switches # 0.024 M/sec
    irqbalance-2780 1 context-switches # 0.001 M/sec
    kworker/3:2-12870 1 context-switches # 0.098 M/sec
    kworker/5:2-31362 1 context-switches # 0.086 M/sec
    kworker/u16:1-18249 2 cpu-migrations # 0.016 M/sec
    kworker/u16:2-23146 2 cpu-migrations # 0.026 M/sec
    rcu_sched-8 1 cpu-migrations # 0.012 M/sec
    sshd-23058 1 cpu-migrations # 0.005 M/sec
    perf-24165 8,833,385 cycles # 2.053 GHz
    vmstat-23127 1,702,699 cycles # 1.090 GHz
    irqbalance-2780 739,847 cycles # 0.894 GHz
    sshd-23111 269,506 cycles # 0.968 GHz
    thermald-2841 204,556 cycles # 0.886 GHz
    sshd-23058 158,780 cycles # 0.766 GHz
    kworker/0:2-19991 112,981 cycles # 0.843 GHz
    kworker/u16:1-18249 100,926 cycles # 0.803 GHz
    rcu_sched-8 74,024 cycles # 0.865 GHz
    kworker/u16:2-23146 55,984 cycles # 0.726 GHz
    gmain-2700 34,278 cycles # 0.820 GHz
    kworker/4:1-15354 20,665 cycles # 0.728 GHz
    kworker/6:0-17528 16,445 cycles # 0.688 GHz
    kworker/5:2-31362 9,492 cycles # 0.816 GHz
    watchdog/3-26 8,695 cycles # 3.006 GHz
    kworker/4:1H-1887 8,238 cycles # 0.624 GHz
    watchdog/4-32 7,580 cycles # 2.747 GHz
    kworker/3:2-12870 7,306 cycles # 0.715 GHz
    watchdog/2-20 7,274 cycles # 2.995 GHz
    watchdog/0-11 6,988 cycles # 0.642 GHz
    ksoftirqd/0-7 6,376 cycles # 0.719 GHz
    watchdog/1-14 5,340 cycles # 0.630 GHz
    watchdog/5-38 4,061 cycles # 2.749 GHz
    watchdog/6-44 3,976 cycles # 2.667 GHz
    watchdog/7-50 3,418 cycles # 1.161 GHz
    vmstat-23127 2,511,699 instructions # 1.48 insn per cycle
    perf-24165 1,829,908 instructions # 0.21 insn per cycle
    irqbalance-2780 1,190,204 instructions # 1.61 insn per cycle
    thermald-2841 143,544 instructions # 0.70 insn per cycle
    sshd-23111 128,138 instructions # 0.48 insn per cycle
    sshd-23058 57,654 instructions # 0.36 insn per cycle
    rcu_sched-8 44,063 instructions # 0.60 insn per cycle
    kworker/u16:1-18249 42,551 instructions # 0.42 insn per cycle
    kworker/0:2-19991 25,873 instructions # 0.23 insn per cycle
    kworker/u16:2-23146 21,407 instructions # 0.38 insn per cycle
    gmain-2700 13,691 instructions # 0.40 insn per cycle
    kworker/4:1-15354 12,964 instructions # 0.63 insn per cycle
    kworker/6:0-17528 10,034 instructions # 0.61 insn per cycle
    kworker/5:2-31362 5,203 instructions # 0.55 insn per cycle
    kworker/3:2-12870 4,866 instructions # 0.67 insn per cycle
    kworker/4:1H-1887 3,586 instructions # 0.44 insn per cycle
    ksoftirqd/0-7 3,463 instructions # 0.54 insn per cycle
    watchdog/0-11 3,135 instructions # 0.45 insn per cycle
    watchdog/1-14 3,135 instructions # 0.59 insn per cycle
    watchdog/2-20 3,135 instructions # 0.43 insn per cycle
    watchdog/3-26 3,135 instructions # 0.36 insn per cycle
    watchdog/4-32 3,135 instructions # 0.41 insn per cycle
    watchdog/5-38 3,135 instructions # 0.77 insn per cycle
    watchdog/6-44 3,135 instructions # 0.79 insn per cycle
    watchdog/7-50 3,135 instructions # 0.92 insn per cycle
    vmstat-23127 539,181 branches # 345.139 M/sec
    perf-24165 375,364 branches # 87.245 M/sec
    irqbalance-2780 262,092 branches # 316.593 M/sec
    thermald-2841 31,611 branches # 136.915 M/sec
    sshd-23111 21,874 branches # 78.596 M/sec
    sshd-23058 10,682 branches # 51.528 M/sec
    rcu_sched-8 8,693 branches # 101.633 M/sec
    kworker/u16:1-18249 7,891 branches # 62.808 M/sec
    kworker/0:2-19991 5,761 branches # 42.998 M/sec
    kworker/u16:2-23146 4,099 branches # 53.138 M/sec
    kworker/4:1-15354 2,755 branches # 97.110 M/sec
    gmain-2700 2,638 branches # 63.127 M/sec
    kworker/6:0-17528 2,216 branches # 92.739 M/sec
    kworker/5:2-31362 1,132 branches # 97.360 M/sec
    kworker/3:2-12870 1,081 branches # 105.773 M/sec
    kworker/4:1H-1887 725 branches # 54.887 M/sec
    ksoftirqd/0-7 707 branches # 79.716 M/sec
    watchdog/0-11 652 branches # 59.860 M/sec
    watchdog/1-14 652 branches # 76.923 M/sec
    watchdog/2-20 652 branches # 268.423 M/sec
    watchdog/3-26 652 branches # 225.372 M/sec
    watchdog/4-32 652 branches # 236.318 M/sec
    watchdog/5-38 652 branches # 441.435 M/sec
    watchdog/6-44 652 branches # 437.290 M/sec
    watchdog/7-50 652 branches # 221.467 M/sec
    vmstat-23127 8,960 branch-misses # 1.66% of all branches
    irqbalance-2780 3,047 branch-misses # 1.16% of all branches
    perf-24165 2,876 branch-misses # 0.77% of all branches
    sshd-23111 1,843 branch-misses # 8.43% of all branches
    thermald-2841 1,444 branch-misses # 4.57% of all branches
    sshd-23058 1,379 branch-misses # 12.91% of all branches
    kworker/u16:1-18249 982 branch-misses # 12.44% of all branches
    rcu_sched-8 893 branch-misses # 10.27% of all branches
    kworker/u16:2-23146 578 branch-misses # 14.10% of all branches
    kworker/0:2-19991 376 branch-misses # 6.53% of all branches
    gmain-2700 280 branch-misses # 10.61% of all branches
    kworker/6:0-17528 196 branch-misses # 8.84% of all branches
    kworker/4:1-15354 187 branch-misses # 6.79% of all branches
    kworker/5:2-31362 123 branch-misses # 10.87% of all branches
    watchdog/0-11 95 branch-misses # 14.57% of all branches
    watchdog/4-32 89 branch-misses # 13.65% of all branches
    kworker/3:2-12870 80 branch-misses # 7.40% of all branches
    watchdog/3-26 61 branch-misses # 9.36% of all branches
    kworker/4:1H-1887 60 branch-misses # 8.28% of all branches
    watchdog/2-20 52 branch-misses # 7.98% of all branches
    ksoftirqd/0-7 47 branch-misses # 6.65% of all branches
    watchdog/1-14 46 branch-misses # 7.06% of all branches
    watchdog/7-50 13 branch-misses # 1.99% of all branches
    watchdog/5-38 8 branch-misses # 1.23% of all branches
    watchdog/6-44 7 branch-misses # 1.07% of all branches

    3.695150786 seconds time elapsed

    root@skl:/tmp# perf stat --per-thread -M IPC,CPI
    ^C

    Performance counter stats for 'system wide':

    vmstat-23127 2,000,783 inst_retired.any # 1.5 IPC
    thermald-2841 1,472,670 inst_retired.any # 1.3 IPC
    sshd-23111 977,374 inst_retired.any # 1.2 IPC
    perf-24163 483,779 inst_retired.any # 0.2 IPC
    gmain-2700 341,213 inst_retired.any # 0.9 IPC
    sshd-23058 148,891 inst_retired.any # 0.8 IPC
    rtkit-daemon-3288 71,210 inst_retired.any # 0.7 IPC
    kworker/u16:1-18249 39,562 inst_retired.any # 0.3 IPC
    rcu_sched-8 14,474 inst_retired.any # 0.8 IPC
    kworker/0:2-19991 7,659 inst_retired.any # 0.2 IPC
    kworker/4:1-15354 6,714 inst_retired.any # 0.8 IPC
    rtkit-daemon-3289 4,839 inst_retired.any # 0.3 IPC
    kworker/6:0-17528 3,321 inst_retired.any # 0.6 IPC
    kworker/5:2-31362 3,215 inst_retired.any # 0.5 IPC
    kworker/7:2-23145 3,173 inst_retired.any # 0.7 IPC
    kworker/4:1H-1887 1,719 inst_retired.any # 0.3 IPC
    watchdog/0-11 1,479 inst_retired.any # 0.3 IPC
    watchdog/1-14 1,479 inst_retired.any # 0.3 IPC
    watchdog/2-20 1,479 inst_retired.any # 0.4 IPC
    watchdog/3-26 1,479 inst_retired.any # 0.4 IPC
    watchdog/4-32 1,479 inst_retired.any # 0.3 IPC
    watchdog/5-38 1,479 inst_retired.any # 0.3 IPC
    watchdog/6-44 1,479 inst_retired.any # 0.7 IPC
    watchdog/7-50 1,479 inst_retired.any # 0.7 IPC
    kworker/u16:2-23146 1,408 inst_retired.any # 0.5 IPC
    perf-24163 2,249,872 cpu_clk_unhalted.thread
    vmstat-23127 1,352,455 cpu_clk_unhalted.thread
    thermald-2841 1,161,140 cpu_clk_unhalted.thread
    sshd-23111 807,827 cpu_clk_unhalted.thread
    gmain-2700 375,535 cpu_clk_unhalted.thread
    sshd-23058 194,071 cpu_clk_unhalted.thread
    kworker/u16:1-18249 114,306 cpu_clk_unhalted.thread
    rtkit-daemon-3288 103,547 cpu_clk_unhalted.thread
    kworker/0:2-19991 46,550 cpu_clk_unhalted.thread
    rcu_sched-8 18,855 cpu_clk_unhalted.thread
    rtkit-daemon-3289 17,549 cpu_clk_unhalted.thread
    kworker/4:1-15354 8,812 cpu_clk_unhalted.thread
    kworker/5:2-31362 6,812 cpu_clk_unhalted.thread
    kworker/4:1H-1887 5,270 cpu_clk_unhalted.thread
    kworker/6:0-17528 5,111 cpu_clk_unhalted.thread
    kworker/7:2-23145 4,667 cpu_clk_unhalted.thread
    watchdog/0-11 4,663 cpu_clk_unhalted.thread
    watchdog/1-14 4,663 cpu_clk_unhalted.thread
    watchdog/4-32 4,626 cpu_clk_unhalted.thread
    watchdog/5-38 4,403 cpu_clk_unhalted.thread
    watchdog/3-26 3,936 cpu_clk_unhalted.thread
    watchdog/2-20 3,850 cpu_clk_unhalted.thread
    kworker/u16:2-23146 2,654 cpu_clk_unhalted.thread
    watchdog/6-44 2,017 cpu_clk_unhalted.thread
    watchdog/7-50 2,017 cpu_clk_unhalted.thread
    vmstat-23127 2,000,783 inst_retired.any # 0.7 CPI
    thermald-2841 1,472,670 inst_retired.any # 0.8 CPI
    sshd-23111 977,374 inst_retired.any # 0.8 CPI
    perf-24163 495,037 inst_retired.any # 4.7 CPI
    gmain-2700 341,213 inst_retired.any # 1.1 CPI
    sshd-23058 148,891 inst_retired.any # 1.3 CPI
    rtkit-daemon-3288 71,210 inst_retired.any # 1.5 CPI
    kworker/u16:1-18249 39,562 inst_retired.any # 2.9 CPI
    rcu_sched-8 14,474 inst_retired.any # 1.3 CPI
    kworker/0:2-19991 7,659 inst_retired.any # 6.1 CPI
    kworker/4:1-15354 6,714 inst_retired.any # 1.3 CPI
    rtkit-daemon-3289 4,839 inst_retired.any # 3.6 CPI
    kworker/6:0-17528 3,321 inst_retired.any # 1.5 CPI
    kworker/5:2-31362 3,215 inst_retired.any # 2.1 CPI
    kworker/7:2-23145 3,173 inst_retired.any # 1.5 CPI
    kworker/4:1H-1887 1,719 inst_retired.any # 3.1 CPI
    watchdog/0-11 1,479 inst_retired.any # 3.2 CPI
    watchdog/1-14 1,479 inst_retired.any # 3.2 CPI
    watchdog/2-20 1,479 inst_retired.any # 2.6 CPI
    watchdog/3-26 1,479 inst_retired.any # 2.7 CPI
    watchdog/4-32 1,479 inst_retired.any # 3.1 CPI
    watchdog/5-38 1,479 inst_retired.any # 3.0 CPI
    watchdog/6-44 1,479 inst_retired.any # 1.4 CPI
    watchdog/7-50 1,479 inst_retired.any # 1.4 CPI
    kworker/u16:2-23146 1,408 inst_retired.any # 1.9 CPI
    perf-24163 2,302,323 cycles
    vmstat-23127 1,352,455 cycles
    thermald-2841 1,161,140 cycles
    sshd-23111 807,827 cycles
    gmain-2700 375,535 cycles
    sshd-23058 194,071 cycles
    kworker/u16:1-18249 114,306 cycles
    rtkit-daemon-3288 103,547 cycles
    kworker/0:2-19991 46,550 cycles
    rcu_sched-8 18,855 cycles
    rtkit-daemon-3289 17,549 cycles
    kworker/4:1-15354 8,812 cycles
    kworker/5:2-31362 6,812 cycles
    kworker/4:1H-1887 5,270 cycles
    kworker/6:0-17528 5,111 cycles
    kworker/7:2-23145 4,667 cycles
    watchdog/0-11 4,663 cycles
    watchdog/1-14 4,663 cycles
    watchdog/4-32 4,626 cycles
    watchdog/5-38 4,403 cycles
    watchdog/3-26 3,936 cycles
    watchdog/2-20 3,850 cycles
    kworker/u16:2-23146 2,654 cycles
    watchdog/6-44 2,017 cycles
    watchdog/7-50 2,017 cycles

    2.175726600 seconds time elapsed

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-12-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    Currently, if we execute 'perf stat --per-thread' without specifying a
    pid/tid, perf returns an error:

    root@skl:/tmp# perf stat --per-thread
    The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid stat events on existing process id
    -t, --tid stat events on existing thread id

    This patch removes this limitation. If no pid/tid is specified, it
    returns all threads (getting the threads from /proc).

    Note that it doesn't support a cpu_list yet, so that case is skipped.
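
    A standalone illustration of "get threads from /proc" using only
    standard interfaces (this is the general mechanism, not perf's
    thread_map code): every numeric directory under /proc is a process, and
    its threads are the numeric entries under /proc/<pid>/task.

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
            DIR *proc = opendir("/proc");
            struct dirent *d;
            int nr = 0;

            if (!proc)
                    return 1;
            while ((d = readdir(proc)) != NULL) {
                    if (!isdigit((unsigned char)d->d_name[0]))
                            continue;
                    char path[64];
                    snprintf(path, sizeof(path), "/proc/%s/task", d->d_name);

                    DIR *task = opendir(path);
                    struct dirent *t;

                    if (!task)
                            continue;
                    while ((t = readdir(task)) != NULL)
                            if (isdigit((unsigned char)t->d_name[0]))
                                    nr++;           /* one entry per tid */
                    closedir(task);
            }
            closedir(proc);
            printf("found %d threads\n", nr);
            return 0;
    }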

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-11-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    If the stats pointer in the stat_config structure is not NULL, perf
    will update the per-thread stats or print the per-thread stats using
    this buffer.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-9-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    After perf_evlist__create_maps() is executed, we can get all threads
    from /proc. And via thread_map__nr(), we can also get the number of
    threads.

    With the number of threads, the patch allocates a buffer which records
    the shadow stats for these threads.

    The buffer pointer is saved in stat_config.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-8-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    The function perf_stat__print_shadow_stats() is called to print the
    shadow stats kept in a set of static variables.

    But those static variables are a limitation for supporting per-thread
    shadow stats.

    This patch lets perf_stat__print_shadow_stats() print the shadow stats
    from an input parameter 'st'.

    It no longer gets the values directly from the static variables.
    Instead, it now uses runtime_stat_avg() and runtime_stat_n() to get and
    compute the values.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
    [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    The function perf_stat__update_shadow_stats() is called to update the
    shadow stats kept in a set of static variables.

    But those static variables are a limitation when extending this to
    per-thread shadow stats.

    This patch lets perf_stat__update_shadow_stats() update the shadow stats
    on an input parameter 'st' and uses update_runtime_stat() to update the
    stats. It no longer updates the static variables directly as before.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
    [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

30 Nov, 2017

1 commit

    Move the code to synthesize event updates for scale/unit/cpus to a
    common utility file, and use it from both stat and record.

    This allows accessing scale and other extra qualifiers from perf script.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

31 Oct, 2017

3 commits

    Move the shadow stats scale computation to the
    perf_stat__update_shadow_stats() function, so it's centralized and we
    don't forget to do it. It also saves a few lines of code.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add struct perf_data_file to represent a single file within a perf_data
    struct.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
    [ Fixup recent changes in 'perf script --per-event-dump' ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Rename struct perf_data_file to perf_data, because we will add the
    possibility to have multiple files under perf.data, so the 'perf_data'
    name fits better.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org
    [ Fixup recent changes in 'perf script --per-event-dump' ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

27 Oct, 2017

1 commit

    When we started using it for stats, not just in builtin-stat.c but also
    in builtin-script.c, it stopped being a tool-private area, so introduce
    a new pointer for these stats and leave ->priv to its original purpose.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Cc: yuzhoujian
    Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps")
    Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Sep, 2017

4 commits

    It's not possible to run a package event and a per-cpu event in the same
    group. This is used by some of the power metrics. They work correctly
    when not using a group.

    Normally weak groups should handle that, but in this case EBADF is
    returned instead of the normal EINVAL.
    $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
    Using CPUID GenuineIntel-6-3E
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
    perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
    perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)

    and perf errors out.

    Make weak groups trigger a fall back for EBADF too. Then this case works correctly:

    $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
    Using CPUID GenuineIntel-6-3E
    Weak group for cstate_pkg/c2-residency//2 failed
    cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
    msr/tsc/: 39625837911 12007369110 12007369110

    Performance counter stats for 'system wide':

    476,709,882 cstate_pkg/c2-residency/
    39,625,837,911 msr/tsc/

    1.000697588 seconds time elapsed

    This fixes perf stat -M Power ...

    $ perf stat -M Power --metric-only -a sleep 1

    Performance counter stats for 'system wide':

    Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.0 0.7 30.0 0.0 0.9 0.1 0.4 0.0

    1.001240740 seconds time elapsed
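
    A hedged sketch of the fallback condition described above (not the
    actual perf code): a weak group is broken up and its members retried
    individually when the group open fails with EINVAL or, after this
    change, EBADF.

    #include <errno.h>
    #include <stdbool.h>

    static bool weak_group_should_fallback(int open_errno, bool weak_group)
    {
            /* before this change only EINVAL triggered the fallback */
            return weak_group && (open_errno == EINVAL || open_errno == EBADF);
    }

    int main(void)
    {
            return weak_group_should_fallback(EBADF, true) ? 0 : 1;
    }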

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
    Compute it for each interval instead of only at the end.

    Pointed out by Jiri.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
    Some perf stat metrics use an internal "duration_time" metric. However,
    it is not printed correctly, so hide it during output to avoid confusing
    users with 0 counts.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
    Add generic support for standalone metrics specified in JSON files to
    perf stat. A metric is a formula that uses multiple events to compute a
    higher level result (e.g. IPC).

    Previously metrics were always tied to an event and automatically
    enabled with that event. Now change that so that we can have standalone
    metrics. They are in the same JSON data structure as events, but don't
    have an event name.

    We also allow organizing the metrics into metric groups, which provide a
    shortcut to select several related metrics at once.

    Add a new -M / --metrics option to perf stat that adds the specified
    metrics or metric groups.

    Add the core code to manage and parse the metric groups. They are
    collected from the JSON data structures into a separate rblist. When
    computing shadow values, look for metrics in that list. Then they are
    computed using the existing saved-values infrastructure in stat-shadow.c.

    The actual JSON metrics are in a separate pull request.

    % perf stat -M Summary --metric-only -a sleep 1

    Performance counter stats for 'system wide':

    Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
    317614222.0 1392930775.0 0.0 0.0 0.2 0.1

    1.001497549 seconds time elapsed

    % perf stat -M GFLOPs flops

    Performance counter stats for 'flops':

    3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%)
    14 fp_comp_ops_exe.sse_scalar_double (66.65%)
    0 fp_comp_ops_exe.sse_packed_double (66.67%)
    0 fp_comp_ops_exe.sse_packed_single (66.70%)
    0 simd_fp_256.packed_double (66.70%)
    0 simd_fp_256.packed_single (66.67%)
    0 duration_time

    3.238372845 seconds time elapsed

    v2: Add missing header file
    v3: Move find_map to pmu.c

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen