26 Jan, 2019

1 commit

  • [ Upstream commit 51433ead1460fb3f46e1c34f68bb22fd2dd0f5d0 ]

    Some 'perf stat' options do not make sense to be negated (event,
    cgroup), and some do not have a negated path implemented (metrics). Due
    to that, it is better to disable the "no-" prefix for them, since
    otherwise the later option parsing segfaults.

    Before:

    $ perf stat --no-metrics -- ls
    Segmentation fault (core dumped)

    After:

    $ perf stat --no-metrics -- ls
    Error: option `no-metrics' isn't available
    Usage: perf stat [<options>] [<command>]
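
    As a standalone illustration of the failure mode and fix described above
    (hypothetical option table, not perf's actual parse-options code):
    options marked as non-negatable reject the "no-" form up front instead
    of falling into an unimplemented negation path.

    #include <stdio.h>
    #include <string.h>

    /* Illustration only: a tiny option table with an "allow negation" flag. */
    struct opt { const char *name; int allow_neg; };

    static const struct opt opts[] = {
            { "metrics", 0 }, { "event", 0 }, { "cgroup", 0 }, { "scale", 1 },
    };

    static int parse_opt(const char *arg)
    {
            int neg = !strncmp(arg, "no-", 3);
            const char *name = neg ? arg + 3 : arg;

            for (unsigned i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
                    if (strcmp(opts[i].name, name))
                            continue;
                    if (neg && !opts[i].allow_neg) {
                            fprintf(stderr, "Error: option `no-%s' isn't available\n", name);
                            return -1;
                    }
                    return 0;
            }
            return -1;
    }

    int main(void)
    {
            return parse_opt("no-metrics") ? 1 : 0;   /* rejected, no crash */
    }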

    Signed-off-by: Michael Petlan
    Tested-by: Arnaldo Carvalho de Melo
    LPU-Reference: 1485912065.62416880.1544457604340.JavaMail.zimbra@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Michael Petlan
     

25 Jul, 2018

2 commits

    There's no reason to have a separate function to display clock events.
    Its only purpose was to convert the nanosecond value into milliseconds.
    We do that now in generic code, if the unit and scale values are
    properly set, which this patch does for clock events.

    The output differs in that the unit field is displayed in its own column
    rather than being added as a suffix of the event name. Plus the value is
    rounded to 2 decimal places, as for any other event.

    Before:

    # perf stat -e cpu-clock,task-clock -C 0 sleep 3

    Performance counter stats for 'CPU(s) 0':

    3001.123137 cpu-clock (msec) # 1.000 CPUs utilized
    3001.133250 task-clock (msec) # 1.000 CPUs utilized

    3.001159813 seconds time elapsed

    Now:

    # perf stat -e cpu-clock,task-clock -C 0 sleep 3

    Performance counter stats for 'CPU(s) 0':

    3,001.05 msec cpu-clock # 1.000 CPUs utilized
    3,001.05 msec task-clock # 1.000 CPUs utilized

    3.001077794 seconds time elapsed

    There's a small difference in csv output, as we now output the unit
    field, which was empty before. It's in the proper spot, so there's no
    compatibility issue.

    Before:

    # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
    3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized
    3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized

    # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
    3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized
    3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized

    Add perf_evsel__is_clock to replace nsec_counter.
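
    A minimal standalone illustration of the unit/scale idea described above
    (not the perf code itself): once unit is "msec" and scale is 1e-6, the
    generic printing code turns the raw nanosecond count into the new output
    format shown above.

    #include <stdio.h>

    int main(void)
    {
            unsigned long long raw_ns = 3001050000ULL;  /* example cpu-clock count */
            const char *unit = "msec";
            double scale = 1e-6;                        /* ns -> msec */

            /* prints: 3001.05 msec cpu-clock */
            printf("%.2f %s cpu-clock\n", raw_ns * scale, unit);
            return 0;
    }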

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • The 'perf stat' command line flag -T to display transaction counters is
    currently supported for x86 only.

    Add support for s390. It is based on the metrics flag -M transaction
    using the architecture dependent JSON files. This requires a metric
    named "transaction" in the JSON files for the platform.

    Introduce a new function metricgroup__has_metric() to check for the
    existence of a metric named "transaction".

    As suggested by Andi Kleen, this is the new approach to supporting
    transaction counters. Other architectures will follow.

    Output before:

    [root@p23lp27 perf]# ./perf stat -T -- sleep 1
    Cannot set up transaction events
    [root@p23lp27 perf]#

    Output after:

    [root@s35lp76 perf]# ./perf stat -T -- ~/mytesttx 1 >/tmp/111

    Performance counter stats for '/root/mytesttx 1':

    1 tx_c_tend # 13.0 transaction
    1 tx_nc_tend
    11 tx_nc_tabort
    0 tx_c_tabort_special
    0 tx_c_tabort_no_special

    0.001070109 seconds time elapsed

    [root@s35lp76 perf]#
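
    A rough standalone sketch of the metric lookup described above, assuming
    the per-architecture JSON tables boil down to a list of metric names at
    runtime; the real metricgroup__has_metric() works on perf's pmu-events
    tables.

    #include <stdbool.h>
    #include <string.h>

    /* Illustration only: decide whether -T can fall back to the
     * "transaction" metric group on this architecture. */
    static bool has_metric(const char **metrics, int nr, const char *name)
    {
            for (int i = 0; i < nr; i++)
                    if (!strcmp(metrics[i], name))
                            return true;
            return false;
    }

    int main(void)
    {
            const char *s390_metrics[] = { "transaction" };

            return has_metric(s390_metrics, 1, "transaction") ? 0 : 1;
    }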

    Suggested-by: Andi Kleen
    Signed-off-by: Thomas Richter
    Reviewed-by: Hendrik Brueckner
    Acked-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20180626071701.58190-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     

11 Jul, 2018

1 commit

    Currently we display an extra header line, like:

    # perf stat -I 1000 -a --interval-clear
    # time counts unit events
    insn per cycle branch-misses of all branches
    2.964917103 3855.349912 cpu-clock (msec) # 3.855 CPUs utilized
    2.964917103 23,993 context-switches # 0.006 M/sec
    2.964917103 1,301 cpu-migrations # 0.329 K/sec
    ...

    Fix the condition to get the proper output:

    # perf stat -I 1000 -a --interval-clear
    # time counts unit events
    2.359048938 1432.492228 cpu-clock (msec) # 1.432 CPUs utilized
    2.359048938 7,613 context-switches # 0.002 M/sec
    2.359048938 419 cpu-migrations # 0.133 K/sec
    ...

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 9660e08ee8cb ("perf stat: Add --interval-clear option")
    Link: http://lkml.kernel.org/r/20180702134202.17745-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

08 Jun, 2018

5 commits

    Add missing error handling for parse_events() calls in the
    add_default_attributes() function. The error handler displays error
    details, like for transactions (-T):

    Before:
    $ perf stat -T
    Cannot set up transaction events

    After:
    $ perf stat -T
    Cannot set up transaction events
    event syntax error: '..cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}'
    \___ unknown term

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-8-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    The following change will introduce new metrics that don't need such
    wide hard-coded spacing. Replace the METRIC_ONLY_LEN macro usage with
    the metric_only_len variable.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-7-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Make the metric-only display aligned.

    Before:
    # perf stat --topdown -I 1000
    # time core cpus retiring bad speculation frontend bound backend bound
    1.000394323 S0-C0 2 37.4% 12.0% 31.4% 19.2%
    1.000394323 S0-C1 2 25.1% 9.2% 43.8% 21.9%
    2.001521204 S0-C0 2 36.4% 11.4% 32.4% 19.8%
    2.001521204 S0-C1 2 26.2% 9.4% 43.1% 21.3%
    3.001930208 S0-C0 2 35.1% 10.7% 33.6% 20.6%
    3.001930208 S0-C1 2 28.9% 10.0% 40.0% 21.1%

    After:
    # perf stat --topdown -I 1000
    # time core cpus retiring bad speculation frontend bound backend bound
    1.000303722 S0-C0 2 34.2% 7.6% 34.2% 24.0%
    1.000303722 S0-C1 2 33.1% 6.4% 36.9% 23.6%
    2.001281055 S0-C0 2 34.6% 6.7% 36.8% 21.8%
    2.001281055 S0-C1 2 32.8% 7.1% 38.1% 22.0%
    3.001546080 S0-C0 2 39.3% 5.5% 32.7% 22.5%
    3.001546080 S0-C1 2 37.8% 6.0% 33.1% 23.1%
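
    A trivial standalone illustration of the alignment fix: print every
    header and value with the same field width.

    #include <stdio.h>

    int main(void)
    {
            const char *hdr[] = { "retiring", "bad speculation", "frontend bound", "backend bound" };
            double val[] = { 34.2, 7.6, 34.2, 24.0 };

            for (int i = 0; i < 4; i++)
                    printf("%20s", hdr[i]);           /* fixed-width headers */
            putchar('\n');
            for (int i = 0; i < 4; i++)
                    printf("%19.1f%%", val[i]);       /* values line up underneath */
            putchar('\n');
            return 0;
    }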

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-6-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    We can call color_fprintf() also for the non-color case; it's handled
    properly. This change simplifies the following patch.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-5-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Add the --interval-clear option to clear the screen before the next interval.

    Committer testing:

    # perf stat -I 1000 --interval-clear

    And, as expected, it behaves almost like:

    # watch -n 0 perf stat -a sleep 1
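
    A minimal standalone illustration of the screen clearing (the exact
    escape sequence perf uses may differ): emit the usual ANSI
    home-and-erase sequence before printing the next interval.

    #include <stdio.h>

    static void clear_screen(void)
    {
            fputs("\033[H\033[2J", stdout);   /* cursor home + erase display */
            fflush(stdout);
    }

    int main(void)
    {
            clear_screen();
            puts("# time counts unit events");
            return 0;
    }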

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180606221513.11302-4-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

06 Jun, 2018

1 commit

    Add support for reading rusage data once the workload is finished and
    displaying the system/user time values:

    $ perf stat --null perf bench sched pipe
    ...

    Performance counter stats for 'perf bench sched pipe':

    5.342599256 seconds time elapsed

    2.544434000 seconds user
    4.549691000 seconds sys

    It works only in non -r mode and only for the workload target.

    So as of now, for workload targets, we display 3 types of timings. The
    time we measure in perf stat from enable to disable+period:

    5.342599256 seconds time elapsed

    The time spent in user and system lands, displayed only for workload
    session/target:

    2.544434000 seconds user
    4.549691000 seconds sys

    Those times are the very same displayed by 'time' tool. They are
    returned by wait4 call via the getrusage struct interface.
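
    A standalone illustration of that mechanism using only standard
    interfaces (this is not perf's code): wait4() fills a struct rusage
    whose ru_utime/ru_stime give the user and system times.

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            pid_t pid = fork();

            if (pid < 0)
                    return 1;
            if (pid == 0) {
                    execlp("ls", "ls", (char *)NULL);   /* stand-in workload */
                    _exit(127);
            }

            int status;
            struct rusage ru;

            wait4(pid, &status, 0, &ru);
            printf("%ld.%06ld seconds user\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            printf("%ld.%06ld seconds sys\n",  (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
            return 0;
    }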

    Committer notes:

    Had to rename some variables to avoid this on older systems such as
    centos:6:

    builtin-stat.c: In function 'print_footer':
    builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
    /usr/include/time.h:297: warning: shadowed declaration is here

    Committer testing:

    # perf stat --null time perf bench sched pipe
    # Running 'sched/pipe' benchmark:
    # Executed 1000000 pipe operations between two processes

    Total time: 5.526 [sec]

    5.526534 usecs/op
    180945 ops/sec
    1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
    0inputs+0outputs (0major+606minor)pagefaults 0swaps

    Performance counter stats for 'time perf bench sched pipe':

    5.530978744 seconds time elapsed

    1.004037000 seconds user
    6.259937000 seconds sys

    #

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

26 Apr, 2018

3 commits

    Add support to display a visual aid ('length strings') to easily spot
    the biggest differences in the time table.

    $ perf stat -r 10 --table perf bench sched pipe
    ...

    Performance counter stats for './perf bench sched pipe' (5 runs):

    # Table of individual measurements:
    5.189 (-0.293) #
    5.189 (-0.294) #
    5.186 (-0.296) #
    5.663 (+0.181) ##
    6.186 (+0.703) ####

    # Final result:
    5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
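
    A standalone sketch of how such length strings can be rendered (the
    scaling factor below is arbitrary, not perf's):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
            double runs[] = { 5.189, 5.189, 5.186, 5.663, 6.186 };
            int n = sizeof(runs) / sizeof(runs[0]);
            double avg = 0;

            for (int i = 0; i < n; i++)
                    avg += runs[i] / n;

            printf("# Table of individual measurements:\n");
            for (int i = 0; i < n; i++) {
                    double d = runs[i] - avg;
                    int bars = 1 + (int)(fabs(d) / 0.2);   /* arbitrary scale */

                    printf("%.3f (%+.3f) ", runs[i], d);
                    for (int j = 0; j < bars; j++)
                            putchar('#');                  /* bar grows with deviation */
                    putchar('\n');
            }
            return 0;
    }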

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-9-jolsa@kernel.org
    [ Updated 'perf stat --table' man page entry ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add --table option to display time for each run (-r option), like:

    $ perf stat --null -r 5 --table perf bench sched pipe

    Performance counter stats for './perf bench sched pipe' (5 runs):

    # Table of individual measurements:
    5.379 (-0.176)
    5.243 (-0.311)
    5.238 (-0.317)
    5.536 (-0.019)
    6.377 (+0.823)

    # Final result:
    5.555 +- 0.213 seconds time elapsed ( +- 3.83% )

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-8-jolsa@kernel.org
    [ Document the new option in 'perf stat's man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Ingo suggested displaying the elapsed time for a multirun workload
    (perf stat -r) with precision based on the precision of the standard
    deviation.

    In his own words:

    > This output is a slightly bit misleading:

    > Performance counter stats for 'make -j128' (10 runs):
    > 27.988995256 seconds time elapsed ( +- 0.39% )

    > The 9 significant digits in the result, while only 1 is valid, suggests accuracy
    > where none exists.

    > It would be better if 'perf stat' would display elapsed time with a precision
    > adjusted to stddev, it should display at most 2 more significant digits than
    > the stddev inaccuracy.

    > I.e. in the above case 0.39% is 0.109, so we only have accuracy for 1 digit, and
    > so we should only display 3:

    > 27.988 seconds time elapsed ( +- 0.39% )

    Plus a suggestion about the output, which is small enough and connected
    with the above change that I merged both changes together.

    > Small output style nit - I think it would be nice if with --repeat the stddev was
    > also displayed in absolute values, besides percentage:
    >
    > 27.988 +- 0.109 seconds time elapsed ( +- 0.39% )

    The output is now:

    Performance counter stats for './perf bench sched pipe' (5 runs):
    SNIP
    13.3667 +- 0.0256 seconds time elapsed ( +- 0.19% )
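
    A standalone illustration of the precision rule, with the formula
    inferred from the examples above (an assumption, not necessarily perf's
    exact code): show two more digits than the leading-digit position of the
    standard deviation.

    #include <math.h>
    #include <stdio.h>

    /* Number of decimal places needed to reach the stddev's leading digit. */
    static int precision(double sd)
    {
            if (sd <= 0 || sd >= 1)
                    return 0;
            return (int)lround(ceil(-log10(sd)));
    }

    int main(void)
    {
            double avg = 27.988995256, sd = 0.109;
            int prec = precision(sd) + 2;     /* two extra significant digits */

            /* prints: 27.989 +- 0.109 seconds time elapsed ( +- 0.39% ) */
            printf("%.*f +- %.*f seconds time elapsed ( +- %.2f%% )\n",
                   prec, avg, prec, sd, 100.0 * sd / avg);
            return 0;
    }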

    Suggested-by: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180423090823.32309-7-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

25 Apr, 2018

2 commits

  • PMU name is printed repeatedly for interval print, for example:

    perf stat --no-merge -e 'unc_m_clockticks' -a -I 1000
    # time counts unit events
    1.001053069 243,702,144 unc_m_clockticks [uncore_imc_4]
    1.001053069 244,268,304 unc_m_clockticks [uncore_imc_2]
    1.001053069 244,427,386 unc_m_clockticks [uncore_imc_0]
    1.001053069 244,583,760 unc_m_clockticks [uncore_imc_5]
    1.001053069 244,738,971 unc_m_clockticks [uncore_imc_3]
    1.001053069 244,880,309 unc_m_clockticks [uncore_imc_1]
    2.002024821 240,818,200 unc_m_clockticks [uncore_imc_4] [uncore_imc_4]
    2.002024821 240,767,812 unc_m_clockticks [uncore_imc_2] [uncore_imc_2]
    2.002024821 240,764,215 unc_m_clockticks [uncore_imc_0] [uncore_imc_0]
    2.002024821 240,759,504 unc_m_clockticks [uncore_imc_5] [uncore_imc_5]
    2.002024821 240,755,992 unc_m_clockticks [uncore_imc_3] [uncore_imc_3]
    2.002024821 240,750,403 unc_m_clockticks [uncore_imc_1] [uncore_imc_1]

    For each print, the PMU name is unconditionally appended to
    counter->name.

    Check counter->name first: if the PMU name is already appended, do
    nothing.

    Committer notes:

    Add and use perf_evsel->uniquified_name bool instead of doing the more
    expensive strstr(event->name, pmu->name).
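
    A hedged sketch of the append-once guard described in the committer
    notes; the struct below is a stand-in, not perf's evsel/pmu definitions.

    #define _GNU_SOURCE
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct counter {
            char *name;
            const char *pmu_name;
            bool uniquified_name;
    };

    static void uniquify_name(struct counter *c)
    {
            char *new_name;

            if (c->uniquified_name)
                    return;                         /* PMU name already appended */

            if (asprintf(&new_name, "%s [%s]", c->name, c->pmu_name) >= 0) {
                    free(c->name);
                    c->name = new_name;
                    c->uniquified_name = true;
            }
    }

    int main(void)
    {
            struct counter c = { strdup("unc_m_clockticks"), "uncore_imc_4", false };

            uniquify_name(&c);
            uniquify_name(&c);                      /* second call is a no-op */
            printf("%s\n", c.name);                 /* unc_m_clockticks [uncore_imc_4] */
            free(c.name);
            return 0;
    }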

    Signed-off-by: Kan Liang
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Agustin Vega-Frias
    Cc: Andi Kleen
    Cc: Ganapatrao Kulkarni
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shaokun Zhang
    Cc: Will Deacon
    Fixes: 8c5421c016a4 ("perf pmu: Display pmu name when printing unmerged events in stat")
    Link: http://lkml.kernel.org/r/1524594014-79243-5-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
    Perf doesn't support mixed events from different PMUs (except software
    events) in a group. For this case, only "<not supported>" or
    "<not counted>" are printed out. There is no hint that guides users to
    fix the issue.

    Check the PMU type of the events to determine whether they are from the
    same PMU. The check may raise a false alarm, e.g. the core PMU may have
    a different PMU type, but that should not happen often.

    The false alarm can also be tolerated, because:

    - It only happens on the error path.
    - It just provides a possible solution for the issue.

    Signed-off-by: Kan Liang
    Cc: Agustin Vega-Frias
    Cc: Andi Kleen
    Cc: Ganapatrao Kulkarni
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shaokun Zhang
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1524594014-79243-2-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

12 Apr, 2018

1 commit

    Currently the print interval for performance counter values is limited
    to 10ms, so reading the values at frequencies higher than 100Hz is not
    possible with the tool.

    This change makes 'perf stat -I' possible at frequencies up to 1KHz and,
    to some extent, brings 'perf stat -I' on par with 'perf record' sampling
    profiling.

    When running 'perf stat -I' to monitor e.g. PCIe uncore counters and at
    the same time profiling some I/O workload with 'perf record', e.g. for
    cpu-cycles and context switches, it is then possible to observe a
    consolidated CPU/OS/IO (uncore) performance picture for that workload.

    The tool overhead warning printed when specifying the -v option can be
    missed due to screen scrolling when the output goes to the console, so
    the message is moved into the help text available by running
    'perf stat -h'.

    Signed-off-by: Alexey Budankov
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/b842ad6a-d606-32e4-afe5-974071b5198e@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Alexey Budankov
     

17 Mar, 2018

1 commit

  • Executing command 'perf stat -T -- ls' dumps core on x86 and s390.

    Here is the callback chain (done on x86):

    # gdb ./perf
    ....
    (gdb) r stat -T -- ls
    ...
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
    (gdb) where
    #0 0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
    #1 0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
    #2 0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
    #3 0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930) at util/parse-events.c:1288
    #4 0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
    scanner=0xbf4210) at util/parse-events.y:234
    #5 0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
    parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
    #6 0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
    at util/parse-events.c:1713
    #7 0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
    #8 0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
    builtin-stat.c:2828
    #9 0x00000000004c8b0f in run_builtin (p=0xab01a0 , argc=4,
    argv=0x7fffffffe3b0) at perf.c:297
    #10 0x00000000004c8d7c in handle_internal_command (argc=4,
    argv=0x7fffffffe3b0) at perf.c:349
    #11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
    argv=0x7fffffffe200) at perf.c:393
    #12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
    (gdb)

    It turns out that a NULL pointer is referenced. Here are the
    function calls:

    ...
    cmd_stat()
    +---> add_default_attributes()
    +---> parse_events(evsel_list, transaction_attrs, NULL);
    3rd parameter set to NULL

    Function parse_events(xx, xx, struct parse_events_error *err) dives
    into a bison generated scanner and first creates parser state
    information for it:

    struct parse_events_state parse_state = {
            .list  = LIST_HEAD_INIT(parse_state.list),
            .idx   = evlist->nr_entries,
            .error = err,   /* NULL */
            ...
    };

    The scanner then tries to build an error message through this error
    pointer (the asprintf(... error->str, ....) call seen at the top of the
    backtrace), which references a NULL pointer and dumps core.

    Fix this by providing a pointer to the necessary error information
    instead of NULL. Technically only the else part is needed to avoid the
    core dump, but let's be safe...

    Signed-off-by: Thomas Richter
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20180308145735.64717-1-tmricht@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     

08 Mar, 2018

1 commit

    To simplify the creation of events across multiple instances of the
    same type of PMU, 'perf stat' supports two methods for creating multiple
    events from a single event specification:

    1. A prefix or glob can be used in the PMU name.
    2. Aliases, which are listed immediately after the Kernel PMU events
    by 'perf list', are used.

    When the --no-merge option is passed and these events are displayed
    individually, the PMU name is lost and it's not possible to see which
    count corresponds to which PMU:

    $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    67 l3cache/read-miss/
    67 l3cache/read-miss/
    63 l3cache/read-miss/
    60 l3cache/read-miss/

    0.001675706 seconds time elapsed

    $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    12 l3cache_read_miss
    17 l3cache_read_miss
    10 l3cache_read_miss
    8 l3cache_read_miss

    0.001661305 seconds time elapsed

    This change adds the original pmu name to the event. For dynamic pmu
    events the pmu name is restored in the event name:

    $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    63 l3cache_0_3/read-miss/
    74 l3cache_0_1/read-miss/
    64 l3cache_0_2/read-miss/
    74 l3cache_0_0/read-miss/

    0.001675706 seconds time elapsed

    For alias events the name is added after the event name:

    $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null

    Performance counter stats for 'system wide':

    10 l3cache_read_miss [l3cache_0_3]
    12 l3cache_read_miss [l3cache_0_1]
    10 l3cache_read_miss [l3cache_0_2]
    17 l3cache_read_miss [l3cache_0_0]

    0.001661305 seconds time elapsed

    Signed-off-by: Agustin Vega-Frias
    Acked-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Timur Tabi
    Cc: linux-arm-kernel@lists.infradead.org
    Change-Id: I8056b9eda74bda33e95065056167ad96e97cb1fb
    Link: http://lkml.kernel.org/r/1520345084-42646-3-git-send-email-agustinv@codeaurora.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Agustin Vega-Frias
     

07 Mar, 2018

1 commit


06 Mar, 2018

1 commit

    When printing stats in CSV mode, 'perf stat' appends extra separators
    when a counter is not supported:

    ,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,,

    This causes a failure when parsing fields. The number of separators
    should be the same for each line, no matter whether the counter is
    supported or not.
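
    A standalone illustration of the rule (not perf's CSV code): always emit
    the same number of fields per line, leaving fields empty when a counter
    is unsupported, so the separator count never changes.

    #include <stdio.h>

    #define NR_FIELDS 7

    static void print_csv_line(const char *fields[NR_FIELDS])
    {
            for (int i = 0; i < NR_FIELDS; i++)
                    printf("%s%s", fields[i] ? fields[i] : "",
                           i < NR_FIELDS - 1 ? "," : "\n");
    }

    int main(void)
    {
            const char *ok[NR_FIELDS]  = { "3000.80", "msec", "task-clock",
                                           "3000799550", "100.00", "1.000", "CPUs utilized" };
            const char *bad[NR_FIELDS] = { "<not supported>", NULL, "L1-dcache-store-misses",
                                           NULL, "100.00", NULL, NULL };

            print_csv_line(ok);     /* 6 commas */
            print_csv_line(bad);    /* still 6 commas, empty fields kept */
            return 0;
    }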

    Signed-off-by: Ilya Pronin
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com
    Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output")
    Signed-off-by: Cong Wang
    Signed-off-by: Arnaldo Carvalho de Melo

    Ilya Pronin
     

27 Feb, 2018

1 commit

    If we execute 'perf stat --per-thread' with a non-root account (even
    with kernel.perf_event_paranoid set to -1), it reports the error:

    jinyao@skl:~$ perf stat --per-thread
    Error:
    You may not have permission to collect system-wide stats.

    Consider tweaking /proc/sys/kernel/perf_event_paranoid,
    which controls use of the performance events system by
    unprivileged users (without CAP_SYS_ADMIN).

    The current value is 2:

    -1: Allow use of (almost) all events by all users
    Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
    >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
    Disallow raw tracepoint access by users without CAP_SYS_ADMIN
    >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
    >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

    To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

    kernel.perf_event_paranoid = -1

    Perhaps the ptrace rule doesn't allow tracing some processes, but anyway
    the global --per-thread mode had better ignore such errors and continue
    working on the other threads.

    This patch records the index of the failing thread in perf_evsel__open()
    and removes this thread before retrying.

    For example (run with non-root, kernel.perf_event_paranoid isn't set):

    jinyao@skl:~$ perf stat --per-thread
    ^C
    Performance counter stats for 'system wide':

    vmstat-3458 6.171984 cpu-clock:u (msec) # 0.000 CPUs utilized
    perf-3670 0.515599 cpu-clock:u (msec) # 0.000 CPUs utilized
    vmstat-3458 1,163,643 cycles:u # 0.189 GHz
    perf-3670 40,881 cycles:u # 0.079 GHz
    vmstat-3458 1,410,238 instructions:u # 1.21 insn per cycle
    perf-3670 3,536 instructions:u # 0.09 insn per cycle
    vmstat-3458 288,937 branches:u # 46.814 M/sec
    perf-3670 936 branches:u # 1.815 M/sec
    vmstat-3458 15,195 branch-misses:u # 5.26% of all branches
    perf-3670 76 branch-misses:u # 8.12% of all branches

    12.651675247 seconds time elapsed
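
    A hedged sketch of the retry logic described above (illustrative only,
    with made-up tids and a fake open_counter() helper): when opening the
    counter fails for one thread, drop that thread and retry with the
    remaining ones.

    #include <stdio.h>

    static int open_counter(int tid)
    {
            return tid == 4242 ? -1 : 0;    /* pretend tid 4242 is not traceable */
    }

    int main(void)
    {
            int tids[] = { 3458, 3670, 4242, 5177 };
            int nr = sizeof(tids) / sizeof(tids[0]);

            for (int i = 0; i < nr; ) {
                    if (open_counter(tids[i]) < 0) {
                            /* remove the failing thread and retry */
                            for (int j = i; j < nr - 1; j++)
                                    tids[j] = tids[j + 1];
                            nr--;
                            continue;
                    }
                    i++;
            }
            printf("monitoring %d threads\n", nr);
            return 0;
    }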

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

21 Feb, 2018

1 commit


16 Feb, 2018

2 commits

    Introduce a new option to print counts after N milliseconds and update
    the 'perf stat' documentation accordingly.

    Shown below is the output of the new option for perf stat.

    $ perf stat --time 2000 -e cycles -a
    Performance counter stats for 'system wide':

    157,260,423 cycles

    2.003060766 seconds time elapsed

    We can print the count deltas after N milliseconds with this newly
    introduced option. This option is not supported with the "-I" option.

    In addition, according to Kan Liang's patch (19afd10410957), the
    monitoring overhead for a system-wide core event could be very high if
    the interval-print parameter was below 100ms, and the lower limit is
    10ms.

    So the same warning will be displayed when the time is set between 10ms
    and 100ms, and the minimal time is limited to 10ms. Users can make a
    decision according to their specific cases.
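
    A small standalone sketch of those limits (hypothetical helper name, not
    perf's code): reject values below 10ms and warn between 10ms and 100ms.

    #include <stdio.h>

    static int check_timeout_ms(unsigned int timeout_ms)
    {
            if (timeout_ms < 10) {
                    fprintf(stderr, "timeout must be >= 10ms\n");
                    return -1;
            }
            if (timeout_ms < 100)
                    fprintf(stderr, "warning: timeout < 100ms may add noticeable overhead\n");
            return 0;
    }

    int main(void)
    {
            return check_timeout_ms(2000) ? 1 : 0;
    }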

    Committer notes:

    This actually stops the workload after the specified time, then prints
    the counts.

    So I renamed the option to --timeout and updated the documentation to
    state that it will not just print the counts after the specified time,
    but will really stop the 'perf stat' session and print the counts.

    The rename from 'time' to 'timeout' also fixes the build in systems
    where 'time' is used by glibc and can't be used as a name of a variable,
    such as centos:5 and centos:6.

    Changes since v3:
    - none.

    Changes since v2:
    - modify the time check in __run_perf_stat func to keep some consistency
    with the workload case.
    - add the warning when the time is set between 10ms to 100ms.
    - add the pr_err when the time is set below 10ms.

    Changes since v1:
    - none.

    Signed-off-by: yuzhoujian
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Kan Liang
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    yuzhoujian
     
    Introduce a new option to print counts a fixed number of times and
    update the 'perf stat' documentation accordingly.

    Shown below is the output of the new option for perf stat.

    $ perf stat -I 1000 --interval-count 2 -e cycles -a
    # time counts unit events
    1.002827089 93,884,870 cycles
    2.004231506 56,573,446 cycles

    We can print the counts just a fixed number of times with this newly
    introduced option. Its usage is a little like 'vmstat', and it should
    be used together with the "-I" option.

    $ vmstat -n 1 2
    procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    0 0 0 78270544 547484 51732076 0 0 0 20 1 1 1 0 99 0 0
    0 0 0 78270512 547484 51732080 0 0 0 16 477 1555 0 0 100 0 0

    Changes since v3:
    - merge interval_count check and times check to one line.
    - fix the wrong indent in stat.h
    - use stat_config.times instead of 'times' in cmd_stat function.

    Changes since v2:
    - none.

    Changes since v1:
    - change the name of the new option "times-print" to "interval-count".
    - keep the new option interval specifically.

    Signed-off-by: yuzhoujian
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Kan Liang
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    yuzhoujian
     

27 Dec, 2017

7 commits

  • These duplicate includes have been found with scripts/checkincludes.pl
    but they have been removed manually to avoid removing false positives.

    Signed-off-by: Pravin Shedge
    Cc: David S. Miller
    Cc: Greg Kroah-Hartman
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Pravin Shedge
     
    There are many threads reported if we enable '--per-thread'
    globally.

    1. Most of the threads are not counted or have a count of 0. This patch
    removes these threads from the output.

    2. We also re-sort the threads in the display according to the counter
    value. This makes it easy for the user to see the hottest threads.

    For example, the new results would be:

    root@skl:/tmp# perf stat --per-thread
    ^C
    Performance counter stats for 'system wide':

    perf-24165 4.302433 cpu-clock (msec) # 0.001 CPUs utilized
    vmstat-23127 1.562215 cpu-clock (msec) # 0.000 CPUs utilized
    irqbalance-2780 0.827851 cpu-clock (msec) # 0.000 CPUs utilized
    sshd-23111 0.278308 cpu-clock (msec) # 0.000 CPUs utilized
    thermald-2841 0.230880 cpu-clock (msec) # 0.000 CPUs utilized
    sshd-23058 0.207306 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/0:2-19991 0.133983 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/u16:1-18249 0.125636 cpu-clock (msec) # 0.000 CPUs utilized
    rcu_sched-8 0.085533 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/u16:2-23146 0.077139 cpu-clock (msec) # 0.000 CPUs utilized
    gmain-2700 0.041789 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/4:1-15354 0.028370 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/6:0-17528 0.023895 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/4:1H-1887 0.013209 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/5:2-31362 0.011627 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/0-11 0.010892 cpu-clock (msec) # 0.000 CPUs utilized
    kworker/3:2-12870 0.010220 cpu-clock (msec) # 0.000 CPUs utilized
    ksoftirqd/0-7 0.008869 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/1-14 0.008476 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/7-50 0.002944 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/3-26 0.002893 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/4-32 0.002759 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/2-20 0.002429 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/6-44 0.001491 cpu-clock (msec) # 0.000 CPUs utilized
    watchdog/5-38 0.001477 cpu-clock (msec) # 0.000 CPUs utilized
    rcu_sched-8 10 context-switches # 0.117 M/sec
    kworker/u16:1-18249 7 context-switches # 0.056 M/sec
    sshd-23111 4 context-switches # 0.014 M/sec
    vmstat-23127 4 context-switches # 0.003 M/sec
    perf-24165 4 context-switches # 0.930 K/sec
    kworker/0:2-19991 3 context-switches # 0.022 M/sec
    kworker/u16:2-23146 3 context-switches # 0.039 M/sec
    kworker/4:1-15354 2 context-switches # 0.070 M/sec
    kworker/6:0-17528 2 context-switches # 0.084 M/sec
    sshd-23058 2 context-switches # 0.010 M/sec
    ksoftirqd/0-7 1 context-switches # 0.113 M/sec
    watchdog/0-11 1 context-switches # 0.092 M/sec
    watchdog/1-14 1 context-switches # 0.118 M/sec
    watchdog/2-20 1 context-switches # 0.412 M/sec
    watchdog/3-26 1 context-switches # 0.346 M/sec
    watchdog/4-32 1 context-switches # 0.362 M/sec
    watchdog/5-38 1 context-switches # 0.677 M/sec
    watchdog/6-44 1 context-switches # 0.671 M/sec
    watchdog/7-50 1 context-switches # 0.340 M/sec
    kworker/4:1H-1887 1 context-switches # 0.076 M/sec
    thermald-2841 1 context-switches # 0.004 M/sec
    gmain-2700 1 context-switches # 0.024 M/sec
    irqbalance-2780 1 context-switches # 0.001 M/sec
    kworker/3:2-12870 1 context-switches # 0.098 M/sec
    kworker/5:2-31362 1 context-switches # 0.086 M/sec
    kworker/u16:1-18249 2 cpu-migrations # 0.016 M/sec
    kworker/u16:2-23146 2 cpu-migrations # 0.026 M/sec
    rcu_sched-8 1 cpu-migrations # 0.012 M/sec
    sshd-23058 1 cpu-migrations # 0.005 M/sec
    perf-24165 8,833,385 cycles # 2.053 GHz
    vmstat-23127 1,702,699 cycles # 1.090 GHz
    irqbalance-2780 739,847 cycles # 0.894 GHz
    sshd-23111 269,506 cycles # 0.968 GHz
    thermald-2841 204,556 cycles # 0.886 GHz
    sshd-23058 158,780 cycles # 0.766 GHz
    kworker/0:2-19991 112,981 cycles # 0.843 GHz
    kworker/u16:1-18249 100,926 cycles # 0.803 GHz
    rcu_sched-8 74,024 cycles # 0.865 GHz
    kworker/u16:2-23146 55,984 cycles # 0.726 GHz
    gmain-2700 34,278 cycles # 0.820 GHz
    kworker/4:1-15354 20,665 cycles # 0.728 GHz
    kworker/6:0-17528 16,445 cycles # 0.688 GHz
    kworker/5:2-31362 9,492 cycles # 0.816 GHz
    watchdog/3-26 8,695 cycles # 3.006 GHz
    kworker/4:1H-1887 8,238 cycles # 0.624 GHz
    watchdog/4-32 7,580 cycles # 2.747 GHz
    kworker/3:2-12870 7,306 cycles # 0.715 GHz
    watchdog/2-20 7,274 cycles # 2.995 GHz
    watchdog/0-11 6,988 cycles # 0.642 GHz
    ksoftirqd/0-7 6,376 cycles # 0.719 GHz
    watchdog/1-14 5,340 cycles # 0.630 GHz
    watchdog/5-38 4,061 cycles # 2.749 GHz
    watchdog/6-44 3,976 cycles # 2.667 GHz
    watchdog/7-50 3,418 cycles # 1.161 GHz
    vmstat-23127 2,511,699 instructions # 1.48 insn per cycle
    perf-24165 1,829,908 instructions # 0.21 insn per cycle
    irqbalance-2780 1,190,204 instructions # 1.61 insn per cycle
    thermald-2841 143,544 instructions # 0.70 insn per cycle
    sshd-23111 128,138 instructions # 0.48 insn per cycle
    sshd-23058 57,654 instructions # 0.36 insn per cycle
    rcu_sched-8 44,063 instructions # 0.60 insn per cycle
    kworker/u16:1-18249 42,551 instructions # 0.42 insn per cycle
    kworker/0:2-19991 25,873 instructions # 0.23 insn per cycle
    kworker/u16:2-23146 21,407 instructions # 0.38 insn per cycle
    gmain-2700 13,691 instructions # 0.40 insn per cycle
    kworker/4:1-15354 12,964 instructions # 0.63 insn per cycle
    kworker/6:0-17528 10,034 instructions # 0.61 insn per cycle
    kworker/5:2-31362 5,203 instructions # 0.55 insn per cycle
    kworker/3:2-12870 4,866 instructions # 0.67 insn per cycle
    kworker/4:1H-1887 3,586 instructions # 0.44 insn per cycle
    ksoftirqd/0-7 3,463 instructions # 0.54 insn per cycle
    watchdog/0-11 3,135 instructions # 0.45 insn per cycle
    watchdog/1-14 3,135 instructions # 0.59 insn per cycle
    watchdog/2-20 3,135 instructions # 0.43 insn per cycle
    watchdog/3-26 3,135 instructions # 0.36 insn per cycle
    watchdog/4-32 3,135 instructions # 0.41 insn per cycle
    watchdog/5-38 3,135 instructions # 0.77 insn per cycle
    watchdog/6-44 3,135 instructions # 0.79 insn per cycle
    watchdog/7-50 3,135 instructions # 0.92 insn per cycle
    vmstat-23127 539,181 branches # 345.139 M/sec
    perf-24165 375,364 branches # 87.245 M/sec
    irqbalance-2780 262,092 branches # 316.593 M/sec
    thermald-2841 31,611 branches # 136.915 M/sec
    sshd-23111 21,874 branches # 78.596 M/sec
    sshd-23058 10,682 branches # 51.528 M/sec
    rcu_sched-8 8,693 branches # 101.633 M/sec
    kworker/u16:1-18249 7,891 branches # 62.808 M/sec
    kworker/0:2-19991 5,761 branches # 42.998 M/sec
    kworker/u16:2-23146 4,099 branches # 53.138 M/sec
    kworker/4:1-15354 2,755 branches # 97.110 M/sec
    gmain-2700 2,638 branches # 63.127 M/sec
    kworker/6:0-17528 2,216 branches # 92.739 M/sec
    kworker/5:2-31362 1,132 branches # 97.360 M/sec
    kworker/3:2-12870 1,081 branches # 105.773 M/sec
    kworker/4:1H-1887 725 branches # 54.887 M/sec
    ksoftirqd/0-7 707 branches # 79.716 M/sec
    watchdog/0-11 652 branches # 59.860 M/sec
    watchdog/1-14 652 branches # 76.923 M/sec
    watchdog/2-20 652 branches # 268.423 M/sec
    watchdog/3-26 652 branches # 225.372 M/sec
    watchdog/4-32 652 branches # 236.318 M/sec
    watchdog/5-38 652 branches # 441.435 M/sec
    watchdog/6-44 652 branches # 437.290 M/sec
    watchdog/7-50 652 branches # 221.467 M/sec
    vmstat-23127 8,960 branch-misses # 1.66% of all branches
    irqbalance-2780 3,047 branch-misses # 1.16% of all branches
    perf-24165 2,876 branch-misses # 0.77% of all branches
    sshd-23111 1,843 branch-misses # 8.43% of all branches
    thermald-2841 1,444 branch-misses # 4.57% of all branches
    sshd-23058 1,379 branch-misses # 12.91% of all branches
    kworker/u16:1-18249 982 branch-misses # 12.44% of all branches
    rcu_sched-8 893 branch-misses # 10.27% of all branches
    kworker/u16:2-23146 578 branch-misses # 14.10% of all branches
    kworker/0:2-19991 376 branch-misses # 6.53% of all branches
    gmain-2700 280 branch-misses # 10.61% of all branches
    kworker/6:0-17528 196 branch-misses # 8.84% of all branches
    kworker/4:1-15354 187 branch-misses # 6.79% of all branches
    kworker/5:2-31362 123 branch-misses # 10.87% of all branches
    watchdog/0-11 95 branch-misses # 14.57% of all branches
    watchdog/4-32 89 branch-misses # 13.65% of all branches
    kworker/3:2-12870 80 branch-misses # 7.40% of all branches
    watchdog/3-26 61 branch-misses # 9.36% of all branches
    kworker/4:1H-1887 60 branch-misses # 8.28% of all branches
    watchdog/2-20 52 branch-misses # 7.98% of all branches
    ksoftirqd/0-7 47 branch-misses # 6.65% of all branches
    watchdog/1-14 46 branch-misses # 7.06% of all branches
    watchdog/7-50 13 branch-misses # 1.99% of all branches
    watchdog/5-38 8 branch-misses # 1.23% of all branches
    watchdog/6-44 7 branch-misses # 1.07% of all branches

    3.695150786 seconds time elapsed

    root@skl:/tmp# perf stat --per-thread -M IPC,CPI
    ^C

    Performance counter stats for 'system wide':

    vmstat-23127 2,000,783 inst_retired.any # 1.5 IPC
    thermald-2841 1,472,670 inst_retired.any # 1.3 IPC
    sshd-23111 977,374 inst_retired.any # 1.2 IPC
    perf-24163 483,779 inst_retired.any # 0.2 IPC
    gmain-2700 341,213 inst_retired.any # 0.9 IPC
    sshd-23058 148,891 inst_retired.any # 0.8 IPC
    rtkit-daemon-3288 71,210 inst_retired.any # 0.7 IPC
    kworker/u16:1-18249 39,562 inst_retired.any # 0.3 IPC
    rcu_sched-8 14,474 inst_retired.any # 0.8 IPC
    kworker/0:2-19991 7,659 inst_retired.any # 0.2 IPC
    kworker/4:1-15354 6,714 inst_retired.any # 0.8 IPC
    rtkit-daemon-3289 4,839 inst_retired.any # 0.3 IPC
    kworker/6:0-17528 3,321 inst_retired.any # 0.6 IPC
    kworker/5:2-31362 3,215 inst_retired.any # 0.5 IPC
    kworker/7:2-23145 3,173 inst_retired.any # 0.7 IPC
    kworker/4:1H-1887 1,719 inst_retired.any # 0.3 IPC
    watchdog/0-11 1,479 inst_retired.any # 0.3 IPC
    watchdog/1-14 1,479 inst_retired.any # 0.3 IPC
    watchdog/2-20 1,479 inst_retired.any # 0.4 IPC
    watchdog/3-26 1,479 inst_retired.any # 0.4 IPC
    watchdog/4-32 1,479 inst_retired.any # 0.3 IPC
    watchdog/5-38 1,479 inst_retired.any # 0.3 IPC
    watchdog/6-44 1,479 inst_retired.any # 0.7 IPC
    watchdog/7-50 1,479 inst_retired.any # 0.7 IPC
    kworker/u16:2-23146 1,408 inst_retired.any # 0.5 IPC
    perf-24163 2,249,872 cpu_clk_unhalted.thread
    vmstat-23127 1,352,455 cpu_clk_unhalted.thread
    thermald-2841 1,161,140 cpu_clk_unhalted.thread
    sshd-23111 807,827 cpu_clk_unhalted.thread
    gmain-2700 375,535 cpu_clk_unhalted.thread
    sshd-23058 194,071 cpu_clk_unhalted.thread
    kworker/u16:1-18249 114,306 cpu_clk_unhalted.thread
    rtkit-daemon-3288 103,547 cpu_clk_unhalted.thread
    kworker/0:2-19991 46,550 cpu_clk_unhalted.thread
    rcu_sched-8 18,855 cpu_clk_unhalted.thread
    rtkit-daemon-3289 17,549 cpu_clk_unhalted.thread
    kworker/4:1-15354 8,812 cpu_clk_unhalted.thread
    kworker/5:2-31362 6,812 cpu_clk_unhalted.thread
    kworker/4:1H-1887 5,270 cpu_clk_unhalted.thread
    kworker/6:0-17528 5,111 cpu_clk_unhalted.thread
    kworker/7:2-23145 4,667 cpu_clk_unhalted.thread
    watchdog/0-11 4,663 cpu_clk_unhalted.thread
    watchdog/1-14 4,663 cpu_clk_unhalted.thread
    watchdog/4-32 4,626 cpu_clk_unhalted.thread
    watchdog/5-38 4,403 cpu_clk_unhalted.thread
    watchdog/3-26 3,936 cpu_clk_unhalted.thread
    watchdog/2-20 3,850 cpu_clk_unhalted.thread
    kworker/u16:2-23146 2,654 cpu_clk_unhalted.thread
    watchdog/6-44 2,017 cpu_clk_unhalted.thread
    watchdog/7-50 2,017 cpu_clk_unhalted.thread
    vmstat-23127 2,000,783 inst_retired.any # 0.7 CPI
    thermald-2841 1,472,670 inst_retired.any # 0.8 CPI
    sshd-23111 977,374 inst_retired.any # 0.8 CPI
    perf-24163 495,037 inst_retired.any # 4.7 CPI
    gmain-2700 341,213 inst_retired.any # 1.1 CPI
    sshd-23058 148,891 inst_retired.any # 1.3 CPI
    rtkit-daemon-3288 71,210 inst_retired.any # 1.5 CPI
    kworker/u16:1-18249 39,562 inst_retired.any # 2.9 CPI
    rcu_sched-8 14,474 inst_retired.any # 1.3 CPI
    kworker/0:2-19991 7,659 inst_retired.any # 6.1 CPI
    kworker/4:1-15354 6,714 inst_retired.any # 1.3 CPI
    rtkit-daemon-3289 4,839 inst_retired.any # 3.6 CPI
    kworker/6:0-17528 3,321 inst_retired.any # 1.5 CPI
    kworker/5:2-31362 3,215 inst_retired.any # 2.1 CPI
    kworker/7:2-23145 3,173 inst_retired.any # 1.5 CPI
    kworker/4:1H-1887 1,719 inst_retired.any # 3.1 CPI
    watchdog/0-11 1,479 inst_retired.any # 3.2 CPI
    watchdog/1-14 1,479 inst_retired.any # 3.2 CPI
    watchdog/2-20 1,479 inst_retired.any # 2.6 CPI
    watchdog/3-26 1,479 inst_retired.any # 2.7 CPI
    watchdog/4-32 1,479 inst_retired.any # 3.1 CPI
    watchdog/5-38 1,479 inst_retired.any # 3.0 CPI
    watchdog/6-44 1,479 inst_retired.any # 1.4 CPI
    watchdog/7-50 1,479 inst_retired.any # 1.4 CPI
    kworker/u16:2-23146 1,408 inst_retired.any # 1.9 CPI
    perf-24163 2,302,323 cycles
    vmstat-23127 1,352,455 cycles
    thermald-2841 1,161,140 cycles
    sshd-23111 807,827 cycles
    gmain-2700 375,535 cycles
    sshd-23058 194,071 cycles
    kworker/u16:1-18249 114,306 cycles
    rtkit-daemon-3288 103,547 cycles
    kworker/0:2-19991 46,550 cycles
    rcu_sched-8 18,855 cycles
    rtkit-daemon-3289 17,549 cycles
    kworker/4:1-15354 8,812 cycles
    kworker/5:2-31362 6,812 cycles
    kworker/4:1H-1887 5,270 cycles
    kworker/6:0-17528 5,111 cycles
    kworker/7:2-23145 4,667 cycles
    watchdog/0-11 4,663 cycles
    watchdog/1-14 4,663 cycles
    watchdog/4-32 4,626 cycles
    watchdog/5-38 4,403 cycles
    watchdog/3-26 3,936 cycles
    watchdog/2-20 3,850 cycles
    kworker/u16:2-23146 2,654 cycles
    watchdog/6-44 2,017 cycles
    watchdog/7-50 2,017 cycles

    2.175726600 seconds time elapsed

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-12-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    Currently, if we execute 'perf stat --per-thread' without specifying a
    pid/tid, perf returns an error:

    root@skl:/tmp# perf stat --per-thread
    The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid stat events on existing process id
    -t, --tid stat events on existing thread id

    This patch removes this limitation. If no pid/tid is specified, it
    returns all threads (getting the threads from /proc).

    Note that it doesn't support a cpu_list yet, so that case is skipped.
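
    A standalone illustration of "get threads from /proc" using only
    standard interfaces (this is the general mechanism, not perf's
    thread_map code): every numeric directory under /proc is a process, and
    its threads are the numeric entries under /proc/<pid>/task.

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
            DIR *proc = opendir("/proc");
            struct dirent *d;
            int nr = 0;

            if (!proc)
                    return 1;
            while ((d = readdir(proc)) != NULL) {
                    if (!isdigit((unsigned char)d->d_name[0]))
                            continue;
                    char path[64];
                    snprintf(path, sizeof(path), "/proc/%s/task", d->d_name);

                    DIR *task = opendir(path);
                    struct dirent *t;

                    if (!task)
                            continue;
                    while ((t = readdir(task)) != NULL)
                            if (isdigit((unsigned char)t->d_name[0]))
                                    nr++;           /* one entry per tid */
                    closedir(task);
            }
            closedir(proc);
            printf("found %d threads\n", nr);
            return 0;
    }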

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-11-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    If the stats pointer in the stat_config structure is not NULL, perf
    will update the per-thread stats or print the per-thread stats using
    this buffer.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-9-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    After perf_evlist__create_maps() is executed, we can get all threads
    from /proc. And via thread_map__nr(), we can also get the number of
    threads.

    With the number of threads, the patch allocates a buffer which records
    the shadow stats for these threads.

    The buffer pointer is saved in stat_config.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-8-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    The function perf_stat__print_shadow_stats() is called to print the
    shadow stats kept in a set of static variables.

    But those static variables are a limitation for supporting per-thread
    shadow stats.

    This patch lets perf_stat__print_shadow_stats() print the shadow stats
    from an input parameter 'st'.

    It no longer gets the values directly from the static variables.
    Instead, it now uses runtime_stat_avg() and runtime_stat_n() to get and
    compute the values.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-6-git-send-email-yao.jin@linux.intel.com
    [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
    The function perf_stat__update_shadow_stats() is called to update the
    shadow stats kept in a set of static variables.

    But those static variables are a limitation when extending this to
    per-thread shadow stats.

    This patch lets perf_stat__update_shadow_stats() update the shadow stats
    on an input parameter 'st' and uses update_runtime_stat() to update the
    stats. It no longer updates the static variables directly as before.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1512482591-4646-5-git-send-email-yao.jin@linux.intel.com
    [ Rename 'stat' variables to 'st' to build on centos:{5,6} and others where it shadows a global declaration ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

30 Nov, 2017

1 commit

    Move the code to synthesize event updates for scale/unit/cpus to a
    common utility file, and use it from both stat and record.

    This allows accessing scale and other extra qualifiers from perf script.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

31 Oct, 2017

3 commits

    Move the shadow stats scale computation to the
    perf_stat__update_shadow_stats() function, so it's centralized and we
    don't forget to do it. It also saves a few lines of code.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add struct perf_data_file to represent a single file within a perf_data
    struct.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
    [ Fixup recent changes in 'perf script --per-event-dump' ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Rename struct perf_data_file to perf_data, because we will add the
    possibility to have multiple files under perf.data, so the 'perf_data'
    name fits better.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org
    [ Fixup recent changes in 'perf script --per-event-dump' ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

27 Oct, 2017

1 commit

    When we started using it for stats, not just in builtin-stat.c but also
    in builtin-script.c, it stopped being a tool-private area, so introduce
    a new pointer for these stats and leave ->priv to its original purpose.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Cc: yuzhoujian
    Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps")
    Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Sep, 2017

4 commits

    It's not possible to run a package event and a per-cpu event in the same
    group. This is used by some of the power metrics. They work correctly
    when not using a group.

    Normally weak groups should handle that, but in this case EBADF is
    returned instead of the normal EINVAL.
    $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
    Using CPUID GenuineIntel-6-3E
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
    perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
    perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
    perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)

    and perf errors out.

    Make weak groups trigger a fall back for EBADF too. Then this case works correctly:

    $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
    Using CPUID GenuineIntel-6-3E
    Weak group for cstate_pkg/c2-residency//2 failed
    cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
    msr/tsc/: 39625837911 12007369110 12007369110

    Performance counter stats for 'system wide':

    476,709,882 cstate_pkg/c2-residency/
    39,625,837,911 msr/tsc/

    1.000697588 seconds time elapsed

    This fixes perf stat -M Power ...

    $ perf stat -M Power --metric-only -a sleep 1

    Performance counter stats for 'system wide':

    Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.0 0.7 30.0 0.0 0.9 0.1 0.4 0.0

    1.001240740 seconds time elapsed
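
    A hedged sketch of the fallback condition described above (not the
    actual perf code): a weak group is broken up and its members retried
    individually when the group open fails with EINVAL or, after this
    change, EBADF.

    #include <errno.h>
    #include <stdbool.h>

    static bool weak_group_should_fallback(int open_errno, bool weak_group)
    {
            /* before this change only EINVAL triggered the fallback */
            return weak_group && (open_errno == EINVAL || open_errno == EBADF);
    }

    int main(void)
    {
            return weak_group_should_fallback(EBADF, true) ? 0 : 1;
    }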

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
    Compute it for each interval instead of only at the end.

    Pointed out by Jiri.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
    Some perf stat metrics use an internal "duration_time" metric. However,
    it is not printed correctly, so hide it during output to avoid confusing
    users with 0 counts.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
    Add generic support for standalone metrics specified in JSON files to
    perf stat. A metric is a formula that uses multiple events to compute a
    higher level result (e.g. IPC).

    Previously metrics were always tied to an event and automatically
    enabled with that event. Now change that so that we can have standalone
    metrics. They are in the same JSON data structure as events, but don't
    have an event name.

    We also allow organizing the metrics into metric groups, which provide a
    shortcut to select several related metrics at once.

    Add a new -M / --metrics option to perf stat that adds the specified
    metrics or metric groups.

    Add the core code to manage and parse the metric groups. They are
    collected from the JSON data structures into a separate rblist. When
    computing shadow values, look for metrics in that list. Then they are
    computed using the existing saved-values infrastructure in stat-shadow.c.

    The actual JSON metrics are in a separate pull request.

    % perf stat -M Summary --metric-only -a sleep 1

    Performance counter stats for 'system wide':

    Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
    317614222.0 1392930775.0 0.0 0.0 0.2 0.1

    1.001497549 seconds time elapsed

    % perf stat -M GFLOPs flops

    Performance counter stats for 'flops':

    3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%)
    14 fp_comp_ops_exe.sse_scalar_double (66.65%)
    0 fp_comp_ops_exe.sse_packed_double (66.67%)
    0 fp_comp_ops_exe.sse_packed_single (66.70%)
    0 simd_fp_256.packed_double (66.70%)
    0 simd_fp_256.packed_single (66.67%)
    0 duration_time

    3.238372845 seconds time elapsed

    v2: Add missing header file
    v3: Move find_map to pmu.c

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen