23 Mar, 2017

2 commits

  • Add support for a new JSON event attribute to name MetricExpr for better
    output in perf stat.

    If the event has no MetricName it uses the normal event name instead to
    describe the metric.

    Before

    % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
    time unc_p_freq_max_os_cycles
    1.000149775 15.7
    2.000344807 19.3
    3.000502544 16.7
    4.000640656 6.6
    5.000779955 9.9

    After

    % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
    time freq_max_os_cycles %
    1.000149775 15.7
    2.000344807 19.3
    3.000502544 16.7
    4.000640656 6.6
    5.000779955 9.9

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Add generic infrastructure to perf stat to output ratios for
    "MetricExpr" entries in the event lists. Many events are more useful as
    ratios than in raw form, typically some count in relation to total
    ticks.

    Transfer the MetricExpr information from the alias to the evsel.

    We mark the events that need to be collected for MetricExpr, and also
    link the events using them with a pointer. The code is careful to always
    prefer the right event in the same group to minimize multiplexing
    errors. At the moment only a single relation is supported.

    Then add a rblist to the stat shadow code that remembers stats based on
    the cpu and context.

    Then finally update and retrieve and print these values similarly to the
    existing hardcoded perf metrics. We use the simple expression parser
    added earlier to evaluate the expression.

    Normally we just output the result without further commentary, but for
    --metric-only this would lead to empty columns. So for this case use the
    original event as description.

    There is no attempt to automatically add the MetricExpr event if it is
    missing. Instead we suggest it to the user, because the tool doesn't
    have enough information to reliably construct a group that is
    guaranteed to schedule. So we leave that to the user.

    % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
    1.000147889 800,085,181 unc_p_clockticks
    1.000147889 93,126,241 unc_p_freq_max_os_cycles # 11.6
    2.000448381 800,218,217 unc_p_clockticks
    2.000448381 142,516,095 unc_p_freq_max_os_cycles # 17.8
    3.000639852 800,243,057 unc_p_clockticks
    3.000639852 162,292,689 unc_p_freq_max_os_cycles # 20.3

    % perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
    # time freq_max_os_cycles %
    1.000127077 0.9
    2.000301436 0.7
    3.000456379 0.0

    v2: Change from DivideBy to MetricExpr
    v3: Use expr__ prefix. Support more than one other event.
    v4: Update description
    v5: Only print warning message once for multiple PMUs.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
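
The two commits above can be illustrated with a small sketch. It is not perf's expression parser: it assumes the metric is simply 100 * event / clockticks (which reproduces the 11.6, 17.8 and 20.3 values in the sample output), and the `EventName` key and both helper names are assumptions made for the example.

```python
# Minimal sketch of a MetricExpr-style ratio; NOT perf's actual expression parser.
# ASSUMPTION: the metric is 100 * event / clockticks, which matches the sample output.

def metric_percent(event_count: int, clockticks: int) -> float:
    """Return event_count as a percentage of clockticks (0.0 if there are no ticks)."""
    if clockticks == 0:
        return 0.0
    return 100.0 * event_count / clockticks


def column_name(event: dict) -> str:
    """Prefer the MetricName attribute, else fall back to the event name.

    The "EventName" key is an assumption made for this sketch.
    """
    return event.get("MetricName") or event["EventName"]


# (clockticks, freq_max_os_cycles) pairs from the 'perf stat -I 1000' output above.
intervals = [
    (800_085_181, 93_126_241),
    (800_218_217, 142_516_095),
    (800_243_057, 162_292_689),
]
for ticks, cycles in intervals:
    print(f"{metric_percent(cycles, ticks):.1f}")
```

With a MetricName of "freq_max_os_cycles %" the column header would use that string; without one, `column_name` falls back to the plain event name, as the first commit describes.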
     

22 Mar, 2017

1 commit

  • The uncore PMU has a lot of duplicated PMUs for different subsystems.
    When expanding an uncore alias we usually end up with a large
    number of identically named aliases, which makes perf stat
    output difficult to read.

    Automatically sum them up in perf stat, unless --no-merge is specified.

    This can be the default because generally only the uncores have
    duplicated aliases. Other PMUs have unique names.

    Before:

    % perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1

    Performance counter stats for 'system wide':

    694,976 Bytes unc_c_llc_lookup.any
    706,304 Bytes unc_c_llc_lookup.any
    956,608 Bytes unc_c_llc_lookup.any
    782,720 Bytes unc_c_llc_lookup.any
    605,696 Bytes unc_c_llc_lookup.any
    442,816 Bytes unc_c_llc_lookup.any
    659,328 Bytes unc_c_llc_lookup.any
    509,312 Bytes unc_c_llc_lookup.any
    263,936 Bytes unc_c_llc_lookup.any
    592,448 Bytes unc_c_llc_lookup.any
    672,448 Bytes unc_c_llc_lookup.any
    608,640 Bytes unc_c_llc_lookup.any
    641,024 Bytes unc_c_llc_lookup.any
    856,896 Bytes unc_c_llc_lookup.any
    808,832 Bytes unc_c_llc_lookup.any
    684,864 Bytes unc_c_llc_lookup.any
    710,464 Bytes unc_c_llc_lookup.any
    538,304 Bytes unc_c_llc_lookup.any

    1.002577660 seconds time elapsed

    After:

    % perf stat -a -e unc_c_llc_lookup.any sleep 1

    Performance counter stats for 'system wide':

    2,685,120 Bytes unc_c_llc_lookup.any

    1.002648032 seconds time elapsed

    v2: Split collect_aliases. Rename alias flag.
    v3: Make sure unsupported/not counted is always printed.
    v4: Factor out callback change into separate patch.
    v5: Move check for bad results here
    Move merged check into collect_data

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20170320201711.14142-3-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
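
The merging behaviour can be sketched as follows. This is an illustration only: the function name, the tuple layout and the `no_merge` parameter are assumptions, and the three rows are just the first three counts from the "Before" listing.

```python
def merge_aliases(rows, no_merge=False):
    """Sum counts of identically named events unless no_merge is set.

    rows is a list of (count, unit, name) tuples, one per duplicated PMU.
    """
    if no_merge:
        return list(rows)
    merged = {}                       # dicts keep insertion order
    for count, unit, name in rows:
        key = (unit, name)
        merged[key] = merged.get(key, 0) + count
    return [(count, unit, name) for (unit, name), count in merged.items()]


rows = [
    (694_976, "Bytes", "unc_c_llc_lookup.any"),
    (706_304, "Bytes", "unc_c_llc_lookup.any"),
    (956_608, "Bytes", "unc_c_llc_lookup.any"),
]
for count, unit, name in merge_aliases(rows):
    print(f"{count:,} {unit} {name}")
```

Note that the "Before" and "After" listings in the commit message come from separate runs, so their totals are not expected to match each other.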
     

16 Dec, 2016

1 commit

  • Adding perf_evsel::ignore_missing_cpu_thread bool.

    When set to true, it allows perf to ignore a perf event syscall error
    caused by a missing pid.

    We remove the missing thread id from the thread_map, so the rest of the
    processing, such as ioctl and mmap, is not disturbed by a -1 fd.

    The reason for supporting this is to ease monitoring of a group of pids
    that 'disappear' before perf opens their event. This currently leads
    perf to report an error and exit, and makes perf record's -u option
    unusable under certain setups.

    With this change we allow this race and ignore such a failure with the
    following warning:

    WARNING: Ignored open failure for pid 8605

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
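
A rough model of the race handling described above. The `open_event` callable and `fake_open` are hypothetical stand-ins for the perf event syscall, and treating ESRCH as the "missing pid" error is an assumption; only the warning text is taken from the commit message.

```python
import errno


def open_for_threads(pids, open_event, ignore_missing_thread=True):
    """Open one event per pid and drop pids that vanished before the open.

    open_event is a hypothetical callable returning an fd or raising OSError.
    pids is pruned in place, mirroring the removal from the thread map.
    Returns (fds_by_pid, warnings).
    """
    fds, warnings = {}, []
    for pid in list(pids):            # iterate over a copy because pids is mutated
        try:
            fds[pid] = open_event(pid)
        except OSError as err:
            # ASSUMPTION: a missing pid is reported as ESRCH.
            if ignore_missing_thread and err.errno == errno.ESRCH:
                pids.remove(pid)
                warnings.append(f"WARNING: Ignored open failure for pid {pid}")
                continue
            raise
    return fds, warnings


def fake_open(pid):
    """Stand-in for the syscall: pid 8605 has already exited."""
    if pid == 8605:
        raise OSError(errno.ESRCH, "No such process")
    return 100 + pid % 10


pids = [8601, 8605, 8609]
fds, warnings = open_for_threads(pids, fake_open)
print(pids, warnings)
```

Because the vanished pid is removed from the list, later steps never see an entry without a valid fd.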
     

25 Nov, 2016

1 commit

  • For tracepoint events, callchains always contain certain functions.
    Sometimes it'd be better to skip those functions as they have no value.

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20161124011114.7102-2-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

23 Nov, 2016

1 commit

  • The EVSEL__PRINT_CALLCHAIN_ARROW option can be used to print callchains
    with arrows for readability. It will be used by the 'sched timehist'
    command like below:

    __schedule
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20161116060634.28477-3-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

24 Oct, 2016

1 commit

  • It can be useful to specify branch type state per event, for example if
    we want to collect both software trace points and last branch PMU events
    in a single collection. Currently this doesn't work because the software
    trace point errors out with -b.

    There was already a branch-type parameter to configure branch sample
    types per event in the parser, but it was stubbed out. This patch
    implements the necessary plumbing to actually enable it.

    Now:

    $ perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...

    works.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/1476306127-19721-1-git-send-email-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

29 Sep, 2016

3 commits

  • This patch makes it possible to use the current filter framework with
    address filters. That way address filters for HW tracers such as
    CoreSight and Intel PT can be communicated to the kernel drivers.

    Signed-off-by: Mathieu Poirier
    Acked-by: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1474037045-31730-4-git-send-email-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Mathieu Poirier
     
  • Make function perf_evsel__append_filter() static and introduce a new
    tracepoint-specific function to append filters. That way we eliminate
    redundant code and avoid formatting mistakes.

    Signed-off-by: Mathieu Poirier
    Acked-by: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1474037045-31730-3-git-send-email-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Mathieu Poirier
     
  • By making function perf_evsel__append_filter() take a format rather than
    an operator, the code can be reused for purposes other than tracepoints
    (e.g. Intel PT and CoreSight).

    Signed-off-by: Mathieu Poirier
    Acked-by: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1474037045-31730-2-git-send-email-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Mathieu Poirier
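
The idea behind the three filter commits, a generic append routine parameterised by a format plus thin type-specific wrappers, might look like the sketch below. The function names and both format strings are assumptions made for illustration, not quotations from the perf sources.

```python
def append_filter(current, fmt, new_filter):
    """Combine new_filter with an existing filter string using fmt.

    fmt is a printf-style format with two %s slots. With no existing
    filter, the new one is used unchanged.
    """
    if not current:
        return new_filter
    return fmt % (current, new_filter)


def append_tp_filter(current, new_filter):
    # ASSUMED format: tracepoint filters are AND-ed as boolean expressions.
    return append_filter(current, "(%s) && (%s)", new_filter)


def append_addr_filter(current, new_filter):
    # ASSUMED format: address filters form a comma-separated list.
    return append_filter(current, "%s,%s", new_filter)


print(append_tp_filter("prev_pid == 1", "next_pid == 2"))
print(append_addr_filter("filter_a", "filter_b"))   # placeholder filter strings
```

Keeping the format in one place is what removes the duplicated string handling and the risk of each caller building the combined filter slightly differently.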
     

14 Sep, 2016

1 commit

  • This patch adds PMU driver specific configuration to the parser
    infrastructure by preceding any term with the '@' character. As such,
    doing something like:

    perf record -e some_event/@cfg1,@cfg2=config/ ...

    will see 'cfg1' and 'cfg2=config' being added to the list of evsel
    config terms. The tokens 'cfg1' and 'cfg2=config' are not processed in
    user space and are meant to be interpreted by the PMU driver.

    First the lexer/parser are supplemented with the required definitions to
    recognise the driver specific configuration. From there they are simply
    added to the list of event terms. The bulk of the work is done in
    function "parse_events_add_pmu()" where driver config event terms are
    added to a new list of driver config terms, which is in turn spliced
    with the event's new driver configuration list.

    Signed-off-by: Mathieu Poirier
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1473179837-3293-4-git-send-email-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Mathieu Poirier
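
As a rough illustration of the '@' convention, the sketch below separates driver-specific terms from regular ones. It is a toy string splitter, not the lexer/parser the commit extends, and `split_terms` is a name invented for the example.

```python
def split_terms(event_spec):
    """Split 'pmu/term,term/' into (pmu, regular_terms, driver_terms).

    Terms starting with '@' are collected without interpretation, since
    they are meant for the PMU driver rather than for user space.
    """
    pmu, _, rest = event_spec.partition("/")
    body = rest.rstrip("/")
    regular, driver = [], []
    for term in filter(None, body.split(",")):
        if term.startswith("@"):
            driver.append(term[1:])   # strip the '@' marker, keep the rest verbatim
        else:
            regular.append(term)
    return pmu, regular, driver


print(split_terms("some_event/@cfg1,@cfg2=config/"))
```

For the command line quoted in the commit message this yields an empty regular list and the driver terms 'cfg1' and 'cfg2=config'.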
     

29 Jul, 2016

1 commit


16 Jul, 2016

2 commits

  • This patch allows the following config terms and option:

    Globally set events to overwrite:

    # perf record --overwrite ...

    Set specific events to overwrite or no-overwrite:

    # perf record --event cycles/overwrite/ ...
    # perf record --event cycles/no-overwrite/ ...

    Add missing config terms and update the config term array size because
    the longest string length has changed.

    For overwritable events, it automatically selects attr.write_backward
    since perf requires it to be backward for reading.

    Test result:

    # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
    # perf evlist -v
    syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
    # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

    Signed-off-by: Wang Nan
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Nilay Vaish
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.com
    Signed-off-by: He Kuang
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • The evsel->overwrite indicator means an event should be put into an
    overwritable ring buffer. In the current implementation, it is equal to
    evsel->attr.write_backward. To reduce complexity, remove
    evsel->overwrite and use evsel->attr.write_backward instead.

    In addition, in __perf_evsel__open(), if the kernel doesn't support
    write_backward and the user explicitly set it in evsel, don't fall back
    as for other missing features, since it is meaningless to fall back to
    a forward ring buffer in this case: we are unable to stably read
    from a forward overwritable ring buffer.

    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Nilay Vaish
    Cc: Wang Nan
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1468485287-33422-2-git-send-email-wangnan0@huawei.com
    Signed-off-by: Wang Nan
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
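
The interplay between the global option, the per-event terms and write_backward can be modelled as below. This is a sketch under assumptions: both function names are invented, and the rule that a per-event term wins over the global --overwrite option is assumed rather than stated in the commit messages.

```python
def resolve_write_backward(global_overwrite, terms):
    """Return the write_backward bit for one event.

    ASSUMPTION: a per-event 'overwrite' or 'no-overwrite' term wins over the
    global --overwrite option. Overwritable events use a backward ring buffer.
    """
    overwrite = global_overwrite
    for term in terms:
        if term == "overwrite":
            overwrite = True
        elif term == "no-overwrite":
            overwrite = False
    return 1 if overwrite else 0


def open_event(write_backward, kernel_supports_write_backward):
    """Refuse to fall back when write_backward was requested but is missing."""
    if write_backward and not kernel_supports_write_backward:
        raise RuntimeError("write_backward not supported; no fallback")
    return {"write_backward": write_backward}


print(resolve_write_backward(True, []))                 # global --overwrite
print(resolve_write_backward(True, ["no-overwrite"]))   # per-event opt-out
print(resolve_write_backward(False, ["overwrite"]))     # per-event opt-in
```

The second function reflects the reasoning in the later commit: silently opening a forward buffer instead would give the user something that cannot be read reliably.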
     

13 Jul, 2016

1 commit


30 Jun, 2016

1 commit

  • Add a utility function to fetch the arch using evsel. (evsel->env->arch)

    Signed-off-by: Ravi Bangoria
    Cc: Ananth N Mavinakayanahalli
    Cc: Anton Blanchard
    Cc: Daniel Axtens
    Cc: David Laight
    Cc: Michael Ellerman
    Cc: Naveen N. Rao
    Link: http://lkml.kernel.org/r/1467267262-4589-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     

04 Jun, 2016

1 commit

  • Out of perf_evsel__intval(), that requires passing the variable name,
    that will then be searched in the list of tracepoint variables for the
    given evsel.

    In cases such as syscall file descriptor ("fd") tracking, this is
    wasteful, we need just to use perf_evsel__field() and cache the
    format_field.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-r6f89jx9j5nkx037d0naviqy@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
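
The cost being avoided can be sketched as follows: resolve the field once by name, then read values through the cached descriptor. The `Field` class, the field layout and the helper names are hypothetical and only model the idea.

```python
import struct


class Field:
    """Hypothetical stand-in for a tracepoint format_field descriptor."""

    def __init__(self, name, offset, fmt):
        self.name, self.offset, self.fmt = name, offset, fmt


# Made-up layout: two little-endian signed 64-bit values.
FIELDS = [Field("nr", 0, "<q"), Field("fd", 8, "<q")]


def find_field(fields, name):
    """Linear search by name: the per-sample cost the commit wants to avoid."""
    for field in fields:
        if field.name == name:
            return field
    return None


def field_intval(field, raw):
    """Read the integer value directly through an already-resolved field."""
    return struct.unpack_from(field.fmt, raw, field.offset)[0]


fd_field = find_field(FIELDS, "fd")          # looked up once, then cached
samples = [struct.pack("<qq", 0, 3), struct.pack("<qq", 1, 4)]
print([field_intval(fd_field, raw) for raw in samples])
```

For a hot path such as per-syscall "fd" tracking, the name search happens once instead of once per sample.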
     

30 May, 2016

1 commit

  • The tooling counterpart, now it is possible to do:

    # perf record -e sched:sched_switch/max-stack=10/ -e cycles/call-graph=dwarf,max-stack=4/ -e cpu-cycles/call-graph=dwarf,max-stack=1024/ usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.052 MB perf.data (5 samples) ]
    # perf evlist -v
    sched:sched_switch: type: 2, size: 112, config: 0x110, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, sample_max_stack: 10
    cycles/call-graph=dwarf,max-stack=4/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 4
    cpu-cycles/call-graph=dwarf,max-stack=1024/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192, sample_max_stack: 1024
    # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

    Using just /max-stack=N/ means /call-graph=fp,max-stack=N/, that should
    be further configurable by means of some .perfconfig knob.

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Wang Nan
    Cc: Zefan Li
    Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
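
The defaulting rule stated above (a bare max-stack=N behaves like call-graph=fp,max-stack=N) can be expressed as a small sketch. The function name and the `default_mode` parameter, which stands in for the not-yet-existing .perfconfig knob, are assumptions.

```python
def callchain_config(terms, default_mode="fp"):
    """Derive (call-graph mode, max-stack) from a dict of event config terms.

    default_mode is a hypothetical stand-in for a future .perfconfig knob.
    """
    max_stack = terms.get("max-stack")
    mode = terms.get("call-graph")
    if mode is None and max_stack is not None:
        mode = default_mode           # bare max-stack=N implies call-graph=fp
    return mode, max_stack


print(callchain_config({"max-stack": 10}))
print(callchain_config({"call-graph": "dwarf", "max-stack": 4}))
```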
     

21 May, 2016

1 commit

  • Add 'overwrite' attribute to evsel to mark whether this event is
    overwritable. The following commits will support syntax like:

    # perf record -e cycles/overwrite/ ...

    An overwritable evsel requires kernel support for the
    perf_event_attr.write_backward ring buffer feature.

    Add it to perf_missing_feature.

    Signed-off-by: Wang Nan
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1463762315-155689-2-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     

18 Apr, 2016

1 commit


15 Apr, 2016

3 commits

  • Not used at all, nuke it.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-jf2w8ce8nl3wso3vuodg5jci@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • This way the print routine merely does printing, not requiring access to
    the resolving machinery, which helps disentangling the object files and
    easing creating subsets with a limited functionality set.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-2ti2jbra8fypdfawwwm3aee3@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • # perf test -v python
    16: Try 'import perf' in python, checking link problems :
    --- start ---
    test child forked, pid 672
    Traceback (most recent call last):
    File "", line 1, in
    ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
    symbol_conf
    test child finished with -1
    ---- end ----
    Try 'import perf' in python, checking link problems: FAILED!
    #

    To fix it just pass a parameter to perf_evsel__fprintf_sym telling if
    callchains should be printed.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-comrsr20bsnr8bg0n6rfwv12@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Apr, 2016

1 commit


12 Apr, 2016

3 commits

  • The rename is for consistency with the parameter name.

    Make it public for fine grained control of which evsels should have
    callchains enabled, like, for instance, will be done in the next
    changesets in 'perf trace', to enable callchains just on the
    "raw_syscalls:sys_exit" tracepoint.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-og8vup111rn357g4yagus3ao@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Instead receive a callchain_param pointer to configure callchain
    aspects, not doing so if NULL is passed.

    This will allow fine grained control over which evsels in an evlist
    get callchains enabled.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-2mupip6khc92mh5x4nw9to82@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • In 'perf trace' we're just interested in printing callchains, and we
    don't want to use the symbol_conf.use_callchain, so move the callchain
    part to a new method.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-kcn3romzivcpxb3u75s9nz33@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

09 Mar, 2016

1 commit

  • Remove the union in evsel so that the database id and priv pointer can
    be used simultaneously without conflicting and crashing.

    Detailed Description for the fixed bug follows:

    perf script crashes with a segmentation fault on user space tool version
    4.5.rc7.ge2857b when using the python database export API. It works
    properly in 4.4 and prior versions.

    The crash first appeared in:

    cfc8874a4859 ("perf script: Process cpu/threads maps")

    How to reproduce the bug:

    Remove any temporary files left over from a previous crash (if you have
    already attempted to reproduce the bug):

    $ rm -r test_db-perf-data
    $ dropdb test_db

    $ perf record timeout 1 yes >/dev/null
    $ perf script -s scripts/python/export-to-postgresql.py test_db

    Stack Trace:
    Program received signal SIGSEGV, Segmentation fault.
    __GI___libc_free (mem=0x1) at malloc.c:2929
    2929 malloc.c: No such file or directory.
    (gdb) bt
    at util/stat.c:122
    argv=, prefix=) at builtin-script.c:2231
    argc=argc@entry=4, argv=argv@entry=0x7fffffffdf70) at perf.c:390
    at perf.c:451

    Signed-off-by: Chris Phlipot
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Peter Zijlstra
    Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps")
    Link: http://lkml.kernel.org/r/1457500314-8912-1-git-send-email-cphlipot0@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Chris Phlipot
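
Why sharing storage between the database id and the priv pointer is dangerous can be demonstrated with ctypes. The class and field names below only model the situation described in the commit message; the values are made up, and the result noted after the block assumes a little-endian machine.

```python
import ctypes


class EvselUnion(ctypes.Union):
    """Modelled 'before': priv and db_id share the same storage."""
    _fields_ = [("priv", ctypes.c_void_p), ("db_id", ctypes.c_uint64)]


class EvselStruct(ctypes.Structure):
    """Modelled 'after': each member has its own storage."""
    _fields_ = [("priv", ctypes.c_void_p), ("db_id", ctypes.c_uint64)]


def set_both(evsel, priv_addr, db_id):
    """Store a pointer, then a database id, and return the pointer read back."""
    evsel.priv = priv_addr    # e.g. a pointer owned by one subsystem
    evsel.db_id = db_id       # e.g. an id assigned by the database export
    return evsel.priv


print(hex(set_both(EvselUnion(), 0x1000, 1)))
print(hex(set_both(EvselStruct(), 0x1000, 1)))
```

On a little-endian machine the union variant reads the pointer back as 0x1, which is at least consistent with the free(mem=0x1) frame in the stack trace above; the struct variant keeps 0x1000 intact.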
     

23 Feb, 2016

1 commit

  • Commit a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
    adds a helper to enable a BPF program to output data to a perf ring
    buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
    patch enables perf to create events of that type. Now a perf user can
    use the following cmdline to receive output data from BPF programs:

    # perf record -a -e bpf-output/no-inherit,name=evt/ \
    -e ./test_bpf_output.c/map:channel.event=evt/ ls /
    # perf script
    perf 1560 [004] 347747.086295: evt: ffffffff811fd201 sys_write ...
    perf 1560 [004] 347747.086300: evt: ffffffff811fd201 sys_write ...
    perf 1560 [004] 347747.086315: evt: ffffffff811fd201 sys_write ...
    ...

    Test result:

    # cat test_bpf_output.c
    /************************ BEGIN **************************/
    #include
    struct bpf_map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    };

    #define SEC(NAME) __attribute__((section(NAME), used))
    static u64 (*ktime_get_ns)(void) =
    (void *)BPF_FUNC_ktime_get_ns;
    static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
    (void *)BPF_FUNC_trace_printk;
    static int (*get_smp_processor_id)(void) =
    (void *)BPF_FUNC_get_smp_processor_id;
    static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
    (void *)BPF_FUNC_perf_event_output;

    struct bpf_map_def SEC("maps") channel = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = __NR_CPUS__,
    };

    SEC("func_write=sys_write")
    int func_write(void *ctx)
    {
    struct {
    u64 ktime;
    int cpuid;
    } __attribute__((packed)) output_data;
    char error_data[] = "Error: failed to output: %d\n";

    output_data.cpuid = get_smp_processor_id();
    output_data.ktime = ktime_get_ns();
    int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
    &output_data, sizeof(output_data));
    if (err)
    trace_printk(error_data, sizeof(error_data), err);
    return 0;
    }
    char _license[] SEC("license") = "GPL";
    int _version SEC("version") = LINUX_VERSION_CODE;
    /************************ END ***************************/

    # perf record -a -e bpf-output/no-inherit,name=evt/ \
    -e ./test_bpf_output.c/map:channel.event=evt/ ls /
    # perf script | grep ls
    ls 2242 [003] 347851.557563: evt: ffffffff811fd201 sys_write ...
    ls 2242 [003] 347851.557571: evt: ffffffff811fd201 sys_write ...

    Signed-off-by: Wang Nan
    Cc: Adrian Hunter
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: Cody P Schafer
    Cc: He Kuang
    Cc: Jeremie Galarneau
    Cc: Jiri Olsa
    Cc: Kirill Smelkov
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
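
A consumer of the bpf-output event needs to know the layout of the packed output_data struct (a u64 ktime followed by an int cpuid). The sketch below decodes such a payload; it assumes a little-endian host, ignores whatever framing perf adds around the raw data, and uses made-up sample values.

```python
import struct

# Layout of the packed output_data struct: u64 ktime followed by int cpuid.
# ASSUMPTION: little-endian host; perf's own raw-sample framing is ignored.
OUTPUT_DATA = struct.Struct("<Qi")


def decode_output_data(payload: bytes):
    """Return (ktime_ns, cpuid) from the first 12 bytes of a raw payload."""
    ktime, cpuid = OUTPUT_DATA.unpack_from(payload)
    return ktime, cpuid


sample = OUTPUT_DATA.pack(347_747_086_295_000, 4)   # made-up values
print(OUTPUT_DATA.size, decode_output_data(sample))
```

Because the struct is declared packed, there is no padding after cpuid, so the payload is 12 bytes rather than 16.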
     

09 Jan, 2016

1 commit

  • To use dynamic sort keys, it might be good to add an option to see the
    list of field names.

    $ perf evlist -i perf.data.sched
    sched:sched_switch
    sched:sched_stat_wait
    sched:sched_stat_sleep
    sched:sched_stat_iowait
    sched:sched_stat_runtime
    sched:sched_process_fork
    sched:sched_wakeup
    sched:sched_wakeup_new
    sched:sched_migrate_task
    # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

    $ perf evlist -i perf.data.sched --trace-fields
    sched:sched_switch: trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio
    sched:sched_stat_wait: trace_fields: comm,pid,delay
    sched:sched_stat_sleep: trace_fields: comm,pid,delay
    sched:sched_stat_iowait: trace_fields: comm,pid,delay
    sched:sched_stat_runtime: trace_fields: comm,pid,runtime,vruntime
    sched:sched_process_fork: trace_fields: parent_comm,parent_pid,child_comm,child_pid
    sched:sched_wakeup: trace_fields: comm,pid,prio,success,target_cpu
    sched:sched_wakeup_new: trace_fields: comm,pid,prio,success,target_cpu
    sched:sched_migrate_task: trace_fields: comm,pid,prio,orig_cpu,dest_cpu

    Committer notes:

    For another file, in verbose mode:

    # perf evlist -v --trace-fields
    sched:sched_switch: type: 2, size: 112, config: 0x10b, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, trace_fields: prev_comm,prev_pid,prev_prio,prev_state,next_comm,next_pid,next_prio
    #

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1452125549-1511-5-git-send-email-namhyung@kernel.org
    [ Replaced 'trace_fields=' with 'trace_fields: ' to make the output consistent in -v mode ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
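
For scripts that want to pick sort keys automatically, the --trace-fields output shown above is easy to parse. The helper below is a sketch that relies only on the 'trace_fields: ' text visible in the sample output; its name is invented for the example.

```python
def parse_trace_fields(line):
    """Split one '--trace-fields' output line into (event, [fields]).

    Returns None for lines without a 'trace_fields: ' section, such as
    the '# Tip:' line.
    """
    head, sep, tail = line.partition("trace_fields: ")
    if not sep:
        return None
    event = head.split(": ", 1)[0]    # event name ends at the first ': '
    return event, tail.strip().split(",")


print(parse_trace_fields("sched:sched_stat_wait: trace_fields: comm,pid,delay"))
```

The same function also handles the verbose (-v) form, because the event name still ends at the first ': ' and the field list still follows 'trace_fields: '.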
     

08 Dec, 2015

2 commits

  • Add the perf_evsel__disable function as a complement to the
    perf_evsel__enable function. Both will be used in the following patch to
    factor perf_evlist__(enable|disable).

    Signed-off-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1449133606-14429-3-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • All events now share proper cpu and thread maps. There's no need to pass
    those maps from evlist; it's safe to use evsel maps for enabling events.

    Signed-off-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1449133606-14429-2-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

30 Oct, 2015

1 commit

  • This is the final patch which makes basic BPF filter work. After
    applying this patch, users are allowed to use BPF filter like:

    # perf record --event ./hello_world.o ls

    A bpf_fd field is appended to 'struct evsel', and set up during the
    callback function add_bpf_event() for each 'probe_trace_event'.

    PERF_EVENT_IOC_SET_BPF ioctl is used to attach eBPF program to a newly
    created perf event. The file descriptor of the eBPF program is passed to
    perf record using previous patches, and stored into evsel->bpf_fd.

    It is possible that different perf events are created for one kprobe
    event for different CPUs. In this case, when trying to call the ioctl,
    EEXIST will be returned. This patch doesn't treat it as an error.

    Committer note:

    The bpf proggie used so far:

    __attribute__((section("fork=_do_fork"), used))
    int fork(void *ctx)
    {
    return 0;
    }

    char _license[] __attribute__((section("license"), used)) = "GPL";
    int _version __attribute__((section("version"), used)) = 0x40300;

    failed to produce any samples, even with forks happening and it
    running in system wide mode.

    That is because now the filter is being associated, and the code above
    always returns zero, meaning that all forks will be probed but filtered
    away ;-/

    Change it to 'return 1;' instead and after that:

    # trace --no-syscalls --event /tmp/foo.o
    0.000 perf_bpf_probe:fork:(ffffffff8109be30))
    2.333 perf_bpf_probe:fork:(ffffffff8109be30))
    3.725 perf_bpf_probe:fork:(ffffffff8109be30))
    4.550 perf_bpf_probe:fork:(ffffffff8109be30))
    ^C#

    And it works with all tools, including 'perf trace'.

    Signed-off-by: Wang Nan
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: Daniel Borkmann
    Cc: David Ahern
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Kaixu Xia
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1444826502-49291-8-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
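
Two behaviours from this commit can be modelled briefly: tolerating EEXIST while attaching the program to several events, and the filter semantics from the committer note (a return value of zero drops the sample). `set_bpf` and `fake_set_bpf` are hypothetical stand-ins for the PERF_EVENT_IOC_SET_BPF ioctl, and the fd numbers are made up.

```python
import errno


def attach_bpf(event_fds, set_bpf):
    """Attach a BPF program to every event fd, tolerating EEXIST.

    set_bpf is a hypothetical stand-in for the PERF_EVENT_IOC_SET_BPF ioctl.
    Returns the number of fds on which the attach call itself succeeded.
    """
    attached = 0
    for fd in event_fds:
        try:
            set_bpf(fd)
            attached += 1
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise                 # any other failure is still an error
    return attached


def keep_sample(filter_result):
    """A BPF filter returning 0 drops the sample; non-zero keeps it."""
    return filter_result != 0


already_attached = {11}               # made-up fd that already has the program


def fake_set_bpf(fd):
    if fd in already_attached:
        raise OSError(errno.EEXIST, "File exists")


print(attach_bpf([10, 11, 12], fake_set_bpf))
print(keep_sample(0), keep_sample(1))
```

This is why the committer's first test program, which always returned 0, produced no samples until it was changed to return 1.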
     

28 Oct, 2015

2 commits

  • This patch allows perf record to set an event's attr.inherit bit
    through config terms like:

    # perf record -e cycles/no-inherit/ ...
    # perf record -e cycles/inherit/ ...

    So the user can control the inherit bit for each event separately.

    In the following example, a.out fork()s in main and then does some
    complex CPU-intensive computation in both of its children.

    Basic result with and without inherit:

    # perf record -e cycles -e instructions ./a.out
    [ perf record: Woken up 9 times to write data ]
    [ perf record: Captured and wrote 2.205 MB perf.data (47920 samples) ]
    # perf report --stdio
    # ...
    # Samples: 23K of event 'cycles'
    # Event count (approx.): 23641752891
    ...
    # Samples: 24K of event 'instructions'
    # Event count (approx.): 30428312415

    # perf record -i -e cycles -e instructions ./a.out
    [ perf record: Woken up 5 times to write data ]
    [ perf record: Captured and wrote 1.111 MB perf.data (24019 samples) ]
    ...
    # Samples: 12K of event 'cycles'
    # Event count (approx.): 11699501775
    ...
    # Samples: 12K of event 'instructions'
    # Event count (approx.): 15058023559

    Cancel inherit for one event when it is globally enabled:

    # perf record -e cycles/no-inherit/ -e instructions ./a.out
    [ perf record: Woken up 7 times to write data ]
    [ perf record: Captured and wrote 1.660 MB perf.data (36004 samples) ]
    ...
    # Samples: 12K of event 'cycles/no-inherit/'
    # Event count (approx.): 11895759282
    ...
    # Samples: 24K of event 'instructions'
    # Event count (approx.): 30668000441

    Enable inherit for one event when it is globally disabled:

    # perf record -i -e cycles/inherit/ -e instructions ./a.out
    [ perf record: Woken up 7 times to write data ]
    [ perf record: Captured and wrote 1.654 MB perf.data (35868 samples) ]
    ...
    # Samples: 23K of event 'cycles/inherit/'
    # Event count (approx.): 23285400229
    ...
    # Samples: 11K of event 'instructions'
    # Event count (approx.): 14969050259
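
    The four runs above suggest a simple precedence rule: a per-event term,
    when present, wins over the global -i setting. The following is a
    minimal sketch of that rule only, not the code from this patch. The
    enum inherit_term type and the resolve_inherit() function are
    hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-event term state; these are not perf's own types. */
enum inherit_term {
	INHERIT_TERM_UNSET,	/* no term given: follow the global setting */
	INHERIT_TERM_ON,	/* event written as cycles/inherit/ */
	INHERIT_TERM_OFF,	/* event written as cycles/no-inherit/ */
};

/*
 * Return the value that would end up in attr.inherit for one event.
 * global_no_inherit mirrors -i / --no-inherit on the command line.
 */
bool resolve_inherit(bool global_no_inherit, enum inherit_term term)
{
	if (term == INHERIT_TERM_ON)
		return true;
	if (term == INHERIT_TERM_OFF)
		return false;

	/* Only the "no term" state should reach this point. */
	assert(term == INHERIT_TERM_UNSET);
	return !global_no_inherit;
}
```

    Under these assumptions, resolve_inherit(true, INHERIT_TERM_ON)
    corresponds to 'perf record -i -e cycles/inherit/', where only the
    cycles event keeps counting in the children.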

    Committer note:

    One can check if the bit was set, in addition to seeing the result in
    the perf.data file size as above, by doing one of:

    # perf record -e cycles -e instructions -a usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.911 MB perf.data (63 samples) ]
    # perf evlist -v
    cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
    instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
    #

    So, the inherit bit was set in both. Now, if we disable it globally using
    --no-inherit:

    # perf record --no-inherit -e cycles -e instructions -a usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.910 MB perf.data (56 samples) ]
    # perf evlist -v
    cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
    instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1

    No inherit bit is set. Now, disabling it globally and setting it just on the cycles event:

    # perf record --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.909 MB perf.data (48 samples) ]
    # perf evlist -v
    cycles/inherit/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
    instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
    #

    We can see it as well by using a more verbose level of debug messages in
    the tool that sets up the perf_event_attr, 'perf record' in this case:

    [root@zoo ~]# perf record -vv --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
    ------------------------------------------------------------
    perf_event_attr:
    size 112
    { sample_period, sample_freq } 4000
    sample_type IP|TID|TIME|ID|CPU|PERIOD
    read_format ID
    disabled 1
    inherit 1
    mmap 1
    comm 1
    freq 1
    task 1
    sample_id_all 1
    exclude_guest 1
    mmap2 1
    comm_exec 1
    ------------------------------------------------------------
    sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
    sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
    sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8
    sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8
    ------------------------------------------------------------
    perf_event_attr:
    size 112
    config 0x1
    { sample_period, sample_freq } 4000
    sample_type IP|TID|TIME|ID|CPU|PERIOD
    read_format ID
    disabled 1
    freq 1
    sample_id_all 1
    exclude_guest 1
    ------------------------------------------------------------
    sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8

    Signed-off-by: Wang Nan
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: David S. Miller
    Cc: Li Zefan
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1446029705-199659-2-git-send-email-wangnan0@huawei.com
    [ s/u64/bool/ for the perf_evsel_config_term inherit field - jolsa]
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     
  • Because the 'perf stat record' patches will use the id_offset member
    together with the priv pointer.

    Signed-off-by: Jiri Olsa
    Tested-by: Kan Liang
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1445784728-21732-29-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

06 Oct, 2015

1 commit

  • The 'P' modifier will cause the event to get the maximum precise level
    that can be detected.

    The following record:
    $ perf record -e cycles:P ...

    will detect the maximum precise level for the 'cycles' event and use it.
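
    One way such detection can work is to start at the highest value of
    the 2-bit precise_ip field and step down until the event can be
    opened. The sketch below assumes that approach and is not taken from
    this patch. The probe callback, detect_max_precise_ip() and
    probe_up_to() are hypothetical names, and the probe stands in for a
    real attempt to open the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PRECISE_IP 3	/* precise_ip is a 2-bit field: 0..3 */

/* Hypothetical probe: returns true if the event opens at this level. */
typedef bool (*precise_probe_fn)(int precise_ip, void *ctx);

/*
 * Walk down from the highest precise level until the probe accepts one.
 * Returns the accepted level, or 0 if no precise level is accepted.
 */
int detect_max_precise_ip(precise_probe_fn probe, void *ctx)
{
	int level;

	assert(probe != NULL);
	for (level = MAX_PRECISE_IP; level > 0; level--) {
		if (probe(level, ctx))
			return level;
	}
	return 0;
}

/* Example probe: pretends the hardware supports levels up to *ctx. */
bool probe_up_to(int precise_ip, void *ctx)
{
	const int *supported = ctx;

	return precise_ip <= *supported;
}
```

    With a probe that accepts levels up to 2, detect_max_precise_ip()
    returns 2, the same value as the precise_ip field in the committer's
    'perf evlist -v' output for this commit.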

    Committer note:

    Testing it:

    $ perf record -e cycles:P usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.013 MB perf.data (9 samples) ]
    $ perf evlist
    cycles:P
    $ perf evlist -v
    cycles:P: size: 112, { sample_period, sample_freq }: 4000, sample_type:
    IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
    enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, mmap2: 1,
    comm_exec: 1
    $

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Kan Liang
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1444068369-20978-6-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

16 Sep, 2015

1 commit


15 Sep, 2015

2 commits

  • perf_evlist__propagate_maps() cannot easily tell if an evsel has its own
    cpu map. To make that simpler, keep a copy of the PMU cpu map and
    adjust the propagation logic accordingly.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: Kan Liang
    Link: http://lkml.kernel.org/r/1441699142-18905-8-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Propagate error info from tp_format via ERR_PTR to get it all the way
    down to the parse-events.c tracepoint adding routines. The following
    functions now return a pointer with an encoded error:

    - tp_format
    - trace_event__tp_format
    - perf_evsel__newtp_idx
    - perf_evsel__newtp

    This affects several other places in perf that can no longer use a plain
    pointer check, but must use the err.h interface when getting error
    information from the functions listed above.
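
    The reason a plain pointer check stops working is that a failing call
    now returns a non-NULL pointer that carries a negative errno value. A
    minimal userspace sketch of the pattern follows. The three helpers are
    stand-ins modelled on the kernel-style err.h interface, and
    fake_tp_format() with struct fake_format is a hypothetical function
    used only for illustration, not one of the functions listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup standing in for a tp_format()-like function. */
struct fake_format {
	int id;
};

struct fake_format *fake_tp_format(const char *sys, const char *name)
{
	static struct fake_format fmt = { .id = 1 };

	if (sys == NULL || name == NULL)
		return ERR_PTR(-EINVAL);
	if (name[0] == '\0')
		return ERR_PTR(-ENOENT);
	return &fmt;
}
```

    A caller that used to test 'if (!ptr)' would treat the error pointer
    as a valid result, so under this scheme it checks IS_ERR(ptr) and
    reads the cause with PTR_ERR(ptr).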

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Matt Fleming
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Raphael Beamonte
    Link: http://lkml.kernel.org/r/1441615087-13886-5-git-send-email-jolsa@kernel.org
    [ Add two missing ERR_PTR() and one IS_ERR() ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

14 Sep, 2015

1 commit