30 Oct, 2015

1 commit

  • Although a previous patch allows setting BPF compiler related options in
    perfconfig, some ad-hoc situations still require passing options through
    the cmdline. This patch introduces 2 options to 'perf record' for this
    purpose: --clang-path and --clang-opt.

    Signed-off-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Brendan Gregg
    Cc: Daniel Borkmann
    Cc: David Ahern
    Cc: He Kuang
    Cc: Jiri Olsa
    Cc: Kaixu Xia
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1444826502-49291-9-git-send-email-wangnan0@huawei.com
    [ Add the new options to the 'record' man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
     

27 Oct, 2015

1 commit

  • usage_with_options() now sets up a pager before printing its message, so
    output from a normal printf() or pr_err() will not be shown. The
    usage_with_options_msg() function can be used to print a help message
    before the usage strings.

    Signed-off-by: Namhyung Kim
    Acked-by: Masami Hiramatsu
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

23 Oct, 2015

2 commits

  • The --call-graph option is complex, so we should provide a better guide
    for users. Also change the help message to be consistent with the config
    option names. Now perf top will show help like below:

    $ perf top --call-graph
    Error: option `call-graph' requires a value

    Usage: perf top []

    --call-graph
    setup and enables call-graph (stack chain/backtrace):

    record_mode: call graph recording mode (fp|dwarf|lbr)
    record_size: if record_mode is 'dwarf', max size of stack recording ()
    default: 8192 (bytes)
    print_type: call graph printing style (graph|flat|fractal|none)
    threshold: minimum call graph inclusion threshold ()
    print_limit: maximum number of call graph entry ()
    order: call graph order (caller|callee)
    sort_key: call graph sort key (function|address)
    branch: include last branch info to call graph (branch)

    Default: fp,graph,0.5,caller,function

    Requested-by: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Acked-by: Frederic Weisbecker
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Brendan Gregg
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • These messages will be used by 'perf top' in the next patch.

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Brendan Gregg
    Cc: Chandler Carruth
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1445495330-25416-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

03 Oct, 2015

1 commit

  • When running "perf record -e", the number of samples shown is wrong on
    some 32 bit systems, e.g. powerpc and arm.

    For example, run the below commands on 32 bit powerpc:

    perf probe -x /lib/libc.so.6 malloc
    perf record -e probe_libc:malloc -a ls perf.data
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.036 MB perf.data (13829241621624967218 samples) ]

    Actually, "perf script" shows just 21 samples. The reported number is
    absurd because samples has type long but is printed with PRIu64.

    Build test ran on x86-64, x86, aarch64, arm, mips, ppc and ppc64.

    Signed-off-by: Yang Shi
    Cc: linaro-kernel@lists.linaro.org
    Link: http://lkml.kernel.org/r/1443563383-4064-1-git-send-email-yang.shi@linaro.org
    [ Bumped the 'hits' var used together with record.samples to 'unsigned long long' too ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Yang Shi
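
The format mismatch described in the commit above can be illustrated with a minimal C sketch. The helper name format_samples() is hypothetical and is not a perf function; it only shows one way to keep the argument width in step with the PRIu64 conversion (the committer note indicates the actual patch widened the variables instead).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of perf): format a sample count.
 * The counter is converted to uint64_t so that the argument always
 * matches the PRIu64 conversion, whatever the width of 'long' is on
 * the target. Passing a 32-bit long straight to PRIu64 is undefined
 * behaviour and can print garbage like the number quoted above.
 */
static int format_samples(char *buf, size_t len, unsigned long samples)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "(%" PRIu64 " samples)", (uint64_t)samples);
}
```

Declaring the counter itself as a 64-bit type removes the need for the cast at every call site.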
     

01 Oct, 2015

1 commit

  • A previous patch added a synthesized comm event for the forked child
    process, but it missed that the event should contain an area for
    sample_id_hdr at the end. It worked by accident since the perf_event
    union contains bigger event structs like mmap_events.

    This patch fixes it by dynamically allocating the event struct including
    that area, like in perf_event__synthesize_thread_map().

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1443577526-3240-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

23 Sep, 2015

1 commit

  • When perf creates a new child to profile, the events are enabled on
    exec(). And in this case, it doesn't synthesize any event for the
    child since they'll be generated during exec(). But there's a window
    between the enabling and the event generation.

    It used to be overcome since samples are only in kernel (so we always
    have the map) and the comm is overridden by a later COMM event.
    However it won't work if events are processed and displayed before the
    COMM event overrides like in 'perf script'. This leads to those early
    samples (like native_write_msr_safe) not having a comm but pid (like
    ':15328').

    So it needs to synthesize COMM event for the child explicitly before
    enabling so that it can have a correct comm. But at this time, the
    comm will be "perf" since it's not exec-ed yet.

    Committer note:

    Before this patch:

    # perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    # perf script --show-task-events
    :4429 4429 27909.079372: 1 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    :4429 4429 27909.079375: 1 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    :4429 4429 27909.079376: 10 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    :4429 4429 27909.079377: 223 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    :4429 4429 27909.079378: 6571 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    usleep 4429 27909.079380: PERF_RECORD_COMM exec: usleep:4429/4429
    usleep 4429 27909.079381: 185403 cycles: ffffffff810a72d3 flush_signal_handlers (/lib/modules/4.
    usleep 4429 27909.079444: 2241110 cycles: 7fc575355be3 _dl_start (/usr/lib64/ld-2.20.so)
    usleep 4429 27909.079875: PERF_RECORD_EXIT(4429:4429):(4429:4429)

    After:

    # perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    # perf script --show-task
    perf 0 0.000000: PERF_RECORD_COMM: perf:8446/8446
    perf 8446 30154.038944: 1 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    perf 8446 30154.038948: 1 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    perf 8446 30154.038949: 9 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    perf 8446 30154.038950: 230 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    perf 8446 30154.038951: 6772 cycles: ffffffff8105f45a native_write_msr_safe (/lib/modules/4.
    usleep 8446 30154.038952: PERF_RECORD_COMM exec: usleep:8446/8446
    usleep 8446 30154.038954: 196923 cycles: ffffffff81766440 _raw_spin_lock (/lib/modules/4.3.0-rc1
    usleep 8446 30154.039021: 2292130 cycles: 7f609a173dc4 memcpy (/usr/lib64/ld-2.20.so)
    usleep 8446 30154.039349: PERF_RECORD_EXIT(8446:8446):(8446:8446)
    #

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1442881495-2928-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

01 Sep, 2015

2 commits

  • This patch modifies the -I/--intr-regs option to enable passing the names
    of the registers to sample on interrupt. Registers can be specified by
    their symbolic names. For instance on x86, --intr-regs=ax,si.

    The motivation is to reduce the size of the perf.data file and the
    overhead of sampling by only collecting the registers useful to a
    specific analysis. For instance, for value profiling, sampling only the
    registers used to pass arguments to functions.

    With no parameter, the --intr-regs still records all possible registers
    based on the architecture.

    To name registers, it is necessary to use the long form of the option,
    i.e., --intr-regs:

    $ perf record --intr-regs=si,di,r8,r9 .....

    To record any possible registers:

    $ perf record -I .....
    $ perf report --intr-regs ...

    To display the register, one can use perf report -D

    To list the available registers:

    $ perf record --intr-regs=\?
    available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15

    Signed-off-by: Stephane Eranian
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     
  • An evsel may have different cpus and threads than the evlist it is in.

    Use its own cpus and threads when opening the evsel in 'perf record'.

    Signed-off-by: Kan Liang
    Cc: Jiri Olsa
    Link: http://lkml.kernel.org/r/1440138194-17001-1-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
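
The register-name parsing described for --intr-regs in the first commit of this date can be sketched in C as follows. This is only an illustration under stated assumptions: the table reg_names and the function parse_regs() are hypothetical, the table covers just the first eight names from the listing, and the bit positions are plain table indexes rather than the kernel's register numbering.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of names; bit i of the mask stands for reg_names[i]. */
static const char *const reg_names[] = {
    "AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP"
};

/* Case-insensitive match of a token of length len against an upper-case name. */
static int name_eq(const char *tok, size_t len, const char *name)
{
    size_t i;

    if (strlen(name) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (toupper((unsigned char)tok[i]) != name[i])
            return 0;
    return 1;
}

/*
 * Turn a comma-separated list such as "ax,si" into a bit mask.
 * Returns 0 on success, -1 if a name is unknown or empty.
 */
static int parse_regs(const char *list, uint64_t *mask)
{
    const size_t n = sizeof(reg_names) / sizeof(reg_names[0]);

    assert(list != NULL && mask != NULL);
    *mask = 0;
    while (*list) {
        size_t len = strcspn(list, ",");
        size_t i;

        for (i = 0; i < n; i++)
            if (name_eq(list, len, reg_names[i]))
                break;
        if (i == n)
            return -1;
        *mask |= UINT64_C(1) << i;
        list += len;
        if (*list == ',')
            list++;
    }
    return 0;
}
```

Collecting only the selected bits is what lets the sample records shrink: each unset bit is one register value that does not have to be written per sample.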
     

20 Aug, 2015

2 commits


06 Aug, 2015

1 commit

  • Pass global callchain_param into parse_callchain_record_opt and
    perf_evsel__config_callgraph as parameter. So we can reuse these
    functions to parse/config local param for callchain.

    Signed-off-by: Kan Liang
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1438677022-34296-3-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

30 Jul, 2015

1 commit


24 Jul, 2015

1 commit


21 Jul, 2015

1 commit

  • This patch allows 'perf record' to exclude events issued by perf itself
    by '--exclude-perf' option.

    Before this patch, when doing something like:

    # perf record -a -e syscalls:sys_enter_write

    One could easily get result like this:

    # /tmp/perf report --stdio
    ...
    # Overhead Command Shared Object Symbol
    # ........ ....... .................. ....................
    #
    99.99% perf libpthread-2.18.so [.] __write_nocancel
    0.01% ls libc-2.18.so [.] write
    0.01% sshd libc-2.18.so [.] write
    ...

    Where most events are generated by perf itself.

    A shell trick can be done to filter perf itself out:

    # cat << EOF > ./tmp
    > #!/bin/sh
    > exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
    > EOF
    # chmod a+x ./tmp
    # ./tmp

    However, doing so is user unfriendly.

    This patch extracts evsel iteration framework introduced by patch 'perf
    record: Apply filter to all events in a glob matching' into
    foreach_evsel_in_last_glob(), and makes exclude_perf() function append
    new filter expression to each evsel selected by a '-e' selector.

    To avoid losing filters if the user passes '--filter' after
    '--exclude-perf', this patch uses perf_evsel__append_filter() in both
    cases, instead of perf_evsel__set_filter(), which removes the old filter.
    As a side effect, it is now possible to use multiple '--filter' options
    for one selector. They are combined with '&&'.

    Signed-off-by: Wang Nan
    Cc: Andi Kleen
    Cc: Brendan Gregg
    Cc: Steven Rostedt
    Cc: Zefan Li
    Cc: pi3orama@163.com
    Link: http://lkml.kernel.org/r/1436513770-8896-2-git-send-email-wangnan0@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Wang Nan
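
The difference between setting and appending a filter can be sketched as below. The function append_filter() is a hypothetical stand-in written for illustration, not the perf implementation; the exact string format used to join the expressions is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical append-style filter setter: the first call stores a copy
 * of the expression, later calls AND the new expression onto the old one
 * instead of replacing it.
 * Returns 0 on success, -1 on allocation failure (*filter is unchanged).
 */
static int append_filter(char **filter, const char *expr)
{
    char *combined;
    size_t len;

    assert(filter != NULL && expr != NULL);

    if (*filter == NULL) {
        len = strlen(expr) + 1;
        combined = malloc(len);
        if (combined == NULL)
            return -1;
        memcpy(combined, expr, len);
        *filter = combined;
        return 0;
    }

    /* Both expressions plus the 8 joining characters and the NUL. */
    len = strlen(*filter) + strlen(expr) + sizeof("() && ()");
    combined = malloc(len);
    if (combined == NULL)
        return -1;
    snprintf(combined, len, "(%s) && (%s)", *filter, expr);
    free(*filter);
    *filter = combined;
    return 0;
}
```

With a set-style function, the second call would silently discard whatever an earlier option installed, which is the loss the commit message describes.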
     

06 Jul, 2015

1 commit

  • If the option -T is used with option --per-thread, then time is still
    not sampled. Fix that by using OPT_BOOLEAN_SET to distinguish when the
    user used the -T option as opposed to the default case when timestamps
    are enabled but only for per-cpu recording.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1436183461-1918-1-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

20 Jun, 2015

1 commit

  • The timeout that limits the processing of an individual proc map was
    hard-coded to 500ms. This patch introduces a new option,
    --proc-map-timeout, to make the time limit configurable.

    Signed-off-by: Kan Liang
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Ying Huang
    Link: http://lkml.kernel.org/r/1434549071-25611-2-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

10 Jun, 2015

1 commit

  • Because there's too many options and I cannot read, I frequently get
    confused between -c and -P, and try to do things like:

    perf record -P 50000 -- foo

    Which does not work; try and make the option description slightly longer
    and hopefully less confusing.

    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/20150610144850.GP19282@twins.programming.kicks-ass.net
    [ Do those changes on the man page as well ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Peter Zijlstra
     

08 Jun, 2015

1 commit

  • The size of perf.data is not updated in no-buildid mode, which gives a
    wrong result in the output.

    Before this patch:

    $ perf.perf record -B -e syscalls:sys_enter_open uname
    Linux
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.000 MB perf.data ]

    After this patch:

    $ perf.perf record -B -e syscalls:sys_enter_open uname
    Linux
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.001 MB perf.data ]

    Signed-off-by: He Kuang
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1432819050-30511-1-git-send-email-hekuang@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    He Kuang
     

28 May, 2015

1 commit

  • .. to allow sharing between builtin-record and builtin-top later. No
    code changes, just moved code.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1432749114-904-9-git-send-email-andi@firstfloor.org
    [ Rename too generic branch.[ch] name to parse-branch-options.[ch] ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

06 May, 2015

2 commits

  • Add a new option and support for Instruction Tracing Snapshot Mode.
    When the new option is selected, no AUX area tracing data is captured
    until a signal (SIGUSR2) is received.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1430404667-10593-10-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Add build option NO_AUXTRACE to exclude compiling support for AUX area
    tracing. Support for both recording and processing is excluded and by
    implication any future additions such as Intel PT and Intel BTS will
    also not be compiled in with this option.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1430404667-10593-5-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

05 May, 2015

2 commits

  • We need to include all buildids when a perf.data file contains AUX area
    tracing data because we do not decode the trace for that purpose because
    it would take too long.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1430404667-10593-4-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Add an index of AUX area tracing events within a perf.data file.

    perf record uses a special user event PERF_RECORD_FINISHED_ROUND to
    enable sorting of events in chunks instead of having to sort all events
    altogether.

    AUX area tracing events contain data that can span back to the very
    beginning of the recording period. i.e. they do not obey the rules of
    PERF_RECORD_FINISHED_ROUND.

    By adding an index, AUX area tracing events can be found in advance and
    the PERF_RECORD_FINISHED_ROUND approach works as usual.

    The index is recorded with the auxtrace feature in the perf.data file.
    A session reads the index but does not process it. An AUX area decoder
    can queue all the AUX area data in advance using
    auxtrace_queues__process_index() or otherwise process the index in some
    custom manner.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1430404667-10593-3-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

29 Apr, 2015

2 commits

  • Extend the -m option so that the number of mmap pages for AUX area
    tracing can be specified by adding a comma followed by the number of
    pages.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1428594864-29309-7-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Amend the perf record tool to read the AUX area tracing mmap and
    synthesize AUX area tracing events.

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1428594864-29309-6-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
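
The "pages, comma, AUX pages" form of the -m argument described in the first commit of this date could be parsed along these lines. This is a sketch under assumptions: parse_mmap_pages() is a hypothetical name, and only plain decimal page counts are handled, which may be narrower than what the real option accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical parser for "pages[,aux_pages]".
 * aux_pages is set to 0 when the comma part is absent.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_mmap_pages(const char *arg, unsigned long *pages,
                            unsigned long *aux_pages)
{
    const char *aux;
    char *end;

    assert(arg != NULL && pages != NULL && aux_pages != NULL);

    *aux_pages = 0;
    errno = 0;
    *pages = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg)
        return -1;
    if (*end == '\0')
        return 0;               /* no AUX part given */
    if (*end != ',')
        return -1;

    aux = end + 1;
    errno = 0;
    *aux_pages = strtoul(aux, &end, 10);
    if (errno != 0 || end == aux || *end != '\0')
        return -1;
    return 0;
}
```

Keeping the first number optional-free and the second number optional preserves the meaning of existing command lines that pass a single value.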
     

10 Apr, 2015

1 commit

  • The data_head and data_tail fields are defined as __u64 in
    linux/perf_event.h, but perf userspace uses int and unsigned int.

    Convert all references to u64 for consistency.

    Signed-off-by: David Ahern
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1428420037-26599-1-git-send-email-dsahern@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
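
Why the width of the head and tail variables matters can be shown with a small C sketch. The helper names are hypothetical; the sketch assumes free-running 64-bit counters and a power-of-two buffer size, which is the usual ring buffer arrangement but is stated here as an assumption rather than taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a free-running position inside a power-of-two sized buffer. */
static uint64_t ring_offset(uint64_t pos, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return pos & (size - 1);
}

/* Bytes written by the producer and not yet consumed. */
static uint64_t ring_pending(uint64_t head, uint64_t tail)
{
    return head - tail;
}

/* What a 32-bit variable would hold: the upper half of the counter is lost. */
static uint32_t truncate_head(uint64_t head)
{
    return (uint32_t)head;
}
```

Once the counter passes 4 GiB, the truncated copy no longer equals the value the kernel published, so any code that compares it against, or writes it back as, a 64-bit field works with the wrong position.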
     

08 Apr, 2015

1 commit

  • Teach perf-record about the new perf_event_attr::{use_clockid, clockid}
    fields. Add a simple parameter to set the clock (if any) to be used for
    the events to be recorded into the data file.

    Since we store the entire perf_event_attr in the EVENT_DESC section we
    also already store the used clockid in the data file.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: David Ahern
    Cc: "H. Peter Anvin"
    Cc: Adrian Hunter
    Cc: Andrew Morton
    Cc: Jiri Olsa
    Cc: John Stultz
    Cc: Linus Torvalds
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Yunlong Song
    Link: http://lkml.kernel.org/r/20150407154851.GR23123@twins.programming.kicks-ass.net
    [ Conditionally define CLOCK_BOOTTIME, at least rhel6 doesn't have it - dsahern
    Ditto for CLOCK_MONOTONIC_RAW, sles11sp2 doesn't have it - yunlong.song ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Peter Zijlstra
     

26 Mar, 2015

1 commit

  • Use of a bad filter currently generates the message:
    Error: failed to set filter with 22 (Invalid argument)

    Add the event name to make it clear to which event the filter
    failed to apply:
    Error: Failed to set filter "foo" on event sched:sg_lb_stats: 22: Invalid argument

    To test it use something like:

    # perf record -e sched:sched_switch -e sched:*fork --filter parent_pid==1 -e sched:*wait* --filter bla usleep 1
    Error: failed to set filter "bla" on event sched:sched_stat_iowait with 22 (Invalid argument)
    #

    Based-on-a-patch-by: David Ahern
    Acked-by: David Ahern
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-d7gq2fjvaecozp9o2i0siifu@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

11 Mar, 2015

1 commit

  • By keeping pointers to machines, evlist and tool in ordered_events.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0c6huyaf59mqtm2ek9pmposl@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

25 Feb, 2015

1 commit

  • Add an option to perf record to record running/enabled time for read
    events, similar to what stat does.

    This is useful to understand multiplexing problems.

    Right now the report support is not great, but at least report -D
    already supports it.

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1424819620-16043-1-git-send-email-andi@firstfloor.org
    [ Fixed the Documentation entry to match the OPT_BOOLEAN one ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

19 Feb, 2015

1 commit

  • Currently, there are two call chain recording options, fp and dwarf.

    Haswell has a new feature that utilizes the existing LBR facility to
    record call chains. Kernel side LBR support code provides this as a
    third option to record call chains. This patch enables the lbr call
    stack support on the tooling side.

    LBR call stack has some limitations:

    - It reuses current LBR facility, so LBR call stack and branch record
    can not be enabled at the same time.

    - It is only available for user-space callchains.

    However, it also offers some advantages:

    - LBR call stack can work on user apps which don't have frame-pointers
    or dwarf debug info compiled. It is a good alternative when nothing
    else works.

    Tested-by: Jiri Olsa
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Adrian Hunter
    Cc: Anshuman Khandual
    Cc: Arnaldo Carvalho de Melo
    Cc: Cody P Schafer
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Jacob Shin
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masanari Iida
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Rodrigo Campos
    Cc: Stephane Eranian
    Cc: Sukadev Bhattiprolu
    Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     

30 Jan, 2015

3 commits

  • Do not reference file->fd directly since we want to hide the
    implementation details from outside for possible future changes.

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1422518843-25818-8-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • After perf record finishes, it prints file size and number of samples in
    the file but this info is wrong since it assumes typical sample size of
    24 bytes and divides file size by the value.

    However as we post-process recorded samples for build-id, it can show
    correct number like below. If build-id post-processing is not requested
    just omit the wrong number of samples.

    $ perf record noploop 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.159 MB perf.data (3989 samples) ]

    $ perf report --stdio -n
    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Samples: 3K of event 'cycles'
    # Event count (approx.): 3771330663
    #
    # Overhead Samples Command Shared Object Symbol
    # ........ ............ ....... ................ ..........................
    #
    99.90% 3982 noploop noploop [.] main
    0.09% 1 noploop ld-2.17.so [.] _dl_check_map_versions
    0.01% 1 noploop [kernel.vmlinux] [k] setup_arg_pages
    0.00% 5 noploop [kernel.vmlinux] [k] intel_pmu_enable_all

    Reported-by: Milian Wolff
    Signed-off-by: Namhyung Kim
    Reviewed-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1422518843-25818-4-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • It's only used by perf record to process build-ids, because the file size
    is not fixed at this time due to the remaining header features.

    However, the data offset and size are available, so we can use
    perf_session__process_events() once we set the file size to the current
    offset.

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1422518843-25818-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

16 Nov, 2014

1 commit

  • Add -I/--intr-regs option to capture machine state registers at
    interrupt.

    Add the corresponding man page description

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1411559322-16548-6-git-send-email-eranian@google.com
    Cc: cebbert.lkml@gmail.com
    Cc: Adrian Hunter
    Cc: Anshuman Khandual
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Masanari Iida
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

05 Nov, 2014

1 commit

  • When perf record finishes a session, it pre-processes samples in order
    to write build-id info from DSOs that had samples.

    During this process it'll call map__load() for the kernel map, and it
    ends up calling dso__load_vmlinux_path() which replaces dso->long_name.

    But this function checks kernel's build-id before searching vmlinux path
    so it'll end up with a cryptic name, the pathname for the entry in the
    ~/.debug cache, which can be confusing to users.

    This patch adds a flag to skip the build-id check during record, so
    that it'll have the original vmlinux path for the kernel dso->long_name,
    not the entry in the ~/.debug cache.

    Before:
    # perf record -va sleep 3
    mmap size 528384B
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.196 MB perf.data (~8545 samples) ]
    Looking at the vmlinux_path (7 entries long)
    Using /home/namhyung/.debug/.build-id/f0/6e17aa50adf4d00b88925e03775de107611551 for symbols

    After:
    # perf record -va sleep 3
    mmap size 528384B
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.193 MB perf.data (~8432 samples) ]
    Looking at the vmlinux_path (7 entries long)
    Using /lib/modules/3.16.4-1-ARCH/build/vmlinux for symbols

    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1415063674-17206-7-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

29 Oct, 2014

1 commit

  • Those are shared with other builtin commands like kvm and script, so
    make them accessible from those commands. This is preparation for a later
    change that limits the possible options.

    Signed-off-by: Namhyung Kim
    Acked-by: Hemant Kumar
    Cc: Alexander Yarygin
    Cc: David Ahern
    Cc: Hemant Kumar
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1413990949-13953-3-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

17 Oct, 2014

1 commit

  • The only thing we need is a forward declaration for 'struct cgroup_sel',
    that is inside 'struct perf_evsel'.

    Include cgroup.h instead on the tools that support cgroups.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-b7kuymbgf0zxi5viyjjtu5hk@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2014

1 commit

  • It was lost in hist.h, move it to where it belongs, callchain.h, as
    there are places that get hist.h by means of evsel.h, and since evsel.h
    is being untangled from hist.h...

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0rg7ji1jnbm6q6gj35j37jby@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo