28 May, 2020

1 commit


06 May, 2020

3 commits


27 Feb, 2020

2 commits

  • For every perf-config option that can also be set on the command line,
    the command line version takes precedence in case of any conflict. perf
    annotate does the opposite: it prefers the default option over the
    command line option. Fix it.

    Before:

    $ ./perf config
    annotate.show_nr_samples=false

    $ ./perf annotate shash --show-nr-samples
    Percent│
    │24: mov -0xc(%rbp),%eax
    49.19 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    After:

    Samples│
    │24: mov -0xc(%rbp),%eax
    1 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    Signed-off-by: Ravi Bangoria
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexey Budankov
    Cc: Changbin Du
    Cc: Ian Rogers
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Leo Yan
    Cc: Namhyung Kim
    Cc: Song Liu
    Cc: Taeung Song
    Cc: Thomas Richter
    Cc: Yisheng Xie
    Link: http://lore.kernel.org/lkml/20200213064306.160480-7-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
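
The precedence rule described in this commit can be sketched as below. This is a minimal illustration, not perf's code: the `cli_state` enum and `resolve_bool_option()` are hypothetical names, and a boolean option is assumed.

```c
#include <stdbool.h>

/* Hypothetical tri-state: was the option given on the command line? */
enum cli_state { CLI_UNSET, CLI_OFF, CLI_ON };

/*
 * Resolve the effective value of a boolean option.
 * Precedence: command line first, then config file, then built-in default.
 */
bool resolve_bool_option(enum cli_state cli, bool has_config,
			 bool config_value, bool default_value)
{
	if (cli != CLI_UNSET)
		return cli == CLI_ON;
	if (has_config)
		return config_value;
	return default_value;
}
```

With the commit's example (annotate.show_nr_samples=false in the config and --show-nr-samples on the command line), this sketch resolves to true, which matches the "Samples" column shown in the "After" output.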
     
  • The perf default config set by the user in the [annotate] section is
    completely ignored by the annotate code. Fix it.

    Before:

    $ ./perf config
    annotate.hide_src_code=true
    annotate.show_nr_jumps=true
    annotate.show_nr_samples=true

    $ ./perf annotate shash
    │ unsigned h = 0;
    │ movl $0x0,-0xc(%rbp)
    │ while (*s)
    │ ↓ jmp 44
    │ h = 65599 * h + *s++;
    11.33 │24: mov -0xc(%rbp),%eax
    43.50 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    After:

    │ movl $0x0,-0xc(%rbp)
    │ ↓ jmp 44
    1 │1 24: mov -0xc(%rbp),%eax
    4 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    Note that we have removed show_nr_samples and show_total_period from
    annotation_options because they are not used. Instead of them we use
    symbol_conf.show_nr_samples and symbol_conf.show_total_period.

    Committer testing:

    Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:

    # perf config
    #
    # perf config annotate.hide_src_code=true
    # perf config
    annotate.hide_src_code=true
    #
    # perf config annotate.show_nr_jumps=true
    # perf config annotate.show_nr_samples=true
    # perf config
    annotate.hide_src_code=true
    annotate.show_nr_jumps=true
    annotate.show_nr_samples=true
    #
    #

    Before:

    # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Percent
    00000000000609f0 :
    endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    100.00 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Samples endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    1 1 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1 1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    1 20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Samples endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    1 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexey Budankov
    Cc: Changbin Du
    Cc: Ian Rogers
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Leo Yan
    Cc: Namhyung Kim
    Cc: Song Liu
    Cc: Taeung Song
    Cc: Thomas Richter
    Cc: Yisheng Xie
    Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     

14 Jan, 2020

1 commit

  • The objdump utility has useful --prefix / --prefix-strip options to
    allow changing source code file names hardcoded into executables' debug
    info. Add options to 'perf report', 'perf top' and 'perf annotate',
    which are then passed to objdump.

    $ mkdir foo
    $ echo 'main() { for (;;); }' > foo/foo.c
    $ gcc -g foo/foo.c
    foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
    1 | main() { for (;;); }
    | ^~~~
    $ perf record ./a.out
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ]
    $ mv foo bar
    $ perf annotate

    $ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5

    Signed-off-by: Andi Kleen
    Tested-by: Jiri Olsa
    LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
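
As a rough sketch of what --prefix / --prefix-strip do to a source path, the function below strips a number of leading directory components and prepends a prefix. `apply_prefix()` is a hypothetical helper, not objdump or perf code, and the original path used in the usage note is an assumption. Stopping at the file name when fewer components exist than requested is a choice made for this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "prefix + path with `strip` leading directory components removed"
 * into out. Returns 0 on success, -1 if the result does not fit.
 */
int apply_prefix(const char *path, const char *prefix, int strip,
		 char *out, size_t outsz)
{
	const char *p = path;
	int n;

	while (strip-- > 0) {
		const char *slash;

		/* Skip the leading '/' run, then the component after it. */
		while (*p == '/')
			p++;
		slash = strchr(p, '/');
		if (!slash)
			break;	/* nothing left to strip but the file name */
		p = slash;
	}

	/* Make sure exactly one '/' separates prefix and remainder. */
	n = snprintf(out, outsz, "%s%s%s", prefix, *p == '/' ? "" : "/", p);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Assuming the binary recorded /home/ak/lsrc/git/foo/foo.c, calling apply_prefix() with prefix /home/ak/lsrc/git/bar and strip 5 drops home, ak, lsrc, git and foo, then yields /home/ak/lsrc/git/bar/foo.c, the renamed location from the example above.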
     

12 Nov, 2019

2 commits

  • So that we pass that substructure around and with it consolidate lots of
    functions that receive a (map, symbol) pair and now can receive just a
    'struct map_symbol' pointer.

    This further paves the way to add 'struct map_groups' to 'struct
    map_symbol' so that we can have all we need for annotation so that we
    can ditch 'struct map'->groups, i.e. have the map_groups pointer in a
    more central place, avoiding the pointer in the 'struct map' that have
    tons of instances.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • We are already passing things like:

    symbol__annotate(ms->sym, ms->map, ...)

    So shorten the signature of such functions to receive the 'map_symbol'
    pointer.

    This also paves the way to having the 'struct map_groups' pointer in the
    'struct map_symbol' so that we can get rid of 'struct map'->groups.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
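
The signature change in these two commits can be illustrated with simplified stand-in types. The struct layouts and the `annotate_pair()` / `annotate_ms()` names are hypothetical; the real perf structures carry many more fields.

```c
#include <stddef.h>

/* Simplified stand-ins for the real structures. */
struct map    { const char *dso_name; };
struct symbol { const char *name; };

/* The (map, symbol) pair bundled into one substructure. */
struct map_symbol {
	struct map    *map;
	struct symbol *sym;
};

/* Before: every caller unpacks the pair by hand. */
int annotate_pair(struct symbol *sym, struct map *map)
{
	return (sym && map) ? 0 : -1;
}

/* After: callers pass the pair as a single pointer. */
int annotate_ms(struct map_symbol *ms)
{
	if (!ms)
		return -1;
	return annotate_pair(ms->sym, ms->map);
}
```

Any field added to the substructure later becomes available to every function that takes the pointer, without touching each signature again.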
     

07 Nov, 2019

1 commit

  • We can get the per-sample cycles from hist__account_cycles(). It's also
    useful to know the total cycles of all samples, so that the cycles
    coverage of a single program block can be computed later. For example:

    coverage = per block sampled cycles / total sampled cycles

    This patch adds a new argument 'total_cycles' to hist__account_cycles();
    the cycles of each sample are added to it.

    Signed-off-by: Jin Yao
    Reviewed-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
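
The coverage formula above can be sketched as follows. The `block_cycles` struct and both function names are hypothetical and only mirror the idea of accumulating a running total next to the per-block count.

```c
#include <stdint.h>

/* Hypothetical per-block record; not the layout perf uses. */
struct block_cycles {
	uint64_t cycles;	/* cycles sampled inside this block */
};

/* Add one sample's cycles to the block and to the running total. */
void account_cycles(struct block_cycles *blk, uint64_t sample_cycles,
		    uint64_t *total_cycles)
{
	blk->cycles += sample_cycles;
	if (total_cycles)
		*total_cycles += sample_cycles;
}

/* coverage = per block sampled cycles / total sampled cycles */
double block_coverage(const struct block_cycles *blk, uint64_t total_cycles)
{
	return total_cycles ? (double)blk->cycles / (double)total_cycles : 0.0;
}
```

For instance, a block with 300 sampled cycles out of 400 total has a coverage of 0.75.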
     

21 Sep, 2019

1 commit

  • This patch makes perf_session__new() return an error code on failure
    instead of NULL.

    Test Results:

    Before Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    0
    $

    After Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    254
    $

    Committer notes:

    Fix 'perf tests topology' case, where we use that TEST_ASSERT_VAL(...,
    session), i.e. we need to pass zero in case of failure, which was the
    case before when NULL was returned by perf_session__new() for failure,
    but now we need to negate the result of IS_ERR(session) to respect that
    TEST_ASSERT_VAL() expectation of zero meaning failure.

    Reported-by: Nageswara R Sastry
    Signed-off-by: Mamatha Inamdar
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Nageswara R Sastry
    Acked-by: Ravi Bangoria
    Reviewed-by: Jiri Olsa
    Reviewed-by: Mukesh Ojha
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Greg Kroah-Hartman
    Cc: Jeremie Galarneau
    Cc: Kate Stewart
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shawn Landden
    Cc: Song Liu
    Cc: Thomas Gleixner
    Cc: Tzvetomir Stoyanov
    Link: http://lore.kernel.org/lkml/20190822071223.17892.45782.stgit@localhost.localdomain
    Signed-off-by: Arnaldo Carvalho de Melo

    Mamatha Inamdar
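
The error-pointer convention this commit switches to can be sketched in userspace C as below. The helper definitions are minimal re-creations for illustration, and `struct session` / `session_new()` are hypothetical stand-ins for the real constructor.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal userspace versions of the kernel-style error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct session { const char *path; };

/* Hypothetical constructor: reports why it failed instead of returning NULL. */
struct session *session_new(const char *path)
{
	struct session *s;

	if (!path)
		return ERR_PTR(-ENOENT);
	s = malloc(sizeof(*s));
	if (!s)
		return ERR_PTR(-ENOMEM);
	s->path = path;
	return s;
}
```

A caller checks IS_ERR() and propagates PTR_ERR() as its return value. The exit status 254 in the "After Fix" output would be consistent with -ENOENT (-2 on Linux) truncated to an 8-bit exit status. As the committer notes point out, code that treated NULL as failure needs !IS_ERR(session) instead.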
     

20 Sep, 2019

1 commit


01 Sep, 2019

3 commits


30 Jul, 2019

1 commit

  • Rename struct perf_evsel to struct evsel, so we don't have a name clash
    when we add struct perf_evsel in libperf.

    Committer notes:

    Added fixes for arm64, provided by Jiri.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

09 Jul, 2019

1 commit


16 May, 2019

1 commit

  • The hist__account_cycles() function is executed when the
    hist_iter__branch_callback() is called.

    But it looks like this is not necessary: hist__account_cycles() already
    walks all branch entries.

    This patch moves hist__account_cycles() out of the callback, so the data
    processing is now much faster than before.

    Previous code has an issue that the ch[offset].num++ (in
    __symbol__account_cycles) is executed repeatedly since
    hist__account_cycles is called in each hist_iter__branch_callback, so
    the counting of ch[offset].num is not correct (too big).

    With this patch, the issue is fixed. And we don't need the code of
    "ch->reset >= ch->num / 2" to check if there are too many overlaps (in
    annotation__count_and_fill), otherwise some data would be hidden.

    Now, we can try, for example:

    perf record -b ...
    perf annotate or perf report -s symbol

    There should be no change between the before and after output.

    v3:
    ---
    Fix the crash in stdio mode.
    Like previous code, it needs the checking of ui__has_annotation()
    before hist__account_cycles()

    v2:
    ---
    1. Cover the similar perf report
    2. Remove the checking code "ch->reset >= ch->num / 2"

    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

23 Feb, 2019

1 commit

  • Add a 'path' member to 'struct perf_data'. It will keep the configured
    path for the data (const char *). The path in struct perf_data_file is
    now dynamically allocated (duped) from it.

    This scheme is useful/used in following patches where struct
    perf_data::path holds the 'configure' directory path and struct
    perf_data_file::path holds the allocated path for specific files.

    Also it actually makes the code a little simpler.

    Signed-off-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
    [ Fixup data-convert-bt.c missing conversion ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

06 Feb, 2019

1 commit

  • Lots of places get the map.h file indirectly. Since we're going to
    remove it from machine.h, those places need to include it directly. Do
    it now, before we remove that dependency.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-ob8jehdjda8h5jsrv9dqj9tf@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

25 Jan, 2019

2 commits

  • At the cost of an extra pointer, we can avoid the O(logN) cost of
    finding the first element in the tree (smallest node), which is
    something heavily required for histograms. Specifically, the following
    are converted to rb_root_cached, and users accordingly:

    hist::entries_in_array
    hist::entries_in
    hist::entries
    hist::entries_collapsed
    hist_entry::hroot_in
    hist_entry::hroot_out

    Signed-off-by: Davidlohr Bueso
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
    [ Added some missing conversions to rb_first_cached() ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Davidlohr Bueso
     
  • At the cost of an extra pointer, we can avoid the O(logN) cost of
    finding the first element in the tree (smallest node).

    Signed-off-by: Davidlohr Bueso
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20181206191819.30182-6-dave@stgolabs.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Davidlohr Bueso
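
The trade-off in these two commits (one extra pointer in exchange for O(1) access to the smallest node) can be sketched as below. For brevity this uses a plain unbalanced binary search tree rather than a red-black tree, and all names are hypothetical.

```c
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Root plus a cached pointer to the smallest node. */
struct root_cached {
	struct node *root;
	struct node *leftmost;
};

/* Insert into the tree, keeping the leftmost cache current. */
void insert_cached(struct root_cached *tree, struct node *n)
{
	struct node **link = &tree->root;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			is_leftmost = 0; /* went right once: not the minimum */
		}
	}
	*link = n;
	if (is_leftmost)
		tree->leftmost = n;
}

/* O(1), instead of walking down the left spine of the tree. */
struct node *first_cached(const struct root_cached *tree)
{
	return tree->leftmost;
}
```

A new node is the minimum only if the descent never turned right, so the cache is maintained during insertion at no extra traversal cost. Removal, not shown here, would also have to update the cached pointer.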
     

19 Sep, 2018

1 commit

  • Now that we keep a perf_tool pointer inside perf_session, there's no
    need to have a perf_tool argument in the event_op2 callback. Remove it.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180913125450.21342-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

09 Aug, 2018

1 commit

  • Add the --percent-type option to set the annotation percent type from
    the following choices:

    global-period, local-period, global-hits, local-hits

    Examples:

    $ perf annotate --percent-type period-local --stdio | head -1
    Percent | Source code ... es, percent: local period)
    $ perf annotate --percent-type hits-local --stdio | head -1
    Percent | Source code ... es, percent: local hits)
    $ perf annotate --percent-type hits-global --stdio | head -1
    Percent | Source code ... es, percent: global hits)
    $ perf annotate --percent-type period-global --stdio | head -1
    Percent | Source code ... es, percent: global period)

    The local/global keywords select whether the percentage is computed in
    the scope of the function (local) or of the whole data (global).

    The period/hits keywords select the base the percentage is computed on:
    the sample period or the number of samples (hits).

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20180804130521.11408-20-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
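
How the four percent types differ can be sketched as follows: the numerator is always the line's own count, and the type chooses both the unit (hits or period) and the denominator (function or whole data). The `line_counts` struct and `line_percent()` are hypothetical and do not reflect perf's internal layout.

```c
enum percent_type {
	PERCENT_HITS_LOCAL,
	PERCENT_HITS_GLOBAL,
	PERCENT_PERIOD_LOCAL,
	PERCENT_PERIOD_GLOBAL,
};

/* Hypothetical counters for one annotated line. */
struct line_counts {
	unsigned long long line_hits, line_period;	/* this line */
	unsigned long long sym_hits, sym_period;	/* whole function */
	unsigned long long total_hits, total_period;	/* whole data */
};

double line_percent(const struct line_counts *c, enum percent_type type)
{
	unsigned long long num = 0, den = 0;

	switch (type) {
	case PERCENT_HITS_LOCAL:
		num = c->line_hits;   den = c->sym_hits;     break;
	case PERCENT_HITS_GLOBAL:
		num = c->line_hits;   den = c->total_hits;   break;
	case PERCENT_PERIOD_LOCAL:
		num = c->line_period; den = c->sym_period;   break;
	case PERCENT_PERIOD_GLOBAL:
		num = c->line_period; den = c->total_period; break;
	}
	return den ? 100.0 * (double)num / (double)den : 0.0;
}
```

A line with 2 of a function's 4 hits is 50% local, but only 25% global if the whole data holds 8 hits.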
     

25 Jun, 2018

1 commit

  • perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE]
    which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is
    used as an end marker for the perf report but it's unused for perf
    script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate,
    just like it is done in 'perf report'.

    Before:
    # perf record -o - ls | perf script

    Segmentation fault (core dumped)
    #

    After:
    # perf record -o - ls | perf script

    Segmentation fault (core dumped)
    ls 7031 4392.099856: 250000 cpu-clock:uhH: 7f5e0ce7cd60
    ls 7031 4392.100355: 250000 cpu-clock:uhH: 7f5e0c706ef7
    #

    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: David Carrillo-Cisneros
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode")
    Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
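
The class of bug fixed here, indexing a handler table with its own end marker, can be sketched as below. The feature ids, handlers and `process_feature()` are hypothetical; the guard shown is a generic illustration, not the exact check the patch adds.

```c
#include <stddef.h>

/* FEAT_LAST is an end marker, not a real feature. */
enum { FEAT_TRACING_DATA, FEAT_BUILD_ID, FEAT_LAST };

typedef int (*feat_fn)(void);

static int process_tracing_data(void) { return 1; }
static int process_build_id(void) { return 2; }

/* One handler per real feature; there is no slot for FEAT_LAST. */
static const feat_fn feat_ops[FEAT_LAST] = {
	[FEAT_TRACING_DATA] = process_tracing_data,
	[FEAT_BUILD_ID]     = process_build_id,
};

/* Returns the handler's result, or 0 when the id has no handler. */
int process_feature(unsigned int id)
{
	if (id >= (unsigned int)FEAT_LAST || !feat_ops[id])
		return 0;	/* end marker or unknown id: ignore it */
	return feat_ops[id]();
}
```

Without the guard, an id equal to FEAT_LAST would read one element past the end of the table and call whatever happens to be there.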
     

04 Jun, 2018

6 commits


22 May, 2018

1 commit

  • With the '--group' option, even for non-explicit group, 'perf annotate'
    will enable the group output.

    For example,

    $ perf record -e cycles,branches ./div
    $ perf annotate main --stdio --group

    : Disassembly of section .text:
    :
    : 00000000004004b0 :
    : main():
    :
    : return i;
    : }
    :
    : int main(void)
    : {
    0.00 0.00 : 4004b0: push %rbx
    : int i;
    : int flag;
    : volatile double x = 1212121212, y = 121212;
    :
    : s_randseed = time(0);
    0.00 0.00 : 4004b1: xor %edi,%edi
    : srand(s_randseed);
    0.00 0.00 : 4004b3: mov $0x77359400,%ebx
    :
    : return i;
    : }
    :

    But without --group, only one event is reported.

    $ perf annotate main --stdio

    : Disassembly of section .text:
    :
    : 00000000004004b0 :
    : main():
    :
    : return i;
    : }
    :
    : int main(void)
    : {
    0.00 : 4004b0: push %rbx
    : int i;
    : int flag;
    : volatile double x = 1212121212, y = 121212;
    :
    : s_randseed = time(0);
    0.00 : 4004b1: xor %edi,%edi
    : srand(s_randseed);
    0.00 : 4004b3: mov $0x77359400,%ebx
    :
    : return i;
    : }

    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1526914666-31839-4-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

27 Apr, 2018

1 commit

  • Remove the split of symbol tables for data (MAP__VARIABLE) and for
    functions (MAP__FUNCTION). It's unneeded, and there were various places
    doing two lookups to find a symbol, so simplify this.

    We still will consider only the symbols that matched the filters in
    place, i.e. see the (elf_(sec,sym)|symbol_type)__filter() routines in
    the patch, just so that we consider only the same symbols as before,
    to reduce the possibility of regressions.

    All the tests on 50-something build environments, in various versions
    of lots of distros and cross build environments, were performed without
    build regressions. As usual with all pull requests, the other tests were
    also performed: 'perf test' and 'make -C tools/perf build-test'.

    Also this was done at a great granularity so that regressions can be
    bisected more easily.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-hiq0fy2rsleupnqqwuojo1ne@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

21 Mar, 2018

3 commits

  • This is already present in 'perf top', albeit undocumented (will fix),
    and is useful to use /proc/kcore instead of vmlinux and then get what is
    really in place, not what the kernel starts with, before alternatives,
    ftrace .text patching, etc, see the differences:

    # perf annotate --stdio2 _raw_spin_lock_irqsave
    _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
    Event: anon group { cycles, instructions }

    0.00 3.17 → callq __fentry__
    0.00 7.94 push %rbx
    7.69 36.51 → callq __page_file_index
    mov %rax,%rbx
    7.69 3.17 → callq *ffffffff82225cd0
    xor %eax,%eax
    mov $0x1,%edx
    80.77 49.21 lock cmpxchg %edx,(%rdi)
    test %eax,%eax
    ↓ jne 2b
    3.85 0.00 mov %rbx,%rax
    pop %rbx
    ← retq
    2b: mov %eax,%esi
    → callq queued_spin_lock_slowpath
    mov %rbx,%rax
    pop %rbx
    ← retq
    [root@jouet ~]# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
    _raw_spin_lock_irqsave() /proc/kcore
    Event: anon group { cycles, instructions }

    0.00 3.17 nop
    0.00 7.94 push %rbx
    0.00 23.81 pushfq
    7.69 12.70 pop %rax
    nop
    mov %rax,%rbx
    7.69 3.17 cli
    nop
    xor %eax,%eax
    mov $0x1,%edx
    80.77 49.21 lock cmpxchg %edx,(%rdi)
    test %eax,%eax
    ↓ jne 2b
    3.85 0.00 mov %rbx,%rax
    pop %rbx
    ← retq
    2b: mov %eax,%esi
    → callq *ffffffff820e96b0
    mov %rbx,%rax
    pop %rbx
    ← retq
    #

    Diff of the output of those commands:

    # perf annotate --stdio2 _raw_spin_lock_irqsave > /tmp/vmlinux
    # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave > /tmp/kcore
    # diff -y /tmp/vmlinux /tmp/kcore
    _raw_spin_lock_irqsave() vmlinux | _raw_spin_lock_irqsave() /proc/kcore
    Event: anon group { cycles, instructions } Event: anon group { cycles, instructions }

    0.00 3.17 → callq __fentry__ | 0.00 3.17 nop
    0.00 7.94 push %rbx 0.00 7.94 push %rbx
    7.69 36.51 → callq __page_file_index | 0.00 23.81 pushfq
    > 7.69 12.70 pop %rax
    > nop
    mov %rax,%rbx mov %rax,%rbx
    7.69 3.17 → callq *ffffffff82225cd0 | 7.69 3.17 cli
    > nop
    xor %eax,%eax xor %eax,%eax
    mov $0x1,%edx mov $0x1,%edx
    80.77 49.21 lock cmpxchg %edx,(%rdi) 80.77 49.21 lock cmpxchg %edx,(%rdi)
    test %eax,%eax test %eax,%eax
    ↓ jne 2b ↓ jne 2b
    3.85 0.00 mov %rbx,%rax 3.85 0.00 mov %rbx,%rax
    pop %rbx pop %rbx
    ← retq ← retq
    2b: mov %eax,%esi 2b: mov %eax,%esi
    → callq queued_spin_lock_slowpath| → callq *ffffffff820e96b0
    mov %rbx,%rax mov %rbx,%rax
    pop %rbx pop %rbx
    ← retq ← retq
    #

    This should be further streamlined by doing both annotations and
    allowing the TUI to toggle initial/current, and show the patched
    instructions in a slightly different color.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-wz8d269hxkcwaczr0r4rhyjg@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • One more thing that goes from the TUI code to be used more widely,
    for instance it'll affect the default options used by:

    perf annotate --stdio2

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-0nsz0dm0akdbo30vgja2a10e@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • This uses the TUI augmented formatting routines, modulo interactivity.

    # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
    _raw_spin_lock_irqsave() /proc/kcore
    Event: cycles:ppp

    Percent

    Disassembly of section load0:

    ffffffff9a8734b0 :
    nop
    push %rbx
    50.00 pushfq
    pop %rax
    nop
    mov %rax,%rbx
    cli
    nop
    xor %eax,%eax
    mov $0x1,%edx
    50.00 lock cmpxchg %edx,(%rdi)
    test %eax,%eax
    ↓ jne 2b
    mov %rbx,%rax
    pop %rbx
    ← retq
    2b: mov %eax,%esi
    → callq queued_spin_lock_slowpath
    mov %rbx,%rax
    pop %rbx
    ← retq

    Tested-by: Jin Yao
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

08 Mar, 2018

1 commit

  • Unlike the interactive annotate mode of perf report, perf annotate
    doesn't display the IPC/Cycle even if branch info is recorded in the
    perf data file.

    perf record -b ...
    perf annotate function

    It should show IPC/cycle, but it doesn't.

    This patch lets perf annotate display the IPC/Cycle if branch info is
    in the perf data.

    For example,

    perf annotate compute_flag

    Percent│ IPC Cycle


    │ Disassembly of section .text:

    │ 0000000000400640 :
    │ compute_flag():
    │ volatile int count;
    │ static unsigned int s_randseed;

    │ __attribute__((noinline))
    │ int compute_flag()
    │ {
    22.96 │1.18 584 sub $0x8,%rsp
    │ int i;

    │ i = rand() % 2;
    23.02 │1.18 1 → callq rand@plt

    │ return i;
    27.05 │3.37 mov %eax,%edx
    │ }
    │3.37 add $0x8,%rsp
    │ {
    │ int i;

    │ i = rand() % 2;

    │ return i;
    │3.37 shr $0x1f,%edx
    │3.37 add %edx,%eax
    │3.37 and $0x1,%eax
    │3.37 sub %edx,%eax
    │ }
    26.97 │3.37 2 ← retq

    Note that this patch only supports TUI mode. For stdio, it keeps the
    original behavior for now. Support will come in a follow-up patch.

    $ perf annotate compute_flag --stdio

    Percent | Source code & Disassembly of div for cycles:ppp (7993 samples)
    ------------------------------------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 0000000000400640 :
    : compute_flag():
    : volatile int count;
    : static unsigned int s_randseed;
    :
    : __attribute__((noinline))
    : int compute_flag()
    : {
    0.29 : 400640: sub $0x8,%rsp # +100.00%
    : int i;
    :
    : i = rand() % 2;
    42.93 : 400644: callq 400490 # -100.00% (p:100.00%)
    :
    : return i;
    0.10 : 400649: mov %eax,%edx # +100.00%
    : }
    0.94 : 40064b: add $0x8,%rsp
    : {
    : int i;
    :
    : i = rand() % 2;
    :
    : return i;
    27.02 : 40064f: shr $0x1f,%edx
    0.15 : 400652: add %edx,%eax
    1.24 : 400654: and $0x1,%eax
    2.08 : 400657: sub %edx,%eax
    : }
    25.26 : 400659: retq # -100.00% (p:100.00%)

    Signed-off-by: Jin Yao
    Acked-by: Andi Kleen
    Link: http://lkml.kernel.org/r/20180223170210.GC7045@tassilo.jf.intel.com
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1519724327-7773-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

07 Nov, 2017

1 commit

  • Conflicts:
    tools/perf/arch/arm/annotate/instructions.c
    tools/perf/arch/arm64/annotate/instructions.c
    tools/perf/arch/powerpc/annotate/instructions.c
    tools/perf/arch/s390/annotate/instructions.c
    tools/perf/arch/x86/tests/intel-cqm.c
    tools/perf/ui/tui/progress.c
    tools/perf/util/zlib.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

31 Oct, 2017

1 commit

  • Add struct perf_data_file to represent a single file within a perf_data
    struct.

    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Changbin Du
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
    [ Fixup recent changes in 'perf script --per-event-dump' ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa