01 Sep, 2020

1 commit

  • Disable ordered_events for report raw dump, because for raw dump we want
    to see events as they are stored in the perf.data file, not sorted by
    time.
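
    A minimal sketch of the idea, not the actual perf code: the struct and
    function names below are hypothetical stand-ins. It only shows that time
    ordering is switched off when a raw dump is requested, so events come out
    in file order.

    ```c
    #include <stdbool.h>

    /* Hypothetical stand-in for the report tool options; not perf's real struct. */
    struct report_tool {
    	bool ordered_events;	/* sort events by timestamp before delivery */
    };

    /* A raw dump should show events as stored in perf.data, so skip ordering. */
    static void report_tool_setup(struct report_tool *tool, bool dump_trace)
    {
    	tool->ordered_events = !dump_trace;
    }
    ```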

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Ian Rogers
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200827134830.126721-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

23 Jun, 2020

2 commits


09 Jun, 2020

1 commit


02 Jun, 2020

1 commit

  • There exist some duplicated includes in tools/perf; remove them.

    Signed-off-by: Tiezhu Yang
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: xuefeng li
    Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn
    Signed-off-by: Arnaldo Carvalho de Melo

    Tiezhu Yang
     

28 May, 2020

2 commits

  • Callchains are automatically initialized by checking on event's
    sample_type. For pipe mode we need to put this check into attr event
    code.

    Move the callchain setup code into the callchain_param_setup() function and
    call it from the attr event processing code.
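
    The flow described above might look roughly like the sketch below. The
    names callchain_setup(), process_attr_event() and SAMPLE_CALLCHAIN are
    illustrative assumptions; the real flag is PERF_SAMPLE_CALLCHAIN and the
    real helper is callchain_param_setup().

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative bit; the real flag is PERF_SAMPLE_CALLCHAIN. */
    #define SAMPLE_CALLCHAIN (1ULL << 5)

    struct callchain_state {
    	bool enabled;
    };

    /* Hypothetical analogue of callchain_param_setup(): decide from sample_type. */
    static void callchain_setup(struct callchain_state *cs, uint64_t sample_type)
    {
    	if (sample_type & SAMPLE_CALLCHAIN)
    		cs->enabled = true;
    }

    /* In pipe mode attributes arrive as events, so the check runs per attr event. */
    static void process_attr_event(struct callchain_state *cs, uint64_t sample_type)
    {
    	callchain_setup(cs, sample_type);
    }
    ```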

    This enables pipe output having callchains, like:

    # perf record -g -e 'raw_syscalls:sys_enter' true | perf script
    # perf record -g -e 'raw_syscalls:sys_enter' true | perf report

    Committer notes:

    We still need the next patch for the above output to work.

    Reported-by: Paul Khuong
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Ian Rogers
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200507095024.2789147-5-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
    libperf, to whom the perf_ prefix belongs.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

06 May, 2020

4 commits


30 Apr, 2020

1 commit

  • Fixes coccicheck warning:

    tools/perf/builtin-report.c:1403:2-34: WARNING: Assignment of 0/1 to bool variable
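
    For illustration only, the kind of change this warning asks for. The
    struct and field name are hypothetical, not the line the commit touched.

    ```c
    #include <stdbool.h>

    struct report_opts {
    	bool skip_empty;	/* hypothetical field name */
    };

    static void set_defaults(struct report_opts *opts)
    {
    	/* was: opts->skip_empty = 1;  (the pattern coccicheck flags) */
    	opts->skip_empty = true;
    }
    ```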

    Reported-by: Hulk Robot
    Signed-off-by: Zou Wei
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/1587904683-3510-1-git-send-email-zou_wei@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Zou Wei
     

18 Apr, 2020

1 commit

  • With the LBR stitching approach, the reconstructed LBR call stack can
    break the HW limitation. However, it may reconstruct invalid call stacks
    in some cases, e.g. exception handling such as setjmp/longjmp. Also, it
    may impact the processing time especially when the number of samples
    with stitched LBRs is huge.

    Add an option to enable the approach.

    # To display the perf.data header info, please use
    # --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 6K of event 'cycles'
    # Event count (approx.): 6492797701
    #
    # Children Self Command Shared Object Symbol
    # ........ ........ ............... ..................
    # .................................
    #
    99.99% 99.99% tchain_edit tchain_edit [.] f43
    |
    ---main
    f1
    f2
    f3
    f4
    f5
    f6
    f7
    f8
    f9
    f10
    f11
    f12
    f13
    f14
    f15
    f16
    f17
    f18
    f19
    f20
    f21
    f22
    f23
    f24
    f25
    f26
    f27
    f28
    f29
    f30
    f31
    |
    --99.65%--f32
    f33
    f34
    f35
    f36
    f37
    f38
    f39
    f40
    f41
    f42
    f43

    Committer testing:

    $ perf record --call-graph lbr /wb/tchain_edit
    [ perf record: Woken up 23 times to write data ]
    [ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ]
    $ perf report --header-only | egrep 'cpu(desc|.*capabilities)'
    # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
    # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
    $

    Before:

    $ perf report --no-children --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 6K of event 'cycles:u'
    # Event count (approx.): 6459523879
    #
    # Overhead Command Shared Object Symbol
    # ........ ........... ................ .......................
    #
    99.95% tchain_edit tchain_edit [.] f43
    |
    --99.92%--f43
    f42
    f41
    f40
    f39
    f38
    f37
    f36
    f35
    f34
    f33
    f32
    f31
    f30
    f29
    f28
    f27
    f26
    f25
    f24
    f23
    f22
    f21
    f20
    f19
    f18
    f17
    f16
    f15
    f14
    f13
    f12
    f11

    0.03% tchain_edit tchain_edit [.] f42
    0.01% tchain_edit tchain_edit [.] f41
    0.00% tchain_edit tchain_edit [.] f31
    0.00% tchain_edit ld-2.29.so [.] _dl_relocate_object
    0.00% tchain_edit ld-2.29.so [.] memmove
    0.00% tchain_edit [unknown] [k] 0xffffffff93a00b17

    After:

    $ perf report --stitch-lbr --no-children --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 6K of event 'cycles:u'
    # Event count (approx.): 6459496645
    #
    # Overhead Command Shared Object Symbol
    # ........ ........... ................ ........................
    #
    99.97% tchain_edit tchain_edit [.] f43
    |
    --99.93%--f43
    f42
    f41
    f40
    f39
    f38
    f37
    f36
    f35
    f34
    f33
    f32
    f31
    f30
    f29
    f28
    f27
    f26
    f25
    f24
    f23
    f22
    f21
    f20
    f19
    f18
    f17
    f16
    f15
    f14
    f13
    f12
    f11
    f10
    f9
    f8
    f7
    f6
    f5
    f4
    f3
    f2
    f1
    main
    __libc_start_main

    0.02% tchain_edit [unknown] [k] 0xffffffff93a00b17
    0.01% tchain_edit tchain_edit [.] f31
    0.00% tchain_edit ld-2.29.so [.] _dl_important_hwcaps

    Signed-off-by: Kan Liang
    Reviewed-by: Andi Kleen
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexey Budankov
    Cc: Mathieu Poirier
    Cc: Michael Ellerman
    Cc: Namhyung Kim
    Cc: Pavel Gerasimov
    Cc: Peter Zijlstra
    Cc: Ravi Bangoria
    Cc: Stephane Eranian
    Cc: Vitaly Slobodskoy
    Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

16 Apr, 2020

1 commit


03 Apr, 2020

1 commit

  • Implement basic functionality to support cgroup tracking. Each cgroup
    can be identified by inode number which can be read from userspace too.
    The actual cgroup processing will come in the later patch.
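
    As a rough userspace illustration of reading such an inode number, the
    sketch below uses stat(2) on a directory. The helper name path_inode() is
    made up for this example, and treating a cgroup directory's inode as its
    identifier is taken from the description above rather than verified here.

    ```c
    #include <stdint.h>
    #include <sys/stat.h>

    /* Read the inode number of a path; returns 0 on success, -1 on error. */
    static int path_inode(const char *path, uint64_t *ino)
    {
    	struct stat st;

    	if (stat(path, &st) < 0)
    		return -1;
    	*ino = (uint64_t)st.st_ino;
    	return 0;
    }
    ```

    A caller would pass the directory of the cgroup of interest, for example
    a (hypothetical) path such as /sys/fs/cgroup/mygroup.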

    Reported-by: kernel test robot
    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    [ fix perf test failure on sampling parsing ]
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

24 Mar, 2020

2 commits

  • Sometimes we need to reload the browser to update the output after some
    options have changed.

    This patch creates a new key K_RELOAD. Once the __cmd_report() returns
    K_RELOAD, it would repeat the whole process, such as, read samples from
    data file, sort the data and display in the browser.
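
    The repeat logic can be sketched as below. This is a simplified model with
    hypothetical key codes and a stand-in run_report() in place of
    __cmd_report(); it also shows sort setup being skipped on the repeat path.

    ```c
    enum { KEY_QUIT = 0, KEY_RELOAD = 1 };	/* hypothetical key codes */

    struct loop_stats {
    	int sort_setups;	/* times sorting was configured */
    	int runs;		/* times the report was generated */
    };

    /* Stand-in for __cmd_report(): asks for a reload until the budget is spent. */
    static int run_report(struct loop_stats *st, int *reloads_left)
    {
    	st->runs++;
    	if (*reloads_left > 0) {
    		(*reloads_left)--;
    		return KEY_RELOAD;
    	}
    	return KEY_QUIT;
    }

    static void report_loop(struct loop_stats *st, int reloads)
    {
    	int last_key = KEY_QUIT;

    	do {
    		/* Configure sorting only once; repeating it would add dimensions again. */
    		if (last_key != KEY_RELOAD)
    			st->sort_setups++;
    		last_key = run_report(st, &reloads);
    	} while (last_key == KEY_RELOAD);
    }
    ```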

    v5:
    ---
    1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
    2. Skip setup_sorting() in repeat path if last key is K_RELOAD.

    v4:
    ---
    Need to quit in perf_evsel_menu__run if key is K_RELOAD.

    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • When performing "perf report --group", it shows the event group
    information together. By default, the output is sorted by the first
    event in group.

    It would be nice for the user to be able to select any event for sorting.
    This patch introduces a new option "--group-sort-idx" to sort the output
    by the event at index n in the event group.

    For example,

    Before:

    # perf report --group --stdio

    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
    # Event count (approx.): 6451235635
    #
    # Overhead Command Shared Object Symbol
    # ................................ ......... ....................... ...................................
    #
    92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1
    3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515
    1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7
    1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce
    1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair
    0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single
    0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle
    0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check
    0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt
    ...

    After:

    # perf report --group --stdio --group-sort-idx 3

    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
    # Event count (approx.): 6451235635
    #
    # Overhead Command Shared Object Symbol
    # ................................ ......... ....................... ...................................
    #
    92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1
    0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle
    3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515
    0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns
    1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7
    0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single
    0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr
    0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt
    0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write
    0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se
    0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] scheduler_tick

    Now the output is sorted by the fourth event in group.
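
    A simplified sketch of sorting rows by a chosen group member, assuming
    four events per group and plain per-event periods. The types and the
    sort_idx variable are hypothetical and much simpler than perf's hist
    entries; an out-of-range index just falls back to the default order.

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    #define NR_MEMBERS 4	/* events per group in this sketch */

    struct row {
    	uint64_t period[NR_MEMBERS];
    };

    static int sort_idx;	/* plays the role of --group-sort-idx */

    /* qsort comparator: descending by the selected member, then by the rest. */
    static int cmp_rows(const void *pa, const void *pb)
    {
    	const struct row *a = pa, *b = pb;
    	int i;

    	if (sort_idx >= 0 && sort_idx < NR_MEMBERS &&
    	    a->period[sort_idx] != b->period[sort_idx])
    		return a->period[sort_idx] < b->period[sort_idx] ? 1 : -1;

    	for (i = 0; i < NR_MEMBERS; i++)
    		if (a->period[i] != b->period[i])
    			return a->period[i] < b->period[i] ? 1 : -1;
    	return 0;
    }
    ```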

    v7:
    ---
    Rebase to latest perf/core, no other change.

    v4:
    ---
    1. Update Documentation/perf-report.txt to mention
    '--group-sort-idx' support multiple groups with different
    amount of events and it should be used on grouped events.

    2. Update __hpp__group_sort_idx(), just return when the
    idx is out of limit.

    3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups.
    So now we don't need to use together with --group.

    v3:
    ---
    Refine the code in __hpp__group_sort_idx().

    Before:
    for (i = 1; i < nr_members; i++) {
            if (i == idx) {
                    ret = field_cmp(fields_a[i], fields_b[i]);
                    if (ret)
                            goto out;
            }
    }

    After:
    if (idx >= 1 && idx < nr_members) {
            ret = field_cmp(fields_a[idx], fields_b[idx]);
            if (ret)
                    goto out;
    }

    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
    [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

18 Mar, 2020

1 commit

  • Previously we could get the report of branch type statistics.

    For example:

    # perf record -j any,save_type ...
    # perf report --stdio

    #
    # Branch Statistics:
    #
    COND_FWD: 40.6%
    COND_BWD: 4.1%
    CROSS_4K: 24.7%
    CROSS_2M: 12.3%
    COND: 44.7%
    UNCOND: 0.0%
    IND: 6.1%
    CALL: 24.5%
    RET: 24.7%

    But now for the recent perf, it can't report the branch type statistics.

    It's a regression issue caused by commit 40c39e304641 ("perf report: Fix
    a no annotate browser displayed issue"), which only counts the branch
    type statistics for browser mode.

    This patch moves the branch_type_count() outside of ui__has_annotation()
    checking, then branch type statistics can work for stdio mode.
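
    The control-flow change can be pictured with the sketch below. The struct
    and function are hypothetical; the point is only that the counting happens
    before the annotation check instead of behind it.

    ```c
    #include <stdbool.h>

    struct sample_stats {
    	int branch_types;	/* samples counted for branch type statistics */
    	int annotated;		/* samples given browser-only bookkeeping */
    };

    static void add_sample(struct sample_stats *st, bool has_annotation)
    {
    	/* Counted unconditionally, so stdio mode gets statistics too. */
    	st->branch_types++;

    	if (!has_annotation)
    		return;
    	st->annotated++;	/* annotation work stays behind the check */
    }
    ```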

    Fixes: 40c39e304641 ("perf report: Fix a no annotate browser displayed issue")
    Signed-off-by: Jin Yao
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

10 Mar, 2020

1 commit

  • Currently we use a predefined array to set the block info output
    formats; it is fixed and inflexible.

    This patch adds two parameters, "block_hpps" and "nr_hpps", to
    block_info__create_report and other static functions, to let the
    user decide which columns to report and in which order.

    Buffers are allocated to hold the new fmts, so we need to release
    them before perf exits.
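
    A rough sketch of a caller-supplied column list with matching allocation
    and release. The enum values, struct names and function names are
    invented for this example and do not mirror block-info.c.

    ```c
    #include <stdlib.h>

    /* Hypothetical column identifiers. */
    enum block_col { COL_TOTAL_CYCLES_PCT, COL_AVG_CYCLES, COL_DSO };

    struct block_fmt {
    	enum block_col col;
    };

    struct block_report {
    	struct block_fmt *fmts;
    	int nr_fmts;
    };

    /* Build the format list from the caller's columns, in the caller's order. */
    static int block_report_init(struct block_report *rep,
    			     const enum block_col *hpps, int nr_hpps)
    {
    	int i;

    	if (nr_hpps <= 0)
    		return -1;
    	rep->fmts = calloc((size_t)nr_hpps, sizeof(*rep->fmts));
    	if (!rep->fmts)
    		return -1;
    	for (i = 0; i < nr_hpps; i++)
    		rep->fmts[i].col = hpps[i];
    	rep->nr_fmts = nr_hpps;
    	return 0;
    }

    /* Release the buffers before exit. */
    static void block_report_exit(struct block_report *rep)
    {
    	free(rep->fmts);
    	rep->fmts = NULL;
    	rep->nr_fmts = 0;
    }
    ```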

    Signed-off-by: Jin Yao
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

27 Feb, 2020

1 commit

  • The perf default config set by the user in the [annotate] section is
    completely ignored by the annotate code. Fix it.

    Before:

    $ ./perf config
    annotate.hide_src_code=true
    annotate.show_nr_jumps=true
    annotate.show_nr_samples=true

    $ ./perf annotate shash
    │ unsigned h = 0;
    │ movl $0x0,-0xc(%rbp)
    │ while (*s)
    │ ↓ jmp 44
    │ h = 65599 * h + *s++;
    11.33 │24: mov -0xc(%rbp),%eax
    43.50 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    After:

    │ movl $0x0,-0xc(%rbp)
    │ ↓ jmp 44
    1 │1 24: mov -0xc(%rbp),%eax
    4 │ imul $0x1003f,%eax,%ecx
    │ mov -0x18(%rbp),%rax

    Note that we have removed show_nr_samples and show_total_period from
    annotation_options because they are not used. Instead of them we use
    symbol_conf.show_nr_samples and symbol_conf.show_total_period.

    Committer testing:

    Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:

    # perf config
    #
    # perf config annotate.hide_src_code=true
    # perf config
    annotate.hide_src_code=true
    #
    # perf config annotate.show_nr_jumps=true
    # perf config annotate.show_nr_samples=true
    # perf config
    annotate.hide_src_code=true
    annotate.show_nr_jumps=true
    annotate.show_nr_samples=true
    #
    #

    Before:

    # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Percent
    00000000000609f0 :
    endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    100.00 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Samples endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    1 1 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1 1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    1 20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
    Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
    ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
    Samples endbr64
    cmpq $0x0,0x20(%rdi)
    ↓ je 10
    xor %eax,%eax
    ← retq
    xchg %ax,%ax
    1 10: push %rbp
    cmpq $0x0,0x18(%rdi)
    mov %rdi,%rbp
    ↓ jne 20
    1b: xor %eax,%eax
    pop %rbp
    ← retq
    nop
    20: lea 0x18(%rdi),%rdi
    → callq JS_UpdateWeakPointerAfterGC(JS::Heap
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Alexey Budankov
    Cc: Changbin Du
    Cc: Ian Rogers
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Leo Yan
    Cc: Namhyung Kim
    Cc: Song Liu
    Cc: Taeung Song
    Cc: Thomas Richter
    Cc: Yisheng Xie
    Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     

14 Jan, 2020

3 commits

  • Commit 800d3f561659 ("perf report: Add warning when libunwind not
    compiled in") breaks the s390 platform. S390 uses libdw-dwarf-unwind for
    call chain unwinding and has no support for libunwind.

    So the warning "Please install libunwind development packages during the
    perf build." caused confusion even when the call-graph was displayed
    correctly.

    This patch adds checking for HAVE_DWARF_SUPPORT, which is set when
    libdw-dwarf-unwind is compiled in.
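
    The guard could take a shape like the sketch below, assuming the build
    defines HAVE_LIBUNWIND_SUPPORT and HAVE_DWARF_SUPPORT when the respective
    unwinders are compiled in. The helper name is hypothetical.

    ```c
    #include <stdbool.h>

    /* True only when neither unwinder was compiled in. */
    static bool needs_unwind_warning(void)
    {
    #if !defined(HAVE_LIBUNWIND_SUPPORT) && !defined(HAVE_DWARF_SUPPORT)
    	return true;
    #else
    	return false;
    #endif
    }
    ```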

    Fixes: 800d3f561659 ("perf report: Add warning when libunwind not compiled in")
    Signed-off-by: Jin Yao
    Reviewed-by: Thomas Richter
    Tested-by: Thomas Richter
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200107191745.18415-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • The objdump utility has useful --prefix / --prefix-strip options to
    allow changing source code file names hardcoded into executables' debug
    info. Add options to 'perf report', 'perf top' and 'perf annotate',
    which are then passed to objdump.
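
    One way to picture the pass-through is the sketch below, which builds an
    objdump command line. The function name and the base "objdump -d -S"
    invocation are assumptions for illustration; perf's real command line is
    more elaborate.

    ```c
    #include <stdio.h>

    /* Append --prefix/--prefix-strip to an objdump command line when set. */
    static int build_objdump_cmd(char *buf, size_t len, const char *file,
    			     const char *prefix, const char *prefix_strip)
    {
    	char opts[256] = "";
    	int n = 0;

    	if (prefix && prefix_strip)
    		n = snprintf(opts, sizeof(opts), " --prefix=%s --prefix-strip=%s",
    			     prefix, prefix_strip);
    	else if (prefix)
    		n = snprintf(opts, sizeof(opts), " --prefix=%s", prefix);
    	if (n < 0 || (size_t)n >= sizeof(opts))
    		return -1;

    	n = snprintf(buf, len, "objdump -d -S%s %s", opts, file);
    	return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }
    ```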

    $ mkdir foo
    $ echo 'main() { for (;;); }' > foo/foo.c
    $ gcc -g foo/foo.c
    foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
    1 | main() { for (;;); }
    | ^~~~
    $ perf record ./a.out
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ]
    $ mv foo bar
    $ perf annotate

    $ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5

    Signed-off-by: Andi Kleen
    Tested-by: Jiri Olsa
    LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • Refer to --no-children, which is what most people probably want.

    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    LPU-Reference: 20200103183643.149150-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

21 Dec, 2019

1 commit

  • We observed an issue where some extra columns were displayed after switching
    the perf data file in the browser. The steps to reproduce:

    1. perf record -a -e cycles,instructions -- sleep 3
    2. perf report --group
    3. In browser, we use hotkey 's' to switch to another perf.data
    4. Now in browser, the extra columns 'Self' and 'Children' are displayed.

    The issue is that setup_sorting() is executed again on the repeat path, so
    dimensions are added again.

    This patch checks the last key returned from __cmd_report(). If it's
    K_SWITCH_INPUT_DATA, it skips setup_sorting().

    Fixes: ad0de0971b7f ("perf report: Enable the runtime switching of perf data file")
    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Feng Tang
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191220013722.20592-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

04 Dec, 2019

1 commit

  • If perf.data is recorded without -d, don't allow user to use --mem-mode
    with 'perf report'. symbol_daddr and phys_daddr can be recorded
    separately and may be present in the perf.data but at the report time
    they are associated with mem-mode fields and thus this restriction
    applies to them as well.
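
    The restriction amounts to a check like the sketch below. SAMPLE_DATA_SRC
    and check_mem_mode() are illustrative names; the bit is modelled on
    PERF_SAMPLE_DATA_SRC, and the message is the one shown in the "After"
    output.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative bit modelled on PERF_SAMPLE_DATA_SRC. */
    #define SAMPLE_DATA_SRC (1ULL << 15)

    /* Returns 0 when the request is acceptable, -1 when mem data is missing. */
    static int check_mem_mode(bool mem_mode, uint64_t sample_type)
    {
    	if (mem_mode && !(sample_type & SAMPLE_DATA_SRC)) {
    		fprintf(stderr, "Selected --mem-mode but no mem data. "
    				"Did you call perf record without -d?\n");
    		return -1;
    	}
    	return 0;
    }
    ```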

    Before:
    $ perf record ls
    $ perf report --mem-mode --stdio
    # Overhead Local Weight Memory access Symbol
    # ........ ............ ............. .......................
    55.56% 0 N/A [k] 0xffffffff81a00ae7

    After:
    $ perf report --mem-mode --stdio
    Error:
    Selected --mem-mode but no mem data. Did you call perf record without -d?

    Suggested-by: Arnaldo Carvalho de Melo
    Signed-off-by: Ravi Bangoria
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/20191114132213.5419-4-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     

26 Nov, 2019

2 commits

  • One more step on the merge of 'struct maps' with 'struct map_groups'.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-69vcr8pubpym90skxhmbwhiw@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • And pick the shortest name: 'struct maps'.

    The split existed because we used to have two groups of maps, one for
    functions and one for variables, but that only complicated things,
    sometimes we needed to figure out what was at some address and then had
    to first try it on the functions group and if that failed, fall back to
    the variables one.

    That split is long gone, so for quite a while we had only one struct
    maps per struct map_groups, simplify things by combining those structs.

    First patch is the minimum needed to merge both, follow up patches will
    rename 'thread->mg' to 'thread->maps', etc.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

20 Nov, 2019

3 commits

  • This patch supports jumping from tui total cycles view to symbol source
    view.

    For example,

    perf record -b ./div
    perf report --total-cycles

    In total cycles view, we can select one entry and press 'a' or press
    ENTER key to jump to symbol source view.

    This patch also sets sort_order to NULL in cmd_report() which will use
    the default branch sort order. The percent value in new annotate view
    will be consistent with the percent in annotate view switched from perf
    report (we observed the original percent gap with previous patches).

    v2:
    ---
    Fix the 'make NO_SLANG=1' error. (set __maybe_unused to
    annotation_opts in block_hists_tui_browse()).

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • And take it into account when looking up DSOs when we have the dso_id
    fields obtained from somewhere, like from PERF_RECORD_MMAP2 records.

    Instances of struct map pointing to the same DSO pathname but with
    anything in dso_id different are in fact different DSOs, so better have
    different 'struct dso' instances to reflect that. At some point we may
    want to get copies of the contents of the different objects if we want
    to do correct annotation or other analysis.
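
    A sketch of a lookup key that combines the pathname with the id fields.
    The struct layout is modelled on the dso_id fields mentioned above (device
    major/minor, inode, inode generation) and is an assumption, as are the
    comparator names.

    ```c
    #include <stdint.h>
    #include <string.h>

    struct dso_id {
    	uint32_t maj;
    	uint32_t min;
    	uint64_t ino;
    	uint64_t ino_generation;
    };

    struct dso_key {
    	const char *name;
    	struct dso_id id;
    };

    static int dso_id_cmp(const struct dso_id *a, const struct dso_id *b)
    {
    	if (a->maj != b->maj)
    		return a->maj < b->maj ? -1 : 1;
    	if (a->min != b->min)
    		return a->min < b->min ? -1 : 1;
    	if (a->ino != b->ino)
    		return a->ino < b->ino ? -1 : 1;
    	if (a->ino_generation != b->ino_generation)
    		return a->ino_generation < b->ino_generation ? -1 : 1;
    	return 0;
    }

    /* Same pathname with any differing id field compares as a different DSO. */
    static int dso_key_cmp(const struct dso_key *a, const struct dso_key *b)
    {
    	int rc = strcmp(a->name, b->name);

    	return rc ? rc : dso_id_cmp(&a->id, &b->id);
    }
    ```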

    With this we get 'struct map' 24 bytes leaner:

    $ pahole -C map ~/bin/perf
    struct map {
    union {
    struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
    struct list_head node; /* 0 16 */
    } __attribute__((__aligned__(8))); /* 0 24 */
    u64 start; /* 24 8 */
    u64 end; /* 32 8 */
    _Bool erange_warned:1; /* 40: 0 1 */
    _Bool priv:1; /* 40: 1 1 */

    /* XXX 6 bits hole, try to pack */
    /* XXX 3 bytes hole, try to pack */

    u32 prot; /* 44 4 */
    u64 pgoff; /* 48 8 */
    u64 reloc; /* 56 8 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    u64 (*map_ip)(struct map *, u64); /* 64 8 */
    u64 (*unmap_ip)(struct map *, u64); /* 72 8 */
    struct dso * dso; /* 80 8 */
    refcount_t refcnt; /* 88 4 */
    u32 flags; /* 92 4 */

    /* size: 96, cachelines: 2, members: 13 */
    /* sum members: 92, holes: 1, sum holes: 3 */
    /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
    /* forced alignments: 1 */
    /* last cacheline: 32 bytes */
    } __attribute__((__aligned__(8)));
    $

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • And this patch highlights where these fields are being used: in the sort
    order where it uses it to compare maps and classify samples taking into
    account not just the DSO, but those DSO id fields.

    I think these should be used to differentiate DSOs with the same name
    but different 'struct dso_id' fields, i.e. these fields should move to
    'struct dso' and then be used as part of the key when doing lookups for
    DSOs, in addition to the DSO name.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

12 Nov, 2019

1 commit


07 Nov, 2019

5 commits

  • The previous patch implemented a new option "--total-cycles", but only
    stdio mode was supported.

    This patch adds support for tui mode and for '--percent-limit'.

    For example,

    perf record -b ./div
    perf report --total-cycles --percent-limit 1

    # Samples: 2753248 of event 'cycles'
    Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
    15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
    5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
    4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
    4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
    3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
    3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
    3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
    2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
    2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
    2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
    2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
    1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
    1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
    1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so

    --------------------------------------------------

    v7:
    ---
    1. Since we have used use_browser in report__browse_block_hists
    to support stdio mode, now we also add support for tui.

    2. Move block tui browser code from ui/browsers/hists.c
    to block-info.c.

    v6:
    ---
    Create report__tui_browse_block_hists in block-info.c
    (codes are moved from builtin-report.c).

    v5:
    ---
    Fix a crash issue when running perf report without
    '--total-cycles'. The issue is because the internal flag
    is renamed from 'total_cycles' to 'total_cycles_mode' in
    previous patch but this patch still uses 'total_cycles'
    to check if the '--total-cycles' option is enabled, which
    causes the code to be inconsistent.

    v4:
    ---
    Since the block collection is moved out of printing in
    the previous patch, this patch is updated accordingly for
    tui support.

    v3:
    ---
    Minor change since the function name is changed:
    block_total_cycles_percent -> block_info__total_cycles_percent

    Signed-off-by: Jin Yao
    Reviewed-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • The previous patch added support for the '--total-cycles' option.
    It's also useful to show only entries above a threshold percent.

    This patch enables '--percent-limit' to hide entries
    under that percent.
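
    The filtering itself can be sketched as below, assuming each entry carries
    its 'Sampled Cycles%' as a double. The struct and function names are
    hypothetical.

    ```c
    #include <stddef.h>

    struct block_entry {
    	double cycles_pct;	/* 'Sampled Cycles%' of this block */
    };

    /* Compact the array in place, dropping entries under the limit. */
    static size_t filter_by_percent(struct block_entry *entries, size_t n,
    				double limit)
    {
    	size_t i, kept = 0;

    	for (i = 0; i < n; i++)
    		if (entries[i].cycles_pct >= limit)
    			entries[kept++] = entries[i];
    	return kept;
    }
    ```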

    For example:

    perf report --total-cycles --stdio --percent-limit 1

    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 2M of event 'cycles'
    # Event count (approx.): 2753248
    #
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    # ............... .............. ........... .......... ................................................................. ....................
    #
    26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
    15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
    5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
    4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
    4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
    3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
    3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
    3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
    2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
    2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
    2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
    2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
    1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
    1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
    1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so

    Committer testing:

    From the second example onwards, slightly edited for brevity:

    # perf report --total-cycles --percent-limit 2 --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 6M of event 'cycles'
    # Event count (approx.): 6299936
    #
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    # ............... .............. ........... .......... ...................................................................... ....................
    #
    2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
    #
    # (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
    #
    # perf report --total-cycles --percent-limit 1 --stdio
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
    1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
    #
    # perf report --total-cycles --percent-limit 0.7 --stdio
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
    1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
    0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
    #

    -------------------------------------------

    It only shows the entries whose 'Sampled Cycles%' is > 1%.
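
    The filtering behaviour can be sketched in a few lines of C. This is
    only an illustration, not the code from the patch: 'struct block_row',
    row_passes_limit() and count_visible() are hypothetical names, and the
    sketch assumes a row is hidden when its percent is below the limit.

```c
#include <stddef.h>

/* Hypothetical row of the --total-cycles report (not a perf structure). */
struct block_row {
	double sampled_cycles_pct;	/* 'Sampled Cycles%' column */
	const char *range;		/* '[Program Block Range]' column */
};

/* Assumption: a row is shown unless its percent is below the limit. */
int row_passes_limit(const struct block_row *row, double percent_limit)
{
	return row->sampled_cycles_pct >= percent_limit;
}

/* Count how many rows would remain visible for a given limit. */
size_t count_visible(const struct block_row *rows, size_t nr,
		     double percent_limit)
{
	size_t i, visible = 0;

	for (i = 0; i < nr; i++)
		if (row_passes_limit(&rows[i], percent_limit))
			visible++;

	return visible;
}
```

    Fed with the percentages from the committer testing output (2.17,
    1.75, 0.72, ...), limits of 2, 1 and 0.7 would leave one, two and
    three rows respectively.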

    v7:
    ---
    No functional change. Only fix the conflict issue because
    previous patches are changed.

    v6:
    ---
    No functional change. Only fix the conflict issue because
    previous patches are changed.

    v5:
    ---
    No functional change. Only fix the conflict issue because
    previous patches are changed.

    v4:
    ---
    No functional change. Only fix the build issue because
    previous patches are changed.

    Signed-off-by: Jin Yao
    Reviewed-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • It would be useful to support sorting for all blocks by the sampled
    cycles percent per block. This is useful to concentrate on the globally
    hottest blocks.

    This patch implements a new option "--total-cycles" which sorts all
    blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:

    percent = block sampled cycles aggregation / total sampled cycles

    Note that this patch only supports "--stdio" mode.
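
    As a rough illustration of the formula above, the sketch below
    computes each block's share of the total sampled cycles and sorts the
    blocks by that share in descending order. The struct and function
    names are hypothetical and are not taken from perf's block-info.c.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-block aggregate; not perf's own block structure. */
struct block_stat {
	uint64_t sampled_cycles;	/* cycles aggregated for this block */
	double pct;			/* filled in by compute_percent() */
};

/* percent = block sampled cycles aggregation / total sampled cycles */
void compute_percent(struct block_stat *blocks, size_t nr,
		     uint64_t total_cycles)
{
	size_t i;

	for (i = 0; i < nr; i++)
		blocks[i].pct = total_cycles ?
			100.0 * (double)blocks[i].sampled_cycles /
				(double)total_cycles : 0.0;
}

/* qsort() comparator: larger 'Sampled Cycles%' first. */
int cmp_pct_desc(const void *a, const void *b)
{
	const struct block_stat *l = a, *r = b;

	if (l->pct < r->pct)
		return 1;
	if (l->pct > r->pct)
		return -1;
	return 0;
}

/* Order all blocks globally, hottest first. */
void sort_by_total_cycles(struct block_stat *blocks, size_t nr)
{
	qsort(blocks, nr, sizeof(*blocks), cmp_pct_desc);
}
```

    Sorting on the precomputed percent rather than on raw cycle counts
    keeps the comparator tied to the column that is actually printed.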

    For example,

    # perf record -b ./div
    # perf report --total-cycles --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Total Lost Samples: 0
    #
    # Samples: 2M of event 'cycles'
    # Event count (approx.): 2753248
    #
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    # ............... .............. ........... .......... ................................................ .................
    #
    26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
    15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
    5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
    4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
    4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
    3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
    3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
    3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
    2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
    2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
    2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
    2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
    2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
    1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
    1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
    1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
    0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so
    0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms]
    0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms]
    0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms]
    0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms]
    0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms]
    0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms]
    0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms]
    0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms]
    0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms]
    0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms]
    0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms]

    Committer testing:

    This should provide material for hours of endless joy, both from looking
    for suspicious things in the implementation of this patch, such as the
    top one:

    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]

    As well as from things that look legit:

    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]

    :-)

    Very short system wide taken branches session:

    # perf record -h -b

    Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -b, --branch-any sample any taken branches

    #
    # perf record -b
    ^C[ perf record: Woken up 595 times to write data ]
    [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]

    #
    # perf evlist -v
    cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
    #
    # perf report --total-cycles --stdio
    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Total Lost Samples: 0
    #
    # Samples: 6M of event 'cycles'
    # Event count (approx.): 6299936
    #
    # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
    # ............... .............. ........... .......... ...................................................................... ....................
    #
    2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
    1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
    0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
    0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux]
    0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux]
    0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
    0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux]
    0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux]
    0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux]
    0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux]
    0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux]
    0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux]
    0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
    0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux]
    0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux]
    0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux]
    0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux]
    0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux]
    0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux]
    0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux]
    0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux]
    0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux]
    0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux]
    0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux]
    0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux]
    0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux]
    0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux]

    -----------------------------------------------------------

    v7:
    ---
    Use use_browser in report__browse_block_hists for supporting
    stdio and potential tui mode.

    v6:
    ---
    Create report__browse_block_hists in block-info.c (codes are
    moved from builtin-report.c). It's called from
    perf_evlist__tty_browse_hists.

    v5:
    ---
    1. Move all block functions to block-info.c

    2. Move the code of setting ms in block hist_entry to
    other patch.

    v4:
    ---
    1. Use new option '--total-cycles' to replace
    '-s total_cycles' in v3.

    2. Move block info collection out of block info
    printing.

    v3:
    ---
    1. Use common function block_info__process_sym to
    process the blocks per symbol.

    2. Remove the nasty hack for skipping calculation
    of column length

    3. Some minor cleanup

    Signed-off-by: Jin Yao
    Reviewed-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • We can get the per-sample cycles from hist__account_cycles(). It's also
    useful to know the total cycles of all samples, so that the cycles
    coverage of a single program block can be computed later. For example:

    coverage = per block sampled cycles / total sampled cycles

    This patch adds a new argument 'total_cycles' to hist__account_cycles(),
    to which the cycles of each sample are added.
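
    The accumulation described above might look like the following
    sketch. account_sample_cycles() and block_coverage() are hypothetical
    stand-ins; the real hist__account_cycles() takes different arguments
    that are not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sum the cycles of one sample's branch entries and
 * add them to a running total owned by the caller.
 */
uint64_t account_sample_cycles(const uint64_t *branch_cycles, size_t nr,
			       uint64_t *total_cycles)
{
	uint64_t sample_cycles = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		sample_cycles += branch_cycles[i];

	/* Callers that do not need the total may pass NULL. */
	if (total_cycles)
		*total_cycles += sample_cycles;

	return sample_cycles;
}

/* coverage = per block sampled cycles / total sampled cycles */
double block_coverage(uint64_t block_cycles, uint64_t total_cycles)
{
	return total_cycles ? (double)block_cycles / (double)total_cycles : 0.0;
}
```

    Passing the total as an output pointer lets existing callers opt out,
    which is one plausible reason for extending the argument list rather
    than adding a global counter.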

    Signed-off-by: Jin Yao
    Reviewed-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     
  • To reduce boilerplate, provide a more compact form using an idiom
    present in other trees of data structures.

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2019

1 commit

  • We received a user report that call-graph DWARF mode was enabled in
    'perf record' but 'perf report' didn't unwind the callstack correctly.
    The reason was that libunwind was not compiled in.

    We can use 'perf -vv' to check the compiled-in libraries, but it would
    be valuable to report a warning to the user directly (especially
    valuable for a perf newbie).

    The warning is:

    Warning:
    Please install libunwind development packages during the perf build.

    Both TUI and stdio are supported.

    Signed-off-by: Jin Yao
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191011022122.26369-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

21 Sep, 2019

1 commit

  • This patch makes the perf_session__new() function return an error
    code on failure instead of NULL.

    Test Results:

    Before Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    0
    $

    After Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    254
    $

    Committer notes:

    Fix the 'perf tests topology' case, where we use TEST_ASSERT_VAL(...,
    session), i.e. we need to pass zero in case of failure, which was the
    case before when NULL was returned by perf_session__new() for failure,
    but now we need to negate the result of IS_ERR(session) to respect the
    TEST_ASSERT_VAL() expectation of zero meaning failure.
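
    The error-pointer convention behind this change can be sketched in
    plain userspace C. The helpers below mimic the kernel-style
    ERR_PTR()/IS_ERR()/PTR_ERR() idiom under hypothetical names
    (err_ptr(), is_err(), ptr_err()), and session_new() is an
    illustrative stand-in, not perf_session__new() itself. The sketch
    assumes Linux errno values, where ENOENT is 2.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed upper bound on errno values, as in the kernel idiom. */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in a pointer value. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno carried by an error pointer. */
long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the last MAX_ERRNO_SKETCH addresses. */
int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO_SKETCH);
}

struct session {
	int fd;
};

/* Hypothetical constructor: returns an encoded -errno instead of NULL. */
struct session *session_new(const char *path)
{
	struct session *s;

	if (!path || !*path)
		return err_ptr(-ENOENT);

	s = malloc(sizeof(*s));
	if (!s)
		return err_ptr(-ENOMEM);

	s->fd = -1;
	return s;
}

/* A shell only sees the low 8 bits of the value a command returns. */
int shell_status(long err)
{
	return (int)(err & 0xff);
}
```

    With ENOENT equal to 2, shell_status(-ENOENT) is 254, which matches
    the 'echo $?' output after the fix. A test that previously relied on
    a NULL check has to use the negated is_err() result, as the committer
    notes describe for TEST_ASSERT_VAL().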

    Reported-by: Nageswara R Sastry
    Signed-off-by: Mamatha Inamdar
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Nageswara R Sastry
    Acked-by: Ravi Bangoria
    Reviewed-by: Jiri Olsa
    Reviewed-by: Mukesh Ojha
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Greg Kroah-Hartman
    Cc: Jeremie Galarneau
    Cc: Kate Stewart
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shawn Landden
    Cc: Song Liu
    Cc: Thomas Gleixner
    Cc: Tzvetomir Stoyanov
    Link: http://lore.kernel.org/lkml/20190822071223.17892.45782.stgit@localhost.localdomain
    Signed-off-by: Arnaldo Carvalho de Melo

    Mamatha Inamdar
     

20 Sep, 2019

1 commit


01 Sep, 2019

1 commit