Eric Lee / smarc-fsl-linux-kernel

01 Sep, 2020

1 commit

977f739b7 perf report: Disable ordered_events for raw dump

Disable ordered_events for report raw dump, because for raw dump we want
to see events as they are stored in the perf.data file, not sorted by
time.

Signed-off-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Ian Rogers
Cc: Michael Petlan
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200827134830.126721-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2020-09-01 23:20:25 +0800
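
To illustrate the distinction this commit draws between file order and time order, here is a minimal sketch. The `toy_event` struct and the `collect_timestamps()` helper are hypothetical and are not part of perf; they only model the idea that a raw dump skips the sort-by-time step.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an event record; not a perf structure. */
struct toy_event {
	unsigned long long time;
};

/* qsort() comparator: ascending by timestamp. */
static int cmp_time(const void *a, const void *b)
{
	const struct toy_event *ea = a, *eb = b;

	if (ea->time < eb->time)
		return -1;
	return ea->time > eb->time;
}

/*
 * Copy the timestamps of 'n' events into 'out' in delivery order.
 * When 'ordered' is zero (the raw-dump case) the stored order is kept;
 * otherwise a private copy is sorted by time first.
 * Returns 0 on success, -1 on allocation failure.
 */
int collect_timestamps(const struct toy_event *events, size_t n, int ordered,
		       unsigned long long *out)
{
	struct toy_event *copy;
	size_t i;

	if (n == 0)
		return 0;

	copy = malloc(n * sizeof(*copy));
	if (!copy)
		return -1;
	memcpy(copy, events, n * sizeof(*copy));

	if (ordered)
		qsort(copy, n, sizeof(*copy), cmp_time);

	for (i = 0; i < n; i++)
		out[i] = copy[i].time;

	free(copy);
	return 0;
}
```

Calling the helper with `ordered` set to 0 returns the timestamps exactly as stored, which is the behaviour the commit message asks for in raw dump mode.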

23 Jun, 2020

2 commits

92c7d7cdf perf evlist: Fix the class prefix for 'struct evlist' branch_type methods

To differentiate from libperf's 'struct perf_evlist' methods.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-06-23 03:28:10 +0800
b3c2cc2bd perf evlist: Fix the class prefix for 'struct evlist' sample_type methods

To differentiate from libperf's 'struct perf_evlist' methods.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-06-23 03:28:09 +0800

09 Jun, 2020

1 commit

11b6e5482 perf report: Fix NULL pointer dereference in hists__fprintf_nr_sample_events()

The 'evname' variable can be NULL, as it is checked a few lines back,
so check it before using it.

Fixes: 9e207ddfa207 ("perf report: Show call graph from reference events")
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/
Signed-off-by: Gaurav Singh

Gaurav Singh
2020-06-09 23:40:04 +0800
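
The guard described in this commit can be sketched as follows. This is a simplified illustration, not the code from hists__fprintf_nr_sample_events(); the `format_samples_header()` function and its output format are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical header formatter; only the NULL guard mirrors the fix.
 * Returns the total number of characters the header needs, in the same
 * way snprintf() reports it.
 */
int format_samples_header(char *buf, size_t size, unsigned long nr_samples,
			  const char *evname)
{
	int ret = snprintf(buf, size, "# Samples: %lu", nr_samples);

	if (ret < 0 || (size_t)ret >= size)
		return ret;

	/* evname may legitimately be NULL, so test it before dereferencing. */
	if (evname != NULL)
		ret += snprintf(buf + ret, size - (size_t)ret,
				" of event '%s'", evname);

	return ret;
}
```

With a NULL `evname` the event suffix is simply left out instead of being passed to a `%s` conversion.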

02 Jun, 2020

1 commit

3e9b26dc2 perf tools: Remove some duplicated includes

There are some duplicated includes in tools/perf; remove them.

Signed-off-by: Tiezhu Yang
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: xuefeng li
Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo

Tiezhu Yang
2020-06-02 22:09:41 +0800

28 May, 2020

2 commits

0d71a2b24 perf callchain: Setup callchain properly in pipe mode

Callchains are automatically initialized by checking on event's
sample_type. For pipe mode we need to put this check into attr event
code.

Move the callchain setup code into the callchain_param_setup() function
and call it from the attr event processing code.

This enables pipe output having callchains, like:

# perf record -g -e 'raw_syscalls:sys_enter' true | perf script
# perf record -g -e 'raw_syscalls:sys_enter' true | perf report

Committer notes:

We still need the next patch for the above output to work.

Reported-by: Paul Khuong
Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Ian Rogers
Cc: Michael Petlan
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200507095024.2789147-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2020-05-28 21:03:25 +0800
10c513f79 perf evsel: Rename perf_evsel__resort*() to evsel__resort*()

As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-05-28 21:03:24 +0800

06 May, 2020

4 commits

c754c382c perf evsel: Rename perf_evsel__is_*() to evsel__is*()

As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-05-06 03:35:31 +0800
347c751a6 perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()

As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-05-06 03:35:30 +0800
8ab2e96d8 perf evsel: Rename *perf_evsel__*name() to *evsel__*name()

As they are 'struct evsel' methods or related routines, not part of
tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2020-05-06 03:35:30 +0800
ec90e42ce perf auxtrace: Add option to synthesize branch stack for regular events

There is an existing option to synthesize branch stacks for synthesized
events. Add a new option to synthesize branch stacks for regular events.

Signed-off-by: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Link: http://lore.kernel.org/lkml/20200429150751.12570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter
2020-05-06 03:35:29 +0800

30 Apr, 2020

1 commit

6fa9c3e77 perf report: Fix warning assignment of 0/1 to bool variable

Fixes coccicheck warning:

tools/perf/builtin-report.c:1403:2-34: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot
Signed-off-by: Zou Wei
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/1587904683-3510-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Zou Wei
2020-04-30 21:48:33 +0800

18 Apr, 2020

1 commit

b1d1429b1 perf report: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack can
break the HW limitation. However, it may reconstruct invalid call stacks
in some cases, e.g. exception handling such as setjmp/longjmp. Also, it
may impact the processing time, especially when the number of samples
with stitched LBRs is huge.

Add an option to enable the approach.

# To display the perf.data header info, please use
# --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cycles'
# Event count (approx.): 6492797701
#
# Children Self Command Shared Object Symbol
# ........ ........ ............... ..................
# .................................
#
99.99% 99.99% tchain_edit tchain_edit [.] f43
|
---main
f1
f2
f3
f4
f5
f6
f7
f8
f9
f10
f11
f12
f13
f14
f15
f16
f17
f18
f19
f20
f21
f22
f23
f24
f25
f26
f27
f28
f29
f30
f31
|
--99.65%--f32
f33
f34
f35
f36
f37
f38
f39
f40
f41
f42
f43

Committer testing:

$ perf record --call-graph lbr /wb/tchain_edit
[ perf record: Woken up 23 times to write data ]
[ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ]
$ perf report --header-only | egrep 'cpu(desc|.*capabilities)'
# cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
$

Before:

$ perf report --no-children --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cycles:u'
# Event count (approx.): 6459523879
#
# Overhead Command Shared Object Symbol
# ........ ........... ................ .......................
#
99.95% tchain_edit tchain_edit [.] f43
|
--99.92%--f43
f42
f41
f40
f39
f38
f37
f36
f35
f34
f33
f32
f31
f30
f29
f28
f27
f26
f25
f24
f23
f22
f21
f20
f19
f18
f17
f16
f15
f14
f13
f12
f11

0.03% tchain_edit tchain_edit [.] f42
0.01% tchain_edit tchain_edit [.] f41
0.00% tchain_edit tchain_edit [.] f31
0.00% tchain_edit ld-2.29.so [.] _dl_relocate_object
0.00% tchain_edit ld-2.29.so [.] memmove
0.00% tchain_edit [unknown] [k] 0xffffffff93a00b17

After:

$ perf report --stitch-lbr --no-children --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cycles:u'
# Event count (approx.): 6459496645
#
# Overhead Command Shared Object Symbol
# ........ ........... ................ ........................
#
99.97% tchain_edit tchain_edit [.] f43
|
--99.93%--f43
f42
f41
f40
f39
f38
f37
f36
f35
f34
f33
f32
f31
f30
f29
f28
f27
f26
f25
f24
f23
f22
f21
f20
f19
f18
f17
f16
f15
f14
f13
f12
f11
f10
f9
f8
f7
f6
f5
f4
f3
f2
f1
main
__libc_start_main

0.02% tchain_edit [unknown] [k] 0xffffffff93a00b17
0.01% tchain_edit tchain_edit [.] f31
0.00% tchain_edit ld-2.29.so [.] _dl_important_hwcaps

Signed-off-by: Kan Liang
Reviewed-by: Andi Kleen
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Adrian Hunter
Cc: Alexey Budankov
Cc: Mathieu Poirier
Cc: Michael Ellerman
Cc: Namhyung Kim
Cc: Pavel Gerasimov
Cc: Peter Zijlstra
Cc: Ravi Bangoria
Cc: Stephane Eranian
Cc: Vitaly Slobodskoy
Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Kan Liang
2020-04-18 20:05:01 +0800

16 Apr, 2020

1 commit

1c5c25b3f perf auxtrace: Add an option to synthesize callchains for regular events

Currently, callchains can be synthesized only for synthesized events. Add
an itrace option to synthesize callchains for regular events.

Signed-off-by: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Link: http://lore.kernel.org/lkml/20200401101613.6201-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter
2020-04-16 23:19:15 +0800

03 Apr, 2020

1 commit

ba78c1c54 perf tools: Basic support for CGROUP event

Implement basic functionality to support cgroup tracking. Each cgroup
can be identified by inode number which can be read from userspace too.
The actual cgroup processing will come in the later patch.

Reported-by: kernel test robot
Signed-off-by: Namhyung Kim
Cc: Adrian Hunter
[ fix perf test failure on sampling parsing ]
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Mark Rutland
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2020-04-03 20:37:55 +0800

24 Mar, 2020

2 commits

5e3b810aa perf report: Support a new key to reload the browser

Sometimes we may need to reload the browser to update the output because
some options have changed.

This patch creates a new key, K_RELOAD. Once __cmd_report() returns
K_RELOAD, the whole process is repeated: the samples are read from the
data file, the data is sorted and the result is displayed in the browser.

v5:
---
1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
2. Skip setup_sorting() in repeat path if last key is K_RELOAD.

v4:
---
Need to quit in perf_evsel_menu__run if key is K_RELOAD.

Signed-off-by: Jin Yao
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2020-03-24 20:37:27 +0800
429a5f9d8 perf report: Allow specifying event to be used as sort key in --group output

When performing "perf report --group", it shows the event group
information together. By default, the output is sorted by the first
event in group.

It would be nice for the user to be able to select any event for sorting.
This patch introduces a new option "--group-sort-idx" to sort the output
by the event at index n in the event group.

For example,

Before:

# perf report --group --stdio

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
# Event count (approx.): 6451235635
#
# Overhead Command Shared Object Symbol
# ................................ ......... ....................... ...................................
#
92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1
3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515
1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7
1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce
1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair
0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single
0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle
0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check
0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt
...

After:

# perf report --group --stdio --group-sort-idx 3

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
# Event count (approx.): 6451235635
#
# Overhead Command Shared Object Symbol
# ................................ ......... ....................... ...................................
#
92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1
0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle
3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515
0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns
1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7
0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single
0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr
0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt
0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write
0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se
0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] scheduler_tick

Now the output is sorted by the fourth event in group.

v7:
---
Rebase to latest perf/core, no other change.

v4:
---
1. Update Documentation/perf-report.txt to mention
'--group-sort-idx' support multiple groups with different
amount of events and it should be used on grouped events.

2. Update __hpp__group_sort_idx(), just return when the
idx is out of limit.

3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups.
So now we don't need to use together with --group.

v3:
---
Refine the code in __hpp__group_sort_idx().

Before:
for (i = 1; i < nr_members; i++) {
if (i == idx) {
ret = field_cmp(fields_a[i], fields_b[i]);
if (ret)
goto out;
}
}

After:
if (idx >= 1 && idx < nr_members) {
ret = field_cmp(fields_a[idx], fields_b[idx]);
if (ret)
goto out;
}

Signed-off-by: Jin Yao
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
[ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2020-03-24 20:37:27 +0800
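
The effect of "--group-sort-idx" on row ordering can be sketched with a small comparator. This is a simplified model under stated assumptions, not the __hpp__group_sort_idx() implementation: `toy_row`, `sort_rows()` and the fixed group size are hypothetical, and the tie-breaking rule (remaining members in their natural order) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_MEMBERS 4	/* events per group in this toy example */

/* Hypothetical report row: one overhead percentage per group member. */
struct toy_row {
	const char *symbol;
	double percent[NR_MEMBERS];
};

/* Index of the group member used as the primary sort key. */
static int group_sort_idx;

/* Descending comparison of two percentages. */
static int cmp_desc(double a, double b)
{
	if (a > b)
		return -1;
	return a < b;
}

static int cmp_rows(const void *pa, const void *pb)
{
	const struct toy_row *a = pa, *b = pb;
	int i, ret;

	/* Primary key: the selected member, when the index is in range. */
	if (group_sort_idx >= 0 && group_sort_idx < NR_MEMBERS) {
		ret = cmp_desc(a->percent[group_sort_idx],
			       b->percent[group_sort_idx]);
		if (ret)
			return ret;
	}

	/* Assumed tie-break: remaining members in their natural order. */
	for (i = 0; i < NR_MEMBERS; i++) {
		ret = cmp_desc(a->percent[i], b->percent[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Sort rows in place by the event at index 'idx' in the group. */
void sort_rows(struct toy_row *rows, size_t n, int idx)
{
	group_sort_idx = idx;
	qsort(rows, n, sizeof(*rows), cmp_rows);
}
```

Using three rows taken from the output above, sorting by index 0 puts the gsd-color entry ahead of intel_idle, while sorting by index 3 moves intel_idle (6.08%) ahead of it, which matches the Before and After listings.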

18 Mar, 2020

1 commit

c3b10649a perf report: Fix no branch type statistics report issue

Previously we could get the report of branch type statistics.

For example:

# perf record -j any,save_type ...
# perf report --stdio

#
# Branch Statistics:
#
COND_FWD: 40.6%
COND_BWD: 4.1%
CROSS_4K: 24.7%
CROSS_2M: 12.3%
COND: 44.7%
UNCOND: 0.0%
IND: 6.1%
CALL: 24.5%
RET: 24.7%

With recent perf, however, the branch type statistics are no longer reported.

It's a regression issue caused by commit 40c39e304641 ("perf report: Fix
a no annotate browser displayed issue"), which only counts the branch
type statistics for browser mode.

This patch moves branch_type_count() outside of the ui__has_annotation()
check, so that branch type statistics work in stdio mode again.

Fixes: 40c39e304641 ("perf report: Fix a no annotate browser displayed issue")
Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2020-03-18 05:01:40 +0800

10 Mar, 2020

1 commit

cca0cc76f perf block-info: Allow selecting which columns to report and its order

Currently we use a predefined array to set the block info output
formats, it's fixed and inflexible.

This patch adds two parameters "block_hpps" and "nr_hpps" in
block_info__create_report and other static functions, in order to let
user decide which columns to report and with specified report ordering.
It should be more flexible.

Buffers will be allocated to contain the new fmts, of course, we need to
release them before perf exits.

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2020-03-10 08:43:25 +0800

27 Feb, 2020

1 commit

7384083ba perf annotate: Make perf config effective

perf default config set by user in [annotate] section is totally ignored
by annotate code. Fix it.

Before:

$ ./perf config
annotate.hide_src_code=true
annotate.show_nr_jumps=true
annotate.show_nr_samples=true

$ ./perf annotate shash
│ unsigned h = 0;
│ movl $0x0,-0xc(%rbp)
│ while (*s)
│ ↓ jmp 44
│ h = 65599 * h + *s++;
11.33 │24: mov -0xc(%rbp),%eax
43.50 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax

After:

│ movl $0x0,-0xc(%rbp)
│ ↓ jmp 44
1 │1 24: mov -0xc(%rbp),%eax
4 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax

Note that we have removed show_nr_samples and show_total_period from
annotation_options because they are not used. Instead of them we use
symbol_conf.show_nr_samples and symbol_conf.show_total_period.

Committer testing:

Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:

# perf config
#
# perf config annotate.hide_src_code=true
# perf config
annotate.hide_src_code=true
#
# perf config annotate.show_nr_jumps=true
# perf config annotate.show_nr_samples=true
# perf config
annotate.hide_src_code=true
annotate.show_nr_jumps=true
annotate.show_nr_samples=true
#
#

Before:

# perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Percent
00000000000609f0 :
endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
100.00 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1b: xor %eax,%eax
pop %rbp
← retq
nop
20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Samples endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
1 1 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1 1b: xor %eax,%eax
pop %rbp
← retq
nop
1 20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap /dev/null
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Samples endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
1 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1b: xor %eax,%eax
pop %rbp
← retq
nop
20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap
Tested-by: Arnaldo Carvalho de Melo
Cc: Adrian Hunter
Cc: Alexey Budankov
Cc: Changbin Du
Cc: Ian Rogers
Cc: Jin Yao
Cc: Jiri Olsa
Cc: Leo Yan
Cc: Namhyung Kim
Cc: Song Liu
Cc: Taeung Song
Cc: Thomas Richter
Cc: Yisheng Xie
Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

Ravi Bangoria
2020-02-27 21:44:59 +0800

14 Jan, 2020

3 commits

c3314a74f perf report: Fix no libunwind compiled warning break s390 issue

Commit 800d3f561659 ("perf report: Add warning when libunwind not
compiled in") breaks the s390 platform. S390 uses libdw-dwarf-unwind for
call chain unwinding and has no support for libunwind.

So the warning "Please install libunwind development packages during the
perf build." caused confusion even though the call graph is displayed
correctly.

This patch adds checking for HAVE_DWARF_SUPPORT, which is set when
libdw-dwarf-unwind is compiled in.

Fixes: 800d3f561659 ("perf report: Add warning when libunwind not compiled in")
Signed-off-by: Jin Yao
Reviewed-by: Thomas Richter
Tested-by: Thomas Richter
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200107191745.18415-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2020-01-14 23:02:19 +0800
3b0b16bf8 perf tools: Support --prefix/--prefix-strip

The objdump utility has useful --prefix / --prefix-strip options to
allow changing source code file names hardcoded into executables' debug
info. Add options to 'perf report', 'perf top' and 'perf annotate',
which are then passed to objdump.

$ mkdir foo
$ echo 'main() { for (;;); }' > foo/foo.c
$ gcc -g foo/foo.c
foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
1 | main() { for (;;); }
| ^~~~
$ perf record ./a.out
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ]
$ mv foo bar
$ perf annotate

$ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5

Signed-off-by: Andi Kleen
Tested-by: Jiri Olsa
LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2020-01-14 23:02:19 +0800
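
The path rewriting that the two objdump options perform can be sketched roughly as below. This models the documented behaviour (strip a number of leading directory names from an absolute path, then prepend the prefix) and is not objdump's or perf's actual code. The `remap_source_path()` helper is hypothetical, and the original source path used in the usage note is an assumption, since the commit message only shows the new prefix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite an absolute source path by dropping
 * 'strip' leading directory names and prepending 'prefix'.
 * Relative paths are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
int remap_source_path(const char *path, const char *prefix, int strip,
		      char *out, size_t size)
{
	const char *rest = path;
	int level, n;

	if (path[0] != '/') {
		n = snprintf(out, size, "%s", path);
		return (n < 0 || (size_t)n >= size) ? -1 : 0;
	}

	/* Skip one leading directory name per level, stopping at the last '/'. */
	for (level = strip; level > 0; level--) {
		const char *next = strchr(rest + 1, '/');

		if (!next)
			break;
		rest = next;
	}

	n = snprintf(out, size, "%s%s", prefix, rest);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Assuming the binary recorded `/home/ak/lsrc/git/foo/foo.c`, stripping 5 directory names leaves `/foo.c`, and the prefix `/home/ak/lsrc/git/bar` then yields the path of the file after the `mv foo bar` step in the example.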
aa9d1f833 perf report: Clarify in help that --children is default

Refer to --no-children, which is what most people probably want.

Signed-off-by: Andi Kleen
Cc: Jiri Olsa
LPU-Reference: 20200103183643.149150-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo

Andi Kleen
2020-01-14 23:02:19 +0800

21 Dec, 2019

1 commit

0feba17bd perf report: Fix incorrectly added dimensions as switch perf data file

We observed an issue where some extra columns were displayed after switching
the perf data file in the browser. The steps to reproduce:

1. perf record -a -e cycles,instructions -- sleep 3
2. perf report --group
3. In browser, we use hotkey 's' to switch to another perf.data
4. Now in browser, the extra columns 'Self' and 'Children' are displayed.

The issue is that setup_sorting() is executed again on the repeat path, so
dimensions are added again.

This patch checks the last key returned from __cmd_report(). If it is
K_SWITCH_INPUT_DATA, setup_sorting() is skipped.

Fixes: ad0de0971b7f ("perf report: Enable the runtime switching of perf data file")
Signed-off-by: Jin Yao
Tested-by: Arnaldo Carvalho de Melo
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Feng Tang
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191220013722.20592-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-12-21 05:49:27 +0800
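
The repeat-path problem and its fix can be modelled with a toy loop. Everything here is a hypothetical stand-in: `toy_key`, `toy_setup_sorting()` and `run_report_loop()` are not perf functions, and the "two dimensions per setup" figure is only chosen to echo the extra 'Self' and 'Children' columns.

```c
/* Hypothetical stand-ins for the keys returned by one report pass. */
enum toy_key { KEY_QUIT, KEY_SWITCH_INPUT_DATA };

static int nr_dimensions;

/* Pretend each setup call registers two sort dimensions. */
static void toy_setup_sorting(void)
{
	nr_dimensions += 2;
}

/*
 * Replay a scripted sequence of keys, as if each one were returned by
 * a report pass, and return how many dimensions exist at the end.
 * With 'guard' set, sorting is not set up again after a data switch.
 */
int run_report_loop(const enum toy_key *keys, int nr_keys, int guard)
{
	enum toy_key last = KEY_QUIT;
	int i;

	nr_dimensions = 0;
	for (i = 0; i < nr_keys; i++) {
		if (!guard || last != KEY_SWITCH_INPUT_DATA)
			toy_setup_sorting();

		last = keys[i];
		if (last != KEY_SWITCH_INPUT_DATA)
			break;
	}
	return nr_dimensions;
}
```

Without the guard, every switch to another data file adds the dimensions again; with it, the count stays at whatever the first pass set up.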

04 Dec, 2019

1 commit

bb30acae4 perf report: Bail out --mem-mode if mem info is not available

If perf.data is recorded without -d, don't allow user to use --mem-mode
with 'perf report'. symbol_daddr and phys_daddr can be recorded
separately and may be present in the perf.data but at the report time
they are associated with mem-mode fields and thus this restriction
applies to them as well.

Before:
$ perf record ls
$ perf report --mem-mode --stdio
# Overhead Local Weight Memory access Symbol
# ........ ............ ............. .......................
55.56% 0 N/A [k] 0xffffffff81a00ae7

After:
$ perf report --mem-mode --stdio
Error:
Selected --mem-mode but no mem data. Did you call perf record without -d?

Suggested-by: Arnaldo Carvalho de Melo
Signed-off-by: Ravi Bangoria
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Mark Rutland
Cc: Namhyung Kim
Link: http://lore.kernel.org/lkml/20191114132213.5419-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

Ravi Bangoria
2019-12-04 23:34:02 +0800
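
The bail-out logic amounts to a check of the recorded sample flags before the report runs. The sketch below is an assumption-laden illustration: `TOY_SAMPLE_DATA_SRC` is a made-up flag standing in for whatever bit marks memory data in the recorded sample type, and `check_mem_mode()` is a hypothetical helper. Only the error text is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical flag: set when the recording contains memory data. */
#define TOY_SAMPLE_DATA_SRC (1u << 0)

/*
 * Return an error message when --mem-mode was requested but the
 * recording carries no memory data, or NULL when the report may proceed.
 */
const char *check_mem_mode(int mem_mode, unsigned int sample_type)
{
	if (mem_mode && !(sample_type & TOY_SAMPLE_DATA_SRC))
		return "Selected --mem-mode but no mem data. "
		       "Did you call perf record without -d?";

	return NULL;
}
```

Failing early in this way replaces the misleading "N/A" report shown in the Before output with an actionable error.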

26 Nov, 2019

2 commits

fe87797de perf thread: Rename thread->mg to thread->maps

One more step on the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-69vcr8pubpym90skxhmbwhiw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-26 22:07:46 +0800
79b6bb73f perf maps: Merge 'struct maps' with 'struct map_groups'

And pick the shortest name: 'struct maps'.

The split existed because we used to have two groups of maps, one for
functions and one for variables, but that only complicated things,
sometimes we needed to figure out what was at some address and then had
to first try it on the functions group and if that failed, fall back to
the variables one.

That split is long gone, so for quite a while we had only one struct
maps per struct map_groups, simplify things by combining those structs.

First patch is the minimum needed to merge both, follow up patches will
rename 'thread->mg' to 'thread->maps', etc.

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-26 22:07:46 +0800

20 Nov, 2019

3 commits

848a5e507 perf report: Jump to symbol source view from total cycles view

This patch supports jumping from tui total cycles view to symbol source
view.

For example,

perf record -b ./div
perf report --total-cycles

In total cycles view, we can select one entry and press 'a' or press
ENTER key to jump to symbol source view.

This patch also sets sort_order to NULL in cmd_report() which will use
the default branch sort order. The percent value in new annotate view
will be consistent with the percent in annotate view switched from perf
report (we observed the original percent gap with previous patches).

v2:
---
Fix the 'make NO_SLANG=1' error. (set __maybe_unused to
annotation_opts in block_hists_tui_browse()).

Signed-off-by: Jin Yao
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-11-20 06:37:04 +0800
0e3149f86 perf dso: Move dso_id from 'struct map' to 'struct dso'

And take it into account when looking up DSOs when we have the dso_id
fields obtained from somewhere, like from PERF_RECORD_MMAP2 records.

Instances of struct map pointing to the same DSO pathname but with
anything in dso_id different are in fact different DSOs, so better have
different 'struct dso' instances to reflect that. At some point we may
want to get copies of the contents of the different objects if we want
to do correct annotation or other analysis.

With this we get 'struct map' 24 bytes leaner:

$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */

/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

u32 prot; /* 44 4 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 (*map_ip)(struct map *, u64); /* 64 8 */
u64 (*unmap_ip)(struct map *, u64); /* 72 8 */
struct dso * dso; /* 80 8 */
refcount_t refcnt; /* 88 4 */
u32 flags; /* 92 4 */

/* size: 96, cachelines: 2, members: 13 */
/* sum members: 92, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* forced alignments: 1 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
$

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-20 06:12:26 +0800
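
The idea of using the id fields as part of the DSO lookup key can be sketched as follows. The `toy_dso_id` and `toy_dso` structs and the `toy_find_dso()` helper are hypothetical simplifications; only the four field names (maj, min, ino, ino_generation) come from the commit titles, and the linear search is chosen purely for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical id record using the field names from the commit title. */
struct toy_dso_id {
	unsigned int maj, min;
	unsigned long long ino, ino_generation;
};

struct toy_dso {
	const char *name;
	struct toy_dso_id id;
};

static int toy_dso_id_equal(const struct toy_dso_id *a,
			    const struct toy_dso_id *b)
{
	return a->maj == b->maj && a->min == b->min &&
	       a->ino == b->ino && a->ino_generation == b->ino_generation;
}

/*
 * Look a DSO up by pathname and, when 'id' is available, by its id
 * fields as well. A NULL 'id' falls back to matching on the name only.
 */
struct toy_dso *toy_find_dso(struct toy_dso *dsos, size_t n,
			     const char *name, const struct toy_dso_id *id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(dsos[i].name, name) != 0)
			continue;
		if (id == NULL || toy_dso_id_equal(&dsos[i].id, id))
			return &dsos[i];
	}
	return NULL;
}
```

Two entries with the same pathname but different inode numbers are therefore treated as different DSOs, which is the distinction the commit message argues for.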
99459a84d perf map: Move maj/min/ino/ino_generation to separate struct

And this patch highlights where these fields are being used: in the sort
order where it uses it to compare maps and classify samples taking into
account not just the DSO, but those DSO id fields.

I think these should be used to differentiate DSOs with the same name
but different 'struct dso_id' fields, i.e. these fields should move to
'struct dso' and then be used as part of the key when doing lookups for
DSOs, in addition to the DSO name.

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-20 02:09:26 +0800

12 Nov, 2019

1 commit

297548945 perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers

We are already passing things like:

symbol__annotate(ms->sym, ms->map, ...)

So shorten the signature of such functions to receive the 'map_symbol'
pointer.

This also paves the way to having the 'struct map_groups' pointer in the
'struct map_symbol' so that we can get rid of 'struct map'->groups.

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-12 19:20:53 +0800

07 Nov, 2019

5 commits

7fa46cbf2 perf report: Sort by sampled cycles percent per block for tui

The previous patch implemented a new option, "--total-cycles", but only
stdio mode was supported.

This patch adds support for tui mode and for '--percent-limit'.

For example,

perf record -b ./div
perf report --total-cycles --percent-limit 1

# Samples: 2753248 of event 'cycles'
Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so

--------------------------------------------------

v7:
---
1. Since we have used use_browser in report__browse_block_hists
to support stdio mode, now we also add support for tui.

2. Move block tui browser code from ui/browsers/hists.c
to block-info.c.

v6:
---
Create report__tui_browse_block_hists in block-info.c
(codes are moved from builtin-report.c).

v5:
---
Fix a crash issue when running perf report without
'--total-cycles'. The issue is because the internal flag
is renamed from 'total_cycles' to 'total_cycles_mode' in
previous patch but this patch still uses 'total_cycles'
to check if the '--total-cycles' option is enabled, which
causes the code to be inconsistent.

v4:
---
Since the block collection is moved out of printing in the
previous patch, this patch is updated accordingly for
tui support.

v3:
---
Minor change since the function name is changed:
block_total_cycles_percent -> block_info__total_cycles_percent

Signed-off-by: Jin Yao
Reviewed-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-11-07 21:14:48 +0800
0b49f8365 perf report: Support --percent-limit for --total-cycles ... Browse Code »

The '--total-cycles' option is already supported by a previous patch.
It is also useful to show only the entries above a threshold percent.

This patch enables '--percent-limit' to hide entries
under that percent.

For example:

perf report --total-cycles --stdio --percent-limit 1

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ................................................................. ....................
#
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so

Committer testing:

From the second example onwards, slightly edited for brevity:

# perf report --total-cycles --percent-limit 2 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ...................................................................... ....................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
#
# (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
#
# perf report --total-cycles --percent-limit 1 --stdio
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
#
# perf report --total-cycles --percent-limit 0.7 --stdio
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
#

-------------------------------------------

With '--percent-limit 1', only the entries whose 'Sampled Cycles%' is above 1% are shown.
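
The thresholding can be sketched as a simple in-place filter. This is only
an illustration of the idea, not the code in perf: 'struct block_row' and
filter_by_percent_limit() are hypothetical names, and the sketch assumes
that entries strictly under the limit are the ones hidden.

```c
#include <stddef.h>

/* Hypothetical row: only the percentage matters for filtering. */
struct block_row {
	const char *range;	/* e.g. "[div.c:42 -> div.c:39]" */
	double      percent;	/* the 'Sampled Cycles%' column */
};

/*
 * Compact rows[] in place, dropping every entry whose percent is
 * under min_percent. Returns the number of rows kept; the relative
 * order of the kept rows is unchanged.
 */
size_t filter_by_percent_limit(struct block_row *rows, size_t nr,
			       double min_percent)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		if (rows[i].percent < min_percent)
			continue;	/* under the limit: hide it */
		rows[kept++] = rows[i];
	}
	return kept;
}
```

Applied to the three rows from the committer testing (2.17%, 1.75%, 0.72%),
a limit of 0.7 keeps all three, a limit of 1 keeps two and a limit of 2
keeps one.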

v7:
---
No functional change. Only fix the conflict issue because
previous patches are changed.

v6:
---
No functional change. Only fix the conflict issue because
previous patches are changed.

v5:
---
No functional change. Only fix the conflict issue because
previous patches are changed.

v4:
---
No functional change. Only fix the build issue because
previous patches are changed.

Signed-off-by: Jin Yao
Reviewed-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-11-07 21:14:48 +0800
6f7164fa2 perf report: Sort by sampled cycles percent per block for stdio ... Browse Code »

It would be useful to support sorting for all blocks by the sampled
cycles percent per block. This is useful to concentrate on the globally
hottest blocks.

This patch implements a new option "--total-cycles" which sorts all
blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:

percent = block sampled cycles aggregation / total sampled cycles

Note that this patch only supports "--stdio" mode.
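
The percentage and the ordering can be sketched in C as follows. This is a
minimal illustration under stated assumptions, not the code from
block-info.c: 'struct block_sample', block_percent() and
sort_blocks_by_percent() are hypothetical names, and each block carries
only an already aggregated cycle count.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-block record; the real perf structures hold more. */
struct block_sample {
	const char *range;	/* e.g. "[div.c:42 -> div.c:39]" */
	uint64_t    cycles;	/* block sampled cycles aggregation */
};

/* percent = block sampled cycles aggregation / total sampled cycles */
double block_percent(uint64_t block_cycles, uint64_t total_cycles)
{
	if (total_cycles == 0)
		return 0.0;	/* avoid dividing by zero on empty data */
	return 100.0 * (double)block_cycles / (double)total_cycles;
}

/*
 * qsort() comparator, largest cycle count first. All blocks share the
 * same total, so ordering by cycles is the same as ordering by percent.
 */
static int cmp_cycles_desc(const void *a, const void *b)
{
	const struct block_sample *x = a;
	const struct block_sample *y = b;

	if (x->cycles < y->cycles)
		return 1;
	if (x->cycles > y->cycles)
		return -1;
	return 0;
}

void sort_blocks_by_percent(struct block_sample *blocks, size_t nr)
{
	qsort(blocks, nr, sizeof(*blocks), cmp_cycles_desc);
}
```

After sorting, a caller would print block_percent(blocks[i].cycles, total)
for each entry, hottest block first.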

For example,

# perf record -b ./div
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ................................................ .................
#
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so
0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms]
0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms]
0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms]
0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms]
0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms]
0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms]
0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms]
0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms]
0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms]

Committer testing:

This should provide material for hours of endless joy, both from looking
for suspicious things in the implementation of this patch, such as the
top one:

# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]

As well as from things that look legit:

# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]

:-)

Very short system wide taken branches session:

# perf record -h -b

Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]

-b, --branch-any sample any taken branches

#
# perf record -b
^C[ perf record: Woken up 595 times to write data ]
[ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]

#
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
#
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ...................................................................... ....................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux]
0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux]
0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux]
0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux]
0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux]
0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux]
0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux]
0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux]
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux]
0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux]
0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux]
0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux]
0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux]
0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux]
0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux]
0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux]
0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux]
0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux]
0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux]
0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux]
0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux]
0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux]

-----------------------------------------------------------

v7:
---
Use use_browser in report__browse_block_hists for supporting
stdio and potential tui mode.

v6:
---
Create report__browse_block_hists in block-info.c (the code is
moved from builtin-report.c). It's called from
perf_evlist__tty_browse_hists.

v5:
---
1. Move all block functions to block-info.c

2. Move the code of setting ms in block hist_entry to
other patch.

v4:
---
1. Use new option '--total-cycles' to replace
'-s total_cycles' in v3.

2. Move block info collection out of block info
printing.

v3:
---
1. Use common function block_info__process_sym to
process the blocks per symbol.

2. Remove the nasty hack for skipping calculation
of column length

3. Some minor cleanup

Signed-off-by: Jin Yao
Reviewed-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-11-07 21:14:48 +0800
7841f40ae perf hist: Count the total cycles of all samples ... Browse Code »

We can get the per-sample cycles via hist__account_cycles(). It's also
useful to know the total cycles of all samples, so that the cycles
coverage of a single program block can be computed later. For example:

coverage = per block sampled cycles / total sampled cycles

This patch adds a new argument 'total_cycles' to hist__account_cycles(),
to which the cycles of each sample are added.
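
The accumulation pattern can be sketched like this. It is a simplified
stand-in, not the real hist__account_cycles(): 'struct branch_cycles',
account_sample_cycles() and block_coverage() are hypothetical names, and a
sample is reduced to an array of per-branch cycle counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one branch entry of a sample. */
struct branch_cycles {
	uint64_t cycles;
};

/*
 * Sum the cycles of one sample. If the caller passes a non-NULL
 * total_cycles, the sample's cycles are also added to that running
 * total, so the total grows across calls.
 */
uint64_t account_sample_cycles(const struct branch_cycles *entries,
			       size_t nr, uint64_t *total_cycles)
{
	uint64_t sample_cycles = 0;

	for (size_t i = 0; i < nr; i++)
		sample_cycles += entries[i].cycles;

	if (total_cycles)
		*total_cycles += sample_cycles;

	return sample_cycles;
}

/* coverage = per block sampled cycles / total sampled cycles */
double block_coverage(uint64_t block_cycles, uint64_t total_cycles)
{
	return total_cycles ? (double)block_cycles / (double)total_cycles : 0.0;
}
```

Passing the total as an optional out-parameter lets callers that do not
need it pass NULL and keep their existing behaviour.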

Signed-off-by: Jin Yao
Reviewed-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-11-07 20:14:15 +0800
8efc4f056 perf maps: Add for_each_entry()/_safe() iterators ... Browse Code »

To reduce boilerplate, provide a more compact form using an idiom
present in other trees of data structures.
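
The general shape of such iterators can be sketched as a pair of macros.
The names below (nodes__for_each_entry() and friends) and the singly linked
list are hypothetical and chosen only to keep the example short; they are
not the macros this commit adds to perf.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	int          value;
	struct node *next;
};

/* Plain iteration: the loop body must not free or unlink 'pos'. */
#define nodes__for_each_entry(head, pos) \
	for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/*
 * Safe iteration: 'tmp' caches the successor before the body runs,
 * so the body may free 'pos'.
 */
#define nodes__for_each_entry_safe(head, pos, tmp)			\
	for ((pos) = (head), (tmp) = (pos) ? (pos)->next : NULL;	\
	     (pos) != NULL;						\
	     (pos) = (tmp), (tmp) = (pos) ? (pos)->next : NULL)

/* Push a new node at the front; returns the new head. */
struct node *nodes__push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;	/* allocation failed: list unchanged */
	n->value = value;
	n->next = head;
	return n;
}

int nodes__sum(struct node *head)
{
	struct node *pos;
	int sum = 0;

	nodes__for_each_entry(head, pos)
		sum += pos->value;
	return sum;
}

/* Free every node; returns how many were freed. */
int nodes__free_all(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	nodes__for_each_entry_safe(head, pos, tmp) {
		free(pos);
		freed++;
	}
	return freed;
}
```

The callers shrink to a single line of loop header, which is the
boilerplate reduction the commit message refers to.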

Cc: Adrian Hunter
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-11-07 02:49:25 +0800

15 Oct, 2019

1 commit

800d3f561 perf report: Add warning when libunwind not compiled in ... Browse Code »

We received a user report that call-graph DWARF mode was enabled in
'perf record' but 'perf report' didn't unwind the callstack correctly.
The reason was, libunwind was not compiled in.

We can use 'perf -vv' to check the compiled libraries but it would be
valuable to report a warning to the user directly (especially valuable
for a perf newbie).

The warning is:

Warning:
Please install libunwind development packages during the perf build.

Both TUI and stdio are supported.

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191011022122.26369-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Jin Yao
2019-10-15 19:36:22 +0800

21 Sep, 2019

1 commit

6ef81c55a perf session: Return error code for perf_session__new() function on failure ... Browse Code »

This patch makes the perf_session__new() function return an error code
on failure instead of NULL.

Test Results:

Before Fix:

$ perf c2c report -input
failed to open nput: No such file or directory

$ echo $?
0
$

After Fix:

$ perf c2c report -input
failed to open nput: No such file or directory

$ echo $?
254
$

Committer notes:

Fix the 'perf tests topology' case, where we use TEST_ASSERT_VAL(...,
session), i.e. we need to pass zero in case of failure. That was the
case before, when perf_session__new() returned NULL on failure, but now
we need to negate the result of IS_ERR(session) to respect the
TEST_ASSERT_VAL() expectation of zero meaning failure.
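
The error-pointer convention can be sketched as below. ERR_PTR(), PTR_ERR()
and IS_ERR() are simplified local stand-ins modelled on the kernel helpers
of the same names, while 'struct session', session__new() and
open_session_status() are hypothetical and only illustrate the calling
pattern; none of this is the actual perf_session__new() code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top MAX_ERRNO addresses, i.e. encodes an error. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct session {
	const char *path;
};

/* Hypothetical constructor: reports *why* it failed instead of NULL. */
struct session *session__new(const char *path)
{
	struct session *s;

	if (!path || !*path)
		return ERR_PTR(-ENOENT);

	s = malloc(sizeof(*s));
	if (!s)
		return ERR_PTR(-ENOMEM);

	s->path = path;
	return s;
}

/* A caller can now propagate a non-zero status on failure. */
int open_session_status(const char *path)
{
	struct session *s = session__new(path);

	if (IS_ERR(s))
		return (int)PTR_ERR(s);

	free(s);
	return 0;
}
```

A check that expects zero to mean failure, as TEST_ASSERT_VAL() does, has
to be given !IS_ERR(s) rather than s itself, because an error pointer is
non-NULL. On Linux ENOENT is 2, and a process exiting with -2 is reported
by the shell as 254, which is consistent with the "After Fix" output above.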

Reported-by: Nageswara R Sastry
Signed-off-by: Mamatha Inamdar
Tested-by: Arnaldo Carvalho de Melo
Tested-by: Nageswara R Sastry
Acked-by: Ravi Bangoria
Reviewed-by: Jiri Olsa
Reviewed-by: Mukesh Ojha
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexey Budankov
Cc: Greg Kroah-Hartman
Cc: Jeremie Galarneau
Cc: Kate Stewart
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Shawn Landden
Cc: Song Liu
Cc: Thomas Gleixner
Cc: Tzvetomir Stoyanov
Link: http://lore.kernel.org/lkml/20190822071223.17892.45782.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo

Mamatha Inamdar
2019-09-21 02:58:11 +0800

20 Sep, 2019

1 commit

fb71c86cc perf tools: Remove util.h from where it is not needed ... Browse Code »

Check that it is not needed and remove, fixing up some fallout for
places where it was only serving to get something else.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-9h6dg6lsqe2usyqjh5rrues4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-09-20 20:19:20 +0800

01 Sep, 2019

1 commit

d3300a3c4 perf symbols: Move mem_info and branch_info out of symbol.h ... Browse Code »

The mem_info struct goes to mem-events.h and branch_info goes to
branch.h, where they belong. This way we can remove several headers from
symbol.h and trim the include dependency tree further.

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: https://lkml.kernel.org/n/tip-aupw71xnravcsu2xoabfmhpc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2019-09-01 09:27:48 +0800