24 May, 2016
5 commits
-
record__mmap_read() writes data from ring buffer into perf.data. 'head'
is maintained by the kernel, points to the last written record.
'old' is maintained by perf, points to the record read in previous
round. record__mmap_read() saves data from 'old' to 'head' to
perf.data.The names of these variables are not very intutive. In addition,
when dealing with backward writing ring buffer, the md->prev pointer
should point to 'head' instead of the last byte it got.Add 'start' and 'end' pointer to make code clear and set md->prev to
'head' instead of the moved 'old' pointer. This patch doesn't change
behavior since:buf = &data[old & md->mask];
size = head - old;
old += size;
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Zefan Li
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1463987628-163563-4-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang
Signed-off-by: Arnaldo Carvalho de Melo -
When record__mmap_read() requires data more than the size of ring
buffer, drop those data to avoid accessing invalid memory.This can happen when reading from overwritable ring buffer, which
should be avoided. However, check this for robustness.Signed-off-by: Wang Nan
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Zefan Li
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1463987628-163563-3-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang
Signed-off-by: Arnaldo Carvalho de Melo -
perf_evlist__toggle_{pause,resume}() are introduced to pause/resume
events in an evlist. Utilize PERF_EVENT_IOC_PAUSE_OUTPUT ioctl.Following commits use them to ensure overwrite ring buffer is paused
before reading.Signed-off-by: Wang Nan
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Zefan Li
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1463987628-163563-2-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang
[ Return -1, like all other ioctl() usage in evlist.c, rename 'pause'
arg to avoid breaking the build on ubuntu 12.04 and other old systems ]
Signed-off-by: Arnaldo Carvalho de Melo -
Auto-attach the ptr->name beautifier to syscall args "filename", "path"
and "pathname" if they are of type "const char *".Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-jxii4qmcgoppftv0zdvml9d7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Noticed when the 'setsockopt' 'fd' arg wasn't being formatted via
the SCA_FD beautifier, so just remove the setting of "fd" args to
SCA_FD and do it when reading the syscall info, like we do for
args of type "pid_t", i.e. "fd" as the name should be enough as
the decision to use the SFA_FD beautifier. For odd cases we can
just do it explicitely.Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-0qissgetiuqmqyj4b6ancmpn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
23 May, 2016
1 commit
-
Add "srcline_from" and "srcline_to" branch sort keys that allow to show
the source lines of a branch.That makes it much easier to track down where particular branches happen
in the program, for example to examine branch mispredictions, or to
associate it with cycle counts:% perf record -b -e cycles:p ./tcall
% perf report --sort srcline_from,srcline_to,mispredict
...
15.10% tcall.c:18 tcall.c:10 N
14.83% tcall.c:11 tcall.c:5 N
14.12% tcall.c:7 tcall.c:12 N
14.04% tcall.c:12 tcall.c:5 N
12.42% tcall.c:17 tcall.c:18 N
12.39% tcall.c:7 tcall.c:13 N
12.27% tcall.c:13 tcall.c:17 N
...% perf report --sort srcline_from,srcline_to,cycles
...
17.12% tcall.c:18 tcall.c:11 1
17.01% tcall.c:12 tcall.c:6 1
16.98% tcall.c:11 tcall.c:6 1
15.91% tcall.c:17 tcall.c:18 1
6.38% tcall.c:7 tcall.c:17 7
4.80% tcall.c:7 tcall.c:12 8
4.21% tcall.c:7 tcall.c:17 8
2.67% tcall.c:7 tcall.c:12 7
2.62% tcall.c:7 tcall.c:12 10
2.10% tcall.c:7 tcall.c:17 9
1.58% tcall.c:7 tcall.c:12 6
1.44% tcall.c:7 tcall.c:12 5
1.38% tcall.c:7 tcall.c:12 9
1.06% tcall.c:7 tcall.c:17 13
1.05% tcall.c:7 tcall.c:12 4
1.01% tcall.c:7 tcall.c:17 6Open issues:
- Some kernel symbols get misresolved.
Signed-off-by: Andi Kleen
Acked-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Link: http://lkml.kernel.org/r/1463775308-32748-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo
21 May, 2016
3 commits
-
Add a fd field into struct perf_mmap so that perf can track the mmap fd.
This feature will be used for toggling overwrite ring buffers.
Signed-off-by: Wang Nan
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Zefan Li
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1463762315-155689-3-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang
Signed-off-by: Arnaldo Carvalho de Melo -
Add 'overwrite' attribute to evsel to mark whether this event is
overwritable. The following commits will support syntax like:# perf record -e cycles/overwrite/ ...
An overwritable evsel requires kernel support for the
perf_event_attr.write_backward ring buffer feature.Add it to perf_missing_feature.
Signed-off-by: Wang Nan
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Zefan Li
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1463762315-155689-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo -
…ernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- We should not use the current value of the kernel.perf_event_max_stack as the
default value for --max-stack in tools that can process perf.data files, they
will only match if that sysctl wasn't changed from its default value at the
time the perf.data file was recorded, fix it.This fixes a bug where a 'perf record -a --call-graph dwarf ; perf report'
produces a glibc invalid free backtrace (Arnaldo Carvalho de Melo)- Provide a better warning when running 'perf trace' on a system where the
kernel.kptr_restrict is set to 1, similar to the one produced by 'perf record',
noticed on ubuntu 16.04 where this is the default kptr_restrict setting.
(Arnaldo Carvalho de Melo)- Fix ordering of instructions in the annotation code, noticed when annotating
ARM binaries, now that table is auto-ordered at first use, to avoid more such
problems (Chris Ryder)- Set buildid dir under symfs when --symfs is provided (He Kuang)
- Fix the 'exit_group()' syscall output in 'perf trace' (Arnaldo Carvalho de Melo)
- Only auto set call-graph to "dwarf" in 'perf trace' when syscalls are being
traced (Arnaldo Carvalho de Melo)Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
20 May, 2016
12 commits
-
This patch moves the reference of buildid dir to 'symfs/.debug' and
skips the local buildid dir when '--symfs' is given, so that every
single file opened by perf is relative to symfs directory now.Signed-off-by: He Kuang
Acked-by: David Ahern
Acked-by: Jiri Olsa
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Ekaterina Tumanova
Cc: Josh Poimboeuf
Cc: Kan Liang
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Sukadev Bhattiprolu
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1463658462-85131-2-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo -
When --min-stack or --max-stack is passwd but --no-syscalls is also in
effect, there is no point in automatically setting '--call-graph dwarf'.Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-pq922i7h9wef0pho1dqpttvn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Currently the list of instructions recognised by perf annotate has to be
explicitly written in sorted order. This makes it easy to make mistakes
when adding new instructions. Sort the list of instructions on first
access.Signed-off-by: Chris Ryder
Acked-by: Pawel Moll
Cc: Alexander Shishkin
Cc: Mark Rutland
Cc: Peter Zijlstra
Cc: Will Deacon
Link: http://lkml.kernel.org/r/4268febaf32f47f322c166fb2fe98cfec7041e11.1463676839.git.chris.ryder@arm.com
Signed-off-by: Arnaldo Carvalho de Melo -
The ARM blt and bls instructions are not correctly identified when
parsing assembly because the list of recognised instructions must be
sorted by name. Swap the ordering of blt and bls.Signed-off-by: Chris Ryder
Acked-by: Pawel Moll
Cc: Alexander Shishkin
Cc: Mark Rutland
Cc: Peter Zijlstra
Cc: Will Deacon
Link: http://lkml.kernel.org/r/560e196b7c79b7ff853caae13d8719a31479cb1a.1463676839.git.chris.ryder@arm.com
Signed-off-by: Arnaldo Carvalho de Melo -
We cannot limit processing stacks from the current value of the sysctl,
as we may be processing perf.data files, possibly from other machines.Instead use the old PERF_MAX_STACK_DEPTH, the sysctl default, that can
be overriden using --max-stack or equivalent.Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Fixes: 4cb93446c587 ("perf tools: Set the maximum allowed stack from /proc/sys/kernel/perf_event_max_stack")
Link: http://lkml.kernel.org/n/tip-eqeutsr7n7wy0c36z24ytvii@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
As thread__resolve_callchain_sample can be used for handling perf.data
files, that could've been recorded with a large max_stack sysctl setting
than what the system used for analysis has set.Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Link: http://lkml.kernel.org/n/tip-2995bt2g5yq2m05vga4kip6m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
This doesn't return, so there is no raw_syscalls:sys_exit for it, add
the ending ')', without any return value, since it is void.Reported-by: Ingo Molnar
Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-vh2mii0g4qlveuc4joufbipu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Its now there, no need to have it too.
Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-y18oeou494uy11im7u9to0dx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Hook into the libtraceevent plugin kernel symbol resolver to warn the
user that that can't happen with kptr_restrict=1.Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-9gc412xx1gl0lvqj1d1xwlyb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
This means the user can't access /proc/kallsyms, for instance, because
/proc/sys/kernel/kptr_restrict is set to 1.Instead leave the ref_reloc_sym as NULL and code using it will cope.
This allows 'perf trace' to work on such systems for !root, the only
issue would be when trying to resolve kernel symbols, which happens,
for instance, in some libtracevent plugins. A warning for that case
will be provided in the next patch in this series.Noticed in Ubuntu 16.04, that comes with kptr_restrict=1.
Reported-by: Milian Wolff
Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-knpu3z4iyp2dxpdfm798fac4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Remove an extraneous space to fix up indentation. Trivial and no
functional changeSigned-off-by: Colin Ian King
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Quentin Casasnovas
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1463503215-18339-1-git-send-email-colin.king@canonical.com
Signed-off-by: Ingo Molnar -
…ernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)Infrastructure changes:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
19 May, 2016
1 commit
-
The move of the x86 perf implementation forgot to update the MAINTAINERS
F(ile) pattern.Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Borislav Petkov
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: fa9cbf320e99 ("perf/x86: Move perf_event.c ............... => x86/events/core.c")
Link: http://lkml.kernel.org/r/20160519103019.GJ3206@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
18 May, 2016
1 commit
-
When booting with nr_cpus=1, uncore_pci_probe tries to init the PCI/uncore
also for the other packages and fails with warning when they are not found.The warning is bogus because it's correct to fail here for packages which are
not initialized. Remove it and return silently.Fixes: cf6d445f6897 "perf/x86/uncore: Track packages, not per CPU data"
Signed-off-by: Jiri Olsa
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra
Signed-off-by: Thomas Gleixner
17 May, 2016
17 commits
-
The perf_sample->ip_callchain->nr value includes all the entries in the
ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc},
while what the user expects is that what is in the kernel.perf_event_max_stack
sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be
honoured in terms of IP addresses in the stack trace.So match the kernel support and validate chain->nr taking into account
both kernel.perf_event_max_stack and kernel.perf_event_max_contexts_per_stack.Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Link: http://lkml.kernel.org/n/tip-mgx0jpzfdq4uq4abfa40byu0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
The perf_sample->ip_callchain->nr value includes all the entries in the
ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc},
while what the user expects is that what is in the kernel.perf_event_max_stack
sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be
honoured in terms of IP addresses in the stack trace.So allocate a bunch of extra entries for contexts, and do the accounting
via perf_callchain_entry_ctx struct members.A new sysctl, kernel.perf_event_max_contexts_per_stack is also
introduced for investigating possible bugs in the callchain
implementation by some arch.Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
We need have different helpers to account how many contexts we have in
the sample and for real addresses, so do it now as a prep patch, to
ease review.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: He Kuang
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Masami Hiramatsu
Cc: Milian Wolff
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Wang Nan
Cc: Zefan Li
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
So that it can be used for other stack related knobs, such as the
upcoming one to tweak the max number of of contexts per stack sample.In all those cases we can only change the value if there are no perf
sessions collecting stacks, so they need to grab that mutex, etc.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-8t3fk94wuzp8m2z1n4gc0s17@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Instead of using a raw string, use DSO__NAME_KALLSYMS and
DSO__NAME_KCORE macros for kallsyms and kcore.Signed-off-by: Masami Hiramatsu
Cc: Ananth N Mavinakayanahalli
Cc: Brendan Gregg
Cc: Hemant Kumar
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20160515031935.4017.50971.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo -
Currently 'perf stat' always counts task-clock event by default. But
it's somewhat confusing for system-wide targets (especially with 'sleep
N' as the 'sleep' task just sleeps and doesn't use cputime). Changing
to cpu-clock event instead for that case makes more sense IMHO.Before:
# perf stat -a sleep 0.1Performance counter stats for 'system wide':
403.038603 task-clock (msec) # 4.001 CPUs utilized
150 context-switches # 0.372 K/sec
7 cpu-migrations # 0.017 K/sec
71 page-faults # 0.176 K/sec
23,705,169 cycles # 0.059 GHz
15,888,166 instructions # 0.67 insn per cycle
3,326,078 branches # 8.253 M/sec
87,643 branch-misses # 2.64% of all branches0.100737009 seconds time elapsed
#
After:
# perf stat -a sleep 0.1
Performance counter stats for 'system wide':
404.271182 cpu-clock (msec) # 4.000 CPUs utilized
143 context-switches # 0.354 K/sec
13 cpu-migrations # 0.032 K/sec
73 page-faults # 0.181 K/sec
22,119,220 cycles # 0.055 GHz
13,622,065 instructions # 0.62 insn per cycle
2,918,769 branches # 7.220 M/sec
85,033 branch-misses # 2.91% of all branches0.101073089 seconds time elapsed
#
Signed-off-by: Namhyung Kim
Tested-by: Arnaldo Carvalho de Melo
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1463119263-5569-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Currently only the task-clock event updates the runtime_nsec so it
cannot show the metric when using cpu-clock events. However cpu clock
works basically same as task-clock, so no need to not update the runtime
IMHO.Before:
# perf stat -a -e cpu-clock,context-switches,page-faults,cycles sleep 0.1
Performance counter stats for 'system wide':
1217.759506 cpu-clock (msec)
93 context-switches
61 page-faults
18,958,022 cycles0.101393794 seconds time elapsed
After:
Performance counter stats for 'system wide':
1220.471884 cpu-clock (msec) # 12.013 CPUs utilized
118 context-switches # 0.097 K/sec
59 page-faults # 0.048 K/sec
17,941,247 cycles # 0.015 GHz0.101594777 seconds time elapsed
Signed-off-by: Namhyung Kim
Tested-by: Arnaldo Carvalho de Melo
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1463119263-5569-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
The commit 140aeadc1fb5 ("perf stat: Abstract stat metrics printing")
changed how shadow metrics are printed, but it missed to update the
width of the stalled backend cycles event to 7.2% like others. This
resulted in misaligned output like below:Performance counter stats for 'pwd':
0.638313 task-clock (msec) # 0.567 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
54 page-faults # 0.085 M/sec
885,600 cycles # 1.387 GHz
558,438 stalled-cycles-frontend # 63.06% frontend cycles idle
431,355 stalled-cycles-backend # 48.71% backend cycles idle
674,956 instructions # 0.76 insn per cycle
# 0.83 stalled cycles per insn
130,380 branches # 204.257 M/sec
branch-misses0.001125426 seconds time elapsed
Signed-off-by: Namhyung Kim
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Peter Zijlstra
Fixes: 140aeadc1fb5 ("perf stat: Abstract stat metrics printing")
Link: http://lkml.kernel.org/r/1463119263-5569-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
When unwinding callchains on a different machine, vdso info should be
available so the unwind process won't be interrupted if address falls
into vdso region. But in most cases, the addresses of sample events are
not in vdso range, the buildid of a zero hit vdso won't be stored into
perf.data.This patch stores vdso buildid regardless of whether the vdso is hit or
not.Signed-off-by: He Kuang
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: David Ahern
Cc: Ekaterina Tumanova
Cc: Jiri Olsa
Cc: Josh Poimboeuf
Cc: Kan Liang
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Sukadev Bhattiprolu
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1463042596-61703-3-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo -
When the scaling factor is a full integer don't display fractional
digits. This avoids unnecessary .00 output for topdown metrics with
scale factors.v2: Remove redundant check.
Signed-off-by: Andi Kleen
Acked-by: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1462489447-31832-7-git-send-email-andi@firstfloor.org
[ Rename 'round' to 'stat_round' as 'round' is defined in math.h,
included by this patch, and this breaks the build on ubuntu 12.04 ]
Signed-off-by: Arnaldo Carvalho de Melo -
Pull x86 platform updates from Ingo Molnar:
"The main change is the addition of SGI/UV4 support"* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/platform/UV: Fix incorrect nodes and pnodes for cpuless and memoryless nodes
x86/platform/UV: Remove Obsolete GRU MMR address translation
x86/platform/UV: Update physical address conversions for UV4
x86/platform/UV: Build GAM reference tables
x86/platform/UV: Support UV4 socket address changes
x86/platform/UV: Add obtaining GAM Range Table from UV BIOS
x86/platform/UV: Add UV4 addressing discovery function
x86/platform/UV: Fold blade info into per node hub info structs
x86/platform/UV: Allocate common per node hub info structs on local node
x86/platform/UV: Move blade local processor ID to the per cpu info struct
x86/platform/UV: Move scir info to the per cpu info struct
x86/platform/UV: Create per cpu info structs to replace per hub info structs
x86/platform/UV: Update MMIOH setup function to work for both UV3 and UV4
x86/platform/UV: Clean up redunduncies after merge of UV4 MMR definitions
x86/platform/UV: Add UV4 Specific MMR definitions
x86/platform/UV: Prep for UV4 MMR updates
x86/platform/UV: Add UV MMR Illegal Access Function
x86/platform/UV: Add UV4 Specific Defines
x86/platform/UV: Add UV Architecture Defines
x86/platform/UV: Add Initial UV4 definitions
... -
Pull x86 debug cleanup from Ingo Molnar:
"A printk() output simplification"* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Combine some printk()s -
Pull x86 cleanup from Ingo Molnar:
"Inline optimizations"* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix non-static inlines -
Pull x86-64 defconfig update from Ingo Molnar:
"Small defconfig addition"[ I'm not actually convinced our defconfig is sensible, but whatever ]
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build/defconfig/64: Enable CONFIG_E1000E=y -
Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:- prepare for more KASLR related changes, by restructuring, cleaning
up and fixing the existing boot code. (Kees Cook, Baoquan He,
Yinghai Lu)- simplifly/concentrate subarch handling code, eliminate
paravirt_enabled() usage. (Luis R Rodriguez)"* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/KASLR: Clarify purpose of each get_random_long()
x86/KASLR: Add virtual address choosing function
x86/KASLR: Return earliest overlap when avoiding regions
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
x86/boot: Add missing file header comments
x86/KASLR: Initialize mapping_info every time
x86/boot: Comment what finalize_identity_maps() does
x86/KASLR: Build identity mappings on demand
x86/boot: Split out kernel_ident_mapping_init()
x86/boot: Clean up indenting for asm/boot.h
x86/KASLR: Improve comments around the mem_avoid[] logic
x86/boot: Simplify pointer casting in choose_random_location()
x86/KASLR: Consolidate mem_avoid[] entries
x86/boot: Clean up pointer casting
x86/boot: Warn on future overlapping memcpy() use
x86/boot: Extract error reporting functions
x86/boot: Correctly bounds-check relocations
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
x86/boot: Fix "run_size" calculation
x86/boot: Calculate decompression size during boot not build
...