08 Feb, 2010
6 commits
-
These are the bits that enable the new nmi_watchdog and safely
isolate the old nmi_watchdog. Only one or the other can run,
not both at the same time.Signed-off-by: Don Zickus
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
This is a new generic nmi_watchdog implementation using the perf
events infrastructure as suggested by Ingo.The implementation is simple, just create an in-kernel perf
event and register an overflow handler to check for cpu lockups.I created a generic implementation that lives in kernel/ and
the hardware specific part that for now lives in arch/x86.This approach has a number of advantages:
- It simplifies the x86 PMU implementation in the long run,
in that it removes the hardcoded low-level PMU implementation
that was the NMI watchdog before.- It allows new NMI watchdog features to be added in a central
place.- It allows other architectures to enable the NMI watchdog,
as long as they have perf events (that provide NMIs)
implemented.- It also allows for more graceful co-existence of existing
perf events apps and the NMI watchdog - before these changes
the relationship was exclusive. (The NMI watchdog will 'spend'
a perf event when enabled. In later iterations we might be
able to piggyback from an existing NMI event without having
to allocate a hardware event for the NMI watchdog - turning
this into a no-hardware-cost feature.)As for compatibility, we'll keep the old NMI watchdog code as
well until the new one can 100% replace it on all CPUs, old and
new alike. That might take some time as the NMI watchdog has
been ported to many CPU models.I have done light testing to make sure the framework works
correctly and it does.v2: Set the correct timeout values based on the old nmi
watchdogSigned-off-by: Don Zickus
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
In order to handle a new nmi_watchdog approach, I need to move
the notify_die() routine out of nmi_watchdog_tick() and into
default_do_nmi(). This lets me easily swap out the old
nmi_watchdog with the new one with just a config change.The change probably makes sense from a high level perspective
because the nmi_watchdog shouldn't be handling notify_die
routines anyway. However, this move does change the semantics a
little bit. Instead of checking on every nmi interrupt if the
cpus are stuck, only check them on the nmi_watchdog interrupts.v2: Move notify_die call into #idef block
Signed-off-by: Don Zickus
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fixes these warnings:
arch/x86/kernel/alternative.c: In function 'alternatives_text_reserved':
arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a castCaused by:
2cfa197: ftrace/alternatives: Introducing *_text_reserved functions
Changes in v2:
- Use local variables to compare, instead of type casts.Reported-by: Ingo Molnar
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
LKML-Reference:
Signed-off-by: Ingo Molnar -
Because we may have aliases, like __GI___strcoll_l in
/lib64/libc-2.10.2.so that appears in objdump as:$ objdump --start-address=0x0000003715a86420 \
--stop-address=0x0000003715a872dc -dS /lib64/libc-2.10.2.so0000003715a86420 :
3715a86420: 55 push %rbp
3715a86421: 48 89 e5 mov %rsp,%rbp
3715a86424: 41 57 push %r15
[root@doppio linux-2.6-tip]#So look for the address exactly at the start of the line instead
so that annotation can work for in these cases.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Kirill Smelkov
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
First, for programs and prelinked libraries, annotate code was
fooled by objdump output IPs (src->eip in the code) being
wrongly converted to absolute IPs. In such case there were no
conversion needed, but insrc->eip = strtoull(src->line, NULL, 16);
src->eip = map->unmap_ip(map, src->eip); // = eip + map->start - map->pgoffwe were reading absolute address from objdump (e.g. 8048604) and
then almost doubling it, because eip & map->start are
approximately close for small programs.Needless to say, that later, in record_precise_ip() there was no
matching with real runtime IPs.And second, like with `perf annotate` the problem with
non-prelinked *.so was that we were doing rip -> objdump address
conversion wrong.Also, because unlike `perf annotate`, `perf top` code does
annotation based on absolute IPs for performance reasons(*), new
helper for mapping objdump addresse to IP is introduced.(*) we get samples info in absolute IPs, and since we do lots of
hit-testing on absolute IPs at runtime in record_precise_ip(), it's
better to convert objdump addresses to IPs once and do no conversion
at runtime.I also had to fix how objdump output is parsed (with hardcoded
8/16 characters format, which was inappropriate for ET_DYN dsos
with small addresses like '4ac')Also note, that not all objdump output lines has associtated
IPs, e.g. look at source lines here:000004ac :
extern "C"
int my_strlen(const char *s)
4ac: 55 push %ebp
4ad: 89 e5 mov %esp,%ebp
4af: 83 ec 10 sub $0x10,%esp
{
int len = 0;
4b2: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
4b9: eb 08 jmp 4c3while (*s) {
++len;
4bb: 83 45 fc 01 addl $0x1,-0x4(%ebp)
++s;
4bf: 83 45 08 01 addl $0x1,0x8(%ebp)So we mark them with eip=0, and ignore such lines in annotate
lookup code.Signed-off-by: Kirill Smelkov
[ Note: one hunk of this patch was applied by Mike in 57d8188 ]
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar
05 Feb, 2010
1 commit
-
Since mcount function can be called from everywhere,
it should be blacklisted. Moreover, the "mcount" symbol
is a special symbol name. So, it is better to put it in
the generic blacklist.Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Ananth N Mavinakayanahalli
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
04 Feb, 2010
20 commits
-
perf top and perf record refuses to initialize on non-modular kernels:
refuse to initialize:$ perf top -v
map_groups__set_modules_path_dir: cannot open /lib/modules/2.6.33-rc6-tip-00586-g398dde3-dirty/Cc: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
Setting _FILE_OFFSET_BITS and using O_LARGEFILE, lseek64, etc,
is redundant. Thanks H. Peter Anvin for pointing it out.So, this patch removes O_LARGEFILE, lseek64, etc.
Suggested-by: "H. Peter Anvin"
Signed-off-by: Xiao Guangrong
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
We cannot assume that because hwc->idx == assign[i], we can avoid
reprogramming the counter in hw_perf_enable().The event may have been scheduled out and another event may have been
programmed into this counter. Thus, we need a more robust way of
verifying if the counter still contains config/data related to an event.This patch adds a generation number to each counter on each cpu. Using
this mechanism we can verify reliabilty whether the content of a counter
corresponds to an event.Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Avoid accidental misuse by failing to compile things
Suggested-by: Andrew Morton
Signed-off-by: Peter Zijlstra
Cc: Linus Torvalds
LKML-Reference:
Signed-off-by: Ingo Molnar -
Implement Intel Core Solo/Duo, aka.
Intel Architectural Performance Monitoring Version 1.Signed-off-by: Peter Zijlstra
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Arjan van de Ven
LKML-Reference:
Signed-off-by: Ingo Molnar -
Pretty much all of the calls do perf_disable/perf_enable cycles, pull
that out to cut back on hardware programming.Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Remove record freezing. Because kprobes never puts probe on
ftrace's mcount call anymore, it doesn't need ftrace to check
whether kprobes on it.Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Steven Rostedt
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Check whether the address of new probe is already reserved by
ftrace or alternatives (on x86) when registering new probe.
If reserved, it returns an error and not register the probe.Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Steven Rostedt
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker
Cc: Ananth N Mavinakayanahalli
Cc: Jim Keniston
Cc: Mathieu Desnoyers
Cc: Jason Baron
LKML-Reference:
Signed-off-by: Ingo Molnar -
Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Steven Rostedt
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker
Cc: Ananth N Mavinakayanahalli
Cc: Jim Keniston
Cc: Mathieu Desnoyers
Cc: Jason Baron
LKML-Reference:
Signed-off-by: Ingo Molnar -
Disable kprobe booster when CONFIG_PREEMPT=y at this time,
because it can't ensure that all kernel threads preempted on
kprobe's boosted slot run out from the slot even using
freeze_processes().The booster on preemptive kernel will be resumed if
synchronize_tasks() or something like that is introduced.Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Ananth N Mavinakayanahalli
Cc: Frederic Weisbecker
Cc: Jim Keniston
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar -
Signed-off-by: Mike Galbraith
Cc: Kirill Smelkov
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
By relying on logic in dso__load_kernel_sym(), we can
automatically load vmlinux.The only thing which needs to be adjusted, is how --sym-annotate
option is handled - now we can't rely on vmlinux been loaded
until full successful pass of dso__load_vmlinux(), but that's
not the case if we'll do sym_filter_entry setup in
symbol_filter().So move this step right after event__process_sample() where we
know the whole dso__load_kernel_sym() pass is done.By the way, though conceptually similar `perf top` still can't
annotate userspace - see next patches with fixes.Signed-off-by: Kirill Smelkov
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar -
The problem was we were incorrectly calculating objdump
addresses for sym->start and sym->end, look:For simple ET_DYN type DSO (*.so) with one function, objdump -dS
output is something like this:000004ac :
int my_strlen(const char *s)
4ac: 55 push %ebp
4ad: 89 e5 mov %esp,%ebp
4af: 83 ec 10 sub $0x10,%esp
{i.e. we have relative-to-dso-mapping IPs (=RIP) there.
For ET_EXEC type and probably for prelinked libs as well (sorry
can't test - I don't use prelink) objdump outputs absolute IPs,
e.g.08048604 :
extern "C"
int zz_strlen(const char *s)
8048604: 55 push %ebp
8048605: 89 e5 mov %esp,%ebp
8048607: 83 ec 10 sub $0x10,%esp
{So, if sym->start is always relative to dso mapping(*), we'll
have to unmap it for ET_EXEC like cases, and leave as is for
ET_DYN cases.(*) and it is - we've explicitely made it relative. Look for
adjust_symbols handling in dso__load_sym()Previously we were always unmapping sym->start and for ET_DYN
dsos resulting addresses were wrong, and so objdump output was
empty.The end result was that perf annotate output for symbols from
non-prelinked *.so had always 0.00% percents only, which is
wrong.To fix it, let's introduce a helper for converting rip to
objdump address, and also let's document what map_ip() and
unmap_ip() do -- I had to study sources for several hours to
understand it.Signed-off-by: Kirill Smelkov
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar -
Not to pollute too much 'perf annotate' debugging sessions.
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
We want to stream events as fast as possible to perf.data, and
also in the future we want to have splice working, when no
interception will be possible.Using build_id__mark_dso_hit_ops to create the list of DSOs that
back MMAPs we also optimize disk usage in the build-id cache by
only caching DSOs that had hits.Suggested-by: Peter Zijlstra
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Xiao Guangrong
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
Because 'perf record' will have to find the build-ids in after
we stop recording, so as to reduce even more the impact in the
workload while we do the measurement.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
With the recent modifications done to untie the session and
symbol layers, 'perf probe' now can use just the symbols layer.Signed-off-by: Arnaldo Carvalho de Melo
Acked-by: Masami Hiramatsu
Cc: Frédéric Weisbecker
Cc: Masami Hiramatsu
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
Signed-off-by: Ingo Molnar -
We can check using strcmp, most DSOs don't start with '[' so the
test is cheap enough and we had to test it there anyway since
when reading perf.data files we weren't calling the routine that
created this global variable and thus weren't setting it as
"loaded", which was causing a bogus:Failed to open [vdso], continuing without symbols
Message as the first line of 'perf report'.
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
While debugging a problem reported by Pekka Enberg by printing
the IP and all the maps for a thread when we don't find a map
for an IP I noticed that dso__load_sym needs to fixup these
extra maps it creates to hold symbols in different ELF sections
than the main kernel one.Now we're back showing things like:
[root@doppio linux-2.6-tip]# perf report | grep vsyscall
0.02% mutt [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% named [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% NetworkManager [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% gconfd-2 [kernel.kallsyms].vsyscall_0 [.] vgettimeofday
0.01% hald-addon-rfki [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.00% dbus-daemon [kernel.kallsyms].vsyscall_fn [.] vread_hpet
[root@doppio linux-2.6-tip]#Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
I noticed while writing the first test in 'perf regtest' that to
just test the symbol handling routines one needs to create a
perf session, that is a layer centered on a perf.data file,
events, etc, so I untied these layers.This reduces the complexity for the users as the number of
parameters to most of the symbols and session APIs now was
reduced while not adding more state to all the map instances by
only having data that is needed to split the kernel (kallsyms
and ELF symtab sections) maps and do vmlinux relocation on the
main kernel map.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar
03 Feb, 2010
1 commit
-
Open perf data file with O_LARGEFILE flag since its size is
easily larger that 2G.For example:
# rm -rf perf.data
# ./perf kmem record sleep 300[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 3142.147 MB perf.data
(~137282513 samples) ]# ll -h perf.data
-rw------- 1 root root 3.1G .....Signed-off-by: Xiao Guangrong
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Paul Mackerras
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
31 Jan, 2010
6 commits
-
Fix up a few small stylistic details:
- use consistent vertical spacing/alignment
- remove line80 artifacts
- group some global variables better
- remove dead codePlus rename 'prof' to 'report' to make it more in line with other
tools, and remove the line/file keying as we really want to use
IPs like the other tools do.Signed-off-by: Ingo Molnar
Cc: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Adding new subcommand "perf lock" to perf.
I have a lot of remaining ToDos, but for now perf lock can
already provide minimal functionality for analyzing lock
statistics.Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Add wait time and lock identification details.
Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
[ removed the file/line bits as we can do that better via IPs ]
Signed-off-by: Ingo Molnar -
linux/hash.h, hash header of kernel, is also useful for perf.
util/include/linuxhash.h includes linux/hash.h, so we can use
hash facilities (e.g. hash_long()) in perf now.Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
This patch is required to test the next patch for perf lock.
At 064739bc4b3d7f424b2f25547e6611bcf0132415 ,
support for the modifier "__data_loc" of format is added.But, when I wanted to parse format of lock_acquired (or some
event else), raw_field_ptr() did not returned correct pointer.So I modified raw_field_ptr() like this patch. Then
raw_field_ptr() works well.Signed-off-by: Hitoshi Mitake
Acked-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Tom Zanussi
Cc: Steven Rostedt
LKML-Reference:
[ v3: fixed minor stylistic detail ]
Signed-off-by: Ingo Molnar -
This reverts commit f5a2c3dce03621b55f84496f58adc2d1a87ca16f.
This patch is required for making "perf lock rec" work.
The commit f5a2c3dce0 changes write_event() of builtin-record.c
. And changed write_event() sometimes doesn't stop with perf
lock rec.Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
[ that commit also causes perf record to not be Ctrl-C-able,
and it's concetually wrong to parse the data at record time
(unconditionally - even when not needed), as we eventually
want to be able to do zero-copy recording, at least for
non-archive recordings. ]
Signed-off-by: Ingo Molnar
29 Jan, 2010
6 commits
-
Tell git to ignore perf-archive.
Signed-off-by: John Kacur
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Checked with:
./../scripts/checkpatch.pl --terse --file perf.cperf.c: 51: ERROR: open brace '{' following function declarations go on the next line
perf.c: 73: ERROR: "foo*** bar" should be "foo ***bar"
perf.c:112: ERROR: space prohibited before that close parenthesis ')'
perf.c:127: ERROR: space prohibited before that close parenthesis ')'
perf.c:171: ERROR: "foo** bar" should be "foo **bar"
perf.c:213: ERROR: "(foo*)" should be "(foo *)"
perf.c:216: ERROR: "(foo*)" should be "(foo *)"
perf.c:217: ERROR: space required before that '*' (ctx:OxV)
perf.c:452: ERROR: do not initialise statics to 0 or NULL
perf.c:453: ERROR: do not initialise statics to 0 or NULLSigned-off-by: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
Cc: Masami Hiramatsu
LKML-Reference:
Signed-off-by: Ingo Molnar -
Merge reason: We want to queue up a dependent patch. Also update to
later -rc's.Signed-off-by: Ingo Molnar
-
Removing one extra step needed in the tools that need this,
fixing a bug in 'perf probe' where this was not being done.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
To make it clear and allow for direct usage by, for instance,
regression test suites.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
So that we can call it directly from regression tests, and also
to reduce the size of dso__load_kernel_sym(), making it more
clear.Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar