18 Mar, 2010
3 commits
-
Parameter --pid (or -p) of perf currently means a thread-wide
collection. For example, if a process whose id is 8888 has 10
threads, 'perf top -p 8888' just collects the main thread
statistics. That's misleading. Users are used to attaching to a whole
process when debugging a process with gdb. To follow normal usage
style, the patch changes --pid to process-wide collection and adds
--tid (-t) to mean a thread-wide collection.
Usage example is:
# perf top -p 8888
# perf record -p 8888 -f sleep 10
# perf stat -p 8888 -f sleep 10
The above commands collect the statistics of all threads of process
8888.
Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Cc: zhiteng.huang@intel.com
Cc: Zachary Amsden
LKML-Reference:
Signed-off-by: Ingo Molnar -
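As an illustration of the process-wide collection described in the --pid change above, the following is a minimal sketch of how the threads of a process can be enumerated on Linux by listing /proc/<pid>/task. The helper name list_threads() is hypothetical and is not taken from the patch; it only shows the kind of lookup a process-wide mode needs before attaching a counter to each thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Collect the thread IDs of process `pid` by listing /proc/<pid>/task.
 * Returns the number of threads and stores a malloc'ed array in *tids
 * (the caller frees it), or returns -1 on error.
 * Hypothetical helper for illustration, not the patch's actual code.
 */
static int list_threads(pid_t pid, pid_t **tids)
{
	char path[64];
	struct dirent *ent;
	pid_t *list = NULL;
	int n = 0, cap = 0;
	DIR *dir;

	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		if (!isdigit((unsigned char)ent->d_name[0]))
			continue;	/* skip "." and ".." */
		if (n == cap) {
			int new_cap = cap ? cap * 2 : 8;
			pid_t *tmp = realloc(list, new_cap * sizeof(*list));

			if (!tmp) {
				free(list);
				closedir(dir);
				return -1;
			}
			list = tmp;
			cap = new_cap;
		}
		list[n++] = (pid_t)atoi(ent->d_name);
	}
	closedir(dir);
	*tids = list;
	return n;
}
```

Calling list_threads(getpid(), &tids) on a single-threaded program yields one entry equal to the process id; a thread-wide mode (--tid) would skip this step and use the given id directly.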
'perf record' starts counters before the subcommand is exec'ed, so
the statistics are not precise because they include data from some
preparation steps. Fix it with this patch.
In addition, change the condition to fork/exec the subcommand. If
there is a subcommand parameter, perf always forks/execs it. The
usage example is:
# perf record -f -a sleep 10
So this command collects statistics for precisely 10 seconds.
The user can still stop it with CTRL+C. Without the new
capability, the user could only press CTRL+C to stop it, without
precise timing.
Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Cc:
Cc: Zachary Amsden
LKML-Reference:
Signed-off-by: Ingo Molnar -
Command 'perf stat' doesn't enable counters when collecting
statistics for an existing process (by -p) or system-wide. Fix the
issue.
Change the condition to fork/exec the subcommand. If there is a
subcommand parameter, perf always forks/execs it. The usage
example is:
# perf stat -a sleep 10
So this command collects statistics for precisely 10 seconds.
The user can still stop it with CTRL+C. Without the new
capability, the user could only use CTRL+C to stop it, without
precise timing.
Another issue is that 'perf stat -a' consumes 100% of the time of a
single logical cpu. That has a bad impact on the running workload.
Fix it by adding a sleep(1) in the while(!done) loop in function
run_perf_stat.
Signed-off-by: Zhang Yanmin
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Avi Kivity
Cc: Peter Zijlstra
Cc: Sheng Yang
Cc: Marcelo Tosatti
Cc: Joerg Roedel
Cc: Jes Sorensen
Cc: Gleb Natapov
Cc: Zachary Amsden
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
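The fork/exec and sleep(1) behaviour described in the 'perf stat' change above can be sketched roughly as follows. This is not the code of run_perf_stat; the function name run_workload() and the signal handling are assumptions made for illustration. The point is only that the waiting loop sleeps instead of spinning, and that a signal (child exit or CTRL+C) ends the wait early.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t done;

static void on_signal(int sig)
{
	(void)sig;
	done = 1;
}

/*
 * Fork/exec argv, then idle until the child exits or the user presses
 * CTRL+C. Returns the child's exit status, or -1 on error.
 * Hypothetical helper for illustration only.
 */
static int run_workload(char *const argv[])
{
	struct sigaction sa;
	int status = 0;
	pid_t pid;

	done = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGCHLD, &sa, NULL);
	sigaction(SIGINT, &sa, NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}

	/* Sleep rather than busy-wait; sleep() returns early on a signal. */
	while (!done)
		sleep(1);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A signal that arrives between the check of done and the call to sleep() delays the exit by at most one second, which is acceptable for a sketch but is the price of this simple loop.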
17 Mar, 2010
19 commits
-
The same information is also stored in x86_pmu.intel_ctrl. This
patch removes perf_event_mask and instead uses
x86_pmu.intel_ctrl directly.
Signed-off-by: Robert Richter
Cc: Stephane Eranian
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
This member in the struct is not used anymore and can be
removed.
Signed-off-by: Robert Richter
Cc: Stephane Eranian
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
This corrects the file entries for perf_events. The following
files are caught now:
$ xargs | eval ls $(cat) | sort -u
kernel/perf_event*.c
include/linux/perf_event.h
arch/*/kernel/perf_event*.c
arch/*/kernel/*/perf_event*.c
arch/*/kernel/*/*/perf_event*.c
arch/*/include/asm/perf_event.h
arch/*/lib/perf_event*.c
arch/*/kernel/perf_callchain.c

arch/alpha/include/asm/perf_event.h
arch/arm/include/asm/perf_event.h
arch/arm/kernel/perf_event.c
arch/frv/include/asm/perf_event.h
arch/frv/lib/perf_event.c
arch/parisc/include/asm/perf_event.h
arch/powerpc/include/asm/perf_event.h
arch/powerpc/kernel/perf_callchain.c
arch/powerpc/kernel/perf_event.c
arch/s390/include/asm/perf_event.h
arch/sh/include/asm/perf_event.h
arch/sh/kernel/cpu/sh4a/perf_event.c
arch/sh/kernel/cpu/sh4/perf_event.c
arch/sh/kernel/perf_callchain.c
arch/sh/kernel/perf_event.c
arch/sparc/include/asm/perf_event.h
arch/sparc/kernel/perf_event.c
arch/x86/include/asm/perf_event.h
arch/x86/kernel/cpu/perf_event_amd.c
arch/x86/kernel/cpu/perf_event.c
arch/x86/kernel/cpu/perf_event_intel.c
arch/x86/kernel/cpu/perf_event_p6.c
include/linux/perf_event.h
kernel/perf_event.c
Signed-off-by: Robert Richter
Cc: Stephane Eranian
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
The functions reserve_pmc_hardware() and release_pmc_hardware()
were hard to read. This patch improves readability of the code by
removing most of the CONFIG_X86_LOCAL_APIC macros.
Signed-off-by: Robert Richter
Cc: Stephane Eranian
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fix the !dwarf build.
This uses the existing NO_DWARF_SUPPORT mechanism we use for that,
but it's really fragile and needs a cleanup. (in a separate patch)
1) Such uses:
#ifndef NO_DWARF_SUPPORT
are double inverted logic a la 'not not'. Instead the flag should
be called DWARF_SUPPORT.
2) Furthermore, asymmetric #ifdef polluted code flow like:
if (need_dwarf)
#ifdef NO_DWARF_SUPPORT
die("Debuginfo-analysis is not supported");
#else /* !NO_DWARF_SUPPORT */
pr_debug("Some probes require debuginfo.\n");
fd = open_vmlinux();
is very fragile and not acceptable. Instead, helper functions
should be created and the dwarf/no-dwarf logic should be separated more
cleanly.
3) Local variable #ifdefs like this:
#ifndef NO_DWARF_SUPPORT
int fd;
#endif
are fragile as well and should be eliminated. Helper functions achieve
that too.
Cc: Masami Hiramatsu
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
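The cleanup suggested in the !dwarf build fix above (a positively named flag plus helper functions) might look roughly like the sketch below. The names open_debuginfo() and prepare_probes() are hypothetical and are not taken from perf; the sketch only shows how the #ifdef can be confined to one helper so that the caller has a single code path and no conditional local variables.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Positive-logic flag: build with -DDWARF_SUPPORT to enable debuginfo. */
#ifdef DWARF_SUPPORT
static int open_debuginfo(const char *path)
{
	int fd = open(path, O_RDONLY);

	return fd < 0 ? -errno : fd;
}
#else
static int open_debuginfo(const char *path)
{
	(void)path;
	fprintf(stderr, "Debuginfo-analysis is not supported\n");
	return -ENOTSUP;
}
#endif

/* The caller contains no #ifdef; both builds share this code path. */
static int prepare_probes(int need_dwarf, const char *vmlinux_path)
{
	int fd;

	if (!need_dwarf)
		return 0;

	fd = open_debuginfo(vmlinux_path);
	if (fd < 0)
		return fd;

	/* ... debuginfo analysis would use fd here ... */
	close(fd);
	return 0;
}
```

Built without DWARF_SUPPORT, prepare_probes(1, "vmlinux") returns -ENOTSUP, while probes that do not need debuginfo still succeed.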
Support accessing members in the data structures. With this,
perf-probe accepts data-structure members (IOW, it now accepts
dot '.' and arrow '->' operators) as probe arguments.
e.g.
./perf probe --add 'schedule:44 rq->curr'
./perf probe --add 'vfs_read file->f_op->read file->f_path.dentry'
Note that '>' can be interpreted as redirection on the command line.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Improve --list to show currently existing probes with line number and
file name. This lets the user easily check which line is
already probed.
For example:
./perf probe --list
probe:vfs_read (on vfs_read:8@linux-2.6-tip/fs/read_write.c)
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Introduce kprobe_trace_event and perf_probe_event and replace
the old probe_point structure with them. The probe_point structure is
neither flexible nor extensible enough. The new data structures
will help in implementing further features.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Add --dry-run option for debugging and testing.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Introduce die_find_child() function to integrate DIE-tree
searching functions.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Rename die_get_real_subprogram and die_get_inlinefunc to
die_find_real_subprogram and die_find_inlinefunc respectively,
because these functions search a DIE's children. After that,
'die_get_' means getting a property of that DIE, and
'die_find_' means searching the DIE tree to get an appropriate
child DIE.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
The name 'session' conflicts with 'perf_session', and
this structure now just holds parameters.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Move the add-probe routine to util/probe_event.c. This simplifies
the main routine, reducing maintenance cost.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Use wrapped functions as much as possible, to check for
out-of-memory conditions in perf probe.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Introduce xzalloc(), which wraps zalloc() to detect
out-of-memory conditions.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Peter Zijlstra
LKML-Reference:
[ -v2: small cleanups in surrounding code ]
Signed-off-by: Ingo Molnar -
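For readers unfamiliar with the x*() allocation wrappers mentioned in the xzalloc() change above, here is a minimal sketch of the pattern. It assumes that zalloc() is equivalent to calloc(1, size) and that the wrapper reports the failure and exits; the actual perf implementation may report the error differently.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocate `size` zeroed bytes, or print an error and exit.
 * Sketch only: assumes zalloc(size) behaves like calloc(1, size).
 */
static void *xzalloc(size_t size)
{
	/* Request at least one byte so that NULL always means failure. */
	void *ptr = calloc(1, size ? size : 1);

	if (!ptr) {
		fprintf(stderr, "Out of memory (requested %zu bytes)\n", size);
		exit(EXIT_FAILURE);
	}
	return ptr;
}
```

Callers can then use the returned pointer without a NULL check, which is what makes wrapped functions convenient in command-line tools where an allocation failure is fatal anyway.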
Merge reason: We'll be queueing dependent changes.
Signed-off-by: Ingo Molnar
-
perf_arch_fetch_caller_regs() is exported for the overridden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it in trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell
Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Xiao Guangrong
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
If x86_pmu.hw_config() fails, a fixed error code (-EOPNOTSUPP) is
returned even if a different error was reported. This patch fixes
this.
Signed-off-by: Robert Richter
Acked-by: Cyrill Gorcunov
Acked-by: Lin Ming
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: gorcunov@openvz.org
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
LKML-Reference:
Signed-off-by: Ingo Molnar -
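The error-code problem described in the hw_config() change above is a common pattern, sketched below outside the kernel. All names here (sample_hw_config(), init_event_fixed(), init_event_propagate()) are hypothetical stand-ins and not the x86 PMU code; the sketch only contrasts overwriting the callee's error with passing it through.

```c
#include <errno.h>

/* Hypothetical stand-in for a PMU-specific config hook that can fail. */
typedef int (*hw_config_fn)(int event_id);

static int sample_hw_config(int event_id)
{
	if (event_id < 0)
		return -EINVAL;		/* malformed request */
	if (event_id > 100)
		return -ENOENT;		/* unknown event */
	return 0;
}

/* Before: any failure is reported as -EOPNOTSUPP. */
static int init_event_fixed(hw_config_fn cfg, int event_id)
{
	if (cfg(event_id))
		return -EOPNOTSUPP;
	return 0;
}

/* After: the callee's error code is passed through unchanged. */
static int init_event_propagate(hw_config_fn cfg, int event_id)
{
	int err = cfg(event_id);

	if (err)
		return err;
	return 0;
}
```

With the second form the caller can tell an invalid request apart from an unsupported event, which the fixed error code hides.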
The dso_short_width has to start as zero, as we're calculating
the maximum short DSO name length. Somehow I missed this one.
Reported-by: Frédéric Weisbecker
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar
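The reason the width has to start at zero in the dso_short_width fix above is the usual running-maximum idiom, sketched here. The function max_name_width() and its sample inputs are illustrative and are not taken from perf.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the longest name, or 0 for an empty list. */
static size_t max_name_width(const char *const names[], size_t n)
{
	size_t width = 0;	/* a running maximum must start at zero */
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len > width)
			width = len;
	}
	return width;
}
```

If the accumulator started from an uninitialized or stale value, any name shorter than that value would never lower it, and the column would be padded wider than needed.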
16 Mar, 2010
7 commits
-
Hide CONFIG_OPTPROBES and set it if the arch supports optimized
kprobes (IOW, HAVE_OPTPROBES=y), since this option doesn't
change the major behavior of kprobes, and workarounds for minor
changes are documented.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc: Dieter Ries
Cc: Ananth N Mavinakayanahalli
Cc: OGAWA Hirofumi
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Use the original address instead of the CU-based address when
looking up the location of variables with dwarf_getlocation_addr().
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
LKML-Reference:
Signed-off-by: Ingo Molnar -
Change the dereference offset type from uintmax_t to intmax_t,
because it can have negative values (for example, a local variable's
offset from the frame pointer).
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
LKML-Reference:
Signed-off-by: Ingo Molnar -
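To make the signedness issue in the dereference-offset fix above concrete, the sketch below formats an offset with an explicit sign. The helper format_deref(), the "+|-offs(base)" output shape and the register name "%bp" are assumptions for illustration; they are not copied from perf probe.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a dereference as "<sign><offset>(<base>)", e.g. "-8(%bp)".
 * The offset is signed because locals usually sit below the frame pointer.
 */
static int format_deref(char *buf, size_t len, intmax_t offs, const char *base)
{
	return snprintf(buf, len, "%+jd(%s)", offs, base);
}
```

Had the offset been stored as uintmax_t, a value such as -8 would wrap around to a very large positive number and the resulting argument would point far away from the intended stack slot.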
When profiling C++ workloads the symbol name length can be
really big, so cap it before it garbles the result.
This builds upon the autosizing already present, where we choose
to use the short basename of a DSO instead of its long, full
pathname.
Reported-by: Pavel Krauz
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
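Capping a column width, as the symbol-name change above does, can be expressed with printf's width and precision fields. The helper format_symbol() below is a hypothetical sketch, not the perf code; the trailing '|' is only there to make the padding visible.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write `name` into a column of exactly max_width characters:
 * "%-*.*s" left-justifies, pads short names and truncates long ones.
 */
static int format_symbol(char *buf, size_t len, const char *name, int max_width)
{
	return snprintf(buf, len, "%-*.*s|", max_width, max_width, name);
}
```

With a width of 10, a long C++ name is cut to its first 10 characters while "main" is padded with spaces, so the following columns stay aligned.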
Reported-by: Cyrill Gorcunov
Signed-off-by: Lin Ming
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
perf_arch_fetch_caller_regs() is exported for the overridden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it in trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell
Signed-off-by: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Xiao Guangrong
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
Before this patch we would not find a vmlinux, then try to pass
objdump "[kernel.kallsyms]" as the filename; it would get
confused and produce no output:
[root@doppio ~]# perf annotate n_tty_write
------------------------------------------------
Percent | Source code & Disassembly of [kernel.kallsyms]
------------------------------------------------
Now we check that and emit a meaningful warning:
[root@doppio ~]# perf annotate n_tty_write
Can't annotate n_tty_write: No vmlinux file was found in the path:
[0] vmlinux
[1] /boot/vmlinux
[2] /boot/vmlinux-2.6.34-rc1-tip+
[3] /lib/modules/2.6.34-rc1-tip+/build/vmlinux
[4] /usr/lib/debug/lib/modules/2.6.34-rc1-tip+/vmlinux
[root@doppio ~]#
This bug was introduced when we added automatic search for
vmlinux; before that time the user had to specify a vmlinux
file.
v2: Print the warning just for the first symbol found when no
symbol name is specified, otherwise it will spam the screen
repeating the warning for each symbol.
Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
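The check-then-warn behaviour described in the 'perf annotate' change above can be sketched as a small search over candidate paths. The function find_vmlinux() is hypothetical and simplified: it only tests readability with access(), whereas the real lookup may also validate the file. The warning text mirrors the output quoted in the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Return the first readable path in `paths`, or NULL after printing a
 * warning that lists every location that was tried.
 */
static const char *find_vmlinux(const char *sym,
				const char *const paths[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (access(paths[i], R_OK) == 0)
			return paths[i];

	fprintf(stderr,
		"Can't annotate %s: No vmlinux file was found in the path:\n",
		sym);
	for (i = 0; i < n; i++)
		fprintf(stderr, "[%zu] %s\n", i, paths[i]);
	return NULL;
}
```

Returning NULL lets the caller skip the objdump invocation entirely instead of passing it a name such as "[kernel.kallsyms]" that is not a file.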
15 Mar, 2010
3 commits
-
Before this patch this message would very briefly appear on the
screen and then the screen would get updates only on the top,
for number of interrupts received, etc, but no annotation would
be performed:
[root@doppio linux-2.6-tip]# perf top -s n_tty_write > /tmp/bla
objdump: '[kernel.kallsyms]': No such file
Now this is what the user gets:
[root@doppio linux-2.6-tip]# perf top -s n_tty_write
Can't annotate n_tty_write: No vmlinux file was found in the path:
[0] vmlinux
[1] /boot/vmlinux
[2] /boot/vmlinux-2.6.33-rc5
[3] /lib/modules/2.6.33-rc5/build/vmlinux
[4] /usr/lib/debug/lib/modules/2.6.33-rc5/vmlinux
[root@doppio linux-2.6-tip]#
This bug was introduced when we added automatic search for
vmlinux; before that time the user had to specify a vmlinux
file.
Reported-by: David S. Miller
Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar -
When forking its target, perf record can capture data from
before the target application is started. Perf stat uses the
enable_on_exec flag in the event attributes to keep from
displaying events from before the target program starts. This
patch adds the same functionality to perf record when it will
fork the target process.
Signed-off-by: Eric B Munson
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
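The enable_on_exec mechanism mentioned in the perf record change above is set up through two bits in struct perf_event_attr. The sketch below only fills in the attribute; the helper name init_attr_for_forked_target() and the choice of a CPU-cycles event are assumptions for illustration, and the perf_event_open() call that would consume the attribute is left out.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Prepare an event that stays disabled while the tool sets things up
 * and is enabled by the kernel when the forked child calls exec().
 */
static void init_attr_for_forked_target(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->disabled = 1;		/* do not count during setup */
	attr->enable_on_exec = 1;	/* start counting at exec() */
}
```

Because counting begins at exec(), the work done between fork() and exec() in the child does not show up in the recorded data.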
This should turn on instruction counting on P4s, which was missing in
the first version of the new PMU driver.
It's inaccurate for now; we still need a dependent event to tag mops
before we can count them precisely. The result is that the number of
instructions may be inflated.
Signed-off-by: Cyrill Gorcunov
Signed-off-by: Lin Ming
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
13 Mar, 2010
8 commits
-
Ingo reported:
|
| There's a build failure on -tip with the P4 driver, on UP 32-bit, if
| PERF_EVENTS is enabled but UP_APIC is disabled:
|
| arch/x86/built-in.o: In function `p4_pmu_handle_irq':
| perf_event.c:(.text+0xa756): undefined reference to `apic'
| perf_event.c:(.text+0xa76e): undefined reference to `apic'
|
So we have to unmask LVTPC only if we're configured to have one.
Reported-by: Ingo Molnar
Signed-off-by: Cyrill Gorcunov
CC: Lin Ming
CC: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Set need_dwarf if a lazy matching pattern is specified, because
lazy matching requires the real source path, for which we must use
debuginfo.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fix the probe_point array-size overrun problem. In some cases (e.g.
inline functions), one user-specified probe point can be
translated to many probe addresses, which overruns the pre-defined
array size. This also removes a redundant MAX_PROBES macro
definition.
Signed-off-by: Masami Hiramatsu
Cc: systemtap
Cc: DLE
Cc:
LKML-Reference:
[ Note that only root can create new probes. Eventually we should remove
the MAX_PROBES limit, but that is a larger patch not eligible to
perf/urgent treatment. ]
Signed-off-by: Ingo Molnar -
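The overrun described in the probe_point fix above comes from appending to a fixed-size array without checking how full it is. A minimal sketch of the guarded append follows; the struct probe_list, the function probe_list_add() and the limit of 128 are illustrative assumptions, not the perf probe data structures.

```c
#include <errno.h>

#define MAX_PROBES 128	/* illustrative limit */

struct probe_list {
	unsigned long addrs[MAX_PROBES];
	int nr;
};

/* Append one address, refusing to write past the end of the array. */
static int probe_list_add(struct probe_list *pl, unsigned long addr)
{
	if (pl->nr >= MAX_PROBES)
		return -ERANGE;
	pl->addrs[pl->nr++] = addr;
	return 0;
}
```

An inlined function can expand one probe point into many addresses, so the check has to sit in the append path rather than in the code that counts user-specified probe points.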
The use_browser variable needs to be in a file that is always built,
and we also need a browser__show_help stub in that case.
Reported-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
Merge reason: The new P4 driver is stable and ready now for more
testing.
Signed-off-by: Ingo Molnar
-
[root@doppio ~]# perf report -i newt.data | head -10
# Samples: 11999679868
#
# Overhead  Command  Shared Object        Symbol
# ........  .......  ...................  ......
#
    63.61%  perf     libslang.so.2.1.4    [.] SLsmg_write_chars
     6.30%  perf     perf                 [.] symbols__find
     2.19%  perf     libnewt.so.0.52.10   [.] newtListboxAppendEntry
     2.08%  perf     libslang.so.2.1.4    [.] SLsmg_write_chars@plt
     1.99%  perf     libc-2.10.2.so       [.] _IO_vfprintf_internal
[root@doppio ~]#
Not good: the newt form for report works, but slang has to eat
the cost of the additional callgraph lines every time it prints a
line, and the callgraph doesn't appear on the screen. So move
the callgraph printing to a separate function and don't use it
in newt.c.
Newt tree widgets are being investigated to properly support
callgraphs, but until that gets merged, let's remove this huge
overhead and show at least the symbol overheads for a callgraph
rich perf.data with good performance.
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
For consistency, use the newt API more fully.
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
These are keys people expect to exit the current widget when
pressed, so associate all of them with this semantic.
Suggested-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar