24 Sep, 2011
13 commits
-
Problem introduced in 936be50, that missed one perf_event__parse_sample
user, the python binding.Reported-by: Linus Torvalds
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
GCC often introduces new warnings with lots of false positives -
breaking -Werror builds. WERROR=0 allows one to build perf without much
fuss - while still encouraging people to send patches to avoid the fuss
of having to type WERROR=0.Bisecting back to commits that produce a (mostly harmless) warning on
some compilers is more difficult. With WERROR=0 one could bisect without
worrying about harmless warnings.Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/eac06c7cc4920e5d4830417d466161fb26c7359c.1315514559.git.dvhart@linux.intel.com
Signed-off-by: Darren Hart
Signed-off-by: Arnaldo Carvalho de Melo -
The 'perf top' tool came from the kernel where we had each DSO (vmlinux,
modules) loaded just once at a time.But userspace may have DSOs loaded in multiple addresses (shared
libraries), requiring that we use the just resolved map instead of the
first one found.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ag53wz0yllpgers0n2w7hchp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Buildid can vary in size. According to the man page of ld, buildid can
be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes buildid size of
20 bytes (160 bits) regardless. When dealing with md5 buildids, it would
thus read more than needed and that would cause mismatches and samples
without symbols.This patch fixes this by taking into account the actual buildid size as
encoded int he section header. The leftover bytes are also cleared.This second version fixes a minor issue with the memset() base position.
Cc: David S. Miller
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo -
Currently, analyzing PPC data files on x86 the cpu field is always 0 and
the tid and pid are backwards. For example, analyzing a PPC file on PPC
the pid/tid fields show:rsyslogd 1210/1212
and analyzing the same PPC file using an x86 perf binary shows:
rsyslogd 1212/1210
The problem is that the swap_op method for samples is
perf_event__all64_swap which assumes all elements in the sample_data
struct are u64s. cpu, tid and pid are u32s and need to be handled
individually. Given that the swap is done before the sample is parsed,
the simplest solution is to undo the 64-bit swap of those elements when
the sample is parsed and do the proper swap.The RAW data field is generic and perf cannot have programmatic knowledge
of how to treat that data. Instead a warning is given to the user.Thanks to Anton Blanchard for providing a data file for a mult-CPU
PPC system so I could verify the fix for the CPU fields.v3 -> v4:
- fixed use of WARN_ONCEv2 -> v3:
- used WARN_ONCE for message regarding raw data
- removed struct wrapper around union
- fixed whitespace issuesv1 -> v2:
- added a union for undoing the byte-swap on u64 and redoing swap on
u32's to address compiler errors (see git commit 65014ab3)Cc: Anton Blanchard
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo -
I took a profile that suggested 60% of total CPU time was in the
hypervisor:...
60.20% [H] 0x33d43c
4.43% [k] ._spin_lock_irqsave
1.07% [k] ._spin_lockUsing perf stat to get the user/kernel/hypervisor breakdown contradicted
this.The problem is we merge all unresolved samples into the one unknown
bucket. If add a comparison by sample type to sort__sym_cmp we get the
real picture:...
57.11% [.] 0x80fbf63c
4.43% [k] ._spin_lock_irqsave
1.07% [k] ._spin_lock
0.65% [H] 0x33d43cSo it was almost all userspace, not hypervisor as the initial profile
suggested.I found another issue while adding this. Symbol sorting sometimes shows
multiple entries for the unknown bucket:...
16.65% [.] 0x6cd3a8
7.25% [.] 0x422460
5.37% [.] yylex
4.79% [.] malloc
4.78% [.] _int_malloc
4.03% [.] _int_free
3.95% [.] hash_source_code_string
2.82% [.] 0x532908
2.64% [.] 0x36b538
0.94% [H] 0x8000000000e132a4
0.82% [H] 0x800000000000e8b0This happens because we aren't consistent with our sorting. On
one hand we check to see if both symbols match and for two unresolved
samples sym is NULL so we match:if (left->ms.sym == right->ms.sym)
return 0;On the other hand we use sample IP for unresolved samples when
comparing against a symbol:ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
ip_r = right->ms.sym ? right->ms.sym->start : right->ip;This means unresolved samples end up spread across the rbtree and we
can't merge them all.If we use cmp_null all unresolved samples will end up in the one bucket
and the output makes more sense:...
39.12% [.] 0x36b538
5.37% [.] yylex
4.79% [.] malloc
4.78% [.] _int_malloc
4.03% [.] _int_free
3.95% [.] hash_source_code_string
2.26% [H] 0x800000000000e8b0Acked-by: Eric B Munson
Cc: Eric B Munson
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Ian Munsie
Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
perf_event__synthesize_mmap_events does not create anonymous mmap events
even though the kernel does. As a result an already running application
with dynamically created code will not get profiled - all samples end up
in the unknown bucket.This patch skips any entries with '[' in the name to avoid adding events
for special regions (eg the vsyscall page). All other executable mmaps
are assumed to be anonymous and an event is synthesized.Acked-by: Pekka Enberg
Cc: Eric B Munson
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Pekka Enberg
Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
perf-record currently creates events enabled. When doing a system wide
collection (-a arg) this causes data collection for perf's
initialization activities -- eg., perf_event__synthesize_threads().For some events (e.g., context switch S/W event or tracepoints like
syscalls) perf's initialization causes a lot of events to be captured
frequently generating "Check IO/CPU overload!" warnings on larger
systems (e.g., 2 socket, quad core, hyperthreading).perf's initialization phase can be skipped by creating events
disabled and then enabling them once the initialization is done.Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern
Signed-off-by: Arnaldo Carvalho de Melo -
Try and pick the best symbol based on a few heuristics:
- Prefer a non weak symbol over a weak one
- Prefer a global symbol over a non global one
- Prefer a symbol with less underscores (idea taken from kallsyms.c)
- If all else fails, choose the symbol with the longest nameCc: Eric B Munson
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
kallsyms__parse capitalises the symbol type, so every symbol is marked
global. Remove this and fix symbol_type__is_a to handle both local and
global symbols.Cc: Eric B Munson
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end
of the previous symbol to the start of the current one.Unfortunately module symbols are not sorted, eg:
ffffffffa0081f30 t e1000_clean_rx_irq [e1000e]
ffffffffa00817a0 t e1000_alloc_rx_buffers [e1000e]Some symbols end up with a negative length and others have a length
larger than they should. This results in confusing perf output.We already have a function to fixup the end of zero length symbols so
use that instead.Cc: Eric B Munson
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
64bit PowerPC debuginfo files have an empty function descriptor section.
I hit a SEGV when perf tried to use this section for symbol resolution.To fix this we need to check the section is valid and we can do this by
checking for type SHT_PROGBITS.Cc:
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Eric B Munson
Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org
Signed-off-by: Anton Blanchard
Signed-off-by: Arnaldo Carvalho de Melo -
Fix to call convert_variable() if previous call does not fail.
To call convert_variable, it ensures "ret" is 0. However, since
"ret" has the return value of synthesize_perf_probe_arg() which
always returns positive value if it succeeded, perf probe doesn't
call convert_variable(). This will cause a SEGV when we add an
event with arguments.This has to be fixed as it ensures "ret" is greater than 0
(or not negative).This regression has been introduced by my previous patch, f182e3e1.
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo
30 Aug, 2011
1 commit
-
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: pm: avoid writing the auxillary control register for ARMv7
ARM: pm: some ARMv7 requires a dsb in resume to ensure correctness
ARM: pm: arm920/926: fix number of registers saved
ARM: pm: CPU specific code should not overwrite r1 (v:p offset)
ARM: 7066/1: proc-v7: disable SCTLR.TE when disabling MMU
ARM: 7065/1: kexec: ensure new kernel is entered in ARM state
ARM: 7003/1: vexpress: Add clock definition for the SP805.
ARM: 7051/1: cpuimx* boards: fix mach-types errors
ARM: 7019/1: Footbridge: select CLKEVT_I8253 for ARCH_NETWINDER
ARM: 7015/1: ARM errata: Possible cache data corruption with hit-under-miss enabled
ARM: 7014/1: cache-l2x0: Fix L2 Cache size calculation.
ARM: 6967/1: ep93xx: ts72xx: fix board model detection
ARM: 6965/1: ep93xx: add model detection for ts-7300 and ts-7400 boards
ARM: cache: detect VIPT aliasing I-cache on ARMv6
ARM: twd: register clockevents device before enabling PPI
ARM: realview: ensure visibility of writes during reset
ARM: perf: make name of arm_pmu_type consistent
ARM: perf: fix prototype of release_pmu
ARM: fix perf build with uclibc toolchains
26 Aug, 2011
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/cpupowerutils:
cpupower: use man(1) when calling "cpupower help subcommand"
cpupower: make NLS truly optional
cpupower: fix Makefile typo
cpupower: Make monitor command -c/--cpu aware
cpupower: Better detect offlined CPUs
cpupower: Do not show an empty Idle_Stats monitor if no idle driver is available
cpupower: mperf monitor - Use TSC to calculate max frequency if possible
cpupower: avoid using symlinks
19 Aug, 2011
3 commits
-
Instead of printing something non-formatted to stdout, call
man(1) to show the man page for the proper subcommand.Signed-off-by: Dominik Brodowski
-
Loosely based on a patch for cpufrequtils, submittted by
Sergey Dryabzhinsky andsigned-off-by: Matt Turner
Signed-off-by: Dominik Brodowski -
Signed-off-by: Dave Jones
Signed-off-by: Dominik Brodowski
18 Aug, 2011
5 commits
-
Group event scheduling command line option is missing in perf
record/stat.Add it to perf record/stat, which is same as in perf top.
Reported-by: Andi Kleen
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s
Signed-off-by: Lin Ming
Signed-off-by: Arnaldo Carvalho de Melo -
Upstream glibc commit 295e904 added a definition for __attribute_const__
to cdefs.h. This causes the following error when building perf:util/include/linux/compiler.h:8:0: error: "__attribute_const__"
redefined [-Werror] /usr/include/sys/cdefs.h:226:0: note: this is the
location of the previous definitionWrap __attribute_const__ in #ifndef as we do for __always_inline.
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110818113720.GL2227@zod.bos.redhat.com
Signed-off-by: Josh Boyer
Signed-off-by: Arnaldo Carvalho de Melo -
There was a problem with the parse_events() code not printing the
correct event name when an event was unknown and starting with an 'r'.
The source of the problem was the way raw notation was parsed.Without the patch:
$ perf stat -e retired_foo
invalid event modifier: 'tired_foo'With the patch:
$ perf stat -e retired_foo
invalid or unsupported event: 'retired_foo'This also covers the case where the name of the event was not printed at
all when perf was linked with libpfm4.Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110723021043.GA20178@quad
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo -
When no event is given to perf record, perf top, a default event is
initialized (cycles). However, perf_evlist__add_default() was not
setting the symbolic name for the event. Perf top worked simply because
it was reconstructing the name from the event code. But it should not
have to do this. This patch initializes the evsel->name field properly.This second version improves the code flow on the non error path.
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110607161936.GA8163@quad
Signed-off-by: Stephane Eranian
[committer note: Use perf_evsel__delete() instead of plain free()]
Signed-off-by: Arnaldo Carvalho de Melo -
This patch fixes an issue with the exit value of perf list:
$ perf list; echo $?
129perf list returns an error exit code even though there is no error.
There was a stray exit(129) in print_events(). This patch removes this
exit().$ perf list; echo $?
0$ perf list hw sw
cpu-cycles OR cycles [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]cpu-clock [Software event]
task-clock [Software event]
page-faults OR faults [Software event]
minor-faults [Software event]
major-faults [Software event]
context-switches OR cs [Software event]
cpu-migrations OR migrations [Software event]
alignment-faults [Software event]
emulation-faults [Software event]
$ echo $?
0Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110523123917.GA31060@quad
Signed-off-by: Stephane Eranian
Signed-off-by: Arnaldo Carvalho de Melo
16 Aug, 2011
5 commits
-
This allows for example:
cpupower -c 2-4,6 monitor -m Mperf
|Mperf
PKG |CORE|CPU | C0 | Cx | Freq
0| 8| 4| 2.42| 97.58| 1353
0| 16| 2| 14.38| 85.62| 1928
0| 24| 6| 1.76| 98.24| 1442
1| 16| 3| 15.53| 84.47| 1650CPUs always get resorted for package, core then cpu id if it could get read out
(or however you name these topology levels...).
Still this is a nice way to keep the overview if a test binary is bound to
a specific CPU or if one wants to show all CPUs inside a package or similar.Still missing: Do not measure not available cores to reduce the overhead
and achieve better results.Signed-off-by: Thomas Renninger
Signed-off-by: Dominik Brodowski -
Before, checking for offlined CPUs was done dirty and
it was checked whether topology parsing returned -1 values.
But this is a valid case on a Xen (and possibly other) kernels.Do proper online/offline checking, also take CONFIG_HOTPLUG_CPU
option into account (no /sys/devices/../cpuX/online file).Signed-off-by: Thomas Renninger
Signed-off-by: Dominik Brodowski -
By taking error values of:
sysfs_get_idlestate_count(..);
into account.Signed-off-by: Thomas Renninger
Signed-off-by: Dominik Brodowski -
Which makes the implementation independent from cpufreq drivers.
Therefore this would also work on a Xen kernel where the hypervisor
is doing frequency switching and idle entering.Signed-off-by: Thomas Renninger
Signed-off-by: Dominik Brodowski -
Reference the source directly, don't create symlinks.
Signed-off-by: WANG Cong
Signed-off-by: Dominik Brodowski
14 Aug, 2011
1 commit
-
…inux into perf/urgent
13 Aug, 2011
1 commit
12 Aug, 2011
10 commits
-
libio.h is not provided by uClibc, in order to be able to test the
definition of __UCLIBC__ we need to include stdlib.h, which also
includes stddef.h, providing the definition of 'NULL'.Signed-off-by: Florian Fainelli
Signed-off-by: Will Deacon -
With gcc4.6, some instances of concrete inlined function looks redundant
and broken, because it appears inside of a concrete instance and its
call_file and call_line are same as the original abstruct's decl_file
and decl_line respectively.e.g.
[ d1aa] subprogram
external (flag) Yes
name (strp) "add_timer"
decl_file (data1) 2 ;here is original
decl_line (data2) 847 ;line and file
prototyped (flag) Yes
inline (data1) inlined (1)
sibling (ref4) [ d1c6]
...
[ 11d84] subprogram
abstract_origin (ref4) [ d1aa] ; concrete instance
low_pc (addr) .text+0x000000000000246f
high_pc (addr) .text+0x000000000000248b
frame_base (block1) [ 0] call_frame_cfa
sibling (ref4) [ 11dd9]
[ 11d9f] formal_parameter
abstract_origin (ref4) [ d1b9]
location (data4) location list [ 701b]
[ 11da8] inlined_subroutine
abstract_origin (ref4) [ d1aa] ; redundant instance
low_pc (addr) .text+0x000000000000247e
high_pc (addr) .text+0x0000000000002480
call_file (data1) 2 ; call line and file
call_line (data2) 847 ; are same as aboveThose redundant instances leads unwilling results;
e.g. find probe points inside of functions even if we specify
a function entry as below;$ perf probe -V add_timer
Available variables at add_timer
@
struct timer_list* timer
@
(No matched variables)So, this filters out those redundant instances based on call-site and
decl-site information.Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110317.19900.59525.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
gcc 4.6 generates a concrete out-of-line instance when there is a
function which is implicitly inlined somewhere but also has its own
instance. The concrete out-of-line instance means that it has an
abstract origin of the function which is referred by not only
inlined-subroutines but also a concrete subprogram.Since current dwarf_func_inline_instances() can find only instances of
inlined-subroutines, this introduces new die_walk_instances() to find
both of subprogram and inlined-subroutines.e.g. without this,
Available variables at sched_group_rt_period
@
struct task_group* tgperf probe failed to find actual subprogram instance of
sched_group_rt_period().With this,
Available variables at sched_group_rt_period
@
struct task_group* tg
@
struct task_group* tgNow it found the sched_group_rt_period() itself.
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110311.19900.63997.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix variable searching logic to search one in inner than local scope or
global(CU) scope. In the other words, skip searching in intermediate
scopes.e.g., in the following code,
int var1;
void inline infunc(int i)
{
i++; without.varsWith this:
$ perf probe -V pre_schedule --externs > with.varsCheck the diff:
$ diff without.vars with.vars
88d87
< int cpu
133d131
< long unsigned int* switch_countThese vars are actually in the scope of schedule(), the caller of
pre_schedule().Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110305.19900.94374.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix perf probe to search local variables in appropriate local inlined
function scope. For example, pre_schedule() has only 2 local variables,
as below;$ perf probe -L pre_schedule
0 static inline void pre_schedule(struct rq *rq, struct task_struct *prev)
{
2 if (prev->sched_class->pre_schedule)
3 prev->sched_class->pre_schedule(rq, prev);
}However, current perf probe shows 4 local variables on pre_schedule(),
because it searches variables in the caller(schedule()) scope.$ perf probe -V pre_schedule
Available variables at pre_schedule
@
int cpu
long unsigned int* switch_count
struct rq* rq
struct task_struct* prevThis patch fixes this issue by searching variables in the local scope of
the instance of inlined function. Here is the result.$ perf probe -V pre_schedule
Available variables at pre_schedule
@
struct rq* rq
struct task_struct* prevCc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110259.19900.85664.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Check multiple --lines option and print warning informing that only the
first specified --line option is valid.Changes from the 1st post:
- Accept only the first option instead of the last.
- Fix warning message according to David's comment.
- Mark as a bugfix.Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110253.19900.96192.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix line-range collector to walk all instances of inlined function,
because some execution paths can be optimized out depending on the
function argument of instances.E.g.)
inline_func (arg) {
if (arg)
do_something;
else
do_another;
}func_A() {
inline_func(1)
}func_B() {
inline_func(0)
}In this case, func_A may have only do_something code and func_B may have
only do_another.Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Masami Hiramatsu
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110247.19900.93702.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix perf probe to walk through the lines of all nested inlined function
call sites and declared lines when a whole CU is passed to the line
walker.The die_walk_lines() can have two different type of DIEs, subprogram (or
inlined-subroutine) DIE and CU DIE.If a caller passes a subprogram DIE, this means that the walker walk on
lines of given subprogram. In this case, it just needs to search on
direct children of DIE tree for finding call-site information of inlined
function which directly called from given subprogram.On the other hand, if a caller passes a CU DIE to the walker, this means
that the walker have to walk on all lines in the source files included
in given CU DIE. In this case, it has to search whole DIE trees of all
subprograms to find the call-site information of all nested inlined
functions.Without this patch:
$ perf probe --line kernel/cpu.c:151-157
static int cpu_notify(unsigned long val, void *v)
{
154 return __cpu_notify(val, v, -1, NULL);
}With this:
$ perf probe --line kernel/cpu.c:151-157152 static int cpu_notify(unsigned long val, void *v)
{
154 return __cpu_notify(val, v, -1, NULL);
}As you can see, --line option with source line range shows the declared
lines as probe-able.Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110241.19900.34994.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix line walker to check whether a given DIE is CU or not.
Actually this function accepts CU, subprogram and inlined_subroutine
DIEs.Without this fix, perf probe always fails to analyze lines on inlined
functions;$ perf probe -L pre_schedule
Debuginfo analysis failed. (-2)
Error: Failed to show lines. (-2)This fixes that bug, as below.
$ perf probe -L pre_schedule
0 static inline void pre_schedule(struct rq *rq, struct task_struct *prev
{
2 if (prev->sched_class->pre_schedule)
3 prev->sched_class->pre_schedule(rq, prev);
}/* rq->lock is NOT held, but preemption is disabled */
Changes from v1:
- Update against current tip tree.(Fix dwarf-aux.c)Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Masami Hiramatsu
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110235.19900.20614.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo -
Fix a memory leak for scopes array when it finds a variable in the
global scope.Reviewed-by: Pekka Enberg
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110229.19900.63019.stgit@fedora15
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo