17 Jan, 2015
1 commit
-
When thread__init_map_groups() fails, a new thread should be removed
from the rbtree since it's gonna be freed. Also update last match cache
only if the function succeeded.Reported-by: David Ahern
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1420763892-15535-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
09 Dec, 2014
1 commit
-
Using flag to distinguish between branch_history and normal callchain.
Move the cpumode to add_callchain_ip function.
No change in behavior.
Signed-off-by: Kan Liang
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1417532814-26208-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
02 Dec, 2014
1 commit
-
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:tcall.c:
volatile a = 10000, b = 100000, c;
__attribute__((noinline)) f2()
{
c = a / b;
}__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12The default output is unchanged.
This is only implemented in perf report, no change to record or anywhere
else.This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.- detect overlaps with the callchain
- remove small loop duplicates in the LBRCurrent limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo
25 Nov, 2014
1 commit
-
Jiri reported that the commit 96d78059d6d9 ("perf tools: Make vmlinux
short name more like kallsyms short name") segfaults on perf script.When processing kernel mmap event, it should access the 'kernel'
variable as sometimes it cannot find a matching dso from build-id table
so 'dso' might be invalid.Reported-by: Jiri Olsa
Tested-by: Jiri Olsa
Signed-off-by: Namhyung Kim
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1416285028-30572-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
19 Nov, 2014
2 commits
-
Use the relative address, this makes get_srcline work correctly in the
end.Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo -
Move the code to resolve and add a new callchain entry into a new
add_callchain_ip function. This will be used in the next patches to add
LBRs too.No change in behavior.
Signed-off-by: Andi Kleen
Cc: Jiri Olsa
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/1415844328-4884-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo
05 Nov, 2014
2 commits
-
The previous patch changed kernel dso name from '[kernel.kallsyms]' to
vmlinux. However it might add confusion to old users accustomed to the
old name. So change the short name to '[kernel.vmlinux]' to reduce such
confusion.Before:
# Overhead Command Shared Object Symbol
# ........ .............. ....................... ...............................
#
9.83% swapper vmlinux [k] intel_idle
4.10% awk libc-2.20.so [.] __strcmp_sse2
1.86% sed libc-2.20.so [.] __strcmp_sse2
1.78% netctl-auto libc-2.20.so [.] __strcmp_sse2
1.23% netctl-auto libc-2.20.so [.] __mbrtowc
1.21% firefox libxul.so [.] 0x00000000024b62bd
1.20% swapper vmlinux [k] cpuidle_enter_state
1.03% sleep vmlinux [k] copy_user_generic_unrolledAfter:
# Overhead Command Shared Object Symbol
# ........ .............. ....................... ...............................
#
9.83% swapper [kernel.vmlinux] [k] intel_idle
4.10% awk libc-2.20.so [.] __strcmp_sse2
1.86% sed libc-2.20.so [.] __strcmp_sse2
1.78% netctl-auto libc-2.20.so [.] __strcmp_sse2
1.23% netctl-auto libc-2.20.so [.] __mbrtowc
1.21% firefox libxul.so [.] 0x00000000024b62bd
1.20% swapper [kernel.vmlinux] [k] cpuidle_enter_state
1.03% sleep [kernel.vmlinux] [k] copy_user_generic_unrolledSigned-off-by: Namhyung Kim
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1415063674-17206-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux_path() since the default value of the symbol_conf.
try_vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.4/build/vmlinux) if one
is running on a custom kernel.It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently.Even with the recent tools, this still has a possibility of breaking
the result. As the build directory is a symbolic link, if one built a
new kernel in the same directory with different source/config, the old
link to vmlinux will point the new file. So it's absolutely needed to
use build-id when finding a kernel image.In this patch, it's now changed to try to search a kernel dso in the
existing dso list which was constructed during build-id table parsing
so it'll always have a build-id. If not found, search "[kernel.kallsyms]".Before:
$ perf report
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
72.15% 0.00% swapper [kernel.kallsyms] [k] set_curr_task_rt
72.15% 0.00% swapper [kernel.kallsyms] [k] native_calibrate_tsc
72.15% 0.00% swapper [kernel.kallsyms] [k] tsc_refine_calibration_work
71.87% 71.87% swapper [kernel.kallsyms] [k] module_finalize
...After (for the same perf.data):
72.15% 0.00% swapper vmlinux [k] cpu_startup_entry
72.15% 0.00% swapper vmlinux [k] arch_cpu_idle
72.15% 0.00% swapper vmlinux [k] default_idle
71.87% 71.87% swapper vmlinux [k] native_safe_halt
...Signed-off-by: Namhyung Kim
Acked-by: Ingo Molnar
Link: http://lkml.kernel.org/r/20140924073356.GB1962@gmail.com
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1415063674-17206-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
04 Nov, 2014
1 commit
-
This patch adds basic support to handle compressed kernel module as some
distro (such as Archlinux) carries on it now. The actual work using
compression library will be added later.Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1415063674-17206-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
29 Oct, 2014
4 commits
-
The unwind__get_entries() already receives the thread parameter, from where it can
obtain the matching machine structure, shorten the signature.Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-isjc6bm8mv4612mhi6af64go@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
Shortening function signature lenght too, since a thread's machine can be
obtained from thread->mg->machine, no need to pass thread, machine.Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-5wb6css280ty0cel5p0zo2b1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
So stop passing both machine and thread to several thread methods,
reducing function signature length.Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-ckcy19dcp1jfkmdihdjcqdn1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo -
We were setting this only in machine__init(), i.e. for the map_groups that
holds the kernel module maps, not for the one used for a thread's executable
mmaps.Now we are sure that we can obtain the machine where a thread is by going
via thread->mg->machine, thus we can, in the following patch, make all
codepaths that receive machine _and_ thread, drop the machine one.Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-y6zgaqsvhrf04v57u15e4ybm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
15 Oct, 2014
1 commit
-
A segfault happens on 'perf test hists_link' because we end up using a
struct machines on the stack, and then machines__init() was not
initializing the newly introduced rb_root, just the existing list_head.When we introduced struct dsos, to group the two ways to store dsos,
i.e. the linked list and the rbtree, we didn't turned the initialization
done in:machines__init(machines->host) ->
machine__init() ->
INIT_LIST_HEADinto a dsos__init() to keep on initializing the list_head but _as well_
initializing the rb_root, oops.All worked because outside perf-test we probably zalloc the whole thing
which ends up initializing it in to NULL.So the problem looks contained to 'perf test' that uses it on stack,
etc.Reported-by: Jiri Olsa
Acked-by: Waiman Long ,
Cc: Adrian Hunter ,
Cc: Don Zickus
Cc: Douglas Hatch
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Waiman Long ,
Link: http://lkml.kernel.org/r/20141014180353.GF3198@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
02 Oct, 2014
1 commit
-
With workload that spawns and destroys many threads and processes, it
was found that perf-mem could took a long time to post-process the perf
data after the target workload had completed its operation.The performance bottleneck was found to be the lookup and insertion of
the new DSO structures (thousands of them in this case).In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
perf profile below shows what perf was doing after the profiled AIM7
shared workload completed:- 83.94% perf libc-2.11.3.so [.] __strcmp_sse42
- __strcmp_sse42
- 99.82% map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_main
- 13.17% perf perf [.] __dsos__findnew
__dsos__findnew
map__new
machine__process_mmap_event
perf_session_deliver_event
perf_session__process_event
__perf_session__process_events
cmd_record
cmd_mem
run_builtin
main
__libc_start_mainSo about 97% of CPU times were spent in the map__new() function trying
to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.To overcome this performance problem, the DSO code is modified to also
put the DSO structures in a RB tree sorted by its long name in
additional to being in a simple linked list. With this change, the
processing time will become proportional to n*log(n) which will be much
quicker for large n. However, the short name will still be searched
using the old linear searching method. With that patch in place, the
same perf-mem post-processing step took less than 30 seconds to
complete.Signed-off-by: Waiman Long
Cc: Adrian Hunter
Cc: Don Zickus
Cc: Douglas Hatch
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Scott J Norton
Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo
30 Sep, 2014
1 commit
-
This is a precursor patch to enable long name searching of DSOs using
a rbtree.In this patch, a new dsos structure is created which contains only a
list head structure for the moment.The new dsos structure is used, in turn, in the machine structure for
the user_dsos and kernel_dsos fields.Only the following 3 dsos functions are modified to accept the new dsos
structure parameter instead of list_head:- dsos__add()
- dsos__find()
- __dsos__findnew()Signed-off-by: Waiman Long
Cc: Adrian Hunter
Cc: Don Zickus
Cc: Douglas Hatch
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Scott J Norton
Link: http://lkml.kernel.org/r/1412021249-19201-2-git-send-email-Waiman.Long@hp.com
[ Move struct dsos to dso.h to reduce the dso methods depends on machine.h ]
Signed-off-by: Arnaldo Carvalho de Melo
23 Aug, 2014
3 commits
-
As we run "perf c2c" on more applications, we noticed we're missing
significant samples from a common customer's application. Looking at
the /proc//maps file for the app, we see "rwxs" and "rwxp"
permissions on many of the shared memory & heap regions, and on all the
thread stacks.Because those regions have the "x" bit set, perf marks them with a
MAP_FUNCTION type. Hence ip_resolve_data() never finds load or store
events coming from them.We fixed this by re-calling thread__find_addr_location with
MAP__FUNCTION in the case where map is NULL as a last ditch effort to
map the sample before giving up and dropping it.Reported-by: Joe Mario
Tested-by: Joe Mario
Signed-off-by: Don Zickus
Acked-by: Jiri Olsa
Cc: Jiri Olsa
Cc: Joe Mario
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1408591511-57884-1-git-send-email-dzickus@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo -
Add a function to determine if an address is in the kernel. This is
based on the kernel function kernel_ip().Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1408129739-17368-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Rename machine__get_kernel_start_addr() to
machine__get_running_kernel_start() so that a new function, with a
similar name to the original name, can be added that gets the kernel
start address from the kernel map.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1408129739-17368-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
14 Aug, 2014
2 commits
-
Add machine__thread_exec_comm() to return the comm that matches the last
exec, if the comm_exec flag is present, or the last comm otherwise.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1406786474-9306-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
For grouping together all the data from a single execution, which is
needed for pairing calls and returns e.g. any outstanding calls when a
process exec's will never return.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1406786474-9306-2-git-send-email-adrian.hunter@intel.com
[ Remove testing if comm->exec is false before setting it to true ]
Signed-off-by: Arnaldo Carvalho de Melo
24 Jul, 2014
3 commits
-
The thread will be needed to determine the VDSO type.
Reviewed-by: Jiri Olsa
Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1406035081-14301-52-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
The VDSO temporary file is unlinked when a session is deleted. That
precludes the possibilities that there is no session or there is more
than one session.Correctly the vdso belongs to the machine so put the information on
'struct machine' and get rid of the global variables.Reviewed-by: Jiri Olsa
Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/53CF9B14.7040408@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
This is preparation for removing the global variables used in vdso.c and
thereby fixing the lifetime of the VDSO temporary file.Reviewed-by: Jiri Olsa
Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1406035081-14301-45-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
23 Jul, 2014
1 commit
-
Add an array to struct machine to store the current tid running on each
cpu.Add machine functions to get / set the tid for a cpu.
This will be used to determine the tid when decoding a per-cpu
Instruction Trace.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1406035081-14301-17-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
17 Jul, 2014
3 commits
-
__machine__findnew_thread() creates a 'struct thread' but does not free
it on the error path. Fix it.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1405495184-20441-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
Events like sched_switch do not provide a pid (tgid) which can result in
threads with an unknown pid. If the pid is later discovered, join the
map groups.Note the thread's map groups should be empty because they are populated
by MMAP events which do provide the pid and tid.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1405498033-23817-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo -
The value used for unknown pids cannot be zero because that is used by
the "idle" task.Use -1 instead. Also handle the unknown pid case when creating map
groups.Note that, threads with an unknown pid should not occur because fork (or
synthesized) events precede the thread's existence.Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1405332185-4050-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo
27 Jun, 2014
1 commit
-
When saving the callchain on Power, the kernel conservatively saves excess
entries in the callchain. A few of these entries are needed in some cases
but not others. We should use the DWARF debug information to determine
when the entries are needed.Eg: the value in the link register (LR) is needed only when it holds the
return address of a function. At other times it must be ignored.If the unnecessary entries are not ignored, we end up with duplicate arcs
in the call-graphs.Use the DWARF debug information to determine if any callchain entries
should be ignored when building call-graphs.Callgraph before the patch:
14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--61.12%-- __random
| |
| |--97.15%-- rand
| | do_my_sprintf
| | main
| | generic_start_main.isra.0
| | __libc_start_main
| | 0x0
| |
| --2.85%-- do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--38.88%-- rand
|
|--94.01%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--5.99%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0Callgraph after the patch:
14.67% 2234 sprintft libc-2.18.so [.] __random
|
--- __random
|
|--95.93%-- rand
| do_my_sprintf
| main
| generic_start_main.isra.0
| __libc_start_main
| 0x0
|
--4.07%-- do_my_sprintf
main
generic_start_main.isra.0
__libc_start_main
0x0TODO: For split-debug info objects like glibc, we can only determine
the call-frame-address only when both .eh_frame and .debug_info
sections are available. We should be able to determin the CFA
even without the .eh_frame section.Fix suggested by Anton Blanchard.
Thanks to valuable input on DWARF debug information from Ulrich Weigand.
Reported-by: Maynard Johnson
Tested-by: Maynard Johnson
Signed-off-by: Sukadev Bhattiprolu
Link: http://lkml.kernel.org/r/20140625154903.GA29607@us.ibm.com
Signed-off-by: Jiri Olsa
20 Jun, 2014
1 commit
-
The function machine__get_kernel_start_addr() was taking the first symbol
of kallsyms as the start address. This is incorrect in certain cases
where the first symbol is something at 0, while the actual kernel
functions begin at a later point (e.g. 0x80200000).This patch fixes machine__get_kernel_start_addr() to search for the
symbol "_text" or "_stext", which marks the beginning of kernel mapping.
This was already being done in machine__create_kernel_maps(). Thus, this
patch is just a refactor, to move that code into
machine__get_kernel_start_addr().Signed-off-by: Simon Que
Link: http://lkml.kernel.org/r/1402943529-13244-1-git-send-email-sque@chromium.org
Signed-off-by: Jiri Olsa
09 Jun, 2014
1 commit
-
The kernel piece passes more info now. Update the perf tool to reflect
that and adjust the synthesized maps to play along.Signed-off-by: Don Zickus
Link: http://lkml.kernel.org/r/1400526833-141779-4-git-send-email-dzickus@redhat.com
Signed-off-by: Jiri Olsa
01 May, 2014
1 commit
-
Conflicts:
tools/perf/arch/x86/tests/dwarf-unwind.cSigned-off-by: Ingo Molnar
30 Apr, 2014
1 commit
-
Modules installed outside of the kernel's build system should go into
"%s/lib/modules/%s/extra", but at present, perf will only look at them
when they are in "%s/lib/modules/%s/kernel". Lets encourage good
citizenship by relaxing this requirement to "%s/lib/modules/%s". This
way open source modules that are out-of-tree have no incentive to start
populating a directory reserved for in-kernel modules and I can stop
hex-editing my system's perf binary when profiling OSS out-of-tree
modules.Feedback from Namhyung Kim correctly revealed that the hex-edits that I
had been doing meant that perf was also traversing the build and source
symlinks in %s/lib/modules/%s. That is undesireable, so we explicitly
exclude them from traversal with a minor tweak to the traversal routine.Signed-off-by: Richard Yao
Acked-by: Namhyung kim
Link: http://lkml.kernel.org/r/1398532675-13684-1-git-send-email-ryao@gentoo.org
Signed-off-by: Jiri Olsa
28 Apr, 2014
1 commit
-
Sharing map groups within all process threads. This way
there's only one copy of mmap info and it's reachable
from any thread within the process.Original-patch-by: Arnaldo Carvalho de Melo
Acked-by: Namhyung Kim
Cc: Adrian Hunter
Cc: Corey Ashford
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1397490723-1992-5-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa
19 Mar, 2014
2 commits
-
Now that we can properly synthesize threads system-wide, make sure the
mmap and mmap2 events use tids instead of pids to locate their maps.Signed-off-by: Don Zickus
Cc: Jiri Olsa
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1393429527-167840-3-git-send-email-dzickus@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo -
By turning the addr_location->filtered member from a boolean to a u8
bitmap, reusing (and extending) the hist_filter enum for that.This patch doesn't change the logic at all, as it keeps the meaning of
al->filtered !0 to mean that the entry _was_ filtered, so no change in
how this value is interpreted needs to be done at this point.This will be soon used in upcoming patches.
Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Andi Kleen
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-89hmfgtr9t22sky1lyg7nw7l@git.kernel.org
[ yanked this out of a previous patch ]
Signed-off-by: Arnaldo Carvalho de Melo
15 Mar, 2014
2 commits
-
Forcing the code to always search thread by pid/tid pair.
The PID value will be needed in future to determine the process thread
leader for map groups sharing.Signed-off-by: Jiri Olsa
Acked-by: Adrian Hunter
Cc: Corey Ashford
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1394805606-25883-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo -
Its one level up thread__find_addr_location, where it will look in
different domains for a sample: user, kernel, hypervisor, etc.Will soon be used by a patchkit by Andi Kleen.
Cc: Adrian Hunter
Cc: Andi Kleen
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-so6nxkh7xj48bc5kq4jpj991@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
11 Mar, 2014
1 commit
-
Merge the latest fixes.
Signed-off-by: Ingo Molnar
10 Mar, 2014
1 commit
-
When trying to map a bunch of instruction addresses to their respective
threads, I kept getting a lot of bogus entries [I forget the exact
reason as I patched my code months ago].Looking through ip__resolve_ams, I noticed the check for
if (al.sym)
and realized, most times I have an al.map definition but sometimes an
al.sym is undefined. In the cases where al.sym is undefined, the loop
keeps going even though a valid al.map exists.Modify this check to use the more reliable al.map. This fixed my bogus
entries.Signed-off-by: Don Zickus
Cc: Jiri Olsa
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1393386227-149412-2-git-send-email-dzickus@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo