Eric Lee / smarc-fsl-linux-kernel

24 Aug, 2016

1 commit

4fc76e495 perf sched: Use linux/time64.h ... Browse Code »

Probably the next step is to introduce linux/time.h and use
timespec_to_ns(), etc.

Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Steven Rostedt
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-4nqhskn27fn93cz3ukbc8drf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-08-24 02:37:33 +0800

13 Jul, 2016

1 commit

c8b5f2c96 tools: Introduce str_error_r() ... Browse Code »

The tools so far have been using the strerror_r() GNU variant, that
returns a string, be it the buffer passed or something else.

But that, besides being tricky in cases where we expect that the
function using strerror_r() returns the error formatted in a provided
buffer (we have to check if it returned something else and copy that
instead), breaks the build on systems not using glibc, like Alpine
Linux, where musl libc is used.

So, introduce yet another wrapper, str_error_r(), that has the GNU
interface, but uses the portable XSI variant of strerror_r(), so that
users rest asured that the provided buffer is used and it is what is
returned.

Cc: Adrian Hunter
Cc: David Ahern
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Wang Nan
Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2016-07-13 02:19:47 +0800

13 Apr, 2016

5 commits

73643bb6a perf sched map: Display only given cpus ... Browse Code »

Introducing --cpus option that will display only given cpus. Could be
used together with color-cpus option.

$ perf sched map --cpus 0,1
*A0 309999.786924 secs A0 => rcu_sched:7
*. 309999.786930 secs
*B0 . 309999.786931 secs B0 => rcuos/2:25
B0 *A0 309999.786947 secs

Signed-off-by: Jiri Olsa
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460467771-26532-9-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-04-13 21:11:52 +0800
cf294f24f perf sched map: Color given cpus ... Browse Code »

Adding --color-cpus option to display selected cpus with background
color (red by default). It helps on navigating through the perf sched
map output.

Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460467771-26532-8-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-04-13 21:11:51 +0800
a151a37a7 perf sched map: Color given pids ... Browse Code »

Adding --color-pids option to display selected pids in color (blue by
default). It helps on navigating through the 'perf sched map' output.

Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460467771-26532-7-git-send-email-jolsa@kernel.org
[ Added entry to man page ]
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-04-13 21:11:51 +0800
8cd91195e perf sched: Use color_fprintf for output ... Browse Code »

As preparation for next patch.

Signed-off-by: Jiri Olsa
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460467771-26532-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-04-13 21:11:51 +0800
99623c628 perf sched: Add compact display option ... Browse Code »

Add compact map display that does not output the whole cpu matrix, only
cpus that got event.

$ perf sched map --compact
*A0 1082427.094098 secs A0 => perf:19404 (CPU 2)
A0 *. 1082427.094127 secs . => swapper:0 (CPU 1)
A0 . *B0 1082427.094174 secs B0 => rcuos/2:25 (CPU 3)
A0 . *. 1082427.094177 secs
*C0 . . 1082427.094187 secs C0 => migration/2:21
C0 *A0 . 1082427.094193 secs
*. A0 . 1082427.094195 secs
*D0 A0 . 1082427.094402 secs D0 => rngd:968
*. A0 . 1082427.094406 secs
. *E0 . 1082427.095221 secs E0 => kworker/1:1:5333
. E0 *F0 1082427.095227 secs F0 => xterm:3342

It helps to display sane output for small thread loads on big cpu
servers.

Signed-off-by: Jiri Olsa
Tested-by: Arnaldo Carvalho de Melo
Cc: David Ahern
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460467771-26532-4-git-send-email-jolsa@kernel.org
[ Add entry in 'perf sched' man page ]
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2016-04-13 21:11:51 +0800

18 Dec, 2015

1 commit

4b6ab94ea perf subcmd: Create subcmd library ... Browse Code »

Move the subcommand-related files from perf to a new library named
libsubcmd.a.

Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to
'exec-cmd.*' to be consistent with the naming of all the other files.

Signed-off-by: Josh Poimboeuf
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo

Josh Poimboeuf
2015-12-18 01:27:14 +0800

05 Nov, 2015

1 commit

0014de172 perf sched latency: Fix thread pid reuse issue ... Browse Code »

The latency subcommand holds a tree of working atoms sorted by thread's
pid/tid. If there's new thread with same pid and tid, the old working atom is
found and assert bug condition is hit in search function:

thread_atoms_search: Assertion `!(thread != atoms->thread)' failed

Changing the sort function to use thread object pointers together with pid and
tid check. This way new thread will never find old one with same pid/tid.

Link: http://lkml.kernel.org/n/tip-o4doazhhv0zax5zshkg8hnys@git.kernel.org
Reported-by: Mohit Agrawal
Signed-off-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: David Ahern
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1446462625-15807-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2015-11-05 23:51:00 +0800

27 Oct, 2015

1 commit

c71183697 perf tools: Introduce usage_with_options_msg() ... Browse Code »

Now usage_with_options() setup a pager before printing message so normal
printf() or pr_err() will not be shown. The usage_with_options_msg()
can be used to print some help message before usage strings.

Signed-off-by: Namhyung Kim
Acked-by: Masami Hiramatsu
Cc: David Ahern
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2015-10-27 20:28:44 +0800

27 May, 2015

1 commit

2f80dd448 perf sched: Add option to merge like comms to lat output ... Browse Code »

Sometimes when debugging large multi-threaded applications it is helpful
to collate all of the latency numbers into one bulk record to get an
idea of what is going on.

This patch does this by merging any entries that belong to the same comm
into one entry and then spits out those totals.

I've also slightly changed the output so you can see how many threads
were merged in the processing. Here is the new default output format

-----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
-----------------------------------------------------------------------------------------------------------
chrome:(23) | 740.878 ms | 2612 | avg: 0.022 ms | max: 0.845 ms | max at: 7935.254223 s
pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s
threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s
Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s
Chrome_ChildIOT:(7) | 50.694 ms | 743 | avg: 0.021 ms | max: 1.448 ms | max at: 7935.256659 s
Compositor:5510 | 30.012 ms | 192 | avg: 0.019 ms | max: 0.131 ms | max at: 7936.636815 s
plugin_audio_th:6043 | 24.828 ms | 314 | avg: 0.018 ms | max: 0.143 ms | max at: 7936.205994 s
CompositorTileW:(2) | 14.099 ms | 45 | avg: 0.022 ms | max: 0.153 ms | max at: 7937.521800 s

the (#) after the task is the number of tasks merged, and then if there were
no tasks merged it just shows the pid. Here is the same trace file with the -p
option to print the per-pid latency numbers

-----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
-----------------------------------------------------------------------------------------------------------
chrome:5500 | 386.872 ms | 387 | avg: 0.023 ms | max: 0.241 ms | max at: 7936.001694 s
pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s
threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s
chrome:10226 | 69.710 ms | 251 | avg: 0.023 ms | max: 0.764 ms | max at: 7935.992305 s
chrome:4267 | 64.551 ms | 418 | avg: 0.021 ms | max: 0.294 ms | max at: 7937.862427 s
chrome:4827 | 62.268 ms | 54 | avg: 0.029 ms | max: 0.666 ms | max at: 7935.992813 s
Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s
chrome:3776 | 46.150 ms | 349 | avg: 0.023 ms | max: 0.845 ms | max at: 7935.254223 s

Signed-off-by: Josef Bacik
Acked-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/1432300720-30478-1-git-send-email-jbacik@fb.com
Signed-off-by: Arnaldo Carvalho de Melo

Josef Bacik
2015-05-27 23:21:45 +0800

09 May, 2015

1 commit

b91fc39f4 perf machine: Protect the machine->threads with a rwlock ... Browse Code »

In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.

That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, that end up dropping references and ultimately deleting
threads from the rb_tree and releasing its resources when no further
hist_entry (or other data structures, like in 'perf sched') references
it.

So the rule is the same for refcounts + protected trees in the kernel,
get the tree lock, find object, bump the refcount, drop the tree lock,
return, use object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, drop when releasing
that data structure.

I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".

The addr_location__put() one is because as we return references to
several data structures, we may end up adding more reference counting
for the other data structures and then we'll drop it at
addr_location__put() time.

Acked-by: David Ahern
Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2015-05-09 03:19:27 +0800

08 Apr, 2015

9 commits

ff5f3bbd4 perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instea… ... Browse Code »

…d of the default value 10

Since sched->replay_repeat is set to 10 as default, the sched->run_avg,
sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use
10 to calculate their value.

However, the replay_repeat can be changed to other value by using -r
option, so the calculation above should use replay_repeat to achieve
more accurate results instead of the default value 10.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Yunlong Song
2015-04-08 20:07:27 +0800
f0dd330fd perf sched replay: Support using -f to override perf.data file ownership ... Browse Code »

Enable to use perf.data when it is not owned by current user or root.

Example:

$ ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
$ sudo id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)

Before this patch:

$ sudo perf sched replay -f
run measurement overhead: 98 nsecs
sleep measurement overhead: 52909 nsecs
the run test took 1000015 nsecs
the sleep test took 1054253 nsecs
File perf.data not owned by current user or root (use -f to override)

As shown above, the -f option does not work at all.

After this patch:

$ sudo perf sched replay -f
run measurement overhead: 221 nsecs
sleep measurement overhead: 40514 nsecs
the run test took 1000003 nsecs
the sleep test took 1056098 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
...
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
------------------------------------------------------------
#1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
#2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
#3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
#4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
#5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
#6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
#7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
#8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
#9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
#10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45

As shown above, the -f option really works now.

Besides for replay, -f option can also work for latency and map.

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:26 +0800
939cda521 perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files ... Browse Code »

The soft maximum number of open files for a calling process is 1024,
which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
hard maximum number of open files for a calling process is 4096, which
is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.

Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
RLIMIT_NOFILE in include/asm-generic/resource.h.

And the soft maximum number finally decides the limitation of the
maximum files which are allowed to be opened.

That is to say a process can use at most 1024 file descriptors for its
o pened files, or an EMFILE error will happen.

This error can be fixed by increasing the soft maximum number, under the
constraint that the soft maximum number can not exceed the hard maximum
number, or both soft and hard maximum number should be increased
simultaneously with privilege.

For perf sched replay, it uses sys_perf_event_open to create the file
descriptor for each of the tasks in order to handle information of perf
events.

That is to say each task needs a unique file descriptor. In x86_64,
there may be over 1024 or 4096 tasks correspoinding to the record in
perf.data, which causes that no enough file descriptors can be used.

As a result, EMFILE error happens and stops the replay process. To solve
this problem, we adaptively increase the soft and hard maximum number of
open files with a '-f' option.

Example:

Test environment: x86_64 with 160 cores

$ cat /proc/sys/kernel/pid_max
163840
$ cat /proc/sys/fs/file-max
6815744
$ ulimit -Sn
1024
$ ulimit -Hn
4096

Before this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)

After this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
Have a try with -f option

$ perf sched replay -f
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
------------------------------------------------------------
#1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
#2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
#3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
#4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
#5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
#6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
#7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
#8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
#9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
#10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:26 +0800
1aff59be5 perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task ... Browse Code »

Since there is sem_wait for each task in the wait_for_tasks(), e.g.
sem_wait(&task->work_done_sem).

The sem_wait can continue only when work_done_sem is greater than 0, or
it will be blocked.

For perf sched replay, one task may sem_post the work_done_sem of
another task, which causes the work_done_sem of that task processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.

As a result, all the tasks are required and their threads must be
successfully created.

If any one (task A) of the tasks fails to create its thread, then
another task (task B), whose work_done_sem needs sem_post from that
failed task A, may likely block itself due to seg_wait.

And this is a dead halt, since task B's thread_func cannot continue at
all.

To solve this problem, perf sched replay should exit once any task fails
to create its thread.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

$ perf sched replay
...
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
------------------------------------------------------------ : 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
$

As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:25 +0800
08097abc1 perf sched replay: Fix the segmentation fault problem caused by pr_err in threads ... Browse Code »

The pr_err in self_open_counters() prints error message to stderr.
Unlike stdout, stderr uses memory buffer on the stack of each calling
process.

The pr_err in self_open_counters() works in a thread called thread_func
created in function create_tasks, which concurrently creates
sched->nr_tasks threads.

If the error happens and pr_err prints the error message in each of
these threads, the stack size of the perf process (default is 8192
kbytes) will quickly run out and the segmentation fault will happen
then.

To solve this problem, pr_err with self_open_counters() should be moved
from newly created threads to the old main thread of the perf process.
Then the pr_err can work in a stable situation without the strange
segmentation fault problem.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
Segmentation fault

After this patch:

$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( : 0), nr_events: 10
...

As shown above, the result continues without any segmentation fault.

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:24 +0800
3a423a5c3 perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the di… ... Browse Code »

…fferent pid_max configurations

Although the memory of pid_to_task can be allocated via calloc according
to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
pid_max is changed after 'perf sched record' has created its perf.data.

If the new pid_max configured in 'perf sched replay' is smaller than the
old pid_max configured in 'perf sched record', then it will cause the
assertion failure problem.

To solve this problem, we realloc the memory of pid_to_task stepwise
once the passed-in pid parameter in register_pid is larger than the
current pid_max.

Example:

Test environment: x86_64 with 160 cores

$ cat /proc/sys/kernel/pid_max
163840
$ perf sched record ls
$ echo 5000 > /proc/sys/kernel/pid_max
$ cat /proc/sys/kernel/pid_max
5000

Before this patch:

$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55356 nsecs
the run test took 1000011 nsecs
the sleep test took 1060940 nsecs
perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
long)pid_max)' failed.
Aborted

After this patch:

$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55611 nsecs
the run test took 1000026 nsecs
the sleep test took 1060486 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Yunlong Song
2015-04-08 20:07:23 +0800
cb06ac256 perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the u… ... Browse Code »

…nexpected change of pid_max

The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
is in a permanent and preset way, and it has two problems:

Problem 1: If the pid_max, which is the max number of pids in the
system, is much smaller than MAX_PID (1024*1000), then it causes a waste
of stack memory. This may happen in the case where the number of cpu
cores is much smaller than 1000.

Problem 2: If the pid_max is changed from the default value to a value
larger than MAX_PID, then it will cause assertion failure problem. The
maximum value of pid_max can be set to pid_max_max (see pidmap_init
defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64,
PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This
value is much larger than MAX_PID, and will take up 32768 Kbytes
(4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much
larger than the default 8192 Kbytes of the stack size of calling
process.

Due to these two problems, we use calloc to allocate the memory of
pid_to_task dynamically.

Example:

Test environment: x86_64 with 160 cores

$ cat /proc/sys/kernel/pid_max
163840
$ echo 1025000 > /proc/sys/kernel/pid_max
$ cat /proc/sys/kernel/pid_max
1025000

Run some applications until the pid of some process is greater than
the value of MAX_PID (1024*1000).

Before this patch:

$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55480 nsecs
the run test took 1000008 nsecs
the sleep test took 1063151 nsecs
perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
failed.
Aborted

After this patch:

$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55435 nsecs
the run test took 1000004 nsecs
the sleep test took 1059312 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Yunlong Song
2015-04-08 20:07:22 +0800
a35e27d0e perf sched replay: Increase the MAX_PID value to fix assertion failure problem ... Browse Code »

Current MAX_PID is only 65536, which will cause assertion failure problem
when CPU cores are more than 64 in x86_64.

This is because the pid_max value in x86_64 is at least
PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
include/linux/threads.h).

Thus for MAX_PID = 65536, the correspoinding CPU cores are
65536/1024=64. This is obviously not enough at all for x86_64, and will
cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the
codes.

We increase MAX_PID value from 65536 to 1024*1000, which can be used in
x86_64 with 1000 cores.

This number is finally decided according to the limitation of stack size
of calling process.

Use 'ulimit -a', the result shows the stack size of any process is 8192
Kbytes, which is defined in include/uapi/linux/resource.h (#define
_STK_LIM (8*1024*1024)).

Thus we choose a large enough value for MAX_PID, and make it satisfy to
the limitation of the stack size, i.e., making the perf process take up
a memory space just smaller than 8192 Kbytes.

We have calculated and tested that 1024*1000 is OK for MAX_PID.

This means perf sched replay can now be used with at most 1000 cores in
x86_64 without any assertion failure problem.

Example:

Test environment: x86_64 with 160 cores

$ cat /proc/sys/kernel/pid_max
163840

Before this patch:

$ perf sched replay
run measurement overhead: 240 nsecs
sleep measurement overhead: 55379 nsecs
the run test took 1000004 nsecs
the sleep test took 1059424 nsecs
perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
failed.
Aborted

After this patch:

$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55397 nsecs
the run test took 999920 nsecs
the sleep test took 1053313 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:21 +0800
0755bc4dc perf sched replay: Use struct task_desc instead of struct task_task for correct meaning ... Browse Code »

There is no struct task_task at all, thus it is a typo error in the old
commits, now fix it to what it should be in order to avoid unnecessary
misunderstanding.

Signed-off-by: Yunlong Song
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Wang Nan
Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo

Yunlong Song
2015-04-08 20:07:19 +0800

11 Mar, 2015

1 commit

b7b61cbeb perf ordered_events: Shorten function signatures ... Browse Code »

By keeping pointers to machines, evlist and tool in ordered_events.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-0c6huyaf59mqtm2ek9pmposl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2015-03-11 21:17:09 +0800

03 Mar, 2015

2 commits

ae536acfa perf sched: No need to keep the session around ... Browse Code »

We were keeping the session around just because we kept pointers to
struct thread instances, but now we reference count them, so no need
for deferring the perf_session__delete call to after we traverse the
work_list entries.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-9agtck6jdr3rebdp39z1lo0e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2015-03-03 11:17:12 +0800
f3b623b84 perf tools: Reference count struct thread ... Browse Code »

We need to do that to stop accumulating entries in the dead_threads
linked list, i.e. we were keeping references to threads in struct hists
that continue to exist even after a thread exited and was removed from
the machine threads rbtree.

We still keep the dead_threads list, but just for debugging, allowing us
to iterate at any given point over the threads that still are referenced
by things like struct hist_entry.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-3ejvfyed0r7ue61dkurzjux4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2015-03-03 11:17:08 +0800

23 Feb, 2015

1 commit

75be989a7 perf evlist: Adopt events_stats from perf_session ... Browse Code »

For tools that don't deal with perf.data files, thus do not need to
use perf_session.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-kglq67gvauq9tak02a4se00r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2015-02-23 09:22:57 +0800

09 Oct, 2014

1 commit

b3f25b6e0 perf sched: Stop updating hists stats, not used ... Browse Code »

Not used here, remove to reduce perf_evsel/hists structs interaction.

Cc: Adrian Hunter
Cc: Borislav Petkov
Cc: David Ahern
Cc: Don Zickus
Cc: Frederic Weisbecker
Cc: Jean Pihet
Cc: Jiri Olsa
Cc: Mike Galbraith
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/n/tip-cb7wkk4a3jpoovzim914ih3c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo
2014-10-09 22:46:35 +0800

16 Aug, 2014

1 commit

fb74fbda4 perf sched: Use strerror_r instead of strerror ... Browse Code »

Use strerror_r instead of strerror in error message for thread-safety.

Signed-off-by: Masami Hiramatsu
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Naohiro Aota
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20140814022247.3545.4564.stgit@kbuild-fedora.novalocal
Signed-off-by: Arnaldo Carvalho de Melo

Masami Hiramatsu
2014-08-16 00:07:47 +0800

14 Aug, 2014

2 commits

0a7e6d1b6 perf tools: Check recorded kernel version when finding vmlinux ... Browse Code »

Currently vmlinux_path__init() only tries to find vmlinux file from
current directory, /boot and some canonical directories with version
number of the running kernel. This can be a problem when reporting old
data recorded on a kernel version not running currently.

We can use --symfs option for this but it's annoying for user to do it
always. As we already have the info in the perf.data file, it can be
changed to use it for the search automatically.

Before:

$ perf report
...
# Samples: 4K of event 'cpu-clock'
# Event count (approx.): 1067250000
#
# Overhead Command Shared Object Symbol
# ........ .......... ................. ..............................
71.87% swapper [kernel.kallsyms] [k] recover_probed_instruction

After:

# Overhead Command Shared Object Symbol
# ........ .......... ................. ....................
71.87% swapper [kernel.kallsyms] [k] native_safe_halt

This requires to change signature of symbol__init() to receive struct
perf_session_env *.

Reported-by: Minchan Kim
Signed-off-by: Namhyung Kim
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Minchan Kim
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1407825645-24586-14-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-08-14 03:42:21 +0800
049341061 perf sched: Move call to symbol__init() after creating session ... Browse Code »

This is a preparation of fixing dso__load_kernel_sym(). It needs a
session info before calling symbol__init().

Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Adrian Hunter
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Minchan Kim
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1407825645-24586-10-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Namhyung Kim
2014-08-14 03:34:29 +0800

12 Aug, 2014

1 commit

0a8cb85c2 perf tools: Rename ordered_samples bool to ordered_events ... Browse Code »

The time ordering is generic for all kinds of events, so using generic
name 'ordered_events' for ordered_samples bool in perf_tool struct.

No functional change was intended.

Signed-off-by: Jiri Olsa
Acked-by: David Ahern
Cc: Corey Ashford
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jean Pihet
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-07mrqzcuhsks9wfmxrzsvemz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Jiri Olsa
2014-08-12 23:02:54 +0800

18 Jul, 2014

1 commit

57480d2cd perf tools: Enable close-on-exec flag on perf file descriptor ... Browse Code »

In commit a21b0b354d4a ('perf: Introduce a flag to enable
close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC
was added to perf_event_open(2) syscall to allows userspace
to atomically enable close-on-exec behavor when creating
the file descriptor.

This patch makes perf tools use the new flag if supported
by the kernel, so that the event file descriptors got
automatically closed if perf tool exec a sub-command.

Signed-off-by: Yann Droneaud
Cc: Andi Kleen
Cc: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/1404160127-7475-1-git-send-email-ydroneaud@opteya.com
Signed-off-by: Jiri Olsa

Yann Droneaud
2014-07-18 15:09:34 +0800

17 Jul, 2014

1 commit

1fcb87686 perf machine: Fix the value used for unknown pids ... Browse Code »

The value used for unknown pids cannot be zero because that is used by
the "idle" task.

Use -1 instead. Also handle the unknown pid case when creating map
groups.

Note that, threads with an unknown pid should not occur because fork (or
synthesized) events precede the thread's existence.

Signed-off-by: Adrian Hunter
Cc: David Ahern
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1405332185-4050-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter
2014-07-17 04:57:33 +0800

01 Jun, 2014

1 commit

1844dbcbe perf tools: Introduce hists__inc_nr_samples() ... Browse Code »

There're some duplicate code for counting number of samples. Add
hists__inc_nr_samples() and reuse it.

Suggested-by: Jiri Olsa
Signed-off-by: Namhyung Kim
Link: http://lkml.kernel.org/r/1401335910-16832-2-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa

Namhyung Kim
2014-06-01 20:34:55 +0800

16 May, 2014

2 commits

9d372ca59 perf sched: Cleanup, remove unused variables in map_switch_event() ... Browse Code »

In map_switch_event(), we don't care the previous process currently,
this patch remove the infomation we get but not used.

Signed-off-by: Dongsheng Yang
Link: http://lkml.kernel.org/r/1400218625-14613-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa

Dongsheng Yang
2014-05-16 15:17:50 +0800
67d6259dd perf sched: Remove nr_state_machine_bugs in perf latency ... Browse Code »

As we do not use .success in sched_wakeup event any more, then
we can not guarantee that the task when wakeup event happen is
out of run queue. So the message of nr_state_machine_bugs is
not correct.

Signed-off-by: Dongsheng Yang
Link: http://lkml.kernel.org/r/1399945101-21736-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa

Dongsheng Yang
2014-05-16 15:17:36 +0800

13 May, 2014

1 commit

0680ee7db perf tools: Remove usage of trace_sched_wakeup(.success) ... Browse Code »

trace_sched_wakeup(.success) is a dead argument and has been for ages,
the only reason its still there is because of brain dead software, which
apparently includes perf tools

There's a few more instances in pearly snake shit, but that's not
supported as far as I care anyhow, so let that bitrot.

Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20140512181946.GG13467@laptop.programming.kicks-ass.net
Signed-off-by: Jiri Olsa

Peter Zijlstra
2014-05-13 03:13:44 +0800

12 May, 2014

3 commits

6bcab4e1e perf tools: Clarify the output of perf sched map. ... Browse Code »

In output of perf sched map, any shortname of thread will be explained
at the first time when it appear.

Example:
*A0 228836.978985 secs A0 => perf:23032
*. A0 228836.979016 secs B0 => swapper:0
. *C0 228836.979099 secs C0 => migration/3:22
*A0 . C0 228836.979115 secs
A0 . *. 228836.979115 secs

But B0, which is explained as swapper:0 did not appear in the
left part of output. Instead, we use '.' as the shortname of
swapper:0. So the comment of "B0 => swapper:0" is not easy to
understand.

This patch clarify the output of perf sched map with not allocating
one letter-number shortname for swapper:0 and print ". => swapper:0"
as the explanation for swapper:0.

Example:
*A0 228836.978985 secs A0 => perf:23032
* . A0 228836.979016 secs . => swapper:0
. *B0 228836.979099 secs B0 => migration/3:22
*A0 . B0 228836.979115 secs
A0 . * . 228836.979115 secs
A0 *C0 . 228836.979225 secs C0 => ksoftirqd/2:18
A0 *D0 . 228836.979236 secs D0 => rcu_sched:7

Signed-off-by: Dongsheng
Acked-by: Ingo Molnar
Link: http://lkml.kernel.org/r/1399354741-19522-1-git-send-email-yangds.fnst@cn.fujitsu.com
[ small style fixes to make checkpatch happy ]
Signed-off-by: Jiri Olsa

Dongsheng
2014-05-12 17:09:05 +0800
e936e8e45 perf tools: Adapt the TASK_STATE_TO_CHAR_STR to new value in kernel space. ... Browse Code »

Currently, TASK_STATE_TO_CHAR_STR in kernel space is already expanded to RSDTtZXxKWP,
but it is still RSDTtZX in perf sched tool.

This patch update TASK_STATE_TO_CHAR_STR to the new value in kernel space.

Signed-off-by: Dongsheng
Link: http://lkml.kernel.org/r/6d2f55dc1e02c1e29a5d70bfeb9d6e8863caf2aa.1399273302.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa

Dongsheng
2014-05-12 16:01:49 +0800
7fff95978 perf tools: Add missing event for perf sched record. ... Browse Code »

We should record and process sched:sched_wakeup_new event in
perf sched tool, but currently, there is the process function
for it, without recording it in record subcommand.

This patch add -e sched:sched_wakeup_new to perf sched record.

Signed-off-by: Dongsheng
Link: http://lkml.kernel.org/r/710c6edd2162b2cea1711443f54de47c0210d9fd.1399273302.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Jiri Olsa

Dongsheng
2014-05-12 16:01:41 +0800

16 Apr, 2014

1 commit

a83edb2df perf sched: Introduce --list-cmds for use by scripts ... Browse Code »

Signed-off-by: Ramkumar Ramachandra
Acked-by: David Ahern
Cc: David Ahern
Cc: Jiri Olsa
Link: http://lkml.kernel.org/r/1394853474-31019-5-git-send-email-artagnon@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Jiri Olsa

Ramkumar Ramachandra
2014-04-16 23:16:05 +0800