24 Aug, 2016

1 commit

  • Probably the next step is to introduce linux/time.h and use
    timespec_to_ns(), etc.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Steven Rostedt
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-4nqhskn27fn93cz3ukbc8drf@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Jul, 2016

1 commit

  • The tools so far have been using the strerror_r() GNU variant, that
    returns a string, be it the buffer passed or something else.

    But that, besides being tricky in cases where we expect that the
    function using strerror_r() returns the error formatted in a provided
    buffer (we have to check if it returned something else and copy that
    instead), breaks the build on systems not using glibc, like Alpine
    Linux, where musl libc is used.

    So, introduce yet another wrapper, str_error_r(), that has the GNU
    interface, but uses the portable XSI variant of strerror_r(), so that
    users can rest assured that the provided buffer is used and that it is
    what is returned.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
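
The wrapper described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the perf tree: the name sketch_str_error_r() and the fallback message are assumptions, and the feature-test macros are only there to ask glibc for the XSI variant of strerror_r(), the one that returns an int and writes the message into the caller's buffer.

```c
/* Ask for the XSI strerror_r(); musl only provides this variant. */
#undef _GNU_SOURCE
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * GNU-style interface (returns a string) built on the XSI call.
 * The returned pointer is always the buffer passed in by the caller.
 */
char *sketch_str_error_r(int errnum, char *buf, size_t buflen)
{
	int err = strerror_r(errnum, buf, buflen);

	/* On failure, still leave something readable in the buffer. */
	if (err)
		snprintf(buf, buflen, "strerror_r(%d) failed: %d", errnum, err);
	return buf;
}
```

A caller can then write printf("%s\n", sketch_str_error_r(ENOENT, buf, sizeof(buf))) without checking whether some other pointer came back.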
     

13 Apr, 2016

5 commits

  • Introduce the --cpus option, which displays only the given cpus. It can
    be used together with the --color-cpus option.

    $ perf sched map --cpus 0,1
    *A0 309999.786924 secs A0 => rcu_sched:7
    *. 309999.786930 secs
    *B0 . 309999.786931 secs B0 => rcuos/2:25
    B0 *A0 309999.786947 secs

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1460467771-26532-9-git-send-email-jolsa@kernel.org
    [ Added entry to man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Adding --color-cpus option to display selected cpus with background
    color (red by default). It helps on navigating through the perf sched
    map output.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1460467771-26532-8-git-send-email-jolsa@kernel.org
    [ Added entry to man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Adding --color-pids option to display selected pids in color (blue by
    default). It helps on navigating through the 'perf sched map' output.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1460467771-26532-7-git-send-email-jolsa@kernel.org
    [ Added entry to man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • As preparation for next patch.

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1460467771-26532-5-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add compact map display that does not output the whole cpu matrix, only
    the cpus that got an event.

    $ perf sched map --compact
    *A0 1082427.094098 secs A0 => perf:19404 (CPU 2)
    A0 *. 1082427.094127 secs . => swapper:0 (CPU 1)
    A0 . *B0 1082427.094174 secs B0 => rcuos/2:25 (CPU 3)
    A0 . *. 1082427.094177 secs
    *C0 . . 1082427.094187 secs C0 => migration/2:21
    C0 *A0 . 1082427.094193 secs
    *. A0 . 1082427.094195 secs
    *D0 A0 . 1082427.094402 secs D0 => rngd:968
    *. A0 . 1082427.094406 secs
    . *E0 . 1082427.095221 secs E0 => kworker/1:1:5333
    . E0 *F0 1082427.095227 secs F0 => xterm:3342

    It helps to display sane output for small thread loads on big cpu
    servers.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1460467771-26532-4-git-send-email-jolsa@kernel.org
    [ Add entry in 'perf sched' man page ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
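
One way to picture the compact display is a table that hands out a column to a cpu the first time it sees an event. The commit message does not say how perf orders the columns; the sample output is consistent with first-seen order (CPU 2, then CPU 1, then CPU 3), which is what this sketch assumes. All names below are hypothetical.

```c
#define SKETCH_MAX_CPUS 1024	/* arbitrary bound for the illustration */

struct compact_map {
	int col_of_cpu[SKETCH_MAX_CPUS];	/* -1 until the cpu is first seen */
	int nr_cols;				/* columns handed out so far */
};

void compact_map_init(struct compact_map *m)
{
	for (int i = 0; i < SKETCH_MAX_CPUS; i++)
		m->col_of_cpu[i] = -1;
	m->nr_cols = 0;
}

/* Return the display column for cpu, assigning the next free one on first use. */
int compact_map_col(struct compact_map *m, int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_MAX_CPUS)
		return -1;
	if (m->col_of_cpu[cpu] < 0)
		m->col_of_cpu[cpu] = m->nr_cols++;
	return m->col_of_cpu[cpu];
}
```

With this scheme a 160-cpu machine running three busy threads prints three columns instead of 160.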
     

18 Dec, 2015

1 commit

  • Move the subcommand-related files from perf to a new library named
    libsubcmd.a.

    Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to
    'exec-cmd.*' to be consistent with the naming of all the other files.

    Signed-off-by: Josh Poimboeuf
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Josh Poimboeuf
     

05 Nov, 2015

1 commit

  • The latency subcommand holds a tree of working atoms sorted by thread's
    pid/tid. If a new thread appears with the same pid and tid as an old one,
    the old working atom is found and an assertion is hit in the search
    function:

    thread_atoms_search: Assertion `!(thread != atoms->thread)' failed

    Change the sort function to compare thread object pointers together with
    the pid and tid check. This way a new thread will never find an old one
    with the same pid/tid.

    Link: http://lkml.kernel.org/n/tip-o4doazhhv0zax5zshkg8hnys@git.kernel.org
    Reported-by: Mohit Agrawal
    Signed-off-by: Jiri Olsa
    Acked-by: Namhyung Kim
    Cc: David Ahern
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1446462625-15807-1-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
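
The idea of this fix can be sketched as a comparator that only reports equality for the very same thread object. The struct and function names are hypothetical stand-ins, not the ones in builtin-sched.c, and the exact ordering perf uses may differ.

```c
#include <stdint.h>
#include <sys/types.h>

struct thread_sketch {
	pid_t pid;
	pid_t tid;
};

struct atoms_sketch {
	struct thread_sketch *thread;
};

/* Order by pid, then tid; break ties by object address so that two
 * distinct thread objects with a reused pid/tid never compare equal. */
int atoms_cmp(const struct atoms_sketch *l, const struct atoms_sketch *r)
{
	uintptr_t lp = (uintptr_t)l->thread, rp = (uintptr_t)r->thread;

	if (lp == rp)
		return 0;	/* same thread object */
	if (l->thread->pid != r->thread->pid)
		return l->thread->pid < r->thread->pid ? -1 : 1;
	if (l->thread->tid != r->thread->tid)
		return l->thread->tid < r->thread->tid ? -1 : 1;
	/* Same pid/tid but a different object: never return 0 here. */
	return lp < rp ? -1 : 1;
}
```

A tree search using such a comparator cannot land on the stale entry, so the assertion quoted in the commit message has nothing to trip over.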
     

27 Oct, 2015

1 commit

  • Now usage_with_options() sets up a pager before printing its message, so
    a normal printf() or pr_err() will not be shown. The
    usage_with_options_msg() function can be used to print a help message
    before the usage strings.

    Signed-off-by: Namhyung Kim
    Acked-by: Masami Hiramatsu
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

27 May, 2015

1 commit

  • Sometimes when debugging large multi-threaded applications it is helpful
    to collate all of the latency numbers into one bulk record to get an
    idea of what is going on.

    This patch does this by merging any entries that belong to the same comm
    into one entry and then spits out those totals.

    I've also slightly changed the output so you can see how many threads
    were merged in the processing. Here is the new default output format

    -----------------------------------------------------------------------------------------------------------
    Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
    -----------------------------------------------------------------------------------------------------------
    chrome:(23) | 740.878 ms | 2612 | avg: 0.022 ms | max: 0.845 ms | max at: 7935.254223 s
    pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s
    threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s
    Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s
    Chrome_ChildIOT:(7) | 50.694 ms | 743 | avg: 0.021 ms | max: 1.448 ms | max at: 7935.256659 s
    Compositor:5510 | 30.012 ms | 192 | avg: 0.019 ms | max: 0.131 ms | max at: 7936.636815 s
    plugin_audio_th:6043 | 24.828 ms | 314 | avg: 0.018 ms | max: 0.143 ms | max at: 7936.205994 s
    CompositorTileW:(2) | 14.099 ms | 45 | avg: 0.022 ms | max: 0.153 ms | max at: 7937.521800 s

    The (#) after the task name is the number of tasks merged; if no tasks
    were merged, it just shows the pid. Here is the same trace file with the
    -p option to print the per-pid latency numbers

    -----------------------------------------------------------------------------------------------------------
    Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
    -----------------------------------------------------------------------------------------------------------
    chrome:5500 | 386.872 ms | 387 | avg: 0.023 ms | max: 0.241 ms | max at: 7936.001694 s
    pulseaudio:1523 | 94.440 ms | 597 | avg: 0.027 ms | max: 0.110 ms | max at: 7934.668372 s
    threaded-ml:6042 | 72.554 ms | 386 | avg: 0.035 ms | max: 1.186 ms | max at: 7935.330911 s
    chrome:10226 | 69.710 ms | 251 | avg: 0.023 ms | max: 0.764 ms | max at: 7935.992305 s
    chrome:4267 | 64.551 ms | 418 | avg: 0.021 ms | max: 0.294 ms | max at: 7937.862427 s
    chrome:4827 | 62.268 ms | 54 | avg: 0.029 ms | max: 0.666 ms | max at: 7935.992813 s
    Chrome_IOThread:3832 | 52.388 ms | 456 | avg: 0.021 ms | max: 1.365 ms | max at: 7935.330602 s
    chrome:3776 | 46.150 ms | 349 | avg: 0.023 ms | max: 0.845 ms | max at: 7935.254223 s

    Signed-off-by: Josef Bacik
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: kernel-team@fb.com
    Link: http://lkml.kernel.org/r/1432300720-30478-1-git-send-email-jbacik@fb.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Josef Bacik
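
The merging step can be sketched as below. This is a simplified illustration under stated assumptions: the struct layout and function names are hypothetical, and only runtime and switch counts are summed, whereas the real patch also has to combine the delay statistics.

```c
#include <stdio.h>
#include <string.h>

struct lat_entry {
	char comm[16];
	int pid;
	double runtime_ms;
	unsigned long switches;
	int nr_merged;		/* per-thread entries folded into this one */
};

/* Fold entries that share a comm into the first one seen; returns the new count. */
size_t merge_by_comm(struct lat_entry *e, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < out; j++)
			if (strcmp(e[j].comm, e[i].comm) == 0)
				break;
		if (j == out) {			/* first entry with this comm */
			e[out] = e[i];
			e[out].nr_merged = 1;
			out++;
		} else {			/* accumulate into the existing one */
			e[j].runtime_ms += e[i].runtime_ms;
			e[j].switches += e[i].switches;
			e[j].nr_merged++;
		}
	}
	return out;
}

/* "comm:(N)" when N entries were merged, "comm:pid" otherwise. */
void format_task(const struct lat_entry *e, char *buf, size_t len)
{
	if (e->nr_merged > 1)
		snprintf(buf, len, "%s:(%d)", e->comm, e->nr_merged);
	else
		snprintf(buf, len, "%s:%d", e->comm, e->pid);
}
```

The linear search keeps the sketch short; a tree or hash keyed by comm would be the natural choice for large traces.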
     

09 May, 2015

1 commit

  • In addition to using refcounts for the struct thread lifetime
    management, we need to protect access to machine->threads from
    concurrent access.

    That happens in 'perf top', where a thread processes events, inserting
    and deleting entries from that rb_tree while another thread decays
    hist_entries, that end up dropping references and ultimately deleting
    threads from the rb_tree and releasing its resources when no further
    hist_entry (or other data structures, like in 'perf sched') references
    it.

    So the rule is the same for refcounts + protected trees in the kernel,
    get the tree lock, find object, bump the refcount, drop the tree lock,
    return, use object, drop the refcount if no more use of it is needed,
    keep it if storing it in some other data structure, drop when releasing
    that data structure.

    I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
    "perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".

    The addr_location__put() one is because as we return references to
    several data structures, we may end up adding more reference counting
    for the other data structures and then we'll drop it at
    addr_location__put() time.

    Acked-by: David Ahern
    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
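
The locking rule spelled out in this commit (take the lock, find, bump the refcount, drop the lock, use, put) can be sketched with a mutex-protected container. Everything here is hypothetical: a linked list stands in for the rb_tree, the refcount is protected by the same lock rather than being atomic, and none of the names come from perf.

```c
#include <pthread.h>
#include <stdlib.h>

struct obj {
	int key;
	int refcnt;		/* protected by table->lock in this sketch */
	struct obj *next;
};

struct table {
	pthread_mutex_t lock;
	struct obj *head;	/* a list stands in for the rb_tree */
};

/* Insert a new object; the table itself owns the initial reference. */
int table_add(struct table *t, int key)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return -1;
	o->key = key;
	o->refcnt = 1;
	pthread_mutex_lock(&t->lock);
	o->next = t->head;
	t->head = o;
	pthread_mutex_unlock(&t->lock);
	return 0;
}

/* Find under the lock and return with an extra reference held. */
struct obj *table_find(struct table *t, int key)
{
	struct obj *o;

	pthread_mutex_lock(&t->lock);
	for (o = t->head; o; o = o->next)
		if (o->key == key) {
			o->refcnt++;
			break;
		}
	pthread_mutex_unlock(&t->lock);
	return o;
}

/* Drop one reference; returns 1 if this was the last one and o was freed. */
int obj_put(struct table *t, struct obj *o)
{
	int last;

	pthread_mutex_lock(&t->lock);
	last = (--o->refcnt == 0);
	pthread_mutex_unlock(&t->lock);
	if (last)
		free(o);
	return last;
}

/* Unlink from the table and drop the table's reference. */
int table_remove(struct table *t, int key)
{
	struct obj **p, *o = NULL;

	pthread_mutex_lock(&t->lock);
	for (p = &t->head; *p; p = &(*p)->next)
		if ((*p)->key == key) {
			o = *p;
			*p = o->next;
			break;
		}
	pthread_mutex_unlock(&t->lock);
	return o ? obj_put(t, o) : -1;
}
```

Because the table holds its own reference while the object is linked, a concurrent table_remove() cannot free an object that another thread obtained through table_find() and has not yet put.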
     

08 Apr, 2015

9 commits

  • …d of the default value 10

    Since sched->replay_repeat is set to 10 as default, the sched->run_avg,
    sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use
    10 to calculate their value.

    However, the replay_repeat can be changed to other value by using -r
    option, so the calculation above should use replay_repeat to achieve
    more accurate results instead of the default value 10.

    Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Yunlong Song
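
The effect of this change can be sketched as a running average whose weight is the repeat count rather than a hard-coded 10. The function name and the use of doubles are assumptions made for illustration; seeding the average with the first sample is also an assumption, chosen because it reproduces the first two rounds of replay output quoted in this log (50.198 and 219.099 giving ravg 67.09 with the default of 10).

```c
/*
 * Fold one more run time (delta) into the running average.
 * replay_repeat is the value given with -r and must be at least 1.
 */
double update_run_avg(double run_avg, double delta, unsigned int replay_repeat)
{
	if (run_avg == 0.0)
		return delta;	/* the first run seeds the average */
	return (run_avg * (replay_repeat - 1) + delta) / replay_repeat;
}
```

With -r 5, the same two samples would give (50.198 * 4 + 219.099) / 5 instead, which is the kind of difference the commit is about.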
     
  • Allow perf.data to be used when it is not owned by current user or root.

    Example:

    $ ls -al perf.data
    -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
    $ sudo id
    uid=0(root) gid=0(root) groups=0(root),64(pkcs11)

    Before this patch:

    $ sudo perf sched replay -f
    run measurement overhead: 98 nsecs
    sleep measurement overhead: 52909 nsecs
    the run test took 1000015 nsecs
    the sleep test took 1054253 nsecs
    File perf.data not owned by current user or root (use -f to override)

    As shown above, the -f option does not work at all.

    After this patch:

    $ sudo perf sched replay -f
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 40514 nsecs
    the run test took 1000003 nsecs
    the sleep test took 1056098 nsecs
    nr_run_events: 10
    nr_sleep_events: 1562
    nr_wakeup_events: 5
    task 0 ( :1: 1), nr_events: 1
    task 1 ( :2: 2), nr_events: 1
    task 2 ( :3: 3), nr_events: 1
    ...
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    ------------------------------------------------------------
    #1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
    #2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
    #3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
    #4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
    #5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
    #6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
    #7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
    #8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
    #9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
    #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45

    As shown above, the -f option really works now.

    Besides replay, the -f option also works for latency and map.

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
     
  • The soft maximum number of open files for a calling process is 1024,
    which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
    hard maximum number of open files for a calling process is 4096, which
    is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.

    Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
    RLIMIT_NOFILE in include/asm-generic/resource.h.

    And the soft maximum number finally decides the limitation of the
    maximum files which are allowed to be opened.

    That is to say a process can use at most 1024 file descriptors for its
    opened files, or an EMFILE error will happen.

    This error can be fixed by increasing the soft maximum number, under the
    constraint that the soft maximum number can not exceed the hard maximum
    number, or both soft and hard maximum number should be increased
    simultaneously with privilege.

    For perf sched replay, it uses sys_perf_event_open to create the file
    descriptor for each of the tasks in order to handle information of perf
    events.

    That is to say each task needs a unique file descriptor. In x86_64,
    there may be over 1024 or 4096 tasks corresponding to the record in
    perf.data, so there are not enough file descriptors available.

    As a result, EMFILE error happens and stops the replay process. To solve
    this problem, we adaptively increase the soft and hard maximum number of
    open files with a '-f' option.

    Example:

    Test environment: x86_64 with 160 cores

    $ cat /proc/sys/kernel/pid_max
    163840
    $ cat /proc/sys/fs/file-max
    6815744
    $ ulimit -Sn
    1024
    $ ulimit -Hn
    4096

    Before this patch:

    $ perf sched replay
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    Error: sys_perf_event_open() syscall returned with -1 (Too many open
    files)

    After this patch:

    $ perf sched replay
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    Error: sys_perf_event_open() syscall returned with -1 (Too many open
    files)
    Have a try with -f option

    $ perf sched replay -f
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    ------------------------------------------------------------
    #1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
    #2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
    #3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
    #4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
    #5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
    #6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
    #7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
    #8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
    #9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
    #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
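
Raising the open-file limit from inside a process is done with getrlimit(2) and setrlimit(2). The sketch below is an assumption-laden simplification: the function name is made up, and it only raises the soft limit up to the existing hard limit, which needs no privilege. The commit itself also raises the hard limit when required, and that part does need privilege.

```c
#include <sys/resource.h>

/*
 * Try to make room for `extra` more file descriptors by raising the
 * soft RLIMIT_NOFILE, capped at the current hard limit.
 * Returns 0 on success, -1 on failure (errno set by the system call).
 */
int raise_nofile_soft_limit(rlim_t extra)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_NOFILE, &lim) == -1)
		return -1;
	if (lim.rlim_cur == RLIM_INFINITY)
		return 0;	/* nothing to raise */
	if (lim.rlim_max != RLIM_INFINITY && lim.rlim_max - lim.rlim_cur < extra)
		lim.rlim_cur = lim.rlim_max;	/* cap at the hard limit */
	else
		lim.rlim_cur += extra;
	return setrlimit(RLIMIT_NOFILE, &lim);
}
```

On the test machine shown in the commit message (soft 1024, hard 4096), this alone would allow up to 4096 descriptors; going beyond that is where the privileged path comes in.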
     
  • Since there is sem_wait for each task in the wait_for_tasks(), e.g.
    sem_wait(&task->work_done_sem).

    The sem_wait can continue only when work_done_sem is greater than 0, or
    it will be blocked.

    For perf sched replay, one task may sem_post the work_done_sem of
    another task, which causes the work_done_sem of that task processed in a
    reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

    This sequence simulates the sched process of the running tasks at the
    time when perf sched record runs.

    As a result, all the tasks are required and their threads must be
    successfully created.

    If any one (task A) of the tasks fails to create its thread, then
    another task (task B), whose work_done_sem needs sem_post from that
    failed task A, will likely block in sem_wait.

    And this is a dead halt, since task B's thread_func cannot continue at
    all.

    To solve this problem, perf sched replay should exit once any task fails
    to create its thread.

    Example:

    Test environment: x86_64 with 160 cores

    Before this patch:

    $ perf sched replay
    ...
    Error: sys_perf_event_open() syscall returned with -1 (Too many open
    files)
    ------------------------------------------------------------ : 0), nr_events: 10
    Error: sys_perf_event_open() syscall returned with -1 (Too many open
    files)
    $

    As shown above, perf sched replay finishes the process after printing an
    error message and does not block itself.

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
     
  • The pr_err in self_open_counters() prints error message to stderr.
    Unlike stdout, stderr uses memory buffer on the stack of each calling
    process.

    The pr_err in self_open_counters() works in a thread called thread_func
    created in function create_tasks, which concurrently creates
    sched->nr_tasks threads.

    If the error happens and pr_err prints the error message in each of
    these threads, the stack size of the perf process (default is 8192
    kbytes) will quickly run out and the segmentation fault will happen
    then.

    To solve this problem, pr_err with self_open_counters() should be moved
    from newly created threads to the old main thread of the perf process.
    Then the pr_err can work in a stable situation without the strange
    segmentation fault problem.

    Example:

    Test environment: x86_64 with 160 cores

    Before this patch:

    $ perf sched replay
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    Segmentation fault

    After this patch:

    $ perf sched replay
    ...
    task 1549 ( :163132: 163132), nr_events: 1
    task 1550 ( :163540: 163540), nr_events: 1
    task 1551 ( : 0), nr_events: 10
    ...

    As shown above, the result continues without any segmentation fault.

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
     
  • …fferent pid_max configurations

    Although the memory of pid_to_task can be allocated via calloc according
    to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
    pid_max is changed after 'perf sched record' has created its perf.data.

    If the new pid_max configured in 'perf sched replay' is smaller than the
    old pid_max configured in 'perf sched record', then it will cause the
    assertion failure problem.

    To solve this problem, we realloc the memory of pid_to_task stepwise
    once the passed-in pid parameter in register_pid is larger than the
    current pid_max.

    Example:

    Test environment: x86_64 with 160 cores

    $ cat /proc/sys/kernel/pid_max
    163840
    $ perf sched record ls
    $ echo 5000 > /proc/sys/kernel/pid_max
    $ cat /proc/sys/kernel/pid_max
    5000

    Before this patch:

    $ perf sched replay
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 55356 nsecs
    the run test took 1000011 nsecs
    the sleep test took 1060940 nsecs
    perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
    long)pid_max)' failed.
    Aborted

    After this patch:

    $ perf sched replay
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 55611 nsecs
    the run test took 1000026 nsecs
    the sleep test took 1060486 nsecs
    nr_run_events: 10
    nr_sleep_events: 1562
    nr_wakeup_events: 5
    task 0 ( :1: 1), nr_events: 1
    task 1 ( :2: 2), nr_events: 1
    task 2 ( :3: 3), nr_events: 1
    task 3 ( :5: 5), nr_events: 1
    ...

    Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Yunlong Song
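
Growing the pid table on demand can be sketched as follows. The struct pid_table wrapper and the function name are hypothetical; the commit applies the same idea directly to pid_to_task inside register_pid.

```c
#include <stdlib.h>

struct task_desc;	/* opaque in this sketch */

struct pid_table {
	struct task_desc **slots;
	size_t nr;		/* plays the role of pid_max */
};

/*
 * Make sure slots[pid] exists, growing the array if needed and
 * clearing the new slots. Returns 0 on success, -1 if out of memory.
 */
int pid_table_reserve(struct pid_table *t, size_t pid)
{
	struct task_desc **p;

	if (pid < t->nr)
		return 0;
	p = realloc(t->slots, (pid + 1) * sizeof(*p));
	if (!p)
		return -1;
	while (t->nr <= pid)
		p[t->nr++] = NULL;	/* realloc does not zero new memory */
	t->slots = p;
	return 0;
}
```

A perf.data recorded with pid_max at 163840 and replayed with pid_max at 5000 then simply grows the table when pid 163132 shows up, instead of hitting the assertion.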
     
  • …nexpected change of pid_max

    The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
    is in a permanent and preset way, and it has two problems:

    Problem 1: If the pid_max, which is the max number of pids in the
    system, is much smaller than MAX_PID (1024*1000), then it causes a waste
    of stack memory. This may happen in the case where the number of cpu
    cores is much smaller than 1000.

    Problem 2: If the pid_max is changed from the default value to a value
    larger than MAX_PID, then it will cause assertion failure problem. The
    maximum value of pid_max can be set to pid_max_max (see pidmap_init
    defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64,
    PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This
    value is much larger than MAX_PID, and will take up 32768 Kbytes
    (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much
    larger than the default 8192 Kbytes of the stack size of calling
    process.

    Due to these two problems, we use calloc to allocate the memory of
    pid_to_task dynamically.

    Example:

    Test environment: x86_64 with 160 cores

    $ cat /proc/sys/kernel/pid_max
    163840
    $ echo 1025000 > /proc/sys/kernel/pid_max
    $ cat /proc/sys/kernel/pid_max
    1025000

    Run some applications until the pid of some process is greater than
    the value of MAX_PID (1024*1000).

    Before this patch:

    $ perf sched replay
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 55480 nsecs
    the run test took 1000008 nsecs
    the sleep test took 1063151 nsecs
    perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
    failed.
    Aborted

    After this patch:

    $ perf sched replay
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 55435 nsecs
    the run test took 1000004 nsecs
    the sleep test took 1059312 nsecs
    nr_run_events: 10
    nr_sleep_events: 1562
    nr_wakeup_events: 5
    task 0 ( :1: 1), nr_events: 1
    task 1 ( :2: 2), nr_events: 1
    task 2 ( :3: 3), nr_events: 1
    task 3 ( :5: 5), nr_events: 1
    ...

    Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Yunlong Song
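
A rough sketch of sizing the table from the running system's pid_max, together with the size arithmetic from the commit message. The helper names and the fallback value are assumptions for illustration; perf has its own helpers for reading sysctl values.

```c
#include <stdio.h>
#include <stdlib.h>

#define FALLBACK_MAX_PID (1024 * 1000)	/* used if the sysctl cannot be read */

struct task_desc;	/* opaque in this sketch */

/* Read /proc/sys/kernel/pid_max, falling back to a fixed value. */
long read_pid_max(void)
{
	long v = FALLBACK_MAX_PID;
	FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

	if (f) {
		if (fscanf(f, "%ld", &v) != 1 || v <= 0)
			v = FALLBACK_MAX_PID;
		fclose(f);
	}
	return v;
}

/* Size in Kbytes of a table holding one pointer per possible pid. */
size_t table_kbytes(size_t nr_pids)
{
	return nr_pids * sizeof(void *) / 1024;
}

/* Heap-allocate a zeroed table instead of a fixed-size array. */
struct task_desc **alloc_pid_to_task(long pid_max)
{
	return calloc((size_t)pid_max, sizeof(struct task_desc *));
}
```

On x86_64, table_kbytes(4 * 1024 * 1024) is 32768, the figure quoted above, which is four times the default 8192 Kbytes stack limit.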
     
  • Current MAX_PID is only 65536, which will cause assertion failure problem
    when CPU cores are more than 64 in x86_64.

    This is because the pid_max value in x86_64 is at least
    PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
    defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
    include/linux/threads.h).

    Thus for MAX_PID = 65536, the corresponding number of CPU cores is
    65536/1024=64. This is obviously not enough at all for x86_64, and will
    cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the
    code.

    We increase MAX_PID value from 65536 to 1024*1000, which can be used in
    x86_64 with 1000 cores.

    This number is finally decided according to the limitation of stack size
    of calling process.

    Use 'ulimit -a', the result shows the stack size of any process is 8192
    Kbytes, which is defined in include/uapi/linux/resource.h (#define
    _STK_LIM (8*1024*1024)).

    Thus we choose a large enough value for MAX_PID that still satisfies
    the limitation of the stack size, i.e., making the perf process take up
    a memory space just smaller than 8192 Kbytes.

    We have calculated and tested that 1024*1000 is OK for MAX_PID.

    This means perf sched replay can now be used with at most 1000 cores in
    x86_64 without any assertion failure problem.

    Example:

    Test environment: x86_64 with 160 cores

    $ cat /proc/sys/kernel/pid_max
    163840

    Before this patch:

    $ perf sched replay
    run measurement overhead: 240 nsecs
    sleep measurement overhead: 55379 nsecs
    the run test took 1000004 nsecs
    the sleep test took 1059424 nsecs
    perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
    failed.
    Aborted

    After this patch:

    $ perf sched replay
    run measurement overhead: 221 nsecs
    sleep measurement overhead: 55397 nsecs
    the run test took 999920 nsecs
    the sleep test took 1053313 nsecs
    nr_run_events: 10
    nr_sleep_events: 1562
    nr_wakeup_events: 5
    task 0 ( :1: 1), nr_events: 1
    task 1 ( :2: 2), nr_events: 1
    task 2 ( :3: 3), nr_events: 1
    task 3 ( :5: 5), nr_events: 1
    ...

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
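
The arithmetic behind these numbers can be written down as a small sketch. PIDS_PER_CPU_DEFAULT and the x86_64 PID_MAX_LIMIT value are quoted in this log; the 32768 floor is an assumption about the kernel's usual default pid_max, and the function names are made up.

```c
#define PIDS_PER_CPU_DEFAULT	1024			/* from the commit message */
#define SKETCH_PID_MAX_DEFAULT	0x8000			/* assumed default floor (32768) */
#define SKETCH_PID_MAX_LIMIT	(4 * 1024 * 1024)	/* x86_64 value quoted in this log */

/* pid_max the kernel would pick at boot for a given cpu count. */
long boot_pid_max(long nr_cpus)
{
	long v = PIDS_PER_CPU_DEFAULT * nr_cpus;

	if (v < SKETCH_PID_MAX_DEFAULT)
		v = SKETCH_PID_MAX_DEFAULT;
	if (v > SKETCH_PID_MAX_LIMIT)
		v = SKETCH_PID_MAX_LIMIT;
	return v;
}

/* Largest cpu count whose boot-time pid_max still fits under max_pid. */
long max_cpus_for(long max_pid)
{
	return max_pid / PIDS_PER_CPU_DEFAULT;
}
```

For the 160-core test machine this gives 163840, matching the pid_max shown in the example, while max_cpus_for(65536) is 64 and max_cpus_for(1024 * 1000) is 1000.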
     
  • There is no struct task_task at all; it is a typo in older commits. Fix
    it to what it should be in order to avoid unnecessary
    misunderstanding.

    Signed-off-by: Yunlong Song
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunlong Song
     

11 Mar, 2015

1 commit

  • By keeping pointers to machines, evlist and tool in ordered_events.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-0c6huyaf59mqtm2ek9pmposl@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

03 Mar, 2015

2 commits

  • We were keeping the session around just because we kept pointers to
    struct thread instances, but now we reference count them, so no need
    for deferring the perf_session__delete call to after we traverse the
    work_list entries.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-9agtck6jdr3rebdp39z1lo0e@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • We need to do that to stop accumulating entries in the dead_threads
    linked list, i.e. we were keeping references to threads in struct hists
    that continue to exist even after a thread exited and was removed from
    the machine threads rbtree.

    We still keep the dead_threads list, but just for debugging, allowing us
    to iterate at any given point over the threads that still are referenced
    by things like struct hist_entry.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-3ejvfyed0r7ue61dkurzjux4@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

23 Feb, 2015

1 commit

  • For tools that don't deal with perf.data files, thus do not need to
    use perf_session.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-kglq67gvauq9tak02a4se00r@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

09 Oct, 2014

1 commit

  • Not used here, remove to reduce perf_evsel/hists structs interaction.

    Cc: Adrian Hunter
    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Jean Pihet
    Cc: Jiri Olsa
    Cc: Mike Galbraith
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-cb7wkk4a3jpoovzim914ih3c@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

16 Aug, 2014

1 commit

  • Use strerror_r instead of strerror in error message for thread-safety.

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Naohiro Aota
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140814022247.3545.4564.stgit@kbuild-fedora.novalocal
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     

14 Aug, 2014

2 commits

  • Currently vmlinux_path__init() only tries to find vmlinux file from
    current directory, /boot and some canonical directories with version
    number of the running kernel. This can be a problem when reporting old
    data recorded on a kernel version not running currently.

    We can use --symfs option for this but it's annoying for user to do it
    always. As we already have the info in the perf.data file, it can be
    changed to use it for the search automatically.

    Before:

    $ perf report
    ...
    # Samples: 4K of event 'cpu-clock'
    # Event count (approx.): 1067250000
    #
    # Overhead Command Shared Object Symbol
    # ........ .......... ................. ..............................
    71.87% swapper [kernel.kallsyms] [k] recover_probed_instruction

    After:

    # Overhead Command Shared Object Symbol
    # ........ .......... ................. ....................
    71.87% swapper [kernel.kallsyms] [k] native_safe_halt

    This requires changing the signature of symbol__init() to receive a
    struct perf_session_env *.

    Reported-by: Minchan Kim
    Signed-off-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Minchan Kim
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1407825645-24586-14-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • This prepares for fixing dso__load_kernel_sym(). It needs session
    info before calling symbol__init().

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Minchan Kim
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1407825645-24586-10-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

12 Aug, 2014

1 commit

  • The time ordering is generic for all kinds of events, so use the generic
    name 'ordered_events' for the ordered_samples bool in the perf_tool struct.

    No functional change was intended.

    Signed-off-by: Jiri Olsa
    Acked-by: David Ahern
    Cc: Corey Ashford
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jean Pihet
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-07mrqzcuhsks9wfmxrzsvemz@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

18 Jul, 2014

1 commit

  • In commit a21b0b354d4a ('perf: Introduce a flag to enable
    close-on-exec in perf_event_open()'), the PERF_FLAG_FD_CLOEXEC flag
    was added to the perf_event_open(2) syscall to allow userspace
    to atomically enable close-on-exec behavior when creating
    the file descriptor.

    This patch makes the perf tools use the new flag if it is supported
    by the kernel, so that the event file descriptors are
    automatically closed if the perf tool execs a sub-command.

    Signed-off-by: Yann Droneaud
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/1404160127-7475-1-git-send-email-ydroneaud@opteya.com
    Signed-off-by: Jiri Olsa

    Yann Droneaud
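
The "use the flag if supported" idea in the commit above can be sketched as a try-then-fall-back helper. This is an illustration only, not the code from the patch: `open_fn`, `open_cloexec()`, `old_kernel_open()` and `SKETCH_FLAG_CLOEXEC` are hypothetical names, and the opener callback stands in for a real perf_event_open(2) call. It assumes that a kernel which does not know the flag rejects it with EINVAL.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a call such as perf_event_open(2). */
typedef int (*open_fn)(unsigned long flags);

/* Illustrative flag bit; real code would use PERF_FLAG_FD_CLOEXEC. */
#define SKETCH_FLAG_CLOEXEC (1UL << 3)

/*
 * Try to open with the close-on-exec flag first. If the flag is rejected
 * with EINVAL (assumed to mean "unknown flag"), retry without it and set
 * FD_CLOEXEC afterwards with fcntl(). The fallback path is not atomic.
 */
static inline int open_cloexec(open_fn do_open, unsigned long flags)
{
	int fd = do_open(flags | SKETCH_FLAG_CLOEXEC);

	if (fd >= 0 || errno != EINVAL)
		return fd;

	fd = do_open(flags);
	if (fd >= 0 && fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Demo opener that behaves like a kernel without the flag. */
static inline int old_kernel_open(unsigned long flags)
{
	if (flags & SKETCH_FLAG_CLOEXEC) {
		errno = EINVAL;
		return -1;
	}
	return open("/dev/null", O_RDONLY);
}
```

Calling `open_cloexec(old_kernel_open, 0)` exercises the fallback path: the first attempt fails with EINVAL, the second succeeds, and the descriptor is then marked close-on-exec by hand.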
     

17 Jul, 2014

1 commit

  • The value used for unknown pids cannot be zero because that is used by
    the "idle" task.

    Use -1 instead. Also handle the unknown pid case when creating map
    groups.

    Note that threads with an unknown pid should not occur because fork (or
    synthesized) events precede the thread's existence.

    Signed-off-by: Adrian Hunter
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1405332185-4050-2-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     

01 Jun, 2014

1 commit

  • There is some duplicated code for counting the number of samples. Add
    hists__inc_nr_samples() and reuse it.

    Suggested-by: Jiri Olsa
    Signed-off-by: Namhyung Kim
    Link: http://lkml.kernel.org/r/1401335910-16832-2-git-send-email-namhyung@kernel.org
    Signed-off-by: Jiri Olsa

    Namhyung Kim
     

16 May, 2014

2 commits


13 May, 2014

1 commit

  • trace_sched_wakeup(.success) is a dead argument and has been for ages,
    the only reason it's still there is because of brain dead software, which
    apparently includes perf tools.

    There's a few more instances in pearly snake shit, but that's not
    supported as far as I care anyhow, so let that bitrot.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140512181946.GG13467@laptop.programming.kicks-ass.net
    Signed-off-by: Jiri Olsa

    Peter Zijlstra
     

12 May, 2014

3 commits

  • In the output of perf sched map, each thread shortname is explained
    the first time it appears.

    Example:
    *A0 228836.978985 secs A0 => perf:23032
    *. A0 228836.979016 secs B0 => swapper:0
    . *C0 228836.979099 secs C0 => migration/3:22
    *A0 . C0 228836.979115 secs
    A0 . *. 228836.979115 secs

    But B0, which is explained as swapper:0, does not appear in the
    left part of the output. Instead, '.' is used as the shortname of
    swapper:0, so the comment "B0 => swapper:0" is hard to
    understand.

    This patch clarifies the output of perf sched map by not allocating
    a letter-number shortname for swapper:0 and printing ". => swapper:0"
    as the explanation for swapper:0.

    Example:
    *A0 228836.978985 secs A0 => perf:23032
    * . A0 228836.979016 secs . => swapper:0
    . *B0 228836.979099 secs B0 => migration/3:22
    *A0 . B0 228836.979115 secs
    A0 . * . 228836.979115 secs
    A0 *C0 . 228836.979225 secs C0 => ksoftirqd/2:18
    A0 *D0 . 228836.979236 secs D0 => rcu_sched:7

    Signed-off-by: Dongsheng
    Acked-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/1399354741-19522-1-git-send-email-yangds.fnst@cn.fujitsu.com
    [ small style fixes to make checkpatch happy ]
    Signed-off-by: Jiri Olsa

    Dongsheng
     
  • Currently, TASK_STATE_TO_CHAR_STR in kernel space has already been expanded to RSDTtZXxKWP,
    but it is still RSDTtZX in the perf sched tool.

    This patch updates TASK_STATE_TO_CHAR_STR to the new kernel value.

    Signed-off-by: Dongsheng
    Link: http://lkml.kernel.org/r/6d2f55dc1e02c1e29a5d70bfeb9d6e8863caf2aa.1399273302.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Jiri Olsa

    Dongsheng
     
  • The perf sched tool should record and process the
    sched:sched_wakeup_new event, but currently it only has the processing
    function for it; the record subcommand does not record it.

    This patch adds -e sched:sched_wakeup_new to perf sched record.

    Signed-off-by: Dongsheng
    Link: http://lkml.kernel.org/r/710c6edd2162b2cea1711443f54de47c0210d9fd.1399273302.git.yangds.fnst@cn.fujitsu.com
    Signed-off-by: Jiri Olsa

    Dongsheng
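
To show why the length of TASK_STATE_TO_CHAR_STR matters in the commit above, here is a minimal sketch of turning a task state bitmask into its one-letter code. The helper name `task_state_char()` is hypothetical, and the mapping is an assumption: state 0 maps to the first letter, and otherwise the lowest set bit n maps to the letter at index n + 1. States past the end of the string fall back to '?'.

```c
#include <stddef.h>
#include <strings.h> /* ffs() */

#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"

/*
 * Map a task state bitmask to a single letter (assumed convention):
 * index 0 is used for state 0, otherwise the 1-based position of the
 * lowest set bit selects the letter. Unknown bits yield '?'.
 */
static inline char task_state_char(int state)
{
	const char str[] = TASK_STATE_TO_CHAR_STR;
	unsigned int bit = state ? (unsigned int)ffs(state) : 0;

	return bit < sizeof(str) - 1 ? str[bit] : '?';
}
```

With the shorter string RSDTtZX, a state such as 64 would fall off the end of the table and be reported as '?'; with the expanded string it resolves to 'x'.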
     

16 Apr, 2014

1 commit