06 Feb, 2019

2 commits

  • CoreSight was the only client of the PMU's set_drv_config() API. Now
    that CoreSight no longer needs it, remove it from the code base.

    Signed-off-by: Mathieu Poirier
    Acked-by: Suzuki K Poulouse
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Martin Schwidefsky
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190131184714.20388-8-mathieu.poirier@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Mathieu Poirier
     
  • Lots of places get the map.h file indirectly. Since we're going to
    remove it from machine.h, those places need to include it directly, so
    do it now, before we remove that dep.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-ob8jehdjda8h5jsrv9dqj9tf@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

25 Jan, 2019

1 commit

  • At the cost of an extra pointer, we can avoid the O(logN) cost of
    finding the first element in the tree (smallest node), which is
    something heavily required for histograms. Specifically, the following
    are converted to rb_root_cached, and users accordingly:

    hist::entries_in_array
    hist::entries_in
    hist::entries
    hist::entries_collapsed
    hist_entry::hroot_in
    hist_entry::hroot_out

    Signed-off-by: Davidlohr Bueso
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
    [ Added some missing conversions to rb_first_cached() ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Davidlohr Bueso
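
The idea behind rb_root_cached can be sketched outside the kernel as follows. This is a minimal illustration, not the kernel implementation: a plain unbalanced binary search tree stands in for the red-black tree, and the names struct node, struct root_cached, insert_cached() and first_cached() are hypothetical. It only shows how one extra pointer, updated on insert, makes fetching the smallest node O(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; an unbalanced BST stands in for the rbtree. */
struct node {
	long key;
	struct node *left, *right;
};

/* Root plus the one extra pointer: the cached smallest node. */
struct root_cached {
	struct node *root;
	struct node *leftmost;
};

/*
 * Insert a node and keep the cache current: the new node is the
 * smallest one only if the descent never turned right.
 */
void insert_cached(struct root_cached *t, struct node *n)
{
	struct node **link;
	bool leftmost = true;

	assert(t != NULL && n != NULL);
	link = &t->root;
	n->left = n->right = NULL;

	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = false;
		}
	}
	*link = n;
	if (leftmost)
		t->leftmost = n;
}

/* O(1): no walk down the left spine of the tree. */
struct node *first_cached(const struct root_cached *t)
{
	return t->leftmost;
}
```

Erase is omitted from the sketch; a real implementation also has to advance the cached pointer to the next node when the smallest one is removed.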
     

22 Jan, 2019

1 commit


09 Jan, 2019

1 commit

  • This restriction is not present in 'perf report' and since 'perf top'
    uses the same hists browser, remove it from it as well.

    With this we create per event buckets with callchain trees, so that

    # perf top --sort dso -g --no-children

    Bucketizes samples by DSO and below it shows the callchains leading to
    functions in this DSO.

    Try also:

    # perf top -e sched:*switch -g --no-children

    To see the callchains leading to sched switches, pressing 'E' to expand
    all one can quickly see the most common scheduler switches and what
    leads to them, for instance, calls to IO, futexes, etc.

    Acked-by: Namhyung Kim
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Link: https://lkml.kernel.org/r/20190107140854.GA28965@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

18 Dec, 2018

12 commits

  • Move the perf_top__reset_sample_counters() call to right after we
    display the counters so we can see the updated numbers for longer.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-o72pyiwt05f3p2juprwmz2jo@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Currently we display the "Too slow to read ring buffer.." helpline only
    in the slow reader thread. This patch also triggers it when the
    processing thread drops samples, because the cause is the same: too
    much data on input.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-bnev2mloavyurmgchcr3o24o@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add drop count to 'perf top' headers:

    # perf top --stdio
    PerfTop: 3549 irqs/sec kernel:51.8% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles:ppp], (all, 8 CPUs)

    # perf top
    Samples: 0 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0 drop: 0/0

    The format is: /

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2lj87zz8tq9ye1ntax3ulw0n@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Drop samples from processing thread if they get behind the latest event
    read from the kernel maps. If it gets behind more than the refresh rate
    (-d option), drop the sample.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-x533ra5c1pgofvbtsizzuydd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
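
The drop rule described above can be sketched as a small predicate. This is an illustration under assumptions, not the code from the patch: the function name, the parameter names and the nanosecond timestamps are hypothetical, and "behind more than the refresh rate" is read as a strict comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Return true when a queued sample is older than the newest event read
 * from the kernel maps by more than the refresh delay (the -d option,
 * in seconds). All names here are hypothetical.
 */
bool sample_is_stale(uint64_t sample_ns, uint64_t last_read_ns,
		     unsigned int delay_secs)
{
	uint64_t limit = (uint64_t)delay_secs * NSEC_PER_SEC;

	/* A sample at or ahead of the last read event is never behind. */
	if (sample_ns >= last_read_ns)
		return false;

	return last_read_ns - sample_ns > limit;
}
```

With a 2 second delay, a sample that is 3 seconds behind would be dropped, while one that is exactly 2 seconds behind would still be processed.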
     
  • So we can get out of hist processing ASAP on user request.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-r8aufbgbixr2f85s3wcoaw9v@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Use condition variable logic to synchronize between the reading and
    processing threads. Currently it's done by having a mutex around the
    rotation code.

    Use a POSIX condition variable to sync both threads after queue rotation:

    Process thread:

    - Detects data
    - Switches queues
    - Sets rotate variable
    - Waits in pthread_cond_wait()

    Read thread:

    - Detects rotate is set
    - Kicks the process thread with a pthread_cond_signal()

    After this, the rotation is safely completed and both threads can
    continue with the new queue.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-3rdeg23rv3brvy1pwt3igvyw@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
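
The handshake listed above can be sketched with plain pthreads. This is a minimal illustration, not the patch itself: struct rotate_sync and both function names are hypothetical, and the queue switching is left out so that only the rotate flag and the wait/signal pair remain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared by the process and read threads. */
struct rotate_sync {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool rotate; /* set by the process thread, cleared by the read thread */
};

#define ROTATE_SYNC_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/*
 * Process thread: called after the queues were switched. Sets the
 * rotate variable and sleeps until the read thread acknowledges it.
 */
void process_wait_rotation(struct rotate_sync *s)
{
	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	s->rotate = true;
	while (s->rotate) /* the loop guards against spurious wakeups */
		pthread_cond_wait(&s->cond, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/*
 * Read thread: called between reads. If rotate is set, clear it and
 * kick the process thread. Returns true when a rotation was acknowledged.
 */
bool reader_ack_rotation(struct rotate_sync *s)
{
	bool acked = false;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	if (s->rotate) {
		s->rotate = false;
		pthread_cond_signal(&s->cond);
		acked = true;
	}
	pthread_mutex_unlock(&s->lock);
	return acked;
}
```

Once reader_ack_rotation() has returned true, the read thread is known to be past the old queue, so the process thread can flush it without racing against new stores.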
     
  • Add a new thread that takes care of hist creation, relieving the
    main reader thread so it can keep the perf mmaps served in time and
    reduce the possibility of losing events.

    The 'perf top' command now spawns 2 extra threads; the data processing
    is as follows:

    1) The main thread reads the data from mmaps and queues them to
    ordered events object;

    2) The processing thread takes the data from the ordered events
    object and creates the initial histogram;

    3) The GUI thread periodically sorts the initial histogram and
    presents it.

    Passing the data between threads 1 and 2 is done by having 2 ordered
    events queues. One is always being stored by thread 1 while the other is
    flushed out in thread 2.

    Passing the data between threads 2 and 3 stays the same as it was
    initially for threads 1 and 3.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-hhf4hllgkmle9wl1aly1jli0@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
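
The two-queue hand-off between threads 1 and 2 can be sketched as below. This is a single-threaded illustration with hypothetical names (struct queue, struct two_queues and the three functions); plain ints stand in for events, the capacity is arbitrary, and the locking that a real multi-threaded version needs is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8 /* arbitrary capacity for the sketch */

/* Hypothetical stand-in for one ordered events queue. */
struct queue {
	int events[QUEUE_CAP];
	size_t nr;
};

/* Two queues: 'in' is being stored to, the other one is flushed out. */
struct two_queues {
	struct queue data[2];
	struct queue *in;
};

void two_queues_init(struct two_queues *q)
{
	q->data[0].nr = 0;
	q->data[1].nr = 0;
	q->in = &q->data[0];
}

/* Thread 1 side: store an event in the input queue; -1 if it is full. */
int two_queues_store(struct two_queues *q, int event)
{
	assert(q->in != NULL);
	if (q->in->nr == QUEUE_CAP)
		return -1;
	q->in->events[q->in->nr++] = event;
	return 0;
}

/* Thread 2 side: switch the input queue and return the old one to flush. */
struct queue *two_queues_rotate(struct two_queues *q)
{
	struct queue *out = q->in;

	q->in = (out == &q->data[0]) ? &q->data[1] : &q->data[0];
	return out;
}
```

After a rotation the caller owns the returned queue exclusively and is expected to empty it before the next rotation hands it back to the storing side.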
     
  • We can't display the UI box saying that we are slow in the reader
    thread. That will make 'perf top' even slower and the user even more
    angry ;-)

    Move the UI box message from the reader thread to the UI thread and
    change it to a helpline, so there's no need to 'press any key'.

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-x4k0iuw7tt6mywsaguq6jfwu@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add a 'lost count' to 'perf top' headers:

    # perf top --stdio
    PerfTop: 3850 irqs/sec kernel:49.0% exact: 100.0% lost: 0/0 [4000Hz cycles:ppp], (all, 8 CPUs)

    # perf top
    Samples: 0 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0

    The format is: /

    Signed-off-by: Jiri Olsa
    Acked-by: David S. Miller
    Acked-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/n/tip-zo11rn270gij5jtp8fknpf8u@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • The default timeout of 500ms for parsing /proc/<pid>/maps files is too
    short for profiling many of our services.

    This can be overridden by passing --proc-map-timeout to the relevant
    command but it'd be nice to globally increase our default value.

    This patch permits setting a different default with the
    core.proc-map-timeout config file parameter.

    Signed-off-by: Mark Drayton
    Acked-by: Song Liu
    Acked-by: Namhyung Kim
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181204203420.1683114-1-mbd@fb.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Mark Drayton
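
As an illustration, a config file entry for the parameter named above might look like the fragment below. The value 2000 is an arbitrary example, and it is assumed here that the value uses the same unit (milliseconds) as the 500ms default.

```ini
[core]
	proc-map-timeout = 2000
```

The --proc-map-timeout command line option mentioned above remains available for overriding the value on a single run.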
     
  • Go over the tools/ files that are maintained in Arnaldo's tree and
    fix common typos: half of them were in comments, the other half
    in JSON files.

    No change in functionality intended.

    Committer notes:

    This was split from a larger patch as there is code that is,
    additionally, maintained outside the kernel tree, so to ease
    cherry-picking and/or backporting, split this into multiple patches.

    Just typos in comments, no need to backport, reducing the possibility of
    backporting artifacts.

    Signed-off-by: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ingo Molnar
     
  • This basically replicates what was done for 'perf report' in:

    b226a5a72901 ("perf report: Allow user to specify path to kallsyms file")

    This should help with resolving eBPF symbols, that are in kallsyms but,
    of course, not in vmlinux.

    Reported-by: Ivan Babrou
    Tested-by: Ivan Babrou
    Cc: Adrian Hunter
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David Ahern
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-x52mx1ybq8128rtg9hjrj5qk@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

06 Nov, 2018

1 commit

  • 'perf report' already supports displaying LBR stats (such as cycles,
    predicted%) in callchain entries.

    For example:

    $ perf report --branch-history --stdio

    --1.01%--intel_idle mwait.h:29
    intel_idle cpufeature.h:164 (cycles:5)
    intel_idle cpufeature.h:164 (predicted:76.4%)
    intel_idle mwait.h:102 (cycles:41)
    intel_idle current.h:15

    'perf top', however, doesn't support that.

    For example:

    $ perf top -a -b --call-graph branch

    - 13.86% 0.23% [kernel] [k] __x86_indirect_thunk_rax
    - 13.65% __x86_indirect_thunk_rax
    + 1.69% do_syscall_64
    + 1.68% do_select
    + 1.41% ktime_get
    + 0.70% __schedule
    + 0.62% do_sys_poll
    0.58% __x86_indirect_thunk_rax

    Actually it's very easy to enable this feature in 'perf top'.

    With this patch, the result is:

    $ perf top -a -b --call-graph branch

    $ - 13.58% 0.00% [kernel] [k] __x86_indirect_thunk_rax
    $ - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
    $ + 1.78% do_select (cycles:2)
    $ + 1.68% perf_pmu_disable.part.99 (cycles:1)
    $ + 1.45% ___sys_recvmsg (cycles:25)
    $ + 0.81% unix_stream_sendmsg (cycles:18)
    $ + 0.80% ktime_get (cycles:400)
    $ 0.58% pick_next_task_fair (cycles:47)
    $ + 0.56% i915_request_retire (cycles:2)
    $ + 0.52% do_sys_poll (cycles:4)

    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1540983995-20462-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jin Yao
     

31 Oct, 2018

2 commits

  • If events are coming in at a rate such that the event processing thread
    can barely keep up, our initial run of the event ring will almost never
    terminate and this delays the starting of the display thread.

    The screen basically stays black until the event thread can get out of
    its endless loop.

    Therefore, start the display thread before we start processing the ring
    buffer.

    This also makes sure that we always have the user requested real time
    setting engaged when processing the ring.

    Signed-off-by: David S. Miller
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20181030.223003.2242527041807905962.davem@davemloft.net
    Signed-off-by: Arnaldo Carvalho de Melo

    David Miller
     
  • Enabling --overwrite mode allows us to use just the most recent
    records, which helps in high core count machines such as Knights
    Landing/Mill, but right now is being disabled by default as the pausing
    used in this technique is leading to loss of metadata events such as
    PERF_RECORD_MMAP which makes 'perf top' unable to resolve samples,
    leading to lots of unknown samples appearing on the UI.

    Enabling this may be useful if you are on such machines and profiling a
    workload that doesn't create short-lived threads and/or doesn't use
    many executable mmap operations.

    Work is being planned to solve this situation; till then, this will
    remain disabled by default.

    Reported-by: David Miller
    Acked-by: Kan Liang
    Link: https://lkml.kernel.org/r/4f84468f-37d9-cf1b-12c1-514ef74b6a48@linux.intel.com
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode")
    Link: https://lkml.kernel.org/n/tip-ehvf77vi1si9409r7p4wx788@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Oct, 2018

1 commit

  • In ebebbf082357 ("perf top: Switch default mode to overwrite mode") we
    forgot to leave a way to disable that new default. Add a --overwrite
    option that can be disabled using --no-overwrite, since the code is
    already written in such a way that we can readily disable this mode.

    This is useful when investigating bugs with this mode like the recent
    report from David Miller where lots of unknown symbols appear due to
    disabling the events while processing them which disables all record
    types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve
    maps when we lose PERF_RECORD_MMAP records.

    This can be easily seen while building a kernel, when there are lots of
    short lived processes.

    Reported-by: David Miller
    Acked-by: Kan Liang
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode")
    Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcdhie@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

25 Jul, 2018

1 commit

  • We want to allow having mixed events with/without callchains, not
    using a global flag to show callchains, but allowing suppressing
    callchains when they are present.

    So invert the logic of the last parameter to hists__fprint() to
    that effect.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-ohqyisr6qge79qa95ojslptx@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

04 Jun, 2018

9 commits


18 May, 2018

1 commit


27 Apr, 2018

2 commits

  • To further simplify checking if symbols are available for a given map
    and to reduce the number of users of MAP__{FUNCTION,VARIABLE}.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-iyfoyvbfdti5uehgpjum3qrq@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • To replace longer code sequences in various places.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-tlk3klbkfyjrbfjvryyznfju@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

26 Apr, 2018

1 commit

  • Shorter form to figure out if a given map is the kernel one. It also
    reduces the amount of code accessing MAP__{FUNCTION,VARIABLE}, which
    should go away at some point.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-rn8pexelsxpx92ce3elu3wiw@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

21 Mar, 2018

1 commit


17 Mar, 2018

1 commit

  • When trying to add the "call-graph" variable for top into the
    .perfconfig file, like:

    [top]
    call-graph = fp

    I found that perf_top_config() does not parse this variable.

    Fix it by calling perf_default_config() when the top.call-graph variable
    is set.

    Signed-off-by: Yisheng Xie
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Fixes: b8cbb349061e ("perf config: Bring perf_default_config to the very beginning at main()")
    Link: http://lkml.kernel.org/r/1520853957-36106-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yisheng Xie
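
The shape of the fix can be sketched as a command-specific handler that forwards the key to the generic one instead of silently ignoring it. This is an illustration only: struct settings, default_config() and top_config() are hypothetical stand-ins for the perf functions named above, and the "call-graph.record-mode" key used for forwarding is an assumption about what the generic handler understands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical settings object for the sketch. */
struct settings {
	const char *callgraph_mode;
};

/* Stand-in for the generic handler that knows the call-graph keys. */
int default_config(const char *var, const char *value, struct settings *s)
{
	assert(var != NULL && s != NULL);
	if (!strcmp(var, "call-graph.record-mode"))
		s->callgraph_mode = value;
	return 0;
}

/* Stand-in for the 'top' handler: forward top.call-graph, don't drop it. */
int top_config(const char *var, const char *value, struct settings *s)
{
	if (!strcmp(var, "top.call-graph"))
		return default_config("call-graph.record-mode", value, s);

	return 0; /* handling of other top.* keys is elided */
}
```

With this in place, the `call-graph = fp` line from the `[top]` section shown above ends up in the same setting that the generic call-graph key would fill.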
     

08 Mar, 2018

3 commits

  • It isn't necessary to pass the 'start', 'end' and 'overwrite' arguments
    to perf_mmap__read_init(). The data is stored in the struct perf_mmap.

    Discard the parameters.

    Signed-off-by: Kan Liang
    Suggested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1520350567-80082-8-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
  • It isn't necessary to pass the 'overwrite', 'start' and 'end' arguments
    to perf_mmap__read_event(). Discard them.

    Signed-off-by: Kan Liang
    Suggested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1520350567-80082-7-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
  • It isn't necessary to pass the 'overwrite' argument to
    perf_mmap__consume(). Discard it.

    Signed-off-by: Kan Liang
    Suggested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/1520350567-80082-6-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang