15 Oct, 2020

8 commits

  • The metrics "LLC Ld Miss" and "Load Dram" overlap in the items they
    account for:

    "LLC Ld Miss" = "lcl_dram" + "rmt_dram" + "rmt_hit" + "rmt_hitm"
    "Load Dram" = "lcl_dram" + "rmt_dram"

    Furthermore, the metric "LLC Ld Miss" is not useful for showing
    statistics because it contains only a summary value and cannot give
    a detailed breakdown.

    For this reason, add a new metric "RMT Load Hit", which presents the
    remote cache hits; it contains two items:

    "RMT Load Hit" = remote hit ("rmt_hit") + remote hitm ("rmt_hitm")

    As a result, the metric "LLC Ld Miss" is divided exactly into the two
    metrics "RMT Load Hit" and "Load Dram". It is no longer necessary to
    keep the metric "LLC Ld Miss", so remove it.

    Before:

    # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- LLC --- Load Dram ----
    # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm Ld Miss Lcl Rmt
    # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ....... ........ ........
    #
    0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0
    1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0
    2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0

    After:

    # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
    # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
    # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
    #
    0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0 0
    1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0 0
    2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0 0

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Signed-off-by: Arnaldo Carvalho de Melo
    Link: https://lore.kernel.org/r/20201014050921.5591-9-leo.yan@linaro.org

    Leo Yan
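
The decomposition described in the commit message above can be checked with a minimal sketch. The struct and function names are hypothetical and are not taken from builtin-c2c.c; only the field names mirror the counters quoted in the message.

```c
#include <stdint.h>

/* Hypothetical per-cacheline counters; field names mirror the commit message. */
struct load_stats {
	uint32_t lcl_dram;  /* loads served from local DRAM */
	uint32_t rmt_dram;  /* loads served from remote DRAM */
	uint32_t rmt_hit;   /* loads that hit in a remote cache */
	uint32_t rmt_hitm;  /* loads that hit a modified line in a remote cache */
};

/* Removed summary metric: the sum of all four items. */
static uint32_t llc_ld_miss(const struct load_stats *s)
{
	return s->lcl_dram + s->rmt_dram + s->rmt_hit + s->rmt_hitm;
}

/* New metric: remote cache hits only. */
static uint32_t rmt_load_hit(const struct load_stats *s)
{
	return s->rmt_hit + s->rmt_hitm;
}

/* Existing metric: DRAM accesses only. */
static uint32_t load_dram(const struct load_stats *s)
{
	return s->lcl_dram + s->rmt_dram;
}
```

For any counter values, rmt_load_hit() plus load_dram() equals llc_ld_miss(), which is why the summary column carries no extra information once the two breakdown columns are shown.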
     
  • "rmt_hit" is accounted into two metrics: it is accounted into the
    metric "LLC Ld Miss" (see the function llc_miss() for the calculation
    of "llcmiss"), and it is also accounted into the metric "LLC Load Hit".
    Taken literally, it is contradictory for "rmt_hit" to be accounted
    into both "LLC Ld Miss" (LLC miss) and "LLC Load Hit" (LLC hit).

    This easily causes confusion: "LLC Load Hit" gives the impression
    that all of its items are LLC hits, when in fact "rmt_hit" is an LLC
    miss and a remote cache hit.

    To give the metric "LLC Load Hit" clear semantics, move "rmt_hit" out
    of it so that "LLC Load Hit" contains two items:

    LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm")

    For output alignment, adjust the header for "LLC Load Hit".

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • Replace the header string "Lcl" with "LclHit", which states more
    explicitly that the event type is an LLC local hit.

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively.
    If we want to extend the tool to display these two dimensions under
    any one metric, users cannot understand the semantics from the header
    strings 'Lcl' or 'Rmt' alone.

    To express the meaning of the HITM items explicitly, this patch changes
    the header strings to "LclHitm" and "RmtHitm". These strings are more
    readable and allow the HITM items to be used in further metrics.

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • The metric "LLC Load Hitm" contains two items: "local Hitm" and
    "remote Hitm".

    "local Hitm" means: L3 HIT and was serviced by another processor core
    with a cross core snoop where modified copies were found; there is no
    doubt that "local Hitm" belongs to LLC access.

    But for "remote Hitm", based on the code in util/mem-events, it's the
    event for remote cache HIT and was serviced by another processor core
    with modified copies. Thus the remote Hitm is a remote cache hit and
    is actually an LLC load miss.

    The current display format gives users the impression that "local Hitm"
    and "remote Hitm" both belong to the LLC load, which, as described, is
    not the case.

    This patch changes the header from "LLC Load Hitm" to "Load Hitm", which
    avoids giving the wrong impression that all Hitm items belong to the LLC.

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-5-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • The metrics are not organized according to the memory hierarchy, i.e.
    the tool doesn't order them by memory node from the closest (e.g.
    L1/L2 cache) to the farthest (e.g. L3 cache and DRAM).

    To output the metrics in a friendlier form, this patch reorders them
    according to the memory hierarchy:

    "Core Load Hit" => "LLC Load Hit" => "LLC Ld Miss" => "Load Dram"

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-4-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • The total number of stores is displayed under the metric "Store
    Reference". To use the same format as total records and all loads,
    extract the total stores number into a standalone metric "Total Stores".

    After this patch, the tool shows the summary numbers ("Total records",
    "Total loads", "Total Stores") in a unified form.

    Before:

    # ----------- Cacheline ---------- Tot ----- LLC Load Hitm ----- Total Total ---- Store Reference ---- --- Load Dram ---- LLC ----- Core Load Hit ----- -- LLC Load Hit --
    # Index Address Node PA cnt Hitm Total Lcl Rmt records Loads Total L1Hit L1Miss Lcl Rmt Ld Miss FB L1 L2 Llc Rmt
    # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ........ ....... ....... ....... ....... ........ ........
    #
    0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 0 0 0 548 2615 66 169 0
    1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 0 0 0 187 361 27 11 0
    2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 0 0 0 131 0 10 263 0

    After:

    # ----------- Cacheline ---------- Tot ----- LLC Load Hitm ----- Total Total Total ---- Stores ---- --- Load Dram ---- LLC ----- Core Load Hit ----- -- LLC Load Hit --
    # Index Address Node PA cnt Hitm Total Lcl Rmt records Loads Stores L1Hit L1Miss Lcl Rmt Ld Miss FB L1 L2 Llc Rmt
    # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ........ ....... ....... ....... ....... ........ ........
    #
    0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 0 0 0 548 2615 66 169 0
    1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 0 0 0 187 361 27 11 0
    2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 0 0 0 131 0 10 263 0

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-3-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     
  • When viewing the statistics in "breakdown" mode, it's good to show the
    summary numbers for the total records, all stores and all loads first;
    the subsequent columns can then break them into more detailed items.

    To achieve this, this patch displays the summary numbers for
    records/stores/loads contiguously and places them before the breakdown
    items, which allows users to easily read the summarized statistics.

    Signed-off-by: Leo Yan
    Tested-by: Joe Mario
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20201014050921.5591-2-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     

14 Oct, 2020

1 commit

  • Since commit b027cc6fdf1b ("perf c2c: Fix 'perf c2c record -e list' to
    show the default events used"), the "perf c2c" tool can show the memory
    events properly, so there is no reason to keep suggesting that users
    run "perf mem record -e list" to show events.

    This patch updates the usage text to show memory events with the
    command "perf c2c record -e list".

    Signed-off-by: Leo Yan
    Acked-by: Jiri Olsa
    Acked-by: Ian Rogers
    Signed-off-by: Arnaldo Carvalho de Melo
    Link: https://lore.kernel.org/r/20201011121022.22409-1-leo.yan@linaro.org

    Leo Yan
     

23 Jun, 2020

1 commit


28 May, 2020

1 commit

  • When the event is passed as list, the default events should be listed as
    per 'perf mem record -e list'. Previous behavior is:

    $ perf c2c record -e list
    failed: event 'list' not found, use '-e list' to get list of available events

    Usage: perf c2c record [<options>] [<command>]
    or: perf c2c record [<options>] -- <command> [<options>]

    -e, --event <event> event selector. Use 'perf mem record -e list' to list available events
    $

    New behavior:

    $ perf c2c record -e list
    ldlat-loads : available
    ldlat-stores : available

    v3: is a rebase.
    v2: addresses review comments by Jiri Olsa.

    https://lore.kernel.org/lkml/20191127081844.GH32367@krava/
    Signed-off-by: Ian Rogers
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20200507220604.3391-1-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Rogers
     

06 May, 2020

1 commit


30 Apr, 2020

1 commit

  • Fixes coccicheck warnings:

    tools/perf/builtin-c2c.c:1712:2-3: Unneeded semicolon
    tools/perf/builtin-c2c.c:1928:2-3: Unneeded semicolon
    tools/perf/builtin-c2c.c:2962:2-3: Unneeded semicolon

    Reported-by: Hulk Robot
    Signed-off-by: Zou Wei
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/1588064336-70456-1-git-send-email-zou_wei@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Zou Wei
     

18 Apr, 2020

1 commit

  • With the LBR stitching approach, the reconstructed LBR call stack can
    exceed the HW limitation. However, it may reconstruct invalid call stacks
    in some cases, e.g. exception handling such as setjmp/longjmp. Also, it
    may impact the processing time, especially when the number of samples
    with stitched LBRs is huge.

    Add an option to enable the approach.

    Signed-off-by: Kan Liang
    Reviewed-by: Andi Kleen
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexey Budankov
    Cc: Mathieu Poirier
    Cc: Michael Ellerman
    Cc: Namhyung Kim
    Cc: Pavel Gerasimov
    Cc: Peter Zijlstra
    Cc: Ravi Bangoria
    Cc: Stephane Eranian
    Cc: Vitaly Slobodskoy
    Link: http://lore.kernel.org/lkml/20200319202517.23423-17-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     

15 Jan, 2020

1 commit

  • Commit 722ddfde366f ("perf tools: Fix time sorting") changed - correctly
    so - hist_entry__sort to return int64. Unfortunately several of the
    builtin-c2c.c comparison routines only happened to work due to the cast
    caused by the wrong return type.

    This causes meaningless ordering of both the cacheline list and the
    cacheline details page. E.g. a simple:

    perf c2c record -a sleep 3
    perf c2c report

    will result in a cacheline table like
    =================================================
    Shared Data Cache Line Table
    =================================================
    #
    # ------- Cacheline ---------- Total Tot - LLC Load Hitm - - Store Reference - - Load Dram - LLC Total - Core Load Hit - - LLC Load Hit -
    # Index Address Node PA cnt records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rmt
    # ..... .............. .... ...... ....... ...... ..... ..... ... .... ..... ...... ...... .... ...... ..... ..... ..... ... .... .......

    0 0x7f0d27ffba00 N/A 0 52 0.12% 13 6 7 12 12 0 0 7 14 40 4 16 0 0 0
    1 0x7f0d27ff61c0 N/A 0 6353 14.04% 1475 801 674 779 779 0 0 718 1392 5574 1299 1967 0 115 0
    2 0x7f0d26d3ec80 N/A 0 71 0.15% 16 4 12 13 13 0 0 12 24 58 1 20 0 9 0
    3 0x7f0d26d3ec00 N/A 0 98 0.22% 23 17 6 19 19 0 0 6 12 79 0 40 0 10 0

    i.e. with the list not being ordered by Total Hitm.

    Fixes: 722ddfde366f ("perf tools: Fix time sorting")
    Signed-off-by: Andres Freund
    Tested-by: Michael Petlan
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org # v3.16+
    Link: http://lore.kernel.org/lkml/20200109043030.233746-1-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo

    Andres Freund
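
The kind of accident described above can be reproduced in isolation. The sketch below assumes the compared counters are 32-bit unsigned values, which the commit message does not state; the function names are hypothetical.

```c
#include <stdint.h>

/* Old shape: the wrapped unsigned difference is squeezed through int. */
static int64_t cmp_via_int(uint32_t l, uint32_t r)
{
	/* On common compilers the wrapped value converts to a negative int. */
	int narrowed = (int)(l - r);

	return narrowed;
}

/* After the return type became 64-bit: the wrapped value stays positive. */
static int64_t cmp_via_int64(uint32_t l, uint32_t r)
{
	return l - r; /* uint32_t arithmetic wraps before it is widened */
}

/* One possible repair: widen both operands before subtracting. */
static int64_t cmp_widened(uint32_t l, uint32_t r)
{
	return (int64_t)l - (int64_t)r;
}
```

With l = 1 and r = 2, the first and third functions report "less than" while the second reports "greater than", which is enough to scramble a sort that relies on the sign of the result.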
     

06 Jan, 2020

1 commit

  • Sometimes we're in outer code, like the main hists browser popup menu,
    and the user follows a suggestion about using some hotkey that is
    really handled by hists_browser__run(). Allow calling it with that
    hotkey so that it handles it instead of waiting for the user to press
    one.

    Reviewed-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-xv2l7i6o4urn37nv1h40ryfs@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Oct, 2019

1 commit

  • There is a memory leak problem in the failure paths of
    build_cl_output(), so fix it.

    Signed-off-by: Yunfeng Ye
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Feilong Lin
    Cc: Hu Shiyuan
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/4d3c0178-5482-c313-98e1-f82090d2d456@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Yunfeng Ye
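
A leak in a failure path usually means an early return that skips freeing an earlier allocation. The following is a generic sketch of the cleanup pattern, not the actual build_cl_output() code; the struct, the function and the fail_second parameter (used only to simulate a failed second allocation) are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical output description holding two heap buffers. */
struct output {
	char *fields;
	char *sort;
};

/* Allocate both buffers; on any failure, release what was already taken. */
static int output_init(struct output *out, size_t len, int fail_second)
{
	out->fields = malloc(len);
	if (!out->fields)
		return -1;

	/* fail_second simulates the second allocation failing. */
	out->sort = fail_second ? NULL : malloc(len);
	if (!out->sort)
		goto err_free_fields;

	return 0;

err_free_fields:
	free(out->fields);
	out->fields = NULL;
	return -1;
}
```

Routing every late failure through a label keeps the release logic in one place, so adding a third allocation later only requires one more label.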
     

21 Sep, 2019

1 commit

  • This patch makes perf_session__new() return an error code on failure
    instead of NULL.

    Test Results:

    Before Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    0
    $

    After Fix:

    $ perf c2c report -input
    failed to open nput: No such file or directory

    $ echo $?
    254
    $

    Committer notes:

    Fix the 'perf tests topology' case, where we use TEST_ASSERT_VAL(...,
    session), i.e. we need to pass zero in case of failure. That was the
    case before, when NULL was returned by perf_session__new() on failure,
    but now we need to negate the result of IS_ERR(session) to respect the
    TEST_ASSERT_VAL() expectation of zero meaning failure.

    Reported-by: Nageswara R Sastry
    Signed-off-by: Mamatha Inamdar
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Nageswara R Sastry
    Acked-by: Ravi Bangoria
    Reviewed-by: Jiri Olsa
    Reviewed-by: Mukesh Ojha
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Greg Kroah-Hartman
    Cc: Jeremie Galarneau
    Cc: Kate Stewart
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Shawn Landden
    Cc: Song Liu
    Cc: Thomas Gleixner
    Cc: Tzvetomir Stoyanov
    Link: http://lore.kernel.org/lkml/20190822071223.17892.45782.stgit@localhost.localdomain
    Signed-off-by: Arnaldo Carvalho de Melo

    Mamatha Inamdar
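
Returning an error code through a pointer is commonly done by encoding a small negative errno value in the pointer itself. The sketch below re-creates that idea with hypothetical lowercase helpers and a hypothetical session_new(); it is not the perf implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Minimal userspace re-creations of kernel-style helpers, for illustration. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

struct session {
	int fd;
};

/* Hypothetical constructor: encodes the errno in the returned pointer. */
static struct session *session_new(const char *path)
{
	struct session *s;

	if (!path)
		return err_ptr(-ENOENT);
	s = calloc(1, sizeof(*s));
	if (!s)
		return err_ptr(-ENOMEM);
	return s;
}

/* Caller turns the encoded error into a non-zero return code. */
static int run_report(const char *path)
{
	struct session *s = session_new(path);

	if (is_err(s))
		return (int)ptr_err(s);
	free(s);
	return 0;
}
```

The exit status 254 in the test results above is consistent with -2 (-ENOENT on Linux) truncated to eight bits, although this sketch does not claim to reproduce the exact code path.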
     

20 Sep, 2019

1 commit

  • Only a 'struct perf_cpu_map' forward declaration is necessary; fix the
    places that need the header but were getting it indirectly, by luck,
    from env.h.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-3sj3n534zghxhk7ygzeaqlx9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
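
A forward declaration is enough whenever a header only stores or passes a pointer to a type. In this minimal sketch, struct topo_map and struct env_like are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Forward declaration: the full definition can live in another header. */
struct topo_map;

/* Hypothetical container; it never looks inside the map. */
struct env_like {
	struct topo_map *cpus;
};

/* Works on the pointer value only, so the incomplete type is sufficient. */
static int env_like__has_cpus(const struct env_like *env)
{
	return env->cpus != NULL;
}
```

Only code that dereferences the pointer or needs the size of the struct has to include the defining header, which is why such files must include it directly rather than rely on another header pulling it in.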
     

01 Sep, 2019

3 commits


30 Aug, 2019

2 commits

  • It's not needed there; add it to the places that need it and were
    getting it via those headers.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-5yulx1u16vyd0zmrbg1tjhju@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The wrong bitmap is considered when checking the CPU count of a
    specific node.

    We do the needed computation for the 'set' variable, but at the end we
    use the 'c2c_he->cpuset' weight, which shows misleading numbers.

    Fixes: 1e181b92a2da ("perf c2c report: Add 'node' sort key")
    Reported-by: Joe Mario
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20190820140219.28338-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
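
The shape of this bug can be shown with plain 64-bit masks instead of the bitmap helpers used in perf. The function names are hypothetical, and the sketch assumes at most 64 CPUs.

```c
#include <stdint.h>

/* Count set bits; a stand-in for a bitmap weight helper. */
static unsigned int weight64(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Buggy shape: the intersection is computed but the wrong mask is counted. */
static unsigned int node_cpus_buggy(uint64_t entry_cpus, uint64_t node_cpus)
{
	uint64_t set = entry_cpus & node_cpus;

	(void)set; /* computed, then ignored */
	return weight64(entry_cpus);
}

/* Fixed shape: count the CPUs in the per-node intersection. */
static unsigned int node_cpus_fixed(uint64_t entry_cpus, uint64_t node_cpus)
{
	uint64_t set = entry_cpus & node_cpus;

	return weight64(set);
}
```

With an entry touched by CPUs 0-3 and a node that owns CPUs 0-1, the buggy version reports 4 for that node while the fixed version reports 2.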
     

26 Aug, 2019

1 commit


23 Aug, 2019

1 commit

  • If c2c is recorded on a machine where any cpus are offline, 'perf c2c
    report' throws an error "node/cpu topology bugFailed setup nodes".

    It fails because we don't consider offline cpus while preparing the
    node-cpu mapping.

    Reported-by: Nageswara R Sastry
    Signed-off-by: Ravi Bangoria
    Acked-by: Jiri Olsa
    Fixes: 1e181b92a2da ("perf c2c report: Add 'node' sort key")
    Link: http://lkml.kernel.org/r/20190822085045.25108-1-ravi.bangoria@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     

22 Aug, 2019

1 commit


30 Jul, 2019

3 commits

  • Rename struct perf_evlist to struct evlist, so we don't have a name
    clash when we add struct perf_evlist in libperf.

    Committer notes:

    Added fixes to build on arm64, from Jiri and from me
    (tools/perf/util/cs-etm.c)

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Rename struct perf_evsel to struct evsel, so we don't have a name clash
    when we add struct perf_evsel in libperf.

    Committer notes:

    Added fixes for arm64, provided by Jiri.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Rename struct cpu_map to struct perf_cpu_map, so it could be part of
    libperf.

    Committer notes:

    Added fixes for arm64, provided by Jiri.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190721112506.12306-3-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

09 Jul, 2019

1 commit


07 Mar, 2019

1 commit

  • Ravi Bangoria reported that we fail with an empty NUMA node with the
    following message:

    $ lscpu
    NUMA node0 CPU(s):
    NUMA node1 CPU(s): 0-4

    $ sudo ./perf c2c report
    node/cpu topology bugFailed setup nodes

    Fix this by detecting the empty node and keeping its CPU set empty.

    Reported-by: Nageswara R Sastry
    Signed-off-by: Jiri Olsa
    Tested-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jonas Rabenstein
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190305152536.21035-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

23 Feb, 2019

1 commit

  • Add a 'path' member to 'struct perf_data'. It will keep the configured
    path for the data (const char *). The path in struct perf_data_file is
    now dynamically allocated (duped) from it.

    This scheme is used in the following patches, where struct
    perf_data::path holds the configured directory path and struct
    perf_data_file::path holds the allocated path for specific files.

    It also makes the code a little simpler.

    Signed-off-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
    [ Fixup data-convert-bt.c missing conversion ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
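
The ownership split described above, one borrowed configured path and one duplicated per-file path, might look like the following. The struct and function names are hypothetical simplifications, not the real struct perf_data layout.

```c
#include <stdlib.h>
#include <string.h>

/* Per-file state owns a heap copy of its path. */
struct data_file {
	char *path;
};

/* Top-level state only borrows the configured path. */
struct data {
	const char *path;
	struct data_file file;
};

/* Duplicate the configured path into the file; returns 0 on success. */
static int data__open(struct data *d)
{
	size_t len = strlen(d->path) + 1;

	d->file.path = malloc(len);
	if (!d->file.path)
		return -1;
	memcpy(d->file.path, d->path, len);
	return 0;
}

/* Release the owned copy; the configured path is left untouched. */
static void data__close(struct data *d)
{
	free(d->file.path);
	d->file.path = NULL;
}
```

Because the file owns its copy, it can later be given a different path than the configured one without touching the caller's string.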
     

06 Feb, 2019

2 commits

  • Add an argument to hists__resort_cb_t so that we can pass data from
    upper layers to the callback function. It will be used in the following
    patches.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
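
Passing caller data through a callback is typically done with an opaque trailing argument. This sketch uses hypothetical names (resort_cb_t, resort_each, sum_cb) and a simplified entry type rather than the real hists API.

```c
#include <stddef.h>

struct entry {
	int value;
};

/* Callback type with a trailing opaque argument supplied by the caller. */
typedef int (*resort_cb_t)(struct entry *e, void *arg);

/* Hypothetical walker: forwards 'arg' untouched to every invocation. */
static void resort_each(struct entry *entries, size_t n, resort_cb_t cb, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		/* A non-zero return from the callback stops the walk. */
		if (cb && cb(&entries[i], arg))
			break;
	}
}

/* Example callback: accumulates into caller-owned state behind 'arg'. */
static int sum_cb(struct entry *e, void *arg)
{
	*(int *)arg += e->value;
	return 0;
}
```

The walker never interprets the argument, so upper layers can hand down any state they need without adding globals.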
     
  • Several places were using definitions found in symbols.h without
    including it, getting it by sheer luck from other headers that are now
    in the process of removing that include, either because they don't need
    it or because struct forward declarations are enough. Fix it.

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-xbcvvx296d70kpg9wb0qmeq9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

25 Jan, 2019

1 commit

  • At the cost of an extra pointer, we can avoid the O(logN) cost of
    finding the first element in the tree (smallest node), which is
    something heavily required for histograms. Specifically, the following
    are converted to rb_root_cached, and users accordingly:

    hist::entries_in_array
    hist::entries_in
    hist::entries
    hist::entries_collapsed
    hist_entry::hroot_in
    hist_entry::hroot_out

    Signed-off-by: Davidlohr Bueso
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
    [ Added some missing conversions to rb_first_cached() ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Davidlohr Bueso
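
The trade described above, one extra pointer in exchange for constant-time access to the smallest node, can be sketched with a plain binary search tree. The tree here is unbalanced for brevity and all names are hypothetical; it only illustrates the caching idea, not the kernel rbtree code.

```c
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Root plus a cached pointer to the smallest node. */
struct root_cached {
	struct node *root;
	struct node *leftmost;
};

/* Insert into the search tree and keep the leftmost cache up to date. */
static void insert_cached(struct root_cached *t, struct node *n)
{
	struct node **link = &t->root;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			is_leftmost = 0; /* went right at least once */
		}
	}
	*link = n;
	if (is_leftmost)
		t->leftmost = n;
}

/* Constant-time first element instead of walking down the left spine. */
static struct node *first_cached(const struct root_cached *t)
{
	return t->leftmost;
}
```

A new node becomes the leftmost only if the descent never turned right, so maintaining the cache costs one flag during insertion. Removal, which this sketch omits, would also have to update the cache.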
     

22 Jan, 2019

1 commit

  • An automatic const char[] variable gets initialized at runtime, just
    like any other automatic variable. For long strings, that uses a lot of
    stack and wastes time building the string; e.g. for the "No %s
    allocation events..." case one has:

    444516: 48 b8 4e 6f 20 25 73 20 61 6c movabs $0x6c61207325206f4e,%rax # "No %s al"
    ...
    444674: 48 89 45 80 mov %rax,-0x80(%rbp)
    444678: 48 b8 6c 6f 63 61 74 69 6f 6e movabs $0x6e6f697461636f6c,%rax # "location"
    444682: 48 89 45 88 mov %rax,-0x78(%rbp)
    444686: 48 b8 20 65 76 65 6e 74 73 20 movabs $0x2073746e65766520,%rax # " events "
    444690: 66 44 89 55 c4 mov %r10w,-0x3c(%rbp)
    444695: 48 89 45 90 mov %rax,-0x70(%rbp)
    444699: 48 b8 66 6f 75 6e 64 2e 20 20 movabs $0x20202e646e756f66,%rax

    Make them all static so that the compiler just references objects in .rodata.

    Committer testing:

    Ok, using dwarves's codiff tool:

    $ codiff --functions /tmp/perf.before ~/bin/perf
    builtin-sched.c:
    cmd_sched | -48
    1 function changed, 48 bytes removed, diff: -48

    builtin-report.c:
    cmd_report | -32
    1 function changed, 32 bytes removed, diff: -32

    builtin-kmem.c:
    cmd_kmem | -64
    build_alloc_func_list | -50
    2 functions changed, 114 bytes removed, diff: -114

    builtin-c2c.c:
    perf_c2c__report | -390
    1 function changed, 390 bytes removed, diff: -390

    ui/browsers/header.c:
    tui__header_window | -104
    1 function changed, 104 bytes removed, diff: -104

    /home/acme/bin/perf:
    9 functions changed, 688 bytes removed, diff: -688

    Signed-off-by: Rasmus Villemoes
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk
    Signed-off-by: Arnaldo Carvalho de Melo

    Rasmus Villemoes
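
The difference between the two storage classes can be seen side by side. The message text and function names below are hypothetical; whether a given compiler actually builds the automatic array on the stack depends on the compiler and optimization level.

```c
#include <string.h>

/* Automatic array: the bytes may be written to the stack on every call. */
static size_t len_auto(void)
{
	const char msg[] = "a long diagnostic message that would be rebuilt on each call";

	return strlen(msg);
}

/* Static array: a single read-only object that the function refers to. */
static size_t len_static(void)
{
	static const char msg[] = "a long diagnostic message that would be rebuilt on each call";

	return strlen(msg);
}
```

Both functions behave identically from the caller's point of view; the change only affects where the bytes live and how much work each call performs.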
     

29 Dec, 2018

2 commits

  • The cachelines being reported are the ones with percentages all the way
    down to 0.05%. That makes for very long output files. Raise that to
    0.1%. The user can always specify --show-all if they want all the
    cachelines with hits.

    Suggested-by: Joe Mario
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181228101820.28010-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Joe suggested setting the coalesce default to just 'iaddr', because
    it's easier to read in the default 'perf c2c report' output.

    By removing the "pid" field from the default -c/--coalesce option, the
    'perf c2c' report will group all the relevant PIDs under the instruction
    address ('iaddr') bucket. Users can always run "-c pid,iaddr" for more
    fine-grained output on particular PIDs.

    Suggested-by: Joe Mario
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181228101820.28010-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

31 Jul, 2018

1 commit

  • 'perf c2c' scans read/write accesses and tries to find false sharing
    cases, so when the events it wants were not asked for or ended up not
    taking place, we get no histograms.

    So do not try to display entry details if there aren't any. Currently
    this ends in a crash:

    $ perf c2c report # then press 'd'
    perf: Segmentation fault
    $

    Committer testing:

    Before:

    Record a perf.data file without events of interest to 'perf c2c report',
    then call it and press 'd':

    # perf record sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.001 MB perf.data (6 samples) ]
    # perf c2c report
    perf: Segmentation fault
    -------- backtrace --------
    perf[0x5b1d2a]
    /lib64/libc.so.6(+0x346df)[0x7fcb566e36df]
    perf[0x46fcae]
    perf[0x4a9f1e]
    perf[0x4aa220]
    perf(main+0x301)[0x42c561]
    /lib64/libc.so.6(__libc_start_main+0xe9)[0x7fcb566cff29]
    perf(_start+0x29)[0x42c999]
    #

    After the patch the segfault doesn't take place. A follow-up patch to
    tell the user why nothing changes when 'd' is pressed would be good.

    Reported-by: rodia@autistici.org
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Don Zickus
    Cc: Joe Mario
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: f1c5fd4d0bb9 ("perf c2c report: Add TUI cacheline browser")
    Link: http://lkml.kernel.org/r/20180724062008.26126-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa