18 May, 2020

4 commits

  • Add ddr bandwidth usage metric support for i.MX8DXL.

    Metric:
    imx8dxl_lpddr4.bandwidth_usage
    imx8dxl_ddr3l.bandwidth_usage

    Example:
    root@imx8dxlevk:~# ./perf stat -a -I 1000 -M imx8dxl_lpddr4.bandwidth_usage dd if=/dev/zero of=/dev/null bs=100M count=1000000
    1.000242625 1444320 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/ # 15.1 % imx8dxl_lpddr4.bandwidth_usage
    1.000242625 88964544 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/
    1.000242625 1000242625 ns duration_time
    2.001170500 297392 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/ # 16.0 % imx8dxl_lpddr4.bandwidth_usage
    2.001170500 95684315 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/
    2.001170500 1000927875 ns duration_time
    3.001840125 320798 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/ # 16.0 % imx8dxl_lpddr4.bandwidth_usage
    3.001840125 95655155 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000,axi_channel=0x0/
    3.001840125 1000669625 ns duration_time

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add ddr bandwidth usage metric support for i.MX8MP.

    Metric:
    imx8mp-lpddr4-bandwidth-usage

    Example:
    root@imx8mpevk:~# ./perf stat -a -I 1000 -M imx8mp-lpddr4-bandwidth-usage dd if=/dev/zero of=/dev/null bs=100M count=1000000
    1.000770875 18081664 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000/ # 37.0 % imx8mp-lpddr4-bandwidth-usage
    1.000770875 5895351484 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000/
    1.000770875 1000770875 ns duration_time
    2.001780250 11137456 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000/ # 39.0 % imx8mp-lpddr4-bandwidth-usage
    2.001780250 6232776052 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000/
    2.001780250 1001009375 ns duration_time
    3.002748125 10643520 imx8_ddr0/axid-read,axi_mask=0xffff,axi_id=0x0000/ # 39.0 % imx8mp-lpddr4-bandwidth-usage
    3.002748125 6229700768 imx8_ddr0/axid-write,axi_mask=0xffff,axi_id=0x0000/
    3.002748125 1000967875 ns duration_time

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • In interval mode, if the metric expression contains the duration_time
    event, a command run with -I 5000 can trigger this cast issue.

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • For interval mode, the metric is printed after the '#' character if it
    exists. But it's not calculated by the counts generated in this
    interval.

    See the following examples:

    root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
    # time counts unit events
    1.000422803 764,809 inst_retired.any # 2.9 CPI
    1.000422803 2,234,932 cycles
    2.001464585 1,960,061 inst_retired.any # 1.6 CPI
    2.001464585 4,022,591 cycles

    The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1)

    root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
    # time counts unit events
    1.000429493 2,869,311 cycles
    1.000429493 816,875 instructions # 0.28 insn per cycle
    2.001516426 9,260,973 cycles
    2.001516426 5,250,634 instructions # 0.87 insn per cycle

    The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
    0.57).

    The current code uses a global variable 'rt_stat' for tracking and
    updating the std dev of runtime stat. The counts are reset for each
    interval, but 'rt_stat' is not.

    perf_stat_process_counter()
    {
            if (config->interval)
                    init_stats(ps->res_stats);
    }

    So for interval mode, the 'rt_stat' variable should be reset too.

    This patch resets 'rt_stat' before read_counters(), so the runtime stat
    is only calculated by the counts generated in this interval.
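
    The arithmetic behind the stale figures can be reproduced with a short
    sketch. It assumes, purely as an illustration and not as a description of
    perf's internals, that the value after '#' divides the current interval's
    instructions by a running mean of cycles that is never reset. The class and
    function names are hypothetical. Under that assumption the sketch yields the
    0.87 shown above, while resetting per interval yields the expected 0.57.

```python
class RunningMean:
    """Hypothetical stand-in for a runtime stat that tracks a mean."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n


def insn_per_cycle(intervals, reset_each_interval):
    """Return one ratio per interval for (cycles, instructions) pairs."""
    cycles_stat = RunningMean()
    ratios = []
    for cycles, instructions in intervals:
        if reset_each_interval:
            cycles_stat.reset()  # models resetting the stat before each read
        cycles_stat.update(cycles)
        ratios.append(instructions / cycles_stat.mean)
    return ratios


# Counts from the 'perf stat -e cycles,instructions' run shown before the patch.
intervals = [(2_869_311, 816_875), (9_260_973, 5_250_634)]

stale = insn_per_cycle(intervals, reset_each_interval=False)
fresh = insn_per_cycle(intervals, reset_each_interval=True)
print(f"without reset: {stale[1]:.2f}, with reset: {fresh[1]:.2f}")
```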

    With this patch:

    root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
    # time counts unit events
    1.000420924 2,408,818 inst_retired.any # 2.1 CPI
    1.000420924 5,010,111 cycles
    2.001448579 2,798,407 inst_retired.any # 1.6 CPI
    2.001448579 4,599,861 cycles

    root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
    # time counts unit events
    1.000428555 2,769,714 cycles
    1.000428555 774,462 instructions # 0.28 insn per cycle
    2.001471562 3,595,904 cycles
    2.001471562 1,243,703 instructions # 0.35 insn per cycle

    Now the second 'insn per cycle' and CPI are calculated by the counts
    generated in this interval.

    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Tested-By: Kajol Jain
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Joakim cherry pick from perf/core commit: 197ba86fdc888dc0d3d6b89b402c9c6851d4c6fb
    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Jin Yao
     

01 Apr, 2020

2 commits


08 Mar, 2020

1 commit

  • Merge Linux stable release v5.4.24 into imx_5.4.y

    * tag 'v5.4.24': (3306 commits)
    Linux 5.4.24
    blktrace: Protect q->blk_trace with RCU
    kvm: nVMX: VMWRITE checks unsupported field before read-only field
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6sll-evk.dts
    arch/arm/boot/dts/imx7ulp.dtsi
    arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
    drivers/clk/imx/clk-composite-8m.c
    drivers/gpio/gpio-mxc.c
    drivers/irqchip/Kconfig
    drivers/mmc/host/sdhci-of-esdhc.c
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/can/flexcan.c
    drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
    drivers/net/ethernet/mscc/ocelot.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/realtek.c
    drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/tee/optee/shm_pool.c
    drivers/usb/cdns3/gadget.c
    kernel/sched/cpufreq.c
    net/core/xdp.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c
    sound/soc/sof/core.c
    sound/soc/sof/imx/Kconfig
    sound/soc/sof/loader.c

    Jason Liu
     

07 Mar, 2020

1 commit

  • libbfd has changed the bfd_section_* macros to inline functions
    bfd_section_ since 2019-09-18. See below two commits:
    o http://www.sourceware.org/ml/gdb-cvs/2019-09/msg00064.html
    o https://www.sourceware.org/ml/gdb-cvs/2019-09/msg00072.html

    This fix makes perf able to build with both old and new libbfd.

    Signed-off-by: Changbin Du
    Acked-by: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200128152938.31413-1-changbin.du@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Changbin Du
     

05 Mar, 2020

2 commits

  • commit 604e2139a1026793b8c2172bd92c7e9d039a5cf0 upstream.

    When we moved zalloc.o to the library we missed gtk library which needs
    it compiled in, otherwise the missing __zfree symbol will cause the
    library to fail to load.

    Adding the zalloc object to the gtk library build.

    Fixes: 7f7c536f23e6 ("tools lib: Adopt zalloc()/zfree() from tools/perf")
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Jelle van der Waa
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200113104358.123511-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     
  • commit 3f7774033e6820d25beee5cf7aefa11d4968b951 upstream.

    We need to set actions->ms.map since 599a2f38a989 ("perf hists browser:
    Check sort keys before hot key actions"), as in that patch we bail out
    if map is NULL.

    Reviewed-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Namhyung Kim
    Fixes: 599a2f38a989 ("perf hists browser: Check sort keys before hot key actions")
    Link: https://lkml.kernel.org/n/tip-wp1ssoewy6zihwwexqpohv0j@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Arnaldo Carvalho de Melo
     

20 Feb, 2020

1 commit

  • commit 80cc7bb6c104d733bff60ddda09f19139c61507c upstream.

    For data collected on machines with front end stalled cycles supported,
    such as found on modern AMD CPU families, commit 146540fb545b ("perf
    stat: Always separate stalled cycles per insn") introduces a new line in
    CSV output with a leading comma that upsets some automated scripts.
    Scripts have to use "-e ex_ret_instr" to work around this issue, after
    upgrading to a version of perf with that commit.

    We could add "if (have_frontend_stalled && !config->csv_sep)" to the not
    (total && avg) else clause, to emphasize that CSV users are usually
    scripts, and are written to do only what is needed, i.e., they wouldn't
    typically invoke "perf stat" without specifying an explicit event list.

    But - let alone CSV output - why should users now tolerate a constant
    0-reporting extra line in regular terminal output?:

    BEFORE:

    $ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1

    Performance counter stats for 'system wide':

    181,110,981 instructions # 0.58 insn per cycle
    # 0.00 stalled cycles per insn
    309,876,469 cycles

    1.002202582 seconds time elapsed

    The user would not like to see the now permanent:

    "0.00 stalled cycles per insn"

    line fixture, as it gives no useful information.

    So this patch removes the printing of the zeroed stalled cycles line
    altogether, almost reverting the very original commit fb4605ba47e7
    ("perf stat: Check for frontend stalled for metrics"), which seems like
    it was written to normalize --metric-only column output of common Intel
    machines at the time: modern Intel machines have ceased to support the
    genericised frontend stalled metrics AFAICT.

    AFTER:

    $ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1

    Performance counter stats for 'system wide':

    244,071,432 instructions # 0.69 insn per cycle
    355,353,490 cycles

    1.001862516 seconds time elapsed

    Output behaviour when stalled cycles is indeed measured is not affected
    (BEFORE == AFTER):

    $ sudo perf stat --all-cpus -einstructions,cycles,stalled-cycles-frontend -- sleep 1

    Performance counter stats for 'system wide':

    247,227,799 instructions # 0.63 insn per cycle
    # 0.26 stalled cycles per insn
    394,745,636 cycles
    63,194,485 stalled-cycles-frontend # 16.01% frontend cycles idle

    1.002079770 seconds time elapsed

    Fixes: 146540fb545b ("perf stat: Always separate stalled cycles per insn")
    Signed-off-by: Kim Phillips
    Acked-by: Andi Kleen
    Acked-by: Jiri Olsa
    Acked-by: Song Liu
    Cc: Alexander Shishkin
    Cc: Cong Wang
    Cc: Davidlohr Bueso
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200207230613.26709-1-kim.phillips@amd.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Kim Phillips
     

12 Feb, 2020

6 commits

  • Add SocName in DDR JSON file, so that metric/metricgroup can filter by this
    property.

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • When the CPUID of the SoC matches, the perf tool loads all metrics under
    that CPUID. Users could therefore see other platforms' metrics on their
    own platform, which is very confusing. We can match metric/metricgroup
    against SOCNAME if needed.
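
    A minimal sketch of the matching rule described here, assuming each metric
    entry carries a CPUID and an optional SoC name. The dictionary keys, the
    CPUID string, the SoC names and the helper name are all illustrative and
    are not taken from the perf sources.

```python
def visible_metrics(metrics, cpuid, socname):
    """Keep metrics for this CPUID, honouring the SoC name when an entry sets one."""
    selected = []
    for metric in metrics:
        if metric["cpuid"] != cpuid:
            continue
        wanted_soc = metric.get("socname")
        if wanted_soc is not None and wanted_soc != socname:
            continue  # same CPUID, but meant for a different SoC
        selected.append(metric["name"])
    return selected


# Hypothetical table: two SoCs that share one CPUID, plus an untagged metric.
metrics = [
    {"name": "ddr_usage_a", "cpuid": "0xabc", "socname": "soc_a"},
    {"name": "ddr_usage_b", "cpuid": "0xabc", "socname": "soc_b"},
    {"name": "generic_metric", "cpuid": "0xabc"},
]

print(visible_metrics(metrics, "0xabc", "soc_a"))
```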

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add support for checking socname for ARCH arm64.

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add socname to struct pmu_event so that we can distinguish different SoCs
    by this property.

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • [ Upstream commit eb573e746b9d4f0921dcb2449be3df41dae3caea ]

    Commit f01642e4912b ("perf metricgroup: Support multiple events for
    metricgroup") introduced support for multiple events in a metric group.
    But with the current upstream, metric event names are not printed
    properly.

    In power9 platform:

    command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
    1.000208486
    2.000368863
    2.001400558

    Similarly in skylake platform:

    command:./perf stat --metric-only -M Power -I 1000
    1.000579994
    2.002189493

    With the current upstream version, the issue is with the event name
    comparison logic in find_evsel_group(). The current logic compares the
    events belonging to a metric group with the events in perf_evlist. Since
    the break statement is missing in the loop used for comparison between
    metric group and perf_evlist events, the loop continues to execute even
    after getting a pattern match, and ends up discarding the matches.

    When only a single metric event belongs to the metric group, it works
    fine, because with a single event, once it has compared all events, it
    reaches the end of perf_evlist.

    Example for single metric event in power9 platform:

    command:# ./perf stat --metric-only -M branches_per_inst -I 1000 sleep 1
    1.000094653 0.2
    1.001337059 0.0

    This patch fixes the issue by adding the corresponding condition, so that
    once all events belonging to that metric event are matched in
    find_evsel_group(), we successfully break from that loop.
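
    The effect of the missing break can be illustrated with a toy model. It is
    a simplified sketch, not the code of find_evsel_group(): events are plain
    strings, a metric group must match a consecutive run of them, and the
    stop_when_complete flag is a hypothetical switch that plays the role of the
    added break.

```python
def find_group(evlist, ids, stop_when_complete=True):
    """Return True if the ids are still matched when the scan of evlist ends."""
    matched = 0  # number of leading ids matched so far
    for name in evlist:
        if matched < len(ids) and name == ids[matched]:
            matched += 1
        else:
            # Mismatch: throw the run away and check whether this event restarts it.
            matched = 1 if name == ids[0] else 0
        if stop_when_complete and matched == len(ids):
            break  # every id is matched, so stop scanning
    return matched == len(ids)


evlist = ["ev_a", "ev_b", "cycles"]

print(find_group(evlist, ["ev_a", "ev_b"], stop_when_complete=False))
print(find_group(evlist, ["ev_a", "ev_b"], stop_when_complete=True))
```

    In this model, scanning past a complete match lets the next unrelated
    event reset the state, which is why the match is lost without the early
    exit.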

    With this patch:
    In power9 platform:

    command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
    result:#
    time derat_4k_miss_rate_percent derat_4k_miss_ratio derat_miss_ratio derat_64k_miss_rate_percent derat_64k_miss_ratio dslb_miss_rate_percent islb_miss_rate_percent
    1.000135672 0.0 0.3 1.0 0.0 0.2 0.0 0.0
    2.000380617 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    Similarly in skylake platform:

    command:# ./perf stat --metric-only -M Power -I 1000
    result:#
    time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.000563580 0.3 0.0 2.6 44.2 21.9 0.0 0.0 0.0
    2.002235027 0.4 0.0 2.7 43.0 20.7 0.0 0.0 0.0

    Committer testing:

    Before:

    [root@seventh ~]# perf stat --metric-only -M Power -I 1000
    # time
    1.000383223
    2.001168182
    3.001968545
    4.002741200
    5.003442022
    ^C 5.777687244

    [root@seventh ~]#

    After the patch:

    [root@seventh ~]# perf stat --metric-only -M Power -I 1000
    # time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.000406577 0.4 0.1 1.4 97.0 0.0 0.0 0.0 0.0
    2.001481572 0.3 0.0 0.6 97.9 0.0 0.0 0.0 0.0
    3.002332585 0.2 0.0 1.0 97.5 0.0 0.0 0.0 0.0
    4.003196624 0.2 0.0 0.3 98.6 0.0 0.0 0.0 0.0
    5.004063851 0.3 0.0 0.7 97.7 0.0 0.0 0.0 0.0
    ^C 5.471260276 0.2 0.0 0.5 49.3 0.0 0.0 0.0 0.0

    [root@seventh ~]#
    [root@seventh ~]# dmesg | grep -i skylake
    [ 0.187807] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
    [root@seventh ~]#

    Fixes: f01642e4912b ("perf metricgroup: Support multiple events for metricgroup")
    Signed-off-by: Kajol Jain
    Reviewed-by: Ravi Bangoria
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Anju T Sudhakar
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Madhavan Srinivasan
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191120084059.24458-1-kjain@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Joakim cherry pick from upstream:
    3635b27cc058 perf metricgroup: Fix printing event names of metric group with multiple events

    Signed-off-by: Joakim Zhang

    Kajol Jain
     
  • Currently when cross compiling perf tool for ARM64 on my x86 machine I
    get this error:

    arch/arm64/util/sym-handling.c:9:10: fatal error: gelf.h: No such file or directory
    #include <gelf.h>

    For the build, libelf is reported off:

    Auto-detecting system features:
    ...
    ... libelf: [ OFF ]

    Indeed, test-libelf is not built successfully:

    more ./build/feature/test-libelf.make.output
    test-libelf.c:2:10: fatal error: libelf.h: No such file or directory
    #include <libelf.h>
    ^~~~~~~~~~
    compilation terminated.

    I have no such problems natively compiling on ARM64, and I did not
    previously have this issue for cross compiling. Fix by relocating the
    gelf.h include.

    Signed-off-by: John Garry
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/1573045254-39833-1-git-send-email-john.garry@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Joakim cherry pick from upstream:
    1302caaef52a perf tools: Fix cross compile for ARM64

    Signed-off-by: Joakim Zhang

    John Garry
     

06 Feb, 2020

2 commits

  • [ Upstream commit c3314a74f86dc00827e0945c8e5039fc3aebaa3c ]

    Commit 800d3f561659 ("perf report: Add warning when libunwind not
    compiled in") breaks the s390 platform. S390 uses libdw-dwarf-unwind for
    call chain unwinding and had no support for libunwind.

    So the warning "Please install libunwind development packages during the
    perf build." caused confusion even though the call-graph was displayed
    correctly.

    This patch adds checking for HAVE_DWARF_SUPPORT, which is set when
    libdw-dwarf-unwind is compiled in.

    Fixes: 800d3f561659 ("perf report: Add warning when libunwind not compiled in")
    Signed-off-by: Jin Yao
    Reviewed-by: Thomas Richter
    Tested-by: Thomas Richter
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200107191745.18415-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jin Yao
     
  • commit c1c8013ec34d7163431d18367808ea40b2e305f8 upstream.

    Commit 722ddfde366f ("perf tools: Fix time sorting") changed - correctly
    so - hist_entry__sort to return int64. Unfortunately several of the
    builtin-c2c.c comparison routines only happened to work due to the cast
    caused by the wrong return type.

    This causes meaningless ordering of both the cacheline list, and the
    cacheline details page. E.g. a simple:

    perf c2c record -a sleep 3
    perf c2c report

    will result in cacheline table like
    =================================================
    Shared Data Cache Line Table
    =================================================
    #
    # ------- Cacheline ---------- Total Tot - LLC Load Hitm - - Store Reference - - Load Dram - LLC Total - Core Load Hit - - LLC Load Hit -
    # Index Address Node PA cnt records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rmt
    # ..... .............. .... ...... ....... ...... ..... ..... ... .... ..... ...... ...... .... ...... ..... ..... ..... ... .... .......

    0 0x7f0d27ffba00 N/A 0 52 0.12% 13 6 7 12 12 0 0 7 14 40 4 16 0 0 0
    1 0x7f0d27ff61c0 N/A 0 6353 14.04% 1475 801 674 779 779 0 0 718 1392 5574 1299 1967 0 115 0
    2 0x7f0d26d3ec80 N/A 0 71 0.15% 16 4 12 13 13 0 0 12 24 58 1 20 0 9 0
    3 0x7f0d26d3ec00 N/A 0 98 0.22% 23 17 6 19 19 0 0 6 12 79 0 40 0 10 0

    i.e. with the list not being ordered by Total Hitm.
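
    How a comparator can depend on the width of its return type can be shown
    with emulated C integer arithmetic. This is a sketch under the assumption
    that a comparator subtracts two unsigned 32-bit counters. The helper names
    are hypothetical and the c2c routines themselves are not reproduced here.

```python
def u32_sub(left, right):
    """Unsigned 32-bit subtraction, wrapping the way C's 'unsigned int' does."""
    return (left - right) & 0xFFFFFFFF


def as_signed(value, bits):
    """Reinterpret the low 'bits' bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


# LLC Load Hitm totals from rows 0 and 1 of the table above.
left_hitm, right_hitm = 13, 1475

diff = u32_sub(left_hitm, right_hitm)
as_int = as_signed(diff, 32)    # seen through a 32-bit 'int': negative
as_int64 = as_signed(diff, 64)  # seen through an 'int64_t': large and positive

print(as_int, as_int64)
```

    The same wrapped difference changes sign depending on the width it is read
    through, so a caller that only looks at the sign orders the two entries
    differently.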

    Fixes: 722ddfde366f ("perf tools: Fix time sorting")
    Signed-off-by: Andres Freund
    Tested-by: Michael Petlan
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org # v3.16+
    Link: http://lore.kernel.org/lkml/20200109043030.233746-1-andres@anarazel.de
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Andres Freund
     

26 Jan, 2020

1 commit

  • commit f068435d9bb2d825d59e3c101bc579f09315ee01 upstream.

    At some point in the past we needed to make sure we would get the long
    name of modules and not just what we get from /proc/modules, but that
    need, as described in the cset that introduced the adjustment function:

    Fixes: c03d5184f0e9 ("perf machine: Adjust dso->long_name for offline module")

    Without using the buildid-cache:

    # lsmod | grep trusted
    # insmod trusted.ko
    # lsmod | grep trusted
    trusted 24576 0
    # strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted
    openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4
    openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3
    openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    probe:key_seal (on key_seal in trusted)
    # perf probe -l
    probe:key_seal (on key_seal in trusted)
    #

    No attempt at opening '[trusted]'.

    Now using the build-id cache:

    # rmmod trusted
    # perf buildid-cache --add ./trusted.ko
    # insmod trusted.ko
    # strace -e open,openat perf probe -m ./trusted.ko key_seal |& grep trusted
    openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4
    openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR|O_CREAT, 0644) = 3
    openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4
    openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3
    #

    Again, no attempt at reading '[trusted]'.

    Finally, adding a probe to that function and then using:

    [root@quaco ~]# perf trace -e probe_perf:*/max-stack=16/ --max-events=2
    0.000 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263)
    dso__adjust_kmod_long_name (/home/acme/bin/perf)
    machine__process_kernel_mmap_event (/home/acme/bin/perf)
    machine__process_mmap_event (/home/acme/bin/perf)
    perf_event__process_mmap (/home/acme/bin/perf)
    machines__deliver_event (/home/acme/bin/perf)
    perf_session__deliver_event (/home/acme/bin/perf)
    perf_session__process_event (/home/acme/bin/perf)
    process_simple (/home/acme/bin/perf)
    reader__process_events (/home/acme/bin/perf)
    __perf_session__process_events (/home/acme/bin/perf)
    perf_session__process_events (/home/acme/bin/perf)
    process_buildids (/home/acme/bin/perf)
    record__finish_output (/home/acme/bin/perf)
    __cmd_record (/home/acme/bin/perf)
    cmd_record (/home/acme/bin/perf)
    run_builtin (/home/acme/bin/perf)
    0.055 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263)
    dso__adjust_kmod_long_name (/home/acme/bin/perf)
    machine__process_kernel_mmap_event (/home/acme/bin/perf)
    machine__process_mmap_event (/home/acme/bin/perf)
    perf_event__process_mmap (/home/acme/bin/perf)
    machines__deliver_event (/home/acme/bin/perf)
    perf_session__deliver_event (/home/acme/bin/perf)
    perf_session__process_event (/home/acme/bin/perf)
    process_simple (/home/acme/bin/perf)
    reader__process_events (/home/acme/bin/perf)
    __perf_session__process_events (/home/acme/bin/perf)
    perf_session__process_events (/home/acme/bin/perf)
    process_buildids (/home/acme/bin/perf)
    record__finish_output (/home/acme/bin/perf)
    __cmd_record (/home/acme/bin/perf)
    cmd_record (/home/acme/bin/perf)
    run_builtin (/home/acme/bin/perf)
    #

    This was the only path I could find using the perf tools that reaches this
    function. As of November 2019, if we put a probe on the line where the
    actual setting of dso->long_name is done:

    # perf trace -e probe_perf:*
    ^C[root@quaco ~]
    # perf stat -e probe_perf:* -I 2000
    2.000404265 0 probe_perf:dso__adjust_kmod_long_name
    4.001142200 0 probe_perf:dso__adjust_kmod_long_name
    6.001704120 0 probe_perf:dso__adjust_kmod_long_name
    8.002398316 0 probe_perf:dso__adjust_kmod_long_name
    10.002984010 0 probe_perf:dso__adjust_kmod_long_name
    12.003597851 0 probe_perf:dso__adjust_kmod_long_name
    14.004113303 0 probe_perf:dso__adjust_kmod_long_name
    16.004582773 0 probe_perf:dso__adjust_kmod_long_name
    18.005176373 0 probe_perf:dso__adjust_kmod_long_name
    20.005801605 0 probe_perf:dso__adjust_kmod_long_name
    22.006467540 0 probe_perf:dso__adjust_kmod_long_name
    ^C 23.683261941 0 probe_perf:dso__adjust_kmod_long_name

    #

    It's not being used at all.

    To further test this I used kvm.ko as the offline module, i.e. removed
    it from the buildid-cache by nuking it completely (rm -rf ~/.debug) and
    moved it from the normal kernel distro path, removed the modules, stopped
    the kvm guest, and then installed it manually, etc.

    # rmmod kvm-intel
    # rmmod kvm
    # lsmod | grep kvm
    # modprobe kvm-intel
    modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
    modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
    modprobe: ERROR: could not insert 'kvm_intel': Unknown symbol in module, or unknown parameter (see dmesg)
    # insmod ./kvm.ko
    # modprobe kvm-intel
    modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
    modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory
    # lsmod | grep kvm
    kvm_intel 299008 0
    kvm 765952 1 kvm_intel
    irqbypass 16384 1 kvm
    #
    # perf probe -x ~/bin/perf machine__findnew_module_map:12 mname=m.name:string filename=filename:string 'dso_long_name=map->dso->long_name:string' 'dso_name=map->dso->name:string'
    # perf probe -l
    probe_perf:machine__findnew_module_map (on machine__findnew_module_map:12@util/machine.c in /home/acme/bin/perf with mname filename dso_long_name dso_name)
    # perf record
    ^C[ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 3.416 MB perf.data (33956 samples) ]
    # perf trace -e probe_perf:machine*

    6.322 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[salsa20_generic]", filename: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_long_name: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_name: "[salsa20_generic]")
    6.375 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[kvm]", filename: "[kvm]", dso_long_name: "[kvm]", dso_name: "[kvm]")

    The filename doesn't come with the path, no point in trying to set the dso->long_name.

    [root@quaco ~]# strace -e open,openat perf probe -m ./kvm.ko kvm_apic_local_deliver |& egrep 'open.*kvm'
    openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 4
    openat(AT_FDCWD, "/sys/module/kvm/notes/.note.gnu.build-id", O_RDONLY) = 4
    openat(AT_FDCWD, "/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 7
    openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 8
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/.debug/root/kvm.ko/5955f426cb93f03f30f3e876814be2db80ab0b55/probes", O_RDWR|O_CREAT, 0644) = 3
    openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/.debug/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, ".debug/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 4
    openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3
    [root@quaco ~]#

    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-jlfew3lyb24d58egrp0o72o2@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Arnaldo Carvalho de Melo
     

23 Jan, 2020

5 commits

  • [ Upstream commit b3509b6ed7a79ec49f6b64e4f3b780f259a2a468 ]

    My earlier patch to just enable --reltime with --time was a little too
    optimistic. The --time parsing would accept absolute time, which is
    very confusing to the user.

    Support relative time in --time parsing too. This only works with recent
    perf record that records the first sample time. Otherwise we error out.
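
    A sketch of the idea, assuming a relative window is turned into an absolute
    one by adding the first sample time, and that an unknown first sample time
    is an error. The function name, the nanosecond units and the use of None
    for "not recorded" are illustrative choices, not perf's interface.

```python
def to_absolute_window(rel_start, rel_end, first_sample_time):
    """Convert a window given relative to the first sample into absolute times."""
    if first_sample_time is None:
        # Recordings without a first sample time cannot anchor a relative window.
        raise ValueError("first sample time not recorded")
    return first_sample_time + rel_start, first_sample_time + rel_end


# All values in nanoseconds; the first sample time here is made up.
print(to_absolute_window(1_000_000_000, 2_500_000_000, 5_000_000_000))
```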

    Fixes: 3714437d3fcc ("perf script: Allow --time with --reltime")
    Signed-off-by: Andi Kleen
    Cc: Jiri Olsa
    Link: http://lore.kernel.org/lkml/20191011182140.8353-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Andi Kleen
     
  • commit 3714437d3fcc7956cabcb0077f2a506b61160a56 upstream.

    The original --reltime patch forbade --time with --reltime.

    But it turns out --time doesn't really care about --reltime, because the
    relative time is only used at final output, while the time filtering
    always works earlier on absolute time.

    So just remove the check and allow combining the two options.
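
    The ordering described above, filtering first on absolute time and
    converting to relative time only for output, can be sketched as follows.
    The function name and the choice of the first sample as the base are
    assumptions made for illustration.

```python
def script_output(sample_times, window, reltime):
    """Filter on absolute timestamps, then optionally report relative ones."""
    start, end = window
    kept = [t for t in sample_times if start <= t <= end]
    if reltime and sample_times:
        base = sample_times[0]  # assumed base: first sample in the recording
        return [t - base for t in kept]
    return kept


samples = [100, 105, 110, 120]
print(script_output(samples, (105, 110), reltime=False))
print(script_output(samples, (105, 110), reltime=True))
```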

    Fixes: 90b10f47c0ee ("perf script: Support relative time")
    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lore.kernel.org/lkml/20191002164642.1719-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     
  • commit 07d369857808b7e8e471bbbbb0074a6718f89b31 upstream.

    Since some DIEs have only ranges instead of the combination of
    entrypc/highpc, address verification must use dwarf_haspc() instead of
    dwarf_entrypc()/dwarf_highpc().

    Also, a ranges-only DIE may have part of its code in a different section
    (e.g. unlikely code will be in text.unlikely as a "FUNC.cold" symbol). In
    that case, we cannot use dwarf_entrypc() or die_entrypc(), because the
    offset from the original DIE can be negative.

    Instead, this simply gets the symbol and offset from symtab.
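
    To illustrate why a single entry/high pair is not enough, here is a
    small sketch with a hypothetical addr_range type standing in for a
    DIE's address ranges. It is not the libdw API, only the idea of a
    range-aware membership test and of an offset that can go negative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a DIE's address ranges. */
struct addr_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Range-aware membership test: true if addr lies in any range. */
bool ranges_contain(const struct addr_range *ranges, size_t nr, uint64_t addr)
{
	for (size_t i = 0; i < nr; i++) {
		if (addr >= ranges[i].start && addr < ranges[i].end)
			return true;
	}
	return false;
}

/* Signed offset of addr from an entry address; negative if addr is below it. */
int64_t offset_from_entry(uint64_t entry, uint64_t addr)
{
	return (int64_t)addr - (int64_t)entry;
}
```

    With a hot range at 0x2000 and a cold range at 0x1000, an address in the
    cold part belongs to the function but has a negative offset from the
    entry address, which is why the symbol and offset are taken from symtab.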

    Without this patch:

    # perf probe -D clear_tasks_mm_cpumask:1
    Failed to get entry address of clear_tasks_mm_cpumask
    Error: Failed to add events.

    And with this patch:

    # perf probe -D clear_tasks_mm_cpumask:1
    p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
    p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5
    p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8
    p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16
    p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82

    Committer testing:

    I managed to reproduce the above:

    [root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1
    p:probe/clear_tasks_mm_cpumask _text+919968
    p:probe/clear_tasks_mm_cpumask_1 _text+919973
    p:probe/clear_tasks_mm_cpumask_2 _text+919976
    [root@quaco ~]#

    But then when trying to actually put the probe in place, it fails if I
    use :0 as the offset:

    [root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5

    0 void clear_tasks_mm_cpumask(int cpu)
    1 {
    2 struct task_struct *p;

    [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
    Probe point 'clear_tasks_mm_cpumask' not found.
    Error: Failed to add events.
    [root@quaco

    The next patch is needed to fix this case.

    Fixes: 576b523721b7 ("perf probe: Fix probing symbols with optimization suffix")
    Reported-by: Arnaldo Carvalho de Melo
    Tested-by: Arnaldo Carvalho de Melo
    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157199318513.8075.10463906803299647907.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit 0feba17bd7ee3b7e03d141f119049dcc23efa94e upstream.

    We observed an issue where some extra columns were displayed after
    switching the perf data file in the browser. The steps to reproduce:

    1. perf record -a -e cycles,instructions -- sleep 3
    2. perf report --group
    3. In browser, we use hotkey 's' to switch to another perf.data
    4. Now in browser, the extra columns 'Self' and 'Children' are displayed.

    The issue is that setup_sorting() is executed again on the repeat path, so
    dimensions are added again.

    This patch checks the last key returned from __cmd_report(). If it's
    K_SWITCH_INPUT_DATA, it skips setup_sorting().

    Fixes: ad0de0971b7f ("perf report: Enable the runtime switching of perf data file")
    Signed-off-by: Jin Yao
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Feng Tang
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191220013722.20592-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Jin Yao
     
  • commit 55347ec340af401437680fd0e88df6739a967f9f upstream.

    Variable names are inconsistent in the hists__for_each_*() macros.

    Due to this inconsistency, each macro replaces its second argument with
    "fmt" regardless of its original name.

    So far it works because only "fmt" is passed as the second argument.
    However, this behavior is not expected and should be fixed.
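
    The effect can be reproduced with a pair of toy macros. The list type
    and macro names below are hypothetical and only mimic the shape of the
    problem, they are not the perf macros themselves:

```c
/* Hypothetical list node used only for this illustration. */
struct fmt_node {
	int id;
	struct fmt_node *next;
};

/*
 * Buggy shape: the parameter is called "format" but the body says "fmt",
 * so the caller's argument is ignored and a variable literally named
 * "fmt" must exist at the call site.
 */
#define for_each_node_buggy(head, format) \
	for (fmt = (head); fmt; fmt = fmt->next)

/* Fixed shape: the body uses the parameter name consistently. */
#define for_each_node(head, pos) \
	for ((pos) = (head); (pos); (pos) = (pos)->next)

int sum_ids_buggy(struct fmt_node *head)
{
	struct fmt_node *fmt;	/* compiles only because this is named fmt */
	int sum = 0;

	for_each_node_buggy(head, fmt)
		sum += fmt->id;
	return sum;
}

int sum_ids(struct fmt_node *head)
{
	struct fmt_node *it;	/* any name works with the fixed macro */
	int sum = 0;

	for_each_node(head, it)
		sum += it->id;
	return sum;
}
```

    Renaming the iterator in sum_ids_buggy() to anything other than "fmt"
    would break the build, which is the latent problem described above.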

    Fixes: f0786af536bb ("perf hists: Introduce hists__for_each_format macro")
    Fixes: aa6f50af822a ("perf hists: Introduce hists__for_each_sort_list macro")
    Signed-off-by: Yuya Fujita
    Acked-by: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/OSAPR01MB1588E1C47AC22043175DE1B2E8520@OSAPR01MB1588.jpnprd01.prod.outlook.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Yuya Fujita
     

18 Jan, 2020

1 commit

  • commit 58b3bafff8257c6946df5d6aeb215b8ac839ed2a upstream.

    In 7fcfa9a2d9 an unintended prefix "Counter:18 Name:" was removed from
    the description for L1D_RO_EXCL_WRITES, but the extra name remained in
    the description. Remove it too.

    Fixes: 7fcfa9a2d9a7 ("perf list: Fix s390 counter long description for L1D_RO_EXCL_WRITES")
    Signed-off-by: Ed Maste
    Cc: Alexander Shishkin
    Cc: Greentime Hu
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Nick Hu
    Cc: Peter Zijlstra
    Cc: Thomas Richter
    Cc: Vincent Chen
    Link: http://lore.kernel.org/lkml/20191212145346.5026-1-emaste@freefall.freebsd.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Ed Maste
     

12 Jan, 2020

2 commits

  • [ Upstream commit 28707826877f84bce0977845ea529cbdd08e4e8d ]

    Before this patch, perf expected that there might be NPROC*4 unique
    cache entries at max, however, it also expected that some of them would
    be shared and/or of the same size, thus the final number of entries
    would be reduced to be lower than NPROC*4. In case the number of entries
    hadn't been reduced (was NPROC*4), the warning was printed.

    However, some systems might have unusual cache topology, such as the
    following two-processor KVM guest:

    cpu level shared_cpu_list size
    0 1 0 32K
    0 1 0 64K
    0 2 0 512K
    0 3 0 8192K
    1 1 1 32K
    1 1 1 64K
    1 2 1 512K
    1 3 1 8192K

    This KVM guest has 8 (NPROC*4) unique cache entries, which used to make
    perf print the message, although there aren't actually "way too many
    cpu caches".
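
    One way to tell "table exactly full" apart from "table overflowed" is
    sketched below. The struct and function are hypothetical and simplified,
    not perf's implementation; the point is only that the warning condition
    is an entry that did not fit, not a count equal to the capacity:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cache description (not perf's struct). */
struct cache_entry {
	int level;
	int size_kb;
	const char *shared_cpu_list;
};

static bool same_cache(const struct cache_entry *a, const struct cache_entry *b)
{
	return a->level == b->level && a->size_kb == b->size_kb &&
	       strcmp(a->shared_cpu_list, b->shared_cpu_list) == 0;
}

/*
 * Copy unique entries from in[] to out[] (capacity cap). Returns the number
 * stored and sets *overflow only when a new unique entry did not fit.
 */
size_t collect_unique(const struct cache_entry *in, size_t nr_in,
		      struct cache_entry *out, size_t cap, bool *overflow)
{
	size_t cnt = 0;

	*overflow = false;
	for (size_t i = 0; i < nr_in; i++) {
		bool dup = false;

		for (size_t j = 0; j < cnt && !dup; j++)
			dup = same_cache(&in[i], &out[j]);
		if (dup)
			continue;
		if (cnt == cap) {
			*overflow = true;	/* genuinely too many caches */
			break;
		}
		out[cnt++] = in[i];
	}
	return cnt;
}
```

    For the two-processor guest above, all 8 entries fit into a table of
    NPROC*4 = 8 slots and no overflow is reported.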

    v2: Removing unused argument.

    v3: Unifying the way we obtain number of cpus.

    v4: Removed '& UINT_MAX' construct which is redundant.

    Signed-off-by: Michael Petlan
    Acked-by: Jiri Olsa
    LPU-Reference: 20191208162056.20772-1-mpetlan@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Michael Petlan
     
  • [ Upstream commit eb573e746b9d4f0921dcb2449be3df41dae3caea ]

    Commit f01642e4912b ("perf metricgroup: Support multiple events for
    metricgroup") introduced support for multiple events in a metric group.
    But with the current upstream, metric event names are not printed
    properly.

    On the power9 platform:

    command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
    1.000208486
    2.000368863
    2.001400558

    Similarly, on the skylake platform:

    command:./perf stat --metric-only -M Power -I 1000
    1.000579994
    2.002189493

    With the current upstream version, the issue is in the event name
    comparison logic in find_evsel_group(). The current logic compares the
    events belonging to a metric group to the events in perf_evlist. Since the
    break statement is missing in the loop used for that comparison, the loop
    continues to execute even after getting a pattern match, and ends up
    discarding the matches.

    When only a single metric event belongs to the metric group, it works
    fine, because in that case once it has compared all events it reaches the
    end of perf_evlist.

    Example for single metric event in power9 platform:

    command:# ./perf stat --metric-only -M branches_per_inst -I 1000 sleep 1
    1.000094653 0.2
    1.001337059 0.0

    This patch fixes the issue by adding the corresponding condition, making
    sure that once all events belonging to that metric event have been
    matched in find_evsel_group(), we break out of that loop.
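
    The idea of stopping as soon as the whole group has matched can be
    sketched as follows. This is a simplified, hypothetical matcher over
    plain strings and not the code of find_evsel_group(); here the early
    return plays the role of the added break:

```c
#include <string.h>

/*
 * Hypothetical sketch: find where the consecutive run ids[0..nr_ids-1]
 * starts inside evlist[0..nr_events-1]. Returns the start index, or -1
 * if the run does not occur.
 */
int find_group(const char *const *evlist, int nr_events,
	       const char *const *ids, int nr_ids)
{
	if (nr_ids <= 0)
		return -1;

	for (int start = 0; start + nr_ids <= nr_events; start++) {
		int matched = 0;

		while (matched < nr_ids &&
		       strcmp(evlist[start + matched], ids[matched]) == 0)
			matched++;

		/* Stop at the first complete match so later events cannot discard it. */
		if (matched == nr_ids)
			return start;
	}
	return -1;
}
```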

    With this patch:
    On the power9 platform:

    command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2
    result:#
    time derat_4k_miss_rate_percent derat_4k_miss_ratio derat_miss_ratio derat_64k_miss_rate_percent derat_64k_miss_ratio dslb_miss_rate_percent islb_miss_rate_percent
    1.000135672 0.0 0.3 1.0 0.0 0.2 0.0 0.0
    2.000380617 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    Similarly, on the skylake platform:

    command:# ./perf stat --metric-only -M Power -I 1000
    result:#
    time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.000563580 0.3 0.0 2.6 44.2 21.9 0.0 0.0 0.0
    2.002235027 0.4 0.0 2.7 43.0 20.7 0.0 0.0 0.0

    Committer testing:

    Before:

    [root@seventh ~]# perf stat --metric-only -M Power -I 1000
    # time
    1.000383223
    2.001168182
    3.001968545
    4.002741200
    5.003442022
    ^C 5.777687244

    [root@seventh ~]#

    After the patch:

    [root@seventh ~]# perf stat --metric-only -M Power -I 1000
    # time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
    1.000406577 0.4 0.1 1.4 97.0 0.0 0.0 0.0 0.0
    2.001481572 0.3 0.0 0.6 97.9 0.0 0.0 0.0 0.0
    3.002332585 0.2 0.0 1.0 97.5 0.0 0.0 0.0 0.0
    4.003196624 0.2 0.0 0.3 98.6 0.0 0.0 0.0 0.0
    5.004063851 0.3 0.0 0.7 97.7 0.0 0.0 0.0 0.0
    ^C 5.471260276 0.2 0.0 0.5 49.3 0.0 0.0 0.0 0.0

    [root@seventh ~]#
    [root@seventh ~]# dmesg | grep -i skylake
    [ 0.187807] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
    [root@seventh ~]#

    Fixes: f01642e4912b ("perf metricgroup: Support multiple events for metricgroup")
    Signed-off-by: Kajol Jain
    Reviewed-by: Ravi Bangoria
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Anju T Sudhakar
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Madhavan Srinivasan
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20191120084059.24458-1-kjain@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Kajol Jain
     

09 Jan, 2020

1 commit

  • commit aceb98261ea7d9fe38f9c140c5531f0b13623832 upstream.

    Do not dereference 'chain' when it is NULL.

    $ perf record -e intel_pt//u -e branch-misses:u uname
    $ perf report --itrace=l --branch-history
    perf: Segmentation fault

    Fixes: e9024d519d89 ("perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}")
    Signed-off-by: Adrian Hunter
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Link: http://lore.kernel.org/lkml/20191114142538.4097-1-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     

07 Jan, 2020

1 commit


05 Jan, 2020

3 commits

  • [ Upstream commit 5b596e0ff0e1852197d4c82d3314db5e43126bf7 ]

    To avoid breaking the build on arches where this is not wired up, at
    least all the other features should be made available and when using
    this specific routine, the "unknown" should point the user/developer to
    the need to wire this up on this particular hardware architecture.
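
    A minimal sketch of the "never return NULL" approach, using a
    hypothetical register table and helper names rather than perf's own:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register table for an architecture that is wired up. */
static const char *const reg_names[] = { "AX", "BX", "CX" };

/* Never returns NULL: ids without a name map to "unknown". */
const char *reg_name(size_t id)
{
	if (id < sizeof(reg_names) / sizeof(reg_names[0]))
		return reg_names[id];
	return "unknown";
}

/* Format one register line; safe because reg_name() cannot return NULL. */
int format_reg(char *buf, size_t len, size_t id, unsigned long long val)
{
	return snprintf(buf, len, "%-5s 0x%016llx", reg_name(id), val);
}
```

    Since the "%-5s" argument can no longer be NULL, the format-overflow
    diagnostic shown below has nothing to complain about.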

    Detected in a container mipsel debian cross build environment, where it
    shows up as:

    In file included from /usr/mipsel-linux-gnu/include/stdio.h:867,
    from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
    from util/session.c:13:
    In function 'printf',
    inlined from 'regs_dump__printf' at util/session.c:1103:3,
    inlined from 'regs__printf' at util/session.c:1131:2:
    /usr/mipsel-linux-gnu/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
    107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    cross compiler details:

    mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909

    Also on mips64:

    In file included from /usr/mips64-linux-gnuabi64/include/stdio.h:867,
    from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
    from util/session.c:13:
    In function 'printf',
    inlined from 'regs_dump__printf' at util/session.c:1103:3,
    inlined from 'regs__printf' at util/session.c:1131:2,
    inlined from 'regs_user__printf' at util/session.c:1139:3,
    inlined from 'dump_sample' at util/session.c:1246:3,
    inlined from 'machines__deliver_event' at util/session.c:1421:3:
    /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
    107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'printf',
    inlined from 'regs_dump__printf' at util/session.c:1103:3,
    inlined from 'regs__printf' at util/session.c:1131:2,
    inlined from 'regs_intr__printf' at util/session.c:1147:3,
    inlined from 'dump_sample' at util/session.c:1249:3,
    inlined from 'machines__deliver_event' at util/session.c:1421:3:
    /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
    107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    cross compiler details:

    mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909

    Fixes: 2bcd355b71da ("perf tools: Add interface to arch registers sets")
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-95wjyv4o65nuaeweq31t7l1s@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 0cd032d3b5fcebf5454315400ab310746a81ca53 ]

    brstackinsn must be allowed to be set by the user when AUX area data has
    been captured because, in that case, the branch stack might be
    synthesized on the fly. This fixes the following error:

    Before:

    $ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
    [ perf record: Woken up 19 times to write data ]
    [ perf record: Captured and wrote 2.274 MB perf.data ]
    $ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
    Display of branch stack assembler requested, but non all-branch filter set
    Hint: run 'perf record -b ...'

    After:

    $ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
    [ perf record: Woken up 19 times to write data ]
    [ perf record: Captured and wrote 2.274 MB perf.data ]
    $ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
    grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
    bmexec+2485:
    00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED
    00005641d5806bd0 movzxb (%r13,%rdx,1), %eax
    00005641d5806bd6 add %rdi, %rax
    00005641d5806bd9 movzxb -0x1(%rax), %edx
    00005641d5806bdd cmp %rax, %r14
    00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
    mismatch of LBR data and executable
    00005641d58069c0 movzxb (%r13,%rdx,1), %edi

    Fixes: 48d02a1d5c13 ("perf script: Add 'brstackinsn' for branch stacks")
    Reported-by: Andi Kleen
    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Link: http://lore.kernel.org/lkml/20191127095322.15417-1-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Adrian Hunter
     
  • [ Upstream commit 98e93245113d0f5c279ef77f4a9e7d097323ad71 ]

    To fix these build errors on a debian mipsel cross build environment:

    builtin-diff.c: In function 'block_cycles_diff_cmp':
    builtin-diff.c:550:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
    550 | l = labs(left->diff.cycles);
    | ^~~~
    builtin-diff.c:551:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
    551 | r = labs(right->diff.cycles);
    | ^~~~
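
    On targets where long is 32 bits, labs() would truncate an s64 value,
    while the standard llabs() takes a long long. A minimal sketch of a
    comparison built on llabs(), with a hypothetical function name and
    int64_t standing in for s64:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical comparison helper: order two 64-bit cycle deltas by
 * absolute value. Returns 1, 0 or -1.
 */
int cmp_abs_cycles(int64_t left, int64_t right)
{
	long long l = llabs(left);	/* long long: no truncation on 32-bit */
	long long r = llabs(right);

	return (l > r) - (l < r);
}
```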

    Fixes: 99150a1faab2 ("perf diff: Use hists to manage basic blocks per symbol")
    Cc: Jin Yao
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-pn7szy5uw384ntjgk6zckh6a@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     

31 Dec, 2019

7 commits

  • commit 91e2f539eeda26ab00bd03fae8dc434c128c85ed upstream.

    Fix die_walk_lines() to list the function entry line correctly. Since
    dwarf_entrypc() does not return the entry pc if the DIE has only a
    range attribute, __die_walk_funclines() fails to list the declaration
    line (entry line) in that case.

    To solve this issue, this introduces die_entrypc(), which correctly
    returns the entry PC (the first address range) even if the DIE has only
    a range attribute. With this fix, die_walk_lines() shows the function
    entry line, so it can be probed correctly.
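
    The fallback described above can be pictured with a simplified,
    hypothetical view of a function DIE. This is not the libdw data model
    and not the body of die_entrypc(), only the order of the two lookups:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a function DIE (not the libdw type). */
struct func_die {
	bool has_low_pc;
	uint64_t low_pc;
	const uint64_t *range_starts;	/* start address of each range */
	size_t nr_ranges;
};

/* Returns 0 and stores the entry PC, or -1 if the DIE has no address info. */
int entry_pc(const struct func_die *die, uint64_t *pc)
{
	if (die->has_low_pc) {
		*pc = die->low_pc;
		return 0;
	}
	if (die->nr_ranges > 0) {
		*pc = die->range_starts[0];	/* first address range */
		return 0;
	}
	return -1;
}
```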

    Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157190837419.1859.4619125803596816752.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Thomas Backlund
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • [ Upstream commit bb1835a3b86c73aa534ef6430ad40223728dfbc0 ]

    Avoid terminating trace loading when the last record in the
    decompressed buffer partly resides in the following mmaped
    PERF_RECORD_COMPRESSED record.

    In this case, a NULL value returned by fetch_mmaped_event() means that we
    should proceed to the next mmaped record, decompress it, and load the
    compressed events.
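
    A sketch of the three outcomes a reader has to distinguish: a complete
    record, a record that continues in the next buffer, and a corrupt
    size. The function below is hypothetical and uses return codes instead
    of a NULL pointer; only the header layout follows struct
    perf_event_header:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

/*
 * Hypothetical sketch. Returns 1 and fills *out if a whole record starts
 * at buf[head]; 0 if the record continues past the end of the buffer (the
 * caller should move on to the next compressed record); -1 if the size
 * field is invalid.
 */
int fetch_record(const unsigned char *buf, size_t buf_size, size_t head,
		 struct rec_header *out)
{
	struct rec_header hdr;

	if (head > buf_size || buf_size - head < sizeof(hdr))
		return 0;	/* even the header is incomplete */

	memcpy(&hdr, buf + head, sizeof(hdr));
	if (hdr.size < sizeof(hdr))
		return -1;	/* corrupt record */
	if (buf_size - head < hdr.size)
		return 0;	/* the tail lives in the next buffer */

	*out = hdr;
	return 1;
}
```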

    The issue can be reproduced like this:

    $ perf record -z -- some_long_running_workload
    $ perf report --stdio -vv
    decomp (B): 44519 to 163000
    decomp (B): 48119 to 174800
    decomp (B): 65527 to 131072
    fetch_mmaped_event: head=0x1ffe0 event->header_size=0x28, mmap_size=0x20000: fuzzed perf.data?
    Error:
    failed to process sample
    ...

    Testing:

    71: Zstd perf.data compression/decompression : Ok

    $ tools/perf/perf report -vv --stdio
    decomp (B): 59593 to 262160
    decomp (B): 4438 to 16512
    decomp (B): 285 to 880
    Looking at the vmlinux_path (8 entries long)
    Using vmlinux for symbols
    decomp (B): 57474 to 261248
    prefetch_event: head=0x3fc78 event->header_size=0x28, mmap_size=0x3fc80: fuzzed or compressed perf.data?
    decomp (B): 25 to 32
    decomp (B): 52 to 120
    ...

    Fixes: 57fc032ad643 ("perf session: Avoid infinite loop when seeing invalid header.size")
    Link: https://marc.info/?l=linux-kernel&m=156580812427554&w=2
    Co-developed-by: Jiri Olsa
    Acked-by: Jiri Olsa
    Signed-off-by: Alexey Budankov
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/cf782c34-f3f8-2f9f-d6ab-145cee0d5322@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Alexey Budankov
     
  • [ Upstream commit da6cb952a89efe24bb76c4971370d485737a2d85 ]

    Filter out instances except for inlined_subroutine and subprogram DIE in
    die_walk_instances() and die_is_func_instance().

    This fixes an issue where perf probe sets some probes on the calling
    address instead of the target function itself.

    When perf probe walks the instances of an abstract origin (a kind of
    function prototype of an inlined function), die_walk_instances() can also
    pass a GNU_call_site (a GNU extension for call sites) to the callback.
    Since it is not an inlined instance of the target function, we have to
    filter it out when searching for a probe point.

    Without this patch, perf probe sets probes on the call site address too.
    This can happen with a function which is marked "inlined" but has an
    actual symbol. (I'm not sure why GCC marks it "inlined"):

    # perf probe -D vfs_read
    p:probe/vfs_read _text+2500017
    p:probe/vfs_read_1 _text+2499468
    p:probe/vfs_read_2 _text+2499563
    p:probe/vfs_read_3 _text+2498876
    p:probe/vfs_read_4 _text+2498512
    p:probe/vfs_read_5 _text+2498627

    With this patch:

    Slightly different results, similar tho:

    # perf probe -D vfs_read
    p:probe/vfs_read _text+2498512

    Committer testing:

    # uname -a
    Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    Before:

    # perf probe -D vfs_read
    p:probe/vfs_read _text+3131557
    p:probe/vfs_read_1 _text+3130975
    p:probe/vfs_read_2 _text+3131047
    p:probe/vfs_read_3 _text+3130380
    p:probe/vfs_read_4 _text+3130000
    # uname -a
    Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    #

    After:

    # perf probe -D vfs_read
    p:probe/vfs_read _text+3130000
    #

    Fixes: db0d2c6420ee ("perf probe: Search concrete out-of-line instances")
    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit f4d99bdfd124823a81878b44b5e8750b97f73902 ]

    Skip end-of-sequence and non-statement lines while walking through lines
    list.

    The "end-of-sequence" line information means:

    "the current address is that of the first byte after the
    end of a sequence of target machine instructions."
    (DWARF version 4 spec 6.2.2)

    This actually means out of scope and we can not probe on it.

    On the other hand, the statement lines (is_stmt) means:

    "the current instruction is a recommended breakpoint location.
    A recommended breakpoint location is intended to “represent”
    a line, a statement and/or a semantically distinct subpart
    of a statement."

    (DWARF version 4 spec 6.2.2)

    So, non-statement line info also should be skipped.

    These can reduce unneeded probe points and also avoid an error.
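
    The filtering rule can be sketched over a hypothetical, flattened line
    table. The struct and helpers are illustrative only and do not use the
    libdw line API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened row of a DWARF line table. */
struct line_row {
	uint64_t addr;
	int lineno;
	bool is_stmt;		/* recommended breakpoint location */
	bool end_sequence;	/* first byte after a run of instructions */
};

/* A row is a probe candidate only if it is a statement and not end-of-sequence. */
bool row_is_probeable(const struct line_row *row)
{
	return row->is_stmt && !row->end_sequence;
}

/* Count the candidate addresses for one source line. */
size_t count_candidates(const struct line_row *rows, size_t nr, int lineno)
{
	size_t cnt = 0;

	for (size_t i = 0; i < nr; i++) {
		if (rows[i].lineno == lineno && row_is_probeable(&rows[i]))
			cnt++;
	}
	return cnt;
}
```

    A line that has several rows but only one statement row yields a single
    candidate, which matches the single probe seen with the patch applied.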

    E.g. without this patch:

    # perf probe -a "clear_tasks_mm_cpumask:1"
    Added new events:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)

    You can now use it in all perf tools, such as:

    perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1

    #

    This puts 5 probes on one line, but actually it's not an inlined function.
    This is because there are many non-statement instructions in the
    function prologue.

    With this patch:

    # perf probe -a "clear_tasks_mm_cpumask:1"
    Added new event:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)

    You can now use it in all perf tools, such as:

    perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1

    #

    Now perf-probe skips unneeded addresses.

    Committer testing:

    Slightly different results, but similar:

    Before:

    # uname -a
    Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    #
    # perf probe -a "clear_tasks_mm_cpumask:1"
    Added new events:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
    probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)

    You can now use it in all perf tools, such as:

    perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1

    #

    After:

    # perf probe -a "clear_tasks_mm_cpumask:1"
    Added new event:
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)

    You can now use it in all perf tools, such as:

    perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1

    # perf probe -l
    probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
    #

    Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit 86c0bf8539e7f46d91bd105e55eda96e0064caef ]

    Fix to show calling lines of inlined functions (where an inline function
    is called).

    die_walk_lines() filtered out the lines inside inlined functions based
    on the address. However, this also filtered out the lines which call
    those inlined functions from the target function.

    To solve this issue, check the call_file and call_line attributes and do
    not filter a line out if they match its line information.

    Without this fix, perf probe -L doesn't show some lines correctly.
    (the line numbers from 17 on are missing)

    # perf probe -L vfs_read

    0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
    1 {
    2 ssize_t ret;

    4 if (!(file->f_mode & FMODE_READ))
    return -EBADF;
    6 if (!(file->f_mode & FMODE_CAN_READ))
    return -EINVAL;
    8 if (unlikely(!access_ok(buf, count)))
    return -EFAULT;

    11 ret = rw_verify_area(READ, file, pos, count);
    12 if (!ret) {
    13 if (count > MAX_RW_COUNT)
    count = MAX_RW_COUNT;
    15 ret = __vfs_read(file, buf, count, pos);
    16 if (ret > 0) {
    fsnotify_access(file);
    add_rchar(current, ret);
    }

    With this fix:

    # perf probe -L vfs_read

    0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
    1 {
    2 ssize_t ret;

    4 if (!(file->f_mode & FMODE_READ))
    return -EBADF;
    6 if (!(file->f_mode & FMODE_CAN_READ))
    return -EINVAL;
    8 if (unlikely(!access_ok(buf, count)))
    return -EFAULT;

    11 ret = rw_verify_area(READ, file, pos, count);
    12 if (!ret) {
    13 if (count > MAX_RW_COUNT)
    count = MAX_RW_COUNT;
    15 ret = __vfs_read(file, buf, count, pos);
    16 if (ret > 0) {
    17 fsnotify_access(file);
    18 add_rchar(current, ret);
    }
    20 inc_syscr(current);
    }

    Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit c701636aeec4c173208697d68da6e4271125564b ]

    Make find_best_scope() return the innermost DIE at the given address if
    there is no best matched scope DIE. Since GCC sometimes generates
    intuitively strange line info which is outside the inlined function's
    address range, we need this fixup.

    Without this, perf probe sometimes fails to probe on a line inside an
    inlined function:

    # perf probe -D ksys_open:3
    Failed to find scope of probe point.
    Error: Failed to add events.

    With this fix, 'perf probe' can probe it:

    # perf probe -D ksys_open:3
    p:probe/ksys_open _text+25707308
    p:probe/ksys_open_1 _text+25710596
    p:probe/ksys_open_2 _text+25711114
    p:probe/ksys_open_3 _text+25711343
    p:probe/ksys_open_4 _text+25714058
    p:probe/ksys_open_5 _text+2819653
    p:probe/ksys_open_6 _text+2819701

    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Namhyung Kim
    Cc: Ravi Bangoria
    Cc: Steven Rostedt (VMware)
    Cc: Tom Zanussi
    Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit dee36a2abb67c175265d49b9a8c7dfa564463d9a ]

    Since the debuginfo__find_probes() callback function can be called with a
    location which it has already been given, the callback function must
    filter out such overlapping locations.

    add_probe_trace_event() already does this since commit 1a375ae7659a
    ("perf probe: Skip same probe address for a given line"), but
    add_available_vars() doesn't. Thus perf probe -V shows the same address
    repeatedly, as below:

    # perf probe -V vfs_read:18
    Available variables at vfs_read:18
    @
    char* buf
    loff_t* pos
    ssize_t ret
    struct file* file
    @
    char* buf
    loff_t* pos
    ssize_t ret
    struct file* file
    @
    char* buf
    loff_t* pos
    ssize_t ret
    struct file* file

    With this fix, perf probe -V shows it correctly:

    # perf probe -V vfs_read:18
    Available variables at vfs_read:18
    @
    char* buf
    loff_t* pos
    ssize_t ret
    struct file* file
    @
    char* buf
    loff_t* pos
    ssize_t ret
    struct file* file
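
    Skipping an address that was already reported can be sketched like this.
    The fixed-size list and the helper name are hypothetical and are not how
    add_available_vars() stores its results:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list of addresses already reported for a probe point. */
struct seen_addrs {
	uint64_t addr[16];
	size_t nr;
};

/*
 * Returns true if addr is new and was recorded. Returns false if it was
 * already seen (the caller skips it) or if the list is full.
 */
bool record_if_new(struct seen_addrs *seen, uint64_t addr)
{
	for (size_t i = 0; i < seen->nr; i++) {
		if (seen->addr[i] == addr)
			return false;	/* same address again: skip */
	}
	if (seen->nr == sizeof(seen->addr) / sizeof(seen->addr[0]))
		return false;

	seen->addr[seen->nr++] = addr;
	return true;
}
```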

    Fixes: cf6eb489e5c0 ("perf probe: Show accessible local variables")
    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu