14 Oct, 2020

2 commits

  • Allow the CC compiler to accept a CFLAGS environment variable. This
    doesn't change the code generated but makes it easier to integrate
    running the shell script in build systems like bazel.

    Signed-off-by: Ian Rogers
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexios Zavras
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Igor Lubashev
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Mathieu Poirier
    Cc: Namhyung Kim
    Cc: Nick Desaulniers
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Wei Li
    Link: http://lore.kernel.org/lkml/20200306071110.130202-4-irogers@google.com
    [ split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Rogers
     
  • The variable 'traceid_list' is defined in the header file cs-etm.h,
    so if multiple C files include cs-etm.h the compiler might complain
    about multiple definitions of 'traceid_list'.

    To fix the multiple definition error, move the definition of
    'traceid_list' into cs-etm.c.
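
    As a rough illustration of the general pattern (not the actual patch),
    the sketch below keeps only a declaration where a header would be and
    places the single definition where one C file would be. The names
    'demo_list', 'demo_traceid_list' and 'demo_list_is_unset' are
    hypothetical.

```c
#include <stddef.h>

/* --- Part that would live in a shared header (hypothetical demo.h) --- */
struct demo_list {
	int nr_entries;
};

/* Declaration only: every includer sees the name, none allocates storage. */
extern struct demo_list *demo_traceid_list;

/* --- Part that would live in exactly one C file --- */

/* The single definition; only this translation unit provides storage. */
struct demo_list *demo_traceid_list = NULL;

/* Small helper so the variable is observable from other code. */
int demo_list_is_unset(void)
{
	return demo_traceid_list == NULL;
}
```

    If only one C file ever uses the variable, marking the definition
    'static' in that file and dropping the header declaration is an
    alternative.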

    Fixes: cd8bfd8c973e ("perf tools: Add processing of coresight metadata")
    Reported-by: Thomas Backlund
    Signed-off-by: Leo Yan
    Reviewed-by: Mathieu Poirier
    Reviewed-by: Mike Leach
    Tested-by: Mike Leach
    Tested-by: Thomas Backlund
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Suzuki Poulouse
    Cc: Tor Jeremiassen
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/20200505133642.4756-1-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Leo Yan
     

08 Oct, 2020

1 commit

  • * tag 'v5.4.70': (3051 commits)
    Linux 5.4.70
    netfilter: ctnetlink: add a range check for l3/l4 protonum
    ep_create_wakeup_source(): dentry name can change under you...
    ...

    Conflicts:
    arch/arm/mach-imx/pm-imx6.c
    arch/arm64/boot/dts/freescale/imx8mm-evk.dts
    arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
    drivers/crypto/caam/caamalg.c
    drivers/gpu/drm/imx/dw_hdmi-imx.c
    drivers/gpu/drm/imx/imx-ldb.c
    drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
    drivers/mmc/host/sdhci-esdhc-imx.c
    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
    drivers/net/ethernet/freescale/enetc/enetc.c
    drivers/net/ethernet/freescale/enetc/enetc_pf.c
    drivers/thermal/imx_thermal.c
    drivers/usb/cdns3/ep0.c
    drivers/xen/swiotlb-xen.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c

    Signed-off-by: Jason Liu

    Jason Liu
     

01 Oct, 2020

16 commits

  • [ Upstream commit 8510895bafdbf7c4dd24c22946d925691135c2b2 ]

    A big uncore event group is split into multiple small groups which only
    include the uncore events from the same PMU. This has been supported in
    the commit 3cdc5c2cb924a ("perf parse-events: Handle uncore event
    aliases in small groups properly").

    If the event's PMU name starts to repeat, it must be a new event.
    That can be used to distinguish the leader from other members.
    But now it only compares the pointer of pmu_name
    (leader->pmu_name == evsel->pmu_name).

    If we use "perf stat -M LLC_MISSES.PCIE_WRITE -a" on cascadelakex,
    the event list is:

    evsel->name evsel->pmu_name
    ---------------------------------------------------------------
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_4 (as leader)
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_2
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_0
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_5
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_3
    unc_iio_data_req_of_cpu.mem_write.part0 uncore_iio_1
    unc_iio_data_req_of_cpu.mem_write.part1 uncore_iio_4
    ......

    For the event "unc_iio_data_req_of_cpu.mem_write.part1" with
    "uncore_iio_4", it should be the event from PMU "uncore_iio_4".
    It's not a new leader for this PMU.

    But if we use "(leader->pmu_name == evsel->pmu_name)", the check
    fails and the event is stored in leaders[] as a new PMU leader.

    So this patch uses strcmp to compare the PMU name between events.
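
    The difference between the two comparisons can be sketched as follows.
    The helper names are hypothetical and the snippet is independent of
    the perf sources; it only shows that two separately stored copies of
    the same name compare unequal by pointer but equal by content.

```c
#include <stdbool.h>
#include <string.h>

/* Compares addresses: true only when both point at the same storage. */
bool same_pmu_by_pointer(const char *a, const char *b)
{
	return a == b;
}

/* Compares contents: true when both names spell the same string. */
bool same_pmu_by_name(const char *a, const char *b)
{
	return a && b && strcmp(a, b) == 0;
}
```

    With two distinct buffers that both hold "uncore_iio_4", the pointer
    check reports a mismatch while the strcmp-based check reports a match.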

    Fixes: d4953f7ef1a2 ("perf parse-events: Fix 3 use after frees found with clang ASAN")
    Signed-off-by: Jin Yao
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jin Yao
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200430003618.17002-1-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jin Yao
     
  • [ Upstream commit 463538a383a27337cb83ae195e432a839a52d639 ]

    Commit 5aa98879efe7 ("s390/cpum_sf: prohibit callchain data collection")
    prohibits call graph sampling for hardware events on s390. The
    information recorded is out of context and does not match.

    On s390 this commit now breaks test case 68 Zstd perf.data
    compression/decompression.

    Therefore omit call graph sampling on s390 in this test.

    Output before:
    [root@t35lp46 perf]# ./perf test -Fv 68
    68: Zstd perf.data compression/decompression :
    --- start ---
    Collecting compressed record file:
    Error:
    cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
    Try 'perf stat'
    ---- end ----
    Zstd perf.data compression/decompression: FAILED!
    [root@t35lp46 perf]#

    Output after:
    [root@t35lp46 perf]# ./perf test -Fv 68
    68: Zstd perf.data compression/decompression :
    --- start ---
    Collecting compressed record file:
    500+0 records in
    500+0 records out
    256000 bytes (256 kB, 250 KiB) copied, 0.00615638 s, 41.6 MB/s
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.004 MB /tmp/perf.data.X3M,
    compressed (original 0.002 MB, ratio is 3.609) ]
    Checking compressed events stats:
    # compressed : Zstd, level = 1, ratio = 4
    COMPRESSED events: 1
    2ELIFREPh---- end ----
    Zstd perf.data compression/decompression: Ok
    [root@t35lp46 perf]#

    Signed-off-by: Thomas Richter
    Reviewed-by: Sumanth Korikkar
    Cc: Heiko Carstens
    Cc: Sven Schnelle
    Cc: Vasily Gorbik
    Link: http://lore.kernel.org/lkml/20200729135314.91281-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Thomas Richter
     
  • [ Upstream commit 61f82e3fb697a8e85f22fdec786528af73dc36d1 ]

    In the absence of any modules, no "modules" map is created, but there
    are other executable pages to map, due to eBPF JIT, kprobe or ftrace.
    Map them by recognizing that the first "module" symbol is not
    necessarily from a module, and adjust the map accordingly.

    Signed-off-by: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Leo Yan
    Cc: Mark Rutland
    Cc: Masami Hiramatsu
    Cc: Mathieu Poirier
    Cc: Peter Zijlstra
    Cc: Steven Rostedt (VMware)
    Cc: x86@kernel.org
    Link: http://lore.kernel.org/lkml/20200512121922.8997-10-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Adrian Hunter
     
  • [ Upstream commit a159e2fe89b4d1f9fb54b0ae418b961e239bf617 ]

    Avoid a simple memory leak.

    Signed-off-by: Ian Rogers
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Andi Kleen
    Cc: Andrii Nakryiko
    Cc: Cong Wang
    Cc: Daniel Borkmann
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: John Fastabend
    Cc: John Garry
    Cc: Kajol Jain
    Cc: Kan Liang
    Cc: Kim Phillips
    Cc: Mark Rutland
    Cc: Martin KaFai Lau
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: Yonghong Song
    Cc: bpf@vger.kernel.org
    Cc: kp singh
    Cc: netdev@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200508053629.210324-10-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Ian Rogers
     
  • [ Upstream commit 07e9a6f538cbeecaf5c55b6f2991416f873cdcbd ]

    Free "str" before returning when asprintf() fails, to avoid a memory
    leak.

    Signed-off-by: Xie XiuQi
    Cc: Alexander Shishkin
    Cc: Hongbo Yao
    Cc: Jiri Olsa
    Cc: Li Bin
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/20200521133218.30150-4-liwei391@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Xie XiuQi
     
  • [ Upstream commit ea9eb1f456a08c18feb485894185f7a4e31cc8a4 ]

    Joakim reported a wrong duration_time value for intervals bigger
    than 4000 [1].

    The problem is in the interval value we pass to the update_stats
    function, which is typed as 'unsigned int' and overflows when
    we get over 2^32 (happens between intervals 4000 and 5000).

    Retype the passed value to unsigned long long.
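
    The wraparound can be reproduced in isolation. The sketch below
    assumes the interval is given in milliseconds and converted to
    nanoseconds by multiplying by 1000000; the function names are
    hypothetical and this is not the perf code itself.

```c
#include <stdint.h>

/* 32-bit arithmetic: the product wraps modulo 2^32. */
uint32_t interval_ns_32(uint32_t interval_ms)
{
	return interval_ms * 1000000u;
}

/* Widen before multiplying so the full product fits in 64 bits. */
uint64_t interval_ns_64(uint32_t interval_ms)
{
	return (uint64_t)interval_ms * 1000000ull;
}
```

    Under that assumption, 4000 ms still fits in 32 bits (4000000000),
    while 5000 ms wraps to 705032704 instead of 5000000000.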

    [1] https://www.spinics.net/lists/linux-perf-users/msg11777.html

    Fixes: b90f1333ef08 ("perf stat: Update walltime_nsecs_stats in interval mode")
    Reported-by: Joakim Zhang
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200518131445.3745083-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jiri Olsa
     
  • [ Upstream commit 7597ce89b3ed239f7a3408b930d2a6c7a4c938a1 ]

    Make the architecture test directory agree with the code comment.

    Committer notes:

    This was split from a larger patch.

    The code was assuming the developer always worked from tools/perf/, so make sure we
    do the test -d having $toolsdir/perf/arch/$arch, to match the intent expressed in the comment,
    just above that loop.

    Signed-off-by: Ian Rogers
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexios Zavras
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Igor Lubashev
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Mathieu Poirier
    Cc: Namhyung Kim
    Cc: Nick Desaulniers
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Wei Li
    Link: http://lore.kernel.org/lkml/20200306071110.130202-4-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Ian Rogers
     
  • [ Upstream commit 3efc899d9afb3d03604f191a0be9669eabbfc4aa ]

    If allocated, perf_pkg_mask and metric_events need freeing.

    Signed-off-by: Ian Rogers
    Reviewed-by: Andi Kleen
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20200512235918.10732-1-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Ian Rogers
     
  • [ Upstream commit 266150c94c69429cf6d18e130237224a047f5061 ]

    Realloc of size zero is a free not an error, avoid this causing a double
    free. Caught by clang's address sanitizer:

    ==2634==ERROR: AddressSanitizer: attempting double-free on 0x6020000015f0 in thread T0:
    #0 0x5649659297fd in free llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3
    #1 0x5649659e9251 in __zfree tools/lib/zalloc.c:13:2
    #2 0x564965c0f92c in mem2node__exit tools/perf/util/mem2node.c:114:2
    #3 0x564965a08b4c in perf_c2c__report tools/perf/builtin-c2c.c:2867:2
    #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #8 0x564965942e41 in main tools/perf/perf.c:538:3

    0x6020000015f0 is located 0 bytes inside of 1-byte region [0x6020000015f0,0x6020000015f1)
    freed by thread T0 here:
    #0 0x564965929da3 in realloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x564965c0f55e in mem2node__init tools/perf/util/mem2node.c:97:16
    #2 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
    #3 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #4 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #5 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #6 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #7 0x564965942e41 in main tools/perf/perf.c:538:3

    previously allocated by thread T0 here:
    #0 0x564965929c42 in calloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0x5649659e9220 in zalloc tools/lib/zalloc.c:8:9
    #2 0x564965c0f32d in mem2node__init tools/perf/util/mem2node.c:61:12
    #3 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
    #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #8 0x564965942e41 in main tools/perf/perf.c:538:3

    v2: add a WARN_ON_ONCE when the free condition arises.
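
    A minimal sketch of the kind of guard involved, assuming a buffer that
    is trimmed to the number of entries actually used. The function name
    'shrink_entries' is hypothetical and the real patch may be structured
    differently; the point is only that realloc() is never called with a
    size of zero, so the caller's later free() stays valid.

```c
#include <stdlib.h>

/*
 * Trim 'entries' to 'count' elements. When count is zero the original
 * buffer is returned untouched instead of calling realloc(ptr, 0), which
 * may free the buffer and leave the caller holding a dangling pointer.
 */
int *shrink_entries(int *entries, size_t count)
{
	int *tmp;

	if (count == 0)
		return entries;

	tmp = realloc(entries, count * sizeof(*entries));
	/* On failure the old buffer is still valid, so keep using it. */
	return tmp ? tmp : entries;
}
```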

    Signed-off-by: Ian Rogers
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: clang-built-linux@googlegroups.com
    Link: http://lore.kernel.org/lkml/20200320182347.87675-1-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Ian Rogers
     
  • [ Upstream commit bec49a9e05db3dbdca696fa07c62c52638fb6371 ]

    When it is not possible for a non-privileged perf command to monitor at
    the kernel level (:k), the fallback code forces a :u. That works if the
    event was previously monitoring both levels. But if the event was
    already constrained to kernel only, then it does not make sense to
    restrict it to user only.

    Given the code works by exclusion, a kernel only event would have:

    attr->exclude_user = 1

    The fallback code would add:

    attr->exclude_kernel = 1

    In the end the event would not monitor at either the user level or the
    kernel level. In other words, it would count nothing.

    An event programmed to monitor kernel only cannot be switched to user
    only without seriously warning the user.

    This patch forces an error in this case to make it clear the request
    cannot really be satisfied.
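
    The exclusion logic can be modelled with two flags. This is only a
    sketch: 'struct demo_attr' and 'fallback_to_user_only' are hypothetical
    stand-ins, not the perf data structures or the function changed by the
    patch.

```c
#include <stdbool.h>

/* Simplified stand-in for the two exclusion bits discussed above. */
struct demo_attr {
	bool exclude_user;
	bool exclude_kernel;
};

/*
 * Try to fall back to user-only monitoring.
 * Returns 0 when the fallback was applied, -1 when the event was already
 * kernel-only, because excluding the kernel too would count nothing.
 */
int fallback_to_user_only(struct demo_attr *attr)
{
	if (attr->exclude_user)
		return -1;

	attr->exclude_kernel = true;
	return 0;
}
```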

    Behavior with paranoid 1:

    $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
    $ perf stat -e cycles:k sleep 1

    Performance counter stats for 'sleep 1':

    1,520,413 cycles:k

    1.002361664 seconds time elapsed

    0.002480000 seconds user
    0.000000000 seconds sys

    Old behavior with paranoid 2:

    $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
    $ perf stat -e cycles:k sleep 1
    Performance counter stats for 'sleep 1':

    0 cycles:ku

    1.002358127 seconds time elapsed

    0.002384000 seconds user
    0.000000000 seconds sys

    New behavior with paranoid 2:

    $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
    $ perf stat -e cycles:k sleep 1
    Error:
    You may not have permission to collect stats.

    Consider tweaking /proc/sys/kernel/perf_event_paranoid,
    which controls use of the performance events system by
    unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).

    The current value is 2:

    -1: Allow use of (almost) all events by all users
    Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
    >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
    Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
    >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
    >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN

    To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

    kernel.perf_event_paranoid = -1

    v2 of this patch addresses the review feedback from jolsa@redhat.com.

    Signed-off-by: Stephane Eranian
    Reviewed-by: Ian Rogers
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Stephane Eranian
     
  • [ Upstream commit d74b181a028bb5a468f0c609553eff6a8fdf4887 ]

    'snprintf' returns the number of characters which would be generated for
    the given input.

    If the returned value is *greater than* or equal to the buffer size, it
    means that the output has been truncated.

    Fix the overflow test accordingly.
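
    A short sketch of the corrected style of check, using a hypothetical
    helper 'format_path' rather than the perf functions that were changed.
    A return value equal to the buffer size already means the terminating
    NUL did not fit, so the test has to be ">=" rather than ">".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "dir/file" into buf. Returns true only when the whole string,
 * including the terminating NUL, fitted into 'size' bytes.
 */
bool format_path(char *buf, size_t size, const char *dir, const char *file)
{
	int ret = snprintf(buf, size, "%s/%s", dir, file);

	/* ret counts the characters that would be written, without the NUL. */
	return ret >= 0 && (size_t)ret < size;
}
```

    With an 8-byte buffer, "abc/def" (7 characters) fits, while
    "abcd/def" (8 characters) is reported as truncated.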

    Fixes: 7780c25bae59f ("perf tools: Allow ability to map cpus to nodes easily")
    Fixes: 92a7e1278005b ("perf cpumap: Add cpu__max_present_cpu()")
    Signed-off-by: Christophe JAILLET
    Suggested-by: David Laight
    Cc: Alexander Shishkin
    Cc: Don Zickus
    Cc: He Zhe
    Cc: Jan Stancek
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: kernel-janitors@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200324070319.10901-1-christophe.jaillet@wanadoo.fr
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Christophe JAILLET
     
  • [ Upstream commit d4953f7ef1a2e87ef732823af35361404d13fea8 ]

    Reproducible with a clang asan build and then running perf test in
    particular 'Parse event definition strings'.

    Signed-off-by: Ian Rogers
    Acked-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Leo Yan
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: clang-built-linux@googlegroups.com
    Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Ian Rogers
     
  • [ Upstream commit c9f5baa136777b2c982f6f7a90c9da69a88be148 ]

    When 'etm->instructions_sample_period' is less than
    'tidq->period_instructions', the function cs_etm__sample() cannot handle
    this case properly with its logic.

    Consider the flow below as an example:

    - If we set itrace option '--itrace=i4', then function cs_etm__sample()
    has variables with initialized values:

    tidq->period_instructions = 0
    etm->instructions_sample_period = 4

    - When the first packet is coming:

    packet->instr_count = 10; the number of instructions executed in this
    packet is 10, thus update period_instructions as below:

    tidq->period_instructions = 0 + 10 = 10
    instrs_over = 10 - 4 = 6
    offset = 10 - 6 - 1 = 3
    tidq->period_instructions = instrs_over = 6

    - When the second packet is coming:

    packet->instr_count = 10; in the second pass, assume 10 instructions
    in the trace sample again:

    tidq->period_instructions = 6 + 10 = 16
    instrs_over = 16 - 4 = 12
    offset = 10 - 12 - 1 = -3 -> the negative value
    tidq->period_instructions = instrs_over = 12

    After handling these two packets, there are the following issues:

    The first issue is that cs_etm__instr_addr() returns the address within
    the current trace sample of the instruction at the given offset, so the
    offset is supposed to always be non-negative. But in fact, function
    cs_etm__sample() might calculate a negative offset (when handling the
    second packet, the offset is -3) and pass it to cs_etm__instr_addr(),
    where the u64 parameter turns it into a big positive integer.
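
    The arithmetic from the walkthrough above can be replayed in a few
    lines. The helper 'offset_as_u64' is hypothetical; it only mirrors the
    formula "offset = instr_count - instrs_over - 1" and shows what an
    unsigned 64-bit receiver sees.

```c
#include <stdint.h>

/* Compute the offset with signed math, then view it as an unsigned u64. */
uint64_t offset_as_u64(int64_t instr_count, int64_t instrs_over)
{
	int64_t offset = instr_count - instrs_over - 1;

	return (uint64_t)offset;
}
```

    For the first packet (10, 6) this yields 3; for the second packet
    (10, 12) the signed result -3 becomes 2^64 - 3 once converted.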

    The second issue is that only 2 samples are synthesized for sample
    period = 4. In theory, every packet has 10 instructions, so the two
    packets have 20 instructions in total, which should generate 5 samples
    (4 x 5 = 20). This is because cs_etm__sample() calls
    cs_etm__synth_instruction_sample() only once per range packet.

    This patch fixes the logic in function cs_etm__sample(); the basic
    idea for handling an incoming packet is:

    - To synthesize the first instruction sample, combine the instructions
    left over from the previous packet with the head of the new packet;
    then generate continuous samples at the sample period;
    - At the tail of the new packet, any remaining instructions are
    left for the next sample.
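
    The counting side of this idea can be sketched as below. The function
    'samples_for_packet' is hypothetical and leaves out address
    calculation entirely; it only shows how carrying the leftover
    instructions between packets yields the expected number of samples.
    It assumes a non-zero period.

```c
/*
 * Return how many samples a packet produces for the given period.
 * '*leftover' holds instructions carried over from the previous packet
 * and is updated with the instructions left at the tail of this one.
 * 'period' must be non-zero.
 */
unsigned int samples_for_packet(unsigned int *leftover,
				unsigned int instr_count,
				unsigned int period)
{
	unsigned int total = *leftover + instr_count;
	unsigned int nr_samples = total / period;

	*leftover = total % period;
	return nr_samples;
}
```

    With a period of 4 and two packets of 10 instructions each, the first
    call gives 2 samples and leaves 2 instructions, and the second call
    gives 3 samples and leaves 0, for 5 samples in total.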

    Suggested-by: Mike Leach
    Signed-off-by: Leo Yan
    Reviewed-by: Mathieu Poirier
    Reviewed-by: Mike Leach
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Robert Walker
    Cc: Suzuki Poulouse
    Cc: coresight ml
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Leo Yan
     
  • [ Upstream commit d01751563caf0dec7be36f81de77cc0197b77e59 ]

    If the option '--itrace=iNNN' is used with Arm CoreSight trace data, the
    perf tool fails to inject instruction samples; the root cause is that
    packets are only swapped for branch samples and last branches but not
    for instruction samples, so newly arriving packets cannot be properly
    handled when only instruction samples are synthesized.

    To fix this issue, this patch refactors the code with a new function
    cs_etm__packet_swap() which is used to swap packets and adds the
    condition for instruction samples.

    Signed-off-by: Leo Yan
    Reviewed-by: Mathieu Poirier
    Reviewed-by: Mike Leach
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Robert Walker
    Cc: Suzuki Poulouse
    Cc: coresight ml
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Leo Yan
     
  • [ Upstream commit 3f5777fbaf04c58d940526a22a2e0c813c837936 ]

    The memory for global pointer is never freed during normal program
    execution, so let's do that in the main function exit as a good
    programming practice.

    A stray blank line is also removed.

    Reported-by: Jiri Olsa
    Signed-off-by: John Garry
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: James Clark
    Cc: Joakim Zhang
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: linuxarm@huawei.com
    Link: http://lore.kernel.org/lkml/1583406486-154841-2-git-send-email-john.garry@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    John Garry
     
  • [ Upstream commit 2bbc83537614517730e9f2811195004b712de207 ]

    This test places a kprobe on the function getname_flags() in the kernel,
    which has the following prototype:

    struct filename *getname_flags(const char __user *filename, int flags, int *empty)

    The 'filename' argument points to a filename located in user space memory.

    Looking at commit 88903c464321c ("tracing/probe: Add ustring type for
    user-space string") the kprobe should indicate that user space memory is
    accessed.

    Output before:

    [root@m35lp76 perf]# ./perf test 66 67
    66: Use vfs_getname probe to get syscall args filenames : FAILED!
    67: Check open filename arg using perf trace + vfs_getname: FAILED!
    [root@m35lp76 perf]#

    Output after:

    [root@m35lp76 perf]# ./perf test 66 67
    66: Use vfs_getname probe to get syscall args filenames : Ok
    67: Check open filename arg using perf trace + vfs_getname: Ok
    [root@m35lp76 perf]#

    Comments from Masami Hiramatsu:

    This bug doesn't happen on x86 or other archs on which user and kernel
    addresses live in one shared address space. On some arches (ppc64 in
    this case?) the user address range partially or completely overlaps
    the kernel address range.

    (Yes, they switch the world when running into the kernel) In this case,
    we need to use different data access functions for each space.

    That is why I introduced the "ustring" type for kprobe events.

    As far as I can see, Thomas's patch is sane. Thomas, could you show us
    your result on your test environment?

    Comments from Thomas Richter:

    Test results for s/390 included above.

    Signed-off-by: Thomas Richter
    Acked-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Heiko Carstens
    Cc: Sumanth Korikkar
    Cc: Vasily Gorbik
    Link: http://lore.kernel.org/lkml/20200217102111.61137-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Thomas Richter
     

23 Sep, 2020

5 commits

  • [ Upstream commit d26383dcb2b4b8629fde05270b4e3633be9e3d4b ]

    The following leaks were detected by ASAN:

    Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

    Fixes: cff7f956ec4a1 ("perf tests: Move pmu tests into separate object")
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Ian Rogers
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Namhyung Kim
     
  • [ Upstream commit b12eea5ad8e77f8a380a141e3db67c07432dde16 ]

    The evsel->unit borrows a pointer from a pmu event or alias instead of
    owning a string. But the tool event (duration_time) passes the result
    of strdup(), which causes a leak.

    It was found by ASAN during metric test:

    Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

    Fixes: f0fbb114e3025 ("perf stat: Implement duration_time as a proper event")
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Ian Rogers
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Namhyung Kim
     
  • [ Upstream commit bfd1b83d75e44a9f65de30accb3dd3b5940bd3ac ]

    Asan reported a leak of the cpu and thread maps, as they hold one more
    refcount than is released. I found that after setting the evlist maps
    it should release its refcount.
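
    The reference counting involved can be illustrated with a toy model.
    All names here ('demo_map', 'demo_evlist' and the helpers) are
    hypothetical and much simpler than the perf structures; a real put
    function would also free the object when the count reaches zero.

```c
/* Toy refcounted object and a holder that keeps a reference to it. */
struct demo_map {
	int refcnt;
};

struct demo_evlist {
	struct demo_map *cpus;
};

/* Take one reference. */
struct demo_map *demo_map_get(struct demo_map *map)
{
	if (map)
		map->refcnt++;
	return map;
}

/* Drop one reference (a real implementation frees at zero). */
void demo_map_put(struct demo_map *map)
{
	if (map)
		map->refcnt--;
}

/* The holder takes its own reference, so the caller must drop its own. */
void demo_evlist_set_maps(struct demo_evlist *evlist, struct demo_map *cpus)
{
	demo_map_put(evlist->cpus);
	evlist->cpus = demo_map_get(cpus);
}
```

    In this model a creator starts with a count of 1, setting the maps
    raises it to 2, and the count only returns to 0 if the creator drops
    its reference in addition to the holder dropping its own.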

    It seems to have been broken from the beginning, so I chose the original
    commit as the culprit. But I'm not sure how it applies to stable trees,
    since there are many changes in the code after that.

    Fixes: 7e2ed097538c5 ("perf evlist: Store pointer to the cpu and thread maps")
    Fixes: 4112eb1899c0e ("perf evlist: Default to syswide target when no thread/cpu maps set")
    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Ian Rogers
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20200915031819.386559-4-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Namhyung Kim
     
  • [ Upstream commit 8a39e8c4d9baf65d88f66d49ac684df381e30055 ]

    When compiling with DEBUG=1 on Fedora 32 I'm getting a crash for 'perf
    test signal':

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000c68548 in __test_function ()
    (gdb) bt
    #0 0x0000000000c68548 in __test_function ()
    #1 0x00000000004d62e9 in test_function () at tests/bp_signal.c:61
    #2 0x00000000004d689a in test__bp_signal (test=0xa8e280 DW_AT_producer : (indirect string, offset: 0x254a): GNU C99 10.2.1 20200723 (Red Hat 10.2.1-1) -mtune=generic -march=x86-64 -ggdb3 -std=gnu99 -fno-omit-frame-pointer -funwind-tables -fstack-protector-all
    ^^^^^
    ^^^^^
    ^^^^^
    $

    Before:

    $ perf test signal
    20: Breakpoint overflow signal handler : FAILED!
    $

    After:

    $ perf test signal
    20: Breakpoint overflow signal handler : Ok
    $

    Fixes: 8fd34e1cce18 ("perf test: Improve bp_signal")
    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Wang Nan
    Link: http://lore.kernel.org/lkml/20200911130005.1842138-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jiri Olsa
     
  • The DDR bit width on the i.MX8MN EVK board is 16 bits, not 32 bits.

    Reviewed-by: Fugang Duan
    Signed-off-by: Joakim Zhang

    Joakim Zhang
     

10 Sep, 2020

2 commits

  • commit a060c1f12b525ba828f871eff3127dabf8daa1e6 upstream.

    The help info of option "--no-bpf-event" is wrongly described as "record
    bpf events", correct it.

    Committer testing:

    $ perf record -h bpf

    Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    --clang-opt
    options passed to clang when compiling BPF scriptlets
    --clang-path
    clang binary to use for compiling BPF scriptlets
    --no-bpf-event do not record bpf events

    $

    Fixes: 71184c6ab7e6 ("perf record: Replace option --bpf-event with --no-bpf-event")
    Signed-off-by: Wei Li
    Acked-by: Song Liu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Hanjun Guo
    Cc: Jiri Olsa
    Cc: Li Bin
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/20200819031947.12115-1-liwei391@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Wei Li
     
  • [ Upstream commit e62458e3940eb3dfb009481850e140fbee183b04 ]

    The new string should have enough space for the original string and the
    backslashes, IMHO.
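
The sizing rule described above can be sketched as follows. This is a minimal illustration, not the perf source: escape_backslashes() is a hypothetical helper, and it assumes that each backslash in the input gains exactly one extra escaping byte.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the perf source): return a newly allocated copy
 * of 'in' with a backslash inserted before every backslash.
 * The caller frees the result.
 */
char *escape_backslashes(const char *in)
{
	size_t len = strlen(in);
	size_t extra = 0;
	char *out, *p;

	/* Count the characters that will need one extra escaping byte. */
	for (size_t i = 0; i < len; i++)
		if (in[i] == '\\')
			extra++;

	/* Original length + one byte per escape + the terminating NUL. */
	out = malloc(len + extra + 1);
	if (!out)
		return NULL;

	p = out;
	for (size_t i = 0; i < len; i++) {
		if (in[i] == '\\')
			*p++ = '\\';
		*p++ = in[i];
	}
	*p = '\0';
	return out;
}
```

Sizing the buffer from the original length alone would overflow it as soon as the input contains a single backslash.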

    Fixes: fbc2844e84038ce3 ("perf vendor events: Use more flexible pattern matching for CPU identification for mapfile.csv")
    Signed-off-by: Namhyung Kim
    Reviewed-by: Ian Rogers
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: John Garry
    Cc: Kajol Jain
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: William Cohen
    Link: http://lore.kernel.org/lkml/20200903152510.489233-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Namhyung Kim
     

05 Sep, 2020

1 commit

  • commit e48a73a312ebf19cc3d72aa74985db25c30757c1 upstream.

    Event modifiers are not mentioned in the perf record or perf stat
    manpages. Add them to orient new users more effectively by pointing
    them to the perf list manpage for details.

    Fixes: 2055fdaf8703 ("perf list: Document precise event sampling for AMD IBS")
    Signed-off-by: Kim Phillips
    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Alexey Budankov
    Cc: Ian Rogers
    Cc: Jin Yao
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Paul Clarke
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tony Jones
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200901215853.276234-1-kim.phillips@amd.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Kim Phillips
     

02 Sep, 2020

5 commits

  • Add bandwidth usage metric for i.MX8QM DDR Perf.

    Test Report:
    ------------------------------------------------------
    root@imx8qmmek:~# ./perf list metric

    List of pre-defined events (to be used in -e):

    Metrics:

    imx8qm-ddr0-all-r
    [imx8qm: bytes of all masters read from ddr0]
    imx8qm-ddr0-all-w
    [imx8qm: bytes of all masters write to ddr0]
    imx8qm-ddr0-bandwidth-usage
    [imx8qm: percentage of bandwidth usage for ddr0]
    imx8qm-ddr1-all-r
    [imx8qm: bytes of all masters read from ddr1]
    imx8qm-ddr1-all-w
    [imx8qm: bytes of all masters write to ddr1]
    imx8qm-ddr1-bandwidth-usage
    [imx8qm: percentage of bandwidth usage for ddr1]
    ------------------------------------------------------
    root@imx8qmmek:~# ./perf stat -a -I 1000 -M imx8qm-ddr0-bandwidth-usage dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000137560 8403160 imx8_ddr0/read-cycles/ # 11.9 % imx8qm-ddr0-bandwidth-usage
    1.000137560 86499449 imx8_ddr0/write-cycles/
    1.000137560 1000137560 ns duration_time
    2.000542875 8818984 imx8_ddr0/read-cycles/ # 10.5 % imx8qm-ddr0-bandwidth-usage
    2.000542875 74883499 imx8_ddr0/write-cycles/
    2.000542875 1000405315 ns duration_time
    3.000839188 8604400 imx8_ddr0/read-cycles/ # 9.6 % imx8qm-ddr0-bandwidth-usage
    3.000839188 68284175 imx8_ddr0/write-cycles/
    3.000839188 1000296313 ns duration_time
    --------------------------------------------------------
    root@imx8qmmek:~# ./perf stat -a -I 1000 -M imx8qm-ddr1-bandwidth-usage dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000129435 15152856 imx8_ddr1/read-cycles/ # 14.5 % imx8qm-ddr1-bandwidth-usage
    1.000129435 100669236 imx8_ddr1/write-cycles/
    1.000129435 1000129435 ns duration_time
    2.000521875 15463356 imx8_ddr1/read-cycles/ # 13.4 % imx8qm-ddr1-bandwidth-usage
    2.000521875 91710077 imx8_ddr1/write-cycles/
    2.000521875 1000392440 ns duration_time
    3.000794688 15773560 imx8_ddr1/read-cycles/ # 12.7 % imx8qm-ddr1-bandwidth-usage
    3.000794688 85948507 imx8_ddr1/write-cycles/
    3.000794688 1000272813 ns duration_time

    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add bandwidth usage metric for i.MX8QXP DDR Perf.

    Test Report:
    ----------------------------------------------------
    root@imx8qxpmek:~# ./perf list metric

    List of pre-defined events (to be used in -e):

    Metrics:

    imx8qxp-ddr0-all-r
    [imx8qxp: bytes of all masters read from ddr0]
    imx8qxp-ddr0-all-w
    [imx8qxp: bytes of all masters write to ddr0]
    imx8qxp-ddr0-bandwidth-usage
    [imx8qxp: percentage of bandwidth usage for ddr0]
    --------------------------------------------------------
    root@imx8qxpmek:~# ./perf stat -a -I 1000 -M imx8qxp-ddr0-bandwidth-usage dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000170750 681608 imx8_ddr0/read-cycles/ # 13.6 % imx8qxp-ddr0-bandwidth-usage
    1.000170750 81013320 imx8_ddr0/write-cycles/
    1.000170750 1000170750 ns duration_time
    2.000833375 592804 imx8_ddr0/read-cycles/ # 13.7 % imx8qxp-ddr0-bandwidth-usage
    2.000833375 81756288 imx8_ddr0/write-cycles/
    2.000833375 1000662625 ns duration_time
    3.001393875 611804 imx8_ddr0/read-cycles/ # 13.6 % imx8qxp-ddr0-bandwidth-usage
    3.001393875 80897346 imx8_ddr0/write-cycles/
    3.001393875 1000560500 ns duration_time
    4.001917375 600564 imx8_ddr0/read-cycles/ # 13.5 % imx8qxp-ddr0-bandwidth-usage
    4.001917375 80269884 imx8_ddr0/write-cycles/
    4.001917375 1000523500 ns duration_time

    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add bandwidth usage metric for i.MX8MM DDR Perf.

    Test Report:
    -------------------------------------------------------
    root@imx8mmevk:~# ./perf list metric

    List of pre-defined events (to be used in -e):

    Metrics:

    imx8mm-ddr0-2d-r
    [imx8mm: bursts of gpu 2d read from ddr0]
    imx8mm-ddr0-2d-w
    [imx8mm: bursts of gpu 2d write to ddr0]
    imx8mm-ddr0-3d-r
    [imx8mm: bursts of gpu 3d read from ddr0]
    imx8mm-ddr0-3d-w
    [imx8mm: bursts of gpu 3d write to ddr0]
    imx8mm-ddr0-a53-r
    [imx8mm: bursts of a53 core read from ddr0]
    imx8mm-ddr0-a53-w
    [imx8mm: bursts of a53 core write to ddr0]
    imx8mm-ddr0-all-r
    [imx8mm: bytes of all masters read from ddr0]
    imx8mm-ddr0-all-w
    [imx8mm: bytes of all masters write to ddr0]
    ---------------------------------------------------------
    root@imx8mmevk:~# ./perf stat -a -I 1000 -M imx8mm-ddr0-bandwidth-usage-lpddr4 dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000127125 324072 imx8_ddr0/read-cycles/ # 33.4 % imx8mm-ddr0-bandwidth-usage-lpddr4
    1.000127125 250417562 imx8_ddr0/write-cycles/
    1.000127125 1000127125 ns duration_time
    2.001282750 293964 imx8_ddr0/read-cycles/ # 33.9 % imx8mm-ddr0-bandwidth-usage-lpddr4
    2.001282750 254176749 imx8_ddr0/write-cycles/
    2.001282750 1001155625 ns duration_time
    3.002299500 234264 imx8_ddr0/read-cycles/ # 33.0 % imx8mm-ddr0-bandwidth-usage-lpddr4
    3.002299500 247474957 imx8_ddr0/write-cycles/
    3.002299500 1001016750 ns duration_time
    4.003355875 202304 imx8_ddr0/read-cycles/ # 32.8 % imx8mm-ddr0-bandwidth-usage-lpddr4
    4.003355875 245469156 imx8_ddr0/write-cycles/
    4.003355875 1001056375 ns duration_time

    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add JSON file for i.MX8MQ DDR Perf.

    Test Report:
    -------------------------------------------------------------
    root@imx8mqevk:~# ./perf list metric

    List of pre-defined events (to be used in -e):

    Metrics:

    imx8mq-ddr0-all-r
    [imx8mq: bytes of all masters read from ddr0]
    imx8mq-ddr0-all-w
    [imx8mq: bytes of all masters write to ddr0]
    imx8mq-ddr0-bandwidth-usage
    [imx8mq: percentage of bandwidth usage for ddr0]
    ------------------------------------------------------------
    root@imx8mqevk:~# ./perf stat -a -I 1000 -M imx8mq-ddr0-all-r,imx8mq-ddr0-all-w
    1.001143121 34224 imx8_ddr0/read-cycles/ # 547584.0 imx8mq-ddr0-all-r
    1.001143121 10805 imx8_ddr0/write-cycles/ # 172880.0 imx8mq-ddr0-all-w
    2.003035881 31656 imx8_ddr0/read-cycles/ # 506496.0 imx8mq-ddr0-all-r
    2.003035881 7585 imx8_ddr0/write-cycles/ # 121360.0 imx8mq-ddr0-all-w
    3.004305241 19864 imx8_ddr0/read-cycles/ # 317824.0 imx8mq-ddr0-all-r
    3.004305241 1483 imx8_ddr0/write-cycles/ # 23728.0 imx8mq-ddr0-all-w
    ------------------------------------------------------------
    root@imx8mqevk:~# ./perf stat -a -I 1000 -M imx8mq-ddr0-bandwidth-usage dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000643080 126560 imx8_ddr0/read-cycles/ # 11.6 % imx8mq-ddr0-bandwidth-usage
    1.000643080 92714082 imx8_ddr0/write-cycles/
    1.000643080 1000643080 ns duration_time
    2.002052721 82056 imx8_ddr0/read-cycles/ # 9.4 % imx8mq-ddr0-bandwidth-usage
    2.002052721 75279735 imx8_ddr0/write-cycles/
    2.002052721 1001409641 ns duration_time
    3.003379081 85448 imx8_ddr0/read-cycles/ # 9.3 % imx8mq-ddr0-bandwidth-usage
    3.003379081 74199950 imx8_ddr0/write-cycles/
    3.003379081 1001326360 ns duration_time
    4.004734241 91084 imx8_ddr0/read-cycles/ # 9.5 % imx8mq-ddr0-bandwidth-usage
    4.004734241 75513082 imx8_ddr0/write-cycles/
    4.004734241 1001355160 ns duration_time
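
In the imx8mq-ddr0-all-r and imx8mq-ddr0-all-w lines of the report above, each metric value equals the raw cycle count multiplied by 16 (for example 34224 x 16 = 547584). The sketch below only reproduces that arithmetic; the factor is read off the sample output, not taken from the JSON metric definition, and ddr_bytes() is a hypothetical name.

```c
#include <stdint.h>

/* Factor inferred from the sample output above; an assumption, not the JSON. */
#define BYTES_PER_CYCLE 16u

/* Convert a read-cycles or write-cycles count into the reported byte count. */
uint64_t ddr_bytes(uint64_t cycles)
{
	return cycles * BYTES_PER_CYCLE;
}
```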

    Signed-off-by: Joakim Zhang

    Joakim Zhang
     
  • Add JSON file for i.MX8MN DDR Perf

    Test Report:
    ---------------------------------------------------------------
    root@imx8mnevk:~# ./perf list metric

    List of pre-defined events (to be used in -e):

    Metrics:

    imx8mn-ddr0-all-r
    [imx8mn: bytes of all masters read from ddr0]
    imx8mn-ddr0-all-w
    [imx8mn: bytes of all masters write to ddr0]
    imx8mn-ddr0-bandwidth-usage-ddr4
    [imx8mn: percentage of bandwidth usage for ddr0]
    imx8mn-ddr0-bandwidth-usage-lpddr4
    [imx8mn: percentage of bandwidth usage for ddr0]
    ------------------------------------------------------------------
    root@imx8mnevk:~# ./perf stat -a -I 1000 -M imx8mn-ddr0-all-r,imx8mn-ddr0-all-w
    1.000469875 108120 imx8_ddr0/read-cycles/ # 1729920.0 imx8mn-ddr0-all-r
    1.000469875 28841 imx8_ddr0/write-cycles/ # 461456.0 imx8mn-ddr0-all-w
    2.001191750 37396 imx8_ddr0/read-cycles/ # 598336.0 imx8mn-ddr0-all-r
    2.001191750 6090 imx8_ddr0/write-cycles/ # 97440.0 imx8mn-ddr0-all-w
    ------------------------------------------------------------------
    root@imx8mnevk:~# ./perf stat -a -I 1000 -M imx8mn-ddr0-bandwidth-usage-lpddr4 dd if=/dev/zero of=/dev/null bs=1M count=1000000
    1.000762250 840456 imx8_ddr0/read-cycles/ # 48.9 % imx8mn-ddr0-bandwidth-usage-lpddr4
    1.000762250 390024176 imx8_ddr0/write-cycles/
    1.000762250 1000762250 ns duration_time
    2.001982125 592944 imx8_ddr0/read-cycles/ # 48.5 % imx8mn-ddr0-bandwidth-usage-lpddr4
    2.001982125 387366923 imx8_ddr0/write-cycles/
    2.001982125 1001219875 ns duration_time
    3.003123250 542650 imx8_ddr0/read-cycles/ # 48.4 % imx8mn-ddr0-bandwidth-usage-lpddr4
    3.003123250 386631603 imx8_ddr0/write-cycles/
    3.003123250 1001141125 ns duration_time
    4.004289875 538522 imx8_ddr0/read-cycles/ # 48.4 % imx8mn-ddr0-bandwidth-usage-lpddr4
    4.004289875 386577020 imx8_ddr0/write-cycles/
    4.004289875 1001166625 ns duration_time
    5.005546750 515596 imx8_ddr0/read-cycles/ # 48.4 % imx8mn-ddr0-bandwidth-usage-lpddr4
    5.005546750 386800889 imx8_ddr0/write-cycles/
    5.005546750 1001256875 ns duration_time

    Signed-off-by: Joakim Zhang

    Joakim Zhang
     

26 Aug, 2020

1 commit

  • [ Upstream commit 12d572e785b15bc764e956caaa8a4c846fd15694 ]

    Fix the memory leakage in debuginfo__find_trace_events() when the probe
    point is not found in the debuginfo. If there is no probe point found in
    the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.

    Thus the caller of debuginfo__find_probes() must check the tf.ntevs and
    release the allocated memory for the array of struct probe_trace_event.

    The current code releases the memory only if debuginfo__find_probes()
    hits an error, but does not check tf.ntevs. As a result, the memory
    allocated for *tevs is not released if tf.ntevs == 0.

    This fixes the memory leakage by checking tf.ntevs == 0 in addition to
    ret < 0.
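
The pattern described above can be sketched as follows. This is a simplified stand-in, not the perf code: struct event, search() and find_events() are hypothetical names, with search() playing the role of a finder that returns 0 even when nothing matched.

```c
#include <errno.h>
#include <stdlib.h>

struct event { int addr; };	/* stand-in for struct probe_trace_event */

/*
 * Stand-in for the search step: returns a negative errno on failure and 0
 * otherwise, even when nothing matched. *nfound is the number of entries
 * filled in.
 */
static int search(struct event *evs, int max, int matches, int *nfound)
{
	if (matches < 0)
		return -EINVAL;
	*nfound = matches < max ? matches : max;
	for (int i = 0; i < *nfound; i++)
		evs[i].addr = i;
	return 0;
}

/*
 * Returns the number of events found, 0 if none, or a negative errno.
 * *out is non-NULL (and owned by the caller) only when the result is > 0.
 */
int find_events(struct event **out, int max, int matches)
{
	int nfound = 0;
	int ret;

	*out = calloc(max, sizeof(**out));
	if (!*out)
		return -ENOMEM;

	ret = search(*out, max, matches, &nfound);
	/* Release the array on error AND when the search found nothing. */
	if (ret < 0 || nfound == 0) {
		free(*out);
		*out = NULL;
	}
	return ret < 0 ? ret : nfound;
}
```

Checking only ret < 0 would leave the array allocated in the "nothing found" case, which is the leak the commit message describes.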

    Fixes: ff741783506c ("perf probe: Introduce debuginfo to encapsulate dwarf information")
    Signed-off-by: Masami Hiramatsu
    Reviewed-by: Srikar Dronamraju
    Cc: Andi Kleen
    Cc: Oleg Nesterov
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     

21 Aug, 2020

3 commits

  • [ Upstream commit 1beaef29c34154ccdcb3f1ae557f6883eda18840 ]

    For memcpy, the source pages are memset to zero only when --cycles is
    used. This leads to wildly different results with or without --cycles,
    since all source pages are likely to be mapped to the same zero page
    without explicit writes.
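
A minimal sketch of the idea, assuming a simplified benchmark loop rather than the actual perf bench code (bench_memcpy() is a hypothetical name): the source buffer is written unconditionally before the copies, so its pages are populated regardless of which options were passed.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical benchmark helper: copy 'size' bytes 'loops' times.
 * Returns 0 on success, -1 on allocation failure.
 */
int bench_memcpy(size_t size, int loops)
{
	void *src = malloc(size);
	void *dst = malloc(size);
	int ret = -1;

	if (!src || !dst)
		goto out;

	/*
	 * Write to the source unconditionally so that every source page is
	 * explicitly touched before it is read.
	 */
	memset(src, 0, size);

	for (int i = 0; i < loops; i++)
		memcpy(dst, src, size);
	ret = 0;
out:
	free(src);
	free(dst);
	return ret;
}
```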

    Before this fix:

    $ export cmd="./perf stat -e LLC-loads -- ./perf bench \
    mem memcpy -s 1024MB -l 100 -f default"
    $ $cmd

    2,935,826 LLC-loads
    3.821677452 seconds time elapsed

    $ $cmd --cycles

    217,533,436 LLC-loads
    8.616725985 seconds time elapsed

    After this fix:

    $ $cmd

    214,459,686 LLC-loads
    8.674301124 seconds time elapsed

    $ $cmd --cycles

    214,758,651 LLC-loads
    8.644480006 seconds time elapsed

    Fixes: 47b5757bac03c338 ("perf bench mem: Move boilerplate memory allocation to the infrastructure")
    Signed-off-by: Vincent Whitchurch
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: kernel@axis.com
    Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Vincent Whitchurch
     
  • commit a58a057ce65b52125dd355b7d8b0d540ea267a5f upstream.

    CBR events can result in a duplicate branch event, because the state
    type defaults to a branch. Fix by clearing the state type.

    Example: trace 'sleep' and hope for a frequency change

    Before:

    $ perf record -e intel_pt//u sleep 0.1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.034 MB perf.data ]
    $ perf script --itrace=bpe > before.txt

    After:

    $ perf script --itrace=bpe > after.txt
    $ diff -u before.txt after.txt
    # --- before.txt 2020-07-07 14:42:18.191508098 +0300
    # +++ after.txt 2020-07-07 14:42:36.587891753 +0300
    @@ -29673,7 +29673,6 @@
    sleep 93431 [007] 15411.619905: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb2e0 clock_nanosleep@@GLIBC_2.17+0x0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
    sleep 93431 [007] 15411.619905: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown])
    sleep 93431 [007] 15411.720069: cbr: cbr: 15 freq: 1507 MHz ( 56%) 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
    - sleep 93431 [007] 15411.720069: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown])
    sleep 93431 [007] 15411.720076: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb30e clock_nanosleep@@GLIBC_2.17+0x2e (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
    sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818abb323 clock_nanosleep@@GLIBC_2.17+0x43 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 7f0818ac0eb7 __nanosleep+0x17 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
    sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818ac0ebf __nanosleep+0x1f (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 55cb7e4c2827 rpl_nanosleep+0x97 (/usr/bin/sleep)

    Fixes: 91de8684f1cff ("perf intel-pt: Cater for CBR change in PSB+")
    Fixes: abe5a1d3e4bee ("perf intel-pt: Decoder to output CBR changes immediately")
    Signed-off-by: Adrian Hunter
    Reviewed-by: Andi Kleen
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200710151104.15137-3-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit 401136bb084fd021acd9f8c51b52fe0a25e326b2 upstream.

    While walking code towards a FUP ip, the packet state is
    INTEL_PT_STATE_FUP or INTEL_PT_STATE_FUP_NO_TIP. That was mishandled
    resulting in the state becoming INTEL_PT_STATE_IN_SYNC prematurely. The
    result was an occasional lost EXSTOP event.

    Signed-off-by: Adrian Hunter
    Reviewed-by: Andi Kleen
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200710151104.15137-2-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     

05 Aug, 2020

4 commits

  • commit e4d9b04b973b2dbce7b42af95ea70d07da1c936d upstream.

    Noticed with gcc 10 (fedora rawhide) that those variables were not being
    declared as static, so we end up with:

    ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
    make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1

    Prefix those with bench__ and add them to bench/bench.h, so that we can
    share them among the tools that need to access those variables from
    signal handlers.
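
The declare-once, define-once layout described above can be sketched in a single file as follows. The comments mark where each part would live; bench__record_runtime() is a hypothetical helper added only to show a use of the shared variables.

```c
#include <stddef.h>
#include <sys/time.h>

/* bench.h: declarations only, so many .c files can include it safely. */
extern struct timeval bench__start, bench__end, bench__runtime;

/* Exactly one .c file: the single definition the linker will see. */
struct timeval bench__start, bench__end, bench__runtime;

/* Any benchmark: use the shared variables through the declarations. */
void bench__record_runtime(void)
{
	gettimeofday(&bench__end, NULL);
	bench__runtime.tv_sec = bench__end.tv_sec - bench__start.tv_sec;
	bench__runtime.tv_usec = bench__end.tv_usec - bench__start.tv_usec;
	if (bench__runtime.tv_usec < 0) {
		/* Borrow one second when the microsecond part underflows. */
		bench__runtime.tv_sec--;
		bench__runtime.tv_usec += 1000000;
	}
}
```

With non-static definitions of the same name in several files, a toolchain that does not merge such tentative definitions reports the "multiple definition" errors quoted above.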

    Acked-by: Thomas Gleixner
    Cc: Adrian Hunter
    Cc: Davidlohr Bueso
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Arnaldo Carvalho de Melo
     
  • commit ebcb9464a2ae3a547e97de476575c82ece0e93e2 upstream.

    It is possible to return a pointer to a local variable when looking up
    the architecture name for the running system and no normalization is
    done on that value, i.e. we may end up returning the uts.machine local
    variable.

    While this doesn't happen on most arches, as normalization takes place,
    let's fix this by making that a static variable and optimize it a bit by
    not always running uname(), only the first time.
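
A minimal sketch of the approach, assuming a simplified lookup with no normalization step (running_arch() is a hypothetical name, not the perf function): the utsname buffer is static, so the returned pointer remains valid after the function returns, and uname() runs only on the first call.

```c
#include <stddef.h>
#include <sys/utsname.h>

/*
 * Hypothetical lookup: return the machine name of the running system,
 * or NULL if uname() fails.
 */
const char *running_arch(void)
{
	static struct utsname uts;	/* outlives the call, unlike a local */
	static int cached;

	if (!cached) {
		if (uname(&uts) < 0)
			return NULL;
		cached = 1;
	}
	return uts.machine;
}
```

Had uts been an ordinary local variable, returning uts.machine would hand the caller a pointer into a stack frame that no longer exists.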

    Noticed in fedora rawhide running with:

    [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
    gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)

    Reported-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Arnaldo Carvalho de Melo
     
  • commit cff20b3151ccab690715cb6cf0f5da5cccb32adf upstream.

    Fix the build with newer gcc versions, which without this patch fails with:

    LD /tmp/build/perf/tests/perf-in.o
    ld: /tmp/build/perf/tests/bp_account.o:/git/perf/tools/perf/tests/bp_account.c:22: multiple definition of `the_var'; /tmp/build/perf/tests/bp_signal.o:/git/perf/tools/perf/tests/bp_signal.c:38: first defined here
    make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/tests/perf-in.o] Error 1

    First noticed in fedora:rawhide/32 with:

    [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
    gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)

    Reported-by: Jiri Olsa
    Cc: Adrian Hunter
    Cc: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit bd3c628f8fafa6cbd6a1ca440034b841f0080160 ]

    When recording with cache-misses and an arm_spe_x event, I found that it
    just fails without showing any error info if I put cache-misses
    after the 'arm_spe_x' event.

    [root@localhost 0620]# perf record -e cache-misses \
    -e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.067 MB perf.data ]
    [root@localhost 0620]#
    [root@localhost 0620]# perf record -e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ \
    -e cache-misses sleep 1
    [root@localhost 0620]#

    The current code can only work if the only event to be traced is an
    'arm_spe_x', or if it is the last event to be specified. Otherwise the
    last event type will be checked against all the arm_spe_pmus[i]->types,
    none will match and an out of bound 'i' index will be used in
    arm_spe_recording_init().

    We don't currently support multiple concurrent arm_spe_x events; that
    is checked in arm_spe_recording_options(), which shows the relevant
    info. So add the check and record the first 'arm_spe_pmu' found to
    fix this issue here.
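
The lookup described above can be sketched as follows, using simplified stand-in types rather than the perf structures (struct pmu, struct event and find_first_pmu() are hypothetical): the matching PMU is recorded at the moment it is found, so no loop index is used after the loops have finished.

```c
#include <stddef.h>

struct pmu { unsigned int type; };	/* stand-in for a PMU description */
struct event { unsigned int type; };	/* stand-in for a requested event */

/*
 * Return the first PMU whose type matches any requested event, or NULL
 * if no event matches. Returning the match directly avoids indexing
 * pmus[] with a counter that has already run past the end of the array.
 */
struct pmu *find_first_pmu(struct pmu **pmus, size_t nr_pmus,
			   const struct event *events, size_t nr_events)
{
	for (size_t e = 0; e < nr_events; e++)
		for (size_t i = 0; i < nr_pmus; i++)
			if (events[e].type == pmus[i]->type)
				return pmus[i];
	return NULL;
}
```

The result no longer depends on where the matching event appears in the event list.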

    Fixes: ffd3d18c20b8 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support")
    Signed-off-by: Wei Li
    Reviewed-by: Mathieu Poirier
    Tested-by: Leo Yan
    Cc: Alexander Shishkin
    Cc: Hanjun Guo
    Cc: Jiri Olsa
    Cc: Kim Phillips
    Cc: Mark Rutland
    Cc: Mike Leach
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Suzuki Poulouse
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/20200724071111.35593-2-liwei391@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Wei Li