13 Jun, 2019

1 commit

  • Add DDR performance monitor support for iMX8QXP. The PMU consists of 3
    programmable event counters and a single dedicated cycle counter.

    Example usage:

    $ perf stat -a -e \
    imx8_ddr0/read-cycles/,imx8_ddr0/write-cycles/,imx8_ddr0/precharge/ ls

    - or -

    $ perf stat -a -e \
    imx8_ddr0/cycles/,imx8_ddr0/read-access/,imx8_ddr0/write-access/ ls

    Other events are supported, and advertised via perf list.
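
    A minimal sketch of how the two example event lists relate to the
    counter layout described above. It assumes that "cycles" is served by
    the dedicated cycle counter and that every other event needs one of the
    3 programmable counters; the helper names are hypothetical and not part
    of the driver or of perf.

```python
# Assumption: "cycles" uses the dedicated cycle counter; any other event
# occupies one of the 3 programmable counters.
PROGRAMMABLE_COUNTERS = 3


def fits_in_pmu(events):
    """Return True if the events can all be counted at the same time."""
    programmable = [e for e in events if e != "cycles"]
    return events.count("cycles") <= 1 and len(programmable) <= PROGRAMMABLE_COUNTERS


def perf_event_list(pmu, events):
    """Build the comma-separated argument passed to `perf stat -e`."""
    return ",".join(f"{pmu}/{e}/" for e in events)


first = ["read-cycles", "write-cycles", "precharge"]
second = ["cycles", "read-access", "write-access"]
for group in (first, second):
    print(fits_in_pmu(group), perf_event_list("imx8_ddr0", group))
```

    Both example groups fit under that assumption: the first uses all three
    programmable counters, the second uses the cycle counter plus two
    programmable counters.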

    Reviewed-by: Andrey Smirnov
    Signed-off-by: Frank Li
    [will: rewrote commit message/kconfig and used #defines for dev/cpuhp names]
    Signed-off-by: Will Deacon

    Frank Li
     

04 Apr, 2019

1 commit

  • Adds a new driver to support the SMMUv3 PMU and adds it to the
    perf events framework.

    Each SMMU node may have multiple PMUs associated with it, each of
    which may support different events.

    SMMUv3 PMCG devices are named smmuv3_pmcg_<phys_addr_page>, where
    <phys_addr_page> is the physical page address of the SMMU PMCG
    wrapped to a 4K boundary. For example, the PMCG at 0xff88840000 is
    named smmuv3_pmcg_ff88840.
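
    The naming rule can be sketched as follows. This is an illustration of
    the arithmetic only, assuming a 4K page means dropping the low 12
    address bits; pmcg_name is a hypothetical helper, not a driver function.

```python
def pmcg_name(phys_addr: int) -> str:
    """Return the PMU name for a PMCG located at phys_addr."""
    page = phys_addr >> 12  # drop the 12-bit offset within a 4K page
    return f"smmuv3_pmcg_{page:x}"


print(pmcg_name(0xFF88840000))
```

    Any address inside the same 4K page maps to the same name.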

    Filtering by stream id is done by specifying filtering parameters
    with the event. The options are:
    filter_enable - 0 = no filtering, 1 = filtering enabled
    filter_span - 0 = exact match, 1 = pattern match
    filter_stream_id - pattern to filter against

    Example: perf stat -e smmuv3_pmcg_ff88840/transaction,filter_enable=1,
    filter_span=1,filter_stream_id=0x42/ -a netperf

    Applies filter pattern 0x42 to transaction events, which means events
    matching stream ids 0x42 & 0x43 are counted as only upper StreamID
    bits are required to match the given filter. Further filtering
    information is available in the SMMU documentation.
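
    A rough model of the two filter modes, for illustration. The span
    encoding used here is an assumption to be checked against the SMMU
    documentation: the lowest zero bit of the filter value, and every bit
    below it, is ignored, and all bits above it must match. That reading is
    consistent with the 0x42 example above. The function name is
    hypothetical.

```python
def stream_id_matches(filter_id: int, stream_id: int, span: int) -> bool:
    """Model filter_span=0 (exact match) and filter_span=1 (pattern match)."""
    if filter_id < 0 or stream_id < 0:
        raise ValueError("stream ids must be non-negative")
    if span == 0:
        return filter_id == stream_id
    # Assumed encoding: find the lowest zero bit of the filter value.
    lowest_zero = 0
    while (filter_id >> lowest_zero) & 1:
        lowest_zero += 1
    # That bit and all bits below it are "don't care".
    ignore = (1 << (lowest_zero + 1)) - 1
    return (filter_id & ~ignore) == (stream_id & ~ignore)


matched = [hex(s) for s in range(0x40, 0x48) if stream_id_matches(0x42, s, span=1)]
print(matched)
```

    With filter_stream_id=0x42 the lowest zero bit is bit 0, so only bit 0
    is ignored and the loop reports 0x42 and 0x43.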

    SMMU events are not attributable to a CPU, so task mode and sampling
    are not supported.

    Signed-off-by: Neil Leeder
    Signed-off-by: Shameer Kolothum
    Reviewed-by: Robin Murphy
    [will: fold in review feedback from Robin]
    [will: rewrite Kconfig text and allow building as a module]
    Signed-off-by: Will Deacon

    Neil Leeder
     

06 Dec, 2018

1 commit

  • This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
    Controller (DMC) and Level 3 Cache (L3C). Each PMU supports up to 4
    counters. All counters lack an overflow interrupt and are
    sampled periodically.
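
    Without an overflow interrupt, a counter has to be read often enough
    that it wraps at most once between reads. A minimal sketch of that
    bookkeeping follows; the 32-bit width and the class name are
    assumptions for illustration and are not taken from the driver.

```python
class PolledCounter:
    """Accumulate a wrapping hardware counter from periodic raw reads."""

    def __init__(self, width_bits: int = 32):  # width is an assumption
        self.mask = (1 << width_bits) - 1
        self.prev = 0
        self.total = 0

    def sample(self, raw: int) -> int:
        # Modular subtraction yields the right delta across one wrap.
        delta = (raw - self.prev) & self.mask
        self.prev = raw & self.mask
        self.total += delta
        return self.total


counter = PolledCounter()
counter.sample(0xFFFFFFF0)       # first periodic read
print(hex(counter.sample(0x10)))  # second read, after the counter wrapped
```

    The sampling period bounds the error: if the counter wraps twice
    between reads, one full period of counts is lost.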

    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Ganapatrao Kulkarni
    [will: consistent enum cpuhp_state naming]
    Signed-off-by: Will Deacon

    Kulkarni, Ganapatrao
     

07 Mar, 2018

2 commits

  • The arm-cci driver is really two entirely separate drivers; one for MCPM
    port control and the other for the performance monitors. Since they are
    already relatively self-contained, let's take the plunge and move the
    PMU parts out to drivers/perf where they belong these days. For non-MCPM
    systems this leaves a small dependency on the remaining "bus" stub for
    initial probing and discovery, but we end up with something that still
    fits the general pattern of its fellow system PMU drivers to ease future
    maintenance.

    Moving code to a new file also offers a perfect excuse to modernise the
    license/copyright headers and clean up some funky linewraps on the way.

    Cc: Lorenzo Pieralisi
    Reviewed-by: Suzuki Poulose
    Acked-by: Punit Agrawal
    Signed-off-by: Robin Murphy
    Signed-off-by: Arnd Bergmann

    Robin Murphy
     
  • The arm-ccn driver is purely a perf driver for the CCN PMU, not a bus
    driver in the sense of the other residents of drivers/bus/, so let's
    move it to the appropriate place for SoC PMU drivers. Not to mention
    moving the documentation accordingly as well.

    Acked-by: Pawel Moll
    Acked-by: Will Deacon
    Signed-off-by: Robin Murphy
    Signed-off-by: Arnd Bergmann

    Robin Murphy
     

03 Jan, 2018

1 commit

  • Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
    The DSU integrates one or more cores with an L3 memory system, control
    logic, and external interfaces to form a multicore cluster. The PMU
    allows counting the various events related to L3, SCU etc, along with
    providing a cycle counter.

    The PMU can be accessed via system registers, which are common
    to the cores in the same cluster. The PMU registers follow the
    semantics of the ARMv8 PMU, mostly, with the exception that
    the counters record the cluster wide events.

    This driver is mostly based on the ARMv8 and CCI PMU drivers.
    The driver only supports ARM64 at the moment. It can be extended
    to support ARM32 by providing register accessors like we do in
    arch/arm64/include/asm/arm_dsu_pmu.h.

    Cc: Mark Rutland
    Cc: Will Deacon
    Reviewed-by: Jonathan Cameron
    Reviewed-by: Mark Rutland
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Will Deacon

    Suzuki K Poulose
     

16 Nov, 2017

1 commit

  • Pull arm64 updates from Will Deacon:
    "The big highlight is support for the Scalable Vector Extension (SVE)
    which required extensive ABI work to ensure we don't break existing
    applications by blowing away their signal stack with the rather large
    new vector context ( [...]

    [...] of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
    arm64: Make ARMV8_DEPRECATED depend on SYSCTL
    arm64: Implement __lshrti3 library function
    arm64: support __int128 on gcc 5+
    arm64/sve: Add documentation
    arm64/sve: Detect SVE and activate runtime support
    arm64/sve: KVM: Hide SVE from CPU features exposed to guests
    arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
    arm64/sve: KVM: Prevent guests from using SVE
    arm64/sve: Add sysctl to set the default vector length for new processes
    arm64/sve: Add prctl controls for userspace vector length management
    arm64/sve: ptrace and ELF coredump support
    arm64/sve: Preserve SVE registers around EFI runtime service calls
    arm64/sve: Preserve SVE registers around kernel-mode NEON use
    arm64/sve: Probe SVE capabilities and usable vector lengths
    arm64: cpufeature: Move sys_caps_initialised declarations
    arm64/sve: Backend logic for setting the vector length
    arm64/sve: Signal handling support
    arm64/sve: Support vector length resetting for new processes
    arm64/sve: Core task context handling
    arm64/sve: Low-level CPU setup
    ...

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
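
    As an illustration of the tagging step, the sketch below prepends the
    identifier to a file that has none. It assumes the kernel comment
    convention of a "//" comment in .c files and a "/* */" comment in .h
    files; add_spdx is a hypothetical helper and is not the tooling that
    generated these patches.

```python
SPDX_ID = "GPL-2.0"


def add_spdx(filename: str, text: str) -> str:
    """Prepend an SPDX line unless the file already carries one."""
    if "SPDX-License-Identifier:" in text:
        return text  # already tagged, leave untouched
    if filename.endswith(".h"):
        tag = f"/* SPDX-License-Identifier: {SPDX_ID} */"
    elif filename.endswith(".c"):
        tag = f"// SPDX-License-Identifier: {SPDX_ID}"
    else:
        raise ValueError(f"unhandled file type: {filename}")
    return tag + "\n" + text


print(add_spdx("example.c", "#include <linux/perf_event.h>\n"))
```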

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

20 Oct, 2017

1 commit


18 Oct, 2017

1 commit

  • The ARMv8.2 architecture introduces the optional Statistical Profiling
    Extension (SPE).

    SPE can be used to profile a population of operations in the CPU pipeline
    after instruction decode. These are either architected instructions (i.e.
    a dynamic instruction trace) or CPU-specific uops and the choice is fixed
    statically in the hardware and advertised to userspace via caps/. Sampling
    is controlled using a sampling interval, similar to a regular PMU counter,
    but also with an optional random perturbation to avoid falling into patterns
    where you continuously profile the same instruction in a hot loop.

    After each operation is decoded, the interval counter is decremented. When
    it hits zero, an operation is chosen for profiling and tracked within the
    pipeline until it retires. Along the way, information such as TLB lookups,
    cache misses, time spent to issue etc is captured in the form of a sample.
    The sample is then filtered according to certain criteria (e.g. load
    latency) that can be specified in the event config (described under
    format/) and, if the sample satisfies the filter, it is written out to
    memory as a record, otherwise it is discarded. Only one operation can
    be sampled at a time.
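
    The selection and filtering steps above can be modelled roughly as
    follows. This is a toy simulation, not the hardware algorithm: the
    function names, the uniform jitter, and the "latency" field are
    assumptions, and the rule that only one operation is in flight at a
    time is not modelled.

```python
import random


def select_samples(num_ops: int, interval: int, jitter: int = 0, seed: int = 0):
    """Return indices of decoded operations chosen for profiling."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rng = random.Random(seed)
    selected = []
    counter = interval
    for op in range(num_ops):
        counter -= 1  # one decrement per decoded operation
        if counter == 0:
            selected.append(op)
            # Reload, optionally perturbed to avoid locking onto a loop.
            counter = interval + (rng.randrange(jitter + 1) if jitter else 0)
    return selected


def filter_records(samples, min_latency: int):
    """Keep samples that satisfy the filter; the rest are discarded."""
    return [s for s in samples if s["latency"] >= min_latency]


print(select_samples(num_ops=10, interval=3))
print(filter_records([{"op": 2, "latency": 5}, {"op": 5, "latency": 40}], 32))
```

    With no jitter, every third operation is chosen; a non-zero jitter
    stretches some of the gaps so the pattern does not repeat exactly.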

    The in-memory buffer is linear and virtually addressed, raising an
    interrupt when it fills up. The PMU driver handles these interrupts to
    give the appearance of a ring buffer, as expected by the AUX code.

    The in-memory trace-like format is self-describing (though not parseable
    in reverse) and written as a series of records, with each record
    corresponding to a sample and consisting of a sequence of packets. These
    packets are defined by the architecture, although some have CPU-specific
    fields for recording information specific to the microarchitecture.

    As a simple example, a record generated for a branch instruction may
    consist of the following packets:

    0 (Address) : Virtual PC of the branch instruction
    1 (Type) : Conditional direct branch
    2 (Counter) : Number of cycles taken from Dispatch to Issue
    3 (Address) : Virtual branch target + condition flags
    4 (Counter) : Number of cycles taken from Dispatch to Complete
    5 (Events) : Mispredicted as not-taken
    6 (END) : End of record
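
    Because a record is a forward-only sequence of packets closed by an END
    packet, a decoder can group a packet stream into records in one pass.
    The sketch below assumes packets have already been decoded into
    (kind, payload) tuples; that representation and the function name are
    hypothetical and do not reflect the binary packet encoding.

```python
def split_records(packets):
    """Group a forward packet stream into records, one per END packet."""
    records, current = [], []
    for kind, payload in packets:
        current.append((kind, payload))
        if kind == "END":
            records.append(current)
            current = []
    # Packets after the last END form an incomplete record and are dropped.
    return records


stream = [
    ("Address", "Virtual PC of the branch instruction"),
    ("Type", "Conditional direct branch"),
    ("Counter", "Cycles from Dispatch to Issue"),
    ("Address", "Virtual branch target + condition flags"),
    ("Counter", "Cycles from Dispatch to Complete"),
    ("Events", "Mispredicted as not-taken"),
    ("END", None),
    ("Address", "start of a second, incomplete record"),
]
records = split_records(stream)
print(len(records), len(records[0]))
```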

    It is also possible to toggle properties such as timestamp packets in
    each record.

    This patch adds support for SPE in the form of a new perf driver.

    Cc: Alexander Shishkin
    Reviewed-by: Mark Rutland
    Signed-off-by: Will Deacon

    Will Deacon
     

11 Apr, 2017

2 commits

  • This patch adds framework code to handle parsing PMU data out of the
    MADT, sanity checking this, and managing the association of CPUs (and
    their interrupts) with appropriate logical PMUs.

    For the time being, we expect that only one PMU driver (PMUv3) will make
    use of this, and we simply pass in a single probe function.

    This is based on an earlier patch from Jeremy Linton.

    Signed-off-by: Mark Rutland
    Tested-by: Jeremy Linton
    Cc: Will Deacon
    Signed-off-by: Will Deacon

    Mark Rutland
     
  • Now that we've split the pdev and DT probing logic from the runtime
    management, let's move the former into its own file. We gain a few lines
    due to the copyright header and includes, but this should keep the logic
    clearly separated, and paves the way for adding ACPI support in a
    similar fashion.

    Signed-off-by: Mark Rutland
    Tested-by: Jeremy Linton
    [will: rename nr_irqs to avoid conflict with global variable]
    Signed-off-by: Will Deacon

    Mark Rutland
     

04 Apr, 2017

1 commit

  • This adds a new dynamic PMU to the Perf Events framework to program
    and control the L3 cache PMUs in some Qualcomm Technologies SOCs.

    The driver supports a distributed cache architecture where the overall
    cache for a socket is comprised of multiple slices each with its own PMU.
    Access to each individual PMU is provided even though all CPUs share all
    the slices. User space needs to aggregate the individual counts to provide
    a global picture.
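
    A minimal sketch of that user-space aggregation, assuming the per-slice
    counts have already been collected into a dictionary keyed by PMU name.
    The second slice name, the counts, and the function name are made up
    for illustration.

```python
from collections import defaultdict


def aggregate_slices(per_slice_counts):
    """Sum each event across all slice PMUs to get a socket-wide total."""
    totals = defaultdict(int)
    for events in per_slice_counts.values():
        for event, count in events.items():
            totals[event] += count
    return dict(totals)


counts = {
    "l3cache_0_0": {"read-miss": 120},
    "l3cache_0_1": {"read-miss": 80},  # hypothetical second slice
}
print(aggregate_slices(counts))
```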

    The driver exports formatting and event information to sysfs so it can
    be used by the perf user space tools with the syntaxes:
    perf stat -a -e l3cache_0_0/read-miss/
    perf stat -a -e l3cache_0_0/event=0x21/

    Acked-by: Mark Rutland
    Signed-off-by: Agustin Vega-Frias
    [will: fixed sparse issues]
    Signed-off-by: Will Deacon

    Agustin Vega-Frias
     

09 Feb, 2017

1 commit

  • Adds perf events support for L2 cache PMU.

    The L2 cache PMU driver is named 'l2cache_0' and can be used
    with perf events to profile L2 events such as cache hits
    and misses on Qualcomm Technologies processors.

    Reviewed-by: Mark Rutland
    Signed-off-by: Neil Leeder
    [will: minimise nesting in l2_cache_associate_cpu_with_cluster]
    [will: use kstrtoul for unsigned long, remove redundant .owner setting]
    Signed-off-by: Will Deacon

    Neil Leeder
     

16 Sep, 2016

1 commit


31 Jul, 2015

1 commit

  • To enable sharing of the arm_pmu code with arm64, this patch factors it
    out to drivers/perf/. A new drivers/perf directory is added for
    performance monitor drivers to live under.

    MAINTAINERS is updated accordingly. Files added previously without a
    corresponding MAINTAINERS update (perf_regs.c, perf_callchain.c, and
    perf_event.h) are also added.

    Cc: Arnaldo Carvalho de Melo
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Walleij
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Mark Rutland
    [will: augmented Kconfig help slightly]
    Signed-off-by: Will Deacon

    Mark Rutland