23 May, 2018

1 commit

  • commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream.

    Correct a trinity finding for the perf_event_open() system call with
    a perf event attribute structure that uses a frequency but has the
    sampling frequency set to zero. This causes a FP divide exception during
    the sample rate initialization for the hardware sampling facility.

    Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility")
    Cc: stable@vger.kernel.org # 3.14+
    Reviewed-by: Heiko Carstens
    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    Hendrik Brueckner
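
The guard described in this commit can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel code: `freq_to_interval()`, its parameters, and the choice of cycles per microsecond as the speed unit are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel function): convert a requested
 * sampling frequency in Hz into a sampling interval in CPU cycles.
 * cpu_mhz is assumed to be the CPU speed in cycles per microsecond.
 */
int freq_to_interval(uint64_t cpu_mhz, uint64_t sample_freq, uint64_t *interval)
{
    /* Reject a zero frequency before it can reach the division below. */
    if (sample_freq == 0)
        return -EINVAL;

    /* cycles per second divided by samples per second */
    *interval = (UINT64_C(1000000) * cpu_mhz) / sample_freq;
    return 0;
}
```

With these assumptions, a caller passing a frequency of zero gets `-EINVAL` back instead of triggering a divide exception, and `*interval` is left untouched.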
     

13 Sep, 2017

1 commit

  • Creating a per-thread event failed as shown below:

    perf record --per-thread -e rB0000 -- sleep 1
    Error:
    The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000).
    /bin/dmesg may provide additional information.
    No CONFIG_PERF_EVENTS=y kernel support configured?

    This bug was introduced by:

    commit c311c797998c1e70eade463dd60b843da4f1a203
    Author: Alexey Dobriyan
    Date: Mon May 8 15:56:15 2017 -0700

    cpumask: make "nr_cpumask_bits" unsigned

    If a per-thread event is not attached to any CPU, the cpu field
    in struct perf_event is -1. The above commit converts the CPU number
    to unsigned int, which results in an illegal CPU number.

    Fixes: c311c797998c ("cpumask: make "nr_cpumask_bits" unsigned")
    Cc: stable@vger.kernel.org # v4.12+
    Cc: Alexey Dobriyan
    Acked-by: Heiko Carstens
    Signed-off-by: Pu Hou
    Signed-off-by: Martin Schwidefsky

    Pu Hou
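
The signed-to-unsigned pitfall behind this bug can be reproduced in plain C. The sketch below is an illustration only: both function names are hypothetical, and the second function shows one possible way to keep the range check away from the "not bound to a CPU" value of -1, not necessarily what the actual fix does.

```c
#include <stdbool.h>

/*
 * Pitfall: comparing an int CPU number against an unsigned bound converts
 * the int to unsigned first, so -1 becomes UINT_MAX. The cast below spells
 * out the conversion that "cpu >= nr_bits" would perform implicitly.
 */
bool cpu_out_of_range_naive(int cpu, unsigned int nr_bits)
{
    return (unsigned int)cpu >= nr_bits;
}

/*
 * A per-thread event carries cpu == -1 (not bound to any CPU), so only
 * non-negative CPU numbers are range-checked here.
 */
bool cpu_out_of_range(int cpu, unsigned int nr_bits)
{
    return cpu >= 0 && (unsigned int)cpu >= nr_bits;
}
```

With a bound of 64, the naive check reports -1 as out of range, while the second check accepts -1 and still rejects CPU number 64.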
     

13 Jul, 2017

1 commit


12 Jun, 2017

1 commit

  • Rename a couple of the struct psw_bits members so that their purpose
    is more obvious. Initially I thought using the single character
    names from the PoP would be sufficient and obvious, but admittedly
    that is not true.

    The current implementation is not easy to use, if one has to look into
    the source file to figure out which member represents the 'per' bit
    (which is the 'r' member).

    Therefore rename the members to sane names that are identical to the
    uapi psw mask defines:

    r -> per
    i -> io
    e -> ext
    t -> dat
    m -> mcheck
    w -> wait
    p -> pstate

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

09 May, 2017

1 commit

  • Bit searching functions accept "unsigned long" indices but
    "nr_cpumask_bits" is "int" which is signed, so inevitable sign
    extensions occur on x86_64. Those MOVSX are #1 MOVSX bloat by number of
    uses across whole kernel.

    Change "nr_cpumask_bits" to unsigned; this number can't be negative
    after all. This allows implicit zero-extension on x86_64 without
    MOVSX.

    Change signed comparisons into unsigned comparisons where necessary.

    Other uses look fine because each is either an argument passed to a
    function or a comparison that is already unsigned.

    Net win on allyesconfig type of kernel: ~2.8 KB (!)

    add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
    function old new delta
    xen_exit_mmap 691 735 +44
    qstat_read 426 440 +14
    __cpufreq_cooling_register 1678 1687 +9
    trace_rb_cpu_prepare 447 455 +8
    vermagic 54 60 +6
    nfp_driver_version 54 60 +6
    rcu_torture_stats_print 1147 1151 +4
    find_next_push_cpu 267 269 +2
    xen_irq_resume 961 960 -1
    ...
    init_vp_index 946 906 -40
    od_set_powersave_bias 328 281 -47
    power_cpu_exit 193 139 -54
    arch_show_interrupts 3538 3484 -54
    select_idle_sibling 1558 1471 -87
    Total: Before=158358910, After=158356077, chg -0.00%

    Same arguments apply to "nr_cpu_ids" but I haven't yet found enough
    courage to delve into this issue (and proper fix may require new type
    "cpu_t" which is whole separate story).

    Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

05 Apr, 2017

1 commit

  • There are three different code levels in regard to the identification
    of guest samples. They differ in the way the LPP instruction is used.

    1) Old kernels without the LPP instruction. The guest program parameter
    is always zero.
    2) Newer kernels load the process pid into the program parameter with LPP.
    The guest program parameter is non-zero if the guest executes in a
    process != idle.
    3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
    non-zero even for the idle task. The guest program parameter is non-zero
    if the guest is running.

    All kernels load the process pid to CR4 on context switch. The CPU sampling
    code uses the value in CR4 to decide between guest and host samples in case
    the guest program parameter is zero. The three cases:

    1) CR4==pid, gpp==0
    2) CR4==pid, gpp==pid
    3) CR4==pid, gpp==((1UL << 31) | pid)

    The load-control instruction to load the pid into CR4 is expensive and the
    goal is to remove it. To distinguish the host CR4 from the guest pid for
    the idle process the maximum value 0xffff for the PASN is used.
    This adds a fourth case for a guest OS with an updated kernel:

    4) CR4==0xffff, gpp==((1UL << 31) | pid)

    The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!=0xffff)
    to identify guest samples. This works nicely with all 4 cases, the only
    possible issue would be a guest with an old kernel (gpp==0) and a process
    pid of 0xffff. Well, don't do that..

    Suggested-by: Christian Borntraeger
    Reviewed-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
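
The guest-sample test from this commit message, (gpp != 0 || CR4 != 0xffff), can be written as a small predicate. This is a sketch for illustration: `is_guest_sample()`, `HOST_PASN`, and the parameter types are assumptions, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the updated host keeps in CR4, per the commit text. */
#define HOST_PASN 0xffffu

/*
 * gpp:      guest program parameter taken from the sample
 * cr4_pasn: primary ASN (CR4 content) taken from the sample
 */
bool is_guest_sample(uint64_t gpp, uint16_t cr4_pasn)
{
    return gpp != 0 || cr4_pasn != HOST_PASN;
}
```

All four guest cases listed above evaluate to true, a host sample (gpp == 0, CR4 == 0xffff) evaluates to false, and an old guest kernel running pid 0xffff is indistinguishable from the host, which is the caveat the message mentions.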
     

25 Dec, 2016

1 commit

  • When the state names got added a script was used to add the extra argument
    to the calls. The script basically converted the state constant to a
    string, but the cleanup to convert these strings into meaningful ones did
    not happen.

    Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
    are used in all the other places already.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

12 Dec, 2016

1 commit


23 Nov, 2016

1 commit

  • Use the psw_bits macro and simplify the code. The generated code is
    also better since it doesn't contain any conditional branches anymore.

    Reviewed-by: Hendrik Brueckner
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

30 Jul, 2016

1 commit

  • Pull smp hotplug updates from Thomas Gleixner:
    "This is the next part of the hotplug rework.

    - Convert all notifiers with a priority assigned

    - Convert all CPU_STARTING/DYING notifiers

    The final removal of the STARTING/DYING infrastructure will happen
    when the merge window closes.

    Another 700 lines of impenetrable maze gone :)"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    timers/core: Correct callback order during CPU hot plug
    leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
    powerpc/numa: Convert to hotplug state machine
    arm/perf: Fix hotplug state machine conversion
    irqchip/armada: Avoid unused function warnings
    ARC/time: Convert to hotplug state machine
    clocksource/atlas7: Convert to hotplug state machine
    clocksource/armada-370-xp: Convert to hotplug state machine
    clocksource/exynos_mct: Convert to hotplug state machine
    clocksource/arm_global_timer: Convert to hotplug state machine
    rcu: Convert rcutree to hotplug state machine
    KVM/arm/arm64/vgic-new: Convert to hotplug state machine
    smp/cfd: Convert core to hotplug state machine
    x86/x2apic: Convert to CPU hotplug state machine
    profile: Convert to hotplug state machine
    timers/core: Convert to hotplug state machine
    hrtimer: Convert to hotplug state machine
    x86/tboot: Convert to hotplug state machine
    arm64/armv8 deprecated: Convert to hotplug state machine
    hwtracing/coresight-etm4x: Convert to hotplug state machine
    ...

    Linus Torvalds
     

28 Jul, 2016

1 commit

  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     

16 Jul, 2016

1 commit

  • This patch adds support for non-linear data on raw records. It
    extends raw records to have one or multiple fragments that will
    be written linearly into the ring slot, where each fragment can
    optionally have a custom callback handler to walk and extract
    complex, possibly non-linear data.

    If a callback handler is provided for a fragment, then the new
    __output_custom() will be used instead of __output_copy() for
    the perf_output_sample() part. perf_prepare_sample() does all
    the size calculation only once, so perf_output_sample() doesn't
    need to redo the same work anymore, meaning real_size and padding
    will be cached in the raw record. The raw record becomes 32 bytes
    in size without holes; to not increase it further and to avoid
    doing unnecessary recalculations in fast-path, we can reuse
    next pointer of the last fragment, idea here is borrowed from
    ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
    path for PERF_SAMPLE_RAW minimal.

    This facility is needed for BPF's event output helper as a first
    user that will, in a follow-up, add an additional perf_raw_frag
    to its perf_raw_record in order to be able to more efficiently
    dump skb context after a linear head meta data related to it.
    skbs can be non-linear and thus need a custom output function to
    dump buffers. Currently, the skb data needs to be copied twice;
    with the help of __output_custom() this work only needs to be
    done once. Future users could be things like XDP/BPF programs
    that work on different context though and would thus also have
    a different callback function.

    The few users of raw records are adapted to initialize their frag
    data from the raw record itself, no change in behavior for them.
    The code is based upon a PoC diff provided by Peter Zijlstra [1].

    [1] http://thread.gmane.org/gmane.linux.network/421294

    Suggested-by: Peter Zijlstra
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
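
The fragment idea can be sketched in user-space C: a record is a chain of fragments, each written linearly into the output slot either by a plain copy or by a per-fragment callback. All names below (`raw_frag`, `raw_total_size`, `raw_output`, `two_chunks`, `copy_two_chunks`) are hypothetical and only loosely modeled on the description above; the padding and size caching mentioned in the message are left out.

```c
#include <stddef.h>
#include <string.h>

/* Optional per-fragment handler that gathers non-linear data into dst. */
typedef void (*frag_copy_fn)(void *dst, const void *src, size_t len);

struct raw_frag {
    struct raw_frag *next;  /* NULL terminates the chain */
    frag_copy_fn copy;      /* NULL means plain memcpy */
    const void *data;
    size_t size;            /* bytes this fragment contributes */
};

/* Size is computed once up front, before any output happens. */
size_t raw_total_size(const struct raw_frag *frag)
{
    size_t sum = 0;

    for (; frag; frag = frag->next)
        sum += frag->size;
    return sum;
}

/* Write all fragments back to back into slot; returns bytes written. */
size_t raw_output(char *slot, const struct raw_frag *frag)
{
    size_t off = 0;

    for (; frag; frag = frag->next) {
        if (frag->copy)
            frag->copy(slot + off, frag->data, frag->size);
        else
            memcpy(slot + off, frag->data, frag->size);
        off += frag->size;
    }
    return off;
}

/* Example of non-linear data: two separate chunks. */
struct two_chunks {
    const char *first;
    size_t first_len;
    const char *second;
    size_t second_len;
};

/* Callback that linearizes both chunks; len is assumed to equal their sum. */
void copy_two_chunks(void *dst, const void *src, size_t len)
{
    const struct two_chunks *tc = src;
    char *out = dst;

    (void)len;
    memcpy(out, tc->first, tc->first_len);
    memcpy(out + tc->first_len, tc->second, tc->second_len);
}
```

A linear head fragment followed by a `two_chunks` fragment is written with a single pass over the chain, so the non-linear data is copied only once, straight into its final place.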
     

14 Jul, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke the
    callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Anna-Maria Gleixner
    Cc: Christian Borntraeger
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-s390@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160713153334.518084858@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     

28 Jun, 2016

1 commit


03 May, 2016

1 commit

  • Since commit 3b9d6da67e11 ("cpu/hotplug: Fix rollback during error-out
    in __cpu_disable()") it is ensured that callbacks of CPU_ONLINE and
    CPU_DOWN_PREPARE are processed on the hotplugged CPU. Because of this,
    SMP function calls are no longer required.

    Replace smp_call_function_single() with a direct call of
    setup_pmc_cpu(). To keep the calling convention, interrupts are
    explicitly disabled around the call.

    Cc: linux-s390@vger.kernel.org
    Signed-off-by: Anna-Maria Gleixner
    Acked-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Anna-Maria Gleixner
     

17 Mar, 2016

1 commit

  • The cpumf_pmu_notifier() hotplug callback lacks handling of the
    CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE fails, the PMC
    of the CPU is not set up again. Furthermore the CPU_ONLINE_FROZEN case
    will never be processed because of masking the switch expression with
    CPU_TASKS_FROZEN.

    Add handling for the CPU_DOWN_FAILED transition to set up the PMC of
    the CPU. Remove the CPU_ONLINE_FROZEN case.

    Signed-off-by: Anna-Maria Gleixner
    Acked-by: Hendrik Brueckner
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Anna-Maria Gleixner
     

02 Mar, 2016

1 commit

  • commit e22cf8ca6f75 ("s390/cpumf: rework program parameter setting
    to detect guest samples") requires guest changes to get proper
    guest/host. We can do better: We can use the primary asn value,
    which is set on all Linux variants to compare this with the host
    pp value.
    We now have the following cases:
    1. Guest using PP
    host sample: gpp == 0, asn == hpp --> host
    guest sample: gpp != 0 --> guest
    2. Guest not using PP
    host sample: gpp == 0, asn == hpp --> host
    guest sample: gpp == 0, asn != hpp --> guest

    As soon as the host no longer sets CR4, we must back out
    this heuristic - let's add a comment in switch_to.

    Signed-off-by: Christian Borntraeger
    Reviewed-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
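
The two cases in this message reduce to one comparison: a sample is a host sample only if gpp is zero and the primary ASN matches the host program parameter. The sketch below is illustrative; the function name is hypothetical, and comparing the ASN against only the low 16 bits of hpp is an assumption made here because the ASN is a 16-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * gpp: guest program parameter from the sample
 * asn: primary ASN from the sample
 * hpp: host program parameter from the sample
 * Assumption: only the low 16 bits of hpp are compared with the ASN.
 */
bool is_guest_sample_by_asn(uint64_t gpp, uint16_t asn, uint64_t hpp)
{
    return gpp != 0 || asn != (uint16_t)hpp;
}
```

A guest that sets the program parameter is caught by the first term, and a guest that leaves it at zero is caught by the ASN mismatch.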
     

14 Oct, 2015

1 commit

  • The program parameter can be used to mark hardware samples with
    some token. Previously, it was used to mark guest samples only.

    Improve the program parameter doubleword by combining two parts,
    the leftmost LPP part and the rightmost PID part. Set the PID
    part for processes by using the task PID.
    To distinguish host and guest samples for the kernel (PID part
    is zero), the guest must always set the program parameter to a
    non-zero value. Use the leftmost bit in the LPP part of the
    program parameter to be able to detect guest kernel samples.

    [brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
    __LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
    And updated the commit message accordingly.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Hendrik Brueckner
    Reviewed-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
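
The two-part program parameter can be pictured as a 64-bit value with a marker bit on the left and the PID on the right. The sketch below assumes that each part is 32 bits wide, which the message does not state; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Leftmost bit of the (assumed 32-bit) LPP part. */
#define LPP_GUEST_MARK UINT32_C(0x80000000)

/* Assumed layout: upper 32 bits = LPP part, lower 32 bits = PID part. */
uint64_t make_program_parameter(uint32_t pid)
{
    return ((uint64_t)LPP_GUEST_MARK << 32) | pid;
}

/* Extract the PID part again. */
uint32_t program_parameter_pid(uint64_t pp)
{
    return (uint32_t)pp;
}
```

Because the marker bit is always set, the resulting value is non-zero even when the PID part is zero, which is the property the message relies on for kernel samples.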
     

03 Aug, 2015

1 commit


28 May, 2015

1 commit

  • Most code already uses consts for the struct kernel_param_ops,
    sweep the kernel for the last offending stragglers. Other than
    include/linux/moduleparam.h and kernel/params.c all other changes
    were generated with the following Coccinelle SmPL patch. Merge
    conflicts between trees can be handled with Coccinelle.

    In the future git could get Coccinelle merge support to deal with
    patch --> fail --> grammar --> Coccinelle --> new patch conflicts
    automatically for us on patches where the grammar is available and
    the patch is of high confidence. Consider this a feature request.

    Test compiled on x86_64 against:

    * allnoconfig
    * allmodconfig
    * allyesconfig

    @ const_found @
    identifier ops;
    @@

    const struct kernel_param_ops ops = {
    };

    @ const_not_found depends on !const_found @
    identifier ops;
    @@

    -struct kernel_param_ops ops = {
    +const struct kernel_param_ops ops = {
    };

    Generated-by: Coccinelle SmPL
    Cc: Rusty Russell
    Cc: Junio C Hamano
    Cc: Andrew Morton
    Cc: Kees Cook
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: cocci@systeme.lip6.fr
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Rusty Russell

    Luis R. Rodriguez
     

13 Mar, 2015

1 commit


12 Dec, 2014

1 commit

  • Pull s390 updates from Martin Schwidefsky:
    "The most notable change for this pull request is the ftrace rework
    from Heiko. It brings a small performance improvement and the ground
    work to support a new gcc option to replace the mcount blocks with a
    single nop.

    Two new s390 specific system calls are added to emulate user space
    mmio for PCI, an artifact of how PCI memory is accessed.

    Two patches for the memory management with changes to common code.
    For KVM mm_forbids_zeropage is added which disables the empty zero
    page for an mm that is used by a KVM process. And an optimization,
    pmdp_get_and_clear_full is added analogous to ptep_get_and_clear_full.

    Some micro optimization for the cmpxchg and the spinlock code.

    And as usual bug fixes and cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
    s390/cputime: fix 31-bit compile
    s390/scm_block: make the number of reqs per HW req configurable
    s390/scm_block: handle multiple requests in one HW request
    s390/scm_block: allocate aidaw pages only when necessary
    s390/scm_block: use mempool to manage aidaw requests
    s390/eadm: change timeout value
    s390/mm: fix memory leak of ptlock in pmd_free_tlb
    s390: use local symbol names in entry[64].S
    s390/ptrace: always include vector registers in core files
    s390/simd: clear vector register pointer on fork/clone
    s390: translate cputime magic constants to macros
    s390/idle: convert open coded idle time seqcount
    s390/idle: add missing irq off lockdep annotation
    s390/debug: avoid function call for debug_sprintf_*
    s390/kprobes: fix instruction copy for out of line execution
    s390: remove diag 44 calls from cpu_relax()
    s390/dasd: retry partition detection
    s390/dasd: fix list corruption for sleep_on requests
    s390/dasd: fix infinite term I/O loop
    s390/dasd: remove unused code
    ...

    Linus Torvalds
     

03 Nov, 2014

1 commit


28 Oct, 2014

1 commit

  • Andy reported that the current state of event_idx is rather confused.
    So remove all but the x86_pmu implementation and change the default to
    return 0 (the safe option).

    Reported-by: Andy Lutomirski
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Cody P Schafer
    Cc: Cody P Schafer
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Himangi Saraogi
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Gortmaker
    Cc: Paul Mackerras
    Cc: sukadev@linux.vnet.ibm.com
    Cc: Thomas Huth
    Cc: Vince Weaver
    Cc: linux390@de.ibm.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

27 Aug, 2014

1 commit

  • __get_cpu_var() is used for multiple purposes in the kernel source. One of
    them is address calculation via the form &__get_cpu_var(x). This calculates
    the address for the instance of the percpu variable of the current processor
    based on an offset.

    Other use cases are for storing and retrieving data from the current
    processor's percpu area. __get_cpu_var() can be used as an lvalue when
    writing data or on the right side of an assignment.

    __get_cpu_var() is defined as:

    #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

    __get_cpu_var() always only does an address determination. However, store
    and retrieve operations could use a segment prefix (or global register on
    other platforms) to avoid the address calculation.

    this_cpu_write() and this_cpu_read() can directly take an offset into a
    percpu area and use optimized assembly code to read and write per cpu
    variables.

    This patch converts __get_cpu_var into either an explicit address
    calculation using this_cpu_ptr() or into a use of this_cpu operations that
    use the offset. Thereby address calculations are avoided and fewer registers
    are used when code is generated.

    At the end of the patch set all uses of __get_cpu_var have been removed so
    the macro is removed too.

    The patch set includes passes over all arches as well. Once these operations
    are used throughout, specialized macros can be defined in non-x86
    arches as well in order to optimize per cpu access by, for example, using a
    global register that may be set to the per cpu base.

    Transformations done to __get_cpu_var()

    1. Determine the address of the percpu instance of the current processor.

    DEFINE_PER_CPU(int, y);
    int *x = &__get_cpu_var(y);

    Converts to

    int *x = this_cpu_ptr(&y);

    2. Same as #1 but this time an array structure is involved.

    DEFINE_PER_CPU(int, y[20]);
    int *x = __get_cpu_var(y);

    Converts to

    int *x = this_cpu_ptr(y);

    3. Retrieve the content of the current processor's instance of a per cpu
    variable.

    DEFINE_PER_CPU(int, y);
    int x = __get_cpu_var(y);

    Converts to

    int x = __this_cpu_read(y);

    4. Retrieve the content of a percpu struct

    DEFINE_PER_CPU(struct mystruct, y);
    struct mystruct x = __get_cpu_var(y);

    Converts to

    memcpy(&x, this_cpu_ptr(&y), sizeof(x));

    5. Assignment to a per cpu variable

    DEFINE_PER_CPU(int, y);
    __get_cpu_var(y) = x;

    Converts to

    this_cpu_write(y, x);

    6. Increment/Decrement etc of a per cpu variable

    DEFINE_PER_CPU(int, y);
    __get_cpu_var(y)++;

    Converts to

    this_cpu_inc(y);

    Cc: Martin Schwidefsky
    CC: linux390@de.ibm.com
    Acked-by: Heiko Carstens
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

03 Apr, 2014

1 commit

  • Use the new defines for external interruption codes to get rid
    of "magic" numbers in the s390 source code. And while we're at it,
    also rename the (un-)register_external_interrupt function to
    something shorter so that this patch does not exceed the 80
    columns all over the place.

    Signed-off-by: Thomas Huth
    Signed-off-by: Martin Schwidefsky

    Thomas Huth
     

13 Jan, 2014

1 commit

  • The patch "s390/perf: add support for the CPU-Measurement Sampling
    Facility" added a new instance of the __cpuinit macro usage.

    We removed this a couple versions ago; we now want to remove
    the compat no-op stubs. Introducing new users is not what
    we want to see at this point in time, as it will break once
    the stubs are gone.

    Cc: Hendrik Brueckner
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Martin Schwidefsky

    Paul Gortmaker
     

16 Dec, 2013

10 commits

  • Add the PERF_CPUM_SF_FULL_BLOCKS flag to process only sample-data-blocks that
    have the block-full-indicator bit set. Sample-data-blocks that are partially
    filled are discarded. Use this flag if the sampling buffer is likely to be
    shared among perf events that use different sampling modes. In such
    environments, flushing sample-data-blocks that are not completely filled might
    cause invalid data formats.

    Setting PERF_CPUM_SF_FULL_BLOCKS prevents potentially invalid sampling data
    from being processed but, in contrast, also discards valid samples in partially
    filled sample-data-blocks. Note that sample-data-blocks might not become full
    for small sampling frequencies or for workloads that are scheduled for tiny
    intervals.

    To sample with the PERF_CPUM_SF_FULL_BLOCKS flag, set the perf->attr.config1
    to 0x0004. For example:

    perf record -e cpum_sf/config=0xB000,config1=0x0004/

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Also support the diagnostic-sampling function in addition to the basic-sampling
    function. Diagnostic-sampling data entries contain hardware model specific
    sampling data and additional programs are required to analyze the data.

    To deliver diagnostic-sampling as well as basic-sampling data entries to user
    space, introduce support for sampling "raw data". If this particular perf
    sampling type (PERF_SAMPLE_RAW) is used, sampling data entries are copied
    to user space. External programs can then analyze these data.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Introduce the perf_exclude_event() function to filter perf samples
    according to event->attr.exclude_* settings. During event initialization,
    reset event exclude settings that are not supported.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • The host-program-parameter (hpp) value of basic sample-data-entries designates
    a SIE control block that is set by the LPP instruction in sie64a().
    Non-zero values indicate guest samples, a value of zero indicates a host sample.

    For perf samples, host and guest samples are distinguished using particular
    PERF_MISC_* flags. The perf layer calls perf_misc_flags() to set the flags
    based on the pt_regs content. For each sample-data-entry, the cpum_sf PMU
    creates a pt_regs structure with the sample-data information. An additional
    flag structure is added to easily distinguish between host and guest samples.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • The trailer entry contains a timestamp of the time when the sample-data-block
    became full. The timestamp specifies a TOD (time-of-day) value in either the
    STCK or STCKE format.

    Provide a helper function to return the TOD value depending on the setting of
    the time format indicator.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Reset the sample-data-block full indicator and the overflow counter
    at the same time. This must be done atomically because the sampling hardware
    is still active while the full sample-data-block is processed.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Improve the sampling buffer allocation and add a function to reallocate and
    increase the sampling buffer structure. The number of allocated buffer elements
    (sample-data-blocks) is accounted. You can control the minimum and maximum
    number of these sample-data-blocks through the cpum_sfb_size kernel parameter.

    The number of hardware sample overflows (if any) is also accounted and stored
    per perf event. During the PMU disable/enable calls, the accumulated overflow
    counter is analyzed and, if necessary, the sampling buffer is dynamically
    increased.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Introduce reserve/release functions to share the sampling facility
    between perf and oprofile.
    Also improve error handling for the sampling facility support in perf.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • The cpum_cf (counter facility) PMU does not support sampling events.
    With cpum_sf (sampling facility), a PMU for sampling CPU cycles is
    available.

    Make cpum_sf the "default" PMU for PERF_COUNT_HW_CPU_CYCLES sampling
    events but use the more precise cpum_cf PMU for non-sampling events.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Introduce a perf PMU, "cpum_sf", to support the CPU-Measurement
    Sampling Facility. You can control the sampling facility through
    this perf PMU interface. Perf sampling events are created for
    hardware samples.

    For details about the CPU-Measurement Sampling Facility, see
    "The Load-Program-Parameter and the CPU-Measurement Facilities" (SA23-2260).

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
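
Among the commits of this date, the atomic reset of the block-full indicator together with the overflow counter lends itself to a short sketch. The version below uses C11 atomics in user space and a compare-and-swap loop; the bit layout of the flags word (full indicator in bit 63, overflow count in the low 32 bits) and all names are assumptions for illustration, not the hardware format.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define SDB_FULL          (UINT64_C(1) << 63)       /* block-full indicator */
#define SDB_OVERFLOW_MASK UINT64_C(0x00000000ffffffff) /* overflow counter */

/*
 * Clear the full indicator and the overflow counter in one atomic step,
 * leaving all other bits untouched. Returns the overflow count that was
 * present at the moment of the reset.
 */
uint64_t sdb_reset_full_and_overflow(_Atomic uint64_t *flags)
{
    uint64_t old = atomic_load(flags);
    uint64_t cleared;

    do {
        cleared = old & ~(SDB_FULL | SDB_OVERFLOW_MASK);
        /* On failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(flags, &old, cleared));

    return old & SDB_OVERFLOW_MASK;
}
```

Doing both updates in a single compare-and-swap means a concurrent writer, here standing in for the still-active sampling hardware, can never be observed between the two resets; its update either lands before the swap and is accounted, or forces a retry.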