01 Jul, 2011

1 commit

  • Add a NODE level to the generic cache events which is used to measure
    local vs remote memory accesses. Like all other cache events, an
    ACCESS is HIT+MISS; if there is no way to distinguish between reads
    and writes, report reads only, and so on.

    The below needs filling out for !x86 (which I filled out with
    unsupported events).

    I'm fairly sure ARM can leave it like that since it doesn't strike me as
    an architecture that even has NUMA support. SH might have something since
    it does appear to have some NUMA bits.

    Sparc64, PowerPC and MIPS certainly want a good look there since they
    clearly are NUMA capable.

    Signed-off-by: Peter Zijlstra
    Cc: David Miller
    Cc: Anton Blanchard
    Cc: David Daney
    Cc: Deng-Cheng Zhu
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Robert Richter
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
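
As a rough illustration of the NODE cache events described above, the sketch below packs a generic cache event the way the perf ABI documents it (cache id, operation, and result in successive bytes) and derives hits from ACCESS = HIT + MISS. The helper names `cache_config` and `node_hits` are made up for this note; the numeric values are believed to match linux/perf_event.h but should be checked against the header.

```c
#include <assert.h>
#include <stdint.h>

/* Values believed to match linux/perf_event.h; verify before relying on them. */
enum cache_id     { CACHE_NODE = 6 };                 /* PERF_COUNT_HW_CACHE_NODE */
enum cache_op     { OP_READ = 0, OP_WRITE = 1, OP_PREFETCH = 2 };
enum cache_result { RESULT_ACCESS = 0, RESULT_MISS = 1 };

/* Pack a generic cache event: id | (op << 8) | (result << 16). */
uint64_t cache_config(enum cache_id id, enum cache_op op, enum cache_result res)
{
    assert(op <= OP_PREFETCH && res <= RESULT_MISS);
    return (uint64_t)id | ((uint64_t)op << 8) | ((uint64_t)res << 16);
}

/* Hits are not counted directly; since ACCESS = HIT + MISS, derive them.
 * Clamps to 0 if the two counts were read at slightly different times. */
uint64_t node_hits(uint64_t accesses, uint64_t misses)
{
    return accesses >= misses ? accesses - misses : 0;
}
```

An architecture that cannot tell reads from writes would, per the commit message, fill in only the `OP_READ` entries and mark the rest unsupported.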
     

26 Nov, 2010

1 commit

  • The perf hardware pmu got initialized at various points in the boot,
    some before early_initcall() some after (notably arch_initcall).

    The problem is that the NMI lockup detector is run from early_initcall()
    and expects the hardware pmu to be present.

    Sanitize this by moving all architecture hardware pmu implementations to
    initialize at early_initcall() and move the lockup detector to an explicit
    initcall right after that.

    Cc: paulus
    Cc: davem
    Cc: Michael Cree
    Cc: Deng-Cheng Zhu
    Acked-by: Paul Mundt
    Acked-by: Will Deacon
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
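
The ordering problem described above can be modelled in user space. This is only a toy: the function names are hypothetical, and the real kernel uses initcall levels rather than an explicit array. It shows why the detector must run after every PMU init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool pmu_ready;          /* set once the hardware PMU is initialised */
static bool detector_started;   /* set if the lockup detector found a PMU   */

/* Hypothetical stand-ins for the PMU init and the NMI lockup detector init. */
static void init_hw_pmu(void)          { pmu_ready = true; }
static void init_lockup_detector(void) { detector_started = pmu_ready; }

typedef void (*initcall_fn)(void);

/* Run a NULL-terminated list of init functions in order and report
 * whether the lockup detector managed to start. */
bool boot(const initcall_fn *calls)
{
    assert(calls != NULL);
    pmu_ready = detector_started = false;
    for (; *calls; calls++)
        (*calls)();
    return detector_started;
}

/* Old ordering: detector at the early level, some PMUs only initialised later. */
const initcall_fn old_order[] = { init_lockup_detector, init_hw_pmu, NULL };
/* New ordering: PMU at the early level, detector explicitly right after it. */
const initcall_fn new_order[] = { init_hw_pmu, init_lockup_detector, NULL };
```

With these definitions `boot(old_order)` reports that the detector did not start, while `boot(new_order)` reports that it did.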
     

28 Oct, 2009

1 commit


21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out of its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility are not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
      M=$(echo $N | sed 's/perf_counter/perf_event/g')
      mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the number of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
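
The order of the substitutions in the script above matters: the old `PERF_EVENT_` prefix has to move to `PERF_RECORD_` before `PERF_COUNTER` is renamed to `PERF_EVENT`, otherwise the two namespaces collapse into one. The sketch below replays the first three rules in plain C to make that visible. The helper names are invented for this note, and the identifiers in the usage note are just sample inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of `from` in `src` with `to`, writing to `dst`.
 * Returns 0 on success, -1 if `dst` (capacity `cap`) is too small. */
static int replace_all(const char *src, const char *from, const char *to,
                       char *dst, size_t cap)
{
    size_t flen = strlen(from), tlen = strlen(to), used = 0;

    if (cap == 0 || flen == 0)
        return -1;
    while (*src) {
        if (strncmp(src, from, flen) == 0) {
            if (used + tlen >= cap)
                return -1;
            memcpy(dst + used, to, tlen);
            used += tlen;
            src += flen;
        } else {
            if (used + 1 >= cap)
                return -1;
            dst[used++] = *src++;
        }
    }
    dst[used] = '\0';
    return 0;
}

/* First three rules of the script, in the script's order. */
int rename_in_order(const char *src, char *dst, size_t cap)
{
    char a[256], b[256];

    assert(src != NULL && dst != NULL);
    if (replace_all(src, "PERF_EVENT_", "PERF_RECORD_", a, sizeof a) ||
        replace_all(a, "PERF_COUNTER", "PERF_EVENT", b, sizeof b))
        return -1;
    return replace_all(b, "perf_counter", "perf_event", dst, cap);
}

/* Same rules with the first two swapped, to show what goes wrong. */
int rename_swapped(const char *src, char *dst, size_t cap)
{
    char a[256], b[256];

    assert(src != NULL && dst != NULL);
    if (replace_all(src, "PERF_COUNTER", "PERF_EVENT", a, sizeof a) ||
        replace_all(a, "PERF_EVENT_", "PERF_RECORD_", b, sizeof b))
        return -1;
    return replace_all(b, "perf_counter", "perf_event", dst, cap);
}
```

Feeding `"PERF_EVENT_MMAP PERF_COUNTER_IOC_ENABLE"` through `rename_in_order` keeps the two names distinct, whereas `rename_swapped` turns both into `PERF_RECORD_` names.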
     

06 Aug, 2009

1 commit


18 Jun, 2009

2 commits

  • At present, the powerpc generic (processor-independent) perf_counter
    code has a list of processor back-end modules, and at initialization,
    it looks at the PVR (processor version register) and has a switch
    statement to select a suitable processor-specific back-end.

    This is going to become inconvenient as we add more processor-specific
    back-ends, so this inverts the order: now each back-end checks whether
    it applies to the current processor, and registers itself if so.
    Furthermore, instead of looking at the PVR, back-ends now check the
    cur_cpu_spec->oprofile_cpu_type string and match on that.

    Lastly, each back-end now specifies a name for itself so the core can
    print a nice message when a back-end registers itself.

    This doesn't provide any support for unregistering back-ends, but that
    wouldn't be hard to do and would allow back-ends to be modules.

    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • This changes the powerpc perf_counter back-end to use unsigned long
    types for hardware register values and for the value/mask pairs used
    in checking whether a given set of events fit within the hardware
    constraints. This is in preparation for adding support for the PMU
    on some 32-bit powerpc processors. On 32-bit processors the hardware
    registers are only 32 bits wide, and the PMU structure is generally
    simpler, so 32 bits should be ample for expressing the hardware
    constraints. On 64-bit processors, unsigned long is 64 bits wide,
    so using unsigned long vs. u64 (unsigned long long) makes no actual
    difference.

    This makes some other very minor changes: adjusting whitespace to line
    things up in initialized structures, and simplifying some code in
    hw_perf_disable().

    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
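
The self-registration scheme from the first of the two commits above can be sketched in user space as follows. This is a minimal model, not the kernel code: the `cpu_type` parameter stands in for `cur_cpu_spec->oprofile_cpu_type`, and the type string, the message text, and the function bodies are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct power_pmu {
    const char *name;   /* printed by the core when the back-end registers */
};

static const struct power_pmu *active_pmu;   /* the one registered back-end */

/* Core side: accept the first back-end that registers, reject any later one. */
int register_power_pmu(const struct power_pmu *pmu)
{
    assert(pmu != NULL && pmu->name != NULL);
    if (active_pmu)
        return -1;              /* a back-end is already registered */
    active_pmu = pmu;
    printf("%s performance monitor support registered\n", pmu->name);
    return 0;
}

static const struct power_pmu power6_pmu = { .name = "POWER6" };

/* Back-end side: register only if the CPU type string matches.
 * "ppc64/power6" is an assumed example of such a string. */
int init_power6_pmu(const char *cpu_type)
{
    if (!cpu_type || strcmp(cpu_type, "ppc64/power6") != 0)
        return -1;              /* not our processor */
    return register_power_pmu(&power6_pmu);
}
```

Each additional back-end would carry its own init function of the same shape, so the core no longer needs a central switch statement. As the commit notes, there is no unregister path in this scheme.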
     

11 Jun, 2009

3 commits


15 May, 2009

2 commits

  • This uses values from the MMCRA, SIAR and SDAR registers on
    powerpc to supply more precise information for overflow events,
    including a data address when PERF_RECORD_ADDR is specified.

    Since POWER6 uses different bit positions in MMCRA from earlier
    processors, this converts the struct power_pmu limited_pmc5_6
    field, which only had 0/1 values, into a flags field and
    defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
    and a new flag (PPMU_ALT_SIPR) to indicate that the processor
    uses the POWER6 bit positions rather than the earlier
    positions. It also adds definitions in reg.h for the new and
    old positions of the bit that indicates that the SIAR and SDAR
    values come from the same instruction.

    For the data address, the SDAR value is supplied if we are not
    doing instruction sampling. In that case there is no guarantee
    that the address given in the PERF_RECORD_ADDR subrecord will
    correspond to the instruction whose address is given in the
    PERF_RECORD_IP subrecord.

    If instruction sampling is enabled (e.g. because this counter
    is counting a marked instruction event), then we only supply
    the SDAR value for the PERF_RECORD_ADDR subrecord if it
    corresponds to the instruction whose address is in the
    PERF_RECORD_IP subrecord. Otherwise we supply 0.

    [ Impact: support more PMU hardware features on PowerPC ]

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: Corey Ashford
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Although the perf_counter API allows 63-bit raw event codes,
    internally in the powerpc back-end we had been using 32-bit
    event codes. This expands them to 64 bits so that we can add
    bits for specifying threshold start/stop events and instruction
    sampling modes later.

    This also corrects the return value of can_go_on_limited_pmc;
    we were returning an event code rather than just a 0/1 value in
    some circumstances. That didn't particularly matter while event
    codes were 32-bit, but now that event codes are 64-bit it
    might, so this fixes it.

    [ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ]

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: Corey Ashford
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
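
The data-address rule from the first of the two commits above (the MMCRA/SIAR/SDAR one) can be written down as a small pure function. The bit positions below are placeholders, not the real MMCRA layout, and `data_addr` is a name invented for this sketch. The two flag names come from the commit message, but their numeric values here are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the real MMCRA layout. */
#define SAMPLE_ENABLE   (1ull << 0)   /* instruction sampling is on           */
#define SDSYNC_OLD      (1ull << 1)   /* SIAR/SDAR in sync, pre-POWER6 layout */
#define SDSYNC_POWER6   (1ull << 2)   /* SIAR/SDAR in sync, POWER6 layout     */

/* Flag names from the commit message; the values are assumed. */
#define PPMU_LIMITED_PMC5_6  1u       /* previous meaning of the 0/1 field    */
#define PPMU_ALT_SIPR        2u       /* processor uses POWER6 bit positions  */

static_assert(SDSYNC_OLD != SDSYNC_POWER6, "the two layouts must differ");

/* Value to report in the PERF_RECORD_ADDR subrecord. */
uint64_t data_addr(uint64_t mmcra, uint64_t sdar, unsigned int pmu_flags)
{
    uint64_t sync = (pmu_flags & PPMU_ALT_SIPR) ? SDSYNC_POWER6 : SDSYNC_OLD;

    if (!(mmcra & SAMPLE_ENABLE))
        return sdar;                    /* no sampling: SDAR as-is, no guarantee */
    return (mmcra & sync) ? sdar : 0;   /* sampling: only if SDAR matches SIAR   */
}
```

Turning the old 0/1 field into a flags word is what lets one back-end carry both `PPMU_LIMITED_PMC5_6` and `PPMU_ALT_SIPR` at once.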
     

29 Apr, 2009

1 commit

  • POWER5+ and POWER6 have two hardware counters with limited functionality:
    PMC5 counts instructions completed in run state and PMC6 counts cycles
    in run state. (Run state is the state when a hardware RUN bit is 1;
    the idle task clears RUN while waiting for work to do and sets it when
    there is work to do.)

    These counters can't be written to by the kernel, can't generate
    interrupts, and don't obey the freeze conditions. That means we can
    only use them for per-task counters (where we know we'll always be in
    run state; we can't put a per-task counter on an idle task), and only
    if we don't want interrupts and we do want to count in all processor
    modes.

    Obviously some counters can't go on a limited hardware counter, but there
    are also situations where we can only put a counter on a limited hardware
    counter - if there are already counters on the PMU that exclude some processor
    modes and we want to put on a per-task cycle or instruction counter that
    doesn't exclude any processor mode, it could go on if it can use a
    limited hardware counter.

    To keep track of these constraints, this adds a flags argument to the
    processor-specific get_alternatives() functions, with three bits defined:
    one to say that we can accept alternative event codes that go on limited
    counters, one to say we only want alternatives on limited counters, and
    one to say that this is a per-task counter and therefore events that are
    gated by run state are equivalent to those that aren't (e.g. a "cycles"
    event is equivalent to a "cycles in run state" event). These flags
    are computed for each counter and stored in the counter->hw.counter_base
    field (slightly wonky name for what it does, but it was an existing
    unused field).

    Since the limited counters don't freeze when we freeze the other counters,
    we need some special handling to avoid getting skew between things counted
    on the limited counters and those counted on normal counters. To minimize
    this skew, if we are using any limited counters, we read PMC5 and PMC6
    immediately after setting and clearing the freeze bit. This is done in
    a single asm in the new write_mmcr0() function.

    The code here is specific to PMC5 and PMC6 being the limited hardware
    counters. Being more general (e.g. having a bitmap of limited hardware
    counter numbers) would have meant more complex code to read the limited
    counters when freezing and unfreezing the normal counters, with
    conditional branches, which would have increased the skew. Since it
    isn't necessary for the code to be more general at this stage, it isn't.

    This also extends the back-ends for POWER5+ and POWER6 to be able to
    handle up to 6 counters rather than the 4 they previously handled.

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: Robert Richter
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
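
A simplified sketch of how the three flag bits described above might be derived for one counter. The flag names, their values, the `counter_req` structure, and the exact conditions are illustrative assumptions based on the prose of the commit message, not a copy of the kernel logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits for get_alternatives(); names and values assumed. */
#define LIMITED_PMC_OK    1u  /* alternatives on limited counters are acceptable  */
#define LIMITED_PMC_REQD  2u  /* only alternatives on limited counters are wanted */
#define ONLY_COUNT_RUN    4u  /* per-task: run-state-gated events are equivalent  */

struct counter_req {
    bool per_task;         /* attached to a task rather than a CPU      */
    bool wants_interrupt;  /* needs overflow interrupts                 */
    bool excludes_mode;    /* excludes some processor mode              */
};

/* Limited counters cannot interrupt and ignore the freeze conditions,
 * so only per-task, non-interrupting, count-everywhere requests qualify. */
bool can_go_on_limited(const struct counter_req *c)
{
    assert(c != NULL);
    return c->per_task && !c->wants_interrupt && !c->excludes_mode;
}

/* others_exclude_modes: counters already on the PMU exclude some mode,
 * so a count-everywhere request only fits on a limited counter. */
unsigned int alternatives_flags(const struct counter_req *c,
                                bool others_exclude_modes)
{
    unsigned int flags = 0;

    assert(c != NULL);
    if (c->per_task)
        flags |= ONLY_COUNT_RUN;
    if (can_go_on_limited(c)) {
        flags |= LIMITED_PMC_OK;
        if (others_exclude_modes)
            flags |= LIMITED_PMC_REQD;
    }
    return flags;
}
```

A per-task cycle counter with no exclusions would get `ONLY_COUNT_RUN | LIMITED_PMC_OK`, plus `LIMITED_PMC_REQD` when the counters already scheduled exclude some mode.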
     

08 Apr, 2009

1 commit

  • Impact: enable access to hardware feature

    POWER processors have the ability to "mark" a subset of the instructions
    and provide more detailed information on what happens to the marked
    instructions as they flow through the pipeline. This marking is
    enabled by the "sample enable" bit in MMCRA, and there are
    synchronization requirements around setting and clearing the bit.

    This adds logic to the processor-specific back-ends so that they know
    which events relate to marked instructions and set the sampling enable
    bit if any event that we want to put on the PMU is a marked instruction
    event. It also adds logic to the generic powerpc code to do the
    necessary synchronization if that bit is set.

    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
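
The rule described above (set the sample enable bit if any event headed for the PMU is a marked-instruction event) reduces to a scan over the event list. In this sketch the bit position is a placeholder, the `struct event` layout and `compute_mmcra` are invented, and the required synchronization around changing the bit is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MMCRA_SAMPLE_ENABLE  (1ull << 0)   /* placeholder bit position */

struct event {
    uint64_t code;     /* raw event code (values in examples are made up)    */
    bool     marked;   /* back-end's verdict: is this a marked-instruction event? */
};

/* Return the MMCRA value to program: sampling on if any event is marked,
 * off otherwise. Other bits of `mmcra` are passed through unchanged. */
uint64_t compute_mmcra(const struct event *ev, size_t n, uint64_t mmcra)
{
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].marked)
            return mmcra | MMCRA_SAMPLE_ENABLE;
    }
    return mmcra & ~MMCRA_SAMPLE_ENABLE;
}
```

In the commit, deciding which events are marked is the job of each processor-specific back-end, while the generic powerpc code handles the synchronization when the bit is set.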
     

10 Jan, 2009

1 commit