13 Mar, 2010

2 commits


12 Mar, 2010

8 commits

  • Newt is widely available and provides a rather simple API, as
    can be seen from the size of this patch.

    The work needed to support it will benefit other frontends too.

    In this initial patch it just checks whether the output is a
    tty; if not, it falls back to the previous behaviour. The
    previous behaviour is also maintained if newt-devel/libnewt-dev
    is not installed.
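
    As a rough sketch, the fallback decision could look like this
    (hypothetical use_tui flag and HAVE_NEWT guard; not the actual
    perf code):

        #include <stdbool.h>
        #include <unistd.h>

        static bool use_tui;

        /* Decide once at startup whether the newt TUI can be used. */
        static void setup_browser(void)
        {
        #ifdef HAVE_NEWT
                /* Only switch to the TUI when stdout is a real terminal. */
                use_tui = isatty(STDOUT_FILENO);
        #else
                /* newt-devel/libnewt-dev not installed: keep stdio output. */
                use_tui = false;
        #endif
        }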

    Pressing enter on a symbol will annotate it, ESC in the
    annotation window will return to the report symbol list.

    More work will be done to remove the special casing in
    color_fprintf, stop using fmemopen/FILE in the printing of
    hist_entries, etc.

    Also, the annotation doesn't need to be done by spawning "perf
    annotate" and then browsing its output; we can do better by
    calling the builtin-annotate.c functions directly, which would
    then be moved to tools/perf/util/annotate.c and shared with perf
    top, etc.

    But let's go in baby steps; this patch already improves perf
    usability by allowing symbols to be annotated quickly from the
    report screen, and provides a first experiment with libnewt/TUI
    integration in the tools.

    Tested on RHEL5 and Fedora12 X86_64 and on Debian PARISC64 to
    browse a perf.data file collected on a Fedora12 x86_64 box.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Avi Kivity
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • We need those to properly size the browser width in the newt
    TUI.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Just like we do for pr_debug, so that we have a single point
    from which to redirect to the currently used output system, be
    it stdio or newt.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Will be used by the newt code too.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Merge reason: We want to queue up a dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • [acme@mica linux-2.6-tip]$ perf record -a -f
    Fatal: Permission error - are you root?
    Consider tweaking /proc/sys/kernel/perf_event_paranoid.

    [acme@mica linux-2.6-tip]$

    Suggested-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Fixing this symptom:

    [acme@mica linux-2.6-tip]$ perf record -a -f
    Fatal: Permission error - are you root?

    Bus error
    [acme@mica linux-2.6-tip]$

    I.e. if for some reason no data is collected - in this case a
    non-root user trying to do system-wide profiling - we end up
    trying to mmap a zero-sized file and access the file header,
    b00m.
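
    A minimal sketch of the kind of guard this calls for, with
    hypothetical names (illustrative, not the actual perf code):

        #include <stdio.h>
        #include <sys/stat.h>

        /* Refuse to map an empty perf.data instead of crashing on it. */
        static int check_perf_data(int fd)
        {
                struct stat st;

                if (fstat(fd, &st) < 0)
                        return -1;
                if (st.st_size == 0) {
                        fprintf(stderr, "perf.data is empty - no samples?\n");
                        return -1;
                }
                return 0;
        }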

    Reported-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

11 Mar, 2010

8 commits

  • This patch is an optimization in perf_event_task_sched_in() to avoid
    scheduling the events twice in a row.

    Without it, the perf_disable()/perf_enable() pair is invoked twice:
    pinned events keep counting while the flexible events are being
    scheduled, and we go through hw_perf_enable() twice.

    By encapsulating the whole sequence in a single perf_disable()/
    perf_enable() pair, we ensure hw_perf_enable() is invoked only once,
    thanks to the refcount protection.
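
    A runnable toy illustration of that refcount protection
    (stand-in functions, not the kernel implementation):

        #include <stdio.h>

        static int disable_count;

        static void perf_disable(void) { disable_count++; }

        static void perf_enable(void)
        {
                /* The real hw_perf_enable() only runs when the count
                 * drops back to zero. */
                if (--disable_count == 0)
                        printf("hw_perf_enable()\n");
        }

        int main(void)
        {
                perf_disable();   /* outer pair around the sequence  */
                perf_disable();   /* schedule pinned events...       */
                perf_enable();    /* ...no hw_perf_enable() yet      */
                perf_disable();   /* schedule flexible events...     */
                perf_enable();
                perf_enable();    /* hw_perf_enable() runs once here */
                return 0;
        }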

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    eranian@google.com
     
  • Export perf_trace_regs and perf_arch_fetch_caller_regs since
    modules will use these.

    Signed-off-by: Xiao Guangrong
    [ use EXPORT_PER_CPU_SYMBOL_GPL() ]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • What happens is that we schedule badly like:

    -1987 [019] 280.252808: x86_pmu_start: event-46/1300c0: idx: 0
    -1987 [019] 280.252811: x86_pmu_start: event-47/1300c0: idx: 1
    -1987 [019] 280.252812: x86_pmu_start: event-48/1300c0: idx: 2
    -1987 [019] 280.252813: x86_pmu_start: event-49/1300c0: idx: 3
    -1987 [019] 280.252814: x86_pmu_start: event-50/1300c0: idx: 32
    -1987 [019] 280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
    -1987 [019] 280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
    -1987 [019] 280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
    -1987 [019] 280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
    -1987 [019] 280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
    -1987 [019] 280.252834: x86_pmu_start: event-47/1300c0: idx: 1
    -1987 [019] 280.252834: x86_pmu_start: event-48/1300c0: idx: 2
    -1987 [019] 280.252835: x86_pmu_start: event-49/1300c0: idx: 3
    -1987 [019] 280.252836: x86_pmu_start: event-50/1300c0: idx: 32
    -1987 [019] 280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*

    This happens because we only iterate the n_running events in the first
    pass, and reset their index to -1 if they don't match to force a
    re-assignment.

    Now, in our RR example, n_running == 0 because we fully unscheduled, so
    event-50 will retain its idx==32, even though in scheduling it will have
    gotten idx=0, and we don't trigger the re-assign path.

    The easiest way to fix this is the below patch, which simply validates
    the full assignment in the second pass.
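
    A toy illustration of that second-pass validation (made-up
    indices loosely following the trace above):

        #include <stdio.h>

        int main(void)
        {
                int cur[5]    = { 1, 2, 3, 32, -1 };  /* stale hwc->idx */
                int assign[5] = { 1, 2, 3, 0, 32 };   /* fresh schedule */
                int i;

                /* Validate the full assignment, so event-50 (stale
                 * idx 32, newly assigned 0) gets reprogrammed instead
                 * of silently keeping its old counter. */
                for (i = 0; i < 5; i++)
                        if (cur[i] != assign[i])
                                printf("reassign event %d: %d -> %d\n",
                                       i, cur[i], assign[i]);
                return 0;
        }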

    Reported-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Fix:

    arch/powerpc/kernel/perf_event.c:1334: error: 'power_pmu_notifier' undeclared (first use in this function)
    arch/powerpc/kernel/perf_event.c:1334: error: (Each undeclared identifier is reported only once
    arch/powerpc/kernel/perf_event.c:1334: error: for each function it appears in.)
    arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'power_pmu_notifier'
    arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'register_cpu_notifier'

    Due to commit 3f6da390 (perf: Rework and fix the arch CPU-hotplug hooks).

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Without this change, the install path is relative to
    prefix/DESTDIR where prefix is automatically set to $HOME.

    This can produce unexpected results. For example:

    make -C tools/perf DESTDIR=/home/jkacur/tmp install-man

    creates the directory: /home/jkacur/home/jkacur/tmp/share/...
    instead of the expected: /home/jkacur/tmp/share/...

    Signed-off-by: John Kacur
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Tom Zanussi
    Cc: Kyle McMartin
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    John Kacur
     
  • From: Ananth N Mavinakayanahalli

    When freeing the instruction slot, the arithmetic to calculate
    the index of the slot in the page needs to account for the total
    size of the instruction on the various architectures.

    Calculate the index correctly when freeing the out-of-line
    execution slot.
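
    A minimal sketch of the corrected arithmetic, with a
    hypothetical helper (the slot size is arch-dependent):

        #include <stdio.h>

        /* The slot index must divide the byte offset into the page by
         * the full per-slot size, not by the opcode size alone. */
        static int slot_index(char *slot, char *page, unsigned int slot_size)
        {
                return (int)((slot - page) / slot_size);
        }

        int main(void)
        {
                static char page[4096];

                /* e.g. 16-byte slots: byte offset 32 is slot 2 */
                printf("%d\n", slot_index(page + 32, page, 16));
                return 0;
        }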

    Reported-by: Sachin Sant
    Reported-by: Heiko Carstens
    Signed-off-by: Ananth N Mavinakayanahalli
    Signed-off-by: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • At present, the perf subcommands that do system-wide monitoring
    (perf stat, perf record and perf top) don't work properly unless
    the online cpus are numbered 0, 1, ..., N-1. These tools ask
    for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
    and then try to create events for cpus 0, 1, ..., N-1.

    This creates problems for systems where the online cpus are
    numbered sparsely. For example, a POWER6 system in
    single-threaded mode (i.e. only running 1 hardware thread per
    core) will have only even-numbered cpus online.

    This fixes the problem by reading the /sys/devices/system/cpu/online
    file to find out which cpus are online. The code that does that is in
    tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
    function that sets up a cpumap[] array and returns the number of
    online cpus. If /sys/devices/system/cpu/online can't be read or
    can't be parsed successfully, it falls back to using sysconf to
    ask how many cpus are online and sets up an identity map in cpumap[].

    The perf record, perf stat and perf top code then calls
    read_cpu_map() in the system-wide monitoring case (instead of
    sysconf) and uses cpumap[] to get the cpu numbers to pass to
    perf_event_open.
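
    A runnable sketch of the parsing idea, assuming the usual
    "0,2,4-6" range format of the online file (hypothetical code,
    not the actual cpumap.c):

        #include <stdio.h>
        #include <unistd.h>

        static int cpumap[4096];    /* assumed upper bound on cpus */

        /* Fill cpumap[] with the online cpu numbers and return how
         * many there are; fall back to an identity map via sysconf. */
        static int read_cpu_map(void)
        {
                FILE *f = fopen("/sys/devices/system/cpu/online", "r");
                int start, end, sep, nr = 0;

                while (f && fscanf(f, "%d", &start) == 1) {
                        end = start;
                        sep = fgetc(f);
                        if (sep == '-') {
                                if (fscanf(f, "%d", &end) != 1)
                                        break;
                                sep = fgetc(f);
                        }
                        while (start <= end)
                                cpumap[nr++] = start++;
                        if (sep != ',')
                                break;
                }
                if (f)
                        fclose(f);
                if (nr == 0) {  /* unreadable or unparseable */
                        int n = (int)sysconf(_SC_NPROCESSORS_ONLN);

                        for (nr = 0; nr < n; nr++)
                                cpumap[nr] = nr;
                }
                return nr;
        }

        int main(void)
        {
                int i, nr = read_cpu_map();

                for (i = 0; i < nr; i++)
                        printf("cpu %d\n", cpumap[i]);
                return 0;
        }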

    Signed-off-by: Paul Mackerras
    Cc: Anton Blanchard
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Anton Blanchard found that he could reliably make the kernel hit a
    BUG_ON in the slab allocator by taking a cpu offline and then online
    while a system-wide perf record session was running.

    The reason is that when the cpu comes up, we completely reinitialize
    the ctx field of the struct perf_cpu_context for the cpu. If there is
    a system-wide perf record session running, then there will be a struct
    perf_event that has a reference to the context, so its refcount will
    be 2. (The perf_event has been removed from the context's group_entry
    and event_entry lists by perf_event_exit_cpu(), but that doesn't
    remove the perf_event's reference to the context and doesn't decrement
    the context's refcount.)

    When the cpu comes up, perf_event_init_cpu() gets called, and it calls
    __perf_event_init_context() on the cpu's context. That resets the
    refcount to 1. Then when the perf record session finishes and the
    perf_event is closed, the refcount gets decremented to 0 and the
    context gets kfreed after an RCU grace period. Since the context
    wasn't kmalloced -- it's part of a per-cpu variable -- bad things
    happen.

    In fact we don't need to completely reinitialize the context when the
    cpu comes up. It's sufficient to initialize the context once at boot,
    but we need to do it for all possible cpus.

    This moves the context initialization to happen at boot time. With
    this, we don't trash the refcount and the context never gets kfreed,
    and we don't hit the BUG_ON.
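
    A runnable toy model of the refcount trashing described above
    (illustrative only):

        #include <assert.h>

        struct ctx { int refcount; };

        static struct ctx percpu_ctx;  /* per-cpu storage, never kmalloc'ed */

        static void init_ctx(struct ctx *c) { c->refcount = 1; }
        static void get_ctx(struct ctx *c)  { c->refcount++; }

        static void put_ctx(struct ctx *c)
        {
                c->refcount--;
                /* Reaching zero here would kfree() memory that was
                 * never kmalloc'ed - the reported BUG_ON. */
                assert(c->refcount > 0);
        }

        int main(void)
        {
                init_ctx(&percpu_ctx);   /* boot-time init               */
                get_ctx(&percpu_ctx);    /* system-wide event: ref = 2   */
                init_ctx(&percpu_ctx);   /* buggy hotplug re-init: ref=1 */
                put_ctx(&percpu_ctx);    /* event closed: 1 -> 0, b00m   */
                return 0;
        }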

    Reported-by: Anton Blanchard
    Signed-off-by: Paul Mackerras
    Tested-by: Anton Blanchard
    Acked-by: Peter Zijlstra
    Cc:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     

10 Mar, 2010

22 commits

  • Drop the obsolete "profile" naming used by perf for trace events.
    Perf can now do more than simple event counting, so generalize
    the API naming.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron

    Frederic Weisbecker
     
  • We are taking a wrong regs snapshot when a trace event triggers.
    Either we use get_irq_regs(), which gives us the interrupted
    registers if we are in an interrupt, or we use task_pt_regs(),
    which gives us the state before we entered the kernel, assuming
    we are lucky enough not to be in a kernel thread; for a kernel
    thread, task_pt_regs() returns the initial set of regs from when
    the thread was started.

    What we want is different. We need a hot snapshot of the regs,
    so that we can get the instruction pointer to record in the
    sample, the frame pointer for the callchain, and some other
    things.

    Let's use the new perf_fetch_caller_regs() for that.

    Comparison with perf record -e lock: -R -a -f -g
    Before:

    perf [kernel] [k] __do_softirq
    |
    --- __do_softirq
    |
    |--55.16%-- __open
    |
    --44.84%-- __write_nocancel

    After:

    perf [kernel] [k] perf_tp_event
    |
    --- perf_tp_event
    |
    |--41.07%-- lock_acquire
    | |
    | |--39.36%-- _raw_spin_lock
    | | |
    | | |--7.81%-- hrtimer_interrupt
    | | | smp_apic_timer_interrupt
    | | | apic_timer_interrupt

    The old case was producing unreliable callchains. Now that we
    have the right frame and instruction pointers, we get the trace
    we want.

    Also, syscall and kprobe events already have the right regs;
    let's use them instead of wasting a retrieval.

    v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
     
  • Events that trigger overflows by interrupting a context can
    use get_irq_regs() or task_pt_regs() to retrieve the state
    when the event triggered. But this is not the case for some
    other classes of events, like trace events, as tracepoints are
    executed in the same context as the code that triggered
    the event.

    It means we need a different API to capture the regs there;
    namely we need a hot snapshot to get the most important
    information for perf: the instruction pointer to get the
    event origin, the frame pointer for the callchain, the code
    segment for user_mode() tests (we always use __KERNEL_CS as
    trace events always occur from the kernel) and the eflags
    for further purposes.

    v2: rename perf_save_regs to perf_fetch_caller_regs as per
    Masami's suggestion.
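
    A rough user-space analogue of such a hot snapshot, using GCC
    builtins (illustrative only, not the kernel API):

        #include <stdio.h>

        struct regs_snap {
                void *ip;   /* where the event fired           */
                void *fp;   /* frame pointer for the callchain */
        };

        /* Inlined so the "caller" really is the event site; note that
         * __builtin_frame_address(1) is best-effort, like any
         * frame-pointer walk. */
        static inline void fetch_caller_regs(struct regs_snap *r)
        {
                r->ip = __builtin_return_address(0);
                r->fp = __builtin_frame_address(1);
        }

        static void trace_event(void)
        {
                struct regs_snap r;

                fetch_caller_regs(&r);
                printf("sample: ip=%p fp=%p\n", r.ip, r.fp);
        }

        int main(void)
        {
                trace_event();
                return 0;
        }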

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
     
  • We were using the frame pointer based stack walker on every
    context in x86-32, but not in x86-64, where we only use the
    seven-league boots on the exception stacks.

    Use it also on irq and process stacks. This greatly accelerates
    the captures.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • There are RCU read-side locked areas in the path where we submit
    a trace event, and these rcu_read_(un)lock() calls trigger lock
    events, which create recursive events.

    One pair in do_perf_sw_event:

    __lock_acquire
    |
    |--96.11%-- lock_acquire
    | |
    | |--27.21%-- do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--49.62%-- ftrace_profile_lock_release
    | | | lock_release
    | | | |
    | | | |--33.85%-- _raw_spin_unlock

    Another pair in perf_output_begin/end:

    __lock_acquire
    |--23.40%-- perf_output_begin
    | | __perf_event_overflow
    | | perf_swevent_overflow
    | | perf_swevent_add
    | | perf_swevent_ctx_event
    | | do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--55.37%-- ftrace_profile_lock_acquire
    | | | lock_acquire
    | | | |
    | | | |--37.31%-- _raw_spin_lock

    The problem is not so much the trace recursion itself, as we have
    a recursion protection already (though it's always wasteful to
    recurse). But the trace events are outside the lockdep recursion
    protection, so each lockdep event triggers a lock trace, which in
    turn triggers two further lockdep events. Here the recursive lock
    trace event won't be taken because of the trace recursion, so the
    recursion stops there, but lockdep will still analyse these new
    events.

    To sum up, for each lockdep event we have:

    lock_*()
    |
    trace lock_acquire
    |
    ----- rcu_read_lock()
    | |
    | lock_acquire()
    | |
    | trace_lock_acquire() (stopped)
    | |
    | lockdep analyze
    |
    ----- rcu_read_unlock()
    |
    lock_release
    |
    trace_lock_release() (stopped)
    |
    lockdep analyze

    And you can repeat the above twice, as we have two rcu read-side
    sections when we submit an event.

    This is fixed in this patch by moving the lock trace event under
    the lockdep recursion protection.
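
    A runnable toy of the recursion-guard pattern (simplified names,
    not the actual lockdep code):

        #include <stdio.h>

        static __thread int lockdep_recursion;

        static void trace_lock_acquire(const char *name)
        {
                /* The real path takes rcu_read_lock() here - itself a
                 * lock event, but harmless now that the guard is set. */
                printf("trace: acquire %s\n", name);
        }

        static void lock_acquire(const char *name)
        {
                if (lockdep_recursion)
                        return;               /* ignore our own events  */

                lockdep_recursion = 1;
                trace_lock_acquire(name);     /* moved inside the guard */
                /* ... lockdep analysis would run here ... */
                lockdep_recursion = 0;
        }

        int main(void)
        {
                lock_acquire("some_lock");
                return 0;
        }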

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Hitoshi Mitake
    Cc: Li Zefan
    Cc: Lai Jiangshan
    Cc: Masami Hiramatsu
    Cc: Jens Axboe

    Frederic Weisbecker
     
    If -vv is used, just the map table will be printed; -vvv will
    print the symbol table too. With it we can see that we have a
    bug where some samples are not being resolved to a map when we
    get them in the perf.data stream, but after we have it all
    processed we can find the right map; some reordering is probably
    happening.

    Upcoming patches will provide ways to ask for most PERF_SAMPLE_
    conditional samples to be taken for !PERF_RECORD_SAMPLE events
    too, then we'll be able to ask for PERF_SAMPLE_TIME and
    PERF_SAMPLE_CPU to help diagnose this.

    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Perf report does not handle multiple events being reported, even
    though perf record stores them properly on disk. This patch
    addresses that issue by adding the logic to perf report to use
    the event stream id that is saved by record, and the new data
    structures to separate the event streams and report them
    individually.

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Now that report can store histograms for multiple events, we
    need to be able to do the post-processing work for each
    histogram. This patch changes the post-processing functions so
    that they can be called individually for each event's histogram.

    Signed-off-by: Eric B Munson
    [ Guarantee bisectabilty by fixing up builtin-report.c ]
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • This patch adds the structures necessary to count each event
    type independently in perf report.

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • In order to minimize the impact of storing multiple events in a
    report, this function will now take the root of the histogram
    tree, so that the logic for selecting the proper tree can be
    inserted before the call.

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Currently perf record does not write the event ID to disk, so
    report cannot tell if an event stream contains one or more
    types of events. This patch adds this entry to the list of data
    that record will write to disk if more than one event was
    requested.
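
    A minimal sketch of the idea using the perf_event_attr API (the
    surrounding code is hypothetical):

        #include <linux/perf_event.h>

        void setup_sample_type(struct perf_event_attr *attr,
                               int nr_counters)
        {
                attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;

                /* With more than one event, also record the sample ID
                 * so report can demultiplex the streams. */
                if (nr_counters > 1)
                        attr->sample_type |= PERF_SAMPLE_ID;
        }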

    Signed-off-by: Eric B Munson
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • cc1: warnings being treated as errors
    util/probe-finder.c: In function 'find_line_range':
    util/probe-finder.c:172: warning: 'src' may be used uninitialized in this function
    make: *** [util/probe-finder.o] Error 1

    Signed-off-by: Arnaldo Carvalho de Melo
    Acked-by: Masami Hiramatsu
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Cc: David S. Miller
    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Fix typo. But the modularization here is ugly and should be improved.

    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The PEBS+LBR decoding magic needs the insn_get_length()
    infrastructure to be able to decode x86 instruction lengths.

    So split it out of the KPROBES dependency and enable it when
    either KPROBES or PERF_EVENTS is enabled.

    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Don't decrement the TOS twice...

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Bring the core handler in line with the nhm one; also make sure
    we always drain the buffer.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We don't need the checking_{wr,rd}msr() calls, since we should
    know what cpu we're running on and should not blindly poke at
    MSRs.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • If we reset the LBR on each first counter, simple counter rotation,
    which first deschedules all counters and then reschedules the new
    ones, will lead to an LBR reset, even though we're still in the
    same task context.

    Reduce this by not flushing on the first counter but only flushing
    on different task contexts.
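
    A runnable toy sketch of the idea (simplified names, not the
    exact patch):

        #include <stdio.h>

        static void *lbr_context;   /* context the LBR data belongs to */

        static void lbr_enable(void *task_ctx)
        {
                /* Flush only when the task context really changed,
                 * not on every counter rotation. */
                if (task_ctx != lbr_context) {
                        printf("LBR reset\n");
                        lbr_context = task_ctx;
                }
        }

        int main(void)
        {
                int ctx_a, ctx_b;

                lbr_enable(&ctx_a);   /* first use: reset              */
                lbr_enable(&ctx_a);   /* rotation, same task: no reset */
                lbr_enable(&ctx_b);   /* real context switch: reset    */
                return 0;
        }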

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We need to use the actual cpuc->pebs_enabled value, not a local
    copy, for the changes to take effect.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • It's unclear whether the PEBS state record will have only a single
    bit set; in case it does not and accumulates bits, deal with that
    by only processing each event once.

    Also, robustify some of the code.
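
    A toy of the once-only bit processing (illustrative, not the
    actual PEBS handler):

        #include <stdint.h>
        #include <stdio.h>

        static void drain_pebs(uint64_t status)
        {
                int bit;

                /* Even if the hardware accumulated several bits in one
                 * record, consume each set bit exactly once. */
                for (bit = 0; bit < 64; bit++) {
                        if (!(status & (1ULL << bit)))
                                continue;
                        status &= ~(1ULL << bit);
                        printf("handle PEBS event on counter %d\n", bit);
                }
        }

        int main(void)
        {
                drain_pebs(0x5);   /* counters 0 and 2 accumulated */
                return 0;
        }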

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The documentation says we have to enable PEBS before we enable the PMU
    proper.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra