16 Jul, 2011

7 commits

  • Since dwarf_bitsize, dwarf_bitoffset and dwarf_bytesize are defined
    in libdw, we no longer need die_get_bit_size, die_get_bit_offset and
    die_get_byte_size.

    Signed-off-by: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072721.6528.2747.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Since strtailcmp() is generic enough, it should be defined in string.c.
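
    For reference, a tail-comparison helper of this kind is a small,
    self-contained function; the sketch below mirrors the behaviour the name
    suggests (comparing two strings from their last characters backwards),
    though it is not necessarily byte-for-byte the version moved by this patch:

        #include <stdio.h>
        #include <string.h>

        /* Compare the tails of two strings: walk backwards from the end of
         * each string and return the first difference, or 0 on a suffix match. */
        static int strtailcmp(const char *s1, const char *s2)
        {
                int i1 = (int)strlen(s1);
                int i2 = (int)strlen(s2);

                while (--i1 >= 0 && --i2 >= 0) {
                        if (s1[i1] != s2[i2])
                                return s1[i1] - s2[i2];
                }
                return 0;
        }

        int main(void)
        {
                /* "util/string.c" and "string.c" share the same tail, so 0 */
                printf("%d\n", strtailcmp("util/string.c", "string.c"));
                return 0;
        }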

    Signed-off-by: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072715.6528.10677.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Since the die_find/walk* callbacks use DIE_FIND_CB_FOUND for
    both the failed and found cases, it should be "END"
    instead of "FOUND" to avoid confusion.
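
    As a rough illustration of the naming (the constant values below are
    illustrative only, not taken from the patch):

        /* Sketch: callback return codes for the DIE walkers after the rename.
         * A single "END" code covers both "found it" and "gave up". */
        enum die_find_cb_ret {
                DIE_FIND_CB_END = 0,    /* stop walking (found or failed) */
                DIE_FIND_CB_CHILD,      /* search children of this DIE */
                DIE_FIND_CB_SIBLING,    /* search siblings of this DIE */
        };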

    Signed-off-by: Masami Hiramatsu
    Reported-by: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/20110627072709.6528.45706.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Since the address of a module-local variable can only be
    resolved after the target module is loaded, the symbol
    fetch-argument should be updated when the target module
    is loaded.

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • To support probing module init functions, kprobe-tracer allows
    the user to define a probe on a function that does not exist yet,
    as long as a module name is given. This also lets the user set a
    probe on a function in a specific module, even if a different
    function with the same name is locally defined in another module.

    The module name must come before the function name, separated
    by a ':', e.g. btrfs:btrfs_init_sysfs.
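
    A hedged usage sketch of that syntax, writing a probe definition into the
    kprobe_events file from a small C program (the mount point and probe name
    are only examples):

        #include <stdio.h>

        int main(void)
        {
                /* assumes debugfs is mounted at /sys/kernel/debug */
                FILE *f = fopen("/sys/kernel/debug/tracing/kprobe_events", "w");

                if (!f) {
                        perror("kprobe_events");
                        return 1;
                }
                /* module name in front of the function name, separated by ':' */
                fprintf(f, "p:myprobe btrfs:btrfs_init_sysfs\n");
                fclose(f);
                return 0;
        }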

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Return -ENOENT if the probe point doesn't exist, but still return
    -EINVAL if both kprobe->addr and kprobe->symbol_name are
    specified, or if neither is specified.
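
    A minimal kernel-module sketch of how a caller could now tell the two
    failures apart (the symbol name and messages are illustrative only):

        #include <linux/module.h>
        #include <linux/kprobes.h>

        static struct kprobe kp = {
                .symbol_name = "this_function_does_not_exist",
        };

        static int __init probe_check_init(void)
        {
                int ret = register_kprobe(&kp);

                if (ret == -ENOENT)
                        pr_info("probe point does not exist\n");
                else if (ret == -EINVAL)
                        pr_info("addr/symbol_name both set or both unset\n");
                return ret;
        }

        static void __exit probe_check_exit(void)
        {
                unregister_kprobe(&kp);
        }

        module_init(probe_check_init);
        module_exit(probe_check_exit);
        MODULE_LICENSE("GPL");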

    Acked-by: Ananth N Mavinakayanahalli
    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Merge redundant enable/disable functions into enable_trace_probe()
    and disable_trace_probe().

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15

    [ converted kprobe selftest to use enable_trace_probe ]

    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     

15 Jul, 2011

4 commits

  • Rename probe_* to trace_probe_* to avoid namespace
    conflicts. This also fixes the improper names find_probe_event()
    and cleanup_all_probes(), renaming them to find_trace_probe() and
    release_all_trace_probes() respectively.

    Signed-off-by: Masami Hiramatsu
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Instead of the hw_nmi_watchdog_set_attr() weak function
    and the corresponding x86_pmu::hw_watchdog_set_attr() call,
    we introduce an event alias mechanism which allows us
    to drop these routines completely and isolate the quirks
    of the Netburst architecture inside the P4 PMU code only.

    The main idea remains the same though -- to allow the
    nmi-watchdog and perf top to run simultaneously.

    Note the aliasing mechanism applies only to the generic
    PERF_COUNT_HW_CPU_CYCLES event, because an arbitrary
    event (say, one passed as RAW initially) might have some
    additional bits set inside the ESCR register, changing
    the behaviour of the event, so we can no longer guarantee
    that the alias event will give the same result.

    P.S. A huge thanks to Don and Steven for testing
    and early review.

    Acked-by: Don Zickus
    Tested-by: Steven Rostedt
    Signed-off-by: Cyrill Gorcunov
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Stephane Eranian
    CC: Lin Ming
    CC: Arnaldo Carvalho de Melo
    CC: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
    Signed-off-by: Steven Rostedt

    Cyrill Gorcunov
     
  • Currently the stack trace per event in ftrace is only 8 frames.
    This can be quite limiting and sometimes useless, especially when
    the "ignore frames" count is wrong and we also use up stack frames for
    the event processing itself.

    Change this to be dynamic by adding a percpu buffer that we can
    write a large stack frame into and then copy into the ring buffer.

    Interrupts and NMIs that come in while another event is being
    processed will only get to use the 8-frame stack. That should be enough,
    as the task that was interrupted will have the full stack frame anyway.
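
    The shape of that approach, as a rough sketch (the buffer size and all
    names below are made up, not the patch's code):

        #include <linux/percpu.h>
        #include <linux/stacktrace.h>

        #define DEEP_STACK_ENTRIES 512          /* illustrative size */

        struct deep_stack {
                unsigned long entries[DEEP_STACK_ENTRIES];
        };

        static DEFINE_PER_CPU(struct deep_stack, deep_stack_buf);

        /* Capture a deep trace into the per-cpu scratch buffer; preemption is
         * assumed to be disabled by the tracing code, and the caller copies
         * the result into the ring buffer before re-enabling it. */
        static unsigned int capture_deep_stack(unsigned long **entries)
        {
                struct deep_stack *ds = this_cpu_ptr(&deep_stack_buf);
                struct stack_trace trace = {
                        .max_entries    = DEEP_STACK_ENTRIES,
                        .entries        = ds->entries,
                };

                save_stack_trace(&trace);
                *entries = ds->entries;
                return trace.nr_entries;
        }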

    Requested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • While attempting to create a timechart of boot up I found perf didn't
    tolerate modules being loaded/unloaded. This patch fixes this by
    reading the file once and then writing the size read at the correct
    point in the file. It also simplifies the code somewhat.

    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Sonny Rao
    Signed-off-by: Michael Neuling
    Link: http://lkml.kernel.org/r/10011.1310614483@neuling.org
    Signed-off-by: Steven Rostedt

    Sonny Rao
     

14 Jul, 2011

3 commits

  • Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST will
    fail the dynamic ftrace selftest.

    The function tracer has a quick 'off' variable that will prevent
    the callback functions from being called. This variable is called
    function_trace_stop. In x86, this is implemented directly in the mcount
    assembly, but for other archs, an intermediate function called
    ftrace_test_stop_func() is used.

    In dynamic ftrace, the function pointer variable ftrace_trace_function is
    used to update the caller code in the mcount caller. But for archs that
    do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
    ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.
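
    For reference, the intermediate hook is essentially a one-line check
    around the real callback; roughly (a simplified sketch, not a verbatim
    copy of kernel/trace/ftrace.c):

        extern int function_trace_stop;
        extern void (*__ftrace_trace_function)(unsigned long ip,
                                               unsigned long parent_ip);

        static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
        {
                /* the quick 'off' switch checked on every traced call */
                if (function_trace_stop)
                        return;

                __ftrace_trace_function(ip, parent_ip);
        }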

    When more than one ftrace_ops is registered, the function it calls is
    ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
    and call the callbacks that have their hash matching.

    The issue happens when two ftrace_ops are registered for different functions
    and one is then unregistered. The __ftrace_trace_function is then pointed
    to the remaining ftrace_ops callback function directly. This means it will
    be called for all functions that were registered to trace by both ftrace_ops
    that were registered.

    This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
    because the update of ftrace_trace_function doesn't happen until after all
    functions have been updated, and then the mcount caller is updated. But
    for those archs that do use the ftrace_test_stop_func(), the update is
    immediate.

    The dynamic selftest fails because it hits this situation, and the
    ftrace_ops that it registers fails to only trace what it was supposed to
    and instead traces all other functions.

    The solution is to delay the setting of __ftrace_trace_function until
    after all the functions have been updated according to the registered
    ftrace_ops. Also, function_trace_stop is set during the update to prevent
    function tracing from calling code that is caused by the function tracer
    itself.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, if set_ftrace_filter() is called when the ftrace_ops is
    active, the function filters will not be updated. They will only be updated
    when tracing is disabled and re-enabled.

    Update the functions immediately during set_ftrace_filter().

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Whenever the hash of the ftrace_ops is updated, the record counts
    must be balanced. This requires disabling the records that are set
    in the original hash, and then enabling the records that are set
    in the updated hash.

    Moving the update into ftrace_hash_move() removes the bug where the
    hash was updated but the records were not. That bug caused ftrace to
    trigger a warning and disable itself when the ftrace_ops filter was
    updated while the ftrace_ops was registered, with the failure showing
    up when the ftrace_ops was unregistered.

    The current code will not trigger this bug, but new code will.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jul, 2011

3 commits

  • When I mounted an NFS directory, it caused several modules to be loaded. At the
    time I was running the preemptirqsoff tracer, and it showed the following
    output:

    # tracer: preemptirqsoff
    #
    # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
    # --------------------------------------------------------------------
    # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    # => started at: ftrace_module_notify
    # => ended at: ftrace_module_notify
    #
    #
    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| /_--=> lock-depth
    # |||||/ delay
    # cmd pid |||||| time | caller
    # \ / |||||| \ | /
    modprobe-19370 3d.... 0us!: ftrace_process_locs
    => ftrace_process_locs
    => ftrace_module_notify
    => notifier_call_chain
    => __blocking_notifier_call_chain
    => blocking_notifier_call_chain
    => sys_init_module
    => system_call_fastpath

    That's over 1ms that interrupts are disabled on a Real-Time kernel!

    Looking at the cause (being the ftrace author helped), I found that the
    interrupts are disabled before the code modification of mcounts into nops. The
    interrupts only need to be disabled on start up around this code, not when
    modules are being loaded.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If a function is set to be traced by set_graph_function, but the
    option funcgraph-irqs is zero, and the traced function happens to be
    called from an interrupt, it will not be traced.

    The point of funcgraph-irqs is to not trace interrupts when we are
    preempted by an irq, not to skip functions we want to trace that
    happen to be *in* an irq.

    Luckily the current->trace_recursion element is a perfect place to add
    a flag that lets us trace functions within an interrupt even when
    we are not tracing interrupts that preempt the trace.
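
    A hedged sketch of the idea (the bit position and helper names are
    invented for illustration, not taken from the patch):

        #include <linux/sched.h>

        /* Hypothetical bit in current->trace_recursion marking "trace this
         * task's selected functions even when inside an irq". */
        #define TRACE_IRQ_GRAPH_BIT     10

        static inline void set_graph_in_irq(struct task_struct *t)
        {
                t->trace_recursion |= 1UL << TRACE_IRQ_GRAPH_BIT;
        }

        static inline bool graph_in_irq_allowed(struct task_struct *t)
        {
                return t->trace_recursion & (1UL << TRACE_IRQ_GRAPH_BIT);
        }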

    Reported-by: Heiko Carstens
    Tested-by: Heiko Carstens
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • I triggered a triple fault with gcc 4.5.1 because it did not
    honor the inline annotation on the arch_local_save_flags() function,
    and that function was added to the pool of functions traced by
    the function tracer.

    When preempt_schedule() called arch_local_save_flags() (called
    by irqs_disabled()), it was traced, but the first thing the
    function tracer does is disable preemption. When it enables
    preemption, the NEED_RESCHED flag will not have been cleared and
    the preemption check will trigger the call to preempt_schedule()
    again.

    Although the dynamic function tracer crashed immediately, the
    static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
    not set) was actually able to show where the problem was:

    swapper-1     3.N..  103885us : arch_local_save_flags

    The fix is for this file to include the notrace tag.

    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

05 Jul, 2011

2 commits

  • …rostedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • Add an option to perf report/annotate/script to specify which
    CPUs to operate on. This enables us to take a single system wide
    profile and analyse each CPU (or group of CPUs) in isolation.

    This was useful when profiling a multiprocess workload where the
    bottleneck was on one CPU but this was hidden in the overall
    profile. Per process and per thread breakdowns didn't help
    because multiple processes were running on each CPU and no
    single process consumed an entire CPU.

    The patch converts the list of CPUs returned by cpu_map__new
    into a bitmap for fast lookup. I wanted to use -C to be
    consistent with perf top/record/stat, but unfortunately perf
    report already uses -C.

    v2: Incorporate suggestions from David Ahern:
    - Added -c to perf script
    - Check that SAMPLE_CPU is set when -c is used
    - Update documentation

    v3: Create perf_session__cpu_bitmap()
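
    To illustrate the bitmap idea in isolation (sizes and helper names below
    are made up, not perf's internals):

        #include <stdbool.h>
        #include <stdio.h>

        #define MAX_NR_CPUS     4096
        #define BITS_PER_LONG   (8 * sizeof(unsigned long))

        static unsigned long cpu_bitmap[MAX_NR_CPUS / BITS_PER_LONG];

        static void cpu_bitmap_set(int cpu)
        {
                cpu_bitmap[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
        }

        static bool cpu_bitmap_test(int cpu)
        {
                return cpu_bitmap[cpu / BITS_PER_LONG] &
                       (1UL << (cpu % BITS_PER_LONG));
        }

        int main(void)
        {
                int cpu;

                /* e.g. the user passed "-c 0,2": mark those CPUs ... */
                cpu_bitmap_set(0);
                cpu_bitmap_set(2);

                /* ... then each sample is kept only if its CPU bit is set */
                for (cpu = 0; cpu < 4; cpu++)
                        printf("cpu %d: %s\n", cpu,
                               cpu_bitmap_test(cpu) ? "keep" : "skip");
                return 0;
        }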

    Signed-off-by: Anton Blanchard
    Acked-by: David Ahern
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

04 Jul, 2011

2 commits


03 Jul, 2011

6 commits

  • rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
    in order to restore it later in ret_from_intr.

    It is convenient because we save its value in the irq regs
    and it's easily restored using the leave instruction.

    However this is a kind of abuse of the frame pointer, whose
    role is to help unwind the kernel by chaining frames
    together, each node following the return address to the
    previous frame.

    But when we break the frame by changing the stack
    pointer, there is no preceding return address before the new
    frame. Hence using the frame pointer to link the two stacks
    breaks the stack unwinders, which find a random value instead of
    a return address here.

    There is no workaround that can work in every case. We use
    the fixup_bp_irq_link() function to dereference that abused frame
    pointer in the case of a non-nesting interrupt (which means the stack
    changed).
    But that doesn't fix the case of interrupts that don't change the
    stack (but still have the unconditional frame link), which is
    the case of a hardirq interrupting a softirq. We have no way to detect
    this transition, so the frame irq link is treated as a real frame
    pointer and the return address is dereferenced, but it is still a
    spurious one.

    There are two possible results of this: either the spurious return
    address, a random stack value, luckily belongs to the kernel text
    and then the unwinding can continue and we just have a weird entry
    in the stack trace. Or it doesn't belong to the kernel text and
    unwinding stops there.

    This is the reason why stacktraces (including perf callchains) on
    irqs that interrupted softirqs don't work very well.

    To solve this, we no longer save the old stack pointer in rbp;
    instead we save it in a scratch register that we push onto the new
    stack and pop back later on irq return.

    This preserves the whole frame chain without spurious return addresses
    in the middle and drops the need for the horrid fixup_bp_irq_link()
    workaround.

    And finally irqs that interrupt softirqs are sanely unwound.

    Before:

    99.81% perf [kernel.kallsyms] [k] perf_pending_event
    |
    --- perf_pending_event
    irq_work_run
    smp_irq_work_interrupt
    irq_work_interrupt
    |
    |--41.60%-- __read
    | |
    | |--99.90%-- create_worker
    | | bench_sched_messaging
    | | cmd_bench
    | | run_builtin
    | | main
    | | __libc_start_main
    | --0.10%-- [...]

    After:

    1.64% swapper [kernel.kallsyms] [k] perf_pending_event
    |
    --- perf_pending_event
    irq_work_run
    smp_irq_work_interrupt
    irq_work_interrupt
    |
    |--95.00%-- arch_irq_work_raise
    | irq_work_queue
    | __perf_event_overflow
    | perf_swevent_overflow
    | perf_swevent_event
    | perf_tp_event
    | perf_trace_softirq
    | __do_softirq
    | call_softirq
    | do_softirq
    | irq_exit
    | |
    | |--73.68%-- smp_apic_timer_interrupt
    | | apic_timer_interrupt
    | | |
    | | |--96.43%-- amd_e400_idle
    | | | cpu_idle
    | | | start_secondary

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jan Beulich

    Frederic Weisbecker
     
  • The unwinder backlink in the interrupt entry is useless:
    it is not actually part of the stack frame chain and thus is
    never used.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jan Beulich

    Frederic Weisbecker
     
  • Just for clarity in the code. Have a first block that handles
    the frame pointer and a separate one that handles pt_regs
    pointer and its use.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jan Beulich

    Frederic Weisbecker
     
  • The save_regs function that saves the regs on low-level
    irq entry is complicated by the fact that it changes
    its stack in the middle, and also that it manipulates
    data allocated in the caller frame, with accesses there
    calculated directly from the callee rsp value and the
    return address sitting in the middle of the way.

    This complicates the static stack offset calculations and
    requires more dynamic ones. It also needs a save/restore
    of the function's return address.

    To simplify and optimize this, turn save_regs() into a
    macro.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jan Beulich

    Frederic Weisbecker
     
  • When regs are passed to dump_stack(), we fetch the frame
    pointer from the regs but the stack pointer is taken from
    the current frame.

    Thus the frame and stack pointers may not come from the same
    context. For example, this can cause the unwinder to
    think the context is in an irq, due to the current value of
    the stack, while the frame pointer coming from the regs points
    to a frame from another place. It then tries to fix up
    the irq link but ends up dereferencing a random frame
    pointer that doesn't belong to the irq stack:

    [ 9131.706906] ------------[ cut here ]------------
    [ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
    [ 9131.707003] Hardware name: AMD690VM-FMH
    [ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
    [ 9131.707003] Modules linked in:
    [ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
    [ 9131.707003] Call Trace:
    [ 9131.707003] [] warn_slowpath_common+0x7a/0xb0
    [ 9131.707003] [] warn_slowpath_fmt+0x41/0x50
    [ 9131.707003] [] ? bad_to_user+0x6d/0x10be
    [ 9131.707003] [] dump_trace+0x2aa/0x330
    [ 9131.707003] [] ? native_sched_clock+0x13/0x50
    [ 9131.707003] [] perf_callchain_kernel+0x54/0x70
    [ 9131.707003] [] perf_prepare_sample+0x19f/0x2a0
    [ 9131.707003] [] __perf_event_overflow+0x16c/0x290
    [ 9131.707003] [] ? __perf_event_overflow+0x130/0x290
    [ 9131.707003] [] ? native_sched_clock+0x13/0x50
    [ 9131.707003] [] ? sched_clock+0x9/0x10
    [ 9131.707003] [] ? T.375+0x15/0x90
    [ 9131.707003] [] ? trace_hardirqs_on_caller+0x64/0x180
    [ 9131.707003] [] ? trace_hardirqs_off+0xd/0x10
    [ 9131.707003] [] perf_event_overflow+0x14/0x20
    [ 9131.707003] [] perf_swevent_hrtimer+0x11c/0x130
    [ 9131.707003] [] ? error_exit+0x51/0xb0
    [ 9131.707003] [] __run_hrtimer+0x83/0x1e0
    [ 9131.707003] [] ? perf_event_overflow+0x20/0x20
    [ 9131.707003] [] hrtimer_interrupt+0x106/0x250
    [ 9131.707003] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [ 9131.707003] [] smp_apic_timer_interrupt+0x53/0x90
    [ 9131.707003] [] apic_timer_interrupt+0x13/0x20
    [ 9131.707003] [] ? error_exit+0x51/0xb0
    [ 9131.707003] [] ? error_exit+0x4c/0xb0
    [ 9131.707003] ---[ end trace b2560d4876709347 ]---

    Fix this by simply taking the stack pointer from regs->sp
    when regs are provided.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • To prepare for dump_trace() fetching the stack pointer from the
    regs when possible, instead of taking the local one, save the
    current stack pointer when perf saves the live regs.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     

01 Jul, 2011

13 commits

  • The patch a8b0ca17b80e ("perf: Remove the nmi parameter from the swevent
    and overflow interface") missed a spot in the ppc hw_breakpoint code,
    fix this up so things compile again.

    Reported-by: Ingo Molnar
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-09pfip95g88s70iwkxu6nnbt@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Previously, when you asked perf-stat to output the statistics in
    csv mode, no noise information was printed out.

    For example right now we output this --repeat information:

    ./perf stat -r3 -x, sleep 1
    1.164789,task-clock
    8,context-switches
    0,CPU-migrations
    219,page-faults
    3337800,cycles

    With this patch, the output will be appended with an additional
    entry for the noise value:

    ./perf stat -r3 -x, sleep 1
    1.164789,task-clock,3.75%
    8,context-switches,75.00%
    0,CPU-migrations,100.00%
    219,page-faults,0.00%
    3337800,cycles,3.36%

    Signed-off-by: Zhengyu He
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: Venkatesh Pallipadi
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com
    Signed-off-by: Ingo Molnar

    Zhengyu He
     
  • …ic/random-tracing into perf/core

    Ingo Molnar
     
  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • The v1 PMU does not have any fixed counters. Using the v2 constraints,
    which do have fixed counters, causes an additional choice to be present
    in the weight calculation, but not when actually scheduling the event,
    leading to an event not being scheduled at all.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.
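
    A hedged kernel-side sketch of the resulting calling convention (the attr
    values and all names are illustrative, and error handling is elided):

        #include <linux/perf_event.h>

        struct watcher {
                int id;
        };

        static struct watcher my_watcher = { .id = 1 };

        static void my_overflow(struct perf_event *event,
                                struct perf_sample_data *data,
                                struct pt_regs *regs)
        {
                /* the context passed at creation time comes back here */
                struct watcher *w = event->overflow_handler_context;

                pr_info("overflow for watcher %d\n", w->id);
        }

        static struct perf_event *create_cycle_counter(int cpu)
        {
                struct perf_event_attr attr = {
                        .type           = PERF_TYPE_HARDWARE,
                        .config         = PERF_COUNT_HW_CPU_CYCLES,
                        .size           = sizeof(attr),
                        .sample_period  = 1000000,
                };

                return perf_event_create_kernel_counter(&attr, cpu, NULL,
                                                        my_overflow, &my_watcher);
        }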

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • Add a NODE level to the generic cache events, which is used to measure
    local vs. remote memory accesses. Like all other cache events, an
    ACCESS is HIT+MISS; if there is no way to distinguish between reads
    and writes, do reads only, etc.

    The below needs filling out for !x86 (which I filled out with
    unsupported events).

    I'm fairly sure ARM can leave it like that since it doesn't strike me as
    an architecture that even has NUMA support. SH might have something since
    it does appear to have some NUMA bits.

    Sparc64, PowerPC and MIPS certainly want a good look there since they
    clearly are NUMA capable.
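
    As an illustration, such an event would be requested with the usual
    generic cache-event encoding (cache id | op << 8 | result << 16),
    assuming the new NODE id is exported in the perf_event header:

        #include <linux/perf_event.h>

        /* local-vs-remote memory read misses as a generic cache event */
        static struct perf_event_attr node_read_miss_attr = {
                .type   = PERF_TYPE_HW_CACHE,
                .size   = sizeof(struct perf_event_attr),
                .config = PERF_COUNT_HW_CACHE_NODE |
                          (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                          (PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
        };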

    Signed-off-by: Peter Zijlstra
    Cc: David Miller
    Cc: Anton Blanchard
    Cc: David Daney
    Cc: Deng-Cheng Zhu
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Robert Richter
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since the OFFCORE registers are fully symmetric, try the other one
    when the specified one is already in use.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This patch adds Intel Sandy Bridge offcore_response support by
    providing the low-level constraint table for those events.

    On Sandy Bridge, there are two offcore_response events. Each uses
    its own dedicated extra register. But unlike on Nehalem/Westmere, those
    registers are NOT shared between sibling CPUs when HT is on. They are
    always private to each CPU. But they still need to be controlled within
    an event group. All events within an event group must use the same
    value for the extra MSR. That's not controlled by the second patch in
    this series.

    Furthermore, on Sandy Bridge the offcore_response events have NO
    counter constraints, contrary to what the official documentation
    indicates, so drop the events from the constraint table.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110606145712.GA7304@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • The validate_group() function needs to validate events with
    extra shared regs. Within an event group, only events with
    the same value for the extra reg can co-exist. This was not
    checked by validate_group() because it was missing the
    shared_regs logic.

    This patch changes the allocation of the fake cpuc used for
    validation to also point to a fake shared_regs structure, so
    that group events are properly tested.

    It modifies __intel_shared_reg_get_constraints() to use
    spin_lock_irqsave() to avoid lockdep issues.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110606145708.GA7279@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch improves the code managing the extra shared registers
    used for offcore_response events on Intel Nehalem/Westmere. The
    idea is to use static allocation instead of dynamic allocation.
    This greatly simplifies the get and put constraint routines for
    those events.

    The patch also renames per_core to shared_regs because the same
    data structure gets used whether or not HT is on. When HT is
    off, those events still need coordination because they use
    an extra MSR that has to be shared within an event group.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Since only samples call perf_output_sample() it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated whether we could do wakeups from the current
    context; if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra