21 Jan, 2010

1 commit

  • Anton reported that perf record kept receiving events even after calling
    ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK, COMM and MMAP
    events didn't respect the disabled state and kept flowing in.

    Reported-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    Tested-by: Anton Blanchard
    LKML-Reference:
    CC: stable@kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

01 Jan, 2010

1 commit


31 Dec, 2009

1 commit

  • Liming found a NULL deref when a task has a perf context but no
    counters when it forks.

    This can occur in two cases: a race during construction, where
    the fork hits after installing the context but before the first
    counter gets inserted, or, more reproducibly, a fork after the
    last counter is closed (which leaves the context around).

    Reported-by: Wang Liming
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    CC:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Dec, 2009

1 commit

  • It seems a couple places such as arch/ia64/kernel/perfmon.c and
    drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile()
    instead of a private pseudo-fs + alloc_file(), if only there were a way
    to get a read-only file. So provide this by having anon_inode_getfile()
    create a read-only file if we pass O_RDONLY in flags.

    Signed-off-by: Roland Dreier
    Signed-off-by: Al Viro

    Roland Dreier
     

20 Dec, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf session: Make events_stats u64 to avoid overflow on 32-bit arches
    hw-breakpoints: Fix hardware breakpoints -> perf events dependency
    perf events: Dont report side-band events on each cpu for per-task-per-cpu events
    perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
    perf events, x86/stacktrace: Make stack walking optional
    perf events: Remove unused perf_counter.h header file
    perf probe: Check new event name
    kprobe-tracer: Check new event/group name
    perf probe: Check whether debugfs path is correct
    perf probe: Fix libdwarf include path for Debian

    Linus Torvalds
     

17 Dec, 2009

4 commits

  • Acme noticed that his FORK/MMAP numbers were inflated by about
    the same factor as his cpu-count.

    This led to the discovery of a few more sites that need to
    respect the event->cpu filter.

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Also, we want to check against nr_cpu_ids, not num_possible_cpus().
    The latter works, but the correct bounds check is < nr_cpu_ids.

    Signed-off-by: Rusty Russell
    To: Thomas Gleixner

    Rusty Russell
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits)
    perf record: Use per-task-per-cpu events for inherited events
    perf record: Properly synchronize child creation
    perf events: Allow per-task-per-cpu counters
    perf diff: Percent calcs should use double values
    perf diff: Change the default sort order to "dso,symbol"
    perf diff: Use perf_session__fprintf_hists just like 'perf record'
    perf report: Fix cut'n'paste error recently introduced
    perf session: Move perf report specific hits out of perf_session__fprintf_hists
    perf tools: Move hist entries printing routines from perf report
    perf report: Generalize perf_session__fprintf_hists()
    perf symbols: Move symbol filtering to event__preprocess_sample()
    perf symbols: Adopt the strlists for dso, comm
    perf symbols: Make symbol_conf global
    perf probe: Fix to show which probe point is not found
    perf probe: Check symbols in symtab/kallsyms
    perf probe: Check build-id of vmlinux
    perf probe: Reject second attempt of adding same-name event
    perf probe: Support event name for --add option
    perf probe: Add glob matching support on --del
    perf probe: Use strlist__for_each macros in probe-event.c
    ...

    Linus Torvalds
     
  • In order to allow for per-task-per-cpu counters, useful for
    scalability when profiling task hierarchies, we allow installing
    events with event->cpu != -1 in task contexts.

    __perf_event_sched_in() already skips events where ->cpu
    mis-matches the current cpu, fix up __perf_install_in_context()
    and __perf_event_enable() to also respect this filter.

    This does lead to very hard to interpret enabled/running times
    for such counters, but I don't see a simple solution for that.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: fweisbec@gmail.com
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Dec, 2009

1 commit

  • The misalignment of bp_addr created a 32-bit hole, causing
    different structure packings on 32- and 64-bit machines.

    Fix that by moving __reserve_2 into that hole.

    Further, remove the useless struct and redundant __bp_reserve
    muck.

    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Dec, 2009

2 commits

  • It is quite legitimate for CPUs to be numbered sparsely, meaning
    that it is possible for an online CPU to have a number which is
    greater than the total count of possible CPUs.

    Currently find_get_context() has a sanity check on the cpu
    number where it checks it against num_possible_cpus(). This
    test can fail for a legitimate cpu number if the
    cpu_possible_mask is sparsely populated.

    This fixes the problem by checking the CPU number against
    nr_cpumask_bits instead, since that is the appropriate check to
    ensure that the cpu number is safe to pass to cpu_isset()
    subsequently.

    Reported-by: Michael Neuling
    Signed-off-by: Paul Mackerras
    Tested-by: Michael Neuling
    Acked-by: Peter Zijlstra
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Convert locks which cannot be sleeping locks in preempt-rt to
    raw_spinlocks.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Acked-by: Ingo Molnar

    Thomas Gleixner
     

12 Dec, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
    x86, perf events: Check if we have APIC enabled
    perf_event: Fix variable initialization in other codepaths
    perf kmem: Fix unused argument build warning
    perf symbols: perf_header__read_build_ids() offset'n'size should be u64
    perf symbols: dsos__read_build_ids() should read both user and kernel buildids
    perf tools: Align long options which have no short forms
    perf kmem: Show usage if no option is specified
    sched: Mark sched_clock() as notrace
    perf sched: Add max delay time snapshot
    perf tools: Correct size given to memset
    perf_event: Fix perf_swevent_hrtimer() variable initialization
    perf sched: Fix for getting task's execution time
    tracing/kprobes: Fix field creation's bad error handling
    perf_event: Cleanup for cpu_clock_perf_event_update()
    perf_event: Allocate children's perf_event_ctxp at the right time
    perf_event: Clean up __perf_event_init_context()
    hw-breakpoints: Modify breakpoints without unregistering them
    perf probe: Update perf-probe document
    perf probe: Support --del option
    trace-kprobe: Support delete probe syntax
    ...

    Linus Torvalds
     

11 Dec, 2009

1 commit


10 Dec, 2009

1 commit

  • fix:

    [] ? printk+0x1d/0x24
    [] ? perf_prepare_sample+0x269/0x280
    [] warn_slowpath_common+0x71/0xd0
    [] ? perf_prepare_sample+0x269/0x280
    [] warn_slowpath_null+0x1a/0x20
    [] perf_prepare_sample+0x269/0x280
    [] ? cpu_clock+0x53/0x90
    [] __perf_event_overflow+0x2a8/0x300
    [] perf_event_overflow+0x1b/0x30
    [] perf_swevent_hrtimer+0x7f/0x120

    This is because the 'data.raw' variable is not initialized.

    Signed-off-by: Xiao Guangrong
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     

09 Dec, 2009

4 commits

  • Use atomic64_xchg() instead of a separate atomic64_read() and
    atomic64_set().

    Signed-off-by: Xiao Guangrong
    Reviewed-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • In the current code, a child task allocates memory for
    'child->perf_event_ctxp' whenever the parent is counted; we need
    to do so only if the parent allows its children to inherit it.

    This saves memory and reduces overhead.

    Signed-off-by: Xiao Guangrong
    Reviewed-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • Clean up the code a bit:

    - define 'perf_cpu_context' variable with 'static'
    - use kzalloc() instead of kmalloc() and memset()

    Signed-off-by: Xiao Guangrong
    Reviewed-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • Currently, when ptrace needs to modify a breakpoint, like disabling
    it or changing its address, type or len, it calls
    modify_user_hw_breakpoint(). The latter performs the heavy and
    racy task of unregistering the old breakpoint and registering a
    new one.

    This is racy as someone else might steal the reserved breakpoint
    slot under us, which is undesired as the breakpoint is only
    supposed to be modified, sometimes in the middle of a debugging
    workflow. We don't want our slot to be stolen in the middle.

    So instead of unregistering/registering the breakpoint, just
    disable it while we modify its breakpoint fields and re-enable it
    after if necessary.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Prasad
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

08 Dec, 2009

1 commit


06 Dec, 2009

1 commit

  • The struct perf_event::event callback was called when a breakpoint
    triggered. But this is a rather opaque callback, tied only
    to the breakpoint API and not really integrated into perf,
    as it triggers even when we don't overflow.

    We prefer to use overflow_handler() as it fits into the perf events
    rules, being called only when we overflow.

    Reported-by: Peter Zijlstra
    Signed-off-by: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: "K. Prasad"

    Frederic Weisbecker
     

04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

02 Dec, 2009

1 commit

  • In the CONFIG_PERF_USE_VMALLOC case, perf_mmap_data_free() only
    schedules the cleanup of the perf_mmap_data struct. In that
    case we have to wait until the work has been done before we free
    data.

    Signed-off-by: Kristian Høgsberg
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Kristian Høgsberg
     

01 Dec, 2009

1 commit

  • In the current code in perf_swevent_hrtimer(), data.period is not
    initialized, so the result is obviously wrong:

    # ./perf record -f -e cpu-clock make
    # ./perf report
    # Samples: 1740
    #
    # Overhead Command ......
    # ........ ........ ..........................................
    #
    1025422183050275328.00% sh libc-2.9.90.so ...
    1025422183050275328.00% perl libperl.so ...
    1025422168240043264.00% perl [kernel] ...
    1025422030011210752.00% perl [kernel] ...

    Signed-off-by: Xiao Guangrong
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     

27 Nov, 2009

1 commit

  • When a pinned group cannot be scheduled it goes into error state.

    Normally a group cannot go out of error state without being
    explicitly re-enabled or disabled. There was a bug in per-thread
    mode, whereby upon termination of the thread, the group would
    transition from error to off leading to bogus counts and timing
    information returned by read().

    Fix it by clearing the error state.

    Signed-off-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: perfmon2-devel@lists.sourceforge.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

26 Nov, 2009

2 commits

  • bp_perf_event_destroy() is unused in its off-case version, let's
    remove it to fix the following warning reported by Stephen
    Rothwell in linux-next:

    kernel/perf_event.c:4306: warning: 'bp_perf_event_destroy' defined but not used

    Reported-by: Stephen Rothwell
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • On failure, perf_event_create_kernel_counter() returns NULL
    instead of an error, which doesn't help the outermost callers
    inform the user about the origin of the problem. Often we could
    just return -EINVAL, but that doesn't help anyone when the
    failure is actually a memory allocation failure.

    This patch therefore makes perf_event_create_kernel_counter()
    always return a detailed error code.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Prasad
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

25 Nov, 2009

1 commit

  • Commit 4ed7c92d68a5387ba5f7030dc76eab03558e27f5
    (perf_events: Undo some recursion damage) introduced bad
    reference counting of the recursion context: putting the context
    behaved like getting it, dropping every software/trace event
    after the first one in a context.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Arjan van de Ven
    Cc: Li Zefan
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

24 Nov, 2009

2 commits

  • When using an event group, the value and id for non-leader events
    were wrong due to an invalid offset into the outgoing buffer.

    Signed-off-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: perfmon2-devel@lists.sourceforge.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Add the remaining necessary bits to support breakpoints created
    through the perf syscall.

    We don't use the software counter interface as:

    - We don't need to check against recursion; this is already done
    at the hardware breakpoints arch level.

    - We already know the perf event we are dealing with when the
    event is to be committed.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Prasad
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

23 Nov, 2009

8 commits

  • It is quite possible to call update_event_times() on a context
    that isn't actually running and thereby confuse the thing.

    perf stat was reporting !100% scale values for software counters
    (2e2af50b perf_events: Disable events when we detach them,
    solved the worst of that, but there was still some left).

    The thing that happens is that because we are not self-reaping
    (we have a caring parent) there is a time between the last
    schedule (out) and having do_exit() called which will detach the
    events.

    This period would be accounted as enabled,!running because the
    event->state==INACTIVE, even though !event->ctx->is_active.

    Similar issues could have been observed by calling read() on an
    event while the attached task was not scheduled in.

    Solve this by teaching update_event_times() about
    ctx->is_active.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Make perf_swevent_get_recursion_context return a context number
    and disable preemption.

    This could be used to remove the IRQ disable from the trace bit
    and to index the per-cpu buffer with it.

    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Move the update_event_times() call in __perf_event_exit_task()
    into list_del_event() because that holds the proper lock
    (ctx->lock) and seems a more natural place to do the last time
    update.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • It appeared we did call update_event_times() on exit, but we
    failed to update the context time, which renders the former
    moot.

    Locking is a bit iffy, we call update_event_times under
    ctx->mutex instead of ctx->lock - the next patch fixes this.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • If we leave the event in STATE_INACTIVE, any read of the event
    after the detach will increase the running count but not the
    enabled count and cause funny scaling artefacts.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We had two almost identical functions, avoid the duplication.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The structure init creates a big memset, which shows
    up big time in perf annotate output:

    : ffffffff810a859d :
    1.68 : ffffffff810a859d: 55 push %rbp
    1.69 : ffffffff810a859e: 41 89 fa mov %edi,%r10d
    0.01 : ffffffff810a85a1: 49 89 c9 mov %rcx,%r9
    0.00 : ffffffff810a85a4: 31 c0 xor %eax,%eax
    1.71 : ffffffff810a85a6: b9 16 00 00 00 mov $0x16,%ecx
    0.00 : ffffffff810a85ab: 48 89 e5 mov %rsp,%rbp
    0.00 : ffffffff810a85ae: 48 83 ec 60 sub $0x60,%rsp
    1.52 : ffffffff810a85b2: 48 8d 7d a0 lea -0x60(%rbp),%rdi
    85.20 : ffffffff810a85b6: f3 ab rep stos %eax,%es:(%rdi)

    None of the callees depends on the structure being pre-initialized,
    so only initialize ->addr. This gets rid of the memset overhead.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 Nov, 2009

2 commits

  • Fix:

    ERROR: "perf_swevent_put_recursion_context" [fs/ext4/ext4.ko] undefined!
    ERROR: "perf_swevent_get_recursion_context" [fs/ext4/ext4.ko] undefined!

    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The buffer is first zeroed out by memset(), then strncpy() is
    used to fill in the content. The strncpy() function also pads the
    string to the end of the specified length, which is redundant,
    and it does not ensure that the string is properly
    NUL-terminated. Use strlcpy() instead.

    The semantic match that finds this kind of pattern is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    expression buffer;
    expression size;
    expression str;
    @@
    memset(buffer, 0, size);
    ...
    - strncpy(
    + strlcpy(
    buffer, str, sizeof(buffer)
    );
    @@
    expression buffer;
    expression size;
    expression str;
    @@
    memset(&buffer, 0, size);
    ...
    - strncpy(
    + strlcpy(
    &buffer, str, sizeof(buffer));
    @@
    expression buffer;
    identifier field;
    expression size;
    expression str;
    @@
    memset(buffer, 0, size);
    ...
    - strncpy(
    + strlcpy(
    buffer->field, str, sizeof(buffer->field)
    );
    @@
    expression buffer;
    identifier field;
    expression size;
    expression str;
    @@
    memset(&buffer, 0, size);
    ...
    - strncpy(
    + strlcpy(
    buffer.field, str, sizeof(buffer.field));
    //

    On strncpy() vs strlcpy() see
    http://www.gratisoft.us/todd/papers/strlcpy.html .

    Signed-off-by: Márton Németh
    Cc: Julia Lawall
    Cc: cocci@diku.dk
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Márton Németh