09 Feb, 2013

12 commits

  • uprobe_perf_open/close call the costly uprobe_apply() every time,
    we can avoid it if:

    - "nr_systemwide != 0" is not changed.

    - There is another process/thread with the same ->mm.

    - copy_process() does inherit_event(). dup_mmap() preserves the
    inserted breakpoints.

    - event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
    called by exec/mmap paths.

    - tp_target is exiting. Only _close() checks PF_EXITING, I don't
    think TRACE_REG_PERF_OPEN can hit the dying task too often.
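
    The open-side conditions listed above can be read as one predicate. What
    follows is only a user-space sketch under stated assumptions, not the
    kernel code: struct open_event, struct open_filter and
    can_skip_apply_on_open() are hypothetical names, and the flat fields
    stand in for the perf_event and filter state. The PF_EXITING check on
    the close side is left out.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the perf_event fields the commit mentions. */
    struct open_event {
        const void *target_mm;   /* ->mm of tp_target, NULL for a system-wide event */
        bool has_parent;         /* created by inherit_event() during copy_process() */
        bool enable_on_exec;     /* models event->attr.enable_on_exec */
    };

    /* Hypothetical stand-in for the per-uprobe filter state. */
    struct open_filter {
        int nr_systemwide;                  /* system-wide events already attached */
        const struct open_event *events;    /* per-task events already attached */
        size_t nr_events;
    };

    /* True if an attached event already targets a task sharing this ->mm. */
    static bool mm_already_traced(const struct open_filter *f, const void *mm)
    {
        for (size_t i = 0; i < f->nr_events; i++)
            if (f->events[i].target_mm == mm)
                return true;
        return false;
    }

    /* Returns true when opening 'ev' needs no uprobe_apply() call. */
    static bool can_skip_apply_on_open(const struct open_filter *f,
                                       const struct open_event *ev)
    {
        if (ev->target_mm == NULL)          /* system-wide event */
            return f->nr_systemwide != 0;   /* "!= 0" stays true, nothing changes */

        return f->nr_systemwide != 0 ||     /* breakpoints are already everywhere */
               ev->has_parent ||            /* dup_mmap() kept the breakpoints */
               ev->enable_on_exec ||        /* exec/mmap paths will install them */
               mm_already_traced(f, ev->target_mm);
    }
    ```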

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     
  • Change uprobe_trace_func() and uprobe_perf_func() to return "int". Change
    uprobe_dispatcher() to return "trace_ret | perf_ret", although this is not
    needed yet: currently TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive.

    The only functional change is that uprobe_perf_func() checks the filtering
    too and returns UPROBE_HANDLER_REMOVE if nobody wants to trace current.

    Testing:

    # perf probe -x /lib/libc.so.6 syscall

    # perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'

    # perf report --show-total-period
    100.00% 10 perl libc-2.8.so [.] syscall

    Before this patch:

    # cat /sys/kernel/debug/tracing/uprobe_profile
    /lib/libc.so.6 syscall 20

    A child process doesn't have a counter, but still it hits this breakpoint
    "copied" by dup_mmap().

    After the patch:

    # cat /sys/kernel/debug/tracing/uprobe_profile
    /lib/libc.so.6 syscall 11

    The child process hits this int3 only once and does unapply_uprobe().
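
    The two counts above (20 before, 11 after) can be reproduced with a toy
    model. This is a sketch under assumptions, not the kernel code: struct
    task_model, perf_func_model(), run_task() and total_hits() are
    hypothetical, and the numeric value of UPROBE_HANDLER_REMOVE is assumed
    for illustration.

    ```c
    #include <stdbool.h>

    #define UPROBE_HANDLER_REMOVE 1   /* value assumed for this sketch */

    struct task_model {
        bool has_counter;    /* some perf event wants to trace this task */
        bool bp_installed;   /* breakpoint is present in this task's mm */
    };

    /* Models the perf handler: ask for removal if nobody traces the task. */
    static int perf_func_model(const struct task_model *t, bool filtering)
    {
        if (filtering && !t->has_counter)
            return UPROBE_HANDLER_REMOVE;
        return 0;
    }

    /* Run 'calls' probed calls in one task, return how many hit the breakpoint. */
    static int run_task(struct task_model *t, int calls, bool filtering)
    {
        int nhit = 0;

        for (int i = 0; i < calls; i++) {
            if (!t->bp_installed)
                continue;               /* no int3, the call runs untouched */
            nhit++;
            if (perf_func_model(t, filtering) & UPROBE_HANDLER_REMOVE)
                t->bp_installed = false;   /* models unapply_uprobe() */
        }
        return nhit;
    }

    /* Parent is traced; the forked child inherits the breakpoint, not the counter. */
    static int total_hits(bool filtering)
    {
        struct task_model parent = { true, true };
        struct task_model child = { false, true };

        return run_task(&parent, 10, filtering) + run_task(&child, 10, filtering);
    }
    ```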

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     
  • Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
    ->perf_events to figure out whether we need to insert the breakpoint.

    uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
    the new perf event comes or goes away.
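
    The check described above can be sketched roughly as follows. This is a
    user-space model under assumptions: struct mm_model, struct event_node,
    struct perf_filter_state and perf_filter_model() are hypothetical names,
    and locking is left out.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    struct mm_model { int id; };            /* stands in for an address space */

    struct event_node {
        const struct mm_model *target_mm;   /* ->mm of the event's target task */
        struct event_node *next;
    };

    struct perf_filter_state {
        int nr_systemwide;                  /* system-wide events attached */
        struct event_node *perf_events;     /* per-task events attached */
    };

    /* Returns true if the breakpoint should be inserted into 'mm'. */
    static bool perf_filter_model(const struct perf_filter_state *f,
                                  const struct mm_model *mm)
    {
        if (f->nr_systemwide)
            return true;                    /* somebody traces every task */

        for (const struct event_node *e = f->perf_events; e; e = e->next)
            if (e->target_mm == mm)
                return true;                /* an event targets this mm */

        return false;
    }
    ```

    In the test run shown further down, the background perl would correspond
    to an mm with no matching event, so the filter would return false for it.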

    Note that currently this is very suboptimal:

    - uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
    heavy nop, consumer->filter() always returns F at this stage.

    As it was already discussed we need uprobe_register_only() to
    avoid the costly register_for_each_vma() when possible.

    - uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
    changes we need uprobe_apply_mm(), unapply_uprobe() is almost
    what we need.

    - uprobe_apply() can be simply avoided sometimes, see the next
    changes.

    Testing:

    # perf probe -x /lib/libc.so.6 syscall

    # perl -e 'syscall -1 while 1' &
    [1] 530

    # perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'

    # perf report --show-total-period
    100.00% 10 perl libc-2.8.so [.] syscall

    Before this patch:

    # cat /sys/kernel/debug/tracing/uprobe_profile
    /lib/libc.so.6 syscall 79291

    A huge ->nrhit == 79291 reflects the fact that the background process
    530 constantly hits this breakpoint too, even if it doesn't contribute to
    the output.

    After the patch:

    # cat /sys/kernel/debug/tracing/uprobe_profile
    /lib/libc.so.6 syscall 10

    This shows that only the target process was punished by int3.

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     
  • Introduce "struct trace_uprobe_filter" which records the "active"
    perf_event's attached to ftrace_event_call. For the start we simply
    use list_head, we can optimize this later if needed. For example, we
    do not really need to record an event with ->parent != NULL, we can
    rely on parent->child_list. And we can certainly do some optimizations
    for the case when 2 events have the same ->tp_target or tp_target->mm.

    Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
    and add/del this perf_event to the list.
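
    A minimal sketch of that bookkeeping, assuming a plain singly linked
    list instead of list_head and no locking at all (the kernel version
    protects this state with a lock). struct list_event, struct event_list,
    event_list_open() and event_list_close() are hypothetical names, and the
    system-wide counter is borrowed from the description of the later
    filter commit.

    ```c
    #include <stddef.h>

    struct list_event {
        const void *tp_target;        /* NULL models a system-wide event */
        struct list_event *next;
    };

    struct event_list {
        int nr_systemwide;
        struct list_event *perf_events;
    };

    /* Models TRACE_REG_PERF_OPEN: remember the new event. */
    static void event_list_open(struct event_list *f, struct list_event *ev)
    {
        if (ev->tp_target == NULL) {
            f->nr_systemwide++;
            return;
        }
        ev->next = f->perf_events;
        f->perf_events = ev;
    }

    /* Models TRACE_REG_PERF_CLOSE: forget the event again. */
    static void event_list_close(struct event_list *f, struct list_event *ev)
    {
        if (ev->tp_target == NULL) {
            f->nr_systemwide--;
            return;
        }
        for (struct list_event **pp = &f->perf_events; *pp; pp = &(*pp)->next) {
            if (*pp == ev) {
                *pp = ev->next;       /* unlink without a special head case */
                ev->next = NULL;
                return;
            }
        }
    }

    /* Number of per-task events currently recorded. */
    static size_t event_list_count(const struct event_list *f)
    {
        size_t n = 0;

        for (const struct list_event *e = f->perf_events; e; e = e->next)
            n++;
        return n;
    }
    ```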

    We can probably avoid any locking, but let's start with the "obviously
    correct" trace_uprobe_filter->rwlock which protects everything.

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     
  • Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().

    ->nhit counts how many times we hit the breakpoint inserted by this
    uprobe, we do not want to lose this info if uprobe was enabled by
    sys_perf_event_open().

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     
  • trace_uprobe->consumer and "struct uprobe_trace_consumer" add the
    unnecessary indirection and complicate the code for no reason.

    This patch simply embeds uprobe_consumer into "struct trace_uprobe",
    all other changes only fix the compilation errors.
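
    With the consumer embedded, a callback that only receives the consumer
    pointer can get back to the enclosing structure with pointer arithmetic
    in the style of container_of(). A user-space sketch, assuming
    hypothetical names (struct consumer_model, struct embedded_probe,
    container_of_model, dispatcher_model):

    ```c
    #include <stddef.h>

    struct consumer_model {
        int (*handler)(struct consumer_model *self);
    };

    struct embedded_probe {
        unsigned long nhit;
        struct consumer_model consumer;   /* embedded, no separate allocation */
    };

    /* Same idea as the kernel's container_of(), spelled out for user space. */
    #define container_of_model(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Gets only the consumer pointer, yet can reach the enclosing probe. */
    static int dispatcher_model(struct consumer_model *con)
    {
        struct embedded_probe *tu =
            container_of_model(con, struct embedded_probe, consumer);

        tu->nhit++;
        return 0;
    }
    ```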

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     
  • probe_event_enable/disable() check tu->consumer != NULL to avoid the
    wrong uprobe_register/unregister().

    We are going to kill this pointer and "struct uprobe_trace_consumer",
    so we add the new helper, is_trace_uprobe_enabled(), which can rely
    on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.
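
    Such a helper could look roughly like this. A sketch only: the bit
    values of the two flags and the struct flagged_probe type are
    assumptions made for illustration, and the function carries a _model
    suffix to mark it as not the kernel's.

    ```c
    #include <stdbool.h>

    /* Bit values are assumed for this sketch. */
    #define TP_FLAG_TRACE   1u
    #define TP_FLAG_PROFILE 2u

    struct flagged_probe {
        unsigned int flags;
    };

    /* Enabled means: ftrace or perf has switched this probe on. */
    static bool is_trace_uprobe_enabled_model(const struct flagged_probe *tu)
    {
        return (tu->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE)) != 0;
    }
    ```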

    Note: the current logic doesn't look optimal, it is not clear why
    TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive, we will probably
    change this later.

    Also kill the unused TP_FLAG_UPROBE.

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     
  • probe_event_enable/disable() check tu->inode != NULL at the start.
    This is ugly: if igrab() can fail, create_trace_uprobe() should not
    succeed and "postpone" the failure.

    And S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.

    Note: alloc_uprobe() should probably check igrab() != NULL as well.

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     
  • probe_event_enable() does uprobe_register() and only after that sets
    utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
    which can miss these assignments or see them out of order. Nothing
    really bad can happen, but this doesn't look clean/safe.

    And this does not allow us to use uprobe_consumer->filter() we are going
    to add, it is called by uprobe_register() and it needs utc->tu.

    Change this code to initialize everything before uprobe_register(), and
    reset tu->consumer/flags if it fails. We can't race with event_disable(),
    the caller holds event_mutex, and if we could the code would be wrong
    anyway.
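
    The ordering can be sketched as "publish the state, register, roll back
    on failure". This is a user-space model with hypothetical names (struct
    probe_state, probe_enable_model(), register_ok(), register_fail()); the
    registration step is passed in as a callback so that both outcomes can
    be exercised.

    ```c
    #include <stdbool.h>

    struct probe_state {
        unsigned int flags;
        bool consumer_ready;
    };

    /* Hypothetical registration hook: 0 on success, negative on failure. */
    typedef int (*register_fn)(struct probe_state *p);

    static int probe_enable_model(struct probe_state *p, unsigned int flag,
                                  register_fn reg)
    {
        int ret;

        /* Initialize everything first: callbacks may run during reg(). */
        p->consumer_ready = true;
        p->flags |= flag;

        ret = reg(p);
        if (ret) {
            /* Registration failed, undo the assignments made above. */
            p->flags &= ~flag;
            p->consumer_ready = false;
        }
        return ret;
    }

    /* Succeeds only if the state was already visible during registration. */
    static int register_ok(struct probe_state *p)
    {
        return (p->consumer_ready && p->flags) ? 0 : -1;
    }

    /* Always fails, to exercise the rollback path. */
    static int register_fail(struct probe_state *p)
    {
        (void)p;
        return -1;
    }
    ```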

    In fact I think uprobe_trace_consumer should die, it buys nothing but
    complicates the code. We can simply add uprobe_consumer into trace_uprobe.

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     
  • create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
    to do path_put(). We can do this right after igrab().

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     
  • Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is
    what consumer->handler() needs but uprobe_get_swbp_addr() is not
    exported.

    This also simplifies the code and makes it more consistent across
    the supported architectures. handle_swbp() becomes the only caller
    of uprobe_get_swbp_addr().
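
    A rough user-space model of the idea, assuming the x86 case where the
    trap leaves the instruction pointer one byte past the int3. The names
    (struct regs_model, swbp_addr_model(), handle_swbp_model(), record_ip())
    and the instruction size constant are assumptions for illustration;
    other architectures compute the address differently.

    ```c
    #define SWBP_INSN_SIZE_MODEL 1UL   /* assumed: one-byte x86 int3 */

    struct regs_model {
        unsigned long ip;
    };

    /* Models the address lookup: step back over the breakpoint instruction. */
    static unsigned long swbp_addr_model(const struct regs_model *regs)
    {
        return regs->ip - SWBP_INSN_SIZE_MODEL;
    }

    /* Fix up ip once, up front, so every handler sees the probed address. */
    static void handle_swbp_model(struct regs_model *regs,
                                  void (*handler)(struct regs_model *))
    {
        unsigned long bp_vaddr = swbp_addr_model(regs);

        regs->ip = bp_vaddr;
        handler(regs);
    }

    /* Example handler: records the ip value it was given. */
    static unsigned long seen_ip;

    static void record_ip(struct regs_model *regs)
    {
        seen_ip = regs->ip;
    }
    ```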

    Signed-off-by: Oleg Nesterov
    Acked-by: Ananth N Mavinakayanahalli

    Oleg Nesterov
     
  • uprobe_consumer->filter() is pointless in its current form, kill it.

    We will add it back, but with the different signature/semantics. Perhaps
    we will even re-introduce the callsite in handler_chain(), but not to
    just skip uc->handler().

    Signed-off-by: Oleg Nesterov
    Acked-by: Srikar Dronamraju

    Oleg Nesterov
     

22 Jan, 2013

1 commit

  • Without this patch, we can register a uprobe event for a directory.
    Enabling such a uprobe event would fail anyway.

    Example:
    $ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events

    However, directories cannot be valid targets for uprobe.
    Hence verify if the target is a regular file during the probe
    registration.
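
    The same kind of check can be shown in user space with stat() and
    S_ISREG(). This is only an analogue of the idea, not the kernel code,
    and is_regular_file() is a hypothetical helper name.

    ```c
    #include <stdbool.h>
    #include <sys/stat.h>

    /* Returns true only if 'path' exists and names a regular file. */
    static bool is_regular_file(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return false;            /* missing or inaccessible target */

        return S_ISREG(st.st_mode);  /* rejects directories, devices, ... */
    }
    ```

    With such a check, a target like /bin from the example above would be
    rejected when the probe is registered instead of when it is enabled.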

    Link: http://lkml.kernel.org/r/20130103004212.690763002@goodmis.org

    Cc: Namhyung Kim
    Signed-off-by: Jovi Zhang
    Acked-by: Srikar Dronamraju
    [ cleaned up whitespace and removed redundant IS_DIR() check ]
    Signed-off-by: Steven Rostedt

    Jovi Zhang
     

18 Dec, 2012

1 commit


08 Dec, 2012

1 commit


01 Nov, 2012

1 commit


25 Oct, 2012

1 commit


31 Jul, 2012

1 commit

  • Some events are of interest to more than the current task.
    For example, sched_stat_* events are of interest to the task
    which wakes up. For this reason, it would be good if such
    events were delivered to the target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belong to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, and the result can then be
    analyzed by perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

07 May, 2012

1 commit

  • Implements trace_event support for uprobes. In its current form
    it can be used to put probes at a specified offset in a file and
    dump the required registers when the code flow reaches the
    probed address.

    The following example shows how to dump the instruction pointer
    and the %ax register at the probed text address. Here we are
    trying to probe zfree in /bin/zsh:

    # cd /sys/kernel/debug/tracing/
    # cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
    00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
    # objdump -T /bin/zsh | grep -w zfree
    0000000000446420 g DF .text 0000000000000012 Base zfree
    # echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
    # cat uprobe_events
    p:uprobes/p_zsh_0x46420 /bin/zsh:0x0000000000046420
    # echo 1 > events/uprobes/enable
    # sleep 20
    # echo 0 > events/uprobes/enable
    # cat trace
    # tracer: nop
    #
    # TASK-PID CPU# TIMESTAMP FUNCTION
    # | | | | |
    zsh-24842 [006] 258544.995456: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
    zsh-24842 [007] 258545.000270: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
    zsh-24842 [002] 258545.043929: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
    zsh-24842 [004] 258547.046129: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
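
    The offset 0x46420 used in the example is the symbol address from
    objdump minus the start of the text mapping, plus the mapping's file
    offset, all taken from the output above. A small sketch of that
    arithmetic, assuming the symbol value is a virtual address inside that
    mapping; probe_file_offset() is a hypothetical helper name.

    ```c
    #include <stdint.h>

    /* File offset of 'vaddr', given the mapping that contains it. */
    static uint64_t probe_file_offset(uint64_t vaddr, uint64_t map_start,
                                      uint64_t map_pgoff)
    {
        return vaddr - map_start + map_pgoff;
    }
    ```

    For the zsh example: 0x446420 - 0x400000 + 0x0 gives 0x46420.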

    Signed-off-by: Srikar Dronamraju
    Acked-by: Steven Rostedt
    Acked-by: Masami Hiramatsu
    Cc: Linus Torvalds
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Linux-mm
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Cc: Arnaldo Carvalho de Melo
    Cc: Anton Arapov
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120411103043.GB29437@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju