27 Aug, 2009

1 commit

  • Kernel threads don't enter syscalls through the sysenter/sysexit
    path. Instead they directly call the sys_* or do_* functions
    that implement the syscalls inside the kernel.

    The current syscall tracepoints only hook the sysenter/sysexit
    path, so tracing in that path can never see the syscalls made by
    kernel threads. Setting the TIF_SYSCALL_TRACEPOINT flag for them
    is therefore useless.

    Actually there is only one case in which a kernel thread can reach
    the usual syscall exit tracing path: when a kernel thread is
    created, the child comes through ret_from_fork and the fork()
    return is then traced. But this information alone is useless, so
    we don't want to set the TIF flag for these threads.

    Kernel threads have task_struct->mm set to NULL.
    (Thanks to Heiko for that hint ;-)
    The idea is then to check the mm field in syscall_regfunc() and
    set the flag accordingly, as in the sketch at the end of this
    entry.

    Signed-off-by: Hendrik Brueckner
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Hendrik Brueckner
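
    A minimal sketch of the check described above, loosely following
    the commit description (the locking shown here is an assumption,
    and the real syscall_regfunc() also handles refcounting):

    void syscall_regfunc(void)
    {
            struct task_struct *g, *t;

            read_lock(&tasklist_lock);
            do_each_thread(g, t) {
                    /* Kernel threads have no mm; the syscall
                     * tracepoints cannot see their syscalls anyway. */
                    if (t->mm)
                            set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
            } while_each_thread(g, t);
            read_unlock(&tasklist_lock);
    }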
     

26 Aug, 2009

3 commits

  • It's not strictly correct for the tracepoint reg/unreg callbacks to
    occur when a client is hooking up, because the actual tracepoint may
    not be present yet. This happens to be fine for syscalls, since
    they're in the core kernel, but it would cause problems for
    tracepoints defined in a module that hasn't been loaded yet. It also
    means the reg/unreg callbacks have to be EXPORTed for any module to
    use the tracepoint (as in SystemTap).

    This patch removes DECLARE_TRACE_WITH_CALLBACK and instead introduces
    DEFINE_TRACE_FN, which stores the callbacks in struct tracepoint. The
    callbacks are now invoked when the active state of the tracepoint
    changes in set_tracepoint & disable_tracepoint (see the first sketch
    after this list).

    This also introduces TRACE_EVENT_FN, so ftrace events can also
    provide registration callbacks if needed.

    Signed-off-by: Josh Stone
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Li Zefan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Lai Jiangshan
    Cc: Paul Mundt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Josh Stone
     
  • The syscall enter/exit tracepoints are only supported on archs that
    HAVE_SYSCALL_TRACEPOINTS, so the declarations should be #ifdef'ed
    (see the second sketch after this list). Also, the definitions of
    syscall_regfunc and syscall_unregfunc should depend on that same
    config symbol, rather than the ftrace-specific one.

    Signed-off-by: Josh Stone
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Li Zefan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Josh Stone
     
  • s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
    s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

    The syscall enter/exit tracing is no longer specific to ftrace, so
    the symbols now have names that reflect their tie to tracepoints
    instead.

    Signed-off-by: Josh Stone
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Li Zefan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Lai Jiangshan
    Cc: Paul Mundt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Josh Stone
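
    A hedged sketch of what the first commit above describes, using the
    syscall tracepoints as the example (declaration details are
    illustrative):

    /* In a header: declare the tracepoint as usual. */
    DECLARE_TRACE(sys_enter,
            TP_PROTO(struct pt_regs *regs, long id),
            TP_ARGS(regs, id));

    /* In one .c file: DEFINE_TRACE_FN stores the reg/unreg callbacks
     * in struct tracepoint, so they run when the tracepoint's active
     * state changes, not when a client merely hooks up. */
    DEFINE_TRACE_FN(sys_enter, syscall_regfunc, syscall_unregfunc);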
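
    And a sketch of the #ifdef guard from the second commit (the header
    location is an assumption):

    /* include/linux/tracepoint.h */
    #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
    extern void syscall_regfunc(void);
    extern void syscall_unregfunc(void);
    #endif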
     

12 Aug, 2009

1 commit

  • Add two tracepoints in the syscall entry and exit paths, conditioned
    on TIF_SYSCALL_FTRACE, to support the syscall trace event code (see
    the sketch at the end of this entry).

    Signed-off-by: Jason Baron
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Signed-off-by: Frederic Weisbecker

    Jason Baron
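
    A sketch of those two hooks, assuming x86 register naming (the real
    call sites live in the arch's syscall-tracing slow path):

    /* In syscall_trace_enter(): */
    if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
            trace_sys_enter(regs, regs->orig_ax);   /* syscall number */

    /* In syscall_trace_leave(): */
    if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
            trace_sys_exit(regs, regs->ax);         /* return value */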
     

19 Mar, 2009

2 commits

  • Zero-sized tracepoint sections can occur if tracing is enabled but
    no tracepoint is defined. Do not emit a warning in that case (see
    the sketch after this list).

    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Jaswinder Singh Rajput
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Change this crash:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] tracepoint_update_probe_range+0x1f/0x9b
    PGD 13d5fb067 PUD 13d688067 PMD 0
    Oops: 0000 [#1] SMP

    To a more debuggable WARN_ONCE().

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    [ moved the check outside the lock and added a WARN_ON(). ]
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
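
    A hedged sketch combining both fixes above (the per-tracepoint
    helper name is hypothetical, and the real placement of the check
    may differ):

    static void tracepoint_update_probe_range(struct tracepoint *begin,
                                              struct tracepoint *end)
    {
            struct tracepoint *iter;

            /* A zero-sized section (begin == end) is legitimate when
             * tracing is on but no tracepoint is defined; only a NULL
             * range deserves a warning, and never an oops. */
            if (!begin) {
                    WARN_ON_ONCE(end);
                    return;
            }

            mutex_lock(&tracepoints_mutex);
            for (iter = begin; iter < end; iter++)
                    set_tracepoint_state(iter);     /* hypothetical */
            mutex_unlock(&tracepoints_mutex);
    }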
     

03 Nov, 2008

2 commits

  • Impact: add new tracepoint APIs to allow the batched registration
    of probes

    The new APIs split tracepoint_probe_register() and
    tracepoint_probe_unregister() into two steps. The first step only
    updates the tracepoint_entry; it does not connect or disconnect
    anything.

    This patch introduces tracepoint_probe_update_all() to perform the
    update for all tracepoints at once.

    These APIs are very useful for registering lots of probes while
    updating only once (see the usage sketch after this list). Another
    very important point is that the *_noupdate APIs do not require
    module_mutex.

    Signed-off-by: Lai Jiangshan
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: simplify implementation

    Unused memory is now handled by struct tp_probes (sketched after
    this list).

    The old code used these three fields to handle unused memory:
    struct tracepoint_entry {
    ...
    struct rcu_head rcu;
    void *oldptr;
    unsigned char rcu_pending:1;
    ...
    };

    In that scheme, unused memory was handled by struct
    tracepoint_entry. It brought a reentrancy bug (since fixed) and
    filled tracepoint.c with ".*rcu.*" statements. This patch removes
    all of that.

    In addition:
    - rcu_barrier_sched() is removed;
    - there is no need to regain tracepoints_mutex after
      tracepoint_update_probes();
    - several little cleanups.

    Signed-off-by: Lai Jiangshan
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
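
    A usage sketch of the batch APIs from the first commit above (the
    tracepoint names and probe functions are made up):

    /* Register many probes, but pay for only one update pass. */
    tracepoint_probe_register_noupdate("sched_switch", probe_switch);
    tracepoint_probe_register_noupdate("sched_wakeup", probe_wakeup);
    tracepoint_probe_register_noupdate("irq_entry", probe_irq);

    tracepoint_probe_update_all();  /* connect everything at once */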
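
    And a sketch of the replacement structure from the second commit
    (the union member naming is an assumption based on the commit
    text):

    struct tp_probes {
            union {
                    struct rcu_head rcu;    /* deferred free */
                    struct list_head list;  /* batched release */
            } u;
            void *probes[0];        /* the probe function array */
    };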
     

27 Oct, 2008

1 commit

  • Impact: fix kernel crash that can trigger during tracing

    If we try to remove a probe that has never been registered, the
    tracepoint_entry_remove_probe() function will dereference a NULL
    pointer.

    Check the probe before removing it to avoid crashes (see the
    sketch at the end of this entry).

    Signed-off-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
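
    A minimal sketch of the guard, at the top of
    tracepoint_entry_remove_probe() (the error-return style is an
    assumption):

    void **old = entry->funcs;

    /* The probe was never registered: there is no funcs array to
     * search, so bail out instead of dereferencing NULL below. */
    if (!old)
            return ERR_PTR(-ENOENT);
    /* ... otherwise build and return the shrunken probe array ... */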
     

14 Oct, 2008

4 commits

  • The tracepoints had the same reentrancy problem the markers had.
    Apply a similar fix, using an rcu_barrier after each tracepoint
    mutex lock.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     
  • Make tracepoints use rcu sched. (cleanup)

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     
  • Steven Rostedt suggested:

    | Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ?

    Signed-off-by: Mathieu Desnoyers
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     
  • Implementation of kernel tracepoints. Inspired by the Linux Kernel
    Markers. Allows complete typing verification by declaring both the
    tracing statement inline functions and the probe
    registration/unregistration static inline functions within the same
    macro, "DEFINE_TRACE". No format string is required. See the
    tracepoint Documentation and Samples patches for usage examples,
    and the short sketch at the end of this entry.

    Taken from the documentation patch:

    "A tracepoint placed in code provides a hook to call a function (probe)
    that you can provide at runtime. A tracepoint can be "on" (a probe is
    connected to it) or "off" (no probe is attached). When a tracepoint is
    "off" it has no effect, except for adding a tiny time penalty (checking
    a condition for a branch) and space penalty (adding a few bytes for the
    function call at the end of the instrumented function and adds a data
    structure in a separate section). When a tracepoint is "on", the
    function you provide is called each time the tracepoint is executed, in
    the execution context of the caller. When the function provided ends its
    execution, it returns to the caller (continuing from the tracepoint
    site).

    You can put tracepoints at important locations in the code. They are
    lightweight hooks that can pass an arbitrary number of parameters, which
    prototypes are described in a tracepoint declaration placed in a header
    file."

    Addition and removal of tracepoints is synchronized by RCU using the
    scheduler (and preempt_disable) as guarantees to find a quiescent state
    (this is really RCU "classic"). The update side uses rcu_barrier_sched()
    with call_rcu_sched() and the read/execute side uses
    "preempt_disable()/preempt_enable()".

    We make sure the previous array containing probes, which has been
    scheduled for deletion by the rcu callback, is indeed freed before we
    proceed to the next update. It therefore limits the rate of modification
    of a single tracepoint to one update per RCU period. The objective here
    is to permit fast batch add/removal of probes on _different_
    tracepoints.

    Changelog:
    - Use #name ":" #proto as the string identifying the tracepoint in
    the tracepoint table. This makes sure no type mismatch happens due
    to connection of a probe with the wrong type to a tracepoint
    declared with the same name in a different header.
    - Add tracepoint_entry_free_old.
    - Change __TO_TRACE to get rid of the 'i' iterator.

    Masami Hiramatsu:
    Tested on x86-64.

    Performance impact of a tracepoint: same as markers, except that it
    adds about 70 bytes of instructions in an unlikely branch of each
    instrumented function (the for loop, the stack setup and the
    function call). It currently adds a memory read, a test and a
    conditional branch at the instrumentation site (in the hot path).
    Immediate values will eventually change this into a load immediate,
    a test and a branch, which removes the memory read and makes the
    i-cache impact smaller (changing the memory read to a load
    immediate saves 3-4 bytes per site on x86_32, depending on mov
    prefixes, or 7-8 bytes on x86_64; it also saves the d-cache hit).

    About the performance impact of tracepoints (which is comparable to
    markers), even without immediate values optimizations, tests done
    by Hideo Aoki on ia64 show no regression. His test case used
    hackbench on a kernel where scheduler instrumentation (about 5
    events in the scheduler code) was added.

    Quoting Hideo Aoki about markers:

    I evaluated the overhead of kernel markers using the
    linux-2.6-sched-fixes git tree, which includes several markers for
    LTTng, on an ia64 server.

    Although the immediate trace mark feature isn't implemented on
    ia64, there is no major performance regression. So I think that,
    from the viewpoint of performance impact, we don't have any issue
    with proposing the marker patches for merging into Linus's tree.

    I prepared two kernels to evaluate. The first one was compiled
    without CONFIG_MARKERS. The second one had CONFIG_MARKERS enabled.

    I downloaded the original hackbench from the following URL:
    http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

    I ran hackbench 5 times in each condition and calculated the
    average and the difference between the kernels.

    The hackbench parameter: every multiple of 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

    Below are the results. As you can see, no major performance
    regression was found in any case. Even as the number of processes
    increases, the difference between the marker-enabled and
    marker-disabled kernels doesn't increase. Moreover, as the number
    of CPUs increases, the differences don't increase either.

    Curiously, the marker-enabled kernel performs better than the
    marker-disabled kernel in more than half of the cases, though I
    guess this comes from differences in memory access patterns.

    * 2 CPUs

    Number of | without      | with         |   diff |  diff |
    processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
    --------------------------------------------------------------
           50 |        4.811 |        4.872 | +0.061 | +1.27 |
          100 |        9.854 |       10.309 | +0.454 | +4.61 |
          150 |       15.602 |       15.040 | -0.562 |  -3.6 |
          200 |       20.489 |       20.380 | -0.109 | -0.53 |
          250 |       25.798 |       25.652 | -0.146 | -0.56 |
          300 |       31.260 |       30.797 | -0.463 | -1.48 |
          350 |       36.121 |       35.770 | -0.351 | -0.97 |
          400 |       42.288 |       42.102 | -0.186 | -0.44 |
          450 |       47.778 |       47.253 | -0.526 |  -1.1 |
          500 |       51.953 |       52.278 | +0.325 | +0.63 |
          550 |       58.401 |       57.700 | -0.701 |  -1.2 |
          600 |       63.334 |       63.222 | -0.112 | -0.18 |
          650 |       68.816 |       68.511 | -0.306 | -0.44 |
          700 |       74.667 |       74.088 | -0.579 | -0.78 |
          750 |       78.612 |       79.582 | +0.970 | +1.23 |
          800 |       85.431 |       85.263 | -0.168 |  -0.2 |
    --------------------------------------------------------------

    * 4 CPUs

    Number of | without      | with         |   diff |  diff |
    processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
    --------------------------------------------------------------
           50 |        2.586 |        2.584 | -0.003 |  -0.1 |
          100 |        5.254 |        5.283 | +0.030 | +0.56 |
          150 |        8.012 |        8.074 | +0.061 | +0.76 |
          200 |       11.172 |       11.000 | -0.172 | -1.54 |
          250 |       13.917 |       14.036 | +0.119 | +0.86 |
          300 |       16.905 |       16.543 | -0.362 | -2.14 |
          350 |       19.901 |       20.036 | +0.135 | +0.68 |
          400 |       22.908 |       23.094 | +0.186 | +0.81 |
          450 |       26.273 |       26.101 | -0.172 | -0.66 |
          500 |       29.554 |       29.092 | -0.461 | -1.56 |
          550 |       32.377 |       32.274 | -0.103 | -0.32 |
          600 |       35.855 |       35.322 | -0.533 | -1.49 |
          650 |       39.192 |       38.388 | -0.804 | -2.05 |
          700 |       41.744 |       41.719 | -0.025 | -0.06 |
          750 |       45.016 |       44.496 | -0.520 | -1.16 |
          800 |       48.212 |       47.603 | -0.609 | -1.26 |
    --------------------------------------------------------------

    * 8 CPUs

    Number of | without      | with         |   diff |  diff |
    processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
    --------------------------------------------------------------
           50 |        2.094 |        2.072 | -0.022 | -1.07 |
          100 |        4.162 |        4.273 | +0.111 | +2.66 |
          150 |        6.485 |        6.540 | +0.055 | +0.84 |
          200 |        8.556 |        8.478 | -0.078 | -0.91 |
          250 |       10.458 |       10.258 | -0.200 | -1.91 |
          300 |       12.425 |       12.750 | +0.325 | +2.62 |
          350 |       14.807 |       14.839 | +0.032 | +0.22 |
          400 |       16.801 |       16.959 | +0.158 | +0.94 |
          450 |       19.478 |       19.009 | -0.470 | -2.41 |
          500 |       21.296 |       21.504 | +0.208 | +0.98 |
          550 |       23.842 |       23.979 | +0.137 | +0.57 |
          600 |       26.309 |       26.111 | -0.198 | -0.75 |
          650 |       28.705 |       28.446 | -0.259 |  -0.9 |
          700 |       31.233 |       31.394 | +0.161 | +0.52 |
          750 |       34.064 |       33.720 | -0.344 | -1.01 |
          800 |       36.320 |       36.114 | -0.206 | -0.57 |
    --------------------------------------------------------------

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: 'Peter Zijlstra'
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
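
    To make the entry above concrete, a short usage sketch following
    the commit text and the documentation it quotes (the event name,
    prototype and probe are illustrative; the TPPROTO/TPARGS spellings
    follow this era's API):

    /* In a header, e.g. include/trace/subsys.h: */
    #include <linux/tracepoint.h>

    DEFINE_TRACE(subsys_eventname,
            TPPROTO(int firstarg, struct task_struct *p),
            TPARGS(firstarg, p));

    /* At the instrumentation site: */
    void somefct(void)
    {
            /* ... */
            trace_subsys_eventname(arg, task); /* near-zero cost when "off" */
            /* ... */
    }

    /* In the tracer: the probe runs in the caller's context when the
     * tracepoint is "on". */
    static void probe_eventname(int firstarg, struct task_struct *p)
    {
            /* record firstarg and p */
    }

    /* Connect / disconnect (synchronized by RCU sched, as described
     * above): */
    register_trace_subsys_eventname(probe_eventname);
    unregister_trace_subsys_eventname(probe_eventname);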