08 Apr, 2009

1 commit

  • Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
    branch tracer: Fix for enabling branch profiling makes sparse unusable
    ftrace: Correct a text align for event format output
    Update /debug/tracing/README
    tracing/ftrace: alloc the started cpumask for the trace file
    tracing, x86: remove duplicated #include
    ftrace: Add check of sched_stopped for probe_sched_wakeup
    function-graph: add proper initialization for init task
    tracing/ftrace: fix missing include string.h
    tracing: fix incorrect return type of ns2usecs()
    tracing: remove CALLER_ADDR2 from wakeup tracer
    blktrace: fix pdu_len when tracing packet command requests
    blktrace: small cleanup in blk_msg_write()
    blktrace: NUL-terminate user space messages
    tracing: move scripts/trace/power.pl to scripts/tracing/power.pl

    Linus Torvalds
     

07 Apr, 2009

1 commit

  • Impact: fix time output bug on 32-bit systems

    ns2usecs() returns 'long', which is incorrect: on a 32-bit system the
    microsecond value overflows the 32-bit long and the timestamps go
    wrong, as the i386 trace below shows.

    (In i386)
    ...
    <idle>-0  [000]   521.442100: _spin_lock
    <idle>-0  [000]   521.442101: do_timer
    <idle>-0  [000]   521.442102: update_wall_time
    <idle>-0  [000]   521.442102: update_xtime_cache
    <idle>-0  [001]  4154502640.134759: rcu_bh_qsctr_inc
    <idle>-0  [001]  4154502640.134760: _local_bh_enable
    <idle>-0  [001]  4154502640.134761: idle_cpu
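
    The fix, shown here as a minimal sketch of the idea rather than the
    exact patch, is to make ns2usecs() compute and return a 64-bit value:

        /*
         * Minimal sketch of the fix: return a 64-bit type so the
         * microsecond count cannot overflow on 32-bit systems.
         * (cycle_t is the tracer's 64-bit timestamp type; do_div()
         * divides a u64 in place.)
         */
        u64 ns2usecs(cycle_t nsec)
        {
                nsec += 500;            /* round to nearest microsecond */
                do_div(nsec, 1000);     /* nanoseconds -> microseconds */
                return nsec;
        }
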
    LKML-Reference:
    Reported-by: Li Zefan
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

03 Apr, 2009

1 commit

  • kmemtrace now uses tracepoints instead of markers. We no longer need to
    use format specifiers to pass arguments.
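
    For illustration, a tracepoint declared in the TP_PROTO/TP_ARGS style
    that the folded notes mention might look like this (the exact
    kmemtrace prototypes are not shown in this changelog, so the fields
    below are illustrative):

        /*
         * Illustrative only: a kmalloc-style tracepoint declared with
         * TP_PROTO/TP_ARGS; the real kmemtrace prototypes may differ.
         */
        DECLARE_TRACE(kmalloc,
                TP_PROTO(unsigned long call_site, const void *ptr,
                         size_t bytes_req, size_t bytes_alloc,
                         gfp_t gfp_flags),
                TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags));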

    Signed-off-by: Eduard - Gabriel Munteanu
    [ folded: Use the new TP_PROTO and TP_ARGS to fix the build. ]
    [ folded: fix build when CONFIG_KMEMTRACE is disabled. ]
    [ folded: define tracepoints when CONFIG_TRACEPOINTS is enabled. ]
    Signed-off-by: Pekka Enberg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eduard - Gabriel Munteanu
     

31 Mar, 2009

1 commit


24 Mar, 2009

2 commits

  • Impact: give user a choice to show times spent while sleeping

    The user may want to see the time a function spent sleeping.
    This patch adds the trace option "sleep-time" to allow that.
    The "sleep-time" option is default on.

    echo sleep-time > /debug/tracing/trace_options

    produces:

    ------------------------------------------
    2) avahi-d-3428 => <idle>-0
    ------------------------------------------

    2) | finish_task_switch() {
    2) 0.621 us | _spin_unlock_irq();
    2) 2.202 us | }
    2) ! 1002.197 us | }
    2) ! 1003.521 us | }

    whereas,

    echo nosleep-time > /debug/tracing/trace_options

    produces:

    0) <idle>-0 => yum-upd-3416
    ------------------------------------------

    0) | finish_task_switch() {
    0) 0.643 us | _spin_unlock_irq();
    0) 2.342 us | }
    0) + 41.302 us | }
    0) + 42.453 us | }

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: cleanup

    Instead of writing into the trace_seq buffer directly to print the
    filters, use trace_seq_printf() as it was intended to be used.
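
    For example (a sketch; the predicate field names are illustrative,
    not taken from this changelog):

        /*
         * Sketch: emit one filter predicate through the trace_seq API
         * instead of hand-copying bytes into s->buffer.
         */
        trace_seq_printf(s, "%s %s %s\n",
                         pred->field_name,
                         pred->not ? "!=" : "==",
                         pred->str_val);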

    Reported-by: Steven Rostedt
    Signed-off-by: Tom Zanussi
    Cc: Frédéric Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

23 Mar, 2009

4 commits

  • Impact: fix hard-lockup with sched switch events

    Some ftrace events, such as sched wakeup, can be traced while the
    runqueue lock is held. Since they use
    trace_current_buffer_unlock_commit(), they call wake_up(), which can
    try to grab the runqueue lock too, resulting in a deadlock.

    Now, for all events, we call a new helper,
    trace_nowake_buffer_unlock_commit(), which does much the same as
    trace_current_buffer_unlock_commit() except that it doesn't call
    trace_wake_up().
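
    A sketch of the helper's shape, assuming an internal
    __trace_buffer_unlock_commit() that takes a 'wake' flag (the actual
    factoring may differ):

        /*
         * Sketch: same commit path as
         * trace_current_buffer_unlock_commit(), but pass wake = 0 so
         * trace_wake_up() is never called and the runqueue lock cannot
         * be taken recursively.
         */
        void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
                                               unsigned long flags, int pc)
        {
                __trace_buffer_unlock_commit(&global_trace, event,
                                             flags, pc, 0);
        }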

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This patch adds per-subsystem filtering to the event tracing subsystem.

    It adds a 'filter' debugfs file to each subsystem directory. This
    file can be written to in order to set filters; reading from it will
    display the current set of filters set for that subsystem.

    Basically what it does is propagate the filter down to each event
    contained in the subsystem. If a particular event doesn't have a field
    with the name specified in the filter, it simply doesn't get set for
    that event. You can verify whether or not the filter was set for a
    particular event by looking at the filter file for that event.

    As with per-event filters, compound expressions are supported, and
    echoing '0' to the subsystem's filter file clears all filters in the
    subsystem, etc.

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • This patch adds per-event filtering to the event tracing subsystem.

    It adds a 'filter' debugfs file to each event directory. This file
    can be written to in order to set filters; reading from it will
    display the current set of filters set for that event.

    Basically, any field listed in the 'format' file for an event can be
    filtered on (including strings, but not yet other array types) using
    either matching ('==') or non-matching ('!=') 'predicates'. A
    'predicate' can be either a single expression:

    # echo pid != 0 > filter

    # cat filter
    pid != 0

    or a compound expression of up to 8 sub-expressions combined using '&&'
    or '||':

    # echo comm == Xorg > filter
    # echo "&& sig != 29" > filter

    # cat filter
    comm == Xorg
    && sig != 29

    Only events having field values matching an expression will be available
    in the trace output; non-matching events are discarded.

    Note that a compound expression is built up by echoing each
    sub-expression separately - it's not the most efficient way to do
    things, but it keeps the parser simple and assumes that compound
    expressions will be relatively uncommon. In any case, a subsequent
    patch introducing a way to set filters for entire subsystems should
    mitigate any need to do this for lots of events.

    Setting a filter without an '&&' or '||' clears the previous filter
    completely and sets the filter to the new expression:

    # cat filter
    comm == Xorg
    && sig != 29

    # echo comm != Xorg > filter

    # cat filter
    comm != Xorg

    To clear a filter, echo 0 to the filter file:

    # echo 0 > filter
    # cat filter
    none

    The limit of 8 predicates for a compound expression is arbitrary - for
    efficiency, it's implemented as an array of pointers to predicates, and
    8 seemed more than enough for any filter...
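
    In other words (a sketch of the layout that paragraph describes; the
    name MAX_FILTER_PRED is an assumption):

        /* One predicate slot per sub-expression of a compound filter;
         * the limit of 8 is the arbitrary choice discussed above. */
        #define MAX_FILTER_PRED 8

        struct filter_pred *preds[MAX_FILTER_PRED];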

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • This patch makes the field descriptions defined for event tracing
    available at run-time, for the event-filtering mechanism introduced
    in a subsequent patch.

    The common event fields are prepended with 'common_' in the format
    display, allowing them to be distinguished from other fields that
    might internally have the same name, so they can be used
    unambiguously in filters.

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

20 Mar, 2009

2 commits


18 Mar, 2009

2 commits


17 Mar, 2009

2 commits


16 Mar, 2009

1 commit


13 Mar, 2009

6 commits

  • Impact: new feature

    This adds the generic support for syscall tracing. It is currently
    exploited through a dedicated tracer, but other tracing engines can
    use it. (They just have to play with
    {start,stop}_ftrace_syscalls() and use the display callbacks, unless
    they want to override them.)

    The syscall prototype definitions are abused here to steal some
    metadata:

    - syscall name, parameter types, parameter names, number of parameters

    The syscall address is not directly saved during this definition
    because we don't know if its prototype is available in the
    namespace. But we don't really need it: the arch just has to provide
    a function that resolves a syscall number to its metadata struct, as
    sketched below.
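
    A sketch of that per-syscall record and resolver (the names follow
    the description above but are assumptions):

        /*
         * Sketch of the per-syscall metadata record: name, parameter
         * types, parameter names, number of parameters.
         */
        struct syscall_metadata {
                const char      *name;          /* syscall name */
                int             nb_args;        /* number of parameters */
                const char      **types;        /* parameter type strings */
                const char      **args;         /* parameter name strings */
        };

        /* provided by the arch: resolve a syscall number to its record */
        struct syscall_metadata *syscall_nr_to_meta(int nr);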

    The current tracer prints the syscall names, parameter names and
    values (and optionally their types). Currently the value is printed
    as raw hex, but higher-level value display is on my TODO list.

    Signed-off-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Ingo Molnar
     
  • Provide basic callbacks to do syscall tracing.
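
    A rough illustration of such a callback, reduced to a trace_printk()
    as the bracketed note below mentions (the name and signature are
    assumptions):

        /* Illustrative only: print the syscall number on entry. */
        void ftrace_syscall_enter(struct pt_regs *regs)
        {
                int nr = syscall_get_nr(current, regs);

                trace_printk("syscall %d enter\n", nr);
        }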

    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    Cc: Lai Jiangshan
    LKML-Reference:
    [ simplified it to a trace_printk() for now. ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: documentation

    The use of the double __builtin_constant_p checks in
    event_trace_printk can be confusing to developers and reviewers.
    This patch adds a comment to explain why it is there.

    Requested-by: KOSAKI Motohiro
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: speed up on event tracing

    The event_trace_printk is currently a wrapper function that calls
    trace_vprintk. Because it passes the fmt through a variable, it
    misses out on the optimization of using the binary printk.

    This patch makes event_trace_printk a macro wrapper that handles the
    fmt the same way the trace_printk() macros do.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: fix callsites with dynamic format strings

    Since its new binary implementation, trace_printk() internally uses
    a static container for the format string at each callsite. But the
    value is assigned once at build time, which means it can't take
    dynamic formats.

    So this patch unearths the raw trace_printk implementation for the
    callers that need trace_printk to be able to carry such dynamic
    format strings. The trace_printk() macro will use the appropriate
    implementation for each callsite. Most of the time, however, the
    binary implementation will still be used.
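
    The choice can be made at compile time with __builtin_constant_p();
    a simplified sketch (the real macro also stashes constant formats in
    a dedicated section):

        /*
         * Simplified sketch: constant formats go through the fast
         * binary path, dynamic formats fall back to the raw one.
         */
        #define trace_printk(fmt, args...)                              \
        do {                                                            \
                if (__builtin_constant_p(fmt))                          \
                        __trace_bprintk(_THIS_IP_, fmt, ##args);        \
                else                                                    \
                        __trace_printk(_THIS_IP_, fmt, ##args);         \
        } while (0)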

    The other impact of this patch is that mmiotrace_printk() will use
    the old implementation, because it calls the low-level trace_vprintk
    and we can't guess whether the format passed to it is dynamic or not.

    Some parts of this patch have been written by Steven Rostedt (most
    notably the part that chooses the appropriate implementation for
    each callsite).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Frederic Weisbecker
     

12 Mar, 2009

1 commit


10 Mar, 2009

1 commit

  • Impact: clean up and enhancement

    The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
    ability to save data as well as to print the record out. Working with
    Ingo Molnar, we came up with a new format that is much more pleasing
    to the eye of C developers. This new macro is more C-style than the
    old macro, and it is more obvious what it does.

    Here's the example. The only macro updated in this patch is the
    sched_switch trace point.

    The old method looked like this:

    TRACE_EVENT_FORMAT(sched_switch,
            TP_PROTO(struct rq *rq, struct task_struct *prev,
                     struct task_struct *next),
            TP_ARGS(rq, prev, next),
            TP_FMT("task %s:%d ==> %s:%d",
                   prev->comm, prev->pid, next->comm, next->pid),
            TRACE_STRUCT(
                    TRACE_FIELD(pid_t, prev_pid, prev->pid)
                    TRACE_FIELD(int, prev_prio, prev->prio)
                    TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
                                        next_comm,
                                        TP_CMD(memcpy(TRACE_ENTRY->next_comm,
                                                      next->comm,
                                                      TASK_COMM_LEN)))
                    TRACE_FIELD(pid_t, next_pid, next->pid)
                    TRACE_FIELD(int, next_prio, next->prio)
            ),
            TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d")
    );

    The above method is hard to read and requires two format fields.

    The new method:

    /*
     * Tracepoint for task switches, performed by the scheduler:
     *
     * (NOTE: the 'rq' argument is not used by generic trace events,
     *        but used by the latency tracer plugin.)
     */
    TRACE_EVENT(sched_switch,

            TP_PROTO(struct rq *rq, struct task_struct *prev,
                     struct task_struct *next),

            TP_ARGS(rq, prev, next),

            TP_STRUCT__entry(
                    __array( char,  prev_comm,  TASK_COMM_LEN )
                    __field( pid_t, prev_pid )
                    __field( int,   prev_prio )
                    __array( char,  next_comm,  TASK_COMM_LEN )
                    __field( pid_t, next_pid )
                    __field( int,   next_prio )
            ),

            TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
                      __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
                      __entry->next_comm, __entry->next_pid, __entry->next_prio),

            TP_fast_assign(
                    memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
                    __entry->prev_pid  = prev->pid;
                    __entry->prev_prio = prev->prio;
                    memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
                    __entry->next_pid  = next->pid;
                    __entry->next_prio = next->prio;
            )
    );

    This macro is called TRACE_EVENT; it is broken up into 5 parts:

    TP_PROTO: the prototype of the trace point
    TP_ARGS: the arguments of the trace point
    TP_STRUCT__entry: the structure layout of the entry in the ring buffer
    TP_printk: the printk format
    TP_fast_assign: the method used to write the entry into the ring buffer

    The structure is the definition of how the event will be saved in
    the ring buffer. The printk is used by the internal tracing in case
    of an oops, when the kernel needs to print out the format of the
    record to the console. Thus TP_printk gives a means to show the
    records in a human-readable format. It is also used to print out the
    data from the trace file.

    The TP_fast_assign is executed directly. It is basically like a C
    function, where __entry is the handle to the record.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Mar, 2009

3 commits

  • Impact: cleanup

    Remove a few leftovers and clean up the code a bit.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: faster and lighter tracing

    Now that we have trace_bprintk(), which is faster and consumes less
    memory than trace_printk() and serves the same purpose, we can drop
    the old implementation in favour of the binary one. This means we
    move the whole implementation of trace_bprintk() into
    trace_printk(), so the API doesn't change, except that we must now
    use trace_seq_bprintk() to print the TRACE_PRINT entries.

    Some changes result from this:

    - Previously, trace_bprintk depended on a single tracer and couldn't
    work without it. This tracer has been dropped, and the whole
    implementation of trace_printk() (including the module format
    management) is now integrated into the tracing core (it comes with
    CONFIG_TRACING), though we keep the file trace_printk.c (previously
    trace_bprintk.c) where the module management lives. Thus we don't
    bloat trace.c.

    - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

    - changes the trace_printk/trace_vprintk macros a bit to support
    non-builtin constant formats, and fixes 'const' qualifier warnings.
    But this is all transparent to developers.

    - etc...

    V2:

    - Rebase against last changes
    - Fix misspelling in the changelog

    V3:

    - Rebase against last changes (moving trace_printk() to kernel.h)

    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: save on memory for tracing

    Current tracers typically use a struct (like struct ftrace_entry,
    struct ctx_switch_entry, struct special_entry, etc.) to record a
    binary event. Each of these structs can only record its own kind of
    event, and a new kind of tracer needs a new struct and a lot of code
    to handle it.

    So we need a generic binary record for events. This infrastructure
    serves that purpose.
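
    Usage would look roughly like this (a sketch: the entry struct and
    event type id are made up, the commit helper is the one named
    elsewhere in this log, and the reserve helper's name is an
    assumption):

        /*
         * Sketch: reserve a generic binary record, fill it, commit it.
         * 'struct my_entry' and TRACE_MY_EVENT are illustrative.
         */
        struct ring_buffer_event *event;
        struct my_entry *entry;

        event = trace_current_buffer_lock_reserve(TRACE_MY_EVENT,
                                                  sizeof(*entry),
                                                  flags, pc);
        if (!event)
                return;
        entry = ring_buffer_event_data(event);
        entry->pid = current->pid;
        trace_current_buffer_unlock_commit(event, flags, pc);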

    [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
    tracing as reported by Steven Rostedt]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

05 Mar, 2009

2 commits

  • Impact: cleanup

    Use a more generic name - this also allows the prototype to move
    to kernel.h and be generally available to kernel developers who
    want to do some quick tracing.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • With the removal of the latency_trace file, we lost the ability to
    see some of the finer details in a trace, such as whether interrupts
    are enabled, the preempt count, need-resched, and whether we are in
    an interrupt handler or a softirq handler.

    This patch simply creates an option to bring back the old format. It
    also removes the warning about an unused variable that held the
    latency_trace file operations.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 Mar, 2009

1 commit

  • Impact: new feature

    This patch creates a directory of files that correspond to the
    per CPU ring buffers. These are binary files and are made to
    be used with splice. This is the fastest way to extract data from
    the ftrace ring buffers.
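
    A hypothetical user-space consumer (the file name trace_pipe_raw and
    its exact location are assumptions here; the point is that splice()
    moves ring-buffer pages around without copying them):

        /*
         * Illustrative user-space reader: splice pages from a per-cpu
         * binary buffer file into a pipe, then into a data file.
         */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
                int pfd[2];
                int in = open("/debug/tracing/per_cpu/cpu0/trace_pipe_raw",
                              O_RDONLY);        /* path is an assumption */
                int out = open("trace.dat", O_WRONLY | O_CREAT, 0644);

                if (in < 0 || out < 0 || pipe(pfd) < 0)
                        return 1;
                for (;;) {
                        ssize_t n = splice(in, NULL, pfd[1], NULL,
                                           4096, SPLICE_F_MOVE);
                        if (n <= 0)
                                break;
                        splice(pfd[0], NULL, out, NULL, n, SPLICE_F_MOVE);
                }
                return 0;
        }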

    Thanks to Jiaying Zhang for pushing me to get this code fixed,
    and to Eduard - Gabriel Munteanu for his splice code that helped
    me debug my code.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

03 Mar, 2009

2 commits

  • This patch adds the "format" file to the trace point event directory.
    This is based off of work by Tom Zanussi, in which a file is
    exported to be read from user land, such that a user-space app may
    read the binary record stored in the ring buffer.

    # cat /debug/tracing/events/sched/sched_switch/format
    field:pid_t prev_pid; offset:12; size:4;
    field:int prev_prio; offset:16; size:4;
    field special:char next_comm[TASK_COMM_LEN]; offset:20; size:16;
    field:pid_t next_pid; offset:36; size:4;
    field:int next_prio; offset:40; size:4;

    Idea-from: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: clean up

    The trace_seq functions may be used separately, outside of the
    ftrace iterator, and trace_seq_reset is needed for these standalone
    uses.

    This patch also renames trace_seq_reset to the more appropriate
    trace_seq_init.
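
    A sketch of the standalone use this enables:

        /* Sketch: a trace_seq used on its own, outside the iterator. */
        struct trace_seq s;

        trace_seq_init(&s);
        trace_seq_printf(&s, "pid %d\n", current->pid);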

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

28 Feb, 2009

3 commits

  • This patch adds the interface to enable the C-style trace points.
    In the directory /debug/tracing/events/subsystem/event we now have
    three files:

    enable : values 0 or 1 to enable or disable the trace event.

    available_types: values 'raw' and 'printf' which indicate the tracing
    types available for the trace point. If a developer does not
    use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
    macro, then only 'printf' will be available. This file is
    read only.

    type: values 'raw' or 'printf'. This indicates which type of tracing
    is active for that trace point. 'printf' is the default and
    if 'raw' is not available, this file is read only.

    # echo raw > /debug/tracing/events/sched/sched_wakeup/type
    # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable

    Will enable the C style tracing for the sched_wakeup trace point.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: lower overhead tracing

    The current event tracer can automatically pick up trace points that
    are registered with the TRACE_FORMAT macro. But it requires a printf
    format string and parsing. Although this adds the ability to get
    guaranteed information like task names, it takes a hit in processing
    overhead. This processing can add about 500-1000 nanoseconds of
    overhead, but in some cases even that is considered too much, and we
    want to shave off as much of it as possible.

    Tom Zanussi recently posted tracing patches to lkml that are based
    on a nice idea about capturing the data via C structs using
    STRUCT_ENTER, STRUCT_EXIT type of macros.

    I liked that method very much, but did not like the implementation
    that required a developer to add data/code in several disjoint
    locations.

    This patch extends the event_tracer macros to do a similar "raw C"
    approach that Tom Zanussi did. But instead of having the developers
    needing to tweak a bunch of code all over the place, they can do it
    all in one macro - preferably placed near the code that it is
    tracing. That makes it much more likely that tracepoints will be
    maintained on an ongoing basis by the code they modify.

    The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
    a developer may still utilize the more low level DECLARE_TRACE macros
    if they don't care about getting their traces automatically in the event
    tracer.)

    They can also use the existing TRACE_FORMAT if they don't need to code
    the tracepoint in C, but just want to use the convenience of printf.

    So if the developer wants to "hardwire" a tracepoint in the fastest
    possible way, and wants to acquire their data via a user space utility
    in a raw binary format, or wants to see it in the trace output but not
    sacrifice any performance, then they can implement the faster but
    more complex TRACE_EVENT_FORMAT macro.

    Here's what usage looks like:

    TRACE_EVENT_FORMAT(name,
            TPPROTO(proto),
            TPARGS(args),
            TPFMT(fmt, fmt_args),
            TRACE_STRUCT(
                    TRACE_FIELD(type1, item1, assign1)
                    TRACE_FIELD(type2, item2, assign2)
                    [...]
            ),
            TPRAWFMT(raw_fmt)
    );

    Note that name, proto, args, and fmt are all identical to what
    TRACE_FORMAT uses.

    name: the unique identifier of the trace point
    proto: the prototype that the trace point uses
    args: the args in the prototype
    fmt: printf format to use with the event printf tracer
    fmt_args: the printf arguments to match fmt

    TRACE_STRUCT starts the definition of a structure. Each item in the
    structure is defined with a TRACE_FIELD:

    TRACE_FIELD(type, item, assign)

    type: the C type of item
    item: the name of the item in the structure
    assign: what to assign to the item in the trace point callback

    raw_fmt is a way to pretty print the struct. It must match the order
    in which the items are added in TRACE_STRUCT.

    An example of this would be:

    TRACE_EVENT_FORMAT(sched_wakeup,
            TPPROTO(struct rq *rq, struct task_struct *p, int success),
            TPARGS(rq, p, success),
            TPFMT("task %s:%d %s",
                  p->comm, p->pid, success ? "succeeded" : "failed"),
            TRACE_STRUCT(
                    TRACE_FIELD(pid_t, pid, p->pid)
                    TRACE_FIELD(int, success, success)
            ),
            TPRAWFMT("task %d success=%d")
    );

    This creates a unique struct of:

    struct {
            pid_t   pid;
            int     success;
    };

    And the way the callback would assign these values would be:

    entry->pid = p->pid;
    entry->success = success;

    The nice part about this is that the creation of the assignment is
    done via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT
    is created, the developer then has a faster method to record into
    the ring buffer. They do not need to worry about the tracer itself.

    The developer only needs to touch the files in include/trace/*.h

    Again, I would like to give special thanks to Tom Zanussi for this
    nice idea.

    Idea-from: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Right now all tracers must manage their own trace buffers. This was
    to force tracers to be independent, in case we finally decide to
    allow each tracer to have its own trace buffer.

    But now we are adding event tracing, which writes to the current
    tracer's buffer. This adds an interface to allow events to write to
    the current tracer's buffer without having to manage their own,
    since event tracing has no "tracer" and is just a way to hook into
    any other tracer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

25 Feb, 2009

2 commits

  • Now that several per-cpu files can be read or spliced at the same
    time, we want the read/splice callbacks for tracing files to be
    reentrant.

    Until now, a single global mutex (trace_types_lock) serialized
    access to tracing_read_pipe(), tracing_splice_read_pipe(), and the
    seq helpers.

    That means that if a user tries to read trace_pipe0 and trace_pipe1
    at the same time, access to the function tracing_read_pipe() is
    contended and one reader must wait for the other to finish its read
    call.

    The trace_types_lock mutex is mostly here to serialize access to the
    global current tracer (current_trace), which can be changed
    concurrently. Although the iter struct keeps a private pointer to
    this tracer, its callbacks can be changed by another function.

    The method used here is to no longer keep a private reference to the
    tracer inside the iterator, but to make a copy of it there instead.
    Then we check on subsequent read calls whether the tracer has
    changed. This is not costly, because the current tracer is not
    expected to change often, so we use a branch prediction hint for
    that.
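
    A sketch of that check (variable and field names are assumptions):

        /*
         * Sketch: on each read, re-copy the tracer if it changed under
         * us; the copy keeps the iterator's callbacks self-consistent.
         */
        if (unlikely(old_tracer != current_trace && current_trace)) {
                old_tracer = current_trace;
                *iter->trace = *current_trace;  /* copy, not a reference */
        }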

    Moreover, we add a private mutex to the iterator (there is one
    iterator per file descriptor) to serialize accesses in case of
    multiple consumers per file descriptor (which would be a silly thing
    for the user to do). Note that this is not to protect the ring
    buffer, since the ring buffer already serializes the readers'
    accesses. This is to prevent trace weirdness in case of concurrent
    consumers. But these mutexes can be dropped anyway; that would not
    result in any crash. Just tell me what you think about it.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: split up tracing output per cpu

    Currently, in the tracing debugfs directory, three files are
    available to the user for extracting the trace output:

    - trace: an iterator through the ring-buffer. It's a reader, but not
    a consumer. It doesn't block when no more traces are available.

    - latency_trace: pretty similar to the former, except that it adds
    more information such as the preempt count, irq flags, ...

    - trace_pipe: a reader and a consumer; it will also block waiting
    for traces if necessary (heh, yes, it's a pipe).

    The traces coming from different cpus are currently mixed up inside
    these files. Sometimes it messes up the information, sometimes it's
    useful, depending on what the tracer captures.

    The tracing_cpumask file is useful to filter the output and select
    only the traces captured by a custom-defined set of cpus. But it is
    still not powerful enough to extract one trace buffer per cpu at the
    same time.

    So this patch creates a new directory: /debug/tracing/per_cpu/.

    Inside this directory, you will now find one trace_pipe file and
    one trace file per cpu.

    Which means if you have two cpus, you will have:

    trace0
    trace1
    trace_pipe0
    trace_pipe1

    And of course, reading these files will have the same effect as with
    the usual tracing files, except that you will only see the traces
    from the given cpu.

    The original all-in-one-cpu trace files are still available in their
    original place.

    Until now, only one consumer was allowed on trace_pipe, to avoid
    racy consumption of the ring-buffer. Now the approach has changed a
    bit: you can have only one consumer per cpu.

    Which means you are allowed to read trace_pipe0 and trace_pipe1
    concurrently, but you can't have two readers on trace_pipe0 or
    trace_pipe1.

    Following the same logic, if there is one reader on the common
    trace_pipe, you cannot at the same time have another reader on
    trace_pipe0 or trace_pipe1, because trace_pipe is in essence already
    a consumer of all the cpu buffers.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

18 Feb, 2009

1 commit

  • Impact: api and pipe waiting change

    Currently, the waiting in tracing_read_pipe() is done through a
    100 msecs schedule_timeout() loop which periodically checks whether
    there are traces in the buffer.

    This can cause small latencies for programs which are reading the
    incoming events.

    This patch makes the reader wait on the trace_wait waitqueue, except
    for a few tracers, such as the sched and function tracers, which
    might already hold the runqueue lock while waking up the reader.

    This is performed through a new wait_pipe() callback on struct
    tracer. If a tracer doesn't implement one, the default wait on the
    trace_wait queue is used.
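
    A sketch of what the default callback might look like, assuming the
    classic waitqueue pattern on trace_wait (the name and details are
    assumptions):

        /*
         * Sketch: sleep on trace_wait until a new trace event wakes the
         * reader; tracers that cannot safely wake a waitqueue override
         * this callback with a polling loop instead.
         */
        void default_wait_pipe(struct trace_iterator *iter)
        {
                DEFINE_WAIT(wait);

                prepare_to_wait(&trace_wait, &wait, TASK_INTERRUPTIBLE);
                if (trace_empty(iter))
                        schedule();
                finish_wait(&trace_wait, &wait);
        }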

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

10 Feb, 2009

1 commit