08 Apr, 2009

1 commit

  • Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
    branch tracer: Fix for enabling branch profiling makes sparse unusable
    ftrace: Correct a text align for event format output
    Update /debug/tracing/README
    tracing/ftrace: alloc the started cpumask for the trace file
    tracing, x86: remove duplicated #include
    ftrace: Add check of sched_stopped for probe_sched_wakeup
    function-graph: add proper initialization for init task
    tracing/ftrace: fix missing include string.h
    tracing: fix incorrect return type of ns2usecs()
    tracing: remove CALLER_ADDR2 from wakeup tracer
    blktrace: fix pdu_len when tracing packet command requests
    blktrace: small cleanup in blk_msg_write()
    blktrace: NUL-terminate user space messages
    tracing: move scripts/trace/power.pl to scripts/tracing/power.pl

    Linus Torvalds
     

07 Apr, 2009

1 commit

  • Impact: fix time output bug on 32-bit systems

    ns2usecs() returns 'long', which is incorrect: on a 32-bit system the
    microsecond value overflows the 32-bit long and the timestamps go
    wrong, as the i386 trace below shows.

    (In i386)
    ...
    <idle>-0  [000]   521.442100: _spin_lock
    <idle>-0  [000]   521.442101: do_timer
    <idle>-0  [000]   521.442102: update_wall_time
    <idle>-0  [000]   521.442102: update_xtime_cache
    <idle>-0  [001]  4154502640.134759: rcu_bh_qsctr_inc
    <idle>-0  [001]  4154502640.134760: _local_bh_enable
    <idle>-0  [001]  4154502640.134761: idle_cpu
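
    The fix, shown here as a minimal sketch of the idea rather than the
    exact patch, is to make ns2usecs() compute and return a 64-bit value:

        /*
         * Minimal sketch of the fix: return a 64-bit type so the
         * microsecond count cannot overflow on 32-bit systems.
         * (cycle_t is the tracer's 64-bit timestamp type; do_div()
         * divides a u64 in place.)
         */
        u64 ns2usecs(cycle_t nsec)
        {
                nsec += 500;            /* round to nearest microsecond */
                do_div(nsec, 1000);     /* nanoseconds -> microseconds */
                return nsec;
        }
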
    LKML-Reference:
    Reported-by: Li Zefan
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

03 Apr, 2009

1 commit

  • kmemtrace now uses tracepoints instead of markers. We no longer need to
    use format specifiers to pass arguments.
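
    For illustration, a tracepoint declared in the TP_PROTO/TP_ARGS style
    that the folded notes mention might look like this (the exact
    kmemtrace prototypes are not shown in this changelog, so the fields
    below are illustrative):

        /*
         * Illustrative only: a kmalloc-style tracepoint declared with
         * TP_PROTO/TP_ARGS; the real kmemtrace prototypes may differ.
         */
        DECLARE_TRACE(kmalloc,
                TP_PROTO(unsigned long call_site, const void *ptr,
                         size_t bytes_req, size_t bytes_alloc,
                         gfp_t gfp_flags),
                TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags));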

    Signed-off-by: Eduard - Gabriel Munteanu
    [ folded: Use the new TP_PROTO and TP_ARGS to fix the build. ]
    [ folded: fix build when CONFIG_KMEMTRACE is disabled. ]
    [ folded: define tracepoints when CONFIG_TRACEPOINTS is enabled. ]
    Signed-off-by: Pekka Enberg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eduard - Gabriel Munteanu
     

31 Mar, 2009

1 commit


24 Mar, 2009

2 commits

  • Impact: give user a choice to show times spent while sleeping

    The user may want to see the time a function spent sleeping.
    This patch adds the trace option "sleep-time" to allow that.
    The "sleep-time" option is default on.

    echo sleep-time > /debug/tracing/trace_options

    produces:

    ------------------------------------------
    2) avahi-d-3428 => <idle>-0
    ------------------------------------------

    2) | finish_task_switch() {
    2) 0.621 us | _spin_unlock_irq();
    2) 2.202 us | }
    2) ! 1002.197 us | }
    2) ! 1003.521 us | }

    whereas,

    echo nosleep-time > /debug/tracing/trace_options

    produces:

    0) <idle>-0 => yum-upd-3416
    ------------------------------------------

    0) | finish_task_switch() {
    0) 0.643 us | _spin_unlock_irq();
    0) 2.342 us | }
    0) + 41.302 us | }
    0) + 42.453 us | }

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: cleanup

    Instead of writing into the trace_seq buffer directly to print the
    filters, use trace_seq_printf() as it was intended to be used.
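
    For example (a sketch; the predicate field names are illustrative,
    not taken from this changelog):

        /*
         * Sketch: emit one filter predicate through the trace_seq API
         * instead of hand-copying bytes into s->buffer.
         */
        trace_seq_printf(s, "%s %s %s\n",
                         pred->field_name,
                         pred->not ? "!=" : "==",
                         pred->str_val);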

    Reported-by: Steven Rostedt
    Signed-off-by: Tom Zanussi
    Cc: Frédéric Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

23 Mar, 2009

4 commits

  • Impact: fix hard-lockup with sched switch events

    Some ftrace events, such as sched wakeup, can be traced while the
    runqueue lock is held. Since they use
    trace_current_buffer_unlock_commit(), they call wake_up(), which can
    try to grab the runqueue lock too, resulting in a deadlock.

    Now, for all events, we call a new helper,
    trace_nowake_buffer_unlock_commit(), which does much the same as
    trace_current_buffer_unlock_commit() except that it doesn't call
    trace_wake_up().
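
    A sketch of the helper's shape, assuming an internal
    __trace_buffer_unlock_commit() that takes a 'wake' flag (the actual
    factoring may differ):

        /*
         * Sketch: same commit path as
         * trace_current_buffer_unlock_commit(), but pass wake = 0 so
         * trace_wake_up() is never called and the runqueue lock cannot
         * be taken recursively.
         */
        void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
                                               unsigned long flags, int pc)
        {
                __trace_buffer_unlock_commit(&global_trace, event,
                                             flags, pc, 0);
        }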

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This patch adds per-subsystem filtering to the event tracing subsystem.

    It adds a 'filter' debugfs file to each subsystem directory. This
    file can be written to in order to set filters; reading from it will
    display the current set of filters set for that subsystem.

    Basically what it does is propagate the filter down to each event
    contained in the subsystem. If a particular event doesn't have a field
    with the name specified in the filter, it simply doesn't get set for
    that event. You can verify whether or not the filter was set for a
    particular event by looking at the filter file for that event.

    As with per-event filters, compound expressions are supported, and
    echoing '0' to the subsystem's filter file clears all filters in the
    subsystem, etc.

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • This patch adds per-event filtering to the event tracing subsystem.

    It adds a 'filter' debugfs file to each event directory. This file
    can be written to in order to set filters; reading from it will
    display the current set of filters set for that event.

    Basically, any field listed in the 'format' file for an event can be
    filtered on (including strings, but not yet other array types) using
    either matching ('==') or non-matching ('!=') 'predicates'. A
    'predicate' can be either a single expression:

    # echo pid != 0 > filter

    # cat filter
    pid != 0

    or a compound expression of up to 8 sub-expressions combined using '&&'
    or '||':

    # echo comm == Xorg > filter
    # echo "&& sig != 29" > filter

    # cat filter
    comm == Xorg
    && sig != 29

    Only events having field values matching an expression will be available
    in the trace output; non-matching events are discarded.

    Note that a compound expression is built up by echoing each
    sub-expression separately - it's not the most efficient way to do
    things, but it keeps the parser simple and assumes that compound
    expressions will be relatively uncommon. In any case, a subsequent
    patch introducing a way to set filters for entire subsystems should
    mitigate any need to do this for lots of events.

    Setting a filter without an '&&' or '||' clears the previous filter
    completely and sets the filter to the new expression:

    # cat filter
    comm == Xorg
    && sig != 29

    # echo comm != Xorg > filter

    # cat filter
    comm != Xorg

    To clear a filter, echo 0 to the filter file:

    # echo 0 > filter
    # cat filter
    none

    The limit of 8 predicates for a compound expression is arbitrary - for
    efficiency, it's implemented as an array of pointers to predicates, and
    8 seemed more than enough for any filter...
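
    In other words (a sketch of the layout that paragraph describes; the
    name MAX_FILTER_PRED is an assumption):

        /* One predicate slot per sub-expression of a compound filter;
         * the limit of 8 is the arbitrary choice discussed above. */
        #define MAX_FILTER_PRED 8

        struct filter_pred *preds[MAX_FILTER_PRED];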

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     
  • This patch makes the field descriptions defined for event tracing
    available at run-time, for the event-filtering mechanism introduced
    in a subsequent patch.

    The common event fields are prepended with 'common_' in the format
    display, allowing them to be distinguished from other fields that
    might internally have the same name, so they can be used
    unambiguously in filters.

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

20 Mar, 2009

2 commits


18 Mar, 2009

2 commits


17 Mar, 2009

2 commits


16 Mar, 2009

1 commit


13 Mar, 2009

6 commits

  • Impact: new feature

    This adds the generic support for syscall tracing. It is currently
    exploited through a dedicated tracer, but other tracing engines can
    use it. (They just have to play with
    {start,stop}_ftrace_syscalls() and use the display callbacks, unless
    they want to override them.)

    The syscall prototype definitions are abused here to steal some
    metadata:

    - syscall name, parameter types, parameter names, number of parameters

    The syscall address is not directly saved during this definition
    because we don't know if its prototype is available in the
    namespace. But we don't really need it: the arch just has to provide
    a function that resolves a syscall number to its metadata struct, as
    sketched below.
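
    A sketch of that per-syscall record and resolver (the names follow
    the description above but are assumptions):

        /*
         * Sketch of the per-syscall metadata record: name, parameter
         * types, parameter names, number of parameters.
         */
        struct syscall_metadata {
                const char      *name;          /* syscall name */
                int             nb_args;        /* number of parameters */
                const char      **types;        /* parameter type strings */
                const char      **args;         /* parameter name strings */
        };

        /* provided by the arch: resolve a syscall number to its record */
        struct syscall_metadata *syscall_nr_to_meta(int nr);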

    The current tracer prints the syscall names, parameter names and
    values (and optionally their types). Currently the value is printed
    as raw hex, but higher-level value display is on my TODO list.

    Signed-off-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Ingo Molnar
     
  • Provide basic callbacks to do syscall tracing.
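
    A rough illustration of such a callback, reduced to a trace_printk()
    as the bracketed note below mentions (the name and signature are
    assumptions):

        /* Illustrative only: print the syscall number on entry. */
        void ftrace_syscall_enter(struct pt_regs *regs)
        {
                int nr = syscall_get_nr(current, regs);

                trace_printk("syscall %d enter\n", nr);
        }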

    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    Cc: Lai Jiangshan
    LKML-Reference:
    [ simplified it to a trace_printk() for now. ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: documentation

    The use of the double __builtin_constant_p checks in
    event_trace_printk can be confusing to developers and reviewers.
    This patch adds a comment to explain why it is there.

    Requested-by: KOSAKI Motohiro
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: speed up on event tracing

    The event_trace_printk is currently a wrapper function that calls
    trace_vprintk. Because it passes the fmt through a variable, it
    misses out on the optimization of using the binary printk.

    This patch makes event_trace_printk a macro wrapper that handles the
    fmt the same way the trace_printk() macros do.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: fix callsites with dynamic format strings

    Since its new binary implementation, trace_printk() internally uses
    a static container for the format string at each callsite. But the
    value is assigned once at build time, which means it can't take
    dynamic formats.

    So this patch unearths the raw trace_printk implementation for the
    callers that need trace_printk to be able to carry such dynamic
    format strings. The trace_printk() macro will use the appropriate
    implementation for each callsite. Most of the time, however, the
    binary implementation will still be used.
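
    The choice can be made at compile time with __builtin_constant_p();
    a simplified sketch (the real macro also stashes constant formats in
    a dedicated section):

        /*
         * Simplified sketch: constant formats go through the fast
         * binary path, dynamic formats fall back to the raw one.
         */
        #define trace_printk(fmt, args...)                              \
        do {                                                            \
                if (__builtin_constant_p(fmt))                          \
                        __trace_bprintk(_THIS_IP_, fmt, ##args);        \
                else                                                    \
                        __trace_printk(_THIS_IP_, fmt, ##args);         \
        } while (0)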

    The other impact of this patch is that mmiotrace_printk() will use
    the old implementation, because it calls the low-level trace_vprintk
    and we can't guess whether the format passed to it is dynamic or not.

    Some parts of this patch have been written by Steven Rostedt (most
    notably the part that chooses the appropriate implementation for
    each callsite).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Frederic Weisbecker
     

12 Mar, 2009

1 commit


10 Mar, 2009

1 commit

  • Impact: clean up and enhancement

    The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its
    ability to save data as well as to print the record out. Working with
    Ingo Molnar, we came up with a new format that is much more pleasing
    to the eye of C developers. This new macro is more C-style than the
    old macro, and it is more obvious what it does.

    Here's the example. The only macro updated in this patch is the
    sched_switch trace point.

    The old method looked like this:

    TRACE_EVENT_FORMAT(sched_switch,
            TP_PROTO(struct rq *rq, struct task_struct *prev,
                     struct task_struct *next),
            TP_ARGS(rq, prev, next),
            TP_FMT("task %s:%d ==> %s:%d",
                   prev->comm, prev->pid, next->comm, next->pid),
            TRACE_STRUCT(
                    TRACE_FIELD(pid_t, prev_pid, prev->pid)
                    TRACE_FIELD(int, prev_prio, prev->prio)
                    TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
                                        next_comm,
                                        TP_CMD(memcpy(TRACE_ENTRY->next_comm,
                                                      next->comm,
                                                      TASK_COMM_LEN)))
                    TRACE_FIELD(pid_t, next_pid, next->pid)
                    TRACE_FIELD(int, next_prio, next->prio)
            ),
            TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d")
    );

    The above method is hard to read and requires two format fields.

    The new method:

    /*
     * Tracepoint for task switches, performed by the scheduler:
     *
     * (NOTE: the 'rq' argument is not used by generic trace events,
     *        but used by the latency tracer plugin.)
     */
    TRACE_EVENT(sched_switch,

            TP_PROTO(struct rq *rq, struct task_struct *prev,
                     struct task_struct *next),

            TP_ARGS(rq, prev, next),

            TP_STRUCT__entry(
                    __array( char,  prev_comm,  TASK_COMM_LEN )
                    __field( pid_t, prev_pid )
                    __field( int,   prev_prio )
                    __array( char,  next_comm,  TASK_COMM_LEN )
                    __field( pid_t, next_pid )
                    __field( int,   next_prio )
            ),

            TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
                      __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
                      __entry->next_comm, __entry->next_pid, __entry->next_prio),

            TP_fast_assign(
                    memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
                    __entry->prev_pid  = prev->pid;
                    __entry->prev_prio = prev->prio;
                    memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
                    __entry->next_pid  = next->pid;
                    __entry->next_prio = next->prio;
            )
    );

    This macro is called TRACE_EVENT; it is broken up into 5 parts:

    TP_PROTO: the prototype of the trace point
    TP_ARGS: the arguments of the trace point
    TP_STRUCT__entry: the structure layout of the entry in the ring buffer
    TP_printk: the printk format
    TP_fast_assign: the method used to write the entry into the ring buffer

    The structure is the definition of how the event will be saved in
    the ring buffer. The printk is used by the internal tracing in case
    of an oops, when the kernel needs to print out the format of the
    record to the console. Thus TP_printk gives a means to show the
    records in a human-readable format. It is also used to print out the
    data from the trace file.

    The TP_fast_assign is executed directly. It is basically like a C
    function, where __entry is the handle to the record.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Mar, 2009

3 commits

  • Impact: cleanup

    Remove a few leftovers and clean up the code a bit.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: faster and lighter tracing

    Now that we have trace_bprintk(), which is faster and consumes less
    memory than trace_printk() and serves the same purpose, we can drop
    the old implementation in favour of the binary one. This means we
    move the whole implementation of trace_bprintk() into
    trace_printk(), so the API doesn't change, except that we must now
    use trace_seq_bprintk() to print the TRACE_PRINT entries.

    Some changes result from this:

    - Previously, trace_bprintk depended on a single tracer and couldn't
    work without it. This tracer has been dropped, and the whole
    implementation of trace_printk() (including the module format
    management) is now integrated into the tracing core (it comes with
    CONFIG_TRACING), though we keep the file trace_printk.c (previously
    trace_bprintk.c) where the module management lives. Thus we don't
    bloat trace.c.

    - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

    - changes the trace_printk/trace_vprintk macros a bit to support
    non-builtin constant formats, and fixes 'const' qualifier warnings.
    But this is all transparent to developers.

    - etc...

    V2:

    - Rebase against last changes
    - Fix misspelling in the changelog

    V3:

    - Rebase against last changes (moving trace_printk() to kernel.h)

    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: save on memory for tracing

    Current tracers typically use a struct (like struct ftrace_entry,
    struct ctx_switch_entry, struct special_entry, etc.) to record a
    binary event. Each of these structs can only record its own kind of
    event, and a new kind of tracer needs a new struct and a lot of code
    to handle it.

    So we need a generic binary record for events. This infrastructure
    serves that purpose.
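
    Usage would look roughly like this (a sketch: the entry struct and
    event type id are made up, the commit helper is the one named
    elsewhere in this log, and the reserve helper's name is an
    assumption):

        /*
         * Sketch: reserve a generic binary record, fill it, commit it.
         * 'struct my_entry' and TRACE_MY_EVENT are illustrative.
         */
        struct ring_buffer_event *event;
        struct my_entry *entry;

        event = trace_current_buffer_lock_reserve(TRACE_MY_EVENT,
                                                  sizeof(*entry),
                                                  flags, pc);
        if (!event)
                return;
        entry = ring_buffer_event_data(event);
        entry->pid = current->pid;
        trace_current_buffer_unlock_commit(event, flags, pc);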

    [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
    tracing as reported by Steven Rostedt]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

05 Mar, 2009

2 commits

  • Impact: cleanup

    Use a more generic name - this also allows the prototype to move
    to kernel.h and be generally available to kernel developers who
    want to do some quick tracing.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • With the removal of the latency_trace file, we lost the ability to
    see some of the finer details in a trace, such as whether interrupts
    are enabled, the preempt count, need-resched, and whether we are in
    an interrupt handler or a softirq handler.

    This patch simply creates an option to bring back the old format. It
    also removes the warning about an unused variable that held the
    latency_trace file operations.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 Mar, 2009

1 commit

  • Impact: new feature

    This patch creates a directory of files that correspond to the
    per CPU ring buffers. These are binary files and are made to
    be used with splice. This is the fastest way to extract data from
    the ftrace ring buffers.
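
    A hypothetical user-space consumer (the file name trace_pipe_raw and
    its exact location are assumptions here; the point is that splice()
    moves ring-buffer pages around without copying them):

        /*
         * Illustrative user-space reader: splice pages from a per-cpu
         * binary buffer file into a pipe, then into a data file.
         */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
                int pfd[2];
                int in = open("/debug/tracing/per_cpu/cpu0/trace_pipe_raw",
                              O_RDONLY);        /* path is an assumption */
                int out = open("trace.dat", O_WRONLY | O_CREAT, 0644);

                if (in < 0 || out < 0 || pipe(pfd) < 0)
                        return 1;
                for (;;) {
                        ssize_t n = splice(in, NULL, pfd[1], NULL,
                                           4096, SPLICE_F_MOVE);
                        if (n <= 0)
                                break;
                        splice(pfd[0], NULL, out, NULL, n, SPLICE_F_MOVE);
                }
                return 0;
        }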

    Thanks to Jiaying Zhang for pushing me to get this code fixed,
    and to Eduard - Gabriel Munteanu for his splice code that helped
    me debug my code.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

03 Mar, 2009

2 commits

  • This patch adds the "format" file to the trace point event directory.
    This is based off of work by Tom Zanussi, in which a file is
    exported to be read from user land, such that a user-space app may
    read the binary record stored in the ring buffer.

    # cat /debug/tracing/events/sched/sched_switch/format
    field:pid_t prev_pid; offset:12; size:4;
    field:int prev_prio; offset:16; size:4;
    field special:char next_comm[TASK_COMM_LEN]; offset:20; size:16;
    field:pid_t next_pid; offset:36; size:4;
    field:int next_prio; offset:40; size:4;

    Idea-from: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: clean up

    The trace_seq functions may be used separately, outside of the
    ftrace iterator, and trace_seq_reset is needed for these standalone
    uses.

    This patch also renames trace_seq_reset to the more appropriate
    trace_seq_init.
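
    A sketch of the standalone use this enables:

        /* Sketch: a trace_seq used on its own, outside the iterator. */
        struct trace_seq s;

        trace_seq_init(&s);
        trace_seq_printf(&s, "pid %d\n", current->pid);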

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

28 Feb, 2009

3 commits

  • This patch adds the interface to enable the C-style trace points.
    In the directory /debug/tracing/events/subsystem/event we now have
    three files:

    enable : values 0 or 1 to enable or disable the trace event.

    available_types: values 'raw' and 'printf' which indicate the tracing
    types available for the trace point. If a developer does not
    use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
    macro, then only 'printf' will be available. This file is
    read only.

    type: values 'raw' or 'printf'. This indicates which type of tracing
    is active for that trace point. 'printf' is the default and
    if 'raw' is not available, this file is read only.

    # echo raw > /debug/tracing/events/sched/sched_wakeup/type
    # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable

    Will enable the C style tracing for the sched_wakeup trace point.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: lower overhead tracing

    The current event tracer can automatically pick up trace points that
    are registered with the TRACE_FORMAT macro. But it requires a printf
    format string and parsing. Although this adds the ability to get
    guaranteed information like task names, it takes a hit in processing
    overhead. This processing can add about 500-1000 nanoseconds of
    overhead, but in some cases even that is considered too much, and we
    want to shave off as much of it as possible.

    Tom Zanussi recently posted tracing patches to lkml that are based
    on a nice idea about capturing the data via C structs using
    STRUCT_ENTER, STRUCT_EXIT type of macros.

    I liked that method very much, but did not like the implementation
    that required a developer to add data/code in several disjoint
    locations.

    This patch extends the event_tracer macros to do a similar "raw C"
    approach that Tom Zanussi did. But instead of having the developers
    needing to tweak a bunch of code all over the place, they can do it
    all in one macro - preferably placed near the code that it is
    tracing. That makes it much more likely that tracepoints will be
    maintained on an ongoing basis by the code they modify.

    The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
    a developer may still utilize the more low level DECLARE_TRACE macros
    if they don't care about getting their traces automatically in the event
    tracer.)

    They can also use the existing TRACE_FORMAT if they don't need to code
    the tracepoint in C, but just want to use the convenience of printf.

    So if the developer wants to "hardwire" a tracepoint in the fastest
    possible way, and wants to acquire their data via a user space utility
    in a raw binary format, or wants to see it in the trace output but not
    sacrifice any performance, then they can implement the faster but
    more complex TRACE_EVENT_FORMAT macro.

    Here's what usage looks like:

    TRACE_EVENT_FORMAT(name,
            TPPROTO(proto),
            TPARGS(args),
            TPFMT(fmt, fmt_args),
            TRACE_STRUCT(
                    TRACE_FIELD(type1, item1, assign1)
                    TRACE_FIELD(type2, item2, assign2)
                    [...]
            ),
            TPRAWFMT(raw_fmt)
    );

    Note that name, proto, args, and fmt are all identical to what
    TRACE_FORMAT uses.

    name: the unique identifier of the trace point
    proto: the prototype that the trace point uses
    args: the args in the prototype
    fmt: printf format to use with the event printf tracer
    fmt_args: the printf arguments to match fmt

    TRACE_STRUCT starts the definition of a structure. Each item in the
    structure is defined with a TRACE_FIELD:

    TRACE_FIELD(type, item, assign)

    type: the C type of item
    item: the name of the item in the structure
    assign: what to assign to the item in the trace point callback

    raw_fmt is a way to pretty print the struct. It must match the order
    in which the items are added in TRACE_STRUCT.

    An example of this would be:

    TRACE_EVENT_FORMAT(sched_wakeup,
            TPPROTO(struct rq *rq, struct task_struct *p, int success),
            TPARGS(rq, p, success),
            TPFMT("task %s:%d %s",
                  p->comm, p->pid, success ? "succeeded" : "failed"),
            TRACE_STRUCT(
                    TRACE_FIELD(pid_t, pid, p->pid)
                    TRACE_FIELD(int, success, success)
            ),
            TPRAWFMT("task %d success=%d")
    );

    This creates a unique struct of:

    struct {
            pid_t   pid;
            int     success;
    };

    And the way the callback would assign these values would be:

    entry->pid = p->pid;
    entry->success = success;

    The nice part about this is that the creation of the assignment is
    done via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT
    is created, the developer then has a faster method to record into
    the ring buffer. They do not need to worry about the tracer itself.

    The developer only needs to touch the files in include/trace/*.h

    Again, I would like to give special thanks to Tom Zanussi for this
    nice idea.

    Idea-from: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Right now all tracers must manage their own trace buffers. This was
    to force tracers to be independent, in case we finally decide to
    allow each tracer to have its own trace buffer.

    But now we are adding event tracing, which writes to the current
    tracer's buffer. This adds an interface to allow events to write to
    the current tracer's buffer without having to manage their own,
    since event tracing has no "tracer" and is just a way to hook into
    any other tracer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

25 Feb, 2009

2 commits

  • Now that several per-cpu files can be read or spliced at the same
    time, we want the read/splice callbacks for tracing files to be
    reentrant.

    Until now, a single global mutex (trace_types_lock) serialized
    access to tracing_read_pipe(), tracing_splice_read_pipe(), and the
    seq helpers.

    That means that if a user tries to read trace_pipe0 and trace_pipe1
    at the same time, access to the function tracing_read_pipe() is
    contended and one reader must wait for the other to finish its read
    call.

    The trace_types_lock mutex is mostly here to serialize access to the
    global current tracer (current_trace), which can be changed
    concurrently. Although the iter struct keeps a private pointer to
    this tracer, its callbacks can be changed by another function.

    The method used here is to no longer keep a private reference to the
    tracer inside the iterator, but to make a copy of it there instead.
    Then we check on subsequent read calls whether the tracer has
    changed. This is not costly, because the current tracer is not
    expected to change often, so we use a branch prediction hint for
    that.
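
    A sketch of that check (variable and field names are assumptions):

        /*
         * Sketch: on each read, re-copy the tracer if it changed under
         * us; the copy keeps the iterator's callbacks self-consistent.
         */
        if (unlikely(old_tracer != current_trace && current_trace)) {
                old_tracer = current_trace;
                *iter->trace = *current_trace;  /* copy, not a reference */
        }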

    Moreover, we add a private mutex to the iterator (there is one
    iterator per file descriptor) to serialize accesses in case of
    multiple consumers per file descriptor (which would be a silly thing
    for the user to do). Note that this is not to protect the ring
    buffer, since the ring buffer already serializes the readers'
    accesses. This is to prevent trace weirdness in case of concurrent
    consumers. But these mutexes can be dropped anyway; that would not
    result in any crash. Just tell me what you think about it.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: split up tracing output per cpu

    Currently, in the tracing debugfs directory, three files are
    available to the user for extracting the trace output:

    - trace: an iterator through the ring-buffer. It's a reader, but not
    a consumer. It doesn't block when no more traces are available.

    - latency_trace: pretty similar to the former, except that it adds
    more information such as the preempt count, irq flags, ...

    - trace_pipe: a reader and a consumer; it will also block waiting
    for traces if necessary (heh, yes, it's a pipe).

    The traces coming from different cpus are currently mixed up inside
    these files. Sometimes it messes up the information, sometimes it's
    useful, depending on what the tracer captures.

    The tracing_cpumask file is useful to filter the output and select
    only the traces captured by a custom-defined set of cpus. But it is
    still not powerful enough to extract one trace buffer per cpu at the
    same time.

    So this patch creates a new directory: /debug/tracing/per_cpu/.

    Inside this directory, you will now find one trace_pipe file and
    one trace file per cpu.

    Which means if you have two cpus, you will have:

    trace0
    trace1
    trace_pipe0
    trace_pipe1

    And of course, reading these files will have the same effect as with
    the usual tracing files, except that you will only see the traces
    from the given cpu.

    The original all-in-one-cpu trace files are still available in their
    original place.

    Until now, only one consumer was allowed on trace_pipe, to avoid
    racy consumption of the ring-buffer. Now the approach has changed a
    bit: you can have only one consumer per cpu.

    Which means you are allowed to read trace_pipe0 and trace_pipe1
    concurrently, but you can't have two readers on trace_pipe0 or
    trace_pipe1.

    Following the same logic, if there is one reader on the common
    trace_pipe, you cannot at the same time have another reader on
    trace_pipe0 or trace_pipe1, because trace_pipe is in essence already
    a consumer of all the cpu buffers.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

18 Feb, 2009

1 commit

  • Impact: api and pipe waiting change

    Currently, the waiting in tracing_read_pipe() is done through a
    100 msecs schedule_timeout() loop which periodically checks whether
    there are traces in the buffer.

    This can cause small latencies for programs which are reading the
    incoming events.

    This patch makes the reader wait on the trace_wait waitqueue, except
    for a few tracers, such as the sched and function tracers, which
    might already hold the runqueue lock while waking up the reader.

    This is performed through a new wait_pipe() callback on struct
    tracer. If a tracer doesn't implement one, the default wait on the
    trace_wait queue is used.
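
    A sketch of what the default callback might look like, assuming the
    classic waitqueue pattern on trace_wait (the name and details are
    assumptions):

        /*
         * Sketch: sleep on trace_wait until a new trace event wakes the
         * reader; tracers that cannot safely wake a waitqueue override
         * this callback with a polling loop instead.
         */
        void default_wait_pipe(struct trace_iterator *iter)
        {
                DEFINE_WAIT(wait);

                prepare_to_wait(&trace_wait, &wait, TASK_INTERRUPTIBLE);
                if (trace_empty(iter))
                        schedule();
                finish_wait(&trace_wait, &wait);
        }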

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

10 Feb, 2009

1 commit