19 Nov, 2010

1 commit

  • Currently, something like the sched_switch event contains:

    field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1;

    When a userspace tool such as perf tries to parse this, the
    TASK_COMM_LEN is meaningless. This happens because the TRACE_EVENT() macro
    simply uses #len to stringify the length argument. When the length is
    an enum constant, we get a string that means nothing to tools.

    By adding a static buffer and a mutex to protect it, we can format the
    actual number into that buffer with snprintf and print that instead.
    Now we get:

    field:char prev_comm[16]; offset:12; size:16; signed:1;

    Something much more useful.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
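
    A minimal userspace illustration of the difference (the helper names here
    are made up; the real change lives in the TRACE_EVENT() format-file macros):

    #include <stdio.h>

    #define TASK_COMM_LEN 16

    /* What the macro effectively did before: #len stringifies the token. */
    #define LEN_AS_STRING(len) #len

    /* What the fix does: format the evaluated value into a buffer. */
    static const char *len_as_number(char *buf, size_t sz, int len)
    {
        snprintf(buf, sz, "%d", len);
        return buf;
    }

    int main(void)
    {
        char buf[16];

        /* prints "TASK_COMM_LEN" -- meaningless to a parser such as perf */
        printf("field:char prev_comm[%s];\n", LEN_AS_STRING(TASK_COMM_LEN));

        /* prints "16" -- what tools actually need */
        printf("field:char prev_comm[%s];\n",
               len_as_number(buf, sizeof(buf), TASK_COMM_LEN));
        return 0;
    }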
     

18 Nov, 2010

2 commits

  • As with the raw syscall events, individual syscall events won't
    leak system-wide information under task-bound tracing. Allow
    non-privileged users to use them in that workflow.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     
  • This adds a new internal trace event flag that allows such events to be
    used in perf by non-privileged users for task-bound tracing.

    This is desired for the syscall tracepoints because, unlike some other
    tracepoints, they don't leak global system information.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     

10 Sep, 2010

1 commit

  • Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

    1) We disable the counter:
       a) the pmu has per-counter enable bits, we flip that
       b) we program a NOP event, preserving the counter state

    2) We store the counter state and ignore all read/overflow events

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
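
    For reference, a sketch of the reworked callback set; the flag names follow
    this series, but compare include/linux/perf_event.h for the authoritative
    definitions:

    struct perf_event;

    #define PERF_EF_START   0x01    /* start the counter when adding    */
    #define PERF_EF_RELOAD  0x02    /* reload the counter when starting */
    #define PERF_EF_UPDATE  0x04    /* update the counter when stopping */

    struct pmu_sketch {
        /* add/del: schedule the event on/off the PMU, optionally started */
        int  (*add)(struct perf_event *event, int flags);   /* PERF_EF_START  */
        void (*del)(struct perf_event *event, int flags);

        /* start/stop: run/halt an event that stays scheduled on the PMU */
        void (*start)(struct perf_event *event, int flags); /* PERF_EF_RELOAD */
        void (*stop)(struct perf_event *event, int flags);  /* PERF_EF_UPDATE */
    };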
     

19 Aug, 2010

1 commit

  • ftrace_event_call->perf_events, perf_trace_buf,
    fgraph_data->cpu_data and some local variables are percpu pointers
    missing __percpu markups. Add them.

    Signed-off-by: Namhyung Kim
    Acked-by: Tejun Heo
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Namhyung Kim
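
    A generic sketch of what the annotation buys (not the exact declarations
    touched by this patch): sparse flags direct dereferences of __percpu
    pointers, so accesses must go through the per-cpu helpers:

    #include <linux/percpu.h>
    #include <linux/errno.h>

    struct scratch_sketch {
        char data[256];
    };

    /* __percpu: this pointer holds a per-cpu address, not a normal one */
    static struct scratch_sketch __percpu *scratch;

    static int scratch_setup(void)
    {
        scratch = alloc_percpu(struct scratch_sketch);
        if (!scratch)
            return -ENOMEM;

        this_cpu_ptr(scratch)->data[0] = '\0';  /* checked accessor */
        return 0;
    }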
     

21 Jul, 2010

2 commits

  • __print_flags() and __print_symbolic() use a percpu trace_seq:

    1) Its memory is allocated at compile time, so it wastes memory even if tracing is unused.
    2) It is percpu data, so it wastes even more memory on multi-CPU systems.
    3) It disables preemption when it executes its core routine,
    "trace_seq_printf(s, "%s: ", #call);", which introduces latency.

    So we move this trace_seq into struct trace_iterator.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
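
    Roughly, the change trades a compile-time per-cpu buffer plus
    preempt-disable for a scratch trace_seq embedded in the iterator; a
    before/after sketch (the member name is illustrative):

    #include <linux/ftrace_event.h>
    #include <linux/percpu.h>

    /* Before (sketch): compile-time per-cpu scratch buffer, used with
     * preemption disabled around the formatting. */
    static DEFINE_PER_CPU(struct trace_seq, event_seq_sketch);

    static void format_before_sketch(void)
    {
        struct trace_seq *p = &get_cpu_var(event_seq_sketch);

        trace_seq_init(p);
        /* ... __print_flags()/__print_symbolic() format into p ... */
        put_cpu_var(event_seq_sketch);
    }

    /* After (sketch): the scratch trace_seq is a member of the iterator, so
     * there is no static per-cpu allocation and no preempt_disable here. */
    static void format_after_sketch(struct trace_iterator *iter)
    {
        struct trace_seq *p = &iter->tmp_seq;   /* member name illustrative */

        trace_seq_init(p);
        /* ... format into p ... */
    }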
     
  • We found that even enabling a single trace event that will rarely be
    triggered can add big overhead to context switch.

    (lmbench context switch test)
    -------------------------------------------------
    2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
    ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
    ------ ------ ------ ------ ------ ------- -------
    2.19   2.3    2.21   2.56   2.13   2.54    2.07    (no event enabled)
    2.39   2.51   2.35   2.75   2.27   2.81    2.24    (one event enabled)

    The overhead is 6% ~ 11%.

    This is because when a trace event is enabled, three tracepoints (sched_switch,
    sched_wakeup, sched_wakeup_new) are activated to map pids to command names.

    We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
    that allows cmdline recording to be disabled.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

29 Jun, 2010

1 commit

  • Because kprobes and syscalls need special processing to register
    events, the class->reg() method was created to handle the differences.

    But instead of creating a default ->reg for perf and ftrace events,
    the code was scattered with:

    if (class->reg)
            class->reg();
    else
            default_reg();

    This is messy and can also lead to bugs.

    This patch cleans up this code and creates a default reg() entry for
    the events, allowing the code to call class->reg() directly,
    without the condition.

    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
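
    A sketch of the resulting shape (the helper name is hypothetical and the
    perf cases are elided):

    #include <linux/ftrace_event.h>
    #include <linux/tracepoint.h>

    /* Install a default ->reg() for plain TRACE_EVENT() events so call sites
     * never need the NULL check. */
    static int default_event_reg(struct ftrace_event_call *call,
                                 enum trace_reg type)
    {
        switch (type) {
        case TRACE_REG_REGISTER:
            return tracepoint_probe_register(call->name,
                                             call->class->probe, call);
        case TRACE_REG_UNREGISTER:
            tracepoint_probe_unregister(call->name,
                                        call->class->probe, call);
            return 0;
        default:    /* perf register/unregister elided */
            return 0;
        }
    }

    /* Every call site can now simply do:
     *     ret = call->class->reg(call, TRACE_REG_REGISTER);
     */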
     

28 May, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
    tracing: Add __used annotation to event variable
    perf, trace: Fix !x86 build bug
    perf report: Support multiple events on the TUI
    perf annotate: Fix up usage of the build id cache
    x86/mmiotrace: Remove redundant instruction prefix checks
    perf annotate: Add TUI interface
    perf tui: Remove annotate from popup menu after failure
    perf report: Don't start the TUI if -D is used
    perf: Fix getline undeclared
    perf: Optimize perf_tp_event_match()
    perf: Remove more code from the fastpath
    perf: Optimize the !vmalloc backed buffer
    perf: Optimize perf_output_copy()
    perf: Fix wakeup storm for RO mmap()s
    perf-record: Share per-cpu buffers
    perf-record: Remove -M
    perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
    perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
    perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
    perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
    ...

    Linus Torvalds
     

21 May, 2010

4 commits

  • …nux-2.6-tip into trace/tip/tracing/core-7

    Conflicts:
    include/linux/ftrace_event.h
    include/trace/ftrace.h
    kernel/trace/trace_event_perf.c
    kernel/trace/trace_kprobe.c
    kernel/trace/trace_syscalls.c

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

    Steven Rostedt
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
    [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
    [SCSI] aacraid: prohibit access to array container space
    [SCSI] aacraid: add support for handling ATA pass-through commands.
    [SCSI] aacraid: expose physical devices for models with newer firmware
    [SCSI] aacraid: respond automatically to volumes added by config tool
    [SCSI] fcoe: fix fcoe module ref counting
    [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
    [SCSI] libfcoe: Fix incorrect MAC address clearing
    [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
    [SCSI] libfc: Move the port_id into lport
    [SCSI] fcoe: move link speed checking into its own routine
    [SCSI] libfc: Remove extra pointer check
    [SCSI] libfc: Remove unused fc_get_host_port_type
    [SCSI] fcoe: fixes wrong error exit in fcoe_create
    [SCSI] libfc: set seq_id for incoming sequence
    [SCSI] qla2xxx: Updates to ISP82xx support.
    [SCSI] qla2xxx: Optionally disable target reset.
    [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
    [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
    [SCSI] qla2xxx: T10 DIF support added.
    ...

    Linus Torvalds
     
  • Avoid the swevent hash-table by using per-tracepoint
    hlists.

    Also, avoid conditionals on the fast path by ordering
    with probe unregister so that we should never get on
    the callback path without the data being there.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
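
    Schematically (a sketch, not the exact kernel structures):

    #include <linux/perf_event.h>
    #include <linux/percpu.h>
    #include <linux/rculist.h>

    /* Each tracepoint now owns per-cpu hlist heads of the perf events attached
     * to it (the real field lives in struct ftrace_event_call). */
    struct tp_events_sketch {
        struct hlist_head __percpu *perf_events;
    };

    static void deliver_sketch(struct tp_events_sketch *call)
    {
        struct hlist_head *head = this_cpu_ptr(call->perf_events);
        struct hlist_node *node;
        struct perf_event *event;

        /* No global swevent hash lookup: only events attached to this
         * tracepoint on this CPU are walked; unregistration is ordered so the
         * list is valid whenever the probe can run.  (Four-argument iterator
         * form as in kernels of this era.) */
        hlist_for_each_entry_rcu(event, node, head, hlist_entry)
            ;   /* hand the trace record to 'event' (elided) */
    }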
     
  • Improves performance.

    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     


15 May, 2010

8 commits

  • The flags variable is protected by the event_mutex when modifying,
    but the event_mutex is not held when reading the variable.

    This is due to the fact that the reads occur in critical sections where
    taking a mutex (or even a spinlock) is not wanted.

    But for the two flags that exist (enable and filter_active), the code is
    written so that the reads do not need a lock.

    The enable flag is used just to know if the event is enabled or not
    and its use is always under the event_mutex. Whether or not the event
    is actually enabled is really determined by the tracepoint being
    registered. The flag is just a way to let the code know if the tracepoint
    is registered.

    The filter_active is different. It is read without the lock. If it
    is set, then the event probes jump to the filter code. There can be a
    slight mismatch between filters available and filter_active. If the flag is
    set but no filters are available, the code safely jumps to a filter nop.
    If the flag is not set and the filters are available, then the filters
    are skipped. This is acceptable since filters are usually set before
    tracing starts, or they are set by humans, who will not notice the slight
    delay that this causes.

    v2: Fixed typo: "cacheing" -> "caching"

    Reported-by: Mathieu Desnoyers
    Acked-by: Mathieu Desnoyers
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The filter_active and enable fields each use an int (4 bytes) to
    hold a single flag. We can save 4 bytes per event by combining the
    two into a single integer.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4894944 1018052 861512 6774508 675eec vmlinux.id
    4894871 1012292 861512 6768675 674823 vmlinux.flags

    This gives us another 5K in savings.

    The modification of both the enable and filter fields are done
    under the event_mutex, so it is still safe to combine the two.

    Note: Although Mathieu gave his Acked-by, he would like it documented
    that the reads of flags are not protected by the mutex. The way the
    code works, these reads will not break anything, but will have a
    residual effect. Since this behavior is the same even before this
    patch, describing this situation is left to another patch, as this
    patch does not change the behavior, but just brought it to Mathieu's
    attention.

    v2: Updated the event trace self test for this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
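
    A sketch of the combined field (the exact bit names live in
    include/linux/ftrace_event.h):

    enum {
        TRACE_EVENT_FL_ENABLED_BIT,
        TRACE_EVENT_FL_FILTERED_BIT,
    };

    enum {
        TRACE_EVENT_FL_ENABLED  = (1 << TRACE_EVENT_FL_ENABLED_BIT),
        TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
    };

    /* One int replaces the two separate 'enable' and 'filter_active' ints;
     * it is still only modified under event_mutex. */
    struct event_flags_sketch {
        unsigned int flags;
    };

    /* 'call->enable = 1'          becomes: call->flags |= TRACE_EVENT_FL_ENABLED;
     * 'if (call->filter_active)'  becomes: if (call->flags & TRACE_EVENT_FL_FILTERED) */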
     
  • Now that the trace_event structure is embedded in the ftrace_event_call
    structure, there is no need for the ftrace_event_call id field.
    The id field is the same as the trace_event type field.

    Removing the id and re-arranging the structure brings down the tracepoint
    footprint by another 5K.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4895024 1023812 861512 6780348 6775bc vmlinux.print
    4894944 1018052 861512 6774508 675eec vmlinux.id

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, every event has its own trace_event structure. This is
    fine since the structure is needed anyway. But the print function
    structure (trace_event_functions) is now separate. Since the output
    of the trace event is done by the class (with the exception of events
    defined by DEFINE_EVENT_PRINT), it makes sense to have the class
    define the print functions that all events in the class can use.

    This matters most for the syscall events, since all syscall events
    use the same class. The savings here are another 30K.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint
    4895024 1023812 861512 6780348 6775bc vmlinux.print

    To accomplish this, and to let the class know what event is being
    printed, the event structure is embedded in the ftrace_event_call
    structure. This should not be an issue since the event structure
    was created for each event anyway.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Multiple events may use the same method to print their data.
    Instead of having every event hold a pointer to its print functions,
    the trace_event structure now points to a trace_event_functions structure
    that holds the way to print out the event.

    The event itself is now passed to the print function to let the print
    function know what kind of event it should print.

    This opens the door to consolidating the way several events print
    their output.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

    This change slightly increases the size but is needed for the next change.

    v3: Fix the branch tracer events to handle this change.

    v2: Fix the new function graph tracer event calls to handle this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
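
    A sketch of the split, close to the definitions of that time but with
    illustrative names:

    #include <linux/list.h>

    struct trace_iterator;
    struct trace_event_sketch;

    /* Returns enum print_line_t in the real code; int here keeps the sketch
     * self-contained. */
    typedef int (*trace_print_func_sketch)(struct trace_iterator *iter, int flags,
                                           struct trace_event_sketch *event);

    /* The print methods now live in their own structure... */
    struct trace_event_functions_sketch {
        trace_print_func_sketch trace;
        trace_print_func_sketch raw;
        trace_print_func_sketch hex;
        trace_print_func_sketch binary;
    };

    /* ...and the event just points at it.  The event itself is passed to the
     * print function, so a callback shared by several events knows which one
     * it is printing. */
    struct trace_event_sketch {
        struct hlist_node                   node;
        struct list_head                    list;
        int                                 type;
        struct trace_event_functions_sketch *funcs;
    };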
     
  • The raw_init function pointer in the event is used to initialize
    various kinds of events. The type of initialization needed usually
    depends on the class of the event.

    Two events with the same class will always have the same initialization
    function, so it makes sense to move this to the class structure.

    Perhaps even making a special system structure would work since
    the initialization is the same for all events within a system.
    But since there's no system structure (yet), this will just move it
    to the class.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields
    4900382 1048964 861512 6810858 67ecea vmlinux.init

    The text grew very slightly, but this is a constant growth caused by
    changes to the C files that call the init code.
    The bigger savings are in data, and they grow the more events share
    a class.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Move the defined fields from the event to the class structure.
    Since the fields of the event are defined by the class they belong
    to, it makes sense to have the class hold the information instead
    of the individual events. The events of the same class would just
    hold duplicate information.

    After this change the size of the kernel dropped another 3K:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900252 1057412 861512 6819176 680d68 vmlinux.regs
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields

    Although the text increased, this was mainly due to the C files
    having to adapt to the change. This is a constant increase, and
    new tracepoints will not increase the text further. But the big drop is
    in the data size (as well as needed allocations to hold the fields).
    This will give even more savings as more tracepoints are created.

    Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
    with several DEFINE_EVENT()s, then the savings will be lost. But
    we are pushing developers to consolidate events with DEFINE_EVENT()
    so this should not be an issue.

    Kprobes define a unique class for every new event, but they are dynamic,
    so this should not be an issue.

    The syscalls, however, have a single class, but the fields of the individual
    events differ. The syscalls use metadata to define the
    fields. I moved the fields list from the event to the metadata and
    added a "get_fields()" function to the class. This function is used
    to find the fields. For normal events and kprobes, get_fields() just
    returns a pointer to the fields list_head in the class. For syscall
    events, it returns the fields list_head in the metadata for the event.

    v2: Fixed the syscall fields. The syscall metadata needs a list
    of fields for both enter and exit.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Cc: Tom Zanussi
    Cc: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
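
    A sketch of the class-level hook described above (names are illustrative):

    #include <linux/list.h>

    struct ftrace_event_call;

    /* The field list moves into the class; most classes let get_fields()
     * return &class->fields, while the syscall classes return the enter/exit
     * field list stored in the per-event syscall metadata. */
    struct event_class_fields_sketch {
        struct list_head  fields;   /* shared by all events of the class */
        struct list_head *(*get_fields)(struct ftrace_event_call *call);
    };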
     
  • This patch removes the register functions that TRACE_EVENT() generated to
    enable and disable tracepoints. The registering of an event is now done
    directly in the trace_events.c file, and tracepoint_probe_register()
    is now called directly.

    The prototypes are no longer type checked, but this should not be
    an issue since the tracepoints are created automatically by the
    macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
    other macros will catch it.

    The trace_event_class structure now holds the probes to be called
    by the callbacks. This removes needing to have each event have
    a separate pointer for the probe.

    To handle kprobes and syscalls, since they register probes in a
    different manner, a "reg" field is added to the ftrace_event_class
    structure. If the "reg" field is assigned, then it will be called for
    enabling and disabling of the probe for either ftrace or perf. To let
    the reg function know what is happening, a new enum (trace_reg) is
    created that has the type of control that is needed.

    With this new rework, the 82 kernel events and 618 syscall events
    have their footprint dramatically lowered:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint
    4900252 1057412 861512 6819176 680d68 vmlinux.regs

    The size went from 6863829 to 6819176; that's a total of 44K
    in savings. With tracepoints being added continuously, it is
    critical that the footprint be kept minimal.

    v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
    specific structure in trace_events.c.

    v4: Fixed trace self tests to check probe because regfunc no longer
    exists.

    v3: Updated to handle void *data in beginning of probe parameters.
    Also added the tracepoint: check_trace_callback_type_##call().

    v2: Changed the callback probes to pass void * and typecast the
    value within the function.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
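
    The control enum and per-class hook, sketched as the changelog describes
    them (compare include/linux/ftrace_event.h for the exact form):

    struct ftrace_event_call;

    /* What the class->reg() callback is being asked to do. */
    enum trace_reg {
        TRACE_REG_REGISTER,
        TRACE_REG_UNREGISTER,
        TRACE_REG_PERF_REGISTER,
        TRACE_REG_PERF_UNREGISTER,
    };

    struct event_class_reg_sketch {
        /* probes shared by every event of the class */
        void *probe;
        void *perf_probe;

        /* kprobes and syscalls install their own reg(); for other events it
         * starts out unassigned (a common default was added later, see the
         * 29 Jun entry above) */
        int (*reg)(struct ftrace_event_call *event, enum trace_reg type);
    };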
     

14 May, 2010

1 commit

  • This patch creates an ftrace_event_class struct that event structs point to.
    This class struct will later hold information used to modify the
    events. Currently the class struct only holds the event's system name.

    This patch slightly increases the size, but it lays the groundwork
    for other changes that make the footprint of tracepoints smaller.

    With 82 standard tracepoints, and 618 system call tracepoints
    (two tracepoints per syscall: enter and exit):

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class

    This patch also cleans up some stale comments in ftrace.h.

    v2: Fixed missing semi-colon in macro.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
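
    In outline (a sketch; the later commits in the entries above move more
    state into the class):

    /* The initial class only carries the subsystem name... */
    struct ftrace_event_class_sketch {
        const char *system;
    };

    /* ...and each event points at its class instead of carrying its own
     * 'system' string. */
    struct ftrace_event_call_sketch {
        struct ftrace_event_class_sketch *class;
        /* ... */
    };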
     

01 May, 2010

1 commit

  • __print_hex() prints the values in an array in hex (without a '0x' prefix),
    space separated, e.g.: 92 33 32 f3 ee 4d

    Signed-off-by: Li Zefan
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Kei Tokunaga
    Acked-by: Steven Rostedt
    Signed-off-by: James Bottomley

    Kei Tokunaga
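
    A userspace analogue of the output format (the kernel helper itself is
    built on the trace_seq machinery):

    #include <stdio.h>

    /* Prints buf as space-separated hex bytes without a "0x" prefix,
     * e.g. "92 33 32 f3 ee 4d" -- the format __print_hex() emits. */
    static void print_hex_like(const unsigned char *buf, int len)
    {
        int i;

        for (i = 0; i < len; i++)
            printf("%s%02x", i ? " " : "", buf[i]);
        printf("\n");
    }

    int main(void)
    {
        const unsigned char cdb[] = { 0x92, 0x33, 0x32, 0xf3, 0xee, 0x4d };

        print_hex_like(cdb, sizeof(cdb));
        return 0;
    }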
     

01 Apr, 2010

1 commit

  • Now that the ring buffer can keep track of where events are lost,
    use this information in the output of trace_pipe:

    hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
    hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    CPU:1 [LOST 673 EVENTS]
    hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
    hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
    hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem

    Even works with the function graph tracer:

    2) ! 170.098 us | }
    2) 4.036 us | rcu_irq_exit();
    2) 3.657 us | idle_cpu();
    2) ! 190.301 us | }
    CPU:2 [LOST 2196 EVENTS]
    2) 0.853 us | } /* cancel_dirty_page */
    2) | remove_from_page_cache() {
    2) 1.578 us | _raw_spin_lock_irq();
    2) | __remove_from_page_cache() {

    Note, this does not work with the iterator "trace" file, since determining
    how many events were lost requires consuming the page from the ring buffer,
    which the iterator does not do.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Mar, 2010

2 commits

  • Drop the obsolete "profile" naming used by perf for trace events.
    Perf can now do more than simple event counting, so generalize
    the API naming.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron

    Frederic Weisbecker
     
  • We are taking the wrong regs snapshot when a trace event triggers.
    Either we use get_irq_regs(), which gives us the interrupted
    registers if we are in an interrupt, or we use task_pt_regs(),
    which gives us the state from when we entered the kernel, assuming
    we are lucky enough not to be a kernel thread; for a kernel thread,
    task_pt_regs() returns the initial set of regs from when the
    thread was started.

    What we want is different. We need a hot snapshot of the regs,
    so that we can get the instruction pointer to record in the
    sample, the frame pointer for the callchain, and some other
    things.

    Let's use the new perf_fetch_caller_regs() for that.

    Comparison with perf record -e lock: -R -a -f -g
    Before:

    perf [kernel] [k] __do_softirq
    |
    --- __do_softirq
    |
    |--55.16%-- __open
    |
    --44.84%-- __write_nocancel

    After:

    perf [kernel] [k] perf_tp_event
    |
    --- perf_tp_event
    |
    |--41.07%-- lock_acquire
    | |
    | |--39.36%-- _raw_spin_lock
    | | |
    | | |--7.81%-- hrtimer_interrupt
    | | | smp_apic_timer_interrupt
    | | | apic_timer_interrupt

    The old case was producing unreliable callchains. Now having
    right frame and instruction pointers, we have the trace we
    want.

    Also, syscall and kprobe events already have the right regs;
    let's use those instead of wasting a retrieval.

    v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
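
    The usage shape, sketched (early versions of the helper also took a
    skip-frames argument):

    #include <linux/perf_event.h>
    #include <linux/ptrace.h>

    static void record_sample_sketch(void)
    {
        struct pt_regs regs;

        /* Take a hot snapshot of the caller's registers -- neither the
         * interrupted regs (get_irq_regs()) nor the kernel-entry regs
         * (task_pt_regs()) -- so the IP and frame pointer describe the
         * tracepoint call site. */
        perf_fetch_caller_regs(&regs);

        /* ... hand &regs to the perf sample along with the trace data ... */
    }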
     

01 Mar, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
    perf_event, amd: Fix spinlock initialization
    perf_event: Fix preempt warning in perf_clock()
    perf tools: Flush maps on COMM events
    perf_events, x86: Split PMU definitions into separate files
    perf annotate: Handle samples not at objdump output addr boundaries
    perf_events, x86: Remove superflous MSR writes
    perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
    perf_events, x86: AMD event scheduling
    perf_events: Add new start/stop PMU callbacks
    perf_events: Report the MMAP pgoff value in bytes
    perf annotate: Defer allocating sym_priv->hist array
    perf symbols: Improve debugging information about symtab origins
    perf top: Use a macro instead of a constant variable
    perf symbols: Check the right return variable
    perf/scripts: Tag syscall_name helper as not yet available
    perf/scripts: Add perf-trace-python Documentation
    perf/scripts: Remove unnecessary PyTuple resizes
    perf/scripts: Add syscall tracing scripts
    perf/scripts: Add Python scripting engine
    perf/scripts: Remove check-perf-trace from listed scripts
    ...

    Fix trivial conflict in tools/perf/util/probe-event.c

    Linus Torvalds
     

29 Jan, 2010

1 commit

  • Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
    gather the common code that operates on the raw event sampling buffer.
    This removes code duplicated between regular trace events, syscall
    events and kprobe events.

    Changelog v1->v2:
    - Rename function name as per Masami and Frederic's suggestion
    - Add __kprobes for ftrace_perf_buf_prepare() and make
    ftrace_perf_buf_submit() inline as per Masami's suggestion
    - Export ftrace_perf_buf_prepare since modules will use it

    Signed-off-by: Xiao Guangrong
    Acked-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Jason Baron
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Xiao Guangrong
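
    The call pattern, sketched from the changelog; the argument lists are an
    approximation, not authoritative signatures:

    #include <linux/ftrace_event.h>
    #include <linux/types.h>

    /* Hypothetical event-side caller. */
    static void emit_sample_sketch(unsigned short event_type, u64 addr)
    {
        struct { u64 val; } *entry;     /* hypothetical raw-record layout */
        unsigned long irq_flags;
        int rctx;

        /* Reserve room in the shared per-cpu raw sample buffer; this also
         * takes the perf recursion context. */
        entry = ftrace_perf_buf_prepare(sizeof(*entry), event_type,
                                        &rctx, &irq_flags);
        if (!entry)
            return;

        entry->val = 0;                 /* ... fill in the record ... */

        /* Hand the record to perf and release the recursion context. */
        ftrace_perf_buf_submit(entry, sizeof(*entry), rctx, addr, 1, &irq_flags);
    }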
     

07 Jan, 2010

2 commits

  • The previous patches added the use of the print_fmt string and changed
    the trace_define_field() function to also create the fields and
    format output for the event format files.

    text data bss dec hex filename
    5857201 1355780 9336808 16549789 fc879d vmlinux
    5884589 1351684 9337896 16574169 fce6d9 vmlinux-orig

    The above shows the size of the vmlinux after this patch set
    compared to the vmlinux-orig which is before the patch set.

    This saves us 27k of text and 1k of bss, and adds just 4k of data,
    for a total savings of 24k in size.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • This is part of a patch set that removes the show_format method
    in the ftrace event macros.

    The print_fmt field is added to hold the string that shows
    the print_fmt in the event format files. This patch only adds
    the field but it is currently not used. Later patches will use
    this field to enable us to remove the show_format field
    and function.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

28 Dec, 2009

1 commit

  • Quoted from Ingo:

    | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE -
    | it's an unnecessary Kconfig complication. If both PERF_EVENTS and
    | EVENT_TRACING is enabled we should expose generic tracepoints.
    |
    | Nor is it limited to event 'profiling', so it has become a misnomer as
    | well.

    Signed-off-by: Li Zefan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

14 Dec, 2009

3 commits

  • Like total_profile_count, struct ftrace_event_call::profile_count
    is protected by event_mutex, so it doesn't need to be atomic_t.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • Call trace_define_common_fields() in event_create_dir() only.
    This avoids trace events having to handle it in their define_fields
    callbacks and shrinks the kernel code size:

    text data bss dec hex filename
    5346802 1961864 7103260 14411926 dbe896 vmlinux.o.old
    5345151 1961864 7103260 14410275 dbe223 vmlinux.o

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • Use a generic trace_event_raw_init() function for all events' raw_init
    callbacks (except kprobes) instead of defining the same version for each
    of them.
    This shrinks the kernel code:

    text data bss dec hex filename
    5355293 1961928 7103260 14420481 dc0a01 vmlinux.o.old
    5346802 1961864 7103260 14411926 dbe896 vmlinux.o

    raw_init can't be removed, because ftrace events and kprobe events
    use different raw_init callbacks. Though it's possible to totally
    remove raw_init, I choose to leave it as it is for now.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Jason Baron
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     

10 Dec, 2009

1 commit

  • If seq_read fills the buffer, it will call s_start again on the next
    iteration with the same position. This causes a problem with the
    function_graph tracer because it consumes the iteration in order to
    determine leaf functions.

    What happens is that the iterator stores the entry, and the function
    graph plugin will look at the next entry. If that next entry is a return
    of the same function and task, then the function is a leaf and the
    function_graph plugin calls ring_buffer_read which moves the ring buffer
    iterator forward (the trace iterator still points to the function start
    entry).

    The copying of the trace_seq to the seq_file buffer will fail if the
    seq_file buffer is full. The seq_read will not show this entry.
    The next read by userspace will cause seq_read to again call s_start
    which will reuse the trace iterator entry (the function start entry).
    But the function return entry was already consumed. The function graph
    plugin will think that this entry is a nested function and not a leaf.

    To solve this, the trace code now checks the return status of the
    seq_printf (trace_print_seq). If the writing to the seq_file buffer
    fails, we set a flag in the iterator (leftover) and we do not reset
    the trace_seq buffer. On the next call to s_start, we check the leftover
    flag, and if it is set, we just reuse the trace_seq buffer and do not
    call into the plugin print functions.

    Before this patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) | __fsnotify_parent() {
    2) 0.540 us | inotify_dentry_parent_queue_event();

    After the patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) 0.548 us | __fsnotify_parent();
    2) 0.540 us | inotify_dentry_parent_queue_event();

    [
    Updated the patch to fix a missing return 0 from the trace_print_seq()
    stub when CONFIG_TRACING is disabled.

    Reported-by: Ingo Molnar
    ]

    Reported-by: Jiri Olsa
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
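
    A simplified sketch of the 'leftover' handling (the real logic is split
    across s_start()/s_show() in kernel/trace/trace.c and uses the tracer's
    internal helpers):

    static int s_show_sketch(struct seq_file *m, struct trace_iterator *iter)
    {
        /* Run the plugin print functions only when nothing is pending;
         * otherwise iter->seq still holds the line that did not fit. */
        if (!iter->leftover)
            print_trace_line(iter);             /* fills iter->seq */

        /* trace_print_seq() now reports whether the copy into the seq_file
         * buffer succeeded; on failure, keep iter->seq for the next read
         * instead of resetting it. */
        iter->leftover = trace_print_seq(m, &iter->seq);

        return 0;
    }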
     

22 Nov, 2009

1 commit

  • When we commit a trace to perf, we first check if we are
    recursing in the same buffer so that we don't mess up the buffer
    with a recursing trace. But later on, we do the same check from
    perf to avoid commit recursion. The recursion check is desired
    early, before we touch the buffer, but we want to do this check
    only once.

    Then export the recursion protection from perf and use it from
    the trace events before submitting a trace.

    v2: Put appropriate Reported-by tag

    Reported-by: Peter Zijlstra
    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

08 Nov, 2009

1 commit

  • While tracing with events through perf, if one enables the
    lockdep:lock_acquire event, it will infect every other perf
    trace event.

    Basically, you can enable whatever set of trace events through
    perf, but if this event is part of the set, the only result we
    get is a long list of lock_acquire events for the rcu read lock,
    and nothing else.

    This is because of a recursion inside perf.

    1) When a trace event is triggered, it will fill a per cpu
    buffer and submit it to perf.

    2) Perf will commit this event but will also protect some data
    using rcu_read_lock

    3) A recursion appears: rcu_read_lock triggers a lock_acquire
    event that will fill the per cpu event and then submit the
    buffer to perf.

    4) Perf detects a recursion and ignores it

    5) Perf continues its work on the previous event, but its buffer
    has been overwritten by the lock_acquire event, so it has
    been turned into a lock_acquire event for the rcu read lock

    Such scenario also happens with lock_release with
    rcu_read_unlock().

    We could turn the rcu_read_lock() into __rcu_read_lock() to drop
    the lock debugging from the perf fast path, but that would make us
    lose the rcu debugging, and it doesn't prevent other possible
    kinds of recursion inside perf in the future.

    This patch adds a recursion protection based on a counter on the
    perf trace per cpu buffers to solve the problem.

    -v2: Fixed lost whitespace, added reviewed-by tag

    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
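
    The idea in miniature (a generic sketch, not the kernel's actual per-cpu
    buffer code):

    #include <stddef.h>

    #define MAX_TRACE_NESTING 4     /* e.g. task, softirq, hardirq, NMI */

    /* Per-cpu in the real code; a single instance here for brevity. */
    static int  trace_nesting;
    static char trace_scratch[MAX_TRACE_NESTING][256];

    /* A nested event gets its own scratch buffer instead of overwriting the
     * one an outer event is still using; too-deep nesting is dropped. */
    static char *get_trace_buf_sketch(void)
    {
        if (trace_nesting >= MAX_TRACE_NESTING)
            return NULL;
        return trace_scratch[trace_nesting++];
    }

    static void put_trace_buf_sketch(void)
    {
        trace_nesting--;
    }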