07 Jan, 2010
2 commits
-
If the very unlikely case happens where the writer moves the head by one
between where the head page is read and where the new reader page
is assigned _and_ the writer then writes and wraps the entire ring buffer
so that the head page is back to what was originally read as the head page,
the page to be swapped will have a corrupted next pointer.Simple solution is to wrap the assignment of the next pointer with a
rb_list_head().Signed-off-by: Steven Rostedt
-
This reference at the end of rb_get_reader_page() was causing off-by-one
writes to the prev pointer of the page after the reader page when that
page is the head page, and therefore the reader page has the RB_PAGE_HEAD
flag in its list.next pointer. This eventually results in a GPF in a
subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
when that prev pointer is dereferenced. The dereferenced register would
characteristically have an address that appears shifted left by one byte
(eg, ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx) due to being written at
an address one byte too high.Signed-off-by: David Sharp
LKML-Reference:
Signed-off-by: Steven Rostedt
17 Dec, 2009
1 commit
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix return of trace_dump_stack()
ksym_tracer: Fix bad cast
tracing/power: Remove two exports
tracing: Change event->profile_count to be int type
tracing: Simplify trace_option_write()
tracing: Remove useless trace option
tracing: Use seq file for trace_clock
tracing: Use seq file for trace_options
function-graph: Allow writing the same val to set_graph_function
ftrace: Call trace_parser_clear() properly
ftrace: Return EINVAL when writing invalid val to set_ftrace_filter
tracing: Move a printk out of ftrace_raw_reg_event_foo()
tracing: Pull up calls to trace_define_common_fields()
tracing: Extract duplicate ftrace_raw_init_event_foo()
ftrace.h: Use common pr_info fmt string
tracing: Add stack trace to irqsoff tracer
tracing: Add trace_dump_stack()
ring-buffer: Move resize integrity check under reader lock
ring-buffer: Use sync sched protection on ring buffer resizing
tracing: Fix wrong usage of strstrip in trace_ksyms
15 Dec, 2009
3 commits
-
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Cc: linux-arch@vger.kernel.org -
Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Cc: linux-arch@vger.kernel.org -
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whateverNo functional change.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Cc: linux-arch@vger.kernel.org
11 Dec, 2009
2 commits
-
While using an application that does splice on the ftrace ring
buffer at start up, I triggered an integrity check failure.Looking into this, I discovered that resizing the buffer performs
an integrity check after the buffer is resized. This check unfortunately
is preformed after it releases the reader lock. If a reader is
reading the buffer it may cause the integrity check to trigger a
false failure.This patch simply moves the integrity checker under the protection
of the ring buffer reader lock.Signed-off-by: Steven Rostedt
-
There was a comment in the ring buffer code that says the calling
layers should prevent tracing or reading of the ring buffer while
resizing. I have discovered that the tracers do not honor this
arrangement.This patch moves the disabling and synchronizing the ring buffer to
a higher layer during resizing. This guarantees that no writes
are occurring while the resize takes place.Signed-off-by: Steven Rostedt
06 Dec, 2009
2 commits
-
…git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
x86: Fix comments of register/stack access functions
perf tools: Replace %m with %a in sscanf
hw-breakpoints: Keep track of user disabled breakpoints
tracing/syscalls: Make syscall events print callbacks static
tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
perf: Don't free perf_mmap_data until work has been done
perf_event: Fix compile error
perf tools: Fix _GNU_SOURCE macro related strndup() build error
trace_syscalls: Remove unused syscall_name_to_nr()
trace_syscalls: Simplify syscall profile
trace_syscalls: Remove duplicate init_enter_##sname()
trace_syscalls: Add syscall_nr field to struct syscall_metadata
trace_syscalls: Remove enter_id exit_id
trace_syscalls: Set event_enter_##sname->data to its metadata
trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
perf_event: Initialize data.period in perf_swevent_hrtimer()
perf probe: Simplify event naming
perf probe: Add --list option for listing current probe events
perf probe: Add argv_split() from lib/argv_split.c
perf probe: Move probe event utility functions to probe-event.c
... -
…el/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
tracing: Separate raw syscall from syscall tracer
ring-buffer-benchmark: Add parameters to set produce/consumer priorities
tracing, function tracer: Clean up strstrip() usage
ring-buffer benchmark: Run producer/consumer threads at nice +19
tracing: Remove the stale include/trace/power.h
tracing: Only print objcopy version warning once from recordmcount
tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
ring-buffer: Move access to commit_page up into function used
tracing: do not disable interrupts for trace_clock_local
ring-buffer: Add multiple iterations between benchmark timestamps
kprobes: Sanitize struct kretprobe_instance allocations
tracing: Fix to use __always_unused attribute
compiler: Introduce __always_unused
tracing: Exit with error if a weak function is used in recordmcount.pl
tracing: Move conditional into update_funcs() in recordmcount.pl
tracing: Add regex for weak functions in recordmcount.pl
tracing: Move mcount section search to front of loop in recordmcount.pl
tracing: Fix objcopy revision check in recordmcount.pl
tracing: Check absolute path of input file in recordmcount.pl
tracing: Correct the check for number of arguments in recordmcount.pl
...
17 Nov, 2009
1 commit
-
With the change of the way we process commits. Where a commit only happens
at the outer most level, and that we don't need to worry about
a commit ending after the rb_start_commit() has been called, the code
use to grab the commit page before the tail page to prevent a possible
race. But this race no longer exists with the rb_start_commit()
rb_end_commit() interface.Signed-off-by: Steven Rostedt
15 Nov, 2009
1 commit
-
Merge reason: pick up perf fixlets
Signed-off-by: Ingo Molnar
04 Nov, 2009
2 commits
-
Conflicts:
tools/perf/MakefileMerge reason: Resolve the conflict, merge to upstream and merge in
perf fixes so we can add a dependent patch.Signed-off-by: Ingo Molnar
-
We got a sudden panic when we reduced the size of the
ringbuffer.We can reproduce the panic by the following steps:
echo 1 > events/sched/enable
cat trace_pipe > /dev/null &while ((1))
do
echo 12000 > buffer_size_kb
echo 512 > buffer_size_kb
done(not more than 5 seconds, panic ...)
Reported-by: KOSAKI Motohiro
Signed-off-by: Lai Jiangshan
LKML-Reference:
Signed-off-by: Steven Rostedt
24 Oct, 2009
2 commits
-
The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Trivial patch to fix a documentation example and to fix a
comment.Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
06 Oct, 2009
1 commit
-
The sign info used for filters in the kernel is also useful to
applications that process the trace stream. Add it to the format
files and make it available to userspace.Signed-off-by: Tom Zanussi
Acked-by: Frederic Weisbecker
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: hch@infradead.org
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar
20 Sep, 2009
1 commit
-
fix the following 'make includecheck' warning:
kernel/trace/ring_buffer.c: trace.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Sam Ravnborg
LKML-Reference:
14 Sep, 2009
1 commit
-
The cmpxchg used by PowerPC does the following:
({ \
__typeof__(*(ptr)) _o_ = (o); \
__typeof__(*(ptr)) _n_ = (n); \
(__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_, \
(unsigned long)_n_, sizeof(*(ptr))); \
})This does a type check of *ptr to both o and n.
Unfortunately, the code in ring-buffer.c assigns longs to pointers
and pointers to longs and causes a warning on PowerPC:ring_buffer.c: In function 'rb_head_page_set':
ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
ring_buffer.c:704: warning: initialization makes pointer from integer without a cast
ring_buffer.c: In function 'rb_head_page_replace':
ring_buffer.c:797: warning: initialization makes integer from pointer without a castThis patch adds the typecasts inside cmpxchg to annotate that a long is
being cast to a pointer and a pointer is being casted to a long and this
removes the PowerPC warnings.Reported-by: Stephen Rothwell
Signed-off-by: Steven Rostedt
10 Sep, 2009
1 commit
-
rb_buffer_peek() operates with struct ring_buffer_per_cpu *cpu_buffer
only. Thus, instead of passing variables buffer and cpu it is better
to use cpu_buffer directly. This also reduces the risk of races since
cpu_buffer is not calculated twice.Signed-off-by: Robert Richter
LKML-Reference:
Signed-off-by: Steven Rostedt
05 Sep, 2009
2 commits
-
Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.Only the irqsoff and preemptoff tracers use this feature, and both are
not recommended for production kernels. This patch disables its use
when neither irqsoff nor preemptoff is configured.Signed-off-by: Steven Rostedt
-
Because the irqsoff tracer can swap an internal CPU buffer, it is possible
that a swap happens between the start of the write and before the committing
bit is set (the committing bit will disable swapping).This patch adds a check for this and will fail the write if it detects it.
Signed-off-by: Steven Rostedt
04 Sep, 2009
7 commits
-
Currently the way RB_WARN_ON works, is to disable either the current
CPU buffer or all CPU buffers, depending on whether a ring_buffer or
ring_buffer_per_cpu struct was passed into the macro.Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
CPU buffer gets disabled but the rest are still active. This may
confuse users even though a warning is sent to the console.This patch changes the macro to disable the entire buffer even if
the CPU buffer is passed in.Signed-off-by: Steven Rostedt
-
The latency tracers report the number of items in the trace buffer.
This uses the ring buffer data to calculate this. Because discarded
events are also counted, the numbers do not match the number of items
that are printed. The ring buffer also adds a "padding" item to the
end of each buffer page which also gets counted as a discarded item.This patch decrements the counter to the page entries on a discard.
This allows us to ignore discarded entries while reading the buffer.Decrementing the counter is still safe since it can only happen while
the committing flag is still set.Signed-off-by: Steven Rostedt
-
The function ring_buffer_event_discard can be used on any item in the
ring buffer, even after the item was committed. This function provides
no safety nets and is very race prone.An item may be safely removed from the ring buffer before it is committed
with the ring_buffer_discard_commit.Since there are currently no users of this function, and because this
function is racey and error prone, this patch removes it altogether.Note, removing this function also allows the counters to ignore
all discarded events (patches will follow).Signed-off-by: Steven Rostedt
-
When the ring buffer uses an iterator (static read mode, not on the
fly reading), when it crosses a page boundery, it will skip the first
entry on the next page. The reason is that the last entry of a page
is usually padding if the page is not full. The padding will not be
returned to the user.The problem arises on ring_buffer_read because it also increments the
iterator. Because both the read and peek use the same rb_iter_peek,
the rb_iter_peak will return the padding but also increment to the next
item. This is because the ring_buffer_peek will not incerment it
itself.The ring_buffer_read will increment it again and then call rb_iter_peek
again to get the next item. But that will be the second item, not the
first one on the page.The reason this never showed up before, is because the ftrace utility
always calls ring_buffer_peek first and only uses ring_buffer_read
to increment to the next item. The ring_buffer_peek will always keep
the pointer to a valid item and not padding. This just hid the bug.Signed-off-by: Steven Rostedt
-
The loops in the ring buffer that use cpu_relax are not dependent on
other CPUs. They simply came across some padding in the ring buffer and
are skipping over them. It is a normal loop and does not require a
cpu_relax.Signed-off-by: Steven Rostedt
-
If a commit is taking place on a CPU ring buffer, do not allow it to
be swapped. Return -EBUSY when this is detected instead.Signed-off-by: Steven Rostedt
-
The callers of reset must ensure that no commit can be taking place
at the time of the reset. If it does then we may corrupt the ring buffer.Signed-off-by: Steven Rostedt
11 Aug, 2009
1 commit
-
Conflicts:
kernel/trace/trace_events_filter.cWe use the tracing/core version.
Signed-off-by: Ingo Molnar
08 Aug, 2009
1 commit
-
I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.Signed-off-by: Eric Dumazet
LKML-Reference:
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt
06 Aug, 2009
3 commits
-
When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.Also, I simplified some code in ring_buffer_consume().
------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G W 2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[] ? rb_advance_reader+0x2e/0xc5
[] warn_slowpath_common+0x77/0x8f
[] warn_slowpath_null+0xf/0x11
[] rb_advance_reader+0x2e/0xc5
[] ring_buffer_consume+0xa0/0xd2
[] op_cpu_buffer_read_entry+0x21/0x9e
[] ? __find_get_block+0x4b/0x165
[] sync_buffer+0xa5/0x401
[] ? __find_get_block+0x4b/0x165
[] ? wq_sync_buffer+0x0/0x78
[] wq_sync_buffer+0x5b/0x78
[] worker_thread+0x113/0x1ac
[] ? autoremove_wake_function+0x0/0x38
[] ? worker_thread+0x0/0x1ac
[] kthread+0x88/0x92
[] child_rip+0xa/0x20
[] ? kthread+0x0/0x92
[] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---Cc: Steven Rostedt
Cc:
Signed-off-by: Robert Richter
Signed-off-by: Ingo Molnar -
The commit:
commit e0fdace10e75dac67d906213b780ff1b1a4cc360
Author: David Miller
Date: Fri Aug 1 01:11:22 2008 -0700debug_locks: set oops_in_progress if we will log messages.
Otherwise lock debugging messages on runqueue locks can deadlock the
system due to the wakeups performed by printk().Signed-off-by: David S. Miller
Signed-off-by: Ingo MolnarWill permanently set oops_in_progress on any lockdep failure.
When this triggers it will cause any read from the ring buffer to
permanently disable the ring buffer (not to mention no locking of
printk).This patch removes the check. It keeps the print in NMI which makes
sense. This is probably OK, since the ring buffer should not cause
something to set oops_in_progress anyway.Signed-off-by: Steven Rostedt
-
The function ring_buffer_discard_commit inversed the code path
of the result of try_to_discard. It should skip incrementing the
entry counter if try_to_discard succeeded. But instead, it increments
the entry conder if it succeeded to discard, and does not increment
it if it fails.The result of this bug is that filtering will make the stat counters
incorrect.Signed-off-by: Steven Rostedt
17 Jul, 2009
1 commit
-
kernel/trace/ring_buffer.c: In function 'rb_tail_page_update':
kernel/trace/ring_buffer.c:849: warning: value computed is not used
kernel/trace/ring_buffer.c:850: warning: value computed is not usedAdd "(void)"s to fix this warning, because we don't need here to handle
the fail case of cmpxchg, it's fine if an interrupt already did the
job.Changed from V1:
Add a comment(which is written by Steven) for it.Signed-off-by: Lai Jiangshan
Acked-by: Steven Rostedt
Signed-off-by: Frederic Weisbecker
08 Jul, 2009
2 commits
-
This patch converts the ring buffers into a completely lockless
buffer recording system. The read side still takes locks since
we still serialize readers. But the writers are the ones that
must be lockless (those can happen in NMIs).The main change is to the "head_page" pointer. We write to the
tail, and read from the head. The "head_page" pointer in the cpu
buffer is now just a reference to where to look. The real head
page is now kept in the head_page->list->prev->next pointer.
That is, in the list head of the previous page we set flags.The list pages are allocated to be aligned such that the lowest
significant bits are always zero pointing to the list. This gives
us play to put in flags to their pointers.bit 0: set when the page is a head page
bit 1: set when the writer is moving the page (for overwrite mode)cmpxchg is used to update the pointer.
When the writer wraps the buffer and the tail meets the head,
in overwrite mode, the writer must move the head page forward.
It first uses cmpxchg to change the pointer flag from 1 to 2.
Once this is done, the reader on another CPU will not take the
page from the buffer.The writers need to protect against interrupts (we don't bother with
disabling interrupts because NMIs are allowed to write too).After the writer sets the pointer flag to 2, it takes care to
manage interrupts coming in. This is discribed in detail within the
comments of the code.Changes in version 2:
- Let reader reset entries value of header page.
- Fix tail page passing commit page on reader page test.
- Always increment entries and write counter in rb_tail_page_update
- Add safety check in rb_set_commit_to_write to break out of infinite loop
- add mask in rb_is_reader_page[ Impact: lock free writing to the ring buffer ]
Signed-off-by: Steven Rostedt
-
This patch changes the ring buffer data pages from using a link list
head pointer, to making each buffer page point to another buffer page
and never back to a "head".This makes the handling of the ring buffer less complex, since the
traversing of the ring buffer pages no longer needs to account for the
head pointer.This change also is needed to make the ring buffer lockless.
[
Changes in version 2:- Added change that Lai Jiangshan mentioned.
From: Lai Jiangshan
Date: Thu, 11 Jun 2009 11:25:48 +0800
LKML-Reference:I'm not sure whether these 4 lines:
bpage = list_entry(pages.next, struct buffer_page, list);
list_del_init(&bpage->list);
cpu_buffer->pages = &bpage->list;list_splice(&pages, cpu_buffer->pages);
equal to these 2 lines:
cpu_buffer->pages = pages.next;
list_del(&pages);If there are equivalent, I think the second one
are simpler. It may be not a really necessarily cleanup.What I asked is: if there are equivalent, could you use these two line:
cpu_buffer->pages = pages.next;
list_del(&pages);
][ Impact: simplify the ring buffer to help make it lockless ]
Signed-off-by: Steven Rostedt
25 Jun, 2009
1 commit
-
In hunting down the cause for the hwlat_detector ring buffer spew in
my failed -next builds it became obvious that folks are now treating
ring_buffer as something that is generic independent of tracing and thus,
suitable for public driver consumption.Given that there are only a few minor areas in ring_buffer that have any
reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
those and make it generally available.Signed-off-by: Paul Mundt
Cc: Jon Masters
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
21 Jun, 2009
1 commit
-
…nel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
tracing/urgent: warn in case of ftrace_start_up inbalance
tracing/urgent: fix unbalanced ftrace_start_up
function-graph: add stack frame test
function-graph: disable when both x86_32 and optimize for size are configured
ring-buffer: have benchmark test print to trace buffer
ring-buffer: do not grab locks in nmi
ring-buffer: add locks around rb_per_cpu_empty
ring-buffer: check for less than two in size allocation
ring-buffer: remove useless compile check for buffer_page size
ring-buffer: remove useless warn on check
ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
tracing: update sample event documentation
tracing/filters: fix race between filter setting and module unload
tracing/filters: free filter_string in destroy_preds()
ring-buffer: use commit counters for commit pointer accounting
ring-buffer: remove unused variable
ring-buffer: have benchmark test handle discarded events
ring-buffer: prevent adding write in discarded area
tracing/filters: strloc should be unsigned short
tracing/filters: operand can be negative
...Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
18 Jun, 2009
1 commit
-
If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
will need to read from the ring buffer. But the read side of the
ring buffer still takes locks. This patch adds a check on the read
side that if it is in an NMI, then it will disable the ring buffer
and not take any locks.Reads can still happen on a disabled ring buffer.
Signed-off-by: Steven Rostedt