Eric Lee / smarc-fsl-linux-kernel

03 Aug, 2017

1 commit

a7e52ad7e ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU ... Browse Code »

Chunyu Hu reported:
"per_cpu trace directories and files are created for all possible cpus,
but only the cpus which have ever been on-lined have their own per cpu
ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
open handler for trace file 'trace_pipe_raw' is always trying to access
field of ring_buffer_per_cpu, and would panic with the NULL pointer.

Align the behavior of trace_pipe_raw with trace_pipe, that returns -NODEV
when openning it if that cpu does not have trace ring buffer.

Reproduce:
cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
(cpu31 is never on-lined, this is a 16 cores x86_64 box)

Tested with:
1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
Got -NODEV.
2) oneline cpu15, read trace_pipe_raw of cpu15.
Get the raw trace data.

Call trace:
[ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
[ 5760.961678] tracing_buffers_read+0x1f6/0x230
[ 5760.962695] __vfs_read+0x37/0x160
[ 5760.963498] ? __vfs_read+0x5/0x160
[ 5760.964339] ? security_file_permission+0x9d/0xc0
[ 5760.965451] ? __vfs_read+0x5/0x160
[ 5760.966280] vfs_read+0x8c/0x130
[ 5760.967070] SyS_read+0x55/0xc0
[ 5760.967779] do_syscall_64+0x67/0x150
[ 5760.968687] entry_SYSCALL64_slow_path+0x25/0x25"

This was introduced by the addition of the feature to reuse reader pages
instead of re-allocating them. The problem is that the allocation of a
reader page (which is per cpu) does not check if the cpu is online and set
up for the ring buffer.

Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com

Cc: stable@vger.kernel.org
Fixes: 73a757e63114 ("ring-buffer: Return reader page back into existing ring buffer")
Reported-by: Chunyu Hu
Signed-off-by: Steven Rostedt (VMware)

Steven Rostedt (VMware)
2017-08-03 02:23:02 +0800

01 May, 2017

1 commit

73a757e63 ring-buffer: Return reader page back into existing ring buffer ... Browse Code »

When reading the ring buffer for consuming, it is optimized for splice,
where a page is taken out of the ring buffer (zero copy) and sent to the
reading consumer. When the read is finished with the page, it calls
ring_buffer_free_read_page(), which simply frees the page. The next time the
reader needs to get a page from the ring buffer, it must call
ring_buffer_alloc_read_page() which allocates and initializes a reader page
for the ring buffer to be swapped into the ring buffer for a new filled page
for the reader.

The problem is that there's no reason to actually free the page when it is
passed back to the ring buffer. It can hold it off and reuse it for the next
iteration. This completely removes the interaction with the page_alloc
mechanism.

Using the trace-cmd utility to record all events (causing trace-cmd to
require reading lots of pages from the ring buffer, and calling
ring_buffer_alloc/free_read_page() several times), and also assigning a
stack trace trigger to the mm_page_alloc event, we can see how many times
the ring_buffer_alloc_read_page() needed to allocate a page for the ring
buffer.

Before this change:

# trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
# trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
9968

After this change:

# trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
# trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
4

Signed-off-by: Steven Rostedt (VMware)

Steven Rostedt (VMware)
2017-05-01 22:26:40 +0800

02 Mar, 2017

1 commit

ae7e81c07 sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> ... Browse Code »

We are going to move scheduler ABI details to ,
which will be used from a number of .c files.

Create empty placeholder header that maps to .

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:27 +0800

04 Nov, 2015

3 commits

54ed14440 ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark ... Browse Code »

wake_up_process() has a memory barrier before doing anything, thus adding a
memory barrier before calling it is redundant.

Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2015-11-04 05:19:02 +0800
f47cb66df ring_buffer: Fix more races when terminating the producer in the benchmark ... Browse Code »

The commit b44754d8262d3aab8 ("ring_buffer: Allow to exit the ring
buffer benchmark immediately") added a hack into ring_buffer_producer()
that set @kill_test when kthread_should_stop() returned true. It improved
the situation a lot. It stopped the kthread in most cases because
the producer spent most of the time in the patched while cycle.

But there are still few possible races when kthread_should_stop()
is set outside of the cycle. Then we do not set @kill_test and
some other checks pass.

This patch adds a better fix. It renames @test_kill/TEST_KILL() into
a better descriptive @test_error/TEST_ERROR(). Also it introduces
break_test() function that checks for both @test_error and
kthread_should_stop().

The new function is used in the producer when the check for @test_error
is not enough. It is not used in the consumer because its state
is manipulated by the producer via the "reader_finish" variable.

Also we add a missing check into ring_buffer_producer_thread()
between setting TASK_INTERRUPTIBLE and calling schedule_timeout().
Otherwise, we might miss a wakeup from kthread_stop().

Link: http://lkml.kernel.org/r/1441629518-32712-3-git-send-email-pmladek@suse.com

Signed-off-by: Petr Mladek
Signed-off-by: Steven Rostedt

Petr Mladek
2015-11-04 05:03:45 +0800
8b46ff693 ring_buffer: Do no not complete benchmark reader too early ... Browse Code »

It seems that complete(&read_done) might be called too early
in some situations.

1st scenario:
-------------

CPU0 CPU1

ring_buffer_producer_thread()
wake_up_process(consumer);
wait_for_completion(&read_start);

ring_buffer_consumer_thread()
complete(&read_start);

ring_buffer_producer()
# producing data in
# the do-while cycle

ring_buffer_consumer();
# reading data
# got error
# set kill_test = 1;
set_current_state(
TASK_INTERRUPTIBLE);
if (reader_finish) # false
schedule();

# producer still in the middle of
# do-while cycle
if (consumer && !(cnt % wakeup_interval))
wake_up_process(consumer);

# spurious wakeup
while (!reader_finish &&
!kill_test)
# leaving because
# kill_test == 1
reader_finish = 0;
complete(&read_done);

1st BANG: We might access uninitialized "read_done" if this is the
the first round.

# producer finally leaving
# the do-while cycle because kill_test == 1;

if (consumer) {
reader_finish = 1;
wake_up_process(consumer);
wait_for_completion(&read_done);

2nd BANG: This will never complete because consumer already did
the completion.

2nd scenario:
-------------

CPU0 CPU1

ring_buffer_producer_thread()
wake_up_process(consumer);
wait_for_completion(&read_start);

ring_buffer_consumer_thread()
complete(&read_start);

ring_buffer_producer()
# CPU3 removes the module we never go
# into the top level while()
reader_finish = 0;
complete(&read_done);

# producer still in the middle of
# do-while cycle
if (consumer && !(cnt % wakeup_interval))
wake_up_process(consumer);

# spurious wakeup
while (!reader_finish &&
!kill_test)
# leaving because kill_test == 1
reader_finish = 0;
complete(&read_done);

BANG: We are in the same "bang" situations as in the 1st scenario.

Root of the problem:
--------------------

ring_buffer_consumer() must complete "read_done" only when "reader_finish"
variable is set. It must not be skipped due to other conditions.

Note that we still must keep the check for "reader_finish" in a loop
because there might be spurious wakeups as described in the
above scenarios.

Solution:
----------

The top level cycle in ring_buffer_consumer() will finish only when
"reader_finish" is set. The data will be read in "while-do" cycle
so that they are not read after an error (kill_test == 1)
or a spurious wake up.

In addition, "reader_finish" is manipulated by the producer thread.
Therefore we add READ_ONCE() to make sure that the fresh value is
read in each cycle. Also we add the corresponding barrier
to synchronize the sleep check.

Next we set the state back to TASK_RUNNING for the situation where we
did not sleep.

Just from paranoid reasons, we initialize both completions statically.
This is safer, in case there are other races that we are unaware of.

As a side effect we could remove the memory barrier from
ring_buffer_producer_thread(). IMHO, this was the reason for
the barrier. ring_buffer_reset() uses spin locks that should
provide the needed memory barrier for using the buffer.

Link: http://lkml.kernel.org/r/1441629518-32712-2-git-send-email-pmladek@suse.com

Signed-off-by: Petr Mladek
Signed-off-by: Steven Rostedt

Petr Mladek
2015-11-04 05:03:24 +0800

27 Jun, 2015

1 commit

e38260825 Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"This patch series contains several clean ups and even a new trace
clock "monitonic raw". Also some enhancements to make the ring buffer
even faster. But the biggest and most noticeable change is the
renaming of the ftrace* files, structures and variables that have to
deal with trace events.

Over the years I've had several developers tell me about their
confusion with what ftrace is compared to events. Technically,
"ftrace" is the infrastructure to do the function hooks, which include
tracing and also helps with live kernel patching. But the trace
events are a separate entity altogether, and the files that affect the
trace events should not be named "ftrace". These include:

include/trace/ftrace.h -> include/trace/trace_events.h
include/linux/ftrace_event.h -> include/linux/trace_events.h

Also, functions that are specific for trace events have also been renamed:

ftrace_print_*() -> trace_print_*()
(un)register_ftrace_event() -> (un)register_trace_event()
ftrace_event_name() -> trace_event_name()
ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
ftrace_define_fields_##call() -> trace_define_fields_##call()
ftrace_get_offsets_##call() -> trace_get_offsets_##call()

Structures have been renamed:

ftrace_event_file -> trace_event_file
ftrace_event_{call,class} -> trace_event_{call,class}
ftrace_event_buffer -> trace_event_buffer
ftrace_subsystem_dir -> trace_subsystem_dir
ftrace_event_raw_##call -> trace_event_raw_##call
ftrace_event_data_offset_##call-> trace_event_data_offset_##call
ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

And a few various variables and flags have also been updated.

This has been sitting in linux-next for some time, and I have not
heard a single complaint about this rename breaking anything. Mostly
because these functions, variables and structures are mostly internal
to the tracing system and are seldom (if ever) used by anything
external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
ring_buffer: Allow to exit the ring buffer benchmark immediately
ring-buffer-benchmark: Fix the wrong type
ring-buffer-benchmark: Fix the wrong param in module_param
ring-buffer: Add enum names for the context levels
ring-buffer: Remove useless unused tracing_off_permanent()
ring-buffer: Give NMIs a chance to lock the reader_lock
ring-buffer: Add trace_recursive checks to ring_buffer_write()
ring-buffer: Allways do the trace_recursive checks
ring-buffer: Move recursive check to per_cpu descriptor
ring-buffer: Add unlikelys to make fast path the default
tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
tracing: Rename ftrace_event_name() to trace_event_name()
tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
...

Linus Torvalds
2015-06-27 05:02:43 +0800

16 Jun, 2015

1 commit

b44754d82 ring_buffer: Allow to exit the ring buffer benchmark immediately ... Browse Code »

It takes a while until the ring_buffer_benchmark module is removed
when the ring buffer hammer is running. It is because it takes
few seconds and kthread_should_stop() is not being checked.

This patch adds the check for kthread termination into the producer.
It uses the existing @kill_test flag to finish the kthreads as
cleanly as possible.

It disables printing the "ERROR" message when the kthread is going.

It makes sure that producer does not go into the 10sec sleep
when it is being killed.

Finally, it does not call wait_to_die() when kthread_should_stop()
already returns true.

Link: http://lkml.kernel.org/r/20150615155428.GD3135@pathway.suse.cz

Signed-off-by: Petr Mladek
Signed-off-by: Steven Rostedt

Petr Mladek
2015-06-16 00:03:12 +0800

11 Jun, 2015

3 commits

108029323 ring-buffer-benchmark: Fix the wrong sched_priority of producer ... Browse Code »

The producer should be used producer_fifo as its sched_priority,
so correct it.

Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com

Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Wang Long
Signed-off-by: Steven Rostedt

Wang Long
2015-06-11 21:27:58 +0800
33d657d13 ring-buffer-benchmark: Fix the wrong type ... Browse Code »

The macro 'module_param' shows that the type of the
variable disable_reader and write_iteration is unsigned
integer. so, we change their type form int to unsigned int.

Link: http://lkml.kernel.org/r/1433923927-67782-1-git-send-email-long.wanglong@huawei.com

Signed-off-by: Wang Long
Signed-off-by: Steven Rostedt

Wang Long
2015-06-11 03:45:22 +0800
7364e8654 ring-buffer-benchmark: Fix the wrong param in module_param ... Browse Code »

The {producer|consumer}_{nice|fifo} parameters are integer
type, we should use 'int' as the second param in module_param.

For example(consumer_fifo):
the default value of consumer_fifo is -1.
Without this patch:
# cat /sys/module/ring_buffer_benchmark/parameters/consumer_fifo
4294967295
With this patch:
# cat /sys/module/ring_buffer_benchmark/parameters/consumer_fifo
-1

Link: http://lkml.kernel.org/r/1433923873-67712-1-git-send-email-long.wanglong@huawei.com

Signed-off-by: Wang Long
Signed-off-by: Steven Rostedt

Wang Long
2015-06-11 03:44:35 +0800

29 Jan, 2015

1 commit

da194930e trace: Use 64-bit timekeeping ... Browse Code »

The ring_buffer_producer uses 'struct timeval' to measure
its start and end times. 'struct timeval' on 32-bit systems
will have its tv_sec value overflow in year 2038 and beyond.
This patch replaces struct timeval with 'ktime_t' which uses
64-bit representation for nanoseconds.

Link: http://lkml.kernel.org/r/20150128141611.GA2701@tinar

Suggested-by: Arnd Bergmann
Suggested-by: Steven Rostedt
Signed-off-by: Tina Ruchandani
Signed-off-by: Steven Rostedt

Tina Ruchandani
2015-01-29 00:02:05 +0800

19 Sep, 2014

1 commit

f139caf2e sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() ... Browse Code »

schedule(), io_schedule() and schedule_timeout() always return
with TASK_RUNNING state set, so one more setting is unnecessary.

(All places in patch are visible good, only exception is
kiblnd_scheduler() from:

drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

Its schedule() is one line above standard 3 lines of unified diff)

No places where set_current_state() is used for mb().

Signed-off-by: Kirill Tkhai
Signed-off-by: Peter Zijlstra (Intel)
Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
Cc: Alasdair Kergon
Cc: Anil Belur
Cc: Arnd Bergmann
Cc: Dave Kleikamp
Cc: David Airlie
Cc: David Howells
Cc: Dmitry Eremin
Cc: Frank Blaschka
Cc: Greg Kroah-Hartman
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Isaac Huang
Cc: James E.J. Bottomley
Cc: James E.J. Bottomley
Cc: J. Bruce Fields
Cc: Jeff Dike
Cc: Jesper Nilsson
Cc: Jiri Slaby
Cc: Laura Abbott
Cc: Liang Zhen
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Masaru Nomura
Cc: Michael Opdenacker
Cc: Mikael Starvik
Cc: Mike Snitzer
Cc: Neil Brown
Cc: Oleg Drokin
Cc: Peng Tao
Cc: Richard Weinberger
Cc: Robert Love
Cc: Steven Rostedt
Cc: Trond Myklebust
Cc: Ursula Braun
Cc: Zi Shen Lim
Cc: devel@driverdev.osuosl.org
Cc: dm-devel@redhat.com
Cc: dri-devel@lists.freedesktop.org
Cc: fcoe-devel@open-fcoe.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-afs@lists.infradead.org
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: qla2xxx-upstream@qlogic.com
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: user-mode-linux-user@lists.sourceforge.net
Signed-off-by: Ingo Molnar

Kirill Tkhai
2014-09-19 18:35:17 +0800

27 Feb, 2014

1 commit

2b3942e4b trace: Replace hardcoding of 19 with MAX_NICE ... Browse Code »

Use MAX_NICE instead of the value 19 for ring_buffer_benchmark.

Signed-off-by: Dongsheng Yang
Signed-off-by: Peter Zijlstra
Acked-by: Steven Rostedt
Cc: Frederic Weisbecker
Link: http://lkml.kernel.org/r/1393251121-25534-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar

Dongsheng Yang
2014-02-27 19:41:03 +0800

15 Jun, 2011

1 commit

7ea590640 tracing: Use NUMA allocation for per-cpu ring buffer pages ... Browse Code »

The tracing ring buffer is a group of per-cpu ring buffers where
allocation and logging is done on a per-cpu basis. The events that are
generated on a particular CPU are logged in the corresponding buffer.
This is to provide wait-free writes between CPUs and good NUMA node
locality while accessing the ring buffer.

However, the allocation routines consider NUMA locality only for buffer
page metadata and not for the actual buffer page. This causes the pages
to be allocated on the NUMA node local to the CPU where the allocation
routine is running at the time.

This patch fixes the problem by using a NUMA node specific allocation
routine so that the pages are allocated from a NUMA node local to the
logging CPU.

I tested with the getuid_microbench from autotest. It is a simple binary
that calls getuid() in a loop and measures the average time for the
syscall to complete. The following command was used to test:
$ getuid_microbench 1000000

Compared the numbers found on kernel with and without this patch and
found that logging latency decreases by 30-50 ns/call.
tracing with non-NUMA allocation - 569 ns/call
tracing with NUMA allocation - 512 ns/call

Signed-off-by: Vaibhav Nagarnaik
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Michael Rubin
Cc: David Sharp
Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt

Vaibhav Nagarnaik
2011-06-15 10:04:39 +0800

28 Apr, 2010

1 commit

a838b2e63 ring-buffer: Make benchmark handle missed events ... Browse Code »

With the addition of the "missed events" flags that is stored in the
commit field of the ring buffer page, the ring_buffer_benchmark
was not updated to handle this. If events are missed, then the
missed events flag is set in the ring buffer page, the benchmark
will count that flag as part of the size of the page and will hit the BUG()
when it tries to read beyond the page.

The solution is simply to have the ring buffer benchmark mask off
the extra bits.

Reported-by: Ingo Molnar
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-04-28 01:26:58 +0800

01 Apr, 2010

1 commit

66a8cb95e ring-buffer: Add place holder recording of dropped events ... Browse Code »

Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.

This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.

In a produce/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.

But in overwrite mode, any new write can overwrite events. A place
holder can not be inserted into the ring buffer since there never
may be room. A reader could also come in at anytime and miss the
placeholder.

Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Everytime a write
takes place, if it overwrites the header page (the next read) it
updates a "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perfom the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.

Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, than performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.

For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits needs to be used for the actual size (less actually).

We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.

Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter
Cc: Andi Kleen
Cc: Li Zefan
Cc: Arnaldo Carvalho de Melo
Cc: "Luis Claudio R. Goncalves"
Cc: Frederic Weisbecker
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-04-01 10:57:04 +0800

05 Jan, 2010

1 commit

79615760f local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c ... Browse Code »

ringbuffer*.c are the last users of local.h.

Remove the include from modules.h and add it to ringbuffer files.

Signed-off-by: Christoph Lameter
Signed-off-by: Tejun Heo

Christoph Lameter
2010-01-05 14:34:50 +0800

26 Nov, 2009

1 commit

7ac074340 ring-buffer-benchmark: Add parameters to set produce/consumer priorities ... Browse Code »

Running the ring-buffer-benchmark's threads at the lowest priority may
work well for keeping it in the background, but it is not appropriate
for the benchmarks.

This patch adds 4 parameters to the module:

consumer_fifo
consumer_nice
producer_fifo
producer_nice

By default the consumer and producer still run at nice +19.

If the *_fifo options are set, they will override the *_nice values.

modprobe ring_buffer_benchmark consumer_nice=0 producer_fifo=10

The above will set the consumer thread to a nice value of 0, and
the producer thread to a RT SCHED_FIFO priority of 10.

Note, this patch also fixes a bug where calling set_user_nice on the
consumer thread would oops the kernel when the parameter "disable_reader"
is set.

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-11-26 03:14:15 +0800

23 Nov, 2009

1 commit

98e4833ba ring-buffer benchmark: Run producer/consumer threads at nice +19 ... Browse Code »

The ring-buffer benchmark threads run on nice 0 by default, using
up a lot of CPU time and slowing down the system:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1024 root 20 0 0 0 0 D 95.3 0.0 4:01.67 rb_producer
1023 root 20 0 0 0 0 R 93.5 0.0 2:54.33 rb_consumer
21569 mingo 40 0 14852 1048 772 R 3.6 0.1 0:00.05 top
1 root 40 0 4080 928 668 S 0.0 0.0 0:23.98 init

Renice them to +19 to make them less intrusive.

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Mike Galbraith
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-11-23 15:03:09 +0800

12 Nov, 2009

1 commit

a6f0eb6ad ring-buffer: Add multiple iterations between benchmark timestamps ... Browse Code »

The ring_buffer_benchmark does a gettimeofday after every write to the
ring buffer in its measurements. This adds the overhead of the call
to gettimeofday to the measurements and does not give an accurate picture
of the length of time it takes to record a trace.

This was first noticed with perf top:

------------------------------------------------------------------------------
PerfTop: 679 irqs/sec kernel:99.9% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

1673.00 - 27.8% : trace_clock_local
806.00 - 13.4% : do_gettimeofday
590.00 - 9.8% : rb_reserve_next_event
554.00 - 9.2% : native_read_tsc
431.00 - 7.2% : ring_buffer_lock_reserve
365.00 - 6.1% : __rb_reserve_next
355.00 - 5.9% : rb_end_commit
322.00 - 5.4% : getnstimeofday
268.00 - 4.5% : ring_buffer_unlock_commit
262.00 - 4.4% : ring_buffer_producer_thread [ring_buffer_benchmark]
113.00 - 1.9% : read_tsc
91.00 - 1.5% : debug_smp_processor_id
69.00 - 1.1% : trace_recursive_unlock
66.00 - 1.1% : ring_buffer_event_data
25.00 - 0.4% : _spin_unlock_irq

And the length of each write to the ring buffer measured at 310ns.

This patch adds a new module parameter called "write_interval" which is
defaulted to 50. This is the number of writes performed between
timestamps. After this patch perf top shows:

------------------------------------------------------------------------------
PerfTop: 244 irqs/sec kernel:100.0% [1000Hz cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

2842.00 - 40.4% : trace_clock_local
1043.00 - 14.8% : rb_reserve_next_event
784.00 - 11.1% : ring_buffer_lock_reserve
600.00 - 8.5% : __rb_reserve_next
579.00 - 8.2% : rb_end_commit
440.00 - 6.3% : ring_buffer_unlock_commit
290.00 - 4.1% : ring_buffer_producer_thread [ring_buffer_benchmark]
155.00 - 2.2% : debug_smp_processor_id
117.00 - 1.7% : trace_recursive_unlock
103.00 - 1.5% : ring_buffer_event_data
28.00 - 0.4% : do_gettimeofday
22.00 - 0.3% : _spin_unlock_irq
14.00 - 0.2% : native_read_tsc
11.00 - 0.2% : getnstimeofday

do_gettimeofday dropped from 13% usage to a mere 0.4%! (using the default
50 interval) The measurement for each timestamp went from 310ns to 210ns.
That's 100ns (1/3rd) overhead that the gettimeofday call was introducing.

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-11-12 11:22:15 +0800

18 Jun, 2009

1 commit

4b221f031 ring-buffer: have benchmark test print to trace buffer ... Browse Code »

Currently the output of the ring buffer benchmark/test prints to
the console. This test runs for ten seconds every ten seconds and
ouputs the result after every iteration. This needlessly fills up
the logs.

This patch makes the ring buffer benchmark/test print to the ftrace
buffer using trace_printk. To view the test results, you must examine
the debug/tracing/trace file.

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-06-18 05:01:09 +0800

17 Jun, 2009

1 commit

9086c7b90 ring-buffer: have benchmark test handle discarded events ... Browse Code »

With the addition of commit:

c7b0930857e2278f2e7714db6294e94c57f623b0
ring-buffer: prevent adding write in discarded area

The ring buffer may now add discarded events when a write passes
the end of a buffer page. Before, a discarded event was only added
when the tracer deliberately created one. The ring buffer benchmark
test does not handle discarded events when it reads the buffer and
fails when it encounters one.

Also fix the increment for large data entries (luckily, the test did
not add any yet).

[ Impact: fix false failure of ring buffer self test ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-06-17 01:48:52 +0800

12 May, 2009

2 commits

d988ff94c ring-buffer: check for divide by zero in ring-buffer-benchmark ... Browse Code »

Although we check if "missed" is not zero, we divide by hit + missed,
and the addition can possible overflow and become a divide by zero.

This patch checks for this case, and will report it when it happens
then modify "hit" to make the calculation be non zero.

[ Impact: prevent possible divide by zero in ring-buffer-benchmark ]

Reported-by: Andrew Morton
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-12 01:22:26 +0800
5a772b2b3 ring-buffer: replace constants with time macros in ring-buffer-benchmark ... Browse Code »

The use of numeric constants is discouraged. It is cleaner and more
descriptive to use macros for constant time conversions.

This patch also removes an extra new line.

[ Impact: more descriptive time conversions ]

Reported-by: Andrew Morton
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-12 01:22:26 +0800

08 May, 2009

2 commits

7da3046d6 ring-buffer: add total count in ring-buffer-benchmark ... Browse Code »

It is nice to see the overhead of the benchmark test when tracing is
disabled. That is, we turn off the ring buffer just to see what the
cost of running the loop that calls into the ring buffer is.

Currently, if no entries wer made, we get 0. This is not informative.
This patch changes it to check if we had any "missed" (non recorded)
events. If so, a total count is also reported.

[ Impact: evaluate the over head of the ring buffer benchmark test ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-08 07:52:20 +0800
0574ea421 ring-buffer: only periodically call cond_resched to ring-buffer-benchmark ... Browse Code »

Calling cond_resched at every iteration of the loop adds a bit of
overhead to the benchmark.

This patch does two things.

1) only calls cond-resched when CONFIG_PREEMPT is not enabled
2) only calls cond-resched after so many traces has been performed.

[ Impact: less overhead to the ring-buffer-benchmark ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-08 02:20:28 +0800

07 May, 2009

3 commits

29c8000ee ring-buffer: remove complex calculations in ring-buffer-test ... Browse Code »

Ingo Molnar thought that the code to calculate the time in cond_resched
is a bit too ugly and is not needed. This patch removes it and replaces
it with a simple call to cond_resched. I kept the comment that explains
the reason for the cond_resched.

[ Impact: remove ugly code ]

Reported-by: Ingo Molnar
Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-07 23:16:18 +0800
3e07a4f68 ring-buffer: change test to be more latency friendly ... Browse Code »

The ring buffer benchmark/test runs a producer for 10 seconds.
This is done with preemption and interrupts enabled. But if the kernel
is not compiled with CONFIG_PREEMPT, it basically stops everything
but interrupts for 10 seconds.

Although this is just a test and is not for production, this attribute
can be quite annoying. It can also spawn badness elsewhere.

This patch solves the issues by calling "cond_resched" when the system
is not compiled with CONFIG_PREEMPT. It also keeps track of the time
spent to call cond_resched such that it does not go against the
time calculations. That is, if the task schedules away, the time scheduled
out is removed from the test data. Note, this only works for non PREEMPT
because we do not know when the task is scheduled out if we have PREEMPT
enabled.

[ Impact: prevent test from stopping the world for 10 seconds ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-07 06:36:59 +0800
00c81a58c ring-buffer: check for failed allocation in ring buffer benchmark ... Browse Code »

The result of the allocation of the ring buffer read page in the
ring buffer bench mark does not check the return to see if a page
was actually allocated. This patch fixes that.

[ Impact: avoid NULL dereference ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-07 00:49:20 +0800

06 May, 2009

1 commit

5092dbc96 ring-buffer: add benchmark and tester ... Browse Code »

This patch adds code that can benchmark the ring buffer as well as
test it. This code can be compiled into the kernel (not recommended)
or as a module.

A separate ring buffer is used to not interfer with other users, like
ftrace. It creates a producer and a consumer (option to disable creation
of the consumer) and will run for 10 seconds, then sleep for 10 seconds
and then repeat.

While running, the producer will write 10 byte loads into the ring
buffer with just putting in the current CPU number. The reader will
continually try to read the buffer. The reader will alternate from reading
the buffer via event by event, or by full pages.

The output is a pr_info, thus it will fill up the syslogs.

Starting ring buffer hammer
End ring buffer hammer
Time: 9000349 (usecs)
Overruns: 12578640
Read: 5358440 (by events)
Entries: 0
Total: 17937080
Missed: 0
Hit: 17937080
Entries per millisec: 1993
501 ns per entry
Sleeping for 10 secs
Starting ring buffer hammer
End ring buffer hammer
Time: 9936350 (usecs)
Overruns: 0
Read: 28146644 (by pages)
Entries: 74
Total: 28146718
Missed: 0
Hit: 28146718
Entries per millisec: 2832
353 ns per entry
Sleeping for 10 secs

Time: is the time the test ran
Overruns: the number of events that were overwritten and not read
Read: the number of events read (either by pages or events)
Entries: the number of entries left in the buffer
(the by pages will only read full pages)
Total: Entries + Read + Overruns
Missed: the number of entries that failed to write
Hit: the number of entries that were written

The above example shows that it takes ~353 nanosecs per entry when
there is a reader, reading by pages (and no overruns)

The event by event reader slowed the producer down to 501 nanosecs.

[ Impact: see how changes to the ring buffer affect stability and performance ]

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-05-06 12:08:50 +0800