Doug / smarc-fsl-linux-kernel | Embedian Git Server

31 Oct, 2010

1 commit

f02a38d86 Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.ker… ... Browse Code »

…nel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
jump label: Add work around to i386 gcc asm goto bug
x86, ftrace: Use safe noops, drop trap test
jump_label: Fix unaligned traps on sparc.
jump label: Make arch_jump_label_text_poke_early() optional
jump label: Fix error with preempt disable holding mutex
oprofile: Remove deprecated use of flush_scheduled_work()
oprofile: Fix the hang while taking the cpu offline
jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex
jump label: Fix module __init section race

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Check irq_remapped instead of remapping_enabled in destroy_irq()

Linus Torvalds
2010-10-31 02:43:26 +0800

30 Oct, 2010

1 commit

169ed55bd Merge branch 'tip/perf/jump-label-2' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/rostedt/linux-2.6-trace into perf/urgent

Ingo Molnar
2010-10-30 16:43:08 +0800

29 Oct, 2010

3 commits

3d7851b3c oprofile: Remove deprecated use of flush_scheduled_work() ... Browse Code »

flush_scheduled_work() is deprecated and scheduled to be removed.
sync_stop() currently cancels cpu_buffer works inside buffer_mutex and
flushes the system workqueue outside. Instead, split end_cpu_work()
into two parts - stopping further work enqueues and flushing works -
and do the former inside buffer_mutex and latter outside.

For stable kernels v2.6.35.y and v2.6.36.y.

Signed-off-by: Tejun Heo
Cc: stable@kernel.org
Signed-off-by: Robert Richter

Tejun Heo
2010-10-29 17:54:18 +0800
4ac3dbec8 oprofile: Fix the hang while taking the cpu offline ... Browse Code »

The kernel build with CONFIG_OPROFILE and CPU_HOTPLUG enabled.
The oprofile is initialised using system timer in absence of hardware
counters supports. Oprofile isn't started from userland.

In this setup while doing a CPU offline the kernel hangs in infinite
for loop inside lock_hrtimer_base() function

This happens because as part of oprofile_cpu_notify(, it tries to
stop an hrtimer which was never started. These per-cpu hrtimers
are started when the oprfile is started.
echo 1 > /dev/oprofile/enable

This problem also existwhen the cpu is booted with maxcpus parameter
set. When bringing the remaining cpus online the timers are started
even if oprofile is not yet enabled.

This patch fix this issue by adding a state variable so that
these hrtimer start/stop is only attempted when oprofile is
started

For stable kernels v2.6.35.y and v2.6.36.y.

Reported-by: Jan Sebastien
Tested-by: sricharan
Signed-off-by: Santosh Shilimkar
Cc: stable@kernel.org
Signed-off-by: Robert Richter

Santosh Shilimkar
2010-10-29 17:52:53 +0800
fc14f2fef convert get_sb_single() users ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:16:28 +0800

26 Oct, 2010

1 commit

85fe4025c fs: do not assign default i_ino in new_inode ... Browse Code »

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:11 +0800

23 Oct, 2010

1 commit

092e0e7e5 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl ... Browse Code »

* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek

Linus Torvalds
2010-10-23 01:52:56 +0800

15 Oct, 2010

5 commits

6038f373a llseek: automatically add .llseek fop ... Browse Code »

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{

}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{

}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{

}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{

}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann
Cc: Julia Lawall
Cc: Christoph Hellwig

Arnd Bergmann
2010-10-15 21:53:27 +0800
cd254f295 oprofile: make !CONFIG_PM function stubs static inline ... Browse Code »

Make !CONFIG_PM function stubs static inline and remove section
attribute.

Signed-off-by: Robert Richter

Robert Richter
2010-10-15 18:47:18 +0800
b3b3a9b63 oprofile: fix linker errors ... Browse Code »

Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
cleanup on failure) caused oprofile_perf_exit to be called
in the cleanup path of oprofile_perf_init. The __exit tag
for oprofile_perf_exit should therefore be dropped.

The same has to be done for exit_driverfs as well, as this
function is called from oprofile_perf_exit. Else, we get
the following two linker errors.

LD .tmp_vmlinux1
`oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1

LD .tmp_vmlinux1
`exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Anand Gadiyar
Cc: Will Deacon
Signed-off-by: Robert Richter

Anand Gadiyar
2010-10-15 18:45:44 +0800
277dd9841 oprofile: include platform_device.h to fix build break ... Browse Code »

oprofile_perf.c needs to include platform_device.h
Otherwise we get the following build break.

CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':

Signed-off-by: Anand Gadiyar
Cc: Matt Fleming
Cc: Will Deacon
Signed-off-by: Robert Richter

Anand Gadiyar
2010-10-15 18:45:44 +0800
6268464b3 Merge remote branch 'tip/perf/core' into oprofile/core ... Browse Code »

Conflicts:
arch/arm/oprofile/common.c
kernel/perf_event.c

Robert Richter
2010-10-15 18:45:00 +0800

12 Oct, 2010

7 commits

7df01d96b oprofile: disable write access to oprofilefs while profiler is running ... Browse Code »

Oprofile counters are setup when profiling is disabled. Thus, writing
to oprofilefs has no immediate effect. Changes are updated only after
oprofile is reenabled.

To keep userland and kernel states synchronized, we now allow
configuration of oprofile only if profiling is disabled. In this case
it checks if the profiler is running and then disables write access to
oprofilefs by returning -EBUSY. The change should be backward
compatible with current oprofile userland daemon.

Acked-by: Maynard Johnson
Cc: William Cohen
Cc: Suravee Suthikulpanit
Signed-off-by: Robert Richter

Robert Richter
2010-10-12 23:25:06 +0800
0361e0234 Merge branch 'oprofile/perf' into oprofile/core ... Browse Code »

Conflicts:
arch/arm/oprofile/common.c

Signed-off-by: Robert Richter

Robert Richter
2010-10-12 01:38:39 +0800
e9677b3ce oprofile, ARM: Use oprofile_arch_exit() to cleanup on failure ... Browse Code »

There is duplicate cleanup code in the init and exit functions. Now,
oprofile_arch_exit() is also used if oprofile_arch_init() fails.

Acked-by: Will Deacon
Signed-off-by: Robert Richter

Robert Richter
2010-10-12 01:34:04 +0800
2bcb2b641 oprofile, ARM: Rework op_create_counter() ... Browse Code »

This patch simplifies op_create_counter(). Removing if/else if paths
and return code variable by direct returning from function.

Acked-by: Will Deacon
Signed-off-by: Robert Richter

Robert Richter
2010-10-12 01:34:03 +0800
9c91283a1 oprofile, ARM: Remove some goto statements ... Browse Code »

This patch removes some unnecessary goto statements.

Acked-by: Will Deacon
Signed-off-by: Robert Richter

Robert Richter
2010-10-12 01:34:03 +0800
81771974a oprofile, ARM: Release resources on failure ... Browse Code »

This patch fixes a resource leak on failure, where the
oprofilefs and some counters may not released properly.

Signed-off-by: Robert Richter
Acked-by: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Cc: # .35.x
LKML-Reference:
Signed-off-by: Ingo Molnar

Robert Richter
2010-10-12 01:27:10 +0800
ad0f7cfaa Merge branch 'oprofile/urgent' (early part) into oprofile/perf Browse Code »

Robert Richter
2010-10-12 01:26:50 +0800

11 Oct, 2010

1 commit

3d90a0076 oprofile: Abstract the perf-events backend ... Browse Code »

Move the perf-events backend from arch/arm/oprofile into
drivers/oprofile so that the code can be shared between architectures.

This allows each architecture to maintain only a single copy of the PMU
accessor functions instead of one for both perf and OProfile. It also
becomes possible for other architectures to delete much of their
OProfile code in favour of the common code now available in
drivers/oprofile/oprofile_perf.c.

Signed-off-by: Matt Fleming
Tested-by: Will Deacon
Signed-off-by: Robert Richter

Matt Fleming
2010-10-11 23:46:16 +0800

04 Oct, 2010

1 commit

4fdaa7b68 oprofile: Remove duplicate code around __oprofilefs_create_file() ... Browse Code »

Removing duplicate code by assigning the inodes private data pointer
in __oprofilefs_create_file(). Extending the function interface to
pass the pointer.

Signed-off-by: Robert Richter

Robert Richter
2010-10-04 16:55:24 +0800

01 Oct, 2010

1 commit

ef70fcc0c Merge branch 'oprofile/urgent' into oprofile/core ... Browse Code »

Conflicts:
arch/arm/oprofile/common.c

Signed-off-by: Robert Richter

Robert Richter
2010-10-01 14:54:17 +0800

31 Aug, 2010

1 commit

979048e1f oprofile: don't call arch exit code from init code on failure ... Browse Code »

oprofile_init calls oprofile_arch_init to initialise the architecture-specific
backend code. If this backend code returns failure, oprofile_arch_exit is
called immediately, making it difficult to allocate and free resources
correctly.

This patch removes the oprofile_arch_exit call from oprofile_init,
meaning that all architectures must ensure that oprofile_arch_init
cleans up any mess it's made before returning an error. As far as
I can tell, this only affects the code for ARM.

Cc: Robert Richter
Cc: Matt Fleming
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Will Deacon
Signed-off-by: Robert Richter

Will Deacon
2010-08-31 17:47:50 +0800

25 Aug, 2010

1 commit

750d857c6 oprofile: fix crash when accessing freed task structs ... Browse Code »

This patch fixes a crash during shutdown reported below. The crash is
caused by accessing already freed task structs. The fix changes the
order for registering and unregistering notifier callbacks.

All notifiers must be initialized before buffers start working. To
stop buffer synchronization we cancel all workqueues, unregister the
notifier callback and then flush all buffers. After all of this we
finally can free all tasks listed.

This should avoid accessing freed tasks.

On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:

> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:
>
> [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
>
> Backtrace looks like:
>
> spin_bug+0x74/0xd4
> ._raw_spin_lock+0x48/0x184
> ._spin_lock+0x10/0x24
> .get_task_mm+0x28/0x8c
> .sync_buffer+0x1b4/0x598
> .wq_sync_buffer+0xa0/0xdc
> .worker_thread+0x1d8/0x2a8
> .kthread+0xa8/0xb4
> .kernel_thread+0x54/0x70
>
> So we are accessing a freed task struct in the work queue when
> processing the samples.

Reported-by: Benjamin Herrenschmidt
Cc: stable@kernel.org
Signed-off-by: Robert Richter

Robert Richter
2010-08-25 15:09:09 +0800

26 Jul, 2010

1 commit

729419f00 oprofile: make event buffer nonseekable ... Browse Code »

The event buffer cannot deal with seeks, so
we should forbid that outright.

Signed-off-by: Arnd Bergmann
Cc: Robert Richter
Cc: oprofile-list@lists.sf.net
Signed-off-by: Robert Richter

Arnd Bergmann
2010-07-26 16:58:24 +0800

04 May, 2010

1 commit

9414e9967 oprofile: protect from not being in an IRQ context ... Browse Code »

http://lkml.org/lkml/2010/4/27/285

Protect against dereferencing regs when it's NULL, and
force a magic number into pc to prevent too deep processing.
This approach permits the dropped samples to be tallied as
invalid Instruction Pointer events.

e.g. output from about 15mins at 10kHz sample rate:
Nr. samples received: 2565380
Nr. samples lost invalid pc: 4

Signed-off-by: Phil Carmody
Signed-off-by: Robert Richter

Phil Carmody
2010-05-04 05:02:39 +0800

23 Apr, 2010

3 commits

b971f0618 Merge commit 'tip/tracing/core' into oprofile/core ... Browse Code »

Conflicts:
drivers/oprofile/cpu_buffer.c

Signed-off-by: Robert Richter

Robert Richter
2010-04-23 22:47:51 +0800
cb6e943cc oprofile: remove double ring buffering ... Browse Code »

oprofile used a double buffer scheme for its cpu event buffer
to avoid races on reading with the old locked ring buffer.

But that is obsolete now with the new ring buffer, so simply
use a single buffer. This greatly simplifies the code and avoids
a lot of sample drops on large runs, especially with call graph.

Based on suggestions from Steven Rostedt

For stable kernels from v2.6.32, but not earlier.

Signed-off-by: Andi Kleen
Cc: Steven Rostedt
Cc: stable
Signed-off-by: Robert Richter

Andi Kleen
2010-04-23 21:30:38 +0800
a36bf32e9 Merge commit 'v2.6.34-rc5' into oprofile/core Browse Code »

Robert Richter
2010-04-23 20:30:22 +0800

08 Apr, 2010

1 commit

c1ab9cab7 Merge branch 'linus' into tracing/core ... Browse Code »

Conflicts:
include/linux/module.h
kernel/module.c

Semantic conflict:
include/trace/events/module.h

Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up
possibly racy module refcounting")

Signed-off-by: Ingo Molnar

Ingo Molnar
2010-04-08 16:18:47 +0800

01 Apr, 2010

1 commit

66a8cb95e ring-buffer: Add place holder recording of dropped events ... Browse Code »

Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.

This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.

In a produce/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.

But in overwrite mode, any new write can overwrite events. A place
holder can not be inserted into the ring buffer since there never
may be room. A reader could also come in at anytime and miss the
placeholder.

Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Everytime a write
takes place, if it overwrites the header page (the next read) it
updates a "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perfom the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.

Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, than performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.

For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits needs to be used for the actual size (less actually).

We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.

Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter
Cc: Andi Kleen
Cc: Li Zefan
Cc: Arnaldo Carvalho de Melo
Cc: "Luis Claudio R. Goncalves"
Cc: Frederic Weisbecker
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-04-01 10:57:04 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

03 Mar, 2010

1 commit

bc078e4ea oprofile: convert oprofile from timer_hook to hrtimer ... Browse Code »

Oprofile is currently broken on systems running with NOHZ enabled.
A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps
for a longer period of time. This does bad things to the percentages
in the profiler output. To solve this problem convert oprofile to
use a restarting hrtimer instead of the timer_hook.

Signed-off-by: Martin Schwidefsky
Signed-off-by: Robert Richter

Martin Schwidefsky
2010-03-03 00:03:20 +0800

15 Dec, 2009

1 commit

d0316554d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...

Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c

Linus Torvalds
2009-12-15 01:58:24 +0800

29 Oct, 2009

1 commit

b3e9f672b percpu: make percpu symbols in oprofile unique ... Browse Code »

This patch updates percpu related symbols in oprofile such that percpu
symbols are unique and don't clash with local symbols. This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.

* drivers/oprofile/cpu_buffer.c: s/cpu_buffer/op_cpu_buffer/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.

Signed-off-by: Tejun Heo
Acked-by: Robert Richter
Cc: Rusty Russell

Tejun Heo
2009-10-29 21:34:13 +0800

10 Oct, 2009

2 commits

c0868934e oprofile: warn on freeing event buffer too early ... Browse Code »

A race shouldn't happen since all workqueues or handlers are canceled
or flushed before the event buffer is freed. A warning is triggered
now if the buffer is freed too early.

Also, this patch adds some comments about event buffer protection,
reworks some code and adds code to clear buffer_pos during alloc and
free of the event buffer.

Cc: David Rientjes
Cc: Stephane Eranian
Signed-off-by: Robert Richter

Robert Richter
2009-10-10 03:32:05 +0800
066b3aa84 oprofile: fix race condition in event_buffer free ... Browse Code »

Looking at the 2.6.31-rc9 code, it appears there is a race condition
in the event_buffer cleanup code path (shutdown). This could lead to
kernel panic as some CPUs may be operating on the event buffer AFTER
it has been freed. The attached patch solves the problem and makes
sure CPUs check if the buffer is not NULL before they access it as
some may have been spinning on the mutex while the buffer was being
freed.

The race may happen if the buffer is freed during pending reads. But
it is not clear why there are races in add_event_entry() since all
workqueues or handlers are canceled or flushed before the event buffer
is freed.

Signed-off-by: David Rientjes
Signed-off-by: Stephane Eranian
Signed-off-by: Robert Richter

David Rientjes
2009-10-10 00:02:01 +0800

24 Sep, 2009

1 commit

79f559977 cpumask: use zalloc_cpumask_var() where possible ... Browse Code »

Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan
Signed-off-by: Rusty Russell

Li Zefan
2009-09-24 08:04:24 +0800

22 Sep, 2009

1 commit

b87221de6 const: mark remaining super_operations const ... Browse Code »

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:24 +0800

20 Jul, 2009

1 commit

1b294f596 oprofile: Adding switch counter to oprofile statistic variables ... Browse Code »

This patch moves the multiplexing switch counter from x86 code to
common oprofile statistic variables. Now the value will be available
and usable for all architectures. The initialization and
incrementation also moved to common code.

Signed-off-by: Robert Richter

Robert Richter
2009-07-20 22:43:21 +0800