Doug / smarc-fsl-linux-kernel | Embedian Git Server

28 May, 2010

1 commit

80b5184cc kernel/: convert cpu notifier to return encapsulate errno value ... Browse Code »

By the previous modification, the cpu notifier can return encapsulate
errno value. This converts the cpu notifiers for kernel/*.c

Signed-off-by: Akinobu Mita
Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2010-05-28 00:12:48 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

18 Jan, 2010

1 commit

e03bcb686 generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data ... Browse Code »

The smp ipi data is passed around and given write access by
other cpus and should be separated from per-cpu data consumed by
this cpu.

Looking for hot lines, I saw call_function_data shared with
tick_cpu_sched.

Signed-off-by: Milton Miller
Acked-by: Anton Blanchard
Acked-by: Jens Axboe
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: : Nick Piggin
LKML-Reference:
Signed-off-by: Ingo Molnar

Milton Miller
2010-01-18 16:02:59 +0800

17 Jan, 2010

1 commit

af2422c42 smp_call_function_any(): pass the node value to cpumask_of_node() ... Browse Code »

The change in acpi_cpufreq to use smp_call_function_any causes a warning
when it is called since the function erroneously passes the cpu id to
cpumask_of_node rather than the node that the cpu is on. Fix this.

cpumask_of_node(3): node > nr_node_ids(1)
Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
Call Trace:
[] cpumask_of_node+0x23/0x58
[] smp_call_function_any+0x65/0xfa
[] ? do_drv_read+0x0/0x2f
[] get_cur_val+0xb0/0x102
[] get_cur_freq_on_cpu+0x74/0xc5
[] acpi_cpufreq_cpu_init+0x417/0x515
[] ? __down_write+0xb/0xd
[] cpufreq_add_dev+0x278/0x922

Signed-off-by: David John
Cc: Suresh Siddha
Cc: Rusty Russell
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David John
2010-01-17 04:15:39 +0800

16 Dec, 2009

2 commits

8f0ddf91f Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kern… ... Browse Code »

…el/git/tip/linux-2.6-tip

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
clockevents: Convert to raw_spinlock
clockevents: Make tick_device_lock static
debugobjects: Convert to raw_spinlocks
perf_event: Convert to raw_spinlock
hrtimers: Convert to raw_spinlocks
genirq: Convert irq_desc.lock to raw_spinlock
smp: Convert smplocks to raw_spinlocks
rtmutes: Convert rtmutex.lock to raw_spinlock
sched: Convert pi_lock to raw_spinlock
sched: Convert cpupri lock to raw_spinlock
sched: Convert rt_runtime_lock to raw_spinlock
sched: Convert rq->lock to raw_spinlock
plist: Make plist debugging raw_spinlock aware
bkl: Fixup core_lock fallout
locking: Cleanup the name space completely
locking: Further name space cleanups
alpha: Fix fallout from locking changes
locking: Implement new raw_spinlock
locking: Convert raw_rwlock functions to arch_rwlock
locking: Convert raw_rwlock to arch_rwlock
...

Linus Torvalds
2009-12-16 01:02:01 +0800
c0f68c2fa generic-ipi: cleanup for generic_smp_call_function_interrupt() ... Browse Code »

Use smp_processor_id() instead of get_cpu() and put_cpu() in
generic_smp_call_function_interrupt(), It's no need to disable preempt,
because we must call generic_smp_call_function_interrupt() with interrupts
disabled.

Signed-off-by: Xiao Guangrong
Acked-by: Ingo Molnar
Cc: Jens Axboe
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiao Guangrong
2009-12-16 00:53:25 +0800

15 Dec, 2009

1 commit

9f5a5621e smp: Convert smplocks to raw_spinlocks ... Browse Code »

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Acked-by: Ingo Molnar

Thomas Gleixner
2009-12-15 06:55:33 +0800

18 Nov, 2009

1 commit

2ea6dec4a generic-ipi: Add smp_call_function_any() ... Browse Code »

Andrew points out that acpi-cpufreq uses cpumask_any, when it really
would prefer to use the same CPU if possible (to avoid an IPI). In
general, this seems a good idea to offer.

[ tglx: Documented selection preference and Inlined the UP case to
avoid the copy of smp_call_function_single() and the extra
EXPORT ]

Signed-off-by: Rusty Russell
Cc: Ingo Molnar
Cc: Venkatesh Pallipadi
Cc: Len Brown
Cc: Zhao Yakui
Cc: Dave Jones
Cc: Thomas Gleixner
Cc: Mike Galbraith
Cc: "Zhang, Yanmin"
Signed-off-by: Andrew Morton
Signed-off-by: Thomas Gleixner

Rusty Russell
2009-11-18 21:52:25 +0800

23 Oct, 2009

1 commit

72f279b25 generic-ipi: Fix misleading smp_call_function*() description ... Browse Code »

After commit:8969a5ede0f9e17da4b943712429aef2c9bcd82b
"generic-ipi: remove kmalloc()", wait = 0 can be guaranteed.

Signed-off-by: Sheng Yang
Cc: Peter Zijlstra
Cc: Jens Axboe
Cc: Nick Piggin
LKML-Reference:
Signed-off-by: Ingo Molnar

Sheng Yang
2009-10-23 19:51:45 +0800

24 Sep, 2009

1 commit

0748bd017 cpumask: remove arch_send_call_function_ipi ... Browse Code »

Now everyone is converted to arch_send_call_function_ipi_mask, remove
the shim and the #defines.

Signed-off-by: Rusty Russell

Rusty Russell
2009-09-24 08:04:47 +0800

23 Sep, 2009

1 commit

54fdade1c generic-ipi: make struct call_function_data lockless ... Browse Code »

This patch can remove spinlock from struct call_function_data, the
reasons are below:

1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
it can atomically test and clear specific cpu, we can use it instead
of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
to protect those in generic_smp_call_function_interrupt().

2: in smp_call_function_many(), after csd_lock() return, the current's
cfd_data is deleted from call_function list, so it not have race
between other cpus, then cfs_data is only used in
smp_call_function_many() that must disable preemption and not from
a hardware interrupthandler or from a bottom half handler to call,
only the correspond cpu can use it, so it not have race in current
cpu, no need cfs_data->lock to protect it.

3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
generic_smp_call_function_interrupt(), so we can define cfs_data->refs
to atomic_t, and no need cfs_data->lock any more.

Signed-off-by: Xiao Guangrong
Cc: Ingo Molnar
Cc: Jens Axboe
Cc: Nick Piggin
Cc: Peter Zijlstra
Acked-by: Rusty Russell
[akpm@linux-foundation.org: use atomic_dec_return()]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiao Guangrong
2009-09-23 22:39:28 +0800

27 Aug, 2009

1 commit

b855192c0 Merge branch 'x86/urgent' into x86/pat ... Browse Code »

Reason: Change to is_new_memtype_allowed() in x86/urgent

Resolved semantic conflicts in:

arch/x86/mm/pat.c
arch/x86/mm/ioremap.c

Signed-off-by: H. Peter Anvin

H. Peter Anvin
2009-08-27 08:24:28 +0800

22 Aug, 2009

1 commit

269c861ba generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled ... Browse Code »

Because of deadlock possiblities smp_call_function() is not allowed to
be called with interrupts disabled. Add an exception for the cpu not
yet online, as no one else can send smp call function interrupt to this
cpu that is not yet online and as such deadlock condition is not possible.

Signed-off-by: Suresh Siddha
Acked-by: Nick Piggin
Signed-off-by: H. Peter Anvin

Suresh Siddha
2009-08-22 07:25:43 +0800

08 Aug, 2009

1 commit

69dd647f9 generic-ipi: fix hotplug_cfd() ... Browse Code »

Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n.

The bug was introduced by 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()").

Signed-off-by: Xiao Guangrong
Cc: Ingo Molnar
Cc: Jens Axboe
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Rusty Russell
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiao Guangrong
2009-08-08 01:39:55 +0800

09 Jun, 2009

1 commit

eaa958402 cpumask: alloc zeroed cpumask for static cpumask_var_ts ... Browse Code »

These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu
Signed-off-by: Rusty Russell

Yinghai Lu
2009-06-09 21:00:27 +0800

13 Mar, 2009

1 commit

641cd4cfc generic-ipi: eliminate WARN_ON()s during oops/panic ... Browse Code »

Do not output smp-call related warnings in the oops/panic codepath.

Reported-by: Jan Beulich
Acked-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-03-13 17:47:34 +0800

25 Feb, 2009

4 commits

0b13fda1e generic-ipi: cleanups ... Browse Code »

Andrew pointed out that there's some small amount of
style rot in kernel/smp.c.

Clean it up.

Reported-by: Andrew Morton
Cc: Nick Piggin
Cc: Jens Axboe
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-02-25 23:52:50 +0800
6e2756376 generic-ipi: remove CSD_FLAG_WAIT ... Browse Code »

Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
the code so that we can use CSD_FLAG_LOCK for both purposes.

Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-25 21:13:44 +0800
8969a5ede generic-ipi: remove kmalloc() ... Browse Code »

Remove the use of kmalloc() from the smp_call_function_*()
calls.

Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
data for single cpu ipi calls) started the discussion on the use
of kmalloc() in this code and fixed the
smp_call_function_single(.wait=0) fallback case.

In this patch we complete this by also providing means for the
_many() call, which fully removes the need for kmalloc() in this
code.

The problem with the _many() call is that other cpus might still
be observing our entry when we're done with it. It solved this
by dynamically allocating data elements and RCU-freeing it.

We solve it by using a single per-cpu entry which provides
static storage and solves one half of the problem (avoiding
referencing freed data).

The other half, ensuring the queue iteration it still possible,
is done by placing re-used entries at the head of the list. This
means that if someone was still iterating that entry when it got
moved, he will now re-visit the entries on the list he had
already seen, but avoids skipping over entries like would have
happened had we placed the new entry at the end.

Furthermore, visiting entries twice is not a problem, since we
remove our cpu from the entry's cpumask once its called.

Many thanks to Oleg for his suggestions and him poking holes in
my earlier attempts.

Signed-off-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Linus Torvalds
Cc: Nick Piggin
Cc: Jens Axboe
Cc: "Paul E. McKenney"
Cc: Rusty Russell
Signed-off-by: Ingo Molnar

Peter Zijlstra
2009-02-25 21:13:43 +0800
15d0d3b33 generic IPI: simplify barriers and locking ... Browse Code »

Simplify the barriers in generic remote function call interrupt
code.

Firstly, just unconditionally take the lock and check the list
in the generic_call_function_single_interrupt IPI handler. As
we've just taken an IPI here, the chances are fairly high that
there will be work on the list for us, so do the locking
unconditionally. This removes the tricky lockless list_empty
check and dubious barriers. The change looks bigger than it is
because it is just removing an outer loop.

Secondly, clarify architecture specific IPI locking rules.
Generic code has no tools to impose any sane ordering on IPIs if
they go outside normal cache coherency, ergo the arch code must
make them appear to obey cache coherency as a "memory operation"
to initiate an IPI, and a "memory operation" to receive one.
This way at least they can be reasoned about in generic code,
and smp_mb used to provide ordering.

The combination of these two changes means that explict barriers
can be taken out of queue handling for the single case -- shared
data is explicitly locked, and ipi ordering must conform to
that, so no barriers needed. An extra barrier is needed in the
many handler, so as to ensure we load the list element after the
IPI is received.

Does any architecture actually *need* these barriers? For the
initiator I could see it, but for the handler I would be
surprised. So the other thing we could do for simplicity is just
to require that, rather than just matching with cache coherency,
we just require a full barrier before generating an IPI, and
after receiving an IPI. In which case, the smp_mb()s can go
away. But just for now, we'll be on the safe side and use the
barriers (they're in the slow case anyway).

Signed-off-by: Nick Piggin
Acked-by: Peter Zijlstra
Cc: linux-arch@vger.kernel.org
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Jens Axboe
Cc: Oleg Nesterov
Cc: Suresh Siddha
Signed-off-by: Ingo Molnar

Nick Piggin
2009-02-25 19:27:08 +0800

31 Jan, 2009

1 commit

d7240b988 generic-ipi: use per cpu data for single cpu ipi calls ... Browse Code »

The smp_call_function can be passed a wait parameter telling it to
wait for all the functions running on other CPUs to complete before
returning, or to return without waiting. Unfortunately, this is
currently just a suggestion and not manditory. That is, the
smp_call_function can decide not to return and wait instead.

The reason for this is because it uses kmalloc to allocate storage
to send to the called CPU and that CPU will free it when it is done.
But if we fail to allocate the storage, the stack is used instead.
This means we must wait for the called CPU to finish before
continuing.

Unfortunatly, some callers do no abide by this hint and act as if
the non-wait option is mandatory. The MTRR code for instance will
deadlock if the smp_call_function is set to wait. This is because
the smp_call_function will wait for the other CPUs to finish their
called functions, but those functions are waiting on the caller to
continue.

This patch changes the generic smp_call_function code to use per cpu
variables if the allocation of the data fails for a single CPU call. The
smp_call_function_many will fall back to the smp_call_function_single
if it fails its alloc. The smp_call_function_single is modified
to not force the wait state.

Since we now are using a single data per cpu we must synchronize the
callers to prevent a second caller modifying the data before the
first called IPI functions complete. To do so, I added a flag to
the call_single_data called CSD_FLAG_LOCK. When the single CPU is
called (which can be called when a many call fails an alloc), we
set the LOCK bit on this per cpu data. When the caller finishes
it clears the LOCK bit.

The caller must wait till the LOCK bit is cleared before setting
it. When it is cleared, there is no IPI function using it.

Signed-off-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
Acked-by: Jens Axboe
Acked-by: Linus Torvalds
Signed-off-by: Ingo Molnar

Steven Rostedt
2009-01-31 01:31:08 +0800

01 Jan, 2009

1 commit

4f4b6c1a9 cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: core ... Browse Code »

Impact: cleanup

In future, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
Acked-by: Ingo Molnar
Acked-by: James Morris
Cc: Eric Biederman

Rusty Russell
2009-01-01 07:42:15 +0800

30 Dec, 2008

2 commits

ce47d974f cpumask: arch_send_call_function_ipi_mask: core ... Browse Code »

Impact: new API to reduce stack usage

We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask().

Signed-off-by: Rusty Russell

Rusty Russell
2008-12-30 06:35:17 +0800
54b11e6d5 cpumask: smp_call_function_many() ... Browse Code »

Impact: Implementation change to remove cpumask_t from stack.

Actually change smp_call_function_mask() to smp_call_function_many().
We avoid cpumasks on the stack in this version.

(S390 has its own version, but that's going away apparently).

We have to do some dancing to figure out if 0 or 1 other cpus are in
the mask supplied and the online mask without allocating a tmp
cpumask. It's still fairly cheap.

We allocate the cpumask at the end of the call_function_data
structure: if allocation fails we fallback to smp_call_function_single
rather than using the baroque quiescing code (which needs a cpumask on
stack).

(Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)

Signed-off-by: Rusty Russell
Signed-off-by: Mike Travis
Cc: Hiroshi Shimamoto
Cc: npiggin@suse.de
Cc: axboe@kernel.dk

Rusty Russell
2008-12-30 06:35:16 +0800

06 Nov, 2008

1 commit

561920a0d generic-ipi: fix the smp_mb() placement ... Browse Code »

smp_mb() is needed (to make the memory operations visible globally) before
sending the ipi on the sender and the receiver (on Alpha atleast) needs
smp_read_barrier_depends() in the handler before reading the call_single_queue
list in a lock-free fashion.

On x86, x2apic mode register accesses for sending IPI's don't have serializing
semantics. So the need for smp_mb() before sending the IPI becomes more
critical in x2apic mode.

Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
doing any thing special (like memory fence) after clearing the CSD_FLAG_WAIT.

Signed-off-by: Suresh Siddha
Signed-off-by: Jens Axboe

Suresh Siddha
2008-11-06 15:41:56 +0800

26 Aug, 2008

1 commit

f73be6ded smp: have smp_call_function_single() detect invalid CPUs ... Browse Code »

Have smp_call_function_single() return invalid CPU indicies and return
-ENXIO. This function is already executed inside a
get_cpu()..put_cpu() which locks out CPU removal, so rather than
having the higher layers doing another layer of locking to guard
against unplugged CPUs do the test here.

Signed-off-by: H. Peter Anvin

H. Peter Anvin
2008-08-26 08:45:48 +0800

12 Aug, 2008

1 commit

c2fc11985 generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix ... Browse Code »

> > Nick Piggin (1):
> > generic-ipi: fix stack and rcu interaction bug in
> > smp_call_function_mask()
>
> I'm still not 100% sure that I have this patch right... I might have seen
> a lockup trace implicating the smp call function path... which may have
> been due to some other problem or a different bug in the new call function
> code, but if some more people can take a look at it before merging?

OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
data properly in the alloc && wait case. Secondly, I wasn't resetting
CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
wait).

After those fixes, the patch boots and runs with the kmalloc commented
out (so it always executes the slowpath).

Signed-off-by: Ingo Molnar

Nick Piggin
2008-08-12 17:21:27 +0800

11 Aug, 2008

1 commit

cc7a486ca generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() ... Browse Code »

* Venki Pallipadi wrote:

> Found a OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
> The actual oops looked like
>
> [ 11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
> [ 11.277815] IP: [] 0xffff8802ffffffff
> [ 11.278155] PGD 202063 PUD 0
> [ 11.278576] Oops: 0010 [1] SMP
> [ 11.279006] CPU 5
> [ 11.279336] Modules linked in:
> [ 11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290
> [ 11.280039] RIP: 0010:[] [] 0xffff8802ffffffff
> [ 11.280692] RSP: 0018:ffff88027f1f7f70 EFLAGS: 00010086
> [ 11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
> [ 11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
> [ 11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
> [ 11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
> [ 11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
> [ 11.282502] FS: 0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
> [ 11.283096] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> [ 11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
> [ 11.284936] Stack: ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
> [ 11.285802] ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
> [ 11.286599] ffffffff8020bdd6 ffff88027f1f3e40 ffff88027f1f3ef8 0000000000000000
> [ 11.287120] Call Trace:
> [ 11.287768] [] ? generic_smp_call_function_interrupt+0x61/0x12c
> [ 11.288354] [] smp_call_function_interrupt+0x17/0x27
> [ 11.288744] [] call_function_interrupt+0x66/0x70
> [ 11.289030] [] ? clockevents_notify+0x19/0x73
> [ 11.289380] [] ? acpi_idle_enter_simple+0x18b/0x1fa
> [ 11.289760] [] ? acpi_idle_enter_simple+0x181/0x1fa
> [ 11.290051] [] ? cpuidle_idle_call+0x70/0xa2
> [ 11.290338] [] ? cpu_idle+0x5f/0x7d
> [ 11.290723] [] ? start_secondary+0x14d/0x152
> [ 11.291010]
> [ 11.291287]
> [ 11.291654] Code: Bad RIP value.
> [ 11.292041] RIP [] 0xffff8802ffffffff
> [ 11.292380] RSP
> [ 11.292741] CR2: ffff8802ffffffff
> [ 11.310951] ---[ end trace 137c54d525305f1c ]---
>
> The problem is with the following sequence of events:
>
> - CPU A calls smp_call_function_mask() for CPU B with wait parameter
> - CPU A sets up the call_function_data on the stack and does an rcu add to
> call_function_queue
> - CPU A waits until the WAIT flag is cleared
> - CPU B gets the call function interrupt and starts going through the
> call_function_queue
> - CPU C also gets some other call function interrupt and starts going through
> the call_function_queue
> - CPU C, which is also going through the call_function_queue, starts referencing
> CPU A's stack, as that element is still in call_function_queue
> - CPU B finishes the function call that CPU A set up and as there are no other
> references to it, rcu deletes the call_function_data (which was from CPU A
> stack)
> - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
> - CPU A which was waiting on the flag continues executing and the stack
> contents change
>
> - CPU C is still in rcu_read section accessing the CPU A's stack sees
> inconsistent call_funation_data and can try to execute
> function with some random pointer, causing stack corruption for A
> (by clearing the bits in mask field) and oops.

Nice debugging work.

I'd suggest something like the attached (boot tested) patch as the simple
fix for now.

I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!

Signed-off-by: Ingo Molnar

Nick Piggin
2008-08-11 21:21:28 +0800

27 Jul, 2008

1 commit

7babe8db9 Full conversion to early_initcall() interface, remove old interface ... Browse Code »

A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls. Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu
Cc: Tom Zanussi
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eduard - Gabriel Munteanu
2008-07-27 03:00:04 +0800

16 Jul, 2008

2 commits

63cf13b77 generic ipi function calls: wait on alloc failure fallback ... Browse Code »

When a GFP_ATOMIC allocation fails, it falls back to allocating the
data on the stack and converting it to a waiting call.

Make sure we actually wait in this case.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Linus Torvalds

Jeremy Fitzhardinge
2008-07-16 05:12:20 +0800
59190f421 Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
generic-ipi: more merge fallout
generic-ipi: merge fix
x86, visws: use mach-default/entry_arch.h
x86, visws: fix generic-ipi build
generic-ipi: fixlet
generic-ipi: fix s390 build bug
generic-ipi: fix linux-next tree build failure
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix "smp_call_function: get rid of the unused nonatomic/retry argument"
on_each_cpu(): kill unused 'retry' parameter
smp_call_function: get rid of the unused nonatomic/retry argument
sh: convert to generic helpers for IPI function calls
parisc: convert to generic helpers for IPI function calls
mips: convert to generic helpers for IPI function calls
m32r: convert to generic helpers for IPI function calls
arm: convert to generic helpers for IPI function calls
alpha: convert to generic helpers for IPI function calls
ia64: convert to generic helpers for IPI function calls
powerpc: convert to generic helpers for IPI function calls
...

Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually

Linus Torvalds
2008-07-16 05:12:03 +0800

27 Jun, 2008

1 commit

ce0d1b6f7 fix: "smp_call_function: get rid of the unused nonatomic/retry argument" ... Browse Code »

fix:

kernel/smp.c: In function 'smp_call_function_mask':
kernel/smp.c:303: error: too many arguments to function 'smp_call_function_single'

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-06-27 17:50:32 +0800

26 Jun, 2008

2 commits

8691e5a8f smp_call_function: get rid of the unused nonatomic/retry argument ... Browse Code »

It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge
Signed-off-by: Jens Axboe

Jens Axboe
2008-06-26 17:24:35 +0800
3d4422332 Add generic helpers for arch IPI function calls ... Browse Code »

This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.

The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.

Acked-by: Ingo Molnar
Reviewed-by: Paul E. McKenney
Signed-off-by: Jens Axboe

Jens Axboe
2008-06-26 17:21:34 +0800