Eric Lee / smarc-fsl-linux-kernel

20 Aug, 2008

1 commit

1b04624f9 tracehook: fix SA_NOCLDWAIT

I outwitted myself again in commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664,
and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it.

Reported-by: Andi Kleen
Signed-off-by: Roland McGrath

Roland McGrath
2008-08-20 11:37:07 +0800

18 Aug, 2008

1 commit

6951b12a0 lockdep: fix spurious 'inconsistent lock state' warning

Since commit f82b217e3513fe3af342c0f3ee1494e86250c21c, lockdep can output spurious
warnings related to hwirqs because hardirq_off shrank from an int to a bit-sized
flag. Guard it with double negation to fix the warning.

Signed-off-by: Dmitry Baryshkov
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Dmitry Baryshkov
2008-08-18 15:42:31 +0800
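
Note on 6951b12a0: a minimal C sketch of why the double negation matters, assuming a one-bit bitfield like the one the message describes. The struct and function names are hypothetical, not the kernel's. A non-zero value whose low bit is clear becomes 0 when stored into a one-bit field; `!!` maps any non-zero value to 1 before the store.

```c
#include <assert.h>

/* Hypothetical stand-in for a structure carrying a one-bit flag. */
struct held_lock_sketch {
    unsigned int hardirqs_off:1;
};

/* Plain assignment keeps only the lowest bit of the value. */
unsigned int store_raw(unsigned int flags)
{
    struct held_lock_sketch h = { 0 };

    h.hardirqs_off = flags;
    return h.hardirqs_off;
}

/* Double negation first maps any non-zero value to 1. */
unsigned int store_normalized(unsigned int flags)
{
    struct held_lock_sketch h = { 0 };

    h.hardirqs_off = !!flags;
    return h.hardirqs_off;
}

/* Usage: 0x80 is non-zero but has bit 0 clear. */
void bitfield_demo(void)
{
    assert(store_raw(0x80) == 0);        /* the flag is silently lost */
    assert(store_normalized(0x80) == 1); /* the flag is preserved */
}
```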

17 Aug, 2008

2 commits

406703f8d Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix build if CONFIG_PROVE_LOCKING not defined
lockdep: use WARN() in kernel/lockdep.c
lockdep: spin_lock_nest_lock(), checkpatch fixes
lockdep: build fix

Linus Torvalds
2008-08-17 08:16:07 +0800
c100548d4 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: scale sysctl_sched_shares_ratelimit with nr_cpus
sched: fix rt-bandwidth hotplug race
sched: fix the race between walk_tg_tree and sched_create_group

Linus Torvalds
2008-08-17 08:15:32 +0800

16 Aug, 2008

3 commits

71ef2a46f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: Fix setting of PF_SUPERPRIV by __capable()

Linus Torvalds
2008-08-16 06:32:13 +0800
df60a8441 lockdep: fix build if CONFIG_PROVE_LOCKING not defined

If CONFIG_PROVE_LOCKING is not defined, then no dependency information
is available.

Signed-off-by: Stephen Hemminger
Signed-off-by: Ingo Molnar

Stephen Hemminger
2008-08-16 01:22:04 +0800
55cd53404 sched: scale sysctl_sched_shares_ratelimit with nr_cpus

David reported that his Niagara spends a little too much time in
tg_shares_up(), which, considering he has a large cpu count, makes sense.

So scale the ratelimit value with the number of cpus like we do for
other controls as well.

Reported-by: David Miller
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-08-16 00:25:07 +0800
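
Note on 55cd53404: a small C sketch of scaling a tunable with the CPU count. The logarithmic factor 1 + log2(ncpus) is an assumption made for illustration; the commit text only says the value is scaled "like we do for other controls". The function names and the base value used in the comments are hypothetical.

```c
#include <assert.h>

/* Integer floor(log2(n)) for n >= 1. */
static unsigned int ilog2_sketch(unsigned int n)
{
    unsigned int log = 0;

    assert(n >= 1);
    while (n >>= 1)
        log++;
    return log;
}

/* Scale a base tunable by the assumed factor 1 + log2(ncpus). */
unsigned int scale_with_cpus(unsigned int base, unsigned int ncpus)
{
    return base * (1 + ilog2_sketch(ncpus));
}
```

Under this assumption a single-CPU machine keeps the base value, while a 64-CPU machine uses seven times the base, so a large box runs the rate-limited work less often.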

15 Aug, 2008

8 commits

be4de3526 completions: uninline try_wait_for_completion and completion_done

m68k fails to build with these functions inlined in completion.h. Move
them out of line into sched.c and export them to avoid this problem.

Signed-off-by: Dave Chinner
Cc: Geert Uytterhoeven
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Chinner
2008-08-15 23:35:44 +0800
8c5a1cf0a kexec: use a mutex for locking rather than xchg()

Functionally the same, but more conventional.

Cc: Huang Ying
Tested-by: Vivek Goyal
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2008-08-15 23:35:43 +0800
3122c3311 kexec jump: fix for ftrace

Ftrace depends on some processor state that is destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec(), and ftrace is
restored after restore_processor_state().

Signed-off-by: Huang Ying
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:43 +0800
73bd9c72a kexec jump: in sync with hibernation implementation

Add device_pm_lock() and device_pm_unlock() in kernel_kexec() in sync with
current hibernation implementation.

Signed-off-by: Huang Ying
Acked-by: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
ca195b7f6 kexec jump: remove duplication of kexec_restart_prepare()

Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the
code.

Signed-off-by: Huang Ying
Acked-by: Pavel Machek
Acked-by: Vivek Goyal
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
163f6876f kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE

Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because the control
page is used for more than code on some platforms. For example, in kexec
jump it is used for data and stack too.

[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Russell King
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
7ade3fcc1 kexec jump: clean up #ifdef and comments

Move if (kexec_image->preserve_context) { ... } into #ifdef
CONFIG_KEXEC_JUMP to make the code look cleaner.

Fix the no-longer-correct comments of kernel_kexec().

Signed-off-by: Huang Ying
Acked-by: Vivek Goyal
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
4cd69b986 kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec()

kernel/kexec.c: In function 'kernel_kexec':
kernel/kexec.c:1506: warning: value computed is not used

Signed-off-by: Huang Ying
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800

14 Aug, 2008

4 commits

f1679d084 sched: fix rt-bandwidth hotplug race

When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be
detached. Alex observed the case where a runqueue was stealing bandwidth
from an already disabled runqueue to satisfy its own needs.

Stop this by skipping over already disabled runqueues.

Reported-by: Alex Nixon
Signed-off-by: Peter Zijlstra
Tested-by: Alex Nixon
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-08-14 21:50:58 +0800
5cd9c58fb security: Fix setting of PF_SUPERPRIV by __capable()

Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags of
the target process if that is not the current process and it is trying to
change its own flags in a different way at the same time.

__capable() is using neither atomic ops nor locking to protect t->flags. This
patch removes __capable() and introduces has_capability() that doesn't set
PF_SUPERPRIV on the process being queried.

This patch further splits security_ptrace() in two:

(1) security_ptrace_may_access(). This passes judgement on whether one
process may access another only (PTRACE_MODE_ATTACH for ptrace() and
PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
current is the parent.

(2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only,
and takes only a pointer to the parent process. current is the child.

In Smack and commoncap, this uses has_capability() to determine whether
the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
This does not set PF_SUPERPRIV.

Two of the instances of __capable() actually only act on current, and so have
been changed to calls to capable().

Of the places that were using __capable():

(1) The OOM killer calls __capable() thrice when weighing the killability of a
process. All of these now use has_capability().

(2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
whether the parent was allowed to trace any process. As mentioned above,
these have been split. For PTRACE_ATTACH and /proc, capable() is now
used, and for PTRACE_TRACEME, has_capability() is used.

(3) cap_safe_nice() only ever saw current, so now uses capable().

(4) smack_setprocattr() rejected accesses to tasks other than current just
after calling __capable(), so the order of these two tests has been
switched and capable() is used instead.

(5) In smack_file_send_sigiotask(), we need to allow privileged processes to
receive SIGIO on files they're manipulating.

(6) In smack_task_wait(), we let a process wait for a privileged process,
whether or not the process doing the waiting is privileged.

I've tested this with the LTP SELinux and syscalls testscripts.

Signed-off-by: David Howells
Acked-by: Serge Hallyn
Acked-by: Casey Schaufler
Acked-by: Andrew G. Morgan
Acked-by: Al Viro
Signed-off-by: James Morris

David Howells
2008-08-14 20:59:43 +0800
09f2724a7 sched: fix the race between walk_tg_tree and sched_create_group

With 2.6.27-rc3, I hit a kernel panic when running volanoMark on my
new x86_64 machine. I also hit it with other 2.6.27-rc kernels.
See the log below.

Basically, the functions walk_tg_tree and sched_create_group race
when accessing and initializing tg->children. The patch below fixes it
by moving the tg->children initialization ahead of linking tg->siblings
to parent->children.

{----------------panic log------------}

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [] walk_tg_tree+0x45/0x7f
PGD 1be1c4067 PUD 1bdd8d067 PMD 0
Oops: 0000 [1] SMP
CPU 11
Modules linked in: igb
Pid: 22979, comm: java Not tainted 2.6.27-rc3 #1
RIP: 0010:[] [] walk_tg_tree+0x45/0x7f
RSP: 0018:ffff8801bfbbbd18 EFLAGS: 00010083
RAX: 0000000000000000 RBX: ffff8800be0dce40 RCX: ffffffffffffffc0
RDX: ffff880102c43740 RSI: 0000000000000000 RDI: ffff8800be0dce40
RBP: ffff8801bfbbbd48 R08: ffff8800ba437bc8 R09: 0000000000001f40
R10: ffff8801be812100 R11: ffffffff805fdf44 R12: ffff880102c43740
R13: 0000000000000000 R14: ffffffff8022cf0f R15: ffffffff8022749f
FS: 00000000568ac950(0063) GS:ffff8801bfa26d00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001bd848000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 22979, threadinfo ffff8801b145a000, task ffff8801bf18e450)
Stack: 0000000000000001 ffff8800ba5c8d60 0000000000000001 0000000000000001
ffff8800bad1ccb8 0000000000000000 ffff8801bfbbbd98 ffffffff8022ed37
0000000000000001 0000000000000286 ffff8801bd5ee180 ffff8800ba437bc8
Call Trace:
[] try_to_wake_up+0x71/0x24c
[] autoremove_wake_function+0x9/0x2e
[] ? __wake_up_common+0x46/0x76
[] __wake_up+0x38/0x4f
[] tcp_v4_rcv+0x380/0x62e

Signed-off-by: Zhang Yanmin
Acked-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Zhang, Yanmin
2008-08-14 16:58:48 +0800
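
Note on 09f2724a7: a single-threaded C sketch of the ordering described in the message, namely initialize the new group's own child list before linking the group into its parent's list. The list helpers and the `group` struct are hypothetical simplifications; this does not model the concurrency or any RCU primitives the kernel may use, only the order of the two steps.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of a list head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static void list_add_tail_sketch(struct list_node *item, struct list_node *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct group {
    struct list_node siblings; /* links this group into parent->children */
    struct list_node children; /* head of this group's own child list */
};

/* A walker: counts direct children. An uninitialized (NULL) head here
 * corresponds to the NULL dereference a tree walker would hit. */
int count_children(const struct group *g)
{
    const struct list_node *p;
    int n = 0;

    assert(g->children.next != NULL);
    for (p = g->children.next; p != &g->children; p = p->next)
        n++;
    return n;
}

void group_attach(struct group *child, struct group *parent)
{
    /* First make the child safe to walk ... */
    list_init(&child->children);
    /* ... and only then make it reachable from the parent. */
    list_add_tail_sketch(&child->siblings, &parent->children);
}
```

With the two statements in `group_attach()` swapped, a concurrent walker could reach the child through the parent before `children` is valid.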
2df8b1d65 lockdep: use WARN() in kernel/lockdep.c

Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton

Arjan van de Ven
2008-08-14 01:06:46 +0800

13 Aug, 2008

5 commits

c72f4573a lockdep: spin_lock_nest_lock(), checkpatch fixes

fix:

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
#46: FILE: kernel/spinlock.c:326:
+EXPORT_SYMBOL(_spin_lock_nest_lock);

total: 0 errors, 1 warnings, 26 lines checked

Signed-off-by: Andrew Morton
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Andrew Morton
2008-08-13 19:56:51 +0800
73909f7a6 Merge commit 'v2.6.27-rc3' into core/urgent

Ingo Molnar
2008-08-13 19:56:44 +0800
d6672c501 lockdep: build fix

fix:

kernel/built-in.o: In function `lockdep_stats_show':
lockdep_proc.c:(.text+0x3cb2f): undefined reference to `lockdep_count_forward_deps'
kernel/built-in.o: In function `l_show':
lockdep_proc.c:(.text+0x3d02b): undefined reference to `lockdep_count_forward_deps'
lockdep_proc.c:(.text+0x3d047): undefined reference to `lockdep_count_backward_deps'

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-08-13 18:55:10 +0800
f18e439d1 genirq: switch /proc/irq/*/smp_affinity et al to seqfiles

Switch /proc/irq/*/smp_affinity , /proc/irq/default_smp_affinity to
seq_files.

cat(1) reads in 1024-byte chunks by default; with a high enough NR_CPUS, the
read fails with -EINVAL.

As a side effect, there are now two fewer users of the ->read_proc interface.

Signed-off-by: Alexey Dobriyan
Cc: Paul Jackson
Cc: Mike Travis
Cc: Al Viro
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-08-13 07:07:30 +0800
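
Note on f18e439d1: a rough C illustration of how the text of an affinity mask can outgrow a 1024-byte read. It assumes the mask is printed in hexadecimal as comma-separated 32-bit groups followed by a newline; that output format is an assumption here, and the commit text does not say exactly which check returned -EINVAL. The helper name is hypothetical.

```c
#include <assert.h>

/* Characters needed to print nr_cpus bits as hex digits in
 * comma-separated 32-bit groups, plus a trailing newline. */
unsigned int affinity_text_len(unsigned int nr_cpus)
{
    unsigned int digits, groups;

    assert(nr_cpus > 0);
    digits = (nr_cpus + 3) / 4;       /* one hex digit per 4 bits */
    groups = (nr_cpus + 31) / 32;     /* one group per 32 bits */
    return digits + (groups - 1) + 1; /* digits + commas + newline */
}
```

Under these assumptions NR_CPUS=1024 needs 288 characters, while NR_CPUS=4096 needs 1152, which no longer fits in a single 1024-byte chunk.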
3ee1062b4 cpu hotplug: s390 doesn't support additional_cpus anymore.

s390 has not supported the additional_cpus kernel parameter for a
long time. So we had better update the code and documentation to reflect
that.

Cc: Martin Schwidefsky
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2008-08-13 07:07:28 +0800

12 Aug, 2008

11 commits

96348852c Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix

Linus Torvalds
2008-08-12 23:49:53 +0800
1c89ac550 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
fix spinlock recursion in hvc_console
stop_machine: remove unused variable
modules: extend initcall_debug functionality to the module loader
export virtio_rng.h
lguest: use get_user_pages_fast() instead of get_user_pages()
mm: Make generic weak get_user_pages_fast and EXPORT_GPL it
lguest: don't set MAC address for guest unless specified

Linus Torvalds
2008-08-12 23:40:19 +0800
c2fc11985 generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix

> > Nick Piggin (1):
> > generic-ipi: fix stack and rcu interaction bug in
> > smp_call_function_mask()
>
> I'm still not 100% sure that I have this patch right... I might have seen
> a lockup trace implicating the smp call function path... which may have
> been due to some other problem or a different bug in the new call function
> code, but if some more people can take a look at it before merging?

OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
data properly in the alloc && wait case. Secondly, I wasn't resetting
CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
wait).

After those fixes, the patch boots and runs with the kmalloc commented
out (so it always executes the slowpath).

Signed-off-by: Ingo Molnar

Nick Piggin
2008-08-12 17:21:27 +0800
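
Note on c2fc11985: a simplified, single-threaded C sketch of the second bug described above, assuming the receiving side clears the wait flag when it finishes. All names here (FLAG_WAIT, call_data, remote_run, run_on_each) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

#define FLAG_WAIT 0x1u

struct call_data {
    unsigned int flags;
    int waited; /* how many targets the sender actually waited for */
};

/* Stand-in for one target CPU handling the call: if the sender asked
 * to wait, count it and clear the flag to release the sender. */
static void remote_run(struct call_data *d)
{
    if (d->flags & FLAG_WAIT) {
        d->waited++;
        d->flags &= ~FLAG_WAIT;
    }
}

/* Send the call to ncpus targets in turn. With reset_each_iteration
 * set to 0, the flag is only set once before the loop (the bug). */
int run_on_each(int ncpus, int reset_each_iteration)
{
    struct call_data d = { FLAG_WAIT, 0 };
    int cpu;

    assert(ncpus >= 0);
    for (cpu = 0; cpu < ncpus; cpu++) {
        if (reset_each_iteration)
            d.flags |= FLAG_WAIT; /* the fix: re-arm per target */
        remote_run(&d);
    }
    return d.waited;
}
```

Without the per-iteration reset only the first target is waited for; with it, every target is.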
ed6d68763 stop_machine: remove unused variable

Signed-off-by: Li Zefan
Signed-off-by: Rusty Russell

Li Zefan
2008-08-12 15:52:55 +0800
59f9415ff modules: extend initcall_debug functionality to the module loader

The kernel has this really nice facility where if you put "initcall_debug"
on the kernel commandline, it'll print which function it's going to
execute just before calling an initcall, and then after the call completes
it will

1) print if it had an error code

2) check for a few simple bugs (like leaving irqs off)
and

3) print how long the init call took in milliseconds.

While trying to optimize the boot speed of my laptop, I have been loving
number 3 to figure out what to optimize... and then I wished that
the same thing was done for module loading.

This patch makes the module loader use this exact same functionality; it's
a logical extension in my view (since modules are just sort of late
binding initcalls anyway) and so far I've found it quite useful in finding
where things are too slow in my boot.

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Rusty Russell

Arjan van de Ven
2008-08-12 15:52:54 +0800
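
Note on 59f9415ff: a userspace C sketch of the three steps listed in the message (announce the call, report its return code, report how long it took in milliseconds). It only imitates the idea; the function names, the output wording, and the use of clock_gettime() are assumptions for illustration, and the check for simple bugs such as leaving irqs off has no userspace equivalent here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*initcall_fn)(void);

/* Call fn, print its return value and elapsed time, and pass the
 * return value through. The elapsed time is also stored in *elapsed_ms. */
int run_initcall_debug(const char *name, initcall_fn fn, long long *elapsed_ms)
{
    struct timespec start, end;
    int ret;

    assert(fn != NULL);
    printf("calling %s\n", name);

    clock_gettime(CLOCK_MONOTONIC, &start);
    ret = fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000LL
                + (end.tv_nsec - start.tv_nsec) / 1000000LL;
    printf("initcall %s returned %d after %lld msecs\n",
           name, ret, *elapsed_ms);
    return ret;
}

/* Two hypothetical init functions: one succeeds, one fails. */
int demo_init_ok(void)   { return 0; }
int demo_init_fail(void) { return -19; }
```

Wrapping every module init function in such a helper is what makes the slow ones stand out in the boot log.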
1ea295088 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks
sched: fix mysql+oltp regression
sched_clock: delay using sched_clock()
sched clock: couple local and remote clocks
sched clock: simplify __update_sched_clock()
sched: eliminate scd->prev_raw
sched clock: clean up sched_clock_cpu()
sched clock: revert various sched_clock() changes
sched: move sched_clock before first use
sched: test runtime rather than period in global_rt_runtime()
sched: fix SCHED_HRTICK dependency
sched: fix warning in hrtick_start_fair()

Linus Torvalds
2008-08-12 07:46:31 +0800
67a077dca Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix-timers: fix posix_timer_event() vs dequeue_signal() race
posix-timers: do_schedule_next_timer: fix the setting of ->si_overrun

Linus Torvalds
2008-08-12 07:46:11 +0800
9b4d0bab3 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix debug_lock_alloc
lockdep: increase MAX_LOCKDEP_KEYS
generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()
lockdep: fix overflow in the hlock shrinkage code
lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()
lockdep: handle chains involving classes defined in modules
mm: fix mm_take_all_locks() locking order
lockdep: annotate mm_take_all_locks()
lockdep: spin_lock_nest_lock()
lockdep: lock protection locks
lockdep: map_acquire
lockdep: shrink held_lock structure
lockdep: re-annotate scheduler runqueues
lockdep: lock_set_subclass - reset a held lock's subclass
lockdep: change scheduler annotation
debug_locks: set oops_in_progress if we will log messages.
lockdep: fix combinatorial explosion in lock subgraph traversal

Linus Torvalds
2008-08-12 07:45:46 +0800
23a0ee908 Merge branch 'core/locking' into core/urgent

Ingo Molnar
2008-08-12 06:11:49 +0800
e26b33e95 Merge branch 'sched/clock' into sched/urgent

Ingo Molnar
2008-08-12 06:07:02 +0800
0f2bc27be lockdep: fix debug_lock_alloc

When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
LOCK_STAT, lock_alloc() and lock_release() turn into nops, even though
we should be doing hlock checking (check=1).

This causes a false warning and a lockdep self-disable.

Rectify this.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-08-12 04:45:51 +0800

11 Aug, 2008

5 commits

279ef6bbb sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks

Mark Langsdorf reported:

> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following commands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor. In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime. The code works during the boot
> sequence.
>
> cpumask_t = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> if (smp_processor_id() != cpu)
> return -ENODEV;

So set the CPU active before calling the CPU_ONLINE notifier chain;
there are a handful of notifiers that use set_cpus_allowed().

This fix also solves the problem with x86-microcode. I've sent
alternative patches for microcode, but as this "rely on
set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
assumption seems to be more broad than what we thought, perhaps this fix
should be applied.

With this patch we define that by the time CPU_ONLINE is sent,
a 'cpu' is online and ready for tasks to be migrated onto it.

Signed-off-by: Dmitry Adamushko
Reported-by: Mark Langsdorf
Tested-by: Mark Langsdorf
Signed-off-by: Ingo Molnar

Dmitry Adamushko
2008-08-11 22:32:41 +0800
cc7a486ca generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()

* Venki Pallipadi wrote:

> Found an OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and it looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
> The actual oops looked like
>
> [ 11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
> [ 11.277815] IP: [] 0xffff8802ffffffff
> [ 11.278155] PGD 202063 PUD 0
> [ 11.278576] Oops: 0010 [1] SMP
> [ 11.279006] CPU 5
> [ 11.279336] Modules linked in:
> [ 11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f #290
> [ 11.280039] RIP: 0010:[] [] 0xffff8802ffffffff
> [ 11.280692] RSP: 0018:ffff88027f1f7f70 EFLAGS: 00010086
> [ 11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
> [ 11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
> [ 11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
> [ 11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
> [ 11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
> [ 11.282502] FS: 0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
> [ 11.283096] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> [ 11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
> [ 11.284936] Stack: ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
> [ 11.285802] ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
> [ 11.286599] ffffffff8020bdd6 ffff88027f1f3e40 ffff88027f1f3ef8 0000000000000000
> [ 11.287120] Call Trace:
> [ 11.287768] [] ? generic_smp_call_function_interrupt+0x61/0x12c
> [ 11.288354] [] smp_call_function_interrupt+0x17/0x27
> [ 11.288744] [] call_function_interrupt+0x66/0x70
> [ 11.289030] [] ? clockevents_notify+0x19/0x73
> [ 11.289380] [] ? acpi_idle_enter_simple+0x18b/0x1fa
> [ 11.289760] [] ? acpi_idle_enter_simple+0x181/0x1fa
> [ 11.290051] [] ? cpuidle_idle_call+0x70/0xa2
> [ 11.290338] [] ? cpu_idle+0x5f/0x7d
> [ 11.290723] [] ? start_secondary+0x14d/0x152
> [ 11.291010]
> [ 11.291287]
> [ 11.291654] Code: Bad RIP value.
> [ 11.292041] RIP [] 0xffff8802ffffffff
> [ 11.292380] RSP
> [ 11.292741] CR2: ffff8802ffffffff
> [ 11.310951] ---[ end trace 137c54d525305f1c ]---
>
> The problem is with the following sequence of events:
>
> - CPU A calls smp_call_function_mask() for CPU B with wait parameter
> - CPU A sets up the call_function_data on the stack and does an rcu add to
> call_function_queue
> - CPU A waits until the WAIT flag is cleared
> - CPU B gets the call function interrupt and starts going through the
> call_function_queue
> - CPU C also gets some other call function interrupt and starts going through
> the call_function_queue
> - CPU C, which is also going through the call_function_queue, starts referencing
> CPU A's stack, as that element is still in call_function_queue
> - CPU B finishes the function call that CPU A set up and as there are no other
> references to it, rcu deletes the call_function_data (which was from CPU A
> stack)
> - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
> - CPU A which was waiting on the flag continues executing and the stack
> contents change
>
> - CPU C, still in its rcu_read section and accessing CPU A's stack, sees
> inconsistent call_function_data and can try to execute a
> function with some random pointer, causing stack corruption for A
> (by clearing the bits in mask field) and an oops.

Nice debugging work.

I'd suggest something like the attached (boot tested) patch as the simple
fix for now.

I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!

Signed-off-by: Ingo Molnar

Nick Piggin
2008-08-11 21:21:28 +0800
77ae65134 sched: fix mysql+oltp regression

Defer commit 6d299f1b53b84e2665f402d9bcc494800aba6386 to the next release.

Testing of the tip/sched/clock tree revealed a mysql+oltp regression
which bisection eventually traced back to this commit in mainline.

Pertinent test results: three-run sysbench averages, throughput units
in read/write requests/sec.

clients        1      2      4      8     16     32     64
6e0534f     9646  17876  34774  33868  32230  30767  29441
2.6.26.1    9112  17936  34652  33383  31929  30665  29232
6d299f1     9112  14637  28370  33339  32038  30762  29204

Note: subsequent commits hide the majority of this regression until you
apply the clock fixes, at which time it reemerges at full magnitude.

We cannot see anything bad about the change itself so we defer it to the
next release until this problem is fully analysed.

Signed-off-by: Mike Galbraith
Acked-by: Peter Zijlstra
Cc: Gregory Haskins
Signed-off-by: Ingo Molnar

Mike Galbraith
2008-08-11 20:49:29 +0800
251a169c6 Merge branch 'linus' into sched/urgent

Ingo Molnar
2008-08-11 19:40:56 +0800
3295f0ef9 lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()

the names were too generic:

drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-08-11 16:30:30 +0800