Eric Lee / smarc-fsl-linux-kernel

07 May, 2009

1 commit

381a80e6d inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive ... Browse Code »

There is what we believe to be a false positive reported by lockdep.

inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock

The plan is to fix this via lockdep annotation, but that is proving to be
quite involved.

The patch flips the allocation over to GFP_NFS to shut the warning up, for
the 2.6.30 release.

Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm
to switch it back to GFP_KERNEL so we don't forget.

=================================
[ INFO: inconsistent lock state ]
2.6.30-rc2-next-20090417 #203
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x35/0xb0
{RECLAIM_FS-ON-W} state was registered at:
[] mark_held_locks+0x68/0x90
[] lockdep_trace_alloc+0xf5/0x100
[] __kmalloc_node+0x31/0x1e0
[] kernel_event+0xe2/0x190
[] inotify_dev_queue_event+0x126/0x230
[] inotify_inode_queue_event+0xc6/0x110
[] vfs_create+0xcd/0x140
[] do_filp_open+0x88d/0xa20
[] do_sys_open+0x98/0x140
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
irq event stamp: 690455
hardirqs last enabled at (690455): [] _spin_unlock_irqrestore+0x44/0x80
hardirqs last disabled at (690454): [] _spin_lock_irqsave+0x32/0xa0
softirqs last enabled at (690178): [] __do_softirq+0x202/0x220
softirqs last disabled at (690157): [] call_softirq+0x1c/0x50

other info that might help us debug this:
2 locks held by kswapd0/380:
#0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x37/0x180
#1: (&type->s_umount_key#17){++++..}, at: [] shrink_dcache_memory+0x11f/0x1e0

stack backtrace:
Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
Call Trace:
[] print_usage_bug+0x19f/0x200
[] ? save_stack_trace+0x2f/0x50
[] mark_lock+0x4bb/0x6d0
[] ? check_usage_forwards+0x0/0xc0
[] __lock_acquire+0xc62/0x1ae0
[] ? slob_free+0x10c/0x370
[] lock_acquire+0xe1/0x120
[] ? inotify_inode_is_dead+0x35/0xb0
[] mutex_lock_nested+0x63/0x420
[] ? inotify_inode_is_dead+0x35/0xb0
[] ? inotify_inode_is_dead+0x35/0xb0
[] ? sched_clock+0x9/0x10
[] ? lock_release_holdtime+0x35/0x1c0
[] inotify_inode_is_dead+0x35/0xb0
[] dentry_iput+0xbc/0xe0
[] d_kill+0x33/0x60
[] __shrink_dcache_sb+0x2d3/0x350
[] shrink_dcache_memory+0x15a/0x1e0
[] shrink_slab+0x125/0x180
[] kswapd+0x560/0x7a0
[] ? isolate_pages_global+0x0/0x2c0
[] ? autoremove_wake_function+0x0/0x40
[] ? trace_hardirqs_on+0xd/0x10
[] ? kswapd+0x0/0x7a0
[] kthread+0x5b/0xa0
[] child_rip+0xa/0x20
[] ? restore_args+0x0/0x30
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20

[eparis@redhat.com: fix audit too]
Cc: Al Viro
Cc: Matt Mackall
Cc: Christoph Lameter
Signed-off-by: Wu Fengguang
Signed-off-by: Eric Paris
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-05-07 07:36:09 +0800

28 Mar, 2009

1 commit

aabb8fdb4 fs: avoid I_NEW inodes ... Browse Code »

To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).

Normally they will get excluded (eg. by zero refcount or writecount etc),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW. So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)

Signed-off-by: Nick Piggin
Acked-by: Jan Kara
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Nick Piggin
2009-03-28 02:44:05 +0800

19 Feb, 2009

1 commit

f04b30de3 inotify: fix GFP_KERNEL related deadlock ... Browse Code »

Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
assert:

[ 1093.677775]
[ 1093.677781] =================================
[ 1093.680031] [ INFO: inconsistent lock state ]
[ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
[ 1093.680031] ---------------------------------
[ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x20/0x80
[ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
[ 1093.680031] [] mark_held_locks+0x43/0x5b
[ 1093.680031] [] lockdep_trace_alloc+0x6c/0x6e
[ 1093.680031] [] kmem_cache_alloc+0x20/0x150
[ 1093.680031] [] idr_pre_get+0x27/0x6c
[ 1093.680031] [] inotify_handle_get_wd+0x25/0xad
[ 1093.680031] [] inotify_add_watch+0x7a/0x129
[ 1093.680031] [] sys_inotify_add_watch+0x20f/0x250
[ 1093.680031] [] sysenter_do_call+0x12/0x35
[ 1093.680031] [] 0xffffffff
[ 1093.680031] irq event stamp: 60417
[ 1093.680031] hardirqs last enabled at (60417): [] call_rcu+0x53/0x59
[ 1093.680031] hardirqs last disabled at (60416): [] call_rcu+0x17/0x59
[ 1093.680031] softirqs last enabled at (59656): [] __do_softirq+0x157/0x16b
[ 1093.680031] softirqs last disabled at (59651): [] do_softirq+0x74/0x15d
[ 1093.680031]
[ 1093.680031] other info that might help us debug this:
[ 1093.680031] 2 locks held by kswapd0/308:
[ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x36/0x189
[ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [] shrink_dcache_memory+0x110/0x1fb
[ 1093.680031]
[ 1093.680031] stack backtrace:
[ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
[ 1093.680031] Call Trace:
[ 1093.680031] [] valid_state+0x12a/0x13d
[ 1093.680031] [] mark_lock+0xc1/0x1e9
[ 1093.680031] [] ? check_usage_forwards+0x0/0x3f
[ 1093.680031] [] __lock_acquire+0x2c6/0xac8
[ 1093.680031] [] ? register_lock_class+0x17/0x228
[ 1093.680031] [] lock_acquire+0x5d/0x7a
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] __mutex_lock_common+0x3a/0x4cb
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] mutex_lock_nested+0x2e/0x36
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] dentry_iput+0x90/0xc2
[ 1093.680031] [] d_kill+0x21/0x45
[ 1093.680031] [] __shrink_dcache_sb+0x27f/0x355
[ 1093.680031] [] shrink_dcache_memory+0x15e/0x1fb
[ 1093.680031] [] shrink_slab+0x121/0x189
[ 1093.680031] [] kswapd+0x39f/0x561
[ 1093.680031] [] ? isolate_pages_global+0x0/0x233
[ 1093.680031] [] ? autoremove_wake_function+0x0/0x43
[ 1093.680031] [] ? kswapd+0x0/0x561
[ 1093.680031] [] kthread+0x41/0x82
[ 1093.680031] [] ? kthread+0x0/0x82
[ 1093.680031] [] kernel_thread_helper+0x7/0x10

inotify_handle_get_wd() does idr_pre_get() which does a
kmem_cache_alloc() without __GFP_FS - and is hence deadlockable under
extreme MM pressure.

Signed-off-by: Ingo Molnar
Acked-by: Peter Zijlstra
Cc: MinChan Kim
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2009-02-19 07:37:56 +0800

27 Jan, 2009

1 commit

3632dee2f inotify: clean up inotify_read and fix locking problems ... Browse Code »

If userspace supplies an invalid pointer to a read() of an inotify
instance, the inotify device's event list mutex is unlocked twice.
This causes an unbalance which effectively leaves the data structure
unprotected, and we can trigger oopses by accessing the inotify
instance from different tasks concurrently.

The best fix (contributed largely by Linus) is a total rewrite
of the function in question:

On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
> The thing to notice is that:
>
> - locking is done in just one place, and there is no question about it
> not having an unlock.
>
> - that whole double-while(1)-loop thing is gone.
>
> - use multiple functions to make nesting and error handling sane
>
> - do error testing after doing the things you always need to do, ie do
> this:
>
> mutex_lock(..)
> ret = function_call();
> mutex_unlock(..)
>
> .. test ret here ..
>
> instead of doing conditional exits with unlocking or freeing.
>
> So if the code is written in this way, it may still be buggy, but at least
> it's not buggy because of subtle "forgot to unlock" or "forgot to free"
> issues.
>
> This _always_ unlocks if it locked, and it always frees if it got a
> non-error kevent.

Cc: John McCutchan
Cc: Robert Love
Cc:
Signed-off-by: Vegard Nossum
Signed-off-by: Linus Torvalds

Vegard Nossum
2009-01-27 02:08:05 +0800

14 Jan, 2009

2 commits

2e4d0924e [CVE-2009-0029] System call wrappers part 29 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:30 +0800
938bb9f5e [CVE-2009-0029] System call wrappers part 28 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:30 +0800

06 Jan, 2009

1 commit

4ae8978cf inotify: fix type errors in interfaces ... Browse Code »

The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.

For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
currently 'u32', it should be '__s32' . That is Robert's suggestion, and
is consistent with the other declarations of watch descriptors in the
kernel source, in particular, the inotify_event structure in
include/linux/inotify.h:

struct inotify_event {
__s32 wd; /* watch descriptor */
__u32 mask; /* watch mask */
__u32 cookie; /* cookie to synchronize two events */
__u32 len; /* length (including nulls) of name */
char name[0]; /* stub for possible name */
};

The patch makes the changes needed for inotify_rm_watch().

Signed-off-by: Michael Kerrisk
Cc: Robert Love
Cc: Vegard Nossum
Cc: Ulrich Drepper
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Michael Kerrisk
2009-01-06 00:54:29 +0800

01 Jan, 2009

1 commit

272eb0148 filesystem notification: create fs/notify to contain all fs notification ... Browse Code »

Creating a generic filesystem notification interface, fsnotify, which will be
used by inotify, dnotify, and eventually fanotify is really starting to
clutter the fs directory. This patch simply moves inotify and dnotify into
fs/notify/inotify and fs/notify/dnotify respectively to make both current fs/
and future notification tidier.

Signed-off-by: Eric Paris
Signed-off-by: Al Viro

Eric Paris
2009-01-01 07:07:43 +0800