Eric Lee / smarc-fsl-linux-kernel

28 Feb, 2009

2 commits

5cf8cf414 Fix FREEZE/THAW compat_ioctl regression

Commit 8e961870bb9804110d5c8211d5d9d500451c4518 removed the FREEZE/THAW
handling in xfs_compat_ioctl but never added any compat handler back, so
now any freeze/thaw request from a 32-bit binary on a 64-bit kernel
will fail.

As these ioctls are 32/64-bit compatible, two simple COMPATIBLE_IOCTL
entries in fs/compat_ioctl.c will do the job.

Signed-off-by: Christoph Hellwig
Signed-off-by: Linus Torvalds

Christoph Hellwig
2009-02-28 08:27:45 +0800
adc487204 EXPORT_SYMBOL(d_obtain_alias) rather than EXPORT_SYMBOL_GPL

Commit 4ea3ada2955e4519befa98ff55dd62d6dfbd1705 declares d_obtain_alias()
as EXPORT_SYMBOL_GPL where it's supposed to replace d_alloc_anon which was
previously declared as EXPORT_SYMBOL and thus available to any loadable
module.

This patch reverts that.

Signed-off-by: Benny Halevy
Acked-by: Linus Torvalds
Cc: Christoph Hellwig
Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Acked-by: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benny Halevy
2009-02-28 08:26:20 +0800

27 Feb, 2009

11 commits

221be177e Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6:
[MTD] [MAPS] Remove MODULE_DEVICE_TABLE() from ck804rom driver.
[JFFS2] fix mount crash caused by removed nodes
[JFFS2] force the jffs2 GC daemon to behave a bit better
[MTD] [MAPS] blackfin async requires complex mappings
[MTD] [MAPS] blackfin: fix memory leak in error path
[MTD] [MAPS] physmap: fix wrong free and del_mtd_{partition,device}
[MTD] slram: Handle negative devlength correctly
[MTD] map_rom has NULL erase pointer
[MTD] [LPDDR] qinfo_probe depends on lpddr

Linus Torvalds
2009-02-27 06:45:57 +0800
28d57d437 ocfs2: add IO error check in ocfs2_get_sector()

Check for IO error in ocfs2_get_sector().

Signed-off-by: Wengang Wang
Signed-off-by: Mark Fasheh

wengang wang
2009-02-27 03:51:12 +0800
4442f5182 ocfs2: set gap to separate entry and value when xattr in bucket

This patch sets a gap (4 bytes) between the xattr entry and
name/value when the xattr is in a bucket. This gap is used to separate
the entry and name/value when a bucket is full. It had already
been set when the xattr is in an inode/block.

Signed-off-by: Tiger Yang
Signed-off-by: Mark Fasheh

Tiger Yang
2009-02-27 03:51:11 +0800
c8b9cf9a7 ocfs2: lock the metaecc process for xattr bucket

For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks
with io_mutex held. For an xattr bucket, however, it is calculated over
the whole bucket. So we have to add a spin_lock to prevent multiple
processes from calculating metaecc at the same time.

Signed-off-by: Tao Ma
Tested-by: Tristan Ye
Signed-off-by: Mark Fasheh

Tao Ma
2009-02-27 03:51:11 +0800
89a907afe ocfs2: Use the right access_* method in ctime update of xattr.

The ctime update of an xattr uses the wrong type of access for the
inode, so use ocfs2_journal_access_di instead.

Reported-and-Tested-by: Tristan Ye
Signed-off-by: Tao Ma
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Tao Ma
2009-02-27 03:51:11 +0800
53ecd25e1 ocfs2/dlm: Make dlm_assert_master_handler() kill itself instead of the asserter

In dlm_assert_master_handler(), if we get an incorrect assert master from a
node, we reply with EINVAL asking the asserter to die. The problem is that an
assert is sent only after so many hoops that it is invariably the node that
thinks the asserter is wrong that is actually wrong. So instead of killing the
asserter, this patch kills the assertee.

This patch papers over a race that is still being addressed.

Signed-off-by: Sunil Mushran
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Sunil Mushran
2009-02-27 03:51:11 +0800
dabc47de7 ocfs2/dlm: Use ast_lock to protect ast_list

The code was using dlm->spinlock instead of dlm->ast_lock to protect the
ast_list. This patch fixes the issue.

Signed-off-by: Sunil Mushran
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Sunil Mushran
2009-02-27 03:51:09 +0800
c74ff8bb2 ocfs2: Cleanup the lockname print in dlmglue.c

The dentry lock has a different format than other locks. This patch fixes
the ocfs2_log_dlm_error() macro to make it print the dentry lock correctly.

Signed-off-by: Sunil Mushran
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Sunil Mushran
2009-02-27 03:51:09 +0800
7dc102b73 ocfs2/dlm: Retract fix for race between purge and migrate

Mainline commit d4f7e650e55af6b235871126f747da88600e8040 attempts to delay
the dlm_thread from sending the drop ref message if the lockres is being
migrated. The problem is that we make the dlm_thread wait for the migration
to complete. This causes a deadlock as dlm_thread also participates in the
lockres migration process.

A better fix for the original oss bugzilla#1012 is in testing.

Signed-off-by: Sunil Mushran
Acked-by: Joel Becker
Signed-off-by: Mark Fasheh

Sunil Mushran
2009-02-27 03:51:09 +0800
47be12e4e ocfs2: Access and dirty the buffer_head in mark_written.

In __ocfs2_mark_extent_written, when we hit the c_split_covers_rec
case, the old code just replaces the extent record and forgets to
access and dirty the buffer_head. This causes a problem when the
unwritten extent is in an extent block. So access and dirty it.

Signed-off-by: Tao Ma
Signed-off-by: Mark Fasheh

Tao Ma
2009-02-27 03:51:09 +0800
64e71303e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: try committing transaction before returning ENOSPC
Btrfs: add better -ENOSPC handling

Linus Torvalds
2009-02-27 02:37:00 +0800

26 Feb, 2009

1 commit

b2bf96833 block: fix bogus gcc warning for uninitialized var usage

Newer gcc versions throw this warning:

fs/bio.c: In function ‘bio_alloc_bioset’:
fs/bio.c:305: warning: ‘p’ may be used uninitialized in this function

since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL.
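
A minimal userspace sketch of the pattern behind this kind of warning, assuming hypothetical names (`struct pool`, `alloc_and_release`) that are not the kernel's. The variable is assigned and read only when `bs` is non-NULL, which the compiler may fail to prove, so an explicit initializer is one common way to quiet it:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a pool descriptor; not the kernel's struct bio_set. */
struct pool {
    size_t obj_size;
};

/* Returns 1 if an object was allocated and released through the pool, else 0. */
int alloc_and_release(struct pool *bs)
{
    void *p = NULL;     /* explicit initializer: no "may be used uninitialized" */
    int used_pool = 0;

    if (bs)
        p = malloc(bs->obj_size);   /* 'p' is only assigned on the bs path */

    if (bs && p) {                  /* ...and only read on the bs path */
        free(p);
        used_pool = 1;
    }
    return used_pool;
}
```

The initializer costs nothing at runtime on the non-NULL path and keeps the build free of a warning that would otherwise hide real ones.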

Signed-off-by: Jens Axboe

Jens Axboe
2009-02-26 17:45:48 +0800

25 Feb, 2009

3 commits

694593e33 Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
proc: fix PG_locked reporting in /proc/kpageflags

Linus Torvalds
2009-02-25 07:42:08 +0800
4daa0682a Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()
ext4: Add fallback for find_group_flex

Linus Torvalds
2009-02-25 07:39:34 +0800
e07a4b921 proc: fix PG_locked reporting in /proc/kpageflags

Expr always evaluates to zero.

Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Alexey Dobriyan

Helge Bahmann
2009-02-25 02:17:58 +0800

24 Feb, 2009

1 commit

cac711211 proc: proc_get_inode should de_put when inode already initialized

de_get is called before every proc_get_inode, but corresponding de_put is
called only when dropping last reference to an inode. This might cause
something like
remove_proc_entry: /proc/stats busy, count=14496
to be printed to the syslog.

The fix is to call de_put in case of an already initialized inode in
proc_get_inode.
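
The reference-count imbalance can be illustrated with a small userspace model. The types and the function `get_inode_model` below are hypothetical simplifications, and `de_get`/`de_put` here are plain counters rather than the kernel helpers of the same name:

```c
#include <stddef.h>

/* Simplified model of a proc directory entry with a use count. */
struct entry {
    int count;
};

/* Simplified model of an inode that may hold one reference to an entry. */
struct inode_model {
    struct entry *de;
    int initialized;
};

void de_get(struct entry *de) { de->count++; }
void de_put(struct entry *de) { de->count--; }

/* The caller is assumed to have called de_get(de) just before this. */
void get_inode_model(struct inode_model *inode, struct entry *de)
{
    if (!inode->initialized) {
        inode->de = de;         /* first lookup: the inode keeps the reference */
        inode->initialized = 1;
    } else {
        de_put(de);             /* inode already holds one: drop the extra */
    }
}
```

Without the `else` branch, every repeated lookup of the same inode would leave the count one higher, which matches the large `count=` value in the log line quoted above.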

Signed-off-by: Krzysztof Sachanowicz
Tested-by: Marcin Pilipczuk
Acked-by: Al Viro
Signed-off-by: Linus Torvalds

Krzysztof Sachanowicz
2009-02-24 10:25:32 +0800

23 Feb, 2009

1 commit

ebd3610b1 ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()

Functions ext4_write_begin() and ext4_da_write_begin() call
grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it
can happen that page reclaim is triggered in that function
and it recurses back into the filesystem (or some other filesystem).
But this can lead to various problems as a transaction is already
started at that point. Add the necessary flag.

http://bugzilla.kernel.org/show_bug.cgi?id=11688

Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"

Jan Kara
2009-02-23 10:09:59 +0800

22 Feb, 2009

2 commits

05bf9e839 ext4: Add fallback for find_group_flex

This is a workaround for find_group_flex() which badly needs to be
replaced. One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances. This can lead to spurious ENOSPC failures
even when there are inodes still available.

Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1. If
find_group_other() succeeds when find_group_flex() has failed, log a
warning message.
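
The retry described above can be sketched in userspace as follows. This is only an illustration under assumptions: the allocator callbacks, the stub functions, and `find_group_with_fallback` are hypothetical and do not reproduce the ext4 signatures:

```c
#include <stdio.h>

/* Hypothetical allocator signature: returns a group number, or -1 on failure. */
typedef int (*group_finder)(int parent_group);

/* Hypothetical stubs standing in for the two search strategies. */
int flex_always_fails(int parent_group) { (void)parent_group; return -1; }
int flex_returns_zero(int parent_group) { (void)parent_group; return 0; }
int other_returns_parent(int parent_group) { return parent_group; }

/*
 * Try the primary search first; on failure retry with the fallback and
 * record (and log) that the fallback succeeded where the primary did not.
 */
int find_group_with_fallback(group_finder flex, group_finder other,
                             int parent_group, int *warned)
{
    int group = flex(parent_group);

    *warned = 0;
    if (group == -1) {
        group = other(parent_group);
        if (group != -1) {
            *warned = 1;
            fprintf(stderr, "primary search failed, fallback found group %d\n",
                    group);
        }
    }
    return group;
}
```

Logging only when the fallback succeeds keeps genuine out-of-inode conditions quiet while still flagging the spurious failures.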

A better block/inode allocator that will fix this problem for real has
been queued up for the next merge window.

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2009-02-22 01:13:24 +0800
710320d57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts
[CIFS] improve posix semantics of file create
[CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
cifs: posix fill in inode needed by posix open
cifs: properly handle case where CIFSGetSrvInodeNumber fails
cifs: refactor new_inode() calls and inode initialization
[CIFS] Prevent OOPs when mounting with remote prefixpath.
[CIFS] ipv6_addr_equal for address comparison

Linus Torvalds
2009-02-22 01:11:28 +0800

21 Feb, 2009

10 commits

4c41bd0ec [JFFS2] fix mount crash caused by removed nodes

At scan time we observed following scenario:

node A inserted
node B inserted
node C inserted -> sets overlapped flag on node B

node A is removed due to CRC failure -> overlapped flag on node B remains

while (tn->overlapped)
tn = tn_prev(tn);

==> crash, when tn_prev(B) is referenced.

When the ultimate node is removed at scan time and the overlapped flag
is set on the penultimate node, then nothing updates the overlapped
flag of that node. The overlapped iterators blindly expect that the
ultimate node does not have the overlapped flag set, which causes the
scan code to crash.

It would be a huge overhead to go through the node chain on node
removal and fix up the overlapped flags, so detecting such a case on
the fly in the overlapped iterators is a simpler and reliable
solution.
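
A minimal sketch of detecting the stale flag on the fly, using a hypothetical singly linked `struct tmp_node` in place of the tree JFFS2 actually uses. The helper name `first_non_overlapped` is an assumption made for illustration:

```c
#include <stddef.h>

/* Hypothetical, simplified node: 'prev' is the node at the next lower offset. */
struct tmp_node {
    struct tmp_node *prev;
    int overlapped;
};

static struct tmp_node *tn_prev(struct tmp_node *tn)
{
    return tn->prev;
}

/*
 * Walk back while the overlapped flag is set. If there is no previous node,
 * the flag is stale (the node that justified it was removed), so clear it
 * and stop instead of dereferencing NULL.
 */
struct tmp_node *first_non_overlapped(struct tmp_node *tn)
{
    while (tn->overlapped) {
        struct tmp_node *ptn = tn_prev(tn);

        if (!ptn) {
            tn->overlapped = 0;     /* fix up the stale flag in place */
            break;
        }
        tn = ptn;
    }
    return tn;
}
```

The check adds one pointer comparison per step, which is far cheaper than rewalking the chain on every node removal.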

Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner
Signed-off-by: David Woodhouse

Thomas Gleixner
2009-02-21 18:09:29 +0800
eca6acf91 [CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts

When two different users mount the same Windows 2003 Server share using CIFS,
the first session mounted can be invalidated. Some servers invalidate the first
smb session when a second similar user (e.g. two users who get mapped by server to "guest")
authenticates an smb session from the same client.

Setting the 2nd and subsequent vc numbers to nonzero values
ensures that we will not have this problem.

Fixes Samba bug 6004, problem description follows:
How to reproduce:

- configure an "open share" (full permissions to Guest user) on Windows 2003
Server (I couldn't reproduce the problem with Samba server or Windows older
than 2003)
- mount the share twice with different users who will be authenticated as guest.

noacl,noperm,user=john,dir_mode=0700,domain=DOMAIN,rw
noacl,noperm,user=jeff,dir_mode=0700,domain=DOMAIN,rw

Result:

- just the mount point mounted last is accessible:

Signed-off-by: Steve French

Steve French
2009-02-21 11:37:10 +0800
c3b2a0c64 [CIFS] improve posix semantics of file create

Samba server added support for a new posix open/create/mkdir operation
a year or so ago, and we added support to cifs for mkdir to use it,
but had not added the corresponding code to file create.

The following patch helps improve the performance of the cifs create
path (to Samba and servers which support the cifs posix protocol
extensions). Using Connectathon basic test1, with 2000 files, the
performance improved about 15%, and also helped reduce network traffic
(17% fewer SMBs sent over the wire) due to saving a network round trip
for the SetPathInfo on every file create.

It should also help the semantics (and probably the performance) of
write (e.g. when posix byte range locks are on the file) on file
handles opened with posix create, and adds support for a few flags
which would have to be ignored otherwise.

Signed-off-by: Steve French

Steve French
2009-02-21 11:37:09 +0800
69765529d [CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS

Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451

Certain NAS appliances do not set the operating system or network operating system
fields in the session setup response on the wire. cifs was oopsing on the unexpected
zero length response fields (when trying to null terminate a zero length field).

This fixes the oops.
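
As a general illustration of handling a zero-length wire field safely (not the actual cifs change, whose details are not shown here), a copy helper can allocate one extra zeroed byte and skip the copy when the length is zero. The name `copy_field` is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy 'len' bytes of a possibly empty field into a freshly allocated,
 * NUL-terminated string. Returns NULL only if allocation fails.
 */
char *copy_field(const char *data, size_t len)
{
    char *out = calloc(len + 1, 1);   /* zeroed, so already NUL-terminated */

    if (!out)
        return NULL;
    if (len > 0)
        memcpy(out, data, len);       /* never touch 'data' for empty fields */
    return out;
}
```

With this shape an empty field yields an empty string, and no index is ever computed from `len - 1`.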

Acked-by: Jeff Layton
CC: stable
Signed-off-by: Steve French

Steve French
2009-02-21 11:37:09 +0800
44f68fadd cifs: posix fill in inode needed by posix open

function needed to prepare for posix open

Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2009-02-21 11:37:08 +0800
950ec5288 cifs: properly handle case where CIFSGetSrvInodeNumber fails

...if it does then we pass a pointer to an uninitialized variable for
the inode number to cifs_new_inode. Have it pass a NULL pointer instead.

Also tweak the function prototypes to reduce the amount of casting.

Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2009-02-21 11:37:08 +0800
132ac7b77 cifs: refactor new_inode() calls and inode initialization

Move new inode creation into a separate routine and refactor the
callers to take advantage of it.

Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2009-02-21 11:37:07 +0800
e4cce94c9 [CIFS] Prevent OOPs when mounting with remote prefixpath.

Fixes OOPs with message 'kernel BUG at fs/cifs/cifs_dfs_ref.c:274!'.
Checks whether the prefixpath is accessible while we are still in cifs_mount
and fails, reporting an error, if we can't access the prefixpath.

Should fix Samba bugs 6086 and 5861 and kernel bug 12192

Signed-off-by: Igor Mammedov
Acked-by: Jeff Layton
Signed-off-by: Steve French

Igor Mammedov
2009-02-21 11:36:21 +0800
264b29900 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: check file pointer in btrfs_sync_file

Linus Torvalds
2009-02-21 09:59:14 +0800
6a63209fc Btrfs: add better -ENOSPC handling

This is a step in the direction of better -ENOSPC handling. Instead of
checking the global bytes counter we check the space_info bytes counters to
make sure we have enough space.

If we don't, we go ahead and try to allocate a new chunk, and then if that fails
we return -ENOSPC. This patch adds two counters to btrfs_space_info,
bytes_delalloc and bytes_may_use.

bytes_delalloc accounts for extents we've actually set up for delalloc and will
be allocated at some point down the line.

bytes_may_use is to keep track of how many bytes we may use for delalloc at
some point. When we actually set the extent_bit for the delalloc bytes we
subtract the reserved bytes from the bytes_may_use counter. This keeps us from
not actually being able to allocate space for any delalloc bytes.

Signed-off-by: Josef Bacik

Josef Bacik
2009-02-21 00:00:09 +0800

20 Feb, 2009

5 commits

4e06bdd6c Btrfs: try committing transaction before returning ENOSPC

This fixes a problem where we could return -ENOSPC when we may actually have
plenty of space that is merely pinned. Instead of returning -ENOSPC
immediately, commit the transaction first and then try and do the allocation
again.
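
The commit-then-retry idea can be modeled in a few lines of userspace C. Everything here is a hypothetical simplification: `struct space`, `try_alloc`, and `commit_transaction` are illustrative stand-ins, and "commit" is modeled as simply releasing pinned bytes:

```c
#include <errno.h>

/* Hypothetical model: bytes that are free now, and bytes pinned until commit. */
struct space {
    long free_bytes;
    long pinned_bytes;
};

static int try_alloc(struct space *s, long n)
{
    if (s->free_bytes < n)
        return -ENOSPC;
    s->free_bytes -= n;
    return 0;
}

/* In this model, committing turns all pinned bytes back into free bytes. */
static void commit_transaction(struct space *s)
{
    s->free_bytes += s->pinned_bytes;
    s->pinned_bytes = 0;
}

/* On -ENOSPC, commit once and retry before reporting the failure. */
int alloc_with_commit_retry(struct space *s, long n)
{
    int ret = try_alloc(s, n);

    if (ret == -ENOSPC) {
        commit_transaction(s);
        ret = try_alloc(s, n);
    }
    return ret;
}
```

The retry happens exactly once, so a filesystem that is truly full still returns -ENOSPC after a single extra commit.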

This patch also does chunk allocation for metadata if we pass the 80%
threshold for metadata space. This will help with stack usage since the chunk
allocation will happen early on, instead of when the allocation is happening.

Signed-off-by: Josef Bacik

Josef Bacik
2009-02-20 23:59:53 +0800
2cfbd50b5 Btrfs: check file pointer in btrfs_sync_file

fsync can be called by NFS with a null file pointer, and btrfs was
oopsing in this case.

Signed-off-by: Chris Mason

Chris Mason
2009-02-20 23:55:10 +0800
620565ef5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
Revert "[XFS] remove old vmap cache"
Revert "[XFS] use scalable vmap API"

Linus Torvalds
2009-02-20 05:09:32 +0800
27e88bf6a Revert "[XFS] remove old vmap cache"

This reverts commit d2859751cd0bf586941ffa7308635a293f943c17.

This commit caused a regression. We'll try to fix the use of the new
vmap API for the next release.

Signed-off-by: Christoph Hellwig
Signed-off-by: Felix Blyakher

Felix Blyakher
2009-02-20 03:15:55 +0800
7fdf58244 Revert "[XFS] use scalable vmap API"

This reverts commit 95f8e302c04c0b0c6de35ab399a5551605eeb006.

This commit caused a regression. We'll try to fix the use of the new
vmap API for the next release.

Signed-off-by: Christoph Hellwig
Signed-off-by: Felix Blyakher

Felix Blyakher
2009-02-20 03:15:44 +0800

19 Feb, 2009

4 commits

ba95fd47d Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
block: fix booting from partitioned md array
block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb
cciss: PCI power management reset for kexec
paride/pg.c: xs(): &&/|| confusion
fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
block: fix bad definition of BIO_RW_SYNC
bsg: Fix sense buffer bug in SG_IO

Linus Torvalds
2009-02-19 10:33:04 +0800
f04b30de3 inotify: fix GFP_KERNEL related deadlock

Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
assert:

[ 1093.677775]
[ 1093.677781] =================================
[ 1093.680031] [ INFO: inconsistent lock state ]
[ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
[ 1093.680031] ---------------------------------
[ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [] inotify_inode_is_dead+0x20/0x80
[ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
[ 1093.680031] [] mark_held_locks+0x43/0x5b
[ 1093.680031] [] lockdep_trace_alloc+0x6c/0x6e
[ 1093.680031] [] kmem_cache_alloc+0x20/0x150
[ 1093.680031] [] idr_pre_get+0x27/0x6c
[ 1093.680031] [] inotify_handle_get_wd+0x25/0xad
[ 1093.680031] [] inotify_add_watch+0x7a/0x129
[ 1093.680031] [] sys_inotify_add_watch+0x20f/0x250
[ 1093.680031] [] sysenter_do_call+0x12/0x35
[ 1093.680031] [] 0xffffffff
[ 1093.680031] irq event stamp: 60417
[ 1093.680031] hardirqs last enabled at (60417): [] call_rcu+0x53/0x59
[ 1093.680031] hardirqs last disabled at (60416): [] call_rcu+0x17/0x59
[ 1093.680031] softirqs last enabled at (59656): [] __do_softirq+0x157/0x16b
[ 1093.680031] softirqs last disabled at (59651): [] do_softirq+0x74/0x15d
[ 1093.680031]
[ 1093.680031] other info that might help us debug this:
[ 1093.680031] 2 locks held by kswapd0/308:
[ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x36/0x189
[ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [] shrink_dcache_memory+0x110/0x1fb
[ 1093.680031]
[ 1093.680031] stack backtrace:
[ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
[ 1093.680031] Call Trace:
[ 1093.680031] [] valid_state+0x12a/0x13d
[ 1093.680031] [] mark_lock+0xc1/0x1e9
[ 1093.680031] [] ? check_usage_forwards+0x0/0x3f
[ 1093.680031] [] __lock_acquire+0x2c6/0xac8
[ 1093.680031] [] ? register_lock_class+0x17/0x228
[ 1093.680031] [] lock_acquire+0x5d/0x7a
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] __mutex_lock_common+0x3a/0x4cb
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] mutex_lock_nested+0x2e/0x36
[ 1093.680031] [] ? inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] inotify_inode_is_dead+0x20/0x80
[ 1093.680031] [] dentry_iput+0x90/0xc2
[ 1093.680031] [] d_kill+0x21/0x45
[ 1093.680031] [] __shrink_dcache_sb+0x27f/0x355
[ 1093.680031] [] shrink_dcache_memory+0x15e/0x1fb
[ 1093.680031] [] shrink_slab+0x121/0x189
[ 1093.680031] [] kswapd+0x39f/0x561
[ 1093.680031] [] ? isolate_pages_global+0x0/0x233
[ 1093.680031] [] ? autoremove_wake_function+0x0/0x43
[ 1093.680031] [] ? kswapd+0x0/0x561
[ 1093.680031] [] kthread+0x41/0x82
[ 1093.680031] [] ? kthread+0x0/0x82
[ 1093.680031] [] kernel_thread_helper+0x7/0x10

inotify_handle_get_wd() does idr_pre_get() which does a
kmem_cache_alloc() with GFP_KERNEL, i.e. without clearing __GFP_FS, and
is hence deadlockable under extreme MM pressure.

Signed-off-by: Ingo Molnar
Acked-by: Peter Zijlstra
Cc: MinChan Kim
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2009-02-19 07:37:56 +0800
2db69a934 vt: Declare PIO_CMAP/GIO_CMAP as compatible ioctls.

Otherwise, these don't work when called from 32-bit userspace on 64-bit
kernels.

Cc: Jiri Kosina
Cc: Alan Cox
Cc: [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bill Nottingham
2009-02-19 07:37:56 +0800
ada723dcd fs/super.c: add lockdep annotation to s_umount

Li Zefan said:

Thread 1:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
cat /mnt/cpus > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}

Thread 2:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}

(Note: It is irrelevant which cgroup subsys is used.)

After a while a lockdep warning showed up:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28 #479
---------------------------------------------
mount/13554 is trying to acquire lock:
(&type->s_umount_key#19){--..}, at: [] sget+0x5e/0x321

but task is already holding lock:
(&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

other info that might help us debug this:
1 lock held by mount/13554:
#0: (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

stack backtrace:
Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
Call Trace:
[] validate_chain+0x4c6/0xbbd
[] __lock_acquire+0x676/0x700
[] lock_acquire+0x5d/0x7a
[] ? sget+0x5e/0x321
[] down_write+0x34/0x50
[] ? sget+0x5e/0x321
[] sget+0x5e/0x321
[] ? cgroup_set_super+0x0/0x3e
[] ? cgroup_test_super+0x0/0x2f
[] cgroup_get_sb+0x98/0x2e7
[] cpuset_get_sb+0x4a/0x5f
[] vfs_kern_mount+0x40/0x7b
[] do_kern_mount+0x37/0xbf
[] do_mount+0x5c3/0x61a
[] ? copy_mount_options+0x2c/0x111
[] sys_mount+0x69/0xa0
[] sysenter_do_call+0x12/0x31

The cause is that after alloc_super() and the subsequent retry, an old entry
in the list fs_supers is found, so grab_super(old) is called, but both
functions take the s_umount lock:

struct super_block *sget(...)
{
        ...
retry:
        spin_lock(&sb_lock);
        if (test) {
                list_for_each_entry(old, &type->fs_supers, s_instances) {
                        if (!test(old, data))
                                continue;
                        if (!grab_super(old))   /* 2nd: down_write(&old->s_umount); */
                                goto retry;
                        if (s)
                                destroy_super(s);
                        return old;
                }
        }
        if (!s) {
                spin_unlock(&sb_lock);
                s = alloc_super(type);          /* 1st: down_write(&s->s_umount) */
                if (!s)
                        return ERR_PTR(-ENOMEM);
                goto retry;
        }
        ...
}

It seems like a false positive, and seems like VFS but not cgroup needs to
be fixed.

Peter said:

We can simply put the new s_umount instance in a different subclass;
lockdep doesn't particularly care about subclass order.

If there's any issue with the callers of sget() assuming the s_umount lock
being of subclass 0, then there is another annotation we can use to fix
that, but let's not bother with that if this is sufficient.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673

Signed-off-by: Peter Zijlstra
Tested-by: Li Zefan
Reported-by: Li Zefan
Cc: Al Viro
Cc: Paul Menage
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2009-02-19 07:37:55 +0800