Eric Lee / smarc-fsl-linux-kernel

08 May, 2010

1 commit

916774671 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix RCU issues in the NFSv4 delegation code
NFSv4: Fix the locking in nfs_inode_reclaim_delegation()

Linus Torvalds
2010-05-08 04:59:48 +0800

05 May, 2010

1 commit

7572e5631 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().
ocfs2: Avoid direct write if we fall back to buffered I/O
ocfs2_dlmfs: Fix math error when reading LVB.
ocfs2: Update VFS inode's id info after reflink.
ocfs2: potential ERR_PTR dereference on error paths
ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod()
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path
ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code
ocfs2: Reset status if we want to restart file extension.
ocfs2: Compute metaecc for superblocks during online resize.
ocfs2: Check the owner of a lockres inside the spinlock
ocfs2: one more warning fix in ocfs2_file_aio_write(), v2
ocfs2_dlmfs: Use DLM_* when decoding file open flags.

Linus Torvalds
2010-05-05 07:33:18 +0800

04 May, 2010

13 commits

d577632e6 ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().

gcc warns that a variable is uninitialized. It's actually handled, but
an early return fools gcc. Let's just initialize the variable to a
garbage value that will crash if the usage is ever broken.

Signed-off-by: Joel Becker

Joel Becker
2010-05-04 10:15:49 +0800
d93ac51c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: remove bad auth_x kmem_cache
ceph: fix lockless caps check
ceph: clear dir complete, invalidate dentry on replayed rename
ceph: fix direct io truncate offset
ceph: discard incoming messages with bad seq #
ceph: fix seq counting for skipped messages
ceph: add missing #includes
ceph: fix leaked spinlock during mds reconnect
ceph: print more useful version info on module load
ceph: fix snap realm splits
ceph: clear dir complete on d_move

Linus Torvalds
2010-05-04 07:36:19 +0800
b0930f8d3 ceph: remove bad auth_x kmem_cache

It's useless, since our allocations are already a power of 2. And it was
allocated per-instance (not globally), which caused a name collision when
we tried to mount a second file system with auth_x enabled.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800
7ff899da0 ceph: fix lockless caps check

The __ variant requires caller to hold i_lock.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800
ea1409f96 ceph: clear dir complete, invalidate dentry on replayed rename

If a rename operation is resent to the MDS following an MDS restart, the
client does not get a full reply (containing the resulting metadata) back.
In that case, ceph_rename() needs to compensate by doing anything useful
that fill_inode() would have, like d_move().

It also needs to invalidate the dentry (to work around the vfs_rename_dir()
bug) and clear the dir complete flag, just like fill_trace().

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800
5c6a2cdb4 ceph: fix direct io truncate offset

truncate_inode_pages_range wants the end offset to align with the last byte
in a page.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:25 +0800
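
The alignment rule in the "ceph: fix direct io truncate offset" commit above can be illustrated with a little arithmetic. This is a minimal userspace sketch, not the patch itself: SKETCH_PAGE_SIZE, page_last_byte() and invalidate_end() are hypothetical names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch; must be a power of two. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round an inclusive end offset up to the last byte of the page it falls in. */
static uint64_t page_last_byte(uint64_t end_inclusive)
{
    return end_inclusive | (SKETCH_PAGE_SIZE - 1);
}

/* Inclusive, page-aligned end offset covering a write of len bytes at pos. */
static uint64_t invalidate_end(uint64_t pos, uint64_t len)
{
    assert(len > 0);
    return page_last_byte(pos + len - 1);
}
```

For example, a one-byte write at offset 4096 yields an end offset of 8191, the last byte of the second page.
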
ae18756b9 ceph: discard incoming messages with bad seq #

We can get old message seq #'s after a tcp reconnect for stateful sessions
(i.e., the MDS). If we get a higher seq #, that is an error, and we
shouldn't see any bad seq #'s for stateless (mon, osd) connections.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:24 +0800
684be25c5 ceph: fix seq counting for skipped messages

Increment in_seq even when the message is skipped for some reason.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:24 +0800
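
The sequence-number rules from the two commits above ("discard incoming messages with bad seq #" and "fix seq counting for skipped messages") can be sketched together. This is a simplified userspace model, not the ceph messenger code: struct conn_sketch, classify_seq() and receive_msg() are hypothetical names, and the `skip` flag is an assumed stand-in for whatever reason a message might be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum seq_verdict { SEQ_ACCEPT, SEQ_DISCARD_OLD, SEQ_ERROR };

struct conn_sketch {
    uint64_t in_seq;     /* last sequence number counted as received */
    bool stateful;       /* true for MDS-like sessions, false for mon/osd */
    unsigned delivered;  /* messages actually handed to the upper layer */
};

/* Decide what to do with an incoming sequence number. */
static enum seq_verdict classify_seq(const struct conn_sketch *con, uint64_t seq)
{
    uint64_t expected = con->in_seq + 1;

    if (seq == expected)
        return SEQ_ACCEPT;
    /* Old seq numbers are tolerated only on stateful sessions (after a reconnect). */
    if (seq < expected && con->stateful)
        return SEQ_DISCARD_OLD;
    /* A higher seq, or any bad seq on a stateless connection, is an error. */
    return SEQ_ERROR;
}

/* Process one message; in_seq advances even when the message is skipped. */
static enum seq_verdict receive_msg(struct conn_sketch *con, uint64_t seq, bool skip)
{
    enum seq_verdict v;

    assert(con != NULL);
    v = classify_seq(con, seq);
    if (v != SEQ_ACCEPT)
        return v;
    con->in_seq++;            /* counted whether or not it is delivered */
    if (!skip)
        con->delivered++;
    return v;
}
```

The key point of the second commit is the unconditional increment: if in_seq were only bumped on delivery, the message after a skipped one would look like it carried a "higher" sequence number and be treated as an error.
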
d45d0d970 ceph: add missing #includes

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:24 +0800
0b0c06d14 ceph: fix leaked spinlock during mds reconnect

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:23 +0800
c8f16584a ceph: print more useful version info on module load

Decouple the client version from the server side. Print relevant protocol
and map version info instead.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:23 +0800
91dee39ee ceph: fix snap realm splits

The snap realm split was checking i_snap_realm, not the list_head, to
determine if an inode belonged in the new realm. The check always failed,
which meant we always moved the inode, corrupting the old realm's list and
causing various crashes.

Also wait to release old realm reference to avoid possibility of use after
free.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:23 +0800
c10f5e12b ceph: clear dir complete on d_move

d_move() reorders the d_subdirs list, breaking the readdir result caching.
Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on
rename.

Signed-off-by: Sage Weil

Sage Weil
2010-05-04 01:49:22 +0800

03 May, 2010

1 commit

973bec34b nilfs2: fix sync silent failure

As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere. I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").

WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
Hardware name: PowerEdge 2850
Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
Call Trace:
[] warn_slowpath_common+0x60/0x90
[] warn_slowpath_null+0xd/0x10
[] vfs_kern_mount+0xc5/0x14e
[] do_kern_mount+0x32/0xbd
[] do_mount+0x671/0x6d0
[] ? __get_free_pages+0x1f/0x21
[] ? copy_mount_options+0x2b/0xe2
[] ? strndup_user+0x48/0x67
[] sys_mount+0x61/0x8f
[] sysenter_do_call+0x12/0x32

This patch ensures that s_bdi is set for nilfs and fixes the silent sync failure.

Signed-off-by: Ryusuke Konishi
Acked-by: Jens Axboe
Signed-off-by: Linus Torvalds

Ryusuke Konishi
2010-05-03 22:36:01 +0800
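
Why a missing s_bdi makes sync fail silently, as described in the nilfs2 commit above, can be shown with a toy model. This is only a sketch of the early-return behaviour the commit message describes: struct sb_sketch, struct bdi_sketch and sync_filesystem_sketch() are hypothetical stand-ins, not the kernel's structures or its __sync_filesystem().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bdi_sketch { int unused; };

struct sb_sketch {
    struct bdi_sketch *s_bdi;  /* NULL models a filesystem that never set it */
    bool dirty;                /* true while unwritten data is pending */
};

/* Returns 0 in both cases, so a caller cannot tell that nothing was written. */
static int sync_filesystem_sketch(struct sb_sketch *sb)
{
    assert(sb != NULL);
    if (sb->s_bdi == NULL)
        return 0;              /* early exit: reports success, data stays dirty */
    sb->dirty = false;         /* stands in for the real writeback work */
    return 0;
}
```

With s_bdi unset the call returns 0 while `dirty` remains true; once s_bdi points at something, the same call actually clears it.
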

02 May, 2010

2 commits

17d2c0a0c NFS: Fix RCU issues in the NFSv4 delegation code

Fix a number of RCU issues in the NFSv4 delegation code.

(1) delegation->cred doesn't need to be RCU protected as it's essentially an
invariant refcounted structure.

By the time we get to nfs_free_delegation(), the delegation is being
released, so no one else should be attempting to use the saved
credentials, and they can be cleared.

However, since the list of delegations could still be under traversal at
this point by code such as nfs_client_return_marked_delegations(), the cred
should be released in nfs_do_free_delegation() rather than in
nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is
insufficient as that doesn't stop the cred from being destroyed, and nor
does calling put_rpccred() after call_rcu(), given that the latter is
asynchronous.

(2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
rcu_dereference_protected() because they can only be called if
nfs_client::cl_lock is held, and that guards against anyone changing
nfsi->delegation under it. Furthermore, the barrier imposed by
rcu_dereference() is superfluous, given that the spin_lock() is also a
barrier.

(3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
struct so that it can issue lockdep advice based on clp->cl_lock for (2).

(4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
should use rcu_access_pointer() outside the spinlocked region as they
merely examine the pointer and don't follow it, thus rendering unnecessary
the need to impose a partial ordering over the one item of interest.

These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
#0: (&type->s_umount_key#34){+.+...}, at: [] deactivate_super+0x60/0x80
#1: (iprune_sem){+.+...}, at: [] invalidate_inodes+0x39/0x13a

stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb2
[] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
[] nfs4_clear_inode+0x11/0x1e [nfs]
[] clear_inode+0x9e/0xf8
[] dispose_list+0x67/0x10e
[] invalidate_inodes+0x11c/0x13a
[] generic_shutdown_super+0x42/0xf4
[] kill_anon_super+0x11/0x4f
[] nfs4_kill_super+0x3f/0x72 [nfs]
[] deactivate_super+0x68/0x80
[] mntput_no_expire+0xbb/0xf8
[] release_mounts+0x9a/0xb0
[] put_mnt_ns+0x6a/0x79
[] nfs_follow_remote_path+0x5a/0x146 [nfs]
[] ? nfs_do_root_mount+0x82/0x95 [nfs]
[] nfs4_try_mount+0x75/0xaf [nfs]
[] nfs4_get_sb+0x291/0x31a [nfs]
[] vfs_kern_mount+0xb8/0x177
[] do_kern_mount+0x48/0xe8
[] do_mount+0x782/0x7f9
[] sys_mount+0x83/0xbe
[] system_call_fastpath+0x16/0x1b

Also on:

fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
[] lockdep_rcu_dereference+0xaa/0xb2
[] nfs_inode_set_delegation+0xfe/0x219 [nfs]
[] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
[] nfs4_do_open+0x2a6/0x3a6 [nfs]
...

And:

fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
[] lockdep_rcu_dereference+0xaa/0xb2
[] nfs_free_delegation+0x3d/0x6e [nfs]
[] nfs_do_return_delegation+0x26/0x30 [nfs]
[] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
[] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
...

Signed-off-by: David Howells
Signed-off-by: Paul E. McKenney
Signed-off-by: Trond Myklebust

David Howells
2010-05-02 00:37:18 +0800
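
Points (2) and (4) of the commit above distinguish two kinds of pointer access: one made while holding the lock that guards updates, and one that merely tests the pointer without following it. The following is a loose userspace analogy using C11 atomics, not kernel code: it does not use the kernel's rcu_dereference_protected() or rcu_access_pointer(), the `cl_lock_held` boolean is an assumed stand-in for nfs_client::cl_lock and its lockdep check, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct delegation_sketch { int type; };

struct client_sketch {
    bool cl_lock_held;  /* stand-in for "cl_lock is held" */
};

struct inode_sketch {
    _Atomic(struct delegation_sketch *) delegation;
};

/*
 * Analogue of point (4): only the pointer value is examined, never followed,
 * so no ordering against the pointed-to data is needed (relaxed load).
 */
static bool inode_has_delegation(struct inode_sketch *nfsi)
{
    return atomic_load_explicit(&nfsi->delegation, memory_order_relaxed) != NULL;
}

/*
 * Analogue of point (2): the caller holds the lock that guards updates, so
 * nobody can change the pointer underneath us and a plain read suffices.
 * The assert plays the role of the lockdep condition.
 */
static struct delegation_sketch *
detach_delegation_locked(struct inode_sketch *nfsi, struct client_sketch *clp)
{
    struct delegation_sketch *d;

    assert(clp->cl_lock_held);
    d = atomic_load_explicit(&nfsi->delegation, memory_order_relaxed);
    atomic_store_explicit(&nfsi->delegation,
                          (struct delegation_sketch *)NULL,
                          memory_order_release);
    return d;
}
```

Passing the client to the locked helper mirrors point (3): the helper needs it only to state which lock protects the access.
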
8f649c376 NFSv4: Fix the locking in nfs_inode_reclaim_delegation()

Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.

Signed-off-by: Trond Myklebust
Signed-off-by: David Howells
Signed-off-by: Paul E. McKenney

Trond Myklebust
2010-05-02 00:36:18 +0800

01 May, 2010

3 commits

6b933c8e6 ocfs2: Avoid direct write if we fall back to buffered I/O

When we fall back to buffered write from direct write, we call
__generic_file_aio_write(), but that will end up doing a direct write
even though we are only prepared to do a buffered write, because the file
has the O_DIRECT flag set. This is a fix for
https://bugzilla.novell.com/show_bug.cgi?id=591039
revised with Joel's comments.

Signed-off-by: Li Dongyang
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker

Li Dongyang
2010-05-01 04:45:13 +0800
f9221fd80 Merge branch 'skip_delete_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes

Joel Becker
2010-05-01 04:37:29 +0800
12b1b3216 Inotify: Fix build failure in inotify user support

Building with CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined
results in the following build failure:

LD vmlinux
fs/built-in.o: In function 'sys_inotify_init1':
(.text.sys_inotify_init1+0x22c): undefined reference to 'anon_inode_getfd'
fs/built-in.o: In function `sys_inotify_init1':
(.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against 'anon_inode_getfd'
make[2]: *** [vmlinux] Error 1
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2

Signed-off-by: Ralf Baechle
Cc: Al Viro
Signed-off-by: Linus Torvalds

Ralf Baechle
2010-05-01 01:14:56 +0800

30 Apr, 2010

6 commits

e97e7120e Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: add a shrinker to background inode reclaim

Linus Torvalds
2010-04-30 10:49:34 +0800
fed0a9c64 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
exofs: Fix "add bdi backing to mount session" fall out
fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK

Linus Torvalds
2010-04-30 08:18:07 +0800
9bf729c0a xfs: add a shrinker to background inode reclaim

On low memory boxes or those with highmem, the kernel can OOM before
background reclaim frees inodes via xfssyncd. Add a shrinker to run inode
reclaim so that inode reclaim is expedited when memory is low.

This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-04-30 05:22:13 +0800
3c2023dd8 exofs: Fix "add bdi backing to mount session" fall out

The patch "add bdi backing to mount session"
(b3d0ab7e60d1865bb6f6a79a77aaba22f2543236)
has a bug in the placement of the bdi member in
struct exofs_sb_info. The layout member must be kept
last.

Signed-off-by: Boaz Harrosh
Signed-off-by: Jens Axboe

Boaz Harrosh
2010-04-30 02:35:29 +0800
5477d0fac fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK

When CONFIG_BLOCK is set, it ends up getting backing-dev.h included.
But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is
include directly from the file it's used from,
so do that.

Signed-off-by: Jens Axboe

Jens Axboe
2010-04-30 02:33:35 +0800
27fb8d7b1 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
nfs: fix some issues in nfs41_proc_reclaim_complete()
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
NFS: Fix an unstable write data integrity race
nfs: testing for null instead of ERR_PTR()
NFS: rsize and wsize settings ignored on v4 mounts
NFSv4: Don't attempt an atomic open if the file is a mountpoint
SUNRPC: Fix a bug in rpcauth_prune_expired

Linus Torvalds
2010-04-30 01:23:44 +0800

29 Apr, 2010

5 commits

f80a0ca6a pktcdvd: improve BKL and compat_ioctl.c usage

The pktcdvd driver uses proper locking and does not need the BKL in the
ioctl and llseek functions of the character device, so kill both.

Moving the compat_ioctl handling from common code into the driver itself
fixes build problems when CONFIG_BLOCK is disabled.

Acked-by: Randy Dunlap
Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-04-29 23:44:37 +0800
a36fed12a exofs: Fix "add bdi backing to mount session" fall out

Commit b3d0ab7e60d1865bb6f6a79a77aaba22f2543236 ("exofs: add bdi backing
to mount session") has a bug in the placement of the bdi member at
struct exofs_sb_info. The layout member must be kept last.

Signed-off-by: Boaz Harrosh
Acked-by: Jens Axboe
Signed-off-by: Linus Torvalds

Boaz Harrosh
2010-04-29 22:59:16 +0800
d9e80b7de nfs d_revalidate() is too trigger-happy with d_drop()

If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.

Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.

Signed-off-by: Al Viro
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Al Viro
2010-04-29 11:40:03 +0800
9699eda6b nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4

With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for
export_path in nfs4_validate_text_mount_data, so we need to free it afterwards.
This fixes the leak shown in the following kmemleak report:

unreferenced object 0xffff88016bf48a50 (size 16):
comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s)
hex dump (first 16 bytes):
2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5 /opt/work.kkkkk.
backtrace:
[] kmemleak_alloc+0x60/0xa7
[] kmemleak_alloc_recursive.clone.5+0x1b/0x1d
[] __kmalloc_track_caller+0x18f/0x1b7
[] kstrndup+0x37/0x54
[] nfs_parse_devname+0x152/0x204 [nfs]
[] nfs4_validate_text_mount_data+0xd0/0xdc [nfs]
[] nfs_get_sb+0x325/0x736 [nfs]
[] vfs_kern_mount+0xbd/0x17c
[] do_kern_mount+0x4d/0xed
[] do_mount+0x787/0x7fe
[] sys_mount+0x88/0xc2
[] system_call_fastpath+0x16/0x1b

Signed-off-by: Xiaotian Feng
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: Benny Halevy
Cc: Al Viro
Cc: Andy Adamson
Signed-off-by: Trond Myklebust

Xiaotian Feng
2010-04-29 01:46:28 +0800
acf82b85a nfs: fix some issues in nfs41_proc_reclaim_complete()

The original code passed an ERR_PTR() to rpc_put_task() and instead of
returning zero on success it returned -ENOMEM.

Signed-off-by: Dan Carpenter
Signed-off-by: Trond Myklebust

Dan Carpenter
2010-04-29 01:45:12 +0800
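
The two mistakes named in the nfs41_proc_reclaim_complete() commit above (handing an error pointer to a release function, and returning -ENOMEM on success) follow a common pattern. The sketch below reproduces the corrected shape in userspace. It is not the NFS code: err_ptr(), is_err() and ptr_err() are simplified stand-ins written for this example in the spirit of the kernel's ERR_PTR helpers, and run_task(), put_task() and reclaim_complete_sketch() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (simplified stand-in). */
static void *err_ptr(intptr_t error) { return (void *)error; }
static intptr_t ptr_err(const void *ptr) { return (intptr_t)ptr; }
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

struct task_sketch { int status; };

/* Returns a task on success or an encoded error pointer on failure. */
static struct task_sketch *run_task(int simulate_failure)
{
    struct task_sketch *task;

    if (simulate_failure)
        return err_ptr(-ENOMEM);
    task = malloc(sizeof(*task));
    if (task == NULL)
        return err_ptr(-ENOMEM);
    task->status = 0;
    return task;
}

/* Must only ever see real tasks, never an error pointer. */
static void put_task(struct task_sketch *task)
{
    assert(!is_err(task));
    free(task);
}

static int reclaim_complete_sketch(int simulate_failure)
{
    struct task_sketch *task = run_task(simulate_failure);
    int status;

    if (is_err(task))
        return (int)ptr_err(task);  /* bail out before put_task() */
    status = task->status;          /* 0 on success, not a stale -ENOMEM */
    put_task(task);
    return status;
}
```
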

28 Apr, 2010

5 commits

970b06485 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
coda: move backing-dev.h kernel include inside __KERNEL__
mtd: ensure that bdi entries are properly initialized and registered
Move mtd_bdi_*mappable to mtdcore.c
btrfs: convert to using bdi_setup_and_register()
Catch filesystems lacking s_bdi
drbd: Terminate a connection early if sending the protocol fails
drbd: fix memory leak
Fix JFFS2 sync silent failure
smbfs: add bdi backing to mount session
ncpfs: add bdi backing to mount session
exofs: add bdi backing to mount session
ecryptfs: add bdi backing to mount session
coda: add bdi backing to mount session
cifs: add bdi backing to mount session
afs: add bdi backing to mount session.
9p: add bdi backing to mount session
bdi: add helper function for doing init and register of a bdi for a file system
block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer

Linus Torvalds
2010-04-28 22:56:05 +0800
11e39d993 Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
nfsd4: bug in read_buf

Linus Torvalds
2010-04-28 07:26:21 +0800
3835541dd procfs: fix tid fdinfo

Correct the file_operations struct in fdinfo entry of tid_base_stuff[].

Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
/proc/*/fd/.

Signed-off-by: Jerome Marchand
Cc: Alexander Viro
Cc: Miklos Szeredi
Cc: Alexey Dobriyan
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jerome Marchand
2010-04-28 07:26:03 +0800
ba8b06e67 NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear

Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in
nfs_page_async_flush. According to the trace in
https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.

The same problem exists in nfs_wb_page_cancel().

Signed-off-by: Trond Myklebust

Trond Myklebust
2010-04-28 06:33:54 +0800
16a5b3c41 Remove redundant check for CONFIG_MMU

The checks for CONFIG_MMU at this location are duplicated as all the code is
located inside a #ifndef CONFIG_MMU block. So the first conditional block will
always be included while the second never will.

Signed-off-by: Christoph Egger
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Christoph Egger
2010-04-28 00:01:26 +0800

27 Apr, 2010

3 commits

bc113f151 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
squashfs: fix potential buffer over-run on 4K block file systems
squashfs: add missing buffer free
squashfs: fix warn_on when root inode is corrupted
squashfs: fix locking bug in zlib wrapper

Linus Torvalds
2010-04-27 23:59:38 +0800
2bc3c1179 nfsd4: bug in read_buf

When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.

We never encountered this in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic. Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

Neil Brown
2010-04-27 03:39:08 +0800
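
The effect described in the read_buf commit above, where a stale end pointer makes the space check believe far more than a page is available, comes down to unsigned arithmetic. The sketch below models addresses as plain integers to avoid undefined pointer subtraction. The exact form of the kernel's READ_BUF check is not shown in the commit message, so has_room() is an assumed shape, and struct args_sketch and next_page_fixed() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

struct args_sketch {
    uint64_t p;    /* current decode position (address modelled as integer) */
    uint64_t end;  /* one past the last usable byte */
};

/* Space check with the difference cast to 32-bit unsigned, as the commit describes. */
static bool has_room(uint64_t p_addr, uint64_t end_addr, uint32_t nbytes)
{
    return nbytes <= (uint32_t)(end_addr - p_addr);
}

/* Corrected page switch: end is set relative to the page p now points into. */
static void next_page_fixed(struct args_sketch *a, uint64_t page_addr)
{
    assert(page_addr % SKETCH_PAGE_SIZE == 0);
    a->p = page_addr;
    a->end = page_addr + SKETCH_PAGE_SIZE;
}
```

If `end` is left just 8 bytes below `p`, the difference wraps to 0xFFFFFFF8 after the cast, so a request for a megabyte passes the check; with `end` set correctly, anything larger than one page is refused.
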
dd77ef924 xfs: more swap extent fixes for dynamic fork offsets

A new xfsqa test (226) with a prototype xfs_fsr change to try to
handle dynamic fork offsets better triggers an assertion failure
where the inode data fork is in btree format, yet there is room in
the inode for it to be in extent format. The two inodes look like:

before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56

Basically the target inode ends up with 5 extents in btree format,
but it had space for 6 extents in extent format, so ends up
incorrect. Notably here the broot size is the same, and that is
where the kernel code is going wrong - the btree root will fit, so
it lets the swap go ahead.

The check should not allow the swap to take place if the number of
extents while in btree format is less than the number of extents
that can fit in the inode in extent format. Adding that check will
prevent this swap and corruption from occurring.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-04-27 01:38:51 +0800
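
The extra check described in the xfs swap extent commit above can be written as a small predicate. This is a sketch of the rule as worded in the commit message, not the XFS implementation: struct fork_sketch and swap_allowed() are hypothetical names, and only the extent-count condition is modelled (the existing broot-size check is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fork_sketch {
    bool btree_format;   /* true if the data fork is in btree format */
    unsigned nextents;   /* number of extents in the fork */
};

/*
 * Would moving this fork into an inode that can hold max_in_fork_extents
 * inline leave a btree-format fork that should have been extent format?
 * If so, refuse the swap.
 */
static bool swap_allowed(const struct fork_sketch *incoming,
                         unsigned max_in_fork_extents)
{
    assert(incoming != NULL);
    if (incoming->btree_format && incoming->nextents < max_in_fork_extents)
        return false;
    return true;
}
```

With the numbers from the commit message, the temp inode's 5-extent btree fork moving into the target (room for 6 extents inline) is rejected, while the target's 11-extent fork moving into the temp inode (room for 3) passes this particular check.
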