Doug / smarc-fsl-linux-kernel | Embedian Git Server

27 Aug, 2011

1 commit

f5b940997 All Arch: remove linkage for sys_nfsservctl system call

The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Linus Torvalds

NeilBrown
2011-08-27 06:09:58 +0800

26 Aug, 2011

1 commit

e096d0c7e lockdep: Add helper function for dir vs file i_mutex annotation

Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists. As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes. Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

find/645 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: might_fault+0x5c/0xac

but task is already holding lock:
(&sb->s_type->i_mutex_key#15){+.+.+.}, at: vfs_readdir+0x5b/0xb4

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
lock_acquire+0xbf/0x103
__mutex_lock_common+0x4c/0x361
mutex_lock_nested+0x40/0x45
hugetlbfs_file_mmap+0x82/0x110
mmap_region+0x258/0x432
do_mmap_pgoff+0x2ac/0x306
sys_mmap_pgoff+0x118/0x16a
sys_mmap+0x22/0x24
system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
__lock_acquire+0xa1a/0xcf7
lock_acquire+0xbf/0x103
might_fault+0x89/0xac
filldir+0x6f/0xc7
dcache_readdir+0x67/0x205
vfs_readdir+0x7b/0xb4
sys_getdents+0x7e/0xd1
system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer
Acked-by: Peter Zijlstra
Signed-off-by: Linus Torvalds

Josh Boyer
2011-08-26 01:50:18 +0800

25 Aug, 2011

1 commit

051732bcb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
fuse: mark pages accessed when written to
fuse: delete dead .write_begin and .write_end aops
fuse: fix flock
fuse: fix non-ANSI void function notation

Linus Torvalds
2011-08-25 00:14:42 +0800

24 Aug, 2011

2 commits

c2183d1e9 fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message

FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the
message processing could overrun and result in a "kernel BUG at
fs/fuse/dev.c:629!"
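
The kind of length validation described above can be sketched in userspace C. This is only an illustration under stated assumptions: the header struct, the NAME_MAX_LEN bound and the function parse_inval_entry() are hypothetical stand-ins, not the code in fs/fuse/dev.c. The point is that the name length declared in the header is checked against the total write size before any name bytes are consumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the notification header; not the kernel struct. */
struct inval_entry_hdr {
    uint64_t parent;
    uint32_t namelen;
    uint32_t padding;
};

#define NAME_MAX_LEN 1024u /* assumed upper bound, for illustration only */

/*
 * Returns 0 and copies the NUL-terminated name into out (capacity out_cap)
 * on success, or -1 if the message size does not match the header.
 */
int parse_inval_entry(const unsigned char *msg, size_t size,
                      char *out, size_t out_cap)
{
    struct inval_entry_hdr hdr;

    /* The message must at least contain the fixed-size header. */
    if (size < sizeof(hdr))
        return -1;
    memcpy(&hdr, msg, sizeof(hdr));

    if (hdr.namelen > NAME_MAX_LEN)
        return -1;
    /* Total length must be header + name + terminating NUL, exactly. */
    if (size != sizeof(hdr) + (size_t)hdr.namelen + 1)
        return -1;
    if ((size_t)hdr.namelen + 1 > out_cap)
        return -1;

    memcpy(out, msg + sizeof(hdr), hdr.namelen);
    out[hdr.namelen] = '\0';
    return 0;
}
```

A short or oversized write is rejected up front instead of letting later processing run past the end of the data.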

Reported-by: Han-Wen Nienhuys
Signed-off-by: Miklos Szeredi
CC: stable@kernel.org

Miklos Szeredi
2011-08-24 16:20:17 +0800

35a177a08 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix tracing builds inside the source tree
xfs: remove subdirectories
xfs: don't expect xfs headers to be in subdirectories

Linus Torvalds
2011-08-24 02:41:44 +0800

23 Aug, 2011

1 commit

b6bede3b4 xfs: fix tracing builds inside the source tree

The code really requires the current source directory to be in the
header search path. We already do this if building with an object
tree separate from the source, but it needs to be added manually
if building inside the source. The cflags addition for it accidentally
got removed when collapsing the xfs directory structure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-08-23 05:37:24 +0800

21 Aug, 2011

2 commits

6719db6a2 Btrfs: fix 64 bit divide problem

This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check
if there is enough space for balancing smarter"). We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just right/left
shift respectively, and in cases where there are N devices we should use
do_div. Also make the counters u64 to match up with rw_devices.
Thanks,
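
As a rough userspace sketch of the two techniques named above: in the kernel, do_div() divides a 64-bit value in place by a 32-bit divisor and yields the remainder. The helper do_div_sketch() below only imitates that contract with ordinary C division (which is fine in userspace), and half(), twice() and per_device() are hypothetical names that do not come from the patch.

```c
#include <stdint.h>

/*
 * Userspace imitation of the do_div() contract (hypothetical helper):
 * *n is replaced by the quotient and the remainder is returned.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Divide by 2 with a right shift instead of '/'. */
uint64_t half(uint64_t bytes)
{
    return bytes >> 1;
}

/* Multiply by 2 with a left shift instead of '*'. */
uint64_t twice(uint64_t bytes)
{
    return bytes << 1;
}

/* Split a 64-bit byte count across ndevs devices (ndevs must be non-zero). */
uint64_t per_device(uint64_t bytes, uint32_t ndevs)
{
    do_div_sketch(&bytes, ndevs);
    return bytes;
}
```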

Signed-off-by: Josef Bacik
Acked-and-tested-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Josef Bacik
2011-08-21 22:02:00 +0800

c063d8a60 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
ext4: Fix ext4_should_writeback_data() for no-journal mode

Linus Torvalds
2011-08-21 21:59:41 +0800

20 Aug, 2011

1 commit

dccaf33fa ext4: flush any pending end_io requests before DIO reads w/dioread_nolock

There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_DIRECT read request comes in during this period, ext4 will return
zeros instead of the recently written data.

This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing an O_DIRECT read to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.

Signed-off-by: Jiaying Zhang
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Jiaying Zhang
2011-08-20 07:13:32 +0800

19 Aug, 2011

6 commits

35a21b429 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
pnfs-obj: Fix the comp_index != 0 case
pnfs-obj: Bug when we are running out of bio
nfs: add missing prefetch.h include

Linus Torvalds
2011-08-19 13:47:13 +0800

5c80c71b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: set i_size properly when fallocating and we already
btrfs: unlock on error in btrfs_file_llseek()
btrfs: btrfs_permission's RO check shouldn't apply to device nodes
Btrfs: truncate pages from clone ioctl target range
Btrfs: fix uninitialized sync_pending
Btrfs: fix wrong free space information
btrfs: memory leak in btrfs_add_inode_defrag()
Btrfs: use plain page_address() in header fields setget functions
Btrfs: forced readonly when btrfs_drop_snapshot() fails
Btrfs: check if there is enough space for balancing smarter
Btrfs: fix a bug of balance on full multi-disk partitions
Btrfs: fix an oops of log replay
Btrfs: detect wether a device supports discard
Btrfs: force unplugs when switching from high to regular priority bios

Linus Torvalds
2011-08-19 05:20:00 +0800

01fa4ba52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
update cifs version to 1.75
[CIFS] possible memory corruption on mount
cifs: demote cERROR in build_path_from_dentry to cFYI

Linus Torvalds
2011-08-19 05:19:36 +0800

f6a975c50 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: fat16 support maximum 4GB file/vol size as WinXP or 7.
fat: fix utf8 iocharset warning message
fat: fix build warning

Linus Torvalds
2011-08-19 05:16:13 +0800

04c05b4a6 update cifs version to 1.75

Signed-off-by: Steve French

Steve French
2011-08-19 00:55:10 +0800

13589c437 [CIFS] possible memory corruption on mount

CIFS cleanup_volume_info_contents() appears to have a memory
corruption problem.
When UNCip is set to "&vol->UNC[2]" in cifs_parse_mount_options(), it
should not be kfree()-ed in cleanup_volume_info_contents().
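
The ownership problem can be modelled in a few lines of userspace C. This is a simplified sketch, not the CIFS code: struct vol_sketch, vol_set_unc() and vol_cleanup() are hypothetical, and malloc()/free() stand in for the kernel allocators. It shows a pointer that may alias the interior of another allocation and a cleanup routine that frees it only when it is a separate allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of the mount-option state. */
struct vol_sketch {
    char *UNC;   /* heap allocation, owned by the struct */
    char *UNCip; /* either its own allocation or an alias into UNC */
};

/* Models the case described above: UNCip aliases &UNC[2]. */
int vol_set_unc(struct vol_sketch *vol, const char *unc)
{
    size_t len = strlen(unc);

    if (len < 3)
        return -1;
    vol->UNC = malloc(len + 1);
    if (vol->UNC == NULL)
        return -1;
    memcpy(vol->UNC, unc, len + 1);
    vol->UNCip = &vol->UNC[2];
    return 0;
}

/* Free UNCip only when it does not point into the UNC buffer. */
void vol_cleanup(struct vol_sketch *vol)
{
    int aliased = vol->UNC != NULL && vol->UNCip == vol->UNC + 2;

    if (!aliased)
        free(vol->UNCip);
    free(vol->UNC);
    vol->UNC = NULL;
    vol->UNCip = NULL;
}
```

Freeing the aliased pointer would hand the allocator an address it never returned, which is the corruption the commit message warns about.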

Introduced in commit b946845a9dc523c759cae2b6a0f6827486c3221a

Signed-off-by: J.R. Okajima
Reviewed-by: Jeff Layton
CC: Stable
Signed-off-by: Steve French

Steve French
2011-08-19 00:53:02 +0800

18 Aug, 2011

5 commits

81d86e1b7 Merge branch 'btrfs-3.0' into for-linus

Chris Mason
2011-08-18 22:38:03 +0800

f1e490a7e Btrfs: set i_size properly when fallocating and we already

xfstests exposed a problem with preallocate when it fallocates a range that
already has an extent. We don't set the new i_size properly because we see that
we already have an extent. This isn't right and we should update i_size if the
space already exists. With this patch we now pass xfstests 075. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-08-18 22:36:39 +0800

9a4327ca1 btrfs: unlock on error in btrfs_file_llseek()

There were some unlocks on error missing in a recent patch to
btrfs_file_llseek().

Signed-off-by: Dan Carpenter
Signed-off-by: Chris Mason

Dan Carpenter
2011-08-18 22:16:05 +0800

cb6db4e57 btrfs: btrfs_permission's RO check shouldn't apply to device nodes

This patch tightens the read-only access checks in btrfs_permission to
match the constraints in inode_permission. Currently, even though the
device node itself will be unmodified, read-write access to device nodes
is denied when the device node resides on a read-only subvolume or
is a file that has been marked read-only by the btrfs conversion utility.

With this patch applied, the check only affects regular files,
directories, and symlinks. It also restructures the code a bit so that
we don't duplicate the MAY_WRITE check for both tests.

Signed-off-by: Jeff Mahoney
Signed-off-by: Chris Mason

Jeff Mahoney
2011-08-18 22:16:03 +0800

338d0f0a6 befs: Validate length of long symbolic links.

Signed-off-by: Timo Warns
Signed-off-by: Linus Torvalds

Timo Warns
2011-08-18 04:31:24 +0800

17 Aug, 2011

13 commits

710d4403a fat: fat16 support maximum 4GB file/vol size as WinXP or 7.

FAT16 supports a maximum volume/file size of 4GB with a 64KB cluster size.

Win NT/XP/7 increased the maximum cluster size to 64KB, which also raised
the maximum file/volume size to 4GB. Despite that, the file size in Linux
FAT is still limited to 2GB.

I found that it is limited by sb->s_maxbytes (0x7fffffff) when the partition
is formatted as FAT16. sb->s_maxbytes in fill_super should be set to
0xffffffff, as for FAT32.
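
The arithmetic behind the two limits can be shown with a tiny sketch. The macro names and size_fits() are hypothetical and exist only for illustration; the two constants are the values quoted above.

```c
#include <stdint.h>

#define OLD_FAT16_MAXBYTES 0x7fffffffULL /* 2 GiB - 1, the old FAT16 limit */
#define NEW_FAT_MAXBYTES   0xffffffffULL /* 4 GiB - 1, the limit used for FAT32 */

/* Illustrative check (hypothetical helper): does a file of this size fit? */
int size_fits(uint64_t size, uint64_t maxbytes)
{
    return size <= maxbytes;
}
```

A 3 GiB file is rejected under the old limit and accepted under the new one, while a full 4 GiB (2^32 bytes) still exceeds 0xffffffff by one byte.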

Signed-off-by: Namjae Jeon
Signed-off-by: OGAWA Hirofumi

Namjae Jeon
2011-08-17 18:35:00 +0800

186b53701 fat: fix utf8 iocharset warning message

The fat_msg function already formats the given message and appends
a newline to it - we don't need to do this in the passed message
string as well, or we will end up with a blank line printed in the
kernel log ring buffer.

Also change the loglevel from error to warning.

Signed-off-by: Mihai Moldovan
Signed-off-by: OGAWA Hirofumi

Mihai Moldovan
2011-08-17 18:34:59 +0800

8c320c079 fat: fix build warning

This fixes a compile warning (uninitialized variable) in
the fat filesystem code.

Signed-off-by: Jonas Aberg
Signed-off-by: Lee Jones
Signed-off-by: Linus Walleij
Signed-off-by: OGAWA Hirofumi

Jonas Aberg
2011-08-17 18:34:58 +0800

f81c9cdc5 Btrfs: truncate pages from clone ioctl target range

We need to truncate page cache pages for the clone ioctl target range or
else we'll confuse ourselves to no end. If the old data was cached, we
used to still see it (until remount). If the page was partially updated
we used to get a mix of old and new data.

Signed-off-by: Sage Weil
Signed-off-by: Chris Mason

Sage Weil
2011-08-17 09:09:31 +0800

0e5888596 Btrfs: fix uninitialized sync_pending

sync_pending is used before it is initialized; fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-08-17 09:09:31 +0800

bb3ac5a4d Btrfs: fix wrong free space information

Btrfs subtracted the size of the allocated space twice when it allocated
the space from the bitmap in the cluster. This broke the free space information
and eventually led to an oops.

This patch also fixes the bug that ctl->free_space was subtracted
without holding the lock.

Reported-by: Liu Bo
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-08-17 09:09:31 +0800

f4ac904c4 btrfs: memory leak in btrfs_add_inode_defrag()

We don't use the defrag struct on this path.

Signed-off-by: Dan Carpenter
Signed-off-by: Chris Mason

Dan Carpenter
2011-08-17 09:09:15 +0800

c97c2916e Btrfs: use plain page_address() in header fields setget functions

We've stopped using highmem for extent buffers.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-08-17 09:09:15 +0800

cb1b69f45 Btrfs: forced readonly when btrfs_drop_snapshot() fails

The filesystem now turns read-only instead of returning the error to the
caller when an error is detected in btrfs_drop_snapshot().
Also, because the caller doesn't check the error, the function's return type is
changed to 'void'.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-08-17 09:09:15 +0800

cdcb725c0 Btrfs: check if there is enough space for balancing smarter

When checking if there is enough space for balancing a block group,
we do not take raid types into consideration, so we do not account for the
correct amount of space that we need. This makes us do some extra
work before we get ENOSPC.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

liubo
2011-08-17 09:09:15 +0800

38c01b960 Btrfs: fix a bug of balance on full multi-disk partitions

When balancing, we first try to shrink devices to free some space,
but on a full multi-disk partition with raid protection
we may hit a bug: while shrinking, total_bytes may become less
than bytes_used, and btrfs may allocate a dev extent that reaches beyond the
device's bounds.

Then we are not able to write or read the data stored at the end
of the device, and get the following:

device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
Btrfs detected SSD devices, enabling SSD mode
btrfs: relocating block group 476315648 flags 9
btrfs: found 4 extents
attempt to access beyond end of device
sdb5: rw=145, want=546176, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546304, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546432, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546560, limit=546147
attempt to access beyond end of device

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

liubo
2011-08-17 09:09:15 +0800

34f3e4f23 Btrfs: fix an oops of log replay

When btrfs recovers from a crash, it may hit the oops below:

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:4580!
[...]
RIP: 0010: btrfs_add_link+0x161/0x1c0 [btrfs]
[...]
Call Trace:
? btrfs_inode_ref_index+0x31/0x80 [btrfs]
add_inode_ref+0x319/0x3f0 [btrfs]
replay_one_buffer+0x2c7/0x390 [btrfs]
walk_down_log_tree+0x32a/0x480 [btrfs]
walk_log_tree+0xf5/0x240 [btrfs]
btrfs_recover_log_trees+0x250/0x350 [btrfs]
? btrfs_recover_log_trees+0x350/0x350 [btrfs]
open_ctree+0x1442/0x17d0 [btrfs]
[...]

This happens because, while replaying an inode ref item, we forget to
check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree,
so we run into conflicts that trigger the BUG_ON().

Signed-off-by: Liu Bo
Tested-by: Andy Lutomirski
Signed-off-by: Chris Mason

liubo
2011-08-17 09:09:15 +0800

d5e2003c2 Btrfs: detect wether a device supports discard

We have a problem where if a user specifies discard but the device doesn't actually
support it, we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem
because this gets called (in a fashion) from the tree log recovery code, which
has a nice little BUG_ON(ret) after it, which causes us to fail the tree log
replay. So instead detect whether our devices support discard when we're adding
them and then don't issue discards if we know that the device doesn't support
it. And just for good measure set ret = 0 in btrfs_issue_discard just in case
we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-08-17 09:09:15 +0800

16 Aug, 2011

1 commit

fa71f4470 cifs: demote cERROR in build_path_from_dentry to cFYI

Running the cthon tests on a recent kernel caused this message to pop up
occasionally:

CIFS VFS: did not end path lookup where expected namelen is 0

Some added debugging showed that namelen and dfsplen were both 0 when
this occurred. That means that the read_seqretry returned true.

Assuming that the comment inside the if statement is true, this should
be harmless and just means that we raced with a rename. If that is the
case, then there's no need for alarm and we can demote this to cFYI.

While we're at it, print the dfsplen too so that we can see what
happened here if the message pops during debugging.

Cc: stable@kernel.org
Cc: Al Viro
Signed-off-by: Jeff Layton
Signed-off-by: Steve French

Jeff Layton
2011-08-16 21:07:24 +0800

15 Aug, 2011

2 commits

a0b3447fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
jfs: flush journal completely before releasing metadata inodes

Linus Torvalds
2011-08-15 23:40:24 +0800

aa2b1cf5c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: Do not set cifs/ntfs acl using a file handle (try #4)
[CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable

Linus Torvalds
2011-08-15 23:36:30 +0800

14 Aug, 2011

3 commits

9dd75f1f1 ext4: fix nomblk_io_submit option so it correctly converts uninit blocks

Bug discovered by Jan Kara:

Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionally call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
char buf[1024];
memset(buf, 'a', sizeof(buf));
fallocate(fd, 0, 0, 16384);
write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.

Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Theodore Ts'o
2011-08-14 00:58:21 +0800

32c80b32c ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.

Setting the EXT4_IO_END_UNWRITTEN flag and increasing i_aiodio_unwritten
should be done together, since ext4_end_io_nolock always clears
the flag and decreases the counter at the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN, so it will go negative and cause some processes
to wait forever.
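
The invariant can be sketched as a pair of helpers that always change the flag and the counter together. This is a single-threaded userspace model under stated assumptions: the structs, IO_END_UNWRITTEN, set_unwritten() and end_io() are hypothetical stand-ins, and a plain int replaces the atomic counter a kernel implementation would need.

```c
#define IO_END_UNWRITTEN 0x1u /* stand-in for EXT4_IO_END_UNWRITTEN */

/* Hypothetical model of the per-inode counter (not atomic in this sketch). */
struct inode_sketch {
    int aiodio_unwritten;
};

/* Hypothetical model of an io_end with its flag word. */
struct io_end_sketch {
    unsigned int flag;
    struct inode_sketch *inode;
};

/* Set the flag and bump the counter as one step, at most once per io_end. */
void set_unwritten(struct io_end_sketch *io)
{
    if (!(io->flag & IO_END_UNWRITTEN)) {
        io->flag |= IO_END_UNWRITTEN;
        io->inode->aiodio_unwritten++;
    }
}

/* Completion clears the flag and drops the counter as one step. */
void end_io(struct io_end_sketch *io)
{
    if (io->flag & IO_END_UNWRITTEN) {
        io->flag &= ~IO_END_UNWRITTEN;
        io->inode->aiodio_unwritten--;
    }
}
```

Because every decrement is matched by an earlier increment, the counter cannot drop below zero, so a waiter that blocks until it reaches zero is not left waiting on a negative value.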

Part of the patch came from Eric in his e-mail, but it doesn't actually fix the
problem that Michael hit.

http://marc.info/?l=linux-ext4&m=131316851417460&w=2

Reported-and-Tested-by: Michael Tokarev
Signed-off-by: Eric Sandeen
Signed-off-by: Tao Ma
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Tao Ma
2011-08-14 00:30:59 +0800

2581fdc81 ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode

Flush inode's i_completed_io_list before calling ext4_ioend_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since inode A's i_mutex has been held since
before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.

Signed-off-by: Jiaying Zhang
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Jiaying Zhang
2011-08-14 00:17:13 +0800

13 Aug, 2011

1 commit

441c85085 ext4: Fix ext4_should_writeback_data() for no-journal mode

ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files. However, calling journalled aop
callbacks when there is no valid handle, can cause problems.
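
The effect of the test ordering can be modelled with plain booleans. This is a sketch, not the ext4 source: struct inode_model and both functions are hypothetical, and each field stands in for one of the conditions the real function checks. It contrasts an ordering that tests the file type first with one that tests for the absence of a journal first.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs to the decision. */
struct inode_model {
    bool has_journal;       /* false in no-journal mode */
    bool is_regular;        /* regular file vs directory, symlink, ... */
    bool journal_data_flag; /* per-inode "journal data" flag */
    bool writeback_mount;   /* mounted with data=writeback */
};

/* Ordering described as buggy: the file-type test runs first. */
int should_writeback_old(const struct inode_model *i)
{
    if (!i->is_regular)
        return 0;
    if (!i->has_journal)
        return 1;
    if (i->journal_data_flag)
        return 0;
    return i->writeback_mount ? 1 : 0;
}

/* Reordered: with no journal, always report writeback mode. */
int should_writeback_new(const struct inode_model *i)
{
    if (!i->has_journal)
        return 1;
    if (!i->is_regular)
        return 0;
    if (i->journal_data_flag)
        return 0;
    return i->writeback_mount ? 1 : 0;
}
```

For a directory on a no-journal filesystem the first ordering returns 0 and the second returns 1, which is the case the commit message singles out.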

This would cause a kernel crash with Jan Kara's commit
2d859db3e4 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

- no-journal
- data=ordered
- data=writeback
- data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.

Signed-off-by: Curt Wohlgemuth
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Curt Wohlgemuth
2011-08-13 23:25:18 +0800