Eric Lee / smarc-fsl-linux-kernel

13 Sep, 2011

3 commits

0b001b2ed Merge branch 'for-linus' of git://github.com/chrismason/linux

* 'for-linus' of git://github.com/chrismason/linux:
Btrfs: add dummy extent if dst offset excceeds file end in
Btrfs: calc file extent num_bytes correctly in file clone
btrfs: xattr: fix attribute removal
Btrfs: fix wrong nbytes information of the inode
Btrfs: fix the file extent gap when doing direct IO
Btrfs: fix unclosed transaction handle in btrfs_cont_expand
Btrfs: fix misuse of trans block rsv
Btrfs: reset to appropriate block rsv after orphan operations
Btrfs: skip locking if searching the commit root in csum lookup
btrfs: fix warning in iput for bad-inode
Btrfs: fix an oops when deleting snapshots

Linus Torvalds
2011-09-13 02:47:49 +0800
5dfcc87fd fuse: fix memory leak

kmemleak is reporting that 32 bytes are being leaked by FUSE:

unreferenced object 0xe373b270 (size 32):
comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s)
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x27/0x50
[] kmem_cache_alloc+0xc5/0x180
[] fuse_alloc_forget+0x1e/0x20
[] fuse_alloc_inode+0xb0/0xd0
[] alloc_inode+0x1c/0x80
[] iget5_locked+0x8f/0x1a0
[] fuse_iget+0x72/0x1a0
[] fuse_get_root_inode+0x8a/0x90
[] fuse_fill_super+0x3ef/0x590
[] mount_nodev+0x3f/0x90
[] fuse_mount+0x15/0x20
[] mount_fs+0x1c/0xc0
[] vfs_kern_mount+0x41/0x90
[] do_kern_mount+0x39/0xd0
[] do_mount+0x2e5/0x660
[] sys_mount+0x66/0xa0

This leak report is consistent and happens once per boot on
3.1.0-rc5-dirty.

This happens if a FORGET request is queued after the fuse device was
released.
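
A minimal sketch of the general shape of such a fix, assuming the request is simply freed when the connection is gone. The struct and function names are hypothetical and do not mirror the real FUSE data structures:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a forget request and a connection. */
struct forget_req {
    struct forget_req *next;
};

struct conn {
    bool connected;                 /* false once the device is released */
    struct forget_req *forget_head; /* pending FORGET requests */
};

/*
 * Takes ownership of req. Returns true if the request was queued and
 * false if it was freed because nobody is left to consume the queue.
 */
static bool queue_forget(struct conn *c, struct forget_req *req)
{
    if (c->connected) {
        req->next = c->forget_head;
        c->forget_head = req;
        return true;
    }
    /* Queueing here would leak: the queue is never drained again. */
    free(req);
    return false;
}
```

The point of the sketch is only the ownership rule: whoever accepts the request must either queue it for a live consumer or free it.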

Reported-by: Sitsofe Wheeler
Signed-off-by: Miklos Szeredi
Tested-by: Sitsofe Wheeler
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-09-13 02:47:10 +0800
24114504c fuse: fix flock breakage

Commit 37fb3a30b4 ("fuse: fix flock"), added in 3.1-rc4, caused flock() to
fail with ENOSYS with kernel ABI version 7.16 or earlier.

Fix by falling back to testing FUSE_POSIX_LOCKS for ABI versions 7.16
and earlier.
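
The fallback can be pictured roughly as below. This is a sketch, not the kernel code: the flag values are hypothetical, and the assumption that ABI 7.17 and later advertise a dedicated flock flag is not stated in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined in the FUSE headers. */
#define SKETCH_POSIX_LOCKS (1u << 1)
#define SKETCH_FLOCK_LOCKS (1u << 10)

/* Returns true if flock() requests should be forwarded to the server. */
static bool server_handles_flock(uint32_t abi_minor, uint32_t init_flags)
{
    /* Assumption: newer servers advertise flock support with its own flag. */
    if (abi_minor >= 17)
        return (init_flags & SKETCH_FLOCK_LOCKS) != 0;

    /* ABI 7.16 and earlier: fall back to testing the POSIX lock flag. */
    return (init_flags & SKETCH_POSIX_LOCKS) != 0;
}
```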

Reported-by: Martin Ziegler
Signed-off-by: Miklos Szeredi
Tested-by: Martin Ziegler
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-09-13 02:47:10 +0800

11 Sep, 2011

12 commits

d525e8ab0 Btrfs: add dummy extent if dst offset excceeds file end in ...

You can see there's no file extent with range [0, 4096]. Check this with
btrfsck:

# btrfsck /dev/sda7
root 5 inode 258 errors 100
...

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-11 22:52:25 +0800
d72c0842f Btrfs: calc file extent num_bytes correctly in file clone

num_bytes should be 4096 not 12288.

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-09-11 22:52:25 +0800
4815053ab btrfs: xattr: fix attribute removal

An attribute is not removed by 'setfattr -x attr file' and remains
visible in attr list. This makes xfstests/062 pass again.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-09-11 22:52:25 +0800
a39f75214 Btrfs: fix wrong nbytes information of the inode

If we write some data into a data hole of a file (with no preallocation for
the hole), Btrfs allocates disk space and updates nbytes of the inode, but
the other element, disk_i_size, need not be updated. In this case we must
still update the inode metadata even though disk_i_size is unchanged
(btrfs_ordered_update_i_size() returns 1).

# mkfs.btrfs /dev/sdb1
# mount /dev/sdb1 /mnt
# touch /mnt/a
# truncate -s 856002 /mnt/a
# dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
# umount /mnt
# btrfsck /dev/sdb1
root 5 inode 257 errors 400
found 32768 bytes used err is 1

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:25 +0800
0c1a98c81 Btrfs: fix the file extent gap when doing direct IO

When we write data beyond the end of the file in direct I/O mode, a data
hole is created, and Btrfs should insert a file extent item that points to
this hole into the fs tree. Unfortunately, Btrfs forgets to do so.

The following is a simple way to reproduce it:
# mkfs.btrfs /dev/sdc2
# mount /dev/sdc2 /test4
# touch /test4/a
# dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
# umount /test4
# btrfsck /dev/sdc2
root 5 inode 257 errors 100

Reported-by: Tsutomu Itoh
Signed-off-by: Miao Xie
Tested-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:24 +0800
5b397377e Btrfs: fix unclosed transaction handle in btrfs_cont_expand

btrfs_cont_expand() forgot to close the transaction handle before
jumping out of the while loop. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-09-11 22:52:24 +0800
98c9942ac Btrfs: fix misuse of trans block rsv

At the beginning of create_pending_snapshot, trans->block_rsv is set
to pending->block_rsv and is used for the snapshot work; however, when
that work is done, we do not restore the original value.
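
The save-and-restore pattern this calls for can be sketched as follows. The types are simplified, hypothetical stand-ins rather than the Btrfs structures:

```c
/* Hypothetical, simplified stand-ins for the reservation and handle types. */
struct block_rsv {
    int id;
};

struct trans_handle {
    struct block_rsv *block_rsv;
};

static void do_snapshot_work(struct trans_handle *trans,
                             struct block_rsv *pending_rsv)
{
    /* Remember which reservation the caller was using. */
    struct block_rsv *saved = trans->block_rsv;

    trans->block_rsv = pending_rsv;
    /* ... snapshot work that allocates from pending_rsv ... */

    /* Restore it on the way out so later users see the original one. */
    trans->block_rsv = saved;
}
```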

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
65450aa64 Btrfs: reset to appropriate block rsv after orphan operations

While truncating the free space cache, we forget to change trans->block_rsv
back to the original one and leave it set to orphan_block_rsv. With the
inode_cache option enabled, this leads to countless warnings from
btrfs_alloc_free_block and btrfs_orphan_commit_root:

WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
...
WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
ddf23b3fc Btrfs: skip locking if searching the commit root in csum lookup

It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that it's locked and we'll
still deadlock. So use path->skip_locking as well. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-09-11 22:52:24 +0800
e0b6d65be btrfs: fix warning in iput for bad-inode

iput() shouldn't be called for inodes in I_NEW state.
We need to mark inode as constructed first.

WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
Call Trace:
[] warn_slowpath_common+0x7a/0xb0
[] warn_slowpath_null+0x15/0x20
[] iput+0x20b/0x210
[] btrfs_iget+0x1eb/0x4a0
[] btrfs_run_defrag_inodes+0x136/0x210
[] cleaner_kthread+0x17f/0x1a0
[] ? sub_preempt_count+0x9d/0xd0
[] ? transaction_kthread+0x280/0x280
[] kthread+0x96/0xa0
[] kernel_thread_helper+0x4/0x10
[] ? kthread_worker_fn+0x190/0x190
[] ? gs_change+0xb/0xb

Signed-off-by: Sergei Trofimovich
CC: Konstantin Khlebnikov
Tested-by: David Sterba
CC: Josef Bacik
CC: Chris Mason
Signed-off-by: Chris Mason

Sergei Trofimovich
2011-09-11 22:52:24 +0800
14c7cca78 Btrfs: fix an oops when deleting snapshots

We can reproduce this oops via the following steps:

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
$ for ((i=0; i [...]

[...] i_ino
to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
while the snapshot's location.objectid remains unchanged.

However, btrfs_ino() does not take this into account, and returns a wrong ino,
and causes the oops.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2011-09-11 22:52:24 +0800
290a1cc4f Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
md/raid1,10: Remove use-after-free bug in make_request.
md/raid10: unify handling of write completion.
Avoid dereferencing a 'request_queue' after last close.

Linus Torvalds
2011-09-11 01:19:15 +0800

10 Sep, 2011

3 commits

94007751b Avoid dereferencing a 'request_queue' after last close.

On the last close of an 'md' device which has been stopped, the device
is destroyed and in particular the request_queue is freed. The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock. This causes the last
close on an md device to sometimes take a spin_lock which lives in
freed memory - which results in an oops.

So move the call to bdev_inode_switch_bdi before the call to
->release.

Cc: Christoph Hellwig
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Wu Fengguang
Acked-by: Wu Fengguang
Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-09-10 15:20:21 +0800
0d20fbbe8 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client

* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
libceph: fix leak of osd structs during shutdown
ceph: fix memory leak
ceph: fix encoding of ino only (not relative) paths
libceph: fix msgpool

Linus Torvalds
2011-09-10 06:48:34 +0800
0ec26fd06 vfs: automount should ignore LOOKUP_FOLLOW

Prior to 2.6.38 automount would not trigger on either stat(2) or
lstat(2) on the automount point.

After 2.6.38, with the introduction of the ->d_automount()
infrastructure, stat(2) and others would start triggering automount
while lstat(2), etc. still would not. This is a regression and a
userspace ABI change.

Problem originally reported here:

http://thread.gmane.org/gmane.linux.kernel.autofs/6098

It appears that there was an attempt at fixing various userspace tools
to not trigger the automount. But since the stat system call is
rather common it is impossible to "fix" all userspace.

This patch restores the original behavior, which is to not trigger on
stat(2) and other symlink-following syscalls.

[ It's not really clear what the right behavior is. Apparently Solaris
does the "automount on stat, leave alone on lstat". And some programs
can get unhappy when "stat+open+fstat" ends up giving a different
result from the fstat than from the initial stat.

But the change in 2.6.38 resulted in problems for some people, so
we're going back to old behavior. Maybe we can re-visit this
discussion at some future date - Linus ]

Reported-by: Leonardo Chiquitto
Signed-off-by: Miklos Szeredi
Acked-by: Ian Kent
Cc: David Howells
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-09-10 06:42:34 +0800

08 Sep, 2011

1 commit

54d6d5374 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 and git://git.infradead.org/ubi-2.6

* branch 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: not build debug messages with CONFIG_UBIFS_FS_DEBUG disabled

* branch 'linux-next' of git://git.infradead.org/ubi-2.6:
UBI: do not link debug messages when debugging is disabled

Linus Torvalds
2011-09-08 00:51:43 +0800

06 Sep, 2011

5 commits

51b8b4fb3 fs/9p: Use protocol-defined value for lock/getlock 'type' field.

Signed-off-by: Jim Garlick
Signed-off-by: Aneesh Kumar K.V

Jim Garlick
2011-09-06 21:17:16 +0800
73f507171 fs/9p: Always ask new inode in lookup for cache mode disabled

This makes sure we don't end up reusing the unlinked inode object.
The ideal way is to use the inode i_generation, but i_generation is
not always available in userspace.

Signed-off-by: Aneesh Kumar K.V

Aneesh Kumar K.V
2011-09-06 21:17:15 +0800
f88657ce3 fs/9p: Add OS dependent open flags in 9p protocol

Some of the flags are OS/arch dependent, so we add a 9p
protocol value which maps to the asm-generic/fcntl.h values in Linux.
Based on the original patch from Venkateswararao Jujjuri.
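
As an illustration of the idea, the sketch below translates host open flags into fixed wire values before they are sent. The WIRE_* names are hypothetical and are not the identifiers used by the patch; their values follow the asm-generic/fcntl.h octal layout. Only a few flags are covered and the access mode bits are left out for brevity.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol-side names; values follow asm-generic/fcntl.h. */
#define WIRE_CREATE 00000100u
#define WIRE_EXCL   00000200u
#define WIRE_TRUNC  00001000u
#define WIRE_APPEND 00002000u

/* Translate host open(2) flags to protocol values, one flag at a time. */
static uint32_t open_flags_to_wire(int host_flags)
{
    static const struct {
        int host;
        uint32_t wire;
    } map[] = {
        { O_CREAT,  WIRE_CREATE },
        { O_EXCL,   WIRE_EXCL },
        { O_TRUNC,  WIRE_TRUNC },
        { O_APPEND, WIRE_APPEND },
    };
    uint32_t wire = 0;

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (host_flags & map[i].host)
            wire |= map[i].wire;
    }
    return wire;
}
```

Mapping flag by flag, rather than copying the integer through, keeps the wire format stable even on an architecture whose O_* values differ from the generic ones.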

Signed-off-by: Aneesh Kumar K.V

Aneesh Kumar K.V
2011-09-06 21:17:15 +0800
45089142b fs/9p: Don't update file type when updating file attributes

We should only update attributes that we can change on stat2inode.
Also do file type initialization in v9fs_init_inode.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2011-09-06 21:17:14 +0800
5441ae5eb fs/9p: Add fid before dentry instantiation

d_instantiate marks the dentry positive, so a parallel lookup and mkdir of
the directory can find a dentry that doesn't have a fid attached. This can
result in both code paths doing v9fs_fid_add, which results in a v9fs_dentry leak.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2011-09-06 21:17:14 +0800

02 Sep, 2011

1 commit

4d7b5a116 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix ->write_inode return values
xfs: fix xfs_mark_inode_dirty during umount
xfs: deprecate the nodelaylog mount option

Linus Torvalds
2011-09-02 23:25:23 +0800

01 Sep, 2011

3 commits

58d84c4ee xfs: fix ->write_inode return values

Currently we always redirty an inode that we attempted to write out
synchronously but that has already been cleaned by an AIL push internally,
which is rather bogus. Fix that by doing the i_update_core check early on and
returning 0 for it. Also include async calls for it, as doing any work for
those is just as pointless. While we're at it, also fix the sign of the
EIO return in case of a filesystem shutdown, and fix the completely
nonsensical locking around xfs_log_inode.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder
(cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97)

Signed-off-by: Alex Elder

Christoph Hellwig
2011-09-01 22:46:11 +0800
866e4ed77 xfs: fix xfs_mark_inode_dirty during umount

During umount we do not add a dirty inode to the lru and wait for it to
become clean first, but force writeback of data and metadata with
I_WILL_FREE set. Currently there is no way for XFS to detect that the
inode has been redirtied for metadata operations, as we skip the
mark_inode_dirty call during teardown. Fix this by setting i_update_core
manually in that case, so that the inode gets flushed during inode reclaim.

Alternatively we could enable calling mark_inode_dirty for inodes in
I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided
against this as we will get better I/O patterns from reclaim compared to
the synchronous writeout in write_inode_now, and always marking the inode
dirty in some way from xfs_mark_inode_dirty is a better safety net in
either case.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder
(cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856)

Signed-off-by: Alex Elder

Christoph Hellwig
2011-09-01 06:59:39 +0800
b79c4f75e Merge tag 'for_linus-20110831' of git://github.com/tytso/ext4

* tag 'for_linus-20110831' of git://github.com/tytso/ext4:
ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining

Linus Torvalds
2011-09-01 06:08:19 +0800

31 Aug, 2011

1 commit

8c0bec215 ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining

The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() cause lockdep to complain about potential
deadlock in several places. In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode(). Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix. Rather
than having the ext4-dio-unwritten thread wait to grab the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.
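
The try-lock-then-requeue idea can be modeled in user space as below. This is only a sketch: a C11 atomic_flag stands in for the inode's i_mutex, and the struct, enum, and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an inode with a lock and some pending work. */
struct sketch_inode {
    atomic_flag lock; /* stands in for i_mutex */
    int converted;    /* number of completed end_io conversions */
};

enum work_result { WORK_DONE, WORK_REQUEUE };

/*
 * Worker body: never blocks on the lock. If the lock is already held,
 * report that the item must be requeued so the worker can move on to
 * other items instead of stalling the whole queue.
 */
static enum work_result process_end_io(struct sketch_inode *inode)
{
    if (atomic_flag_test_and_set(&inode->lock))
        return WORK_REQUEUE; /* lock was already held by someone else */

    inode->converted++;      /* the work done under the lock */
    atomic_flag_clear(&inode->lock);
    return WORK_DONE;
}
```

Because the worker never sleeps on the lock, work items queued behind a contended inode still make progress.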

This should speed up work queue processing in general and also
prevent the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still held from before the page fault, we enter a deadlock.

Signed-off-by: Jiaying Zhang
Signed-off-by: "Theodore Ts'o"

Jiaying Zhang
2011-08-31 23:50:51 +0800

27 Aug, 2011

1 commit

f5b940997 All Arch: remove linkage for sys_nfsservctl system call

The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Linus Torvalds

NeilBrown
2011-08-27 06:09:58 +0800

26 Aug, 2011

1 commit

e096d0c7e lockdep: Add helper function for dir vs file i_mutex annotation

Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists. As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes. Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

find/645 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [] might_fault+0x5c/0xac

but task is already holding lock:
(&sb->s_type->i_mutex_key#15){+.+.+.}, at: []
vfs_readdir+0x5b/0xb4

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
[] lock_acquire+0xbf/0x103
[] __mutex_lock_common+0x4c/0x361
[] mutex_lock_nested+0x40/0x45
[] hugetlbfs_file_mmap+0x82/0x110
[] mmap_region+0x258/0x432
[] do_mmap_pgoff+0x2ac/0x306
[] sys_mmap_pgoff+0x118/0x16a
[] sys_mmap+0x22/0x24
[] system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
[] __lock_acquire+0xa1a/0xcf7
[] lock_acquire+0xbf/0x103
[] might_fault+0x89/0xac
[] filldir+0x6f/0xc7
[] dcache_readdir+0x67/0x205
[] vfs_readdir+0x7b/0xb4
[] sys_getdents+0x7e/0xd1
[] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer
Acked-by: Peter Zijlstra
Signed-off-by: Linus Torvalds

Josh Boyer
2011-08-26 01:50:18 +0800

25 Aug, 2011

2 commits

242d62196 xfs: deprecate the nodelaylog mount option

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-08-25 23:30:05 +0800
051732bcb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
fuse: mark pages accessed when written to
fuse: delete dead .write_begin and .write_end aops
fuse: fix flock
fuse: fix non-ANSI void function notation

Linus Torvalds
2011-08-25 00:14:42 +0800

24 Aug, 2011

2 commits

c2183d1e9 fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message

FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the
message processing could overrun and result in a "kernel BUG at
fs/fuse/dev.c:629!"
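
A length check of this kind might look like the sketch below. The header layout is a hypothetical stand-in, and the assumption that the payload is exactly the name plus a terminating NUL is illustrative rather than taken from this message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header that precedes the entry name. */
struct inval_entry_hdr {
    uint64_t parent;
    uint32_t namelen;
    uint32_t padding;
};

/*
 * Accept the message only if the bytes after the header are exactly
 * the advertised name plus its terminating NUL. Anything shorter would
 * make the reader run past the end of what was actually written.
 */
static bool inval_entry_size_ok(const struct inval_entry_hdr *hdr,
                                size_t msg_size)
{
    if (msg_size < sizeof(*hdr))
        return false;

    /* 64-bit arithmetic so namelen + 1 cannot wrap around. */
    return (uint64_t)(msg_size - sizeof(*hdr)) == (uint64_t)hdr->namelen + 1;
}
```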

Reported-by: Han-Wen Nienhuys
Signed-off-by: Miklos Szeredi
CC: stable@kernel.org

Miklos Szeredi
2011-08-24 16:20:17 +0800
35a177a08 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix tracing builds inside the source tree
xfs: remove subdirectories
xfs: don't expect xfs headers to be in subdirectories

Linus Torvalds
2011-08-24 02:41:44 +0800

23 Aug, 2011

2 commits

b6bede3b4 xfs: fix tracing builds inside the source tree

The code really requires the current source directory to be in the
header search path. We already do this if building with an object
tree separate from the source, but it needs to be added manually
if building inside the source. The cflags addition for it accidentally
got removed when collapsing the xfs directory structure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-08-23 05:37:24 +0800
259a187ad ceph: fix memory leak

kfree does not clean up indirect allocations in
ceph_fs_client and ceph_options (e.g. snapdir_name).

Signed-off-by: Noah Watkins
Signed-off-by: Sage Weil

Noah Watkins
2011-08-23 04:06:59 +0800

21 Aug, 2011

2 commits

6719db6a2 Btrfs: fix 64 bit divide problem

This fixes a regression introduced by commit cdcb725c05fe ("Btrfs: check
if there is enough space for balancing smarter"). We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just right/left
shift respectively, and in cases where there are N devices, use
do_div. Also make the counters u64 to match up with rw_devices.
Thanks,
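
The two techniques can be illustrated in user space as follows. sketch_do_div is a hypothetical stand-in that only mimics the calling convention of the kernel's do_div (quotient stored in place, remainder returned); a user-space compiler handles the 64-bit division itself, which kernel code on 32-bit architectures cannot rely on.

```c
#include <stdint.h>

/* Dividing or multiplying by 2 needs no division at all. */
static uint64_t halve(uint64_t v)
{
    return v >> 1;
}

static uint64_t twice(uint64_t v)
{
    return v << 1;
}

/*
 * Stand-in with do_div-like semantics: divides *n by a 32-bit base,
 * stores the quotient back in *n and returns the remainder.
 */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```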

Signed-off-by: Josef Bacik
Acked-and-tested-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Josef Bacik
2011-08-21 22:02:00 +0800
c063d8a60 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
ext4: Fix ext4_should_writeback_data() for no-journal mode

Linus Torvalds
2011-08-21 21:59:41 +0800

20 Aug, 2011

1 commit

dccaf33fa ext4: flush any pending end_io requests before DIO reads w/dioread_nolock

There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_DIRECT read request comes in during this period, ext4 will return
zero instead of the recently written data.

This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing an O_DIRECT read, to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.

Signed-off-by: Jiaying Zhang
Signed-off-by: "Theodore Ts'o"
Cc: stable@kernel.org

Jiaying Zhang
2011-08-20 07:13:32 +0800