Eric Lee / smarc-fsl-linux-kernel

06 Mar, 2011

1 commit

fb62c00a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: no .snap inside of snapped namespace
libceph: fix msgr standby handling
libceph: fix msgr keepalive flag
libceph: fix msgr backoff
libceph: retry after authorization failure
libceph: fix handling of short returns from get_user_pages
ceph: do not clear I_COMPLETE from d_release
ceph: do not set I_COMPLETE
Revert "ceph: keep reference to parent inode on ceph_dentry"

Linus Torvalds
2011-03-06 02:43:22 +0800

05 Mar, 2011

2 commits

e9e3d724e nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)

The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

bad_page+0x69/0x91
free_hot_cold_page+0x81/0x144
skb_release_data+0x5f/0x98
__kfree_skb+0x11/0x1a
tcp_ack+0x6a3/0x1868
tcp_rcv_established+0x7a6/0x8b9
tcp_v4_do_rcv+0x2a/0x2fa
tcp_v4_rcv+0x9a2/0x9f6
do_timer+0x2df/0x52c
ip_local_deliver+0x19d/0x263
ip_rcv+0x539/0x57c
netif_receive_skb+0x470/0x49f
:virtio_net:virtnet_poll+0x46b/0x5c5
net_rx_action+0xac/0x1b3
__do_softirq+0x89/0x133
call_softirq+0x1c/0x28
do_softirq+0x2c/0x7d
do_IRQ+0xec/0xf5
default_idle+0x0/0x50
ret_from_intr+0x0/0xa
default_idle+0x29/0x50
cpu_idle+0x95/0xb8
start_kernel+0x220/0x225
_sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator), which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to convert the passed-in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set. We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
PG_Slab set

or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go. I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it. This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked. We do a put_page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages. This way the data
will be properly freed when the ack comes in.

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DoS triggerable by an
unprivileged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman
CC: Trond Myklebust
CC: security@kernel.org
CC: Jeff Layton
Signed-off-by: Linus Torvalds

Neil Horman
2011-03-05 09:28:52 +0800
455cec0ab ceph: no .snap inside of snapped namespace

Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls

Signed-off-by: Sage Weil

Sage Weil
2011-03-05 04:25:09 +0800

04 Mar, 2011

6 commits

833602694 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
hfs: fix rename() over non-empty directory
udf: fix i_nlink limit
fix reiserfs mkdir() breakage
exofs: i_nlink races in rename()
nilfs2: i_nlink races in rename()
minix: i_nlink races in rename()
ufs: i_nlink races in rename()
sysv: i_nlink races in rename()

Linus Torvalds
2011-03-04 07:37:59 +0800
4c7fd114c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: zero proper structure size for geometry calls

Linus Torvalds
2011-03-04 04:44:22 +0800
c640e13f8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix regression that i-flag is not set on changeless checkpoints

Linus Torvalds
2011-03-04 04:42:48 +0800
16a8b70a5 ceph: do not clear I_COMPLETE from d_release

First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed. Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:52 +0800
b545cc150 ceph: do not set I_COMPLETE

Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:51 +0800
9bde178d0 Revert "ceph: keep reference to parent inode on ceph_dentry"

This reverts commit 97d79b403ef03f729883246208ef5d8a2ebc4d68.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.

Signed-off-by: Sage Weil

Sage Weil
2011-03-04 02:09:50 +0800

03 Mar, 2011

10 commits

69102e9b4 hfs: fix rename() over non-empty directory

merge hfs_unlink() and hfs_rmdir(), while we are at it.

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
810c1b2e4 udf: fix i_nlink limit

(256 << sizeof(x)) - 1 is not the maximal possible value of x...
In reality, the maximal allowed value for UDF FileLinkCount is
65535.
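
The arithmetic can be checked with a short userspace sketch. It assumes
the link count is held in a 16-bit unsigned field (consistent with the
65535 limit quoted above); the helper names are hypothetical and are not
taken from the UDF code.

```c
#include <assert.h>
#include <stddef.h>

/* The expression quoted above, evaluated for a field of `size` bytes.
 * It shifts by the byte count of the field, not by its bit count. */
unsigned long shifted_by_sizeof(size_t size)
{
    return (256UL << size) - 1;
}

/* The largest value an unsigned field of `size` bytes can hold.
 * Only meaningful for fields narrower than unsigned long. */
unsigned long true_max(size_t size)
{
    assert(size < sizeof(unsigned long));
    return (1UL << (8 * size)) - 1;
}
```

For a two-byte field the first helper yields 1023 and the second 65535,
so a limit derived from the quoted expression would be far too small.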

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
99890a3be fix reiserfs mkdir() breakage

if directory has so many subdirectories that its link count is set
to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails,
we shouldn't decrement the parent's link count in cleanup path;
that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up
with parent suddenly getting zero i_nlink, with very unpleasant
effects.

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:40 +0800
babfe5604 exofs: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:17 +0800
30eb43d31 nilfs2: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:17 +0800
6f88049ca minix: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
37750cdda ufs: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
4787d45fa sysv: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800
f7d222ea2 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
of/promtree: allow DT device matching by fixing 'name' brokenness (v5)
x86: OLPC: have prom_early_alloc BUG rather than return NULL
of/flattree: Drop an uninteresting message to pr_debug level
of: Add missing of_address.h to xilinx ehci driver

Linus Torvalds
2011-03-03 12:01:57 +0800
8aaccf7fa of/flattree: Drop an uninteresting message to pr_debug level

This message looks like an error (which it isn't) when booting with a
flattened device tree. Remove the message from normal kernel builds.

Signed-off-by: Paul Bolle
Signed-off-by: Grant Likely

Paul Bolle
2011-03-03 04:45:18 +0800

02 Mar, 2011

3 commits

e8a80c6f7 ext2: Fix link count corruption under heavy link+rename load

vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.

In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat the file as being properly linked into both directories
without writing any error, which is confusing. So we just stop the
increment-decrement games with i_nlink, which also fixes the corruption.

CC: stable@kernel.org
CC: Al Viro
Signed-off-by: Josh Hunt
Signed-off-by: Jan Kara

Josh Hunt
2011-03-02 18:03:52 +0800
af24ee9ea xfs: zero proper structure size for geometry calls

Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+ memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires. As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:

[] ? panic+0x50/0x150
[] ? __stack_chk_fail+0x10/0x18
[] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.
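
A userspace sketch of that pattern follows: fill a correctly sized
temporary, then copy out only the fields the older interface exposes.
The structure layouts, field names and values are hypothetical and are
not the XFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical older, smaller layout handed back to one caller. */
struct geom_v1 {
    unsigned int blocksize;
    unsigned int agcount;
};

/* Hypothetical current layout: the v1 fields followed by newer ones. */
struct geom {
    unsigned int blocksize;
    unsigned int agcount;
    unsigned int logsunit;
    unsigned int version;
};

/* Zeroing sizeof(*geo) bytes is only safe when the caller really
 * passed a full struct geom, not a smaller object cast to fit. */
void fill_geometry(struct geom *geo)
{
    memset(geo, 0, sizeof(*geo));
    geo->blocksize = 4096;
    geo->agcount = 4;
    geo->logsunit = 1;
    geo->version = 4;
}

/* The v1 caller uses a full-size temporary and copies the subset. */
void get_geometry_v1(struct geom_v1 *out)
{
    struct geom full;

    assert(out != NULL);
    fill_geometry(&full);
    out->blocksize = full.blocksize;
    out->agcount = full.agcount;
}
```

Because the memset only ever touches the temporary, memory adjacent to
the smaller structure is left alone.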

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.

Reported-by: Jeffrey Hundstad
Signed-off-by: Alex Elder
Reviewed-by: Eric Sandeen
Tested-by: Jeffrey Hundstad

Alex Elder
2011-03-02 11:21:13 +0800
72746ac64 nilfs2: fix regression that i-flag is not set on changeless checkpoints

According to the report from Jiro SEKIBA titled "regression in
2.6.37?" (Message-Id: ), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.

This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.

This patch fixes the regression and brings that application behavior
back to normal.

Reported-by: Jiro SEKIBA
Signed-off-by: Ryusuke Konishi
Tested-by: Ryusuke Konishi
Tested-by: Jiro SEKIBA
Cc: stable [2.6.37]

Ryusuke Konishi
2011-03-02 08:55:18 +0800

01 Mar, 2011

3 commits

e6eb5ce1b fs/block_dev.c: fix new kernel-doc warning

Fix new kernel-doc warning in fs/block_dev.c:

Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty'

Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Randy Dunlap
2011-03-01 10:08:31 +0800
58da94f01 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix truncate after open
fuse: fix hang of single threaded fuseblk filesystem

Linus Torvalds
2011-03-01 09:53:04 +0800
158a5d61f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Check heartbeat mode for kernel stacks only
Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
ocfs2: Fix estimate of necessary credits for mkdir

Linus Torvalds
2011-03-01 09:52:47 +0800

26 Feb, 2011

7 commits

7137c6bd4 aio: fix race between io_destroy() and io_submit()

A race can occur when io_submit() races with io_destroy():

CPU1                                        CPU2
io_submit()
  do_io_submit()
    ...
    ctx = lookup_ioctx(ctx_id);
                                            io_destroy()
    Now do_io_submit() holds the last reference to ctx.
    ...
    queue new AIO
    put_ioctx(ctx) - frees ctx with active AIOs

We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx. Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.

Cc: Nick Piggin
Reviewed-by: Jeff Moyer
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2011-02-26 07:07:37 +0800
3bd9a5d73 aio: fix rcu ioctx lookup

aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.

lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.

Fix the bug by atomically testing for zero refcount before incrementing.
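
The "increment only if non-zero" idea can be sketched in userspace with
C11 atomics. This is an illustration of the general technique, not the
kernel code; struct ctx, ctx_try_get() and ctx_put() are hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted context object. */
struct ctx {
    atomic_int users;
};

/* Take a reference only if the count is still non-zero. */
bool ctx_try_get(struct ctx *c)
{
    int old = atomic_load(&c->users);

    while (old != 0) {
        /* On failure, `old` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->users, &old, old + 1))
            return true;
    }
    return false;   /* object is already on its way to being freed */
}

/* Drop a reference; returns true if this was the last one. */
bool ctx_put(struct ctx *c)
{
    int prev = atomic_fetch_sub(&c->users, 1);

    assert(prev > 0);
    return prev == 1;
}
```

A lookup that finds the object but gets false from ctx_try_get() has to
treat it as not found, since a plain increment would revive an object
whose count has already dropped to zero.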

[jack@suse.cz: added comment into the code]
Reviewed-by: Jeff Moyer
Signed-off-by: Nick Piggin
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2011-02-26 07:07:37 +0800
294f6cf48 ldm: corrupted partition table can cause kernel oops

The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions. A
kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.

The patch changes ldm_parse_vmdb() to validate the value of vblk_size.

Signed-off-by: Timo Warns
Cc: Eugene Teo
Acked-by: Richard Russon
Cc: Harvey Harrison
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Timo Warns
2011-02-26 07:07:36 +0800
22bacca48 epoll: prevent creating circular epoll structures

In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held. This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.

The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock. For example, the following simple
program causes the call in ep_insert to recursively call the original fd's
->poll, leading to deadlock:

    #include <unistd.h>
    #include <sys/epoll.h>

    int main(void) {
        int e1, e2, p[2];
        struct epoll_event evt = {
            .events = EPOLLIN
        };

        e1 = epoll_create(1);
        e2 = epoll_create(2);
        pipe(p);

        epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
        epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
        write(p[1], p, sizeof p);
        epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);

        return 0;
    }

On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.
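
The recursive walk can be illustrated with a small userspace model. It
is a sketch under simplifying assumptions (a fixed number of instances
held in an adjacency matrix); struct ep_graph, reaches() and ep_add()
are hypothetical names, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EP 8

/* watches[a][b] is true when hypothetical instance a watches b. */
struct ep_graph {
    bool watches[MAX_EP][MAX_EP];
};

/* Depth-first walk: can `target` be reached starting at `from`? */
static bool reaches(const struct ep_graph *g, int from, int target,
                    bool *seen)
{
    if (from == target)
        return true;
    if (seen[from])
        return false;
    seen[from] = true;
    for (int next = 0; next < MAX_EP; next++)
        if (g->watches[from][next] && reaches(g, next, target, seen))
            return true;
    return false;
}

/* Add `child` to `parent` unless that would close a loop. */
bool ep_add(struct ep_graph *g, int parent, int child)
{
    bool seen[MAX_EP] = { false };

    assert(parent >= 0 && parent < MAX_EP);
    assert(child >= 0 && child < MAX_EP);
    if (reaches(g, child, parent, seen))
        return false;   /* insertion refused: it would form a cycle */
    g->watches[parent][child] = true;
    return true;
}
```

Replaying the program above against this model (e2 watches e1, e1
watches the pipe, then e1 tries to watch e2), the third insertion is the
one that gets refused.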

[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
Signed-off-by: Davide Libenzi
Signed-off-by: Nelson Elhage
Reported-by: Nelson Elhage
Tested-by: Nelson Elhage
Cc: [2.6.34+, possibly earlier]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davide Libenzi
2011-02-26 07:07:36 +0800
4660ba63f Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix fiemap bugs with delalloc
Btrfs: set FMODE_EXCL in btrfs_device->mode
Btrfs: make btrfs_rm_device() fail gracefully
Btrfs: Avoid accessing unmapped kernel address
Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl
Btrfs: allow balance to explicitly allocate chunks as it relocates
Btrfs: put ENOSPC debugging under a mount option

Linus Torvalds
2011-02-26 06:03:39 +0800
638691a7a Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
md: Fix - again - partition detection when array becomes active
Fix over-zealous flush_disk when changing device size.
md: avoid spinlock problem in blk_throtl_exit
md: correctly handle probe of an 'mdp' device.
md: don't set_capacity before array is active.
md: Fix raid1->raid0 takeover

Linus Torvalds
2011-02-26 03:13:26 +0800
f129ccc92 afs: Fix oops in afs_unlink_writeback

I'm seeing the following oops when testing afs:

Unable to handle kernel paging request for data at address 0x00000008
...
NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0
LR [c00000000033987c] .afs_put_writeback+0x98/0xec
Call Trace:
[c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec
[c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c
[c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320
[c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4
[c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc
[c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178
[c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128
[c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8
[c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0
[c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40

afs_write_begin hits an error and calls afs_unlink_writeback. In there
we do list_del_init on an uninitialised list.

The patch below initialises ->link when creating the afs_writeback struct.
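
Why the initialisation matters can be seen in a minimal userspace list
modelled loosely on the kernel's list_head. The types and helpers below
are a hypothetical sketch, not the afs code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

/* An initialised node points at itself, i.e. it is an empty list. */
void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink and re-initialise. This dereferences n->next and n->prev,
 * so it is only safe on a node that has been initialised. */
void list_del_init(struct list_node *n)
{
    assert(n->next != NULL && n->prev != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical record that embeds a list link. */
struct writeback {
    int id;
    struct list_node link;
};

/* Initialise the link at creation so error paths can unlink safely. */
void writeback_init(struct writeback *wb, int id)
{
    wb->id = id;
    list_init(&wb->link);
}
```

With the link initialised at creation, an error path may call
list_del_init() on a record that was never added to any list and it
simply stays an empty, self-pointing node.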

Signed-off-by: Anton Blanchard
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Anton Blanchard
2011-02-26 03:12:37 +0800

25 Feb, 2011

3 commits

8d56addd7 fuse: fix truncate after open

Commit e1181ee6 "vfs: pass struct file to do_truncate on O_TRUNC
opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse. Fuse
assumed that when called from open, a truncate() will be done, not an
ftruncate().

Fix by restoring the old behavior, based on the ATTR_OPEN flag.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2011-02-25 21:44:58 +0800
5a18ec176 fuse: fix hang of single threaded fuseblk filesystem

Single threaded NTFS-3G could get stuck if a delayed RELEASE reply
triggered a DESTROY request via path_put().

Fix this by

a) making RELEASE requests synchronous, whenever possible, on fuseblk
filesystems

b) if not possible (triggered by an asynchronous read/write) then do
the path_put() in a separate thread with schedule_work().

Reported-by: Oliver Neukum
Cc: stable@kernel.org
Signed-off-by: Miklos Szeredi

Miklos Szeredi
2011-02-25 21:44:58 +0800
e7407d161 block: bd_link_disk_holder() should hold on to holder_dir

The new implementation of bd_link_disk_holder() added by 49731baa41d
(block: restore multiple bd_link_disk_holder() support) didn't get an
extra reference for the holder_dir kobject of the slave bdev; however,
bdev kills holder_dir on removal, not release, so if the slave bdev is
removed while there are holder links, the holder_dir will be destroyed
while there still are holder links, which leads to oops later when
bd_unlink_disk_holder() tries to remove those links.

Make bd_link_disk_holder() grab an extra reference for the slave's
holder_dir and put it in bd_unlink_disk_holder().

Signed-off-by: Tejun Heo
Reported-by: "Hawrylewicz Czarnowski, Przemyslaw"
Tested-by: "Hawrylewicz Czarnowski, Przemyslaw"
Cc: Neil Brown
Cc: Jens Axboe
Signed-off-by: Linus Torvalds

Tejun Heo
2011-02-25 00:55:55 +0800

24 Feb, 2011

4 commits

bf9faa2aa Unlock vfsmount_lock in do_umount

By the commit
b3e19d9 2011-01-07 fs: scale mntget/mntput
vfsmount_lock was introduced around testing mnt_count.
Fix the mis-typed 'unlock'

Signed-off-by: J. R. Okajima
Acked-by: Al Viro
Signed-off-by: Al Viro

J. R. Okajima
2011-02-24 15:10:57 +0800
93b270f76 Fix over-zealous flush_disk when changing device size.

There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data we hold becomes irrelevant.
In the other, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data. In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_device.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bdev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flush_disk then passes true from check_disk_change and false from
check_disk_size_change.
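
The effect of the flag can be sketched in userspace as follows. This is
a toy model under stated assumptions: struct cached_inode and
invalidate_inodes_sketch() are hypothetical and only mimic the decision
to skip dirty entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry: present in the cache, possibly dirty. */
struct cached_inode {
    bool in_cache;
    bool dirty;
};

/* Drop cached entries and return how many were dropped. Dirty
 * entries are dropped only when kill_dirty is true. */
size_t invalidate_inodes_sketch(struct cached_inode *inodes, size_t n,
                                bool kill_dirty)
{
    size_t dropped = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!inodes[i].in_cache)
            continue;
        if (inodes[i].dirty && !kill_dirty)
            continue;   /* keep unwritten data, e.g. after a resize */
        inodes[i].in_cache = false;
        dropped++;
    }
    return dropped;
}
```

In this model the device-removed path would pass true and the
size-change path false, so a resize only discards clean entries.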

dm avoids tripping over this problem by calling i_size_write directly
rather than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer
Cc: Andrew Patterson
Cc: Jens Axboe
Signed-off-by: NeilBrown

NeilBrown
2011-02-24 14:25:47 +0800
2aa15890f mm: prevent concurrent unmap_mapping_range() on the same inode

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode. For example:

thread1: going through a big range, stops in the middle of a vma and
stores the restart address in vm_truncate_count.

thread2: comes in with a small (e.g. single page) unmap request on
the same vma, somewhere before restart_address, finds that the
vma was already unmapped up to the restart address and happily
returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value. This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex. Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers. In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
lockbreak" patch in particular. But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi
Reported-by: Michael Leun
Reported-by: Gurudas Pai
Tested-by: Gurudas Pai
Acked-by: Hugh Dickins
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Miklos Szeredi
2011-02-24 11:52:52 +0800
ec29ed5b4 Btrfs: fix fiemap bugs with delalloc

The Btrfs fiemap code wasn't properly returning delalloc extents,
so applications that trust fiemap to decide if there are holes in the
file see holes instead of delalloc.

This reworks the btrfs fiemap code, adding a get_extent helper that
searches for delalloc ranges and also adding a helper for extent_fiemap
that skips past holes in the file.

Signed-off-by: Chris Mason

Chris Mason
2011-02-24 05:23:20 +0800

23 Feb, 2011

1 commit

be715140b xfs: check if device support discard in xfs_ioc_trim()

Right now we are relying on the fact that when we attempt to
actually do the discard, blkdev_issue_discard() returns -EOPNOTSUPP
and the user is informed that the device does not support discard.

However, in the case where we do not hit any suitable free
extent to trim in the FITRIM code, it will finish without any error.
This is very confusing, because it seems that FITRIM was successful
even though the device does not actually support discard.

Solution: Check for discard support before attempting to search for
free extents.
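
The ordering can be sketched in userspace like this. It is a
hypothetical model: struct device_sketch and trim_sketch() are invented
for illustration and do not reflect the XFS or block-layer interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: whether it supports discard, and how many
 * free extents a trim pass would find on it. */
struct device_sketch {
    bool supports_discard;
    int free_extents;
};

/* Returns the number of extents trimmed, or a negative errno. */
int trim_sketch(const struct device_sketch *dev)
{
    assert(dev != NULL);

    /* Check capability first, so an unsupported device reports an
     * error even when there is nothing to trim. */
    if (!dev->supports_discard)
        return -EOPNOTSUPP;

    /* Stand-in for the search: every free extent gets trimmed. */
    return dev->free_extents;
}
```

Without the early check, a device with no suitable free extents would
return 0 and look as if the trim had succeeded.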

Signed-off-by: Lukas Czerner
Signed-off-by: Alex Elder

Lukas Czerner
2011-02-23 05:08:44 +0800