Eric Lee / smarc-fsl-linux-kernel

27 Jan, 2010

1 commit

b04da8bfd fnctl: f_modown should call write_lock_irqsave/restore

Commit 703625118069f9f8960d356676662d3db5a9d116 exposed that f_modown()
should call write_lock_irqsave instead of just write_lock_irq, because a
caller could have a spinlock held and it would not be good to re-enable
interrupts.
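
The difference between the two lock variants can be illustrated with a small userspace model. This is only a sketch: a single boolean stands in for the CPU's interrupt-enable state, and all `model_*` names are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Models the *_irq variants: disable on lock, unconditionally enable on unlock. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models the *_irqsave/restore variants: remember and restore the prior state. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}
static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Interrupt state the caller sees after an inner *_irq section,
 * while the caller still holds its own lock with interrupts off. */
static bool inner_with_irq(void)
{
    irqs_enabled = true;
    bool outer = model_lock_irqsave();   /* caller's lock: IRQs off */
    model_lock_irq();
    model_unlock_irq();                  /* IRQs are back on too early */
    bool seen = irqs_enabled;
    model_unlock_irqrestore(outer);
    return seen;
}

/* Same scenario, but the inner section saves and restores the state. */
static bool inner_with_irqsave(void)
{
    irqs_enabled = true;
    bool outer = model_lock_irqsave();   /* caller's lock: IRQs off */
    bool inner = model_lock_irqsave();
    model_unlock_irqrestore(inner);      /* IRQs stay off for the caller */
    bool seen = irqs_enabled;
    model_unlock_irqrestore(outer);
    return seen;
}
```

In this model `inner_with_irq()` reports interrupts enabled inside the caller's critical section, while `inner_with_irqsave()` leaves them disabled until the caller restores its own state.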

Cc: Eric W. Biederman
Cc: Al Viro
Cc: Alan Cox
Cc: Tavis Ormandy
Cc: stable
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

Greg Kroah-Hartman
2010-01-27 09:25:38 +0800

26 Jan, 2010

1 commit

9a3cbe326 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag
ext4: Fix quota accounting error with fallocate
ext4: Handle -EDQUOT error on write

Linus Torvalds
2010-01-26 11:05:06 +0800

25 Jan, 2010

1 commit

cb289d624 eventfd - allow atomic read and waitqueue remove

KVM needs a way to atomically remove itself from the eventfd ->poll()
wait queue head, in order to correctly handle its IRQfd deassign
operation.

This patch introduces such an API, plus a way to read an eventfd from its
context.

Signed-off-by: Davide Libenzi
Signed-off-by: Avi Kivity

Davide Libenzi
2010-01-25 22:26:38 +0800

21 Jan, 2010

5 commits

bdeef61cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
tty: fix race in tty_fasync
serial: serial_cs: oxsemi quirk breaks resume
serial: imx: bit &/| confusion
serial: Fix crash if the minimum rate of the device is > 9600 baud
serial-core: resume serial hardware with no_console_suspend
serial: 8250_pnp: use wildcard for serial Wacom tablets
nozomi: quick fix for the close/close bug
compat_ioctl: Supress "unknown cmd" message on serial /dev/console

Linus Torvalds
2010-01-21 23:37:20 +0800
456eac947 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
fs/bio.c: fix shadows sparse warning
drbd: The kernel code is now equivalent to out of tree release 8.3.7
drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
drbd: Don't go into StandAlone mode when authentification failes because of network error
drivers/block/drbd/drbd_receiver.c: correct NULL test
cfq-iosched: Respect ioprio_class when preempting
genhd: overlapping variable definition
block: removed unused as_io_context
DM: Fix device mapper topology stacking
block: bdev_stack_limits wrapper
block: Fix discard alignment calculation and printing
block: Correct handling of bottom device misaligment
drbd: check on CONFIG_LBDAF, not LBD
drivers/block/drbd: Correct NULL test
drbd: Silenced an assert that could triggered after changing write ordering method
drbd: Kconfig fix
drbd: Fix for a race between IO and a detach operation [Bugz 262]
drbd: Use drbd_crypto_is_hash() instead of an open coded check

Linus Torvalds
2010-01-21 23:32:11 +0800
15e551e52 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
ecryptfs: use after free
ecryptfs: Eliminate useless code
ecryptfs: fix interpose/interpolate typos in comments
ecryptfs: pass matching flags to interpose as defined and used there
ecryptfs: remove unnecessary d_drop calls in ecryptfs_link
ecryptfs: don't ignore return value from lock_rename
ecryptfs: initialize private persistent file before dereferencing pointer
eCryptfs: Remove mmap from directory operations
eCryptfs: Add getattr function
eCryptfs: Use notify_change for truncating lower inodes

Linus Torvalds
2010-01-21 23:28:54 +0800
30a0f5e1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix possible panic on unmount
Btrfs: deal with NULL acl sent to btrfs_set_acl
Btrfs: fix regression in orphan cleanup
Btrfs: Fix race in btrfs_mark_extent_written
Btrfs, fix memory leaks in error paths
Btrfs: align offsets for btrfs_ordered_update_i_size
btrfs: fix missing last-entry in readdir(3)

Linus Torvalds
2010-01-21 23:28:05 +0800
3f0017112 compat_ioctl: Supress "unknown cmd" message on serial /dev/console

After the commit fb07a5f8 ("compat_ioctl: remove all VT ioctl
handling"), I got this error message on 64-bit mips kernel with 32-bit
busybox userland:

ioctl32(init:1): Unknown cmd fd(0) cmd(00005600){t:'V';sz:0} arg(7fd76480) on /dev/console

The cmd 5600 is VT_OPENQRY. Busybox's init issues this ioctl to find out
whether the console is a vt-console or a serial console. If the console is
a serial console, VT ioctls are not handled by the serial driver.

A quick search also turned up some programs that use VT_GETMODE to check
whether a vt-console is available.
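
The userspace probe described above might look roughly like the following. This is a minimal sketch for Linux only: `fd_is_vt_console` is a hypothetical helper name, and it assumes that any failure of the VT_GETMODE ioctl means "not a virtual terminal".

```c
#include <linux/vt.h>
#include <sys/ioctl.h>

/* Returns 1 if fd answers VT_GETMODE (it looks like a virtual terminal),
 * 0 if the ioctl fails, e.g. on a serial console or an invalid fd. */
static int fd_is_vt_console(int fd)
{
    struct vt_mode mode;

    return ioctl(fd, VT_GETMODE, &mode) == 0;
}
```

A program would typically call this on its controlling console descriptor and fall back to serial-console behaviour when it returns 0.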

Signed-off-by: Atsushi Nemoto
Cc: Arnd Bergmann
Signed-off-by: Greg Kroah-Hartman

Atsushi Nemoto
2010-01-21 07:03:26 +0800

20 Jan, 2010

10 commits

ece550f51 ecryptfs: use after free

The "full_alg_name" variable is used on a couple error paths, so we
shouldn't free it until the end.

Signed-off-by: Dan Carpenter
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks

Dan Carpenter
2010-01-20 12:36:06 +0800
4aa25bcb7 ecryptfs: Eliminate useless code

The variable lower_dentry is initialized twice to the same (side effect-free)
expression. Drop one initialization.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

//
@forall@
idexpression *x;
identifier f!=ERR_PTR;
@@

x = f(...)
... when != x
(
x = f(...,,...)
|
* x = f(...)
)
//

Signed-off-by: Julia Lawall
Signed-off-by: Tyler Hicks

Julia Lawall
2010-01-20 12:36:05 +0800
fe0fc013c ecryptfs: fix interpose/interpolate typos in comments

Signed-off-by: Erez Zadok
Acked-by: Dustin Kirkland
Signed-off-by: Tyler Hicks

Erez Zadok
2010-01-20 12:36:03 +0800
3469b5732 ecryptfs: pass matching flags to interpose as defined and used there

ecryptfs_interpose checks if one of the flags passed is
ECRYPTFS_INTERPOSE_FLAG_D_ADD, defined as 0x00000001 in ecryptfs_kernel.h.
But the only user of ecryptfs_interpose to pass a non-zero flag to it, has
hard-coded the value as "1". This could spell trouble if any of these values
changes in the future.
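
As a small illustration of why the named constant is preferable to a literal "1", consider the sketch below. The flag value is the one quoted above; `interpose_wants_d_add` is a hypothetical stand-in for the check inside ecryptfs_interpose, not the kernel function itself.

```c
#define ECRYPTFS_INTERPOSE_FLAG_D_ADD 0x00000001

/* Hypothetical stand-in for the flag test done by the callee. */
static int interpose_wants_d_add(unsigned int flags)
{
    return (flags & ECRYPTFS_INTERPOSE_FLAG_D_ADD) != 0;
}
```

A caller that passes `ECRYPTFS_INTERPOSE_FLAG_D_ADD` keeps working if the definition ever changes, whereas a caller that passes the literal `1` silently stops matching.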

Signed-off-by: Erez Zadok
Cc: Dustin Kirkland
Cc: Al Viro
Signed-off-by: Tyler Hicks

Erez Zadok
2010-01-20 12:36:02 +0800
c44a66d67 ecryptfs: remove unnecessary d_drop calls in ecryptfs_link

Unnecessary because it would unhash perfectly valid dentries, causing them
to have to be re-looked up the next time they're needed, which presumably is
right after.

Signed-off-by: Aseem Rastogi
Signed-off-by: Shrikar archak
Signed-off-by: Erez Zadok
Cc: Saumitra Bhanage
Cc: Al Viro
Signed-off-by: Tyler Hicks

Erez Zadok
2010-01-20 12:36:00 +0800
0d132f736 ecryptfs: don't ignore return value from lock_rename

Signed-off-by: Erez Zadok
Cc: Dustin Kirkland
Cc: Andrew Morton
Cc: Al Viro
Signed-off-by: Tyler Hicks

Erez Zadok
2010-01-20 12:35:59 +0800
e27759d7a ecryptfs: initialize private persistent file before dereferencing pointer

Ecryptfs_open dereferences a pointer to the private lower file (the one
stored in the ecryptfs inode), without checking if the pointer is NULL.
Right afterward, it initializes that pointer if it is NULL. Swap order of
statements to first initialize. Bug discovered by Duckjin Kang.

Signed-off-by: Duckjin Kang
Signed-off-by: Erez Zadok
Cc: Dustin Kirkland
Cc: Al Viro
Cc:
Signed-off-by: Tyler Hicks

Erez Zadok
2010-01-20 12:32:54 +0800
38e3eaeed eCryptfs: Remove mmap from directory operations

Adrian reported that mkfontscale didn't work inside of eCryptfs mounts.
Strace revealed the following:

open("./", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
open("./fonts.scale", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
getdents(3, /* 80 entries */, 32768) = 2304
open("./.", O_RDONLY) = 5
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
fstat64(5, {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0
mmap2(NULL, 16384, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7fcf000
close(5) = 0
--- SIGBUS (Bus error) @ 0 (0) ---
+++ killed by SIGBUS +++

The mmap2() on a directory was successful, resulting in a SIGBUS
signal later. This patch removes mmap() from the list of possible
ecryptfs_dir_fops so that mmap() isn't possible on eCryptfs directory
files.

https://bugs.launchpad.net/ecryptfs/+bug/400443

Reported-by: Adrian C.
Signed-off-by: Tyler Hicks

Tyler Hicks
2010-01-20 12:32:11 +0800
f8f484d1b eCryptfs: Add getattr function

The i_blocks field of an eCryptfs inode cannot be trusted, but
generic_fillattr() uses it to instantiate the blocks field of a stat()
syscall when a filesystem doesn't implement its own getattr(). Users
have noticed that the output of du is incorrect on newly created files.

This patch creates ecryptfs_getattr() which calls into the lower
filesystem's getattr() so that eCryptfs can use its kstat.blocks value
after calling generic_fillattr(). It is important to note that the
block count includes the eCryptfs metadata stored in the beginning of
the lower file plus any padding used to fill an extent before
encryption.
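
For context, tools such as du derive disk usage from the blocks field of stat() rather than from the file size. The sketch below assumes the Linux convention that st_blocks counts 512-byte units; `allocated_bytes` is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Disk usage in bytes as reported through st_blocks
 * (assumed to be in 512-byte units, as on Linux). */
static long long allocated_bytes(const struct stat *st)
{
    return (long long)st->st_blocks * 512;
}
```

If the filesystem reports a stale or wrong block count, this value is wrong even when st_size is correct, which matches the du symptom described above.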

https://bugs.launchpad.net/ecryptfs/+bug/390833

Reported-by: Dominic Sacré
Signed-off-by: Tyler Hicks

Tyler Hicks
2010-01-20 12:32:09 +0800
5f3ef64f4 eCryptfs: Use notify_change for truncating lower inodes

When truncating inodes in the lower filesystem, eCryptfs directly
invoked vmtruncate(). As Christoph Hellwig pointed out, vmtruncate() is
a filesystem helper function, but filesystems may need to do more than
just a call to vmtruncate().

This patch moves the lower inode truncation out of ecryptfs_truncate()
and renames the function to truncate_upper(). truncate_upper() updates
an iattr for the lower inode to indicate if the lower inode needs to be
truncated upon return. ecryptfs_setattr() then calls notify_change(),
using the updated iattr for the lower inode, to complete the truncation.

For eCryptfs functions needing to truncate, ecryptfs_truncate() is
reintroduced as a simple way to truncate the upper inode to a specified
size and then truncate the lower inode accordingly.

https://bugs.launchpad.net/bugs/451368

Reported-by: Christoph Hellwig
Acked-by: Dustin Kirkland
Cc: ecryptfs-devel@lists.launchpad.net
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Tyler Hicks

Tyler Hicks
2010-01-20 12:32:07 +0800

19 Jan, 2010

2 commits

f06f135d8 fs/bio.c: fix shadows sparse warning

fs/bio.c:81:33: warning: symbol 'bslab' shadows an earlier one
fs/bio.c:74:25: originally declared here

Signed-off-by: Thiago Farina
Signed-off-by: Jens Axboe

Thiago Farina
2010-01-19 21:07:09 +0800
1e868d8e6 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: xfs_swap_extents needs to handle dynamic fork offsets
xfs: fix missing error check in xfs_rtfree_range
xfs: fix stale inode flush avoidance
xfs: Remove inode iolock held check during allocation
xfs: reclaim all inodes by background tree walks
xfs: Avoid inodes in reclaim when flushing from inode cache
xfs: reclaim inodes under a write lock

Linus Torvalds
2010-01-19 06:08:07 +0800

18 Jan, 2010

8 commits

11dfe35a0 Btrfs: fix possible panic on unmount

We can race with the unmount of an fs and the stopping of a kthread where we
will free the block group before we're done using it. The reason is that we
do not hold a reference on the block group while it's caching, since
the allocator drops its reference once it exits or moves on to the next block
group. This patch fixes the problem by taking a reference to the block group
before we start caching and dropping it when we're done to make sure all
accesses to the block group are safe. Thanks,
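
The get/put discipline described above can be sketched with a single-threaded userspace model. Everything here is hypothetical (the `block_group` struct, `bg_*` helpers and the `freed` flag are illustration only), and a real implementation would need atomic reference counts.

```c
#include <stdlib.h>

struct block_group {
    int refs;
    int *freed;   /* set to 1 when the object is released (demo only) */
};

static struct block_group *bg_alloc(int *freed_flag)
{
    struct block_group *bg = malloc(sizeof(*bg));
    if (!bg)
        return NULL;
    bg->refs = 1;             /* reference held by the allocator */
    bg->freed = freed_flag;
    return bg;
}

static void bg_get(struct block_group *bg) { bg->refs++; }

static void bg_put(struct block_group *bg)
{
    if (--bg->refs == 0) {
        *bg->freed = 1;
        free(bg);
    }
}

/* Returns 1 if the block group stayed alive while "caching" was in
 * progress and was released once caching dropped its reference. */
static int caching_survives_allocator_put(void)
{
    int freed = 0;
    struct block_group *bg = bg_alloc(&freed);
    if (!bg)
        return -1;

    bg_get(bg);                        /* caching pins the block group */
    bg_put(bg);                        /* allocator moves on */
    int alive_during_caching = !freed;
    bg_put(bg);                        /* caching done: last reference */
    return alive_during_caching && freed;
}
```

Without the `bg_get()` call, the allocator's put would free the object while caching still uses it, which is the use-after-free the patch closes.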

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-01-18 09:40:30 +0800
a9cc71a60 Btrfs: deal with NULL acl sent to btrfs_set_acl

It is legal for btrfs_set_acl to be sent a NULL acl. This
makes sure we don't dereference it. A similar patch was sent by
Johannes Hirte

Signed-off-by: Chris Mason

Chris Mason
2010-01-18 09:40:22 +0800
6c090a11e Btrfs: fix regression in orphan cleanup

Currently orphan cleanup only ever gets triggered if we cross subvolumes during
a lookup, which means that if we just mount a plain jane fs that has orphans in
it, they will never get cleaned up. This results in panics like this one

http://www.kerneloops.org/oops.php?number=1109085

where adding an orphan entry results in -EEXIST being returned and we panic. In
order to fix this, we check to see on lookup if our root has had the orphan
cleanup done, and if not go ahead and do it. This is easily reproducible by
running this testcase

/* Headers needed by the calls below. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char data[4096];
	char newdata[4096];
	int fd1, fd2;

	memset(data, 'a', 4096);
	memset(newdata, 'b', 4096);

	while (1) {
		int i;

		/* Write 2MB to file1 and force it to disk. */
		fd1 = creat("file1", 0666);
		if (fd1 < 0)
			break;

		for (i = 0; i < 512; i++)
			write(fd1, data, 4096);

		fsync(fd1);
		close(fd1);

		/* Write 2MB to file2 without syncing. */
		fd2 = creat("file2", 0666);
		if (fd2 < 0)
			break;

		ftruncate(fd2, 4096 * 512);

		for (i = 0; i < 512; i++)
			write(fd2, newdata, 4096);
		close(fd2);

		/* Replace file1 with file2, then unlink it. */
		i = rename("file2", "file1");
		unlink("file1");
	}

	return 0;
}

and then pulling the power on the box, and then trying to run that test again
when the box comes back up. I've tested this locally and it fixes the problem.
Thanks to Tomas Carnecky for helping me track this down initially.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2010-01-18 09:40:21 +0800
6c7d54ac8 Btrfs: Fix race in btrfs_mark_extent_written

Fix bug reported by Johannes Hirte. The bug occurs because
btrfs_del_items is called after btrfs_duplicate_item and
btrfs_del_items triggers a tree balance. The fix is to check for that
case and call btrfs_search_slot when needed.

Signed-off-by: Yan Zheng
Signed-off-by: Chris Mason

Yan, Zheng
2010-01-18 09:40:21 +0800
2423fdfb9 Btrfs, fix memory leaks in error paths

Stanse found 2 memory leaks in relocate_block_group and
__btrfs_map_block. cluster and multi are not freed/assigned on all
paths. Fix that.

Signed-off-by: Jiri Slaby
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason

Jiri Slaby
2010-01-18 09:40:20 +0800
a038fab0c Btrfs: align offsets for btrfs_ordered_update_i_size

Some callers of btrfs_ordered_update_i_size can now pass in
a NULL for the ordered extent to update against. This makes
sure we properly align the offset they pass in when deciding
how much to bump the on disk i_size.

Signed-off-by: Chris Mason

Yan, Zheng
2010-01-18 09:06:27 +0800
406266ab9 btrfs: fix missing last-entry in readdir(3)

parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
Author: Jan Engelhardt
Date: Wed Dec 9 22:57:36 2009 +0100

Btrfs: fix missing last-entry in readdir(3)

When one does a 32-bit readdir(3), the last entry of a directory is
missing. This is however not due to passing a large value to filldir,
but it seems to have to do with glibc doing telldir or something
quirky. In any case, this patch fixes it in practice.

Signed-off-by: Jan Engelhardt
Signed-off-by: Chris Mason

Jan Engelhardt
2010-01-18 09:06:27 +0800
7dc9c484a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
do_add_mount() should sanitize mnt_flags
CIFS shouldn't make mountpoints shrinkable
mnt_flags fixes in do_remount()
attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared()
may_umount() needs namespace_sem
Fix configfs leak
Fix the -ESTALE handling in do_filp_open()
ecryptfs: Fix refcnt leak on ecryptfs_follow_link() error path
Fix ACC_MODE() for real
Unrot uml mconsole a bit
hppfs: handle ->put_link()
Kill 9p readlink()
fix autofs/afs/etc. magic mountpoint breakage

Linus Torvalds
2010-01-18 03:01:16 +0800

17 Jan, 2010

7 commits

7e6608724 nommu: fix shared mmap after truncate shrinkage problems

Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
over the end of a truncation. The problem is that
ramfs_nommu_check_mappings() checks the reduced file size against the
VMA tree, but not the vm_region tree.

The following sequence of events can cause the problem:

fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
ftruncate(fd, 32 * 1024);
a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
munmap(a, 32 * 1024);
ftruncate(fd, 16 * 1024);
c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b'
sees that the vm_region from 'a' is covering the region it wants and so
shares it, pinning it in memory.

Mapping 'a' then goes away and the file is truncated to the end of VMA
'b'. However, the region allocated by 'a' is still in effect, and has
_not_ been reduced.

Mapping 'c' is then created, and because there's a vm_region covering the
desired region, get_unmapped_area() is _not_ called to repeat the check,
and the mapping is granted, even though the pages from the latter half of
the mapping have been discarded.

However:

d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'd' should work, and should end up sharing the region allocated by
'a'.

To deal with this, we shrink the vm_region struct during the truncation,
lest do_mmap_pgoff() take it as licence to share the full region
automatically without calling the get_unmapped_area() file op again.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2010-01-17 04:15:40 +0800
81759b5b2 nommu: fix race between ramfs truncation and shared mmap

Fix the race between the truncation of a ramfs file and an attempt to make
a shared mmap of a region of that file.

The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to
verify that the file region is made of contiguous pages and to find its
base address - but there isn't any locking to guarantee this region until
vma_prio_tree_insert() is called by add_vma_to_mm().

Note that moving the functionality into f_op->mmap() doesn't help as that
is also called before vma_prio_tree_insert().

Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it
does its checks. This means that this function will wait whilst mmaps
take place.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2010-01-17 04:15:40 +0800
27d55f1f4 do_add_mount() should sanitize mnt_flags

MNT_WRITE_HOLD shouldn't leak into new vfsmount and neither
should MNT_SHARED (the latter will be set properly, along with
the rest of shared-subtree data structures)
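
Sanitizing here amounts to masking out the bits that must not be inherited. The sketch below uses illustrative bit values under `DEMO_` names (the kernel's real definitions live in include/linux/mount.h), and `sanitize_mnt_flags` is a hypothetical helper.

```c
/* Illustrative bit values only, not the kernel's definitions. */
#define DEMO_MNT_NOSUID     0x01
#define DEMO_MNT_WRITE_HOLD 0x200
#define DEMO_MNT_SHARED     0x1000

/* Drop the flags that a freshly added mount must not inherit. */
static int sanitize_mnt_flags(int mnt_flags)
{
    return mnt_flags & ~(DEMO_MNT_SHARED | DEMO_MNT_WRITE_HOLD);
}
```

Ordinary flags such as the no-suid bit pass through unchanged; only the two internal bits are cleared.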

Signed-off-by: Al Viro

Al Viro
2010-01-17 02:07:36 +0800
7e1295d9f CIFS shouldn't make mountpoints shrinkable

Signed-off-by: Al Viro

Al Viro
2010-01-17 02:06:32 +0800
7b43a79f3 mnt_flags fixes in do_remount()

* need vfsmount_lock over modifying it
* need to preserve MNT_SHARED/MNT_UNBINDABLE

Signed-off-by: Al Viro

Al Viro
2010-01-17 02:01:26 +0800
df1a1ad29 attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared()

race in mnt_flags update

Signed-off-by: Al Viro

Al Viro
2010-01-17 01:57:40 +0800
8ad08d8a0 may_umount() needs namespace_sem

otherwise it races with clone_mnt() changing mnt_share/mnt_slaves

Signed-off-by: Al Viro

Al Viro
2010-01-17 01:56:08 +0800

16 Jan, 2010

5 commits

976ae32be inotify: only warn once for inotify problems

inotify will WARN() if it finds that the idr and the fsnotify internals
somehow got out of sync. It was only supposed to do this once but due
to this stupid bug it would warn every single time a problem was
detected.
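
A warn-once guard is easy to get wrong if the "already warned" state is never set. The following userspace sketch shows the intended pattern; `warn_once` and `warnings_printed` are hypothetical names, and this is not the kernel's WARN machinery.

```c
#include <stdio.h>

/* Counts how many warnings were actually emitted (for demonstration). */
static int warnings_printed;

/* Emit the message only the first time this function is called. */
static void warn_once(const char *msg)
{
    static int warned;

    if (warned)
        return;
    warned = 1;               /* forgetting this line warns every time */
    fprintf(stderr, "WARNING: %s\n", msg);
    warnings_printed++;
}
```

Calling `warn_once()` repeatedly prints a single line; later calls return early.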

Signed-off-by: Eric Paris
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Eric Paris
2010-01-16 06:49:23 +0800
9e572cc98 inotify: do not reuse watch descriptors

Since commit 7e790dd5fc937bc8d2400c30a05e32a9e9eef276 ("inotify: fix
error paths in inotify_update_watch") inotify changed the manner in which
it gave watch descriptors back to userspace. Before this commit
inotify acted like the following:

inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 2

but after this patch inotify would return watch descriptors like so:

inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 1

which I saw as equivalent to opening an fd where

open(file) = 1;
close(1);
open(file) = 1;

seemed perfectly reasonable. The issue is that quite a bit of userspace
apparently relies on the behavior in which watch descriptors will not be
quickly reused. KDE relies on it, I know some selinux packages rely on
it, and I have heard complaints from other random sources such as debian
bug 558981.

Although the man page implies what we do is ok, we broke userspace so
this patch almost reverts us to the old behavior. It is still slightly
racy and I have patches that would fix that, but they are rather large
and this will fix it for all real world cases. The race is as follows:

- task1 creates a watch and blocks in idr_new_watch() before it updates
the hint.
- task2 creates a watch and updates the hint.
- task1 updates the hint with its older wd
- task removes the watch created by task2
- task adds a new watch and will reuse the wd originally given to task2

it requires moving some locking around the hint (last_wd) but this should
solve it for the real world and be -stable safe.
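
The hint-based behaviour described above can be modelled with a tiny allocator that always searches above the last descriptor it handed out. This is a single-threaded sketch with hypothetical names (`wd_table`, `wd_alloc`, `wd_free`, `MAX_WD`); it ignores wraparound and the locking the message mentions.

```c
#include <stdbool.h>

#define MAX_WD 64

struct wd_table {
    bool used[MAX_WD + 1];  /* index 0 unused; descriptors start at 1 */
    int last_wd;            /* hint: last descriptor handed out */
};

/* Allocate the lowest free descriptor above the hint, or -1 if none. */
static int wd_alloc(struct wd_table *t)
{
    for (int wd = t->last_wd + 1; wd <= MAX_WD; wd++) {
        if (!t->used[wd]) {
            t->used[wd] = true;
            t->last_wd = wd;
            return wd;
        }
    }
    return -1;
}

/* Release a descriptor; the hint is left alone so it is not reused at once. */
static void wd_free(struct wd_table *t, int wd)
{
    if (wd >= 1 && wd <= MAX_WD)
        t->used[wd] = false;
}
```

With a zero-initialised table, allocating, freeing and allocating again yields 1 and then 2, matching the old behaviour shown in the first example.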

As a side effect this patch papers over a bug in the lib/idr code which
is causing a large number of WARNs to pop up on people's systems and many
reports in kerneloops.org. I'm working on the root cause of that idr
bug separately but this should make inotify immune to that issue.

Signed-off-by: Eric Paris
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Eric Paris
2010-01-16 06:49:23 +0800
e09f98606 xfs: xfs_swap_extents needs to handle dynamic fork offsets

When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats. This is caused by the two inodes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem. xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.

Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into. This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.

To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.

A further complication is that, even when the swap is allowed to
proceed, if the fork offset differs between the two inodes then the
value for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-01-16 03:49:07 +0800
3daeb42c1 xfs: fix missing error check in xfs_rtfree_range

When xfs_rtfind_forw() returns an error, the block is returned
uninitialised. xfs_rtfree_range() is not checking the error return,
so could be using an uninitialised block number for modifying bitmap
summary info.
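
The pattern being fixed can be shown in miniature. In this sketch `find_forward` and `free_range` are hypothetical stand-ins, not the XFS functions: the lookup leaves its output untouched on failure, so the caller must check the return value before using it.

```c
/* Hypothetical lookup: on failure *out is left untouched. */
static int find_forward(int start, int limit, int *out)
{
    if (start > limit)
        return -1;
    *out = limit;
    return 0;
}

/* Caller that checks the error before touching the looked-up block. */
static int free_range(int start, int limit, int *summary)
{
    int block;
    int error = find_forward(start, limit, &block);

    if (error)
        return error;         /* without this, block is used uninitialised */
    *summary += block - start + 1;
    return 0;
}
```

When the lookup fails, `free_range()` returns the error and leaves the summary unmodified.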

The problem was found by gcc when compiling the *userspace* libxfs
code - it is a copy of the kernel code with the exact same bug.
gcc gives an uninitialised variable warning on the userspace code
but not on the kernel code. You gotta love the consistency (Mmmm,
slightly chewy today!).

Signed-off-by: Dave Chinner
Signed-off-by: Alex Elder

Dave Chinner
2010-01-16 03:46:19 +0800
4b6a46882 xfs: fix stale inode flush avoidance

When reclaiming stale inodes, we need to guarantee that inodes are
unpinned before returning with a "clean" status. If we don't we can
reclaim inodes that are pinned, leading to use after free in the
transaction subsystem as transactions complete.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-01-16 03:46:02 +0800