Eric Lee / smarc-fsl-linux-kernel

21 Jun, 2011

2 commits

eda084109 Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix break_lease flags on nfsd open
nfsd: link returns nfserr_delay when breaking lease
nfsd: v4 support requires CRYPTO
nfsd: fix dependency of nfsd on auth_rpcgss

Linus Torvalds
2011-06-21 11:10:52 +0800
366982065 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
fix comment in generic_permission()
kill obsolete comment for follow_down()
proc_sys_permission() is OK in RCU mode
reiserfs_permission() doesn't need to bail out in RCU mode
proc_fd_permission() is doesn't need to bail out in RCU mode
nilfs2_permission() doesn't need to bail out in RCU mode
logfs doesn't need ->permission() at all
coda_ioctl_permission() is safe in RCU mode
cifs_permission() doesn't need to bail out in RCU mode
bad_inode_permission() is safe from RCU mode
ubifs: dereferencing an ERR_PTR in ubifs_mount()

Linus Torvalds
2011-06-21 11:09:15 +0800

20 Jun, 2011

14 commits

90a800de0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: avoid delayed metadata items during commits
btrfs: fix uninitialized return value
btrfs: fix wrong reservation when doing delayed inode operations
btrfs: Remove unused sysfs code
btrfs: fix dereference of ERR_PTR value
Btrfs: fix relocation races
Btrfs: set no_trans_join after trying to expand the transaction
Btrfs: protect the pending_snapshots list with trans_lock
Btrfs: fix path leakage on subvol deletion
Btrfs: drop the delalloc_bytes check in shrink_delalloc
Btrfs: check the return value from set_anon_super

Linus Torvalds
2011-06-20 23:58:53 +0800
8e833fd2e fix comment in generic_permission()

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:56 +0800
6291176bc kill obsolete comment for follow_down()

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:49 +0800
1aec7036d proc_sys_permission() is OK in RCU mode

nothing blocking there, since all instances of sysctl
->permissions() method are non-blocking - both of them,
that is.

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:25 +0800
1d29b5a2e reiserfs_permission() doesn't need to bail out in RCU mode

nothing blocking other than generic_permission() (and
check_acl callback does bail out in RCU mode).

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:45:21 +0800
cf1279111 proc_fd_permission() is doesn't need to bail out in RCU mode

nothing blocking except generic_permission()

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:50 +0800
730e908f3 nilfs2_permission() doesn't need to bail out in RCU mode

Nothing blocking except for generic_permission(). Which will DTRT.

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:33 +0800
a63ab94d6 logfs doesn't need ->permission() at all

... and never did, what with its ->permission() being what we do by default
when ->permission is NULL...

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:26 +0800
6b419951f coda_ioctl_permission() is safe in RCU mode

return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...
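The one-liner quoted above can be tried in user space. This is a minimal sketch, not the coda source: the `SK_MAY_*` constants mirror the kernel's MAY_EXEC/MAY_WRITE/MAY_READ bit values, and `ioctl_permission_sketch()` is a hypothetical name. The point it illustrates is that the check is pure arithmetic on the mask, so nothing in it can block.

```c
#include <errno.h>

/* Local copies of the kernel's permission mask bits (illustration only). */
#define SK_MAY_EXEC  0x1
#define SK_MAY_WRITE 0x2
#define SK_MAY_READ  0x4

/*
 * Hypothetical stand-in for the permission hook: deny execute requests,
 * allow everything else. No locks, no I/O, no allocation.
 */
static int ioctl_permission_sketch(int mask)
{
	return (mask & SK_MAY_EXEC) ? -EACCES : 0;
}
```

Calling it with `SK_MAY_READ | SK_MAY_WRITE` yields 0, while any mask containing `SK_MAY_EXEC` yields -EACCES.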

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:19 +0800
ec12781f1 cifs_permission() doesn't need to bail out in RCU mode

nothing potentially blocking except generic_permission(), which
will DTRT

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:07 +0800
1712c20da bad_inode_permission() is safe from RCU mode

return -EIO; is *not* a blocking operation, thank you very much.
Nick, what the hell have you been smoking?

Signed-off-by: Al Viro

Al Viro
2011-06-20 22:44:00 +0800
185bf8739 ubifs: dereferencing an ERR_PTR in ubifs_mount()

d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
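The bug class described above can be shown outside the kernel. The sketch below re-implements simplified ERR_PTR()/IS_ERR()/PTR_ERR() helpers in user space (the kernel's own versions live in include/linux/err.h); `struct super`, `get_super_sketch()` and `mount_sketch()` are hypothetical names used only for illustration and are not the ubifs code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical superblock type, for illustration only. */
struct super {
	int flags;
};

/* Hypothetical lookup: fails with -ENOMEM when 'fail' is non-zero. */
static struct super *get_super_sketch(struct super *storage, int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	storage->flags = 1;
	return storage;
}

/*
 * Correct shape of the error path: test IS_ERR() and leave before 'sb'
 * is dereferenced. Omitting the early exit is what a static checker
 * would flag.
 */
static long mount_sketch(struct super *storage, int fail)
{
	struct super *sb = get_super_sketch(storage, fail);

	if (IS_ERR(sb))
		return PTR_ERR(sb);	/* the exit that must not be left out */

	return sb->flags;		/* safe: sb is a real pointer here */
}
```

With the early return in place, a failing lookup propagates -ENOMEM and `sb->flags` is only read for a real pointer.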

Signed-off-by: Dan Carpenter
Signed-off-by: Al Viro

Dan Carpenter
2011-06-20 22:42:34 +0800
105f46221 nfsd4: fix break_lease flags on nfsd open

Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!

Reported-by: Casey Bodley
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2011-06-20 22:38:01 +0800
8816ead9d Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tools/perf: Fix static build of perf tool
tracing: Fix regression in printk_formats file

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Make watchdog robust vs. interruption
timerfd: Fix wakeup of processes when timer is cancelled on clock change

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, MAINTAINERS: Add x86 MCE people
x86, efi: Do not reserve boot services regions within reserved areas

Linus Torvalds
2011-06-20 00:00:18 +0800

18 Jun, 2011

10 commits

c11760c6d isofs: fix bh leak in isofs_fill_super() error case

In isofs_fill_super(), when an iso_primary_descriptor is found, it is
kept in pri_bh. The error cases don't properly release it. Fix it.

Reported-and-tested-by: 김원석
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-06-18 22:25:42 +0800
e999376f0 Btrfs: avoid delayed metadata items during commits

Snapshot creation has two phases. One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.

The delayed metadata insertion code can break that rule: it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.

This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.

Signed-off-by: Chris Mason

Chris Mason
2011-06-18 04:38:47 +0800
35a30d7ce btrfs: fix uninitialized return value

When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.
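The pattern behind this fix is general: every path that reaches a shared exit label must assign the return variable first. The sketch below is not the btrfs function; `struct root_sketch` and `read_root_sketch()` are hypothetical, and the allocation failure is simulated with a flag so the error path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object, for illustration only. */
struct root_sketch {
	int id;
};

/*
 * Returns 0 and stores a new object in *out, or a negative errno.
 * 'simulate_alloc_failure' forces the allocation-failure path.
 */
static int read_root_sketch(struct root_sketch **out,
			    int simulate_alloc_failure)
{
	struct root_sketch *root;
	int ret;	/* deliberately not initialized at declaration */

	root = simulate_alloc_failure ? NULL : malloc(sizeof(*root));
	if (!root) {
		ret = -ENOMEM;	/* without this line, 'ret' would be garbage */
		goto out;
	}

	root->id = 5;
	*out = root;
	ret = 0;
out:
	return ret;
}
```

Leaving `ret` uninitialized at its declaration lets the compiler warn about any path that forgets to set it, which is one reason not to paper over the problem with a blanket `int ret = 0;`.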

Signed-off-by: David Sterba
Reviewed-by: Li Zefan
Signed-off-by: Chris Mason

David Sterba
2011-06-18 02:54:18 +0800
19fd29495 btrfs: fix wrong reservation when doing delayed inode operations

We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when doing delayed inode operations, and the following Oops
happened:

[ 9792.654889] ------------[ cut here ]------------
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: G W 2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922] [] warn_slowpath_common+0x83/0x9b
[ 9792.654925] [] warn_slowpath_null+0x1a/0x1c
[ 9792.654933] [] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945] [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953] [] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963] [] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970] [] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978] [] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986] [] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997] [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022] [] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025] [] ? _cond_resched+0xe/0x22
[ 9792.655027] [] ? mutex_lock+0x29/0x50
[ 9792.655039] [] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051] [] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062] [] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064] [] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075] [] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077] [] evict+0x71/0x111
[ 9792.655079] [] iput+0x12a/0x132
[ 9792.655081] [] do_unlinkat+0x106/0x155
[ 9792.655083] [] ? path_put+0x1f/0x23
[ 9792.655085] [] ? audit_syscall_entry+0x145/0x171
[ 9792.655087] [] ? putname+0x34/0x36
[ 9792.655090] [] sys_unlinkat+0x29/0x2b
[ 9792.655092] [] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---

This patch fixes it by setting the reservation of the transaction handle to the
correct one.

Reported-by: Josef Bacik
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-06-18 02:54:18 +0800
9fe6a50fb btrfs: Remove unused sysfs code

Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.

Signed-off-by: Maarten Lankhorst
Signed-off-by: Chris Mason

Maarten Lankhorst
2011-06-18 02:54:18 +0800
3ed4498ca btrfs: fix dereference of ERR_PTR value

smatch reports:

btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing
possible ERR_PTR()

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-06-18 02:54:17 +0800
e038dca80 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

Conflicts:
fs/btrfs/transaction.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason
2011-06-18 02:16:13 +0800
01eff85b0 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: make log devices with write back caches work
xfs: fix ->mknod() return value on xfs_get_acl() failure

Linus Torvalds
2011-06-18 01:37:41 +0800
7585717f3 Btrfs: fix relocation races

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation. The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations. The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.

Signed-off-by: Chris Mason

Chris Mason
2011-06-18 01:36:58 +0800
879669961 KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring

____call_usermodehelper() now erases any credentials set by the
subprocess_info::init() function. The problem is that commit
17f60a7da150 ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function. This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init(). That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

Wiping the keyrings prevents request_key() from working, because it can no
longer pass the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key. As a result, the in-kernel DNS resolver fails
with ENOKEY unconditionally.
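The ordering problem in this commit message can be modelled in a few lines. This is a toy model under stated assumptions, not the kernel code: `struct cred_model`, `commit_creds_model()` and both helper functions are hypothetical, and a single integer stands in for the session keyring.

```c
/* Toy credentials: 0 means "no session keyring installed". */
struct cred_model {
	int session_keyring;
};

/* The credentials the helper process currently runs with. */
static struct cred_model current_cred_model;

/* Publishing new credentials replaces the current ones wholesale. */
static void commit_creds_model(const struct cred_model *fresh)
{
	current_cred_model = *fresh;
}

/*
 * Old order: init() modifies the current credentials, then freshly
 * prepared credentials are committed on top and wipe the keyring.
 */
static int helper_old_order(int keyring)
{
	struct cred_model fresh = { 0 };

	current_cred_model.session_keyring = keyring;	/* init() */
	commit_creds_model(&fresh);			/* keyring lost */
	return current_cred_model.session_keyring;
}

/*
 * New order: init() receives the not-yet-published credentials and
 * modifies those, so the keyring survives the commit.
 */
static int helper_new_order(int keyring)
{
	struct cred_model fresh = { 0 };

	fresh.session_keyring = keyring;		/* init(fresh) */
	commit_creds_model(&fresh);
	return current_cred_model.session_keyring;
}
```

In this model `helper_old_order(42)` ends with no keyring, while `helper_new_order(42)` keeps keyring 42 after the commit.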

Signed-off-by: David Howells
Acked-by: Eric Paris
Tested-by: Jeff Layton
Signed-off-by: Linus Torvalds

David Howells
2011-06-18 00:40:48 +0800

17 Jun, 2011

2 commits

8b97b21e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
proc: Fix Oops on stat of /proc/<zombie pid>/ns/net

Linus Torvalds
2011-06-17 06:02:20 +0800
8dac6bee3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
AFS: Use i_generation not i_version for the vnode uniquifier
AFS: Set s_id in the superblock to the volume name
vfs: Fix data corruption after failed write in __block_write_begin()
afs: afs_fill_page reads too much, or wrong data
VFS: Fix vfsmount overput on simultaneous automount
fix wrong iput on d_inode introduced by e6bc45d65d
Delay struct net freeing while there's a sysfs instance refering to it
afs: fix sget() races, close leak on umount
ubifs: fix sget races
ubifs: split allocation of ubifs_info into a separate function
fix leak in proc_set_super()

Linus Torvalds
2011-06-17 01:21:59 +0800

16 Jun, 2011

12 commits

a27a263ba xfs: make log devices with write back caches work

There's no reason not to support cache flushing on external log devices.
The only thing this really requires is flushing the data device first
both in fsync and log commits. A side effect is that we also have to
remove the barrier write test during mount, which has been superfluous
since the new FLUSH+FUA code anyway. Also use the chance to flush the
RT subvolume write cache before the fsync commit, which is required
for correct semantics.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-06-16 23:52:39 +0800
d6e43f751 AFS: Use i_generation not i_version for the vnode uniquifier

Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct. i_version can then be given the AFS data version
number.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2011-06-16 23:44:48 +0800
2e41ae225 AFS: Set s_id in the superblock to the volume name

Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2011-06-16 23:44:47 +0800
f9f07b6c1 vfs: Fix data corruption after failed write in __block_write_begin()

I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data. In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and no one had the right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2011-06-16 23:44:46 +0800
5e7f23373 afs: afs_fill_page reads too much, or wrong data

afs_fill_page should read the page that is about to be written but
the current implementation has a number of issues. If we aren't
extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
are extending the file we try to read the entire file.

Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
clamped to i_size.
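The intended read window can be written down as plain arithmetic. The sketch below is an illustration of "one page at the right offset, clamped to i_size", not the AFS patch itself: `SKETCH_PAGE_SIZE` is an assumed 4096-byte page size and `fill_window_sketch()` is a hypothetical helper.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/*
 * For a write landing at byte offset 'pos' in a file of 'i_size' bytes,
 * compute which bytes of the containing page need to be read first.
 * Stores the page-aligned start offset in *start and returns the
 * number of bytes to read (0 if the page lies entirely past EOF).
 */
static uint64_t fill_window_sketch(uint64_t pos, uint64_t i_size,
				   uint64_t *start)
{
	uint64_t page_start = pos & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	uint64_t remaining;

	*start = page_start;
	if (page_start >= i_size)
		return 0;	/* nothing on the server to read */

	remaining = i_size - page_start;
	return remaining < SKETCH_PAGE_SIZE ? remaining : SKETCH_PAGE_SIZE;
}
```

For a 10000-byte file, a write at offset 9000 gives a window starting at 8192 that is 1808 bytes long, rather than a full page at offset 0 or the whole file.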

While here, avoid calling afs_fill_page when we are doing a
PAGE_CACHE_SIZE write.

Signed-off-by: Anton Blanchard
Signed-off-by: David Howells
Signed-off-by: Al Viro

Anton Blanchard
2011-06-16 23:44:46 +0800
8aef18845 VFS: Fix vfsmount overput on simultaneous automount

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount(). However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed. follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt. That assumption is almost correct - it breaks in case of
racing automounts and in an even harder-to-hit race between following a mountpoint
and a couple of mount --move. The thing is, we don't need to make that
assumption at all - after the end of the loop in follow_managed() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv)
{
	int pid, ws;
	struct stat buf;
	pid = fork();
	stat(argv[1], &buf);
	if (pid > 0) wait(&ws);
	return 0;
}

and the following procedure:

(1) Mount an NFS volume that on the server has something else mounted on a
subdirectory. For instance, I can mount / from my server:

mount warthog:/ /mnt -t nfs4 -r

On the server /data has another filesystem mounted on it, so NFS will see
a change in FSID as it walks down the path, and will mark /mnt/data as
being a mountpoint. This will cause the automount code to be triggered.

!!! Do not look inside the mounted fs at this point !!!

(2) Run the above program on a file within the submount to generate two
simultaneous automount requests:

/tmp/forkstat /mnt/data/testfile

(3) Unmount the automounted submount:

umount /mnt/data

(4) Unmount the original mount:

umount /mnt

At this point the kernel should throw a BUG with something like the
following:

BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

[] shrink_dcache_for_umount+0x69/0x82
[] generic_shutdown_super+0x37/0x15b
[] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
[] kill_anon_super+0x1d/0x7e
[] nfs4_kill_super+0x60/0xb6 [nfs]
[] deactivate_locked_super+0x34/0x83
[] deactivate_super+0x6f/0x7b
[] mntput_no_expire+0x18d/0x199
[] mntput+0x3b/0x44
[] release_mounts+0xa2/0xbf
[] sys_umount+0x47a/0x4ba
[] ? trace_hardirqs_on_caller+0x1fd/0x22f
[] system_call_fastpath+0x16/0x1b

as do_umount() is inlined. However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton
Tested-by: Ian Kent
Signed-off-by: David Howells
Signed-off-by: Al Viro

Al Viro
2011-06-16 23:28:16 +0800
50338b889 fix wrong iput on d_inode introduced by e6bc45d65d

Git bisection shows that commit e6bc45d65df8599fdbae73be9cec4ceed274db53 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007] [] d_kill+0xf8/0x140
[ 2862.501007] [] dput+0xc9/0x190
[ 2862.501007] [] fput+0x15f/0x210
[ 2862.501007] [] filp_close+0x61/0x90
[ 2862.501007] [] sys_close+0xb1/0x110
[ 2862.501007] [] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Log in to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
struct inode *inode = NULL;
.....
- if (nd.last.name[nd.last.len])
- goto slashes;
inode = dentry->d_inode;
- if (inode)
- ihold(inode);
+ if (nd.last.name[nd.last.len] || !inode)
+ goto slashes;
+ ihold(inode);
...
if (inode)
iput(inode); /* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
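The reference imbalance can be reproduced with a toy refcount. This is a model of the control flow described above, not the fs/namei.c code: `struct inode_model` and the two flow functions are hypothetical, and each returns the net change in the reference count so the stray put is visible.

```c
#include <stddef.h>

/* Toy inode with only a reference count. */
struct inode_model {
	int refs;
};

static void ihold_model(struct inode_model *inode) { inode->refs++; }
static void iput_model(struct inode_model *inode)  { inode->refs--; }

/* Buggy flow: 'inode' is set before the trailing-slash check. */
static int unlink_flow_buggy(struct inode_model *d_inode, int trailing_slash)
{
	struct inode_model *inode;
	int before = d_inode ? d_inode->refs : 0;

	inode = d_inode;
	if (trailing_slash || !inode)
		goto out;		/* leaves without taking a reference */
	ihold_model(inode);
out:
	if (inode)
		iput_model(inode);	/* drops a reference never taken */
	return d_inode ? d_inode->refs - before : 0;
}

/* Fixed flow: 'inode' is only set when the slash check has passed. */
static int unlink_flow_fixed(struct inode_model *d_inode, int trailing_slash)
{
	struct inode_model *inode = NULL;
	int before = d_inode ? d_inode->refs : 0;

	if (trailing_slash)
		goto out;
	inode = d_inode;
	if (!inode)
		goto out;
	ihold_model(inode);
out:
	if (inode)
		iput_model(inode);	/* always paired with the hold above */
	return d_inode ? d_inode->refs - before : 0;
}
```

In the buggy flow a trailing slash with a non-NULL inode leaves the count one lower than it started; the fixed flow is balanced on every path.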

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining
Reported-by: Török Edwin
Cc: "Theodore Ts'o"
Cc: Al Viro
Signed-off-by: Török Edwin
Signed-off-by: Al Viro

Török Edwin
2011-06-16 23:27:39 +0800
13fca640b Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP"

This reverts commit 7f81c8890c15a10f5220bebae3b6dfae4961962a.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have

arch/um/sys-x86_64/shared/sysdep/vm-flags.h:

#define VM_STACK_DEFAULT_FLAGS \
(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.
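The distinction the revert turns on is whether the checked expression is a compile-time constant. The sketch below uses C11 `_Static_assert` as a stand-in for BUILD_BUG_ON; the flag values and all `SK_`/`sk_` names are hypothetical and chosen only for illustration. When the flags come from a variable, as in the UML definition quoted above, only a run-time check is possible.

```c
/* Hypothetical constant flag sets (values chosen for illustration). */
#define SK_STACK_FLAGS       0x0100UL
#define SK_INCOMPLETE_SETUP  0x18000UL

/*
 * Fine at build time: both operands are integer constant expressions,
 * so the compiler can evaluate the overlap check itself.
 */
_Static_assert((SK_STACK_FLAGS & SK_INCOMPLETE_SETUP) == 0,
	       "stack flags overlap the incomplete-setup marker");

/*
 * If the flags are selected at run time (modelled here by a variable),
 * the same expression is no longer constant and cannot be used in a
 * static assertion. The check has to run when the code executes.
 */
static unsigned long sk_stack_flags_runtime = 0x0100UL;

static int sk_flags_overlap_runtime(unsigned long incomplete_mask)
{
	return (sk_stack_flags_runtime & incomplete_mask) != 0;
}
```

Moving `sk_stack_flags_runtime` into the `_Static_assert` expression would be rejected by the compiler, which is the user-space analogue of the build problem reported on x86-64 UML.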

Reported-by: Randy Dunlap
Acked-by: Andrew Morton
Cc: Michal Hocko
Cc: Richard Weinberger
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-06-16 12:53:52 +0800
7f81c8890 fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP

Commit a8bef8ff6ea1 ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time
one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-06-16 11:03:59 +0800
793925334 proc: Fix Oops on stat of /proc/<zombie pid>/ns/net

Don't call iput with the inode half set up to be a namespace file descriptor.
Instead rearrange the code so that we don't initialize ei->ns_ops until
after ns_ops->get succeeds, preventing us from invoking ns_ops->put
when ns_ops->get failed.
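The rearrangement described above is an instance of "publish the ops pointer only after acquisition succeeds". The following is a small model of that ordering, not the fs/proc code: every `_model` name is hypothetical, and a counter records how often the put hook runs so the failure path can be checked.

```c
#include <stddef.h>

static int put_calls_model;	/* how many times the put hook has run */
static int namespace_model;	/* dummy object handed out by get */

struct ns_ops_model {
	void *(*get)(int succeed);
	void (*put)(void *ns);
};

static void *get_model(int succeed)
{
	return succeed ? &namespace_model : NULL;
}

static void put_model(void *ns)
{
	(void)ns;
	put_calls_model++;
}

static const struct ns_ops_model ops_model = { get_model, put_model };

struct proc_inode_model {
	const struct ns_ops_model *ns_ops;
	void *ns;
};

/* Teardown: releases the namespace only if ns_ops was published. */
static void evict_model(struct proc_inode_model *ei)
{
	if (ei->ns_ops)
		ei->ns_ops->put(ei->ns);
	ei->ns_ops = NULL;
	ei->ns = NULL;
}

/* Setup in the fixed order: get first, publish ns_ops on success only. */
static int setup_model(struct proc_inode_model *ei, int get_succeeds)
{
	void *ns = ops_model.get(get_succeeds);

	if (!ns) {
		evict_model(ei);	/* ns_ops still NULL: put is skipped */
		return -1;
	}
	ei->ns_ops = &ops_model;
	ei->ns = ns;
	return 0;
}
```

Had `ei->ns_ops` been assigned before the get call, the teardown on the failure path would have invoked put with a NULL namespace.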

Reported-by: Ingo Saitz
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2011-06-16 05:35:29 +0800
ed0ca1402 Btrfs: set no_trans_join after trying to expand the transaction

We can lock up if we try to allow new writers to join the transaction and we have
flushoncommit set or have a pending snapshot. This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set. So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13. It cannot be reproduced with this patch. Thanks,

Reported-by: Jim Schutt
Signed-off-by: Josef Bacik

Josef Bacik
2011-06-16 01:24:47 +0800
8351583e3 Btrfs: protect the pending_snapshots list with trans_lock

Currently there is nothing protecting the pending_snapshots list on the
transaction. We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption. So protect
this list with the trans_lock. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-06-16 01:24:46 +0800