Eric Lee / smarc-fsl-linux-kernel

15 Sep, 2014

4 commits

83373f702 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"double iput() on failure exit in lustre, racy removal of spliced
dentries from ->s_anon in __d_materialise_dentry() plus a bunch of
assorted RCU pathwalk fixes"

The RCU pathwalk fixes end up fixing a couple of cases where we
incorrectly dropped out of RCU walking, due to incorrect initialization
and testing of the sequence locks in some corner cases. Since dropping
out of RCU walk mode forces the slow locked accesses, those corner cases
slowed down quite dramatically.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
be careful with nd->inode in path_init() and follow_dotdot_rcu()
don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
fix bogus read_seqretry() checks introduced in b37199e
move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon)
[fix] lustre: d_make_root() does iput() on dentry allocation failure

Linus Torvalds
2014-09-15 08:37:36 +0800
9226b5b44 vfs: avoid non-forwarding large load after small store in path lookup ... Browse Code »

The performance regression that Josef Bacik reported in the pathname
lookup (see commit 99d263d4c5b2 "vfs: fix bad hashing of dentries") made
me look at performance stability of the dcache code, just to verify that
the problem was actually fixed. That turned up a few other problems in
this area.

There are a few cases where we exit RCU lookup mode and go to the slow
serializing case when we shouldn't, Al has fixed those and they'll come
in with the next VFS pull.

But my performance verification also shows that link_path_walk() turns
out to have a very unfortunate 32-bit store of the length and hash of
the name we look up, followed by a 64-bit read of the combined hash_len
field. That screws up the processor store to load forwarding, causing
an unnecessary hickup in this critical routine.

It's caused by the ugly calling convention for the "hash_name()"
function, and easily fixed by just making hash_name() fill in the whole
'struct qstr' rather than passing it a pointer to just the hash value.

With that, the profile for this function looks much smoother.

Signed-off-by: Linus Torvalds

Linus Torvalds
2014-09-15 08:28:32 +0800
4023bfc9f be careful with nd->inode in path_init() and follow_dotdot_rcu() ... Browse Code »

in the former we simply check if dentry is still valid after picking
its ->d_inode; in the latter we fetch ->d_inode in the same places
where we fetch dentry and its ->d_seq, under the same checks.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro

Al Viro
2014-09-15 02:24:47 +0800
7bd88377d don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() ... Browse Code »

return the value instead, and have path_init() do the assignment. Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination. This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers. And do the same to set_root(), to keep them
in sync.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro

Al Viro
2014-09-15 02:19:44 +0800

14 Sep, 2014

3 commits

f5be3e291 fix bogus read_seqretry() checks introduced in b37199e ... Browse Code »

read_seqretry() returns true on mismatch, not on match...

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Al Viro

Al Viro
2014-09-14 10:14:16 +0800
6f18493e5 move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) ... Browse Code »

and lock the right list there

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2014-09-14 10:14:03 +0800
99d263d4c vfs: fix bad hashing of dentries ... Browse Code »

Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

"The test case is essentially

for (i = 0; i < 1000000; i++)
mkdir("a$i");

On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
dir/sec with 3.10. This is because we spend waaaaay more time in
__d_lookup on 3.10 than in 3.2.

The new hashing function for strings is suboptimal for <
sizeof(unsigned long) string names (and hell even > sizeof(unsigned
long) string names that I've tested). I broke out the old hashing
function and the new one into a userspace helper to get real numbers
and this is what I'm getting:

Old hash table had 1000000 entries, 0 dupes, 0 max dupes
New hash table had 12628 entries, 987372 dupes, 900 max dupes
We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

My test does the hash, and then does the d_hash into a integer pointer
array the same size as the dentry hash table on my system, and then
just increments the value at the address we got to see how many
entries we overlap with.

As you can see the old hash function ended up with all 1 million
entries in their own bucket, whereas the new one they are only
distributed among ~12.5k buckets, which is why we're using so much
more CPU in __d_lookup".

The reason for this hash regression is two-fold:

- On 64-bit architectures the down-mixing of the original 64-bit
word-at-a-time hash into the final 32-bit hash value is very
simplistic and suboptimal, and just adds the two 32-bit parts
together.

In particular, because there is no bit shuffling and the mixing
boundary is also a byte boundary, similar character patterns in the
low and high word easily end up just canceling each other out.

- the old byte-at-a-time hash mixed each byte into the final hash as it
hashed the path component name, resulting in the low bits of the hash
generally being a good source of hash data. That is not true for the
word-at-a-time case, and the hash data is distributed among all the
bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible. We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Chris Mason
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2014-09-14 02:30:10 +0800

13 Sep, 2014

2 commits

602b53662 Merge tag 'nfs-for-3.17-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client fixes from Trond Myklebust:
"Highlights:
- fix a kernel warning when removing /proc/net/nfsfs
- revert commit 49a4bda22e18 due to Oopses
- fix a typo in the pNFS file layout commit code"

* tag 'nfs-for-3.17-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
pnfs: fix filelayout_retry_commit when idx > 0
nfs: revert "nfs4: queue free_lock_state job submission to nfsiod"
nfs: fix kernel warning when removing proc entry

Linus Torvalds
2014-09-13 02:54:54 +0800
7ed641be7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"Filipe is doing a careful pass through fsync problems, and these are
the fixes so far. I'll have one more for rc6 that we're still
testing.

My big commit is fixing up some inode hash races that Al Viro found
(thanks Al)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: use insert_inode_locked4 for inode creation
Btrfs: fix fsync data loss after a ranged fsync
Btrfs: kfree()ing ERR_PTRs
Btrfs: fix crash while doing a ranged fsync
Btrfs: fix corruption after write/fsync failure + fsync + log recovery
Btrfs: fix autodefrag with compression

Linus Torvalds
2014-09-13 02:53:30 +0800

11 Sep, 2014

6 commits

584f1adaf Merge branch 'akpm' (fixes from Andrew Morton) ... Browse Code »

Merge misc fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton :
fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
fsnotify/fdinfo: use named constants instead of hardcoded values
kcmp: fix standard comparison bug
mm/mmap.c: use pr_emerg when printing BUG related information
shm: add memfd.h to UAPI export list
checkpatch: allow commit descriptions on separate line from commit id
sh: get_user_pages_fast() must flush cache
eventpoll: fix uninitialized variable in epoll_ctl
kernel/printk/printk.c: fix faulty logic in the case of recursive printk
mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

Linus Torvalds
2014-09-11 06:42:18 +0800
7e8824816 fs/notify: don't show f_handle if exportfs_encode_inode_fh failed ... Browse Code »

Currently we handle only ENOSPC. In case of other errors the file_handle
variable isn't filled properly and we will show a part of stack.

Signed-off-by: Andrey Vagin
Acked-by: Cyrill Gorcunov
Cc: Alexander Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2014-09-11 06:42:12 +0800
1fc98d11c fsnotify/fdinfo: use named constants instead of hardcoded values ... Browse Code »

MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
bytes, so exportfs_encode_inode_fh can return an error.

Signed-off-by: Andrey Vagin
Acked-by: Cyrill Gorcunov
Cc: Alexander Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2014-09-11 06:42:12 +0800
c680e41b3 eventpoll: fix uninitialized variable in epoll_ctl ... Browse Code »

When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this unintialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
produces unexpected messages in the audit log, such as (on a system
running SELinux):

type=AVC msg=audit(1408212798.866:410): avc: denied
{ block_suspend } for pid=7754 comm="dbus-daemon" capability=36
scontext=unconfined_u:unconfined_r:unconfined_t
tcontext=unconfined_u:unconfined_r:unconfined_t
tclass=capability2 permissive=1

type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
exe="/usr/bin/dbus-daemon"
subj=unconfined_u:unconfined_r:unconfined_t key=(null)

("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: Nicolas Iooss
Cc: Alexander Viro
Cc: Arve Hjønnevåg
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Iooss
2014-09-11 06:42:12 +0800
7ec62d421 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull UDF fixes from Jan Kara:
"Fixes for UDF handling of NFS handles and one fix for proper handling
of corrupted media"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: saner calling conventions for udf_new_inode()
udf: fix the udf_iget() vs. udf_new_inode() races
udf: merge the pieces inserting a new non-directory object into directory
udf: Set i_generation field
udf: Properly detect stale inodes
udf: Make udf_read_inode() and udf_iget() return error
udf: Avoid infinite loop when processing indirect ICBs
udf: Fold udf_fill_inode() into __udf_read_inode()
udf: Avoid dir link count to go negative

Linus Torvalds
2014-09-11 05:04:17 +0800
224ecbf5a pnfs: fix filelayout_retry_commit when idx > 0 ... Browse Code »

filelayout_retry_commit was recently split out from alloc_ds_commits,
but was done in such a way that the bucket pointer always starts at
index 0 no matter what the @idx argument is set to.

The intention of the @idx argument is to retry commits starting at
bucket @idx. This is called when alloc_ds_commits fails for a bucket.

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2014-09-11 03:43:45 +0800

10 Sep, 2014

1 commit

e874a5fe3 Merge branch 'for-next-3.17' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs/smb3 fixes from Steve French:
"This includes various cifs and smb3 bug fixes including those for bugs
found with the recently updated xfstests.

Also I am working fixes for two additional cifs problems found by
xfstests which I plan to send later (when reviewed and run additional
tests)"

* 'for-next-3.17' of git://git.samba.org/sfrench/cifs-2.6:
Clarify Kconfig help text for CIFS and SMB2/SMB3
CIFS: Fix wrong filename length for SMB2
CIFS: Fix wrong restart readdir for SMB1
CIFS: Fix directory rename error
cifs: No need to send SIGKILL to demux_thread during umount
cifs: Allow directIO read/write during cache=strict
cifs: remove unneeded check of null checking in if condition
cifs: fix a possible use of uninit variable in SMB2_sess_setup
cifs: fix memory leak when password is supplied multiple times
cifs: fix a possible null pointer deref in decode_ascii_ssetup
Trivial whitespace fix

Linus Torvalds
2014-09-10 08:00:43 +0800

09 Sep, 2014

9 commits

0c0e0d3c0 nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" ... Browse Code »

This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d.

Christoph reported an oops due to the above commit:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[] [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287] [] process_one_work+0x1c7/0x490
[ 2187.044287] [] ? process_one_work+0x15d/0x490
[ 2187.044287] [] worker_thread+0x119/0x4f0
[ 2187.044287] [] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287] [] ? init_pwq+0x190/0x190
[ 2187.044287] [] kthread+0xdf/0x100
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] [] ret_from_fork+0x7c/0xb0
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.

Reported-by: Christoph Hellwig
Signed-off-by: Jeff Layton
Reviewed-by: Christoph Hellwig
Tested-by: Christoph Hellwig
Signed-off-by: Trond Myklebust

Jeff Layton
2014-09-09 08:00:32 +0800
21e81002f nfs: fix kernel warning when removing proc entry ... Browse Code »

I saw the following kernel warning:

[ 1852.321222] ------------[ cut here ]------------
[ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
[ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
[ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
[ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1852.354992] Workqueue: netns cleanup_net
[ 1852.358701] 0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
[ 1852.366474] ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
[ 1852.373507] ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
[ 1852.380224] Call Trace:
[ 1852.381976] [] dump_stack+0x4d/0x66
[ 1852.385495] [] warn_slowpath_common+0x7a/0x93
[ 1852.389869] [] ? remove_proc_entry+0x154/0x16b
[ 1852.393987] [] warn_slowpath_fmt+0x4c/0x4e
[ 1852.397999] [] remove_proc_entry+0x154/0x16b
[ 1852.402034] [] nfs_fs_proc_net_exit+0x53/0x56
[ 1852.406136] [] nfs_net_exit+0x12/0x1d
[ 1852.409774] [] ops_exit_list+0x44/0x55
[ 1852.413529] [] cleanup_net+0xee/0x182
[ 1852.417198] [] process_one_work+0x209/0x40d
[ 1852.502320] [] ? process_one_work+0x162/0x40d
[ 1852.587629] [] worker_thread+0x1f0/0x2c7
[ 1852.673291] [] ? process_scheduled_works+0x2f/0x2f
[ 1852.759470] [] kthread+0xc9/0xd1
[ 1852.843099] [] ? finish_task_switch+0x3a/0xce
[ 1852.926518] [] ? __kthread_parkme+0x61/0x61
[ 1853.008565] [] ret_from_fork+0x7c/0xb0
[ 1853.076477] [] ? __kthread_parkme+0x61/0x61
[ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().

Fixes: commit 65b38851a17 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
Cc: Eric W. Biederman
Cc: Trond Myklebust
Cc: Dan Aloni
Signed-off-by: Cong Wang
[Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
as suggested by Al Viro]
Cc: stable@vger.kernel.org # 3.4.x : 65b38851a17: NFS: Fix /proc/fs/nfsfs/servers
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: Trond Myklebust

Cong Wang
2014-09-09 07:41:36 +0800
8c68face5 Merge branch 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfix from Ted Ts'o.

[ Hmm. It's possible we should make kfree() aware of error pointers,
and use IS_ERR_OR_NULL rather than a NULL check. But in the meantime
this is obviously the right fix. - Linus ]

* 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid trying to kfree an ERR_PTR pointer

Linus Torvalds
2014-09-09 06:51:01 +0800
861b7102b Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd bugfixes from Bruce Fields:
"A couple minor nfsd bugfixes"

* 'for-3.17' of git://linux-nfs.org/~bfields/linux:
lockd: fix rpcbind crash on lockd startup failure
nfsd4: fix rd_dircount enforcement

Linus Torvalds
2014-09-09 06:18:06 +0800
b0d5d10f4 Btrfs: use insert_inode_locked4 for inode creation ... Browse Code »

Btrfs was inserting inodes into the hash table before we had fully
set the inode up on disk. This leaves us open to rare races that allow
two different inodes in memory for the same [root, inode] pair.

This patch fixes things by using insert_inode_locked4 to insert an I_NEW
inode and unlock_new_inode when we're ready for the rest of the kernel
to use the inode.

It also makes sure to init the operations pointers on the inode before
going into the error handling paths.

Signed-off-by: Chris Mason
Reported-by: Al Viro

Chris Mason
2014-09-09 04:56:45 +0800
49dae1bc1 Btrfs: fix fsync data loss after a ranged fsync ... Browse Code »

While we're doing a full fsync (when the inode has the flag
BTRFS_INODE_NEEDS_FULL_SYNC set) that is ranged too (covers only a
portion of the file), we might have ordered operations that are started
before or while we're logging the inode and that fall outside the fsync
range.

Therefore when a full ranged fsync finishes don't remove every extent
map from the list of modified extent maps - as for some of them, that
fall outside our fsync range, their respective ordered operation hasn't
finished yet, meaning the corresponding file extent item wasn't inserted
into the fs/subvol tree yet and therefore we didn't log it, and we must
let the next fast fsync (one that checks only the modified list) see this
extent map and log a matching file extent item to the log btree and wait
for its ordered operation to finish (if it's still ongoing).

A test case for xfstests follows.

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-09-09 04:56:43 +0800
c47ca32d3 Btrfs: kfree()ing ERR_PTRs ... Browse Code »

The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in
btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them.

These kind of bugs are "One Err Bugs" where there is just one error
label that does everything. I could set the "inherit = NULL" and keep
the single out label but it ends up being more complicated that way. It
makes the code simpler to re-order the unwind so it's in the mirror
order of the allocation and introduce some new error labels.

Signed-off-by: Dan Carpenter
Signed-off-by: Chris Mason

Dan Carpenter
2014-09-09 04:56:42 +0800
7c17705e7 lockd: fix rpcbind crash on lockd startup failure ... Browse Code »

Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
then nfs mounting without portmap or rpcbind running using a busybox
mount resulted in:

# mount -t nfs 10.30.130.21:/opt /mnt
svc: failed to register lockdv1 RPC service (errno 111).
lockd_up: makesock failed, error=-111
Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc055e65c
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
task: cf29cea0 ti: cf35c000 task.ti: cf35c000
NIP: c055e65c LR: c0566490 CTR: c055e648
REGS: cf35dad0 TRAP: 0300 Not tainted (3.10.44.cge)
MSR: 00029000 CR: 22442488 XER: 20000000
DEAR: 00000030, ESR: 00000000

GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
00000000
GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
10090ae8
GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
bfa46ef0
GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
cf0ded80
NIP [c055e65c] call_start+0x14/0x34
LR [c0566490] __rpc_execute+0x70/0x250
Call Trace:
[cf35db80] [00000080] 0x80 (unreliable)
[cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
[cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
[cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
[cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
[cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
[cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
[cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
[cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
[cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
[cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
[cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
[cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
[cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
[cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
[cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
[cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
[cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
[cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c

The addition of svc_shutdown_net() resulted in two calls to
svc_rpcb_cleanup(); the second is no longer necessary and crashes when
it calls rpcb_register_call with clnt=NULL.

Reported-by: Nikita Yushchenko
Fixes: 679b033df484 "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
Cc: stable@vger.kernel.org
Acked-by: Jeff Layton
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2014-09-09 00:03:32 +0800
aee377644 nfsd4: fix rd_dircount enforcement ... Browse Code »

Commit 3b299709091b "nfsd4: enforce rd_dircount" totally misunderstood
rd_dircount; it refers to total non-attribute bytes returned, not number
of directory entries returned.

Bring the code into agreement with RFC 3530 section 14.2.24.

Cc: stable@vger.kernel.org
Fixes: 3b299709091b "nfsd4: enforce rd_dircount"
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2014-09-09 00:02:03 +0800

08 Sep, 2014

2 commits

9142eadef Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull filesystem fixes from Al Viro:
"Several bugfixes (all of them -stable fodder).

Alexey's one deals with double mutex_lock() in UFS (apparently, nobody
has tried to test "ufs: sb mutex merge + mutex_destroy" on something
like file creation/removal on ufs). Mine deal with two kinds of
umount bugs, in umount propagation and in handling of automounted
submounts, both resulting in bogus transient EBUSY from umount"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix deadlocks introduced by sb mutex merge
fix EBUSY on umount() from MNT_SHRINKABLE
get rid of propagate_umount() mistakenly treating slaves as busy.

Linus Torvalds
2014-09-08 01:59:58 +0800
9ef7db7f3 ufs: fix deadlocks introduced by sb mutex merge ... Browse Code »

Commit 0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.

The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.

Found by Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov
Signed-off-by: Al Viro

Alexey Khoroshilov
2014-09-08 01:26:39 +0800

07 Sep, 2014

1 commit

11e973981 Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull xfs fixes from Dave Chinner:
"The fixes all address recently discovered data corruption issues.

The original Direct IO issue was discovered by Chris Mason @ Facebook
on a production workload which mixed buffered reads with direct reads
and writes IO to the same file. The fix for that exposed other issues
with page invalidation (exposed by millions of fsx operations) failing
due to dirty buffers beyond EOF.

Finally, the collapse_range code could also cause problems due to
racing writeback changing the extent map while it was being shifted
around. The commits for that problem are simple mitigation fixes that
prevent the problem from occuring. A more robust fix for 3.18 that
addresses the underlying problem is currently being worked on by
Brian.

Summary of fixes:
- a direct IO read/buffered read data corruption
- the associated fallout from the DIO data corruption fix
- collapse range bugs that are potential data corruption issues"

* tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: trim eofblocks before collapse range
xfs: xfs_file_collapse_range is delalloc challenged
xfs: don't log inode unless extent shift makes extent modifications
xfs: use ranged writeback and invalidation for direct IO
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't dirty buffers beyond EOF

Linus Torvalds
2014-09-07 03:13:17 +0800

05 Sep, 2014

9 commits

10096fb10 Export sync_filesystem() for modular ->remount_fs() use ... Browse Code »

This patch changes sync_filesystem() to be EXPORT_SYMBOL().

The reason this is needed is that starting with 3.15 kernel, due to
Theodore Ts'o's commit 02b9984d6408 ("fs: push sync_filesystem() down to
the file system's remount_fs()"), all file systems that have dirty data
to be written out need to call sync_filesystem() from their
->remount_fs() method when remounting read-only.

As this is now a generically required function rather than an internal
only function it should be EXPORT_SYMBOL() so that all file systems can
call it.

Signed-off-by: Anton Altaparmakov
Acked-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Altaparmakov
2014-09-05 23:16:21 +0800
b7fece1be Merge git://git.kvack.org/~bcrl/aio-fixes ... Browse Code »

Pull aio bugfixes from Ben LaHaise:
"Two small fixes"

* git://git.kvack.org/~bcrl/aio-fixes:
aio: block exit_aio() until all context requests are completed
aio: add missing smp_rmb() in read_events_ring

Linus Torvalds
2014-09-05 07:08:55 +0800
6098b45b3 aio: block exit_aio() until all context requests are completed ... Browse Code »

It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in current implemention, so fix
it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng
Signed-off-by: Benjamin LaHaise
Cc: stable@vger.kernel.org

Gu Zheng
2014-09-05 04:54:47 +0800
0b93a92be udf: saner calling conventions for udf_new_inode() ... Browse Code »

Signed-off-by: Al Viro
Signed-off-by: Jan Kara

Al Viro
2014-09-05 03:37:41 +0800
b23150961 udf: fix the udf_iget() vs. udf_new_inode() races ... Browse Code »

Currently udf_iget() (triggered by NFS) can race with udf_new_inode()
leading to two inode structures with the same inode number:

nfsd: iget_locked() creates inode
nfsd: try to read from disk, block on that.
udf_new_inode(): allocate inode with that inumber
udf_new_inode(): insert it into icache, set it up and dirty
udf_write_inode(): write inode into buffer cache
nfsd: get CPU again, look into buffer cache, see nice and sane on-disk
inode, set the in-core inode from it

Fix the problem by putting inode into icache in locked state (I_NEW set)
and unlocking it only after it's fully set up.

Signed-off-by: Al Viro
Signed-off-by: Jan Kara

Al Viro
2014-09-05 03:37:41 +0800
d2be51cb3 udf: merge the pieces inserting a new non-directory object into directory ... Browse Code »

boilerplate code in udf_{create,mknod,symlink} taken to new helper

symlink case converted to unique id calculated by udf_new_inode() - no
point finding a new one.

Signed-off-by: Al Viro
Signed-off-by: Jan Kara

Al Viro
2014-09-05 03:37:40 +0800
470cca56c udf: Set i_generation field ... Browse Code »

Currently UDF doesn't initialize i_generation in any way and thus NFS
can easily get reallocated inodes from stale file handles. Luckily UDF
already has a unique object identifier associated with each inode -
i_unique. Use that for initialization of i_generation.

Signed-off-by: Jan Kara

Jan Kara
2014-09-05 03:37:40 +0800
4071b9136 udf: Properly detect stale inodes ... Browse Code »

NFS can easily ask for inodes that are already deleted. Currently UDF
happily returns such inodes which is a bug. Return -ESTALE if
udf_read_inode() is asked to read deleted inode.

Signed-off-by: Jan Kara

Jan Kara
2014-09-05 03:37:39 +0800
6d3d5e860 udf: Make udf_read_inode() and udf_iget() return error ... Browse Code »

Currently __udf_read_inode() wasn't returning anything and we found out
whether we succeeded reading inode by checking whether inode is bad or
not. udf_iget() returned NULL on failure and inode pointer otherwise.
Make these two functions properly propagate errors up the call stack and
use the return value in callers.

Signed-off-by: Jan Kara

Jan Kara
2014-09-05 03:36:35 +0800

04 Sep, 2014

3 commits

c03aa9f6e udf: Avoid infinite loop when processing indirect ICBs ... Browse Code »

We did not implement any bound on number of indirect ICBs we follow when
loading inode. Thus corrupted medium could cause kernel to go into an
infinite loop, possibly causing a stack overflow.

Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limit number of indirect ICBs we follow to avoid
infinite loops.

Signed-off-by: Jan Kara

Jan Kara
2014-09-04 20:12:29 +0800
bb7720a0b udf: Fold udf_fill_inode() into __udf_read_inode() ... Browse Code »

There's no good reason to separate these since udf_fill_inode() is
called only from __udf_read_inode() and both do part of the same thing.

Signed-off-by: Jan Kara

Jan Kara
2014-09-04 19:32:50 +0800
8a70ee330 udf: Avoid dir link count to go negative ... Browse Code »

If we are writing back inode of unlinked directory, its link count ends
up being (u16)-1. Although the inode is deleted, udf_iget() can load the
inode when NFS uses stale file handle and get confused.

Signed-off-by: Jan Kara

Jan Kara
2014-09-04 17:47:51 +0800