Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

13 Oct, 2013

4 commits

9d05746e7 vfs: allow O_PATH file descriptors for fstatfs() ... Browse Code »

Olga reported that file descriptors opened with O_PATH do not work with
fstatfs(), found during further development of ksh93's thread support.

There is no reason to not allow O_PATH file descriptors here (fstatfs is
very much a path operation), so use "fdget_raw()". See commit
55815f70147d ("vfs: make O_PATH file descriptors usable for 'fstat()'")
for a very similar issue reported for fstat() by the same team.

Reported-and-tested-by: ольга крыжановская
Acked-by: Al Viro
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-10-13 04:12:31 +0800
be5090da4 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"A bug fix and performance regression fix for ext4"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix memory leak in xattr
ext4: fix performance regression in writeback of random writes

Linus Torvalds
2013-10-13 03:55:15 +0800
d64dab903 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"We've got more bug fixes in my for-linus branch:

One of these fixes another corner of the compression oops from last
time. Miao nailed down some problems with concurrent snapshot
deletion and drive balancing.

I kept out one of his patches for more testing, but these are all
stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix oops caused by the space balance and dead roots
Btrfs: insert orphan roots into fs radix tree
Btrfs: limit delalloc pages outside of find_delalloc_range
Btrfs: use right root when checking for hash collision

Linus Torvalds
2013-10-13 03:54:24 +0800
6e4ea8e33 ext4: fix memory leak in xattr ... Browse Code »

If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentionally return from the function without having freed these
allocations. If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.

Spotted with Coverity.

[ Fixed by tytso to set is and bs to NULL after freeing these
pointers, in case in the retry loop we later end up triggering an
error causing a jump to cleanup, at which point we could have a double
free bug. -- Ted ]

Signed-off-by: Dave Jones
Signed-off-by: "Theodore Ts'o"
Reviewed-by: Eric Sandeen
Cc: stable@vger.kernel.org

Dave Jones
2013-10-13 02:39:49 +0800

11 Oct, 2013

4 commits

c00869f1a Btrfs: fix oops caused by the space balance and dead roots ... Browse Code »

When doing space balance and subvolume destroy at the same time, we met
the following oops:

kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
[] relocate_block_group+0x466/0x4e6 [btrfs]
[] btrfs_relocate_block_group+0x143/0x275 [btrfs]
[] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
[] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
[] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
[] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
[] btrfs_balance+0x9c7/0xb6f [btrfs]
[] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
[] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
[] ? path_openat+0x234/0x4db
[] ? __do_page_fault+0x31d/0x391
[] ? vma_link+0x74/0x94
[] vfs_ioctl+0x1d/0x39
[] do_vfs_ioctl+0x32d/0x3e2
[] SyS_ioctl+0x57/0x83
[] ? do_page_fault+0xe/0x10
[] system_call_fastpath+0x16/0x1b

It is because we returned the error number if the reference of the root was 0
when doing space relocation. It was not right here, because though the root
was dead(refs == 0), but the space it held still need be relocated, or we
could not remove the block group. So in this case, we should return the root
no matter it is dead or not.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Miao Xie
2013-10-11 09:31:02 +0800
14927d954 Btrfs: insert orphan roots into fs radix tree ... Browse Code »

Now we don't drop all the deleted snapshots/subvolumes before the space
balance. It means we have to relocate the space which is held by the dead
snapshots/subvolumes. So we must into them into fs radix tree, or we would
forget to commit the change of them when doing transaction commit, and it
would corrupt the metadata.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Miao Xie
2013-10-11 09:30:53 +0800
7bf811a59 Btrfs: limit delalloc pages outside of find_delalloc_range ... Browse Code »

Liu fixed part of this problem and unfortunately I steered him in slightly the
wrong direction and so didn't completely fix the problem. The problem is we
limit the size of the delalloc range we are looking for to max bytes and then we
try to lock that range. If we fail to lock the pages in that range we will
shrink the max bytes to a single page and re loop. However if our first page is
inside of the delalloc range then we will end up limiting the end of the range
to a period before our first page. This is illustrated below

[0 -------- delalloc range --------- 256mb]
[page]

So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
and then we will notice that delalloc_start < *start and adjust it up, but not
adjust delalloc_end up, so things go sideways. To fix this we need to not limit
the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
way we don't end up with this confusion. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-10-11 09:27:56 +0800
4871c1588 Btrfs: use right root when checking for hash collision ... Browse Code »

btrfs_rename was using the root of the old dir instead of the root of the new
dir when checking for a hash collision, so if you tried to move a file into a
subvol it would freak out because it would see the file you are trying to move
in its current root. This fixes the bug where this would fail

btrfs subvol create test1
btrfs subvol create test2
mv test1 test2.

Thanks to Chris Murphy for catching this,

Cc: stable@vger.kernel.org
Reported-by: Chris Murphy
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-10-11 09:27:45 +0800

06 Oct, 2013

1 commit

e62063d69 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes, including a regression fix from
Liu Bo that solves rare crashes with compression on.

I've merged my for-linus up to 3.12-rc3 because the top commit is only
meant for 3.12. The rest of the fixes are also available in my master
branch on top of my last 3.11 based pull"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Fix crash due to not allocating integrity data for a bioset
Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
Btrfs: eliminate races in worker stopping code
Btrfs: fix crash of compressed writes
Btrfs: fix transid verify errors when recovering log tree

Linus Torvalds
2013-10-06 03:17:24 +0800

05 Oct, 2013

13 commits

b208c2f7c btrfs: Fix crash due to not allocating integrity data for a bioset ... Browse Code »

When btrfs creates a bioset, we must also allocate the integrity data pool.
Otherwise btrfs will crash when it tries to submit a bio to a checksumming
disk:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [] mempool_alloc+0x4a/0x150
PGD 2305e4067 PUD 23063d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
RIP: 0010:[] [] mempool_alloc+0x4a/0x150
RSP: 0018:ffff880230699688 EFLAGS: 00010286
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
FS: 00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
Stack:
ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
Call Trace:
[] bio_integrity_alloc+0x48/0x1b0
[] bio_integrity_prep+0xac/0x360
[] ? mempool_alloc+0x58/0x150
[] ? alloc_extent_state+0x31/0x110 [btrfs]
[] blk_queue_bio+0x1c9/0x460
[] generic_make_request+0xca/0x100
[] submit_bio+0x79/0x160
[] btrfs_map_bio+0x48e/0x5b0 [btrfs]
[] btree_submit_bio_hook+0xda/0x110 [btrfs]
[] submit_one_bio+0x6a/0xa0 [btrfs]
[] read_extent_buffer_pages+0x250/0x310 [btrfs]
[] ? __radix_tree_preload+0x66/0xf0
[] ? radix_tree_insert+0x95/0x260
[] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
[btrfs]
[] read_tree_block+0x3a/0x60 [btrfs]
[] open_ctree+0x139d/0x2030 [btrfs]
[] btrfs_mount+0x53a/0x7d0 [btrfs]
[] ? pcpu_alloc+0x8eb/0x9f0
[] ? __kmalloc_track_caller+0x35/0x1e0
[] mount_fs+0x20/0xd0
[] vfs_kern_mount+0x76/0x120
[] do_mount+0x200/0xa40
[] ? strndup_user+0x5b/0x80
[] SyS_mount+0x90/0xe0
[] system_call_fastpath+0x1a/0x1f
Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 8b 73 18 44 89 c7
44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
RIP [] mempool_alloc+0x4a/0x150
RSP
CR2: 0000000000000018
---[ end trace 7a96042017ed21e2 ]---

Signed-off-by: Darrick J. Wong
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Darrick J. Wong
2013-10-05 22:52:10 +0800
1329dfc8b Merge branch 'for-linus' into for-linus-3.12 Browse Code »

Chris Mason
2013-10-05 22:51:32 +0800
a5c984cc2 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull CIFS fixes from Steve French:
"Small set of cifs fixes. Most important is Jeff's fix that works
around disconnection problems which can be caused by simultaneous use
of user space tools (starting a long running smbclient backup then
doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
Jim's fix to deal with reexport of cifs share.

I expect to send two more cifs fixes next week (being tested now) -
fixes to address an SMB2 unmount hang when server dies and a fix for
cifs symlink handling of Windows "NFS" symlinks"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] update cifs.ko version
[CIFS] Remove ext2 flags that have been moved to fs.h
[CIFS] Provide sane values for nlink
cifs: stop trying to use virtual circuits
CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them

Linus Torvalds
2013-10-05 11:50:16 +0800
3dbecf0aa Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs bugfixes from Ben Myers:
"There are lockdep annotations for project quotas, a fix for dirent
dtype support on v4 filesystems, a fix for a memory leak in recovery,
and a fix for the build error that resulted from it. D'oh"

* tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: Use kmem_free() instead of free()
xfs: fix memory leak in xlog_recover_add_to_trans
xfs: dirent dtype presence is dependent on directory magic numbers
xfs: lockdep needs to know about 3 dquot-deep nesting

Linus Torvalds
2013-10-05 05:47:22 +0800
1357272fc Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing ... Browse Code »

free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents. Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens
Signed-off-by: Ilya Dryomov
Signed-off-by: Josef Bacik

Ilya Dryomov
2013-10-05 04:02:14 +0800
964fb15ac Btrfs: eliminate races in worker stopping code ... Browse Code »

The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop. The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

btrfs_stop_workers(): check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
working list
- removes the first worker from the
working list
- releases the lock to wait for
its kthread's completion
- grabs the lock
- if aworker is on the working list,
moves aworker from the working list
to the idle list
- releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
working list
......
btrfs_stop_workers returns, aworker is on the idle list
FS is umounted, memory is freed
......
aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba
Signed-off-by: Ilya Dryomov
Signed-off-by: Josef Bacik

Ilya Dryomov
2013-10-05 04:02:13 +0800
385fe0bed Btrfs: fix crash of compressed writes ... Browse Code »

The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.

The bug is quite interesting, it's actually introduced by a recent commit
(573aecafca1cf7a974231b759197a1aebcf39c2a,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
ordered extent and so on.
(4) submit the page A.

It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end 0).

This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.

This fixes it by just thinking that we find nothing and returning to caller
as the caller knows how to deal with it properly.

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8
[...]
RIP: 0010:[] [] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731] [] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731] [] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731] [] ? detach_if_pending+0x22/0x4b
[ 4934.248731] [] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731] [] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731] [] kthread+0x8d/0x95
[ 4934.248731] [] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] [] ret_from_fork+0x7c/0xb0
[ 4934.248731] [] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP [] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731] RSP
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-10-05 04:02:11 +0800
60e7cd3a4 Btrfs: fix transid verify errors when recovering log tree ... Browse Code »

If we crash with a log, remount and recover that log, and then crash before we
can commit another transaction we will get transid verify errors on the next
mount. This is because we were not zero'ing out the log when we committed the
transaction after recovery. This is ok as long as we commit another transaction
at some point in the future, but if you abort or something else goes wrong you
can end up in this weird state because the recovery stuff says that the tree log
should have a generation+1 of the super generation, which won't be the case of
the transaction that was started for recovery. Fix this by removing the check
and _always_ zero out the log portion of the super when we commit a transaction.
This fixes the transid verify issues I was seeing with my force errors tests.
Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-10-05 04:02:09 +0800
b2a42f78a xfs: Use kmem_free() instead of free() ... Browse Code »

This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.

Signed-off-by: Thierry Reding
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

(cherry picked from commit aaaae98022efa4f3c31042f1fdf9e7a0c5f04663)

Thierry Reding
2013-10-05 02:56:12 +0800
9b3b77fe6 xfs: fix memory leak in xlog_recover_add_to_trans ... Browse Code »

Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.

Signed-off-by: Mark Tinguely
Reviewed-by: Eric Sandeen
Signed-off-by: Ben Myers

(cherry picked from commit 519ccb81ac1c8e3e4eed294acf93be00b43dcad6)

tinguely@sgi.com
2013-10-05 02:56:03 +0800
6d313498f xfs: dirent dtype presence is dependent on directory magic numbers ... Browse Code »

The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.

Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.

The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

(cherry picked from commit 367993e7c6428cb7617ab7653d61dca54e2fdede)

Dave Chinner
2013-10-05 02:55:48 +0800
89c6c89af xfs: lockdep needs to know about 3 dquot-deep nesting ... Browse Code »

Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
(&xfs_dquot_other_class){+.+...}, at: [] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
(&xfs_dquot_other_class){+.+...}, at: [] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&xfs_dquot_other_class);
lock(&xfs_dquot_other_class);

*** DEADLOCK ***

May be due to missing lock nesting notation

7 locks held by touch/21072:
#0: (sb_writers#10){++++.+}, at: [] mnt_want_write+0x1e/0x3e
#1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [] do_last+0x245/0xe40
#2: (sb_internal#2){++++.+}, at: [] xfs_trans_alloc+0x1f/0x35
#3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [] xfs_ilock+0x100/0x1f1
#4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [] xfs_ilock_nowait+0x105/0x22f
#5: (&dqp->q_qlock){+.+...}, at: [] xfs_trans_dqlockedjoin+0x57/0x64
#6: (&xfs_dquot_other_class){+.+...}, at: [] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon
Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

(cherry picked from commit f112a049712a5c07de25d511c3c6587a2b1a015e)

Dave Chinner
2013-10-05 02:55:33 +0800
15c83d26e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse bugfixes from Miklos Szeredi:
"This contains two more fixes by Maxim for writeback/truncate races and
fixes for RCU walk in fuse_dentry_revalidate()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: no RCU mode in fuse_access()
fuse: readdirplus: fix RCU walk
fuse: don't check_submounts_and_drop() in RCU walk
fuse: fix fallocate vs. ftruncate race
fuse: wait for writeback in fuse_file_fallocate()

Linus Torvalds
2013-10-05 00:06:13 +0800

03 Oct, 2013

1 commit

981d90109 Merge git://git.kvack.org/~bcrl/aio-next ... Browse Code »

Pull aio use-after-free fix from Ben LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
aio: fix use-after-free in aio_migratepage

Linus Torvalds
2013-10-03 00:38:17 +0800

02 Oct, 2013

2 commits

517bf8fc2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs lru leak fix from Al Viro:
"The fix in "super: fix for destroy lrus" didn't - they need to be
destroyed, all right, but that's the wrong place..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/super.c: fix lru_list leak for real

Linus Torvalds
2013-10-02 01:28:11 +0800
c2d22ecd3 fs/super.c: fix lru_list leak for real ... Browse Code »

Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
the right place is destroy_super(). As it is, we leak them if sget()
decides that new superblock it has allocated (and never shown to
anybody) isn't needed and should be freed.

Signed-off-by: Al Viro

Al Viro
2013-10-02 01:11:21 +0800

01 Oct, 2013

7 commits

698fa1d16 fuse: no RCU mode in fuse_access() ... Browse Code »

fuse_access() is never called in RCU walk, only on the final component of
access(2) and chdir(2)...

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2013-10-01 22:41:23 +0800
6314efee3 fuse: readdirplus: fix RCU walk ... Browse Code »

Doing dput(parent) is not valid in RCU walk mode. In RCU mode it would
probably be okay to update the parent flags, but it's actually not
necessary most of the time...

So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was
recently initialized by READDIRPLUS.

This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by
READDIRPLUS and only dropping out of RCU mode if this flag is set.
FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in
the parent.

Reported-by: Al Viro
Signed-off-by: Miklos Szeredi
Cc: stable@vger.kernel.org

Miklos Szeredi
2013-10-01 22:41:22 +0800
3c70b8eed fuse: don't check_submounts_and_drop() in RCU walk ... Browse Code »

If revalidate finds an invalid dentry in RCU walk mode, let the VFS deal
with it instead of calling check_submounts_and_drop() which is not prepared
for being called from RCU walk.

Signed-off-by: Miklos Szeredi
Cc: stable@vger.kernel.org

Miklos Szeredi
2013-10-01 22:41:22 +0800
f92731884 Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver

* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method

Linus Torvalds
2013-10-01 08:10:26 +0800
522d6d38f Merge branch 'akpm' (fixes from Andrew Morton) ... Browse Code »

Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton : (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...

Linus Torvalds
2013-10-01 05:32:32 +0800
7f42ec394 nilfs2: fix issue with race condition of competition between segments for dirty blocks ... Browse Code »
13

Many NILFS2 users were reported about strange file system corruption
(for example):

NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are consequence of file system's issue that takes
place more earlier. Fortunately, Jerome Poulin
and Anton Eliasson were reported about another
issue not so recently. These reports describe the issue with segctor
thread's crash:

BUG: unable to handle kernel paging request at 0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0

These two issues have one reason. This reason can raise third issue
too. Third issue results in hanging of segctor thread with eating of
100% CPU.

REPRODUCING PATH:

One of the possible way or the issue reproducing was described by
Jermoe me Poulin :

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
13. sysrq+W
14. sysrq+E wait for everything to terminate
15. sysrq+SUSB

Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue is reproduced not stable [60% - 80%]. It is very important to
have proper environment for the issue reproducing. The critical
conditions for successful reproducing:

(1) It should have big modified file by mmap() way.

(2) This file should have the count of dirty blocks are greater that
several segments in size (for example, two or three) from time to time
during processing.

(3) It should be intensive background activity of files modification
in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason of crash is not valid
page address:

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
number. And b_blocknr with b_size values look like block numbers. So,
buffer_head's pointer points on not proper address value.

Detailed investigation of the issue is discovered such picture:

[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

BUG: unable to handle kernel paging request at 0000000000001a82
IP: [] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in list. Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer. Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase. Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:

[SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but prev pointer has changed. It means
that buffer_head has next pointer from one list but prev pointer from
another. Such modification can be made several times. And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every proccessed dirty block;

(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().

Reported-by: Jerome Poulin
Reported-by: Anton Eliasson
Cc: Paul Fertser
Cc: ARAI Shun-ichi
Cc: Piotr Szymaniak
Cc: Juan Barry Manuel Canham
Cc: Zahid Chowdhury
Cc: Elmer Zhang
Cc: Kenneth Langga
Signed-off-by: Vyacheslav Dubeyko
Acked-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-10-01 05:31:02 +0800
720236569 fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing ... Browse Code »

A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.

Reproduction of the oops-causing bug described here:

https://lkml.org/lkml/2013/8/30/50

Rge ussue originated in commit 2aa362c49c31 ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.

This patch make that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.

[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni
Cc: Al Viro
Cc: Denys Vlasenko
Reported-by: Martin MOKREJS
Tested-by: Martin MOKREJS
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Aloni
2013-10-01 05:31:01 +0800

30 Sep, 2013

7 commits

13f358389 afs: dget_parent() can't return a negative dentry ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:24 +0800
7b9a2378b ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:20 +0800
494755558 sysv: Add forgotten superblock lock init for v7 fs ... Browse Code »

Superblock lock was replaced with (un)lock_super() removal, but left
uninitialized for Seventh Edition UNIX filesystem in the following commit (3.7):
c07cb01 sysv: drop lock/unlock super

Signed-off-by: Lubomir Rintel
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Lubomir Rintel
2013-09-30 10:02:02 +0800
367156d9a NFS: Give "flavor" an initial value to fix a compile warning ... Browse Code »

The previous patch introduces a compile warning by not assigning an initial
value to the "flavor" variable. This could only be a problem if the server
returns a supported secflavor list of length zero, but it's better to
fix this before it's ever hit.

Signed-off-by: Anna Schumaker
Acked-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Anna Schumaker
2013-09-30 04:03:34 +0800
58a8cf121 NFSv4.1: try SECINFO_NO_NAME flavs until one works ... Browse Code »

Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until
one works.

One example of a situation this fixes:

- server configured for krb5
- server principal somehow gets deleted from KDC
- server still thinking krb is good, sends krb5 as first entry in
SECINFO_NO_NAME response
- client tries krb5, but this fails without even sending an RPC because
gssd's requests to the KDC can't find the server's principal

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-09-30 04:03:34 +0800
acd65e5bc NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds ... Browse Code »

We need to ensure that the initialisation of the data server nfs_client
structure in nfs4_ds_connect is correctly ordered w.r.t. the read of
ds->ds_clp in nfs4_fl_prepare_ds.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-09-30 03:58:35 +0800
52b26a3e1 NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails ... Browse Code »

- Fix an Oops when nfs4_ds_connect() returns an error.
- Always check the device status after waiting for a connect to complete.

Reported-by: Andy Adamson
Reported-by: Jeff Layton
Signed-off-by: Trond Myklebust
Cc: # v3.10+

Trond Myklebust
2013-09-30 03:56:35 +0800

29 Sep, 2013

1 commit

ddd23eb18 Merge tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs bugfixes from Ben Myers:
- fix for directory node collapse regression
- fix for recovery over stale on disk structures
- fix for eofblocks ioctl
- fix asserts in xfs_inode_free
- lock the ail before removing an item from it

* tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: fix node forward in xfs_node_toosmall
xfs: log recovery lsn ordering needs uuid check
xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
xfs: asserting lock not held during freeing not valid
xfs: lock the AIL before removing the buffer item

Linus Torvalds
2013-09-29 04:52:05 +0800