Eric Lee / smarc-fsl-linux-kernel

16 Aug, 2014

1 commit

e64df3ebe Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs updates from Chris Mason:
"These are all fixes I'd like to get out to a broader audience.

The biggest of the bunch is Mark's quota fix, which is also in the
SUSE kernel, and makes our subvolume quotas dramatically more
accurate.

I've been running xfstests with these against your current git
overnight, but I'm queueing up longer tests as well"

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: disable strict file flushes for renames and truncates
Btrfs: fix csum tree corruption, duplicate and outdated checksums
Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch
Btrfs: fix compressed write corruption on enospc
btrfs: correctly handle return from ulist_add
btrfs: qgroup: account shared subtrees during snapshot delete
Btrfs: read lock extent buffer while walking backrefs
Btrfs: __btrfs_mod_ref should always use no_quota
btrfs: adjust statfs calculations according to raid profiles

Linus Torvalds
2014-08-16 23:06:55 +0800

15 Aug, 2014

9 commits

8d875f95d btrfs: disable strict file flushes for renames and truncates

Truncates and renames are often used to replace old versions of a file
with new versions. Applications often expect this to be an atomic
replacement, even if they haven't done anything to make sure the new
version is fully on disk.

Btrfs has strict flushing in place to make sure that renaming over an
old file with a new file will fully flush out the new file before
allowing the transaction commit with the rename to complete.

This ordering means the commit code needs to be able to lock file pages,
and there are a few paths in the filesystem where we will try to end a
transaction with the page lock held. It's rare, but these things can
deadlock.

This patch removes the ordered flushes and switches to a best effort
filemap_flush like ext4 uses. It's not perfect, but it should fix the
deadlocks.

Signed-off-by: Chris Mason

Chris Mason
2014-08-15 22:43:42 +0800
27b9a8122 Btrfs: fix csum tree corruption, duplicate and outdated checksums

Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
same leaf but with path->slots[0] having a slot number corresponding
to the first new key it got, that is, a slot number that didn't exist
before calling btrfs_next_leaf(), as the leaf now has more keys than
it had before. So we must really process the returned leaf starting at
path->slots[0] always, as it isn't always 0, and the key at slot 0 can
have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

Leaf N:

  slot = 0                           slot = btrfs_header_nritems() - 1
|-------------------------------------------------------------------|
| [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
|-------------------------------------------------------------------|

Leaf N + 1:

  slot = 0                            slot = btrfs_header_nritems() - 1
|--------------------------------------------------------------------|
| [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
|--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() will again return a path to leaf N, but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

  slot = 0                       slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
|----------------------------------------------------------------------------------------------------|
| [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] [(CSUM CSUM 40161280), size 32] |
|----------------------------------------------------------------------------------------------------|

Incorrectly using slot 0 makes us set next_offset to 39239680 and we jump
into the "insert:" label, which will set tmp to:

tmp = min((sums->len - total_bytes) >> blocksize_bits,
(next_offset - file_key.offset) >> blocksize_bits) =
min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-08-15 22:43:40 +0800
4eb1f66dc Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch

We've got bug reports that btrfs crashes when quota is enabled on a
32bit kernel, typically with an Oops like the one below:
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [] find_parent_nodes+0x360/0x1380 [btrfs]
*pde = 00000000
Oops: 0000 [#1] SMP
CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S W 3.15.2-1.gd43d97e-default #1
Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
task: f1478130 ti: f147c000 task.ti: f147c000
EIP: 0060:[] EFLAGS: 00010213 CPU: 0
EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
Stack:
00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
Call Trace:
[] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
[] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
[] normal_work_helper+0xc8/0x270 [btrfs]
[] process_one_work+0x11b/0x390
[] worker_thread+0x101/0x340
[] kthread+0x9b/0xb0
[] ret_from_kernel_thread+0x21/0x30
[] kthread_create_on_node+0x110/0x110

This indicates that a NULL was written into the prefs_delayed list.
Further investigation and bisection showed that the call to
ulist_add_merge() causes the corruption.

ulist_add_merge() takes a u64 as aux and writes a 64bit value into
old_aux. The callers of this function in backref.c, however, pass the
address of a pointer as old_aux. That is, on a 32bit arch the function
writes a 64bit value over a 32bit pointer. This put a NULL into the
adjacent variable, in this case prefs_delayed.

Here is a quick attempt to band-aid over this: a new function,
ulist_add_merge_ptr(), is introduced to properly pass and store a
pointer value instead of a u64. There are still ugly void ** casts
remaining in the callers because void ** cannot be taken implicitly.
But it's safer than an explicit cast to u64, anyway.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
Cc: [v3.11+]
Signed-off-by: Takashi Iwai
Signed-off-by: Chris Mason

Takashi Iwai
2014-08-15 22:43:19 +0800
ce62003f6 Btrfs: fix compressed write corruption on enospc

When we fail to allocate space for the whole compressed extent, we
fall back to uncompressed IO, but we forgot to redirty the pages that
belong to this compressed extent. These 'clean' pages simply skip the
'submit' part and go directly to endio, so nothing is written and we
end up with data corruption.

Signed-off-by: Liu Bo
Tested-By: Martin Steigerwald
Signed-off-by: Chris Mason

Liu Bo
2014-08-15 22:43:18 +0800
f90e579c2 btrfs: correctly handle return from ulist_add

ulist_add() can return '1' on success, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.

Signed-off-by: Mark Fasheh
Signed-off-by: Chris Mason

Mark Fasheh
2014-08-15 22:43:16 +0800
1152651a0 btrfs: qgroup: account shared subtrees during snapshot delete

During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.

In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.

This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent and in the case that we find
only one reference left, we go ahead and do the math on its
exclusive counts.

Signed-off-by: Mark Fasheh
Reviewed-by: Josef Bacik
Signed-off-by: Chris Mason

Mark Fasheh
2014-08-15 22:43:14 +0800
6f7ff6d78 Btrfs: read lock extent buffer while walking backrefs

Before processing the extent buffer, acquire a read lock on it, so
that we're safe against concurrent updates on the extent buffer.

Signed-off-by: Filipe Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-08-15 22:43:13 +0800
e339a6b09 Btrfs: __btrfs_mod_ref should always use no_quota

Previously I extended the no_quota arg to btrfs_dec/inc_ref because I didn't
understand how snapshot delete was using it and assumed that we needed the
quota operations there. With Mark's work this has turned out not to be the
case: we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the
argument and make __btrfs_mod_ref call its process function with no_quota
always set. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2014-08-15 22:43:11 +0800
ba7b6e62f btrfs: adjust statfs calculations according to raid profiles

This has been discussed in thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32528

and this patch implements this proposal:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32536

Works fine for "clean" raid profiles where the raid factor correction
does the right job. Otherwise it's pessimistic and may show low space
although there's still some left.

The df numbers are slightly wrong in case of mixed block groups, but this
is not a major usecase and can be addressed later.

The RAID56 numbers are wrong almost the same way as before and will be
addressed separately.

CC: Hugo Mills
CC: cwillu
CC: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2014-08-15 22:43:10 +0800

12 Aug, 2014

1 commit

f6f993328 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Stuff in here:

- acct.c fixes and general rework of mnt_pin mechanism. That allows
to go for delayed-mntput stuff, which will permit mntput() on deep
stack without worrying about stack overflows - fs shutdown will
happen on shallow stack. IOW, we can do Eric's umount-on-rmdir
series without introducing tons of stack overflows on new mntput()
call chains it introduces.
- Bruce's d_splice_alias() patches
- more Miklos' rename() stuff.
- a couple of regression fixes (stable fodder, in the end of branch)
and a fix for API idiocy in iov_iter.c.

There definitely will be another pile, maybe even two. I'd like to
get Eric's series in this time, but even if we miss it, it'll go right
in the beginning of for-next in the next cycle - the tricky part of
prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
fix copy_tree() regression
__generic_file_write_iter(): fix handling of sync error after DIO
switch iov_iter_get_pages() to passing maximal number of pages
fs: mark __d_obtain_alias static
dcache: d_splice_alias should detect loops
exportfs: update Exporting documentation
dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
dcache: remove unused d_find_alias parameter
dcache: d_obtain_alias callers don't all want DISCONNECTED
dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
dcache: d_splice_alias mustn't create directory aliases
dcache: close d_move race in d_splice_alias
dcache: move d_splice_alias
namei: trivial fix to vfs_rename_dir comment
VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
cifs: support RENAME_NOREPLACE
hostfs: support rename flags
shmem: support RENAME_EXCHANGE
shmem: support RENAME_NOREPLACE
btrfs: add RENAME_NOREPLACE
...

Linus Torvalds
2014-08-12 02:44:11 +0800

08 Aug, 2014

2 commits

1a0a397e4 dcache: d_obtain_alias callers don't all want DISCONNECTED

There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro

J. Bruce Fields
2014-08-08 02:40:10 +0800
80ace85c9 btrfs: add RENAME_NOREPLACE

RENAME_NOREPLACE is trivial to implement for most filesystems: switch over
to ->rename2() and check for the supported flags. The rest is done by the
VFS.

Signed-off-by: Miklos Szeredi
Cc: Chris Mason
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Miklos Szeredi
2014-08-08 02:40:09 +0800

28 Jul, 2014

1 commit

ca5bc6cd5 Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes

Signed-off-by: Ingo Molnar

Ingo Molnar
2014-07-28 16:03:00 +0800

21 Jul, 2014

1 commit

da83fc6e0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"We have two more fixes in my for-linus branch.

I was hoping to also include a fix for a btrfs deadlock with
compression enabled, but we're still nailing that one down"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: test for valid bdev before kobj removal in btrfs_rm_device
Btrfs: fix abnormal long waiting in fsync

Linus Torvalds
2014-07-21 11:21:05 +0800

20 Jul, 2014

2 commits

0bfaa9c5c btrfs: test for valid bdev before kobj removal in btrfs_rm_device

commit 99994cd btrfs: dev delete should remove sysfs entry
added a btrfs_kobj_rm_device, which dereferences device->bdev...
right after we check whether device->bdev might be NULL.

I don't honestly know if it's possible to have a NULL device->bdev
here, but assuming that it is (given the test), we need to move
the kobject removal to be under that test.

(Coverity spotted this)

Signed-off-by: Eric Sandeen
Signed-off-by: Chris Mason

Eric Sandeen
2014-07-20 02:49:44 +0800
98ce2deda Btrfs: fix abnormal long waiting in fsync

xfstests generic/127 detected this problem.

With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, fsync now only flushes
data within the passed range. This is the cause of the problem above: btrfs's
fsync has a stage called 'sync log' which waits for all the ordered extents
it has recorded to finish.

In xfstests/generic/127, mixed operations such as truncate, fallocate,
punch hole, and mapwrite leave us with some pre-allocated extents, and
mapwrite will mmap and then msync. I found that msync waits for quite a long
time (about 20s in my case). Thanks to ftrace, it turns out that the previous
fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but because
the range of dirty pages may be larger than the range
'btrfs_wait_ordered_range()' asks for, some ordered extents can be created
without their corresponding pages being flushed. They are then left in memory
until we fsync, which runs into the 'sync log' stage, and fsync just waits for
the system writeback thread to flush those pages and finish the ordered
extents, so the latency is inevitable.

This adds a flush similar to btrfs_start_ordered_extent() in
btrfs_wait_logged_extents() to fix that.

Reviewed-by: Miao Xie
Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2014-07-20 02:49:44 +0800

16 Jul, 2014

1 commit

743162013 sched: Remove proliferation of wait_on_bit() action functions

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{,_lock} does not return any specific error code in the
event of a signal, so the caller must check for non-zero and
supply its own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible".

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps' (and in /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all; something higher in the stack will be
shown instead. So the distinction will still be visible, only with
different function names (gfs2_glock_wait versus gfs2_glock_dq_wait
in the gfs2/glock.c case).

Since the first version of this patch (against 3.15) two new action
functions have appeared, one in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer-aware
schedule call as NFS.

Signed-off-by: NeilBrown
Acked-by: David Howells (fscache, keys)
Acked-by: Steven Whitehouse (gfs2)
Acked-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Steve French
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar

NeilBrown
2014-07-16 21:10:39 +0800

04 Jul, 2014

1 commit

b82207b8e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"We've queued up a few fixes in my for-linus branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix crash when starting transaction
Btrfs: fix btrfs_print_leaf for skinny metadata
Btrfs: fix race of using total_bytes_pinned
btrfs: use E2BIG instead of EIO if compression does not help
btrfs: remove stale comment from btrfs_flush_all_pending_stuffs
Btrfs: fix use-after-free when cloning a trailing file hole
btrfs: fix null pointer dereference in btrfs_show_devname when name is null
btrfs: fix null pointer dereference in clone_fs_devices when name is null
btrfs: fix nossd and ssd_spread mount option regression
Btrfs: fix race between balance recovery and root deletion
Btrfs: atomically set inode->i_flags in btrfs_update_iflags
btrfs: only unlock block in verify_parent_transid if we locked it
Btrfs: assert send doesn't attempt to start transactions
btrfs compression: reuse recently used workspace
Btrfs: fix crash when mounting raid5 btrfs with missing disks
btrfs: create sprout should rename fsid on the sysfs as well
btrfs: dev replace should replace the sysfs entry
btrfs: dev add should add its sysfs entry
btrfs: dev delete should remove sysfs entry
btrfs: rename add_device_membership to btrfs_kobj_add_device

Linus Torvalds
2014-07-04 23:53:53 +0800

03 Jul, 2014

11 commits

abdd2e80a Btrfs: fix crash when starting transaction

Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current->journal_info != NULL) fail,
resulting in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

[51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
[51502.242213] ------------[ cut here ]------------
[51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
[51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
(...)
[51502.244010] Call Trace:
[51502.244010] [] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
[51502.244010] [] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
[51502.244010] [] commit_cowonly_roots+0x164/0x226 [btrfs]
[51502.244010] [] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
[51502.244010] [] ? _raw_spin_unlock+0x2b/0x40
[51502.244010] [] start_transaction+0x459/0x620 [btrfs]
[51502.244010] [] btrfs_start_transaction+0x1b/0x20 [btrfs]
[51502.244010] [] __unlink_start_trans+0x31/0xe0 [btrfs]
[51502.244010] [] btrfs_unlink+0x37/0xc0 [btrfs]
[51502.244010] [] ? do_unlinkat+0x114/0x2a0
[51502.244010] [] vfs_unlink+0xcc/0x150
[51502.244010] [] do_unlinkat+0x260/0x2a0
[51502.244010] [] ? filp_close+0x64/0x90
[51502.244010] [] ? trace_hardirqs_on_caller+0x16/0x1e0
[51502.244010] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[51502.244010] [] SyS_unlinkat+0x1b/0x40
[51502.244010] [] system_call_fastpath+0x16/0x1b
[51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
[51502.244010] RIP [] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

[25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
[25405.097488] ------------[ cut here ]------------
[25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
[25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
(...)
[25405.100008] Call Trace:
[25405.100008] [] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
[25405.100008] [] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
[25405.100008] [] commit_cowonly_roots+0x164/0x226 [btrfs]
[25405.100008] [] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
[25405.100008] [] ? bit_waitqueue+0xc0/0xc0
[25405.100008] [] start_transaction+0x459/0x620 [btrfs]
[25405.100008] [] btrfs_start_transaction+0x1b/0x20 [btrfs]
[25405.100008] [] btrfs_create+0x47/0x210 [btrfs]
[25405.100008] [] ? btrfs_permission+0x3c/0x80 [btrfs]
[25405.100008] [] vfs_create+0x9b/0x130
[25405.100008] [] do_last+0x849/0xe20
[25405.100008] [] ? link_path_walk+0x79/0x820
[25405.100008] [] path_openat+0xc5/0x690
[25405.100008] [] ? trace_hardirqs_on+0xd/0x10
[25405.100008] [] ? __alloc_fd+0x32/0x1d0
[25405.100008] [] do_filp_open+0x43/0xa0
[25405.100008] [] ? __alloc_fd+0x151/0x1d0
[25405.100008] [] do_sys_open+0x13c/0x230
[25405.100008] [] ? trace_hardirqs_on_caller+0x16/0x1e0
[25405.100008] [] SyS_open+0x22/0x30
[25405.100008] [] system_call_fastpath+0x16/0x1b
[25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
[25405.100008] RIP [] assfail.constprop.88+0x1e/0x20 [btrfs]

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-07-03 22:04:18 +0800
be2c765df Btrfs: fix btrfs_print_leaf for skinny metadata

We wouldn't actually print the extent information if we had a skinny metadata
item; this fixes that. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2014-07-03 22:04:16 +0800
d288db5dc Btrfs: fix race of using total_bytes_pinned

The percpu counter @total_bytes_pinned was introduced to skip unnecessary
'commit transaction' operations; it accounts for space that we may free
but that is stuck in delayed refs.

And we zero out @space_info->total_bytes_pinned every transaction period so
we have a better idea of how much space we'll actually free up by committing
this transaction. However, we do the 'zero out' part a little earlier, before
we actually unpin space, so we end up returning ENOSPC when we actually have
free space that's just unpinned from committing transaction.

xfstests/generic/074 complained then.

This fixes it by actually accounting the percpu pinned number when 'unpin',
and since it's protected by space_info->lock, the race is gone now.

Signed-off-by: Liu Bo
Reviewed-by: Miao Xie
Signed-off-by: Chris Mason

Liu Bo
2014-07-03 22:04:15 +0800
130d5b415 btrfs: use E2BIG instead of EIO if compression does not help

Return codes got updated in 60e1975acb48fc3d74a3422b21dde74c977ac3d5
(btrfs: return errno instead of -1 from compression).
The lzo wrapper returns E2BIG in this case; do the same for zlib.

Signed-off-by: David Sterba

David Sterba
2014-07-03 22:04:13 +0800
0a4eaea89 btrfs: remove stale comment from btrfs_flush_all_pending_stuffs

Commit fcebe4562dec83b3f8d3088d77584727b09130b2 (Btrfs: rework qgroup
accounting) removed the qgroup accounting after delayed refs.

Signed-off-by: David Sterba

David Sterba
2014-07-03 22:04:12 +0800
14f597963 Btrfs: fix use-after-free when cloning a trailing file hole

The transaction handle was being used after being freed.

Cc: Chris Mason
Signed-off-by: Filipe David Borba Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-07-03 22:04:10 +0800
0aeb8a6e6 btrfs: fix null pointer dereference in btrfs_show_devname when name is null

dev->name is null but missing flag is not set.
Strictly speaking the missing flag should have been set, but there
are more places where code just checks if name is null. For now this
patch does the same.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
IP: [] btrfs_show_devname+0x58/0xf0 [btrfs]

[] show_vfsmnt+0x39/0x130
[] m_show+0x16/0x20
[] seq_read+0x296/0x390
[] vfs_read+0x9d/0x160
[] SyS_read+0x49/0x90
[] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain
Signed-off-by: Chris Mason

Anand Jain
2014-07-03 22:04:09 +0800
e755f7808 btrfs: fix null pointer dereference in clone_fs_devices when name is null

When one of the device paths is missing, the btrfs_device name is null. This
patch checks for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [] strlen+0x0/0x30
[] ? clone_fs_devices+0xaa/0x160 [btrfs]
[] btrfs_init_new_device+0x317/0xca0 [btrfs]
[] ? __kmalloc_track_caller+0x15a/0x1a0
[] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[] ? handle_mm_fault+0x48c/0x9c0
[] ? __blkdev_put+0x171/0x180
[] ? __do_page_fault+0x4ac/0x590
[] ? blkdev_put+0x106/0x110
[] ? mntput+0x35/0x40
[] do_vfs_ioctl+0x460/0x4a0
[] ? ____fput+0xe/0x10
[] ? task_work_run+0xb3/0xd0
[] SyS_ioctl+0x57/0x90
[] ? do_page_fault+0xe/0x10
[] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain
Signed-off-by: Chris Mason

Anand Jain
2014-07-03 22:04:07 +0800
2aa06a35d btrfs: fix nossd and ssd_spread mount option regression

The commit

0780253 btrfs: Cleanup the btrfs_parse_options for remount.

broke ssd options quite badly; it stopped making ssd_spread
imply ssd, and it made "nossd" unsettable.

Put things back at least as well as they were before
(though ssd mount option handling is still pretty odd:
# mount -o "nossd,ssd_spread" works?)

Reported-by: Roman Mamedov
Signed-off-by: Eric Sandeen
Signed-off-by: Chris Mason

Eric Sandeen
2014-07-03 22:04:06 +0800
5f3164813 Btrfs: fix race between balance recovery and root deletion

Balance recovery is called when mounting RW or remounting from
RO to RW, in order to finish merging roots.

During balance recovery, the fs root corresponding to a relocation
root (one whose root refs is 0) might be destroyed by the cleaner
thread, which makes btrfs fail to mount.

Fix this problem by holding @cleaner_mutex when doing balance
recovery.

Signed-off-by: Wang Shilong
Signed-off-by: Chris Mason

Wang Shilong
2014-07-03 22:04:04 +0800
3cc793925 Btrfs: atomically set inode->i_flags in btrfs_update_iflags

This change is based on the corresponding recent change for ext4:

ext4: atomically set inode->i_flags in ext4_set_inode_flags()

That has the following commit message that applies to btrfs as well:

"Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time."

Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE
and BTRFS_INODE_APPEND, respectively.

Reviewed-by: David Sterba
Reviewed-by: Satoru Takeuchi
Signed-off-by: Filipe David Borba Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-07-03 22:03:23 +0800

29 Jun, 2014

9 commits

472b909ff btrfs: only unlock block in verify_parent_transid if we locked it

This is a regression from my patch a26e8c9f75b0bfd8cccc9e8f110737b136eb5994. We
need to unlock the block only if we were the one who locked it. Otherwise this
will trip BUG_ON()s in locking.c. Thanks,
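The rule "unlock only what you locked" can be sketched with a small stand-in type. Everything here is hypothetical (struct block, the lock helpers, verify_block() and its parameter); it shows the pattern, not the actual verify_parent_transid() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tree block with a read-lock counter. */
struct block {
    int read_locks;
};

static void block_read_lock(struct block *b)
{
    b->read_locks++;
}

static void block_read_unlock(struct block *b)
{
    /* Plays the role of the BUG_ON() checks mentioned in the commit. */
    assert(b->read_locks > 0);
    b->read_locks--;
}

/*
 * Check a block. If the caller already holds the lock, this function
 * must neither take nor drop it; it only undoes a lock it took itself.
 */
static bool verify_block(struct block *b, bool caller_holds_lock)
{
    bool need_lock = !caller_holds_lock;
    bool ok;

    if (need_lock)
        block_read_lock(b);

    ok = true;  /* placeholder for the real validity check */

    if (need_lock)
        block_read_unlock(b);
    return ok;
}
```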

cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2014-06-29 04:48:47 +0800
46c4e71e9 Btrfs: assert send doesn't attempt to start transactions

When starting a transaction just assert that current->journal_info
doesn't contain a send transaction stub, since send isn't supposed
to start transactions and when it finishes (either successfully or
not) it's supposed to set current->journal_info to NULL.

This is motivated by the change titled:

Btrfs: fix crash when starting transaction
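A minimal sketch of the assertion described above, assuming a per-task journal_info pointer and a sentinel value standing in for the send transaction stub. The struct, the sentinel, and the function names are hypothetical and simplified from the kernel's `current` task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for the send transaction stub. */
static char send_stub_storage;
#define SEND_TRANS_STUB ((void *)&send_stub_storage)

/* Hypothetical stand-in for the current task. */
struct task {
    void *journal_info;
};

/* Send marks the task while it runs ... */
static void send_begin(struct task *t)
{
    t->journal_info = SEND_TRANS_STUB;
}

/* ... and must reset the marker when it finishes, successfully or not. */
static void send_end(struct task *t)
{
    t->journal_info = NULL;
}

/* Starting a transaction from inside send is a bug, so assert against it. */
static int start_transaction(struct task *t)
{
    assert(t->journal_info != SEND_TRANS_STUB);
    /* real transaction setup would go here */
    return 0;
}
```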

Signed-off-by: Filipe David Borba Manana
Signed-off-by: Chris Mason

Filipe Manana
2014-06-29 04:48:46 +0800
c39aa7056 btrfs compression: reuse recently used workspace

In free_workspace(), add the compression `workspace' to the head
of the `idle_workspace' list instead of the tail, so that we have
a better chance of reusing the most recently used `workspace'.
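The effect of adding at the head can be sketched with a simplified idle list. This uses a singly linked list and hypothetical types rather than the kernel's list_head, and find_workspace() here is a simplified stand-in that only pops from the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified workspace and idle list. */
struct workspace {
    int id;
    struct workspace *next;
};

struct idle_list {
    struct workspace *head;
};

/* Return a workspace to the idle list at the head (most recent first). */
static void free_workspace(struct idle_list *l, struct workspace *ws)
{
    assert(ws != NULL);
    ws->next = l->head;
    l->head = ws;
}

/* Take the workspace at the head, or NULL if the list is empty. */
static struct workspace *find_workspace(struct idle_list *l)
{
    struct workspace *ws = l->head;

    if (ws != NULL) {
        l->head = ws->next;
        ws->next = NULL;
    }
    return ws;
}
```

With this ordering, the next request gets the workspace that was freed last.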

Signed-off-by: Sergey Senozhatsky
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Sergey Senozhatsky
2014-06-29 04:48:46 +0800
5588383ec Btrfs: fix crash when mounting raid5 btrfs with missing disks

The reproducer is

$ mkfs.btrfs D1 D2 D3 -mraid5
$ mkfs.ext4 D2 && mkfs.ext4 D3
$ mount D1 /btrfs -odegraded

-------------------

[ 87.672992] ------------[ cut here ]------------
[ 87.673845] kernel BUG at fs/btrfs/raid56.c:1828!
...
[ 87.673845] RIP: 0010:[] [] __raid_recover_end_io+0x4ae/0x4d0
...
[ 87.673845] Call Trace:
[ 87.673845] [] ? mempool_free+0x36/0xa0
[ 87.673845] [] raid_recover_end_io+0x75/0xa0
[ 87.673845] [] bio_endio+0x5b/0xa0
[ 87.673845] [] bio_endio_nodec+0x12/0x20
[ 87.673845] [] end_workqueue_fn+0x41/0x50
[ 87.673845] [] normal_work_helper+0xca/0x2c0
[ 87.673845] [] process_one_work+0x1eb/0x530
[ 87.673845] [] ? process_one_work+0x189/0x530
[ 87.673845] [] worker_thread+0x11b/0x4f0
[ 87.673845] [] ? rescuer_thread+0x290/0x290
[ 87.673845] [] kthread+0xe4/0x100
[ 87.673845] [] ? kthread_create_on_node+0x220/0x220
[ 87.673845] [] ret_from_fork+0x7c/0xb0
[ 87.673845] [] ? kthread_create_on_node+0x220/0x220

-------------------

This happens because we miscalculate @rbio->bbio->error, so it does not
reach the maximum number of tolerable errors when it should.

Signed-off-by: Liu Bo
Tested-by: Satoru Takeuchi
Signed-off-by: Chris Mason

Liu Bo
2014-06-29 04:48:45 +0800
b2373f255 btrfs: create sprout should rename fsid on the sysfs as well

Creating a sprout changes the fsid of the mounted root.
Do the same in sysfs as well.

reproducer:
mount /dev/sdb /btrfs (seed disk)
btrfs dev add /dev/sdc /btrfs
mount -o rw,remount /btrfs
btrfs dev del /dev/sdb /btrfs
mount /dev/sdb /btrfs

Error:
kobject_add_internal failed for fe350492-dc28-4051-a601-e017b17e6145 with -EEXIST, don't try to register things with the same name in the same directory.

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Anand Jain
2014-06-29 04:48:44 +0800
49c6f736f btrfs: dev replace should replace the sysfs entry

When we replace a device, its corresponding sysfs
entry has to be replaced as well.

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Anand Jain
2014-06-29 04:48:44 +0800
0d39376aa btrfs: dev add should add its sysfs entry

We need the device links to be created
when a device is added.

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Anand Jain
2014-06-29 04:48:43 +0800
99994cde9 btrfs: dev delete should remove sysfs entry

When we delete a device from a mounted btrfs,
its corresponding sysfs entry needs to
be removed as well.

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Anand Jain
2014-06-29 04:48:42 +0800
9b4eaf43f btrfs: rename add_device_membership to btrfs_kobj_add_device

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: Chris Mason

Anand Jain
2014-06-29 04:48:41 +0800

22 Jun, 2014

1 commit

e13d100be Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This fixes some lockups in btrfs reported with rc1. It probably has
some performance impact because it is backing off our spinning locks
more often and switching to a blocking lock. I'll be able to nail
that down next week, but for now I want to get the lockups taken care
of.

Otherwise some more stack reduction and assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong error handle when the device is missing or is not writeable
Btrfs: fix deadlock when mounting a degraded fs
Btrfs: use bio_endio_nodec instead of open code
Btrfs: fix NULL pointer crash when running balance and scrub concurrently
btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
Btrfs: fix broken free space cache after the system crashed
Btrfs: make free space cache write out functions more readable
Btrfs: remove unused wait queue in struct extent_buffer
Btrfs: fix deadlocks with trylock on tree nodes

Linus Torvalds
2014-06-22 08:21:43 +0800