Doug / smarc-fsl-linux-kernel | Embedian Git Server

28 Mar, 2013

5 commits

fdf30d1c1 Btrfs: limit the global reserve to 512mb ... Browse Code »

A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space. This is because the
global block reserve was taking up the entire free metadata space. This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done. This allowed the user to
complete his rsync without issues. Thanks

Cc: stable@vger.kernel.org
Reported-and-tested-by: Stefan Priebe
Signed-off-by: Josef Bacik

Josef Bacik
2013-03-28 21:51:29 +0800
db1d607d3 Btrfs: hold the ordered operations mutex when waiting on ordered extents ... Browse Code »

We need to hold the ordered_operations mutex while waiting on ordered extents
since we splice and run the ordered extents list. We need to make sure anybody
else who wants to wait on ordered extents does actually wait for them to be
completed. This will keep us from bailing out of flushing in case somebody is
already waiting on ordered extents to complete. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-03-28 21:51:28 +0800
6e137ed3f Btrfs: fix space accounting for unlink and rename ... Browse Code »

We are way over-reserving for unlink and rename. Rename is just some random
huge number and unlink accounts for tree log operations that don't actually
happen during unlink, not to mention the tree log doesn't take from the trans
block rsv anyway so it's completely useless. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-03-28 21:51:27 +0800
f4881bc7a Btrfs: fix space leak when we fail to reserve metadata space ... Browse Code »

Dave reported a warning when running xfstest 275. We have been leaking delalloc
metadata space when our reservations fail. This is because we were improperly
calculating how much space to free for our checksum reservations. The problem
is we would sometimes free up space that had already been freed in another
thread and we would end up with negative usage for the delalloc space. This
patch fixes the problem by calculating how much space the other threads would
have already freed, and then calculate how much space we need to free had we not
done the reservation at all, and then freeing any excess space. This makes
xfstests 275 no longer have leaked space. Thanks

Cc: stable@vger.kernel.org
Reported-by: David Sterba
Signed-off-by: Josef Bacik

Josef Bacik
2013-03-28 21:51:26 +0800
adaa4b8e4 Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes ... Browse Code »

When you take a snapshot, punch a hole where there has been data, then take
another snapshot and try to send an incremental stream, btrfs send would
give you EIO. That is because is_extent_unchanged had no support for holes
being punched. With this patch, instead of returning EIO we just return
0 (== the extent is not unchanged) and we're good.

Signed-off-by: Jan Schmidt
Cc: Alexander Block
Signed-off-by: Josef Bacik

Jan Schmidt
2013-03-28 21:51:26 +0800

27 Mar, 2013

1 commit

4adaa6110 Btrfs: fix race between mmap writes and compression ... Browse Code »

Btrfs uses page_mkwrite to ensure stable pages during
crc calculations and mmap workloads. We call clear_page_dirty_for_io
before we do any crcs, and this forces any application with the file
mapped to wait for the crc to finish before it is allowed to change
the file.

With compression on, the clear_page_dirty_for_io step is happening after
we've compressed the pages. This means the applications might be
changing the pages while we are compressing them, and some of those
modifications might not hit the disk.

This commit adds the clear_page_dirty_for_io before compression starts
and makes sure to redirty the page if we have to fallback to
uncompressed IO as well.

Signed-off-by: Chris Mason
Reported-by: Alexandre Oliva
cc: stable@vger.kernel.org

Chris Mason
2013-03-27 01:19:14 +0800

22 Mar, 2013

5 commits

1dd05682b Btrfs: fix memory leak in btrfs_create_tree() ... Browse Code »

We should free leaf and root before returning from the error
handling code.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Tsutomu Itoh
2013-03-22 07:31:52 +0800
d9abbf1c3 Btrfs: fix locking on ROOT_REPLACE operations in tree mod log ... Browse Code »

To resolve backrefs, ROOT_REPLACE operations in the tree mod log are
required to be tied to at least one KEY_REMOVE_WHILE_FREEING operation.
Therefore, those operations must be enclosed by tree_mod_log_write_lock()
and tree_mod_log_write_unlock() calls.

Those calls are private to the tree_mod_log_* functions, which means that
removal of the elements of an old root node must be logged from
tree_mod_log_insert_root. This partly reverts and corrects commit ba1bfbd5
(Btrfs: fix a tree mod logging issue for root replacement operations).

This fixes the brand-new version of xfstest 276 as of commit cfe73f71.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Schmidt
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Jan Schmidt
2013-03-22 07:31:52 +0800
6113077cd Btrfs: fix missing qgroup reservation before fallocating ... Browse Code »

Steps to reproduce:
mkfs.btrfs
mount
btrfs quota enable
btrfs sub create /subv
btrfs qgroup limit 10M /subv
fallocate --length 20M /subv/data

For the above example, fallocating will return successfully which
is not expected, we try to fix it by doing qgroup reservation before
fallocating.

Signed-off-by: Wang Shilong
Reviewed-by: Miao Xie
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Wang Shilong
2013-03-22 07:24:32 +0800
835d974fa Btrfs: handle a bogus chunk tree nicely ... Browse Code »

If you restore a btrfs-image file system and try to mount that file system we'll
panic. That's because btrfs-image restores and just makes one big chunk to
envelope the whole disk, since they are really only meant to be messed with by
our btrfs-progs. So fix up btrfs_rmap_block and the callers of it for mount so
that we no longer panic but instead just return an error and fail to mount.
Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-03-22 07:24:31 +0800
d76344828 Btrfs: update to use fs_state bit ... Browse Code »

Now that we use bit operation to check fs_state, update
btrfs_free_fs_root()'s checker, otherwise we get back to
memory leak case.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Liu Bo
2013-03-22 07:24:31 +0800

16 Mar, 2013

1 commit

3b2775942 Btrfs: fix warning of free_extent_map ... Browse Code »

Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.

Reported-by: Johannes Hirte
Reported-by: Darrick J. Wong
Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2013-03-16 09:51:49 +0800

15 Mar, 2013

6 commits

7c2ec3f07 Btrfs: fix warning when creating snapshots ... Browse Code »

Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction. Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2013-03-15 02:57:30 +0800
720f1e206 Btrfs: return as soon as possible when edquot happens ... Browse Code »

If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.

Signed-off-by: Wang Shilong
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Wang Shilong
2013-03-15 02:57:29 +0800
492104c86 Btrfs: return EIO if we have extent tree corruption ... Browse Code »

The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing. Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-03-15 02:57:29 +0800
bc178622d btrfs: use rcu_barrier() to wait for bdev puts at unmount ... Browse Code »

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Eric Sandeen
2013-03-15 02:57:29 +0800
d340d2475 Btrfs: remove btrfs_try_spin_lock ... Browse Code »

Remove a useless function declaration

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Liu Bo
2013-03-15 02:57:10 +0800
a09a0a705 Btrfs: get better concurrency for snapshot-aware defrag work ... Browse Code »

Using spinning case instead of blocking will result in better concurrency
overall.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Liu Bo
2013-03-15 02:50:19 +0800

07 Mar, 2013

3 commits

de3cb945d Btrfs: improve the delayed inode throttling ... Browse Code »

The delayed inode code batches up changes to the btree in hopes of doing
them in bulk. As the changes build up, processes kick off worker
threads and wait for them to make progress.

The current code kicks off an async work queue item for each delayed
node, which creates a lot of churn. It also uses a fixed 1 HZ waiting
period for the throttle, which allows us to build a lot of pending
work and can slow down the commit.

This changes us to watch a sequence counter as it is bumped during the
operations. We kick off fewer work items and have each work item do
more work.

Signed-off-by: Chris Mason

Chris Mason
2013-03-07 20:52:40 +0800
3a01aa7a2 Btrfs: fix a mismerge in btrfs_balance() ... Browse Code »

Raid56 merge (merge commit e942f88) had mistakenly removed a call to
__cancel_balance(), which resulted in balance not cleaning up after itself
after a successful finish. (Cleanup includes switching the state, removing
the balance item and releasing mut_ex_op testnset lock.) Bring it back.

Reported-by: David Sterba
Signed-off-by: Ilya Dryomov
Signed-off-by: Chris Mason

Ilya Dryomov
2013-03-07 11:03:16 +0800
2cc65e3e5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btr… ... Browse Code »

…fs-next into for-linus-3.9

Chris Mason
2013-03-07 08:46:29 +0800

06 Mar, 2013

1 commit

154ea2893 Btrfs: enforce min_bytes parameter during extent allocation ... Browse Code »

Commit 24542bf7ea5e4fdfdb5157ff544c093fa4dcb536 changed preallocation of
extents to cap the max size we try to allocate. It's a valid change,
but the extent reservation code is also used by balance, and that
can't tolerate a smaller extent being allocated.

__btrfs_prealloc_file_range already has a min_size parameter, which is
used by relocation to request a specific extent size. This commit
adds an extra check to enforce that minimum extent size.

Signed-off-by: Chris Mason
Reported-by: Stefan Behrens

Chris Mason
2013-03-06 00:30:16 +0800

05 Mar, 2013

10 commits

9b53157aa Btrfs: allow running defrag in parallel to administrative tasks ... Browse Code »

Commit 5ac00add added a testnset mutex and code that disallows
running administrative tasks in parallel. It is prevented that
the device add/delete/balance/replace/resize operations are
started in parallel. By mistake, the defragmentation operation
was included in the check for mutually exclusiveness as well.
This is fixed with this commit.

Signed-off-by: Stefan Behrens
Signed-off-by: Josef Bacik

Stefan Behrens
2013-03-05 05:33:24 +0800
66b6135b7 Btrfs: avoid deadlock on transaction waiting list ... Browse Code »

Only let one trans handle to wait for other handles, otherwise we
will get ABBA issues.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:23 +0800
0f788c581 Btrfs: do not BUG_ON on aborted situation ... Browse Code »

Btrfs balance can easily hit BUG_ON in these places, but we want
to it bail out gracefully after we force the whole filesystem to
readonly. So we use btrfs_std_error hook in place of BUG_ON.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:23 +0800
288189471 Btrfs: do not BUG_ON in prepare_to_reloc ... Browse Code »

We can bail out from here gracefully instead of a cold BUG_ON.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:23 +0800
e1a126705 Btrfs: free all recorded tree blocks on error ... Browse Code »

We've missed the 'free blocks' part on ENOMEM error.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:23 +0800
aca1bba6f Btrfs: build up error handling for merge_reloc_roots ... Browse Code »

We first use btrfs_std_error hook to replace with BUG_ON, and we
also need to cleanup what is left, including reloc roots rbtree
and reloc roots list.
Here we use a helper function to cleanup both rbtree and list, and
since this function can also be used in the balance recover path,
we also make the change as well to keep code simple.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:22 +0800
8f71f3e0e Btrfs: check for NULL pointer in updating reloc roots ... Browse Code »

Add a check for NULL pointer to avoid invalid reference.

Signed-off-by: Liu Bo
Signed-off-by: Josef Bacik

Liu Bo
2013-03-05 05:33:22 +0800
00d71c9c1 Btrfs: fix unclosed transaction handler when the async transaction commitment fails ... Browse Code »

If the async transaction commitment failed, we need close the
current transaction handler, or the current transaction will be
blocked to commit because of this orphan handler.

We fix the problem by doing sync transaction commitment, that is
to invoke btrfs_commit_transaction().

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-03-05 05:33:22 +0800
aec8030a8 Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails ... Browse Code »

There are several bugs at error path of create_snapshot() when the
transaction commitment failed.
- access the freed transaction handler. At the end of the
transaction commitment, the transaction handler was freed, so we
should not access it after the transaction commitment.
- we were not aware of the error which happened during the snapshot
creation if we submitted a async transaction commitment.
- pending snapshot access vs pending snapshot free. when something
wrong happened after we submitted a async transaction commitment,
the transaction committer would cleanup the pending snapshots and
free them. But the snapshot creators were not aware of it, they
would access the freed pending snapshots.

This patch fixes the above problems by:
- remove the dangerous code that accessed the freed handler
- assign ->error if the error happens during the snapshot creation
- the transaction committer doesn't free the pending snapshots,
just assigns the error number and evicts them before we unblock
the transaction.

Reported-by: Dan Carpenter
Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-03-05 05:33:22 +0800
9bf7a4890 Btrfs: use set_nlink if our i_nlink is 0 ... Browse Code »

We need to inc the nlink of deleted entries when running replay so we can do the
unlink on the fs_root and get everything cleaned up and then have the orphan
cleanup do the right thing. The problem is inc_nlink complains about this, even
thought it still does the right thing. So use set_nlink() if our i_nlink is 0
to keep users from seeing the warnings during log replay. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-03-05 05:30:06 +0800

03 Mar, 2013

1 commit

d7011f5b9 btrfs/raid56: Add missing #include <linux/vmalloc.h> ... Browse Code »

tilegx_defconfig:

fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven
Signed-off-by: Chris Mason

Geert Uytterhoeven
2013-03-03 19:53:41 +0800

02 Mar, 2013

2 commits

180e001cd btrfs: fixup/remove module.h usage as required ... Browse Code »

We want to avoid module.h where posible, since it in turn includes
nearly all of header space. This means removing it where it is not
required, and using export.h where we are only exporting symbols via
EXPORT_SYMBOL and friends.

Signed-off-by: Paul Gortmaker
Signed-off-by: Chris Mason

Paul Gortmaker
2013-03-02 04:01:01 +0800
124fe663f Btrfs: delete inline extents when we find them during logging ... Browse Code »

Apparently when we do inline extents we allow the data to overlap the last chunk
of the btrfs_file_extent_item, which means that we can possibly have a
btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
This messes with us when we try to overwrite the extent when logging new extents
since we expect for it to be the right size. To fix this just delete the item
and try to do the insert again which will give us the proper sized
btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer
would blow up because we're trying to write past the end of the leaf. Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik

Josef Bacik
2013-03-02 00:47:21 +0800

01 Mar, 2013

5 commits

83c8266ac btrfs: try harder to allocate raid56 stripe cache ... Browse Code »

The stripe hash table is large, starting with allocation order 4 and can go as
high as order 7 in case lock debugging is turned on and structure padding
happens.

Observed mount failure:

mount: page allocation failure: order:7, mode:0x200050
Pid: 8234, comm: mount Tainted: G W 3.8.0-default+ #267
Call Trace:
[] warn_alloc_failed+0xf3/0x140
[] ? __alloc_pages_direct_compact+0x92/0x250
[] __alloc_pages_nodemask+0x733/0x9d0
[] ? cache_alloc_refill+0x3f8/0x840
[] cache_alloc_refill+0x43c/0x840
[] ? is_kernel_percpu_address+0x4b/0x90
[] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
[] kmem_cache_alloc_trace+0x247/0x270
[] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
[] open_ctree+0xb2f/0x1f90 [btrfs]
[] ? string+0x49/0xe0
[] ? vsnprintf+0x443/0x5d0
[] btrfs_mount+0x526/0x600 [btrfs]
[] ? cache_alloc_debugcheck_after+0x4c/0x200
[] mount_fs+0x20/0xe0
[] vfs_kern_mount+0x76/0x120
[] do_mount+0x386/0x980
[] ? strndup_user+0x5b/0x80
[] sys_mount+0x90/0xe0
[] system_call_fastpath+0x16/0x1b

Signed-off-by: David Sterba
Signed-off-by: Josef Bacik

David Sterba
2013-03-01 23:13:05 +0800
88e081bf8 Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic ... Browse Code »

The original code is a little confusing and not clear, The right
way to deal with the kernel code like this:
[...]
if (ret)
goto out;
[...]

So i move the common clean_up code to the place labeled with
out_fail, this will be easier to maintain.

Signed-off-by: Wang Shilong
Signed-off-by: Josef Bacik

Wang Shilong
2013-03-01 23:13:04 +0800
a9870c0e0 Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails ... Browse Code »

commit eb6b88d92c6df083dd09a8c471011e3788dfd7c6 leads into another bug.
If it is just because qgroup_reserve fails, the function btrfs_qgroup_free
should not be called, otherwise, it will cause the wrong quota accounting.

Signed-off-by: Wang Shilong
Signed-off-by: Josef Bacik

Wang Shilong
2013-03-01 23:13:04 +0800
235fdb8ef Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree ... Browse Code »

The check work has been done just before the function btrfs_clean_quota_tree
is called, it is not necessary to check it again, remove it.

Signed-off-by: Wang Shilong
Signed-off-by: Josef Bacik

Wang Shilong
2013-03-01 23:13:04 +0800
84cbe2f72 Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails ... Browse Code »

Return ENOMEM rather trigger BUG_ON, fix it.

Signed-off-by: Wang Shilong
Reviewed-by: Miao Xie
Reviewed-by: Zach Brown
Signed-off-by: Josef Bacik

Wang Shilong
2013-03-01 23:13:04 +0800