Eric Lee / smarc-fsl-linux-kernel

01 Dec, 2011

10 commits

be064d113 Btrfs: skip allocation attempt from empty cluster

If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
425d83156 Btrfs: skip block groups without enough space for a cluster

We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
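
The upfront check described in 425d83156 might look roughly like the sketch below. The struct, the function name, and the single cluster_bytes parameter are illustrative assumptions, not the kernel's actual code. The point is only that the room for the cluster is tested together with the requested block, and that a caller in the last-resort unclustered pass is assumed to pass 0 for the cluster size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block group; not the kernel struct. */
struct block_group_sketch {
    uint64_t free_space; /* bytes currently free in the group */
};

/*
 * Return true if the group is worth trying at all.
 * cluster_bytes is the extra room wanted for a cluster. A caller in the
 * last-resort (unclustered) pass is assumed to pass 0 here, so the group
 * can still be used for the requested block alone.
 */
static bool group_worth_trying(const struct block_group_sketch *bg,
                               uint64_t num_bytes, uint64_t cluster_bytes)
{
    assert(bg != NULL);
    return bg->free_space >= num_bytes + cluster_bytes;
}
```

With 8192 bytes free, a 4096-byte request passes when no cluster is wanted and is skipped when a 64 KiB cluster is wanted.
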
1b22bad77 Btrfs: start search for new cluster at the beginning

Instead of starting at zero (offset is always zero), request a cluster
starting at search_start, that denotes the beginning of the current
block group.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
b78d09bce Btrfs: reset cluster's max_size when creating bitmap

The field that indicates the size of the largest contiguous chunk of
free space in the cluster is not initialized when setting up bitmaps,
it's only increased when we find a larger contiguous chunk. We end up
retaining a larger value than appropriate for highly-fragmented
clusters, which may cause pointless searches for large contiguous
groups, and even cause clusters that do not meet the density
requirements to be set up.

Signed-off-by: Alexandre Oliva
Signed-off-by: Chris Mason

Alexandre Oliva
2011-12-01 02:43:00 +0800
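
To illustrate the max_size problem from b78d09bce, here is a minimal sketch of a bitmap scan that tracks the largest contiguous free run. The types and names are hypothetical, and the bitmap is simplified to one byte per unit. The line that matters is the reset before the scan: without it, a stale, larger value from an earlier setup would survive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cluster; only the field of interest. */
struct cluster_sketch {
    uint64_t max_size; /* largest contiguous free chunk seen, in bytes */
};

/*
 * Scan a simplified bitmap (one byte per unit, nonzero means free) and
 * record the largest contiguous free run in c->max_size.
 */
static void scan_bitmap_sketch(struct cluster_sketch *c,
                               const unsigned char *free_units,
                               size_t nunits, uint64_t unit_bytes)
{
    uint64_t run = 0;
    size_t i;

    assert(c != NULL && free_units != NULL);
    c->max_size = 0; /* reset first, so an old value cannot leak through */

    for (i = 0; i < nunits; i++) {
        if (free_units[i]) {
            run += unit_bytes;
            if (run > c->max_size)
                c->max_size = run;
        } else {
            run = 0;
        }
    }
}
```
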
f2d0f6765 Btrfs: initialize new bitmaps' list

We're failing to create clusters with bitmaps because
setup_cluster_no_bitmap checks that the list is empty before inserting
the bitmap entry in the list for setup_cluster_bitmap, but the list
field is only initialized when it is restored from the on-disk free
space cache, or when it is written out to disk.

Besides a potential race condition due to the multiple use of the list
field, filesystem performance severely degrades over time: as we use
up all non-bitmap free extents, the try-to-set-up-cluster dance is
done at every metadata block allocation. For every block group, we
fail to set up a cluster, and after failing on them all up to twice,
we fall back to the much slower unclustered allocation.

To make matters worse, before the unclustered allocation, we try to
create new block groups until we reach the 1% threshold, which
introduces additional bitmaps and thus block groups that we'll iterate
over at each metadata block request.

Alexandre Oliva
2011-12-01 01:46:06 +0800
b772a86ea Btrfs: fix oops when calling statfs on readonly device

To reproduce this bug:

# dd if=/dev/zero of=img bs=1M count=256
# mkfs.btrfs img
# losetup -r /dev/loop1 img
# mount /dev/loop1 /mnt
OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
devices:

# df -h /dev/loop1
Filesystem Size Used Avail Use% Mounted on
/dev/loop1 250M 28K 238M 1% /mnt

Signed-off-by: Li Zefan

Li Zefan
2011-12-01 01:46:05 +0800
ece7d20e8 Btrfs: Don't error on resizing FS to same size

It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed. User app GParted trips
over this error. Allow it by bypassing the shrink or grow operation.

Signed-off-by: Mike Fleetwood

Mike Fleetwood
2011-12-01 01:46:04 +0800
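
The behavior described in ece7d20e8 can be pictured as a three-way decision in which the equal-size case is a successful no-op. This is a sketch only. The enum and function names are made up for illustration and do not come from the patch.

```c
#include <stdint.h>

/* Hypothetical action codes for a resize request. */
enum resize_action {
    RESIZE_NOTHING, /* sizes match: bypass both operations, report success */
    RESIZE_GROW,
    RESIZE_SHRINK
};

/* Decide what a resize request needs to do, given old and new sizes. */
static enum resize_action pick_resize_action(uint64_t old_size,
                                             uint64_t new_size)
{
    if (new_size > old_size)
        return RESIZE_GROW;
    if (new_size < old_size)
        return RESIZE_SHRINK;
    return RESIZE_NOTHING;
}
```
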
aa38a711a Btrfs: fix deadlock on metadata reservation when evicting a inode

When I ran xfstests, I found that the test tasks were blocked on meta-data
reservation.

Debugging showed the cause of this bug:

    start transaction
            |
            v
    reserve meta-data space
            |
            v
    flush delay allocation -> iput inode -> evict inode
            ^                                    |
            |                                    v
    wait for delay allocation flush

Miao Xie
2011-12-01 01:46:03 +0800
b52f75a59 Fix URL of btrfs-progs git repository in docs

The location of the btrfs-progs repository has been changed.
This patch updates the documentation accordingly.

Signed-off-by: Arnd Hannemann

Arnd Hannemann
2011-12-01 01:46:02 +0800
26bdef541 btrfs scrub: handle -ENOMEM from init_ipath()

init_ipath() can return an ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter

Dan Carpenter
2011-12-01 01:46:01 +0800
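
For readers unfamiliar with the ERR_PTR convention mentioned in 26bdef541, the following is a userspace sketch of the idea: an error code is encoded in the pointer value itself, so the caller must test for it before dereferencing. The lowercase helpers below imitate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(). The ipath struct and both functions are hypothetical stand-ins, not the scrub code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Userspace imitations of the kernel helpers; names are lowercase on purpose. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    /* The top MAX_ERRNO_SKETCH values of the address space encode errors. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Hypothetical object and allocator standing in for init_ipath(). */
struct ipath_sketch {
    int unused;
};

static struct ipath_sketch *init_ipath_sketch(int simulate_oom)
{
    struct ipath_sketch *p;

    if (simulate_oom)
        return err_ptr(-ENOMEM);
    p = calloc(1, sizeof(*p));
    return p ? p : err_ptr(-ENOMEM);
}

/* Caller pattern: check is_err() and propagate the code before any use. */
static int use_ipath_sketch(int simulate_oom)
{
    struct ipath_sketch *ipath = init_ipath_sketch(simulate_oom);

    if (is_err(ipath))
        return (int)ptr_err(ipath);
    free(ipath);
    return 0;
}
```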

22 Nov, 2011

1 commit

24a703139 Btrfs: remove free-space-cache.c WARN during log replay

The log replay code only partially loads block groups, since
the block group caching code is able to detect and deal with
extents the logging code has pinned down.

While the logging code is pinning down block groups, there is
a bogus WARN_ON we're hitting if the code wasn't able to find
an extent in the cache. This commit removes the warning because
it can happen any time there isn't a valid free space cache
for that block group.

Signed-off-by: Chris Mason

Chris Mason
2011-11-22 03:57:33 +0800

20 Nov, 2011

10 commits

4d479cf01 Btrfs: sectorsize align offsets in fiemap

We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
else that calls btrfs_get_extent while running xfstests 13 in a loop. This is
because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
which will end up adding mappings that are not sectorsize aligned, which will
cause problems in some cases for subsequent calls to btrfs_get_extent for
similar areas that are sectorsize aligned. With this patch I ran xfstests 13 in
a loop for a couple of hours and didn't hit the problem that I could previously
hit in at most 20 minutes. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-11-20 20:42:17 +0800
f7d61dcd6 Btrfs: clear pages dirty for io and set them extent mapped

When doing the io_ctl helpers to clean up the free space cache stuff I stopped
using our normal prepare_pages stuff, which means I of course forgot to do
things like set the pages extent mapped, which will cause us all sorts of
wonderful problems. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-11-20 20:42:17 +0800
291c7d2f5 Btrfs: wait on caching if we're loading the free space cache

We've been hitting panics when running xfstest 13 in a loop for long periods of
time. And actually this problem has always existed so we've been hitting these
things randomly for a while. Basically what happens is we get a thread coming
into the allocator and reading the space cache off of disk and adding the
entries to the free space cache as we go. Then we get another thread that comes
in and tries to allocate from that block group. Since block_group->cached !=
BTRFS_CACHE_NO it goes ahead and tries to do the allocation. We do this because
if we're doing the old slow way of caching we don't want to hold people up and
wait for everything to finish. The problem with this is we could end up
discarding the space cache at some arbitrary point in the future, which means we
could very well end up allocating space that is either bad, or when the real
caching happens it could end up thinking the space isn't in use when it really
is and cause all sorts of other problems.

The solution is to add a new flag to indicate we are loading the free space
cache from disk, and always try to cache the block group if cache->cached !=
BTRFS_CACHE_FINISHED. That way if we are loading the space cache anybody else
who tries to allocate from the block group will have to wait until it's finished
to make sure it completes successfully. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-11-20 20:42:16 +0800
5bb146823 Btrfs: prefix resize related printks with btrfs:

For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann
Signed-off-by: Chris Mason

Arnd Hannemann
2011-11-20 20:42:16 +0800
fadc0d8be btrfs: fix stat blocks accounting

Round inode bytes and delalloc bytes up to real blocksize before
converting to sector size. Otherwise, e.g., files smaller than 512 bytes
are reported with zero blocks due to incorrect rounding.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-11-20 20:42:15 +0800
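
The rounding issue in fadc0d8be is easy to see with numbers. The sketch below contrasts a naive conversion with one that rounds each byte count up to the block size before shifting to 512-byte sectors. The function names are invented for illustration, and the 4096-byte block size in the example is an assumption.

```c
#include <stdint.h>

/* Round n up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_u64(uint64_t n, uint64_t align)
{
    return (n + align - 1) / align * align;
}

/* Naive: convert raw byte counts straight to 512-byte sectors. */
static uint64_t stat_blocks_naive(uint64_t inode_bytes, uint64_t delalloc_bytes)
{
    return (inode_bytes + delalloc_bytes) >> 9;
}

/* Rounded: align each count to the real block size first, then convert. */
static uint64_t stat_blocks_rounded(uint64_t inode_bytes,
                                    uint64_t delalloc_bytes,
                                    uint64_t blocksize)
{
    return (round_up_u64(inode_bytes, blocksize) +
            round_up_u64(delalloc_bytes, blocksize)) >> 9;
}
```

For a 100-byte file, the naive version yields 0 sectors, while the rounded version with a 4096-byte block size yields 8.
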
52621cb6e Btrfs: avoid unnecessary bitmap search for cluster setup

setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluster_bitmap().

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-11-20 20:42:15 +0800
0f0fbf1d0 Btrfs: fix to search one more bitmap for cluster setup

Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to setup a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is, the list is constructed in setup_cluster_no_bitmap(),
and only bitmaps with bitmap_entry->offset >= offset will be added
into the list, and the very bitmap that covers offset has
bitmap_entry->offset <= offset.

Signed-off-by: Chris Mason

Li Zefan
2011-11-20 20:42:14 +0800
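
The example in 0f0fbf1d0 (bitmaps [0, 256] and [256, 512], search offset 100) can be reproduced with a small sketch. The struct and both collector functions are hypothetical simplifications over a plain array. The actual patch works on the free space cache's own structures and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap entry covering the range [offset, offset + bytes). */
struct bitmap_sketch {
    uint64_t offset;
    uint64_t bytes;
};

/* Old behavior as described: keep only bitmaps starting at or after offset. */
static size_t collect_from_offset(const struct bitmap_sketch *all, size_t n,
                                  uint64_t offset,
                                  const struct bitmap_sketch **out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++)
        if (all[i].offset >= offset)
            out[k++] = &all[i];
    return k;
}

/* Fixed behavior: also keep the bitmap whose range covers offset. */
static size_t collect_covering(const struct bitmap_sketch *all, size_t n,
                               uint64_t offset,
                               const struct bitmap_sketch **out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++) {
        int starts_after = all[i].offset >= offset;
        int covers = all[i].offset < offset &&
                     offset - all[i].offset < all[i].bytes;

        if (starts_after || covers)
            out[k++] = &all[i];
    }
    return k;
}
```

With the two bitmaps from the commit message and offset 100, the first function collects only the bitmap at 256, while the second also collects the one at 0.
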
32240a913 btrfs: mirror_num should be int, not u64

My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2011-11-20 20:42:14 +0800
745c4d8e1 btrfs: Fix up 32/64-bit compatibility for new ioctls

This patch casts to unsigned long before casting to a pointer and fixes
the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Jeff Mahoney
Signed-off-by: Chris Mason

Jeff Mahoney
2011-11-20 20:42:13 +0800
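
The cast pattern from 745c4d8e1 is sketched below. Casting a 64-bit integer directly to a pointer (or back) triggers the quoted warnings on 32-bit builds, because the sizes differ. Going through unsigned long first makes the width change explicit. The helper names are hypothetical, and the sketch assumes unsigned long is as wide as a pointer, which holds on Linux ABIs but not everywhere.

```c
#include <stdint.h>

/* Convert a 64-bit field (e.g. from an ioctl argument struct) to a pointer. */
static void *u64_to_ptr_sketch(uint64_t value)
{
    /* First narrow (or keep) to pointer width, then cast to a pointer. */
    return (void *)(unsigned long)value;
}

/* Convert a pointer to a 64-bit field without a size-mismatch warning. */
static uint64_t ptr_to_u64_sketch(const void *ptr)
{
    /* First cast to a pointer-sized integer, then widen to 64 bits. */
    return (uint64_t)(unsigned long)ptr;
}
```
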
387125fc7 Btrfs: fix barrier flushes

When btrfs is writing the super blocks, it sends barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down. When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures. We fix it
with some new code to send down empty barriers to all devices before
writing the first super.

Alexandre Oliva found the multi-device bug. Arne Jansen did the async
barrier loop.

Signed-off-by: Chris Mason
Reported-by: Alexandre Oliva

Chris Mason
2011-11-20 20:21:14 +0800
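
The ordering that 387125fc7 asks for can be modeled as a toy event trace: flushes go out to every device, all of them complete, and only then is any super block written. Everything in the sketch is hypothetical, including the event names, the trace struct, and the function names. It models only the ordering, not real I/O.

```c
#include <stddef.h>

enum event_kind { EV_FLUSH_SENT, EV_FLUSH_DONE, EV_SUPER_WRITTEN };

struct event {
    enum event_kind kind;
    int dev;
};

#define MAX_EVENTS 64

struct trace {
    struct event ev[MAX_EVENTS];
    size_t n;
};

static void record(struct trace *t, enum event_kind kind, int dev)
{
    if (t->n < MAX_EVENTS) {
        t->ev[t->n].kind = kind;
        t->ev[t->n].dev = dev;
        t->n++;
    }
}

/* Model of the intended order for ndevs devices. */
static void write_supers_sketch(struct trace *t, int ndevs)
{
    int dev;

    for (dev = 0; dev < ndevs; dev++)
        record(t, EV_FLUSH_SENT, dev);    /* send a barrier to every device */
    for (dev = 0; dev < ndevs; dev++)
        record(t, EV_FLUSH_DONE, dev);    /* wait for all of them to finish */
    for (dev = 0; dev < ndevs; dev++)
        record(t, EV_SUPER_WRITTEN, dev); /* only now write the supers */
}

/* Index of the first event of the given kind, or -1 if there is none. */
static int first_index(const struct trace *t, enum event_kind kind)
{
    size_t i;

    for (i = 0; i < t->n; i++)
        if (t->ev[i].kind == kind)
            return (int)i;
    return -1;
}

/* Index of the last event of the given kind, or -1 if there is none. */
static int last_index(const struct trace *t, enum event_kind kind)
{
    size_t i;

    for (i = t->n; i > 0; i--)
        if (t->ev[i - 1].kind == kind)
            return (int)(i - 1);
    return -1;
}
```

The property of interest is that the last flush completion precedes the first super block write.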

15 Nov, 2011

1 commit

f1ebcc74d Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.

But there are two cases that lead to tree corruption:

1) multi-thread snapshots can commit several snapshots in a transaction,
and this may change the src root when processing the following pending
snapshots, which leads to corruption of the former snapshots;

2) the free inode cache was changing the roots when it wrote the cache,
which leads to corruption.

This fixes things by making sure we force COW the block after we create a
snapshot while committing a transaction, so any changes to the roots
will result in COW, and we get all the fs roots and snapshot roots to be
consistent.

Signed-off-by: Liu Bo
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Liu Bo
2011-11-15 22:53:28 +0800

11 Nov, 2011

11 commits

8965593e4 btrfs: rename the option to nospace_cache

Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.

The option was introduced during the -rc1 cycle and has not been
widely used, so the rename is safe.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-11-11 23:14:57 +0800
69f4cb526 Btrfs: handle bio_add_page failure gracefully in scrub

Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately
dm based targets accept only one page per bio, so scrub always
fails on them. This patch just submits the current bio when an error is
encountered and starts a new one.

Signed-off-by: Arne Jansen
Signed-off-by: Chris Mason

Arne Jansen
2011-11-11 21:17:10 +0800
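
The "submit and start a new one" approach from 69f4cb526 is a general batching pattern, sketched below with a toy bio that has a fixed page capacity. The struct and functions are hypothetical stand-ins, and the real bio_add_page() has a different signature. The sketch only shows that a full batch is treated as a signal to submit, not as an error.

```c
#include <assert.h>

/* Toy stand-in for a bio: a batch with a per-target page limit. */
struct bio_sketch {
    int pages;
    int max_pages;
};

/* Returns 1 if the page was added, 0 if the batch is already full. */
static int bio_add_page_sketch(struct bio_sketch *bio)
{
    if (bio->pages >= bio->max_pages)
        return 0;
    bio->pages++;
    return 1;
}

/*
 * Queue npages pages and return how many bios were submitted.
 * When adding fails, submit the current bio and retry on a fresh one.
 */
static int queue_pages_sketch(int npages, int max_pages_per_bio)
{
    struct bio_sketch bio = { 0, max_pages_per_bio };
    int submitted = 0;
    int i;

    assert(max_pages_per_bio >= 1); /* an empty bio must accept one page */

    for (i = 0; i < npages; i++) {
        if (!bio_add_page_sketch(&bio)) {
            submitted++;   /* submit the full bio */
            bio.pages = 0; /* start a new one */
            bio_add_page_sketch(&bio);
        }
    }
    if (bio.pages > 0)
        submitted++; /* submit the final, partly filled bio */
    return submitted;
}
```

With a limit of one page per bio, as described for dm based targets, five pages result in five submitted bios and no failure.
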
62f30c546 Btrfs: fix deadlock caused by the race between relocation

We cannot do a flushable reservation for the relocation when we create a snapshot,
because it may make the transaction commit task and the flush task wait for
each other, causing a deadlock.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
2f120c05e Btrfs: only map pages if we know we need them when reading the space cache

People have been running into a warning when loading space cache because the
page is already mapped when trying to read in a bitmap. The way we read in
entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry
maps the entries if it needs to, and if it hits the end of the page it simply
unmaps the page. That way we can unconditionally unmap the io_ctl before
reading in the bitmap and we should stop hitting these warnings. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-11 09:45:05 +0800
76b9e23d2 Btrfs: fix orphan backref nodes

If the root node of a fs/file tree is in the block group that is
being relocated, but the others are not in the other block groups,
and we create a snapshot for this tree between the time the relocation
tree creation ends and the time ->create_reloc_tree is set to 0, Btrfs will
create some backref nodes that are the lowest nodes of the backrefs cache.
But we forget to add them to the ->leaves list of the backref cache
and deal with them, and in the end they trigger BUG_ON().

kernel BUG at fs/btrfs/relocation.c:239!

This patch fixes it by adding them into ->leaves list of backref cache.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
61b520a9d Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}

btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
3254c8761 Btrfs: fix unreleased path in btrfs_orphan_cleanup()

When we ran a stress test for space relocation, a deadlock happened.
Debugging showed that we forgot to release the read lock on the extent
buffers in btrfs_orphan_cleanup() before ending the transaction handle.
The transaction commit task waited for the task that called
btrfs_orphan_cleanup() to unlock the extent buffer, while that task waited
for the commit task to finish the transaction commit, so they deadlocked.
Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
ba38eb4de Btrfs: fix no reserved space for writing out inode cache

The inode cache forgets to reserve space when it is written out. When
we run a stress test such as synctest, this triggers the WARN_ON() in
use_block_rsv().

WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
...
Call Trace:
warn_slowpath_common+0x80/0x98
warn_slowpath_null+0x15/0x17
btrfs_alloc_free_block+0xbf/0x281 [btrfs]
? __set_page_dirty_nobuffers+0xfe/0x108
__btrfs_cow_block+0x118/0x3b5 [btrfs]
btrfs_cow_block+0x103/0x14e [btrfs]
btrfs_search_slot+0x249/0x6a4 [btrfs]
btrfs_lookup_inode+0x2a/0x8a [btrfs]
btrfs_update_inode+0xaa/0x141 [btrfs]
btrfs_save_ino_cache+0xea/0x202 [btrfs]
? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
commit_fs_roots+0xaa/0x158 [btrfs]
btrfs_commit_transaction+0x405/0x731 [btrfs]
? wake_up_bit+0x25/0x25
? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
btrfs_sync_file+0x16a/0x198 [btrfs]
? mntput+0x21/0x23
vfs_fsync_range+0x18/0x21
vfs_fsync+0x17/0x19
do_fsync+0x29/0x3e
sys_fsync+0xb/0xf
system_call_fastpath+0x16/0x1b

Sometimes it also triggers the BUG_ON() in the reservation code of the
delayed inode.

So we must reserve enough space for the inode cache.

Note: If we cannot reserve enough space for the inode cache, we
give up writing it out.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:04 +0800
924cd8fbe Btrfs: fix nocow when deleting the item

btrfs_previous_item() only searches the b+ tree and does not COW the nodes or
leaves, so if we modify its result, the meta-data will be broken. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:04 +0800
f7d572188 Merge branch 'mount-fixes' of git://github.com/idryomov/btrfs-unstable into integration

Chris Mason
2011-11-11 09:42:53 +0800
2115133f8 Btrfs: tweak the delayed inode reservations again

Josef sent along an incremental to the inode reservation
code to make sure we try and fall back to directly updating
the inode item if things go horribly wrong.

This reworks that patch slightly, adding a fallback function
that will always try to update the inode item directly without
going through the delayed_inode code.

Signed-off-by: Chris Mason

Chris Mason
2011-11-11 09:39:08 +0800

10 Nov, 2011

5 commits

04d21a244 Btrfs: rework error handling in btrfs_mount()

Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer
dereference on error paths, also we would leave all devices busy and
leak fs_info with all sub-structures on error when trying to mount an
already mounted fs to a different directory.

Fix this by doing all allocations before trying to open any of the
devices, adjust error path for mount-already-mounted-fs case.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:39 +0800
586e46e28 Btrfs: close devices on all error paths in open_ctree()

Fix a bug introduced by 7e662854 where we would leave devices busy on
certain error paths in open_ctree(). fs_info is guaranteed to be
non-NULL now so it's safe to dereference it on all error paths.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
4d34b2789 Btrfs: avoid null dereference and leaks when bailing from open_ctree()

Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any
of the tree roots (first 'goto fail' in open_ctree()) we would
dereference a NULL fs_info pointer in free_fs_info(). Secondly, after
failures from init_srcu_struct(), setup_bdi() and new_inode() we would
leak all earlier allocated roots: fs_info fields haven't been
initialized yet so free_fs_info() is rendered useless.

Fix this by initializing fs_info pointer and fs_info fields before any
allocations happen.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
f23c8af8c Btrfs: fix subvol_name leak on error in btrfs_mount()

btrfs_parse_early_options() can fail due to error while scanning devices
(-o device= option), but still strdup() subvol_name string:

mount -o subvol=SUBV,device=BAD_DEVICE

So free subvol_name string on error.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
a90e8b6fb Btrfs: fix memory leak in btrfs_parse_early_options()

Don't leak subvol_name string in case multiple subvol= options are
given. "The latest option is effective" behavior (consistent with
subvolid= and subvolrootid= options) is preserved.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
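
The leak fixed in a90e8b6fb follows a common pattern: a string option is duplicated each time it appears, and the earlier copy must be freed before the pointer is overwritten. The sketch below shows one way to keep "last option wins" without leaking. The helper names are hypothetical, and a local duplicate function is used instead of the kernel's string helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Portable duplicate of a NUL-terminated string; returns NULL on failure. */
static char *dup_string_sketch(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Store a copy of value in *subvol_name, releasing any earlier copy.
 * Returns 0 on success and -1 if the allocation fails, in which case the
 * earlier value is left untouched.
 */
static int set_subvol_name_sketch(char **subvol_name, const char *value)
{
    char *copy = dup_string_sketch(value);

    if (!copy)
        return -1;
    free(*subvol_name); /* drop the copy from an earlier subvol= option */
    *subvol_name = copy;
    return 0;
}
```

The caller starts with a NULL pointer, calls the setter once per subvol= occurrence, and frees the final value when done.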

09 Nov, 2011

2 commits

7fd2ae21a Btrfs: fix our reservations for updating an inode when completing io

People have been reporting ENOSPC crashes in finish_ordered_io. This is because
we try to steal from the delalloc block rsv to satisfy a reservation to update
the inode. The problem with this is we don't explicitly save space for updating
the inode when doing delalloc. This is kind of a problem and we've gotten away
with this because way back when we just stole from the delalloc reserve without
any questions, and this worked out fine because generally speaking the leaf had
been modified either by the mtime update when we did the original write or
because we just updated the leaf when we inserted the file extent item, only on
rare occasions had the leaf not actually been modified, and that was still ok
because we'd just use a block or two out of the over-reservation that is
delalloc.

Then came the delayed inode stuff. This is amazing, except it wants a full
reservation for updating the inode since it may do it at some point down the
road after we've written the blocks and we have to recow everything again. This
worked out because the delayed inode stuff just stole from the global reserve,
that is until recently when I changed that because it caused other problems.

So here we are, we're doing everything right and being screwed for it. So take
an extra reservation for the inode at delalloc reservation time and carry it
through the life of the delalloc reservation. If we need it we can steal it in
the delayed inode stuff. If we have already stolen it try and do a normal
metadata reservation. If that fails try to steal from the delalloc reservation.
If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
solve this and in the meantime we'll steal from the global reserve.

With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
any problems.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-09 04:47:34 +0800
917c16b2b Btrfs: fix oops on NULL trans handle in btrfs_truncate

If we fail to reserve space in the transaction during truncate, we can
error out with a NULL trans handle. The cleanup code needs an extra
check to make sure we aren't trying to use the bad handle.

Signed-off-by: Chris Mason

Chris Mason
2011-11-09 03:49:59 +0800