Eric Lee / smarc-fsl-linux-kernel

20 Nov, 2011

3 commits

32240a913 btrfs: mirror_num should be int, not u64

My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2011-11-20 20:42:14 +0800
745c4d8e1 btrfs: Fix up 32/64-bit compatibility for new ioctls

This patch casts to unsigned long before casting to a pointer and fixes
the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
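
The cast pattern described above can be sketched in user space as follows. This is only an illustration, not the patched kernel code: `struct demo_args` and both helper names are hypothetical, and `uintptr_t` stands in for the kernel's `unsigned long` as the pointer-sized integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl-style argument block: the pointer travels as a
 * fixed-width 64-bit integer so the layout is the same on 32-bit and
 * 64-bit builds. */
struct demo_args {
    uint64_t buf;   /* buffer address stored as an integer */
};

/* Pointer -> u64: go through a pointer-sized integer first, so the
 * compiler never sees a direct cast between a pointer and an integer
 * of a different size. */
static uint64_t demo_ptr_to_u64(const void *p)
{
    assert(sizeof(uintptr_t) <= sizeof(uint64_t));
    return (uint64_t)(uintptr_t)p;
}

/* u64 -> pointer: convert to the pointer-sized integer, then cast. */
static void *demo_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

On a 32-bit build the intermediate cast is what silences the size-mismatch warnings; on a 64-bit build it is a no-op.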

Signed-off-by: Jeff Mahoney
Signed-off-by: Chris Mason

Jeff Mahoney
2011-11-20 20:42:13 +0800
387125fc7 Btrfs: fix barrier flushes

When btrfs is writing the super blocks, it sends barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down. When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures. We fix it
with some new code to send down empty barriers to all devices before
writing the first super.
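
The intended ordering can be sketched as a small simulation. Nothing here is the kernel implementation: the event log, `demo_write_supers()` and the other names are hypothetical, and real flushes are asynchronous I/O rather than log entries.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event { DEMO_FLUSH_SENT, DEMO_FLUSH_DONE, DEMO_SUPER_WRITTEN };

#define DEMO_MAX_EVENTS 64

/* Records the order in which the simulated steps happen. */
struct demo_log {
    enum demo_event ev[DEMO_MAX_EVENTS];
    size_t n;
};

static void demo_record(struct demo_log *log, enum demo_event e)
{
    assert(log->n < DEMO_MAX_EVENTS);
    log->ev[log->n++] = e;
}

/* Commit ordering: flush every device, wait for every flush, and only
 * then write the first super block. */
static void demo_write_supers(struct demo_log *log, size_t ndevs)
{
    size_t i;

    for (i = 0; i < ndevs; i++)     /* 1. send an empty flush to each device */
        demo_record(log, DEMO_FLUSH_SENT);
    for (i = 0; i < ndevs; i++)     /* 2. wait for all flushes to complete */
        demo_record(log, DEMO_FLUSH_DONE);
    for (i = 0; i < ndevs; i++)     /* 3. only now write the supers */
        demo_record(log, DEMO_SUPER_WRITTEN);
}
```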

Alexandre Oliva found the multi-device bug. Arne Jansen did the async
barrier loop.

Signed-off-by: Chris Mason
Reported-by: Alexandre Oliva

Chris Mason
2011-11-20 20:21:14 +0800

15 Nov, 2011

1 commit

f1ebcc74d Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

The btrfs snapshotting code requires that once a root has been
snapshotted, we don't change it during a commit.

But there are two cases that lead to tree corruption:

1) multi-threaded snapshots can commit several snapshots in one transaction,
and this may change the src root while the following pending snapshots
are processed, which corrupts the earlier snapshots;

2) the free inode cache was changing the roots when it wrote the cache,
which led to corruption.

This fixes things by forcing COW of the block after we create a
snapshot while committing a transaction, so that any change to the roots
results in COW and all the fs roots and snapshot roots stay
consistent.

Signed-off-by: Liu Bo
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Liu Bo
2011-11-15 22:53:28 +0800

11 Nov, 2011

11 commits

8965593e4 btrfs: rename the option to nospace_cache

Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.

The option was introduced during the -rc1 cycle and has not been
widely used, so the rename is safe.

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2011-11-11 23:14:57 +0800
69f4cb526 Btrfs: handle bio_add_page failure gracefully in scrub

Currently scrub fails with ENOMEM when bio_add_page fails. Unfortunately,
dm-based targets accept only one page per bio, so scrub always fails on
them. This patch submits the current bio when an error is encountered
and starts a new one.
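
A user-space sketch of the "submit and start a new one" fallback follows. It is not the scrub code: `struct demo_bio`, `demo_bio_add_page()` and the other names are hypothetical stand-ins, and the batch capacity models a target that accepts only a limited number of pages per bio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio: a batch that accepts at most `cap` pages. */
struct demo_bio {
    size_t cap;
    size_t npages;
};

struct demo_stats {
    size_t submitted_bios;
    size_t submitted_pages;
};

/* Returns 1 if the page was added, 0 if the batch refuses it. */
static int demo_bio_add_page(struct demo_bio *bio)
{
    if (bio->npages >= bio->cap)
        return 0;
    bio->npages++;
    return 1;
}

/* Submit whatever is queued and leave an empty batch behind. */
static void demo_submit(struct demo_bio *bio, struct demo_stats *st)
{
    if (bio->npages == 0)
        return;
    st->submitted_bios++;
    st->submitted_pages += bio->npages;
    bio->npages = 0;
}

/* Queue one page. A refused page is not an error by itself: submit the
 * current batch and retry on a fresh one. Returns -1 only if even an
 * empty batch refuses the page. */
static int demo_queue_page(struct demo_bio *bio, struct demo_stats *st)
{
    assert(bio && st);
    if (demo_bio_add_page(bio))
        return 0;
    demo_submit(bio, st);
    return demo_bio_add_page(bio) ? 0 : -1;
}
```

With a capacity of one page per batch, every page ends up in its own submission instead of the whole operation failing.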

Signed-off-by: Arne Jansen
Signed-off-by: Chris Mason

Arne Jansen
2011-11-11 21:17:10 +0800
62f30c546 Btrfs: fix deadlock caused by the race between relocation

We cannot do a flushable reservation for the relocation when we create a
snapshot, because it may make the transaction commit task and the flush task
wait for each other, causing a deadlock.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
2f120c05e Btrfs: only map pages if we know we need them when reading the space cache

People have been running into a warning when loading space cache because the
page is already mapped when trying to read in a bitmap. The way we read in
entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry
maps the entries if it needs to, and if it hits the end of the page it simply
unmaps the page. That way we can unconditionally unmap the io_ctl before
reading in the bitmap and we should stop hitting these warnings. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-11 09:45:05 +0800
76b9e23d2 Btrfs: fix orphan backref nodes

If the root node of a fs/file tree is in the block group that is
being relocated but the other nodes are in other block groups, and
we create a snapshot for this tree between the end of relocation tree
creation and the point where ->create_reloc_tree is set to 0, Btrfs will
create some backref nodes that are the lowest nodes of the backref cache.
But we forget to add them to the ->leaves list of the backref cache
and deal with them, and eventually they trigger BUG_ON().

kernel BUG at fs/btrfs/relocation.c:239!

This patch fixes it by adding them into ->leaves list of backref cache.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
61b520a9d Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}

btrfs_block_rsv_add{, _noflush}() have similar code, so abstract that code.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
3254c8761 Btrfs: fix unreleased path in btrfs_orphan_cleanup()

When we ran a stress test for space relocation, a deadlock happened.
Debugging showed that we forgot to release the read lock on the extent
buffers in btrfs_orphan_cleanup() before ending the transaction handle.
The transaction commit task then waited for the task that called
btrfs_orphan_cleanup() to unlock the extent buffer, while that task
waited for the commit task to finish the transaction commit, and
the deadlock happened. Fix it.

Signed-off-by: Miao Xie

Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800
ba38eb4de Btrfs: fix no reserved space for writing out inode cache

The inode cache forgets to reserve space when it is written out. When
we run a stress test such as synctest, this triggers WARN_ON() in
use_block_rsv().

WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
...
Call Trace:
[] warn_slowpath_common+0x80/0x98
[] warn_slowpath_null+0x15/0x17
[] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
[] ? __set_page_dirty_nobuffers+0xfe/0x108
[] __btrfs_cow_block+0x118/0x3b5 [btrfs]
[] btrfs_cow_block+0x103/0x14e [btrfs]
[] btrfs_search_slot+0x249/0x6a4 [btrfs]
[] btrfs_lookup_inode+0x2a/0x8a [btrfs]
[] btrfs_update_inode+0xaa/0x141 [btrfs]
[] btrfs_save_ino_cache+0xea/0x202 [btrfs]
[] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
[] commit_fs_roots+0xaa/0x158 [btrfs]
[] btrfs_commit_transaction+0x405/0x731 [btrfs]
[] ? wake_up_bit+0x25/0x25
[] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
[] btrfs_sync_file+0x16a/0x198 [btrfs]
[] ? mntput+0x21/0x23
[] vfs_fsync_range+0x18/0x21
[] vfs_fsync+0x17/0x19
[] do_fsync+0x29/0x3e
[] sys_fsync+0xb/0xf
[] system_call_fastpath+0x16/0x1b

Sometimes it also triggers BUG_ON() in the reservation code of the
delayed inode.

So we must reserve enough space for the inode cache.

Note: If we cannot reserve enough space for the inode cache, we
give up writing it out.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:04 +0800
924cd8fbe Btrfs: fix nocow when deleting the item

btrfs_previous_item() only searches the B+ tree and does not COW the nodes
or leaves, so if we modify its result, the metadata will be broken. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:04 +0800
f7d572188 Merge branch 'mount-fixes' of git://github.com/idryomov/btrfs-unstable into integration

Chris Mason
2011-11-11 09:42:53 +0800
2115133f8 Btrfs: tweak the delayed inode reservations again

Josef sent along an incremental to the inode reservation
code to make sure we try and fall back to directly updating
the inode item if things go horribly wrong.

This reworks that patch slightly, adding a fallback function
that will always try to update the inode item directly without
going through the delayed_inode code.

Signed-off-by: Chris Mason

Chris Mason
2011-11-11 09:39:08 +0800

10 Nov, 2011

5 commits

04d21a244 Btrfs: rework error handling in btrfs_mount()

Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer
dereference on error paths, also we would leave all devices busy and
leak fs_info with all sub-structures on error when trying to mount an
already mounted fs to a different directory.

Fix this by doing all allocations before trying to open any of the
devices, adjust error path for mount-already-mounted-fs case.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:39 +0800
586e46e28 Btrfs: close devices on all error paths in open_ctree()

Fix a bug introduced by 7e662854 where we would leave devices busy on
certain error paths in open_ctree(). fs_info is guaranteed to be
non-NULL now so it's safe to dereference it on all error paths.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
4d34b2789 Btrfs: avoid null dereference and leaks when bailing from open_ctree()

Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any
of the tree roots (first 'goto fail' in open_ctree()) we would
dereference a NULL fs_info pointer in free_fs_info(). Secondly, after
failures from init_srcu_struct(), setup_bdi() and new_inode() we would
leak all earlier allocated roots: fs_info fields haven't been
initialized yet so free_fs_info() is rendered useless.

Fix this by initializing fs_info pointer and fs_info fields before any
allocations happen.
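
The "initialize first, then allocate" pattern can be sketched in user space as below. This is not open_ctree(): `struct demo_fs_info`, its two fields and the `fail_at` switch are hypothetical and exist only to exercise each error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for fs_info. */
struct demo_fs_info {
    void *tree_root;
    void *extent_root;
};

/* One cleanup helper for every error path. It is safe to call at any
 * point because unset fields are NULL and free(NULL) is a no-op. */
static void demo_free_fs_info(struct demo_fs_info *fs)
{
    if (!fs)
        return;
    free(fs->tree_root);
    free(fs->extent_root);
    free(fs);
}

/* fail_at injects a failure at step 1, 2 or 3; 0 means no failure. */
static struct demo_fs_info *demo_open(int fail_at)
{
    /* calloc() zeroes the fields before any sub-allocation happens. */
    struct demo_fs_info *fs = calloc(1, sizeof(*fs));

    if (!fs)
        return NULL;
    if (fail_at == 1)
        goto fail;
    fs->tree_root = malloc(16);
    if (!fs->tree_root || fail_at == 2)
        goto fail;
    fs->extent_root = malloc(16);
    if (!fs->extent_root || fail_at == 3)
        goto fail;
    return fs;
fail:
    demo_free_fs_info(fs);
    return NULL;
}
```

Because the structure is valid from the first line, the single `fail` label neither dereferences a NULL pointer nor leaks earlier allocations.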

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
f23c8af8c Btrfs: fix subvol_name leak on error in btrfs_mount()

btrfs_parse_early_options() can fail due to error while scanning devices
(-o device= option), but still strdup() subvol_name string:

mount -o subvol=SUBV,device=BAD_DEVICE

So free subvol_name string on error.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800
a90e8b6fb Btrfs: fix memory leak in btrfs_parse_early_options()

Don't leak the subvol_name string when multiple subvol= options are
given. The "last option is effective" behavior (consistent with the
subvolid= and subvolrootid= options) is preserved.
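
A minimal user-space sketch of "last option wins, without leaking the earlier copy" follows. `demo_set_subvol_name()` is a hypothetical helper, not the kernel function, and keeping the old value when allocation fails is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Store a copy of `value` in *subvol_name, releasing any copy left by an
 * earlier subvol= option. Returns 0 on success, -1 on allocation failure
 * (in which case the previous value is kept). */
static int demo_set_subvol_name(char **subvol_name, const char *value)
{
    char *copy;

    assert(subvol_name && value);
    copy = malloc(strlen(value) + 1);
    if (!copy)
        return -1;
    strcpy(copy, value);

    free(*subvol_name);     /* free(NULL) is a no-op for the first option */
    *subvol_name = copy;
    return 0;
}
```

The caller starts with a NULL pointer, calls the helper once per subvol= occurrence, and frees the final string once.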

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2011-11-10 04:53:38 +0800

09 Nov, 2011

2 commits

7fd2ae21a Btrfs: fix our reservations for updating an inode when completing io

People have been reporting ENOSPC crashes in finish_ordered_io. This is because
we try to steal from the delalloc block rsv to satisfy a reservation to update
the inode. The problem with this is we don't explicitly save space for updating
the inode when doing delalloc. This is kind of a problem and we've gotten away
with this because way back when we just stole from the delalloc reserve without
any questions, and this worked out fine because generally speaking the leaf had
been modified either by the mtime update when we did the original write or
because we just updated the leaf when we inserted the file extent item, only on
rare occasions had the leaf not actually been modified, and that was still ok
because we'd just use a block or two out of the over-reservation that is
delalloc.

Then came the delayed inode stuff. This is amazing, except it wants a full
reservation for updating the inode since it may do it at some point down the
road after we've written the blocks and we have to recow everything again. This
worked out because the delayed inode stuff just stole from the global reserve,
that is until recently when I changed that because it caused other problems.

So here we are, we're doing everything right and being screwed for it. So take
an extra reservation for the inode at delalloc reservation time and carry it
through the life of the delalloc reservation. If we need it we can steal it in
the delayed inode stuff. If we have already stolen it try and do a normal
metadata reservation. If that fails try to steal from the delalloc reservation.
If _that_ fails we'll get a WARN_ON() so I can start thinking of a better way to
solve this and in the meantime we'll steal from the global reserve.

With this patch I ran xfstests 13 in a loop for a couple of hours and didn't see
any problems.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-09 04:47:34 +0800
917c16b2b Btrfs: fix oops on NULL trans handle in btrfs_truncate

If we fail to reserve space in the transaction during truncate, we can
error out with a NULL trans handle. The cleanup code needs an extra
check to make sure we aren't trying to use the bad handle.

Signed-off-by: Chris Mason

Chris Mason
2011-11-09 03:49:59 +0800

08 Nov, 2011

1 commit

45ea6095c btrfs: fix double-free 'tree_root' in 'btrfs_mount()'

On the error path 'tree_root' is freed in 'free_fs_info()'.
No need to free it explicitly. Noticed by SLUB in debug mode:

Complete reproducer under usermode linux (discovered on real
machine):

bdev=/dev/ubda
btr_root=/btr
/mkfs.btrfs $bdev
mount $bdev $btr_root
mkdir $btr_root/subvols/
cd $btr_root/subvols/
/btrfs su cr foo
/btrfs su cr bar
mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar
umount $btr_root/subvols/bar

which gives

device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda
=============================================================================
BUG kmalloc-2048: Object already free
-----------------------------------------------------------------------------

INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277
INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277
INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081
INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968
...
Call Trace:
70b31948: [] print_trailer+0xe2/0x130
70b31978: [] object_err+0x3a/0x50
70b319a8: [] free_debug_processing+0x142/0x2a0
70b319e0: [] btrfs_mount+0x55f/0x7f0
70b319f8: [] __slab_free+0x221/0x2d0

Signed-off-by: Sergei Trofimovich
Cc: Arne Jansen
Cc: Chris Mason
Cc: David Sterba
Signed-off-by: Chris Mason

slyich@gmail.com
2011-11-08 05:08:01 +0800

07 Nov, 2011

1 commit

7c7e82a77 Btrfs: check for a null fs root when writing to the backup root log

During log replay, we can commit the transaction before the fs_root
pointers are set up, so we have to make sure they are not null before
trying to use them.

Signed-off-by: Chris Mason

Chris Mason
2011-11-07 07:50:56 +0800

06 Nov, 2011

16 commits

d43317dcd Btrfs: fix race during transaction joins

While we're allocating ram for a new transaction, we drop our spinlock.
When we get the lock back, we do check to see if a transaction started
while we slept, but we don't check to make sure it isn't blocked
because a commit has already started.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:26:19 +0800
56d2a48f8 Btrfs: fix a potential btrfs_bio leak on scrub fixups

In case we were able to map less than we wanted (length < PAGE_SIZE
clause is true) btrfs_bio is still allocated and we have to free it.

Signed-off-by: Ilya Dryomov
Signed-off-by: Chris Mason

Ilya Dryomov
2011-11-06 16:11:29 +0800
21ca543ef Btrfs: rename btrfs_bio multi -> bbio for consistency

Signed-off-by: Chris Mason

Ilya Dryomov
2011-11-06 16:11:21 +0800
9510dc4c6 Btrfs: stop leaking btrfs_bios on readahead

Signed-off-by: Chris Mason

Ilya Dryomov
2011-11-06 16:11:08 +0800
306c8b68c Btrfs: stop the readahead threads on failed mount

If we don't stop them, they linger around corrupting
memory by using pointers to freed things.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:09:41 +0800
c674e04e1 Btrfs: fix extent_buffer leak in the metadata IO error handling

The scrub readahead branch brought in a new error handling hook,
but it was leaking extent_buffer references.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:09:10 +0800
740c3d226 Btrfs: fix the new inspection ioctls for 32 bit compat

The new ioctls to follow backrefs are not clean for 32/64 bit
compat. This reworks them for u64s everywhere. They are brand new, so
there are no problems with changing the interface now.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:08:49 +0800
806468f8b Merge git://git.jan-o-sch.net/btrfs-unstable into integration

Conflicts:
fs/btrfs/Makefile
fs/btrfs/extent_io.c
fs/btrfs/extent_io.h
fs/btrfs/scrub.c

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:07:10 +0800
531f4b1ae Merge branch 'for-chris' of git://github.com/sensille/linux into integration

Conflicts:
fs/btrfs/ctree.h

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:05:08 +0800
c06a0e120 Btrfs: fix delayed insertion reservation

We all keep getting those stupid warnings from use_block_rsv when running
stress.sh, and it's because the delayed insertion stuff is being stupid. It's
not the delayed insertion stuff's fault, it's all just stupid. When marking an
inode dirty for oh say updating the time on it, we just do a
btrfs_join_transaction, which doesn't reserve any space. This is stupid because
we're going to have to have space reserved to make this change, but we do it
because it's fast because chances are we're going to call it over and over again
and it doesn't matter. Well thanks to the delayed insertion stuff this is
mostly the case, so we do actually need to make this reservation. So if
trans->bytes_reserved is 0 then try to do a normal reservation. If not return
ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
will let it do the whole ENOSPC dance and reserve enough space for the delayed
insertion to steal the reservation from the transaction.

The other stupid thing we do is not reserve space for the inode when writing to
the thing. Usually this is ok since we have to update the time so we'd have
already done all this work before we get to the endio stuff, so it doesn't
matter. But this is stupid because we could write the data after the
transaction commits where we changed the mtime of the inode so we have to cow
all the way down to the inode anyway. This used to be masked by the delalloc
reservation stuff, but because we delay the update it doesn't get masked in this
case. So again the delayed insertion stuff bites us in the ass. So if our
trans->block_rsv is delalloc, just steal the reservation from the delalloc
reserve. Hopefully this won't bite us in the ass, but I've said that before.

With this patch stress.sh no longer spits out those stupid warnings (famous last
words). Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-06 16:04:20 +0800
bf0da8c18 Btrfs: ClearPageError during writepage and clean_tree_block

Failure testing was tripping up over stale PageError bits in
metadata pages. If we have an io error on a block, and later on
end up reusing it, nobody ever clears PageError on those pages.

During commit, we'll find PageError and think we had trouble writing
the block, which will lead to aborts and other problems.

This changes clean_tree_block and the btrfs writepage code to
clear the PageError bit. In both cases we're either completely
done with the page or the page has good stuff and the error bit
is no longer valid.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:04:20 +0800
663350ac3 Btrfs: be smarter about committing the transaction in reserve_metadata_bytes

Because of the overcommit stuff I had to make it so that we committed the
transaction all the time in reserve_metadata_bytes in case we had overcommitted
because of delayed items. This was because previously we had no way of knowing
how much space was reserved for delayed items. Now that we have the
delayed_block_rsv we can check it to see if committing the transaction would get
us anywhere. This patch breaks out the committing logic into a helper function
that will check to see if committing the transaction would free enough space for
us to get anything done. With this patch xfstests 83 goes from taking 445
seconds to taking 28 seconds on my box. Thanks,
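
The idea of such a helper can be sketched as below. This is not the helper added by the patch: `demo_commit_would_help()` and its parameters are hypothetical, and counting pinned bytes as freed by a commit is an assumption made for the sketch rather than something stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether committing the transaction could free enough space for
 * a reservation of `bytes_needed`.
 *
 * bytes_pinned      - space assumed to be released by a commit
 *                     (assumption for this sketch)
 * delayed_rsv_bytes - space currently held for delayed items, which a
 *                     commit would also release
 *
 * Returns 1 if a commit is worth trying, 0 if it cannot help. */
static int demo_commit_would_help(uint64_t bytes_needed,
                                  uint64_t bytes_pinned,
                                  uint64_t delayed_rsv_bytes)
{
    assert(bytes_needed > 0);

    if (bytes_pinned >= bytes_needed)
        return 1;

    /* Whatever pinned space does not cover must come from the delayed
     * reservation. */
    bytes_needed -= bytes_pinned;
    return delayed_rsv_bytes >= bytes_needed;
}
```

Skipping the commit when this returns 0 is what avoids the repeated, useless commits described above.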

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-06 16:04:19 +0800
6d668dda0 Btrfs: make a delayed_block_rsv for the delayed item insertion

I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff. It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff. So instead create a separate block rsv for the delayed
insertion stuff. This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-06 16:04:18 +0800
af31f5e5b Btrfs: add a log of past tree roots

This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.

It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.
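
A history of the last four commits fits naturally in a small ring buffer. The sketch below shows that general technique only: the structure layout, field names and helpers are hypothetical and do not reflect the on-disk super block format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_BACKUP_ROOTS 4

/* Hypothetical record of one past commit. */
struct demo_backup_root {
    uint64_t tree_root;     /* location of the tree of tree roots */
    uint64_t generation;    /* transaction that wrote it */
};

struct demo_super {
    struct demo_backup_root backups[DEMO_NUM_BACKUP_ROOTS];
    unsigned next;          /* slot the next commit overwrites */
    unsigned count;         /* number of valid slots */
};

/* Record a commit, overwriting the oldest entry once the ring is full. */
static void demo_record_commit(struct demo_super *sb,
                               uint64_t tree_root, uint64_t generation)
{
    sb->backups[sb->next].tree_root = tree_root;
    sb->backups[sb->next].generation = generation;
    sb->next = (sb->next + 1) % DEMO_NUM_BACKUP_ROOTS;
    if (sb->count < DEMO_NUM_BACKUP_ROOTS)
        sb->count++;
}

/* Return the n-th most recent backup (0 = newest), or NULL if fewer
 * than n + 1 commits are recorded. */
static const struct demo_backup_root *
demo_backup_nth_newest(const struct demo_super *sb, unsigned n)
{
    if (n >= sb->count)
        return NULL;
    return &sb->backups[(sb->next + DEMO_NUM_BACKUP_ROOTS - 1 - n)
                        % DEMO_NUM_BACKUP_ROOTS];
}
```

A recovery path could walk from the newest entry backwards until it finds a root it can read.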

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:04:15 +0800
6c41761fc btrfs: separate superblock items out of fs_info

fs_info is now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of its dynamically allocated
members.

Signed-off-by: David Sterba

David Sterba
2011-11-06 16:04:01 +0800
c8174313a Btrfs: use the global reserve when truncating the free space cache inode

We no longer use the orphan block rsv for holding the reservation for truncating
the inode, so instead use the global block rsv and check to make sure it has
enough space for us to truncate the space. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2011-11-06 16:03:50 +0800