Doug / smarc-fsl-linux-kernel | Embedian Git Server

29 Jan, 2012

1 commit

67d2433ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c

Linus Torvalds
2012-01-29 09:00:19 +0800

27 Jan, 2012

11 commits

9998eb703 Btrfs: fix reservations in btrfs_page_mkwrite

Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.

This makes sure we only release if we were able to reserve.

Signed-off-by: Chris Mason

Chris Mason
2012-01-27 23:44:44 +0800
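
The fix described in 9998eb703 can be illustrated outside the kernel. The following is a minimal user-space sketch, not the actual btrfs code: struct space_pool, reserve_space(), release_space() and page_mkwrite_sketch() are hypothetical names, and the dirty_inode_fails flag merely simulates the inode update failing with ENOSPC.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pool of reservable space. */
struct space_pool {
    long free_bytes;
};

/* Try to reserve bytes; returns 0 on success or -ENOSPC. */
static int reserve_space(struct space_pool *pool, long bytes)
{
    if (pool->free_bytes < bytes)
        return -ENOSPC;
    pool->free_bytes -= bytes;
    return 0;
}

/* Give previously reserved bytes back to the pool. */
static void release_space(struct space_pool *pool, long bytes)
{
    pool->free_bytes += bytes;
}

/*
 * Models the error path: reserving can fail, and so can the later
 * inode update. The reservation is released only if it was obtained.
 */
static int page_mkwrite_sketch(struct space_pool *pool, long bytes,
                               bool dirty_inode_fails)
{
    bool reserved = false;
    int ret;

    ret = reserve_space(pool, bytes);
    if (ret == 0) {
        reserved = true;
        if (dirty_inode_fails)
            ret = -ENOSPC;
    }

    if (ret != 0 && reserved)
        release_space(pool, bytes);
    return ret;
}
```

Tracking the reservation in a local flag keeps the error path from returning space that was never taken from the pool.
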
9b2306284 Btrfs: advance window_start if we're using a bitmap

If we span a long area in a bitmap, we could end up spending a lot of time
searching for the next free area when we start from the original
window_start. Advance window_start so that we don't do any superfluous
searching. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:12 +0800
0c4e538bc btrfs: mask out gfp flags in releasepage

btree_releasepage is a callback and can be passed unknown gfp flags, which
may end up in kmem_cache_alloc called from alloc_extent_state. The slab
allocator will BUG_ON when the HIGHMEM or DMA32 flag is set.

This may happen when btrfs is mounted from a loop device, which masks out the
__GFP_IO flag. The check in try_release_extent_state

    if ((mask & GFP_NOFS) == GFP_NOFS)
        mask = GFP_NOFS;

will not work in that case and passes the unfiltered flags further, resulting
in a crash at mm/slab.c:2963

    cache_alloc_refill+0x3b4/0x5c8
    kmem_cache_alloc+0x204/0x294
    mempool_alloc+0x52/0x170
    alloc_extent_state+0x40/0xd4 [btrfs]
    __clear_extent_bit+0x38a/0x4cc [btrfs]
    try_release_extent_state+0x9c/0xd4 [btrfs]
    btree_releasepage+0x7e/0xd0 [btrfs]
    shrink_page_list+0x6a0/0x724
    shrink_inactive_list+0x230/0x578
    shrink_list+0x6c/0x120
    shrink_zone+0x1e2/0x228
    shrink_zones+0x90/0x254
    do_try_to_free_pages+0xac/0x420
    try_to_free_pages+0x13c/0x1b0
    __alloc_pages_nodemask+0x5b4/0x9a8
    grab_cache_page_write_begin+0x7e/0xe8

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2012-01-27 04:01:12 +0800
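
Why the check quoted in 0c4e538bc lets zone flags through can be shown with a small sketch. The bit values below are hypothetical and chosen only for illustration (they are not the kernel's), and strip_then_normalize() is an assumed defensive variant, not necessarily the change the commit made.

```c
/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define X_GFP_WAIT    0x01u
#define X_GFP_IO      0x02u
#define X_GFP_HIGHMEM 0x10u
#define X_GFP_DMA32   0x20u
#define X_GFP_NOFS    (X_GFP_WAIT | X_GFP_IO)
#define X_ZONE_FLAGS  (X_GFP_HIGHMEM | X_GFP_DMA32)

/* The quoted check: normalizes only when every NOFS bit is present. */
static unsigned int normalize_only(unsigned int mask)
{
    if ((mask & X_GFP_NOFS) == X_GFP_NOFS)
        mask = X_GFP_NOFS;
    return mask;
}

/* Assumed defensive variant: drop zone flags before anything else. */
static unsigned int strip_then_normalize(unsigned int mask)
{
    mask &= ~X_ZONE_FLAGS;
    return normalize_only(mask);
}
```

With the IO bit cleared by the caller, normalize_only() leaves the mask untouched, so a HIGHMEM or DMA32 bit survives; stripping the zone bits first removes them regardless of which other bits are set.
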
9e622d6be Btrfs: fix enospc error caused by wrong checks of the chunk

When we ran a sysbench test on inline files, an ENOSPC error occurred easily
even though there was plenty of free disk space that could be allocated for
new chunks.

Steps to reproduce:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
# mount /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run

The cause of this bug:
We can now reserve more space than is free in the existing chunks, as long as
there is enough free disk space for new chunks. In that case, the space
allocator should forcibly allocate a new chunk when there is no free space in
the free space cache. But two wrong checks break this operation.

One is
    if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
even if we fail to allocate free space of the minimum allocable size.

The other is
    if (space_info->force_alloc)
        force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, which causes the ENOSPC error.

Fix these two wrong checks. For the second one, change the values of
CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that CHUNK_ALLOC_FORCE is greater
than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE has higher priority. If the
value passed in by the caller is greater than ->force_alloc, use the passed
value.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2012-01-27 04:01:12 +0800
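
The second fix in 9e622d6be amounts to taking the stronger of two force levels. The sketch below contrasts the old check with that rule; the numeric enum values are assumptions (the message only says CHUNK_ALLOC_FORCE must be greater than CHUNK_ALLOC_LIMITED), and old_force() and effective_force() are hypothetical helper names.

```c
/* Assumed values; the source only requires FORCE > LIMITED. */
enum chunk_alloc_level {
    CHUNK_ALLOC_NO_FORCE = 0,
    CHUNK_ALLOC_LIMITED  = 1,
    CHUNK_ALLOC_FORCE    = 2,
};

/* Old behaviour: any pending level overrides the caller's request. */
static int old_force(int caller_force, int pending_force)
{
    if (pending_force)
        caller_force = pending_force;
    return caller_force;
}

/* Fixed behaviour: the higher-priority level wins. */
static int effective_force(int caller_force, int pending_force)
{
    return caller_force > pending_force ? caller_force : pending_force;
}
```

Ordering the levels numerically lets a single comparison express the priority, so a caller asking for CHUNK_ALLOC_FORCE is no longer downgraded by a pending CHUNK_ALLOC_LIMITED.
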
7ec31b548 Btrfs: do not defrag a file partially

xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2

To fix this, we need to set max_to_defrag count properly.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-01-27 04:01:12 +0800
0b485143d Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c

The 32-bit build produced 4 warnings; this patch fixes them.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-01-27 04:01:11 +0800
0b4a9d248 Btrfs: use cluster->window_start when allocating from a cluster bitmap

We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
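
The effect of the starting point discussed in 0b4a9d248 (and in 9b2306284 above) can be seen in a toy bitmap search. This is a simplified sketch, not the btrfs free-space code: the bitmap is an array with one byte per bit, a nonzero byte means "free", and find_free_run() is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Return the index of the first run of `want` consecutive free bits at or
 * after `start`, or -1 if there is none. bits[i] != 0 means bit i is free.
 */
static long find_free_run(const unsigned char *bits, size_t nbits,
                          size_t start, size_t want)
{
    size_t run = 0;

    if (want == 0)
        return -1;

    for (size_t i = start; i < nbits; i++) {
        if (bits[i]) {
            run++;
            if (run == want)
                return (long)(i + 1 - want);
        } else {
            run = 0;
        }
    }
    return -1;
}
```

Passing the start of the block group as `start` rescans everything in front of the cluster on every allocation, while passing the cluster's window start skips straight to the area that was already found.
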
8bedd51b6 Btrfs: Check for NULL page in extent_range_uptodate

A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).

After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c

    for (i = start_i; i < num_pages; i++) {
        page = extent_buffer_page(eb, i);
        wait_on_page_locked(page);
        if (!PageUptodate(page))
            ret = -EIO;
    }

This patch leaves the issue with the locked page yet to be resolved.

Signed-off-by: Mitch Harder
Signed-off-by: Chris Mason

Mitch Harder
2012-01-27 04:01:11 +0800
6dd70ce4e btrfs: Fix busyloops in transaction waiting code

wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...

Signed-off-by: Jan Kara
Signed-off-by: Chris Mason

Jan Kara
2012-01-27 04:01:11 +0800
357b9784b Btrfs: make sure a bitmap has enough bytes

We have only been checking that bitmap entries have min_bytes available, but
we won't successfully set up a bitmap cluster unless the bitmap holds at least
bytes (the full amount the cluster needs). In the common case min_bytes is 4k
and we want something like 2MB, so if there are a bunch of bitmap entries with
less than 2MB in them, we'll search them all anyway, which is suboptimal. Fix
this check. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
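
The difference between the two thresholds in 357b9784b is easy to quantify with a small sketch. The struct and function names below are hypothetical, and the entry sizes in the usage note are made-up sample data, not measurements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitmap entry in the free space cache. */
struct bitmap_entry_sketch {
    uint64_t free_bytes;
};

/* Count the entries that pass the size check and would be searched. */
static size_t count_entries_to_search(const struct bitmap_entry_sketch *entries,
                                      size_t n, uint64_t required)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (entries[i].free_bytes >= required)
            count++;
    }
    return count;
}
```

For sample entries holding 4k, 1MB and 3MB of free space, a 4k threshold admits all three, while a 2MB threshold admits only the one that could actually host the cluster.
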
b1375d64c Btrfs: fix uninit warning in backref.c

Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2012-01-27 04:01:11 +0800

18 Jan, 2012

2 commits

d65773b22 Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: take allocation of ->tree_root into open_ctree()
btrfs: let ->s_fs_info point to fs_info, not root...
btrfs: consolidate failure exits in btrfs_mount() a bit
btrfs: make free_fs_info() call ->kill_sb() unconditional
btrfs: merge free_fs_info() calls on fill_super failures
btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
btrfs: make open_ctree() return int
btrfs: sanitizing ->fs_info, part 5
btrfs: sanitizing ->fs_info, part 4
btrfs: sanitizing ->fs_info, part 3
btrfs: sanitizing ->fs_info, part 2
btrfs: sanitizing ->fs_info, part 1
btrfs: fix a deadlock in btrfs_scan_one_device()
btrfs: fix mount/umount race
btrfs: get ->kill_sb() of its own
btrfs: preparation to fixing mount/umount race

Linus Torvalds
2012-01-18 07:52:51 +0800
f9156c728 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
Btrfs: use larger system chunks
Btrfs: add a delalloc mutex to inodes for delalloc reservations
Btrfs: space leak tracepoints
Btrfs: protect orphan block rsv with spin_lock
Btrfs: add allocator tracepoints
Btrfs: don't call btrfs_throttle in file write
Btrfs: release space on error in page_mkwrite
Btrfs: fix btrfsck error 400 when truncating a compressed
Btrfs: do not use btrfs_end_transaction_throttle everywhere
Btrfs: add balance progress reporting
Btrfs: allow for resuming restriper after it was paused
Btrfs: allow for canceling restriper
Btrfs: allow for pausing restriper
Btrfs: add skip_balance mount option
Btrfs: recover balance on mount
Btrfs: save balance parameters to disk
Btrfs: soft profile changing mode (aka soft convert)
Btrfs: implement online profile changing
Btrfs: do not reduce profile in do_chunk_alloc()
Btrfs: virtual address space subset filter
...

Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.

Linus Torvalds
2012-01-18 07:49:54 +0800

17 Jan, 2012

26 commits

96bdc7dc6 Btrfs: use larger system chunks

System chunks by default are very small. This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:38:24 +0800
f248679e8 Btrfs: add a delalloc mutex to inodes for delalloc reservations

I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and there's no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead. This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:43 +0800
8c2a3ca20 Btrfs: space leak tracepoints

This in addition to a script in my btrfs-tracing tree will help track down space
leaks when we're getting space left over in block groups on umount. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:43 +0800
90290e198 Btrfs: protect orphan block rsv with spin_lock

We've been seeing warnings coming out of the orphan commit stuff forever from
ceph. Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock. So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime. Then clear
the root's orphan block rsv and release the lock. With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:42 +0800
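
The check, lock, recheck pattern described in 90290e198 can be sketched in user space. This is an illustration under stated assumptions, not the kernel code: a pthread mutex stands in for the spin_lock, struct root_sketch and its fields are hypothetical, and the sketch assumes that any code adding an orphan updates orphan_count while holding orphan_lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-root orphan state. */
struct root_sketch {
    pthread_mutex_t orphan_lock;
    atomic_int      orphan_count;      /* orphans currently tracked */
    void           *orphan_block_rsv;  /* reserve to detach when idle */
};

static void root_sketch_init(struct root_sketch *root, int orphans, void *rsv)
{
    pthread_mutex_init(&root->orphan_lock, NULL);
    atomic_init(&root->orphan_count, orphans);
    root->orphan_block_rsv = rsv;
}

/*
 * Detach the reserve if no orphans remain. Returns the detached reserve,
 * or NULL if it must stay in place.
 */
static void *detach_orphan_rsv(struct root_sketch *root)
{
    void *rsv;

    /* Fast path: cheap unlocked check; the answer may already be stale. */
    if (atomic_load(&root->orphan_count) != 0)
        return NULL;

    pthread_mutex_lock(&root->orphan_lock);
    /* Recheck under the lock: an orphan may have been added meanwhile. */
    if (atomic_load(&root->orphan_count) != 0) {
        pthread_mutex_unlock(&root->orphan_lock);
        return NULL;
    }
    rsv = root->orphan_block_rsv;
    root->orphan_block_rsv = NULL;     /* cleared while holding the lock */
    pthread_mutex_unlock(&root->orphan_lock);
    return rsv;
}
```

The unlocked check keeps the common case cheap, and the decision that actually clears the pointer is made only with the lock held.
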
3f7de037f Btrfs: add allocator tracepoints

I used these tracepoints when figuring out what the cluster stuff was doing, so
add them to mainline in case we need to profile this stuff again. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:42 +0800
45a8090e6 Btrfs: don't call btrfs_throttle in file write

btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:55 +0800
ec39e180f Btrfs: release space on error in page_mkwrite

If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:54 +0800
f70a9a6b9 Btrfs: fix btrfsck error 400 when truncating a compressed

Steps to reproduce:
# mkfs.btrfs /dev/sdb5
# mount /dev/sdb5 -o compress=lzo /mnt
# dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1
# sync
# truncate -s 64K /mnt/tmpfile
root 5 inode 257 errors 400

This is caused by a wrong if condition, which checks whether we should
subtract the bytes of the dropped range from the inode's i_blocks/i_bytes.
When we truncate a compressed extent, btrfs subtracts the bytes of the whole
extent, which is wrong. We should subtract the size actually truncated,
whether or not the extent is compressed. Fix it.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2012-01-17 04:28:54 +0800
7ad85bb76 Btrfs: do not use btrfs_end_transaction_throttle everywhere

A user reported a problem where things like open with O_CREAT would take up to
30 seconds when he had nfs activity on the same mount. This is because all of
our quick metadata operations, like create, symlink etc all do
btrfs_end_transaction_throttle, which if the transaction is blocked will wait
for the commit to complete before it returns. This adds a ridiculous amount of
latency and isn't really needed. The normal btrfs_end_transaction will mark the
transaction as blocked and wake the transaction kthread up if it thinks the
transaction needs to end (this being in the running out of global reserve space
scenario), and this is all that is really needed since we've already done
everything we're going to do, we just need to return. This should help people
with the latency they were seeing when using synchronous heavy workloads.
Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:54 +0800
c126dea77 Merge branch 'integrity-check-patch-v2' of git://btrfs.giantdisaster.de/git/btrfs into integration

Conflicts:
fs/btrfs/ctree.h
fs/btrfs/super.c

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:27:58 +0800
9785dbdf2 Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration

Chris Mason
2012-01-17 04:26:31 +0800
d756bd2d9 Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration

Conflicts:
fs/btrfs/volumes.c

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:26:17 +0800
27263e283 Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into integration

Chris Mason
2012-01-17 04:26:02 +0800
19a39dce3 Btrfs: add balance progress reporting

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
de322263d Btrfs: allow for resuming restriper after it was paused

Recognize BTRFS_BALANCE_RESUME flag passed from userspace. We use the
same heuristics used when recovering balance after a crash to try to
start where we left off last time.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
a7e99c691 Btrfs: allow for canceling restriper

Implement an ioctl for canceling restriper. Currently we wait until
relocation of the current block group is finished; in future this can be
done by triggering a commit. The balance item is deleted and no memory
of the interrupted balance is kept.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
837d5b6e4 Btrfs: allow for pausing restriper

Implement an ioctl for pausing restriper. This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc. If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount. (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
9555c6c18 Btrfs: add skip_balance mount option

Since the restriper kthread starts on mount without being asked and can
consume CPU and memory bandwidth, add a mount option to forcefully skip it.
In that case the restriper stays in the paused state and can be resumed
from userspace when it's convenient.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
596410151 Btrfs: recover balance on mount

On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart to continue roughly where previous balance (or convert)
was interrupted. For chunk types that were being converted to some
profile we turn on soft convert, in case of a simple balance we turn on
usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in future.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
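
The usage heuristic mentioned in 596410151 (relocate only chunks that are less than 90% full) reduces to one integer comparison. The helper below is a hypothetical sketch, not the kernel's filter; it assumes used * 100 and size * pct both fit in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the chunk's usage is strictly below pct percent.
 * Compares used/size < pct/100 without floating point.
 * Assumes used * 100 and size * pct do not overflow 64 bits.
 */
static bool chunk_below_usage(uint64_t used, uint64_t size, unsigned int pct)
{
    if (size == 0)
        return false;
    return used * 100 < size * (uint64_t)pct;
}
```

A chunk at 89% usage passes the filter with pct set to 90 and would be relocated, while one at exactly 90% is left alone.
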
0940ebf6b Btrfs: save balance parameters to disk

Introduce a new btree objectid for storing balance item. The reason is
to be able to resume restriper after a crash with the same parameters.
Balance item has a very high objectid and goes into tree of tree roots.

The key for the new item is as follows:

[ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ]

Older kernels simply ignore it so it's safe to mount with an older
kernel and then go back to the newer one.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
cfa4c961c Btrfs: soft profile changing mode (aka soft convert)

When converting from one profile to another with soft mode on, restriper
won't touch chunks that already have the profile we are converting to.
This is useful if, e.g., half of the FS was converted earlier.

The soft mode switch is (like every other filter) per-type. This means
that we can convert for example meta chunks the "hard" way while
converting data chunks selectively with soft switch.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
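
The soft switch from cfa4c961c can be expressed as a per-chunk predicate. This is a minimal sketch with hypothetical names; the profile constants are placeholders and do not correspond to the kernel's block group flags.

```c
#include <stdbool.h>

/* Placeholder profile identifiers for illustration only. */
enum profile_sketch {
    PROFILE_SINGLE,
    PROFILE_RAID1,
};

/*
 * Decide whether a chunk should be relocated during a convert.
 * In soft mode, chunks already in the target profile are skipped.
 */
static bool should_convert_chunk(enum profile_sketch chunk_profile,
                                 enum profile_sketch target,
                                 bool soft)
{
    if (soft && chunk_profile == target)
        return false;
    return true;
}
```

Because the switch is per-type, a caller could pass soft as false for metadata chunks and true for data chunks within the same balance run.
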
e4d8ec0f6 Btrfs: implement online profile changing

Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized. Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce. If target profile is not yet available it goes
back to a plain reduce.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
70922617b Btrfs: do not reduce profile in do_chunk_alloc()

Every caller of do_chunk_alloc() feeds it the reduced allocation
profile, so stop trying to reduce it one more time. Instead check the
validity of the passed profile.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
ea67176ae Btrfs: virtual address space subset filter

Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
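
The selection rule in ea67176ae is a half-open interval overlap test. The function below is a hypothetical sketch of that test, not the kernel's filter; it treats the chunk as covering [chunk_start, chunk_start + chunk_len).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if at least one byte of the chunk lies inside [vstart, vend).
 * The chunk covers [chunk_start, chunk_start + chunk_len).
 */
static bool chunk_in_vrange(uint64_t chunk_start, uint64_t chunk_len,
                            uint64_t vstart, uint64_t vend)
{
    uint64_t chunk_end = chunk_start + chunk_len;   /* exclusive */

    return chunk_len > 0 && chunk_start < vend && chunk_end > vstart;
}
```

Since both ends are exclusive on the right, a chunk that ends exactly at vstart, or starts exactly at vend, is not selected.
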
94e60d5a5 Btrfs: devid subset filter

Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
409d404b4 Btrfs: devid filter

Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800
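
The devid filter (409d404b4) and the devid subset filter (94e60d5a5) both walk a chunk's stripes. The sketch below shows one way to express the two rules; struct stripe_sketch and both function names are hypothetical, and each stripe is assumed to cover [physical, physical + length) on its device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stripe of a chunk. */
struct stripe_sketch {
    uint64_t devid;
    uint64_t physical;   /* start offset on the device */
    uint64_t length;     /* bytes covered on the device */
};

/* devid filter: at least one stripe lives on the given device. */
static bool chunk_matches_devid(const struct stripe_sketch *stripes,
                                size_t n, uint64_t devid)
{
    for (size_t i = 0; i < n; i++) {
        if (stripes[i].devid == devid)
            return true;
    }
    return false;
}

/*
 * devid subset filter: at least one stripe on the given device has at
 * least one byte inside the physical range [pstart, pend).
 */
static bool chunk_matches_drange(const struct stripe_sketch *stripes,
                                 size_t n, uint64_t devid,
                                 uint64_t pstart, uint64_t pend)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t end = stripes[i].physical + stripes[i].length;

        if (stripes[i].devid == devid && stripes[i].length > 0 &&
            stripes[i].physical < pend && end > pstart)
            return true;
    }
    return false;
}
```

The range test takes the devid as a parameter, which mirrors the note that the subset filter only works when the devid filter is turned on.
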