Eric Lee / smarc-fsl-linux-kernel

23 Feb, 2012

4 commits

506531905 Btrfs: clear the extent uptodate bits during parent transid failures

If btrfs reads a block and finds a parent transid mismatch, it clears
the uptodate flags on the extent buffer, and the pages inside it. But
we only clear the uptodate bits in the state tree if the block straddles
more than one page.

This comes from an old optimization meant to reduce contention on the extent
state tree. But it is buggy because the code that retries a read from
a different copy of the block is going to find the uptodate state bits
set and skip the IO.

The end result of the bug is that we'll never actually read the good
copy (if there is one).

The fix here is to always clear the uptodate state bits, which is safe
because this code is only called when the parent transid fails.

Signed-off-by: Chris Mason

Chris Mason
2012-02-23 23:43:45 +0800
16780cabb Btrfs: add extra sanity checks on the path names in btrfs_mksubvol

Signed-off-by: Chris Mason

Chris Mason
2012-02-23 23:43:45 +0800
a6b0d5c8d Btrfs: make sure we update latest_bdev

When we are setting up the mount, we close all the
devices that were not actually part of the metadata we found.

But, we don't make sure that one of those devices wasn't
fs_devices->latest_bdev, which means we can do a use after free
on the one we closed.

This updates latest_bdev as it goes.

Signed-off-by: Chris Mason

Chris Mason
2012-02-23 23:43:45 +0800
fe66a05a0 Btrfs: improve error handling for btrfs_insert_dir_item callers

This allows us to gracefully continue if we aren't able to insert
directory items, both for normal files/dirs and snapshots.

Signed-off-by: Chris Mason

Chris Mason
2012-02-23 23:43:45 +0800

21 Feb, 2012

1 commit

692e5759a Btrfs: be less strict on finding next node in clear_extent_bit

In clear_extent_bit, it is enough that the next node is adjacent in tree level.

Signed-off-by: Liu Bo

Liu Bo
2012-02-21 23:02:10 +0800

17 Feb, 2012

6 commits

d9b0218f6 Btrfs: fix a bug on overcommit stuff

When overcommitting, we should check the sum of pinned space and
bytes for delayed item.

Signed-off-by: Liu Bo

Liu Bo
2012-02-17 00:23:18 +0800
9d47c7671 Btrfs: kick out redundant stuff in convert_extent_bit

clear_state_bit will do merge_state for us, so kick out the redundant one.

Signed-off-by: Liu Bo

Liu Bo
2012-02-17 00:23:17 +0800
0449314a9 Btrfs: skip states when they does not contain bits to clear

Clearing a range's bits is different from setting them, since we don't
need to touch states that do not contain the bits we want to clear.

Signed-off-by: Liu Bo

Liu Bo
2012-02-17 00:23:17 +0800
285190d99 Btrfs: check return value of lookup_extent_mapping() correctly

This patch corrects error checking of lookup_extent_mapping().

Signed-off-by: Tsutomu Itoh

Tsutomu Itoh
2012-02-17 00:23:17 +0800
600a45e1d Btrfs: fix deadlock on page lock when doing auto-defragment

When I ran xfstests in a loop on an autodefrag btrfs, a deadlock
happened.

Steps to reproduce:
[tty0]
# export MOUNT_OPTIONS="-o autodefrag"
# export TEST_DEV=
# export TEST_DIR=
# export SCRATCH_DEV=
# export SCRATCH_MNT=
# while [ 1 ]
> do
> ./check 091 127 263
> sleep 1
> done
[tty1]
# while [ 1 ]
> do
> echo 3 > /proc/sys/vm/drop_caches
> done

Several hours later, the test processes hang, and the deadlock
happens on the page lock.

The reason is that:
  Auto defrag task          Flush thread               Test task
                            btrfs_writepages()
                              add ordered extent
                              (including page 1, 2)
                              set page 1 writeback
                              set page 2 writeback
                            endio_fn()
                              end page 2 writeback
                                                       release page 2
  lock page 1
  alloc and lock page 2
  page 2 is not uptodate
    btrfs_readpage()
      start ordered extent()
        btrfs_writepages()
          try to lock page 1

so deadlock happens.

Fix this bug by unlocking the page which is in writeback, and re-locking it
after the writeback ends.

Signed-off-by: Miao Xie

Miao Xie
2012-02-17 00:23:16 +0800
013bd4c33 Btrfs: fix return value check of extent_io_ops

This patch adds the check on the return value of extent_io_ops.

Signed-off-by: Tsutomu Itoh

Tsutomu Itoh
2012-02-17 00:23:16 +0800

16 Feb, 2012

1 commit

12fc9d092 btrfs: honor umask when creating subvol root

Set the subvol root inode permissions based on the current umask.

Florian Albrechtskirchinger
2012-02-16 23:35:41 +0800

15 Feb, 2012

9 commits

8a3344269 btrfs: silence warning in raid array setup

Raid array setup code creates an extent buffer in an unusual way. When the
PAGE_CACHE_SIZE is > super block size, the extent pages are not marked
up-to-date, which triggers a WARN_ON in the following
write_extent_buffer call. Add an explicit up-to-date call to silence the
warning.

Signed-off-by: David Sterba

David Sterba
2012-02-15 23:40:25 +0800
c08782dac btrfs: fix structs where bitfields and spinlock/atomic share 8B word

On ia64, powerpc64 and sparc64 the bitfield is modified through a RMW cycle and current
gcc rewrites the adjacent 4B word, which in the case of a spinlock or atomic has a
disastrous effect.

https://lkml.org/lkml/2012/2/1/220

Signed-off-by: David Sterba

David Sterba
2012-02-15 23:40:25 +0800
87826df0e btrfs: delalloc for page dirtied out-of-band in fixup worker

We encountered an issue that was easily observable on s/390 systems but
could really happen anywhere. The timing just seemed to hit reliably
on s/390 with limited memory.

The gist is that when an unexpected set_page_dirty() happened, we'd
run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
properly set up for delalloc.

This patch does the following:
- Performs the missing delalloc in the fixup worker
- Allows the start hook to return -EBUSY, which informs __extent_writepage
  that it should mark the page skipped and not redirty it. This is
  required since the fixup worker can fail with -ENOSPC and the page
  will have already been redirtied. That causes an Oops in
  drop_outstanding_extents later. Retrying the fixup worker could
  lead to an infinite loop. Deferring the page redirty also saves us
  some cycles since the page would be stuck in a resubmit-redirty loop
  until the fixup worker completes. It's not harmful, just wasteful.
- If the fixup worker fails, we mark the page and mapping as errored,
  and end the writeback, similar to what we would do had the page
  actually been submitted to writeback.

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-02-15 23:40:25 +0800
a7e221e90 Btrfs: fix memory leak in load_free_space_cache()

load_free_space_cache() forgets to free the path.

Signed-off-by: Tsutomu Itoh

Tsutomu Itoh
2012-02-15 23:40:24 +0800
859acaf1a btrfs: don't check DUP chunks twice

Because scrub enumerates the dev extent tree to find the chunks to scrub,
it currently finds each DUP chunk twice and also scrubs it twice. This
patch makes sure that scrub_chunk only checks that part of the chunk the
dev extent has been found for. This only changes the behaviour for DUP
chunks.

Reported-and-tested-by: Stefan Behrens
Signed-off-by: Arne Jansen

Arne Jansen
2012-02-15 23:40:24 +0800
2cac13e41 Btrfs: fix trim 0 bytes after a device delete

A user reported a bug in btrfs's trim: we trim 0 bytes
after a device delete.

The reproducer:

$ mkfs.btrfs disk1
$ mkfs.btrfs disk2
$ mount disk1 /mnt
$ fstrim -v /mnt
$ btrfs device add disk2 /mnt
$ btrfs device del disk1 /mnt
$ fstrim -v /mnt

This is because after we delete the device, the block group may start from
a non-zero offset, which confuses trim into discarding nothing.

Reported-by: Lutz Euler
Signed-off-by: Liu Bo

Liu Bo
2012-02-15 23:40:23 +0800
6af021d8f Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry

ENXIO only means "offset beyond EOF" for a SEEK_DATA or SEEK_HOLE inquiry
in a desired file range, so we should return the internal error unchanged if the
btrfs_get_extent_fiemap() call failed, rather than ENXIO.

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>

Jeff Liu
2012-02-15 23:40:23 +0800
8f24b4968 Btrfs: avoid positive number with ERR_PTR

inode_ref_info() returns 1 when the element wasn't found and < 0 on error,
just like btrfs_search_slot(). In iref_to_path() it's an error when the
inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order
to avoid ERR_PTR(1), we now set ret to -ENOENT in that case.
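
The hazard described above can be sketched in userspace C. This is only an illustration: the ERR_PTR/IS_ERR/PTR_ERR helpers below are stand-ins modeled on the kernel's err.h, and lookup_ref(), ref_to_path() and name_buf are hypothetical names, not the btrfs functions. It shows that a positive "not found" value passed through ERR_PTR() is not recognized as an error by IS_ERR(), which is why it has to be mapped to -ENOENT first.

```c
#include <errno.h>

/* Userspace stand-ins modeled on the kernel's err.h helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses are treated as encoded errors. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup: < 0 on error, 1 when not found, 0 on success. */
static int lookup_ref(int key)
{
	return key == 42 ? 0 : 1;
}

static char name_buf[] = "found";

/* Map "not found" (1) to -ENOENT before encoding it as a pointer. */
static char *ref_to_path(int key)
{
	int ret = lookup_ref(key);

	if (ret > 0)
		ret = -ENOENT;
	if (ret)
		return ERR_PTR(ret);
	return name_buf;
}
```

Without the `ret > 0` mapping, the caller would receive the pointer value 1, which passes an IS_ERR() check and would be dereferenced as if it were a valid path.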

Signed-off-by: Jan Schmidt

Jan Schmidt
2012-02-15 23:40:23 +0800
941b2ddf7 btrfs: Sector Size check during Mount

Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
then boot into a 64K PAGE_SIZE kernel. Presently open_ctree fails in an
endless loop and hangs the machine in this situation.

My debugging has shown this Sector size < Page size case to be a non-trivial
situation and a graceful exit from the situation would be nice for the
time being.

Signed-off-by: Keith Mannthey

Keith Mannthey
2012-02-15 23:40:22 +0800

01 Feb, 2012

1 commit

d98456fca Btrfs: don't reserve data with extents locked in btrfs_fallocate

btrfs_fallocate tries to allocate space only if ranges in the file don't
already exist. But the enospc checks it does are not allowed with
extents locked.

Signed-off-by: Chris Mason

Chris Mason
2012-02-01 09:27:41 +0800

27 Jan, 2012

11 commits

9998eb703 Btrfs: fix reservations in btrfs_page_mkwrite

Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.

This makes sure we only release if we were able to reserve.
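
The "release only what was reserved" pattern can be sketched as follows. This is a toy model, not the btrfs code: pool_sketch, pool_reserve(), pool_release() and mkwrite_sketch() are hypothetical names, and dirty_err simply simulates the result of dirtying the inode.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical byte counter standing in for a delalloc reservation pool. */
struct pool_sketch {
	long used;
	long limit;
};

static int pool_reserve(struct pool_sketch *p, long bytes)
{
	if (p->used + bytes > p->limit)
		return -ENOSPC;
	p->used += bytes;
	return 0;
}

static void pool_release(struct pool_sketch *p, long bytes)
{
	p->used -= bytes;
}

/*
 * dirty_err simulates the outcome of dirtying the inode after a successful
 * reservation. On failure, release only if the reservation was obtained.
 */
static int mkwrite_sketch(struct pool_sketch *p, long bytes, int dirty_err)
{
	bool reserved = false;
	int ret = pool_reserve(p, bytes);

	if (!ret) {
		reserved = true;
		ret = dirty_err;
	}
	if (ret && reserved)
		pool_release(p, bytes);
	return ret;
}
```

Tracking the reservation in a local flag keeps the error path from subtracting bytes that were never added, which is the accounting mistake the commit message describes.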

Signed-off-by: Chris Mason

Chris Mason
2012-01-27 23:44:44 +0800
9b2306284 Btrfs: advance window_start if we're using a bitmap

If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superfluous searching. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:12 +0800
0c4e538bc btrfs: mask out gfp flags in releasepage

btree_releasepage is a callback and can be passed unknown gfp flags, which
may then end up in kmem_cache_alloc called from alloc_extent_state. The slab
allocator will BUG_ON when the HIGHMEM or DMA32 flag is set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

	if ((mask & GFP_NOFS) == GFP_NOFS)
		mask = GFP_NOFS;

will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963

[] cache_alloc_refill+0x3b4/0x5c8
[] kmem_cache_alloc+0x204/0x294
[] mempool_alloc+0x52/0x170
[] alloc_extent_state+0x40/0xd4 [btrfs]
[] __clear_extent_bit+0x38a/0x4cc [btrfs]
[] try_release_extent_state+0x9c/0xd4 [btrfs]
[] btree_releasepage+0x7e/0xd0 [btrfs]
[] shrink_page_list+0x6a0/0x724
[] shrink_inactive_list+0x230/0x578
[] shrink_list+0x6c/0x120
[] shrink_zone+0x1e2/0x228
[] shrink_zones+0x90/0x254
[] do_try_to_free_pages+0xac/0x420
[] try_to_free_pages+0x13c/0x1b0
[] __alloc_pages_nodemask+0x5b4/0x9a8
[] grab_cache_page_write_begin+0x7e/0xe8
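
Why the quoted check lets bad flags through can be sketched with made-up flag bits. Everything prefixed X_ below is an illustrative value, not the kernel's real definition, and mask_for_slab() is only one plausible way to filter the flags; the commit message does not say exactly how the patch does it.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's real values are different. */
typedef unsigned int gfp_t;

#define X_GFP_HIGHMEM   0x02u
#define X_GFP_DMA32     0x04u
#define X_GFP_WAIT      0x10u
#define X_GFP_IO        0x40u
#define X_GFP_NOFS      (X_GFP_WAIT | X_GFP_IO)
#define X_SLAB_BUG_MASK (X_GFP_HIGHMEM | X_GFP_DMA32)

/* The check quoted above: normalizes only when both NOFS bits are present. */
static gfp_t normalize_like_try_release(gfp_t mask)
{
	if ((mask & X_GFP_NOFS) == X_GFP_NOFS)
		mask = X_GFP_NOFS;
	return mask;
}

/* Defensive variant: always drop the bits the slab allocator rejects. */
static gfp_t mask_for_slab(gfp_t mask)
{
	return mask & ~X_SLAB_BUG_MASK;
}

/* True when the mask still carries a bit the slab allocator would reject. */
static bool slab_would_bug(gfp_t mask)
{
	return (mask & X_SLAB_BUG_MASK) != 0;
}
```

With the IO bit cleared, as in the loop device case, the comparison against the full NOFS mask fails and the HIGHMEM bit survives; masking unconditionally removes it regardless of which other bits are set.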

Signed-off-by: David Sterba
Signed-off-by: Chris Mason

David Sterba
2012-01-27 04:01:12 +0800
9e622d6be Btrfs: fix enospc error caused by wrong checks of the chunk

When we ran a sysbench test on inline files, an enospc error happened easily even though
there was lots of free disk space that could be allocated for new chunks.

Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
# mount /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run

The cause of this bug is:
We can now reserve more space than is free in the existing chunks if
we have enough free disk space that can be used for new chunks. In that case,
the space allocator should force the allocation of a new chunk if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.

One is
if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
even if we fail to allocate free space of the minimum allocable size.

The other is
if (space_info->force_alloc)
force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, which makes the enospc error happen.

Fix these two wrong checks. For the second one, we change
the values of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that
CHUNK_ALLOC_FORCE is greater than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE has
higher priority. And if the value passed in by the caller is greater
than ->force_alloc, use the passed value.
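
The second check and its fix can be sketched like this. The numeric values, the CHUNK_ALLOC_NO_FORCE member, the space_info_sketch struct and the old_pick()/new_pick() helpers are assumptions made for illustration; only the ordering (FORCE greater than LIMITED) and the "keep the larger value" rule come from the description above.

```c
/* Assumed ordering after the fix: a higher value means higher priority. */
enum chunk_alloc_sketch {
	CHUNK_ALLOC_NO_FORCE = 0,
	CHUNK_ALLOC_LIMITED = 1,
	CHUNK_ALLOC_FORCE = 2,
};

/* Hypothetical stand-in for the relevant field of the space info. */
struct space_info_sketch {
	int force_alloc;
};

/* Old check: any pending ->force_alloc overrides the caller's request. */
static int old_pick(const struct space_info_sketch *si, int force)
{
	if (si->force_alloc)
		force = si->force_alloc;
	return force;
}

/* New check: keep whichever of the two requests has higher priority. */
static int new_pick(const struct space_info_sketch *si, int force)
{
	if (force < si->force_alloc)
		force = si->force_alloc;
	return force;
}
```

With ->force_alloc set to CHUNK_ALLOC_LIMITED, old_pick() downgrades a caller's CHUNK_ALLOC_FORCE to CHUNK_ALLOC_LIMITED, while new_pick() leaves it at CHUNK_ALLOC_FORCE.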

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2012-01-27 04:01:12 +0800
7ec31b548 Btrfs: do not defrag a file partially

xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2

To fix this, we need to set max_to_defrag count properly.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-01-27 04:01:12 +0800
0b485143d Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c

The 32-bit build produced 4 warnings; this patch fixes them.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-01-27 04:01:11 +0800
0b4a9d248 Btrfs: use cluster->window_start when allocating from a cluster bitmap

We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to set up a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
8bedd51b6 Btrfs: Check for NULL page in extent_range_uptodate

A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).

After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c

for (i = start_i; i < num_pages; i++) {
page = extent_buffer_page(eb, i);
wait_on_page_locked(page);
if (!PageUptodate(page))
ret = -EIO;
}

This patch leaves the issue with the locked page yet to be resolved.

Signed-off-by: Mitch Harder
Signed-off-by: Chris Mason

Mitch Harder
2012-01-27 04:01:11 +0800
6dd70ce4e btrfs: Fix busyloops in transaction waiting code

wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...

Signed-off-by: Jan Kara
Signed-off-by: Chris Mason

Jan Kara
2012-01-27 04:01:11 +0800
357b9784b Btrfs: make sure a bitmap has enough bytes

We have only been checking for min_bytes available in bitmap entries, but we
won't successfully set up a bitmap cluster unless the bitmap has at least "bytes"
available. In the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2MB in them, we'll
search all of them anyway, which is suboptimal. Fix this check. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-27 04:01:11 +0800
b1375d64c Btrfs: fix uninit warning in backref.c

Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).

Signed-off-by: Jan Schmidt
Signed-off-by: Chris Mason

Jan Schmidt
2012-01-27 04:01:11 +0800

17 Jan, 2012

7 commits

96bdc7dc6 Btrfs: use larger system chunks

System chunks by default are very small. This makes them slightly
larger and also fixes the conditional checks to make sure we don't
allocate a billion of them at once.

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:38:24 +0800
f248679e8 Btrfs: add a delalloc mutex to inodes for delalloc reservations

I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
that and there's no real way to get rid of those, so just stop using i_mutex to
protect delalloc metadata reservations and use a delalloc mutex instead. This
shouldn't be contended often at all, only if you are writing and mmap writing to
the file at the same time. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:43 +0800
8c2a3ca20 Btrfs: space leak tracepoints

This in addition to a script in my btrfs-tracing tree will help track down space
leaks when we're getting space left over in block groups on umount. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:43 +0800
90290e198 Btrfs: protect orphan block rsv with spin_lock

We've been seeing warnings coming out of the orphan commit stuff forever from
ceph. Turns out it's because we're racing with checking if the orphan block
reserve is set, because we clear it outside of the spin_lock. So leave the
normal fastpath checks where they are, but take the spin_lock and _recheck_ to
make sure we haven't had an orphan block rsv added in the meantime. Then clear
the root's orphan block rsv and release the lock. With this patch a user said
the warnings went away and they usually showed up pretty soon after he started
ceph. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:42 +0800
3f7de037f Btrfs: add allocator tracepoints

I used these tracepoints when figuring out what the cluster stuff was doing, so
add them to mainline in case we need to profile this stuff again. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:42 +0800
45a8090e6 Btrfs: don't call btrfs_throttle in file write

btrfs_throttle will make us wait if there is a currently committing transaction
until we can open new transactions, which is ridiculous since we don't actually
start any transactions within the file write path anyway, so all this does is
introduce big latencies if we have a sync/fsync heavy workload going on while
somebody else is trying to do work. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:55 +0800
ec39e180f Btrfs: release space on error in page_mkwrite

If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2012-01-17 04:28:54 +0800