Doug / smarc-fsl-linux-kernel | Embedian Git Server

29 Mar, 2012

6 commits

e1f041e14 Btrfs: update to the right index of defragment

When we use autodefrag, we forget to update the index that indicates
the last page we've dirtied. As a result, we set dirty flags on the same set of
pages again and again.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:45 +0800
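The fix in e1f041e14 comes down to advancing the page index after each batch of pages is dirtied. A minimal user-space sketch, assuming a hypothetical `defrag_range()` that dirties pages in fixed-size clusters (this is not the btrfs implementation):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: dirty_count[i] counts how often page i was dirtied.
 * Each pass dirties up to `cluster` pages starting at `index`, then moves
 * `index` past them. Omitting the `index += n` update (the bug described
 * above) would dirty the same cluster on every pass.
 */
static size_t defrag_range(unsigned dirty_count[], size_t nr_pages,
                           size_t cluster, size_t max_passes)
{
    size_t index = 0;   /* first page not yet dirtied */
    size_t passes = 0;

    assert(cluster > 0);
    while (index < nr_pages && passes < max_passes) {
        size_t n = nr_pages - index;

        if (n > cluster)
            n = cluster;
        for (size_t i = 0; i < n; i++)
            dirty_count[index + i]++;   /* "set page dirty" */
        index += n;                     /* the fix: advance to the next batch */
        passes++;
    }
    return passes;
}
```

With 10 pages and a cluster size of 4, this takes three passes and dirties each page exactly once.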
66c268922 Btrfs: do not bother to defrag an extent if it is a big real extent

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072              10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082              10 eof
/mnt/btrfs/foobar: 1 extent found

So if we find that the range is already one big real extent, we can simply skip it.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:45 +0800
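The skip rule in 66c268922 can be sketched as a predicate over one extent. This is a minimal sketch under stated assumptions: `struct extent` is a hypothetical descriptor (not the kernel's `struct extent_map`), and `thresh` is an assumed size cutoff chosen by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent descriptor, not the kernel's struct extent_map. */
struct extent {
    uint64_t start;   /* logical file offset in bytes */
    uint64_t len;     /* length in bytes */
    bool hole;        /* no data on disk for this range */
};

/*
 * Returns true if rewriting the extent could reduce fragmentation.
 * A real (non-hole) extent that is already at least `thresh` bytes
 * long is skipped instead of being rewritten.
 */
static bool should_defrag_extent(const struct extent *em, uint64_t thresh)
{
    if (em->hole)
        return false;        /* nothing to defragment */
    if (em->len >= thresh)
        return false;        /* already a big real extent: skip it */
    return true;
}
```

For the 40960-byte file in the example above and an assumed cutoff of 32 KiB, the single extent is skipped, so it would not be moved from block 3072 to 3082.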
17ce6ef8d Btrfs: add a check to decide if we should defrag the range

If our file's layout is as follows:
| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:45 +0800
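The check in 17ce6ef8d can be pictured with a minimal sketch, assuming the file layout is reduced to a hypothetical array of hole/data segments in file order (the real code works on extent maps):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout model: one entry per region of the file, in order. */
enum seg_kind { SEG_HOLE, SEG_DATA };

/*
 * Holes prevent merging across them, so a range is only worth
 * defragmenting if at least two data segments sit next to each other.
 */
static bool worth_defragging(const enum seg_kind *layout, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (layout[i - 1] == SEG_DATA && layout[i] == SEG_DATA)
            return true;
    return false;
}
```

The layout `| hole | data1 | hole | data2 |` from the commit message returns false here.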
1f12bd063 Btrfs: fix the mismatch of page->mapping

commit 600a45e1d5e376f679ff9ecc4ce9452710a6d27c
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on page, but it also introduces another bug.

A page may have been truncated after we unlock and re-lock it,
so we need to look it up again to get the right one.

And since we hold the i_mutex lock, the inode size cannot change, so
we can drop the isize overflow checks.

Signed-off-by: Liu Bo
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:44 +0800
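The re-lookup described in 1f12bd063 follows a common pattern: after re-taking a page lock, check that `page->mapping` still points at the file's mapping, and if not, look the index up again. A minimal user-space sketch; `struct page`, `struct mapping`, `find_or_create_page()` and `truncate_page()` here are hypothetical stand-ins, not the kernel's types or functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page cache; not the kernel's types. */
struct mapping;

struct page {
    struct mapping *mapping;   /* NULL once the page has been truncated */
    int locked;
};

struct mapping {
    struct page *slots[4];     /* page cache indexed by page index */
    struct page pool[8];       /* backing storage for new pages */
    size_t used;
};

static struct page *find_or_create_page(struct mapping *m, size_t index)
{
    assert(index < 4);
    if (!m->slots[index]) {
        struct page *p;

        assert(m->used < 8);
        p = &m->pool[m->used++];
        p->mapping = m;
        p->locked = 0;
        m->slots[index] = p;
    }
    return m->slots[index];
}

/* Models a truncate that detaches the page from the cache. */
static void truncate_page(struct mapping *m, size_t index)
{
    m->slots[index]->mapping = NULL;
    m->slots[index] = NULL;
}

/*
 * Called with `page` locked again after it was unlocked for a while.
 * If the page no longer belongs to `m`, it was truncated meanwhile:
 * drop it and look the index up again to get the right page.
 */
static struct page *revalidate_page(struct mapping *m, size_t index,
                                    struct page *page)
{
    while (page->mapping != m) {
        page->locked = 0;                    /* release the stale page */
        page = find_or_create_page(m, index);
        page->locked = 1;                    /* lock and re-check */
    }
    return page;
}
```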
ecb8bea87 Btrfs: fix race between direct io and autodefrag

The bug was found by running xfstests 209 with autodefrag.

The race is as follows:
  t1                           t2(autodefrag)
  direct IO
    invalidate pagecache
    dio(old data)              add_inode_defrag
    invalidate pagecache
  endio

  direct IO
    invalidate pagecache
                               run_defrag
                                 readpage(old data)
                                 set page dirty (old data)
    dio(new data, rewrite)
    invalidate pagecache (*)
  endio

t2 (autodefrag) reads old data into the page cache via readpage and marks
those pages dirty. Meanwhile, the invalidation marked (*) fails because the
pages are dirty, so the flush thread may later write the old data to disk,
which leads to data loss.

The same race exists with user-space defragment programs.

This patch fixes the race by holding i_mutex while we read the page and set it dirty.

Signed-off-by: Liu Bo
Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Liu Bo
2012-03-29 21:57:44 +0800
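The effect of holding one lock across both paths can be shown with a toy single-page model. This is a sketch under stated assumptions: `struct inode`, `direct_write()`, `defrag_page()` and `flush_page()` are hypothetical, and the flush path takes the mutex only to keep the model free of data races (the real flusher does not take i_mutex). Because the defrag step takes the same `i_mutex` as direct IO, its readpage/set-dirty step can no longer land between the invalidation and the new write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-page file model: on-disk data plus one cached page. */
struct inode {
    pthread_mutex_t i_mutex;
    int disk;        /* data on disk */
    int cache;       /* data in the cached page */
    bool cached;
    bool dirty;
};

/* Direct IO: flush dirty cache, write new data, invalidate the cache. */
static void direct_write(struct inode *ino, int val)
{
    pthread_mutex_lock(&ino->i_mutex);
    if (ino->dirty) {
        ino->disk = ino->cache;   /* write back before bypassing the cache */
        ino->dirty = false;
    }
    ino->disk = val;              /* dio(new data) */
    ino->cached = false;          /* invalidation succeeds: page is clean */
    pthread_mutex_unlock(&ino->i_mutex);
}

/* Defrag: readpage + set page dirty, under the same i_mutex. */
static void defrag_page(struct inode *ino)
{
    pthread_mutex_lock(&ino->i_mutex);
    if (!ino->cached) {
        ino->cache = ino->disk;   /* readpage */
        ino->cached = true;
    }
    ino->dirty = true;            /* set page dirty */
    pthread_mutex_unlock(&ino->i_mutex);
}

/* Flush thread in the model: write a dirty page back to disk. */
static void flush_page(struct inode *ino)
{
    pthread_mutex_lock(&ino->i_mutex);
    if (ino->dirty) {
        ino->disk = ino->cache;
        ino->dirty = false;
    }
    pthread_mutex_unlock(&ino->i_mutex);
}
```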
98961a7e4 Merge git://git.jan-o-sch.net/btrfs-unstable into for-linus

Conflicts:
fs/btrfs/transaction.c

Signed-off-by: Chris Mason

Chris Mason
2012-03-29 08:33:40 +0800

27 Mar, 2012

1 commit

7a3ae2f8c Btrfs: fix regression in scrub path resolving

In commit 4692cf58 we introduced new backref walking code for btrfs. This
assumes we're searching live roots, which requires a transaction context.
While scrubbing, however, we must not join a transaction because this could
deadlock with the commit path. Additionally, what scrub really wants to do
is to resolve a logical address in the commit root it's currently checking.

This patch adds support for logical to path resolving on commit roots and
makes scrub use that.

Signed-off-by: Jan Schmidt

Jan Schmidt
2012-03-27 20:51:21 +0800

22 Mar, 2012

3 commits

79787eaab btrfs: replace many BUG_ONs with proper error handling

btrfs currently handles most errors with BUG_ON. This patch is a work in
progress, but it aims to handle most errors other than internal logic
errors and ENOMEM more gracefully.

This iteration prevents most crashes but can run into lockups with
the page lock on occasion when the timing "works out."

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-03-22 18:52:54 +0800
ce598979b btrfs: Don't BUG_ON errors from btrfs_create_subvol_root()

This is called from only one place, create_subvol(), which passes errors
safely back out to its caller, btrfs_mksubvol, where they are handled.

Additionally, btrfs_create_subvol_root() itself calls BUG_ON() needlessly on an
error return from btrfs_update_inode(). Since create_subvol() was fixed to catch
errors, we can bubble this one up too.

Signed-off-by: Mark Fasheh

Mark Fasheh
2012-03-22 08:45:36 +0800
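The change from BUG_ON() to error propagation follows the usual kernel convention of returning a negative errno up the call chain. A minimal sketch; the functions below are hypothetical stand-ins named after the call chain above, the `fail` argument injects an error, and `-ENOSPC` is an arbitrary example error:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for btrfs_update_inode(); `fail` injects an error. */
static int update_inode(int fail)
{
    return fail ? -ENOSPC : 0;
}

static int create_subvol_root(int fail)
{
    int ret = update_inode(fail);

    if (ret)            /* previously: BUG_ON(ret) */
        return ret;     /* now: pass the error to the caller */
    return 0;
}

static int create_subvol(int fail)
{
    int ret = create_subvol_root(fail);

    if (ret)
        return ret;     /* the caller (mksubvol) handles it */
    return 0;
}
```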
d0082371c btrfs: drop gfp_t from lock_extent

lock_extent and unlock_extent are always called with GFP_NOFS, drop the
argument and use GFP_NOFS consistently.

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-03-22 08:45:35 +0800

25 Feb, 2012

1 commit

855a85f70 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Quoth Chris:
"This is later than I wanted because I got backed up running through
btrfs bugs from the Oracle QA teams. But they are all bug fixes that
we've queued and tested since rc1.

Nothing in particular stands out, this just reflects bug fixing and QA
done in parallel by all the btrfs developers. The most user visible
of these is:

Btrfs: clear the extent uptodate bits during parent transid failures

Because that helps deal with out of date drives (say an iscsi disk
that has gone away and come back). The old code wasn't always
properly retrying the other mirror for this type of failure."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
Btrfs: fix compiler warnings on 32 bit systems
Btrfs: increase the global block reserve estimates
Btrfs: clear the extent uptodate bits during parent transid failures
Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
Btrfs: make sure we update latest_bdev
Btrfs: improve error handling for btrfs_insert_dir_item callers
Btrfs: be less strict on finding next node in clear_extent_bit
Btrfs: fix a bug on overcommit stuff
Btrfs: kick out redundant stuff in convert_extent_bit
Btrfs: skip states when they does not contain bits to clear
Btrfs: check return value of lookup_extent_mapping() correctly
Btrfs: fix deadlock on page lock when doing auto-defragment
Btrfs: fix return value check of extent_io_ops
btrfs: honor umask when creating subvol root
btrfs: silence warning in raid array setup
btrfs: fix structs where bitfields and spinlock/atomic share 8B word
btrfs: delalloc for page dirtied out-of-band in fixup worker
Btrfs: fix memory leak in load_free_space_cache()
btrfs: don't check DUP chunks twice
Btrfs: fix trim 0 bytes after a device delete
...

Linus Torvalds
2012-02-25 01:02:53 +0800

23 Feb, 2012

1 commit

16780cabb Btrfs: add extra sanity checks on the path names in btrfs_mksubvol

Signed-off-by: Chris Mason

Chris Mason
2012-02-23 23:43:45 +0800

17 Feb, 2012

1 commit

600a45e1d Btrfs: fix deadlock on page lock when doing auto-defragment

When I ran xfstests in a loop on a btrfs mounted with autodefrag, a deadlock
occurred.

Steps to reproduce:
[tty0]
# export MOUNT_OPTIONS="-o autodefrag"
# export TEST_DEV=
# export TEST_DIR=
# export SCRATCH_DEV=
# export SCRATCH_MNT=
# while [ 1 ]
> do
> ./check 091 127 263
> sleep 1
> done
[tty1]
# while [ 1 ]
> do
> echo 3 > /proc/sys/vm/drop_caches
> done

Several hours later, the test processes hang, deadlocked on a page lock.

The reason is that:
  Auto defrag task          Flush thread                Test task
                            btrfs_writepages()
                              add ordered extent
                              (including page 1, 2)
                              set page 1 writeback
                              set page 2 writeback
                            endio_fn()
                              end page 2 writeback
                                                        release page 2
  lock page 1
  alloc and lock page 2
  page 2 is not uptodate
    btrfs_readpage()
      start ordered extent()
        btrfs_writepages()
          try to lock page 1

so the deadlock happens.

Fix this bug by unlocking the page that is under writeback and re-locking it
after the writeback ends.

Signed-off-by: Miao Xie

Miao Xie
2012-02-17 00:23:16 +0800
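The rule behind 600a45e1d is to never wait for writeback while holding the page lock: drop the lock, wait, then re-lock and re-check. A minimal user-space sketch, assuming a hypothetical `struct page` built from a pthread mutex and condition variable; the helpers only mirror kernel names and are not the kernel's implementation. The `flusher_thread()` models a completion path that needs the page lock before it can end writeback; holding the lock while waiting would deadlock against it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space page: a lock bit plus a writeback bit. */
struct page {
    pthread_mutex_t state;    /* protects the two flags below */
    pthread_cond_t changed;   /* broadcast whenever either flag clears */
    bool locked;
    bool writeback;
};

static void lock_page(struct page *p)
{
    pthread_mutex_lock(&p->state);
    while (p->locked)
        pthread_cond_wait(&p->changed, &p->state);
    p->locked = true;
    pthread_mutex_unlock(&p->state);
}

static void unlock_page(struct page *p)
{
    pthread_mutex_lock(&p->state);
    p->locked = false;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->state);
}

static bool page_writeback(struct page *p)
{
    pthread_mutex_lock(&p->state);
    bool wb = p->writeback;
    pthread_mutex_unlock(&p->state);
    return wb;
}

static void wait_on_page_writeback(struct page *p)
{
    pthread_mutex_lock(&p->state);
    while (p->writeback)
        pthread_cond_wait(&p->changed, &p->state);
    pthread_mutex_unlock(&p->state);
}

static void end_page_writeback(struct page *p)
{
    pthread_mutex_lock(&p->state);
    p->writeback = false;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->state);
}

/* Defrag side: unlock while waiting for writeback, then re-lock. */
static void lock_page_for_defrag(struct page *p)
{
    lock_page(p);
    while (page_writeback(p)) {
        unlock_page(p);              /* let the completion path take it */
        wait_on_page_writeback(p);
        lock_page(p);                /* re-lock and re-check */
    }
}

/* Completion path in the model: needs the page lock to end writeback. */
static void *flusher_thread(void *arg)
{
    struct page *p = arg;

    lock_page(p);
    end_page_writeback(p);
    unlock_page(p);
    return NULL;
}
```

Running `flusher_thread()` concurrently with `lock_page_for_defrag()` on a page under writeback completes in any interleaving.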

29 Jan, 2012

1 commit

67d2433ee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c

Linus Torvalds
2012-01-29 09:00:19 +0800

27 Jan, 2012

1 commit

7ec31b548 Btrfs: do not defrag a file partially

xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2

To fix this, we need to set the max_to_defrag count properly.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-01-27 04:01:12 +0800

18 Jan, 2012

2 commits

d65773b22 Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: take allocation of ->tree_root into open_ctree()
btrfs: let ->s_fs_info point to fs_info, not root...
btrfs: consolidate failure exits in btrfs_mount() a bit
btrfs: make free_fs_info() call ->kill_sb() unconditional
btrfs: merge free_fs_info() calls on fill_super failures
btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
btrfs: make open_ctree() return int
btrfs: sanitizing ->fs_info, part 5
btrfs: sanitizing ->fs_info, part 4
btrfs: sanitizing ->fs_info, part 3
btrfs: sanitizing ->fs_info, part 2
btrfs: sanitizing ->fs_info, part 1
btrfs: fix a deadlock in btrfs_scan_one_device()
btrfs: fix mount/umount race
btrfs: get ->kill_sb() of its own
btrfs: preparation to fixing mount/umount race

Linus Torvalds
2012-01-18 07:52:51 +0800
f9156c728 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
Btrfs: use larger system chunks
Btrfs: add a delalloc mutex to inodes for delalloc reservations
Btrfs: space leak tracepoints
Btrfs: protect orphan block rsv with spin_lock
Btrfs: add allocator tracepoints
Btrfs: don't call btrfs_throttle in file write
Btrfs: release space on error in page_mkwrite
Btrfs: fix btrfsck error 400 when truncating a compressed
Btrfs: do not use btrfs_end_transaction_throttle everywhere
Btrfs: add balance progress reporting
Btrfs: allow for resuming restriper after it was paused
Btrfs: allow for canceling restriper
Btrfs: allow for pausing restriper
Btrfs: add skip_balance mount option
Btrfs: recover balance on mount
Btrfs: save balance parameters to disk
Btrfs: soft profile changing mode (aka soft convert)
Btrfs: implement online profile changing
Btrfs: do not reduce profile in do_chunk_alloc()
Btrfs: virtual address space subset filter
...

Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.

Linus Torvalds
2012-01-18 07:49:54 +0800

17 Jan, 2012

9 commits

f248679e8 Btrfs: add a delalloc mutex to inodes for delalloc reservations

I was using i_mutex for this, but doing that produces bogus lockdep warnings
and there's no real way to get rid of those, so stop using i_mutex to protect
delalloc metadata reservations and use a dedicated delalloc mutex instead. This
shouldn't be contended often at all, only when you are writing and mmap-writing
the file at the same time. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-01-17 04:29:43 +0800
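A per-inode mutex that covers only the reservation step might look like the sketch below. All names are hypothetical: `struct space_pool` stands in for the filesystem's free metadata space and `reserve_delalloc()` for the reservation entry point. Both the write path and the mmap-write path would call `reserve_delalloc()`, so they only contend when they hit the same inode at the same time.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared pool of free metadata space. */
struct space_pool {
    pthread_mutex_t lock;
    uint64_t free_bytes;
};

/* Hypothetical inode: delalloc_mutex serializes only reservations. */
struct inode_model {
    pthread_mutex_t delalloc_mutex;
    uint64_t reserved_bytes;
};

static int pool_take(struct space_pool *pool, uint64_t bytes)
{
    int ret = 0;

    pthread_mutex_lock(&pool->lock);
    if (pool->free_bytes < bytes)
        ret = -ENOSPC;
    else
        pool->free_bytes -= bytes;
    pthread_mutex_unlock(&pool->lock);
    return ret;
}

/* Reserve metadata space for delalloc without touching any i_mutex. */
static int reserve_delalloc(struct inode_model *ino, struct space_pool *pool,
                            uint64_t bytes)
{
    int ret;

    pthread_mutex_lock(&ino->delalloc_mutex);
    ret = pool_take(pool, bytes);
    if (!ret)
        ino->reserved_bytes += bytes;
    pthread_mutex_unlock(&ino->delalloc_mutex);
    return ret;
}
```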
9785dbdf2 Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration

Chris Mason
2012-01-17 04:26:31 +0800
d756bd2d9 Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration

Conflicts:
fs/btrfs/volumes.c

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:26:17 +0800
19a39dce3 Btrfs: add balance progress reporting

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
de322263d Btrfs: allow for resuming restriper after it was paused

Recognize BTRFS_BALANCE_RESUME flag passed from userspace. We use the
same heuristics used when recovering balance after a crash to try to
start where we left off last time.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
a7e99c691 Btrfs: allow for canceling restriper

Implement an ioctl for canceling restriper. Currently we wait until
relocation of the current block group is finished; in the future this can be
done by triggering a commit. The balance item is deleted, and no record
of the interrupted balance is kept.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
837d5b6e4 Btrfs: allow for pausing restriper

Implement an ioctl for pausing restriper. This pauses the relocation,
but the balance is still considered to be "in progress": the balance item is
not deleted, other volume operations cannot be started, etc. If paused
in the middle of a profile-changing operation, we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount. (It's safe to unmount when restriper is in the
"paused" state; we will resume with the same parameters on the next
mount.)

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
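The resume, cancel, and pause commits above, plus the rule from c9e9f97bd that volume operations are refused while a balance is in progress, fit a small state machine. This is a sketch only: `struct balance` and its functions are hypothetical, and the error codes are illustrative choices rather than a statement of what the ioctls return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical balance state; not the kernel's balance control struct. */
enum balance_state { BAL_NONE, BAL_RUNNING, BAL_PAUSED };

struct balance {
    enum balance_state state;
    bool item_on_disk;       /* models the persistent balance item */
};

static int balance_start(struct balance *b)
{
    if (b->state != BAL_NONE)
        return -EINPROGRESS;
    b->state = BAL_RUNNING;
    b->item_on_disk = true;
    return 0;
}

static int balance_pause(struct balance *b)
{
    if (b->state != BAL_RUNNING)
        return -ENOTCONN;
    b->state = BAL_PAUSED;   /* item kept: still "in progress" */
    return 0;
}

static int balance_resume(struct balance *b)
{
    if (b->state != BAL_PAUSED)
        return -ENOTCONN;
    b->state = BAL_RUNNING;  /* continue where we left off */
    return 0;
}

static int balance_cancel(struct balance *b)
{
    if (b->state == BAL_NONE)
        return -ENOTCONN;
    b->state = BAL_NONE;
    b->item_on_disk = false; /* no record of the interrupted balance */
    return 0;
}

/* Other volume operations are refused while running or paused. */
static bool volume_op_allowed(const struct balance *b)
{
    return b->state == BAL_NONE;
}
```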
f43ffb60f Btrfs: add basic infrastructure for selective balancing

This allows a separate set of filters for each chunk type
(data, meta, sys). The code, however, is generic, and the switch on chunk
type is done only once.

This commit also adds a type filter: it allows balancing, for example,
meta and system chunks without touching data ones.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800
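The "switch on chunk type only once" design can be sketched as follows. The flag values, `struct filter_args`, and its `max_used_pct` field are hypothetical placeholders for illustration; they are not the real ioctl constants or filter fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real constants differ. */
#define CHUNK_DATA   (1u << 0)
#define CHUNK_SYSTEM (1u << 1)
#define CHUNK_META   (1u << 2)

struct chunk { uint32_t type; unsigned used_pct; };

/* Hypothetical per-type filter settings. */
struct filter_args { unsigned max_used_pct; };

struct balance_control {
    uint32_t types;                   /* type filter: which chunks to touch */
    struct filter_args data, meta, sys;
};

static bool should_balance_chunk(const struct balance_control *bctl,
                                 const struct chunk *c)
{
    const struct filter_args *args;

    if (!(bctl->types & c->type))
        return false;                 /* type filter rejects this chunk */

    /* the only place that switches on chunk type */
    if (c->type & CHUNK_DATA)
        args = &bctl->data;
    else if (c->type & CHUNK_SYSTEM)
        args = &bctl->sys;
    else
        args = &bctl->meta;

    /* from here on the code is type-agnostic */
    return c->used_pct <= args->max_used_pct;
}
```

Setting `types` to `CHUNK_META | CHUNK_SYSTEM` balances metadata and system chunks while leaving data chunks alone.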
c9e9f97bd Btrfs: add basic restriper infrastructure

Add basic restriper infrastructure: extended balancing ioctl and all
related ioctl data structures, add data structure for tracking
restriper's state to fs_info, etc. The semantics of the old balancing
ioctl are fully preserved.

Explicitly disallow any volume operations when balance is in progress.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800

11 Jan, 2012

2 commits

4da6f1a33 Btrfs: reserve metadata space in btrfs_ioctl_setflags()

Check and reserve space for btrfs_update_inode().

Signed-off-by: Li Zefan

Li Zefan
2012-01-11 10:26:39 +0800
f062abf08 Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()

We can recover from errors and return -errno to user space.

Signed-off-by: Li Zefan

Li Zefan
2012-01-11 10:26:38 +0800

09 Jan, 2012

1 commit

815745cf3 btrfs: let ->s_fs_info point to fs_info, not root...

The latter can be obtained from the former (by looking at ->tree_root)
just as cheaply as we currently do the other way round.

Signed-off-by: Al Viro

Al Viro
2012-01-09 08:35:37 +0800
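The pointer layout after 815745cf3 can be sketched with minimal hypothetical structures (not the kernel's definitions): `s_fs_info` points at the fs_info, and the tree root is one dereference away, just as the root previously reached its fs_info.

```c
#include <assert.h>

/* Hypothetical, minimal versions of the three structures involved. */
struct fs_info;

struct root {
    struct fs_info *fs_info;
};

struct fs_info {
    struct root *tree_root;
};

struct super_block {
    void *s_fs_info;   /* after the change: points to fs_info */
};

static struct fs_info *sb_fs_info(const struct super_block *sb)
{
    return sb->s_fs_info;
}

/* Reaching the tree root costs a single extra dereference. */
static struct root *sb_tree_root(const struct super_block *sb)
{
    return sb_fs_info(sb)->tree_root;
}
```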

05 Jan, 2012

1 commit

4692cf58a Btrfs: new backref walking code

The old backref iteration code could only safely be used on commit roots.
Besides this limitation, it had bugs in finding the roots for these
references. This commit replaces large parts of it by btrfs_find_all_roots()
which a) really finds all roots and the correct roots, b) works correctly
under heavy file system load, c) considers delayed refs.

Signed-off-by: Jan Schmidt

Jan Schmidt
2012-01-05 17:49:43 +0800

04 Jan, 2012

2 commits

2a79f17e4 vfs: mnt_drop_write_file()

New helper (a wrapper around mnt_drop_write()) to be paired with
mnt_want_write_file().

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:40 +0800
a561be710 switch a bunch of places to mnt_want_write_file()

It's both faster (when the file has been opened for write) and cleaner.

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:35 +0800

22 Dec, 2011

1 commit

66d7e7f09 Btrfs: mark delayed refs as for cow

Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
from every call site. The for_cow parameter will later on be used to
determine if a ref will change anything with respect to qgroups.

Delayed refs coming from relocation are always counted as for_cow, as they
don't change subvol quota.

Also pass in the fs_info for later use.

btrfs_find_all_roots() will use this as an optimization, as changes that are
for_cow will not change anything with respect to which root points to a
certain leaf. Thus, we don't need to add the current sequence number to
those delayed refs.

Signed-off-by: Arne Jansen
Signed-off-by: Jan Schmidt

Arne Jansen
2011-12-22 23:22:27 +0800
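The optimization described in 66d7e7f09 (no sequence number for for_cow refs) can be sketched as below. `struct delayed_ref` and `add_delayed_ref()` are hypothetical simplifications, and using 0 to mean "no sequence number" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical delayed ref record; not the kernel's delayed ref node. */
struct delayed_ref {
    uint64_t bytenr;
    bool for_cow;
    uint64_t seq;    /* 0 means "no sequence number attached" */
};

/*
 * for_cow refs never change which root points at a given leaf, so only
 * non-cow refs take the next sequence number.
 */
static void add_delayed_ref(struct delayed_ref *ref, uint64_t bytenr,
                            bool for_cow, uint64_t *seq_counter)
{
    ref->bytenr = bytenr;
    ref->for_cow = for_cow;
    ref->seq = for_cow ? 0 : ++*seq_counter;
}
```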

16 Dec, 2011

2 commits

567a45e91 Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef…

…/btrfs-work into integration

Conflicts:
fs/btrfs/inode.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason
2011-12-16 02:43:49 +0800
660d3f6cd Btrfs: fix how we do delalloc reservations and how we free reservations on error

While running xfstests 269 with some tracing, my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole, and it turns out the way we deal with reserved_extents is wrong:
we need to set it only if the reservation succeeds; otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally making it impossible for me to track down the
original issue I was looking for. The other problem is with our error handling
in the reservation code. There are two cases that we need to deal with:

1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we drop the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However, if we fail, we need the reservation side to do the free at
that point, since that space is no longer in use. As it stands, the code was
doing this fine and it worked out, except in case #2.

2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway, and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock, and if it hasn't changed, we know we can just
decrement csum_bytes and carry on.

Because we have to drop our spin_lock to do the reservation and can therefore
race with free(), I'm going to serialize all reservations with the i_mutex.
We already get this for free in the heavy-use paths: truncate and file write
both hold the i_mutex, so it only needed to be added to page_mkwrite and
various ioctl/balance paths. With this patch my space leak scripts no longer
scream bloody murder. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-12-16 00:04:22 +0800
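The snapshot-and-compare logic for a failed reservation can be sketched as follows, under stated assumptions: `struct inode_rsv` is hypothetical, a pthread mutex stands in for the inode spin_lock, and the sketch only decides whether the caller must also free space. It does not compute how much.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode accounting; `lock` stands in for the spin_lock. */
struct inode_rsv {
    pthread_mutex_t lock;
    uint64_t csum_bytes;
};

/* Start of a reservation: account the bytes and remember what we saw. */
static uint64_t begin_reserve(struct inode_rsv *ino, uint64_t bytes)
{
    uint64_t snapshot;

    pthread_mutex_lock(&ino->lock);
    ino->csum_bytes += bytes;
    snapshot = ino->csum_bytes;
    pthread_mutex_unlock(&ino->lock);   /* dropped while reserving */
    return snapshot;
}

/*
 * Failed reservation: csum_bytes is undone either way. Returns true when
 * csum_bytes changed while the lock was dropped (case 1, a free() ran),
 * meaning the caller must also free space; false for case 2.
 */
static bool undo_failed_reserve(struct inode_rsv *ino, uint64_t bytes,
                                uint64_t snapshot)
{
    bool raced;

    pthread_mutex_lock(&ino->lock);
    raced = ino->csum_bytes != snapshot;
    ino->csum_bytes -= bytes;
    pthread_mutex_unlock(&ino->lock);
    return raced;
}
```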

15 Dec, 2011

1 commit

306424cc8 Btrfs: fix ctime update of on-disk inode

To reproduce the bug:

# touch /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800
# chattr +i /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:43.198105295 +0800
# umount /mnt
# mount /dev/loop1 /mnt
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800

We should update the ctime of the in-memory inode before calling
btrfs_update_inode().

Signed-off-by: Li Zefan
Signed-off-by: Chris Mason

Li Zefan
2011-12-15 23:50:37 +0800
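The ordering fix in 306424cc8 can be shown with two hypothetical structs, one standing for the in-memory inode and one for the on-disk copy. `update_inode()` here is a simplified stand-in for btrfs_update_inode() that copies whatever is in memory at the moment it is called, so ctime must be set first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory and on-disk inode copies. */
struct mem_inode  { uint32_t flags; int64_t ctime; };
struct disk_inode { uint32_t flags; int64_t ctime; };

/* Stand-in for btrfs_update_inode(): copy in-memory state to disk. */
static void update_inode(const struct mem_inode *ino, struct disk_inode *disk)
{
    disk->flags = ino->flags;
    disk->ctime = ino->ctime;
}

/* Flag change (e.g. chattr +i): bump ctime before writing the inode. */
static void set_flags(struct mem_inode *ino, struct disk_inode *disk,
                      uint32_t flags, int64_t now)
{
    ino->flags = flags;
    ino->ctime = now;
    update_inode(ino, disk);
}
```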

01 Dec, 2011

1 commit

ece7d20e8 Btrfs: Don't error on resizing FS to same size

It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed. User app GParted trips
over this error. Allow it by bypassing the shrink or grow operation.

Signed-off-by: Mike Fleetwood

Mike Fleetwood
2011-12-01 01:46:04 +0800
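Treating a same-size resize as a no-op can be sketched with a hypothetical `plan_resize()` helper that decides which operation, if any, a request needs:

```c
#include <assert.h>
#include <stdint.h>

enum resize_action { RESIZE_NONE, RESIZE_SHRINK, RESIZE_GROW };

/* Hypothetical helper: decide what a resize request needs to do. */
static enum resize_action plan_resize(uint64_t old_size, uint64_t new_size)
{
    if (new_size == old_size)
        return RESIZE_NONE;     /* same size: succeed without doing anything */
    return new_size < old_size ? RESIZE_SHRINK : RESIZE_GROW;
}
```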

20 Nov, 2011

2 commits

5bb146823 Btrfs: prefix resize related printks with btrfs:

For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann
Signed-off-by: Chris Mason

Arnd Hannemann
2011-11-20 20:42:16 +0800
745c4d8e1 btrfs: Fix up 32/64-bit compatibility for new ioctls

This patch casts to unsigned long before casting to a pointer and fixes
the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Jeff Mahoney
Signed-off-by: Chris Mason

Jeff Mahoney
2011-11-20 20:42:13 +0800

06 Nov, 2011

1 commit

740c3d226 Btrfs: fix the new inspection ioctls for 32 bit compat

The new ioctls to follow backrefs are not clean for 32/64 bit
compat. This reworks them for u64s everywhere. They are brand new, so
there are no problems with changing the interface now.

Signed-off-by: Chris Mason

Chris Mason
2011-11-06 16:08:49 +0800
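Why "u64s everywhere" makes an ioctl argument compat-clean can be shown with two hypothetical layouts (the field names are illustrative, not the actual btrfs ioctl structs). A pointer field changes size between 32-bit and 64-bit user space, while an all-u64 struct has the same size and offsets for both.

```c
#include <assert.h>
#include <stdint.h>

/* Not compat-clean: `buf` is 4 bytes on 32-bit and 8 on 64-bit. */
struct ioctl_args_ptr {
    uint64_t inum;
    void *buf;
};

/* Compat-clean: every field is a u64; the user pointer is stored as one. */
struct ioctl_args_u64 {
    uint64_t inum;
    uint64_t size;
    uint64_t buf;
};

/* Same size and offsets for 32-bit and 64-bit callers. */
_Static_assert(sizeof(struct ioctl_args_u64) == 24, "fixed ioctl layout");
```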