Eric Lee / smarc-fsl-linux-kernel

16 Dec, 2011

1 commit

660d3f6cd Btrfs: fix how we do delalloc reservations and how we free reservations on error

Running xfstests 269 with some tracing, my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole, and it turns out the way we deal with reserved_extents is wrong:
we need to set it only if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues, in addition to making my scripts
yell and scream and generally making it impossible for me to track down the
original issue I was looking for.

The other problem is with our error handling in the reservation code. There are
two cases that we need to deal with:

1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we drop the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However, if we fail, we need the reservation side to do the free at
that point since that space is no longer in use. So as it stands the code was
doing this fine and it worked out, except in case #2.

2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.
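The two failure cases above can be sketched in userspace C. This is a simplified model, not the kernel code: struct inode_model, metadata_for() and undo_failed_reservation(), as well as the two size constants, are hypothetical names and values chosen for illustration, and the caller is assumed to hold the inode's lock again when the undo runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing constants, chosen only to keep the arithmetic simple. */
#define MODEL_CSUM_BYTES_PER_LEAF 4096u
#define MODEL_LEAF_SIZE           4096u

/* Hypothetical stand-in for the per-inode accounting. */
struct inode_model {
    uint64_t csum_bytes; /* data bytes that still need checksum items */
};

/* Metadata needed to hold checksums for csum_bytes of data (model only). */
static uint64_t metadata_for(uint64_t csum_bytes)
{
    uint64_t leaves = (csum_bytes + MODEL_CSUM_BYTES_PER_LEAF - 1) /
                      MODEL_CSUM_BYTES_PER_LEAF;
    return leaves * MODEL_LEAF_SIZE;
}

/*
 * Undo a failed reservation of num_bytes. csum_snapshot is the value of
 * csum_bytes recorded just before the lock was dropped. Returns how many
 * metadata bytes the reservation side must release.
 */
static uint64_t undo_failed_reservation(struct inode_model *inode,
                                        uint64_t csum_snapshot,
                                        uint64_t num_bytes)
{
    bool raced_with_free = inode->csum_bytes != csum_snapshot;
    uint64_t before = metadata_for(inode->csum_bytes);

    inode->csum_bytes -= num_bytes;

    /* Case 2: nothing changed, nothing was reserved, nothing to free. */
    if (!raced_with_free)
        return 0;

    /* Case 1: free() skipped its release, so do it on its behalf. */
    return before - metadata_for(inode->csum_bytes);
}
```

In this model an unchanged snapshot means only the counter is rolled back, while a changed snapshot means the space that free() declined to release is handed back by the reservation path.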

Because we can race with free() (we have to drop our spin_lock to do the
reservation), I'm going to serialize all reservations with the i_mutex. We
already get this for free in the heavy use paths, since truncate and file write
both hold the i_mutex; I just needed to add it to page_mkwrite and various
ioctl/balance things. With this patch my space leak scripts no longer
scream bloody murder. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-12-16 00:04:22 +0800

11 Nov, 2011

1 commit

76b9e23d2 Btrfs: fix orphan backref nodes

If the root node of a fs/file tree is in the block group that is
being relocated, but its other nodes are in other block groups, and
we create a snapshot for this tree between the end of the relocation
tree creation and ->create_reloc_tree being set to 0, Btrfs will create
some backref nodes that are the lowest nodes of the backref cache.
But we forget to add them to the ->leaves list of the backref cache
and deal with them, and in the end they will trigger BUG_ON().

kernel BUG at fs/btrfs/relocation.c:239!

This patch fixes it by adding them to the ->leaves list of the backref cache.

Signed-off-by: Miao Xie
Signed-off-by: Chris Mason

Miao Xie
2011-11-11 09:45:05 +0800

21 Oct, 2011

1 commit

84850e8d8 btrfs: check file extent backref offset underflow

The offset field in a data extent backref can underflow if the clone range
ioctl is used. We can reliably detect the underflow because max file size is
limited to 2^63 and max data extent size is limited by block group size.
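A minimal sketch of why the detection is reliable, assuming the stored offset is computed as the file position minus the offset into the extent in unsigned 64-bit arithmetic. The helper names backref_offset() and backref_offset_underflowed() are hypothetical, and this is not the check used in the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stored backref offset (assumed): file position minus offset into the extent. */
static uint64_t backref_offset(uint64_t file_pos, uint64_t extent_off)
{
    return file_pos - extent_off; /* wraps modulo 2^64 if extent_off > file_pos */
}

/*
 * A legitimate offset stays below 2^63 because file sizes do. A wrapped
 * value equals 2^64 - d, where d is bounded by the data extent size, so it
 * lands far above 2^63. The two ranges cannot overlap.
 */
static bool backref_offset_underflowed(uint64_t stored)
{
    return stored > (uint64_t)INT64_MAX;
}
```

Interpreting the stored value as a signed 64-bit number and testing for a negative result would express the same idea.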

Signed-off-by: Zheng Yan

Yan, Zheng
2011-10-21 00:10:31 +0800

20 Oct, 2011

6 commits

36ba022ac Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions

Currently btrfs_block_rsv_check does 2 things: it will either refill a block
reserve like in the truncate or refill case, or it will check to see if there is
enough space in the global reserve and possibly refill it. However, because of
overcommit we could be well overcommitting ourselves just to try and refill the
global reserve, when really we should just be committing the transaction. So
break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill
will try to reserve more metadata if it can and btrfs_block_rsv_check will not;
it will only tell you if the factor of the total space is still reserved.
Thanks,
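The split described above can be illustrated with a small userspace model. Everything here is an assumption made for illustration: struct rsv_model, struct space_model, rsv_check() and rsv_refill() are hypothetical, and the factor is expressed in tenths of the reserve size. The real functions take different arguments.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical block reserve: target size and bytes currently reserved. */
struct rsv_model {
    uint64_t size;
    uint64_t reserved;
};

/* Hypothetical pool of space that a refill may draw from. */
struct space_model {
    uint64_t free_bytes;
};

/* Check only: report whether min_factor tenths of the size is reserved. */
static int rsv_check(const struct rsv_model *rsv, unsigned min_factor)
{
    uint64_t needed = rsv->size * min_factor / 10;

    return rsv->reserved >= needed ? 0 : -ENOSPC;
}

/* Refill: try to bring the reserve up to min_reserved bytes. */
static int rsv_refill(struct rsv_model *rsv, struct space_model *space,
                      uint64_t min_reserved)
{
    uint64_t missing;

    if (rsv->reserved >= min_reserved)
        return 0;

    missing = min_reserved - rsv->reserved;
    if (space->free_bytes < missing)
        return -ENOSPC; /* caller may decide to commit the transaction */

    space->free_bytes -= missing;
    rsv->reserved += missing;
    return 0;
}
```

The point of the separation is that rsv_check() never touches the space pool, so a caller that only wants an answer cannot trigger further overcommit.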

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:59 +0800
3b16a4e3c Btrfs: use the inode's mapping mask for allocating pages

Johannes pointed out we were allocating only kernel pages for doing writes,
which is kind of a big deal if you are on 32bit and have more than a gig of ram.
So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
don't re-enter. Thanks,

Reported-by: Johannes Weiner
Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:45 +0800
4a92b1b8d Btrfs: stop passing a trans handle all around the reservation code

The only thing that we need a trans handle for is reserve_metadata_bytes,
and that's to know how much flushing we can do. So
instead of passing it around, just check current->journal_info for a
trans_handle so we know if we can commit a transaction to try and free up space
or not. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:44 +0800
482e6dc52 Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check

If you run xfstests 224 you will get lots of messages about not being able to
delete inodes and that they will be cleaned up next mount. This is because
btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
flush, so if there was not enough space, it simply failed. But in the truncate
and evict cases we could easily flush space to try and get enough space to do
our work, so make btrfs_block_rsv_check take a flush argument to pass down to
reserve_metadata_bytes. Now xfstests 224 runs fine without all those
complaints. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:38 +0800
dabdb6408 Btrfs: kill unused parts of block_rsv

The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:34 +0800
37be25bcb Btrfs: kill the durable block rsv stuff

This is confusing code and isn't used by anything anymore, so delete it.

Signed-off-by: Josef Bacik

Josef Bacik
2011-10-20 03:12:32 +0800

28 Jul, 2011

2 commits

ff95acb67 Merge branch 'integration' into for-linus

Chris Mason
2011-07-28 04:18:13 +0800
a94733d0b Btrfs: use find_or_create_page instead of grab_cache_page

grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
need GFP_NOFS so we don't deadlock. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-07-28 00:46:43 +0800

20 Jun, 2011

1 commit

90a800de0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: avoid delayed metadata items during commits
btrfs: fix uninitialized return value
btrfs: fix wrong reservation when doing delayed inode operations
btrfs: Remove unused sysfs code
btrfs: fix dereference of ERR_PTR value
Btrfs: fix relocation races
Btrfs: set no_trans_join after trying to expand the transaction
Btrfs: protect the pending_snapshots list with trans_lock
Btrfs: fix path leakage on subvol deletion
Btrfs: drop the delalloc_bytes check in shrink_delalloc
Btrfs: check the return value from set_anon_super

Linus Torvalds
2011-06-20 23:58:53 +0800

18 Jun, 2011

1 commit

7585717f3 Btrfs: fix relocation races

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation. The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations. The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.

Signed-off-by: Chris Mason

Chris Mason
2011-06-18 01:36:58 +0800

05 Jun, 2011

1 commit

e6ece7073 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
btrfs: fix uninitialized variable warning
btrfs: add helper for fs_info->closing
Btrfs: add mount -o inode_cache
btrfs: scrub: add explicit plugging
btrfs: use btrfs_ino to access inode number
Btrfs: don't save the inode cache if we are deleting this root
btrfs: false BUG_ON when degraded
Btrfs: don't save the inode cache in non-FS roots
Btrfs: make sure we don't overflow the free space cache crc page
Btrfs: fix uninit variable in the delayed inode code
btrfs: scrub: don't reuse bios and pages
Btrfs: leave spinning on lookup and map the leaf
Btrfs: check for duplicate entries in the free space cache
Btrfs: don't try to allocate from a block group that doesn't have enough space
Btrfs: don't always do readahead
Btrfs: try not to sleep as much when doing slow caching
Btrfs: kill BTRFS_I(inode)->block_group
Btrfs: don't look at the extent buffer level 3 times in a row
Btrfs: map the node block when looking for readahead targets
Btrfs: set range_start to the right start in count_range_bits
...

Linus Torvalds
2011-06-05 05:17:23 +0800

28 May, 2011

2 commits

ff5714cca Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus

Conflicts:
fs/btrfs/disk-io.c
fs/btrfs/extent-tree.c
fs/btrfs/free-space-cache.c
fs/btrfs/inode.c
fs/btrfs/transaction.c

Signed-off-by: Chris Mason

Chris Mason
2011-05-28 19:00:39 +0800
a0c306109 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (58 commits)
Btrfs: use the device_list_mutex during write_dev_supers
Btrfs: setup free ino caching in a more asynchronous way
btrfs scrub: don't coalesce pages that are logically discontiguous
Btrfs: return -ENOMEM in clear_extent_bit
Btrfs: add mount -o auto_defrag
Btrfs: using rcu lock in the reader side of devices list
Btrfs: drop unnecessary device lock
Btrfs: fix the race between remove dev and alloc chunk
Btrfs: fix the race between reading and updating devices
Btrfs: fix bh leak on __btrfs_open_devices path
Btrfs: fix unsafe usage of merge_state
Btrfs: allocate extent state and check the result properly
fs/btrfs: Add missing btrfs_free_path
Btrfs: check return value of btrfs_inc_extent_ref()
Btrfs: return error to caller if read_one_inode() fails
Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item
Btrfs: return error code to caller when btrfs_del_item fails
Btrfs: return error code to caller when btrfs_previous_item fails
btrfs: fix typo 'testeing' -> 'testing'
btrfs: typo: 'btrfS' -> 'btrfs'
...

Linus Torvalds
2011-05-28 04:57:12 +0800

24 May, 2011

3 commits

026fd3178 Btrfs: don't always do readahead

Our readahead is sort of sloppy, and really isn't always needed. For example if
ls is doing a stating ls (which is the default) it's going to stat in non-disk
order, so if say you have a directory with a stupid amount of files, readahead
is going to do nothing but waste time in the case of doing the stat. Taking the
unconditional readahead out made my test go from 57 minutes to 36 minutes. This
means that everywhere we do loop through the tree we want to make sure we do set
path->reada properly, so I went through and found all of the places where we
loop through the path and set reada to 1. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-05-24 01:03:14 +0800
a4abeea41 Btrfs: kill trans_mutex

We use trans_mutex for lots of things; here's a basic list:

1) To serialize trans_handles joining the currently running transaction
2) To make sure that no new trans handles are started while we are committing
3) To protect the dead_roots list and the transaction lists

Serializing trans_handles joining is really not too hard, but it can get
bogged down in acquiring a reference to the transaction. So replace the
trans_mutex with a trans_lock spinlock and use it to do the following:

1) Protect fs_info->running_transaction. All trans handles have to do is check
this, and then take a reference of the transaction and keep on going.
2) Protect the fs_info->trans_list. This doesn't get used too much, basically
it just holds the current transactions, which will usually just be the currently
committing transaction and the currently running transaction at most.
3) Protect the dead roots list. This is only ever processed by splicing the
list so this is relatively simple.
4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using
the trans_mutex before, so this is a pretty straightforward change.
5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over
the entirety of the commit we need to have a way to block new people from
creating a new transaction while we're doing our work. So we set no_trans_join
and in join_transaction we test to see if that is set, and if it is we do a
wait_on_commit.
6) Make the transaction use count atomic so we don't need to take locks to
modify it when we're dropping references.
7) Add a commit_lock to the transaction to make sure multiple people trying to
commit the same transaction don't race and commit at the same time.
8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
trans.
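Item 6 above (an atomic use count, so dropping a reference needs no lock) can be sketched with C11 atomics. This is a userspace illustration only: struct txn_model, txn_get() and txn_put() are hypothetical names, and the kernel uses its own atomic_t API instead of <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction with a reference count. */
struct txn_model {
    atomic_int use_count;
};

/* Take a reference; no lock is needed around the increment. */
static void txn_get(struct txn_model *t)
{
    atomic_fetch_add(&t->use_count, 1);
}

/*
 * Drop a reference. Returns true only for the caller that dropped the
 * last one, which is then responsible for freeing the transaction.
 */
static bool txn_put(struct txn_model *t)
{
    return atomic_fetch_sub(&t->use_count, 1) == 1;
}
```

Because atomic_fetch_sub() returns the previous value, exactly one caller observes 1 and performs the cleanup, even if several threads drop references at the same time.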

I have tested this with xfstests, but obviously it is a pretty hairy change so
lots of testing is greatly appreciated. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-05-24 01:00:57 +0800
7a7eaa40a Btrfs: take away the num_items argument from btrfs_join_transaction

I keep forgetting that btrfs_join_transaction() just ignores the num_items
argument, which leads me to sending pointless patches and looking stupid :). So
just kill the num_items argument from btrfs_join_transaction and
btrfs_start_ioctl_transaction, since neither of them use it. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-05-24 01:00:56 +0800

23 May, 2011

2 commits

712673339 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers

Conflicts:
fs/btrfs/Makefile
fs/btrfs/ctree.h
fs/btrfs/volumes.h

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason
2011-05-23 18:30:52 +0800
945d8962c Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into inode_numbers

Conflicts:
fs/btrfs/extent-tree.c
fs/btrfs/free-space-cache.c
fs/btrfs/inode.c
fs/btrfs/tree-log.c

Signed-off-by: Chris Mason

Chris Mason
2011-05-23 00:33:42 +0800

21 May, 2011

1 commit

096553730 Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers

Conflicts:
fs/btrfs/free-space-cache.c

Signed-off-by: Chris Mason

Chris Mason
2011-05-21 21:27:38 +0800

12 May, 2011

1 commit

a2de733c7 btrfs: scrub

This adds an initial implementation for scrub. It works in a quite
straightforward way. Usermode issues an ioctl for each device in the
fs. For each device, it enumerates the allocated device chunks. For
each chunk, the contained extents are enumerated and the data checksums
fetched. The extents are read sequentially and the checksums verified.
If an error occurs (checksum or EIO), a good copy is searched for. If
one is found, the bad copy will be rewritten.
All enumerations happen from the commit roots. During a transaction
commit, the scrubs get paused and afterwards continue from the new
roots.
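The verify-and-repair step described above can be sketched as a small userspace model. It is not the scrub implementation: toy_csum() is a made-up checksum standing in for the real data checksum, mirrors are plain in-memory arrays, and scrub_block() and TOY_BLOCK_SIZE are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Made-up checksum, used only so the example is self-contained. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/*
 * Verify the copy at index target against the expected checksum.
 * Returns 0 if the copy is good, 1 if it was rewritten from another
 * mirror whose checksum matches, and -1 if no good copy exists.
 */
static int scrub_block(uint8_t mirrors[][TOY_BLOCK_SIZE], size_t num_mirrors,
                       size_t target, uint32_t expected)
{
    if (toy_csum(mirrors[target], TOY_BLOCK_SIZE) == expected)
        return 0;

    for (size_t i = 0; i < num_mirrors; i++) {
        if (i == target)
            continue;
        if (toy_csum(mirrors[i], TOY_BLOCK_SIZE) == expected) {
            /* Found a good copy: rewrite the bad one from it. */
            memcpy(mirrors[target], mirrors[i], TOY_BLOCK_SIZE);
            return 1;
        }
    }
    return -1;
}
```

The model leaves out everything about enumerating chunks and extents from the commit roots and only shows the per-block decision.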

This commit is based on the series originally posted to linux-btrfs
with some improvements that resulted from comments from David Sterba,
Ilya Dryomov and Jan Schmidt.

Signed-off-by: Arne Jansen

Arne Jansen
2011-05-12 20:45:20 +0800

10 May, 2011

1 commit

70f23fd66 treewide: fix a few typos in comments

- kenrel -> kernel
- whetehr -> whether
- ttt -> tt
- sss -> ss

Signed-off-by: Justin P. Mattock
Signed-off-by: Jiri Kosina

Justin P. Mattock
2011-05-10 16:16:21 +0800

06 May, 2011

1 commit

f2a97a9db btrfs: remove all unused functions

Remove static and global declarations and/or definitions. Reduces size
of btrfs.ko by ~3.4kB.

text data bss dec hex filename
402081 7464 200 409745 64091 btrfs.ko.base
398620 7144 200 405964 631cc btrfs.ko.remove-all

Signed-off-by: David Sterba

David Sterba
2011-05-06 18:34:03 +0800

02 May, 2011

4 commits

b3b4aa74b btrfs: drop unused parameter from btrfs_release_path

The tree root parameter has not been used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes").

Signed-off-by: David Sterba

David Sterba
2011-05-02 19:57:22 +0800
172ddd60a btrfs: drop gfp parameter from alloc_extent_map

Pass GFP_NOFS directly to kmem_cache_alloc.

Signed-off-by: David Sterba

David Sterba
2011-05-02 19:57:21 +0800
f993c883a btrfs: drop unused argument from extent_io_tree_init

All callers pass GFP_NOFS, but the GFP mask argument is not used in the
function; GFP_ATOMIC is passed to radix tree initialization and it's the
only correct one, since we're using the preload/insert mechanism of the
radix tree.
Let's drop the gfp mask from the btrfs function; this will not change
behaviour.

Signed-off-by: David Sterba

David Sterba
2011-05-02 19:57:21 +0800
c704005d8 btrfs: unify checking of IS_ERR and null

Use IS_ERR_OR_NULL when possible; done by this coccinelle script:

@ match @
identifier id;
@@
(
- BUG_ON(IS_ERR(id) || !id);
+ BUG_ON(IS_ERR_OR_NULL(id));
|
- IS_ERR(id) || !id
+ IS_ERR_OR_NULL(id)
|
- !id || IS_ERR(id)
+ IS_ERR_OR_NULL(id)
)
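For readers unfamiliar with the idiom the script targets, here is a userspace model of error pointers. The model_* names are hypothetical; the sketch mirrors the kernel convention of encoding small negative errno values in the top of the address range (up to 4095), but it is not the kernel's own header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that is encoded in a pointer in this model. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer falls in the top MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* The combined test that replaces "IS_ERR(id) || !id". */
static bool model_is_err_or_null(const void *p)
{
    return !p || model_is_err(p);
}
```

NULL is not an error pointer in this encoding, which is why code that can see either value needs the combined check.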

Signed-off-by: David Sterba

David Sterba
2011-05-02 19:57:20 +0800

25 Apr, 2011

2 commits

33345d015 Btrfs: Always use 64bit inode number

There's a potential problem on 32bit systems when we exhaust 32bit inode
numbers and start to allocate big inode numbers, because btrfs uses
inode->i_ino in many places.

So here we always use BTRFS_I(inode)->location.objectid, which is a
u64 variable.

There are 2 exceptions where BTRFS_I(inode)->location.objectid !=
inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
and inode->i_ino will be used in those cases.

Another reason to make this change is I'm going to use a special inode
to save free ino cache, and the inode number must be > (u64)-256.

Signed-off-by: Li Zefan

Li Zefan
2011-04-25 16:46:09 +0800
581bb0509 Btrfs: Cache free inode numbers in memory

Currently btrfs stores the highest objectid of the fs tree, and it always
returns the (highest+1) inode number when we create a file, so inode numbers
won't be reclaimed when we delete files, and we'll run out of inode numbers
as we keep creating/deleting files on 32bit machines.

This fixes it, and it works similarly to how we cache free space in block
groups.

We start a kernel thread to read the file tree. By scanning inode items,
we know which chunks of inode numbers are free, and we cache them in
an rb-tree.

Because we are searching the commit root, we have to carefully handle the
cross-transaction case.

The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram
of extents, and a bitmap will be used if we exceed this threshold. The
extents threshold is adjusted at runtime.
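The hybrid extent+bitmap idea can be sketched as follows. This is a heavily simplified model under stated assumptions: it uses a flat array where the commit uses an rb-tree, a fixed extent count instead of a memory threshold, and a single bitmap window. All names (ino_cache_model, cache_add_free, cache_is_free) and constants are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_EXTENTS 4    /* stand-in for the extents threshold */
#define MODEL_BITMAP_BITS 1024 /* inode numbers covered by the bitmap */

struct ino_extent {
    uint64_t start;
    uint64_t count;
};

struct ino_cache_model {
    struct ino_extent extents[MODEL_MAX_EXTENTS];
    unsigned nr_extents;
    uint64_t bitmap_base; /* first inode number the bitmap covers */
    uint8_t bitmap[MODEL_BITMAP_BITS / 8];
};

/* Record a free range: as an extent while under the threshold, else as bits. */
static bool cache_add_free(struct ino_cache_model *c, uint64_t start,
                           uint64_t count)
{
    if (c->nr_extents < MODEL_MAX_EXTENTS) {
        c->extents[c->nr_extents++] = (struct ino_extent){ start, count };
        return true;
    }
    if (start < c->bitmap_base || count > MODEL_BITMAP_BITS ||
        start - c->bitmap_base > MODEL_BITMAP_BITS - count)
        return false; /* outside the single bitmap window of this model */

    for (uint64_t i = 0; i < count; i++) {
        uint64_t bit = start - c->bitmap_base + i;

        c->bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
    return true;
}

/* Look the inode number up in the extents first, then in the bitmap. */
static bool cache_is_free(const struct ino_cache_model *c, uint64_t ino)
{
    for (unsigned i = 0; i < c->nr_extents; i++)
        if (ino >= c->extents[i].start &&
            ino - c->extents[i].start < c->extents[i].count)
            return true;

    if (ino < c->bitmap_base || ino - c->bitmap_base >= MODEL_BITMAP_BITS)
        return false;

    uint64_t bit = ino - c->bitmap_base;

    return (c->bitmap[bit / 8] & (1u << (bit % 8))) != 0;
}
```

Extents are compact for large free ranges, while a bitmap bounds memory use when the free space is fragmented into many small chunks, which is the trade-off the commit message describes.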

Signed-off-by: Li Zefan

Li Zefan
2011-04-25 16:46:04 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

28 Mar, 2011

1 commit

97d9a8a42 Btrfs: check return value of read_tree_block()

This patch checks the return value of read_tree_block() and,
if it is NULL, performs error handling.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-03-28 17:37:37 +0800

18 Mar, 2011

1 commit

66b4ffd11 Btrfs: handle errors in btrfs_orphan_cleanup

If we cannot truncate an inode for some reason we will never delete the orphan
item associated with that inode, which means that we will loop forever in
btrfs_orphan_cleanup. Instead of doing this, just return an error so we fail to
mount. It sucks, but hey, it's better than hanging. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2011-03-18 02:21:26 +0800

17 Feb, 2011

1 commit

c87f08ca4 Btrfs: allow balance to explicitly allocate chunks as it relocates

Btrfs device shrinking and balancing end up reallocating all the blocks
in order to allow COW to move them to new destinations. It is somewhat
awkward in terms of ENOSPC because most of the enospc code is built
around the idea that some operation on a reference counted tree triggers
allocations in the non-reference counted trees.

This commit changes the balancing code to deal with enospc by trying to
allocate a new chunk. If that allocation succeeds, we go ahead and
retry whatever failed due to enospc.
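The retry described above can be sketched with a toy model. The struct and all three functions are hypothetical and only show the control flow: try the operation, allocate a chunk on ENOSPC, then retry once.

```c
#include <errno.h>

/* Hypothetical filesystem state: usable blocks and unallocated chunks. */
struct fs_model {
    int free_blocks;
    int unallocated_chunks;
};

/* Stand-in for a relocation step that needs one free block. */
static int model_relocate(struct fs_model *fs)
{
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    return 0;
}

/* Stand-in for chunk allocation; each chunk provides 8 blocks here. */
static int model_alloc_chunk(struct fs_model *fs)
{
    if (fs->unallocated_chunks == 0)
        return -ENOSPC;
    fs->unallocated_chunks--;
    fs->free_blocks += 8;
    return 0;
}

/* On ENOSPC, try to allocate a new chunk and retry whatever failed. */
static int relocate_with_retry(struct fs_model *fs)
{
    int ret = model_relocate(fs);

    if (ret != -ENOSPC)
        return ret;

    ret = model_alloc_chunk(fs);
    if (ret)
        return ret; /* no chunk available: the ENOSPC is genuine */

    return model_relocate(fs);
}
```

If no chunk can be allocated, the original ENOSPC is reported to the caller unchanged.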

Signed-off-by: Chris Mason

Chris Mason
2011-02-17 04:28:47 +0800

15 Feb, 2011

1 commit

6848ad646 Btrfs: Fix balance panic

Mark the cloned backref_node as checked in clone_backref_node().

Signed-off-by: Yan, Zheng
Signed-off-by: Chris Mason

Yan, Zheng
2011-02-15 05:00:03 +0800

01 Feb, 2011

1 commit

98d5dc13e btrfs: fix return value check of btrfs_start_transaction()

Add the missing error checks for btrfs_start_transaction(), and correct
the mistaken error checks in several places.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-02-01 20:17:27 +0800

29 Jan, 2011

1 commit

3612b4959 btrfs: fix return value check of btrfs_join_transaction()

Add the missing error checks for
btrfs_join_transaction()/btrfs_join_transaction_nolock(), and correct the
mistaken error checks in several places.

To make Btrfs more stable, I think that we should reduce the use of BUG_ON().
But I think that this will take a long time.
So I propose this patch as a short-term solution.

With this patch:
- The parts that should be corrected to make Btrfs more stable are made clear.
- A NULL pointer dereference etc. no longer causes a panic (even if
BUG_ON() use is increased temporarily).
- The error code is returned in the places where the error can be easily
returned.

As a long-term plan:
- BUG_ON() use is reduced by using the forced-readonly framework, etc.

Signed-off-by: Tsutomu Itoh
Signed-off-by: Chris Mason

Tsutomu Itoh
2011-01-29 05:40:37 +0800

30 Oct, 2010

1 commit

411fc6bce Btrfs: Fix variables set but not read (bugs found by gcc 4.6)

These are all the cases where a variable is set but not
read, and which are really bugs.

- A couple of cases of incorrect error handling fixed.
- One incorrect use of an allocation policy
- Some other things

Still needs more review.

Found by gcc 4.6's new warnings.

[akpm@linux-foundation.org: fix build. Might have been bitrot]
Signed-off-by: Andi Kleen
Cc: Chris Mason
Signed-off-by: Andrew Morton
Signed-off-by: Chris Mason

Andi Kleen
2010-10-30 03:14:31 +0800

29 Oct, 2010

1 commit

6b5b817f1 Merge branch 'bug-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work

Conflicts:
fs/btrfs/extent-tree.c

Signed-off-by: Chris Mason

Chris Mason
2010-10-29 21:27:49 +0800