21 Feb, 2009

1 commit

  • This is a step in the direction of better -ENOSPC handling. Instead of
    checking the global bytes counter, we check the space_info bytes counters
    to make sure we have enough space.

    If we don't, we go ahead and try to allocate a new chunk, and if that fails
    we return -ENOSPC. This patch adds two counters to btrfs_space_info:
    bytes_delalloc and bytes_may_use.

    bytes_delalloc accounts for extents we've actually set up for delalloc and
    that will be allocated at some point down the line.

    bytes_may_use keeps track of how many bytes we may use for delalloc at
    some point. When we actually set the extent_bit for the delalloc bytes, we
    subtract the reserved bytes from the bytes_may_use counter. This keeps us
    from ending up unable to allocate space for any delalloc bytes (see the
    sketch after this entry).

    Signed-off-by: Josef Bacik

    Josef Bacik
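
    A minimal user-space sketch of the accounting described above, assuming
    simplified, hypothetical struct fields and helpers rather than the actual
    btrfs definitions:

        #include <errno.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Simplified stand-in for btrfs_space_info with the two new counters. */
        struct space_info {
            uint64_t total_bytes;     /* bytes in allocated chunks */
            uint64_t bytes_used;      /* bytes already written */
            uint64_t bytes_delalloc;  /* delalloc extents set up, not yet on disk */
            uint64_t bytes_may_use;   /* bytes we may still use for delalloc */
        };

        /* Hypothetical chunk allocator: pretend we can grow by 1GB, once. */
        static int alloc_chunk(struct space_info *info)
        {
            static int grown;

            if (grown)
                return -ENOSPC;
            grown = 1;
            info->total_bytes += 1024ULL * 1024 * 1024;
            return 0;
        }

        /* Check the space_info counters instead of a global bytes counter; if
         * the reservation does not fit, try to allocate a new chunk first. */
        static int reserve_delalloc(struct space_info *info, uint64_t bytes)
        {
            uint64_t committed = info->bytes_used + info->bytes_delalloc +
                                 info->bytes_may_use;

            if (committed + bytes > info->total_bytes && alloc_chunk(info))
                return -ENOSPC;
            info->bytes_may_use += bytes;
            return 0;
        }

        /* When the delalloc extent bit is actually set, move the reservation
         * from bytes_may_use to bytes_delalloc. */
        static void set_delalloc(struct space_info *info, uint64_t bytes)
        {
            info->bytes_may_use -= bytes;
            info->bytes_delalloc += bytes;
        }

        int main(void)
        {
            struct space_info info = { .total_bytes = 1024ULL * 1024 * 1024 };

            if (reserve_delalloc(&info, 64 * 1024) == 0)
                set_delalloc(&info, 64 * 1024);
            printf("delalloc=%llu may_use=%llu\n",
                   (unsigned long long)info.bytes_delalloc,
                   (unsigned long long)info.bytes_may_use);
            return 0;
        }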
     

20 Feb, 2009

1 commit


22 Jan, 2009

1 commit

  • To improve performance, btrfs_sync_log merges tree log sync
    requests. But it wrongly merges sync requests for different
    tree logs. If multiple tree logs are synced at the same time,
    only one of them actually gets synced.

    This patch makes the following changes to fix the bug:

    Move most tree-log-related fields from btrfs_fs_info to
    btrfs_root. This allows sync requests to be merged separately
    for each tree log.

    Don't insert the root item into the log root tree immediately
    after the log tree is allocated. The root item for a log tree is
    inserted when that log tree is synced for the first time. This
    allows syncing the log root tree without first syncing all
    log trees.

    At tree-log sync, btrfs_sync_log first syncs the log tree, then
    updates the corresponding root item in the log root tree, then
    syncs the log root tree, and finally updates the super block
    (see the sketch after this entry).

    Signed-off-by: Yan Zheng

    Yan Zheng
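
    A schematic sketch of the new sync ordering, with all types and helpers as
    illustrative stubs rather than the real tree-log code:

        #include <stdio.h>

        struct log_root {
            int root_item_inserted;   /* root item in the log root tree yet? */
        };

        /* Stubbed-out steps; the real code writes tree blocks and waits on IO. */
        static void write_and_wait_log_tree(struct log_root *log)  { (void)log; }
        static void insert_root_item(struct log_root *log)
        {
            log->root_item_inserted = 1;
        }
        static void update_root_item(struct log_root *log)         { (void)log; }
        static void write_and_wait_log_root_tree(void)             { }
        static void write_super(void)                              { }

        /* btrfs_sync_log-style flow; sync requests are merged per log tree, so
         * this runs against a single subvolume's log root at a time. */
        static void sync_one_log(struct log_root *log)
        {
            write_and_wait_log_tree(log);     /* 1: sync this log tree          */
            if (!log->root_item_inserted)     /* 2: first sync inserts the root */
                insert_root_item(log);        /*    item into the log root tree */
            else
                update_root_item(log);        /*    later syncs just update it  */
            write_and_wait_log_root_tree();   /* 3: sync the log root tree      */
            write_super();                    /* 4: update the super block      */
        }

        int main(void)
        {
            struct log_root log = { 0 };

            sync_one_log(&log);   /* first sync: inserts the root item */
            sync_one_log(&log);   /* later sync: updates it instead    */
            printf("root item inserted: %d\n", log.root_item_inserted);
            return 0;
        }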
     

21 Jan, 2009

1 commit


06 Jan, 2009

3 commits


12 Dec, 2008

1 commit

  • Checksums on data can be disabled by a mount option, so it's
    possible some data extents don't have checksums or have
    invalid checksums. This causes trouble for data relocation.
    This patch makes the following changes so data relocation
    works (see the sketch after this entry):

    1) Make the nodatasum/nodatacow mount options only affect new
    files. Checksums and COW on data are controlled only by the
    inode flags.

    2) Check for the existence of a checksum in the nodatacow checker.
    If a checksum exists, force COW of the data extent. This ensures that
    the checksum for a given block is either valid or does not exist.

    3) Update the data relocation code to properly handle the case
    of missing checksums.

    Signed-off-by: Yan Zheng

    Yan Zheng
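
    A simplified sketch of the check described in point 2; the inode flags and
    checksum lookup below are illustrative placeholders, not the real nodatacow
    checker:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative per-inode flags: for existing files, checksums and COW
         * are controlled by these rather than by the mount options. */
        #define INODE_NODATACOW  (1u << 0)
        #define INODE_NODATASUM  (1u << 1)

        struct inode_stub {
            unsigned int flags;
        };

        /* Placeholder for "does a checksum item exist for this extent?" */
        static bool extent_has_csum(uint64_t bytenr)
        {
            return bytenr % 2 == 0;   /* fake answer for the example */
        }

        /* Even if the inode allows overwriting in place, force COW when a
         * checksum exists for the extent, so a block's checksum is always
         * either valid or absent. */
        static bool must_cow(const struct inode_stub *inode, uint64_t bytenr)
        {
            if (!(inode->flags & INODE_NODATACOW))
                return true;                   /* normal COW path       */
            if (extent_has_csum(bytenr))
                return true;                   /* avoid stale checksums */
            return false;                      /* safe to overwrite     */
        }

        int main(void)
        {
            struct inode_stub in = { .flags = INODE_NODATACOW | INODE_NODATASUM };

            printf("extent 1000: cow=%d\n", must_cow(&in, 1000));
            printf("extent 1001: cow=%d\n", must_cow(&in, 1001));
            return 0;
        }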
     

09 Dec, 2008

2 commits

  • The fsync logging code makes sure to only copy the relevant checksums for
    each extent, based on the file extent pointers it finds.

    But for compressed extents, it needs to copy the checksum for the
    entire extent.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This adds a sequence number to the btrfs inode that is increased on
    every update. NFS will be able to use that to detect when an inode has
    changed, without relying on inaccurate time fields.

    While we're here, this also:

    Puts reserved space into the super block and inode

    Adds a log root transid to the super so we can pick the newest super
    based on the fsync log as well as the main transaction ID. For now
    the log root transid is always zero, but that'll get fixed.

    Adds a starting offset to the dev_item. This will let us do better
    alignment calculations if we know the start of a partition on the disk.

    Signed-off-by: Chris Mason

    Chris Mason
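
    A minimal sketch of the sequence-number idea for change detection; the
    struct and update hook are simplified placeholders, not the btrfs inode
    code:

        #include <stdint.h>
        #include <stdio.h>

        /* Simplified in-memory inode with a change sequence number. */
        struct inode_stub {
            uint64_t sequence;   /* bumped on every update             */
            uint64_t size;       /* one example of mutable inode state */
        };

        /* Every modification bumps the sequence, so an NFS-style client can
         * compare sequence numbers instead of coarse or inaccurate times. */
        static void inode_update(struct inode_stub *inode, uint64_t new_size)
        {
            inode->size = new_size;
            inode->sequence++;
        }

        int main(void)
        {
            struct inode_stub in = { 0 };
            uint64_t seen = in.sequence;   /* client's cached value */

            inode_update(&in, 4096);
            if (in.sequence != seen)
                printf("inode changed (seq %llu -> %llu)\n",
                       (unsigned long long)seen,
                       (unsigned long long)in.sequence);
            return 0;
        }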
     

02 Dec, 2008

1 commit


13 Nov, 2008

1 commit

  • When an extent needs to be split, btrfs_mark_extent_written truncates the
    extent first, then inserts a new extent and increases the reference count.

    The race happens if someone else deletes the old extent before the new extent
    is inserted. The fix here is to increase the reference count in advance, as
    sketched after this entry. This race is similar to the race in
    btrfs_drop_extents that was recently fixed.

    Signed-off-by: Yan Zheng

    Yan Zheng
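
    A schematic sketch of the reordering described above; the names and the
    reference counting are placeholders, not the btrfs_mark_extent_written
    code:

        /* Placeholder extent with a reference count. */
        struct extent_stub {
            int refs;
        };

        static void inc_extent_ref(struct extent_stub *e)    { e->refs++; }
        static void truncate_extent(struct extent_stub *e)   { (void)e; }
        static void insert_new_extent(struct extent_stub *e) { (void)e; }

        /* Take the extra reference before touching the tree, so the extent
         * cannot be freed in the window between truncating the old item and
         * inserting the new one. */
        static void split_extent(struct extent_stub *e)
        {
            inc_extent_ref(e);      /* 1: hold the reference up front */
            truncate_extent(e);     /* 2: shrink the existing item    */
            insert_new_extent(e);   /* 3: insert the second half      */
        }

        int main(void)
        {
            struct extent_stub e = { .refs = 1 };

            split_extent(&e);
            return 0;
        }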
     

11 Nov, 2008

2 commits

  • btrfs_drop_extents will drop paths and search again when it needs to
    force COW of higher nodes. It was using the key it found during the last
    search as the offset for the next search.

    But this wasn't always correct. The key could be from before our desired
    range, and because we're dropping the path, it is possible for the file's
    items to change while we do the search again.

    The fix here is to make sure we don't search for something smaller than
    the offset btrfs_drop_extents was called with.

    Signed-off-by: Chris Mason

    Yan Zheng
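
    A one-function sketch of the clamping described above; the variable names
    are illustrative:

        #include <stdint.h>
        #include <stdio.h>

        /* When btrfs_drop_extents drops the path and searches again, never
         * search below the offset it was originally called with, even if the
         * last key found was smaller. */
        static uint64_t next_search_start(uint64_t last_key_offset,
                                          uint64_t drop_start)
        {
            return last_key_offset > drop_start ? last_key_offset : drop_start;
        }

        int main(void)
        {
            /* The last key found was from before the range being dropped. */
            printf("%llu\n",
                   (unsigned long long)next_search_start(4096, 65536));
            return 0;
        }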
     
  • This makes sure the orig_start field in struct extent_map gets set
    everywhere the extent_map structs are created or modified.

    Signed-off-by: Chris Mason

    Chris Mason
     

10 Nov, 2008

1 commit

  • The decompress code doesn't take the logical offset in the extent
    pointer into account. If the logical offset isn't zero, data
    will be decompressed into the wrong pages.

    The solution used here is to record the starting offset of the extent
    in the file separately from the logical start of the extent_map struct.
    This allows us to avoid problems inserting overlapping extents.

    Signed-off-by: Yan Zheng

    Yan Zheng
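
    An illustrative sketch of recording the extent's starting file offset
    separately from the start of each mapping; the struct is a simplified
    stand-in for extent_map:

        #include <stdint.h>
        #include <stdio.h>

        /* Simplified mapping of part of a compressed extent. */
        struct extent_map_stub {
            uint64_t start;       /* logical start of this mapping in the file */
            uint64_t len;         /* length of this mapping                    */
            uint64_t orig_start;  /* file offset where the whole on-disk
                                     extent begins; may be before 'start'      */
        };

        /* Offset into the decompressed data for a given file position,
         * measured from the start of the whole extent, not of this mapping. */
        static uint64_t offset_in_extent(const struct extent_map_stub *em,
                                         uint64_t file_pos)
        {
            return file_pos - em->orig_start;
        }

        int main(void)
        {
            struct extent_map_stub em = {
                .start = 8192, .len = 4096, .orig_start = 0,
            };

            /* The page at file offset 8192 sits 8192 bytes into the
             * decompressed extent, not at its beginning. */
            printf("%llu\n", (unsigned long long)offset_in_extent(&em, 8192));
            return 0;
        }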
     

07 Nov, 2008

1 commit

  • When reading compressed extents, try to put pages into the page cache
    for any pages covered by the compressed extent that readpages didn't already
    preload.

    Add an async work queue to handle transformations at delayed allocation processing
    time. Right now this is just compression. The workflow is:

    1) Find offsets in the file marked for delayed allocation
    2) Lock the pages
    3) Lock the state bits
    4) Call the async delalloc code

    The async delalloc code clears the state lock bits and delalloc bits. It is
    important this happens before the range goes into the work queue because
    otherwise it might deadlock with other work queue items that try to lock
    those extent bits.

    The file pages are compressed, and if the compression doesn't work the
    pages are written back directly.

    An ordered work queue is used to make sure the inodes are written in the same
    order that pdflush or writepages sent them down.

    This changes extent_write_cache_pages to let the writepage function
    update the wbc nr_written count.

    Signed-off-by: Chris Mason

    Chris Mason
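
    A high-level sketch of the four-step flow listed above; every helper here
    is a stub standing in for the page/state locking and the work queue, not
    the actual async delalloc code:

        #include <stdbool.h>
        #include <stdint.h>

        /* Placeholder range of a file marked for delayed allocation. */
        struct delalloc_range {
            uint64_t start;
            uint64_t end;
        };

        /* Stubbed steps; the real code operates on pages and extent state. */
        static bool find_delalloc_range(struct delalloc_range *r)
        {
            static bool done;

            if (done)
                return false;
            done = true;
            r->start = 0;
            r->end = 128 * 1024 - 1;
            return true;
        }
        static void lock_pages(struct delalloc_range *r)               { (void)r; }
        static void lock_state_bits(struct delalloc_range *r)          { (void)r; }
        static void clear_state_and_delalloc(struct delalloc_range *r) { (void)r; }
        static void queue_async_work(struct delalloc_range *r)         { (void)r; }

        /* Async delalloc entry: clear the lock and delalloc bits before the
         * range goes onto the work queue, so other queued items can't
         * deadlock trying to lock those extent bits. */
        static void async_delalloc(struct delalloc_range *r)
        {
            clear_state_and_delalloc(r);
            queue_async_work(r);
        }

        static void run_delalloc(void)
        {
            struct delalloc_range r;

            while (find_delalloc_range(&r)) {   /* 1: find delalloc offsets */
                lock_pages(&r);                 /* 2: lock the pages        */
                lock_state_bits(&r);            /* 3: lock the state bits   */
                async_delalloc(&r);             /* 4: call the async code   */
            }
        }

        int main(void)
        {
            run_delalloc();
            return 0;
        }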
     

01 Nov, 2008

1 commit

  • Make sure we keep page->mapping NULL on the pages we're getting
    via alloc_page. It gets set so a few of the callbacks can do the right
    thing, but in general these pages don't have a mapping.

    Don't try to truncate compressed inline items in btrfs_drop_extents.
    The whole compressed item must be preserved.

    Don't try to create multi-page inline compressed items. When we try to
    overwrite just the first page of the file, we would have to read in and recow
    all the pages after it in the same compressed inline item. For now, only
    create single-page inline items.

    Make sure we lock pages in the correct order during delalloc. The
    search into the state tree for delalloc bytes can return bytes before
    the page we already have locked.

    Signed-off-by: Chris Mason

    Chris Mason
     

31 Oct, 2008

3 commits

  • This patch updates btrfs-progs for fallocate support.

    fallocate is a little different in Btrfs because we need to tell the
    COW system that a given preallocated extent doesn't need to be
    cow'd as long as there are no snapshots of it. This leverages the
    -o nodatacow checks.

    Signed-off-by: Yan Zheng

    Yan Zheng
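
    A small sketch of the COW decision for preallocated extents described
    above; the fields and the snapshot check are illustrative placeholders
    built on the same idea as the -o nodatacow checks:

        #include <stdbool.h>
        #include <stdio.h>

        /* Illustrative state for a fallocate'd (preallocated) extent. */
        struct extent_stub {
            bool preallocated;   /* created by fallocate, not yet written */
            bool snapshotted;    /* some snapshot also references it      */
        };

        /* A preallocated extent can be written in place (no COW) as long as
         * no snapshot holds a reference to it; otherwise it must be COW'd
         * like any other shared extent. */
        static bool needs_cow(const struct extent_stub *e)
        {
            if (!e->preallocated)
                return true;
            return e->snapshotted;
        }

        int main(void)
        {
            struct extent_stub fresh = { .preallocated = true };
            struct extent_stub snap  = { .preallocated = true, .snapshotted = true };

            printf("fresh prealloc cow=%d, snapshotted prealloc cow=%d\n",
                   needs_cow(&fresh), needs_cow(&snap));
            return 0;
        }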
     
  • When dropping middle part of an extent, btrfs_drop_extents truncates
    the extent at first, then inserts a bookend extent.

    Since truncation and insertion can't be done atomically, there is a small
    window during which the bookend extent isn't in the tree. This causes problems
    for functions that search the tree for file extent items. The fix is to lock
    the range of the bookend extent before truncation.

    Signed-off-by: Yan Zheng

    Yan Zheng
     
  • This patch splits the hole insertion code out of btrfs_setattr
    into btrfs_cont_expand and updates btrfs_get_extent to properly
    handle the case where file extent items are not contiguous.

    Signed-off-by: Yan Zheng

    Yan Zheng
     

30 Oct, 2008

1 commit

  • This is a large change for adding compression on reading and writing,
    both for inline and regular extents. It does some fairly large
    surgery to the writeback paths.

    Compression is off by default and enabled by mount -o compress. Even
    when the -o compress mount option is not used, it is possible to read
    compressed extents off the disk.

    If compression for a given set of pages fails to make them smaller, the
    file is flagged to avoid future compression attempts.

    * While finding delalloc extents, the pages are locked before being sent down
    to the delalloc handler. This allows the delalloc handler to do complex things
    such as cleaning the pages, marking them writeback and starting IO on their
    behalf.

    * Inline extents are inserted at delalloc time now. This allows us to compress
    the data before inserting the inline extent, and it allows us to insert
    an inline extent that spans multiple pages.

    * All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
    are changed to record both an in-memory size and an on disk size, as well
    as a flag for compression.

    From a disk format point of view, the extent pointers in the file are changed
    to record the on disk size of a given extent and some encoding flags.
    Space in the disk format is allocated for compression encoding, as well
    as encryption and a generic 'other' field. Neither the encryption nor the
    'other' field is currently used.

    In order to limit the amount of data read for a single random read in the
    file, the size of a compressed extent is limited to 128k. This is a
    software-only limit; the disk format supports u64-sized compressed extents.

    In order to limit the ram consumed while processing extents, the uncompressed
    size of a compressed extent is limited to 256k. This is a software-only limit
    and will be subject to tuning later.

    Checksumming is still done on compressed extents, and it is done on the
    uncompressed version of the data. This way additional encodings can be
    layered on without having to figure out which encoding to checksum.

    Compression happens at delalloc time, which is basically single threaded
    because it is usually done by a single pdflush thread. This makes it
    tricky to spread the compression load across all the cpus on the box.
    We'll have to look at parallel pdflush walks of dirty inodes at a later time.

    Decompression is hooked into readpages and it does spread across CPUs nicely.

    Signed-off-by: Chris Mason

    Chris Mason
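
    An illustrative sketch of the size limits and the dual in-memory/on-disk
    sizes described above; the struct and constants mirror the text, not the
    actual disk format definitions:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Software-only limits from the description above. */
        #define MAX_COMPRESSED_SIZE    (128 * 1024)  /* on-disk size of one extent */
        #define MAX_UNCOMPRESSED_SIZE  (256 * 1024)  /* ram covered by one extent  */

        /* Simplified extent record with both sizes plus a compression flag. */
        struct extent_stub {
            uint64_t ram_bytes;    /* uncompressed (in-memory) size */
            uint64_t disk_bytes;   /* on-disk size                  */
            bool     compressed;
        };

        /* Store the data compressed only if it actually shrank and both
         * software limits above are respected. */
        static struct extent_stub make_extent(uint64_t ram, uint64_t compressed)
        {
            struct extent_stub e = { .ram_bytes = ram };

            if (compressed < ram &&
                compressed <= MAX_COMPRESSED_SIZE &&
                ram <= MAX_UNCOMPRESSED_SIZE) {
                e.disk_bytes = compressed;
                e.compressed = true;
            } else {
                e.disk_bytes = ram;       /* fall back to uncompressed */
                e.compressed = false;
            }
            return e;
        }

        int main(void)
        {
            struct extent_stub e = make_extent(256 * 1024, 90 * 1024);

            printf("compressed=%d ram=%llu disk=%llu\n", e.compressed,
                   (unsigned long long)e.ram_bytes,
                   (unsigned long long)e.disk_bytes);
            return 0;
        }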
     

09 Oct, 2008

2 commits

  • The offset field in struct btrfs_extent_ref records the position
    inside the file at which the file extent is referenced. In the new back
    reference system, tree leaves holding references to a file extent
    are recorded explicitly. We can scan these tree leaves very quickly, so the
    offset field is not required.

    This patch also makes the back reference system check the objectid
    when extents are being deleted.

    Signed-off-by: Yan Zheng

    Yan Zheng
     
  • This patch makes btrfs count space allocated to files in bytes instead
    of 512-byte sectors.

    Everything else in btrfs uses a byte count instead of sector sizes or
    block sizes, so this fits better.

    Signed-off-by: Yan Zheng

    Yan Zheng
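
    A tiny arithmetic sketch of the unit change (bytes instead of 512-byte
    sectors); the variable names are illustrative:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t allocated_bytes = 12288;          /* three 4K blocks */

            /* Old style: a 512-byte sector count, as i_blocks uses. */
            uint64_t sectors = allocated_bytes >> 9;   /* 24 sectors      */

            /* New style: keep the byte count directly, matching the rest of
             * btrfs, and convert only where a sector count is expected. */
            printf("bytes=%llu sectors=%llu\n",
                   (unsigned long long)allocated_bytes,
                   (unsigned long long)sectors);
            return 0;
        }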
     

04 Oct, 2008

1 commit

  • This reworks the btrfs O_DIRECT write code a bit. It had always fallen
    back to buffered IO and done an invalidate, but needed to be updated
    for the data=ordered code. The invalidate wasn't actually removing pages
    because they were still inside an ordered extent.

    This also combines the O_DIRECT/O_SYNC paths where possible, and kicks
    off IO in the main btrfs_file_write loop to keep the pipe down to the
    disk full as we process long writes.

    Signed-off-by: Chris Mason

    Chris Mason
     

30 Sep, 2008

1 commit

  • This improves the comments at the top of many functions. It didn't
    dive into the guts of functions because I was trying to
    avoid merging problems with the new allocator and back reference work.

    extent-tree.c and volumes.c were both skipped, and there is definitely
    more work to do in cleaning up and commenting the code.

    Signed-off-by: Chris Mason

    Chris Mason
     

26 Sep, 2008

2 commits

  • * Add an EXTENT_BOUNDARY state bit to keep the writepage code
    from merging data extents that are in the process of being
    relocated. This allows us to do accounting for them properly.

    * The balancing code relocates data extents independent of the underlying
    inode. The extent_map code was modified to properly account for
    things moving around (invalidating extent_map caches in the inode).

    * Don't take the drop_mutex in the create_subvol ioctl. It isn't
    required.

    * Fix walking of the ordered extent list to avoid races with sys_unlink

    * Change the lock ordering rules. Transaction start goes outside
    the drop_mutex. This allows btrfs_commit_transaction to directly
    drop the relocation trees.

    Signed-off-by: Chris Mason

    Zheng Yan
     
  • Btrfs had compatibility code for kernels back to 2.6.18. That code has
    been removed and will be maintained in a separate backport
    git tree from now on.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 Sep, 2008

13 commits

  • This patch makes the back reference system explicitly record the
    location of the parent node for all types of extents. The location of
    the parent node is placed into the offset field of the backref key. Every
    time a tree block is balanced, the back references for the affected
    lower-level extents are updated.

    Signed-off-by: Chris Mason

    Zheng Yan
     
  • Drop i_mutex during the commit

    Don't bother doing the fsync at all unless the dir is marked as dirtied
    and needing fsync in this transaction. For directories, this means
    that someone has unlinked a file from the dir without fsyncing the
    file.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • File syncs and directory syncs are optimized by copying their
    items into a special (copy-on-write) log tree. There is one log tree per
    subvolume and the btrfs super block points to a tree of log tree roots.

    After a crash, items are copied out of the log tree and back into the
    subvolume. See tree-log.c for all the details.

    Signed-off-by: Chris Mason

    Chris Mason
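
    A structural sketch of the arrangement described above: one log tree per
    subvolume, with the super block pointing at a tree of log tree roots. All
    types here are simplified placeholders; see tree-log.c for the real
    details:

        #include <stdio.h>

        /* Simplified placeholders for the trees involved. */
        struct log_tree {
            const char *subvol_name;   /* which subvolume this log belongs to */
        };

        struct log_root_tree {
            struct log_tree *logs[8];  /* one entry per subvolume's log tree  */
            int nr;
        };

        struct super_stub {
            struct log_root_tree *log_root_tree;  /* pointed to by the super  */
        };

        /* After a crash, replay walks the log root tree and copies the logged
         * items of each subvolume's log tree back into that subvolume. */
        static void replay_all(struct super_stub *sb)
        {
            for (int i = 0; i < sb->log_root_tree->nr; i++)
                printf("replaying log for %s\n",
                       sb->log_root_tree->logs[i]->subvol_name);
        }

        int main(void)
        {
            struct log_tree a = { "subvol_a" }, b = { "subvol_b" };
            struct log_root_tree lrt = { .logs = { &a, &b }, .nr = 2 };
            struct super_stub sb = { .log_root_tree = &lrt };

            replay_all(&sb);
            return 0;
        }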
     
  • Signed-off-by: Chris Mason

    Chris Mason
     
  • Signed-off-by: Chris Mason

    Chris Mason
     
  • While dropping snapshots, walk_down_tree does most of the work of checking
    reference counts and limiting tree traversal to just the blocks that
    we are freeing.

    It dropped and held the allocation mutex in strange and confusing ways;
    this commit changes it to only hold the mutex while actually freeing a block.

    The rest of the checks around reference counts should be safe without the lock
    because we only allow one process in btrfs_drop_snapshot at a time. Other
    processes dropping reference counts should not drop it to 1 because
    their tree roots already have an extra ref on the block.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Signed-off-by: Chris Mason

    Chris Mason
     
  • This avoids waiting for transactions with pages locked by breaking out
    the code to wait for the current transaction to close into a function
    called by btrfs_throttle.

    It also lowers the limits for where we start throttling.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Add a couple of #if's to follow API changes.

    Signed-off-by: Sven Wegener
    Signed-off-by: Chris Mason

    Sven Wegener
     
  • The memory reclaiming issue happens when snapshots exist. In that
    case, some cache entries may not be used while an old snapshot is being
    dropped, so they will remain in the cache until umount.

    The patch adds a field to struct btrfs_leaf_ref to record the create time.
    It also links all dead roots of a given snapshot together in order of
    create time. After an old snapshot has been completely dropped, we check
    the dead root list and remove all cache entries created before the oldest
    dead root in the list.

    Signed-off-by: Chris Mason

    Yan
     
  • A large reference cache is directly related to a lot of work pending
    for the cleaner thread. This throttles back new operations based on
    the size of the reference cache so the cleaner thread will be able to keep
    up.

    Overall, this actually makes the FS faster because the cleaner thread will
    be more likely to find things in cache.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • This changes the reference cache to make a single cache per root
    instead of one cache per transaction, and to key by the byte number
    of the disk block instead of the keys inside.

    This makes it much less likely to have cache misses if a snapshot
    or something has an extra reference on a higher node or a leaf while
    the first transaction that added the leaf into the cache is dropping.

    Some throttling is added to functions that free blocks heavily so they
    wait for old transactions to drop.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Stress testing was showing data checksum errors, most of which were caused
    by a lookup bug in the extent_map tree. The tree was caching the last
    pointer returned, and searches would check the last pointer first.

    But, search callers also expect the search to return the very first
    matching extent in the range, which wasn't always true with the last
    pointer usage.

    For now, the code to cache the last return value is just removed. It is
    easy to fix, but I think lookups are rare enough that it isn't required anymore.

    This commit also replaces do_sync_mapping_range with a local copy of the
    related functions.

    Signed-off-by: Chris Mason

    Chris Mason
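
    A minimal sketch of the lookup behavior the fix restores: always return
    the first extent overlapping the requested range, with no cached "last"
    pointer. The tree is modeled as a sorted array for brevity:

        #include <stdint.h>
        #include <stdio.h>

        struct extent_stub {
            uint64_t start;
            uint64_t len;
        };

        /* Return the first extent overlapping [start, start + len), scanning
         * in order rather than trusting a cached last-returned pointer, which
         * is what handed back the wrong extent before. */
        static const struct extent_stub *
        lookup_first(const struct extent_stub *ems, int nr,
                     uint64_t start, uint64_t len)
        {
            for (int i = 0; i < nr; i++) {
                if (ems[i].start < start + len &&
                    ems[i].start + ems[i].len > start)
                    return &ems[i];
            }
            return NULL;
        }

        int main(void)
        {
            const struct extent_stub ems[] = {
                { 0, 4096 }, { 4096, 4096 }, { 8192, 4096 },
            };
            const struct extent_stub *em = lookup_first(ems, 3, 2048, 8192);

            if (em)
                printf("first match starts at %llu\n",
                       (unsigned long long)em->start);
            return 0;
        }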