30 May, 2016

1 commit

  • When we do a device replace, for each device extent we find on the
    source device, we set the corresponding block group to readonly mode to
    prevent writes into it from happening while we are copying the device
    extent from the source to the target device. However, just before we set
    the block group to readonly mode, some concurrent task might have already
    allocated an extent from it or decided it could perform a nocow write
    into one of its extents. This can make the device replace process miss
    copying an extent, since it uses the extent tree's commit root to
    search for extents and only sets the left cursor to the logical end
    address of the block group once it finishes searching for all extents
    belonging to that block group. This is a problem if the respective
    ordered extents finish while we are searching for extents using the
    extent tree's commit root and no transaction commit happens while we
    are iterating the tree, since it's the delayed references created by the
    ordered extents (when they complete) that insert the extent items into
    the extent tree (using the non-commit root of course).
    Example:

         CPU 1                                          CPU 2

    btrfs_dev_replace_start()
      btrfs_scrub_dev()
        scrub_enumerate_chunks()
          --> finds device extent belonging
              to block group X

                                             starts buffered write
                                             against some inode

                                             writepages is run against
                                             that inode forcing delalloc
                                             to run

                                             btrfs_writepages()
                                               extent_writepages()
                                                 extent_write_cache_pages()
                                                   __extent_writepage()
                                                     writepage_delalloc()
                                                       run_delalloc_range()
                                                         cow_file_range()
                                                           btrfs_reserve_extent()
                                                             --> allocates an extent
                                                                 from block group X
                                                                 (which is not yet
                                                                  in RO mode)
                                                           btrfs_add_ordered_extent()
                                                             --> creates ordered extent Y
                                                       flush_epd_write_bio()
                                                         --> bio against the extent from
                                                             block group X is submitted

      btrfs_inc_block_group_ro(bg X)
        --> sets block group X to readonly

      scrub_chunk(bg X)
        scrub_stripe(device extent from srcdev)
          --> keeps searching for extent items
              belonging to the block group using
              the extent tree's commit root
          --> it never blocks due to
              fs_info->scrub_pause_req as no
              one tries to commit transaction N
          --> copies all extents found from the
              source device into the target device
          --> finishes search loop

                                             bio completes

                                             ordered extent Y completes
                                             and creates delayed data
                                             reference which will add an
                                             extent item to the extent
                                             tree when run (typically
                                             at transaction commit time)

                                             --> so the task doing the
                                                 scrub/device replace
                                                 at CPU 1 misses this
                                                 and does not copy this
                                                 extent into the new/target
                                                 device

      btrfs_dec_block_group_ro(bg X)
        --> turns block group X back to RW mode

      dev_replace->cursor_left is set to the
        logical end offset of block group X

    So fix this by waiting for all cow and nocow writes after setting a block
    group to readonly mode.
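
    The shape of that fix can be modeled in plain userspace C (an
    illustrative sketch only, with hypothetical names, not the btrfs code):
    writers register before touching the block group and bail out if it is
    already readonly, while the replace task flips the readonly flag first
    and then waits for the in-flight write count to drain before scanning.

    #include <pthread.h>
    #include <stdbool.h>

    /* Toy model of a block group: an RO flag plus in-flight writes. */
    struct block_group {
        pthread_mutex_t lock;
        pthread_cond_t drained;
        bool readonly;
        int inflight_writes;      /* cow + nocow writes still running */
    };

    /* Writers call this before allocating from / writing into the group. */
    static bool bg_enter_write(struct block_group *bg)
    {
        pthread_mutex_lock(&bg->lock);
        if (bg->readonly) {       /* too late, pick another block group */
            pthread_mutex_unlock(&bg->lock);
            return false;
        }
        bg->inflight_writes++;
        pthread_mutex_unlock(&bg->lock);
        return true;
    }

    static void bg_exit_write(struct block_group *bg)
    {
        pthread_mutex_lock(&bg->lock);
        if (--bg->inflight_writes == 0)
            pthread_cond_broadcast(&bg->drained);
        pthread_mutex_unlock(&bg->lock);
    }

    /* Replace task: set RO first, then wait out every write that got in
     * before the flag flipped, and only then scan and copy extents. */
    static void bg_set_ro_and_wait(struct block_group *bg)
    {
        pthread_mutex_lock(&bg->lock);
        bg->readonly = true;
        while (bg->inflight_writes > 0)
            pthread_cond_wait(&bg->drained, &bg->lock);
        pthread_mutex_unlock(&bg->lock);
    }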

    Signed-off-by: Filipe Manana
    Reviewed-by: Josef Bacik

    Filipe Manana
     

13 May, 2016

1 commit

  • Before the relocation process of a block group starts, it sets the block
    group to readonly mode, then flushes all delalloc writes and then finally
    waits for all ordered extents to complete. This last step includes waiting
    for ordered extents destined for extents allocated in other block groups,
    wasting time unnecessarily.

    So improve this by waiting only for ordered extents that fall into the
    block group's range.
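
    As a sketch (illustrative userspace C with hypothetical names), the
    range filter is just an interval-overlap test applied while walking the
    ordered extents, so only those inside the block group's
    [start, start + len) range are waited on:

    #include <stdbool.h>
    #include <stdint.h>

    struct ordered_extent {
        uint64_t file_offset;            /* logical start */
        uint64_t len;                    /* length in bytes */
        struct ordered_extent *next;
    };

    /* Hypothetical stand-in for waiting on one ordered extent. */
    extern void wait_ordered_extent(struct ordered_extent *oe);

    static bool ordered_in_range(const struct ordered_extent *oe,
                                 uint64_t start, uint64_t len)
    {
        /* Overlaps [start, start + len) iff it neither ends before the
         * range starts nor starts at/after the range's end. */
        return oe->file_offset + oe->len > start &&
               oe->file_offset < start + len;
    }

    static void wait_ordered_range(struct ordered_extent *head,
                                   uint64_t start, uint64_t len)
    {
        for (struct ordered_extent *oe = head; oe; oe = oe->next)
            if (ordered_in_range(oe, start, len))
                wait_ordered_extent(oe);
    }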

    Signed-off-by: Filipe Manana
    Reviewed-by: Josef Bacik
    Reviewed-by: Liu Bo

    Filipe Manana
     

22 Oct, 2015

1 commit

  • We have a mechanism to make sure we don't lose updates for ordered extents that
    were logged in the transaction that is currently running. We add the ordered
    extent to a transaction list and then the transaction waits on all the ordered
    extents in that list. However, on substantially large file systems this
    list can become extremely large and can give us soft lockups, since the
    ordered extents don't remove themselves from the list when they complete.

    To fix this we simply add a counter to the transaction that is incremented any
    time we have a logged extent that needs to be completed in the current
    transaction. Then when the ordered extent finally completes it decrements the
    per transaction counter and wakes up the transaction if we are the last ones.
    This will eliminate the softlockup. Thanks,
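
    A minimal userspace model of that counter (hypothetical names; the real
    patch keeps the count in the btrfs transaction): logging an extent bumps
    the counter, each completion drops it, and the last completion wakes the
    committing transaction, with no list to walk.

    #include <pthread.h>

    struct transaction {
        pthread_mutex_t lock;
        pthread_cond_t all_done;
        unsigned long pending_ordered;   /* logged, not yet completed */
    };

    /* Called when an ordered extent is logged in this transaction. */
    static void trans_add_pending(struct transaction *t)
    {
        pthread_mutex_lock(&t->lock);
        t->pending_ordered++;
        pthread_mutex_unlock(&t->lock);
    }

    /* Called when such an ordered extent completes; the last one wakes
     * up the committing transaction. */
    static void trans_finish_pending(struct transaction *t)
    {
        pthread_mutex_lock(&t->lock);
        if (--t->pending_ordered == 0)
            pthread_cond_broadcast(&t->all_done);
        pthread_mutex_unlock(&t->lock);
    }

    /* Commit waits on the counter instead of walking a huge list. */
    static void trans_wait_pending(struct transaction *t)
    {
        pthread_mutex_lock(&t->lock);
        while (t->pending_ordered > 0)
            pthread_cond_wait(&t->all_done, &t->lock);
        pthread_mutex_unlock(&t->lock);
    }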

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

10 Jun, 2015

1 commit

  • Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path")
    introduced a performance regression that causes an unnecessary sync of the
    log trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
    against a file, with no writes or any metadata updates to the inode in
    between them, and a transaction is committed before the second fsync is
    called.

    Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
    after a sysbench test that measured a 62% decrease in file io
    requests per second for that test's workload.

    The test is:

    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
    echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
    echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
    mkfs -t btrfs /dev/sda2
    mount -t btrfs /dev/sda2 /fs/sda2
    cd /fs/sda2
    for ((i = 0; i < 1024; i++)); do fallocate -l 67108864 testfile.$i; done
    sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
    --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
    --file-num=1024 run

    A test on kvm guest, running a debug kernel gave me the following results:

    Without 3a8b36f378060d: 16.01 reqs/sec
    With 3a8b36f378060d: 3.39 reqs/sec
    With 3a8b36f378060d and this patch: 16.04 reqs/sec

    Reported-by: Huang Ying
    Tested-by: Huang, Ying
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

03 Jun, 2015

1 commit

  • After commit 8407f553268a
    ("Btrfs: fix data corruption after fast fsync and writeback error"),
    during wait_ordered_extents() we wait for the ordered extent to set
    BTRFS_ORDERED_IO_DONE or BTRFS_ORDERED_IOERR, at which point we've
    already got the checksum information, so we don't need to check
    (csum_bytes_left == 0) anywhere in the logging path.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

22 Nov, 2014

2 commits

  • Instead of collecting all ordered extents from the inode's ordered tree
    and then wait for all of them to complete, just collect the ones that
    overlap the fsync range.

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     
  • Liu Bo pointed out that my previous fix would lose the generation update in the
    scenario I described. It is actually much worse than that: we could lose
    the entire extent if we lose power right after the transaction commits.
    Consider the following:

    write extent 0-4k
    log extent in log tree
    commit transaction
    < power fail happens here
    ordered extent completes

    We would lose the 0-4k extent because it hasn't been updated in the actual
    fs tree, and the transaction commit will reset the log so it isn't
    replayed. If we lose power before the transaction commit we are safe;
    otherwise we are not.

    Fix this by keeping track of all extents we logged in this transaction. Then
    when we go to commit the transaction make sure we wait for all of those ordered
    extents to complete before proceeding. This will make sure that if we lose
    power after the transaction commit we still have our data. This also fixes the
    problem of the improperly updated extent generation. Thanks,
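
    This is the list-based scheme that the 22 Oct, 2015 entry above later
    replaces with a counter. A toy model of the invariant (illustrative C,
    hypothetical names): commit may only proceed once every ordered extent
    logged in the transaction has completed.

    #include <pthread.h>
    #include <stdbool.h>

    struct logged_extent {
        pthread_mutex_t lock;
        pthread_cond_t done;
        bool completed;
        struct logged_extent *next;      /* per-transaction list */
    };

    static void wait_logged_extent(struct logged_extent *le)
    {
        pthread_mutex_lock(&le->lock);
        while (!le->completed)
            pthread_cond_wait(&le->done, &le->lock);
        pthread_mutex_unlock(&le->lock);
    }

    /* Commit path: only after every logged ordered extent completed is
     * it safe to commit and reset the log; otherwise losing power right
     * after the commit loses extents the log will no longer replay. */
    static void commit_wait_logged(struct logged_extent *list)
    {
        for (struct logged_extent *le = list; le; le = le->next)
            wait_logged_extent(le);
    }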

    cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

15 Aug, 2014

1 commit

  • Truncates and renames are often used to replace old versions of a file
    with new versions. Applications often expect this to be an atomic
    replacement, even if they haven't done anything to make sure the new
    version is fully on disk.

    Btrfs has strict flushing in place to make sure that renaming over an
    old file with a new file will fully flush out the new file before
    allowing the transaction commit with the rename to complete.

    This ordering means the commit code needs to be able to lock file pages,
    and there are a few paths in the filesystem where we will try to end a
    transaction with the page lock held. It's rare, but these things can
    deadlock.

    This patch removes the ordered flushes and switches to a best effort
    filemap_flush like ext4 uses. It's not perfect, but it should fix the
    deadlocks.

    Signed-off-by: Chris Mason

    Chris Mason
     

11 Mar, 2014

4 commits

  • Since the "_struct" suffix is mainly used for distinguish the differnt
    btrfs_work between the original and the newly created one,
    there is no need using the suffix since all btrfs_workers are changed
    into btrfs_workqueue.

    Also this patch fixed some codes whose code style is changed due to the
    too long "_struct" suffix.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->endio_* workqueues with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->submit_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • There was a problem in the old code:
    if we failed to log the csum, we would free all the ordered extents in the
    log list, including those that were logged successfully, which would make
    the log committer not wait for the completion of those ordered extents.

    With this patch we don't insert the ordered extents that are about to be
    logged into a global list; instead, we insert them into a local list. If we
    log the ordered extents successfully, we splice them onto the global list;
    otherwise we throw them away and do a full sync. This also reduces lock
    contention and the time spent traversing the list.
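
    In the kernel this is the usual list_head splice idiom; a sketch with a
    plain singly linked list (hypothetical names) shows the idea: the local
    list is built privately and only published, in O(1), once everything on
    it was logged successfully.

    #include <stddef.h>

    struct oe_node {
        struct oe_node *next;
        /* ... ordered extent fields elided ... */
    };

    struct oe_list {
        struct oe_node *head;
        struct oe_node *tail;
    };

    /* On success: splice the private local list onto the global one.
     * On failure the local list is simply freed and a full sync done,
     * without ever touching entries that were logged earlier. */
    static void oe_list_splice(struct oe_list *global, struct oe_list *local)
    {
        if (!local->head)
            return;
        if (global->tail)
            global->tail->next = local->head;
        else
            global->head = local->head;
        global->tail = local->tail;
        local->head = local->tail = NULL;
    }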

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     

12 Nov, 2013

2 commits

  • It is very likely that there are lots of ordered extents in the
    filesystem. If we wait for the completion of all of them when we want to
    reclaim some space for the metadata space reservation, we can be blocked
    for a long time, and performance drops sharply for an extended period.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
     
  • I noticed that if the free space cache hits an error writing out its data
    it won't actually error out, it will just carry on. This is because it
    doesn't check the return value of btrfs_wait_ordered_range, which didn't
    actually return anything. So fix this in order to keep us from making the
    free space cache look valid when it really isn't. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

21 Sep, 2013

1 commit

  • This is a leftover of how we used to wait for ordered extents, which was to
    grab the inode and then run filemap flush on it. However if we have an ordered
    extent then we already are holding a ref on the inode, and we just use
    btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
    the inode to start work on the ordered extent. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

01 Sep, 2013

1 commit

  • We currently have this problem where you can truncate pages that have not yet
    been written for an ordered extent. We do this because the truncate will be
    coming behind to clean us up anyway so what's the harm right? Well if truncate
    fails for whatever reason we leave an orphan item around for the file to be
    cleaned up later. But if the user then truncates the file back up and tries
    to read from the area that had been discarded previously, they will get a
    csum error because we never actually wrote that data out.

    This patch fixes this by allowing us to either discard the ordered extent
    completely, by which I mean we just free up the space we had allocated and not
    add the file extent, or adjust the length of the file extent we write. We do
    this by setting the length we truncated down to in the ordered extent, and then
    we set the file extent length and ram bytes to this length. The total disk
    space stays unchanged since we may be compressed and we can't just chop off the
    disk space, but at least this way the file extent only points to the valid data.
    Then when the file extent is freed, the extent and csums will be freed normally.

    This patch is needed for the next series which will give us more graceful
    recovery of failed truncates. Thanks,
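
    A toy model of the bookkeeping (illustrative C; in the real patch this
    state lives in the btrfs ordered extent): the ordered extent remembers
    the smallest length it was truncated down to, and completion clamps the
    file extent to that length while leaving the on-disk byte count alone.

    #include <stdint.h>

    struct ordered_extent_model {
        uint64_t len;             /* originally allocated length */
        uint64_t truncated_len;   /* smallest truncate seen; starts at len */
        uint64_t disk_len;        /* on-disk bytes; kept (compression) */
    };

    /* Called when a truncate discards the tail of a pending write. */
    static void oe_truncate(struct ordered_extent_model *oe, uint64_t new_len)
    {
        if (new_len < oe->truncated_len)
            oe->truncated_len = new_len;
    }

    /* At completion: the file extent's num_bytes/ram_bytes only cover
     * valid data; a result of 0 means drop the file extent entirely and
     * just free the reserved space. */
    static uint64_t oe_file_extent_len(const struct ordered_extent_model *oe)
    {
        return oe->truncated_len < oe->len ? oe->truncated_len : oe->len;
    }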

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

02 Jul, 2013

1 commit

  • Using the structure btrfs_sector_sum to keep the checksum value is
    unnecessary: because the extents that btrfs_sector_sum points to are
    contiguous, we can find out the expected checksums by btrfs_ordered_sum's
    bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
    removing bytenr, there is only one member in the structure, so it makes
    no sense to keep the structure, just remove it, and use a u32 array to
    store the checksum value.

    By this change, we don't use the while loop to get the checksums one by
    one anymore. Now we can get several checksum values at a time, which
    improved performance by ~74% on my SSD (31MB/s -> 54MB/s).

    test command:
    # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
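
    The resulting lookup is plain index arithmetic over the flat array (a
    sketch with hypothetical names): because the covered range is contiguous,
    the csum of any sector is found from its distance to the ordered sum's
    starting bytenr.

    #include <stdint.h>

    struct ordered_sum_model {
        uint64_t bytenr;      /* start of the contiguous range */
        int len;              /* bytes covered */
        uint32_t *sums;       /* one csum per sector (e.g. crc32c) */
    };

    static uint32_t lookup_csum(const struct ordered_sum_model *os,
                                uint64_t bytenr, uint32_t sectorsize)
    {
        uint64_t index = (bytenr - os->bytenr) / sectorsize;
        return os->sums[index];   /* no btrfs_sector_sum, no list walk */
    }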

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     

07 May, 2013

1 commit

  • It is very likely that there are several blocks in a bio, and it is very
    inefficient to get their csums one by one. This patch improves on that
    by getting the csums in batches.

    According to the result of the following test, the execution time of
    __btrfs_lookup_bio_sums() is down by ~28% (300us -> 217us).

    # dd if=/file of=/dev/null bs=1M count=1024

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     

21 Feb, 2013

2 commits

  • …fs-next into for-linus-3.9

    Signed-off-by: Chris Mason <chris.mason@fusionio.com>

    Conflicts:
    fs/btrfs/disk-io.c

    Chris Mason
     
  • Miao made the ordered operations stuff run async, which introduced a
    deadlock where we could get somebody (sync) racing in and committing the
    transaction while a commit was already happening. The new committer would
    try and flush ordered operations which would hang waiting for the commit to
    finish because it is done asynchronously and no longer inherits the caller's
    trans handle. To fix this we need to make the ordered operations list a per
    transaction list. We can get new inodes added to the ordered operation list
    by truncating them and then having another process writing to them, so this
    makes it so that anybody trying to add an ordered operation _must_ start a
    transaction in order to add itself to the list, which will keep new inodes
    from getting added to the ordered operations list after we start committing.
    This should fix the deadlock and also keeps us from doing a lot more work
    than we need to during commit. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

20 Feb, 2013

1 commit

  • Since we don't actually copy the extent information from the source tree in
    the fast case we don't need to wait for ordered io to be completed in order
    to fsync, we just need to wait for the io to be completed. So when we're
    logging our file just attach all of the ordered extents to the log, and then
    when the log syncs just wait for IO_DONE on the ordered extents and then
    write the super. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

19 Dec, 2012

1 commit

  • Pull btrfs update from Chris Mason:
    "A big set of fixes and features.

    In terms of line count, most of the code comes from Stefan, who added
    the ability to replace a single drive in place. This is different
    from how btrfs normally replaces drives, and is much much much faster.

    Josef is plowing through our synchronous write performance. This pull
    request does not include the DIO_OWN_WAITING patch that was discussed
    on the list, but it has a number of other improvements to cut down our
    latencies and CPU time during fsync/O_DIRECT writes.

    Miao Xie has a big series of fixes and is spreading out ordered
    operations over more CPUs. This improves performance and reduces
    contention.

    I've put in fixes for error handling around hash collisions. These
    are going back to individual stable kernels as I test against them.

    Otherwise we have a lot of fixes and cleanups, thanks everyone!
    raid5/6 is being rebased against the device replacement code. I'll
    have it posted this Friday along with a nice series of benchmarks."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (115 commits)
    Btrfs: fix a bug of per-file nocow
    Btrfs: fix hash overflow handling
    Btrfs: don't take inode delalloc mutex if we're a free space inode
    Btrfs: fix autodefrag and umount lockup
    Btrfs: fix permissions of empty files not affected by umask
    Btrfs: put raid properties into global table
    Btrfs: fix BUG() in scrub when first superblock reading gives EIO
    Btrfs: do not call file_update_time in aio_write
    Btrfs: only unlock and relock if we have to
    Btrfs: use tokens where we can in the tree log
    Btrfs: optimize leaf_space_used
    Btrfs: don't memset new tokens
    Btrfs: only clear dirty on the buffer if it is marked as dirty
    Btrfs: move checks in set_page_dirty under DEBUG
    Btrfs: log changed inodes based on the extent map tree
    Btrfs: add path->really_keep_locks
    Btrfs: do not mark ems as prealloc if we are writing to them
    Btrfs: keep track of the extents original block length
    Btrfs: inline csums if we're fsyncing
    Btrfs: don't bother copying if we're only logging the inode
    ...

    Linus Torvalds
     

02 Oct, 2012

2 commits

  • The ordered extent allocation is in the fast path of the IO, so use a slab
    to improve the speed of the allocation.

    "Size of the struct is 280, so this will fall into the size-512 bucket,
    giving 8 objects per page, while own slab will pack 14 objects into a page.

    Another benefit I see is to check for leaked objects when the module is
    removed (and the cache destroy takes place)."
    -- David Sterba
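
    A kernel-style sketch of such a dedicated cache (the flags and exact
    name here are illustrative, not necessarily what the patch uses):

    static struct kmem_cache *ordered_extent_cache;

    static int __init ordered_data_init(void)
    {
        /* A dedicated slab packs ~14 of the 280-byte objects into a
         * 4 KiB page, versus 8 from the generic size-512 bucket, and
         * leaked objects are reported when the cache is destroyed. */
        ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent",
                                sizeof(struct btrfs_ordered_extent),
                                0, SLAB_MEM_SPREAD, NULL);
        return ordered_extent_cache ? 0 : -ENOMEM;
    }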

    Signed-off-by: Miao Xie

    Miao Xie
     
  • If a snapshot is created while we are writing some data into the file,
    the i_size of the corresponding file in the snapshot will be wrong: it will
    be beyond the end of the last file extent, and btrfsck will report:
    root 256 inode 257 errors 100

    Steps to reproduce:
    # mkfs.btrfs <dev>
    # mount <dev> <mnt>
    # cd <mnt>
    # dd if=/dev/zero of=tmpfile bs=4M count=1024 &
    # for ((i = 0; i < 4; i++))
    > do
    >     btrfs sub snap . $i
    > done

    This is because the algorithm of the disk_i_size update is wrong: though
    there are some ordered extents behind the current one which we use to
    update disk_i_size, it doesn't mean those extents will be dealt with in the
    same transaction, so we shouldn't use the offsets of those extents to
    update disk_i_size, or we will get the wrong i_size in the snapshot.

    We fix this problem by recording the max real i_size. If we find there is
    an ordered extent which is in front of the current one and hasn't
    completed, we will record the end of the current one into that ordered
    extent. Likewise, if the current extent holds the end of another extent
    (it must be greater than the current one because it is behind the current
    one), we will record the number that the current extent holds. In this
    way, we can exclude the ordered extents that may not be dealt with in the
    same transaction, and easily know the real disk_i_size.
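
    The rule can be condensed into a small model (illustrative C, a
    simplification of the real algorithm): walking the ordered extents in
    file-offset order, disk_i_size may only advance across completed, gapless
    extents, and never past the recorded max real i_size.

    #include <stdbool.h>
    #include <stdint.h>

    struct ordered_ent {
        uint64_t offset;
        uint64_t len;
        bool completed;
        struct ordered_ent *next;        /* sorted by offset */
    };

    static uint64_t new_disk_i_size(const struct ordered_ent *list,
                                    uint64_t disk_i_size,
                                    uint64_t max_real_i_size)
    {
        for (const struct ordered_ent *e = list; e; e = e->next) {
            if (e->offset > disk_i_size)
                break;               /* hole: cannot advance across it */
            if (!e->completed)
                break;               /* pending extent: must stop here  */
            if (e->offset + e->len > disk_i_size)
                disk_i_size = e->offset + e->len;
        }
        /* A snapshot taken now sees an i_size backed by file extents. */
        return disk_i_size < max_real_i_size ? disk_i_size
                                             : max_real_i_size;
    }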

    Signed-off-by: Miao Xie

    Miao Xie
     

30 May, 2012

1 commit

  • We noticed that the ordered extent completion doesn't really rely on having
    a page and that it could be done independently of ending the writeback on a
    page. This patch makes us not do the threaded endio stuff for normal
    buffered writes and direct writes so we can end page writeback as soon as
    possible (in irq context) and only start threads to do the ordered work when
    it is actually done. Compression needs to be reworked some to take
    advantage of this as well, but atm it has to do a find_get_page in its endio
    handler so it must be done in its own thread. This makes direct writes
    quite a bit faster. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

29 Nov, 2010

1 commit

  • The new DIO bio splitting code has problems when the bio
    spans more than one ordered extent. This will happen as the
    generic DIO code merges our get_blocks calls together into
    a bigger single bio.

    This fixes things by walking forward in the ordered extent
    code finding all the overlapping ordered extents and completing them
    all at once.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 May, 2010

1 commit

  • This provides basic DIO support for reading and writing. It does not do the
    work to recover from mismatching checksums; that will come later. A few design
    changes have been made from Jim's code (sorry Jim!)

    1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO
    code in order to account for all of BTRFS's oddities, but thanks to that work it
    seems like the best bet is to just ignore compression and such and just opt to
    fallback on buffered IO.

    2) Fallback on buffered IO for compressed or inline extents. Jim's code did
    its own buffering to make dio with compressed extents work. Now we just
    fallback onto normal buffered IO.

    3) Use ordered extents for the writes so that all of the

    lock_extent()
    lookup_ordered()

    type checks continue to work.

    4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
    DIO writes.
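
    The loop in point 4 follows a lock/lookup/retry pattern; a hedged sketch
    with hypothetical helper names standing in for the kernel's
    extent-locking and ordered-extent APIs:

    #include <stdint.h>

    struct ordered_extent;
    extern void lock_extent_range(uint64_t start, uint64_t end);
    extern void unlock_extent_range(uint64_t start, uint64_t end);
    extern struct ordered_extent *lookup_ordered(uint64_t start, uint64_t end);
    extern void wait_ordered(struct ordered_extent *oe);

    static void lock_and_wait_ordered(uint64_t start, uint64_t end)
    {
        for (;;) {
            lock_extent_range(start, end);
            struct ordered_extent *oe = lookup_ordered(start, end);
            if (!oe)
                return;     /* range locked, no DIO write racing us */
            /* An in-flight ordered extent overlaps: drop the lock,
             * wait for it to finish, then try again. */
            unlock_extent_range(start, end);
            wait_ordered(oe);
        }
    }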

    I've tested this with fsx and everything works great. This patch depends on my
    dio and filemap.c patches to work. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

15 Mar, 2010

2 commits

  • When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
    run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
    already, so we're searching twice when we don't have to. This patch lets us
    pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
    complete io on that ordered extent we can just use the one we found then instead
    of having to do another btrfs_lookup_ordered_extent. This made my fio job with
    the other patch go from 24 mb/s to 29 mb/s.
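
    The shape of the change (illustrative C, hypothetical names): the
    dec-and-test helper grows an out-parameter that hands back the ordered
    extent it already looked up, so the completion path never searches the
    tree a second time.

    #include <stdbool.h>
    #include <stdint.h>

    struct ordered_extent;
    extern struct ordered_extent *ordered_tree_lookup(uint64_t file_offset);
    extern bool ordered_dec_and_test(struct ordered_extent *oe);

    static bool dec_test_ordered_pending(uint64_t file_offset,
                                         struct ordered_extent **cached)
    {
        struct ordered_extent *oe = ordered_tree_lookup(file_offset);
        if (!oe)
            return false;
        if (!ordered_dec_and_test(oe))
            return false;    /* io still pending on this extent */
        *cached = oe;        /* caller reuses it; no second lookup */
        return true;
    }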

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The ordered tree used to need a mutex, but currently all we use it for is to
    protect the rb_tree, and a spin_lock is just fine for that. Using a spin_lock
    instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
    less latency, 3445.138 ms instead of 3820.633 ms.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

09 Mar, 2010

1 commit

  • btrfs initializes rb trees in quite a number of places by setting rb_node =
    NULL; the problem with this is that 17d9ddc72fb8bba0d4f678 in the
    linux-next tree adds a new field to that struct which needs to be NULL for
    the new rbtree library code to work properly. This patch uses RB_ROOT as
    the initializer so all of the relevant fields will be NULL'd. Without the
    patch I get a panic.
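
    The difference is a single line (kernel-style sketch; RB_ROOT is the
    real macro from <linux/rbtree.h>, the surrounding names are made up):

    #include <linux/rbtree.h>

    struct some_tree {
        struct rb_root root;
    };

    static void some_tree_init(struct some_tree *t)
    {
        /* Fragile once rb_root grows fields: t->root.rb_node = NULL; */
        t->root = RB_ROOT;   /* initializes every field of rb_root */
    }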

    Signed-off-by: Eric Paris
    Acked-by: Venkatesh Pallipadi
    Signed-off-by: Chris Mason

    Eric Paris
     

18 Dec, 2009

1 commit

  • iput() can trigger new transactions if we are dropping the
    final reference, so calling it in btrfs_commit_transaction
    may end up deadlocking. This patch adds delayed iput to avoid
    the issue.
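
    A toy model of the mechanism (illustrative userspace C, hypothetical
    names; locking elided): contexts where dropping the final reference is
    unsafe queue the inode instead, and the queued iputs run later from a
    safe context.

    #include <stdlib.h>

    struct inode_model;
    extern void real_iput(struct inode_model *inode);  /* hypothetical */

    struct delayed_iput {
        struct inode_model *inode;
        struct delayed_iput *next;
    };

    static struct delayed_iput *delayed_iputs;

    /* Called instead of iput() from e.g. the commit path, where the
     * final reference drop could start a new transaction and deadlock. */
    static void add_delayed_iput(struct inode_model *inode)
    {
        struct delayed_iput *di = malloc(sizeof(*di));
        if (!di)
            return;          /* model only: drop on allocation failure */
        di->inode = inode;
        di->next = delayed_iputs;
        delayed_iputs = di;
    }

    /* Run later, outside the commit path, where iput() is safe. */
    static void run_delayed_iputs(void)
    {
        while (delayed_iputs) {
            struct delayed_iput *di = delayed_iputs;
            delayed_iputs = di->next;
            real_iput(di->inode);
            free(di);
        }
    }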

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng