09 Oct, 2012

3 commits

  • When building btrfs from the kernel tree, the compiler reports:

    fs/btrfs/extent_io.h:281: warning: 'extent_buffer_page' declared inline after being called
    fs/btrfs/extent_io.h:281: warning: previous declaration of 'extent_buffer_page' was here
    fs/btrfs/extent_io.h:280: warning: 'num_extent_pages' declared inline after being called
    fs/btrfs/extent_io.h:280: warning: previous declaration of 'num_extent_pages' was here

    because of the wrong declaration of inline functions.

    Signed-off-by: Robin Dong

    Robin Dong
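
    The warning pattern here is a function that is used before any inline
    declaration of it has been seen. A standalone illustration of the pattern
    and its fix (not the btrfs code itself) might look like this:

      /* illustration only -- not fs/btrfs code */
      static int num_pages(int len);            /* declared without inline   */

      static int total_bytes(int len)
      {
              return num_pages(len) * 4096;     /* first call happens here   */
      }

      /* This is the shape behind warnings like
       * "'num_pages' declared inline after being called": the inline hint
       * shows up only after the function has already been used. */
      static inline int num_pages(int len)
      {
              return (len + 4095) / 4096;
      }

      /* Fix: declare it "static inline" up front as well, or move the
       * inline definition above its first caller. */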
     
  • Every time we write out dirty pages we search for an offset in the tree,
    convert the bits in the state, and then when we wait we search for the
    offset again and clear the bits. So for every dirty range in the io tree we
    are doing 4 rb searches, which is suboptimal. With this patch we are only
    doing 2 searches for every cycle (modulo weird things happening). Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • There are a couple of scenarios where farming metadata csumming off to an async
    thread doesn't help. The first is if our processor supports crc32c, in
    which case the csumming will be fast and so the overhead of the async model
    is not worth the cost. The other case is for our tree log. We will be
    making that stuff dirty and writing it out and waiting for it immediately.
    Even with software crc32c this gives me a ~15% increase in speed with O_SYNC
    workloads. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
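
    A rough sketch of the decision being described, using made-up helper and
    struct names rather than the real btrfs submit path:

      #include <stdbool.h>

      struct toy_block { bool belongs_to_log_tree; };

      bool cpu_has_hw_crc32c(void)                { return true; }  /* stub */
      void csum_and_submit(struct toy_block *b)   { (void)b; }      /* stub */
      void queue_async_csum(struct toy_block *b)  { (void)b; }      /* stub */

      void submit_metadata_write(struct toy_block *b)
      {
              /* Async csumming only pays off when the checksum is expensive
               * (software crc32c) and nobody is about to wait on the block.
               * Hardware crc32c makes the csum cheap, and tree-log blocks
               * are written and waited on immediately, so handle both of
               * those inline instead of handing them to a worker thread. */
              if (cpu_has_hw_crc32c() || b->belongs_to_log_tree)
                      csum_and_submit(b);
              else
                      queue_async_csum(b);
      }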
     

02 Oct, 2012

1 commit

  • We're going to use this flag EXTENT_DEFRAG to indicate which ranges
    belong to a defrag operation, so that we can implement snapshot-aware defrag:

    We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
    defragmented, so that later the writeback thread can differentiate
    between normal writeback and writeback started by defragmentation.

    Original-Signed-off-by: Li Zefan
    Signed-off-by: Liu Bo

    Liu Bo
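
    A toy illustration of how the flag is meant to be used; the bit values and
    helpers below are invented for the example and are not the real
    extent-state definitions:

      #include <stdio.h>

      #define TOY_EXTENT_DIRTY   (1u << 0)   /* made-up bit values */
      #define TOY_EXTENT_DEFRAG  (1u << 1)

      struct toy_range { unsigned int flags; };

      void defrag_mark_dirty(struct toy_range *r)
      {
              /* defrag dirties the range and tags it at the same time ... */
              r->flags |= TOY_EXTENT_DIRTY | TOY_EXTENT_DEFRAG;
      }

      void writeback_range(struct toy_range *r)
      {
              /* ... so the writeback path can tell the two cases apart */
              if (r->flags & TOY_EXTENT_DEFRAG)
                      printf("writeback started by defragmentation\n");
              else
                      printf("normal writeback\n");
              r->flags &= ~(TOY_EXTENT_DIRTY | TOY_EXTENT_DEFRAG);
      }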
     

30 May, 2012

1 commit

  • We noticed that the ordered extent completion doesn't really rely on having
    a page and that it could be done independently of ending the writeback on a
    page. This patch makes us not do the threaded endio stuff for normal
    buffered writes and direct writes so we can end page writeback as soon as
    possible (in irq context) and only start threads to do the ordered work when
    it is actually done. Compression needs to be reworked some to take
    advantage of this as well, but atm it has to do a find_get_page in its endio
    handler so it must be done in its own thread. This makes direct writes
    quite a bit faster. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

19 Apr, 2012

1 commit

  • A user reported a panic where we were trying to fix a bad mirror but the
    mirror number we were giving was 0, which is invalid. This is because we
    don't do the transid verification until after the read, so as far as the
    read code is concerned the read was a success. So instead store the mirror
    we read from so that if there is some failure post read we know which mirror
    to try next and which mirror needs to be fixed if we find a good copy of the
    block. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

27 Mar, 2012

5 commits

  • Since we need to read and write extent buffers in their entirety we can't use
    the normal bio_readpage_error stuff since it only works on a per page basis. So
    instead make it so that if we see an io error in endio we just mark the eb as
    having an IO error and then in btree_read_extent_buffer_pages we will manually
    try other mirrors and then overwrite the bad mirror if we find a good copy.
    This works with larger than page size blocks. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch simplifies how we track our extent buffers. Previously we could exit
    writepages with only having written half of an extent buffer, which meant we had
    to track the state of the pages and the state of the extent buffers differently.
    Now we only read in entire extent buffers and write out entire extent buffers;
    this allows us to simply set bits in our bflags to indicate the state of the eb
    and we no longer have to do things like track uptodate with our iotree. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Because btrfs COWs, we can end up with extent buffers that are no longer
    necessary just sitting around in memory. Instead of those stale pages being
    evicted, we could end up evicting things we actually care about. Thus we have
    free_extent_buffer_stale for use when we are freeing tree blocks. This will
    make it so that the ref for the eb being in the radix tree is dropped as soon as
    possible and then is freed when the refcount hits 0 instead of waiting to be
    released by releasepage. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • We spend a lot of time looking up extent buffers from pages when we could just
    store the pointer to the eb the page is associated with in page->private. This
    patch does just that, and it makes things a little simpler and reduces a bit of
    CPU overhead involved with doing metadata IO. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
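
    A kernel-flavored sketch of the idea; set_page_private() and page_private()
    are the stock page helpers, everything else here is an illustrative
    stand-in rather than the actual patch:

      #include <linux/mm.h>   /* struct page, set_page_private(), page_private() */

      struct extent_buffer;   /* opaque here; the real one lives in extent_io.h */

      /* remember which eb a metadata page belongs to ... */
      void toy_attach_eb(struct page *page, struct extent_buffer *eb)
      {
              set_page_private(page, (unsigned long)eb);
      }

      /* ... so the IO paths can get it back without a radix tree lookup */
      struct extent_buffer *toy_eb_of_page(struct page *page)
      {
              return (struct extent_buffer *)page_private(page);
      }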
     
  • A few years ago the btrfs code to support blocks larger than
    the page size was disabled to fix a few corner cases in the
    page cache handling. This fixes the code to properly support
    large metadata blocks again.

    Since current kernels will crash early and often with larger
    metadata blocks, this adds an incompat bit so that older kernels
    can't mount it.

    This also does away with different blocksizes for nodes and leaves.
    You get a single block size for all tree blocks.

    Signed-off-by: Chris Mason

    Chris Mason
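
    A toy model of the incompat check at mount time; the names and bit values
    are invented for the sketch and differ from the real btrfs flags:

      #include <stdint.h>
      #include <stdio.h>

      #define TOY_INCOMPAT_BIG_METADATA  (1ULL << 5)   /* made-up value */

      struct toy_super { uint64_t incompat_flags; };

      int toy_mount(const struct toy_super *sb, uint64_t supported)
      {
              uint64_t unknown = sb->incompat_flags & ~supported;

              if (unknown) {
                      /* an older kernel sees a bit it does not understand and
                       * refuses the mount instead of corrupting the fs */
                      fprintf(stderr, "unknown incompat flags 0x%llx\n",
                              (unsigned long long)unknown);
                      return -1;
              }
              return 0;
      }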
     

15 Feb, 2012

1 commit

  • We encountered an issue that was easily observable on s/390 systems but
    could really happen anywhere. The timing just seemed to hit reliably
    on s/390 with limited memory.

    The gist is that when an unexpected set_page_dirty() happened, we'd
    run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
    properly set up for delalloc.

    This patch does the following:
    - Performs the missing delalloc in the fixup worker.
    - Allows the start hook to return -EBUSY, which informs __extent_writepage
      that it should mark the page skipped and not redirty it. This is
      required since the fixup worker can fail with -ENOSPC and the page
      will have already been redirtied; that would cause an oops in
      drop_outstanding_extents later. Retrying the fixup worker could
      lead to an infinite loop. Deferring the page redirty also saves us
      some cycles, since the page would otherwise be stuck in a
      resubmit-redirty loop until the fixup worker completes. It's not
      harmful, just wasteful.
    - If the fixup worker fails, we mark the page and mapping as errored
      and end the writeback, similar to what we would do had the page
      actually been submitted to writeback.

    Signed-off-by: Jeff Mahoney

    Jeff Mahoney
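
    A highly simplified sketch of the new control flow; every name below is a
    stand-in for the description above, not the actual btrfs code:

      #include <errno.h>

      struct toy_page { int skipped; };

      int toy_start_hook(struct toy_page *page);   /* may return -EBUSY */
      int toy_submit(struct toy_page *page);

      int toy_extent_writepage(struct toy_page *page)
      {
              int ret = toy_start_hook(page);

              if (ret == -EBUSY) {
                      /* the fixup worker owns this page: mark it skipped and
                       * do not redirty it here, so an -ENOSPC failure in the
                       * worker cannot leave us looping or oopsing later */
                      page->skipped = 1;
                      return 0;
              }
              if (ret)
                      return ret;

              return toy_submit(page);
      }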
     

20 Oct, 2011

2 commits

  • While looking for a performance regression a user was complaining about, I
    noticed that we had a regression with the varmail test of filebench. This was
    introduced by

    0d10ee2e6deb5c8409ae65b970846344897d5e4e

    which keeps us from calling writepages in writepage. This is a correct
    change; however, that writepages call happened to help the varmail test
    because it wrote dirty pages out in larger chunks. This is largely to do
    with how we write out dirty pages for each transaction. If you run filebench with

    load varmail
    set $dir=/mnt/btrfs-test
    run 60

    prior to that commit you would get ~1420 ops/second, but with it you get
    ~1200 ops/second, a 16% decrease. Since we know the range of dirty pages we
    want to write out, don't write them out in one-page chunks; write them out
    in ranges. To do this we call filemap_fdatawrite_range() on the range of
    bytes, then convert the DIRTY extents to NEED_WAIT extents. When we then
    call btrfs_wait_marked_extents() we only have to filemap_fdatawait_range()
    on that range and clear the NEED_WAIT extents. This doesn't get us back to
    our original speed, but I've been seeing ~1380 ops/second, which is only
    about a 3% regression. That is acceptable given that the original commit
    greatly reduces our latency to begin with. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • If I have a range where I know a certain bit is set and I want to change it to
    another bit, the only option I have is to call set and then clear the bit, which will result
    in 2 tree searches. This is inefficient, so introduce convert_extent_bit which
    will go through and set the bit I want and clear the old bit I don't want.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
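
    A sketch of how the two changes above fit together (kernel-flavored and
    illustrative only: filemap_fdatawrite_range()/filemap_fdatawait_range()
    are the stock helpers, while toy_convert_bits()/toy_clear_bits() stand in
    for the new convert_extent_bit() and the existing clear path):

      #include <linux/fs.h>   /* struct address_space, filemap_fdata*_range() */

      /* stand-ins for the io-tree operations described above */
      void toy_convert_bits(struct address_space *mapping, loff_t start, loff_t end);
      void toy_clear_bits(struct address_space *mapping, loff_t start, loff_t end);

      int toy_write_and_wait_range(struct address_space *mapping,
                                   loff_t start, loff_t end)
      {
              int ret;

              /* push the whole dirty range at once, not page-sized chunks */
              ret = filemap_fdatawrite_range(mapping, start, end);
              if (ret)
                      return ret;

              /* one search-and-convert pass: DIRTY cleared, NEED_WAIT set */
              toy_convert_bits(mapping, start, end);

              ret = filemap_fdatawait_range(mapping, start, end);
              toy_clear_bits(mapping, start, end);   /* drop NEED_WAIT */
              return ret;
      }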
     

02 Oct, 2011

2 commits

  • Add a READAHEAD extent buffer flag.
    Add a function to trigger a read with this flag set.

    Changes v2:
    - use extent buffer flags instead of extent state flags

    Changes v5:
    - adapt to changed read_extent_buffer_pages interface
    - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set

    Signed-off-by: Arne Jansen

    Arne Jansen
     
  • read_extent_buffer_pages currently has two modes: either trigger a read
    without waiting for anything, or wait for the I/O to finish. The former
    also bails when it's unable to lock the page. This patch adds an
    additional parameter to allow it to block on the page lock but not wait
    for completion.

    Changes v5:
    - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
    WAIT_PAGE_LOCK

    Change v6:
    - fix bug introduced in v5

    Signed-off-by: Arne Jansen

    Arne Jansen
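
    The three modes from the v5 note, written out as an enum; the names come
    from the message above and the comments paraphrase their meaning:

      enum toy_eb_read_wait {
              WAIT_NONE,        /* trigger the read, skip pages we cannot lock */
              WAIT_COMPLETE,    /* block until the read I/O has finished       */
              WAIT_PAGE_LOCK,   /* block on the page lock, but not on the I/O  */
      };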
     

29 Sep, 2011

3 commits

  • The raid-retry code in inode.c can be generalized so that it works for
    metadata as well. Thus, this patch moves it to extent_io.c and makes the
    raid-retry code a raid-repair code.

    Repair works this way: whenever a read error occurs and we have more
    mirrors to try, note the failed mirror, and retry another. If we find a
    good one, check if we did note a failure earlier and if so, do not allow
    the read to complete until after the bad sector was written with the good
    data we just fetched. As we have the extent locked while reading, no one
    can change the data in between.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
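
    A toy user-space model of the retry/repair loop described above; all of the
    helpers are invented for the illustration, and real btrfs mirror numbers
    start at 1 rather than 0:

      #include <stdbool.h>

      bool toy_read_mirror(int mirror, void *buf);            /* true = good copy */
      void toy_rewrite_mirror(int mirror, const void *buf);   /* repair bad copy  */
      int  toy_num_mirrors(void);

      bool toy_read_with_repair(void *buf)
      {
              int failed = -1;

              for (int mirror = 0; mirror < toy_num_mirrors(); mirror++) {
                      if (!toy_read_mirror(mirror, buf)) {
                              if (failed < 0)
                                      failed = mirror;   /* note the failed mirror */
                              continue;                  /* and retry another one  */
                      }
                      /* good data found: rewrite the bad copy before letting the
                       * read complete (the extent stays locked the whole time) */
                      if (failed >= 0)
                              toy_rewrite_mirror(failed, buf);
                      return true;
              }
              return false;   /* no mirror had a good copy */
      }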
     
  • This removes a FIXME comment and introduces the first part of nodatasum
    fixup: It gets the corresponding inode for a logical address and triggers a
    regular readpage for the corrupted sector.

    Once we have on-the-fly error correction our error will be automatically
    corrected. The correction code is expected to clear the newly introduced
    EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead
    of "uncorrectable" eventually.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • Currently, extent_read_full_page always assumes we are trying to read mirror
    0, which generally is the best we can do. To add flexibility, pass it as a
    parameter. This will be needed by scrub fixup code.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

28 Jul, 2011

2 commits

  • The btrfs metadata btree is the source of significant
    lock contention, especially in the root node. This
    commit changes our locking to use a reader/writer
    lock.

    The lock is built on top of rw spinlocks, and it
    extends the lock tracking to remember if we have a
    read lock or a write lock when we go to blocking. Atomics
    count the number of blocking readers or writers at any
    given time.

    It removes all of the adaptive spinning from the old code
    and uses only the spinning/blocking hints inside of btrfs
    to decide when it should continue spinning.

    In read heavy workloads this is dramatically faster. In write
    heavy workloads we're still faster because of less contention
    on the root node lock.

    We suffer slightly in dbench because we schedule more often
    during write locks, but all other benchmarks so far are improved.

    Signed-off-by: Chris Mason

    Chris Mason
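
    A stripped-down sketch of the locking state being described (kernel-style
    and illustrative only; the real extent_buffer fields differ in detail):

      #include <linux/spinlock.h>   /* rwlock_t          */
      #include <linux/atomic.h>     /* atomic_t          */
      #include <linux/wait.h>       /* wait_queue_head_t */

      /* per tree-block locking state: an rw spinlock for the short spinning
       * case, plus atomics that record how many holders have gone blocking */
      struct toy_tree_lock {
              rwlock_t          lock;              /* spinning readers/writers */
              atomic_t          blocking_readers;  /* readers gone blocking    */
              atomic_t          blocking_writers;  /* writers gone blocking    */
              wait_queue_head_t read_lock_wq;      /* waiters for read locks   */
              wait_queue_head_t write_lock_wq;     /* waiters for write locks  */
      };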
     
  • The extent_buffers have a very complex interface where
    we use HIGHMEM for metadata and try to cache a kmap mapping
    to access the memory.

    The next commit adds reader/writer locks, and concurrent use
    of this kmap cache would make it even more complex.

    This commit drops the ability to use HIGHMEM with extent buffers,
    and rips out all of the related code.

    Signed-off-by: Chris Mason

    Chris Mason
     

11 Jun, 2011

1 commit

  • Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit
    builds. This shrinks its size to 128 bytes allowing it to fit into one
    fewer cache lines and allows more objects per slab in its kmem_cache.

    slabinfo extent_buffer reports :-

    before:-
      Sizes (bytes)          Slabs
      ------------------------------------
      Object :  136          Total  :  123
      SlabObj:  136          Full   :  121
      SlabSiz: 4096          Partial:    0
      Loss   :    0          CpuSlab:    2
      Align  :    8          Objects:   30

    after :-
      Object :  128          Total  :    4
      SlabObj:  128          Full   :    2
      SlabSiz: 4096          Partial:    0
      Loss   :    0          CpuSlab:    2
      Align  :    8          Objects:   32

    Signed-off-by: Richard Kennedy
    Signed-off-by: Chris Mason

    richard kennedy
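
    A standalone illustration of the kind of reordering win being measured here;
    the structs are examples, not the real extent_buffer layout:

      #include <stdio.h>

      struct before_reorder {          /* on a typical 64-bit (LP64) ABI:     */
              void         *a;         /* offset  0, 8 bytes                  */
              unsigned int  b;         /* offset  8, 4 bytes + 4 bytes pad    */
              void         *c;         /* offset 16, 8 bytes                  */
              unsigned int  d;         /* offset 24, 4 bytes + 4 bytes pad    */
      };                               /* sizeof == 32                        */

      struct after_reorder {
              void         *a;         /* offset  0                           */
              void         *c;         /* offset  8                           */
              unsigned int  b;         /* offset 16                           */
              unsigned int  d;         /* offset 20                           */
      };                               /* sizeof == 24: 8 bytes saved         */

      int main(void)
      {
              printf("before: %zu bytes, after: %zu bytes\n",
                     sizeof(struct before_reorder), sizeof(struct after_reorder));
              return 0;
      }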
     

06 May, 2011

1 commit

  • Remove static and global declarations and/or definitions. Reduces size
    of btrfs.ko by ~3.4kB.

      text   data  bss     dec    hex  filename
    402081   7464  200  409745  64091  btrfs.ko.base
    398620   7144  200  405964  631cc  btrfs.ko.remove-all

    Signed-off-by: David Sterba

    David Sterba
     
