Doug / smarc-fsl-linux-kernel | Embedian Git Server

02 Jul, 2013

1 commit

6df9a95e6 Btrfs: make the chunk allocator completely tree lockless

When adjusting the enospc rules for relocation I ran into a deadlock because we
were relocating the only system chunk and that forced us to try and allocate a
new system chunk while holding locks in the chunk tree, which caused us to
deadlock. To fix this I've moved all of the dev extent addition and chunk
addition out to the delayed chunk completion stuff. We still keep the in-memory
stuff which makes sure everything is consistent.

One change I had to make was to search the commit root of the device tree to
find a free dev extent, and hold onto any chunk em's that we allocated in that
transaction so we do not allocate the same dev extent twice. This has the side
effect of fixing a bug with balance that has been there ever since balance
existed. Basically you can free a block group and its dev extent and then
immediately allocate that dev extent for a new block group and write stuff to
that dev extent, all within the same transaction. So if you happen to crash
during a balance you could come back to a completely broken file system. This
patch should keep this sort of thing from happening in the future since we
won't be able to allocate free'd dev extents until after the transaction
commits. This has passed all of the xfstests and my super annoying stress test
followed by a balance. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2013-07-02 23:50:53 +0800
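
The dev extent rule described in commit 6df9a95e6 above (a dev extent freed in a transaction must not be handed out again until that transaction commits) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class DevExtentAllocator and its methods are hypothetical names, extents are plain integers, and none of this is the kernel code.

```python
class DevExtentAllocator:
    """Toy model: extents freed in a transaction are reusable only after commit."""

    def __init__(self, num_extents):
        self.free = set(range(num_extents))  # free as of the last commit
        self.pending_alloc = set()           # allocated in the running transaction
        self.pending_free = set()            # freed in the running transaction

    def alloc(self):
        # Search only the committed view, and skip extents already handed out
        # in this transaction so the same extent is never allocated twice.
        candidates = self.free - self.pending_alloc
        if not candidates:
            raise RuntimeError("no free dev extent until the transaction commits")
        extent = min(candidates)
        self.pending_alloc.add(extent)
        return extent

    def release(self, extent):
        # Not reusable yet: after a crash the last commit still refers to it.
        self.pending_free.add(extent)

    def commit(self):
        # Only now do this transaction's allocations and frees become visible.
        self.free -= self.pending_alloc
        self.free |= self.pending_free
        self.pending_alloc.clear()
        self.pending_free.clear()
```

With two extents, allocating both, committing, and then releasing one still leaves alloc() failing until the next commit(), which is the behaviour the commit message relies on to make a crash during balance safe.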

14 Jun, 2013

1 commit

cb517eabb Btrfs: cleanup the similar code of the fs root read

There are several functions whose code is similar, such as
btrfs_find_last_root()
btrfs_read_fs_root_no_radix()

Besides that, some functions are invoked twice unnecessarily. For
example, we are sure that all roots found in
btrfs_find_orphan_roots()
have their orphan items, so there is no need to check the orphan
item again.

So clean them up.

Signed-off-by: Miao Xie
Signed-off-by: Josef Bacik

Miao Xie
2013-06-14 23:29:37 +0800

18 May, 2013

1 commit

9be3395bc Btrfs: use a btrfs bioset instead of abusing bio internals

Btrfs has been pointer tagging bi_private and using bi_bdev
to store the stripe index and mirror number of failed IOs.

As bios bubble back up through the call chain, we use these
to decide if and how to retry our IOs. They are also used
to count IO failures on a per device basis.

A recently added bio tracepoint led to crashes because
we were abusing bi_bdev.

This commit adds a btrfs bioset, and creates explicit fields
for the mirror number and stripe index. The plan is to
extend this structure for all of the fields currently in
struct btrfs_bio, which will mean one less kmalloc in
our IO path.

Signed-off-by: Chris Mason
Reported-by: Tejun Heo

Chris Mason
2013-05-18 09:52:52 +0800
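
The difference between pointer tagging and explicit fields in commit 9be3395bc above can be sketched as follows. The sketch assumes 4-byte-aligned addresses (two spare low bits); the real tag width, and the names tag_pointer, untag_pointer and IoRequest, are assumptions made for illustration and are not taken from the Btrfs sources.

```python
from dataclasses import dataclass

TAG_BITS = 2                      # assumed: addresses are at least 4-byte aligned
TAG_MASK = (1 << TAG_BITS) - 1


def tag_pointer(addr, mirror_num):
    """Old style: hide a small number in the unused low bits of an address."""
    assert addr & TAG_MASK == 0, "address must be aligned"
    assert 0 <= mirror_num <= TAG_MASK, "value must fit in the tag bits"
    return addr | mirror_num


def untag_pointer(tagged):
    """Split a tagged value back into (address, mirror_num)."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK


@dataclass
class IoRequest:
    """New style: the metadata lives in named fields next to the request."""
    private: int           # completion cookie, left untouched
    mirror_num: int = 0
    stripe_index: int = 0
```

Anything that reads the tagged value without knowing about the tag (as the tracepoint did with bi_bdev) sees a corrupted address, whereas the explicit fields leave the original members intact.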

07 May, 2013

1 commit

48a3b6366 btrfs: make static code static & remove dead code

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.

Signed-off-by: Eric Sandeen
Signed-off-by: Josef Bacik

Eric Sandeen
2013-05-07 03:55:23 +0800

21 Feb, 2013

1 commit

e942f883b Merge branch 'raid56-experimental' into for-linus-3.9

Signed-off-by: Chris Mason

Conflicts:
fs/btrfs/ctree.h
fs/btrfs/extent-tree.c
fs/btrfs/inode.c
fs/btrfs/volumes.c

Chris Mason
2013-02-21 03:06:05 +0800

20 Feb, 2013

1 commit

55e301fd5 Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h

The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.

Signed-off-by: Filipe Brandenburger
Signed-off-by: Josef Bacik

Filipe Brandenburger
2013-02-20 22:37:28 +0800

02 Feb, 2013

1 commit

53b381b3a Btrfs: RAID5 and RAID6

This builds on David Woodhouse's original Btrfs raid5/6 implementation.
The code has changed quite a bit, blame Chris Mason for any bugs.

Read/modify/write is done after the higher levels of the filesystem have
prepared a given bio. This means the higher layers are not responsible
for building full stripes, and they don't need to query for the topology
of the extents that may get allocated during delayed allocation runs.
It also means different files can easily share the same stripe.

But, it does expose us to incorrect parity if we crash or lose power
while doing a read/modify/write cycle. This will be addressed in a
later commit.

Scrub is unable to repair crc errors on raid5/6 chunks.

Discard does not work on raid5/6 (yet).

The stripe size is fixed at 64KiB per disk. This will be tunable
in a later commit.

Signed-off-by: Chris Mason

David Woodhouse
2013-02-02 03:24:23 +0800
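
The parity arithmetic behind the read/modify/write cycle mentioned in commit 53b381b3a above can be sketched with generic RAID5 XOR parity. This is textbook parity math, not the kernel implementation; the function names are hypothetical and the blocks are tiny byte strings rather than 64KiB stripes.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(blocks):
    """Parity of a full stripe: XOR of all data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def rmw_parity(old_parity, old_data, new_data):
    """Update parity after rewriting one block: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def rebuild_missing(parity, surviving_blocks):
    """Recover a lost data block from the parity and the surviving blocks."""
    return full_stripe_parity([parity] + list(surviving_blocks))
```

If a crash lands between writing the new data block and writing the updated parity, rebuild_missing() would be fed a stale parity and return wrong data, which is the exposure the commit message notes will be addressed later.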

17 Dec, 2012

1 commit

31e502298 Btrfs: put raid properties into global table

Raid properties can be shared among raid calculation code, we can put
them into a global table to keep it simple.

Signed-off-by: Liu Bo
Signed-off-by: Chris Mason

Liu Bo
2012-12-17 09:46:28 +0800
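
The idea in commit 31e502298 above, replacing per-profile branches with a lookup in one shared table, can be sketched like this. The field names, the table layout and the helper functions are assumptions for illustration; the values reflect the commonly documented copy counts and minimum device counts of these profiles and are not copied from the kernel table.

```python
from collections import namedtuple

# Hypothetical subset of per-profile properties.
RaidAttr = namedtuple("RaidAttr", ["ncopies", "devs_min"])

# Illustrative values only; the kernel table has more fields.
RAID_TABLE = {
    "single": RaidAttr(ncopies=1, devs_min=1),
    "dup":    RaidAttr(ncopies=2, devs_min=1),
    "raid0":  RaidAttr(ncopies=1, devs_min=2),
    "raid1":  RaidAttr(ncopies=2, devs_min=2),
    "raid10": RaidAttr(ncopies=2, devs_min=4),
}


def can_allocate(profile, num_devices):
    """One table lookup replaces a chain of per-profile if/else branches."""
    return num_devices >= RAID_TABLE[profile].devs_min


def usable_bytes(profile, raw_bytes):
    """Space left for data once every copy has been accounted for."""
    return raw_bytes // RAID_TABLE[profile].ncopies
```

Every calculation that needs a profile property reads the same table, so adding a profile means adding one row rather than touching each caller.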

13 Dec, 2012

9 commits

8dabb7420 Btrfs: change core code of btrfs to support the device replace operations

This commit contains all the essential changes to the core code
of Btrfs for support of the device replace procedure.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:42 +0800
e93c89c1a Btrfs: add new sources for device replace code

This adds a new file to the sources together with the header file
and the changes to ioctl.h and ctree.h that are required by the
new C source file. Additionally, 4 new functions are added to
volumes.c that deal with device creation and destruction.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:41 +0800
63a212abc Btrfs: disallow some operations on the device replace target device

This patch adds some code to disallow operations on the device that
is used as the target for the device replace operation.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:39 +0800
aa1b8cd40 Btrfs: pass fs_info instead of root

A small number of functions that are used in a device replace
procedure when the operation is resumed at mount time are unable
to pass the same root pointer that would be used in the regular
(ioctl) context. And since the root pointer is not required, only
the fs_info is, the root pointer argument is replaced with the
fs_info pointer argument.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:36 +0800
a8a6dab77 Btrfs: add btrfs_scratch_superblock() function

This new function is used by the device replace procedure in
a later patch.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:35 +0800
3ec706c83 Btrfs: pass fs_info to btrfs_map_block() instead of mapping_tree

This is required for the device replace procedure in a later step.
Two calling functions also had to be changed to have the fs_info
pointer: repair_io_failure() and scrub_setup_recheck_block().

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:34 +0800
5d9640517 Btrfs: Pass fs_info to btrfs_num_copies() instead of mapping_tree

This is required for the device replace procedure in a later step.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:34 +0800
7ba15b7d2 Btrfs: add two more find_device() methods

The new function btrfs_find_device_missing_or_by_path() will be
used for the device replace procedure. This function itself calls
the second new function btrfs_find_device_by_path().
Unfortunately, it is currently not possible to make the rest of the
code use these functions as well, since all functions that look
similar at first glance differ slightly in what they
do. But in the future, new code could benefit from these
two new functions, and currently, device replace uses them.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:33 +0800
d9d181c1b Btrfs: rename the scrub context structure

The device replace procedure makes use of the scrub code. The scrub
code is the most efficient code to read the allocated data of a disk,
i.e. it reads sequentially in order to avoid disk head movements, it
skips unallocated blocks, it uses read ahead mechanisms, and it
contains all the code to detect and repair defects.
This commit is a first preparation step to adapt the scrub code to
be shareable for the device replace procedure.
The block device will be removed from the scrub context state
structure in a later step. It used to be the source block device.
The scrub code as it is used for the device replace procedure reads
the source data from wherever it is optimal. The source device might
even be gone (disconnected, for instance due to a hardware failure).
Or the drive can be so faulty that the device replace procedure
tries to avoid access to the faulty source drive as much as possible,
and only if all other mirrors are damaged, as a last resort, the
source disk is accessed.
The modified scrub code operates as if it were handling the source
drive and thereby generates an exact copy of the source disk on the
target disk, even if the source disk is not present at all. Therefore
the block device pointer to the source disk is removed in a later
patch, and therefore the context structure is renamed (this is the
goal of the current patch) to reflect that no source block device
scope is there anymore.

Summary:
This first preparation step consists of a textual substitution of the
term "dev" with the term "ctx" wherever the scrub context is used.

Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-12-13 06:15:29 +0800

29 Aug, 2012

1 commit

5ee0844d6 Btrfs: revert checksum error statistic which can cause a BUG()

Commit 442a4f6308e694e0fa6025708bd5e4e424bbf51c added btrfs device
statistic counters for detected IO and checksum errors to Linux 3.5.
The statistic part that counts checksum errors in
end_bio_extent_readpage() can cause a BUG() in a subfunction:
"kernel BUG at fs/btrfs/volumes.c:3762!"
That part is reverted with the current patch.
However, the counting of checksum errors in the scrub context remains
active, and the counting of detected IO errors (read, write or flush
errors) in all contexts remains active.

Cc: stable # 3.5
Signed-off-by: Stefan Behrens
Signed-off-by: Chris Mason

Stefan Behrens
2012-08-29 04:53:39 +0800

24 Jul, 2012

2 commits

02db0844b Btrfs: add DEVICE_READY ioctl

This will be used in conjunction with the btrfs device ready command. This is
needed for initrds to have a nice and lightweight way to tell if all of the
devices needed for a file system are in the cache currently. This keeps
them from having to do mount+sleep loops waiting for devices to show up.
Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-07-24 04:27:42 +0800
b27f7c0c1 btrfs: join DEV_STATS ioctls to one

Commit c11d2c236cc260b36 (Btrfs: add ioctl to get and reset the device
stats) introduced two ioctls doing almost the same thing distinguished
by just the ioctl number which encodes "do reset after read". I have
suggested

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html

to implement it via the ioctl args. This hasn't happened, and I think we
should use a cleaner way to pass flags and should not waste ioctl
numbers.

CC: Stefan Behrens
Signed-off-by: David Sterba

David Sterba
2012-07-24 03:41:40 +0800
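
The design argued for in commit b27f7c0c1 above, one call with a flags argument instead of two near-identical ioctls, can be sketched as below. The names DEV_STATS_RESET, DevStatsArgs, Device and get_dev_stats are hypothetical stand-ins, and the sketch does not model atomicity or the real ioctl argument layout.

```python
from dataclasses import dataclass, field

DEV_STATS_RESET = 1 << 0   # hypothetical flag bit: reset counters after reading


@dataclass
class DevStatsArgs:
    """Hypothetical argument block passed in by the caller."""
    devid: int
    flags: int = 0
    values: dict = field(default_factory=dict)


class Device:
    def __init__(self, devid):
        self.devid = devid
        self.stats = {"write_errs": 0, "read_errs": 0, "flush_errs": 0}


def get_dev_stats(device, args):
    """Single entry point: always read, reset only when the flag is set."""
    args.values = dict(device.stats)
    if args.flags & DEV_STATS_RESET:
        for name in device.stats:
            device.stats[name] = 0
    return args
```

Further behaviours can later be added as new flag bits without consuming another ioctl number.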

03 Jul, 2012

2 commits

2b6ba629b Btrfs: resume balance on rw (re)mounts properly

This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-07-03 03:39:17 +0800
68310a5e4 Btrfs: restore restriper state on all mounts

Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-07-03 03:39:16 +0800

15 Jun, 2012

1 commit

606686eea Btrfs: use rcu to protect device->name

Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory. Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock(). This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch. Thanks,

Reviewed-by: David Sterba
Signed-off-by: Josef Bacik

Josef Bacik
2012-06-15 09:29:16 +0800

30 May, 2012

3 commits

733f4fbbc Btrfs: read device stats on mount, write modified ones during commit

The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

Signed-off-by: Stefan Behrens

Stefan Behrens
2012-05-30 22:23:41 +0800
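
The persistence scheme in commit 733f4fbbc above (initialize counters from stored values at mount, write back only modified ones at commit) can be sketched with a dirty flag. A plain dict stands in for the device tree, and DeviceStats and commit_stats are hypothetical names, so this is an illustration of the pattern rather than the kernel code.

```python
class DeviceStats:
    def __init__(self, devid, stored=None):
        # On "mount", counters start from whatever was stored for this device.
        self.devid = devid
        self.counters = dict(stored or {})
        self.dirty = False

    def inc(self, name):
        """Count one error and remember that this device needs writing back."""
        self.counters[name] = self.counters.get(name, 0) + 1
        self.dirty = True


def commit_stats(devices, store):
    """At commit, write back only the devices whose counters changed."""
    written = []
    for dev in devices:
        if not dev.dirty:
            continue
        store[dev.devid] = dict(dev.counters)
        dev.dirty = False
        written.append(dev.devid)
    return written
```

A second commit with no new errors writes nothing, and a later "mount" that passes the stored dict back into DeviceStats resumes counting from the saved values.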
c11d2c236 Btrfs: add ioctl to get and reset the device stats

An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.

Signed-off-by: Stefan Behrens

Stefan Behrens
2012-05-30 22:23:40 +0800
442a4f630 Btrfs: add device counters for detected IO and checksum errors

The goal is to detect when drives start to show an increased error rate
and should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally,
software-detected errors such as checksum errors and corrupted blocks are
counted.

Signed-off-by: Stefan Behrens

Stefan Behrens
2012-05-30 22:23:39 +0800

22 Mar, 2012

1 commit

143bede52 btrfs: return void in functions without error conditions

Signed-off-by: Jeff Mahoney

Jeff Mahoney
2012-03-22 08:45:34 +0800

17 Jan, 2012

13 commits

d756bd2d9 Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration

Conflicts:
fs/btrfs/volumes.c

Signed-off-by: Chris Mason

Chris Mason
2012-01-17 04:26:17 +0800
19a39dce3 Btrfs: add balance progress reporting

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
a7e99c691 Btrfs: allow for canceling restriper

Implement an ioctl for canceling restriper. Currently we wait until
relocation of the current block group is finished, in future this can be
done by triggering a commit. Balance item is deleted and no memory
about the interrupted balance is kept.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
837d5b6e4 Btrfs: allow for pausing restriper

Implement an ioctl for pausing restriper. This pauses the relocation,
but balance is still considered to be "in progress": balance item is
not deleted, other volume operations cannot be started, etc. If paused
in the middle of profile changing operation we will continue making
allocations with the target profile.

Add a hook to close_ctree() to pause restriper and free its data
structures on unmount. (It's safe to unmount when restriper is in
"paused" state, we will resume with the same parameters on the next
mount)

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:49 +0800
596410151 Btrfs: recover balance on mount

On mount, if balance item is found, resume balance in a separate
kernel thread.

Try to be smart to continue roughly where previous balance (or convert)
was interrupted. For chunk types that were being converted to some
profile we turn on soft convert, in case of a simple balance we turn on
usage filter and relocate only less-than-90%-full chunks of that type.
These are just heuristics but they help quite a bit, and can be improved
in future.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
cfa4c961c Btrfs: soft profile changing mode (aka soft convert)

When converting from one profile to another with soft mode on, the
restriper won't touch chunks that already have the profile we are
converting to. This is useful if, e.g., half of the FS was converted
earlier.

The soft mode switch is (like every other filter) per-type. This means
that we can convert for example meta chunks the "hard" way while
converting data chunks selectively with soft switch.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
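
The per-type soft switch described in commit cfa4c961c above can be sketched as a simple selection function. The argument shape (a dict per chunk type with "target" and "soft" keys) and the name select_for_convert are assumptions for illustration, not the layout of btrfs_balance_args.

```python
def select_for_convert(chunks, args_by_type):
    """Return the chunks a convert run would relocate.

    chunks: list of (chunk_type, current_profile) tuples.
    args_by_type: {chunk_type: {"target": profile, "soft": bool}},
    a hypothetical stand-in for the per-type balance arguments.
    """
    selected = []
    for chunk_type, profile in chunks:
        args = args_by_type.get(chunk_type)
        if args is None:
            continue  # no convert requested for this chunk type
        if args["soft"] and profile == args["target"]:
            continue  # soft mode: chunk already has the target profile
        selected.append((chunk_type, profile))
    return selected
```

With soft mode on for data and off for meta, a data chunk that is already raid1 is skipped while a meta chunk that is already raid1 is still relocated, matching the "hard" versus selective behaviour in the message.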
e4d8ec0f6 Btrfs: implement online profile changing

Profile changing is done by launching a balance with
BTRFS_BALANCE_CONVERT bits set and target fields of respective
btrfs_balance_args structs initialized. Profile reducing code in this
case will pick restriper's target profile if it's available instead of
doing a blind reduce. If target profile is not yet available it goes
back to a plain reduce.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
ea67176ae Btrfs: virtual address space subset filter

Select chunks which have at least one byte located inside a given
[vstart, vend) virtual address space range.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
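
The selection rule in commit ea67176ae above is a half-open interval overlap test. A minimal sketch, assuming a chunk is described by its start offset and length (the function name and parameters are hypothetical):

```python
def chunk_in_vrange(chunk_start, chunk_length, vstart, vend):
    """True if the chunk has at least one byte inside [vstart, vend)."""
    chunk_end = chunk_start + chunk_length  # exclusive end of the chunk
    # Two half-open ranges overlap when each starts before the other ends.
    return chunk_start < vend and vstart < chunk_end
```

For a chunk covering [100, 200), the range [199, 300) matches because byte 199 is shared, while [200, 300) does not because both ends are exclusive.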
94e60d5a5 Btrfs: devid subset filter

Select chunks which have at least one byte of at least one stripe
located on a device with devid X in a given [pstart,pend) physical
address range.

This filter only works when devid filter is turned on.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:48 +0800
409d404b4 Btrfs: devid filter

Relocate chunks which have at least one stripe located on a device with
devid X.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800
5ce5b3c09 Btrfs: usage filter

Select chunks that are less than X percent full.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800
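
The usage filter in commit 5ce5b3c09 above compares a chunk's fill level with a percentage threshold. A sketch using integer arithmetic only; the function name and parameters are hypothetical, and how the kernel treats boundary values is not taken from the source:

```python
def chunk_below_usage(used_bytes, chunk_size, limit_percent):
    """True if the chunk is less than limit_percent percent full."""
    # Cross-multiply instead of dividing so no precision is lost.
    return used_bytes * 100 < limit_percent * chunk_size
```

A chunk holding 50 of 100 bytes passes a limit of 90, while one holding exactly 90 does not, since the message says "less than X percent".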
ed25e9b26 Btrfs: profiles filter

Select chunks based on a given profile mask.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800
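
The profiles filter in commit ed25e9b26 above amounts to a bitmask test. In the sketch below the bit assignments are hypothetical and chosen only for illustration; they are not the on-disk block group flag values.

```python
# Hypothetical profile bits, for illustration only.
PROFILE_RAID0 = 1 << 0
PROFILE_RAID1 = 1 << 1
PROFILE_DUP = 1 << 2
PROFILE_RAID10 = 1 << 3


def chunk_matches_profiles(chunk_profile_bits, profile_mask):
    """True if the chunk's profile is one of those selected by the mask."""
    return bool(chunk_profile_bits & profile_mask)
```

A mask of PROFILE_RAID0 | PROFILE_DUP selects RAID0 and DUP chunks in one pass and leaves RAID1 chunks alone.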
f43ffb60f Btrfs: add basic infrastructure for selective balancing

This allows having a separate set of filters for each chunk type
(data,meta,sys). The code however is generic and switch on chunk type
is only done once.

This commit also adds a type filter: it allows balancing, for example,
meta and system chunks w/o touching data ones.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2012-01-17 04:04:47 +0800