13 Dec, 2012

4 commits

  • Before this commit, btrfs_map_block() was called with REQ_WRITE
    in order to retrieve the list of mirrors for a disk block.
    This needs to be changed for the device replace procedure since
    it makes a difference whether you are asking for read mirrors
    or for locations to write to.
    GET_READ_MIRRORS is introduced as a new interface to call
    btrfs_map_block().
    In the current commit, the functionality is not yet changed,
    only the interface for GET_READ_MIRRORS is introduced and all
    the places that should use this new interface are adapted.

    The reason that REQ_WRITE cannot be abused anymore to retrieve
    a list of read mirrors is that during a running dev replace
    operation all write requests to the live filesystem are
    duplicated to also write to the target drive.
    Keep in mind that the target disk is only partially a valid
    copy of the source disk while the operation is ongoing. All
    writes go to the target disk, but not all reads would return
    valid data on the target disk. Therefore it is not possible
    anymore to abuse a REQ_WRITE interface to find valid mirrors
    for a REQ_READ.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
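
The difference between asking for read mirrors and asking for write locations can be illustrated with a toy model. This is only a sketch: `toy_map_block()`, `struct toy_mapping` and the `TOY_*` names are hypothetical and are not the interface of btrfs_map_block(). It assumes that a running replace simply adds the target device to every write mapping.

```c
#include <assert.h>

/* Hypothetical request kinds; only the names REQ_WRITE and
 * GET_READ_MIRRORS echo the commit message. */
enum toy_map_op { TOY_REQ_WRITE, TOY_GET_READ_MIRRORS };

#define TOY_MAX_STRIPES 4

struct toy_mapping {
    int num_stripes;
    int devid[TOY_MAX_STRIPES];
};

/*
 * Toy stand-in for a block mapping function. `mirrors` lists the devices
 * that hold valid copies; `replace_target` is the device being filled by
 * a running replace, or -1 if no replace is running.
 */
static void toy_map_block(enum toy_map_op op, const int *mirrors,
                          int num_mirrors, int replace_target,
                          struct toy_mapping *out)
{
    /* Leave room for one extra stripe (the replace target). */
    assert(num_mirrors >= 0 && num_mirrors < TOY_MAX_STRIPES);

    out->num_stripes = 0;
    for (int i = 0; i < num_mirrors; i++)
        out->devid[out->num_stripes++] = mirrors[i];

    /* Writes are duplicated to the target; read mirrors never include it. */
    if (op == TOY_REQ_WRITE && replace_target >= 0)
        out->devid[out->num_stripes++] = replace_target;
}
```

With two mirrors and a replace target, the write mapping in this model has three stripes while the read-mirror mapping has two, which is why a write mapping can no longer double as a list of readable copies.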
     
  • This commit contains all the essential changes to the core code
    of Btrfs for support of the device replace procedure.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • The device replace procedure makes use of the scrub code. The scrub
    code is the most efficient code to read the allocated data of a disk,
    i.e. it reads sequentially in order to avoid disk head movements, it
    skips unallocated blocks, it uses read ahead mechanisms, and it
    contains all the code to detect and repair defects.
    This commit extends the scrub code so that it can copy the data it
    reads to another disk.
    One goal is to perform as fast as possible. Therefore the write
    requests are collected until large bios are built, and the write
    process is decoupled from the read process, with flow control to
    limit the allocated memory.
    The best performance on spinning disks is reached when head
    movements are avoided as much as possible. Therefore a single
    worker is used to interface the read process with the write process.
    The regular scrub operation works as fast as before; it is not
    negatively influenced and is in fact more or less unchanged.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
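
The idea of collecting write requests into large bios can be sketched as follows. This is a minimal single-threaded model, not the scrub code: `struct toy_wr_ctx`, `toy_add_page()`, `toy_submit()` and the batch size of 32 pages are all assumptions made for illustration. The fixed-size buffer stands in for the memory limit; the flow control between reader and writer is not modelled.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_PAGES_PER_BIO 32   /* assumed batch size, not from the commit */

struct toy_wr_ctx {
    unsigned char buf[TOY_PAGES_PER_BIO * TOY_PAGE_SIZE];
    int pages_queued;          /* pages collected for the current batch */
    int bios_submitted;        /* number of large writes issued so far */
};

/* Stand-in for submitting one large write to the target disk. */
static void toy_submit(struct toy_wr_ctx *ctx)
{
    if (ctx->pages_queued == 0)
        return;                /* nothing collected, nothing to write */
    ctx->bios_submitted++;
    ctx->pages_queued = 0;
}

/* Queue one page read from the source; flush once the batch is full. */
static void toy_add_page(struct toy_wr_ctx *ctx, const unsigned char *page)
{
    assert(ctx->pages_queued < TOY_PAGES_PER_BIO);
    memcpy(ctx->buf + (size_t)ctx->pages_queued * TOY_PAGE_SIZE,
           page, TOY_PAGE_SIZE);
    if (++ctx->pages_queued == TOY_PAGES_PER_BIO)
        toy_submit(ctx);
}
```

A caller would invoke `toy_add_page()` for each page as it is read and call `toy_submit()` once at the end to flush a partially filled batch.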
     
  • This is required for the device replace procedure in a later step.
    Two calling functions also had to be changed to have the fs_info
    pointer: repair_io_failure() and scrub_setup_recheck_block().

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
     

03 Oct, 2012

1 commit


30 May, 2012

1 commit


19 Apr, 2012

2 commits

  • Normally when there are 2 copies of a block, we add both to the
    reada extent tree and prefetch only the one that is easier to reach.
    This way we can better utilize multiple devices.
    In case of DUP this makes no sense as both copies reside on the
    same device.

    Signed-off-by: Arne Jansen

    Arne Jansen
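
The DUP case can be pictured as counting only copies that live on distinct devices. The helper below is a hypothetical illustration, not the reada code, and `toy_copies_worth_tracking()` is an invented name.

```c
#include <assert.h>

/* Count copies that sit on a device not already seen earlier in the list. */
static int toy_copies_worth_tracking(const int *devid, int num_copies)
{
    int kept = 0;

    assert(num_copies >= 0);
    for (int i = 0; i < num_copies; i++) {
        int seen = 0;

        for (int j = 0; j < i; j++)
            if (devid[j] == devid[i])
                seen = 1;
        if (!seen)
            kept++;            /* a copy on a new device adds a choice */
    }
    return kept;
}
```

For two copies on devices 1 and 2 this returns 2, while for a DUP layout with both copies on device 1 it returns 1, so there is nothing to choose between.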
     
  • When inserting into the radix tree returns EEXIST, get the existing
    entry without giving up the spinlock in between.
    There was a race for both the zones trees and the extent tree.

    Signed-off-by: Arne Jansen

    Arne Jansen
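
The pattern described here, inserting or fetching the existing entry within one lock hold, can be sketched in user space. Everything below is an assumption for illustration: a small array replaces the radix tree, a C11 `atomic_flag` replaces the spinlock, and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_SLOTS 16

struct toy_tree {
    atomic_flag lock;          /* toy spinlock */
    void *slot[TOY_SLOTS];     /* stand-in for a radix tree */
};

static void toy_lock(struct toy_tree *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;                      /* spin until the flag was clear */
}

static void toy_unlock(struct toy_tree *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Caller must hold the lock. Returns 0, or -EEXIST if the slot is taken. */
static int toy_insert_locked(struct toy_tree *t, unsigned long index, void *item)
{
    if (t->slot[index])
        return -EEXIST;
    t->slot[index] = item;
    return 0;
}

/* Insert `item`, or return the entry already present, in one lock hold. */
static void *toy_insert_or_get(struct toy_tree *t, unsigned long index, void *item)
{
    void *found = item;

    assert(index < TOY_SLOTS && item != NULL);
    toy_lock(t);
    if (toy_insert_locked(t, index, item) == -EEXIST)
        found = t->slot[index];    /* looked up before the lock is dropped */
    toy_unlock(t);
    return found;
}
```

Because the lookup happens before `toy_unlock()`, no other thread can remove or replace the existing entry between the failed insert and the lookup.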
     

28 Mar, 2012

1 commit


03 Mar, 2012

1 commit

  • The reada code from scrub was casting down a u64 to
    an unsigned long so it could insert it into a radix tree.

    What it really wanted to do was cast down the result of a shift, instead
    of casting down the u64. The bug resulted in trying to insert our
    reada struct into the wrong place, which caused soft lockups and other
    problems.

    Signed-off-by: Chris Mason

    Chris Mason
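
The effect of casting before shifting can be reproduced in a few lines. This is a sketch under assumptions: `uint32_t` stands in for a 32-bit unsigned long, the shift of 12 is chosen for illustration, and the function names are hypothetical rather than taken from the reada code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumed shift, for illustration only */

/* uint32_t stands in for a 32-bit unsigned long. */
static uint32_t toy_index_buggy(uint64_t logical)
{
    /* The cast binds tighter than >>, so the upper 32 bits are lost first. */
    return (uint32_t)logical >> TOY_PAGE_SHIFT;
}

static uint32_t toy_index_fixed(uint64_t logical)
{
    /* Shift the full 64-bit value, then cast down the result. */
    return (uint32_t)(logical >> TOY_PAGE_SHIFT);
}
```

For a logical address of 0x100001000 (4 GiB plus 4 KiB), the first variant yields index 1 and the second yields 0x100001. Below 4 GiB the two agree, so the problem only appears for larger addresses.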
     

06 Nov, 2011

3 commits


02 Oct, 2011

1 commit

  • This is the implementation for the generic read ahead framework.

    To trigger a readahead, btrfs_reada_add must be called. It will start
    a read ahead for the given range [start, end) on tree root. The returned
    handle can either be used to wait on the readahead to finish
    (btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).

    The read ahead works as follows:
    On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
    reada_start_machine will then search for extents to prefetch and trigger
    some reads. When a read finishes for a node, all contained node/leaf
    pointers that lie in the given range will also be enqueued. The reads will
    be triggered in sequential order, thus giving a big win over a naive
    enumeration. It will also make use of multi-device layouts. Each disk
    will have its own read pointer and all disks will be utilized in parallel.
    Also, no two disks will read both sides of a mirror simultaneously, as this
    would waste seeking capacity. Instead both disks will read different parts
    of the filesystem.
    Any number of readaheads can be started in parallel. The read order will be
    determined globally, i.e. 2 parallel readaheads will normally finish faster
    than the 2 started one after another.

    Changes v2:
    - protect root->node by transaction instead of node_lock
    - fix missed branches:
    The readahead had an overly simple check to determine if a branch from
    a node should be checked or not. It now also records the upper bound
    of each node to see if the requested RA range lies within.
    - use KERN_CONT for debug output, to avoid line breaks
    - defer reada_start_machine to worker to avoid deadlock

    Changes v3:
    - protect root->node by rcu

    Changes v5:
    - changed EIO-semantics of reada_tree_block_flagged
    - remove spin_lock from reada_control and make elems an atomic_t
    - remove unused read_total from reada_control
    - kill reada_key_cmp, use btrfs_comp_cpu_keys instead
    - use kref-style release functions where possible
    - return struct reada_control * instead of void * from btrfs_reada_add

    Signed-off-by: Arne Jansen

    Arne Jansen
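
Why a globally determined read order helps can be shown with a toy head-travel model. This is an illustration only and not the reada implementation: it assumes a single disk, measures cost as the distance the head moves, and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for ascending uint64_t positions. */
static int toy_cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Total head travel when blocks are read in the given order, from 0. */
static uint64_t toy_travel(const uint64_t *pos, size_t n)
{
    uint64_t head = 0, total = 0;

    for (size_t i = 0; i < n; i++) {
        total += pos[i] > head ? pos[i] - head : head - pos[i];
        head = pos[i];
    }
    return total;
}

/* Order all pending blocks of all requests by disk position. */
static void toy_order_globally(uint64_t *pos, size_t n)
{
    qsort(pos, n, sizeof(*pos), toy_cmp_u64);
}
```

Take one request for blocks 10, 50, 90 and a second for 20, 60, 100. Serving them one after another costs 240 units of travel in this model, while a single globally sorted pass costs 100.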