31 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (39 commits)
    Btrfs: deal with errors from updating the tree log
    Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed
    Btrfs: make SNAP_DESTROY async
    Btrfs: add SNAP_CREATE_ASYNC ioctl
    Btrfs: add START_SYNC, WAIT_SYNC ioctls
    Btrfs: async transaction commit
    Btrfs: fix deadlock in btrfs_commit_transaction
    Btrfs: fix lockdep warning on clone ioctl
    Btrfs: fix clone ioctl where range is adjacent to extent
    Btrfs: fix delalloc checks in clone ioctl
    Btrfs: drop unused variable in block_alloc_rsv
    Btrfs: cleanup warnings from gcc 4.6 (nonbugs)
    Btrfs: Fix variables set but not read (bugs found by gcc 4.6)
    Btrfs: Use ERR_CAST helpers
    Btrfs: use memdup_user helpers
    Btrfs: fix raid code for removing missing drives
    Btrfs: Switch the extent buffer rbtree into a radix tree
    Btrfs: restructure try_release_extent_buffer()
    Btrfs: use the flusher threads for delalloc throttling
    Btrfs: tune the chunk allocation to 5% of the FS as metadata
    ...

    Fix up trivial conflicts in fs/btrfs/super.c and fs/fs-writeback.c, and
    remove use of INIT_RCU_HEAD in fs/btrfs/extent_io.c (that init macro was
    useless and removed in commit 5e8067adfdba: "rcu head remove init")

    Linus Torvalds
     

30 Oct, 2010

2 commits

These are all the cases where a variable is set but not read, which are
    not bugs as far as I can see, but simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
     
These are all the cases where a variable is set but not
    read, which are real bugs.

    - A couple of cases of incorrect error handling fixed.
    - One incorrect use of an allocation policy
    - Some other things

    Still needs more review.

    Found by gcc 4.6's new warnings.

    [akpm@linux-foundation.org: fix build. Might have been bitrot]
    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
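
    The two patches above are driven by gcc 4.6's new
    -Wunused-but-set-variable warning. As a minimal standalone illustration of
    the warning class (the function and variable names here are made up; this
    is not code from either patch):

    /* build with: gcc -c -Wunused-but-set-variable example.c */
    static int count_positive(const int *vals, int n)
    {
            int i;
            int last;          /* assigned every iteration, never read: gcc 4.6 warns */
            int positive = 0;

            for (i = 0; i < n; i++) {
                    last = vals[i];
                    if (vals[i] > 0)
                            positive++;
            }
            return positive;   /* dropping 'last' entirely is the "nonbug" kind of fix */
    }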
     

29 Oct, 2010

2 commits

This patch reduces the CPU time spent in the extent buffer search by using a
    radix tree instead of the rbtree, and the rcu lock instead of the spin
    lock.

    I did a quick test with the benchmark tool[1] and found that the patch improves
    the file creation/deletion performance problem that I reported[2].

    Before applying this patch:
    Create files:
    Total files: 50000
    Total time: 0.971531
    Average time: 0.000019
    Delete files:
    Total files: 50000
    Total time: 1.366761
    Average time: 0.000027

    After applying this patch:
    Create files:
    Total files: 50000
    Total time: 0.927455
    Average time: 0.000019
    Delete files:
    Total files: 50000
    Total time: 1.292280
    Average time: 0.000026

    [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
    [2] http://marc.info/?l=linux-btrfs&m=128212635122920&w=2

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
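
    As a rough sketch of the lookup pattern this commit moves to (kernel-style
    fragment, not the actual btrfs code; the 'buffer' field and the shift by
    PAGE_CACHE_SHIFT are assumptions for illustration):

    static struct extent_buffer *eb_lookup(struct extent_io_tree *tree, u64 start)
    {
            struct extent_buffer *eb;

            /* read side: RCU instead of an rbtree walk under a spinlock */
            rcu_read_lock();
            eb = radix_tree_lookup(&tree->buffer, start >> PAGE_CACHE_SHIFT);
            rcu_read_unlock();
            return eb;
    }

    Writers (insert/remove) would still serialize with a lock; only the hot
    lookup path becomes lock-free for readers.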
     
Restructure try_release_extent_buffer() and add a function to release the
    extent buffer. It will be used later.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     

26 May, 2010

1 commit

  • This changes O_DIRECT write code to mark extents as delalloc
    while it is processing them. Yan Zheng has reworked the
    enospc accounting based on tracking delalloc extents and
    this makes it much easier to track enospc in the O_DIRECT code.

There are a few special cases in the O_DIRECT code, though:
    it only sets the EXTENT_DELALLOC bits, instead of doing
    EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
    we don't want to mess with clearing the dirty and uptodate
    bits when things go wrong. This is important because there
    are no pages in the page cache, so any extent state structs
    that we put in the tree won't get freed by releasepage. We have
    to clear them ourselves as the DIO ends.

With this commit, we reserve space in btrfs_file_aio_write,
    and then as each btrfs_direct_IO call progresses it sets
    EXTENT_DELALLOC on the range.

    btrfs_get_blocks_direct is responsible for clearing the delalloc
    at the same time it drops the extent lock.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 May, 2010

3 commits

  • The async helper threads offload crc work onto all the
    CPUs, and make streaming writes much faster. This
    changes the O_DIRECT write code to use them. The only
    small complication was that we need to pass in the
    logical offset in the file for each bio, because we can't
    find it in the bio's pages.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • In order for AIO to work, we need to implement aio_write. This patch converts
    our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and
    nothing broke, and the AIO stuff magically started working. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Introduce metadata reservation context for delayed allocation
    and update various related functions.

    This patch also introduces EXTENT_FIRST_DELALLOC control bit for
    set/clear_extent_bit. It tells set/clear_bit_hook whether they
    are processing the first extent_state with EXTENT_DELALLOC bit
    set. This change is important if set/clear_extent_bit involves
multiple extent_states.

    Signed-off-by: Yan Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     

06 Apr, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: add check for changed leaves in setup_leaf_for_split
    Btrfs: create snapshot references in same commit as snapshot
    Btrfs: fix small race with delalloc flushing waitqueue's
    Btrfs: use add_to_page_cache_lru, use __page_cache_alloc
    Btrfs: fix chunk allocate size calculation
    Btrfs: kill max_extent mount option
    Btrfs: fail to mount if we have problems reading the block groups
    Btrfs: check btrfs_get_extent return for IS_ERR()
    Btrfs: handle kmalloc() failure in inode lookup ioctl
    Btrfs: dereferencing freed memory
    Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk()
    Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree()
    Btrfs: Remove unnecessary finish_wait() in wait_current_trans()
    Btrfs: add NULL check for do_walk_down()
    Btrfs: remove duplicate include in ioctl.c

    Fix trivial conflict in fs/btrfs/compression.c due to slab.h include
    cleanups.

    Linus Torvalds
     
  • Pagecache pages should be allocated with __page_cache_alloc, so they
    obey pagecache memory policies.

    add_to_page_cache_lru is exported, so it should be used. Benefits over
using a private pagevec: neater code, 128 bytes less stack usage, percpu
    lru ordering is preserved, and finally there is no need to flush the pagevec
    before returning, so batching may be shared with other LRU insertions.

Signed-off-by: Nick Piggin
    Signed-off-by: Chris Mason

    Nick Piggin
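
    A minimal sketch of the allocation pattern the commit switches to (generic
    pagecache API usage, not the exact btrfs hunk; the gfp flags are illustrative):

    struct page *page = __page_cache_alloc(GFP_NOFS);   /* obeys pagecache mempolicy */
    if (!page)
            return -ENOMEM;

    if (add_to_page_cache_lru(page, mapping, index, GFP_NOFS)) {
            /* already present at this index, or the radix slot allocation failed */
            page_cache_release(page);
            return -EEXIST;
    }
    /* page is now locked, in the page cache, and batched onto the per-cpu LRU pagevec */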
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following:

    * Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
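
    For an individual file the conversion amounts to spelling out the includes
    it actually relies on instead of inheriting them through percpu.h, for
    example:

    /* before: kmalloc()/kfree() compiled only because percpu.h dragged in slab.h */
    #include <linux/percpu.h>

    /* after: include what is used directly */
    #include <linux/percpu.h>
    #include <linux/gfp.h>     /* only if gfp flags/helpers are used on their own */
    #include <linux/slab.h>    /* for kmalloc(), kzalloc(), kfree(), ... */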
     

15 Mar, 2010

3 commits

  • This patch just goes through and fixes everybody that does

    lock_extent()
    blah
    unlock_extent()

    to use

    lock_extent_bits()
    blah
    unlock_extent_cached()

and pass around an extent_state so we only have to do the searches once per
    function. This gives me about a 3 MB/s boost on my random write test. I have
    not converted some things, like the relocation and ioctl's, since they aren't
    heavily used and the relocation stuff is in the middle of being re-written. I
    also changed the clear_extent_bit() to only unset the cached state if we are
    clearing EXTENT_LOCKED and related stuff, so we can do things like this

    lock_extent_bits()
    clear delalloc bits
    unlock_extent_cached()

    without losing our cached state. I tested this thoroughly and turned on
LEAK_DEBUG to make sure we weren't leaking extent states; everything worked out
    fine.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
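
    A sketch of the cached-state pattern described above; the argument lists are
    approximations of the btrfs helpers of this era, so treat the exact
    signatures as assumptions:

    struct extent_state *cached_state = NULL;

    lock_extent_bits(&BTRFS_I(inode)->io_tree, start, end,
                     0, &cached_state, GFP_NOFS);

    /* ... clear delalloc bits, set up ordered extents, etc ... */

    unlock_extent_cached(&BTRFS_I(inode)->io_tree, start, end,
                         &cached_state, GFP_NOFS);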
     
  • This patch makes us cache the extent state we find in find_delalloc_range since
    we'll have to lock the extent later on in the function. This will keep us from
re-searching for the range when we try to lock the extent.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
The endio is done in reverse order of the bio vectors.

    That means for a sequential read, the page submitted first will finish
    last in a bio. Considering we do a checksum (making the cache hot) for
    every page, this does introduce delay (and a chance to push out cache that
    will be used soon) for pages submitted at the beginning.

    I don't observe an obvious performance difference with the patch below in my
    simple test, but it seems more natural to finish reads in the order they are
    submitted.

    Signed-off-by: Shaohua Li
    Signed-off-by: Chris Mason

    Chris Mason
     

09 Mar, 2010

1 commit

Btrfs initializes rb trees in quite a number of places by setting rb_node =
    NULL; the problem with this is that 17d9ddc72fb8bba0d4f678 in the
    linux-next tree adds a new field to that struct which needs to be NULL for
    the new rbtree library code to work properly. This patch uses RB_ROOT as
    the initializer so all of the relevant fields will be NULL'd. Without the
    patch I get a panic.

    Signed-off-by: Eric Paris
    Acked-by: Venkatesh Pallipadi
    Signed-off-by: Chris Mason

    Eric Paris
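
    The change itself is mechanical; a sketch (the 'map' field name is just for
    illustration):

    #include <linux/rbtree.h>

    /* before: only rb_node is cleared, any extra fields in struct rb_root stay stale */
    tree->map.rb_node = NULL;

    /* after: RB_ROOT initializes the whole struct rb_root */
    tree->map = RB_ROOT;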
     

09 Oct, 2009

2 commits

  • This patch fixes an issue with the delalloc metadata space reservation
    code. The problem is we used to free the reservation as soon as we
    allocated the delalloc region. The problem with this is if we are not
    inserting an inline extent, we don't actually insert the extent item until
after the ordered extent is written out. This patch does 3 things:

    1) It moves the reservation clearing stuff into the ordered code, so when
    we remove the ordered extent we remove the reservation.
    2) It adds an EXTENT_DO_ACCOUNTING flag that gets passed when we clear
    delalloc bits in the cases where we want to clear the metadata reservation
    when we clear the delalloc extent, in the case that we do an inline extent
    or we invalidate the page.
    3) It adds another waitqueue to the space info so that when we start a fs
    wide delalloc flush, anybody else who also hits that area will simply wait
    for the flush to finish and then try to make their allocation.

    This has been tested thoroughly to make sure we did not regress on
    performance.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
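
    Point 3 is a standard waitqueue pattern; roughly, with invented names and
    the locking around the flag elided for brevity (this is not the actual
    btrfs code):

    /* in the per-space_info state */
    wait_queue_head_t flush_wait;
    int flushing;

    if (!space_info->flushing) {
            /* first caller performs the fs-wide delalloc flush */
            space_info->flushing = 1;
            flush_all_delalloc(root);               /* hypothetical helper */
            space_info->flushing = 0;
            wake_up_all(&space_info->flush_wait);
    } else {
            /* later callers in the same window just wait and then retry */
            wait_event(space_info->flush_wait, !space_info->flushing);
    }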
     
  • extent_clear_unlock_delalloc has a growing set of ugly parameters
    that is very difficult to read and maintain.

This switches to a flags field and well-named flag defines.

    Signed-off-by: Chris Mason

    Chris Mason
     

29 Sep, 2009

1 commit

  • At the start of a transaction we do a btrfs_reserve_metadata_space() and
    specify how many items we plan on modifying. Then once we've done our
    modifications and such, just call btrfs_unreserve_metadata_space() for
    the same number of items we reserved.

    For keeping track of metadata needed for data I've had to add an extent_io op
    for when we merge extents. This lets us track space properly when we are doing
    sequential writes, so we don't end up reserving way more metadata space than
    what we need.

    The only place where the metadata space accounting is not done is in the
    relocation code. This is because Yan is going to be reworking that code in the
near future, so running btrfs-vol -b could still possibly result in an
    ENOSPC-related panic. This patch also turns off the metadata_ratio stuff in order to
    allow users to more efficiently use their disk space.

    This patch makes it so we track how much metadata we need for an inode's
    delayed allocation extents by tracking how many extents are currently
    waiting for allocation. It introduces two new callbacks for the
extent_io tree: merge_extent_hook and split_extent_hook. These help
    us keep track of when we merge delalloc extents together and split them
    up. Reservations are handled before any actual dirtying occurs,
    and then we unreserve after we dirty.

    btrfs_unreserve_metadata_for_delalloc() will make the appropriate
    unreservations as needed based on the number of reservations we
    currently have and the number of extents we currently have. Doing the
    reservation outside of doing any of the actual dirty'ing lets us do
    things like filemap_flush() the inode to try and force delalloc to
    happen, or as a last resort actually start allocation on all delalloc
    inodes in the fs. This has survived dbench, fs_mark and an fsx torture
    test.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
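
    The usage pattern is symmetric reserve/unreserve calls around a
    transaction's modifications; a sketch (the exact arguments are assumptions,
    based only on the function names given above):

    /* plan on modifying, say, 3 tree items */
    ret = btrfs_reserve_metadata_space(root, 3);
    if (ret)
            return ret;

    trans = btrfs_start_transaction(root, 1);
    /* ... insert/update the items ... */
    btrfs_end_transaction(trans, root);

    /* give the reservation back for the same number of items */
    btrfs_unreserve_metadata_space(root, 3);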
     

24 Sep, 2009

3 commits

  • During releasepage, we try to drop any extent_state structs for the
byte offsets of the page we're releasing. But the code was incorrectly
    telling clear_extent_bit to delete the state struct unconditionally.

    Normally this would be fine because we have the page locked, but other
    parts of btrfs will lock down an entire extent, the most common place
    being IO completion.

    releasepage was deleting the extent state without first locking the extent,
    which may result in removing a state struct that another process had
    locked down. The fix here is to leave the NODATASUM and EXTENT_LOCKED
    bits alone in releasepage.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • If test_range_bit finds an extent that goes all the way to (u64)-1, it
can incorrectly wrap the u64 instead of treating it like the end of
    the address space.

    This just adds a check for the highest possible offset so we don't wrap.

    Signed-off-by: Chris Mason

    Chris Mason
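
    The wrap being guarded against is the classic "start = end + 1" overflow
    when the end is already the largest possible u64; a tiny sketch of the
    check inside the search loop:

    while (start <= search_end) {
            /* ... examine the extent_state covering 'start' ... */

            if (state->end == (u64)-1)      /* highest possible offset: stop here, */
                    break;                  /* otherwise end + 1 wraps around to 0 */
            start = state->end + 1;
    }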
     
  • Both set and clear_extent_bit allow passing a cached
    state struct to reduce rbtree search times. clear_extent_bit
    was improperly bypassing some of the checks around making sure
    the extent state fields were correct for a given operation.

    The fix used here (from Yan Zheng) is to use the hit_next
    goto target instead of jumping all the way down to start clearing
    bits without making sure the cached state was exactly correct
    for the operation we were doing.

    This also fixes up the setting of the start variable for both
    ops in the case where we find an overlapping extent that
    begins before the range we want to change. In both cases
    we were incorrectly going backwards from the original
    requested change.

    Signed-off-by: Chris Mason

    Chris Mason
     

19 Sep, 2009

1 commit

  • When btrfs fills a delayed allocation, it tries to increase
    the wbc nr_to_write to cover a big part of allocation. The
    theory is that we're doing contiguous IO and writing a few
    more blocks will save seeks overall at a very low cost.

    The problem is that extent_write_cache_pages could ignore
    the new higher nr_to_write if nr_to_write had already gone
    down to zero. We fix that by rechecking the nr_to_write
    for every page that is processed in the pagevec.

    This updates the math around bumping the nr_to_write value
    to make sure we don't leave a tiny amount of IO hanging
    around for the very end of a new extent.

    Signed-off-by: Chris Mason

    Chris Mason
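
    A rough sketch of the per-page recheck inside the pagevec loop (simplified
    write_cache_pages-style logic, not the exact btrfs code):

    for (i = 0; i < nr_pages; i++) {
            struct page *page = pvec.pages[i];

            ret = __extent_writepage(page, wbc, data);  /* may bump wbc->nr_to_write */

            /*
             * Recheck after every page: the writepage path may have raised
             * nr_to_write to cover a big delalloc extent, so only stop when
             * it is still <= 0 here, not because it hit 0 earlier.
             */
            if (ret || wbc->nr_to_write <= 0)
                    done = 1;
    }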
     

12 Sep, 2009

9 commits

  • Btrfs writes go through delalloc to the data=ordered code. This
    makes sure that all of the data is on disk before the metadata
    that references it. The tracking means that we have to make sure
    each page in an extent is fully written before we add that extent into
    the on-disk btree.

    This was done in the past by setting the EXTENT_ORDERED bit for the
    range of an extent when it was added to the data=ordered code, and then
    clearing the EXTENT_ORDERED bit in the extent state tree as each page
    finished IO.

    One of the reasons we had to do this was because sometimes pages are
    magically dirtied without page_mkwrite being called. The EXTENT_ORDERED
bit is checked at writepage time, and if it isn't there, our page became
    dirty without going through the proper path.

    These bit operations make for a number of rbtree searches for each page,
    and can cause considerable lock contention.

    This commit switches from the EXTENT_ORDERED bit to use PagePrivate2.
    As pages go into the ordered code, PagePrivate2 is set on each one.
    This is a cheap operation because we already have all the pages locked
    and ready to go.

    As IO finishes, the PagePrivate2 bit is cleared and the ordered
accounting is updated for each page.

    At writepage time, if the PagePrivate2 bit is missing, we go into the
    writepage fixup code to handle improperly dirtied pages.

    Signed-off-by: Chris Mason

    Chris Mason
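
    A minimal sketch of the page-flag protocol described above (the
    PagePrivate2 accessors are the real page-flag API; the fixup helper name is
    hypothetical):

    /* as locked pages enter the data=ordered code: */
    SetPagePrivate2(page);

    /* at writepage time: a dirty page without the flag took an improper path */
    if (!PagePrivate2(page))
            return writepage_fixup(page);           /* hypothetical name */

    /* as IO finishes for each page, alongside the ordered accounting update: */
    ClearPagePrivate2(page);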
     
  • This changes the btrfs code to find delalloc ranges in the extent state
    tree to use the new state caching code from set/test bit. It reduces
    one of the biggest causes of rbtree searches in the writeback path.

    test_range_bit is also modified to take the cached state as a starting
    point while searching.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • At writepage time, we have the page locked and we have the
    extent_map entry for this extent pinned in the extent_map tree.
    So, the page can't go away and its mapping can't change.

    There is no need for the extra extent_state lock bits during writepage.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Many of the btrfs extent state tree users follow the same pattern.
    They lock an extent range in the tree, do some operation and then
    unlock.

    This translates to at least 2 rbtree searches, and maybe more if they
    are doing operations on the extent state tree. A locked extent
    in the tree isn't going to be merged or changed, and so we can
    safely return the extent state structure as a cached handle.

    This changes set_extent_bit to give back a cached handle, and also
    changes both set_extent_bit and clear_extent_bit to use the cached
    handle if it is available.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Btrfs is currently mirroring some of the page state bits into
    its extent state tree. The goal behind this was to use it in supporting
    blocksizes other than the page size.

    But, we don't currently support that, and we're using quite a lot of CPU
    on the rb tree and its spin lock. This commit starts a series of
    cleanups to reduce the amount of work done in the extent state tree as
    part of each IO.

    This commit:

    * Adds the ability to lock an extent in the state tree and also set
    other bits. The idea is to do locking and delalloc in one call

    * Removes the EXTENT_WRITEBACK and EXTENT_DIRTY bits. Btrfs is using
    a combination of the page bits and the ordered write code for this
    instead.

    Signed-off-by: Chris Mason

    Chris Mason
     
As the extent state tree is manipulated, there are callbacks
    that are used to take extra actions when different state bits are set
    or cleared. One example of this is a counter for the total number
    of delayed allocation bytes in a single inode and in the whole FS.

    When new states are inserted, this callback is being done before we
    properly setup the new state. This hasn't caused problems before
because the lock bit was always done first, and the existing callbacks
    don't care about the lock bit.

    This patch makes sure the state is properly setup before using the
    callback, which is important for later optimizations that do more work
    without using the lock bit.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • There are two main users of the extent_map tree. The
    first is regular file inodes, where it is evenly spread
    between readers and writers.

    The second is the chunk allocation tree, which maps blocks from
logical addresses to physical ones, and it is 99.99% reads.

    The mapping tree is a point of lock contention during heavy IO
    workloads, so this commit switches things to a rw lock.

    Signed-off-by: Chris Mason

    Chris Mason
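
    Since the chunk mapping tree is almost entirely reads, a reader/writer lock
    lets lookups run in parallel; a sketch of the pattern (the 'lock' field
    name is illustrative):

    /* the 99.99% case: lookups only take the read side */
    read_lock(&em_tree->lock);
    em = lookup_extent_mapping(em_tree, start, len);
    read_unlock(&em_tree->lock);

    /* insertions and removals still need exclusive access */
    write_lock(&em_tree->lock);
    ret = add_extent_mapping(em_tree, em);
    write_unlock(&em_tree->lock);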
     
  • When btrfs fills a large delayed allocation extent, it is a good idea
    to try and convince the write_cache_pages caller to go ahead and
    write a good chunk of that extent. The extra IO is basically free
    because we know it is contiguous.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The Btrfs set_extent_bit call currently searches the rbtree
    every time it needs to find more extent_state objects to fill
    the requested operation.

    This adds a simple test with rb_next to see if the next object
    in the tree was adjacent to the one we just found. If so,
    we skip the search and just use the next object.

    Signed-off-by: Chris Mason

    Chris Mason
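
    A sketch of the adjacency shortcut (extent_state field names as used
    elsewhere in this log; details approximate):

    /*
     * After filling 'state', peek at the next rbtree node before searching
     * from the root again; if it starts exactly where this one ends, use it.
     */
    struct rb_node *next = rb_next(&state->rb_node);

    if (next) {
            struct extent_state *next_state =
                    rb_entry(next, struct extent_state, rb_node);

            if (next_state->start == state->end + 1) {
                    state = next_state;
                    goto process_state;             /* skip the full tree search */
            }
    }
    /* otherwise fall back to searching from the root */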
     

21 Apr, 2009

3 commits

  • The extent_io writepage call updates the writepage index in the inode
    as it makes progress. But, it was doing the update after unlocking the page,
    which isn't legal because page->mapping can't be trusted once the page
    is unlocked.

This led to an oops, especially common with compression turned on. The
    fix here is to update the writeback index before unlocking the page.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
    higher priority. But, the checksumming helper threads prevent it
    from being fully effective.

    There are two problems. First, a big queue of pending checksumming
    will delay the synchronous IO behind other lower priority writes. Second,
    the checksumming uses an ordered async work queue. The ordering makes sure
    that IOs are sent to the block layer in the same order they are sent
    to the checksumming threads. Usually this gives us less seeky IO.

    But, when we start mixing IO priorities, the lower priority IO can delay
    the higher priority IO.

    This patch solves both problems by adding a high priority list to the async
    helper threads, and a new btrfs_set_work_high_prio(), which is used
to put a new async work item onto the higher priority list.

    The ordering is still done on high priority IO, but all of the high
    priority bios are ordered separately from the low priority bios. This
    ordering is purely an IO optimization, it is not involved in data
    or metadata integrity.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
    writes we plan on waiting on in the near future. This patch
    mirrors recent changes in other filesystems and the generic code to
    use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
    other latency critical writes.

    Btrfs uses async worker threads for checksumming before the write is done,
    and then again to actually submit the bios. The bio submission code just
    runs a per-device list of bios that need to be sent down the pipe.

    This list is split into low priority and high priority lists so the
    WRITE_SYNC IO happens first.

    Signed-off-by: Chris Mason

    Chris Mason