16 Jan, 2018

1 commit

  • To provide data consistency with PPL for disks that have write-back
    cache enabled, all data has to be flushed to the disks before the next
    PPL entry is written. The disks to be flushed are marked in a bitmap,
    which is modified under a mutex and only read after the PPL io unit has
    been submitted. A minimal sketch of this scheme follows this entry.

    A limit of 64 disks per array has been introduced to keep the data
    structures and implementation simple. RAID5 arrays with that many disks
    are unlikely because of the high risk of multiple disk failures, so the
    restriction should not be a limitation in real life.

    With write-back cache disabled, the next PPL entry is submitted when the
    data write for the current one completes. A data flush defers the next
    log submission, so trigger it when no stripes are found for handling.

    As PPL ensures all data is flushed to disk by request completion, simply
    acknowledge flush requests when PPL is enabled.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Shaohua Li

    Tomasz Majchrzak
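
    The following is a minimal userspace-style sketch of the flush-tracking
    idea described above, not the actual raid5-ppl.c code; the names
    (ppl_flush_state, flush_disk_cache and the field names) are illustrative
    assumptions.

        #include <pthread.h>
        #include <stdint.h>

        #define PPL_MAX_DISKS 64   /* the 64-disk limit from the commit above */

        struct ppl_flush_state {
            pthread_mutex_t lock;      /* protects disks_to_flush */
            uint64_t disks_to_flush;   /* bit N set: disk N has a dirty write-back cache */
        };

        /* hypothetical helper that issues a cache-flush command to one member disk */
        void flush_disk_cache(int disk_nr);

        /* Called for every member disk touched by a data write. */
        void ppl_mark_disk(struct ppl_flush_state *s, int disk_nr)
        {
            pthread_mutex_lock(&s->lock);
            s->disks_to_flush |= UINT64_C(1) << disk_nr;
            pthread_mutex_unlock(&s->lock);
        }

        /* Called after the PPL io unit has been submitted: flush every marked
         * disk so its data is durable before the next PPL entry is written.
         * Per the description above, the bitmap is only read at this point,
         * when no writer is modifying it for this io unit. */
        void ppl_flush_marked(struct ppl_flush_state *s)
        {
            uint64_t bits = s->disks_to_flush;
            s->disks_to_flush = 0;

            for (int d = 0; d < PPL_MAX_DISKS; d++)
                if (bits & (UINT64_C(1) << d))
                    flush_disk_cache(d);
        }

    With at most 64 member disks the whole bitmap fits in a single 64-bit
    word, which is one way the limit keeps the data structures simple.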
     

04 May, 2017

1 commit

  • Pull MD updates from Shaohua Li:

    - Add the Partial Parity Log (PPL) feature found in Intel IMSM raid
    arrays, by Artur Paszkiewicz. This feature is another way to close the
    RAID5 write hole. The Linux implementation is also available for a
    normal RAID5 array if a specific superblock bit is set.

    - A number of md-cluster fixes and enabling of md-cluster array resize,
    from Guoqing Jiang

    - A bunch of patches from Ming Lei and Neil Brown to rewrite the MD bio
    handling code. MD no longer directly accesses the bio bvec or
    bi_phys_segments, and it uses the modern bio API for bio splitting.

    - Improve the RAID5 IO pattern to improve performance for hard-disk
    based RAID5/6, from me.

    - Several patches from Song Liu to speed up raid5-cache recovery and to
    allow the raid5-cache feature to be disabled at runtime.

    - Fix a performance regression in raid1 resync from Xiao Ni.

    - Other cleanup and fixes from various people.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (84 commits)
    md/raid10: skip spare disk as 'first' disk
    md/raid1: Use a new variable to count flighting sync requests
    md: clear WantReplacement once disk is removed
    md/raid1/10: remove unused queue
    md: handle read-only member devices better.
    md/raid10: wait up frozen array in handle_write_completed
    uapi: fix linux/raid/md_p.h userspace compilation error
    md-cluster: Fix a memleak in an error handling path
    md: support disabling of create-on-open semantics.
    md: allow creation of mdNNN arrays via md_mod/parameters/new_array
    raid5-ppl: use a single mempool for ppl_io_unit and header_page
    md/raid0: fix up bio splitting.
    md/linear: improve bio splitting.
    md/raid5: make chunk_aligned_read() split bios more cleanly.
    md/raid10: simplify handle_read_error()
    md/raid10: simplify the splitting of requests.
    md/raid1: factor out flush_bio_list()
    md/raid1: simplify handle_read_error().
    Revert "block: introduce bio_copy_data_partial"
    md/raid1: simplify alloc_behind_master_bio()
    ...

    Linus Torvalds
     

17 Mar, 2017

2 commits

  • Implement the calculation of partial parity for a stripe and the PPL
    write logging functionality. A description of PPL is added to the
    documentation; more details can be found in the comments in raid5-ppl.c.

    Attach a page for holding the partial parity data to stripe_head.
    Allocate it only if mddev has the MD_HAS_PPL flag set.

    Partial parity is the xor of the not-modified data chunks of a stripe
    and is calculated as follows (see the sketch after this entry):

    - reconstruct-write case:
    xor the data from all not-updated disks in the stripe

    - read-modify-write case:
    xor the old data and parity from all updated disks in the stripe

    Implement it using the async_tx API and integrate it into
    raid_run_ops(). It must be called while we still have access to the old
    data, so do it when STRIPE_OP_BIODRAIN is set, but before
    ops_run_prexor5(). The result is stored in sh->ppl_page.

    Partial parity is not meaningful for a full stripe write and is not
    stored in the log or used for recovery, so don't attempt to calculate it
    when the stripe has STRIPE_FULL_WRITE set.

    Put the PPL metadata structures in md_p.h because userspace tools
    (mdadm) will also need to read/write the PPL.

    For now, warn about using PPL with the disk's volatile write-back cache
    enabled. The warning can be removed once disk cache flushing before
    writing the PPL is implemented.

    Signed-off-by: Artur Paszkiewicz
    Signed-off-by: Shaohua Li

    Artur Paszkiewicz
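
    Below is a minimal plain-C sketch of the partial parity calculation
    described above, not the async_tx-based kernel implementation; the
    function names and the in-memory chunk arrays are illustrative
    assumptions.

        #include <stddef.h>
        #include <stdint.h>

        /* xor 'src' into 'dst': the basic building block of parity math */
        static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                dst[i] ^= src[i];
        }

        /* Reconstruct-write: xor the data chunks of all data disks that are
         * NOT being updated by this stripe write. chunks[d] holds the data
         * chunk of disk d; updated[d] is non-zero if disk d is being written. */
        void partial_parity_rcw(uint8_t *pp, uint8_t *const *chunks,
                                const int *updated, int ndisks, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                pp[i] = 0;
            for (int d = 0; d < ndisks; d++)
                if (!updated[d])
                    xor_into(pp, chunks[d], len);
        }

        /* Read-modify-write: xor the old parity chunk with the old data
         * chunks of all disks that ARE being updated. */
        void partial_parity_rmw(uint8_t *pp, const uint8_t *old_parity,
                                uint8_t *const *old_chunks,
                                const int *updated, int ndisks, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                pp[i] = old_parity[i];
            for (int d = 0; d < ndisks; d++)
                if (updated[d])
                    xor_into(pp, old_chunks[d], len);
        }

    Both cases compute the same value, because the old parity is itself the
    xor of all old data chunks, so xoring the updated chunks back out leaves
    exactly the xor of the not-updated chunks.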
     
  • To update the size of a clustered raid array, we need to make sure all
    nodes can perform the change successfully. However, it is possible that
    some of them cannot do it because of a failure (bitmap_resize could
    fail), so we need to consider this before setting the capacity
    unconditionally, and we use the steps below to perform a sanity check
    (restated in the sketch after this entry).

    1. A changes the size, then broadcasts a METADATA_UPDATED msg.
    2. B and C receive METADATA_UPDATED and change the size, except for
    calling set_capacity; sync_size is not updated if the change failed.
    They also call bitmap_update_sb to sync the superblock to disk.
    3. A checks the other nodes' sync_size. If sync_size has been updated
    on all nodes, it sends a CHANGE_CAPACITY msg, otherwise it sends a msg
    to revert the previous change.
    4. B and C call set_capacity if they receive the CHANGE_CAPACITY msg,
    otherwise pers->resize is called to restore the old value.

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
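
    The following outline restates the four steps above as code. It is not
    the md-cluster implementation: only the METADATA_UPDATED and
    CHANGE_CAPACITY message names come from the description above, and every
    helper (broadcast, resize_components_except_capacity, and so on) as well
    as the REVERT_RESIZE name is an illustrative assumption.

        #include <stdbool.h>
        #include <stdint.h>

        typedef uint64_t sector_t;

        enum resize_msg { METADATA_UPDATED, CHANGE_CAPACITY, REVERT_RESIZE };

        struct node {
            sector_t sync_size;   /* last size this node successfully resized to */
            sector_t capacity;    /* size exposed to users via set_capacity */
        };

        /* hypothetical cluster plumbing, assumed to exist elsewhere */
        void broadcast(enum resize_msg msg, sector_t new_size);
        bool all_peers_report_sync_size(sector_t new_size);
        int  resize_components_except_capacity(struct node *n, sector_t new_size);
        void write_bitmap_superblock(struct node *n);
        void restore_old_size(struct node *n);

        /* Node A (initiator): steps 1 and 3 */
        void initiator_resize(struct node *a, sector_t new_size)
        {
            if (resize_components_except_capacity(a, new_size) == 0)
                a->sync_size = new_size;
            broadcast(METADATA_UPDATED, new_size);        /* step 1 */

            if (all_peers_report_sync_size(new_size)) {   /* step 3 */
                a->capacity = new_size;                   /* commit locally (assumed) */
                broadcast(CHANGE_CAPACITY, new_size);     /* commit everywhere */
            } else {
                restore_old_size(a);                      /* revert locally (assumed) */
                broadcast(REVERT_RESIZE, new_size);       /* roll back everywhere */
            }
        }

        /* Nodes B and C (receivers): steps 2 and 4 */
        void receiver_handle(struct node *n, enum resize_msg msg, sector_t new_size)
        {
            switch (msg) {
            case METADATA_UPDATED:                        /* step 2 */
                if (resize_components_except_capacity(n, new_size) == 0)
                    n->sync_size = new_size;              /* left unchanged on failure */
                write_bitmap_superblock(n);
                break;
            case CHANGE_CAPACITY:                         /* step 4: commit */
                n->capacity = new_size;
                break;
            case REVERT_RESIZE:                           /* step 4: roll back */
                restore_old_size(n);
                break;
            }
        }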
     

14 Feb, 2017

2 commits