19 Oct, 2018

2 commits

  • Since the resync region from suspend_info means one node
    is reshaping this area, the position of reshape_progress
    should be included in the area.
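
    A minimal sketch of such a check, assuming a hypothetical
    struct resync_range that carries the lo/hi sectors of a suspended
    region. The type and function names are illustrative and not taken
    from the kernel source, and treating both bounds as inclusive is an
    assumption.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the range carried in suspend_info. */
    struct resync_range {
        uint64_t lo; /* first sector of the region */
        uint64_t hi; /* last sector of the region */
    };

    /* Returns true when the reshape position lies inside the region,
     * with both boundaries counted as part of it (an assumption). */
    static bool reshape_pos_in_range(const struct resync_range *r, uint64_t pos)
    {
        assert(r->lo <= r->hi);
        return pos >= r->lo && pos <= r->hi;
    }
    ```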

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     
  • To support adding a disk in grow mode, we need to resize
    the bitmaps of all nodes before the reshape, so that
    all nodes have the same view of the bitmap of
    the clustered raid.

    So after the master node has resized the bitmap, it broadcasts
    a message to the other slave nodes and checks whether the sizes
    of all bitmaps are the same by comparing pages. We can only
    continue the reshape after all nodes have updated the bitmap
    to the same size (by checking the pages); otherwise the bitmap
    size is reverted to its previous value.

    The resize_bitmaps interface and BITMAP_RESIZE message are
    introduced in md-cluster.c for this purpose.
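
    The compare-then-revert idea can be sketched in user space as
    follows. This is not the md-cluster.c code: struct node_bitmap and
    resize_all_bitmaps() are hypothetical names, and each node is
    reduced to the page count it reports after the resize.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical per-node view: bitmap page count after the resize. */
    struct node_bitmap {
        unsigned long pages;
    };

    /* Index 0 is the master. Returns true if every node reports the same
     * page count as the master. On a mismatch, every node is set back to
     * old_pages and false is returned. */
    static bool resize_all_bitmaps(struct node_bitmap *nodes, size_t n,
                                   unsigned long old_pages)
    {
        assert(n > 0);
        for (size_t i = 1; i < n; i++) {
            if (nodes[i].pages != nodes[0].pages) {
                for (size_t j = 0; j < n; j++)
                    nodes[j].pages = old_pages; /* revert everywhere */
                return false;
            }
        }
        return true;
    }
    ```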

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Mar, 2017

1 commit

  • To update the size of a clustered raid, we need to make
    sure all nodes can perform the change successfully.
    However, some of them may be unable to do so because
    of a failure (bitmap_resize could fail). So we need
    to handle this case before setting the capacity
    unconditionally, and we use the steps below to
    perform a sanity check.

    1. A changes the size, then broadcasts a METADATA_UPDATED
    msg.
    2. B and C receive METADATA_UPDATED and change the size,
    except that they do not call set_capacity; sync_size is
    not updated if the change failed. They also call
    bitmap_update_sb to sync the sb to disk.
    3. A checks the other nodes' sync_size. If sync_size has
    been updated on all nodes, it sends a CHANGE_CAPACITY
    msg; otherwise it sends a msg to revert the previous change.
    4. B and C call set_capacity if they receive the
    CHANGE_CAPACITY msg; otherwise pers->resize is called to
    restore the old value.
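
    Step 3 can be sketched as a small decision function. The enum and
    function names below are hypothetical (only CHANGE_CAPACITY is a
    message named above), and the peers' sync_size values are passed in
    as a plain array rather than read from their bitmap superblocks.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Follow-up message chosen by node A; MSG_REVERT is a made-up name
     * for the "revert previous change" message. */
    enum follow_up_msg { MSG_CHANGE_CAPACITY, MSG_REVERT };

    /* Compare every peer's sync_size with the requested size. A single
     * stale value means that peer failed to resize, so A must revert. */
    static enum follow_up_msg check_sync_size(const uint64_t *peer_sync_size,
                                              size_t peers, uint64_t new_size)
    {
        assert(peers == 0 || peer_sync_size != NULL);
        for (size_t i = 0; i < peers; i++) {
            if (peer_sync_size[i] != new_size)
                return MSG_REVERT;
        }
        return MSG_CHANGE_CAPACITY;
    }
    ```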

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

10 May, 2016

1 commit

  • The in-memory bitmap is not ready when a node joins the cluster,
    so it doesn't make sense to call gather_all_resync_info()
    that early; we need to call it after the node's
    bitmap is set up. Also, recv_thread could be woken up after
    the node joins the cluster, but that could cause problems if the
    node receives a RESYNCING message without a personality, since
    mddev->pers->quiesce is called in process_suspend_info.

    This commit introduces a new cluster interface, load_bitmaps,
    to fix the above problems. load_bitmaps is called in bitmap_load,
    where the bitmap and personality are ready, and it
    does the following tasks:

    1. call gather_all_resync_info to load all the nodes'
    bitmap info.
    2. set the MD_CLUSTER_ALREADY_IN_CLUSTER bit so that recv_thread
    can be woken up, and wake up recv_thread if there is
    a pending recv event.

    Then ack_bast only wakes up recv_thread after the IN_CLUSTER
    bit is set; otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
    set.
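
    The gating between ack_bast and load_bitmaps can be modelled with
    two flag bits. This is a single-threaded sketch: the bit names
    follow the text above, but their values, struct cluster_state, the
    *_sketch functions and the wake-up counter are assumptions, and the
    atomic bit operations a kernel implementation would need are left
    out.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Bit names as written above; the numeric values are made up. */
    #define MD_CLUSTER_ALREADY_IN_CLUSTER   (1u << 0)
    #define MD_CLUSTER_PENDING_RESYNC_EVENT (1u << 1)

    struct cluster_state {
        unsigned int flags;
        int wakeups; /* counts simulated wake-ups of recv_thread */
    };

    /* Called when a message arrives: wake the receiver only once the
     * node is fully in the cluster, otherwise remember the event. */
    static void ack_bast_sketch(struct cluster_state *c)
    {
        assert(c != NULL);
        if (c->flags & MD_CLUSTER_ALREADY_IN_CLUSTER)
            c->wakeups++;
        else
            c->flags |= MD_CLUSTER_PENDING_RESYNC_EVENT;
    }

    /* Called once bitmap and personality are ready. */
    static void load_bitmaps_sketch(struct cluster_state *c)
    {
        assert(c != NULL);
        /* gather_all_resync_info() would run here. */
        c->flags |= MD_CLUSTER_ALREADY_IN_CLUSTER;
        if (c->flags & MD_CLUSTER_PENDING_RESYNC_EVENT) {
            c->flags &= ~MD_CLUSTER_PENDING_RESYNC_EVENT;
            c->wakeups++; /* deliver the deferred event */
        }
    }
    ```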

    Reviewed-by: NeilBrown
    Signed-off-by: Guoqing Jiang
    Signed-off-by: Shaohua Li

    Guoqing Jiang
     

06 Jan, 2016

1 commit

  • For clustered raid, we need to take extra actions when changing
    the bitmap to none.

    1. check whether all the bitmap locks can be taken. If yes,
    we can continue the change, since the cluster raid is only active
    on the current node. Otherwise return failure and unlock the related
    bitmap locks.
    2. set nodes to 0 and then leave the cluster environment.
    3. release the other nodes' bitmap locks.
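
    Step 1 is an all-or-nothing lock acquisition. A minimal sketch,
    assuming a hypothetical model in which held_by_other[i] says whether
    another node still holds the bitmap lock of slot i (no DLM calls are
    made here):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Try to take every bitmap lock. On the first lock that is busy,
     * release the locks taken so far and report failure. */
    static bool lock_all_bitmaps(const bool *held_by_other, bool *locked,
                                 size_t n)
    {
        assert(held_by_other != NULL && locked != NULL);
        for (size_t i = 0; i < n; i++) {
            if (held_by_other[i]) {
                while (i > 0)
                    locked[--i] = false; /* unlock what we already took */
                return false;
            }
            locked[i] = true;
        }
        return true;
    }
    ```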

    Signed-off-by: Guoqing Jiang
    Signed-off-by: NeilBrown

    Guoqing Jiang
     

12 Oct, 2015

3 commits

  • Adding a disk worked incorrectly with the new reload code. Fix it:

    - No operation should be performed on an rdev marked as Candidate.
    - After a metadata update operation, kick the disk if its role is 0xfffe;
    otherwise clear the Candidate bit and continue with the regular change check.
    - Save the mode of the lock resource to check whether the token lock is
    already locked, because it can be called twice while adding a disk. However,
    unlock_comm() must be called only once.
    - add_new_disk() is called by the node initiating the --add operation.
    If it needs to be canceled, call add_new_disk_cancel(). The operation
    is completed by md_update_sb(), which will write and unlock the
    communication.

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • Resync or recovery must be performed by only one node at a time.
    A DLM lock resource, resync_lockres, provides the mutual exclusion
    so that only one node performs the recovery/resync at a time.

    If a node is unable to get the resync_lockres because recovery is
    being performed by another node, it sets MD_RECOVER_NEEDED so as
    to schedule recovery in the future.
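
    A user-space sketch of that behaviour, with the DLM lock reduced to
    an owner field. struct resync_lockres_sketch, NO_OWNER and
    try_start_resync() are hypothetical; the flag name is used as it is
    written above and its value is made up.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MD_RECOVER_NEEDED (1u << 0) /* name from the text; value made up */
    #define NO_OWNER (-1)

    /* Stand-in for the DLM lock resource: records which node holds it. */
    struct resync_lockres_sketch {
        int owner;
    };

    /* Non-blocking attempt to become the resyncing node. On failure the
     * caller's recovery flags are marked so recovery is retried later. */
    static bool try_start_resync(struct resync_lockres_sketch *res, int node,
                                 unsigned int *recovery_flags)
    {
        assert(res != NULL && recovery_flags != NULL);
        if (res->owner == NO_OWNER || res->owner == node) {
            res->owner = node;
            return true;
        }
        *recovery_flags |= MD_RECOVER_NEEDED;
        return false;
    }
    ```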

    Remove the debug message in resync_info_update()
    used during development.

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • Suspending the entire device for resync could take too long. Resync
    in small chunks.

    The cluster's resync window (32M) is maintained in r1conf as
    cluster_sync_low and cluster_sync_high and processed in
    raid1's sync_request(). If the current resync is outside the cluster
    resync window:

    1. Set cluster_sync_low to curr_resync_completed.
    2. Check whether the sync will fit in the new window; if not, issue a
    wait_barrier() and set cluster_sync_low to sector_nr.
    3. Set cluster_sync_high to cluster_sync_low + resync_window.
    4. Send a message to all nodes so they may add it to their suspension
    list.
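
    The window arithmetic in steps 1 to 3 can be sketched as below. This
    is not raid1's sync_request(): struct sync_window and
    advance_window() are hypothetical, wait_barrier() and the message of
    step 4 are only indicated by comments, and the window size assumes
    that 32M means 32 MiB counted in 512-byte sectors.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumption: 32 MiB expressed in 512-byte sectors. */
    #define RESYNC_WINDOW_SECTORS (32u * 1024 * 1024 / 512)

    struct sync_window {
        uint64_t low;  /* models cluster_sync_low */
        uint64_t high; /* models cluster_sync_high */
    };

    /* Returns true when the window was moved, i.e. when a message would
     * have to be sent to the other nodes (step 4). */
    static bool advance_window(struct sync_window *w,
                               uint64_t curr_resync_completed,
                               uint64_t sector_nr, uint64_t nr_sectors)
    {
        assert(w->low <= w->high);
        if (sector_nr >= w->low && sector_nr + nr_sectors <= w->high)
            return false; /* request is still inside the window */

        w->low = curr_resync_completed;                       /* step 1 */
        if (sector_nr + nr_sectors > w->low + RESYNC_WINDOW_SECTORS)
            w->low = sector_nr; /* step 2; wait_barrier() would go first */
        w->high = w->low + RESYNC_WINDOW_SECTORS;             /* step 3 */
        return true;
    }
    ```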

    bitmap_cond_end_sync is modified to allow forcing a sync in order
    to bring curr_resync_completed up to date with the sector passed.

    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: NeilBrown

    Goldwyn Rodrigues
     

24 Jul, 2015

1 commit

  • During a node failure, we need to suspend read balancing so that
    reads are directed to the first device and stale data is not read.
    Suspending writes is not required because these would be recorded and
    synced eventually.

    A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep().
    area_resyncing() will respond true for the entire device if this
    flag is set and the request type is READ. The flag is cleared
    in recover_done().
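
    A minimal model of that flag's effect, assuming a hypothetical
    struct cluster_flags with one boolean standing in for
    MD_CLUSTER_SUSPEND_READ_BALANCING. The ordinary per-range lookup is
    passed in as range_is_resyncing instead of being computed.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    enum io_dir { IO_READ, IO_WRITE };

    struct cluster_flags {
        bool suspend_read_balancing; /* models the new flag */
    };

    static void recover_prep_sketch(struct cluster_flags *c)
    {
        c->suspend_read_balancing = true;
    }

    static void recover_done_sketch(struct cluster_flags *c)
    {
        c->suspend_read_balancing = false;
    }

    /* While the flag is set, every READ is reported as "resyncing" so
     * the caller stops balancing reads. Writes are unaffected. */
    static bool area_resyncing_sketch(const struct cluster_flags *c,
                                      enum io_dir dir, bool range_is_resyncing)
    {
        assert(c != NULL);
        if (dir == IO_READ && c->suspend_read_balancing)
            return true;
        return range_is_resyncing;
    }
    ```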

    Signed-off-by: Goldwyn Rodrigues
    Reported-By: David Teigland
    Signed-off-by: NeilBrown

    Goldwyn Rodrigues
     

22 Apr, 2015

2 commits

  • When "re-add" is written to /sys/block/mdXX/md/dev-YYY/state,
    the clustered md:

    1. Sends a RE_ADD message with the desc_nr. Nodes receiving the message
    clear the Faulty bit in their respective rdev->flags.
    2. The node initiating the re-add gathers the bitmaps of all nodes
    and copies them into the local bitmap. It does not clear the bitmap
    from which it is copying.
    3. The initiating node schedules an md recovery to sync the devices.
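
    Step 2 can be illustrated with bitmaps modelled as plain byte
    arrays. Combining them with a bitwise OR is an assumption about what
    "copies them into the local bitmap" amounts to, and
    merge_remote_bitmaps() is a hypothetical name. The remote bitmaps
    are const, which mirrors the fact that they are not cleared.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Fold every remote bitmap into the local one. A bit set on any
     * node ends up set locally; the sources are left untouched. */
    static void merge_remote_bitmaps(uint8_t *local, const uint8_t **remotes,
                                     size_t n_remotes, size_t len)
    {
        assert(local != NULL);
        for (size_t i = 0; i < n_remotes; i++) {
            for (size_t b = 0; b < len; b++)
                local[b] |= remotes[i][b];
        }
    }
    ```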

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: NeilBrown

    Goldwyn Rodrigues
     
  • This adds "remove" capabilities for the clustered environment.
    When a user initiates removal of a device from the array, a
    REMOVE message with disk number in the array is sent to all
    the nodes which kick the respective device in their own array.

    This facilitates the removal of failed devices.

    Signed-off-by: Goldwyn Rodrigues
    Signed-off-by: NeilBrown

    Goldwyn Rodrigues
     

21 Mar, 2015

1 commit


23 Feb, 2015

7 commits

  • Algorithm:
    1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY, which issues
    ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
    2. Node 1 sends NEWDISK with uuid and slot number
    3. Other nodes issue kobject_uevent_env with uuid and slot number
    (Steps 4,5 could be a udev rule)
    4. In userspace, the node searches for the disk, perhaps
    using blkid -t SUB_UUID=""
    5. Other nodes issue one of the following, depending on whether the disk
    was found:
    ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
    disc.number set to slot number)
    ioctl(CLUSTERED_DISK_NACK)
    6. Other nodes drop the lock on no-new-devs (CR) if the device is found
    7. Node 1 attempts an EX lock on no-new-devs
    8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
    as SpareLocal
    9. If not (it does not get the no-new-devs lock), it fails the operation and
    sends METADATA_UPDATED
    10. Other nodes learn whether the device was added by reading the
    superblock again after receiving the METADATA_UPDATED message.
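
    Steps 6 to 9 hinge on the fact that an EX request cannot be granted
    while any other node still holds a CR lock on the same resource. A
    sketch of that decision, with the locks reduced to one boolean per
    peer; enum add_outcome and finish_add_disk() are hypothetical names.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    enum add_outcome { ADD_CONFIRMED, ADD_FAILED };

    /* peer_found_disk[i] is true when peer i found the disk and dropped
     * its CR lock on no-new-devs. Node 1 gets EX only if all did so.
     * In both outcomes METADATA_UPDATED is sent afterwards. */
    static enum add_outcome finish_add_disk(const bool *peer_found_disk,
                                            size_t peers)
    {
        assert(peers == 0 || peer_found_disk != NULL);
        for (size_t i = 0; i < peers; i++) {
            if (!peer_found_disk[i])
                return ADD_FAILED; /* a CR lock remains, EX is refused */
        }
        return ADD_CONFIRMED;
    }
    ```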

    Signed-off-by: Lidong Zhong
    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • If there is a resync going on, all nodes must suspend writes to the
    range. This is recorded in the suspend_info/suspend_list.

    If there is an I/O within the ranges of any of the suspend_info,
    should_suspend will return 1.
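
    A minimal sketch of such an overlap test over a singly linked list.
    The struct layout and the *_sketch name are assumptions, and ranges
    are treated as half-open [lo, hi), which is also an assumption.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical list node: one suspended range per resyncing node. */
    struct suspend_info_sketch {
        uint64_t lo;
        uint64_t hi;
        struct suspend_info_sketch *next;
    };

    /* Returns 1 if the I/O range [lo, hi) overlaps any suspended range,
     * 0 otherwise. */
    static int should_suspend_sketch(const struct suspend_info_sketch *list,
                                     uint64_t lo, uint64_t hi)
    {
        assert(lo <= hi);
        for (; list != NULL; list = list->next) {
            if (hi > list->lo && lo < list->hi)
                return 1;
        }
        return 0;
    }
    ```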

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • When a resync is initiated, a RESYNCING message is sent to all active
    nodes with the range (lo,hi). When the resync is over, a RESYNCING
    message is sent with (0,0). A high sector value of zero indicates
    that the resync is over.
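
    A receiver could interpret the message roughly as follows. The
    struct and function names are hypothetical and the message is
    reduced to its (lo,hi) pair.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct resync_msg {
        uint64_t lo;
        uint64_t hi;
    };

    /* What a receiver remembers about one remote node's resync. */
    struct peer_resync {
        bool active;
        uint64_t lo;
        uint64_t hi;
    };

    /* hi == 0 ends the resync; anything else records the new range. */
    static void handle_resyncing(struct peer_resync *p,
                                 const struct resync_msg *m)
    {
        assert(p != NULL && m != NULL);
        if (m->hi == 0) {
            p->active = false;
            p->lo = 0;
            p->hi = 0;
            return;
        }
        p->active = true;
        p->lo = m->lo;
        p->hi = m->hi;
    }
    ```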

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • - request to send a message
    - make changes to superblock
    - send messages telling everyone that the superblock has changed
    - other nodes all read the superblock
    - other nodes all ack the messages
    - the updating node releases the "I'm sending a message" resource.

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • When a node joins, it does not know of other nodes performing resync.
    So, each node keeps the resync information in its LVB. When a new
    node joins, it reads the LVB of each "online" bitmap.

    [TODO] The new node attempts to get the PW lock on another bitmap. If
    it is successful, it reads the bitmap and performs the resync (if
    required) on its behalf.

    If the node does not get the PW, it requests CR and reads the LVB
    for the resync information.

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • DLM offers callbacks when a node fails and the lock remastery
    is performed:

    1. recover_prep: called when DLM discovers a node is down
    2. recover_slot: called when DLM identifies the node and recovery
    can start
    3. recover_done: called when all nodes have completed recover_slot

    recover_slot() and recover_done() are also called when the node joins
    initially in order to inform the node with its slot number. These slot
    numbers start from one, so we deduct one to make it start with zero
    which the cluster-md code uses.

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues
     
  • This allows dynamic registering of cluster hooks.
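
    Dynamic registration of this kind is commonly built on a table of
    function pointers that a module installs at load time. The sketch
    below shows the pattern in user space; the struct, the function
    names and the choice of error codes are all assumptions and do not
    reproduce the kernel's interface. Locking is omitted.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical hook table with a single operation. */
    struct cluster_ops_sketch {
        int (*join)(int nodes);
    };

    /* Currently registered hooks, or NULL when no module is loaded. */
    static const struct cluster_ops_sketch *registered_ops;

    static int register_cluster_ops(const struct cluster_ops_sketch *ops)
    {
        assert(ops != NULL);
        if (registered_ops != NULL)
            return -EALREADY; /* only one provider at a time */
        registered_ops = ops;
        return 0;
    }

    static void unregister_cluster_ops(void)
    {
        registered_ops = NULL;
    }

    /* Callers go through the table and fail cleanly without a provider. */
    static int cluster_join(int nodes)
    {
        if (registered_ops == NULL)
            return -ENOENT;
        return registered_ops->join(nodes);
    }

    /* Stand-in hook used for demonstration: echoes the node count. */
    static int demo_join(int nodes)
    {
        return nodes;
    }
    ```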

    Signed-off-by: Goldwyn Rodrigues

    Goldwyn Rodrigues