Eric Lee / smarc-fsl-linux-kernel

22 Apr, 2015

3 commits

97f6cd39d md-cluster: re-add capabilities

When "re-add" is written to /sys/block/mdXX/md/dev-YYY/state,
the clustered md:

1. Sends a RE_ADD message with the desc_nr. Nodes receiving the message
clear the Faulty bit in their respective rdev->flags.
2. The node initiating the re-add gathers the bitmaps of all nodes
and copies them into the local bitmap. It does not clear the bitmap
from which it is copying.
3. The initiating node schedules an md recovery to sync the devices.

Signed-off-by: Guoqing Jiang
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: NeilBrown

Goldwyn Rodrigues
2015-04-22 05:59:39 +0800
88bcfef7b md-cluster: remove capabilities

This adds "remove" capabilities for the clustered environment.
When a user initiates removal of a device from the array, a
REMOVE message carrying the disk number in the array is sent to all
the nodes, which kick the respective device from their own arrays.

This facilitates the removal of failed devices.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: NeilBrown

Goldwyn Rodrigues
2015-04-22 05:59:39 +0800
8c58f02e2 md-cluster: correct the num for comparison

md-cluster node numbers start from zero, while
cinfo->slot_number holds the DLM slot number, which starts from one,
so there is no need to check for equality.

Signed-off-by: Guoqing Jiang
Signed-off-by: Goldwyn Rodrigues
Signed-off-by: NeilBrown

Guoqing Jiang
2015-04-22 05:58:31 +0800

21 Mar, 2015

3 commits

09dd1af2e md/cluster: Communication Framework: fix semicolon.cocci warnings

drivers/md/md-cluster.c:328:2-3: Unneeded semicolon

Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Signed-off-by: Fengguang Wu
Signed-off-by: NeilBrown

kbuild test robot
2015-03-21 07:33:00 +0800
6dc69c9c4 md: recover_bitmaps() can be static

drivers/md/md-cluster.c:190:6: sparse: symbol 'recover_bitmaps' was not declared. Should it be static?

Signed-off-by: Fengguang Wu
Signed-off-by: NeilBrown

kbuild test robot
2015-03-21 07:33:00 +0800
fa8259da0 md: Fix stray --cluster-confirm crash

A --cluster-confirm without an --add (by another node) can
crash the kernel.

Fix it by guarding it using a state.

Signed-off-by: Goldwyn Rodrigues
Signed-off-by: NeilBrown

Goldwyn Rodrigues
2015-03-21 07:33:00 +0800

23 Feb, 2015

18 commits

1aee41f63 Add new disk to clustered array

Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEWDISK with uuid and slot number
3. Other nodes issue kobject_uevent_env with uuid and slot number
(Steps 4,5 could be a udev rule)
4. In userspace, the node searches for the disk, perhaps
using blkid -t SUB_UUID=""
5. Other nodes issue either of the following depending on whether the disk
was found:
ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
disc.number set to slot number)
ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop lock on no-new-devs (CR) if device is found
7. Node 1 attempts EX lock on no-new-devs
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
as SpareLocal
9. If it does not get the no-new-devs lock, it fails the operation and sends METADATA_UPDATED
10. Other nodes learn whether the device was added by re-reading the superblock after receiving the METADATA_UPDATED message.
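
Steps 6 to 9 hinge on a single lock: each other node drops its CR lock on no-new-devs only if it found the disk, so Node 1's EX request can succeed only when every peer found it. The following is a minimal user-space sketch of that outcome, not the kernel code. The names `add_disk_outcome` and `peer_found_disk` are hypothetical, and a plain loop stands in for the DLM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum add_outcome { ADD_CONFIRMED, ADD_FAILED };

/*
 * Hypothetical model: peer_found_disk[i] is true if peer i located the new
 * disk and therefore dropped its CR lock on no-new-devs (step 6).
 * The initiator's EX request (step 7) is granted only when no CR remains.
 */
static enum add_outcome add_disk_outcome(const bool *peer_found_disk,
                                         size_t npeers)
{
    size_t cr_holders = 0;

    assert(peer_found_disk != NULL || npeers == 0);
    for (size_t i = 0; i < npeers; i++)
        if (!peer_found_disk[i])
            cr_holders++;   /* this peer keeps CR, which blocks EX */

    return cr_holders == 0 ? ADD_CONFIRMED : ADD_FAILED;
}
```

In either outcome the initiating node still sends METADATA_UPDATED, so the other nodes re-read the superblock to learn the result.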

Signed-off-by: Lidong Zhong
Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:07 +0800
589a1c491 Suspend writes in RAID1 if within range

If there is a resync going on, all nodes must suspend writes to the
range. This is recorded in the suspend_info/suspend_list.

If there is an I/O within the ranges of any of the suspend_info,
should_suspend will return 1.
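
The range check can be sketched as a plain interval-overlap test. This is a user-space illustration only: the `struct suspend_info` fields, the array in place of suspend_list, and the half-open interval convention [lo, hi) are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;  /* stand-in for the kernel's sector_t */

/* Assumed layout: one range [lo, hi) being resynced by node `slot`. */
struct suspend_info {
    int slot;
    sector_t lo;
    sector_t hi;
};

/* Return 1 if the I/O [io_lo, io_hi) overlaps any suspended range, else 0. */
static int should_suspend(const struct suspend_info *list, size_t n,
                          sector_t io_lo, sector_t io_hi)
{
    assert(io_lo <= io_hi);
    for (size_t i = 0; i < n; i++)
        if (io_hi > list[i].lo && io_lo < list[i].hi)
            return 1;
    return 0;
}
```

An I/O that merely touches the boundary of a range (for example, one ending exactly at `lo`) is not suspended under the half-open convention assumed here.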

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:07 +0800
e59721ccd Resync start/Finish actions

When a RESYNC_START message arrives, the node removes the entry
with the current slot number and adds the range to the
suspend_list.

Similarly, when a RESYNC_FINISHED message is received, the node clears
the entry for the corresponding bitmap number.
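
A rough sketch of the two handlers follows. To keep it short it uses a fixed table indexed by slot instead of a linked list, so "remove the old entry, then add the new range" collapses into an overwrite. `MAX_NODES`, `struct suspend_table`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8  /* arbitrary bound for this sketch */

struct suspend_info {
    int in_use;
    uint64_t lo, hi;
};

/* One entry per slot, standing in for the suspend_list. */
struct suspend_table {
    struct suspend_info slot[MAX_NODES];
};

/* RESYNC_START: drop any old range for this slot and record the new one. */
static void on_resync_start(struct suspend_table *t, int slot,
                            uint64_t lo, uint64_t hi)
{
    assert(slot >= 0 && slot < MAX_NODES);
    t->slot[slot] = (struct suspend_info){ .in_use = 1, .lo = lo, .hi = hi };
}

/* RESYNC_FINISHED: the slot no longer suspends any range. */
static void on_resync_finished(struct suspend_table *t, int slot)
{
    assert(slot >= 0 && slot < MAX_NODES);
    t->slot[slot].in_use = 0;
}

/* Count the slots that currently suspend a range. */
static size_t active_ranges(const struct suspend_table *t)
{
    size_t n = 0;

    for (int i = 0; i < MAX_NODES; i++)
        if (t->slot[i].in_use)
            n++;
    return n;
}
```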

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:07 +0800
965400eb6 Send RESYNCING while performing resync start/stop

When a resync is initiated, RESYNCING message is sent to all active
nodes with the range (lo,hi). When the resync is over, a RESYNCING
message is sent with (0,0). A high sector value of zero indicates
that the resync is over.
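
The (lo,hi) convention can be illustrated with a tiny helper set. This is only a sketch: `struct resync_range` and the function names are hypothetical and do not reflect the actual message layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed payload of a RESYNCING message: just the sector range. */
struct resync_range {
    uint64_t lo, hi;
};

/* Payload announcing that [lo, hi) is being resynced. */
static struct resync_range resync_begin(uint64_t lo, uint64_t hi)
{
    assert(hi > lo);  /* a live range always has a non-zero hi */
    return (struct resync_range){ .lo = lo, .hi = hi };
}

/* Payload announcing that the resync is over: (0,0). */
static struct resync_range resync_end(void)
{
    return (struct resync_range){ .lo = 0, .hi = 0 };
}

/* Receivers only need to look at hi to tell the two cases apart. */
static bool resync_is_over(struct resync_range r)
{
    return r.hi == 0;
}
```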

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
1d7e3e961 Reload superblock if METADATA_UPDATED is received

Re-reads the devices by invalidating the cache.
Since we don't write to faulty devices, a faulty device is detected using
the events recorded in the devices. If a device's event count is older than
the mddev's, the device is marked faulty.

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
293467aa1 metadata_update sends message to other nodes

- request to send a message
- make changes to superblock
- send messages telling everyone that the superblock has changed
- other nodes all read the superblock
- other nodes all ack the messages
- the updating node releases the "I'm sending a message" resource.

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
601b515c5 Communication Framework: Sending functions

The sending part is split into two functions to ensure the
atomicity of operations such as the MD superblock update.
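
One way to picture the split is as a start/finish bracket around the superblock change: the first call claims the right to send, the second announces the change and gives that right back. The sketch below is a single-process illustration with hypothetical names (`comm_state`, `update_start`, `update_finish`). It does not reproduce the patch's functions or its DLM calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-array communication state. */
struct comm_state {
    bool token_held;  /* stands in for holding the send token */
    int sb_version;   /* stands in for the on-disk superblock */
    int announced;    /* last version announced to the other nodes */
};

/* Phase 1: claim the right to send before touching the superblock. */
static int update_start(struct comm_state *c)
{
    assert(c != NULL);
    if (c->token_held)
        return -1;    /* another update is already in flight */
    c->token_held = true;
    return 0;
}

/* Phase 2: announce the change, then give the token back. */
static int update_finish(struct comm_state *c)
{
    assert(c != NULL);
    if (!c->token_held)
        return -1;    /* finish without a matching start */
    c->announced = c->sb_version;
    c->token_held = false;
    return 0;
}
```

Because nothing is announced until the second call, other nodes never observe a superblock change that is only half written.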

Signed-off-by: Lidong Zhong
Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
4664680c3 Communication Framework: Receiving

1. receive status

   sender                  receiver                receiver
   ACK:CR                  ACK:CR                  ACK:CR

2. sender get EX of TOKEN
   sender get EX of MESSAGE

   sender                  receiver                receiver
   TOKEN:EX                ACK:CR                  ACK:CR
   MESSAGE:EX
   ACK:CR

3. sender write LVB.
   sender down-convert MESSAGE from EX to CR
   sender try to get EX of ACK
   [ wait until all receiver has *processed* the MESSAGE ]

                           [ triggered by bast of ACK ]
                           receiver get CR of MESSAGE
                           receiver read LVB
                           receiver processes the message
                           [ wait finish ]
                           receiver release ACK

   sender                  receiver                receiver
   TOKEN:EX                MESSAGE:CR              MESSAGE:CR
   MESSAGE:CR
   ACK:EX

4. sender down-convert ACK from EX to CR
   sender release MESSAGE
   sender release TOKEN
                           receiver upconvert to EX of MESSAGE
                           receiver get CR of ACK
                           receiver release MESSAGE

   sender                  receiver                receiver
   ACK:CR                  ACK:CR                  ACK:CR
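
The waiting in step 3 follows from DLM lock-mode compatibility: EX conflicts with CR, so the sender's EX request on ACK cannot be granted while any receiver still holds ACK in CR. The sketch below encodes the standard compatibility rules for the modes that appear in this log. The function names are hypothetical, and a released lock is modelled as NL for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The subset of DLM lock modes that appears in this log. */
enum lock_mode { MODE_NL, MODE_CR, MODE_PW, MODE_EX };

/* True if two holders may hold modes a and b on one resource at once. */
static bool modes_compatible(enum lock_mode a, enum lock_mode b)
{
    static const bool compat[4][4] = {
        /*          NL     CR     PW     EX   */
        /* NL */ { true,  true,  true,  true  },
        /* CR */ { true,  true,  true,  false },
        /* PW */ { true,  true,  false, false },
        /* EX */ { true,  false, false, false },
    };
    return compat[a][b];
}

/* A request is grantable only if it is compatible with every holder. */
static bool can_grant(enum lock_mode want, const enum lock_mode *held,
                      size_t n)
{
    assert(held != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!modes_compatible(want, held[i]))
            return false;
    return true;
}
```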

Signed-off-by: Lidong Zhong
Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
4b26a08af Perform resync for cluster node failure

If bitmap_copy_slot returns hi>0, we need to perform resync.
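
The hi>0 test can be illustrated with a user-space stand-in. `copy_slot_bits` below is a hypothetical helper, not bitmap_copy_slot() itself: it ORs the set bits of another node's bitmap into the local one and reports the touched range in chunk indices, which is an assumption about units made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Range of copied chunks; hi is one past the last set chunk, 0 if none. */
struct copy_range {
    size_t lo, hi;
};

/* OR the set bits of `remote` into `local` and report the range touched. */
static struct copy_range copy_slot_bits(uint8_t *local, const uint8_t *remote,
                                        size_t nchunks)
{
    struct copy_range r = { 0, 0 };

    assert(local != NULL && remote != NULL);
    for (size_t i = 0; i < nchunks; i++) {
        uint8_t bit = (uint8_t)(1u << (i % 8));

        if (!(remote[i / 8] & bit))
            continue;
        local[i / 8] |= bit;
        if (r.hi == 0)
            r.lo = i;   /* first set chunk seen */
        r.hi = i + 1;
    }
    return r;
}

/* A non-zero hi means at least one dirty chunk was copied. */
static int need_resync(struct copy_range r)
{
    return r.hi > 0;
}
```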

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:06 +0800
e94987db2 Initiate recovery on node failure

The DLM informs us in case of node failure with the DLM slot number.
cluster_info->recovery_map sets the bit corresponding to the slot number
and wakes up the recovery thread.

The recovery thread:
1. Derives the slot number from the recovery_map
2. Locks the bitmap corresponding to the slot
3. Copies the set bits to the node-local bitmap
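
The recovery_map handling can be sketched as a small bit set: the failure callback sets a bit, and the recovery thread pops slots until the map is empty. The struct and function names below are hypothetical, and a 32-bit map is an arbitrary choice for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the cluster info. */
struct cluster_info_sketch {
    uint32_t recovery_map;  /* bit N set: slot N still needs recovery */
};

/* Called when a node failure is reported for `slot`. */
static void mark_slot_failed(struct cluster_info_sketch *c, int slot)
{
    assert(c != NULL && slot >= 0 && slot < 32);
    c->recovery_map |= UINT32_C(1) << slot;
}

/* Pop the lowest pending slot, or return -1 when nothing is left. */
static int next_slot_to_recover(struct cluster_info_sketch *c)
{
    assert(c != NULL);
    for (int slot = 0; slot < 32; slot++) {
        uint32_t bit = UINT32_C(1) << slot;

        if (c->recovery_map & bit) {
            c->recovery_map &= ~bit;
            return slot;
        }
    }
    return -1;
}
```

The recovery thread would call `next_slot_to_recover()` in a loop, locking and copying the corresponding bitmap for each slot it returns.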

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 23:59:05 +0800
96ae923ab Gather on-going resync information of other nodes

When a node joins, it does not know of other nodes performing resync.
So, each node keeps the resync information in its LVB. When a new
node joins, it reads the LVB of each "online" bitmap.

[TODO] The new node attempts to get the PW lock on the other bitmaps. If
it is successful, it reads the bitmap and performs the resync (if
required) on that node's behalf.

If the node does not get the PW, it requests CR and reads the LVB
for the resync information.

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:30:11 +0800
54519c5f4 Lock bitmap while joining the cluster

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:30:11 +0800
b97e92574 Use separate bitmaps for each nodes in the cluster

On-disk format:

0                     4k                    8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Bitmap super has a field nodes, which defines the maximum number
of nodes the device can use. While reading the bitmap super, if
the cluster finds out that the number of nodes is > 0:
1. Requests the md-cluster module.
2. Calls md_cluster_ops->join(), which sets up clustering such as
joining DLM lockspace.

The first time, the first bitmap is read. After the call
to cluster_setup, the bitmap offset is adjusted and the
superblock is re-read. This also ensures the bitmap is read
under the bitmap lock (once the bitmap lock is introduced in later patches).
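
The offset adjustment can be sketched with simple arithmetic. Reading the on-disk diagram above with 4k cells, node 0's bitmap super sits at 8k and each node occupies two cells (8k). Both numbers come only from that illustration and would differ with the real bitmap size. `bitmap_super_offset` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of node `node`'s bitmap superblock, assuming node 0 starts at
 * `first` and every node's super plus bits occupy `per_node` bytes.
 */
static uint64_t bitmap_super_offset(uint64_t first, uint64_t per_node,
                                    unsigned node)
{
    assert(per_node > 0);
    return first + (uint64_t)node * per_node;
}
```

With the values from the diagram, `bitmap_super_offset(8192, 8192, 3)` lands on the 32k cell labelled "bm super[3] + bits".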

Questions:
1. cluster name is repeated in all bitmap supers. Is that okay?

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:30:11 +0800
cf921cc19 Add node recovery callbacks

DLM offers callbacks when a node fails and the lock remastery
is performed:

1. recover_prep: called when DLM discovers a node is down
2. recover_slot: called when DLM identifies the node and recovery
can start
3. recover_done: called when all nodes have completed recover_slot

recover_slot() and recover_done() are also called when the node joins
initially in order to inform the node with its slot number. These slot
numbers start from one, so we deduct one to make it start with zero
which the cluster-md code uses.
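
The numbering adjustment amounts to a one-line conversion in each direction. The helper names below are hypothetical and only illustrate the off-by-one between the two conventions.

```c
#include <assert.h>

/* DLM slot numbers start from one; cluster-md numbers start from zero. */
static int md_slot_from_dlm(int dlm_slot)
{
    assert(dlm_slot >= 1);
    return dlm_slot - 1;
}

/* Inverse mapping, for talking back to DLM about a cluster-md slot. */
static int dlm_slot_from_md(int md_slot)
{
    assert(md_slot >= 0);
    return md_slot + 1;
}
```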

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:30:11 +0800
c4ce867fd Introduce md_cluster_info

md_cluster_info stores the cluster information in the MD device.

The join() is called when mddev detects it is a clustered device.
The main responsibilities are:
1. Set up a DLM lockspace
2. Set up all initial locks, such as superblock locks and the bitmap lock (will come later)

The leave() clears up the lockspace and all the locks held.

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:28:42 +0800
edb39c9de Introduce md_cluster_operations to handle cluster functions

This allows dynamic registering of cluster hooks.
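
Dynamic registration of hooks is typically done with a table of function pointers that stays NULL until a module installs it. The sketch below shows that pattern in user space. The `_sketch` struct, its two members, and every function name are hypothetical and do not reproduce the actual md_cluster_operations definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table; the real structure has its own members. */
struct cluster_ops_sketch {
    int (*join)(int nodes);
    int (*leave)(void);
};

/* NULL until a cluster module registers itself. */
static const struct cluster_ops_sketch *cluster_ops;

static int register_cluster_ops(const struct cluster_ops_sketch *ops)
{
    assert(ops != NULL);
    if (cluster_ops != NULL)
        return -1;      /* hooks are already registered */
    cluster_ops = ops;
    return 0;
}

static void unregister_cluster_ops(void)
{
    cluster_ops = NULL;
}

/* Core code calls through the table and fails cleanly if it is empty. */
static int cluster_join(int nodes)
{
    if (cluster_ops == NULL || cluster_ops->join == NULL)
        return -1;
    return cluster_ops->join(nodes);
}

/* Demo hooks standing in for what a cluster module would provide. */
static int demo_join(int nodes) { return nodes > 0 ? 0 : -1; }
static int demo_leave(void) { return 0; }

static const struct cluster_ops_sketch demo_ops = {
    .join = demo_join,
    .leave = demo_leave,
};
```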

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:28:42 +0800
47741b7ca DLM lock and unlock functions

A dlm_lock_resource is a structure which contains all information
required for locking using DLM. The init function allocates the
lock and acquires the lock in NL mode. The unlock function
converts the lock resource to NL mode. This is done to preserve
LVB and for faster processing of locks. The lock resource is
DLM unlocked only in the lockres_free function, which is the end
of life of the lock resource.
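
The lifecycle described above can be modelled as a small state machine: init leaves the resource in NL, unlock returns it to NL while keeping the LVB contents, and only free releases it. The sketch is a user-space illustration with hypothetical names, an arbitrary `LVB_SIZE`, and no real DLM calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LVB_SIZE 64  /* arbitrary size for this sketch */

enum lock_mode { MODE_NL, MODE_CR, MODE_PW, MODE_EX };

/* Hypothetical stand-in for a dlm_lock_resource. */
struct lockres_sketch {
    enum lock_mode mode;
    char lvb[LVB_SIZE];
};

/* Allocate the resource and hold it in NL from the start. */
static struct lockres_sketch *lockres_init(void)
{
    struct lockres_sketch *r = calloc(1, sizeof(*r));

    if (r != NULL)
        r->mode = MODE_NL;
    return r;
}

/* Up-convert to the requested mode. */
static void lockres_lock(struct lockres_sketch *r, enum lock_mode mode)
{
    assert(r != NULL);
    r->mode = mode;
}

/* "Unlock" only down-converts to NL, so the LVB contents survive. */
static void lockres_unlock(struct lockres_sketch *r)
{
    assert(r != NULL);
    r->mode = MODE_NL;
}

/* End of life: the only place the resource is really released. */
static void lockres_free(struct lockres_sketch *r)
{
    free(r);
}
```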

Signed-off-by: Lidong Zhong
Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:28:42 +0800
8e854e9cf Create a separate module for clustering support

Tagged as EXPERIMENTAL for now.

Signed-off-by: Goldwyn Rodrigues

Goldwyn Rodrigues
2015-02-23 21:28:42 +0800