09 Aug, 2012

1 commit

  • The in_recovery rw_semaphore has always been acquired and
    released by different threads by design. To work around
    the "BUG: bad unlock balance detected!" messages, adjust
    things so the dlm_recoverd thread always does both down_write
    and up_write.
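
    A minimal sketch of the resulting pattern (the lockspace type and helper
    names here are illustrative, not the actual fs/dlm code):

        /* Before the fix, down_write() and up_write() on in_recovery ran on
         * different tasks, which lockdep reports as a bad unlock balance.
         * After the fix, dlm_recoverd brackets recovery itself, so the same
         * task that takes the write lock also releases it. */
        static int recoverd_thread(void *arg)
        {
                struct example_lockspace *ls = arg;    /* hypothetical type */

                while (!kthread_should_stop()) {
                        wait_for_recovery_event(ls);   /* hypothetical */
                        down_write(&ls->in_recovery);  /* suspend lock activity */
                        run_recovery(ls);              /* hypothetical */
                        up_write(&ls->in_recovery);    /* same task releases */
                }
                return 0;
        }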

    Signed-off-by: David Teigland


17 Jul, 2012

1 commit

  • Remove the dir hash table (dirtbl), and use
    the rsb hash table (rsbtbl) as the resource
    directory. It has always been an unnecessary
    duplication of information.

    This improves efficiency by using a single rsbtbl
    lookup in many cases where both rsbtbl and dirtbl
    lookups were needed previously.

    This eliminates the need to handle cases of rsbtbl
    and dirtbl being out of sync.

    In many cases there will be memory savings because
    the dir hash table no longer exists.
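
    A rough before/after sketch of a directory query (lookup helpers and
    field names are illustrative):

        /* Before: resolving a resource name to its master could take two
         * hash lookups, one in dirtbl and one in rsbtbl. */
        master = dir_lookup(ls, name, len);            /* hypothetical */
        r = rsb_lookup(ls, name, len);                 /* hypothetical */

        /* After: a single rsbtbl lookup; the rsb entry itself now serves
         * as the directory record for the resource. */
        r = rsb_lookup(ls, name, len);
        master = r ? r->master_nodeid : 0;             /* hypothetical field */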

    Signed-off-by: David Teigland


03 May, 2012

1 commit

  • The "nodir" mode (statically assign master nodes instead
    of using the resource directory) has always been highly
    experimental, and never seriously used. This commit
    fixes a number of problems, making nodir much more usable.

    - Major change to recovery: recover all locks and restart
    all in-progress operations after recovery. In some
    cases it's not possible to know which in-progress locks
    to recover, so recover all. (Most require recovery
    in nodir mode anyway since rehashing changes most
    master nodes.)

    - Change the way nodir mode is enabled, from a command
    line mount arg passed through gfs2, into a sysfs
    file managed by dlm_controld, consistent with the
    other config settings.

    - Allow recovering MSTCPY locks on an rsb that has not
    yet been turned into a master copy.

    - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
    from a previous, aborted recovery cycle. Base this
    on the local recovery status not being in the state
    where any nodes should be sending LOCK messages for the
    current recovery cycle.

    - Hold rsb lock around dlm_purge_mstcpy_locks() because it
    may run concurrently with dlm_recover_master_copy().

    - Maintain highbast on process-copy lkb's (in addition to
    the master as is usual), because the lkb can switch
    back and forth between being a master and being a
    process copy as the master node changes in recovery.

    - When recovering MSTCPY locks, flag rsb's that have
    non-empty convert or waiting queues for granting
    at the end of recovery. (Rename flag from LOCKS_PURGED
    to RECOVER_GRANT and similar for the recovery function,
    because it's not only resources with purged locks
    that need a grant attempt; see the sketch after this list.)

    - Replace a couple of unnecessary assertion panics with
    error messages.
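
    A minimal sketch of that end-of-recovery grant pass (the flag name follows
    the description above; list and helper names are illustrative):

        /* While recovering MSTCPY locks: note rsb's with queued work. */
        if (!list_empty(&r->convert_queue) || !list_empty(&r->wait_queue))
                set_bit(RECOVER_GRANT, &r->flags);     /* hypothetical fields */

        /* At the end of recovery: try to grant on every flagged rsb. */
        list_for_each_entry(r, &ls->root_list, list) { /* hypothetical list */
                if (test_and_clear_bit(RECOVER_GRANT, &r->flags))
                        grant_pending_locks(r);        /* hypothetical */
        }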

    Signed-off-by: David Teigland


04 Jan, 2012

2 commits

  • These new callbacks notify the dlm user about lock recovery.
    GFS2, and possibly others, need to be aware of when the dlm
    will be doing lock recovery for a failed lockspace member.

    In the past, this coordination has been done between dlm and
    file system daemons in userspace, which then direct their
    kernel counterparts. These callbacks allow the same
    coordination directly, and more simply.
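
    A rough sketch of how a user such as GFS2 might hook these callbacks; the
    ops structure and arguments shown are simplified assumptions, not the
    exact kernel interface:

        /* Illustrative callbacks, invoked by the dlm around lock recovery
         * for a failed lockspace member. */
        static void example_recover_prep(void *arg)
        {
                /* quiesce local activity before the dlm recovers locks */
        }

        static void example_recover_done(void *arg)
        {
                /* recovery finished; e.g. journal replay can now proceed */
        }

        static const struct example_recovery_ops example_ops = {  /* hypothetical */
                .recover_prep = example_recover_prep,
                .recover_done = example_recover_done,
        };
        /* registered with the lockspace at creation time */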

    Signed-off-by: David Teigland

  • Put all the calls to recovery barriers in the same function
    to clarify where they each happen. Should not change any behavior.
    Also modify some recovery debug lines to make them consistent.

    Signed-off-by: David Teigland


16 Jul, 2011

1 commit

  • Instead of creating our own kthread (dlm_astd) to deliver
    callbacks for all lockspaces, use a per-lockspace workqueue
    to deliver the callbacks. This eliminates complications and
    slowdowns from many lockspaces sharing the same thread.
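
    In outline (the workqueue name, flags and lkb fields are illustrative):

        /* One callback workqueue per lockspace, created with the lockspace. */
        ls->callback_wq = alloc_workqueue("dlm_callback", WQ_MEM_RECLAIM, 0);

        /* Delivering a completion or blocking callback for one lkb: */
        INIT_WORK(&lkb->cb_work, deliver_callback);    /* hypothetical members */
        queue_work(ls->callback_wq, &lkb->cb_work);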

    Signed-off-by: David Teigland


22 Apr, 2008

1 commit

  • If a node is removed from a lockspace, and then added back before the
    dlm is notified of the removal, the dlm will not detect the removal
    and won't clear the old state from the node. This is fixed by using a
    list of added nodes so the membership recovery can detect when a newly
    added node is already in the member list.
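
    A sketch of the detection (list and helper names are assumptions):

        /* During membership recovery, a node on the newly-added list that is
         * already a member must have been removed and re-added before we were
         * notified, so its old state is cleared before it is treated as new. */
        list_for_each_entry(node, &ls->nodes_added, list) {     /* hypothetical */
                if (is_member(ls, node->nodeid))                 /* hypothetical */
                        clear_old_node_state(ls, node->nodeid);  /* hypothetical */
        }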

    Signed-off-by: David Teigland


31 Jan, 2008

1 commit

  • To prevent the master of an rsb from changing rapidly, an unused rsb is kept
    on the "toss list" for a period of time to be reused. The toss list was
    being cleared completely for each recovery, which is unnecessary. Much of
    the benefit of the toss list can be maintained if nodes keep rsb's in their
    toss list that they are the master of. These rsb's need to be included
    when the resource directory is rebuilt during recovery.
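
    Roughly, the recovery-time cleanup of the toss list becomes conditional
    (names are illustrative):

        /* Keep tossed rsb's that this node masters and report them when the
         * directory is rebuilt; free only the rest. */
        list_for_each_entry_safe(r, safe, &ls->toss_list, list) {  /* hypothetical */
                if (r->master_nodeid == our_nodeid)
                        continue;        /* kept, and added to the new directory */
                list_del(&r->list);
                free_rsb(r);             /* hypothetical */
        }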

    Signed-off-by: David Teigland


10 Oct, 2007

1 commit

  • Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
    threads while working in the dlm. This allows dlm_recv activity to be
    suspended when the lockspace transitions to, from and between recovery
    cycles.

    The specific bug prompting this change is one where an in-progress
    recovery cycle is aborted by a new recovery cycle. While dlm_recv was
    processing a recovery message, the recovery cycle was aborted and
    dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count
    on an rsb after dlm_recoverd had reset it to zero. This is fixed by
    suspending dlm_recv (taking write lock on the rwsem) before aborting the
    current recovery.

    The transitions to/from normal and recovery modes are simplified by using
    this new ability to block dlm_recv. The switch from normal to recovery
    mode means dlm_recv goes from processing locking messages, to saving them
    for later, and vice versa. Races are avoided by blocking dlm_recv when
    setting the flag that switches between modes.
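
    The locking pattern, in outline (the rwsem and helper names are
    illustrative):

        /* dlm_recv: every incoming message is handled under the read lock. */
        down_read(&ls->recv_active);
        receive_message(ls, msg);          /* hypothetical */
        up_read(&ls->recv_active);

        /* dlm_recoverd: taking the write lock suspends dlm_recv, so counters
         * like recover_locks_count can be reset, and the normal/recovery mode
         * flag flipped, without racing against message processing. */
        down_write(&ls->recv_active);
        abort_recovery(ls);                /* hypothetical */
        set_recovery_mode(ls);             /* messages are now saved for later */
        up_write(&ls->recv_active);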

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse


09 Jul, 2007

1 commit

  • New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
    flag is set, the request/conversion will be canceled after waiting the
    given number of centiseconds (specified per lock). This feature is
    currently only available for locks requested through libdlm (it can be
    enabled for kernel dlm users if there's a use for it).

    If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
    a warning message will be sent to userspace (using genetlink) after a
    request/conversion has been waiting for a given number of centiseconds
    (configurable per node). The time warnings will be used in the future
    to do deadlock detection in userspace.
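
    A sketch of requesting the warnings at lockspace creation; the flag names
    are the real constants, but the dlm_new_lockspace() signature shown is the
    one from roughly this era and has since gained arguments:

        dlm_lockspace_t *ls;
        int error;

        /* Ask the dlm to emit genetlink time warnings for long waits. */
        error = dlm_new_lockspace("example", strlen("example"), &ls,
                                  DLM_LSFL_TIMEWARN, 32);

        /* Per-lock timeouts are requested by adding DLM_LKF_TIMEOUT to a lock
         * request's flags; the centisecond value travels with the request
         * through libdlm's extended lock call. */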

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse


30 Nov, 2006

5 commits

  • This fixes the following gcc warnings generated on
    the architectures where uint64_t != unsigned long long (e.g. ppc64).

    fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
    fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
    fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
    fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
    fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
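
    The usual fix for this class of warning is an explicit cast at the call
    site, for example:

        uint64_t seq = 0;   /* illustrative value */

        /* %llx expects unsigned long long, so cast the uint64_t argument. */
        printk("dlm: recover %llx\n", (unsigned long long)seq);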

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Patrick Caulfield
    Signed-off-by: Steven Whitehouse

  • Requests that arrive after recovery has started are saved in the
    requestqueue and processed after recovery is done. Some of these requests
    are purged during recovery if they are from nodes that have been removed.
    We move the purging of the requests (dlm_purge_requestqueue) to later in
    the recovery sequence, which allows the routine that saves requests
    (dlm_add_requestqueue) to avoid filtering out requests by nodeid, since
    the same filtering will be done by the purge. The current code has
    add_requestqueue filtering by nodeid but doesn't hold any locks while
    accessing the list of current nodes. This also means the purge routine
    must now be called when the lockspace is shut down, since the add routine
    no longer rejects requests itself.
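
    A sketch of the purge as described (locking and field names are
    assumptions):

        /* dlm_purge_requestqueue, roughly: drop saved requests from nodes that
         * are no longer members.  With the purge run later in recovery (and at
         * shutdown), dlm_add_requestqueue can simply save every request. */
        mutex_lock(&ls->requestqueue_mutex);                 /* hypothetical */
        list_for_each_entry_safe(e, safe, &ls->requestqueue, list) {
                if (!is_current_member(ls, e->nodeid)) {     /* hypothetical */
                        list_del(&e->list);
                        kfree(e);
                }
        }
        mutex_unlock(&ls->requestqueue_mutex);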

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

  • Red Hat BZ 211914

    The previous patch "[DLM] fix aborted recovery during
    node removal" was incomplete as discovered with further testing. It set
    the bit for the RS_LOCKS barrier but did not then wait for the barrier.
    This is often ok, but sometimes it will cause yet another recovery hang.
    If it's a new node that also has the lowest nodeid that skips the barrier
    wait, then it misses the important step of collecting and reporting the
    barrier status from the other nodes (which is the job of the low nodeid in
    the barrier wait routine).
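
    In outline, the fix pairs setting the barrier status with actually waiting
    on it (function and constant names approximate the recovery code described
    here):

        /* Set our status for the locks barrier, then wait for all nodes.
         * On the lowest nodeid, the wait is also where the other nodes'
         * barrier status is collected and reported. */
        dlm_set_recover_status(ls, DLM_RS_LOCKS);
        error = dlm_recover_locks_wait(ls);
        if (error)
                goto fail;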

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

  • Red Hat BZ 211914

    When many nodes are joining a lockspace simultaneously, the dlm gets a
    quick sequence of stop/start events, a pair for adding each node.
    dlm_controld in user space sends dlm_recoverd in the kernel each stop and
    start event. dlm_controld will sometimes send the stop before
    dlm_recoverd has had a chance to take up the previously queued start. The
    stop aborts the processing of the previous start by setting the
    RECOVERY_STOP flag. dlm_recoverd is erroneously clearing this flag and
    ignoring the stop/abort if it happens to take up the start after the stop
    meant to abort it. The fix is to check the sequence number that's
    incremented for each stop/start before clearing the flag.
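
    A sketch of the check (field names are assumptions):

        /* dlm_recoverd, before acting on a queued start: only clear the stop
         * flag if it was set for this same stop/start sequence, i.e. the
         * sequence number saved with the start still matches the current one. */
        spin_lock(&ls->recover_lock);                 /* hypothetical lock */
        if (ls->recover_seq == start_seq)             /* hypothetical fields */
                clear_bit(RECOVERY_STOP, &ls->flags);
        spin_unlock(&ls->recover_lock);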

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

  • Red Hat BZ 211914

    With the new cluster infrastructure, dlm recovery for a node removal can
    be aborted and restarted for a node addition. When this happens, the
    restarted recovery isn't aware that it's doing recovery for the earlier
    removal as well as the addition. So, it then skips the recovery steps
    only required when nodes are removed. This can result in locks not being
    purged for failed/removed nodes. The fix is to check for removed nodes
    for which recovery has not been completed at the start of a new recovery
    sequence.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse


25 Aug, 2006

1 commit

  • When a new lockspace was being created, the recoverd thread was being
    started for it before the lockspace was added to the global list of
    lockspaces. The new thread was looking up the lockspace in the global
    list and sometimes not finding it due to the race with the original thread
    adding it to the list. We need to add the lockspace to the global list
    before starting the thread instead of after, and if the new thread can't
    find the lockspace for some reason, it should return an error.
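
    The ordering, in outline (list and helper names are illustrative):

        /* Creation path: publish the lockspace before starting its thread. */
        spin_lock(&lslist_lock);                     /* hypothetical globals */
        list_add(&ls->ls_list, &lslist);
        spin_unlock(&lslist_lock);

        error = start_recoverd_thread(ls);           /* hypothetical */

        /* Thread startup: failing to find the lockspace is now an error
         * rather than an unexplained race. */
        ls = find_lockspace(lockspace_id);           /* hypothetical */
        if (!ls)
                return -1;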

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse


09 Aug, 2006

1 commit

  • When we abort one recovery to do another, break out of the ping_members()
    routine more quickly, and wake up the dlm_recoverd thread more quickly
    instead of waiting for it to time out.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse


18 Jan, 2006

1 commit

  • This is the core of the distributed lock manager which is required
    to use GFS2 as a cluster filesystem. It is also used by CLVM and
    can be used as a standalone lock manager independently of either
    of these two projects.

    It implements VAX-style locking modes.
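
    The VAX-style modes referred to here are the six lock modes defined in the
    dlm's public header:

        DLM_LOCK_NL   /* null              */
        DLM_LOCK_CR   /* concurrent read   */
        DLM_LOCK_CW   /* concurrent write  */
        DLM_LOCK_PR   /* protected read    */
        DLM_LOCK_PW   /* protected write   */
        DLM_LOCK_EX   /* exclusive         */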

    Signed-off-by: David Teigland
    Signed-off-by: Steve Whitehouse
