10 Oct, 2007

2 commits

  • Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
    threads while working in the dlm. This allows dlm_recv activity to be
    suspended when the lockspace transitions to, from and between recovery
    cycles.

    The specific bug prompting this change is one where an in-progress
    recovery cycle is aborted by a new recovery cycle. While dlm_recv was
    processing a recovery message, the recovery cycle was aborted and
    dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count
    on an rsb after dlm_recoverd had reset it to zero. This is fixed by
    suspending dlm_recv (taking the rwsem in write mode) before aborting the
    current recovery.

    The transitions to/from normal and recovery modes are simplified by using
    this new ability to block dlm_recv. The switch from normal to recovery
    mode means dlm_recv goes from processing locking messages to saving them
    for later, and vice versa. Races are avoided by blocking dlm_recv while
    setting the flag that switches between modes.
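
    A minimal sketch of the pattern in kernel C (helper names are made up
    here; the rwsem is assumed to be a field like ls_recv_active in the
    lockspace struct):

        /* dlm_recv side: hold the rwsem in read mode while in the dlm */
        static void recv_process(struct dlm_ls *ls)
        {
                down_read(&ls->ls_recv_active);
                /* ... process a locking or recovery message ... */
                up_read(&ls->ls_recv_active);
        }

        /* dlm_recoverd side: taking the rwsem in write mode suspends all
           dlm_recv activity, making it safe to abort recovery or to flip
           the normal/recovery mode flag */
        static void suspend_dlm_recv(struct dlm_ls *ls)
        {
                down_write(&ls->ls_recv_active);
                /* ... abort recovery / switch modes ... */
                up_write(&ls->ls_recv_active);
        }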

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • If the castaddr passed to the userland API is NULL, don't overwrite the
    existing castparam. This allows a different thread to cancel a lock request
    while the cancel AST is still delivered to the original thread.

    bz#306391 (for RHEL4) refers.
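
    The guard amounts to something like this sketch in the device write path
    (structure and field names assumed for illustration):

        /* a NULL castaddr means "cancel on behalf of another thread":
           keep the original requester's callback info intact */
        if (params->castaddr) {
                ua->castaddr  = params->castaddr;
                ua->castparam = params->castparam;
        }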

    Signed-Off-By: Patrick Caulfield
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     

14 Aug, 2007

1 commit

  • Fix a long-standing bug where a blocking callback would be missed
    when there's a granted lock in PR mode and waiting locks in both
    PR and CW modes (and the PR lock was added to the waiting queue
    before the CW lock). The logic simply compared the numerical values
    of the modes to determine whether a blocking callback was required,
    but in the one case of PR and CW, the lower-valued CW mode blocks the
    higher-valued PR mode. We just need a special check for this PR/CW
    case in the tests that decide when a blocking callback is needed.
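
    The essence of the check, as a standalone illustration (mode values as
    in dlm.h, increasing NL..EX; the helper name is made up):

        #include <stdio.h>

        enum { NL, CR, CW, PR, PW, EX };  /* dlm lock modes, low to high */

        /* does a waiting request in mode rq block a lock granted in gr? */
        static int requires_bast(int gr, int rq)
        {
                if (rq > gr)
                        return 1;       /* the original numeric test */
                /* the missed case: CW is numerically below PR, yet the
                   two modes are incompatible, so a granted PR must still
                   get a blocking callback for a waiting CW */
                if (gr == PR && rq == CW)
                        return 1;
                return 0;
        }

        int main(void)
        {
                printf("CW vs granted PR: %d\n", requires_bast(PR, CW)); /* 1 */
                printf("CR vs granted PR: %d\n", requires_bast(PR, CR)); /* 0 */
                return 0;
        }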

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

09 Jul, 2007

9 commits

  • Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
    This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
    (This updated version of the patch uses gfp_t for ls_allocation.)
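
    The gist, condensed from the lockspace-creation path (ls_allocation is
    the gfp_t field mentioned above):

        /* choose the allocation class once, when the lockspace is created */
        if (flags & DLM_LSFL_FS)
                ls->ls_allocation = GFP_NOFS;   /* file system caller */
        else
                ls->ls_allocation = GFP_KERNEL;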

    Signed-Off-By: Patrick Caulfield
    Signed-Off-By: David Teigland
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     
  • Add a function that can be used through libdlm by a system daemon to cancel
    another process's deadlocked lock. A completion ast with EDEADLK is returned
    to the process waiting for the lock.
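
    A hypothetical usage sketch; the exact libdlm prototype isn't shown in
    this log, so the call below is an assumption:

        /* a deadlock-resolver daemon cancels another process's deadlocked
           lock, named by pid and lock id; that process's completion AST
           then reports EDEADLK */
        rv = dlm_ls_deadlock_cancel(ls, victim_pid, victim_lkid);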

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Various fixes related to the new timeout feature:
    - add_timeout() missed setting TIMEWARN flag on lkb's when the
    TIMEOUT flag was already set
    - clear_proc_locks should remove a dead process's locks from the
    timeout list
    - the end-of-life calculation for user locks needs to consider that
    ETIMEDOUT is equivalent to -DLM_ECANCEL
    - make initial default timewarn_cs config value visible in configfs
    - change bit position of TIMEOUT_CANCEL flag so it's not copied to
    a remote master node
    - set timestamp on remote lkb's so a lock dump will display the time
    they've been waiting

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • A one-liner fix that was missed in the earlier patches.

    Signed-off-by: Steven Whitehouse
    Cc: Fabio Massimo Di Nitto
    Cc: David Teigland

    Steven Whitehouse
     
  • In the rush to get the previous patch set sent, a compilation bug I fixed
    shortly before sending somehow got clobbered, probably by a missed quilt
    refresh or something.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • When conversion deadlock is detected, cancel the conversion and return
    EDEADLK to the application. This is the new default behavior; before,
    the dlm would allow the deadlock to exist indefinitely.

    The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the
    dlm from performing conversion deadlock detection/cancellation on it.
    The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the
    dlm to demote the granted mode of the lock being converted if it gets into
    a conversion deadlock.
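
    In libdlm terms, the caller's choice comes down to the flags passed with
    the conversion (a sketch showing only the flags):

        /* default: a detected conversion deadlock is canceled and the
           completion AST gets EDEADLK */
        flags = DLM_LKF_CONVERT;

        /* opt out of deadlock detection/cancellation for this lock */
        flags = DLM_LKF_CONVERT | DLM_LKF_NODLCKWT;

        /* older behavior: demote this lock's granted mode instead */
        flags = DLM_LKF_CONVERT | DLM_LKF_CONVDEADLK;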

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Change the user/kernel device interface used by libdlm:
    - Add ability for userspace to check the version of the interface. libdlm
    can now adapt to different versions of the kernel interface.
    - Increase the size of the flags passed in a lock request so all possible
    flags can be used from userspace.
    - Add an opaque "xid" value for each lock. This "transaction id" will be
    used later to associate locks with each other during deadlock detection.
    - Add a "timeout" value for each lock. This is used along with the
    DLM_LKF_TIMEOUT flag.

    Also, remove a fragment of unused code in device_read().

    This patch requires an updated libdlm, which remains backward compatible
    with older kernels.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
    flag is set, then the request/conversion will be canceled after waiting
    the specified number of centiseconds (specified per lock). This feature
    is only available for locks requested through libdlm (it can be enabled
    for kernel dlm users if a need arises).

    If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
    a warning message will be sent to userspace (using genetlink) after a
    request/conversion has been waiting for a given number of centiseconds
    (configurable per node). The time warnings will be used in the future
    to do deadlock detection in userspace.
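
    A usage sketch through libdlm, assuming the extended lock call
    (dlm_ls_lockx) added by the interface changes described above:

        uint64_t xid = 0;
        uint64_t timeout = 500;         /* centiseconds: 5 seconds */

        /* request an EX lock that is canceled automatically if it is
           still waiting when the timeout expires */
        rv = dlm_ls_lockx(ls, LKM_EXMODE, &lksb, DLM_LKF_TIMEOUT,
                          name, strlen(name), 0, compast, arg, blockast,
                          &xid, &timeout);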

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Don't let dlm_scand run during recovery since it may try to do a resource
    directory removal while the directory nodes are changing.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

01 May, 2007

5 commits

  • There are flags to enable two specialized features in the dlm:
    1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
    changing the granted mode of locks to NL.
    2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
    or CW to grant them if the normal requested mode can't be granted.

    GFS direct i/o exercises both of these features, especially when mixed
    with buffered i/o. The dlm has problems with them.

    The first problem is on the master node. If it demotes a lock as a part of
    converting it, the actual step of converting the lock isn't done after the
    demotion; the lock is just left sitting on the granted queue with a granted
    mode of NL. I think the mistaken assumption was that the call to
    grant_pending_locks() would grant it, but that function naturally doesn't
    look at locks on the granted queue.

    The second problem is on the process node. If the master either demotes
    or gives an altmode, the munging of the gr/rq modes is never done in the
    process copy of the lock, leaving the master/process copies out of sync.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • A lock id is a uint32 and is used as an opaque reference to the lock. For
    userland apps, the lkid is passed up, through libdlm, as the return value
    from a write() on the dlm device. This created a problem when the high
    bit was 1, making the lkid look like an error. This is fixed by changing
    how the lkid is composed. The low 16 bits identified the hash bucket for
    the lock and the high 16 bits were a per-bucket counter (which eventually
    hit 0x8000 causing the problem). These are simply swapped around; the
    number of hash table buckets is far below 0x8000, making all lkid's
    positive when viewed as signed.
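
    Numerically, the swap looks like this (standalone illustration):

        #include <stdint.h>
        #include <stdio.h>

        /* before: bucket in the low 16 bits, counter in the high 16;
           once a bucket's counter hit 0x8000 the sign bit was set */
        static uint32_t lkid_old(uint16_t bucket, uint16_t counter)
        {
                return ((uint32_t)counter << 16) | bucket;
        }

        /* after: halves swapped; bucket numbers stay far below 0x8000,
           so the lkid is always positive when viewed as signed */
        static uint32_t lkid_new(uint16_t bucket, uint16_t counter)
        {
                return ((uint32_t)bucket << 16) | counter;
        }

        int main(void)
        {
                printf("%d\n", (int)lkid_old(5, 0x8000)); /* negative */
                printf("%d\n", (int)lkid_new(5, 0x8000)); /* positive */
                return 0;
        }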

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Add code for purging orphan locks. A process can also purge all of its
    own non-orphan locks by passing a pid of zero. Code already exists for
    processes to create persistent locks that become orphans when the process
    exits, but the complementary capability for another process to then purge
    these orphans has been missing.
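
    A usage sketch, assuming libdlm's purge entry point takes a node and a
    pid (0 here taken to mean the local node):

        /* purge the orphan locks left behind by a dead process */
        dlm_ls_purge(ls, 0, dead_pid);

        /* a process purges all of its own non-orphan locks */
        dlm_ls_purge(ls, 0, 0);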

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • This splits the current create_message() function into two parts so that
    later patches can call the new lower-level _create_message() function when
    they don't have an rsb struct. No functional change in this patch.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Full cancel and force-unlock support. In the past, cancel and force-unlock
    wouldn't work if there was another operation in progress on the lock. Now,
    both cancel and force-unlock can overlap an operation on a lock, meaning
    there may be 2 or 3 operations in progress on a lock in parallel. This
    support is important not only because cancel and force-unlock are explicit
    operations that an app can use, but because both are used implicitly when a
    process exits while holding locks. (A sketch of the overlap bookkeeping
    follows the summary below.)

    Summary of changes:

    - add-to and remove-from waiters functions were rewritten to handle situations
    with more than one remote operation outstanding on a lock

    - validate_unlock_args detects when an overlapping cancel/force-unlock
    can be sent and when it needs to be delayed until a request/lookup
    reply is received

    - processing request/lookup replies detects when a cancel/force-unlock
    occurred during the op, and carries out the delayed cancel/force-unlock

    - manipulation of the "waiters" (remote operation) state of a lock moved under
    the standard rsb mutex that protects all the other lock state

    - the two recovery routines related to locks on the waiters list changed
    according to the way lkb's are now locked before accessing waiters state

    - waiters recovery detects when lkb's being recovered have an overlapping
    cancel/force-unlock, and may not recover such locks

    - revert_lock (cancel) returns a value to distinguish cases where it did
    nothing vs cases where it actually did a cancel; the cancel completion ast
    should only be done when cancel did something

    - orphaned locks put on new list so they can be found later for purging

    - cancel must be called on a lock when making it an orphan

    - flag user locks (ENDOFLIFE) at the end of their useful life (to the
    application) so we can return an error for any further cancel/force-unlock

    - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
    either a completion or blocking ast

    - clear an unread bast on a lock that's become unlocked
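
    A sketch of the overlap bookkeeping referenced above (flag and field
    names assumed; context condensed):

        /* a reply for an earlier operation is still outstanding: record
           the overlapping op and carry it out when that reply arrives */
        if (lkb->lkb_wait_type) {
                if (mstype == DLM_MSG_CANCEL)
                        lkb->lkb_flags |= DLM_IFL_OVERLAP_CANCEL;
                else
                        lkb->lkb_flags |= DLM_IFL_OVERLAP_UNLOCK;
                return -EBUSY;  /* delay until the reply is processed */
        }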

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

06 Feb, 2007

9 commits

  • A new lvb for a userland lock wasn't being initialized to zero.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • A long, complicated sequence of events, beginning with the RESEND flag not
    being cleared on an lkb, can result in an unlock never completing.

    - lkb on waiters list for remote lookup
    - the remote node is both the dir node and the master node, so
    it optimizes the lookup into a request and sends a request
    reply back
    - the request reply is saved on the requestqueue to be processed
    after recovery
    - recovery runs dlm_recover_waiters_pre() which sets RESEND flag
    so the lookup will be resent after recovery
    - end of recovery: process_requestqueue takes the saved request reply,
    which removes the lkb from the waiters list _without_ clearing
    the RESEND flag
    - end of recovery: dlm_recover_waiters_post() doesn't do anything
    with the now completed lookup lkb (would usually clear RESEND)
    - later, the node unmounts and unlocks this lkb, which still has the
    RESEND flag set
    - the lkb is on the waiters list again, now for unlock; when recovery
    occurs, dlm_recover_waiters_pre() sees the lkb for unlock with RESEND
    set and does nothing since the master still exists
    - end of recovery: dlm_recover_waiters_post() takes this lkb off
    the waiters list because it has the RESEND flag set, then reports
    an error because unlocks are never supposed to be handled in
    recover_waiters_post().
    - later, the unlock reply is received, doesn't find the lkb on
    the waiters list because recover_waiters_post() has wrongly
    removed it.
    - the unlock operation has been lost, and we're left with a
    stray granted lock
    - unmount spins waiting for the unlock to complete

    The visible evidence of this problem will be a node where gfs umount is
    spinning, the dlm waiters list will be empty, and the dlm locks list will
    show a granted lock.

    The fix is simply to clear the RESEND flag when taking an lkb off the
    waiters list.
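
    The shape of the fix, condensed (list and flag names follow the dlm's
    conventions):

        /* every path that takes an lkb off the waiters list now clears
           RESEND, so a stale flag can't mislead a later recovery */
        list_del_init(&lkb->lkb_wait_reply);
        lkb->lkb_flags &= ~DLM_IFL_RESEND;      /* previously left set */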

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • dlm_receive_message() returns 0 instead of returning 'error'. What would
    happen is that process_requestqueue would take a saved message off the
    requestqueue and call receive_message on it. receive_message would then
    see that recovery had been aborted, set error to EINTR, and 'goto out',
    expecting that the error would be returned. Instead, 0 was always
    returned, so process_requestqueue would think that the message had been
    processed and delete it instead of saving it to process next time. This
    means the message (usually an unlock in my tests) would be lost.
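
    A condensed sketch of the corrected shape (details elided):

        static int receive_sketch(struct dlm_ls *ls, struct dlm_message *ms)
        {
                int error = 0;

                if (dlm_recovery_stopped(ls)) {
                        error = -EINTR; /* caller must save and retry */
                        goto out;
                }
                /* ... dispatch the message ... */
         out:
                return error;   /* was "return 0", losing the -EINTR */
        }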

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • When a user process exits, we clear all the locks it holds. There is a
    problem, though, with locks that the process had begun unlocking before it
    exited. We couldn't find the lkb's that were in the process of being
    unlocked remotely, to flag that they are DEAD. To solve this, we move
    lkb's being unlocked onto a new list in the per-process structure that
    tracks what locks the process is holding. We can then go through this
    list to flag the necessary lkb's when clearing locks for a process when it
    exits.
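
    A sketch of the bookkeeping (list and flag names per the dlm's
    per-process structures; locking elided):

        /* when an unlock begins, move the lkb to a separate list so
           exit-time cleanup can still find it */
        list_move(&lkb->lkb_ownqueue, &proc->unlocking);

        /* in the exit path: flag everything still mid-unlock as DEAD */
        list_for_each_entry(lkb, &proc->unlocking, lkb_ownqueue)
                lkb->lkb_flags |= DLM_IFL_DEAD;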

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
    can use macros to add configfs functions to access them (in a later
    patch). No functional changes in this patch, just naming changes.
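
    An illustration of the kind of macro the common prefix enables (the
    macro itself is made up; only the ci_ naming is from the patch):

        /* one definition generates a matching show handler per field */
        #define CONFIG_SHOW(name)                                       \
        static ssize_t name##_show(char *buf)                           \
        {                                                               \
                return sprintf(buf, "%d\n", dlm_config.ci_##name);      \
        }

        CONFIG_SHOW(buffer_size)        /* dlm_config.ci_buffer_size */
        CONFIG_SHOW(toss_secs)          /* dlm_config.ci_toss_secs */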

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • When the dlm fakes an unlock/cancel reply from a failed node using a stub
    message struct, it wasn't setting the flags in the stub message. So, in
    the process of receiving the fake message, the lkb flags would be updated
    from the zero flags in the message, i.e. cleared. The problem observed in
    tests was the loss of the USER flag, which caused the dlm to think a user
    lock was a kernel lock and subsequently fail an assertion checking the
    validity of the ast/callback field.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • LVB's are not sent as part of new requests, but the code receiving the
    request was copying data into the lvb anyway. The space in the message
    where it mistakenly thought the lvb lived actually contained the resource
    name, so it wound up incorrectly copying this name data into the lvb. The
    fix is to just create the lvb and not copy junk into it.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • The send_args() function is used to copy parameters into a message for a
    number of different message types. Only some of those types are set up
    beforehand (in create_message) to include space for sending lvb data.
    send_args was wrongly copying the lvb for all message types as long as the
    lock had an lvb. This means that the lvb data was being written past the
    end of the message into unknown space.
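
    A condensed sketch of the corrected copy in send_args() (message type
    names per the dlm; only the lvb portion shown):

        /* only message types created with room for an lvb get lvb data */
        switch (ms->m_type) {
        case DLM_MSG_CONVERT:
        case DLM_MSG_UNLOCK:
        case DLM_MSG_REQUEST_REPLY:
        case DLM_MSG_CONVERT_REPLY:
        case DLM_MSG_GRANT:
                if (lkb->lkb_lvbptr)
                        memcpy(ms->m_extra, lkb->lkb_lvbptr, lvblen);
                break;
        }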

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • There's a chance the new master of a resource hasn't learned it's the new
    master before another node sends it a lock during recovery. The node
    sending the lock needs to resend if this happens.

    - A sends a master lookup for resource R to C
    - B sends a master lookup for resource R to C
    - C receives A's lookup, assigns A to be master of R and
    sends a reply back to A
    - C receives B's lookup and sends a reply back to B saying
    that A is the master
    - B receives lookup reply from C and sends its lock for R to A
    - A receives lock from B, doesn't think it's the master of R
    and sends an error back to B
    - A receives lookup reply from C and becomes master of R
    - B gets error back from A and resends its lock back to A
    (this resending is what this patch does)
    - A receives lock from B, it now sees it's the master of R
    and takes the lock

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

30 Nov, 2006

2 commits

  • RH BZ 211622

    The ALTMODE flag can be set in the lock master's copy of the lock but
    never cleared, so ALTMODE will also be returned in a subsequent conversion
    of the lock when it shouldn't be. This results in lock_dlm incorrectly
    switching to the alternate lock mode when returning the result to gfs
    which then asserts when it sees the wrong lock state. The fix is to
    propagate the cleared sbflags value to the master node when the lock is
    requested. QA's d_rwrandirectlarge test triggers this bug very quickly.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • Red Hat BZ 211914

    There's a race between dlm_recoverd (1) enabling locking and (2) clearing
    out the requestqueue, and dlm_recvd (1) checking if locking is enabled and
    (2) adding a message to the requestqueue. An order of recoverd(1),
    recvd(1), recvd(2), recoverd(2) will result in a message being left on the
    requestqueue. The fix is to have dlm_recvd check, after taking the
    requestqueue mutex, whether dlm_recoverd has enabled locking, and if it
    has, process the message instead of queueing it.
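
    A sketch of dlm_recvd's side of the fix (names per the dlm; error
    handling elided):

        mutex_lock(&ls->ls_requestqueue_mutex);
        if (!dlm_locking_stopped(ls)) {
                /* recoverd has already drained the queue and re-enabled
                   locking; it won't look again, so have the caller
                   process this message now instead of queueing it */
                mutex_unlock(&ls->ls_requestqueue_mutex);
                return -EAGAIN;
        }
        list_add_tail(&e->list, &ls->ls_requestqueue);
        mutex_unlock(&ls->ls_requestqueue_mutex);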

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

25 Sep, 2006

1 commit


09 Sep, 2006

1 commit

  • Fixing the following scenario:
    - A request is on the waiters list waiting for a reply from a remote node.
    - The request is the first one on the resource, so first_lkid is set.
    - The remote node fails causing recovery.
    - During recovery the requesting node becomes master.
    - The request is now processed locally instead of being a remote operation.
    - At this point we need to call confirm_master() on the resource since
    we're certain we're now the master node. This will clear first_lkid.
    - We weren't calling confirm_master(), so first_lkid was not being cleared
    causing subsequent requests on that resource to get stuck.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

24 Aug, 2006

1 commit

  • The down-conversion optimization was resulting in the lkb flags being
    cleared because the stub message reply had no flags value set. Copy the
    current flags into the stub message so they'll be copied back into the lkb
    as part of processing the fake reply. Also add an assertion to catch this
    error more directly if it exists elsewhere.
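
    The shape of the fix (a sketch; stub-message field names assumed):

        /* fill in the stub reply's flags from the lkb, so receiving the
           fake reply copies the same flags back instead of zeroes */
        ls->ls_stub_ms.m_type  = DLM_MSG_CONVERT_REPLY;
        ls->ls_stub_ms.m_flags = lkb->lkb_flags;  /* previously unset */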

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

23 Aug, 2006

2 commits


21 Aug, 2006

1 commit

  • Introduce new function dlm_dump_rsb() to call within assertions instead of
    dlm_print_rsb(). The new function dumps info about all locks on the rsb
    in addition to rsb details.
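
    A usage sketch (DLM_ASSERT takes a statement to execute when the
    assertion fails):

        /* on failure, dump the rsb and every lock queued on it */
        DLM_ASSERT(is_master(r), dlm_dump_rsb(r););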

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     

08 Aug, 2006

1 commit

  • This patch fixes the userland DLM unlock code so that it correctly returns the
    address of the userland lock status block in its completion AST.

    It fixes bug #201348

    Signed-Off-By: Patrick Caulfield
    Signed-off-by: Steven Whitehouse

    Patrick Caulfield
     

26 Jul, 2006

2 commits


20 Jul, 2006

2 commits


13 Jul, 2006

1 commit

  • This changes the way the dlm handles user locks. The core dlm is now
    aware of user locks, so they can be dealt with more efficiently. There is
    no longer a separate dlm_device module, which previously managed its own
    duplicate copy of every user lock.

    Signed-off-by: Patrick Caulfield
    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland