24 May, 2011

2 commits

  • Found these with the help of ispell -l.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Lars Ellenberg
    Signed-off-by: Philipp Reisner

    Bart Van Assche
     
  • An administrative detach used to request a state change directly to D_DISKLESS,
    first suspending IO to avoid the last put_ldev() occurring from an endio handler,
    potentially in irq context.

    This is not enough on the receiving side (typically secondary): we may miss
    some peer_req on the way to local disk, which then may do the last put_ldev()
    from their drbd_peer_request_endio().

    This patch makes the detach always go through the intermediate D_FAILED state.
    We may consider renaming it to D_DETACHING.

    An alternative approach would be to create yet another work item to be scheduled
    on the worker, do the destructor work from there, and get the timing right.

    Manually picked commit 564040f from the drbd 8.4 branch.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Lars Ellenberg
     

31 Mar, 2011

1 commit


10 Mar, 2011

7 commits


22 Oct, 2010

1 commit


15 Oct, 2010

4 commits

  • This adds a necessary race breaker to these commits:
    drbd: fix for possible deadlock on IO error during resync
    drbd: drop wrong debug asserts, fix recently introduced race

    What we do is get a refcount, check the state, then depending on the
    state and the requested minimum disk state, either hold it (success),
    or give it back immediately (failed "try lock").

    Some code paths (flushing of drbd metadata) may still grab and hold a
    refcount even if we are D_FAILED (application IO won't).
    So even if we hit local_cnt == 0 once after being D_FAILED,
    we still need to wait for that again after we changed to D_DISKLESS.
    Once local_cnt reaches 0 while we are D_DISKLESS, we can be sure that
    no one will look at the protected members anymore, so only then is it
    safe to free them.

    We cannot easily convert to standard locking primitives here, as we want
    to be able to use it in atomic context (we always do a "try lock"),
    as well as hold references for a "long time" (from IO submission to
    completion callback).
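
    The "try lock" described above could be sketched roughly as follows.
    This is a simplified userspace illustration using C11 atomics, not the
    driver's code: the struct, the helper names (ldev_try_get, ldev_put,
    ldev_safe_to_free) and the reduced set of disk states are assumptions
    made for the example.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Reduced, illustrative subset of disk states, lowest first. */
    enum disk_state { D_DISKLESS, D_ATTACHING, D_FAILED, D_UP_TO_DATE };

    /* Hypothetical stand-in for the per-device structure. */
    struct ldev_sketch {
        atomic_int local_cnt;   /* references to the local disk */
        atomic_int disk;        /* current enum disk_state */
    };

    /* "Try lock": take a reference first, then check the state.
     * Keep the reference on success, give it back at once on failure. */
    static bool ldev_try_get(struct ldev_sketch *d, enum disk_state min)
    {
        atomic_fetch_add(&d->local_cnt, 1);
        if (atomic_load(&d->disk) >= (int)min)
            return true;
        atomic_fetch_sub(&d->local_cnt, 1);
        return false;
    }

    /* Drop a reference obtained by a successful ldev_try_get(). */
    static void ldev_put(struct ldev_sketch *d)
    {
        atomic_fetch_sub(&d->local_cnt, 1);
    }

    /* Protected members may only be freed once the count is zero
     * while the device is already D_DISKLESS. */
    static bool ldev_safe_to_free(struct ldev_sketch *d)
    {
        return atomic_load(&d->disk) == D_DISKLESS &&
               atomic_load(&d->local_cnt) == 0;
    }
    ```

    In this sketch a caller asking for D_FAILED as the minimum state (as a
    metadata flush might) can still obtain a reference after the disk has
    failed, which is why a zero count seen in D_FAILED is not yet
    sufficient to free anything.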

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Lars Ellenberg
     
  • Various cleanup paths have been incomplete, for the very unlikely case
    that we cannot allocate enough bios from process context when submitting
    on behalf of the peer or resync process.

    Never observed.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Lars Ellenberg
     
  • There are three ways to get IO suspended:

    * Loss of any access to data
    * Fence-peer-handler running
    * User requested to suspend IO

    Track those in different bits, so that one condition clearing its
    state bit does not interfere with the other two conditions.

    Only when the user resumes IO are all three bits overruled.

    The fact is hidden from the user, he sees only a single suspend
    bit.
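
    A minimal sketch of the idea, assuming one flag per suspend reason.
    The names below (SUSP_NO_DATA, SUSP_FENCING, SUSP_USER and the helper
    functions) are hypothetical and chosen for illustration; the commit
    text does not show the actual fields.

    ```c
    #include <stdbool.h>

    /* One bit per reason for suspending IO (hypothetical names). */
    enum susp_reason {
        SUSP_NO_DATA = 1u << 0,  /* loss of any access to data */
        SUSP_FENCING = 1u << 1,  /* fence-peer handler running */
        SUSP_USER    = 1u << 2   /* user requested to suspend IO */
    };

    struct io_state_sketch {
        unsigned int susp;       /* OR of active susp_reason bits */
    };

    /* A condition sets or clears only its own bit. */
    static void susp_set(struct io_state_sketch *s, enum susp_reason r)
    {
        s->susp |= (unsigned int)r;
    }

    static void susp_clear(struct io_state_sketch *s, enum susp_reason r)
    {
        s->susp &= ~(unsigned int)r;
    }

    /* An explicit resume by the user overrules all three bits. */
    static void susp_user_resume(struct io_state_sketch *s)
    {
        s->susp = 0;
    }

    /* What the user sees: a single "suspended" flag. */
    static bool io_suspended(const struct io_state_sketch *s)
    {
        return s->susp != 0;
    }
    ```

    With separate bits, the fence-peer handler finishing does not resume
    IO while access to data is still lost.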

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Philipp Reisner
     
  • Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Philipp Reisner
     

14 Oct, 2010

2 commits


08 Aug, 2010

1 commit

  • It was a now abandoned attempt to throttle resync bandwidth
    based on the delay it causes on the bulk data socket.
    It has no userbase yet, and has been disabled by
    9173465ccb51c09cc3102a10af93e9f469a0af6f already.
    This removes the now unused code.

    The basic feature, namely using up "idle" bandwidth
    of network and disk IO subsystem, with minimal impact
    to application IO, is being reimplemented differently.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg
    Signed-off-by: Jens Axboe

    Lars Ellenberg
     

14 Jun, 2010

1 commit

  • This was a very hard to trigger race condition.

    If we got a state packet from the peer, after drbd_nl_disk() has
    already changed the disk state to D_NEGOTIATING but
    after_state_ch() was not yet run by the worker, then receive_state()
    might have called drbd_sync_handshake(), which in turn crashed
    when accessing p_uuid.

    Signed-off-by: Philipp Reisner
    Signed-off-by: Lars Ellenberg

    Philipp Reisner
     

01 Jun, 2010

1 commit


22 May, 2010

1 commit


18 May, 2010

1 commit


11 Mar, 2010

1 commit


12 Jan, 2010

1 commit


26 Nov, 2009

1 commit


25 Nov, 2009

1 commit


04 Nov, 2009

1 commit


06 Oct, 2009

1 commit


02 Oct, 2009

1 commit