23 Oct, 2015

1 commit

  • commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream.

    The order of the following three spinlocks should be:
    dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock

    But dlm_dispatch_assert_master() is called while holding
    dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
    dlm_grab() which will take dlm_domain_lock.

    Once another thread (for example, dlm_query_join_handler) has already
    taken dlm_domain_lock and then tries to take dlm_ctxt->spinlock, a
    deadlock happens.

    Signed-off-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: "Junxiao Bi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
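
    A minimal sketch of the lock-ordering problem described in this entry,
    using stand-in locks instead of the real dlm structures (illustrative
    only, not the ocfs2 code):

        #include <linux/spinlock.h>

        /* Agreed order: lock_a -> lock_b -> lock_c, standing in for
         * dlm_domain_lock -> dlm_ctxt->spinlock -> dlm_lock_resource->spinlock. */
        static DEFINE_SPINLOCK(lock_a);
        static DEFINE_SPINLOCK(lock_b);
        static DEFINE_SPINLOCK(lock_c);

        static void follows_the_order(void)     /* like dlm_query_join_handler */
        {
                spin_lock(&lock_a);
                spin_lock(&lock_b);             /* waits if the other path holds b */
                spin_unlock(&lock_b);
                spin_unlock(&lock_a);
        }

        static void violates_the_order(void)    /* like dlm_dispatch_assert_master */
        {
                spin_lock(&lock_b);
                spin_lock(&lock_c);
                spin_lock(&lock_a);             /* a taken after b/c: can deadlock
                                                 * against the path above */
                spin_unlock(&lock_a);
                spin_unlock(&lock_c);
                spin_unlock(&lock_b);
        }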
     

06 May, 2015

1 commit

  • There is a race window in dlm_get_lock_resource() which may return a
    lock resource that has already been purged. This causes the process to
    hang forever in dlmlock(), as the AST message cannot be handled because
    its lock resource no longer exists.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
            spin_unlock(&dlm->spinlock);
            >>>>>>>> race window: dlm_run_purge_list() may run and purge
                     the lock resource
            spin_lock(&tmpres->spinlock);
            ...
            spin_unlock(&tmpres->spinlock);
        }
    }

    Signed-off-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
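
    A generic sketch of this kind of lookup race and one common way to close
    such a window, using stand-in structures and a hypothetical lookup_res()
    helper (a sketch of the pattern, not the actual ocfs2 fix):

        #include <linux/spinlock.h>

        struct res {
                spinlock_t spinlock;
                /* ... */
        };

        struct table {
                spinlock_t spinlock;
                /* ... */
        };

        /* hypothetical helper: find a resource by name, caller holds t->spinlock */
        struct res *lookup_res(struct table *t, const char *name);

        static void racy(struct table *t, const char *name)
        {
                struct res *r;

                spin_lock(&t->spinlock);
                r = lookup_res(t, name);
                spin_unlock(&t->spinlock);
                /* <-- window: a purge path may free/unhash r right here */
                spin_lock(&r->spinlock);
                spin_unlock(&r->spinlock);
        }

        static void closed(struct table *t, const char *name)
        {
                struct res *r;

                spin_lock(&t->spinlock);
                r = lookup_res(t, name);
                if (r)
                        spin_lock(&r->spinlock);  /* grab it before the window opens */
                spin_unlock(&t->spinlock);
                if (!r)
                        return;
                /* a purge path that honors r->spinlock cannot slip in here */
                spin_unlock(&r->spinlock);
        }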
     

11 Feb, 2015

4 commits

  • A tiny race between the BAST and the unlock message causes a NULL
    dereference.

    A node sends an unlock request to the master and receives a response.
    Before processing the response it receives a BAST from the master. Since
    the two requests are processed by different threads, this creates a
    race: while the BAST is being processed, the lock can get freed by the
    unlock code.

    This patch makes the BAST return immediately if the lock is found but an
    unlock is pending; the code should handle this race. We also have to fix
    the master node to skip sending the BAST after receiving the unlock
    message.

    Below is the crash stack:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    IP: o2dlm_blocking_ast_wrapper+0xd/0x16
    dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
    dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
    o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
    o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
    worker_thread+0x14d/0x1ed

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Srinivas Eeda
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srinivas Eeda
     
  • Remove dlm_joined() that is not used anywhere.

    This was partially found by using a static code analysis program called
    cppcheck.

    Signed-off-by: Rickard Strandqvist
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rickard Strandqvist
     
  • Use snprintf format specifier "%lu" instead of "%ld" for argument of type
    'unsigned long'.

    Signed-off-by: Alex Chen
    Reviewed-by: Joseph Qi
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
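
    A tiny standalone illustration of the format-specifier fix (plain
    userspace C with a made-up variable, not the ocfs2 code):

        #include <stdio.h>

        int main(void)
        {
                unsigned long migration_pending = 3;
                char buf[64];

                /* "%ld" reads the bits as signed long; "%lu" matches the
                 * unsigned long argument and is what the patch switches to. */
                snprintf(buf, sizeof(buf), "pending=%lu", migration_pending);
                puts(buf);
                return 0;
        }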
     
  • When the recovery master is down, the owner of $RECOVERY calls
    dlm_do_local_recovery_cleanup() to prune any $RECOVERY entries for dead
    nodes. The lock is in the granted list and its refcount must be 2. We
    should put it twice to remove this lock; otherwise, it will lead to a
    memory leak.

    Signed-off-by: joyce.xue
    Reported-by: yangwenfang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     

09 Jan, 2015

1 commit

  • In dlm_process_recovery_data, ret is set to -ENOMEM only when
    dlm_new_lock fails, and in that case newlock is definitely NULL. So the
    test on newlock is meaningless; remove it.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
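
    A small standalone sketch of why such a test is dead code (stand-in
    names, not the ocfs2 functions):

        #include <errno.h>
        #include <stdlib.h>

        struct lock { int level; };

        /* stand-in for dlm_new_lock(): returns NULL on allocation failure */
        static struct lock *new_lock(void)
        {
                return calloc(1, sizeof(struct lock));
        }

        static int process(void)
        {
                struct lock *newlock;
                int ret = 0;

                newlock = new_lock();
                if (!newlock) {
                        ret = -ENOMEM;
                        goto leave;     /* newlock is NULL on every -ENOMEM path */
                }
                /* ... use newlock, hand it off ... */
                free(newlock);
        leave:
                /* a trailing "if (ret == -ENOMEM && newlock) free(newlock);"
                 * could never free anything, which is why removing the
                 * newlock test changes nothing */
                return ret;
        }

        int main(void) { return process() ? 1 : 0; }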
     

19 Dec, 2014

1 commit

  • Commit ac4fef4d23ed ("ocfs2/dlm: do not purge lockres that is queued for
    assert master") may have the following possible race case:

    dlm_dispatch_assert_master                  dlm_wq
    ========================================================================
    queue_work(dlm->quedlm_worker,
        &dlm->dispatched_work);
                                                dispatch work,
                                                dlm_lockres_drop_inflight_worker
                                                *BUG_ON(res->inflight_assert_workers == 0)*
    dlm_lockres_grab_inflight_worker
    inflight_assert_workers++

    So make sure inflight_assert_workers is increased before the work is
    queued.

    Signed-off-by: Joseph Qi
    Signed-off-by: Xue jiufei
    Cc: Joel Becker
    Reviewed-by: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
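
    A sketch of the ordering fix with stand-in names (wq, work and res are
    placeholders, not the exact ocfs2 fields):

        /* Dispatcher: bump the in-flight count before queueing the work, so
         * the worker can never observe it at zero. */
        spin_lock(&res->spinlock);
        res->inflight_assert_workers++;
        spin_unlock(&res->spinlock);

        queue_work(wq, &work);

        /* Worker side: */
        spin_lock(&res->spinlock);
        BUG_ON(res->inflight_assert_workers == 0);   /* safe with the order above */
        res->inflight_assert_workers--;
        spin_unlock(&res->spinlock);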
     

11 Dec, 2014

3 commits

  • In commit 1faf289454b9 ("ocfs2_dlm: disallow a domain join if node maps
    mismatch") we introduced an earlier NULL check, so this one is not
    needed. Also, static checkers complain because we dereference the
    pointer first and only then check it for NULL.

    Signed-off-by: Dan Carpenter
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
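
    A standalone illustration of the "dereference, then check" pattern that
    static checkers flag (stand-in types, not the dlm code):

        #include <stdio.h>

        struct ctxt { int node_count; };

        /* buggy shape: c is dereferenced before it is tested, so the test
         * can never save us from a NULL pointer */
        static int buggy(struct ctxt *c)
        {
                int n = c->node_count;

                if (!c)
                        return -1;      /* dead code */
                return n;
        }

        /* either check before the dereference, or (as in this patch) drop
         * the late check because an earlier one already guarantees non-NULL */
        static int fixed(struct ctxt *c)
        {
                return c ? c->node_count : -1;
        }

        int main(void)
        {
                struct ctxt c = { .node_count = 2 };

                printf("%d %d\n", buggy(&c), fixed(&c));
                return 0;
        }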
     
  • Node A sends a master query request to node B, which is the master. At
    this time the lockres happens to be on the purge list.
    dlm_master_request_handler gets the dlm spinlock, finds the resource and
    releases the dlm spinlock. Right at this moment, dlm_thread on this node
    could purge the lockres. dlm_master_request_handler can then acquire the
    lockres spinlock and reply to node A that node B is the master, even
    though the lockres on node B has been purged.

    The above scenario makes node A falsely think node B is the master,
    which is inconsistent. Further, if another node C tries to master the
    same resource, every node will respond that it is not the master. Node C
    then masters the resource and sends an assert master to all nodes. This
    makes node A crash with the following message.

    dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current
    owner is 10!

    Signed-off-by: Srinivas Eeda
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Wengang Wang
    Tested-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srinivas Eeda
     
  • Do not BUG() if the GFP_ATOMIC allocation fails in
    dlm_dispatch_assert_master. Instead, return -ENOMEM to the sender, which
    will then retry.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
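
    A kernel-style sketch of such an error path (the struct and function
    names here are stand-ins, not the ocfs2 ones):

        static int dispatch_assert(struct dlm_work *w)   /* stand-in type */
        {
                struct dlm_work_item *item;              /* stand-in type */

                /* GFP_ATOMIC can fail under memory pressure; that is not a
                 * programming error, so don't BUG() on it */
                item = kmalloc(sizeof(*item), GFP_ATOMIC);
                if (!item)
                        return -ENOMEM;                  /* sender will retry */

                /* ... initialize item and queue the work ... */
                return 0;
        }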
     

10 Oct, 2014

5 commits

  • The following case may lead to o2net_wq and o2hb thread deadlock on
    o2hb_callback_sem.
    Say there are two nodes, N1 and N2, in the cluster. N2 goes down and, at
    the same time, N3 tries to join the cluster, so N1 handles the node down
    (N2) and the join (N3) simultaneously.
    o2hb                                        o2net_wq
    ->o2hb_do_disk_heartbeat
    ->o2hb_check_slot
    ->o2hb_run_event_list
    ->o2hb_fire_callbacks
    ->down_write(&o2hb_callback_sem)
    ->o2net_hb_node_down_cb
    ->flush_workqueue(o2net_wq)
                                                ->o2net_process_message
                                                ->dlm_query_join_handler
                                                ->o2hb_check_node_heartbeating
                                                ->o2hb_fill_node_map
                                                ->down_read(&o2hb_callback_sem)

    There is no need to take o2hb_callback_sem in dlm_query_join_handler;
    o2hb_live_lock is enough to protect the live node map.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: jiangyiwen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Reduce boilerplate code by using seq_open_private() instead of seq_open()

    Signed-off-by: Rob Jones
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Jones
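
    A sketch of the seq_open_private() pattern (debug_seq_ops and struct
    debug_priv are assumed names for this example, not the ocfs2 ones):

        #include <linux/fs.h>
        #include <linux/seq_file.h>

        /* debug_seq_ops and struct debug_priv are assumed to be defined
         * elsewhere with the usual start/next/stop/show callbacks */

        static int debug_open(struct inode *inode, struct file *file)
        {
                /* replaces the open-coded kmalloc() + seq_open() +
                 * "seq->private = priv" boilerplate */
                return seq_open_private(file, &debug_seq_ops,
                                        sizeof(struct debug_priv));
        }

        static const struct file_operations debug_fops = {
                .owner   = THIS_MODULE,
                .open    = debug_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = seq_release_private,  /* frees the private buffer */
        };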
     
  • Remove the branch that frees res->lockname.name, because the condition
    is never satisfied when jumping to the error label.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • dlm_lockres_put() should be called without &res->spinlock held,
    otherwise a deadlock may happen:

    spin_lock(&res->spinlock)
    ...
    dlm_lockres_put
      ->dlm_lockres_release
        ->dlm_print_one_lock_resource
          ->spin_lock(&res->spinlock)

    Signed-off-by: Alex Chen
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
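
    A short sketch of the safe calling order (illustrative fragment; res is
    just a stand-in pointer):

        spin_lock(&res->spinlock);
        /* ... last use of the resource under the lock ... */
        spin_unlock(&res->spinlock);

        /* the final put may end up in dlm_lockres_release() ->
         * dlm_print_one_lock_resource(), which takes res->spinlock itself,
         * so it must run after the unlock above */
        dlm_lockres_put(res);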
     
  • Refactor the error handling in dlm_alloc_ctxt to simplify the code.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
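
    A generic sketch of the centralized-cleanup shape such a refactoring
    usually ends up with (the struct and its fields are stand-ins, not the
    actual dlm_alloc_ctxt):

        struct ctxt {
                char *name;
                void *recovery_map;
        };

        static struct ctxt *alloc_ctxt(const char *domain)
        {
                struct ctxt *c;

                c = kzalloc(sizeof(*c), GFP_KERNEL);
                if (!c)
                        return NULL;

                c->name = kstrdup(domain, GFP_KERNEL);
                if (!c->name)
                        goto err;

                c->recovery_map = kzalloc(PAGE_SIZE, GFP_KERNEL);
                if (!c->recovery_map)
                        goto err;

                return c;

        err:
                /* one exit path; kfree(NULL) is a no-op, so partially
                 * initialized contexts unwind correctly */
                kfree(c->name);
                kfree(c);
                return NULL;
        }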
     

26 Sep, 2014

1 commit

  • There is a deadlock case which was reported by Guozhonghua:
    https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

    This case is caused by &res->spinlock and &dlm->master_lock
    misordering in different threads.

    It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
    helpers"). Since the lockres is new, it doesn't require &res->spinlock,
    so remove it.

    Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
    Signed-off-by: Joseph Qi
    Reviewed-by: joyce.xue
    Reported-by: Guozhonghua
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

07 Aug, 2014

2 commits

  • Orabug: 19074140

    When umount is issued during recovery on the new master that has not
    finished remastering locks, it triggers BUG() in
    dlm_send_mig_lockres_msg(). Here is the situation:

    1) node A has a lock on resource X mastered by node B.

    2) node B dies -> node A sets recovering flag for res X

    3) Node C becomes the new master for resources owned by the
    dead node and is remastering locks of the dead node but
    has not finished the remastering process yet.

    4) umount is issued on node C.

    5) During processing of the umount, ignoring the unfinished recovery,
    node C attempts to migrate resource X to node A.

    6) node A finds res X in DLM_LOCK_RES_RECOVERING state, considers
    it a logic error and sends back -EFAULT.

    7) node C asserts BUG() upon seeing the -EFAULT response from node A.

    Fix is to delay migrating res X till remastering is finished at which
    point recovering flag will be cleared on both A and C.

    Signed-off-by: Tariq Saeed
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tariq Saeed
     
  • The unit of total_backoff is msecs, not jiffies, so there is no need to
    do the conversion. Otherwise, the join timeout is not 90 sec.

    Signed-off-by: Yiwen Jiang
    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
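
    A sketch of the comparison in question (the limit name is made up for
    this example; the point is that both sides are already milliseconds):

        #define JOIN_TIMEOUT_MSECS      (90 * 1000)      /* 90 seconds */

        /* buggy shape: total_backoff (msecs) compared against jiffies
         *     if (total_backoff > msecs_to_jiffies(JOIN_TIMEOUT_MSECS))
         * fixed shape: compare milliseconds with milliseconds */
        if (total_backoff > JOIN_TIMEOUT_MSECS) {
                status = -ERESTARTSYS;
                goto bail;
        }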
     

24 Jun, 2014

4 commits

  • When the workqueue is delayed, it may happen that a lockres is purged
    while it is still queued for a master assert. This may trigger a BUG()
    as follows.

    N1                                          N2
    dlm_get_lockres()
    ->dlm_do_master_requery
                                                is the master of lockres,
                                                so queue assert_master work

                                                dlm_thread() starts running
                                                and purges the lockres

                                                dlm_assert_master_worker()
                                                sends assert master message
                                                to other nodes
    receiving the assert_master
    message, set master to N2
    dlmlock_remote() then sends a create_lock message to N2 but receives
    DLM_IVLOCKID; if it is the RECOVERY lockres, this triggers the BUG().

    Another BUG() is triggered when N3 becomes the new master and sends
    assert_master to N1: N1 will trigger the BUG() because the owner doesn't
    match. So we should not purge a lockres while it is queued for assert
    master.

    Signed-off-by: joyce.xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • The following case may lead to endless loop during umount.

    node A           umount volume, migrate lockres1 to B
    another node     wants to lock lockres1, sends MASTER_REQUEST_MSG to C
    node C           init block mle
                     MIGRATE_REQUEST_MSG is sent to C
    node C           finds the block mle, and then returns
                     DLM_MIGRATE_RESPONSE_MASTERY_REF to B
    node B           sets C in refmap
    node A           umounts successfully
    node B           tries to umount; an endless loop occurs when migrating
                     lockres1 since C is in the refmap

    So we can fix this endless loop case by only returning
    DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
    MIGRATE_REQUEST_MSG.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: jiangyiwen
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Xue jiufei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • When a lockres is on the purge list but still in use, it should be
    moved to the tail of the purge list, and dlm_thread should continue to
    check the next lockres on the purge list. However, the code
    list_move_tail(&dlm->purge_list, &lockres->purge) does *no* movement, so
    dlm_thread will purge the same lockres in this loop again and again. If
    it is in use for a long time, other lockres will not be processed.

    Signed-off-by: Yiwen Jiang
    Signed-off-by: joyce.xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
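
    The list_move_tail() calling convention, for reference (a sketch, with
    the buggy shape shown only as a comment):

        #include <linux/list.h>

        /* list_move_tail(entry, head) deletes @entry from its current list
         * and adds it at the tail of @head.  With the arguments swapped the
         * lockres entry never moves, so dlm_thread keeps re-examining it. */

        /* buggy:  list_move_tail(&dlm->purge_list, &lockres->purge); */
        list_move_tail(&lockres->purge, &dlm->purge_list);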
     
  • …letimeout closes conn

    Orabug: 18639535

    In a two-node cluster, both nodes hold a lock at PR level and both want
    to convert to EX at the same time. Master node 1 has sent a BAST and
    then closes the connection due to an idle timeout. Node 0 receives the
    BAST and sends an unlock request with the cancel flag but gets error
    -ENOTCONN. The problem is that this error is ignored in
    dlm_send_remote_unlock_request() on the **incorrect** assumption that
    the master is dead. See the NOTE in the comment for why it returns
    DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to send the convert
    (without the cancel flag), which fails with -ENOTCONN; it waits 5 sec
    and resends.

    This time it gets DLM_IVLOCKID from the master since the lock is not
    found on the grant queue; it had been moved to the converting queue in
    response to the convert PR->EX request. No way out.

    Node 1 (master)                             Node 0
    ===============                             ======

    lock mode PR                                PR

    convert PR -> EX
    mv grant -> convert and que BAST
    ...
                            <------------------ convert PR -> EX
    convert que looks like this:
    ((node 1, PR -> EX) (node 0, PR -> EX))
    ...
    BAST (want PR -> NL)
                            ------------------>
    ...
    idle timeout, conn closed
                                                ...
                                                In response to BAST,
                                                sends unlock with cancel convert flag
                                                gets -ENOTCONN. Ignores and
                                                sends remote convert request
                                                gets -ENOTCONN, waits 5 sec, retries
                                                ...
                                                reconnects
                            <------------------ convert req goes through on next try
    does not find lock on grant que
    status DLM_IVLOCKID
                            ------------------>
                                                ...

    No way out. Fix is to keep retrying unlock with cancel flag until it
    succeeds or the master dies.

    Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Tariq Saeed
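
    A rough sketch of the retry shape described in this entry
    (send_unlock_cancel() and master_is_dead() are placeholders, not the
    actual ocfs2 helpers):

        int ret;

        for (;;) {
                ret = send_unlock_cancel(master, lock);   /* placeholder */
                if (ret != -ENOTCONN)
                        break;                  /* got a real answer */
                if (master_is_dead(master))     /* placeholder heartbeat check */
                        break;                  /* recovery will clean up */
                msleep(100);                    /* wait for reconnect, retry */
        }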
     

05 Jun, 2014

4 commits

  • dlm_recovery_ctxt.received is unused.

    ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error
    handling code in ocfs2_super_lock() is unneeded.

    Signed-off-by: joyce.xue
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • We found a race when dlm recovery and node joining occur simultaneously
    and the network state is bad.

    N1                                        N4

    start joining dlm and send
    query join to all live nodes
                                              set joining node to N1, return OK
    send query join to other
    live nodes and it may take
    a while

    call dlm_send_join_assert()
    to send the assert join message;
    N2 is down, so keep trying to
    send the message to N2 until
    N2 is found to be down

    send assert join message to
    N3, but the connection with
    N3 is down, so it may take
    a while
                                              become the recovery master for N2
                                              and send begin reco message to other
                                              nodes in the domain map, but not N1
    connection with N3 is rebuilt,
    then send assert join to N4
                                              call dlm_assert_joined_handler(),
                                              add N1 to domain_map

                                              dlm recovery done, send finalize message
                                              to nodes in the domain map, including N1
    receiving the finalize message,
    trigger the BUG() because of a
    recovery master mismatch.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • We found a conversion deadlock when the owner of a lockres happens to
    crash before sending DLM_PROXY_AST_MSG for a downconverting lock. The
    situation is as follows:

    Node1                         Node2                         Node3
                                  the owner of lockresA
    lock_1 granted at EX mode
    and call ocfs2_cluster_unlock
    to decrease ex_holders.
                                                                converting lock_3 from
                                                                NL to EX
                                  send DLM_PROXY_AST_MSG
                                  to Node1, asking Node1
                                  to downconvert.
    receiving DLM_PROXY_AST_MSG,
    thread ocfs2dc sends
    DLM_CONVERT_LOCK_MSG
    to Node2 to downconvert
    lock_1 (EX->NL).
                                  lock_1 can be granted, so
                                  put it into the pending_asts
                                  list and return DLM_NORMAL.
                                  Then something happened
                                  and Node2 crashed.
    received DLM_NORMAL, waiting
    for DLM_PROXY_AST_MSG.
                                                                selected as the recovery
                                                                master, receiving the
                                                                migrated lock from Node1,
                                                                queue lock_1 at the tail
                                                                of the converting list.

    After dlm recovery, the converting list on the master of lockresA
    (Node3) will be: converting list head -> lock_3 (NL->EX) ->
    lock_1 (EX->NL). The requested mode of lock_3 is not compatible with the
    granted mode of lock_1, so it cannot be granted, and lock_1 cannot
    downconvert because the converting queue is strictly FIFO. So a deadlock
    is created. We think dlm_process_recovery_data() should queue_ast for
    lock_1 or alter the order of lock_1 and lock_3, so that dlm_thread can
    process lock_1 first. And if there are multiple downconverting locks,
    they must all convert from PR to NL, so there is no need to sort them.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • Static values are automatically initialized to NULL.

    Signed-off-by: Fabian Frederick
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

24 May, 2014

1 commit

  • In dlm_init, if creating dlm_lockname_cache fails in
    dlm_init_master_caches, dlm_lockres_cache, which was created before,
    will be destroyed twice. This causes the system to die when loading the
    modules.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
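
    A reduced sketch of the intended cleanup ownership, i.e. each cache is
    destroyed exactly once by the function that created it (names and sizes
    are stand-ins):

        static struct kmem_cache *lockres_cache;
        static struct kmem_cache *lockname_cache;

        static int init_master_caches(void)
        {
                lockname_cache = kmem_cache_create("lockname_cache",
                                                   64, 0, 0, NULL);
                if (!lockname_cache)
                        return -ENOMEM;  /* do NOT destroy lockres_cache here */
                return 0;
        }

        static int init_caches(void)
        {
                lockres_cache = kmem_cache_create("lockres_cache",
                                                  128, 0, 0, NULL);
                if (!lockres_cache)
                        return -ENOMEM;

                if (init_master_caches()) {
                        /* destroyed exactly once, by its creator */
                        kmem_cache_destroy(lockres_cache);
                        lockres_cache = NULL;
                        return -ENOMEM;
                }
                return 0;
        }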
     

04 Apr, 2014

4 commits

  • In dlm_query_region_handler(), if kmalloc fails, it will unlock
    dlm_domain_lock without having locked it first, and then a deadlock
    happens.

    Signed-off-by: Zhonghua Guo
    Signed-off-by: Joseph Qi
    Reviewed-by: Srinivas Eeda
    Tested-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhonghua Guo
     
  • There is a race window in dlm_do_recovery() between
    dlm_remaster_locks() and dlm_reset_recovery() when the recovery master
    has nearly finished the recovery process for a dead node. After the
    master sends the FINALIZE_RECO message in dlm_remaster_locks(), another
    node may become the recovery master for another dead node and send the
    BEGIN_RECO message to all nodes, including the old master. In the
    handler of this message on the old master, dlm_begin_reco_handler(),
    dlm->reco.dead_node and dlm->reco.new_master will be set to the second
    dead node and the new master; then in dlm_reset_recovery() these two
    variables will be reset to their default values. As a result the new
    recovery master can never finish the recovery process and hangs, and in
    the end the whole cluster hangs on recovery.

    old recovery master:                      new recovery master:
    dlm_remaster_locks()
                                              become recovery master for
                                              another dead node.
                                              dlm_send_begin_reco_message()
    dlm_begin_reco_handler()
    {
     if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
         return -EAGAIN;
     }
     dlm_set_reco_master(dlm, br->node_idx);
     dlm_set_reco_dead_node(dlm, br->dead_node);
    }
    dlm_reset_recovery()
    {
     dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
     dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
    }
                                              will hang in dlm_remaster_locks()
                                              for request dlm locks info

    Before sending the FINALIZE_RECO message, the recovery master should set
    DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery is
    done. This closes the race window, as BEGIN_RECO messages will not be
    handled before the DLM_RECO_STATE_FINALIZE flag is cleared.

    A similar race may happen between the new recovery master and a normal
    node which is in dlm_finalize_reco_handler(); fix that as well.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Srinivas Eeda
    Reviewed-by: Wengang Wang
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • This issue was introduced by commit 800deef3f6f8 ("ocfs2: use
    list_for_each_entry where benefical") in 2007, where it replaced
    list_for_each with list_for_each_entry. The variable "lock" will point
    to invalid data if the "tmpq" list is empty, and a panic will be
    triggered because of this. Sunil advised reverting it back, but the old
    version was also not right: at the end of the outer for loop, that
    list_for_each_entry will also leave "lock" pointing at invalid data, and
    then in the next iteration, if the "tmpq" list is empty, "lock" will be
    stale invalid data and cause the panic. So revert back to list_for_each
    and reset "lock" to NULL to fix this issue.

    Another concern is that this seems like it cannot happen, because the
    "tmpq" list should not be empty. Let me describe how it can.

    old lock resource owner (node 1):              migration target (node 2):

    imagine there's a lockres with an EX lock from
    node 2 in the granted list, and a NR lock from
    node x with convert_type EX in the converting
    list.

    dlm_empty_lockres() {
     dlm_pick_migration_target() {
       pick node 2 as target as its lock is
       the first one in granted list.
     }
     dlm_migrate_lockres() {
       dlm_mark_lockres_migrating() {
         res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
         wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
         // after the above code, we can not dirty lockres any more,
         // so dlm_thread shuffle list will not run
                                                   downconvert lock from EX to NR
                                                   upconvert lock from NR to EX
    <<< migration may schedule out here, then
    <<< node 2 send down convert request to convert type from EX to
    <<< NR, then send up convert request to convert type from NR to
    <<< EX, at this time, lockres granted list is empty, and two locks
    <<< in the converting list, node x up convert lock followed by
    <<< node 2 up convert lock.

         // will set lockres RES_MIGRATING flag, the following
         // lock/unlock can not run
         dlm_lockres_release_ast(dlm, res);
       }

       dlm_send_one_lockres()
                                                   dlm_process_recovery_data()
                                                    for (i = 0; i < mres->num_locks; i++)
                                                     if (ml->node == dlm->node_num)
                                                      for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                       ...
                                                       <<< lock is invalid as grant list is empty.
                                                      }
                                                      if (lock->ml.node != ml->node)
                                                       BUG() >>> crash here
     }

    I see the above locks status from a vmcore of our internal bug.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Wengang Wang
    Cc: Sunil Mushran
    Reviewed-by: Srinivas Eeda
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
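
    A sketch of the fixed iteration shape described above (field names
    follow the commit text; treat it as illustrative rather than the exact
    patch):

        struct dlm_lock *lock = NULL;
        struct list_head *iter;

        /* list_for_each() leaves "found or not" to an explicit pointer: on
         * an empty tmpq, or when nothing matches, lock stays NULL instead
         * of pointing at non-entry memory. */
        list_for_each(iter, tmpq) {
                lock = list_entry(iter, struct dlm_lock, list);
                if (lock->ml.cookie == ml->cookie)
                        break;
                lock = NULL;
        }

        if (!lock) {
                /* empty list or no match: bail out, do not dereference */
        }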
     
  • When mounting an ocfs2 volume, it first generates the file
    /sys/kernel/debug/o2dlm//dlm_state, and then launches the dlm thread. So
    the following sequence causes a null pointer dereference:
    dlm_debug_init -> access the file dlm_state, which calls
    dlm_state_print -> dlm_launch_thread

    Moving dlm_debug_init after dlm_launch_thread and
    dlm_launch_recovery_thread fixes this issue.

    Signed-off-by: Zongxun Wang
    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zongxun Wang
     

22 Jan, 2014

1 commit

  • The versioning information is confusing for end-users. The numbers are
    stuck at 1.5.0 when the tools version have moved to 1.8.2. Remove the
    versioning system in the OCFS2 modules and let the kernel version be the
    guide to debug issues.

    Signed-off-by: Goldwyn Rodrigues
    Acked-by: Sunil Mushran
    Cc: Mark Fasheh
    Acked-by: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

13 Nov, 2013

3 commits

  • Signed-off-by: Junxiao Bi
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • We trigger a bug in __dlm_lockres_reserve_ast() when we umount 4 nodes
    in parallel. The situation is as follows:

    1) Node A migrates all the lockres it owns (e.g. lockres A) to other
    nodes, say node B, when it umounts.

    2) Receiving the MIG_LOCKRES message from A, node B masters lockres A
    with the DLM_LOCK_RES_MIGRATING state set.

    3) Then we umount ocfs2 on node B. It should also migrate lockres A to
    another node, say node C. But now the DLM_LOCK_RES_MIGRATING state of
    lockres A is not cleared, and node B triggers the BUG on a lockres with
    state DLM_LOCK_RES_MIGRATING.

    Signed-off-by: Xuejiufei
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Cc: Tariq Saeed
    Cc: Srinivas Eeda
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • A parallel umount on 4 nodes triggered a bug in
    dlm_process_recovery_data(). Here's the situation:

    Receiving the MIG_LOCKRES message, a node processes the locks in the
    migratable lockres. It copies the lvb from the migratable lockres when
    processing the first valid lock.

    If there is a lock in the blocked list with the EX level, it triggers
    the BUG. Since valid lvbs are set when locks are granted with EX or PR
    levels, locks in the blocked list cannot have valid lvbs. Therefore I
    think we should skip the locks in the blocked list.

    Signed-off-by: Xuejiufei
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     

12 Sep, 2013

3 commits

  • dlm_do_local_recovery_cleanup() should force-clean the refmap if the
    owner of the lockres is UNKNOWN. Otherwise a node may hang when
    umounting the filesystem. Here's the situation:

    Node1                                       Node2
    dlmlock()
    -> dlm_get_lock_resource()
    send DLM_MASTER_REQUEST_MSG to
    other nodes.
                                                trying to master this lockres,
                                                return MAYBE.
    selected as the master of lockresA,
    set mle->master to Node1,
    and do assert_master,
    send DLM_ASSERT_MASTER_MSG to Node2.
                                                Node2 has interest in lockresA
                                                and returns
                                                DLM_ASSERT_RESPONSE_MASTERY_REF,
                                                then something happened and
                                                Node2 crashed.

    Receiving DLM_ASSERT_RESPONSE_MASTERY_REF, Node1 sets Node2 in the
    refmap and keeps sending DLM_ASSERT_MASTER_MSG to the other nodes.

    o2hb finds Node2 down and calls dlm_hb_node_down() -->
    dlm_do_local_recovery_cleanup(); the master of lockresA is still
    UNKNOWN, so there is no need to call dlm_free_dead_locks().

    The master of lockresA is then set to Node1, but Node2 still remains in
    the refmap.

    When Node1 umounts, it finds that the refmap of lockresA is not empty
    and attempts to migrate it to Node2. But Node2 is already down, so the
    umount hangs, trying to migrate lockresA again and again.

    Signed-off-by: joyce
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Jie Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • [dan.carpenter@oracle.com: fix up some NULL dereference bugs]
    Signed-off-by: Dong Fang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Jeff Liu
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dong Fang
     
  • dlm_request_all_locks() should deal with the status sent from the
    target node if DLM_LOCK_REQUEST_MSG is sent successfully, or the
    recovery master will fall into an endless loop, waiting for other nodes
    to send locks and DLM_RECO_DATA_DONE_MSG to it.

    NodeA                                       NodeB
                                                selected as recovery master
                                                dlm_remaster_locks()
                                                ->dlm_request_all_locks()
                                                send DLM_LOCK_REQUEST_MSG to NodeA

    It happened that NodeA cannot allocate memory when it processes this
    message: dlm_request_all_locks_handler() does not queue
    dlm_request_all_locks_worker and returns -ENOMEM. It will never send
    locks and DLM_RECO_DATA_DONE_MSG to NodeB.

                                                NodeB does not deal with the status
                                                sent from NodeA, and falls into an
                                                endless loop waiting for the
                                                recovery state of NodeA to change.

    Signed-off-by: joyce
    Cc: Mark Fasheh
    Cc: Jeff Liu
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei