14 Jan, 2017

1 commit

  • Since we need to change the implementation, stop exposing internals.

    Provide kref_read() to read the current reference count; it is typically
    used for debug messages. (A usage sketch follows this entry.)

    Kills two anti-patterns:

    atomic_read(&kref->refcount)
    kref->refcount.counter

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
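
    A minimal before/after sketch of the replacement, assuming a hypothetical
    object that embeds a struct kref in a field named `ref'; kref_read() and
    atomic_read() are the APIs named by the patch:

      /* Before (anti-pattern): peeking at the kref's internals. */
      printk(KERN_DEBUG "foo %p: refcount %d\n",
             foo, atomic_read(&foo->ref.refcount));

      /* After: the accessor hides the representation. */
      printk(KERN_DEBUG "foo %p: refcount %u\n",
             foo, kref_read(&foo->ref));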
     

13 Dec, 2016

3 commits

  • When 'dispatch_assert' is set, 'response' must be DLM_MASTER_RESP_YES
    and 'res' cannot be NULL, so execution can never reach these two branches.

    Link: http://lkml.kernel.org/r/58174C91.3040004@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     
  • The variable `set_maybe' is redundant when the mle has been found in the
    map, so it is fine to set node_idx in the mle's maybe_map directly (see
    the sketch after this entry).

    Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D490DD@H3CMLB12-EX.srv.huawei-3com.com
    Signed-off-by: Guozhonghua
    Reviewed-by: Mark Fasheh
    Reviewed-by: Joseph Qi
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guozhonghua
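
    A minimal sketch of the simplification described in the entry above; the
    surrounding logic is a simplified placeholder, while set_bit() and
    mle->maybe_map follow the names used in the commit:

      /* Before (simplified shape of the code): */
      int set_maybe = 1;
      if (set_maybe)
              set_bit(request->node_idx, mle->maybe_map);

      /* After: the local is always true on this path, so set the bit
       * directly. */
      set_bit(request->node_idx, mle->maybe_map);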
     
  • The value of 'stage' must be either 1 or 2, so the switch can never reach
    the default case.

    Link: http://lkml.kernel.org/r/57FB5EB2.7050002@huawei.com
    Signed-off-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     

12 Oct, 2016

1 commit

  • In dlm_migrate_request_handler(), when `ret' is -EEXIST the mle should be
    freed, otherwise its memory is leaked (see the sketch after this entry).

    Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com
    Signed-off-by: Guozhonghua
    Reviewed-by: Mark Fasheh
    Cc: Eric Ren
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guozhonghua
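
    A generic sketch of the leak-on-error-path pattern being fixed here, using
    purely hypothetical names rather than the actual dlm mle life-cycle code:

      struct foo *f = kmalloc(sizeof(*f), GFP_NOFS);
      int ret;

      if (!f)
              return -ENOMEM;

      ret = register_foo(f);          /* hypothetical helper */
      if (ret == -EEXIST) {
              kfree(f);               /* without this, `f' is leaked */
              return ret;
      }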
     

08 Oct, 2016

1 commit

  • The workqueue "dlm_worker" queues a single work item &dlm->dispatched_work
    and thus it doesn't require execution ordering. Hence, alloc_workqueue
    has been used to replace the deprecated create_singlethread_workqueue
    instance.

    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
    memory pressure.

    Since there are fixed number of work items, explicit concurrency
    limit is unnecessary here.

    Link: http://lkml.kernel.org/r/2b5ad8d6688effe1a9ddb2bc2082d26fbbe00302.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
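
    A sketch of the conversion described above; alloc_workqueue() and
    WQ_MEM_RECLAIM are the real APIs involved, while the queue name string and
    error handling are illustrative (a max_active of 0 means the default,
    which is fine since no ordering is required):

      /* Before: deprecated single-threaded workqueue. */
      dlm->dlm_worker = create_singlethread_workqueue("dlm_wq");
      if (!dlm->dlm_worker)
              goto bail;

      /* After: an ordinary workqueue; WQ_MEM_RECLAIM guarantees forward
       * progress under memory pressure. */
      dlm->dlm_worker = alloc_workqueue("dlm_wq", WQ_MEM_RECLAIM, 0);
      if (!dlm->dlm_worker)
              goto bail;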
     

20 Sep, 2016

1 commit

  • Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
    checks whether the lockres master has changed in order to identify whether
    the new master has finished recovery. This introduces a race: right after
    the old master umounts (meaning the master will change), a new convert
    request comes in.

    In this case, the requesting node resets the lockres state to
    DLM_RECOVERING and then retries the convert, which then fails with
    lockres->l_action set to OCFS2_AST_INVALID. This causes an inconsistent
    lock level between ocfs2 and dlm, and finally a BUG.

    Since dlm recovery clears lock->convert_pending in
    dlm_move_lockres_to_recovery_list, we can use it to correctly identify the
    race between convert and recovery (see the sketch after this entry). So
    fix it.

    Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
    Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com
    Signed-off-by: Joseph Qi
    Signed-off-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
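
    A rough sketch of the idea, not the literal ocfs2 diff: since recovery
    clears lock->convert_pending under res->spinlock, the requesting side can
    test that flag, instead of comparing masters, to detect that recovery ran
    while the convert request was in flight:

      spin_lock(&res->spinlock);
      if (!lock->convert_pending) {
              /* dlm_move_lockres_to_recovery_list already ran: it moved the
               * lock back to the grant list and cleared convert_pending, so
               * treat the convert as interrupted by recovery and retry. */
              status = DLM_RECOVERING;
      } else {
              lock->convert_pending = 0;
      }
      spin_unlock(&res->spinlock);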
     

06 Aug, 2016

1 commit

  • Pull qstr constification updates from Al Viro:
    "Fairly self-contained bunch - a surprising number of places pass struct
    qstr * as an argument when const struct qstr * would suffice; it
    complicates analysis for no good reason.

    I'd prefer to feed that separately from the assorted fixes (those are
    in #for-linus and with somewhat trickier topology)"

    (A constification sketch follows this entry.)

    * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    qstr: constify instances in adfs
    qstr: constify instances in lustre
    qstr: constify instances in f2fs
    qstr: constify instances in ext2
    qstr: constify instances in vfat
    qstr: constify instances in procfs
    qstr: constify instances in fuse
    qstr: constify instances in fs/dcache.c
    qstr: constify instances in nfs
    qstr: constify instances in ocfs2
    qstr: constify instances in autofs4
    qstr: constify instances in hfs
    qstr: constify instances in hfsplus
    qstr: constify instances in logfs
    qstr: constify dentry_init_security

    Linus Torvalds
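
    A minimal illustration of the kind of change this pull makes, using a
    made-up helper; only the parameter qualifier changes, since the function
    never modifies the qstr:

      /* Before: */
      int foofs_lookup_name(struct inode *dir, struct qstr *name);

      /* After: callers holding a const qstr no longer need a cast, and the
       * prototype documents that the name is not modified. */
      int foofs_lookup_name(struct inode *dir, const struct qstr *name);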
     

03 Aug, 2016

3 commits

  • We found a dlm-blocked situation caused by a continuous breakdown of
    recovery masters, described below. To solve this problem, we should purge
    the recovery lock once we detect that the recovery master has gone down.

    N2:                go down

    N1 (reco master):  pick up recovery lock and
                       begin recovering for N2

                       go down

    N3:                pick up recovery lock failed,
                       then purge it:
                         dlm_purge_lockres
                           ->DROPPING_REF is set

                       send deref to N1 failed,
                       recovery lock is not purged

                       find N1 gone down, begin
                       recovering for N1, but
                       blocked in dlm_do_recovery
                       as DROPPING_REF is set:
                         dlm_do_recovery
                           ->dlm_pick_recovery_master
                             ->dlmlock
                               ->dlm_get_lock_resource
                                 ->__dlm_wait_on_lockres_flags(tmpres,
                                       DLM_LOCK_RES_DROPPING_REF);

    Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
    Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     
  • We found a BUG situation in which a lockres is migrated during deref,
    described below. To solve the BUG, we purge the lockres directly when
    another node says it did not have a ref. Additionally, we had better purge
    the lockres if the master goes down, as no one will respond with deref
    done.

    Node 1:               dlm_purge_lockres
                          send deref to N2

    Node 2 (old master):  leave domain
                          migrate lockres to N3

    Node 3 (new master):  finish migration
                          send do assert
                          master to N1

    Node 1:               receive do assert msg
                          from N3, but can not
                          find lockres because
                          DROPPING_REF is set,
                          so the owner is still
                          N2.

    Node 2 (old master):  receive deref from N1
                          and respond -EINVAL
                          because lockres is migrated

    Node 1:               BUG when receiving -EINVAL
                          in dlm_drop_lockres_ref

    Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")

    Link: http://lkml.kernel.org/r/57845103.3070406@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     
  • …eref_lockres_done_handler

    We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
    unexpectedly, as described below. To solve the bug, we disable the BUG_ON
    and purge the lockres in dlm_do_local_recovery_cleanup.

    Node 1:           dlm_purge_lockres

    Node 2 (master):  dlm_deref_lockres_handler
                      DLM_LOCK_RES_SETREF_INPROG is set
                      responds DLM_DEREF_RESPONSE_INPROG

    Node 1:           receive DLM_DEREF_RESPONSE_INPROG,
                      stop purging in dlm_purge_lockres
                      and wait for DLM_DEREF_RESPONSE_DONE

    Node 2 (master):  dispatch dlm_deref_lockres_worker
                      responds DLM_DEREF_RESPONSE_DONE

    Node 1:           receive DLM_DEREF_RESPONSE_DONE and
                      prepare to purge lockres

    Node 2 (master):  goes down

    Node 1:           find Node 2 down and do local
                      cleanup for Node 2:
                        dlm_do_local_recovery_cleanup
                          -> clear DLM_LOCK_RES_DROPPING_REF

                      when purging the lockres, the BUG_ON triggers
                      because DLM_LOCK_RES_DROPPING_REF is clear:
                        dlm_deref_lockres_done_handler
                          ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

    [akpm@linux-foundation.org: fix duplicated write to `ret']
    Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
    Link: http://lkml.kernel.org/r/57845055.9080702@huawei.com
    Signed-off-by: Jun Piao <piaojun@huawei.com>
    Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
    Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    piaojun
     

29 Jul, 2016

1 commit

  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

27 Jul, 2016

1 commit

  • dlm_debug_ctxt->debug_refcnt is initialized to 1 and then increased to 2
    by dlm_debug_get in dlm_debug_init. But dlm_debug_put is called only once,
    in dlm_debug_shutdown while unregistering the dlm, which leads to
    dlm_debug_ctxt being leaked.

    Link: http://lkml.kernel.org/r/577BB755.4030900@huawei.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     


11 Jun, 2016

1 commit

  • We always mixed in the parent pointer into the dentry name hash, but we
    did it late at lookup time. It turns out that we can simplify that
    lookup-time action by salting the hash with the parent pointer early
    instead of late.

    A few other users of our string hashes also wanted to mix in their own
    pointers into the hash, and those are updated to use the same mechanism.

    Hash users that don't have any particular initial salt can just use the
    NULL pointer as a no-salt. (An illustrative sketch follows this entry.)

    Cc: Vegard Nossum
    Cc: George Spelvin
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
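
    An illustrative toy sketch of the interface shape described above (the
    real helpers live in include/linux/stringhash.h and fs/namei.c and are not
    reproduced here); the point is only that the salt pointer seeds the hash
    state up front rather than being mixed in afterwards:

      /* Hypothetical helper: the salt (e.g. a parent dentry pointer) is
       * folded in before the string, so it costs nothing extra per call. */
      static unsigned int salted_hash(const void *salt, const char *name,
                                      unsigned int len)
      {
              unsigned long hash = (unsigned long)salt;   /* early salt */

              while (len--)
                      hash = (hash + (unsigned char)*name++) * 9;  /* toy mix */
              return (unsigned int)hash;
      }

      /* Users that want a pure string hash pass a NULL salt. */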
     


26 Mar, 2016

3 commits

  • We have found a bug when two nodes do umount one after another.

    1) Node 1 migrates a lockres that has 3 locks in the grant queue, such as
    N2(PR), N3(NL), N4(PR), to N2. After migration, the lvbs of the locks
    N3(NL) and N4(PR) are empty on node 2, because the migration target does
    not copy the lvb to these two locks.

    2) Node 3 wants to convert to PR; this can be granted in
    __dlmconvert_master(), and the order of these locks is unchanged. The
    lvb of the lock N3(PR) on node 2 is copied from the lockres in
    dlm_update_lvb(), while the lvb of lock N4(PR) is still empty.

    3) Node 2 wants to leave the domain, so it migrates this lockres to node 3.
    Node 2 then triggers the BUG in dlm_prepare_lvb_for_migration() when
    adding the lock N4(PR) to mres, with the following message, because the
    lvb of mres has already been copied from lock N3(PR) but the lvb of lock
    N4(PR) is empty.

    "Mismatched lvb in lock cookie=%u:%llu, name=%.*s, node=%u"

    [akpm@linux-foundation.org: tweak comment]
    Signed-off-by: xuejiufei
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • When the master handles a convert request, it queues the ast first and
    then returns the status. It can thus happen that the ast is sent before
    the request status, because the two messages are sent by different
    threads. If the master goes down right after the ast is sent, this may
    trigger a BUG in dlm_move_lockres_to_recovery_list on the requesting node,
    because the ast handler moves the lock to the grant list without clearing
    lock->convert_pending. So remove the BUG_ON statement and check in
    dlmconvert_remote whether the ast has been processed.

    Signed-off-by: Joseph Qi
    Reported-by: Yiwen Jiang
    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Tariq Saeed
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • There is a race window between dlmconvert_remote and
    dlm_move_lockres_to_recovery_list which can leave a lock with
    OCFS2_LOCK_BUSY on the grant list, and thus the system hangs.

    dlmconvert_remote
    {
            spin_lock(&res->spinlock);
            list_move_tail(&lock->list, &res->converting);
            lock->convert_pending = 1;
            spin_unlock(&res->spinlock);

            status = dlm_send_remote_convert_request();
            >>>>>> race window: the master has queued the ast and returned
                   DLM_NORMAL, and then goes down before sending the ast.
                   This node detects the master going down and calls
                   dlm_move_lockres_to_recovery_list, which reverts the
                   lock to the grant list.
                   Then OCFS2_LOCK_BUSY won't be cleared, as the new master
                   won't send the ast any more because it thinks the convert
                   has already been authorized.

            spin_lock(&res->spinlock);
            lock->convert_pending = 0;
            if (status != DLM_NORMAL)
                    dlm_revert_pending_convert(res, lock);
            spin_unlock(&res->spinlock);
    }

    In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING bit
    set (res is still recovering) or the res master has changed (the new
    master has finished recovery); if so, reset the status to DLM_RECOVERING,
    and the convert will then be retried.

    Signed-off-by: Joseph Qi
    Reported-by: Yiwen Jiang
    Reviewed-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Tariq Saeed
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

16 Mar, 2016

7 commits

  • In dlm_send_join_cancels(), node is defined with type unsigned int but is
    initialized to -1, which wraps the variable around. Although this does not
    cause any runtime problem, the code looks a little inconsistent (see the
    sketch after this entry).

    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun Piao
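
    A small sketch of the issue, independent of the dlm code: assigning -1 to
    an unsigned variable silently wraps to UINT_MAX, which works but reads
    oddly, so either a signed type or an explicit sentinel is clearer:

      unsigned int node = -1;          /* wraps to UINT_MAX: legal, but confusing */

      int node_s = -1;                 /* option 1: a signed type matches -1     */
      unsigned int node_u = UINT_MAX;  /* option 2: spell out the intended value */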
     
  • When o2hb detects a node going down, it first sets the dead node in the
    recovery map and creates ocfs2rec, which will replay the journal for the
    dead node. The o2hb thread then calls dlm_do_local_recovery_cleanup() to
    delete the locks of the dead node. After the locks of the dead node are
    gone, locks for other nodes can be granted and may modify the metadata
    without replaying the journal of the dead node. The details are described
    below.

    N1:           modify the extent tree of an inode, commit the dirty
                  metadata to its journal, then go down.

    N3 (master):  o2hb thread detects that N1 has gone down, sets the
                  recovery map and deletes the locks of N1.

                  dlm_thread flushes the ast for the lock of N2.

    N2:           does not detect the death of N1, so its recovery map
                  is empty.

                  reads the inode from disk without replaying the journal
                  of N1, and modifies the extent tree of the inode that
                  N1 had modified.

    N3 (master):  ocfs2rec recovers the journal of N1.
                  The modification of N2 is lost.

    The modifications of N1 and N2 are not serialized, which can lead to a
    read-only file system. We can set a recovery_waiting flag on the lock
    resource after deleting the locks of the dead node, to prevent other nodes
    from getting the lock before dlm recovery. After dlm recovery, the
    recovery map on N2 is not empty, so ocfs2_inode_lock_full_nested() will
    wait for ocfs2 recovery.

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiufei Xue
     
  • If the master migrates this lock resource to a node just as that node
    happens to be purging it, a new lock resource will be created and inserted
    into the hash list. If the master then goes down, the lock resource being
    purged is recovered, so two lock resources with different owners exist.
    So return an error to the master if the lock resource is in the DROPPING
    state; the master will then retry migrating this lock resource.

    Signed-off-by: xuejiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Reviewed-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • If the master goes down after returning in-progress for a deref message,
    the lock resource on the non-master node cannot be purged. Clear the
    DROPPING_REF flag and recover it.

    Signed-off-by: xuejiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Reviewed-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • The master returns in-progress to the non-master node when it cannot clear
    the refmap bit right away. The non-master node will then not purge the
    lock resource until it receives the deref done message.

    Signed-off-by: xuejiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Reviewed-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • This series of patches fixes the ordering issue of setting/clearing the
    refmap bit described below.

    Node 1:           dlmlock
                      dlm_do_master_request

    Node 2 (master):  dlm_master_request_handler
                        -> dlm_lockres_set_refmap_bit

    Node 1:           dlmlock succeeds
                      dlmunlock succeeds

                      dlm_purge_lockres

    Node 2 (master):  dlm_deref_handler
                        -> finds the lock resource in the
                           DLM_LOCK_RES_SETREF_INPROG state,
                           so dispatches a deref work

    Node 1:           dlm_purge_lockres succeeds.

                      calls dlmlock again
                      dlm_do_master_request

    Node 2 (master):  dlm_master_request_handler
                        -> dlm_lockres_set_refmap_bit

                      the deref work triggers and calls
                      dlm_lockres_clear_refmap_bit
                      to clear Node 1 from the refmap

                      dlm_purge_lockres succeeds

    Node 1:           dlm_send_remote_lock_request
                      returns DLM_IVLOCKID because
                      the lockres does not exist;
                      BUG if the lockres is $RECOVERY

    This series of patches adds a new message to keep the set and clear
    ordered: other nodes can purge the lock resource only after the refmap bit
    on the master is cleared.

    This patch adds the DEREF_DONE message and the corresponding handler; a
    node can purge the lock resource after receiving this message. Since a new
    message is added, the minor number of the dlm protocol version is
    increased.

    Signed-off-by: xuejiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Reviewed-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • Referring to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for
    O2NET_MAX_PAYLOAD_BYTES.

    Since DLM_MIG_LOCKRES_RESERVED is not currently used, the typo causes no
    problem, but we had better correct it for future use.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

06 Feb, 2016

1 commit

  • When the recovery master goes down, dlm_do_local_recovery_cleanup() only
    removes the $RECOVERY lock owned by the dead node, but does not clear the
    refmap bit. This makes the umount thread fall into a dead loop, endlessly
    migrating $RECOVERY to the dead node.

    Signed-off-by: xuejiufei
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     

15 Jan, 2016

7 commits

  • lksb flags are defined in both dlmapi.h and dlmcommon.h, so remove the
    duplicate definitions from dlmcommon.h.

    Signed-off-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Found this while doing patch review; remove it to make the code clearer
    and save a little CPU time.

    Signed-off-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • When two processes are migrating the same lockres, dlm_add_migration_mle()
    returns -EEXIST but still inserts a new mle into the hash list.
    dlm_migrate_lockres() will then detach the old mle and free the new one,
    which is already in the hash list, and that corrupts the list.

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • We have found that the migration source triggers a BUG because the
    refcount of the mle is already zero before the put, when the target goes
    down during migration. The situation is as follows:

    dlm_migrate_lockres
      dlm_add_migration_mle
      dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detects the target going down and cleans the migration
               mle. Now the refcount is 1.

    dlm_migrate_lockres is then woken and puts the mle twice when it finds
    that the target has gone down, which triggers the BUG with the following
    message:

    "ERROR: bad mle: ".

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • dlm_grab() may return NULL when the node is unmounting. During code
    review, we found that some dlm handlers return an error to the caller when
    dlm_grab() returns NULL, which can make the caller BUG or cause other
    problems. Here is an example:

    Node 1:  receives a migration message from node 3, and sends a
             migrate request to the other nodes

    Node 2:  starts unmounting

             receives the migrate request from node 1 and calls
             dlm_migrate_request_handler()

             the unmount thread unregisters the domain handlers and
             removes the dlm_context from dlm_domains

             dlm_migrate_request_handler() returns -EINVAL to node 1

    Node 1:  exits migration without clearing the migration state or
             sending the assert master message to node 3, which causes
             node 3 to hang.

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Reviewed-by: Yiwen Jiang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • Commit f3f854648de6 ("ocfs2_dlm: Ensure correct ordering of set/clear
    refmap bit on lockres") still leaves a race in which the ordering cannot
    be guaranteed to be correct.

    Node1 Node2 Node3
    umount, migrate
    lockres to Node2
    migrate finished,
    send migrate request
    to Node3
    received migrate request,
    create a migration_mle,
    respond to Node2.
    set DLM_LOCK_RES_SETREF_INPROG
    and send assert master to
    Node3
    delete migration_mle in
    assert_master_handler,
    Node3 umount without response
    dlm_thread purge
    this lockres, send drop
    deref message to Node2
    found the flag of
    DLM_LOCK_RES_SETREF_INPROG
    is set, dispatch
    dlm_deref_lockres_worker to
    clear refmap, but in function of
    dlm_deref_lockres_worker,
    only if node in refmap it wait
    DLM_LOCK_RES_SETREF_INPROG
    to be cleared. So worker is
    done successfully

    purge lockres, send
    assert master response
    to Node1, and finish umount
    set Node3 in refmap, and it
    won't be cleared forever, thus
    lead to umount hung

    so wait until DLM_LOCK_RES_SETREF_INPROG is cleared in
    dlm_deref_lockres_worker.

    Signed-off-by: Yiwen Jiang
    Reviewed-by: Joseph Qi
    Reviewed-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • We found a race between purge and migration while doing code review.
    Node A puts the lockres on the purge list before receiving the migrate
    message from node B, which is the master. Node A then calls
    dlm_mig_lockres_handler to handle this message.

    dlm_mig_lockres_handler
      dlm_lookup_lockres
      >>>>>> race window: dlm_run_purge_list may run and send a deref
             message to the master, then wait for the response
      spin_lock(&res->spinlock);
      res->state |= DLM_LOCK_RES_MIGRATING;
      spin_unlock(&res->spinlock);
    dlm_mig_lockres_handler returns

    >>>>>> dlm_thread receives the response from the master for the deref
           message and triggers the BUG, because the lockres has the state
           DLM_LOCK_RES_MIGRATING, with the following message:

    dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F: res
    M0000000000000000030c0300000000 in use after deref

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Reviewed-by: Yiwen Jiang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     

30 Dec, 2015

1 commit

  • We have found a BUG on res->migration_pending when migrating lock
    resources. The situation is as follows.

    dlm_mark_lockres_migrating
      res->migration_pending = 1;
      __dlm_lockres_reserve_ast
      dlm_lockres_release_ast returns with res->migration_pending still set,
      because other threads have reserved asts
      wait until dlm_migration_can_proceed returns 1
      >>>>>>> o2hb finds that the target has gone down and removes the
              target from domain_map
      dlm_migration_can_proceed returns 1
      dlm_mark_lockres_migrating returns -ESHUTDOWN with
      res->migration_pending still set.

    When dlm_mark_lockres_migrating() is re-entered, it triggers the BUG_ON
    on res->migration_pending. So clear migration_pending when the target
    goes down.

    Signed-off-by: Jiufei Xue
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     

06 Nov, 2015

1 commit

  • A node can mount multiple ocfs2 volumes. If the thread names are the same
    for every volume/domain, it is inconvenient when analyzing problems,
    because we have to figure out which volume/domain the messages belong to.

    Since the thread name is printed in messages, adding the volume uuid or
    dlm name to the thread name benefits problem analysis (see the sketch
    after this entry).

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Gang He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
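
    A sketch of the naming idea: kthread_run() takes a printf-style name, so
    the domain name can simply be appended (the exact format strings chosen by
    the patch may differ):

      /* Before: every domain's thread shows up as "dlm_thread". */
      dlm->dlm_thread_task = kthread_run(dlm_thread, dlm, "dlm_thread");

      /* After: include the dlm/domain name so log messages identify the
       * volume. */
      dlm->dlm_thread_task = kthread_run(dlm_thread, dlm, "dlm-%s", dlm->name);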
     

23 Oct, 2015

1 commit

  • dlm_lockres_put will call dlm_lockres_release if it drops the last
    reference, and that in turn may call dlm_print_one_lock_resource, which
    takes the lockres spinlock.

    So unlock the lockres spinlock before calling dlm_lockres_put to avoid a
    deadlock (see the sketch after this entry).

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
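
    A minimal sketch of the reordering, with the work done under the lock
    reduced to a hypothetical helper:

      /* Deadlock-prone: if this put drops the last reference,
       * dlm_lockres_release() may call dlm_print_one_lock_resource(),
       * which takes res->spinlock again. */
      spin_lock(&res->spinlock);
      do_something_with(res);          /* hypothetical work under the lock */
      dlm_lockres_put(res);
      spin_unlock(&res->spinlock);

      /* Fixed: drop the spinlock first, then do the final put. */
      spin_lock(&res->spinlock);
      do_something_with(res);
      spin_unlock(&res->spinlock);
      dlm_lockres_put(res);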
     

23 Sep, 2015

1 commit

  • The order of the following three spinlocks should be:
    dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock

    But dlm_dispatch_assert_master() is called while holding
    dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and it then calls
    dlm_grab(), which takes dlm_domain_lock.

    Once another thread (for example, dlm_query_join_handler) has already
    taken dlm_domain_lock and then tries to take dlm_ctxt->spinlock, a
    deadlock happens (see the sketch after this entry).

    Signed-off-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: "Junxiao Bi"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
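
    A sketch of the ordering rule stated above: nesting must always follow
    dlm_domain_lock -> dlm_ctxt->spinlock -> dlm_lock_resource->spinlock, so
    anything that may take dlm_domain_lock (such as dlm_grab()) must not be
    called with the inner locks held. The "work" comment is a placeholder:

      /* Correct nesting order. */
      spin_lock(&dlm_domain_lock);
      spin_lock(&dlm->spinlock);
      spin_lock(&res->spinlock);
      /* ... work ... */
      spin_unlock(&res->spinlock);
      spin_unlock(&dlm->spinlock);
      spin_unlock(&dlm_domain_lock);

      /* Wrong: calling dlm_grab() (which takes dlm_domain_lock) while the
       * two inner locks are held inverts this order and can deadlock
       * against a thread that holds dlm_domain_lock and then wants
       * dlm->spinlock. */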