23 Oct, 2015

1 commit

  • commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream.

    The order of the following three spinlocks should be:
    dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock

    But dlm_dispatch_assert_master() is called while holding
    dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
    dlm_grab() which will take dlm_domain_lock.

    Once another thread (for example, dlm_query_join_handler) has already
    taken dlm_domain_lock and then tries to take dlm_ctxt->spinlock, a
    deadlock happens.

    Signed-off-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: "Junxiao Bi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
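
    A minimal sketch of the lock-ordering problem described in this entry,
    using stand-in locks instead of the real dlm structures (illustrative
    only, not the ocfs2 code):

        #include <linux/spinlock.h>

        /* Agreed order: lock_a -> lock_b -> lock_c, standing in for
         * dlm_domain_lock -> dlm_ctxt->spinlock -> dlm_lock_resource->spinlock. */
        static DEFINE_SPINLOCK(lock_a);
        static DEFINE_SPINLOCK(lock_b);
        static DEFINE_SPINLOCK(lock_c);

        static void follows_the_order(void)     /* like dlm_query_join_handler */
        {
                spin_lock(&lock_a);
                spin_lock(&lock_b);             /* waits if the other path holds b */
                spin_unlock(&lock_b);
                spin_unlock(&lock_a);
        }

        static void violates_the_order(void)    /* like dlm_dispatch_assert_master */
        {
                spin_lock(&lock_b);
                spin_lock(&lock_c);
                spin_lock(&lock_a);             /* a taken after b/c: can deadlock
                                                 * against the path above */
                spin_unlock(&lock_a);
                spin_unlock(&lock_c);
                spin_unlock(&lock_b);
        }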
     

06 May, 2015

1 commit

  • There is a race window in dlm_get_lock_resource() which may return a
    lock resource that has already been purged. This causes the process to
    hang forever in dlmlock(), as the AST message cannot be handled because
    its lock resource no longer exists.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
            spin_unlock(&dlm->spinlock);
            >>>>>>>> race window: dlm_run_purge_list() may run and purge
                     the lock resource
            spin_lock(&tmpres->spinlock);
            ...
            spin_unlock(&tmpres->spinlock);
        }
    }

    Signed-off-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
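
    A generic sketch of this kind of lookup race and one common way to close
    such a window, using stand-in structures and a hypothetical lookup_res()
    helper (a sketch of the pattern, not the actual ocfs2 fix):

        #include <linux/spinlock.h>

        struct res {
                spinlock_t spinlock;
                /* ... */
        };

        struct table {
                spinlock_t spinlock;
                /* ... */
        };

        /* hypothetical helper: find a resource by name, caller holds t->spinlock */
        struct res *lookup_res(struct table *t, const char *name);

        static void racy(struct table *t, const char *name)
        {
                struct res *r;

                spin_lock(&t->spinlock);
                r = lookup_res(t, name);
                spin_unlock(&t->spinlock);
                /* <-- window: a purge path may free/unhash r right here */
                spin_lock(&r->spinlock);
                spin_unlock(&r->spinlock);
        }

        static void closed(struct table *t, const char *name)
        {
                struct res *r;

                spin_lock(&t->spinlock);
                r = lookup_res(t, name);
                if (r)
                        spin_lock(&r->spinlock);  /* grab it before the window opens */
                spin_unlock(&t->spinlock);
                if (!r)
                        return;
                /* a purge path that honors r->spinlock cannot slip in here */
                spin_unlock(&r->spinlock);
        }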
     

11 Feb, 2015

4 commits

  • A tiny race between the BAST and the unlock message causes a NULL
    dereference.

    A node sends an unlock request to the master and receives a response.
    Before processing the response it receives a BAST from the master. Since
    the two requests are processed by different threads, this creates a
    race: while the BAST is being processed, the lock can get freed by the
    unlock code.

    This patch makes the BAST return immediately if the lock is found but an
    unlock is pending; the code should handle this race. We also have to fix
    the master node to skip sending the BAST after receiving the unlock
    message.

    Below is the crash stack:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    IP: o2dlm_blocking_ast_wrapper+0xd/0x16
    dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
    dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
    o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
    o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
    worker_thread+0x14d/0x1ed

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Srinivas Eeda
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srinivas Eeda
     
  • Remove dlm_joined() that is not used anywhere.

    This was partially found by using a static code analysis program called
    cppcheck.

    Signed-off-by: Rickard Strandqvist
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rickard Strandqvist
     
  • Use snprintf format specifier "%lu" instead of "%ld" for argument of type
    'unsigned long'.

    Signed-off-by: Alex Chen
    Reviewed-by: Joseph Qi
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
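
    A tiny standalone illustration of the format-specifier fix (plain
    userspace C with a made-up variable, not the ocfs2 code):

        #include <stdio.h>

        int main(void)
        {
                unsigned long migration_pending = 3;
                char buf[64];

                /* "%ld" reads the bits as signed long; "%lu" matches the
                 * unsigned long argument and is what the patch switches to. */
                snprintf(buf, sizeof(buf), "pending=%lu", migration_pending);
                puts(buf);
                return 0;
        }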
     
  • When the recovery master is down, the owner of $RECOVERY calls
    dlm_do_local_recovery_cleanup() to prune any $RECOVERY entries for dead
    nodes. The lock is in the granted list and its refcount must be 2. We
    should put it twice to remove this lock; otherwise, it will lead to a
    memory leak.

    Signed-off-by: joyce.xue
    Reported-by: yangwenfang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     

09 Jan, 2015

1 commit

  • In dlm_process_recovery_data, ret is set to -ENOMEM only when
    dlm_new_lock fails, and in that case newlock is definitely NULL. So the
    test on newlock is meaningless; remove it.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
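
    A small standalone sketch of why such a test is dead code (stand-in
    names, not the ocfs2 functions):

        #include <errno.h>
        #include <stdlib.h>

        struct lock { int level; };

        /* stand-in for dlm_new_lock(): returns NULL on allocation failure */
        static struct lock *new_lock(void)
        {
                return calloc(1, sizeof(struct lock));
        }

        static int process(void)
        {
                struct lock *newlock;
                int ret = 0;

                newlock = new_lock();
                if (!newlock) {
                        ret = -ENOMEM;
                        goto leave;     /* newlock is NULL on every -ENOMEM path */
                }
                /* ... use newlock, hand it off ... */
                free(newlock);
        leave:
                /* a trailing "if (ret == -ENOMEM && newlock) free(newlock);"
                 * could never free anything, which is why removing the
                 * newlock test changes nothing */
                return ret;
        }

        int main(void) { return process() ? 1 : 0; }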
     

19 Dec, 2014

1 commit

  • Commit ac4fef4d23ed ("ocfs2/dlm: do not purge lockres that is queued for
    assert master") may have the following possible race case:

    dlm_dispatch_assert_master                  dlm_wq
    ========================================================================
    queue_work(dlm->quedlm_worker,
        &dlm->dispatched_work);
                                                dispatch work,
                                                dlm_lockres_drop_inflight_worker
                                                *BUG_ON(res->inflight_assert_workers == 0)*
    dlm_lockres_grab_inflight_worker
    inflight_assert_workers++

    So make sure inflight_assert_workers is increased before the work is
    queued.

    Signed-off-by: Joseph Qi
    Signed-off-by: Xue jiufei
    Cc: Joel Becker
    Reviewed-by: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
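
    A sketch of the ordering fix with stand-in names (wq, work and res are
    placeholders, not the exact ocfs2 fields):

        /* Dispatcher: bump the in-flight count before queueing the work, so
         * the worker can never observe it at zero. */
        spin_lock(&res->spinlock);
        res->inflight_assert_workers++;
        spin_unlock(&res->spinlock);

        queue_work(wq, &work);

        /* Worker side: */
        spin_lock(&res->spinlock);
        BUG_ON(res->inflight_assert_workers == 0);   /* safe with the order above */
        res->inflight_assert_workers--;
        spin_unlock(&res->spinlock);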
     

11 Dec, 2014

3 commits

  • In commit 1faf289454b9 ("ocfs2_dlm: disallow a domain join if node maps
    mismatch") we introduced an earlier NULL check, so this one is not
    needed. Also, static checkers complain because we dereference the
    pointer first and only then check it for NULL.

    Signed-off-by: Dan Carpenter
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
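
    A standalone illustration of the "dereference, then check" pattern that
    static checkers flag (stand-in types, not the dlm code):

        #include <stdio.h>

        struct ctxt { int node_count; };

        /* buggy shape: c is dereferenced before it is tested, so the test
         * can never save us from a NULL pointer */
        static int buggy(struct ctxt *c)
        {
                int n = c->node_count;

                if (!c)
                        return -1;      /* dead code */
                return n;
        }

        /* either check before the dereference, or (as in this patch) drop
         * the late check because an earlier one already guarantees non-NULL */
        static int fixed(struct ctxt *c)
        {
                return c ? c->node_count : -1;
        }

        int main(void)
        {
                struct ctxt c = { .node_count = 2 };

                printf("%d %d\n", buggy(&c), fixed(&c));
                return 0;
        }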
     
  • Node A sends a master query request to node B, which is the master. At
    this time the lockres happens to be on the purge list.
    dlm_master_request_handler gets the dlm spinlock, finds the resource and
    releases the dlm spinlock. Right at this moment, dlm_thread on this node
    could purge the lockres. dlm_master_request_handler can then acquire the
    lockres spinlock and reply to node A that node B is the master, even
    though the lockres on node B has been purged.

    The above scenario makes node A falsely think node B is the master,
    which is inconsistent. Further, if another node C tries to master the
    same resource, every node will respond that it is not the master. Node C
    then masters the resource and sends an assert master to all nodes. This
    makes node A crash with the following message.

    dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current
    owner is 10!

    Signed-off-by: Srinivas Eeda
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Wengang Wang
    Tested-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srinivas Eeda
     
  • Do not BUG() if the GFP_ATOMIC allocation fails in
    dlm_dispatch_assert_master. Instead, return -ENOMEM to the sender, which
    will then retry.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
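
    A kernel-style sketch of such an error path (the struct and function
    names here are stand-ins, not the ocfs2 ones):

        static int dispatch_assert(struct dlm_work *w)   /* stand-in type */
        {
                struct dlm_work_item *item;              /* stand-in type */

                /* GFP_ATOMIC can fail under memory pressure; that is not a
                 * programming error, so don't BUG() on it */
                item = kmalloc(sizeof(*item), GFP_ATOMIC);
                if (!item)
                        return -ENOMEM;                  /* sender will retry */

                /* ... initialize item and queue the work ... */
                return 0;
        }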
     

10 Oct, 2014

5 commits

  • The following case may lead to o2net_wq and o2hb thread deadlock on
    o2hb_callback_sem.
    Say there are two nodes, N1 and N2, in the cluster. N2 goes down and, at
    the same time, N3 tries to join the cluster, so N1 handles the node down
    (N2) and the join (N3) simultaneously.
    o2hb                                        o2net_wq
    ->o2hb_do_disk_heartbeat
    ->o2hb_check_slot
    ->o2hb_run_event_list
    ->o2hb_fire_callbacks
    ->down_write(&o2hb_callback_sem)
    ->o2net_hb_node_down_cb
    ->flush_workqueue(o2net_wq)
                                                ->o2net_process_message
                                                ->dlm_query_join_handler
                                                ->o2hb_check_node_heartbeating
                                                ->o2hb_fill_node_map
                                                ->down_read(&o2hb_callback_sem)

    There is no need to take o2hb_callback_sem in dlm_query_join_handler;
    o2hb_live_lock is enough to protect the live node map.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: jiangyiwen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Reduce boilerplate code by using seq_open_private() instead of seq_open()

    Signed-off-by: Rob Jones
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Jones
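
    A sketch of the seq_open_private() pattern (debug_seq_ops and struct
    debug_priv are assumed names for this example, not the ocfs2 ones):

        #include <linux/fs.h>
        #include <linux/seq_file.h>

        /* debug_seq_ops and struct debug_priv are assumed to be defined
         * elsewhere with the usual start/next/stop/show callbacks */

        static int debug_open(struct inode *inode, struct file *file)
        {
                /* replaces the open-coded kmalloc() + seq_open() +
                 * "seq->private = priv" boilerplate */
                return seq_open_private(file, &debug_seq_ops,
                                        sizeof(struct debug_priv));
        }

        static const struct file_operations debug_fops = {
                .owner   = THIS_MODULE,
                .open    = debug_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = seq_release_private,  /* frees the private buffer */
        };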
     
  • Remove the branch that frees res->lockname.name, because the condition
    is never satisfied when jumping to the error label.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • dlm_lockres_put() should be called without &res->spinlock held,
    otherwise a deadlock may happen:

    spin_lock(&res->spinlock)
    ...
    dlm_lockres_put
      ->dlm_lockres_release
        ->dlm_print_one_lock_resource
          ->spin_lock(&res->spinlock)

    Signed-off-by: Alex Chen
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
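
    A short sketch of the safe calling order (illustrative fragment; res is
    just a stand-in pointer):

        spin_lock(&res->spinlock);
        /* ... last use of the resource under the lock ... */
        spin_unlock(&res->spinlock);

        /* the final put may end up in dlm_lockres_release() ->
         * dlm_print_one_lock_resource(), which takes res->spinlock itself,
         * so it must run after the unlock above */
        dlm_lockres_put(res);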
     
  • Refactor the error handling in dlm_alloc_ctxt to simplify the code.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
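
    A generic sketch of the centralized-cleanup shape such a refactoring
    usually ends up with (the struct and its fields are stand-ins, not the
    actual dlm_alloc_ctxt):

        struct ctxt {
                char *name;
                void *recovery_map;
        };

        static struct ctxt *alloc_ctxt(const char *domain)
        {
                struct ctxt *c;

                c = kzalloc(sizeof(*c), GFP_KERNEL);
                if (!c)
                        return NULL;

                c->name = kstrdup(domain, GFP_KERNEL);
                if (!c->name)
                        goto err;

                c->recovery_map = kzalloc(PAGE_SIZE, GFP_KERNEL);
                if (!c->recovery_map)
                        goto err;

                return c;

        err:
                /* one exit path; kfree(NULL) is a no-op, so partially
                 * initialized contexts unwind correctly */
                kfree(c->name);
                kfree(c);
                return NULL;
        }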
     

26 Sep, 2014

1 commit

  • There is a deadlock case which was reported by Guozhonghua:
    https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

    This case is caused by &res->spinlock and &dlm->master_lock
    misordering in different threads.

    It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
    helpers"). Since the lockres is new, it doesn't require &res->spinlock,
    so remove it.

    Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
    Signed-off-by: Joseph Qi
    Reviewed-by: joyce.xue
    Reported-by: Guozhonghua
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

07 Aug, 2014

2 commits

  • Orabug: 19074140

    When umount is issued during recovery on the new master that has not
    finished remastering locks, it triggers BUG() in
    dlm_send_mig_lockres_msg(). Here is the situation:

    1) node A has a lock on resource X mastered by node B.

    2) node B dies -> node A sets recovering flag for res X

    3) Node C becomes the new master for resources owned by the
    dead node and is remastering locks of the dead node but
    has not finished the remastering process yet.

    4) umount is issued on node C.

    5) During processing of the umount, ignoring the unfinished recovery,
    node C attempts to migrate resource X to node A.

    6) node A finds res X in DLM_LOCK_RES_RECOVERING state, considers
    it a logic error and sends back -EFAULT.

    7) node C asserts BUG() upon seeing the -EFAULT response from node A.

    Fix is to delay migrating res X till remastering is finished at which
    point recovering flag will be cleared on both A and C.

    Signed-off-by: Tariq Saeed
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tariq Saeed
     
  • The unit of total_backoff is msecs, not jiffies, so there is no need to
    do the conversion. Otherwise, the join timeout is not 90 sec.

    Signed-off-by: Yiwen Jiang
    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
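
    A sketch of the comparison in question (the limit name is made up for
    this example; the point is that both sides are already milliseconds):

        #define JOIN_TIMEOUT_MSECS      (90 * 1000)      /* 90 seconds */

        /* buggy shape: total_backoff (msecs) compared against jiffies
         *     if (total_backoff > msecs_to_jiffies(JOIN_TIMEOUT_MSECS))
         * fixed shape: compare milliseconds with milliseconds */
        if (total_backoff > JOIN_TIMEOUT_MSECS) {
                status = -ERESTARTSYS;
                goto bail;
        }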
     

24 Jun, 2014

4 commits

  • When the workqueue is delayed, it may happen that a lockres is purged
    while it is still queued for a master assert. This may trigger a BUG()
    as follows.

    N1                                          N2
    dlm_get_lockres()
    ->dlm_do_master_requery
                                                is the master of lockres,
                                                so queue assert_master work

                                                dlm_thread() starts running
                                                and purges the lockres

                                                dlm_assert_master_worker()
                                                sends assert master message
                                                to other nodes
    receiving the assert_master
    message, set master to N2
    dlmlock_remote() then sends a create_lock message to N2 but receives
    DLM_IVLOCKID; if it is the RECOVERY lockres, this triggers the BUG().

    Another BUG() is triggered when N3 becomes the new master and sends
    assert_master to N1: N1 will trigger the BUG() because the owner doesn't
    match. So we should not purge a lockres while it is queued for assert
    master.

    Signed-off-by: joyce.xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • The following case may lead to endless loop during umount.

    node A           umount volume, migrate lockres1 to B
    another node     wants to lock lockres1, sends MASTER_REQUEST_MSG to C
    node C           init block mle
                     MIGRATE_REQUEST_MSG is sent to C
    node C           finds the block mle, and then returns
                     DLM_MIGRATE_RESPONSE_MASTERY_REF to B
    node B           sets C in refmap
    node A           umounts successfully
    node B           tries to umount; an endless loop occurs when migrating
                     lockres1 since C is in the refmap

    So we can fix this endless loop case by only returning
    DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
    MIGRATE_REQUEST_MSG.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: jiangyiwen
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Xue jiufei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • When a lockres is on the purge list but still in use, it should be
    moved to the tail of the purge list, and dlm_thread should continue to
    check the next lockres on the purge list. However, the code
    list_move_tail(&dlm->purge_list, &lockres->purge) does *no* movement, so
    dlm_thread will purge the same lockres in this loop again and again. If
    it is in use for a long time, other lockres will not be processed.

    Signed-off-by: Yiwen Jiang
    Signed-off-by: joyce.xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
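
    The list_move_tail() calling convention, for reference (a sketch, with
    the buggy shape shown only as a comment):

        #include <linux/list.h>

        /* list_move_tail(entry, head) deletes @entry from its current list
         * and adds it at the tail of @head.  With the arguments swapped the
         * lockres entry never moves, so dlm_thread keeps re-examining it. */

        /* buggy:  list_move_tail(&dlm->purge_list, &lockres->purge); */
        list_move_tail(&lockres->purge, &dlm->purge_list);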
     
  • …letimeout closes conn

    Orabug: 18639535

    In a two-node cluster, both nodes hold a lock at PR level and both want
    to convert to EX at the same time. Master node 1 has sent a BAST and
    then closes the connection due to an idle timeout. Node 0 receives the
    BAST and sends an unlock request with the cancel flag but gets error
    -ENOTCONN. The problem is that this error is ignored in
    dlm_send_remote_unlock_request() on the **incorrect** assumption that
    the master is dead. See the NOTE in the comment for why it returns
    DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to send the convert
    (without the cancel flag), which fails with -ENOTCONN; it waits 5 sec
    and resends.

    This time it gets DLM_IVLOCKID from the master since the lock is not
    found on the grant queue; it had been moved to the converting queue in
    response to the convert PR->EX request. No way out.

    Node 1 (master)                             Node 0
    ===============                             ======

    lock mode PR                                PR

    convert PR -> EX
    mv grant -> convert and que BAST
    ...
                            <------------------ convert PR -> EX
    convert que looks like this:
    ((node 1, PR -> EX) (node 0, PR -> EX))
    ...
    BAST (want PR -> NL)
                            ------------------>
    ...
    idle timeout, conn closed
                                                ...
                                                In response to BAST,
                                                sends unlock with cancel convert flag
                                                gets -ENOTCONN. Ignores and
                                                sends remote convert request
                                                gets -ENOTCONN, waits 5 sec, retries
                                                ...
                                                reconnects
                            <------------------ convert req goes through on next try
    does not find lock on grant que
    status DLM_IVLOCKID
                            ------------------>
                                                ...

    No way out. Fix is to keep retrying unlock with cancel flag until it
    succeeds or the master dies.

    Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Tariq Saeed
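
    A rough sketch of the retry shape described in this entry
    (send_unlock_cancel() and master_is_dead() are placeholders, not the
    actual ocfs2 helpers):

        int ret;

        for (;;) {
                ret = send_unlock_cancel(master, lock);   /* placeholder */
                if (ret != -ENOTCONN)
                        break;                  /* got a real answer */
                if (master_is_dead(master))     /* placeholder heartbeat check */
                        break;                  /* recovery will clean up */
                msleep(100);                    /* wait for reconnect, retry */
        }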
     

05 Jun, 2014

4 commits

  • dlm_recovery_ctxt.received is unused.

    ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error
    handling code in ocfs2_super_lock() is unneeded.

    Signed-off-by: joyce.xue
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • We found a race when dlm recovery and node joining occur simultaneously
    and the network state is bad.

    N1                                        N4

    start joining dlm and send
    query join to all live nodes
                                              set joining node to N1, return OK
    send query join to other
    live nodes and it may take
    a while

    call dlm_send_join_assert()
    to send the assert join message;
    N2 is down, so keep trying to
    send the message to N2 until
    N2 is found to be down

    send assert join message to
    N3, but the connection with
    N3 is down, so it may take
    a while
                                              become the recovery master for N2
                                              and send begin reco message to other
                                              nodes in the domain map, but not N1
    connection with N3 is rebuilt,
    then send assert join to N4
                                              call dlm_assert_joined_handler(),
                                              add N1 to domain_map

                                              dlm recovery done, send finalize message
                                              to nodes in the domain map, including N1
    receiving the finalize message,
    trigger the BUG() because of a
    recovery master mismatch.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • We found a conversion deadlock when the owner of a lockres happens to
    crash before sending DLM_PROXY_AST_MSG for a downconverting lock. The
    situation is as follows:

    Node1                         Node2                         Node3
                                  the owner of lockresA
    lock_1 granted at EX mode
    and call ocfs2_cluster_unlock
    to decrease ex_holders.
                                                                converting lock_3 from
                                                                NL to EX
                                  send DLM_PROXY_AST_MSG
                                  to Node1, asking Node1
                                  to downconvert.
    receiving DLM_PROXY_AST_MSG,
    thread ocfs2dc sends
    DLM_CONVERT_LOCK_MSG
    to Node2 to downconvert
    lock_1 (EX->NL).
                                  lock_1 can be granted, so
                                  put it into the pending_asts
                                  list and return DLM_NORMAL.
                                  Then something happened
                                  and Node2 crashed.
    received DLM_NORMAL, waiting
    for DLM_PROXY_AST_MSG.
                                                                selected as the recovery
                                                                master, receiving the
                                                                migrated lock from Node1,
                                                                queue lock_1 at the tail
                                                                of the converting list.

    After dlm recovery, the converting list on the master of lockresA
    (Node3) will be: converting list head -> lock_3 (NL->EX) ->
    lock_1 (EX->NL). The requested mode of lock_3 is not compatible with the
    granted mode of lock_1, so it cannot be granted, and lock_1 cannot
    downconvert because the converting queue is strictly FIFO. So a deadlock
    is created. We think dlm_process_recovery_data() should queue_ast for
    lock_1 or alter the order of lock_1 and lock_3, so that dlm_thread can
    process lock_1 first. And if there are multiple downconverting locks,
    they must all convert from PR to NL, so there is no need to sort them.

    Signed-off-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • Static values are automatically initialized to NULL.

    Signed-off-by: Fabian Frederick
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

24 May, 2014

1 commit

  • In dlm_init, if creating dlm_lockname_cache fails in
    dlm_init_master_caches, dlm_lockres_cache, which was created before,
    will be destroyed twice. This causes the system to die when loading the
    modules.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
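
    A reduced sketch of the intended cleanup ownership, i.e. each cache is
    destroyed exactly once by the function that created it (names and sizes
    are stand-ins):

        static struct kmem_cache *lockres_cache;
        static struct kmem_cache *lockname_cache;

        static int init_master_caches(void)
        {
                lockname_cache = kmem_cache_create("lockname_cache",
                                                   64, 0, 0, NULL);
                if (!lockname_cache)
                        return -ENOMEM;  /* do NOT destroy lockres_cache here */
                return 0;
        }

        static int init_caches(void)
        {
                lockres_cache = kmem_cache_create("lockres_cache",
                                                  128, 0, 0, NULL);
                if (!lockres_cache)
                        return -ENOMEM;

                if (init_master_caches()) {
                        /* destroyed exactly once, by its creator */
                        kmem_cache_destroy(lockres_cache);
                        lockres_cache = NULL;
                        return -ENOMEM;
                }
                return 0;
        }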
     

04 Apr, 2014

4 commits

  • In dlm_query_region_handler(), if kmalloc fails, it will unlock
    dlm_domain_lock without having locked it first, and then a deadlock
    happens.

    Signed-off-by: Zhonghua Guo
    Signed-off-by: Joseph Qi
    Reviewed-by: Srinivas Eeda
    Tested-by: Joseph Qi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhonghua Guo
     
  • There is a race window in dlm_do_recovery() between
    dlm_remaster_locks() and dlm_reset_recovery() when the recovery master
    has nearly finished the recovery process for a dead node. After the
    master sends the FINALIZE_RECO message in dlm_remaster_locks(), another
    node may become the recovery master for another dead node and send the
    BEGIN_RECO message to all nodes, including the old master. In the
    handler of this message on the old master, dlm_begin_reco_handler(),
    dlm->reco.dead_node and dlm->reco.new_master will be set to the second
    dead node and the new master; then in dlm_reset_recovery() these two
    variables will be reset to their default values. As a result the new
    recovery master can never finish the recovery process and hangs, and in
    the end the whole cluster hangs on recovery.

    old recovery master:                      new recovery master:
    dlm_remaster_locks()
                                              become recovery master for
                                              another dead node.
                                              dlm_send_begin_reco_message()
    dlm_begin_reco_handler()
    {
     if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
         return -EAGAIN;
     }
     dlm_set_reco_master(dlm, br->node_idx);
     dlm_set_reco_dead_node(dlm, br->dead_node);
    }
    dlm_reset_recovery()
    {
     dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
     dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
    }
                                              will hang in dlm_remaster_locks()
                                              for request dlm locks info

    Before sending the FINALIZE_RECO message, the recovery master should set
    DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery is
    done. This closes the race window, as BEGIN_RECO messages will not be
    handled before the DLM_RECO_STATE_FINALIZE flag is cleared.

    A similar race may happen between the new recovery master and a normal
    node which is in dlm_finalize_reco_handler(); fix that as well.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Srinivas Eeda
    Reviewed-by: Wengang Wang
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • This issue was introduced by commit 800deef3f6f8 ("ocfs2: use
    list_for_each_entry where benefical") in 2007, where it replaced
    list_for_each with list_for_each_entry. The variable "lock" will point
    to invalid data if the "tmpq" list is empty, and a panic will be
    triggered because of this. Sunil advised reverting it back, but the old
    version was also not right: at the end of the outer for loop, that
    list_for_each_entry will also leave "lock" pointing at invalid data, and
    then in the next iteration, if the "tmpq" list is empty, "lock" will be
    stale invalid data and cause the panic. So revert back to list_for_each
    and reset "lock" to NULL to fix this issue.

    Another concern is that this seems like it cannot happen, because the
    "tmpq" list should not be empty. Let me describe how it can.

    old lock resource owner (node 1):              migration target (node 2):

    imagine there's a lockres with an EX lock from
    node 2 in the granted list, and a NR lock from
    node x with convert_type EX in the converting
    list.

    dlm_empty_lockres() {
     dlm_pick_migration_target() {
       pick node 2 as target as its lock is
       the first one in granted list.
     }
     dlm_migrate_lockres() {
       dlm_mark_lockres_migrating() {
         res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
         wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
         // after the above code, we can not dirty lockres any more,
         // so dlm_thread shuffle list will not run
                                                   downconvert lock from EX to NR
                                                   upconvert lock from NR to EX
    <<< migration may schedule out here, then
    <<< node 2 send down convert request to convert type from EX to
    <<< NR, then send up convert request to convert type from NR to
    <<< EX, at this time, lockres granted list is empty, and two locks
    <<< in the converting list, node x up convert lock followed by
    <<< node 2 up convert lock.

         // will set lockres RES_MIGRATING flag, the following
         // lock/unlock can not run
         dlm_lockres_release_ast(dlm, res);
       }

       dlm_send_one_lockres()
                                                   dlm_process_recovery_data()
                                                    for (i = 0; i < mres->num_locks; i++)
                                                     if (ml->node == dlm->node_num)
                                                      for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                       ...
                                                       <<< lock is invalid as grant list is empty.
                                                      }
                                                      if (lock->ml.node != ml->node)
                                                       BUG() >>> crash here
     }

    I see the above locks status from a vmcore of our internal bug.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Wengang Wang
    Cc: Sunil Mushran
    Reviewed-by: Srinivas Eeda
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
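
    A sketch of the fixed iteration shape described above (field names
    follow the commit text; treat it as illustrative rather than the exact
    patch):

        struct dlm_lock *lock = NULL;
        struct list_head *iter;

        /* list_for_each() leaves "found or not" to an explicit pointer: on
         * an empty tmpq, or when nothing matches, lock stays NULL instead
         * of pointing at non-entry memory. */
        list_for_each(iter, tmpq) {
                lock = list_entry(iter, struct dlm_lock, list);
                if (lock->ml.cookie == ml->cookie)
                        break;
                lock = NULL;
        }

        if (!lock) {
                /* empty list or no match: bail out, do not dereference */
        }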
     
  • When mounting an ocfs2 volume, it first generates the file
    /sys/kernel/debug/o2dlm//dlm_state, and then launches the dlm thread. So
    the following sequence causes a null pointer dereference:
    dlm_debug_init -> access the file dlm_state, which calls
    dlm_state_print -> dlm_launch_thread

    Moving dlm_debug_init after dlm_launch_thread and
    dlm_launch_recovery_thread fixes this issue.

    Signed-off-by: Zongxun Wang
    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zongxun Wang
     

22 Jan, 2014

1 commit

  • The versioning information is confusing for end-users. The numbers are
    stuck at 1.5.0 when the tools version have moved to 1.8.2. Remove the
    versioning system in the OCFS2 modules and let the kernel version be the
    guide to debug issues.

    Signed-off-by: Goldwyn Rodrigues
    Acked-by: Sunil Mushran
    Cc: Mark Fasheh
    Acked-by: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

13 Nov, 2013

3 commits

  • Signed-off-by: Junxiao Bi
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • We trigger a bug in __dlm_lockres_reserve_ast() when we umount 4 nodes
    in parallel. The situation is as follows:

    1) Node A migrates all the lockres it owns (e.g. lockres A) to other
    nodes, say node B, when it umounts.

    2) Receiving the MIG_LOCKRES message from A, node B masters lockres A
    with the DLM_LOCK_RES_MIGRATING state set.

    3) Then we umount ocfs2 on node B. It should also migrate lockres A to
    another node, say node C. But now the DLM_LOCK_RES_MIGRATING state of
    lockres A is not cleared, and node B triggers the BUG on a lockres with
    state DLM_LOCK_RES_MIGRATING.

    Signed-off-by: Xuejiufei
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Cc: Tariq Saeed
    Cc: Srinivas Eeda
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • A parallel umount on 4 nodes triggered a bug in
    dlm_process_recovery_data(). Here's the situation:

    Receiving the MIG_LOCKRES message, a node processes the locks in the
    migratable lockres. It copies the lvb from the migratable lockres when
    processing the first valid lock.

    If there is a lock in the blocked list with the EX level, it triggers
    the BUG. Since valid lvbs are set when locks are granted with EX or PR
    levels, locks in the blocked list cannot have valid lvbs. Therefore I
    think we should skip the locks in the blocked list.

    Signed-off-by: Xuejiufei
    Signed-off-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     

12 Sep, 2013

3 commits

  • dlm_do_local_recovery_cleanup() should force-clean the refmap if the
    owner of the lockres is UNKNOWN. Otherwise a node may hang when
    umounting the filesystem. Here's the situation:

    Node1                                       Node2
    dlmlock()
    -> dlm_get_lock_resource()
    send DLM_MASTER_REQUEST_MSG to
    other nodes.
                                                trying to master this lockres,
                                                return MAYBE.
    selected as the master of lockresA,
    set mle->master to Node1,
    and do assert_master,
    send DLM_ASSERT_MASTER_MSG to Node2.
                                                Node2 has interest in lockresA
                                                and returns
                                                DLM_ASSERT_RESPONSE_MASTERY_REF,
                                                then something happened and
                                                Node2 crashed.

    Receiving DLM_ASSERT_RESPONSE_MASTERY_REF, Node1 sets Node2 in the
    refmap and keeps sending DLM_ASSERT_MASTER_MSG to the other nodes.

    o2hb finds Node2 down and calls dlm_hb_node_down() -->
    dlm_do_local_recovery_cleanup(); the master of lockresA is still
    UNKNOWN, so there is no need to call dlm_free_dead_locks().

    The master of lockresA is then set to Node1, but Node2 still remains in
    the refmap.

    When Node1 umounts, it finds that the refmap of lockresA is not empty
    and attempts to migrate it to Node2. But Node2 is already down, so the
    umount hangs, trying to migrate lockresA again and again.

    Signed-off-by: joyce
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Jie Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • [dan.carpenter@oracle.com: fix up some NULL dereference bugs]
    Signed-off-by: Dong Fang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Jeff Liu
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dong Fang
     
  • dlm_request_all_locks() should deal with the status sent from the
    target node if DLM_LOCK_REQUEST_MSG is sent successfully, or the
    recovery master will fall into an endless loop, waiting for other nodes
    to send locks and DLM_RECO_DATA_DONE_MSG to it.

    NodeA                                       NodeB
                                                selected as recovery master
                                                dlm_remaster_locks()
                                                ->dlm_request_all_locks()
                                                send DLM_LOCK_REQUEST_MSG to NodeA

    It happened that NodeA cannot allocate memory when it processes this
    message: dlm_request_all_locks_handler() does not queue
    dlm_request_all_locks_worker and returns -ENOMEM. It will never send
    locks and DLM_RECO_DATA_DONE_MSG to NodeB.

                                                NodeB does not deal with the status
                                                sent from NodeA, and falls into an
                                                endless loop waiting for the
                                                recovery state of NodeA to change.

    Signed-off-by: joyce
    Cc: Mark Fasheh
    Cc: Jeff Liu
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei