Eric Lee / smarc-fsl-linux-kernel

25 Nov, 2020

1 commit

515b269d5 gfs2: set lockdep subclass for iopen glocks ... Browse Code »

This patch introduce a new globs attribute to define the subclass of the
glock lockref spinlock. This avoid the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:

============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&gl->gl_lockref.lock);
lock(&gl->gl_lockref.lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/0:1/12:
#0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
dump_stack+0x8b/0xb0
__lock_acquire.cold+0x19e/0x2e3
lock_acquire+0x150/0x410
? lockref_get+0x9/0x20
_raw_spin_lock+0x27/0x40
? lockref_get+0x9/0x20
lockref_get+0x9/0x20
delete_work_func+0x188/0x260
process_one_work+0x237/0x540
worker_thread+0x4d/0x3b0
? process_one_work+0x540/0x540
kthread+0x127/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30

Suggested-by: Andreas Gruenbacher
Signed-off-by: Alexander Aring
Signed-off-by: Andreas Gruenbacher

Alexander Aring
2020-11-25 06:45:58 +0800

03 Nov, 2020

1 commit

da7d554f7 gfs2: Wake up when sd_glock_disposal becomes zero ... Browse Code »

Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero. As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up. Add the missing wakeup.

Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race")
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring
Signed-off-by: Andreas Gruenbacher

Alexander Aring
2020-11-03 21:39:11 +0800

21 Oct, 2020

2 commits

2ffed5290 gfs2: Only access gl_delete for iopen glocks ... Browse Code »

Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.

Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-10-21 05:16:22 +0800
dbffb29da gfs2: Fix comments to glock_hash_walk ... Browse Code »

The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-10-21 05:16:16 +0800

15 Oct, 2020

3 commits

e2c6c8a79 gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) ... Browse Code »

Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.

The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When they dequeue the glock, it only cleared
the flag in cases when the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.

This happens to iopen glocks often: the get held in SH, then the file is
closed, but the glock remains in SH mode.

We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.

This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-10-15 23:04:53 +0800
ee1e2c773 gfs2: call truncate_inode_pages_final for address space glocks ... Browse Code »

Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-10-15 05:54:42 +0800
e8a8023ee gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro ... Browse Code »

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin
Signed-off-by: Andreas Gruenbacher

Liu Shixin
2020-10-15 05:54:41 +0800

03 Aug, 2020

2 commits

c07bfb4d8 gfs2: Fix refcount leak in gfs2_glock_poke ... Browse Code »

In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local
glock holder. Without that, we're leaking a glock and a pid reference.

Fixes: 9e8990dea926 ("gfs2: Smarter iopen glock waiting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-08-03 19:45:37 +0800
5deaf1f63 gfs2: Add some flags missing from glock output ... Browse Code »

Before this patch, three flags were not represented in the glock output.
This patch adds them in:

c - GLF_INODE_CREATING
P - GLF_PENDING_DELETE
x - GLF_FREEING (both f and F are already used)

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-08-03 19:20:13 +0800

30 Jun, 2020

1 commit

34244d711 gfs2: Don't sleep during glock hash walk ... Browse Code »

In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-30 19:04:45 +0800

06 Jun, 2020

9 commits

300e549b6 Merge branch 'gfs2-iopen' into for-next Browse Code »

Andreas Gruenbacher
2020-06-06 03:25:36 +0800
9e8990dea gfs2: Smarter iopen glock waiting ... Browse Code »

When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.

To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock. The other node(s) trying to acquire the lock can then
abort instead of timing out.

Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
35b6f8fbc gfs2: Wake up when setting GLF_DEMOTE ... Browse Code »

Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
flag.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
b0dcffd8d gfs2: Check inode generation number in delete_work_func ... Browse Code »

In delete_work_func, if the iopen glock still has an inode attached,
limit the inode lookup to that specific generation number: in the likely
case that the inode was deleted on the node on which the inode's link
count dropped to zero, we can skip verifying the on-disk block type and
reading in the inode. The same applies if another node that had the
inode open managed to delete the inode before us.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
6bdcadea7 gfs2: Minor gfs2_lookup_by_inum cleanup ... Browse Code »

Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
8c7b9262a gfs2: Give up the iopen glock on contention ... Browse Code »

When there's contention on the iopen glock, it means that the link count
of the corresponding inode has dropped to zero on a remote node which is
now trying to delete the inode. In that case, try to evict the inode so
that the iopen glock will be released, which will allow the remote node
to do its job.

When the inode is still open locally, the inode's reference count won't
drop to zero and so we'll keep holding the inode and its iopen glock.
The remote node will time out its request to grab the iopen glock, and
when the inode is finally closed locally, we'll try to delete it
ourself.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
a0e3cc65f gfs2: Turn gl_delete into a delayed work ... Browse Code »

This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:21 +0800
f286d627e gfs2: Keep track of deleted inode generations in LVBs ... Browse Code »

When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB). When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes. This avoids taking the resource
group glock in order to verify the block type.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-06-06 02:19:20 +0800
15f2547b4 gfs2: Allow ASPACE glocks to also have an lvb ... Browse Code »

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-06-06 02:18:59 +0800

05 Jun, 2020

2 commits

ea4e61c7f gfs2: introduce new gfs2_glock_assert_withdraw ... Browse Code »

Before this patch, asserts based on glocks did not print the glock with
the error. This patch introduces a new macro, gfs2_glock_assert_withdraw
which first prints the glock, then takes the assert.

This also changes a few glock asserts to the new macro.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-06-05 22:44:29 +0800
7e901d6e9 gfs2: print mapping->nrpages in glock dump for address space glocks ... Browse Code »

This patch makes the glock dumps in debugfs print the number of pages
(nrpages) for address space glocks. This will aid in debugging.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-06-05 20:58:23 +0800

09 May, 2020

2 commits

b14c94908 Revert "gfs2: Don't demote a glock until its revokes are written" ... Browse Code »

This reverts commit df5db5f9ee112e76b5202fbc331f990a0fc316d6.

This patch fixes a regression: patch df5db5f9ee112 allowed function
run_queue() to bypass its call to do_xmote() if revokes were queued for
the glock. That's wrong because its call to do_xmote() is what is
responsible for calling the go_sync() glops functions to sync both
the ail list and any revokes queued for it. By bypassing the call,
gfs2 could get into a stand-off where the glock could not be demoted
until its revokes are written back, but the revokes would not be
written back because do_xmote() was never called.

It "sort of" works, however, because there are other mechanisms like
the log flush daemon (logd) that can sync the ail items and revokes,
if it deems it necessary. The problem is: without file system pressure,
it might never deem it necessary.

Signed-off-by: Bob Peterson

Bob Peterson
2020-05-09 04:01:25 +0800
b11e1a84f gfs2: If go_sync returns error, withdraw but skip invalidate ... Browse Code »

Before this patch, if the go_sync operation returned an error during
the do_xmote process (such as unable to sync metadata to the journal)
the code did goto out. That kept the glock locked, so it could not be
given away, which correctly avoids file system corruption. However,
it never set the withdraw bit or requeueing the glock work. So it would
hang forever, unable to ever demote the glock.

This patch changes to goto to a new label, skip_inval, so that errors
from go_sync are treated the same way as errors from go_inval:
The delayed withdraw bit is set and the work is requeued. That way,
the logd should eventually figure out there's a problem and withdraw
properly there.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-05-09 04:00:07 +0800

08 May, 2020

1 commit

a8b7528b6 gfs2: Fix error exit in do_xmote ... Browse Code »

Before this patch, if an error was detected from glock function go_sync
by function do_xmote, it would return. But the function had temporarily
unlocked the gl_lockref spin_lock, and it never re-locked it. When the
caller of do_xmote tried to unlock it again, it was already unlocked,
which resulted in a corrupted spin_lock value.

This patch makes sure the gl_lockref spin_lock is re-locked after it is
unlocked.

Thanks to Wu Bo for reporting this problem.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2020-05-08 20:45:38 +0800

28 Mar, 2020

1 commit

969183bc6 gfs2: Switch to list_{first,last}_entry ... Browse Code »

Replace open-coded versions of list_first_entry and list_last_entry with those
functions.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson

Andreas Gruenbacher
2020-03-28 03:08:04 +0800

27 Feb, 2020

5 commits

1c634f94c gfs2: Do proper error checking for go_sync family of glops functions ... Browse Code »

Before this patch, function do_xmote would try to sync out the glock
dirty data by calling the appropriate glops function XXX_go_sync()
but it did not check for a good return code. If the sync was not
possible due to an io error or whatever, do_xmote would continue on
and call go_inval and release the glock to other cluster nodes.
When those nodes go to replay the journal, they may already be holding
glocks for the journal records that should have been synced, but were
not due to the ignored error.

This patch introduces proper error code checking to the go_sync
family of glops functions.

Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher

Bob Peterson
2020-02-27 21:53:18 +0800
df5db5f9e gfs2: Don't demote a glock until its revokes are written ... Browse Code »

Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.

This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.

Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher

Bob Peterson
2020-02-27 21:53:18 +0800
d93ae386e gfs2: Check for log write errors before telling dlm to unlock ... Browse Code »

Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.

Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher

Bob Peterson
2020-02-27 21:53:18 +0800
33dbd1e41 gfs2: fix infinite loop when checking ail item count before go_inval ... Browse Code »

Before this patch, the rgrp_go_inval and inode_go_inval functions each
checked if there were any items left on the ail count (by way of a
count), and if so, did a withdraw. But the withdraw code now uses
glocks when changing the file system to read-only status. So we can
not have glock functions withdrawing or a hang will likely result:
The glocks can't be serviced by the work_func if the work_func is
busy doing its own withdraw.

This patch removes the checks from the go_inval functions and adds
a centralized check in do_xmote to warn about the problem and not
withdraw, but flag the error so it's eventually caught when the logd
daemon eventually runs.

Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher

Bob Peterson
2020-02-27 21:53:17 +0800
601ef0d52 gfs2: Force withdraw to replay journals and wait for it to finish ... Browse Code »

When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held.

The "live" glock which is now used to signal when a withdraw
occurs. When a withdraw occurs, the node signals its withdraw by
dequeueing the "live" glock and trying to enqueue it in EX mode,
thus forcing the other nodes to all see a demote request, by way
of a "1CB" (one callback) try lock. The "live" glock is not
granted in EX; the callback is only just used to indicate a
withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

We also need to take care when glocks decide to withdraw, since
the withdraw code now uses glocks.

Also, before this patch, if a process encountered an error and
decided to withdraw, if another process was already withdrawing,
the second withdraw would be silently ignored, which set it free
to unlock its glocks. That's correct behavior if the original
withdrawer encounters further errors down the road. But if
secondary waiters don't wait for the journal replay, unlocking
glocks will allow other nodes to use them, despite the fact that
the journal containing those blocks is being replayed. The
replay needs to finish before our glocks are released to other
nodes. IOW, secondary withdraws need to wait for the first
withdraw to finish.

For example, if an rgrp glock is unlocked by a process that didn't
wait for the first withdraw, a journal replay could introduce file
system corruption by replaying a rgrp block that has already been
granted to a different cluster node.

Signed-off-by: Bob Peterson

Bob Peterson
2020-02-27 21:53:12 +0800

21 Feb, 2020

1 commit

a72d2401f gfs2: Allow some glocks to be used during withdraw ... Browse Code »

We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
when we're withdrawn. For example, to maintain metadata integrity, we should
disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
iopen or the transaction glocks may be safely used because none of their
metadata goes through the journal. So in general, we should disallow all
glocks with an address space, and allow all the others. One exception is:
we need to allow our active journal to be demoted so others may recover it.

Allowing glocks after withdraw gives us the ability to take appropriate
action (in a following patch) to have our journal properly replayed by
another node rather than just abandoning the current transactions and
pretending nothing bad happened, leaving the other nodes free to modify
the blocks we had in our journal, which may result in file system
corruption.

Signed-off-by: Bob Peterson

Bob Peterson
2020-02-21 01:01:36 +0800

10 Feb, 2020

1 commit

b3422cacd gfs2: Rework how rgrp buffer_heads are managed ... Browse Code »

Before this patch, the rgrp code had a serious problem related to
how it managed buffer_heads for resource groups. The problem caused
file system corruption, especially in cases of journal replay.

When an rgrp glock was demoted to transfer ownership to a
different cluster node, do_xmote() first calls rgrp_go_sync and then
rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called
gfs2_rgrp_brelse() that dropped the buffer_head reference count.
In most cases, the reference count went to zero, which is right.
However, there were other places where the buffers are handled
differently.

After rgrp_go_sync, do_xmote called rgrp_go_inval which called
gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to
truncate_inode_pages_range would get rid of the pages in memory,
but only if the reference count drops to 0.

Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
to the buffer_heads in cases where the reference count was still 1.
Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
it failed the check for "if (bi->bi_bh)" and thus failed to call
brelse a second time. Because of that, the reference count on those
buffers sometimes failed to drop from 1 to 0. And that caused
function truncate_inode_pages_range to keep the pages in page cache
rather than freeing them.

The next time the rgrp glock was acquired, the metadata read of
the rgrp buffers re-used the pages in memory, which were now
wrong because they were likely modified by the other node who
acquired the glock in EX (which is why we demoted the glock).
This re-use of the page cache caused corruption because changes
made by the other nodes were never seen, so the bitmaps were
inaccurate.

For some reason, the problem became most apparent when journal
replay forced the replay of rgrps in memory, which caused newer
rgrp data to be overwritten by the older in-core pages.

A big part of the problem was that the rgrp buffer were released
in multiple places: The go_unlock function would release them when
the glock was released rather than when the glock is demoted,
which is clearly wrong because our intent was to cache them until
the glock is demoted from SH or EX.

This patch attempts to clean up the mess and make one consistent
and centralized mechanism for managing the rgrp buffer_heads by
implementing several changes:

1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
We don't want to release the buffers or zero the pointers when
syncing for the reasons stated above. It only makes sense to
release them when the glock is actually invalidated (go_inval).
And when we do, then we set the bh pointers to NULL.
2. The go_unlock function (which was only used for rgrps) is
eliminated, as we've talked about doing many times before.
The go_unlock function was called too early in the glock dq
process, and should not happen until the glock is invalidated.
3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
That will now happen automatically when the rgrp glocks are
demoted, and shouldn't happen any sooner or later than that.
Instead, function gfs2_clear_rgrpd has been modified to demote
the rgrp glocks, and therefore, free those pages, before the
remaining glocks are culled by gfs2_gl_hash_clear. This
prevents the gl_object from hanging around when the glocks are
culled.

Signed-off-by: Bob Peterson
Reviewed-by: Andreas Gruenbacher

Bob Peterson
2020-02-10 21:39:48 +0800

20 Jan, 2020

1 commit

f7be987b8 gfs2: Remove GFS2_MIN_LVB_SIZE define ... Browse Code »

The dlm lockspace is set up to have lock value blocks of GDLM_LVB_SIZE bytes,
and dlm is the only lock manager we support, so there is no point in claiming
that the lock value block could have any other size.

Signed-off-by: Andreas Gruenbacher

Andreas Gruenbacher
2020-01-20 15:46:53 +0800

16 Nov, 2019

1 commit

d99724c3c gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS ... Browse Code »

This patch closes a timing window in which two processes compete
and overlap in the execution of do_xmote for the same glock:

Process A Process B
------------------------------------ -----------------------------
1. Grabs gl_lockref and calls do_xmote
2. Grabs gl_lockref but is blocked
3. Sets GLF_INVALIDATE_IN_PROGRESS
4. Unlocks gl_lockref
5. Calls do_xmote
6. Call glops->go_sync
7. test_and_clear_bit GLF_DIRTY
8. Call gfs2_log_flush Call glops->go_sync
9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY
It's not dirty (step 7) returns
10. Tests GLF_INVALIDATE_IN_PROGRESS
11. Calls go_inval (rgrp_go_inval)
12. gfs2_rgrp_relse does brelse
13. truncate_inode_pages_range
14. Calls lm_lock UN

In step 14 we've just told dlm to give the glock to another node
when, in fact, process A has not finished the IO and synced all
buffer_heads to disk and make sure their revokes are done.

This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
to use test_and_set_bit, and if the bit is already set, process B just
ignores it and trusts that process A will do the do_xmote in the proper
order.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-11-16 01:21:59 +0800

15 Nov, 2019

1 commit

eb43e660c gfs2: Introduce function gfs2_withdrawn ... Browse Code »

Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN
bit to call it. This does not change the logic or function of gfs2, and
it facilitates later improvements to the withdraw sequence.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-11-15 02:46:18 +0800

05 Sep, 2019

2 commits

ad26967b9 gfs2: Use async glocks for rename ... Browse Code »

Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
reverse the roles of which directories are "old" and which are "new" for
the purposes of rename. This can cause deadlocks where two nodes end up
waiting for each other.

There can be several layers of directory dependencies across many nodes.

This patch fixes the problem by acquiring all gfs2_rename's inode glocks
asychronously and waiting for all glocks to be acquired. That way all
inodes are locked regardless of the order.

The timeout value for multiple asynchronous glocks is calculated to be
the total of the individual wait times for each glock times two.

Since gfs2_exchange is very similar to gfs2_rename, both functions are
patched in the same way.

A new async glock wait queue, sd_async_glock_wait, keeps a list of
waiters for these events. If gfs2's holder_wake function detects an
async holder, it wakes up any waiters for the event. The waiter only
tests whether any of its requests are still pending.

Since the glocks are sent to dlm asychronously, the wait function needs
to check to see which glocks, if any, were granted.

If a glock is granted by dlm (and therefore held), its minimum hold time
is checked and adjusted as necessary, as other glock grants do.

If the event times out, all glocks held thus far must be dequeued to
resolve any existing deadlocks. Then, if there are any outstanding
locking requests, we need to loop around and wait for dlm to respond to
those requests too. After we release all requests, we return -ESTALE to
the caller (vfs rename) which loops around and retries the request.

Node1 Node2
--------- ---------
1. Enqueue A Enqueue B
2. Enqueue B Enqueue A
3. A granted
6. B granted
7. Wait for B
8. Wait for A
9. A times out (since Node 1 holds A)
10. Dequeue B (since it was granted)
11. Wait for all requests from DLM
12. B Granted (since Node2 released it in step 10)
13. Rename
14. Dequeue A
15. DLM Grants A
16. Dequeue A (due to the timeout and since we
no longer have B held for our task).
17. Dequeue B
18. Return -ESTALE to vfs
19. VFS retries the operation, goto step 1.

This release-all-locks / acquire-all-locks may slow rename / exchange
down as both nodes struggle in the same way and do the same thing.
However, this will only happen when there is contention for the same
inodes, which ought to be rare.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-09-05 02:22:17 +0800
01123cf17 gfs2: create function gfs2_glock_update_hold_time ... Browse Code »

This patch moves the code that updates glock minimum hold
time to a separate function. This will be called by a future
patch.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson

Andreas Gruenbacher
2019-09-05 02:22:17 +0800

03 Sep, 2019

1 commit

98fb05748 gfs2: Fix possible fs name overflows ... Browse Code »

This patch fixes three places in which temporary character buffers
could overflow due to the addition of the file system id from patch
3792ce973f07. Thanks to Dan Carpenter for pointing it out.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-09-03 15:42:41 +0800

28 Jun, 2019

2 commits

3792ce973 gfs2: dump fsid when dumping glock problems ... Browse Code »

Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-06-28 03:27:43 +0800
04aea0ca1 gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN ... Browse Code »

Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher

Bob Peterson
2019-06-28 03:26:35 +0800