01 Dec, 2020

1 commit

  • In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending
    delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work
    may block waiting for delete_work_func to complete, and delete_work_func may
    block trying to acquire the inode glock in gfs2_inode_lookup.

    Reported-by: Alexander Aring
    Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
    Cc: stable@vger.kernel.org # v5.8+
    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     

27 Nov, 2020

1 commit

  • Commit 20f829999c38 ("gfs2: Rework read and page fault locking") lifted
    the glock lock taking from the low-level ->readpage and ->readahead
    address space operations to the higher-level ->read_iter file and
    ->fault vm operations. The glocks are still taken in LM_ST_SHARED mode
    only. On filesystems mounted without the noatime option, ->read_iter
    sometimes needs to update the atime as well, though. Right now, this
    leads to a failed locking mode assertion in gfs2_dirty_inode.

    Fix that by introducing a new update_time inode operation. There, if
    the glock is held non-exclusively, upgrade it to an exclusive lock.

    Reported-by: Alexander Aring
    Fixes: 20f829999c38 ("gfs2: Rework read and page fault locking")
    Cc: stable@vger.kernel.org # v5.8+
    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     

26 Nov, 2020

2 commits

  • GFS2's freeze/thaw mechanism uses a special freeze glock to control its
    operation. It does this with a sync glock operation (glops.c) called
    freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
    glops function causes the file system to be frozen. This is intended. However,
    GFS2's mount and unmount processes also hold the freeze glock to prevent other
    processes, perhaps on different cluster nodes, from mounting the frozen file
    system in read-write mode.

    Before this patch, there was no check in freeze_go_sync for whether a freeze
    is intended or whether the glock demote was caused by a normal unmount.
    So the unmount process ended up trying to freeze the very file system it
    was unmounting, which resulted in a deadlock.

    This patch adds an additional check to freeze_go_sync so that demotes of the
    freeze glock are ignored if they come from the unmount process.

    Fixes: 20b329129009 ("gfs2: Fix regression in freeze_go_sync")
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • If gfs2 tries to mount a (corrupt) file system that has no resource
    groups, it still tries to set preferences on the first one, which causes
    a kernel NULL pointer dereference. This patch adds a check to function
    gfs2_ri_update so this condition is detected and reported back as an
    error.

    Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
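The zero-rgrp guard described above can be sketched in userspace C. The function name and error value here are illustrative assumptions, not the kernel's exact code:

```c
#include <stddef.h>

/* Hypothetical sketch of the added check in gfs2_ri_update: refuse to
 * proceed when the (corrupt) filesystem presents zero resource groups,
 * instead of dereferencing a nonexistent "first" rgrp. */
int ri_update_check(size_t rgrp_count)
{
    if (rgrp_count == 0)
        return -1;   /* report corruption back to the mount path */
    /* ... safe to set preferences on the first rgrp ... */
    return 0;
}
```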
     

25 Nov, 2020

2 commits

  • This patch introduces a new glops attribute to define the subclass of the
    glock lockref spinlock. This avoids the following lockdep warning, which
    occurs when we lock an inode glock while holding an iopen glock:

    ============================================
    WARNING: possible recursive locking detected
    5.10.0-rc3+ #4990 Not tainted
    --------------------------------------------
    kworker/0:1/12 is trying to acquire lock:
    ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

    but task is already holding lock:
    ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&gl->gl_lockref.lock);
    lock(&gl->gl_lockref.lock);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    3 locks held by kworker/0:1/12:
    #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
    #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
    #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

    stack backtrace:
    CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
    Workqueue: delete_workqueue delete_work_func
    Call Trace:
    dump_stack+0x8b/0xb0
    __lock_acquire.cold+0x19e/0x2e3
    lock_acquire+0x150/0x410
    ? lockref_get+0x9/0x20
    _raw_spin_lock+0x27/0x40
    ? lockref_get+0x9/0x20
    lockref_get+0x9/0x20
    delete_work_func+0x188/0x260
    process_one_work+0x237/0x540
    worker_thread+0x4d/0x3b0
    ? process_one_work+0x540/0x540
    kthread+0x127/0x140
    ? __kthread_bind_mask+0x60/0x60
    ret_from_fork+0x22/0x30

    Suggested-by: Andreas Gruenbacher
    Signed-off-by: Alexander Aring
    Signed-off-by: Andreas Gruenbacher

    Alexander Aring
     
  • Commit 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
    introduced additional locking in gfs2_rgrp_go_dump, which is also used for
    dumping resource group glocks via debugfs. However, on that code path, the
    glock spin lock is already taken in dump_glock, and taking it again in
    gfs2_glock2rgrp leads to deadlock. This can be reproduced with:

    $ mkfs.gfs2 -O -p lock_nolock /dev/FOO
    $ mount /dev/FOO /mnt/foo
    $ touch /mnt/foo/bar
    $ cat /sys/kernel/debug/gfs2/FOO/glocks

    Fix that by not taking the glock spin lock inside the go_dump callback.

    Fixes: 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
    Signed-off-by: Alexander Aring
    Signed-off-by: Andreas Gruenbacher

    Alexander Aring
     

18 Nov, 2020

1 commit

  • Patch 541656d3a513 ("gfs2: freeze should work on read-only mounts") changed
    the check for glock state in function freeze_go_sync() from "gl->gl_state
    == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it
    regressed gfs2's freeze/thaw mechanism because it caused only the freezing
    node (which requests the glock in EX) to queue freeze work.

    All nodes go through this go_sync code path during the freeze to drop their
    SHared hold on the freeze glock, allowing the freezing node to acquire it
    in EXclusive mode. But all the nodes must freeze access to the file system
    locally, so they ALL must queue freeze work. The freeze_work calls
    freeze_func, which makes a request to reacquire the freeze glock in SH,
    effectively blocking until the thaw from the EX holder. Once thawed, the
    freezing node drops its EX hold on the freeze glock, then the (blocked)
    freeze_func reacquires the freeze glock in SH again (on all nodes, including
    the freezer) so all nodes go back to a thawed state.

    This patch changes the check back to gl_state == LM_ST_SHARED like it was
    prior to 541656d3a513.

    Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts")
    Cc: stable@vger.kernel.org # v5.8+
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
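The restored condition can be sketched in userspace C. The LM_ST_* values and the helper name are assumptions for illustration; only the comparison itself reflects the commit:

```c
/* Lock states modeled after gfs2's LM_ST_* constants (values assumed). */
enum { LM_ST_UNLOCKED, LM_ST_EXCLUSIVE, LM_ST_DEFERRED, LM_ST_SHARED };

/* Every node holding the freeze glock SHared must queue freeze work
 * when the glock is synced for a demote. The regressed check,
 * gl_req == LM_ST_EXCLUSIVE, was only ever true on the freezing node. */
int should_queue_freeze_work(int gl_state)
{
    return gl_state == LM_ST_SHARED;  /* pre-541656d3a513 check, restored */
}
```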
     

13 Nov, 2020

3 commits

  • …cm/fs/fscrypt/fscrypt") into android-mainline

    Steps on the way to 5.10-rc4

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: I8554ba37704bee02192ff6117d4909fde568fca2

    Greg Kroah-Hartman
     
  • Patch b2a846dbef4e ("gfs2: Ignore journal log writes for jdata holes")
    tried (unsuccessfully) to fix a case in which writes were done to jdata
    blocks, the blocks were sent to the ail list, and then a punch_hole or
    truncate operation caused the blocks to be freed. In other words, the ail
    items were for jdata holes. Before b2a846dbef4e, such a jdata hole caused
    function gfs2_block_map to return -EIO, which was eventually interpreted
    as an IO error to the journal, and then a withdraw.

    This patch changes function gfs2_get_block_noalloc, which is only used
    for jdata writes, so it returns -ENODATA rather than -EIO, and when
    -ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
    We can safely ignore it because gfs2_ail1_start_one is only called
    when the jdata pages have already been written and truncated, so the
    ail1 content no longer applies.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
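The error-mapping idea above can be modeled in userspace C. Function names mirror the commit text but are simplified sketches, not the kernel's signatures:

```c
#include <errno.h>

/* Models gfs2_get_block_noalloc: a hole in a jdata mapping is reported
 * as -ENODATA rather than -EIO, because it is not a real I/O error. */
int get_block_noalloc(int mapped, int is_jdata_hole)
{
    if (mapped)
        return 0;
    return is_jdata_hole ? -ENODATA : -EIO;
}

/* Models gfs2_ail1_start_one's handling: -ENODATA is benign, since the
 * jdata pages were already written and truncated, so the stale ail1
 * item no longer applies. Real I/O errors still propagate (and would
 * trigger a withdraw). */
int ail1_start_one_handle(int err)
{
    if (err == -ENODATA)
        return 0;     /* stale ail item for a hole: ignore */
    return err;
}
```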
     
  • This reverts commit b2a846dbef4ef54ef032f0f5ee188c609a0278a7.

    That commit changed the behavior of function gfs2_block_map to return
    -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
    false. While that fixed the intended problem for jdata, it also broke
    other callers of gfs2_block_map such as some jdata block reads. Before
    the patch, an encountered hole would be skipped and the buffer seen as
    unmapped by the caller. The patch changed the behavior to return
    -ENODATA, which is interpreted as an error by the caller.

    The -ENODATA return code should be restricted to the specific case where
    jdata holes are encountered during ail1 writes. That will be done in a
    later patch.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     


03 Nov, 2020

2 commits

  • Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a
    sd_glock_disposal accounting bug by adding a missing atomic_dec
    statement, but it failed to wake up sd_glock_wait when that decrement
    causes sd_glock_disposal to reach zero. As a consequence,
    gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
    being woken up. Add the missing wakeup.

    Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race")
    Cc: stable@vger.kernel.org # v2.6.39+
    Signed-off-by: Alexander Aring
    Signed-off-by: Andreas Gruenbacher

    Alexander Aring
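The missing-wakeup fix can be sketched as a userspace model. The struct, names, and the flag standing in for wake_up are all assumptions; only the "old value == 1 means we hit zero, so wake" logic reflects the commit:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of sd_glock_disposal / sd_glock_wait. */
struct disposal { atomic_int count; bool woken; };

void glock_disposed(struct disposal *d)
{
    /* atomic_fetch_sub returns the old value: old == 1 means this
     * decrement just took the counter to zero, so issue the
     * (previously missing) wakeup. */
    if (atomic_fetch_sub(&d->count, 1) == 1)
        d->woken = true;
}

/* helper: dispose of 'disposals' glocks out of 'initial' and report
 * whether the waiter would have been woken */
int dispose_n(int initial, int disposals)
{
    struct disposal d = { initial, false };
    for (int i = 0; i < disposals; i++)
        glock_disposed(&d);
    return d.woken;
}
```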
     
  • Right now, we can end up calling cancel_delayed_work_sync from within
    delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup ->
    gfs2_cancel_delete_work. When that happens, it will result in a
    deadlock. Instead, gfs2_inode_lookup should skip the call to
    gfs2_cancel_delete_work when called from delete_work_func (blktype ==
    GFS2_BLKST_UNLINKED).

    Reported-by: Alexander Ahring Oder Aring
    Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
    Cc: stable@vger.kernel.org # v5.8+
    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
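The skip condition described above can be sketched in C. The GFS2_BLKST_* constants are modeled after gfs2's block-state names with assumed values, and the helper is illustrative, not the kernel's code:

```c
#include <stdbool.h>

/* Block states modeled after gfs2's on-disk block types (values assumed). */
enum { GFS2_BLKST_FREE, GFS2_BLKST_USED, GFS2_BLKST_UNLINKED, GFS2_BLKST_DINODE };

/* A lookup with blktype == GFS2_BLKST_UNLINKED originates from
 * delete_work_func itself, so cancelling (and synchronously waiting on)
 * that work from there would self-deadlock. Skip the cancel in that
 * case only. */
bool may_cancel_delete_work(int blktype)
{
    return blktype != GFS2_BLKST_UNLINKED;
}
```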
     

30 Oct, 2020

6 commits

  • Before this patch, gfs2_fitrim was not properly checking for a "live" file
    system. If the file system had something to trim and the file system
    was read-only (or spectator), it would start the trim, but when it started
    the transaction, gfs2_trans_begin returned -EROFS (read-only file system)
    and it errored out. However, if the file system was already trimmed so
    there was no work to do, gfs2_trans_begin was never called; that code was
    bypassed, so no error was ever returned. Instead, it returned a good
    return code with 0 work. All this made for inconsistent behavior:
    the same fstrim command could return -EROFS in one case and 0 in another.
    This tripped up xfstests generic/537, which reports the error as:

    +fstrim with unrecovered metadata just ate your filesystem

    This patch adds a check for a "live" (in other words, read-write, with an
    active journal) file system, and if the check fails, it returns the error
    properly.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
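The consistency fix can be sketched in userspace C. The function shape and parameters are assumptions; the point is that the read-only check now happens up front, regardless of whether there is work to do:

```c
#include <errno.h>

/* Sketch of the reordered check: a read-only (or spectator) mount
 * fails with -EROFS whether or not anything needs trimming, instead
 * of only failing when a transaction happens to be started. */
int fitrim(int fs_is_rw, int blocks_to_trim)
{
    if (!fs_is_rw)
        return -EROFS;   /* consistent result, independent of workload */
    if (blocks_to_trim == 0)
        return 0;        /* nothing to do */
    /* ... begin transaction and issue discards ... */
    return 0;
}
```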
     
  • Before commit 97fd734ba17e, the local statfs_changeX inode was never
    initialized for spectator mounts. However, it still checks for
    spectator mounts when unmounting everything. There's no good reason to
    lookup the statfs_changeX files because spectators cannot perform recovery.
    It still, however, needs the master statfs file for statfs calls.
    This patch adds the check for spectator mounts to init_statfs.

    Fixes: 97fd734ba17e ("gfs2: lookup local statfs inodes prior to journal recovery")
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write
    the address space for the metadata being synced. That's great for inodes, but
    resource groups all point to the same superblock-address space, sdp->sd_aspace.
    Each rgrp has its own range of blocks on which it should operate. That meant
    every time an rgrp's metadata was synced, it would write the entire address
    space instead of just the rgrp's own range.

    This patch eliminates function gfs2_meta_sync and tailors specific metasync
    functions for inodes and rgrps.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Before this patch, function init_journal's "undo" directive jumped to label
    fail_jinode_gh. But now that it does statfs initialization, it needs to
    jump to fail_statfs instead. Failure to do so means that mount failures
    after init_journal is successful will neglect to let go of the proper
    statfs information, stranding the statfs_changeX inodes. That makes it
    impossible to free their glocks, and results in:

    gfs2: fsid=sda.s: G: s:EX n:2/805f f:Dqob t:EX d:UN/603701000 a:0 v:0 r:4 m:200 p:1
    gfs2: fsid=sda.s: H: s:EX f:H e:0 p:1397947 [(ended)] init_journal+0x548/0x890 [gfs2]
    gfs2: fsid=sda.s: I: n:6/32863 t:8 f:0x00 d:0x00000201 s:24 p:0
    gfs2: fsid=sda.s: G: s:SH n:5/805f f:Dqob t:SH d:UN/603712000 a:0 v:0 r:3 m:200 p:0
    gfs2: fsid=sda.s: H: s:SH f:EH e:0 p:1397947 [(ended)] gfs2_inode_lookup+0x1fb/0x410 [gfs2]
    VFS: Busy inodes after unmount of sda. Self-destruct in 5 seconds. Have a nice day...

    The next time the file system is mounted, it then reuses the same glocks,
    which ends in a kernel NULL pointer dereference when trying to dump the
    reused glock.

    This patch makes the "undo" function of init_journal jump to fail_statfs
    so the statfs files are properly deconstructed upon failure.

    Fixes: 97fd734ba17e ("gfs2: lookup local statfs inodes prior to journal recovery")
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Gfs2 creates an address space for its rgrps called sd_aspace, but it never
    called truncate_inode_pages_final on it. This greatly confused the VFS,
    which tried to reference the address space after gfs2 had freed the
    superblock that contained it.

    This patch adds a call to truncate_inode_pages_final for sd_aspace, thus
    avoiding the use-after-free.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling
    return_all_reservations, but return_all_reservations still dereferences
    rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the
    call to return_all_reservations.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
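The use-after-free fix is purely an ordering change, which can be sketched in userspace C with assumed, simplified structures (the real rd_bits and reservation tree are more involved):

```c
#include <stdlib.h>

struct rgrpd { int *rd_bits; int reservations; };

/* Still dereferences rd_bits, so it must run before the free. */
void return_all_reservations(struct rgrpd *rgd)
{
    if (rgd->rd_bits)          /* would be a use-after-free if freed first */
        rgd->reservations = 0;
}

void clear_rgrpd(struct rgrpd *rgd)
{
    return_all_reservations(rgd);  /* use rd_bits first ... */
    free(rgd->rd_bits);            /* ... then free it */
    rgd->rd_bits = NULL;
}

/* tiny driver so the ordering can be exercised end to end */
int clear_rgrpd_demo(void)
{
    struct rgrpd rgd = { malloc(4 * sizeof(int)), 7 };
    clear_rgrpd(&rgd);
    return rgd.reservations == 0 && rgd.rd_bits == NULL;
}
```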
     

23 Oct, 2020

2 commits

  • Apply the outstanding statfs changes in the journal head to the
    master statfs file. Zero out the local statfs file for good measure.

    Previously, statfs updates would be read in from the local statfs inode and
    synced to the master statfs inode during recovery.

    We now use the statfs updates in the journal head to update the master statfs
    inode instead of reading in from the local statfs inode. To preserve backward
    compatibility with kernels that can't do this, we still need to keep the
    local statfs inode up to date by writing changes to it. At some point in the
    future, we can do away with the local statfs inodes altogether and keep the
    statfs changes solely in the journal.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     
  • We need to lookup the master statfs inode and the local statfs
    inodes earlier in the mount process (in init_journal) so journal
    recovery can use them when it attempts to recover the statfs info.
    We lookup all the local statfs inodes and store them in a linked
    list to allow a node to recover statfs info for other nodes in the
    cluster.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     

21 Oct, 2020

5 commits

  • And read these in __get_log_header() from the log header.
    Also make gfs2_statfs_change_out() non-static so it can be used
    outside of super.c

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     
  • Once a withdraw has occurred, ignore errors that are the consequence of the
    withdraw.

    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     
  • The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428
    ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores
    the location of resource groups within their address space. This structure is
    in a union with iopen glock specific fields. It was introduced because at
    unmount time, the resource group objects were destroyed before flushing out any
    pending resource group glock work, and flushing out such work could require
    flushing / truncating the address space.

    Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"),
    any pending resource group glock work is flushed out before destroying the
    resource group objects. So the resource group objects will now always exist in
    rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values
    where needed instead of caching them. This also eliminates the union.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Only initialize gl_delete for iopen glocks, but more importantly, only access
    it for iopen glocks in flush_delete_work: flush_delete_work is called for
    different types of glocks including rgrp glocks, and those use gl_vm which is
    in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm,
    which results in general memory corruption.

    Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
    Cc: stable@vger.kernel.org # v5.8+
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • The comments before function glock_hash_walk had the wrong name and
    an extra parameter. This simply fixes the comments.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

15 Oct, 2020

10 commits

  • Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
    when a glock had a holder queued. It was only checked for inode glocks,
    although set and cleared by all glocks, and it was only used to determine
    whether the glock should be held for the minimum hold time before releasing.

    The problem is that the flag is not accurate at all. If a process holds
    the glock, the flag is set. When the process dequeues the glock, the flag
    is only cleared if the state actually changes. So if the state doesn't
    change, the flag may remain set even when nothing is queued.

    This happens often with iopen glocks: they get held in SH, then the file is
    closed, but the glock remains in SH mode.

    We don't need a special flag to indicate this: we can simply tell whether
    the glock has any items queued to the holders queue. It's a waste of cpu
    time to maintain it.

    This patch eliminates the flag in favor of simply checking list_empty
    on the glock holders.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
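The flag-versus-list idea can be sketched in userspace C. The structures are simplified stand-ins for the kernel's holders list, and `glock_is_queued` plays the role of the !list_empty() check:

```c
#include <stdbool.h>
#include <stddef.h>

struct holder { struct holder *next; };
struct glock { struct holder *holders; };

/* Derive "queued" from the holders list itself; unlike a separately
 * maintained flag, this can never go stale. */
bool glock_is_queued(const struct glock *gl)
{
    return gl->holders != NULL;   /* equivalent of !list_empty() */
}

/* demo: enqueue and dequeue a holder without any state transition */
int glock_queued_demo(void)
{
    struct holder h = { NULL };
    struct glock gl = { NULL };
    bool before = glock_is_queued(&gl);
    gl.holders = &h;                /* enqueue */
    bool during = glock_is_queued(&gl);
    gl.holders = NULL;              /* dequeue; no flag left behind */
    bool after = glock_is_queued(&gl);
    return !before && during && !after;
}
```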
     
  • When flushing out its ail1 list, gfs2_write_jdata_page calls function
    __block_write_full_page passing in function gfs2_get_block_noalloc.
    But there was a problem when a process wrote to a jdata file, then
    truncated it or punched a hole, leaving references to the blocks within
    the new hole in its ail list, which are to be written to the journal log.

    In writing them to the journal, after calling gfs2_block_map, function
    gfs2_get_block_noalloc determined that the (hole-punched) block was not
    mapped, so it returned -EIO to generic_writepages, which passed it back
    to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
    there was a real IO error writing to the journal.

    This might be a valid error when writing metadata to the journal, but for
    journaled data writes, it does not warrant a withdraw.

    This patch adds a check to function gfs2_block_map that makes an exception
    for journaled data writes that correspond to jdata holes: If the iomap
    get function returns a block type of IOMAP_HOLE, it instead returns
    -ENODATA which does not cause the withdraw. Other errors are returned as
    before.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Function gfs2_block_map had a lot of redundancy between its create and
    no_create paths. This patch simplifies the code to eliminate the redundancy.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • With jdata writes, we frequently got into situations where gfs2 deadlocked
    because of this calling sequence:

    gfs2_ail1_start
      gfs2_ail1_flush - for every tr on the sd_ail1_list:
        gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
          generic_writepages
            write_cache_pages, passing __writepage()
              __writepage() calls clear_page_dirty_for_io, which calls
              set_page_dirty, which calls jdata_set_page_dirty,
              which sets PageChecked;
              __writepage() then calls
              mapping->a_ops->writepage, AKA gfs2_jdata_writepage

    However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
    ignores the write and redirties the page. The problem is that write_cache_pages
    calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
    in page-writeback.c starting with "Yes, Virginia". If it's jdata,
    set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
    That causes a conflict because it makes it look like the page has been
    redirtied by another writer, in which case we need to skip writing it and
    redirty the page. That ends up in a deadlock because it isn't a "real" writer
    and nothing will ever clear PageChecked.

    If we do have a real writer, it will have started a transaction. So this
    patch checks if a transaction is in use, and if not, it skips setting
    PageChecked. That way, the page will be dirtied, cleaned, and written
    appropriately.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Patch 380f7c65a7eb3288e4b6812acf3474a1de230707 changed gfs2_releasepage
    so that it held the sd_ail_lock spin_lock for most of its processing.
    It did this for some mysterious undocumented bug somewhere in the
    evict code path. But in the nine years since, evict has been reworked
    and fixed many times, and so have the transactions and ail list.
    I can't see a reason to hold the sd_ail_lock unless it's protecting
    the actual ail lists hung off the transactions. Therefore, this patch
    removes the locking to increase speed and efficiency, and to further help
    us rework the log flush code to be more concurrent with transactions.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • This patch is one baby step toward simplifying the journal management.
    It simply changes function gfs2_ail1_empty_one from a void to an int and
    makes it return a count of active items. This allows the caller to check
    the return code rather than list_empty on the tr_ail1_list. This way
    we can, in a later patch, combine transaction ail1 and ail2 lists.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
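The void-to-int change can be sketched in userspace C. The list structure here is a simplified stand-in for tr_ail1_list, and the names are assumptions:

```c
#include <stddef.h>

struct bd { struct bd *next; int active; };

/* Return a count of still-active ail1 items instead of void, so the
 * caller can test the return value rather than calling list_empty()
 * on tr_ail1_list. */
int ail1_empty_one(const struct bd *head)
{
    int active = 0;
    for (const struct bd *b = head; b; b = b->next)
        if (b->active)
            active++;   /* item could not be removed yet */
    return active;
}

/* demo: a two-item list with one item still active */
int ail1_demo(void)
{
    struct bd b2 = { NULL, 1 };
    struct bd b1 = { &b2, 0 };
    return ail1_empty_one(&b1);
}
```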
     
  • Before this patch, when blocks were freed, gfs2 called gfs2_meta_wipe to
    take the metadata out of the pending journal blocks. It did this mostly
    by calling another function, gfs2_remove_from_journal. That was
    shortsighted because it did not do anything with jdata blocks, which
    may also be in the journal.

    This patch expands the function so that it wipes out jdata blocks from
    the journal as well, and removes them from the ail1 list if they haven't
    been written back yet. Since it now processes jdata blocks as well,
    the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.

    New function gfs2_ail1_wipe wants a static view of the ail list, so it
    locks the sd_ail_lock when removing items. To accomplish this, function
    gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
    the caller's responsibility to do so.

    I was going to make sd_ail_lock locking conditional, but the practice is
    generally frowned upon. For details, see: https://lwn.net/Articles/109066/

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • This patch adds some code to enhance the log_blocks trace point: it now
    also reports the number of free log blocks. This makes the trace point
    much more useful, especially for debugging performance problems, since we
    can tell when the journal gets full and needs to wait for flushes, etc.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Function gfs2_write_revokes was incrementing and decrementing the number
    of log blocks free, but there was never a log_blocks trace point for it.
    Thus, the free blocks from a log_blocks trace would jump around
    mysteriously.

    This patch adds the missing trace points so the trace makes more sense.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Since the function is only used for writing jdata pages, this patch
    simply renames function gfs2_write_full_page to a more appropriate
    name: gfs2_write_jdata_page. This makes the code easier to understand.

    The function was only called in one place, which passed in a pointer to
    function gfs2_get_block_noalloc. Since that pointer never varies, it
    doesn't need to be passed in. Therefore, this patch also eliminates the
    unnecessary parameter.

    I also took the liberty of cleaning up the function comments.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson