Eric Lee / smarc-fsl-linux-kernel

07 Oct, 2020

5 commits

4e919af78 xfs: periodically relog deferred intent items

There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail. Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:

Caller hands us a transaction t0 with D0-D3 attached. The defer ops
chain will look like the following if the transaction rolls succeed:

t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0. Until D3 is finished, that
t0 intent keeps the log tail pinned.
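To see why the D3 intent pins the log, here is a small user-space model of the chain above. It is a hypothetical sketch, not the kernel's xfs_defer code. The names struct item, struct chain and run_chain() are illustrative. The model assumes exactly one item finishes per transaction roll and that newly created items are queued at the head of the chain, as in the trace. run_chain() reports the largest distance, in transactions, between the current roll and the oldest intent still pending.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model of a deferred-op chain; these names are illustrative
 * and do not come from the kernel's defer ops code.
 */
#define MAX_ITEMS 32

struct item {
	int id;        /* identifier of the deferred work item */
	int logged_tx; /* transaction in which its intent was logged */
	int nchild;    /* number of new items created when it finishes */
};

struct chain {
	struct item items[MAX_ITEMS];
	int n;
};

/* New work goes to the head so it is finished before older items. */
static void push_front(struct chain *c, struct item it)
{
	assert(c->n < MAX_ITEMS);
	memmove(&c->items[1], &c->items[0], (size_t)c->n * sizeof(it));
	c->items[0] = it;
	c->n++;
}

static struct item pop_front(struct chain *c)
{
	struct item head = c->items[0];

	c->n--;
	memmove(&c->items[0], &c->items[1], (size_t)c->n * sizeof(head));
	return head;
}

/* Oldest transaction still referenced by a pending intent (n must be > 0). */
static int oldest_pinned(const struct chain *c)
{
	int min = c->items[0].logged_tx;

	for (int i = 1; i < c->n; i++)
		if (c->items[i].logged_tx < min)
			min = c->items[i].logged_tx;
	return min;
}

/*
 * Finish one item per transaction roll and return the largest distance
 * between the current transaction and the oldest intent it still pins.
 */
static int run_chain(struct chain *c, int *next_id)
{
	int tx = 0, worst = 0;

	while (c->n > 0) {
		struct item head = pop_front(c);
		int base = *next_id;

		tx++; /* roll to the next transaction */
		*next_id += head.nchild;
		/* Push in reverse so the new items keep their creation order. */
		for (int k = head.nchild - 1; k >= 0; k--) {
			struct item child = { base + k, tx, 0 };
			push_front(c, child);
		}
		if (c->n > 0 && tx - oldest_pinned(c) > worst)
			worst = tx - oldest_pinned(c);
	}
	return worst;
}
```

In this model, D0 spawns two items and D1-D3 spawn none. run_chain() reports a distance of 5 for that chain, and every additional item queued ahead of D3 raises it further.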

The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length. Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward. Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or that run on filesystems with small logs and a lot of ongoing
metadata updates.

Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length. Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.

The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.
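The relog decision can be sketched as a comparison of checkpoint sequence numbers. This is an assumption-laden model rather than the XFS code. The names struct intent, logged_seq, intent_needs_relog() and relog_chain() are hypothetical. "Relogging" is reduced to moving the intent's sequence forward, which stands in for logging a done item and a new intent item.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending intent; not an XFS structure. */
struct intent {
	uint64_t logged_seq; /* checkpoint sequence it was last logged in */
};

/* Relogging into the checkpoint that already holds the intent gains nothing. */
static bool intent_needs_relog(const struct intent *it, uint64_t current_seq)
{
	return it->logged_seq < current_seq;
}

/*
 * Relog every intent that lives in an older checkpoint. Logging a done
 * item plus a fresh intent is modelled here by bumping logged_seq.
 * Returns how many intents were relogged.
 */
static int relog_chain(struct intent *items, int n, uint64_t current_seq)
{
	int relogged = 0;

	for (int i = 0; i < n; i++) {
		if (!intent_needs_relog(&items[i], current_seq))
			continue;
		items[i].logged_seq = current_seq;
		relogged++;
	}
	return relogged;
}
```

If relog_chain() is called twice against the same checkpoint, the second call relogs nothing, because every intent already belongs to that checkpoint.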

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster

Darrick J. Wong
2020-10-07 23:40:28 +0800
ff4ab5e02 xfs: fix an incore inode UAF in xfs_bui_recover

In xfs_bui_item_recover, there exists a use-after-free bug with regard
to the inode that is involved in the bmap replay operation. If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode. If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up. Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.
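The following is a minimal model of why joining the inode to the freezer fixes the use-after-free, assuming a plain reference count. The names struct inode_model, struct defer_freezer, freezer_capture_inode() and freezer_release() are hypothetical. The sketch shows only one thing: the freezer's extra reference keeps the inode alive after recovery commits its transaction and drops its own reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical incore inode with a reference count; not struct xfs_inode. */
struct inode_model {
	int refcount;
	bool reclaimed; /* set once the last reference is dropped */
};

static void inode_grab(struct inode_model *ip)
{
	ip->refcount++;
}

static void inode_rele(struct inode_model *ip)
{
	if (--ip->refcount == 0)
		ip->reclaimed = true; /* reclaim may now tear the inode down */
}

/* Hypothetical freezer holding captured defer ops plus the inode they use. */
struct defer_freezer {
	struct inode_model *ip;
};

/* Take an extra reference so the inode outlives the recovery transaction. */
static void freezer_capture_inode(struct defer_freezer *f,
				  struct inode_model *ip)
{
	inode_grab(ip);
	f->ip = ip;
}

/* Called only after the captured defer ops have been finished. */
static void freezer_release(struct defer_freezer *f)
{
	if (f->ip) {
		inode_rele(f->ip);
		f->ip = NULL;
	}
}
```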

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-10-07 23:40:28 +0800
64a3f3315 xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering

In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent. This also makes the error bailout code a bit less weird.
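The acquisition order can be illustrated with a trace of stub steps. Everything here is hypothetical: recover_one(), step() and the letter codes. The point is that grabbing the inode first leaves a single reference to drop on every exit path, so one bailout label suffices. With this ordering, a failed transaction allocation only has to release the inode.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical trace codes: I = grab inode, T = allocate transaction,
 * L = lock inode, C = commit, U = unlock inode, r = release inode.
 */
static char trace[16];

static void step(char c)
{
	size_t n = strlen(trace);

	trace[n] = c;
	trace[n + 1] = '\0';
}

static int recover_one(int fail_trans_alloc)
{
	int error = 0;

	trace[0] = '\0';
	step('I');                 /* 1. grab the inode */
	if (fail_trans_alloc) {
		error = -ENOMEM;   /* 2. transaction allocation failed */
		goto out_rele;
	}
	step('T');                 /* 2. allocate a transaction */
	step('L');                 /* 3. lock the inode */
	/* ... perform the recovered bmap update here ... */
	step('C');                 /* commit the transaction */
	step('U');                 /* unlock the inode */
out_rele:
	step('r');                 /* single bailout: drop the inode reference */
	return error;
}
```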

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Brian Foster

Darrick J. Wong
2020-10-07 23:40:28 +0800
919522e89 xfs: clean up bmap intent item recovery checking

The bmap intent item checking code in xfs_bui_item_recover is spread all
over the function. We should check the recovered log item at the top
before we allocate any resources or do anything else, so do that.
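This sketch of up-front validation assumes a decoded bmap intent with an operation type, a start block, a length and a file offset. The names struct bmap_intent_rec, the BMAP_OP_* values and bmap_intent_valid() are hypothetical. The checks are illustrative, not the exact set xfs_bui_item_recover performs. The function touches no inode or transaction, so a bad record is rejected before anything needs to be unwound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded bmap intent; field names are illustrative. */
enum bmap_op { BMAP_OP_MAP = 1, BMAP_OP_UNMAP = 2 };

struct bmap_intent_rec {
	unsigned int op;
	uint64_t startblock; /* filesystem block where the extent starts */
	uint64_t blockcount; /* length of the extent in blocks */
	uint64_t startoff;   /* file offset of the extent, in blocks */
};

/* Reject a bad record before any inode or transaction is acquired. */
static bool bmap_intent_valid(const struct bmap_intent_rec *r,
			      uint64_t fs_blocks)
{
	if (r->op != BMAP_OP_MAP && r->op != BMAP_OP_UNMAP)
		return false;
	if (r->blockcount == 0 || r->startblock >= fs_blocks)
		return false;
	/* Written this way so that startblock + blockcount cannot overflow. */
	if (r->blockcount > fs_blocks - r->startblock)
		return false;
	/* The file range must not wrap around either. */
	if (r->startoff + r->blockcount < r->startoff)
		return false;
	return true;
}
```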

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-10-07 23:40:28 +0800
e6fff81e4 xfs: proper replay of deferred ops queued during log recovery

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items. As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items. Therefore, recovery
must replay the items (both recovered and created) in the same order in
which they would have been processed during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items. After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations causes subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent. Finally, refactor the loop
that replays those chains to do so using one transaction per chain.
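The capture-and-replay scheme can be modelled with fixed-size arrays. This is a hypothetical sketch, not the kernel's capture structure. The names struct capture, struct capture_list and replay_captures() are illustrative, and a "transaction" is just a counter. The model shows two properties described above. First, each recovered intent gets its own chain. Second, chains are replayed in recovery order with one transaction apiece.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capture structures; names are illustrative only. */
#define MAX_OPS      8
#define MAX_CAPTURES 8

struct capture {
	int ops[MAX_OPS]; /* deferred ops created while recovering one intent */
	int nops;
};

struct capture_list {
	struct capture caps[MAX_CAPTURES];
	int n;
};

/* Start a new capture for the intent currently being recovered. */
static struct capture *capture_new(struct capture_list *l)
{
	struct capture *c;

	if (l->n == MAX_CAPTURES)
		return NULL;
	c = &l->caps[l->n++];
	c->nops = 0;
	return c;
}

static bool capture_add(struct capture *c, int op)
{
	if (c->nops == MAX_OPS)
		return false;
	c->ops[c->nops++] = op;
	return true;
}

/*
 * Replay the captured chains in recovery order, one transaction per chain.
 * Finished op ids are appended to order[]; returns the transaction count.
 */
static int replay_captures(const struct capture_list *l, int *order,
			   int *norder)
{
	int ntrans = 0;

	*norder = 0;
	for (int i = 0; i < l->n; i++) {
		ntrans++; /* allocate a fresh transaction for this chain */
		for (int j = 0; j < l->caps[i].nops; j++)
			order[(*norder)++] = l->caps[i].ops[j];
		/* committing that transaction finishes the chain */
	}
	return ntrans;
}
```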

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-10-07 23:40:28 +0800

23 Sep, 2020

3 commits

384ff09ba xfs: don't release log intent items when recovery fails

Nowadays, log recovery will call ->release on the recovered intent items
if recovery fails. Therefore, it's redundant to release them from
inside the ->recover functions when they're about to return an error.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Darrick J. Wong
2020-09-23 23:58:52 +0800
2dbf872c0 xfs: attach inode to dquot in xfs_bui_item_recover

In the bmap intent item recovery code, we must be careful to attach the
inode to its dquots (if quotas are enabled) so that a change in the
shape of the bmap btree doesn't cause the quota counters to be
incorrect.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Darrick J. Wong
2020-09-23 23:58:52 +0800
93293bcbd xfs: log new intent items created as part of finishing recovered intent items

During a code inspection, I found a serious bug in the log intent item
recovery code when an intent item cannot complete all the work and
decides to requeue itself to get that done. When this happens, the
item recovery creates a new incore deferred op representing the
remaining work and attaches it to the transaction that it allocated. At
the end of _item_recover, it moves the entire chain of deferred ops to
the dummy parent_tp that xlog_recover_process_intents passed to it, but
fails to log a new intent item for the remaining work before committing
the transaction for the single unit of work.

xlog_finish_defer_ops logs those new intent items once recovery has
finished dealing with the intent items that it recovered, but this isn't
sufficient. If the log is forced to disk after a recovered log item
decides to requeue itself and the system goes down before we call
xlog_finish_defer_ops, the second log recovery will never see the new
intent item and therefore has no idea that there was more work to do.
It will finish recovery, leaving the filesystem in a corrupted state.

The same logic applies to /any/ deferred ops added during intent item
recovery, not just the one handling the remaining work.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Darrick J. Wong
2020-09-23 23:58:51 +0800

29 Jul, 2020

1 commit

32a2b11f4 xfs: Remove kmem_zone_zalloc() usage

Use kmem_cache_zalloc() directly.

The exception is xlog_ticket_alloc(), which will be dealt with in the
next patch for readability.

Reviewed-by: Christoph Hellwig
Signed-off-by: Carlos Maiolino
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner

Carlos Maiolino
2020-07-29 11:24:14 +0800

08 May, 2020

8 commits

cc560a5a9 xfs: hoist setting of XFS_LI_RECOVERED to caller

The only purpose of XFS_LI_RECOVERED is to prevent log recovery from
trying to replay recovered intents more than once. Therefore, we can
move the bit setting up to the ->iop_recover caller.
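When the caller handles the bit, the per-type recover routines can ignore it. The following is a minimal sketch, assuming a plain flags word. LI_RECOVERED, struct log_item, item_recover() and process_intent() are hypothetical stand-ins for the XFS names.

```c
/* Hypothetical flag mask standing in for XFS_LI_RECOVERED. */
#define LI_RECOVERED (1u << 0)

struct log_item {
	unsigned int flags;
	int recover_calls; /* how many times the per-type recovery ran */
};

/* Per-type recovery routine: no longer touches the flag itself. */
static int item_recover(struct log_item *lip)
{
	lip->recover_calls++;
	return 0;
}

/* The caller tests and sets the flag once, for every intent type. */
static int process_intent(struct log_item *lip)
{
	if (lip->flags & LI_RECOVERED)
		return 0; /* already replayed; skip it */
	lip->flags |= LI_RECOVERED;
	return item_recover(lip);
}
```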

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:50:01 +0800
96b60f826 xfs: refactor intent item iop_recover calls

Now that we've made the recovered item tests all the same, we can hoist
the test and the AIL locking code to the ->iop_recover caller and call
the recovery function directly.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:50:01 +0800
889eb55dd xfs: refactor intent item RECOVERED flag into the log item

Rename XFS_{EFI,BUI,RUI,CUI}_RECOVERED to XFS_LI_RECOVERED so that we
track recovery status in the log item, then get rid of the now unused
flags fields in each of those log item types.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:50:01 +0800
86a371741 xfs: refactor adding recovered intent items to the log

During recovery, every intent that we recover from the log has to be
added to the AIL. Replace the open-coded addition with a helper.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Chandan Babu R

Darrick J. Wong
2020-05-08 23:50:00 +0800
154c733a3 xfs: refactor releasing finished intents during log recovery

Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished. We add a new
->iop_match method to decide if an intent item matches a supplied ID.
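This sketch shows a match-based release helper. The names struct item_ops, struct log_item, intent_match() and release_finished_intent() are hypothetical. The AIL is modelled as an array of pointers rather than the kernel's list. Items without an ->iop_match method are simply skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct log_item;

/* Hypothetical ops table with only the two methods this example needs. */
struct item_ops {
	bool (*iop_match)(struct log_item *lip, uint64_t intent_id);
	void (*iop_release)(struct log_item *lip);
};

struct log_item {
	const struct item_ops *ops;
	uint64_t id;   /* intent id recorded in the log */
	bool released;
};

static bool intent_match(struct log_item *lip, uint64_t intent_id)
{
	return lip->id == intent_id;
}

static void intent_release(struct log_item *lip)
{
	lip->released = true;
}

/* Walk the (array-modelled) AIL and release the intent matching the id. */
static bool release_finished_intent(struct log_item **ail, size_t n,
				    uint64_t intent_id)
{
	for (size_t i = 0; i < n; i++) {
		struct log_item *lip = ail[i];

		/* Items that are not intents do not implement ->iop_match. */
		if (!lip->ops->iop_match || !lip->ops->iop_match(lip, intent_id))
			continue;
		lip->ops->iop_release(lip);
		return true;
	}
	return false;
}
```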

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:50:00 +0800
9329ba89c xfs: refactor recovered BUI log item playback

Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:50:00 +0800
3c6ba3cf9 xfs: refactor log recovery BUI item dispatch for pass2 commit functions

Move the bmap update intent and intent-done pass2 commit code into the
per-item source code files and use dispatch functions to call them. We
do these one at a time because there's a lot of code to move. No
functional changes.

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:49:59 +0800
86ffa471d xfs: refactor log recovery item sorting into a generic dispatch structure

Create a generic dispatch structure to delegate recovery of different
log item types into various code modules. This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.

The first operation we virtualize is the log item sorting.
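A table-driven dispatch of the kind described can be sketched as follows. The item types, the sort buckets, struct recover_ops and find_recover_ops() are all hypothetical. Only the shape reflects the commit message: one ops entry per log item type, looked up by type, with the sort step as the first virtualized operation.

```c
#include <stddef.h>

/* Hypothetical item types and sort buckets; not the XFS on-disk values. */
enum item_type { ITEM_BUF = 1, ITEM_INODE, ITEM_BUI, ITEM_BUD };
enum sort_list { SORT_BUFFER_LIST, SORT_ITEM_LIST, SORT_INODE_LIST };

/* One dispatch entry per log item type, owned by that item's source file. */
struct recover_ops {
	enum item_type type;
	enum sort_list (*reorder)(void);
};

static enum sort_list buf_reorder(void)   { return SORT_BUFFER_LIST; }
static enum sort_list inode_reorder(void) { return SORT_INODE_LIST; }
static enum sort_list item_reorder(void)  { return SORT_ITEM_LIST; }

static const struct recover_ops recover_table[] = {
	{ ITEM_BUF,   buf_reorder },
	{ ITEM_INODE, inode_reorder },
	{ ITEM_BUI,   item_reorder },
	{ ITEM_BUD,   item_reorder },
};

/* Look up the handler for a type instead of switching on it in one file. */
static const struct recover_ops *find_recover_ops(enum item_type type)
{
	for (size_t i = 0; i < sizeof(recover_table) / sizeof(recover_table[0]); i++)
		if (recover_table[i].type == type)
			return &recover_table[i];
	return NULL;
}
```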

Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2020-05-08 23:49:58 +0800

07 May, 2020

1 commit

655879290 xfs: use delete helper for items expected to be in AIL

Various intent log items call xfs_trans_ail_remove() with a log I/O
error shutdown type, but this helper historically checks whether an
item is in the AIL before calling xfs_trans_ail_delete(). This means
the shutdown check is essentially a no-op for users of
xfs_trans_ail_remove().

It is possible that some items might not be AIL resident when the
AIL remove attempt occurs, but this should be isolated to cases
where the filesystem has already shut down. For example, this
includes abort of the transaction committing the intent and I/O
error of the iclog buffer committing the intent to the log.
Therefore, update these callsites to use xfs_trans_ail_delete() to
provide AIL state validation for the common path of items being
released and removed when associated done items commit to the
physical log.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Allison Collins
Signed-off-by: Darrick J. Wong

Brian Foster
2020-05-07 23:27:47 +0800

05 May, 2020

5 commits

3ec1b26c0 xfs: use a xfs_btree_cur for the ->finish_cleanup state

Given how XFS is all based around btrees it doesn't make much sense
to offer a totally generic state when we can just use the btree cursor.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-05-05 00:03:17 +0800
f09d167c2 xfs: turn dfp_done into a xfs_log_item

All defer op instances place their own extension of the log item into
the dfp_done field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Also use the opportunity to improve the ->finish_item calling conventions
to place the done log item as the higher level structure before the
list_entry used for the individual items.
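The type-safety gain comes from the usual embedding pattern. The generic log item sits inside each type-specific item, and code that needs the specific type converts back with container_of. This is a simplified sketch. The names struct log_item, struct bud_log_item, struct defer_pending and bud_from_item() are pared-down, hypothetical versions, and container_of is reduced to its offsetof core.

```c
#include <stddef.h>

/* Simplified container_of, in the spirit of the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic log item shared by all item types (heavily simplified). */
struct log_item {
	int li_type;
};

/* A type-specific done item embeds the generic one. */
struct bud_log_item {
	struct log_item bud_item;
	unsigned int bud_format; /* placeholder for the type's own fields */
};

/* The deferred work record stores only the generic pointer. */
struct defer_pending {
	struct log_item *dfp_done;
};

/* Each op type recovers its own structure from the generic pointer. */
static struct bud_log_item *bud_from_item(struct log_item *lip)
{
	return container_of(lip, struct bud_log_item, bud_item);
}
```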

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-05-05 00:03:17 +0800
13a833333 xfs: turn dfp_intent into a xfs_log_item

All defer op instances place their own extension of the log item into
the dfp_intent field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-05-05 00:03:16 +0800
d367a868e xfs: merge the ->diff_items defer op into ->create_intent

This avoids a per-item indirect call, and also simplifies the interface
a bit.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-05-05 00:03:16 +0800
c1f09188e xfs: merge the ->log_item defer op into ->create_intent

These are always called together, and by merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2020-05-05 00:03:16 +0800

19 Nov, 2019

1 commit

377bcd5f3 xfs: Remove kmem_zone_free() wrapper

We can remove it now, without needing to rework the KM_ flags.

Use kmem_cache_free() directly.

Reviewed-by: Darrick J. Wong
Signed-off-by: Carlos Maiolino
Signed-off-by: Darrick J. Wong

Carlos Maiolino
2019-11-19 00:40:44 +0800

11 Nov, 2019

1 commit

895e196fb xfs: convert EIO to EFSCORRUPTED when log contents are invalid

Convert EIO to EFSCORRUPTED in the logging code when we can determine
that the log contents are invalid.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2019-11-11 08:54:18 +0800

05 Nov, 2019

1 commit

a5155b870 xfs: always log corruption errors

Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong
Reviewed-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2019-11-05 05:55:54 +0800

28 Aug, 2019

1 commit

3e08f42ae xfs: remove unnecessary int returns from deferred bmap functions

Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner

Darrick J. Wong
2019-08-28 23:31:02 +0800

27 Aug, 2019

1 commit

707e0ddaf fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.

Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP,
we can remove KM_NOSLEEP and replace KM_SLEEP with 0.

Signed-off-by: Tetsuo Handa
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Tetsuo Handa
2019-08-27 03:06:22 +0800

29 Jun, 2019

8 commits

250d4b4c4 xfs: remove unused header files

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Eric Sandeen
2019-06-29 10:30:43 +0800
caeaea985 xfs: merge xfs_trans_bmap.c into xfs_bmap_item.c

Keep all bmap item related code together.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:29:42 +0800
73f0d2363 xfs: merge xfs_bud_init into xfs_trans_get_bud

There is no good reason to keep these two functions separate.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:27:36 +0800
95cf0e4a0 xfs: remove a pointless comment duplicated above all xfs_item_ops instances

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:27:34 +0800
9ce632a28 xfs: add a flag to release log items on commit

We have various items that are released from ->iop_committing. Add a
flag to just call ->iop_release from the commit path to avoid tons
of boilerplate code.
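This sketch shows the flag check on the commit path. The names ITEM_RELEASE_WHEN_COMMITTED, struct item_ops and item_committing() are hypothetical. Items with the flag get ->iop_release from common code, while other items keep their own ->iop_committing.

```c
#include <stddef.h>

/* Hypothetical ops flag; the kernel's actual flag name may differ. */
#define ITEM_RELEASE_WHEN_COMMITTED (1u << 0)

struct log_item;

struct item_ops {
	unsigned int flags;
	void (*iop_committing)(struct log_item *lip);
	void (*iop_release)(struct log_item *lip);
};

struct log_item {
	const struct item_ops *ops;
	int committing_calls;
	int release_calls;
};

/* Commit path: one flag test replaces per-item boilerplate. */
static void item_committing(struct log_item *lip)
{
	if (lip->ops->flags & ITEM_RELEASE_WHEN_COMMITTED) {
		lip->ops->iop_release(lip);
		return;
	}
	if (lip->ops->iop_committing)
		lip->ops->iop_committing(lip);
}

/* Counting callbacks used to observe which method ran. */
static void count_committing(struct log_item *lip) { lip->committing_calls++; }
static void count_release(struct log_item *lip)    { lip->release_calls++; }
```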

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:27:32 +0800
ddf92053e xfs: split iop_unlock

The iop_unlock method is called when committing or cancelling a
transaction. In the latter case, the transaction may or may not be
aborted. While there is no known problem with the current code in
practice, this implementation is limited in that any log item
implementation that might want to differentiate between a commit and a
cancellation must rely on the aborted state. The aborted bit is only
set when the cancelled transaction is dirty, however. This means that
there is no way to distinguish between a commit and a clean transaction
cancellation.

For example, intent log items currently rely on this distinction. The
log item is either transferred to the CIL on commit or released on
transaction cancel. There is currently no possibility for a clean intent
log item in a transaction, but if that state is ever introduced a cancel
of such a transaction will immediately result in memory leaks of the
associated log item(s). This is an interface deficiency and landmine.

To clean this up, replace the iop_unlock method with an iop_release
method that is specific to transaction cancel. The existing
iop_committing method occurs at the same time as iop_unlock in the
commit path and there is no need for two separate callbacks here.
Overload the iop_committing method with the current commit time
iop_unlock implementations to eliminate the need for the latter and
further simplify the interface.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:27:32 +0800
e8b78db77 xfs: don't require log items to implement optional methods

Just check if they are present first.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Christoph Hellwig
2019-06-29 10:27:30 +0800
5467b34bd xfs: move xfs_ino_geometry to xfs_shared.h

The inode geometry structure isn't related to ondisk format; it's
support for the mount structure. Move it to xfs_shared.h.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2019-06-29 10:25:35 +0800

03 Aug, 2018

4 commits

0f37d1780 xfs: pass transaction to xfs_defer_add()

The majority of remaining references to struct xfs_defer_ops in XFS
are associated with xfs_defer_add(). At this point, there are no
more external xfs_defer_ops users left. All instances of
xfs_defer_ops are embedded in the transaction, which means we can
safely pass the transaction down to the dfops add interface.

Update xfs_defer_add() to receive the transaction as a parameter.
Various subsystems implement wrappers to allocate and construct the
context specific data structures for the associated deferred
operation type. Update these to also carry the transaction down as
needed and clean up unused dfops parameters along the way.

This removes most of the remaining references to struct
xfs_defer_ops throughout the code and facilitates removal of the
structure.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
[darrick: fix unused variable warnings with ftrace disabled]
Signed-off-by: Darrick J. Wong

Brian Foster
2018-08-03 14:05:14 +0800
7dbddbacc xfs: drop dop param from xfs_defer_op_type ->finish_item() callback

The dfops infrastructure ->finish_item() callback passes the
transaction and dfops as separate parameters. Since dfops is always
part of a transaction, the latter parameter is no longer necessary.
Remove it from the various callbacks.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong

Brian Foster
2018-08-03 14:05:14 +0800
ce356d647 xfs: pass transaction to dfops reset/move helpers

All callers pass ->t_dfops of the associated transactions. Refactor
the helpers to receive the transactions and facilitate further
cleanups between xfs_defer_ops and xfs_trans.

Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong

Brian Foster
2018-08-03 14:05:13 +0800
fbfa977d2 xfs: use transaction for intent recovery instead of raw dfops

Log intent recovery is the last user of an external (on-stack)
dfops. The pattern exists because the dfops is used to collect
additional deferred operations queued during the whole recovery
sequence. The dfops is finished with a new transaction after intent
recovery completes.

We already have a mechanism to create an empty, container-like
transaction to support the scrub infrastructure. We can reuse that
mechanism here to drop the final user of external dfops. This
facilitates folding dfops state (i.e., dop_low) into the
transaction, the elimination of now unused external dfops support
and also eliminates the only caller of __xfs_defer_cancel().

Replace the on-stack dfops with an empty transaction and pass it
around to the various helpers that queue and finish deferred
operations during intent recovery.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Brian Foster
2018-08-03 14:05:13 +0800