07 Oct, 2020
3 commits
-
There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail. Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:Caller hands us a transaction t0 with D0-D3 attached. The defer ops
chain will look like the following if the transaction rolls succeed:t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0.The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length. Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward. Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or are on filesystems with small logs and a lot of ongoing
metadata updates.Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length. Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster -
In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation. If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.Unfortunately, the very next thing we do is commit the transaction and
drop the inode. If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up. Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig -
When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items. As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items. Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items. After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent. Finally, refactor the loop
that replays those chains to do so using one transaction per chain.Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
23 Sep, 2020
1 commit
-
Nowadays, log recovery will call ->release on the recovered intent items
if recovery fails. Therefore, it's redundant to release them from
inside the ->recover functions when they're about to return an error.Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner
29 Jul, 2020
1 commit
-
Use kmem_cache_zalloc() directly.
With the exception of xlog_ticket_alloc() which will be dealt on the
next patch for readability.Reviewed-by: Christoph Hellwig
Signed-off-by: Carlos Maiolino
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
08 May, 2020
8 commits
-
The only purpose of XFS_LI_RECOVERED is to prevent log recovery from
trying to replay recovered intents more than once. Therefore, we can
move the bit setting up to the ->iop_recover caller.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
Now that we've made the recovered item tests all the same, we can hoist
the test and the ail locking code to the ->iop_recover caller and call
the recovery function directly.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
Rename XFS_{EFI,BUI,RUI,CUI}_RECOVERED to XFS_LI_RECOVERED so that we
track recovery status in the log item, then get rid of the now unused
flags fields in each of those log item types.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
During recovery, every intent that we recover from the log has to be
added to the AIL. Replace the open-coded addition with a helper.Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Chandan Babu R -
Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished. We add a new
->iop_match method to decide if an intent item matches a supplied ID.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
Move the code that processes the log items created from the recovered
log items into the per-item source code files and use dispatch functions
to call them. No functional changes.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
Move the extent free intent and intent-done pass2 commit code into the
per-item source code files and use dispatch functions to call them. We
do these one at a time because there's a lot of code to move. No
functional changes.Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig -
Create a generic dispatch structure to delegate recovery of different
log item types into various code modules. This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.The first operation we virtualize is the log item sorting.
Signed-off-by: Darrick J. Wong
Reviewed-by: Chandan Babu R
Reviewed-by: Christoph Hellwig
07 May, 2020
1 commit
-
Various intent log items call xfs_trans_ail_remove() with a log I/O
error shutdown type, but this helper historically checks whether an
item is in the AIL before calling xfs_trans_ail_delete(). This means
the shutdown check is essentially a no-op for users of
xfs_trans_ail_remove().It is possible that some items might not be AIL resident when the
AIL remove attempt occurs, but this should be isolated to cases
where the filesystem has already shutdown. For example, this
includes abort of the transaction committing the intent and I/O
error of the iclog buffer committing the intent to the log.
Therefore, update these callsites to use xfs_trans_ail_delete() to
provide AIL state validation for the common path of items being
released and removed when associated done items commit to the
physical log.Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Allison Collins
Signed-off-by: Darrick J. Wong
05 May, 2020
6 commits
-
Given how XFS is all based around btrees it doesn't make much sense
to offer a totally generic state when we can just use the btree cursor.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
All defer op instance place their own extension of the log item into
the dfp_done field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.Also use the opportunity to improve the ->finish_item calling conventions
to place the done log item as the higher level structure before the
list_entry used for the individual items.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
All defer op instance place their own extension of the log item into
the dfp_intent field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
This avoids a per-item indirect call, and also simplifies the interface
a bit.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
These are aways called together, and my merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
19 Nov, 2019
1 commit
-
We can remove it now, without needing to rework the KM_ flags.
Use kmem_cache_free() directly.
Reviewed-by: Darrick J. Wong
Signed-off-by: Carlos Maiolino
Signed-off-by: Darrick J. Wong
11 Nov, 2019
1 commit
-
Convert EIO to EFSCORRUPTED in the logging code when we can determine
that the log contents are invalid.Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
05 Nov, 2019
1 commit
-
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.Signed-off-by: Darrick J. Wong
Reviewed-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
27 Aug, 2019
1 commit
-
Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP,
we can remove KM_NOSLEEP and replace KM_SLEEP with 0.Signed-off-by: Tetsuo Handa
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
29 Jun, 2019
8 commits
-
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.Signed-off-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
Keep all the extree item related code together in one file.
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
There is no good reason to keep these two functions separate.
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
We have various items that are released from ->iop_comitting. Add a
flag to just call ->iop_release from the commit path to avoid tons
of boilerplate code.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
The iop_unlock method is called when comitting or cancelling a
transaction. In the latter case, the transaction may or may not be
aborted. While there is no known problem with the current code in
practice, this implementation is limited in that any log item
implementation that might want to differentiate between a commit and a
cancellation must rely on the aborted state. The aborted bit is only
set when the cancelled transaction is dirty, however. This means that
there is no way to distinguish between a commit and a clean transaction
cancellation.For example, intent log items currently rely on this distinction. The
log item is either transferred to the CIL on commit or released on
transaction cancel. There is currently no possibility for a clean intent
log item in a transaction, but if that state is ever introduced a cancel
of such a transaction will immediately result in memory leaks of the
associated log item(s). This is an interface deficiency and landmine.To clean this up, replace the iop_unlock method with an iop_release
method that is specific to transaction cancel. The existing
iop_committing method occurs at the same time as iop_unlock in the
commit path and there is no need for two separate callbacks here.
Overload the iop_committing method with the current commit time
iop_unlock implementations to eliminate the need for the latter and
further simplify the interface.Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
Just check if they are present first.
Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
The inode geometry structure isn't related to ondisk format; it's
support for the mount structure. Move it to xfs_shared.h.Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig
13 Dec, 2018
1 commit
-
Owner information for static fs metadata can be defined readonly at
build time because it never changes across filesystems. This enables us
to reduce stack usage (particularly in scrub) because we can use the
statically defined oinfo structures.Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
07 Jun, 2018
1 commit
-
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:for f in `git grep -l "GNU General" fs/xfs/` ; do
echo $f
cat $f | awk -f hdr.awk > $f.new
mv -f $f.new $f
doneAnd the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:$ cat hdr.awk
BEGIN {
hdr = 1.0
tag = "GPL-2.0"
str = ""
}/^ \* This program is free software/ {
hdr = 2.0;
next
}/any later version./ {
tag = "GPL-2.0+"
next
}/^ \*\// {
if (hdr > 0.0) {
print "// SPDX-License-Identifier: " tag
print str
print $0
str=""
hdr = 0.0
next
}
print $0
next
}/^ \* / {
if (hdr > 1.0)
next
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
next
}/^ \*/ {
if (hdr > 0.0)
next
print $0
next
}// {
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
}END { }
$Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
10 May, 2018
2 commits
-
Freed extents are unconditionally discarded when online discard is
enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass
discards when unnecessary. For example, this will be useful for
eofblocks trimming.This patch does not change behavior.
Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
The log item flags contain a field that is protected by the AIL
lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to
set and clear these flags, but most of the updates and checks are
not done with the AIL lock held and so are susceptible to update
races.Fix this by changing the log item flags to use atomic bitops rather
than be reliant on the AIL lock for update serialisation.Signed-Off-By: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
03 Apr, 2018
1 commit
-
When an intent is aborted during it's initial commit through
xfs_defer_trans_abort(), there is a use after free. The current
report is for a RUI through this path in generic/388:Freed by task 6274:
__kasan_slab_free+0x136/0x180
kmem_cache_free+0xe7/0x4b0
xfs_trans_free_items+0x198/0x2e0
__xfs_trans_commit+0x27f/0xcc0
xfs_trans_roll+0x17b/0x2a0
xfs_defer_trans_roll+0x6ad/0xe60
xfs_defer_finish+0x2a6/0x2140
xfs_alloc_file_space+0x53a/0xf90
xfs_file_fallocate+0x5c6/0xac0
vfs_fallocate+0x2f5/0x930
ioctl_preallocate+0x1dc/0x320
do_vfs_ioctl+0xfe4/0x1690The problem is that the RUI has two active references - one in the
current transaction, and another held by the defer_ops structure
that is passed to the RUD (intent done) so that both the intent and
the intent done structures are freed on commit of the intent done.Hence during abort, we need to release the intent item, because the
defer_ops reference is released separately via ->abort_intent
callback. Fix all the intent code to do this correctly.Signed-Off-By: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
22 Dec, 2017
1 commit
-
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
26 Apr, 2017
1 commit
-
Use ASSERTs on the log intent item refcounts so that we fail noisily if
anyone tries to double-free the item.Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
03 Aug, 2016
1 commit
-
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner