07 Oct, 2020

3 commits

  • There's a subtle design flaw in the deferred log item code that can lead
    to pinning the log tail. Taking up the defer ops chain examples from
    the previous commit, we can get trapped in sequences like this:

    Caller hands us a transaction t0 with D0-D3 attached. The defer ops
    chain will look like the following if the transaction rolls succeed:

    t1: D0(t0), D1(t0), D2(t0), D3(t0)
    t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
    t3: d5(t1), D1(t0), D2(t0), D3(t0)
    ...
    t9: d9(t7), D3(t0)
    t10: D3(t0)
    t11: d10(t10), d11(t10)
    t12: d11(t10)

    In transaction 9, we finish d9 and try to roll to t10 while holding onto
    an intent item for D3 that we logged in t0.

    The previous commit changed the order in which we place new defer ops in
    the defer ops processing chain to reduce the maximum chain length. Now
    make xfs_defer_finish_noroll capable of relogging the entire chain
    periodically so that we can always move the log tail forward. Most
    chains will never get relogged, except for operations that generate very
    long chains (large extents containing many blocks with different sharing
    levels) or are on filesystems with small logs and a lot of ongoing
    metadata updates.

    Callers are now required to ensure that the transaction reservation is
    large enough to handle logging done items and new intent items for the
    maximum possible chain length. Most callers are careful to keep the
    chain lengths low, so the overhead should be minimal.

    The decision to relog an intent item is made based on whether the intent
    was logged in a previous checkpoint, since there's no point in relogging
    an intent into the same checkpoint.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
     
  • In xfs_bui_item_recover, there exists a use-after-free bug with regard
    to the inode that is involved in the bmap replay operation. If the
    mapping operation does not complete, we call xfs_bmap_unmap_extent to
    create a deferred op to finish the unmapping work, and we retain a
    pointer to the incore inode.

    Unfortunately, the very next thing we do is commit the transaction and
    drop the inode. If reclaim tears down the inode before we try to finish
    the defer ops, we dereference garbage and blow up. Therefore, create a
    way to join inodes to the defer ops freezer so that we can maintain the
    xfs_inode reference until we're done with the inode.

    Note: This imposes the requirement that there be enough memory to keep
    every incore inode in memory throughout recovery.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • When we replay unfinished intent items that have been recovered from the
    log, it's possible that the replay will cause the creation of more
    deferred work items. As outlined in commit 509955823cc9c ("xfs: log
    recovery should replay deferred ops in order"), later work items have an
    implicit ordering dependency on earlier work items. Therefore, recovery
    must replay the items (both recovered and created) in the same order
    that they would have been during normal operation.

    For log recovery, we enforce this ordering by using an empty transaction
    to collect deferred ops that get created in the process of recovering a
    log intent item to prevent them from being committed before the rest of
    the recovered intent items. After we finish committing all the
    recovered log items, we allocate a transaction with an enormous block
    reservation, splice our huge list of created deferred ops into that
    transaction, and commit it, thereby finishing all those ops.

    This is /really/ hokey -- it's the one place in XFS where we allow
    nested transactions; the splicing of the defer ops list is inelegant
    and has to be done twice per recovery function; and the broken way we
    handle inode pointers and block reservations causes subtle use-after-free
    and allocator problems that will be fixed by this patch and the two
    patches after it.

    Therefore, replace the hokey empty transaction with a structure designed
    to capture each chain of deferred ops that are created as part of
    recovering a single unfinished log intent. Finally, refactor the loop
    that replays those chains to do so using one transaction per chain.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
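
    A minimal userspace sketch of the three changes above, under a much
    simplified model: each intent carries the checkpoint sequence it was
    logged in, a capture holds the chain left over from one recovered
    intent plus an optional inode reference, and recovery replays the
    captures in order, one transaction per chain. All names
    (struct defer_capture, capture_join_inode(), relog_stale_intents(),
    replay_captures()) are hypothetical and are not the kernel's
    xfs_defer interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHAIN 8

/* Hypothetical incore inode: reclaimable once its refcount reaches zero. */
struct model_inode {
	int refcount;
	bool reclaimed;
};

/* Hypothetical intent item, tagged with the checkpoint it was logged in. */
struct model_intent {
	int id;
	uint64_t logged_seq;
};

/* Hypothetical capture of the chain left over from one recovered intent. */
struct defer_capture {
	struct model_intent chain[MAX_CHAIN];
	size_t nchain;
	struct model_inode *ip;		/* inode joined to this capture, or NULL */
	struct defer_capture *next;	/* captures are kept in recovery order */
};

static void inode_rele(struct model_inode *ip)
{
	if (--ip->refcount == 0)
		ip->reclaimed = true;	/* reclaim may now tear the inode down */
}

/* Take an extra reference so the inode outlives the commit after capture. */
static void capture_join_inode(struct defer_capture *dfc,
			       struct model_inode *ip)
{
	ip->refcount++;
	dfc->ip = ip;
}

/*
 * Relog only intents logged in an earlier checkpoint; relogging into the
 * current checkpoint cannot move the log tail.  Returns how many were relogged.
 */
static size_t relog_stale_intents(struct model_intent *chain, size_t n,
				  uint64_t cur_seq)
{
	size_t relogged = 0;

	for (size_t i = 0; i < n; i++) {
		if (chain[i].logged_seq < cur_seq) {
			/* stands in for logging a done item plus a new intent */
			chain[i].logged_seq = cur_seq;
			relogged++;
		}
	}
	return relogged;
}

/*
 * Replay captures in recovery order, one transaction per chain, and drop
 * the inode reference only after that chain is finished.  Finished intent
 * ids are appended to out[]; returns the number of transactions used.
 */
static size_t replay_captures(struct defer_capture *head, int *out,
			      size_t max_out, size_t *nout)
{
	size_t ntrans = 0;

	*nout = 0;
	for (struct defer_capture *dfc = head; dfc; dfc = dfc->next) {
		/* each pass models allocating, finishing and committing one transaction */
		for (size_t i = 0; i < dfc->nchain && *nout < max_out; i++)
			out[(*nout)++] = dfc->chain[i].id;
		dfc->nchain = 0;
		if (dfc->ip) {
			inode_rele(dfc->ip);
			dfc->ip = NULL;
		}
		ntrans++;
	}
	return ntrans;
}
```

    Calling relog_stale_intents() twice with the same sequence relogs
    nothing the second time, which mirrors the "no point in relogging into
    the same checkpoint" rule; the inode only becomes reclaimable once the
    capture holding it has been replayed.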
     

23 Sep, 2020

1 commit


29 Jul, 2020

1 commit

  • Use kmem_cache_zalloc() directly.

    With the exception of xlog_ticket_alloc(), which will be dealt with in
    the next patch for readability.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner

    Carlos Maiolino
     

08 May, 2020

8 commits


07 May, 2020

1 commit

  • Various intent log items call xfs_trans_ail_remove() with a log I/O
    error shutdown type, but this helper historically checks whether an
    item is in the AIL before calling xfs_trans_ail_delete(). This means
    the shutdown check is essentially a no-op for users of
    xfs_trans_ail_remove().

    It is possible that some items might not be AIL resident when the
    AIL remove attempt occurs, but this should be isolated to cases
    where the filesystem has already shut down. For example, this
    includes abort of the transaction committing the intent and I/O
    error of the iclog buffer committing the intent to the log.
    Therefore, update these callsites to use xfs_trans_ail_delete() to
    provide AIL state validation for the common path of items being
    released and removed when associated done items commit to the
    physical log.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Allison Collins
    Signed-off-by: Darrick J. Wong

    Brian Foster
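
    A simplified model of why the shutdown check was a no-op for
    xfs_trans_ail_remove() users: the wrapper tests AIL membership before
    deleting, so the validation inside the delete path can never fire. The
    functions ail_remove() and ail_delete() and the shutdown counter below
    are hypothetical stand-ins, not the real XFS helpers or their
    signatures.

```c
#include <stdbool.h>

/* Hypothetical log item that only tracks AIL membership. */
struct log_item {
	bool in_ail;
};

static int shutdowns;	/* counts forced shutdowns in this model */

/* Delete an item expected to be in the AIL; flag it if it is not. */
static void ail_delete(struct log_item *lip)
{
	if (!lip->in_ail) {
		shutdowns++;	/* AIL state validation catches the bad state */
		return;
	}
	lip->in_ail = false;
}

/*
 * Older-style wrapper: checks membership first, so the validation in
 * ail_delete() is unreachable for its callers.
 */
static void ail_remove(struct log_item *lip)
{
	if (lip->in_ail)
		ail_delete(lip);
}
```

    Switching the common release path to the delete variant means an
    intent that is unexpectedly absent from the AIL is reported instead of
    silently ignored.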
     

05 May, 2020

6 commits


19 Nov, 2019

1 commit


11 Nov, 2019

1 commit


05 Nov, 2019

1 commit


27 Aug, 2019

1 commit


29 Jun, 2019

8 commits

  • There are many, many xfs header files which are included but
    unneeded (or included twice) in the xfs code, so remove them.

    nb: xfs_linux.h includes about 9 headers for everyone, so those
    explicit includes get removed by this. I'm not sure what the
    preference is, but if we wanted explicit includes everywhere,
    a followup patch could remove those xfs_*.h includes from
    xfs_linux.h and move them into the files that need them.
    Or it could be left as-is.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • Keep all the extfree item related code together in one file.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • There is no good reason to keep these two functions separate.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • We have various items that are released from ->iop_committing. Add a
    flag to just call ->iop_release from the commit path to avoid tons
    of boilerplate code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The iop_unlock method is called when committing or cancelling a
    transaction. In the latter case, the transaction may or may not be
    aborted. While there is no known problem with the current code in
    practice, this implementation is limited in that any log item
    implementation that might want to differentiate between a commit and a
    cancellation must rely on the aborted state. The aborted bit is only
    set when the cancelled transaction is dirty, however. This means that
    there is no way to distinguish between a commit and a clean transaction
    cancellation.

    For example, intent log items currently rely on this distinction. The
    log item is either transferred to the CIL on commit or released on
    transaction cancel. There is currently no possibility for a clean intent
    log item in a transaction, but if that state is ever introduced a cancel
    of such a transaction will immediately result in memory leaks of the
    associated log item(s). This is an interface deficiency and landmine.

    To clean this up, replace the iop_unlock method with an iop_release
    method that is specific to transaction cancel. The existing
    iop_committing method occurs at the same time as iop_unlock in the
    commit path and there is no need for two separate callbacks here.
    Overload the iop_committing method with the current commit time
    iop_unlock implementations to eliminate the need for the latter and
    further simplify the interface.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Just check if they are present first.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The inode geometry structure isn't related to ondisk format; it's
    support for the mount structure. Move it to xfs_shared.h.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
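
    A sketch of the log item interface after the iop_release and
    release-when-committed changes above, assuming a toy ops table: the
    commit path calls ->iop_committing unless the item sets a flag asking
    to be released at commit, and the cancel path always calls
    ->iop_release, so a clean intent can no longer leak. The flag name
    ITEM_RELEASE_WHEN_COMMITTED, the struct layouts and the functions are
    hypothetical illustrations, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag modeled on the one described above. */
#define ITEM_RELEASE_WHEN_COMMITTED	(1u << 0)

struct log_item;

/* Hypothetical ops table: no ->iop_unlock, only committing and release. */
struct item_ops {
	unsigned int flags;
	void (*iop_committing)(struct log_item *lip);	/* commit path */
	void (*iop_release)(struct log_item *lip);	/* cancel, or flagged commit */
};

struct log_item {
	const struct item_ops *ops;
	bool in_cil;	/* handed to the CIL on commit */
	bool freed;	/* released */
};

static void intent_committing(struct log_item *lip) { lip->in_cil = true; }
static void item_release(struct log_item *lip) { lip->freed = true; }

static void trans_commit_item(struct log_item *lip)
{
	/* One flag check replaces per-item ->iop_committing boilerplate. */
	if (lip->ops->flags & ITEM_RELEASE_WHEN_COMMITTED) {
		lip->ops->iop_release(lip);
		return;
	}
	if (lip->ops->iop_committing)
		lip->ops->iop_committing(lip);
}

/* Cancel releases clean and dirty items alike; no aborted-bit guessing. */
static void trans_cancel_item(struct log_item *lip)
{
	lip->ops->iop_release(lip);
}
```

    Because cancel no longer infers its meaning from the aborted bit, a
    clean cancel and a commit take distinct callbacks.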
     

13 Dec, 2018

1 commit

  • Owner information for static fs metadata can be defined readonly at
    build time because it never changes across filesystems. This enables us
    to reduce stack usage (particularly in scrub) because we can use the
    statically defined oinfo structures.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
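
    A sketch of the idea, assuming a simplified owner descriptor: owners
    for static metadata are defined once as read-only constants and passed
    by const pointer, instead of each caller filling a struct on its own
    stack. The struct, the OINFO_* constants, the owner code values and
    make_rmap() are hypothetical illustrations.

```c
#include <stdint.h>

/* Hypothetical owner descriptor, loosely modeled on an rmap owner record. */
struct owner_info {
	uint64_t owner;
	uint64_t offset;
	unsigned int flags;
};

/* Illustrative special owner codes for static metadata. */
#define OWN_AG		((uint64_t)-5)
#define OWN_INOBT	((uint64_t)-6)

/* Built at compile time and shared read-only; no per-call stack copy. */
static const struct owner_info OINFO_AG = { OWN_AG, 0, 0 };
static const struct owner_info OINFO_INOBT = { OWN_INOBT, 0, 0 };

/* Hypothetical rmap record built from an owner descriptor. */
struct rmap_rec {
	uint64_t startblock;
	uint64_t blockcount;
	uint64_t owner;
};

/* Callers pass a const pointer to a shared descriptor. */
static struct rmap_rec make_rmap(uint64_t start, uint64_t len,
				 const struct owner_info *oinfo)
{
	struct rmap_rec rec = { start, len, oinfo->owner };

	return rec;
}
```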
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/.

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

10 May, 2018

2 commits

  • Freed extents are unconditionally discarded when online discard is
    enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass
    discards when unnecessary. For example, this will be useful for
    eofblocks trimming.

    This patch does not change behavior.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • The log item flags contain a field that is protected by the AIL
    lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to
    set and clear these flags, but most of the updates and checks are
    not done with the AIL lock held and so are susceptible to update
    races.

    Fix this by changing the log item flags to use atomic bitops rather
    than be reliant on the AIL lock for update serialisation.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
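
    A userspace analogue of the atomic-bitops change, using C11
    <stdatomic.h> in place of the kernel's test_and_set_bit() and
    test_and_clear_bit(). A plain "flags |= mask" is a separate load and
    store, so two CPUs updating different bits of the same word can lose
    one update; atomic_fetch_or()/atomic_fetch_and() make each update a
    single atomic read-modify-write. The LI_* bit numbers and helper names
    are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, loosely modeled on XFS_LI_* flags. */
enum { LI_IN_AIL = 0, LI_ABORTED = 1 };

struct log_item {
	atomic_ulong flags;	/* every update is an atomic RMW, no lock needed */
};

/* Like test_and_set_bit(): set the bit, report whether it was already set. */
static bool li_test_and_set(struct log_item *lip, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(&lip->flags, mask) & mask) != 0;
}

/* Like test_and_clear_bit(): clear the bit, report whether it was set. */
static bool li_test_and_clear(struct log_item *lip, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(&lip->flags, ~mask) & mask) != 0;
}

/* Like test_bit(): read the bit without modifying the word. */
static bool li_test(struct log_item *lip, int bit)
{
	return (atomic_load(&lip->flags) & (1UL << bit)) != 0;
}
```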
     

03 Apr, 2018

1 commit

  • When an intent is aborted during its initial commit through
    xfs_defer_trans_abort(), there is a use after free. The current
    report is for a RUI through this path in generic/388:

    Freed by task 6274:
    __kasan_slab_free+0x136/0x180
    kmem_cache_free+0xe7/0x4b0
    xfs_trans_free_items+0x198/0x2e0
    __xfs_trans_commit+0x27f/0xcc0
    xfs_trans_roll+0x17b/0x2a0
    xfs_defer_trans_roll+0x6ad/0xe60
    xfs_defer_finish+0x2a6/0x2140
    xfs_alloc_file_space+0x53a/0xf90
    xfs_file_fallocate+0x5c6/0xac0
    vfs_fallocate+0x2f5/0x930
    ioctl_preallocate+0x1dc/0x320
    do_vfs_ioctl+0xfe4/0x1690

    The problem is that the RUI has two active references - one in the
    current transaction, and another held by the defer_ops structure
    that is passed to the RUD (intent done) so that both the intent and
    the intent done structures are freed on commit of the intent done.

    Hence during abort, we need to release the intent item, because the
    defer_ops reference is released separately via the ->abort_intent
    callback. Fix all the intent code to do this correctly.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
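
    A reference-counting model of the fix, assuming the intent starts with
    two references (one for the transaction, one for defer_ops): abort
    drops only the transaction's reference, and ->abort_intent drops the
    other, so the item is freed exactly once. The buggy variant frees the
    item outright, after which the separate ->abort_intent release becomes
    a use-after-free, counted here as an over-put. All names are
    hypothetical.

```c
#include <stdbool.h>

/* Hypothetical intent item with a reference count. */
struct intent {
	int refcount;
	bool freed;
	int frees;	/* times the item was freed */
	int overputs;	/* releases after free: a use-after-free in the kernel */
};

static void intent_release(struct intent *ip)
{
	if (ip->refcount <= 0) {
		ip->overputs++;
		return;
	}
	if (--ip->refcount == 0) {
		ip->freed = true;
		ip->frees++;
	}
}

/* Logging the intent: one reference for the transaction, one for defer_ops. */
static void intent_log(struct intent *ip)
{
	ip->refcount = 2;
	ip->freed = false;
	ip->frees = 0;
	ip->overputs = 0;
}

/* Fixed abort: drop only the transaction's reference. */
static void trans_abort_item(struct intent *ip) { intent_release(ip); }

/* ->abort_intent: drop the defer_ops reference separately. */
static void defer_abort_intent(struct intent *ip) { intent_release(ip); }

/* Buggy abort: free the item outright, ignoring the defer_ops reference. */
static void buggy_trans_abort_item(struct intent *ip)
{
	ip->refcount = 0;
	ip->freed = true;
	ip->frees++;
}
```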
     

22 Dec, 2017

1 commit

  • Calling xfs_rmap_free with an unknown owner is supposed to remove any
    rmaps covering that range regardless of owner. This is used by the EFI
    recovery code to say "we're freeing this, it mustn't be owned by
    anything anymore", but for whatever reason xfs_free_ag_extent filters
    them out.

    Therefore, remove the filter and make xfs_rmap_unmap actually treat it
    as a wildcard owner -- free anything that's already there, and if
    there's no owner at all then that's fine too.

    There are two existing callers of bmap_add_free that take care of the rmap
    deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
    cleanup; convert these to use OWN_NULL (via helpers), and now we really
    require that an RUI (if any) gets added to the defer ops before any EFI.

    Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
    growfs will have to consult directly with the rmap to ensure that there
    aren't any rmaps in the grown region.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
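
    A toy model of the owner semantics described above, assuming rmaps
    are removed only when wholly inside the freed range: an unknown owner
    matches any record and finding none is not an error, while OWN_NULL
    tells the free path to skip the rmap update entirely because the
    caller already handled it. The owner values, struct rmap and both
    functions are hypothetical illustrations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OWN_NULL	((uint64_t)-1)	/* caller handles rmap itself (illustrative) */
#define OWN_UNKNOWN	((uint64_t)-2)	/* wildcard owner (illustrative) */

/* Hypothetical reverse-mapping record. */
struct rmap {
	uint64_t start, len, owner;
	bool live;
};

/*
 * Remove rmaps wholly inside [start, start + len).  OWN_UNKNOWN matches
 * any owner; returning 0 (nothing found) is not an error.
 */
static size_t rmap_unmap(struct rmap *maps, size_t n, uint64_t start,
			 uint64_t len, uint64_t owner)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct rmap *r = &maps[i];

		if (!r->live || r->start < start ||
		    r->start + r->len > start + len)
			continue;
		if (owner != OWN_UNKNOWN && r->owner != owner)
			continue;
		r->live = false;
		removed++;
	}
	return removed;
}

/* Free path: OWN_NULL requests are filtered out before touching the rmap. */
static size_t free_extent_rmap(struct rmap *maps, size_t n, uint64_t start,
			       uint64_t len, uint64_t owner)
{
	if (owner == OWN_NULL)
		return 0;
	return rmap_unmap(maps, n, start, len, owner);
}
```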
     

26 Apr, 2017

1 commit


03 Aug, 2016

1 commit