07 Oct, 2020

1 commit

  • Separate the computation of the log push threshold and the push logic in
    xlog_grant_push_ail. This enables higher level code to determine (for
    example) that it is holding on to a logged intent item and the log is so
    busy that it is more than 75% full. In that case, it would be desirable
    to move the log item towards the head to release the tail, which we will
    cover in the next patch.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
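Here is a minimal sketch of the separation described above. It models the log as plain byte counts instead of cycle/block LSNs. The names `struct log_model`, `push_threshold()` and `push_tail()` are hypothetical. The 25%-free target comes from the "more than 75% full" condition in the text. This is not the kernel's xlog_grant_push_ail code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the log: byte counts, not LSNs. */
struct log_model {
	uint64_t size;  /* total log size in bytes */
	uint64_t used;  /* bytes between the tail and the head */
};

/*
 * Pure threshold computation with no side effects: how far must the
 * tail move so that at least max(need, 25% of the log) is free?
 * Returns 0 when no push is needed.
 */
static uint64_t push_threshold(const struct log_model *log, uint64_t need)
{
	uint64_t free_bytes = log->size - log->used;
	uint64_t target = log->size / 4;  /* more than 75% full => push */

	if (need > target)
		target = need;
	if (free_bytes >= target)
		return 0;
	return target - free_bytes;
}

/* Separate push step; in this model it simply retires bytes at the tail. */
static void push_tail(struct log_model *log, uint64_t bytes)
{
	assert(bytes <= log->used);
	log->used -= bytes;
}
```

`push_threshold()` has no side effects. Higher level code can therefore ask whether the log is under pressure without triggering a push.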
     

27 Mar, 2020

1 commit

  • xfs_log_done() does two separate things. Firstly, it triggers commit
    records to be written for permanent transactions, and secondly it
    releases or regrants transaction reservation space.

    Since delayed logging was introduced, transactions no longer write
    directly to the log, hence they never have the XLOG_TIC_INITED flag
    cleared on them. Hence transactions never write commit records to
    the log and only need to modify reservation space.

    Split up xfs_log_done into two parts, and only call the parts of the
    operation needed for the context xfs_log_done() is currently being
    called from.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

23 Mar, 2020

2 commits

  • We can just check for a shut down log all the way down in
    xlog_cil_committed instead of passing the parameter. This means a
    slight behavior change in that we now also abort log items if the
    shutdown came in halfway into the I/O completion processing, which
    actually is the right thing to do.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The only caller of xfs_log_release_iclog doesn't care about the return
    value, so remove it. Also don't bother passing the mount pointer,
    given that we can trivially derive it from the iclog.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
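The sketch below shows why the mount pointer argument was redundant. It uses hypothetical, stripped-down structures. The field names `ic_log` and `l_mp` mirror XFS naming. The structures themselves, `iclog_mount()` and `release_iclog()` are illustrative assumptions, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the XFS structures. */
struct mount { const char *name; };
struct log   { struct mount *l_mp; };
struct iclog { struct log *ic_log; int refcount; };

/* The mount is always reachable through the iclog's back pointers. */
static struct mount *iclog_mount(const struct iclog *iclog)
{
	return iclog->ic_log->l_mp;
}

/* No mount argument and no return value, since no caller used one. */
static void release_iclog(struct iclog *iclog)
{
	assert(iclog_mount(iclog) != NULL);
	assert(iclog->refcount > 0);
	iclog->refcount--;
}
```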
     

03 Jul, 2019

1 commit

  • Change the return types of the following functions, as they never fail:
    xfs_log_mount_cancel
    xlog_recover_cancel
    xlog_recover_cancel_intents

    This fixes the following issue reported by coccicheck:
    fs/xfs/xfs_log_recover.c:4886:7-12: Unneeded variable: "error". Return
    "0" on line 4926

    Signed-off-by: Hariprasad Kelam
    Reviewed-by: Eric Sandeen
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Hariprasad Kelam
     

29 Jun, 2019

2 commits


01 Aug, 2018

1 commit

  • Add a predicate to decide if the log is actively in recovery and use
    that instead of open-coding a pagf_init check in the attr leaf verifier.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Carlos Maiolino

    Darrick J. Wong
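Here is a sketch of the kind of predicate described above. It assumes the log tracks recovery with a state flag. `LOG_ACTIVE_RECOVERY`, `log_in_recovery()` and the two structures are hypothetical names for illustration. The commit text does not show the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag and structures, loosely modeled on the XFS log. */
#define LOG_ACTIVE_RECOVERY 0x1u

struct log   { unsigned int l_flags; };
struct mount { struct log *m_log; };

/* One named predicate instead of open-coded state checks in verifiers. */
static bool log_in_recovery(const struct mount *mp)
{
	return (mp->m_log->l_flags & LOG_ACTIVE_RECOVERY) != 0;
}
```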
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

10 Apr, 2018

1 commit


15 Mar, 2018

2 commits

  • Switch to a single interface for flushing the log to a specific LSN, which
    gives consistent trace point coverage and a less confusing interface.

    There was only a single user of the previous xfs_log_force_lsn function,
    which now also passes a NULL log_flushed argument.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Switch to a single interface for flushing the whole log, which gives
    consistent trace point coverage, and removes the unused log_flushed
    argument for the previous _xfs_log_force callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

20 Jun, 2017

1 commit

  • This is a purely mechanical patch that removes the private
    __{u,}int{8,16,32,64}_t typedefs in favor of using the system
    {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
    the transformation and fix the resulting whitespace and indentation
    errors:

    s/typedef\t__uint8_t/typedef __uint8_t\t/g
    s/typedef\t__uint/typedef __uint/g
    s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
    s/__uint8_t\t/__uint8_t\t\t/g
    s/__uint/uint/g
    s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
    s/__int/int/g
    /^typedef.*int[0-9]*_t;$/d

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

31 Jan, 2017

1 commit

  • After scratching my head looking for "xfs_busy_extent" I realized
    it's not used; it's xfs_extent_busy, and the declaration for the
    other name is bogus. Remove that and a few others as well.

    (struct xfs_log_callback is used, but the 2nd declaration is
    unnecessary).

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     

01 Jun, 2016

1 commit

  • Al Viro noticed that xfs_lock_inodes should be static, and
    that led to ... a few more.

    These are just the easy ones, others require moving functions
    higher in source files, so that's not done here to keep
    this review simple.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

06 Apr, 2016

1 commit


12 Oct, 2015

1 commit

  • Since the onset of v5 superblocks, the LSN of the last modification has
    been included in a variety of on-disk data structures. This LSN is used
    to provide log recovery ordering guarantees (e.g., to ensure an older
    log recovery item is not replayed over a newer target data structure).

    While this works correctly from the point a filesystem is formatted and
    mounted, userspace tools have some problematic behaviors that defeat
    this mechanism. For example, xfs_repair historically zeroes out the log
    unconditionally (regardless of whether corruption is detected). If this
    occurs, the LSN of the filesystem is reset and the log is now in a
    problematic state with respect to on-disk metadata structures that might
    have a larger LSN. Until either the log catches up to the highest
    previously used metadata LSN or each affected data structure is modified
    and written out without incident (which resets the metadata LSN), log
    recovery is susceptible to filesystem corruption.

    This problem is ultimately addressed and repaired in the associated
    userspace tools. The kernel is still responsible for detecting the problem
    and notifying the user that something is wrong. Check the superblock LSN at
    mount time and fail the mount if it is invalid. From that point on,
    trigger verifier failure on any metadata I/O where an invalid LSN is
    detected. This results in a filesystem shutdown and guarantees that we
    do not log metadata changes with invalid LSNs on disk. Since this is a
    known issue with a known recovery path, present a warning to instruct
    the user how to recover.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
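Here is a minimal sketch of an LSN validity check. It assumes the LSN packs the log cycle in the upper 32 bits and the block number in the lower 32 bits. The helpers `make_lsn()`, `lsn_cycle()`, `lsn_block()` and `lsn_is_valid()` are hypothetical. An LSN counts as invalid here if it is ahead of the current log head, which is the situation a zeroed log creates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: cycle in the high 32 bits, block in the low 32 bits. */
static uint32_t lsn_cycle(uint64_t lsn) { return (uint32_t)(lsn >> 32); }
static uint32_t lsn_block(uint64_t lsn) { return (uint32_t)lsn; }

static uint64_t make_lsn(uint32_t cycle, uint32_t block)
{
	return ((uint64_t)cycle << 32) | block;
}

/*
 * A metadata LSN is valid only if it is not ahead of the current log
 * head (curr_cycle, curr_block). A larger LSN means the log was reset
 * underneath metadata that was written with a newer LSN.
 */
static bool lsn_is_valid(uint64_t lsn, uint32_t curr_cycle, uint32_t curr_block)
{
	if (lsn_cycle(lsn) != curr_cycle)
		return lsn_cycle(lsn) < curr_cycle;
	return lsn_block(lsn) <= curr_block;
}
```

This sketch ignores any wrap-around of the cycle counter.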
     

19 Aug, 2015

1 commit

  • Log recovery occurs in two phases at mount time. In the first phase,
    EFIs and EFDs are processed and potentially cancelled out. EFIs without
    EFD objects are inserted into the AIL for processing and recovery in the
    second phase. xfs_mountfs() runs various other operations between the
    phases and is thus subject to failure. If failure occurs after the first
    phase but before the second, pending EFIs sit on the AIL, pin it and
    cause the mount to hang.

    Update the mount sequence to ensure that pending EFIs are cancelled in
    the event of failure. Add a recovery cancellation mechanism to iterate
    the AIL and cancel all EFI items when requested. Plumb cancellation
    support through the log mount finish helper and update xfs_mountfs() to
    invoke cancellation in the event of failure after recovery has started.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
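The cancellation mechanism can be sketched as a walk over the AIL that cancels every EFI it finds. The list-based AIL, `struct log_item`, the `cancelled` flag and `recover_cancel_intents()` are hypothetical simplifications. Marking an item cancelled stands in for whatever releasing the EFI involves in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified AIL: a singly linked list of log items. */
enum item_type { ITEM_EFI, ITEM_BUF };

struct log_item {
	enum item_type type;
	int cancelled;
	struct log_item *next;
};

/*
 * Walk the AIL and cancel every pending EFI, so that a mount failure
 * between the two recovery phases cannot leave the AIL pinned.
 * Returns the number of items cancelled by this call.
 */
static int recover_cancel_intents(struct log_item *ail_head)
{
	int n = 0;

	for (struct log_item *lip = ail_head; lip != NULL; lip = lip->next) {
		if (lip->type != ITEM_EFI || lip->cancelled)
			continue;
		lip->cancelled = 1;  /* stand-in for releasing the EFI */
		n++;
	}
	return n;
}
```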
     

04 Jun, 2015

2 commits

  • Instead of the confusing flags argument pass a boolean flag to indicate if
    we want to release or regrant a log reservation.

    Also ensure that xfs_log_done always drops the reference on the log ticket,
    to both simplify the code and make the logic in xfs_trans_roll easier
    to understand.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • The flags argument to xfs_trans_commit is not useful for most callers, as
    a commit of a transaction without a permanent log reservation must pass
    0 here, and all callers for a transaction with a permanent log reservation
    except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove
    the flags argument from the public xfs_trans_commit interfaces, and
    introduce a low-level __xfs_trans_commit variant just for xfs_trans_roll
    that regrants a log reservation instead of releasing it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
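Here is a sketch of the boolean-flag interface. It assumes a ticket model with a reference count and a byte reservation drawn from a shared pool. `struct ticket`, `log_done()` and the `log_free` pool are hypothetical names, not the XFS definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical log ticket: a reference count plus reserved bytes. */
struct ticket {
	int refcount;
	int unit_res;  /* bytes reserved for each transaction in a roll */
	int curr_res;  /* bytes still held by this ticket */
};

/*
 * A plain boolean replaces the flags argument: regrant == true keeps a
 * full reservation for the next transaction in a roll, false releases
 * it. The caller's ticket reference is dropped on both paths.
 */
static void log_done(struct ticket *tic, bool regrant, int *log_free)
{
	*log_free += tic->curr_res;  /* return whatever is still held */
	tic->curr_res = 0;
	if (regrant) {
		assert(*log_free >= tic->unit_res);
		*log_free -= tic->unit_res;  /* re-reserve a full unit */
		tic->curr_res = tic->unit_res;
	}
	tic->refcount--;
}
```

Under this model, the public commit path would always call `log_done(tic, false, ...)`. Only the low-level variant used by a transaction roll would pass `true`.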
     

20 May, 2014

1 commit

  • The addition of direct formatting of log items into the CIL
    linear buffer added alignment restrictions that the start of each
    vector needed to be 64 bit aligned. Hence padding was added in
    xlog_finish_iovec() to round up the vector length to ensure the next
    vector started with the correct alignment.

    This adds a small number of bytes to the size of
    the linear buffer that is otherwise unused. The issue is that we
    then use the linear buffer size to determine the log space used by
    the log item, and this includes the unused space. Hence when we
    account for space used by the log item, it's more than is actually
    written into the iclogs, and hence we slowly leak this space.

    This results in log hangs when reserving space, with threads getting
    stuck with these stack traces:

    Call Trace:
    [] schedule+0x29/0x70
    [] xlog_grant_head_wait+0xa2/0x1a0
    [] xlog_grant_head_check+0xbd/0x140
    [] xfs_log_reserve+0x103/0x220
    [] xfs_trans_reserve+0x2f5/0x310
    .....

    The 4 bytes is significant. Brian Foster did all the hard work in
    tracking down a reproducible leak to inode chunk allocation (it went
    away with the ikeep mount option). His rough numbers were that
    creating 50,000 inodes leaked 11 log blocks. This turns out to be
    roughly 800 inode chunks or 1600 inode cluster buffers. That
    works out at roughly 4 bytes per cluster buffer logged, and at that
    I started looking for a 4 byte leak in the buffer logging code.

    What I found was that a struct xfs_buf_log_format structure for an
    inode cluster buffer is 28 bytes in length. This gets rounded up to
    32 bytes, but the vector length remains 28 bytes. Hence the CIL
    ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
    for that vector rather than 28 bytes which are written into the log.

    The fix for this problem is to separately track the bytes used by
    the log vectors in the item and use that instead of the buffer
    length when accounting for the log space that will be used by the
    formatted log item.

    Again, thanks to Brian Foster for doing all the hard work and long
    hours to isolate this leak and make finding the bug relatively
    simple.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
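The leak and its fix can be modeled in a few lines, assuming 8-byte (64-bit) alignment of each vector. `struct log_vec`, `round_up8()` and `finish_iovec()` are hypothetical names. `buf_len` grows by the padded length. The separately tracked `bytes` field counts only what is written to the iclogs. Accounting against `bytes` therefore avoids leaking 4 bytes per 28-byte header.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 8 (64-bit alignment). */
static size_t round_up8(size_t len)
{
	return (len + 7) & ~(size_t)7;
}

/* Hypothetical log vector: buffer space consumed vs. bytes really logged. */
struct log_vec {
	size_t buf_len;  /* includes alignment padding */
	size_t bytes;    /* payload actually written to the iclogs */
};

static void finish_iovec(struct log_vec *lv, size_t len)
{
	lv->buf_len += round_up8(len);  /* next vector starts 8-byte aligned */
	lv->bytes += len;               /* the fix: account only real bytes */
}
```

The numbers in the text are consistent with this model:

- 50,000 inodes at 64 inodes per chunk is about 781 chunks (roughly 800).
- Two cluster buffers per chunk gives about 1,600 buffers.
- Assuming 512-byte log blocks, 11 blocks is 5,632 bytes, or about 3.5 bytes per buffer.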
     

07 Feb, 2014

1 commit

  • Convert xfs_log_commit_cil() to a void function since it returns nothing
    but 0 in any case. After that, we can simplify the related code logic
    in xfs_trans_commit() accordingly.

    Signed-off-by: Jie Liu
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Jie Liu
     

13 Dec, 2013

2 commits

  • Instead of setting up pointers to memory locations in iop_format which then
    get copied into the CIL linear buffer after return, move the copy into
    the individual inode items. This avoids the need to always keep around a
    memory block in the exact same layout that gets written into the log, and
    allows the log items to be much more flexible in their in-memory layouts.

    The only caveat is that we need to properly align the data for each
    iovec so that we don't have structures misaligned in subsequent iovecs.

    Note that all log item format routines now need to be careful to modify
    the copy of the item that was placed into the CIL after calls to
    xlog_copy_iovec instead of the in-memory copy.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Add a helper to abstract out filling the log iovecs in the log item
    format handlers. This will allow us to change the way we do the log
    item formatting more easily.

    The copy in the name is a bit confusing for now as it just assigns a
    pointer and lets the CIL code perform the copy, but that will change
    soon.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
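Here is a sketch of the copy-plus-alignment step described above. It assumes a fixed-size linear buffer and 8-byte alignment. `struct lin_buf` and `copy_iovec()` are hypothetical and only loosely inspired by the xlog_copy_iovec helper named in the text. The returned pointer is the copy that format routines would need to modify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical linear CIL buffer that log items format themselves into. */
struct lin_buf {
	_Alignas(8) unsigned char data[256];
	size_t used;
};

/*
 * Copy one region into the buffer and return a pointer to the copy.
 * The write offset is then rounded up to 8 bytes so that the next
 * region starts 64-bit aligned. Callers that need to adjust the
 * formatted data must modify the returned copy, not the original.
 */
static void *copy_iovec(struct lin_buf *b, const void *src, size_t len)
{
	unsigned char *dst;

	assert(b->used + len <= sizeof(b->data));
	dst = b->data + b->used;
	memcpy(dst, src, len);
	b->used = (b->used + len + 7) & ~(size_t)7;
	return dst;
}
```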
     

24 Oct, 2013

1 commit

  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved to
    indicate the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     

14 Aug, 2013

1 commit

  • Now that we have the size of the object before the formatting pass
    is called, we can allocate the log vector and its buffer in a
    single allocation rather than two separate allocations.

    Store the size of the allocated buffer in the log vector so that
    we potentially avoid allocation for future modifications of the
    object.

    While touching this code, remove the IOP_FORMAT definition.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
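Here is a sketch of the single-allocation pattern. It uses a C99 flexible array member to place the buffer directly after the vector header. `struct log_vec` and `lv_get()` are hypothetical. The stored `buf_size` lets a later modification of the same object reuse the allocation when it is already large enough.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical log vector whose data buffer lives in the same allocation. */
struct log_vec {
	size_t buf_size;      /* capacity of buf[], remembered for reuse */
	size_t buf_len;       /* bytes formatted into buf[] so far */
	unsigned char buf[];  /* C99 flexible array member */
};

/*
 * Return a log vector able to hold size bytes. An existing vector is
 * reused when it is already large enough; otherwise it is replaced by
 * a single allocation covering both the header and the buffer.
 */
static struct log_vec *lv_get(struct log_vec *lv, size_t size)
{
	if (lv && lv->buf_size >= size) {
		lv->buf_len = 0;
		return lv;
	}
	free(lv);
	lv = malloc(sizeof(*lv) + size);
	if (!lv)
		return NULL;
	lv->buf_size = size;
	lv->buf_len = 0;
	return lv;
}
```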
     

13 Aug, 2013

1 commit

  • The on-disk format definitions for the log are spread randomly
    through a couple of header files. Consolidate it all in a single
    file that can be shared easily with userspace. This means that
    xfs_log.h and xfs_log_priv.h no longer need to be shared with
    userspace.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

28 Jun, 2013

2 commits

  • Introduce the inode create log item type for logical inode create logging.
    Instead of logging the changes in buffers, pass the range to be
    initialised through the log by a new transaction type. This reduces
    the amount of log space required to record initialisation during
    allocation from about 128 bytes per inode to a small fixed amount
    per inode extent to be initialised.

    This requires a new log item type to track it through the log
    and the AIL. This is a relatively simple item - most callbacks are
    noops as this item has the same life cycle as the transaction.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • An "ordered log vector" is a log vector that is used for
    tracking a log item through the CIL and into the AIL as part of the
    log checkpointing. These ordered log vectors are special in that
    they are not written to the journal in any way, and are not accounted
    to the checkpoint being written.

    The reason for this behaviour is to allow operations to attach items
    to transactions and have them follow the normal transactional
    lifecycle without actually having to write them to the journal. This
    allows logging of items that track high level logical changes and
    writing them to the log, while the physical items being modified
    pass through into the AIL and pin the tail of the log (and therefore
    the logical item in the log) until all the modified items are
    physically written to disk.

    IOWs, it allows us to write metadata without physically logging
    every individual change but still maintain the full transactional
    integrity guarantees we currently have w.r.t. crash recovery.

    This change modifies some of the CIL item insertion loops, as
    ordered log vectors introduce some new constraints as they don't
    track any data. One advantage of this change is that it combines
    two log vector chain walks into a single pass, so there is less
    overhead in the transaction commit pass as well. It also kills some
    unused code in the log vector walk loop when committing the CIL.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
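The accounting rule for ordered vectors can be sketched as a single pass over a transaction's vectors. The sentinel `LOG_VEC_ORDERED`, `struct log_vec` and `cil_account()` are hypothetical names. An ordered vector still counts as an item to insert into the CIL, but it contributes no bytes to the checkpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel marking an "ordered" vector that carries no data. */
#define LOG_VEC_ORDERED ((size_t)-1)

struct log_vec {
	size_t bytes;  /* payload bytes, or LOG_VEC_ORDERED */
	struct log_vec *next;
};

/*
 * One pass over a transaction's vectors: count the bytes that will be
 * written to the journal and the items that will be inserted into the
 * CIL. Ordered vectors are tracked as items but add no journal space.
 */
static size_t cil_account(const struct log_vec *lv, size_t *nitems)
{
	size_t total = 0;

	*nitems = 0;
	for (; lv != NULL; lv = lv->next) {
		(*nitems)++;
		if (lv->bytes == LOG_VEC_ORDERED)
			continue;
		total += lv->bytes;
	}
	return total;
}
```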
     

18 Oct, 2012

2 commits

  • xfs_quiesce_attr() is supposed to leave the log empty with an
    unmount record written. Right now it does not wait for the AIL to be
    emptied before writing the unmount record, nor does it wait for
    metadata IO completion, either. Fix it to use the same method and
    code as xfs_log_unmount().

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The only thing the periodic sync work does now is flush the AIL and
    idle the log. These are really functions of the log code, so move
    the work to xfs_log.c and rename it appropriately.

    The only wart that this leaves behind is the xfssyncd_centisecs
    sysctl, otherwise the xfssyncd is dead. Clean up any comments that
    related to xfssyncd to reflect its passing.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

15 May, 2012

1 commit

  • Provide a variant of xlog_assign_tail_lsn that has the AIL lock already
    held. By doing so we do an additional atomic_read + atomic_set under
    the lock, which comes down to two instructions.

    Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the
    new version to reduce the number of lock roundtrips, and prepare for
    a new addition that would require a third lock roundtrip in
    xfs_trans_ail_delete_bulk. This addition is also the reason for
    slightly rearranging the conditionals and relying on xfs_log_space_wake
    for checking that the filesystem has been shut down internally.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
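Here is a sketch of the `_locked` variant pattern. It assumes a pthread mutex stands in for the AIL lock and that the new tail is simply the AIL's minimum LSN. `struct ail`, `assign_tail_lsn_locked()`, `assign_tail_lsn()`, `ail_update_bulk()` and the `lock_roundtrips` counter are all hypothetical. The counter exists only to make the saved lock acquisition visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical AIL/log pair; the counter is for illustration only. */
struct ail {
	pthread_mutex_t lock;
	uint64_t min_lsn;     /* LSN of the oldest item in the AIL */
	uint64_t tail_lsn;    /* what the log believes its tail is */
	int lock_roundtrips;  /* how many times the lock was taken */
};

/* Caller must already hold ailp->lock. */
static uint64_t assign_tail_lsn_locked(struct ail *ailp)
{
	ailp->tail_lsn = ailp->min_lsn;
	return ailp->tail_lsn;
}

/* Wrapper for callers that do not hold the lock. */
static uint64_t assign_tail_lsn(struct ail *ailp)
{
	uint64_t lsn;

	pthread_mutex_lock(&ailp->lock);
	ailp->lock_roundtrips++;
	lsn = assign_tail_lsn_locked(ailp);
	pthread_mutex_unlock(&ailp->lock);
	return lsn;
}

/* A bulk update reuses the lock it already holds: one roundtrip total. */
static void ail_update_bulk(struct ail *ailp, uint64_t new_min)
{
	pthread_mutex_lock(&ailp->lock);
	ailp->lock_roundtrips++;
	ailp->min_lsn = new_min;
	assign_tail_lsn_locked(ailp);
	pthread_mutex_unlock(&ailp->lock);
}
```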
     

23 Feb, 2012

3 commits

  • Split the log regrant case out of xfs_log_reserve into a separate function,
    and merge xlog_grant_log_space and xlog_regrant_write_log_space into their
    respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily
    got misused before the previous cleanups, with a simple boolean parameter.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Remove the now unused opportunistic parameter, and use the
    xlog_writeq_wake and xlog_reserveq_wake helpers now that we don't have
    to care about the opportunistic wakeups.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Currently xfs_log_move_tail has a tail_lsn argument that is horribly
    overloaded: it may contain either an actual lsn to assign to the log tail,
    0 as a special case to use the last sync LSN, or 1 to indicate that no tail
    LSN assignment should be performed, and we should opportunistically wake up
    at least one task waiting for log space even if we did not move the LSN.

    Remove the tail LSN assignment from xfs_log_move_tail and make the two callers
    use xlog_assign_tail_lsn instead of the current variant of partially using
    the code in xfs_log_move_tail and partially opencoding it. Note that this means
    we grow an additional lock roundtrip on the AIL lock for each bulk update
    or delete, which is still far less than what we had before introducing the
    bulk operations. If this proves to be a problem we can still add a variant
    of xlog_assign_tail_lsn that expects the lock to be held already.

    Also rename the remainder of xfs_log_move_tail to xfs_log_space_wake as
    that name describes its functionality much better.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
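The wakeup logic that remains in xfs_log_space_wake can be sketched as a walk over the queue of tasks waiting for log space. `struct waiter` and `log_space_wake()` are hypothetical. This sketch stops at the first waiter that does not fit, so a small request cannot overtake a larger one queued ahead of it. It does not model the old opportunistic wakeup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task queued for log space, kept in FIFO order. */
struct waiter {
	int need;   /* bytes this task needs */
	int woken;
};

/*
 * Wake waiters in queue order while free space covers their need.
 * Stopping at the first waiter that does not fit keeps the queue fair.
 * Returns the number of waiters woken by this call.
 */
static int log_space_wake(struct waiter *q, size_t n, int free_bytes)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].woken)
			continue;
		if (q[i].need > free_bytes)
			break;
		free_bytes -= q[i].need;
		q[i].woken = 1;
		woken++;
	}
	return woken;
}
```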
     

09 Dec, 2011

2 commits

  • Now that the nodelaylog mode is gone we can simplify the transaction commit
    path a bit by removing the xfs_trans_commit_cil routine. Restoring the
    process flags is merged into xfs_trans_commit which already does it for
    the error path, and allocating the log vectors is merged into
    xlog_cil_format_items, which already fills them with data, thus avoiding
    one loop over all log items.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • The delaylog mode has been the default for a long time, and the nodelaylog
    option has been scheduled for removal in Linux 3.3. Remove it and code
    only used by it now that we have opened the 3.3 window.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

09 Nov, 2011

1 commit

  • The log item ops aren't necessarily the biggest exploit vector, but marking
    them const is easy enough. Also remove the unused xfs_item_ops_t typedef
    while we're at it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Alex Elder

    Christoph Hellwig
     

29 Apr, 2011

1 commit

  • Update the extent tree in case we have to reuse a busy extent, so that it
    is always kept up to date. This is done by replacing the busy list searches
    with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
    in case of a reuse. This allows us to reuse metadata extents
    unconditionally, and thus avoid log forces especially for allocation btree
    blocks.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

28 Jan, 2011

1 commit

  • Failure to commit a transaction into the CIL is not handled
    correctly. This currently can only happen when racing with a
    shutdown and requires an explicit shutdown check, so it is rare and can
    be avoided. Remove the shutdown check and make the CIL commit a void
    function to indicate it will always succeed, thereby removing the
    incorrectly handled failure case.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner