05 Aug, 2020

1 commit

  • Delete repeated words in fs/xfs/.
    {we, that, the, a, to, fork}
    Change "it it" to "it is" in one location.

    Signed-off-by: Randy Dunlap
    To: linux-fsdevel@vger.kernel.org
    Cc: Darrick J. Wong
    Cc: linux-xfs@vger.kernel.org
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Randy Dunlap
     

29 Jul, 2020

1 commit

  • xlog_ticket_alloc() is always called under NOFS context, except from the
    unmount path, which either way is holding many FS locks, so there is no
    need for its callers to keep passing allocation flags into it.

    Change xlog_ticket_alloc() to use default kmem_cache_zalloc(), remove
    its alloc_flags argument, and always use GFP_NOFS | __GFP_NOFAIL flags.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner

    Carlos Maiolino
     

23 Jun, 2020

1 commit

  • xlog_wait() on the CIL context can reference a freed context if the
    waiter doesn't get scheduled before the CIL context is freed. This
    can happen when a task is on the hard throttle and the CIL push
    aborts due to a shutdown. This was detected by generic/019:

    thread 1                    thread 2

    __xfs_trans_commit
     xfs_log_commit_cil

      xlog_wait
       schedule
                                xlog_cil_push_work
                                wake_up_all

                                xlog_cil_committed
                                kmem_free

       remove_wait_queue
        spin_lock_irqsave --> UAF

    Fix it by moving the wait queue to the CIL rather than keeping it
    in the CIL context that gets freed on push completion. Because the
    wait queue is now independent of the CIL context and we might have
    multiple contexts in flight at once, only wake the waiters on the
    push throttle when the context we are pushing is over the hard
    throttle size threshold.

    Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
    Reported-by: Yu Kuai
    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Dave Chinner
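
A minimal user-space sketch of the fix described above, under loud assumptions: a plain waiter counter stands in for the wait queue, and `struct cil`, `struct cil_ctx`, `hard_limit`, `cil_throttle_wait()` and `cil_push_wake()` are hypothetical names, not the kernel's. It only illustrates the two ideas in the commit message: the waiters hang off the long-lived CIL rather than the context that is freed, and a push wakes them only when the context it is pushing was over the hard throttle threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a context is freed after its push; the CIL outlives it. */
struct cil_ctx {
	size_t space_used;	/* bytes accounted to this checkpoint */
};

struct cil {
	size_t hard_limit;	/* assumed hard throttle threshold */
	int    waiters;		/* stands in for a wait queue owned by the CIL */
};

/* A committer over the hard limit parks itself on the CIL, not the context. */
static void cil_throttle_wait(struct cil *cil)
{
	cil->waiters++;
}

/*
 * Called when a push switches out @ctx. Several contexts may be in flight,
 * so only a context that was over the hard limit wakes the throttled
 * waiters. Returns the number of waiters woken.
 */
static int cil_push_wake(struct cil *cil, const struct cil_ctx *ctx)
{
	int woken = 0;

	assert(cil->waiters >= 0);
	if (ctx->space_used >= cil->hard_limit) {
		woken = cil->waiters;
		cil->waiters = 0;
	}
	return woken;
}
```

Because nothing in the waiter path dereferences the context after going to sleep, freeing the context early can no longer leave a waiter pointing at freed memory in this model.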
     

27 Mar, 2020

5 commits

  • In certain situations the background CIL push can be indefinitely
    delayed. While we have workarounds for the obvious cases now, they
    don't solve the underlying issue. The issue is that there is no
    upper limit on the CIL where we will either force or wait for
    a background push to start, hence allowing the CIL to grow without
    bound until it consumes all log space.

    To fix this, add a new wait queue to the CIL which allows background
    pushes to wait for the CIL context to be switched out. This happens
    when the push starts, so it will allow us to block incoming
    transaction commit completion until the push has started. This will
    only affect processes that are running modifications, and only when
    the CIL threshold has been significantly overrun.

    This has no apparent impact on performance, and doesn't even trigger
    until over 45 million inodes had been created in a 16-way fsmark
    test on a 2GB log. That was limiting at 64MB of log space used, so
    the active CIL size is only about 3% of the total log in that case.
    The concurrent removal of those files did not trigger the background
    sleep at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Allison Collins
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
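
The commit-side decision described above can be sketched as a three-way choice. This is an illustration only: `enum cil_action`, `cil_commit_action()` and the two limit parameters are hypothetical names, and the actual thresholds used by the kernel are not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* What a committing transaction does after adding its items to the CIL. */
enum cil_action {
	CIL_NO_PUSH,		/* below the background push threshold */
	CIL_QUEUE_PUSH,		/* queue a background push and carry on */
	CIL_QUEUE_AND_WAIT,	/* queue a push and block until it has started */
};

/*
 * @soft_limit is the point where a background push is requested;
 * @hard_limit is the new upper bound where committers start to wait.
 * Both are assumed parameters of this sketch.
 */
static enum cil_action cil_commit_action(size_t space_used,
					 size_t soft_limit,
					 size_t hard_limit)
{
	assert(soft_limit <= hard_limit);
	if (space_used < soft_limit)
		return CIL_NO_PUSH;
	if (space_used < hard_limit)
		return CIL_QUEUE_PUSH;
	return CIL_QUEUE_AND_WAIT;
}
```

Only the last case blocks, which matches the claim that the wait affects processes running modifications only once the threshold has been significantly overrun.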
     
  • xlog_write_done() is just a thin wrapper around xlog_commit_record(), so
    they can be merged together easily.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Remove xlog_ticket_done and just call the renamed low-level helpers for
    ungranting or regranting log space directly. To make that a little
    easier, the reference put on the ticket and all tracing are moved into
    the actual helpers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • xfs_log_done() does two separate things. Firstly, it triggers commit
    records to be written for permanent transactions, and secondly it
    releases or regrants transaction reservation space.

    Since delayed logging was introduced, transactions no longer write
    directly to the log, hence they never have the XLOG_TIC_INITED flag
    cleared on them. Hence transactions never write commit records to
    the log and only need to modify reservation space.

    Split up xfs_log_done into two parts, and only call the parts of the
    operation needed for the context xfs_log_done() is currently being
    called from.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • The xlog_write() function iterates over iclogs until it completes
    writing all the log vectors passed in. The ticket tracks whether
    a start record has been written or not, so only the first iclog gets
    a start record. We only ever pass single use tickets to
    xlog_write() so we only ever need to write a start record once per
    xlog_write() call.

    Hence we don't need to store whether we should write a start record
    in the ticket as the callers provide all the information we need to
    determine if a start record should be written. For the moment, we
    have to ensure that we clear the XLOG_TIC_INITED appropriately so
    the code in xfs_log_done() still works correctly for committing
    transactions.

    (darrick: Note the slight behavior change that we always deduct the
    size of the op header from the ticket, even for unmount records)

    Signed-off-by: Dave Chinner
    [hch: pass an explicit need_start_rec argument]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
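
A rough sketch of the idea, assuming a much simplified write loop: the caller says whether a start record is needed, and the loop emits it for the first iclog only. `log_write()`, `struct write_stats` and the byte-count model are hypothetical; only the `need_start_rec` name comes from the note above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct write_stats {
	int iclogs_used;	/* how many iclogs the write spanned */
	int start_recs;		/* how many start records were emitted */
};

/* Write @nbytes across iclogs that each hold @iclog_size bytes. */
static struct write_stats log_write(size_t nbytes, size_t iclog_size,
				    bool need_start_rec)
{
	struct write_stats st = { 0, 0 };

	assert(iclog_size > 0);
	while (nbytes > 0) {
		size_t chunk = nbytes < iclog_size ? nbytes : iclog_size;

		st.iclogs_used++;
		if (need_start_rec) {
			st.start_recs++;
			need_start_rec = false;	/* only the first iclog gets one */
		}
		nbytes -= chunk;
	}
	return st;
}
```

The state that used to live in the ticket becomes a local that is cleared after first use, so nothing needs to persist across calls.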
     

23 Mar, 2020

3 commits


11 Nov, 2019

1 commit


22 Oct, 2019

1 commit

  • ic_state really is a set of different states, even if the values are
    encoded as non-conflicting bits and we sometimes use bitwise AND
    operations to check for them. Switch all comparisons to check for
    exact values (and use switch statements in a few places to make it
    more clear) and turn the values into an implicitly enumerated enum
    type.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
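
To illustrate the pattern, here is a small sketch with made-up state names (`ICLOG_*` and both helpers are hypothetical, not the kernel's identifiers). With an implicitly enumerated enum the first value is 0, so mask tests stop being meaningful and every check has to be an exact comparison or a switch.

```c
#include <assert.h>

/* Implicitly enumerated: values are 0, 1, 2, ... rather than distinct bits. */
enum iclog_state {
	ICLOG_ACTIVE,
	ICLOG_WANT_SYNC,
	ICLOG_SYNCING,
	ICLOG_DONE_SYNC,
	ICLOG_IOERROR,
};

/* A switch makes every state explicit and lets the compiler flag gaps. */
static const char *iclog_state_name(enum iclog_state s)
{
	switch (s) {
	case ICLOG_ACTIVE:	return "active";
	case ICLOG_WANT_SYNC:	return "want_sync";
	case ICLOG_SYNCING:	return "syncing";
	case ICLOG_DONE_SYNC:	return "done_sync";
	case ICLOG_IOERROR:	return "ioerror";
	}
	return "unknown";
}

/* Exact comparisons replace mask tests such as (s & (A | B)). */
static int iclog_is_in_flight(enum iclog_state s)
{
	assert(s >= ICLOG_ACTIVE && s <= ICLOG_IOERROR);
	return s == ICLOG_WANT_SYNC || s == ICLOG_SYNCING;
}
```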
     

27 Aug, 2019

1 commit


29 Jun, 2019

6 commits

  • There are many, many xfs header files which are included but
    unneeded (or included twice) in the xfs code, so remove them.

    nb: xfs_linux.h includes about 9 headers for everyone, so those
    explicit includes get removed by this. I'm not sure what the
    preference is, but if we wanted explicit includes everywhere,
    a followup patch could remove those xfs_*.h includes from
    xfs_linux.h and move them into the files that need them.
    Or it could be left as-is.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • Replace the hand grown linked list handling and cil context attachment
    with the standard list_head structure.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The iop_unlock method is called when committing or cancelling a
    transaction. In the latter case, the transaction may or may not be
    aborted. While there is no known problem with the current code in
    practice, this implementation is limited in that any log item
    implementation that might want to differentiate between a commit and a
    cancellation must rely on the aborted state. The aborted bit is only
    set when the cancelled transaction is dirty, however. This means that
    there is no way to distinguish between a commit and a clean transaction
    cancellation.

    For example, intent log items currently rely on this distinction. The
    log item is either transferred to the CIL on commit or released on
    transaction cancel. There is currently no possibility for a clean intent
    log item in a transaction, but if that state is ever introduced a cancel
    of such a transaction will immediately result in memory leaks of the
    associated log item(s). This is an interface deficiency and landmine.

    To clean this up, replace the iop_unlock method with an iop_release
    method that is specific to transaction cancel. The existing
    iop_committing method occurs at the same time as iop_unlock in the
    commit path and there is no need for two separate callbacks here.
    Overload the iop_committing method with the current commit time
    iop_unlock implementations to eliminate the need for the latter and
    further simplify the interface.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
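
The resulting interface shape can be sketched as follows. Only the callback names `iop_committing` and `iop_release` come from the message above; the structures, the counters and the single-argument callback signatures are simplifying assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct log_item;

/* One callback per path, so an item never has to guess which one it is on. */
struct item_ops {
	void (*iop_committing)(struct log_item *lip);	/* commit path */
	void (*iop_release)(struct log_item *lip);	/* cancel path */
};

struct log_item {
	const struct item_ops *ops;
	int committed;	/* times the commit callback ran */
	int released;	/* times the cancel callback ran */
};

static void item_committing(struct log_item *lip) { lip->committed++; }
static void item_release(struct log_item *lip)    { lip->released++; }

static const struct item_ops demo_ops = {
	.iop_committing	= item_committing,
	.iop_release	= item_release,
};

/* Commit: the item is handed over to the CIL. */
static void trans_commit_item(struct log_item *lip)
{
	assert(lip != NULL && lip->ops != NULL);
	if (lip->ops->iop_committing)
		lip->ops->iop_committing(lip);
}

/* Cancel, clean or dirty: the item is released, never mistaken for a commit. */
static void trans_cancel_item(struct log_item *lip)
{
	assert(lip != NULL && lip->ops != NULL);
	if (lip->ops->iop_release)
		lip->ops->iop_release(lip);
}
```

With separate entry points, a clean cancellation no longer looks identical to a commit from the item's point of view.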
     
  • While committing items looks very similar to freeing them on error, it is
    a different operation, and they will diverge a bit soon.

    Split out the commit case from xfs_trans_free_items, inline it into
    xfs_log_commit_cil and give it a separate trace point.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Just check if they are present first.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Just pass a straight bool aborted instead of abusing XFS_LI_ABORTED as a
    flag in function parameters.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

15 Apr, 2019

1 commit

  • XFS shutdown deadlocks have been reproduced by fstest generic/475.
    The deadlock signature involves log I/O completion running error
    handling to abort logged items and waiting for an inode cluster
    buffer lock in the buffer item unpin handler. The buffer lock is
    held by xfsaild attempting to flush an inode. The buffer happens to
    be pinned and so xfs_iflush() triggers an async log force to begin
    work required to get it unpinned. The log force is blocked waiting
    on the commit completion, which never occurs and thus leaves the
    filesystem deadlocked.

    The root problem is that aborted log I/O completion puts commit
    completion behind callback completion, which is unexpected for async
    log forces. Under normal running conditions, an async log force
    returns to the caller once the CIL ctx has been formatted/submitted
    and the commit completion event triggered at the tail end of
    xlog_cil_push(). If the filesystem has shut down, however, we rely on
    xlog_cil_committed() to trigger the completion event and it happens
    to do so after running log item unpin callbacks. This makes it
    unsafe to invoke an async log force from contexts that hold locks
    that might also be required in log completion processing.

    To address this problem, wake commit completion waiters before
    aborting log items in the log I/O completion handler. This ensures
    that an async log force will not deadlock on held locks if the
    filesystem happens to shut down. Note that it is still unsafe to
    issue a sync log force while holding such locks because a sync log
    force explicitly waits on the force completion, which occurs after
    log I/O completion processing.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

10 May, 2018

2 commits

  • It's just a connector between a transaction and a log item. There's
    a 1:1 relationship between a log item descriptor and a log item,
    and a 1:1 relationship between a log item descriptor and a
    transaction. Both relationships are created and terminated at the
    same time, so why do we even have the descriptor?

    Replace it with a specific list_head in the log item and a new
    log item dirtied flag to replace the XFS_LID_DIRTY flag.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY]
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Because currently we have no idea what the transaction context we
    are operating in is, and I need to know that information to track
    down bugs in multiple log item joins to transactions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

10 Apr, 2018

1 commit


12 Mar, 2018

1 commit

  • When using large directory blocks, we regularly see memory
    allocations of >64k being made for the shadow log vector buffer.
    When we are under memory pressure, kmalloc() may not be able to find
    contiguous memory chunks large enough to satisfy these allocations
    easily, and if memory is fragmented we can potentially stall here.

    To avoid this problem, switch the log vector buffer allocation to
    use kmem_alloc_large(). This will allow failed allocations to fall
    back to vmalloc and so remove the dependency on large contiguous
    regions of memory being available. This should prevent slowdowns
    and potential stalls when memory is low and/or fragmented.

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
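
The fallback behaviour described above can be modelled in user space roughly like this. Everything here is a stand-in: `contig_alloc()` plays the role of a contiguous allocator that gives up on large requests, `mapped_alloc()` plays the role of the vmalloc fallback, and `CONTIG_MAX` is an assumed limit chosen only for the demonstration. It is not the kernel helper's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define CONTIG_MAX (64 * 1024)	/* assumed limit for the contiguous allocator */

/* Stand-in for kmalloc(): refuses requests it cannot satisfy contiguously. */
static void *contig_alloc(size_t size)
{
	return size > CONTIG_MAX ? NULL : malloc(size);
}

/* Stand-in for vmalloc(): does not need physically contiguous memory. */
static void *mapped_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous allocator first and fall back when it fails. */
static void *alloc_large(size_t size, bool *used_fallback)
{
	void *p = contig_alloc(size);

	assert(used_fallback != NULL);
	*used_fallback = (p == NULL);
	return p ? p : mapped_alloc(size);
}
```

Small requests keep the fast path, while requests above the assumed limit succeed through the fallback instead of stalling.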
     

05 Aug, 2017

1 commit

  • The bio describing discard operation is allocated by
    __blkdev_issue_discard() which returns us a reference to it. That
    reference is never released and thus we leak this bio. Drop the bio
    reference once it completes in xlog_discard_endio().

    CC: stable@vger.kernel.org
    Fixes: 4560e78f40cb55bd2ea8f1ef4001c5baa88531c7
    Signed-off-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Jan Kara
     

19 Jun, 2017

4 commits

  • The t_lsn is not used anymore and the t_commit_lsn is used as a tmp
    storage for the checkpoint sequence number only in the current code.

    And the start/commit lsn are tracked as a transaction group tag in
    the xfs_cil_ctx instead of a single transaction, so remove them from
    the xfs_trans structure and their users to match with the design.

    Signed-off-by: Shan Hai
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Shan Hai
     
  • If a transaction log reservation overrun occurs, the ticket data
    associated with the reservation is dumped in xfs_log_commit_cil().
    This occurs long after the transaction items and details have been
    removed from the transaction and effectively lost. This limited set
    of ticket data provides very little information to support debugging
    transaction overruns based on the typical report.

    To improve transaction log reservation overrun reporting, create a
    helper to dump transaction details such as log items, log vector
    data, etc., as well as the underlying ticket data for the
    transaction. Move the overrun detection from xfs_log_commit_cil() to
    xlog_cil_insert_items() so it occurs prior to migration of the
    logged items to the CIL. Call the new helper such that it is able to
    dump this transaction data before it is lost.

    Also, warn on overrun to provide callstack context for the offending
    transaction and include a few additional messages from
    xlog_cil_insert_items() to display the reservation consumed locally
    for overhead such as log vector headers, split region headers and
    the context ticket. This provides a complete general breakdown of
    the reservation consumption of a transaction when/if it happens to
    overrun the reservation.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • Transaction reservation overrun detection currently occurs too late
    to print useful information about the offending transaction.
    Ideally, the transaction data is printed before the associated log
    items are moved from the transaction to the CIL, which occurs in
    xlog_cil_insert_items(), such that details of the items logged by
    the transaction are available for analysis.

    Refactor xlog_cil_insert_items() to facilitate moving tx overrun
    detection to this function. Update the function to track each bit of
    extra log reservation stolen from the transaction (i.e., such as for
    the CIL context ticket) and perform the log item migration as the
    last operation before the CIL lock is released. This creates a
    context where the transaction reservation consumption has been fully
    calculated when the log items are moved to the CIL. This patch makes
    no functional changes.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • xlog_print_tic_res() pre-dates delayed logging and the committed
    items list (CIL) and thus retains some factoring warts, such as hard
    coded function names in the output and the fact that it induces a
    shutdown.

    In preparation for more detailed logging of regular transaction
    overrun situations, refactor xlog_print_tic_res() to be slightly
    more generic. Reword some of the warning messages and pull the
    shutdown into the callers.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

10 Feb, 2017

1 commit


22 Jul, 2016

1 commit

  • One of the problems we currently have with delayed logging is that
    under serious memory pressure we can deadlock memory reclaim. This
    occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
    inodes and issues a log force to unpin inodes that are dirty in the
    CIL.

    The CIL is pushed, but this will only occur once it gets the CIL
    context lock to ensure that all committing transactions are complete
    and no new transactions start being committed to the CIL while the
    push switches to a new context.

    The deadlock occurs when the CIL context lock is held by a
    committing process that is doing memory allocation for log vector
    buffers, and that allocation is then blocked on memory reclaim
    making progress. Memory reclaim, however, is blocked waiting for
    a log force to make progress, and so we effectively deadlock at this
    point.

    To solve this problem, we have to move the CIL log vector buffer
    allocation outside of the context lock so that memory reclaim can
    always make progress when it needs to force the log. The problem
    with doing this is that a CIL push can take place while we are
    determining if we need to allocate a new log vector buffer for
    an item and hence the current log vector may go away without
    warning. That means we cannot rely on the existing log vector being
    present when we finally grab the context lock and so we must have a
    replacement buffer ready to go at all times.

    To ensure this, introduce a "shadow log vector" buffer that is
    always guaranteed to be present when we gain the CIL context lock
    and format the item. This shadow buffer may or may not be used
    during the formatting, but if the log item does not have an existing
    log vector buffer or that buffer is too small for the new
    modifications, we swap it for the new shadow buffer and format
    the modifications into that new log vector buffer.

    The result of this is that for any object we modify more than once
    in a given CIL checkpoint, we double the memory required
    to track dirty regions in the log. For single modifications,
    we consume the shadow log vector we allocate on commit, and that gets
    consumed by the checkpoint. However, if we make multiple
    modifications, then the second transaction commit will allocate a
    shadow log vector and hence we will end up with double the memory
    usage as only one of the log vectors is consumed by the CIL
    checkpoint. The remaining shadow vector will be freed when the log
    item is freed.

    This can probably be optimised in future - access to the shadow log
    vector is serialised by the object lock (as opposed to the active
    log vector, which is controlled by the CIL context lock) and so we
    can probably free the shadow log vector from some objects when the
    log item is marked clean on removal from the AIL.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
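
The two-step scheme can be sketched in user space as below. All names (`struct lv_buf`, `struct log_item`, `item_alloc_shadow()`, `item_format()`) are hypothetical, locking is indicated only in comments, and freeing the old active buffer inline is a simplification of this model.

```c
#include <assert.h>
#include <stdlib.h>

struct lv_buf {
	size_t size;		/* capacity of this log vector buffer */
};

struct log_item {
	struct lv_buf *lv;	/* active buffer, guarded by the context lock */
	struct lv_buf *shadow;	/* spare buffer, guarded by the object lock */
};

/* Step 1, outside the context lock: make sure a big enough shadow exists. */
static int item_alloc_shadow(struct log_item *lip, size_t need)
{
	if (lip->shadow && lip->shadow->size >= need)
		return 0;
	free(lip->shadow);
	lip->shadow = malloc(sizeof(*lip->shadow));
	if (!lip->shadow)
		return -1;
	lip->shadow->size = need;
	return 0;
}

/* Step 2, under the context lock: never allocates, so reclaim cannot block it. */
static struct lv_buf *item_format(struct log_item *lip, size_t need)
{
	assert(lip->shadow && lip->shadow->size >= need);
	if (lip->lv && lip->lv->size >= need)
		return lip->lv;		/* reuse; the shadow stays allocated */
	free(lip->lv);			/* simplification of this model */
	lip->lv = lip->shadow;		/* swap the shadow in */
	lip->shadow = NULL;
	return lip->lv;
}
```

Running the two steps twice on the same item shows the memory cost noted in the message: the second commit allocates a shadow that is not consumed, so the item holds two buffers until it is freed.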
     

06 Apr, 2016

1 commit


29 Jul, 2015

1 commit

  • We have seen somewhat rare reports of the following assert from
    xlog_cil_push_background() failing during ltp tests or somewhat
    innocuous desktop root fs workloads (e.g., virt operations, initramfs
    construction):

    ASSERT(!list_empty(&cil->xc_cil));

    The reasoning behind the assert is that the transaction has inserted
    items to the CIL and hit the background push codepath all with
    cil->xc_ctx_lock held for reading. This locks out background commit from
    emptying the CIL, which acquires the lock for writing. Therefore, the
    reasoning is that the items previously inserted in the CIL should still
    be present.

    The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
    list, however, due to how CIL insertion is handled.
    xlog_cil_insert_items() inserts and reorders the dirty transaction items
    to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
    achieve insertion and reordering in the same block of code. This
    function removes and reinserts an item to the tail of the list. If a
    transaction commits an item that was already logged and thus already
    resides in the CIL, and said item is the sole item on the list, the
    removal and reinsertion creates a temporary state where the list is
    actually empty.

    This state is not valid and thus should never be observed by concurrent
    transaction commit-side checks in the circumstances outlined above. We
    do not want to acquire the xc_cil_lock in all of these instances as it
    was previously removed and replaced with a separate push lock for
    performance reasons. Therefore, close any races with list_empty() on the
    insertion side by ensuring that the list is never in a transient empty
    state.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
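
The transient state is easy to reproduce with a tiny circular list. The helpers below are user-space re-implementations written for this sketch (they mirror the usual list primitives but are not the kernel headers), and `move_tail_guarded()` shows one way to close the window, namely skipping the move when the item is already last. Treat that guard as an illustration of the approach described above, not as the exact patch.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static bool list_is_last(const struct list_head *e, const struct list_head *h)
{
	return e->next == h;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/*
 * Remove and reinsert, as a plain move-to-tail does. Returns true if the
 * list was empty in between, which is what a concurrent unlocked
 * list_empty() check could observe.
 */
static bool move_tail_unguarded(struct list_head *e, struct list_head *h)
{
	bool saw_empty;

	list_del(e);
	saw_empty = list_empty(h);
	list_add_tail(e, h);
	return saw_empty;
}

/* For an item already on the list: a sole item is always last, so skip it. */
static bool move_tail_guarded(struct list_head *e, struct list_head *h)
{
	if (list_is_last(e, h))
		return false;
	return move_tail_unguarded(e, h);
}
```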
     

04 Jun, 2015

3 commits

  • Instead of the confusing flags argument pass a boolean flag to indicate if
    we want to release or regrant a log reservation.

    Also ensure that xfs_log_done always drops the reference on the log ticket,
    to both simplify the code and make the logic in xfs_trans_roll easier
    to understand.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • The flags argument to xfs_trans_commit is not useful for most callers, as
    a commit of a transaction without a permanent log reservation must pass
    0 here, and all callers for a transaction with a permanent log reservation
    except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove
    the flags argument from the public xfs_trans_commit interfaces, and
    introduce a low-level __xfs_trans_commit variant just for xfs_trans_roll
    that regrants a log reservation instead of releasing it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
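
The interface split can be pictured with a toy model. `struct trans`, its two fields and all three functions are hypothetical stand-ins that track nothing but the reservation; the point is only that the public commit takes no flags, while the low-level variant carries a boolean that the roll path sets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified transaction: reservation state only. */
struct trans {
	long unit_res;	/* reservation needed for one roll of the transaction */
	long log_res;	/* reservation currently held */
};

/* Low-level variant: @regrant selects regranting over releasing. */
static void trans_commit_lowlevel(struct trans *tp, bool regrant)
{
	assert(tp->unit_res >= 0);
	tp->log_res = regrant ? tp->unit_res : 0;
}

/* Public interface: no flags argument, always releases the reservation. */
static void trans_commit(struct trans *tp)
{
	trans_commit_lowlevel(tp, false);
}

/* Rolling keeps the transaction chain going, so it regrants instead. */
static void trans_roll(struct trans *tp)
{
	trans_commit_lowlevel(tp, true);
}
```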
     
  • The flags value was always 0 or XFS_TRANS_ABORT. Switch to a bool
    parameter to allow further cleanups.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

28 Nov, 2014

2 commits