07 Oct, 2020

1 commit

  • Separate the computation of the log push threshold and the push logic in
    xlog_grant_push_ail. This enables higher level code to determine (for
    example) that it is holding on to a logged intent item and the log is so
    busy that it is more than 75% full. In that case, it would be desirable
    to move the log item towards the head to release the tail, which we will
    cover in the next patch.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
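
    The split described in this commit can be sketched in userspace C as
    follows. This is a simplified model, not the kernel code: struct
    model_log and every model_* function are hypothetical names invented
    for illustration, and the only threshold modelled is the "more than
    75% full" condition mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a log; not the kernel's struct xlog. */
struct model_log {
	uint32_t size_blocks;    /* total log size in blocks */
	uint32_t free_blocks;    /* blocks currently free */
	bool     push_requested; /* set once a tail push has been asked for */
};

/*
 * Pure computation: is the log more than 75% full, i.e. is less than a
 * quarter of it free? There are no side effects, so any caller can ask
 * the question without triggering a push.
 */
static bool model_log_needs_push(const struct model_log *log)
{
	return (uint64_t)log->free_blocks * 4 < log->size_blocks;
}

/* Push logic: acts on the answer computed above. */
static void model_log_grant_push(struct model_log *log)
{
	if (model_log_needs_push(log))
		log->push_requested = true;
}

/* A higher-level caller holding an intent item reuses the same check. */
static bool model_should_relog_intent(const struct model_log *log)
{
	return model_log_needs_push(log);
}
```

    Keeping the threshold check free of side effects is what lets a
    caller other than the push path consult it.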
     

24 Sep, 2020

1 commit

  • Let's use DIV_ROUND_UP() to calculate log record header
    blocks, as is done in xlog_get_iclog_buffer_size(), and
    wrap it up in a common helper for log recovery.

    Reviewed-by: Brian Foster
    Signed-off-by: Gao Xiang
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Gao Xiang
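
    A minimal userspace sketch of such a helper is shown below. The
    function name model_logrec_hblks, its parameters, and the 32 KiB
    value of MODEL_HEADER_CYCLE_SIZE are assumptions made for
    illustration; only the use of DIV_ROUND_UP() comes from the commit
    message.

```c
#include <stdint.h>

/* Round-up integer division, equivalent to the kernel macro of this name. */
#ifndef DIV_ROUND_UP
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#endif

/* Assumed value: one header block covers 32 KiB of log record data. */
#define MODEL_HEADER_CYCLE_SIZE	(32u * 1024u)

/*
 * Number of header blocks for a log record of h_size bytes. Only
 * version 2 logs with records larger than one cycle's worth of data
 * need more than a single header block in this model.
 */
static uint32_t model_logrec_hblks(uint32_t h_size, int is_v2_log)
{
	if (is_v2_log && h_size > MODEL_HEADER_CYCLE_SIZE)
		return DIV_ROUND_UP(h_size, MODEL_HEADER_CYCLE_SIZE);
	return 1;
}
```

    Rounding up replaces the divide, test the remainder, then increment
    pattern with a single expression.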
     

29 Jul, 2020

1 commit

  • xlog_ticket_alloc() is always called under NOFS context, except from
    the unmount path, which either way is holding many FS locks, so there is
    no need for its callers to keep passing allocation flags into it.

    Change xlog_ticket_alloc() to use default kmem_cache_zalloc(), remove
    its alloc_flags argument, and always use GFP_NOFS | __GFP_NOFAIL flags.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner

    Carlos Maiolino
     

27 Mar, 2020

10 commits

  • In commit f467cad95f5e3, I added the ability to force a recalculation of
    the filesystem summary counters if they seemed incorrect. This was done
    (not entirely correctly) by tweaking the log code to write an unmount
    record without the UMOUNT_TRANS flag set. At next mount, the log
    recovery code will fail to find the unmount record and go into recovery,
    which triggers the recalculation.

    What actually gets written to the log is what ought to be an unmount
    record, but without any flags set to indicate what kind of record it
    actually is. This worked to trigger the recalculation, but we shouldn't
    write bogus log records when we could simply write nothing.

    Fixes: f467cad95f5e3 ("xfs: force summary counter recalc at next mount")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster

    Darrick J. Wong
     
  • While running metadata intensive workloads, I've been seeing AIL
    pushing get stuck on pinned buffers and trigger log forces.
    The log force is taking a long time to run because the log IO is
    getting throttled by wbt_wait() - the block layer writeback
    throttle. It's being throttled because there is a huge amount of
    metadata writeback going on which is filling the request queue.

    IOWs, we have a priority inversion problem here.

    Mark the log IO bios with REQ_IDLE so they don't get throttled
    by the block layer writeback throttle. When we are forcing the CIL,
    we are likely to need to do tens of log IOs, and they are issued as
    fast as they can be built and IO completed. Hence REQ_IDLE is
    appropriate - it's an indication that more IO will follow shortly.

    And because we also set REQ_SYNC, the writeback throttle will now
    treat log IO the same way it treats direct IO writes - it will not
    throttle them at all. Hence we solve the priority inversion problem
    caused by the writeback throttle being unable to distinguish between
    high priority log IO and background metadata writeback.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Allison Collins
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
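
    The throttling decision this message relies on can be modelled in
    userspace C as below. The MODEL_REQ_* bit values are placeholders and
    model_wbt_should_throttle is a hypothetical name; the sketch only
    encodes the rule stated above, that a write carrying both the sync
    and idle hints is treated like a direct IO write and is not
    throttled.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration; the kernel defines its own. */
#define MODEL_REQ_SYNC	(1u << 0)
#define MODEL_REQ_IDLE	(1u << 1)
#define MODEL_REQ_META	(1u << 2)

/*
 * Model of the writeback throttle's choice for a write request: only a
 * write that carries BOTH the sync and idle hints is exempt. A write
 * with just one of them is still subject to throttling.
 */
static bool model_wbt_should_throttle(unsigned int write_flags)
{
	const unsigned int exempt = MODEL_REQ_SYNC | MODEL_REQ_IDLE;

	return (write_flags & exempt) != exempt;
}
```

    In this model, log IO marked sync and metadata but not idle is
    throttled alongside background metadata writeback, which is the
    priority inversion; adding the idle hint moves it into the exempt
    class.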
     
  • Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Separate out the unmount record writing from the rest of the
    ticket and log state futzing necessary to make it work. This is
    a no-op, just makes the code cleaner and places the unmount record
    formatting and writing alongside the commit record formatting and
    writing code.

    We can also get rid of the ticket flag clearing before the
    xlog_write() call because it no longer cares about the state of
    XLOG_TIC_INITED.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • xlog_write_done() is just a thin wrapper around xlog_commit_record(), so
    they can be merged together easily.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Remove xlog_ticket_done and just call the renamed low-level helpers for
    ungranting or regranting log space directly. To make that a little
    easier, the reference put on the ticket and all tracing are moved into
    the actual helpers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • It is no longer used or checked by anything, so remove the last
    traces from the log ticket code.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • xfs_log_done() does two separate things. Firstly, it triggers commit
    records to be written for permanent transactions, and secondly it
    releases or regrants transaction reservation space.

    Since delayed logging was introduced, transactions no longer write
    directly to the log, hence they never have the XLOG_TIC_INITED flag
    cleared on them. Hence transactions never write commit records to
    the log and only need to modify reservation space.

    Split up xfs_log_done into two parts, and only call the parts of the
    operation needed for the context xfs_log_done() is currently being
    called from.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Commit and unmount records do not need start records to be
    written, so rearrange the logic in xlog_write() to remove the need
    to check for XLOG_TIC_INITED to determine if we should account for
    the space used by a start record.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • The xlog_write() function iterates over iclogs until it completes
    writing all the log vectors passed in. The ticket tracks whether
    a start record has been written or not, so only the first iclog gets
    a start record. We only ever pass single use tickets to
    xlog_write() so we only ever need to write a start record once per
    xlog_write() call.

    Hence we don't need to store whether we should write a start record
    in the ticket as the callers provide all the information we need to
    determine if a start record should be written. For the moment, we
    have to ensure that we clear the XLOG_TIC_INITED appropriately so
    the code in xfs_log_done() still works correctly for committing
    transactions.

    (darrick: Note the slight behavior change that we always deduct the
    size of the op header from the ticket, even for unmount records)

    Signed-off-by: Dave Chinner
    [hch: pass an explicit need_start_rec argument]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

26 Mar, 2020

1 commit

  • If the bio_add_page() call fails, we proceed to write out a
    partially constructed log buffer. This corrupts the physical log
    such that log recovery is not possible. Worse, persistent
    occurrences of this error eventually lead to a BUG_ON() failure in
    bio_split() as iclogs wrap the end of the physical log, which
    triggers log recovery on subsequent mount.

    Rather than warn about writing out a corrupted log buffer, shutdown
    the fs as is done for any log I/O related error. This preserves the
    consistency of the physical log such that log recovery succeeds on a
    subsequent mount. Note that this was observed on a 64k page debug
    kernel without upstream commit 59bb47985c1d ("mm, sl[aou]b:
    guarantee natural alignment for kmalloc(power-of-two)"), which
    demonstrated frequent iclog bio overflows due to unaligned (slab
    allocated) iclog data buffers.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

23 Mar, 2020

7 commits

  • Open code the xlog_state_want_sync logic in its two callers given that
    this function is a trivial wrapper around xlog_state_switch_iclogs.

    Move the lockdep assert into xlog_state_switch_iclogs to not lose this
    debugging aid, and improve the comment that documents
    xlog_state_switch_iclogs as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Use the shutdown flag in the log to bypass xlog_state_clean_iclog
    entirely in case of a shut down log.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Factor out a few self-contained helpers from xlog_state_clean_iclog, and
    update the documentation so it primarily documents why things happen
    instead of how.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • We can just check for a shut down log all the way down in
    xlog_cil_committed instead of passing the parameter. This means a
    slight behavior change in that we now also abort log items if the
    shutdown came in halfway into the I/O completion processing, which
    actually is the right thing to do.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • There is no need to check for the ioerror state before the lock, as
    the shutdown case is not a fast path. Also remove the call to force
    shutdown the file system, as it must have been shut down already
    for an iclog to be in the ioerror state. Also clean up the flow of
    the function a bit.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The only caller of xfs_log_release_iclog doesn't care about the return
    value, so remove it. Also don't bother passing the mount pointer,
    given that we can trivially derive it from the iclog.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Factor out the shared code to wait for a log force into a new helper.
    This helper uses the XLOG_FORCED_SHUTDOWN check previously only used
    by the unmount code over the equivalent iclog ioerror state used by
    the other two functions.

    There is a slight behavior change in that the force of the unmount
    record is now accounted in the log force statistics.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

14 Mar, 2020

3 commits

  • Move the code for verifying the iclog state on a clean unmount into a
    helper, and instead of checking the iclog state just rely on the shutdown
    check as they are equivalent. Also remove the ifdef DEBUG as the
    compiler is smart enough to eliminate the dead code for non-debug builds.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • When the log is shut down all iclogs are in the XLOG_STATE_IOERROR state,
    which means that xlog_state_want_sync and xlog_state_release_iclog are
    no-ops. Remove the whole section of code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Remove the ignored return value from xfs_log_unmount_write, and also
    remove a rather pointless assert on the return value from xfs_log_force.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

03 Mar, 2020

1 commit

  • Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
    l_icloglock held"), xlog_state_release_iclog() always performed a
    locked check of the iclog error state before proceeding into the
    sync state processing code. As of this commit, part of
    xlog_state_release_iclog() was open-coded into
    xfs_log_release_iclog() and as a result the locked error state check
    was lost.

    The lockless check still exists, but this doesn't account for the
    possibility of a race with a shutdown being performed by another
    task causing the iclog state to change while the original task waits
    on ->l_icloglock. This has reproduced very rarely via generic/475
    and manifests as an assert failure in __xlog_state_release_iclog()
    due to an unexpected iclog state.

    Restore the locked error state check in xlog_state_release_iclog()
    to ensure that an iclog state update via shutdown doesn't race with
    the iclog release state processing code.

    Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held")
    Reported-by: Zorro Lang
    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

04 Dec, 2019

1 commit

  • syzbot (via KASAN) reports a use-after-free in the error path of
    xlog_alloc_log(). Specifically, the iclog freeing loop doesn't
    handle the case of a fully initialized ->l_iclog linked list.
    Instead, it assumes that the list is partially constructed and NULL
    terminated.

    This bug manifested because there was no possible error scenario
    after iclog list setup when the original code was added. Subsequent
    code and associated error conditions were added some time later,
    while the original error handling code was never updated. Fix up the
    error loop to terminate either on a NULL iclog or reaching the end
    of the list.

    Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com
    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
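
    The corrected termination condition can be illustrated with a small
    userspace sketch. struct model_iclog and both functions are
    hypothetical stand-ins rather than the kernel's structures; the point
    is only that the freeing loop must stop on either a NULL link
    (partially built list) or on arriving back at the head (fully built
    ring).

```c
#include <stdlib.h>

/* Hypothetical stand-in for an iclog: just the link to the next one. */
struct model_iclog {
	struct model_iclog *next;
};

/*
 * Free every element reachable from head and return how many were
 * freed. Handles a NULL-terminated partial list as well as a closed
 * ring. The head is freed last so that it is never compared against
 * after it has been released.
 */
static int model_free_iclogs(struct model_iclog *head)
{
	struct model_iclog *cur, *next;
	int freed = 0;

	if (!head)
		return 0;

	for (cur = head->next; cur && cur != head; cur = next) {
		next = cur->next;
		free(cur);
		freed++;
	}
	free(head);
	return freed + 1;
}

/* Build count elements; link the tail back to the head if close_ring. */
static struct model_iclog *model_build_iclogs(int count, int close_ring)
{
	struct model_iclog *head = NULL, *tail = NULL;

	for (int i = 0; i < count; i++) {
		struct model_iclog *ic = calloc(1, sizeof(*ic));

		if (!ic)
			break;
		if (tail)
			tail->next = ic;
		else
			head = ic;
		tail = ic;
	}
	if (tail && close_ring)
		tail->next = head;
	return head;
}
```

    A loop that tests only for NULL never terminates correctly on the
    closed ring: it walks back onto the already freed head, which is the
    use-after-free pattern described in the report.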
     

19 Nov, 2019

1 commit


11 Nov, 2019

1 commit


06 Nov, 2019

1 commit


22 Oct, 2019

7 commits

  • XLOG_STATE_DO_CALLBACK is only entered through XLOG_STATE_DONE_SYNC
    and just used in a single debug check. Remove the flag and thus
    simplify the calling conventions for xlog_state_do_callback and
    xlog_state_iodone_process_iclog.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     
  • ic_state really is a set of different states, even if the values are
    encoded as non-conflicting bits and we sometimes use bitwise AND
    operations to check for them. Switch all comparisons to check for
    exact values (and use switch statements in a few places to make it
    clearer) and turn the values into an implicitly enumerated enum
    type.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
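
    The style this commit moves to can be sketched as below. The
    enumerator names and the two helpers are illustrative assumptions,
    not the kernel's definitions, and the rule that only an active iclog
    accepts new writes is a simplification made for this model.

```c
#include <stdbool.h>

/* Implicitly enumerated states: exactly one applies at any time. */
enum model_iclog_state {
	MODEL_STATE_ACTIVE,	/* may accept new writes */
	MODEL_STATE_WANT_SYNC,	/* closed, waiting to be written */
	MODEL_STATE_SYNCING,	/* IO in flight */
	MODEL_STATE_DONE_SYNC,	/* IO completed */
	MODEL_STATE_DIRTY,	/* processed, needs cleaning */
	MODEL_STATE_IOERROR,	/* log has been shut down */
};

/* Exact-value comparison instead of a bit test. */
static bool model_iclog_in_error(enum model_iclog_state state)
{
	return state == MODEL_STATE_IOERROR;
}

/* A switch makes every state's handling explicit. */
static bool model_iclog_accepts_writes(enum model_iclog_state state)
{
	switch (state) {
	case MODEL_STATE_ACTIVE:
		return true;
	case MODEL_STATE_WANT_SYNC:
	case MODEL_STATE_SYNCING:
	case MODEL_STATE_DONE_SYNC:
	case MODEL_STATE_DIRTY:
	case MODEL_STATE_IOERROR:
		return false;
	}
	return false;
}
```

    With an enum and no default case, compilers that warn about
    unhandled enumerators can flag a newly added state that a switch
    forgets to cover, which a bit mask test cannot do.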
     
  • XFSERRORDEBUG is never set and the code isn't all that useful, so remove
    it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     
  • All but one caller of xlog_state_release_iclog hold l_icloglock and need
    to drop and reacquire it to call xlog_state_release_iclog. Switch the
    xlog_state_release_iclog calling conventions to expect the lock to be
    held, and open code the logic (using a shared helper) in the only
    remaining caller that does not have the lock (and where not holding it
    is a nice performance optimization). Also move the refactored code to
    require the least amount of forward declarations.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    [darrick: minor whitespace cleanup]
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     
  • This will allow optimizing various locking cycles in the following
    patches.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     
  • ic_io_size is only used inside xlog_write_iclog, where we can just use
    the count parameter instead.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     
  • xlog_write_iclog expects a bool for the second argument. While any
    non-0 value happens to work fine this makes all calls consistent.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Christoph Hellwig
     

07 Oct, 2019

1 commit

  • Guarantee zeroed memory buffers for cases where potential memory
    leak to disk can occur. In these cases, kmem_alloc is used and
    doesn't zero the buffer, opening the possibility of information
    leakage to disk.

    Use existing infrastructure (xfs_buf_allocate_memory) to obtain
    the already zeroed buffer from kernel memory.

    This solution avoids the performance issue that would occur if a
    wholesale change to replace kmem_alloc with kmem_zalloc was done.

    Signed-off-by: Bill O'Donnell
    [darrick: fix bitwise complaint about kmflag_mask]
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Bill O'Donnell
     

06 Sep, 2019

3 commits

  • When the log fills up, we can get into the state where the
    outstanding items in the CIL being committed and aggregated are
    larger than the range that the reservation grant head tail pushing
    will attempt to clean. This can result in the tail pushing range
    being trimmed back to the log head (l_last_sync_lsn) and so
    may not actually move the push target at all.

    When the iclogs associated with the CIL commit finally land, the
    log head moves forward, and this removes the restriction on the AIL
    push target. However, if we already have transactions sleeping on
    the grant head, and there's nothing in the AIL still to flush from
    the current push target, then nothing will move the tail of the log
    and trigger a log reservation wakeup.

    Hence there is nothing that will trigger xlog_grant_push_ail()
    to recalculate the AIL push target and start pushing on the AIL
    again to write back the metadata objects that pin the tail of the
    log and hence free up space and allow the transaction reservations
    to be woken and make progress.

    Hence we need to push on the grant head when we move the log head
    forward, as this may be the only trigger we have that can move the
    AIL push target forwards in this situation.

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • xlog_state_clean_log() is only called from one place, and it occurs
    when an iclog is transitioning back to ACTIVE. Prior to calling
    xlog_state_clean_log, the iclog we are processing has a hard coded
    state check to DIRTY so that xlog_state_clean_log() processes it
    correctly. We also have a hard coded wakeup after
    xlog_state_clean_log() to ensure log force waiters on that iclog
    are woken correctly.

    Both of these things are operations required to finish processing an
    iclog and return it to the ACTIVE state again, so they make little
    sense to be separated from the rest of the clean state transition
    code.

    Hence push these things inside xlog_state_clean_log(), document the
    behaviour and rename it xlog_state_clean_iclog() to indicate that
    it's being driven by an iclog state change and does the iclog state
    change work itself.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • The iclog IO completion state processing is somewhat complex, and
    because it's inside two nested loops it is highly indented and very
    hard to read. Factor it out, flatten the logic flow and clean up the
    comments so that it is much easier to see what the code is doing both
    in processing the individual iclogs and in the overall
    xlog_state_do_callback() operation.

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner