02 Sep, 2017

4 commits

  • Ordered buffers are used in situations where the buffer is not
    physically logged but must pass through the transaction/logging
    pipeline for a particular transaction. As a result, ordered buffers
    are not unpinned and written back until the transaction commits to
    the log. Ordered buffers have a strict requirement that the target
    buffer must not be currently dirty and resident in the log pipeline
    at the time it is marked ordered. If a dirty+ordered buffer is
    committed, the buffer is reinserted to the AIL but not physically
    relogged at the LSN of the associated checkpoint. The buffer log
    item is assigned the LSN of the latest checkpoint and the AIL
    effectively releases the previously logged buffer content from the
    active log before the buffer has been written back. If the tail
    pushes forward and a filesystem crash occurs while in this state, an
    inconsistent filesystem could result.

    It is currently the caller responsibility to ensure an ordered
    buffer is not already dirty from a previous modification. This is
    unclear and error prone when not used in situations where it is
    guaranteed a buffer has not been previously modified (such as new
    metadata allocations).

    To facilitate general purpose use of ordered buffers, update
    xfs_trans_ordered_buf() to conditionally order the buffer based on
    state of the log item and return the status of the result. If the
    bli is dirty, do not order the buffer and return false. The caller
    must either physically log the buffer (having acquired the
    appropriate log reservation) or push it from the AIL to clean it
    before it can be marked ordered in the current transaction.

    Note that ordered buffers are currently only used in two situations:
    1.) inode chunk allocation where previously logged buffers are not
    possible and 2.) extent swap which will be updated to handle ordered
    buffer failures in a separate patch.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • Ordered buffers are attached to transactions and pushed through the
    logging infrastructure just like normal buffers with the exception
    that they are not actually written to the log. Therefore, we don't
    need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
    called on ordered buffers to set up all of the dirty state on the
    transaction, buffer and log item and prepare the buffer for I/O.

    Now that xfs_trans_dirty_buf() is available, call it from
    xfs_trans_ordered_buf() so the latter is now mutually exclusive with
    xfs_trans_log_buf(). This reflects the implementation of ordered
    buffers and helps eliminate confusion over the need to log ranges of
    ordered buffers just to set up internal log state.

    Signed-off-by: Brian Foster
    Reviewed-by: Allison Henderson
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • xfs_trans_log_buf() is responsible for logging the dirty segments of
    a buffer along with setting all of the necessary state on the
    transaction, buffer, bli, etc., to ensure that the associated items
    are marked as dirty and prepared for I/O. We have a couple use cases
    that need to to dirty a buffer in a transaction without actually
    logging dirty ranges of the buffer. One existing use case is
    ordered buffers, which are currently logged with arbitrary ranges to
    accomplish this even though the content of ordered buffers is never
    written to the log. Another pending use case is to relog an already
    dirty buffer across rolled transactions within the deferred
    operations infrastructure. This is required to prevent a held
    (XFS_BLI_HOLD) buffer from pinning the tail of the log.

    Refactor xfs_trans_log_buf() into a new function that contains all
    of the logic responsible to dirty the transaction, lidp, buffer and
    bli. This new function can be used in the future for the use cases
    outlined above. This patch does not introduce functional changes.

    Signed-off-by: Brian Foster
    Reviewed-by: Allison Henderson
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • It checks a single flag and has one caller. It probably isn't worth
    its own function.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

19 Jun, 2017

1 commit

  • If a filesystem shutdown occurs with a buffer log item in the CIL
    and a log force occurs, the ->iop_unpin() handler is generally
    expected to tear down the bli properly. This entails freeing the bli
    memory and releasing the associated hold on the buffer so it can be
    released and the filesystem unmounted.

    If this sequence occurs while ->bli_refcount is elevated (i.e.,
    another transaction is open and attempting to modify the buffer),
    however, ->iop_unpin() may not be responsible for releasing the bli.
    Instead, the transaction may release the final ->bli_refcount
    reference and thus xfs_trans_brelse() is responsible for tearing
    down the bli.

    While xfs_trans_brelse() does drop the reference count, it only
    attempts to release the bli if it is clean (i.e., not in the
    CIL/AIL). If the filesystem is shutdown and the bli is sitting dirty
    in the CIL as noted above, this ends up skipping the last
    opportunity to release the bli. In turn, this leaves the hold on the
    buffer and causes an unmount hang. This can be reproduced by running
    generic/388 in repetition.

    Update xfs_trans_brelse() to handle this shutdown corner case
    correctly. If the final bli reference is dropped and the filesystem
    is shutdown, remove the bli from the AIL (if necessary) and release
    the bli to drop the buffer hold and ensure an unmount does not hang.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

10 Feb, 2016

2 commits


10 Feb, 2015

1 commit

  • The commit 2d3d0c5 ("xfs: lobotomise xfs_trans_read_buf_map()") left
    a landmine in the tracing code: trace_xfs_trans_buf_read() is now
    call on all buffers that are read through this interface rather than
    just buffers in transactions. For buffers outside transaction
    context, bp->b_fspriv is null, and so the buf log item tracing
    functions cannot be called. This causes a NULL pointer dereference
    in the trace_xfs_trans_buf_read() function when tracing is turned
    on.

    cc:
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

04 Dec, 2014

2 commits

  • Conflicts:
    fs/xfs/xfs_iops.c

    Dave Chinner
     
  • There's a case in that code where it checks for a buffer match in a
    transaction where the buffer is not marked done. i.e. trying to
    catch a buffer we have locked in the transaction but have not
    completed IO on.

    The only way we can find a buffer that has not had IO completed on
    it is if it had readahead issued on it, but we never do readahead on
    buffers that we have already joined into a transaction. Hence this
    condition cannot occur, and buffers locked and joined into a
    transaction should always be marked done and not under IO.

    Remove this code and re-order xfs_trans_read_buf_map() to remove
    duplicated IO dispatch and error handling code.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

28 Nov, 2014

2 commits


02 Oct, 2014

2 commits

  • There is a lot of cookie-cutter code that looks like:

    if (shutdown)
    handle buffer error
    xfs_buf_iorequest(bp)
    error = xfs_buf_iowait(bp)
    if (error)
    handle buffer error

    spread through XFS. There's significant complexity now in
    xfs_buf_iorequest() to specifically handle this sort of synchronous
    IO pattern, but there's all sorts of nasty surprises in different
    error handling code dependent on who owns the buffer references and
    the locks.

    Pull this pattern into a single helper, where we can hide all the
    synchronous IO warts and hence make the error handling for all the
    callers much saner. This removes the need for a special extra
    reference to protect IO completion processing, as we can now hold a
    single reference across dispatch and waiting, simplifying the sync
    IO smeantics and error handling.

    In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
    make it explicitly handle on asynchronous IO. This forces all users
    to be switched specifically to one interface or the other and
    removes any ambiguity between how the interfaces are to be used. It
    also means that xfs_buf_iowait() goes away.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • There is only one caller now - xfs_trans_read_buf_map() - and it has
    very well defined call semantics - read, synchronous, and b_iodone
    is NULL. Hence it's pretty clear what error handling is necessary
    for this case. The bigger problem of untangling
    xfs_trans_read_buf_map error handling is left to a future patch.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

25 Jun, 2014

1 commit

  • Convert all the errors the core XFs code to negative error signs
    like the rest of the kernel and remove all the sign conversion we
    do in the interface layers.

    Errors for conversion (and comparison) found via searches like:

    $ git grep " E" fs/xfs
    $ git grep "return E" fs/xfs
    $ git grep " E[A-Z].*;$" fs/xfs

    Negation points found via searches like:

    $ git grep "= -[a-z,A-Z]" fs/xfs
    $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
    $ git grep " -[a-z].*;" fs/xfs

    [ with some bits I missed from Brian Foster ]

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

22 Jun, 2014

2 commits

  • XFS_ERROR was designed long ago to trap return values, but it's not
    runtime configurable, it's not consistently used, and we can do
    similar error trapping with ftrace scripts and triggers from
    userspace.

    Just nuke XFS_ERROR and associated bits.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     
  • return is not a function. "return(EIO);" is silly;
    "return (EIO);" moreso. return is not a function.
    Nuke the pointless parens.

    [dchinner: catch a couple of extra cases in xfs_attr_list.c,
    xfs_acl.c and xfs_linux.h.]

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

07 Mar, 2014

1 commit

  • While the verifier routines may return EFSBADCRC when a buffer has
    a bad CRC, we need to translate that to EFSCORRUPTED so that the
    higher layers treat the error appropriately and we return a
    consistent error to userspace. This fixes a xfs/005 regression.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

17 Dec, 2013

1 commit

  • The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
    handles the case of a shut down filesystem. Most of the users have private,
    uncached buffers that can just be freed in this case, but the complex error
    handling in xfs_bioerror_relse messes up the case when it's called without
    a locked buffer.

    Remove xfsbdstrat and opencode the error handling in the callers. All but
    one can simply return an error and don't need to deal with buffer state,
    and the one caller that cares about the buffer state could do with a major
    cleanup as well, but we'll defer that to later.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

24 Oct, 2013

3 commits

  • Currently the xfs_inode.h header has a dependency on the definition
    of the BMAP btree records as the inode fork includes an array of
    xfs_bmbt_rec_host_t objects in it's definition.

    Move all the btree format definitions from xfs_btree.h,
    xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
    xfs_format.h to continue the process of centralising the on-disk
    format definitions. With this done, the xfs inode definitions are no
    longer dependent on btree header files.

    The enables a massive culling of unnecessary includes, with close to
    200 #include directives removed from the XFS kernel code base.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved to the
    indicate the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • All of the buffer operations structures are needed to be exported
    for xfs_db, so move them all to a common location rather than
    spreading them all over the place. They are verifying the on-disk
    format, so while xfs_format.h might be a good place, it is not part
    of the on disk format.

    Hence we need to create a new header file that we centralise these
    related definitions. Start by moving the bffer operations
    structures, and then also move all the other definitions that have
    crept into xfs_log_format.h and xfs_format.h as there was no other
    shared header file to put them in.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

31 Aug, 2013

1 commit

  • In optimising the CIL operations, some of the IOP_* macros for
    calling log item operations were removed. Remove the rest of them as
    Christoph requested.

    Signed-off-by: Dave Chinner
    Reviewed-by: Geoffrey Wehrman
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

28 Jun, 2013

1 commit

  • If we have a buffer that we have modified but we do not wish to
    physically log in a transaction (e.g. we've logged a logical
    change), we still need to ensure that transactional integrity is
    maintained. Hence we must not move the tail of the log past the
    transaction that the buffer is associated with before the buffer is
    written to disk.

    This means these special buffers still need to be included in the
    transaction and added to the AIL just like a normal buffer, but we
    do not want the modifications to the buffer written into the
    transaction. IOWs, what we want is an "ordered buffer" that
    maintains the same transactional life cycle as a physically logged
    buffer, just without the transcribing of the modifications to the
    log.

    Hence we need to flag the buffer as an "ordered buffer" to avoid
    including it in vector size calculations or formatting during the
    transaction. Once the transaction is committed, the buffer appears
    for all intents to be the same as a physically logged buffer as it
    transitions through the log and AIL.

    Relogging will also work just fine for such an ordered buffer - the
    logical transaction will be replayed before the subsequent
    modifications that relog the buffer, so everything will be
    reconstructed correctly by recovery.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

28 Apr, 2013

2 commits

  • The buffer type passed to log recvoery in the buffer log item
    overruns the blf_flags field. I had assumed that flags field was a
    32 bit value, and it turns out it is a unisgned short. Therefore
    having 19 flags doesn't really work.

    Convert the buffer type field to numeric value, and use the top 5
    bits of the flags field for it. We currently have 17 types of
    buffers, so using 5 bits gives us plenty of room for expansion in
    future....

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add buffer types to the buffer log items so that log recovery can
    validate the buffers and calculate CRCs correctly after the buffers
    are recovered.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     

22 Apr, 2013

2 commits

  • Add a new inode version with a larger core. The primary objective is
    to allow for a crc of the inode, and location information (uuid and ino)
    to verify it was written in the right place. We also extend it by:

    a creation time (for Samba);
    a changecount (for NFSv4);
    a flush sequence (in LSN format for recovery);
    an additional inode flags field; and
    some additional padding.

    These additional fields are not implemented yet, but already laid
    out in the structure.

    [dchinner@redhat.com] Added LSN and flags field, some factoring and rework to
    capture all the necessary information in the crc calculation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Add support for larger btree blocks that contains a CRC32C checksum,
    a filesystem uuid and block number for detecting filesystem
    consistency and out of place writes.

    [dchinner@redhat.com] Also include an owner field to allow reverse
    mappings to be implemented for improved repairability and a LSN
    field to so that log recovery can easily determine the last
    modification that made it to disk for each buffer.

    [dchinner@redhat.com] Add buffer log format flags to indicate the
    type of buffer to recovery so that we don't have to do blind magic
    number tests to determine what the buffer is.

    [dchinner@redhat.com] Modified to fit into the verifier structure.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

17 Jan, 2013

2 commits


16 Nov, 2012

2 commits

  • To separate the verifiers from iodone functions and associate read
    and write verifiers at the same time, introduce a buffer verifier
    operations structure to the xfs_buf.

    This avoids the need for assigning the write verifier, clearing the
    iodone function and re-running ioend processing in the read
    verifier, and gets rid of the nasty "b_pre_io" name for the write
    verifier function pointer. If we ever need to, it will also be
    easier to add further content specific callbacks to a buffer with an
    ops structure in place.

    We also avoid needing to export verifier functions, instead we
    can simply export the ops structures for those that are needed
    outside the function they are defined in.

    This patch also fixes a directory block readahead verifier issue
    it exposed.

    This patch also adds ops callbacks to the inode/alloc btree blocks
    initialised by growfs. These will need more work before they will
    work with CRCs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add a verifier function callback capability to the buffer read
    interfaces. This will be used by the callers to supply a function
    that verifies the contents of the buffer when it is read from disk.
    This patch does not provide callback functions, but simply modifies
    the interfaces to allow them to be called.

    The reason for adding this to the read interfaces is that it is very
    difficult to tell fom the outside is a buffer was just read from
    disk or whether we just pulled it out of cache. Supplying a callbck
    allows the buffer cache to use it's internal knowledge of the buffer
    to execute it only when the buffer is read from disk.

    It is intended that the verifier functions will mark the buffer with
    an EFSCORRUPTED error when verification fails. This allows the
    reading context to distinguish a verification error from an IO
    error, and potentially take further actions on the buffer (e.g.
    attempt repair) based on the error reported.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     

02 Jul, 2012

1 commit

  • Now that the buffer cache supports discontiguous buffers, add
    support to the transaction buffer interface for getting and reading
    buffers.

    Note that this patch does not convert the buffer item logging to
    support discontiguous buffers. That will be done as a separate
    commit.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

15 May, 2012

7 commits

  • Rather than specifying XBF_MAPPED for almost all buffers, introduce
    XBF_UNMAPPED for the couple of users that use unmapped buffers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The only thing left in xfs_rw.h is a function prototype for an inode
    function. Move that to xfs_inode.h, and kill xfs_rw.h.

    Also move the function implementing the prototype from xfs_rw.c to
    xfs_inode.c so we only have one function left in xfs_rw.c

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
    This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
    avoid recursion through memory reclaim back into the filesystem.

    All the blocking get calls in growfs occur inside a transaction, even though
    they are no part of the transaction, so all allocation will be GFP_NOFS due to
    the task flag PF_TRANS being set. The blocking read calls occur during log
    recovery, so they will probably be unaffected by converting to GFP_NOFS
    allocations.

    Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
    outside a transaction context. The error handling is slightly different in that
    xfs_read_buf stales the errored buffer it gets back, but there is probably good
    reason for xfs_trans_read_buf() for doing this.

    Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
    and convert all the callers of xfs_read_buf() to use the former function. We can
    then remove xfs_read_buf().

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Buffers are always returned locked from the lookup routines. Hence
    we don't need to tell the lookup routines to return locked buffers,
    on to try and lock them. Remove XBF_LOCK from all the callers and
    from internal buffer cache usage.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner