22 Oct, 2020

1 commit

  • If processing recovered log intent items fails, we need to cancel all
    the unprocessed recovered items immediately so that a subsequent AIL
    push in the bail out path won't get wedged on the pinned intent items
    that didn't get processed.

    This can happen if the log contains (1) an intent that gets and releases
    an inode, (2) an intent that cannot be recovered successfully, and (3)
    some third intent item. When recovery of (2) fails, we leave (3) pinned
    in memory. Inode reclamation is called in the error-out path of
    xfs_mountfs before xfs_log_cancel_mount. Reclamation calls
    xfs_ail_push_all_sync, which gets stuck waiting for (3).

    Therefore, call xlog_recover_cancel_intents if _process_intents fails.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
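
The error path described above can be sketched in userspace C. This is a minimal model, not the kernel code: `struct intent`, `recover_one`, `cancel_intents`, `process_intents`, `finish_recovery`, and `count_pinned` are hypothetical names that stand in for the recovered intent items and the process/cancel steps the commit message mentions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a recovered log intent item. */
struct intent {
	bool	will_fail;	/* simulate an intent that cannot be recovered */
	bool	pinned;		/* item is still pinned in memory */
};

/* Recover one intent; a successfully recovered item is no longer pinned. */
int recover_one(struct intent *it)
{
	if (it->will_fail)
		return -1;
	it->pinned = false;
	return 0;
}

/* Stop at the first failure, leaving later items untouched (still pinned). */
int process_intents(struct intent *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int error = recover_one(&items[i]);

		if (error)
			return error;
	}
	return 0;
}

/* Release every item that is still pinned. */
void cancel_intents(struct intent *items, size_t n)
{
	for (size_t i = 0; i < n; i++)
		items[i].pinned = false;
}

/* The fix in miniature: cancel the leftovers as soon as processing fails. */
int finish_recovery(struct intent *items, size_t n)
{
	int error = process_intents(items, n);

	if (error)
		cancel_intents(items, n);
	return error;
}

/* Count items that a later "push everything and wait" would block on. */
size_t count_pinned(const struct intent *items, size_t n)
{
	size_t pinned = 0;

	for (size_t i = 0; i < n; i++)
		if (items[i].pinned)
			pinned++;
	return pinned;
}
```

With three items where the second fails, `finish_recovery` returns an error and leaves nothing pinned, so a later synchronous push in the bail-out path has nothing to wait on.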
     

07 Oct, 2020

5 commits

  • In xfs_bui_item_recover, there exists a use-after-free bug with regard
    to the inode that is involved in the bmap replay operation. If the
    mapping operation does not complete, we call xfs_bmap_unmap_extent to
    create a deferred op to finish the unmapping work, and we retain a
    pointer to the incore inode.

    Unfortunately, the very next thing we do is commit the transaction and
    drop the inode. If reclaim tears down the inode before we try to finish
    the defer ops, we dereference garbage and blow up. Therefore, create a
    way to join inodes to the defer ops freezer so that we can maintain the
    xfs_inode reference until we're done with the inode.

    Note: This imposes the requirement that there be enough memory to keep
    every incore inode in memory throughout recovery.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • When xfs_defer_capture extracts the deferred ops and transaction state
    from a transaction, it should record the transaction reservation type
    from the old transaction so that when we continue the dfops chain, we
    still use the same reservation parameters.

    Doing this means that the log item recovery functions get to determine
    the transaction reservation instead of abusing tr_itruncate in yet
    another part of xfs.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • When xfs_defer_capture extracts the deferred ops and transaction state
    from a transaction, it should record the remaining block reservations so
    that when we continue the dfops chain, we can reserve the same number of
    blocks to use. We capture the reservations for both data and realtime
    volumes.

    This adds the requirement that every log intent item recovery function
    must be careful to reserve enough blocks to handle both itself and all
    defer ops that it can queue. On the other hand, this enables us to do
    away with the handwaving block estimation nonsense that was going on in
    xlog_finish_defer_ops.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster

    Darrick J. Wong
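
The idea of recording the remaining block reservations at capture time can be sketched as below. This is an illustration under stated assumptions: `struct trans`, `struct resv_capture`, their fields, and `capture_reservations` are hypothetical simplifications, not the kernel's actual structures.

```c
/* Hypothetical, simplified view of a transaction's block reservations. */
struct trans {
	unsigned int	blk_res;	/* data device blocks reserved */
	unsigned int	blk_res_used;	/* data device blocks consumed so far */
	unsigned int	rt_res;		/* realtime reservation */
	unsigned int	rt_res_used;	/* realtime reservation consumed so far */
};

/* Hypothetical capture record kept until the dfops chain is continued. */
struct resv_capture {
	unsigned int	blkres;		/* unused data reservation */
	unsigned int	rtres;		/* unused realtime reservation */
};

/*
 * Record what is left of each reservation so that the transaction that
 * continues the chain can reserve the same amount.
 */
void capture_reservations(const struct trans *tp, struct resv_capture *cap)
{
	cap->blkres = tp->blk_res - tp->blk_res_used;
	cap->rtres = tp->rt_res - tp->rt_res_used;
}
```

Because the continuation only gets what was left over, the recovery function that made the original reservation has to size it for itself plus everything it may queue.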
     
  • When we replay unfinished intent items that have been recovered from the
    log, it's possible that the replay will cause the creation of more
    deferred work items. As outlined in commit 509955823cc9c ("xfs: log
    recovery should replay deferred ops in order"), later work items have an
    implicit ordering dependency on earlier work items. Therefore, recovery
    must replay the items (both recovered and created) in the same order
    that they would have been during normal operation.

    For log recovery, we enforce this ordering by using an empty transaction
    to collect deferred ops that get created in the process of recovering a
    log intent item to prevent them from being committed before the rest of
    the recovered intent items. After we finish committing all the
    recovered log items, we allocate a transaction with an enormous block
    reservation, splice our huge list of created deferred ops into that
    transaction, and commit it, thereby finishing all those ops.

    This is /really/ hokey -- it's the one place in XFS where we allow
    nested transactions; the splicing of the defer ops list is inelegant
    and has to be done twice per recovery function; and the broken way we
    handle inode pointers and block reservations causes subtle
    use-after-free and allocator problems that will be fixed by this patch
    and the two patches after it.

    Therefore, replace the hokey empty transaction with a structure designed
    to capture each chain of deferred ops that are created as part of
    recovering a single unfinished log intent. Finally, refactor the loop
    that replays those chains to do so using one transaction per chain.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
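
A rough userspace model of "capture each chain, then replay one transaction per chain" might look like the following. All names here (`struct chain_capture`, `struct recovery`, `capture_chain`, `replay_chains`) and the fixed array limits are assumptions made for the sketch. Deferred ops are reduced to plain integers so that ordering is easy to observe.

```c
#include <stddef.h>

#define MAX_CHAINS	8	/* arbitrary limits for this sketch */
#define MAX_OPS		8

/* Hypothetical: the deferred ops created while recovering one intent. */
struct chain_capture {
	int	ops[MAX_OPS];
	size_t	nr_ops;
};

struct recovery {
	struct chain_capture	chains[MAX_CHAINS];
	size_t			nr_chains;
	int			replayed[MAX_CHAINS * MAX_OPS];
	size_t			nr_replayed;
	size_t			nr_trans;	/* transactions used for replay */
};

/* Set aside the ops created by one recovered intent instead of running them. */
int capture_chain(struct recovery *rec, const int *ops, size_t nr_ops)
{
	struct chain_capture *cap;

	if (rec->nr_chains == MAX_CHAINS || nr_ops > MAX_OPS)
		return -1;
	cap = &rec->chains[rec->nr_chains++];
	cap->nr_ops = nr_ops;
	for (size_t i = 0; i < nr_ops; i++)
		cap->ops[i] = ops[i];
	return 0;
}

/* After all recovered intents are done, finish each chain in capture order. */
void replay_chains(struct recovery *rec)
{
	for (size_t c = 0; c < rec->nr_chains; c++) {
		struct chain_capture *cap = &rec->chains[c];

		rec->nr_trans++;	/* one transaction per captured chain */
		for (size_t i = 0; i < cap->nr_ops; i++)
			rec->replayed[rec->nr_replayed++] = cap->ops[i];
	}
	rec->nr_chains = 0;
}
```

Capturing two chains and replaying them uses two transactions and preserves the order in which the ops were created, which is the ordering property the commit message is concerned with.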
     
  • The ->iop_recover method of a log intent item removes the recovered
    intent item from the AIL by logging an intent done item and committing
    the transaction, so it's superfluous to have this flag check. Nothing
    else uses it, so get rid of the flag entirely.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

26 Sep, 2020

1 commit

  • We should do the assert for all the log intent-done items if they appear
    here. This patch detects intent-done items by the fact that their item ops
    don't have iop_unpin and iop_push methods, and also moves the helper
    xlog_item_is_intent to xfs_trans.h.

    Signed-off-by: Kaixu Xia
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Kaixu Xia
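
The detection rule described above (an intent-done item has neither an `iop_unpin` nor an `iop_push` method) can be illustrated with a small ops table. The structures below are cut-down, hypothetical versions for the sketch. Only the method names `iop_unpin` and `iop_push` come from the commit message, and `item_is_intent_done` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>

struct log_item;

/* Hypothetical, cut-down ops table with only the two methods of interest. */
struct item_ops {
	void	(*iop_unpin)(struct log_item *lip, int remove);
	int	(*iop_push)(struct log_item *lip);
};

struct log_item {
	const struct item_ops	*li_ops;
};

/* Dummy methods so that an ordinary item has something to point at. */
void dummy_unpin(struct log_item *lip, int remove)
{
	(void)lip;
	(void)remove;
}

int dummy_push(struct log_item *lip)
{
	(void)lip;
	return 0;
}

/* An ordinary item implements both methods ... */
const struct item_ops regular_ops = {
	.iop_unpin	= dummy_unpin,
	.iop_push	= dummy_push,
};

/* ... while an intent-done item leaves both unset. */
const struct item_ops done_ops = { NULL, NULL };

/* Detect intent-done items purely from the shape of their ops table. */
bool item_is_intent_done(const struct log_item *lip)
{
	return lip->li_ops->iop_unpin == NULL &&
	       lip->li_ops->iop_push == NULL;
}
```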
     

24 Sep, 2020

1 commit

  • Let's use DIV_ROUND_UP() to calculate log record header
    blocks, as is done in xlog_get_iclog_buffer_size(), and
    wrap it up in a common helper for log recovery.

    Reviewed-by: Brian Foster
    Signed-off-by: Gao Xiang
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Gao Xiang
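
As a sketch of what such a helper computes, the following userspace function rounds a record size up to a whole number of header blocks. The helper name `logrec_hblks`, the `is_v2_log` parameter, and the 32 KiB value of `HEADER_CYCLE_SIZE` are assumptions for illustration, not taken from the commit. `DIV_ROUND_UP` is defined locally with the usual kernel semantics.

```c
#include <stdint.h>

/* Integer division rounding up, same semantics as the kernel macro. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Assumed amount of record data covered by one header block. */
#define HEADER_CYCLE_SIZE	(32 * 1024)

/*
 * Hypothetical helper: number of header blocks for a log record.
 * Only v2 logs with records larger than one cycle-size unit need more
 * than a single header block.
 */
int logrec_hblks(int is_v2_log, uint32_t h_size)
{
	if (is_v2_log && h_size > HEADER_CYCLE_SIZE)
		return (int)DIV_ROUND_UP(h_size, (uint32_t)HEADER_CYCLE_SIZE);
	return 1;
}
```

The point of rounding up is that a record size that is not an exact multiple of the unit still gets a header block for its final partial chunk, instead of needing a separate "divide, then add one if there is a remainder" step.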
     

23 Sep, 2020

1 commit

  • Currently, crafted h_len has been blocked for the log
    header of the tail block in commit a70f9fe52daa ("xfs:
    detect and handle invalid iclog size set by mkfs").

    However, each log record could still have crafted h_len
    and cause log record buffer overrun. So let's check
    h_len vs buffer size for each log record as well.

    Signed-off-by: Gao Xiang
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Gao Xiang
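
The per-record check can be pictured as a simple bounds test on the length field before any data is copied into the record buffer. This is a hedged sketch: `check_rec_len` and the `ERR_CORRUPT` error value are hypothetical, and the real code operates on on-disk header fields rather than plain integers.

```c
#include <stdint.h>

/* Hypothetical error code standing in for a "filesystem corrupted" errno. */
#define ERR_CORRUPT	(-117)

/*
 * Reject a record whose claimed length is non-positive or larger than
 * the buffer it will be read into.
 */
int check_rec_len(int32_t h_len, int32_t bufsize)
{
	if (h_len <= 0 || h_len > bufsize)
		return ERR_CORRUPT;
	return 0;
}
```

Running this test on every record, not just the one at the tail, is what closes the overrun: each record header carries its own length and any of them could be crafted.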
     

16 Sep, 2020

5 commits

  • Instead of poking deeply into buffer cache internals when re-reading the
    superblock during log recovery, just generalize _xfs_buf_read and use it
    there. Note that we don't have to explicitly set up the ops as they
    must be set from the initial read.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Merge xfs_getsb into its only caller, and clean that one up a little bit
    as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The log recovery I/O completion handler does not substantially differ from
    the normal one except for the fact that it:

    a) never retries failed writes
    b) can have log items that aren't on the AIL
    c) never has inode/dquot log items attached and thus doesn't need to
    handle them

    Add conditionals for (a) and (b) to the ioend code, while (c) doesn't
    need special handling anyway.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • No need to keep a separate helper for this logic.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Move the log recovery I/O completion handling entirely into the log
    recovery code, and re-arrange the normal I/O completion handler flow
    to prepare for lifting more logic into common code in the next commits.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

07 Sep, 2020

1 commit

  • Remove kmem_realloc() function and convert its users to use MM API
    directly (krealloc())

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Carlos Maiolino
     

05 Aug, 2020

1 commit

  • Delete repeated words in fs/xfs/.
    {we, that, the, a, to, fork}
    Change "it it" to "it is" in one location.

    Signed-off-by: Randy Dunlap
    To: linux-fsdevel@vger.kernel.org
    Cc: Darrick J. Wong
    Cc: linux-xfs@vger.kernel.org
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Randy Dunlap
     

07 Jul, 2020

1 commit

  • Log recovery has its own buffer write completion handler for
    buffers that it directly recovers. Convert these to direct calls by
    flagging these buffers as being log recovery buffers. The flag will
    get cleared by the log recovery IO completion routine, so it will
    never leak out of log recovery.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

08 May, 2020

23 commits