02 Sep, 2017

1 commit

  • Split xfs_trans_roll into a low-level helper that just rolls the
    actual transaction and a new higher level xfs_trans_roll_inode
    that takes care of logging and rejoining the inode. This gets
    rid of the NULL inode case, and allows us to simplify the special
    cases in the deferred operation code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

07 May, 2017

1 commit

  • Pull xfs updates from Darrick Wong:
    "Here are the XFS changes for 4.12. The big new feature for this
    release is the new space mapping ioctl that we've been discussing
    since LSF2016, but other than that most of the patches are larger bug
    fixes, memory corruption prevention, and other cleanups.

    Summary:
    - various code cleanups
    - introduce GETFSMAP ioctl
    - various refactoring
    - avoid dio reads past eof
    - fix memory corruption and other errors with fragmented directory blocks
    - fix accidental userspace memory corruptions
    - publish fs uuid in superblock
    - make fstrim terminatable
    - fix race between quotaoff and in-core inode creation
    - avoid use-after-free when finishing up w/ buffer heads
    - reserve enough space to handle bmap tree resizing during cow remap"

    * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
    xfs: fix use-after-free in xfs_finish_page_writeback
    xfs: reserve enough blocks to handle btree splits when remapping
    xfs: wait on new inodes during quotaoff dquot release
    xfs: update ag iterator to support wait on new inodes
    xfs: support ability to wait on new inodes
    xfs: publish UUID in struct super_block
    xfs: Allow user to kill fstrim process
    xfs: better log intent item refcount checking
    xfs: fix up quotacheck buffer list error handling
    xfs: remove xfs_trans_ail_delete_bulk
    xfs: don't use bool values in trace buffers
    xfs: fix getfsmap userspace memory corruption while setting OF_LAST
    xfs: fix __user annotations for xfs_ioc_getfsmap
    xfs: corruption needs to respect endianess too!
    xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
    xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
    xfs: simplify validation of the unwritten extent bit
    xfs: remove unused values from xfs_exntst_t
    xfs: remove the unused XFS_MAXLINK_1 define
    xfs: more do_div cleanups
    ...

    Linus Torvalds
     

04 May, 2017

1 commit

  • xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
    some time ago. We would like to make this concept more generic and use
    it for other filesystems as well. Let's start by giving the flag a more
    generic name PF_MEMALLOC_NOFS which is in line with an existing
    PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
    contexts. Replace all PF_FSTRANS usage from the xfs code in the first
    step before we introduce a full API for it as xfs uses the flag directly
    anyway.

    This patch doesn't introduce any functional change.

    Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Acked-by: Vlastimil Babka
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jan Kara
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

07 Apr, 2017

1 commit


04 Apr, 2017

1 commit


03 Oct, 2016

1 commit


19 Sep, 2016

1 commit

  • One unfortunate quirk of the reference count and reverse mapping
    btrees -- they can expand in size when blocks are written to *other*
    allocation groups if, say, one large extent becomes a lot of tiny
    extents. Since we don't want to start throwing errors in the middle
    of CoWing, we need to reserve some blocks to handle future expansion.
    The transaction block reservation counters aren't sufficient here
    because we have to have a reserve of blocks in every AG, not just
    somewhere in the filesystem.

    Therefore, create two per-AG block reservation pools. One feeds the
    AGFL so that rmapbt expansion always succeeds, and the other feeds all
    other metadata so that refcountbt expansion never fails.

    Use the count of how many reserved blocks we need to have on hand to
    create a virtual reservation in the AG. Through selective clamping of
    the maximum length of allocation requests and of the length of the
    longest free extent, we can make it look like there's less free space
    in the AG unless the reservation owner is asking for blocks.

    In other words, play some accounting tricks in-core to make sure that
    we always have blocks available. On the plus side, there's nothing to
    clean up if we crash, which is in contrast to the strategy that the
    rough draft used (actually removing extents from the freespace btrees).

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     

14 Sep, 2016

1 commit

  • "blocks" should be added back to fdblocks at undo time, not taken
    away, i.e. the minus sign should not be used.

    This is a regression introduced by commit 0d485ada404b ("xfs: use
    generic percpu counters for free block counter"). It was found by
    code inspection; I didn't hit it in the real world, so there's no
    reproducer.

    Signed-off-by: Eryu Guan
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eryu Guan
     

06 Apr, 2016

2 commits

  • These aren't used for CIL-style logging and can be dropped.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
    that returns a transaction with all the required log and block reservations,
    and which allows passing transaction flags directly to avoid the cumbersome
    _xfs_trans_alloc interface.

    While we're at it we also get rid of the transaction type argument that has
    been superfluous since we stopped supporting the non-CIL logging mode. The
    guts of it will be removed in another patch.

    [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

15 Mar, 2016

1 commit

  • __xfs_trans_roll() can return without setting the
    *committed argument; this was a problem for xfs_bmap_finish():

    int committed;/* xact committed or not */
    ...
    error = __xfs_trans_roll(tp, ip, &committed);
    if (error) {
    ...
    if (committed) {

    and we tested an uninitialized "committed" variable on the
    error path. No caller is preserving "committed" state across
    calls to __xfs_trans_roll(), so just initialize committed inside
    the function to avoid future errors like this.
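
The fix pattern can be sketched as follows. `demo_trans_roll` and its `commit_fails` parameter are hypothetical stand-ins, not the kernel function; the point is only that the out-parameter is written before any early return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a roll helper with a "committed" out-parameter. */
int demo_trans_roll(bool commit_fails, int *committed)
{
	assert(committed != NULL);

	/* Initialize first so every return path leaves a defined value. */
	*committed = 0;

	if (commit_fails)
		return -EIO;	/* early error return; *committed is already 0 */

	*committed = 1;
	return 0;
}
```

With this shape a caller can test the flag on the error path even when its own local variable was never initialized.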

    Reported-by: Dan Carpenter
    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

12 Oct, 2015

1 commit

  • This patch modifies the stats counting macros and the callers
    to those macros to properly increment, decrement, and add-to
    the xfs stats counts. The counts for global and per-fs stats
    are correctly advanced, and cleared by writing a "1" to the
    corresponding clear file.

    global counts: /sys/fs/xfs/stats/stats
    per-fs counts: /sys/fs/xfs/sda*/stats/stats

    global clear: /sys/fs/xfs/stats/stats_clear
    per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear

    [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
    stats structures and macros. ]

    Signed-off-by: Bill O'Donnell
    Reviewed-by: Eric Sandeen
    Signed-off-by: Dave Chinner

    Bill O'Donnell
     

19 Aug, 2015

1 commit

  • Some callers need to make error handling decisions based on whether
    the current transaction successfully committed or not. Rename
    xfs_trans_roll(), add a new parameter and provide a wrapper to
    preserve existing callers.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Brian Foster
     

04 Jun, 2015

5 commits

  • Instead of the confusing flags argument pass a boolean flag to indicate if
    we want to release or regrant a log reservation.

    Also ensure that xfs_log_done always drops the reference on the log ticket,
    to both simplify the code and make the logic in xfs_trans_roll easier
    to understand.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • The flags argument to xfs_trans_commit is not useful for most callers, as
    a commit of a transaction without a permanent log reservation must pass
    0 here, and all callers for a transaction with a permanent log reservation
    except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove
    the flags argument from the public xfs_trans_commit interfaces, and
    introduce low-level __xfs_trans_commit variant just for xfs_trans_roll
    that regrants a log reservation instead of releasing it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
    XFS_TRANS_ABORT. Both of them are a direct product of the transaction
    state, and can be deduced:

    - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
    and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
    - any transaction with a permanent log reservation needs
    XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
    XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
    log reservation is invalid.

    So just remove the flags argument and do the right thing.
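
As a rough sketch of "do the right thing", the two decisions can be derived from the transaction's own state instead of being passed in. The `demo_trans` and `demo_cancel_actions` types below are hypothetical and only model the two properties named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction state; only the two relevant properties. */
struct demo_trans {
	bool dirty;		/* transaction has logged modifications */
	bool perm_log_res;	/* holds a permanent log reservation */
};

/* What cancel has to do, derived entirely from the state above. */
struct demo_cancel_actions {
	bool abort;
	bool release_log_res;
};

struct demo_cancel_actions demo_trans_cancel(const struct demo_trans *tp)
{
	assert(tp != NULL);

	struct demo_cancel_actions act = {
		.abort = tp->dirty,			/* only dirty needs abort */
		.release_log_res = tp->perm_log_res,	/* only permanent needs release */
	};
	return act;
}
```

Callers can no longer pass a combination that contradicts the transaction state, which is the class of mistake the flags argument made possible.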

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • The flags value always was 0 or XFS_TRANS_ABORT. Switch to a bool
    parameter to allow further cleanups.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • We have three remaining callers of xfs_trans_dup:

    - xfs_itruncate_extents which open codes xfs_trans_roll
    - xfs_bmap_finish doesn't have an xfs_inode argument and thus leaves
    attaching them to its callers, but otherwise is identical to
    xfs_trans_roll
    - xfs_dir_ialloc looks at the log reservations in the old xfs_trans
    structure instead of the log reservation parameters, but otherwise
    is identical to xfs_trans_roll.

    By allowing a NULL xfs_inode argument to xfs_trans_roll we can switch
    these three remaining users over to xfs_trans_roll and mark xfs_trans_dup
    static.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

23 Feb, 2015

5 commits

  • Introduce helper functions for modifying fields in the superblock
    into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch(). We
    can then use these directly in xfs_trans_unreserve_and_mod_sb() and
    so remove another user of the xfs_mod_incore_sb() API without
    losing any functionality or scalability of the transaction commit
    code.

    Based on a patch from Christoph Hellwig.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • Add a new helper to modify the incore counter of free realtime
    extents. This matches the helpers used for inode and data block
    counters, and removes a significant user of the xfs_mod_incore_sb()
    interface.

    Based on a patch originally from Christoph Hellwig.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • XFS has hand-rolled per-cpu counters for the superblock since before
    there was any generic implementation. The free block counter is
    special in that it is used for ENOSPC detection outside transaction
    contexts for delayed allocation. This means that the counter
    needs to be accurate at zero. The current per-cpu counter code jumps
    through lots of hoops to ensure we never run past zero, but we don't
    need to make all those jumps with the generic counter
    implementation.

    The generic counter implementation allows us to pass a "batch"
    threshold at which the addition/subtraction to the counter value
    will be folded back into global value under lock. We can use this
    feature to reduce the batch size as we approach 0 in a very similar
    manner to the existing counters and their rebalance algorithm. If we
    use a batch size of 1 as we approach 0, then every addition and
    subtraction will be done against the global value and hence allow
    accurate detection of zero threshold crossing.

    Hence we can replace the handrolled, accurate-at-zero counters with
    generic percpu counters.
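
The batching idea can be illustrated with a single-threaded simulation. This is a sketch under stated assumptions, not the kernel's percpu counter code: `demo_counter`, `DEMO_BATCH`, the CPU count, and the "near zero" threshold are all made up for illustration, and the real implementation takes a lock when folding.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NCPUS 4
#define DEMO_BATCH 32	/* assumed "normal" batch size */

struct demo_counter {
	int64_t global;			/* value folded under the (imagined) lock */
	int64_t local[DEMO_NCPUS];	/* per-CPU deltas not yet folded */
};

/* Use a batch of 1 near zero so every update hits the global value. */
static int64_t demo_pick_batch(const struct demo_counter *c)
{
	int64_t threshold = 2 * DEMO_BATCH * DEMO_NCPUS;	/* arbitrary choice */

	return c->global < threshold ? 1 : DEMO_BATCH;
}

/* Add to one CPU's delta; fold into global once |delta| reaches the batch. */
void demo_add(struct demo_counter *c, int cpu, int64_t amount)
{
	assert(cpu >= 0 && cpu < DEMO_NCPUS);

	int64_t batch = demo_pick_batch(c);
	int64_t delta = c->local[cpu] + amount;

	if (delta >= batch || delta <= -batch) {
		c->global += delta;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = delta;
	}
}
```

Far from zero, small updates stay in the per-CPU delta and the global value is untouched; close to zero, the batch of 1 forces each update into the global value, which is what makes a zero crossing detectable.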

    Note: this removes just enough of the icsb infrastructure to compile
    without warnings. The rest will go in subsequent commits.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • XFS has hand-rolled per-cpu counters for the superblock since before
    there was any generic implementation. The free inode counter is not
    used for any limit enforcement - the per-AG free inode counters are
    used during allocation to determine if there are inodes available for
    allocation.

    Hence we don't need any of the complexity of the hand-rolled
    counters and we can simply replace them with generic per-cpu
    counters similar to the inode counter.

    This version introduces a xfs_mod_ifree() helper function from
    Christoph Hellwig.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • XFS has hand-rolled per-cpu counters for the superblock since before
    there was any generic implementation. There are some warts around
    the use of them for the inode counter as the hand rolled counter is
    designed to be accurate at zero, but has no specific accuracy at
    any other value. This design causes problems for the maximum inode
    count threshold enforcement, as there is no trigger that balances
    the counters as they get close to the maximum threshold.

    Instead of designing new triggers for balancing, just replace the
    handrolled per-cpu counter with a generic counter. This enables us
    to update the counter through the normal superblock modification
    functions, but rather than do that we add a xfs_mod_icount() helper
    function (from Christoph Hellwig) and keep the percpu counter
    outside the superblock in the struct xfs_mount.

    This means we still need to initialise the per-cpu counter
    specifically when we read the superblock, and vice versa when we
    log/write it, but it does mean that we don't need to change any
    other code.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

22 Jan, 2015

1 commit

  • When the superblock is modified in a transaction, the commonly
    modified fields are not actually copied to the superblock buffer to
    avoid the buffer lock becoming a serialisation point. However, there
    are some other operations that modify the superblock fields within
    the transaction that don't directly log to the superblock but rely
    on the changes to be applied during the transaction commit (to
    minimise the buffer lock hold time).

    When we do this, we fail to mark the buffer log item as being a
    superblock buffer and that can lead to the buffer not being marked
    with the correct type in the log and hence causing recovery issues.
    Fix it by setting the type correctly, similar to xfs_mod_sb()...

    cc: # 3.10 to current
    Tested-by: Jan Kara
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

28 Nov, 2014

2 commits


25 Jun, 2014

1 commit

  • Convert all the errors in the core XFS code to negative error signs
    like the rest of the kernel and remove all the sign conversion we
    do in the interface layers.

    Errors for conversion (and comparison) found via searches like:

    $ git grep " E" fs/xfs
    $ git grep "return E" fs/xfs
    $ git grep " E[A-Z].*;$" fs/xfs

    Negation points found via searches like:

    $ git grep "= -[a-z,A-Z]" fs/xfs
    $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
    $ git grep " -[a-z].*;" fs/xfs
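
A minimal sketch of the convention after such a conversion, with hypothetical function names: the core code returns negative errno values directly, so the interface layer can pass them through without flipping the sign.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical core helper: returns 0 or a negative errno. */
int demo_core_check(int len)
{
	if (len < 0)
		return -EINVAL;
	return 0;
}

/* Hypothetical interface-layer entry point: no sign conversion needed. */
int demo_vfs_entry(int len)
{
	int error = demo_core_check(len);

	assert(error <= 0);	/* positive errors would indicate a missed spot */
	return error;
}
```

Before a conversion like this, the core helper would have returned a positive `EINVAL` and the entry point would have had to return `-error`; forgetting either half is the kind of mismatch the grep patterns above are hunting for.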

    [ with some bits I missed from Brian Foster ]

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

22 Jun, 2014

2 commits

  • XFS_ERROR was designed long ago to trap return values, but it's not
    runtime configurable, it's not consistently used, and we can do
    similar error trapping with ftrace scripts and triggers from
    userspace.

    Just nuke XFS_ERROR and associated bits.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     
  • return is not a function. "return(EIO);" is silly;
    "return (EIO);" moreso. return is not a function.
    Nuke the pointless parens.

    [dchinner: catch a couple of extra cases in xfs_attr_list.c,
    xfs_acl.c and xfs_linux.h.]

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

14 Apr, 2014

1 commit


07 Feb, 2014

1 commit

  • Convert xfs_log_commit_cil() to a void function since it returns nothing
    but 0 in any case; after that we can simplify the related code logic
    in xfs_trans_commit() accordingly.

    Signed-off-by: Jie Liu
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Jie Liu
     

24 Oct, 2013

4 commits

  • Currently the xfs_inode.h header has a dependency on the definition
    of the BMAP btree records as the inode fork includes an array of
    xfs_bmbt_rec_host_t objects in its definition.

    Move all the btree format definitions from xfs_btree.h,
    xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
    xfs_format.h to continue the process of centralising the on-disk
    format definitions. With this done, the xfs inode definitions are no
    longer dependent on btree header files.

    This enables a massive culling of unnecessary includes, with close to
    200 #include directives removed from the XFS kernel code base.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved to
    indicate the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The on-disk format definitions for the directory and attribute
    structures are spread across 3 header files right now, only one of
    which is dedicated to defining on-disk structures and their
    manipulation (xfs_dir2_format.h). Pull all the format definitions
    into a single header file - xfs_da_format.h - and switch all the
    code over to point at that.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • All of the buffer operations structures need to be exported
    for xfs_db, so move them all to a common location rather than
    spreading them all over the place. They are verifying the on-disk
    format, so while xfs_format.h might be a good place, it is not part
    of the on disk format.

    Hence we need to create a new header file in which we centralise these
    related definitions. Start by moving the buffer operations
    structures, and then also move all the other definitions that have
    crept into xfs_log_format.h and xfs_format.h as there was no other
    shared header file to put them in.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

31 Aug, 2013

1 commit

  • In optimising the CIL operations, some of the IOP_* macros for
    calling log item operations were removed. Remove the rest of them as
    Christoph requested.

    Signed-off-by: Dave Chinner
    Reviewed-by: Geoffrey Wehrman
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

13 Aug, 2013

4 commits

  • Now that the new xfs_trans_res structure has been introduced, the log
    reservation size, log count as well as log flags are pre-initialized
    at mount time. So it's time to refine the xfs_trans_reserve() interface
    to be neater.

    Also, introduce a new helper M_RES() to return a pointer to the
    mp->m_resv structure to simplify the input.

    Signed-off-by: Jie Liu
    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Jie Liu
     
  • Introduce a new structure xfs_trans_res to hold transaction
    reservation item info per log ticket.

    We also need to improve xfs_trans_resv_calc() by initializing the
    log count as well as log flags for permanent log reservation.

    Signed-off-by: Jie Liu
    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Jie Liu
     
  • The transaction reservation size calculations are used by both kernel
    and userspace, but most of the transaction code in xfs_trans.c is
    kernel specific. Split all the transaction reservation code out into
    its own files to make sharing with userspace simpler. This just
    leaves kernel-only definitions in xfs_trans.h, so it doesn't need to
    be shared with userspace anymore, either.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • There's a bunch of definitions in xfs_trans.h that define on-disk
    formats - transaction headers that get written into the log, log
    item type definitions, etc. Split out everything into a separate
    file so that all which remains in xfs_trans.h are kernel only
    definitions.

    Also, remove the duplicate magic number definitions for
    XFS_TRANS_MAGIC...

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner