26 Sep, 2020

1 commit

  • Since commit 1c1c6ebcf52 ("xfs: Replace per-ag array with a radix
    tree"), there is no m_peraglock anymore, so it's hard to understand
    the described situation since per-ag is no longer an array and no
    need to reallocate, call xfs_filestream_flush() in growfs.

    In addition, the race condition for shrink feature is quite confusing
    to me currently as well. Get rid of it instead.

    Signed-off-by: Gao Xiang
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Gao Xiang
     

27 Jan, 2020

1 commit


14 Nov, 2019

1 commit


04 Nov, 2019

1 commit

  • Always set XFS_ALLOC_USERDATA for data fork allocations, and check it
    in xfs_alloc_is_userdata instead of the current obsfucated check.
    Also remove the xfs_alloc_is_userdata and xfs_alloc_allow_busy_reuse
    helpers to make the code a little easier to understand.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

29 Jun, 2019

2 commits

  • There are many, many xfs header files which are included but
    unneeded (or included twice) in the xfs code, so remove them.

    nb: xfs_linux.h includes about 9 headers for everyone, so those
    explicit includes get removed by this. I'm not sure what the
    preference is, but if we wanted explicit includes everywhere,
    a followup patch could remove those xfs_*.h includes from
    xfs_linux.h and move them into the files that need them.
    Or it could be left as-is.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • The inode geometry structure isn't related to ondisk format; it's
    support for the mount structure. Move it to xfs_shared.h.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

03 Aug, 2018

1 commit

  • The dop_low field enables the low free space allocation mode when a
    previous allocation has detected difficulty allocating blocks. It
    has historically been part of the xfs_defer_ops structure, which
    means if enabled, it remains enabled across a set of transactions
    until the deferred operations have completed and the dfops is reset.

    Now that the dfops is embedded in the transaction, we can save a bit
    more space by using a transaction flag rather than a standalone
    boolean. Drop the ->dop_low field and replace it with a transaction
    flag that is set at the same points, carried across rolling
    transactions and cleared on completion of deferred operations. This
    essentially emulates the behavior of ->dop_low and so should not
    change behavior.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

27 Jul, 2018

1 commit

  • Replace the IRELE macro with a proper function so that we can do proper
    typechecking and so that we can stop open-coding iput in scrub, which
    means that we'll be able to ftrace inode lifetimes going through scrub
    correctly.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Brian Foster

    Darrick J. Wong
     

12 Jul, 2018

1 commit

  • Now that bma.dfops is only assigned from ->t_dfops, replace all
    accesses to the former with the latter and remove the unnecessary
    field. This patch does not change behavior.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
    echo $f
    cat $f | awk -f hdr.awk > $f.new
    mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
    hdr = 1.0
    tag = "GPL-2.0"
    str = ""
    }

    /^ \* This program is free software/ {
    hdr = 2.0;
    next
    }

    /any later version./ {
    tag = "GPL-2.0+"
    next
    }

    /^ \*\// {
    if (hdr > 0.0) {
    print "// SPDX-License-Identifier: " tag
    print str
    print $0
    str=""
    hdr = 0.0
    next
    }
    print $0
    next
    }

    /^ \* / {
    if (hdr > 1.0)
    next
    if (hdr > 0.0) {
    if (str != "")
    str = str "\n"
    str = str $0
    next
    }
    print $0
    next
    }

    /^ \*/ {
    if (hdr > 0.0)
    next
    print $0
    next
    }

    // {
    if (hdr > 0.0) {
    if (str != "")
    str = str "\n"
    str = str $0
    next
    }
    print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

10 Apr, 2018

2 commits

  • Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • The filestreams allocator stores an xfs_fstrm_item structure in the MRU to
    cache inode number to agno mappings for a particular length of time. Each
    xfs_fstrm_item contains the internal MRU structure, an inode pointer and
    agno value.

    The inode pointer stored in the xfs_fstrm_item is not referenced, however,
    which means the inode itself can be removed and reclaimed before the MRU
    item is freed. If this occurs, xfs_fstrm_free_func() can access freed or
    unrelated memory through xfs_fstrm_item->ip and crash.

    The obvious solution is to grab an inode reference for xfs_fstrm_item.
    The filestream mechanism only actually uses the inode pointer as a means
    to access the xfs_mount, however. Rather than add unnecessary
    complexity, simplify the implementation to store an xfs_mount pointer in
    struct xfs_mru_cache, and pass it to the free callback. This also
    requires updates to the tracepoint class to provide the associated data
    via parameters rather than the inode and a minor hack to peek at the MRU
    key to establish the inode number at free time.

    Based on debugging work and an earlier patch from Brian Foster, who
    also wrote most of this changelog.

    Reported-by: Brian Foster
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

03 Oct, 2016

1 commit


26 Sep, 2016

1 commit

  • When adding a new remote attribute, we write the attribute to the
    new extent before the allocation transaction is committed. This
    means we cannot reuse busy extents as that violates crash
    consistency semantics. Hence we currently treat remote attribute
    extent allocation like userdata because it has the same overwrite
    ordering constraints as userdata.

    Unfortunately, this also allows the allocator to incorrectly apply
    extent size hints to the remote attribute extent allocation. This
    results in interesting failures, such as transaction block
    reservation overruns and in-memory inode attribute fork corruption.

    To fix this, we need to separate the busy extent reuse configuration
    from the userdata configuration. This changes the definition of
    XFS_BMAPI_METADATA slightly - it now means that allocation is
    metadata and reuse of busy extents is acceptible due to the metadata
    ordering semantics of the journal. If this flag is not set, it
    means the allocation is that has unordered data writeback, and hence
    busy extent reuse is not allowed. It no longer implies the
    allocation is for user data, just that the data write will not be
    strictly ordered. This matches the semantics for both user data
    and remote attribute block allocation.

    As such, This patch changes the "userdata" field to a "datatype"
    field, and adds a "no busy reuse" flag to the field.
    When we detect an unordered data extent allocation, we immediately set
    the no reuse flag. We then set the "user data" flags based on the
    inode fork we are allocating the extent to. Hence we only set
    userdata flags on data fork allocations now and consider attribute
    fork remote extents to be an unordered metadata extent.

    The result is that remote attribute extents now have the expected
    allocation semantics, and the data fork allocation behaviour is
    completely unchanged.

    It should be noted that there may be other ways to fix this (e.g.
    use ordered metadata buffers for the remote attribute extent data
    write) but they are more invasive and difficult to validate both
    from a design and implementation POV. Hence this patch takes the
    simple, obvious route to fixing the problem...

    Reported-and-tested-by: Ross Zwisler
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

19 Sep, 2016

1 commit

  • One unfortunate quirk of the reference count and reverse mapping
    btrees -- they can expand in size when blocks are written to *other*
    allocation groups if, say, one large extent becomes a lot of tiny
    extents. Since we don't want to start throwing errors in the middle
    of CoWing, we need to reserve some blocks to handle future expansion.
    The transaction block reservation counters aren't sufficient here
    because we have to have a reserve of blocks in every AG, not just
    somewhere in the filesystem.

    Therefore, create two per-AG block reservation pools. One feeds the
    AGFL so that rmapbt expansion always succeeds, and the other feeds all
    other metadata so that refcountbt expansion never fails.

    Use the count of how many reserved blocks we need to have on hand to
    create a virtual reservation in the AG. Through selective clamping of
    the maximum length of allocation requests and of the length of the
    longest free extent, we can make it look like there's less free space
    in the AG unless the reservation owner is asking for blocks.

    In other words, play some accounting tricks in-core to make sure that
    we always have blocks available. On the plus side, there's nothing to
    clean up if we crash, which is contrast to the strategy that the rough
    draft used (actually removing extents from the freespace btrees).

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     

03 Aug, 2016

2 commits


09 Feb, 2016

1 commit

  • Move the di_mode value from the xfs_icdinode to the VFS inode, reducing
    the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole
    in the structure.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

22 Jun, 2015

2 commits

  • We no longer calculate the minimum freelist size from the on-disk
    AGF, so we don't need the macros used for this. That means the
    nested macros can be cleaned up, and turn this into an actual
    function so the logic is clear and concise. This will make it much
    easier to add support for the rmap btree when the time comes.

    This also gets rid of the XFS_AG_MAXLEVELS macro used by these
    freelist macros as it is simply a wrapper around a single variable.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
    access and agf buffer based access to freelist and space usage
    information. However, once the AGF buffer is locked inside this
    function, it is guaranteed that both the in-memory and on-disk
    values are identical. xfs_alloc_fix_freelist() doesn't modify the
    values in the structures directly, so it is a read-only user of the
    infomration, and hence can use the per-ag structure exclusively for
    determining what it should do.

    This opens up an avenue for cleaning up a lot of duplicated logic
    whose only difference is the structure it gets the data from, and in
    doing so removes a lot of needless byte swapping overhead when
    fixing up the free list.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

16 Apr, 2015

1 commit


25 Mar, 2015

1 commit


28 Nov, 2014

3 commits


25 Jun, 2014

1 commit

  • Convert all the errors the core XFs code to negative error signs
    like the rest of the kernel and remove all the sign conversion we
    do in the interface layers.

    Errors for conversion (and comparison) found via searches like:

    $ git grep " E" fs/xfs
    $ git grep "return E" fs/xfs
    $ git grep " E[A-Z].*;$" fs/xfs

    Negation points found via searches like:

    $ git grep "= -[a-z,A-Z]" fs/xfs
    $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
    $ git grep " -[a-z].*;" fs/xfs

    [ with some bits I missed from Brian Foster ]

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

23 Apr, 2014

6 commits


24 Oct, 2013

2 commits

  • Currently the xfs_inode.h header has a dependency on the definition
    of the BMAP btree records as the inode fork includes an array of
    xfs_bmbt_rec_host_t objects in it's definition.

    Move all the btree format definitions from xfs_btree.h,
    xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
    xfs_format.h to continue the process of centralising the on-disk
    format definitions. With this done, the xfs inode definitions are no
    longer dependent on btree header files.

    The enables a massive culling of unnecessary includes, with close to
    200 #include directives removed from the XFS kernel code base.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved to the
    indicate the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     

13 Aug, 2013

3 commits

  • There are a few small helper functions in xfs_util, all related to
    xfs_inode modifications. Move them all to xfs_inode.c so all
    xfs_inode operations are consiolidated in the one place.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • There is a bunch of code in xfs_bmap.c that is kernel specific and
    not shared with userspace. To minimise the difference between the
    kernel and userspace code, shift this unshared code to
    xfs_bmap_util.c, and the declarations to xfs_bmap_util.h.

    The biggest issue here is xfs_bmap_finish() - userspace has it's own
    definition of this function, and so we need to move it out of
    xfs_bmap.[ch]. This means several other files need to include
    xfs_bmap_util.h as well.

    It also introduces and interesting dance for the stack switching
    code in xfs_bmapi_allocate(). The stack switching/workqueue code is
    actually moved to xfs_bmap_util.c, so that userspace can simply use
    a #define in a header file to connect the dots without needing to
    know about the stack switch code at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The log item format definitions are shared with userspace. Split
    them out of header files that contain kernel only defintions to make
    it simple to shared them.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

12 Oct, 2011

2 commits