07 Oct, 2020

1 commit

  • Refactor xfs_getfsmap to improve its performance: instead of indirectly
    calling a function that copies one record to userspace at a time, create
    a shadow buffer in the kernel and copy the whole array once at the end.
    On the author's computer, this reduces the runtime on his /home by ~20%.

    This also eliminates a deadlock when running GETFSMAP against the
    realtime device. The current code locks the rtbitmap to create
    fsmappings and copies them into userspace, having not released the
    rtbitmap lock. If the userspace buffer is an mmap of a sparse file that
    itself resides on the realtime device, the write page fault will recurse
    into the fs for allocation, which will deadlock on the rtbitmap lock.

    Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Chandan Babu R

    Darrick J. Wong
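
The shadow-buffer idea described in this commit can be sketched in userspace C. This is a minimal illustration, not the kernel code: `demo_fsmap`, `demo_copy_to_user()` and `demo_getfsmap()` are hypothetical names, and the record contents are made up. It assumes that records are generated first and that the only copy to the caller's buffer happens once at the end, after any locks would have been dropped.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one fsmap record. */
struct demo_fsmap {
	unsigned long long	physical;
	unsigned long long	length;
};

/* Stand-in for copy_to_user(); counts calls so the batching is visible. */
static unsigned int demo_copy_calls;

static int demo_copy_to_user(void *dst, const void *src, size_t len)
{
	demo_copy_calls++;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Fill a shadow buffer with @count records, then copy the whole array to
 * the caller's buffer in one call. The assumption modelled here is that
 * any lock held while generating records is released before the copy.
 */
static int demo_getfsmap(struct demo_fsmap *user_recs, size_t count)
{
	struct demo_fsmap *shadow;
	size_t i;
	int error;

	shadow = calloc(count, sizeof(*shadow));
	if (!shadow)
		return -ENOMEM;

	/* "Locked" phase: build records without touching user memory. */
	for (i = 0; i < count; i++) {
		shadow[i].physical = i * 8;
		shadow[i].length = 8;
	}

	/* "Unlocked" phase: one bulk copy, which is allowed to fault. */
	error = demo_copy_to_user(user_recs, shadow, count * sizeof(*shadow));
	free(shadow);
	return error;
}
```

Because the copy happens only once and outside the record-generation phase, a page fault on the destination buffer can no longer recurse into code that needs a lock still held by the caller.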
     

16 Sep, 2020

2 commits

  • Replace kmem_zalloc_large() with the global kernel memory API. All of
    its callers now use kvzalloc() directly, so kmalloc() falls back to
    vmalloc() automatically.

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Carlos Maiolino
     
  • Redesign the ondisk inode timestamps to be a simple unsigned 64-bit
    counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the
    32-bit unix time epoch). This enables us to handle dates up to 2486,
    which solves the y2038 problem.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Gao Xiang
    Reviewed-by: Dave Chinner

    Darrick J. Wong
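
The arithmetic behind the 2486 limit can be checked with a short sketch. The names below (`demo_encode()`, `demo_decode()`, the `DEMO_*` macros) are hypothetical and are not the kernel's helpers; the sketch only assumes what the commit message states, namely an unsigned 64-bit count of nanoseconds starting at the minimum 32-bit Unix time.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC	1000000000ULL
/* Seconds between 14 Dec 1901 (INT32_MIN in Unix time) and 1 Jan 1970. */
#define DEMO_EPOCH_OFFSET	2147483648ULL

/* Largest Unix time (in seconds) whose counter still fits in 64 bits. */
#define DEMO_MAX_UNIX_SECS \
	((int64_t)(UINT64_MAX / DEMO_NSEC_PER_SEC - DEMO_EPOCH_OFFSET))

/*
 * Encode a Unix timestamp as nanoseconds since 14 Dec 1901.
 * The caller must keep @unix_secs within
 * [-2147483648, DEMO_MAX_UNIX_SECS]; no range check is done here.
 */
static uint64_t demo_encode(int64_t unix_secs, uint32_t nsec)
{
	return ((uint64_t)unix_secs + DEMO_EPOCH_OFFSET) * DEMO_NSEC_PER_SEC +
	       nsec;
}

/* Split the counter back into Unix seconds and nanoseconds. */
static void demo_decode(uint64_t counter, int64_t *unix_secs, uint32_t *nsec)
{
	*unix_secs = (int64_t)(counter / DEMO_NSEC_PER_SEC) -
		     (int64_t)DEMO_EPOCH_OFFSET;
	*nsec = (uint32_t)(counter % DEMO_NSEC_PER_SEC);
}
```

A 64-bit nanosecond counter spans roughly 584 years, so starting in December 1901 it runs out in 2486, which matches the date given above.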
     

29 Jul, 2020

1 commit


12 Jun, 2020

1 commit

  • Pull DAX updates part three from Darrick Wong:
    "Now that the xfs changes have landed, this third piece changes the
    FS_XFLAG_DAX ioctl code in xfs to request that the inode be reloaded
    after the last program closes the file, if doing so would make a S_DAX
    change happen. The goal here is to make dax access mode switching
    quicker when possible.

    Summary:

    - Teach XFS to ask the VFS to drop an inode if the administrator
    changes the FS_XFLAG_DAX inode flag such that the S_DAX state would
    change. This can result in files changing access modes without
    requiring an unmount cycle"

    * tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
    fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
    fs/xfs: Create function xfs_inode_should_enable_dax()
    fs/xfs: Make DAX mount option a tri-state
    fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
    fs/xfs: Remove unnecessary initialization of i_rwsem

    Linus Torvalds
     

30 May, 2020

2 commits

  • Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
    setting of S_DAX, data invalidation no longer needs to happen when
    FS_XFLAG_DAX is changed.

    Change xfs_ioctl_setattr_dax_invalidate() to be
    xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
    functionality.

    Furthermore, we no longer need the locking so we remove the join_flags
    logic.

    Signed-off-by: Ira Weiny
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     
  • The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() is
    nearly identical. The only difference is that *_to_linux() is called after
    inode setup and disallows changing the DAX flag.

    Combining them can be done with a flag which indicates if this is the initial
    setup to allow the DAX flag to be properly set only at init time.

    So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
    directly.

    While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
    use xfs_ip2xflags() to ensure future diflags are included correctly.

    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     

27 May, 2020

1 commit


20 May, 2020

1 commit

  • There are three extent counters per inode, one for each of
    the forks. Two are in the legacy icdinode and one is directly in
    struct xfs_inode. Switch to a single counter in the xfs_ifork structure
    where it uses up padding at the end of the structure. This simplifies
    various bits of code that just want the extent count and
    can now directly dereference it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Chandan Babu R
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

05 May, 2020

2 commits

  • The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() is
    nearly identical. The only difference is that *_to_linux() is called after
    inode setup and disallows changing the DAX flag.

    Combining them can be done with a flag which indicates if this is the initial
    setup to allow the DAX flag to be properly set only at init time.

    So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
    directly.

    While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
    use xfs_ip2xflags() to ensure future diflags are included correctly.

    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     
  • The initial value of the variable udqp is NULL, and we only set the
    flag XFS_QMOPT_PQUOTA in the xfs_qm_vop_dqalloc() call, so only
    the pdqp value is initialized and the udqp value is still NULL.
    Since udqp stays NULL for the rest of the xfs_ioctl_setattr()
    function, it is meaningless and does nothing. So remove it from
    xfs_ioctl_setattr().

    Signed-off-by: Kaixu Xia
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Kaixu Xia
     

13 Apr, 2020

1 commit

  • The filesystem freeze sequence in XFS waits on any background
    eofblocks or cowblocks scans to complete before the filesystem is
    quiesced. At this point, the freezer has already stopped the
    transaction subsystem, however, which means a truncate or cowblock
    cancellation in progress is likely blocked in transaction
    allocation. This results in a deadlock between freeze and the
    associated scanner.

    Fix this problem by holding superblock write protection across calls
    into the block reapers. Since protection for background scans is
    acquired from the workqueue task context, trylock to avoid a similar
    deadlock between freeze and blocking on the write lock.

    Fixes: d6b636ebb1c9f ("xfs: halt auto-reclamation activities while rebuilding rmap")
    Reported-by: Paul Furtado
    Signed-off-by: Brian Foster
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Allison Collins
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

19 Mar, 2020

2 commits

  • We know the version is 3 if on a v5 file system. For earlier file
    system formats we always upgrade the remaining v1 inodes to v2 and
    thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode
    helper to check if we deal with small or large dinodes, and thus
    remove the need for the di_version field in struct icdinode.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Only v5 file systems can have the reflink feature, and those will
    always use the large dinode format. Remove the extra check for the
    inode version.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

03 Mar, 2020

19 commits

  • Let the low-level attr code only allocate the needed buffer size
    for xfs_attrmulti_attr_get instead of allocating the upper bound
    at the top of the call chain.

    Suggested-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Use the round_down macro, and use the size of the uint32 type we
    use in the callback that fills the buffer to make the code a little
    more clear - the size of it is always the same as int for platforms
    that Linux runs on.

    Suggested-by: Dave Chinner
    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
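
The rounding described in this commit can be sketched as follows. `DEMO_ROUND_DOWN()` is a userspace imitation of a power-of-two round-down macro and `demo_align_bufsize()` is a hypothetical helper; neither is the kernel code. The sketch relies on the `__typeof__` extension available in GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Round @x down to a multiple of @y; @y must be a power of two. */
#define DEMO_ROUND_DOWN(x, y)	((x) & ~((__typeof__(x))((y) - 1)))

/* Trim a caller-supplied buffer size to a multiple of sizeof(uint32_t). */
static size_t demo_align_bufsize(size_t bufsize)
{
	return DEMO_ROUND_DOWN(bufsize, sizeof(uint32_t));
}
```

Using `sizeof(uint32_t)` rather than `sizeof(int)` documents which type the buffer is actually filled with, even though both sizes are the same on the platforms in question.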
     
  • The attrlist cursor only exists as part of an attr list context, so
    embed the structure instead of pointing to it. Also give it a proper
    xfs_ prefix and remove the obsolete typedef.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The ATTR_* flags have a long IRIX history, where they were a userspace
    interface, the on-disk format and an internal interface. We've split
    out the on-disk interface to the XFS_ATTR_* values, but despite (or
    because of?) that the flags have still been a mess. Switch the
    internal interface to pass the on-disk XFS_ATTR_* flags for the
    namespace and the Linux XATTR_* flags for the actual flags instead.
    The ATTR_* values that are actually used are moved to xfs_fs.h with a
    new XFS_IOC_* prefix to not conflict with the userspace version that
    has the same name and must have the same value.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Move the function to xfs_acl.c and provide a proper stub for the
    !CONFIG_XFS_POSIX_ACL case. Lift the flags check to the caller as it
    nicely fits in there.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Lift the common code to copy the cursor from and to user space into
    xfs_ioc_attr_list. Note that this means we copy in twice now as
    the cursor is in the middle of the containing structure, but we never
    touch the memory for the original copy. Doing so keeps the cursor
    handling isolated in the common helper.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Lift the buffer allocation from the two callers into xfs_ioc_attr_list.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Lift the flags and bufsize checks from both callers into the common code
    in xfs_ioc_attr_list.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The version taking the context structure is the main interface to list
    attributes, so drop the _int postfix.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The old xfs_attr_list code is only used by the attrlist by handle
    ioctl. Move it to xfs_ioctl.c with its user. Also move the
    attrlist and attrlist_ent structure to xfs_fs.h, as they are exposed
    user ABIs. They are used through libattr headers with the same name
    by at least xfsdump. Also document this relation so that it doesn't
    require a research project to figure out.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • op_flags with the XFS_DA_OP_* flags is the usual place for in-kernel
    only flags, so move the notime flag there.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Instead of converting from one style of arguments to another in
    xfs_attr_set, pass the structure from higher up in the call chain.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Instead of converting from one style of arguments to another in
    xfs_attr_set, pass the structure from higher up in the call chain.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Add a new helper to handle a single attr multi ioctl operation that
    can be shared between the native and compat ioctl implementation.

    There are two slight changes in behaviour. First, we no longer break
    out of the loop when copying in the attribute name fails; the previous
    behaviour was rather inconsistent here, as it continued for any other
    kind of error. Second, we no longer clear the flags in the structure
    returned to userspace, a behaviour only introduced as a bug fix in the
    last merge window.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Simplify the user copy code by using strndup_user. This means that we
    now do one memory allocation per operation instead of one per ioctl,
    but memory allocations are cheap compared to the actual file system
    operations. Also the error for an invalid path is now EINVAL or EFAULT
    instead of the previous odd and undocumented ERANGE.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Merge the ioctl handlers just like the low-level xfs_attr_set function.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • The Linux xattr and acl APIs use a single call for set and remove.
    Modify the high-level XFS API to match that and let xfs_attr_set handle
    removing attributes as well. With a little bit of reordering this
    removes a lot of code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
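
The "single call for set and remove" convention can be illustrated with a toy in-memory store. Everything here is hypothetical (`demo_attr_set()`, `demo_find()`, the fixed-size table and the error codes chosen); the only idea taken from the commit message is that one entry point handles both cases, here by treating a NULL value as a removal request.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ATTRS	4
#define DEMO_NAME_LEN	32
#define DEMO_VALUE_LEN	32

struct demo_attr {
	int	used;
	char	name[DEMO_NAME_LEN];
	char	value[DEMO_VALUE_LEN];
};

static struct demo_attr demo_attrs[DEMO_MAX_ATTRS];

/* Look up an attribute by name; returns NULL if it does not exist. */
static struct demo_attr *demo_find(const char *name)
{
	for (size_t i = 0; i < DEMO_MAX_ATTRS; i++)
		if (demo_attrs[i].used && !strcmp(demo_attrs[i].name, name))
			return &demo_attrs[i];
	return NULL;
}

/* One entry point: a non-NULL @value sets the attribute, NULL removes it. */
static int demo_attr_set(const char *name, const char *value)
{
	struct demo_attr *attr = demo_find(name);

	if (!value) {
		if (!attr)
			return -ENOENT;
		attr->used = 0;
		return 0;
	}

	if (strlen(name) >= DEMO_NAME_LEN || strlen(value) >= DEMO_VALUE_LEN)
		return -ERANGE;

	/* Reuse the existing slot, or claim the first free one. */
	for (size_t i = 0; i < DEMO_MAX_ATTRS && !attr; i++)
		if (!demo_attrs[i].used)
			attr = &demo_attrs[i];
	if (!attr)
		return -ENOSPC;

	attr->used = 1;
	strcpy(attr->name, name);
	strcpy(attr->value, value);
	return 0;
}
```

Folding removal into the set path means callers need only one function and one argument structure, which is where the code savings mentioned above would come from.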
     
  • While the flags field in the ABI and the on-disk format allows for
    multiple namespace flags, an attribute can only exist in a single
    namespace at a time. Hence asking to list attributes that exist
    in multiple namespaces simultaneously is a logically invalid
    request and will return no results. Reject this case early with
    -EINVAL.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
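
A check of this kind can be sketched by counting how many namespace bits are set in the request. The flag values and the names `DEMO_ATTR_ROOT`, `DEMO_ATTR_SECURE` and `demo_check_namespace()` are assumptions made for illustration, and `__builtin_popcount()` is a GCC/Clang builtin used here in place of whatever bit-counting helper the kernel code uses.

```c
#include <errno.h>

/* Hypothetical namespace bits; the real values live in the XFS headers. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 3)
#define DEMO_ATTR_NSP_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE)

/* Reject requests that name more than one namespace at once. */
static int demo_check_namespace(unsigned int flags)
{
	if (__builtin_popcount(flags & DEMO_ATTR_NSP_MASK) > 1)
		return -EINVAL;
	return 0;
}
```

With no namespace bit set the request refers to the default namespace, so only the case of two or more bits is treated as invalid in this sketch.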
     
  • Use the Linux inode i_uid/i_gid members everywhere and just convert
    from/to the scalar value when reading or writing the on-disk inode.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

10 Jan, 2020

3 commits


14 Nov, 2019

2 commits


08 Nov, 2019

1 commit

  • Some of the xfs source files are missing header includes, so add them
    back. Sparse complains about non-static functions that don't have a
    forward declaration anywhere.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

01 Nov, 2019

1 commit

  • AIO+DIO can extend the file size on IO completion, and it holds
    no inode locks while the IO is in flight. Therefore, a race
    condition exists in file size updates if we do something like this:

    aio-thread                      fallocate-thread

    lock inode
    submit IO beyond inode->i_size
    unlock inode
    .....
                                    lock inode
                                    break layouts
                                    if (off + len > inode->i_size)
                                        new_size = off + len
                                    .....
                                    inode_dio_wait()

    .....
    completes
    inode->i_size updated
    inode_dio_done()
    ....


                                    if (new_size)
                                        xfs_vn_setattr(inode, new_size)

    Yup, that attempt to extend the file size in the fallocate code
    turns into a truncate - it removes whatever the aio write
    allocated and put to disk, and reduces the inode size back down to
    where the fallocate operation ends.

    Fundamentally, xfs_file_fallocate() is not compatible with racing
    AIO+DIO completions, so we need to move the inode_dio_wait() call
    up to where we lock the inode and break the layouts.

    Secondly, storing the inode size and then using it unchecked without
    holding the ILOCK is not safe; we can only do such a thing if we've
    locked out and drained all IO and other modification operations,
    which we don't do initially in xfs_file_fallocate.

    It should be noted that some of the fallocate operations are
    compound operations - they are made up of multiple manipulations
    that may zero data, and so we may need to flush and invalidate the
    file multiple times during an operation. However, we only need to
    lock out IO and other space manipulation operations once, as that
    lockout is maintained until the entire fallocate operation has been
    completed.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
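
The ordering problem in this commit can be modelled with a single-threaded sketch that replays the interleaving deterministically. The structure and functions below (`demo_inode`, `demo_dio_wait()`, `demo_fallocate_old()`, `demo_fallocate_new()`) are hypothetical simplifications, not the XFS code: the in-flight AIO+DIO write is reduced to an end offset that is applied to the file size when the fallocate path waits for it.

```c
/* Hypothetical model of an inode with at most one in-flight DIO write. */
struct demo_inode {
	long	i_size;		/* current file size */
	long	dio_end;	/* end offset of the in-flight write, 0 if none */
};

/* Model of waiting for DIO: the write completes and may extend i_size. */
static void demo_dio_wait(struct demo_inode *ip)
{
	if (ip->dio_end > ip->i_size)
		ip->i_size = ip->dio_end;
	ip->dio_end = 0;
}

/* Old ordering: sample i_size first, wait for DIO afterwards. */
static void demo_fallocate_old(struct demo_inode *ip, long off, long len)
{
	long new_size = 0;

	if (off + len > ip->i_size)
		new_size = off + len;
	demo_dio_wait(ip);
	if (new_size)
		ip->i_size = new_size;	/* may shrink the file */
}

/* New ordering: drain DIO right after locking, then sample i_size. */
static void demo_fallocate_new(struct demo_inode *ip, long off, long len)
{
	long new_size = 0;

	demo_dio_wait(ip);
	if (off + len > ip->i_size)
		new_size = off + len;
	if (new_size)
		ip->i_size = new_size;
}
```

Starting from a 4096-byte file with a write in flight up to offset 8192, a fallocate of 6144 bytes leaves the file at 6144 bytes under the old ordering (the completed write is cut off) and at 8192 bytes under the new ordering.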