05 Nov, 2020

1 commit

  • It is possible to expose non-zeroed post-EOF data in XFS if the new
    EOF page is dirty, backed by an unwritten block and the truncate
    happens to race with writeback. iomap_truncate_page() will not zero
    the post-EOF portion of the page if the underlying block is
    unwritten. The subsequent call to truncate_setsize() will, but
    doesn't dirty the page. Therefore, if writeback happens to complete
    after iomap_truncate_page() (so it still sees the unwritten block)
    but before truncate_setsize(), the cached page becomes inconsistent
    with the on-disk block. A mapped read after the associated page is
    reclaimed or invalidated exposes non-zero post-EOF data.

    For example, consider the following sequence when run on a kernel
    modified to explicitly flush the new EOF page within the race
    window:

    $ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
    $ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
    ...
    $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
    00000400: 00 00 00 00 00 00 00 00 ........
    $ umount /mnt/; mount /mnt/
    $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
    00000400: cd cd cd cd cd cd cd cd ........

    Update xfs_setattr_size() to explicitly flush the new EOF page prior
    to the page truncate to ensure iomap has the latest state of the
    underlying block.

    Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
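
The race described in this commit can be sketched with a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct toy_file` and every `toy_*` function are hypothetical stand-ins for the page cache, writeback, iomap_truncate_page() and truncate_setsize(), and the 0xcd fill mirrors the pattern visible in the xfs_io output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096

/* Toy model: one page-cache page backed by one disk block. */
struct toy_file {
    unsigned char page[TOY_BLOCK_SIZE]; /* cached contents */
    unsigned char disk[TOY_BLOCK_SIZE]; /* on-disk contents */
    bool page_dirty;
    bool block_unwritten;               /* preallocated, never written */
};

/* Writeback: copy a dirty page to disk and convert the block to written. */
void toy_writeback(struct toy_file *f)
{
    if (!f->page_dirty)
        return;
    memcpy(f->disk, f->page, TOY_BLOCK_SIZE);
    f->page_dirty = false;
    f->block_unwritten = false;
}

/* Stand-in for iomap_truncate_page(): does nothing for unwritten blocks. */
void toy_truncate_page(struct toy_file *f, size_t new_size)
{
    if (f->block_unwritten)
        return;
    memset(f->page + new_size, 0, TOY_BLOCK_SIZE - new_size);
    f->page_dirty = true;
}

/* Stand-in for truncate_setsize(): zeroes the cached tail, never dirties. */
void toy_setsize(struct toy_file *f, size_t new_size)
{
    memset(f->page + new_size, 0, TOY_BLOCK_SIZE - new_size);
}

/*
 * Replays falloc + pwrite + truncate with writeback landing between
 * toy_truncate_page() and toy_setsize(). Returns the first on-disk byte
 * past the new EOF, which is what a read sees once the page is reclaimed.
 */
unsigned char toy_truncate_race(bool flush_first, size_t new_size)
{
    struct toy_file f;

    assert(new_size < TOY_BLOCK_SIZE);
    memset(&f, 0, sizeof(f));
    f.block_unwritten = true;               /* falloc 0 4k */
    memset(f.page, 0xcd, TOY_BLOCK_SIZE);   /* pwrite 0 4k */
    f.page_dirty = true;

    if (flush_first)
        toy_writeback(&f);                  /* models the flush added by the fix */
    toy_truncate_page(&f, new_size);
    toy_writeback(&f);                      /* writeback inside the race window */
    toy_setsize(&f, new_size);
    toy_writeback(&f);                      /* later writeback; page may be clean */
    return f.disk[new_size];
}
```

In this model, toy_truncate_race(false, 1024) returns 0xcd (stale data left on disk), while toy_truncate_race(true, 1024) returns 0x00, because the early flush lets the page truncate see a written block and dirty the zeroed page.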
     

26 Sep, 2020

1 commit


10 Jun, 2020

1 commit

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

06 Jun, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A lot of bug fixes and cleanups for ext4, including:

    - Fix performance problems found in dioread_nolock now that it is the
    default, caused by transaction leaks.

    - Clean up fiemap handling in ext4

    - Clean up and refactor multiple block allocator (mballoc) code

    - Fix a problem with mballoc with smaller file systems running out
    of blocks because they couldn't properly use blocks that had been
    reserved by inode preallocation.

    - Fixed a race in ext4_sync_parent() versus rename()

    - Simplify the error handling in the extent manipulation code

    - Make sure all metadata I/O errors are reflected to
    ext4_ext_dirty()'s and ext4_mark_inode_dirty()'s callers.

    - Avoid passing an error pointer to brelse in ext4_xattr_set()

    - Fix race which could result in freeing an inode on the dirty list
    in data=journal mode.

    - Fix refcount handling if ext4_iget() fails

    - Fix a crash in generic/019 caused by a corrupted extent node"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
    ext4: avoid unnecessary transaction starts during writeback
    ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
    ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
    fs: remove the access_ok() check in ioctl_fiemap
    fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
    fs: move fiemap range validation into the file systems instances
    iomap: fix the iomap_fiemap prototype
    fs: move the fiemap definitions out of fs.h
    fs: mark __generic_block_fiemap static
    ext4: remove the call to fiemap_check_flags in ext4_fiemap
    ext4: split _ext4_fiemap
    ext4: fix fiemap size checks for bitmap files
    ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
    add comment for ext4_dir_entry_2 file_type member
    jbd2: avoid leaking transaction credits when unreserving handle
    ext4: drop ext4_journal_free_reserved()
    ext4: mballoc: use lock for checking free blocks while retrying
    ext4: mballoc: refactor ext4_mb_good_group()
    ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
    ext4: mballoc: refactor ext4_mb_discard_preallocations()
    ...

    Linus Torvalds
     

04 Jun, 2020

1 commit


20 May, 2020

1 commit

  • There are three extent counters per inode, one for each of
    the forks. Two are in the legacy icdinode and one is directly in
    struct xfs_inode. Switch to a single counter in the xfs_ifork structure
    where it uses up padding at the end of the structure. This simplifies
    various bits of code that just want the extent count and
    can now directly dereference it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Chandan Babu R
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

05 May, 2020

4 commits

  • The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() is
    nearly identical. The only difference is that *_to_linux() is called after
    inode setup and disallows changing the DAX flag.

    Combining them can be done with a flag which indicates if this is the initial
    setup to allow the DAX flag to be properly set only at init time.

    So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
    directly.

    While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
    use xfs_ip2xflags() to ensure future diflags are included correctly.

    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     
  • xfs_inode_supports_dax() should reflect whether the inode can support DAX,
    not whether it is enabled for DAX.

    Change the use of xfs_inode_supports_dax() to reflect only if the inode
    and underlying storage support dax.

    Add a new function xfs_inode_should_enable_dax() which reflects if the
    inode should be enabled for DAX.

    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     
  • In prep for the new tri-state mount option which then introduces
    XFS_MOUNT_DAX_NEVER.

    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     
  • The two if statements have the same condition, and the mask value
    does not change in xfs_setattr_nonsize(), so combine them.

    Signed-off-by: Kaixu Xia
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Kaixu Xia
     

19 Mar, 2020

1 commit

  • We know the version is 3 if on a v5 file system. For earlier file
    system formats we always upgrade the remaining v1 inodes to v2 and
    thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode
    helper to check if we deal with small or large dinodes, and thus
    remove the need for the di_version field in struct icdinode.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

03 Mar, 2020

4 commits

  • The ATTR_* flags have a long IRIX history, where they were a userspace
    interface, the on-disk format and an internal interface. We've split
    out the on-disk interface to the XFS_ATTR_* values, but despite (or
    because of?) that the flags have still been a mess. Switch the
    internal interface to pass the on-disk XFS_ATTR_* flags for the
    namespace and the Linux XATTR_* flags for the actual flags instead.
    The ATTR_* values that are actually used are moved to xfs_fs.h with a
    new XFS_IOC_* prefix to not conflict with the userspace version that
    has the same name and must have the same value.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Instead of converting from one style of arguments to another in
    xfs_attr_set, pass the structure from higher up in the call chain.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Use the Linux inode i_uid/i_gid members everywhere and just convert
    from/to the scalar value when reading or writing the on-disk inode.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Instead of only synchronizing the uid/gid values in xfs_setup_inode,
    ensure that they always match to prepare for removing the icdinode
    fields.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

10 Jan, 2020

1 commit

  • This helps to pre-simplify the extra handling of the null terminator in
    delayed operations which use memcpy rather than strlen. Later
    when we introduce parent pointers, attribute names will become binary,
    so strlen will not work at all. Removing uses of strlen now will
    help reduce complexity later.

    Signed-off-by: Allison Collins
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Allison Henderson
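
Why strlen stops working for binary names can be shown with a short userspace sketch. The `struct demo_name` type and both helpers are hypothetical and are not taken from the XFS sources; they only contrast a length-carrying name with a strlen-based view of the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute name: raw bytes plus an explicit length. */
struct demo_name {
    const unsigned char *bytes;
    size_t len;
};

/* Compare by length and content; embedded NUL bytes are handled. */
int demo_name_equal(struct demo_name a, struct demo_name b)
{
    return a.len == b.len && memcmp(a.bytes, b.bytes, a.len) == 0;
}

/*
 * What a strlen()-based caller would believe the length to be.
 * Only safe when the buffer contains at least one NUL byte.
 */
size_t demo_strlen_view(struct demo_name n)
{
    return strlen((const char *)n.bytes);
}
```

For the four-byte name {0x01, 0x02, 0x00, 0x03}, demo_strlen_view() reports 2, so two names that differ only after the embedded NUL would look identical to strlen-based code, while demo_name_equal() still tells them apart.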
     

14 Nov, 2019

3 commits


11 Nov, 2019

2 commits


05 Nov, 2019

1 commit


30 Oct, 2019

5 commits


28 Oct, 2019

1 commit

  • Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a
    given inode. Replace the existing xfs_find_bdev_for_inode and
    xfs_find_daxdev_for_inode helpers with this new general one and clean up
    some of the callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

22 Oct, 2019

2 commits


23 Aug, 2019

1 commit

  • Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
    fails on account of being out of disk quota. I ran his reproducer
    script:

    # adduser dummy
    # adduser dummy plugdev

    # dd if=/dev/zero bs=1M count=100 of=test.img
    # mkfs.xfs test.img
    # mount -t xfs -o gquota test.img /mnt
    # mkdir -p /mnt/dummy
    # chown -c dummy /mnt/dummy
    # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt

    (and then as user dummy)

    $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
    $ chgrp plugdev /mnt/dummy/foo

    and saw:

    ================================================
    WARNING: lock held when returning to user space!
    5.3.0-rc5 #rc5 Tainted: G W
    ------------------------------------------------
    chgrp/47006 is leaving the kernel with locks still held!
    1 lock held by chgrp/47006:
    #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]

    ...which is clearly caused by xfs_setattr_nonsize failing to unlock the
    ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
    unlock.

    Reported-by: benjamin.moody@gmail.com
    Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Tested-by: Salvatore Bonaccorso

    Darrick J. Wong
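
The class of bug fixed here, an error path that returns with a lock still held, can be sketched in plain C. This is a minimal model and not the actual patch: `struct demo_lock`, `demo_quota_reserve()` and `demo_chgrp()` are hypothetical names, and the goto-based unwind is just one common shape for making every exit release the lock.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that only counts how many times it is currently held. */
struct demo_lock {
    int held;
};

void demo_lock_acquire(struct demo_lock *l)
{
    l->held++;
}

void demo_lock_release(struct demo_lock *l)
{
    assert(l->held > 0);
    l->held--;
}

/* Hypothetical quota reservation: fails once usage exceeds the limit. */
int demo_quota_reserve(long used, long limit)
{
    return used > limit ? -EDQUOT : 0;
}

/* Every exit, including the failed reservation, goes through the unlock. */
int demo_chgrp(struct demo_lock *ilock, long used, long limit)
{
    int error;

    demo_lock_acquire(ilock);
    error = demo_quota_reserve(used, limit);
    if (error)
        goto out_unlock;    /* a bare "return error" here would leak the lock */

    /* ... the ownership change would be applied here ... */

out_unlock:
    demo_lock_release(ilock);
    return error;
}
```

After a failing call such as demo_chgrp(&lock, 50L * 1024 * 1024, 100L * 1024), the function returns -EDQUOT and lock.held is back to 0, which is the property the lockdep warning above shows being violated.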
     

29 Jun, 2019

1 commit

  • There are many, many xfs header files which are included but
    unneeded (or included twice) in the xfs code, so remove them.

    nb: xfs_linux.h includes about 9 headers for everyone, so those
    explicit includes get removed by this. I'm not sure what the
    preference is, but if we wanted explicit includes everywhere,
    a followup patch could remove those xfs_*.h includes from
    xfs_linux.h and move them into the files that need them.
    Or it could be left as-is.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     

02 Mar, 2019

1 commit

  • statx(2) notes that any attribute that is not indicated as supported by
    stx_attributes_mask has no usable value. Commit 5f955f26f3d42d ("xfs: report
    crtime and attribute flags to statx") added support for informing userspace
    of extra file attributes but forgot to list these flags as supported
    making reporting them rather useless for the pedantic userspace author.

    $ git describe --contains 5f955f26f3d42d04aba65590a32eb70eedb7f37d
    v4.11-rc6~5^2^2~2

    Fixes: 5f955f26f3d42d ("xfs: report crtime and attribute flags to statx")
    Signed-off-by: Luis R. Rodriguez
    Reviewed-by: Darrick J. Wong
    [darrick: add a comment reminding people to keep attributes_mask up to date]
    Signed-off-by: Darrick J. Wong

    Luis R. Rodriguez
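
From the userspace side, the rule quoted from statx(2) amounts to checking the mask before trusting a flag. The helper below is a hedged sketch: `demo_attr_query()` and the `DEMO_*` names are hypothetical, and the constant is assumed to match STATX_ATTR_IMMUTABLE from <linux/stat.h>. A real caller would pass the stx_attributes and stx_attributes_mask fields of struct statx.

```c
#include <assert.h>
#include <stdint.h>

enum demo_attr_state {
    DEMO_ATTR_UNKNOWN = -1, /* filesystem does not report this flag */
    DEMO_ATTR_CLEAR = 0,
    DEMO_ATTR_SET = 1
};

/* Assumed to equal STATX_ATTR_IMMUTABLE in <linux/stat.h>. */
#define DEMO_STATX_ATTR_IMMUTABLE 0x00000010ULL

/*
 * A flag in "attributes" is only meaningful when the same bit is
 * present in "mask"; otherwise its value must be treated as unknown.
 */
enum demo_attr_state demo_attr_query(uint64_t attributes, uint64_t mask,
                                     uint64_t flag)
{
    if (!(mask & flag))
        return DEMO_ATTR_UNKNOWN;
    return (attributes & flag) ? DEMO_ATTR_SET : DEMO_ATTR_CLEAR;
}
```

With a zero mask, as before this fix, a careful caller gets DEMO_ATTR_UNKNOWN even when the attribute bit itself is set, which is why the reported flags were of little use.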
     

15 Feb, 2019

1 commit

  • When XFS creates an O_TMPFILE file, the inode is created with nlink = 1,
    put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile.
    If we crash before anything logs the inode (it's dirty incore but the
    vfs doesn't tell us it's dirty so we never log that change), the iunlink
    processing part of recovery will then explode with a pile of:

    XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file:
    fs/xfs/xfs_log_recover.c, line: 5072

    Worse yet, since nlink is nonzero, the inodes also don't get cleaned up
    and they just leak until the next xfs_repair run.

    Therefore, change xfs_iunlink to require that inodes being put on the
    unlinked list have nlink == 0, change the tmpfile callers to instantiate
    inodes that way, and set the nlink to 1 just prior to calling d_tmpfile.
    Fix the comment for xfs_iunlink while we're at it.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

29 Sep, 2018

1 commit

  • The VFS routine that calls ->get_link blindly copies whatever's returned
    into the user's buffer. If we return a NULL pointer, the vfs will
    crash on the null pointer. Therefore, return -EFSCORRUPTED instead of
    blowing up the kernel.

    [dgc: clean up with hch's suggestions]

    Reported-by: wen.xu@gatech.edu
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Allison Henderson
    Signed-off-by: Dave Chinner

    Darrick J. Wong
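
Returning an error-encoded pointer instead of NULL can be sketched in userspace as follows. The `demo_*` helpers are simplified stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), `demo_get_link()` is a hypothetical function rather than the XFS implementation, and the fallback value 117 for EUCLEAN is an assumption for systems whose <errno.h> lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* XFS defines EFSCORRUPTED as EUCLEAN; fall back to 117 if it is missing. */
#ifdef EUCLEAN
#define DEMO_EFSCORRUPTED EUCLEAN
#else
#define DEMO_EFSCORRUPTED 117
#endif

/* Encode a negative errno in the top page of the address space. */
const char *demo_err_ptr(long err)
{
    return (const char *)(intptr_t)err;
}

/* True when the pointer value falls in the reserved error range. */
int demo_is_err(const char *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
long demo_ptr_err(const char *p)
{
    return (long)(intptr_t)p;
}

/* A missing link target is reported as corruption, never as NULL. */
const char *demo_get_link(const char *inline_target)
{
    if (inline_target == NULL)
        return demo_err_ptr(-DEMO_EFSCORRUPTED);
    return inline_target;
}
```

A caller that checks demo_is_err() before copying the result gets a clean error for a corrupted symlink, where a NULL return would have been dereferenced.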
     

14 Aug, 2018

1 commit

  • Pull xfs updates from Darrick Wong:
    "This is the second part of the XFS changes for 4.19.

    The biggest changes are the removal of buffer heads from XFS, a massive
    reworking of the deferred transaction operations handling code, the
    removal of the long defunct barrier/nobarrier mount options, and the
    addition of a few more online repair functions.

    Summary:

    - Use extent maps to track pagecache page status instead of
    bufferhead state.

    - Refactor pagecache read and write paths to use the new iomap
    library functions, which enable us to drop the old bufferhead code
    for pagesize == blocksize filesystems.

    - Set up parallel per-block-per-page metadata to track subpage
    information that was tracked by buffer heads, which enables us to
    drop the old bufferhead code for pagesize > blocksize filesystems.

    - Tie a deferred ops control structure to a transaction so that we
    can take advantage of an upper-level dfops without having to plumb
    pointer passing through the code.

    - Refactor the deferred ops code to track deferred ops as part of the
    transaction structure (instead of as a separate data structure) so
    that we can simplify the scoping rules around defer_ops.

    - Refactor twisty delwri buffer submission code to avoid deadlocks.

    - Shorten and fix indenting problems in the scrub code.

    - Detect obviously bad summary counts at mount and fix them.

    - Directly associate deferred ops control structure with a
    transaction so that callers no longer have to manage it themselves.

    - Remove a couple of IRIX-era inode macros.

    - Remove the long-deprecated 'barrier' and 'nobarrier' mount options.

    - Clean up the inode fork structure a bit.

    - Check for bad fs summary counter values in the superblock.

    - Reduce COW fork lookups during writeback.

    - Refactor the deferred ops control structures into the transaction
    structure, thereby eliminating the need for transaction users to
    handle the deferred ops as a separate data structure.

    - Add the ability to repair AG headers online.

    - Fix a crash due to insufficient return value checking.

    - Various fixes and cleanups"

    * tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits)
    xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
    xfs: remove b_last_holder & associated macros
    iomap: Switch to offset_in_page for clarity
    xfs: Close race between direct IO and xfs_break_layouts()
    xfs: repair the AGI
    xfs: repair the AGFL
    xfs: repair the AGF
    xfs: remove dead error handling code in xfs_dquot_disk_alloc()
    xfs: use WRITE_ONCE to update if_seq
    xfs: fix a comment in xfs_log_reserve
    xfs: only validate summary counts on primary superblock
    xfs: substitute spaces with tabs
    xfs: fold dfops into the transaction
    xfs: always defer agfl block frees
    xfs: pass transaction to xfs_defer_add()
    xfs: replace xfs_defer_ops ->dop_pending with on-stack list
    xfs: cancel dfops on xfs_defer_finish() error
    xfs: clean out superfluous dfops dop params/vars
    xfs: drop dop param from xfs_defer_op_type ->finish_item() callback
    xfs: automatic dfops inode relogging
    ...

    Linus Torvalds
     

04 Aug, 2018

1 commit


27 Jul, 2018

3 commits

  • Replace the IRELE macro with a proper function so that we can do proper
    typechecking and so that we can stop open-coding iput in scrub, which
    means that we'll be able to ftrace inode lifetimes going through scrub
    correctly.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Brian Foster

    Darrick J. Wong
     
  • At this point, the transaction subsystem completely manages deferred
    items internally such that the common and boilerplate
    xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() ->
    xfs_trans_commit() sequence can be replaced with a simple
    transaction allocation and commit.

    Remove all such boilerplate deferred ops code. In doing so, we
    change each case over to use the dfops in the transaction and
    specifically eliminate:

    - The on-stack dfops and associated xfs_defer_init() call, as the
    internal dfops is initialized on transaction allocation.
    - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of
    a transaction.
    - xfs_defer_cancel() calls in error handlers that precede a
    transaction cancel.

    The only deferred ops calls that remain are those that are
    non-deterministic with respect to the final commit of the associated
    transaction or are open-coded due to special handling.

    Signed-off-by: Brian Foster
    Reviewed-by: Bill O'Donnell
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • xfs_itruncate_extents[_flags]() uses a local dfops with a
    transaction provided by the caller. It uses hacky ->t_dfops
    replacement logic to avoid stomping over an already populated
    ->t_dfops.

    The latter never occurs for current callers and the logic itself is
    not really appropriate. Clean this up by updating all callers to
    initialize a dfops and to use that down in xfs_itruncate_extents().
    This more closely resembles the upcoming logic where dfops will be
    embedded within the transaction. We can also replace the
    xfs_defer_init() in the xfs_itruncate_extents_flags() loop with an
    assert. Both dfops and firstblock should be in a valid state
    after xfs_defer_finish() and the inode joined to the dfops is fixed
    throughout the loop.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bill O'Donnell
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster