05 Nov, 2020
1 commit
-
It is possible to expose non-zeroed post-EOF data in XFS if the new
EOF page is dirty, backed by an unwritten block and the truncate
happens to race with writeback. iomap_truncate_page() will not zero
the post-EOF portion of the page if the underlying block is
unwritten. The subsequent call to truncate_setsize() will, but
doesn't dirty the page. Therefore, if writeback happens to complete
after iomap_truncate_page() (so it still sees the unwritten block)
but before truncate_setsize(), the cached page becomes inconsistent
with the on-disk block. A mapped read after the associated page is
reclaimed or invalidated exposes non-zero post-EOF data.

For example, consider the following sequence when run on a kernel
modified to explicitly flush the new EOF page within the race
window:

$ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
$ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
...
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: 00 00 00 00 00 00 00 00 ........
$ umount /mnt/; mount /mnt/
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: cd cd cd cd cd cd cd cd ........

Update xfs_setattr_size() to explicitly flush the new EOF page prior
to the page truncate to ensure iomap has the latest state of the
underlying block.

Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
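
The invariant this fix protects (bytes between the new EOF and the end of the page read back as zero) can be checked from userspace. The following is a minimal sketch, not the reproducer from the commit message: the function name post_eof_is_zeroed is made up for illustration, it assumes a POSIX system, and it cannot open the race window itself, since that needs the modified kernel and the remount shown above.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 4 KiB of 0xcd, truncate the file to 1 KiB, then read the 8 bytes
 * that follow the new EOF through a shared mapping.
 * Returns 1 if they are all zero, 0 if any is non-zero, -1 on setup error.
 */
int post_eof_is_zeroed(const char *path)
{
	unsigned char buf[4096];
	unsigned char *map;
	int fd, i, zeroed = 1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Best-effort preallocation, mirroring "falloc 0 4k"; errors ignored. */
	(void)posix_fallocate(fd, 0, sizeof(buf));

	memset(buf, 0xcd, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf) ||
	    ftruncate(fd, 1024) != 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, sizeof(buf), PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Same range as "mread -v 1k 8" in the commit message. */
	for (i = 0; i < 8; i++)
		if (map[1024 + i] != 0)
			zeroed = 0;

	munmap(map, sizeof(buf));
	close(fd);
	return zeroed;
}
```

On a kernel without the bug, or whenever the race is not hit, the function is expected to return 1 for any writable path.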
26 Sep, 2020
1 commit
-
The current create and mkdir handlers both call xfs_vn_mknod(),
which is a wrapper around xfs_generic_create(). The create and
mkdir handlers can call xfs_generic_create() directly and shorten
the call chain.

Signed-off-by: Kaixu Xia
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
10 Jun, 2020
1 commit
-
Convert comments that reference mmap_sem to reference mmap_lock instead.
[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Daniel Jordan
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds
06 Jun, 2020
1 commit
-
Pull ext4 updates from Ted Ts'o:
"A lot of bug fixes and cleanups for ext4, including:

- Fix performance problems found in dioread_nolock now that it is the
  default, caused by transaction leaks.
- Clean up fiemap handling in ext4
- Clean up and refactor multiple block allocator (mballoc) code
- Fix a problem with mballoc with smaller file systems running out
  of blocks because they couldn't properly use blocks that had been
  reserved by inode preallocation.
- Fixed a race in ext4_sync_parent() versus rename()
- Simplify the error handling in the extent manipulation code
- Make sure all metadata I/O errors are reflected to
  ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.
- Avoid passing an error pointer to brelse in ext4_xattr_set()
- Fix race which could result in freeing an inode on the dirty list
  in data=journal mode.
- Fix refcount handling if ext4_iget() fails
- Fix a crash in generic/019 caused by a corrupted extent node"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: avoid unnecessary transaction starts during writeback
ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
fs: remove the access_ok() check in ioctl_fiemap
fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
fs: move fiemap range validation into the file systems instances
iomap: fix the iomap_fiemap prototype
fs: move the fiemap definitions out of fs.h
fs: mark __generic_block_fiemap static
ext4: remove the call to fiemap_check_flags in ext4_fiemap
ext4: split _ext4_fiemap
ext4: fix fiemap size checks for bitmap files
ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
add comment for ext4_dir_entry_2 file_type member
jbd2: avoid leaking transaction credits when unreserving handle
ext4: drop ext4_journal_free_reserved()
ext4: mballoc: use lock for checking free blocks while retrying
ext4: mballoc: refactor ext4_mb_good_group()
ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
ext4: mballoc: refactor ext4_mb_discard_preallocations()
...
04 Jun, 2020
1 commit
-
No need to pull the fiemap definitions into almost every file in the
kernel build.

Signed-off-by: Christoph Hellwig
Reviewed-by: Ritesh Harjani
Reviewed-by: Darrick J. Wong
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o
20 May, 2020
1 commit
-
There are three extent counters per inode, one for each of
the forks. Two are in the legacy icdinode and one is directly in
struct xfs_inode. Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure. This simplifies
various bits of code that just want the extent count and
can now directly dereference it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Chandan Babu R
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
05 May, 2020
4 commits
-
The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() is
nearly identical. The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Ira Weiny
Signed-off-by: Darrick J. Wong
-
xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_should_enable_dax() which reflects if the
inode should be enabled for DAX.

Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Ira Weiny
Signed-off-by: Darrick J. Wong
-
In prep for the new tri-state mount option which then introduces
XFS_MOUNT_DAX_NEVER.

Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Ira Weiny
Signed-off-by: Darrick J. Wong
-
The two if statements have the same condition, and the mask value
does not change in xfs_setattr_nonsize(), so combine them.

Signed-off-by: Kaixu Xia
Reviewed-by: Chaitanya Kulkarni
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
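
The first commit in this group folds xfs_diflags_to_linux() into xfs_diflags_to_iflags() by adding an "initial setup" flag. The shape of that change can be sketched in userspace as follows. This is only a model under stated assumptions: every MODEL_* constant, struct model_inode, and model_diflags_to_iflags() are hypothetical stand-ins, not the kernel's definitions, and the choice to leave the DAX bit untouched when clearing is an assumption drawn from the commit's statement that DAX may only be set at init time.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the on-disk style flag bits. */
#define MODEL_XFLAG_IMMUTABLE	0x1u
#define MODEL_XFLAG_APPEND	0x2u
#define MODEL_XFLAG_DAX		0x4u

/* Hypothetical stand-ins for the in-core VFS style flag bits. */
#define MODEL_S_IMMUTABLE	0x10u
#define MODEL_S_APPEND		0x20u
#define MODEL_S_DAX		0x40u

struct model_inode {
	unsigned int xflags;	/* on-disk style flags */
	unsigned int i_flags;	/* in-core flags derived from xflags */
};

/*
 * One helper serves both callers: init == true during inode setup,
 * init == false for later flag updates.
 */
void model_diflags_to_iflags(struct model_inode *ip, bool init)
{
	unsigned int flags = 0;

	if (ip->xflags & MODEL_XFLAG_IMMUTABLE)
		flags |= MODEL_S_IMMUTABLE;
	if (ip->xflags & MODEL_XFLAG_APPEND)
		flags |= MODEL_S_APPEND;

	/* DAX may only be turned on while the inode is being set up. */
	if (init && (ip->xflags & MODEL_XFLAG_DAX))
		flags |= MODEL_S_DAX;

	/* Clear only the bits that may change later; the DAX bit is kept. */
	ip->i_flags &= ~(MODEL_S_IMMUTABLE | MODEL_S_APPEND);
	ip->i_flags |= flags;
}
```

Calling the helper with init set to false on an inode whose DAX bit is already set leaves that bit alone, which is the behavior the removed *_to_linux() variant existed to provide.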
19 Mar, 2020
1 commit
-
We know the version is 3 if on a v5 file system. For earlier file
system formats we always upgrade the remaining v1 inodes to v2 and
thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode
helper to check if we deal with small or large dinodes, and thus
remove the need for the di_version field in struct icdinode.

Signed-off-by: Christoph Hellwig
Reviewed-by: Brian Foster
Reviewed-by: Chandan Rajendra
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
03 Mar, 2020
4 commits
-
The ATTR_* flags have a long IRIX history, where they are a userspace
interface, the on-disk format and an internal interface. We've split
out the on-disk interface to the XFS_ATTR_* values, but despite (or
because?) of that the flags have still been a mess. Switch the
internal interface to pass the on-disk XFS_ATTR_* flags for the
namespace and the Linux XATTR_* flags for the actual flags instead.
The ATTR_* values that are actually used are moved to xfs_fs.h with a
new XFS_IOC_* prefix to not conflict with the userspace version that
has the same name and must have the same value.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Chandan Rajendra
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Instead of converting from one style of arguments to another in
xfs_attr_set, pass the structure from higher up in the call chain.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Chandan Rajendra
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Use the Linux inode i_uid/i_gid members everywhere and just convert
from/to the scalar value when reading or writing the on-disk inode.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Instead of only synchronizing the uid/gid values in xfs_setup_inode,
ensure that they always match to prepare for removing the icdinode
fields.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
10 Jan, 2020
1 commit
-
This helps to pre-simplify the extra handling of the null terminator in
delayed operations which use memcpy rather than strlen. Later
when we introduce parent pointers, attribute names will become binary,
so strlen will not work at all. Removing uses of strlen now will
help reduce complexities later.

Signed-off-by: Allison Collins
Reviewed-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong
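
The reason strlen stops working once names become binary can be shown with a small userspace sketch. It assumes a hypothetical descriptor, struct model_name, that carries the name bytes together with an explicit length; the struct and both helpers are illustrative names, not the kernel's interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name descriptor: raw bytes plus an explicit length. */
struct model_name {
	const unsigned char	*bytes;
	size_t			len;
};

/* Compare by length and content, so embedded NUL bytes are significant. */
bool model_name_equal(const struct model_name *a, const struct model_name *b)
{
	return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}

/*
 * Copy a name with memcpy using the stored length, never a terminator.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
size_t model_name_copy(unsigned char *dst, size_t dst_size,
		       const struct model_name *src)
{
	if (src->len > dst_size)
		return 0;
	memcpy(dst, src->bytes, src->len);
	return src->len;
}
```

Two names such as { 'a', 0, 'b' } and { 'a', 0, 'c' } look identical to strlen and strcmp, yet compare as different with the length-carrying helper.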
14 Nov, 2019
3 commits
-
There is no point in splitting the fields like this in a purely
in-memory structure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
struct xfs_icdinode is purely an in-memory data structure, so don't use
a log on-disk structure for it. This simplifies the code a bit, and
also reduces our include hell slightly.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
[darrick: fix a minor indenting problem in xfs_trans_ichgtime]
Signed-off-by: Darrick J. Wong
-
Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
11 Nov, 2019
2 commits
-
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Move the node header size field to struct xfs_da_geometry, and remove
the now unused non-directory dir ops infrastructure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
05 Nov, 2019
1 commit
-
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong
Reviewed-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
30 Oct, 2019
5 commits
-
Replace XFS_MOUNT_COMPAT_IOSIZE with an inverted XFS_MOUNT_LARGEIO flag
that makes the usage more clear.

Signed-off-by: Christoph Hellwig
Reviewed-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Make the flag match the mount option and usage.
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Use the allocsize name to match the mount option and usage instead.
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
m_readio_blocks is entirely unused, and m_readio_log is only used in
xfs_stat_blksize in a max statement that is a no-op as it always has
the same value as m_writeio_log.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Move xfs_preferred_iosize to xfs_iops.c, unobfuscate it and also handle
the realtime special case in the helper.

Signed-off-by: Christoph Hellwig
Reviewed-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
28 Oct, 2019
1 commit
-
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a
given inode. Replace the existing xfs_find_bdev_for_inode and
xfs_find_daxdev_for_inode helpers with this new general one and cleanup
some of the callers.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
22 Oct, 2019
2 commits
-
Instead of lots of magic conditionals in the main write_begin
handler this makes the intent very clear. Things will become even
better once we support delayed allocations for extent size hints
and realtime allocations.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
Start untangling xfs_file_iomap_begin by splitting out the read-only
case into its own set of iomap_ops with a very simple iomap_begin
helper.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
23 Aug, 2019
1 commit
-
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
fails on account of being out of disk quota. I ran his reproducer
script:

# adduser dummy
# adduser dummy plugdev

# dd if=/dev/zero bs=1M count=100 of=test.img
# mkfs.xfs test.img
# mount -t xfs -o gquota test.img /mnt
# mkdir -p /mnt/dummy
# chown -c dummy /mnt/dummy
# xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt

(and then as user dummy)

$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
$ chgrp plugdev /mnt/dummy/foo

and saw:
================================================
WARNING: lock held when returning to user space!
5.3.0-rc5 #rc5 Tainted: G W
------------------------------------------------
chgrp/47006 is leaving the kernel with locks still held!
1 lock held by chgrp/47006:
#0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]

...which is clearly caused by xfs_setattr_nonsize failing to unlock the
ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
unlock.

Reported-by: benjamin.moody@gmail.com
Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Tested-by: Salvatore Bonaccorso
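
The class of bug fixed here, an early return that skips an unlock, can be modeled in userspace. The sketch below is not the kernel code: struct model_lock, model_reserve(), and model_chown() are hypothetical names, the lock is a plain counter, and the real function also has to deal with a transaction that this model leaves out. It only shows the single-exit pattern that keeps the failure path from leaving the lock held.

```c
#include <errno.h>

/* Hypothetical lock: a counter of outstanding acquisitions. */
struct model_lock {
	int held;
};

static void model_lock_acquire(struct model_lock *l) { l->held++; }
static void model_lock_release(struct model_lock *l) { l->held--; }

/* Hypothetical quota reservation that fails when the request exceeds the limit. */
static int model_reserve(long blocks, long limit)
{
	return blocks > limit ? -EDQUOT : 0;
}

/*
 * Take the lock, try to reserve quota, and leave through one exit label so
 * that the failure path releases the lock as well.
 */
int model_chown(struct model_lock *ilock, long blocks, long quota_limit)
{
	int error;

	model_lock_acquire(ilock);

	error = model_reserve(blocks, quota_limit);
	if (error)
		goto out_unlock;

	/* ... the ownership change would be applied here, under the lock ... */

out_unlock:
	model_lock_release(ilock);
	return error;
}
```

After a failing call such as model_chown(&lock, 500, 100), lock.held is back at zero, which is the property the lockdep warning above reports as violated.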
29 Jun, 2019
1 commit
-
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
02 Mar, 2019
1 commit
-
statx(2) notes that any attribute that is not indicated as supported by
stx_attributes_mask has no usable value. Commit 5f955f26f3d42d ("xfs: report
crtime and attribute flags to statx") added support for informing userspace
of extra file attributes but forgot to list these flags as supported
making reporting them rather useless for the pedantic userspace author.

$ git describe --contains 5f955f26f3d42d04aba65590a32eb70eedb7f37d
v4.11-rc6~5^2^2~2

Fixes: 5f955f26f3d42d ("xfs: report crtime and attribute flags to statx")
Signed-off-by: Luis R. Rodriguez
Reviewed-by: Darrick J. Wong
[darrick: add a comment reminding people to keep attributes_mask up to date]
Signed-off-by: Darrick J. Wong
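
The rule from statx(2) that this commit relies on, that a bit in stx_attributes means something only if the same bit is set in stx_attributes_mask, can be expressed as a small pure helper. This is a sketch: MODEL_ATTR_IMMUTABLE, the enum, and model_attr_lookup() are hypothetical names chosen for illustration, and the two 64-bit arguments stand for the stx_attributes and stx_attributes_mask values a caller would have read from statx.

```c
#include <stdint.h>

/* Hypothetical attribute bit used only for illustration. */
#define MODEL_ATTR_IMMUTABLE	UINT64_C(0x10)

enum model_attr_state {
	MODEL_ATTR_UNKNOWN,	/* filesystem does not report this bit */
	MODEL_ATTR_CLEAR,	/* reported, and not set */
	MODEL_ATTR_SET		/* reported, and set */
};

/* Interpret one attribute bit the way a strict statx consumer would. */
enum model_attr_state model_attr_lookup(uint64_t attributes,
					uint64_t attributes_mask,
					uint64_t bit)
{
	if (!(attributes_mask & bit))
		return MODEL_ATTR_UNKNOWN;
	return (attributes & bit) ? MODEL_ATTR_SET : MODEL_ATTR_CLEAR;
}
```

With an empty mask, a strict caller gets MODEL_ATTR_UNKNOWN even when the attribute bit itself is set, which is why the flags had to be listed as supported.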
15 Feb, 2019
1 commit
-
When XFS creates an O_TMPFILE file, the inode is created with nlink = 1,
put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile.
If we crash before anything logs the inode (it's dirty incore but the
vfs doesn't tell us it's dirty so we never log that change), the iunlink
processing part of recovery will then explode with a pile of:

XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file:
fs/xfs/xfs_log_recover.c, line: 5072

Worse yet, since nlink is nonzero, the inodes also don't get cleaned up
and they just leak until the next xfs_repair run.

Therefore, change xfs_iunlink to require that inodes being put on the
unlinked list have nlink == 0, change the tmpfile callers to instantiate
nodes that way, and set the nlink to 1 just prior to calling d_tmpfile.
Fix the comment for xfs_iunlink while we're at it.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
29 Sep, 2018
1 commit
-
The VFS routine that calls ->get_link blindly copies whatever's returned
into the user's buffer. If we return a NULL pointer, the vfs will
crash on the null pointer. Therefore, return -EFSCORRUPTED instead of
blowing up the kernel.

[dgc: clean up with hch's suggestions]
Reported-by: wen.xu@gatech.edu
Signed-off-by: Darrick J. Wong
Reviewed-by: Allison Henderson
Signed-off-by: Dave Chinner
14 Aug, 2018
1 commit
-
Pull xfs updates from Darrick Wong:
"This is the second part of the XFS changes for 4.19.

The biggest changes are the removal of buffer heads from XFS, a massive
reworking of the deferred transaction operations handling code, the
removal of the long defunct barrier/nobarrier mount options, and the
addition of a few more online repair functions.

Summary:

- Use extent maps to track pagecache page status instead of
  bufferhead state.
- Refactor pagecache read and write paths to use the new iomap
  library functions, which enable us to drop the old bufferhead code
  for pagesize == blocksize filesystems.
- Set up parallel per-block-per-page metadata to track subpage
  information that was tracked by buffer heads, which enables us to
  drop the old bufferhead code for pagesize > blocksize filesystems.
- Tie a deferred ops control structure to a transaction so that we
  can take advantage of an upper-level dfops without having to plumb
  pointer passing through the code.
- Refactor the deferred ops code to track deferred ops as part of the
  transaction structure (instead of as a separate data structure) so
  that we can simplify the scoping rules around defer_ops.
- Refactor twisty delwri buffer submission code to avoid deadlocks.
- Shorten and fix indenting problems in the scrub code.
- Detect obviously bad summary counts at mount and fix them.
- Directly associate deferred ops control structure with a
  transaction so that callers no longer have to manage it themselves.
- Remove a couple of IRIX-era inode macros.
- Remove the long-deprecated 'barrier' and 'nobarrier' mount options.
- Clean up the inode fork structure a bit.
- Check for bad fs summary counter values in the superblock.
- Reduce COW fork lookups during writeback.
- Refactor the deferred ops control structures into the transaction
  structure, thereby eliminating the need for transaction users to
  handle the deferred ops as a separate data structure.
- Add the ability to repair AG headers online.
- Fix a crash due to insufficient return value checking.
- Various fixes and cleanups"

* tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits)
xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
xfs: remove b_last_holder & associated macros
iomap: Switch to offset_in_page for clarity
xfs: Close race between direct IO and xfs_break_layouts()
xfs: repair the AGI
xfs: repair the AGFL
xfs: repair the AGF
xfs: remove dead error handling code in xfs_dquot_disk_alloc()
xfs: use WRITE_ONCE to update if_seq
xfs: fix a comment in xfs_log_reserve
xfs: only validate summary counts on primary superblock
xfs: substitute spaces with tabs
xfs: fold dfops into the transaction
xfs: always defer agfl block frees
xfs: pass transaction to xfs_defer_add()
xfs: replace xfs_defer_ops ->dop_pending with on-stack list
xfs: cancel dfops on xfs_defer_finish() error
xfs: clean out superfluous dfops dop params/vars
xfs: drop dop param from xfs_defer_op_type ->finish_item() callback
xfs: automatic dfops inode relogging
...
04 Aug, 2018
1 commit
-
open-coded in quite a few places...
Signed-off-by: Al Viro
27 Jul, 2018
3 commits
-
Replace the IRELE macro with a proper function so that we can do proper
typechecking and so that we can stop open-coding iput in scrub, which
means that we'll be able to ftrace inode lifetimes going through scrub
correctly.

Signed-off-by: Darrick J. Wong
Reviewed-by: Carlos Maiolino
Reviewed-by: Brian Foster
-
At this point, the transaction subsystem completely manages deferred
items internally such that the common and boilerplate
xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() ->
xfs_trans_commit() sequence can be replaced with a simple
transaction allocation and commit.

Remove all such boilerplate deferred ops code. In doing so, we
change each case over to use the dfops in the transaction and
specifically eliminate:

- The on-stack dfops and associated xfs_defer_init() call, as the
  internal dfops is initialized on transaction allocation.
- xfs_bmap_finish() calls that precede a final xfs_trans_commit() of
  a transaction.
- xfs_defer_cancel() calls in error handlers that precede a
  transaction cancel.

The only deferred ops calls that remain are those that are
non-deterministic with respect to the final commit of the associated
transaction or are open-coded due to special handling.

Signed-off-by: Brian Foster
Reviewed-by: Bill O'Donnell
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
xfs_itruncate_extents[_flags]() uses a local dfops with a
transaction provided by the caller. It uses hacky ->t_dfops
replacement logic to avoid stomping over an already populated
->t_dfops.

The latter never occurs for current callers and the logic itself is
not really appropriate. Clean this up by updating all callers to
initialize a dfops and to use that down in xfs_itruncate_extents().
This more closely resembles the upcoming logic where dfops will be
embedded within the transaction. We can also replace the
xfs_defer_init() in the xfs_itruncate_extents_flags() loop with an
assert. Both dfops and firstblock should be in a valid state
after xfs_defer_finish() and the inode joined to the dfops is fixed
throughout the loop.

Signed-off-by: Brian Foster
Reviewed-by: Christoph Hellwig
Reviewed-by: Bill O'Donnell
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong