07 Aug, 2017
1 commit
-
Pull ext4 fixes from Ted Ts'o:
"A large number of ext4 bug fixes and cleanups for v4.13"* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix copy paste error in ext4_swap_extents()
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ext4, project: expand inode extra size if possible
ext4: cleanup ext4_expand_extra_isize_ea()
ext4: restructure ext4_expand_extra_isize
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
ext4: make xattr inode reads faster
ext4: inplace xattr block update fails to deduplicate blocks
ext4: remove unused mode parameter
ext4: fix warning about stack corruption
ext4: fix dir_nlink behaviour
ext4: silence array overflow warning
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: release discard bio after sending discard commands
ext4: convert swap_inode_data() over to use swap() on most of the fields
ext4: error should be cleared if ea_inode isn't added to the cache
ext4: Don't clear SGID when inheriting ACLs
ext4: preserve i_mode if __ext4_set_acl() fails
ext4: remove unused metadata accounting variables
ext4: correct comment references to ext4_ext_direct_IO()
06 Aug, 2017
14 commits
-
This bug was found by a static code checker tool for copy paste
problems.Signed-off-by: Maninder Singh
Signed-off-by: Vaneet Narang
Signed-off-by: Theodore Ts'o -
On a 32-bit platform, the value of n_blcoks_count may be wrong during
the file system is resized to size larger than 2^32 blocks. This may
caused the superblock being corrupted with zero blocks count.Fixes: 1c6bd7173d66
Signed-off-by: Jerry Lee
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org # 3.7+ -
When upgrading from old format, try to set project id
to old file first time, it will return EOVERFLOW, but if
that file is dirtied(touch etc), changing project id will
be allowed, this might be confusing for users, we could
try to expand @i_extra_isize here too.Reported-by: Zhang Yi
Signed-off-by: Miao Xie
Signed-off-by: Wang Shilong
Signed-off-by: Theodore Ts'o -
Clean up some goto statement, make ext4_expand_extra_isize_ea() clearer.
Signed-off-by: Miao Xie
Signed-off-by: Theodore Ts'o
Reviewed-by: Wang Shilong -
Current ext4_expand_extra_isize just tries to expand extra isize, if
someone is holding xattr lock or some check fails, it will give up.
So rename its name to ext4_try_to_expand_extra_isize.Besides that, we clean up unnecessary check and move some relative checks
into it.Signed-off-by: Miao Xie
Signed-off-by: Theodore Ts'o
Reviewed-by: Wang Shilong -
We should avoid the contention between the i_extra_isize update and
the inline data insertion, so move the xattr trylock in front of
i_extra_isize update.Signed-off-by: Miao Xie
Reviewed-by: Wang Shilong -
ext4_xattr_inode_read() currently reads each block sequentially while
waiting for io operation to complete before moving on to the next
block. This prevents request merging in block layer.Add a ext4_bread_batch() function that starts reads for all blocks
then optionally waits for them to complete. A similar logic is used
in ext4_find_entry(), so update that code to use the new function.Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o -
When an xattr block has a single reference, block is updated inplace
and it is reinserted to the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just inserted entry so
deduplication is not achieved.Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
mount /dev/sdb /mnt/sdbtouch /mnt/sdb/{x,y}
setfattr -n user.1 -v aaa /mnt/sdb/x
setfattr -n user.2 -v bbb /mnt/sdb/xsetfattr -n user.1 -v aaa /mnt/sdb/y
setfattr -n user.2 -v bbb /mnt/sdb/ydebugfs -R 'stat x' /dev/sdb | cat
debugfs -R 'stat y' /dev/sdb | catThis patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger -
ext4_alloc_file_blocks() does not use its mode parameter. Remove it.
Signed-off-by: Tahsin Erdogan
Signed-off-by: Theodore Ts'o -
After commit 62d1034f53e3 ("fortify: use WARN instead of BUG for now"),
we get a warning about possible stack overflow from a memcpy that
was not strictly bounded to the size of the local variable:inlined from 'ext4_mb_seq_groups_show' at fs/ext4/mballoc.c:2322:2:
include/linux/string.h:309:9: error: '__builtin_memcpy': writing between 161 and 1116 bytes into a region of size 160 overflows the destination [-Werror=stringop-overflow=]We actually had a bug here that would have been found by the warning,
but it was already fixed last year in commit 30a9d7afe70e ("ext4: fix
stack memory corruption with 64k block size").This replaces the fixed-length structure on the stack with a variable-length
structure, using the correct upper bound that tells the compiler that
everything is really fine here. I also change the loop count to check
for the same upper bound for consistency, but the existing code is
already correct here.Note that while clang won't allow certain kinds of variable-length arrays
in structures, this particular instance is fine, as the array is at the
end of the structure, and the size is strictly bounded.Signed-off-by: Arnd Bergmann
Signed-off-by: Theodore Ts'o -
The dir_nlink feature has been enabled by default for new ext4
filesystems since e2fsprogs-1.41 in 2008, and was automatically
enabled by the kernel for older ext4 filesystems since the
dir_nlink feature was added with ext4 in kernel 2.6.28+ when
the subdirectory count exceeded EXT4_LINK_MAX-1.Automatically adding the file system features such as dir_nlink is
generally frowned upon, since it could cause the file system to not be
mountable on older kernel, thus preventing the administrator from
rolling back to an older kernel if necessary.In this case, the administrator might also want to disable the feature
because glibc's fts_read() function does not correctly optimize
directory traversal for directories that use st_nlinks field of 1 to
indicate that the number of links in the directory are not tracked by
the file system, and could fail to traverse the full directory
hierarchy. Fortunately, in the past ten years very few users have
complained about incomplete file system traversal by glibc's
fts_read().This commit also changes ext4_inc_count() to allow i_nlinks to reach
the full EXT4_LINK_MAX links on the parent directory (including "."
and "..") before changing i_links_count to be 1.Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405
Signed-off-by: Andreas Dilger
Signed-off-by: Theodore Ts'o -
I get a static checker warning:
fs/ext4/ext4.h:3091 ext4_set_de_type()
error: buffer overflow 'ext4_type_by_mode' 15 -
ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" fooIn this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.Fix the problem by neglecting buffers in a page before starting offset.
Reported-by: Andreas Gruenbacher
Signed-off-by: Theodore Ts'o
Signed-off-by: Jan Kara
CC: stable@vger.kernel.org # 3.8+ -
We've changed the discard command handling into parallel manner.
But, in this change, I forgot decreasing the usage count of the bio
which was used to send discard request. I'm sorry about that.Fixes: a015434480dc ("ext4: send parallel discards on commit completions")
Signed-off-by: Daeho Jeong
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara
04 Aug, 2017
1 commit
-
Merge misc fixes from Andrew Morton:
"15 fixes"[ This does not merge the "fortify: use WARN instead of BUG for now"
patch, which needs a bit of extra work to build cleanly with all
configurations. Arnd is on it. - Linus ]* emailed patches from Andrew Morton :
ocfs2: don't clear SGID when inheriting ACLs
mm: allow page_cache_get_speculative in interrupt context
userfaultfd: non-cooperative: flush event_wqh at release time
ipc: add missing container_of()s for randstruct
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
userfaultfd_zeropage: return -ENOSPC in case mm has gone
mm: take memory hotplug lock within numa_zonelist_order_handler()
mm/page_io.c: fix oops during block io poll in swapin path
zram: do not free pool->size_class
kthread: fix documentation build warning
kasan: avoid -Wmaybe-uninitialized warning
userfaultfd: non-cooperative: notify about unmap of destination during mremap
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
pid: kill pidhash_size in pidhash_init()
mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
03 Aug, 2017
4 commits
-
Pull NFS client fixes from Anna Schumaker:
"Two fixes from Trond this time, now that he's back from his vacation.
The first is a stable fix for the EXCHANGE_ID issue on the mailing
list, and the other fixes a double-free situation that he found at the
same time.Stable fix:
- Fix EXCHANGE_ID corrupt verifier issueOther fix:
- Fix double frees in nfs4_test_session_trunk()"* tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Fix double frees in nfs4_test_session_trunk()
NFSv4: Fix EXCHANGE_ID corrupt verifier issue -
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl(). That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway. Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.Fixes: 073931017b4 ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There may still be threads waiting on event_wqh at the time the
userfault file descriptor is closed. Flush the events wait-queue to
prevent waiting threads from hanging.Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 9cd75c3cd4c3d ("userfaultfd: non-cooperative: add ability to report
non-PF events from uffd descriptor")
Signed-off-by: Mike Rapoport
Cc: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: Pavel Emelyanov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the non-cooperative userfaultfd case, the process exit may race with
outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
instead of -EINVAL when mm is already gone will allow uffd monitor to
distinguish this case from other error conditions.Unfortunately I overlooked userfaultfd_zeropage when updating
userfaultd_copy().Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 96333187ab162 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
Signed-off-by: Mike Rapoport
Cc: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: Pavel Emelyanov
Cc: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Aug, 2017
2 commits
-
rpc_clnt_add_xprt() expects the callback function to be synchronous, and
expects to release the transport and switch references itself.Fixes: 04fa2c6bb51b1 ("NFS pnfs data server multipath session trunking")
Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker -
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70bc939. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.Fixes: 8d89bd70bc939 ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust
Signed-off-by: Anna Schumaker
31 Jul, 2017
6 commits
-
For some odd reason, it forces a byte-by-byte copy of each field. A
plain old swap() on most of these fields would be more efficient. We
do need to retain the memswap of i_data however as that field is an array.Signed-off-by: Theodore Ts'o
Signed-off-by: Jeff Layton
Reviewed-by: Jan Kara -
For Lustre, if ea_inode fails in hash validation but passes parent
inode and generation checks, it won't be added to the cache as well
as the error "-EFSCORRUPTED" should be cleared, otherwise it will
cause "Structure needs cleaning" when running getfattr command.Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723
Cc: stable@vger.kernel.org
Fixes: dec214d00e0d78a08b947d7dccdfdb84407a9f4d
Signed-off-by: Emoly Liu
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger
Reviewed-by: tahsin@google.com -
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.Fix the problem by moving posix_acl_update_mode() out of
__ext4_set_acl() into ext4_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o
Signed-off-by: Jan Kara
Reviewed-by: Andreas Gruenbacher -
When changing a file's acl mask, __ext4_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.Prevent this by only changing the inode mode after the acl has been set.
Signed-off-by: Ernesto A. Fernández
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara -
Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused. Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.Signed-off-by: Eric Whitney
Signed-off-by: Theodore Ts'o
Reviewed-by: Jan Kara -
Commit 914f82a32d0268847 "ext4: refactor direct IO code" deleted
ext4_ext_direct_IO(), but references to that function remain in
comments. Update them to refer to ext4_direct_IO_write().Signed-off-by: Eric Whitney
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger
Reviewed-by: Jan Kara
29 Jul, 2017
4 commits
-
Pull NFS client fixes from Anna Schumaker:
"More NFS client bugfixes for 4.13.Most of these fix locking bugs that Ben and Neil noticed, but I also
have a patch to fix one more access bug that was reported after last
week.Stable fixes:
- Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
- Invalidate file size when taking a lock to prevent corruptionOther fixes:
- Don't excessively generate tiny writes with fallocate
- Use the raw NFS access mask in nfs4_opendata_access()"* tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
NFS: Optimize fallocate by refreshing mapping when needed.
NFS: invalidate file size when taking a lock.
NFS: Use raw NFS access mask in nfs4_opendata_access() -
Pull xfs fixes from Darrick Wong:
- fix firstfsb variables that we left uninitialized, which could lead
to locking problems.- check for NULL metadata buffer pointers before using them.
- don't allow btree cursor manipulation if the btree block is corrupt.
Better to just shut down.- fix infinite loop problems in quotacheck.
- fix buffer overrun when validating directory blocks.
- fix deadlock problem in bunmapi.
* tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix multi-AG deadlock in xfs_bunmapi
xfs: check that dir block entries don't off the end of the buffer
xfs: fix quotacheck dquot id overflow infinite loop
xfs: check _alloc_read_agf buffer pointer before using
xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write
xfs: check _btree_check_block value -
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.Fixes: a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington
Reviewed-by: Jeff Layton
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker -
Pull btrfs fixes from David Sterba:
"Fixes addressing problems reported by users, and there's one more
regression fix"* 'for-4.13-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: round down size diff when shrinking/growing device
Btrfs: fix early ENOSPC due to delalloc
btrfs: fix lockup in find_free_extent with read-only block groups
Btrfs: fix dir item validation when replaying xattr deletes
27 Jul, 2017
3 commits
-
posix_fallocate() will allocate space in an NFS file by considering
the last byte of every 4K block. If it is before EOF, it will read
the byte and if it is zero, a zero is written out. If it is after EOF,
the zero is unconditionally written.For the blocks beyond EOF, if NFS believes its cache is valid, it will
expand these writes to write full pages, and then will merge the pages.
This results if (typically) 1MB writes. If NFS believes its cache is
not valid (particularly if NFS_INO_INVALID_DATA or
NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
send the individual 1-byte writes. This results in (typically) 256 times
as many RPC requests, and can be substantially slower.Currently nfs_revalidate_mapping() is only used when reading a file or
mmapping a file, as these are times when the content needs to be
up-to-date. Writes don't generally need the cache to be up-to-date, but
writes beyond EOF can benefit, particularly in the posix_fallocate()
case.So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
i.e. when there is a gap between the end of the file and the start of
the write. If the cache is thought to be out of date (as happens after
taking a file lock), this will cause a GETATTR, and the two flags
mentioned above will be cleared. With this, posix_fallocate() on a
newly locked file does not generate excessive tiny writes.Signed-off-by: NeilBrown
Signed-off-by: Anna Schumaker -
Prior to commit ca0daa277aca ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock. Since that commit it only invalidates the file content.If the file size is changed on the server while wait for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data. This particularly happens when writing beyond the
(supposed) end of file and can be easily be demonstrated with
posix_fallocate().If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was. If EOF had moved, the pre-existing part of the file
will be over-written. Locking should have protected against this,
but it doesn't.This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes. When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.cc: stable@vger.kernel.org (v4.8+)
Fixes: ca0daa277aca ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NeilBrown
Signed-off-by: Anna Schumaker -
Commit bd8b2441742b ("NFS: Store the raw NFS access mask in the inode's
access cache") changed how the access results are stored after an
access() call. An NFS v4 OPEN might have access bits returned with the
opendata, so we should use the NFS4_ACCESS values when determining the
return value in nfs4_opendata_access().Fixes: bd8b2441742b ("NFS: Store the raw NFS access mask in the inode's
access cache")
Reported-by: Eryu Guan
Signed-off-by: Anna Schumaker
Tested-by: Takashi Iwai
26 Jul, 2017
1 commit
-
Just like in the allocator we must avoid touching multiple AGs out of
order when freeing blocks, as freeing still locks the AGF and can cause
the same AB-BA deadlocks as in the allocation path.Signed-off-by: Christoph Hellwig
Reported-by: Nikolay Borisov
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
25 Jul, 2017
2 commits
-
Pull JFS fixes from David Kleikamp.
* tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggy:
jfs: preserve i_mode if __jfs_set_acl() fails
jfs: Don't clear SGID when inheriting ACLs
jfs: atomically read inode size -
When we're checking the entries in a directory buffer, make sure that
the entry length doesn't push us off the end of the buffer. Found via
xfs/388 writing ones to the length fields.Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
24 Jul, 2017
2 commits
-
If a dquot has an id of U32_MAX, the next lookup index increment
overflows the uint32_t back to 0. This starts the lookup sequence
over from the beginning, repeats indefinitely and results in a
livelock.Update xfs_qm_dquot_walk() to explicitly check for the lookup
overflow and exit the loop.Signed-off-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong -
Further testing showed that the fix introduced in 7dfb8be11b5d ("btrfs:
Round down values which are written for total_bytes_size") was
insufficient and it could still lead to discrepancies between the
total_bytes in the super block and the device total bytes. So this patch
also ensures that the difference between old/new sizes when
shrinking/growing is also rounded down. This ensure that we won't be
subtracting/adding a non-sectorsize multiples to the superblock/device
total sizees.Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for total_bytes_size")
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba