Eric Lee / smarc-fsl-linux-kernel

30 Jul, 2018

3 commits

3cfb6772d Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 fixes from Ted Ts'o:
"Some miscellaneous ext4 fixes for 4.18; one fix is for a regression
introduced in 4.18-rc4.

Sorry for the late-breaking pull. I was originally going to wait for
the next merge window, but Eric Whitney found a regression introduced
in 4.18-rc4, so I decided to push out the regression plus the other
fixes now. (The other commits have been baking in linux-next since
early July)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix check to prevent initializing reserved inodes
ext4: check for allocation block validity with block group locked
ext4: fix inline data updates with checksums enabled
ext4: clear mmp sequence number when remounting read-only
ext4: fix false negatives *and* false positives in ext4_check_descriptors()

Linus Torvalds
2018-07-30 04:13:45 +0800
01cfb7937 squashfs: be more careful about metadata corruption ... Browse Code »

Anatoly Trosinenko reports that a corrupted squashfs image can cause a
kernel oops. It turns out that squashfs can end up being confused about
negative fragment lengths.

The regular squashfs_read_data() does check for negative lengths, but
squashfs_read_metadata() did not, and the fragment size code just
blindly trusted the on-disk value. Fix both the fragment parsing and
the metadata reading code.

Reported-by: Anatoly Trosinenko
Cc: Al Viro
Cc: Phillip Lougher
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2018-07-30 03:44:46 +0800
501228470 ext4: fix check to prevent initializing reserved inodes ... Browse Code »

Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:

mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc

Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.

This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.

Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney
Signed-off-by: Theodore Ts'o

Theodore Ts'o
2018-07-30 03:34:00 +0800

28 Jul, 2018

3 commits

eb181a814 Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"Bigger than usual at this time, mostly due to the O_DIRECT corruption
issue and the fact that I was on vacation last week. This contains:

- NVMe pull request with two fixes for the FC code, and two target
fixes (Christoph)

- a DIF bio reset iteration fix (Greg Edwards)

- two nbd reply and requeue fixes (Josef)

- SCSI timeout fixup (Keith)

- a small series that fixes an issue with bio_iov_iter_get_pages(),
which ended up causing corruption for larger sized O_DIRECT writes
that ended up racing with buffered writes (Martin Wilck)"

* tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
block: reset bi_iter.bi_done after splitting bio
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: bio_iov_iter_get_pages: fix size of last iovec
nvmet: only check for filebacking on -ENOTBLK
nvmet: fixup crash on NULL device path
scsi: set timed out out mq requests to complete
blk-mq: export setting request completion state
nvme: if_ready checks to fail io to deleting controller
nvmet-fc: fix target sgl list on large transfers
nbd: handle unexpected replies better
nbd: don't requeue the same request twice.

Linus Torvalds
2018-07-28 03:51:00 +0800
864af0d40 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"11 fixes"

* emailed patches from Andrew Morton :
kvm, mm: account shadow page tables to kmemcg
zswap: re-check zswap_is_full() after do zswap_shrink()
include/linux/eventfd.h: include linux/errno.h
mm: fix vma_is_anonymous() false-positives
mm: use vma_init() to initialize VMAs on stack and data segments
mm: introduce vma_init()
mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
ipc/sem.c: prevent queue.status tearing in semop
mm: disallow mappings that conflict for devm_memremap_pages()
kasan: only select SLUB_DEBUG with SYSFS=y
delayacct: fix crash in delayacct_blkio_end() after delayacct init failure

Linus Torvalds
2018-07-28 01:30:47 +0800
f636d300c Merge tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:

- Fix some uninitialized variable errors

- Fix an incorrect check in metadata verifiers

* tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: properly handle free inodes in extent hint validators
xfs: Initialize variables in xfs_alloc_get_rec before using them

Linus Torvalds
2018-07-28 00:25:09 +0800

27 Jul, 2018

3 commits

bfd40eaff mm: fix vma_is_anonymous() false-positives ... Browse Code »

vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
VMA. This is unreliable as ->mmap may not set ->vm_ops.

False-positive vma_is_anonymous() may lead to crashes:

next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0
prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000
pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000
flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare)
------------[ cut here ]------------
kernel BUG at mm/memory.c:1422!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline]
RIP: 0010:zap_pud_range mm/memory.c:1466 [inline]
RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline]
RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508
Call Trace:
unmap_single_vma+0x1a0/0x310 mm/memory.c:1553
zap_page_range_single+0x3cc/0x580 mm/memory.c:1644
unmap_mapping_range_vma mm/memory.c:2792 [inline]
unmap_mapping_range_tree mm/memory.c:2813 [inline]
unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845
unmap_mapping_range+0x48/0x60 mm/memory.c:2880
truncate_pagecache+0x54/0x90 mm/truncate.c:800
truncate_setsize+0x70/0xb0 mm/truncate.c:826
simple_setattr+0xe9/0x110 fs/libfs.c:409
notify_change+0xf13/0x10f0 fs/attr.c:335
do_truncate+0x1ac/0x2b0 fs/open.c:63
do_sys_ftruncate+0x492/0x560 fs/open.c:205
__do_sys_ftruncate fs/open.c:215 [inline]
__se_sys_ftruncate fs/open.c:213 [inline]
__x64_sys_ftruncate+0x59/0x80 fs/open.c:213
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reproducer:

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE _IO('c', 100)
#define KCOV_DISABLE _IO('c', 101)
#define COVER_SIZE (1024<<< 20);
return 0;
}

This can be fixed by assigning anonymous VMAs own vm_ops and not relying
on it being NULL.

If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs.

Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com
Acked-by: Linus Torvalds
Reviewed-by: Andrew Morton
Cc: Dmitry Vyukov
Cc: Oleg Nesterov
Cc: Andrea Arcangeli
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2018-07-27 10:38:03 +0800
2c4541e24 mm: use vma_init() to initialize VMAs on stack and data segments ... Browse Code »

Make sure to initialize all VMAs properly, not only those which come
from vm_area_cachep.

Link: http://lkml.kernel.org/r/20180724121139.62570-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Acked-by: Linus Torvalds
Reviewed-by: Andrew Morton
Cc: Dmitry Vyukov
Cc: Oleg Nesterov
Cc: Andrea Arcangeli
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2018-07-27 10:38:03 +0800
9362dd110 blkdev: __blkdev_direct_IO_simple: fix leak in error case ... Browse Code »

Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
Reviewed-by: Ming Lei
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Signed-off-by: Martin Wilck
Signed-off-by: Jens Axboe

Martin Wilck
2018-07-27 01:52:33 +0800

26 Jul, 2018

1 commit

5c61ef1b7 Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/… ... Browse Code »

…git/dhowells/linux-fs

Pull fscache/cachefiles fixes from David Howells:

- Allow cancelled operations to be queued so they can be cleaned up.

- Fix a refcounting bug in the monitoring of reads on backend files
whereby a race can occur between monitor objects being listed for
work, the work processing being queued and the work processor running
and destroying the monitor objects.

- Fix a ref overput in object attachment, whereby a tentatively
considered object is put in error handling without first being 'got'.

- Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
assertion occurs when we retry because it seems the object is now
active.

- Wait rather BUG'ing on an object collision in the depths of
cachefiles as the active object should be being cleaned up - also
depends on the one above.

* tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
fscache: Fix reference overput in fscache_attach_object() error handling
cachefiles: Fix refcounting bug in backing-file read monitoring
fscache: Allow cancelled operations to be enqueued

Linus Torvalds
2018-07-26 01:55:24 +0800

25 Jul, 2018

6 commits

c2412ac45 cachefiles: Wait rather than BUG'ing on "Unexpected object collision" ... Browse Code »

If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
the active object tree, we have been emitting a BUG after logging
information about it and the new object.

Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
on the old object (or return an error). The ACTIVE flag should be cleared
after it has been removed from the active object tree. A timeout of 60s is
used in the wait, so we shouldn't be able to get stuck there.

Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri
Signed-off-by: David Howells

Kiran Kumar Modukuri
2018-07-25 21:49:00 +0800
5ce83d4bb cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag ... Browse Code »

In cachefiles_mark_object_active(), the new object is marked active and
then we try to add it to the active object tree. If a conflicting object
is already present, we want to wait for that to go away. After the wait,
we go round again and try to re-mark the object as being active - but it's
already marked active from the first time we went through and a BUG is
issued.

Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.

Analysis from Kiran Kumar Modukuri:

[Impact]
Oops during heavy NFS + FSCache + Cachefiles

CacheFiles: Error: Overlong wait for old active object to go away.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002

CacheFiles: Error: Object already active kernel BUG at
fs/cachefiles/namei.c:163!

[Cause]
In a heavily loaded system with big files being read and truncated, an
fscache object for a cookie is being dropped and a new object being
looked. The new object being looked for has to wait for the old object
to go away before the new object is moved to active state.

[Fix]
Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
retrying the object lookup.

[Testcase]
Have run ~100 hours of NFS stress tests and have not seen this bug recur.

[Regression Potential]
- Limited to fscache/cachefiles.

Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri
Signed-off-by: David Howells

Kiran Kumar Modukuri
2018-07-25 21:49:00 +0800
f29507ce6 fscache: Fix reference overput in fscache_attach_object() error handling ... Browse Code »

When a cookie is allocated that causes fscache_object structs to be
allocated, those objects are initialised with the cookie pointer, but
aren't blessed with a ref on that cookie unless the attachment is
successfully completed in fscache_attach_object().

If attachment fails because the parent object was dying or there was a
collision, fscache_attach_object() returns without incrementing the cookie
counter - but upon failure of this function, the object is released which
then puts the cookie, whether or not a ref was taken on the cookie.

Fix this by taking a ref on the cookie when it is assigned in
fscache_object_init(), even when we're creating a root object.

Analysis from Kiran Kumar:

This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

fscache cookie ref count updated incorrectly during fscache object
allocation resulting in following Oops.

kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

[Cause]
Two threads are trying to do operate on a cookie and two objects.

(1) One thread tries to unmount the filesystem and in process goes over a
huge list of objects marking them dead and deleting the objects.
cookie->usage is also decremented in following path:

nfs_fscache_release_super_cookie
-> __fscache_relinquish_cookie
->__fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage) fscache_object_init
-> assign cookie, but usage not bumped.
2) fscache_attach_object -> fails in cant_attach_object because the
cookie's backing object or cookie's->parent object are going away
3) fscache_put_object
-> cachefiles_put_object
->fscache_object_destroy
->fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage)
Signed-off-by: David Howells

Kiran Kumar Modukuri
2018-07-25 21:49:00 +0800
934140ab0 cachefiles: Fix refcounting bug in backing-file read monitoring ... Browse Code »

cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview. However, it has no ref on that monitor object or on the
associated operation.

What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object. However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.

If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.

Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock. The op can then be enqueued after the lock is
dropped.

The bug can manifest in one of a couple of ways. The first manifestation
looks like:

FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 6 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:494!
RIP: 0010:fscache_put_operation+0x1e3/0x1f0
...
fscache_op_work_func+0x26/0x50
process_one_work+0x131/0x290
worker_thread+0x45/0x360
kthread+0xf8/0x130
? create_worker+0x190/0x190
? kthread_cancel_work_sync+0x10/0x10
ret_from_fork+0x1f/0x30

This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().

The bug can also manifest like the following:

kernel BUG at fs/fscache/operation.c:69!
...
[exception RIP: fscache_enqueue_operation+246]
...
#7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
#8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
#9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.

Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue
Reported-by: Vegard Nossum
Reported-by: Anthony DeRobertis
Reported-by: NeilBrown
Reported-by: Daniel Axtens
Reported-by: Kiran Kumar Modukuri
Signed-off-by: David Howells
Reviewed-by: Daniel Axtens

Kiran Kumar Modukuri
2018-07-25 21:49:00 +0800
d0eb06afe fscache: Allow cancelled operations to be enqueued ... Browse Code »

Alter the state-check assertion in fscache_enqueue_operation() to allow
cancelled operations to be given processing time so they can be cleaned up.

Also fix a debugging statement that was requiring such operations to have
an object assigned.

Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Kiran Kumar Modukuri
Signed-off-by: David Howells

Kiran Kumar Modukuri
2018-07-25 21:31:20 +0800
d4a34e165 xfs: properly handle free inodes in extent hint validators ... Browse Code »

When inodes are freed in xfs_ifree(), di_flags is cleared (so extent size
hints are removed) but the actual extent size fields are left intact.
This causes the extent hint validators to fail on freed inodes which once
had extent size hints.

This can be observed (for example) by running xfs/229 twice on a
non-crc xfs filesystem, or presumably on V5 with ikeep.

Fixes: 7d71a67 ("xfs: verify extent size hint is valid in inode verifier")
Fixes: 02a0fda ("xfs: verify COW extent size hint is valid in inode verifier")
Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong

Eric Sandeen
2018-07-25 02:34:52 +0800

23 Jul, 2018

1 commit

165ea0d1c Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"Fix several places that screw up cleanups after failures halfway
through opening a file (one open-coding filp_clone_open() and getting
it wrong, two misusing alloc_file()). That part is -stable fodder from
the 'work.open' branch.

And Christoph's regression fix for uapi breakage in aio series;
include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
definition of sigset_t, the reason for doing so in the first place had
been bogus - there's no need to expose struct __aio_sigset in
aio_abi.h at all"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: don't expose __aio_sigset in uapi
ocxlflash_getfile(): fix double-iput() on alloc_file() failures
cxl_getfile(): fix double-iput() on alloc_file() failures
drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()

Linus Torvalds
2018-07-23 03:04:51 +0800

22 Jul, 2018

4 commits

55b636b41 Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fix from David Sterba:
"A fix of a corruption regarding fsync and clone, under some very
specific conditions explained in the patch.

The fix is marked for stable 3.16+ so I'd like to get it merged now
given the impact"

* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix file data corruption after cloning a range and fsync

Linus Torvalds
2018-07-22 07:42:03 +0800
490fc0538 mm: make vm_area_alloc() initialize core fields ... Browse Code »

Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds

Linus Torvalds
2018-07-22 06:24:03 +0800
3928d4f5e mm: use helper functions for allocating and freeing vm_area structs ... Browse Code »

The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls. This is a purely mechanical conversion:

# new vma:
kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

# copy old vma
kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

# free vma
kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds

Linus Torvalds
2018-07-22 04:48:51 +0800
35033ab98 fat: fix memory allocation failure handling of match_strdup() ... Browse Code »

In parse_options(), if match_strdup() failed, parse_options() leaves
opts->iocharset in unexpected state (i.e. still pointing the freed
string). And this can be the cause of double free.

To fix, this initialize opts->iocharset always when freeing.

Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi
Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2018-07-22 03:50:46 +0800

19 Jul, 2018

2 commits

bd3599a0e Btrfs: fix file data corruption after cloning a range and fsync ... Browse Code »

When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt

$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar

$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar

$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba

Filipe Manana
2018-07-19 21:36:31 +0800
04a132065 Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"Three regression fixes. They're few-liners and fixing some corner
cases missed in the origial patches"

* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
btrfs: fix use-after-free of cmp workspace pages
btrfs: restore uuid_mutex in btrfs_open_devices

Linus Torvalds
2018-07-19 02:13:25 +0800

18 Jul, 2018

1 commit

9ba546c01 aio: don't expose __aio_sigset in uapi ... Browse Code »

glibc uses a different defintion of sigset_t than the kernel does,
and the current version would pull in both. To fix this just do not
expose the type at all - this somewhat mirrors pselect() where we
do not even have a type for the magic sigmask argument, but just
use pointer arithmetics.

Fixes: 7a074e96 ("aio: implement io_pgetevents")
Signed-off-by: Christoph Hellwig
Reported-by: Adrian Reber
Signed-off-by: Al Viro

Christoph Hellwig
2018-07-18 11:26:58 +0800

17 Jul, 2018

1 commit

665d4953c btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() ... Browse Code »

In commit ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device
replace") we removed the branch of copy_nocow_pages() to avoid
corruption for compressed nodatasum extents.

However above commit only solves the problem in scrub_extent(), if
during scrub_pages() we failed to read some pages,
sctx->no_io_error_seen will be non-zero and we go to fixup function
scrub_handle_errored_block().

In scrub_handle_errored_block(), for sctx without csum (no matter if
we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
which does the similar thing with copy_nocow_pages(), but does it
without the extra check in copy_nocow_pages() routine.

So for test cases like btrfs/100, where we emulate read errors during
replace/scrub, we could corrupt compressed extent data again.

This patch will fix it just by avoiding any "optimization" for
nodatasum, just falls back to the normal fixup routine by try read from
any good copy.

This also solves WARN_ON() or dead lock caused by lame backref iteration
in scrub_fixup_nodatasum() routine.

The deadlock or WARN_ON() won't be triggered before commit ac0b4145d662
("btrfs: scrub: Don't use inode pages for device replace") since
copy_nocow_pages() have better locking and extra check for data extent,
and it's already doing the fixup work by try to read data from any good
copy, so it won't go scrub_fixup_nodatasum() anyway.

This patch disables the faulty code and will be removed completely in a
followup patch.

Fixes: ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device replace")
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba

Qu Wenruo
2018-07-17 19:56:30 +0800

15 Jul, 2018

4 commits

fe10e398e reiserfs: fix buffer overflow with long warning messages ... Browse Code »

ReiserFS prepares log messages into a 1024-byte buffer with no bounds
checks. Long messages, such as the "unknown mount option" warning when
userspace passes a crafted mount options string, overflow this buffer.
This causes KASAN to report a global-out-of-bounds write.

Fix it by truncating messages to the buffer size.

Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Biggers
2018-07-15 02:11:10 +0800
24962af7e fs, elf: make sure to page align bss in load_elf_library ... Browse Code »

The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to
the requested lenght not being correctly aligned.

Let us make sure to align it properly.

Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured
for libc5.

Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador
Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com
Tested-by: Tetsuo Handa
Acked-by: Kees Cook
Cc: Michal Hocko
Cc: Nicolas Pitre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oscar Salvador
2018-07-15 02:11:10 +0800
02f51d459 autofs: fix slab out of bounds read in getname_kernel() ... Browse Code »

The autofs subsystem does not check that the "path" parameter is present
for all cases where it is required when it is passed in via the "param"
struct.

In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
ioctl command.

To solve it, modify validate_dev_ioctl(function to check that a path has
been provided for ioctl commands that require it.

Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net
Signed-off-by: Tomas Bortoli
Signed-off-by: Ian Kent
Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com
Cc: Dmitry Vyukov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tomas Bortoli
2018-07-15 02:11:09 +0800
e70cc2bd5 fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps* ... Browse Code »

Thomas reports:
"While looking around in /proc on my v4.14.52 system I noticed that all
processes got a lot of "Locked" memory in /proc/*/smaps. A lot more
memory than a regular user can usually lock with mlock().

Commit 493b0e9d945f (in v4.14-rc1) seems to have changed the behavior
of "Locked".

Before that commit the code was like this. Notice the VM_LOCKED check.

(vma->vm_flags & VM_LOCKED) ?
(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);

After that commit Locked is now the same as Pss:

(unsigned long)(mss->pss >> (10 + PSS_SHIFT)));

This looks like a mistake."

Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.

Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka
Reported-by: Thomas Lindroth
Cc: Alexey Dobriyan
Cc: Daniel Colascione
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vlastimil Babka
2018-07-15 02:11:09 +0800

13 Jul, 2018

3 commits

97b191702 btrfs: fix use-after-free of cmp workspace pages ... Browse Code »

btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves
their page address intact. Now, if you hit "goto again" in
btrfs_extent_same_range() and hit some error in
btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages.

This is simple fix to reset the address to avoid use-after-free.

Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Naohiro Aota
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Naohiro Aota
2018-07-13 23:31:35 +0800
20c5bbc64 btrfs: restore uuid_mutex in btrfs_open_devices ... Browse Code »

Commit 542c5908abfe84f7b4c1 ("btrfs: replace uuid_mutex by
device_list_mutex in btrfs_open_devices") switched to device_list_mutex
as we need that for the device list traversal, but we also need
uuid_mutex to protect access to fs_devices::opened to be consistent with
other users of that.

Fixes: 542c5908abfe84f7b4c1 ("btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices")
Reviewed-by: Anand Jain
Signed-off-by: David Sterba

David Sterba
2018-07-13 20:55:46 +0800
8d5a803c6 ext4: check for allocation block validity with block group locked ... Browse Code »

With commit 044e6e3d74a3: "ext4: don't update checksum of new
initialized bitmaps" the buffer valid bit will get set without
actually setting up the checksum for the allocation bitmap, since the
checksum will get calculated once we actually allocate an inode or
block.

If we are doing this, then we need to (re-)check the verified bit
after we take the block group lock. Otherwise, we could race with
another process reading and verifying the bitmap, which would then
complain about the checksum being invalid.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137

Signed-off-by: Theodore Ts'o
Cc: stable@kernel.org

Theodore Ts'o
2018-07-13 07:08:05 +0800

11 Jul, 2018

1 commit

b4e7a7a88 drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open() ... Browse Code »

Failure of ->open() should *not* be followed by fput(). Fixed by
using filp_clone_open(), which gets the cleanups right.

Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds
Signed-off-by: Al Viro

Al Viro
2018-07-11 11:29:03 +0800

10 Jul, 2018

1 commit

362eca70b ext4: fix inline data updates with checksums enabled ... Browse Code »

The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified. Fix both of these problems.

https://bugzilla.kernel.org/show_bug.cgi?id=200443

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Theodore Ts'o
2018-07-10 13:07:43 +0800

09 Jul, 2018

3 commits

2dca60d98 ext4: clear mmp sequence number when remounting read-only ... Browse Code »

Previously, when an MMP-protected file system is remounted read-only,
the kmmpd thread would exit the next time it woke up (a few seconds
later), without resetting the MMP sequence number back to
EXT4_MMP_SEQ_CLEAN.

Fix this by explicitly killing the MMP thread when the file system is
remounted read-only.

Signed-off-by: Theodore Ts'o
Cc: Andreas Dilger

Theodore Ts'o
2018-07-09 07:36:02 +0800
44de022c4 ext4: fix false negatives *and* false positives in ext4_check_descriptors() ... Browse Code »

Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.

For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.

Fix both of these problems.

Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org

Theodore Ts'o
2018-07-09 07:35:02 +0800
70a2dc6ab Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 bugfixes from Ted Ts'o:
"Bug fixes for ext4; most of which relate to vulnerabilities where a
maliciously crafted file system image can result in a kernel OOPS or
hang.

At least one fix addresses an inline data bug could be triggered by
userspace without the need of a crafted file system (although it does
require that the inline data feature be enabled)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: check superblock mapped prior to committing
ext4: add more mount time checks of the superblock
ext4: add more inode number paranoia checks
ext4: avoid running out of journal credits when appending to an inline file
jbd2: don't mark block as modified if the handle is out of credits
ext4: never move the system.data xattr out of the inode body
ext4: clear i_data in ext4_inode_info when removing inline data
ext4: include the illegal physical block in the bad map ext4_error msg
ext4: verify the depth of extent tree in ext4_find_extent()
ext4: only look at the bg_flags field if it is valid
ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
ext4: always check block group bounds in ext4_init_block_bitmap()
ext4: always verify the magic number in xattr blocks
ext4: add corruption check in ext4_xattr_set_entry()
ext4: add warn_on_error mount option

Linus Torvalds
2018-07-09 02:10:30 +0800

08 Jul, 2018

1 commit

b2d44d145 Merge tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"Five smb3/cifs fixes for stable (including for some leaks and memory
overwrites) and also a few fixes for recent regressions in packet
signing.

Additional testing at the recent SMB3 test event, and some good work
by Paulo and others spotted the issues fixed here. In addition to my
xfstest runs on these, Aurelien and Stefano did additional test runs
to verify this set"

* tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf()
cifs: Fix infinite loop when using hard mount option
cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting
cifs: Fix memory leak in smb2_set_ea()
cifs: fix SMB1 breakage
cifs: Fix validation of signed data in smb2
cifs: Fix validation of signed data in smb3+
cifs: Fix use after free of a mid_q_entry

Linus Torvalds
2018-07-08 09:31:34 +0800

06 Jul, 2018

2 commits

0fa3ecd87 Fix up non-directory creation in SGID directories ... Browse Code »

sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid. This is historically used for
group-shared directories.

But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).

Reported-by: Jann Horn
Cc: Andy Lutomirski
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2018-07-06 03:36:36 +0800
729c0c9dd cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf() ... Browse Code »

smb{2,3}_create_lease_buf() store a lease key in the lease
context for later usage on a lease break.

In most paths, the key is currently sourced from data that
happens to be on the stack near local variables for oplock in
SMB2_open() callers, e.g. from open_shroot(), whereas
smb2_open_file() properly allocates space on its stack for it.

The address of those local variables holding the oplock is then
passed to create_lease_buf handlers via SMB2_open(), and 16
bytes near oplock are used. This causes a stack out-of-bounds
access as reported by KASAN on SMB2.1 and SMB3 mounts (first
out-of-bounds access is shown here):

[ 111.528823] BUG: KASAN: stack-out-of-bounds in smb3_create_lease_buf+0x399/0x3b0 [cifs]
[ 111.530815] Read of size 8 at addr ffff88010829f249 by task mount.cifs/985
[ 111.532838] CPU: 3 PID: 985 Comm: mount.cifs Not tainted 4.18.0-rc3+ #91
[ 111.534656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 111.536838] Call Trace:
[ 111.537528] dump_stack+0xc2/0x16b
[ 111.540890] print_address_description+0x6a/0x270
[ 111.542185] kasan_report+0x258/0x380
[ 111.544701] smb3_create_lease_buf+0x399/0x3b0 [cifs]
[ 111.546134] SMB2_open+0x1ef8/0x4b70 [cifs]
[ 111.575883] open_shroot+0x339/0x550 [cifs]
[ 111.591969] smb3_qfs_tcon+0x32c/0x1e60 [cifs]
[ 111.617405] cifs_mount+0x4f3/0x2fc0 [cifs]
[ 111.674332] cifs_smb3_do_mount+0x263/0xf10 [cifs]
[ 111.677915] mount_fs+0x55/0x2b0
[ 111.679504] vfs_kern_mount.part.22+0xaa/0x430
[ 111.684511] do_mount+0xc40/0x2660
[ 111.698301] ksys_mount+0x80/0xd0
[ 111.701541] do_syscall_64+0x14e/0x4b0
[ 111.711807] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 111.713665] RIP: 0033:0x7f372385b5fa
[ 111.715311] Code: 48 8b 0d 99 78 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 78 2c 00 f7 d8 64 89 01 48
[ 111.720330] RSP: 002b:00007ffff27049d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 111.722601] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f372385b5fa
[ 111.724842] RDX: 000055c2ecdc73b2 RSI: 000055c2ecdc73f9 RDI: 00007ffff270580f
[ 111.727083] RBP: 00007ffff2705804 R08: 000055c2ee976060 R09: 0000000000001000
[ 111.729319] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f3723f4d000
[ 111.731615] R13: 000055c2ee976060 R14: 00007f3723f4f90f R15: 0000000000000000

[ 111.735448] The buggy address belongs to the page:
[ 111.737420] page:ffffea000420a7c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 111.739890] flags: 0x17ffffc0000000()
[ 111.741750] raw: 0017ffffc0000000 0000000000000000 dead000000000200 0000000000000000
[ 111.744216] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 111.746679] page dumped because: kasan: bad access detected

[ 111.750482] Memory state around the buggy address:
[ 111.752562] ffff88010829f100: 00 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
[ 111.754991] ffff88010829f180: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
[ 111.757401] >ffff88010829f200: 00 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 f2 f2 f2
[ 111.759801] ^
[ 111.762034] ffff88010829f280: f2 02 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[ 111.764486] ffff88010829f300: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 111.766913] ==================================================================

Lease keys are however already generated and stored in fid data
on open and create paths: pass them down to the lease context
creation handlers and use them.

Suggested-by: Aurélien Aptel
Reviewed-by: Aurelien Aptel
Fixes: b8c32dbb0deb ("CIFS: Request SMB2.1 leases")
Signed-off-by: Stefano Brivio
Signed-off-by: Steve French

Stefano Brivio
2018-07-06 02:48:25 +0800