29 Oct, 2018
2 commits
-
That we we can also poll non blk-mq queues. Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
(cherry picked from commit ea435e1b9392a33deceaea2a16ebaa3397bead93) -
Fix bug of commit 74d46992e0d9 ("block: replace bi_bdev with a gendisk
pointer and partitions index").bio_dev(bio) is used to find the dev state in function
__btrfsic_submit_bio. But when dev_state is added to the hashtable, it
is using dev_t of block_device.bio_dev(bio) returns a dev_t of part0 which is different from dev_t in
block_device(bd_dev). bd_dev in block_device represents the exact
partition.block_device.bd_dev =
bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).When adding a dev_state into hashtable, we use the exact partition dev_t.
So when looking it up, it should also use the exact partition dev_t.Reproducer of this bug:
Use MOUNT_OPTIONS="-o check_int" and run btrfs/001 in fstests.
Then there will be WARNING like below.WARNING:
btrfs: attempt to write superblock which references block M @29523968 (sda7 /1111654400/2) which is never written!Signed-off-by: Gu JinXiang
Reviewed-by: David Sterba
Signed-off-by: David Sterba
(cherry picked from commit d28e649a5c58b779b303c252c66ee84a0f2c3b32)
20 Oct, 2018
1 commit
-
This reverts commit 4f4374a9bd25b333971e6f2656b642d29e2efe7b which was
commit a6795a585929d94ca3e931bc8518f8deb8bbe627 upstream.Turns out this causes problems and was to fix a patch only in the 4.19
and newer tree.Reported-by: Amir Goldstein
Cc: Miklos Szeredi
Cc: Christoph Hellwig
Cc: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
18 Oct, 2018
1 commit
-
commit f1782c9bc547754f4bd3043fe8cfda53db85f13f upstream.
I received a report about suspicious growth of unreclaimable slabs on
some machines. I've found that it happens on machines with low memory
pressure, and these unreclaimable slabs are external names attached to
dentries.External names are allocated using generic kmalloc() function, so they
are accounted as unreclaimable. But they are held by dentries, which
are reclaimable, and they will be reclaimed under the memory pressure.In particular, this breaks MemAvailable calculation, as it doesn't take
unreclaimable slabs into account. This leads to a silly situation, when
a machine is almost idle, has no memory pressure and therefore has a big
dentry cache. And the resulting MemAvailable is too low to start a new
workload.To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
used to track the amount of memory, consumed by external names. The
counter is increased in the dentry allocation path, if an external name
structure is allocated; and it's decreased in the dentry freeing path.To reproduce the problem I've used the following Python script:
import os
for iter in range (0, 10000000):
try:
name = ("/some_long_name_%d" % iter) + "_" * 220
os.stat(name)
except Exception:
passWithout this patch:
$ cat /proc/meminfo | grep MemAvailable
MemAvailable: 7811688 kB
$ python indirect.py
$ cat /proc/meminfo | grep MemAvailable
MemAvailable: 2753052 kBWith the patch:
$ cat /proc/meminfo | grep MemAvailable
MemAvailable: 7809516 kB
$ python indirect.py
$ cat /proc/meminfo | grep MemAvailable
MemAvailable: 7749144 kB[guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
[guro@fb.com: fix indirectly reclaimable memory accounting]
Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com
Signed-off-by: Roman Gushchin
Reviewed-by: Andrew Morton
Cc: Alexander Viro
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
13 Oct, 2018
2 commits
-
commit 37f31b6ca4311b94d985fb398a72e5399ad57925 upstream.
The requested device name can be NULL or an empty string.
Check for that and refuse to continue. UBIFS has to do this manually
since we cannot use mount_bdev(), which checks for this condition.Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
Signed-off-by: Richard Weinberger
Signed-off-by: Greg Kroah-Hartman -
commit d3f07c049dab1a3f1740f476afd3d5e5b738c21c upstream.
syzbot found the following crash on:
HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000
kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c
dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81
compiler: gcc (GCC) 8.0.1 20180413 (experimental)Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.comloop7: rw=12288, want=8200, limit=20
netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'.
openvswitch: netlink: Message has 8 unknown bytes.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline]
RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline]
RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835
Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00
RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000
RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005
RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026
R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb
R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40
FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860
f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883
mount_bdev+0x314/0x3e0 fs/super.c:1344
f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133
legacy_get_tree+0x131/0x460 fs/fs_context.c:729
vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743
do_new_mount fs/namespace.c:2603 [inline]
do_mount+0x6f2/0x1e20 fs/namespace.c:2927
ksys_mount+0x12d/0x140 fs/namespace.c:3143
__do_sys_mount fs/namespace.c:3157 [inline]
__se_sys_mount fs/namespace.c:3154 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3154
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45943a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a
RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0
RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013
R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace bd8550c129352286 ]---
RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline]
RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline]
RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835
Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00
RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000
RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005
netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'.
RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026
openvswitch: netlink: Message has 8 unknown bytes.
R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb
R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40
FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400In validate_checkpoint(), if we failed to call get_checkpoint_version(), we
will pass returned invalid page pointer into f2fs_put_page, cause accessing
invalid memory, this patch tries to handle error path correctly to fix this
issue.Signed-off-by: Chao Yu
Signed-off-by: Greg Kroah-HartmanSigned-off-by: Jaegeuk Kim
10 Oct, 2018
11 commits
-
commit cbe355f57c8074bc4f452e5b6e35509044c6fa23 upstream.
In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant
Reviewed-by: Changwei Ge
Acked-by: Joseph Qi
Acked-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman -
commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7 upstream.
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f50 ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Alexey Dobriyan
Cc: Ken Chen
Cc: Will Deacon
Cc: Laura Abbott
Cc: Andy Lutomirski
Cc: Catalin Marinas
Cc: Josh Poimboeuf
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H . Peter Anvin"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman -
commit 0595751f267994c3c7027377058e4185b3a28e75 upstream.
When mounting a Windows share that is the root of a drive (eg. C$)
the server does not return . and .. directory entries. This results in
the smb2 code path erroneously skipping the 2 first entries.Pseudo-code of the readdir() code path:
cifs_readdir(struct file, struct dir_context)
initiate_cifs_search ops->query_dir_firstdir_emit_dots
dir_emit < (pos_in_buf)) && (cur_ent != NULL); i++) {
/* go entry by entry figuring out which is first */
cur_ent = nxt_dir_entry(cur_ent, end_of_smb,
cfile->srch_inf.info_level);
}C) cifs_filldir() skips . and .. so we can safely ignore them for now.
Sample program:
int main(int argc, char **argv)
{
const char *path = argc >= 2 ? argv[1] : ".";
DIR *dh;
struct dirent *de;printf("listing path \n", path);
dh = opendir(path);
if (!dh) {
printf("opendir error %d\n", errno);
return 1;
}while (1) {
de = readdir(dh);
if (!de) {
if (errno) {
printf("readdir error %d\n", errno);
return 1;
}
printf("end of listing\n");
break;
}
printf("off=%lu \n", de->d_off, de->d_name);
}return 0;
}Before the fix with SMB1 on root shares:
off=1
off=2
off=3
off=4and on non-root shares:
off=1
off=4 off=5 we skipped . and .. from response buffer (C)
off=6 but still incremented pos
off=7
off=8Therefore the fix for smb2 is to mimic smb1 behaviour and offset the
index_of_last_entry by 2.Test results comparing smb1 and smb2 before/after the fix on root
share, non-root shares and on large directories (ie. multi-response
dir listing):PRE FIX
=======
pre-1-root VS pre-2-root:
ERR pre-2-root is missing [bootmgr, $Recycle.Bin]
pre-1-nonroot VS pre-2-nonroot:
OK~ same files, same order, different offsets
pre-1-nonroot-large VS pre-2-nonroot-large:
OK~ same files, same order, different offsetsPOST FIX
========
post-1-root VS post-2-root:
OK same files, same order, same offsets
post-1-nonroot VS post-2-nonroot:
OK same files, same order, same offsets
post-1-nonroot-large VS post-2-nonroot-large:
OK same files, same order, same offsetsREGRESSION?
===========
pre-1-root VS post-1-root:
OK same files, same order, same offsets
pre-1-nonroot VS post-1-nonroot:
OK same files, same order, same offsetsBugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107
Signed-off-by: Aurelien Aptel
Signed-off-by: Paulo Alcantara
Reviewed-by: Ronnie Sahlberg
Signed-off-by: Steve French
CC: Stable
Signed-off-by: Greg Kroah-Hartman -
commit ffc4c92227db5699493e43eb140b4cb5904c30ff upstream.
Commit 786534b92f3c introduced a regression that caused listxattr to
return the POSIX ACL attribute names even though sysfs doesn't support
POSIX ACLs. This happens because simple_xattr_list checks for NULL
i_acl / i_default_acl, but inode_init_always initializes those fields
to ACL_NOT_CACHED ((void *)-1). For example:
$ getfattr -m- -d /sys
/sys: system.posix_acl_access: Operation not supported
/sys: system.posix_acl_default: Operation not supported
Fix this in simple_xattr_list by checking if the filesystem supports POSIX ACLs.Fixes: 786534b92f3c ("tmpfs: listxattr should include POSIX ACL xattrs")
Reported-by: Marc Aurèle La France
Tested-by: Marc Aurèle La France
Signed-off-by: Andreas Gruenbacher
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman -
commit 1a8f8d2a443ef9ad9a3065ba8c8119df714240fa upstream.
Format has a typo: it was meant to be "%.*s", not "%*s". But at some point
callers grew nonprintable values as well, so use "%*pE" instead with a
maximized length.Reported-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
Cc: # v4.12
Signed-off-by: Greg Kroah-Hartman -
commit 63e132528032ce937126aba591a7b37ec593a6bb upstream.
The memory leak was detected by kmemleak when running xfstests
overlay/051,053Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries")
Cc: # v4.13
Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
Signed-off-by: Greg Kroah-Hartman -
commit 601350ff58d5415a001769532f6b8333820e5786 upstream.
KASAN detected slab-out-of-bounds access in printk from overlayfs,
because string format used %*s instead of %.*s.> BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
> Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
>
> CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
...
> printk+0xa7/0xcf kernel/printk/printk.c:1996
> ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
Cc: # v4.13
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 097f5863b1a0c9901f180bbd56ae7d630655faaa ]
We need to verify that the "data_offset" is within bounds.
Reported-by: Dr Silvio Cesare of InfoSect
Signed-off-by: Dan Carpenter
Signed-off-by: Steve French
Reviewed-by: Aurelien Aptel
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit bcfb84a996f6fa90b5e6e2954b2accb7a4711097 ]
A powerpc build of cifs with gcc v8.2.0 produces this warning:
fs/cifs/cifssmb.c: In function ‘CIFSSMBNegotiate’:
fs/cifs/cifssmb.c:605:3: warning: ‘strncpy’ writing 16 bytes into a region of size 1 overflows the destination [-Wstringop-overflow=]
strncpy(pSMB->DialectsArray+count, protocols[i].name, 16);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Since we are already doing a strlen() on the source, change the strncpy
to a memcpy().Signed-off-by: Stephen Rothwell
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit c15e3f19a6d5c89b1209dc94b40e568177cb0921 ]
When a Mac client saves an item containing a backslash to a file server
the backslash is represented in the CIFS/SMB protocol as as U+F026.
Before this change, listing a directory containing an item with a
backslash in its name will return that item with the backslash
represented with a true backslash character (U+005C) because
convert_sfm_character mapped U+F026 to U+005C when interpretting the
CIFS/SMB protocol response. However, attempting to open or stat the
path using a true backslash will result in an error because
convert_to_sfm_char does not map U+005C back to U+F026 causing the
CIFS/SMB request to be made with the backslash represented as U+005C.This change simply prevents the U+F026 to U+005C conversion from
happenning. This is analogous to how the code does not do any
translation of UNI_SLASH (U+F000).Signed-off-by: Jon Kuhn
Signed-off-by: Steve French
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 801660b040d132f67fac6a95910ad307c5929b49 ]
Test case btrfs/164 reports use-after-free:
[ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP
..
[ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs]
[ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs]
[ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs]
[ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs]Reason for this is that btrfs_shrink_device adds the resized device to
the fs_devices::resized_devices after it has called the last commit
transaction.So the list fs_devices::resized_devices is not empty when
btrfs_shrink_device returns. Now the parent function
btrfs_rm_device calls:btrfs_close_bdev(device);
call_rcu(&device->rcu, free_device_rcu);and then does the transactio ncommit. It goes through the
fs_devices::resized_devices in btrfs_update_commit_device_size and
leads to use-after-free.Fix this by making sure btrfs_shrink_device calls the last needed
btrfs_commit_transaction before the return. This is consistent with what
the grow counterpart does and this makes sure the on-disk state is
persistent when the function returns.Reported-by: Lu Fengqi
Tested-by: Lu Fengqi
Signed-off-by: Anand Jain
Reviewed-by: David Sterba
[ update changelog ]
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
5 commits
-
[ Upstream commit 09a4e0be5826aa66c4ce9954841f110ffe63ef4f ]
The largest block size supported by isofs is ISOFS_BLOCK_SIZE (2048), but
isofs_fill_super calls sb_min_blocksize and sets the blocksize to the
device's logical block size if it's larger than what we ended up with after
option parsing.If for some reason we try to mount a hard 4k device as an isofs filesystem,
we'll set opt.blocksize to 4096, and when we try to read the superblock
we found via:block = iso_blknum << (ISOFS_BLOCK_BITS - s->s_blocksize_bits)
with s_blocksize_bits greater than ISOFS_BLOCK_BITS, we'll have a negative
shift and the bread will fail somewhat cryptically:isofs_fill_super: bread failed, dev=sda, iso_blknum=17, block=-2147483648
It seems best to just catch and clearly reject mounts of such a device.
Reported-by: Bryan Gurney
Signed-off-by: Eric Sandeen
Signed-off-by: Jan Kara
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
commit 764baba80168ad3adafb521d2ab483ccbc49e344 upstream.
Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.A similar issue exists for non-dir non-upper files, for example:
$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/fooinotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
- A non-upper overlay mount
- A lower non-hardlink when index=offA helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.Cc: #v4.13
Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
Signed-off-by: Mark Salyzyn #4.14
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 826d7bc9f013d01e92997883d2fd0c25f4af1f1c ]
If the flock owner process is dead and its pid has been already freed,
pid translation won't work, but we still want to show flock owner pid
number when expecting /proc/$PID/fdinfo/$FD in init pidns.Reproducer:
process A process A1 process A2
fork()--------->
exit() open()
flock()
fork()--------->
exit() sleep()Before the patch:
================
(root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 257
lock: (root@vz7)/:After the patch:
===============
(root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 295
lock: 1: FLOCK ADVISORY WRITE ${PID_A1} b6:f8a61:529946 0 EOFFixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
Signed-off-by: Konstantin Khorenko
Acked-by: Andrey Vagin
Reviewed-by: Benjamin Coddington
Signed-off-by: Jeff Layton
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 5b7b15aee641904ae269be9846610a3950cbd64c ]
We're encoding a single op in the reply but leaving the number of ops
zero, so the reply makes no sense.Somewhat academic as this isn't a case any real client will hit, though
in theory perhaps that could change in a future protocol extension.Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit ebf00be37de35788cad72f4f20b4a39e30c0be4a ]
According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated. This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O. The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks. As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write. In that
case, wait for the short write to complete to allow gfs2 to recover.Signed-off-by: Andreas Gruenbacher
Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
29 Sep, 2018
10 commits
-
commit 338affb548c243d2af25b1ca628e67819350de6b upstream.
When in effect, add "test_dummy_encryption" to _ext4_show_options() so
that it is shown in /proc/mounts and other relevant procfs files.Signed-off-by: Eric Biggers
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit fe18d649891d813964d3aaeebad873f281627fbc upstream.
Marking mmp bh dirty before writing it will make writeback
pick up mmp block later and submit a write, we don't want the
duplicate write as kmmpd thread should have full control of
reading and writing the mmp block.
Another reason is we will also have random I/O error on
the writeback request when blk integrity is enabled, because
kmmpd could modify the content of the mmp block(e.g. setting
new seq and time) while the mmp block is under I/O requested
by writeback.Signed-off-by: Li Dongyang
Signed-off-by: Theodore Ts'o
Reviewed-by: Andreas Dilger
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 5f8c10936fab2b69a487400f2872902e597dd320 upstream.
An online resize of a file system with the bigalloc feature enabled
and a 1k block size would be refused since ext4_resize_begin() did not
understand s_first_data_block is 0 for all bigalloc file systems, even
when the block size is 1k.Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit f0a459dec5495a3580f8d784555e6f8f3bf7f263 upstream.
Avoid growing the file system to an extent so that the last block
group is too small to hold all of the metadata that must be stored in
the block group.This problem can be triggered with the following reproducer:
umount /mnt
mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \
-E resize=1073741824 /tmp/foo.img 128M
mount /tmp/foo.img /mnt
truncate --size 1708M /tmp/foo.img
resize2fs /dev/loop0 295400
umount /mnt
e2fsck -fy /tmp/foo.imgReported-by: Torsten Hilbrich
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 4274f516d4bc50648a4d97e4f67ecbd7b65cde4a upstream.
When mounting the superblock, ext4_fill_super() calculates the free
blocks and free inodes and stores them in the superblock. It's not
strictly necessary, since we don't use them any more, but it's nice to
keep them roughly aligned to reality.Since it's not critical for file system correctness, the code doesn't
call ext4_commit_super(). The problem is that it's in
ext4_commit_super() that we recalculate the superblock checksum. So
if we're not going to call ext4_commit_super(), we need to call
ext4_superblock_csum_set() to make sure the superblock checksum is
consistent.Most of the time, this doesn't matter, since we end up calling
ext4_commit_super() very soon thereafter, and definitely by the time
the file system is unmounted. However, it doesn't work in this
sequence:mke2fs -Fq -t ext4 /dev/vdc 128M
mount /dev/vdc /vdc
cp xfstests/git-versions /vdc
godown /vdc
umount /vdc
mount /dev/vdc
tune2fs -l /dev/vdcWith this commit, the "tune2fs -l" no longer fails.
Reported-by: Chengguang Xu
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit bcd8e91f98c156f4b1ebcfacae675f9cfd962441 upstream.
A maliciously crafted file system can cause an overflow when the
results of a 64-bit calculation is stored into a 32-bit length
parameter.https://bugzilla.kernel.org/show_bug.cgi?id=200623
Signed-off-by: Theodore Ts'o
Reported-by: Wen Xu
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 4d982e25d0bdc83d8c64e66fdeca0b89240b3b85 upstream.
A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in a inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault. Fix this by using the size of the inline directory instead of
dir->i_size.Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero. (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)https://bugzilla.kernel.org/show_bug.cgi?id=200933
Signed-off-by: Theodore Ts'o
Reported-by: Wen Xu
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit b50282f3241acee880514212d88b6049fb5039c8 upstream.
If the destination of the rename(2) system call exists, the inode's
link count (i_nlinks) must be non-zero. If it is, the inode can end
up on the orphan list prematurely, leading to all sorts of hilarity,
including a use-after-free.https://bugzilla.kernel.org/show_bug.cgi?id=200931
Signed-off-by: Theodore Ts'o
Reported-by: Wen Xu
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 234b69e3e089d850a98e7b3145bd00e9b52b1111 upstream.
While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[] [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175] [] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680] [] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185] [] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691] [] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204] [] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716] [] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227] [] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737] [] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003] [] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263] [] vfs_create+0xdb/0x150
[203748.713518] [] do_last+0x815/0x1210
[203748.713772] [] ? path_init+0xb9/0x450
[203748.714123] [] path_openat+0x80/0x600
[203748.714378] [] ? handle_pte_fault+0xd15/0x1620
[203748.714634] [] do_filp_open+0x3a/0xb0
[203748.714888] [] ? __alloc_fd+0xa7/0x130
[203748.715143] [] do_sys_open+0x12c/0x220
[203748.715403] [] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668] [] ? system_call_after_swapgs+0xe9/0x190
[203748.715928] [] SyS_open+0x1e/0x20
[203748.716184] [] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775] RSPJoesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.htmlLink: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Cc: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Changwei Ge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman -
commit f061c1cc404a618858a77aea233fde0aeaad2f2d upstream.
This reverts commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52.
UBIFS wants to assert that xattr operations are only issued on files
with positive link count. The said patch made this operations return
-ENOENT for unlinked files such that the asserts will no longer trigger.
This was wrong since xattr operations are perfectly fine on unlinked
files.
Instead the assertions need to be fixed/removed.Cc:
Fixes: 11a6fc3dc743 ("ubifs: xattr: Don't operate on deleted inodes")
Reported-by: Koen Vandeputte
Tested-by: Joel Stanley
Signed-off-by: Richard Weinberger
Signed-off-by: Greg Kroah-Hartman
26 Sep, 2018
8 commits
-
[ Upstream commit cc57c07343bd071cdf1915a91a24ab7d40c9b590 ]
This patch fixes a bug where configfs_register_group had added
a group in a tree, and userspace has done a rmdir on a dir somewhere
above that group and we hit a kernel crash. The problem is configfs_rmdir
will detach everything under it and unlink groups on the default_groups
list. It will not unlink groups added with configfs_register_group so when
configfs_unregister_group is called to drop its references to the group/items
we crash when we try to access the freed dentrys.The patch just adds a check for if a rmdir has been done above
us and if so just does the unlink part of unregistration.Sorry if you are getting this multiple times. I thouhgt I sent
this to some of you and lkml, but I do not see it.Signed-off-by: Mike Christie
Cc: Christoph Hellwig
Cc: Joel Becker
Signed-off-by: Christoph Hellwig
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit a6795a585929d94ca3e931bc8518f8deb8bbe627 ]
The underlying real file used by overlayfs still contains the overlay path.
This results in mnt_want_write_file() calls by the filesystem getting
freeze protection on the wrong inode (the overlayfs one instead of the real
one).Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.
Reported-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
Reviewed-by: Christoph Hellwig
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 2f819db565e82e5f73cd42b39925098986693378 ]
The regset API documented in defines -ENODEV as the
result of the `->active' handler to be used where the feature requested
is not available on the hardware found. However code handling core file
note generation in `fill_thread_core_info' interpretes any non-zero
result from the `->active' handler as the regset requested being active.
Consequently processing continues (and hopefully gracefully fails later
on) rather than being abandoned right away for the regset requested.Fix the problem then by making the code proceed only if a positive
result is returned from the `->active' handler.Signed-off-by: Maciej W. Rozycki
Signed-off-by: Paul Burton
Fixes: 4206d3aa1978 ("elf core dump: notes user_regset")
Patchwork: https://patchwork.linux-mips.org/patch/19332/
Cc: Alexander Viro
Cc: James Hogan
Cc: Ralf Baechle
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman -
commit 994b15b983a72e1148a173b61e5b279219bb45ae upstream.
The previous fix broke recovery of delegated stateids because it assumes
that if we did not mark the delegation as suspect, then the delegation has
effectively been revoked, and so it removes that delegation irrespectively
of whether or not it is valid and still in use. While this is "mostly
harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
in an infinite loop while complaining that we're using an invalid stateid
(in this case the all-zero stateid).What we rather want to do here is ensure that the delegation is always
correctly marked as needing testing when that is the case. So we want
to close the loophole offered by nfs4_schedule_stateid_recovery(),
which marks the state as needing to be reclaimed, but not the
delegation that may be backing it.Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Anna Schumaker
Signed-off-by: Greg Kroah-Hartman -
commit 56446f218af1133c802dad8e9e116f07f381846c upstream.
The problem is that "entryptr + next_offset" and "entryptr + len + size"
can wrap. I ended up changing the type of "entryptr" because it makes
the math easier when we don't have to do so much casting.Signed-off-by: Dan Carpenter
Signed-off-by: Steve French
Reviewed-by: Aurelien Aptel
Reviewed-by: Pavel Shilovsky
CC: Stable
Signed-off-by: Greg Kroah-Hartman -
commit 8ad8aa353524d89fa2e09522f3078166ff78ec42 upstream.
The "old_entry + le32_to_cpu(pDirInfo->NextEntryOffset)" can wrap
around so I have added a check for integer overflow.Reported-by: Dr Silvio Cesare of InfoSect
Reviewed-by: Ronnie Sahlberg
Reviewed-by: Aurelien Aptel
Signed-off-by: Dan Carpenter
Signed-off-by: Steve French
CC: Stable
Signed-off-by: Greg Kroah-Hartman -
commit 831b624df1b420c8f9281ed1307a8db23afb72df upstream.
persistent_ram_vmap() returns the page start vaddr.
persistent_ram_iomap() supports non-page-aligned mapping.persistent_ram_buffer_map() always adds offset-in-page to the vaddr
returned from these two functions, which causes incorrect mapping of
non-page-aligned persistent ram buffer.By default ftrace_size is 4096 and max_ftrace_cnt is nr_cpu_ids. Without
this patch, the zone_sz in ramoops_init_przs() is 4096/nr_cpu_ids which
might not be page aligned. If the offset-in-page > 2048, the vaddr will be
in next page. If the next page is not mapped, it will cause kernel panic:[ 0.074231] BUG: unable to handle kernel paging request at ffffa19e0081b000
...
[ 0.075000] RIP: 0010:persistent_ram_new+0x1f8/0x39f
...
[ 0.075000] Call Trace:
[ 0.075000] ramoops_init_przs.part.10.constprop.15+0x105/0x260
[ 0.075000] ramoops_probe+0x232/0x3a0
[ 0.075000] platform_drv_probe+0x3e/0xa0
[ 0.075000] driver_probe_device+0x2cd/0x400
[ 0.075000] __driver_attach+0xe4/0x110
[ 0.075000] ? driver_probe_device+0x400/0x400
[ 0.075000] bus_for_each_dev+0x70/0xa0
[ 0.075000] driver_attach+0x1e/0x20
[ 0.075000] bus_add_driver+0x159/0x230
[ 0.075000] ? do_early_param+0x95/0x95
[ 0.075000] driver_register+0x70/0xc0
[ 0.075000] ? init_pstore_fs+0x4d/0x4d
[ 0.075000] __platform_driver_register+0x36/0x40
[ 0.075000] ramoops_init+0x12f/0x131
[ 0.075000] do_one_initcall+0x4d/0x12c
[ 0.075000] ? do_early_param+0x95/0x95
[ 0.075000] kernel_init_freeable+0x19b/0x222
[ 0.075000] ? rest_init+0xbb/0xbb
[ 0.075000] kernel_init+0xe/0xfc
[ 0.075000] ret_from_fork+0x3a/0x50Signed-off-by: Bin Yang
[kees: add comments describing the mapping differences, updated commit log]
Fixes: 24c3d2f342ed ("staging: android: persistent_ram: Make it possible to use memory outside of bootmem")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit e79e0e1428188b24c3b57309ffa54a33c4ae40c4 ]
Before this patch, you could get into situations like this:
1. Process 1 searches for X free blocks, finds them, makes a reservation
2. Process 2 searches for free blocks in the same rgrp, but now the
bitmap is full because process 1's reservation is skipped over.
So it marks the bitmap as GBF_FULL.
3. Process 1 tries to allocate blocks from its own reservation, but
since the GBF_FULL bit is set, it skips over the rgrp and searches
elsewhere, thus not using its own reservation.This patch adds an additional check to allow processes to use their
own reservations.Signed-off-by: Bob Peterson
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman