Eric Lee / smarc-fsl-linux-kernel

17 Aug, 2020

1 commit

2cc3c4b3c Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block ... Browse Code »

Pull io_uring fixes from Jens Axboe:
"A few differerent things in here.

Seems like syzbot got some more io_uring bits wired up, and we got a
handful of reports and the associated fixes are in here.

General fixes too, and a lot of them marked for stable.

Lastly, a bit of fallout from the async buffered reads, where we now
more easily trigger short reads. Some applications don't really like
that, so the io_read() code now handles short reads internally, and
got a cleanup along the way so that it's now easier to read (and
documented). We're now passing tests that failed before"

* tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
io_uring: short circuit -EAGAIN for blocking read attempt
io_uring: sanitize double poll handling
io_uring: internally retry short reads
io_uring: retain iov_iter state over io_read/io_write calls
task_work: only grab task signal lock when needed
io_uring: enable lookup of links holding inflight files
io_uring: fail poll arm on queue proc failure
io_uring: hold 'ctx' reference around task_work queue + execute
fs: RWF_NOWAIT should imply IOCB_NOIO
io_uring: defer file table grabbing request cleanup for locked requests
io_uring: add missing REQ_F_COMP_LOCKED for nested requests
io_uring: fix recursive completion locking on oveflow flush
io_uring: use TWA_SIGNAL for task_work uncondtionally
io_uring: account locked memory before potential error case
io_uring: set ctx sq/cq entry count earlier
io_uring: Fix NULL pointer dereference in loop_rw_iter()
io_uring: add comments on how the async buffered read retry works
io_uring: io_async_buf_func() need not test page bit

Linus Torvalds
2020-08-17 01:55:12 +0800

16 Aug, 2020

2 commits

f91daf565 io_uring: short circuit -EAGAIN for blocking read attempt ... Browse Code »

One case was missed in the short IO retry handling, and that's hitting
-EAGAIN on a blocking attempt read (eg from io-wq context). This is a
problem on sockets that are marked as non-blocking when created, they
don't carry any REQ_F_NOWAIT information to help us terminate them
instead of perpetually retrying.

Fixes: 227c0c9673d8 ("io_uring: internally retry short reads")
Signed-off-by: Jens Axboe

Jens Axboe
2020-08-16 06:58:42 +0800
d4e7cd36a io_uring: sanitize double poll handling ... Browse Code »

There's a bit of confusion on the matching pairs of poll vs double poll,
depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
poll driven retry.

Add io_poll_get_double() that returns the double poll waitqueue, if any,
and io_poll_get_single() that returns the original poll waitqueue. With
that, remove the argument to io_poll_remove_double().

Finally ensure that wait->private is cleared once the double poll handler
has run, so that remove knows it's already been seen.

Cc: stable@vger.kernel.org # v5.8
Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Jens Axboe

Jens Axboe
2020-08-16 02:48:18 +0800

15 Aug, 2020

5 commits

410520d07 Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux ... Browse Code »

Pull 9p updates from Dominique Martinet:

- some code cleanup

- a couple of static analysis fixes

- setattr: try to pick a fid associated with the file rather than the
dentry, which might sometimes matter

* tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux:
9p: Remove unneeded cast from memory allocation
9p: remove unused code in 9p
net/9p: Fix sparse endian warning in trans_fd.c
9p: Fix memory leak in v9fs_mount
9p: retrieve fid from file when file instance exist.

Linus Torvalds
2020-08-15 23:34:36 +0800
f6513bd39 Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Steve French:
"Three small cifs/smb3 fixes, one for stable fixing mkdir path with
the 'idsfromsid' mount option"

* tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
SMB3: Fix mkdir when idsfromsid configured on mount
cifs: Convert to use the fallthrough macro
cifs: Fix an error pointer dereference in cifs_mount()

Linus Torvalds
2020-08-15 23:31:39 +0800
37711e5e2 Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Stable fixes:
- pNFS: Don't return layout segments that are being used for I/O
- pNFS: Don't move layout segments off the active list when being used for I/O

Features:
- NFS: Add support for user xattrs through the NFSv4.2 protocol
- NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
- NFSv4.0 allow nconnect for v4.0

Bugfixes and cleanups:
- nfs: ensure correct writeback errors are returned on close()
- nfs: nfs_file_write() should check for writeback errors
- nfs: Fix getxattr kernel panic and memory overflow
- NFS: Fix the pNFS/flexfiles mirrored read failover code
- SUNRPC: dont update timeout value on connection reset
- freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
- sunrpc: destroy rpc_inode_cachep after unregister_filesystem"

* tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
NFS: Fix flexfiles read failover
fs: nfs: delete repeated words in comments
rpc_pipefs: convert comma to semicolon
nfs: Fix getxattr kernel panic and memory overflow
NFS: Don't return layout segments that are in use
NFS: Don't move layouts to plh_return_segs list while in use
NFS: Add layout segment info to pnfs read/write/commit tracepoints
NFS: Add tracepoints for layouterror and layoutstats.
NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
SUNRPC dont update timeout value on connection reset
nfs: nfs_file_write() should check for writeback errors
nfs: ensure correct writeback errors are returned on close()
NFSv4.2: xattr cache: get rid of cache discard work queue
NFS: remove redundant initialization of variable result
NFSv4.0 allow nconnect for v4.0
freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
sunrpc: destroy rpc_inode_cachep after unregister_filesystem
NFSv4.2: add client side xattr caching.
NFSv4.2: hook in the user extended attribute handlers
NFSv4.2: add the extended attribute proc functions.
...

Linus Torvalds
2020-08-15 23:26:55 +0800
c734124c5 fs: autofs: delete repeated words in comments ... Browse Code »

Drop duplicated words {the, at} in comments.

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Acked-by: Ian Kent
Link: http://lkml.kernel.org/r/20200811021817.24982-1-rdunlap@infradead.org
Signed-off-by: Linus Torvalds

Randy Dunlap
2020-08-15 10:56:56 +0800
fc4177be9 exec: restore EACCES of S_ISDIR execve() ... Browse Code »

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES. Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier")
Reported-by: Marc Zyngier
Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Tested-by: Greg Kroah-Hartman
Reviewed-by: Greg Kroah-Hartman
Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Signed-off-by: Linus Torvalds

Kees Cook
2020-08-15 10:56:56 +0800

14 Aug, 2020

6 commits

c8c412f97 SMB3: Fix mkdir when idsfromsid configured on mount ... Browse Code »

mkdir uses a compounded create operation which was not setting
the security descriptor on create of a directory. Fix so
mkdir now sets the mode and owner info properly when idsfromsid
and modefromsid are configured on the mount.

Signed-off-by: Steve French
CC: Stable # v5.8
Reviewed-by: Paulo Alcantara (SUSE)
Reviewed-by: Pavel Shilovsky

Steve French
2020-08-14 08:41:01 +0800
227c0c967 io_uring: internally retry short reads ... Browse Code »

We've had a few application cases of not handling short reads properly,
and it is understandable as short reads aren't really expected if the
application isn't doing non-blocking IO.

Now that we retain the iov_iter over retries, we can implement internal
retry pretty trivially. This ensures that we don't return a short read,
even for buffered reads on page cache conflicts.

Cleanup the deep nesting and hard to read nature of io_read() as well,
it's much more straight forward now to read and understand. Added a
few comments explaining the logic as well.

Signed-off-by: Jens Axboe

Jens Axboe
2020-08-14 07:00:31 +0800
ff6165b2d io_uring: retain iov_iter state over io_read/io_write calls ... Browse Code »

Instead of maintaining (and setting/remembering) iov_iter size and
segment counts, just put the iov_iter in the async part of the IO
structure.

This is mostly a preparation patch for doing appropriate internal retries
for short reads, but it also cleans up the state handling nicely and
simplifies it quite a bit.

Signed-off-by: Jens Axboe

Jens Axboe
2020-08-14 04:53:34 +0800
23c2c8c6f Merge tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull more btrfs updates from David Sterba:
"One minor update, the rest are fixes that have arrived a bit late for
the first batch. There are also some recent fixes for bugs that were
discovered during the merge window and pop up during testing.

User visible change:

- show correct subvolume path in /proc/mounts for bind mounts

Fixes:

- fix compression messages when remounting with different level or
compression algorithm

- tree-log: fix some memory leaks on error handling paths

- restore I_VERSION on remount

- fix return values and error code mixups

- fix umount crash with quotas enabled when removing sysfs files

- fix trim range on a shrunk device"

* tag 'for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
btrfs: fix return value mixup in btrfs_get_extent
btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups()
btrfs: check correct variable after allocation in btrfs_backref_iter_alloc
btrfs: make sure SB_I_VERSION doesn't get unset by remount
btrfs: fix memory leaks after failure to lookup checksums during inode logging
btrfs: don't show full path of bind mounts in subvol=
btrfs: fix messages after changing compression level by remount
btrfs: only search for left_info if there is no right_info in try_merge_free_space
btrfs: inode: fix NULL pointer dereference if inode doesn't need compression

Linus Torvalds
2020-08-14 03:26:18 +0800
69307ade1 Merge tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux ... Browse Code »

Pull xfs fixes from Darrick Wong:
"Two small fixes that have come in during the past week:

- Fix duplicated words in comments

- Fix an ubsan complaint about null pointer arithmetic"

* tag 'xfs-5.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
xfs: delete duplicated words + other fixes

Linus Torvalds
2020-08-14 03:22:19 +0800
ff419b61f Merge tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat ... Browse Code »

Pull exfat updates from Namjae Jeon:

- don't clear MediaFailure and VolumeDirty bit in volume flags if these
were already set before mounting

- write multiple dirty buffers at once in sync mode

- remove unneeded EXFAT_SB_DIRTY bit set

* tag 'exfat-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: retain 'VolumeFlags' properly
exfat: optimize exfat_zeroed_cluster()
exfat: add error check when updating dir-entries
exfat: write multiple sectors at once
exfat: remove EXFAT_SB_DIRTY flag

Linus Torvalds
2020-08-14 03:18:07 +0800

13 Aug, 2020

26 commits

f254ac04c io_uring: enable lookup of links holding inflight files ... Browse Code »

When a process exits, we cancel whatever requests it has pending that
are referencing the file table. However, if a link is holding a
reference, then we cannot find it by simply looking at the inflight
list.

Enable checking of the poll and timeout list to find the link, and
cancel it appropriately.

Cc: stable@vger.kernel.org
Reported-by: Josef
Signed-off-by: Jens Axboe

Jens Axboe
2020-08-13 07:33:30 +0800
7c2a69f61 Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:
"Xiubo has completed his work on filesystem client metrics, they are
sent to all available MDSes once per second now.

Other than that, we have a lot of fixes and cleanups all around the
filesystem, including a tweak to cut down on MDS request resends in
multi-MDS setups from Yanhu and fixups for SELinux symlink labeling
and MClientSession message decoding from Jeff"

* tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits)
ceph: handle zero-length feature mask in session messages
ceph: use frag's MDS in either mode
ceph: move sb->wb_pagevec_pool to be a global mempool
ceph: set sec_context xattr on symlink creation
ceph: remove redundant initialization of variable mds
ceph: fix use-after-free for fsc->mdsc
ceph: remove unused variables in ceph_mdsmap_decode()
ceph: delete repeated words in fs/ceph/
ceph: send client provided metric flags in client metadata
ceph: periodically send perf metrics to MDSes
ceph: check the sesion state and return false in case it is closed
libceph: replace HTTP links with HTTPS ones
ceph: remove unnecessary cast in kfree()
libceph: just have osd_req_op_init() return a pointer
ceph: do not access the kiocb after aio requests
ceph: clean up and optimize ceph_check_delayed_caps()
ceph: fix potential mdsc use-after-free crash
ceph: switch to WARN_ON_ONCE in encode_supported_features()
ceph: add global total_caps to count the mdsc's total caps number
ceph: add check_session_state() helper and make it global
...

Linus Torvalds
2020-08-13 03:51:31 +0800
9ad57f6df Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).

* emailed patches from Andrew Morton : (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...

Linus Torvalds
2020-08-13 02:24:12 +0800
64019a2e4 mm/gup: remove task_struct pointer for all gup code ... Browse Code »

After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more. Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds

Peter Xu
2020-08-13 01:58:04 +0800
0fd338b2d exec: move path_noexec() check earlier ... Browse Code »

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s. Check it earlier as well in
may_open() and collect the redundant fs/exec.c path_noexec() test under
the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC,
...
do_filp_open(dfd, filename, open_flags)
path_openat(nameidata, open_flags, flags)
file = alloc_empty_file(open_flags, current_cred());
do_open(nameidata, file, open_flags)
may_open(path, acc_mode, open_flag)
/* new location of MAY_EXEC vs path_noexec() test */
inode_permission(inode, MAY_OPEN | acc_mode)
security_inode_permission(inode, acc_mode)
vfs_open(path, file)
do_dentry_open(file, path->dentry->d_inode, open)
security_file_open(f)
open()
/* old location of path_noexec() test */

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Aleksa Sarai
Cc: Christian Brauner
Cc: Dmitry Vyukov
Cc: Eric Biggers
Cc: Tetsuo Handa
Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org
Signed-off-by: Linus Torvalds

Kees Cook
2020-08-13 01:58:01 +0800
633fb6ac3 exec: move S_ISREG() check earlier ... Browse Code »

The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late. This was
fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look for
other pathological conditions[1]). Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the test
to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC,
...
do_filp_open(dfd, filename, open_flags)
path_openat(nameidata, open_flags, flags)
file = alloc_empty_file(open_flags, current_cred());
do_open(nameidata, file, open_flags)
may_open(path, acc_mode, open_flag)
/* new location of MAY_EXEC vs S_ISREG() test */
inode_permission(inode, MAY_OPEN | acc_mode)
security_inode_permission(inode, acc_mode)
vfs_open(path, file)
do_dentry_open(file, path->dentry->d_inode, open)
/* old location of FMODE_EXEC vs S_ISREG() test */
security_file_open(f)
open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Cc: Aleksa Sarai
Cc: Alexander Viro
Cc: Christian Brauner
Cc: Dmitry Vyukov
Cc: Eric Biggers
Cc: Tetsuo Handa
Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.org
Signed-off-by: Linus Torvalds

Kees Cook
2020-08-13 01:58:01 +0800
db19c91c3 exec: change uselib(2) IS_SREG() failure to EACCES ... Browse Code »

Patch series "Relocate execve() sanity checks", v2.

While looking at the code paths for the proposed O_MAYEXEC flag, I saw
some things that looked like they should be fixed up.

exec: Change uselib(2) IS_SREG() failure to EACCES
This just regularizes the return code on uselib(2).

exec: Move S_ISREG() check earlier
This moves the S_ISREG() check even earlier than it was already.

exec: Move path_noexec() check earlier
This adds the path_noexec() check to the same place as the
S_ISREG() check.

This patch (of 3):

Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so
the behavior matches execve(2), and the seemingly documented value. The
"not a regular file" failure mode of execve(2) is explicitly
documented[1], but it is not mentioned in uselib(2)[2] which does,
however, say that open(2) and mmap(2) errors may apply. The documentation
for open(2) does not include a "not a regular file" error[3], but mmap(2)
does[4], and it is EACCES.

[1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS
[2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS
[3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS
[4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Acked-by: Christian Brauner
Cc: Aleksa Sarai
Cc: Alexander Viro
Cc: Dmitry Vyukov
Cc: Eric Biggers
Cc: Tetsuo Handa
Link: http://lkml.kernel.org/r/20200605160013.3954297-1-keescook@chromium.org
Link: http://lkml.kernel.org/r/20200605160013.3954297-2-keescook@chromium.org
Signed-off-by: Linus Torvalds

Kees Cook
2020-08-13 01:58:01 +0800
f38c85f1b coredump: add %f for executable filename ... Browse Code »

The document reads "%e" should be "executable filename" while actually it
could be changed by things like pr_ctl PR_SET_NAME. People who uses "%e"
in core_pattern get surprised when they find out they get thread name
instead of executable filename.

This is either a bug of document or a bug of code. Since the behavior of
"%e" is there for long time, it could bring another surprise for users if
we "fix" the code.

So we just "fix" the document. And more, for users who really need the
"executable filename" in core_pattern, we introduce a new "%f" for the
real executable filename. We already have "%E" for executable path in
kernel, so just reuse most of its code for the new added "%f" format.

Signed-off-by: Lepton Wu
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200701031432.2978761-1-ytht.net@gmail.com
Signed-off-by: Linus Torvalds

Lepton Wu
2020-08-13 01:58:01 +0800
a089e3fd5 fs/signalfd.c: fix inconsistent return codes for signalfd4 ... Browse Code »

The kernel signalfd4() syscall returns different error codes when called
either in compat or native mode. This behaviour makes correct emulation
in qemu and testing programs like LTP more complicated.

Fix the code to always return -in both modes- EFAULT for unaccessible user
memory, and EINVAL when called with an invalid signal mask.

Signed-off-by: Helge Deller
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Laurent Vivier
Link: http://lkml.kernel.org/r/20200530100707.GA10159@ls3530.fritz.box
Signed-off-by: Linus Torvalds

Helge Deller
2020-08-13 01:58:01 +0800
a090a5a7d fat: fix fat_ra_init() for data clusters == 0 ... Browse Code »

If data clusters == 0, fat_ra_init() calls the ->ent_blocknr() for the
cluster beyond ->max_clusters.

This checks the limit before initialization to suppress the warning.

Reported-by: syzbot+756199124937b31a9b7e@syzkaller.appspotmail.com
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/87mu462sv4.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2020-08-13 01:58:01 +0800
4ecfed61d VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones ... Browse Code »

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `xmlns`:
For each link, `http://[^# ]*(?:\w|/)`:
If neither `gnu\.org/license`, nor `mozilla\.org/MPL`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov
Signed-off-by: Andrew Morton
Acked-by: OGAWA Hirofumi
Link: http://lkml.kernel.org/r/20200708200409.22293-1-grandmaster@al2klimov.de
Signed-off-by: Linus Torvalds

Alexander A. Klimov
2020-08-13 01:58:01 +0800
e348e65a0 fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes ... Browse Code »

There is no need to hold write_lock in fat_ioctl_get_attributes.
write_lock may make an impact on concurrency of fat_ioctl_get_attributes.

Signed-off-by: Yubo Feng
Signed-off-by: Andrew Morton
Acked-by: OGAWA Hirofumi
Link: http://lkml.kernel.org/r/1593308053-12702-1-git-send-email-fengyubo3@huawei.com
Signed-off-by: Linus Torvalds

Yubo Feng
2020-08-13 01:58:01 +0800
88b2e9b06 fs/ufs: avoid potential u32 multiplication overflow ... Browse Code »

The 64 bit ino is being compared to the product of two u32 values,
however, the multiplication is being performed using a 32 bit multiply so
there is a potential of an overflow. To be fully safe, cast uspi->s_ncg
to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of
overflow.

Fixes: f3e2a520f5fb ("ufs: NFS support")
Signed-off-by: Colin Ian King
Signed-off-by: Andrew Morton
Cc: Evgeniy Dushistov
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/20200715170355.1081713-1-colin.king@canonical.com
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Linus Torvalds

Colin Ian King
2020-08-13 01:58:01 +0800
a1d0747a3 nilfs2: use a more common logging style ... Browse Code »

Add macros for nilfs_(sb, fmt, ...) and convert the uses of
'nilfs_msg(sb, KERN_, ...)' to 'nilfs_(sb, ...)' so nilfs2
uses a logging style more like the typical kernel logging style.

Miscellanea:

o Realign arguments for these uses

Signed-off-by: Joe Perches
Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/1595860111-3920-4-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Linus Torvalds

Joe Perches
2020-08-13 01:58:01 +0800
2987a4cfc nilfs2: convert __nilfs_msg to integrate the level and format ... Browse Code »

Reduce object size a bit by removing the KERN_ as a separate
argument and adding it to the format string.

Reduce overall object size by about ~.5% (x86-64 defconfig w/ nilfs2)

old:
$ size -t fs/nilfs2/built-in.a | tail -1
191738 8676 44 200458 30f0a (TOTALS)

new:
$ size -t fs/nilfs2/built-in.a | tail -1
190971 8676 44 199691 30c0b (TOTALS)

Signed-off-by: Joe Perches
Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/1595860111-3920-3-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Linus Torvalds

Joe Perches
2020-08-13 01:58:01 +0800
1b0e31861 nilfs2: only call unlock_new_inode() if I_NEW ... Browse Code »

Patch series "nilfs2 updates".

This patch (of 3):

unlock_new_inode() is only meant to be called after a new inode has
already been inserted into the hash table. But nilfs_new_inode() can call
it even before it has inserted the inode, triggering the WARNING in
unlock_new_inode(). Fix this by only calling unlock_new_inode() if the
inode has the I_NEW flag set, indicating that it's in the table.

Signed-off-by: Eric Biggers
Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/1595860111-3920-1-git-send-email-konishi.ryusuke@gmail.com
Link: http://lkml.kernel.org/r/1595860111-3920-2-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
f666f9fb9 fs/minix: remove expected error message in block_to_path() ... Browse Code »

When truncating a file to a size within the last allowed logical block,
block_to_path() is called with the *next* block. This exceeds the limit,
causing the "block %ld too big" error message to be printed.

This case isn't actually an error; there are just no more blocks past that
point. So, remove this error message.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Qiujun Huang
Link: http://lkml.kernel.org/r/20200628060846.682158-7-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
0a12c4a80 fs/minix: fix block limit check for V1 filesystems ... Browse Code »

The minix filesystem reads its maximum file size from its on-disk
superblock. This value isn't necessarily a multiple of the block size.
When it's not, the V1 block mapping code doesn't allow mapping the last
possible block. Commit 6ed6a722f9ab ("minixfs: fix block limit check")
fixed this in the V2 mapping code. Fix it in the V1 mapping code too.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Qiujun Huang
Link: http://lkml.kernel.org/r/20200628060846.682158-6-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
32ac86eff fs/minix: set s_maxbytes correctly ... Browse Code »

The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather
than setting it to the actual filesystem-specific limit. This is broken
because it means userspace doesn't see the standard behavior like getting
EFBIG and SIGXFSZ when exceeding the maximum file size.

Fix this by setting s_maxbytes correctly.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Qiujun Huang
Link: http://lkml.kernel.org/r/20200628060846.682158-5-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
270ef4109 fs/minix: reject too-large maximum file size ... Browse Code »

If the minix filesystem tries to map a very large logical block number to
its on-disk location, block_to_path() can return offsets that are too
large, causing out-of-bounds memory accesses when accessing indirect index
blocks. This should be prevented by the check against the maximum file
size, but this doesn't work because the maximum file size is read directly
from the on-disk superblock and isn't validated itself.

Fix this by validating the maximum file size at mount time.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+c7d9ec7a1a7272dd71b3@syzkaller.appspotmail.com
Reported-by: syzbot+3b7b03a0c28948054fb5@syzkaller.appspotmail.com
Reported-by: syzbot+6e056ee473568865f3e6@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Qiujun Huang
Cc:
Link: http://lkml.kernel.org/r/20200628060846.682158-4-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
facb03ddd fs/minix: don't allow getting deleted inodes ... Browse Code »

If an inode has no links, we need to mark it bad rather than allowing it
to be accessed. This avoids WARNINGs in inc_nlink() and drop_nlink() when
doing directory operations on a fuzzed filesystem.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a9ac3de1b5de5fb10efc@syzkaller.appspotmail.com
Reported-by: syzbot+df958cf5688a96ad3287@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Cc: Qiujun Huang
Cc:
Link: http://lkml.kernel.org/r/20200628060846.682158-3-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
da27e0a0e fs/minix: check return value of sb_getblk() ... Browse Code »

Patch series "fs/minix: fix syzbot bugs and set s_maxbytes".

This series fixes all syzbot bugs in the minix filesystem:

KASAN: null-ptr-deref Write in get_block
KASAN: use-after-free Write in get_block
KASAN: use-after-free Read in get_block
WARNING in inc_nlink
KMSAN: uninit-value in get_block
WARNING in drop_nlink

It also fixes the minix filesystem to set s_maxbytes correctly, so that
userspace sees the correct behavior when exceeding the max file size.

This patch (of 6):

sb_getblk() can fail, so check its return value.

This fixes a NULL pointer dereference.

Originally from Qiujun Huang.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+4a88b2b9dc280f47baf4@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Cc: Qiujun Huang
Cc: Alexander Viro
Cc:
Link: http://lkml.kernel.org/r/20200628060846.682158-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200628060846.682158-2-ebiggers@kernel.org
Signed-off-by: Linus Torvalds

Eric Biggers
2020-08-13 01:58:00 +0800
fe8141759 exec: use force_uaccess_begin during exec and exit ... Browse Code »

Both exec and exit want to ensure that the uaccess routines actually do
access user pointers. Use the newly added force_uaccess_begin helper
instead of an open coded set_fs for that to prepare for kernel builds
where set_fs() does not exist.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Linus Torvalds
Cc: Nick Hu
Cc: Greentime Hu
Cc: Vincent Chen
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Geert Uytterhoeven
Link: http://lkml.kernel.org/r/20200710135706.537715-7-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-08-13 01:57:59 +0800
15568299b hugetlbfs: prevent filesystem stacking of hugetlbfs ... Browse Code »

syzbot found issues with having hugetlbfs on a union/overlay as reported
in [1]. Due to the limitations (no write) and special functionality of
hugetlbfs, it does not work well in filesystem stacking. There are no
know use cases for hugetlbfs stacking. Rather than making modifications
to get hugetlbfs working in such environments, simply prevent stacking.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
Suggested-by: Amir Goldstein
Signed-off-by: Mike Kravetz
Signed-off-by: Andrew Morton
Acked-by: Miklos Szeredi
Cc: Al Viro
Cc: Matthew Wilcox
Cc: Colin Walters
Link: http://lkml.kernel.org/r/80f869aa-810d-ef6c-8888-b46cee135907@oracle.com
Signed-off-by: Linus Torvalds

Mike Kravetz
2020-08-13 01:57:56 +0800
9066e5cfb mm, oom: make the calculation of oom badness more accurate ... Browse Code »

Recently we found an issue on our production environment that when memcg
oom is triggered the oom killer doesn't chose the process with largest
resident memory but chose the first scanned process. Note that all
processes in this memcg have the same oom_score_adj, so the oom killer
should chose the process with largest resident memory.

Bellow is part of the oom info, which is enough to analyze this issue.
[7516987.983223] memory: usage 16777216kB, limit 16777216kB, failcnt 52843037
[7516987.983224] memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
[7516987.983225] kmem: usage 301464kB, limit 9007199254740988kB, failcnt 0
[...]
[7516987.983293] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[7516987.983510] [ 5740] 0 5740 257 1 32768 0 -998 pause
[7516987.983574] [58804] 0 58804 4594 771 81920 0 -998 entry_point.bas
[7516987.983577] [58908] 0 58908 7089 689 98304 0 -998 cron
[7516987.983580] [58910] 0 58910 16235 5576 163840 0 -998 supervisord
[7516987.983590] [59620] 0 59620 18074 1395 188416 0 -998 sshd
[7516987.983594] [59622] 0 59622 18680 6679 188416 0 -998 python
[7516987.983598] [59624] 0 59624 1859266 5161 548864 0 -998 odin-agent
[7516987.983600] [59625] 0 59625 707223 9248 983040 0 -998 filebeat
[7516987.983604] [59627] 0 59627 416433 64239 774144 0 -998 odin-log-agent
[7516987.983607] [59631] 0 59631 180671 15012 385024 0 -998 python3
[7516987.983612] [61396] 0 61396 791287 3189 352256 0 -998 client
[7516987.983615] [61641] 0 61641 1844642 29089 946176 0 -998 client
[7516987.983765] [ 9236] 0 9236 2642 467 53248 0 -998 php_scanner
[7516987.983911] [42898] 0 42898 15543 838 167936 0 -998 su
[7516987.983915] [42900] 1000 42900 3673 867 77824 0 -998 exec_script_vr2
[7516987.983918] [42925] 1000 42925 36475 19033 335872 0 -998 python
[7516987.983921] [57146] 1000 57146 3673 848 73728 0 -998 exec_script_J2p
[7516987.983925] [57195] 1000 57195 186359 22958 491520 0 -998 python2
[7516987.983928] [58376] 1000 58376 275764 14402 290816 0 -998 rosmaster
[7516987.983931] [58395] 1000 58395 155166 4449 245760 0 -998 rosout
[7516987.983935] [58406] 1000 58406 18285584 3967322 37101568 0 -998 data_sim
[7516987.984221] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3aa16c9482ae3a6f6b78bda68a55d32c87c99b985e0f11331cddf05af6c4d753,mems_allowed=0-1,oom_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184,task_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184/1f246a3eeea8f70bf91141eeaf1805346a666e225f823906485ea0b6c37dfc3d,task=pause,pid=5740,uid=0
[7516987.984254] Memory cgroup out of memory: Killed process 5740 (pause) total-vm:1028kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[7516988.092344] oom_reaper: reaped process 5740 (pause), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

We can find that the first scanned process 5740 (pause) was killed, but
its rss is only one page. That is because, when we calculate the oom
badness in oom_badness(), we always ignore the negtive point and convert
all of these negtive points to 1. Now as oom_score_adj of all the
processes in this targeted memcg have the same value -998, the points of
these processes are all negtive value. As a result, the first scanned
process will be killed.

The oom_socre_adj (-998) in this memcg is set by kubelet, because it is a
a Guaranteed pod, which has higher priority to prevent from being killed
by system oom.

To fix this issue, we should make the calculation of oom point more
accurate. We can achieve it by convert the chosen_point from 'unsigned
long' to 'long'.

[cai@lca.pw: reported a issue in the previous version]
[mhocko@suse.com: fixed the issue reported by Cai]
[mhocko@suse.com: add the comment in proc_oom_score()]
[laoar.shao@gmail.com: v3]
Link: http://lkml.kernel.org/r/1594396651-9931-1-git-send-email-laoar.shao@gmail.com

Signed-off-by: Yafang Shao
Signed-off-by: Andrew Morton
Tested-by: Naresh Kamboju
Acked-by: Michal Hocko
Cc: David Rientjes
Cc: Qian Cai
Link: http://lkml.kernel.org/r/1594309987-9919-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Linus Torvalds

Yafang Shao
2020-08-13 01:57:56 +0800
471e78cc7 /proc/PID/smaps: consistent whitespace output format ... Browse Code »

The keys in smaps output are padded to fixed width with spaces. All
except for THPeligible that uses tabs (only since commit c06306696f83
("mm: thp: fix false negative of shmem vma's THP eligibility")).

Unify the output formatting to save time debugging some naïve parsers.
(Part of the unification is also aligning FilePmdMapped with others.)

Signed-off-by: Michal Koutný
Signed-off-by: Andrew Morton
Acked-by: Yang Shi
Cc: Alexey Dobriyan
Cc: Matthew Wilcox
Link: http://lkml.kernel.org/r/20200728083207.17531-1-mkoutny@suse.com
Signed-off-by: Linus Torvalds

Michal Koutný
2020-08-13 01:57:56 +0800