Eric Lee / smarc-fsl-linux-kernel

17 Oct, 2019

1 commit

0ecee6699 fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()

After do_add_mount() returns success, the caller doesn't hold a
reference to the 'struct mount' anymore. So it's invalid to access it
in mnt_warn_timestamp_expiry().

Fix it by calling mnt_warn_timestamp_expiry() before do_add_mount()
rather than after, and adjusting the warning message accordingly.

Reported-by: syzbot+da4f525235510683d855@syzkaller.appspotmail.com
Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry")
Signed-off-by: Eric Biggers
Signed-off-by: Al Viro

Eric Biggers
2019-10-17 11:15:09 +0800

27 Sep, 2019

1 commit

cbafe18c7 Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- almost all of the rest of -mm

- various other subsystems

Subsystems affected by this patch series:
memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
cleanups, pagemap

* emailed patches from Andrew Morton: (77 commits)
arch/sparc/include/asm/pgtable_64.h: fix build
mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
ntfs: remove (un)?likely() from IS_ERR() conditions
IB/hfi1: remove unlikely() from IS_ERR*() condition
xfs: remove unlikely() from WARN_ON() condition
wimax/i2400m: remove unlikely() from WARN*() condition
fs: remove unlikely() from WARN_ON() condition
xen/events: remove unlikely() from WARN() condition
checkpatch: check for nested (un)?likely() calls
hexagon: drop empty and unused free_initrd_mem
mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
mm: introduce MADV_PAGEOUT
mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
mm: introduce MADV_COLD
mm: untag user pointers in mmap/munmap/mremap/brk
vfio/type1: untag user pointers in vaddr_get_pfn
tee/shm: untag user pointers in tee_shm_register
media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
drm/amdgpu: untag user pointers
...

Linus Torvalds
2019-09-27 01:29:42 +0800

26 Sep, 2019

2 commits

ed8a66b83 fs/namespace: untag user pointers in copy_mount_options

This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

In copy_mount_options a user address is being subtracted from TASK_SIZE.
If the address is lower than TASK_SIZE, the size is calculated to not
allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
However if the address is tagged, then the size will be calculated
incorrectly.

Untag the address before subtracting.
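
The size miscalculation described above can be sketched in plain C. This is
only an illustration under stated assumptions: SKETCH_TASK_SIZE,
SKETCH_PAGE_SIZE and both helper names are hypothetical, and sketch_untag()
is a simplified stand-in for the kernel's untagging helper (it just clears
the top byte).

```c
#include <stdint.h>

/* Hypothetical values for illustration; the real limits are per-architecture. */
#define SKETCH_TASK_SIZE (UINT64_C(1) << 48)
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Clear the top (tag) byte of a user address. Simplified stand-in only. */
static uint64_t sketch_untag(uint64_t addr)
{
    return addr & ~(UINT64_C(0xff) << 56);
}

/*
 * Number of bytes that may be copied starting at addr: the distance to
 * SKETCH_TASK_SIZE, capped at one page. The subtraction is unsigned, so a
 * tagged address (numerically above SKETCH_TASK_SIZE) wraps around to a
 * huge value and the cap hides the boundary.
 */
static uint64_t sketch_copy_size(uint64_t addr)
{
    uint64_t size = SKETCH_TASK_SIZE - addr;

    return size > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : size;
}
```

With an address 16 bytes below the boundary, sketch_copy_size() yields 16.
The same address with a tag in the top byte yields a full page, and
untagging first restores the correct result.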

Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Reviewed-by: Khalid Aziz
Reviewed-by: Vincenzo Frascino
Reviewed-by: Kees Cook
Reviewed-by: Catalin Marinas
Cc: Al Viro
Cc: Dave Hansen
Cc: Eric Auger
Cc: Felix Kuehling
Cc: Jens Wiklander
Cc: Mauro Carvalho Chehab
Cc: Mike Rapoport
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Konovalov
2019-09-26 08:51:41 +0800

7b1373dd6 Merge tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

- Continue separating the transport (user/kernel communication) and the
filesystem layers of fuse. Getting rid of most layering violations
will allow for easier cleanup and optimization later on.

- Prepare for the addition of the virtio-fs filesystem. The actual
filesystem will be introduced by a separate pull request.

- Convert to new mount API.

- Various fixes, optimizations and cleanups.

* tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits)
fuse: Make fuse_args_to_req static
fuse: fix memleak in cuse_channel_open
fuse: fix beyond-end-of-page access in fuse_parse_cache()
fuse: unexport fuse_put_request
fuse: kmemcg account fs data
fuse: on 64-bit store time in d_fsdata directly
fuse: fix missing unlock_page in fuse_writepage()
fuse: reserve byteswapped init opcodes
fuse: allow skipping control interface and forced unmount
fuse: dissociate DESTROY from fuseblk
fuse: delete dentry if timeout is zero
fuse: separate fuse device allocation and installation in fuse_conn
fuse: add fuse_iqueue_ops callbacks
fuse: extract fuse_fill_super_common()
fuse: export fuse_dequeue_forget() function
fuse: export fuse_get_unique()
fuse: export fuse_send_init_request()
fuse: export fuse_len_args()
fuse: export fuse_end_request()
fuse: fix request limit
...

Linus Torvalds
2019-09-26 00:55:59 +0800

20 Sep, 2019

1 commit

cfb82e1df Merge tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 vfs updates from Arnd Bergmann:
"Add inode timestamp clamping.

This series from Deepa Dinamani adds a per-superblock minimum/maximum
timestamp limit for a file system, and clamps timestamps as they are
written, to avoid random behavior from integer overflow as well as
having different time stamps on disk vs in memory.

At mount time, a warning is now printed for any file system that can
represent current timestamps but not future timestamps more than 30
years into the future, similar to the arbitrary 30 year limit that was
added to settimeofday().

This was picked as a compromise to warn users to migrate to other file
systems (e.g. ext4 instead of ext3) when they need the file system to
survive beyond 2038 (or similar limits in other file systems), but not
get in the way of normal usage"
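
A minimal sketch of the clamping idea, assuming a hypothetical
sketch_sb_limits structure that stands in for the per-superblock
minimum/maximum fields; it is not the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since the epoch. */
struct sketch_sb_limits {
    int64_t time_min;
    int64_t time_max;
};

/* Clamp a timestamp into the range the filesystem can represent on disk. */
static int64_t sketch_clamp_ts(const struct sketch_sb_limits *sb, int64_t secs)
{
    if (secs < sb->time_min)
        return sb->time_min;
    if (secs > sb->time_max)
        return sb->time_max;
    return secs;
}
```

Clamping at write time keeps the in-memory value equal to what ends up on
disk, instead of letting an out-of-range value overflow when it is stored.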

* tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
ext4: Reduce ext4 timestamp warnings
isofs: Initialize filesystem timestamp ranges
pstore: fs superblock limits
fs: omfs: Initialize filesystem timestamp ranges
fs: hpfs: Initialize filesystem timestamp ranges
fs: ceph: Initialize filesystem timestamp ranges
fs: sysv: Initialize filesystem timestamp ranges
fs: affs: Initialize filesystem timestamp ranges
fs: fat: Initialize filesystem timestamp ranges
fs: cifs: Initialize filesystem timestamp ranges
fs: nfs: Initialize filesystem timestamp ranges
ext4: Initialize timestamps limits
9p: Fill min and max timestamps in sb
fs: Fill in max and min timestamps in superblock
utimes: Clamp the timestamps before update
mount: Add mount warning for impending timestamp expiry
timestamp_truncate: Replace users of timespec64_trunc
vfs: Add timestamp_truncate() api
vfs: Add file timestamp range support

Linus Torvalds
2019-09-20 00:42:37 +0800

19 Sep, 2019

2 commits

d013cc800 Merge tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking updates from Jeff Layton:
"Just a couple of minor bugfixes, a revision to a tracepoint to account
for some earlier changes to the internals, and a patch to add a
pr_warn message when someone tries to mount a filesystem with '-o
mand' on a kernel that has that support disabled"

* tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: fix a memory leak bug in __break_lease()
locks: print a warning when mount fails due to lack of "mand" support
locks: Fix procfs output for file leases
locks: revise generic_add_lease tracepoint

Linus Torvalds
2019-09-19 04:41:01 +0800

53e5e7a7a Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs namei updates from Al Viro:
"Pathwalk-related stuff"

[ Audit-related cleanups, misc simplifications, and easier to follow
nd->root refcounts - Linus ]

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
devpts_pty_kill(): don't bother with d_delete()
infiniband: don't bother with d_delete()
hypfs: don't bother with d_delete()
fs/namei.c: keep track of nd->root refcount status
fs/namei.c: new helper - legitimize_root()
kill the last users of user_{path,lpath,path_dir}()
namei.h: get the comments on LOOKUP_... in sync with reality
kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h
audit_inode(): switch to passing AUDIT_INODE_...
filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there
filename_lookup(): audit_inode() argument is always 0

Linus Torvalds
2019-09-19 04:03:01 +0800

07 Sep, 2019

1 commit

c7eb68696 vfs: subtype handling moved to fuse

The unused vfs code can be removed. Don't pass empty subtype (same as if
->parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2). Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.

Signed-off-by: David Howells
Signed-off-by: Al Viro
Signed-off-by: Miklos Szeredi

David Howells
2019-09-07 03:28:49 +0800

31 Aug, 2019

1 commit

ce6595a28 kill the last users of user_{path,lpath,path_dir}()

old wrappers with few callers remaining; put them out of their misery...

Signed-off-by: Al Viro

Al Viro
2019-08-31 09:30:13 +0800

30 Aug, 2019

1 commit

f8b92ba67 mount: Add mount warning for impending timestamp expiry

The warning reuses the uptime max of 30 years used by
settimeofday().

Note that the warning is only emitted for writable filesystem mounts
through the mount syscall. Automounts do not have the same warning.

Print out the warning in human readable format using the struct tm.
After discussion with Arnd Bergmann, we chose to print only the year number.
The raw s_time_max is also displayed, and the user can easily decode
it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate
struct rtc_tm and struct tm just to print the date using a format specifier
as part of this series.
Given that the rtc_tm is not compiled on all architectures, this is not a
trivial patch. This can be added in the future.
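
The check and the year decoding described above can be sketched in userspace
C. The constant and both function names are hypothetical, and the 30-year
window is computed here as 30 * 365 days, which is an assumption about how
the limit is defined.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumed 30-year window in seconds (30 * 365 days). */
#define SKETCH_WARN_WINDOW_SECS (INT64_C(30) * 365 * 24 * 3600)

/*
 * True when the filesystem can still represent "now" but its maximum
 * timestamp lies less than the window away.
 */
static bool sketch_expires_within_window(int64_t now, int64_t time_max)
{
    return now <= time_max && time_max - now < SKETCH_WARN_WINDOW_SECS;
}

/* UTC calendar year of a seconds-since-epoch value, or -1 on failure. */
static int sketch_expiry_year(int64_t secs)
{
    time_t t = (time_t)secs;
    struct tm *tm = gmtime(&t);

    return tm ? tm->tm_year + 1900 : -1;
}
```

For a maximum of 0x7fffffff, sketch_expiry_year() gives 2038, matching what
the date command in the message prints.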

Signed-off-by: Deepa Dinamani
Acked-by: Jeff Layton

Deepa Dinamani
2019-08-30 22:27:17 +0800

17 Aug, 2019

1 commit

df2474a22 locks: print a warning when mount fails due to lack of "mand" support

Since 9e8925b67a ("locks: Allow disabling mandatory locking at compile
time"), attempts to mount filesystems with "-o mand" will fail.
Unfortunately, there is no other indication of the reason for the
failure.

Change how the function is defined for better readability. When
CONFIG_MANDATORY_FILE_LOCKING is disabled, printk a warning when
someone attempts to mount with -o mand.

Also, add a blurb to the mandatory-locking.txt file to explain about
the "mand" option, and the behavior one should expect when it is
disabled.

Reported-by: Jan Kara
Reviewed-by: Jan Kara
Signed-off-by: Jeff Layton

Jeff Layton
2019-08-17 00:13:48 +0800

26 Jul, 2019

1 commit

19a1c4092 fix the struct mount leak in umount_tree()

We need to drop everything we remove from the tree, whether
mnt_has_parent() is true or not. Usually the bug manifests as a slow
memory leak (leaked struct mount for initramfs); it becomes much more
visible in mount_subtree() users, such as btrfs. There we leak
a struct mount for btrfs superblock being mounted, which prevents
fs shutdown on subsequent umount.

Fixes: 56cbb429d911 ("switch the remnants of releasing the mountpoint away from fs_pin")
Reported-by: Nikolay Borisov
Tested-by: Nikolay Borisov
Signed-off-by: Al Viro

Al Viro
2019-07-26 19:59:06 +0800

22 Jul, 2019

1 commit

39145f5f0 filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there

user_path_mountpoint_at() always gets it and the reasons to have it
there (i.e. in umount(2)) apply to kern_path_mountpoint() callers
as well.

Signed-off-by: Al Viro

Al Viro
2019-07-22 06:24:45 +0800

21 Jul, 2019

1 commit

18253e034 Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dcache and mountpoint updates from Al Viro:
"Saner handling of refcounts to mountpoints.

Transfer the counting reference from struct mount ->mnt_mountpoint
over to struct mountpoint ->m_dentry. That allows us to get rid of the
convoluted games with ordering of mount shutdowns.

The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
mixed-filesystem shrink lists, which we'll also need for the Slab
Movable Objects patchset"

* 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch the remnants of releasing the mountpoint away from fs_pin
get rid of detach_mnt()
make struct mountpoint bear the dentry reference to mountpoint, not struct mount
Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
__detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
nfs: dget_parent() never returns NULL
ceph: don't open-code the check for dead lockref

Linus Torvalds
2019-07-21 00:15:51 +0800

20 Jul, 2019

1 commit

933a90bf4 Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:
"The first part of mount updates.

Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...

Linus Torvalds
2019-07-20 01:42:02 +0800

17 Jul, 2019

3 commits

56cbb429d switch the remnants of releasing the mountpoint away from fs_pin

We used to need rather convoluted ordering trickery to guarantee
that dput() of ex-mountpoints happens before the final mntput()
of the same. Since we don't need that anymore, there's no point
playing with fs_pin for that.

Signed-off-by: Al Viro

Al Viro
2019-07-17 10:52:37 +0800

2763d1191 get rid of detach_mnt()

Lift getting the original mount (dentry is actually not needed at all)
of the mountpoint into the callers - to do_move_mount() and pivot_root()
level. That simplifies the cleanup in those and allows to get saner
arguments for attach_mnt_recursive().

Signed-off-by: Al Viro

Al Viro
2019-07-17 10:50:11 +0800

4edbe133f make struct mountpoint bear the dentry reference to mountpoint, not struct mount

Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint
to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon
as struct mountpoint is destroyed; in cases where we are under namespace_sem
we use the global list, shrinking it in namespace_unlock(). In case of
detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local
list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone.

Signed-off-by: Al Viro

Al Viro
2019-07-17 10:43:40 +0800

05 Jul, 2019

5 commits

037f11b47 mnt_init(): call shmem_init() unconditionally

No point having two call sites (earlier in init_rootfs() from
mnt_init() in case we are going to use shmem-style rootfs,
later from do_basic_setup() unconditionally), along with the
logics in shmem_init() itself to make the second call a no-op...

Signed-off-by: Al Viro

Al Viro
2019-07-05 10:01:59 +0800

33488845f constify ksys_mount() string arguments

Signed-off-by: Al Viro

Al Viro
2019-07-05 10:01:59 +0800

fd3e007f6 don't bother with registering rootfs

init_mount_tree() can get to rootfs_fs_type directly and that simplifies
a lot of things. We don't need to register it, we don't need to look
it up *and* we don't need to bother with preventing subsequent userland
mounts. That's the way we should've done that from the very beginning.

There is a user-visible change, namely the disappearance of "rootfs"
from /proc/filesystems. Note that it's been unmountable all along
and it didn't show up in /proc/mounts; however, it *is* a user-visible
change and theoretically some script might've been using its presence
in /proc/filesystems to tell 2.4.11+ from earlier kernels.

*IF* any complaints about behaviour change do show up, we could fake
it in /proc/filesystems. I very much doubt we'll have to, though.

Signed-off-by: Al Viro

Al Viro
2019-07-05 10:01:59 +0800

e4e59906c fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()

make unhash_mnt() return the mountpoint to be dropped, let callers
deal with it.

Signed-off-by: Al Viro

Al Viro
2019-07-05 06:58:38 +0800

adc9b5c09 __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore

... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop")

Signed-off-by: Al Viro

Al Viro
2019-07-05 06:58:37 +0800

01 Jul, 2019

1 commit

570d7a98e vfs: move_mount: reject moving kernel internal mounts

sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL,
a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a
kernel object with an internal mount, such as a pipe or memfd.

Fix it by checking for this case and returning -EINVAL.

[AV: what we want is is_mounted(); use that instead of making the
condition even more convoluted]

Reproducer:

#include <unistd.h>

#define __NR_move_mount 429
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

int main()
{
int fds[2];

pipe(fds);
syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
}
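
The guard described in this message can be modelled in userspace C. This is
a sketch under stated assumptions: every sketch_ name is hypothetical, and
the structures are reduced to the single field that matters here. It shows
why a namespace pointer that is really an encoded errno must be rejected
before anything dereferences it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

struct sketch_mnt_ns { int id; };
struct sketch_mount { struct sketch_mnt_ns *mnt_ns; };

/* Encode a negative errno as a pointer value near the top of the address space. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True for NULL and for pointer values that are really encoded errnos. */
static bool sketch_is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* A mount is usable only if its namespace pointer is a real pointer. */
static bool sketch_is_mounted(const struct sketch_mount *m)
{
    return !sketch_is_err_or_null(m->mnt_ns);
}

/* Reject detached and internal mounts before touching m->mnt_ns. */
static int sketch_check_move_source(const struct sketch_mount *m)
{
    return sketch_is_mounted(m) ? 0 : -EINVAL;
}
```

An internal mount, modelled here with sketch_err_ptr(-EINVAL) as its
namespace pointer, is rejected with -EINVAL instead of being dereferenced.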

Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com
Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Eric Biggers
Signed-off-by: Al Viro

Eric Biggers
2019-07-01 22:46:36 +0800

18 Jun, 2019

2 commits

d728cf791 fs/namespace: fix unprivileged mount propagation

When propagating mounts across mount namespaces owned by different user
namespaces it is not possible anymore to move or umount the mount in the
less privileged mount namespace.

Here is a reproducer:

sudo mount -t tmpfs tmpfs /mnt
sudo mount --make-rshared /mnt

# create unprivileged user + mount namespace and preserve propagation
unshare -U -m --map-root --propagation=unchanged

# now change back to the original mount namespace in another terminal:
sudo mkdir /mnt/aaa
sudo mount -t tmpfs tmpfs /mnt/aaa

# now in the unprivileged user + mount namespace
mount --move /mnt/aaa /opt

Unfortunately, this is a pretty big deal for userspace since this is
e.g. used to inject mounts into running unprivileged containers.
So this regression really needs to go away rather quickly.

The problem is that a recent change falsely locked the root of the newly
added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
on copy_mnt_ns() and not when adding a new mount.

Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies")
Cc: Linus Torvalds
Cc: Al Viro
Cc:
Tested-by: Christian Brauner
Acked-by: Christian Brauner
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Christian Brauner
Signed-off-by: Al Viro

Christian Brauner
2019-06-18 05:36:09 +0800

1b0b9cc8d vfs: fsmount: add missing mntget()

sys_fsmount() needs to take a reference to the new mount when adding it
to the anonymous mount namespace. Otherwise the filesystem can be
unmounted while it's still in use, as found by syzkaller.

Reported-by: Mark Rutland
Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Signed-off-by: Eric Biggers
Signed-off-by: Al Viro

Eric Biggers
2019-06-18 05:36:07 +0800

31 May, 2019

1 commit

59bd9ded4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209

Based on 1 normalized pattern(s):

released under gpl v2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 15 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Steve Winslow
Reviewed-by: Allison Randal
Reviewed-by: Alexios Zavras
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171438.895196075@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-31 02:29:53 +0800

26 May, 2019

1 commit

c3aabf078 move mount_capable() further out

Call graph of vfs_get_tree():
    vfs_fsconfig_locked()               # neither kernmount, nor submount
    do_new_mount()                      # neither kernmount, nor submount
    fc_mount()
        afs_mntpt_do_automount()        # submount
        mount_one_hugetlbfs()           # kernmount
        pid_ns_prepare_proc()           # kernmount
        mq_create_mount()               # kernmount
        vfs_kern_mount()
            simple_pin_fs()             # kernmount
            vfs_submount()              # submount
            kern_mount()                # kernmount
            init_mount_tree()
            btrfs_mount()
            nfs_do_root_mount()

The first two need the check (unconditionally).
init_mount_tree() is setting rootfs up; any capability
checks make zero sense for that one. And btrfs_mount()/
nfs_do_root_mount() have the checks already done in their
callers.

IOW, we can shift mount_capable() handling into
the two callers - one in the normal case of mount(2),
another - in fsconfig(2) handling of FSCONFIG_CMD_CREATE.
I.e. the syscalls that set a new filesystem up.

Signed-off-by: Al Viro

Al Viro
2019-05-26 06:00:02 +0800

09 May, 2019

1 commit

05883eee8 do_move_mount(): fix an unsafe use of is_anon_ns()

What triggers it is a race between mount --move and umount -l
of the source; we should reject it (the source is parentless *and*
not the root of anon namespace at that), but the check for namespace
being an anon one is broken in that case - is_anon_ns() needs
ns to be non-NULL. Better fixed here than in is_anon_ns(), since
the rest of the callers is guaranteed to get a non-NULL argument...

Reported-by: syzbot+494c7ddf66acac0ad747@syzkaller.appspotmail.com
Signed-off-by: Al Viro

Al Viro
2019-05-09 14:32:50 +0800

21 Mar, 2019

4 commits

93766fbd2 vfs: syscall: Add fsmount() to create a mount for a superblock

Provide a system call by which a filesystem opened with fsopen() and
configured by a series of fsconfig() calls can have a detached mount object
created for it. This mount object can then be attached to the VFS mount
hierarchy using move_mount() by passing the returned file descriptor as the
from directory fd.

The system call looks like:

int mfd = fsmount(int fsfd, unsigned int flags,
unsigned int attr_flags);

where fsfd is the file descriptor returned by fsopen(). flags can be 0 or
FSMOUNT_CLOEXEC. attr_flags is a bitwise-OR of the following flags:

MOUNT_ATTR_RDONLY         Mount read-only
MOUNT_ATTR_NOSUID         Ignore suid and sgid bits
MOUNT_ATTR_NODEV          Disallow access to device special files
MOUNT_ATTR_NOEXEC         Disallow program execution
MOUNT_ATTR__ATIME         Setting on how atime should be updated
MOUNT_ATTR_RELATIME       - Update atime relative to mtime/ctime
MOUNT_ATTR_NOATIME        - Do not update access times
MOUNT_ATTR_STRICTATIME    - Always perform atime updates
MOUNT_ATTR_NODIRATIME     Do not update directory access times

In the event that fsmount() fails, it may be possible to get an error
message by calling read() on fsfd. If no message is available, ENODATA
will be reported.
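
The atime setting is a multi-bit field inside attr_flags rather than three
independent bits. A minimal sketch of composing and validating the flags
follows. The SKETCH_ constants are assumptions that mirror the values
recalled from the uapi header and should be checked against linux/mount.h;
the helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed values; verify against the installed linux/mount.h. */
#define SKETCH_ATTR_RDONLY      0x00000001u
#define SKETCH_ATTR_NOSUID      0x00000002u
#define SKETCH_ATTR_NODEV       0x00000004u
#define SKETCH_ATTR_NOEXEC      0x00000008u
#define SKETCH_ATTR__ATIME      0x00000070u /* mask for the atime field */
#define SKETCH_ATTR_RELATIME    0x00000000u
#define SKETCH_ATTR_NOATIME     0x00000010u
#define SKETCH_ATTR_STRICTATIME 0x00000020u
#define SKETCH_ATTR_NODIRATIME  0x00000080u

/* Extract the atime mode from a combined attribute word. */
static unsigned int sketch_atime_mode(unsigned int attr_flags)
{
    return attr_flags & SKETCH_ATTR__ATIME;
}

/* Exactly one of the three atime modes may be selected. */
static bool sketch_atime_valid(unsigned int attr_flags)
{
    switch (sketch_atime_mode(attr_flags)) {
    case SKETCH_ATTR_RELATIME:
    case SKETCH_ATTR_NOATIME:
    case SKETCH_ATTR_STRICTATIME:
        return true;
    default:
        return false;
    }
}
```

Because MOUNT_ATTR_RELATIME is the zero value of the field, leaving the atime
bits unset selects relatime, and OR-ing two atime modes together produces a
value that matches none of them.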

Signed-off-by: David Howells
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro

David Howells
2019-03-21 06:49:06 +0800

44dfd84a6 teach move_mount(2) to work with OPEN_TREE_CLONE

Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be
attached by move_mount(2).

If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is
not detached anymore, it won't be dissolved. move_mount(2) is adjusted
to handle detached source.

That gives us equivalents of mount --bind and mount --rbind.

Thanks also to Alan Jenkins for
providing a whole bunch of ways to break things using this interface.

Signed-off-by: Al Viro
Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2019-03-21 06:49:06 +0800

2db154b3e vfs: syscall: Add move_mount(2) to move mounts around

Add a move_mount() system call that will move a mount from one place to
another and, in the next commit, allow to attach an unattached mount tree.

The new system call looks like the following:

int move_mount(int from_dfd, const char *from_path,
int to_dfd, const char *to_path,
unsigned int flags);

Signed-off-by: David Howells
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro

David Howells
2019-03-21 06:49:06 +0800

a07b20004 vfs: syscall: Add open_tree(2) to reference or clone a mount

open_tree(dfd, pathname, flags)

Returns an O_PATH-opened file descriptor or an error.
dfd and pathname specify the location to open, in usual
fashion (see e.g. fstatat(2)). flags should be an OR of
some of the following:
* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
same meanings as usual
* OPEN_TREE_CLOEXEC - make the resulting descriptor
close-on-exec
* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
instead of opening the location in question, create a detached
mount tree matching the subtree rooted at location specified by
dfd/pathname. With AT_RECURSIVE the entire subtree is cloned,
without it - only the part within the mount containing the
location in question. In other words, the same as mount --rbind
or mount --bind would've taken. The detached tree will be
dissolved on the final close of obtained file. Creation of such
detached trees requires the same capabilities as doing mount --bind.

Signed-off-by: Al Viro
Signed-off-by: David Howells
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2019-03-21 06:49:06 +0800

13 Mar, 2019

1 commit

7b47a9e7c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount infrastructure updates from Al Viro:
"The rest of core infrastructure; no new syscalls in that pile, but the
old parts are switched to new infrastructure. At that point
conversions of individual filesystems can happen independently; some
are done here (afs, cgroup, procfs, etc.), there's also a large series
outside of that pile dealing with NFS (quite a bit of option-parsing
stuff is getting used there - it's one of the most convoluted
filesystems in terms of mount-related logics), but NFS bits are the
next cycle fodder.

It got seriously simplified since the last cycle; documentation is
probably the weakest bit at the moment - I considered dropping the
commit introducing Documentation/filesystems/mount_api.txt (cutting
the size increase by quarter ;-), but decided that it would be better
to fix it up after -rc1 instead.

That pile allows to do followup work in independent branches, which
should make life much easier for the next cycle. fs/super.c size
increase is unpleasant; there's a followup series that allows to
shrink it considerably, but I decided to leave that until the next
cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
afs: Use fs_context to pass parameters over automount
afs: Add fs_context support
vfs: Add some logging to the core users of the fs_context log
vfs: Implement logging through fs_context
vfs: Provide documentation for new mount API
vfs: Remove kern_mount_data()
hugetlbfs: Convert to fs_context
cpuset: Use fs_context
kernfs, sysfs, cgroup, intel_rdt: Support fs_context
cgroup: store a reference to cgroup_ns into cgroup_fs_context
cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
cgroup_do_mount(): massage calling conventions
cgroup: stash cgroup_root reference into cgroup_fs_context
cgroup2: switch to option-by-option parsing
cgroup1: switch to option-by-option parsing
cgroup: take options parsing into ->parse_monolithic()
cgroup: fold cgroup1_mount() into cgroup1_get_tree()
cgroup: start switching to fs_context
ipc: Convert mqueue fs to fs_context
proc: Add fs_context support to procfs
...

Linus Torvalds
2019-03-13 05:08:19 +0800

08 Mar, 2019

1 commit

be37f21a0 Merge tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"A lucky 13 audit patches for v5.1.

Despite the rather large diffstat, most of the changes are from two
bug fix patches that move code from one Kconfig option to another.

Beyond that bit of churn, the remaining changes are largely cleanups
and bug-fixes as we slowly march towards container auditing. It isn't
all boring though, we do have a couple of new things: file
capabilities v3 support, and expanded support for filtering on
filesystems to solve problems with remote filesystems.

All changes pass the audit-testsuite. Please merge for v5.1"

* tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: mark expected switch fall-through
audit: hide auditsc_get_stamp and audit_serial prototypes
audit: join tty records to their syscall
audit: remove audit_context when CONFIG_AUDIT and not AUDITSYSCALL
audit: remove unused actx param from audit_rule_match
audit: ignore fcaps on umount
audit: clean up AUDITSYSCALL prototypes and stubs
audit: more filter PATH records keyed on filesystem magic
audit: add support for fcaps v3
audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
audit: add syscall information to CONFIG_CHANGE records
audit: hand taken context to audit_kill_trees for syscall logging
audit: give a clue what CONFIG_CHANGE op was involved

Linus Torvalds
2019-03-08 04:20:11 +0800

05 Mar, 2019

1 commit

4f9020ffd Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"Assorted fixes that sat in -next for a while, all over the place"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: Fix locking in aio_poll()
exec: Fix mem leak in kernel_read_file
copy_mount_string: Limit string length to PATH_MAX
cgroup: saner refcounting for cgroup_root
fix cgroup_do_mount() handling of failure exits

Linus Torvalds
2019-03-05 05:24:27 +0800

28 Feb, 2019

2 commits

d911b4585 vfs: Remove kern_mount_data()

The kern_mount_data() isn't used any more so remove it.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2019-02-28 16:29:36 +0800

3e1aeb00e vfs: Implement a filesystem superblock creation/configuration context

[AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living
mounts]
[AV - reordering fs/namespace.c is badly overdue, but let's keep it
separate from that series]
[AV - drop simple_pin_fs() change]
[AV - clean vfs_kern_mount() failure exits up]

Implement a filesystem context concept to be used during superblock
creation for mount and superblock reconfiguration for remount.

The mounting procedure then becomes:

(1) Allocate new fs_context context.

(2) Configure the context.

(3) Create superblock.

(4) Query the superblock.

(5) Create a mount for the superblock.

(6) Destroy the context.

Rather than calling fs_type->mount(), an fs_context struct is created and
fs_type->init_fs_context() is called to set it up. Pointers exist for the
filesystem and LSM to hang their private data off.

A set of operations has to be set by ->init_fs_context() to provide
freeing, duplication, option parsing, binary data parsing, validation,
mounting and superblock filling.

Legacy filesystems are supported by the provision of a set of legacy
fs_context operations that build up a list of mount options and then invoke
fs_type->mount() from within the fs_context ->get_tree() operation. This
allows all filesystems to be accessed using fs_context.

It should be noted that, whilst this patch adds a lot of lines of code,
there is quite a bit of duplication with existing code that can be
eliminated should all filesystems be converted over.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2019-02-28 16:29:26 +0800

26 Feb, 2019

1 commit

53a41cb7e Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses"

This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.

It was well-intentioned, but wrong. Overriding the exception tables for
instructions for random reasons is just wrong, and that is what the new
code did.

It caused problems for tracing, and it caused problems for strncpy_from_user(),
because the new checks made perfectly valid use cases break, rather than
catch things that did bad things.

Unchecked user space accesses are a problem, but that's not a reason to
add invalid checks that then people have to work around with silly flags
(in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
odd way to say "this commit was wrong" and was sprinkled into random
places to hide the wrongness).

The real fix to unchecked user space accesses is to get rid of the
special "let's not check __get_user() and __put_user() at all" logic.
Make __{get|put}_user() be just aliases to the regular {get|put}_user()
functions, and make it impossible to access user space without having
the proper checks in places.

The raison d'être of the special double-underscore versions used to be
that the range check was expensive, and if you did multiple user
accesses, you'd do the range check up front (like the signal frame
handling code, for example). But SMAP (on x86) and PAN (on ARM) have
made that optimization pointless, because the _real_ expense is the "set
CPU flag to allow user space access".

So let's not break the valid cases to catch invalid cases that shouldn't
even exist.

Cc: Thomas Gleixner
Cc: Kees Cook
Cc: Tobin C. Harding
Cc: Borislav Petkov
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Jann Horn
Signed-off-by: Linus Torvalds

Linus Torvalds
2019-02-26 01:10:51 +0800

01 Feb, 2019

1 commit

fbdb44013 copy_mount_string: Limit string length to PATH_MAX

On ppc64le, when a string with PAGE_SIZE - 1 (i.e. 64k-1) length is
passed as a "filesystem type" argument to the mount(2) syscall,
copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le)
worth of space for holding the string in kernel's address space.

Later, in set_precision() (invoked by get_fs_type() ->
__request_module() -> vsnprintf()), we end up assigning
strlen(fs-type-string) i.e. 65535 as the
value to 'struct printf_spec'->precision member. This field has a width
of 16 bits and it is a signed data type. Hence an invalid value ends
up getting assigned. This causes the "WARN_ONCE(spec->precision != prec,
"precision %d too large", prec)" statement inside set_precision() to be
executed.

This commit fixes the bug by limiting the length of the string passed by
copy_mount_string() to strndup_user() to PATH_MAX.
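
The arithmetic behind the warning can be sketched as follows. The names are
hypothetical, SKETCH_PATH_MAX is assumed to be 4096 (the usual Linux value),
and the range test is a portable stand-in for the "store, then compare"
check that the message quotes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit; Linux defines PATH_MAX as 4096. */
#define SKETCH_PATH_MAX 4096

/* True if prec survives being stored in a signed 16-bit field. */
static bool sketch_precision_fits(int prec)
{
    return prec >= INT16_MIN && prec <= INT16_MAX;
}

/* Cap the number of bytes accepted for a mount string. */
static size_t sketch_limit_len(size_t len)
{
    return len > SKETCH_PATH_MAX ? SKETCH_PATH_MAX : len;
}
```

A 65535-byte string gives a precision that does not fit (the signed 16-bit
maximum is 32767), whereas any length capped at 4096 does.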

Signed-off-by: Chandan Rajendra
Reported-by: Abdul Haleem
Suggested-by: Al Viro
Signed-off-by: Al Viro

Chandan Rajendra
2019-02-01 14:57:33 +0800