17 Oct, 2019

1 commit

  • After do_add_mount() returns success, the caller doesn't hold a
    reference to the 'struct mount' anymore. So it's invalid to access it
    in mnt_warn_timestamp_expiry().

    Fix it by calling mnt_warn_timestamp_expiry() before do_add_mount()
    rather than after, and adjusting the warning message accordingly.

    Reported-by: syzbot+da4f525235510683d855@syzkaller.appspotmail.com
    Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry")
    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
     

27 Sep, 2019

1 commit

  • Merge more updates from Andrew Morton:

    - almost all of the rest of -mm

    - various other subsystems

    Subsystems affected by this patch series:
    memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
    cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
    cleanups, pagemap

    * emailed patches from Andrew Morton: (77 commits)
    arch/sparc/include/asm/pgtable_64.h: fix build
    mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
    ntfs: remove (un)?likely() from IS_ERR() conditions
    IB/hfi1: remove unlikely() from IS_ERR*() condition
    xfs: remove unlikely() from WARN_ON() condition
    wimax/i2400m: remove unlikely() from WARN*() condition
    fs: remove unlikely() from WARN_ON() condition
    xen/events: remove unlikely() from WARN() condition
    checkpatch: check for nested (un)?likely() calls
    hexagon: drop empty and unused free_initrd_mem
    mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
    mm: introduce MADV_PAGEOUT
    mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
    mm: introduce MADV_COLD
    mm: untag user pointers in mmap/munmap/mremap/brk
    vfio/type1: untag user pointers in vaddr_get_pfn
    tee/shm: untag user pointers in tee_shm_register
    media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
    drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
    drm/amdgpu: untag user pointers
    ...

    Linus Torvalds
     

26 Sep, 2019

2 commits

  • This patch is part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    In copy_mount_options a user address is being subtracted from TASK_SIZE.
    If the address is lower than TASK_SIZE, the size is calculated to not
    allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
    However if the address is tagged, then the size will be calculated
    incorrectly.

    Untag the address before subtracting.
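
    The size calculation described above can be sketched in userspace
    roughly as follows. This is not the kernel code: the constants and the
    sketch_untag() helper are assumptions made for illustration, and
    clearing the top byte is a simplification of what untagged_addr() does.

```c
#include <stdint.h>

/* Illustrative values only; the real TASK_SIZE and PAGE_SIZE are per-architecture. */
#define SKETCH_TASK_SIZE (1ULL << 48)
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for untagged_addr(): drop the tag held in the top byte. */
uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~(0xffULL << 56);
}

/*
 * Bytes that may be copied starting at addr without crossing
 * SKETCH_TASK_SIZE, capped at one page. The untagged address is
 * assumed to lie below SKETCH_TASK_SIZE.
 */
uint64_t sketch_copy_size(uint64_t addr)
{
	uint64_t size = SKETCH_TASK_SIZE - sketch_untag(addr);

	if (size > SKETCH_PAGE_SIZE)
		size = SKETCH_PAGE_SIZE;
	return size;
}
```

    Without the untag step, the subtraction for a tagged address would wrap
    around and the cap would always yield a full page, even right below the
    boundary.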

    Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Khalid Aziz
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Kees Cook
    Reviewed-by: Catalin Marinas
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • Pull fuse updates from Miklos Szeredi:

    - Continue separating the transport (user/kernel communication) and the
    filesystem layers of fuse. Getting rid of most layering violations
    will allow for easier cleanup and optimization later on.

    - Prepare for the addition of the virtio-fs filesystem. The actual
    filesystem will be introduced by a separate pull request.

    - Convert to new mount API.

    - Various fixes, optimizations and cleanups.

    * tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits)
    fuse: Make fuse_args_to_req static
    fuse: fix memleak in cuse_channel_open
    fuse: fix beyond-end-of-page access in fuse_parse_cache()
    fuse: unexport fuse_put_request
    fuse: kmemcg account fs data
    fuse: on 64-bit store time in d_fsdata directly
    fuse: fix missing unlock_page in fuse_writepage()
    fuse: reserve byteswapped init opcodes
    fuse: allow skipping control interface and forced unmount
    fuse: dissociate DESTROY from fuseblk
    fuse: delete dentry if timeout is zero
    fuse: separate fuse device allocation and installation in fuse_conn
    fuse: add fuse_iqueue_ops callbacks
    fuse: extract fuse_fill_super_common()
    fuse: export fuse_dequeue_forget() function
    fuse: export fuse_get_unique()
    fuse: export fuse_send_init_request()
    fuse: export fuse_len_args()
    fuse: export fuse_end_request()
    fuse: fix request limit
    ...

    Linus Torvalds
     

20 Sep, 2019

1 commit

  • Pull y2038 vfs updates from Arnd Bergmann:
    "Add inode timestamp clamping.

    This series from Deepa Dinamani adds a per-superblock minimum/maximum
    timestamp limit for a file system, and clamps timestamps as they are
    written, to avoid random behavior from integer overflow as well as
    having different time stamps on disk vs in memory.

    At mount time, a warning is now printed for any file system that can
    represent current timestamps but not future timestamps more than 30
    years into the future, similar to the arbitrary 30 year limit that was
    added to settimeofday().

    This was picked as a compromise to warn users to migrate to other file
    systems (e.g. ext4 instead of ext3) when they need the file system to
    survive beyond 2038 (or similar limits in other file systems), but not
    get in the way of normal usage"
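
    As a minimal sketch of what clamping to a per-superblock range means,
    assuming second-granularity values (the function name is hypothetical
    and this is not the kernel helper):

```c
#include <stdint.h>

/* Force sec into the inclusive range [time_min, time_max]. */
int64_t sketch_clamp_seconds(int64_t sec, int64_t time_min, int64_t time_max)
{
	if (sec < time_min)
		return time_min;
	if (sec > time_max)
		return time_max;
	return sec;
}
```

    A file system limited to signed 32-bit seconds would pass INT32_MIN and
    INT32_MAX as the range, so a timestamp past 2038 is stored as the
    maximum instead of overflowing.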

    * tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    ext4: Reduce ext4 timestamp warnings
    isofs: Initialize filesystem timestamp ranges
    pstore: fs superblock limits
    fs: omfs: Initialize filesystem timestamp ranges
    fs: hpfs: Initialize filesystem timestamp ranges
    fs: ceph: Initialize filesystem timestamp ranges
    fs: sysv: Initialize filesystem timestamp ranges
    fs: affs: Initialize filesystem timestamp ranges
    fs: fat: Initialize filesystem timestamp ranges
    fs: cifs: Initialize filesystem timestamp ranges
    fs: nfs: Initialize filesystem timestamp ranges
    ext4: Initialize timestamps limits
    9p: Fill min and max timestamps in sb
    fs: Fill in max and min timestamps in superblock
    utimes: Clamp the timestamps before update
    mount: Add mount warning for impending timestamp expiry
    timestamp_truncate: Replace users of timespec64_trunc
    vfs: Add timestamp_truncate() api
    vfs: Add file timestamp range support

    Linus Torvalds
     

19 Sep, 2019

2 commits

  • Pull file locking updates from Jeff Layton:
    "Just a couple of minor bugfixes, a revision to a tracepoint to account
    for some earlier changes to the internals, and a patch to add a
    pr_warn message when someone tries to mount a filesystem with '-o
    mand' on a kernel that has that support disabled"

    * tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    locks: fix a memory leak bug in __break_lease()
    locks: print a warning when mount fails due to lack of "mand" support
    locks: Fix procfs output for file leases
    locks: revise generic_add_lease tracepoint

    Linus Torvalds
     
  • Pull vfs namei updates from Al Viro:
    "Pathwalk-related stuff"

    [ Audit-related cleanups, misc simplifications, and easier to follow
    nd->root refcounts - Linus ]

    * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    devpts_pty_kill(): don't bother with d_delete()
    infiniband: don't bother with d_delete()
    hypfs: don't bother with d_delete()
    fs/namei.c: keep track of nd->root refcount status
    fs/namei.c: new helper - legitimize_root()
    kill the last users of user_{path,lpath,path_dir}()
    namei.h: get the comments on LOOKUP_... in sync with reality
    kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h
    audit_inode(): switch to passing AUDIT_INODE_...
    filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there
    filename_lookup(): audit_inode() argument is always 0

    Linus Torvalds
     

07 Sep, 2019

1 commit

  • The unused vfs code can be removed. Don't pass empty subtype (same as if
    ->parse callback isn't called).

    The bits that are left involve determining whether it's permitted to split the
    filesystem type string passed in to mount(2). Consequently, this means that we
    cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
    with a dot in it always indicates a subtype specification.
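
    The splitting rule described above can be illustrated with a small
    userspace sketch. The function name and the has_subtype_flag parameter
    are hypothetical stand-ins for the FS_HAS_SUBTYPE check, and an empty
    subtype is simply treated as absent here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the subtype inside fstype, or NULL when there is
 * none: splitting is not permitted, there is no dot, or nothing follows
 * the dot.
 */
const char *sketch_subtype(const char *fstype, int has_subtype_flag)
{
	const char *dot;

	if (!has_subtype_flag)
		return NULL;
	dot = strchr(fstype, '.');
	if (!dot || dot[1] == '\0')
		return NULL;
	return dot + 1;
}
```

    For a type string such as "fuse.sshfs", the subtype is "sshfs" only
    when the filesystem type allows splitting; otherwise the whole string
    is the type name.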

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Signed-off-by: Miklos Szeredi

    David Howells
     

31 Aug, 2019

1 commit


30 Aug, 2019

1 commit

  • The warning reuses the uptime max of 30 years used by
    settimeofday().

    Note that the warning is only emitted for writable filesystem mounts
    through the mount syscall. Automounts do not have the same warning.

    Print out the warning in human readable format using the struct tm.
    After discussion with Arnd Bergmann, we chose to print only the year number.
    The raw s_time_max is also displayed, and the user can easily decode
    it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate
    struct rtc_tm and struct tm just to print the date using a format specifier
    as part of this series.
    Given that the rtc_tm is not compiled on all architectures, this is not a
    trivial patch. This can be added in the future.
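
    A userspace sketch of the check and of decoding the year follows. The
    names are hypothetical, and the 30-year constant is assumed to be
    counted in 365-day years; the kernel's exact definition may differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* 30 years in seconds, ignoring leap days (an assumption of this sketch). */
#define SKETCH_UPTIME_SEC_MAX (30LL * 365 * 24 * 3600)

/* True when s_time_max falls within the next 30 years of now. */
bool sketch_should_warn(int64_t now, int64_t s_time_max)
{
	return now + SKETCH_UPTIME_SEC_MAX > s_time_max;
}

/* Year (UTC) in which s_time_max falls, or -1 if it cannot be converted. */
int sketch_expiry_year(int64_t s_time_max)
{
	time_t t = (time_t)s_time_max;
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

    With s_time_max set to 0x7fffffff, sketch_expiry_year() yields 2038,
    matching what the date command above prints.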

    Signed-off-by: Deepa Dinamani
    Acked-by: Jeff Layton

    Deepa Dinamani
     

17 Aug, 2019

1 commit

  • Since 9e8925b67a ("locks: Allow disabling mandatory locking at compile
    time"), attempts to mount filesystems with "-o mand" will fail.
    Unfortunately, there is no other indication of the reason for the
    failure.

    Change how the function is defined for better readability. When
    CONFIG_MANDATORY_FILE_LOCKING is disabled, printk a warning when
    someone attempts to mount with -o mand.
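
    The shape of such a compile-time split can be sketched in userspace as
    below. SKETCH_MANDATORY_FILE_LOCKING, the function name, the is_admin
    parameter and the message wording are all assumptions standing in for
    the kernel's config option, helper, capability check and printk.

```c
#include <stdbool.h>
#include <stdio.h>

/* Define SKETCH_MANDATORY_FILE_LOCKING to model a build with the option enabled. */
#ifdef SKETCH_MANDATORY_FILE_LOCKING
bool sketch_may_mandlock(bool is_admin)
{
	/* Supported: only privileged callers may use "mand". */
	return is_admin;
}
#else
bool sketch_may_mandlock(bool is_admin)
{
	(void)is_admin;
	/* Not supported: say so, instead of failing silently. */
	fprintf(stderr, "VFS: \"mand\" mount option not supported\n");
	return false;
}
#endif
```

    Keeping one definition per configuration, rather than an #ifdef inside
    a single function body, is one way to get the readability the commit
    aims for.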

    Also, add a blurb to the mandatory-locking.txt file to explain about
    the "mand" option, and the behavior one should expect when it is
    disabled.

    Reported-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Jeff Layton

    Jeff Layton
     

26 Jul, 2019

1 commit

  • We need to drop everything we remove from the tree, whether
    mnt_has_parent() is true or not. Usually the bug manifests as a slow
    memory leak (leaked struct mount for initramfs); it becomes much more
    visible in mount_subtree() users, such as btrfs. There we leak
    a struct mount for btrfs superblock being mounted, which prevents
    fs shutdown on subsequent umount.

    Fixes: 56cbb429d911 ("switch the remnants of releasing the mountpoint away from fs_pin")
    Reported-by: Nikolay Borisov
    Tested-by: Nikolay Borisov
    Signed-off-by: Al Viro

    Al Viro
     

22 Jul, 2019

1 commit


21 Jul, 2019

1 commit

  • Pull dcache and mountpoint updates from Al Viro:
    "Saner handling of refcounts to mountpoints.

    Transfer the counting reference from struct mount ->mnt_mountpoint
    over to struct mountpoint ->m_dentry. That allows us to get rid of the
    convoluted games with ordering of mount shutdowns.

    The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
    mixed-filesystem shrink lists, which we'll also need for the Slab
    Movable Objects patchset"

    * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the remnants of releasing the mountpoint away from fs_pin
    get rid of detach_mnt()
    make struct mountpoint bear the dentry reference to mountpoint, not struct mount
    Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
    fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
    __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
    nfs: dget_parent() never returns NULL
    ceph: don't open-code the check for dead lockref

    Linus Torvalds
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

17 Jul, 2019

3 commits

  • We used to need rather convoluted ordering trickery to guarantee
    that dput() of ex-mountpoints happens before the final mntput()
    of the same. Since we don't need that anymore, there's no point
    playing with fs_pin for that.

    Signed-off-by: Al Viro

    Al Viro
     
  • Lift getting the original mount (dentry is actually not needed at all)
    of the mountpoint into the callers - to do_move_mount() and pivot_root()
    level. That simplifies the cleanup in those and allows to get saner
    arguments for attach_mnt_recursive().

    Signed-off-by: Al Viro

    Al Viro
     
  • Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint
    to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon
    as struct mountpoint is destroyed; in cases where we are under namespace_sem
    we use the global list, shrinking it in namespace_unlock(). In case of
    detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local
    list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone.

    Signed-off-by: Al Viro

    Al Viro
     

05 Jul, 2019

5 commits


01 Jul, 2019

1 commit

  • sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL,
    a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a
    kernel object with an internal mount, such as a pipe or memfd.

    Fix it by checking for this case and returning -EINVAL.

    [AV: what we want is is_mounted(); use that instead of making the
    condition even more convoluted]

    Reproducer:

    #include <unistd.h>

    #define __NR_move_mount 429
    #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

    int main()
    {
        int fds[2];

        pipe(fds);
        syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
    }
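
    Why dereferencing MNT_NS_INTERNAL crashes, and what an is_mounted()
    style check guards against, can be shown with a userspace imitation of
    the error-pointer convention. All sketch_ names and the struct are
    hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error pointers occupy the last SKETCH_MAX_ERRNO addresses (assumed 4095). */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

bool sketch_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Toy mount: only the namespace pointer matters for this check. */
struct sketch_mount {
	void *mnt_ns;
};

/* A mount counts as mounted only if its namespace pointer is a real pointer. */
bool sketch_is_mounted(const struct sketch_mount *m)
{
	return !sketch_is_err_or_null(m->mnt_ns);
}
```

    An internal mount, whose namespace field holds an encoded -EINVAL, is
    rejected by the check before anything tries to follow the pointer.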

    Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com
    Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
     

18 Jun, 2019

2 commits

  • When propagating mounts across mount namespaces owned by different user
    namespaces it is not possible anymore to move or umount the mount in the
    less privileged mount namespace.

    Here is a reproducer:

    sudo mount -t tmpfs tmpfs /mnt
    sudo mount --make-rshared /mnt

    # create unprivileged user + mount namespace and preserve propagation
    unshare -U -m --map-root --propagation=unchanged

    # now change back to the original mount namespace in another terminal:
    sudo mkdir /mnt/aaa
    sudo mount -t tmpfs tmpfs /mnt/aaa

    # now in the unprivileged user + mount namespace
    mount --move /mnt/aaa /opt

    Unfortunately, this is a pretty big deal for userspace since this is
    e.g. used to inject mounts into running unprivileged containers.
    So this regression really needs to go away rather quickly.

    The problem is that a recent change falsely locked the root of the newly
    added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
    on copy_mnt_ns() and not when adding a new mount.

    Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies")
    Cc: Linus Torvalds
    Cc: Al Viro
    Cc:
    Tested-by: Christian Brauner
    Acked-by: Christian Brauner
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Christian Brauner
    Signed-off-by: Al Viro

    Christian Brauner
     
  • sys_fsmount() needs to take a reference to the new mount when adding it
    to the anonymous mount namespace. Otherwise the filesystem can be
    unmounted while it's still in use, as found by syzkaller.

    Reported-by: Mark Rutland
    Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
    Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
    Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    released under gpl v2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 15 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steve Winslow
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.895196075@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 May, 2019

1 commit

  • Call graph of vfs_get_tree():
        vfs_fsconfig_locked()               # neither kernmount, nor submount
        do_new_mount()                      # neither kernmount, nor submount
        fc_mount()
            afs_mntpt_do_automount()        # submount
            mount_one_hugetlbfs()           # kernmount
            pid_ns_prepare_proc()           # kernmount
            mq_create_mount()               # kernmount
            vfs_kern_mount()
                simple_pin_fs()             # kernmount
                vfs_submount()              # submount
                kern_mount()                # kernmount
                init_mount_tree()
                btrfs_mount()
                nfs_do_root_mount()

    The first two need the check (unconditionally).
    init_mount_tree() is setting rootfs up; any capability
    checks make zero sense for that one. And btrfs_mount()/
    nfs_do_root_mount() have the checks already done in their
    callers.

    IOW, we can shift mount_capable() handling into
    the two callers - one in the normal case of mount(2),
    another - in fsconfig(2) handling of FSCONFIG_CMD_CREATE.
    I.e. the syscalls that set a new filesystem up.

    Signed-off-by: Al Viro

    Al Viro
     

09 May, 2019

1 commit

  • What triggers it is a race between mount --move and umount -l
    of the source; we should reject it (the source is parentless *and*
    not the root of anon namespace at that), but the check for namespace
    being an anon one is broken in that case - is_anon_ns() needs
    ns to be non-NULL. Better fixed here than in is_anon_ns(), since
    the rest of the callers is guaranteed to get a non-NULL argument...

    Reported-by: syzbot+494c7ddf66acac0ad747@syzkaller.appspotmail.com
    Signed-off-by: Al Viro

    Al Viro
     

21 Mar, 2019

4 commits

  • Provide a system call by which a filesystem opened with fsopen() and
    configured by a series of fsconfig() calls can have a detached mount object
    created for it. This mount object can then be attached to the VFS mount
    hierarchy using move_mount() by passing the returned file descriptor as the
    from directory fd.

    The system call looks like:

    int mfd = fsmount(int fsfd, unsigned int flags,
                      unsigned int attr_flags);

    where fsfd is the file descriptor returned by fsopen(). flags can be 0 or
    FSMOUNT_CLOEXEC. attr_flags is a bitwise-OR of the following flags:

    MOUNT_ATTR_RDONLY           Mount read-only
    MOUNT_ATTR_NOSUID           Ignore suid and sgid bits
    MOUNT_ATTR_NODEV            Disallow access to device special files
    MOUNT_ATTR_NOEXEC           Disallow program execution
    MOUNT_ATTR__ATIME           Setting on how atime should be updated
      MOUNT_ATTR_RELATIME       - Update atime relative to mtime/ctime
      MOUNT_ATTR_NOATIME        - Do not update access times
      MOUNT_ATTR_STRICTATIME    - Always perform atime updates
    MOUNT_ATTR_NODIRATIME       Do not update directory access times

    In the event that fsmount() fails, it may be possible to get an error
    message by calling read() on fsfd. If no message is available, ENODATA
    will be reported.
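
    How the atime setting behaves as a small field inside attr_flags,
    rather than as independent bits, can be sketched as follows. The
    numeric values are believed to match the uapi header but should be
    checked against it; the SKETCH_ prefix and the function are
    hypothetical.

```c
/* Values assumed to mirror <linux/mount.h>; verify before relying on them. */
#define SKETCH_MOUNT_ATTR_RDONLY      0x00000001
#define SKETCH_MOUNT_ATTR_NOSUID      0x00000002
#define SKETCH_MOUNT_ATTR_NODEV       0x00000004
#define SKETCH_MOUNT_ATTR_NOEXEC      0x00000008
#define SKETCH_MOUNT_ATTR__ATIME      0x00000070
#define SKETCH_MOUNT_ATTR_RELATIME    0x00000000
#define SKETCH_MOUNT_ATTR_NOATIME     0x00000010
#define SKETCH_MOUNT_ATTR_STRICTATIME 0x00000020
#define SKETCH_MOUNT_ATTR_NODIRATIME  0x00000080

/*
 * Extract the atime mode from attr_flags. Returns one of the three mode
 * values, or -1 if the field holds a combination that names no mode.
 */
int sketch_atime_mode(unsigned int attr_flags)
{
	switch (attr_flags & SKETCH_MOUNT_ATTR__ATIME) {
	case SKETCH_MOUNT_ATTR_RELATIME:
		return SKETCH_MOUNT_ATTR_RELATIME;
	case SKETCH_MOUNT_ATTR_NOATIME:
		return SKETCH_MOUNT_ATTR_NOATIME;
	case SKETCH_MOUNT_ATTR_STRICTATIME:
		return SKETCH_MOUNT_ATTR_STRICTATIME;
	default:
		return -1;
	}
}
```

    Because the relative-atime mode is the zero value of the field, passing
    no atime flag at all selects it in this sketch.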

    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be
    attached by move_mount(2).

    If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is
    not detached anymore, it won't be dissolved. move_mount(2) is adjusted
    to handle detached source.

    That gives us equivalents of mount --bind and mount --rbind.

    Thanks also to Alan Jenkins for providing a whole bunch of ways to break
    things using this interface.

    Signed-off-by: Al Viro
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Add a move_mount() system call that will move a mount from one place to
    another and, in the next commit, allow to attach an unattached mount tree.

    The new system call looks like the following:

    int move_mount(int from_dfd, const char *from_path,
                   int to_dfd, const char *to_path,
                   unsigned int flags);

    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • open_tree(dfd, pathname, flags)

    Returns an O_PATH-opened file descriptor or an error.
    dfd and pathname specify the location to open, in usual
    fashion (see e.g. fstatat(2)). flags should be an OR of
    some of the following:
      * AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
        same meanings as usual
      * OPEN_TREE_CLOEXEC - make the resulting descriptor
        close-on-exec
      * OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
        instead of opening the location in question, create a detached
        mount tree matching the subtree rooted at location specified by
        dfd/pathname. With AT_RECURSIVE the entire subtree is cloned,
        without it - only the part within the mount containing the
        location in question. In other words, the same as mount --rbind
        or mount --bind would've taken. The detached tree will be
        dissolved on the final close of obtained file. Creation of such
        detached trees requires the same capabilities as doing mount --bind.
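
    From userspace, the clone-then-attach sequence could look roughly like
    the sketch below, which uses raw syscall(2) since libc wrappers may not
    be available. The fallback syscall numbers and flag values are
    assumptions to be checked against the system headers, and sketch_bind()
    is a hypothetical helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallbacks are assumptions; the system headers win when they define these. */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif

/*
 * Rough equivalent of "mount --bind src dst" (or --rbind when recursive
 * is nonzero). Returns 0 on success, -1 on failure with errno set.
 */
int sketch_bind(const char *src, const char *dst, int recursive)
{
	unsigned int flags = OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC;
	int fd, ret;

	if (recursive)
		flags |= AT_RECURSIVE;

	/* Create a detached copy of the tree at src. */
	fd = (int)syscall(SYS_open_tree, AT_FDCWD, src, flags);
	if (fd < 0)
		return -1;

	/* Attach the detached tree, named by fd with an empty path, at dst. */
	ret = (int)syscall(SYS_move_mount, fd, "", AT_FDCWD, dst,
			   MOVE_MOUNT_F_EMPTY_PATH);
	close(fd);
	return ret;
}
```

    If the attach step fails, closing the descriptor dissolves the detached
    tree, so nothing needs to be undone by hand.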

    Signed-off-by: Al Viro
    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

08 Mar, 2019

1 commit

  • Pull audit updates from Paul Moore:
    "A lucky 13 audit patches for v5.1.

    Despite the rather large diffstat, most of the changes are from two
    bug fix patches that move code from one Kconfig option to another.

    Beyond that bit of churn, the remaining changes are largely cleanups
    and bug-fixes as we slowly march towards container auditing. It isn't
    all boring though, we do have a couple of new things: file
    capabilities v3 support, and expanded support for filtering on
    filesystems to solve problems with remote filesystems.

    All changes pass the audit-testsuite. Please merge for v5.1"

    * tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: mark expected switch fall-through
    audit: hide auditsc_get_stamp and audit_serial prototypes
    audit: join tty records to their syscall
    audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
    audit: remove unused actx param from audit_rule_match
    audit: ignore fcaps on umount
    audit: clean up AUDITSYSCALL prototypes and stubs
    audit: more filter PATH records keyed on filesystem magic
    audit: add support for fcaps v3
    audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
    audit: add syscall information to CONFIG_CHANGE records
    audit: hand taken context to audit_kill_trees for syscall logging
    audit: give a clue what CONFIG_CHANGE op was involved

    Linus Torvalds
     

05 Mar, 2019

1 commit


28 Feb, 2019

2 commits

  • The kern_mount_data() isn't used any more so remove it.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • [AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living
    mounts]
    [AV - reordering fs/namespace.c is badly overdue, but let's keep it
    separate from that series]
    [AV - drop simple_pin_fs() change]
    [AV - clean vfs_kern_mount() failure exits up]

    Implement a filesystem context concept to be used during superblock
    creation for mount and superblock reconfiguration for remount.

    The mounting procedure then becomes:

    (1) Allocate new fs_context context.

    (2) Configure the context.

    (3) Create superblock.

    (4) Query the superblock.

    (5) Create a mount for the superblock.

    (6) Destroy the context.

    Rather than calling fs_type->mount(), an fs_context struct is created and
    fs_type->init_fs_context() is called to set it up. Pointers exist for the
    filesystem and LSM to hang their private data off.

    A set of operations has to be set by ->init_fs_context() to provide
    freeing, duplication, option parsing, binary data parsing, validation,
    mounting and superblock filling.

    Legacy filesystems are supported by the provision of a set of legacy
    fs_context operations that build up a list of mount options and then invoke
    fs_type->mount() from within the fs_context ->get_tree() operation. This
    allows all filesystems to be accessed using fs_context.
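
    The order of the six steps can be modelled with a toy userspace
    program. Every name below is hypothetical and nothing here is the
    kernel API; the sketch only shows that the context is created first,
    carries configuration into superblock and mount creation, and is
    destroyed on every path.

```c
#include <stdlib.h>
#include <string.h>

/* Toy context: holds configuration until the mount exists. */
struct toy_fs_context {
	int read_only;	/* set during configuration */
	int have_sb;	/* set once the toy "superblock" exists */
};

/* Toy mount: the end product of the procedure. */
struct toy_mount {
	int read_only;
};

/* Step (2): accept "ro" or "rw", reject anything else. */
int toy_parse_param(struct toy_fs_context *fc, const char *opt)
{
	if (strcmp(opt, "ro") == 0)
		fc->read_only = 1;
	else if (strcmp(opt, "rw") == 0)
		fc->read_only = 0;
	else
		return -1;
	return 0;
}

/* Runs steps (1) to (6); returns 0 and fills *out on success, -1 on failure. */
int toy_do_mount(const char *opt, struct toy_mount *out)
{
	struct toy_fs_context *fc = calloc(1, sizeof(*fc));	/* (1) allocate */
	int ret = -1;

	if (!fc)
		return -1;
	if (toy_parse_param(fc, opt) == 0) {			/* (2) configure */
		fc->have_sb = 1;				/* (3) create superblock */
		if (fc->have_sb) {				/* (4) query it */
			out->read_only = fc->read_only;		/* (5) create mount */
			ret = 0;
		}
	}
	free(fc);						/* (6) destroy context */
	return ret;
}
```

    A rejected option fails at step (2), and the context is still freed at
    step (6).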

    It should be noted that, whilst this patch adds a lot of lines of code,
    there is quite a bit of duplication with existing code that can be
    eliminated should all filesystems be converted over.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

26 Feb, 2019

1 commit

  • This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.

    It was well-intentioned, but wrong. Overriding the exception tables for
    instructions for random reasons is just wrong, and that is what the new
    code did.

    It caused problems for tracing, and it caused problems for strncpy_from_user(),
    because the new checks made perfectly valid use cases break, rather than
    catch things that did bad things.

    Unchecked user space accesses are a problem, but that's not a reason to
    add invalid checks that then people have to work around with silly flags
    (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
    odd way to say "this commit was wrong" and was sprinkled into random
    places to hide the wrongness).

    The real fix to unchecked user space accesses is to get rid of the
    special "let's not check __get_user() and __put_user() at all" logic.
    Make __{get|put}_user() be just aliases to the regular {get|put}_user()
    functions, and make it impossible to access user space without having
    the proper checks in places.

    The raison d'être of the special double-underscore versions used to be
    that the range check was expensive, and if you did multiple user
    accesses, you'd do the range check up front (like the signal frame
    handling code, for example). But SMAP (on x86) and PAN (on ARM) have
    made that optimization pointless, because the _real_ expense is the "set
    CPU flag to allow user space access".

    So let's not break the valid cases to catch invalid cases that shouldn't
    even exist.

    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Tobin C. Harding
    Cc: Borislav Petkov
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Jann Horn
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2019

1 commit

  • On ppc64le, when a string with PAGE_SIZE - 1 (i.e. 64k-1) length is
    passed as a "filesystem type" argument to the mount(2) syscall,
    copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le)
    worth of space for holding the string in kernel's address space.

    Later, in set_precision() (invoked by get_fs_type() ->
    __request_module() -> vsnprintf()), we end up assigning
    strlen(fs-type-string) i.e. 65535 as the
    value to 'struct printf_spec'->precision member. This field has a width
    of 16 bits and it is a signed data type. Hence an invalid value ends
    up getting assigned. This causes the "WARN_ONCE(spec->precision != prec,
    "precision %d too large", prec)" statement inside set_precision() to be
    executed.

    This commit fixes the bug by limiting the length of the string passed by
    copy_mount_string() to strndup_user() to PATH_MAX.
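
    The arithmetic behind the warning and the fix can be sketched as
    follows. The names are hypothetical, SKETCH_PATH_MAX is assumed to be
    4096, and the length check assumes the limit counts the terminating
    NUL and that an over-long string is rejected rather than truncated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PATH_MAX 4096

/* True when len can be stored in a signed 16-bit precision field. */
bool sketch_precision_fits(size_t len)
{
	return len <= (size_t)INT16_MAX;
}

/* True when a string of len characters plus its NUL fits within the limit. */
bool sketch_len_accepted(size_t len)
{
	return len + 1 <= SKETCH_PATH_MAX;
}
```

    A length of 65535 does not fit in the 16-bit field, while every length
    accepted under the limit does, so the warning can no longer be
    reached through this path.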

    Signed-off-by: Chandan Rajendra
    Reported-by: Abdul Haleem
    Suggested-by: Al Viro
    Signed-off-by: Al Viro

    Chandan Rajendra