24 Oct, 2020

1 commit

  • Pull clone/dedupe/remap code refactoring from Darrick Wong:
    "Move the generic file range remap (aka reflink and dedupe) functions
    out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
    reduce clutter in the first two files"

    * tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    vfs: move the generic write and copy checks out of mm
    vfs: move the remap range helpers to remap_range.c
    vfs: move generic_remap_checks out of mm

    Linus Torvalds
     

23 Oct, 2020

1 commit

  • Pull initial set_fs() removal from Al Viro:
    "Christoph's set_fs base series + fixups"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Allow a NULL pos pointer to __kernel_read
    fs: Allow a NULL pos pointer to __kernel_write
    powerpc: remove address space overrides using set_fs()
    powerpc: use non-set_fs based maccess routines
    x86: remove address space overrides using set_fs()
    x86: make TASK_SIZE_MAX usable from assembly code
    x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
    lkdtm: remove set_fs-based tests
    test_bitmap: remove user bitmap tests
    uaccess: add infrastructure for kernel builds with set_fs()
    fs: don't allow splice read/write without explicit ops
    fs: don't allow kernel reads and writes without iter ops
    sysctl: Convert to iter interfaces
    proc: add a read_iter method to proc proc_ops
    proc: cleanup the compat vs no compat file ops
    proc: remove a level of indentation in proc_get_inode

    Linus Torvalds
     

16 Oct, 2020

4 commits


13 Oct, 2020

1 commit

  • Pull compat iovec cleanups from Al Viro:
    "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    security/keys: remove compat_keyctl_instantiate_key_iov
    mm: remove compat_process_vm_{readv,writev}
    fs: remove compat_sys_vmsplice
    fs: remove the compat readv/writev syscalls
    fs: remove various compat readv/writev helpers
    iov_iter: transparently handle compat iovecs in import_iovec
    iov_iter: refactor rw_copy_check_uvector and import_iovec
    iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
    compat.h: fix a spelling error in

    Linus Torvalds
     

03 Oct, 2020

3 commits


30 Sep, 2020

1 commit

  • autofs got broken in some configurations by commit 13c164b1a186
    ("autofs: switch to kernel_write") because there is now an extra LSM
    permission check done by security_file_permission() in rw_verify_area().

    autofs is one of the few places that really does want the much more
    limited __kernel_write(), because the write is an internal kernel one
    that shouldn't do any user permission checks (it also doesn't need the
    file_start_write/file_end_write logic, since it's just a pipe).

    There are a couple of other cases like that - accounting, core dumping,
    and splice - but autofs stands out because it can be built as a module.

    As a result, we need to export this internal __kernel_write() function
    again.

    We really don't want any other module to use this, but we don't have a
    "EXPORT_SYMBOL_FOR_AUTOFS_ONLY()". But we can mark it GPL-only to at
    least approximate that "internal use only" for licensing.

    While in this area, make autofs pass in NULL for the file position
    pointer, since it's always a pipe, and we now use a NULL position pointer
    for streaming file descriptors (see file_ppos() and commit 438ab720c675:
    "vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files")

    This effectively reverts commits 9db977522449 ("fs: unexport
    __kernel_write") and 13c164b1a186 ("autofs: switch to kernel_write").

    Fixes: 13c164b1a186 ("autofs: switch to kernel_write")
    Reported-by: Ondrej Mosnacek
    Acked-by: Christoph Hellwig
    Acked-by: Ian Kent
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 Sep, 2020

1 commit


09 Sep, 2020

2 commits

  • default_file_splice_write is the last piece of generic code that uses
    set_fs to make the uaccess routines operate on kernel pointers. It
    implements a "fallback loop" for splicing from files that do not actually
    provide a proper splice_read method. The usual file systems and other
    high bandwidth instances all provide a ->splice_read, so this just removes
    support for various device drivers and procfs/debugfs files. If splice
    support for any of those turns out to be important it can be added back
    by switching them to the iter ops and using generic_file_splice_read.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Don't allow calling ->read or ->write with set_fs as a preparation for
    killing off set_fs. All the instances that we use kernel_read/write on
    are using the iter ops already.

    If a file has both the regular ->read/->write methods and the iter
    variants those could have different semantics for messed up enough
    drivers. Also fail kernel access to them in that case.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Signed-off-by: Al Viro

    Christoph Hellwig
     

30 Jul, 2020

1 commit

  • There is no good reason to mess with file descriptors from in-kernel
    code, switch the initrd loading to struct file based read and writes
    instead.

    Also pass an explicit offset instead of ->f_pos, and to make that easier,
    use file scope file structs and offsets everywhere except for
    identify_ramdisk_image instead of the current strange mix.

    Signed-off-by: Christoph Hellwig
    Acked-by: Linus Torvalds

    Christoph Hellwig
     

08 Jul, 2020

7 commits


02 Apr, 2020

1 commit

  • This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
    unneeded __ARCH_WANT_SYS_LLSEEK macro")

    When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.

    There is resistance to both removing the llseek syscall from the 64bit
    syscall tables and building the llseek interface unconditionally.

    Signed-off-by: Michal Suchanek
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/
    Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
    Link: https://lore.kernel.org/r/dd4575c51e31766e87f7e7fa121d099ab78d3290.1584699455.git.msuchanek@suse.de

    Michal Suchanek
     

04 Feb, 2020

1 commit

  • Pull overlayfs update from Miklos Szeredi:

    - Try to preserve holes in sparse files when copying up, thus saving
    disk space and improving performance.

    - Fix a performance regression introduced in v4.19 by preserving
    asynchronicity of IO when forwarding to underlying layers. Add VFS
    helpers to submit async iocbs.

    - Fix a regression in lseek(2) introduced in v4.19 that breaks >2G
    seeks on 32bit kernels.

    - Fix a corner case where st_ino/st_dev was not preserved across copy
    up.

    - Miscellaneous fixes and cleanups.

    * tag 'ovl-update-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix lseek overflow on 32bit
    ovl: add splice file read write helper
    ovl: implement async IO routines
    vfs: add vfs_iocb_iter_[read|write] helper functions
    ovl: layer is const
    ovl: fix corner case of non-constant st_dev;st_ino
    ovl: fix corner case of conflicting lower layer uuid
    ovl: generalize the lower_fs[] array
    ovl: simplify ovl_same_sb() helper
    ovl: generalize the lower_layers[] array
    ovl: improving copy-up efficiency for big sparse file
    ovl: use ovl_inode_lock in ovl_llseek()
    ovl: use pr_fmt auto generate prefix
    ovl: fix wrong WARN_ON() in ovl_cache_update_ino()

    Linus Torvalds
     

24 Jan, 2020

2 commits

  • This doesn't cause any behavior changes and will be used by overlay async
    IO implementation.

    Signed-off-by: Jiufei Xue
    Signed-off-by: Miklos Szeredi

    Jiufei Xue
     
  • We always round down, to a multiple of the filesystem's block size, the
    length to deduplicate at generic_remap_check_len(). However this is only
    needed if an attempt to deduplicate the last block into the middle of the
    destination file is requested, since that leads into a corruption if the
    length of the source file is not block size aligned. When an attempt to
    deduplicate the last block into the end of the destination file is
    requested, we should allow it because it is safe to do it - there's no
    stale data exposure and we are prepared to compare the data ranges for
    a length not aligned to the block (or page) size - in fact we even do
    the data compare before adjusting the deduplication length.

    After btrfs was updated to use the generic helpers from VFS (by commit
    34a28e3d77535e ("Btrfs: use generic_remap_file_range_prep() for cloning
    and deduplication")) we started to have user reports of deduplication
    not reflinking the last block anymore, and hence users getting lower
    deduplication scores. The main use case is deduplication of entire
    files that have a size not aligned to the block size of the filesystem.

    We already allow cloning the last block to the end (and beyond) of the
    destination file, so allow for deduplication as well.
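
    The length adjustment described above can be modelled in user space. This
    is a minimal sketch and not the kernel's generic_remap_check_len(): the
    function name and its parameters are hypothetical, and the block size is
    assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the dedupe length rule.
 * len        - requested dedupe length in bytes
 * pos_out    - offset in the destination file
 * dest_size  - current size of the destination file
 * block_size - filesystem block size (assumed to be a power of two)
 * Returns the length that would actually be deduplicated.
 */
static uint64_t model_dedupe_len(uint64_t len, uint64_t pos_out,
                                 uint64_t dest_size, uint64_t block_size)
{
    uint64_t blkmask;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    blkmask = block_size - 1;

    /* Already block aligned: nothing to adjust. */
    if ((len & blkmask) == 0)
        return len;

    /* Range ends in the middle of the destination: round down. */
    if (pos_out + len < dest_size)
        return len & ~blkmask;

    /* Range ends at (or beyond) the destination's EOF: keep it whole. */
    return len;
}
```

    With a 4096-byte block size, a 5000-byte request that ends inside the
    destination is shortened to 4096 bytes, while the same request ending
    exactly at the destination's EOF keeps its full length.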

    Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/
    CC: stable@vger.kernel.org # 5.1+
    Reviewed-by: Josef Bacik
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     

17 Aug, 2019

1 commit

  • When dedupe wants to use the page cache to compare parts of two files
    for dedupe, we must be very careful to handle locking correctly. The
    current code doesn't do this. It must lock and unlock the page only
    once if the two pages are the same, since the overlapping range check
    doesn't catch this when blocksize < pagesize. If the pages are distinct
    but from the same file, we must observe page locking order and lock them
    in order of increasing offset to avoid clashing with writeback locking.
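
    The two locking rules above can be sketched in user space. This is only
    an illustration: struct page_model and the model_* helpers are
    hypothetical stand-ins, and a plain flag replaces the real page lock.

```c
#include <assert.h>

/* Hypothetical stand-in for a page cache page. */
struct page_model {
    unsigned long index;  /* page offset within the file */
    int locked;           /* 1 while "locked" in this model */
};

static void model_lock_page(struct page_model *p)
{
    assert(!p->locked);   /* locking twice would deadlock for real */
    p->locked = 1;
}

static void model_unlock_page(struct page_model *p)
{
    assert(p->locked);
    p->locked = 0;
}

/*
 * Lock both pages, lowest index first, and lock only once when both
 * arguments refer to the same page. Returns the page locked first.
 */
static struct page_model *model_lock_two_pages(struct page_model *p1,
                                               struct page_model *p2)
{
    if (p1->index > p2->index) {
        struct page_model *tmp = p1;
        p1 = p2;
        p2 = tmp;
    }
    model_lock_page(p1);
    if (p1 != p2)
        model_lock_page(p2);
    return p1;
}

/* Unlock both pages, again touching a shared page only once. */
static void model_unlock_two_pages(struct page_model *p1,
                                   struct page_model *p2)
{
    model_unlock_page(p1);
    if (p1 != p2)
        model_unlock_page(p2);
}
```

    Passing the same page twice takes the lock once, and passing pages 7 and
    3 locks page 3 first regardless of argument order.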

    Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Bill O'Donnell
    Reviewed-by: Matthew Wilcox (Oracle)

    Darrick J. Wong
     

10 Jun, 2019

6 commits

  • We want to enable cross-filesystem copy_file_range functionality
    where possible, so push the "same superblock only" checks down to
    the individual filesystem callouts so they can make their own
    decisions about cross-superblock copy offload and fallback to
    generic_copy_file_range() for cross-superblock copy.

    [Amir] We do not call ->remap_file_range() in case the files are not
    on the same sb and do not call ->copy_file_range() in case the files
    do not belong to the same filesystem driver.

    This changes behavior of the copy_file_range(2) syscall, which will
    now allow cross filesystem in-kernel copy. CIFS already supports
    cross-superblock copy, between two shares to the same server. This
    functionality will now be available via the copy_file_range(2) syscall.

    Cc: Steve French
    Signed-off-by: Dave Chinner
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     
  • The combination of file_remove_privs() and file_update_mtime() is
    quite common in filesystem ->write_iter() methods.

    Modelled after the helper file_accessed(), introduce file_modified()
    and use it from generic_remap_file_range_prep().

    Note that the order of calling file_remove_privs() before
    file_update_mtime() in the helper was matched to the more common order by
    filesystems and not the current order in generic_remap_file_range_prep().

    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     
  • Like the clone and dedupe interfaces we've recently fixed, the
    copy_file_range() implementation is missing basic sanity, limits and
    boundary condition tests on the parameters that are passed to it
    from userspace. Create a new "generic_copy_file_checks()" function
    modelled on the generic_remap_checks() function to provide this
    missing functionality.

    [Amir] Shorten copy length instead of checking pos_in limits
    because input file size already abides by the limits.
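
    The "shorten copy length" rule from the note above can be sketched as
    follows. This is a user-space model under stated assumptions, not the
    body of generic_copy_file_checks(): the function name is hypothetical
    and only the clamp to the input file size is shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: clamp a copy request so that it never reads past
 * the end of the input file.
 * pos_in  - starting offset in the input file
 * size_in - size of the input file
 * count   - requested number of bytes
 */
static uint64_t model_shorten_copy_len(uint64_t pos_in, uint64_t size_in,
                                       uint64_t count)
{
    uint64_t avail;

    /* Starting at or beyond EOF: nothing to copy. */
    if (pos_in >= size_in)
        return 0;

    avail = size_in - pos_in;
    return count < avail ? count : avail;
}
```

    For a 100-byte input file, a 50-byte request at offset 80 is shortened
    to 20 bytes, and a request starting at offset 100 copies nothing.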

    Signed-off-by: Dave Chinner
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     
  • Factor out helper with some checks on in/out file that are
    common to clone_file_range and copy_file_range.

    Suggested-by: Darrick J. Wong
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Amir Goldstein
     
  • Now that we have generic_copy_file_range(), remove it as a fallback
    case when offloads fail. This puts the responsibility for executing
    fallbacks on the filesystems that implement ->copy_file_range and
    allows us to add operational validity checks to
    generic_copy_file_range().

    Rework vfs_copy_file_range() to call a new do_copy_file_range()
    helper to execute the copying callout, and move calls to
    generic_copy_file_range() into filesystem methods where they
    currently return failures.

    [Amir] overlayfs is not responsible for executing the fallback.
    It is the responsibility of the underlying filesystem.

    Signed-off-by: Dave Chinner
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Right now if vfs_copy_file_range() does not use any offload
    mechanism, it falls back to calling do_splice_direct(). This fails
    to do basic sanity checks on the files being copied. Before we
    start adding this necessary functionality to the fallback path,
    separate it out into generic_copy_file_range().

    generic_copy_file_range() has the same prototype as
    ->copy_file_range() so that filesystems can use it in their custom
    ->copy_file_range() method if they so choose.

    Signed-off-by: Dave Chinner
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

06 May, 2019

1 commit

  • This amends commit 10dce8af3422 ("fs: stream_open - opener for
    stream-like files so that read and write can run simultaneously without
    deadlock") in how position is passed into .read()/.write() handler for
    stream-like files:

    Rasmus noticed that we currently pass 0 as position and ignore any position
    change if that is done by a file implementation. This papers over bugs if ppos
    is used in files that declare themselves as being stream-like as such bugs will
    go unnoticed. Even if a file implementation is correctly converted into using
    stream_open, its read/write later could be changed to use ppos and even though
    that won't be working correctly, that bug might go unnoticed without someone
    doing wrong behaviour analysis. It is thus better to pass ppos=NULL into
    read/write for stream-like files as that doesn't give any chance for ppos usage
    bugs because it will oops if ppos is ever used inside .read() or .write().

    Note 1: rw_verify_area, new_sync_{read,write} need to be updated
    because they are called by vfs_read/vfs_write & friends before
    file_operations .read/.write .

    Note 2: if file backend uses new-style .read_iter/.write_iter, position
    is still passed into there as non-pointer kiocb.ki_pos . Currently
    stream_open.cocci (semantic patch added by 10dce8af3422) ignores files
    whose file_operations has *_iter methods.
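
    The selection of the position pointer can be modelled in user space. This
    is a sketch only: struct file_model, MODEL_FMODE_STREAM and
    model_file_ppos() are hypothetical stand-ins, and the flag value is
    illustrative rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value, not the kernel's. */
#define MODEL_FMODE_STREAM 0x1u

/* Hypothetical stand-in for the relevant parts of struct file. */
struct file_model {
    unsigned int f_mode;
    long long f_pos;
};

/*
 * Position pointer handed to .read()/.write(): NULL for stream-like
 * files, so any handler that dereferences it fails loudly instead of
 * silently working on a meaningless position.
 */
static long long *model_file_ppos(struct file_model *file)
{
    assert(file != NULL);
    return (file->f_mode & MODEL_FMODE_STREAM) ? NULL : &file->f_pos;
}
```

    A handler for a regular file receives the address of f_pos and can
    update it, while a handler for a stream-like file receives NULL.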

    Suggested-by: Rasmus Villemoes
    Signed-off-by: Kirill Smelkov

    Kirill Smelkov
     

07 Apr, 2019

1 commit

  • …multaneously without deadlock

    Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
    locking for file.f_pos access and in particular made concurrent read and
    write not possible - now both those functions take f_pos lock for the
    whole run, and so if e.g. a read is blocked waiting for data, write will
    deadlock waiting for that read to complete.

    This caused regression for stream-like files where previously read and
    write could run simultaneously, but after that patch could not do so
    anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
    to /proc/xen/xenbus") which fixes such regression for particular case of
    /proc/xen/xenbus.

    The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
    safety for read/write/lseek and added the locking to file descriptors of
    all regular files. In 2014 that thread-safety problem was not new as it
    was already discussed earlier in 2006.

    However, even though the 2006 version of Linus's patch was adding f_pos
    locking "only for files that are marked seekable with FMODE_LSEEK (thus
    avoiding the stream-like objects like pipes and sockets)", the 2014
    version - the one that actually made it into the tree as 9c225f2655e3 -
    is doing so regardless of whether a file is seekable or not.

    See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

    for historic context.

    The reason that it did so is, probably, that there are many files that
    are marked non-seekable, but e.g. their read implementation actually
    depends on knowing current position to correctly handle the read. Some
    examples:

    kernel/power/user.c snapshot_read
    fs/debugfs/file.c u32_array_read
    fs/fuse/control.c fuse_conn_waiting_read + ...
    drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
    arch/s390/hypfs/inode.c hypfs_read_iter
    ...

    Despite that, many nonseekable_open users implement read and write with
    pure stream semantics - they don't depend on passed ppos at all. And for
    those cases where read could wait for something inside, it creates a
    situation similar to xenbus - the write could be never made to go until
    read is done, and read is waiting for some, potentially external, event,
    for potentially unbounded time -> deadlock.

    Besides xenbus, there are 14 such places in the kernel that I've found
    with semantic patch (see below):

    drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
    drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
    drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
    drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
    net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
    drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
    drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
    drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
    net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
    drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
    drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
    drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
    drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
    drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

    In addition to the cases above another regression caused by f_pos
    locking is that now FUSE filesystems that implement open with
    FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
    stream-like files - for the same reason as above e.g. read can deadlock
    write locking on file.f_pos in the kernel.

    FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
    implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
    in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
    write routines not depending on current position at all, and with both
    read and write being potentially blocking operations:

    See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

    Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
    "somewhat pipe-like files ..." with read handler not using offset.
    However that test implements only read without write and cannot exercise
    the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

    I've actually hit the read vs write deadlock for real while implementing
    my FUSE filesystem where there is /head/watch file, for which open
    creates separate bidirectional socket-like stream in between filesystem
    and its user with both read and write being later performed
    simultaneously. And there it is semantically not easy to split the
    stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

    Let's fix this regression. The plan is:

    1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
    doing so would break many in-kernel nonseekable_open users which
    actually use ppos in read/write handlers.

    2. Add stream_open() to kernel to open stream-like non-seekable file
    descriptors. Read and write on such file descriptors would never use
    nor change ppos. And with that property on stream-like files read and
    write will be running without taking f_pos lock - i.e. read and write
    could be running simultaneously.

    3. With semantic patch search and convert to stream_open all in-kernel
    nonseekable_open users for which read and write actually do not
    depend on ppos and where there are no other methods in file_operations
    which assume @offset access.

    4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
    stream_open if that bit is present in filesystem open reply.

    It was tempting to change fs/fuse/ open handler to use stream_open
    instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
    grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
    and in particular GVFS which actually uses offset in its read and
    write handlers

    https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
    https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

    so if we would do such a change it will break a real user.

    5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
    from v3.14+ (the kernel where 9c225f2655 first appeared).

    This will allow patching OSSPD and other FUSE filesystems that
    provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
    in their open handler and this way avoid the deadlock on all kernel
    versions. This should work because fs/fuse/ ignores unknown open
    flags returned from a filesystem and so passing FOPEN_STREAM to a
    kernel that is not aware of this flag cannot hurt. In turn the kernel
    that is not aware of FOPEN_STREAM will be < v3.14 where just
    FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
    write deadlock.
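
    The difference between the two open helpers in the plan above can be
    modelled with plain bit flags. This is a user-space sketch under stated
    assumptions: the MODEL_FMODE_* values and the model_* functions are
    hypothetical, and only the f_pos locking aspect is represented.

```c
#include <assert.h>

/* Illustrative bit values; the kernel's real FMODE_* constants differ. */
#define MODEL_FMODE_LSEEK      0x01u
#define MODEL_FMODE_PREAD      0x02u
#define MODEL_FMODE_PWRITE     0x04u
#define MODEL_FMODE_ATOMIC_POS 0x08u

/* Non-seekable: no lseek/pread/pwrite, but f_pos locking is kept
 * because read/write handlers may still depend on ppos. */
static unsigned int model_nonseekable_open(unsigned int f_mode)
{
    return f_mode & ~(MODEL_FMODE_LSEEK | MODEL_FMODE_PREAD |
                      MODEL_FMODE_PWRITE);
}

/* Stream-like: additionally drops f_pos locking, since read and write
 * never use nor change the position. */
static unsigned int model_stream_open(unsigned int f_mode)
{
    return f_mode & ~(MODEL_FMODE_LSEEK | MODEL_FMODE_PREAD |
                      MODEL_FMODE_PWRITE | MODEL_FMODE_ATOMIC_POS);
}

/* Would read()/write() serialize on the file position lock? */
static int model_needs_pos_lock(unsigned int f_mode)
{
    return (f_mode & MODEL_FMODE_ATOMIC_POS) != 0;
}
```

    In this model a file opened through model_nonseekable_open() still
    serializes read and write on the position lock, while one opened through
    model_stream_open() does not, which is what lets a blocked read coexist
    with a concurrent write.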

    This patch adds stream_open, converts /proc/xen/xenbus to it and adds
    semantic patch to automatically locate in-kernel places that are either
    required to be converted due to read vs write deadlock, or that are just
    safe to be converted because read and write do not use ppos and there
    are no other funky methods in file_operations.

    Regarding semantic patch I've verified each generated change manually -
    that it is correct to convert - and each other nonseekable_open instance
    left - that it is either not correct to convert there, or that it is not
    converted due to current stream_open.cocci limitations.

    The script also does not convert files that should be valid to convert,
    but that currently have .llseek = noop_llseek or generic_file_llseek for
    unknown reason despite file being opened with nonseekable_open (e.g.
    drivers/input/mousedev.c)

    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Yongzhi Pan <panyongzhi@gmail.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: David Vrabel <david.vrabel@citrix.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Julia Lawall <Julia.Lawall@lip6.fr>
    Cc: Nikolaus Rath <Nikolaus@rath.org>
    Cc: Han-Wen Nienhuys <hanwen@google.com>
    Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Kirill Smelkov
     

13 Mar, 2019

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted fixes (really no common topic here)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Make __vfs_write() static
    vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
    pipe: stop using ->can_merge
    splice: don't merge into linked buffers
    fs: move generic stat response attr handling to vfs_getattr_nosec
    orangefs: don't reinitialize result_mask in ->getattr
    fs/devpts: always delete dcache dentry-s in dput()

    Linus Torvalds
     

05 Mar, 2019

1 commit

  • Every in-kernel use of this function defined it to KERNEL_DS (either as
    an actual define, or as an inline function). It's an entirely
    historical artifact, and long long long ago used to actually read the
    segment selector value of '%ds' on x86.

    Which in the kernel is always KERNEL_DS.

    Inspired by a patch from Jann Horn that just did this for a very small
    subset of users (the ones in fs/), along with Al who suggested a script.
    I then just took it to the logical extreme and removed all the remaining
    gunk.

    Roughly scripted with

    git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
    git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'

    plus manual fixups to remove a few unusual usage patterns, the couple of
    inline function cases and to fix up a comment that had become stale.

    The 'get_ds()' function remains in an x86 kvm selftest, since in user
    space it actually does something relevant.

    Inspired-by: Jann Horn
    Inspired-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Feb, 2019

1 commit


16 Feb, 2019

1 commit

  • The preadv2 and pwritev2 syscalls are supposed to emulate the readv and
    writev syscalls when offset == -1. Therefore the compat code should
    check for offset before calling do_compat_preadv64 and
    do_compat_pwritev64. This is the case for the preadv2 and pwritev2
    syscalls, but handling of offset == -1 is missing in their 64-bit
    equivalent.

    This patch fixes that, calling do_compat_readv and do_compat_writev when
    offset == -1. This fixes the following glibc tests on x32:
    - misc/tst-preadvwritev2
    - misc/tst-preadvwritev64v2
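
    The offset == -1 behaviour can be observed from user space. This is a
    minimal sketch, assuming a glibc that provides preadv2() and a writable
    temporary directory; the helper names are hypothetical, and it exercises
    the native syscall path rather than the compat code fixed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read up to len bytes with preadv2() and offset == -1, which behaves
 * like readv(): the current file offset is used and then updated.
 * Returns the file offset after the read, or -1 on error.
 */
static off_t read_at_current_offset(int fd, char *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    if (preadv2(fd, &iov, 1, -1, 0) < 0)
        return -1;
    return lseek(fd, 0, SEEK_CUR);
}

/* Returns 0 if the read started at the current offset and advanced it. */
static int demo_preadv2_current_offset(void)
{
    char buf[6] = { 0 };
    FILE *f = tmpfile();
    off_t pos;
    int fd, ok;

    if (!f)
        return -1;
    fd = fileno(f);
    if (write(fd, "hello world", 11) != 11 || lseek(fd, 6, SEEK_SET) != 6) {
        fclose(f);
        return -1;
    }
    pos = read_at_current_offset(fd, buf, 5);
    ok = (pos == 11 && strcmp(buf, "world") == 0);
    fclose(f);
    return ok ? 0 : -1;
}
```

    After seeking to offset 6, the read returns the last five bytes and
    leaves the file offset at 11, as a plain readv() would.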

    Cc: Alexander Viro
    Cc: H.J. Lu
    Signed-off-by: Aurelien Jarno
    Signed-off-by: Al Viro

    Aurelien Jarno
     

04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted in this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iov_iter code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds