07 Aug, 2016

1 commit

  • Pull binfmt_misc update from James Bottomley:
    "This update is to allow architecture emulation containers to function
    such that the emulation binary can be housed outside the container
    itself. The container and fs parts both have acks from relevant
    experts.

    To use the new feature you have to add an F option to your binfmt_misc
    configuration"

    From the docs:
    "The usual behaviour of binfmt_misc is to spawn the binary lazily when
    the misc format file is invoked. However, this doesn't work very well
    in the face of mount namespaces and changeroots, so the F mode opens
    the binary as soon as the emulation is installed and uses the opened
    image to spawn the emulator, meaning it is always available once
    installed, regardless of how the environment changes"

    * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
    binfmt_misc: add F option description to documentation
    binfmt_misc: add persistent opened binary handler for containers
    fs: add filp_clone_open API

    Linus Torvalds
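
As a rough illustration of "add an F option to your binfmt_misc configuration", the sketch below only formats a registration string in the documented `:name:type:offset:magic:mask:interpreter:flags` layout, using the extension type `E`. The helper name `binfmt_misc_ext_rule`, the rule name `demo`, the extension `xyz` and the interpreter path are all hypothetical. Nothing is written to the kernel's register file here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A field may not contain ':' because ':' is the delimiter used below. */
static int has_colon(const char *s)
{
    return strchr(s, ':') != NULL;
}

/*
 * Hypothetical helper: format an extension-type ('E') rule as
 *   :name:E::extension::interpreter:flags
 * Offset and mask are left empty. Pass "F" as flags to request the
 * "open the interpreter at registration time" mode described above.
 * Returns the string length, or -1 on a bad field or a short buffer.
 */
int binfmt_misc_ext_rule(char *buf, size_t len, const char *name,
                         const char *ext, const char *interp,
                         const char *flags)
{
    int n;

    if (has_colon(name) || has_colon(ext) || has_colon(interp))
        return -1;

    n = snprintf(buf, len, ":%s:E::%s::%s:%s", name, ext, interp, flags);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The resulting string is what an administrator would write to the binfmt_misc register file (normally as root). With `F` present, the interpreter is opened at that moment rather than at the first exec.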
     

30 Jun, 2016

1 commit

  • The two methods essentially do the same: find the real dentry/inode
    belonging to an overlay dentry. The difference is in the usage:

    vfs_open() uses ->d_select_inode() and expects the function to perform
    copy-up if necessary based on the open flags argument.

    file_dentry() uses ->d_real() passing in the overlay dentry as well as the
    underlying inode.

    vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
    with a zero inode would have worked just as well here.

    This patch merges the functionality of ->d_select_inode() into ->d_real()
    by adding an 'open_flags' argument to the latter.

    [Al Viro] Make the signature of d_real() match that of ->d_real() again.
    And constify the inode argument, while we are at it.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

18 May, 2016

1 commit

  • Pull 'struct path' constification update from Al Viro:
    "'struct path' is passed by reference to a bunch of Linux security
    methods; in theory, there's nothing to stop them from modifying the
    damn thing and LSM community being what it is, sooner or later some
    enterprising soul is going to decide that it's a good idea.

    Let's remove the temptation and constify all of those..."

    * 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    constify ima_d_path()
    constify security_sb_pivotroot()
    constify security_path_chroot()
    constify security_path_{link,rename}
    apparmor: remove useless checks for NULL ->mnt
    constify security_path_{mkdir,mknod,symlink}
    constify security_path_{unlink,rmdir}
    apparmor: constify common_perm_...()
    apparmor: constify aa_path_link()
    apparmor: new helper - common_path_perm()
    constify chmod_common/security_path_chmod
    constify security_sb_mount()
    constify chown_common/security_path_chown
    tomoyo: constify assorted struct path *
    apparmor_path_truncate(): path->mnt is never NULL
    constify vfs_truncate()
    constify security_path_truncate()
    [apparmor] constify struct path * in a bunch of helpers

    Linus Torvalds
     

17 May, 2016

1 commit


11 May, 2016

1 commit


03 May, 2016

1 commit


31 Mar, 2016

1 commit

  • I need an API that allows me to obtain a clone of the current file
    pointer to pass in to an exec handler. I've labelled this as an
    internal API because I can't see how it would be useful outside of the
    fs subsystem. The use case will be a persistent binfmt_misc handler.

    Signed-off-by: James Bottomley
    Acked-by: Serge Hallyn
    Acked-by: Jan Kara

    James Bottomley
     

28 Mar, 2016

3 commits


23 Mar, 2016

1 commit

  • This commit fixes the following security hole affecting systems where
    all of the following conditions are fulfilled:

    - The fs.suid_dumpable sysctl is set to 2.
    - The kernel.core_pattern sysctl's value starts with "/". (Systems
    where kernel.core_pattern starts with "|/" are not affected.)
    - Unprivileged user namespace creation is permitted. (This is
    true on Linux >=3.8, but some distributions disallow it by
    default using a distro patch.)

    Under these conditions, if a program executes under secure exec rules,
    causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
    namespace, changes its root directory and crashes, the coredump will be
    written using fsuid=0 and a path derived from kernel.core_pattern - but
    this path is interpreted relative to the root directory of the process,
    allowing the attacker to control where a coredump will be written with
    root privileges.

    To fix the security issue, always interpret core_pattern for dumps that
    are written under SUID_DUMP_ROOT relative to the root directory of init.

    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
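
The three preconditions listed in this message can be written down as a small predicate. This is only a sketch of the checklist as stated above, not a vulnerability scanner. The function name is hypothetical, and the caller is assumed to have read the two sysctl values and to know whether unprivileged user namespaces are permitted.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the conditions in the commit message:
 *  - fs.suid_dumpable is 2,
 *  - kernel.core_pattern starts with "/" (a "|/..." pipe handler or a
 *    relative pattern does not match),
 *  - unprivileged user namespace creation is permitted.
 */
bool matches_reported_conditions(int suid_dumpable,
                                 const char *core_pattern,
                                 bool unpriv_userns_allowed)
{
    return suid_dumpable == 2 &&
           core_pattern != NULL &&
           core_pattern[0] == '/' &&
           unpriv_userns_allowed;
}
```

A pattern such as `|/usr/lib/handler %p` starts with `|`, so the predicate returns false for it, matching the note that pipe handlers are not affected.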
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

04 Jan, 2016

1 commit


10 Jul, 2015

1 commit

  • Today proc and sysfs do not contain any executable files. Several
    applications today mount proc or sysfs without noexec and nosuid and
    then depend on there being no executable files on proc or sysfs.
    Having any executable files show on proc or sysfs would cause
    a user space visible regression, and most likely security problems.

    Therefore commit to never allowing executables on proc and sysfs by
    adding a new flag to mark them as filesystems without executables and
    enforce that flag.

    Test the flag where MNT_NOEXEC is tested today, so that the only user
    visible effect will be that executables will be treated as if the
    execute bit is cleared.

    The filesystems proc and sysfs do not currently incorporate any
    executable files so this does not result in any user visible effects.

    This makes it unnecessary to vet changes to proc and sysfs tightly for
    adding executable files or changes to chattr that would modify
    existing files, as no matter what the individual files say they will
    not be treated as executable files by the vfs.

    Not having to vet changes too closely is important, as without this we
    are only one proc_create call (or another goof up in the
    implementation of notify_change) from having problematic executables
    on proc. Those mistakes are all too easy to make and would create
    a situation where there are security issues or where the assumptions
    of some program are broken (causing userspace regressions).

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

24 Jun, 2015

2 commits

  • Comment in include/linux/security.h says that ->inode_killpriv() should
    be called when setuid bit is being removed and that similar security
    labels (in fact this applies only to file capabilities) should be
    removed at this time as well. However we don't call ->inode_killpriv()
    when we remove suid bit on truncate.

    We fix the problem by calling ->inode_need_killpriv() and subsequently
    ->inode_killpriv() on truncate the same way as we do it on file write.

    After this patch there's only one user of should_remove_suid() - ocfs2 -
    and indeed it's buggy because it doesn't call ->inode_killpriv() on
    write. However fixing it is difficult because of special locking
    constraints.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Turn
    d_path(&file->f_path, ...);
    into
    file_path(file, ...);

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     

19 Jun, 2015

1 commit

  • Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5< /mnt/a/foo107
    [root@andromeda union-testsuite]# ls -l /proc/$$/fd/
    ...
    ... 5 -> /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5< /mnt/a/foo107
    [root@andromeda union-testsuite]# ls -l /proc/$$/fd/
    ...
    ... 5 -> /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

11 May, 2015

1 commit


24 Apr, 2015

1 commit

  • Pull xfs update from Dave Chinner:
    "This update contains:

    - RENAME_WHITEOUT support

    - conversion of per-cpu superblock accounting to use generic counters

    - new inode mmap lock so that we can lock page faults out of
    truncate, hole punch and other direct extent manipulation functions
    to avoid racing mmap writes from causing data corruption

    - rework of direct IO submission and completion to solve data
    corruption issue when running concurrent extending DIO writes.
    Also solves problem of running IO completion transactions in
    interrupt context during size extending AIO writes.

    - FALLOC_FL_INSERT_RANGE support for inserting holes into a file via
    direct extent manipulation to avoid needing to copy data within the
    file

    - attribute block header field overflow fix for 64k block size
    filesystems

    - Lots of changes to log messaging to be more informative and concise
    when errors occur. Also prevent a lot of unnecessary log spamming
    due to cascading failures in error conditions.

    - lots of cleanups and bug fixes

    One thing of note is the direct IO fixes that we merged last week
    after the window opened. Even though a little late, they fix a user
    reported data corruption and have been pretty well tested. I figured
    there was not much point waiting another 2 weeks for -rc1 to be
    released just so I could send them to you..."

    * tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
    xfs: using generic_file_direct_write() is unnecessary
    xfs: direct IO EOF zeroing needs to drain AIO
    xfs: DIO write completion size updates race
    xfs: DIO writes within EOF don't need an ioend
    xfs: handle DIO overwrite EOF update completion correctly
    xfs: DIO needs an ioend for writes
    xfs: move DIO mapping size calculation
    xfs: factor DIO write mapping from get_blocks
    xfs: unlock i_mutex in xfs_break_layouts
    xfs: kill unnecessary firstused overflow check on attr3 leaf removal
    xfs: use larger in-core attr firstused field and detect overflow
    xfs: pass attr geometry to attr leaf header conversion functions
    xfs: disallow ro->rw remount on norecovery mount
    xfs: xfs_shift_file_space can be static
    xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
    fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
    xfs: Fix incorrect positive ENOMEM return
    xfs: xfs_mru_cache_insert() should use GFP_NOFS
    xfs: %pF is only for function pointers
    xfs: fix shadow warning in xfs_da3_root_split()
    ...

    Linus Torvalds
     

12 Apr, 2015

3 commits

  • no remaining users

    Signed-off-by: Al Viro

    Al Viro
     
  • We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
    occurs during an rsync into a filesystem that is exported via NFS.

    1.) fs/attr.c:notify_change() modifies the caller's version of attr.
    2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to
    setattr operations") introduced a BUG() restriction such that "no
    function will ever call notify_change() with both ATTR_MODE and
    ATTR_KILL_S*ID set". Under some circumstances though, it will have
    assisted in setting the caller's version of attr to this very
    combination.
    3.) 27ac0ffeac80 ("locks: break delegations on any attribute
    modification") introduced code to handle breaking
    delegations. This can result in notify_change() being re-called. attr
    _must_ be explicitly reset to avoid triggering the BUG() established
    in #2.
    4.) The path that triggers this is via fs/open.c:chmod_common().
    The combination of attr flags set here and in the first call to
    notify_change() along with a later failed break_deleg_wait()
    results in notify_change() being called again via retry_deleg
    without resetting attr.

    Solution is to move retry_deleg in chmod_common() a bit further up to
    ensure attr is completely reset.

    There are other places where this seemingly could occur, such as
    fs/utimes.c:utimes_common(), but the attr flags are not initially
    set in such a way to trigger this.

    Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification")
    Reported-by: Eric Meddaugh
    Tested-by: Eric Meddaugh
    Signed-off-by: Andrew Elble
    Signed-off-by: Al Viro

    Andrew Elble
     
  • For one thing, LOOKUP_DIRECTORY will be dealt with in do_last().
    For another, name can be an empty string, but not NULL - no callers
    pass that and it would oops immediately if they did.

    Signed-off-by: Al Viro

    Al Viro
     

25 Mar, 2015

1 commit

  • FALLOC_FL_INSERT_RANGE is the inverse of FALLOC_FL_COLLAPSE_RANGE. It
    is needed by anyone who wants to add data in the middle of a file.

    FALLOC_FL_INSERT_RANGE creates space for writing new data within a
    file by shifting extents to the right by the given length. This command
    has the same limitations as FALLOC_FL_COLLAPSE_RANGE: operations must
    be aligned to filesystem block boundaries and cannot cross the current
    EOF.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Ashish Sangwan
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Namjae Jeon
     

18 Feb, 2015

1 commit

  • Pull getname/putname updates from Al Viro:
    "Rework of getname/getname_kernel/etc., mostly from Paul Moore. Gets
    rid of quite a pile of kludges between namei and audit..."

    * 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    audit: replace getname()/putname() hacks with reference counters
    audit: fix filename matching in __audit_inode() and __audit_inode_child()
    audit: enable filename recording via getname_kernel()
    simpler calling conventions for filename_mountpoint()
    fs: create proper filename objects using getname_kernel()
    fs: rework getname_kernel to handle up to PATH_MAX sized filenames
    cut down the number of do_path_lookup() callers

    Linus Torvalds
     

17 Feb, 2015

1 commit

  • All callers of get_xip_mem() are now gone. Remove checks for it,
    initialisers of it, documentation of it and the only implementation of it.
    Also remove mm/filemap_xip.c as it is now empty. Also remove
    documentation of the long-gone get_xip_page().

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

23 Jan, 2015

1 commit

  • There are several areas in the kernel that create temporary filename
    objects using the following pattern:

    int func(const char *name)
    {
            /* temporary object on the stack, gone when func() returns */
            struct filename file = { .name = name };
            ...
            return 0;
    }

    ... which for the most part works okay, but it causes havoc within the
    audit subsystem as the filename object does not persist beyond the
    lifetime of the function. This patch converts all of these temporary
    filename objects into proper filename objects using getname_kernel()
    and putname() which ensure that the filename object persists until the
    audit subsystem is finished with it.

    Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
    for helping resolve a difficult kernel panic on boot related to a
    use-after-free problem in kern_path_create(); the thread can be seen
    at the link below:

    * https://lkml.org/lkml/2015/1/20/710

    This patch includes code that was either based on, or directly written
    by Al in the above thread.

    CC: viro@zeniv.linux.org.uk
    CC: linux@roeck-us.net
    CC: sd@queasysnail.net
    CC: linux-fsdevel@vger.kernel.org
    Signed-off-by: Paul Moore
    Signed-off-by: Al Viro

    Paul Moore
     

17 Dec, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A comparatively quieter cycle for nfsd this time, but still with two
    larger changes:

    - RPC server scalability improvements from Jeff Layton (using RCU
    instead of a spinlock to find idle threads).

    - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
    Schumaker, enabling fallocate on new clients"

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd4: fix xdr4 count of server in fs_location4
    nfsd4: fix xdr4 inclusion of escaped char
    sunrpc/cache: convert to use string_escape_str()
    sunrpc: only call test_bit once in svc_xprt_received
    fs: nfsd: Fix signedness bug in compare_blob
    sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
    sunrpc: convert to lockless lookup of queued server threads
    sunrpc: fix potential races in pool_stats collection
    sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
    sunrpc: require svc_create callers to pass in meaningful shutdown routine
    sunrpc: have svc_wake_up only deal with pool 0
    sunrpc: convert sp_task_pending flag to use atomic bitops
    sunrpc: move rq_cachetype field to better optimize space
    sunrpc: move rq_splice_ok flag into rq_flags
    sunrpc: move rq_dropme flag into rq_flags
    sunrpc: move rq_usedeferral flag to rq_flags
    sunrpc: move rq_local field to rq_flags
    sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
    nfsd: minor off by one checks in __write_versions()
    sunrpc: release svc_pool_map reference when serv allocation fails
    ...

    Linus Torvalds
     

14 Dec, 2014

1 commit

  • The fanotify and the inotify API can be used to monitor changes of the
    file system. System call fallocate() modifies files. Hence it should
    trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY)
    events. The most interesting case is FALLOC_FL_COLLAPSE_RANGE because
    this mode makes it possible to create arbitrary file content from random data.

    This patch adds the missing call to fsnotify_modify().

    The FAN_MODIFY and IN_MODIFY events will be created when fallocate()
    succeeds. They will even be created if the file length remains unchanged,
    e.g. when calling fallocate() with flag FALLOC_FL_KEEP_SIZE.

    This logic was primarily chosen to keep the coding simple.

    It resembles the logic of the write() system call.

    When we call write() we always create a FAN_MODIFY event, even in the case
    of overwriting with identical data.

    Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was
    actually changed.

    Furthermore, even if the file size remains unchanged, fallocate() may
    influence whether a subsequent write() will succeed and hence the
    fallocate() call may be considered a modification.

    The fallocate(2) man page teaches: After a successful call, subsequent
    writes into the range specified by offset and len are guaranteed not to
    fail because of lack of disk space.

    So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in
    different outcomes of a subsequent write depending on the values of offset
    and len.

    Signed-off-by: Heinrich Schuchardt
    Reviewed-by: Jan Kara
    Cc: Jan Kara
    Cc: Alexander Viro
    Cc: Eric Paris
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Michael Kerrisk
    Cc: Theodore Ts'o
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     

20 Nov, 2014

2 commits


08 Nov, 2014

1 commit

  • This function needs to be exported so it can be used by the NFSD module
    when responding to the new ALLOCATE and DEALLOCATE operations in NFS
    v4.2. Christoph Hellwig suggested renaming the function to stay
    consistent with how other vfs functions are named.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     

24 Oct, 2014

1 commit


01 Aug, 2014

1 commit


07 May, 2014

2 commits

  • Beginning to introduce those. Just the callers for now, and it's
    clumsier than it'll eventually become; once we finish converting
    aio_read and aio_write instances, the things will get nicer.

    For now, these guys are in parallel to ->aio_read() and ->aio_write();
    they take iocb and iov_iter, with everything in iov_iter already
    validated. File offset is passed in iocb->ki_pos, iov/nr_segs -
    in iov_iter.

    Main concerns in that series are stack footprint and ability to
    split the damn thing cleanly.

    [fix from Peter Ujfalusi folded]

    Signed-off-by: Al Viro

    Al Viro
     
  • Since we are about to introduce new methods (read_iter/write_iter), the
    tests in a bunch of places would have to grow inconveniently. Check
    once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
    and FMODE_CAN_WRITE resp. It might end up being a temporary measure -
    once everything switches from ->aio_{read,write} to ->{read,write}_iter
    it might make sense to return to open-coded checks. We'll see...

    Signed-off-by: Al Viro

    Al Viro
     

21 Apr, 2014

1 commit

  • Pull ext4 fixes from Ted Ts'o:
    "These are regression and bug fixes for ext4.

    We had a number of new features in ext4 during this merge window
    (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so
    there were many more regression and bug fixes this time around. It
    didn't help that xfstests hadn't been fully updated to fully stress
    test COLLAPSE_RANGE until after -rc1"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
    ext4: disable COLLAPSE_RANGE for bigalloc
    ext4: fix COLLAPSE_RANGE failure with 1KB block size
    ext4: use EINVAL if not a regular file in ext4_collapse_range()
    ext4: enforce we are operating on a regular file in ext4_zero_range()
    ext4: fix extent merging in ext4_ext_shift_path_extents()
    ext4: discard preallocations after removing space
    ext4: no need to truncate pagecache twice in collapse range
    ext4: fix removing status extents in ext4_collapse_range()
    ext4: use filemap_write_and_wait_range() correctly in collapse range
    ext4: use truncate_pagecache() in collapse range
    ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE
    ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled
    ext4: always check ext4_ext_find_extent result
    ext4: fix error handling in ext4_ext_shift_extents
    ext4: silence sparse check warning for function ext4_trim_extent
    ext4: COLLAPSE_RANGE only works on extent-based files
    ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches
    ext4: use i_size_read in ext4_unaligned_aio()
    fs: disallow all fallocate operation on active swapfile
    fs: move falloc collapse range check into the filesystem methods
    ...

    Linus Torvalds
     

13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencies between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

12 Apr, 2014

3 commits

  • Currently some file systems have an IS_SWAPFILE check in their fallocate
    implementations and some do not. However, we should really prevent any
    fallocate operation on a swapfile, so move the check to the vfs and remove
    the redundant checks from the file systems' fallocate implementations.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     
  • Currently in do_fallocate in collapse range case we're checking
    whether offset + len is not bigger than i_size. However there is
    nothing which would prevent i_size from changing so the check is
    pointless. It should be done in the file system itself and the file
    system needs to make sure that i_size is not going to change. The
    i_size checks for the other fallocate modes are also done in the
    filesystems.

    As it is now we can easily crash the kernel by having two processes
    doing truncate and fallocate collapse range at the same time. This
    can be reproduced on ext4 and it is theoretically possible on xfs even
    though I was not able to trigger it with this simple test.

    This commit removes the check from do_fallocate and adds it to the
    file system.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Acked-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Lukas Czerner
     
  • Currently the punch hole and collapse range fallocate operations are not
    allowed on an append-only file. This should be the case for zero range as
    well. Fix it by allowing only pure fallocate (possibly with keep size set).

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
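
The rule "only pure fallocate (possibly with keep size set)" amounts to a mask test on the mode argument. The predicate below is a sketch of that rule as stated in the message. The function name is hypothetical, and this is not claimed to be the exact kernel check.

```c
#include <linux/falloc.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: on an append-only file, accept a mode only if
 * it has no bits set other than FALLOC_FL_KEEP_SIZE. Punch hole,
 * collapse range and zero range all carry extra bits and are refused.
 */
bool append_only_permits(int mode)
{
    return (mode & ~FALLOC_FL_KEEP_SIZE) == 0;
}
```

Because the test is written as "nothing but KEEP_SIZE", any fallocate mode added later is refused on append-only files by default and does not need its own entry.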