10 Nov, 2018

1 commit

  • commit a725356b6659469d182d662f22d770d83d3bc7b5 upstream.

    Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze
    protection") created a wrapper do_clone_file_range() around
    vfs_clone_file_range() moving the freeze protection to the former, so
    overlayfs could call the latter.

    The more common vfs practice is to call do_xxx helpers from vfs_xxx
    helpers, where freeze protection is taken in the vfs_xxx helper, so
    this anomaly could be a source of confusion.

    It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup
    support") may have fallen victim to this confusion:
    ovl_clone_file_range() calls the vfs_clone_file_range() helper in the
    hope of getting freeze protection on upper fs, but in fact this lets
    overlayfs bypass upper fs freeze protection.

    Swap the names of the two helpers to conform to common vfs practice
    and call the correct helpers from overlayfs and nfsd.
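
    The naming convention described above can be sketched with a small
    userspace mock. Everything here (mock_sb, mock_sb_start_write(),
    do_mock_clone(), vfs_mock_clone()) is a hypothetical stand-in and not
    the kernel code; it only shows which layer is assumed to take freeze
    protection.

```c
#include <assert.h>

/* Hypothetical stand-in for a superblock's write-freeze accounting. */
struct mock_sb {
	int writers; /* > 0 while some caller holds freeze protection */
};

static void mock_sb_start_write(struct mock_sb *sb)
{
	sb->writers++;
}

static void mock_sb_end_write(struct mock_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/*
 * do_xxx helper: performs the work and takes no protection itself.
 * For illustration it returns the writer count it observed.
 */
static int do_mock_clone(struct mock_sb *sb)
{
	return sb->writers;
}

/* vfs_xxx helper: takes freeze protection, then calls the do_xxx helper. */
static int vfs_mock_clone(struct mock_sb *sb)
{
	int seen;

	mock_sb_start_write(sb);
	seen = do_mock_clone(sb);
	mock_sb_end_write(sb);
	return seen;
}
```

    In this mock, a caller that wants protection calls vfs_mock_clone();
    a caller that already holds it calls do_mock_clone() directly.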

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi
    Fixes: 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze...")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Sasha Levin

    Amir Goldstein
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

1 commit

  • Instead of including the full signal header, we are going to include
    the types-only header in sched.h, to further decouple the scheduler
    header from the signal headers.

    This means that various files which relied on the full signal header
    need to be updated to gain an explicit dependency on it.

    Update the code that relies on sched.h's inclusion of the header.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Dec, 2016

2 commits

  • Move sb_start_write()/sb_end_write() out of the vfs helper and up into the
    ioctl handler.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
    files are not on the same mount point.
    Practically, clone only requires that src and dest files
    are on the same file system.

    Move the check for same mount point to ioctl handler and keep
    only the check for same super block in the vfs helper.

    A following patch is going to use the vfs_clone_file_range()
    helper in overlayfs to copy up between lower and upper
    mount points on the same file system.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

16 Sep, 2016

2 commits

  • Kirill A. Shutemov reports that the kernel doesn't try to cap dest_count
    in any way, and uses the number to allocate kernel memory. This causes
    high order allocation warnings in the kernel log if someone passes in a
    big enough value. We should clamp the allocation at PAGE_SIZE to avoid
    stressing the VM.

    The two existing users of the dedupe ioctl never send more than 120
    requests, so we can safely clamp dest_range at PAGE_SIZE, because with
    4k pages we can handle up to 127 dedupe candidates. Given the max
    extent length of 16MB, we can end up doing 2GB of IO which is plenty.

    [ Note: the "offsetof()" can't overflow, because 'count' is just a
    16-bit integer. That's not obvious in the limited context of the
    patch, so I'm noting it here because it made me go look. - Linus ]
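
    The "127 candidates" and "2GB" figures above can be rechecked with a
    short userspace sketch. The two structs are mock mirrors of the dedupe
    request layout (a 24-byte header followed by 32-byte per-destination
    entries); their names and the helper names are assumptions made for
    illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock mirror of one per-destination entry (assumed 32 bytes). */
struct mock_dedupe_info {
	int64_t  dest_fd;
	uint64_t dest_offset;
	uint64_t bytes_deduped;
	int32_t  status;
	uint32_t reserved;
};

/* Mock mirror of the request header followed by the entries. */
struct mock_dedupe_range {
	uint64_t src_offset;
	uint64_t src_length;
	uint16_t dest_count;
	uint16_t reserved1;
	uint32_t reserved2;
	struct mock_dedupe_info info[];
};

/* Number of entries that fit in one page after the header. */
static size_t max_dedupe_candidates(size_t page_size)
{
	size_t header = offsetof(struct mock_dedupe_range, info);

	assert(page_size >= header);
	return (page_size - header) / sizeof(struct mock_dedupe_info);
}

/* Upper bound on IO for one request, given a maximum extent length. */
static uint64_t max_dedupe_io_bytes(size_t page_size, uint64_t max_extent)
{
	return (uint64_t)max_dedupe_candidates(page_size) * max_extent;
}
```

    With a 4096-byte page this gives (4096 - 24) / 32 = 127 entries, and
    127 extents of 16MB each come to 2032MB, just under the 2GB quoted.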

    Reported-by: "Kirill A. Shutemov"
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • All the VFS functions in the dedupe ioctl path return int status, so
    the ioctl handler ought to as well.

    Found by Coverity, CID 1350952.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     

29 Jul, 2016

1 commit


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

13 Jan, 2016

1 commit

  • Pull vfs copy_file_range updates from Al Viro:
    "Several series around copy_file_range/CLONE"

    * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: use new dedupe data function pointer
    vfs: hoist the btrfs deduplication ioctl to the vfs
    vfs: wire up compat ioctl for CLONE/CLONE_RANGE
    cifs: avoid unused variable and label
    nfsd: implement the NFSv4.2 CLONE operation
    nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
    vfs: pull btrfs clone API to vfs layer
    locks: new locks_mandatory_area calling convention
    vfs: Add vfs_copy_file_range() support for pagecache copies
    btrfs: add .copy_file_range file operation
    x86: add sys_copy_file_range to syscall tables
    vfs: add copy_file_range syscall and vfs helper

    Linus Torvalds
     

09 Jan, 2016

1 commit


01 Jan, 2016

1 commit


08 Dec, 2015

1 commit

  • The btrfs clone ioctls are now adopted by other file systems, with NFS
    and CIFS already having support for them, and XFS being under active
    development. To avoid growth of various slightly incompatible
    implementations, add one to the VFS. Note that clones are different from
    file copies in several ways:

    - they are atomic vs other writers
    - they support whole file clones
    - they support 64-bit length clones
    - they do not allow partial success (aka short writes)
    - clones are expected to be a fast metadata operation

    Because of that it would be rather cumbersome to try to piggyback them on
    top of the recent copy_file_range infrastructure. The converse isn't
    true and the copy_file_range system call could try clone file range as
    a first attempt to copy, something that further patches will enable.

    Based on earlier work from Peng Tao.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

11 Feb, 2015

1 commit

  • __generic_block_fiemap may spin for a very long time on large sparse
    files.

    Without this patch an unprivileged user may abuse system resources simply
    by spawning a vast number of unkillable busyloops (works on ext2/ext3):

    truncate --size 1T test
    for ((i=0;i /dev/null &
    done

    Signed-off-by: Dmitry Monakhov
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     

17 Dec, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A comparatively quieter cycle for nfsd this time, but still with two
    larger changes:

    - RPC server scalability improvements from Jeff Layton (using RCU
    instead of a spinlock to find idle threads).

    - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
    Schumaker, enabling fallocate on new clients"

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd4: fix xdr4 count of server in fs_location4
    nfsd4: fix xdr4 inclusion of escaped char
    sunrpc/cache: convert to use string_escape_str()
    sunrpc: only call test_bit once in svc_xprt_received
    fs: nfsd: Fix signedness bug in compare_blob
    sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
    sunrpc: convert to lockless lookup of queued server threads
    sunrpc: fix potential races in pool_stats collection
    sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
    sunrpc: require svc_create callers to pass in meaningful shutdown routine
    sunrpc: have svc_wake_up only deal with pool 0
    sunrpc: convert sp_task_pending flag to use atomic bitops
    sunrpc: move rq_cachetype field to better optimize space
    sunrpc: move rq_splice_ok flag into rq_flags
    sunrpc: move rq_dropme flag into rq_flags
    sunrpc: move rq_usedeferral flag to rq_flags
    sunrpc: move rq_local field to rq_flags
    sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
    nfsd: minor off by one checks in __write_versions()
    sunrpc: release svc_pool_map reference when serv allocation fails
    ...

    Linus Torvalds
     

17 Nov, 2014

1 commit

  • Currently, freezing a filesystem involves calling freeze_super, which locks
    sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it
    hard for gfs2 (and potentially other cluster filesystems) to use the vfs
    freezing code to do freezes on all the cluster nodes.

    In order to communicate that a freeze has been requested, and to make sure
    that only one node is trying to freeze at a time, gfs2 uses a glock
    (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
    this lock before calling freeze_super. This means that two nodes can
    attempt to freeze the filesystem by both calling freeze_super, acquiring
    the sb->s_umount lock, and then attempting to grab the cluster glock
    sd_freeze_gl. Only one will succeed, and the other will be stuck in
    freeze_super, making it impossible to finish freezing the node.

    To solve this problem, this patch adds the freeze_super and thaw_super
    hooks. If a filesystem implements these hooks, they are called instead of
    the vfs freeze_super and thaw_super functions. This means that every
    filesystem that implements these hooks must call the vfs freeze_super and
    thaw_super functions itself within the hook function to make use of the vfs
    freezing code.
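
    The dispatch rule in the last paragraph can be sketched as a userspace
    mock. All names (mock_sb, mock_sops, mock_vfs_freeze_super(),
    mock_cluster_freeze_super(), mock_ioctl_fsfreeze()) are hypothetical,
    and the "cluster lock" is a plain flag standing in for a real cluster
    glock.

```c
#include <assert.h>
#include <errno.h>

struct mock_sb;

/* Optional per-filesystem hook, modelled on the description above. */
struct mock_sops {
	int (*freeze_super)(struct mock_sb *sb);
};

struct mock_sb {
	const struct mock_sops *ops;
	int frozen;
	int cluster_lock_held;
};

/* Stand-in for the generic vfs freeze code. */
static int mock_vfs_freeze_super(struct mock_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	sb->frozen = 1;
	return 0;
}

/* A cluster fs hook: take the cluster lock first, then call the vfs code. */
static int mock_cluster_freeze_super(struct mock_sb *sb)
{
	int err;

	if (sb->cluster_lock_held)
		return -EBUSY; /* another freeze request is in progress */
	sb->cluster_lock_held = 1;
	err = mock_vfs_freeze_super(sb);
	if (err)
		sb->cluster_lock_held = 0;
	return err;
}

/* Caller side: use the hook if the filesystem provides one. */
static int mock_ioctl_fsfreeze(struct mock_sb *sb)
{
	if (sb->ops && sb->ops->freeze_super)
		return sb->ops->freeze_super(sb);
	return mock_vfs_freeze_super(sb);
}
```

    The point of the sketch is the ordering: the hook gets a chance to
    serialise against other nodes before the generic freeze code runs, and
    it remains responsible for calling that generic code itself.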

    Reviewed-by: Jan Kara
    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

08 Nov, 2014

1 commit

  • This function needs to be exported so it can be used by the NFSD module
    when responding to the new ALLOCATE and DEALLOCATE operations in NFS
    v4.2. Christoph Hellwig suggested renaming the function to stay
    consistent with how other vfs functions are named.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     

25 Oct, 2013

1 commit


23 Feb, 2013

1 commit


27 Sep, 2012

1 commit


29 Feb, 2012

1 commit


06 Jan, 2012

1 commit

  • We're doing some odd things there, which already messes up various users
    (see the net/socket.c code that this removes), and it was going to add
    yet more crud to the block layer because of the incorrect error code
    translation.

    ENOIOCTLCMD is not an error return that should be returned to user mode
    from the "ioctl()" system call, but it should *not* be translated as
    EINVAL ("Invalid argument"). It should be translated as ENOTTY
    ("Inappropriate ioctl for device").

    That EINVAL confusion has apparently so permeated some code that the
    block layer actually checks for it, which is sad. We continue to do so
    for now, but add a big comment about how wrong that is, and we should
    remove it entirely eventually. In the meantime, this tries to keep the
    changes localized to just the EINVAL -> ENOTTY fix, and removing code
    that makes it harder to do the right thing.
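
    A minimal sketch of the translation described above, written as a
    userspace function. MOCK_ENOIOCTLCMD and ioctl_result_to_user() are
    names made up for this example; 515 is the kernel-internal value of
    ENOIOCTLCMD, which userspace errno.h does not define.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal "no such ioctl command" code; never meant for userspace. */
#define MOCK_ENOIOCTLCMD 515

/*
 * Translate a handler's return value into what the ioctl() caller sees:
 * the internal code becomes -ENOTTY, everything else passes through.
 */
static int ioctl_result_to_user(int err)
{
	if (err == -MOCK_ENOIOCTLCMD)
		return -ENOTTY;
	return err;
}
```

    A genuine -EINVAL from a handler is left alone, so "bad argument" and
    "unknown ioctl" stay distinguishable to the caller.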

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Mar, 2011

1 commit

  • Move declaration of 'inode' to beginning of the function. Since it
    is referenced directly or indirectly (in case of FIFREEZE/FITHAW/
    FS_IOC_FIEMAP) it's not harmful IMHO. And remove unnecessary casts
    using 'argp' instead.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Al Viro

    Namhyung Kim
     

03 Feb, 2011

1 commit

  • Some filesystems don't deal well with being asked to map less than
    blocksize blocks (GFS2 for example). Since we are always mapping at least
    blocksize sections anyway, just make sure len is at least as big as a
    blocksize so we don't trip up any filesystems. Thanks,

    Signed-off-by: Josef Bacik
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

17 Jan, 2011

1 commit

  • The fi_extents_start field of struct fiemap_extent_info is a
    user pointer but was not marked as __user. This makes sparse
    emit the following warnings:

    CHECK fs/ioctl.c
    fs/ioctl.c:114:26: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:114:26: expected void [noderef] *dst
    fs/ioctl.c:114:26: got struct fiemap_extent *[assigned] dest
    fs/ioctl.c:202:14: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:202:14: expected void const volatile [noderef] *
    fs/ioctl.c:202:14: got struct fiemap_extent *[assigned] fi_extents_start
    fs/ioctl.c:212:27: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:212:27: expected void [noderef] *dst
    fs/ioctl.c:212:27: got char *

    Also add 'ufiemap' variable to eliminate unnecessary casts.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Al Viro

    Namhyung Kim
     

20 Nov, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
    fs: Do not dispatch FITRIM through separate super_operation
    ext4: ext4_fill_super shouldn't return 0 on corruption
    jbd2: fix /proc/fs/jbd2/ when using an external journal
    ext4: missing unlock in ext4_clear_request_list()
    ext4: fix setting random pages PageUptodate

    Linus Torvalds
     
  • There was concern that the FITRIM ioctl is not common enough to be
    included in the core vfs ioctl code. As Christoph Hellwig pointed out,
    there's no real point in dispatching this out to a separate vector
    instead of just through ->ioctl.

    So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
    from super_operation structure.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

18 Nov, 2010

1 commit


28 Oct, 2010

1 commit

  • Adds a filesystem independent ioctl to allow implementation of file
    system batched discard support. It takes an fstrim_range structure as
    an argument. fstrim_range is defined in the include/fs.h and its
    definition is as follows.

    struct fstrim_range {
    start;
    len;
    minlen;
    }

    start - first Byte to trim
    len - number of Bytes to trim from start
    minlen - minimum extent length to trim, free extents shorter than this
    number of Bytes will be ignored. This will be rounded up to fs
    block size.

    It is also possible to specify NULL as an argument. In this case the
    arguments will be set as follows:

    start = 0;
    len = ULLONG_MAX;
    minlen = 0;

    So it will trim the whole file system at one run.

    After the FITRIM is done, the number of actually discarded Bytes is stored
    in fstrim_range.len to give the user better insight on how much storage
    space has been really released for wear-leveling.
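
    The defaults and the minlen rounding described above can be sketched
    in userspace. mock_fstrim_range mirrors the three fields as 64-bit
    unsigned integers (an assumption, since the field types are not shown
    above), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mock mirror of the three fields described above. */
struct mock_fstrim_range {
	uint64_t start;  /* first byte to trim */
	uint64_t len;    /* number of bytes to trim from start */
	uint64_t minlen; /* shorter free extents are ignored */
};

/* Range used when the caller passes NULL: trim the whole filesystem. */
static struct mock_fstrim_range fstrim_whole_fs(void)
{
	struct mock_fstrim_range r = { 0, UINT64_MAX, 0 };

	return r;
}

/*
 * Round minlen up to a multiple of the block size.
 * Values within one block of UINT64_MAX are not handled in this sketch.
 */
static uint64_t fstrim_round_minlen(uint64_t minlen, uint64_t block_size)
{
	uint64_t rem;

	assert(block_size > 0);
	rem = minlen % block_size;
	return rem ? minlen + (block_size - rem) : minlen;
}
```

    UINT64_MAX plays the role of ULLONG_MAX here, which holds on platforms
    where unsigned long long is 64 bits wide.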

    Signed-off-by: Lukas Czerner
    Reviewed-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

14 Aug, 2010

1 commit


22 May, 2010

1 commit

  • Currently the way we do freezing is by passing sb->s_bdev to freeze_bdev and then
    letting it do all the work. But freezing is more of an fs thing, and doesn't
    really have much to do with the bdev at all, all the work gets done with the
    super. In btrfs we do not populate s_bdev, since we can have multiple bdev's
    for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
    This means that freezing a btrfs filesystem fails, which causes us to corrupt
    with things like tux-on-ice which use the fsfreeze mechanism. So instead of
    populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
    freezing stuff into freeze_super and thaw_super. These just take the
    super_block that we're freezing and do the appropriate work. It's basically
    just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to
    use the new super helpers. I've tested this with ext4 and btrfs and verified
    everything continues to work the same as before.

    The only new gotcha is that multiple calls to the fsfreeze ioctl will return EBUSY if
    the fs is already frozen. I thought this was a better solution than adding a
    freeze counter to the super_block, but if everybody hates this idea I'm open to
    suggestions. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

24 Apr, 2010

1 commit

  • This cleans up a few of the complaints of __generic_block_fiemap. I've
    fixed all the typing stuff, used inline functions instead of macros,
    gotten rid of a couple of variables, and made sure the size and block
    requests are all block aligned. It also fixes a problem where sometimes
    FIEMAP_EXTENT_LAST wasn't being set properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

12 Nov, 2009

1 commit

  • Because of an integer overflow on start_blk, various kinds of wrong results
    would be returned by the generic_block_fiemap() handler, such as no
    extents when there is a 4GB+ hole at the beginning of the file, or wrong
    fe_logical when an extent starts after the first 4GB.
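
    The class of bug described above can be reproduced in a few lines of
    userspace C. This is not the kernel code: the two helpers are
    hypothetical and only contrast a block-to-byte conversion done in
    32-bit arithmetic with the same conversion widened to 64 bits first.

```c
#include <assert.h>
#include <stdint.h>

/* Shift done in 32-bit arithmetic: the result wraps modulo 2^32. */
static uint64_t blk_to_logical_narrow(uint32_t blk, unsigned int blkbits)
{
	uint32_t shifted = blk << blkbits;

	return shifted;
}

/* Widen to 64 bits before shifting: offsets past 4GB are preserved. */
static uint64_t blk_to_logical_wide(uint32_t blk, unsigned int blkbits)
{
	return (uint64_t)blk << blkbits;
}
```

    With 4096-byte blocks (blkbits = 12), block 2^20 sits exactly at the
    4GB boundary: the narrow version yields 0 while the wide version
    yields 4294967296, which is the kind of wrong logical offset the
    commit message describes.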

    Signed-off-by: Mike Hommey
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Theodore Ts'o
    Cc: Eric Sandeen
    Cc: Josef Bacik
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Hommey
     

24 Sep, 2009

1 commit

  • If fiemap_check_ranges is passed a large enough value, then it's
    possible that the value would be cast to a signed value for comparison
    against s_maxbytes when we change it to loff_t. Make sure that doesn't
    happen by explicitly casting s_maxbytes to an unsigned value for the
    purposes of comparison.

    Signed-off-by: Jeff Layton
    Cc: Christoph Hellwig
    Cc: Robert Love
    Cc: Al Viro
    Cc: Johannes Weiner
    Cc: Mandeep Singh Baines
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jeff Layton
     

24 Jun, 2009

1 commit


17 Jun, 2009

1 commit


14 May, 2009

1 commit


07 May, 2009

1 commit

  • Fix a problem where the generic block based fiemap stuff would not
    properly set FIEMAP_EXTENT_LAST on the last extent. I've reworked things
    to keep track if we go past the EOF, and mark the last extent properly.
    The problem was reported by and tested by Eric Sandeen.

    Tested-by: Eric Sandeen
    Signed-off-by: Josef Bacik
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

16 Mar, 2009

2 commits

  • Most fasync implementations do something like:

    return fasync_helper(...);

    But fasync_helper() will return a positive value at times - a feature used
    in at least one place. Thus, a number of other drivers do:

    err = fasync_helper(...);
    if (err < 0)
    return err;
    return 0;

    In the interests of consistency and more concise code, it makes sense to
    map positive return values onto zero where ->fasync() is called.

    Cc: Al Viro
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
     
  • Removing the BKL from FASYNC handling ran into the challenge of keeping the
    setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
    the underlying fasync() function. Andi Kleen suggested moving the handling
    of that bit into fasync(); this patch does exactly that. As a result, we
    have a couple of internal API changes: fasync() must now manage the FASYNC
    bit, and it will be called without the BKL held.

    As it happens, every fasync() implementation in the kernel with one
    exception calls fasync_helper(). So, if we make fasync_helper() set the
    FASYNC bit, we can avoid making any changes to the other fasync()
    functions - as long as those functions, themselves, have proper locking.
    Most fasync() implementations do nothing but call fasync_helper() - which
    has its own lock - so they are easily verified as correct. The BKL had
    already been pushed down into the rest.

    The networking code has its own version of fasync_helper(), so that code
    has been augmented with explicit FASYNC bit handling.

    Cc: Al Viro
    Cc: David Miller
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet