16 Sep, 2016

2 commits

  • Kirill A Shutemov reports that the kernel doesn't try to cap dest_count
    in any way, and uses the number to allocate kernel memory. This causes
    high order allocation warnings in the kernel log if someone passes in a
    big enough value. We should clamp the allocation at PAGE_SIZE to avoid
    stressing the VM.

    The two existing users of the dedupe ioctl never send more than 120
    requests, so we can safely clamp dest_range at PAGE_SIZE, because with
    4k pages we can handle up to 127 dedupe candidates. Given the max
    extent length of 16MB, we can end up doing 2GB of IO which is plenty.

    [ Note: the "offsetof()" can't overflow, because 'count' is just a
    16-bit integer. That's not obvious in the limited context of the
    patch, so I'm noting it here because it made me go look. - Linus ]

    Reported-by: "Kirill A. Shutemov"
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
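
    The 127-candidate figure in the commit above follows from the size of
    the ioctl argument. Below is a minimal userspace sketch of that
    arithmetic. The struct names are hypothetical mirrors of the kernel's
    struct file_dedupe_range and struct file_dedupe_range_info, and
    SKETCH_PAGE_SIZE assumes 4k pages; it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one per-destination record (assumed 32 bytes). */
struct dedupe_info_sketch {
    int64_t  dest_fd;
    uint64_t dest_offset;
    uint64_t bytes_deduped;
    int32_t  status;
    uint32_t reserved;
};

/* Hypothetical mirror of the request header (assumed 24 bytes). */
struct dedupe_range_sketch {
    uint64_t src_offset;
    uint64_t src_length;
    uint16_t dest_count;
    uint16_t reserved1;
    uint32_t reserved2;
    struct dedupe_info_sketch info[];
};

static_assert(sizeof(struct dedupe_info_sketch) == 32, "record size");

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4k pages */

/* Bytes needed for a request carrying `count` destinations. */
static size_t dedupe_request_size(uint16_t count)
{
    return offsetof(struct dedupe_range_sketch, info) +
           (size_t)count * sizeof(struct dedupe_info_sketch);
}

/* Nonzero if the request fits within one page. */
static int dedupe_request_fits(uint16_t count)
{
    return dedupe_request_size(count) <= SKETCH_PAGE_SIZE;
}
```

    With these sizes, 127 destinations need 24 + 127 * 32 = 4088 bytes and
    fit in a page, while 128 do not. At 16MB per extent, 127 candidates
    come to 2032MB, which the commit rounds to 2GB.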
     
  • All the VFS functions in the dedupe ioctl path return int status, so
    the ioctl handler ought to as well.

    Found by Coverity, CID 1350952.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     

29 Jul, 2016

1 commit


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

13 Jan, 2016

1 commit

  • Pull vfs copy_file_range updates from Al Viro:
    "Several series around copy_file_range/CLONE"

    * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: use new dedupe data function pointer
    vfs: hoist the btrfs deduplication ioctl to the vfs
    vfs: wire up compat ioctl for CLONE/CLONE_RANGE
    cifs: avoid unused variable and label
    nfsd: implement the NFSv4.2 CLONE operation
    nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
    vfs: pull btrfs clone API to vfs layer
    locks: new locks_mandatory_area calling convention
    vfs: Add vfs_copy_file_range() support for pagecache copies
    btrfs: add .copy_file_range file operation
    x86: add sys_copy_file_range to syscall tables
    vfs: add copy_file_range syscall and vfs helper

    Linus Torvalds
     

09 Jan, 2016

1 commit


01 Jan, 2016

1 commit


08 Dec, 2015

1 commit

  • The btrfs clone ioctls are now adopted by other file systems, with NFS
    and CIFS already having support for them, and XFS being under active
    development. To avoid growth of various slightly incompatible
    implementations, add one to the VFS. Note that clones are different from
    file copies in several ways:

    - they are atomic vs other writers
    - they support whole file clones
    - they support 64-bit length clones
    - they do not allow partial success (aka short writes)
    - clones are expected to be a fast metadata operation

    Because of that it would be rather cumbersome to try to piggyback them on
    top of the recent copy_file_range infrastructure. The converse isn't
    true and the copy_file_range system call could try clone file range as
    a first attempt to copy, something that further patches will enable.

    Based on earlier work from Peng Tao.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
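
    The "clone first, copy as a fallback" idea from the commit above can be
    sketched from userspace. This is an illustration only: clone_or_copy()
    is a hypothetical helper, and it assumes Linux headers that define
    FICLONE (the whole-file clone ioctl).

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE, when the installed headers provide it */

/* Try a whole-file clone first, then fall back to a read/write copy.
 * Returns 0 on success, -1 on failure with errno set. */
static int clone_or_copy(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    assert(src_fd != dst_fd);
#ifdef FICLONE
    /* A clone either fully succeeds or fails: no short writes to handle. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 0;
#endif
    /* Fallback byte copy, which can make partial progress before failing. */
    if (lseek(src_fd, 0, SEEK_SET) < 0 || lseek(dst_fd, 0, SEEK_SET) < 0)
        return -1;
    while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
        char *p = buf;

        while (n > 0) {
            ssize_t w = write(dst_fd, p, (size_t)n);

            if (w < 0)
                return -1;
            p += w;
            n -= w;
        }
    }
    return n < 0 ? -1 : 0;
}
```

    On a filesystem without clone support the ioctl fails and the helper
    falls through to the byte copy, so callers see the same result either
    way.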
     

11 Feb, 2015

1 commit

  • __generic_block_fiemap may spin for a very long time on large sparse files.

    Without this patch an unprivileged user may abuse system resources simply
    by spawning a vast number of unkillable busyloops (works on ext2/ext3):

    truncate --size 1T test
    for ((i=0;i /dev/null &
    done

    Signed-off-by: Dmitry Monakhov
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     

17 Dec, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A comparatively quieter cycle for nfsd this time, but still with two
    larger changes:

    - RPC server scalability improvements from Jeff Layton (using RCU
    instead of a spinlock to find idle threads).

    - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
    Schumaker, enabling fallocate on new clients"

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd4: fix xdr4 count of server in fs_location4
    nfsd4: fix xdr4 inclusion of escaped char
    sunrpc/cache: convert to use string_escape_str()
    sunrpc: only call test_bit once in svc_xprt_received
    fs: nfsd: Fix signedness bug in compare_blob
    sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
    sunrpc: convert to lockless lookup of queued server threads
    sunrpc: fix potential races in pool_stats collection
    sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
    sunrpc: require svc_create callers to pass in meaningful shutdown routine
    sunrpc: have svc_wake_up only deal with pool 0
    sunrpc: convert sp_task_pending flag to use atomic bitops
    sunrpc: move rq_cachetype field to better optimize space
    sunrpc: move rq_splice_ok flag into rq_flags
    sunrpc: move rq_dropme flag into rq_flags
    sunrpc: move rq_usedeferral flag to rq_flags
    sunrpc: move rq_local field to rq_flags
    sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
    nfsd: minor off by one checks in __write_versions()
    sunrpc: release svc_pool_map reference when serv allocation fails
    ...

    Linus Torvalds
     

17 Nov, 2014

1 commit

  • Currently, freezing a filesystem involves calling freeze_super, which locks
    sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it
    hard for gfs2 (and potentially other cluster filesystems) to use the vfs
    freezing code to do freezes on all the cluster nodes.

    In order to communicate that a freeze has been requested, and to make sure
    that only one node is trying to freeze at a time, gfs2 uses a glock
    (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
    this lock before calling freeze_super. This means that two nodes can
    attempt to freeze the filesystem by both calling freeze_super, acquiring
    the sb->s_umount lock, and then attempting to grab the cluster glock
    sd_freeze_gl. Only one will succeed, and the other will be stuck in
    freeze_super, making it impossible to finish freezing the node.

    To solve this problem, this patch adds the freeze_super and thaw_super
    hooks. If a filesystem implements these hooks, they are called instead of
    the vfs freeze_super and thaw_super functions. This means that every
    filesystem that implements these hooks must call the vfs freeze_super and
    thaw_super functions itself within the hook function to make use of the vfs
    freezing code.

    Reviewed-by: Jan Kara
    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
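
    The dispatch rule in the commit above (use the filesystem's hook when
    it exists, otherwise the generic code) can be modelled outside the
    kernel. In this sketch every type and function name is a hypothetical
    stand-in, and the "cluster lock" is just a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sb_sketch;

struct sb_ops_sketch {
    int (*freeze_super)(struct sb_sketch *sb);  /* optional hook */
};

struct sb_sketch {
    const struct sb_ops_sketch *ops;
    int cluster_lock_held;
    int frozen;
};

/* Stand-in for the generic VFS freeze_super(). */
static int vfs_freeze_sketch(struct sb_sketch *sb)
{
    if (sb->frozen)
        return -EBUSY;
    sb->frozen = 1;
    return 0;
}

/* A cluster filesystem's hook: take the cluster-wide lock first, then
 * call the generic code itself. */
static int cluster_freeze_hook(struct sb_sketch *sb)
{
    sb->cluster_lock_held = 1;  /* stand-in for acquiring sd_freeze_gl */
    return vfs_freeze_sketch(sb);
}

/* Prefer the filesystem's hook when one is provided. */
static int ioctl_fsfreeze_sketch(struct sb_sketch *sb)
{
    assert(sb != NULL && sb->ops != NULL);
    if (sb->ops->freeze_super)
        return sb->ops->freeze_super(sb);
    return vfs_freeze_sketch(sb);
}
```

    The point of the ordering in cluster_freeze_hook() is that the
    cluster-wide lock is taken before the generic freeze path runs, which
    is what the missing hook previously made impossible.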
     

08 Nov, 2014

1 commit

  • This function needs to be exported so it can be used by the NFSD module
    when responding to the new ALLOCATE and DEALLOCATE operations in NFS
    v4.2. Christoph Hellwig suggested renaming the function to stay
    consistent with how other vfs functions are named.

    Signed-off-by: Anna Schumaker
    Signed-off-by: J. Bruce Fields

    Anna Schumaker
     

25 Oct, 2013

1 commit


23 Feb, 2013

1 commit


27 Sep, 2012

1 commit


29 Feb, 2012

1 commit


06 Jan, 2012

1 commit

  • We're doing some odd things there, which already messes up various users
    (see the net/socket.c code that this removes), and it was going to add
    yet more crud to the block layer because of the incorrect error code
    translation.

    ENOIOCTLCMD is not an error return that should be returned to user mode
    from the "ioctl()" system call, but it should *not* be translated as
    EINVAL ("Invalid argument"). It should be translated as ENOTTY
    ("Inappropriate ioctl for device").

    That EINVAL confusion has apparently so permeated some code that the
    block layer actually checks for it, which is sad. We continue to do so
    for now, but add a big comment about how wrong that is, and we should
    remove it entirely eventually. In the meantime, this tries to keep the
    changes localized to just the EINVAL -> ENOTTY fix, and removing code
    that makes it harder to do the right thing.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
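
    The translation described in the commit above amounts to a one-line
    mapping at the boundary to user mode. A minimal sketch, assuming the
    kernel-internal value 515 for ENOIOCTLCMD; the macro and function
    names here are hypothetical:

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal code that must never reach user mode. */
#define ENOIOCTLCMD_SKETCH 515

static_assert(ENOIOCTLCMD_SKETCH >= 512, "outside the userspace errno range");

/* Map "no such ioctl command" to ENOTTY, not EINVAL; pass the rest through. */
static int ioctl_ret_to_user(int err)
{
    return err == -ENOIOCTLCMD_SKETCH ? -ENOTTY : err;
}
```

    Other return values, including a genuine -EINVAL from a handler, are
    left untouched, so callers can tell "bad argument" from "this device
    has no such ioctl".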
     

21 Mar, 2011

1 commit

  • Move declaration of 'inode' to beginning of the function. Since it
    is referenced directly or indirectly (in case of FIFREEZE/FITHAW/
    FS_IOC_FIEMAP) it's not harmful IMHO. And remove unnecessary casts
    using 'argp' instead.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Al Viro

    Namhyung Kim
     

03 Feb, 2011

1 commit

  • Some filesystems don't deal well with being asked to map less than
    blocksize blocks (GFS2 for example). Since we are always mapping at least
    blocksize sections anyway, just make sure len is at least as big as a
    blocksize so we don't trip up any filesystems. Thanks,

    Signed-off-by: Josef Bacik
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
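
    The clamp described in the commit above can be sketched as follows.
    The function name is hypothetical and blkbits stands for log2 of the
    filesystem block size; the kernel's own helpers differ.

```c
#include <assert.h>
#include <stdint.h>

/* Make sure a mapping request covers at least one filesystem block. */
static uint64_t fiemap_min_len_sketch(uint64_t len, unsigned int blkbits)
{
    assert(blkbits < 64);
    if ((len >> blkbits) == 0)          /* less than one block requested */
        len = UINT64_C(1) << blkbits;
    return len;
}
```

    With 4k blocks (blkbits = 12), a 100-byte request becomes 4096 bytes,
    while requests of one block or more are returned unchanged.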
     

17 Jan, 2011

1 commit

  • The fi_extents_start field of struct fiemap_extent_info is a
    user pointer but was not marked as __user. This makes sparse
    emit following warnings:

    CHECK fs/ioctl.c
    fs/ioctl.c:114:26: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:114:26: expected void [noderef] *dst
    fs/ioctl.c:114:26: got struct fiemap_extent *[assigned] dest
    fs/ioctl.c:202:14: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:202:14: expected void const volatile [noderef] *
    fs/ioctl.c:202:14: got struct fiemap_extent *[assigned] fi_extents_start
    fs/ioctl.c:212:27: warning: incorrect type in argument 1 (different address spaces)
    fs/ioctl.c:212:27: expected void [noderef] *dst
    fs/ioctl.c:212:27: got char *

    Also add 'ufiemap' variable to eliminate unnecessary casts.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Al Viro

    Namhyung Kim
     

20 Nov, 2010

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
    fs: Do not dispatch FITRIM through separate super_operation
    ext4: ext4_fill_super shouldn't return 0 on corruption
    jbd2: fix /proc/fs/jbd2/ when using an external journal
    ext4: missing unlock in ext4_clear_request_list()
    ext4: fix setting random pages PageUptodate

    Linus Torvalds
     
  • There was concern that FITRIM ioctl is not common enough to be included
    in core vfs ioctl, as Christoph Hellwig pointed out there's no real point
    in dispatching this out to a separate vector instead of just through
    ->ioctl.

    So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
    from super_operation structure.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

18 Nov, 2010

1 commit


28 Oct, 2010

1 commit

  • Adds a filesystem-independent ioctl to allow implementation of file
    system batched discard support. It takes an fstrim_range structure as
    an argument. fstrim_range is defined in include/fs.h and its
    definition is as follows.

    struct fstrim_range {
    __u64 start;
    __u64 len;
    __u64 minlen;
    };

    start - first Byte to trim
    len - number of Bytes to trim from start
    minlen - minimum extent length to trim, free extents shorter than this
    number of Bytes will be ignored. This will be rounded up to fs
    block size.

    It is also possible to specify NULL as an argument. In this case the
    fields are set as follows:

    start = 0;
    len = ULLONG_MAX;
    minlen = 0;

    So it will trim the whole file system at one run.

    After the FITRIM is done, the number of actually discarded Bytes is stored
    in fstrim_range.len to give the user better insight on how much storage
    space has been really released for wear-leveling.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
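
    From userspace, a whole-filesystem trim with the defaults listed in the
    commit above might look like the sketch below. It passes an explicit
    range rather than NULL, and whole_fs_range() and trim_whole_fs() are
    hypothetical helper names. The call is expected to need appropriate
    privileges and a filesystem that implements FITRIM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fstrim_range, FITRIM */

/* Range covering the whole filesystem, with no minimum extent length. */
static struct fstrim_range whole_fs_range(void)
{
    struct fstrim_range r;

    r.start = 0;
    r.len = UINT64_MAX;
    r.minlen = 0;
    return r;
}

/* Returns 0 and stores the discarded byte count in *trimmed,
 * or -1 with errno set. */
static int trim_whole_fs(int fd, uint64_t *trimmed)
{
    struct fstrim_range r = whole_fs_range();

    assert(trimmed != NULL);
    if (ioctl(fd, FITRIM, &r) < 0)
        return -1;
    *trimmed = r.len;   /* len is rewritten with the bytes actually discarded */
    return 0;
}
```

    The descriptor would normally refer to the mount point or any file on
    the target filesystem.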
     

14 Aug, 2010

1 commit


22 May, 2010

1 commit

  • Currently the way we do freezing is by passing sb->s_bdev to freeze_bdev and then
    letting it do all the work. But freezing is more of an fs thing, and doesn't
    really have much to do with the bdev at all, all the work gets done with the
    super. In btrfs we do not populate s_bdev, since we can have multiple bdev's
    for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
    This means that freezing a btrfs filesystem fails, which causes us to corrupt
    with things like tux-on-ice which use the fsfreeze mechanism. So instead of
    populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
    freezing stuff into freeze_super and thaw_super. These just take the
    super_block that we're freezing and do the appropriate work. It's basically
    just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to
    use the new super helpers. I've tested this with ext4 and btrfs and verified
    everything continues to work the same as before.

    The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if
    the fs is already frozen. I thought this was a better solution than adding a
    freeze counter to the super_block, but if everybody hates this idea I'm open to
    suggestions. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

24 Apr, 2010

1 commit

  • This cleans up a few of the complaints of __generic_block_fiemap. I've
    fixed all the typing stuff, used inline functions instead of macros,
    gotten rid of a couple of variables, and made sure the size and block
    requests are all block aligned. It also fixes a problem where sometimes
    FIEMAP_EXTENT_LAST wasn't being set properly.

    Signed-off-by: Josef Bacik
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

12 Nov, 2009

1 commit

  • Because of an integer overflow on start_blk, various kinds of wrong results
    would be returned by the generic_block_fiemap() handler, such as no
    extents when there is a 4GB+ hole at the beginning of the file, or wrong
    fe_logical when an extent starts after the first 4GB.

    Signed-off-by: Mike Hommey
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Theodore Ts'o
    Cc: Eric Sandeen
    Cc: Josef Bacik
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Hommey
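
    The class of bug fixed by the commit above is easy to reproduce in
    isolation: converting a block number back to a byte offset in 32-bit
    arithmetic loses everything at or above 4GB. The sketch below is an
    illustration with hypothetical function names, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the shift happens in 32 bits, so high bits are lost. */
static uint64_t blk_to_logical_narrow(uint32_t blk, unsigned int blkbits)
{
    assert(blkbits < 32);
    return (uint32_t)(blk << blkbits);
}

/* Fixed variant: widen to 64 bits first, then shift. */
static uint64_t blk_to_logical_wide(uint32_t blk, unsigned int blkbits)
{
    assert(blkbits < 32);
    return (uint64_t)blk << blkbits;
}
```

    With 4k blocks, block number 1 << 20 is the 4GB mark: the narrow
    variant returns 0, while the wide one returns 4294967296.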
     

24 Sep, 2009

1 commit

  • If fiemap_check_ranges is passed a large enough value, then it's
    possible that the value would be cast to a signed value for comparison
    against s_maxbytes when we change it to loff_t. Make sure that doesn't
    happen by explicitly casting s_maxbytes to an unsigned value for the
    purposes of comparison.

    Signed-off-by: Jeff Layton
    Cc: Christoph Hellwig
    Cc: Robert Love
    Cc: Al Viro
    Cc: Johannes Weiner
    Cc: Mandeep Singh Baines
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jeff Layton
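
    A range check in the spirit of the commit above can be sketched as
    follows: the signed limit is converted to an unsigned value once, and
    every comparison is then done on unsigned quantities. The function name
    and the exact error choices are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Validate a (start, len) request against a signed size limit and shrink
 * it to what the filesystem can address. Returns 0 or a negative errno. */
static int fiemap_check_ranges_sketch(int64_t s_maxbytes, uint64_t start,
                                      uint64_t len, uint64_t *new_len)
{
    uint64_t maxbytes = (uint64_t)s_maxbytes;   /* compare as unsigned */

    assert(s_maxbytes >= 0 && new_len != NULL);
    *new_len = len;
    if (len == 0)
        return -EINVAL;
    if (start > maxbytes)
        return -EFBIG;
    if (len > maxbytes || maxbytes - len < start)
        *new_len = maxbytes - start;
    return 0;
}
```

    Writing the second test as maxbytes - len < start, rather than
    start + len > maxbytes, avoids wrapping when the caller passes a very
    large length.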
     

24 Jun, 2009

1 commit


17 Jun, 2009

1 commit


14 May, 2009

1 commit


07 May, 2009

1 commit

  • Fix a problem where the generic block based fiemap stuff would not
    properly set FIEMAP_EXTENT_LAST on the last extent. I've reworked things
    to keep track if we go past the EOF, and mark the last extent properly.
    The problem was reported by and tested by Eric Sandeen.

    Tested-by: Eric Sandeen
    Signed-off-by: Josef Bacik
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

16 Mar, 2009

3 commits

  • Most fasync implementations do something like:

    return fasync_helper(...);

    But fasync_helper() will return a positive value at times - a feature used
    in at least one place. Thus, a number of other drivers do:

    err = fasync_helper(...);
    if (err < 0)
    return err;
    return 0;

    In the interests of consistency and more concise code, it makes sense to
    map positive return values onto zero where ->fasync() is called.

    Cc: Al Viro
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
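
    Mapping positive return values onto zero at the call site, as the
    commit above proposes, can be sketched with a simplified callback
    type. The real ->fasync() signature takes more arguments; fasync_fn
    and both sample callbacks are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*fasync_fn)(int fd, int on);

/* Stand-in for a driver that returns its helper's result directly;
 * that result may be positive. */
static int passthrough_fasync(int fd, int on)
{
    (void)fd;
    return on ? 1 : 0;
}

/* Stand-in for a driver that fails. */
static int failing_fasync(int fd, int on)
{
    (void)fd;
    (void)on;
    return -EINVAL;
}

/* Call site: errors pass through, any positive value becomes 0. */
static int call_fasync_sketch(fasync_fn fasync, int fd, int on)
{
    int ret;

    assert(fasync != NULL);
    ret = fasync(fd, on);
    return ret > 0 ? 0 : ret;
}
```

    With the mapping done once by the caller, drivers can return the
    helper's result unchanged instead of each repeating the "if (err < 0)"
    boilerplate.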
     
  • Removing the BKL from FASYNC handling ran into the challenge of keeping the
    setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
    the underlying fasync() function. Andi Kleen suggested moving the handling
    of that bit into fasync(); this patch does exactly that. As a result, we
    have a couple of internal API changes: fasync() must now manage the FASYNC
    bit, and it will be called without the BKL held.

    As it happens, every fasync() implementation in the kernel with one
    exception calls fasync_helper(). So, if we make fasync_helper() set the
    FASYNC bit, we can avoid making any changes to the other fasync()
    functions - as long as those functions, themselves, have proper locking.
    Most fasync() implementations do nothing but call fasync_helper() - which
    has its own lock - so they are easily verified as correct. The BKL had
    already been pushed down into the rest.

    The networking code has its own version of fasync_helper(), so that code
    has been augmented with explicit FASYNC bit handling.

    Cc: Al Viro
    Cc: David Miller
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
     
  • Traditionally, changes to struct file->f_flags have been done under BKL
    protection, or with no protection at all. This patch causes all f_flags
    changes after file open/creation time to be done under protection of
    f_lock. This allows the removal of some BKL usage and fixes a number of
    longstanding (if microscopic) races.

    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
     

14 Jan, 2009

1 commit


10 Jan, 2009

1 commit

  • The ioctls for the generic freeze feature are below.
    o Freeze the filesystem
    int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

    o Unfreeze the filesystem
    int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1
    Error number: If the filesystem has already been unfrozen,
    errno is set to EINVAL.

    [akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
    Signed-off-by: Takashi Sato
    Signed-off-by: Masayuki Hamaguchi
    Cc: Christoph Hellwig
    Cc: Dave Kleikamp
    Cc: Dave Chinner
    Cc: Alasdair G Kergon
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takashi Sato
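
    A userspace caller of the two ioctls described above might wrap them
    as in this sketch. The helper names are hypothetical, the calls are
    expected to need administrative privileges, and the code assumes
    FIFREEZE and FITHAW come from <linux/fs.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */

/* Both return 0 on success, -1 with errno set on failure. */
static int fs_freeze(int fd)
{
    return ioctl(fd, FIFREEZE, 0);
}

static int fs_thaw(int fd)
{
    return ioctl(fd, FITHAW, 0);
}

/* Run fn(arg) while the filesystem is frozen, then thaw it again.
 * Returns fn's result, or -1 if freezing or thawing failed. */
static int with_frozen_fs(int fd, int (*fn)(void *), void *arg)
{
    int ret;

    assert(fn != NULL);
    if (fs_freeze(fd) < 0)
        return -1;
    ret = fn(arg);
    if (fs_thaw(fd) < 0)
        return -1;
    return ret;
}
```

    Thawing in the same helper that froze keeps the filesystem from being
    left frozen when the callback returns an error.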
     

05 Jan, 2009

1 commit

  • This patch implements the FIEMAP ioctl for GFS2. We can use the generic
    code (aside from a lock order issue, solved as per Ted Tso's suggestion)
    for which I've introduced a new variant of the generic function. We also
    have one exception to deal with, namely stuffed files, so we do that
    "by hand", setting all the required flags.

    This has been tested with a modified version of Eric's test program (I
    could only find an old version), and appears to work correctly.

    This patch does not currently support FIEMAP of xattrs, but the plan is to add
    that feature at some future point.

    Signed-off-by: Steven Whitehouse
    Cc: Theodore Tso
    Cc: Eric Sandeen

    Steven Whitehouse
     

06 Dec, 2008

1 commit

  • Changeset a238b790d5f99c7832f9b73ac8847025815b85f7 (Call fasync()
    functions without the BKL) introduced a race which could leave
    file->f_flags in a state inconsistent with what the underlying
    driver/filesystem believes. Revert that change, and also fix the same
    races in ioctl_fioasync() and ioctl_fionbio().

    This is a minimal, short-term fix; the real fix will not involve the
    BKL.

    Reported-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Jonathan Corbet
    Signed-off-by: Linus Torvalds

    Jonathan Corbet