12 Jan, 2017

1 commit

  • commit 1bb33a98702d8360947f18a44349df75ba555d5d upstream.

    After various discussions on linux-fsdevel, it has been decided that it
    is not necessary to cap the length of a dedupe request, and that
    correctly-written userspace client programs will be able to absorb the
    change. Therefore, remove the length clamping behavior.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner
    Cc: Christoph Hellwig
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

20 Oct, 2016

6 commits


14 Oct, 2016

1 commit

  • …kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
    ----------------------------------
    \ ^__^
    \ (oo)\_______
    (__)\ )\/\
    ||----w |
    || ||

    Pull XFS support for shared data extents from Dave Chinner:
    "This is the second part of the XFS updates for this merge cycle. This
    pullreq contains the new shared data extents feature for XFS.

    Given the complexity and size of this change I am expecting - like the
    addition of reverse mapping last cycle - that there will be some
    follow-up bug fixes and cleanups around the -rc3 stage for issues that
    I'm sure will show up once the code hits a wider userbase.

    What it is:

    At the most basic level we are simply adding shared data extents to
    XFS - i.e. a single extent on disk can now have multiple owners. To do
    this we have to add new on-disk features to both track the shared
    extents and the number of times they've been shared. This is done by
    the new "refcount" btree that sits in every allocation group. When we
    share or unshare an extent, this tree gets updated.

    Along with this new tree, the reverse mapping tree needs to be updated
    to track each owner or a shared extent. This also needs to be updated
    ever share/unshare operation. These interactions at extent allocation
    and freeing time have complex ordering and recovery constraints, so
    there's a significant amount of new intent-based transaction code to
    ensure that operations are performed atomically from both the runtime
    and integrity/crash recovery perspectives.

    We also need to break sharing when writes hit a shared extent - this
    is where the new copy-on-write implementation comes in. We allocate
    new storage and copy the original data along with the overwrite data
    into the new location. We only do this for data as we don't share
    metadata at all - each inode has it's own metadata that tracks the
    shared data extents, the extents undergoing CoW and it's own private
    extents.

    Of course, being XFS, nothing is simple - we use delayed allocation
    for CoW similar to how we use it for normal writes. ENOSPC is a
    significant issue here - we build on the reservation code added in
    4.8-rc1 with the reverse mapping feature to ensure we don't get
    spurious ENOSPC issues part way through a CoW operation. These
    mechanisms also help minimise fragmentation due to repeated CoW
    operations. To further reduce fragmentation overhead, we've also
    introduced a CoW extent size hint, which indicates how large a region
    we should allocate when we execute a CoW operation.

    With all this functionality in place, we can hook up .copy_file_range,
    .clone_file_range and .dedupe_file_range and we gain all the
    capabilities of reflink and other vfs provided functionality that
    enable manipulation to shared extents. We also added a fallocate mode
    that explicitly unshares a range of a file, which we implemented as an
    explicit CoW of all the shared extents in a file.

    As such, it's a huge chunk of new functionality with new on-disk
    format features and internal infrastructure. It warns at mount time as
    an experimental feature and that it may eat data (as we do with all
    new on-disk features until they stabilise). We have not released
    userspace suport for it yet - userspace support currently requires
    download from Darrick's xfsprogs repo and build from source, so the
    access to this feature is really developer/tester only at this point.
    Initial userspace support will be released at the same time the kernel
    with this code in it is released.

    The new code causes 5-6 new failures with xfstests - these aren't
    serious functional failures but things the output of tests changing
    slightly due to perturbations in layouts, space usage, etc. OTOH,
    we've added 150+ new tests to xfstests that specifically exercise this
    new functionality so it's got far better test coverage than any
    functionality we've previously added to XFS.

    Darrick has done a pretty amazing job getting us to this stage, and
    special mention also needs to go to Christoph (review, testing,
    improvements and bug fixes) and Brian (caught several intricate bugs
    during review) for the effort they've also put in.

    Summary:

    - unshare range (FALLOC_FL_UNSHARE) support for fallocate

    - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
    interface

    - shared extent support for XFS

    - copy-on-write support for shared extents

    - copy_file_range support

    - clone_file_range support (implements reflink)

    - dedupe_file_range support

    - defrag support for reverse mapping enabled filesystems"

    * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
    xfs: convert COW blocks to real blocks before unwritten extent conversion
    xfs: rework refcount cow recovery error handling
    xfs: clear reflink flag if setting realtime flag
    xfs: fix error initialization
    xfs: fix label inaccuracies
    xfs: remove isize check from unshare operation
    xfs: reduce stack usage of _reflink_clear_inode_flag
    xfs: check inode reflink flag before calling reflink functions
    xfs: implement swapext for rmap filesystems
    xfs: refactor swapext code
    xfs: various swapext cleanups
    xfs: recognize the reflink feature bit
    xfs: simulate per-AG reservations being critically low
    xfs: don't mix reflink and DAX mode for now
    xfs: check for invalid inode reflink flags
    xfs: set a default CoW extent size of 32 blocks
    xfs: convert unwritten status of reverse mappings for shared files
    xfs: use interval query for rmap alloc operations on shared files
    xfs: add shared rmap map/unmap/convert log item types
    xfs: increase log reservations for reflink
    ...

    Linus Torvalds
     

11 Oct, 2016

3 commits

  • Pull splice fixups from Al Viro:
    "A couple of fixups for interaction of pipe-backed iov_iter with
    O_DIRECT reads + constification of a couple of primitives in uio.h
    missed by previous rounds.

    Kudos to davej - his fuzzing has caught those bugs"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [btrfs] fix check_direct_IO() for non-iovec iterators
    constify iov_iter_count() and iter_is_iovec()
    fix ITER_PIPE interaction with direct_IO

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     
  • by making sure we call iov_iter_advance() on original
    iov_iter even if direct_IO (done on its copy) has returned 0.
    It's a no-op for old iov_iter flavours and does the right thing
    (== truncation of the stuff we'd allocated, but not filled) in
    ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
    by cleanup in generic_file_read_iter().

    Signed-off-by: Al Viro

    Al Viro
     

10 Oct, 2016

1 commit


08 Oct, 2016

4 commits

  • Al Viro
     
  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton : (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • To support DAX pmd mappings with unmodified applications, filesystems
    need to align an mmap address by the pmd size.

    Call thp_get_unmapped_area() from f_op->get_unmapped_area.

    Note, there is no change in behavior for a non-DAX file.

    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Theodore Ts'o
    Cc: Andreas Dilger
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     

06 Oct, 2016

7 commits

  • Since we don't have a strategy for handling both DAX and reflink,
    for now we'll just prohibit both being set at the same time.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Trim CoW reservations made on behalf of a cowextsz hint if they get too
    old or we run low on quota, so long as we don't have dirty data awaiting
    writeback or directio operations in progress.

    Garbage collection of the cowextsize extents are kept separate from
    prealloc extent reaping because setting the CoW prealloc lifetime to a
    (much) higher value than the regular prealloc extent lifetime has been
    useful for combatting CoW fragmentation on VM hosts where the VMs
    experience bursty write behaviors and we can keep the utilization ratios
    low enough that we don't start to run out of space. IOWs, it benefits
    us to keep the CoW fork reservations around for as long as we can unless
    we run out of blocks or hit inode reclaim.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Unshare all shared extents if the user calls fallocate with the new
    unshare mode flag set, so that we can guarantee that a subsequent
    write will not ENOSPC.

    Signed-off-by: Darrick J. Wong
    [hch: pass inode instead of file to xfs_reflink_dirty_range,
    use iomap infrastructure for copy up]
    Signed-off-by: Christoph Hellwig

    Darrick J. Wong
     
  • Define a VFS function which allows userspace to request that the
    kernel reflink a range of blocks between two files if the ranges'
    contents match. The function fits the new VFS ioctl that standardizes
    the checking for the btrfs EXTENT SAME ioctl.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Define two VFS functions which allow userspace to reflink a range of
    blocks between two files or to reflink one file's contents to another.
    These functions fit the new VFS ioctls that standardize the checking
    for the btrfs CLONE and CLONE RANGE ioctls.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • For O_DIRECT writes to shared blocks, we have to CoW them just like
    we would with buffered writes. For writes that are not block-aligned,
    just bounce them to the page cache.

    For block-aligned writes, however, we can do better than that. Use
    the same mechanisms that we employ for buffered CoW to set up a
    delalloc reservation, allocate all the blocks at once, issue the
    writes against the new blocks and use the same ioend functions to
    remap the blocks after the write. This should be fairly performant.

    Christoph discovered that xfs_reflink_allocate_cow_range may stumble
    over invalid entries in the extent array given that it drops the ilock
    but still expects the index to be stable. Simple fixing it to a new
    lookup for every iteration still isn't correct given that
    xfs_bmapi_allocate will trigger a BUG_ON() if hitting a hole, and
    there is nothing preventing a xfs_bunmapi_cow call removing extents
    once we dropped the ilock either.

    This patch duplicates the inner loop of xfs_bmapi_allocate into a
    helper for xfs_reflink_allocate_cow_range so that it can be done under
    the same ilock critical section as our CoW fork delayed allocation.
    The directio CoW warts will be revisited in a later patch.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Christoph Hellwig

    Darrick J. Wong
     
  • ... and kill the ->splice_read() instances that can be switched to it

    Signed-off-by: Al Viro

    Al Viro
     

03 Oct, 2016

2 commits


22 Sep, 2016

1 commit

  • To avoid clearing of capabilities or security related extended
    attributes too early, inode_change_ok() will need to take dentry instead
    of inode. Propagate dentry down to functions calling inode_change_ok().
    This is rather straightforward except for xfs_set_mode() function which
    does not have dentry easily available. Luckily that function does not
    call inode_change_ok() anyway so we just have to do a little dance with
    function prototypes.

    Acked-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

19 Sep, 2016

2 commits

  • Another users of buffer_heads bytes the dust.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • So far DAX writes inherited the locking from direct I/O writes, but
    the direct I/O model of using shared locks for writes is actually
    wrong for DAX. For direct I/O we're out of any standards and don't
    have to provide the Posix required exclusion between writers, but
    for DAX which gets transparently enable on applications without any
    knowledge of it we can't simply drop the requirement. Even worse
    this only happens for aligned writes and thus doesn't show up for
    many typical use cases.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

17 Aug, 2016

1 commit

  • When we do DAX IO, we try to invalidate the entire page cache held
    on the file. This is incorrect as it will trash the entire mapping
    tree that now tracks dirty state in exceptional entries in the radix
    tree slots.

    What we are trying to do is remove cached pages (e.g from reads
    into holes) that sit in the radix tree over the range we are about
    to write to. Hence we should just limit the invalidation to the
    range we are about to overwrite.

    Reported-by: Jan Kara
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Dave Chinner
     

28 Jul, 2016

1 commit

  • Pull xfs updates from Dave Chinner:
    "The major addition is the new iomap based block mapping
    infrastructure. We've been kicking this about locally for years, but
    there are other filesystems want to use it too (e.g. gfs2). Now it
    is fully working, reviewed and ready for merge and be used by other
    filesystems.

    There are a lot of other fixes and cleanups in the tree, but those are
    XFS internal things and none are of the scale or visibility of the
    iomap changes. See below for details.

    I am likely to send another pull request next week - we're just about
    ready to merge some new functionality (on disk block->owner reverse
    mapping infrastructure), but that's a huge chunk of code (74 files
    changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that
    separate to all the "normal" pull request changes so they don't get
    lost in the noise.

    Summary of changes in this update:
    - generic iomap based IO path infrastructure
    - generic iomap based fiemap implementation
    - xfs iomap based Io path implementation
    - buffer error handling fixes
    - tracking of in flight buffer IO for unmount serialisation
    - direct IO and DAX io path separation and simplification
    - shortform directory format definition changes for wider platform
    compatibility
    - various buffer cache fixes
    - cleanups in preparation for rmap merge
    - error injection cleanups and fixes
    - log item format buffer memory allocation restructuring to prevent
    rare OOM reclaim deadlocks
    - sparse inode chunks are now fully supported"

    * tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits)
    xfs: remove EXPERIMENTAL tag from sparse inode feature
    xfs: bufferhead chains are invalid after end_page_writeback
    xfs: allocate log vector buffers outside CIL context lock
    libxfs: directory node splitting does not have an extra block
    xfs: remove dax code from object file when disabled
    xfs: skip dirty pages in ->releasepage()
    xfs: remove __arch_pack
    xfs: kill xfs_dir2_inou_t
    xfs: kill xfs_dir2_sf_off_t
    xfs: split direct I/O and DAX path
    xfs: direct calls in the direct I/O path
    xfs: stop using generic_file_read_iter for direct I/O
    xfs: split xfs_file_read_iter into buffered and direct I/O helpers
    xfs: remove s_maxbytes enforcement in xfs_file_read_iter
    xfs: kill ioflags
    xfs: don't pass ioflags around in the ioctl path
    xfs: track and serialize in-flight async buffers against unmount
    xfs: exclude never-released buffers from buftarg I/O accounting
    xfs: don't reset b_retries to 0 on every failure
    xfs: remove extraneous buffer flag changes
    ...

    Linus Torvalds
     

27 Jul, 2016

1 commit

  • Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this
    removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
    dax_pmd_fault() respectively, and update all callers.

    The dax_fault() and dax_pmd_fault() wrappers were initially intended to
    capture some filesystem independent functionality around page faults
    (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
    and ctime).

    However, the following commits:

    5726b27b09cc ("ext2: Add locking for DAX faults")
    ea3d7209ca01 ("ext4: fix races between page faults and hole punching")

    added locking to the ext2 and ext4 filesystems after these common
    operations but before __dax_fault() and __dax_pmd_fault() were called.
    This means that these wrappers are no longer used, and are unlikely to
    be used in the future.

    XFS has had locking analogous to what was recently added to ext2 and
    ext4 since DAX support was initially introduced by:

    6b698edeeef0 ("xfs: add DAX file operations support")

    Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Dan Williams
    Cc: Dave Chinner
    Reviewed-by: Jan Kara
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

22 Jul, 2016

2 commits

  • Dave Chinner
     
  • We check IS_DAX(inode) before calling either xfs_file_dax_read or
    xfs_file_dax_write, and this will lead the call being optimized out at
    compile time when CONFIG_FS_DAX is disabled.

    However, the two functions are marked STATIC, so they become global
    symbols when CONFIG_XFS_DEBUG is set, leaving us with two unused global
    functions that call into an undefined function and a broken "allmodconfig"
    build:

    fs/built-in.o: In function `xfs_file_dax_read':
    fs/xfs/xfs_file.c:348: undefined reference to `dax_do_io'
    fs/built-in.o: In function `xfs_file_dax_write':
    fs/xfs/xfs_file.c:758: undefined reference to `dax_do_io'

    Marking the two functions 'static noinline' instead of 'STATIC' will let
    the compiler drop the symbols when there are no callers but avoid the
    implicit inlining.

    Signed-off-by: Arnd Bergmann
    Fixes: 16d4d43595b4 ("xfs: split direct I/O and DAX path")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Arnd Bergmann
     

20 Jul, 2016

7 commits