28 Feb, 2017

1 commit

  • Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
    branch.

    This patch also fixes multiple checkpatch warnings: WARNING: Prefer
    'unsigned int' to bare use of 'unsigned'

    Thanks to Andrew Morton for suggesting a more appropriate function instead
    of a macro.

    [geliangtang@gmail.com: truncate: use i_blocksize()]
    Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
    Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Signed-off-by: Geliang Tang
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
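The helper this change introduces replaces an open-coded shift at each call site. The sketch below mirrors its shape using a minimal mock of `struct inode` (only `i_blkbits`, assumed here to be `unsigned int`). `block_start()` is a hypothetical caller added for illustration and is not kernel code.

```c
#include <assert.h>

/* Minimal mock of the kernel's struct inode: only the field used here. */
struct inode {
	unsigned int i_blkbits; /* log2 of the block size in bytes */
};

/* Same shape as the helper the change introduces: the block size in bytes,
 * instead of open-coding 1 << inode->i_blkbits at each call site. */
static inline unsigned int i_blocksize(const struct inode *node)
{
	return (1 << node->i_blkbits);
}

/* Hypothetical caller: round a byte position down to its block boundary. */
static unsigned long long block_start(const struct inode *inode,
				      unsigned long long pos)
{
	unsigned int bsize = i_blocksize(inode);

	assert(bsize != 0 && (bsize & (bsize - 1)) == 0); /* power of two */
	return pos & ~((unsigned long long)bsize - 1);
}
```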
     

25 Feb, 2017

3 commits

  • Since the introduction of FAULT_FLAG_SIZE to the vm_fault flags, it has
    been somewhat painful to get the flags set and removed in the correct
    locations. More than one kernel oops was introduced due to the difficulty
    of getting the placement correct.

    Remove the flag values and introduce an input parameter to huge_fault
    that indicates the size of the page entry. This makes the code easier
    to trace and should avoid the issues we see with the fault flags where
    removal of the flag was necessary in the fallback paths.

    Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Tested-by: Dan Williams
    Reviewed-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
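A minimal sketch of the calling convention described above, assuming the size argument is shaped like the `enum page_entry_size` (`PE_SIZE_PTE`, `PE_SIZE_PMD`, `PE_SIZE_PUD`) found in 4.11-era kernels. The structs, the `VM_FAULT_FALLBACK` value and both functions are simplified stand-ins; the real fallback logic lives in the core fault path and is more involved.

```c
#include <assert.h>

/* Size of the page-table entry being faulted in, passed as an argument. */
enum page_entry_size {
	PE_SIZE_PTE = 0,
	PE_SIZE_PMD,
	PE_SIZE_PUD,
};

#define VM_FAULT_FALLBACK 0x0800 /* illustrative value for this sketch */

/* Mock: there are no size bits in flags to set or clear. */
struct vm_fault {
	unsigned long address;
	unsigned int flags;
};

/* Hypothetical handler that can map PTEs and PMDs but not PUDs. */
static int demo_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size)
{
	(void)vmf;
	switch (pe_size) {
	case PE_SIZE_PTE:
	case PE_SIZE_PMD:
		return 0;                 /* handled at this size */
	case PE_SIZE_PUD:
	default:
		return VM_FAULT_FALLBACK; /* ask the caller for a smaller size */
	}
}

/* Simplified caller: on fallback, retry with the next smaller size. Since
 * the size is an argument, nothing in vmf has to be cleared on this path.
 * Returns the size that finally handled the fault. */
static enum page_entry_size fault_with_fallback(struct vm_fault *vmf,
						enum page_entry_size start)
{
	int size;

	for (size = start; size > PE_SIZE_PTE; size--) {
		if (!(demo_huge_fault(vmf, (enum page_entry_size)size) &
		      VM_FAULT_FALLBACK))
			return (enum page_entry_size)size;
	}
	demo_huge_fault(vmf, PE_SIZE_PTE); /* this mock never falls back for PTEs */
	return PE_SIZE_PTE;
}
```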
     
  • Patch series "1G transparent hugepage support for device dax", v2.

    The following series implements support for 1G transparent hugepages on
    x86 for device DAX. The bulk of the code was written by Matthew Wilcox a
    while back to support transparent 1G hugepages for fs DAX. I have
    forward ported the relevant bits to 4.10-rc. The current submission has
    only the necessary code to support device DAX.

    Comments from Dan Williams: The motivation and intended users of this
    functionality mirror the motivation and users of 1GB page support in
    hugetlbfs. Given the expected capacities of persistent memory devices, an
    in-memory database may want to reduce TLB pressure beyond what it can
    already achieve with 2MB mappings of a device-dax file. We have
    customer feedback to that effect, as Willy mentioned in his previous
    version of these patches [1].

    [1]: https://lkml.org/lkml/2016/1/31/52

    Comments from Nilesh @ Oracle:

    There are applications that use a process model. If you assume
    10,000 processes attempting to mmap all 6TB of memory available on a
    server, we are looking at the following:

    processes : 10,000
    memory : 6TB
    pte @ 4K page size: 8 bytes per 4K of memory * #processes = 6TB / 4K * 8 * 10,000 = 1.5G * 80,000 = 120,000GB
    pmd @ 2M page size: 120,000GB / 512 = ~240GB
    pud @ 1G page size: 240GB / 512 = ~480MB

    As you can see, even with 2M pages this system will use an exorbitant
    amount of DRAM to hold the page tables, but 1G pages finally bring it
    down to a reasonable level. Memory sizes will keep growing, so this
    number will keep increasing.

    An argument can be made to convert the applications from a process model
    to a thread model, but in the real world that may not always be practical.
    Hopefully this helps explain the use case where this is valuable.

    This patch (of 3):

    In preparation for adding the ability to handle PUD pages, convert
    vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The
    vm_fault structure is extended to include a union of the different page
    table pointers that may be needed, and three flag bits are reserved to
    indicate which type of pointer is in the union.

    [ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
    Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
    [dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
    Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Signed-off-by: Ross Zwisler
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
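The page-table estimate in the series description can be checked with a few lines of C. This sketch counts only leaf entries at 8 bytes each, as the description does, and ignores upper-level tables. Computed exactly in binary units it gives 120,000 GiB for 4K pages, 240,000 MiB (about 234 GiB) for 2M pages and 480,000 KiB (about 469 MiB) for 1G pages, so the ~240GB and ~480MB figures are rounded. Each step is a factor of 512 because 2M / 4K = 1G / 2M = 512.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

/* Leaf page-table bytes for `nproc` processes that each map `mem` bytes
 * using pages of `page` bytes, with one 8-byte entry per page. */
static uint64_t leaf_table_bytes(uint64_t mem, uint64_t page, uint64_t nproc)
{
	const uint64_t entry_size = 8;

	assert(page != 0 && mem % page == 0);
	return mem / page * entry_size * nproc;
}
```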
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
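A sketch of the signature change, using mock types rather than kernel headers: the handler receives only `vmf` and reads the VMA from `vmf->vma`. `demo_fault()`, `demo_vm_ops` and the `VM_FAULT_SIGBUS` value used here are illustrative assumptions.

```c
#include <assert.h>

/* Mock types: the point is only that vmf already carries the vma. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

struct vm_fault {
	struct vm_area_struct *vma; /* VMA the fault happened in */
	unsigned long address;      /* faulting virtual address */
};

#define VM_FAULT_SIGBUS 0x0002 /* illustrative value for this sketch */

/* Before: int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 * After:  int (*fault)(struct vm_fault *vmf);
 * A handler that needs the VMA reads it from vmf->vma. */
static int demo_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	if (vmf->address < vma->vm_start || vmf->address >= vma->vm_end)
		return VM_FAULT_SIGBUS;
	return 0;
}

struct vm_operations_struct {
	int (*fault)(struct vm_fault *vmf);
};

static const struct vm_operations_struct demo_vm_ops = {
	.fault = demo_fault,
};
```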
     

23 Feb, 2017

3 commits

  • Merge updates from Andrew Morton:
    "142 patches:

    - DAX updates

    - various misc bits

    - OCFS2 updates

    - most of MM"

    * emailed patches from Andrew Morton : (142 commits)
    mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
    mm: fix stray kernel-doc notation
    zram: remove obsolete sysfs attrs
    mm/memblock.c: remove unnecessary log and clean up
    oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
    mm: drop unused argument of zap_page_range()
    mm: drop zap_details::check_swap_entries
    mm: drop zap_details::ignore_dirty
    mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
    mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
    mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
    mm: consolidate GFP_NOFAIL checks in the allocator slowpath
    lib/show_mem.c: teach show_mem to work with the given nodemask
    arch, mm: remove arch specific show_mem
    mm, page_alloc: warn_alloc print nodemask
    mm, page_alloc: do not report all nodes in show_mem
    Revert "mm: bail out in shrink_inactive_list()"
    mm, vmscan: consider eligible zones in get_scan_count
    mm, vmscan: cleanup lru size claculations
    mm, vmscan: do not count freed pages as PGDEACTIVATE
    ...

    Linus Torvalds
     
  • pmd_fault() and related functions really only need the vmf parameter since
    the additional parameters are all included in the vmf struct. Remove the
    additional parameter and simplify pmd_fault() and friends.

    Link: http://lkml.kernel.org/r/1484085142-2297-8-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Dave Jiang
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Dave Jiang
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Instead of passing in multiple parameters in the pmd_fault() handler,
    a vmf can be passed in just like a fault() handler. This will simplify
    code and remove the need for the actual pmd fault handlers to allocate a
    vmf. Related functions are also modified to do the same.

    [dave.jiang@intel.com: fix issue with xfs_tests stall when DAX option is off]
    Link: http://lkml.kernel.org/r/148469861071.195597.3619476895250028518.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/1484085142-2297-7-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Dave Jiang
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

07 Feb, 2017

2 commits

  • Instead of preallocating all the required COW blocks in the high-level
    write code, do it inside the iomap code, like we do for all other I/O.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • We currently fall back from direct to buffered writes if we detect a
    remaining shared extent in the iomap_begin callback. But by the time
    iomap_begin is called for the potentially unaligned end block, we might
    have already written most of the data to disk, which we would now write
    again using buffered I/O. To avoid this, reject all writes to reflinked
    files before starting I/O so that we are guaranteed to write the data
    only once.

    The alternative would be to unshare the unaligned start and/or end block
    before doing the I/O. I think that's doable, and it will actually be
    required to support reflinks on DAX file systems. But it will take a
    little more time, and I'd rather get rid of the double write ASAP.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
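What matters in this change is where the decision is made: before any data moves, not part way through the I/O. A hypothetical model of that ordering follows; the names and the sentinel return value are made up for illustration and are not the XFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FALLBACK_TO_BUFFERED (-1) /* sentinel for this sketch only */

struct demo_file {
	bool reflinked;        /* file may contain shared extents */
	size_t direct_bytes;   /* bytes written via direct I/O */
	size_t buffered_bytes; /* bytes written via the page cache */
};

/* Direct write path: refuse up front instead of failing mid-write. */
static long demo_dio_write(struct demo_file *f, size_t len)
{
	assert(len > 0);
	if (f->reflinked)
		return FALLBACK_TO_BUFFERED; /* nothing has been written yet */
	f->direct_bytes += len;
	return (long)len;
}

/* Caller: on rejection, the whole write goes through the buffered path,
 * so every byte is written exactly once. */
static long demo_write(struct demo_file *f, size_t len)
{
	long ret = demo_dio_write(f, len);

	if (ret == FALLBACK_TO_BUFFERED) {
		f->buffered_bytes += len;
		ret = (long)len;
	}
	return ret;
}
```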
     

03 Feb, 2017

1 commit

  • When we open a directory, we try to readahead block 0 of the directory
    on the assumption that we're going to need it soon. If the bmbt is
    corrupt, the directory will never be usable and the readahead fails
    immediately, so we might as well prevent the directory from being opened
    at all. This prevents a subsequent read or modify operation from
    hitting it and taking the fs offline.

    NOTE: We're only checking for early failures in the block mapping, not
    the readahead directory block itself.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

31 Jan, 2017

1 commit

  • The xfs_eofblocks.eof_scan_owner field is an internal field to
    facilitate invoking eofb scans from the kernel while under the iolock.
    This is necessary because the eofb scan acquires the iolock of each
    inode. Synchronous scans are invoked on certain buffered write failures
    while under iolock. In such cases, the scan owner indicates that the
    context for the scan already owns the particular iolock and prevents a
    double lock deadlock.

    eofblocks scans while under iolock are still livelock prone in the event
    of multiple parallel scans, however. If multiple buffered writes to
    different inodes fail and invoke eofblocks scans at the same time, each
    scan avoids a deadlock with its own inode by virtue of the
    eof_scan_owner field, but will never be able to acquire the iolock of
    the inode from the parallel scan. Because the low free space scans are
    invoked with SYNC_WAIT, the scan will not return until it has processed
    every tagged inode and thus both scans will spin indefinitely on the
    iolock being held across the opposite scan. This problem can be
    reproduced reliably by generic/224 on systems with higher cpu counts
    (x16).

    To avoid this problem, simplify the semantics of eofblocks scans to
    never invoke a scan while under iolock. This means that the buffered
    write context must drop the iolock before the scan. It must reacquire
    the lock before the write retry and also repeat the initial write
    checks, as the original state might no longer be valid once the iolock
    was dropped.

    Signed-off-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
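The locking protocol described above (drop the iolock, scan, retake the lock, redo the initial checks, retry) can be sketched as follows. A boolean stands in for the real iolock so the sketch runs single-threaded and can assert the lock state; every name is hypothetical, and the scan simply pretends to free some space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_inode {
	bool iolock_held;
	long free_blocks; /* space available to this write */
	long size;        /* file size; may change while unlocked */
};

static void demo_iolock(struct demo_inode *ip)
{
	assert(!ip->iolock_held);
	ip->iolock_held = true;
}

static void demo_iounlock(struct demo_inode *ip)
{
	assert(ip->iolock_held);
	ip->iolock_held = false;
}

/* Stand-in for the eofblocks scan: must never run under the iolock. */
static void demo_scan_and_free(struct demo_inode *ip)
{
	assert(!ip->iolock_held);
	ip->free_blocks += 16; /* pretend preallocations were reclaimed */
}

/* Stand-in for the initial write checks, e.g. the append position. */
static long demo_write_checks(struct demo_inode *ip)
{
	assert(ip->iolock_held);
	return ip->size;
}

static int demo_buffered_write(struct demo_inode *ip, long blocks)
{
	bool scanned = false;
	long pos;

	demo_iolock(ip);
again:
	pos = demo_write_checks(ip);
	if (ip->free_blocks < blocks) {
		demo_iounlock(ip);         /* 1. drop the iolock */
		if (scanned)
			return -ENOSPC;    /* already retried once */
		demo_scan_and_free(ip);    /* 2. scan without it */
		scanned = true;
		demo_iolock(ip);           /* 3. retake it ... */
		goto again;                /* 4. ... and redo the checks */
	}
	ip->free_blocks -= blocks;
	ip->size = pos + blocks;
	demo_iounlock(ip);
	return 0;
}
```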
     

18 Dec, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "In this pile:

    - autofs-namespace series
    - dedupe stuff
    - more struct path constification"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
    ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features
    ocfs2: charge quota for reflinked blocks
    ocfs2: fix bad pointer cast
    ocfs2: always unlock when completing dio writes
    ocfs2: don't eat io errors during _dio_end_io_write
    ocfs2: budget for extent tree splits when adding refcount flag
    ocfs2: prohibit refcounted swapfiles
    ocfs2: add newlines to some error messages
    ocfs2: convert inode refcount test to a helper
    simple_write_end(): don't zero in short copy into uptodate
    exofs: don't mess with simple_write_{begin,end}
    9p: saner ->write_end() on failing copy into non-uptodate page
    fix gfs2_stuffed_write_end() on short copies
    fix ceph_write_end()
    nfs_write_end(): fix handling of short copies
    vfs: refactor clone/dedupe_file_range common functions
    fs: try to clone files first in vfs_copy_file_range
    vfs: misc struct path constification
    namespace.c: constify struct path passed to a bunch of primitives
    quota: constify struct path in quota_on
    ...

    Linus Torvalds
     

10 Dec, 2016

1 commit

  • A clone is a perfectly fine implementation of a file copy, so most
    file systems just implement the copy that way. Instead of duplicating
    this logic, move it to the VFS. Currently btrfs and XFS implement copies
    the same way as clones, so there is no behavior change for them; cifs
    only implements clones and gains support for copy_file_range with this
    patch. NFS implements both, so this will allow copy_file_range to work
    on servers that only implement CLONE and be a lot more efficient on
    servers that implement both CLONE and COPY.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
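A hypothetical model of the "try a clone first" logic in a generic layer. The ops table, return conventions and sample file systems are assumptions made for illustration; a real VFS path performs more checks and may fall back further when neither operation is available.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_file_ops {
	int (*clone_range)(long long len);      /* 0 on success, -errno on failure */
	long long (*copy_range)(long long len); /* bytes copied or -errno */
};

static long long demo_copy_file_range(const struct demo_file_ops *ops,
				      long long len)
{
	assert(len >= 0);
	if (ops->clone_range && ops->clone_range(len) == 0)
		return len;                  /* a clone is a complete copy */
	if (ops->copy_range)
		return ops->copy_range(len); /* fs-specific copy, may be short */
	return -EOPNOTSUPP;                  /* no further fallback in this sketch */
}

/* Sample "file systems" for the demo. */
static int clone_ok(long long len) { (void)len; return 0; }
static int clone_unsupported(long long len) { (void)len; return -EOPNOTSUPP; }
static long long copy_half(long long len) { return len / 2; } /* short copy */

static const struct demo_file_ops clone_fs = { clone_ok, copy_half };
static const struct demo_file_ops copy_only_fs = { clone_unsupported, copy_half };
static const struct demo_file_ops neither_fs = { NULL, NULL };
```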
     

09 Dec, 2016

2 commits


07 Dec, 2016

1 commit


05 Dec, 2016

1 commit

  • After various discussions on linux-fsdevel, it has been decided that it
    is not necessary to cap the length of a dedupe request, and that
    correctly-written userspace client programs will be able to absorb the
    change. Therefore, remove the length clamping behavior.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     

30 Nov, 2016

3 commits

  • Straight switch over to using iomap for direct I/O - we already have the
    non-COW dio path in write_begin for DAX and files with extent size hints,
    so nothing to add there. The COW path is ported over from the old
    get_blocks version and is a bit of a mess, but I have some work in progress
    to make it look more like the buffered I/O COW path.

    This gets rid of xfs_get_blocks_direct and the last caller of
    xfs_get_blocks with the create flag set, so all that code can be removed.

    Last but not least I've removed a comment in xfs_filemap_fault that
    refers to xfs_get_blocks entirely instead of updating it - while the
    reference is correct, the whole DAX fault path looks different than
    the non-DAX one, so it seems rather pointless.

    Signed-off-by: Christoph Hellwig
    Tested-by: Jens Axboe
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • This patch drops the XFS-specific i_iolock and uses the VFS i_rwsem, which
    recently replaced i_mutex, instead. This means we only have to take
    one lock instead of two in many fast path operations, and we can
    also shrink the xfs_inode structure. Thanks to the xfs_ilock family
    there is very little churn; the only thing of note is that we need
    to switch to the lock_two_nondirectories helper for taking the i_rwsem
    on two inodes in a few places to make sure our lock order matches
    the one used in the VFS.

    Signed-off-by: Christoph Hellwig
    Tested-by: Jens Axboe
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dave Chinner

    Christoph Hellwig
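Taking the same kind of lock on two inodes needs a fixed order, so that two callers passing the pair in opposite order cannot deadlock ABBA-style. A generic sketch of address-ordered locking follows; a counter stands in for the rwsem so the order can be asserted, and the names are hypothetical rather than the VFS helper itself.

```c
#include <assert.h>
#include <stdint.h>

struct demo_lockable {
	int locked_seq; /* 0 = unlocked, otherwise the order it was locked in */
};

static int lock_counter;

static void demo_lock(struct demo_lockable *ip)
{
	assert(ip->locked_seq == 0);
	ip->locked_seq = ++lock_counter;
}

/* Always lock the lower address first, whatever the argument order. */
static void demo_lock_two(struct demo_lockable *a, struct demo_lockable *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct demo_lockable *tmp = a;

		a = b;
		b = tmp;
	}
	demo_lock(a);
	if (b != a) /* same inode passed twice: lock it only once */
		demo_lock(b);
}
```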
     
  • Dave Chinner
     

08 Nov, 2016

2 commits

  • Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
    improved dax_iomap_pmd_fault(). Also, now that it has no more users,
    remove xfs_get_blocks_dax_fault().

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dave Chinner

    Ross Zwisler
     
  • The recently added DAX functions that use the new struct iomap data
    structure were named iomap_dax_rw(), iomap_dax_fault() and
    iomap_dax_actor(). These are actually defined in fs/dax.c, though, so
    should be part of the "dax" namespace and not the "iomap" namespace.
    Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor()
    respectively.

    Signed-off-by: Ross Zwisler
    Suggested-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Dave Chinner

    Ross Zwisler
     

20 Oct, 2016

6 commits


14 Oct, 2016

1 commit

  • …kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    Pull XFS support for shared data extents from Dave Chinner:
    "This is the second part of the XFS updates for this merge cycle. This
    pullreq contains the new shared data extents feature for XFS.

    Given the complexity and size of this change I am expecting - like the
    addition of reverse mapping last cycle - that there will be some
    follow-up bug fixes and cleanups around the -rc3 stage for issues that
    I'm sure will show up once the code hits a wider userbase.

    What it is:

    At the most basic level we are simply adding shared data extents to
    XFS - i.e. a single extent on disk can now have multiple owners. To do
    this we have to add new on-disk features to both track the shared
    extents and the number of times they've been shared. This is done by
    the new "refcount" btree that sits in every allocation group. When we
    share or unshare an extent, this tree gets updated.

    Along with this new tree, the reverse mapping tree needs to be updated
    to track each owner of a shared extent. This also needs to be updated
    on every share/unshare operation. These interactions at extent allocation
    and freeing time have complex ordering and recovery constraints, so
    there's a significant amount of new intent-based transaction code to
    ensure that operations are performed atomically from both the runtime
    and integrity/crash recovery perspectives.

    We also need to break sharing when writes hit a shared extent - this
    is where the new copy-on-write implementation comes in. We allocate
    new storage and copy the original data along with the overwrite data
    into the new location. We only do this for data as we don't share
    metadata at all - each inode has its own metadata that tracks the
    shared data extents, the extents undergoing CoW and its own private
    extents.

    Of course, being XFS, nothing is simple - we use delayed allocation
    for CoW similar to how we use it for normal writes. ENOSPC is a
    significant issue here - we build on the reservation code added in
    4.8-rc1 with the reverse mapping feature to ensure we don't get
    spurious ENOSPC issues part way through a CoW operation. These
    mechanisms also help minimise fragmentation due to repeated CoW
    operations. To further reduce fragmentation overhead, we've also
    introduced a CoW extent size hint, which indicates how large a region
    we should allocate when we execute a CoW operation.

    With all this functionality in place, we can hook up .copy_file_range,
    .clone_file_range and .dedupe_file_range and we gain all the
    capabilities of reflink and other vfs provided functionality that
    enable manipulation of shared extents. We also added a fallocate mode
    that explicitly unshares a range of a file, which we implemented as an
    explicit CoW of all the shared extents in a file.

    As such, it's a huge chunk of new functionality with new on-disk
    format features and internal infrastructure. It warns at mount time that
    it is an experimental feature and that it may eat data (as we do with all
    new on-disk features until they stabilise). We have not released
    userspace support for it yet - userspace support currently requires
    downloading Darrick's xfsprogs repo and building from source, so
    access to this feature is really developer/tester only at this point.
    Initial userspace support will be released at the same time the kernel
    with this code in it is released.

    The new code causes 5-6 new failures with xfstests - these aren't
    serious functional failures but rather the output of tests changing
    slightly due to perturbations in layouts, space usage, etc. OTOH,
    we've added 150+ new tests to xfstests that specifically exercise this
    new functionality so it's got far better test coverage than any
    functionality we've previously added to XFS.

    Darrick has done a pretty amazing job getting us to this stage, and
    special mention also needs to go to Christoph (review, testing,
    improvements and bug fixes) and Brian (caught several intricate bugs
    during review) for the effort they've also put in.

    Summary:

    - unshare range (FALLOC_FL_UNSHARE_RANGE) support for fallocate

    - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
    interface

    - shared extent support for XFS

    - copy-on-write support for shared extents

    - copy_file_range support

    - clone_file_range support (implements reflink)

    - dedupe_file_range support

    - defrag support for reverse mapping enabled filesystems"

    * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
    xfs: convert COW blocks to real blocks before unwritten extent conversion
    xfs: rework refcount cow recovery error handling
    xfs: clear reflink flag if setting realtime flag
    xfs: fix error initialization
    xfs: fix label inaccuracies
    xfs: remove isize check from unshare operation
    xfs: reduce stack usage of _reflink_clear_inode_flag
    xfs: check inode reflink flag before calling reflink functions
    xfs: implement swapext for rmap filesystems
    xfs: refactor swapext code
    xfs: various swapext cleanups
    xfs: recognize the reflink feature bit
    xfs: simulate per-AG reservations being critically low
    xfs: don't mix reflink and DAX mode for now
    xfs: check for invalid inode reflink flags
    xfs: set a default CoW extent size of 32 blocks
    xfs: convert unwritten status of reverse mappings for shared files
    xfs: use interval query for rmap alloc operations on shared files
    xfs: add shared rmap map/unmap/convert log item types
    xfs: increase log reservations for reflink
    ...

    Linus Torvalds
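A toy model of the two ideas at the heart of this pull: a per-extent reference count, and breaking sharing on write by copying into newly allocated space. It uses a flat array and allocates immediately; as described above, real XFS tracks counts in a per-AG refcount btree and uses delayed allocation for CoW. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NEXTENTS 8
#define EXTSZ 4

struct toy_fs {
	char data[NEXTENTS][EXTSZ];
	int refcount[NEXTENTS]; /* 0 = free, >1 = shared */
};

struct toy_file {
	int extent; /* index of the single extent this file maps */
};

static int toy_alloc(struct toy_fs *fs)
{
	for (int i = 0; i < NEXTENTS; i++) {
		if (fs->refcount[i] == 0) {
			fs->refcount[i] = 1;
			return i;
		}
	}
	return -1; /* would be ENOSPC in a real file system */
}

/* Reflink: share the source extent instead of copying its data. */
static void toy_clone(struct toy_fs *fs, const struct toy_file *src,
		      struct toy_file *dst)
{
	dst->extent = src->extent;
	fs->refcount[src->extent]++;
}

/* Overwrite one byte, breaking sharing first if needed. Returns 0 or -1. */
static int toy_write(struct toy_fs *fs, struct toy_file *f, int off, char c)
{
	int old = f->extent;

	assert(off >= 0 && off < EXTSZ);
	if (fs->refcount[old] > 1) {
		int copy = toy_alloc(fs);

		if (copy < 0)
			return -1;
		memcpy(fs->data[copy], fs->data[old], EXTSZ); /* copy original */
		fs->refcount[old]--;                          /* drop our share */
		f->extent = copy;
	}
	fs->data[f->extent][off] = c; /* then apply the overwrite */
	return 0;
}
```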
     

11 Oct, 2016

3 commits

  • Pull splice fixups from Al Viro:
    "A couple of fixups for interaction of pipe-backed iov_iter with
    O_DIRECT reads + constification of a couple of primitives in uio.h
    missed by previous rounds.

    Kudos to davej - his fuzzing has caught those bugs"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [btrfs] fix check_direct_IO() for non-iovec iterators
    constify iov_iter_count() and iter_is_iovec()
    fix ITER_PIPE interaction with direct_IO

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     
  • by making sure we call iov_iter_advance() on original
    iov_iter even if direct_IO (done on its copy) has returned 0.
    It's a no-op for old iov_iter flavours and does the right thing
    (== truncation of the stuff we'd allocated, but not filled) in
    ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
    by cleanup in generic_file_read_iter().

    Signed-off-by: Al Viro

    Al Viro
     

10 Oct, 2016

1 commit


08 Oct, 2016

4 commits

  • Al Viro
     
  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton : (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • To support DAX pmd mappings with unmodified applications, filesystems
    need to align an mmap address to the pmd size.

    Call thp_get_unmapped_area() from f_op->get_unmapped_area.

    Note, there is no change in behavior for a non-DAX file.

    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Theodore Ts'o
    Cc: Andreas Dilger
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
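The alignment step can be sketched as arithmetic, assuming a 2 MiB PMD and a base address obtained from a search for `len + PMD size` bytes, so that shifting it forward by less than one PMD still leaves room for the mapping. The function and macro names are hypothetical; a real implementation also has to skip alignment when the file range is too small to contain a whole PMD.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PMD_SIZE (UINT64_C(2) << 20) /* 2 MiB, as on x86-64 */

/* Move `base` forward by less than one PMD so that the returned address
 * and the file offset `off` are congruent modulo the PMD size. Then PMD
 * boundaries in the file line up with PMD boundaries in memory. */
static uint64_t align_to_file_offset(uint64_t base, uint64_t off)
{
	uint64_t addr = base + ((off - base) & (DEMO_PMD_SIZE - 1));

	assert(addr - base < DEMO_PMD_SIZE);
	return addr;
}
```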
     
  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     

06 Oct, 2016

3 commits

  • Since we don't have a strategy for handling both DAX and reflink,
    for now we'll just prohibit both being set at the same time.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Trim CoW reservations made on behalf of a cowextsz hint if they get too
    old or we run low on quota, so long as we don't have dirty data awaiting
    writeback or directio operations in progress.

    Garbage collection of the cowextsize extents is kept separate from
    prealloc extent reaping because setting the CoW prealloc lifetime to a
    (much) higher value than the regular prealloc extent lifetime has been
    useful for combatting CoW fragmentation on VM hosts where the VMs
    experience bursty write behaviors and we can keep the utilization ratios
    low enough that we don't start to run out of space. IOWs, it benefits
    us to keep the CoW fork reservations around for as long as we can unless
    we run out of blocks or hit inode reclaim.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
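The trimming rule described above reduces to a predicate. This sketch uses hypothetical field names and takes the age limit as a parameter, which is one way a longer lifetime for CoW reservations than for regular preallocations could be expressed; inode reclaim is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct cow_state {
	long age_secs;      /* how long the reservation has existed */
	bool quota_low;     /* owner is running low on quota */
	bool dirty_pages;   /* dirty data awaiting writeback */
	bool dio_in_flight; /* direct I/O in progress */
};

static bool should_trim_cow(const struct cow_state *s, long max_age_secs)
{
	assert(max_age_secs >= 0);
	if (s->dirty_pages || s->dio_in_flight)
		return false; /* the reservation may still be needed */
	return s->age_secs > max_age_secs || s->quota_low;
}
```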
     
  • Unshare all shared extents if the user calls fallocate with the new
    unshare mode flag set, so that we can guarantee that a subsequent
    write will not fail with ENOSPC.

    Signed-off-by: Darrick J. Wong
    [hch: pass inode instead of file to xfs_reflink_dirty_range,
    use iomap infrastructure for copy up]
    Signed-off-by: Christoph Hellwig

    Darrick J. Wong
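From userspace the unshare mode is reached through `fallocate(2)` with `FALLOC_FL_UNSHARE_RANGE` from `<linux/falloc.h>`. The wrapper and the scratch-file demo below are illustrative; on file systems without shared-extent support the call is expected to fail, typically with `EOPNOTSUPP`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the file system to give [off, off + len) private copies of any
 * shared extents, so later overwrites there should not need new space.
 * Returns 0 or -errno. */
static int unshare_range(int fd, off_t off, off_t len)
{
	assert(off >= 0 && len > 0);
	if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE, off, len) == 0)
		return 0;
	return -errno;
}

/* Usage sketch on a scratch file in /tmp. */
static int demo_unshare_tmpfile(void)
{
	char path[] = "/tmp/unshare-demo-XXXXXX";
	char buf[4096] = { 0 };
	int fd = mkstemp(path);
	int ret;

	if (fd < 0)
		return -errno;
	unlink(path); /* the file goes away once fd is closed */
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		ret = -EIO;
	else
		ret = unshare_range(fd, 0, (off_t)sizeof(buf));
	close(fd);
	return ret;
}
```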