13 Apr, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencies between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffered_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     

08 Apr, 2014

1 commit

  • filemap_map_pages() is a generic implementation of ->map_pages() for
    filesystems that use the page cache.

    It should be safe to use filemap_map_pages() for ->map_pages() if the
    filesystem uses filemap_fault() for ->fault().

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Alexander Viro
    Cc: Dave Chinner
    Cc: Ning Qu
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

07 Apr, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Nothing major: the stricter permissions checking for sysfs broke a
    staging driver; fix included. Greg KH said he'd take the patch but
    hadn't as the merge window opened, so it's included here to avoid
    breaking build"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    staging: fix up speakup kobject mode
    Use 'E' instead of 'X' for unsigned module taint flag.
    VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
    kallsyms: fix percpu vars on x86-64 with relocation.
    kallsyms: generalize address range checking
    module: LLVMLinux: Remove unused function warning from __param_check macro
    Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
    module: remove MODULE_GENERIC_TABLE
    module: allow multiple calls to MODULE_DEVICE_TABLE() per module
    module: use pr_cont

    Linus Torvalds
     

05 Apr, 2014

2 commits

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: decouple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystem() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     
  • Pull fuse update from Miklos Szeredi:
    "This series adds cached writeback support to fuse, improving write
    throughput"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: fix "uninitialized variable" warning
    fuse: Turn writeback cache on
    fuse: Fix O_DIRECT operations vs cached writeback misorder
    fuse: fuse_flush() should wait on writeback
    fuse: Implement write_begin/write_end callbacks
    fuse: restructure fuse_readpage()
    fuse: Flush files on wb close
    fuse: Trust kernel i_mtime only
    fuse: Trust kernel i_size only
    fuse: Connection bit for enabling writeback
    fuse: Prepare to handle short reads
    fuse: Linking file to inode helper

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
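
The interplay between reclaim and inode freeing described in the commit above can be modelled in a few lines of C. This is only a sketch: `sketch_mapping`, its fields, and the helper names are hypothetical, and the tree lock that both sides hold in the kernel is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an address_space; not the kernel structure. */
struct sketch_mapping {
    bool exiting;             /* stands in for the AS_EXITING flag */
    unsigned long nr_shadows; /* shadow entries currently in the tree */
};

/* Inode-freeing side: set the flag, then do the final truncate. */
static void sketch_free_inode(struct sketch_mapping *m)
{
    m->exiting = true;
    m->nr_shadows = 0;        /* the tree must end up really empty */
}

/* Reclaim side: returns true if a shadow entry was left behind. */
static bool sketch_evict_page(struct sketch_mapping *m)
{
    if (m->exiting)
        return false;         /* too late, install nothing */
    m->nr_shadows++;
    return true;
}

/* Usage example for the two helpers above. */
static void sketch_exiting_example(void)
{
    struct sketch_mapping m = { false, 0 };

    assert(sketch_evict_page(&m));   /* normal eviction leaves a shadow */
    sketch_free_inode(&m);
    assert(!sketch_evict_page(&m));  /* late eviction leaves nothing */
    assert(m.nr_shadows == 0);
}
```

Because the flag is checked before every install, an eviction that races with the final truncate cannot repopulate the tree.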
     

02 Apr, 2014

16 commits

  • Fix the following warning:

    In file included from include/linux/fs.h:16:0,
    from fs/fuse/fuse_i.h:13,
    from fs/fuse/file.c:9:
    fs/fuse/file.c: In function 'fuse_file_poll':
    include/linux/rbtree.h:82:28: warning: 'parent' may be used
    uninitialized in this function [-Wmaybe-uninitialized]
    fs/fuse/file.c:2592:27: note: 'parent' was declared here

    Signed-off-by: Rajat Jain
    Signed-off-by: Miklos Szeredi

    Rajat Jain
     
  • Introduce a bit that the kernel and userspace exchange with each other at
    the init stage, and turn writeback on if userspace wants it and the
    mount option 'allow_wbcache' is present (controlled by fusermount).

    Also add each writable file to the per-inode write list and call
    generic_file_aio_write() to make use of the Linux page cache engine.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • The problem is:

    1. write cached data to a file
    2. read directly from the same file (via another fd)

    The 2nd operation may read stale data, i.e. the data that was in the file
    before the 1st op. The problem is in how fuse manages writeback.

    When a direct op occurs the core kernel code calls filemap_write_and_wait
    to flush all the cached ops in flight. But fuse acks the writeback right
    after the ->writepages callback exits w/o waiting for the real write to
    happen. Thus the subsequent direct op proceeds while the real writeback
    is still in flight. This is a problem for backends that reorder operations.

    Fix this by making the fuse direct IO callback explicitly wait on the
    in-flight writeback to finish.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • The aim of the .flush fop is to hint to the file system that flushing its
    state or caches or any other important data to reliable storage would be
    desirable now. fuse_flush() passes this hint by sending a FUSE_FLUSH request
    to userspace. However, dirty pages and pages under writeback may not be
    visible to userspace yet unless we ensure it explicitly.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • The .write_begin and .write_end callbacks are required to use generic routines
    (generic_file_aio_write --> ... --> generic_perform_write) for buffered
    writes.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • Move the code filling and sending read request to a separate function. Future
    patches will use it for .write_begin -- partial modification of a page
    requires reading the page from the storage very similarly to what fuse_readpage
    does.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • Any write request requires a file handle to report to the userspace. Thus
    when we close a file (and free the fuse_file with this info) we have to
    flush all the outstanding dirty pages.

    filemap_write_and_wait() is enough because every page under fuse writeback
    is accounted in ff->count. This delays actual close until all fuse wb is
    completed.

    In case of "write cache" turned off, the flush is ensured by fuse_vma_close().

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • Let the kernel maintain i_mtime locally:
    - clear S_NOCMTIME
    - implement i_op->update_time()
    - flush mtime on fsync and last close
    - update i_mtime explicitly on truncate and fallocate

    Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime
    should be flushed to the server eventually.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • When writeback is on, make fuse treat the inode's i_size as always
    up to date and do not update it with the value received from userspace.
    This is done because the page cache code may update i_size without letting
    the FS know.

    This assumption implies fixing the previously introduced short-read helper --
    when a short read occurs the 'hole' is filled with zeroes.

    fuse_file_fallocate() is also fixed because now we should keep i_size up to
    date, so it must be updated if FUSE_FALLOCATE request succeeded.

    Signed-off-by: Maxim V. Patlasov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
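
The zero-filling of a short read mentioned in the commit above can be illustrated as follows. This is a minimal sketch on a plain byte buffer, not the fuse helper itself; `sketch_zero_fill_short_read` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The caller asked for 'requested' bytes but only 'got' bytes came back.
 * Since the kernel's i_size is trusted, the missing tail is treated as a
 * hole and filled with zeroes. Returns the number of valid bytes in 'buf'.
 */
static size_t sketch_zero_fill_short_read(unsigned char *buf,
                                          size_t requested, size_t got)
{
    assert(got <= requested);
    if (got < requested)
        memset(buf + got, 0, requested - got);
    return requested;
}
```

For example, a request for 8 bytes that returns only 3 leaves the first 3 bytes untouched and zeroes the remaining 5.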
     
  • Off (0) by default. Will be used in the next patches and will be turned
    on at the very end.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • A helper which gets called when a read reports fewer bytes than were requested.
    See patch "trust kernel i_size only" for details.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • When writeback is ON every writeable file should be in the per-inode write list,
    not only mmap-ed ones. Thus introduce a helper for this linkage.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Miklos Szeredi

    Pavel Emelyanov
     
  • always equal to &iocb->ki_pos.

    Signed-off-by: Al Viro

    Al Viro
     
  • ... it does that itself (via kmap_atomic())

    Signed-off-by: Al Viro

    Al Viro
     
  • all pipe_buffer_operations have the same instances of those...

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

24 Mar, 2014

1 commit

  • Summary of http://lkml.org/lkml/2014/3/14/363 :

    Ted: module_param(queue_depth, int, 444)
    Joe: 0444!
    Rusty: User perms >= group perms >= other perms?
    Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?

    Side effect of stricter permissions means removing the unnecessary
    S_IFREG from several callers.

    Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
    number of drivers fail this test, so that will be the debate for a
    future patch.

    Suggested-by: Joe Perches
    Acked-by: Bjorn Helgaas for drivers/pci/slot.c
    Acked-by: Greg Kroah-Hartman
    Cc: Miklos Szeredi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Rusty Russell

    Rusty Russell
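
The rules summarized in the commit above (an octal value within range, and user perms >= group perms >= other perms) can be sketched as a runtime check. The kernel enforces them at build time through a macro; the function below is a hypothetical stand-in for illustration only, and it deliberately omits the "other writable" test that the commit says was removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime version of the checks; not the kernel macro. */
static bool sketch_octal_perms_ok(int perms)
{
    int user, group, other;

    /* Must fit in the nine rwx permission bits. */
    if (perms < 0 || perms > 0777)
        return false;

    user = (perms >> 6) & 7;
    group = (perms >> 3) & 7;
    other = perms & 7;

    /* User perms >= group perms >= other perms. */
    return user >= group && group >= other;
}

/* Usage example, including the decimal/octal mix-up from the thread. */
static void sketch_octal_perms_examples(void)
{
    assert(sketch_octal_perms_ok(0444));
    assert(!sketch_octal_perms_ok(444));  /* decimal 444 is octal 0674 */
    assert(!sketch_octal_perms_ok(0466)); /* group has more than user */
}
```

Decimal 444 is rejected because it equals octal 0674, where the group bits (7) exceed the user bits (6).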
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write caused an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    not documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
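
The policy the commit above suggests for most file systems, syncing only on a read-write to read-only transition, can be sketched like this. `SKETCH_RDONLY` and `sketch_remount_needs_sync` are hypothetical names chosen for the example; a real file system would test its own mount flags in its remount handler.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RDONLY 0x1u /* hypothetical stand-in for a read-only flag */

/* Decide whether a remount should sync, given old and new mount flags. */
static bool sketch_remount_needs_sync(unsigned int old_flags,
                                      unsigned int new_flags)
{
    bool was_rw = (old_flags & SKETCH_RDONLY) == 0;
    bool now_ro = (new_flags & SKETCH_RDONLY) != 0;

    return was_rw && now_ro;
}

/* Usage example covering the three interesting transitions. */
static void sketch_remount_examples(void)
{
    assert(sketch_remount_needs_sync(0, SKETCH_RDONLY));  /* rw -> ro */
    assert(!sketch_remount_needs_sync(0, 0));             /* rw -> rw */
    assert(!sketch_remount_needs_sync(SKETCH_RDONLY, 0)); /* ro -> rw */
}
```

A file system that depends on the old unconditional behaviour would simply return true in every case instead.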
     

29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

26 Jan, 2014

1 commit

  • So far I've had one ACK for this, and no other comments. So I think it
    is probably time to send this via some suitable tree. I'm guessing that
    the vfs tree would be the most appropriate route, but not sure that
    there is one at the moment (don't see anything recent at kernel.org)
    so in that case I think -mm is the "back up plan". Al, please let me
    know if you will take this?

    Steve.

    ---------------------

    Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio
    reads with append dio writes" thread on linux-fsdevel, this patch is my
    current version of the fix proposed as option (b) in that thread.

    Removing the i_size test from the direct i/o read path at vfs level
    means that filesystems now have to deal with requests which are beyond
    i_size themselves. These I've divided into three sets:

    a) Those with "no op" ->direct_IO (9p, cifs, ceph)
    These are obviously not going to be an issue

    b) Those with "home brew" ->direct_IO (nfs, fuse)
    I've been told that NFS should not have any problem with the larger
    i_size, however I've added an extra test to FUSE to duplicate the
    original behaviour just to be on the safe side.

    c) Those using __blockdev_direct_IO()
    These call through to ->get_block() which should deal with the EOF
    condition correctly. I've verified that with GFS2 and I believe that
    Zheng has verified it for ext4. I've also run the test on XFS and it
    passes both before and after this change.

    The part of the patch in filemap.c looks a lot larger than it really is
    - there are only two lines of real change. The rest is just indentation
    of the contained code.

    There remains a test of i_size though, which was added for btrfs. It
    doesn't cause the other filesystems a problem as the test is performed
    after ->direct_IO has been called. It is possible that there is a race
    that does matter to btrfs, however this patch doesn't change that, so
    it's still an overall improvement.

    Signed-off-by: Steven Whitehouse
    Reported-by: Zheng Liu
    Cc: Jan Kara
    Cc: Dave Chinner
    Acked-by: Miklos Szeredi
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Signed-off-by: Al Viro

    Steven Whitehouse
     

23 Jan, 2014

4 commits

  • open/release operations require userspace transitions to keep track
    of the open count and to perform any FS-specific setup. However,
    for some purely read-only FSs which don't need to perform any setup
    at open/release time, we can avoid the performance overhead of
    calling into userspace for open/release calls.

    This patch adds the necessary support to the fuse kernel modules to prevent
    open/release operations from reaching userspace. When the client returns
    ENOSYS, we avoid sending the subsequent release to userspace, and also
    remember this so that future opens also don't trigger a userspace
    operation.

    Signed-off-by: Miklos Szeredi

    Andrew Gallagher
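
The "remember ENOSYS" behaviour described in the commit above can be sketched with a per-connection flag. Everything here is hypothetical scaffolding for illustration: `sketch_conn`, its `no_open` field, and the stub server are not taken from the fuse sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sketch_conn {
    bool no_open;           /* set once the server answers ENOSYS */
    int (*send_open)(void); /* models the round trip to userspace */
};

/* Returns 0 on success or a negative errno. */
static int sketch_open(struct sketch_conn *conn)
{
    int err;

    if (conn->no_open)
        return 0;             /* skip the userspace transition */

    err = conn->send_open();
    if (err == -ENOSYS) {
        conn->no_open = true; /* remember for later opens and releases */
        return 0;
    }
    return err;
}

/* A release only goes to userspace if opens do. */
static bool sketch_release_needs_userspace(const struct sketch_conn *conn)
{
    return !conn->no_open;
}

/* Stub server that does not implement open; counts how often it is asked. */
static int sketch_open_calls;

static int sketch_server_without_open(void)
{
    sketch_open_calls++;
    return -ENOSYS;
}
```

With the stub server, the first `sketch_open()` call reaches "userspace" once; every later open and release is handled without a transition.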
     
  • Various read operations (e.g. readlink, readdir) invalidate the cached
    attrs for atime changes. This patch adds a new function
    'fuse_invalidate_atime', which checks for a read-only super block and
    avoids the attr invalidation in that case.

    Signed-off-by: Andrew Gallagher
    Signed-off-by: Miklos Szeredi

    Andrew Gallagher
     
  • As noticed by Coverity the "num != 0" condition never triggers. Instead it
    should check for a complete page.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Having this struct in module memory could Oops if the module is
    unloaded while the buffer still persists in a pipe.

    Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
    merge them into nosteal_pipe_buf_ops (this is the same as
    default_pipe_buf_ops except stealing the page from the buffer is not
    allowed).

    Reported-by: Al Viro
    Signed-off-by: Miklos Szeredi
    Cc: stable@vger.kernel.org

    Miklos Szeredi
     

13 Nov, 2013

1 commit

  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     

05 Nov, 2013

4 commits

  • All async fuse requests must be supplied with extra reference to a fuse
    file. This is necessary to ensure that the fuse file is not released until
    all in-flight requests are completed. Fuse secondary writeback requests
    must obey this rule as well.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • BDI_WRITTEN counter is used to estimate bdi bandwidth. It must be
    incremented every time the bdi ends page writeback, no matter whether it
    was fulfilled by an actual write or by discarding the request (e.g. due to
    shrunk i_size).

    Note that even before writepages patches, the case "Got truncated off
    completely" was handled in fuse_send_writepage() by calling
    fuse_writepage_finish() which updated BDI_WRITTEN unconditionally.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • If writeback happens while fuse is in FUSE_NOWRITE condition, the request
    will be queued but not processed immediately (see fuse_flush_writepages()).
    Until FUSE_NOWRITE becomes relaxed, more writebacks can happen. They will
    be queued as "secondary" requests to that first ("primary") request.

    The existing implementation crops only the primary request. This is not correct
    because a subsequent extending write(2) may increase i_size and then
    secondary requests won't be cropped properly. The result would be stale
    data written to the server to a file offset where zeros must be.

    Similar problem may happen if secondary requests are attached to an
    in-flight request that was already cropped.

    The patch solves the issue by cropping all secondary requests in
    fuse_writepage_end(). Thanks to Miklos for the idea.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     
  • fuse_writepage_in_flight() returns false if it fails to find a request with
    the given index in fi->writepages. Then the caller proceeds with populating
    data->orig_pages[] and incrementing req->num_pages. Hence,
    fuse_writepage_in_flight() must revert the changes it made in the request
    before returning false.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi

    Maxim Patlasov
     

25 Oct, 2013

2 commits


01 Oct, 2013

3 commits

  • This allows udev (or more recently systemd-tmpfiles) to create /dev/cuse on
    boot, in the same way as /dev/fuse is currently created, and the corresponding
    module to be loaded on first access.

    The corresponding functionality was introduced for fuse in commit 578454f.

    Signed-off-by: Tom Gundersen
    Cc: Kay Sievers
    Signed-off-by: Miklos Szeredi

    Tom Gundersen
     
  • If ->writepage() tries to write back a page whose copy is still in flight,
    then just skip by calling redirty_page_for_writepage().

    This is OK, since now ->writepage() should never be called for data
    integrity sync.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • As Maxim Patlasov pointed out, it's possible to get a dirty page while its
    copy is still under writeback, despite fuse_page_mkwrite() doing its thing
    (direct IO).

    This could result in two concurrent write requests for the same offset, with
    data corruption if they get mixed up.

    To prevent this, fuse needs to check and delay such writes. This
    implementation does this by:

    1. check if page is still under writeout, if so create a new, single page
    secondary request for it

    2. chain this secondary request onto the in-flight request

    2/a. if a secondary request for the same offset was already chained to the
    in-flight request, then just copy the contents of the page and discard
    the new secondary request. This makes sure that each page will
    have at most two requests associated with it

    3. when the in-flight request finishes, send off all secondary requests
    chained onto it

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
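
The chaining scheme in steps 1 to 3 of the commit above can be sketched with a small fixed-size model. The structure and function names are hypothetical, the "page" is shrunk to 16 bytes, and locking is omitted; the real implementation works on fuse requests and page copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16    /* tiny page, for illustration only */
#define SKETCH_MAX_SECONDARY 8

struct sketch_req {
    unsigned long index;                  /* page offset in the file */
    unsigned char data[SKETCH_PAGE_SIZE]; /* private copy of the page */
};

struct sketch_inflight {
    struct sketch_req primary;
    struct sketch_req secondary[SKETCH_MAX_SECONDARY];
    size_t nr_secondary;
};

/*
 * Called when the page at 'index' is dirtied while 'inflight' still covers
 * it. Returns true if a new secondary request was chained, false if an
 * already chained one was refreshed instead (step 2/a).
 */
static bool sketch_queue_secondary(struct sketch_inflight *inflight,
                                   unsigned long index,
                                   const unsigned char *page)
{
    size_t i;

    for (i = 0; i < inflight->nr_secondary; i++) {
        if (inflight->secondary[i].index == index) {
            memcpy(inflight->secondary[i].data, page, SKETCH_PAGE_SIZE);
            return false;
        }
    }
    assert(inflight->nr_secondary < SKETCH_MAX_SECONDARY);
    /* Steps 1 and 2: create a single-page request and chain it. */
    inflight->secondary[i].index = index;
    memcpy(inflight->secondary[i].data, page, SKETCH_PAGE_SIZE);
    inflight->nr_secondary++;
    return true;
}

/* Step 3: on completion, pass each chained request to 'send'. */
static size_t sketch_finish(struct sketch_inflight *inflight,
                            void (*send)(const struct sketch_req *))
{
    size_t i, sent = inflight->nr_secondary;

    for (i = 0; i < sent; i++)
        send(&inflight->secondary[i]);
    inflight->nr_secondary = 0;
    return sent;
}

/* Stub sender that records the first byte of the last request it saw. */
static unsigned char sketch_last_byte;

static void sketch_record(const struct sketch_req *req)
{
    sketch_last_byte = req->data[0];
}
```

Dirtying the same page twice while the primary request is in flight yields a single secondary request carrying the newest contents, so at most two requests exist per page.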