17 Aug, 2022

3 commits

  • [ Upstream commit c64797809a64c73497082aa05e401a062ec1af34 ]

    The commit 15c8e72e88e0 ("fuse: allow skipping control interface and forced
    unmount") tries to remove the control interface for virtio-fs since it does
    not support aborting requests which are being processed. But it doesn't
    work now.

    This patch fixes it by skipping creating the control interface if
    fuse_conn->no_control is set.

    Fixes: 15c8e72e88e0 ("fuse: allow skipping control interface and forced unmount")
    Signed-off-by: Xie Yongji
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Sasha Levin

    Xie Yongji
     
  • commit 02c0cab8e7345b06f1c0838df444e2902e4138d3 upstream.

    Overlayfs may fail to complete updates when a filesystem lacks
    fileattr/xattr syscall support and responds with an ENOSYS error code,
    resulting in an unexpected "Function not implemented" error.

    This bug may occur with FUSE filesystems, such as davfs2.

    Steps to reproduce:

    # install davfs2, e.g., apk add davfs2
    mkdir /test /test/lower /test/upper /test/work /test/mnt
    yes '' | mount -t davfs -o ro http://some-web-dav-server/path \
    /test/lower
    mount -t overlay -o upperdir=/test/upper,lowerdir=/test/lower \
    -o workdir=/test/work overlay /test/mnt

    # when "some-file" exists in the lowerdir, this fails with "Function
    # not implemented", with dmesg showing "overlayfs: failed to retrieve
    # lower fileattr (/some-file, err=-38)"
    touch /test/mnt/some-file

    The underlying cause of this regression is actually in FUSE, which fails to
    translate the ENOSYS error code returned by userspace filesystem (which
    means that the ioctl operation is not supported) to ENOTTY.
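
    The translation described above can be sketched in userspace C. This is
    only an illustration: translate_ioctl_error() is a hypothetical name, and
    the real fix lives inside the FUSE ioctl path rather than in a standalone
    helper like this one.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not the kernel's code): convert the error a
 * userspace filesystem returned for an ioctl into the error the
 * caller should see. Kernel-style errors are zero or negative.
 */
int translate_ioctl_error(int err)
{
    assert(err <= 0);

    /* "Not implemented" from the server means "ioctl not supported". */
    if (err == -ENOSYS)
        return -ENOTTY;

    /* Every other result, including success, passes through unchanged. */
    return err;
}
```

    With this mapping a caller such as overlayfs would see ENOTTY and could
    treat the attribute as unsupported instead of failing the whole update.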

    Reported-by: Christian Kohlschütter
    Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags")
    Fixes: 59efec7b9039 ("fuse: implement ioctl support")
    Cc:
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 47912eaa061a6a81e4aa790591a1874c650733c0 upstream.

    Limit nanoseconds to 0..999999999.
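
    A minimal sketch of such a limit, assuming an unsigned 32-bit nanosecond
    field. The names clamp_nsec() and SKETCH_NSEC_PER_SEC are made up for this
    example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local definition for this sketch; the kernel has its own NSEC_PER_SEC. */
#define SKETCH_NSEC_PER_SEC 1000000000u

/* Hypothetical helper: force a nanosecond field into 0..999999999. */
uint32_t clamp_nsec(uint32_t nsec)
{
    return nsec < SKETCH_NSEC_PER_SEC ? nsec : SKETCH_NSEC_PER_SEC - 1;
}

/* Usage: in-range values are kept, out-of-range values are capped. */
void example_usage(void)
{
    assert(clamp_nsec(500u) == 500u);
    assert(clamp_nsec(4000000000u) == 999999999u);
}
```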

    Fixes: d8a5ba45457e ("[PATCH] FUSE - core")
    Cc:
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

01 May, 2022

1 commit

  • commit a6294593e8a1290091d0b078d5d33da5e0cd3dfe upstream

    Turn iov_iter_fault_in_readable into a function that returns the number
    of bytes not faulted in, similar to copy_to_user, instead of returning a
    non-zero value when any of the requested pages couldn't be faulted in.
    This supports the existing users that require all pages to be faulted in
    as well as new users that are happy if any pages can be faulted in.
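
    The return convention can be illustrated with a toy model. Nothing below
    is kernel code: model_fault_in() and the two callers are hypothetical, and
    "accessible" simply stands for how many leading bytes could be faulted in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: only the first `accessible` bytes of a buffer can be
 * faulted in. Returns the number of bytes NOT faulted in.
 */
size_t model_fault_in(size_t accessible, size_t size)
{
    return size > accessible ? size - accessible : 0;
}

/* Caller that needs every byte: any shortfall is a failure. */
bool fault_in_all(size_t accessible, size_t size)
{
    return model_fault_in(accessible, size) == 0;
}

/* Caller that can make progress as long as something was faulted in. */
bool fault_in_any(size_t accessible, size_t size)
{
    assert(size > 0);
    return model_fault_in(accessible, size) != size;
}
```

    Existing users correspond to the fault_in_all() pattern (compare against
    zero); new users correspond to fault_in_any() (compare against the full
    size).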

    Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
    sure this change doesn't silently break things.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Anand Jain
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     

16 Mar, 2022

2 commits

  • commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream.

    In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
    fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
    imports the write buffer with fuse_get_user_pages(), which uses
    iov_iter_get_pages() to grab references to userspace pages instead of
    actually copying memory.

    On the filesystem device side, these pages can then either be read to
    userspace (via fuse_dev_read()), or splice()d over into a pipe using
    fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.

    This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
    the userspace filesystem can mark the request as completed, causing write()
    to return. At that point, the userspace filesystem should no longer have
    access to the pipe buffer.

    Fix by copying pages coming from the user address space to new pipe
    buffers.

    Reported-by: Jann Horn
    Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device")
    Cc:
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit a679a61520d8a7b0211a1da990404daf5cc80b72 upstream.

    The fileattr API conversion broke lsattr on ntfs3g.

    Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but
    after the conversion the error returned by the fuse filesystem was not
    propagated back to the ioctl() system call, resulting in success being
    returned with bogus values.

    Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic
    ioctl code does.
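
    The kind of check described can be sketched as follows. The struct layout
    and the finish_ioctl() helper are assumptions made for illustration; only
    the idea of inspecting the "result" field of the reply comes from the
    commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reply layout; only `result` is named in the text above. */
struct model_ioctl_reply {
    int result;         /* 0, or a negative errno set by the filesystem */
    unsigned int flags; /* payload, meaningful only when result == 0 */
};

/* Hypothetical helper: decide what the ioctl() caller should get back. */
int finish_ioctl(int transport_err, const struct model_ioctl_reply *reply,
                 unsigned int *flags_out)
{
    assert(reply != NULL && flags_out != NULL);

    /* The request itself failed (e.g. the connection went away). */
    if (transport_err)
        return transport_err;

    /* The filesystem answered, but with an error: propagate it. */
    if (reply->result < 0)
        return reply->result;

    /* Only now is the payload trustworthy. */
    *flags_out = reply->flags;
    return 0;
}
```

    Without the middle check, a reply carrying an error would fall through to
    the last step and hand the caller whatever happened to be in the payload.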

    Reported-by: Jean-Pierre André
    Fixes: 72227eac177d ("fuse: convert to fileattr")
    Cc: # v5.13
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

27 Jan, 2022

1 commit

  • commit e388164ea385f04666c4633f5dc4f951fca71890 upstream.

    The acceptable maximum value of lend parameter in
    filemap_write_and_wait_range() is LLONG_MAX rather than -1. And there is
    also some logic depending on LLONG_MAX check in write_cache_pages(). So
    let's pass LLONG_MAX to filemap_write_and_wait_range() in
    fuse_writeback_range() instead.
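
    To see why the sentinel matters, consider a simplified stand-in for a
    "whole file" test keyed on LLONG_MAX. The function name is hypothetical
    and this is not the code in write_cache_pages(); it only models the kind
    of comparison the message refers to.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model: a range counts as "the whole file" only when it
 * starts at 0 and ends at LLONG_MAX.
 */
bool range_is_whole_file(long long start, long long end)
{
    assert(start >= 0);
    return start == 0 && end == LLONG_MAX;
}
```

    In this model range_is_whole_file(0, -1) is false while
    range_is_whole_file(0, LLONG_MAX) is true, so passing -1 as the end would
    miss any logic that depends on the whole-file case.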

    Fixes: 59bda8ecee2f ("fuse: flush extending writes")
    Signed-off-by: Xie Yongji
    Cc: # v5.15
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Xie Yongji
     

22 Dec, 2021

1 commit


17 Dec, 2021

1 commit

  • commit 5c791fe1e2a4f401f819065ea4fc0450849f1818 upstream.

    In writeback cache mode mtime/ctime updates are cached, and flushed to the
    server using the ->write_inode() callback.

    Closing the file will result in a dirty inode being immediately written,
    but in other cases the inode can remain dirty after all references are
    dropped. This results in the inode being written back from reclaim, which
    can deadlock on a regular allocation while the request is being served.

    The usual mechanisms (GFP_NOFS/PF_MEMALLOC*) don't work for FUSE, because
    serving a request involves unrelated userspace process(es).

    Instead do the same as for dirty pages: make sure the inode is written
    before the last reference is gone.

    - fallocate(2)/copy_file_range(2): these call file_update_time() or
    file_modified(), so flush the inode before returning from the call

    - unlink(2), link(2) and rename(2): these call fuse_update_ctime(), so
    flush the ctime directly from this helper

    Reported-by: chenguanyou
    Signed-off-by: Miklos Szeredi
    Cc: Ed Tsai
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

01 Dec, 2021

1 commit

  • commit 473441720c8616dfaf4451f9c7ea14f0eb5e5d65 upstream.

    Checking buf->flags should be done before the pipe_buf_release() is called
    on the pipe buffer, since releasing the buffer might modify the flags.

    This is exactly what page_cache_pipe_buf_release() does, and which results
    in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
    trying to fix.
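
    The ordering problem can be shown with a small model. The flag name, the
    struct and both helpers are hypothetical, and the assumption that release
    clears the flags is only one way a release function might modify them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BUF_FLAG_LRU 0x01u /* hypothetical flag for this sketch */

struct model_buf {
    unsigned int flags;
};

/* Assumption: releasing a buffer clears its flags. */
void model_release(struct model_buf *buf)
{
    assert(buf != NULL);
    buf->flags = 0;
}

/* Buggy order: the flag is inspected after release already changed it. */
bool release_then_check(struct model_buf *buf)
{
    model_release(buf);
    return (buf->flags & MODEL_BUF_FLAG_LRU) != 0;
}

/* Fixed order: remember the flag first, then release. */
bool check_then_release(struct model_buf *buf)
{
    bool had_flag = (buf->flags & MODEL_BUF_FLAG_LRU) != 0;

    model_release(buf);
    return had_flag;
}
```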

    Reported-by: Justin Forbes
    Fixes: 712a951025c0 ("fuse: fix page stealing")
    Cc: # v2.6.35
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

19 Nov, 2021

1 commit

  • commit 712a951025c0667ff00b25afc360f74e639dfabe upstream.

    It is possible to trigger a crash by splicing anon pipe bufs to the fuse
    device.

    The reason for this is that anon_pipe_buf_release() will reuse buf->page if
    the refcount is 1, but that page might have already been stolen and its
    flags modified (e.g. PG_lru added).

    This happens in the unlikely case of fuse_dev_splice_write() getting around
    to calling pipe_buf_release() after a page has been stolen, added to the
    page cache and removed from the page cache.

    Fix by calling pipe_buf_release() right after the page was inserted into
    the page cache. In this case the page has an elevated refcount so any
    release function will know that the page isn't reusable.

    Reported-by: Frank Dinoff
    Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1yw@mail.gmail.com/
    Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
    Cc: # v2.6.35
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

21 Oct, 2021

5 commits

  • Instead of "goto err", return error directly, since there's no error
    cleanup to do now.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Syzkaller reports a null pointer dereference in fuse_test_super() that is
    caused by sb->s_fs_info being NULL.

    This is due to the fact that fuse_fill_super() is initializing s_fs_info,
    which is too late: the sb is already on the fs_supers list. The initialization
    needs to be done in sget_fc() with the sb_lock held.

    Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
    fuse_get_tree().

    After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
    hence fuse_mount_destroy() can drop the test for non-NULL "fm".

    Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
    Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • 1. call fuse_mount_destroy() for open coded variants

    2. before deactivate_locked_super() don't need fuse_mount destruction since
    that will now be done (if ->s_fs_info is not cleared)

    3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
    regular pattern can be used

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The ->put_super callback is called from generic_shutdown_super() in case of
    a fully initialized sb. This is called from kill_***_super(), which is
    called from ->kill_sb instances.

    Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
    reference to the fuse_conn, while it does the same on each error case
    during sb setup.

    This patch moves the destruction from fuse_put_super() to
    fuse_mount_destroy(), called at the end of all ->kill_sb instances. A
    follow-up patch will clean up the error paths.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Checking "fm" works because currently sb->s_fs_info is cleared on error
    paths; however, sb->s_root is what generic_shutdown_super() checks to
    determine whether the sb was fully initialized or not.

    This change will allow cleanup of sb setup error paths.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

08 Sep, 2021

1 commit

  • Pull fuse updates from Miklos Szeredi:

    - Allow mounting an active fuse device. Previously the fuse device
    would always be mounted during initialization, and sharing a fuse
    superblock was only possible through mount or namespace cloning

    - Fix data flushing in syncfs (virtiofs only)

    - Fix data flushing in copy_file_range()

    - Fix a possible deadlock in atomic O_TRUNC

    - Misc fixes and cleanups

    * tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: remove unused arg in fuse_write_file_get()
    fuse: wait for writepages in syncfs
    fuse: flush extending writes
    fuse: truncate pagecache on atomic_o_trunc
    fuse: allow sharing existing sb
    fuse: move fget() to fuse_get_tree()
    fuse: move option checking into fuse_fill_super()
    fuse: name fs_context consistently
    fuse: fix use after free in fuse_read_interrupt()

    Linus Torvalds
     

06 Sep, 2021

2 commits

  • The struct fuse_conn argument is not used and can be removed.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • In case of fuse the MM subsystem doesn't guarantee that page writeback
    completes by the time ->sync_fs() is called. This is because fuse
    completes page writeback immediately to prevent DoS of memory reclaim by
    the userspace file server.

    This means that fuse itself must ensure that writes are synced before
    sending the SYNCFS request to the server.

    Introduce sync buckets, that hold a counter for the number of outstanding
    write requests. On syncfs replace the current bucket with a new one and
    wait until the old bucket's counter goes down to zero.

    It is possible to have multiple syncfs calls in parallel, in which case
    there could be more than one waited-on bucket. Descendant buckets must
    not complete until the parent completes. Add a count to the child (new)
    bucket until the (parent) old bucket completes.

    Use RCU protection to dereference the current bucket and to wake up an
    emptied bucket. Use fc->lock to protect against parallel assignments to
    the current bucket.

    This leaves just the counter to be a possible scalability issue. The
    fc->num_waiting counter has a similar issue, so both should be addressed at
    the same time.
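
    The bucket idea can be sketched with a single-threaded model. This leaves
    out everything that makes the real code hard (RCU, fc->lock, waiting, and
    the parent/child chaining between buckets); all names below are
    hypothetical and only the counter-and-swap scheme is taken from the
    description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sync bucket: just the outstanding-write count. */
struct model_bucket {
    int outstanding;
};

/* Hypothetical connection holding the current bucket. */
struct model_conn {
    struct model_bucket *curr;
};

/* A write charges the bucket that is current when it starts. */
struct model_bucket *write_begin(struct model_conn *fc)
{
    fc->curr->outstanding++;
    return fc->curr;
}

/* Completion uncharges the bucket the write started in. */
void write_end(struct model_bucket *bucket)
{
    assert(bucket->outstanding > 0);
    bucket->outstanding--;
}

/* syncfs: install a fresh bucket and hand back the old one to wait on. */
struct model_bucket *syncfs_swap(struct model_conn *fc,
                                 struct model_bucket *fresh)
{
    struct model_bucket *old = fc->curr;

    fresh->outstanding = 0;
    fc->curr = fresh;
    return old;
}

/* The old bucket is done once all writes charged to it have completed. */
bool bucket_drained(const struct model_bucket *bucket)
{
    return bucket->outstanding == 0;
}
```

    Writes that begin after the swap are charged to the new bucket, so they
    do not delay the syncfs call that is waiting on the old one.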

    Reported-by: Amir Goldstein
    Fixes: 2d82ab251ef0 ("virtiofs: propagate sync() to file server")
    Cc: # v5.14
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

03 Sep, 2021

1 commit

  • Pull overlayfs update from Miklos Szeredi:

    - Copy up immutable/append/sync/noatime attributes (Amir Goldstein)

    - Improve performance by enabling RCU lookup.

    - Misc fixes and improvements

    The reason this touches so many files is that the ->get_acl() method now
    gets a "bool rcu" argument. The ->get_acl() API was updated based on
    comments from Al and Linus:

    Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/

    * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: enable RCU'd ->get_acl()
    vfs: add rcu argument to ->get_acl() callback
    ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
    ovl: use kvalloc in xattr copy-up
    ovl: update ctime when changing fileattr
    ovl: skip checking lower file's i_writecount on truncate
    ovl: relax lookup error on mismatch origin ftype
    ovl: do not set overlay.opaque for new directories
    ovl: add ovl_allow_offline_changes() helper
    ovl: disable decoding null uuid with redirect_dir
    ovl: consistent behavior for immutable/append-only inodes
    ovl: copy up sync/noatime fileattr flags
    ovl: pass ovl_fs to ovl_check_setxattr()
    fs: add generic helper for filling statx attribute flags

    Linus Torvalds
     

31 Aug, 2021

2 commits

  • Callers of fuse_writeback_range() assume that the file is ready for
    modification by the server in the supplied byte range after the call
    returns.

    If there's a write that extends the file beyond the end of the supplied
    range, then the file needs to be extended to at least the end of the range,
    but currently that's not done.

    There are at least two cases where this can cause problems:

    - copy_file_range() will return short count if the file is not extended
    up to end of the source range.

    - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
    hence the region may not be fully allocated.

    Fix by flushing writes from the start of the range up to the end of the
    file. This could be optimized if the writes are non-extending, etc, but
    it's probably not worth the trouble.

    Fixes: a2bc92362941 ("fuse: fix copy_file_range() in the writeback case")
    Fixes: 6b1bdb56b17c ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
    Cc: # v5.2
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Pull fs hole punching vs cache filling race fixes from Jan Kara:
    "Fix races leading to possible data corruption or stale data exposure
    in multiple filesystems when hole punching races with operations such
    as readahead.

    This is the series I was sending for the last merge window but with
    your objection fixed - now filemap_fault() has been modified to take
    invalidate_lock only when we need to create new page in the page cache
    and / or bring it uptodate"

    * tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    filesystems/locking: fix Malformed table warning
    cifs: Fix race between hole punch and page fault
    ceph: Fix race between hole punch and page fault
    fuse: Convert to using invalidate_lock
    f2fs: Convert to using invalidate_lock
    zonefs: Convert to using invalidate_lock
    xfs: Convert double locking of MMAPLOCK to use VFS helpers
    xfs: Convert to use invalidate_lock
    xfs: Refactor xfs_isilocked()
    ext2: Convert to using invalidate_lock
    ext4: Convert to use mapping->invalidate_lock
    mm: Add functions to lock invalidate_lock for two mappings
    mm: Protect operations adding pages to page cache with invalidate_lock
    documentation: Sync file_operations members with reality
    mm: Fix comments mentioning i_mutex

    Linus Torvalds
     

19 Aug, 2021

1 commit


18 Aug, 2021

1 commit

  • fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
    O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in
    fuse_launder_page() triggered by invalidate_inode_pages2().

    Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
    truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE
    or fc->writeback_cache, so do it unconditionally.

    Reported-by: Xie Yongji
    Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com
    Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC")
    Cc:
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

12 Aug, 2021

1 commit


05 Aug, 2021

2 commits

  • Make it possible to create a new mount from an already working server.

    Here's a detailed description of the problem from Jakob:

    "The background for this question is occasional problems we see with our
    fuse filesystem [1] and mount namespaces. On a usual client, we have
    system-wide, autofs managed mountpoints. When a new mount namespace is
    created (which can be done unprivileged in combination with user
    namespaces), it can happen that a mountpoint is used inside the new
    namespace but idle in the root mount namespace. So autofs unmounts the
    parent, system-wide mountpoint. But the fuse module stays active and
    still serves mountpoint in the child mount namespace. Because the fuse
    daemon also blocks other system wide resources corresponding to the
    mountpoint, this situation effectively prevents new mounts until the
    child mount namespaces closes.

    [1] https://github.com/cvmfs/cvmfs"

    Reported-by: Jakob Blomer
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Affected call chains:

    fuse_get_tree
    -> get_tree_(bdev|nodev)
    -> fuse_fill_super

    Needed for following patch.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

04 Aug, 2021

3 commits

  • Checking whether the "fd=", "rootmode=", "user_id=" and "group_id=" mount
    options are present can be moved from fuse_get_tree() into
    fuse_fill_super() where the values of the options are consumed.

    This relaxes semantics of reusing a fuse blockdev mount using the device
    name. Before this patch, the presence of these options was enforced but
    their values were ignored; after this patch, these options are completely
    ignored in this case.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Naming convention under fs/fuse/:

    struct fuse_conn *fc;
    struct fs_context *fsc;

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • There is a potential race between fuse_read_interrupt() and
    fuse_request_end().

    TASK1
    in fuse_read_interrupt(): delete req->intr_entry (while holding
    fiq->lock)

    TASK2
    in fuse_request_end(): req->intr_entry is empty -> skip fiq->lock
    wake up TASK3

    TASK3
    request is freed

    TASK1
    in fuse_read_interrupt(): dereference req->in.h.unique ***BAM***

    Fix by always grabbing fiq->lock if the request was ever interrupted
    (FR_INTERRUPTED set) thereby serializing with concurrent
    fuse_read_interrupt() calls.

    FR_INTERRUPTED is set before the request is queued on fiq->interrupts.
    Dequeuing the request is done with list_del_init() but FR_INTERRUPTED is not
    cleared in this case.
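
    Why the sticky flag is a safer test than the list state can be seen in a
    small single-threaded model. The struct and helpers are hypothetical; they
    only mirror the two facts stated above: the flag is set before queueing,
    and dequeuing empties the list entry without clearing the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of request state involved. */
struct model_req {
    bool interrupted;  /* stands in for FR_INTERRUPTED */
    bool on_intr_list; /* stands in for a non-empty req->intr_entry */
};

/* Interrupting sets the flag first, then queues the request. */
void model_interrupt(struct model_req *req)
{
    req->interrupted = true;
    req->on_intr_list = true;
}

/* Dequeuing empties the list entry; the flag is deliberately left set. */
void model_dequeue(struct model_req *req)
{
    assert(req->on_intr_list);
    req->on_intr_list = false;
}

/* Old test: skip the lock once the entry is off the list. */
bool must_lock_old(const struct model_req *req)
{
    return req->on_intr_list;
}

/* New test: take the lock if the request was ever interrupted. */
bool must_lock_new(const struct model_req *req)
{
    return req->interrupted;
}
```

    After model_interrupt() followed by model_dequeue(), the old test says no
    lock is needed, which is exactly the window in which TASK2 can run ahead
    of TASK1. The new test still demands the lock.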

    Reported-by: lijiazi
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

13 Jul, 2021

1 commit

  • Use invalidate_lock instead of fuse's private i_mmap_sem. The intended
    purpose is exactly the same. By this conversion we fix a long standing
    race between hole punching and read(2) / readahead(2) paths that can
    lead to stale page cache contents.

    CC: Miklos Szeredi
    Reviewed-by: Miklos Szeredi
    Signed-off-by: Jan Kara

    Jan Kara
     

08 Jul, 2021

1 commit

  • fuse_dax_mem_range_init() does not need the address or the pfn of the
    memory requested in dax_direct_access(). It is only calling direct
    access to get the number of pages.

    Remove the unused variables and stop requesting the kaddr and pfn from
    dax_direct_access().

    Reviewed-by: Dan Williams
    Signed-off-by: Ira Weiny
    Reviewed-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20210525172428.3634316-2-ira.weiny@intel.com
    Signed-off-by: Dan Williams

    Ira Weiny
     

07 Jul, 2021

1 commit

  • Pull fuse updates from Miklos Szeredi:

    - Fixes for virtiofs submounts

    - Misc fixes and cleanups

    * tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    virtiofs: Fix spelling mistakes
    fuse: use DIV_ROUND_UP helper macro for calculations
    fuse: fix illegal access to inode with reused nodeid
    fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)
    fuse: Make fuse_fill_super_submount() static
    fuse: Switch to fc_mount() for submounts
    fuse: Call vfs_get_tree() for submounts
    fuse: add dedicated filesystem context ops for submounts
    virtiofs: propagate sync() to file server
    fuse: reject internal errno
    fuse: check connected before queueing on fpq->io
    fuse: ignore PG_workingset after stealing
    fuse: Fix infinite loop in sget_fc()
    fuse: Fix crash if superblock of submount gets killed early
    fuse: Fix crash in fuse_dentry_automount() error path

    Linus Torvalds
     

04 Jul, 2021

1 commit

  • Pull iov_iter updates from Al Viro:
    "iov_iter cleanups and fixes.

    There are followups, but this is what had sat in -next this cycle. IMO
    the macro forest in there became much thinner and easier to follow..."

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller
    clean up copy_mc_pipe_to_iter()
    pipe_zero(): we don't need no stinkin' kmap_atomic()...
    iov_iter: clean csum_and_copy_...() primitives up a bit
    copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases
    copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases
    iterate_xarray(): only of the first iteration we might get offset != 0
    pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}
    iov_iter: make iterator callbacks use base and len instead of iovec
    iov_iter: make the amount already copied available to iterator callbacks
    iov_iter: get rid of separate bvec and xarray callbacks
    iov_iter: teach iterate_{bvec,xarray}() about possible short copies
    iterate_bvec(): expand bvec.h macro forest, massage a bit
    iov_iter: unify iterate_iovec and iterate_kvec
    iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec
    iterate_and_advance(): get rid of magic in case when n is 0
    csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()
    iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
    [xarray] iov_iter_npages(): just use DIV_ROUND_UP()
    iov_iter_npages(): don't bother with iterate_all_kinds()
    ...

    Linus Torvalds
     

30 Jun, 2021

2 commits

  • These functions implement the address_space ->set_page_dirty operation and
    should live in pagemap.h, not mm.h so that the rest of the kernel doesn't
    get funny ideas about calling them directly.

    Link: https://lkml.kernel.org/r/20210615162342.1669332-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Use __set_page_dirty_no_writeback() instead. This will set the dirty bit
    on the page, which will be used to avoid calling set_page_dirty() in the
    future. It will have no effect on actually writing the page back, as the
    pages are not on any LRU lists.

    [akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules]

    Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

22 Jun, 2021

4 commits

  • Fix some spelling mistakes in comments:
    refernce ==> reference
    happnes ==> happens
    threhold ==> threshold
    splitted ==> split
    mached ==> matched

    Signed-off-by: Zheng Yongjun
    Signed-off-by: Miklos Szeredi

    Zheng Yongjun
     
  • Replace open coded divisor calculations with the DIV_ROUND_UP kernel macro
    for better readability.
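
    For illustration, here is the usual shape of such a replacement in
    userspace C. The macro body matches the well-known kernel definition, but
    it is defined locally here; SKETCH_PAGE_SIZE and both helper functions are
    made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the rounding-up division idiom. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096u

/* Before: open-coded "how many pages cover len bytes". */
size_t pages_open_coded(size_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* After: the same calculation, with the intent spelled out. */
size_t pages_with_macro(size_t len)
{
    return DIV_ROUND_UP(len, SKETCH_PAGE_SIZE);
}

/* Usage: both forms agree; a partial page still counts as a page. */
void example_usage(void)
{
    assert(pages_with_macro(4097) == 2);
    assert(pages_with_macro(4097) == pages_open_coded(4097));
}
```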

    Signed-off-by: Wu Bo
    Signed-off-by: Miklos Szeredi

    Wu Bo
     
  • Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
    with outarg containing nodeid and generation.

    If a fuse inode is found in inode cache with the same nodeid but different
    generation, the existing fuse inode should be unhashed and marked "bad" and
    a new inode with the new generation should be hashed instead.

    This can happen, for example, with passthrough fuse filesystem that returns
    the real filesystem ino/generation on lookup and where real inode numbers
    can get recycled due to real files being unlinked not via the fuse
    passthrough filesystem.

    With current code, this situation will not be detected and an old fuse
    dentry that used to point to an older generation real inode, can be used to
    access a completely new inode, which should be accessed only via the new
    dentry.

    Note that because the FORGET message carries the nodeid w/o generation, the
    server should wait to get FORGET counts for the nlookup counts of the old
    and reused inodes combined, before it can free the resources associated to
    that nodeid.
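
    The generation check can be modelled as below. This is a sketch under
    simplifying assumptions: the inode cache is reduced to a single cached
    entry, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached fuse inode. */
struct model_inode {
    uint64_t nodeid;
    uint64_t generation;
    bool bad; /* set once the entry must no longer be used */
};

/*
 * Decide whether a cached inode found by nodeid may be reused for a
 * server reply carrying (nodeid, generation). Returns false when the
 * caller has to set up a new inode for the new generation instead.
 */
bool reuse_cached_inode(struct model_inode *cached, uint64_t nodeid,
                        uint64_t generation)
{
    /* The cache lookup is keyed by nodeid, so these always match. */
    assert(cached->nodeid == nodeid);

    if (cached->generation != generation) {
        /* Same nodeid, different object: retire the old entry. */
        cached->bad = true;
        return false;
    }

    return !cached->bad;
}
```

    An old dentry that still points at the retired entry then sees a bad
    inode rather than silently reaching the new file.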

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE)
    returning -EOPNOTSUPP. libnbd's nbdfuse would like to translate
    FALLOC_FL_ZERO_RANGE requests into the NBD command
    NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do
    zeroing efficiently.

    This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE.
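
    A mode filter of this kind can be sketched as follows. The MODEL_ constants
    are local copies of the values in the Linux uapi header for the
    correspondingly named FALLOC_FL_ flags; check_fallocate_mode() is a
    hypothetical helper, not the function changed by the patch.

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the Linux uapi flag values, for this sketch only. */
#define MODEL_FALLOC_FL_KEEP_SIZE  0x01
#define MODEL_FALLOC_FL_PUNCH_HOLE 0x02
#define MODEL_FALLOC_FL_ZERO_RANGE 0x10

/* Hypothetical helper: reject any mode bit outside the accepted set. */
int check_fallocate_mode(int mode)
{
    const int allowed = MODEL_FALLOC_FL_KEEP_SIZE |
                        MODEL_FALLOC_FL_PUNCH_HOLE |
                        MODEL_FALLOC_FL_ZERO_RANGE;

    assert(mode >= 0);

    if (mode & ~allowed)
        return -EOPNOTSUPP;

    return 0;
}
```

    Dropping MODEL_FALLOC_FL_ZERO_RANGE from the allowed set reproduces the
    old behaviour, where the request is refused before it ever reaches the
    server.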

    A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest
    nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to
    create a file containing some data and "mirror" it to a fuse file:

    $ dd if=/dev/urandom of=disk.img bs=1M count=1
    $ nbdkit file disk.img
    $ touch mirror.img
    $ nbdfuse mirror.img nbd://localhost &

    (mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img)

    You can then run commands such as:

    $ fallocate -z -o 1024 -l 1024 mirror.img

    and check that the content of the original file ("disk.img") stays
    synchronized. To show NBD commands, export LIBNBD_DEBUG=1 before
    running nbdfuse. To clean up:

    $ fusermount3 -u mirror.img
    $ killall nbdkit

    Signed-off-by: Richard W.M. Jones
    Signed-off-by: Miklos Szeredi

    Richard W.M. Jones