09 Jan, 2021

1 commit

  • [ Upstream commit 5d069dbe8aaf2a197142558b6fb2978189ba3454 ]

    Jan Kara's analysis of the syzbot report (edited):

    The reproducer opens a directory on FUSE filesystem, it then attaches
    dnotify mark to the open directory. After that a fuse_do_getattr() call
    finds that attributes returned by the server are inconsistent, and calls
    make_bad_inode() which, among other things does:

    inode->i_mode = S_IFREG;

    This then confuses dnotify which doesn't tear down its structures
    properly and eventually crashes.

    Avoid calling make_bad_inode() on a live inode: switch to a private flag on
    the fuse inode. Also add the test to ops which the bad_inode_ops would
    have caught.

    This bug goes back to the initial merge of fuse in 2.6.14...
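
    The idea can be sketched in a few lines of standalone C. This is only an
    illustration under stated assumptions: the struct, the SKETCH_* constants
    and the function names are hypothetical stand-ins, not the kernel's
    definitions. The point is that the "bad" state lives in a private flag
    while the mode bits seen by other subsystems stay untouched.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
#define SKETCH_S_IFMT  0170000u
#define SKETCH_S_IFDIR 0040000u

enum { SKETCH_BAD = 1u << 0 };	/* hypothetical filesystem-private state bit */

struct sketch_inode {
	unsigned int mode;	/* file type bits that other subsystems rely on */
	unsigned int state;	/* filesystem-private flags */
};

/* Mark the inode bad without rewriting its mode. */
static void sketch_make_bad(struct sketch_inode *inode)
{
	inode->state |= SKETCH_BAD;
}

static bool sketch_is_bad(const struct sketch_inode *inode)
{
	return (inode->state & SKETCH_BAD) != 0;
}

/* Each operation tests the flag itself, as bad_inode_ops would have failed it. */
static int sketch_getattr(const struct sketch_inode *inode)
{
	if (sketch_is_bad(inode))
		return -EIO;
	return 0;
}
```

    With this shape, an observer that registered against a directory still
    sees a directory after the inode is marked bad; only the filesystem's
    own operations start failing.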

    Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
    Signed-off-by: Miklos Szeredi
    Tested-by: Jan Kara
    Cc:
    Signed-off-by: Sasha Levin

    Miklos Szeredi
     

30 Dec, 2020

1 commit

  • [ Upstream commit 66ab33bf6d4341574f88b511e856a73f6f2a921e ]

    This can be triggered for example by adding the "-omand" mount option,
    which will be rejected and virtio_fs_fill_super() will return an error.

    In such a case the allocations for fuse_conn and fuse_mount will leak due
    to s_root not yet being set and so ->put_super() not being called.

    Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Sasha Levin

    Miklos Szeredi
     

20 Oct, 2020

1 commit

  • Pull fuse updates from Miklos Szeredi:

    - Support directly accessing host page cache from virtiofs. This can
    improve I/O performance for various workloads, as well as reducing
    the memory requirement by eliminating double caching. Thanks to Vivek
    Goyal for doing most of the work on this.

    - Allow automatic submounting inside virtiofs. This allows unique
    st_dev/ st_ino values to be assigned inside the guest to files
    residing on different filesystems on the host. Thanks to Max Reitz
    for the patches.

    - Fix an old use after free bug found by Pradeep P V K.

    * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
    virtiofs: calculate number of scatter-gather elements accurately
    fuse: connection remove fix
    fuse: implement crossmounts
    fuse: Allow fuse_fill_super_common() for submounts
    fuse: split fuse_mount off of fuse_conn
    fuse: drop fuse_conn parameter where possible
    fuse: store fuse_conn in fuse_req
    fuse: add submount support to
    fuse: fix page dereference after free
    virtiofs: add logic to free up a memory range
    virtiofs: maintain a list of busy elements
    virtiofs: serialize truncate/punch_hole and dax fault path
    virtiofs: define dax address space operations
    virtiofs: add DAX mmap support
    virtiofs: implement dax read/write operations
    virtiofs: introduce setupmapping/removemapping commands
    virtiofs: implement FUSE_INIT map_alignment field
    virtiofs: keep a list of free dax memory ranges
    virtiofs: add a mount option to enable dax
    virtiofs: set up virtio_fs dax_device
    ...

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • virtiofs currently maps various buffers in scatter gather list and it looks
    at number of pages (ap->pages) and assumes that same number of pages will
    be used both for input and output (sg_count_fuse_req()), and calculates
    total number of scatterlist elements accordingly.

    However, this assumption does not hold in all cases. For example,
    Cai Qian reported that trinity sometimes triggers a warning with virtiofs.
    A closer look revealed that calling ioctl(fd, 0x5a004000, buf)
    triggers the following warning.

    WARN_ON(out_sgs + in_sgs != total_sgs)

    In this case, total_sgs = 8, out_sgs = 4, in_sgs = 3. The number of pages
    is 2 (ap->pages); out_sgs uses both pages, but in_sgs uses only one.
    Here, fuse_do_ioctl() sets different size values for input and output.

    args->in_args[args->in_numargs - 1].size == 6656
    args->out_args[args->out_numargs - 1].size == 4096

    So current method of calculating how many scatter-gather list elements
    will be used is not accurate. Make calculations more precise by parsing
    size and ap->descs.
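
    A minimal sketch of the more precise calculation, assuming each page
    descriptor carries the number of usable bytes in its page. The struct and
    function names here are hypothetical; only the idea of walking the
    descriptors until the argument size is consumed comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: usable bytes in the corresponding page. */
struct sketch_page_desc {
	unsigned int length;
};

/* Count how many pages are actually needed to carry total_len bytes. */
static unsigned int sketch_count_pages(const struct sketch_page_desc *descs,
				       unsigned int num_pages,
				       unsigned int total_len)
{
	unsigned int i;

	assert(descs != NULL || num_pages == 0);

	for (i = 0; i < num_pages && total_len; i++) {
		unsigned int this_len = descs[i].length < total_len ?
					descs[i].length : total_len;

		total_len -= this_len;
	}
	return i;
}
```

    With two 4096-byte pages, a 6656-byte argument needs both pages while a
    4096-byte argument needs only one, which matches the mismatch described
    above.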

    Reported-by: Qian Cai
    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/linux-fsdevel/5ea77e9f6cb8c2db43b09fbd4158ab2d8c066a0a.camel@redhat.com/
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

12 Oct, 2020

1 commit


09 Oct, 2020

1 commit

  • FUSE servers can indicate crossmount points by setting FUSE_ATTR_SUBMOUNT
    in fuse_attr.flags. The inode will then be marked as S_AUTOMOUNT, and the
    .d_automount implementation creates a new submount at that location, so
    that the submount gets a distinct st_dev value.

    Note that all submounts get a distinct superblock and a distinct st_dev
    value, so for virtio-fs, even if the same filesystem is mounted more than
    once on the host, none of its mount points will have the same st_dev. We
    need distinct superblocks because the superblock points to the root node,
    but the different host mounts may show different trees (e.g. due to
    submounts in some of them, but not in others).

    Right now, this behavior is only enabled when fuse_conn.auto_submounts is
    set, which is the case only for virtio-fs.
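
    The flag handling can be sketched as follows. This is an illustration
    only: the SKETCH_* constants and the structs are hypothetical stand-ins
    with illustrative values, and the real constants live in the kernel
    headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; not taken from the kernel headers. */
#define SKETCH_ATTR_SUBMOUNT (1u << 0)
#define SKETCH_S_AUTOMOUNT   (1u << 11)

struct sketch_attr { uint32_t flags; };		/* attributes sent by the server */
struct sketch_conn { bool auto_submounts; };	/* per-connection setting */
struct sketch_node { uint32_t i_flags; };	/* flags on the in-kernel inode */

/* Mark the inode as an automount point only when the connection opted in. */
static void sketch_apply_attr(struct sketch_node *inode,
			      const struct sketch_attr *attr,
			      const struct sketch_conn *fc)
{
	if (fc->auto_submounts && (attr->flags & SKETCH_ATTR_SUBMOUNT))
		inode->i_flags |= SKETCH_S_AUTOMOUNT;
}
```

    A connection without auto_submounts ignores the server's flag, so
    existing FUSE servers keep their current behavior.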

    Signed-off-by: Max Reitz
    Signed-off-by: Miklos Szeredi

    Max Reitz
     

25 Sep, 2020

2 commits

  • Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
    make the checks more obvious. Also remove the pointless
    bdi_cap_account_writeback wrapper that just obfuscates the check.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Set up a readahead size by default, as very few users have a good
    reason to change it. This means coda, ecryptfs, and orangefs now
    set up the values while they were previously missing it, while ubifs,
    mtd and vboxsf manually set it to 0 to avoid readahead.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Acked-by: David Sterba [btrfs]
    Acked-by: Richard Weinberger [ubifs, mtd]
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Sep, 2020

6 commits

  • Submounts have their own superblock, which needs to be initialized.
    However, they do not have a fuse_fs_context associated with them, and
    the root node's attributes should be taken from the mountpoint's node.

    Extend fuse_fill_super_common() to work for submounts by making the @ctx
    parameter optional, and by adding a @submount_finode parameter.

    (There is a plain "unsigned" in an existing code block that is being
    indented by this commit. Extend it to "unsigned int" so checkpatch does
    not complain.)

    Signed-off-by: Max Reitz
    Signed-off-by: Miklos Szeredi

    Max Reitz
     
  • We want to allow submounts for the same fuse_conn, but with different
    superblocks so that each of the submounts has its own device ID. To do
    so, we need to split all mount-specific information off of fuse_conn
    into a new fuse_mount structure, so that multiple mounts can share a
    single fuse_conn.

    We need to take care only to perform connection-level actions once (i.e.
    when the fuse_conn and thus the first fuse_mount are established, or
    when the last fuse_mount and thus the fuse_conn are destroyed). For
    example, fuse_sb_destroy() must not invoke fuse_send_destroy() until the
    last superblock is released.

    To do so, we keep track of which fuse_mount is the root mount and
    perform all fuse_conn-level actions only when this fuse_mount is
    involved.
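
    One way to picture "connection-level actions happen once" is a counter of
    mounts on the shared connection. This is a sketch under that assumption,
    with hypothetical names; the text above describes tracking the root mount,
    which is a different bookkeeping choice with the same goal.

```c
#include <stdbool.h>

struct sketch_conn {
	int num_mounts;		/* mounts currently sharing the connection */
	int destroy_sent;	/* how often connection-level teardown ran */
};

struct sketch_mount {
	struct sketch_conn *fc;
};

static void sketch_mount_add(struct sketch_mount *fm, struct sketch_conn *fc)
{
	fm->fc = fc;
	fc->num_mounts++;
}

/* Returns true when the caller just removed the last mount. */
static bool sketch_mount_remove(struct sketch_mount *fm)
{
	return --fm->fc->num_mounts == 0;
}

/* Per-superblock teardown; the connection is torn down only once. */
static void sketch_sb_destroy(struct sketch_mount *fm)
{
	if (sketch_mount_remove(fm))
		fm->fc->destroy_sent++;	/* stand-in for the destroy message */
}
```

    Destroying a submount first leaves the connection alive; only the final
    superblock release triggers the teardown.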

    Signed-off-by: Max Reitz
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Miklos Szeredi

    Max Reitz
     
  • With the last commit, all functions that handle some existing fuse_req
    no longer need to be given the associated fuse_conn, because they can
    get it from the fuse_req object.

    Signed-off-by: Max Reitz
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Miklos Szeredi

    Max Reitz
     
  • Every fuse_req belongs to a fuse_conn. Right now, we always know which
    fuse_conn that is based on the respective device, but we want to allow
    multiple (sub)mounts per single connection, and then the corresponding
    filesystem is not going to be so trivial to obtain.

    Storing a pointer to the associated fuse_conn in every fuse_req will
    allow us to trivially find any request's superblock (and thus
    filesystem) even then.

    Signed-off-by: Max Reitz
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Miklos Szeredi

    Max Reitz
     
  • After unlock_request() pages from the ap->pages[] array may be put (e.g. by
    aborting the connection) and the pages can be freed.

    Prevent use after free by grabbing a reference to the page before calling
    unlock_request().

    The original patch was created by Pradeep P V K.

    Reported-by: Pradeep P V K
    Cc:
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • the callers rely upon having any iov_iter_truncate() done inside
    ->direct_IO() countered by iov_iter_reexpand().

    Reported-by: Qian Cai
    Tested-by: Qian Cai
    Signed-off-by: Al Viro

    Al Viro
     

10 Sep, 2020

12 commits

  • Add logic to free up a busy memory range. Freed memory range will be
    returned to free pool. Add a worker which can be started to select
    and free some busy memory ranges.

    A process can also steal one of its busy dax ranges if a free range is not
    available. I will refer to this as direct reclaim.

    If a free range is not available and nothing can be stolen from the same
    inode, the caller waits on a waitq for a free range to become available.

    For reclaiming a range, as of now we need to hold following locks in
    specified order.

    down_write(&fi->i_mmap_sem);
    down_write(&fi->dax->sem);

    We look for a free range in following order.

    A. Try to get a free range.
    B. If not, try direct reclaim.
    C. If not, wait for a memory range to become free
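
    The A/B/C order can be sketched as a small decision function. The types
    and names are hypothetical, and locking, the worker and the wait queue are
    left out; the sketch only shows which source a caller ends up using.

```c
enum sketch_source {
	SKETCH_FROM_FREE,	/* A: taken from the free pool */
	SKETCH_FROM_RECLAIM,	/* B: stolen from the same inode */
	SKETCH_MUST_WAIT	/* C: caller has to sleep until a range is freed */
};

struct sketch_pool { int nr_free; };	/* free ranges in the pool */
struct sketch_file { int nr_busy; };	/* ranges currently used by the inode */

static enum sketch_source sketch_get_range(struct sketch_pool *pool,
					   struct sketch_file *inode)
{
	if (pool->nr_free > 0) {
		pool->nr_free--;
		inode->nr_busy++;
		return SKETCH_FROM_FREE;
	}
	/* Direct reclaim reuses one of the inode's own ranges: count unchanged. */
	if (inode->nr_busy > 0)
		return SKETCH_FROM_RECLAIM;
	return SKETCH_MUST_WAIT;
}
```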

    Signed-off-by: Vivek Goyal
    Signed-off-by: Liu Bo
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • This list will be used for selecting a fuse_dax_mapping to free when the
    number of free mappings drops below a threshold.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Currently in fuse we don't seem to have any lock which can serialize the
    fault path with the truncate/punch_hole path. With dax support I need one
    for the following reasons.

    1. Dax requirement

    DAX fault code relies on inode size being stable for the duration of the
    fault and wants to serialize with truncate/punch_hole; the code explicitly
    mentions it.

    static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                                          const struct iomap_ops *ops)
            /*
             * Check whether offset isn't beyond end of file now. Caller is
             * supposed to hold locks serializing us with truncate / punch hole so
             * this is a reliable test.
             */
            max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

    2. Make sure there are no users of pages being truncated/punch_hole

    get_user_pages() might take references to page and then do some DMA
    to said pages. Filesystem might truncate those pages without knowing
    that a DMA is in progress or some I/O is in progress. So use
    dax_layout_busy_page() to make sure there are no such references
    and I/O is not in progress on said pages before moving ahead with
    truncation.

    3. Limitation of kvm page fault error reporting

    If we are truncating file on host first and then removing mappings in
    guest later (truncate page cache etc), then this could lead to a
    problem with KVM. Say a mapping is in place in guest and truncation
    happens on host. Now if guest accesses that mapping, then host will
    take a fault and kvm will either exit to qemu or spin infinitely.

    IOW, before we do truncation on host, we need to make sure that guest
    inode does not have any mapping in that region or whole file.

    4. virtiofs memory range reclaim

    Soon I will introduce the notion of being able to reclaim dax memory
    ranges from a fuse dax inode. There also I need to make sure that
    no I/O or fault is going on in the reclaimed range and nobody is using
    it so that range can be reclaimed without issues.

    Currently if we take inode lock, that serializes read/write. But it does
    not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
    for this purpose. It can be used to serialize with faults.

    As of now, I am taking this semaphore only in the dax fault path and
    not the regular fault path because existing code does not have one. Maybe
    existing code can benefit from it as well to take care of some
    races, but that we can fix later if need be. For now, I am focusing
    only on the DAX path, which is the new path.

    Also added logic to take fuse_inode->i_mmap_sem in the
    truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
    fuse dax fault are mutually exclusive and avoid all the above problems.

    Signed-off-by: Vivek Goyal
    Cc: Dave Chinner
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • This is done along the lines of ext4 and xfs. I primarily wanted
    ->writepages hook at this time so that I could call into
    dax_writeback_mapping_range(). This in turn will decide which pfns need to
    be written back.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Add DAX mmap() support.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Miklos Szeredi

    Stefan Hajnoczi
     
  • This patch implements basic DAX support. mmap() is not implemented
    yet and will come in later patches. This patch looks into implementing
    read/write.

    We make use of interval tree to keep track of per inode dax mappings.

    Do not use dax for file extending writes, instead just send WRITE message
    to daemon (like we do for direct I/O path). This will keep write and
    i_size change atomic w.r.t crash.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Dr. David Alan Gilbert
    Signed-off-by: Vivek Goyal
    Signed-off-by: Liu Bo
    Signed-off-by: Peng Tao
    Cc: Dave Chinner
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • The device communicates FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING alignment
    constraints via the FUSE_INIT map_alignment field. Parse this field and
    ensure our DAX mappings meet the alignment constraints.

    We don't actually align anything differently since our mappings are
    already 2MB aligned. Just check the value when the connection is
    established. If it becomes necessary to honor arbitrary alignments in
    the future we'll have to adjust how mappings are sized.

    The upshot of this commit is that we can be confident that mappings will
    work even when emulating x86 on Power and similar combinations where the
    host page sizes are different.
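
    The check itself can be very small. The sketch below assumes that
    map_alignment is expressed as a log2 value and that mappings are 2MB
    (shift 21), as stated above; the names are hypothetical and the encoding
    of the field is an assumption of this example.

```c
#include <stdbool.h>

#define SKETCH_DAX_SHIFT 21	/* 2MB mappings, per the text above */

/*
 * Assumption: map_alignment is log2 of the required byte alignment.
 * A 2MB-aligned mapping satisfies any power-of-two alignment up to 2MB.
 */
static bool sketch_alignment_ok(unsigned int map_alignment)
{
	return map_alignment <= SKETCH_DAX_SHIFT;
}
```

    For example, a 64KB host page size (shift 16) passes, while a
    hypothetical 4MB requirement (shift 22) would be rejected when the
    connection is established.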

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Stefan Hajnoczi
     
  • Divide the dax memory range into fixed size ranges (2MB for now) and put
    them in a list. This will track free ranges. Once an inode requires a
    free range, we will take one from here and put it in interval-tree
    of ranges assigned to inode.
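
    A minimal sketch of carving a window into fixed-size ranges and linking
    them into a free list. The struct, the function name and the
    caller-provided array are hypothetical simplifications; the interval tree
    of assigned ranges is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RANGE_SZ (2u * 1024 * 1024)	/* 2MB per range */

struct sketch_range {
	uint64_t window_offset;		/* byte offset inside the dax window */
	struct sketch_range *next;	/* free-list link */
};

/*
 * Split window_len bytes into whole 2MB ranges, using at most max entries
 * of the caller-provided array, and link them into *free_list in order.
 * A trailing partial range is ignored. Returns the number of ranges.
 */
static size_t sketch_init_ranges(struct sketch_range *ranges, size_t max,
				 uint64_t window_len,
				 struct sketch_range **free_list)
{
	size_t nr = (size_t)(window_len / SKETCH_RANGE_SZ);
	size_t i;

	if (nr > max)
		nr = max;

	*free_list = NULL;
	for (i = nr; i-- > 0; ) {
		ranges[i].window_offset = (uint64_t)i * SKETCH_RANGE_SZ;
		ranges[i].next = *free_list;
		*free_list = &ranges[i];
	}
	return nr;
}
```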

    Signed-off-by: Vivek Goyal
    Signed-off-by: Peng Tao
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Add a mount option to allow using dax with virtio_fs.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Setup a dax device.

    Use the shm capability to find the cache entry and map it.

    The DAX window is accessed by the fs/dax.c infrastructure and must have
    struct pages (at least on x86). Use devm_memremap_pages() to map the
    DAX window PCI BAR and allocate struct page.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Dr. David Alan Gilbert
    Signed-off-by: Vivek Goyal
    Signed-off-by: Sebastien Boeuf
    Signed-off-by: Liu Bo
    Signed-off-by: Miklos Szeredi

    Stefan Hajnoczi
     
  • This option was introduced so that for virtio_fs we don't show any mount
    options in fuse_show_options(), because we don't offer any of these options
    to be controlled by the mounter.

    Very soon we are planning to introduce the option "dax", which the mounter
    should be able to specify, so no_mount_options does not work anymore.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • This reduces code duplication and makes the code a little easier to read.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

04 Sep, 2020

1 commit

  • As stated in https://sourceforge.net/projects/fuse/, "the FUSE project has
    moved to https://github.com/libfuse/" on 22-Dec-2015. Update URLs to
    reflect this.

    Signed-off-by: André Almeida
    Signed-off-by: Miklos Szeredi

    André Almeida
     

12 Aug, 2020

1 commit

  • Pull virtio updates from Michael Tsirkin:

    - IRQ bypass support for vdpa and IFC

    - MLX5 vdpa driver

    - Endianness fixes for virtio drivers

    - Misc other fixes

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
    vdpa/mlx5: fix up endian-ness for mtu
    vdpa: Fix pointer math bug in vdpasim_get_config()
    vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
    vdpa/mlx5: fix memory allocation failure checks
    vdpa/mlx5: Fix uninitialised variable in core/mr.c
    vdpa_sim: init iommu lock
    virtio_config: fix up warnings on parisc
    vdpa/mlx5: Add VDPA driver for supported mlx5 devices
    vdpa/mlx5: Add shared memory registration code
    vdpa/mlx5: Add support library for mlx5 VDPA implementation
    vdpa/mlx5: Add hardware descriptive header file
    vdpa: Modify get_vq_state() to return error code
    net/vdpa: Use struct for set/get vq state
    vdpa: remove hard coded virtq num
    vdpasim: support batch updating
    vhost-vdpa: support IOTLB batching hints
    vhost-vdpa: support get/set backend features
    vhost: generialize backend features setting/getting
    vhost-vdpa: refine ioctl pre-processing
    vDPA: dont change vq irq after DRIVER_OK
    ...

    Linus Torvalds
     

05 Aug, 2020

2 commits

  • Virtio fs is modern-only. Use LE accessors for config space.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

15 Jul, 2020

1 commit

  • The ioctl encoding for this parameter is a long but the documentation says
    it should be an int and the kernel drivers expect it to be an int. If the
    fuse driver treats this as a long it might end up scribbling over the stack
    of a userspace process that only allocated enough space for an int.

    This was previously discussed in [1] and a patch for fuse was proposed in
    [2]. From what I can tell the patch in [2] was nacked in favor of adding
    new, "fixed" ioctls and using those from userspace. However there is still
    no "fixed" version of these ioctls and the fact is that it's sometimes
    infeasible to change all userspace to use the new one.

    Handling the ioctls specially in the fuse driver seems like the most
    pragmatic way for fuse servers to support them without causing crashes in
    userspace applications that call them.

    [1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
    [2]: https://sourceforge.net/p/fuse/mailman/message/31771759/
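
    The size mismatch can be illustrated on Linux with the ioctl encoding
    macros. This sketch assumes the affected commands are FS_IOC_GETFLAGS and
    FS_IOC_SETFLAGS (the text above does not name them), and the helper name
    is hypothetical. It only shows the idea of overriding the encoded size
    for a few known commands.

```c
#include <stddef.h>
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, _IOC_SIZE */

/*
 * Return how many bytes of the user buffer may be touched for cmd.
 * The encoded size is trusted except for commands that are known to be
 * encoded as long while callers pass an int (assumed set, see lead-in).
 */
static size_t sketch_ioctl_arg_size(unsigned int cmd)
{
	switch (cmd) {
	case FS_IOC_GETFLAGS:
	case FS_IOC_SETFLAGS:
		return sizeof(int);
	default:
		return _IOC_SIZE(cmd);
	}
}
```

    On a 64-bit system the encoded size of these commands is 8 bytes, so
    writing the full encoded size into a 4-byte int would overrun the
    caller's variable.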

    Signed-off-by: Chirantan Ekbote
    Fixes: 59efec7b9039 ("fuse: implement ioctl support")
    Cc:
    Signed-off-by: Miklos Szeredi

    Chirantan Ekbote
     

14 Jul, 2020

7 commits

  • fuse_writepages() ignores some errors taken from fuse_writepages_fill(). I
    believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
    either guarantee that all data was successfully saved or return an error.

    Fixes: 26d614df1da9 ("fuse: Implement writepages callback")
    Signed-off-by: Vasily Averin
    Signed-off-by: Miklos Szeredi

    Vasily Averin
     
  • fuse_writepages_fill() uses the following construction:

    if (wpa && ap->num_pages &&
        (A || B || C)) {
            action;
    } else if (wpa && D) {
            if (E) {
                    the same action;
            }
    }

    - ap->num_pages check is always true and can be removed

    - "if" and "else if" call the same action and can be merged.

    Move checking A, B, C, D, E conditions to a helper, add comments.
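
    The merge can be checked mechanically. The sketch below uses plain
    booleans for the conditions A to E and hypothetical function names; the
    ap->num_pages test is dropped because the text above states it is always
    true. It is an illustration of the refactoring, not the kernel helper.

```c
#include <stdbool.h>

/* Original shape: two branches that end up performing the same action. */
static bool sketch_action_old(bool wpa, bool a, bool b, bool c, bool d, bool e)
{
	if (wpa && (a || b || c)) {
		return true;
	} else if (wpa && d) {
		if (e)
			return true;
	}
	return false;
}

/* Merged helper: a single predicate that decides whether to act. */
static bool sketch_need_action(bool a, bool b, bool c, bool d, bool e)
{
	return a || b || c || (d && e);
}

/* Caller after the refactoring. */
static bool sketch_action_new(bool wpa, bool a, bool b, bool c, bool d, bool e)
{
	return wpa && sketch_need_action(a, b, c, d, e);
}
```

    Both versions agree for all 64 input combinations, so the helper can
    replace the nested branches without changing behavior.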

    Original-patch-by: Vasily Averin
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Previous patch changed handling of remount/reconfigure to ignore all
    options, including those that are unknown to the fuse kernel fs. This was
    done for backward compatibility, but this likely only affects the old
    mount(2) API.

    The new fsconfig(2) based reconfiguration could possibly be improved. This
    would make the new API less of a drop in replacement for the old, OTOH this
    is a good chance to get rid of some weirdnesses in the old API.

    Several other behaviors might make sense:

    1) unknown options are rejected, known options are ignored

    2) unknown options are rejected, known options are rejected if the value
    is changed, allowed otherwise

    3) all options are rejected

    Prior to the backward compatibility fix to ignore all options all known
    options were accepted (1), even if they change the value of a mount
    parameter; fuse_reconfigure() does not look at the config values set by
    fuse_parse_param().

    To fix that we'd need to verify that the value provided is the same as set
    in the initial configuration (2). The major drawback is that this is much
    more complex than just rejecting all attempts at changing options (3);
    i.e. all options signify initial configuration values and don't make sense
    on reconfigure.

    This patch opts for (3) with the rationale that no mount options are
    reconfigurable in fuse.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • The command

    mount -o remount -o unknownoption /mnt/fuse

    succeeds on kernel versions prior to v5.4 and fails on kernel versions at or
    after it. This is because fuse_parse_param() rejects any unrecognised options
    in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

    This causes a regression in case the fuse filesystem is in fstab, since
    remount sends all options found there to the kernel; even ones that are
    meant for the initial mount and are consumed by the userspace fuse server.

    Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
    the conversion to the new API.

    Reported-by: Stefan Priebe
    Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
    Cc: # v5.4
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • s_op->remount_fs() is only called from legacy_reconfigure(), which is not
    used after being converted to the new API.

    Convert to using ->reconfigure(). This restores the previous behavior of
    syncing the filesystem and rejecting MS_MANDLOCK on remount.

    Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
    Cc: # v5.4
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
    triggers the following warning:

    WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
    RIP: 0010:tree_insert+0xab/0xc0 [fuse]
    Call Trace:
    fuse_writepages_fill+0x5da/0x6a0 [fuse]
    write_cache_pages+0x171/0x470
    fuse_writepages+0x8a/0x100 [fuse]
    do_writepages+0x43/0xe0

    Fix up the warning and clean up the code around rb-tree insertion:

    - Rename tree_insert() to fuse_insert_writeback() and make it return the
    conflicting entry in case of failure

    - Re-add tree_insert() as a wrapper around fuse_insert_writeback()

    - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
    the meaning of the return value to mean

    + "true" in case the writepage entry was successfully added

    + "false" in case it was in-flight queued on an existing writepage
    entry's auxiliary list or the existing writepage entry's temporary
    page updated

    Switch from fuse_find_writeback() + tree_insert() to
    fuse_insert_writeback()

    - Move setting orig_pages to before inserting/updating the entry; this may
    result in the orig_pages value being discarded later in case of an
    in-flight request

    - In case of a new writepage entry use fuse_writepage_add()
    unconditionally, only set data->wpa if the entry was added.
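
    The "insert or return the conflicting entry" contract can be sketched with
    a plain binary search tree over page-index ranges. The kernel uses an
    rb-tree; the unbalanced tree, the struct and the function name here are
    simplifying assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_wb {
	unsigned long start, end;	/* inclusive page index range */
	struct sketch_wb *left, *right;
};

/*
 * Insert entry into the tree of non-overlapping ranges.
 * Returns NULL on success, or the already-present entry that overlaps.
 */
static struct sketch_wb *sketch_insert_writeback(struct sketch_wb **root,
						 struct sketch_wb *entry)
{
	struct sketch_wb **p = root;

	assert(entry->start <= entry->end);

	while (*p) {
		struct sketch_wb *curr = *p;

		if (entry->end < curr->start)
			p = &curr->left;
		else if (entry->start > curr->end)
			p = &curr->right;
		else
			return curr;	/* overlap: report the conflicting entry */
	}
	entry->left = entry->right = NULL;
	*p = entry;
	return NULL;
}
```

    Returning the conflicting entry lets the caller queue the new request on
    it in a single tree walk, instead of doing a lookup followed by a
    separate insert.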

    Fixes: 6b2fb79963fb ("fuse: optimize writepages search")
    Reported-by: kernel test robot
    Original-patch-by: Vasily Averin
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • In fuse_writepage_end() the old writepages entry needs to be removed from
    the rbtree before inserting the new one, otherwise tree_insert() would
    fail. This is a very rare codepath and no reproducer exists.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi