23 Oct, 2019

1 commit

  • Fixes gcc '-Wunused-but-set-variable' warning:

    fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
    fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]

    It is not used since commit 7ee1e2e631db ("virtiofs: No need to check
    fpq->connected state")

    Reported-by: Hulk Robot
    Signed-off-by: zhengbin
    Signed-off-by: Miklos Szeredi

    zhengbin
     

21 Oct, 2019

5 commits

  • If regular request queue gets full, currently we sleep for a bit and
    retry submission in submitter's context. This assumes submitter is not
    holding any spin lock. But this assumption is not true for background
    requests. For background requests, we are called with fc->bg_lock held.

    This can lead to deadlock where one thread is trying submission with
    fc->bg_lock held while request completion thread has called
    fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
    request completion thread gets blocked, it does not make further progress
    and that means queue does not get empty and submitter can't submit more
    requests.

    To solve this issue, retry submission with the help of a worker, instead of
    retrying in submitter's context. We already do this for hiprio/forget
    requests.

    Reported-by: Chirantan Ekbote
    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
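
The deferral described above can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the kernel code: `sim_queue`, `sim_submit`, `sim_complete_one` and `sim_worker_run` are hypothetical names, the fixed-size ring stands in for the virtqueue, and the worker is modeled as a plain function call rather than a real workqueue item. The point it shows is that the submitter never sleeps or spins when the ring is full; it parks the request and returns, so any lock it holds is released before the retry happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_RING_SIZE   2   /* hypothetical tiny ring so "full" is easy to hit */
#define SIM_MAX_PENDING 16  /* hypothetical bound on deferred requests */

struct sim_queue {
    int ring[SIM_RING_SIZE];        /* stands in for the virtqueue */
    size_t ring_used;
    int pending[SIM_MAX_PENDING];   /* requests parked for the worker (FIFO) */
    size_t pending_head;
    size_t pending_count;
};

/* Try to place a request on the ring; false means the ring is full. */
static bool sim_try_enqueue(struct sim_queue *q, int req)
{
    if (q->ring_used == SIM_RING_SIZE)
        return false;
    q->ring[q->ring_used++] = req;
    return true;
}

/*
 * Submitter path: never sleeps. If the ring is full, the request is
 * parked on the pending list and the function returns immediately.
 * Returns true if the request went straight onto the ring.
 */
static bool sim_submit(struct sim_queue *q, int req)
{
    if (sim_try_enqueue(q, req))
        return true;
    assert(q->pending_count < SIM_MAX_PENDING);
    q->pending[(q->pending_head + q->pending_count) % SIM_MAX_PENDING] = req;
    q->pending_count++;
    return false;
}

/* Completion path: the oldest request on the ring finishes, freeing a slot. */
static void sim_complete_one(struct sim_queue *q)
{
    assert(q->ring_used > 0);
    for (size_t i = 1; i < q->ring_used; i++)
        q->ring[i - 1] = q->ring[i];
    q->ring_used--;
}

/*
 * Worker path: runs without the submitter's locks and retries parked
 * requests in order until the ring fills up again.
 * Returns the number of requests moved onto the ring.
 */
static size_t sim_worker_run(struct sim_queue *q)
{
    size_t moved = 0;

    while (q->pending_count > 0 &&
           sim_try_enqueue(q, q->pending[q->pending_head])) {
        q->pending_head = (q->pending_head + 1) % SIM_MAX_PENDING;
        q->pending_count--;
        moved++;
    }
    return moved;
}
```

In this model, a third `sim_submit()` on a full two-slot ring parks the request, and a later `sim_worker_run()` moves it onto the ring once `sim_complete_one()` has freed a slot.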
     
  • If virtqueue is full, we put forget requests on a list and these forgets
    are dispatched later using a worker. As of now we don't count these forgets
    in fsvq->in_flight variable. This means when queue is being drained, we
    have to have special logic to first drain these pending requests and then
    wait for fsvq->in_flight to go to zero.

    By counting pending forgets in fsvq->in_flight, we can get rid of special
    logic and just wait for in_flight to go to zero. Worker thread will kick
    and drain all the forgets anyway, leading in_flight to zero.

    I also need similar logic for normal request queue in next patch where I am
    about to defer request submission in the worker context if queue is full.

    This simplifies the code a bit.

    Also add two helper functions to inc/dec in_flight. Decrement in_flight
    helper will later be used to call completion when in_flight reaches zero.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
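
A minimal sketch of the counting idea, assuming hypothetical names (`sim_vq`, `sim_inc_in_flight`, `sim_dec_in_flight`, `sim_is_drained`) rather than the actual kernel helpers: every request is counted from the moment it is accepted, whether it sits on the pending list or on the virtqueue, so the drain condition collapses to a single check. The `drained` flag is only a stand-in for the completion that the commit message says will be signalled later.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_vq {
    unsigned int in_flight; /* requests on the virtqueue plus deferred ones */
    bool drained;           /* stand-in for signalling a completion */
};

/* Count a request as soon as it is accepted, even if it is only deferred. */
static void sim_inc_in_flight(struct sim_vq *vq)
{
    vq->in_flight++;
    vq->drained = false;
}

/* Drop the count when a request finishes; note when the queue goes idle. */
static void sim_dec_in_flight(struct sim_vq *vq)
{
    assert(vq->in_flight > 0);
    if (--vq->in_flight == 0)
        vq->drained = true;
}

/* Drain check: no separate pass over the pending list is needed. */
static bool sim_is_drained(const struct sim_vq *vq)
{
    return vq->in_flight == 0;
}
```

Because deferred forgets are included in the count, a drain loop in this model only has to wait for `sim_is_drained()` to become true.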
     
  • FR_SENT flag should be set when request has been successfully sent
    over virtqueue. This is used by interrupt logic to figure out if interrupt
    request should be sent or not.

    Also add it to fpq->processing list after sending it successfully.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • In virtiofs we keep per queue connected state in virtio_fs_vq->connected
    and use that to end request if queue is not connected. And virtiofs does
    not even touch fpq->connected state.

    We probably need to merge these two at some point. For now,
    simplify the code a bit and do not worry about checking state of
    fpq->connected.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     
  • Submission context can hold some locks which end request code tries to hold
    again and deadlock can occur. For example, fc->bg_lock. If a background
    request is being submitted, it might hold fc->bg_lock and if we could not
    submit request (because device went away) and tried to end request, then
    deadlock happens. During testing, I also got a warning from deadlock
    detection code.

    So put requests on a list and end requests from a worker thread.

    I got following warning from deadlock detector.

    [ 603.137138] WARNING: possible recursive locking detected
    [ 603.137142] --------------------------------------------
    [ 603.137144] blogbench/2036 is trying to acquire lock:
    [ 603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
    [ 603.140701]
    [ 603.140701] but task is already holding lock:
    [ 603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
    [ 603.140713]
    [ 603.140713] other info that might help us debug this:
    [ 603.140714] Possible unsafe locking scenario:
    [ 603.140714]
    [ 603.140715] CPU0
    [ 603.140716] ----
    [ 603.140716] lock(&(&fc->bg_lock)->rlock);
    [ 603.140718] lock(&(&fc->bg_lock)->rlock);
    [ 603.140719]
    [ 603.140719] *** DEADLOCK ***

    Signed-off-by: Vivek Goyal
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

15 Oct, 2019

1 commit


19 Sep, 2019

1 commit

  • Add a basic file system module for virtio-fs. This does not yet contain
    shared data support between host and guest or metadata coherency speedups.
    However it is already significantly faster than virtio-9p.

    Design Overview
    ===============

    With the goal of designing something with better performance and local file
    system semantics, a bunch of ideas were proposed.

    - Use fuse protocol (instead of 9p) for communication between guest and
      host. Guest kernel will be fuse client and a fuse server will run on
      host to serve the requests.

    - For data access inside guest, mmap portion of file in QEMU address space
      and guest accesses this memory using dax. That way guest page cache is
      bypassed and there is only one copy of data (on host). This will also
      enable mmap(MAP_SHARED) between guests.

    - For metadata coherency, there is a shared memory region which contains
      version number associated with metadata and any guest changing metadata
      updates version number and other guests refresh metadata on next access.
      This is yet to be implemented.

    How virtio-fs differs from existing approaches
    ==============================================

    The unique idea behind virtio-fs is to take advantage of the co-location of
    the virtual machine and hypervisor to avoid communication (vmexits).

    DAX allows file contents to be accessed without communication with the
    hypervisor. The shared memory region for metadata avoids communication in
    the common case where metadata is unchanged.

    By replacing expensive communication with cheaper shared memory accesses,
    we expect to achieve better performance than approaches based on network
    file system protocols. In addition, this also makes it easier to achieve
    local file system semantics (coherency).

    These techniques are not applicable to network file system protocols since
    the communications channel is bypassed by taking advantage of shared memory
    on a local machine. This is why we decided to build virtio-fs rather than
    focus on 9P or NFS.

    Caching Modes
    =============

    Like virtio-9p, different caching modes are supported which determine the
    coherency level as well. The “cache=FOO” and “writeback” options control
    the level of coherence between the guest and host filesystems.

    - cache=none
      metadata, data and pathname lookup are not cached in guest. They are
      always fetched from host and any changes are immediately pushed to host.

    - cache=always
      metadata, data and pathname lookup are cached in guest and never expire.

    - cache=auto
      metadata and pathname lookup cache expires after a configured amount of
      time (default is 1 second). Data is cached while the file is open
      (close to open consistency).

    - writeback/no_writeback
      These options control the writeback strategy. If writeback is disabled,
      then normal writes will immediately be synchronized with the host fs.
      If writeback is enabled, then writes may be cached in the guest until
      the file is closed or an fsync(2) performed. This option has no effect
      on mmap-ed writes or writes going through the DAX mechanism.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Vivek Goyal
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Miklos Szeredi

    Stefan Hajnoczi
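
The metadata side of the three cache modes described above can be restated as a small decision function. This is an illustrative sketch only: the enum, the function names and the strict "less than" comparison at the timeout boundary are assumptions made for the example, not the way the kernel or the host daemon implements the policy. Only the mode names and the 1 second default come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_DEFAULT_TIMEOUT_S 1.0 /* "default is 1 second" for cache=auto */

enum sim_cache_mode {
    SIM_CACHE_NONE,
    SIM_CACHE_AUTO,
    SIM_CACHE_ALWAYS,
    SIM_CACHE_INVALID
};

/* Map the value of a "cache=FOO" option to a mode (hypothetical parser). */
static enum sim_cache_mode sim_parse_cache_mode(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "none") == 0)
        return SIM_CACHE_NONE;
    if (strcmp(value, "auto") == 0)
        return SIM_CACHE_AUTO;
    if (strcmp(value, "always") == 0)
        return SIM_CACHE_ALWAYS;
    return SIM_CACHE_INVALID;
}

/*
 * Decide whether a cached metadata or lookup entry of the given age
 * may still be used, or whether it has to be fetched from the host again.
 */
static bool sim_metadata_cache_valid(enum sim_cache_mode mode,
                                     double age_s, double timeout_s)
{
    switch (mode) {
    case SIM_CACHE_ALWAYS:
        return true;               /* cached entries never expire */
    case SIM_CACHE_AUTO:
        return age_s < timeout_s;  /* usable until the timeout passes */
    case SIM_CACHE_NONE:
    default:
        return false;              /* always fetched from the host */
    }
}
```

For example, with `cache=auto` and the default timeout, an entry that is 0.5 seconds old is reused in this model, while one that is 2 seconds old is refreshed from the host.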