09 Feb, 2017

1 commit

  • commit cda8bba0f99d25d2061c531113c14fa41effc3ae upstream.

    Currently, under certain circumstances vhost_init_is_le does just a part
    of the initialization job, and depends on vhost_reset_is_le being called
    too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
    when vq->private_data is NULL. This is not only counterintuitive, but
    also a real problem because it breaks vhost_net. The bug was introduced to
    vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for
    legacy devices"). The symptom is corruption of the vq's used.idx field
    (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
    shutdown on a vq with pending descriptors.

    Let us make sure the outcome of vhost_init_is_le never depends on the state
    it is actually supposed to initialize, and fix virtio_net by removing the
    reset from vhost_vq_init_access.

    With the above, there is no reason for vhost_reset_is_le to do just half
    of the job. Let us make vhost_reset_is_le reinitialize is_le.

    Signed-off-by: Halil Pasic
    Reported-by: Michael A. Tebolt
    Reported-by: Dr. David Alan Gilbert
    Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices")
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Greg Kurz
    Tested-by: Michael A. Tebolt
    Signed-off-by: Greg Kroah-Hartman

    Halil Pasic
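
The dependency described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct vq_model`, `model_init_is_le()` and `model_reset()` are hypothetical names, and the real kernel helpers work on `struct vhost_virtqueue` and may differ in detail. The point it tries to show is that the init helper derives `is_le` purely from its inputs, so its result does not depend on a reset having run first, and the reset can simply reuse it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few virtqueue fields that matter here. */
struct vq_model {
    bool version_1; /* models VIRTIO_F_VERSION_1 having been negotiated */
    bool user_be;   /* models a legacy vring that is big endian */
    bool is_le;     /* derived: is the vring laid out little endian? */
};

/* Computes is_le only from its inputs, never from the old is_le value. */
static void model_init_is_le(struct vq_model *vq)
{
    assert(vq != NULL);
    vq->is_le = vq->version_1 || !vq->user_be;
}

/* Reset puts the inputs back to their defaults, then reuses the init helper. */
static void model_reset(struct vq_model *vq, bool host_is_le)
{
    assert(vq != NULL);
    vq->version_1 = false;
    vq->user_be = !host_is_le; /* assumed legacy default: native endian */
    model_init_is_le(vq);
}
```

With this shape, calling `model_init_is_le()` on a queue whose `is_le` holds a stale value still yields the right answer, which is the property the commit message asks for.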
     

09 Dec, 2016

1 commit

  • local_addr.svm_cid is host cid. We should check guest cid instead,
    which is remote_addr.svm_cid. Otherwise we end up resetting all
    connections to all guests.

    Cc: stable@vger.kernel.org [4.8+]
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Peng Tao
    Signed-off-by: David S. Miller

    Peng Tao
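
A minimal sketch of the check this commit describes, written as a user-space model. `struct conn_model` and `reset_conns_for_guest()` are hypothetical names used only for illustration. On the host side every connection has the host CID as its local CID, so matching on the local CID would select every connection, while matching on the remote CID selects only the connections of one guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a host-side vsock connection. */
struct conn_model {
    uint32_t local_cid;  /* host CID, identical for all host-side connections */
    uint32_t remote_cid; /* CID of the guest at the other end */
    bool reset;
};

/* Marks only the connections whose peer is guest_cid; returns how many. */
static size_t reset_conns_for_guest(struct conn_model *conns, size_t n,
                                    uint32_t guest_cid)
{
    size_t count = 0;

    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].remote_cid != guest_cid)
            continue; /* belongs to another guest, leave it alone */
        conns[i].reset = true;
        count++;
    }
    return count;
}
```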
     

05 Sep, 2016

1 commit


31 Aug, 2016

1 commit

  • Many modules call misc_register and misc_deregister in their module init
    and exit methods without any additional code. This ends up being
    boilerplate. This patch adds a helper macro, module_misc_device(), that
    replaces module_init()/module_exit() with template functions.

    This patch also converts drivers to use the new macro.

    Change since v1:
    Add device.h include in miscdevice.h as module_driver macro was not
    available from other include files in some architectures.

    Signed-off-by: PrasannaKumar Muralidharan
    Signed-off-by: Greg Kroah-Hartman

    PrasannaKumar Muralidharan
     

23 Aug, 2016

1 commit

  • The address of the iovec &vq->iov[out] is not guaranteed to contain the scsi
    command's response iovec throughout the lifetime of the command. Rather, it
    is more likely to contain an iovec from an immediately following command
    after looping back around to vhost_get_vq_desc(). Pass along the iovec
    entirely instead.

    Fixes: 79c14141a487 ("vhost/scsi: Convert completion path to use copy_to_iter")
    Cc: stable@vger.kernel.org
    Signed-off-by: Benjamin Coddington
    Signed-off-by: Michael S. Tsirkin

    Benjamin Coddington
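
The lifetime problem can be sketched in user space as follows. This is an illustration only: `struct cmd_model` and `cmd_save_resp_iov()` are hypothetical names, and the scratch array merely stands in for a per-queue iovec array that is reused for each new descriptor. Copying the `struct iovec` by value gives the command its own description of the response buffer, so later reuse of the scratch array cannot change it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical command: it owns a copy of the response iovec. */
struct cmd_model {
    struct iovec resp_iov;
};

/*
 * Copy scratch[out] into the command by value. Storing &scratch[out]
 * instead would leave the command pointing at a slot that the next
 * command is free to overwrite.
 */
static void cmd_save_resp_iov(struct cmd_model *cmd,
                              const struct iovec *scratch, size_t out)
{
    assert(cmd != NULL && scratch != NULL);
    cmd->resp_iov = scratch[out];
}
```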
     

15 Aug, 2016

1 commit


09 Aug, 2016

1 commit

  • Stash the packet length in a local variable before handing over
    ownership of the packet to virtio_transport_recv_pkt() or
    virtio_transport_free_pkt().

    This patch solves the use-after-free since pkt is no longer guaranteed
    to be alive.

    Reported-by: Dan Carpenter
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Stefan Hajnoczi
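
The pattern in this fix can be shown with a short user-space sketch. The names `struct pkt_model`, `pkt_alloc()`, `pkt_consume()` and `handle_pkts()` are hypothetical; `pkt_consume()` stands in for any function that takes ownership of a packet and may free it. The length is read into a local variable before the hand-over, so nothing dereferences the packet afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet with just the field we care about. */
struct pkt_model {
    size_t len;
};

static struct pkt_model *pkt_alloc(size_t len)
{
    struct pkt_model *pkt = malloc(sizeof(*pkt));

    if (pkt)
        pkt->len = len;
    return pkt;
}

/* Takes ownership of pkt; the caller must not touch it afterwards. */
static void pkt_consume(struct pkt_model *pkt)
{
    free(pkt);
}

/* Consumes every packet and returns the total number of bytes handled. */
static size_t handle_pkts(struct pkt_model **pkts, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len;

        assert(pkts[i] != NULL);
        len = pkts[i]->len;   /* stash before giving the packet away */
        pkt_consume(pkts[i]); /* pkts[i] may be freed from here on */
        pkts[i] = NULL;
        total += len;         /* uses the stashed copy only */
    }
    return total;
}
```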
     

06 Aug, 2016

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:

    - new vsock device support in host and guest

    - platform IOMMU support in host and guest, including compatibility
    quirks for legacy systems.

    - misc fixes and cleanups.

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    VSOCK: Use kvfree()
    vhost: split out vringh Kconfig
    vhost: detect 32 bit integer wrap around
    vhost: new device IOTLB API
    vhost: drop vringh dependency
    vhost: convert pre sorted vhost memory array to interval tree
    vhost: introduce vhost memory accessors
    VSOCK: Add Makefile and Kconfig
    VSOCK: Introduce vhost_vsock.ko
    VSOCK: Introduce virtio_transport.ko
    VSOCK: Introduce virtio_vsock_common.ko
    VSOCK: defer sock removal to transports
    VSOCK: transport-specific vsock_transport functions
    vhost: drop vringh dependency
    vop: pull in vhost Kconfig
    virtio: new feature to detect IOMMU device quirk
    balloon: check the number of available pages in leak balloon
    vhost: lockless enqueuing
    vhost: simplify work flushing

    Linus Torvalds
     

02 Aug, 2016

11 commits

  • Use kvfree() instead of open-coding it.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Michael S. Tsirkin

    Wei Yongjun
     
  • vringh is pulled in by caif and mic, but the other
    vhost config does not need to be there.
    In particular, it makes no sense to have vhost net/scsi/sock
    under caif/mic.

    Create a separate Kconfig file and put vringh bits there.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • Detect and fail early if long wrap around is triggered.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • This patch tries to implement a device IOTLB for vhost. This could be
    used with a userspace (qemu) implementation of DMA remapping
    to emulate an IOMMU for the guest.

    The idea is simple: cache the translation in a software device IOTLB
    (which is implemented as an interval tree) in vhost and use the vhost_net
    file descriptor for reporting IOTLB misses and IOTLB
    update/invalidation. When vhost meets an IOTLB miss, the fault
    address, size and access can be read from the file. After userspace
    finishes the translation, it writes the translated address to the
    vhost_net file to update the device IOTLB.

    When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM, all vq
    addresses set by ioctl are treated as iova instead of virtual addresses, and
    they can only be accessed through the IOTLB instead of direct userspace
    memory access. Before each round of vq processing, all vq metadata is
    prefetched into the device IOTLB to make sure no translation fault happens
    during vq processing.

    In most cases, virtqueues are contiguous even in virtual address space.
    The IOTLB translation for the virtqueue itself may make it a little
    slower. We might add a fast path cache on top of this patch.

    Signed-off-by: Jason Wang
    [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
    [mst: fix build warnings ]
    Signed-off-by: Michael S. Tsirkin
    [ weiyj.lk: missing unlock on error ]
    Signed-off-by: Wei Yongjun

    Jason Wang
     
  • The current pre-sorted memory region array has some limitations for future
    device IOTLB conversion:

    1) it needs extra work for adding and removing a single region, which is
    expected to be slow because of sorting or memory re-allocation.
    2) it needs extra work for removing a large range which may intersect
    several regions of different sizes.
    3) it needs tricks for a replacement policy like LRU.

    To overcome the above shortcomings, this patch converts it to an interval
    tree, which can easily address the above issues with almost no extra
    work.

    The patch could be used for:

    - Extend the current API and let userspace send only diffs of the
    memory table.
    - Simplify the device IOTLB implementation.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch introduces vhost memory accessors, which are just wrappers
    for userspace address access helpers. This is a requirement for the vhost
    device IOTLB implementation, which will add IOTLB translations in those
    accessors.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • Enable virtio-vsock and vhost-vsock.

    Signed-off-by: Asias He
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Asias He
     
  • VM sockets vhost transport implementation. This driver runs on the
    host.

    Signed-off-by: Asias He
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Asias He
     
  • vringh isn't used by vhost net or scsi - it's used
    by CAIF only at the moment. Drop the dependency.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • We use a spinlock to synchronize the work list now, which may cause
    unnecessary contention. So this patch switches to llist to remove
    this contention. Pktgen tests show about 5% improvement:

    Before:
    ~1300000 pps
    After:
    ~1370000 pps

    Signed-off-by: Jason Wang
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • We used to implement work flushing by tracking the queued seq, the
    done seq, and the number of flushes in progress. This patch simplifies
    this by implementing work flushing through another kind of vhost work
    with a completion. This will be used by the lockless enqueuing patch.

    Signed-off-by: Jason Wang
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
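
The lockless enqueuing commit in the list above replaces a spinlock-protected work list with an llist. The general technique can be sketched in user space with C11 atomics. This is not the kernel code: `struct work_node`, `struct work_list` and the `work_list_*()` functions are hypothetical names chosen to mirror the roles of the kernel's `llist_add()`, `llist_del_all()` and `llist_reverse_order()`. Producers push with a compare-and-swap loop, and the single consumer detaches the whole list with one atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; id is only there to observe ordering. */
struct work_node {
    struct work_node *next;
    int id;
};

struct work_list {
    _Atomic(struct work_node *) first;
};

static void work_list_init(struct work_list *l)
{
    assert(l != NULL);
    atomic_init(&l->first, (struct work_node *)NULL);
}

/* Lock-free push: retry until our node becomes the new head. */
static void work_list_add(struct work_list *l, struct work_node *n)
{
    struct work_node *old = atomic_load(&l->first);

    do {
        n->next = old; /* old is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Detach every queued node in one step; result is newest-first. */
static struct work_node *work_list_del_all(struct work_list *l)
{
    return atomic_exchange(&l->first, (struct work_node *)NULL);
}

/* Restore submission order (oldest first) for the consumer. */
static struct work_node *work_list_reverse(struct work_node *head)
{
    struct work_node *prev = NULL;

    while (head) {
        struct work_node *next = head->next;

        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
```

Because the consumer takes the whole list at once, producers never wait on a lock held by the consumer, which is the contention the commit message refers to.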
     

01 Jul, 2016

1 commit

  • We used to queue tx packets in sk_receive_queue. This is less
    efficient since it requires spinlocks to synchronize between producer
    and consumer.

    This patch tries to address this by:

    - switch from sk_receive_queue to a skb_array, and resize it when
    tx_queue_len is changed.
    - introduce a new proto_ops peek_len, which is used for peeking at the
    skb length.
    - implement a tun version of peek_len for vhost_net to use and convert
    vhost_net to use peek_len if possible.

    Pktgen test shows about 15.3% improvement on guest receiving pps for small
    buffers:

    Before: ~1300000pps
    After : ~1500000pps

    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     

08 Jun, 2016

1 commit

  • We don't stop polling the rx socket during rx processing. This leads to
    unnecessary wakeups from underlying net devices (e.g.
    sock_def_readable() from tun), and rx is slowed down as a
    result. This patch avoids this by stopping socket polling during rx
    processing. A small drawback is that this introduces some overhead in
    the light load case because of the extra start/stop polling, but single
    netperf TCP_RR does not notice any change. In a super heavy load case,
    e.g. using pktgen to inject packets into the guest, we get about 8.8%
    improvement in pps:

    before: ~1240000 pkt/s
    after: ~1350000 pkt/s

    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
     

10 May, 2016

2 commits

  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Nicholas Bellinger

    Christoph Hellwig
     
  • Turns out the template and thus many drivers got the return value wrong:
    0 means the fabrics driver needs to put a session reference, which no
    driver except for the iSCSI target drivers did. Fortunately none of these
    drivers supports explicit Node ACLs, so the bug was harmless.

    Even without that only qla2xxx and iscsi ever did real work in
    shutdown_session, so get rid of the boilerplate code in all other
    drivers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Nicholas Bellinger

    Christoph Hellwig
     

23 Mar, 2016

1 commit

  • Pull SCSI target updates from Nicholas Bellinger:
    "The highlights this round include:

    - Add target_alloc_session() w/ callback helper for doing se_session
    allocation + tag + se_node_acl lookup. (HCH + nab)

    - Tree-wide fabric driver conversion to use target_alloc_session()

    - Convert sbp-target to use percpu_ida tag pre-allocation, and
    TARGET_SCF_ACK_KREF I/O krefs (Chris Boot + nab)

    - Convert usb-gadget to use percpu_ida tag pre-allocation, and
    TARGET_SCF_ACK_KREF I/O krefs (Andrzej Pietrasiewicz + nab)

    - Convert xen-scsiback to use percpu_ida tag pre-allocation, and
    TARGET_SCF_ACK_KREF I/O krefs (Juergen Gross + nab)

    - Convert tcm_fc to use TARGET_SCF_ACK_KREF I/O + TMR krefs

    - Convert ib_srpt to use percpu_ida tag pre-allocation

    - Add DebugFS node for qla2xxx target sess list (Quinn)

    - Rework iser-target connection termination (Jenny + Sagi)

    - Convert iser-target to new CQ API (HCH)

    - Add pass-through WRITE_SAME support for IBLOCK (Mike Christie)

    - Introduce data_bitmap for asynchronous access of data area (Sheng
    Yang + Andy)

    - Fix target_release_cmd_kref shutdown comp leak (Himanshu Madhani)

    Also, there is a separate PULL request coming for cxgb4 NIC driver
    prerequisites for supporting hw iscsi segmentation offload (ISO), that
    will be the base for a number of v4.7 developments involving
    iscsi-target hw offloads"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (36 commits)
    target: Fix target_release_cmd_kref shutdown comp leak
    target: Avoid DataIN transfers for non-GOOD SAM status
    target/user: Report capability of handling out-of-order completions to userspace
    target/user: Fix size_t format-spec build warning
    target/user: Don't free expired command when time out
    target/user: Introduce data_bitmap, replace data_length/data_head/data_tail
    target/user: Free data ring in unified function
    target/user: Use iovec[] to describe continuous area
    target: Remove enum transport_lunflags_table
    target/iblock: pass WRITE_SAME to device if possible
    iser-target: Kill the ->isert_cmd back pointer in struct iser_tx_desc
    iser-target: Kill struct isert_rdma_wr
    iser-target: Convert to new CQ API
    iser-target: Split and properly type the login buffer
    iser-target: Remove ISER_RECV_DATA_SEG_LEN
    iser-target: Remove impossible condition from isert_wait_conn
    iser-target: Remove redundant wait in release_conn
    iser-target: Rework connection termination
    iser-target: Separate flows for np listeners and connections cma events
    iser-target: Add new state ISER_CONN_BOUND to isert_conn
    ...

    Linus Torvalds
     

11 Mar, 2016

4 commits

  • This patch converts vhost/scsi pre-allocation of vhost_scsi_cmd
    descriptors to use the new alloc_session callback().

    Acked-by: Michael S. Tsirkin
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • This patch tries to poll for newly added tx buffers or the socket receive
    queue for a while at the end of tx/rx processing. The maximum time
    spent on polling is specified through a new kind of vring ioctl.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch introduces a helper which will return true if we're sure
    that the available ring is empty for a specific vq. When we're not
    sure, e.g. on vq access failure, it returns false instead. This could be
    used by busy polling code to exit the busy loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch introduces a helper which can give a hint as to whether or not
    there's work queued in the work list. This could be used by busy
    polling code to exit the busy loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
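
The two helper commits in the list above describe exit conditions for a busy-polling loop. A rough user-space model of how they could fit together is sketched below. Everything here is an assumption made for illustration: `struct ring_model`, `ring_avail_empty()`, `ring_has_work()` and `busy_poll()` are hypothetical names, and the budget is counted in loop iterations, whereas the commit message speaks of a maximum polling time set through a vring ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the state a busy-polling loop looks at. */
struct ring_model {
    bool readable;           /* false models a failed access to the ring */
    uint16_t avail_idx;      /* index published by the other side */
    uint16_t last_avail_idx; /* index we have consumed up to */
    bool work_queued;        /* hint: other work is waiting for the worker */
};

/* True only when we are sure the ring is empty; a failed read says false. */
static bool ring_avail_empty(const struct ring_model *r)
{
    assert(r != NULL);
    if (!r->readable)
        return false;
    return r->avail_idx == r->last_avail_idx;
}

static bool ring_has_work(const struct ring_model *r)
{
    assert(r != NULL);
    return r->work_queued;
}

/*
 * Spin while nothing is pending, for at most budget iterations.
 * Returns the number of iterations spent spinning.
 */
static unsigned busy_poll(const struct ring_model *r, unsigned budget)
{
    unsigned spins = 0;

    while (spins < budget && !ring_has_work(r) && ring_avail_empty(r))
        spins++;
    return spins;
}
```

Returning false on an access failure makes the loop stop spinning and fall back to the normal path, where the error can be handled properly.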
     

02 Mar, 2016

3 commits

  • Looking at how callers use this, maybe we should just rename init_used
    to vhost_vq_init_access. The _used suffix was a hint that we
    access the vq used ring. But maybe what callers care about is
    that it must be called after access_ok.

    Also, this function manipulates the vq->is_le field which isn't related
    to the vq used ring.

    This patch simply renames vhost_init_used() to vhost_vq_init_access() as
    suggested by Michael.

    No behaviour change.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
     
  • The default use case for vhost is when the host and the vring have the
    same endianness (default native endianness). But there are cases where
    they differ and vhost should byteswap when accessing the vring.

    The first case is when the host is big endian and the vring belongs to
    a virtio 1.0 device, which is always little endian.

    This is covered by the vq->is_le field. This field is initialized when
    userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
    stops.

    We already have a vhost_init_is_le() helper, but the reset operation is
    opencoded as follows:

    vq->is_le = virtio_legacy_is_little_endian();

    It isn't clear that we are resetting vq->is_le here.

    This patch moves the code to a helper with a more explicit name.

    The other case where we may have to byteswap is when the architecture can
    switch endianness at runtime (bi-endian). If endianness differs in the host
    and in the guest, then legacy devices need to be used in cross-endian mode.

    This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
    introduces a vq->user_be field. Userspace may enable cross-endian mode
    by calling the SET_VRING_ENDIAN ioctl before the device is started. The
    cross-endian mode is disabled when the device is stopped.

    The current names of the helpers that manipulate vq->user_be are unclear.

    This patch renames those helpers to clearly show that this is cross-endian
    stuff and with explicit enable/disable semantics.

    No behaviour change.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
     
  • We don't want side effects. If something fails, we rollback vq->is_le to
    its previous value.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
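
The rollback described in the last commit of this list follows a common save-and-restore pattern, sketched here as a user-space model. `struct vq_state` and `vq_init_access_model()` are hypothetical names, and the `step_ok` parameter simply stands in for whatever later step might fail in the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical virtqueue state holding only the field of interest. */
struct vq_state {
    bool is_le;
};

/*
 * Installs new_is_le, then performs a step that may fail.
 * On failure the previous is_le value is restored, so the call
 * leaves no side effects behind. Returns 0 on success, -1 on failure.
 */
static int vq_init_access_model(struct vq_state *vq, bool new_is_le,
                                bool step_ok)
{
    bool saved;

    assert(vq != NULL);
    saved = vq->is_le; /* remember the value to roll back to */
    vq->is_le = new_is_le;
    if (!step_ok) {
        vq->is_le = saved;
        return -1;
    }
    return 0;
}
```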
     

07 Dec, 2015

2 commits


14 Nov, 2015

1 commit

  • Pull SCSI target updates from Nicholas Bellinger:
    "This series contains HCH's changes to absorb configfs attribute
    ->show() + ->store() function pointer usage from its original
    tree-wide consumers, into common configfs code.

    It includes usb-gadget, target w/ drivers, netconsole and ocfs2
    changes to realize the improved simplicity, that now renders the
    original include/target/configfs_macros.h CPP magic for fabric drivers
    and others, unnecessary and obsolete.

    And with common code in place, new configfs attributes can be added
    easier than ever before.

    Note, there are further improvements in-flight from other folks for
    v4.5 code in configfs land, plus a number of target fixes for post -rc1
    code"

    In the meantime, a new user of the now-removed old configfs API came in
    through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce
    an abstraction for System Trace Module devices").

    This merge resolution comes from Alexander Shishkin, who updated his stm
    class tracing abstraction to account for the removal of the old
    show_attribute and store_attribute methods in commit 517982229f78
    ("configfs: remove old API") from this pull. As Alexander says about
    that patch:

    "There's no need to keep an extra wrapper structure per item and the
    awkward show_attribute/store_attribute item ops are no longer needed.

    This patch converts policy code to the new api, all the while making
    the code quite a bit smaller and easier on the eyes.

    Signed-off-by: Alexander Shishkin "

    That patch was folded into the merge so that the tree should be fully
    bisectable.

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
    configfs: remove old API
    ocfs2/cluster: use per-attribute show and store methods
    ocfs2/cluster: move locking into attribute store methods
    netconsole: use per-attribute show and store methods
    target: use per-attribute show and store methods
    spear13xx_pcie_gadget: use per-attribute show and store methods
    dlm: use per-attribute show and store methods
    usb-gadget/f_serial: use per-attribute show and store methods
    usb-gadget/f_phonet: use per-attribute show and store methods
    usb-gadget/f_obex: use per-attribute show and store methods
    usb-gadget/f_uac2: use per-attribute show and store methods
    usb-gadget/f_uac1: use per-attribute show and store methods
    usb-gadget/f_mass_storage: use per-attribute show and store methods
    usb-gadget/f_sourcesink: use per-attribute show and store methods
    usb-gadget/f_printer: use per-attribute show and store methods
    usb-gadget/f_midi: use per-attribute show and store methods
    usb-gadget/f_loopback: use per-attribute show and store methods
    usb-gadget/ether: use per-attribute show and store methods
    usb-gadget/f_acm: use per-attribute show and store methods
    usb-gadget/f_hid: use per-attribute show and store methods
    ...

    Linus Torvalds
     

28 Oct, 2015

1 commit

  • commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
    support for legacy devices") introduced a minor regression: even with
    cross-endian disabled, and even on LE host, vhost_is_little_endian is
    checking is_le flag so there's always a branch.

    To fix, simply check virtio_legacy_is_little_endian first.

    Cc: Greg Kurz
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Greg Kurz
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
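
The idea of checking host endianness first can be sketched in user space. This models only the case where cross-endian support is disabled, and the names `host_is_le()`, `ring_is_le()` and `ring16_to_cpu()` are hypothetical. Since `host_is_le()` is something a compiler can typically evaluate at build time, putting it first in the `||` lets the per-queue flag test be dropped entirely on little-endian hosts.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(CHAR_BIT == 8, "the byte swap below assumes 8-bit bytes");

/* Host byte order; compilers usually fold this to a constant. */
static inline bool host_is_le(void)
{
    const uint16_t one = 1;

    return *(const unsigned char *)&one == 1;
}

/*
 * Without cross-endian support, a little-endian host always sees a
 * little-endian ring, so the per-queue flag matters only on big endian.
 */
static inline bool ring_is_le(bool is_le_flag)
{
    return host_is_le() || is_le_flag;
}

/* Convert a 16-bit ring field to host byte order. */
static inline uint16_t ring16_to_cpu(bool is_le_flag, uint16_t raw)
{
    if (ring_is_le(is_le_flag) == host_is_le())
        return raw; /* same byte order, nothing to do */
    return (uint16_t)((raw << 8) | (raw >> 8));
}
```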
     

14 Oct, 2015

1 commit

  • This also allows removing the target-specific old configfs macros, and
    gets rid of the target_core_fabric_configfs.h header which only had one
    function declaration left that could be moved to a better place.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Nicholas Bellinger
    Acked-by: Nicholas Bellinger
    Signed-off-by: Nicholas Bellinger

    Christoph Hellwig
     

19 Sep, 2015

1 commit


16 Sep, 2015

1 commit


10 Aug, 2015

1 commit


06 Aug, 2015

1 commit

  • With well over 200 users of this api, there are a mere 12 users that
    actually checked the return value of this function. And all of them
    really didn't do anything with that information as the system or module
    was shutting down no matter what.

    So stop pretending like it matters, and just return void from
    misc_deregister(). If something goes wrong in the call, you will get a
    WARNING splat in the syslog so you know how to fix up your driver.
    Other than that, there's nothing that can go wrong.

    Cc: Alasdair Kergon
    Cc: Neil Brown
    Cc: Oleg Drokin
    Cc: Andreas Dilger
    Cc: "Michael S. Tsirkin"
    Cc: Wim Van Sebroeck
    Cc: Christine Caulfield
    Cc: David Teigland
    Cc: Mark Fasheh
    Acked-by: Joel Becker
    Acked-by: Alexandre Belloni
    Acked-by: Alessandro Zummo
    Acked-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman