16 Jul, 2020

1 commit

  • Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
    on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
    pointer, but we forgot to annotate it.

    This patch adds the annotation to fix the following sparse errors:

    net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock *

    Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
    Reported-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jakub Kicinski

    Stefano Garzarella
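
As a rough illustration of what the annotation expresses, here is a minimal userspace sketch. It assumes `__rcu` expands to nothing outside a sparse run, and the `sketch_*` helpers are hypothetical stand-ins for the kernel's rcu_dereference() and rcu_assign_pointer(), built on GCC/Clang atomic builtins. It is not the driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: outside a sparse run the annotation expands to nothing. */
#ifndef __rcu
#define __rcu
#endif

struct virtio_vsock {
    unsigned int guest_cid;
};

/* The annotation marks the pointer as reachable only through RCU accessors. */
static struct virtio_vsock __rcu *the_virtio_vsock;

/* Hypothetical stand-in for rcu_dereference(). */
static struct virtio_vsock *sketch_rcu_dereference(struct virtio_vsock __rcu **pp)
{
    return __atomic_load_n(pp, __ATOMIC_CONSUME);
}

/* Hypothetical stand-in for rcu_assign_pointer(). */
static void sketch_rcu_assign_pointer(struct virtio_vsock __rcu **pp,
                                      struct virtio_vsock *v)
{
    __atomic_store_n(pp, v, __ATOMIC_RELEASE);
}

/* Reader side: returns a sentinel (all ones) when no device is published. */
static unsigned int sketch_get_local_cid(void)
{
    struct virtio_vsock *vsock = sketch_rcu_dereference(&the_virtio_vsock);

    return vsock ? vsock->guest_cid : 0xFFFFFFFFu;
}
```

With the annotation in place, a plain comparison or dereference of `the_virtio_vsock` is what sparse reports as mixing address spaces; going through the accessors avoids it.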
     

12 Dec, 2019

1 commit


15 Nov, 2019

4 commits

  • This patch adds a 'module' member to 'struct vsock_transport'
    in order to get/put the transport module. This prevents the
    module from being unloaded while sockets are assigned to it.

    We increase the module refcnt when a socket is assigned to a
    transport, and we decrease the module refcnt when the socket
    is destructed.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
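
The get/put pairing described above can be sketched in userspace as follows. All `sketch_*` names are hypothetical; in the kernel the counting would presumably be done with try_module_get() and module_put() on the transport's 'module' member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel module with a reference count. */
struct sketch_module {
    int refcnt;
    bool unloading;
};

struct sketch_transport {
    struct sketch_module *module;
};

struct sketch_sock {
    const struct sketch_transport *transport;
};

/* Take a module reference when a socket is assigned to a transport. */
static bool sketch_assign_transport(struct sketch_sock *sk,
                                    const struct sketch_transport *t)
{
    if (t->module->unloading)
        return false;           /* refuse: the module is going away */
    t->module->refcnt++;
    sk->transport = t;
    return true;
}

/* Drop the reference when the socket is destructed. */
static void sketch_sock_destruct(struct sketch_sock *sk)
{
    if (sk->transport != NULL) {
        sk->transport->module->refcnt--;
        sk->transport = NULL;
    }
}

/* Unloading is only allowed once no socket holds a reference. */
static bool sketch_can_unload(const struct sketch_module *m)
{
    return m->refcnt == 0;
}
```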
     
  • This patch adds the support of multiple transports in the
    VSOCK core.

    With the multi-transports support, we can use vsock with nested VMs
    (using also different hypervisors) loading both guest->host and
    host->guest transports at the same time.

    Major changes:
    - vsock core module can be loaded regardless of the transports
    - vsock_core_init() and vsock_core_exit() are renamed to
    vsock_core_register() and vsock_core_unregister()
    - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
    to identify which directions the transport can handle and whether
    it supports DGRAM (only vmci)
    - each stream socket is assigned to a transport when the remote CID
    is set (during the connect() or when we receive a connection request
    on a listener socket).
    The remote CID is used to decide which transport to use:
    - remote CID <= VMADDR_CID_HOST will use guest->host transport;
    - remote CID == local_cid (guest->host transport) will use guest->host
    transport for loopback (host->guest transports don't support loopback);
    - remote CID > VMADDR_CID_HOST will use host->guest transport;
    - listener sockets are not bound to any transport since no transport
    operations are done on them. In this way we can create a listener
    socket even if the transports are not loaded, or with VMADDR_CID_ANY
    to listen on all transports.
    - DGRAM sockets are handled as before, since only the vmci_transport
    provides this feature.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
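
The CID-based selection rules listed above can be sketched as a small decision function. This is an illustration only: the `sketch_*` names are hypothetical, the value 2 for the host CID is taken from the vm_sockets UAPI header as far as I know, and the first rule is assumed to read "remote CID <= VMADDR_CID_HOST".

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: VMADDR_CID_HOST is 2. */
#define SKETCH_VMADDR_CID_HOST 2u

enum sketch_transport_kind {
    SKETCH_G2H,   /* guest->host transport */
    SKETCH_H2G    /* host->guest transport */
};

/*
 * Pick a transport for a stream socket from its remote CID.
 * local_cid is the CID reported by the guest->host transport.
 */
static enum sketch_transport_kind sketch_pick_transport(uint32_t remote_cid,
                                                        uint32_t local_cid)
{
    /* Well-known CIDs up to the host CID go through guest->host. */
    if (remote_cid <= SKETCH_VMADDR_CID_HOST)
        return SKETCH_G2H;

    /* Loopback: only the guest->host transport supports it. */
    if (remote_cid == local_cid)
        return SKETCH_G2H;

    /* Any other CID is a guest of this host. */
    return SKETCH_H2G;
}
```

Listener and DGRAM sockets are left out on purpose, since the message says they are not assigned through this rule.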
     
  • virtio_transport and vmci_transport handle the buffer_size
    sockopts in a very similar way.

    In order to support multiple transports, this patch moves this
    handling into the core to allow the user to change the options
    even if the socket is not yet assigned to any transport.

    This patch also adds the '.notify_buffer_size' callback in the
    'struct virtio_transport' in order to inform the transport,
    when the buffer_size is changed by the user. It is also useful
    to limit the 'buffer_size' requested (e.g. virtio transports).

    Acked-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • We are going to add 'struct vsock_sock *' parameter to
    virtio_transport_get_ops().

    In some cases, like in the virtio_transport_reset_no_sock(),
    we don't have any socket assigned to the packet received,
    so we can't use the virtio_transport_get_ops().

    In order to allow virtio_transport_reset_no_sock() to use the
    '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
    we add the 'struct virtio_transport *' to it and to its caller:
    virtio_transport_recv_pkt().

    We moved the 'vhost_transport' and 'virtio_transport' definition,
    to pass their address to the virtio_transport_recv_pkt().

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

31 Jul, 2019

1 commit

  • Since virtio-vsock was introduced, the buffers filled by the host
    and pushed to the guest using the vring, are directly queued in
    a per-socket list. These buffers are preallocated by the guest
    with a fixed size (4 KB).

    The maximum amount of memory used by each socket should be
    controlled by the credit mechanism.
    The default credit available per-socket is 256 KB, but if we use
    only 1 byte per packet, the guest can queue up to 262144 of 4 KB
    buffers, using up to 1 GB of memory per-socket. In addition, the
    guest will continue to fill the vring with new 4 KB free buffers
    to avoid starvation of other sockets.

    This patch mitigates this issue by copying the payload of small
    packets (< 128 bytes) into the buffer of the last packet queued,
    in order to avoid wasting memory.

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
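
To make the arithmetic and the mitigation concrete, here is a minimal userspace sketch. The `sketch_*` names and the flat buffer layout are assumptions for illustration; the 4 KB buffer size and the 128-byte threshold come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE  4096u   /* preallocated rx buffer size */
#define SKETCH_SMALL_PKT 128u    /* packets below this are merged */

/* Worst-case memory pinned by one socket: one full buffer per packet. */
static uint64_t sketch_worst_case_bytes(uint64_t credit_bytes,
                                        uint64_t payload_per_pkt,
                                        uint64_t buf_size)
{
    uint64_t pkts = credit_bytes / payload_per_pkt;

    return pkts * buf_size;
}

/* Hypothetical queued packet: a fixed buffer plus the bytes in use. */
struct sketch_pkt {
    uint8_t buf[SKETCH_BUF_SIZE];
    size_t len;
};

/*
 * Try to append a small payload to the last queued packet.
 * Returns true when merged, false when the caller must queue a new buffer.
 */
static bool sketch_try_merge(struct sketch_pkt *last,
                             const uint8_t *payload, size_t len)
{
    if (last == NULL || len >= SKETCH_SMALL_PKT)
        return false;
    if (len > SKETCH_BUF_SIZE - last->len)
        return false;            /* not enough room left */

    memcpy(last->buf + last->len, payload, len);
    last->len += len;
    return true;
}
```

With 256 KB of credit and 1-byte payloads, the first helper gives 262144 buffers of 4 KB each, which is the 1 GB figure quoted in the message.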
     

09 Jul, 2019

3 commits

  • This patch moves the flush of works after vdev->config->del_vqs(vdev),
    because we need to be sure that no workers are running before we
    free the 'vsock' object.

    Since we stopped the workers using the [tx|rx|event]_run flags,
    we are sure no one is accessing the device while we are calling
    vdev->config->reset(vdev), so we can safely move the workers' flush.

    Before the vdev->config->del_vqs(vdev), workers can be scheduled
    by VQ callbacks, so we must flush them after del_vqs(), to avoid
    use-after-free of 'vsock' object.

    Suggested-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Before calling vdev->config->reset(vdev) we need to be sure that
    no one is accessing the device. For this reason, we add new variables
    in the struct virtio_vsock to stop the workers during the .remove().

    This patch also adds a few comments before vdev->config->reset(vdev)
    and vdev->config->del_vqs(vdev).

    Suggested-by: Stefan Hajnoczi
    Suggested-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
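
The "stop the workers" idea can be sketched as a run flag that each worker checks before touching the device. This is a single-threaded illustration with hypothetical `sketch_*` names; a real driver would need to serialize the flag check against the worker, for example with a lock, and this sketch does not show that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state: one run flag per worker type. */
struct sketch_vsock {
    bool rx_run;
    int rx_processed;
};

/* Worker body: bail out early once the device is being removed. */
static void sketch_rx_work(struct sketch_vsock *v)
{
    if (!v->rx_run)
        return;                 /* device is going away, do not touch it */
    v->rx_processed++;
}

/* Called from the remove path before the device is reset. */
static void sketch_stop_rx(struct sketch_vsock *v)
{
    v->rx_run = false;
}
```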
     
  • Some callbacks used by the upper layers can run while we are in the
    .remove(). A potential use-after-free can happen, because we free
    the_virtio_vsock without knowing if the callbacks are over or not.

    To solve this issue we move the assignment of the_virtio_vsock to the
    end of .probe(), when all the initialization has finished, and to the
    beginning of .remove(), before releasing resources.
    For the same reason, we do the same for vdev->priv.

    We use RCU to be sure that all callbacks that use the_virtio_vsock
    have ended before freeing it. This is not required for callbacks that
    use vdev->priv, because after vdev->config->del_vqs() we are sure
    that they have ended and will no longer be invoked.

    We also take the mutex during .remove() to prevent .probe() from
    running while we are resetting the device.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 48 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

19 May, 2019

1 commit

  • Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are
    accessed (while handling interrupts) before they are initialized.

    [ 4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8
    [ 4.207829] IP: vsock_addr_equals_addr+0x3/0x20
    [ 4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0
    [ 4.211379] Oops: 0000 [#1] PREEMPT SMP PTI
    [ 4.211379] Modules linked in:
    [ 4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1
    [ 4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    [ 4.211379] Workqueue: virtio_vsock virtio_transport_rx_work
    [ 4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000
    [ 4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20
    [ 4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286
    [ 4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0
    [ 4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0
    [ 4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001
    [ 4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0
    [ 4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0
    [ 4.211379] FS: 0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000
    [ 4.211379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0
    [ 4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 4.211379] Call Trace:
    [ 4.211379] ? vsock_find_connected_socket+0x6c/0xe0
    [ 4.211379] virtio_transport_recv_pkt+0x15f/0x740
    [ 4.211379] ? detach_buf+0x1b5/0x210
    [ 4.211379] virtio_transport_rx_work+0xb7/0x140
    [ 4.211379] process_one_work+0x1ef/0x480
    [ 4.211379] worker_thread+0x312/0x460
    [ 4.211379] kthread+0x132/0x140
    [ 4.211379] ? process_one_work+0x480/0x480
    [ 4.211379] ? kthread_destroy_worker+0xd0/0xd0
    [ 4.211379] ret_from_fork+0x35/0x40
    [ 4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e
    [ 4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28
    [ 4.211379] CR2: ffffffffffffffe8
    [ 4.211379] ---[ end trace f31cc4a2e6df3689 ]---
    [ 4.211379] Kernel panic - not syncing: Fatal exception in interrupt
    [ 4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 4.211379] Rebooting in 5 seconds..

    Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
    Cc: Stefan Hajnoczi
    Cc: Stefano Garzarella
    Cc: "David S. Miller"
    Cc: kvm@vger.kernel.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: netdev@vger.kernel.org
    Cc: kernel-team@android.com
    Cc: stable@vger.kernel.org [4.9+]
    Signed-off-by: Jorge E. Moreira
    Reviewed-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Jorge E. Moreira
     

04 Feb, 2019

2 commits

  • When the virtio transport device disappears, we should reset all
    connected sockets in order to inform the users.

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • virtio_vsock_remove() invokes vsock_core_exit() even if there
    are open sockets for the AF_VSOCK protocol family. In this way
    the vsock "transport" pointer is set to NULL, triggering a
    kernel panic at the first socket activity.

    This patch moves the vsock_core_init()/vsock_core_exit() calls to
    the virtio_vsock module_init and module_exit functions respectively;
    the module_exit function cannot be invoked while there are open sockets.

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699
    Reported-by: Yan Fu
    Signed-off-by: Stefano Garzarella
    Acked-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

22 Jun, 2018

1 commit

  • The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
    used, and in fact in virtio_transport_common.c only 64 bit accessors are
    used. Using 32 bit accessors for 64 bit values breaks big endian systems.

    This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.

    Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")

    Signed-off-by: Claudio Imbrenda
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Claudio Imbrenda
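
The breakage on big endian can be reproduced in a small userspace sketch that simulates what a big-endian CPU computes. The `sketch_*` helpers are hypothetical and only model the byte swaps; they are not the kernel accessors.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sketch_swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint64_t sketch_swab64(uint64_t x)
{
    return ((uint64_t)sketch_swab32((uint32_t)x) << 32) |
           sketch_swab32((uint32_t)(x >> 32));
}

/* Native value a big-endian CPU loads from a little-endian 64-bit field. */
static uint64_t sketch_be_load_le64(uint64_t value)
{
    return sketch_swab64(value);
}

/* Correct accessor on big endian: swap all eight bytes back. */
static uint64_t sketch_be_le64_to_cpu(uint64_t raw)
{
    return sketch_swab64(raw);
}

/*
 * Wrong accessor on big endian: the 64-bit field is truncated to 32 bits
 * first, so only the upper half of the original value survives the swap.
 */
static uint32_t sketch_be_le32_to_cpu(uint64_t raw)
{
    return sketch_swab32((uint32_t)raw);
}
```

For a CID of 3, the 64-bit path returns 3 while the 32-bit path returns 0. On a little-endian host both accessors are no-ops, which is why the bug goes unnoticed there.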
     

06 Oct, 2017

1 commit

  • There are two state fields: socket->state and sock->sk_state. The
    socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
    sock->sk_state typically uses values that match TCP state constants
    (TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention
    and instead uses SS_* constants for both fields.

    The sk_state field will be exposed to userspace through the vsock_diag
    interface for ss(8), netstat(8), and other programs.

    This patch switches sk_state to TCP state constants so that the meaning
    of this field is consistent with other address families. Not just
    AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.

    The following mapping was used to convert the code:

    SS_FREE -> TCP_CLOSE
    SS_UNCONNECTED -> TCP_CLOSE
    SS_CONNECTING -> TCP_SYN_SENT
    SS_CONNECTED -> TCP_ESTABLISHED
    SS_DISCONNECTING -> TCP_CLOSING
    VSOCK_SS_LISTEN -> TCP_LISTEN

    In __vsock_create() the sk_state initialization was dropped because
    sock_init_data() already initializes sk_state to TCP_CLOSE.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
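
The mapping table above can be written as a small conversion function. The enums here are local stand-ins with a `SKETCH_` prefix; their numeric values are arbitrary and do not claim to match the kernel's SS_* or TCP_* constants.

```c
#include <assert.h>

/* Local stand-ins for the SS_* constants; values are arbitrary. */
enum sketch_ss_state {
    SKETCH_SS_FREE,
    SKETCH_SS_UNCONNECTED,
    SKETCH_SS_CONNECTING,
    SKETCH_SS_CONNECTED,
    SKETCH_SS_DISCONNECTING,
    SKETCH_VSOCK_SS_LISTEN
};

/* Local stand-ins for the TCP state constants; values are arbitrary. */
enum sketch_tcp_state {
    SKETCH_TCP_CLOSE,
    SKETCH_TCP_SYN_SENT,
    SKETCH_TCP_ESTABLISHED,
    SKETCH_TCP_CLOSING,
    SKETCH_TCP_LISTEN
};

/* Apply the conversion table from the commit message. */
static enum sketch_tcp_state sketch_ss_to_tcp(enum sketch_ss_state s)
{
    switch (s) {
    case SKETCH_SS_CONNECTING:
        return SKETCH_TCP_SYN_SENT;
    case SKETCH_SS_CONNECTED:
        return SKETCH_TCP_ESTABLISHED;
    case SKETCH_SS_DISCONNECTING:
        return SKETCH_TCP_CLOSING;
    case SKETCH_VSOCK_SS_LISTEN:
        return SKETCH_TCP_LISTEN;
    case SKETCH_SS_FREE:
    case SKETCH_SS_UNCONNECTED:
    default:
        return SKETCH_TCP_CLOSE;   /* both idle states collapse to CLOSE */
    }
}
```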
     

11 May, 2017

1 commit

  • Pull virtio updates from Michael Tsirkin:
    "Fixes, cleanups, performance

    A bunch of changes to virtio, most affecting virtio net. Also ptr_ring
    batched zeroing - first of batching enhancements that seems ready."

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    s390/virtio: change maintainership
    tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"
    virtio_net: tidy a couple debug statements
    ptr_ring: support testing different batching sizes
    ringtest: support test specific parameters
    ptr_ring: batch ring zeroing
    virtio: virtio_driver doc
    virtio_net: don't reset twice on XDP on/off
    virtio_net: fix support for small rings
    virtio_net: reduce alignment for buffers
    virtio_net: rework mergeable buffer handling
    virtio_net: allow specifying context for rx
    virtio: allow extra context per descriptor
    tools/virtio: fix build breakage
    virtio: add context flag to find vqs
    virtio: wrap find_vqs
    ringtest: fix an assert statement

    Linus Torvalds
     

03 May, 2017

1 commit


25 Apr, 2017

1 commit

  • The virtio drivers deal with struct virtio_vsock_pkt. Add
    virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
    vsockmon device.

    We call virtio_transport_deliver_tap_pkt(pkt) from
    net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
    common code. This is because the drivers may drop packets before
    handing them to common code - we still want to capture them.

    Signed-off-by: Gerard Garcia
    Signed-off-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Gerard Garcia
     

22 Mar, 2017

1 commit


28 Feb, 2017

1 commit

  • Add a struct irq_affinity pointer to the find_vqs methods, which if set
    is used to tell the PCI layer to create the MSI-X vectors for our I/O
    virtqueues with the proper affinity from the start. Compared to after
    the fact affinity hints this gives us an instantly working setup and
    allows us to allocate the irq descriptors node-local and avoid
    interconnect traffic. Last but not least this will allow blk-mq queues
    to be created based on the interrupt affinity for storage drivers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Christoph Hellwig
     

15 Dec, 2016

1 commit


25 Nov, 2016

1 commit

  • The VMware VMCI transport supports loopback inside virtual machines.
    This patch implements loopback for virtio-vsock.

    Flow control is handled by the virtio-vsock protocol as usual. The
    sending process stops transmitting on a connection when the peer's
    receive buffer space is exhausted.

    Cathy Avery noticed this difference between VMCI and
    virtio-vsock when a test case using loopback failed. Although loopback
    isn't the main point of AF_VSOCK, it is useful for testing and
    virtio-vsock must match VMCI semantics so that userspace programs run
    regardless of the underlying transport.

    My understanding is that loopback is not supported on the host side with
    VMCI. Follow that by implementing it only in the guest driver, not the
    vhost host driver.

    Cc: Jorgen Hansen
    Reported-by: Cathy Avery
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
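
The flow-control behaviour described above (the sender stops once the peer's receive buffer space is exhausted) can be sketched with a credit counter. The field names and the free-space formula are assumptions modeled on the virtio-vsock credit scheme, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection credit state kept by the sender. */
struct sketch_credit {
    uint32_t peer_buf_alloc;  /* receive buffer space advertised by the peer */
    uint32_t peer_fwd_cnt;    /* bytes the peer reports as consumed */
    uint32_t tx_cnt;          /* bytes sent so far */
};

/* Space still free in the peer's receive buffer. */
static uint32_t sketch_peer_free(const struct sketch_credit *c)
{
    return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
}

/* Send at most 'want' bytes; returns how many bytes the credit allows. */
static uint32_t sketch_send(struct sketch_credit *c, uint32_t want)
{
    uint32_t space = sketch_peer_free(c);
    uint32_t n = want < space ? want : space;

    c->tx_cnt += n;
    return n;
}
```

A return value of 0 corresponds to the sending process blocking until the peer consumes data and advances its forward counter.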
     

15 Aug, 2016

1 commit

  • Remove unnecessary use of enable/disable callback notifications
    and the incorrect more space available check.

    The virtio_transport_tx_work handles when the TX virtqueue
    has more buffers available.

    Signed-off-by: Gerard Garcia
    Acked-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Gerard Garcia
     

02 Aug, 2016

1 commit