28 Feb, 2020

2 commits

  • The mptcp conflict was overlapping additions.

    The SMC conflict was an addition and a removal happening at the same
    time.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Some transports (hyperv, virtio) acquire the sock lock during the
    .release() callback.

    In the vsock_stream_connect() we call vsock_assign_transport(); if
    the socket was previously assigned to another transport, the
    vsk->transport->release() is called, but the sock lock is already
    held in the vsock_stream_connect(), causing a deadlock reported by
    syzbot:

    INFO: task syz-executor280:9768 blocked for more than 143 seconds.
    Not tainted 5.6.0-rc1-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor280 D27912 9768 9766 0x00000000
    Call Trace:
    context_switch kernel/sched/core.c:3386 [inline]
    __schedule+0x934/0x1f90 kernel/sched/core.c:4082
    schedule+0xdc/0x2b0 kernel/sched/core.c:4156
    __lock_sock+0x165/0x290 net/core/sock.c:2413
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
    virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
    vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
    vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
    __sys_connect_file+0x161/0x1c0 net/socket.c:1857
    __sys_connect+0x174/0x1b0 net/socket.c:1874
    __do_sys_connect net/socket.c:1885 [inline]
    __se_sys_connect net/socket.c:1882 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1882
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    To avoid this issue, this patch removes the lock acquisition from the
    .release() callback of the hyperv and virtio transports, and holds
    the lock when we call vsk->transport->release() in the vsock core.

    Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
    Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefano Garzarella
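
The locking change described above can be pictured with a small user-space model. This is only a sketch: the `toy_*` names are hypothetical stand-ins, not kernel APIs, and a boolean flag models the non-recursive sock lock. The point it illustrates is that the transport callback no longer takes the lock itself; every caller in the core holds it instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a socket; not the real kernel definition. */
struct toy_sock {
    bool locked;        /* models the non-recursive sock lock */
    int release_calls;  /* how often the transport callback ran */
};

static void toy_lock_sock(struct toy_sock *sk)
{
    /* Taking the lock twice from one context is the modeled deadlock. */
    assert(!sk->locked);
    sk->locked = true;
}

static void toy_release_sock(struct toy_sock *sk)
{
    assert(sk->locked);
    sk->locked = false;
}

/* After the fix: the callback expects the caller to hold the lock. */
static void toy_transport_release(struct toy_sock *sk)
{
    assert(sk->locked);
    sk->release_calls++;
}

/* connect() path: the lock is already held while the transport changes. */
static void toy_connect(struct toy_sock *sk)
{
    toy_lock_sock(sk);
    toy_transport_release(sk);  /* old transport dropped under the lock */
    toy_release_sock(sk);
}

/* close() path: the core now takes the lock around the callback. */
static void toy_core_release(struct toy_sock *sk)
{
    toy_lock_sock(sk);
    toy_transport_release(sk);
    toy_release_sock(sk);
}
```

If `toy_transport_release()` called `toy_lock_sock()` itself, the connect path would trip the assertion, which is the toy equivalent of the hang in the trace.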
     

17 Feb, 2020

1 commit

  • Whenever the vsock backend on the host sends a packet through the RX
    queue, it expects an answer on the TX queue. Unfortunately, there is one
    case where the host side will hang waiting for the answer and might
    effectively never recover if no timeout mechanism was implemented.

    This issue happens when the guest side starts binding to the socket,
    which inserts a new bound socket into the list of already bound sockets.
    At this time, we expect the guest to also start listening, which will
    trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
    occurs if the host side queued a RX packet and triggered an interrupt
    right between the end of the binding process and the beginning of the
    listening process. In this specific case, the function processing the
    packet virtio_transport_recv_pkt() will find a bound socket, which means
    it will hit the switch statement checking for the sk_state, but the
    state won't be changed into TCP_LISTEN yet, which leads the code to pick
    the default statement. This default statement will only free the buffer,
    while it should also respond to the host side, by sending a packet on
    its TX queue.

    To fix this chain of events, the default statement must also send back
    a packet containing the operation VIRTIO_VSOCK_OP_RST, because at this
    stage we know the host side is waiting for an answer.

    One could say that a proper timeout mechanism on the host side would be
    enough to prevent the backend from hanging. But the point of this patch
    is to ensure the normal use case gets proper responsiveness when
    establishing the connection.

    Signed-off-by: Sebastien Boeuf
    Signed-off-by: David S. Miller

    Sebastien Boeuf
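
The state check described above can be sketched as a plain switch. The enum names and the `fixed` flag are invented for illustration, and the real function does much more than this; the sketch only shows that the default branch used to answer nothing and now answers with a reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the kernel uses its own constants. */
enum toy_sk_state { TOY_CLOSE, TOY_LISTEN, TOY_ESTABLISHED };
enum toy_reply { TOY_REPLY_NONE, TOY_REPLY_RST, TOY_REPLY_HANDLED };

/*
 * Decide how to answer a packet that matched a bound socket.
 * Before the fix the default branch only freed the buffer
 * (TOY_REPLY_NONE), leaving the host waiting; after the fix it
 * answers with a reset (TOY_REPLY_RST).
 */
static enum toy_reply toy_recv_pkt(enum toy_sk_state state, bool fixed)
{
    switch (state) {
    case TOY_LISTEN:
    case TOY_ESTABLISHED:
        return TOY_REPLY_HANDLED;
    default:
        /* Bound but not yet listening ends up here. */
        return fixed ? TOY_REPLY_RST : TOY_REPLY_NONE;
    }
}
```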
     

20 Jan, 2020

1 commit


15 Jan, 2020

1 commit

  • Currently, hv_sock restricts the port the guest socket can accept
    connections on. hv_sock divides the socket port namespace into two parts
    for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF
    (there are no restrictions on client port namespace). The first part
    (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted.
    The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports
    for the peer (host) socket, once a connection is accepted.
    This reservation of the port namespace is specific to hv_sock and not
    known by the generic vsock library (ex: af_vsock). This is problematic
    because auto-binds/ephemeral ports are handled by the generic vsock
    library and it has no knowledge of this port reservation and could
    allocate a port that is not compatible with hv_sock (and legitimately so).
    The issue hasn't surfaced so far because the auto-bind code of vsock
    (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for
    VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and
    start assigning ports. That will take a large number of iterations to hit
    0x7FFFFFFF. But, after the above change to randomize port selection, the
    issue has started coming up more frequently.
    There has really been no good reason to have this port reservation logic
    in hv_sock from the get go. Reserving a local port for peer ports is not
    how things are handled generally. Peer ports should reflect the peer port.
    This fixes the issue by lifting the port reservation, and also returns the
    right peer port. Since the code converts the GUID to the peer port (by
    using the first 4 bytes), there is a possibility of conflicts, but that
    seems like a reasonable risk to take, given this is limited to vsock and
    that only applies to all local sockets.

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
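
As a rough illustration of the two ideas in this message, the sketch below shows the old listening-range check and a port derived from the first 4 bytes of a GUID. The function names are hypothetical, and the little-endian byte order is an assumption made so the example is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the range hv_sock used to reserve for listening ports. */
#define TOY_MAX_LISTEN_PORT 0x7FFFFFFFu

/* Old rule: a listening port had to fall in 0-0x7FFFFFFF. */
static bool toy_old_listen_port_ok(uint32_t port)
{
    return port <= TOY_MAX_LISTEN_PORT;
}

/*
 * Derive a 32-bit port from the first 4 bytes of a 16-byte GUID.
 * Little-endian byte order is an assumption of this sketch.
 */
static uint32_t toy_port_from_guid(const uint8_t guid[16])
{
    return (uint32_t)guid[0] |
           ((uint32_t)guid[1] << 8) |
           ((uint32_t)guid[2] << 16) |
           ((uint32_t)guid[3] << 24);
}
```

Two GUIDs that share their first 4 bytes map to the same port, which is the conflict risk the message accepts.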
     

23 Dec, 2019

1 commit


17 Dec, 2019

2 commits

  • virtio_transport_get_ops() and virtio_transport_send_pkt_info()
    can only be used on connecting/connected sockets, since a socket
    assigned to a transport is required.

    This patch adds a WARN_ON() on virtio_transport_get_ops() to check
    this requirement, and adds a comment and a returned error to
    virtio_transport_send_pkt_info().

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • With multi-transport support, listener sockets are not bound to any
    transport. So, calling virtio_transport_reset(), when an error
    occurs, on a listener socket produces the following null-pointer
    dereference:

    BUG: kernel NULL pointer dereference, address: 00000000000000e8
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.5.0-rc1-ste-00003-gb4be21f316ac-dirty #56
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
    Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
    RIP: 0010:virtio_transport_send_pkt_info+0x20/0x130 [vmw_vsock_virtio_transport_common]
    Code: 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 10 44 8b 76 20 e8 c0 ba fe ff 8b 80 e8 00 00 00 e8 64 e3 7d c1 45 8b 45 00 41 8b 8c 24 d4 02
    RSP: 0018:ffffc900000b7d08 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff88807bf12728 RCX: 0000000000000000
    RDX: ffff88807bf12700 RSI: ffffc900000b7d50 RDI: ffff888035c84000
    RBP: ffffc900000b7d40 R08: ffff888035c84000 R09: ffffc900000b7d08
    R10: ffff8880781de800 R11: 0000000000000018 R12: ffff888035c84000
    R13: ffffc900000b7d50 R14: 0000000000000000 R15: ffff88807bf12724
    FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000e8 CR3: 00000000790f4004 CR4: 0000000000160ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    virtio_transport_reset+0x59/0x70 [vmw_vsock_virtio_transport_common]
    virtio_transport_recv_pkt+0x5bb/0xe50 [vmw_vsock_virtio_transport_common]
    ? detach_buf_split+0xf1/0x130
    virtio_transport_rx_work+0xba/0x130 [vmw_vsock_virtio_transport]
    process_one_work+0x1c0/0x300
    worker_thread+0x45/0x3c0
    kthread+0xfc/0x130
    ? current_work+0x40/0x40
    ? kthread_park+0x90/0x90
    ret_from_fork+0x35/0x40
    Modules linked in: sunrpc kvm_intel kvm vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common irqbypass vsock virtio_rng rng_core
    CR2: 00000000000000e8
    ---[ end trace e75400e2ea2fa824 ]---

    This happens because virtio_transport_reset() calls
    virtio_transport_send_pkt_info() that can be used only on
    connecting/connected sockets.

    This patch fixes the issue, using virtio_transport_reset_no_sock()
    instead of virtio_transport_reset() when we are handling a listener
    socket.

    Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
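
A minimal user-space model of the two fixes from this date follows. All `toy_*` names and the error value are illustrative assumptions, not kernel code. It shows a send helper that fails cleanly when the socket has no transport, and an error path that uses the no-socket reset variant for listener sockets.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EFAULT 14 /* illustrative error value */

struct toy_transport { int sent; };

struct toy_vsock {
    struct toy_transport *transport; /* NULL for listener sockets */
};

/* Needs a socket that already has a transport; fails instead of crashing. */
static int toy_send_pkt_info(struct toy_vsock *vsk)
{
    if (vsk->transport == NULL)
        return -TOY_EFAULT;
    vsk->transport->sent++;
    return 0;
}

/* Reset variant that takes the transport the packet arrived on. */
static int toy_reset_no_sock(struct toy_transport *t)
{
    t->sent++;
    return 0;
}

/* Error path on receive: pick the variant that fits the socket. */
static int toy_reset_on_error(struct toy_vsock *vsk, struct toy_transport *rx)
{
    if (vsk->transport == NULL)
        return toy_reset_no_sock(rx);
    return toy_send_pkt_info(vsk);
}
```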
     

12 Dec, 2019

6 commits

  • We can remove the loopback handling from virtio_transport,
    because now the vsock core is able to handle local communication
    using the new vsock_loopback device.

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Now that we have a transport that can handle the local communication,
    we can use it when it is loaded.

    A socket will use the local transport (loopback) when the remote
    CID is:
    - equal to VMADDR_CID_LOCAL
    - or equal to transport_g2h->get_local_cid(), if transport_g2h
    is loaded (this allows us to keep the same behavior implemented
    by virtio and vmci transports)
    - or equal to VMADDR_CID_HOST, if transport_g2h is not loaded

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
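
The three rules above can be written as one predicate. This is a sketch under stated assumptions: `toy_use_local_transport()` and its parameters are hypothetical, and the two CID constants are redefined locally with the values from the vm_sockets UAPI header so the snippet stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMADDR_CID_LOCAL 1u
#define VMADDR_CID_HOST  2u

/*
 * Return true when a socket connecting to remote_cid would use the
 * local (loopback) transport. g2h_local_cid is only meaningful when
 * g2h_loaded is true.
 */
static bool toy_use_local_transport(uint32_t remote_cid, bool g2h_loaded,
                                    uint32_t g2h_local_cid)
{
    if (remote_cid == VMADDR_CID_LOCAL)
        return true;
    if (g2h_loaded)
        return remote_cid == g2h_local_cid;
    return remote_cid == VMADDR_CID_HOST;
}
```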
     
  • This patch adds a new vsock_loopback transport to handle local
    communication.
    This transport is based on the loopback implementation of
    virtio_transport, so it uses the virtio_transport_common APIs
    to interface with the vsock core.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch allows registering a transport able to handle
    local communication (loopback).

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not
    used anymore, so we can reuse it for local communication
    (loopback) by adding the new well-known CID: VMADDR_CID_LOCAL.

    Cc: Jorgen Hansen
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • We can remove virtio header includes, because virtio_transport_common
    doesn't use virtio API, but provides common functions to interface
    virtio/vhost transports with the af_vsock core, and to handle
    the protocol.

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

01 Dec, 2019

1 commit

  • Pull Hyper-V updates from Sasha Levin:

    - support for new VMBus protocols (Andrea Parri)

    - hibernation support (Dexuan Cui)

    - latency testing framework (Branden Bonaby)

    - decoupling Hyper-V page size from guest page size (Himadri Pandya)

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
    Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
    drivers/hv: Replace binary semaphore with mutex
    drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
    HID: hyperv: Add the support of hibernation
    hv_balloon: Add the support of hibernation
    x86/hyperv: Implement hv_is_hibernation_supported()
    Drivers: hv: balloon: Remove dependencies on guest page size
    Drivers: hv: vmbus: Remove dependencies on guest page size
    x86: hv: Add function to allocate zeroed page for Hyper-V
    Drivers: hv: util: Specify ring buffer size using Hyper-V page size
    Drivers: hv: Specify receive buffer size using Hyper-V page size
    tools: hv: add vmbus testing tool
    drivers: hv: vmbus: Introduce latency testing
    video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
    video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
    hv_netvsc: Add the support of hibernation
    hv_sock: Add the support of hibernation
    video: hyperv_fb: Add the support of hibernation
    scsi: storvsc: Add the support of hibernation
    Drivers: hv: vmbus: Add module parameter to cap the VMBus version
    ...

    Linus Torvalds
     

22 Nov, 2019

2 commits


15 Nov, 2019

14 commits

  • When we are looking for a socket bound to a specific address,
    we also have to take into account the CID.

    This patch is useful with multi-transports support because it
    allows the binding of the same port with different CIDs, and
    it prevents a connection to the wrong socket bound to the same
    port but with a different CID.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
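
A lookup that takes the CID into account might compare addresses as sketched below. The struct and function names are hypothetical, and treating VMADDR_CID_ANY as a wildcard on either side is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMADDR_CID_ANY 0xFFFFFFFFu

struct toy_addr {
    uint32_t cid;
    uint32_t port;
};

/* A bound socket matches only if the port and the CID both agree. */
static bool toy_bound_match(const struct toy_addr *wanted,
                            const struct toy_addr *bound)
{
    if (wanted->port != bound->port)
        return false;
    return wanted->cid == bound->cid ||
           wanted->cid == VMADDR_CID_ANY ||
           bound->cid == VMADDR_CID_ANY;
}
```

With a port-only comparison, the second case in a test like `{3, 1234}` against `{4, 1234}` would wrongly match.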
     
  • This patch adds a 'module' member to 'struct vsock_transport'
    in order to get/put the transport module. This prevents the
    module from being unloaded while sockets are assigned to it.

    We increase the module refcnt when a socket is assigned to a
    transport, and we decrease the module refcnt when the socket
    is destructed.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
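
The get/put pairing can be modeled with a plain counter. This is a toy sketch with hypothetical names; the kernel uses its own module reference helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_module { int refcnt; };

struct toy_vsock {
    struct toy_module *owner; /* module of the assigned transport */
};

/* Assigning a transport takes a reference on its module. */
static void toy_assign_transport(struct toy_vsock *vsk, struct toy_module *m)
{
    m->refcnt++;
    vsk->owner = m;
}

/* Destroying the socket drops the reference again. */
static void toy_sk_destruct(struct toy_vsock *vsk)
{
    if (vsk->owner != NULL) {
        vsk->owner->refcnt--;
        vsk->owner = NULL;
    }
}

/* Unloading is only allowed once no socket holds a reference. */
static bool toy_can_unload(const struct toy_module *m)
{
    return m->refcnt == 0;
}
```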
     
  • To allow other transports to be loaded with vmci_transport,
    we register the vmci_transport as G2H or H2G only when a VMCI guest
    or host is active.

    To do that, this patch adds a callback registered in the vmci driver
    that will be called when the host or guest becomes active.
    This callback will register the vmci_transport in the VSOCK core.

    Cc: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch adds the support of multiple transports in the
    VSOCK core.

    With the multi-transports support, we can use vsock with nested VMs
    (using also different hypervisors) loading both guest->host and
    host->guest transports at the same time.

    Major changes:
    - vsock core module can be loaded regardless of the transports
    - vsock_core_init() and vsock_core_exit() are renamed to
    vsock_core_register() and vsock_core_unregister()
    - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
    to identify which directions the transport can handle and whether it
    supports DGRAM (only vmci)
    - each stream socket is assigned to a transport when the remote CID
    is set (during the connect() or when we receive a connection request
    on a listener socket).
    The remote CID is used to decide which transport to use:
    - remote CID <= VMADDR_CID_HOST will use guest->host transport;
    - remote CID == local_cid (guest->host transport) will use guest->host
    transport for loopback (host->guest transports don't support loopback);
    - remote CID > VMADDR_CID_HOST will use host->guest transport;
    - listener sockets are not bound to any transports since no transport
    operations are done on them. In this way we can create a listener
    socket even if the transports are not loaded, or with VMADDR_CID_ANY
    to listen on all transports.
    - DGRAM sockets are handled as before, since only the vmci_transport
    provides this feature.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
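
The CID-based choice for stream sockets can be sketched as a small decision function. The names are hypothetical, and the first rule is assumed to read "remote CID <= VMADDR_CID_HOST uses the guest->host transport", the complement of the "> VMADDR_CID_HOST" rule in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMADDR_CID_HOST 2u

enum toy_transport_kind { TOY_G2H, TOY_H2G };

/*
 * Pick a transport direction for a stream socket from its remote CID.
 * g2h_local_cid is only meaningful when g2h_loaded is true.
 */
static enum toy_transport_kind toy_pick_transport(uint32_t remote_cid,
                                                  bool g2h_loaded,
                                                  uint32_t g2h_local_cid)
{
    if (remote_cid <= VMADDR_CID_HOST)
        return TOY_G2H;
    if (g2h_loaded && remote_cid == g2h_local_cid)
        return TOY_G2H; /* loopback goes through guest->host */
    return TOY_H2G;
}
```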
     
  • Remote peer is always the host, so we set VMADDR_CID_HOST as
    remote CID instead of VMADDR_CID_ANY.

    Reviewed-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • vsock_insert_unbound() was called only when 'sock' parameter of
    __vsock_create() was not null. This only happened when
    __vsock_create() was called by vsock_create().

    In order to simplify the multi-transports support, this patch
    moves vsock_insert_unbound() to the end of vsock_create().

    Reviewed-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • All transports call __vsock_create() with the same parameters,
    most of them depending on the parent socket. In order to simplify
    the VSOCK core APIs exposed to the transports, this patch adds
    the vsock_create_connected() callable from transports to create
    a new socket when a connection request is received.
    We also unexported the __vsock_create().

    Suggested-by: Stefan Hajnoczi
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • virtio_transport and vmci_transport handle the buffer_size
    sockopts in a very similar way.

    In order to support multiple transports, this patch moves this
    handling into the core to allow the user to change the options
    even if the socket is not yet assigned to any transport.

    This patch also adds the '.notify_buffer_size' callback in the
    'struct virtio_transport' in order to inform the transport,
    when the buffer_size is changed by the user. It is also useful
    to limit the 'buffer_size' requested (e.g. virtio transports).

    Acked-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
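
One way to picture the core-side handling is shown below: the core clamps the requested size to the socket's own limits, then lets an optional transport callback lower it further. The struct layout, the callback signature, and the 64 KiB limit are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vsock;
typedef void (*toy_notify_fn)(struct toy_vsock *vsk, uint64_t *val);

struct toy_vsock {
    uint64_t buffer_size;
    uint64_t buffer_min_size;
    uint64_t buffer_max_size;
    toy_notify_fn notify_buffer_size; /* NULL when no transport is assigned */
};

/* Core-side update: works with or without an assigned transport. */
static void toy_update_buffer_size(struct toy_vsock *vsk, uint64_t val)
{
    if (val > vsk->buffer_max_size)
        val = vsk->buffer_max_size;
    if (val < vsk->buffer_min_size)
        val = vsk->buffer_min_size;
    if (vsk->notify_buffer_size != NULL)
        vsk->notify_buffer_size(vsk, &val); /* transport may lower val */
    vsk->buffer_size = val;
}

#define TOY_TRANSPORT_LIMIT (64u * 1024u) /* made-up transport cap */

/* Example transport callback that caps the requested size. */
static void toy_virtio_notify(struct toy_vsock *vsk, uint64_t *val)
{
    (void)vsk;
    if (*val > TOY_TRANSPORT_LIMIT)
        *val = TOY_TRANSPORT_LIMIT;
}
```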
     
  • Since now the 'struct vsock_sock' object contains a pointer to
    the transport, this patch adds a parameter to the
    vsock_core_get_transport() to return the right transport
    assigned to the socket.

    This patch also modifies virtio_transport_get_ops(), which
    uses vsock_core_get_transport(), adding the
    'struct vsock_sock *' parameter.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • We are going to add 'struct vsock_sock *' parameter to
    virtio_transport_get_ops().

    In some cases, like in the virtio_transport_reset_no_sock(),
    we don't have any socket assigned to the packet received,
    so we can't use the virtio_transport_get_ops().

    In order to allow virtio_transport_reset_no_sock() to use the
    '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
    we add the 'struct virtio_transport *' to it and to its caller:
    virtio_transport_recv_pkt().

    We moved the 'vhost_transport' and 'virtio_transport' definition,
    to pass their address to the virtio_transport_recv_pkt().

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • As a preparation to support multiple transports, this patch adds
    the 'transport' member to 'struct vsock_sock'.
    This new field is initialized during the creation in the
    __vsock_create() function.

    This patch also renames the global 'transport' pointer to
    'transport_single', since for now we're only supporting a single
    transport registered at run-time.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This header file now only includes the "uapi/linux/vm_sockets.h".
    We can include it directly when needed.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
    We can replace it by calling virtio_transport_get_ops() and
    using the get_local_cid() callback registered by the transport.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
    commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is
    never used in the net/vmw_vsock/vmci_transport.c.

    VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
    net/vmw_vsock/af_vsock.c

    Cc: Jorgen Hansen
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

10 Nov, 2019

1 commit


09 Nov, 2019

1 commit

  • The "42f5cda5eaf4" commit rightly set SOCK_DONE on peer shutdown,
    but there is an issue if we receive the SHUTDOWN(RDWR) while the
    virtio_transport_close_timeout() is scheduled.
    In this case, when the timeout fires, the SOCK_DONE is already
    set and the virtio_transport_close_timeout() will not call
    virtio_transport_reset() and virtio_transport_do_close().
    As a result, both sockets remain open and are never released,
    preventing the unloading of [virtio|vhost]_transport modules.

    This patch fixes this issue, calling virtio_transport_reset() and
    virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
    and there is nothing left to read.

    Fixes: 42f5cda5eaf4 ("vsock/virtio: set SOCK_DONE on peer shutdown")
    Cc: Stephen Barber
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
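
The condition the fix acts on can be reduced to a small predicate. The flag values and the function name below are illustrative assumptions; the sketch only captures "SHUTDOWN(RDWR) received and nothing left to read".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for the two shutdown directions. */
#define TOY_RCV_SHUTDOWN  1u
#define TOY_SEND_SHUTDOWN 2u
#define TOY_SHUTDOWN_MASK (TOY_RCV_SHUTDOWN | TOY_SEND_SHUTDOWN)

/*
 * True when the peer shut down both directions and no data is left
 * to read, i.e. when reset and close should happen right away rather
 * than waiting for the close timeout.
 */
static bool toy_should_close_now(unsigned int peer_shutdown,
                                 size_t bytes_unread)
{
    return (peer_shutdown & TOY_SHUTDOWN_MASK) == TOY_SHUTDOWN_MASK &&
           bytes_unread == 0;
}
```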
     

07 Nov, 2019

1 commit


06 Nov, 2019

1 commit


03 Nov, 2019

1 commit


29 Oct, 2019

1 commit


21 Oct, 2019

1 commit


19 Oct, 2019

2 commits