28 Feb, 2020

1 commit

  • Some transports (hyperv, virtio) acquire the sock lock during the
    .release() callback.

    In vsock_stream_connect() we call vsock_assign_transport(); if
    the socket was previously assigned to another transport,
    vsk->transport->release() is called, but the sock lock is already
    held by vsock_stream_connect(), causing a deadlock reported by
    syzbot:

    INFO: task syz-executor280:9768 blocked for more than 143 seconds.
    Not tainted 5.6.0-rc1-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor280 D27912 9768 9766 0x00000000
    Call Trace:
    context_switch kernel/sched/core.c:3386 [inline]
    __schedule+0x934/0x1f90 kernel/sched/core.c:4082
    schedule+0xdc/0x2b0 kernel/sched/core.c:4156
    __lock_sock+0x165/0x290 net/core/sock.c:2413
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
    virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
    vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
    vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
    __sys_connect_file+0x161/0x1c0 net/socket.c:1857
    __sys_connect+0x174/0x1b0 net/socket.c:1874
    __do_sys_connect net/socket.c:1885 [inline]
    __se_sys_connect net/socket.c:1882 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1882
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    To avoid this issue, this patch removes the lock acquisition from the
    .release() callbacks of the hyperv and virtio transports, and instead
    holds the lock when vsk->transport->release() is called in the vsock
    core.
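
    Schematically, the new convention looks like this (a minimal sketch;
    the helper name vsock_switch_transport() is hypothetical, and the body
    is illustrative rather than the exact patch):

    /* the caller (e.g. vsock_stream_connect()) already holds the sock
     * lock, so the transport's .release() must not call lock_sock()
     * itself anymore */
    static void vsock_switch_transport(struct vsock_sock *vsk)
    {
            vsk->transport->release(vsk);   /* no lock_sock() inside */
            vsk->transport = NULL;
    }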

    Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
    Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefano Garzarella

15 Jan, 2020

1 commit

  • Currently, hv_sock restricts the ports on which the guest socket can
    accept connections. hv_sock divides the socket port namespace on the
    server side (listening sockets) into two parts, 0-0x7FFFFFFF and
    0x80000000-0xFFFFFFFF (there are no restrictions on the client port
    namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where
    connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is
    reserved for allocating ports for the peer (host) socket once a
    connection is accepted.
    This reservation of the port namespace is specific to hv_sock and not
    known by the generic vsock library (ex: af_vsock). This is problematic
    because auto-binds/ephemeral ports are handled by the generic vsock
    library and it has no knowledge of this port reservation and could
    allocate a port that is not compatible with hv_sock (and legitimately so).
    The issue hasn't surfaced so far because the auto-bind code of vsock
    (__vsock_bind_stream), prior to the change 'VSOCK: bind to random port
    for VMADDR_PORT_ANY', would walk up from LAST_RESERVED_PORT (1023) when
    assigning ports, and it would take a very large number of iterations to
    hit 0x7FFFFFFF. But after the above change randomized port selection,
    the issue has started coming up more frequently.
    There has really been no good reason to have this port reservation logic
    in hv_sock from the get-go; reserving a local port for peer ports is not
    how things are handled generally. Peer ports should reflect the peer's
    port. This fixes the issue by lifting the port reservation, and it also
    returns the right peer port. Since the code converts the GUID to the
    peer port (by using its first 4 bytes), there is a possibility of
    conflicts, but that seems like a reasonable risk to take, given that
    this is limited to vsock and applies only to local sockets.
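
    As a rough illustration of the GUID-to-port conversion mentioned above
    (a sketch; the helper name is hypothetical, guid_t is the kernel's GUID
    type):

    /* the peer port is taken from the first 4 bytes of the connection's
     * VMBus instance GUID, so distinct GUIDs can map to the same port */
    static u32 hvs_guid_to_port(const guid_t *instance)
    {
            u32 port;

            memcpy(&port, instance, sizeof(port));
            return port;
    }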

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Sunil Muthuswamy

01 Dec, 2019

1 commit

  • Pull Hyper-V updates from Sasha Levin:

    - support for new VMBus protocols (Andrea Parri)

    - hibernation support (Dexuan Cui)

    - latency testing framework (Branden Bonaby)

    - decoupling Hyper-V page size from guest page size (Himadri Pandya)

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
    Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
    drivers/hv: Replace binary semaphore with mutex
    drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
    HID: hyperv: Add the support of hibernation
    hv_balloon: Add the support of hibernation
    x86/hyperv: Implement hv_is_hibernation_supported()
    Drivers: hv: balloon: Remove dependencies on guest page size
    Drivers: hv: vmbus: Remove dependencies on guest page size
    x86: hv: Add function to allocate zeroed page for Hyper-V
    Drivers: hv: util: Specify ring buffer size using Hyper-V page size
    Drivers: hv: Specify receive buffer size using Hyper-V page size
    tools: hv: add vmbus testing tool
    drivers: hv: vmbus: Introduce latency testing
    video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
    video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
    hv_netvsc: Add the support of hibernation
    hv_sock: Add the support of hibernation
    video: hyperv_fb: Add the support of hibernation
    scsi: storvsc: Add the support of hibernation
    Drivers: hv: vmbus: Add module parameter to cap the VMBus version
    ...

    Linus Torvalds

15 Nov, 2019

5 commits

  • This patch adds a 'module' member to 'struct vsock_transport'
    in order to get/put the transport module. This prevents the
    module from being unloaded while sockets are assigned to it.

    We increase the module refcnt when a socket is assigned to a
    transport, and we decrease the module refcnt when the socket
    is destructed.
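
    In outline, the pairing looks like this (a sketch using the standard
    try_module_get()/module_put() kernel APIs; error handling trimmed):

    /* when the socket is assigned to a transport: pin the module */
    if (!try_module_get(new_transport->module))
            return -ENODEV;
    vsk->transport = new_transport;

    /* in the socket destructor: drop the pin */
    if (vsk->transport)
            module_put(vsk->transport->module);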

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
  • This patch adds the support of multiple transports in the
    VSOCK core.

    With multi-transport support, we can use vsock with nested VMs
    (even with different hypervisors) by loading both guest->host and
    host->guest transports at the same time.

    Major changes:
    - vsock core module can be loaded regardless of the transports
    - vsock_core_init() and vsock_core_exit() are renamed to
    vsock_core_register() and vsock_core_unregister()
    - vsock_core_register() has a features parameter (H2G, G2H, DGRAM)
    to identify which directions the transport can handle and whether it
    supports DGRAM (only vmci)
    - each stream socket is assigned to a transport when the remote CID
    is set (during connect() or when we receive a connection request
    on a listener socket).
    The remote CID is used to decide which transport to use (sketched
    after this list):
    - remote CID <= VMADDR_CID_HOST will use the guest->host transport;
    - remote CID == local_cid will use the guest->host transport for
    loopback (host->guest transports don't support loopback);
    - remote CID > VMADDR_CID_HOST will use the host->guest transport;
    - listener sockets are not bound to any transport, since no transport
    operations are done on them. This way we can create a listener
    socket even if the transports are not loaded, or use VMADDR_CID_ANY
    to listen on all transports.
    - DGRAM sockets are handled as before, since only the vmci_transport
    provides this feature.
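
    A sketch of the remote-CID decision above (the helper name is
    hypothetical; transport_g2h/transport_h2g stand for the registered
    guest->host and host->guest transports):

    static const struct vsock_transport *
    vsock_pick_transport(u32 remote_cid, u32 local_cid)
    {
            if (remote_cid == local_cid)
                    return transport_g2h;   /* loopback via guest->host */
            if (remote_cid > VMADDR_CID_HOST)
                    return transport_h2g;   /* the peer is a guest */
            return transport_g2h;           /* the peer is the host */
    }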

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
  • Remote peer is always the host, so we set VMADDR_CID_HOST as
    remote CID instead of VMADDR_CID_ANY.

    Reviewed-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
  • All transports call __vsock_create() with the same parameters,
    most of them derived from the parent socket. In order to simplify
    the VSOCK core APIs exposed to the transports, this patch adds
    vsock_create_connected(), callable from transports to create a
    new socket when a connection request is received.
    We also unexport __vsock_create().
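
    From a transport's point of view, the change looks roughly like this
    (a sketch; the surrounding listener code is elided):

    /* before: every transport spelled out the same arguments */
    new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
                         sk->sk_type, 0);

    /* after: the core derives them from the parent socket */
    new = vsock_create_connected(sk);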

    Suggested-by: Stefan Hajnoczi
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
  • virtio_transport and vmci_transport handle the buffer_size
    sockopts in a very similar way.

    In order to support multiple transports, this patch moves this
    handling into the core, allowing the user to change the options
    even if the socket is not yet assigned to any transport.

    This patch also adds the '.notify_buffer_size' callback in
    'struct vsock_transport' in order to inform the transport when
    the buffer_size is changed by the user. It is also useful for
    limiting the requested 'buffer_size' (e.g. in the virtio transports).
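
    A condensed sketch of that flow (the helper name is hypothetical;
    '.notify_buffer_size' is the callback this patch adds):

    static void vsock_update_buffer_size(struct vsock_sock *vsk,
                                         const struct vsock_transport *transport,
                                         u64 val)
    {
            /* the transport may clamp the requested value */
            if (transport && transport->notify_buffer_size)
                    transport->notify_buffer_size(vsk, &val);

            vsk->buffer_size = val;
    }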

    Acked-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella

16 Oct, 2019

1 commit

  • Current code assumes PAGE_SIZE (the guest page size) is equal
    to the page size used to communicate with Hyper-V (which is
    always 4K). While this assumption is true on x86, it may not
    be true for Hyper-V on other architectures. For example,
    Linux on ARM64 may have PAGE_SIZE of 16K or 64K. A new symbol,
    HV_HYP_PAGE_SIZE, has been previously introduced to use when
    the Hyper-V page size is intended instead of the guest page size.

    Make this code work on non-x86 architectures by using the new
    HV_HYP_PAGE_SIZE symbol instead of PAGE_SIZE, where appropriate.
    Also replace the now redundant PAGE_SIZE_4K with HV_HYP_PAGE_SIZE.
    The change has no effect on x86, but lays the groundwork to run
    on ARM64 and others.
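
    For example, a size handed to the hypervisor changes along these lines
    (an illustrative substitution, not a specific hunk from the patch):

    /* Hyper-V always works in 4K pages; the guest page size may be
     * 16K or 64K on ARM64 */
    ring_size = 4 * HV_HYP_PAGE_SIZE;   /* was: 4 * PAGE_SIZE */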

    Signed-off-by: Himadri Pandya
    Reviewed-by: Michael Kelley
    Signed-off-by: David S. Miller

    Himadri Pandya

02 Oct, 2019

1 commit

  • Lockdep is unhappy if two locks from the same class are held.

    Fix the below warning for hyperv and virtio sockets (vmci socket code
    doesn't have the issue) by using lock_sock_nested() when __vsock_release()
    is called recursively:

    ============================================
    WARNING: possible recursive locking detected
    5.3.0+ #1 Not tainted
    --------------------------------------------
    server/1795 is trying to acquire lock:
    ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]

    but task is already holding lock:
    ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(sk_lock-AF_VSOCK);
    lock(sk_lock-AF_VSOCK);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    2 locks held by server/1795:
    #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
    #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    stack backtrace:
    CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
    Call Trace:
    dump_stack+0x67/0x90
    __lock_acquire.cold.67+0xd2/0x20b
    lock_acquire+0xb5/0x1c0
    lock_sock_nested+0x6d/0x90
    hvs_release+0x10/0x120 [hv_sock]
    __vsock_release+0x24/0xf0 [vsock]
    __vsock_release+0xa0/0xf0 [vsock]
    vsock_release+0x12/0x30 [vsock]
    __sock_release+0x37/0xa0
    sock_close+0x14/0x20
    __fput+0xc1/0x250
    task_work_run+0x98/0xc0
    do_exit+0x344/0xc60
    do_group_exit+0x47/0xb0
    get_signal+0x15c/0xc50
    do_signal+0x30/0x720
    exit_to_usermode_loop+0x50/0xa0
    do_syscall_64+0x24e/0x270
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f4184e85f31
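
    The essence of the fix is a single call in the recursive path (a
    sketch; SINGLE_DEPTH_NESTING is the standard lockdep subclass):

    /* __vsock_release() is re-entered for a listener's pending child
     * sockets; a nonzero subclass tells lockdep that the child's
     * sk_lock-AF_VSOCK nests under the parent's */
    lock_sock_nested(sk, SINGLE_DEPTH_NESTING);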

    Tested-by: Stefano Garzarella
    Signed-off-by: Dexuan Cui
    Reviewed-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Dexuan Cui

03 Aug, 2019

1 commit

  • There is a race condition for an established connection that is being closed
    by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
    'remove_sock' is false):

    1 for the initial value;
    1 for the sk being in the bound list;
    1 for the sk being in the connected list;
    1 for the delayed close_work.

    After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
    decrease the refcnt to 3.

    Concurrently, hvs_close_connection() runs in another thread:
    it calls vsock_remove_sock() to decrease the refcnt by 2;
    it calls sock_put() to decrease the refcnt to 0 and free the sk;
    next, the "release_sock(sk)" may hang due to use-after-free.

    In the above, after hvs_release() finishes, if hvs_close_connection()
    runs faster than "__vsock_release() -> sock_put(sk)", then there is no
    issue, because at the beginning of hvs_close_connection() the refcnt is
    still 4.

    The issue can be resolved if an extra reference is taken when the
    connection is established.
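
    In outline, this is the pairing the fix introduces (a sketch;
    sock_hold() and sock_put() are the standard sock refcount helpers):

    /* hvs_open_connection(): keep the sk pinned while the VMBus
     * channel is alive */
    sock_hold(sk);

    /* hvs_close_connection(): drop that reference only after the
     * teardown is done, so release_sock(sk) can't touch freed memory */
    sock_put(sk);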

    Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
    Signed-off-by: Dexuan Cui
    Reviewed-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Dexuan Cui

19 Jun, 2019

1 commit

  • Currently, under a race condition, hvsock can enter a state where
    epoll_wait on EPOLLOUT will not return even when the hvsock socket is
    writable. This can happen under the following sequence:
    - fd = socket(hvsocket)
    - fd_out = dup(fd)
    - fd_in = dup(fd)
    - start a writer thread that writes data to fd_out with a combination of
    epoll_wait(fd_out, EPOLLOUT) and write(fd_out)
    - start a reader thread that reads data from fd_in with a combination of
    epoll_wait(fd_in, EPOLLIN) and read(fd_in)
    - on the host, there are two threads that are reading/writing data to the
    hvsocket

    stack:
    hvs_stream_has_space
    hvs_notify_poll_out
    vsock_poll
    sock_poll
    ep_poll

    Race condition:
    check for epollout from ep_poll():
        assume no writable space in the socket
        hvs_stream_has_space() returns 0
    check for epollin from ep_poll():
        assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
        hvs_stream_has_space() will clear the channel pending send size
        host will not notify the guest because the pending send size has
        been cleared, and so the hvsocket will never mark the socket
        writable

    Now, the EPOLLOUT will never return even if the socket write buffer is
    empty.

    The fix is to set the pending size to the default size and never change
    it. This way the host will always notify the guest whenever the writable
    space is bigger than the pending size. The host is already optimized to
    *only* notify the guest when the pending size threshold boundary is
    crossed, and not every time.

    This change also reduces the CPU usage somewhat, since
    hvs_stream_has_space() is in the hot path of send:
    vsock_stream_sendmsg() -> hvs_stream_has_space()
    Earlier, hvs_stream_has_space() was setting/clearing the pending size on
    every call.
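
    Conceptually, the fix reduces to setting the threshold once and leaving
    it alone (a sketch; placement is illustrative, assuming the VMBus helper
    hv_set_channel_pending_send_size() and the hvs macros named above):

    /* set at connect time and never cleared: the host interrupts the
     * guest whenever at least this much ring space frees up */
    hv_set_channel_pending_send_size(chan,
                                     HVS_PKT_LEN(HVS_SEND_BUF_SIZE));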

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy

18 Jun, 2019

2 commits

  • Honestly all the conflicts were simple overlapping changes,
    nothing really interesting to report.

    Signed-off-by: David S. Miller

    David S. Miller
  • Pull networking fixes from David Miller:
    "Lots of bug fixes here:

    1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.

    2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
    Crispin.

    3) Use after free in psock backlog workqueue, from John Fastabend.

    4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
    Salem.

    5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.

    6) Network header needs to be set for packet redirect in nfp, from
    John Hurley.

    7) Fix udp zerocopy refcnt, from Willem de Bruijn.

    8) Don't assume linear buffers in vxlan and geneve error handlers,
    from Stefano Brivio.

    9) Fix TOS matching in mlxsw, from Jiri Pirko.

    10) More SCTP cookie memory leak fixes, from Neil Horman.

    11) Fix VLAN filtering in rtl8366, from Linus Walleij.

    12) Various TCP SACK payload size and fragmentation memory limit fixes
    from Eric Dumazet.

    13) Use after free in pneigh_get_next(), also from Eric Dumazet.

    14) LAPB control block leak fix from Jeremy Sowden"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
    lapb: fixed leak of control-blocks.
    tipc: purge deferredq list for each grp member in tipc_group_delete
    ax25: fix inconsistent lock state in ax25_destroy_timer
    neigh: fix use-after-free read in pneigh_get_next
    tcp: fix compile error if !CONFIG_SYSCTL
    hv_sock: Suppress bogus "may be used uninitialized" warnings
    be2net: Fix number of Rx queues used for flow hashing
    net: handle 802.1P vlan 0 packets properly
    tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
    tcp: add tcp_min_snd_mss sysctl
    tcp: tcp_fragment() should apply sane memory limits
    tcp: limit payload size of sacked skbs
    Revert "net: phylink: set the autoneg state in phylink_phy_change"
    bpf: fix nested bpf tracepoints with per-cpu data
    bpf: Fix out of bounds memory access in bpf_sk_storage
    vsock/virtio: set SOCK_DONE on peer shutdown
    net: dsa: rtl8366: Fix up VLAN filtering
    net: phylink: set the autoneg state in phylink_phy_change
    net: add high_order_alloc_disable sysctl/static key
    tcp: add tcp_tx_skb_cache sysctl
    ...

    Linus Torvalds

17 Jun, 2019

1 commit

  • gcc 8.2.0 may report these bogus warnings under some conditions:

    warning: ‘vnew’ may be used uninitialized in this function
    warning: ‘hvs_new’ may be used uninitialized in this function

    Actually, the two pointers are only initialized and used if the variable
    "conn_from_host" is true. The code is not buggy here.

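    The warnings can be silenced by initializing the pointers up front (a
    sketch of the kind of change described; gcc then sees a defined value on
    every path):

    struct sock *vnew = NULL;       /* only set when conn_from_host */
    struct hvsock *hvs_new = NULL;  /* likewise */
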
    Signed-off-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Dexuan Cui

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner

23 May, 2019

2 commits

  • Currently, the hv_sock send() iterates once over the buffer, puts data
    into the VMBUS channel, and returns. It doesn't take advantage of the
    case where a simultaneous reader is draining data from the channel. In
    such a case, the send() can maximize the bandwidth (and consequently
    minimize the CPU cycles) by iterating until the channel is found to be
    full.
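
    In pseudologic, the writer now loops for as long as the channel accepts
    data (a sketch under the description above; the helper names follow the
    hv_sock sources, but the body is illustrative):

    while (len > 0) {
            to_write = min_t(size_t, len,
                             hvs_channel_writable_bytes(chan));
            if (to_write == 0)
                    break;          /* channel full: stop for now */

            ret = hvs_send_data(chan, send_buf, to_write);
            if (ret < 0)
                    return ret;

            len -= to_write;        /* a concurrent reader keeps
                                       draining, so we often loop */
    }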

    Perf data:
    Total Data Transfer: 10GB/iteration
    Single threaded reader/writer, Linux hvsocket writer with Windows hvsocket
    reader
    Packet size: 64KB
    CPU sys time was captured using the 'time' command for the writer to send
    10GB of data.
    'Send Buffer Loop' is with the patch applied.
    The values below are over 10 iterations.

    |--------|------------|---------|------------|---------|
    |        |       Current        |   Send Buffer Loop   |
    |--------|------------|---------|------------|---------|
    |        | Throughput | CPU sys | Throughput | CPU sys |
    |        |   (MB/s)   | time(s) |   (MB/s)   | time(s) |
    |--------|------------|---------|------------|---------|
    | Min    |    407     |  7.048  |    401     |  5.958  |
    | Max    |    455     |  7.563  |    542     |  6.993  |
    | Avg    |    440     |  7.411  |    451     |  6.639  |
    | Median |    446     |  7.417  |    447     |  6.761  |
    |--------|------------|---------|------------|---------|

    Observation:
    1. The avg throughput doesn't really change much with this change for this
    scenario. This is most probably because the bottleneck on throughput is
    somewhere else.
    2. The average system (or kernel) cpu time goes down by 10%+ with this
    change, for the same amount of data transfer.

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
  • Currently, the hv_sock buffer size is static and can't scale to the
    bandwidth requirements of the application. This change allows
    applications to influence the socket buffer sizes using the SO_SNDBUF
    and SO_RCVBUF socket options.

    A few interesting points to note:
    1. Since VMBus does not allow resizing the ring buffer, the socket
    buffer size option should be set prior to establishing the connection
    for it to take effect (see the usage sketch after this list).
    2. Setting the socket option comes with the cost of that much memory
    being reserved/allocated by the kernel, for the lifetime of the
    connection.
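
    A user-space sketch of point 1 (the CID and port below are
    placeholders):

    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    int sz = 64 * 1024;

    /* must happen before connect() so the ring is sized accordingly */
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));

    struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid    = VMADDR_CID_HOST,
            .svm_port   = 1234,     /* placeholder service port */
    };
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));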

    Perf data:
    Total Data Transfer: 1GB
    Single threaded reader/writer
    Results below are summarized over 10 iterations.

    Linux hvsocket writer + Windows hvsocket reader:
    |---------------------------------------------------------------------------------------------|
    |Packet size -> | 128B | 1KB | 4KB | 64KB |
    |---------------------------------------------------------------------------------------------|
    |SO_SNDBUF size | | Throughput in MB/s (min/max/avg/median): |
    | v | |
    |---------------------------------------------------------------------------------------------|
    | Default | 109/118/114/116 | 636/774/701/700 | 435/507/480/476 | 410/491/462/470 |
    | 16KB | 110/116/112/111 | 575/705/662/671 | 749/900/854/869 | 592/824/692/676 |
    | 32KB | 108/120/115/115 | 703/823/767/772 | 718/878/850/866 | 1593/2124/2000/2085 |
    | 64KB | 108/119/114/114 | 592/732/683/688 | 805/934/903/911 | 1784/1943/1862/1843 |
    |---------------------------------------------------------------------------------------------|

    Windows hvsocket writer + Linux hvsocket reader:
    |---------------------------------------------------------------------------------------------|
    |Packet size -> | 128B | 1KB | 4KB | 64KB |
    |---------------------------------------------------------------------------------------------|
    |SO_RCVBUF size | | Throughput in MB/s (min/max/avg/median): |
    | v | |
    |---------------------------------------------------------------------------------------------|
    | Default | 69/82/75/73 | 313/343/333/336 | 418/477/446/445 | 659/701/676/678 |
    | 16KB | 69/83/76/77 | 350/401/375/382 | 506/548/517/516 | 602/624/615/615 |
    | 32KB | 62/83/73/73 | 471/529/496/494 | 830/1046/935/939 | 944/1180/1070/1100 |
    | 64KB | 64/70/68/69 | 467/533/501/497 | 1260/1590/1430/1431 | 1605/1819/1670/1660 |
    |---------------------------------------------------------------------------------------------|

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy

17 May, 2019

1 commit

  • Currently, hvsock does not implement any delayed or background close
    logic. Whenever the hvsock socket is closed, a FIN is sent to the peer,
    and the last reference to the socket is dropped, which leads to a call
    to .destruct where the socket can hang indefinitely waiting for the peer
    to close its side. This can cause the user application to hang in the
    close() call.

    This change implements a proper STREAM (TCP) closing handshake mechanism
    by sending the FIN to the peer and then waiting for the peer's FIN to
    arrive within a given timeout. On timeout, it will try to terminate the
    connection (i.e. send a RST). This is in line with other socket
    providers such as virtio.
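
    Schematically (a heavily hedged sketch of the sequence described above;
    all names here are illustrative, not the exact driver code):

    /* close(): send our FIN and arm a delayed work item instead of
     * blocking the caller */
    hvs_shutdown(vsk, SEND_SHUTDOWN);
    schedule_delayed_work(&close_work, timeout);

    /* in the work handler: if the peer's FIN never arrived, fall back
     * to an RST-like teardown of the connection */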

    This change does not address the hang in the vmbus_hvsock_device_unregister
    where it waits indefinitely for the host to rescind the channel. That
    should be taken up as a separate fix.

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy

06 Dec, 2017

1 commit

  • Since commit 3b4477d2dcf2709d0be89e2a8dced3d0f4a017f2 ("VSOCK: use TCP
    state constants for sk_state") VSOCK has used TCP_* constants for
    sk_state.

    Commit b4562ca7925a3bedada87a3dd072dd5bad043288 ("hv_sock: add locking
    in the open/close/release code paths") reintroduced the SS_DISCONNECTING
    constant.

    This patch replaces the old SS_DISCONNECTING with the new TCP_CLOSING
    constant.

    CC: Dexuan Cui
    CC: Cathy Avery
    Signed-off-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Stefan Hajnoczi

22 Oct, 2017

1 commit

  • There were quite a few overlapping sets of changes here.

    Daniel's bug fix for off-by-ones in the new BPF branch instructions,
    along with the added allowances for "data_end > ptr + x" forms
    collided with the metadata additions.

    Along with those three changes came verifier test cases, which in
    their final form I tried to group together properly. If I had just
    trimmed GIT's conflict tags as-is, this would have split up the
    meta tests unnecessarily.

    In the socketmap code, a set of preemption disabling changes
    overlapped with the rename of bpf_compute_data_end() to
    bpf_compute_data_pointers().

    Changes were made to the mv88e6060.c driver set addr method
    which got removed in net-next.

    The hyperv transport socket layer had a locking change in 'net'
    which overlapped with a change of socket state macro usage
    in 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller

21 Oct, 2017

1 commit

  • Without the patch, when hvs_open_connection() hasn't completely
    established a connection (e.g. it has changed sk->sk_state to
    SS_CONNECTED, but hasn't inserted the sock into the connected queue),
    vsock_stream_connect() may see the sk_state change and return the
    connection to userspace; next, when userspace closes the connection
    quickly, hvs_release() may not see the connection in the connected
    queue; finally hvs_open_connection() inserts the connection into the
    queue, but we will never be able to purge the connection.

    Signed-off-by: Dexuan Cui
    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Vitaly Kuznetsov
    Cc: Cathy Avery
    Cc: Rolf Neugebauer
    Cc: Marcelo Cerri
    Signed-off-by: David S. Miller

    Dexuan Cui

06 Oct, 2017

1 commit

  • There are two state fields: socket->state and sock->sk_state. The
    socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
    sock->sk_state typically uses values that match TCP state constants
    (TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention
    and instead uses SS_* constants for both fields.

    The sk_state field will be exposed to userspace through the vsock_diag
    interface for ss(8), netstat(8), and other programs.

    This patch switches sk_state to TCP state constants so that the meaning
    of this field is consistent with other address families. Not just
    AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.

    The following mapping was used to convert the code:

    SS_FREE -> TCP_CLOSE
    SS_UNCONNECTED -> TCP_CLOSE
    SS_CONNECTING -> TCP_SYN_SENT
    SS_CONNECTED -> TCP_ESTABLISHED
    SS_DISCONNECTING -> TCP_CLOSING
    VSOCK_SS_LISTEN -> TCP_LISTEN

    In __vsock_create() the sk_state initialization was dropped because
    sock_init_data() already initializes sk_state to TCP_CLOSE.
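
    A representative conversion, per the mapping above (an illustrative
    example, not a specific hunk from the patch):

    /* before */ sk->sk_state = SS_CONNECTED;
    /* after  */ sk->sk_state = TCP_ESTABLISHED;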

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi

29 Aug, 2017

1 commit

  • Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
    mechanism between the host and the guest. It uses the VMBus ring buffer
    as the transport layer.

    With hv_sock, applications between the host (Windows 10, Windows Server
    2016 or newer) and the guest can talk with each other using the traditional
    socket APIs.

    More info about Hyper-V Sockets is available here:

    "Make your own integration services":
    https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

    The patch implements the necessary support in the Linux guest by
    introducing a new vsock transport for AF_VSOCK.
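
    A guest-side usage sketch of the standard AF_VSOCK API that this
    transport plugs into (the port is a placeholder; the host side would
    connect via the matching service GUID):

    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid    = VMADDR_CID_ANY,
            .svm_port   = 1234,     /* placeholder service port */
    };
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 1);
    int conn = accept(fd, NULL, NULL);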

    Signed-off-by: Dexuan Cui
    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Andy King
    Cc: Dmitry Torokhov
    Cc: George Zhang
    Cc: Jorgen Hansen
    Cc: Reilly Grant
    Cc: Asias He
    Cc: Stefan Hajnoczi
    Cc: Vitaly Kuznetsov
    Cc: Cathy Avery
    Cc: Rolf Neugebauer
    Cc: Marcelo Cerri
    Signed-off-by: David S. Miller

    Dexuan Cui