07 Oct, 2020

2 commits

  • [ Upstream commit df12eb6d6cd920ab2f0e0a43cd6e1c23a05cea91 ]

    Whenever the vsock backend on the host sends a packet through the RX
    queue, it expects an answer on the TX queue. Unfortunately, there is one
    case where the host side will hang waiting for the answer and might
    effectively never recover if no timeout mechanism was implemented.

    This issue happens when the guest side starts binding to the socket,
    which inserts a new bound socket into the list of already bound sockets.
    At this time, we expect the guest to also start listening, which will
    trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
    occurs if the host side queued a RX packet and triggered an interrupt
    right between the end of the binding process and the beginning of the
    listening process. In this specific case, the function processing the
    packet virtio_transport_recv_pkt() will find a bound socket, which means
    it will hit the switch statement checking for the sk_state, but the
    state won't be changed into TCP_LISTEN yet, which leads the code to pick
    the default statement. This default statement will only free the buffer,
    while it should also respond to the host side, by sending a packet on
    its TX queue.

    The fix is simple: when the default statement is entered, we know the
    host side is waiting for an answer, so we must send back a packet
    containing the operation VIRTIO_VSOCK_OP_RST.
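
    The dispatch described above can be modelled in userspace as follows.
    This is only a sketch: the enum names, recv_pkt_model() and the
    'with_fix' flag are illustrative and do not exist in the kernel, where
    the switch lives in virtio_transport_recv_pkt().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the sk_state values the switch checks. */
enum sk_state_model { ST_CLOSE, ST_LISTEN, ST_SYN_SENT, ST_ESTABLISHED, ST_CLOSING };

/* What the guest ends up putting on its TX queue for one RX packet. */
enum tx_reply_model { TX_NOTHING, TX_STATE_HANDLER, TX_RST };

/*
 * Model of the state switch: known states are handled by their own
 * handler; a bound socket that is not listening yet (still ST_CLOSE)
 * falls into the default branch.
 */
static enum tx_reply_model recv_pkt_model(enum sk_state_model state, bool with_fix)
{
    switch (state) {
    case ST_LISTEN:
    case ST_SYN_SENT:
    case ST_ESTABLISHED:
    case ST_CLOSING:
        return TX_STATE_HANDLER;
    default:
        /* Before the fix the buffer was only freed; now a RST is sent too. */
        return with_fix ? TX_RST : TX_NOTHING;
    }
}
```

    With with_fix set to false, a packet arriving between bind and listen
    produces no reply at all, which is the case that leaves the host
    waiting.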

    One could say that a proper timeout mechanism on the host side would be
    enough to keep the backend from hanging. But the point of this patch is
    to ensure that the normal use case gets proper responsiveness when
    establishing the connection.

    Signed-off-by: Sebastien Boeuf
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Sebastien Boeuf
     
  • [ Upstream commit 4c7246dc45e2706770d5233f7ce1597a07e069ba ]

    We are going to add a 'struct vsock_sock *' parameter to
    virtio_transport_get_ops().

    In some cases, like in the virtio_transport_reset_no_sock(),
    we don't have any socket assigned to the packet received,
    so we can't use the virtio_transport_get_ops().

    In order to allow virtio_transport_reset_no_sock() to use the
    '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
    we add the 'struct virtio_transport *' to it and to its caller:
    virtio_transport_recv_pkt().

    We moved the 'vhost_transport' and 'virtio_transport' definition,
    to pass their address to the virtio_transport_recv_pkt().
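
    The idea of handing the transport to the reset path explicitly, rather
    than looking it up through a socket, can be sketched like this. All
    names below (model_transport, model_pkt, reset_no_sock_model,
    MODEL_OP_RST) are hypothetical; only the '.send_pkt' callback name
    comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_OP_RST = 1 };  /* illustrative value, not the wire constant */

struct model_pkt { int op; };

/* A transport exposes its own send callback, as vhost and virtio do. */
struct model_transport {
    const char *name;
    int (*send_pkt)(struct model_pkt *pkt);
};

static int sent_count;
static int last_op;

/* Test double standing in for a real transport's '.send_pkt'. */
static int model_send(struct model_pkt *pkt)
{
    sent_count++;
    last_op = pkt->op;
    return 0;
}

/*
 * No socket is available here, so the caller passes the transport
 * directly and the reset goes out through that transport's callback.
 */
static int reset_no_sock_model(const struct model_transport *t, struct model_pkt *reply)
{
    if (t == NULL || t->send_pkt == NULL)
        return -1;
    reply->op = MODEL_OP_RST;
    return t->send_pkt(reply);
}
```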

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Stefano Garzarella
     

29 Jul, 2020

1 commit

  • [ Upstream commit f961134a612c793d5901a93d85a29337c74af978 ]

    Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
    on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
    pointer, but we forgot to annotate it.

    This patch adds the annotation to fix the following sparse errors:

    net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:73:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:171:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:207:17: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:561:13: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:612:9: struct virtio_vsock *
    net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock [noderef] __rcu *
    net/vmw_vsock/virtio_transport.c:631:9: struct virtio_vsock *

    Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
    Reported-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Stefano Garzarella
     

11 Jun, 2020

1 commit

  • [ Upstream commit 7e0afbdfd13d1e708fe96e31c46c4897101a6a43 ]

    The accept(2) is an "input" socket interface, so we should use
    SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.

    So this patch replaces sock_sndtimeo() with sock_rcvtimeo() to
    use the right timeout in vsock_accept().
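
    From the application side, this means the accept() timeout is the one
    set with SO_RCVTIMEO. A minimal userspace sketch, assuming a listening
    socket descriptor; set_accept_timeout() is a hypothetical helper, not
    part of any library:

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Bound how long a blocking accept() on listen_fd may wait.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
static int set_accept_timeout(int listen_fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    /* accept(2) is an "input" operation, so the receive timeout applies. */
    return setsockopt(listen_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

    Setting SO_SNDTIMEO instead would not be expected to influence
    accept() once this patch is applied.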

    Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Garzarella
     

15 Feb, 2020

1 commit

  • [ Upstream commit c742c59e1fbd022b64d91aa9a0092b3a699d653c ]

    Currently, hv_sock restricts the port the guest socket can accept
    connections on. hv_sock divides the socket port namespace into two parts
    for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF
    (there are no restrictions on client port namespace). The first part
    (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted.
    The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports
    for the peer (host) socket, once a connection is accepted.
    This reservation of the port namespace is specific to hv_sock and not
    known by the generic vsock library (ex: af_vsock). This is problematic
    because auto-binds/ephemeral ports are handled by the generic vsock
    library and it has no knowledge of this port reservation and could
    allocate a port that is not compatible with hv_sock (and legitimately so).
    The issue hasn't surfaced so far because the auto-bind code of vsock
    (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for
    VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and
    start assigning ports. That will take a large number of iterations to hit
    0x7FFFFFFF. But, after the above change to randomize port selection, the
    issue has started coming up more frequently.
    There has really been no good reason to have this port reservation logic
    in hv_sock from the get go. Reserving a local port for peer ports is not
    how things are handled generally. Peer ports should reflect the peer port.
    This fixes the issue by lifting the port reservation, and also returns the
    right peer port. Since the code converts the GUID to the peer port (by
    using the first 4 bytes), there is a possibility of conflicts, but that
    seems like a reasonable risk to take, given this is limited to vsock and
    that only applies to all local sockets.
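
    The conflict risk mentioned above comes from keeping only 4 of the 16
    GUID bytes. A sketch of such a mapping, assuming the bytes are read in
    little-endian order (port_from_guid_model() is a hypothetical name and
    the byte order is an assumption of this example):

```c
#include <assert.h>
#include <stdint.h>

/* Derive a 32-bit port from the first 4 bytes of a 16-byte GUID. */
static uint32_t port_from_guid_model(const uint8_t guid[16])
{
    return (uint32_t)guid[0] |
           ((uint32_t)guid[1] << 8) |
           ((uint32_t)guid[2] << 16) |
           ((uint32_t)guid[3] << 24);
}
```

    Two GUIDs that differ only after the fourth byte map to the same port,
    which is the conflict the message accepts as a reasonable risk.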

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Sunil Muthuswamy
     

09 Nov, 2019

1 commit

  • The "42f5cda5eaf4" commit rightly set SOCK_DONE on peer shutdown,
    but there is an issue if we receive the SHUTDOWN(RDWR) while the
    virtio_transport_close_timeout() is scheduled.
    In this case, when the timeout fires, the SOCK_DONE is already
    set and the virtio_transport_close_timeout() will not call
    virtio_transport_reset() and virtio_transport_do_close().
    As a result, both sockets remain open and are never released,
    preventing the unloading of [virtio|vhost]_transport modules.

    This patch fixes this issue, calling virtio_transport_reset() and
    virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
    and there is nothing left to read.

    Fixes: 42f5cda5eaf4 ("vsock/virtio: set SOCK_DONE on peer shutdown")
    Cc: Stephen Barber
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

29 Oct, 2019

1 commit


19 Oct, 2019

2 commits


02 Oct, 2019

1 commit

  • Lockdep is unhappy if two locks from the same class are held.

    Fix the below warning for hyperv and virtio sockets (vmci socket code
    doesn't have the issue) by using lock_sock_nested() when __vsock_release()
    is called recursively:

    ============================================
    WARNING: possible recursive locking detected
    5.3.0+ #1 Not tainted
    --------------------------------------------
    server/1795 is trying to acquire lock:
    ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]

    but task is already holding lock:
    ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(sk_lock-AF_VSOCK);
    lock(sk_lock-AF_VSOCK);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    2 locks held by server/1795:
    #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
    #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    stack backtrace:
    CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
    Call Trace:
    dump_stack+0x67/0x90
    __lock_acquire.cold.67+0xd2/0x20b
    lock_acquire+0xb5/0x1c0
    lock_sock_nested+0x6d/0x90
    hvs_release+0x10/0x120 [hv_sock]
    __vsock_release+0x24/0xf0 [vsock]
    __vsock_release+0xa0/0xf0 [vsock]
    vsock_release+0x12/0x30 [vsock]
    __sock_release+0x37/0xa0
    sock_close+0x14/0x20
    __fput+0xc1/0x250
    task_work_run+0x98/0xc0
    do_exit+0x344/0xc60
    do_group_exit+0x47/0xb0
    get_signal+0x15c/0xc50
    do_signal+0x30/0x720
    exit_to_usermode_loop+0x50/0xa0
    do_syscall_64+0x24e/0x270
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f4184e85f31

    Tested-by: Stefano Garzarella
    Signed-off-by: Dexuan Cui
    Reviewed-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Dexuan Cui
     

05 Sep, 2019

1 commit


07 Aug, 2019

1 commit


03 Aug, 2019

1 commit

  • There is a race condition for an established connection that is being closed
    by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
    'remove_sock' is false):

    1 for the initial value;
    1 for the sk being in the bound list;
    1 for the sk being in the connected list;
    1 for the delayed close_work.

    After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
    decrease the refcnt to 3.

    Concurrently, hvs_close_connection() runs in another thread:
    calls vsock_remove_sock() to decrease the refcnt by 2;
    calls sock_put() to decrease the refcnt to 0, and frees the sk;
    next, the "release_sock(sk)" may hang due to use-after-free.

    In the above, after hvs_release() finishes, if hvs_close_connection() runs
    faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
    because at the beginning of hvs_close_connection(), the refcnt is still 4.

    The issue can be resolved if an extra reference is taken when the
    connection is established.
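
    The reference arithmetic of the racy interleaving can be replayed as
    plain integer bookkeeping. refs_left_model() is a hypothetical helper;
    it only counts, it does not model the locking:

```c
#include <assert.h>

/*
 * Replay the problematic ordering: __vsock_release() drops its
 * reference first, then hvs_close_connection() runs to completion.
 * Returns the refcnt left when release_sock(sk) would be called.
 */
static int refs_left_model(int refs_at_end_of_hvs_release)
{
    int refs = refs_at_end_of_hvs_release;

    refs -= 1;  /* __vsock_release() -> sock_put(sk) */
    refs -= 2;  /* vsock_remove_sock(): bound list + connected list */
    refs -= 1;  /* sock_put() in hvs_close_connection() */
    return refs;
}
```

    Starting from 4 the count reaches 0 before release_sock(sk), so the sk
    is already freed. Starting from 5 (one extra reference taken when the
    connection is established) it is still 1 at that point.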

    Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
    Signed-off-by: Dexuan Cui
    Reviewed-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Dexuan Cui
     

31 Jul, 2019

5 commits

  • Since now we are able to split packets, we can avoid limiting
    their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
    Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
    packet size.

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • If the packets to be sent to the guest are bigger than the buffer
    available, we can split them, using multiple buffers and fixing
    the length in the packet header.
    This is safe since virtio-vsock supports only stream sockets.
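
    A sketch of the splitting arithmetic, assuming each chunk gets its own
    header with the chunk length. split_payload_model() and its parameters
    are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split payload_len bytes into chunks of at most buf_len bytes.
 * Writes each chunk length into chunk_lens[] and returns the number of
 * chunks produced. Stops early if max_chunks is reached, in which case
 * the remaining bytes are simply not described by the output.
 */
static size_t split_payload_model(size_t payload_len, size_t buf_len,
                                  size_t *chunk_lens, size_t max_chunks)
{
    size_t n = 0;

    if (buf_len == 0)
        return 0;

    while (payload_len > 0 && n < max_chunks) {
        size_t len = payload_len < buf_len ? payload_len : buf_len;

        chunk_lens[n++] = len;  /* this length goes into the chunk's header */
        payload_len -= len;
    }
    return n;
}
```

    Because the socket is a byte stream, the receiver does not need to
    know where the original packet boundaries were.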

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
    the same spinlock also if we are in the TX path.

    Also move buf_alloc under the same lock.

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • In order to reduce the number of credit update messages,
    we send them only when the space available seen by the
    transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
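
    One way to express this condition, assuming the space seen by the
    transmitter is buf_alloc minus the bytes forwarded since the last
    update, and assuming a 64 KB value for the threshold. The function
    name and the constant's value are assumptions of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_PKT_BUF_SIZE (64u * 1024u)  /* assumed threshold value */

/*
 * Decide whether a credit update is worth sending. The counters are
 * free-running 32-bit values, so the subtraction relies on unsigned
 * wraparound.
 */
static bool needs_credit_update_model(uint32_t buf_alloc, uint32_t fwd_cnt,
                                      uint32_t last_fwd_cnt)
{
    uint32_t unreported = fwd_cnt - last_fwd_cnt;
    uint32_t free_space = buf_alloc - unreported;

    return free_space < MODEL_MAX_PKT_BUF_SIZE;
}
```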

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Since virtio-vsock was introduced, the buffers filled by the host
    and pushed to the guest using the vring, are directly queued in
    a per-socket list. These buffers are preallocated by the guest
    with a fixed size (4 KB).

    The maximum amount of memory used by each socket should be
    controlled by the credit mechanism.
    The default credit available per-socket is 256 KB, but if we use
    only 1 byte per packet, the guest can queue up to 262144 of 4 KB
    buffers, using up to 1 GB of memory per-socket. In addition, the
    guest will continue to fill the vring with new 4 KB free buffers
    to avoid starvation of other sockets.

    This patch mitigates this issue by copying the payload of small
    packets (< 128 bytes) into the buffer of the last packet queued, in
    order to avoid wasting memory.
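
    The worst-case figure above follows from simple arithmetic, shown here
    as a small helper (worst_case_bytes_model() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Memory pinned when every packet carries payload_per_pkt bytes of
 * credit but occupies a whole preallocated buffer of buf_size bytes.
 */
static uint64_t worst_case_bytes_model(uint64_t credit_bytes,
                                       uint64_t payload_per_pkt,
                                       uint64_t buf_size)
{
    if (payload_per_pkt == 0)
        return 0;
    return (credit_bytes / payload_per_pkt) * buf_size;
}
```

    With 256 KB of credit, 1-byte payloads and 4 KB buffers this gives
    262144 buffers and 1 GB. With 128-byte payloads, the smallest size
    that is no longer copied, the same formula gives 8 MB.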

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

24 Jul, 2019

1 commit


09 Jul, 2019

3 commits

  • This patch moves the flush of the workers after vdev->config->del_vqs(vdev),
    because we need to be sure that no workers run before we free the
    'vsock' object.

    Since we stopped the workers using the [tx|rx|event]_run flags,
    we are sure no one is accessing the device while we are calling
    vdev->config->reset(vdev), so we can safely move the workers' flush.

    Before the vdev->config->del_vqs(vdev), workers can be scheduled
    by VQ callbacks, so we must flush them after del_vqs(), to avoid
    use-after-free of 'vsock' object.

    Suggested-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Before calling vdev->config->reset(vdev) we need to be sure that
    no one is accessing the device. For this reason, we add new variables
    in the struct virtio_vsock to stop the workers during the .remove().

    This patch also adds a few comments before vdev->config->reset(vdev)
    and vdev->config->del_vqs(vdev).

    Suggested-by: Stefan Hajnoczi
    Suggested-by: Michael S. Tsirkin
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Some callbacks used by the upper layers can run while we are in the
    .remove(). A potential use-after-free can happen, because we free
    the_virtio_vsock without knowing if the callbacks are over or not.

    To solve this issue we move the assignment of the_virtio_vsock to the
    end of .probe(), when we have finished all the initialization, and to the
    beginning of .remove(), before releasing resources.
    For the same reason, we do the same for vdev->priv.

    We use RCU to be sure that all callbacks that use the_virtio_vsock
    have ended before freeing it. This is not required for callbacks that
    use vdev->priv, because after vdev->config->del_vqs() we are sure
    that they have ended and will no longer be invoked.

    We also take the mutex during .remove() to prevent .probe() from
    running while we are resetting the device.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

22 Jun, 2019

3 commits

  • Minor SPDX change conflict.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Fix leak of unqueued fragments in ipv6 nf_defrag, from Guillaume
    Nault.

    2) Don't access the DDM interface unless the transceiver implements it
    in bnx2x, from Mauro S. M. Rodrigues.

    3) Don't double fetch 'len' from userspace in sock_getsockopt(), from
    JingYi Hou.

    4) Sign extension overflow in lio_core, from Colin Ian King.

    5) Various netem bug fixes wrt. corrupted packets from Jakub Kicinski.

    6) Fix epollout hang in hvsock, from Sunil Muthuswamy.

    7) Fix regression in default fib6_type, from David Ahern.

    8) Handle memory limits in tcp_fragment more appropriately, from Eric
    Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
    tcp: refine memory limit test in tcp_fragment()
    inet: clear num_timeout reqsk_alloc()
    net: mvpp2: debugfs: Add pmap to fs dump
    ipv6: Default fib6_type to RTN_UNICAST when not set
    net: hns3: Fix inconsistent indenting
    net/af_iucv: always register net_device notifier
    net/af_iucv: build proper skbs for HiperTransport
    net/af_iucv: remove GFP_DMA restriction for HiperTransport
    net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
    hvsock: fix epollout hang from race condition
    net/udp_gso: Allow TX timestamp with UDP GSO
    net: netem: fix use after free and double free with packet corruption
    net: netem: fix backlog accounting for corrupted GSO frames
    net: lio_core: fix potential sign-extension overflow on large shift
    tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
    ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
    ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
    tun: wake up waitqueues after IFF_UP is set
    net: remove duplicate fetch in sock_getsockopt
    tipc: fix issues with early FAILOVER_MSG from peer
    ...

    Linus Torvalds
     
  • Pull still more SPDX updates from Greg KH:
    "Another round of SPDX updates for 5.2-rc6

    Here is what I am guessing is going to be the last "big" SPDX update
    for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
    that were "easy" to determine by pattern matching. The ones after this
    are going to be a bit more difficult and the people on the spdx list
    will be discussing them on a case-by-case basis now.

    Another 5000+ files are fixed up, so our overall totals are:
    Files checked: 64545
    Files with SPDX: 45529

    Compared to the 5.1 kernel which was:
    Files checked: 63848
    Files with SPDX: 22576

    This is a huge improvement.

    Also, we deleted another 20000 lines of boilerplate license crud,
    always nice to see in a diffstat"

    * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
    ...

    Linus Torvalds
     

19 Jun, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 48 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
    not return even when the hvsock socket is writable, under some race
    condition. This can happen under the following sequence:
    - fd = socket(hvsocket)
    - fd_out = dup(fd)
    - fd_in = dup(fd)
    - start a writer thread that writes data to fd_out with a combination of
      epoll_wait(fd_out, EPOLLOUT) and
    - start a reader thread that reads data from fd_in with a combination of
      epoll_wait(fd_in, EPOLLIN)
    - On the host, there are two threads that are reading/writing data to the
      hvsocket

    stack:
    hvs_stream_has_space
    hvs_notify_poll_out
    vsock_poll
    sock_poll
    ep_poll

    Race condition:
    check for epollout from ep_poll():
        assume no writable space in the socket
        hvs_stream_has_space() returns 0
    check for epollin from ep_poll():
        assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
        hvs_stream_has_space() will clear the channel pending send size
        host will not notify the guest because the pending send size has
            been cleared and so the hvsocket will never mark the
            socket writable

    Now, the EPOLLOUT will never return even if the socket write buffer is
    empty.

    The fix is to set the pending size to the default size and never change it.
    This way the host will always notify the guest whenever the writable space
    is bigger than the pending size. The host is already optimized to *only*
    notify the guest when the pending size threshold boundary is crossed and
    not every time.

    This change also reduces the cpu usage somewhat since hvs_stream_has_space()
    is in the hotpath of send:
    vsock_stream_sendmsg()->hvs_stream_has_space()
    Earlier hvs_stream_has_space() was setting/clearing the pending size on every
    call.

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     

18 Jun, 2019

2 commits

  • Honestly all the conflicts were simple overlapping changes,
    nothing really interesting to report.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:
    "Lots of bug fixes here:

    1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.

    2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
    Crispin.

    3) Use after free in psock backlog workqueue, from John Fastabend.

    4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
    Salem.

    5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.

    6) Network header needs to be set for packet redirect in nfp, from
    John Hurley.

    7) Fix udp zerocopy refcnt, from Willem de Bruijn.

    8) Don't assume linear buffers in vxlan and geneve error handlers,
    from Stefano Brivio.

    9) Fix TOS matching in mlxsw, from Jiri Pirko.

    10) More SCTP cookie memory leak fixes, from Neil Horman.

    11) Fix VLAN filtering in rtl8366, from Linus Walleij.

    12) Various TCP SACK payload size and fragmentation memory limit fixes
    from Eric Dumazet.

    13) Use after free in pneigh_get_next(), also from Eric Dumazet.

    14) LAPB control block leak fix from Jeremy Sowden"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
    lapb: fixed leak of control-blocks.
    tipc: purge deferredq list for each grp member in tipc_group_delete
    ax25: fix inconsistent lock state in ax25_destroy_timer
    neigh: fix use-after-free read in pneigh_get_next
    tcp: fix compile error if !CONFIG_SYSCTL
    hv_sock: Suppress bogus "may be used uninitialized" warnings
    be2net: Fix number of Rx queues used for flow hashing
    net: handle 802.1P vlan 0 packets properly
    tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
    tcp: add tcp_min_snd_mss sysctl
    tcp: tcp_fragment() should apply sane memory limits
    tcp: limit payload size of sacked skbs
    Revert "net: phylink: set the autoneg state in phylink_phy_change"
    bpf: fix nested bpf tracepoints with per-cpu data
    bpf: Fix out of bounds memory access in bpf_sk_storage
    vsock/virtio: set SOCK_DONE on peer shutdown
    net: dsa: rtl8366: Fix up VLAN filtering
    net: phylink: set the autoneg state in phylink_phy_change
    net: add high_order_alloc_disable sysctl/static key
    tcp: add tcp_tx_skb_cache sysctl
    ...

    Linus Torvalds
     

17 Jun, 2019

1 commit

  • gcc 8.2.0 may report these bogus warnings under some condition:

    warning: ‘vnew’ may be used uninitialized in this function
    warning: ‘hvs_new’ may be used uninitialized in this function

    Actually, the 2 pointers are only initialized and used if the variable
    "conn_from_host" is true. The code is not buggy here.

    Signed-off-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Dexuan Cui
     

16 Jun, 2019

1 commit

  • Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
    shut down and there is nothing left to read.

    This fixes the following bug:
    1) Peer sends SHUTDOWN(RDWR).
    2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
    3) read() returns -ENOTCONN until close() is called, then returns 0.

    Signed-off-by: Stephen Barber
    Signed-off-by: David S. Miller

    Stephen Barber
     

15 Jun, 2019

1 commit

  • The current vsock code for removal of socket from the list is both
    subject to race and inefficient. It takes the lock, checks whether
    the socket is in the list, drops the lock and if the socket was on the
    list, deletes it from the list. This is subject to race because as soon
    as the lock is dropped once it is checked for presence, that condition
    cannot be relied upon for any decision. It is also inefficient because
    if the socket is present in the list, it takes the lock twice.
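
    The alternative to the check-then-act pattern is to test for
    membership and unlink under a single lock hold. A userspace sketch
    with a pthread mutex and a singly linked list; the kernel code uses
    its own list and lock primitives, and all names here are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node_model { struct node_model *next; };

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node_model *list_head;

/* Push n at the front of the list. */
static void list_insert_model(struct node_model *n)
{
    pthread_mutex_lock(&list_lock);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);
}

/*
 * Look for n and unlink it while holding the lock once, so the
 * "is it on the list?" answer cannot go stale before the removal.
 * Returns true if n was found and removed.
 */
static bool list_remove_if_present_model(struct node_model *n)
{
    bool removed = false;

    pthread_mutex_lock(&list_lock);
    for (struct node_model **pp = &list_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            removed = true;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    return removed;
}
```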

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     

08 Jun, 2019

1 commit


05 Jun, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation version 2 and no later version this
    program is distributed in the hope that it will be useful but
    without any warranty without even the implied warranty of
    merchantability or fitness for a particular purpose see the gnu
    general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 33 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190530000435.345978407@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

23 May, 2019

2 commits

  • Currently, the hv_sock send() iterates once over the buffer, puts data into
    the VMBUS channel and returns. It doesn't take advantage of the case where
    a simultaneous reader is draining data from the channel. In such a case,
    the send() can maximize the bandwidth (and consequently minimize the cpu
    cycles) by iterating until the channel is found to be full.
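
    The difference between the two behaviours can be sketched with a toy
    channel that only tracks free space. This assumes each pass writes at
    most one packet's worth of data; chan_model and both functions are
    hypothetical, and the concurrent reader is not modelled:

```c
#include <assert.h>
#include <stddef.h>

struct chan_model {
    size_t free_bytes;  /* space currently free in the channel */
    size_t max_pkt;     /* most bytes one pass can write */
};

/* Old behaviour: a single pass, then return. */
static size_t send_once_model(struct chan_model *c, size_t len)
{
    size_t n = len < c->max_pkt ? len : c->max_pkt;

    if (n > c->free_bytes)
        n = c->free_bytes;
    c->free_bytes -= n;
    return n;
}

/* New behaviour: keep writing until the data or the channel runs out. */
static size_t send_loop_model(struct chan_model *c, size_t len)
{
    size_t total = 0;

    while (total < len) {
        size_t n = send_once_model(c, len - total);

        if (n == 0)
            break;  /* channel is full */
        total += n;
    }
    return total;
}
```

    In the real driver, a reader draining the channel during the loop
    frees more space, so a single send() call can move more data per
    system call.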

    Perf data:
    Total Data Transfer: 10GB/iteration
    Single threaded reader/writer, Linux hvsocket writer with Windows hvsocket
    reader
    Packet size: 64KB
    CPU sys time was captured using the 'time' command for the writer to send
    10GB of data.
    'Send Buffer Loop' is with the patch applied.
    The values below are over 10 iterations.

    |--------------------------------------------------------|
    |        |        Current        |   Send Buffer Loop    |
    |--------------------------------------------------------|
    |        | Throughput | CPU sys  | Throughput | CPU sys  |
    |        | (MB/s)     | time (s) | (MB/s)     | time (s) |
    |--------------------------------------------------------|
    | Min    | 407        | 7.048    | 401        | 5.958    |
    |--------------------------------------------------------|
    | Max    | 455        | 7.563    | 542        | 6.993    |
    |--------------------------------------------------------|
    | Avg    | 440        | 7.411    | 451        | 6.639    |
    |--------------------------------------------------------|
    | Median | 446        | 7.417    | 447        | 6.761    |
    |--------------------------------------------------------|

    Observation:
    1. The average throughput barely changes for this scenario, most probably
    because the throughput bottleneck is somewhere else.
    2. The average system (kernel) CPU time goes down by 10%+ with this
    change, for the same amount of data transferred.
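
The two observations can be checked against the "Avg" row of the table with a few lines of arithmetic. The sketch below only restates the numbers reported above; `percent_change` is a helper name introduced here for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# "Avg" row of the table: Current vs. Send Buffer Loop.
avg_throughput_current, avg_throughput_loop = 440, 451   # MB/s
avg_cpu_current, avg_cpu_loop = 7.411, 6.639             # seconds of sys time

throughput_gain = percent_change(avg_throughput_current, avg_throughput_loop)
cpu_saving = -percent_change(avg_cpu_current, avg_cpu_loop)

print(f"throughput change: {throughput_gain:+.1f}%")
print(f"CPU sys time saved: {cpu_saving:.1f}%")
```

The throughput change works out to about 2.5% and the CPU saving to about 10.4%, consistent with the "10%+" figure quoted in the observation.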

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     
  • Currently, the hv_sock buffer size is static and can't scale to the
    bandwidth requirements of the application. This change allows the
    applications to influence the socket buffer sizes using the SO_SNDBUF and
    the SO_RCVBUF socket options.

    A few interesting points to note:
    1. Since the VMBUS does not allow a resize operation of the ring size, the
    socket buffer size option should be set prior to establishing the
    connection for it to take effect.
    2. Setting the socket option comes with the cost of that much memory being
    reserved/allocated by the kernel, for the lifetime of the connection.
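
Point 1 implies an ordering constraint for applications: request the buffer sizes first, connect afterwards. A minimal sketch in Python, assuming the hv_sock transport is reached through an `AF_VSOCK` stream socket; `make_buffered_socket` is a hypothetical helper, and the family defaults to `AF_INET` only so that the snippet runs on machines without vsock support.

```python
import socket


def make_buffered_socket(family=socket.AF_INET, sndbuf=None, rcvbuf=None):
    """Create a stream socket and apply buffer sizes BEFORE any connect()."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    if sndbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    if rcvbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return sock


if __name__ == "__main__":
    # Assumption: on a Linux guest with vsock support one would pass
    # family=socket.AF_VSOCK and then call sock.connect((cid, port)).
    sock = make_buffered_socket(sndbuf=64 * 1024, rcvbuf=64 * 1024)
    print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    sock.close()
```

The value reported by `getsockopt()` may differ from the one requested, since the kernel is free to adjust it. Per point 2, whatever size is chosen stays reserved for the lifetime of the connection.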

    Perf data:
    Total Data Transfer: 1GB
    Single threaded reader/writer
    Results below are summarized over 10 iterations.

    Linux hvsocket writer + Windows hvsocket reader:
    |---------------------------------------------------------------------------------------------|
    |Packet size ->   |      128B       |       1KB       |       4KB       |        64KB         |
    |---------------------------------------------------------------------------------------------|
    |SO_SNDBUF size   |                 Throughput in MB/s (min/max/avg/median):                  |
    |       v         |                                                                           |
    |---------------------------------------------------------------------------------------------|
    | Default         | 109/118/114/116 | 636/774/701/700 | 435/507/480/476 | 410/491/462/470     |
    | 16KB            | 110/116/112/111 | 575/705/662/671 | 749/900/854/869 | 592/824/692/676     |
    | 32KB            | 108/120/115/115 | 703/823/767/772 | 718/878/850/866 | 1593/2124/2000/2085 |
    | 64KB            | 108/119/114/114 | 592/732/683/688 | 805/934/903/911 | 1784/1943/1862/1843 |
    |---------------------------------------------------------------------------------------------|

    Windows hvsocket writer + Linux hvsocket reader:
    |-------------------------------------------------------------------------------------------------|
    |Packet size ->   |      128B       |       1KB       |         4KB         |        64KB         |
    |-------------------------------------------------------------------------------------------------|
    |SO_RCVBUF size   |                   Throughput in MB/s (min/max/avg/median):                    |
    |       v         |                                                                               |
    |-------------------------------------------------------------------------------------------------|
    | Default         | 69/82/75/73     | 313/343/333/336 | 418/477/446/445     | 659/701/676/678     |
    | 16KB            | 69/83/76/77     | 350/401/375/382 | 506/548/517/516     | 602/624/615/615     |
    | 32KB            | 62/83/73/73     | 471/529/496/494 | 830/1046/935/939    | 944/1180/1070/1100  |
    | 64KB            | 64/70/68/69     | 467/533/501/497 | 1260/1590/1430/1431 | 1605/1819/1670/1660 |
    |-------------------------------------------------------------------------------------------------|

    Signed-off-by: Sunil Muthuswamy
    Reviewed-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     

22 May, 2019

1 commit

  • Pull SPDX update from Greg KH:
    "Here is a series of patches that add SPDX tags to different kernel
    files, based on two different things:

    - SPDX entries are added to a bunch of files that we missed a year
    ago that do not have any license information at all.

    These were either missed because the tool saw the MODULE_LICENSE()
    tag, or some EXPORT_SYMBOL tags, and got confused and thought the
    file had a real license, or the files have been added since the
    last big sweep, or they were Makefile/Kconfig files, which we
    didn't touch last time.

    - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
    tools can determine the license text in the file itself. Where this
    happens, the license text is removed, in order to cut down on the
    700+ different ways we have in the kernel today, in a quest to get
    rid of all of these.

    These patches have been out for review on the linux-spdx@vger mailing
    list, and while they were created by automatic tools, they were
    hand-verified by a bunch of different people, all of whose names are on
    the patches as reviewers.

    The reason for these "large" patches is if we were to continue to
    progress at the current rate of change in the kernel, adding license
    tags to individual files in different subsystems, we would be finished
    in about 10 years at the earliest.

    There will be more series of these types of patches coming over the
    next few weeks as the tools and reviewers crunch through the more
    "odd" variants of how to say "GPLv2" that developers have come up with
    over the years, combined with other fun oddities (GPL + a BSD
    disclaimer?) that are being unearthed, with the goal for the whole
    kernel to be cleaned up.

    These diffstats are not small, 3840 files are touched, over 10k lines
    removed in just 24 patches"

    * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
    ...

    Linus Torvalds
     

21 May, 2019

1 commit