15 Nov, 2020

1 commit

  • Before commit c0cfa2d8a788 ("vsock: add multi-transports support"),
    if a G2H transport was loaded (e.g. virtio transport), every packets
    was forwarded to the host, regardless of the destination CID.
    The H2G transports implemented until then (vhost-vsock, VMCI) always
    responded with an error, if the destination CID was not
    VMADDR_CID_HOST.

    From that commit, we are using the remote CID to decide which
    transport to use, so packets with remote CID > VMADDR_CID_HOST(2)
    are sent only through H2G transport. If no H2G is available, packets
    are discarded directly in the guest.

    Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour
    to implement sibling VMs communication, so we restore the old
    behavior when no H2G is registered.
    It will be up to the host to discard packets if the destination is
    not the right one. As it was already implemented before adding
    multi-transport support.

    Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra.

    [1] Documentation/virt/ne_overview.rst

    Cc: Jorgen Hansen
    Cc: Dexuan Cui
    Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
    Reported-by: Andra Paraschiv
    Tested-by: Andra Paraschiv
    Signed-off-by: Stefano Garzarella
    Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.com
    Signed-off-by: Jakub Kicinski

    Stefano Garzarella
     

27 Oct, 2020

1 commit

  • During __vsock_create() CAP_NET_ADMIN is used to determine if the
    vsock_sock->trusted should be set to true. This value is used later
    for determing if a remote connection should be allowed to connect
    to a restricted VM. Unfortunately, if the caller doesn't have
    CAP_NET_ADMIN, an audit message such as an selinux denial is
    generated even if the caller does not want a trusted socket.

    Logging errors on success is confusing. To avoid this, switch the
    capable(CAP_NET_ADMIN) check to the noaudit version.

    Reported-by: Roman Kiryanov
    https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/
    Signed-off-by: Jeff Vander Stoep
    Reviewed-by: James Morris
    Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.com
    Signed-off-by: Jakub Kicinski

    Jeff Vander Stoep
     

13 Aug, 2020

1 commit

  • syzbot reported this issue where in the vsock_poll() we find the
    socket state at TCP_ESTABLISHED, but 'transport' is null:
    general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
    CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038
    Call Trace:
    sock_poll+0x159/0x460 net/socket.c:1266
    vfs_poll include/linux/poll.h:90 [inline]
    do_pollfd fs/select.c:869 [inline]
    do_poll fs/select.c:917 [inline]
    do_sys_poll+0x607/0xd40 fs/select.c:1011
    __do_sys_poll fs/select.c:1069 [inline]
    __se_sys_poll fs/select.c:1057 [inline]
    __x64_sys_poll+0x18c/0x440 fs/select.c:1057
    do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This issue can happen if the TCP_ESTABLISHED state is set after we read
    the vsk->transport in the vsock_poll().

    We could put barriers to synchronize, but this can only happen during
    connection setup, so we can simply check that 'transport' is valid.

    Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
    Reported-and-tested-by: syzbot+a61bac2fcc1a7c6623fe@syzkaller.appspotmail.com
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

25 Jul, 2020

1 commit

  • Rework the remaining setsockopt code to pass a sockptr_t instead of a
    plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
    outside of architecture specific code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Stefan Schmidt [ieee802154]
    Acked-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Christoph Hellwig
     

20 Jul, 2020

1 commit


28 May, 2020

1 commit

  • The accept(2) is an "input" socket interface, so we should use
    SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.

    So this patch replace sock_sndtimeo() with sock_rcvtimeo() to
    use the right timeout in the vsock_accept().

    Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

28 Feb, 2020

1 commit

  • Some transports (hyperv, virtio) acquire the sock lock during the
    .release() callback.

    In the vsock_stream_connect() we call vsock_assign_transport(); if
    the socket was previously assigned to another transport, the
    vsk->transport->release() is called, but the sock lock is already
    held in the vsock_stream_connect(), causing a deadlock reported by
    syzbot:

    INFO: task syz-executor280:9768 blocked for more than 143 seconds.
    Not tainted 5.6.0-rc1-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syz-executor280 D27912 9768 9766 0x00000000
    Call Trace:
    context_switch kernel/sched/core.c:3386 [inline]
    __schedule+0x934/0x1f90 kernel/sched/core.c:4082
    schedule+0xdc/0x2b0 kernel/sched/core.c:4156
    __lock_sock+0x165/0x290 net/core/sock.c:2413
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
    virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
    vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
    vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
    __sys_connect_file+0x161/0x1c0 net/socket.c:1857
    __sys_connect+0x174/0x1b0 net/socket.c:1874
    __do_sys_connect net/socket.c:1885 [inline]
    __se_sys_connect net/socket.c:1882 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1882
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    To avoid this issue, this patch remove the lock acquiring in the
    .release() callback of hyperv and virtio transports, and it holds
    the lock when we call vsk->transport->release() in the vsock core.

    Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
    Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

12 Dec, 2019

2 commits

  • Now that we have a transport that can handle the local communication,
    we can use it when it is loaded.

    A socket will use the local transport (loopback) when the remote
    CID is:
    - equal to VMADDR_CID_LOCAL
    - or equal to transport_g2h->get_local_cid(), if transport_g2h
    is loaded (this allows us to keep the same behavior implemented
    by virtio and vmci transports)
    - or equal to VMADDR_CID_HOST, if transport_g2h is not loaded

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch allows to register a transport able to handle
    local communication (loopback).

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

22 Nov, 2019

1 commit

  • If transport->init() fails, we can't assign the transport to the
    socket, because it's not initialized correctly, and any future
    calls to the transport callbacks would have an unexpected behavior.

    Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
    Reported-and-tested-by: syzbot+e2e5c07bf353b2f79daa@syzkaller.appspotmail.com
    Signed-off-by: Stefano Garzarella
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

15 Nov, 2019

9 commits

  • When we are looking for a socket bound to a specific address,
    we also have to take into account the CID.

    This patch is useful with multi-transports support because it
    allows the binding of the same port with different CID, and
    it prevents a connection to a wrong socket bound to the same
    port, but with different CID.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch adds 'module' member in the 'struct vsock_transport'
    in order to get/put the transport module. This prevents the
    module unloading while sockets are assigned to it.

    We increase the module refcnt when a socket is assigned to a
    transport, and we decrease the module refcnt when the socket
    is destructed.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch adds the support of multiple transports in the
    VSOCK core.

    With the multi-transports support, we can use vsock with nested VMs
    (using also different hypervisors) loading both guest->host and
    host->guest transports at the same time.

    Major changes:
    - vsock core module can be loaded regardless of the transports
    - vsock_core_init() and vsock_core_exit() are renamed to
    vsock_core_register() and vsock_core_unregister()
    - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
    to identify which directions the transport can handle and if it's
    support DGRAM (only vmci)
    - each stream socket is assigned to a transport when the remote CID
    is set (during the connect() or when we receive a connection request
    on a listener socket).
    The remote CID is used to decide which transport to use:
    - remote CID host transport;
    - remote CID == local_cid (guest->host transport) will use guest->host
    transport for loopback (host->guest transports don't support loopback);
    - remote CID > VMADDR_CID_HOST will use host->guest transport;
    - listener sockets are not bound to any transports since no transport
    operations are done on it. In this way we can create a listener
    socket, also if the transports are not loaded or with VMADDR_CID_ANY
    to listen on all transports.
    - DGRAM sockets are handled as before, since only the vmci_transport
    provides this feature.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • vsock_insert_unbound() was called only when 'sock' parameter of
    __vsock_create() was not null. This only happened when
    __vsock_create() was called by vsock_create().

    In order to simplify the multi-transports support, this patch
    moves vsock_insert_unbound() at the end of vsock_create().

    Reviewed-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • All transports call __vsock_create() with the same parameters,
    most of them depending on the parent socket. In order to simplify
    the VSOCK core APIs exposed to the transports, this patch adds
    the vsock_create_connected() callable from transports to create
    a new socket when a connection request is received.
    We also unexported the __vsock_create().

    Suggested-by: Stefan Hajnoczi
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • virtio_transport and vmci_transport handle the buffer_size
    sockopts in a very similar way.

    In order to support multiple transports, this patch moves this
    handling in the core to allow the user to change the options
    also if the socket is not yet assigned to any transport.

    This patch also adds the '.notify_buffer_size' callback in the
    'struct virtio_transport' in order to inform the transport,
    when the buffer_size is changed by the user. It is also useful
    to limit the 'buffer_size' requested (e.g. virtio transports).

    Acked-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Since now the 'struct vsock_sock' object contains a pointer to
    the transport, this patch adds a parameter to the
    vsock_core_get_transport() to return the right transport
    assigned to the socket.

    This patch modifies also the virtio_transport_get_ops(), that
    uses the vsock_core_get_transport(), adding the
    'struct vsock_sock *' parameter.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • As a preparation to support multiple transports, this patch adds
    the 'transport' member at the 'struct vsock_sock'.
    This new field is initialized during the creation in the
    __vsock_create() function.

    This patch also renames the global 'transport' pointer to
    'transport_single', since for now we're only supporting a single
    transport registered at run-time.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
    We can replace it calling the virtio_transport_get_ops() and
    using the get_local_cid() callback registered by the transport.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

07 Nov, 2019

1 commit


06 Nov, 2019

1 commit


29 Oct, 2019

1 commit


02 Oct, 2019

1 commit

  • Lockdep is unhappy if two locks from the same class are held.

    Fix the below warning for hyperv and virtio sockets (vmci socket code
    doesn't have the issue) by using lock_sock_nested() when __vsock_release()
    is called recursively:

    ============================================
    WARNING: possible recursive locking detected
    5.3.0+ #1 Not tainted
    --------------------------------------------
    server/1795 is trying to acquire lock:
    ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]

    but task is already holding lock:
    ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(sk_lock-AF_VSOCK);
    lock(sk_lock-AF_VSOCK);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    2 locks held by server/1795:
    #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
    #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

    stack backtrace:
    CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
    Call Trace:
    dump_stack+0x67/0x90
    __lock_acquire.cold.67+0xd2/0x20b
    lock_acquire+0xb5/0x1c0
    lock_sock_nested+0x6d/0x90
    hvs_release+0x10/0x120 [hv_sock]
    __vsock_release+0x24/0xf0 [vsock]
    __vsock_release+0xa0/0xf0 [vsock]
    vsock_release+0x12/0x30 [vsock]
    __sock_release+0x37/0xa0
    sock_close+0x14/0x20
    __fput+0xc1/0x250
    task_work_run+0x98/0xc0
    do_exit+0x344/0xc60
    do_group_exit+0x47/0xb0
    get_signal+0x15c/0xc50
    do_signal+0x30/0x720
    exit_to_usermode_loop+0x50/0xa0
    do_syscall_64+0x24e/0x270
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f4184e85f31

    Tested-by: Stefano Garzarella
    Signed-off-by: Dexuan Cui
    Reviewed-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Dexuan Cui
     

18 Jun, 2019

1 commit


15 Jun, 2019

1 commit

  • The current vsock code for removal of socket from the list is both
    subject to race and inefficient. It takes the lock, checks whether
    the socket is in the list, drops the lock and if the socket was on the
    list, deletes it from the list. This is subject to race because as soon
    as the lock is dropped once it is checked for presence, that condition
    cannot be relied upon for any decision. It is also inefficient because
    if the socket is present in the list, it takes the lock twice.

    Signed-off-by: Sunil Muthuswamy
    Signed-off-by: David S. Miller

    Sunil Muthuswamy
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation version 2 and no later version this
    program is distributed in the hope that it will be useful but
    without any warranty without even the implied warranty of
    merchantability or fitness for a particular purpose see the gnu
    general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 33 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190530000435.345978407@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

04 Feb, 2019

1 commit

  • This is a cleanup to prepare for the addition of 64-bit time_t
    in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
    unnecessarily complex and error-prone, moving it all into the
    main setsockopt()/getsockopt() implementation requires half
    as much code and is easier to extend.

    32-bit user space can now use old_timeval32 on both 32-bit
    and 64-bit machines, while 64-bit code can use
    __old_kernel_timeval.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

16 Jan, 2019

1 commit


15 Dec, 2018

1 commit

  • The old code always starts from fixed port for VMADDR_PORT_ANY. Sometimes
    when VMM crashed, there is still orphaned vsock which is waiting for
    close timer, then it could cause connection time out for new started VM
    if they are trying to connect to same port with same guest cid since the
    new packets could hit that orphaned vsock. We could also fix this by doing
    more in vhost_vsock_reset_orphans, but any way, it should be better to start
    from a random local port instead of a fixed one.

    Signed-off-by: Lepton Wu
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Lepton Wu
     

08 Aug, 2018

1 commit

  • syzbot reported that we reinitialize an active delayed
    work in vsock_stream_connect():

    ODEBUG: init active (active state 0) object type: timer_list hint:
    delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
    WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
    debug_print_object+0x16a/0x210 lib/debugobjects.c:326

    The pattern is apparently wrong, we should only initialize
    the dealyed work once and could repeatly schedule it. So we
    have to move out the initializations to allocation side.
    And to avoid confusion, we can split the shared dwork
    into two, instead of re-using the same one.

    Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
    Reported-by:
    Cc: Andy king
    Cc: Stefan Hajnoczi
    Cc: Jorgen Hansen
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 May, 2018

1 commit


17 Apr, 2018

1 commit

  • Commit c1eef220c1760762753b602c382127bfccee226d ("vsock: always call
    vsock_init_tables()") introduced a module_init() function without a
    corresponding module_exit() function.

    Modules with an init function can only be removed if they also have an
    exit function. Therefore the vsock module was considered "permanent"
    and could not be removed.

    This patch adds an empty module_exit() function so that "rmmod vsock"
    works. No explicit cleanup is required because:

    1. Transports call vsock_core_exit() upon exit and cannot be removed
    while sockets are still alive.
    2. vsock_diag.ko does not perform any action that requires cleanup by
    vsock.ko.

    Fixes: c1eef220c176 ("vsock: always call vsock_init_tables()")
    Reported-by: Xiumei Mu
    Cc: Cong Wang
    Cc: Jorgen Hansen
    Signed-off-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
     

13 Feb, 2018

1 commit

  • Changes since v1:
    Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

    Before:
    All these functions either return a negative error indicator,
    or store length of sockaddr into "int *socklen" parameter
    and return zero on success.

    "int *socklen" parameter is awkward. For example, if caller does not
    care, it still needs to provide on-stack storage for the value
    it does not need.

    None of the many FOO_getname() functions of various protocols
    ever used old value of *socklen. They always just overwrite it.

    This change drops this parameter, and makes all these functions, on success,
    return length of sockaddr. It's always >= 0 and can be differentiated
    from an error.

    Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

    rpc_sockname() lost "int buflen" parameter, since its only use was
    to be passed to kernel_getsockname() as &buflen and subsequently
    not used in any way.

    Userspace API is not changed.

    text data bss dec hex filename
    30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
    30108109 2633612 873672 33615393 200ee21 vmlinux.o

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: linux-bluetooth@vger.kernel.org
    CC: linux-decnet-user@lists.sourceforge.net
    CC: linux-wireless@vger.kernel.org
    CC: linux-rdma@vger.kernel.org
    CC: linux-sctp@vger.kernel.org
    CC: linux-nfs@vger.kernel.org
    CC: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But they keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

31 Jan, 2018

1 commit

  • Pull poll annotations from Al Viro:
    "This introduces a __bitwise type for POLL### bitmap, and propagates
    the annotations through the tree. Most of that stuff is as simple as
    'make ->poll() instances return __poll_t and do the same to local
    variables used to hold the future return value'.

    Some of the obvious brainos found in process are fixed (e.g. POLLIN
    misspelled as POLL_IN). At that point the amount of sparse warnings is
    low and most of them are for genuine bugs - e.g. ->poll() instance
    deciding to return -EINVAL instead of a bitmap. I hadn't touched those
    in this series - it's large enough as it is.

    Another problem it has caught was eventpoll() ABI mess; select.c and
    eventpoll.c assumed that corresponding POLL### and EPOLL### were
    equal. That's true for some, but not all of them - EPOLL### are
    arch-independent, but POLL### are not.

    The last commit in this series separates userland POLL### values from
    the (now arch-independent) kernel-side ones, converting between them
    in the few places where they are copied to/from userland. AFAICS, this
    is the least disruptive fix preserving poll(2) ABI and making epoll()
    work on all architectures.

    As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
    it will trigger only on what would've triggered EPOLLWRBAND on other
    architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
    at all on sparc. With this patch they should work consistently on all
    architectures"

    * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    make kernel-side POLL... arch-independent
    eventpoll: no need to mask the result of epi_item_poll() again
    eventpoll: constify struct epoll_event pointers
    debugging printk in sg_poll() uses %x to print POLL... bitmap
    annotate poll(2) guts
    9p: untangle ->poll() mess
    ->si_band gets POLL... bitmap stored into a user-visible long field
    ring_buffer_poll_wait() return value used as return value of ->poll()
    the rest of drivers/*: annotate ->poll() instances
    media: annotate ->poll() instances
    fs: annotate ->poll() instances
    ipc, kernel, mm: annotate ->poll() instances
    net: annotate ->poll() instances
    apparmor: annotate ->poll() instances
    tomoyo: annotate ->poll() instances
    sound: annotate ->poll() instances
    acpi: annotate ->poll() instances
    crypto: annotate ->poll() instances
    block: annotate ->poll() instances
    x86: annotate ->poll() instances
    ...

    Linus Torvalds
     

27 Jan, 2018

1 commit

  • select(2) with wfds but no rfds must return when the socket is shut down
    by the peer. This way userspace notices socket activity and gets -EPIPE
    from the next write(2).

    Currently select(2) does not return for virtio-vsock when a SEND+RCV
    shutdown packet is received. This is because vsock_poll() only sets
    POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
    socket is in when the shutdown is received.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi
     

28 Nov, 2017

1 commit


26 Oct, 2017

1 commit

  • Although CONFIG_VSOCKETS_DIAG depends on CONFIG_VSOCKETS,
    vsock_init_tables() is not always called, it is called only
    if other modules call its caller. Therefore if we only
    enable CONFIG_VSOCKETS_DIAG, it would crash kernel on uninitialized
    vsock_bind_table.

    This patch fixes it by moving vsock_init_tables() to its own
    module_init().

    Fixes: 413a4317aca7 ("VSOCK: add sock_diag interface")
    Reported-by: syzkaller bot
    Cc: Stefan Hajnoczi
    Cc: Jorgen Hansen
    Signed-off-by: Cong Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Oct, 2017

1 commit

  • There are two state fields: socket->state and sock->sk_state. The
    socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
    sock->sk_state typically uses values that match TCP state constants
    (TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention
    and instead uses SS_* constants for both fields.

    The sk_state field will be exposed to userspace through the vsock_diag
    interface for ss(8), netstat(8), and other programs.

    This patch switches sk_state to TCP state constants so that the meaning
    of this field is consistent with other address families. Not just
    AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.

    The following mapping was used to convert the code:

    SS_FREE -> TCP_CLOSE
    SS_UNCONNECTED -> TCP_CLOSE
    SS_CONNECTING -> TCP_SYN_SENT
    SS_CONNECTED -> TCP_ESTABLISHED
    SS_DISCONNECTING -> TCP_CLOSING
    VSOCK_SS_LISTEN -> TCP_LISTEN

    In __vsock_create() the sk_state initialization was dropped because
    sock_init_data() already initializes sk_state to TCP_CLOSE.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Stefan Hajnoczi