02 Apr, 2020

5 commits

  • This patch introduces a vDPA-based vhost backend. This backend is
    built on top of the same interface defined in virtio-vDPA and provides
    a generic vhost interface for userspace to accelerate the virtio
    devices in guest.

    This backend is implemented as a vDPA device driver on top of the same
    ops used in virtio-vDPA. It creates a char device entry named
    vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
    on top of this char device to set up the backend.

    Vhost ioctls are extended to make them type agnostic and behave like a
    virtio device. This helps eliminate type-specific APIs like those of
    vhost_net/scsi/vsock:

    - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID, which is defined
    by the virtio specification to distinguish between device types
    - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
    supported by the vDPA device
    - VHOST_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
    - VHOST_VDPA_SET/GET_CONFIG: access virtio config space
    - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue

    For memory mapping, IOTLB API is mandated for vhost-vDPA which means
    userspace drivers are required to use
    VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
    a specific userspace memory region.
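
As a rough illustration of the IOTLB requirement above, the sketch below builds the two message types using the Linux UAPI header `<linux/vhost.h>`. It assumes the v2 message framing (`struct vhost_msg_v2`) and that messages are delivered by `write()` on the vhost file descriptor; which framing a given backend expects is not stated in this commit message. The helper names `iotlb_update_msg`, `iotlb_invalidate_msg` and `iotlb_send` are hypothetical.

```c
#include <linux/vhost.h>   /* struct vhost_msg_v2, VHOST_IOTLB_* (Linux UAPI) */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Build a VHOST_IOTLB_UPDATE message mapping [iova, iova + size) to uaddr. */
static struct vhost_msg_v2 iotlb_update_msg(uint64_t iova, uint64_t size,
                                            uint64_t uaddr)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;       /* assumption: v2 framing */
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.uaddr = uaddr;
    msg.iotlb.perm = VHOST_ACCESS_RW;
    msg.iotlb.type = VHOST_IOTLB_UPDATE;
    return msg;
}

/* Build the matching VHOST_IOTLB_INVALIDATE message for the same range. */
static struct vhost_msg_v2 iotlb_invalidate_msg(uint64_t iova, uint64_t size)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
    return msg;
}

/* Write one message to an open vhost fd; returns 0 on success, -1 on error. */
static int iotlb_send(int fd, const struct vhost_msg_v2 *msg)
{
    ssize_t n = write(fd, msg, sizeof(*msg));

    return n == (ssize_t)sizeof(*msg) ? 0 : -1;
}
```

A caller would open the vhost-vdpa char device, send one update per userspace memory region it wants the device to reach, and send an invalidate before releasing that region.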

    The vhost-vDPA API is designed to be type agnostic, but it allows only
    net devices at the current stage. Due to the lack of control virtqueue
    support, some features are filtered out by vhost-vdpa.

    We will enable more features and devices in the near future.

    Signed-off-by: Tiwei Bie
    Signed-off-by: Eugenio Pérez
    Signed-off-by: Jason Wang
    Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin

    Tiwei Bie
     
  • This patch implements the third memory accessor for vringh besides
    the current kernel and userspace accessors. The idea is to allow vringh
    to do the address translation through an IOTLB which is implemented
    via the vhost_map interval tree. Users should set up an IOVA to PA
    mapping in this IOTLB.

    This allows us to:

    - Use vringh to access virtqueues with vIOMMU
    - Use vringh to implement software virtqueues for vDPA devices
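
The translation step described above can be sketched in plain C. This is a userspace model only: it swaps the interval tree for a linear scan over an array, and the names `struct iotlb_map` and `iotlb_translate` are hypothetical rather than the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* One IOVA -> PA mapping; [start, last] is an inclusive range. */
struct iotlb_map {
    uint64_t start;  /* first IOVA covered by this mapping */
    uint64_t last;   /* last IOVA covered by this mapping */
    uint64_t pa;     /* physical address that backs 'start' */
};

/*
 * Translate 'iova' using the given mappings. Returns 0 and stores the
 * result in *pa on a hit, or -1 when no mapping covers the address.
 * A linear scan stands in for the interval tree lookup.
 */
static int iotlb_translate(const struct iotlb_map *maps, size_t n,
                           uint64_t iova, uint64_t *pa)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= maps[i].start && iova <= maps[i].last) {
            *pa = maps[i].pa + (iova - maps[i].start);
            return 0;
        }
    }
    return -1;
}
```

An accessor built on such a lookup would translate each descriptor address before touching the ring memory, and treat a miss as an access error.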

    Signed-off-by: Jason Wang
    Link: https://lore.kernel.org/r/20200326140125.19794-5-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch factors out IOTLB into a dedicated module so that it can be
    reused by other modules like vringh. Users may enable automatic
    retiring by specifying the VHOST_IOTLB_FLAG_RETIRE flag, which fits
    the case of the vhost device IOTLB implementation.

    Signed-off-by: Jason Wang
    Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch allows a device to register its own message handler during
    vhost_dev_init(). The vDPA device will use it to implement its own DMA
    mapping logic.

    Signed-off-by: Jason Wang
    Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is
    not necessarily for VMs, since it's a generic userspace and kernel
    communication protocol. Such a dependency may prevent archs without
    virtualization support from using vhost.

    To solve this, a dedicated vhost menu is created under drivers so
    CONFIG_VHOST can be decoupled from CONFIG_VIRTUALIZATION.

    While at it, also squash Kconfig.vringh into vhost Kconfig file. This
    avoids the trick of conditional inclusion from VOP or CAIF. Then it
    will be easier to introduce new vringh users and common dependency for
    both vringh and vhost.

    Signed-off-by: Jason Wang
    Link: https://lore.kernel.org/r/20200326140125.19794-2-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

23 Feb, 2020

1 commit

  • Doing so, we save one call to get data we already have in the struct.

    Also, since there is no guarantee that getname stays within the size
    of the sockaddr_ll parameter, we add a little bit of security here.
    It should not go beyond MAX_ADDR_LEN, but syzbot found that
    ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25,
    versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro).

    Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
    Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
    Signed-off-by: Eugenio Pérez
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Eugenio Pérez
     

09 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) More jumbo frame fixes in r8169, from Heiner Kallweit.

    2) Fix bpf build in minimal configuration, from Alexei Starovoitov.

    3) Use after free in slcan driver, from Jouni Hogander.

    4) Flower classifier port ranges don't work properly in the HW offload
    case, from Yoshiki Komachi.

    5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.

    6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.

    7) Fix flow dissection in dsa TX path, from Alexander Lobakin.

    8) Stale syncookie timestamp fixes from Guillaume Nault.

    [ Did an evil merge to silence a warning introduced by this pull - Linus ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
    r8169: fix rtl_hw_jumbo_disable for RTL8168evl
    net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
    r8169: add missing RX enabling for WoL on RTL8125
    vhost/vsock: accept only packets with the right dst_cid
    net: phy: dp83867: fix hfs boot in rgmii mode
    net: ethernet: ti: cpsw: fix extra rx interrupt
    inet: protect against too small mtu values.
    gre: refetch erspan header from skb->data after pskb_may_pull()
    pppoe: remove redundant BUG_ON() check in pppoe_pernet
    tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
    tcp: tighten acceptance of ACKs not matching a child socket
    tcp: fix rejected syncookies due to stale timestamps
    lpc_eth: kernel BUG on remove
    tcp: md5: fix potential overestimation of TCP option space
    net: sched: allow indirect blocks to bind to clsact in TC
    net: core: rename indirect block ingress cb function
    net-sysfs: Call dev_hold always in netdev_queue_add_kobject
    net: dsa: fix flow dissection on Tx path
    net/tls: Fix return values to avoid ENOTSUPP
    net: avoid an indirect call in ____sys_recvmsg()
    ...

    Linus Torvalds
     

08 Dec, 2019

1 commit


05 Dec, 2019

1 commit

  • Add kcov_remote_start()/kcov_remote_stop() annotations to the
    vhost_worker() function, which is responsible for processing vhost
    works.

    Since vhost_worker() threads are spawned per vhost device instance, the
    common kcov handle is used for kcov_remote_start()/stop() annotations
    (see Documentation/dev-tools/kcov.rst for details). As a result, kcov
    can now be used to collect coverage from vhost worker threads.

    Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Cc: Alan Stern
    Cc: Alexander Potapenko
    Cc: Anders Roxell
    Cc: Arnd Bergmann
    Cc: David Windsor
    Cc: Dmitry Vyukov
    Cc: Elena Reshetova
    Cc: Greg Kroah-Hartman
    Cc: Jason Wang
    Cc: Marco Elver
    Cc: "Michael S. Tsirkin"
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

02 Dec, 2019

1 commit

  • Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
    "As part of the cleanup of some remaining y2038 issues, I came to
    fs/compat_ioctl.c, which still has a couple of commands that need
    support for time64_t.

    In completely unrelated work, I spent time on cleaning up parts of
    this file in the past, moving things out into drivers instead.

    After Al Viro reviewed an earlier version of this series and did a lot
    more of that cleanup, I decided to try to completely eliminate the
    rest of it and move it all into drivers.

    This series incorporates some of Al's work and many patches of my own,
    but in the end stops short of actually removing the last part, which
    is the scsi ioctl handlers. I have patches for those as well, but they
    need more testing or possibly a rewrite"

    * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
    scsi: sd: enable compat ioctls for sed-opal
    pktcdvd: add compat_ioctl handler
    compat_ioctl: move SG_GET_REQUEST_TABLE handling
    compat_ioctl: ppp: move simple commands into ppp_generic.c
    compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
    compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
    compat_ioctl: unify copy-in of ppp filters
    tty: handle compat PPP ioctls
    compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
    compat_ioctl: handle SIOCOUTQNSD
    af_unix: add compat_ioctl support
    compat_ioctl: reimplement SG_IO handling
    compat_ioctl: move WDIOC handling into wdt drivers
    fs: compat_ioctl: move FITRIM emulation into file systems
    gfs2: add compat_ioctl support
    compat_ioctl: remove unused convert_in_user macro
    compat_ioctl: remove last RAID handling code
    compat_ioctl: remove /dev/raw ioctl translation
    compat_ioctl: remove PCI ioctl translation
    compat_ioctl: remove joystick ioctl translation
    ...

    Linus Torvalds
     

15 Nov, 2019

5 commits

  • In a nested VM environment, we have to refuse to assign to a nested
    guest the same CID assigned to our guest->host transport.
    In this way, the user can use the local CID for loopback.

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch adds 'module' member in the 'struct vsock_transport'
    in order to get/put the transport module. This prevents the
    module unloading while sockets are assigned to it.

    We increase the module refcnt when a socket is assigned to a
    transport, and we decrease the module refcnt when the socket
    is destructed.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • This patch adds the support of multiple transports in the
    VSOCK core.

    With the multi-transports support, we can use vsock with nested VMs
    (using also different hypervisors) loading both guest->host and
    host->guest transports at the same time.

    Major changes:
    - vsock core module can be loaded regardless of the transports
    - vsock_core_init() and vsock_core_exit() are renamed to
    vsock_core_register() and vsock_core_unregister()
    - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
    to identify which directions the transport can handle and whether it
    supports DGRAM (only vmci)
    - each stream socket is assigned to a transport when the remote CID
    is set (during the connect() or when we receive a connection request
    on a listener socket).
    The remote CID is used to decide which transport to use:
    - remote CID <= VMADDR_CID_HOST will use guest->host transport;
    - remote CID == local_cid (guest->host transport) will use guest->host
    transport for loopback (host->guest transports don't support loopback);
    - remote CID > VMADDR_CID_HOST will use host->guest transport;
    - listener sockets are not bound to any transports since no transport
    operations are done on it. In this way we can create a listener
    socket, also if the transports are not loaded or with VMADDR_CID_ANY
    to listen on all transports.
    - DGRAM sockets are handled as before, since only the vmci_transport
    provides this feature.
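
The stream-socket selection rule can be sketched as a small pure function. This is a model rather than the kernel code: it assumes the first rule reads "remote CID <= VMADDR_CID_HOST uses the guest->host transport", and the enum and the function name `pick_transport` are hypothetical.

```c
#include <stdint.h>

#ifndef VMADDR_CID_HOST
#define VMADDR_CID_HOST 2u   /* well-known host CID in the vsock UAPI */
#endif

enum transport {
    TRANSPORT_G2H,   /* guest->host transport */
    TRANSPORT_H2G    /* host->guest transport */
};

/* Choose the transport for a stream socket from its remote CID. */
static enum transport pick_transport(uint32_t remote_cid, uint32_t local_cid)
{
    if (remote_cid <= VMADDR_CID_HOST)
        return TRANSPORT_G2H;   /* talking to the host or a reserved CID */
    if (remote_cid == local_cid)
        return TRANSPORT_G2H;   /* loopback goes through guest->host */
    return TRANSPORT_H2G;       /* any other CID is one of our guests */
}
```

For a nested setup with local CID 5, `pick_transport(2, 5)` and `pick_transport(5, 5)` both select the guest->host transport, while `pick_transport(7, 5)` selects the host->guest one.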

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • virtio_transport and vmci_transport handle the buffer_size
    sockopts in a very similar way.

    In order to support multiple transports, this patch moves this
    handling into the core to allow the user to change the options
    even if the socket is not yet assigned to any transport.

    This patch also adds the '.notify_buffer_size' callback in the
    'struct virtio_transport' in order to inform the transport,
    when the buffer_size is changed by the user. It is also useful
    to limit the 'buffer_size' requested (e.g. virtio transports).

    Acked-by: Dexuan Cui
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • We are going to add 'struct vsock_sock *' parameter to
    virtio_transport_get_ops().

    In some cases, like in the virtio_transport_reset_no_sock(),
    we don't have any socket assigned to the packet received,
    so we can't use the virtio_transport_get_ops().

    In order to allow virtio_transport_reset_no_sock() to use the
    '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
    we add the 'struct virtio_transport *' to it and to its caller:
    virtio_transport_recv_pkt().

    We moved the 'vhost_transport' and 'virtio_transport' definition,
    to pass their address to the virtio_transport_recv_pkt().

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

28 Oct, 2019

1 commit


23 Oct, 2019

1 commit

  • Each of these drivers has a copy of the same trivial helper function to
    convert the pointer argument and then call the native ioctl handler.

    We now have a generic implementation of that, so use it.

    Acked-by: Greg Kroah-Hartman
    Acked-by: Michael S. Tsirkin
    Acked-by: David S. Miller
    Acked-by: Jarkko Sakkinen
    Reviewed-by: Jarkko Sakkinen
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Jiri Kosina
    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Cornelia Huck
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

13 Oct, 2019

1 commit

  • When device stop was moved out of reset, the test device wasn't
    updated to stop before reset, which resulted in a use after free. Fix
    by invoking stop appropriately.

    Fixes: b211616d7125 ("vhost: move -net specific code out")
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

15 Sep, 2019

2 commits


12 Sep, 2019

2 commits

  • The code assumes log_num < in_num everywhere, and that is true as long as
    in_num is incremented by descriptor iov count, and log_num by 1. However
    this breaks if there's a zero sized descriptor.

    As a result, if a malicious guest creates a vring desc with desc.len = 0,
    it may cause the host kernel to crash by overflowing the log array. This
    bug can be triggered during the VM migration.

    There's no need to log when desc.len = 0, so just don't increment log_num
    in this case.
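
The counting problem can be modelled in a few lines of C. This is a simplified sketch, not the vhost code: `iov_entries` is a hypothetical helper that assumes one iovec entry per 4096-byte chunk, and `count_entries` only tallies the two counters the message talks about.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* hypothetical translation granularity */

/* Hypothetical helper: iovec entries a descriptor of 'len' bytes expands to. */
static unsigned int iov_entries(unsigned int len)
{
    return len ? 1u + (len - 1u) / SKETCH_PAGE_SIZE : 0u;
}

/*
 * Walk 'n' descriptor lengths and count iovec entries (in_num) and log
 * entries (log_num). With skip_empty == 0 every descriptor is logged, which
 * models the behaviour described as buggy. With skip_empty != 0 zero-length
 * descriptors are not logged, which models the described fix.
 */
static void count_entries(const unsigned int *lens, size_t n, int skip_empty,
                          unsigned int *in_num, unsigned int *log_num)
{
    *in_num = 0;
    *log_num = 0;
    for (size_t i = 0; i < n; i++) {
        *in_num += iov_entries(lens[i]);   /* adds 0 for an empty descriptor */
        if (skip_empty && lens[i] == 0)
            continue;
        (*log_num)++;
    }
}
```

For the lengths {0, 0, 4096}, the unfixed model yields in_num = 1 and log_num = 3, so a log array sized for in_num entries would be overrun; with empty descriptors skipped, log_num is 1.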

    Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
    Cc: stable@vger.kernel.org
    Reviewed-by: Lidong Chen
    Signed-off-by: ruippan
    Signed-off-by: yongduan
    Acked-by: Michael S. Tsirkin
    Reviewed-by: Tyler Hicks
    Signed-off-by: Michael S. Tsirkin

    yongduan
     
  • iovec addresses coming from vhost are assumed to be
    pre-validated, but in fact can be speculated to a value
    out of range.

    Userspace addresses are later validated with array_index_nospec so we
    can be sure kernel info does not leak through these addresses, but vhost
    must also not leak userspace info outside the allowed memory table to
    guests.

    Following the defence in depth principle, make sure
    the address is not validated out of node range.
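
The idea behind this kind of clamping can be shown with a branch-free mask, modelled here in userspace C. It is only an illustration of the arithmetic: the function names are hypothetical, the expression relies on arithmetic right shift of negative values (implementation-defined in C, though common compilers provide it), it assumes index and size are both below LONG_MAX, and an ordinary compiler gives no guarantee that the result is actually speculation-safe.

```c
#include <limits.h>

/*
 * Return all ones when index < size and zero otherwise, without a branch.
 * If index >= size, (size - 1 - index) wraps and sets the top bit, so the
 * complement is non-negative and the shift produces 0.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an offset: in-range values pass through, out-of-range become 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}
```

Applied to an offset into a memory region, an in-range value is unchanged and an out-of-range value collapses to 0, so even a mispredicted bounds check cannot produce an address outside the region.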

    Signed-off-by: Michael S. Tsirkin
    Cc: stable@vger.kernel.org
    Acked-by: Jason Wang
    Tested-by: Jason Wang

    Michael S. Tsirkin
     

04 Sep, 2019

4 commits

  • This reverts commit 7f466032dc ("vhost: access vq metadata through
    kernel virtual address"). The commit caused a bunch of issues, and
    while commit 73f628ec9e ("vhost: disable metadata prefetch
    optimization") disabled the optimization it's not nice to keep lots of
    dead code around.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • It is unnecessary to use the ret variable to return the error
    code; just return the error code directly.

    Signed-off-by: Yunsheng Lin
    Signed-off-by: Michael S. Tsirkin

    Yunsheng Lin
     
  • Since vhost_exceeds_weight() was introduced, callers need to specify
    the packet weight and byte weight in vhost_dev_init(). Note that the
    packet weight isn't counted in this patch, to keep the original
    behavior unchanged.

    Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang

    Tiwei Bie
     
  • Since the commit below, callers need to specify the iov_limit in
    vhost_dev_init() explicitly.

    Fixes: b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang

    Tiwei Bie
     

07 Aug, 2019

1 commit


31 Jul, 2019

2 commits

  • If the packets to be sent to the guest are bigger than the buffer
    available, we can split them, using multiple buffers and fixing
    the length in the packet header.
    This is safe since virtio-vsock supports only stream sockets.
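
A minimal sketch of the splitting step, assuming each guest buffer can hold `buf_len` payload bytes and ignoring the space taken by the packet header. The function name `split_payload` is hypothetical.

```c
#include <stddef.h>

/*
 * Split 'payload_len' bytes into fragments of at most 'buf_len' bytes.
 * The length of each fragment (the value that would go into that
 * fragment's packet header) is stored in 'chunks'. Returns the number of
 * fragments written, which is at most 'max_chunks'.
 */
static size_t split_payload(size_t payload_len, size_t buf_len,
                            size_t *chunks, size_t max_chunks)
{
    size_t n = 0;

    if (buf_len == 0)
        return 0;   /* nothing can be placed in an empty buffer */

    while (payload_len > 0 && n < max_chunks) {
        size_t len = payload_len < buf_len ? payload_len : buf_len;

        chunks[n++] = len;
        payload_len -= len;
    }
    return n;
}
```

A 10000-byte payload with 4096-byte buffers comes out as three fragments of 4096, 4096 and 1808 bytes. Because the socket is a byte stream, the receiver does not need to know where the original packet boundaries were.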

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     
  • Since virtio-vsock was introduced, the buffers filled by the host
    and pushed to the guest using the vring, are directly queued in
    a per-socket list. These buffers are preallocated by the guest
    with a fixed size (4 KB).

    The maximum amount of memory used by each socket should be
    controlled by the credit mechanism.
    The default credit available per-socket is 256 KB, but if we use
    only 1 byte per packet, the guest can queue up to 262144 of 4 KB
    buffers, using up to 1 GB of memory per-socket. In addition, the
    guest will continue to fill the vring with new 4 KB free buffers
    to avoid starvation of other sockets.

    This patch mitigates this issue by copying the payload of small
    packets (< 128 bytes) into the buffer of the last packet queued, in
    order to avoid wasting memory.
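
The worst-case figure quoted above follows from simple arithmetic, worked here as a small C function. The function name is hypothetical, and the model assumes every packet occupies one whole buffer regardless of its payload size.

```c
#include <stdint.h>

/*
 * Memory pinned by queued buffers when the whole credit is consumed by
 * packets carrying 'payload_bytes' each, with every packet occupying one
 * buffer of 'buf_bytes'.
 */
static uint64_t worst_case_queued_bytes(uint32_t credit_bytes,
                                        uint32_t payload_bytes,
                                        uint32_t buf_bytes)
{
    if (payload_bytes == 0)
        return 0;   /* empty packets consume no credit in this model */
    return (uint64_t)(credit_bytes / payload_bytes) * buf_bytes;
}
```

With 256 KB of credit, 1-byte payloads and 4 KB buffers this gives 262144 buffers and 1073741824 bytes, the 1 GB mentioned in the message. In the same model, 128-byte payloads give 2048 buffers and 8388608 bytes (8 MB).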

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
     

26 Jul, 2019

1 commit


18 Jul, 2019

1 commit

  • Pull virtio, vhost updates from Michael Tsirkin:
    "Fixes, features, performance:

    - new iommu device

    - vhost guest memory access using vmap (just meta-data for now)

    - minor fixes"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio-mmio: add error check for platform_get_irq
    scsi: virtio_scsi: Use struct_size() helper
    iommu/virtio: Add event queue
    iommu/virtio: Add probe request
    iommu: Add virtio-iommu driver
    PCI: OF: Initialize dev->fwnode appropriately
    of: Allow the iommu-map property to omit untranslated devices
    dt-bindings: virtio: Add virtio-pci-iommu node
    dt-bindings: virtio-mmio: Add IOMMU description
    vhost: fix clang build warning
    vhost: access vq metadata through kernel virtual address
    vhost: factor out setting vring addr and num
    vhost: introduce helpers to get the size of metadata area
    vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
    vhost: fine grain userspace memory accessors
    vhost: generalize adding used elem

    Linus Torvalds
     

12 Jul, 2019

1 commit

  • Pull networking updates from David Miller:
    "Some highlights from this development cycle:

    1) Big refactoring of ipv6 route and neigh handling to support
    nexthop objects configurable as units from userspace. From David
    Ahern.

    2) Convert explored_states in BPF verifier into a hash table,
    significantly decreased state held for programs with bpf2bpf
    calls, from Alexei Starovoitov.

    3) Implement bpf_send_signal() helper, from Yonghong Song.

    4) Various classifier enhancements to mvpp2 driver, from Maxime
    Chevallier.

    5) Add aRFS support to hns3 driver, from Jian Shen.

    6) Fix use after free in inet frags by allocating fqdirs dynamically
    and reworking how rhashtable dismantle occurs, from Eric Dumazet.

    7) Add act_ctinfo packet classifier action, from Kevin
    Darbyshire-Bryant.

    8) Add TFO key backup infrastructure, from Jason Baron.

    9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

    10) Add devlink notifications for flash update status to mlxsw driver,
    from Jiri Pirko.

    11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

    12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

    13) Various enhancements to ipv6 flow label handling, from Eric
    Dumazet and Willem de Bruijn.

    14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
    der Merwe, and others.

    15) Various improvements to axienet driver including converting it to
    phylink, from Robert Hancock.

    16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

    17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
    Radulescu.

    18) Add devlink health reporting to mlx5, from Moshe Shemesh.

    19) Convert stmmac over to phylink, from Jose Abreu.

    20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
    Shalom Toledo.

    21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

    22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

    23) Track spill/fill of constants in BPF verifier, from Alexei
    Starovoitov.

    24) Support bounded loops in BPF, from Alexei Starovoitov.

    25) Various page_pool API fixes and improvements, from Jesper Dangaard
    Brouer.

    26) Just like ipv4, support ref-countless ipv6 route handling. From
    Wei Wang.

    27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

    28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

    29) Add flower GRE encap/decap support to nfp driver, from Pieter
    Jansen van Vuuren.

    30) Protect against stack overflow when using act_mirred, from John
    Hurley.

    31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

    32) Use page_pool API in netsec driver, Ilias Apalodimas.

    33) Add Google gve network driver, from Catherine Sullivan.

    34) More indirect call avoidance, from Paolo Abeni.

    35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

    36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

    37) Add MPLS manipulation actions to TC, from John Hurley.

    38) Add sending a packet to connection tracking from TC actions, and
    then allow flower classifier matching on conntrack state. From
    Paul Blakey.

    39) Netfilter hw offload support, from Pablo Neira Ayuso"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
    net/mlx5e: Return in default case statement in tx_post_resync_params
    mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
    net: dsa: add support for BRIDGE_MROUTER attribute
    pkt_sched: Include const.h
    net: netsec: remove static declaration for netsec_set_tx_de()
    net: netsec: remove superfluous if statement
    netfilter: nf_tables: add hardware offload support
    net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
    net: flow_offload: add flow_block_cb_is_busy() and use it
    net: sched: remove tcf block API
    drivers: net: use flow block API
    net: sched: use flow block API
    net: flow_offload: add flow_block_cb_{priv, incref, decref}()
    net: flow_offload: add list handling functions
    net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
    net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
    net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
    net: flow_offload: add flow_block_cb_setup_simple()
    net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
    net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
    ...

    Linus Torvalds
     

10 Jul, 2019

1 commit

  • Pull Documentation updates from Jonathan Corbet:
    "It's been a relatively busy cycle for docs:

    - A fair pile of RST conversions, many from Mauro. These create more
    than the usual number of simple but annoying merge conflicts with
    other trees, unfortunately. He has a lot more of these waiting on
    the wings that, I think, will go to you directly later on.

    - A new document on how to use merges and rebases in kernel repos,
    and one on Spectre vulnerabilities.

    - Various improvements to the build system, including automatic
    markup of function() references because some people, for reasons I
    will never understand, were of the opinion that
    :c:func:``function()`` is unattractive and not fun to type.

    - We now recommend using sphinx 1.7, but still support back to 1.4.

    - Lots of smaller improvements, warning fixes, typo fixes, etc"

    * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
    docs: automarkup.py: ignore exceptions when seeking for xrefs
    docs: Move binderfs to admin-guide
    Disable Sphinx SmartyPants in HTML output
    doc: RCU callback locks need only _bh, not necessarily _irq
    docs: format kernel-parameters -- as code
    Doc : doc-guide : Fix a typo
    platform: x86: get rid of a non-existent document
    Add the RCU docs to the core-api manual
    Documentation: RCU: Add TOC tree hooks
    Documentation: RCU: Rename txt files to rst
    Documentation: RCU: Convert RCU UP systems to reST
    Documentation: RCU: Convert RCU linked list to reST
    Documentation: RCU: Convert RCU basic concepts to reST
    docs: filesystems: Remove uneeded .rst extension on toctables
    scripts/sphinx-pre-install: fix out-of-tree build
    docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
    Documentation: PGP: update for newer HW devices
    Documentation: Add section about CPU vulnerabilities for Spectre
    Documentation: platform: Delete x86-laptop-drivers.txt
    docs: Note that :c:func: should no longer be used
    ...

    Linus Torvalds
     

22 Jun, 2019

1 commit


19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 48 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

18 Jun, 2019

1 commit

  • Vhost_net was known to suffer from HOL[1] issues which are not easy to
    fix. Several downstreams disable the feature by default. What's more,
    the datapath was split, and the datacopy path recently gained batching
    and XDP support, which makes it faster than the zerocopy path for
    small packet transmission.

    It looks to me that disabling zerocopy by default is more
    appropriate. It could be enabled by default again in the future if we
    fix the above issues.

    [1] https://patchwork.kernel.org/patch/3787671/

    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
     

15 Jun, 2019

1 commit


09 Jun, 2019

1 commit

  • Mostly due to x86 and acpi conversion, several documentation
    links are still pointing to the old file. Fix them.

    Signed-off-by: Mauro Carvalho Chehab
    Reviewed-by: Wolfram Sang
    Reviewed-by: Sven Van Asbroeck
    Reviewed-by: Bhupesh Sharma
    Acked-by: Mark Brown
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

07 Jun, 2019

1 commit

  • Clang warns:

    drivers/vhost/vhost.c:2085:5: warning: macro expansion producing
    'defined' has undefined behavior [-Wexpansion-to-defined]
    #if VHOST_ARCH_CAN_ACCEL_UACCESS
    ^
    drivers/vhost/vhost.h:98:38: note: expanded from macro
    'VHOST_ARCH_CAN_ACCEL_UACCESS'
    #define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && \
    ^

    It's being pedantic for the sake of portability, but the fix is easy
    enough.

    Rework the definition of VHOST_ARCH_CAN_ACCEL_UACCESS to expand to a constant.
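
The pattern behind this kind of fix can be shown in a self-contained form. In this sketch `DEMO_OTHER_CONDITION` is a hypothetical stand-in for the part of the original condition that the warning output truncates, and `CONFIG_MMU_NOTIFIER` is defined locally only so the example compiles on its own.

```c
/* Stand-ins so the sketch is self-contained; values are hypothetical. */
#define CONFIG_MMU_NOTIFIER 1
#define DEMO_OTHER_CONDITION 1

/*
 * Evaluate 'defined' directly in the #if line, then define the macro as a
 * plain constant. Later uses such as "#if VHOST_ARCH_CAN_ACCEL_UACCESS" no
 * longer expand to an expression containing 'defined'.
 */
#if defined(CONFIG_MMU_NOTIFIER) && DEMO_OTHER_CONDITION
#define VHOST_ARCH_CAN_ACCEL_UACCESS 1
#else
#define VHOST_ARCH_CAN_ACCEL_UACCESS 0
#endif

/* Report which branch the preprocessor selected. */
static int can_accel_uaccess(void)
{
#if VHOST_ARCH_CAN_ACCEL_UACCESS
    return 1;
#else
    return 0;
#endif
}
```

The C standard leaves the behaviour undefined when the token `defined` is produced by macro replacement inside `#if`, which is what -Wexpansion-to-defined reports; moving the test into the `#if` line itself avoids that.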

    Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
    Link: https://github.com/ClangBuiltLinux/linux/issues/508
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor

    Michael S. Tsirkin
     

06 Jun, 2019

1 commit

  • It was noticed that the copy_to/from_user() friends that are used to
    access virtqueue metadata tend to be very expensive for dataplane
    implementations like vhost, since they involve lots of software checks,
    speculation barriers and hardware feature toggling (e.g. SMAP). The
    extra cost is more obvious when transferring small packets, since
    the time spent on metadata access becomes more significant.

    This patch tries to eliminate those overheads by accessing the metadata
    through a direct mapping of those pages. Invalidation callbacks are
    implemented for co-operation with general VM management (swap, KSM,
    THP or NUMA balancing). We try to get the direct mapping of vq
    metadata before each round of packet processing if it doesn't
    exist. If we fail, we simply fall back to the copy_to/from_user()
    friends.

    Invalidation and direct mapping access are synchronized through a
    spinlock and RCU. All metadata access through the direct map is
    protected by RCU, and setup and invalidation are done under the
    spinlock.

    This method might not work for high mem pages, which require a
    temporary mapping, so we just fall back to normal
    copy_to/from_user(). It also may not work for archs that have a
    virtually tagged cache, since extra cache flushing is needed to
    eliminate the alias. This would result in complex logic and bad
    performance. For those archs, this patch simply goes for the
    copy_to/from_user() friends. This is done by ruling out the kernel
    mapping code through ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

    Note that this is only done when device IOTLB is not enabled. We
    could use a similar method to optimize IOTLB in the future.

    Tests show at most about 23% improvement on TX PPS when using
    virtio-user + vhost_net + xdp1 + TAP on 2.6GHz Broadwell:

            SMAP on | SMAP off
    Before: 5.2Mpps | 7.1Mpps
    After:  6.4Mpps | 8.2Mpps
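
The relative gains can be checked directly from the figures in the table. The helper name below is hypothetical.

```c
/* Relative improvement of 'after' over 'before', in percent. */
static double improvement_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With SMAP on, going from 5.2 Mpps to 6.4 Mpps is a gain of roughly 23%, the "at most" figure quoted above. With SMAP off, going from 7.1 Mpps to 8.2 Mpps is roughly 15.5%.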

    Cc: Andrea Arcangeli
    Cc: James Bottomley
    Cc: Christoph Hellwig
    Cc: David Miller
    Cc: Jerome Glisse
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-parisc@vger.kernel.org
    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang