07 Oct, 2020

1 commit

  • [ Upstream commit 4c7246dc45e2706770d5233f7ce1597a07e069ba ]

    We are going to add 'struct vsock_sock *' parameter to
    virtio_transport_get_ops().

    In some cases, like in the virtio_transport_reset_no_sock(),
    we don't have any socket assigned to the packet received,
    so we can't use the virtio_transport_get_ops().

    In order to allow virtio_transport_reset_no_sock() to use the
    '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
    we add the 'struct virtio_transport *' to it and to its caller:
    virtio_transport_recv_pkt().

    We moved the 'vhost_transport' and 'virtio_transport' definitions
    so that their addresses can be passed to virtio_transport_recv_pkt()
    (a rough sketch follows this entry).

    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Stefano Garzarella
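
    A rough sketch of the resulting call shape, grounded in the description
    above (simplified, unrelated details elided; not the exact upstream diff):

        /* The transport is now passed explicitly, so a reset can be sent
         * through its '.send_pkt' callback even when no socket is
         * associated with the received packet. */
        static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
                                                  struct virtio_vsock_pkt *pkt)
        {
                struct virtio_vsock_pkt *reply;
                ...
                if (!t)
                        return -ENOTCONN;

                reply = virtio_transport_alloc_pkt(&info, 0, ...);
                if (!reply)
                        return -ENOMEM;

                return t->send_pkt(reply);
        }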
     

05 Aug, 2020

1 commit

  • commit 295c1b9852d000580786375304a9800bd9634d15 upstream.

    vhost/scsi doesn't handle type conversion correctly for the request
    type when using virtio 1.0 and up on big-endian or cross-endian
    platforms.

    Fix it up using vhost32_to_cpu (a sketch follows this entry).

    Cc: stable@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Greg Kroah-Hartman

    Michael S. Tsirkin
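
    A minimal sketch of the kind of fix described, assuming the usual vhost
    endian helper vhost32_to_cpu() and the request handling in
    drivers/vhost/scsi.c (not the exact upstream diff):

        /* Before: the raw (virtio, i.e. little-endian for virtio 1.0+)
         * field was used directly, which breaks on BE/cross-endian hosts:
         *
         *     switch (v_req.type) {
         *
         * After (sketch): convert to CPU endianness first. */
        switch (vhost32_to_cpu(vq, v_req.type)) {
        case VIRTIO_SCSI_T_TMF:
                ...
                break;
        default:
                ...
        }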
     

24 Jun, 2020

1 commit

  • [ Upstream commit 5ae6a6a915033bfee79e76e0c374d4f927909edc ]

    vhost-scsi pre-allocates the maximum sg entries per command and fails
    any command that requires more than VHOST_SCSI_PREALLOC_SGLS entries.
    This patch lets vhost communicate that sg limit when it registers
    vhost_scsi_ops with TCM. With this change, TCM reports the max sg
    entries through the "Block Limits" VPD page, which is typically queried
    by the SCSI initiator during device discovery. Knowing this limit, the
    initiator can keep the maximum transfer length less than or equal to
    what vhost-scsi reports. (A sketch of the registration follows this
    entry.)

    Link: https://lore.kernel.org/r/1590166317-953-1-git-send-email-sudhakar.panneerselvam@oracle.com
    Cc: Michael S. Tsirkin
    Cc: Jason Wang
    Cc: Paolo Bonzini
    Cc: Stefan Hajnoczi
    Reviewed-by: Mike Christie
    Signed-off-by: Sudhakar Panneerselvam
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin

    Sudhakar Panneerselvam
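
    A sketch of the registration described above, assuming TCM's
    max_data_sg_nents field in struct target_core_fabric_ops (other fields
    elided; not the exact upstream diff):

        static const struct target_core_fabric_ops vhost_scsi_ops = {
                .module                 = THIS_MODULE,
                .fabric_name            = "vhost",
                /* Advertise the preallocated SGL limit so TCM can report
                 * it via the "Block Limits" VPD page. */
                .max_data_sg_nents      = VHOST_SCSI_PREALLOC_SGLS,
                ...
        };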
     

10 May, 2020

1 commit

  • commit 0b841030625cde5f784dd62aec72d6a766faae70 upstream.

    Ning Bo reported an abnormal 2-second gap when booting a Kata
    container [1]. The unconditional timeout comes from
    VSOCK_DEFAULT_CONNECT_TIMEOUT on the client side: the vhost vsock
    client tries to connect to a virtio vsock server that is still
    initializing.

    The abnormal flow looks like:

    host-userspace          vhost vsock                       guest vsock
    ==============          ===========                       ============
    connect()  ---------->  vhost_transport_send_pkt_work()   initializing
        |                   vq->private_data==NULL
        |                   will not be queued
        V
    schedule_timeout(2s)
                            vhost_vsock_start()               (sets private_data)

    wait for 2s and failed
    connect() again         vq->private_data!=NULL            recv connecting pkt

    Details:
    1. Host userspace sends a connect pkt; at that time the guest vsock is
       still initializing, so vhost_vsock_start() has not been called yet.
       Hence vq->private_data == NULL and the pkt is not queued to be sent
       to the guest.
    2. The connect then sleeps for 2s.
    3. After the guest vsock finishes initializing, vq->private_data is set.
    4. When host userspace wakes up after 2s and sends the connect pkt
       again, everything is fine.

    As suggested by Stefano Garzarella, fix this by additionally kicking
    the send_pkt worker in vhost_vsock_start() once the virtio device is
    started, so the pending pkt is sent again (a sketch follows this
    entry).

    After this patch, kata-runtime (with vsock enabled) boot time is reduced
    from 3s to 1s on a ThunderX2 arm64 server.

    [1] https://github.com/kata-containers/runtime/issues/1917

    Reported-by: Ning Bo
    Suggested-by: Stefano Garzarella
    Signed-off-by: Jia He
    Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Stefano Garzarella
    Signed-off-by: Greg Kroah-Hartman

    Jia He
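
    A sketch of the fix described above, assuming vhost_vsock's existing
    send_pkt_work (not the exact upstream diff):

        static int vhost_vsock_start(struct vhost_vsock *vsock)
        {
                ...
                /* all vqs now have vq->private_data set */

                /* Packets may have been queued while the device was still
                 * initializing, so kick the send worker once here so the
                 * pending pkt is sent again. */
                vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
                ...
                return 0;
        }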
     

05 Mar, 2020

1 commit

  • commit 42d84c8490f9f0931786f1623191fcab397c3d64 upstream.

    Doing so, we save one call to get data we already have in the struct.

    Also, since there is no guarantee that getname uses the sockaddr_ll
    parameter only within its size, we add a little bit of security here.
    It should not write beyond MAX_ADDR_LEN, but syzbot found that
    ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25,
    versus the 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in the syzbot
    repro). (A sketch of the check follows this entry.)

    Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
    Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
    Signed-off-by: Eugenio Pérez
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eugenio Pérez
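
    A sketch of the resulting check in vhost-net's raw socket validation
    (simplified; not the exact upstream diff):

        static struct socket *get_raw_socket(int fd)
        {
                struct socket *sock = sockfd_lookup(fd, &r);
                ...
                /* Check the family directly from the struct instead of
                 * calling getname() into an on-stack sockaddr_ll, which
                 * some protocols (e.g. ax25) may write past. */
                if (sock->sk->sk_family != AF_PACKET) {
                        r = -EPFNOSUPPORT;
                        goto err;
                }
                ...
        }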
     

05 Jan, 2020

1 commit

  • [ Upstream commit 8a3cc29c316c17de590e3ff8b59f3d6cbfd37b0a ]

    When we receive a new packet from the guest, we check if the
    src_cid is correct, but we forgot to check the dst_cid.

    The host should accept only packets where dst_cid is equal to the
    host CID (a sketch of the check follows this entry).

    Signed-off-by: Stefano Garzarella
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Stefano Garzarella
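
    A sketch of the check on the vhost receive path (simplified; not the
    exact upstream diff):

        /* Only accept correctly addressed packets: src_cid must be the
         * guest CID and dst_cid must be the host CID. */
        if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid &&
            le64_to_cpu(pkt->hdr.dst_cid) == vhost_transport_get_local_cid())
                virtio_transport_recv_pkt(pkt);
        else
                virtio_transport_free_pkt(pkt);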
     

13 Oct, 2019

1 commit

    When device stop was moved out of reset, the test device wasn't updated
    to stop before reset, which resulted in a use after free. Fix by
    invoking stop appropriately.

    Fixes: b211616d7125 ("vhost: move -net specific code out")
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

12 Sep, 2019

2 commits

    The code assumes log_num < in_num everywhere, and that is true as long
    as in_num is incremented by the descriptor iov count and log_num by 1.
    However, this breaks if there is a zero-sized descriptor.

    As a result, if a malicious guest creates a vring desc with
    desc.len = 0, it may cause the host kernel to crash by overflowing the
    log array. This bug can be triggered during VM migration.

    There is no need to log when desc.len = 0, so just don't increment
    log_num in this case (a sketch follows this entry).

    Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
    Cc: stable@vger.kernel.org
    Reviewed-by: Lidong Chen
    Signed-off-by: ruippan
    Signed-off-by: yongduan
    Acked-by: Michael S. Tsirkin
    Reviewed-by: Tyler Hicks
    Signed-off-by: Michael S. Tsirkin

    yongduan
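
    A sketch of the fix in the descriptor translation path, where 'ret' is
    the iovec count produced for the descriptor (simplified; not the exact
    upstream diff):

        /* Only log writable descriptors that actually produced iovecs
         * (ret != 0); a zero-length descriptor must not bump log_num,
         * otherwise log_num can exceed in_num and overflow the log
         * array. */
        if (unlikely(log && ret)) {
                log[*log_num].addr = vhost64_to_cpu(vq, desc.addr);
                log[*log_num].len = vhost32_to_cpu(vq, desc.len);
                ++*log_num;
        }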
     
    iovec addresses coming from vhost are assumed to be pre-validated, but
    in fact can be speculated to a value out of range.

    Userspace addresses are later validated with array_index_nospec, so we
    can be sure kernel info does not leak through these addresses, but
    vhost must also not leak userspace info outside the allowed memory
    table to guests.

    Following the defence-in-depth principle, make sure the translated
    address cannot be speculated outside the node range (a sketch follows
    this entry).

    Signed-off-by: Michael S. Tsirkin
    Cc: stable@vger.kernel.org
    Acked-by: Jason Wang
    Tested-by: Jason Wang

    Michael S. Tsirkin
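
    A sketch of the defence-in-depth clamp in translate_desc(), using
    array_index_nospec() (simplified; not the exact upstream diff):

        /* Clamp the offset into the memory node so that a speculated
         * out-of-range address cannot be turned into a pointer outside
         * the allowed memory table. */
        _iov->iov_base = (void __user *)(unsigned long)
                (node->userspace_addr +
                 array_index_nospec((unsigned long)(addr - node->start),
                                    node->size));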
     

04 Sep, 2019

4 commits

  • This reverts commit 7f466032dc ("vhost: access vq metadata through
    kernel virtual address"). The commit caused a bunch of issues, and
    while commit 73f628ec9e ("vhost: disable metadata prefetch
    optimization") disabled the optimization it's not nice to keep lots of
    dead code around.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
    It is unnecessary to use a ret variable to return the error code;
    just return the error code directly.

    Signed-off-by: Yunsheng Lin
    Signed-off-by: Michael S. Tsirkin

    Yunsheng Lin
     
    Since vhost_exceeds_weight() was introduced, callers need to specify
    the packet weight and byte weight in vhost_dev_init(). Note that the
    packet weight isn't counted in this patch, to keep the original
    behavior unchanged.

    Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang

    Tiwei Bie
     
    Since the commit below, callers need to specify the iov_limit in
    vhost_dev_init() explicitly (a sketch covering this and the previous
    fix follows this entry).

    Fixes: b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang

    Tiwei Bie
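
    A sketch covering this and the previous fix, assuming the
    vhost_dev_init() signature of that era (dev, vqs, nvqs, iov_limit,
    weight, byte_weight); the VHOST_TEST_* constants are illustrative names
    for the test device's limits:

        /* drivers/vhost/test.c, vhost_test_open() (sketch) */
        vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX,
                       UIO_MAXIOV,              /* iov_limit */
                       VHOST_TEST_PKT_WEIGHT,   /* packet weight */
                       VHOST_TEST_WEIGHT);      /* byte weight */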
     

31 Jul, 2019

2 commits

    If the packets to be sent to the guest are bigger than the available
    buffer, we can split them across multiple buffers, fixing up the length
    in the packet header (a sketch follows this entry).
    This is safe since virtio-vsock supports only stream sockets.

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
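
    A sketch of the splitting logic in the vhost-vsock send worker
    (simplified; not the exact upstream diff):

        /* If the payload is bigger than the space available in the guest
         * buffer, send only what fits now; the rest of the packet will be
         * sent in the next buffers. */
        if (payload_len > iov_len - sizeof(pkt->hdr))
                payload_len = iov_len - sizeof(pkt->hdr);

        /* Fix up the length in the header for this (possibly partial)
         * transfer. */
        pkt->hdr.len = cpu_to_le32(payload_len);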
     
    Since virtio-vsock was introduced, the buffers filled by the host and
    pushed to the guest using the vring are queued directly in a per-socket
    list. These buffers are preallocated by the guest with a fixed size
    (4 KB).

    The maximum amount of memory used by each socket should be controlled
    by the credit mechanism.
    The default credit available per socket is 256 KB, but if we use only
    1 byte per packet, the guest can queue up to 262144 4 KB buffers, using
    up to 1 GB of memory per socket. In addition, the guest will continue
    to fill the vring with new 4 KB free buffers to avoid starvation of
    other sockets.

    This patch mitigates the issue by copying the payload of small packets
    (< 128 bytes) into the buffer of the last packet queued, in order to
    avoid wasting memory (a sketch follows this entry).

    Signed-off-by: Stefano Garzarella
    Reviewed-by: Stefan Hajnoczi
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Stefano Garzarella
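
    A sketch of the mitigation on the receive path; the 128-byte threshold
    is the one mentioned above, and the rx_queue/buf_len field names are
    assumptions (not the exact upstream diff):

        /* For small packets, try to append the payload to the last buffer
         * already queued on the socket instead of queuing another 4 KB
         * buffer. */
        if (pkt->len <= GOOD_COPY_LEN /* 128 */ && !list_empty(&vvs->rx_queue)) {
                struct virtio_vsock_pkt *last_pkt;

                last_pkt = list_last_entry(&vvs->rx_queue,
                                           struct virtio_vsock_pkt, list);

                if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
                        memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
                               pkt->len);
                        last_pkt->len += pkt->len;
                        free_pkt = true;
                        goto out;
                }
        }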
     

18 Jul, 2019

1 commit

  • Pull virtio, vhost updates from Michael Tsirkin:
    "Fixes, features, performance:

    - new iommu device

    - vhost guest memory access using vmap (just meta-data for now)

    - minor fixes"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio-mmio: add error check for platform_get_irq
    scsi: virtio_scsi: Use struct_size() helper
    iommu/virtio: Add event queue
    iommu/virtio: Add probe request
    iommu: Add virtio-iommu driver
    PCI: OF: Initialize dev->fwnode appropriately
    of: Allow the iommu-map property to omit untranslated devices
    dt-bindings: virtio: Add virtio-pci-iommu node
    dt-bindings: virtio-mmio: Add IOMMU description
    vhost: fix clang build warning
    vhost: access vq metadata through kernel virtual address
    vhost: factor out setting vring addr and num
    vhost: introduce helpers to get the size of metadata area
    vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
    vhost: fine grain userspace memory accessors
    vhost: generalize adding used elem

    Linus Torvalds
     

12 Jul, 2019

1 commit

  • Pull networking updates from David Miller:
    "Some highlights from this development cycle:

    1) Big refactoring of ipv6 route and neigh handling to support
    nexthop objects configurable as units from userspace. From David
    Ahern.

    2) Convert explored_states in BPF verifier into a hash table,
    significantly decreased state held for programs with bpf2bpf
    calls, from Alexei Starovoitov.

    3) Implement bpf_send_signal() helper, from Yonghong Song.

    4) Various classifier enhancements to mvpp2 driver, from Maxime
    Chevallier.

    5) Add aRFS support to hns3 driver, from Jian Shen.

    6) Fix use after free in inet frags by allocating fqdirs dynamically
    and reworking how rhashtable dismantle occurs, from Eric Dumazet.

    7) Add act_ctinfo packet classifier action, from Kevin
    Darbyshire-Bryant.

    8) Add TFO key backup infrastructure, from Jason Baron.

    9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

    10) Add devlink notifications for flash update status to mlxsw driver,
    from Jiri Pirko.

    11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

    12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

    13) Various enhancements to ipv6 flow label handling, from Eric
    Dumazet and Willem de Bruijn.

    14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
    der Merwe, and others.

    15) Various improvements to axienet driver including converting it to
    phylink, from Robert Hancock.

    16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

    17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
    Radulescu.

    18) Add devlink health reporting to mlx5, from Moshe Shemesh.

    19) Convert stmmac over to phylink, from Jose Abreu.

    20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
    Shalom Toledo.

    21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

    22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

    23) Track spill/fill of constants in BPF verifier, from Alexei
    Starovoitov.

    24) Support bounded loops in BPF, from Alexei Starovoitov.

    25) Various page_pool API fixes and improvements, from Jesper Dangaard
    Brouer.

    26) Just like ipv4, support ref-countless ipv6 route handling. From
    Wei Wang.

    27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

    28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

    29) Add flower GRE encap/decap support to nfp driver, from Pieter
    Jansen van Vuuren.

    30) Protect against stack overflow when using act_mirred, from John
    Hurley.

    31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

    32) Use page_pool API in netsec driver, Ilias Apalodimas.

    33) Add Google gve network driver, from Catherine Sullivan.

    34) More indirect call avoidance, from Paolo Abeni.

    35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

    36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

    37) Add MPLS manipulation actions to TC, from John Hurley.

    38) Add sending a packet to connection tracking from TC actions, and
    then allow flower classifier matching on conntrack state. From
    Paul Blakey.

    39) Netfilter hw offload support, from Pablo Neira Ayuso"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
    net/mlx5e: Return in default case statement in tx_post_resync_params
    mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
    net: dsa: add support for BRIDGE_MROUTER attribute
    pkt_sched: Include const.h
    net: netsec: remove static declaration for netsec_set_tx_de()
    net: netsec: remove superfluous if statement
    netfilter: nf_tables: add hardware offload support
    net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
    net: flow_offload: add flow_block_cb_is_busy() and use it
    net: sched: remove tcf block API
    drivers: net: use flow block API
    net: sched: use flow block API
    net: flow_offload: add flow_block_cb_{priv, incref, decref}()
    net: flow_offload: add list handling functions
    net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
    net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
    net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
    net: flow_offload: add flow_block_cb_setup_simple()
    net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
    net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
    ...

    Linus Torvalds
     

10 Jul, 2019

1 commit

  • Pull Documentation updates from Jonathan Corbet:
    "It's been a relatively busy cycle for docs:

    - A fair pile of RST conversions, many from Mauro. These create more
    than the usual number of simple but annoying merge conflicts with
    other trees, unfortunately. He has a lot more of these waiting on
    the wings that, I think, will go to you directly later on.

    - A new document on how to use merges and rebases in kernel repos,
    and one on Spectre vulnerabilities.

    - Various improvements to the build system, including automatic
    markup of function() references because some people, for reasons I
    will never understand, were of the opinion that
    :c:func:``function()`` is unattractive and not fun to type.

    - We now recommend using sphinx 1.7, but still support back to 1.4.

    - Lots of smaller improvements, warning fixes, typo fixes, etc"

    * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
    docs: automarkup.py: ignore exceptions when seeking for xrefs
    docs: Move binderfs to admin-guide
    Disable Sphinx SmartyPants in HTML output
    doc: RCU callback locks need only _bh, not necessarily _irq
    docs: format kernel-parameters -- as code
    Doc : doc-guide : Fix a typo
    platform: x86: get rid of a non-existent document
    Add the RCU docs to the core-api manual
    Documentation: RCU: Add TOC tree hooks
    Documentation: RCU: Rename txt files to rst
    Documentation: RCU: Convert RCU UP systems to reST
    Documentation: RCU: Convert RCU linked list to reST
    Documentation: RCU: Convert RCU basic concepts to reST
    docs: filesystems: Remove uneeded .rst extension on toctables
    scripts/sphinx-pre-install: fix out-of-tree build
    docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
    Documentation: PGP: update for newer HW devices
    Documentation: Add section about CPU vulnerabilities for Spectre
    Documentation: platform: Delete x86-laptop-drivers.txt
    docs: Note that :c:func: should no longer be used
    ...

    Linus Torvalds
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this work is licensed under the terms of the gnu gpl version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 48 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Enrico Weigelt
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

18 Jun, 2019

1 commit

    Vhost_net was known to suffer from HOL [1] issues which are not easy to
    fix. Several downstreams disable the feature by default. What's more,
    the datapath was split, and the datacopy path recently gained batching
    and XDP support, which makes it faster than the zerocopy path for
    small packet transmission.

    It looks to me that disabling zerocopy by default is more appropriate
    (a sketch follows this entry). It could be enabled by default again in
    the future if we fix the above issues.

    [1] https://patchwork.kernel.org/patch/3787671/

    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
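
    A sketch of the change, assuming the existing experimental_zcopytx
    module parameter in drivers/vhost/net.c:

        /* Was 1: zerocopy TX enabled by default. Now default to off; it
         * can still be enabled via the module parameter. */
        static int experimental_zcopytx;
        module_param(experimental_zcopytx, int, 0444);
        MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
                                               " 1 -Enable; 0 - Disable");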
     

09 Jun, 2019

1 commit

  • Mostly due to x86 and acpi conversion, several documentation
    links are still pointing to the old file. Fix them.

    Signed-off-by: Mauro Carvalho Chehab
    Reviewed-by: Wolfram Sang
    Reviewed-by: Sven Van Asbroeck
    Reviewed-by: Bhupesh Sharma
    Acked-by: Mark Brown
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

07 Jun, 2019

1 commit

  • Clang warns:

    drivers/vhost/vhost.c:2085:5: warning: macro expansion producing
    'defined' has undefined behavior [-Wexpansion-to-defined]
    #if VHOST_ARCH_CAN_ACCEL_UACCESS
    ^
    drivers/vhost/vhost.h:98:38: note: expanded from macro
    'VHOST_ARCH_CAN_ACCEL_UACCESS'
    #define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && \
    ^

    It's being pedantic for the sake of portability, but the fix is easy
    enough.

    Rework the definition of VHOST_ARCH_CAN_ACCEL_UACCESS to expand to a
    constant (a sketch follows this entry).

    Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
    Link: https://github.com/ClangBuiltLinux/linux/issues/508
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor

    Michael S. Tsirkin
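
    A sketch of the rework; the config condition shown reuses the
    ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE rule mentioned in the metadata-access
    patch below and should be treated as illustrative:

        /* Before (undefined behavior: the macro expansion produces
         * 'defined'):
         *
         *   #define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && ...
         *
         * After: evaluate the condition once and expand to a constant. */
        #if defined(CONFIG_MMU_NOTIFIER) && ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 0
        #define VHOST_ARCH_CAN_ACCEL_UACCESS 1
        #else
        #define VHOST_ARCH_CAN_ACCEL_UACCESS 0
        #endif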
     

06 Jun, 2019

6 commits

    It was noticed that the copy_to/from_user() friends used to access
    virtqueue metadata tend to be very expensive for dataplane
    implementations like vhost, since they involve lots of software checks,
    speculation barriers, and hardware feature toggling (e.g. SMAP). The
    extra cost is more obvious when transferring small packets, since the
    time spent on metadata accesses becomes more significant.

    This patch tries to eliminate those overheads by accessing the metadata
    through a direct mapping of those pages. Invalidation callbacks are
    implemented for co-operation with general VM management (swap, KSM,
    THP or NUMA balancing). We try to get the direct mapping of vq
    metadata before each round of packet processing if it doesn't exist.
    If we fail, we simply fall back to the copy_to/from_user() friends.

    The invalidation and the direct-mapping accesses are synchronized
    through a spinlock and RCU. All metadata accesses through the direct
    map are protected by RCU, and setup or invalidation is done under the
    spinlock.

    This method does not work for highmem pages, which require a temporary
    mapping, so we just fall back to normal copy_to/from_user() there. Nor
    does it work for architectures with virtually tagged caches, since
    extra cache flushing would be needed to eliminate the alias, resulting
    in complex logic and bad performance; for those archs, this patch
    simply goes with the copy_to/from_user() friends. This is done by
    ruling out the kernel mapping code through
    ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

    Note that this is only done when device IOTLB is not enabled. We could
    use a similar method to optimize IOTLB in the future. (A sketch of the
    fallback pattern follows this entry.)

    Tests show at most about a 23% improvement in TX PPS when using
    virtio-user + vhost_net + xdp1 + TAP on a 2.6GHz Broadwell:

                 SMAP on | SMAP off
        Before:  5.2Mpps | 7.1Mpps
        After:   6.4Mpps | 8.2Mpps

    Cc: Andrea Arcangeli
    Cc: James Bottomley
    Cc: Christoph Hellwig
    Cc: David Miller
    Cc: Jerome Glisse
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-parisc@vger.kernel.org
    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
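
    A sketch of the access pattern described above.
    vhost_get_meta_ptr()/vhost_put_meta_ptr() are hypothetical helper names
    standing in for the RCU-protected map lookup; the fallback path uses
    the existing uaccess helpers:

        static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
        {
        #if VHOST_ARCH_CAN_ACCEL_UACCESS
                /* hypothetical helper: returns the direct-mapped used ring
                 * under RCU, or NULL if the mapping was invalidated */
                struct vring_used *used = vhost_get_meta_ptr(vq);

                if (likely(used)) {
                        used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
                        vhost_put_meta_ptr();   /* hypothetical: ends the
                                                 * RCU read-side section */
                        return 0;
                }
        #endif
                /* fall back to the classic copy_to_user() path */
                return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
                                      &vq->used->idx);
        }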
     
    Factor out the vring address and num setting, which need special care
    for accelerating vq metadata accesses.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • To avoid code duplication since it will be used by kernel VA prefetching.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
    Rename the function to be more accurate, since it actually tries to
    prefetch vq metadata addresses in the IOTLB. This will be used by a
    following patch to prefetch metadata virtual addresses.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
    This is used to hide the metadata address from virtqueue helpers. It
    will allow implementing vmap-based fast access to metadata.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
    Use one generic vhost_copy_to_user() instead of two dedicated
    accessors. This will simplify the conversion to fine-grained
    accessors. An improvement of about 2% in PPS was seen during a
    virtio-user txonly test.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

27 May, 2019

4 commits

    This patch checks the weight and exits the loop if we exceed the
    weight. This is useful for preventing the scsi kthread from hogging
    the cpu, which is guest triggerable.

    This addresses CVE-2019-3900.

    Cc: Paolo Bonzini
    Cc: Stefan Hajnoczi
    Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver")
    Signed-off-by: Jason Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Stefan Hajnoczi

    Jason Wang
     
    This patch checks the weight and exits the loop if we exceed the
    weight. This is useful for preventing the vsock kthread from hogging
    the cpu, which is guest triggerable. The weight helps to avoid
    starving requests from one direction while the other direction is
    being processed.

    The value of the weight is picked from vhost-net.

    This addresses CVE-2019-3900.

    Cc: Stefan Hajnoczi
    Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
    Signed-off-by: Jason Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • When the rx buffer is too small for a packet, we will discard the vq
    descriptor and retry it for the next packet:

    while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
                                                  &busyloop_intr))) {
            ...
            /* On overrun, truncate and discard */
            if (unlikely(headcount > UIO_MAXIOV)) {
                    iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
                    err = sock->ops->recvmsg(sock, &msg,
                                             1, MSG_DONTWAIT | MSG_TRUNC);
                    pr_debug("Discarded rx packet: len %zd\n", sock_len);
                    continue;
            }
            ...
    }

    This makes it possible to trigger an infinite while..continue loop
    through the co-operation of two VMs, like:

    1) A malicious VM1 allocates a 1 byte rx buffer and tries to slow down
       the vhost process as much as possible, e.g. by using indirect
       descriptors.
    2) A malicious VM2 generates packets to VM1 as fast as possible.

    Fix this by checking against the weight at the end of the RX and TX
    loops (a sketch follows this entry). This also eliminates other
    similar cases when:

    - userspace is consuming the packets in the meanwhile
    - a theoretical TOCTOU attack if the guest moves the avail index back
      and forth to hit the continue after vhost finds the guest just added
      new buffers

    This addresses CVE-2019-3900.

    Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
    Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
    Signed-off-by: Jason Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
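
    A sketch of the reworked RX loop, checking the weight at the end of
    each iteration (simplified; not the exact upstream diff):

        do {
                sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
                                                      &busyloop_intr);
                if (!sock_len)
                        break;
                ...
                /* handle or discard the packet as before */
        } while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len)));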
     
    We used to have vhost_exceeds_weight() in vhost-net to:

    - prevent the vhost kthread from hogging the cpu
    - balance the time spent between TX and RX

    This function could be useful for vsock and scsi as well, so move it
    to vhost.c. A device must specify a weight, which counts the number of
    requests, and it can also specify a byte_weight, which counts the
    number of bytes that have been processed. (A sketch of the helper
    follows this entry.)

    Signed-off-by: Jason Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
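
    A sketch of the common helper as moved into vhost.c (close to, but not
    necessarily identical to, the upstream code):

        bool vhost_exceeds_weight(struct vhost_virtqueue *vq,
                                  int pkts, int total_len)
        {
                struct vhost_dev *dev = vq->dev;

                if (unlikely(total_len >= dev->byte_weight) ||
                    unlikely(pkts >= dev->weight)) {
                        /* requeue the work so other devices/directions get
                         * a chance to run, then return to the caller */
                        vhost_poll_queue(&vq->poll);
                        return true;
                }

                return false;
        }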