02 Aug, 2016

6 commits

  • Detect and fail early if long wrap around is triggered.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • This patch implements a device IOTLB for vhost. This could be used
    with a userspace (QEMU) implementation of DMA remapping to emulate
    an IOMMU for the guest.

    The idea is simple: cache the translations in a software device IOTLB
    (implemented as an interval tree) in vhost and use the vhost_net
    file descriptor for reporting IOTLB misses and for IOTLB
    update/invalidation. When vhost hits an IOTLB miss, the fault
    address, size, and access type can be read from the file. After
    userspace finishes the translation, it writes the translated address
    to the vhost_net file to update the device IOTLB.

    When the device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM,
    all vq addresses set by ioctl are treated as IOVAs instead of virtual
    addresses, and accesses can only be done through the IOTLB instead of
    by direct userspace memory access. Before each round of vq
    processing, all vq metadata is prefetched into the device IOTLB to
    make sure no translation fault happens during vq processing.

    In most cases, virtqueues are contiguous even in virtual address
    space, so the IOTLB translation for the virtqueue itself may make
    things a little slower. We might add a fast-path cache on top of
    this patch.

    Signed-off-by: Jason Wang
    [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
    [mst: fix build warnings ]
    Signed-off-by: Michael S. Tsirkin
    [ weiyj.lk: missing unlock on error ]
    Signed-off-by: Wei Yongjun

    Jason Wang
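
As a rough illustration of the miss/update protocol described above, here is a minimal userspace sketch of a software device IOTLB. The kernel implementation stores entries in an interval tree; a small fixed array stands in for it here, and all names (iotlb_translate, iotlb_update, miss_iova) are hypothetical, not the real vhost API.

```c
#include <stdint.h>
#include <stddef.h>

struct iotlb_entry {
    uint64_t iova_start, iova_end;  /* guest IOVA range, inclusive */
    uint64_t uaddr;                 /* translated userspace address */
};

#define IOTLB_MAX 16
static struct iotlb_entry iotlb[IOTLB_MAX];
static int iotlb_n;

/* Last reported miss, as userspace would read it from the vhost fd. */
static uint64_t miss_iova;

/* Userspace "writes" a translation back to update the device IOTLB. */
int iotlb_update(uint64_t iova, uint64_t size, uint64_t uaddr)
{
    if (iotlb_n == IOTLB_MAX)
        return -1;  /* a real cache would evict (e.g. LRU) instead */
    iotlb[iotlb_n++] = (struct iotlb_entry){ iova, iova + size - 1, uaddr };
    return 0;
}

/* Translate one IOVA; on a miss, record the fault for userspace. */
int iotlb_translate(uint64_t iova, uint64_t *uaddr)
{
    for (int i = 0; i < iotlb_n; i++) {
        if (iova >= iotlb[i].iova_start && iova <= iotlb[i].iova_end) {
            *uaddr = iotlb[i].uaddr + (iova - iotlb[i].iova_start);
            return 0;               /* hit */
        }
    }
    miss_iova = iova;               /* reported to userspace via the fd */
    return -1;                      /* miss: caller waits for an update */
}
```

A real cache would also support range invalidation and an eviction policy; the point here is only the miss-report/translate-back round trip.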
     
  • The current pre-sorted memory region array has some limitations for
    the future device IOTLB conversion:

    1) adding or removing a single region needs extra work, and is
    expected to be slow because of sorting or memory re-allocation.
    2) removing a large range which may intersect several regions of
    different sizes needs extra work.
    3) a replacement policy like LRU needs tricks.

    To overcome the above shortcomings, this patch converts it to an
    interval tree, which easily addresses the above issues with almost no
    extra work.

    The patch could be used for:

    - Extending the current API so that userspace only sends diffs of the
    memory table.
    - Simplifying the device IOTLB implementation.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch introduces vhost memory accessors which are just wrappers
    around the userspace address access helpers. This is a requirement
    for the vhost device IOTLB implementation, which will add IOTLB
    translation to those accessors.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
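
A hedged sketch of the wrapper idea: the accessors below are illustrative stand-ins (not the kernel helpers) that funnel every access through a single vhost_translate() hook, which starts as the identity and is exactly where a later IOTLB patch can slot in address translation.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a future iova -> host address translation step. */
static void *vhost_translate(void *addr)
{
    return addr;    /* identity for now; the IOTLB patch hooks in here */
}

/* All reads and writes go through the wrappers, never directly. */
int vhost_get_u16(uint16_t *dst, void *src)
{
    memcpy(dst, vhost_translate(src), sizeof *dst);
    return 0;
}

int vhost_put_u16(void *dst, uint16_t val)
{
    memcpy(vhost_translate(dst), &val, sizeof val);
    return 0;
}
```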
     
  • We currently use a spinlock to synchronize the work list, which may
    cause unnecessary contention. This patch switches to llist to remove
    this contention. Pktgen tests show about a 5% improvement:

    Before:
    ~1300000 pps
    After:
    ~1370000 pps

    Signed-off-by: Jason Wang
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
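
The llist pattern can be sketched in userspace with C11 atomics: producers push work items with a lock-free compare-and-swap, and the consumer detaches the whole list with a single atomic exchange, so neither side takes a spinlock. The names mirror the kernel's llist API, but this is an illustrative re-implementation, not the kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

struct llist_node {
    struct llist_node *next;
};

struct llist_head {
    _Atomic(struct llist_node *) first;
};

/* Lock-free push: retry the CAS until we win the race for the head. */
void llist_add(struct llist_node *node, struct llist_head *head)
{
    struct llist_node *first = atomic_load(&head->first);
    do {
        node->next = first;
    } while (!atomic_compare_exchange_weak(&head->first, &first, node));
}

/* Grab the entire list in one shot; entries come back LIFO. */
struct llist_node *llist_del_all(struct llist_head *head)
{
    return atomic_exchange(&head->first, NULL);
}
```

Because the detached entries are in LIFO order, a consumer that needs FIFO processing reverses the list once after detaching it.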
     
  • We used to implement work flushing by tracking the queued seq, the
    done seq, and the number of flushes in flight. This patch simplifies
    this by implementing flushing as another kind of vhost work with a
    completion. This will be used by the lockless enqueuing patch.

    Signed-off-by: Jason Wang
    Reviewed-by: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
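
The completion-based flush can be sketched with pthreads: instead of tracking sequence numbers, flush queues its own work item carrying a condition-variable completion and waits for the worker to execute it; since the queue is FIFO, every work queued earlier has run by then. This is an illustrative userspace analogue with made-up names, not the kernel code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct work { void (*fn)(struct work *); struct work *next; };

struct flush_work {
    struct work work;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static struct work *q_head, *q_tail;

/* FIFO enqueue, so a flush work runs after everything queued before it. */
static void queue_work(struct work *w)
{
    pthread_mutex_lock(&q_lock);
    w->next = NULL;
    if (q_tail)
        q_tail->next = w;
    else
        q_head = w;
    q_tail = w;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!q_head)
            pthread_cond_wait(&q_cond, &q_lock);
        struct work *w = q_head;
        q_head = w->next;
        if (!q_head)
            q_tail = NULL;
        pthread_mutex_unlock(&q_lock);
        w->fn(w);
    }
    return NULL;
}

/* The flush work itself just fires its completion. */
static void flush_fn(struct work *w)
{
    struct flush_work *fw = (struct flush_work *)w;
    pthread_mutex_lock(&fw->lock);
    fw->done = true;
    pthread_cond_signal(&fw->cond);
    pthread_mutex_unlock(&fw->lock);
}

/* Flush = queue a completion-carrying work and wait for it to run. */
void work_flush(void)
{
    struct flush_work fw = {
        .work = { flush_fn, NULL },
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cond = PTHREAD_COND_INITIALIZER,
        .done = false,
    };
    queue_work(&fw.work);
    pthread_mutex_lock(&fw.lock);
    while (!fw.done)
        pthread_cond_wait(&fw.cond, &fw.lock);
    pthread_mutex_unlock(&fw.lock);
}
```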
     

11 Mar, 2016

3 commits

  • This patch polls the newly added tx buffers or the socket receive
    queue for a while at the end of tx/rx processing. The maximum time
    spent on polling is specified through a new kind of vring ioctl.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
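
The bounded polling loop can be sketched as follows. This is an illustrative userspace analogue (busy_poll, now_us, and the predicate are made-up names), spinning on a "work available" check until it succeeds or a caller-supplied budget in microseconds expires.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <time.h>

static unsigned long long now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
}

/* Returns true if work became available within the polling budget. */
bool busy_poll(bool (*work_available)(void *), void *arg,
               unsigned long long budget_us)
{
    unsigned long long deadline = now_us() + budget_us;

    do {
        if (work_available(arg))
            return true;    /* found work while spinning: no sleep taken */
    } while (now_us() < deadline);

    return false;           /* budget exhausted: fall back to blocking */
}

/* Example predicate: "work" appears once the pointed-to flag is set. */
bool flag_set(void *arg)
{
    return *(bool *)arg;
}
```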
     
  • This patch introduces a helper which returns true if we're sure that
    the available ring is empty for a specific vq. When we're not sure,
    e.g. on vq access failure, it returns false instead. This could be
    used by busy polling code to exit the busy loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • This patch introduces a helper which gives a hint about whether or
    not there's work queued in the work list. This could be used by busy
    polling code to exit the busy loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

02 Mar, 2016

3 commits

  • Looking at how callers use this, maybe we should just rename init_used
    to vhost_vq_init_access. The _used suffix was a hint that we
    access the vq used ring. But maybe what callers care about is
    that it must be called after access_ok.

    Also, this function manipulates the vq->is_le field which isn't related
    to the vq used ring.

    This patch simply renames vhost_init_used() to vhost_vq_init_access() as
    suggested by Michael.

    No behaviour change.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
     
  • The default use case for vhost is when the host and the vring have the
    same endianness (default native endianness). But there are cases where
    they differ and vhost should byteswap when accessing the vring.

    The first case is when the host is big endian and the vring belongs to
    a virtio 1.0 device, which is always little endian.

    This is covered by the vq->is_le field. This field is initialized when
    userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
    stops.

    We already have a vhost_init_is_le() helper, but the reset operation
    is open-coded as follows:

    vq->is_le = virtio_legacy_is_little_endian();

    It isn't clear that we are resetting vq->is_le here.

    This patch moves the code to a helper with a more explicit name.

    The other case where we may have to byteswap is when the architecture can
    switch endianness at runtime (bi-endian). If endianness differs in the host
    and in the guest, then legacy devices need to be used in cross-endian mode.

    This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
    introduces a vq->user_be field. Userspace may enable cross-endian mode
    by calling the SET_VRING_ENDIAN ioctl before the device is started. The
    cross-endian mode is disabled when the device is stopped.

    The current names of the helpers that manipulate vq->user_be are unclear.

    This patch renames those helpers to clearly show that this is cross-endian
    stuff and with explicit enable/disable semantics.

    No behaviour change.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
     
  • We don't want side effects. If something fails, we roll back
    vq->is_le to its previous value.

    Signed-off-by: Greg Kurz
    Signed-off-by: Michael S. Tsirkin

    Greg Kurz
     

07 Dec, 2015

2 commits


27 Jul, 2015

2 commits

callers of vhost_kvzalloc() expect the same behaviour on
    allocation error as from kmalloc/vmalloc, i.e. a NULL return
    value. So just return the value returned by vzalloc() instead of
    returning ERR_PTR(-ENOMEM).

    Fixes: 4de7255f7d2be5 ("vhost: extend memory regions allocation to vmalloc")

    Spotted-by: Dan Carpenter
    Suggested-by: Julia Lawall
    Signed-off-by: Igor Mammedov
    Signed-off-by: Michael S. Tsirkin

    Igor Mammedov
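
To see why returning ERR_PTR(-ENOMEM) broke callers: kernel-style ERR_PTR encodes an errno in the pointer value itself, so the result is non-NULL and a plain `if (!p)` check never fires. A minimal userspace re-creation of the two helpers, for illustration only:

```c
#include <stddef.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* True for the top MAX_ERRNO addresses, i.e. encoded errors only. */
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```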
     
  • While reviewing vhost log code, I found out that log_file is never
    set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).

    Cc: stable@vger.kernel.org
    Signed-off-by: Marc-André Lureau
    Signed-off-by: Michael S. Tsirkin

    Marc-André Lureau
     

14 Jul, 2015

2 commits

  • it became possible to use a bigger number of memory slots, which is
    used by memory hotplug for registering hotplugged memory.
    However, QEMU crashes if it's used with more than ~60 pc-dimm
    devices and vhost-net enabled, since the host kernel vhost-net
    module refuses to accept more than 64 memory regions.

    Allow tweaking the limit via a max_mem_regions module parameter
    with a default value of 64 slots.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Michael S. Tsirkin

    Igor Mammedov
     
  • with a large number of memory regions we could end up with
    high-order allocations, and kmalloc could fail if the host is under
    memory pressure.
    Considering that the memory regions array is used on the hot path,
    try harder to allocate using kmalloc, and if that fails, resort
    to vmalloc.
    It's still better than just failing vhost_set_memory() and
    crashing the guest when new memory is hotplugged into it.

    I'll still look at a QEMU-side solution to reduce the number of
    memory regions it feeds to vhost to make things even better, but it
    doesn't hurt for the kernel to behave smarter and not crash older
    QEMUs which could use a large number of memory regions.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Michael S. Tsirkin

    Igor Mammedov
     

01 Jul, 2015

1 commit

  • For default region layouts, performance stays the same as with
    linear search, i.e. translate_desc(), which inlines find_region(),
    takes around 210 ns on average.

    But it scales better with a larger number of regions: 235 ns for
    binary search vs. 300 ns for linear search with 55 memory regions,
    and the values will be about the same when the allowed number of
    slots is increased to 509, as has been done in KVM.

    Signed-off-by: Igor Mammedov

    Signed-off-by: Michael S. Tsirkin

    Igor Mammedov
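
A hedged sketch of find_region() as a binary search over regions sorted by guest physical address; the struct layout and names are illustrative, not the exact kernel ones.

```c
#include <stdint.h>
#include <stddef.h>

struct mem_region {
    uint64_t gpa;       /* guest physical start address */
    uint64_t size;
    uint64_t uaddr;     /* userspace address of the region start */
};

/* Binary search: O(log n) vs the old linear scan's O(n). */
const struct mem_region *find_region(const struct mem_region *regions,
                                     size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct mem_region *r = &regions[mid];

        if (addr < r->gpa)
            hi = mid;
        else if (addr >= r->gpa + r->size)
            lo = mid + 1;
        else
            return r;   /* gpa <= addr < gpa + size */
    }
    return NULL;        /* address not covered by any region */
}
```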
     

01 Jun, 2015

1 commit

  • This patch brings cross-endian support to vhost when used to implement
    legacy virtio devices. Since it is a relatively rare situation, the
    feature availability is controlled by a kernel config option (not set
    by default).

    The vq->is_le boolean field is added to cache the endianness to be
    used for ring accesses. It defaults to native endian, as expected
    by legacy virtio devices. When the ring gets active, we force little
    endian if the device is modern. When the ring is deactivated, we
    revert to the native endian default.

    If cross-endian support is compiled in, a vq->user_be boolean field
    is added so that userspace may request a specific endianness. This
    field is used to override the default when activating the ring of a
    legacy device. It has no effect on modern devices.

    Signed-off-by: Greg Kurz

    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Reviewed-by: David Gibson

    Greg Kurz
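
The endianness decision described above boils down to a small predicate. The sketch below mirrors the logic (modern rings are always little-endian; legacy rings follow user_be, which defaults to the host's native order) but uses made-up helper names rather than the kernel's.

```c
#include <stdbool.h>

/* Runtime check of the host's native byte order. */
static bool host_is_little_endian(void)
{
    unsigned int x = 1;
    return *(unsigned char *)&x == 1;
}

/* user_be: the vring is big-endian from the host's point of view. */
bool vq_is_le(bool modern_device, bool user_be)
{
    return modern_device || !user_be;   /* virtio 1.0 is always LE */
}

/* Default before any SET_VRING_ENDIAN request: native endianness. */
bool default_user_be(void)
{
    return !host_is_little_endian();
}
```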
     

04 Feb, 2015

1 commit


29 Dec, 2014

1 commit

  • virtio 1.0 only requires the used ring address to be 4-byte aligned,
    while vhost required 8 bytes (the size of vring_used_elem).
    Fix up vhost to match that.

    Additionally, while vhost correctly requires 8-byte alignment for
    the log, that is unconnected to the used ring: it's a consequence
    of the log having u64 entries.
    Tweak the code to make that clearer.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
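
The two independent alignment requirements can be sketched directly; the function names are illustrative, not the kernel's.

```c
#include <stdint.h>
#include <stdbool.h>

/* Used ring: 4-byte alignment is all virtio 1.0 demands. */
bool used_ring_aligned(uint64_t addr)
{
    return (addr & 0x3) == 0;
}

/* Log: 8-byte alignment, purely because log entries are u64. */
bool log_aligned(uint64_t addr)
{
    return (addr & (sizeof(uint64_t) - 1)) == 0;
}
```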
     

09 Dec, 2014

2 commits


09 Jun, 2014

3 commits

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc ("vhost: replace
    rcu with mutex") replaced the RCU sync for memory accesses with VQ
    mutex lock/unlock.
    This is correct since all accesses are under the VQ mutex, but
    incomplete: we still do useless RCU lock/unlock operations, and
    someone might copy this code into some other context where that
    won't be right. This use of RCU is also non-standard and hard to
    understand.
    Let's copy the pointer into each VQ structure instead; this way the
    access rules become straightforward, and there's no need for RCU
    anymore.

    Reported-by: Eric Dumazet
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • Refactor code to make sure features are only accessed
    under VQ mutex. This makes everything simpler, no need
    for RCU here anymore.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • All memory accesses are done under some VQ mutex.
    So locking and unlocking all VQs is a faster equivalent of
    synchronize_rcu() for memory access changes.
    Some guests cause a lot of these changes, so it's helpful
    to make them faster.

    Reported-by: "Gonglei (Arei)"
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

07 Dec, 2013

1 commit


17 Sep, 2013

1 commit

  • the wake_up_process call is enclosed by spin_lock/unlock in
    vhost_work_queue, but it could be done outside the spin_lock.
    I have tested it with kernel 3.0.27 and a suse11-sp2 guest using
    iperf; the numbers are below:

                   original             modified
    thread_num  tp(Gbps)  vhost(%)  tp(Gbps)  vhost(%)
    1           9.59      28.82     9.59      27.49
    8           9.61      32.92     9.62      26.77
    64          9.58      46.48     9.55      38.99
    256         9.6       63.7      9.6       52.59

    Signed-off-by: Chuanyu Qin
    Signed-off-by: Michael S. Tsirkin

    Qin Chuanyu
     

04 Sep, 2013

1 commit

  • Let vhost_add_used() use vhost_add_used_n() to reduce code
    duplication. To avoid the overhead brought by __copy_to_user(), we
    use put_user() when a single used element needs to be added.

    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
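
The single-element fast path can be sketched in plain C: the generic path does one bulk copy of n elements (the analogue of __copy_to_user()), while the n == 1 case writes the two fields directly (the analogue of put_user()). The structs and names below are illustrative userspace stand-ins, not the kernel API.

```c
#include <stdint.h>
#include <string.h>

struct vring_used_elem {
    uint32_t id;    /* head index of the used descriptor chain */
    uint32_t len;   /* bytes written into the buffer */
};

/* Generic path: bulk-copy n elements into the used ring. */
void add_used_n(struct vring_used_elem *ring, unsigned idx,
                const struct vring_used_elem *heads, unsigned n)
{
    memcpy(&ring[idx], heads, n * sizeof(*heads));
}

/* Fast path for n == 1: two direct stores instead of a bulk copy. */
void add_used(struct vring_used_elem *ring, unsigned idx,
              uint32_t id, uint32_t len)
{
    ring[idx].id = id;
    ring[idx].len = len;
}
```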
     

21 Aug, 2013

1 commit


07 Jul, 2013

2 commits


11 Jun, 2013

1 commit


06 May, 2013

2 commits


01 May, 2013

4 commits