12 Jun, 2009

9 commits

  • This patch allows a virtio driver to use VIRTIO_DEV_ANY_ID for the
    device id. This will be used by a test module that can be bound to
    any virtio device.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     
  • This bug never appeared, since all current virtio drivers use
    VIRTIO_DEV_ANY_ID for the vendor field. If a real vendor would be used,
    the check in virtio_id_match is wrong - it returns 0 if
    id->vendor == dev->id.vendor.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     
  • Add a new feature flag for indirect ring entries. These are ring
    entries which point to a table of buffer descriptors.

    The idea here is to increase the ring capacity by allowing a larger
    effective ring size whereby the ring size dictates the number of
    requests that may be outstanding, rather than the size of those
    requests.

    This should be most effective in the case of block I/O where we can
    potentially benefit by concurrently dispatching a large number of
    large requests. Even in the simple case of single segment block
    requests, this results in a threefold increase in ring capacity.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell

    Mark McLoughlin
     
  • Each device negotiates feature bits; expose these in sysfs to help
    diagnostics and debugging.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • This implements optional MSI-X support in virtio_pci.
    MSI-X is used whenever the host supports at least 2 MSI-X
    vectors: 1 for configuration changes and 1 for virtqueues.
    Per-virtqueue vectors are allocated if enough vectors
    available.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Anthony Liguori
    Signed-off-by: Rusty Russell (+ whitespace, style)

    Michael S. Tsirkin
     
  • This reorganizes virtio-pci code in vp_interrupt slightly, so that
    it's easier to add per-vq MSI support on top.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
    and updates all drivers. This is needed for MSI support, because MSI
    needs to know the total number of vectors upfront.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell (+ lguest/9p compile fixes)

    Michael S. Tsirkin
     
  • Add a linked list of all virtqueues for a virtio device: this helps for
    debugging and is also needed for upcoming interface change.

    Also, add a "name" field for clearer debug messages.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Virtio devices are supposed to negotiate features before they start using
    the device, but the current code doesn't do this. This is because the
    driver's probe() function invariably has to add buffers to a virtqueue,
    or probe the disk (virtio_blk).

    This currently doesn't matter since no existing backend is strict about
    the feature negotiation. But it's possible to imagine a future feature
    which completely changes how a device operates: in this case, we'd need
    to acknowledge it before using the device.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

19 Apr, 2009

1 commit


30 Mar, 2009

2 commits


03 Feb, 2009

1 commit

  • The host really shouldn't be notifying us of config changes
    before the device status is VIRTIO_CONFIG_S_DRIVER or
    VIRTIO_CONFIG_S_DRIVER_OK.

    However, if we do happen to be interrupted while we're not
    attached to a driver, we really shouldn't oops. Prevent
    this simply by checking that device->driver is non-NULL
    before trying to notify the driver of config changes.

    Problem observed by doing a "set_link virtio.0 down" with
    QEMU before the net driver had been loaded.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Mark McLoughlin
     

07 Jan, 2009

1 commit

  • We shouldn't be statically allocating the root device object,
    so dynamically allocate it using root_device_register()
    instead.

    Also avoids this warning from 'rmmod virtio_pci':

    Device 'virtio-pci' does not have a release() function, it is broken and must be fixed

    Signed-off-by: Mark McLoughlin
    Cc: Anthony Liguori
    Acked-by: Rusty Russell
    Signed-off-by: Greg Kroah-Hartman

    Mark McLoughlin
     

30 Dec, 2008

7 commits

  • Add a release() function for virtio_pci devices so as to avoid:

    Device 'virtio0' does not have a release() function, it is broken and must be fixed

    Move the code to free the resources associated with the device
    from virtio_pci_remove() into this new function. virtio_pci_remove()
    now merely unregisters the device which should cause the final
    ref to be dropped and virtio_pci_release_dev() to be called.

    Signed-off-by: Mark McLoughlin
    Reported-by: Michael Tokarev
    Cc: Anthony Liguori
    Signed-off-by: Rusty Russell

    Mark McLoughlin
     
  • Make the balloon interface always use 4K pages, and convert Linux pfns if
    necessary. This patch assumes that Linux's PAGE_SHIFT will never be less than
    12.

    Signed-off-by: Hollis Blanchard
    Signed-off-by: Rusty Russell (modified)

    Hollis Blanchard
     
  • This allows each virtio user to hand in the alignment appropriate to
    their virtio_ring structures.

    Signed-off-by: Rusty Russell
    Acked-by: Christian Borntraeger

    Rusty Russell
     
  • That doesn't work for non-4k guests which are now appearing.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • The virtio PCI devices don't depend on the guest page size. This matters
    now PowerPC virtio is gaining ground (they like 64k pages).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • This patch is part of a larger patch series which will remove
    the "char bus_id[20]" name string from struct device. The device
    name is managed in the kobject anyway, and without any size
    limitation, and just needlessly copied into "struct device".

    To set and read the device name dev_name(dev) and dev_set_name(dev)
    must be used. If your code uses static kobjects, which it shouldn't
    do, "const char *init_name" can be used to statically provide the
    name the registered device should have. At registration time, the
    init_name field is cleared, to enforce the use of dev_name(dev) to
    access the device name at a later time.

    We need to get rid of all occurrences of bus_id in the entire tree
    to be able to enable the new interface. Please apply this patch,
    and possibly convert any remaining remaining occurrences of bus_id.

    We want to submit a patch to -next, which will remove bus_id from
    "struct device", to find the remaining pieces to convert, and finally
    switch over to the new api, which will remove the 20 bytes array
    and does no longer have a size limitation.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Kay Sievers
    Signed-off-by: Rusty Russell

    Kay Sievers
     
  • kzalloc() does not guarantee page alignment, and in fact this broke when
    I enabled CONFIG_SLUB_DEBUG_ON.

    (Thanks to Anthony Liguori for spotting the missing kfree sub)

    Signed-off-by: Hollis Blanchard
    Signed-off-by: Rusty Russell (fixed kfree)
    Tested-by: Anthony Liguori

    Hollis Blanchard
     

25 Aug, 2008

1 commit

  • Both v and vb->num_pages are u32 and unsigned int respectively. If v is less
    than vb->num_pages (and it is, when deflating the balloon), the result is a
    very large 32-bit number. Since we're returning a s64, instead of getting the
    same negative number we desire, we get a very large positive number.

    This handles the case where v < vb->num_pages and ensures we get a small,
    negative, s64 as the result.

    Rusty: please push this for 2.6.27-rc4. It's probably appropriate for the
    stable tree too as it will cause an unexpected OOM when ballooning.

    Signed-off-by: Anthony Liguori
    Signed-off-by: Rusty Russell (simplified)

    Anthony Liguori
     

25 Jul, 2008

5 commits

  • To prepare for virtio_ring transport feature bits, hook in a call in
    all the users to manipulate them. This currently just clears all the
    bits, since it doesn't understand any features.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Rather than explicitly handing the features to the lower-level, we just
    hand the virtio_device and have it set the features. This make it clear
    that it has the chance to manipulate the features of the device at this
    point (and that all feature negotiation is already done).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We assign feature bits as required, but it makes sense to reserve some
    for the particular transport, rather than the particular device.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Hook up to the probe() and remove() methods in bus_type
    rather than device_driver. The latter has been preferred
    since 2.6.16.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell

    Mark McLoughlin
     
  • We force notification when the ring is full, even if the host has
    indicated it doesn't want to know. This seemed like a good idea at
    the time: if we fill the transmit ring, we should tell the host
    immediately.

    Unfortunately this logic also applies to the receiving ring, which is
    refilled constantly. We should introduce real notification thesholds
    to replace this logic. Meanwhile, removing the logic altogether breaks
    the heuristics which KVM uses, so we use a hack: only notify if there are
    outgoing parts of the new buffer.

    Here are the number of exits with lguest's crappy network implementation:
    Before:
    network xmit 7859051 recv 236420
    After:
    network xmit 7858610 recv 118136

    Signed-off-by: Rusty Russell

    Rusty Russell
     

16 Jun, 2008

1 commit


30 May, 2008

5 commits

  • virtio allows drivers to suppress callbacks (ie. interrupts) for
    efficiency (no locking, it's just an optimization).

    There's a similar mechanism for the host to suppress notifications
    coming from the guest: in that case, we ignore the suppression if the
    ring is completely full.

    It turns out that life is simpler if the host similarly ignores
    callback suppression when the ring is completely empty: the network
    driver wants to free up old packets in a timely manner, and otherwise
    has to use a timer to poll.

    We have to remove the code which ignores interrupts when the driver
    has disabled them (again, it had no locking and hence was unreliable
    anyway).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Hello Rusty,

    seems that we still have a problem with virtio_net and the enable_cb callback.
    During a long running network stress tests with virtio and got the following
    oops:

    ------------[ cut here ]------------
    kernel BUG at drivers/virtio/virtio_ring.c:230!
    illegal operation: 0001 [#1] SMP
    Modules linked in:
    CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34
    Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990)
    Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
    Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800
    0000000000000000 0000000000419ce0 0000000000000080 000000000000007b
    000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920
    0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18
    Krnl Code: 00000000002d0ebc: a7110001 tmll %r1,1
    00000000002d0ec0: a7740004 brc 7,2d0ec8
    00000000002d0ec4: a7f40001 brc 15,2d0ec6
    >00000000002d0ec8: a517fffe nill %r1,65534
    00000000002d0ecc: 40103000 sth %r1,0(%r3)
    00000000002d0ed0: 07f0 bcr 15,%r0
    00000000002d0ed2: e31020380004 lg %r1,56(%r2)
    00000000002d0ed8: a7480000 lhi %r4,0
    Call Trace:
    ([] virtnet_poll+0x290/0x3b8)
    [] net_rx_action+0x9c/0x1b8
    [] __do_softirq+0x74/0x108
    [] do_softirq+0x92/0xac
    [] irq_exit+0x72/0xc8
    [] do_extint+0xe2/0x104
    [] ext_no_vtime+0x16/0x1a
    Last Breaking-Event-Address:
    [] vring_enable_cb+0x18/0x60

    I looked into the virtio_net code for some time and I think the following
    scenario happened. Please look at virtnet_poll:
    [...]
    /* Out of packets? */
    if (received < budget) {
    netif_rx_complete(vi->dev, napi);
    if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
    && napi_schedule_prep(napi)) {
    vi->rvq->vq_ops->disable_cb(vi->rvq);
    __netif_rx_schedule(vi->dev, napi);
    goto again;
    }
    }

    If an interrupt arrives after netif_rx_complete, a second poll routine can run
    on a different cpu. The second check for napi_schedule_prep would prevent any
    harm in the network stack, but we have called enable_cb possibly after the
    disable_cb in skb_recv_done.

    static void skb_recv_done(struct virtqueue *rvq)
    {
    struct virtnet_info *vi = rvq->vdev->priv;
    /* Schedule NAPI, Suppress further interrupts if successful. */
    if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
    rvq->vq_ops->disable_cb(rvq);
    __netif_rx_schedule(vi->dev, &vi->napi);
    }
    }

    That means that the second poll routine runs with interrupts enabled, which is
    ok, since we can handle additional interrupts. The problem is now that the
    second poll routine might also call enable_cb, triggering the BUG.

    The only solution I can come up with, is to remove the BUG statement in
    enable_cb - similar to disable_cb. Opinions or better ideas where the oops
    could come from?

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     
  • Anthony Liguori points out that three different transports use the virtio code,
    but each one keeps its own counter to set the virtio_device's index field. In
    theory (though not in current practice) this means that names could be
    duplicated, and that risk grows as more transports are created.

    So we move the selection of the unique virtio_device.index into the common code
    in virtio.c, which has the side-benefit of removing duplicate code.

    The only complexity is that lguest and S/390 use the index to uniquely identify
    the device in case of catastrophic failure before register_virtio_device() is
    called: now we use the offset within the descriptor page as a unique identifier
    for the printks.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
     
  • The common virtio code sets the bus_id, overriding anything virtio_pci
    sets anyway.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
     
  • Chris Lalancette points out that virtio.c sets all device
    names to '0', '1', etc, which looks silly in /proc/interrupts. We change this
    from '%d' to 'virtio%d'.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
     

02 May, 2008

4 commits

  • A recent proposed feature addition to the virtio block driver revealed
    some flaws in the API: in particular, we assume that feature
    negotiation is complete once a driver's probe function returns.

    There is nothing in the API to require this, however, and even I
    didn't notice when it was violated.

    So instead, we require the driver to specify what features it supports
    in a table, we can then move the feature negotiation into the virtio
    core. The intersection of device and driver features are presented in
    a new 'features' bitmap in the struct virtio_device.

    Note that this highlights the difference between Linux unsigned-long
    bitmaps where each unsigned long is in native endian, and a
    straight-forward little-endian array of bytes.

    Drivers can still remove feature bits in their probe routine if they
    really have to.

    API changes:
    - dev->config->feature() no longer gets and acks a feature.
    - drivers should advertise their features in the 'feature_table' field
    - use virtio_has_feature() for extra sanity when checking feature bits

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • A recent proposed feature addition to the virtio block driver revealed
    some flaws in the API, in particular how easy it is to break big
    endian machines.

    The virtio config space was originally chosen to be little-endian,
    because we thought the config might be part of the PCI config space
    for virtio_pci. It's actually a separate mmio region, so that
    argument holds little water; as only x86 is currently using the virtio
    mechanism, we can change this (but must do so now, before the
    impending s390 merge).

    API changes:
    - __virtio_config_val() just becomes a striaght vdev->config_get() call.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • drivers/virtio/virtio_pci.c:148:2: warning: returning void-valued expression
    drivers/virtio/virtio_pci.c:155:2: warning: returning void-valued expression

    Signed-off-by: Harvey Harrison
    Signed-off-by: Rusty Russell

    Harvey Harrison
     
  • A corrupt virtqueue (caused by the other end screwing up) can have
    strange results such as a driver spinning: just bail when we try to
    get a buffer from a known-broken queue.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

08 Apr, 2008

1 commit

  • The 'disable_cb' callback is designed as an optimization to tell the host
    we don't need callbacks now. As it is not reliable, the debug check is
    overzealous: it can happen on two CPUs at the same time. Document this.

    Even if it were reliable, the virtio_net driver doesn't disable
    callbacks on transmit so the START_USE/END_USE debugging reentrance
    protection can be easily tripped even on UP.

    Thanks to Balaji Rao for the bug report and testing.

    Signed-off-by: Rusty Russell
    CC: Balaji Rao
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

31 Mar, 2008

1 commit


28 Mar, 2008

1 commit