12 Jan, 2012

12 commits

  • Handling balloon hibernate / restore is tricky. If the balloon was
    inflated before going into the hibernation state, upon resume, the host
    will not have any memory of that. Any pages that were passed on to the
    host earlier would most likely be invalid, and the host will have to
    re-balloon to the previous value to get in the pre-hibernate state.

    So the only sane thing for the guest to do here is to discard all the
    pages that were put in the balloon. When to discard the pages is the
    next question.

    One solution is to deflate the balloon just before writing the image to
    the disk (in the freeze() PM callback). However, asking for pages from
    the host just to discard them immediately after seems wasteful of
    resources. Hence, it makes sense to do this by just fudging our
    counters soon after wakeup. This means we don't deflate the balloon
    before sleep, and also don't put unnecessary pressure on the host.

    This also helps in the thaw case: if the freeze fails for whatever
    reason, the balloon should continue to remain in the inflated state.
    This was tested by issuing 'swapoff -a' and trying to go into the S4
    state. That fails, and the balloon stays inflated, as expected. Both
    the host and the guest are happy.

    Finally, in the restore() callback, we empty the list of pages that were
    previously given off to the host, add the appropriate number of pages to
    the totalram_pages counter, reset the num_pages counter to 0, and
    all is fine.

    As a last step, delete the vqs on the freeze callback to prepare for
    hibernation, and re-create them in the restore and thaw callbacks to
    resume normal operation.

    The kthread doesn't race with any operations here, since it's frozen
    before the freeze() call and is thawed after the thaw() and restore()
    callbacks, so we're safe with that.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
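
The counter fix-up described for the restore() callback can be modelled in user-space C. This is a simplified sketch, not the driver code: `struct balloon_sketch`, `balloon_restore_sketch()` and the plain variable standing in for totalram_pages are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the balloon driver state. */
struct balloon_sketch {
    size_t num_pages;    /* pages currently accounted to the balloon */
    size_t pages_listed; /* entries on the list of pages given to the host */
};

/* Stand-in for the kernel's totalram_pages counter. */
static size_t totalram_pages_sketch;

/*
 * After resume the host no longer remembers the balloon, so the guest
 * drops its bookkeeping instead of deflating: empty the list, add the
 * pages back to the total RAM count, and reset num_pages to 0.
 */
static void balloon_restore_sketch(struct balloon_sketch *vb)
{
    vb->pages_listed = 0;                   /* empty the page list */
    totalram_pages_sketch += vb->num_pages; /* pages are usable again */
    vb->num_pages = 0;                      /* balloon is now empty */
}
```

No request is sent to the host in this model, which mirrors the point made above: the counters are adjusted after wakeup rather than deflating the balloon before sleep.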
     
  • The probe and PM restore functions will share this code.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • Handle thaw, restore and freeze notifications from the PM core. Expose
    these to individual virtio drivers that can quiesce and resume vq
    operations. For drivers not implementing the thaw() method, use the
    restore method instead.

    These functions also save device-specific data so that the device can be
    put in pre-suspend state after resume, and disable and enable the PCI
    device in the freeze and resume functions, respectively.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • The older PM API doesn't have a way to get notifications on hibernate
    events. Switch to the newer one that gives us those notifications.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • Under the existing #ifdef DEBUG, check that drivers don't let more than
    1/10 of a second pass between an add_buf() and a
    virtqueue_notify()/virtqueue_kick_prepare() call.

    We could get false positives on a really busy system, but it's good for
    development.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • A virtio driver does virtqueue_add_buf() multiple times before finally
    calling virtqueue_kick(); previously we only exposed the added buffers
    in the virtqueue_kick() call. This means we don't need a memory
    barrier in virtqueue_add_buf(), but it reduces concurrency as the
    device (ie. host) can't see the buffers until the kick.

    In the unusual (but now possible) case where a driver does add_buf()
    and get_buf() without doing a kick, we do need to insert one before
    our counter wraps. Otherwise we could wrap num_added, and later on
    not realize that we have passed the marker where we should have
    kicked.

    Signed-off-by: Rusty Russell

    Rusty Russell
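
The forced kick before the counter wraps can be sketched as follows. This is an illustrative model only: the struct, the function names, and the exact threshold (one below the 16-bit wrap point) are assumptions, not taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal model of the per-queue bookkeeping. */
struct vq_sketch {
    uint16_t num_added; /* buffers added since the last kick */
    unsigned kicks;     /* number of kicks issued (for illustration) */
};

/* A kick exposes the pending buffers and restarts the count. */
static void kick_sketch(struct vq_sketch *vq)
{
    vq->num_added = 0;
    vq->kicks++;
}

/* Adds one buffer; returns true if this add forced a kick. */
static bool add_buf_sketch(struct vq_sketch *vq)
{
    vq->num_added++;
    /* Kick before the 16-bit counter can wrap back to 0. */
    if (vq->num_added == UINT16_MAX) {
        kick_sketch(vq);
        return true;
    }
    return false;
}
```

A driver that only ever calls add_buf() and get_buf() would, in this model, still trigger a kick on its 65535th add, so the count of unkicked buffers never silently wraps.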
     
  • Since we know vq->vring.num is a power of 2 (it's asserted in
    vring_new_virtqueue()), using modulus is lazy; a bitmask does the same job.

    Signed-off-by: Rusty Russell

    Rusty Russell
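
For a power-of-2 ring size, the modulus can be replaced by a mask. A minimal sketch, with `ring_index()` as a hypothetical helper name:

```c
#include <assert.h>

/* Wraps index i into a ring of num entries; num must be a power of 2. */
static unsigned int ring_index(unsigned int i, unsigned int num)
{
    /* Same precondition the commit says is asserted at queue creation. */
    assert(num != 0 && (num & (num - 1)) == 0);

    /* Equivalent to i % num when num is a power of 2. */
    return i & (num - 1);
}
```

The equivalence holds only under the power-of-2 precondition, which is why the assertion matters.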
     
  • Based on patch by Christoph for virtio_blk speedup:

    Split virtqueue_kick to be able to do the actual notification
    outside the lock protecting the virtqueue. This patch was
    originally done by Stefan Hajnoczi, but I can't find the
    original one anymore and had to recreate it from memory.
    Pointers to the original or corrections for the commit message
    are welcome.

    Stefan's patch was here:

    https://github.com/stefanha/linux/commit/a6d06644e3a58e57a774e77d7dc34c4a5a2e7496
    http://www.spinics.net/lists/linux-virtualization/msg14616.html

    Third time's the charm!

    Signed-off-by: Rusty Russell

    Rusty Russell
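
The split described above, deciding under the lock and notifying outside it, can be sketched in user-space C with a pthread mutex. All names here are hypothetical stand-ins; only the prepare/notify split itself comes from the commit message.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue model; the mutex plays the role of the vq lock. */
struct queue_sketch {
    pthread_mutex_t lock;
    unsigned pending;     /* buffers added but not yet exposed */
    atomic_uint notified; /* notifications sent to the "host" */
};

/* Call with the lock held: expose buffers, report if a notify is needed. */
static bool kick_prepare_sketch(struct queue_sketch *q)
{
    bool needed = q->pending > 0;

    q->pending = 0;
    return needed;
}

/* Call without the lock: the potentially slow notification itself. */
static void notify_sketch(struct queue_sketch *q)
{
    atomic_fetch_add(&q->notified, 1);
}

static void submit_sketch(struct queue_sketch *q, unsigned nbufs)
{
    bool notify;

    pthread_mutex_lock(&q->lock);
    q->pending += nbufs;
    notify = kick_prepare_sketch(q);
    pthread_mutex_unlock(&q->lock);

    /* Other threads can take the lock while the notify is in flight. */
    if (notify)
        notify_sketch(q);
}
```

The design point is that the expensive step (in a guest, typically an exit to the host) no longer extends the critical section.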
     
  • Remove wrapper functions. This makes the allocation type explicit in
    all callers; I used GFP_KERNEL where it seemed obvious, left it at
    GFP_ATOMIC otherwise.

    Signed-off-by: Rusty Russell
    Reviewed-by: Christoph Hellwig

    Rusty Russell
     
  • The old documentation is left over from when we used a structure with
    strategy pointers.

    And move the documentation to the C file as per kernel practice.
    Though I disagree...

    Signed-off-by: Rusty Russell
    Reviewed-by: Christoph Hellwig

    Rusty Russell
     
  • Trivial changes to remove forgotten junk, format comments, and correct names.

    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Rusty Russell

    Sasha Levin
     
  • We were cheating with our barriers; using the smp ones rather than the
    real device ones. That was fine, until rpmsg came along, which is
    used to talk to a real device (a non-SMP CPU).

    Unfortunately, just putting back the real barriers (reverting
    d57ed95d) causes a performance regression on virtio-pci. In
    particular, Amos reports netbench's TCP_RR over virtio_net CPU
    utilization increased up to 35% while throughput went down by up to
    14%.

    By comparison, this branch is in the noise.

    Reference: https://lkml.org/lkml/2011/12/11/22

    Signed-off-by: Rusty Russell

    Rusty Russell
     

03 Dec, 2011

1 commit


24 Nov, 2011

3 commits

  • virtio pci device reset actually just does an I/O
    write, which in PCI is really posted; that is, it
    can complete on the CPU before the device has received it.

    Further, interrupts might have been pending on
    another CPU, so a device callback might get invoked after reset.

    This conflicts with how drivers use reset, which is typically:
    reset
    unregister
    A callback running after reset has completed can race with
    unregister, potentially leading to use-after-free bugs.

    Fix by flushing out the write, and flushing pending interrupts.

    This assumes that the device is never reset from
    its vq/config callbacks, or in parallel with being
    added/removed; document this assumption.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Guest features selector spelling mistake.

    Cc: Pawel Moll
    Cc: Rusty Russell
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Rusty Russell

    Sasha Levin
     
  • Fix this compile error on s390:

    CC [M] drivers/virtio/virtio_mmio.o
    drivers/virtio/virtio_mmio.c: In function 'vm_get_features':
    drivers/virtio/virtio_mmio.c:107:2: error: implicit declaration of function 'writel'

    Cc: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Acked-by: Pawel Moll
    Signed-off-by: Rusty Russell

    Heiko Carstens
     

22 Nov, 2011

1 commit


17 Nov, 2011

1 commit


14 Nov, 2011

1 commit

  • Commit 31a3ddda166cda86d2b5111e09ba4bda5239fae6 introduced
    a use after free in virtio-pci. The main issue is
    that the release method signals removal of the virtio device,
    while remove signals removal of the pci device.

    For example, on driver removal or hot-unplug,
    virtio_pci_release_dev is called before virtio_pci_remove.
    We then might get a crash as virtio_pci_remove tries to use the
    device freed by virtio_pci_release_dev.

    We allocate/free all resources together with the
    pci device, so we can leave the release method empty.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Amit Shah
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

02 Nov, 2011

2 commits

  • This patch, based on virtio PCI driver, adds support for memory
    mapped (platform) virtio device. This should allow environments
    like qemu to use virtio-based block & network devices even on
    platforms without PCI support.

    One can define and register a platform device whose resources
    will describe memory mapped control registers and "mailbox"
    interrupt. Such a device can also be instantiated using a Device
    Tree node with compatible property equal to "virtio,mmio".

    Cc: Anthony Liguori
    Cc: Michael S.Tsirkin
    Signed-off-by: Pawel Moll
    Signed-off-by: Rusty Russell

    Pawel Moll
     
  • For the MSI but non-per_vq_vector case, the config/change vq
    also gets added to the list of vqs that need to process the
    MSI interrupt. This is not needed as config has its own
    handler (vp_config_changed). In any case, vring_interrupt()
    finds nothing needs to be done on this vq.

    I tested this patch by testing the "Fallback:" and "Finally
    fall back" cases in vp_find_vqs(). Please review.

    Signed-off-by: Krishna Kumar
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Krishna Kumar
     

01 Nov, 2011

1 commit


24 Oct, 2011

1 commit


23 Jul, 2011

1 commit

  • virtio has been so far used only in the context of virtualization,
    and the virtio Kconfig was sourced directly by the relevant arch
    Kconfigs when VIRTUALIZATION was selected.

    Now that we start using virtio for inter-processor communications,
    we need to source the virtio Kconfig outside of the virtualization
    scope too.

    Moreover, some architectures might use virtio for both virtualization
    and inter-processor communications, so directly sourcing virtio
    might yield unexpected results due to conflicting selections.

    The simple solution offered by this patch is to always source virtio's
    Kconfig in drivers/Kconfig, and remove it from the appropriate arch
    Kconfigs. Additionally, a virtio menu entry has been added so virtio
    drivers don't show up in the general drivers menu.

    This way anyone can use virtio, though it's arguably less accessible
    (and neat!) for virtualization users now.

    Note: some architectures (mips and sh) seem to have a VIRTUALIZATION
    menu merely for sourcing virtio's Kconfig, so that menu is removed too.

    Signed-off-by: Ohad Ben-Cohen
    Signed-off-by: Rusty Russell

    Ohad Ben-Cohen
     

30 May, 2011

3 commits

  • Add an API that tells the other side that callbacks
    should be delayed until a lot of work has been done.
    Implement using the new event_idx feature.

    Note: it might seem advantageous to let the drivers
    ask for a callback after a specific capacity has
    been reached. However, as a single head can
    free many entries in the descriptor table,
    we don't really have a clue about capacity
    until get_buf is called. The API is the simplest
    to implement at the moment; we'll see what kind of
    hints drivers can pass when there's more than one
    user of the feature.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Support for the new event idx feature:
    1. When enabling interrupts, publish the current avail index
    value to the host to get interrupts on the next update.
    2. Use the new avail_event feature to reduce the number
    of exits from the guest.

    Simple test with the simulator:

    [virtio]# time ./virtio_test
    spurious wakeus: 0x7

    real 0m0.169s
    user 0m0.140s
    sys 0m0.019s
    [virtio]# time ./virtio_test --no-event-idx
    spurious wakeus: 0x11

    real 0m0.649s
    user 0m0.295s
    sys 0m0.335s

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
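
An event index scheme like the one above depends on a wrap-safe test for whether the published event index was crossed by the latest update. The following is a sketch of such a comparison on free-running 16-bit indices; the function and parameter names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if moving the ring index from old_idx to new_idx stepped
 * past event_idx, i.e. the other side asked to be told about this update.
 * All arithmetic is modulo 2^16, so it stays correct across wrap-around.
 */
static bool need_event_sketch(uint16_t event_idx, uint16_t new_idx,
                              uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

For example, with event_idx 5, an update from 5 to 6 signals, while an update from 3 to 5 does not, because entry 5 itself has not been passed yet.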
     
  • The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
    feature bit. Whenever the bit is set, the guest kernel must
    always tell the host before we free pages back to the allocator.
    Without this feature, we might free a page (and have another
    user touch it) while the hypervisor is unprepared for it.

    But, if the bit is _not_ set, we are under no obligation to
    reverse the order; we're under no obligation to do _anything_.
    As of now, qemu-kvm defines the bit, but doesn't set it.

    This patch makes the "tell host first" logic the only case. This
    should make everybody happy, and reduce the amount of untested or
    untestable code in the kernel.

    This _also_ means that we don't have to preserve a pfn list
    after the pages are freed, which should let us get rid of some
    temporary storage (vb->pfns) eventually.

    Signed-off-by: Dave Hansen
    Signed-off-by: Rusty Russell

    Dave Hansen
     

21 Apr, 2011

2 commits

  • In the case where a virtio-console port is in use (opened by a program)
    and a virtio-console device is removed, the port is kept around but all
    the virtio-related state is assumed to be gone.

    When the port is finally released (close() called), we call
    device_destroy() on the port's device. This results in the parent
    device's structures being freed as well. This includes the PCI regions
    for the virtio-console PCI device.

    Once this is done, however, virtio_pci_release_dev() kicks in, as the
    last ref to the virtio device is now gone, and attempts to do

    pci_iounmap(pci_dev, vp_dev->ioaddr);
    pci_release_regions(pci_dev);
    pci_disable_device(pci_dev);

    which results in a double-free warning.

    Move the code that releases regions, etc., to the virtio_pci_remove()
    function, and all that's now left in release_dev is the final freeing of
    the vp_dev.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • When detaching a buffer from a vq, the avail.idx value should be
    decremented as well.

    This was noticed by hot-unplugging a virtio console port and then
    plugging in a new one on the same number (re-using the vqs which were
    just 'disowned'). qemu reported

    'Guest moved used index from 0 to 256'

    when any IO was attempted on the new port.

    CC: stable@kernel.org
    Reported-by: juzhang
    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     

20 Jan, 2011

1 commit

  • We sometimes need to map between the virtio device and
    the given pci device. One such use is an OS installer that
    gets the boot pci device from the BIOS and needs to
    find the relevant block device. Since it can't,
    installation fails.

    Instead of creating a top-level devices/virtio-pci
    directory, create each device under the corresponding
    pci device node. Symlinks to all virtio-pci
    devices can be found under the pci driver link in
    bus/pci/drivers/virtio-pci/devices, and all virtio
    devices under drivers/bus/virtio/devices.

    Signed-off-by: Milton Miller
    Signed-off-by: Rusty Russell
    Acked-by: Michael S. Tsirkin
    Tested-by: Michael S. Tsirkin
    Acked-by: Gleb Natapov
    Tested-by: "Daniel P. Berrange"
    Cc: stable@kernel.org

    Milton Miller
     

24 Nov, 2010

2 commits

  • The sysfs files for virtio produce the wrong format and are missing
    the required newline. The output for virtio bus vendor/device should
    have the same format as the corresponding entries for PCI devices.

    Although this technically changes the ABI for sysfs, these files were
    broken to start with!

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Rusty Russell

    Stephen Hemminger
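
The PCI-style format referred to above is a 0x-prefixed, zero-padded four-digit hex value followed by a newline. A user-space sketch of producing it, with `format_id_sketch()` as a hypothetical helper and 0x1af4 used purely as an example value:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Formats an ID the way PCI sysfs attributes such as "vendor" do:
 * "0x" prefix, four lowercase hex digits, trailing newline.
 * Returns the number of characters written, as snprintf() does.
 */
static int format_id_sketch(char *buf, size_t len, unsigned int id)
{
    return snprintf(buf, len, "0x%04x\n", id);
}
```

Calling `format_id_sketch(buf, sizeof(buf), 0x1af4)` fills `buf` with "0x1af4" plus a newline, so tools that read one value per line parse it the same way as the PCI entries.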
     
  • We can't rely on indirect buffers for capacity
    calculations because they need a memory allocation
    which might fail. In particular, virtio_net can get
    into this situation under stress, and it drops packets
    and performs badly.

    So return the number of buffers we can guarantee users.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell
    Reported-By: Krishna Kumar2

    Michael S. Tsirkin
     

26 Jul, 2010

1 commit

  • virtio ring was changed to return an error code on OOM,
    but one caller was missed and still checks for vq->vring.num.
    The fix is just to check for a negative (< 0) return value.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell
    Tested-by: Chris Mason
    Cc: stable@kernel.org # .34.x
    Signed-off-by: Linus Torvalds

    Michael S. Tsirkin
     

23 Jun, 2010

2 commits

  • virtio-pci resets the device at startup by writing to the status
    register, but this does not clear the pci config space,
    specifically msi enable status which affects register
    layout.

    This breaks things like kdump when they try to use e.g. virtio-blk.

    Fix by forcing msi off at startup. Since pci.c already has
    a routine to do this, we export and use it instead of duplicating code.

    Signed-off-by: Michael S. Tsirkin
    Tested-by: Vivek Goyal
    Acked-by: Jesse Barnes
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
     
  • add_buf returns ring size on out of memory;
    this is not what devices expect.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Amit Shah
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org # .34.x

    Michael S. Tsirkin
     

22 May, 2010

1 commit

  • * 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
    drivers/char: Eliminate use after free
    virtio: console: Accept console size along with resize control message
    virtio: console: Store each console's size in the console structure
    virtio: console: Resize console port 0 on config intr only if multiport is off
    virtio: console: Add support for nonblocking write()s
    virtio: console: Rename wait_is_over() to will_read_block()
    virtio: console: Don't always create a port 0 if using multiport
    virtio: console: Use a control message to add ports
    virtio: console: Move code around for future patches
    virtio: console: Remove config work handler
    virtio: console: Don't call hvc_remove() on unplugging console ports
    virtio: console: Return -EPIPE to hvc_console if we lost the connection
    virtio: console: Let host know of port or device add failures
    virtio: console: Add a __send_control_msg() that can send messages without a valid port
    virtio: Revert "virtio: disable multiport console support."
    virtio: add_buf_gfp
    trans_virtio: use virtqueue_xxx wrappers
    virtio-rng: use virtqueue_xxx wrappers
    virtio_ring: remove a level of indirection
    virtio_net: use virtqueue_xxx wrappers
    ...

    Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
    wrappers changes conflicting with some other cleanups.

    Linus Torvalds
     

19 May, 2010

3 commits