12 Jan, 2012

3 commits

  • Handle thaw, restore and freeze notifications from the PM core. Expose
    these to individual virtio drivers that can quiesce and resume vq
    operations. For drivers not implementing the thaw() method, use the
    restore method instead.

    These functions also save device-specific data so that the device can be
    put in pre-suspend state after resume, and disable and enable the PCI
    device in the freeze and resume functions, respectively.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • The older PM API doesn't have a way to get notifications on hibernate
    events. Switch to the newer one that gives us those notifications.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • We were cheating with our barriers; using the smp ones rather than the
    real device ones. That was fine, until rpmsg came along, which is
    used to talk to a real device (a non-SMP CPU).

    Unfortunately, just putting back the real barriers (reverting
    d57ed95d) causes a performance regression on virtio-pci. In
    particular, Amos reports netbench's TCP_RR over virtio_net CPU
    utilization increased up to 35% while throughput went down by up to
    14%.

    By comparison, this branch is in the noise.

    Reference: https://lkml.org/lkml/2011/12/11/22

    Signed-off-by: Rusty Russell

    Rusty Russell
     

03 Dec, 2011

1 commit


24 Nov, 2011

1 commit

  • virtio pci device reset actually just does an I/O
    write, which in PCI is really posted, that is it
    can complete on CPU before the device has received it.

    Further, interrupts might have been pending on
    another CPU, so device callback might get invoked after reset.

    This conflicts with how drivers use reset, which is typically:
    reset
    unregister
    a callback running after reset completed can race with
    unregister, potentially leading to use after free bugs.

    Fix by flushing out the write, and flushing pending interrupts.

    This assumes that device is never reset from
    its vq/config callbacks, or in parallel with being
    added/removed, document this assumption.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     

22 Nov, 2011

1 commit


17 Nov, 2011

1 commit


14 Nov, 2011

1 commit

  • Commit 31a3ddda166cda86d2b5111e09ba4bda5239fae6 introduced
    a use after free in virtio-pci. The main issue is
    that the release method signals removal of the virtio device,
    while remove signals removal of the pci device.

    For example, on driver removal or hot-unplug,
    virtio_pci_release_dev is called before virtio_pci_remove.
    We then might get a crash as virtio_pci_remove tries to use the
    device freed by virtio_pci_release_dev.

    We allocate/free all resources together with the
    pci device, so we can leave the release method empty.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Amit Shah
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
     

02 Nov, 2011

1 commit

  • For the MSI but non-per_vq_vector case, the config/change vq
    also gets added to the list of vqs that need to process the
    MSI interrupt. This is not needed as config has it's own
    handler (vp_config_changed). In any case, vring_interrupt()
    finds nothing needs to be done on this vq.

    I tested this patch by testing the "Fallback:" and "Finally
    fall back" cases in vp_find_vqs(). Please review.

    Signed-off-by: Krishna Kumar
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Krishna Kumar
     

21 Apr, 2011

1 commit

  • In the case where a virtio-console port is in use (opened by a program)
    and a virtio-console device is removed, the port is kept around but all
    the virtio-related state is assumed to be gone.

    When the port is finally released (close() called), we call
    device_destroy() on the port's device. This results in the parent
    device's structures to be freed as well. This includes the PCI regions
    for the virtio-console PCI device.

    Once this is done, however, virtio_pci_release_dev() kicks in, as the
    last ref to the virtio device is now gone, and attempts to do

    pci_iounmap(pci_dev, vp_dev->ioaddr);
    pci_release_regions(pci_dev);
    pci_disable_device(pci_dev);

    which results in a double-free warning.

    Move the code that releases regions, etc., to the virtio_pci_remove()
    function, and all that's now left in release_dev is the final freeing of
    the vp_dev.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     

20 Jan, 2011

1 commit

  • We sometimes need to map between the virtio device and
    the given pci device. One such use is OS installer that
    gets the boot pci device from BIOS and needs to
    find the relevant block device. Since it can't,
    installation fails.

    Instead of creating a top-level devices/virtio-pci
    directory, create each device under the corresponding
    pci device node. Symlinks to all virtio-pci
    devices can be found under the pci driver link in
    bus/pci/drivers/virtio-pci/devices, and all virtio
    devices under drivers/bus/virtio/devices.

    Signed-off-by: Milton Miller
    Signed-off-by: Rusty Russell
    Acked-by: Michael S. Tsirkin
    Tested-by: Michael S. Tsirkin
    Acked-by: Gleb Natapov
    Tested-by: "Daniel P. Berrange"
    Cc: stable@kernel.org

    Milton Miller
     

23 Jun, 2010

1 commit

  • virtio-pci resets the device at startup by writing to the status
    register, but this does not clear the pci config space,
    specifically msi enable status which affects register
    layout.

    This breaks things like kdump when they try to use e.g. virtio-blk.

    Fix by forcing msi off at startup. Since pci.c already has
    a routine to do this, we export and use it instead of duplicating code.

    Signed-off-by: Michael S. Tsirkin
    Tested-by: Vivek Goyal
    Acked-by: Jesse Barnes
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
     

23 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Mar, 2010

1 commit


02 Mar, 2010

1 commit


01 Mar, 2010

1 commit

  • I have observed the following error on virtio-net module unload:

    ------------[ cut here ]------------
    WARNING: at kernel/irq/manage.c:858 __free_irq+0xa0/0x14c()
    Hardware name: Bochs
    Trying to free already-free IRQ 0
    Modules linked in: virtio_net(-) virtio_blk virtio_pci virtio_ring
    virtio af_packet e1000 shpchp aacraid uhci_hcd ohci_hcd ehci_hcd [last
    unloaded: scsi_wait_scan]
    Pid: 1957, comm: rmmod Not tainted 2.6.33-rc8-vhost #24
    Call Trace:
    [] warn_slowpath_common+0x7c/0x94
    [] warn_slowpath_fmt+0x41/0x43
    [] ? __free_pages+0x5a/0x70
    [] __free_irq+0xa0/0x14c
    [] free_irq+0x3f/0x65
    [] vp_del_vqs+0x81/0xb1 [virtio_pci]
    [] virtnet_remove+0xda/0x10b [virtio_net]
    [] virtio_dev_remove+0x22/0x4a [virtio]
    [] __device_release_driver+0x66/0xac
    [] driver_detach+0x83/0xa9
    [] bus_remove_driver+0x91/0xb4
    [] driver_unregister+0x6c/0x74
    [] unregister_virtio_driver+0xe/0x10 [virtio]
    [] fini+0x15/0x17 [virtio_net]
    [] sys_delete_module+0x1c3/0x230
    [] ? old_ich_force_enable_hpet+0x117/0x164
    [] ? do_page_fault+0x29c/0x2cc
    [] sysenter_dispatch+0x7/0x27
    ---[ end trace 15e88e4c576cc62b ]---

    The bug is in virtio-pci: we use msix_vector as array index to get irq
    entry, but some vqs do not have a dedicated vector so this causes an out
    of bounds access. By chance, we seem to often get 0 value, which
    results in this error.

    Fix by verifying that vector is legal before using it as index.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Anthony Liguori
    Acked-by: Shirley Ma
    Acked-by: Amit Shah

    Michael S. Tsirkin
     

24 Feb, 2010

1 commit


29 Oct, 2009

1 commit

  • Commit f68d24082e22ccee3077d11aeb6dc5354f0ca7f1
    in 2.6.32-rc1 broke requesting IRQs for per-VQ MSI-X vectors:
    - vector number was used instead of the vector itself
    - we try to request an IRQ for VQ which does not
    have a callback handler

    This is a regression that causes warnings in kernel log,
    potentially lower performance as we need to scan vq list,
    and might cause system failure if the interrupt
    requested is in fact needed by another system.

    This was not noticed earlier because in most cases
    we were falling back on shared interrupt for all vqs.

    The warnings often look like this:

    virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
    virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
    virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
    IRQ handler type mismatch for IRQ 1
    current handler: i8042
    Pid: 2400, comm: modprobe Tainted: G W
    2.6.32-rc3-11952-gf3ed8d8-dirty #1
    Call Trace:
    [] ? __setup_irq+0x299/0x304
    [] ? request_threaded_irq+0x144/0x1c1
    [] ? vring_interrupt+0x0/0x30
    [] ? vp_try_to_find_vqs+0x583/0x5c7
    [] ? skb_recv_done+0x0/0x34 [virtio_net]
    [] ? vp_find_vqs+0x2d/0x83
    [] ? vp_get+0x3c/0x4e
    [] ? virtnet_probe+0x2f1/0x428 [virtio_net]
    [] ? skb_recv_done+0x0/0x34 [virtio_net]
    [] ? skb_xmit_done+0x0/0x39 [virtio_net]
    [] ? sysfs_do_create_link+0xcb/0x116
    [] ? vp_get_status+0x14/0x16
    [] ? virtio_dev_probe+0xa9/0xc8
    [] ? driver_probe_device+0x8d/0x128
    [] ? __driver_attach+0x4f/0x6f
    [] ? __driver_attach+0x0/0x6f
    [] ? bus_for_each_dev+0x43/0x74
    [] ? bus_add_driver+0xea/0x22d
    [] ? driver_register+0xa7/0x111
    [] ? init+0x0/0xc [virtio_net]
    [] ? do_one_initcall+0x50/0x148
    [] ? sys_init_module+0xc5/0x21a
    [] ? system_call_fastpath+0x16/0x1b
    virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
    virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X

    Reported-by: Marcelo Tosatti
    Reported-by: Shirley Ma
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     

23 Sep, 2009

1 commit

  • 1) Rename vp_request_vectors to vp_request_msix_vectors, and take
    non-MSI-X case out to caller.
    2) Comment weird pci_enable_msix API
    3) Rename vp_find_vq to setup_vq.
    4) Fix spaces to tabs
    5) Make nvectors calc internal to vp_try_to_find_vqs()
    6) Rename vector to msix_vector for more clarity.

    Signed-off-by: Rusty Russell
    Cc: "Michael S. Tsirkin"

    Rusty Russell
     

30 Jul, 2009

3 commits


17 Jul, 2009

1 commit


12 Jun, 2009

4 commits


03 Feb, 2009

1 commit

  • The host really shouldn't be notifying us of config changes
    before the device status is VIRTIO_CONFIG_S_DRIVER or
    VIRTIO_CONFIG_S_DRIVER_OK.

    However, if we do happen to be interrupted while we're not
    attached to a driver, we really shouldn't oops. Prevent
    this simply by checking that device->driver is non-NULL
    before trying to notify the driver of config changes.

    Problem observed by doing a "set_link virtio.0 down" with
    QEMU before the net driver had been loaded.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Mark McLoughlin
     

07 Jan, 2009

1 commit

  • We shouldn't be statically allocating the root device object,
    so dynamically allocate it using root_device_register()
    instead.

    Also avoids this warning from 'rmmod virtio_pci':

    Device 'virtio-pci' does not have a release() function, it is broken and must be fixed

    Signed-off-by: Mark McLoughlin
    Cc: Anthony Liguori
    Acked-by: Rusty Russell
    Signed-off-by: Greg Kroah-Hartman

    Mark McLoughlin
     

30 Dec, 2008

6 commits

  • Add a release() function for virtio_pci devices so as to avoid:

    Device 'virtio0' does not have a release() function, it is broken and must be fixed

    Move the code to free the resources associated with the device
    from virtio_pci_remove() into this new function. virtio_pci_remove()
    now merely unregisters the device which should cause the final
    ref to be dropped and virtio_pci_release_dev() to be called.

    Signed-off-by: Mark McLoughlin
    Reported-by: Michael Tokarev
    Cc: Anthony Liguori
    Signed-off-by: Rusty Russell

    Mark McLoughlin
     
  • This allows each virtio user to hand in the alignment appropriate to
    their virtio_ring structures.

    Signed-off-by: Rusty Russell
    Acked-by: Christian Borntraeger

    Rusty Russell
     
  • That doesn't work for non-4k guests which are now appearing.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • The virtio PCI devices don't depend on the guest page size. This matters
    now PowerPC virtio is gaining ground (they like 64k pages).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • This patch is part of a larger patch series which will remove
    the "char bus_id[20]" name string from struct device. The device
    name is managed in the kobject anyway, and without any size
    limitation, and just needlessly copied into "struct device".

    To set and read the device name dev_name(dev) and dev_set_name(dev)
    must be used. If your code uses static kobjects, which it shouldn't
    do, "const char *init_name" can be used to statically provide the
    name the registered device should have. At registration time, the
    init_name field is cleared, to enforce the use of dev_name(dev) to
    access the device name at a later time.

    We need to get rid of all occurrences of bus_id in the entire tree
    to be able to enable the new interface. Please apply this patch,
    and possibly convert any remaining remaining occurrences of bus_id.

    We want to submit a patch to -next, which will remove bus_id from
    "struct device", to find the remaining pieces to convert, and finally
    switch over to the new api, which will remove the 20 bytes array
    and does no longer have a size limitation.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Kay Sievers
    Signed-off-by: Rusty Russell

    Kay Sievers
     
  • kzalloc() does not guarantee page alignment, and in fact this broke when
    I enabled CONFIG_SLUB_DEBUG_ON.

    (Thanks to Anthony Liguori for spotting the missing kfree sub)

    Signed-off-by: Hollis Blanchard
    Signed-off-by: Rusty Russell (fixed kfree)
    Tested-by: Anthony Liguori

    Hollis Blanchard
     

25 Jul, 2008

2 commits


30 May, 2008

2 commits

  • Anthony Liguori points out that three different transports use the virtio code,
    but each one keeps its own counter to set the virtio_device's index field. In
    theory (though not in current practice) this means that names could be
    duplicated, and that risk grows as more transports are created.

    So we move the selection of the unique virtio_device.index into the common code
    in virtio.c, which has the side-benefit of removing duplicate code.

    The only complexity is that lguest and S/390 use the index to uniquely identify
    the device in case of catastrophic failure before register_virtio_device() is
    called: now we use the offset within the descriptor page as a unique identifier
    for the printks.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
     
  • The common virtio code sets the bus_id, overriding anything virtio_pci
    sets anyway.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell