09 Feb, 2017

1 commit

  • commit 0d5415b489f68b58e1983a53793d25d53098ed4b upstream.

    This reverts commit c7070619f3408d9a0dffbed9149e6f00479cf43b.

    This has been shown to regress on some ARM systems:

    by forcing on DMA API usage for ARM systems, we have inadvertently
    kicked open a hornets' nest in terms of cache-coherency. Namely that
    unless the virtio device is explicitly described as capable of coherent
    DMA by firmware, the DMA APIs on ARM and other DT-based platforms will
    assume it is non-coherent. This turns out to cause a big problem for the
    likes of QEMU and kvmtool, which generate virtio-mmio devices in their
    guest DTs but neglect to add the often-overlooked "dma-coherent"
    property; as a result, we end up with the guest making non-cacheable
    accesses to the vring, the host doing so cacheably, both talking past
    each other and things going horribly wrong.

    We are working on a safer work-around.

    Fixes: c7070619f340 ("vring: Force use of DMA API for ARM-based systems with legacy devices")
    Reported-by: Robin Murphy
    Signed-off-by: Will Deacon
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Michael S. Tsirkin
     

01 Feb, 2017

2 commits

  • commit f7f6634d23830ff74335734fbdb28ea109c1f349 upstream.

    Once DMA API usage is enabled, it becomes apparent that virtio-mmio is
    inadvertently relying on the default 32-bit DMA mask, which leads to
    problems like rapidly exhausting SWIOTLB bounce buffers.

    Ensure that we set the appropriate 64-bit DMA mask whenever possible,
    with the coherent mask suitably limited for the legacy vring as per
    a0be1db4304f ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio
    devices").

    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Reported-by: Jean-Philippe Brucker
    Fixes: b42111382f0e ("virtio_mmio: Use the DMA API if enabled")
    Signed-off-by: Robin Murphy
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Greg Kroah-Hartman

    Robin Murphy
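    A minimal user-space sketch of the mask choice this commit describes,
    assuming 4 KiB pages. DMA_BIT_MASK() mirrors the kernel macro of the same
    name. struct dma_masks and pick_virtio_mmio_masks() are hypothetical
    stand-ins for the dma_set_mask()/dma_set_coherent_mask() calls, and the
    sketch does not model the fallback to a 32-bit mask on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumption: 4 KiB pages, as used by the legacy 32-bit PFN register. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical pair of masks a driver would hand to the DMA API. */
struct dma_masks {
    uint64_t streaming;  /* mask for ordinary buffer mappings */
    uint64_t coherent;   /* mask for coherent allocations such as the vring */
};

/* Hypothetical helper: use 64-bit DMA where possible, but keep a legacy
 * vring at an address expressible as a 32-bit PFN of 4 KiB pages. */
static struct dma_masks pick_virtio_mmio_masks(bool legacy)
{
    struct dma_masks m;

    m.streaming = DMA_BIT_MASK(64);
    m.coherent = legacy ? DMA_BIT_MASK(32 + SKETCH_PAGE_SHIFT)
                        : DMA_BIT_MASK(64);
    assert(m.coherent <= m.streaming);  /* coherent mask is never wider */
    return m;
}
```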
     
  • commit c7070619f3408d9a0dffbed9149e6f00479cf43b upstream.

    Booting Linux on an ARM fastmodel containing an SMMU emulation results
    in an unexpected I/O page fault from the legacy virtio-blk PCI device:

    [ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
    [ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
    [ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
    [ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002
    [ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
    [ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
    [ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
    [ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
    [ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000
    [ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000

    This is because the legacy virtio-blk device is behind an SMMU, so we
    have consequently swizzled its DMA ops and configured the SMMU to
    translate accesses. This then requires the vring code to use the DMA API
    to establish translations, otherwise all transactions will result in
    fatal faults and termination.

    Given that ARM-based systems only see an SMMU if one is really present
    (the topology is all described by firmware tables such as device-tree or
    IORT), we can safely use the DMA API for all legacy virtio devices.
    Modern devices can advertise the presence of an IOMMU using the
    VIRTIO_F_IOMMU_PLATFORM feature flag.

    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops")
    Signed-off-by: Will Deacon
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     

31 Oct, 2016

5 commits

  • This function is unused on configurations
    where dma_map/unmap are empty macros.

    Make the function inline to avoid gcc errors because
    of an unused static function.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • Remove unused file config.c

    Signed-off-by: Juergen Gross
    Signed-off-by: Michael S. Tsirkin

    Juergen Gross
     
  • Commit fad7b7b27b6a ("virtio_balloon: Use a workqueue instead of
    "vballoon" kthread") introduced a regression. The original kthread-based
    code started the thread inside probe, and the thread immediately checked
    whether the balloon needed to be updated.

    The current code behaves differently: work is queued only on the first
    command from the host after negotiation. This leaves a window,
    especially at guest startup or module reload, during which the balloon
    size is not updated until the host sends a notification.

    This patch adds a balloon size check at the end of probe to match the
    original behaviour.

    Signed-off-by: Konstantin Neumoin
    Signed-off-by: Denis V. Lunev
    Signed-off-by: Michael S. Tsirkin

    Konstantin Neumoin
     
  • According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is
    negotiated, the driver MUST set flags to 0. Not dirtying the available
    ring in virtqueue_disable_cb also has a minor positive performance
    impact, improving L1 dcache load misses by ~0.5% in vring_bench.

    Writes to the used event field (vring_used_event) are still unconditional.

    Cc: Michael S. Tsirkin
    Cc: # f277ec4 virtio_ring: shadow available
    Cc:
    Signed-off-by: Ladi Prosek
    Signed-off-by: Michael S. Tsirkin

    Ladi Prosek
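    The spec rule above can be illustrated with a hypothetical, simplified
    virtqueue. struct vq_sketch and vq_disable_cb_sketch() are illustrative
    names, not the driver's. The driver records "no interrupts wanted" in a
    private shadow and writes the shared avail->flags only when
    VIRTIO_RING_F_EVENT_IDX has not been negotiated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1  /* value from the virtio ring headers */

/* Hypothetical, simplified virtqueue state for illustration only. */
struct vq_sketch {
    bool event_idx;               /* VIRTIO_RING_F_EVENT_IDX negotiated? */
    uint16_t avail_flags_shadow;  /* driver-private copy of avail->flags */
    uint16_t shared_avail_flags;  /* models avail->flags in shared memory */
    unsigned int shared_writes;   /* counts stores to shared memory */
};

static void vq_disable_cb_sketch(struct vq_sketch *vq)
{
    if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
        vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
        /* With EVENT_IDX only the private shadow changes, so the shared
         * avail ring is left clean. */
        if (!vq->event_idx) {
            vq->shared_avail_flags = vq->avail_flags_shadow;
            vq->shared_writes++;
        }
    }
    /* Spec invariant: with EVENT_IDX negotiated, flags stay 0. */
    assert(!vq->event_idx || vq->shared_avail_flags == 0);
}
```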
     
  • Legacy virtio defines the virtqueue base using a 32-bit PFN field, with
    a read-only register indicating a fixed page size of 4k.

    This can cause problems for DMA allocators that allocate top down from
    the DMA mask, which is set to 64 bits. In this case, the addresses are
    silently truncated to 44 bits, leading to IOMMU faults, failure to read
    from the queue, or data corruption.

    This patch restricts the coherent DMA mask for legacy PCI virtio devices
    to 44 bits, which matches the specification.

    Cc: stable@vger.kernel.org
    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Cc: Benjamin Serebrin
    Signed-off-by: Will Deacon
    Signed-off-by: Michael S. Tsirkin

    Will Deacon
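    The 44-bit figure follows from the register layout. A 32-bit PFN of
    4 KiB (2^12-byte) pages can address at most 2^32 * 2^12 = 2^44 bytes.
    The hypothetical helpers below model the driver writing the PFN and the
    device rebuilding the address. They show how a queue placed above 2^44
    is silently aliased to a low address.

```c
#include <assert.h>
#include <stdint.h>

#define LEGACY_PAGE_SHIFT 12  /* fixed 4 KiB page size of legacy virtio */

/* Models the driver writing a queue address to the 32-bit PFN register. */
static uint32_t legacy_queue_pfn(uint64_t queue_addr)
{
    /* The queue must be page aligned. */
    assert((queue_addr & ((1ULL << LEGACY_PAGE_SHIFT) - 1)) == 0);
    /* The cast silently drops any PFN bits above bit 31. */
    return (uint32_t)(queue_addr >> LEGACY_PAGE_SHIFT);
}

/* Models the device reconstructing the queue address from that PFN. */
static uint64_t device_queue_addr(uint32_t pfn)
{
    return (uint64_t)pfn << LEGACY_PAGE_SHIFT;
}
```

    A queue at 2^44 - 4096 round-trips intact. A queue at 2^44 + 4096 comes
    back as address 4096.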
     

10 Sep, 2016

1 commit

  • Building the kernel with W=1 produces one warning:
    drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes]

    In fact, this function is only used in the file in which it is
    defined, so it does not need a declaration and can be made static.
    This patch marks the function 'static'.

    Signed-off-by: Baoyou Xie
    Acked-by: Arnd Bergmann
    Signed-off-by: Michael S. Tsirkin

    Baoyou Xie
     

09 Aug, 2016

2 commits


06 Aug, 2016

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:

    - new vsock device support in host and guest

    - platform IOMMU support in host and guest, including compatibility
    quirks for legacy systems.

    - misc fixes and cleanups.

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    VSOCK: Use kvfree()
    vhost: split out vringh Kconfig
    vhost: detect 32 bit integer wrap around
    vhost: new device IOTLB API
    vhost: drop vringh dependency
    vhost: convert pre sorted vhost memory array to interval tree
    vhost: introduce vhost memory accessors
    VSOCK: Add Makefile and Kconfig
    VSOCK: Introduce vhost_vsock.ko
    VSOCK: Introduce virtio_transport.ko
    VSOCK: Introduce virtio_vsock_common.ko
    VSOCK: defer sock removal to transports
    VSOCK: transport-specific vsock_transport functions
    vhost: drop vringh dependency
    vop: pull in vhost Kconfig
    virtio: new feature to detect IOMMU device quirk
    balloon: check the number of available pages in leak balloon
    vhost: lockless enqueuing
    vhost: simplify work flushing

    Linus Torvalds
     

02 Aug, 2016

2 commits

  • The interaction between virtio and IOMMUs is messy.

    On most systems with virtio, physical addresses match bus addresses,
    and it doesn't particularly matter which one we use to program
    the device.

    On some systems, including Xen and any system with a physical device
    that speaks virtio behind a physical IOMMU, we must program the IOMMU
    for virtio DMA to work at all.

    On other systems, including SPARC and PPC64, virtio-pci devices are
    enumerated as though they are behind an IOMMU, but the virtio host
    ignores the IOMMU, so we must either pretend that the IOMMU isn't
    there or somehow map everything as the identity.

    Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM.

    Any device with this feature bit set to 0 needs a quirk and has to be
    passed physical addresses (as opposed to bus addresses) even though
    the device is behind an IOMMU.

    Note: it has to be a per-device quirk because, for example, there could
    be a mix of passed-through and virtual virtio devices. As another
    example, some devices could be implemented by an out-of-process
    hypervisor backend (in the case of QEMU vhost or vhost-user), so support
    for an IOMMU needs to be coded up separately.

    It would be cleanest to handle this in IOMMU core code, but that needs
    per-device DMA ops. While we are waiting for that to be implemented, use
    a work-around in virtio core.

    Note: a "noiommu" feature is a quirk - add a wrapper to make
    that clear.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
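    A user-space model of the resulting decision, loosely following the
    kernel's vring_use_dma_api(). A device that offers VIRTIO_F_IOMMU_PLATFORM
    (feature bit 33) gets DMA addresses, and so does every Xen guest. Any
    other device keeps the physical-address quirk. struct vdev_sketch and
    its xen_guest flag, which stands in for xen_domain(), are assumptions
    made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM 33  /* feature bit number */

/* Hypothetical stand-in for struct virtio_device. */
struct vdev_sketch {
    uint64_t features;  /* negotiated feature bits */
    bool xen_guest;     /* stands in for xen_domain() */
};

static bool has_feature(const struct vdev_sketch *vdev, unsigned int bit)
{
    assert(bit < 64);
    return (vdev->features >> bit) & 1;
}

/* The quirk: without the feature bit, the device bypasses the IOMMU and
 * must be given physical addresses. */
static bool has_iommu_quirk(const struct vdev_sketch *vdev)
{
    return !has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
}

static bool use_dma_api(const struct vdev_sketch *vdev)
{
    if (!has_iommu_quirk(vdev))
        return true;   /* device honours the platform IOMMU */
    if (vdev->xen_guest)
        return true;   /* Xen needs DMA address translation */
    return false;      /* otherwise keep passing physical addresses */
}
```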
     
  • The balloon has a special mechanism, subscribed to OOM notifications,
    that deflates it by a fixed number of pages. That number stays fixed
    even when the balloon is already fully deflated. However, leak_balloon
    did not expect to be asked to deflate more pages than had been taken,
    and raised a "BUG" in balloon_page_dequeue once the page list became
    empty.

    The simplest solution is to check that the number of pages to release
    is less than or equal to the number of pages taken.

    Cc: stable@vger.kernel.org
    Signed-off-by: Konstantin Neumoin
    Signed-off-by: Denis V. Lunev
    CC: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Konstantin Neumoin
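    A sketch of the check uses a hypothetical balloon that tracks only its
    page count. The real driver works on a page list under a lock. Before
    anything is dequeued, the requested number is clamped to the pages the
    balloon actually holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified balloon: only the page count matters here. */
struct balloon_pages_sketch {
    size_t num_pages;  /* pages currently held by the balloon */
};

/* Deflate by up to 'num' pages; returns how many were actually released. */
static size_t leak_balloon_sketch(struct balloon_pages_sketch *b, size_t num)
{
    /* The fix: never try to release more pages than the balloon holds,
     * e.g. when the OOM notifier asks for a fixed amount after the
     * balloon is already fully deflated. */
    if (num > b->num_pages)
        num = b->num_pages;

    for (size_t i = 0; i < num; i++) {
        assert(b->num_pages > 0);  /* dequeuing from an empty list would BUG */
        b->num_pages--;
    }
    return num;
}
```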
     

27 Jul, 2016

2 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-lru page migration, which needs compaction.h,
    but compaction.h doesn't include the headers it needs to be standalone.

    I think the proper header for non-lru page migration is migrate.h rather
    than compaction.h, because migrate.h already provides, through its
    includes, what non-lru page migration needs, such as isolate_mode_t,
    migrate_mode and MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • The VM now has a feature to migrate non-lru movable pages, so the
    balloon no longer needs custom migration hooks in migrate.c and
    compaction.c.

    Instead, this patch implements the page->mapping->a_ops->
    {isolate|migrate|putback} functions.

    With that, we can remove the ballooning hooks from the general migration
    functions and make balloon compaction simpler.

    [akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
    Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
    Signed-off-by: Gioh Kim
    Signed-off-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Rafael Aquini
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

23 May, 2016

1 commit


01 May, 2016

1 commit

  • Smatch complains that we might not initialize "queue". The issue is
    callers like setup_vq() from virtio_pci_modern.c where "num" could be
    something like 2 and "vring_align" is 64. In that case, vring_size() is
    less than PAGE_SIZE. It won't happen in real life, but we're getting
    the value of "num" from a register so it's not really possible to tell
    what value it holds with static analysis.

    Let's just silence the warning.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Michael S. Tsirkin

    Dan Carpenter
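    The sketch below shows why the loop may never run. It reproduces the
    split vring size formula (16-byte descriptors, 2-byte avail entries,
    8-byte used elements) and a simplified allocation loop. calloc() stands
    in for vring_alloc_queue(), and the helper names are hypothetical. With
    num = 2 and vring_align = 64, the ring is 86 bytes, so the halving loop
    is skipped. Only the "= NULL" initialisation keeps queue defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Split vring size: descriptors, avail ring (flags, idx, ring[num],
 * used_event) padded to 'align', then used ring (flags, idx, num 8-byte
 * elements, avail_event). */
static size_t vring_size_sketch(unsigned int num, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    return ((16 * (size_t)num + 2 * (3 + (size_t)num) + align - 1)
            & ~(align - 1))
           + 2 * 3 + 8 * (size_t)num;
}

/* Hypothetical allocation loop. Without "= NULL", 'queue' would be read
 * uninitialised whenever the first vring_size() is already <= a page. */
static void *alloc_queue_sketch(unsigned int *num, unsigned long align)
{
    void *queue = NULL;

    /* Halve num while the ring exceeds a page, stopping on success. */
    for (; *num && vring_size_sketch(*num, align) > SKETCH_PAGE_SIZE;
         *num /= 2) {
        queue = calloc(1, vring_size_sketch(*num, align));
        if (queue)
            break;
    }
    if (!*num)
        return NULL;
    if (!queue)  /* small ring: fall back to a single allocation */
        queue = calloc(1, vring_size_sketch(*num, align));
    return queue;
}
```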
     

07 Apr, 2016

1 commit


21 Mar, 2016

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:
    "New features, performance improvements, cleanups:

    - basic polling support for vhost
    - rework virtio to optionally use DMA API, fixing it on Xen
    - balloon stats gained a new entry
    - using the new napi_alloc_skb speeds up virtio net
    - virtio blk stats can now be read while another VCPU is busy
    inflating or deflating the balloon

    plus misc cleanups in various places"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
    vhost_net: basic polling support
    vhost: introduce vhost_vq_avail_empty()
    vhost: introduce vhost_has_work()
    virtio_balloon: Allow to resize and update the balloon stats in parallel
    virtio_balloon: Use a workqueue instead of "vballoon" kthread
    virtio/s390: size of SET_IND payload
    virtio/s390: use dev_to_virtio
    vhost: rename vhost_init_used()
    vhost: rename cross-endian helpers
    virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
    vring: Use the DMA API on Xen
    virtio_pci: Use the DMA API if enabled
    virtio_mmio: Use the DMA API if enabled
    virtio: Add improved queue allocation API
    virtio_ring: Support DMA APIs
    vring: Introduce vring_use_dma_api()
    s390/dma: Allow per device dma ops
    alpha/dma: use common noop dma ops
    dma: Provide simple noop dma ops

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
    statistics protocol, corresponding to 'Available' in /proc/meminfo.

    It indicates to the hypervisor how far the balloon can be inflated
    without pushing the guest system to swap.

    Signed-off-by: Igor Redko
    Signed-off-by: Denis V. Lunev
    Reviewed-by: Roman Kagan
    Cc: Michael S. Tsirkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Igor Redko
     

17 Mar, 2016

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "PCI changes for v4.6:

    Enumeration:
    - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
    - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas)

    Resource management:
    - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Don't assign or reassign immutable resources (Bjorn Helgaas)
    - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
    - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
    - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
    - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
    - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
    - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
    - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
    - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
    - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
    - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)

    Virtualization:
    - Wait for up to 1000ms after FLR reset (Alex Williamson)
    - Support SR-IOV on any function type (Kelly Zytaruk)
    - Add ACS quirk for all Cavium devices (Manish Jaggi)

    AER:
    - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
    - Restore pci_ops pointer while calling original pci_ops (David Daney)
    - Fix aer_inject error codes (Jean Delvare)
    - Use dev_warn() in aer_inject (Jean Delvare)
    - Log actual error causes in aer_inject (Jean Delvare)
    - Log aer_inject error injections (Jean Delvare)

    VPD:
    - Prevent VPD access for buggy devices (Babu Moger)
    - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
    - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
    - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
    - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
    - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
    - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
    - Update VPD definitions (Hannes Reinecke)
    - Allow access to VPD attributes with size 0 (Hannes Reinecke)
    - Determine actual VPD size on first access (Hannes Reinecke)

    Generic host bridge driver:
    - Move structure definitions to separate header file (David Daney)
    - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
    - Expose pci_host_common_probe() for use by other drivers (David Daney)

    Altera host bridge driver:
    - Fix altera_pcie_link_is_up() (Ley Foon Tan)

    Cavium ThunderX host bridge driver:
    - Add PCIe host driver for ThunderX processors (David Daney)
    - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)

    Freescale i.MX6 host bridge driver:
    - Add DT bindings to configure PHY Tx driver settings (Justin Waters)
    - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
    - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
    - Remove broken Gen2 workaround (Lucas Stach)
    - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)

    Freescale Layerscape host bridge driver:
    - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)

    Intel VMD host bridge driver:
    - Attach VMD resources to parent domain's resource tree (Jon Derrick)
    - Set bus resource start to 0 (Keith Busch)

    Microsoft Hyper-V host bridge driver:
    - Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
    - Look up IRQ domain by fwnode_handle (Jake Oshins)
    - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)

    NVIDIA Tegra host bridge driver:
    - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
    - Implement ->{add,remove}_bus() callbacks (Thierry Reding)
    - Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
    - Track bus -> CPU mapping (Thierry Reding)
    - Remove misleading PHYS_OFFSET (Thierry Reding)

    Renesas R-Car host bridge driver:
    - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)

    Synopsys DesignWare host bridge driver:
    - ARC: Add PCI support (Joao Pinto)
    - Add generic dw_pcie_wait_for_link() (Joao Pinto)
    - Add default link up check if sub-driver doesn't override (Joao Pinto)
    - Add driver for prototyping kits based on ARC SDP (Joao Pinto)

    TI Keystone host bridge driver:
    - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)

    Xilinx AXI host bridge driver:
    - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
    - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
    - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
    - Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
    - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)

    Xilinx NWL host bridge driver:
    - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)

    Miscellaneous:
    - Check device_attach() return value always (Bjorn Helgaas)
    - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
    - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
    - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
    - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
    - Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
    - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
    - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
    - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
    - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
    - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
    - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
    - Move pci_dma_* helpers to common code (Christoph Hellwig)
    - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
    - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
    - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)"

    * tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
    PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
    PCI: designware: Add driver for prototyping kits based on ARC SDP
    PCI: designware: Add default link up check if sub-driver doesn't override
    PCI: designware: Add generic dw_pcie_wait_for_link()
    PCI: Cleanup pci/pcie/Kconfig whitespace
    PCI: Simplify pci_create_attr() control flow
    PCI: Don't leak memory if sysfs_create_bin_file() fails
    PCI: Simplify sysfs ROM cleanup
    PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
    MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
    MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
    ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
    ia64/PCI: Use ioremap() instead of open-coded equivalent
    ia64/PCI: Use temporary struct resource * to avoid repetition
    PCI: Clean up pci_map_rom() whitespace
    PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
    PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
    PCI: thunder: Add PCIe host driver for ThunderX processors
    PCI: generic: Expose pci_host_common_probe() for use by other drivers
    PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
    ...

    Linus Torvalds
     

11 Mar, 2016

2 commits

  • The virtio balloon statistics are not updated when the balloon
    is being resized. But it seems that both tasks could be done
    in parallel.

    stats_handle_request() updates the statistics in the balloon
    structure and then communicates with the host.

    update_balloon_stats() calls all_vm_events() that just reads
    some per-CPU variables. The values might change during and
    after the call but it is expected and happens even without
    this patch.

    update_balloon_stats() also calls si_meminfo(). It is a slightly
    more complex function, but it too just reads some variables and
    looks safe without locking. In any case, it already seems to be called
    without locking in several similar places, e.g. from post_status()
    in dm_thread_func(), or from vmballoon_send_get_target().

    The communication with the host is done via a separate virtqueue,
    see vb->stats_vq vs. vb->inflate_vq and vb->deflate_vq. Therefore
    it could be used in parallel with fill_balloon() and leak_balloon().

    This patch splits the existing work into two pieces. One is for
    updating the balloon stats. The other is for resizing the balloon.
    It seems that they can proceed in parallel without any
    extra locking.

    Signed-off-by: Petr Mladek
    Signed-off-by: Michael S. Tsirkin

    Petr Mladek
     
  • This patch moves the deferred work from the "vballoon" kthread into a
    system freezable workqueue.

    We do not need to maintain and run a dedicated kthread. Also, the
    event-driven workqueue API makes the logic much simpler. In particular,
    we no longer need our own wait queue, wait function, and freeze point.

    The conversion is pretty straightforward. One cycle of the main loop
    is put into a work. The work is queued instead of waking the kthread.

    fill_balloon() and leak_balloon() limit the number of pages they
    modify. The work re-queues itself when necessary. For this, we make
    fill_balloon() return the number of pages actually modified.
    Note that leak_balloon() already did this.

    virtballoon_restore() queues the work only when really needed.

    The only complication is that we need to prevent queuing the work
    when the balloon is being removed. It was easier before because the
    kthread simply removed itself from the wait queue. We need an
    extra boolean and spin lock now.

    My initial idea was to use a dedicated workqueue. Michael S. Tsirkin
    suggested using a system one. Tejun Heo confirmed that the system
    workqueue has a pretty high concurrency level (256) by default, so we
    need not worry about blocking for too long.

    Signed-off-by: Petr Mladek
    Signed-off-by: Michael S. Tsirkin

    Petr Mladek
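    The removal guard described above can be modelled in user space as
    follows. A C11 atomic_flag spin lock stands in for the kernel spinlock,
    and a counter stands in for queue_work() on system_freezable_wq. All
    names are hypothetical. Once removal sets stop_update under the lock, no
    later notification can queue new work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the balloon's work-queuing guard. */
struct balloon_work_sketch {
    atomic_flag stop_update_lock;  /* stand-in for the kernel spinlock */
    bool stop_update;              /* set once removal has started */
    unsigned int queued;           /* counts simulated queue_work() calls */
};

static void spin_lock_sketch(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;  /* busy-wait, as a spinlock would */
}

static void spin_unlock_sketch(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Host asked for a new balloon size: queue work unless we are removing. */
static void balloon_changed_sketch(struct balloon_work_sketch *vb)
{
    spin_lock_sketch(&vb->stop_update_lock);
    if (!vb->stop_update)
        vb->queued++;  /* the driver would call queue_work() here */
    spin_unlock_sketch(&vb->stop_update_lock);
}

/* Removal: after this returns, no new work can be queued. */
static void balloon_remove_sketch(struct balloon_work_sketch *vb)
{
    spin_lock_sketch(&vb->stop_update_lock);
    assert(!vb->stop_update);  /* removal should happen only once */
    vb->stop_update = true;
    spin_unlock_sketch(&vb->stop_update_lock);
    /* the driver would then cancel_work_sync() any already-queued work */
}
```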
     

09 Mar, 2016

1 commit

  • Introduce PCI_VENDOR/PCI_SUBVENDOR/PCI_SUBDEVICE defines to replace the
    constants scattered in the kernel already used to detect QEMU.

    They are defined in the QEMU codebase per docs/specs/pci-ids.txt.

    Signed-off-by: Robin H. Johnson
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Takashi Iwai
    Reviewed-by: Gerd Hoffmann
    Acked-by: Michael S. Tsirkin
    Acked-by: Daniel Vetter

    Robin H. Johnson
     

02 Mar, 2016

7 commits

  • Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Vrabel
    Reviewed-by: Wei Liu

    Andy Lutomirski
     
  • This switches to vring_create_virtqueue, simplifying the driver and
    adding DMA API support.

    This fixes virtio-pci on platforms and busses that have IOMMUs. This
    will break the experimental QEMU Q35 IOMMU support until QEMU is
    fixed. In exchange, it fixes physical virtio hardware as well as
    virtio-pci running under Xen.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Andy Lutomirski
     
  • This switches to vring_create_virtqueue, simplifying the driver and
    adding DMA API support.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Andy Lutomirski
     
  • This leaves vring_new_virtqueue alone for compatibility, but it
    adds two new improved APIs:

    vring_create_virtqueue: Creates a virtqueue backed by automatically
    allocated coherent memory. (Some day this could be extended to
    support non-coherent memory, too, if there ends up being a platform
    on which it's worthwhile.)

    __vring_new_virtqueue: Creates a virtqueue with a manually-specified
    layout. This should allow mic_virtio to work much more cleanly.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Andy Lutomirski
     
  • virtio_ring currently sends the device (usually a hypervisor)
    physical addresses of its I/O buffers. This is okay when DMA
    addresses and physical addresses are the same thing, but this isn't
    always the case. For example, this never works on Xen guests, and
    it is likely to fail if a physical "virtio" device ever ends up
    behind an IOMMU or swiotlb.

    The immediate use case for me is to enable virtio on Xen guests.
    For that to work, we need vring to support DMA address translation
    as well as a corresponding change to virtio_pci or to another
    driver.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Andy Lutomirski
     
  • This is a kludge, but no one has come up with a better idea yet.
    We'll introduce DMA API support guarded by vring_use_dma_api().
    Eventually we may be able to return true on more and more systems,
    and hopefully we can get rid of vring_use_dma_api() entirely some
    day.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Andy Lutomirski
     
  • Looks like a copy-paste bug. The value is used as an optimization and a
    wrong value probably isn't causing any serious damage. Found when
    porting this code to Windows.

    Signed-off-by: Ladi Prosek
    Signed-off-by: Michael S. Tsirkin

    Ladi Prosek
     

26 Jan, 2016

1 commit

  • KASan detected a use-after-free error in virtio-pci remove code. In
    virtio_pci_remove(), vp_dev is still used after being freed in
    unregister_virtio_device() (in virtio_pci_release_dev() more
    precisely).

    To fix, keep a reference until cleanup is done.

    Fixes: 63bd62a08ca4 ("virtio_pci: defer kfree until release callback")
    Reported-by: Jerome Marchand
    Cc: stable@vger.kernel.org
    Cc: Sasha Levin
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Jerome Marchand

    Michael S. Tsirkin
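    The sketch below shows the "keep a reference until cleanup is done"
    pattern. A hypothetical refcounted object takes the place of struct
    device and get_device()/put_device(). Unregistering drops one reference.
    The extra reference taken at the start of remove keeps the object alive
    until the final put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted device; freed when the count drops to zero. */
struct dev_sketch {
    int refcount;
    int *freed_flag;  /* lets the caller observe when release ran */
};

static struct dev_sketch *get_dev(struct dev_sketch *d)
{
    d->refcount++;
    return d;
}

static void put_dev(struct dev_sketch *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        *d->freed_flag = 1;
        free(d);  /* plays the role of the release callback freeing vp_dev */
    }
}

/* Unregistering drops the registration reference, which may free 'd'. */
static void unregister_dev(struct dev_sketch *d)
{
    put_dev(d);
}

/* Fixed remove path: hold an extra reference across cleanup. */
static void remove_sketch(struct dev_sketch *d)
{
    struct dev_sketch *ref = get_dev(d);

    unregister_dev(d);
    /* 'ref' is still valid here; cleanup that touches the device goes here. */
    assert(ref->refcount == 1);
    put_dev(ref);  /* last reference: release runs now */
}
```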
     

13 Jan, 2016

3 commits

  • checkpatch.pl wants arrays of strings declared as follows:

    static const char * const names[] = { "vq-1", "vq-2", "vq-3" };

    Currently the find_vqs() function takes a const char *names[] argument
    so passing checkpatch.pl's const char * const names[] results in a
    compiler error due to losing the second const.

    This patch adjusts the find_vqs() prototype and updates all virtio
    transports. This makes it possible for virtio_balloon.c, virtio_input.c,
    virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly
    type.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Bjorn Andersson

    Stefan Hajnoczi
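    A small illustration of the prototype change uses a hypothetical
    count_named_vqs() in place of find_vqs(). With a const char * const
    parameter, the checkpatch.pl-style array is accepted without discarding
    a qualifier. Callers that still pass a plain const char *[] array keep
    working, because C allows const to be added at that level implicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for find_vqs(): counts non-empty queue names. */
static size_t count_named_vqs(unsigned int nvqs, const char * const vq_names[])
{
    size_t named = 0;

    assert(nvqs == 0 || vq_names != NULL);
    for (unsigned int i = 0; i < nvqs; i++)
        if (vq_names[i] && strlen(vq_names[i]) > 0)
            named++;
    return named;
}

/* The checkpatch.pl-friendly declaration from the commit message. */
static const char * const names[] = { "vq-1", "vq-2", "vq-3" };
```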
     
  • While working on compaction, I encountered a bug
    with ballooning.

    After repeated inflate/deflate cycles, guest memory (i.e.,
    cat /proc/meminfo | grep MemTotal) decreased and could not be
    recovered.

    The reason is that balloon_lock doesn't cover release_pages_balloon,
    so fields of struct virtio_balloon can be overwritten by a racing
    fill_balloon() (e.g., vb->*pfns could be critical).

    This patch fixes it in my test.

    Cc:
    Signed-off-by: Minchan Kim
    Signed-off-by: Michael S. Tsirkin

    Minchan Kim
     
  • We need a full barrier after writing out the event index; using
    virt_store_mb there seems better than open-coding. As usual, we need a
    wrapper to account for strong barriers.

    It's tempting to use this in vhost as well; for that, we'll
    need a variant of smp_store_mb that works on __user pointers.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Peter Zijlstra (Intel)

    Michael S. Tsirkin
     

07 Dec, 2015

3 commits

  • Improves cacheline transfer flow of available ring header.

    Virtqueues are implemented as a pair of rings, one producer->consumer
    avail ring and one consumer->producer used ring; preceding the
    avail ring in memory are two contiguous u16 fields -- avail->flags
    and avail->idx. A producer posts work by writing to avail->idx and
    a consumer reads avail->idx.

    The flags and idx fields only need to be written by a producer CPU
    and only read by a consumer CPU; when the producer and consumer are
    running on different CPUs and the virtio_ring code is structured to
    only have source writes/sink reads, we can continuously transfer the
    avail header cacheline between cores in the 'M' state. This flow
    optimizes core -> core bandwidth on certain CPUs.

    (see: "Software Optimization Guide for AMD Family 15h Processors",
    Section 11.6; similar language appears in the 10h guide and should
    apply to CPUs with exclusive caches, using the LLC as a transfer cache)

    Unfortunately, the existing virtio_ring code issued reads of
    avail->idx and read-modify-writes of avail->flags on the producer.

    This change shadows the flags and index fields in producer memory;
    the vring code now reads from the shadows and only ever writes to
    avail->flags and avail->idx, allowing the cacheline to transfer
    core -> core optimally.

    In a concurrent version of vring_bench, the time required for
    10,000,000 buffer checkout/returns was reduced by ~2% (average
    across many runs) on an AMD Piledriver (15h) CPU:

    (w/o shadowing):
    Performance counter stats for './vring_bench':
    5,451,082,016 L1-dcache-loads
    ...
    2.221477739 seconds time elapsed

    (w/ shadowing):
    Performance counter stats for './vring_bench':
    5,405,701,361 L1-dcache-loads
    ...
    2.168405376 seconds time elapsed

    The further away (in a NUMA sense) virtio producers and consumers are
    from each other, the more we expect to benefit. Physical implementations
    of virtio devices and implementations of virtio where the consumer polls
    vring avail indexes (vhost) should also benefit.

    Signed-off-by: Venkatesh Srinivas
    Signed-off-by: Michael S. Tsirkin

    Venkatesh Srinivas
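The shadowing scheme can be illustrated with a small model. The producer keeps private copies of `flags` and `idx`, updates those, and then only stores to the shared header, never loading from it. This is a hypothetical sketch: the struct layouts and `demo_*` names are assumptions, endianness conversion is omitted, and a real ring needs a write barrier before publishing `idx`.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* hypothetical; must be a power of two */

/* Shared header read by the consumer (loosely modelled on the avail ring). */
struct demo_avail {
	volatile uint16_t flags;
	volatile uint16_t idx;
	volatile uint16_t ring[DEMO_RING_SIZE];
};

/* Producer-private state: shadows of the shared flags/idx fields. */
struct demo_producer {
	struct demo_avail *avail;
	uint16_t avail_flags_shadow;
	uint16_t avail_idx_shadow;
};

/* Post descriptor head `head`. Reads come only from the shadow; the shared
 * header is only ever written. */
static void demo_add_buf(struct demo_producer *p, uint16_t head)
{
	uint16_t slot = p->avail_idx_shadow & (DEMO_RING_SIZE - 1);

	p->avail->ring[slot] = head;
	/* A real ring needs a write barrier here before publishing idx. */
	p->avail_idx_shadow++;
	p->avail->idx = p->avail_idx_shadow; /* write-only access */
}

/* Set a flag bit without a read-modify-write on the shared field. */
static void demo_set_flag(struct demo_producer *p, uint16_t bit)
{
	p->avail_flags_shadow |= bit;
	p->avail->flags = p->avail_flags_shadow;
}
```

Index wrap-around works because `uint16_t` arithmetic is modular and the ring size is a power of two. The slot is simply the low bits of the free-running shadow index.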
     
  • b92b1b89a33c ("virtio: force vring descriptors to be allocated from
    lowmem") tried to exclude highmem pages for descriptors, so it cleared
    __GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH,
    which doesn't make much sense for this fix because __GFP_HIGH only
    controls access to memory reserves and has no influence on zone
    selection. Some of the call paths use GFP_ATOMIC, and dropping
    __GFP_HIGH reduces their chances of success because of the lack of
    access to memory reserves.

    Signed-off-by: Michal Hocko
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Will Deacon
    Reviewed-by: Mel Gorman

    Michal Hocko
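The difference between the two masks can be shown with plain bit operations. The flag values below are hypothetical and chosen only for illustration. The real `__GFP_*` values and the exact composition of `GFP_ATOMIC` differ between kernel versions. Here `DEMO_GFP_ATOMIC` is reduced to just the reserve-access bit.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Hypothetical bit values, for illustration only. */
#define DEMO_GFP_HIGH     0x01u /* may use emergency memory reserves */
#define DEMO_GFP_HIGHMEM  0x02u /* may be satisfied from the highmem zone */
#define DEMO_GFP_KERNEL   0x04u
#define DEMO_GFP_ATOMIC   (DEMO_GFP_HIGH)

/* Mask as described for b92b1b89a33c: both bits are cleared. */
static demo_gfp_t demo_mask_before(demo_gfp_t gfp)
{
	return gfp & ~(DEMO_GFP_HIGHMEM | DEMO_GFP_HIGH);
}

/* After the fix: only the zone-selection bit is cleared. */
static demo_gfp_t demo_mask_after(demo_gfp_t gfp)
{
	return gfp & ~DEMO_GFP_HIGHMEM;
}
```

Under the old mask, an atomic caller silently lost access to reserves. The new mask still excludes highmem, which was the only goal.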
     
  • The virtio core uses a static ida named virtio_index_ida for
    assigning index numbers to virtio devices during registration.
    The ida core may allocate some internal idr cache layers and
    an ida bitmap upon any ida allocation, and all these layers are
    truly freed only upon the ida destruction. The virtio_index_ida
    is not destroyed at present, leading to a memory leak when using
    the virtio core as a module and at least one virtio device is
    registered and unregistered.

    Fix this by invoking ida_destroy() in the virtio core module
    exit.

    Cc: stable@vger.kernel.org
    Signed-off-by: Suman Anna
    Signed-off-by: Michael S. Tsirkin

    Suman Anna
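The leak pattern can be reproduced with a toy allocator. It allocates internal state lazily on the first ID request and frees that state only in its destroy function, never when individual IDs are removed. This is a hypothetical userspace model, not the kernel ida API. `demo_virtio_exit` stands in for the module exit path, which in the real driver also unregisters the virtio bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_IDA_MAX 64

/* Hypothetical toy ID allocator with lazily allocated internal state. */
struct demo_ida {
	bool *bitmap;
};

static int demo_ida_get(struct demo_ida *ida)
{
	if (!ida->bitmap) {
		ida->bitmap = calloc(DEMO_IDA_MAX, sizeof(*ida->bitmap));
		if (!ida->bitmap)
			return -1;
	}
	for (int id = 0; id < DEMO_IDA_MAX; id++) {
		if (!ida->bitmap[id]) {
			ida->bitmap[id] = true;
			return id;
		}
	}
	return -1; /* exhausted */
}

static void demo_ida_remove(struct demo_ida *ida, int id)
{
	if (ida->bitmap && id >= 0 && id < DEMO_IDA_MAX)
		ida->bitmap[id] = false; /* the bitmap itself stays allocated */
}

static void demo_ida_destroy(struct demo_ida *ida)
{
	free(ida->bitmap);
	ida->bitmap = NULL;
}

static struct demo_ida demo_index_ida;

/* Module-exit analogue: without the destroy call, the bitmap would leak. */
static void demo_virtio_exit(void)
{
	demo_ida_destroy(&demo_index_ida);
}
```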
     

08 Sep, 2015

1 commit

  • The balloon device is frequently used as a means of cooperative memory
    control between guest and host to manage memory overcommitment. This is
    the typical case for any hosting workload where a KVM guest is provided
    to an end user.

    However, there is a problem with this setup. The end user and the hosting
    provider have signed an SLA in which some amount of memory is guaranteed
    for the guest. The good thing is that this memory will be given to the
    guest when the guest really needs it (e.g., on OOM in the guest with the
    VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
    is that the end user does not know this.

    By default, the balloon adjusts the amount of memory exposed to the end
    user, via adjust_managed_page_count, each time a page is stolen from the
    guest or returned to it, and thus /proc/meminfo shows a reduced amount
    of memory.

    Fortunately, the solution is simple: we should just avoid calling
    adjust_managed_page_count when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is set.

    Signed-off-by: Denis V. Lunev
    CC: Michael S. Tsirkin
    Signed-off-by: Michael S. Tsirkin

    Denis V. Lunev
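The accounting rule above reduces to a single feature check around the managed-page adjustment. The following is a hypothetical userspace model: `demo_*` names are illustrative, and `demo_adjust_managed` stands in for `adjust_managed_page_count`. The constant assumes VIRTIO_BALLOON_F_DEFLATE_ON_OOM is feature bit 2, as in the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed feature bit number of VIRTIO_BALLOON_F_DEFLATE_ON_OOM. */
#define DEMO_F_DEFLATE_ON_OOM 2

/* Hypothetical model of the guest-visible accounting. */
struct demo_guest {
	uint64_t features;   /* negotiated feature bits */
	long managed_pages;  /* what MemTotal in /proc/meminfo reflects */
	long balloon_pages;  /* pages currently held by the balloon */
};

static bool demo_has_feature(const struct demo_guest *g, unsigned bit)
{
	return (g->features >> bit) & 1u;
}

/* Stand-in for adjust_managed_page_count(page, delta). */
static void demo_adjust_managed(struct demo_guest *g, long delta)
{
	g->managed_pages += delta;
}

static void demo_inflate_one(struct demo_guest *g)
{
	g->balloon_pages++;
	/* With DEFLATE_ON_OOM negotiated, leave MemTotal unchanged. */
	if (!demo_has_feature(g, DEMO_F_DEFLATE_ON_OOM))
		demo_adjust_managed(g, -1);
}

static void demo_deflate_one(struct demo_guest *g)
{
	g->balloon_pages--;
	if (!demo_has_feature(g, DEMO_F_DEFLATE_ON_OOM))
		demo_adjust_managed(g, +1);
}
```

Inflating and then deflating one page leaves `managed_pages` where it started in both modes. With the feature set, it never moves at all, so the end user always sees the full guaranteed amount.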