19 Apr, 2012

1 commit

  • As pointed out by Jason Baron, when assigning a device to a guest
    we first set the iommu domain pointer, which enables mapping
    and unmapping of memory slots to the iommu. This leaves a window
    where this path is enabled, but we haven't synchronized the iommu
    mappings to the existing memory slots. Thus a slot being removed
    at that point could send us down unexpected code paths removing
    non-existent pinnings and iommu mappings. Take the slots_lock
    around creating the iommu domain and initial mappings as well as
    around iommu teardown to avoid this race.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
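
The locking pattern described above can be sketched in user space. This is a minimal illustration, not the kernel code: `struct vm`, `vm_assign_device()` and `vm_remove_slot()` are hypothetical names, a pthread mutex stands in for slots_lock, and a boolean stands in for the iommu domain pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical stand-in for the per-VM state; not the kernel's struct kvm. */
struct vm {
    pthread_mutex_t slots_lock;   /* protects every field below */
    bool iommu_enabled;           /* stands in for the iommu domain pointer */
    bool slot_present[MAX_SLOTS];
    bool slot_mapped[MAX_SLOTS];  /* true once the slot is pinned and mapped */
};

void vm_init(struct vm *vm)
{
    pthread_mutex_init(&vm->slots_lock, NULL);
    vm->iommu_enabled = false;
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        vm->slot_present[i] = false;
        vm->slot_mapped[i] = false;
    }
}

/* Enable the domain and map all existing slots in one critical section,
 * so no slot removal can observe "enabled but not yet mapped". */
void vm_assign_device(struct vm *vm)
{
    pthread_mutex_lock(&vm->slots_lock);
    vm->iommu_enabled = true;
    for (size_t i = 0; i < MAX_SLOTS; i++)
        if (vm->slot_present[i])
            vm->slot_mapped[i] = true;
    pthread_mutex_unlock(&vm->slots_lock);
}

/* Remove a slot; returns true if an iommu mapping was torn down. */
bool vm_remove_slot(struct vm *vm, size_t i)
{
    bool unmapped = false;

    assert(i < MAX_SLOTS);
    pthread_mutex_lock(&vm->slots_lock);
    if (vm->iommu_enabled && vm->slot_present[i]) {
        /* With the lock held, an enabled domain implies a mapping. */
        assert(vm->slot_mapped[i]);
        vm->slot_mapped[i] = false;
        unmapped = true;
    }
    vm->slot_present[i] = false;
    pthread_mutex_unlock(&vm->slots_lock);
    return unmapped;
}
```

Without the lock in `vm_assign_device()`, a concurrent `vm_remove_slot()` could see `iommu_enabled` set before the slot was mapped, which is the window the commit closes.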
     

12 Apr, 2012

1 commit

  • We've been adding new mappings, but not destroying old mappings.
    This can lead to a page leak as pages are pinned using
    get_user_pages, but only unpinned with put_page if they still
    exist in the memslots list on vm shutdown. A memslot that is
    destroyed while an iommu domain is enabled for the guest will
    therefore result in an elevated page reference count that is
    never cleared.

    Additionally, without this fix, the iommu is only programmed
    with the first translation for a gpa. This can result in
    peer-to-peer errors if a mapping is destroyed and replaced by a
    new mapping at the same gpa, because the iommu will still be
    pointing to the original, pinned memory address.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
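
The leak can be illustrated with a small reference-counting model. This is only a sketch: `struct page`, `struct slot`, `slot_iommu_map()` and `slot_iommu_unmap()` are hypothetical stand-ins, with the increment playing the role of get_user_pages and the decrement the role of put_page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page with a reference count (not the kernel's struct page). */
struct page {
    int refcount;
};

struct slot {
    struct page *pages;
    size_t npages;
    int mapped;   /* non-zero while the slot is pinned for the iommu */
};

/* Pin every page of the slot when it is mapped into the iommu. */
void slot_iommu_map(struct slot *s)
{
    assert(!s->mapped);
    for (size_t i = 0; i < s->npages; i++)
        s->pages[i].refcount++;
    s->mapped = 1;
}

/* Unpin when the slot is destroyed; skipping this call is the leak. */
void slot_iommu_unmap(struct slot *s)
{
    if (!s->mapped)
        return;
    for (size_t i = 0; i < s->npages; i++)
        s->pages[i].refcount--;
    s->mapped = 0;
}
```

If a slot is destroyed without the unmap step, each of its pages keeps the extra reference, matching the elevated page reference count the message describes.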
     

20 Mar, 2012

1 commit

  • As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under
    rcu_read_lock, we cannot use a mutex in the latter function. Switch to a
    spin lock to address this.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Jan Kiszka
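
As a rough user-space analogy, a lock that only spins never sleeps, which is what a caller inside an RCU read-side section requires. The sketch below uses a C11 atomic_flag; `struct ack_state` and `ack_irq()` are hypothetical names, not the kernel's data structures.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-device state guarded by a spinning lock. */
struct ack_state {
    atomic_flag lock;   /* initialise with ATOMIC_FLAG_INIT */
    int pending;        /* non-zero while an irq awaits acknowledgement */
};

static void spin_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear; never blocks. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Acknowledge a pending irq; returns whether one was pending. */
int ack_irq(struct ack_state *s)
{
    int had_pending;

    assert(s);
    spin_lock(&s->lock);
    had_pending = s->pending;
    s->pending = 0;
    spin_unlock(&s->lock);
    return had_pending;
}
```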
     

08 Mar, 2012

8 commits


05 Mar, 2012

5 commits

  • This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
    kvm_host.h to reduce the code duplication caused by the need for
    non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
    gfn_to_memslot() in real mode.

    Rather than putting gfn_to_memslot() itself in a header, which would
    lead to increased code size, this puts __gfn_to_memslot() in a header.
    Then, the non-modular uses of gfn_to_memslot() are changed to call
    __gfn_to_memslot() instead. This way there is only one place in the
    source code that needs to be changed should the gfn_to_memslot()
    implementation need to be modified.

    On powerpc, the Book3S HV style of KVM has code that is called from
    real mode which needs to call gfn_to_memslot() and thus needs this.
    (Module code is allocated in the vmalloc region, which can't be
    accessed in real mode.)

    With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.

    Signed-off-by: Paul Mackerras
    Acked-by: Avi Kivity
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Paul Mackerras
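
The header-versus-module split can be sketched as follows. The types and names here (`struct memslot`, `lookup_memslot()`, `gfn_to_memslot_sketch()`) are simplified assumptions, and the linear search is only illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long gfn_t;

/* Simplified stand-ins for the memslot structures. */
struct memslot {
    gfn_t base_gfn;
    unsigned long npages;
};

struct memslots {
    struct memslot slots[4];
    size_t used;
};

/* Header-style helper: being static inline, it is compiled into every
 * caller, including code that cannot call into a module. */
static inline struct memslot *lookup_memslot(struct memslots *slots, gfn_t gfn)
{
    assert(slots->used <= 4);
    for (size_t i = 0; i < slots->used; i++) {
        struct memslot *m = &slots->slots[i];

        if (gfn >= m->base_gfn && gfn < m->base_gfn + m->npages)
            return m;
    }
    return NULL;
}

/* Out-of-line wrapper for ordinary callers; the search logic still
 * lives in exactly one place. */
struct memslot *gfn_to_memslot_sketch(struct memslots *slots, gfn_t gfn)
{
    return lookup_memslot(slots, gfn);
}
```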
     
  • find_index_from_host_irq returns 0 on error, but callers assume
    < 0 on error. This should not matter much: an out-of-range irq
    should never happen, since the irq handler was registered with
    this irq number, and even if it does, we get a spurious MSI-X
    irq in the guest and typically nothing terrible happens.

    Still, better to make it consistent.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
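
A minimal sketch of the consistent convention, assuming a hypothetical `struct msix_entry_sketch` and function name. Since index 0 is a valid entry, only a negative return value can unambiguously signal "not found".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry mapping a slot to a host irq vector. */
struct msix_entry_sketch {
    int vector;
};

/* Return the index of the entry using `irq`, or -1 if there is none. */
int find_index_from_irq(const struct msix_entry_sketch *entries, int nr, int irq)
{
    assert(entries != NULL || nr == 0);
    for (int i = 0; i < nr; i++)
        if (entries[i].vector == irq)
            return i;
    return -1;
}
```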
     
  • This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
    smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
    the correct answer when called without kvm->mmu_lock being held.
    PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
    a single global spinlock in order to improve the scalability of updates
    to the guest MMU hashed page table, and so needs this.

    Signed-off-by: Paul Mackerras
    Acked-by: Avi Kivity
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Paul Mackerras
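
The barrier pairing can be approximated in portable C11. This is a sketch under assumptions: `struct notifier_state` and the function names are hypothetical, the release fence plays the role of smp_wmb, and the acquire fence plays the role of smp_rmb.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the notifier fields of the VM structure. */
struct notifier_state {
    atomic_ulong seq;    /* bumped at the end of each invalidation */
    atomic_long count;   /* number of invalidations in progress */
};

void invalidate_range_start(struct notifier_state *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_relaxed);
}

void invalidate_range_end(struct notifier_state *s)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
    /* Order the seq update before the count drop (role of smp_wmb). */
    atomic_thread_fence(memory_order_release);
    atomic_fetch_sub_explicit(&s->count, 1, memory_order_relaxed);
}

/* Returns 1 if a page fault that sampled `seen_seq` must be retried. */
int notifier_retry(struct notifier_state *s, unsigned long seen_seq)
{
    assert(s);
    if (atomic_load_explicit(&s->count, memory_order_relaxed))
        return 1;
    /* Order the count read before the seq read (role of smp_rmb). */
    atomic_thread_fence(memory_order_acquire);
    if (atomic_load_explicit(&s->seq, memory_order_relaxed) != seen_seq)
        return 1;
    return 0;
}
```

A reader that observes count as zero is then also guaranteed to observe the incremented seq, so it cannot miss an invalidation that has just finished.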
     
  • This patch exports the s390 SIE hardware control block to userspace
    via the mapping of the vcpu file descriptor. In order to do so,
    a new arch callback named kvm_arch_vcpu_fault is introduced for all
    architectures. It allows architecture-specific pages to be mapped.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch introduces a new config option for user controlled kernel
    virtual machines. It adds a parameter to KVM_CREATE_VM that
    allows setting bits that alter the capabilities of the newly created
    virtual machine.
    The parameter is passed to kvm_arch_init_vm for all architectures.
    The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
    This requires CAP_SYS_ADMIN privileges and creates a user controlled
    virtual machine on s390 architectures.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
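
The validation of the new parameter might look roughly like the sketch below. The constant `VM_TYPE_UCONTROL`, its value, the function name and the specific error codes are assumptions for illustration; the commit message only states that one modifier bit is valid and that it requires CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical modifier bit standing in for KVM_VM_S390_UCONTROL. */
#define VM_TYPE_UCONTROL 1UL

/* Validate the type argument; 0 on success, negative errno otherwise.
 * The error codes chosen here are assumptions. */
int check_vm_type(unsigned long type, bool is_admin)
{
    if (type & ~VM_TYPE_UCONTROL)
        return -EINVAL;               /* unknown modifier bits */
    if ((type & VM_TYPE_UCONTROL) && !is_admin)
        return -EPERM;                /* privileged bit, unprivileged caller */
    return 0;
}
```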
     

01 Feb, 2012

1 commit

  • It is possible that the __set_bit() in mark_page_dirty() is called
    simultaneously on the same region of memory, which may result in only
    one bit being set, because some callers do not take mmu_lock before
    mark_page_dirty().

    This problem is hard to reproduce because when we reach
    mark_page_dirty() beginning from, e.g., tdp_page_fault(), mmu_lock
    is held during __direct_map(). For this reason, making
    kvm-unit-tests' dirty log api test write to two pages concurrently
    was not useful.

    So we confirmed that the race condition can actually occur by
    using spin_is_locked() to check whether some callers really reach
    there without holding mmu_lock: they probably came from
    kvm_write_guest_page().

    To fix this race, this patch changes the bit operation to the atomic
    version: note that nr_dirty_pages also suffers from the race but we do
    not need exactly correct numbers for now.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
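
The difference between the two bit operations can be sketched with C11 atomics. `set_bit_atomic()` and `test_bit_atomic()` are hypothetical helpers, not the kernel's set_bit(). The point is that a plain read-modify-write on a shared word can lose a concurrent update, while an atomic fetch-or cannot.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit `nr` in a dirty bitmap made of unsigned long words. */
void set_bit_atomic(unsigned long nr, atomic_ulong *bitmap)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    assert(bitmap);
    /* A single atomic OR: concurrent setters of other bits in the same
     * word cannot overwrite each other's updates. */
    atomic_fetch_or_explicit(&bitmap[nr / BITS_PER_WORD], mask,
                             memory_order_relaxed);
}

/* Return non-zero if bit `nr` is set. */
int test_bit_atomic(unsigned long nr, atomic_ulong *bitmap)
{
    unsigned long word = atomic_load_explicit(&bitmap[nr / BITS_PER_WORD],
                                              memory_order_relaxed);

    return (word >> (nr % BITS_PER_WORD)) & 1UL;
}
```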
     

13 Jan, 2012

1 commit


11 Jan, 2012

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
    iommu/amd: Set IOTLB invalidation timeout
    iommu/amd: Init stats for iommu=pt
    iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume
    iommu/amd: Add invalidate-context call-back
    iommu/amd: Add amd_iommu_device_info() function
    iommu/amd: Adapt IOMMU driver to PCI register name changes
    iommu/amd: Add invalid_ppr callback
    iommu/amd: Implement notifiers for IOMMUv2
    iommu/amd: Implement IO page-fault handler
    iommu/amd: Add routines to bind/unbind a pasid
    iommu/amd: Implement device aquisition code for IOMMUv2
    iommu/amd: Add driver stub for AMD IOMMUv2 support
    iommu/amd: Add stat counter for IOMMUv2 events
    iommu/amd: Add device errata handling
    iommu/amd: Add function to get IOMMUv2 domain for pdev
    iommu/amd: Implement function to send PPR completions
    iommu/amd: Implement functions to manage GCR3 table
    iommu/amd: Implement IOMMUv2 TLB flushing routines
    iommu/amd: Add support for IOMMUv2 domain mode
    iommu/amd: Add amd_iommu_domain_direct_map function
    ...

    Linus Torvalds
     

09 Jan, 2012

1 commit


27 Dec, 2011

14 commits


26 Dec, 2011

1 commit

  • Only allow KVM device assignment to attach to devices which:

    - Are not bridges
    - Have BAR resources (assume others are special devices)
    - The user has permissions to use

    Assigning a bridge is a configuration error: it's not supported, and
    typically doesn't result in the behavior the user is expecting anyway.
    Devices without BAR resources are typically chipset components that
    also don't have host drivers. We don't want users to hold such devices
    captive or cause system problems by fencing them off into an iommu
    domain. We determine "permission to use" by testing whether the user
    has access to the PCI sysfs resource files. By default a normal user
    will not have access to these files, so it provides a good indication
    that an administration agent has granted the user access to the device.

    [Yang Bai: add missing #include]
    [avi: fix comment style]

    Signed-off-by: Alex Williamson
    Signed-off-by: Yang Bai
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
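
The three checks listed above can be sketched as a single predicate. `struct pci_dev_sketch`, its fields and the error codes are assumptions made for illustration; in particular, the boolean field merely stands in for the result of the sysfs resource file permission test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the assignment path needs to know. */
struct pci_dev_sketch {
    bool is_bridge;
    int nr_bars;                     /* number of BAR resources */
    bool user_can_access_resources;  /* result of the sysfs access test */
};

/* Return 0 if the device may be assigned, a negative errno otherwise.
 * The specific error codes are assumptions. */
int may_assign_device(const struct pci_dev_sketch *dev)
{
    assert(dev != NULL);
    if (dev->is_bridge)
        return -EPERM;    /* bridges are never assignable */
    if (dev->nr_bars == 0)
        return -EPERM;    /* no BARs: assume a special chipset device */
    if (!dev->user_can_access_resources)
        return -EACCES;   /* no access granted to the resource files */
    return 0;
}
```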
     

25 Dec, 2011

1 commit


10 Nov, 2011

1 commit

  • When mapping a memory region, split it to page sizes as supported
    by the iommu hardware. Always prefer bigger pages, when possible,
    in order to reduce the TLB pressure.

    The logic to do that is now added to the IOMMU core, so neither the iommu
    drivers themselves nor users of the IOMMU API have to duplicate it.

    This allows a more lenient granularity of mappings; traditionally the
    IOMMU API took 'order' (of a page) as a mapping size, and directly let
    the low level iommu drivers handle the mapping, but now that the IOMMU
    core can split arbitrary memory regions into pages, we can remove this
    limitation, so users don't have to split those regions by themselves.

    Currently the supported page sizes are advertised once and they then
    remain static. That works well for OMAP and MSM but it would probably
    not fly well with Intel's hardware, where the page size capabilities
    seem to have the potential to be different between several DMA
    remapping devices.

    register_iommu() currently sets a default pgsize behavior, so we can convert
    the IOMMU drivers in subsequent patches. After all the drivers
    are converted, the temporary default settings will be removed.

    Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted
    to deal with bytes instead of page order.

    Many thanks to Joerg Roedel for significant review!

    Signed-off-by: Ohad Ben-Cohen
    Cc: David Brown
    Cc: David Woodhouse
    Cc: Joerg Roedel
    Cc: Stepan Moskovchenko
    Cc: KyongHo Cho
    Cc: Hiroshi DOYU
    Cc: Laurent Pinchart
    Cc: kvm@vger.kernel.org
    Signed-off-by: Joerg Roedel

    Ohad Ben-Cohen
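
The splitting strategy can be sketched as a greedy loop: at each step, pick the largest supported page size that both fits in the remaining length and matches the current alignment of the addresses. The function names and the bitmap encoding (bit n set means a page size of 2^n bytes is supported) are assumptions for this sketch, not the IOMMU core's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest supported page size usable at address `addr` with `size`
 * bytes remaining, or 0 if no supported size fits. */
static uint64_t pick_pgsize(uint64_t pgsize_bitmap, uint64_t addr, uint64_t size)
{
    for (int bit = 63; bit >= 0; bit--) {
        uint64_t pgsize = (uint64_t)1 << bit;

        if (!(pgsize_bitmap & pgsize))
            continue;
        if (pgsize <= size && (addr & (pgsize - 1)) == 0)
            return pgsize;
    }
    return 0;
}

/* Split [iova, iova + size) into pages, recording each page size in
 * out[]. Returns the number of pages, or -1 if the region cannot be
 * covered or out[] is too small. */
long split_region(uint64_t pgsize_bitmap, uint64_t iova, uint64_t paddr,
                  uint64_t size, uint64_t *out, size_t max_out)
{
    size_t n = 0;

    assert(out != NULL);
    while (size) {
        /* Both addresses must be aligned, so test their bitwise OR. */
        uint64_t pgsize = pick_pgsize(pgsize_bitmap, iova | paddr, size);

        if (pgsize == 0 || n == max_out)
            return -1;
        out[n++] = pgsize;
        iova += pgsize;
        paddr += pgsize;
        size -= pgsize;
    }
    return (long)n;
}
```

With 4 KiB and 2 MiB pages supported, a 2 MiB-aligned region of 2 MiB + 8 KiB would be covered by one 2 MiB page followed by two 4 KiB pages.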
     

01 Nov, 2011

2 commits

  • This file has things like module_param_named() and MODULE_PARM_DESC()
    so it needs the full module.h header present. Without it, you'll get:

    CC arch/x86/kvm/../../../virt/kvm/iommu.o
    virt/kvm/iommu.c:37: error: expected ‘)’ before ‘bool’
    virt/kvm/iommu.c:39: error: expected ‘)’ before string constant
    make[3]: *** [arch/x86/kvm/../../../virt/kvm/iommu.o] Error 1
    make[2]: *** [arch/x86/kvm] Error 2

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • This was coming in via an implicit module.h (and its sub-includes)
    before, but we'll be cleaning that up shortly. Call out the stat.h
    include requirement in advance.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

31 Oct, 2011

1 commit

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
    iommu/core: Remove global iommu_ops and register_iommu
    iommu/msm: Use bus_set_iommu instead of register_iommu
    iommu/omap: Use bus_set_iommu instead of register_iommu
    iommu/vt-d: Use bus_set_iommu instead of register_iommu
    iommu/amd: Use bus_set_iommu instead of register_iommu
    iommu/core: Use bus->iommu_ops in the iommu-api
    iommu/core: Convert iommu_found to iommu_present
    iommu/core: Add bus_type parameter to iommu_domain_alloc
    Driver core: Add iommu_ops to bus_type
    iommu/core: Define iommu_ops and register_iommu only with CONFIG_IOMMU_API
    iommu/amd: Fix wrong shift direction
    iommu/omap: always provide iommu debug code
    iommu/core: let drivers know if an iommu fault handler isn't installed
    iommu/core: export iommu_set_fault_handler()
    iommu/omap: Fix build error with !IOMMU_SUPPORT
    iommu/omap: Migrate to the generic fault report mechanism
    iommu/core: Add fault reporting mechanism
    iommu/core: Use PAGE_SIZE instead of hard-coded value
    iommu/core: use the existing IS_ALIGNED macro
    iommu/msm: ->unmap() should return order of unmapped page
    ...

    Fixup trivial conflicts in drivers/iommu/Makefile: "move omap iommu to
    dedicated iommu folder" vs "Rename the DMAR and INTR_REMAP config
    options" just happened to touch lines next to each other.

    Linus Torvalds