13 Sep, 2013

1 commit

  • Pull IOMMU Updates from Joerg Roedel:
    "This round the updates contain:

    - A new driver for the Freescale PAMU IOMMU from Varun Sethi.

    This driver has cooked for a while and required changes to the
    IOMMU-API and infrastructure that were already merged before.

    - Updates for the ARM-SMMU driver from Will Deacon

    - Various fixes, the most important one is probably a fix from Alex
    Williamson for a memory leak in the VT-d page-table freeing code

    In summary not all that much. The biggest part in the diffstat is the
    new PAMU driver"

    * tag 'iommu-updates-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
    intel-iommu: Fix leaks in pagetable freeing
    iommu/amd: Fix resource leak in iommu_init_device()
    iommu/amd: Clean up unnecessary MSI/MSI-X capability find
    iommu/arm-smmu: Simplify VMID and ASID allocation
    iommu/arm-smmu: Don't use VMIDs for stage-1 translations
    iommu/arm-smmu: Tighten up global fault reporting
    iommu/arm-smmu: Remove broken big-endian check
    iommu/fsl: Remove unnecessary 'fsl-pamu' prefixes
    iommu/fsl: Fix whitespace problems noticed by git-am
    iommu/fsl: Freescale PAMU driver and iommu implementation.
    iommu/fsl: Add additional iommu attributes required by the PAMU driver.
    powerpc: Add iommu domain pointer to device archdata
    iommu/exynos: Remove dead code (set_prefbuf)

    Linus Torvalds
     

12 Sep, 2013

2 commits


21 Aug, 2013

1 commit


16 Aug, 2013

1 commit

  • Remove unneeded error handling on the result of a call to
    platform_get_resource when the value is passed to devm_ioremap_resource.

    A simplified version of the semantic patch that makes this change is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    expression pdev,res,n,e,e1;
    expression ret != 0;
    identifier l;
    @@

    - res = platform_get_resource(pdev, IORESOURCE_MEM, n);
    ... when != res
    - if (res == NULL) { ... \(goto l;\|return ret;\) }
    ... when != res
    + res = platform_get_resource(pdev, IORESOURCE_MEM, n);
    e = devm_ioremap_resource(e1, res);
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Stephen Warren

    Julia Lawall
     

15 Aug, 2013

2 commits

  • At best the current code only seems to free the leaf pagetables and
    the root. If you're unlucky enough to have a large gap (like any
    QEMU guest with more than 3G of memory), only the first chunk of leaf
    pagetables are freed (plus the root). This is a massive memory leak.
    This patch re-writes the pagetable freeing function to use a
    recursive algorithm and manages to not only free all the pagetables,
    but does it without any apparent performance loss versus the current
    broken version.

    Signed-off-by: Alex Williamson
    Cc: stable@vger.kernel.org
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Joerg Roedel

    Alex Williamson
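
The recursive freeing described in the commit above can be illustrated with a toy multi-level table. This is only a userspace sketch under stated assumptions: `struct toy_table`, `toy_table_alloc()` and `toy_table_free()` are hypothetical names, the fan-out of 8 is arbitrary, and none of this is the VT-d driver code itself.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 8 /* toy fan-out; real page tables are much wider */

/* Hypothetical toy table: levels above 1 point to child tables. */
struct toy_table {
    struct toy_table *child[ENTRIES_PER_TABLE];
};

/* Allocate an empty table; asserts on failure to keep the sketch short. */
static struct toy_table *toy_table_alloc(void)
{
    struct toy_table *t = calloc(1, sizeof(*t));

    assert(t != NULL);
    return t;
}

/*
 * Free the table at 'level' and every table below it, however sparse the
 * tree is. Level 1 tables are leaves: their slots would describe page
 * frames rather than child tables, so they are not followed.
 * Returns the number of tables freed.
 */
static unsigned int toy_table_free(struct toy_table *t, int level)
{
    unsigned int freed = 0;
    int i;

    if (t == NULL)
        return 0;
    assert(level >= 1);

    if (level > 1) {
        for (i = 0; i < ENTRIES_PER_TABLE; i++)
            freed += toy_table_free(t->child[i], level - 1);
    }
    free(t);
    return freed + 1;
}
```

Because every populated slot is visited at every level, a large gap between populated slots (the case the commit calls out) does not stop the walk early.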
     
  • Detected by cppcheck.

    Signed-off-by: Kamil Dudka
    Signed-off-by: Joerg Roedel

    Radmila Kompová
     

14 Aug, 2013

9 commits

  • The PCI core initializes the device's MSI/MSI-X capability in
    pci_msi_init_pci_dev(), so a device driver should use
    pci_dev->msi_cap/msix_cap to determine whether the device
    supports MSI/MSI-X instead of calling
    pci_find_capability(pci_dev, PCI_CAP_ID_MSI/MSIX). Reading
    the PCIe device's config space again only costs extra time.

    Signed-off-by: Yijing Wang
    Cc: Joerg Roedel
    Cc: iommu@lists.linux-foundation.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Joerg Roedel

    Yijing Wang
     
  • We only use ASIDs and VMIDs to identify individual stage-1 and stage-2
    context-banks respectively, so rather than allocate these separately
    from the context-banks, just calculate them based on the context bank
    index.

    Note that VMIDs are offset by 1, since VMID 0 is reserved for stage-1.
    This doesn't cause us any issues with the numberspaces, since the
    maximum number of context banks is half the minimum number of VMIDs.

    Signed-off-by: Will Deacon
    Signed-off-by: Joerg Roedel

    Will Deacon
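
The numbering scheme from the commit above can be sketched as two small helpers. The names are hypothetical, and the limit of 128 context banks with 8-bit IDs is an assumption chosen to match the "half the minimum number of VMIDs" remark; the real driver may express this differently.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_CBS 128 /* assumed maximum number of context banks */

enum toy_stage { TOY_STAGE1, TOY_STAGE2 };

/* ASID of a stage-1 context bank: simply the context bank index. */
static uint8_t toy_cb_asid(unsigned int cbndx)
{
    assert(cbndx < TOY_MAX_CBS);
    return (uint8_t)cbndx;
}

/*
 * VMID of a context bank: VMID 0 is reserved for stage-1 banks, so
 * stage-2 banks use the context bank index offset by one.
 */
static uint8_t toy_cb_vmid(enum toy_stage stage, unsigned int cbndx)
{
    assert(cbndx < TOY_MAX_CBS);
    return stage == TOY_STAGE1 ? 0 : (uint8_t)(cbndx + 1);
}
```

With at most 128 banks, the largest stage-2 VMID is 128, which still fits an 8-bit VMID space, so the offset by one cannot overflow.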
     
  • Although permitted by the architecture, using VMIDs for stage-1
    translations causes a complete nightmare for hypervisors, who end up
    having to virtualise the VMID space across VMs, which may be using
    multiple VMIDs each.

    To make life easier for hypervisors (which might just decide not to
    support this VMID virtualisation), this patch reworks the stage-1
    context-bank TLB invalidation so that:

    - Stage-1 mappings are marked non-global in the ptes
    - Each Stage-1 context-bank is assigned an ASID in TTBR0
    - VMID 0 is reserved for Stage-1 context-banks

    This allows the hypervisor to overwrite the Stage-1 VMID in the CBAR
    when trapping the write from the guest.

    Signed-off-by: Will Deacon
    Signed-off-by: Joerg Roedel

    Will Deacon
     
  • On systems which use a single, combined irq line for the SMMU, context
    faults may result in us spuriously reporting global faults with zero
    status registers.

    This patch fixes up the fsr checks in both the context and global fault
    interrupt handlers, so that we only report the fault if the fsr
    indicates something did indeed go awry.

    Signed-off-by: Will Deacon
    Signed-off-by: Joerg Roedel

    Will Deacon
     
  • The bottom word of the pgd should always be written to the low half of
    the TTBR, so we don't need to swap anything for big-endian.

    Signed-off-by: Will Deacon
    Signed-off-by: Joerg Roedel

    Will Deacon
     
  • The file defines a pr_fmt macro, so there is no need to add
    this prefix to individual messages.

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • Following is a brief description of the PAMU hardware:
    PAMU determines what action to take and whether to authorize the action on
    the basis of the memory address, a Logical IO Device Number (LIODN), and
    PAACT table (logically) indexed by LIODN and address. Hardware devices which
    need to access memory must provide an LIODN in addition to the memory address.

    Peripheral Access Authorization and Control Tables (PAACTs) are the primary
    data structures used by PAMU. A PAACT is a table of peripheral access
    authorization and control entries (PAACE). Each PAACE defines the range of
    I/O bus address space that is accessible by the LIOD and the associated access
    capabilities.

    There are two types of PAACTs: primary PAACT (PPAACT) and secondary PAACT
    (SPAACT). A given physical I/O device may be able to act as one or more
    independent logical I/O devices (LIODs). Each such logical I/O device is
    assigned an identifier called logical I/O device number (LIODN). A LIODN is
    allocated a contiguous portion of the I/O bus address space called the DSA window
    for performing DSA operations. The DSA window may optionally be divided into
    multiple sub-windows, each of which may be used to map to a region in system
    storage space. The first sub-window is referred to as the primary sub-window
    and the remaining are called secondary sub-windows.

    This patch provides the PAMU driver (fsl_pamu.c) and the corresponding IOMMU
    API implementation (fsl_pamu_domain.c). The PAMU hardware driver (fsl_pamu.c)
    has been derived from the work done by Ashish Kalra and Timur Tabi.

    [For iommu group support]
    Acked-by: Alex Williamson

    Signed-off-by: Timur Tabi
    Signed-off-by: Varun Sethi
    Signed-off-by: Joerg Roedel

    Varun Sethi
     
  • exynos_sysmmu_set_prefbuf() is not called anywhere.

    Signed-off-by: Grant Grundler
    Reviewed-by: Cho KyongHo
    Signed-off-by: Joerg Roedel

    Grant Grundler
     

07 Aug, 2013

1 commit

  • Two header files exist in mach-msm's include/mach directory that
    are only used by the MSM iommu driver. Move these files to the
    iommu driver directory and prefix them with "msm_". This allows
    us to compile the MSM iommu driver on multi-platform kernels.

    Acked-by: Joerg Roedel
    Cc: Stepan Moskovchenko
    Signed-off-by: Stephen Boyd
    Signed-off-by: David Brown

    Stephen Boyd
     

11 Jul, 2013

1 commit

  • Pull IOMMU updates from Joerg Roedel:
    "A few updates this time, most important and exciting (to me) is:

    - The new ARM SMMU driver. This is a common IOMMU driver that will
    hopefully be used in a lot of upcoming ARM chips. So the mess in
    the past where every SOC had its own IOMMU will be over.

    Besides that:

    - Some important fixes in the IOMMU unmap path. There are fixes in
    the common code and also in the AMD IOMMU driver.
    - Other random fixes"

    * tag 'iommu-updates-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
    MAINTAINERS: add entry for ARM system MMU driver
    iommu/arm: Add support for ARM Ltd. System MMU architecture
    documentation/iommu: Add description of ARM System MMU binding
    iommu: Use %pa and %zx instead of casting
    iommu/amd: Only unmap large pages from the first pte
    iommu: Fix compiler warning on pr_debug
    iommu/amd: Fix memory leak in free_pagetable
    iommu: Split iommu_unmaps
    iommu/{vt-d,amd}: Remove multifunction assumption around grouping
    iommu/omap: fix checkpatch warnings in omap iommu code
    iommu/omap: fix printk formats for dma_addr_t
    iommu/vt-d: DMAR reporting table needs at least one DRHD
    iommu/vt-d: Downgrade the warning if enabling irq remapping fails

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Fix two obvious problems:

    1. msm_iommu_driver is registered first, so it needs to be unregistered
    when registering msm_iommu_ctx_driver fails.

    2. There is no need to kfree drvdata before kzalloc has succeeded.

    [akpm@linux-foundation.org: remove now-unneeded initialization of ctx_drvdata, remove unneeded braces]
    Signed-off-by: Libo Chen
    Acked-by: David Brown
    Cc: David Woodhouse
    Cc: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Libo Chen
     

05 Jul, 2013

1 commit

  • Pull powerpc updates from Ben Herrenschmidt:
    "This is the powerpc changes for the 3.11 merge window. In addition to
    the usual bug fixes and small updates, the main highlights are:

    - Support for transparent huge pages by Aneesh Kumar for 64-bit
    server processors. This allows the use of 16M pages as transparent
    huge pages on kernels compiled with a 64K base page size.

    - Base VFIO support for KVM on power by Alexey Kardashevskiy

    - Wiring up of our nvram to the pstore infrastructure, including
    putting compressed oopses in there by Aruna Balakrishnaiah

    - Move, rework and improve our "EEH" (basically PCI error handling
    and recovery) infrastructure. It is no longer specific to pseries
    but is now usable by the new "powernv" platform as well (no
    hypervisor) by Gavin Shan.

    - I fixed some bugs in our math-emu instruction decoding and made it
    usable to emulate some optional FP instructions on processors with
    hard FP that lack them (such as fsqrt on Freescale embedded
    processors).

    - Support for Power8 "Event Based Branch" facility by Michael
    Ellerman. This facility allows what is basically "userspace
    interrupts" for performance monitor events.

    - A bunch of Transactional Memory vs. Signals bug fixes and HW
    breakpoint/watchpoint fixes by Michael Neuling.

    And more ... I apologize in advance if I've failed to highlight
    something that somebody deemed worth it."

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
    pstore: Add hsize argument in write_buf call of pstore_ftrace_call
    powerpc/fsl: add MPIC timer wakeup support
    powerpc/mpic: create mpic subsystem object
    powerpc/mpic: add global timer support
    powerpc/mpic: add irq_set_wake support
    powerpc/85xx: enable coreint for all the 64bit boards
    powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
    powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
    powerpc/mpic: Add get_version API both for internal and external use
    powerpc: Handle both new style and old style reserve maps
    powerpc/hw_brk: Fix off by one error when validating DAWR region end
    powerpc/pseries: Support compression of oops text via pstore
    powerpc/pseries: Re-organise the oops compression code
    pstore: Pass header size in the pstore write callback
    powerpc/powernv: Fix iommu initialization again
    powerpc/pseries: Inform the hypervisor we are using EBB regs
    powerpc/perf: Add power8 EBB support
    powerpc/perf: Core EBB support for 64-bit book3s
    powerpc/perf: Drop MMCRA from thread_struct
    powerpc/perf: Don't enable if we have zero events
    ...

    Linus Torvalds
     

04 Jul, 2013

1 commit

  • Pull PCI changes from Bjorn Helgaas:
    "PCI device hotplug
    - Add pci_alloc_dev() interface (Gu Zheng)
    - Add pci_bus_get()/put() for reference counting (Jiang Liu)
    - Fix SR-IOV reference count issues (Jiang Liu)
    - Remove unused acpi_pci_roots list (Jiang Liu)

    MSI
    - Conserve interrupt resources on x86 (Alexander Gordeev)

    AER
    - Force fatal severity when component has been reset (Betty Dall)
    - Reset link below Root Port as well as Downstream Port (Betty Dall)
    - Fix "Firmware first" flag setting (Bjorn Helgaas)
    - Don't parse HEST for non-PCIe devices (Bjorn Helgaas)

    ASPM
    - Warn when we can't disable ASPM as driver requests (Bjorn Helgaas)

    Miscellaneous
    - Add CircuitCo PCI IDs (Darren Hart)
    - Add AMD CZ SATA and SMBus PCI IDs (Shane Huang)
    - Work around Ivytown NTB BAR size issue (Jon Mason)
    - Detect invalid initial BAR values (Kevin Hao)
    - Add pcibios_release_device() (Sebastian Ott)
    - Fix powerpc & sparc PCI_UNKNOWN power state usage (Bjorn Helgaas)"

    * tag 'pci-v3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
    MAINTAINERS: Add ACPI folks for ACPI-related things under drivers/pci
    PCI: Add CircuitCo vendor ID and subsystem ID
    PCI: Use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)
    PCI: Return early on allocation failures to unindent mainline code
    PCI: Simplify IOV implementation and fix reference count races
    PCI: Drop redundant setting of bus->is_added in virtfn_add_bus()
    unicore32/PCI: Remove redundant call of pci_bus_add_devices()
    m68k/PCI: Remove redundant call of pci_bus_add_devices()
    PCI / ACPI / PM: Use correct power state strings in messages
    PCI: Fix comment typo for pcie_pme_remove()
    PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()
    PCI: Fix refcount issue in pci_create_root_bus() error recovery path
    ia64/PCI: Clean up pci_scan_root_bus() usage
    PCI/AER: Reset link for devices below Root Port or Downstream Port
    ACPI / APEI: Force fatal AER severity when component has been reset
    PCI/AER: Remove "extern" from function declarations
    PCI/AER: Move AER severity defines to aer.h
    PCI/AER: Set dev->__aer_firmware_first only for matching devices
    PCI/AER: Factor out HEST device type matching
    PCI/AER: Don't parse HEST table for non-PCIe devices
    ...

    Linus Torvalds
     

03 Jul, 2013

1 commit

  • Pull perf updates from Ingo Molnar:
    "Kernel improvements:

    - watchdog driver improvements by Li Zefan
    - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
    - event multiplexing via hrtimers and other improvements by Stephane
    Eranian
    - kernel stack use optimization by Andrew Hunter
    - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
    - NMI handling rate-limits by Dave Hansen
    - various hw_breakpoint fixes by Oleg Nesterov
    - hw_breakpoint overflow period sampling and related signal handling
    fixes by Jiri Olsa
    - Intel Haswell PMU support by Andi Kleen

    Tooling improvements:

    - Reset SIGTERM handler in workload child process, fix from David
    Ahern.
    - Makefile reorganization, prep work for Kconfig patches, from Jiri
    Olsa.
    - Add automated make test suite, from Jiri Olsa.
    - Add --percent-limit option to 'top' and 'report', from Namhyung
    Kim.
    - Sorting improvements, from Namhyung Kim.
    - Expand definition of sysfs format attribute, from Michael Ellerman.

    Tooling fixes:

    - 'perf tests' fixes from Jiri Olsa.
    - Make Power7 CPI stack events available in sysfs, from Sukadev
    Bhattiprolu.
    - Handle death by SIGTERM in 'perf record', fix from David Ahern.
    - Fix printing of perf_event_paranoid message, from David Ahern.
    - Handle realloc failures in 'perf kvm', from David Ahern.
    - Fix divide by 0 in variance, from David Ahern.
    - Save parent pid in thread struct, from David Ahern.
    - Handle JITed code in shared memory, from Andi Kleen.
    - Fixes for 'perf diff', from Jiri Olsa.
    - Remove some unused struct members, from Jiri Olsa.
    - Add missing liblk.a dependency for python/perf.so, fix from Jiri
    Olsa.
    - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
    - No need to do locking when adding hists in perf report, only 'top'
    needs that, from Namhyung Kim.
    - Fix alignment of symbol column in the hists browser (top,
    report) when -v is given, from Namhyung Kim.
    - Fix 'perf top' -E option behavior, from Namhyung Kim.
    - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
    - Fix compile errors in bp_signal 'perf test', from Sukadev
    Bhattiprolu.

    ... and more things"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
    perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
    perf/x86: Fix shared register mutual exclusion enforcement
    perf/x86/intel: Support full width counting
    x86: Add NMI duration tracepoints
    perf: Drop sample rate when sampling is too slow
    x86: Warn when NMI handlers take large amounts of time
    hw_breakpoint: Introduce "struct bp_cpuinfo"
    hw_breakpoint: Simplify *register_wide_hw_breakpoint()
    hw_breakpoint: Introduce cpumask_of_bp()
    hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
    hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
    perf/x86/intel: Add mem-loads/stores support for Haswell
    perf/x86/intel: Support Haswell/v4 LBR format
    perf/x86/intel: Move NMI clearing to end of PMI handler
    perf/x86/intel: Add Haswell PEBS support
    perf/x86/intel: Add simple Haswell PMU support
    perf/x86/intel: Add Haswell PEBS record support
    perf/x86/intel: Fix sparse warning
    perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
    perf/x86/amd: Add IOMMU Performance Counter resource management
    ...

    Linus Torvalds
     

26 Jun, 2013

2 commits


25 Jun, 2013

2 commits

  • Calling clk_set_min_rate() is no better than just calling
    clk_set_rate() because MSM clock code already takes care of
    calling the min_rate ops if the clock really needs
    clk_set_min_rate() called on it.

    Cc: Joerg Roedel
    Signed-off-by: Stephen Boyd
    Acked-by: Joerg Roedel
    Signed-off-by: David Brown

    Stephen Boyd
     
  • Add calls to clk_prepare and unprepare so that MSM can migrate to
    the common clock framework. We never unprepare the clocks until
    driver remove because the clocks are enabled and disabled in irq
    context. Finer grained power management is possible in the future
    via runtime power management techniques.

    Cc: Joerg Roedel
    Signed-off-by: Stephen Boyd
    Acked-by: Joerg Roedel
    Signed-off-by: David Brown

    Stephen Boyd
     

24 Jun, 2013

1 commit

  • printk supports using %pa for phys_addr_t and
    %zx for size_t so use those instead of %lx and
    casts to unsigned long.

    Other miscellaneous changes around this:

    Always use 0x%zx for size instead of one use of decimal.
    Coalesce format and align arguments.

    Signed-off-by: Joe Perches
    Signed-off-by: Joerg Roedel

    Joe Perches
     

23 Jun, 2013

2 commits

  • If we use a large mapping, the expectation is that only unmaps from
    the first pte in the superpage are supported. Unmaps from offsets
    into the superpage should fail (i.e. return a zero-sized unmap). In the
    current code, unmapping from an offset clears the size of the full
    mapping starting from an offset. For instance, if we map a 16k
    physically contiguous range at IOVA 0x0 with a large page, then
    attempt to unmap 4k at offset 12k, 4 ptes are cleared (12k - 28k) and
    the unmap returns 16k unmapped. This potentially incorrectly clears
    valid mappings and confuses drivers like VFIO that use the unmap size
    to release pinned pages.

    Fix by refusing to unmap from offsets into the page.

    Signed-off-by: Alex Williamson
    Cc: stable@vger.kernel.org
    Signed-off-by: Joerg Roedel

    Alex Williamson
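
The rule from the commit above, using its own 16k example, might be modelled as follows. `struct toy_mapping` and `toy_unmap()` are hypothetical; the sketch also assumes that an unmap starting at the first pte removes the whole large mapping and reports its full size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapping made with a single large page. */
struct toy_mapping {
    uint64_t iova;
    size_t size;
    int mapped;
};

/* Returns the number of bytes unmapped; 0 means the request was refused. */
static size_t toy_unmap(struct toy_mapping *m, uint64_t iova, size_t size)
{
    assert(m != NULL);

    if (!m->mapped || iova < m->iova || iova - m->iova >= m->size)
        return 0; /* nothing is mapped at this address */

    if (iova != m->iova)
        return 0; /* offset into the large page: refuse */

    (void)size;   /* assumption: the whole large page is torn down */
    m->mapped = 0;
    return m->size;
}
```

A caller that releases pinned pages based on the returned size sees 0 for the offset request and therefore releases nothing it still needs.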
     
  • Signed-off-by: Alex Williamson
    Signed-off-by: Joerg Roedel

    Alex Williamson
     

21 Jun, 2013

1 commit


20 Jun, 2013

8 commits

  • iommu_map splits requests into pages that the iommu driver reports
    that it can handle. The iommu_unmap path does not do the same. This
    can cause problems not only from callers that might expect the same
    behavior as the map path, but even from the failure path of iommu_map,
    should it fail at a point where it has mapped and needs to unwind a
    set of pages that the iommu driver cannot handle directly. amd_iommu,
    for example, will BUG_ON if asked to unmap a non power of 2 size.

    Fix this by extracting and generalizing the sizing code from the
    iommu_map path and use it for both map and unmap.

    Signed-off-by: Alex Williamson
    Signed-off-by: Joerg Roedel

    Alex Williamson
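
As an illustration of the shared sizing step, the sketch below picks the largest supported page that fits both the remaining size and the address alignment, then uses it to split a range. It is a userspace approximation with hypothetical names; it uses a single address where a real implementation would likely combine the IOVA and the physical address, and it assumes the caller has already aligned the range to the smallest supported page.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the highest set bit; 'x' must be non-zero. */
static unsigned int toy_fls(unsigned long x)
{
    unsigned int i = 0;

    assert(x != 0);
    while (x >>= 1)
        i++;
    return i;
}

/* Index of the lowest set bit; 'x' must be non-zero. */
static unsigned int toy_ffs(unsigned long x)
{
    unsigned int i = 0;

    assert(x != 0);
    while (!(x & 1)) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Largest supported page size usable at 'addr' for 'size' bytes. */
static size_t toy_pgsize(unsigned long pgsize_bitmap, unsigned long addr,
                         size_t size)
{
    unsigned int idx = toy_fls(size); /* biggest size that still fits */
    unsigned long candidates;

    if (addr) {
        unsigned int align_idx = toy_ffs(addr); /* alignment limit */

        if (align_idx < idx)
            idx = align_idx;
    }

    /* All sizes up to and including 2^idx, filtered by hardware support. */
    candidates = (((1UL << idx) << 1) - 1) & pgsize_bitmap;
    assert(candidates != 0); /* range must be aligned to the smallest page */

    return (size_t)1 << toy_fls(candidates);
}

/* Split [addr, addr + size) into supported pages; returns the chunk count. */
static unsigned int toy_split(unsigned long pgsize_bitmap, unsigned long addr,
                              size_t size, size_t *chunks,
                              unsigned int max_chunks)
{
    unsigned int n = 0;

    while (size) {
        size_t pg = toy_pgsize(pgsize_bitmap, addr, size);

        assert(n < max_chunks);
        chunks[n++] = pg;
        addr += pg;
        size -= pg;
    }
    return n;
}
```

Running the same loop for map and unmap means the driver is only ever handed sizes it advertised, which avoids the non-power-of-2 case mentioned above.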
     
  • If a device is multifunction and does not have ACS enabled then we
    assume that the entire package lacks ACS and use function 0 as the
    base of the group. The PCIe spec however states that components are
    permitted to implement ACS on some, none, or all of their applicable
    functions. It's therefore conceivable that function 0 may be fully
    independent and support ACS while other functions do not. Instead
    use the lowest function of the slot that does not have ACS enabled
    as the base of the group. This may be the current device, which is
    intentional. So long as we use a consistent algorithm, all the
    non-ACS functions will be grouped together and ACS functions will
    get separate groups.

    Signed-off-by: Alex Williamson
    Signed-off-by: Joerg Roedel

    Alex Williamson
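
The grouping rule can be written down as a small selection function. This is a sketch with hypothetical types; it models one slot with up to eight functions and only the "ACS enabled or not" property.

```c
#include <assert.h>
#include <stdbool.h>

#define FUNCS_PER_SLOT 8

/* Hypothetical per-function state within one multifunction slot. */
struct toy_func {
    bool present;
    bool acs_enabled;
};

/* Returns the function number that serves as the group base for 'fn'. */
static int toy_group_base(const struct toy_func slot[FUNCS_PER_SLOT], int fn)
{
    int i;

    assert(fn >= 0 && fn < FUNCS_PER_SLOT && slot[fn].present);

    if (slot[fn].acs_enabled)
        return fn; /* isolated function: it gets its own group */

    for (i = 0; i < FUNCS_PER_SLOT; i++) {
        if (slot[i].present && !slot[i].acs_enabled)
            return i; /* lowest non-ACS function; may be 'fn' itself */
    }
    return fn; /* not reached: 'fn' itself satisfies the loop condition */
}
```

Every non-ACS function resolves to the same base, so they all land in one group regardless of which function is probed first.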
     
  • This patch fixes the checkpatch warnings in the omap iommu
    code; most of them are related to broken strings.

    Signed-off-by: Suman Anna
    Signed-off-by: Joerg Roedel

    Suman Anna
     
  • Fixed the following printk format warnings for dma_addr_t
    for OMAP IOMMU.

    drivers/iommu/omap-iommu.c: In function 'omap_iommu_iova_to_phys':
    drivers/iommu/omap-iommu.c:1238:4: warning: format '%lx' expects type 'long unsigned int', but argument 4 has type 'dma_addr_t'
    drivers/iommu/omap-iommu.c:1245:4: warning: format '%lx' expects type 'long unsigned int', but argument 4 has type 'dma_addr_t'

    Signed-off-by: Suman Anna
    Signed-off-by: Joerg Roedel

    Suman Anna
     
  • In the Intel VT-d spec, chapter 8.1 (DMA Remapping Reporting Structure),
    the end of the table says:

    Remapping Structures[]
    -
    A list of structures. The list will contain one or
    more DMA Remapping Hardware Unit Definition
    (DRHD) structures, and zero or more Reserved
    Memory Region Reporting (RMRR) and Root Port
    ATS Capability Reporting (ATSR) structures.
    These structures are described below.

    So, there should be at least one DRHD structure in the DMA Remapping
    reporting table. If no DRHD is found, a warning is necessary.

    Signed-off-by: Li, Zhen-Hua
    Signed-off-by: Joerg Roedel

    Li, Zhen-Hua
     
  • This triggers on a MacBook Pro.
    See https://bugzilla.redhat.com/show_bug.cgi?id=948262 for
    the problem report.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Joerg Roedel

    Andy Lutomirski
     
  • This enables VFIO on the pSeries platform, enabling user space
    programs to access PCI devices directly.

    Signed-off-by: Alexey Kardashevskiy
    Cc: David Gibson
    Signed-off-by: Paul Mackerras
    Acked-by: Alex Williamson
    Signed-off-by: Benjamin Herrenschmidt

    Alexey Kardashevskiy
     
  • This initializes IOMMU groups based on the IOMMU configuration
    discovered during the PCI scan on POWERNV (POWER non virtualized)
    platform. The IOMMU groups are to be used later by the VFIO driver,
    which is used for PCI pass through.

    It also implements an API for mapping/unmapping pages for
    guest PCI drivers and providing DMA window properties.
    This API is going to be used later by QEMU-VFIO to handle
    h_put_tce hypercalls from the KVM guest.

    iommu_put_tce_user_mode() maps only a single page, as an API for
    adding many mappings at once is going to be added later.

    Although this driver has been tested only on the POWERNV
    platform, it should work on any platform which supports
    TCE tables. As the h_put_tce hypercall is received by the host
    kernel and processed by QEMU (which involves calling
    the host kernel again), performance is not the best:
    circa 220MB/s on a 10Gb ethernet network.

    To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
    option and configure VFIO as required.

    Cc: David Gibson
    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt

    Alexey Kardashevskiy
     

19 Jun, 2013

1 commit

  • Add functionality to check the availability of the AMD IOMMU Performance
    Counters and export this functionality to other core drivers, such as in this
    case, a perf AMD IOMMU PMU. This feature is not bound to any specific AMD
    family/model other than the presence of the IOMMU with P-C enabled.

    The AMD IOMMU P-C supports static counting only at this time.

    Signed-off-by: Steven Kinney
    Signed-off-by: Suravee Suthikulpanit
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1370466709-3212-2-git-send-email-suravee.suthikulpanit@amd.com
    Signed-off-by: Ingo Molnar

    Steven L Kinney
     

04 Jun, 2013

1 commit

  • The current multiple-MSI implementation does not take into account the
    actual number of requested MSIs and always rounds that number up to a
    power-of-two value. Yet the number of MSIs a PCI device could send (and
    therefore the number of messages a device driver could request) may be
    smaller. As a result, resources allocated for the extra MSIs are wasted.

    This update takes advantage of the 'msi_desc::nvec_used' field introduced
    with the generic MSI code to track the number of requested and used MSIs.
    As a result, resources associated with interrupts are conserved. The most
    noticeable of those resources are x86 interrupt vectors.

    The initial version of this fix also conserved IRTEs, but Jan noticed that
    a malfunctioning PCI device might send a message number it did not claim
    and thus refer to an IRTE it does not own. To avoid this security hole,
    as many IRTEs are reserved as the device could possibly send.

    [bhelgaas: changelog, rename to "nvec_used"]
    Signed-off-by: Alexander Gordeev
    Signed-off-by: Bjorn Helgaas

    Alexander Gordeev
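
The split between conserved vectors and fully reserved IRTEs can be sketched as a small planning helper. The names are hypothetical, and the sketch assumes that "as many IRTEs as the device could possibly send" means the power-of-two message count the device is enabled for.

```c
#include <assert.h>

/* Round up to the next power of two (returns n if it already is one). */
static unsigned int toy_roundup_pow_of_two(unsigned int n)
{
    unsigned int p = 1;

    assert(n >= 1 && n <= 32); /* multiple-MSI allows at most 32 messages */
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical allocation plan for one multiple-MSI request. */
struct toy_msi_alloc {
    unsigned int vectors; /* interrupt vectors actually consumed */
    unsigned int irtes;   /* remapping entries reserved for the device */
};

static struct toy_msi_alloc toy_msi_plan(unsigned int nvec_requested)
{
    struct toy_msi_alloc a;

    /* Vectors follow the requested count (the 'nvec_used' idea). */
    a.vectors = nvec_requested;
    /* IRTEs cover every message number the device could send. */
    a.irtes = toy_roundup_pow_of_two(nvec_requested);
    return a;
}
```

For a request of 3 MSIs this plans 3 vectors but 4 IRTEs, so a misbehaving device cannot reference an entry it does not own.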