24 Sep, 2020

3 commits

  • There can be clients that use the same swgroup in DT, for example i2c0
    and i2c1. The current driver adds them to separate IOMMU groups, even
    though it implements the device_group() callback, which is meant to
    group devices that use different swgroups, such as DC and DCB.

    All clients sharing the same swgroup should also be added to the same
    IOMMU group so that they share an asid. Otherwise, the asid register
    may get overwritten every time a new device is attached. (A minimal
    sketch of the idea follows this entry.)

    Signed-off-by: Nicolin Chen
    Acked-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200911071643.17212-4-nicoleotsuka@gmail.com
    Signed-off-by: Joerg Roedel

    Nicolin Chen
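
    A minimal sketch of that idea, with assumed names (the structure, list
    and helper below are illustrative, not the driver's actual code): a
    client whose swgroup already has a group joins that group instead of
    being placed in a new one.

      #include <linux/iommu.h>
      #include <linux/list.h>

      /* Illustrative per-swgroup bookkeeping kept in a per-SMMU list. */
      struct swgroup_group_sketch {
              struct list_head list;
              unsigned int swgroup;
              struct iommu_group *group;
      };

      static struct iommu_group *
      sketch_group_for_swgroup(struct list_head *groups, unsigned int swgroup)
      {
              struct swgroup_group_sketch *entry;

              list_for_each_entry(entry, groups, list)
                      if (entry->swgroup == swgroup)
                              return iommu_group_ref_get(entry->group);

              /* No match: the caller allocates a fresh group and lists it. */
              return NULL;
      }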
     
  • An IOVA might not always be 4 KiB aligned, so the tegra_smmu_iova_to_phys()
    function needs to add the lower 12-bit offset from the input IOVA
    (a minimal sketch follows this entry).

    Signed-off-by: Nicolin Chen
    Acked-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200911071643.17212-3-nicoleotsuka@gmail.com
    Signed-off-by: Joerg Roedel

    Nicolin Chen
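
    A minimal sketch of that adjustment, assuming the PFN sits in the low
    bits of the PTE (the field width and all names are assumptions for
    illustration):

      #include <linux/types.h>

      #define SKETCH_PTE_SHIFT    12
      #define SKETCH_OFFSET_MASK  ((1UL << SKETCH_PTE_SHIFT) - 1)

      /* Combine the page frame number from the PTE with the page offset. */
      static phys_addr_t sketch_iova_to_phys(u32 pte, unsigned long iova)
      {
              phys_addr_t pfn = pte & 0xfffff;  /* assumed 20-bit PFN field */

              return (pfn << SKETCH_PTE_SHIFT) + (iova & SKETCH_OFFSET_MASK);
      }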
     
  • PAGE_SHIFT and PAGE_MASK are defined according to the page size used
    for CPU virtual addresses, which means PAGE_SHIFT could be a number
    other than 12, whereas tegra-smmu maintains fixed 4 KiB IOVA pages and
    a fixed [21:12] bit range for PTE entries.

    So this patch replaces all PAGE_SHIFT/PAGE_MASK references with macros
    defined from SMMU_PTE_SHIFT (see the sketch after this entry).

    Signed-off-by: Nicolin Chen
    Acked-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200911071643.17212-2-nicoleotsuka@gmail.com
    Signed-off-by: Joerg Roedel

    Nicolin Chen
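
    The replacement macros might look roughly like this sketch (only
    SMMU_PTE_SHIFT is a name taken from the entry above; the rest are
    assumed), decoupling the SMMU's fixed 4 KiB IOVA page from the CPU's
    PAGE_SIZE:

      /* SMMU IOVA pages are always 4 KiB, regardless of PAGE_SHIFT/PAGE_SIZE. */
      #define SMMU_PTE_SHIFT          12
      #define SMMU_SIZE_PT            (1UL << SMMU_PTE_SHIFT)   /* 4 KiB */
      #define SMMU_OFFSET_MASK        (SMMU_SIZE_PT - 1)        /* bits [11:0] */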
     

18 Sep, 2020

1 commit

  • The "num_tlb_lines" might not be a power-of-2 value, being 48 on
    Tegra210 for example. So the current way of calculating tlb_mask
    using the num_tlb_lines is not correct: tlb_mask=0x5f in case of
    num_tlb_lines=48, which will trim a setting of 0x30 (48) to 0x10.

    Signed-off-by: Nicolin Chen
    Acked-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200917113155.13438-2-nicoleotsuka@gmail.com
    Signed-off-by: Joerg Roedel

    Nicolin Chen
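
    A worked example of the masking problem, using the values from the
    entry above (the mask derivation shown is an assumption about the old
    code, not a quote of it):

      unsigned int num_tlb_lines = 48;                   /* 0x30 on Tegra210 */
      unsigned int tlb_mask = (num_tlb_lines << 1) - 1;  /* assumed old formula: 0x5f */
      unsigned int active = num_tlb_lines & tlb_mask;    /* 0x30 & 0x5f == 0x10, 16 lines */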
     

04 Sep, 2020

4 commits

  • The mapping operations of the Tegra SMMU driver are subject to a race
    condition because an SMMU address space isn't allocated and freed
    atomically, while it should be. This patch makes the mapping operations
    atomic; it fixes a problem where a Host1x address space was accidentally
    released while running multiple graphics tests in parallel on Tegra30,
    i.e. with multiple threads racing with each other in Host1x's submission
    and completion code paths, performing IOVA mappings and unmappings in
    parallel. (A minimal sketch of the approach follows this entry.)

    Signed-off-by: Dmitry Osipenko
    Tested-by: Thierry Reding
    Acked-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200901203730.27865-1-digetx@gmail.com
    Signed-off-by: Joerg Roedel

    Dmitry Osipenko
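
    A minimal sketch of the approach under an assumed lock name (not the
    driver's actual code): the map path takes a lock so that page-table
    allocation and PTE installation happen atomically with respect to
    concurrent unmaps.

      #include <linux/spinlock.h>

      static DEFINE_SPINLOCK(sketch_as_lock);   /* assumed per-address-space lock */

      static int sketch_map(unsigned long iova, unsigned long paddr)
      {
              unsigned long flags;

              spin_lock_irqsave(&sketch_as_lock, flags);
              /* look up or allocate the page table, then install the PTE */
              spin_unlock_irqrestore(&sketch_as_lock, flags);

              return 0;
      }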
     
  • In order to share groups between multiple devices, we keep track of them
    in a per-SMMU list. When an IOMMU group is released, a dangling pointer
    to it stays around in that list. Fix this by implementing an IOMMU data
    release callback for groups, in which the dangling pointer can be
    removed from the list. (A sketch of the mechanism follows this entry.)

    Signed-off-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200806155404.3936074-4-thierry.reding@gmail.com
    Signed-off-by: Joerg Roedel

    Thierry Reding
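
    A minimal sketch of the mechanism using the iommu_group_set_iommudata()
    release hook (the structure and the locking around the list are
    simplified assumptions):

      #include <linux/iommu.h>
      #include <linux/list.h>

      struct smmu_group_sketch {
              struct list_head list;          /* linked into a per-SMMU list */
              struct iommu_group *group;
      };

      /* Called by the IOMMU core when the group's last reference is dropped. */
      static void sketch_group_release(void *data)
      {
              struct smmu_group_sketch *entry = data;

              list_del(&entry->list);         /* no dangling pointer stays behind */
      }

      /* at group creation:
       *   iommu_group_set_iommudata(group, entry, sketch_group_release);
       */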
     
  • For groups that are shared between multiple devices, care must be taken
    to acquire a reference for each device, otherwise the IOMMU core ends up
    dropping the last reference too early, which will cause the group to be
    released while consumers may still be thinking that they're holding a
    reference to it.

    Signed-off-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200806155404.3936074-3-thierry.reding@gmail.com
    Signed-off-by: Joerg Roedel

    Thierry Reding
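
    Roughly the rule from the entry above (illustrative, not the driver's
    actual code): when a device joins an already-existing shared group,
    take an extra reference so the core's later put for that device cannot
    drop the last one.

      #include <linux/iommu.h>

      static struct iommu_group *sketch_join_shared_group(struct iommu_group *group)
      {
              /* one reference per device joining the group */
              return iommu_group_ref_get(group);
      }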
     
  • Set the name of static IOMMU groups to help with debugging.

    Signed-off-by: Thierry Reding
    Link: https://lore.kernel.org/r/20200806155404.3936074-2-thierry.reding@gmail.com
    Signed-off-by: Joerg Roedel

    Thierry Reding
     

30 Jun, 2020

1 commit


05 May, 2020

1 commit


13 Nov, 2019

1 commit


18 Oct, 2019

3 commits

  • Page tables that reside in physical memory beyond the 4 GiB boundary are
    currently not working properly. The reason is that when the physical
    address for page directory entries is read, it gets truncated at 32 bits
    and can cause crashes when passing that address to the DMA API.

    Fix this by first casting the PDE value to a dma_addr_t and then using
    the page frame number mask for the SMMU instance to mask out the invalid
    bits, which are typically used for mapping attributes, etc.

    Signed-off-by: Thierry Reding
    Signed-off-by: Joerg Roedel

    Thierry Reding
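
    A sketch of the fix described above, assuming the PDE stores a page
    frame number in its low bits (the field layout and mask handling are
    assumptions for illustration):

      #include <linux/types.h>

      /* Widen to dma_addr_t before shifting so bits above bit 31 survive. */
      static dma_addr_t sketch_pde_to_dma(u32 pde, u32 pfn_mask)
      {
              return (dma_addr_t)(pde & pfn_mask) << 12;
      }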
     
  • Enable clients' translation only after setting up the swgroups.

    Signed-off-by: Navneet Kumar
    Signed-off-by: Thierry Reding
    Signed-off-by: Joerg Roedel

    Navneet Kumar
     
  • Use PTB_ASID instead of SMMU_CONFIG to flush the SMMU.
    PTB_ASID can be accessed from non-secure mode; SMMU_CONFIG cannot be.
    Using SMMU_CONFIG could pose a problem when the kernel doesn't have
    secure-mode access enabled from boot.

    Signed-off-by: Navneet Kumar
    Reviewed-by: Dmitry Osipenko
    Tested-by: Dmitry Osipenko
    Signed-off-by: Thierry Reding
    Signed-off-by: Joerg Roedel

    Navneet Kumar
     

15 Oct, 2019

1 commit

  • Add a gfp_t parameter to the iommu_ops::map function and remove the
    needless locking in the AMD IOMMU driver.

    The iommu_ops::map function (or the iommu_map function which calls it)
    was always supposed to be sleepable (according to Joerg's comment in
    this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
    should probably have had a might_sleep() since it was written. However,
    the dma-iommu API can currently call iommu_map in an atomic context,
    which it shouldn't do. This doesn't cause any problems, because any
    IOMMU driver which uses the dma-iommu API uses GFP_ATOMIC in its
    iommu_ops::map function, but doing so wastes the memory allocator's
    atomic pools.

    Signed-off-by: Tom Murphy
    Reviewed-by: Robin Murphy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Joerg Roedel

    Tom Murphy
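
    The extended callback takes roughly this shape (signature as of this
    change; later kernels have reworked the map/unmap ops further):

      /* iommu_ops::map with the new gfp_t parameter */
      int (*map)(struct iommu_domain *domain, unsigned long iova,
                 phys_addr_t paddr, size_t size, int prot, gfp_t gfp);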
     

30 Jul, 2019

1 commit

  • To allow IOMMU drivers to batch up TLB flushing operations and postpone
    them until ->iotlb_sync() is called, extend the prototypes for the
    ->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
    the current iommu_iotlb_gather structure.

    All affected IOMMU drivers are updated, but there should be no
    functional change since the extra parameter is ignored for now.

    Signed-off-by: Will Deacon

    Will Deacon
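
    The extended prototypes take roughly this shape (as of this change):

      size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
                      size_t size, struct iommu_iotlb_gather *iotlb_gather);
      void (*iotlb_sync)(struct iommu_domain *domain,
                         struct iommu_iotlb_gather *iotlb_gather);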
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license version 2 as
    published by the free software foundation #

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 4122 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

11 Apr, 2019

3 commits


16 Jan, 2019

1 commit


20 Dec, 2018

1 commit


17 Dec, 2018

1 commit


23 Nov, 2018

1 commit


08 Aug, 2018

1 commit

  • All IOMMU drivers use the default_iommu_map_sg implementation, and there
    is no good reason to ever override it. Just expose it as iommu_map_sg
    directly and remove the indirection, especially in our post-Spectre
    world where indirect calls are horribly expensive.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Joerg Roedel

    Christoph Hellwig
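
    Callers now use the exported helper directly; its signature at the time
    of this change was roughly:

      size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
                          struct scatterlist *sg, unsigned int nents, int prot);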
     

21 Dec, 2017

1 commit


15 Dec, 2017

1 commit

  • Implement the ->device_group() and ->of_xlate() callbacks which are used
    in order to group devices. Each group can then share a single domain.

    This is implemented primarily in order to achieve the same semantics on
    Tegra210 and earlier as on Tegra186 where the Tegra SMMU was replaced by
    an ARM SMMU. Users of the IOMMU API can now use the same code to share
    domains between devices, whereas previously they used to attach each
    device individually.

    Acked-by: Alex Williamson
    Signed-off-by: Thierry Reding

    Thierry Reding
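
    The two callbacks have these shapes in struct iommu_ops (prototypes of
    that era); the driver fills them in so that devices end up sharing a
    group and, therefore, a domain:

      struct iommu_group *(*device_group)(struct device *dev);
      int (*of_xlate)(struct device *dev, struct of_phandle_args *args);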
     

30 Aug, 2017

1 commit


17 Aug, 2017

1 commit


10 Aug, 2017

1 commit

  • As the last step to making groups mandatory, clean up the remaining
    drivers by adding basic support. Whilst it may not perfectly reflect
    the isolation capabilities of the hardware (tegra_smmu_swgroup sounds
    suspiciously like something that might warrant representing at the
    iommu_group level), using generic_device_group() should at least
    maintain existing behaviour with respect to the API.

    Signed-off-by: Robin Murphy
    Tested-by: Mikko Perttunen
    Signed-off-by: Joerg Roedel

    Robin Murphy
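
    In ops terms this amounts to something like the following sketch:

      static const struct iommu_ops sketch_ops = {
              /* ... other callbacks ... */
              .device_group = generic_device_group,  /* one group per device */
      };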
     

29 Apr, 2017

1 commit


13 Aug, 2015

9 commits

  • The number of TLB lines was increased from 16 on Tegra30 to 32 on
    Tegra114 and later. Parameterize the value so that the initial default
    can be set accordingly.

    On Tegra30, initializing the value to 32 would effectively disable the
    TLB and hence cause massive latencies for memory accesses translated
    through the SMMU. This is especially noticeable for isochronous clients
    such as display, whose FIFOs would continuously underrun.

    Fixes: 891846516317 ("memory: Add NVIDIA Tegra memory controller support")
    Signed-off-by: Thierry Reding

    Thierry Reding
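
    A sketch of the parameterization, with assumed names: the per-SoC data
    carries the line count, and the driver programs the active-lines field
    from it at initialization.

      /* Illustrative per-SoC description; the real structure has more fields. */
      struct smmu_soc_sketch {
              unsigned int num_tlb_lines;
      };

      static const struct smmu_soc_sketch tegra30_sketch  = { .num_tlb_lines = 16 };
      static const struct smmu_soc_sketch tegra114_sketch = { .num_tlb_lines = 32 };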
     
  • This code is used both when creating a new page directory entry and when
    tearing it down, with only the PDE value changing between both cases.

    Factor the code out so that it can be reused.

    Signed-off-by: Russell King
    [treding@nvidia.com: make commit message more accurate]
    Signed-off-by: Thierry Reding

    Russell King
     
  • Extract the use count reference accounting into a separate function and
    separate it from allocating the PTE.

    Signed-off-by: Russell King
    [treding@nvidia.com: extract and write commit message]
    Signed-off-by: Thierry Reding

    Russell King
     
  • Rather than explicitly zeroing pages allocated via alloc_page(), add
    __GFP_ZERO to the gfp mask to ask the allocator for zeroed pages.

    Signed-off-by: Russell King
    Signed-off-by: Thierry Reding

    Russell King
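
    Roughly the shape of the change (a sketch; the exact flags the driver
    passes are not shown):

      #include <linux/gfp.h>

      static struct page *sketch_alloc_pt(void)
      {
              /* previously: alloc_page(GFP_KERNEL) followed by zeroing by hand */
              return alloc_page(GFP_KERNEL | __GFP_ZERO);
      }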
     
  • Remove the unnecessary manipulation of the PageReserved flags in the
    Tegra SMMU driver. None of this is required as the page(s) remain
    private to the SMMU driver.

    Signed-off-by: Russell King
    Signed-off-by: Thierry Reding

    Russell King
     
  • Use the DMA API instead of calling architecture internal functions in
    the Tegra SMMU driver.

    Signed-off-by: Russell King
    Signed-off-by: Thierry Reding

    Russell King
     
  • Pass smmu_flush_ptc() the device address rather than struct page
    pointer.

    Signed-off-by: Russell King
    Signed-off-by: Thierry Reding

    Russell King
     
  • smmu_flush_ptc() is used in two modes: one is to flush an individual
    entry, the other is to flush all entries. We know at the call site
    which we require. Split the function into smmu_flush_ptc_all() and
    smmu_flush_ptc().

    Signed-off-by: Russell King
    Signed-off-by: Thierry Reding

    Russell King
     
  • Drivers should not be using __cpuc_* functions nor outer_cache_flush()
    directly. This change partly cleans up tegra-smmu.c.

    The only difference in cache handling between the Tegra variants is on
    Denver, which omits the call to outer_cache_flush(). This is because
    Denver is an ARM64 CPU, and the ARM64 architecture does not provide
    this function. (This, in itself, is a good reason why these functions
    should not be used.)

    Signed-off-by: Russell King
    [treding@nvidia.com: fix build failure on 64-bit ARM]
    Signed-off-by: Thierry Reding

    Russell King