13 May, 2020

1 commit


23 Dec, 2019

1 commit

  • When running heavy memory pressure workloads, this 5+ year old system
    throws the endless warnings below because disk IO is too slow to recover
    from swapping. Since the volume from alloc_iova_fast() could be large,
    once it calls printk() it will trigger disk IO (writing to the log
    files) and pending softirqs, which could cause an infinite loop so that
    the ongoing memory reclaim makes no progress for days. This is the
    counterpart for Intel; the AMD part has already been merged, see commit
    3d708895325b ("iommu/amd: Silence warnings under memory pressure").
    Since the allocation failure will be reported in intel_alloc_iova()
    anyway, just call dev_err_once() there because even "ratelimited" is
    too much, and silence the one in alloc_iova_mem() to avoid the
    expensive warn_alloc().

    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    slab_out_of_memory: 66 callbacks suppressed
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: iommu_iova, object size: 40, buffer size: 448, default order:
    0, min order: 0
    node 0: slabs: 1822, objs: 16398, free: 0
    node 1: slabs: 2051, objs: 18459, free: 31
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: iommu_iova, object size: 40, buffer size: 448, default order:
    0, min order: 0
    node 0: slabs: 1822, objs: 16398, free: 0
    node 1: slabs: 2051, objs: 18459, free: 31
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: iommu_iova, object size: 40, buffer size: 448, default order:
    0, min order: 0
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    node 0: slabs: 697, objs: 4182, free: 0
    node 0: slabs: 697, objs: 4182, free: 0
    node 0: slabs: 697, objs: 4182, free: 0
    node 0: slabs: 697, objs: 4182, free: 0
    node 1: slabs: 381, objs: 2286, free: 27
    node 1: slabs: 381, objs: 2286, free: 27
    node 1: slabs: 381, objs: 2286, free: 27
    node 1: slabs: 381, objs: 2286, free: 27
    node 0: slabs: 1822, objs: 16398, free: 0
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    node 1: slabs: 2051, objs: 18459, free: 31
    node 0: slabs: 697, objs: 4182, free: 0
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    node 1: slabs: 381, objs: 2286, free: 27
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    node 0: slabs: 697, objs: 4182, free: 0
    node 1: slabs: 381, objs: 2286, free: 27
    hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
    warn_alloc: 96 callbacks suppressed
    kworker/11:1H: page allocation failure: order:0,
    mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
    CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B
    Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
    12/27/2015
    Workqueue: kblockd blk_mq_run_work_fn
    Call Trace:
    dump_stack+0xa0/0xea
    warn_alloc.cold.94+0x8a/0x12d
    __alloc_pages_slowpath+0x1750/0x1870
    __alloc_pages_nodemask+0x58a/0x710
    alloc_pages_current+0x9c/0x110
    alloc_slab_page+0xc9/0x760
    allocate_slab+0x48f/0x5d0
    new_slab+0x46/0x70
    ___slab_alloc+0x4ab/0x7b0
    __slab_alloc+0x43/0x70
    kmem_cache_alloc+0x2dd/0x450
    SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
    alloc_iova+0x33/0x210
    cache: skbuff_head_cache, object size: 208, buffer size: 640, default
    order: 0, min order: 0
    node 0: slabs: 697, objs: 4182, free: 0
    alloc_iova_fast+0x62/0x3d1
    node 1: slabs: 381, objs: 2286, free: 27
    intel_alloc_iova+0xce/0xe0
    intel_map_sg+0xed/0x410
    scsi_dma_map+0xd7/0x160
    scsi_queue_rq+0xbf7/0x1310
    blk_mq_dispatch_rq_list+0x4d9/0xbc0
    blk_mq_sched_dispatch_requests+0x24a/0x300
    __blk_mq_run_hw_queue+0x156/0x230
    blk_mq_run_work_fn+0x3b/0x40
    process_one_work+0x579/0xb90
    worker_thread+0x63/0x5b0
    kthread+0x1e6/0x210
    ret_from_fork+0x3a/0x50
    Mem-Info:
    active_anon:2422723 inactive_anon:361971 isolated_anon:34403
    active_file:2285 inactive_file:1838 isolated_file:0
    unevictable:0 dirty:1 writeback:5 unstable:0
    slab_reclaimable:13972 slab_unreclaimable:453879
    mapped:2380 shmem:154 pagetables:6948 bounce:0
    free:19133 free_pcp:7363 free_cma:0

    Signed-off-by: Qian Cai
    Signed-off-by: Joerg Roedel

    Qian Cai
     

17 Dec, 2019

1 commit

  • During an ethernet (Marvell octeontx2) set-ring-buffer test:
    ethtool -G eth1 rx tx
    the following kmemleak report will sometimes appear:

    unreferenced object 0xffff000b85421340 (size 64):
    comm "ethtool", pid 867, jiffies 4295323539 (age 550.500s)
    hex dump (first 64 bytes):
    80 13 42 85 0b 00 ff ff ff ff ff ff ff ff ff ff ..B.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmem_cache_alloc+0x1b0/0x350
    [] alloc_iova+0x3c/0x168
    [] alloc_iova_fast+0x7c/0x2d8
    [] iommu_dma_alloc_iova.isra.0+0x12c/0x138
    [] __iommu_dma_map+0x8c/0xf8
    [] iommu_dma_map_page+0x98/0xf8
    [] otx2_alloc_rbuf+0xf4/0x158
    [] otx2_rq_aura_pool_init+0x110/0x270
    [] otx2_open+0x15c/0x734
    [] otx2_dev_open+0x3c/0x68
    [] otx2_set_ringparam+0x1ac/0x1d4
    [] dev_ethtool+0xb84/0x2028
    [] dev_ioctl+0x248/0x3a0
    [] sock_ioctl+0x280/0x638
    [] do_vfs_ioctl+0x8b0/0xa80
    [] ksys_ioctl+0x84/0xb8

    The reason:
    alloc_iova_mem() does not zero-initialize the allocation, so by chance
    pfn_lo can already equal IOVA_ANCHOR. When __alloc_and_insert_iova_range()
    then returns -ENOMEM (iova32_full), the new_iova will not be freed in
    free_iova_mem().
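
    As a stand-alone model of the leak described above (a user-space sketch
    with illustrative names, not the kernel code), the pattern is a free
    helper that deliberately skips a reserved sentinel value combined with
    an allocation that is never initialized:

    #include <stdio.h>
    #include <stdlib.h>

    #define SENTINEL 0xFFFFFFFFUL          /* stands in for IOVA_ANCHOR */

    struct node { unsigned long key; };

    static void free_node(struct node *n)
    {
            if (n->key == SENTINEL)        /* reserved anchor: never freed */
                    return;
            free(n);
    }

    int main(void)
    {
            /* malloc() leaves ->key uninitialized; if the stale memory
             * happens to hold SENTINEL, free_node() becomes a no-op on
             * the error path and the object leaks. Zero-initializing the
             * allocation (or setting ->key before any error path) avoids
             * this. */
            struct node *n = malloc(sizeof(*n));

            if (!n)
                    return 1;
            n->key = SENTINEL;             /* simulate unlucky stale contents */
            free_node(n);                  /* leaks: nothing is freed here */
            printf("node at %p was never freed\n", (void *)n);
            return 0;
    }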

    Fixes: bb68b2fbfbd6 ("iommu/iova: Add rbtree anchor node")
    Signed-off-by: Xiaotao Yin
    Reviewed-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Xiaotao Yin
     

30 Aug, 2019

1 commit

  • In commit 14bd9a607f90 ("iommu/iova: Separate atomic variables
    to improve performance") Jinyu Qi identified that the atomic_cmpxchg()
    in queue_iova() was causing a performance loss and moved critical fields
    so that the false sharing would not impact them.

    However, avoiding the false sharing in the first place seems easy:
    the atomic_cmpxchg() needs to be attempted no more than 100 times per
    second, so checking with an atomic_read() first will keep the cache
    line mostly shared.
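
    A stand-alone sketch of that read-before-cmpxchg pattern (C11 atomics;
    the "timer armed" flag and helper names are illustrative, not the
    kernel code): a plain load filters out the common case, and the
    compare-and-swap, which pulls the cache line into exclusive state, is
    only attempted by the one caller that actually needs to arm the timer.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int timer_armed;         /* cleared when the timer fires */

    static void arm_flush_timer(void)
    {
            puts("timer armed");           /* mod_timer() in the real code */
    }

    static void maybe_arm_timer(void)
    {
            int expected = 0;

            /* Fast path: a plain load keeps the cache line shared. The
             * cmpxchg is only needed at most once per timer period (100
             * times per second for a 10ms timer), so almost every call
             * stops here without taking the line exclusive. */
            if (atomic_load_explicit(&timer_armed, memory_order_relaxed))
                    return;

            if (atomic_compare_exchange_strong(&timer_armed, &expected, 1))
                    arm_flush_timer();     /* we won the race: arm it once */
    }

    int main(void)
    {
            maybe_arm_timer();             /* arms the timer */
            maybe_arm_timer();             /* fast path: nothing to do */
            return 0;
    }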

    This false sharing came with commit 9a005a800ae8
    ("iommu/iova: Add flush timer").

    Signed-off-by: Eric Dumazet
    Fixes: 9a005a800ae8 ('iommu/iova: Add flush timer')
    Cc: Jinyu Qi
    Cc: Joerg Roedel
    Acked-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Eric Dumazet
     

22 Jul, 2019

2 commits

  • Since the cached32_node is allowed to be advanced above dma_32bit_pfn
    (to provide a shortcut into the limited range), we need to be careful
    when the node to be freed is the cached32_node itself.

    [ 48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
    [ 48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
    [ 48.477843]
    [ 48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G U 5.2.0+ #735
    [ 48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
    [ 48.478047] Workqueue: i915 __i915_gem_free_work [i915]
    [ 48.478075] Call Trace:
    [ 48.478111] dump_stack+0x5b/0x90
    [ 48.478137] print_address_description+0x67/0x237
    [ 48.478178] ? __cached_rbnode_delete_update+0x68/0x110
    [ 48.478212] __kasan_report.cold.3+0x1c/0x38
    [ 48.478240] ? __cached_rbnode_delete_update+0x68/0x110
    [ 48.478280] ? __cached_rbnode_delete_update+0x68/0x110
    [ 48.478308] __cached_rbnode_delete_update+0x68/0x110
    [ 48.478344] private_free_iova+0x2b/0x60
    [ 48.478378] iova_magazine_free_pfns+0x46/0xa0
    [ 48.478403] free_iova_fast+0x277/0x340
    [ 48.478443] fq_ring_free+0x15a/0x1a0
    [ 48.478473] queue_iova+0x19c/0x1f0
    [ 48.478597] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
    [ 48.478712] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
    [ 48.478826] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
    [ 48.478940] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
    [ 48.479053] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
    [ 48.479081] ? __sg_free_table+0x9e/0xf0
    [ 48.479116] ? kfree+0x7f/0x150
    [ 48.479234] i915_vma_unbind+0x1e2/0x240 [i915]
    [ 48.479352] i915_vma_destroy+0x3a/0x280 [i915]
    [ 48.479465] __i915_gem_free_objects+0xf0/0x2d0 [i915]
    [ 48.479579] __i915_gem_free_work+0x41/0xa0 [i915]
    [ 48.479607] process_one_work+0x495/0x710
    [ 48.479642] worker_thread+0x4c7/0x6f0
    [ 48.479687] ? process_one_work+0x710/0x710
    [ 48.479724] kthread+0x1b2/0x1d0
    [ 48.479774] ? kthread_create_worker_on_cpu+0xa0/0xa0
    [ 48.479820] ret_from_fork+0x1f/0x30
    [ 48.479864]
    [ 48.479907] Allocated by task 631:
    [ 48.479944] save_stack+0x19/0x80
    [ 48.479994] __kasan_kmalloc.constprop.6+0xc1/0xd0
    [ 48.480038] kmem_cache_alloc+0x91/0xf0
    [ 48.480082] alloc_iova+0x2b/0x1e0
    [ 48.480125] alloc_iova_fast+0x58/0x376
    [ 48.480166] intel_alloc_iova+0x90/0xc0
    [ 48.480214] intel_map_sg+0xde/0x1f0
    [ 48.480343] i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
    [ 48.480465] huge_get_pages+0x232/0x2b0 [i915]
    [ 48.480590] ____i915_gem_object_get_pages+0x40/0xb0 [i915]
    [ 48.480712] __i915_gem_object_get_pages+0x90/0xa0 [i915]
    [ 48.480834] i915_gem_object_prepare_write+0x2d6/0x330 [i915]
    [ 48.480955] create_test_object.isra.54+0x1a9/0x3e0 [i915]
    [ 48.481075] igt_shared_ctx_exec+0x365/0x3c0 [i915]
    [ 48.481210] __i915_subtests.cold.4+0x30/0x92 [i915]
    [ 48.481341] __run_selftests.cold.3+0xa9/0x119 [i915]
    [ 48.481466] i915_live_selftests+0x3c/0x70 [i915]
    [ 48.481583] i915_pci_probe+0xe7/0x220 [i915]
    [ 48.481620] pci_device_probe+0xe0/0x180
    [ 48.481665] really_probe+0x163/0x4e0
    [ 48.481710] device_driver_attach+0x85/0x90
    [ 48.481750] __driver_attach+0xa5/0x180
    [ 48.481796] bus_for_each_dev+0xda/0x130
    [ 48.481831] bus_add_driver+0x205/0x2e0
    [ 48.481882] driver_register+0xca/0x140
    [ 48.481927] do_one_initcall+0x6c/0x1af
    [ 48.481970] do_init_module+0x106/0x350
    [ 48.482010] load_module+0x3d2c/0x3ea0
    [ 48.482058] __do_sys_finit_module+0x110/0x180
    [ 48.482102] do_syscall_64+0x62/0x1f0
    [ 48.482147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 48.482190]
    [ 48.482224] Freed by task 37:
    [ 48.482273] save_stack+0x19/0x80
    [ 48.482318] __kasan_slab_free+0x12e/0x180
    [ 48.482363] kmem_cache_free+0x70/0x140
    [ 48.482406] __free_iova+0x1d/0x30
    [ 48.482445] fq_ring_free+0x15a/0x1a0
    [ 48.482490] queue_iova+0x19c/0x1f0
    [ 48.482624] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
    [ 48.482749] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
    [ 48.482873] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
    [ 48.482999] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
    [ 48.483123] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
    [ 48.483250] i915_vma_unbind+0x1e2/0x240 [i915]
    [ 48.483378] i915_vma_destroy+0x3a/0x280 [i915]
    [ 48.483500] __i915_gem_free_objects+0xf0/0x2d0 [i915]
    [ 48.483622] __i915_gem_free_work+0x41/0xa0 [i915]
    [ 48.483659] process_one_work+0x495/0x710
    [ 48.483704] worker_thread+0x4c7/0x6f0
    [ 48.483748] kthread+0x1b2/0x1d0
    [ 48.483787] ret_from_fork+0x1f/0x30
    [ 48.483831]
    [ 48.483868] The buggy address belongs to the object at ffff88870fc19000
    [ 48.483868] which belongs to the cache iommu_iova of size 40
    [ 48.483920] The buggy address is located 32 bytes inside of
    [ 48.483920] 40-byte region [ffff88870fc19000, ffff88870fc19028)
    [ 48.483964] The buggy address belongs to the page:
    [ 48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
    [ 48.484045] flags: 0x8000000000010200(slab|head)
    [ 48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
    [ 48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
    [ 48.484188] page dumped because: kasan: bad access detected
    [ 48.484230]
    [ 48.484265] Memory state around the buggy address:
    [ 48.484314] ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 48.484361] ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
    [ 48.484451] ^
    [ 48.484494] ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 48.484530] ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
    Fixes: e60aa7b53845 ("iommu/iova: Extend rbtree node caching")
    Signed-off-by: Chris Wilson
    Cc: Robin Murphy
    Cc: Joerg Roedel
    Cc: Joerg Roedel
    Cc: # v4.15+
    Reviewed-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Chris Wilson
     
  • The Intel VT-d driver was reworked to use the common deferred flushing
    implementation. Previously there was one global per-cpu flush queue;
    afterwards there is one per domain.

    Before deferring a flush, the queue should be allocated and initialized.

    Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
    queue. It's probably worth initializing it for static or unmanaged
    domains too, but that may be arguable - I'm leaving it to the iommu
    folks.

    Prevent queuing an iova flush if the domain doesn't have a queue.
    The defensive check seems worth keeping even if the queue were
    initialized for all kinds of domains, and it is easily backportable.
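
    A stand-alone model of that defensive check (illustrative structure and
    names, not the VT-d code): when the optional queue pointer was never
    initialized, fall back to freeing immediately instead of touching an
    uninitialized lock.

    #include <stdio.h>
    #include <stdlib.h>

    struct flush_queue { /* ring buffer, lock, ... */ int dummy; };

    struct domain {
            struct flush_queue *fq;        /* NULL unless explicitly set up */
    };

    static void queue_or_free(struct domain *dom, void *entry)
    {
            if (!dom->fq) {
                    free(entry);           /* no queue: free right away */
                    return;
            }
            /* ... otherwise defer: push the entry onto dom->fq ... */
    }

    int main(void)
    {
            struct domain si_domain = { .fq = NULL };   /* never initialized */

            queue_or_free(&si_domain, malloc(16));      /* safe immediate free */
            puts("freed immediately, no deferred queue used");
            return 0;
    }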

    On the 4.19.43 stable kernel this has a user-visible effect: previously
    there were crashes for devices in the si domain, e.g. on sata devices:

    BUG: spinlock bad magic on CPU#6, swapper/0/1
    lock: 0xffff88844f582008, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
    Call Trace:

    dump_stack+0x61/0x7e
    spin_bug+0x9d/0xa3
    do_raw_spin_lock+0x22/0x8e
    _raw_spin_lock_irqsave+0x32/0x3a
    queue_iova+0x45/0x115
    intel_unmap+0x107/0x113
    intel_unmap_sg+0x6b/0x76
    __ata_qc_complete+0x7f/0x103
    ata_qc_complete+0x9b/0x26a
    ata_qc_complete_multiple+0xd0/0xe3
    ahci_handle_port_interrupt+0x3ee/0x48a
    ahci_handle_port_intr+0x73/0xa9
    ahci_single_level_irq_intr+0x40/0x60
    __handle_irq_event_percpu+0x7f/0x19a
    handle_irq_event_percpu+0x32/0x72
    handle_irq_event+0x38/0x56
    handle_edge_irq+0x102/0x121
    handle_irq+0x147/0x15c
    do_IRQ+0x66/0xf2
    common_interrupt+0xf/0xf
    RIP: 0010:__do_softirq+0x8c/0x2df

    The same for usb devices that use ehci-pci:
    BUG: spinlock bad magic on CPU#0, swapper/0/1
    lock: 0xffff88844f402008, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
    Call Trace:

    dump_stack+0x61/0x7e
    spin_bug+0x9d/0xa3
    do_raw_spin_lock+0x22/0x8e
    _raw_spin_lock_irqsave+0x32/0x3a
    queue_iova+0x77/0x145
    intel_unmap+0x107/0x113
    intel_unmap_page+0xe/0x10
    usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
    usb_hcd_unmap_urb_for_dma+0x17/0x100
    unmap_urb_for_dma+0x22/0x24
    __usb_hcd_giveback_urb+0x51/0xc3
    usb_giveback_urb_bh+0x97/0xde
    tasklet_action_common.isra.4+0x5f/0xa1
    tasklet_action+0x2d/0x30
    __do_softirq+0x138/0x2df
    irq_exit+0x7d/0x8b
    smp_apic_timer_interrupt+0x10f/0x151
    apic_timer_interrupt+0xf/0x20

    RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

    Cc: David Woodhouse
    Cc: Joerg Roedel
    Cc: Lu Baolu
    Cc: iommu@lists.linux-foundation.org
    Cc: # 4.14+
    Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
    Signed-off-by: Dmitry Safonov
    Reviewed-by: Lu Baolu
    Signed-off-by: Joerg Roedel

    Dmitry Safonov
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not write to the free
    software foundation inc 59 temple place suite 330 boston ma 02111
    1307 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 33 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190530000435.254582722@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

22 Mar, 2019

1 commit

  • If a 32 bit allocation request is too big to possibly succeed, it
    exits early with a failure and should then never update
    max32_alloc_size. This patch fixes the current code so that the size is
    only updated if the slow path failed while walking the tree. Without
    the fix, the allocation may enter the slow path again even though a
    request of the same or a smaller size had already failed before.
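
    A toy model of the max32_alloc_size logic (a user-space sketch with
    made-up names and a bump allocator standing in for the rbtree walk, not
    the kernel code): requests at or above the recorded failure size exit
    early without updating the watermark, the watermark is only lowered
    when the real search has failed, and it is reset when space is freed.

    #include <stdbool.h>
    #include <stdio.h>

    #define SPACE_END 1024UL               /* pretend 32-bit space size */

    static unsigned long max32_alloc_size = SPACE_END + 1;  /* no failure yet */
    static unsigned long used;             /* toy bump allocator state */

    static bool alloc32(unsigned long size)
    {
            if (size >= max32_alloc_size)  /* early exit: known to fail */
                    return false;          /* note: no watermark update here */

            if (used + size > SPACE_END) { /* the (slow) search failed */
                    max32_alloc_size = size;   /* remember the failing size */
                    return false;
            }
            used += size;
            return true;
    }

    static void free32(unsigned long size)
    {
            used -= size;
            max32_alloc_size = SPACE_END + 1;  /* replenished: allow retries */
    }

    int main(void)
    {
            while (alloc32(256))           /* fill the space */
                    ;
            printf("early exit now hits for size >= %lu\n", max32_alloc_size);
            free32(256);                   /* freeing re-enables the slow path */
            printf("after free: alloc32(256) -> %d\n", alloc32(256));
            return 0;
    }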

    Cc: # 4.20+
    Fixes: bee60e94a1e2 ("iommu/iova: Optimise attempts to allocate iova from 32bit address range")
    Reviewed-by: Robin Murphy
    Signed-off-by: Robert Richter
    Signed-off-by: Joerg Roedel

    Robert Richter
     

25 Sep, 2018

1 commit

  • As an optimisation for PCI devices, an attempt is always made first to
    allocate the iova from the SAC address range. This leads to unnecessary
    attempts when there are no free ranges available. Fix this by tracking
    the most recently failed iova allocation size and allowing further
    attempts only if the requested size is smaller than the failed size.
    The tracked size is updated whenever space is replenished.

    Reviewed-by: Robin Murphy
    Signed-off-by: Ganapatrao Kulkarni
    Signed-off-by: Joerg Roedel

    Ganapatrao Kulkarni
     

22 Nov, 2017

1 commit

  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

    void my_callback(unsigned long data)
    {
            struct something *ptr = (struct something *)data;
            ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

    void my_callback(struct something *ptr)
    {
            ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

    void my_callback(struct timer_list *t)
    {
            struct something *ptr = from_timer(ptr, t, my_timer);
            ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

    void my_callback(unsigned long data)
    {
            struct something *ptr = (struct something *)data;
            ...
    }
    ...
    ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
            struct something *ptr = from_timer(ptr, t, my_timer);
            ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
            ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
            ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }

    Signed-off-by: Kees Cook

    Kees Cook
     

07 Nov, 2017

1 commit

  • get_cpu_ptr() disables preemption and returns the ->fq object of the
    current CPU. raw_cpu_ptr() does the same except that it does not
    disable preemption, which means the scheduler can move the task to
    another CPU after it has obtained the per-CPU object.
    In this case that is not a problem, because the data structure itself
    is protected with a spin_lock. This change shouldn't matter in general;
    on RT, however, it does, because the sleeping lock can't be acquired
    with preemption disabled.

    Cc: Joerg Roedel
    Cc: iommu@lists.linux-foundation.org
    Reported-by: vinadhy@gmail.com
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Alex Williamson

    Sebastian Andrzej Siewior
     

12 Oct, 2017

1 commit

  • Since IOVA allocation failure is not an unusual case, we need to flush
    the CPUs' rcaches in the hope that we will succeed in the next round.

    However, it is useful to be able to decide whether the rcache flush
    step is needed, for two reasons:
    - Scalability: on a large system with ~100 CPUs, iterating over and
    flushing the rcache of each CPU becomes a serious bottleneck, so we may
    want to defer it.
    - free_cpu_cached_iovas() does not care about the max PFN we are
    interested in, so we may flush our rcaches and still get no new IOVA,
    as in the commonly used scenario:

    if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
            iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

    if (!iova)
            iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

    1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
    PCI devices a SAC address
    2. alloc_iova() fails due to full 32-bit space
    3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
    throws entries away for nothing and alloc_iova() fails again
    4. Next alloc_iova_fast() call cannot take advantage of rcache since we
    have just defeated caches. In this case we pick the slowest option
    to proceed.

    This patch reworks the flushed_rcache local flag into an additional
    function argument that controls the rcache flush step, and updates all
    users to do the flush only as a last resort.
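
    A stand-alone sketch of the resulting call pattern (toy allocator and
    made-up helper names, not the kernel interface): the caller decides per
    attempt whether a failed lookup may flush the per-CPU caches, so the
    speculative 32-bit attempt leaves the caches intact and only the final
    attempt pays for the flush.

    #include <stdbool.h>
    #include <stdio.h>

    static bool tree_alloc(unsigned long size, unsigned long limit)
    {
            (void)size; (void)limit;
            return false;                  /* pretend the space is full */
    }

    static void flush_all_cpu_caches(void)
    {
            puts("flushing per-CPU caches (expensive, last resort)");
    }

    static bool alloc_fast(unsigned long size, unsigned long limit,
                           bool flush_rcache)
    {
            /* 1) per-CPU cache lookup, 2) fall back to the tree ... */
            if (tree_alloc(size, limit))
                    return true;
            if (!flush_rcache)
                    return false;          /* caller will retry differently */
            flush_all_cpu_caches();        /* last chance: reclaim cached IOVAs */
            return tree_alloc(size, limit);
    }

    int main(void)
    {
            unsigned long size = 8, limit32 = 1UL << 20, limit64 = ~0UL;

            /* Speculative 32-bit attempt: don't trash the caches on failure. */
            if (!alloc_fast(size, limit32, false))
                    alloc_fast(size, limit64, true);    /* flush only here */
            return 0;
    }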

    Signed-off-by: Tomasz Nowicki
    Reviewed-by: Robin Murphy
    Tested-by: Nate Watterson
    Signed-off-by: Joerg Roedel

    Tomasz Nowicki
     

02 Oct, 2017

1 commit

  • Anchor nodes are not reserved IOVAs in the way that copy_reserved_iova()
    cares about - while the failure from reserve_iova() is benign since the
    target domain will already have its own anchor, we still don't want to
    be triggering spurious warnings.

    Reported-by: kernel test robot
    Signed-off-by: Robin Murphy
    Fixes: bb68b2fbfbd6 ('iommu/iova: Add rbtree anchor node')
    Signed-off-by: Joerg Roedel

    Robin Murphy
     

28 Sep, 2017

3 commits

  • When devices with different DMA masks are using the same domain, or for
    PCI devices where we usually try a speculative 32-bit allocation first,
    there is a fair possibility that the top PFN of the rcache stack at any
    given time may be unsuitable for the lower limit, prompting a fallback
    to allocating anew from the rbtree. Consequently, we may end up
    artificially increasing pressure on the 32-bit IOVA space as unused IOVAs
    accumulate lower down in the rcache stacks, while callers with 32-bit
    masks also impose unnecessary rbtree overhead.

    In such cases, let's try a bit harder to satisfy the allocation locally
    first - scanning the whole stack should still be relatively inexpensive.
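
    A toy version of the limit-aware pop described above (a plain array
    standing in for the magazine, illustrative names, not the kernel code):
    rather than giving up when the top entry is above the limit, scan the
    whole small fixed-size stack and pull out a suitable entry.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAG_SIZE 8

    struct magazine {
            unsigned long pfns[MAG_SIZE];
            unsigned int size;             /* number of valid entries */
    };

    static bool mag_pop_below(struct magazine *mag, unsigned long limit_pfn,
                              unsigned long *out)
    {
            for (unsigned int i = mag->size; i-- > 0; ) {
                    if (mag->pfns[i] <= limit_pfn) {
                            *out = mag->pfns[i];
                            mag->pfns[i] = mag->pfns[--mag->size]; /* fill hole */
                            return true;
                    }
            }
            return false;                  /* nothing suitable is cached */
    }

    int main(void)
    {
            struct magazine mag = { .pfns = { 0x200000, 0x1000, 0x300000 },
                                    .size = 3 };
            unsigned long pfn;

            /* The top entry (0x300000) is above the limit, but scanning the
             * rest of the stack still finds 0x1000 further down. */
            if (mag_pop_below(&mag, 0xfffff, &pfn))
                    printf("got pfn 0x%lx from the cache\n", pfn);
            return 0;
    }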

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     
  • When popping a pfn from an rcache, we are currently checking it directly
    against limit_pfn for viability. Since this represents iova->pfn_lo, it
    is technically possible for the corresponding iova->pfn_hi to be greater
    than limit_pfn. Although we generally get away with it in practice since
    limit_pfn is typically a power-of-two boundary and the IOVAs are
    size-aligned, it's pretty trivial to make the iova_rcache_get() path
    take the allocation size into account for complete safety.

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     
  • All put_iova_domain() should have to worry about is freeing memory - by
    that point the domain must no longer be live, so the act of cleaning up
    doesn't need to be concurrency-safe or maintain the rbtree in a
    self-consistent state. There's no need to waste time with locking or
    emptying the rcache magazines, and we can just use the postorder
    traversal helper to clear out the remaining rbtree entries in-place.

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     

27 Sep, 2017

6 commits

  • The logic of __get_cached_rbnode() is a little obtuse, but then
    __get_prev_node_of_cached_rbnode_or_last_node_and_update_limit_pfn()
    wouldn't exactly roll off the tongue...

    Now that we have the invariant that there is always a valid node to
    start searching downwards from, everything gets a bit easier to follow
    if we simplify that function to do what it says on the tin and return
    the cached node (or anchor node as appropriate) directly. In turn, we
    can then deduplicate the rb_prev() and limit_pfn logic into the main
    loop itself, further reduce the amount of code under the lock, and
    generally make the inner workings a bit less subtle.

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     
  • Add a permanent dummy IOVA reservation to the rbtree, such that we can
    always access the top of the address space instantly. The immediate
    benefit is that we remove the overhead of the rb_last() traversal when
    not using the cached node, but it also paves the way for further
    simplifications.

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     
  • Now that the cached node optimisation can apply to all allocations, the
    couple of users which were playing tricks with dma_32bit_pfn in order to
    benefit from it can stop doing so. Conversely, there is also no need for
    all the other users to explicitly calculate a 'real' 32-bit PFN, when
    init_iova_domain() can happily do that itself from the page granularity.

    CC: Thierry Reding
    CC: Jonathan Hunter
    CC: David Airlie
    CC: Sudeep Dutt
    CC: Ashutosh Dixit
    Signed-off-by: Zhen Lei
    Tested-by: Ard Biesheuvel
    Tested-by: Zhen Lei
    Tested-by: Nate Watterson
    [rm: use iova_shift(), rewrote commit message]
    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Zhen Lei
     
  • The cached node mechanism provides a significant performance benefit for
    allocations using a 32-bit DMA mask, but in the case of non-PCI devices
    or where the 32-bit space is full, the loss of this benefit can be
    significant - on large systems there can be many thousands of entries in
    the tree, such that walking all the way down to find free space every
    time becomes increasingly awful.

    Maintain a similar cached node for the whole IOVA space as a superset of
    the 32-bit space so that performance can remain much more consistent.

    Inspired by work by Zhen Lei .

    Tested-by: Ard Biesheuvel
    Tested-by: Zhen Lei
    Tested-by: Nate Watterson
    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     
  • The mask for calculating the padding size doesn't change, so there's no
    need to recalculate it every loop iteration. Furthermore, once we've
    done that, it becomes clear that we don't actually need to calculate a
    padding size at all - by flipping the arithmetic around, we can just
    combine the upper limit, size, and mask directly to check against the
    lower limit.

    For an arm64 build, this alone knocks 20% off the object code size of
    the entire alloc_iova() function!
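
    A small numerical illustration of the flipped arithmetic (the exact
    kernel expressions differ; this only demonstrates the equivalence):
    instead of computing an explicit pad size and subtracting it, align the
    candidate start (limit - size) down to the alignment in one step and
    compare it directly against the lower limit.

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned long align = 16;      /* power of two */
            unsigned long mask = align - 1;
            unsigned long lower = 32, limit = 103, size = 16;

            /* Two-step form: compute an explicit pad, then subtract it. */
            unsigned long pad = (limit - size) & mask;
            unsigned long start_a = limit - size - pad;

            /* One-step form: align the candidate start down directly. */
            unsigned long start_b = (limit - size) & ~mask;

            assert(start_a == start_b);
            printf("candidate start %lu, fits: %s\n", start_b,
                   start_b >= lower ? "yes" : "no");
            return 0;
    }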

    Signed-off-by: Zhen Lei
    Tested-by: Ard Biesheuvel
    Tested-by: Zhen Lei
    Tested-by: Nate Watterson
    [rm: simplified more of the arithmetic, rewrote commit message]
    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Zhen Lei
     
  • Checking the IOVA bounds separately before deciding which direction to
    continue the search (if necessary) results in redundantly comparing both
    pfns twice each. GCC can already determine that the final comparison op
    is redundant and optimise it down to 3 in total, but we can go one
    further with a little tweak of the ordering (which makes the intent of
    the code that much cleaner as a bonus).

    Signed-off-by: Zhen Lei
    Tested-by: Ard Biesheuvel
    Tested-by: Zhen Lei
    Tested-by: Nate Watterson
    [rm: rewrote commit message to clarify]
    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Zhen Lei
     

16 Aug, 2017

5 commits

  • Add a timer to flush entries from the Flush-Queues every
    10ms. This makes sure that no stale TLB entries remain for
    too long after an IOVA has been unmapped.

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • The lock is taken from the same CPU most of the time, but having it
    allows the queue to also be flushed from another CPU if necessary.

    This will be used by a timer to regularly flush any pending IOVAs from
    the Flush-Queues.

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • There are two counters:

    * fq_flush_start_cnt - Increased when a TLB flush
    is started.

    * fq_flush_finish_cnt - Increased when a TLB flush
    is finished.

    The fq_flush_start_cnt is assigned to every Flush-Queue
    entry on its creation. When freeing entries from the
    Flush-Queue, the value in the entry is compared to the
    fq_flush_finish_cnt. The entry can only be freed when its
    value is less than the value of fq_flush_finish_cnt.

    The reason for these counters is to take advantage of IOMMU
    TLB flushes that happened on other CPUs. These already
    flushed the TLB for Flush-Queue entries on other CPUs so
    that they can already be freed without flushing the TLB
    again.

    This makes it less likely that the Flush-Queue is full and
    saves IOMMU TLB flushes.
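
    A toy model of the two counters (user-space sketch, not the kernel
    code): each queued entry records fq_flush_start_cnt at queue time and
    may be freed once fq_flush_finish_cnt has moved past that value, i.e.
    once at least one complete TLB flush - issued by any CPU - has happened
    after it was queued.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_ulong fq_flush_start_cnt;
    static atomic_ulong fq_flush_finish_cnt;

    struct fq_entry {
            unsigned long counter;         /* start count when queued */
    };

    static void queue_entry(struct fq_entry *e)
    {
            e->counter = atomic_load(&fq_flush_start_cnt);
    }

    static void tlb_flush(void)            /* may run on any CPU */
    {
            atomic_fetch_add(&fq_flush_start_cnt, 1);
            /* ... hardware TLB invalidation would happen here ... */
            atomic_fetch_add(&fq_flush_finish_cnt, 1);
    }

    static bool can_free(const struct fq_entry *e)
    {
            return e->counter < atomic_load(&fq_flush_finish_cnt);
    }

    int main(void)
    {
            struct fq_entry e;

            queue_entry(&e);
            printf("before any flush: can_free = %d\n", can_free(&e)); /* 0 */
            tlb_flush();                   /* e.g. performed by another CPU */
            printf("after a flush:    can_free = %d\n", can_free(&e)); /* 1 */
            return 0;
    }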

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • Add a function to add entries to the Flush-Queue ring
    buffer. If the buffer is full, call the flush-callback and
    free the entries.
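
    A toy version of that add path (fixed-size ring, illustrative names,
    not the kernel code): when the ring is full, run the flush callback
    once and reclaim everything the flush has just made safe to reuse.

    #include <stdio.h>

    #define FQ_RING_SIZE 4

    struct flush_queue {
            unsigned long ring[FQ_RING_SIZE];
            unsigned int head, tail;       /* free from head, add at tail */
            void (*flush_cb)(void);
    };

    static unsigned int fq_full(struct flush_queue *fq)
    {
            return (fq->tail + 1) % FQ_RING_SIZE == fq->head;
    }

    static void fq_ring_add(struct flush_queue *fq, unsigned long pfn)
    {
            if (fq_full(fq)) {
                    fq->flush_cb();        /* IOTLB flush */
                    fq->head = fq->tail;   /* pending entries now reusable */
            }
            fq->ring[fq->tail] = pfn;
            fq->tail = (fq->tail + 1) % FQ_RING_SIZE;
    }

    static void flush_cb(void) { puts("ring full -> flushing IOTLB"); }

    int main(void)
    {
            struct flush_queue fq = { .flush_cb = flush_cb };

            for (unsigned long pfn = 0; pfn < 6; pfn++)
                    fq_ring_add(&fq, pfn); /* triggers one flush (at pfn 3) */
            return 0;
    }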

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • This patch adds the basic data-structures to implement
    flush-queues in the generic IOVA code. It also adds the
    initialization and destroy routines for these data
    structures.

    The initialization routine is designed so that the use of
    this feature is optional for the users of IOVA code.
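
    A minimal sketch of how such an optional flush queue might hang off the
    domain structure (illustrative layout and names, not the kernel
    definitions): users that never call the init routine simply keep a NULL
    queue pointer.

    #include <stdlib.h>

    #define FQ_RING_SIZE 256

    struct fq_entry { unsigned long pfn_lo, pfn_hi, counter; };

    struct flush_queue {
            struct fq_entry ring[FQ_RING_SIZE];
            unsigned int head, tail;
    };

    struct iova_domain_model {
            struct flush_queue *fq;        /* NULL: deferred flushing unused */
            void (*flush_cb)(void);
    };

    static int init_flush_queue(struct iova_domain_model *d, void (*cb)(void))
    {
            d->fq = calloc(1, sizeof(*d->fq));
            if (!d->fq)
                    return -1;
            d->flush_cb = cb;
            return 0;
    }

    static void destroy_flush_queue(struct iova_domain_model *d)
    {
            free(d->fq);
            d->fq = NULL;
    }

    int main(void)
    {
            struct iova_domain_model d = { 0 };   /* opt-out: fq stays NULL */

            if (init_flush_queue(&d, NULL) == 0)  /* opt-in when desired */
                    destroy_flush_queue(&d);
            return 0;
    }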

    Signed-off-by: Joerg Roedel

    Joerg Roedel
     

28 Jun, 2017

2 commits

  • …re', 'x86/vt-d', 'x86/amd', 's390' and 'core' into next

    Joerg Roedel
     
  • Commit 583248e6620a ("iommu/iova: Disable preemption around use of
    this_cpu_ptr()") disables preemption while accessing a per-CPU variable.
    This does keep lockdep quiet. However I don't see why it would be bad
    if we get migrated to another CPU after accessing the variable.
    __iova_rcache_insert() and __iova_rcache_get() immediately locks the
    variable after obtaining it - before accessing its members.
    _If_ we get migrated away after retrieving the address of cpu_rcache
    before taking the lock then the *other* task on the same CPU will
    retrieve the same address of cpu_rcache and will spin on the lock.

    alloc_iova_fast() disables preemption while invoking
    free_cpu_cached_iovas() on each CPU. The function itself uses
    per_cpu_ptr() which does not trigger a warning (like this_cpu_ptr()
    does). It _could_ make sense to use get_online_cpus() instead, but we
    already have a hotplug notifier for CPU down (and none for up), so we
    are good.

    Cc: Joerg Roedel
    Cc: iommu@lists.linux-foundation.org
    Cc: Andrew Morton
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Joerg Roedel

    Sebastian Andrzej Siewior
     

17 May, 2017

1 commit

  • When walking the rbtree, the fact that iovad->start_pfn and limit_pfn
    are both inclusive limits creates an ambiguity once limit_pfn reaches
    the bottom of the address space and they overlap. Commit 5016bdb796b3
    ("iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range") fixed
    the worst side-effect of this, that of underflow wraparound leading to
    bogus allocations, but the remaining fallout is that any attempt to
    allocate start_pfn itself erroneously fails.

    The cleanest way to resolve the ambiguity is to simply make limit_pfn an
    exclusive limit when inside the guts of the rbtree. Since we're working
    with PFNs, representing one past the top of the address space is always
    possible without fear of overflow, and elsewhere it just makes life a
    little more straightforward.

    Reported-by: Aaron Sierra
    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     

05 May, 2017

1 commit


07 Apr, 2017

1 commit

  • Normally, calling alloc_iova() using an iova_domain with insufficient
    pfns remaining between start_pfn and dma_limit will fail and return a
    NULL pointer. Unexpectedly, if such a "full" iova_domain contains an
    iova with pfn_lo == 0, the alloc_iova() call will instead succeed and
    return an iova containing invalid pfns.

    This is caused by an underflow bug in __alloc_and_insert_iova_range()
    that occurs after walking the "full" iova tree when the search ends
    at the iova with pfn_lo == 0 and limit_pfn is then adjusted to be just
    below that (-1). This (now huge) limit_pfn gives the impression that a
    vast amount of space is available between it and start_pfn and thus
    a new iova is allocated with the invalid pfn_hi value, 0xFFF.... .

    To remedy this, a check is introduced to ensure that adjustments to
    limit_pfn will not underflow.

    This issue has been observed in the wild, and is easily reproduced with
    the following sample code.

    struct iova_domain *iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
    struct iova *rsvd_iova, *good_iova, *bad_iova;
    unsigned long limit_pfn = 3;
    unsigned long start_pfn = 1;
    unsigned long va_size = 2;

    init_iova_domain(iovad, SZ_4K, start_pfn, limit_pfn);
    rsvd_iova = reserve_iova(iovad, 0, 0);
    good_iova = alloc_iova(iovad, va_size, limit_pfn, true);
    bad_iova = alloc_iova(iovad, va_size, limit_pfn, true);

    Prior to the patch, this yielded:
    *rsvd_iova == {0, 0} /* Expected */
    *good_iova == {2, 3} /* Expected */
    *bad_iova == {-2, -1} /* Oh no... */

    After the patch, bad_iova is NULL as expected since inadequate
    space remains between limit_pfn and start_pfn after allocating
    good_iova.
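
    A tiny user-space demonstration of the underflow itself (unsigned
    arithmetic only; the guard shown is a generic illustration of the check
    the patch adds): when the walk ends at the reservation with pfn_lo == 0,
    subtracting one wraps around to ULONG_MAX, which is what made the full
    domain look almost empty.

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned long pfn_lo = 0;      /* the reserved iova at pfn 0 */
            unsigned long limit_pfn = pfn_lo - 1;  /* wraps to ULONG_MAX */

            printf("adjusted limit_pfn = 0x%lx\n", limit_pfn);
            printf("wrapped: %s\n", limit_pfn == ULONG_MAX ? "yes" : "no");

            /* Guarded form: refuse to adjust below zero. */
            if (pfn_lo == 0)
                    puts("guard: no space left below this reservation");
            return 0;
    }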

    Signed-off-by: Nate Watterson
    Signed-off-by: Joerg Roedel

    Nate Watterson
     

21 Mar, 2017

1 commit


04 Jan, 2017

1 commit


15 Nov, 2016

1 commit

  • When searching for a free IOVA range, we optimise the tree traversal
    by starting from the cached32_node, instead of the last node, when
    limit_pfn is equal to dma_32bit_pfn. However, if limit_pfn happens to
    be smaller, then we'll go ahead and start from the top even though
    dma_32bit_pfn is still a more suitable upper bound. Since this is
    clearly a silly thing to do, adjust the lookup condition appropriately.

    Signed-off-by: Robin Murphy
    Signed-off-by: Joerg Roedel

    Robin Murphy
     

27 Jun, 2016

1 commit

  • Between acquiring the this_cpu_ptr() and using it, ideally we don't want
    to be preempted and work on another CPU's private data. this_cpu_ptr()
    checks whether or not preemption is disabled, and get_cpu_ptr() provides
    a convenient wrapper for operating on the cpu ptr inside a preemption
    disabled critical section (which currently is provided by the
    spinlock).

    [ 167.997877] BUG: using smp_processor_id() in preemptible [00000000] code: usb-storage/216
    [ 167.997940] caller is debug_smp_processor_id+0x17/0x20
    [ 167.997945] CPU: 7 PID: 216 Comm: usb-storage Tainted: G U 4.7.0-rc1-gfxbench-RO_Patchwork_1057+ #1
    [ 167.997948] Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.11 10/24/2012
    [ 167.997951] 0000000000000000 ffff880118b7f9c8 ffffffff8140dca5 0000000000000007
    [ 167.997958] ffffffff81a3a7e9 ffff880118b7f9f8 ffffffff8142a927 0000000000000000
    [ 167.997965] ffff8800d499ed58 0000000000000001 00000000000fffff ffff880118b7fa08
    [ 167.997971] Call Trace:
    [ 167.997977] [] dump_stack+0x67/0x92
    [ 167.997981] [] check_preemption_disabled+0xd7/0xe0
    [ 167.997985] [] debug_smp_processor_id+0x17/0x20
    [ 167.997990] [] alloc_iova_fast+0xb7/0x210
    [ 167.997994] [] intel_alloc_iova+0x7f/0xd0
    [ 167.997998] [] intel_map_sg+0xbd/0x240
    [ 167.998002] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
    [ 167.998009] [] usb_hcd_map_urb_for_dma+0x4b9/0x5a0
    [ 167.998013] [] usb_hcd_submit_urb+0xe9/0xaa0
    [ 167.998017] [] ? mark_held_locks+0x6f/0xa0
    [ 167.998022] [] ? __raw_spin_lock_init+0x1c/0x50
    [ 167.998025] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
    [ 167.998028] [] usb_submit_urb+0x3f3/0x5a0
    [ 167.998032] [] ? trace_hardirqs_on_caller+0x122/0x1b0
    [ 167.998035] [] usb_sg_wait+0x67/0x150
    [ 167.998039] [] usb_stor_bulk_transfer_sglist.part.3+0x82/0xd0
    [ 167.998042] [] usb_stor_bulk_srb+0x4c/0x60
    [ 167.998045] [] usb_stor_Bulk_transport+0x17e/0x420
    [ 167.998049] [] usb_stor_invoke_transport+0x242/0x540
    [ 167.998052] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
    [ 167.998058] [] usb_stor_transparent_scsi_command+0x9/0x10
    [ 167.998061] [] usb_stor_control_thread+0x158/0x260
    [ 167.998064] [] ? fill_inquiry_response+0x20/0x20
    [ 167.998067] [] ? fill_inquiry_response+0x20/0x20
    [ 167.998071] [] kthread+0xea/0x100
    [ 167.998078] [] ret_from_fork+0x1f/0x40
    [ 167.998081] [] ? kthread_create_on_node+0x1f0/0x1f0

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96293
    Signed-off-by: Chris Wilson
    Cc: Joonas Lahtinen
    Cc: Joerg Roedel
    Cc: iommu@lists.linux-foundation.org
    Cc: linux-kernel@vger.kernel.org
    Fixes: 9257b4a206fc ('iommu/iova: introduce per-cpu caching to iova allocation')
    Signed-off-by: Joerg Roedel

    Chris Wilson
     

21 Apr, 2016

1 commit

  • IOVA allocation has two problems that impede high-throughput I/O.
    First, it can do a linear search over the allocated IOVA ranges.
    Second, the rbtree spinlock that serializes IOVA allocations becomes
    contended.

    Address these problems by creating an API for caching allocated IOVA
    ranges, so that the IOVA allocator isn't accessed frequently. This
    patch adds a per-CPU cache, from which CPUs can alloc/free IOVAs
    without taking the rbtree spinlock. The per-CPU caches are backed by
    a global cache, to avoid invoking the (linear-time) IOVA allocator
    without needing to make the per-CPU cache size excessive. This design
    is based on magazines, as described in "Magazines and Vmem: Extending
    the Slab Allocator to Many CPUs and Arbitrary Resources" (currently
    available at https://www.usenix.org/legacy/event/usenix01/bonwick.html)
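
    A compact, single-threaded model of the magazine scheme (illustrative
    sizes and names, not the kernel implementation): frees go into a small
    fixed-size magazine; when it is full, the whole magazine is handed to a
    global depot, and an empty magazine is refilled from the depot before
    falling back to the slow, locked allocator.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAG_SIZE   16                  /* entries per magazine */
    #define DEPOT_SIZE 4                   /* full magazines kept globally */

    struct magazine { unsigned long pfns[MAG_SIZE]; unsigned int size; };

    static struct magazine cpu_mag;        /* one per CPU in the real design */
    static struct magazine depot[DEPOT_SIZE];  /* global, lock-protected */
    static unsigned int depot_full;        /* number of full magazines held */

    static bool cache_free(unsigned long pfn)  /* fast path for free */
    {
            if (cpu_mag.size == MAG_SIZE) {
                    if (depot_full == DEPOT_SIZE)
                            return false;  /* caller frees into the rbtree */
                    depot[depot_full++] = cpu_mag; /* hand full mag to depot */
                    cpu_mag.size = 0;
            }
            cpu_mag.pfns[cpu_mag.size++] = pfn;
            return true;
    }

    static bool cache_alloc(unsigned long *pfn)  /* fast path for alloc */
    {
            if (cpu_mag.size == 0) {
                    if (depot_full == 0)
                            return false;  /* caller goes to the rbtree */
                    cpu_mag = depot[--depot_full]; /* refill from the depot */
            }
            *pfn = cpu_mag.pfns[--cpu_mag.size];
            return true;
    }

    int main(void)
    {
            unsigned long pfn = 0;

            for (unsigned long i = 0; i < 40; i++)  /* fill mag and depot */
                    cache_free(0x1000 + i);
            while (cache_alloc(&pfn))
                    ;                      /* drain without the rbtree lock */
            printf("cache drained, last pfn 0x%lx\n", pfn);
            return 0;
    }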

    Adding caching on top of the existing rbtree allocator maintains the
    property that IOVAs are densely packed in the IO virtual address space,
    which is important for keeping IOMMU page table usage low.

    To keep the cache size reasonable, we bound the IOVA space a CPU can
    cache by 32 MiB (we cache a bounded number of IOVA ranges, and only
    ranges of size
    [mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
    Signed-off-by: Adam Morrison
    Reviewed-by: Shaohua Li
    Reviewed-by: Ben Serebrin
    [dwmw2: split out VT-d part into a separate patch]
    Signed-off-by: David Woodhouse

    Omer Peleg
     

28 Jul, 2015

3 commits