==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

Itanium supports several attributes for virtual memory references.
The attribute is part of the virtual translation, i.e., it is
contained in the TLB entry.  The ones of most interest to the Linux
kernel are:

==  ======================
WB  Write-back (cacheable)
UC  Uncacheable
WC  Write-coalescing
==  ======================

System memory typically uses the WB attribute.  The UC attribute is
used for memory-mapped I/O devices.  The WC attribute is uncacheable
like UC is, but writes may be delayed and combined to increase
performance for things like frame buffers.

The Itanium architecture requires that we avoid accessing the same
page with both a cacheable mapping and an uncacheable mapping[1].

The design of the chipset determines which attributes are supported
on which regions of the address space.  For example, some chipsets
support either WB or UC access to main memory, while others support
only WB access.


Memory Map
==========

Platform firmware describes the physical memory map and the
supported attributes for each region.  At boot-time, the kernel uses
the EFI GetMemoryMap() interface.  ACPI can also describe memory
devices and the attributes they support, but Linux/ia64 currently
doesn't use this information.

The kernel uses the efi_memmap table returned from GetMemoryMap() to
learn the attributes supported by each region of physical address
space.  Unfortunately, this table does not completely describe the
address space because some machines omit some or all of the MMIO
regions from the map.

The kernel maintains another table, kern_memmap, which describes the
memory Linux is actually using and the attribute for each region.
This contains only system memory; it does not contain MMIO space.

The kern_memmap table typically contains only a subset of the system
memory described by the efi_memmap.  Linux/ia64 can't use all memory
in the system because of constraints imposed by the identity mapping
scheme.

The efi_memmap table is preserved unmodified because the original
boot-time information is required for kexec.


Kernel Identity Mappings
========================

Linux/ia64 identity mappings are done with large pages, currently
either 16MB or 64MB, referred to as "granules."  Cacheable mappings
are speculative[2], so the processor can read any location in the
page at any time, independent of the programmer's intentions.  This
means that to avoid attribute aliasing, Linux can create a cacheable
identity mapping only when the entire granule supports cacheable
access.

Therefore, kern_memmap contains only full granule-sized regions that
can be referenced safely by an identity mapping.

Uncacheable mappings are not speculative, so the processor will
generate UC accesses only to locations explicitly referenced by
software.  This allows UC identity mappings to cover granules that
are only partially populated, or populated with a combination of UC
and WB regions.

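The full-granule rule for cacheable identity mappings can be sketched
as trimming a WB-capable physical range to granule boundaries.  This
is only an illustration, not the kernel's code: GRANULE_SIZE is fixed
at 16MB here, and granule_round_down(), granule_round_up() and
trim_to_granules() are hypothetical names.  The sketch looks at one
range in isolation and ignores the case where adjacent WB regions
together fill a granule.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granule size for this sketch: 16MB (64MB is the other option). */
#define GRANULE_SIZE (UINT64_C(16) << 20)

/* Round an address down to the start of its granule. */
static uint64_t granule_round_down(uint64_t addr)
{
	return addr & ~(GRANULE_SIZE - 1);
}

/* Round an address up to the next granule boundary. */
static uint64_t granule_round_up(uint64_t addr)
{
	return (addr + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/*
 * Trim the WB-capable range [start, end) to whole granules.
 * Returns 1 and fills *gstart and *gend when at least one complete
 * granule fits inside the range, 0 otherwise.
 */
static int trim_to_granules(uint64_t start, uint64_t end,
			    uint64_t *gstart, uint64_t *gend)
{
	uint64_t s, e;

	assert(start <= end);

	s = granule_round_up(start);
	e = granule_round_down(end);
	if (s >= e)
		return 0;	/* no complete granule inside the range */

	*gstart = s;
	*gend = e;
	return 1;
}
```

With the 16MB granule assumed here, a WB range of 0x100000-0x3000000
trims to 0x1000000-0x3000000, while 0x100000-0x1800000 contains no
complete granule and would get no cacheable identity mapping.
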

User Mappings
=============

User mappings are typically done with 16K or 64K pages.  The smaller
page size allows more flexibility because only 16K or 64K has to be
homogeneous with respect to memory attributes.


Potential Attribute Aliasing Cases
==================================

There are several ways the kernel creates new mappings:


mmap of /dev/mem
----------------

This uses remap_pfn_range(), which creates user mappings.  These
mappings may be either WB or UC.  If the region being mapped
happens to be in kern_memmap, meaning that it may also be mapped
by a kernel identity mapping, the user mapping must use the same
attribute as the kernel mapping.

If the region is not in kern_memmap, the user mapping should use
an attribute reported as being supported in the EFI memory map.

Since the EFI memory map does not describe MMIO on some
machines, this should use an uncacheable mapping as a fallback.

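The selection rule for /dev/mem user mappings can be written down as
a small decision function.  This is a sketch under stated
assumptions, not the kernel's implementation: the mem_attr values
and devmem_mmap_attr() are hypothetical, and preferring WB over UC
when EFI reports both is an assumption of the sketch.  Only WB and
UC results are considered for the non-kern_memmap case, since those
are the attributes this path may use.

```c
#include <assert.h>

/* Attribute bits; names and values are illustrative only. */
enum mem_attr {
	ATTR_NONE = 0,		/* region not described at all */
	ATTR_UC   = 1 << 0,
	ATTR_WC   = 1 << 1,
	ATTR_WB   = 1 << 2,
};

/*
 * kern_attr: attribute recorded in kern_memmap, or ATTR_NONE if the
 *            region is not in kern_memmap.
 * efi_attrs: mask of attributes the EFI memory map reports for the
 *            region, or ATTR_NONE if EFI does not describe it.
 */
static enum mem_attr devmem_mmap_attr(enum mem_attr kern_attr,
				      unsigned int efi_attrs)
{
	/* kern_memmap holds exactly one attribute per region. */
	assert(kern_attr == ATTR_NONE || kern_attr == ATTR_UC ||
	       kern_attr == ATTR_WC || kern_attr == ATTR_WB);

	/* A kernel identity mapping may exist: match its attribute. */
	if (kern_attr != ATTR_NONE)
		return kern_attr;

	/* Otherwise use an attribute EFI reports (WB preferred here). */
	if (efi_attrs & ATTR_WB)
		return ATTR_WB;

	/* EFI reports UC only, or does not describe the region. */
	return ATTR_UC;
}
```

An MMIO region that is missing from both tables therefore ends up
with a UC user mapping, which is the fallback described above.
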

mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

This is very similar to mmap of /dev/mem, except that legacy_mem
only allows mmap of the one megabyte "legacy MMIO" area for a
specific PCI bus.  Typically this is the first megabyte of
physical address space, but it may be different on machines with
several VGA devices.

"X" uses this to access VGA frame buffers.  Using legacy_mem
rather than /dev/mem allows multiple instances of X to talk to
different VGA cards.

The /dev/mem mmap constraints apply.


mmap of /proc/bus/pci/.../??.?
------------------------------

This is an MMIO mmap of PCI functions, which additionally may or
may not be requested as using the WC attribute.

If WC is requested, and the region in kern_memmap is either WC
or UC, and the EFI memory map designates the region as WC, then
the WC mapping is allowed.

Otherwise, the user mapping must use the same attribute as the
kernel mapping.

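The WC rule has three conditions that must all hold, which is easy
to misread in prose.  A minimal sketch follows; the mem_attr values
and pci_mmap_attr() are hypothetical names chosen for illustration.
The text does not say what happens when the region is absent from
kern_memmap, so the sketch simply requires a valid kernel attribute.

```c
#include <assert.h>

/* Attribute bits; names and values are illustrative only. */
enum mem_attr {
	ATTR_NONE = 0,
	ATTR_UC   = 1 << 0,
	ATTR_WC   = 1 << 1,
	ATTR_WB   = 1 << 2,
};

/*
 * wc_requested: nonzero if the caller asked for a WC mapping.
 * kern_attr:    attribute of the region in kern_memmap.
 * efi_attrs:    mask of attributes the EFI memory map reports.
 */
static enum mem_attr pci_mmap_attr(int wc_requested,
				   enum mem_attr kern_attr,
				   unsigned int efi_attrs)
{
	/* Case not covered by the text: region missing from kern_memmap. */
	assert(kern_attr != ATTR_NONE);

	if (wc_requested &&
	    (kern_attr == ATTR_WC || kern_attr == ATTR_UC) &&
	    (efi_attrs & ATTR_WC))
		return ATTR_WC;

	/* Otherwise match the kernel mapping. */
	return kern_attr;
}
```

A WC request against a WB kernel mapping is never honored, even if
EFI lists WC for the region, because that would alias a cacheable
and an uncacheable attribute on the same page.
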

read/write of /dev/mem
----------------------

This uses copy_from_user(), which implicitly uses a kernel
identity mapping.  This is obviously safe for things in
kern_memmap.

There may be corner cases of things that are not in kern_memmap,
but could be accessed this way.  For example, registers in MMIO
space are not in kern_memmap, but could be accessed with a UC
mapping.  This would not cause attribute aliasing.  But
registers typically can be accessed only with four-byte or
eight-byte accesses, and the copy_from_user() path doesn't allow
any control over the access size, so this would be dangerous.


ioremap()
---------

This returns a mapping for use inside the kernel.

If the region is in kern_memmap, we should use the attribute
specified there.

If the EFI memory map reports that the entire granule supports
WB, we should use that (granules that are partially reserved
or occupied by firmware do not appear in kern_memmap).

If the granule contains non-WB memory, but we can cover the
region safely with kernel page table mappings, we can use
ioremap_page_range() as most other architectures do.

Failing all of the above, we have to fall back to a UC mapping.

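The order of these checks matters, so it may help to see them as a
cascade.  The following is a sketch only: struct region_info, its
fields, and choose_ioremap() are hypothetical, and the three flags
stand in for lookups the kernel would perform against kern_memmap,
the EFI memory map, and the page tables.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes, in order of preference. */
enum ioremap_strategy {
	IOREMAP_KERN_ATTR,	/* use the attribute from kern_memmap */
	IOREMAP_WB_IDENTITY,	/* whole granule is WB per EFI */
	IOREMAP_PAGE_TABLE,	/* kernel page table mapping */
	IOREMAP_UC,		/* last resort: UC mapping */
};

/* Precomputed facts about the region; all fields are assumptions. */
struct region_info {
	int in_kern_memmap;	/* region appears in kern_memmap */
	int granule_all_wb;	/* EFI: entire granule supports WB */
	int page_table_safe;	/* page table mappings can cover it safely */
};

static enum ioremap_strategy choose_ioremap(const struct region_info *r)
{
	assert(r != NULL);

	if (r->in_kern_memmap)
		return IOREMAP_KERN_ATTR;
	if (r->granule_all_wb)
		return IOREMAP_WB_IDENTITY;
	if (r->page_table_safe)
		return IOREMAP_PAGE_TABLE;
	return IOREMAP_UC;
}
```

A region in a granule that mixes WB and non-WB memory skips the
first two steps and gets a page table mapping when one is safe,
falling back to UC only when it is not.
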

Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

The EFI memory map may not report these MMIO regions.

These must be allowed so that X will work.  This means that
when the EFI memory map is incomplete, every /dev/mem mmap must
succeed.  It may create either WB or UC user mappings, depending
on whether the region is in kern_memmap or the EFI memory map.


mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

The EFI memory map reports the following attributes:

=============== ======= ==================
0x00000-0x9FFFF WB only
0xA0000-0xBFFFF UC only (VGA frame buffer)
0xC0000-0xFFFFF WB only
=============== ======= ==================

This mmap is done with user pages, not kernel identity mappings,
so it is safe to use WB mappings.

The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
which uses a granule-sized UC mapping.  This granule will cover some
WB-only memory, but since UC is non-speculative, the processor will
never generate an uncacheable reference to the WB-only areas unless
the driver explicitly touches them.


mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

If the EFI memory map reports that the entire range supports the
same attributes, we can allow the mmap (and we will prefer WB if
supported, as is the case with HP sx[12]000 machines with VGA
disabled).

If EFI reports the range as partly WB and partly UC (as on sx[12]000
machines with VGA enabled), we must fail the mmap because there's no
safe attribute to use.

If EFI reports some of the range but not all (as on Intel firmware
that doesn't report the VGA frame buffer at all), we should fail the
mmap and force the user to map just the specific region of interest.

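The three outcomes above all follow from one question: which
attributes does the memory map report for every byte of the
requested range?  The sketch below answers it by intersecting
attributes across descriptors.  It is an illustration only: struct
mem_desc and range_attrs() are hypothetical, the descriptors are
assumed to be sorted and non-overlapping, and the sample table is
the sx1000 VGA-enabled map shown earlier with ranges written as
half-open intervals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits; names and values are illustrative only. */
enum { ATTR_UC = 1 << 0, ATTR_WB = 1 << 2 };

/* One memory map entry covering [start, end). */
struct mem_desc {
	uint64_t start;
	uint64_t end;
	unsigned int attrs;
};

/* HP sx1000 with VGA enabled, as reported by EFI. */
static const struct mem_desc sx1000_vga_enabled[] = {
	{ 0x00000, 0xA0000,  ATTR_WB },
	{ 0xA0000, 0xC0000,  ATTR_UC },	/* VGA frame buffer */
	{ 0xC0000, 0x100000, ATTR_WB },
};

/*
 * Return the attributes supported by every byte of [start, end).
 * Returns 0 if part of the range is not described or if no single
 * attribute is common to the whole range; in both cases the mmap
 * should fail.
 */
static unsigned int range_attrs(const struct mem_desc *map, size_t n,
				uint64_t start, uint64_t end)
{
	unsigned int attrs = ~0u;
	uint64_t pos = start;
	size_t i;

	assert(start < end);

	for (i = 0; i < n && pos < end; i++) {
		if (map[i].end <= pos)
			continue;	/* entry lies below the range */
		if (map[i].start > pos)
			return 0;	/* hole in the map */
		attrs &= map[i].attrs;
		pos = map[i].end;
	}

	return pos >= end ? attrs : 0;
}
```

For the sample table, the full 0x0-0xFFFFF range yields 0 because WB
and UC do not intersect, while 0x0-0x9FFFF alone yields WB and
0xA0000-0xBFFFF alone yields UC.  Removing the frame buffer entry
from the table also yields 0 for the full range, this time because
of the hole.
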

mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

The EFI memory map reports the following attributes::

  0x00000-0xFFFFF WB only (no VGA MMIO hole)

This is a special case of the previous case, and the mmap should
fail for the same reason as above.


read of /sys/devices/.../rom
----------------------------

For VGA devices, this may cause an ioremap() of 0xC0000.  This
used to be done with a UC mapping, because the VGA frame buffer
at 0xA0000 prevents use of a WB granule.  The UC mapping causes
an MCA on HP sx[12]000 chipsets.

We should use WB page table mappings to avoid covering the VGA
frame buffer.


Notes
=====

[1] SDM rev 2.2, vol 2, sec 4.4.1.

[2] SDM rev 2.2, vol 2, sec 4.4.6.