Commit 1551df35f296f0a8df32f4f2054254f46e8be252
Committed by Albert ARIBAUD
1 parent f503cc49a5
Exists in master and in 50 other branches
arm: Switch to -mno-unaligned-access when supported by the compiler
When we tell the compiler to optimize for ARMv7 (and ARMv6 for that
matter) it assumes a default of SCTLR.A being cleared and unaligned
accesses being allowed and fast at the hardware level. We set this bit
and must pass along -mno-unaligned-access so that the compiler will
still break down accesses and not trigger a data abort.

To better help understand the requirements of the project with respect
to unaligned memory access, the Documentation/unaligned-memory-access.txt
file has been added as doc/README.unaligned-memory-access.txt and is
taken from the v3.14-rc1 tag of the kernel.

Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
Cc: Mans Rullgard <mans@mansr.com>
Signed-off-by: Tom Rini <trini@ti.com>
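
As a rough illustration of what "break down accesses" means, the sketch below reads a 32-bit field that sits at an odd offset inside a packed structure. The names struct wire_hdr, hdr_length() and load_u32_bytewise() are hypothetical and do not appear in U-Boot. The assumption is that, when built with -mno-unaligned-access, GCC compiles the field read in hdr_length() into narrower accesses, much like the explicit memcpy() variant, instead of a single word load that would trap while SCTLR.A is set.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-the-wire header: 'length' starts at offset 1. */
struct wire_hdr {
	uint8_t  tag;
	uint32_t length;
} __attribute__((packed));

/*
 * Field read through the packed type. The compiler knows the field may be
 * misaligned and picks the access sequence itself.
 */
static uint32_t hdr_length(const struct wire_hdr *h)
{
	return h->length;
}

/*
 * Explicit byte-wise equivalent: copy the four bytes into an aligned
 * local variable, then return it.
 */
static uint32_t load_u32_bytewise(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Both functions return the same value for the same bytes; only the instructions the compiler is allowed to emit differ between the two settings of the flag.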
Showing 9 changed files with 248 additions and 138 deletions
README
... | ... | @@ -1726,7 +1726,7 @@ |
1726 | 1726 | |
1727 | 1727 | If this option is set, then U-Boot will prevent the environment |
1728 | 1728 | variable "splashimage" from being set to a problematic address |
1729 | - (see README.displaying-bmps and README.arm-unaligned-accesses). | |
1729 | + (see README.displaying-bmps). | |
1730 | 1730 | This option is useful for targets where, due to alignment |
1731 | 1731 | restrictions, an improperly aligned BMP image will cause a data |
1732 | 1732 | abort. If you think you will not have problems with unaligned |
arch/arm/cpu/armv7/config.mk
... | ... | @@ -10,9 +10,12 @@ |
10 | 10 | PF_CPPFLAGS_ARMV7 := $(call cc-option, -march=armv7-a, -march=armv5) |
11 | 11 | PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV7) |
12 | 12 | |
13 | -# SEE README.arm-unaligned-accesses | |
13 | +# On supported platforms we set the bit which causes us to trap on unaligned | |
14 | +# memory access. This is the opposite of what the compiler expects to be | |
15 | +# the default so we must pass in -mno-unaligned-access so that it is aware | |
16 | +# of our decision. | |
14 | 17 | PF_NO_UNALIGNED := $(call cc-option, -mno-unaligned-access,) |
15 | -PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED) | |
18 | +PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED) | |
16 | 19 | |
17 | 20 | ifneq ($(CONFIG_IMX_CONFIG),) |
18 | 21 | ifdef CONFIG_SPL |
arch/arm/cpu/armv8/config.mk
... | ... | @@ -6,11 +6,8 @@ |
6 | 6 | # |
7 | 7 | PLATFORM_RELFLAGS += -fno-common -ffixed-x18 |
8 | 8 | |
9 | -# SEE README.arm-unaligned-accesses | |
10 | -PF_NO_UNALIGNED := $(call cc-option, -mstrict-align) | |
11 | -PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED) | |
12 | - | |
13 | 9 | PF_CPPFLAGS_ARMV8 := $(call cc-option, -march=armv8-a) |
10 | +PF_NO_UNALIGNED := $(call cc-option, -mstrict-align) | |
14 | 11 | PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV8) |
15 | 12 | PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED) |
arch/arm/lib/interrupts.c
common/Makefile
doc/README.arm-unaligned-accesses
1 | -If you are reading this because of a data abort: the following MIGHT | |
2 | -be relevant to your abort, if it was caused by an alignment violation. | |
3 | -In order to determine this, use the PC from the abort dump along with | |
4 | -an objdump -s -S of the u-boot ELF binary to locate the function where | |
5 | -the abort happened; then compare this function with the examples below. | |
6 | -If they match, then you've been hit with a compiler generated unaligned | |
7 | -access, and you should rewrite your code or add -mno-unaligned-access | |
8 | -to the command line of the offending file. | |
9 | - | |
10 | -Note that the PC shown in the abort message is relocated. In order to | |
11 | -be able to match it to an address in the ELF binary dump, you will need | |
12 | -to know the relocation offset. If your target defines CONFIG_CMD_BDI | |
13 | -and if you can get to the prompt and enter commands before the abort | |
14 | -happens, then command "bdinfo" will give you the offset. Otherwise you | |
15 | -will need to try a build with DEBUG set, which will display the offset, | |
16 | -or use a debugger and set a breakpoint at relocate_code() to see the | |
17 | -offset (passed as an argument). | |
18 | - | |
19 | -* | |
20 | - | |
21 | -Since U-Boot runs on a variety of hardware, some only able to perform | |
22 | -unaligned accesses with a strong penalty, some unable to perform them | |
23 | -at all, the policy regarding unaligned accesses is to not perform any, | |
24 | -unless absolutely necessary because of hardware or standards. | |
25 | - | |
26 | -Also, on hardware which permits it, the core is configured to throw | |
27 | -data abort exceptions on unaligned accesses in order to catch these | |
28 | -unallowed accesses as early as possible. | |
29 | - | |
30 | -Until version 4.7, the gcc default for performing unaligned accesses | |
31 | -(-mno-unaligned-access) is to emulate unaligned accesses using aligned | |
32 | -loads and stores plus shifts and masks. Emulated unaligned accesses | |
33 | -will not be caught by hardware. These accesses may be costly and may | |
34 | -be actually unnecessary. In order to catch these accesses and remove | |
35 | -or optimize them, option -munaligned-access is explicitly set for all | |
36 | -versions of gcc which support it. | |
37 | - | |
38 | -From gcc 4.7 onward starting at armv7 architectures, the default for | |
39 | -performing unaligned accesses is to use unaligned native loads and | |
40 | -stores (-munaligned-access), because the cost of unaligned accesses | |
41 | -has dropped on armv7 and beyond. This should not affect U-Boot's | |
42 | -policy of controlling unaligned accesses, however the compiler may | |
43 | -generate uncontrolled unaligned accesses on its own in at least one | |
44 | -known case: when declaring a local initialized char array, e.g. | |
45 | - | |
46 | -function foo() | |
47 | -{ | |
48 | - char buffer[] = "initial value"; | |
49 | -/* or */ | |
50 | - char buffer[] = { 'i', 'n', 'i', 't', 0 }; | |
51 | - ... | |
52 | -} | |
53 | - | |
54 | -Under -munaligned-access with optimizations on, this declaration | |
55 | -causes the compiler to generate native loads from the literal string | |
56 | -and native stores to the buffer, and the literal string alignment | |
57 | -cannot be controlled. If it is misaligned, then the core will throw | |
58 | -a data abort exception. | |
59 | - | |
60 | -Quite probably the same might happen for 16-bit array initializations | |
61 | -where the constant is aligned on a boundary which is a multiple of 2 | |
62 | -but not of 4: | |
63 | - | |
64 | -function foo() | |
65 | -{ | |
66 | - u16 buffer[] = { 1, 2, 3 }; | |
67 | - ... | |
68 | -} | |
69 | - | |
70 | -The long term solution to this issue is to add an option to gcc to | |
71 | -allow controlling the general alignment of data, including constant | |
72 | -initialization values. | |
73 | - | |
74 | -However this will only apply to the version of gcc which will have such | |
75 | -an option. For other versions, there are four workarounds: | |
76 | - | |
77 | -a) Enforce as a rule that array initializations as described above | |
78 | - are forbidden. This is generally not acceptable as they are valid, | |
79 | - and usual, C constructs. The only case where they could be rejected | |
80 | - is when they actually equate to a const char* declaration, i.e. the | |
81 | - array is initialized and never modified in the function's scope. | |
82 | - | |
83 | -b) Drop the requirement on unaligned accesses at least for ARMv7, | |
84 | - i.e. do not throw a data abort exception upon unaligned accesses. | |
85 | - But that will allow adding badly aligned code to U-Boot, only for | |
86 | - it to fail when re-used with a stricter target, possibly once the | |
87 | - bad code is already in mainline. | |
88 | - | |
89 | -c) Relax the -munaligned-access rule globally. This will prevent native | |
90 | - unaligned accesses of course, but that will also hide any bug caused | |
91 | - by a bad unaligned access, making it much harder to diagnose it. It | |
92 | - is actually what already happens when building ARM targets with a | |
93 | - pre-4.7 gcc, and it may actually already hide some bugs yet unseen | |
94 | - until the target gets compiled with -munaligned-access. | |
95 | - | |
96 | -d) Relax the -munaligned-access rule only for files susceptible to | |
97 | - the local initialized array issue and for armv7 architectures and | |
98 | - beyond. This minimizes the quantity of code which can hide unwanted | |
99 | - misaligned accesses. | |
100 | - | |
101 | -The option retained is d). | |
102 | - | |
103 | -Considering that actual occurrences of the issue are rare (as of this | |
104 | -writing, 5 files out of 7840 in U-Boot, or .3%, contain an initialized | |
105 | -local char array which cannot actually be replaced with a const char*), | |
106 | -contributors should not be required to systematically try and detect | |
107 | -the issue in their patches. | |
108 | - | |
109 | -Detecting files susceptible to the issue can be automated through a | |
110 | -filter installed as a hook in .git which recognizes local char array | |
111 | -initializations. Automation should err on the false positive side, for | |
112 | -instance flagging non-local arrays as if they were local if they cannot | |
113 | -be told apart. | |
114 | - | |
115 | -In any case, detection shall not prevent committing the patch, but | |
116 | -shall pre-populate the commit message with a note to the effect that | |
117 | -this patch contains an initialized local char or 16-bit array and thus | |
118 | -should be protected from the gcc 4.7 issue. | |
119 | - | |
120 | -Upon a positive detection, either $(PLATFORM_NO_UNALIGNED) should be | |
121 | -added to CFLAGS for the affected file(s), or if the array is a pseudo | |
122 | -const char*, it should be replaced by an actual one. |
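
The "pseudo const char*" case mentioned in the removed README can be pictured with the minimal sketch below. The function names are hypothetical. In the first function the literal has to be copied into a stack buffer on entry, which is the copy the removed text was concerned about; in the second, the pointer refers to the literal directly and no copy is needed.

```c
#include <string.h>

/* Initialized local array: the literal is copied into 'banner' on entry. */
static size_t banner_len_array(void)
{
	char banner[] = "initial value";

	return strlen(banner);
}

/*
 * The array above is never modified, so it can be replaced by a pointer
 * to the literal itself, which avoids the copy altogether.
 */
static size_t banner_len_pointer(void)
{
	const char *banner = "initial value";

	return strlen(banner);
}
```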
doc/README.unaligned-memory-access.txt
1 | +Editor's note: This document is _heavily_ cribbed from the Linux Kernel, with | |
2 | +really only the section about "Alignment vs. Networking" removed. | |
3 | + | |
4 | +UNALIGNED MEMORY ACCESSES | |
5 | +========================= | |
6 | + | |
7 | +Linux runs on a wide variety of architectures which have varying behaviour | |
8 | +when it comes to memory access. This document presents some details about | |
9 | +unaligned accesses, why you need to write code that doesn't cause them, | |
10 | +and how to write such code! | |
11 | + | |
12 | + | |
13 | +The definition of an unaligned access | |
14 | +===================================== | |
15 | + | |
16 | +Unaligned memory accesses occur when you try to read N bytes of data starting | |
17 | +from an address that is not evenly divisible by N (i.e. addr % N != 0). | |
18 | +For example, reading 4 bytes of data from address 0x10004 is fine, but | |
19 | +reading 4 bytes of data from address 0x10005 would be an unaligned memory | |
20 | +access. | |
21 | + | |
22 | +The above may seem a little vague, as memory access can happen in different | |
23 | +ways. The context here is at the machine code level: certain instructions read | |
24 | +or write a number of bytes to or from memory (e.g. movb, movw, movl in x86 | |
25 | +assembly). As will become clear, it is relatively easy to spot C statements | |
26 | +which will compile to multiple-byte memory access instructions, namely when | |
27 | +dealing with types such as u16, u32 and u64. | |
28 | + | |
29 | + | |
30 | +Natural alignment | |
31 | +================= | |
32 | + | |
33 | +The rule mentioned above forms what we refer to as natural alignment: | |
34 | +When accessing N bytes of memory, the base memory address must be evenly | |
35 | +divisible by N, i.e. addr % N == 0. | |
36 | + | |
37 | +When writing code, assume the target architecture has natural alignment | |
38 | +requirements. | |
39 | + | |
40 | +In reality, only a few architectures require natural alignment on all sizes | |
41 | +of memory access. However, we must consider ALL supported architectures; | |
42 | +writing code that satisfies natural alignment requirements is the easiest way | |
43 | +to achieve full portability. | |
44 | + | |
45 | + | |
46 | +Why unaligned access is bad | |
47 | +=========================== | |
48 | + | |
49 | +The effects of performing an unaligned memory access vary from architecture | |
50 | +to architecture. It would be easy to write a whole document on the differences | |
51 | +here; a summary of the common scenarios is presented below: | |
52 | + | |
53 | + - Some architectures are able to perform unaligned memory accesses | |
54 | + transparently, but there is usually a significant performance cost. | |
55 | + - Some architectures raise processor exceptions when unaligned accesses | |
56 | + happen. The exception handler is able to correct the unaligned access, | |
57 | + at significant cost to performance. | |
58 | + - Some architectures raise processor exceptions when unaligned accesses | |
59 | + happen, but the exceptions do not contain enough information for the | |
60 | + unaligned access to be corrected. | |
61 | + - Some architectures are not capable of unaligned memory access, but will | |
62 | + silently perform a different memory access to the one that was requested, | |
63 | + resulting in a subtle code bug that is hard to detect! | |
64 | + | |
65 | +It should be obvious from the above that if your code causes unaligned | |
66 | +memory accesses to happen, your code will not work correctly on certain | |
67 | +platforms and will cause performance problems on others. | |
68 | + | |
69 | + | |
70 | +Code that does not cause unaligned access | |
71 | +========================================= | |
72 | + | |
73 | +At first, the concepts above may seem a little hard to relate to actual | |
74 | +coding practice. After all, you don't have a great deal of control over | |
75 | +memory addresses of certain variables, etc. | |
76 | + | |
77 | +Fortunately things are not too complex, as in most cases, the compiler | |
78 | +ensures that things will work for you. For example, take the following | |
79 | +structure: | |
80 | + | |
81 | + struct foo { | |
82 | + u16 field1; | |
83 | + u32 field2; | |
84 | + u8 field3; | |
85 | + }; | |
86 | + | |
87 | +Let us assume that an instance of the above structure resides in memory | |
88 | +starting at address 0x10000. With a basic level of understanding, it would | |
89 | +not be unreasonable to expect that accessing field2 would cause an unaligned | |
90 | +access. You'd be expecting field2 to be located at offset 2 bytes into the | |
91 | +structure, i.e. address 0x10002, but that address is not evenly divisible | |
92 | +by 4 (remember, we're reading a 4 byte value here). | |
93 | + | |
94 | +Fortunately, the compiler understands the alignment constraints, so in the | |
95 | +above case it would insert 2 bytes of padding in between field1 and field2. | |
96 | +Therefore, for standard structure types you can always rely on the compiler | |
97 | +to pad structures so that accesses to fields are suitably aligned (assuming | |
98 | +you do not cast the field to a type of different length). | |
99 | + | |
100 | +Similarly, you can also rely on the compiler to align variables and function | |
101 | +parameters to a naturally aligned scheme, based on the size of the type of | |
102 | +the variable. | |
103 | + | |
104 | +At this point, it should be clear that accessing a single byte (u8 or char) | |
105 | +will never cause an unaligned access, because all memory addresses are evenly | |
106 | +divisible by one. | |
107 | + | |
108 | +On a related topic, with the above considerations in mind you may observe | |
109 | +that you could reorder the fields in the structure in order to place fields | |
110 | +where padding would otherwise be inserted, and hence reduce the overall | |
111 | +resident memory size of structure instances. The optimal layout of the | |
112 | +above example is: | |
113 | + | |
114 | + struct foo { | |
115 | + u32 field2; | |
116 | + u16 field1; | |
117 | + u8 field3; | |
118 | + }; | |
119 | + | |
120 | +For a natural alignment scheme, the compiler would only have to add a single | |
121 | +byte of padding at the end of the structure. This padding is added in order | |
122 | +to satisfy alignment constraints for arrays of these structures. | |
123 | + | |
124 | +Another point worth mentioning is the use of __attribute__((packed)) on a | |
125 | +structure type. This GCC-specific attribute tells the compiler never to | |
126 | +insert any padding within structures, useful when you want to use a C struct | |
127 | +to represent some data that comes in a fixed arrangement 'off the wire'. | |
128 | + | |
129 | +You might be inclined to believe that usage of this attribute can easily | |
130 | +lead to unaligned accesses when accessing fields that do not satisfy | |
131 | +architectural alignment requirements. However, again, the compiler is aware | |
132 | +of the alignment constraints and will generate extra instructions to perform | |
133 | +the memory access in a way that does not cause unaligned access. Of course, | |
134 | +the extra instructions obviously cause a loss in performance compared to the | |
135 | +non-packed case, so the packed attribute should only be used when avoiding | |
136 | +structure padding is of importance. | |
137 | + | |
138 | + | |
139 | +Code that causes unaligned access | |
140 | +================================= | |
141 | + | |
142 | +With the above in mind, let's move onto a real life example of a function | |
143 | +that can cause an unaligned memory access. The following function taken | |
144 | +from the Linux Kernel's include/linux/etherdevice.h is an optimized routine | |
145 | +to compare two ethernet MAC addresses for equality. | |
146 | + | |
147 | +bool ether_addr_equal(const u8 *addr1, const u8 *addr2) | |
148 | +{ | |
149 | +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS | |
150 | + u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) | | |
151 | + ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4))); | |
152 | + | |
153 | + return fold == 0; | |
154 | +#else | |
155 | + const u16 *a = (const u16 *)addr1; | |
156 | + const u16 *b = (const u16 *)addr2; | |
157 | + return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0; | |
158 | +#endif | |
159 | +} | |
160 | + | |
161 | +In the above function, when the hardware has efficient unaligned access | |
162 | +capability, there is no issue with this code. But when the hardware isn't | |
163 | +able to access memory on arbitrary boundaries, the reference to a[0] causes | |
164 | +2 bytes (16 bits) to be read from memory starting at address addr1. | |
165 | + | |
166 | +Think about what would happen if addr1 was an odd address such as 0x10003. | |
167 | +(Hint: it'd be an unaligned access.) | |
168 | + | |
169 | +Despite the potential unaligned access problems with the above function, it | |
170 | +is included in the kernel anyway but is understood to only work normally on | |
171 | +16-bit-aligned addresses. It is up to the caller to ensure this alignment or | |
172 | +not use this function at all. This alignment-unsafe function is still useful | |
173 | +as it is a decent optimization for the cases when you can ensure alignment, | |
174 | +which is true almost all of the time in ethernet networking context. | |
175 | + | |
176 | + | |
177 | +Here is another example of some code that could cause unaligned accesses: | |
178 | + void myfunc(u8 *data, u32 value) | |
179 | + { | |
180 | + [...] | |
181 | + *((u32 *) data) = cpu_to_le32(value); | |
182 | + [...] | |
183 | + } | |
184 | + | |
185 | +This code will cause unaligned accesses every time the data parameter points | |
186 | +to an address that is not evenly divisible by 4. | |
187 | + | |
188 | +In summary, the 2 main scenarios where you may run into unaligned access | |
189 | +problems involve: | |
190 | + 1. Casting variables to types of different lengths | |
191 | + 2. Pointer arithmetic followed by access to at least 2 bytes of data | |
192 | + | |
193 | + | |
194 | +Avoiding unaligned accesses | |
195 | +=========================== | |
196 | + | |
197 | +The easiest way to avoid unaligned access is to use the get_unaligned() and | |
198 | +put_unaligned() macros provided by the <asm/unaligned.h> header file. | |
199 | + | |
200 | +Going back to an earlier example of code that potentially causes unaligned | |
201 | +access: | |
202 | + | |
203 | + void myfunc(u8 *data, u32 value) | |
204 | + { | |
205 | + [...] | |
206 | + *((u32 *) data) = cpu_to_le32(value); | |
207 | + [...] | |
208 | + } | |
209 | + | |
210 | +To avoid the unaligned memory access, you would rewrite it as follows: | |
211 | + | |
212 | + void myfunc(u8 *data, u32 value) | |
213 | + { | |
214 | + [...] | |
215 | + value = cpu_to_le32(value); | |
216 | + put_unaligned(value, (u32 *) data); | |
217 | + [...] | |
218 | + } | |
219 | + | |
220 | +The get_unaligned() macro works similarly. Assuming 'data' is a pointer to | |
221 | +memory and you wish to avoid unaligned access, its usage is as follows: | |
222 | + | |
223 | + u32 value = get_unaligned((u32 *) data); | |
224 | + | |
225 | +These macros work for memory accesses of any length (not just 32 bits as | |
226 | +in the examples above). Be aware that when compared to standard access of | |
227 | +aligned memory, using these macros to access unaligned memory can be costly in | |
228 | +terms of performance. | |
229 | + | |
230 | +If use of such macros is not convenient, another option is to use memcpy(), | |
231 | +where the source or destination (or both) are of type u8* or unsigned char*. | |
232 | +Due to the byte-wise nature of this operation, unaligned accesses are avoided. | |
233 | + | |
234 | +-- | |
235 | +In the Linux Kernel, | |
236 | +Authors: Daniel Drake <dsd@gentoo.org>, | |
237 | + Johannes Berg <johannes@sipsolutions.net> | |
238 | +With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt, | |
239 | +Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz, | |
240 | +Vadim Lobanov |
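
The memcpy() alternative described at the end of the added README can be sketched in portable C as follows. The example_* helpers are hypothetical stand-ins written for this illustration; they are not the get_unaligned()/put_unaligned() macros from <asm/unaligned.h>. The first pair moves a native-endian value through memcpy(); the second pair fixes the byte order to little-endian by handling one byte at a time.

```c
#include <stdint.h>
#include <string.h>

/* Read a native-endian u32 from an address with no alignment guarantee. */
static uint32_t example_load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Write a native-endian u32 to an address with no alignment guarantee. */
static void example_store_u32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Write a u32 in little-endian byte order, one byte at a time. */
static void example_store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian u32, one byte at a time. */
static uint32_t example_load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Because every access above is a single byte or goes through memcpy(), none of these helpers depends on the alignment of the pointer it is given.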
fs/ubifs/Makefile
lib/Makefile