Commit 1551df35f296f0a8df32f4f2054254f46e8be252

Authored by Tom Rini
Committed by Albert ARIBAUD
1 parent f503cc49a5

arm: Switch to -mno-unaligned-access when supported by the compiler

When we tell the compiler to optimize for ARMv7 (and ARMv6 for that
matter) it assumes a default of SCTLR.A being cleared and unaligned
accesses being allowed and fast at the hardware level.  We set this bit
and must pass along -mno-unaligned-access so that the compiler will
still break down accesses and not trigger a data abort.

To help explain the project's requirements with respect to unaligned
memory access, the kernel's Documentation/unaligned-memory-access.txt
file, taken from the v3.14-rc1 tag, has been added as
doc/README.unaligned-memory-access.txt.

Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
Cc: Mans Rullgard <mans@mansr.com>
Signed-off-by: Tom Rini <trini@ti.com>

Showing 9 changed files with 248 additions and 138 deletions

... ... @@ -1726,7 +1726,7 @@
1726 1726  
1727 1727 If this option is set, then U-Boot will prevent the environment
1728 1728 variable "splashimage" from being set to a problematic address
1729   - (see README.displaying-bmps and README.arm-unaligned-accesses).
  1729 + (see README.displaying-bmps).
1730 1730 This option is useful for targets where, due to alignment
1731 1731 restrictions, an improperly aligned BMP image will cause a data
1732 1732 abort. If you think you will not have problems with unaligned
arch/arm/cpu/armv7/config.mk
... ... @@ -10,9 +10,12 @@
10 10 PF_CPPFLAGS_ARMV7 := $(call cc-option, -march=armv7-a, -march=armv5)
11 11 PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV7)
12 12  
13   -# SEE README.arm-unaligned-accesses
  13 +# On supported platforms we set the bit which causes us to trap on unaligned
  14 +# memory access. This is the opposite of what the compiler expects to be
  15 +# the default so we must pass in -mno-unaligned-access so that it is aware
  16 +# of our decision.
14 17 PF_NO_UNALIGNED := $(call cc-option, -mno-unaligned-access,)
15   -PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED)
  18 +PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED)
16 19  
17 20 ifneq ($(CONFIG_IMX_CONFIG),)
18 21 ifdef CONFIG_SPL
arch/arm/cpu/armv8/config.mk
... ... @@ -6,11 +6,8 @@
6 6 #
7 7 PLATFORM_RELFLAGS += -fno-common -ffixed-x18
8 8  
9   -# SEE README.arm-unaligned-accesses
10   -PF_NO_UNALIGNED := $(call cc-option, -mstrict-align)
11   -PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED)
12   -
13 9 PF_CPPFLAGS_ARMV8 := $(call cc-option, -march=armv8-a)
  10 +PF_NO_UNALIGNED := $(call cc-option, -mstrict-align)
14 11 PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV8)
15 12 PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED)
arch/arm/lib/interrupts.c
... ... @@ -153,7 +153,7 @@
153 153  
154 154 void do_data_abort (struct pt_regs *pt_regs)
155 155 {
156   - printf ("data abort\n\n MAYBE you should read doc/README.arm-unaligned-accesses\n\n");
  156 + printf ("data abort\n");
157 157 show_regs (pt_regs);
158 158 bad_mode ();
159 159 }
... ... @@ -239,6 +239,4 @@
239 239 obj-y += stdio.o
240 240  
241 241 CFLAGS_env_embedded.o := -Wa,--no-warn -DENV_CRC=$(shell tools/envcrc 2>/dev/null)
242   -CFLAGS_hush.o := $(PLATFORM_NO_UNALIGNED)
243   -CFLAGS_fdt_support.o := $(PLATFORM_NO_UNALIGNED)
doc/README.arm-unaligned-accesses
1   -If you are reading this because of a data abort: the following MIGHT
2   -be relevant to your abort, if it was caused by an alignment violation.
3   -In order to determine this, use the PC from the abort dump along with
4   -an objdump -s -S of the u-boot ELF binary to locate the function where
5   -the abort happened; then compare this function with the examples below.
6   -If they match, then you've been hit with a compiler generated unaligned
7   -access, and you should rewrite your code or add -mno-unaligned-access
8   -to the command line of the offending file.
9   -
10   -Note that the PC shown in the abort message is relocated. In order to
11   -be able to match it to an address in the ELF binary dump, you will need
12   -to know the relocation offset. If your target defines CONFIG_CMD_BDI
13   -and if you can get to the prompt and enter commands before the abort
14   -happens, then command "bdinfo" will give you the offset. Otherwise you
15   -will need to try a build with DEBUG set, which will display the offset,
16   -or use a debugger and set a breakpoint at relocate_code() to see the
17   -offset (passed as an argument).
18   -
19   -*
20   -
21   -Since U-Boot runs on a variety of hardware, some only able to perform
22   -unaligned accesses with a strong penalty, some unable to perform them
23   -at all, the policy regarding unaligned accesses is to not perform any,
24   -unless absolutely necessary because of hardware or standards.
25   -
26   -Also, on hardware which permits it, the core is configured to throw
27   -data abort exceptions on unaligned accesses in order to catch these
28   -unallowed accesses as early as possible.
29   -
30   -Until version 4.7, the gcc default for performing unaligned accesses
31   -(-mno-unaligned-access) is to emulate unaligned accesses using aligned
32   -loads and stores plus shifts and masks. Emulated unaligned accesses
33   -will not be caught by hardware. These accesses may be costly and may
34   -be actually unnecessary. In order to catch these accesses and remove
35   -or optimize them, option -munaligned-access is explicitly set for all
36   -versions of gcc which support it.
37   -
38   -From gcc 4.7 onward starting at armv7 architectures, the default for
39   -performing unaligned accesses is to use unaligned native loads and
40   -stores (-munaligned-access), because the cost of unaligned accesses
41   -has dropped on armv7 and beyond. This should not affect U-Boot's
42   -policy of controlling unaligned accesses, however the compiler may
43   -generate uncontrolled unaligned accesses on its own in at least one
44   -known case: when declaring a local initialized char array, e.g.
45   -
46   -function foo()
47   -{
48   - char buffer[] = "initial value";
49   -/* or */
50   - char buffer[] = { 'i', 'n', 'i', 't', 0 };
51   - ...
52   -}
53   -
54   -Under -munaligned-accesses with optimizations on, this declaration
55   -causes the compiler to generate native loads from the literal string
56   -and native stores to the buffer, and the literal string alignment
57   -cannot be controlled. If it is misaligned, then the core will throw
58   -a data abort exception.
59   -
60   -Quite probably the same might happen for 16-bit array initializations
61   -where the constant is aligned on a boundary which is a multiple of 2
62   -but not of 4:
63   -
64   -function foo()
65   -{
66   - u16 buffer[] = { 1, 2, 3 };
67   - ...
68   -}
69   -
70   -The long term solution to this issue is to add an option to gcc to
71   -allow controlling the general alignment of data, including constant
72   -initialization values.
73   -
74   -However this will only apply to the version of gcc which will have such
75   -an option. For other versions, there are four workarounds:
76   -
77   -a) Enforce as a rule that array initializations as described above
78   - are forbidden. This is generally not acceptable as they are valid,
79   - and usual, C constructs. The only case where they could be rejected
80   - is when they actually equate to a const char* declaration, i.e. the
81   - array is initialized and never modified in the function's scope.
82   -
83   -b) Drop the requirement on unaligned accesses at least for ARMv7,
84   - i.e. do not throw a data abort exception upon unaligned accesses.
85   - But that will allow adding badly aligned code to U-Boot, only for
86   - it to fail when re-used with a stricter target, possibly once the
87   - bad code is already in mainline.
88   -
89   -c) Relax the -munaligned-access rule globally. This will prevent native
90   - unaligned accesses of course, but that will also hide any bug caused
91   - by a bad unaligned access, making it much harder to diagnose it. It
92   - is actually what already happens when building ARM targets with a
93   - pre-4.7 gcc, and it may actually already hide some bugs yet unseen
94   - until the target gets compiled with -munaligned-access.
95   -
96   -d) Relax the -munaligned-access rule only for for files susceptible to
97   - the local initialized array issue and for armv7 architectures and
98   - beyond. This minimizes the quantity of code which can hide unwanted
99   - misaligned accesses.
100   -
101   -The option retained is d).
102   -
103   -Considering that actual occurrences of the issue are rare (as of this
104   -writing, 5 files out of 7840 in U-Boot, or .3%, contain an initialized
105   -local char array which cannot actually be replaced with a const char*),
106   -contributors should not be required to systematically try and detect
107   -the issue in their patches.
108   -
109   -Detecting files susceptible to the issue can be automated through a
110   -filter installed as a hook in .git which recognizes local char array
111   -initializations. Automation should err on the false positive side, for
112   -instance flagging non-local arrays as if they were local if they cannot
113   -be told apart.
114   -
115   -In any case, detection shall not prevent committing the patch, but
116   -shall pre-populate the commit message with a note to the effect that
117   -this patch contains an initialized local char or 16-bit array and thus
118   -should be protected from the gcc 4.7 issue.
119   -
120   -Upon a positive detection, either $(PLATFORM_NO_UNALIGNED) should be
121   -added to CFLAGS for the affected file(s), or if the array is a pseudo
122   -const char*, it should be replaced by an actual one.
doc/README.unaligned-memory-access.txt
  1 +Editor's note: This document is _heavily_ cribbed from the Linux Kernel, with
  2 +really only the section about "Alignment vs. Networking" removed.
  3 +
  4 +UNALIGNED MEMORY ACCESSES
  5 +=========================
  6 +
  7 +Linux runs on a wide variety of architectures which have varying behaviour
  8 +when it comes to memory access. This document presents some details about
  9 +unaligned accesses, why you need to write code that doesn't cause them,
  10 +and how to write such code!
  11 +
  12 +
  13 +The definition of an unaligned access
  14 +=====================================
  15 +
  16 +Unaligned memory accesses occur when you try to read N bytes of data starting
  17 +from an address that is not evenly divisible by N (i.e. addr % N != 0).
  18 +For example, reading 4 bytes of data from address 0x10004 is fine, but
  19 +reading 4 bytes of data from address 0x10005 would be an unaligned memory
  20 +access.
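As a rough illustration of the definition above, the check can be written as a small C helper. The name is_naturally_aligned() is hypothetical and is not part of U-Boot or the kernel; it only restates the addr % N test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a U-Boot or kernel API): returns true when an
 * access of `size` bytes starting at `addr` is naturally aligned, i.e.
 * addr % size == 0. `size` must be non-zero.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
	return (addr % size) == 0;
}
```

With this helper, a 4-byte access at 0x10004 passes the check and one at 0x10005 does not, matching the example in the text. A 1-byte access passes at any address.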
  21 +
  22 +The above may seem a little vague, as memory access can happen in different
  23 +ways. The context here is at the machine code level: certain instructions read
  24 +or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
  25 +assembly). As will become clear, it is relatively easy to spot C statements
  26 +which will compile to multiple-byte memory access instructions, namely when
  27 +dealing with types such as u16, u32 and u64.
  28 +
  29 +
  30 +Natural alignment
  31 +=================
  32 +
  33 +The rule mentioned above forms what we refer to as natural alignment:
  34 +When accessing N bytes of memory, the base memory address must be evenly
  35 +divisible by N, i.e. addr % N == 0.
  36 +
  37 +When writing code, assume the target architecture has natural alignment
  38 +requirements.
  39 +
  40 +In reality, only a few architectures require natural alignment on all sizes
  41 +of memory access. However, we must consider ALL supported architectures;
  42 +writing code that satisfies natural alignment requirements is the easiest way
  43 +to achieve full portability.
  44 +
  45 +
  46 +Why unaligned access is bad
  47 +===========================
  48 +
  49 +The effects of performing an unaligned memory access vary from architecture
  50 +to architecture. It would be easy to write a whole document on the differences
  51 +here; a summary of the common scenarios is presented below:
  52 +
  53 + - Some architectures are able to perform unaligned memory accesses
  54 + transparently, but there is usually a significant performance cost.
  55 + - Some architectures raise processor exceptions when unaligned accesses
  56 + happen. The exception handler is able to correct the unaligned access,
  57 + at significant cost to performance.
  58 + - Some architectures raise processor exceptions when unaligned accesses
  59 + happen, but the exceptions do not contain enough information for the
  60 + unaligned access to be corrected.
  61 + - Some architectures are not capable of unaligned memory access, but will
  62 + silently perform a different memory access to the one that was requested,
  63 + resulting in a subtle code bug that is hard to detect!
  64 +
  65 +It should be obvious from the above that if your code causes unaligned
  66 +memory accesses to happen, your code will not work correctly on certain
  67 +platforms and will cause performance problems on others.
  68 +
  69 +
  70 +Code that does not cause unaligned access
  71 +=========================================
  72 +
  73 +At first, the concepts above may seem a little hard to relate to actual
  74 +coding practice. After all, you don't have a great deal of control over
  75 +memory addresses of certain variables, etc.
  76 +
  77 +Fortunately things are not too complex, as in most cases, the compiler
  78 +ensures that things will work for you. For example, take the following
  79 +structure:
  80 +
  81 + struct foo {
  82 + u16 field1;
  83 + u32 field2;
  84 + u8 field3;
  85 + };
  86 +
  87 +Let us assume that an instance of the above structure resides in memory
  88 +starting at address 0x10000. With a basic level of understanding, it would
  89 +not be unreasonable to expect that accessing field2 would cause an unaligned
  90 +access. You'd be expecting field2 to be located at offset 2 bytes into the
  91 +structure, i.e. address 0x10002, but that address is not evenly divisible
  92 +by 4 (remember, we're reading a 4 byte value here).
  93 +
  94 +Fortunately, the compiler understands the alignment constraints, so in the
  95 +above case it would insert 2 bytes of padding in between field1 and field2.
  96 +Therefore, for standard structure types you can always rely on the compiler
  97 +to pad structures so that accesses to fields are suitably aligned (assuming
  98 +you do not cast the field to a type of different length).
  99 +
  100 +Similarly, you can also rely on the compiler to align variables and function
  101 +parameters to a naturally aligned scheme, based on the size of the type of
  102 +the variable.
  103 +
  104 +At this point, it should be clear that accessing a single byte (u8 or char)
  105 +will never cause an unaligned access, because all memory addresses are evenly
  106 +divisible by one.
  107 +
  108 +On a related topic, with the above considerations in mind you may observe
  109 +that you could reorder the fields in the structure in order to place fields
  110 +where padding would otherwise be inserted, and hence reduce the overall
  111 +resident memory size of structure instances. The optimal layout of the
  112 +above example is:
  113 +
  114 + struct foo {
  115 + u32 field2;
  116 + u16 field1;
  117 + u8 field3;
  118 + };
  119 +
  120 +For a natural alignment scheme, the compiler would only have to add a single
  121 +byte of padding at the end of the structure. This padding is added in order
  122 +to satisfy alignment constraints for arrays of these structures.
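The two layouts discussed above can be compared with offsetof() and sizeof. This is a minimal sketch that assumes an ABI with natural alignment for these types (as on common ARM and x86 toolchains). The struct names foo_original and foo_reordered and the local typedefs are introduced here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the u8/u16/u32 types used in the text. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Layout from the first example: padding is needed before field2. */
struct foo_original {
	u16 field1;	/* offset 0, then 2 bytes of padding */
	u32 field2;	/* offset 4 */
	u8  field3;	/* offset 8, then 3 bytes of tail padding */
};

/* Reordered layout: only tail padding remains. */
struct foo_reordered {
	u32 field2;	/* offset 0 */
	u16 field1;	/* offset 4 */
	u8  field3;	/* offset 6, then 1 byte of tail padding */
};
```

Under that assumption, sizeof(struct foo_original) is 12 and sizeof(struct foo_reordered) is 8, while field2 is 4-byte aligned in both.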
  123 +
  124 +Another point worth mentioning is the use of __attribute__((packed)) on a
  125 +structure type. This GCC-specific attribute tells the compiler never to
  126 +insert any padding within structures, useful when you want to use a C struct
  127 +to represent some data that comes in a fixed arrangement 'off the wire'.
  128 +
  129 +You might be inclined to believe that usage of this attribute can easily
  130 +lead to unaligned accesses when accessing fields that do not satisfy
  131 +architectural alignment requirements. However, again, the compiler is aware
  132 +of the alignment constraints and will generate extra instructions to perform
  133 +the memory access in a way that does not cause unaligned access. Of course,
  134 +the extra instructions obviously cause a loss in performance compared to the
  135 +non-packed case, so the packed attribute should only be used when avoiding
  136 +structure padding is of importance.
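A minimal sketch of the packed case, assuming GCC or a compatible compiler such as Clang. The names wire_hdr and wire_hdr_field2() are hypothetical. The point is that field2 sits at offset 2, yet reading it through the struct type lets the compiler choose an access sequence that is safe for the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the u8/u16/u32 types used in the text. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical on-the-wire header: no padding is inserted anywhere. */
struct wire_hdr {
	u16 field1;	/* offset 0 */
	u32 field2;	/* offset 2: not naturally aligned */
	u8  field3;	/* offset 6 */
} __attribute__((packed));

/*
 * Access the field through the packed struct type so the compiler knows
 * it may be misaligned and can split the load where the target needs it.
 */
static u32 wire_hdr_field2(const struct wire_hdr *h)
{
	return h->field2;
}
```

Note that taking the address of field2 and dereferencing it through a plain u32 pointer would discard that knowledge, so the access should stay within the struct type.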
  137 +
  138 +
  139 +Code that causes unaligned access
  140 +=================================
  141 +
  142 +With the above in mind, let's move onto a real life example of a function
  143 +that can cause an unaligned memory access. The following function taken
  144 +from the Linux Kernel's include/linux/etherdevice.h is an optimized routine
  145 +to compare two ethernet MAC addresses for equality.
  146 +
  147 +bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
  148 +{
  149 +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  150 + u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
  151 + ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
  152 +
  153 + return fold == 0;
  154 +#else
  155 + const u16 *a = (const u16 *)addr1;
  156 + const u16 *b = (const u16 *)addr2;
  157 + return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
  158 +#endif
  159 +}
  160 +
  161 +In the above function, when the hardware has efficient unaligned access
  162 +capability, there is no issue with this code. But when the hardware isn't
  163 +able to access memory on arbitrary boundaries, the reference to a[0] causes
  164 +2 bytes (16 bits) to be read from memory starting at address addr1.
  165 +
  166 +Think about what would happen if addr1 was an odd address such as 0x10003.
  167 +(Hint: it'd be an unaligned access.)
  168 +
  169 +Despite the potential unaligned access problems with the above function, it
  170 +is included in the kernel anyway but is understood to only work normally on
  171 +16-bit-aligned addresses. It is up to the caller to ensure this alignment or
  172 +not use this function at all. This alignment-unsafe function is still useful
  173 +as it is a decent optimization for the cases when you can ensure alignment,
  174 +which is true almost all of the time in ethernet networking context.
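For callers that cannot guarantee 16-bit alignment, a byte-wise comparison sidesteps the problem at some cost in speed. The sketch below is an assumption-laden illustration, not the kernel's or U-Boot's implementation; the name ether_addr_equal_bytewise() and the MAC_ADDR_LEN constant are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Length of an Ethernet MAC address in bytes (hypothetical local name). */
#define MAC_ADDR_LEN 6

/*
 * Hypothetical alignment-safe variant: memcmp() places no alignment
 * requirement on its arguments, so addr1 and addr2 may be odd addresses.
 */
static bool ether_addr_equal_bytewise(const uint8_t *addr1,
				      const uint8_t *addr2)
{
	return memcmp(addr1, addr2, MAC_ADDR_LEN) == 0;
}
```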
  175 +
  176 +
  177 +Here is another example of some code that could cause unaligned accesses:
  178 + void myfunc(u8 *data, u32 value)
  179 + {
  180 + [...]
  181 + *((u32 *) data) = cpu_to_le32(value);
  182 + [...]
  183 + }
  184 +
  185 +This code will cause unaligned accesses every time the data parameter points
  186 +to an address that is not evenly divisible by 4.
  187 +
  188 +In summary, the 2 main scenarios where you may run into unaligned access
  189 +problems involve:
  190 + 1. Casting variables to types of different lengths
  191 + 2. Pointer arithmetic followed by access to at least 2 bytes of data
  192 +
  193 +
  194 +Avoiding unaligned accesses
  195 +===========================
  196 +
  197 +The easiest way to avoid unaligned access is to use the get_unaligned() and
  198 +put_unaligned() macros provided by the <asm/unaligned.h> header file.
  199 +
  200 +Going back to an earlier example of code that potentially causes unaligned
  201 +access:
  202 +
  203 + void myfunc(u8 *data, u32 value)
  204 + {
  205 + [...]
  206 + *((u32 *) data) = cpu_to_le32(value);
  207 + [...]
  208 + }
  209 +
  210 +To avoid the unaligned memory access, you would rewrite it as follows:
  211 +
  212 + void myfunc(u8 *data, u32 value)
  213 + {
  214 + [...]
  215 + value = cpu_to_le32(value);
  216 + put_unaligned(value, (u32 *) data);
  217 + [...]
  218 + }
  219 +
  220 +The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
  221 +memory and you wish to avoid unaligned access, its usage is as follows:
  222 +
  223 + u32 value = get_unaligned((u32 *) data);
  224 +
  225 +These macros work for memory accesses of any length (not just 32 bits as
  226 +in the examples above). Be aware that when compared to standard access of
  227 +aligned memory, using these macros to access unaligned memory can be costly in
  228 +terms of performance.
  229 +
  230 +If use of such macros is not convenient, another option is to use memcpy(),
  231 +where the source or destination (or both) are of type u8* or unsigned char*.
  232 +Due to the byte-wise nature of this operation, unaligned accesses are avoided.
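The memcpy() approach can be sketched as a pair of small helpers. The names load_u32() and store_u32() are hypothetical and are not the get_unaligned()/put_unaligned() macros; values are kept in CPU byte order, so any endianness conversion still has to be done by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 32 bits from a possibly misaligned address. */
static uint32_t load_u32(const uint8_t *src)
{
	uint32_t value;

	memcpy(&value, src, sizeof(value));
	return value;
}

/* Hypothetical helper: write 32 bits to a possibly misaligned address. */
static void store_u32(uint8_t *dst, uint32_t value)
{
	memcpy(dst, &value, sizeof(value));
}
```

A compiler may expand such a small fixed-size memcpy() inline. On ARM, passing -mno-unaligned-access is what tells GCC not to use unaligned word accesses when it does so.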
  233 +
  234 +--
  235 +In the Linux Kernel,
  236 +Authors: Daniel Drake <dsd@gentoo.org>,
  237 + Johannes Berg <johannes@sipsolutions.net>
  238 +With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
  239 +Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
  240 +Vadim Lobanov
... ... @@ -13,7 +13,4 @@
13 13 obj-y += lpt_commit.o scan.o lprops.o
14 14 obj-y += tnc.o tnc_misc.o debug.o crc16.o budget.o
15 15 obj-y += log.o orphan.o recovery.o replay.o
16   -
17   -# SEE README.arm-unaligned-accesses
18   -CFLAGS_super.o := $(PLATFORM_NO_UNALIGNED)
... ... @@ -65,7 +65,4 @@
65 65 obj-$(CONFIG_RANDOM_MACADDR) += rand.o
66 66 obj-$(CONFIG_BOOTP_RANDOM_DELAY) += rand.o
67 67 obj-$(CONFIG_CMD_LINK_LOCAL) += rand.o
68   -
69   -# SEE README.arm-unaligned-accesses
70   -CFLAGS_bzlib.o := $(PLATFORM_NO_UNALIGNED)