arm: Switch to -mno-unaligned-access when supported by the compiler

When we tell the compiler to optimize for ARMv7 (and ARMv6 for that matter), it assumes that SCTLR.A is cleared by default and that unaligned accesses are allowed and fast at the hardware level. We set this bit, so we must pass -mno-unaligned-access so that the compiler still breaks down such accesses and does not trigger a data abort.

To help explain the requirements of the project with respect to unaligned memory access, the kernel's Documentation/unaligned-memory-access.txt file has been added as doc/README.unaligned-memory-access.txt; it is taken from the v3.14-rc1 tag of the kernel.

Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
Cc: Mans Rullgard <mans@mansr.com>
Signed-off-by: Tom Rini <trini@ti.com>
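
As a rough illustration of what "breaking down" an access means, the C sketch below assembles a 32-bit little-endian value from four single-byte loads, so no multi-byte load is ever issued at a possibly misaligned address. The helper name load_le32_bytewise is hypothetical and is not part of U-Boot; the code the compiler actually emits under -mno-unaligned-access depends on the compiler version and on what it knows about the access, so treat this only as a conceptual model.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a U-Boot function): read a 32-bit
 * little-endian value from an arbitrarily aligned address using
 * only byte loads. Byte loads are always naturally aligned, so
 * this cannot raise an alignment fault even when SCTLR.A is set.
 */
static uint32_t load_le32_bytewise(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Calling load_le32_bytewise(buf + 1) on a byte buffer reads bytes 1 to 4 of buf regardless of where buf itself is aligned.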

Tom Rini · Albert ARIBAUD
1 parent f503cc49a5
Showing 9 changed files with 248 additions and 138 deletions
README
arch/arm/cpu/armv7/config.mk
arch/arm/cpu/armv8/config.mk
arch/arm/lib/interrupts.c
common/Makefile
doc/README.arm-unaligned-accesses
doc/README.unaligned-memory-access.txt
fs/ubifs/Makefile
lib/Makefile
@@ -1726,7 +1726,7 @@
  
 		If this option is set, then U-Boot will prevent the environment
 		variable "splashimage" from being set to a problematic address
-		(see README.displaying-bmps and README.arm-unaligned-accesses).
+		(see README.displaying-bmps).
 		This option is useful for targets where, due to alignment
 		restrictions, an improperly aligned BMP image will cause a data
 		abort. If you think you will not have problems with unaligned
@@ -10,9 +10,12 @@
 PF_CPPFLAGS_ARMV7 := $(call cc-option, -march=armv7-a, -march=armv5)
 PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV7)
  
-# SEE README.arm-unaligned-accesses
+# On supported platforms we set the bit which causes us to trap on unaligned
+# memory access.  This is the opposite of what the compiler expects to be
+# the default so we must pass in -mno-unaligned-access so that it is aware
+# of our decision.
 PF_NO_UNALIGNED := $(call cc-option, -mno-unaligned-access,)
-PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED)
+PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED)
  
 ifneq ($(CONFIG_IMX_CONFIG),)
 ifdef CONFIG_SPL
@@ -6,11 +6,8 @@
 #
 PLATFORM_RELFLAGS += -fno-common -ffixed-x18
  
-# SEE README.arm-unaligned-accesses
-PF_NO_UNALIGNED := $(call cc-option, -mstrict-align)
-PLATFORM_NO_UNALIGNED := $(PF_NO_UNALIGNED)
-
 PF_CPPFLAGS_ARMV8 := $(call cc-option, -march=armv8-a)
+PF_NO_UNALIGNED := $(call cc-option, -mstrict-align)
 PLATFORM_CPPFLAGS += $(PF_CPPFLAGS_ARMV8)
 PLATFORM_CPPFLAGS += $(PF_NO_UNALIGNED)
@@ -153,7 +153,7 @@
  
 void do_data_abort (struct pt_regs *pt_regs)
 {
-	printf ("data abort\n\n    MAYBE you should read doc/README.arm-unaligned-accesses\n\n");
+	printf ("data abort\n");
 	show_regs (pt_regs);
 	bad_mode ();
 }
@@ -239,6 +239,4 @@
 obj-y += stdio.o
  
 CFLAGS_env_embedded.o := -Wa,--no-warn -DENV_CRC=$(shell tools/envcrc 2>/dev/null)
-CFLAGS_hush.o := $(PLATFORM_NO_UNALIGNED)
-CFLAGS_fdt_support.o := $(PLATFORM_NO_UNALIGNED)
-If you are reading this because of a data abort: the following MIGHT
-be relevant to your abort, if it was caused by an alignment violation.
-In order to determine this, use the PC from the abort dump along with
-an objdump -s -S of the u-boot ELF binary to locate the function where
-the abort happened; then compare this function with the examples below.
-If they match, then you've been hit with a compiler generated unaligned
-access, and you should rewrite your code or add -mno-unaligned-access
-to the command line of the offending file.
-
-Note that the PC shown in the abort message is relocated. In order to
-be able to match it to an address in the ELF binary dump, you will need
-to know the relocation offset. If your target defines CONFIG_CMD_BDI
-and if you can get to the prompt and enter commands before the abort
-happens, then command "bdinfo" will give you the offset. Otherwise you
-will need to try a build with DEBUG set, which will display the offset,
-or use a debugger and set a breakpoint at relocate_code() to see the
-offset (passed as an argument).
-
-*
-
-Since U-Boot runs on a variety of hardware, some only able to perform
-unaligned accesses with a strong penalty, some unable to perform them
-at all, the policy regarding unaligned accesses is to not perform any,
-unless absolutely necessary because of hardware or standards.
-
-Also, on hardware which permits it, the core is configured to throw
-data abort exceptions on unaligned accesses in order to catch these
-unallowed accesses as early as possible.
-
-Until version 4.7, the gcc default for performing unaligned accesses
-(-mno-unaligned-access) is to emulate unaligned accesses using aligned
-loads and stores plus shifts and masks. Emulated unaligned accesses
-will not be caught by hardware. These accesses may be costly and may
-be actually unnecessary. In order to catch these accesses and remove
-or optimize them, option -munaligned-access is explicitly set for all
-versions of gcc which support it.
-
-From gcc 4.7 onward starting at armv7 architectures, the default for
-performing unaligned accesses is to use unaligned native loads and
-stores (-munaligned-access), because the cost of unaligned accesses
-has dropped on armv7 and beyond. This should not affect U-Boot's
-policy of controlling unaligned accesses, however the compiler may
-generate uncontrolled unaligned accesses on its own in at least one
-known case: when declaring a local initialized char array, e.g.
-
-function foo()
-{
-	char buffer[] = "initial value";
-/* or */
-	char buffer[] = { 'i', 'n', 'i', 't', 0 };
-	...
-}
-
-Under -munaligned-accesses with optimizations on, this declaration
-causes the compiler to generate native loads from the literal string
-and native stores to the buffer, and the literal string alignment
-cannot be controlled. If it is misaligned, then the core will throw
-a data abort exception.
-
-Quite probably the same might happen for 16-bit array initializations
-where the constant is aligned on a boundary which is a multiple of 2
-but not of 4:
-
-function foo()
-{
-	u16 buffer[] = { 1, 2, 3 };
-	...
-}
-
-The long term solution to this issue is to add an option to gcc to
-allow controlling the general alignment of data, including constant
-initialization values.
-
-However this will only apply to the version of gcc which will have such
-an option. For other versions, there are four workarounds:
-
-a) Enforce as a rule that array initializations as described above
-   are forbidden. This is generally not acceptable as they are valid,
-   and usual, C constructs. The only case where they could be rejected
-   is when they actually equate to a const char* declaration, i.e. the
-   array is initialized and never modified in the function's scope.
-
-b) Drop the requirement on unaligned accesses at least for ARMv7,
-   i.e. do not throw a data abort exception upon unaligned accesses.
-   But that will allow adding badly aligned code to U-Boot, only for
-   it to fail when re-used with a stricter target, possibly once the
-   bad code is already in mainline.
-
-c) Relax the -munaligned-access rule globally. This will prevent native
-   unaligned accesses of course, but that will also hide any bug caused
-   by a bad unaligned access, making it much harder to diagnose it. It
-   is actually what already happens when building ARM targets with a
-   pre-4.7 gcc, and it may actually already hide some bugs yet unseen
-   until the target gets compiled with -munaligned-access.
-
-d) Relax the -munaligned-access rule only for for files susceptible to
-   the local initialized array issue and for armv7 architectures and
-   beyond. This minimizes the quantity of code which can hide unwanted
-   misaligned accesses.
-
-The option retained is d).
-
-Considering that actual occurrences of the issue are rare (as of this
-writing, 5 files out of 7840 in U-Boot, or .3%, contain an initialized
-local char array which cannot actually be replaced with a const char*),
-contributors should not be required to systematically try and detect
-the issue in their patches.
-
-Detecting files susceptible to the issue can be automated through a
-filter installed as a hook in .git which recognizes local char array
-initializations. Automation should err on the false positive side, for
-instance flagging non-local arrays as if they were local if they cannot
-be told apart.
-
-In any case, detection shall not prevent committing the patch, but
-shall pre-populate the commit message with a note to the effect that
-this patch contains an initialized local char or 16-bit array and thus
-should be protected from the gcc 4.7 issue.
-
-Upon a positive detection, either $(PLATFORM_NO_UNALIGNED) should be
-added to CFLAGS for the affected file(s), or if the array is a pseudo
-const char*, it should be replaced by an actual one.
+Editor's note: This document is _heavily_ cribbed from the Linux Kernel, with
+really only the section about "Alignment vs. Networking" removed.
+
+UNALIGNED MEMORY ACCESSES
+=========================
+
+Linux runs on a wide variety of architectures which have varying behaviour
+when it comes to memory access. This document presents some details about
+unaligned accesses, why you need to write code that doesn't cause them,
+and how to write such code!
+
+
+The definition of an unaligned access
+=====================================
+
+Unaligned memory accesses occur when you try to read N bytes of data starting
+from an address that is not evenly divisible by N (i.e. addr % N != 0).
+For example, reading 4 bytes of data from address 0x10004 is fine, but
+reading 4 bytes of data from address 0x10005 would be an unaligned memory
+access.
+
+The above may seem a little vague, as memory access can happen in different
+ways. The context here is at the machine code level: certain instructions read
+or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
+assembly). As will become clear, it is relatively easy to spot C statements
+which will compile to multiple-byte memory access instructions, namely when
+dealing with types such as u16, u32 and u64.
+
+
+Natural alignment
+=================
+
+The rule mentioned above forms what we refer to as natural alignment:
+When accessing N bytes of memory, the base memory address must be evenly
+divisible by N, i.e. addr % N == 0.
+
+When writing code, assume the target architecture has natural alignment
+requirements.
+
+In reality, only a few architectures require natural alignment on all sizes
+of memory access. However, we must consider ALL supported architectures;
+writing code that satisfies natural alignment requirements is the easiest way
+to achieve full portability.
+
+
+Why unaligned access is bad
+===========================
+
+The effects of performing an unaligned memory access vary from architecture
+to architecture. It would be easy to write a whole document on the differences
+here; a summary of the common scenarios is presented below:
+
+ - Some architectures are able to perform unaligned memory accesses
+   transparently, but there is usually a significant performance cost.
+ - Some architectures raise processor exceptions when unaligned accesses
+   happen. The exception handler is able to correct the unaligned access,
+   at significant cost to performance.
+ - Some architectures raise processor exceptions when unaligned accesses
+   happen, but the exceptions do not contain enough information for the
+   unaligned access to be corrected.
+ - Some architectures are not capable of unaligned memory access, but will
+   silently perform a different memory access to the one that was requested,
+   resulting in a subtle code bug that is hard to detect!
+
+It should be obvious from the above that if your code causes unaligned
+memory accesses to happen, your code will not work correctly on certain
+platforms and will cause performance problems on others.
+
+
+Code that does not cause unaligned access
+=========================================
+
+At first, the concepts above may seem a little hard to relate to actual
+coding practice. After all, you don't have a great deal of control over
+memory addresses of certain variables, etc.
+
+Fortunately things are not too complex, as in most cases, the compiler
+ensures that things will work for you. For example, take the following
+structure:
+
+	struct foo {
+		u16 field1;
+		u32 field2;
+		u8 field3;
+	};
+
+Let us assume that an instance of the above structure resides in memory
+starting at address 0x10000. With a basic level of understanding, it would
+not be unreasonable to expect that accessing field2 would cause an unaligned
+access. You'd be expecting field2 to be located at offset 2 bytes into the
+structure, i.e. address 0x10002, but that address is not evenly divisible
+by 4 (remember, we're reading a 4 byte value here).
+
+Fortunately, the compiler understands the alignment constraints, so in the
+above case it would insert 2 bytes of padding in between field1 and field2.
+Therefore, for standard structure types you can always rely on the compiler
+to pad structures so that accesses to fields are suitably aligned (assuming
+you do not cast the field to a type of different length).
+
+Similarly, you can also rely on the compiler to align variables and function
+parameters to a naturally aligned scheme, based on the size of the type of
+the variable.
+
+At this point, it should be clear that accessing a single byte (u8 or char)
+will never cause an unaligned access, because all memory addresses are evenly
+divisible by one.
+
+On a related topic, with the above considerations in mind you may observe
+that you could reorder the fields in the structure in order to place fields
+where padding would otherwise be inserted, and hence reduce the overall
+resident memory size of structure instances. The optimal layout of the
+above example is:
+
+	struct foo {
+		u32 field2;
+		u16 field1;
+		u8 field3;
+	};
+
+For a natural alignment scheme, the compiler would only have to add a single
+byte of padding at the end of the structure. This padding is added in order
+to satisfy alignment constraints for arrays of these structures.
+
+Another point worth mentioning is the use of __attribute__((packed)) on a
+structure type. This GCC-specific attribute tells the compiler never to
+insert any padding within structures, useful when you want to use a C struct
+to represent some data that comes in a fixed arrangement 'off the wire'.
+
+You might be inclined to believe that usage of this attribute can easily
+lead to unaligned accesses when accessing fields that do not satisfy
+architectural alignment requirements. However, again, the compiler is aware
+of the alignment constraints and will generate extra instructions to perform
+the memory access in a way that does not cause unaligned access. Of course,
+the extra instructions obviously cause a loss in performance compared to the
+non-packed case, so the packed attribute should only be used when avoiding
+structure padding is of importance.
+
+
+Code that causes unaligned access
+=================================
+
+With the above in mind, let's move onto a real life example of a function
+that can cause an unaligned memory access. The following function taken
+from the Linux Kernel's include/linux/etherdevice.h is an optimized routine
+to compare two ethernet MAC addresses for equality.
+
+bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
+{
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
+		   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
+
+	return fold == 0;
+#else
+	const u16 *a = (const u16 *)addr1;
+	const u16 *b = (const u16 *)addr2;
+	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
+#endif
+}
+
+In the above function, when the hardware has efficient unaligned access
+capability, there is no issue with this code.  But when the hardware isn't
+able to access memory on arbitrary boundaries, the reference to a[0] causes
+2 bytes (16 bits) to be read from memory starting at address addr1.
+
+Think about what would happen if addr1 was an odd address such as 0x10003.
+(Hint: it'd be an unaligned access.)
+
+Despite the potential unaligned access problems with the above function, it
+is included in the kernel anyway but is understood to only work normally on
+16-bit-aligned addresses. It is up to the caller to ensure this alignment or
+not use this function at all. This alignment-unsafe function is still useful
+as it is a decent optimization for the cases when you can ensure alignment,
+which is true almost all of the time in ethernet networking context.
+
+
+Here is another example of some code that could cause unaligned accesses:
+	void myfunc(u8 *data, u32 value)
+	{
+		[...]
+		*((u32 *) data) = cpu_to_le32(value);
+		[...]
+	}
+
+This code will cause unaligned accesses every time the data parameter points
+to an address that is not evenly divisible by 4.
+
+In summary, the 2 main scenarios where you may run into unaligned access
+problems involve:
+ 1. Casting variables to types of different lengths
+ 2. Pointer arithmetic followed by access to at least 2 bytes of data
+
+
+Avoiding unaligned accesses
+===========================
+
+The easiest way to avoid unaligned access is to use the get_unaligned() and
+put_unaligned() macros provided by the <asm/unaligned.h> header file.
+
+Going back to an earlier example of code that potentially causes unaligned
+access:
+
+	void myfunc(u8 *data, u32 value)
+	{
+		[...]
+		*((u32 *) data) = cpu_to_le32(value);
+		[...]
+	}
+
+To avoid the unaligned memory access, you would rewrite it as follows:
+
+	void myfunc(u8 *data, u32 value)
+	{
+		[...]
+		value = cpu_to_le32(value);
+		put_unaligned(value, (u32 *) data);
+		[...]
+	}
+
+The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
+memory and you wish to avoid unaligned access, its usage is as follows:
+
+	u32 value = get_unaligned((u32 *) data);
+
+These macros work for memory accesses of any length (not just 32 bits as
+in the examples above). Be aware that when compared to standard access of
+aligned memory, using these macros to access unaligned memory can be costly in
+terms of performance.
+
+If use of such macros is not convenient, another option is to use memcpy(),
+where the source or destination (or both) are of type u8* or unsigned char*.
+Due to the byte-wise nature of this operation, unaligned accesses are avoided.
+
+--
+In the Linux Kernel,
+Authors: Daniel Drake <dsd@gentoo.org>,
+         Johannes Berg <johannes@sipsolutions.net>
+With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
+Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
+Vadim Lobanov
@@ -13,7 +13,4 @@
 obj-y += lpt_commit.o scan.o lprops.o
 obj-y += tnc.o tnc_misc.o debug.o crc16.o budget.o
 obj-y += log.o orphan.o recovery.o replay.o
-
-# SEE README.arm-unaligned-accesses
-CFLAGS_super.o := $(PLATFORM_NO_UNALIGNED)
@@ -65,7 +65,4 @@
 obj-$(CONFIG_RANDOM_MACADDR) += rand.o
 obj-$(CONFIG_BOOTP_RANDOM_DELAY) += rand.o
 obj-$(CONFIG_CMD_LINK_LOCAL) += rand.o
-
-# SEE README.arm-unaligned-accesses
-CFLAGS_bzlib.o := $(PLATFORM_NO_UNALIGNED)
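
The README added by this commit recommends get_unaligned()/put_unaligned(), or memcpy() through a byte pointer, for data whose alignment cannot be guaranteed. A minimal portable sketch of the memcpy() approach follows. The function names read_u32_unaligned and write_u32_unaligned are hypothetical stand-ins chosen for this example; they are not the macros from <asm/unaligned.h>, and they perform no endianness conversion.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy four bytes from a possibly misaligned
 * address into an aligned local variable. memcpy() has byte-wise
 * semantics, so no misaligned 32-bit load is required.
 */
static uint32_t read_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical helper: store a 32-bit value (in host byte order)
 * to a possibly misaligned address, again via memcpy().
 */
static void write_u32_unaligned(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

With these helpers, the myfunc() example from the README could store its value with write_u32_unaligned(data, cpu_to_le32(value)) instead of casting data to u32 *.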
	112	+above example is:
	113	+
	114	+ struct foo {
	115	+ u32 field2;
	116	+ u16 field1;
	117	+ u8 field3;
	118	+ };
	119	+
	120	+For a natural alignment scheme, the compiler would only have to add a single
	121	+byte of padding at the end of the structure. This padding is added in order
	122	+to satisfy alignment constraints for arrays of these structures.
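
The two layouts above can be compared with offsetof and sizeof. The sketch below is illustrative only: it uses the <stdint.h> types in place of u16/u32/u8, the names foo_original, foo_reordered and check_layouts() are made up for this example, and the sizes mentioned in the comments assume an ABI where a 32-bit integer is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order from the first example: padding follows field1 and field3. */
struct foo_original {
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

/* Reordered layout: only trailing padding remains. */
struct foo_reordered {
	uint32_t field2;
	uint16_t field1;
	uint8_t field3;
};

/* Hypothetical self-check, assuming natural alignment of uint32_t. */
static void check_layouts(void)
{
	/* The compiler placed field2 on a 4-byte boundary (offset 4). */
	assert(offsetof(struct foo_original, field2) % sizeof(uint32_t) == 0);
	/* Typically 8 bytes versus 12 bytes. */
	assert(sizeof(struct foo_reordered) <= sizeof(struct foo_original));
}
```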
	123	+
	124	+Another point worth mentioning is the use of __attribute__((packed)) on a
	125	+structure type. This GCC-specific attribute tells the compiler never to
	126	+insert any padding within structures, useful when you want to use a C struct
	127	+to represent some data that comes in a fixed arrangement 'off the wire'.
	128	+
	129	+You might be inclined to believe that usage of this attribute can easily
	130	+lead to unaligned accesses when accessing fields that do not satisfy
	131	+architectural alignment requirements. However, again, the compiler is aware
	132	+of the alignment constraints and will generate extra instructions to perform
	133	+the memory access in a way that does not cause unaligned access. Of course,
	134	+the extra instructions obviously cause a loss in performance compared to the
	135	+non-packed case, so the packed attribute should only be used when avoiding
	136	+structure padding is of importance.
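
As a rough illustration of the packed case, the sketch below applies the attribute to the same three fields. It assumes GCC or a compatible compiler; the names wire_header and read_field2() are hypothetical. The field is read through the struct type, so the compiler knows it may be misaligned and picks a suitable access sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct foo, but with no padding inserted. */
struct wire_header {
	uint16_t field1;
	uint32_t field2;	/* sits at offset 2, not naturally aligned */
	uint8_t field3;
} __attribute__((packed));

/*
 * Reading the member through the packed struct type lets the compiler
 * emit whatever access sequence the target needs for a misaligned field.
 */
static uint32_t read_field2(const struct wire_header *h)
{
	return h->field2;
}
```

Note that taking the address of field2 and dereferencing it as a plain uint32_t pointer would discard this information and is best avoided.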
	137	+
	138	+
	139	+Code that causes unaligned access
	140	+=================================
	141	+
	142	+With the above in mind, let's move on to a real-life example of a function
	143	+that can cause an unaligned memory access. The following function, taken
	144	+from the Linux Kernel's include/linux/etherdevice.h, is an optimized routine
	145	+to compare two Ethernet MAC addresses for equality.
	146	+
	147	+bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
	148	+{
	149	+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	150	+	u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
	151	+		   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
	152	+
	153	+	return fold == 0;
	154	+#else
	155	+	const u16 *a = (const u16 *)addr1;
	156	+	const u16 *b = (const u16 *)addr2;
	157	+	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
	158	+#endif
	159	+}
	160	+
	161	+In the above function, when the hardware has efficient unaligned access
	162	+capability, there is no issue with this code. But when the hardware isn't
	163	+able to access memory on arbitrary boundaries, the reference to a[0] causes
	164	+2 bytes (16 bits) to be read from memory starting at address addr1.
	165	+
	166	+Think about what would happen if addr1 was an odd address such as 0x10003.
	167	+(Hint: it'd be an unaligned access.)
	168	+
	169	+Despite the potential unaligned access problems with the above function, it
	170	+is included in the kernel anyway but is understood to only work normally on
	171	+16-bit-aligned addresses. It is up to the caller to ensure this alignment or
	172	+not use this function at all. This alignment-unsafe function is still useful
	173	+as it is a decent optimization for the cases when you can ensure alignment,
	174	+which is true almost all of the time in ethernet networking context.
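
When a caller cannot guarantee 16-bit alignment, one alternative is a byte-wise comparison. The sketch below is not the kernel's routine; mac_equal_bytewise() is a hypothetical name, and the approach trades the optimization described above for safety at any address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDR_LEN 6	/* length of an Ethernet MAC address in bytes */

/*
 * Hypothetical alignment-agnostic comparison: memcmp() works on bytes,
 * so neither pointer needs any particular alignment.
 */
static bool mac_equal_bytewise(const uint8_t *addr1, const uint8_t *addr2)
{
	return memcmp(addr1, addr2, MAC_ADDR_LEN) == 0;
}
```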
	175	+
	176	+
	177	+Here is another example of some code that could cause unaligned accesses:
	178	+ void myfunc(u8 *data, u32 value)
	179	+ {
	180	+ [...]
	181	+ *((u32 *) data) = cpu_to_le32(value);
	182	+ [...]
	183	+ }
	184	+
	185	+This code will cause unaligned accesses every time the data parameter points
	186	+to an address that is not evenly divisible by 4.
	187	+
	188	+In summary, the 2 main scenarios where you may run into unaligned access
	189	+problems involve:
	190	+ 1. Casting variables to types of different lengths
	191	+ 2. Pointer arithmetic followed by access to at least 2 bytes of data
	192	+
	193	+
	194	+Avoiding unaligned accesses
	195	+===========================
	196	+
	197	+The easiest way to avoid unaligned access is to use the get_unaligned() and
	198	+put_unaligned() macros provided by the <asm/unaligned.h> header file.
	199	+
	200	+Going back to an earlier example of code that potentially causes unaligned
	201	+access:
	202	+
	203	+ void myfunc(u8 *data, u32 value)
	204	+ {
	205	+ [...]
	206	+ *((u32 *) data) = cpu_to_le32(value);
	207	+ [...]
	208	+ }
	209	+
	210	+To avoid the unaligned memory access, you would rewrite it as follows:
	211	+
	212	+ void myfunc(u8 *data, u32 value)
	213	+ {
	214	+ [...]
	215	+ value = cpu_to_le32(value);
	216	+ put_unaligned(value, (u32 *) data);
	217	+ [...]
	218	+ }
	219	+
	220	+The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
	221	+memory and you wish to avoid unaligned access, its usage is as follows:
	222	+
	223	+ u32 value = get_unaligned((u32 *) data);
	224	+
	225	+These macros work for memory accesses of any length (not just 32 bits as
	226	+in the examples above). Be aware that when compared to standard access of
	227	+aligned memory, using these macros to access unaligned memory can be costly in
	228	+terms of performance.
	229	+
	230	+If use of such macros is not convenient, another option is to use memcpy(),
	231	+where the source or destination (or both) are of type u8* or unsigned char*.
	232	+Due to the byte-wise nature of this operation, unaligned accesses are avoided.
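
The memcpy() approach can be sketched as a pair of small helpers. The names load_u32() and store_u32() are hypothetical and exist only for this example; they are not the get_unaligned()/put_unaligned() macros and perform no byte-order conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any address. */
static uint32_t load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* byte-wise copy, no alignment needed */
	return v;
}

/* Hypothetical helper: write a 32-bit value to any address. */
static void store_u32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

A caller would use them as store_u32(data + 1, value) and value = load_u32(data + 1), where data is a byte pointer with no alignment guarantee.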
	233	+
	234	+--
	235	+In the Linux Kernel,
	236	+Authors: Daniel Drake <dsd@gentoo.org>,
	237	+ Johannes Berg <johannes@sipsolutions.net>
	238	+With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
	239	+Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
	240	+Vadim Lobanov
...	...	@@ -13,7 +13,4 @@
13	13	obj-y += lpt_commit.o scan.o lprops.o
14	14	obj-y += tnc.o tnc_misc.o debug.o crc16.o budget.o
15	15	obj-y += log.o orphan.o recovery.o replay.o
16		-
17		-# SEE README.arm-unaligned-accesses
18		-CFLAGS_super.o := $(PLATFORM_NO_UNALIGNED)
...	...	@@ -65,7 +65,4 @@
65	65	obj-$(CONFIG_RANDOM_MACADDR) += rand.o
66	66	obj-$(CONFIG_BOOTP_RANDOM_DELAY) += rand.o
67	67	obj-$(CONFIG_CMD_LINK_LOCAL) += rand.o
68		-
69		-# SEE README.arm-unaligned-accesses
70		-CFLAGS_bzlib.o := $(PLATFORM_NO_UNALIGNED)