Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

Commit 5776563648f6437ede91c91cbad85862ca682b0b

Authored by Qiaowei Ren 2014-11-14 23:18:32 +0800

Committed by Thomas Gleixner 2014-11-18 07:58:54 +0800

Exists in ti-lsk-linux-4.1.y and in 10 other branches

x86, mpx: Add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151832.7FDB1720@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Showing 1 changed file with 234 additions and 0 deletions

Documentation/x86/intel_mpx.txt

	1	+1. Intel(R) MPX Overview
	2	+========================
	3	+
	4	+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
	5	+introduced into Intel Architecture. Intel MPX provides hardware features
	6	+that can be used in conjunction with compiler changes to check memory
	7	+references whose compile-time intent is subverted at runtime by a
	8	+buffer overflow or underflow.
	9	+
	10	+For more information, please refer to Intel(R) Architecture Instruction
	11	+Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
	12	+Extensions.
	13	+
	14	+Note: Currently no hardware with MPX ISA is available but it is always
	15	+possible to use SDE (Intel(R) Software Development Emulator) instead, which
	16	+can be downloaded from
	17	+http://software.intel.com/en-us/articles/intel-software-development-emulator
	18	+
	19	+
	20	+2. How to take advantage of MPX
	21	+===============================
	22	+
	23	+For MPX to work, changes are required in the kernel, binutils and compiler.
	24	+No source changes are required for applications, just a recompile.
	25	+
	26	+A lot of moving parts have to line up for this to work. The following
	27	+is how we expect the compiler, application and kernel to work together.
	28	+
	29	+1) Application developer compiles with -fmpx. The compiler will add the
	30	+ instrumentation as well as some setup code called early after the app
	31	+ starts. New instruction prefixes are noops for old CPUs.
	32	+2) That setup code allocates (virtual) space for the "bounds directory",
	33	+ points the "bndcfgu" register to the directory and notifies the kernel
	34	+ (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using
	35	+ MPX.
	36	+3) The kernel detects that the CPU has MPX, allows the new prctl() to
	37	+ succeed, and notes the location of the bounds directory. Userspace is
	38	+ expected to keep the bounds directory at that location. We note it
	39	+ instead of reading it each time because the 'xsave' operation needed
	40	+ to access the bounds directory register is an expensive operation.
	41	+4) If the application needs to spill bounds out of the 4 registers, it
	42	+ issues a bndstx instruction. Since the bounds directory is empty at
	43	+ this point, a bounds fault (#BR) is raised, the kernel allocates a
	44	+ bounds table (in the user address space) and makes the relevant entry
	45	+ in the bounds directory point to the new table.
	46	+5) If the application violates the bounds specified in the bounds registers,
	47	+ a separate kind of #BR is raised which will deliver a signal with
	48	+ information about the violation in the 'struct siginfo'.
	49	+6) Whenever memory is freed, we know that it can no longer contain valid
	50	+ pointers, and we attempt to free the associated space in the bounds
	51	+ tables. If an entire table becomes unused, we will attempt to free
	52	+ the table and remove the entry in the directory.
	53	+
	54	+To summarize, there are essentially three things interacting here:
	55	+
	56	+GCC with -fmpx:
	57	+ * enables annotation of code with MPX instructions and prefixes
	58	+ * inserts code early in the application to call in to the "gcc runtime"
	59	+GCC MPX Runtime:
	60	+ * Checks for hardware MPX support in cpuid leaf
	61	+ * allocates virtual space for the bounds directory (malloc() essentially)
	62	+ * points the hardware BNDCFGU register at the directory
	63	+ * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
	64	+ start managing the bounds directories
	65	+Kernel MPX Code:
	66	+ * Checks for hardware MPX support in cpuid leaf
	67	+ * Handles #BR exceptions and sends SIGSEGV to the app when it violates
	68	+ bounds, like during a buffer overflow.
	69	+ * When bounds are spilled in to an unallocated bounds table, the kernel
	70	+ notices in the #BR exception, allocates the virtual space, then
	71	+ updates the bounds directory to point to the new table. It keeps
	72	+ special track of the memory with a VM_MPX flag.
	73	+ * Frees unused bounds tables at the time that the memory they described
	74	+ is unmapped.
	75	+
	76	+
	77	+3. How the MPX kernel code works
	78	+================================
	79	+
	80	+Handling #BR faults caused by MPX
	81	+---------------------------------
	82	+
	83	+When MPX is enabled, there are 2 new situations that can generate
	84	+#BR faults.
	85	+ * new bounds tables (BT) need to be allocated to save bounds.
	86	+ * bounds violation caused by MPX instructions.
	87	+
	88	+We hook the #BR handler to handle these two new situations.
	89	+
	90	+On-demand kernel allocation of bounds tables
	91	+--------------------------------------------
	92	+
	93	+MPX only has 4 hardware registers for storing bounds information. If
	94	+MPX-enabled code needs more than these 4 registers, it needs to spill
	95	+them somewhere. It has two special instructions for this which allow
	96	+the bounds to be moved between the bounds registers and some new "bounds
	97	+tables".
	98	+
	99	+MPX adds new causes to the existing #BR (bound range exceeded)
	100	+exception. These are similar conceptually to a page fault and are raised
	101	+by the MPX hardware either on a bounds violation or when a table is not
	102	+present. The kernel handles those #BR exceptions for not-present tables
	103	+by carving the space out of the process's normal address space and then
	104	+pointing the bounds directory at it.
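How a pointer's address selects a bounds-directory entry and a bounds-table
entry can be sketched as below. The bit ranges and entry sizes are an
assumption taken from the 64-bit layout in the Intel reference, not from this
document, and the helper names are made up for illustration. They are
consistent with the 2GB directory and 4x table overhead quoted in this file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed 64-bit MPX layout (from the Intel reference, not this file):
 *   - address bits 47:20 select one of 2^28 bounds-directory entries
 *   - address bits 19:3 select one of 2^17 bounds-table entries
 *   - a directory entry is 8 bytes, a bounds-table entry is 32 bytes
 */
#define BD_INDEX_SHIFT	20
#define BD_INDEX_BITS	28
#define BD_ENTRY_SIZE	8ULL
#define BT_INDEX_SHIFT	3
#define BT_INDEX_BITS	17
#define BT_ENTRY_SIZE	32ULL

/* Hypothetical helper: index of the directory entry covering 'addr'. */
uint64_t bd_index(uint64_t addr)
{
	return (addr >> BD_INDEX_SHIFT) & ((1ULL << BD_INDEX_BITS) - 1);
}

/* Hypothetical helper: index of the table entry for the pointer at 'addr'. */
uint64_t bt_index(uint64_t addr)
{
	return (addr >> BT_INDEX_SHIFT) & ((1ULL << BT_INDEX_BITS) - 1);
}

/* Size in bytes of a fully populated bounds directory. */
uint64_t bd_size_bytes(void)
{
	return (1ULL << BD_INDEX_BITS) * BD_ENTRY_SIZE;
}

/* Size in bytes of one bounds table. */
uint64_t bt_size_bytes(void)
{
	return (1ULL << BT_INDEX_BITS) * BT_ENTRY_SIZE;
}

/* Bytes of address space described by one bounds table. */
uint64_t bt_covered_bytes(void)
{
	return 1ULL << BD_INDEX_SHIFT;
}
```

Under these assumptions one table is 4MB and describes 1MB of address
space, which is where the 4x worst case comes from.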
	105	+
	106	+The tables need to be accessed and controlled by userspace because
	107	+the instructions for moving bounds in and out of them are extremely
	108	+frequent. They potentially happen every time a register points to
	109	+memory. Any direct kernel involvement (like a syscall) to access the
	110	+tables would obviously destroy performance.
	111	+
	112	+Why not do this in userspace? MPX does not strictly require anything in
	113	+the kernel. It can theoretically be done completely from userspace. Here
	114	+are a few ways this could be done. We don't think any of them are practical
	115	+in the real world, but here they are.
	116	+
	117	+Q: Can virtual space simply be reserved for the bounds tables so that we
	118	+ never have to allocate them?
	119	+A: An MPX-enabled application may create a lot of bounds tables in
	120	+ process address space to save bounds information. These tables can take
	121	+ up huge swaths of memory (as much as 80% of the memory on the system)
	122	+ even if we clean them up aggressively. In the worst-case scenario, the
	123	+ tables can be 4x the size of the data structure being tracked. IOW, a
	124	+ 1-page structure can require 4 bounds-table pages. An X-GB virtual
	125	+ area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
	126	+ If we were to preallocate them for the 128TB of user virtual address
	127	+ space, we would need to reserve 512TB+2GB, which is larger than the
	128	+ entire virtual address space today. This means they can not be reserved
	129	+ ahead of time. Also, a single process's pre-populated bounds directory
	130	+ consumes 2GB of virtual AND physical memory. IOW, it's completely
	131	+ infeasible to prepopulate bounds directories.
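The arithmetic in the answer above can be checked with a few lines of C.
This is only a sketch: the function names are hypothetical, and the 256TB
figure used in the usage note below is an assumption (a 48-bit virtual
address space), not something this document states.

```c
#include <assert.h>
#include <stdint.h>

#define GB (1ULL << 30)
#define TB (1ULL << 40)

/* Worst case from the text: tables are 4x the size of the tracked area. */
#define BT_OVERHEAD_FACTOR	4ULL
/* From the text: the bounds directory takes 2GB. */
#define BD_SIZE			(2 * GB)

/* Virtual space needed to pre-reserve tables for 'tracked_bytes' of memory. */
uint64_t mpx_worst_case_reserve(uint64_t tracked_bytes)
{
	return BT_OVERHEAD_FACTOR * tracked_bytes + BD_SIZE;
}

/* Nonzero if that reservation fits in 'address_space_bytes'. */
int mpx_reserve_fits(uint64_t tracked_bytes, uint64_t address_space_bytes)
{
	return mpx_worst_case_reserve(tracked_bytes) <= address_space_bytes;
}
```

For the 128TB of user virtual address space this gives 512TB+2GB, which
does not fit even if the whole of an assumed 256TB address space were
available.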
	132	+
	133	+Q: Can we preallocate bounds table space at the same time memory is
	134	+ allocated which might contain pointers that might eventually need
	135	+ bounds tables?
	136	+A: This would work if we could hook the site of each and every memory
	137	+ allocation syscall. This can be done for small, constrained applications.
	138	+ But, it isn't practical at a larger scale since a given app has no
	139	+ way of controlling how all the parts of the app might allocate memory
	140	+ (think libraries). The kernel is really the only place to intercept
	141	+ these calls.
	142	+
	143	+Q: Could a bounds fault be handed to userspace and the tables allocated
	144	+ there in a signal handler instead of in the kernel?
	145	+A: mmap() is not on the list of async-signal-safe functions, and even
	146	+ if mmap() worked it would still require locking or nasty tricks to
	147	+ keep track of the allocation state there.
	148	+
	149	+Having ruled out all of the userspace-only approaches for managing
	150	+bounds tables that we could think of, we create them on demand in
	151	+the kernel.
	152	+
	153	+Decoding MPX instructions
	154	+-------------------------
	155	+
	156	+If a #BR is generated due to a bounds violation caused by MPX, we
	157	+need to decode the MPX instruction to get the violation address and
	158	+store this address in the extended struct siginfo.
	159	+
	160	+The _sigfault field of struct siginfo is extended as follows:
	161	+
	162	+	/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
	163	+	struct {
	164	+		void __user *_addr; /* faulting insn/memory ref. */
	165	+#ifdef __ARCH_SI_TRAPNO
	166	+		int _trapno; /* TRAP # which caused the signal */
	167	+#endif
	168	+		short _addr_lsb; /* LSB of the reported address */
	169	+		struct {
	170	+			void __user *_lower;
	171	+			void __user *_upper;
	172	+		} _addr_bnd;
	173	+	} _sigfault;
	174	+
	175	+The '_addr' field holds the violation address, and the new '_addr_bnd'
	176	+field holds the lower/upper bounds in effect when the #BR was raised.
	177	+
	178	+Glibc will also be updated to support this new siginfo, so users
	179	+can get the violation address and bounds when bounds violations occur.
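As an illustration of what userspace can do with these three values, the
sketch below classifies a violation as an underflow or an overflow. The
helper and enum names are hypothetical, and it assumes that both bounds are
inclusive; a real signal handler would take the inputs from '_addr' and
'_addr_bnd' in the delivered siginfo.

```c
#include <assert.h>
#include <stdint.h>

enum bnd_violation {
	BND_IN_RANGE = 0,	/* address lies within [lower, upper] */
	BND_BELOW_LOWER,	/* buffer underflow */
	BND_ABOVE_UPPER		/* buffer overflow */
};

/*
 * Hypothetical helper: compare the faulting address against the reported
 * bounds. Assumes both bounds are inclusive.
 */
enum bnd_violation classify_bnd_violation(uintptr_t addr, uintptr_t lower,
					  uintptr_t upper)
{
	if (addr < lower)
		return BND_BELOW_LOWER;
	if (addr > upper)
		return BND_ABOVE_UPPER;
	return BND_IN_RANGE;
}
```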
	180	+
	181	+Cleanup unused bounds tables
	182	+----------------------------
	183	+
	184	+When a BNDSTX instruction attempts to save bounds to a bounds directory
	185	+entry marked as invalid, a #BR is generated. This is an indication that
	186	+no bounds table exists for this entry. In this case the fault handler
	187	+will allocate a new bounds table on demand.
	188	+
	189	+Since the kernel allocated those tables on-demand without userspace
	190	+knowledge, it is also responsible for freeing them when the associated
	191	+mappings go away.
	192	+
	193	+The solution is to hook do_munmap() and check whether the process is
	194	+MPX enabled. If it is, the bounds tables covering the virtual address
	195	+region being unmapped are freed as well.
	196	+
	197	+Adding new prctl commands
	198	+-------------------------
	199	+
	200	+Two new prctl commands are added to enable and disable MPX bounds table
	201	+management in the kernel.
	202	+
	203	+	#define PR_MPX_ENABLE_MANAGEMENT 43
	204	+	#define PR_MPX_DISABLE_MANAGEMENT 44
	205	+
	206	+The runtime library in userspace is responsible for allocating the
	207	+bounds directory, so the kernel has to use the XSAVE instruction to get
	208	+the base of the bounds directory from the BNDCFGU register.
	209	+
	210	+But XSAVE is expected to be very expensive. As an optimization, the
	211	+kernel gets the base of the bounds directory once, during execution of
	212	+the PR_MPX_ENABLE_MANAGEMENT command, and saves it in struct mm_struct
	213	+for future use.
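From userspace, the two commands could be issued roughly as follows. This is
a minimal sketch, not the actual runtime library code: the wrapper names are
hypothetical, and passing zero for the unused prctl() arguments is an
assumption. It also leaves out the step that must come first, namely
allocating the bounds directory and pointing BNDCFGU at it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Values from this document; guarded in case the headers define them. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43
#endif
#ifndef PR_MPX_DISABLE_MANAGEMENT
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/*
 * Hypothetical wrapper: ask the kernel to start managing bounds tables.
 * Returns 0 on success, or -1 with errno set (for example when the CPU
 * or the running kernel has no MPX support).
 */
int mpx_enable_management(void)
{
	return prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0);
}

/* Hypothetical wrapper: ask the kernel to stop managing bounds tables. */
int mpx_disable_management(void)
{
	return prctl(PR_MPX_DISABLE_MANAGEMENT, 0, 0, 0, 0);
}
```

A caller should treat failure as "run without kernel-managed bounds
tables" instead of as a fatal error, since the prctl() is expected to fail
on systems without MPX.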
	214	+
	215	+
	216	+4. Special rules
	217	+================
	218	+
	219	+1) If userspace is requesting help from the kernel to do the management
	220	+of bounds tables, it may not create or modify entries in the bounds directory.
	221	+
	222	+Users can certainly allocate bounds tables and forcibly point the bounds
	223	+directory at them through the XSAVE instruction, and then set the valid
	224	+bit of the bounds entry to make that entry valid. But the kernel will
	225	+decline to assist in managing these tables.
	226	+
	227	+2) Userspace may not take multiple bounds directory entries and point
	228	+them at the same bounds table.
	229	+
	230	+This is allowed architecturally. For more information, see "Intel(R)
	231	+Architecture Instruction Set Extensions Programming Reference" (9.3.4).
	232	+
	233	+However, if users did this, the kernel might be fooled into unmapping an
	234	+in-use bounds table since it does not recognize sharing.