Commit 5776563648f6437ede91c91cbad85862ca682b0b

Authored by Qiaowei Ren
Committed by Thomas Gleixner
1 parent 1de4fa14ee

x86, mpx: Add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151832.7FDB1720@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Showing 1 changed file with 234 additions and 0 deletions

Documentation/x86/intel_mpx.txt
  1 +1. Intel(R) MPX Overview
  2 +========================
  3 +
  4 +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
  5 +introduced into Intel Architecture. Intel MPX provides hardware features
  6 +that can be used in conjunction with compiler changes to check memory
  7 +references whose intended compile-time behavior is subverted at runtime
  8 +by a buffer overflow or underflow.
  9 +
  10 +For more information, please refer to Intel(R) Architecture Instruction
  11 +Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
  12 +Extensions.
  13 +
  14 +Note: Currently no hardware with MPX ISA is available but it is always
  15 +possible to use SDE (Intel(R) Software Development Emulator) instead, which
  16 +can be downloaded from
  17 +http://software.intel.com/en-us/articles/intel-software-development-emulator
  18 +
  19 +
  20 +2. How to get the advantage of MPX
  21 +==================================
  22 +
  23 +For MPX to work, changes are required in the kernel, binutils and compiler.
  24 +No source changes are required for applications, just a recompile.
  25 +
  26 +There are a lot of moving parts that all have to work right. The following
  27 +is how we expect the compiler, application and kernel to work together.
  28 +
  29 +1) Application developer compiles with -fmpx. The compiler will add the
  30 + instrumentation as well as some setup code called early after the app
  31 + starts. New instruction prefixes are noops for old CPUs.
  32 +2) That setup code allocates (virtual) space for the "bounds directory",
  33 + points the "bndcfgu" register to the directory and notifies the kernel
  34 + (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using
  35 + MPX.
  36 +3) The kernel detects that the CPU has MPX, allows the new prctl() to
  37 + succeed, and notes the location of the bounds directory. Userspace is
  38 + expected to keep the bounds directory at that location. We note it
  39 + instead of reading it each time because the 'xsave' operation needed
  40 + to access the bounds directory register is an expensive operation.
  41 +4) If the application needs to spill bounds out of the 4 registers, it
  42 + issues a bndstx instruction. Since the bounds directory is empty at
  43 + this point, a bounds fault (#BR) is raised, the kernel allocates a
  44 + bounds table (in the user address space) and makes the relevant entry
  45 + in the bounds directory point to the new table.
  46 +5) If the application violates the bounds specified in the bounds registers,
  47 + a separate kind of #BR is raised which will deliver a signal with
  48 + information about the violation in the 'struct siginfo'.
  49 +6) Whenever memory is freed, we know that it can no longer contain valid
  50 + pointers, and we attempt to free the associated space in the bounds
  51 + tables. If an entire table becomes unused, we will attempt to free
  52 + the table and remove the entry in the directory.
  53 +
  54 +To summarize, there are essentially three things interacting here:
  55 +
  56 +GCC with -fmpx:
  57 + * enables annotation of code with MPX instructions and prefixes
  58 + * inserts code early in the application to call in to the "gcc runtime"
  59 +GCC MPX Runtime:
  60 + * Checks for hardware MPX support in cpuid leaf
  61 + * allocates virtual space for the bounds directory (malloc() essentially)
  62 + * points the hardware BNDCFGU register at the directory
  63 + * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
  64 + start managing the bounds directories
  65 +Kernel MPX Code:
  66 + * Checks for hardware MPX support in cpuid leaf
  67 + * Handles #BR exceptions and sends SIGSEGV to the app when it violates
  68 + bounds, like during a buffer overflow.
  69 + * When bounds are spilled in to an unallocated bounds table, the kernel
  70 + notices in the #BR exception, allocates the virtual space, then
  71 + updates the bounds directory to point to the new table. It keeps
  72 + special track of the memory with a VM_MPX flag.
  73 + * Frees unused bounds tables at the time that the memory they described
  74 + is unmapped.
  75 +
  76 +
  77 +3. How does MPX kernel code work
  78 +================================
  79 +
  80 +Handling #BR faults caused by MPX
  81 +---------------------------------
  82 +
  83 +When MPX is enabled, there are 2 new situations that can generate
  84 +#BR faults.
  85 + * new bounds tables (BT) need to be allocated to save bounds.
  86 + * bounds violation caused by MPX instructions.
  87 +
  88 +We hook the #BR handler to handle these two new situations.
  89 +
  90 +On-demand kernel allocation of bounds tables
  91 +--------------------------------------------
  92 +
  93 +MPX only has 4 hardware registers for storing bounds information. If
  94 +MPX-enabled code needs more than these 4 registers, it needs to spill
  95 +them somewhere. It has two special instructions for this which allow
  96 +the bounds to be moved between the bounds registers and some new "bounds
  97 +tables".
  98 +
  99 +#BR exceptions predate MPX (the legacy BOUND instruction raises them),
  100 +but MPX adds new causes. They are similar conceptually to a page fault
  101 +and are raised by the MPX hardware both on bounds violations and when
  102 +the tables are not present. The kernel handles the #BR exceptions for
  103 +not-present tables by carving the space out of the process's normal
  104 +address space and then pointing the bounds directory at it.
  105 +
  106 +The tables need to be accessed and controlled by userspace because
  107 +the instructions for moving bounds in and out of them are extremely
  108 +frequent. They potentially happen every time a register points to
  109 +memory. Any direct kernel involvement (like a syscall) to access the
  110 +tables would obviously destroy performance.
  111 +
  112 +Why not do this in userspace? MPX does not strictly require anything in
  113 +the kernel. It can theoretically be done completely from userspace. Here
  114 +are a few ways this could be done. We don't think any of them are practical
  115 +in the real world, but here they are.
  116 +
  117 +Q: Can virtual space simply be reserved for the bounds tables so that we
  118 + never have to allocate them?
  119 +A: An MPX-enabled application may create a lot of bounds tables in
  120 + process address space to save bounds information. These tables can take
  121 + up huge swaths of memory (as much as 80% of the memory on the system)
  122 + even if we clean them up aggressively. In the worst-case scenario, the
  123 + tables can be 4x the size of the data structure being tracked. IOW, a
  124 + 1-page structure can require 4 bounds-table pages. An X-GB virtual
  125 + area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
  126 + If we were to preallocate them for the 128TB of user virtual address
  127 + space, we would need to reserve 512TB+2GB, which is larger than the
  128 + entire virtual address space today. This means they can not be reserved
  129 + ahead of time. Also, a single process's pre-populated bounds directory
  130 + consumes 2GB of virtual *AND* physical memory. IOW, it's completely
  131 + infeasible to prepopulate bounds directories.
  132 +
  133 +Q: Can we preallocate bounds table space at the same time memory is
  134 + allocated which might contain pointers that might eventually need
  135 + bounds tables?
  136 +A: This would work if we could hook the site of each and every memory
  137 + allocation syscall. This can be done for small, constrained applications.
  138 + But, it isn't practical at a larger scale since a given app has no
  139 + way of controlling how all the parts of the app might allocate memory
  140 + (think libraries). The kernel is really the only place to intercept
  141 + these calls.
  142 +
  143 +Q: Could a bounds fault be handed to userspace and the tables allocated
  144 + there in a signal handler instead of in the kernel?
  145 +A: mmap() is not on the list of async-signal-safe functions, and even
  146 + if mmap() did work, it would still require locking or nasty tricks to
  147 + keep track of the allocation state there.
  148 +
  149 +Having ruled out all of the userspace-only approaches for managing
  150 +bounds tables that we could think of, we create them on demand in
  151 +the kernel.
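
The sizes quoted in the first answer above can be reproduced with a little
arithmetic. The sketch below is only an illustration: the layout constants
(2^28 directory entries of 8 bytes, 2^17 table entries of 32 bytes, one
table entry per 8-byte pointer slot) are taken from Intel's description of
64-bit MPX and are an assumption here, since this file does not state them.
The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit MPX layout (from Intel's reference, not from this file). */
#define BD_ENTRIES    (UINT64_C(1) << 28) /* bounds directory entries */
#define BD_ENTRY_SIZE UINT64_C(8)         /* bytes per directory entry */
#define BT_ENTRIES    (UINT64_C(1) << 17) /* entries per bounds table */
#define BT_ENTRY_SIZE UINT64_C(32)        /* bytes per table entry */
#define POINTER_SIZE  UINT64_C(8)         /* one table entry per pointer slot */

/* Size of a fully populated bounds directory: 2GB. */
static uint64_t bounds_directory_bytes(void)
{
    return BD_ENTRIES * BD_ENTRY_SIZE;
}

/* Size of one bounds table: 4MB. */
static uint64_t bounds_table_bytes(void)
{
    return BT_ENTRIES * BT_ENTRY_SIZE;
}

/* Amount of user memory whose pointer slots one table describes: 1MB. */
static uint64_t bounds_table_coverage_bytes(void)
{
    return BT_ENTRIES * POINTER_SIZE;
}

/* Worst-case virtual space needed to track 'tracked_bytes' of memory. */
static uint64_t worst_case_reservation_bytes(uint64_t tracked_bytes)
{
    uint64_t ratio = bounds_table_bytes() / bounds_table_coverage_bytes();

    /* Guard against overflow of the 64-bit result. */
    assert(tracked_bytes <= (UINT64_MAX - bounds_directory_bytes()) / ratio);
    return tracked_bytes * ratio + bounds_directory_bytes();
}
```

Under these assumptions, passing 128TB to worst_case_reservation_bytes()
yields 512TB plus 2GB, which matches the estimate given above.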
  152 +
  153 +Decoding MPX instructions
  154 +-------------------------
  155 +
  156 +If a #BR is generated due to a bounds violation caused by MPX, we
  157 +need to decode the MPX instruction to get the violation address and
  158 +store this address in the extended struct siginfo.
  159 +
  160 +The _sigfault field of struct siginfo is extended as follows:
  161 +
  162 +    /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
  163 +    struct {
  164 +        void __user *_addr; /* faulting insn/memory ref. */
  165 +#ifdef __ARCH_SI_TRAPNO
  166 +        int _trapno; /* TRAP # which caused the signal */
  167 +#endif
  168 +        short _addr_lsb; /* LSB of the reported address */
  169 +        struct {
  170 +            void __user *_lower;
  171 +            void __user *_upper;
  172 +        } _addr_bnd;
  173 +    } _sigfault;
  174 +
  175 +The '_addr' field refers to the violation address, and the new '_addr_bnd'
  176 +field refers to the upper/lower bounds in effect when the #BR was raised.
  177 +
  178 +Glibc will also be updated to support this new siginfo, so users
  179 +can get the violation address and bounds when a bounds violation occurs.
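
As a rough illustration of how userspace might consume these fields, the
sketch below pulls the violation address and bounds out of a siginfo_t.
It assumes the C library exposes the new fields as si_lower and si_upper
and marks bounds violations with the si_code value SEGV_BNDERR; neither
name is given in this file, so the code compiles to a stub where they are
missing. struct bounds_fault and decode_bounds_fault() are hypothetical
names used only for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical container for what a SIGSEGV handler would report. */
struct bounds_fault {
    void *addr;  /* address that violated the bounds */
    void *lower; /* lower bound in effect at the time */
    void *upper; /* upper bound in effect at the time */
};

/*
 * Fill 'out' if 'si' describes a bounds violation.
 * Returns 1 if it does, 0 for any other signal or si_code.
 */
static int decode_bounds_fault(const siginfo_t *si, struct bounds_fault *out)
{
    assert(si != NULL && out != NULL);
#if defined(SEGV_BNDERR) && defined(si_lower) && defined(si_upper)
    if (si->si_signo != SIGSEGV || si->si_code != SEGV_BNDERR)
        return 0;
    out->addr = si->si_addr;
    out->lower = si->si_lower;
    out->upper = si->si_upper;
    return 1;
#else
    /* This libc does not expose the extended fields. */
    return 0;
#endif
}
```

A handler installed with sigaction() and SA_SIGINFO could call
decode_bounds_fault() on the siginfo_t it receives and log the result.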
  180 +
  181 +Cleanup unused bounds tables
  182 +----------------------------
  183 +
  184 +When a BNDSTX instruction attempts to save bounds to a bounds directory
  185 +entry marked as invalid, a #BR is generated. This is an indication that
  186 +no bounds table exists for this entry. In this case the fault handler
  187 +will allocate a new bounds table on demand.
  188 +
  189 +Since the kernel allocated those tables on-demand without userspace
  190 +knowledge, it is also responsible for freeing them when the associated
  191 +mappings go away.
  192 +
  193 +The solution is to hook do_munmap() and check whether the process is
  194 +MPX enabled. If so, the bounds tables covering the virtual address
  195 +region being unmapped are freed as well.
  196 +
  197 +Adding new prctl commands
  198 +-------------------------
  199 +
  200 +Two new prctl commands are added to enable and disable MPX bounds table
  201 +management in the kernel.
  202 +
  203 +#define PR_MPX_ENABLE_MANAGEMENT  43
  204 +#define PR_MPX_DISABLE_MANAGEMENT 44
  205 +
  206 +The runtime library in userspace is responsible for allocating the
  207 +bounds directory, so the kernel has to use the XSAVE instruction to get
  208 +the base of the bounds directory from the BNDCFGU register.
  209 +
  210 +But XSAVE is expected to be very expensive. As an optimization, the
  211 +kernel reads the base of the bounds directory once, while executing the
  212 +PR_MPX_ENABLE_MANAGEMENT command, and saves it in struct mm_struct for
  213 +later use.
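
From the runtime library's side, the two commands might be issued roughly
as follows. This is a minimal sketch, not the GCC runtime's actual code:
mpx_set_management() is a hypothetical helper, the remaining prctl()
arguments are passed as zero on the assumption that the commands take no
parameters, and the bounds directory is assumed to have been allocated and
BNDCFGU pointed at it beforehand. On a kernel or CPU without MPX support
the call is expected to fail.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Values from this patch; older userspace headers may lack them. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT  43
#endif
#ifndef PR_MPX_DISABLE_MANAGEMENT
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/*
 * Ask the kernel to start (enable != 0) or stop (enable == 0) managing
 * bounds tables for this process.
 * Returns 0 on success or a negative errno value on failure.
 */
static int mpx_set_management(int enable)
{
    int cmd = enable ? PR_MPX_ENABLE_MANAGEMENT : PR_MPX_DISABLE_MANAGEMENT;

    if (prctl(cmd, 0, 0, 0, 0) == -1)
        return -errno;
    return 0;
}
```

A caller would treat a negative return value as "no kernel help available"
and either run without bounds table management or manage the tables itself.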
  214 +
  215 +
  216 +4. Special rules
  217 +================
  218 +
  219 +1) If userspace is requesting help from the kernel to do the management
  220 +of bounds tables, it may not create or modify entries in the bounds directory.
  221 +
  222 +Users can certainly allocate bounds tables and forcibly point the bounds
  223 +directory at them through the XSAVE instruction, and then set the valid
  224 +bit of the bounds entry to make the entry valid. But the kernel will
  225 +decline to assist in managing these tables.
  226 +
  227 +2) Userspace may not take multiple bounds directory entries and point
  228 +them at the same bounds table.
  229 +
  230 +This is allowed architecturally. For more information, see the "Intel(R)
  231 +Architecture Instruction Set Extensions Programming Reference" (9.3.4).
  232 +
  233 +However, if users did this, the kernel might be fooled into unmapping an
  234 +in-use bounds table since it does not recognize sharing.