Blame view

Documentation/security/self-protection.rst 13.5 KB
c2ed67434   Kees Cook   doc: ReSTify self...
1
2
3
  ======================
  Kernel Self-Protection
  ======================
9f8036643   Kees Cook   doc: self-protect...
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
  
  Kernel self-protection is the design and implementation of systems and
  structures within the Linux kernel to protect against security flaws in
  the kernel itself. This covers a wide range of issues, including removing
  entire classes of bugs, blocking security flaw exploitation methods,
  and actively detecting attack attempts. Not all topics are explored in
  this document, but it should serve as a reasonable starting point and
  answer any frequently asked questions. (Patches welcome, of course!)
  
  In the worst-case scenario, we assume an unprivileged local attacker
  has arbitrary read and write access to the kernel's memory. In many
  cases, bugs being exploited will not provide this level of access,
  but with systems in place that defend against the worst case we'll
  cover the more limited cases as well. A higher bar, and one that should
  still be kept in mind, is protecting the kernel against a _privileged_
  local attacker, since the root user has access to a vastly increased
  attack surface. (Especially when they have the ability to load arbitrary
  kernel modules.)
  
  The goals for successful self-protection systems would be that they
  are effective, on by default, require no opt-in by developers, have no
  performance impact, do not impede kernel debugging, and have tests. It
  is uncommon that all these goals can be met, but it is worth explicitly
  mentioning them, since these aspects need to be explored, dealt with,
  and/or accepted.
c2ed67434   Kees Cook   doc: ReSTify self...
29
30
  Attack Surface Reduction
  ========================
9f8036643   Kees Cook   doc: self-protect...
31
32
33
34
35
36
  
  The most fundamental defense against security exploits is to reduce the
  areas of the kernel that can be used to redirect execution. This ranges
  from limiting the exposed APIs available to userspace, making in-kernel
  APIs hard to use incorrectly, minimizing the areas of writable kernel
  memory, etc.
c2ed67434   Kees Cook   doc: ReSTify self...
37
38
  Strict kernel memory permissions
  --------------------------------
9f8036643   Kees Cook   doc: self-protect...
39
40
41
42
  
  When all of kernel memory is writable, it becomes trivial for attacks
  to redirect execution flow. To reduce the availability of these targets
  the kernel needs to protect its memory with a tight set of permissions.
c2ed67434   Kees Cook   doc: ReSTify self...
43
44
  Executable code and read-only data must not be writable
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
45
46
47
48
49
50
51
52
53
  
  Any areas of the kernel with executable memory must not be writable.
  While this obviously includes the kernel text itself, we must consider
  all additional places too: kernel modules, JIT memory, etc. (There are
  temporary exceptions to this rule to support things like instruction
  alternatives, breakpoints, kprobes, etc. If these must exist in a
  kernel, they are implemented in a way where the memory is temporarily
  made writable during the update, and then returned to the original
  permissions.)
c2ed67434   Kees Cook   doc: ReSTify self...
54
55
  In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and
  ``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not
9f8036643   Kees Cook   doc: self-protect...
56
57
  writable, data is not executable, and read-only data is neither writable
  nor executable.
ad21fc4fa   Laura Abbott   arch: Move CONFIG...
58
59
60
  Most architectures have these options on by default and not user selectable.
  For some architectures like arm that wish to have these be selectable,
  the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable
c2ed67434   Kees Cook   doc: ReSTify self...
61
  a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT`` determines
ad21fc4fa   Laura Abbott   arch: Move CONFIG...
62
  the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled.
c2ed67434   Kees Cook   doc: ReSTify self...
63
64
  Function pointers and sensitive variables must not be writable
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
65
66
67
68
69
70
71
72
73
74
  
  Vast areas of kernel memory contain function pointers that are looked
  up by the kernel and used to continue execution (e.g. descriptor/vector
  tables, file/network/etc operation structures, etc). The number of these
  variables must be reduced to an absolute minimum.
  
  Many such variables can be made read-only by setting them "const"
  so that they live in the .rodata section instead of the .data section
  of the kernel, gaining the protection of the kernel's strict memory
  permissions as described above.
c2ed67434   Kees Cook   doc: ReSTify self...
75
76
  For variables that are initialized once at ``__init`` time, these can
  be marked with the (new and under development) ``__ro_after_init``
9f8036643   Kees Cook   doc: self-protect...
77
78
79
80
81
82
83
84
  attribute.
  
  What remains are variables that are updated rarely (e.g. GDT). These
  will need another infrastructure (similar to the temporary exceptions
  made to kernel code mentioned above) that allow them to spend the rest
  of their lifetime read-only. (For example, when being updated, only the
  CPU thread performing the update would be given uninterruptible write
  access to the memory.)
c2ed67434   Kees Cook   doc: ReSTify self...
85
86
  Segregation of kernel memory from userspace memory
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
87
88
89
90
91
92
93
94
  
  The kernel must never execute userspace memory. The kernel must also never
  access userspace memory without explicit expectation to do so. These
  rules can be enforced either by support of hardware-based restrictions
  (x86's SMEP/SMAP, ARM's PXN/PAN) or via emulation (ARM's Memory Domains).
  By blocking userspace memory in this way, execution and data parsing
  cannot be passed to trivially-controlled userspace memory, forcing
  attacks to operate entirely in kernel memory.
c2ed67434   Kees Cook   doc: ReSTify self...
95
96
  Reduced access to syscalls
  --------------------------
9f8036643   Kees Cook   doc: self-protect...
97
98
  
  One trivial way to eliminate many syscalls for 64-bit systems is building
c2ed67434   Kees Cook   doc: ReSTify self...
99
  without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario.
9f8036643   Kees Cook   doc: self-protect...
100
101
102
103
104
105
106
107
108
109
110
111
  
  The "seccomp" system provides an opt-in feature made available to
  userspace, which provides a way to reduce the number of kernel entry
  points available to a running process. This limits the breadth of kernel
  code that can be reached, possibly reducing the availability of a given
  bug to an attack.
  
  An area of improvement would be creating viable ways to keep access to
  things like compat, user namespaces, BPF creation, and perf limited only
  to trusted processes. This would keep the scope of kernel entry points
  restricted to the more regular set of normally available to unprivileged
  userspace.
c2ed67434   Kees Cook   doc: ReSTify self...
112
113
  Restricting access to kernel modules
  ------------------------------------
9f8036643   Kees Cook   doc: self-protect...
114
115
116
117
118
119
120
121
122
123
124
125
126
127
  
  The kernel should never allow an unprivileged user the ability to
  load specific kernel modules, since that would provide a facility to
  unexpectedly extend the available attack surface. (The on-demand loading
  of modules via their predefined subsystems, e.g. MODULE_ALIAS_*, is
  considered "expected" here, though additional consideration should be
  given even to these.) For example, loading a filesystem module via an
  unprivileged socket API is nonsense: only the root or physically local
  user should trigger filesystem module loading. (And even this can be up
  for debate in some scenarios.)
  
  To protect against even privileged users, systems may need to either
  disable module loading entirely (e.g. monolithic kernel builds or
  modules_disabled sysctl), or provide signed modules (e.g.
c2ed67434   Kees Cook   doc: ReSTify self...
128
  ``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having
9f8036643   Kees Cook   doc: self-protect...
129
  root load arbitrary kernel code via the module loader interface.
c2ed67434   Kees Cook   doc: ReSTify self...
130
131
  Memory integrity
  ================
9f8036643   Kees Cook   doc: self-protect...
132
133
134
135
136
137
  
  There are many memory structures in the kernel that are regularly abused
  to gain execution control during an attack, By far the most commonly
  understood is that of the stack buffer overflow in which the return
  address stored on the stack is overwritten. Many other examples of this
  kind of attack exist, and protections exist to defend against them.
c2ed67434   Kees Cook   doc: ReSTify self...
138
139
  Stack buffer overflow
  ---------------------
9f8036643   Kees Cook   doc: self-protect...
140
141
142
143
144
  
  The classic stack buffer overflow involves writing past the expected end
  of a variable stored on the stack, ultimately writing a controlled value
  to the stack frame's stored return address. The most widely used defense
  is the presence of a stack canary between the stack variables and the
050e9baa9   Linus Torvalds   Kbuild: rename CC...
145
  return address (``CONFIG_STACKPROTECTOR``), which is verified just before
9f8036643   Kees Cook   doc: self-protect...
146
  the function returns. Other defenses include things like shadow stacks.
c2ed67434   Kees Cook   doc: ReSTify self...
147
148
  Stack depth overflow
  --------------------
9f8036643   Kees Cook   doc: self-protect...
149
150
151
152
153
154
155
156
  
  A less well understood attack is using a bug that triggers the
  kernel to consume stack memory with deep function calls or large stack
  allocations. With this attack it is possible to write beyond the end of
  the kernel's preallocated stack space and into sensitive structures. Two
  important changes need to be made for better protections: moving the
  sensitive thread_info structure elsewhere, and adding a faulting memory
  hole at the bottom of the stack to catch these overflows.
c2ed67434   Kees Cook   doc: ReSTify self...
157
158
  Heap memory integrity
  ---------------------
9f8036643   Kees Cook   doc: self-protect...
159
160
161
162
  
  The structures used to track heap free lists can be sanity-checked during
  allocation and freeing to make sure they aren't being used to manipulate
  other memory areas.
c2ed67434   Kees Cook   doc: ReSTify self...
163
164
  Counter integrity
  -----------------
9f8036643   Kees Cook   doc: self-protect...
165
166
167
168
169
  
  Many places in the kernel use atomic counters to track object references
  or perform similar lifetime management. When these counters can be made
  to wrap (over or under) this traditionally exposes a use-after-free
  flaw. By trapping atomic wrapping, this class of bug vanishes.
c2ed67434   Kees Cook   doc: ReSTify self...
170
171
  Size calculation overflow detection
  -----------------------------------
9f8036643   Kees Cook   doc: self-protect...
172
173
174
175
  
  Similar to counter overflow, integer overflows (usually size calculations)
  need to be detected at runtime to kill this class of bug, which
  traditionally leads to being able to write past the end of kernel buffers.
c2ed67434   Kees Cook   doc: ReSTify self...
176
177
  Probabilistic defenses
  ======================
9f8036643   Kees Cook   doc: self-protect...
178
179
180
181
182
183
  
  While many protections can be considered deterministic (e.g. read-only
  memory cannot be written to), some protections provide only statistical
  defense, in that an attack must gather enough information about a
  running system to overcome the defense. While not perfect, these do
  provide meaningful defenses.
c2ed67434   Kees Cook   doc: ReSTify self...
184
185
  Canaries, blinding, and other secrets
  -------------------------------------
9f8036643   Kees Cook   doc: self-protect...
186
187
  
  It should be noted that things like the stack canary discussed earlier
c9de4a82c   Kees Cook   docs: self-protec...
188
189
190
  are technically statistical defenses, since they rely on a secret value,
  and such values may become discoverable through an information exposure
  flaw.
9f8036643   Kees Cook   doc: self-protect...
191
192
193
194
195
196
197
198
  
  Blinding literal values for things like JITs, where the executable
  contents may be partially under the control of userspace, need a similar
  secret value.
  
  It is critical that the secret values used must be separate (e.g.
  different canary per stack) and high entropy (e.g. is the RNG actually
  working?) in order to maximize their success.
c2ed67434   Kees Cook   doc: ReSTify self...
199
200
  Kernel Address Space Layout Randomization (KASLR)
  -------------------------------------------------
9f8036643   Kees Cook   doc: self-protect...
201
202
203
204
  
  Since the location of kernel memory is almost always instrumental in
  mounting a successful attack, making the location non-deterministic
  raises the difficulty of an exploit. (Note that this in turn makes
c9de4a82c   Kees Cook   docs: self-protec...
205
206
  the value of information exposures higher, since they may be used to
  discover desired memory locations.)
9f8036643   Kees Cook   doc: self-protect...
207

c2ed67434   Kees Cook   doc: ReSTify self...
208
209
  Text and module base
  ~~~~~~~~~~~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
210
211
  
  By relocating the physical and virtual base address of the kernel at
c2ed67434   Kees Cook   doc: ReSTify self...
212
  boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be
9f8036643   Kees Cook   doc: self-protect...
213
214
215
216
  frustrated. Additionally, offsetting the module loading base address
  means that even systems that load the same set of modules in the same
  order every boot will not share a common base address with the rest of
  the kernel text.
c2ed67434   Kees Cook   doc: ReSTify self...
217
218
  Stack base
  ~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
219
220
221
222
  
  If the base address of the kernel stack is not the same between processes,
  or even not the same between syscalls, targets on or beyond the stack
  become more difficult to locate.
c2ed67434   Kees Cook   doc: ReSTify self...
223
224
  Dynamic memory base
  ~~~~~~~~~~~~~~~~~~~
9f8036643   Kees Cook   doc: self-protect...
225
226
227
228
  
  Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up
  being relatively deterministic in layout due to the order of early-boot
  initializations. If the base address of these areas is not the same
c9de4a82c   Kees Cook   docs: self-protec...
229
230
  between boots, targeting them is frustrated, requiring an information
  exposure specific to the region.
c2ed67434   Kees Cook   doc: ReSTify self...
231
232
  Structure layout
  ~~~~~~~~~~~~~~~~
c9de4a82c   Kees Cook   docs: self-protec...
233
234
235
236
237
  
  By performing a per-build randomization of the layout of sensitive
  structures, attacks must either be tuned to known kernel builds or expose
  enough kernel memory to determine structure layouts before manipulating
  them.
9f8036643   Kees Cook   doc: self-protect...
238

c2ed67434   Kees Cook   doc: ReSTify self...
239
240
  Preventing Information Exposures
  ================================
9f8036643   Kees Cook   doc: self-protect...
241
242
  
  Since the locations of sensitive structures are the primary target for
c9de4a82c   Kees Cook   docs: self-protec...
243
  attacks, it is important to defend against exposure of both kernel memory
9f8036643   Kees Cook   doc: self-protect...
244
245
  addresses and kernel memory contents (since they may contain kernel
  addresses or other sensitive things like canary values).
227d1a61e   Tobin C. Harding   doc: add document...
246
247
248
249
250
251
252
253
254
255
256
257
258
259
  Kernel addresses
  ----------------
  
  Printing kernel addresses to userspace leaks sensitive information about
  the kernel memory layout. Care should be exercised when using any printk
  specifier that prints the raw address, currently %px, %p[ad], (and %p[sSb]
  in certain circumstances [*]).  Any file written to using one of these
  specifiers should be readable only by privileged processes.
  
  Kernels 4.14 and older printed the raw address using %p. As of 4.15-rc1
  addresses printed with the specifier %p are hashed before printing.
  
  [*] If KALLSYMS is enabled and symbol lookup fails, the raw address is
  printed. If KALLSYMS is not enabled the raw address is printed.
c2ed67434   Kees Cook   doc: ReSTify self...
260
261
  Unique identifiers
  ------------------
9f8036643   Kees Cook   doc: self-protect...
262
263
264
265
  
  Kernel memory addresses must never be used as identifiers exposed to
  userspace. Instead, use an atomic counter, an idr, or similar unique
  identifier.
c2ed67434   Kees Cook   doc: ReSTify self...
266
267
  Memory initialization
  ---------------------
9f8036643   Kees Cook   doc: self-protect...
268
269
270
271
  
  Memory copied to userspace must always be fully initialized. If not
  explicitly memset(), this will require changes to the compiler to make
  sure structure holes are cleared.
c2ed67434   Kees Cook   doc: ReSTify self...
272
273
  Memory poisoning
  ----------------
9f8036643   Kees Cook   doc: self-protect...
274

ed535a2da   Alexander Popov   doc: self-protect...
275
276
277
278
279
  When releasing memory, it is best to poison the contents, to avoid reuse
  attacks that rely on the old contents of memory. E.g., clear stack on a
  syscall return (``CONFIG_GCC_PLUGIN_STACKLEAK``), wipe heap memory on a
  free. This frustrates many uninitialized variable attacks, stack content
  exposures, heap content exposures, and use-after-free attacks.
9f8036643   Kees Cook   doc: self-protect...
280

c2ed67434   Kees Cook   doc: ReSTify self...
281
282
  Destination tracking
  --------------------
9f8036643   Kees Cook   doc: self-protect...
283
284
285
  
  To help kill classes of bugs that result in kernel addresses being
  written to userspace, the destination of writes needs to be tracked. If
c2ed67434   Kees Cook   doc: ReSTify self...
286
  the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files),
9f8036643   Kees Cook   doc: self-protect...
287
  it should automatically censor sensitive values.