Commit b26486bf75148ab7b776c6a532a9bad33f987a38
Committed by Ingo Molnar
1 parent c0f7ac3a9e
Exists in master and in 7 other branches
kprobes: Add documents of jump optimization
Add documentations about kprobe jump optimization to
Documentation/kprobes.txt.

Changes in v10:
- Editorial fixups by Jim Keniston.

Changes in v8:
- Update documentation and benchmark results.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133504.6725.79395.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Showing 1 changed file with 195 additions and 12 deletions
Documentation/kprobes.txt
1 | 1 | Title : Kernel Probes (Kprobes) |
2 | 2 | Authors : Jim Keniston <jkenisto@us.ibm.com> |
3 | - : Prasanna S Panchamukhi <prasanna@in.ibm.com> | |
3 | + : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com> | |
4 | + : Masami Hiramatsu <mhiramat@redhat.com> | |
4 | 5 | |
5 | 6 | CONTENTS |
6 | 7 | |
... | ... | @@ -15,6 +16,7 @@ |
15 | 16 | 9. Jprobes Example |
16 | 17 | 10. Kretprobes Example |
17 | 18 | Appendix A: The kprobes debugfs interface |
19 | +Appendix B: The kprobes sysctl interface | |
18 | 20 | |
19 | 21 | 1. Concepts: Kprobes, Jprobes, Return Probes |
20 | 22 | |
... | ... | @@ -42,13 +44,13 @@ |
42 | 44 | can speed up unregistration process when you have to unregister |
43 | 45 | a lot of probes at once. |
44 | 46 | |
45 | -The next three subsections explain how the different types of | |
46 | -probes work. They explain certain things that you'll need to | |
47 | -know in order to make the best use of Kprobes -- e.g., the | |
48 | -difference between a pre_handler and a post_handler, and how | |
49 | -to use the maxactive and nmissed fields of a kretprobe. But | |
50 | -if you're in a hurry to start using Kprobes, you can skip ahead | |
51 | -to section 2. | |
47 | +The next four subsections explain how the different types of | |
48 | +probes work and how jump optimization works. They explain certain | |
49 | +things that you'll need to know in order to make the best use of | |
50 | +Kprobes -- e.g., the difference between a pre_handler and | |
51 | +a post_handler, and how to use the maxactive and nmissed fields of | |
52 | +a kretprobe. But if you're in a hurry to start using Kprobes, you | |
53 | +can skip ahead to section 2. | |
52 | 54 | |
53 | 55 | 1.1 How Does a Kprobe Work? |
54 | 56 | |
55 | 57 | |
... | ... | @@ -161,13 +163,125 @@ |
161 | 163 | object available, then in addition to incrementing the nmissed count, |
162 | 164 | the user entry_handler invocation is also skipped. |
163 | 165 | |
166 | +1.4 How Does Jump Optimization Work? | |
167 | + | |
168 | +If you configured your kernel with CONFIG_OPTPROBES=y (currently | |
169 | +this option is supported on x86/x86-64, non-preemptive kernel) and | |
170 | +the "debug.kprobes_optimization" kernel parameter is set to 1 (see | |
171 | +sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump | |
172 | +instruction instead of a breakpoint instruction at each probepoint. | |
173 | + | |
174 | +1.4.1 Init a Kprobe | |
175 | + | |
176 | +When a probe is registered, before attempting this optimization, | |
177 | +Kprobes inserts an ordinary, breakpoint-based kprobe at the specified | |
178 | +address. So, even if it's not possible to optimize this particular | |
179 | +probepoint, there'll be a probe there. | |
180 | + | |
181 | +1.4.2 Safety Check | |
182 | + | |
183 | +Before optimizing a probe, Kprobes performs the following safety checks: | |
184 | + | |
185 | +- Kprobes verifies that the region that will be replaced by the jump | |
186 | +instruction (the "optimized region") lies entirely within one function. | |
187 | +(A jump instruction is multiple bytes, and so may overlay multiple | |
188 | +instructions.) | |
189 | + | |
190 | +- Kprobes analyzes the entire function and verifies that there is no | |
191 | +jump into the optimized region. Specifically: | |
192 | + - the function contains no indirect jump; | |
193 | + - the function contains no instruction that causes an exception (since | |
194 | + the fixup code triggered by the exception could jump back into the | |
195 | + optimized region -- Kprobes checks the exception tables to verify this); | |
196 | + and | |
197 | + - there is no near jump to the optimized region (other than to the first | |
198 | + byte). | |
199 | + | |
200 | +- For each instruction in the optimized region, Kprobes verifies that | |
201 | +the instruction can be executed out of line. | |
202 | + | |
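The jump-related checks above can be sketched as a small user-space model. This is only an illustration, not the kernel's implementation: the `Insn` record and the function name are hypothetical, and the exception-table and function-boundary checks are left out.

```python
from dataclasses import dataclass
from typing import Optional

JUMP_SIZE = 5  # size of the x86 relative jump that replaces the probed bytes


@dataclass
class Insn:
    """Hypothetical, pre-decoded instruction; not a kernel structure."""
    offset: int                        # byte offset from the function start
    length: int                        # encoded length in bytes
    jump_target: Optional[int] = None  # offset targeted by a direct jump
    indirect_jump: bool = False        # True for jumps through a register or memory


def jump_into_region_possible(insns, probe_offset):
    """Return True if some jump might land inside the optimized region.

    Landing on the first byte is fine, so only the bytes after it are
    treated as forbidden targets.
    """
    forbidden = range(probe_offset + 1, probe_offset + JUMP_SIZE)
    for insn in insns:
        if insn.indirect_jump:
            return True  # target unknown, so assume the worst
        if insn.jump_target is not None and insn.jump_target in forbidden:
            return True
    return False
```

A probepoint would be rejected for optimization whenever this function returns True for the function that contains it.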
203 | +1.4.3 Preparing Detour Buffer | |
204 | + | |
205 | +Next, Kprobes prepares a "detour" buffer, which contains the following | |
206 | +instruction sequence: | |
207 | +- code to push the CPU's registers (emulating a breakpoint trap) | |
208 | +- a call to the trampoline code which calls the user's probe handlers. | |
209 | +- code to restore registers | |
210 | +- the instructions from the optimized region | |
211 | +- a jump back to the original execution path. | |
212 | + | |
213 | +1.4.4 Pre-optimization | |
214 | + | |
215 | +After preparing the detour buffer, Kprobes verifies that none of the | |
216 | +following situations exist: | |
217 | +- The probe has either a break_handler (i.e., it's a jprobe) or a | |
218 | +post_handler. | |
219 | +- Other instructions in the optimized region are probed. | |
220 | +- The probe is disabled. | |
221 | +In any of the above cases, Kprobes won't start optimizing the probe. | |
222 | +Since these situations are temporary, Kprobes tries again to | |
223 | +optimize the probe once the situation changes. | |
224 | + | |
225 | +If the kprobe can be optimized, Kprobes enqueues the kprobe to an | |
226 | +optimizing list, and kicks the kprobe-optimizer workqueue to optimize | |
227 | +it. If the to-be-optimized probepoint is hit before being optimized, | |
228 | +Kprobes returns control to the original instruction path by setting | |
229 | +the CPU's instruction pointer to the copied code in the detour buffer | |
230 | +-- thus at least avoiding the single-step. | |
231 | + | |
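The three blocking conditions listed in 1.4.4 amount to a simple predicate. The sketch below is a user-space model with hypothetical field names; it does not mirror the kernel's data structures.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Hypothetical summary of a probe; field names are illustrative only."""
    has_break_handler: bool = False       # set when the probe is a jprobe
    has_post_handler: bool = False
    other_probes_in_region: bool = False  # another probe inside the optimized region
    disabled: bool = False


def may_start_optimizing(probe):
    """Return True only if none of the blocking situations currently holds."""
    return not (
        probe.has_break_handler
        or probe.has_post_handler
        or probe.other_probes_in_region
        or probe.disabled
    )
```

Because each condition can change later, such a predicate would be evaluated again whenever a handler is removed, a neighbouring probe is unregistered, or the probe is re-enabled.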
232 | +1.4.5 Optimization | |
233 | + | |
234 | +The Kprobe-optimizer doesn't insert the jump instruction immediately; | |
235 | +rather, it calls synchronize_sched() for safety first, because it's | |
236 | +possible for a CPU to be interrupted in the middle of executing the | |
237 | +optimized region(*). As you know, synchronize_sched() can ensure | |
238 | +that all interruptions that were active when synchronize_sched() | |
239 | +was called are done, but only if CONFIG_PREEMPT=n. So, this version | |
240 | +of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**) | |
241 | + | |
242 | +After that, the Kprobe-optimizer calls stop_machine() to replace | |
243 | +the optimized region with a jump instruction to the detour buffer, | |
244 | +using text_poke_smp(). | |
245 | + | |
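As an illustration of what "a jump instruction to the detour buffer" looks like in bytes, the following sketch encodes an x86 near relative jump (opcode 0xE9 followed by a 32-bit displacement). The function name and the addresses in the usage note are made up for the example; the kernel performs this patching itself.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
JMP_REL32_SIZE = 5       # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(insn_addr, target_addr):
    """Encode `jmp target_addr` as it would sit at insn_addr."""
    # The displacement is relative to the address of the *next* instruction.
    rel = target_addr - (insn_addr + JMP_REL32_SIZE)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    # Little-endian: one unsigned opcode byte, then a signed 32-bit displacement.
    return struct.pack("<Bi", JMP_REL32_OPCODE, rel)
```

For a hypothetical probepoint at 0x1000 and a detour buffer at 0x2000, `encode_jmp_rel32(0x1000, 0x2000)` yields the five bytes `e9 fb 0f 00 00`. The same encoding applies to the jump at the end of the detour buffer that returns to the original execution path.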
246 | +1.4.6 Unoptimization | |
247 | + | |
248 | +When an optimized kprobe is unregistered, disabled, or blocked by | |
249 | +another kprobe, it will be unoptimized. If this happens before | |
250 | +the optimization is complete, the kprobe is just dequeued from the | |
251 | +optimizing list. If the optimization has been done, the jump is | |
252 | +replaced with the original code (except for an int3 breakpoint in | |
253 | +the first byte) by using text_poke_smp(). | |
254 | + | |
255 | +(*)Please imagine that the 2nd instruction is interrupted and then | |
256 | +the optimizer replaces the 2nd instruction with the jump *address* | |
257 | +while the interrupt handler is running. When the interrupt | |
258 | +returns to original address, there is no valid instruction, | |
259 | +and it causes an unexpected result. | |
260 | + | |
261 | +(**)This optimization-safety checking may be replaced with the | |
262 | +stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y | |
263 | +kernel. | |
264 | + | |
265 | +NOTE for geeks: | |
266 | +The jump optimization changes the kprobe's pre_handler behavior. | |
267 | +Without optimization, the pre_handler can change the kernel's execution | |
268 | +path by changing regs->ip and returning 1. However, when the probe | |
269 | +is optimized, that modification is ignored. Thus, if you want to | |
270 | +tweak the kernel's execution path, you need to suppress optimization, | |
271 | +using one of the following techniques: | |
272 | +- Specify an empty function for the kprobe's post_handler or break_handler. | |
273 | + or | |
274 | +- Configure the kernel with CONFIG_OPTPROBES=n. | |
275 | + or | |
276 | +- Execute 'sysctl -w debug.kprobes_optimization=0' | |
277 | + | |
164 | 278 | 2. Architectures Supported |
165 | 279 | |
166 | 280 | Kprobes, jprobes, and return probes are implemented on the following |
167 | 281 | architectures: |
168 | 282 | |
169 | -- i386 | |
170 | -- x86_64 (AMD-64, EM64T) | |
283 | +- i386 (Supports jump optimization) | |
284 | +- x86_64 (AMD-64, EM64T) (Supports jump optimization) | |
171 | 285 | - ppc64 |
172 | 286 | - ia64 (Does not support probes on instruction slot1.) |
173 | 287 | - sparc64 (Return probes not yet implemented.) |
... | ... | @@ -193,6 +307,10 @@ |
193 | 307 | so you can use "objdump -d -l vmlinux" to see the source-to-object |
194 | 308 | code mapping. |
195 | 309 | |
310 | +If you want to reduce probing overhead, set "Kprobes jump optimization | |
311 | +support" (CONFIG_OPTPROBES) to "y". You can find this option under the | |
312 | +"Kprobes" line. | |
313 | + | |
196 | 314 | 4. API Reference |
197 | 315 | |
198 | 316 | The Kprobes API includes a "register" function and an "unregister" |
... | ... | @@ -389,7 +507,10 @@ |
389 | 507 | |
390 | 508 | Kprobes allows multiple probes at the same address. Currently, |
391 | 509 | however, there cannot be multiple jprobes on the same function at |
392 | -the same time. | |
510 | +the same time. Also, a probepoint for which there is a jprobe or | |
511 | +a post_handler cannot be optimized. So if you install a jprobe, | |
512 | +or a kprobe with a post_handler, at an optimized probepoint, the | |
513 | +probepoint will be unoptimized automatically. | |
393 | 514 | |
394 | 515 | In general, you can install a probe anywhere in the kernel. |
395 | 516 | In particular, you can probe interrupt handlers. Known exceptions |
... | ... | @@ -453,6 +574,38 @@ |
453 | 574 | on the x86_64 version of __switch_to(); the registration functions |
454 | 575 | return -EINVAL. |
455 | 576 | |
577 | +On x86/x86-64, because Kprobes jump optimization overwrites | |
578 | +several instruction bytes at once, optimization has some limitations. | |
579 | +To explain them, we introduce some terminology. Imagine a 3-instruction | |
580 | +sequence consisting of two 2-byte instructions and one 3-byte | |
581 | +instruction. | |
582 | + | |
583 | +        IA | |
584 | +        | | |
585 | +[-2][-1][0][1][2][3][4][5][6][7] | |
586 | +        [ins1][ins2][  ins3 ] | |
587 | +        [<-     DCR       ->] | |
588 | +           [<- JTPR ->] | |
589 | + | |
590 | +ins1: 1st Instruction | |
591 | +ins2: 2nd Instruction | |
592 | +ins3: 3rd Instruction | |
593 | +IA: Insertion Address | |
594 | +JTPR: Jump Target Prohibition Region | |
595 | +DCR: Detoured Code Region | |
596 | + | |
597 | +The instructions in DCR are copied to the out-of-line buffer | |
598 | +of the kprobe, because the bytes in DCR are replaced by | |
599 | +a 5-byte jump instruction. So there are several limitations. | |
600 | + | |
601 | +a) The instructions in DCR must be relocatable. | |
602 | +b) The instructions in DCR must not include a call instruction. | |
603 | +c) JTPR must not be targeted by any jump or call instruction. | |
604 | +d) DCR must not straddle the border between functions. | |
605 | + | |
606 | +Anyway, these limitations are checked by the in-kernel instruction | |
607 | +decoder, so you don't need to worry about that. | |
608 | + | |
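The DCR and JTPR from the diagram can be derived from the instruction lengths alone. The sketch below is a user-space model with hypothetical function names; offsets are counted from the insertion address (IA).

```python
JUMP_SIZE = 5  # bytes overwritten by the relative jump


def detoured_code_region(insn_lengths):
    """Return the DCR length: whole instructions covering the 5 jump bytes.

    insn_lengths lists the instruction lengths starting at the insertion
    address, in execution order.
    """
    total = 0
    for length in insn_lengths:
        if total >= JUMP_SIZE:
            break  # the jump is already fully covered
        total += length
    if total < JUMP_SIZE:
        raise ValueError("not enough instruction bytes for a jump")
    return total


def jump_target_prohibition_region():
    """Return the JTPR as byte offsets from the insertion address."""
    # Every jump byte except the first one is a forbidden jump target.
    return list(range(1, JUMP_SIZE))
```

For the sequence in the diagram (lengths 2, 2 and 3), the DCR is 7 bytes long, covering offsets 0 through 6, and the JTPR covers offsets 1 through 4.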
456 | 609 | 6. Probe Overhead |
457 | 610 | |
458 | 611 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 |
... | ... | @@ -476,6 +629,19 @@ |
476 | 629 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) |
477 | 630 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 |
478 | 631 | |
632 | +6.1 Optimized Probe Overhead | |
633 | + | |
634 | +Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to | |
635 | +process. Here are sample overhead figures (in usec) for x86 architectures. | |
636 | +k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe, | |
637 | +r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe. | |
638 | + | |
639 | +i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | |
640 | +k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33 | |
641 | + | |
642 | +x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | |
643 | +k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30 | |
644 | + | |
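To put the sample figures in perspective, the short script below computes how many times cheaper an optimized hit is than an unoptimized one. It uses only the numbers quoted above; the dictionary layout and function name are illustrative.

```python
# Figures copied from the sample measurements above (usec per probe hit).
overhead = {
    "i386":   {"k": 0.80, "b": 0.33, "o": 0.05, "r": 1.10, "rb": 0.61, "ro": 0.33},
    "x86-64": {"k": 0.99, "b": 0.43, "o": 0.06, "r": 1.24, "rb": 0.68, "ro": 0.30},
}


def speedup(arch, slow, fast):
    """Return how many times larger the `slow` overhead is than the `fast` one."""
    row = overhead[arch]
    return row[slow] / row[fast]


for arch in overhead:
    print(f"{arch}: kprobe {speedup(arch, 'k', 'o'):.1f}x, "
          f"kretprobe {speedup(arch, 'r', 'ro'):.1f}x")
```

On these samples an optimized kprobe costs roughly one sixteenth of an unoptimized one, while the gain for kretprobes is closer to three or four times.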
479 | 645 | 7. TODO |
480 | 646 | |
481 | 647 | a. SystemTap (http://sourceware.org/systemtap): Provides a simplified |
... | ... | @@ -523,7 +689,8 @@ |
523 | 689 | a virtual address that is no longer valid (module init sections, module |
524 | 690 | virtual addresses that correspond to modules that've been unloaded), |
525 | 691 | such probes are marked with [GONE]. If the probe is temporarily disabled, |
526 | -such probes are marked with [DISABLED]. | |
692 | +such probes are marked with [DISABLED]. If the probe is optimized, it is | |
693 | +marked with [OPTIMIZED]. | |
527 | 694 | |
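A script can pick out optimized probes by looking for the bracketed tags at the end of each line of the list file. The sketch below assumes only that tags appear as upper-case words in square brackets, as described above; the sample lines in the usage note are made up.

```python
import re


def probe_tags(line):
    """Return the bracketed tags found on one line of the kprobes list."""
    return re.findall(r"\[([A-Z]+)\]", line)


def is_optimized(line):
    """Return True if the line carries the [OPTIMIZED] tag."""
    return "OPTIMIZED" in probe_tags(line)
```

For example, `is_optimized("c015d71a  k  vfs_read+0x0  [OPTIMIZED]")` is True, while a line without any tag yields an empty tag list.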
528 | 695 | /sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly. |
529 | 696 | |
... | ... | @@ -533,4 +700,19 @@ |
533 | 700 | file. Note that this knob just disarms and arms all kprobes and doesn't |
534 | 701 | change each probe's disabling state. This means that disabled kprobes (marked |
535 | 702 | [DISABLED]) will not be enabled if you turn ON all kprobes by this knob. |
703 | + | |
704 | + | |
705 | +Appendix B: The kprobes sysctl interface | |
706 | + | |
707 | +/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF. | |
708 | + | |
709 | +When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides | |
710 | +a knob to globally and forcibly turn jump optimization (see section | |
711 | +1.4) ON or OFF. By default, jump optimization is allowed (ON). | |
712 | +If you echo "0" to this file or set "debug.kprobes_optimization" to | |
713 | +0 via sysctl, all optimized probes will be unoptimized, and any new | |
714 | +probes registered after that will not be optimized. Note that this | |
715 | +knob *changes* the optimized state. This means that optimized probes | |
716 | +(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be | |
717 | +removed). If the knob is turned on, they will be optimized again. |
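
The knob can also be inspected from a script. The sketch below reads the file named in this appendix and treats its absence (for example, a kernel built without CONFIG_OPTPROBES) as "unknown"; the function name and the None convention are choices made for this example.

```python
from pathlib import Path

KNOB = Path("/proc/sys/debug/kprobes-optimization")


def optimization_enabled(knob=KNOB):
    """Return True or False for the knob's state, or None if it cannot be read."""
    try:
        text = Path(knob).read_text()
    except (FileNotFoundError, PermissionError):
        return None  # knob missing or not readable by this user
    # The file holds an integer: 0 means OFF, anything else means ON.
    return int(text.strip()) != 0
```

Calling `optimization_enabled()` with no argument checks the running kernel; passing another path makes the function easy to try out against an ordinary file.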