Commit de1ba09b214056365d9082982905b255caafb7a2

Authored by Akinobu Mita
Committed by Linus Torvalds
1 parent 4b3bb06bea

[PATCH] fault injection: documentation and scripts

This patch set provides some fault-injection capabilities.

- kmalloc() failures

- alloc_pages() failures

- disk IO errors

This lets us see what really happens when those failures occur.

In order to enable these fault-injection capabilities:

1. Enable relevant config options (CONFIG_FAILSLAB, CONFIG_FAIL_PAGE_ALLOC,
   CONFIG_FAIL_MAKE_REQUEST) and, if you want to configure them via debugfs,
   enable CONFIG_FAULT_INJECTION_DEBUG_FS.

2. Build and boot with this kernel

3. Configure the behavior of the fault-injection capabilities by boot option
   or debugfs

   - Boot option

     failslab=
     fail_page_alloc=
     fail_make_request=

   - Debugfs

     /debug/failslab/*
     /debug/fail_page_alloc/*
     /debug/fail_make_request/*

   Please refer to Documentation/fault-injection/fault-injection.txt
   for details.

4. See what really happens.
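
As a minimal sketch of step 3 via debugfs, the helper below writes a few attributes into one capability's directory. The function name configure_fault and its argument order are assumptions made for illustration and are not part of this patch; the file names (probability, interval, times, verbose) are the ones described in fault-injection.txt, and the default directory assumes debugfs is mounted on /debug.

```shell
#!/bin/bash
# Hypothetical helper (not part of this patch): write fault attributes
# into one capability's debugfs directory.
# Usage: configure_fault <dir> <probability> <interval> <times> [verbose]
configure_fault()
{
	local dir=${1:-/debug/failslab}
	local verbose=${5:-1}

	if [ ! -d "$dir" ]; then
		echo "Fault-injection directory $dir does not exist" >&2
		return 1
	fi

	# Stop at the first write that fails (e.g. not running as root).
	printf '%s\n' "$2" > "$dir/probability" &&
	printf '%s\n' "$3" > "$dir/interval" &&
	printf '%s\n' "$4" > "$dir/times" &&
	printf '%s\n' "$verbose" > "$dir/verbose"
}
```

Run as root, a call such as `configure_fault /debug/failslab 100 10 -1 2` would request a failure on every 10th eligible call, with no limit on the number of failures and call traces enabled.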

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Don Mullis <dwm@meer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Showing 4 changed files with 265 additions and 0 deletions

Documentation/fault-injection/failcmd.sh
  1 +#!/bin/bash
  2 +
  3 +echo 1 > /proc/self/make-it-fail
  4 +exec "$@"
Documentation/fault-injection/failmodule.sh
  1 +#!/bin/bash
  2 +#
  3 +# Usage: failmodule <failname> <modulename> [stacktrace-depth]
  4 +#
  5 +# <failname>: "failslab", "fail_page_alloc", or "fail_make_request"
  6 +#
  7 +# <modulename>: name of the module to inject faults into.
  8 +#
  9 +# [stacktrace-depth]: maximum stacktrace depth to walk (default 5)
  10 +#
  11 +
  12 +STACKTRACE_DEPTH=5
  13 +if [ $# -gt 2 ]; then
  14 + STACKTRACE_DEPTH=$3
  15 +fi
  16 +
  17 +if [ ! -d /debug/$1 ]; then
  18 + echo "Fault-injection $1 does not exist" >&2
  19 + exit 1
  20 +fi
  21 +if [ ! -d /sys/module/$2 ]; then
  22 + echo "Module $2 does not exist" >&2
  23 + exit 1
  24 +fi
  25 +
  26 +# Disable any fault injection
  27 +echo 0 > /debug/$1/stacktrace-depth
  28 +
  29 +echo `cat /sys/module/$2/sections/.text` > /debug/$1/address-start
  30 +echo `cat /sys/module/$2/sections/.exit.text` > /debug/$1/address-end
  31 +echo $STACKTRACE_DEPTH > /debug/$1/stacktrace-depth
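failmodule.sh above confines injection to one module by writing the addresses of the module's .text and .exit.text sections to address-start and address-end, then setting stacktrace-depth. As a rough userspace model of that filter, and not the kernel implementation, the sketch below checks whether any of the first few caller addresses falls inside [start, end). The function name caller_in_range and the sample addresses are hypothetical.

```shell
#!/bin/bash
# Simplified userspace model (an assumption, not kernel code) of the
# stacktrace filter: succeed if any of the first <depth> caller
# addresses lies within [start, end).
# Usage: caller_in_range <start> <end> <depth> <addr>...
# Note: bash arithmetic is signed 64-bit, so this model only suits
# illustrative addresses below 2^63. The kernel's handling of a depth
# of 0 is not modeled here.
caller_in_range()
{
	local start=$(( $1 )) end=$(( $2 )) depth=$3
	shift 3

	local addr
	for addr in "$@"; do
		# Stop once the permitted walking depth is used up.
		[ "$depth" -gt 0 ] || break
		depth=$((depth - 1))
		if [ $((addr)) -ge "$start" ] && [ $((addr)) -lt "$end" ]; then
			return 0
		fi
	done
	return 1
}
```

For example, `caller_in_range 0x1000 0x2000 5 0x0500 0x1800` succeeds because the second caller lies in the range, while the same call with a depth of 1 does not, since only the first caller is examined.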
Documentation/fault-injection/fault-injection.txt
  1 +Fault injection capabilities infrastructure
  2 +===========================================
  3 +
  4 +See also drivers/md/faulty.c and "every_nth" module option for scsi_debug.
  5 +
  6 +
  7 +Available fault injection capabilities
  8 +--------------------------------------
  9 +
  10 +o failslab
  11 +
  12 + injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)
  13 +
  14 +o fail_page_alloc
  15 +
  16 + injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
  17 +
  18 +o fail_make_request
  19 +
  20 + injects disk IO errors on devices permitted by setting
  21 + /sys/block/<device>/make-it-fail or
  22 + /sys/block/<device>/<partition>/make-it-fail. (generic_make_request())
  23 +
  24 +Configure fault-injection capabilities behavior
  25 +-----------------------------------------------
  26 +
  27 +o debugfs entries
  28 +
  29 +fault-inject-debugfs kernel module provides some debugfs entries for runtime
  30 +configuration of fault-injection capabilities.
  31 +
  32 +- /debug/*/probability:
  33 +
  34 + likelihood of failure injection, in percent.
  35 + Format: <percent>
  36 +
  37 + Note that one-failure-per-hundred is a very high error rate
  38 + for some testcases. Please set probability=100 and configure
  39 + /debug/*/interval for such testcases.
  40 +
  41 +- /debug/*/interval:
  42 +
  43 + specifies the interval between failures, for calls to
  44 + should_fail() that pass all the other tests.
  45 +
  46 + Note that if you enable this, by setting interval>1, you will
  47 + probably want to set probability=100.
  48 +
  49 +- /debug/*/times:
  50 +
  51 + specifies how many times failures may happen at most.
  52 + A value of -1 means "no limit".
  53 +
  54 +- /debug/*/space:
  55 +
  56 + specifies an initial resource "budget", decremented by "size"
  57 + on each call to should_fail(,size). Failure injection is
  58 + suppressed until "space" reaches zero.
  59 +
  60 +- /debug/*/verbose:
  61 +
  62 + Format: { 0 | 1 | 2 }
  63 + specifies the verbosity of the messages when a failure is injected.
  64 + The default is 0 (no extra messages). Setting it to '1' prints only
  65 + a message reporting that a failure was injected; '2' also prints a
  66 + call trace, which is useful for debugging the problems revealed by
  67 + fault injection.
  68 +
  69 +- /debug/*/task-filter:
  70 +
  71 + Format: { 0 | 1 }
  72 + A value of '0' disables filtering by process (default).
  73 + Any positive value limits failures to processes for which
  74 + /proc/<pid>/make-it-fail is set to 1.
  75 +
  76 +- /debug/*/address-start:
  77 +- /debug/*/address-end:
  78 +
  79 + specifies the range of virtual addresses tested during
  80 + stacktrace walking. Failure is injected only if some caller
  81 + in the walked stacktrace lies within this range.
  82 + Default is [0,ULONG_MAX) (whole of virtual address space).
  83 +
  84 +- /debug/*/stacktrace-depth:
  85 +
  86 + specifies the maximum stacktrace depth walked during search
  87 + for a caller within [address-start,address-end).
  88 +
  89 +- /debug/failslab/ignore-gfp-highmem:
  90 +- /debug/fail_page_alloc/ignore-gfp-highmem:
  91 +
  92 + Format: { 0 | 1 }
  93 + The default is 0. Setting it to '1' suppresses failure injection
  94 + into highmem/user allocations.
  95 +
  96 +- /debug/failslab/ignore-gfp-wait:
  97 +- /debug/fail_page_alloc/ignore-gfp-wait:
  98 +
  99 + Format: { 0 | 1 }
  100 + The default is 0. Setting it to '1' injects failures
  101 + only into non-sleeping allocations (GFP_ATOMIC allocations).
  102 +
  103 +o Boot option
  104 +
  105 +In order to inject faults while debugfs is not available (early boot time),
  106 +use the boot option:
  107 +
  108 + failslab=
  109 + fail_page_alloc=
  110 + fail_make_request=<interval>,<probability>,<space>,<times>
  111 +
  112 +How to add a new fault injection capability
  113 +-------------------------------------------
  114 +
  115 +o #include <linux/fault-inject.h>
  116 +
  117 +o define the fault attributes
  118 +
  119 + DECLARE_FAULT_INJECTION(name);
  120 +
  121 + Please see the definition of struct fault_attr in fault-inject.h
  122 + for details.
  123 +
  124 +o provide the way to configure fault attributes
  125 +
  126 +- boot option
  127 +
  128 + If you need to enable the fault injection capability from boot time, you can
  129 + provide a boot option to configure it. There is a helper function for this.
  130 +
  131 + setup_fault_attr(attr, str);
  132 +
  133 +- debugfs entries
  134 +
  135 + failslab, fail_page_alloc, and fail_make_request are configured this way.
  136 + There are helper functions for it.
  137 +
  138 + init_fault_attr_entries(entries, attr, name);
  139 + cleanup_fault_attr_entries(entries);
  140 +
  141 +- module parameters
  142 +
  143 + If the scope of the fault injection capability is limited to a
  144 + single kernel module, it is better to provide module parameters to
  145 + configure the fault attributes.
  146 +
  147 +o add a hook to insert failures
  148 +
  149 + should_fail() returns 1 when a failure should be injected.
  150 +
  151 + should_fail(attr,size);
  152 +
  153 +Application Examples
  154 +--------------------
  155 +
  156 +o inject slab allocation failures into module init/cleanup code
  157 +
  158 +------------------------------------------------------------------------------
  159 +#!/bin/bash
  160 +
  161 +FAILCMD=Documentation/fault-injection/failcmd.sh
  162 +BLACKLIST="root_plug evbug"
  163 +
  164 +FAILNAME=failslab
  165 +echo Y > /debug/$FAILNAME/task-filter
  166 +echo 10 > /debug/$FAILNAME/probability
  167 +echo 100 > /debug/$FAILNAME/interval
  168 +echo -1 > /debug/$FAILNAME/times
  169 +echo 2 > /debug/$FAILNAME/verbose
  170 +echo 1 > /debug/$FAILNAME/ignore-gfp-highmem
  171 +echo 1 > /debug/$FAILNAME/ignore-gfp-wait
  172 +
  173 +blacklist()
  174 +{
  175 + echo $BLACKLIST | grep $1 > /dev/null 2>&1
  176 +}
  177 +
  178 +oops()
  179 +{
  180 + dmesg | grep BUG > /dev/null 2>&1
  181 +}
  182 +
  183 +find /lib/modules/`uname -r` -name '*.ko' -exec basename {} .ko \; |
  184 + while read i
  185 + do
  186 + oops && exit 1
  187 +
  188 + if ! blacklist $i
  189 + then
  190 + echo inserting $i...
  191 + bash $FAILCMD modprobe $i
  192 + fi
  193 + done
  194 +
  195 +lsmod | awk '{ if ($3 == 0) { print $1 } }' |
  196 + while read i
  197 + do
  198 + oops && exit 1
  199 +
  200 + if ! blacklist $i
  201 + then
  202 + echo removing $i...
  203 + bash $FAILCMD modprobe -r $i
  204 + fi
  205 + done
  206 +
  207 +------------------------------------------------------------------------------
  208 +
  209 +o inject slab allocation failures only for a specific module
  210 +
  211 +------------------------------------------------------------------------------
  212 +#!/bin/bash
  213 +
  214 +FAILMOD=Documentation/fault-injection/failmodule.sh
  215 +
  216 +echo injecting errors into the module $1...
  217 +
  218 +modprobe $1
  219 +bash $FAILMOD failslab $1 10
  220 +echo 25 > /debug/failslab/probability
  221 +
  222 +------------------------------------------------------------------------------
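The attributes described in fault-injection.txt (times, space, interval, probability) combine to decide whether a given call fails. The sketch below is a simplified userspace model of that decision, written only from the descriptions in the document; it is not the kernel's should_fail(), the order of the checks is an assumption, and the names model_should_fail and FA_* are hypothetical. The task and stacktrace filters are left out.

```shell
#!/bin/bash
# Simplified userspace model (an assumption, not the kernel code) of how
# the documented attributes combine. Returns 0 when a failure would be
# injected, 1 otherwise.
FA_PROBABILITY=100   # likelihood of injection, in percent
FA_INTERVAL=1        # inject on every Nth eligible call
FA_TIMES=-1          # failures still allowed, -1 means "no limit"
FA_SPACE=0           # budget that must be used up before injecting
FA_COUNT=0           # eligible calls seen so far

model_should_fail()
{
	local size=${1:-0}

	# The allowed number of failures has been used up.
	if [ "$FA_TIMES" -eq 0 ]; then
		return 1
	fi

	# Consume the "space" budget first; no failures while it lasts.
	if [ "$FA_SPACE" -gt 0 ]; then
		FA_SPACE=$((FA_SPACE > size ? FA_SPACE - size : 0))
		return 1
	fi

	# Only every FA_INTERVAL-th eligible call may fail.
	FA_COUNT=$((FA_COUNT + 1))
	if [ $((FA_COUNT % FA_INTERVAL)) -ne 0 ]; then
		return 1
	fi

	# Roll the dice: probability is a percentage.
	if [ $((RANDOM % 100)) -ge "$FA_PROBABILITY" ]; then
		return 1
	fi

	if [ "$FA_TIMES" -gt 0 ]; then
		FA_TIMES=$((FA_TIMES - 1))
	fi
	return 0
}
```

With FA_PROBABILITY=100, FA_INTERVAL=3 and FA_TIMES=2, this model injects on the 3rd and 6th calls and never again, which illustrates why the document suggests pairing interval>1 with probability=100.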
Documentation/kernel-parameters.txt
... ... @@ -548,6 +548,13 @@
548 548 eurwdt= [HW,WDT] Eurotech CPU-1220/1410 onboard watchdog.
549 549 Format: <io>[,<irq>]
550 550  
  551 + failslab=
  552 + fail_page_alloc=
  553 + fail_make_request=[KNL]
  554 + General fault injection mechanism.
  555 + Format: <interval>,<probability>,<space>,<times>
  556 + See also Documentation/fault-injection/.
  557 +
551 558 fd_mcs= [HW,SCSI]
552 559 See header of drivers/scsi/fd_mcs.c.
553 560
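
The boot parameters added to kernel-parameters.txt above share the format <interval>,<probability>,<space>,<times>. As a small illustration, the helper below assembles such a parameter for pasting into a boot loader's kernel command line. The function name fault_bootarg is hypothetical and not part of this patch; only the three parameter names and the field order come from the documentation.

```shell
#!/bin/bash
# Hypothetical helper: print a boot parameter in the documented
# <interval>,<probability>,<space>,<times> format.
# Usage: fault_bootarg <failname> <interval> <probability> <space> <times>
fault_bootarg()
{
	case "$1" in
	failslab|fail_page_alloc|fail_make_request)
		;;
	*)
		echo "Unknown fault-injection capability: $1" >&2
		return 1
		;;
	esac

	printf '%s=%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For instance, `fault_bootarg failslab 1 10 0 -1` prints `failslab=1,10,0,-1`, that is interval 1, probability 10 percent, no initial space budget, and no limit on the number of failures.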