Blame view

Documentation/powerpc/phyp-assisted-dump.txt 4.93 KB
d28a79326   Manish Ahuja   [POWERPC] pseries...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
  
                     Hypervisor-Assisted Dump
                     ------------------------
                         November 2007
  
  The goal of hypervisor-assisted dump is to enable the dump of
  a crashed system, and to do so from a fully-reset system, and
  to minimize the total elapsed time until the system is back
  in production use.
  
  As compared to kdump or other strategies, hypervisor-assisted
  dump offers several strong, practical advantages:
  
  -- Unlike kdump, the system has been reset, and loaded
     with a fresh copy of the kernel.  In particular,
     PCI and I/O devices have been reinitialized and are
     in a clean, consistent state.
  -- As the dump is performed, the dumped memory becomes
     immediately available to the system for normal use.
  -- After the dump is completed, no further reboots are
     required; the system will be fully usable, and running
a33f32244   Francis Galiegue   Documentation/: i...
22
     in its normal, production mode on its normal kernel.
d28a79326   Manish Ahuja   [POWERPC] pseries...
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
  
  The above can only be accomplished by coordination with,
  and assistance from the hypervisor. The procedure is
  as follows:
  
  -- When a system crashes, the hypervisor will save
     the low 256MB of RAM to a previously registered
     save region. It will also save system state, system
     registers, and hardware PTE's.
  
  -- After the low 256MB area has been saved, the
     hypervisor will reset PCI and other hardware state.
     It will *not* clear RAM. It will then launch the
     bootloader, as normal.
  
  -- The freshly booted kernel will notice that there
     is a new node (ibm,dump-kernel) in the device tree,
     indicating that there is crash data available from
     a previous boot. It will boot into only 256MB of RAM,
     reserving the rest of system memory.
  
  -- Userspace tools will parse /sys/kernel/release_region
     and read /proc/vmcore to obtain the contents of memory,
     which holds the previous crashed kernel. The userspace
     tools may copy this info to disk, or network, nas, san,
     iscsi, etc. as desired.
  
     For Example: the values in /sys/kernel/release-region
     would look something like this (address-range pairs).
     CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
     DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
  
  -- As the userspace tools complete saving a portion of
     dump, they echo an offset and size to
     /sys/kernel/release_region to release the reserved
     memory back to general use.
  
     An example of this is:
       "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
     which will release 256MB at the 1GB boundary.
  
  Please note that the hypervisor-assisted dump feature
  is only available on Power6-based systems with recent
  firmware versions.
  
  Implementation details:
  ----------------------
  
  During boot, a check is made to see if firmware supports
  this feature on this particular machine. If it does, then
  we check to see if a active dump is waiting for us. If yes
  then everything but 256 MB of RAM is reserved during early
  boot. This area is released once we collect a dump from user
  land scripts that are run. If there is dump data, then
  the /sys/kernel/release_region file is created, and
  the reserved memory is held.
  
  If there is no waiting dump data, then only the highest
  256MB of the ram is reserved as a scratch area. This area
  is *not* released: this region will be kept permanently
  reserved, so that it can act as a receptacle for a copy
  of the low 256MB in the case a crash does occur. See,
  however, "open issues" below, as to whether
  such a reserved region is really needed.
  
  Currently the dump will be copied from /proc/vmcore to a
  a new file upon user intervention. The starting address
  to be read and the range for each data point in provided
  in /sys/kernel/release_region.
  
  The tools to examine the dump will be same as the ones
  used for kdump.
  
  General notes:
  --------------
  Security: please note that there are potential security issues
  with any sort of dump mechanism. In particular, plaintext
  (unencrypted) data, and possibly passwords, may be present in
  the dump data. Userspace tools must take adequate precautions to
  preserve security.
  
  Open issues/ToDo:
  ------------
   o The various code paths that tell the hypervisor that a crash
     occurred, vs. it simply being a normal reboot, should be
     reviewed, and possibly clarified/fixed.
  
   o Instead of using /sys/kernel, should there be a /sys/dump
     instead? There is a dump_subsys being created by the s390 code,
     perhaps the pseries code should use a similar layout as well.
  
   o Is reserving a 256MB region really required? The goal of
     reserving a 256MB scratch area is to make sure that no
     important crash data is clobbered when the hypervisor
     save low mem to the scratch area. But, if one could assure
     that nothing important is located in some 256MB area, then
     it would not need to be reserved. Something that can be
     improved in subsequent versions.
  
   o Still working the kdump team to integrate this with kdump,
     some work remains but this would not affect the current
     patches.
  
   o Still need to write a shell script, to copy the dump away.
     Currently I am parsing it manually.