  ==============
  Memory Hotplug
  ==============
  :Created:							Jul 28 2007
  :Updated: Add description of notifier of memory hotplug:	Oct 11 2007
  
  This document describes memory hotplug, including how to use it and its
  current status. Because memory hotplug is still under development, the
  contents of this text may change often.

  .. CONTENTS

    1. Introduction
      1.1. Purpose of memory hotplug
      1.2. Phases of memory hotplug
      1.3. Unit of Memory online/offline operation
    2. Kernel Configuration
    3. sysfs files for memory hotplug
    4. Physical memory hot-add phase
      4.1. Hardware(Firmware) Support
      4.2. Notify memory hot-add event by hand
    5. Logical Memory hot-add phase
      5.1. State of memory
      5.2. How to online memory
    6. Logical memory remove
      6.1. Memory offline and ZONE_MOVABLE
      6.2. How to offline memory
    7. Physical memory remove
    8. Memory hotplug event notifier
    9. Future Work


  .. note::
  
      (1) x86_64 has a special implementation of memory hotplug.
          This text does not describe it.
      (2) This text assumes that sysfs is mounted at /sys.
  
  
  Introduction
  ============
  
  Purpose of memory hotplug
  -------------------------
  Memory Hotplug allows users to increase/decrease the amount of memory.
  Generally, there are two purposes.
  
  (A) For changing the amount of memory.
      This is to allow a feature like capacity on demand.
  (B) For installing/removing DIMMs or NUMA-nodes physically.
      This is to exchange DIMMs/NUMA-nodes, reduce power consumption, etc.
  
  (A) is required by highly virtualized environments and (B) is required by
  hardware which supports memory power management.
  
  Linux memory hotplug is designed for both purposes.

  Phases of memory hotplug
  ------------------------
  
  There are 2 phases in Memory Hotplug:

    1) Physical Memory Hotplug phase
    2) Logical Memory Hotplug phase
  
  The first phase is to communicate with the hardware/firmware and to set up or
  tear down the environment for hotplugged memory. This phase is necessary
  for purpose (B), but it is also a good phase for communication in
  highly virtualized environments.

  When memory is hotplugged, the kernel recognizes the new memory, creates new
  memory management tables, and creates sysfs files for operating on the new
  memory.

  If the firmware supports notifying the OS that new memory has been connected,
  this phase is triggered automatically. ACPI can deliver this event. If not,
  the system administrator uses the "probe" operation instead
  (see :ref:`memory_hotplug_physical_mem`).
  
  The Logical Memory Hotplug phase changes the state of memory to available or
  unavailable for users. The amount of memory visible to users is changed by
  this phase. When a memory range becomes available, the kernel turns all
  memory in it into free pages.

  In this document, this phase is described as online/offline.

  The Logical Memory Hotplug phase is triggered by the system administrator
  writing to a sysfs file. For the hot-add case, it must be executed by hand
  after the Physical Hotplug phase.
  (However, if you write udev hotplug scripts for memory hotplug, these
  phases can be executed seamlessly.)

  Unit of Memory online/offline operation
  ---------------------------------------

  Memory hotplug uses the SPARSEMEM memory model, which allows memory to be
  divided into chunks of the same size. These chunks are called "sections". The
  size of a memory section is architecture dependent. For example, powerpc uses
  16MiB and ia64 uses 1GiB.

  Memory sections are combined into chunks referred to as "memory blocks". The
  size of a memory block is architecture dependent and represents the logical
  unit upon which memory online/offline operations are to be performed. The
  default size of a memory block is the same as memory section size unless an
  architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.)
  
  To determine the size (in bytes) of a memory block please read this file:
  
  /sys/devices/system/memory/block_size_bytes
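
  As a minimal sketch, the block size can be read and converted to MiB from a
  shell. The helper name ``block_size_mib`` and its optional directory argument
  are illustrative only, and the sketch assumes that the file holds a
  hexadecimal value without a leading ``0x``; please check the file on your
  system before relying on it.

  .. code-block:: sh

      # Print the memory block size in MiB.
      # $1: sysfs memory directory (defaults to the path used in this document)
      block_size_mib() {
          memdir=${1:-/sys/devices/system/memory}
          # Assumption: block_size_bytes is hexadecimal without a "0x" prefix.
          hex=$(cat "$memdir/block_size_bytes") || return 1
          echo $(( 0x$hex / 1024 / 1024 ))
      }

  Calling ``block_size_mib`` without arguments reads the real sysfs file; the
  directory argument only exists so that the helper can be tried on a copy.
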

  Kernel Configuration
  ====================
  To use the memory hotplug feature, the kernel must be compiled with the
  following config options.

  - For all memory hotplug:
      - Memory model -> Sparse Memory  (CONFIG_SPARSEMEM)
      - Allow for memory hot-add       (CONFIG_MEMORY_HOTPLUG)

  - To enable memory removal, the following are also necessary:
      - Allow for memory hot remove    (CONFIG_MEMORY_HOTREMOVE)
      - Page Migration                 (CONFIG_MIGRATION)

  - For ACPI memory hotplug, the following are also necessary:
      - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
      - This option can be built as a kernel module.
  
  - As a related configuration, if your system supports NUMA-node hotplug
    via ACPI, then the following option is necessary too.

      - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
        (CONFIG_ACPI_CONTAINER).
  
       This option can be built as a kernel module too.
  
  
  .. _memory_hotplug_sysfs_files:
  
  sysfs files for memory hotplug
  ==============================

  All memory blocks have their device information in sysfs.  Each memory block
  is described under /sys/devices/system/memory as:

  	/sys/devices/system/memory/memoryXXX
  	(XXX is the memory block id.)

  For the memory block covered by the sysfs directory, it is expected that all
  memory sections in this range are present and no memory holes exist in the
  range. Currently there is no way to determine if there is a memory hole, but
  the existence of one should not affect the hotplug capabilities of the memory
  block.

  For example, assume a 1GiB memory block size. The device for memory starting
  at 0x100000000 is /sys/devices/system/memory/memory4::

  	(0x100000000 / 1GiB = 4)

  This device covers the address range [0x100000000 ... 0x140000000).
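
  The arithmetic in this example can be sketched in shell as follows. The
  function name ``block_of_addr`` is hypothetical, and the sketch simply assumes
  that the block id is the physical address divided by the block size, as in
  the example above.

  .. code-block:: sh

      # Print the memory block id and address range for a physical address.
      # $1: physical address, $2: memory block size in bytes (hex is accepted)
      block_of_addr() {
          id=$(( $1 / $2 ))
          start=$(( id * $2 ))
          end=$(( start + $2 ))
          printf 'memory%d [0x%x ... 0x%x)\n' "$id" "$start" "$end"
      }

      # 1GiB blocks, as in the example above.
      block_of_addr 0x100000000 0x40000000
      # prints: memory4 [0x100000000 ... 0x140000000)
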
  Under each memory block, you can see 5 files:

  - /sys/devices/system/memory/memoryXXX/phys_index
  - /sys/devices/system/memory/memoryXXX/phys_device
  - /sys/devices/system/memory/memoryXXX/state
  - /sys/devices/system/memory/memoryXXX/removable
  - /sys/devices/system/memory/memoryXXX/valid_zones
  
  =================== ============================================================
  ``phys_index``      read-only and contains memory block id, same as XXX.
  ``state``           read-write

                      - at read:  contains online/offline state of memory.
                      - at write: user can specify "online_kernel",
                        "online_movable", "online", "offline" command
                        which will be performed on all sections in the block.
  ``phys_device``     read-only: designed to show the name of physical memory
                      device.  This is not well implemented now.
  ``removable``       read-only: contains an integer value indicating
                      whether the memory block is removable.  A value of 1
                      indicates that the memory block is removable and a value
                      of 0 indicates that it is not removable. A memory block
                      is removable only if every section in the block is
                      removable.
  ``valid_zones``     read-only: designed to show which zones this memory block
                      can be onlined to.

                      The first column shows its default zone.

                      "memory6/valid_zones: Normal Movable" shows this memory
                      block can be onlined to ZONE_NORMAL by default and to
                      ZONE_MOVABLE by online_movable.

                      "memory7/valid_zones: Movable Normal" shows this memory
                      block can be onlined to ZONE_MOVABLE by default and to
                      ZONE_NORMAL by online_kernel.
  =================== ============================================================
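
  As an illustration of how these files can be read together, the following
  sketch prints the state, the removable flag and the default zone of one
  memory block. The function name ``block_summary`` is hypothetical, and the
  sketch relies only on the file formats described in the table above.

  .. code-block:: sh

      # Print "state removable default_zone" for one memory block directory.
      # $1: block directory, e.g. /sys/devices/system/memory/memory6
      block_summary() {
          state=$(cat "$1/state") || return 1
          removable=$(cat "$1/removable") || return 1
          # The first column of valid_zones is the default zone.
          read -r default_zone rest < "$1/valid_zones" || return 1
          echo "$state $removable $default_zone"
      }

  For a block whose valid_zones reads "Normal Movable", the last field printed
  is "Normal".
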
  
  .. note::

    These directories/files appear after the physical memory hotplug phase.

  If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
  via symbolic links located in the /sys/devices/system/node/node* directories.

  For example::

  	/sys/devices/system/node/node0/memory9 -> ../../memory/memory9

  A backlink will also be created::

  	/sys/devices/system/memory/memory9/node0 -> ../../node/node0

  .. _memory_hotplug_physical_mem:
  
  Physical memory hot-add phase
  =============================

  Hardware(Firmware) Support
  --------------------------

  On x86_64/ia64 platforms, memory hotplug via ACPI is supported.

  In general, firmware (ACPI) which supports memory hotplug defines a
  memory class object with _HID "PNP0C80". When a notify is asserted to PNP0C80,
  Linux's ACPI handler hot-adds the memory to the system and calls a hotplug udev
  script. This is done automatically.

  However, scripts for memory hotplug are not (currently) included in the generic
  udev package. You may have to write them yourself or online/offline memory by
  hand. Please see :ref:`memory_hotplug_how_to_online_memory` and
  :ref:`memory_hotplug_how_to_offline_memory`.
  
  If the firmware supports NUMA-node hotplug and defines an object with _HID
  "ACPI0004", "PNP0A05", or "PNP0A06", a notification is asserted to that
  object, and the ACPI handler calls the hotplug code for all objects defined
  in it. If a memory device is found, the memory hotplug code is called.

  Notify memory hot-add event by hand
  -----------------------------------
  On some architectures, the firmware may not notify the kernel of a memory
  hotplug event.  Therefore, the memory "probe" interface is supported to
  explicitly notify the kernel.  This interface depends on
  CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86
  if hotplug is supported, although for x86 this should be handled by ACPI
  notification.
  
  The probe interface is located at
  /sys/devices/system/memory/probe

  You can tell the kernel the physical address of the new memory by::

  	% echo start_address_of_new_memory > /sys/devices/system/memory/probe

  Then, the memory range [start_address_of_new_memory,
  start_address_of_new_memory + memory_block_size) is hot-added. In this case,
  the hotplug script is not called (in the current implementation), so you have
  to online the memory yourself.  Please see
  :ref:`memory_hotplug_how_to_online_memory`.
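
  Because the probe file only exists when CONFIG_ARCH_MEMORY_PROBE is enabled,
  a script may want to check for it before writing. The following is a sketch
  under that assumption; the function name ``probe_memory`` and its optional
  directory argument are hypothetical.

  .. code-block:: sh

      # Hot-add the memory block starting at a physical address via "probe".
      # $1: start address of the new memory
      # $2: sysfs memory directory (defaults to the path used in this document)
      probe_memory() {
          memdir=${2:-/sys/devices/system/memory}
          if [ ! -w "$memdir/probe" ]; then
              echo "probe interface not available" >&2
              return 1
          fi
          echo "$1" > "$memdir/probe"
      }

  The memory added this way still has to be onlined afterwards, as described
  in the next section.
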

  Logical Memory hot-add phase
  ============================

  State of memory
  ---------------
  
  To see the (online/offline) state of a memory block, read its 'state' file::

  	% cat /sys/devices/system/memory/memoryXXX/state


  - If the memory block is online, you'll read "online".
  - If the memory block is offline, you'll read "offline".
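
  To inspect all memory blocks at once, the 'state' files can be read in a
  loop. This is only a sketch; the function name ``list_block_states`` and its
  optional directory argument are hypothetical.

  .. code-block:: sh

      # Print one "memoryXXX: state" line per memory block.
      # $1: sysfs memory directory (defaults to the path used in this document)
      list_block_states() {
          memdir=${1:-/sys/devices/system/memory}
          for block in "$memdir"/memory[0-9]*; do
              # Skip the unexpanded pattern if no block directory exists.
              [ -r "$block/state" ] || continue
              echo "${block##*/}: $(cat "$block/state")"
          done
      }

  The blocks are listed in the shell's glob order, which is lexical rather than
  numeric.
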


  .. _memory_hotplug_how_to_online_memory:
  
  How to online memory
  --------------------

  When memory is hot-added, the kernel decides whether or not to "online"
  it according to a policy which can be read from the "auto_online_blocks" file::

  	% cat /sys/devices/system/memory/auto_online_blocks

  The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
  option. If it is disabled the default is "offline" which means the newly added
  memory is not in a ready-to-use state and you have to "online" the newly added
  memory blocks manually. Automatic onlining can be requested by writing "online"
  to "auto_online_blocks" file::

  	% echo online > /sys/devices/system/memory/auto_online_blocks
  
  This sets a global policy and impacts all memory blocks that will subsequently
  be hotplugged. Currently offline blocks keep their state. It is possible, under
  certain circumstances, that some memory blocks will be added but will fail to
  online. User space tools can check their "state" files
  (/sys/devices/system/memory/memoryXXX/state) and try to online them manually.
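
  A user space tool of this kind could look roughly like the following sketch,
  which tries to online every block that is still offline. The function name
  ``online_offline_blocks`` and its optional directory argument are
  hypothetical.

  .. code-block:: sh

      # Try to online every memory block that is currently offline.
      # $1: sysfs memory directory (defaults to the path used in this document)
      # Returns non-zero if at least one block could not be onlined.
      online_offline_blocks() {
          memdir=${1:-/sys/devices/system/memory}
          rc=0
          for block in "$memdir"/memory[0-9]*; do
              [ -r "$block/state" ] || continue
              [ "$(cat "$block/state")" = "offline" ] || continue
              if ! { echo online > "$block/state"; } 2>/dev/null; then
                  echo "failed to online ${block##*/}" >&2
                  rc=1
              fi
          done
          return $rc
      }

  Blocks that are already online are left untouched, so the helper can be run
  repeatedly.
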
  
  If the automatic onlining wasn't requested, failed, or some memory block was
  offlined it is possible to change the individual block's state by writing to the
  "state" file::

  	% echo online > /sys/devices/system/memory/memoryXXX/state

  This onlining will not change the ZONE type of the target memory block.
  If the memory block doesn't belong to any zone, an appropriate kernel zone
  (usually ZONE_NORMAL) will be used, unless the movable_node kernel command
  line option is specified, in which case ZONE_MOVABLE will be used.

  You can explicitly request to associate it with ZONE_MOVABLE by::
  
  	% echo online_movable > /sys/devices/system/memory/memoryXXX/state

  .. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE

  Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by::

  	% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
  
  .. note:: current limit: this memory block must be adjacent to ZONE_NORMAL
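
  Before requesting a specific zone, a script can consult the block's
  valid_zones file (see :ref:`memory_hotplug_sysfs_files`). The following
  sketch does this for ZONE_MOVABLE; the function name
  ``online_movable_if_valid`` is hypothetical.

  .. code-block:: sh

      # Online a memory block into ZONE_MOVABLE if valid_zones lists "Movable".
      # $1: block directory, e.g. /sys/devices/system/memory/memoryXXX
      online_movable_if_valid() {
          read -r zones < "$1/valid_zones" || return 1
          case " $zones " in
          *" Movable "*)
              echo online_movable > "$1/state"
              ;;
          *)
              echo "${1##*/}: Movable is not a valid zone" >&2
              return 1
              ;;
          esac
      }

  The same pattern applies to "online_kernel" by checking for the kernel zone
  name instead.
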

  An explicit zone onlining can fail (e.g. when the range is already within
  an existing and incompatible zone).

  After a successful request, memory block XXX's state will be 'online' and the
  amount of available memory will be increased.

  This may be changed in the future.

  Logical memory remove
  =====================
  
  Memory offline and ZONE_MOVABLE
  -------------------------------

  Memory offlining is more complicated than memory onlining. Because memory
  offlining has to make the whole memory block unused, it can fail if
  the memory block includes memory which cannot be freed.
  
  In general, memory offline can use 2 techniques.

  (1) reclaim and free all memory in the memory block.
  (2) migrate all pages in the memory block.
  
  In the current implementation, Linux's memory offline uses method (2), freeing
  all pages in the memory block by page migration. But not all pages are
  migratable. Under current Linux, migratable pages are anonymous pages and
  page caches. To offline a memory block by migration, the kernel has to
  guarantee that the memory block contains only migratable pages.

  A boot option for creating memory blocks which consist only of migratable
  pages is supported. By specifying the "kernelcore=" or "movablecore=" boot
  option, you can create ZONE_MOVABLE, a zone which is used only for movable
  pages.
  (See also Documentation/admin-guide/kernel-parameters.rst)
  
  Assuming the system has "TOTAL" amount of memory at boot time, these boot
  options create ZONE_MOVABLE as follows.
  
  1) When kernelcore=YYYY boot option is used,
     Size of memory not for movable pages (not for offline) is YYYY.
     Size of memory for movable pages (for offline) is TOTAL-YYYY.
  
  2) When movablecore=ZZZZ boot option is used,
     Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ.
     Size of memory for movable pages (for offline) is ZZZZ.
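
  The two cases above amount to a simple subtraction, as the following sketch
  shows. The function names and the sizes (in MiB) are made up for
  illustration, and the sketch ignores any rounding or alignment the kernel may
  apply.

  .. code-block:: sh

      # Print the non-movable and movable sizes for each boot option.
      split_kernelcore() {    # $1: TOTAL, $2: YYYY
          echo "not-movable=$2 movable=$(( $1 - $2 ))"
      }
      split_movablecore() {   # $1: TOTAL, $2: ZZZZ
          echo "not-movable=$(( $1 - $2 )) movable=$2"
      }

      # Hypothetical 16384 MiB system:
      split_kernelcore 16384 4096    # prints: not-movable=4096 movable=12288
      split_movablecore 16384 4096   # prints: not-movable=12288 movable=4096
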
  
  .. note::

     Unfortunately, there is no information to show which memory block belongs
     to ZONE_MOVABLE. This is TBD.

  .. _memory_hotplug_how_to_offline_memory:

  How to offline memory
  ---------------------

  You can offline a memory block by using the same sysfs interface that was used
  in memory onlining::

  	% echo offline > /sys/devices/system/memory/memoryXXX/state

  If offline succeeds, the state of the memory block is changed to "offline".
  If it fails, an error code (like -EBUSY) will be returned by the kernel.
  Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
  it.  If it doesn't contain 'unmovable' memory, the offline will succeed.

  A memory block under ZONE_MOVABLE is considered to be easy to offline.
  But while the block is busy, offlining may return -EBUSY. Even if a memory
  block cannot be offlined due to -EBUSY, you can retry offlining it, and the
  retry may succeed (or not). (For example, a page may be referenced by some
  kernel-internal code path and released soon afterwards.)
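
  Such a retry can be scripted. The following is only a sketch: the function
  name ``offline_with_retry``, the default of 5 attempts and the default delay
  of 1 second are arbitrary choices, not values taken from the kernel.

  .. code-block:: sh

      # Try to offline a memory block, retrying a few times on failure.
      # $1: block directory, e.g. /sys/devices/system/memory/memoryXXX
      # $2: number of attempts (default 5)
      # $3: delay between attempts in seconds (default 1)
      offline_with_retry() {
          attempts=${2:-5}
          delay=${3:-1}
          while [ "$attempts" -gt 0 ]; do
              if { echo offline > "$1/state"; } 2>/dev/null; then
                  return 0
              fi
              attempts=$(( attempts - 1 ))
              [ "$attempts" -gt 0 ] && sleep "$delay"
          done
          return 1
      }

  The helper gives up after the last attempt and leaves it to the caller to
  decide whether to try again later.
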
  
  Consideration:
    Memory hotplug's design direction is to make memory offlining more likely
    to succeed and to guarantee that memory can be unplugged in any situation.
    But it needs more work. Returning -EBUSY in some situations may be good
    because the user can then decide whether or not to retry. Currently, the
    memory offlining code retries for some time, with a 120 second timeout.
  
  Physical memory remove
  ======================

  More implementation is still needed:

   - Notification from the OS to the firmware that remove work has completed.
   - Guarding against removal while the memory is not yet ready to be removed.

  Memory hotplug event notifier
  =============================
  Hotplugging events are sent to a notification queue.

  There are six types of notification defined in include/linux/memory.h:
  
  MEM_GOING_ONLINE
    Generated before new memory becomes available in order to be able to
    prepare subsystems to handle memory. The page allocator is still unable
    to allocate from the new memory.

  MEM_CANCEL_ONLINE
    Generated if MEM_GOING_ONLINE fails.

  MEM_ONLINE
    Generated when memory has successfully been brought online. The callback
    may allocate pages from the new memory.

  MEM_GOING_OFFLINE
    Generated to begin the process of offlining memory. Allocations are no
    longer possible from the memory but some of the memory to be offlined
    is still in use. The callback can be used to free memory known to a
    subsystem from the indicated memory block.

  MEM_CANCEL_OFFLINE
    Generated if MEM_GOING_OFFLINE fails. Memory is available again from
    the memory block that we attempted to offline.

  MEM_OFFLINE
    Generated after offlining memory is complete.

  A callback routine can be registered by calling::

    hotplug_memory_notifier(callback_func, priority)

  Callback functions with higher values of priority are called before callback
  functions with lower values.

  A callback function must have the following prototype::
  
    int callback_func(
      struct notifier_block *self, unsigned long action, void *arg);
  
  The first argument of the callback function (self) is a pointer to the block
  of the notifier chain that points to the callback function itself.
  The second argument (action) is one of the event types described above.
  The third argument (arg) passes a pointer of struct memory_notify::
  
  	struct memory_notify {
  		unsigned long start_pfn;
  		unsigned long nr_pages;
  		int status_change_nid_normal;
  		int status_change_nid_high;
  		int status_change_nid;
  	};
  
  - start_pfn is the first page frame number of the memory being
    onlined/offlined.
  - nr_pages is the number of pages of the memory being onlined/offlined.
  - status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the
    nodemask is (will be) set/cleared. If this is -1, the nodemask status is
    not changed.
  - status_change_nid_high is set to the node id when N_HIGH_MEMORY of the
    nodemask is (will be) set/cleared. If this is -1, the nodemask status is
    not changed.
  - status_change_nid is set to the node id when N_MEMORY of the nodemask is
    (will be) set/cleared. This means that a new (memoryless) node gets new
    memory by onlining, or that a node loses all of its memory. If this is -1,
    the nodemask status is not changed.

    If status_change_nid* >= 0, the callback should create/discard structures
    for the node if necessary.

  The callback routine shall return one of the values
  NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
  defined in include/linux/notifier.h.
  
  NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
  
  NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
  MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
  further processing of the notification queue.
  
  NOTIFY_STOP stops further processing of the notification queue.

  Future Work
  ===========

    - allowing memory hot-add to ZONE_MOVABLE. Maybe we need some switch like
      sysctl or a new control file.
    - showing the relationship between memory blocks and physical devices.
    - testing memory offlining and making it better.
    - supporting HugeTLB page migration and offlining.
    - removing memmap at memory offline.
    - physical removal of memory.