  .. _page_migration:
  
  ==============
  Page migration
  ==============

  Page migration allows moving the physical location of pages between
  nodes in a NUMA system while the process is running. This means that the
  virtual addresses that the process sees do not change. However, the
  system rearranges the physical location of those pages.
  
  Also see :ref:`Heterogeneous Memory Management (HMM) <hmm>`
  for migrating pages to or from device private memory.
  
  The main intent of page migration is to reduce the latency of memory accesses
  by moving pages near to the processor where the process accessing that memory
  is running.
  
  Page migration allows a process to manually relocate the node on which its
  pages are located through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL options while
  setting a new memory policy via mbind(). The pages of a process can also be
  relocated from another process using the sys_migrate_pages() function call.
  The migrate_pages() function call takes two sets of nodes and moves pages of
  a process that are located on the source nodes to the destination nodes.
  Page migration functions are provided by the numactl package by Andi Kleen
  (a version later than 0.9.3 is required; get it from
  https://github.com/numactl/numactl.git). numactl provides libnuma,
  which offers an interface for page migration similar to its other NUMA
  functionality. Running ``cat /proc/<pid>/numa_maps`` allows an easy review
  of where the pages of a process are located. See also the numa_maps
  documentation in the proc(5) man page.
  
  Manual migration is useful if, for example, the scheduler has relocated
  a process to a processor on a distant node. A batch scheduler or an
  administrator may detect the situation and move the pages of the process
  nearer to the new processor. The kernel itself only provides
  manual page migration support. Automatic page migration may be implemented
  through user space processes that move pages. A special function call,
  move_pages(), allows the moving of individual pages within a process.
  For example, a NUMA profiler may obtain a log showing frequent off-node
  accesses and may use the result to move pages to more advantageous
  locations.
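  
  As a rough illustration of the move_pages() call mentioned above, the
  following user-space sketch asks the kernel which node currently backs a
  single page. It invokes the system call directly so that it does not depend
  on libnuma being installed. The helper name ``query_page_node()`` is
  hypothetical, and the sketch assumes a Linux system whose C library exposes
  ``SYS_move_pages``; on kernels or sandboxes without migration support it
  simply reports a negative errno value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: report the NUMA node that backs the page containing
 * addr. With a NULL node array, move_pages() moves nothing and only fills in
 * the status array. Returns a node id (>= 0) or a negative errno value.
 */
static int query_page_node(void *addr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *pages[1];
	int status[1] = { -1 };

	assert(page_size > 0);

	/* move_pages() expects page-aligned addresses. */
	pages[0] = (void *)((uintptr_t)addr & ~((uintptr_t)page_size - 1));

	/* pid 0: the calling process; NULL nodes: query only; flags: 0. */
	if (syscall(SYS_move_pages, 0, 1UL, pages, (const int *)NULL,
		    status, 0) < 0)
		return -errno;

	/* Per page, the kernel stores a node id or a negative errno here. */
	return status[0];
}
```

  To actually move the page rather than query it, a caller would pass an
  array of target nodes instead of NULL together with the MPOL_MF_MOVE flag.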
  
  Larger installations usually partition the system using cpusets into
  sections of nodes. Paul Jackson has equipped cpusets with the ability to
  move pages when a task is moved to another cpuset (see
  :ref:`CPUSETS <cpusets>`).
  Cpusets allow the automation of process locality. If a task is moved to
  a new cpuset, then all its pages are moved with it so that the
  performance of the process does not drop dramatically. The pages
  of processes in a cpuset are also moved if the allowed memory nodes of the
  cpuset are changed.
  
  For all migration techniques, page migration preserves the relative
  location of pages within a group of nodes, so that a particular memory
  allocation pattern is retained even after a process has been migrated.
  This is necessary in order to preserve the memory latencies.
  Processes will run with similar performance after migration.
  
  Page migration occurs in several steps. A high-level description comes
  first, for those trying to use migrate_pages() from the kernel
  (for userspace usage see Andi Kleen's numactl package mentioned above),
  followed by a low-level description of how the details work.

  In kernel use of migrate_pages()
  ================================
  
  1. Remove pages from the LRU.
  
     Lists of pages to be migrated are generated by scanning over
     pages and moving them into lists. This is done by
     calling isolate_lru_page().
     Calling isolate_lru_page() increases the references to the page
     so that it cannot vanish while the page migration occurs.
     It also prevents the swapper or other scans from encountering
     the page.

  2. We need to have a function of type new_page_t that can be
     passed to migrate_pages(). This function should figure out
     how to allocate the correct new page given the old page.
  
  3. The migrate_pages() function is called, which attempts
     to do the migration. It will call the allocation function to
     obtain the new page for each page that is considered for
     moving.
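  
  The calling convention in the three steps above can be pictured with a
  small user-space toy model. This is only a sketch and not kernel code: the
  ``toy_*`` names, the simplified page structure and the ``replacement``
  field are all invented for illustration, and the callback type is only
  loosely modelled on new_page_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct page; only the fields this sketch needs. */
struct toy_page {
	int node;                      /* node the page lives on */
	int refcount;                  /* reference taken at isolation */
	struct toy_page *next;         /* link in the isolated list */
	struct toy_page *replacement;  /* new page, once "migrated" */
};

/* Plays the role of new_page_t: choose a new page for an old one. */
typedef struct toy_page *(*toy_new_page_t)(struct toy_page *old,
					   unsigned long private_data);

/* Step 1: take the page off its (imaginary) LRU and pin it. */
static void toy_isolate(struct toy_page *page, struct toy_page **list)
{
	page->refcount++;
	page->next = *list;
	*list = page;
}

/* Step 2: an allocation callback; private_data carries the target node. */
static struct toy_page *toy_alloc_on_node(struct toy_page *old,
					  unsigned long private_data)
{
	struct toy_page *new_page = calloc(1, sizeof(*new_page));

	(void)old;
	if (new_page)
		new_page->node = (int)private_data;
	return new_page;
}

/*
 * Step 3: walk the isolated list and call the callback once per page.
 * Returns the number of pages that could not be moved.
 */
static int toy_migrate_pages(struct toy_page *list, toy_new_page_t get_new,
			     unsigned long private_data)
{
	struct toy_page *page;
	int failed = 0;

	assert(get_new != NULL);
	for (page = list; page; page = page->next) {
		struct toy_page *new_page = get_new(page, private_data);

		if (!new_page) {
			failed++;
			continue;
		}
		page->replacement = new_page;
		page->refcount--;  /* drop the isolation reference */
	}
	return failed;
}
```

  In this model the caller owns the replacement pages and frees them when it
  is done; the real kernel interfaces handle page lifetime very differently.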

  How migrate_pages() works
  =========================

  migrate_pages() does several passes over its list of pages. A page is moved
  if all references to a page are removable at the time. The page has
  already been removed from the LRU via isolate_lru_page() and the refcount
  is increased so that the page cannot be freed while page migration occurs.
  
  Steps:
  
  1. Lock the page to be migrated.

  2. Ensure that writeback is complete.

  3. Lock the new page that we want to move to. It is locked so that accesses to
     this (not yet up-to-date) page immediately block while the move is in progress.

  4. All the page table references to the page are converted to migration
     entries. This decreases the mapcount of a page. If the resulting
     mapcount is not zero then we do not migrate the page. All user space
     processes that attempt to access the page will now wait on the page lock
     or wait for the migration page table entry to be removed.

  5. The i_pages lock is taken. This will cause all processes trying
     to access the page via the mapping to block on the spinlock.

  6. The refcount of the page is examined and we back out if references remain.
     Otherwise, we know that we are the only one referencing this page.

  7. The radix tree is checked and if it does not contain the pointer to this
     page then we back out because someone else modified the radix tree.

  8. The new page is prepped with some settings from the old page so that
     accesses to the new page will discover a page with the correct settings.
  
  9. The radix tree is changed to point to the new page.

  10. The reference count of the old page is dropped because the address space
      reference is gone. A reference to the new page is established because
      the new page is referenced by the address space.

  11. The i_pages lock is dropped. With that, lookups in the mapping
      become possible again. Processes will move from spinning on the lock
      to sleeping on the locked new page.

  12. The page contents are copied to the new page.

  13. The remaining page flags are copied to the new page.

  14. The old page flags are cleared to indicate that the page does
      not provide any information anymore.

  15. Queued up writeback on the new page is triggered.

  16. If migration entries were inserted into the page table, then replace them
      with real ptes. Doing so will enable access for user space processes not
      already waiting for the page lock.

  17. The page locks are dropped from the old and new page.
      Processes waiting on the page lock will redo their page faults
      and will reach the new page.

  18. The new page is moved to the LRU and can be scanned by the swapper,
      etc. again.

  Non-LRU page migration
  ======================

  Although migration was originally aimed at reducing the latency of memory
  accesses for NUMA, compaction also uses migration to create high-order pages.
  
  The current problem with the implementation is that it is designed to
  migrate only *LRU* pages. However, there are potential non-LRU pages which
  can be migrated in drivers, for example, zsmalloc and virtio-balloon pages.
  
  For virtio-balloon pages, some parts of the migration code path have been
  hooked up with virtio-balloon specific functions that intercept the
  migration logic. This is too specific to one driver, so other drivers that
  want to make their pages movable would have to add their own specific hooks
  in the migration path.

  To overcome the problem, the VM supports non-LRU page migration, which
  provides generic functions for non-LRU movable pages without driver-specific
  hooks in the migration path.

  If a driver wants to make its pages movable, it should define three functions
  which are function pointers of struct address_space_operations.
  
  1. ``bool (*isolate_page) (struct page *page, isolate_mode_t mode);``

     What the VM expects from the driver's isolate_page() function is to
     return *true* if the driver isolates the page successfully. On returning
     true, the VM marks the page as PG_isolated so that concurrent isolation on
     several CPUs skips the page. If a driver cannot isolate the page, it
     should return *false*.

     Once the page is successfully isolated, the VM uses the page.lru fields,
     so the driver shouldn't expect to preserve values in those fields.

  2. ``int (*migratepage) (struct address_space *mapping,``
  |	``struct page *newpage, struct page *oldpage, enum migrate_mode);``

     After isolation, the VM calls the driver's migratepage() with the isolated
     page. The job of migratepage() is to move the contents of the old page to
     the new page and set up the fields of struct page newpage. Keep in mind
     that you should indicate to the VM that the oldpage is no longer movable
     via __ClearPageMovable() under page_lock if you migrated the oldpage
     successfully and returned MIGRATEPAGE_SUCCESS. If the driver cannot
     migrate the page at the moment, it can return -EAGAIN. On -EAGAIN, the VM
     will retry page migration after a short time because it interprets -EAGAIN
     as "temporary migration failure". On returning any error except -EAGAIN,
     the VM will give up the page migration without retrying.

     The driver shouldn't touch the page.lru field while in the migratepage()
     function.

  3. ``void (*putback_page)(struct page *);``

     If migration fails on the isolated page, the VM should return the isolated
     page to the driver, so the VM calls the driver's putback_page() with the
     isolated page. In this function, the driver should put the isolated page
     back into its own data structure.

  4. non-LRU movable page flags

     There are two page flags for supporting non-LRU movable pages.

     * PG_movable

       The driver should use the function below to make a page movable under
       page_lock::
  
  	void __SetPageMovable(struct page *page, struct address_space *mapping)
  
       It needs an address_space argument for registering the family of
       migration functions which will be called by the VM. Strictly speaking,
       PG_movable is not a real flag of struct page. Rather, the VM
       reuses the lower bits of page->mapping to represent it::
  
  	#define PAGE_MAPPING_MOVABLE 0x2
  	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
  
       so the driver shouldn't access page->mapping directly. Instead, the
       driver should use page_mapping(), which masks off the low two bits of
       page->mapping under page lock so it can get the right struct
       address_space.
  
       For testing of non-LRU movable pages, the VM supports the __PageMovable()
       function. However, it doesn't guarantee to identify non-LRU movable pages
       because the page->mapping field is unified with other variables in struct
       page. If the driver releases the page after isolation by the VM,
       page->mapping doesn't have a stable value although it has
       PAGE_MAPPING_MOVABLE set (look at __ClearPageMovable). But __PageMovable()
       is a cheap way to tell whether a page is LRU or non-LRU movable once the
       page has been isolated, because LRU pages can never have
       PAGE_MAPPING_MOVABLE set in page->mapping. It is also good for just
       peeking to test for non-LRU movable pages before the more expensive check
       with lock_page() during pfn scanning to select a victim.
  
       To guarantee that a page is a non-LRU movable page, the VM provides the
       PageMovable() function. Unlike __PageMovable(), PageMovable() validates
       page->mapping and mapping->a_ops->isolate_page under lock_page(). The
       lock_page() prevents page->mapping from being destroyed suddenly.
  
       Drivers using __SetPageMovable() should clear the flag via
       __ClearPageMovable() under page_lock() before releasing the page.
  
     * PG_isolated
  
       To prevent concurrent isolation among several CPUs, the VM marks an
       isolated page as PG_isolated under lock_page(). So if a CPU encounters a
       PG_isolated non-LRU movable page, it can skip it. The driver doesn't need
       to manipulate the flag because the VM will set/clear it automatically.
       Keep in mind that if the driver sees a PG_isolated page, it means the
       page has been isolated by the VM, so it shouldn't touch the page.lru
       field. The PG_isolated flag is aliased with the PG_reclaim flag, so
       drivers shouldn't use PG_isolated for their own purposes.
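  
  The way PG_movable is folded into the low bits of page->mapping, as
  described for PG_movable above, can be sketched in plain user-space C. This
  is an illustrative model only: the ``toy_*`` names and structures are
  invented, the movable test is reduced to checking the single 0x2 bit, and
  the kernel's real helpers perform additional checks.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAPPING_MOVABLE  0x2UL  /* same value as PAGE_MAPPING_MOVABLE */
#define TOY_MAPPING_LOW_BITS 0x3UL  /* the "low two bits" that get masked */

/* Toy stand-ins for struct address_space and struct page. */
struct toy_mapping {
	int id;
};

struct toy_tagged_page {
	uintptr_t mapping;  /* pointer value with flag bits in the low bits */
};

/* Store the mapping pointer and tag the page as movable in one word. */
static void toy_set_movable(struct toy_tagged_page *page,
			    struct toy_mapping *mapping)
{
	/* The trick only works if alignment leaves the low bits free. */
	assert(((uintptr_t)mapping & TOY_MAPPING_LOW_BITS) == 0);
	page->mapping = (uintptr_t)mapping | TOY_MAPPING_MOVABLE;
}

/* Cheap check of the tag bit, in the spirit of __PageMovable(). */
static int toy_is_movable(const struct toy_tagged_page *page)
{
	return (page->mapping & TOY_MAPPING_MOVABLE) != 0;
}

/* Recover the real pointer by masking off the low two bits. */
static struct toy_mapping *toy_page_mapping(const struct toy_tagged_page *page)
{
	return (struct toy_mapping *)(page->mapping & ~TOY_MAPPING_LOW_BITS);
}
```

  The model shows why reading the raw field directly is unsafe: once the tag
  is set, the stored value is no longer a valid pointer until it is masked.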

  Monitoring Migration
  =====================
  
  The following events (counters) can be used to monitor page migration.
  
  1. PGMIGRATE_SUCCESS: Normal page migration success. Each count means that a
     page was migrated. If the page was a non-THP page, then this counter is
     increased by one. If the page was a THP, then this counter is increased by
     the number of THP subpages. For example, migration of a single 2MB THP that
     has 4KB-size base pages (subpages) will cause this counter to increase by
     512.
  
  2. PGMIGRATE_FAIL: Normal page migration failure. Same counting rules as for
     PGMIGRATE_SUCCESS, above: this will be increased by the number of subpages,
     if it was a THP.
  
  3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.
  
  4. THP_MIGRATION_FAIL: A THP could neither be migrated nor be split.
  
  5. THP_MIGRATION_SPLIT: A THP was migrated, but not as such: first, the THP had
     to be split. After splitting, a migration retry was used for its sub-pages.
  
  THP_MIGRATION_* events also update the appropriate PGMIGRATE_SUCCESS or
  PGMIGRATE_FAIL events. For example, a THP migration failure will cause both
  THP_MIGRATION_FAIL and PGMIGRATE_FAIL to increase.
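  
  The counting rules above can be checked with a little arithmetic. The
  following sketch models them in user space. The structure and the
  ``toy_*`` function names are hypothetical and only mirror the event names
  listed above; they are not the kernel's vmstat interface.

```c
#include <assert.h>

/* Toy counters named after the events described above. */
struct toy_migrate_stats {
	unsigned long pgmigrate_success;
	unsigned long pgmigrate_fail;
	unsigned long thp_migration_success;
	unsigned long thp_migration_fail;
};

/* Number of base pages in one huge page, e.g. 2MB / 4KB. */
static unsigned long toy_nr_subpages(unsigned long thp_size,
				     unsigned long base_size)
{
	assert(base_size != 0 && thp_size % base_size == 0);
	return thp_size / base_size;
}

/*
 * Account one migration attempt of an unsplit THP: the THP event goes up by
 * one, and the matching PGMIGRATE event goes up by the number of subpages.
 */
static void toy_account_thp(struct toy_migrate_stats *stats,
			    unsigned long nr_subpages, int success)
{
	if (success) {
		stats->thp_migration_success++;
		stats->pgmigrate_success += nr_subpages;
	} else {
		stats->thp_migration_fail++;
		stats->pgmigrate_fail += nr_subpages;
	}
}
```

  For a 2MB THP with 4KB base pages, ``toy_nr_subpages(2UL << 20, 4UL << 10)``
  yields 512, which matches the PGMIGRATE_SUCCESS example given earlier.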
  Christoph Lameter, May 8, 2006.
  Minchan Kim, Mar 28, 2016.