Commit 92e793495597af4135d94314113bf13eafb0e663
Committed by: Linus Torvalds
1 parent: 107dab5c92
Exists in master and in 20 other branches

kmem: add slab-specific documentation about the kmem controller

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 1 changed file with 7 additions and 0 deletions
Documentation/cgroups/memory.txt
Memory Resource Controller

NOTE: The Memory Resource Controller has generically been referred to as the
      memory controller in this document. Do not confuse memory controller
      used here with the memory controller that is used in hardware.

(For editors)
In this document:
      When we mention a cgroup (cgroupfs's directory) with memory controller,
      we call it "memory cgroup". When you see git-log and source code, you'll
      see patch's title and function names tend to use "memcg".
      In this document, we avoid using it.

Benefits and Purpose of the memory controller

The memory controller isolates the memory behaviour of a group of tasks
from the rest of the system. The article on LWN [12] mentions some probable
uses of the memory controller. The memory controller can be used to

a. Isolate an application or a group of applications
   Memory-hungry applications can be isolated and limited to a smaller
   amount of memory.
b. Create a cgroup with a limited amount of memory; this can be used
   as a good alternative to booting with mem=XXXX.
c. Virtualization solutions can control the amount of memory they want
   to assign to a virtual machine instance.
d. A CD/DVD burner could control the amount of memory used by the
   rest of the system to ensure that burning does not fail due to lack
   of available memory.
e. There are several other use cases; find one or use the controller just
   for fun (to learn and hack on the VM subsystem).

Current Status: linux-2.6.34-mmotm (development version of 2010/April)

Features:
 - accounting anonymous pages, file caches, swap caches usage and limiting them.
 - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
 - optionally, memory+swap usage can be accounted and limited.
 - hierarchical accounting
 - soft limit
 - moving (recharging) account at moving a task is selectable.
 - usage threshold notifier
 - oom-killer disable knob and oom-notifier
 - Root cgroup has no limit controls.

Kernel memory support is a work in progress, and the current version provides
basic functionality. (See Section 2.7)

Brief summary of control files.

 tasks				 # attach a task(thread) and show list of threads
 cgroup.procs			 # show list of processes
 cgroup.event_control		 # an interface for event_fd()
 memory.usage_in_bytes		 # show current res_counter usage for memory
				   (See 5.5 for details)
 memory.memsw.usage_in_bytes	 # show current res_counter usage for memory+Swap
				   (See 5.5 for details)
 memory.limit_in_bytes		 # set/show limit of memory usage
 memory.memsw.limit_in_bytes	 # set/show limit of memory+Swap usage
 memory.failcnt			 # show the number of times memory usage hit limits
 memory.memsw.failcnt		 # show the number of times memory+Swap usage hit limits
 memory.max_usage_in_bytes	 # show max memory usage recorded
 memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded
 memory.soft_limit_in_bytes	 # set/show soft limit of memory usage
 memory.stat			 # show various statistics
 memory.use_hierarchy		 # set/show hierarchical account enabled
 memory.force_empty		 # trigger forced move charge to parent
 memory.swappiness		 # set/show swappiness parameter of vmscan
				   (See sysctl's vm.swappiness)
 memory.move_charge_at_immigrate # set/show controls of moving charges
 memory.oom_control		 # set/show oom controls.
 memory.numa_stat		 # show the number of memory usage per numa node

 memory.kmem.limit_in_bytes	 # set/show hard limit for kernel memory
 memory.kmem.usage_in_bytes	 # show current kernel memory allocation
 memory.kmem.failcnt		 # show the number of times kernel memory usage hit limits
 memory.kmem.max_usage_in_bytes	 # show max kernel memory usage recorded

 memory.kmem.tcp.limit_in_bytes	 # set/show hard limit for tcp buf memory
 memory.kmem.tcp.usage_in_bytes	 # show current tcp buf memory allocation
 memory.kmem.tcp.failcnt	 # show the number of times tcp buf memory usage hit limits
 memory.kmem.tcp.max_usage_in_bytes # show max tcp buf memory usage recorded

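For a quick feel of the interface: assuming the memory controller is mounted
at /sys/fs/cgroup/memory and a child cgroup "0" has been created under it
(these paths are illustrative, not mandated), the files above are plain text
and are read and written with ordinary tools, e.g.:

	# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
	# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
	# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
	# cat /sys/fs/cgroup/memory/0/memory.failcnt

Writes accept suffixes such as K, M and G; reads return values in bytes.
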
1. History

The memory controller has a long history. A request for comments for the memory
controller was posted by Balbir Singh [1]. At the time the RFC was posted
there were several implementations for memory control. The goal of the
RFC was to build consensus and agreement for the minimal features required
for memory control. The first RSS controller was posted by Balbir Singh [2]
in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
RSS controller. At OLS, at the resource management BoF, everyone suggested
that we handle both page cache and RSS together. Another request was raised
to allow user space handling of OOM. The current memory controller is
at version 6; it combines both mapped (RSS) and unmapped Page
Cache Control [11].

2. Memory Control

Memory is a unique resource in the sense that it is present in a limited
amount. If a task requires a lot of CPU processing, the task can spread
its processing over a period of hours, days, months or years, but with
memory, the same physical memory needs to be reused to accomplish the task.

The memory controller implementation has been divided into phases. These
are:

1. Memory controller
2. mlock(2) controller
3. Kernel user memory accounting and slab control
4. user mappings length controller

The memory controller is the first controller developed.

2.1. Design

The core of the design is a counter called the res_counter. The res_counter
tracks the current memory usage and limit of the group of processes associated
with the controller. Each cgroup has a memory controller specific data
structure (mem_cgroup) associated with it.

2.2. Accounting

          +--------------------+
          |     mem_cgroup     |
          |    (res_counter)   |
          +--------------------+
             /      ^      \
            /       |       \
+---------------+   |   +---------------+
|   mm_struct   |   |...|   mm_struct   |
|               |   |   |               |
+---------------+   |   +---------------+
                    |
                    +-----------+
                                |
+---------------+     +---------+-------+
|     page      +---->|   page_cgroup   |
|               |     |                 |
+---------------+     +-----------------+

(Figure 1: Hierarchy of Accounting)


Figure 1 shows the important aspects of the controller

1. Accounting happens per cgroup
2. Each mm_struct knows about which cgroup it belongs to
3. Each page has a pointer to the page_cgroup, which in turn knows the
   cgroup it belongs to

The accounting is done as follows: mem_cgroup_charge_common() is invoked to
set up the necessary data structures and check if the cgroup that is being
charged is over its limit. If it is, then reclaim is invoked on the cgroup.
More details can be found in the reclaim section of this document.
If everything goes well, a page meta-data-structure called page_cgroup is
updated. page_cgroup has its own LRU on cgroup.
(*) page_cgroup structure is allocated at boot/memory-hotplug time.

2.2.1 Accounting details

All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
Some pages which are never reclaimable and will not be on the LRU
are not accounted. We just account pages under usual VM management.

RSS pages are accounted at page_fault unless they've already been accounted
for earlier. A file page will be accounted for as Page Cache when it's
inserted into inode (radix-tree). While it's mapped into the page tables of
processes, duplicate accounting is carefully avoided.

An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
A swapped-in page is not accounted until it's mapped.

Note: The kernel does swapin-readahead and reads multiple swaps at once.
This means swapped-in pages may contain pages for other tasks than a task
causing page fault. So, we avoid accounting at swap-in I/O.

At page migration, accounting information is kept.

Note: we just account pages-on-LRU because our purpose is to control amount
of used pages; not-on-LRU pages tend to be out-of-control from VM view.

2.3 Shared Page Accounting

Shared pages are accounted on the basis of the first touch approach. The
cgroup that first touches a page is accounted for the page. The principle
behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).

But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.

Exception: If CONFIG_MEMCG_SWAP is not used.
When you do swapoff and force swapped-out pages of shmem (tmpfs) back
into memory, charges for those pages are accounted against the
caller of swapoff rather than the users of shmem.

2.4 Swap Extension (CONFIG_MEMCG_SWAP)

Swap Extension allows you to record charge for swap. A swapped-in page is
charged back to the original page allocator if possible.

When swap is accounted, the following files are added.
 - memory.memsw.usage_in_bytes.
 - memory.memsw.limit_in_bytes.

memsw means memory+swap. Usage of memory+swap is limited by
memsw.limit_in_bytes.

Example: Assume a system with 4G of swap. A task which allocates 6G of memory
(by mistake) under a 2G memory limitation will use all swap.
In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
By using the memsw limit, you can avoid system OOM which can be caused by swap
shortage.
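
With illustrative paths (and noting that memsw.limit_in_bytes cannot be set
below memory.limit_in_bytes), the example above corresponds to:

	# echo 2G > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
	# echo 3G > /sys/fs/cgroup/memory/0/memory.memsw.limit_in_bytes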

* why 'memory+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
memory+swap. In other words, when we want to limit the usage of swap without
affecting global LRU, memory+swap limit is better than just limiting swap from
an OS point of view.

* What happens when a cgroup hits memory.memsw.limit_in_bytes
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by the cgroup routines, and
file caches are dropped. But as mentioned above, the global LRU can still swap
out memory from it for the sanity of the system's memory management state. You
can't forbid it by cgroup.

2.5 Reclaim

Each cgroup maintains a per cgroup LRU which has the same structure as
global VM. When a cgroup goes over its limit, we first try
to reclaim memory from the cgroup so as to make space for the new
pages that the cgroup has touched. If the reclaim is unsuccessful,
an OOM routine is invoked to select and kill the bulkiest task in the
cgroup. (See 10. OOM Control below.)

The reclaim algorithm has not been modified for cgroups, except that
pages that are selected for reclaiming come from the per-cgroup LRU
list.

NOTE: Reclaim does not work for the root cgroup, since we cannot set any
limits on the root cgroup.

Note2: When panic_on_oom is set to "2", the whole system will panic.

When oom event notifier is registered, event will be delivered.
(See oom_control section)

2.6 Locking

lock_page_cgroup()/unlock_page_cgroup() should not be called under
mapping->tree_lock.

Other lock order is following:
PG_locked.
mm->page_table_lock
zone->lru_lock
  lock_page_cgroup.
In many cases, just lock_page_cgroup() is called.
per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
zone->lru_lock, it has no lock of its own.

2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)

With the Kernel memory extension, the Memory Controller is able to limit
the amount of kernel memory used by the system. Kernel memory is fundamentally
different than user memory, since it can't be swapped out, which makes it
possible to DoS the system by consuming too much of this precious resource.

Kernel memory won't be accounted at all until a limit on a group is set. This
allows for existing setups to continue working without disruption. The limit
cannot be set if the cgroup has children, or if there are already tasks in the
cgroup. Attempting to set the limit under those conditions will return -EBUSY.
When use_hierarchy == 1 and a group is accounted, its children will
automatically be accounted regardless of their limit value.

After a group is first limited, it will stay accounted until it
is removed. The memory limitation itself can of course be removed by writing
-1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
limited.
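
A sketch of this sequence (path illustrative): the first write turns
accounting on and must happen while the group is empty, as described above;
writing -1 afterwards lifts the limit while accounting stays on:

	# echo 500M > /sys/fs/cgroup/memory/0/memory.kmem.limit_in_bytes
	# cat /sys/fs/cgroup/memory/0/memory.kmem.usage_in_bytes
	# echo -1 > /sys/fs/cgroup/memory/0/memory.kmem.limit_in_bytes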

Kernel memory limits are not imposed for the root cgroup. Usage for the root
cgroup may or may not be accounted. The memory used is accumulated into
memory.kmem.usage_in_bytes, or in a separate counter when it makes sense.
(currently only for tcp).
The main "kmem" counter is fed into the main counter, so kmem charges will
also be visible from the user counter.

Currently no soft limit is implemented for kernel memory. It is future work
to trigger slab reclaim when those limits are reached.

2.7.1 Current Kernel Memory resources accounted

* stack pages: every process consumes some stack pages. By accounting into
kernel memory, we prevent new processes from being created when the kernel
memory usage is too high.

* slab pages: pages allocated by the SLAB or SLUB allocator are tracked. A copy
of each kmem_cache is created the first time the cache is touched from inside
the memcg. The creation is done lazily, so some objects can still be skipped
while the cache is being created. All objects in a slab page should belong to
the same memcg. This only fails to hold when a task is migrated to a different
memcg during the page allocation by the cache.

* sockets memory pressure: some sockets protocols have memory pressure
thresholds. The Memory Controller allows them to be controlled individually
per cgroup, instead of globally.

* tcp memory pressure: sockets memory pressure for the tcp protocol.

310 | 2.7.3 Common use cases | 317 | 2.7.3 Common use cases |
311 | 318 | ||
Because the "kmem" counter is fed to the main user counter, kernel memory can
never be limited completely independently of user memory. Say "U" is the user
limit, and "K" the kernel limit. There are three possible ways limits can be
set:

U != 0, K = unlimited:
This is the standard memcg limitation mechanism already present before kmem
accounting. Kernel memory is completely ignored.

U != 0, K < U:
Kernel memory is a subset of the user memory. This setup is useful in
deployments where the total amount of memory per-cgroup is overcommitted.
Overcommitting kernel memory limits is definitely not recommended, since the
box can still run out of non-reclaimable memory.
In this case, the admin could set up K so that the sum of all groups is
never greater than the total memory, and freely set U at the cost of the
QoS.

U != 0, K >= U:
Kmem charges will also be fed to the user counter, and reclaim will be
triggered for the cgroup for both kinds of memory. This setup gives the
admin a unified view of memory, and it is also useful for people who just
want to track kernel memory usage.

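The K < U case above can be sketched with the knobs from section 3 (the
mount point and group name "0" follow the examples there; the limit values
are purely illustrative):

```shell
# U = 512M: limit on total (user + kernel) memory for the group.
echo 512M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
# K = 256M: of that, at most 256M may be kernel memory.
echo 256M > /sys/fs/cgroup/memory/0/memory.kmem.limit_in_bytes
```

With these settings, kernel memory runs out of its own budget at 256M,
well before it could push the group past the 512M total.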
3. User Interface

0. Configuration

a. Enable CONFIG_CGROUPS
b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_MEMCG
d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
e. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory

2. Make the new group and move bash into it
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks

Now we're in the 0 cgroup, and we can alter the memory limit:
# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes

NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.)

NOTE: We can write "-1" to reset the *.limit_in_bytes (unlimited).
NOTE: We cannot set limits on the root cgroup any more.

# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
4194304

We can check the usage:
# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
1216512

A successful write to this file does not guarantee that the limit was set
to the exact value written. This can be due to a number of factors, such
as rounding up to page boundaries or the total availability of memory on
the system. The user is required to re-read this file after a write to
see the value actually committed by the kernel.

# echo 1 > memory.limit_in_bytes
# cat memory.limit_in_bytes
4096
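The committed value above can be explained by page rounding: writes are
rounded up to a whole number of pages. A minimal sketch, assuming a
4096-byte page size (`getconf PAGESIZE` reports the real value):

```shell
page_size=4096   # assumed PAGE_SIZE
requested=1      # the value written above
# Round the requested limit up to a whole number of pages.
limit=$(( (requested + page_size - 1) / page_size * page_size ))
echo "$limit"    # prints 4096
```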

The memory.failcnt field gives the number of times that the cgroup limit was
exceeded.

The memory.stat file gives accounting information. Currently, the number of
caches, RSS and Active/Inactive pages are shown.

4. Testing

For testing features and implementation, see memcg_test.txt.

Performance testing is also important. To see the pure overhead of the
memory controller, testing on tmpfs will give you good numbers for the
small overheads involved. Example: do a kernel make on tmpfs.

Page-fault scalability is also important. When measuring a parallel
page-fault test, a multi-process test may be better than a multi-thread
test because the latter has noise from shared objects/status.

But the above two test extreme situations.
Trying a usual workload under the memory controller is always helpful.

4.1 Troubleshooting

Sometimes a user might find that the application under a cgroup is
terminated by the OOM killer. There are several causes for this:

1. The cgroup limit is too low (just too low to do anything useful)
2. The user is using anonymous memory and swap is turned off or too low

A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
some of the pages cached in the cgroup (page cache pages).
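Written out as root-shell commands (note that drop_caches discards clean
caches system-wide, not only for this cgroup):

```shell
sync                                # write dirty page cache back to disk
echo 1 > /proc/sys/vm/drop_caches   # drop clean page cache
```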

To understand what happens, it is helpful to disable OOM_Kill as per
"10. OOM Control" (below) and observe the result.

4.2 Task migration

When a task migrates from one cgroup to another, its charge is not
carried forward by default. The pages allocated from the original cgroup
still remain charged to it; the charge is dropped when the page is freed
or reclaimed.

You can move charges of a task along with task migration.
See 8. "Move charges at task migration"

4.3 Removing a cgroup

A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
cgroup might have some charge associated with it, even though all
tasks have migrated away from it. (because we charge against pages, not
against tasks.)

We move the stats to root (if use_hierarchy==0) or parent (if
use_hierarchy==1), and the charges are not changed except for being
uncharged from the child.

Charges recorded in swap information are not updated at removal of a cgroup.
The recorded information is discarded and a cgroup which uses swap (swapcache)
will be charged as the new owner of it.

About use_hierarchy, see Section 6.

5. Misc. interfaces.

5.1 force_empty
The memory.force_empty interface is provided to make a cgroup's memory usage
empty. You can use this interface only when the cgroup has no tasks.
When you write anything to this file, e.g.

# echo 0 > memory.force_empty

almost all pages tracked by this memory cgroup will be unmapped and freed.
Some pages cannot be freed because they are locked or in-use. Such pages are
moved to the parent (if use_hierarchy==1) or root (if use_hierarchy==0) and
this cgroup will be empty.

The typical use case for this interface is before calling rmdir().
Because rmdir() moves all pages to the parent, some out-of-use page caches
can be moved to the parent. If you want to avoid that, force_empty will be
useful.

Also, note that when memory.kmem.limit_in_bytes is set the charges due to
kernel pages will still be seen. This is not considered a failure and the
write will still return success. In this case, it is expected that
memory.kmem.usage_in_bytes == memory.usage_in_bytes.

About use_hierarchy, see Section 6.

5.2 stat file

memory.stat file includes the following statistics

# per-memory cgroup local status
cache		- # of bytes of page cache memory.
rss		- # of bytes of anonymous and swap cache memory.
mapped_file	- # of bytes of mapped file (includes tmpfs/shmem)
pgpgin		- # of charging events to the memory cgroup. The charging
		  event happens each time a page is accounted as either a
		  mapped anon page (RSS) or a cache page (Page Cache) to
		  the cgroup.
pgpgout		- # of uncharging events to the memory cgroup. The
		  uncharging event happens each time a page is unaccounted
		  from the cgroup.
swap		- # of bytes of swap usage
inactive_anon	- # of bytes of anonymous and swap cache memory on inactive
		  LRU list.
active_anon	- # of bytes of anonymous and swap cache memory on active
		  LRU list.
inactive_file	- # of bytes of file-backed memory on inactive LRU list.
active_file	- # of bytes of file-backed memory on active LRU list.
unevictable	- # of bytes of memory that cannot be reclaimed (mlocked etc).

# status considering hierarchy (see memory.use_hierarchy settings)

hierarchical_memory_limit - # of bytes of memory limit with regard to
			  the hierarchy under which the memory cgroup is
hierarchical_memsw_limit - # of bytes of memory+swap limit with regard to
			  the hierarchy under which the memory cgroup is.

total_<counter>		- # hierarchical version of <counter>, which in
			  addition to the cgroup's own value includes the
			  sum of all hierarchical children's values of
			  <counter>, i.e. total_cache

# The following additional stats are dependent on CONFIG_DEBUG_VM.

recent_rotated_anon	- VM internal parameter. (see mm/vmscan.c)
recent_rotated_file	- VM internal parameter. (see mm/vmscan.c)
recent_scanned_anon	- VM internal parameter. (see mm/vmscan.c)
recent_scanned_file	- VM internal parameter. (see mm/vmscan.c)

Memo:
	recent_rotated means recent frequency of LRU rotation.
	recent_scanned means recent # of scans to LRU.
	These are shown for easier debugging; please see the code for the
	exact meanings.

Note:
Only anonymous and swap cache memory is listed as part of the 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup.
'rss + file_mapped' will give you the resident set size of the cgroup.
(Note: file and shmem may be shared among other cgroups. In that case,
file_mapped is accounted only when the memory cgroup is owner of the page
cache.)

5.3 swappiness

Similar to /proc/sys/vm/swappiness, but affecting only the hierarchy of
groups. Please note that, unlike the global swappiness, a memcg knob set
to 0 really prevents any swapping even if swap storage is available. This
might lead to the memcg OOM killer being invoked if there are no file
pages to reclaim.

The following cgroups' swappiness can't be changed:
- the root cgroup (uses /proc/sys/vm/swappiness).
- a cgroup which uses hierarchy and has other cgroup(s) below it.
- a cgroup which uses hierarchy and is not the root of the hierarchy.

5.4 failcnt

A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
This failcnt (== failure count) shows the number of times that a usage
counter hit its limit. When a memory cgroup hits a limit, failcnt
increases and memory under it will be reclaimed.

You can reset failcnt by writing 0 to the failcnt file.
# echo 0 > .../memory.failcnt

5.5 usage_in_bytes

For efficiency, like other kernel components, the memory cgroup uses some
optimization to avoid unnecessary cacheline false sharing.
usage_in_bytes is affected by this method and doesn't show the 'exact'
value of memory (and swap) usage; it's a fuzz value for efficient access.
(Of course, when necessary, it's synchronized.) If you want to know the
more exact memory usage, you should use the RSS+CACHE(+SWAP) value in
memory.stat (see 5.2).
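As a sketch of computing that more exact figure (the memory.stat values
below are made up):

```shell
# Hypothetical excerpt of memory.stat (values in bytes):
stat='cache 1048576
rss 2097152
swap 0'

# rss + cache (+ swap) gives a more exact usage than usage_in_bytes.
printf '%s\n' "$stat" |
	awk '$1=="cache" || $1=="rss" || $1=="swap" { sum += $2 }
	     END { print sum }'
```

For the sample values this prints 3145728 (2M of rss plus 1M of cache).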

5.6 numa_stat

This is similar to numa_maps but operates on a per-memcg basis. This is
useful for providing visibility into the numa locality information within
a memcg since the pages are allowed to be allocated from any physical
node. One of the use cases is evaluating application performance by
combining this information with the application's CPU allocation.

We export "total", "file", "anon" and "unevictable" pages per-node for
each memcg. The output format of memory.numa_stat is:

total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ...

And we have total = file + anon + unevictable.
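A quick consistency check of that format against made-up values (two
nodes, counts in pages):

```shell
# Hypothetical memory.numa_stat snapshot:
numa_stat='total=100 N0=60 N1=40
file=70 N0=40 N1=30
anon=25 N0=15 N1=10
unevictable=5 N0=5 N1=0'

# Verify total = file + anon + unevictable.
printf '%s\n' "$numa_stat" |
	awk -F'[= ]' '
		$1 == "total" { total = $2 }
		$1 == "file" || $1 == "anon" || $1 == "unevictable" { sum += $2 }
		END { if (total == sum) print "consistent";
		      else print "inconsistent" }'
```

For the sample snapshot this prints "consistent" (70 + 25 + 5 == 100).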

6. Hierarchy support

The memory controller supports a deep hierarchy and hierarchical accounting.
The hierarchy is created by creating the appropriate cgroups in the
cgroup filesystem. Consider, for example, the following cgroup filesystem
hierarchy

	       root
	     /  |   \
	    /   |    \
	   a    b     c
		      | \
		      |  \
		      d   e

In the diagram above, with hierarchical accounting enabled, all memory
usage of e is accounted to its ancestors up until the root (i.e., c and
root) that have memory.use_hierarchy enabled. If one of the ancestors goes
over its limit, the reclaim algorithm reclaims from the tasks in the
ancestor and the children of the ancestor.

6.1 Enabling hierarchical accounting and reclaim

A memory cgroup disables the hierarchy feature by default. Support
can be enabled by writing 1 to the memory.use_hierarchy file of the root
cgroup

# echo 1 > memory.use_hierarchy

The feature can be disabled by

# echo 0 > memory.use_hierarchy

NOTE1: Enabling/disabling will fail if either the cgroup already has other
cgroups created below it, or if the parent cgroup has use_hierarchy
enabled.

NOTE2: When panic_on_oom is set to "2", the whole system will panic in
case of an OOM event in any cgroup.

7. Soft limits

Soft limits allow for greater sharing of memory. The idea behind soft limits
is to allow control groups to use as much of the memory as needed, provided

a. There is no memory contention
b. They do not exceed their hard limit

When the system detects memory contention or low memory, control groups
are pushed back to their soft limits. If the soft limit of each control
group is very high, they are pushed back as much as possible to make
sure that one control group does not starve the others of memory.

Please note that soft limits are a best-effort feature; they come with
no guarantees, but they do their best to make sure that when memory is
heavily contended for, memory is allocated based on the soft limit
hints/setup. Currently soft limit based reclaim is set up such that
it gets invoked from balance_pgdat (kswapd).

7.1 Interface

Soft limits can be set up by using the following commands (in this example
we assume a soft limit of 256 MiB)

# echo 256M > memory.soft_limit_in_bytes

If we want to change this to 1G, we can at any time use

# echo 1G > memory.soft_limit_in_bytes

NOTE1: Soft limits take effect over a long period of time, since they involve
reclaiming memory for balancing between memory cgroups
NOTE2: It is recommended to always set the soft limit below the hard limit,
otherwise the hard limit will take precedence.

8. Move charges at task migration

Users can move charges associated with a task along with task migration,
that is, uncharge the task's pages from the old cgroup and charge them to
the new cgroup. This feature is not supported in !CONFIG_MMU environments
because of lack of page tables.

8.1 Interface

This feature is disabled by default. It can be enabled (and disabled
again) by writing to memory.move_charge_at_immigrate of the destination
cgroup.

If you want to enable it:

# echo (some positive value) > memory.move_charge_at_immigrate

Note: Each bit of move_charge_at_immigrate has its own meaning about what
type of charges should be moved. See 8.2 for details.
Note: Charges are moved only when you move mm->owner, in other words,
a leader of a thread group.
Note: If we cannot find enough space for the task in the destination
cgroup, we try to make space by reclaiming memory. Task migration may
fail if we cannot make enough space.
Note: Moving charges can take several seconds if you move a lot of charges.
671 | 678 | ||
672 | And if you want disable it again: | 679 | And if you want disable it again: |
673 | 680 | ||
674 | # echo 0 > memory.move_charge_at_immigrate | 681 | # echo 0 > memory.move_charge_at_immigrate |

8.2 Type of charges which can be moved

Each bit in move_charge_at_immigrate has its own meaning about what type of
charges should be moved. But in any case, it must be noted that an account of
a page or a swap can be moved only when it is charged to the task's current
(old) memory cgroup.

 bit | what type of charges would be moved ?
-----+------------------------------------------------------------------------
  0  | A charge of an anonymous page (or swap of it) used by the target task.
     | You must enable Swap Extension (see 2.4) to enable move of swap charges.
-----+------------------------------------------------------------------------
  1  | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory)
     | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
     | anonymous pages, file pages (and swaps) in the range mmapped by the task
     | will be moved even if the task hasn't done page fault, i.e. they might
     | not be the task's "RSS", but other task's "RSS" that maps the same file.
     | And mapcount of the page is ignored (the page can be moved even if
     | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to
     | enable move of swap charges.
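The value written to move_charge_at_immigrate is simply the OR of the bits in
the table above. A sketch of a hypothetical helper (not a kernel-provided
tool) that composes that value:

```python
# Sketch (not part of the kernel tree): compose the value to write into
# memory.move_charge_at_immigrate from the bit meanings in the table above.
MOVE_CHARGE_ANON = 1 << 0   # bit 0: anonymous pages (and their swap)
MOVE_CHARGE_FILE = 1 << 1   # bit 1: mmapped file pages (and tmpfs swap)

def move_charge_value(anon=False, file=False):
    """Return the integer to echo into move_charge_at_immigrate."""
    value = 0
    if anon:
        value |= MOVE_CHARGE_ANON
    if file:
        value |= MOVE_CHARGE_FILE
    return value

print(move_charge_value(anon=True))             # 1
print(move_charge_value(anon=True, file=True))  # 3
```

For example, `echo 3 > memory.move_charge_at_immigrate` enables moving both
anonymous and file charges.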

8.3 TODO

- All moving-charge operations are done under cgroup_mutex. It is not good
  behavior to hold the mutex too long, so we may need some trick.
9. Memory thresholds

Memory cgroup implements memory thresholds using the cgroups notification
API (see cgroups.txt). It allows registering multiple memory and memsw
thresholds and receiving notifications when a threshold is crossed.

To register a threshold, an application must:
- create an eventfd using eventfd(2);
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write a string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
  cgroup.event_control.

The application will be notified through the eventfd when memory usage crosses
the threshold in either direction.

This works for both root and non-root cgroups.
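The registration steps above can be sketched as follows. This is an
illustration only: the cgroup path is hypothetical, and os.eventfd() /
os.eventfd_read() require Python >= 3.10 on Linux.

```python
# Sketch of the three registration steps above. CGROUP is a hypothetical
# memory cgroup directory; adjust it for your mount point.
import os

CGROUP = "/sys/fs/cgroup/memory/mygroup"  # hypothetical cgroup directory

def event_control_line(event_fd, usage_fd, threshold_bytes):
    """Build the "<event_fd> <fd> <threshold>" string for cgroup.event_control."""
    return "%d %d %d" % (event_fd, usage_fd, threshold_bytes)

def register_threshold(cgroup, threshold_bytes):
    """Register a memory threshold; returns the eventfd to block on."""
    efd = os.eventfd(0)                                    # step 1: eventfd(2)
    usage_fd = os.open(os.path.join(cgroup, "memory.usage_in_bytes"),
                       os.O_RDONLY)                        # step 2: open usage file
    with open(os.path.join(cgroup, "cgroup.event_control"), "w") as f:
        f.write(event_control_line(efd, usage_fd, threshold_bytes))  # step 3
    return efd

if os.path.isdir(CGROUP):
    efd = register_threshold(CGROUP, 256 * 1024 * 1024)    # 256 MiB threshold
    os.eventfd_read(efd)   # blocks until usage crosses the threshold
    print("threshold crossed")
```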

10. OOM Control

The memory.oom_control file is used for OOM notification and other controls.

Memory cgroup implements an OOM notifier using the cgroup notification
API (see cgroups.txt). It allows registering multiple OOM notification
deliveries and receiving a notification when an OOM happens.

To register a notifier, an application must:
- create an eventfd using eventfd(2)
- open the memory.oom_control file
- write a string like "<event_fd> <fd of memory.oom_control>" to
  cgroup.event_control

The application will be notified through the eventfd when an OOM happens.
OOM notification doesn't work for the root cgroup.

You can disable the OOM-killer by writing "1" to the memory.oom_control file:

# echo 1 > memory.oom_control

This operation is only allowed to the top cgroup of a sub-hierarchy.
If the OOM-killer is disabled, tasks under the cgroup will hang/sleep
on the memory cgroup's OOM-waitqueue when they request accountable memory.

To make them run again, you have to relax the memory cgroup's OOM status by
* enlarging the limit or reducing usage.
To reduce usage,
* kill some tasks.
* move some tasks to another group with account migration.
* remove some files (on tmpfs?)

Then, the stopped tasks will work again.

When read, the file shows the current OOM status.
  oom_kill_disable 0 or 1 (if 1, the oom-killer is disabled)
  under_oom        0 or 1 (if 1, the memory cgroup is under OOM and tasks may
                           be stopped)
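A monitoring tool can read these two fields directly. A minimal sketch (an
illustration, not a kernel-provided tool) of parsing the file contents:

```python
# Sketch (not part of the kernel tree): parse the two status fields shown
# when memory.oom_control is read.
def parse_oom_control(text):
    """Return {'oom_kill_disable': int, 'under_oom': int} from file contents."""
    status = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("oom_kill_disable", "under_oom"):
            status[key] = int(value)
    return status

sample = "oom_kill_disable 1\nunder_oom 0\n"
print(parse_oom_control(sample))  # {'oom_kill_disable': 1, 'under_oom': 0}
```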

11. TODO

1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first
3. Teach controller to account for shared-pages
4. Start reclamation in the background when the limit is
   not yet hit but the usage is getting closer

Summary

Overall, the memory controller has been a stable controller and has been
commented and discussed quite extensively in the community.

References

1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
2. Singh, Balbir. Memory Controller (RSS Control),
   http://lwn.net/Articles/222762/
3. Emelianov, Pavel. Resource controllers based on process cgroups,
   http://lkml.org/lkml/2007/3/6/198
4. Emelianov, Pavel. RSS controller based on process cgroups (v2),
   http://lkml.org/lkml/2007/4/9/78
5. Emelianov, Pavel. RSS controller based on process cgroups (v3),
   http://lkml.org/lkml/2007/5/30/244
6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
7. Vaidyanathan, Srinivasan. Control Groups: Pagecache accounting and control
   subsystem (v3), http://lwn.net/Articles/235534/
8. Singh, Balbir. RSS controller v2 test results (lmbench),
   http://lkml.org/lkml/2007/5/17/232
9. Singh, Balbir. RSS controller v2 AIM9 results,
   http://lkml.org/lkml/2007/5/18/1
10. Singh, Balbir. Memory controller v6 test results,
    http://lkml.org/lkml/2007/8/19/36
11. Singh, Balbir. Memory controller introduction (v6),
    http://lkml.org/lkml/2007/8/17/69
12. Corbet, Jonathan. Controlling memory use in cgroups,
    http://lwn.net/Articles/243795/