Commit d5bdae7d59451b9d63303f7794ef32bb76ba6330

Authored by Glauber Costa
Committed by Linus Torvalds
1 parent 2ad306b17c

memcg: add documentation about the kmem controller

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 1 changed file with 58 additions and 1 deletion

Documentation/cgroups/memory.txt
... ... @@ -71,6 +71,11 @@
71 71 memory.oom_control # set/show oom controls.
72 72 memory.numa_stat # show the number of memory usage per numa node
73 73  
  74 + memory.kmem.limit_in_bytes # set/show hard limit for kernel memory
  75 + memory.kmem.usage_in_bytes # show current kernel memory allocation
  76 + memory.kmem.failcnt # show the number of kernel memory usage hits limits
  77 + memory.kmem.max_usage_in_bytes # show max kernel memory usage recorded
  78 +
74 79 memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
75 80 memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
76 81 memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits
77 82  
78 83  
79 84  
... ... @@ -268,20 +273,66 @@
268 273 different than user memory, since it can't be swapped out, which makes it
269 274 possible to DoS the system by consuming too much of this precious resource.
270 275  
  276 +Kernel memory won't be accounted at all until a limit on a group is set. This
  277 +allows for existing setups to continue working without disruption. The limit
  278 +cannot be set if the cgroup has children, or if there are already tasks in the
  279 +cgroup. Attempting to set the limit under those conditions will return -EBUSY.
  280 +When use_hierarchy == 1 and a group is accounted, its children will
  281 +automatically be accounted regardless of their limit value.
  282 +
  283 +After a group is first limited, it will keep being accounted until it
  284 +is removed. The memory limitation itself can, of course, be removed by writing
  285 +-1 to memory.kmem.limit_in_bytes. In this case, kmem will be accounted, but not
  286 +limited.
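The activation rules in the two paragraphs above can be read as a small state model. The following is a hypothetical Python sketch, not kernel code: the class `KmemGroup` and its fields are invented for illustration. Only the -EBUSY condition and the -1 convention come from the text. It also assumes that the tasks/children restriction applies only to the write that first turns accounting on, which the text does not spell out.

```python
import errno

UNLIMITED = -1  # value written to memory.kmem.limit_in_bytes to remove the limit


class KmemGroup:
    """Toy model of one cgroup's kmem accounting state (illustrative only)."""

    def __init__(self, tasks=0, children=0):
        self.tasks = tasks
        self.children = children
        self.accounted = False  # kmem is not accounted until a limit is set
        self.limit = UNLIMITED

    def set_limit(self, value):
        """Return 0 on success or -EBUSY, following the rules in the text."""
        # Assumption: the restriction is checked only when this write would
        # switch accounting on for the first time.
        if not self.accounted and value != UNLIMITED:
            if self.tasks or self.children:
                return -errno.EBUSY
            self.accounted = True
        # Writing -1 removes the limit but never switches accounting off.
        self.limit = value
        return 0
```

In this model, a group that was limited once and later has -1 written to it ends up with `accounted == True` and `limit == UNLIMITED`, which corresponds to the "accounted, but not limited" state described above.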
  287 +
271 288 Kernel memory limits are not imposed for the root cgroup. Usage for the root
272   -cgroup may or may not be accounted.
  289 +cgroup may or may not be accounted. The memory used is accumulated into
  290 +memory.kmem.usage_in_bytes, or in a separate counter when it makes sense
  291 +(currently only for tcp).
  292 +The main "kmem" counter is fed into the main counter, so kmem charges will
  293 +also be visible from the user counter.
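The statement that the "kmem" counter is fed into the main counter can be illustrated with a two-counter model. This is a minimal sketch under stated assumptions: `Counter` and `charge_kmem` are hypothetical names, the charging order is a choice made for the example, and reclaim (which the real controller may attempt when the user limit is hit) is not modelled at all.

```python
class Counter:
    """Toy resource counter with usage, an optional limit, and a fail count."""

    def __init__(self, limit=None):
        self.usage = 0
        self.limit = limit  # None means unlimited
        self.failcnt = 0    # number of charges rejected by the limit

    def try_charge(self, nbytes):
        if self.limit is not None and self.usage + nbytes > self.limit:
            self.failcnt += 1
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        self.usage -= nbytes


def charge_kmem(kmem, user, nbytes):
    """Charge a kernel allocation to both counters, or to neither."""
    if not kmem.try_charge(nbytes):
        return False
    if not user.try_charge(nbytes):
        kmem.uncharge(nbytes)  # roll back so the two counters stay consistent
        return False
    return True
```

After a successful `charge_kmem`, the same number of bytes shows up in both counters, which is why kernel charges are also visible from the user counter.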
273 294  
274 295 Currently no soft limit is implemented for kernel memory. It is future work
275 296 to trigger slab reclaim when those limits are reached.
276 297  
277 298 2.7.1 Current Kernel Memory resources accounted
278 299  
  300 +* stack pages: every process consumes some stack pages. By accounting into
  301 +kernel memory, we prevent new processes from being created when the kernel
  302 +memory usage is too high.
  303 +
279 304 * sockets memory pressure: some sockets protocols have memory pressure
280 305 thresholds. The Memory Controller allows them to be controlled individually
281 306 per cgroup, instead of globally.
282 307  
283 308 * tcp memory pressure: sockets memory pressure for the tcp protocol.
284 309  
  310 +2.7.2 Common use cases
  311 +
  312 +Because the "kmem" counter is fed to the main user counter, kernel memory can
  313 +never be limited completely independently of user memory. Say "U" is the user
  314 +limit, and "K" the kernel limit. There are three possible ways limits can be
  315 +set:
  316 +
  317 + U != 0, K = unlimited:
  318 + This is the standard memcg limitation mechanism already present before kmem
  319 + accounting. Kernel memory is completely ignored.
  320 +
  321 + U != 0, K < U:
  322 + Kernel memory is a subset of the user memory. This setup is useful in
  323 + deployments where the total amount of memory per-cgroup is overcommitted.
  324 + Overcommitting kernel memory limits is definitely not recommended, since the
  325 + box can still run out of non-reclaimable memory.
  326 + In this case, the admin could set up K so that the sum of all groups is
  327 + never greater than the total memory, and freely set U at the cost of his
  328 + QoS.
  329 +
  330 + U != 0, K >= U:
  331 + Since kmem charges will also be fed to the user counter, reclaim will be
  332 + triggered for the cgroup for both kinds of memory. This setup gives the
  333 + admin a unified view of memory, and it is also useful for people who just
  334 + want to track kernel memory usage.
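The three U/K combinations above can be expressed as a small classifier, together with the overcommit check suggested for the K < U case. This is an illustrative Python sketch only: the function names and the returned labels are invented here and are not part of the memcg interface.

```python
def classify_limits(user_limit, kmem_limit=None):
    """Name the setup for a user limit U and kernel limit K (both in bytes).

    kmem_limit=None stands for "K = unlimited".
    """
    if user_limit <= 0:
        raise ValueError("the cases in the text all assume U != 0")
    if kmem_limit is None:
        return "user-only"    # standard memcg limit, kernel memory ignored
    if kmem_limit < user_limit:
        return "kmem-subset"  # kernel memory is a subset of user memory
    return "unified"          # K >= U: reclaim covers both kinds of memory


def kmem_overcommitted(kmem_limits, total_memory):
    """True if the sum of all groups' K exceeds the machine's total memory."""
    return sum(kmem_limits) > total_memory
```

For example, an admin following the K < U advice would pick per-group K values for which `kmem_overcommitted` is false, and would then be free to overcommit U.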
  335 +
285 336 3. User Interface
286 337  
287 338 0. Configuration
... ... @@ -290,6 +341,7 @@
290 341 b. Enable CONFIG_RESOURCE_COUNTERS
291 342 c. Enable CONFIG_MEMCG
292 343 d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
  344 +e. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
293 345  
294 346 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
295 347 # mount -t tmpfs none /sys/fs/cgroup
... ... @@ -405,6 +457,11 @@
405 457 The typical use case for this interface is before calling rmdir().
406 458 Because rmdir() moves all pages to parent, some out-of-use page caches can be
407 459 moved to the parent. If you want to avoid that, force_empty will be useful.
  460 +
  461 + Also, note that when memory.kmem.limit_in_bytes is set, the charges due to
  462 + kernel pages will still be seen. This is not considered a failure and the
  463 + write will still return success. In this case, it is expected that
  464 + memory.kmem.usage_in_bytes == memory.usage_in_bytes.
408 465  
409 466 About use_hierarchy, see Section 6.
410 467