Commit 2da02997e08d3efe8174c7a47696e6f7cbe69ba9

Authored by David Rientjes
Committed by Linus Torvalds
1 parent 364aeb2849

mm: add dirty_background_bytes and dirty_bytes sysctls

This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.

dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.

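With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system: on a machine with a large amount
of memory, even a small percentage is a large quantity of dirty memory.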

dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system.  If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.
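As a rough illustration of using these files from userspace, the C sketch below writes an unsigned long value and reads it back. It assumes the procfs paths /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes added by this change. The helper names write_sysctl_ulong() and read_sysctl_ulong() are hypothetical, and writing the real files requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an unsigned long, followed by a newline, to a
 * sysctl file such as /proc/sys/vm/dirty_bytes.  Returns 0 on success and
 * -1 on failure.
 */
static int write_sysctl_ulong(const char *path, unsigned long value)
{
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "%lu\n", value) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffered text; write errors are reported here */
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read an unsigned long back; returns 0 on success. */
static int read_sysctl_ulong(const char *path, unsigned long *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, write_sysctl_ulong("/proc/sys/vm/dirty_bytes", 64UL << 20) would request a 64 MiB direct writeback threshold, the same request `sysctl -w vm.dirty_bytes=67108864` makes from the shell.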

When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value.  For example, if dirty_bytes is written as
8192, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory (in the relations below, each ratio is read as a
fraction; the ratio sysctls themselves store percentages):

	dirtyable_memory = free pages + mapped pages + file cache

	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
		-or-
	dirty_background_ratio = dirty_background_bytes / dirtyable_memory

		AND

	dirty_bytes = dirty_ratio * dirtyable_memory
		-or-
	dirty_ratio = dirty_bytes / dirtyable_memory
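The relations above can be modelled in a few lines of C. This is a hypothetical userspace sketch, not kernel code: it assumes the ratio is a whole percentage (as the sysctl files store it) and that dirtyable memory is given in pages, and it computes in pages first, the way get_dirty_limits() does. ratio_to_bytes() and bytes_to_ratio() are illustrative names.

```c
#include <assert.h>

/*
 * Hypothetical userspace model: relate a whole-percent ratio to a byte
 * threshold, given the dirtyable memory in pages.
 */

/* ratio (percent) -> bytes; computed in pages first, then scaled to bytes */
static unsigned long ratio_to_bytes(int ratio_pct,
				    unsigned long dirtyable_pages,
				    unsigned long page_size)
{
	assert(ratio_pct >= 0 && ratio_pct <= 100);
	return ((unsigned long)ratio_pct * dirtyable_pages) / 100 * page_size;
}

/* bytes -> whole percent of dirtyable memory; the remainder is truncated */
static unsigned long bytes_to_ratio(unsigned long bytes,
				    unsigned long dirtyable_pages,
				    unsigned long page_size)
{
	assert(dirtyable_pages > 0 && page_size > 0);
	return (bytes / page_size) * 100 / dirtyable_pages;
}
```

With 1 GiB of dirtyable memory in 4 KiB pages (262144 pages), bytes_to_ratio(8192, 262144, 4096) truncates to 0 percent: a threshold that small cannot be expressed through a whole-percent ratio, which is the point of the `bytes' files. Converting back is also lossy; 10 percent becomes 107372544 bytes, which maps back to 9 percent.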

Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified.  When one sysctl is written, the other appears as 0 when read.
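The either/or behaviour can be sketched as a pair of setters over a hypothetical struct. The sketch follows dirty_background_ratio_handler() and dirty_background_bytes_handler() in this patch, which clear the counterpart after every successful write; dirty_ratio_handler() and dirty_bytes_handler() only do so when the written value actually changes. All names below are illustrative.

```c
#include <assert.h>

/* Hypothetical model of one ratio/bytes pair, e.g. the background knobs. */
struct dirty_knob_pair {
	int ratio;		/* percent of dirtyable memory, 0 = unused */
	unsigned long bytes;	/* absolute threshold in bytes, 0 = unused */
};

static void knob_write_ratio(struct dirty_knob_pair *k, int ratio)
{
	assert(ratio >= 0 && ratio <= 100);	/* extra1/extra2 bounds */
	k->ratio = ratio;
	k->bytes = 0;		/* the bytes file now reads back as 0 */
}

static void knob_write_bytes(struct dirty_knob_pair *k, unsigned long bytes)
{
	assert(bytes >= 1);	/* lower bound set via extra1 = &one */
	k->bytes = bytes;
	k->ratio = 0;		/* the ratio file now reads back as 0 */
}

/* A non-zero byte value takes precedence, as in get_dirty_limits(). */
static int knob_uses_bytes(const struct dirty_knob_pair *k)
{
	return k->bytes != 0;
}
```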

The `bytes' files operate at page-size granularity because dirty limits
are compared with ZVC (zoned VM counter) values, which are in page units.
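A small sketch of the byte-to-page conversion, using 4096-byte pages in the example. get_dirty_limits() in this patch rounds up with DIV_ROUND_UP(), so any non-zero byte value maps to at least one page, while calc_period_shift() divides by PAGE_SIZE and truncates. The macro is redefined locally so the sketch compiles outside the kernel; the function names are hypothetical.

```c
#include <assert.h>

/* Local copy of the kernel's rounding-up division helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Round up, as get_dirty_limits() does: any non-zero value is >= 1 page. */
static unsigned long bytes_to_pages_up(unsigned long bytes,
				       unsigned long page_size)
{
	assert(page_size > 0);
	return DIV_ROUND_UP(bytes, page_size);
}

/* Truncate, as calc_period_shift() does with vm_dirty_bytes / PAGE_SIZE. */
static unsigned long bytes_to_pages_down(unsigned long bytes,
					 unsigned long page_size)
{
	assert(page_size > 0);
	return bytes / page_size;
}
```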

Prior to this change, get_dirty_limits() enforced a minimum dirty_ratio
of 5, although /proc/sys/vm/dirty_ratio would show any user-written value
between 0 and 100.  This restriction is maintained, but dirty_bytes has a
lower limit of only one page.
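The dirty threshold selection described above can be modelled as follows. This is a hypothetical userspace sketch of the logic in the patched get_dirty_limits(): a non-zero byte value wins and is rounded up to pages; otherwise the ratio is clamped to at least 5 percent. MIN_DIRTY_RATIO and dirty_threshold_pages() are illustrative names, not kernel identifiers.

```c
#include <assert.h>

#define MIN_DIRTY_RATIO 5	/* hypothetical name for the clamp value */

/* Hypothetical model of the direct writeback threshold, in pages. */
static unsigned long dirty_threshold_pages(unsigned long dirty_bytes,
					   int dirty_ratio,
					   unsigned long available_pages,
					   unsigned long page_size)
{
	assert(page_size > 0);
	if (dirty_bytes)	/* bytes set: round up to whole pages */
		return (dirty_bytes + page_size - 1) / page_size;
	if (dirty_ratio < MIN_DIRTY_RATIO)	/* ratio path keeps the floor */
		dirty_ratio = MIN_DIRTY_RATIO;
	return ((unsigned long)dirty_ratio * available_pages) / 100;
}
```

With 1000 dirtyable pages, a dirty_ratio of 2 still yields 50 pages because of the 5 percent floor, whereas dirty_bytes set to 1 yields a single page.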

Also prior to this change, dirty_background_ratio could not equal or
exceed dirty_ratio.  This restriction is maintained, and it now applies
to dirty_background_bytes as well.  If either background threshold equals
or exceeds the dirty threshold, it is implicitly set to half the dirty
threshold.
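Likewise, a hypothetical sketch of the background threshold, taking the already computed dirty threshold in pages as input. It follows the patched get_dirty_limits(): a non-zero background byte value is rounded up to pages, otherwise the background ratio is applied, and a result that reaches the dirty threshold is replaced by half of it. It ignores the later adjustment for PF_LESS_THROTTLE and real-time tasks; background_threshold_pages() is an illustrative name.

```c
#include <assert.h>

/* Hypothetical model of the background writeback threshold, in pages. */
static unsigned long background_threshold_pages(unsigned long dirty_pages,
						unsigned long bg_bytes,
						int bg_ratio,
						unsigned long available_pages,
						unsigned long page_size)
{
	unsigned long background;

	assert(page_size > 0 && bg_ratio >= 0);
	if (bg_bytes)		/* bytes set: round up to whole pages */
		background = (bg_bytes + page_size - 1) / page_size;
	else
		background = ((unsigned long)bg_ratio * available_pages) / 100;

	/* never let background writeback start at or above the dirty limit */
	if (background >= dirty_pages)
		background = dirty_pages / 2;
	return background;
}
```

For example, with 1000 dirtyable pages and a dirty threshold of 100 pages, a dirty_background_ratio of 20 would give 200 pages, so the background threshold falls back to 50.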

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 5 changed files with 146 additions and 23 deletions

Documentation/filesystems/proc.txt
@@ -1385,6 +1385,15 @@
 to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
 causes the kernel to prefer to reclaim dentries and inodes.
 
+dirty_background_bytes
+----------------------
+
+Contains the amount of dirty memory at which the pdflush background writeback
+daemon will start writeback.
+
+If dirty_background_bytes is written, dirty_background_ratio becomes a function
+of its value (dirty_background_bytes / the amount of dirtyable system memory).
+
 dirty_background_ratio
 ----------------------
 
@@ -1393,13 +1402,28 @@
 pages at which the pdflush background writeback daemon will start writing out
 dirty data.
 
+If dirty_background_ratio is written, dirty_background_bytes becomes a function
+of its value (dirty_background_ratio * the amount of dirtyable system memory).
+
+dirty_bytes
+-----------
+
+Contains the amount of dirty memory at which a process generating disk writes
+will itself start writeback.
+
+If dirty_bytes is written, dirty_ratio becomes a function of its value
+(dirty_bytes / the amount of dirtyable system memory).
+
 dirty_ratio
------------------
+-----------
 
 Contains, as a percentage of the dirtyable system memory (free pages + mapped
 pages + file cache, not including locked pages and HugePages), the number of
 pages at which a process which is generating disk writes will itself start
 writing out dirty data.
+
+If dirty_ratio is written, dirty_bytes becomes a function of its value
+(dirty_ratio * the amount of dirtyable system memory).
 
 dirty_writeback_centisecs
 -------------------------
Documentation/sysctl/vm.txt
@@ -41,7 +41,8 @@
 
 ==============================================================
 
-dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
+dirty_bytes, dirty_ratio, dirty_background_bytes,
+dirty_background_ratio, dirty_expire_centisecs,
 dirty_writeback_centisecs, highmem_is_dirtyable,
 vfs_cache_pressure, laptop_mode, block_dump, swap_token_timeout,
 drop-caches, hugepages_treat_as_movable:
include/linux/writeback.h
@@ -107,7 +107,9 @@
 
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
+extern unsigned long dirty_background_bytes;
 extern int vm_dirty_ratio;
+extern unsigned long vm_dirty_bytes;
 extern int dirty_writeback_interval;
 extern int dirty_expire_interval;
 extern int vm_highmem_is_dirtyable;
@@ -116,7 +118,16 @@
 
 extern unsigned long determine_dirtyable_memory(void);
 
+extern int dirty_background_ratio_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos);
+extern int dirty_background_bytes_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos);
 extern int dirty_ratio_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos);
+extern int dirty_bytes_handler(struct ctl_table *table, int write,
 		struct file *filp, void __user *buffer, size_t *lenp,
 		loff_t *ppos);
 
kernel/sysctl.c
@@ -87,10 +87,6 @@
 #endif /* #ifdef CONFIG_RCU_TORTURE_TEST */
 
 /* Constants used for minimum and maximum */
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_DETECT_SOFTLOCKUP)
-static int one = 1;
-#endif
-
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 static int sixty = 60;
 static int neg_one = -1;
@@ -101,6 +97,7 @@
 #endif
 
 static int zero;
+static int one = 1;
 static int one_hundred = 100;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -952,12 +949,22 @@
 		.data = &dirty_background_ratio,
 		.maxlen = sizeof(dirty_background_ratio),
 		.mode = 0644,
-		.proc_handler = &proc_dointvec_minmax,
+		.proc_handler = &dirty_background_ratio_handler,
 		.strategy = &sysctl_intvec,
 		.extra1 = &zero,
 		.extra2 = &one_hundred,
 	},
 	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "dirty_background_bytes",
+		.data = &dirty_background_bytes,
+		.maxlen = sizeof(dirty_background_bytes),
+		.mode = 0644,
+		.proc_handler = &dirty_background_bytes_handler,
+		.strategy = &sysctl_intvec,
+		.extra1 = &one,
+	},
+	{
 		.ctl_name = VM_DIRTY_RATIO,
 		.procname = "dirty_ratio",
 		.data = &vm_dirty_ratio,
@@ -967,6 +974,16 @@
 		.strategy = &sysctl_intvec,
 		.extra1 = &zero,
 		.extra2 = &one_hundred,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "dirty_bytes",
+		.data = &vm_dirty_bytes,
+		.maxlen = sizeof(vm_dirty_bytes),
+		.mode = 0644,
+		.proc_handler = &dirty_bytes_handler,
+		.strategy = &sysctl_intvec,
+		.extra1 = &one,
 	},
 	{
 		.procname = "dirty_writeback_centisecs",
mm/page-writeback.c
@@ -69,6 +69,12 @@
 int dirty_background_ratio = 5;
 
 /*
+ * dirty_background_bytes starts at 0 (disabled) so that it is a function of
+ * dirty_background_ratio * the amount of dirtyable memory
+ */
+unsigned long dirty_background_bytes;
+
+/*
  * free highmem will not be subtracted from the total free memory
  * for calculating free ratios if vm_highmem_is_dirtyable is true
  */
@@ -80,6 +86,12 @@
 int vm_dirty_ratio = 10;
 
 /*
+ * vm_dirty_bytes starts at 0 (disabled) so that it is a function of
+ * vm_dirty_ratio * the amount of dirtyable memory
+ */
+unsigned long vm_dirty_bytes;
+
+/*
  * The interval between `kupdate'-style writebacks, in jiffies
  */
 int dirty_writeback_interval = 5 * HZ;
@@ -135,27 +147,79 @@
 {
 	unsigned long dirty_total;
 
-	dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
+	if (vm_dirty_bytes)
+		dirty_total = vm_dirty_bytes / PAGE_SIZE;
+	else
+		dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
+				100;
 	return 2 + ilog2(dirty_total - 1);
 }
 
 /*
- * update the period when the dirty ratio changes.
+ * update the period when the dirty threshold changes.
  */
+static void update_completion_period(void)
+{
+	int shift = calc_period_shift();
+	prop_change_shift(&vm_completions, shift);
+	prop_change_shift(&vm_dirties, shift);
+}
+
+int dirty_background_ratio_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+	if (ret == 0 && write)
+		dirty_background_bytes = 0;
+	return ret;
+}
+
+int dirty_background_bytes_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
+	if (ret == 0 && write)
+		dirty_background_ratio = 0;
+	return ret;
+}
+
 int dirty_ratio_handler(struct ctl_table *table, int write,
 		struct file *filp, void __user *buffer, size_t *lenp,
 		loff_t *ppos)
 {
 	int old_ratio = vm_dirty_ratio;
-	int ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
 	if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
-		int shift = calc_period_shift();
-		prop_change_shift(&vm_completions, shift);
-		prop_change_shift(&vm_dirties, shift);
+		update_completion_period();
+		vm_dirty_bytes = 0;
 	}
 	return ret;
 }
 
+
+int dirty_bytes_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int old_bytes = vm_dirty_bytes;
+	int ret;
+
+	ret = proc_doulongvec_minmax(table, write, filp, buffer, lenp, ppos);
+	if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
+		update_completion_period();
+		vm_dirty_ratio = 0;
+	}
+	return ret;
+}
+
 /*
  * Increment the BDI's writeout completion count and the global writeout
  * completion count. Called from test_clear_page_writeback().
@@ -365,23 +429,29 @@
 get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
 		 unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
 {
-	int background_ratio;		/* Percentages */
-	int dirty_ratio;
 	unsigned long background;
 	unsigned long dirty;
 	unsigned long available_memory = determine_dirtyable_memory();
 	struct task_struct *tsk;
 
-	dirty_ratio = vm_dirty_ratio;
-	if (dirty_ratio < 5)
-		dirty_ratio = 5;
+	if (vm_dirty_bytes)
+		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
+	else {
+		int dirty_ratio;
 
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
+		dirty_ratio = vm_dirty_ratio;
+		if (dirty_ratio < 5)
+			dirty_ratio = 5;
+		dirty = (dirty_ratio * available_memory) / 100;
+	}
 
-	background = (background_ratio * available_memory) / 100;
-	dirty = (dirty_ratio * available_memory) / 100;
+	if (dirty_background_bytes)
+		background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
+	else
+		background = (dirty_background_ratio * available_memory) / 100;
+
+	if (background >= dirty)
+		background = dirty / 2;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;