Commit 2ce666c175bb502cae050c97364acf48449b9d6a

Authored by Mel Gorman
Committed by Jiri Slaby
1 parent 2d37a72e40

mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY

commit 1a501907bbea8e6ebb0b16cf6db9e9cbf1d2c813 upstream.

Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
ensured that file/anon lists were scanned proportionally for reclaim from
kswapd but ignored it for direct reclaim.  The intent was to minimise
direct reclaim latency, but Yuanhan Liu pointed out that it substitutes
one long stall for many small stalls and distorts aging for normal
workloads like streaming readers/writers.  Hugh Dickins pointed out that
a side-effect of the same commit was that when one LRU list dropped to
zero, the entirety of the other list was shrunk, leading to excessive
reclaim in memcgs.  This patch scans the file/anon lists proportionally
for direct reclaim so that pages age similarly whether reclaimed by
kswapd or direct reclaim, but takes care to abort reclaim if one LRU
drops to zero after reclaiming the requested number of pages.

Based on ext4 and using the Intel VM scalability test

                                              3.15.0-rc5            3.15.0-rc5
                                                shrinker            proportion
Unit  lru-file-readonce    elapsed      5.3500 (  0.00%)      5.4200 ( -1.31%)
Unit  lru-file-readonce time_range      0.2700 (  0.00%)      0.1400 ( 48.15%)
Unit  lru-file-readonce time_stddv      0.1148 (  0.00%)      0.0536 ( 53.33%)
Unit lru-file-readtwice    elapsed      8.1700 (  0.00%)      8.1700 (  0.00%)
Unit lru-file-readtwice time_range      0.4300 (  0.00%)      0.2300 ( 46.51%)
Unit lru-file-readtwice time_stddv      0.1650 (  0.00%)      0.0971 ( 41.16%)

The test cases run multiple dd instances reading sparse files. The results are within
the noise for the small test machine. The impact of the patch is more noticeable in the vmstats.

                            3.15.0-rc5  3.15.0-rc5
                              shrinker  proportion
Minor Faults                     35154       36784
Major Faults                       611        1305
Swap Ins                           394        1651
Swap Outs                         4394        5891
Allocation stalls               118616       44781
Direct pages scanned           4935171     4602313
Kswapd pages scanned          15921292    16258483
Kswapd pages reclaimed        15913301    16248305
Direct pages reclaimed         4933368     4601133
Kswapd efficiency                  99%         99%
Kswapd velocity             670088.047  682555.961
Direct efficiency                  99%         99%
Direct velocity             207709.217  193212.133
Percentage direct scans            23%         22%
Page writes by reclaim        4858.000    6232.000
Page writes file                   464         341
Page writes anon                  4394        5891

Note that there are fewer allocation stalls even though the amount
of direct reclaim scanning is approximately the same.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>

Showing 1 changed file with 25 additions and 11 deletions

@@ -2018,13 +2018,27 @@
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
-	bool scan_adjusted = false;
+	bool scan_adjusted;
 
 	get_scan_count(lruvec, sc, nr);
 
 	/* Record the original scan target for proportional adjustments later */
 	memcpy(targets, nr, sizeof(nr));
 
+	/*
+	 * Global reclaiming within direct reclaim at DEF_PRIORITY is a normal
+	 * event that can occur when there is little memory pressure e.g.
+	 * multiple streaming readers/writers. Hence, we do not abort scanning
+	 * when the requested number of pages are reclaimed when scanning at
+	 * DEF_PRIORITY on the assumption that the fact we are direct
+	 * reclaiming implies that kswapd is not keeping up and it is best to
+	 * do a batch of work at once. For memcg reclaim one check is made to
+	 * abort proportional reclaim if either the file or anon lru has already
+	 * dropped to zero at the first pass.
+	 */
+	scan_adjusted = (global_reclaim(sc) && !current_is_kswapd() &&
+			 sc->priority == DEF_PRIORITY);
+
 	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
@@ -2045,23 +2059,23 @@
 			continue;
 
 		/*
-		 * For global direct reclaim, reclaim only the number of pages
-		 * requested. Less care is taken to scan proportionally as it
-		 * is more important to minimise direct reclaim stall latency
-		 * than it is to properly age the LRU lists.
-		 */
-		if (global_reclaim(sc) && !current_is_kswapd())
-			break;
-
-		/*
 		 * For kswapd and memcg, reclaim at least the number of pages
-		 * requested. Ensure that the anon and file LRUs shrink
+		 * requested. Ensure that the anon and file LRUs are scanned
 		 * proportionally what was requested by get_scan_count(). We
 		 * stop reclaiming one LRU and reduce the amount scanning
 		 * proportional to the original scan target.
 		 */
 		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
 		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
+
+		/*
+		 * It's just vindictive to attack the larger once the smaller
+		 * has gone to zero. And given the way we stop scanning the
+		 * smaller below, this makes sure that we only make one nudge
+		 * towards proportionality once we've got nr_to_reclaim.
+		 */
+		if (!nr_file || !nr_anon)
+			break;
 
 		if (nr_file > nr_anon) {
 			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +