Commit eb709b0d062efd653a61183af8e27b2711c3cf5c

Authored by Shaohua Li
Committed by Linus Torvalds
Parent: f68aa5b445

mm: batch activate_page() to reduce lock contention

The zone->lru_lock is heavily contended in workloads where activate_page()
is called frequently.  We can batch the activate_page() work to reduce the
lock contention.  The batched pages are moved onto the zone's LRU lists
when the per-CPU pool is full or when page reclaim drains them.
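
For reference, the helper that does the batched move works roughly like
this (a simplified sketch of pagevec_lru_move_fn() as it looks in this
era's mm/swap.c, shown only to make the locking pattern clear; it is not
part of this patch).  The point is that zone->lru_lock is taken once per
run of same-zone pages instead of once per page:

/*
 * Simplified sketch (not part of this patch): the lru_lock is taken
 * once per run of pages from the same zone, not once per page.
 */
static void pagevec_lru_move_fn(struct pagevec *pvec,
				void (*move_fn)(struct page *page, void *arg),
				void *arg)
{
	int i;
	struct zone *zone = NULL;
	unsigned long flags = 0;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];
		struct zone *pagezone = page_zone(page);

		if (pagezone != zone) {
			/* switch (and re-take) the lock only on a zone change */
			if (zone)
				spin_unlock_irqrestore(&zone->lru_lock, flags);
			zone = pagezone;
			spin_lock_irqsave(&zone->lru_lock, flags);
		}
		(*move_fn)(page, arg);		/* e.g. __activate_page() below */
	}
	if (zone)
		spin_unlock_irqrestore(&zone->lru_lock, flags);
	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
	pagevec_reinit(pvec);
}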

For example, on a 4 socket, 64 CPU system, create a sparse file and 64
processes that share a mapping of the file.  Each process reads the whole
file and then exits.  The process exit does unmap_vmas(), which causes a
lot of activate_page() calls.  In such a workload, we saw about a 58%
reduction in total time with the patch below.  Other workloads that call
activate_page() a lot benefit as well.
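
A rough user-space sketch of that workload (the file name, file size and
hard-coded 4K page size below are illustrative assumptions, not the
original test script):

/*
 * Sketch of the workload: 64 processes share-map a sparse file, read
 * every page, then exit.  Error handling omitted for brevity.
 */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC		64
#define FILESIZE	(1UL << 30)	/* 1GB sparse file (assumed size) */

int main(void)
{
	int fd = open("sparse.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
	long i;

	ftruncate(fd, FILESIZE);	/* sparse: no blocks actually written */

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			char *p = mmap(NULL, FILESIZE, PROT_READ,
				       MAP_SHARED, fd, 0);
			volatile char sum = 0;
			unsigned long off;

			/* read-fault every page of the shared mapping */
			for (off = 0; off < FILESIZE; off += 4096)
				sum += p[off];
			/* exit -> unmap_vmas() -> many activate_page() calls */
			_exit(0);
		}
	}
	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}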

Andrew Morton suggested that activate_page() and putback_lru_pages()
should follow the same path to activate pages, but this is hard to
implement (see commit 7a608572a282a ("Revert "mm: batch activate_page()
to reduce lock contention")).  On the other hand, do we really need
putback_lru_pages() to follow the same path?  I tested several FIO/FFSB
benchmarks (about 20 scripts for each benchmark) on 3 machines here, from
2 sockets to 4 sockets.  My tests don't show anything significant with or
without the patch below (there is a slight difference, but it is mostly
noise that we saw even before this patch).  The patch below basically goes
back to the same approach as my first post.

I tested some microbenchmarks:
  case-anon-cow-rand-mt         0.58%
  case-anon-cow-rand           -3.30%
  case-anon-cow-seq-mt         -0.51%
  case-anon-cow-seq            -5.68%
  case-anon-r-rand-mt           0.23%
  case-anon-r-rand              0.81%
  case-anon-r-seq-mt           -0.71%
  case-anon-r-seq              -1.99%
  case-anon-rx-rand-mt          2.11%
  case-anon-rx-seq-mt           3.46%
  case-anon-w-rand-mt          -0.03%
  case-anon-w-rand             -0.50%
  case-anon-w-seq-mt           -1.08%
  case-anon-w-seq              -0.12%
  case-anon-wx-rand-mt         -5.02%
  case-anon-wx-seq-mt          -1.43%
  case-fork                     1.65%
  case-fork-sleep              -0.07%
  case-fork-withmem             1.39%
  case-hugetlb                 -0.59%
  case-lru-file-mmap-read-mt   -0.54%
  case-lru-file-mmap-read       0.61%
  case-lru-file-mmap-read-rand -2.24%
  case-lru-file-readonce       -0.64%
  case-lru-file-readtwice     -11.69%
  case-lru-memcg               -1.35%
  case-mmap-pread-rand-mt       1.88%
  case-mmap-pread-rand        -15.26%
  case-mmap-pread-seq-mt        0.89%
  case-mmap-pread-seq         -69.72%
  case-mmap-xread-rand-mt       0.71%
  case-mmap-xread-seq-mt        0.38%

The most significant are:
  case-lru-file-readtwice     -11.69%
  case-mmap-pread-rand        -15.26%
  case-mmap-pread-seq         -69.72%

which use activate_page() a lot.  The others are basically run-to-run
variation, since each run differs slightly.

In the UP case, 'size mm/swap.o' shows:
before the two patches:
   text    data     bss     dec     hex filename
   6466     896       4    7366    1cc6 mm/swap.o
after the two patches:
   text    data     bss     dec     hex filename
   6343     896       4    7243    1c4b mm/swap.o

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 1 changed file with 40 additions and 5 deletions:

--- a/mm/swap.c
+++ b/mm/swap.c
@@ -272,14 +272,10 @@
 		memcg_reclaim_stat->recent_rotated[file]++;
 }
 
-/*
- * FIXME: speed this up?
- */
-void activate_page(struct page *page)
+static void __activate_page(struct page *page, void *arg)
 {
 	struct zone *zone = page_zone(page);
 
-	spin_lock_irq(&zone->lru_lock);
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int file = page_is_file_cache(page);
 		int lru = page_lru_base_type(page);
@@ -292,8 +288,45 @@
 
 		update_page_reclaim_stat(zone, page, file, 1);
 	}
+}
+
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
+
+static void activate_page_drain(int cpu)
+{
+	struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu);
+
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, __activate_page, NULL);
+}
+
+void activate_page(struct page *page)
+{
+	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
+
+		page_cache_get(page);
+		if (!pagevec_add(pvec, page))
+			pagevec_lru_move_fn(pvec, __activate_page, NULL);
+		put_cpu_var(activate_page_pvecs);
+	}
+}
+
+#else
+static inline void activate_page_drain(int cpu)
+{
+}
+
+void activate_page(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+
+	spin_lock_irq(&zone->lru_lock);
+	__activate_page(page, NULL);
 	spin_unlock_irq(&zone->lru_lock);
 }
+#endif
 
 /*
  * Mark a page as having seen activity.
@@ -464,6 +497,8 @@
 	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
+	activate_page_drain(cpu);
 }
 
 /**