Commit 8334b96221ff0dcbde4873d31eb4d84774ed8ed4

Authored by Minchan Kim
Committed by Linus Torvalds
1 parent 3115aec451

mm: /proc/pid/smaps: show proportional swap share of the mapping

We want to know the per-process working set size for smart memory management
in userland, and we use swap (e.g., zram) heavily to maximize memory
efficiency, so the working set includes swap as well as RSS.

On such a system, if there is a lot of shared anonymous memory (e.g.,
Android), it is really hard to figure out exactly how much memory each
process consumes (i.e., RSS + swap).

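This patch introduces a SwapPss field in /proc/<pid>/smaps so we can get
a more exact working set size per process. Like Pss, it splits each
swapped-out page among all of its users: a swap entry referenced N times
contributes only 1/N of a page to each mapping's SwapPss.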
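A minimal userspace sketch of the arithmetic, modeled on the smaps_pte_entry() change in the diff below. Each swapped-out page contributes PAGE_SIZE divided by its swap count, kept in fixed point with PSS_SHIFT extra bits so the division does not lose precision, and the total is converted to kB at the end. It assumes 4 KiB pages and a PSS_SHIFT of 12; swap_pss_delta() and swap_pss_kb() are hypothetical names, and the swap counts are supplied by the caller rather than by swp_swapcount().

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and a PSS_SHIFT of 12. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PSS_SHIFT 12

/*
 * Hypothetical helper: fixed-point share of one swapped-out page whose
 * swap entry has `swapcount` references. Counts below 2 charge the whole
 * page, mirroring the mapcount >= 2 test in the patch.
 */
static uint64_t swap_pss_delta(int swapcount)
{
	uint64_t delta = SKETCH_PAGE_SIZE << SKETCH_PSS_SHIFT;

	if (swapcount >= 2)
		delta /= (uint64_t)swapcount;	/* plays the role of do_div() */
	return delta;
}

/* Hypothetical helper: sum the shares of n entries and report kB. */
static unsigned long swap_pss_kb(const int *swapcounts, int n)
{
	uint64_t swap_pss = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		swap_pss += swap_pss_delta(swapcounts[i]);
	/* Drop the fixed-point bits and convert bytes to kB in one shift. */
	return (unsigned long)(swap_pss >> (10 + SKETCH_PSS_SHIFT));
}
```

With 1000 unshared swap entries and 1000 entries shared with one other process, swap_pss_kb() reports 6000 kB, i.e. 1500 pages: the same split that the PSS example in proc.txt describes for resident pages.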

Bongkyu tested it. The results are below.

1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184

2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744
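The adb pipelines above sum one smaps field across every process. The C sketch below covers only the summing step for a single buffer of smaps text; sum_smaps_field() is a hypothetical helper, and reading each /proc/<pid>/smaps file into the buffer is left to the caller. It matches field names at the start of a line, so, like the grep pattern, "Swap:" does not also match "SwapPss:".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: add up the kB value on every line of `text` that
 * starts with `field` (e.g. "SwapPss:"), in the spirit of
 * grep "SwapPss:" | awk '{sum += $2}'.
 */
static unsigned long sum_smaps_field(const char *text, const char *field)
{
	size_t flen;
	unsigned long sum = 0;
	const char *line = text;

	assert(text != NULL && field != NULL);
	flen = strlen(field);
	while (*line != '\0') {
		const char *next = strchr(line, '\n');

		/* strtoul() skips the padding and stops at " kB". */
		if (strncmp(line, field, flen) == 0)
			sum += strtoul(line + flen, NULL, 10);
		if (next == NULL)
			break;
		line = next + 1;
	}
	return sum;
}
```

The gap between the two sums is the point of the patch. In test 1, used swap is 461976 - 411192 = 50784 kB. The SwapPss total (48236 kB) stays below that figure, while the Swap total (141184 kB) counts each shared swap entry once per mapping and overshoots it almost threefold.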

[akpm@linux-foundation.org: simplify kunmap_atomic() call]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Bongkyu Kim <bongkyu.kim@lge.com>
Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 4 changed files with 77 additions and 7 deletions.

Documentation/filesystems/proc.txt
@@ -424,6 +424,7 @@
 Referenced:          892 kB
 Anonymous:             0 kB
 Swap:                  0 kB
+SwapPss:               0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:              374 kB
@@ -433,16 +434,23 @@
 mapping in /proc/PID/maps. The remaining lines show the size of the mapping
 (size), the amount of the mapping that is currently resident in RAM (RSS), the
 process' proportional share of this mapping (PSS), the number of clean and
-dirty private pages in the mapping. Note that even a page which is part of a
-MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used
-by only one process, is accounted as private and not as shared. "Referenced"
-indicates the amount of memory currently marked as referenced or accessed.
+dirty private pages in the mapping.
+
+The "proportional set size" (PSS) of a process is the count of pages it has
+in memory, where each page is divided by the number of processes sharing it.
+So if a process has 1000 pages all to itself, and 1000 shared with one other
+process, its PSS will be 1500.
+Note that even a page which is part of a MAP_SHARED mapping, but has only
+a single pte mapped, i.e. is currently used by only one process, is accounted
+as private and not as shared.
+"Referenced" indicates the amount of memory currently marked as referenced or
+accessed.
 "Anonymous" shows the amount of memory that does not belong to any file. Even
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
 swap.
-
+"SwapPss" shows proportional swap share of this mapping.
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
 manner. The codes are the following:
@@ -446,6 +446,7 @@
 	unsigned long anonymous_thp;
 	unsigned long swap;
 	u64 pss;
+	u64 swap_pss;
 };
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
@@ -492,9 +493,20 @@
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
-		if (!non_swap_entry(swpent))
+		if (!non_swap_entry(swpent)) {
+			int mapcount;
+
 			mss->swap += PAGE_SIZE;
-		else if (is_migration_entry(swpent))
+			mapcount = swp_swapcount(swpent);
+			if (mapcount >= 2) {
+				u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
+
+				do_div(pss_delta, mapcount);
+				mss->swap_pss += pss_delta;
+			} else {
+				mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
+			}
+		} else if (is_migration_entry(swpent))
 			page = migration_entry_to_page(swpent);
 	}
 
@@ -640,6 +652,7 @@
 		   "Anonymous:      %8lu kB\n"
 		   "AnonHugePages:  %8lu kB\n"
 		   "Swap:           %8lu kB\n"
+		   "SwapPss:        %8lu kB\n"
 		   "KernelPageSize: %8lu kB\n"
 		   "MMUPageSize:    %8lu kB\n"
 		   "Locked:         %8lu kB\n",
@@ -654,6 +667,7 @@
 		   mss.anonymous >> 10,
 		   mss.anonymous_thp >> 10,
 		   mss.swap >> 10,
+		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
 		   vma_kernel_pagesize(vma) >> 10,
 		   vma_mmu_pagesize(vma) >> 10,
 		   (vma->vm_flags & VM_LOCKED) ?
include/linux/swap.h
@@ -431,6 +431,7 @@
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
+extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
@@ -518,6 +519,11 @@
 }
 
 static inline int page_swapcount(struct page *page)
+{
+	return 0;
+}
+
+static inline int swp_swapcount(swp_entry_t entry)
 {
 	return 0;
 }
@@ -875,6 +875,48 @@
 }
 
 /*
+ * How many references to @entry are currently swapped out?
+ * This considers COUNT_CONTINUED so it returns exact answer.
+ */
+int swp_swapcount(swp_entry_t entry)
+{
+	int count, tmp_count, n;
+	struct swap_info_struct *p;
+	struct page *page;
+	pgoff_t offset;
+	unsigned char *map;
+
+	p = swap_info_get(entry);
+	if (!p)
+		return 0;
+
+	count = swap_count(p->swap_map[swp_offset(entry)]);
+	if (!(count & COUNT_CONTINUED))
+		goto out;
+
+	count &= ~COUNT_CONTINUED;
+	n = SWAP_MAP_MAX + 1;
+
+	offset = swp_offset(entry);
+	page = vmalloc_to_page(p->swap_map + offset);
+	offset &= ~PAGE_MASK;
+	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
+
+	do {
+		page = list_entry(page->lru.next, struct page, lru);
+		map = kmap_atomic(page);
+		tmp_count = map[offset];
+		kunmap_atomic(map);
+
+		count += (tmp_count & ~COUNT_CONTINUED) * n;
+		n *= (SWAP_CONT_MAX + 1);
+	} while (tmp_count & COUNT_CONTINUED);
+out:
+	spin_unlock(&p->lock);
+	return count;
+}
+
+/*
  * We can write to an anon page without COW if there are no other references
  * to it. And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
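swp_swapcount() in the diff above reads a swap count as a mixed-radix number. The swap_map[] byte holds the low digit (up to SWAP_MAP_MAX). The first continuation digit is weighted by SWAP_MAP_MAX + 1, and each later one by a further factor of SWAP_CONT_MAX + 1. The sketch below models that loop in userspace, with the continuation pages replaced by a plain array. The constant values are assumptions copied from include/linux/swap.h of that era (they are not part of this diff), and sketch_swapcount() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match include/linux/swap.h at the time of this commit. */
#define SKETCH_SWAP_HAS_CACHE	0x40
#define SKETCH_COUNT_CONTINUED	0x80
#define SKETCH_SWAP_MAP_MAX	0x3e
#define SKETCH_SWAP_CONT_MAX	0x7f

/*
 * Userspace model of the swp_swapcount() loop: `base` stands in for the
 * swap_map[] byte and cont[i] for the byte at the same offset in the
 * i-th continuation page.
 */
static int sketch_swapcount(unsigned char base, const unsigned char *cont,
			    size_t ncont)
{
	int count = base & ~SKETCH_SWAP_HAS_CACHE;	/* like swap_count() */
	int n = SKETCH_SWAP_MAP_MAX + 1;		/* weight of first digit */
	size_t i = 0;
	int tmp;

	if (!(count & SKETCH_COUNT_CONTINUED))
		return count;
	count &= ~SKETCH_COUNT_CONTINUED;

	do {
		/* The chain must end with a byte lacking COUNT_CONTINUED. */
		assert(i < ncont);
		tmp = cont[i++];
		count += (tmp & ~SKETCH_COUNT_CONTINUED) * n;
		n *= SKETCH_SWAP_CONT_MAX + 1;
	} while (tmp & SKETCH_COUNT_CONTINUED);
	return count;
}
```

For example, a base byte of 0xbe (62 with COUNT_CONTINUED set) followed by a continuation byte of 1 decodes to 62 + 1 * 63 = 125.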