17 Oct, 2020

1 commit

  • Fix some broken comments, including a typo, a grammar error, and a wrong
    function name.

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     

14 Oct, 2020

3 commits

  • SWP_FS is used to make swap_{read,write}page() go through the filesystem,
    and for now it's only used for swap files over NFS. Otherwise IO is
    submitted directly to the block device according to the swapfile extents
    reported by the filesystem in advance.

    As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's
    rename to SWP_FS_OPS.

    [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.org
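
    A rough sketch of the dispatch this flag controls in swap_readpage()
    (simplified; the helper and field names below are recalled from the
    surrounding code, not quoted from this patch):

        if (sis->flags & SWP_FS_OPS) {
                /* swap file over NFS: hand the IO to the filesystem */
                ret = mapping->a_ops->readpage(sis->swap_file, page);
        } else {
                /* regular case: submit a bio straight to the block device,
                 * using the extents the filesystem reported at swapon time */
                ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
        }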

    Suggested-by: Matthew Wilcox
    Signed-off-by: Gao Xiang
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.com
    Signed-off-by: Linus Torvalds

    Gao Xiang
     
  • There are only four callers remaining of find_get_entry().
    get_shadow_from_swap_cache() only wants to see shadow entries and doesn't
    care about which page is returned. Push the find_subpage() call into
    find_lock_entry(), find_get_incore_page() and pagecache_get_page().

    [willy@infradead.org: fix oops]
    Link: https://lkml.kernel.org/r/20200914112738.GM6583@casper.infradead.org
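
    A hedged sketch of the caller-side pattern this implies (the helper names
    are taken from the text above; the exact shape is an assumption):

        /* find_get_entry() now returns the head page (or a shadow/swap
         * value entry); wrappers that need the precise subpage resolve
         * it themselves */
        page = find_get_entry(mapping, index);
        if (page && !xa_is_value(page))
                page = find_subpage(page, index);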

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Alexey Dobriyan
    Cc: Chris Wilson
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Jani Nikula
    Cc: Johannes Weiner
    Cc: Matthew Auld
    Cc: William Kucharski
    Link: https://lkml.kernel.org/r/20200910183318.20139-7-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Patch series "Return head pages from find_*_entry", v2.

    This patch series started out as part of the THP patch set, but it has
    some nice effects along the way and it seems worth splitting it out and
    submitting separately.

    Currently find_get_entry() and find_lock_entry() return the page
    corresponding to the requested index, but the first thing most callers do
    is find the head page, which we just threw away. As part of auditing all
    the callers, I found some misuses of the APIs and some plain
    inefficiencies that I've fixed.

    The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

    This patch (of 8):

    Provide this functionality from the swap cache. It's useful for
    more than just mincore().
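
    The helper being factored out is not named in this excerpt, but
    find_get_incore_page() from the neighbouring entry fits; a hedged sketch
    of the idea (an assumption, not the actual patch):

        /* Look in the page cache first; if what is found there is a swap
         * value entry (e.g. a swapped-out shmem page), look the page up in
         * the swap cache instead. */
        page = find_get_entry(mapping, index);
        if (xa_is_value(page)) {
                swp_entry_t swp = radix_to_swp_entry(page);

                page = find_get_page(swap_address_space(swp),
                                     swp_offset(swp));
        }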

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Jani Nikula
    Cc: Alexey Dobriyan
    Cc: Johannes Weiner
    Cc: Chris Wilson
    Cc: Matthew Auld
    Cc: Huang Ying
    Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

15 Aug, 2020

2 commits

  • swap_cache_info.* could be accessed concurrently as noticed by
    KCSAN,

    BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache

    write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
    lookup_swap_cache+0x12e/0x460
    lookup_swap_cache at mm/swap_state.c:322
    do_swap_page+0x112/0xeb0
    __handle_mm_fault+0xc7a/0xd00
    handle_mm_fault+0xfc/0x2f0
    do_page_fault+0x263/0x6f9
    page_fault+0x34/0x40

    read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
    lookup_swap_cache+0x117/0x460
    lookup_swap_cache at mm/swap_state.c:322
    shmem_swapin_page+0xc7/0x9e0
    shmem_getpage_gfp+0x2ca/0x16c0
    shmem_fault+0xef/0x3c0
    __do_fault+0x9e/0x220
    do_fault+0x4a0/0x920
    __handle_mm_fault+0xc69/0xd00
    handle_mm_fault+0xfc/0x2f0
    do_page_fault+0x263/0x6f9
    page_fault+0x34/0x40

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G W O L 5.5.0-next-20200204+ #6
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

    write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
    __delete_from_swap_cache+0x681/0x8b0
    __delete_from_swap_cache at mm/swap_state.c:178

    read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
    __delete_from_swap_cache+0x66e/0x8b0
    __delete_from_swap_cache at mm/swap_state.c:178

    Both the read and write are done locklessly. Since the swap_cache_info.*
    counters are only used to print out information, it is harmless even if
    some increments are missed due to data races, so just mark the accesses
    as intentional data races using the data_race() macro.

    While at it, fix a checkpatch.pl warning,

    WARNING: Single statement macros should not use a do {} while (0) loop
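
    A minimal sketch of the kind of change described, assuming the counters
    are bumped through a helper macro:

        /* was: #define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0) */
        #define INC_CACHE_INFO(x)       data_race(swap_cache_info.x++)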

    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Cc: Marco Elver
    Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • The thp prefix is more frequently used than hpage and we should be
    consistent between the various functions.

    [akpm@linux-foundation.org: fix mm/migrate.c]

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: Mike Kravetz
    Cc: David Hildenbrand
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

13 Aug, 2020

2 commits

  • This patch implements workingset detection for anonymous LRU. All the
    infrastructure is implemented by the previous patches so this patch just
    activates the workingset detection by installing/retrieving the shadow
    entry and adding refault calculation.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Link: http://lkml.kernel.org/r/1595490560-15117-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Workingset detection for anonymous pages will be implemented in the
    following patch, and it requires storing shadow entries in the swapcache.
    This patch implements the infrastructure to store the shadow entry in the
    swapcache.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/1595490560-15117-5-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

08 Aug, 2020

1 commit

  • Fix W=1 compile warnings (invalid kerneldoc):

    mm/swap_state.c:742: warning: Function parameter or member 'fentry' not described in 'swap_vma_readahead'
    mm/swap_state.c:742: warning: Excess function parameter 'entry' description in 'swap_vma_readahead'

    Signed-off-by: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200728171109.28687-2-krzk@kernel.org
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     

26 Jun, 2020

1 commit

  • Chris Murphy reports that a slightly overcommitted load, testing swap
    and zram along with i915, splats and keeps on splatting, when it had
    better fail less noisily:

    gnome-shell: page allocation failure: order:0,
    mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
    nodemask=(null),cpuset=/,mems_allowed=0
    CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
    Call Trace:
    dump_stack+0x64/0x88
    warn_alloc.cold+0x75/0xd9
    __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
    __alloc_pages_nodemask+0x2df/0x320
    alloc_slab_page+0x195/0x310
    allocate_slab+0x3c5/0x440
    ___slab_alloc+0x40c/0x5f0
    __slab_alloc+0x1c/0x30
    kmem_cache_alloc+0x20e/0x220
    xas_nomem+0x28/0x70
    add_to_swap_cache+0x321/0x400
    __read_swap_cache_async+0x105/0x240
    swap_cluster_readahead+0x22c/0x2e0
    shmem_swapin+0x8e/0xc0
    shmem_swapin_page+0x196/0x740
    shmem_getpage_gfp+0x3a2/0xa60
    shmem_read_mapping_page_gfp+0x32/0x60
    shmem_get_pages+0x155/0x5e0 [i915]
    __i915_gem_object_get_pages+0x68/0xa0 [i915]
    i915_vma_pin+0x3fe/0x6c0 [i915]
    eb_add_vma+0x10b/0x2c0 [i915]
    i915_gem_do_execbuffer+0x704/0x3430 [i915]
    i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
    drm_ioctl_kernel+0x86/0xd0 [drm]
    drm_ioctl+0x206/0x390 [drm]
    ksys_ioctl+0x82/0xc0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x5b/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported on 5.7, but it goes back really to 3.1: when
    shmem_read_mapping_page_gfp() was implemented for use by i915, and
    allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
    missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
    __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
    from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
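
    A hedged sketch of the described one-line fix in __read_swap_cache_async()
    (the argument list is recalled, not quoted):

        /* was: add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL) */
        err = add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK);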

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
    Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp")
    Signed-off-by: Hugh Dickins
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Matthew Wilcox (Oracle)
    Reported-by: Chris Murphy
    Analyzed-by: Vlastimil Babka
    Analyzed-by: Matthew Wilcox
    Tested-by: Chris Murphy
    Cc: [3.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

10 Jun, 2020

2 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch series "mm: consolidate definitions of page table accessors", v2.

    The low level page table accessors (pXY_index(), pXY_offset()) are
    duplicated across all architectures and sometimes more than once. For
    instance, we have 31 definition of pgd_offset() for 25 supported
    architectures.

    Most of these definitions are actually identical and typically it boils
    down to, e.g.

    static inline unsigned long pmd_index(unsigned long address)
    {
    return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
    return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
    }

    These definitions can be shared among 90% of the arches provided
    XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

    For architectures that really need a custom version there is always
    possibility to override the generic version with the usual ifdefs magic.

    These patches introduce include/linux/pgtable.h that replaces
    include/asm-generic/pgtable.h and add the definitions of the page table
    accessors to the new header.

    This patch (of 12):

    The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
    functions involving page table manipulations, e.g. pte_alloc() and
    pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
    in the files that include <linux/mm.h>.

    The include statements in such cases are removed with a simple loop:

    for f in $(git grep -l "include <asm/pgtable.h>") ; do
    sed -i -e '/include <asm\/pgtable.h>/ d' $f
    done

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mike Rapoport
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

04 Jun, 2020

5 commits

  • The VM tries to balance reclaim pressure between anon and file so as to
    reduce the amount of IO incurred due to the memory shortage. It already
    counts refaults and swapins, but in addition it should also count
    writepage calls during reclaim.

    For swap, this is obvious: it's IO that wouldn't have occurred if the
    anonymous memory hadn't been under memory pressure. From a relative
    balancing point of view this makes sense as well: even if anon is cold and
    reclaimable, a cache that isn't thrashing may have equally cold pages that
    don't require IO to reclaim.

    For file writeback, it's trickier: some of the reclaim writepage IO would
    have likely occurred anyway due to dirty expiration. But not all of it -
    premature writeback reduces batching and generates additional writes.
    Since the flushers are already woken up by the time the VM starts writing
    cache pages one by one, let's assume that we're likely causing writes that
    wouldn't have happened without memory pressure. In addition, the per-page
    cost of IO would have probably been much cheaper if written in larger
    batches from the flusher thread rather than the single-page-writes from
    kswapd.

    For our purposes - getting the trend right to accelerate convergence on a
    stable state that doesn't require paging at all - this is sufficiently
    accurate. If we later wanted to optimize for sustained thrashing, we can
    still refine the measurements.

    Count all writepage calls from kswapd as IO cost toward the LRU that the
    page belongs to.

    Why do this dynamically? Don't we know in advance that anon pages require
    IO to reclaim, and so could build in a static bias?

    First, scanning is not the same as reclaiming. If all the anon pages are
    referenced, we may not swap for a while just because we're scanning the
    anon list. During this time, however, it's important that we age
    anonymous memory and the page cache at the same rate so that their
    hot-cold gradients are comparable. Everything else being equal, we still
    want to reclaim the coldest memory overall.

    Second, we keep copies in swap unless the page changes. If there is
    swap-backed data that's mostly read (tmpfs file) and has been swapped out
    before, we can reclaim it without incurring additional IO.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since the LRUs were split into anon and file lists, the VM has been
    balancing between page cache and anonymous pages based on per-list ratios
    of scanned vs. rotated pages. In most cases that tips page reclaim
    towards the list that is easier to reclaim and has the fewest actively
    used pages, but there are a few problems with it:

    1. Refaults and LRU rotations are weighted the same way, even though
    one costs IO and the other costs a bit of CPU.

    2. The less we scan an LRU list based on already observed rotations,
    the more we increase the sampling interval for new references, and
    rotations become even more likely on that list. This can enter a
    death spiral in which we stop looking at one list completely until
    the other one is all but annihilated by page reclaim.

    Since commit a528910e12ec ("mm: thrash detection-based file cache sizing")
    we have refault detection for the page cache. Along with swapin events,
    they are good indicators of when the file or anon list, respectively, is
    too small for its workingset and needs to grow.

    For example, if the page cache is thrashing, the cache pages need more
    time in memory, while there may be colder pages on the anonymous list.
    Likewise, if swapped pages are faulting back in, it indicates that we
    reclaim anonymous pages too aggressively and should back off.

    Replace LRU rotations with refaults and swapins as the basis for relative
    reclaim cost of the two LRUs. This will have the VM target list balances
    that incur the least amount of IO on aggregate.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Link: http://lkml.kernel.org/r/20200520232525.798933-12-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • They're the same function, and for the purpose of all callers they are
    equivalent to lru_cache_add().

    [akpm@linux-foundation.org: fix it for local_lock changes]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Michal Hocko
    Acked-by: Minchan Kim
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Swapin faults were the last event to charge pages after they had already
    been put on the LRU list. Now that we charge directly on swapin, the
    lrucare portion of the charge code is unused.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Reviewed-by: Joonsoo Kim
    Cc: Alex Shi
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Balbir Singh
    Cc: Shakeel Butt
    Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Right now, users that are otherwise memory controlled can easily escape
    their containment and allocate significant amounts of memory that they're
    not being charged for. That's because swap readahead pages are not being
    charged until somebody actually faults them into their page table. This
    can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
    allocations without charging the pages.

    There are additional problems with the delayed charging of swap pages:

    1. To implement refault/workingset detection for anonymous pages, we
    need to have a target LRU available at swapin time, but the LRU is not
    determinable until the page has been charged.

    2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
    stable when the page is isolated from the LRU; otherwise, the locks
    change under us. But swapcache gets charged after it's already on the
    LRU, and even if we cannot isolate it ourselves (since charging is not
    exactly optional).

    The previous patch ensured we always maintain cgroup ownership records for
    swap pages. This patch moves the swapcache charging point from the fault
    handler to swapin time to fix all of the above problems.
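
    A hedged sketch of the new charging point in __read_swap_cache_async()
    (the call site, label and error handling are assumptions based on the
    mem_cgroup_charge() interface of that era):

        /* charge the freshly allocated swapin page before it is added to
         * the swap cache and the LRU, instead of at fault time */
        if (mem_cgroup_charge(page, NULL, gfp_mask)) {
                put_swap_page(page, entry);
                goto fail_unlock;               /* hypothetical label */
        }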

    v2: simplify swapin error checking (Joonsoo)

    [hughd@google.com: fix livelock in __read_swap_cache_async()]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils
    Signed-off-by: Johannes Weiner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Reviewed-by: Alex Shi
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Balbir Singh
    Cc: Rafael Aquini
    Cc: Alex Shi
    Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

03 Jun, 2020

1 commit

  • "prev_offset" is a static variable in swapin_nr_pages() that can be
    accessed concurrently with only mmap_sem held in read mode as noticed by
    KCSAN,

    BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead

    write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17:
    swap_cluster_readahead+0x2a6/0x5e0
    swapin_readahead+0x92/0x8dc
    do_swap_page+0x49b/0xf20
    __handle_mm_fault+0xcfb/0xd70
    handle_mm_fault+0xfc/0x2f0
    do_page_fault+0x263/0x715
    page_fault+0x34/0x40

    1 lock held by (dnf)/14795:
    #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
    do_user_addr_fault at arch/x86/mm/fault.c:1405
    (inlined by) do_page_fault at arch/x86/mm/fault.c:1535
    irq event stamp: 83493
    count_memcg_event_mm+0x1a6/0x270
    count_memcg_event_mm+0x119/0x270
    __do_softirq+0x365/0x589
    irq_exit+0xa2/0xc0

    read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22:
    swap_cluster_readahead+0xfd/0x5e0
    swapin_readahead+0x92/0x8dc
    do_swap_page+0x49b/0xf20
    __handle_mm_fault+0xcfb/0xd70
    handle_mm_fault+0xfc/0x2f0
    do_page_fault+0x263/0x715
    page_fault+0x34/0x40

    1 lock held by systemd/1:
    #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
    irq event stamp: 43530289
    count_memcg_event_mm+0x1a6/0x270
    count_memcg_event_mm+0x119/0x270
    __do_softirq+0x365/0x589
    irq_exit+0xa2/0xc0
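
    A hedged sketch of the sort of annotation such a report usually leads to
    (the readahead heuristic around it is abbreviated and assumed):

        static unsigned long prev_offset;

        /* lockless read of the last faulting offset; the annotation makes
         * the intentional, tearing-tolerant access explicit */
        if (offset != READ_ONCE(prev_offset) + 1 &&
            offset != READ_ONCE(prev_offset) - 1)
                pages = 1;

        /* lockless publication of the new offset for the next fault */
        WRITE_ONCE(prev_offset, offset);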

    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Cc: Marco Elver
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     

03 Apr, 2020

1 commit

  • add_to_swap_cache() and delete_from_swap_cache() are counterparts, while
    currently they use different ways to count pages.

    It doesn't break anything because we only have two sizes for PageAnon, but
    this is confusing and not good practice.

    This patch corrects it by making both functions use hpage_nr_pages().

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Link: http://lkml.kernel.org/r/20200315012920.2687-1-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     

25 Sep, 2019

2 commits

  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    Kirill and Huang Ying contributed several fixes.

    [willy@infradead.org: use compound_nr, squish uninit-var warning]
    Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-by: Song Liu
    Tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Tested-by: Mikhail Gavrilov
    Cc: Hugh Dickins
    Cc: Chris Wilson
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

13 Jul, 2019

2 commits

  • total_swapcache_pages() may race with swapper_spaces[] allocation and
    freeing. Previously, this was protected with a swapper_spaces[]-specific
    RCU mechanism. To simplify the logic/code complexity, it is replaced with
    get/put_swap_device(). The number of lines of code is reduced too.
    Although not so important, swapoff() performance improves as well, because
    one synchronize_rcu() call during swapoff() is deleted.

    [ying.huang@intel.com: fix bad swap file entry warning]
    Link: http://lkml.kernel.org/r/20190531024102.21723-1-ying.huang@intel.com
    Link: http://lkml.kernel.org/r/20190527082714.12151-1-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Andrew Morton
    Tested-by: Mike Kravetz
    Cc: Hugh Dickins
    Cc: Paul E. McKenney
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: Tim Chen
    Cc: Mel Gorman
    Cc: Jérôme Glisse
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Yang Shi
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Jan Kara
    Cc: Dave Jiang
    Cc: Daniel Jordan
    Cc: Andrea Parri
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • When swapin is performed, after getting the swap entry information from
    the page table, the system will swap in the swap entry without any lock
    held to prevent the swap device from being swapped off. This may cause a
    race like the one below:

    CPU 1                           CPU 2
    -----                           -----
    do_swap_page
      swapin_readahead
        __read_swap_cache_async
                                    swapoff
                                      p->swap_map = NULL
          swapcache_prepare
            __swap_duplicate
              p->swap_map[?] /* !!! NULL pointer access */

    Because swapoff is usually done only at system shutdown, the race may not
    hit many people in practice. But it is still a race that needs to be
    fixed.

    To fix the race, get_swap_device() is added to check whether the specified
    swap entry is valid in its swap device. If so, it will keep the swap
    entry valid via preventing the swap device from being swapoff, until
    put_swap_device() is called.

    Because swapoff() is a very rare code path, to make the normal path run as
    fast as possible, rcu_read_lock/unlock() and synchronize_rcu() are used
    instead of a reference count to implement get/put_swap_device(). From
    get_swap_device() to put_swap_device(), the RCU read side is locked, so
    synchronize_rcu() in swapoff() will wait until put_swap_device() is
    called.
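
    A hedged sketch of the resulting usage pattern on the swap cache lookup
    side (simplified):

        struct swap_info_struct *si;

        si = get_swap_device(entry);
        if (!si)
                return NULL;    /* device is being, or has been, swapped off */
        page = find_get_page(swap_address_space(entry), swp_offset(entry));
        put_swap_device(si);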

    In addition to the swap_map, cluster_info, etc. data structures in struct
    swap_info_struct, the swap cache radix tree will be freed after swapoff,
    so this patch fixes the race between swap cache lookup and swapoff too.

    Races between some other swap cache usages and swapoff are fixed too, via
    calling synchronize_rcu() between clearing PageSwapCache() and freeing the
    swap cache data structure.

    Another possible method to fix this is to use preempt_off() +
    stop_machine() to prevent the swap device from being swapped off while its
    data structures are being accessed. The hot-path overhead of both methods
    is similar. The advantages of the RCU-based method are:

    1. stop_machine() may disturb the normal execution code path on other
    CPUs.

    2. File cache uses RCU to protect its radix tree. If the similar
    mechanism is used for swap cache too, it is easier to share code
    between them.

    3. RCU is used to protect swap cache in total_swapcache_pages() and
    exit_swap_address_space() already. The two mechanisms can be
    merged to simplify the logic.

    Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
    Fixes: 235b62176712 ("mm/swap: add cluster lock")
    Signed-off-by: "Huang, Ying"
    Reviewed-by: Andrea Parri
    Not-nacked-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Paul E. McKenney
    Cc: Daniel Jordan
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: Tim Chen
    Cc: Mel Gorman
    Cc: Jérôme Glisse
    Cc: Yang Shi
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Jan Kara
    Cc: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

06 Jul, 2019

1 commit

  • This reverts commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f.

    Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in
    __delete_from_swap_cache() to trigger:

    page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363
    anon
    flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
    raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
    raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
    page dumped because: VM_BUG_ON_PAGE(entry != page)
    page->mem_cgroup:ffff978433ace000
    ------------[ cut here ]------------
    kernel BUG at mm/swap_state.c:170!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
    Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019
    RIP: 0010:__delete_from_swap_cache+0x20d/0x240
    Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
    RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
    RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
    RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
    R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
    R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
    FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
    Call Trace:
    delete_from_swap_cache+0x46/0xa0
    try_to_free_swap+0xbc/0x110
    swap_writepage+0x13/0x70
    pageout.isra.0+0x13c/0x350
    shrink_page_list+0xc14/0xdf0
    shrink_inactive_list+0x1e5/0x3c0
    shrink_node_memcg+0x202/0x760
    shrink_node+0xe0/0x470
    balance_pgdat+0x2d1/0x510
    kswapd+0x220/0x420
    kthread+0xfb/0x130
    ret_from_fork+0x22/0x40

    and it's not immediately obvious why it happens. It's too late in the
    rc cycle to do anything but revert for now.

    Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/
    Reported-and-bisected-by: Mikhail Gavrilov
    Suggested-by: Jan Kara
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Kirill Shutemov
    Cc: William Kucharski
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

15 May, 2019

1 commit

  • Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages. This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    [willy@infradead.org: fix swapcache pages]
    Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
    [kirill@shutemov.name: hugetlb stores pages in page cache differently]
    Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
    Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jan Kara
    Reviewed-by: Kirill Shutemov
    Reviewed-and-tested-by: Song Liu
    Tested-by: William Kucharski
    Reviewed-by: William Kucharski
    Tested-by: Qian Cai
    Cc: Hugh Dickins
    Cc: Song Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Mar, 2019

2 commits

  • swap_vma_readahead()'s comment is missing, just add it.

    Link: http://lkml.kernel.org/r/1546543673-108536-2-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Reviewed-by: Andrew Morton
    Cc: Huang Ying
    Cc: Tim Chen
    Cc: Minchan Kim
    Cc: Daniel Jordan
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • Swap readahead would read in a few pages regardless of whether the
    underlying device is busy or not. This may incur a long wait if the
    device is congested, and it may also exacerbate the congestion.

    Use inode_read_congested() to check whether the underlying device is busy,
    as file page readahead does, getting the inode from swap_info_struct.

    Although we could add the inode information to swap_address_space
    (address_space->host), that may have unexpected side effects, e.g. it may
    break mapping_cap_account_dirty(). Using the inode from swap_info_struct
    is simple and good enough.

    Just do the check in vma_cluster_readahead(), since swap_vma_readahead()
    is only used for non-rotational devices, which are much less likely to be
    congested than a traditional HDD.

    Although swap slots may be consecutive on a swap partition, they may still
    be fragmented on a swap file. This check helps reduce excessive stalls in
    such cases.
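
    A hedged sketch of the described check in the cluster readahead path (the
    field chain is assumed from struct swap_info_struct):

        struct inode *inode = si->swap_file->f_mapping->host;

        if (inode_read_congested(inode))
                goto skip;      /* backing device congested: skip readahead */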

    The test with page_fault1 of will-it-scale (sometimes tracing may just
    show runtest.py that is the wrapper script of page_fault1), which
    basically launches NR_CPU threads to generate 128MB anonymous pages for
    each thread, on my virtual machine with congested HDD shows long tail
    latency is reduced significantly.

    Without the patch
    page_fault1_thr-1490 [023] 129.311706: funcgraph_entry: #57377.796 us | do_swap_page();
    page_fault1_thr-1490 [023] 129.369103: funcgraph_entry: 5.642us | do_swap_page();
    page_fault1_thr-1490 [023] 129.369119: funcgraph_entry: #1289.592 us | do_swap_page();
    page_fault1_thr-1490 [023] 129.370411: funcgraph_entry: 4.957us | do_swap_page();
    page_fault1_thr-1490 [023] 129.370419: funcgraph_entry: 1.940us | do_swap_page();
    page_fault1_thr-1490 [023] 129.378847: funcgraph_entry: #1411.385 us | do_swap_page();
    page_fault1_thr-1490 [023] 129.380262: funcgraph_entry: 3.916us | do_swap_page();
    page_fault1_thr-1490 [023] 129.380275: funcgraph_entry: #4287.751 us | do_swap_page();

    With the patch
    runtest.py-1417 [020] 301.925911: funcgraph_entry: #9870.146 us | do_swap_page();
    runtest.py-1417 [020] 301.935785: funcgraph_entry: 9.802us | do_swap_page();
    runtest.py-1417 [020] 301.935799: funcgraph_entry: 3.551us | do_swap_page();
    runtest.py-1417 [020] 301.935806: funcgraph_entry: 2.142us | do_swap_page();
    runtest.py-1417 [020] 301.935853: funcgraph_entry: 6.938us | do_swap_page();
    runtest.py-1417 [020] 301.935864: funcgraph_entry: 3.765us | do_swap_page();
    runtest.py-1417 [020] 301.935871: funcgraph_entry: 3.600us | do_swap_page();
    runtest.py-1417 [020] 301.935878: funcgraph_entry: 7.202us | do_swap_page();

    [akpm@linux-foundation.org: code cleanup]
    [yang.shi@linux.alibaba.com: add comment]
    Link: http://lkml.kernel.org/r/bbc7bda7-62d0-df1a-23ef-d369e865bdca@linux.alibaba.com
    Link: http://lkml.kernel.org/r/1546543673-108536-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Acked-by: Tim Chen
    Reviewed-by: Andrew Morton
    Cc: Huang Ying
    Cc: Minchan Kim
    Cc: Daniel Jordan
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

29 Oct, 2018

1 commit

  • Pull XArray conversion from Matthew Wilcox:
    "The XArray provides an improved interface to the radix tree data
    structure, providing locking as part of the API, specifying GFP flags
    at allocation time, eliminating preloading, less re-walking the tree,
    more efficient iterations and not exposing RCU-protected pointers to
    its users.

    This patch set

    1. Introduces the XArray implementation

    2. Converts the pagecache to use it

    3. Converts memremap to use it

    The page cache is the most complex and important user of the radix
    tree, so converting it was most important. Converting the memremap
    code removes the only other user of the multiorder code, which allows
    us to remove the radix tree code that supported it.

    I have 40+ followup patches to convert many other users of the radix
    tree over to the XArray, but I'd like to get this part in first. The
    other conversions haven't been in linux-next and aren't suitable for
    applying yet, but you can see them in the xarray-conv branch if you're
    interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
    radix tree: Remove multiorder support
    radix tree test: Convert multiorder tests to XArray
    radix tree tests: Convert item_delete_rcu to XArray
    radix tree tests: Convert item_kill_tree to XArray
    radix tree tests: Move item_insert_order
    radix tree test suite: Remove multiorder benchmarking
    radix tree test suite: Remove __item_insert
    memremap: Convert to XArray
    xarray: Add range store functionality
    xarray: Move multiorder_check to in-kernel tests
    xarray: Move multiorder_shrink to kernel tests
    xarray: Move multiorder account test in-kernel
    radix tree test suite: Convert iteration test to XArray
    radix tree test suite: Convert tag_tagged_items to XArray
    radix tree: Remove radix_tree_clear_tags
    radix tree: Remove radix_tree_maybe_preload_order
    radix tree: Remove split/join code
    radix tree: Remove radix_tree_update_node_t
    page cache: Finish XArray conversion
    dax: Convert page fault handlers to XArray
    ...

    Linus Torvalds
     

27 Oct, 2018

1 commit

  • Refaults happen during transitions between workingsets as well as in-place
    thrashing. Knowing the difference between the two has a range of
    applications, including measuring the impact of memory shortage on the
    system performance, as well as the ability to smarter balance pressure
    between the filesystem cache and the swap-backed workingset.

    During workingset transitions, inactive cache refaults and pushes out
    established active cache. When that active cache isn't stale, however,
    and also ends up refaulting, that's bonafide thrashing.

    Introduce a new page flag that tells on eviction whether the page has been
    active or not in its lifetime. This bit is then stored in the shadow
    entry, to classify refaults as transitioning or thrashing.

    How many page->flags does this leave us with on 32-bit?

    20 bits are always page flags

    21 if you have an MMU

    23 with the zone bits for DMA, Normal, HighMem, Movable

    29 with the sparsemem section bits

    30 if PAE is enabled

    31 with this patch.

    So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes. If
    that's not enough, the system can switch to discontigmem and re-gain the 6
    or 7 sparsemem section bits.
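
    A rough sketch of the flow described above, with helper names assumed
    from mm/workingset.c rather than quoted from the patch:

        /* on eviction: record whether the page was ever active, as one
         * extra bit packed into the shadow entry */
        shadow = pack_shadow(memcgid, pgdat, eviction, PageWorkingset(page));

        /* on refault: that bit tells bona fide thrashing apart from a
         * workingset transition */
        if (workingset)
                SetPageWorkingset(page);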

    Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Daniel Drake
    Tested-by: Suren Baghdasaryan
    Cc: Christopher Lameter
    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Mike Galbraith
    Cc: Peter Enderborg
    Cc: Randy Dunlap
    Cc: Shakeel Butt
    Cc: Tejun Heo
    Cc: Vinayak Menon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

21 Oct, 2018

3 commits


13 Jun, 2018

1 commit

  • The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
    patch replaces cases of:

    kvzalloc(a * b, gfp)

    with:
    kvcalloc(a, b, gfp)

    as well as handling cases of:

    kvzalloc(a * b * c, gfp)

    with:

    kvzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kvcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kvzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
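
    For illustration, a typical 2-factor conversion looks like this
    (hypothetical call site):

        /* before: the open-coded multiplication can overflow silently */
        table = kvzalloc(nr_entries * sizeof(*table), GFP_KERNEL);

        /* after: kvcalloc() checks the multiplication for overflow */
        table = kvcalloc(nr_entries, sizeof(*table), GFP_KERNEL);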

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kvzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kvzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kvzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kvzalloc
    + kvcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kvzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kvzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(sizeof(THING) * C2, ...)
    |
    kvzalloc(sizeof(TYPE) * C2, ...)
    |
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(C1 * C2, ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

08 Jun, 2018

1 commit

  • Patch series "mm, memcontrol: Implement memory.swap.events", v2.

    This patchset implements memory.swap.events which contains max and fail
    events so that userland can monitor and respond to swap running out.

    This patch (of 2):

    get_swap_page() is always followed by mem_cgroup_try_charge_swap().
    This patch moves mem_cgroup_try_charge_swap() into get_swap_page() and
    makes get_swap_page() call the function even after swap allocation
    failure.

    This simplifies the callers and consolidates memcg related logic and
    will ease adding swap related memcg events.
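
    A hedged sketch of the consolidated flow at the end of get_swap_page()
    (the control flow and label are assumptions, not quoted from the patch):

        /* slot allocation as before; entry.val stays 0 on failure */
    out:
        /* called even when allocation failed, so memcg can account both
         * the success and the failure */
        if (mem_cgroup_try_charge_swap(page, entry)) {
                put_swap_page(page, entry);
                entry.val = 0;
        }
        return entry;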

    Link: http://lkml.kernel.org/r/20180416230934.GH1911913@devbig577.frc2.facebook.com
    Signed-off-by: Tejun Heo
    Reviewed-by: Andrew Morton
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Roman Gushchin
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

12 Apr, 2018

1 commit

  • Remove the address_space ->tree_lock and use the xa_lock newly added to
    the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
    since we don't really care that it's a tree.

    [willy@infradead.org: fix nds32, fs/dax.c]
    Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
    Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Apr, 2018

3 commits

  • The bool enable_vma_readahead and swap_vma_readahead() are local to the
    source and do not need to be in global scope, so make them static.

    Cleans up sparse warnings:

    mm/swap_state.c:41:6: warning: symbol 'enable_vma_readahead' was not declared. Should it be static?
    mm/swap_state.c:742:13: warning: symbol 'swap_vma_readahead' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/20180223164852.5159-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Reviewed-by: Andrew Morton
    Acked-by: "Huang, Ying"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • This patch makes do_swap_page() no longer need to be aware of the two
    different swap readahead algorithms; just unify the cluster-based and
    vma-based readahead function calls.
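
    The unified entry point presumably ends up looking roughly like this:

        struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
                                      struct vm_fault *vmf)
        {
                return swap_use_vma_readahead() ?
                        swap_vma_readahead(entry, gfp_mask, vmf) :
                        swap_cluster_readahead(entry, gfp_mask, vmf);
        }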

    Link: http://lkml.kernel.org/r/1509520520-32367-3-git-send-email-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20180220085249.151400-3-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Looking at the recent changes to swap readahead, I am very unhappy about
    the current code structure, which diverges into two swap readahead
    algorithms in do_swap_page(). This patch cleans it up.

    The main motivation is that the fault handler doesn't need to be aware of
    the readahead algorithms; it should just call swapin_readahead().

    As a first step, this patch cleans things up a little bit, though not
    perfectly (split out to make review easier); the next patch completes the
    goal.

    [minchan@kernel.org: do not check readahead flag with THP anon]
    Link: http://lkml.kernel.org/r/874lm83zho.fsf@yhuang-dev.intel.com
    Link: http://lkml.kernel.org/r/20180227232611.169883-1-minchan@kernel.org
    Link: http://lkml.kernel.org/r/1509520520-32367-2-git-send-email-minchan@kernel.org
    Link: http://lkml.kernel.org/r/20180220085249.151400-2-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

16 Nov, 2017

2 commits

  • All callers of release_pages claim the pages being released are cache
    hot. As no one cares about the hotness of pages being released to the
    allocator, just ditch the parameter.

    No performance impact is expected as the overhead is marginal. The
    parameter is removed simply because it is a bit stupid to have a useless
    parameter copied everywhere.

    Link: http://lkml.kernel.org/r/20171018075952.10627-7-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • These global variables are only set during initialization or rarely
    change, so declare them as __read_mostly.
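
    For example (a generic illustration with a hypothetical variable, not the
    exact ones touched by this patch):

        /* written at init or rarely, read on every swapin path */
        static bool swap_readahead_enabled __read_mostly = true;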

    Link: http://lkml.kernel.org/r/1507802349-5554-1-git-send-email-changbin.du@intel.com
    Signed-off-by: Changbin Du
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Changbin Du