02 Jul, 2021

2 commits

  • Some devices require exclusive write access to shared virtual memory (SVM)
    ranges to perform atomic operations on that memory. This requires CPU
    page tables to be updated to deny access whilst atomic operations are
    occurring.

    In order to do this introduce a new swap entry type
    (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for exclusive
    access by a device all page table mappings for the particular range are
    replaced with device exclusive swap entries. This causes any CPU access
    to the page to result in a fault.

    Faults are resolved by replacing the faulting entry with the original
    mapping. This results in MMU notifiers being called which a driver uses
    to update access permissions such as revoking atomic access. After
    notifiers have been called the device will no longer have exclusive access
    to the region.

    Walking of the page tables to find the target pages is handled by
    get_user_pages() rather than a direct page table walk. A direct page
    table walk similar to what migrate_vma_collect()/unmap() does could also
    have been utilised. However, this resulted in more code that largely
    duplicated functionality get_user_pages() already provides, since page
    faulting is required to make the PTEs present and to break COW.
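
    As a rough illustration of the driver-side flow, assuming the
    make_device_exclusive_range() interface added by this patch (variable
    names and error handling here are invented for the sketch):

        struct page *page;
        int npages;

        /*
         * Replace the CPU mappings of one page with device exclusive
         * entries. Any later CPU access faults, restores the mappings
         * and fires MMU notifiers so the driver can revoke device access.
         */
        npages = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
                                             &page, driver_owner_cookie);
        if (npages != 1)
                return -EBUSY;          /* illustrative error handling */

        /* ... program the device's atomic access to the page ... */

        unlock_page(page);
        put_page(page);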

    [dan.carpenter@oracle.com: fix signedness bug in make_device_exclusive_range()]
    Link: https://lkml.kernel.org/r/YNIz5NVnZ5GiZ3u1@mwanda

    Link: https://lkml.kernel.org/r/20210616105937.23201-8-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Signed-off-by: Dan Carpenter
    Reviewed-by: Christoph Hellwig
    Cc: Ben Skeggs
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Patch series "Add support for SVM atomics in Nouveau", v11.

    Introduction
    ============

    Some devices have features such as atomic PTE bits that can be used to
    implement atomic access to system memory. To support atomic operations
    on a shared virtual memory page, such a device needs access to that page
    which is exclusive of the CPU. This series introduces a mechanism to
    temporarily unmap pages, granting exclusive access to a device.

    These changes are required to support OpenCL atomic operations in Nouveau
    to shared virtual memory (SVM) regions allocated with the
    CL_MEM_SVM_ATOMICS clSVMAlloc flag. A more complete description of the
    OpenCL SVM feature is available at
    https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/
    OpenCL_API.html#_shared_virtual_memory .

    Implementation
    ==============

    Exclusive device access is implemented by adding a new swap entry type
    (SWP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
    difference is that on fault the original entry is immediately restored by
    the fault handler instead of waiting.

    Restoring the entry triggers calls to MMU notifiers which allows a device
    driver to revoke the atomic access permission from the GPU prior to the
    CPU finalising the entry.

    Patches
    =======

    Patches 1 & 2 refactor existing migration and device private entry
    functions.

    Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
    functionality into separate functions - try_to_migrate_one() and
    try_to_munlock_one().

    Patch 5 renames some existing code but does not introduce functionality.

    Patch 6 is a small clean-up to swap entry handling in copy_pte_range().

    Patch 7 contains the bulk of the implementation for device exclusive
    memory.

    Patch 8 contains some additions to the HMM selftests to ensure everything
    works as expected.

    Patch 9 is a cleanup for the Nouveau SVM implementation.

    Patch 10 contains the implementation of atomic access for the Nouveau
    driver.

    Testing
    =======

    This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
    which checks that GPU atomic accesses to system memory are atomic.
    Without this series the test fails as there is no way of write-protecting
    the page mapping which results in the device clobbering CPU writes. For
    reference the test is available at
    https://ozlabs.org/~apopple/opencl_svm_atomics/

    Further testing has been performed by adding support for testing exclusive
    access to the hmm-tests kselftests.

    This patch (of 10):

    Remove multiple similar inline functions for dealing with different types
    of special swap entries.

    Both migration and device private swap entries use the swap offset to
    store a pfn. Instead of multiple inline functions to obtain a struct page
    for each swap entry type use a common function pfn_swap_entry_to_page().
    Also open-code the various entry_to_pfn() functions, as this results in
    shorter code that is easier to understand.
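
    The common helper is essentially of this shape (a sketch; the BUG_ON
    reflects the rule that migration entries are only used while the
    corresponding page is locked):

        static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
        {
                struct page *p = pfn_to_page(swp_offset(entry));

                /*
                 * Any use of migration entries may only occur while the
                 * corresponding page is locked.
                 */
                BUG_ON(is_migration_entry(entry) && !PageLocked(p));

                return p;
        }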

    Link: https://lkml.kernel.org/r/20210616105937.23201-1-apopple@nvidia.com
    Link: https://lkml.kernel.org/r/20210616105937.23201-2-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Hugh Dickins
    Cc: Peter Xu
    Cc: Shakeel Butt
    Cc: Ben Skeggs
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     

25 Jun, 2021

10 commits

  • Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
    ptlock in the PVMW_SYNC case? That too might have been responsible for
    BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
    I've never seen any.
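
    In code terms, the fix is roughly to take the pte lock while stepping
    over pte_none() entries when PVMW_SYNC is set, along these lines (a
    sketch, not the exact hunk):

        do {
                pvmw->address += PAGE_SIZE;
                /* ... step to the next pte, crossing tables as needed ... */
                pvmw->pte++;
                if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
                        pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
                        spin_lock(pvmw->ptl);
                }
        } while (pte_none(*pvmw->pte));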

    Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
    Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Tested-by: Wang Yugui
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Running certain tests with a DEBUG_VM kernel would crash within hours,
    on the total_mapcount BUG() in split_huge_page_to_list(), while trying
    to free up some memory by punching a hole in a shmem huge page: split's
    try_to_unmap() was unable to find all the mappings of the page (which,
    on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

    Crash dumps showed two tail pages of a shmem huge page remained mapped
    by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
    of a long unmapped range; and no page table had yet been allocated for
    the head of the huge page to be mapped into.

    Although designed to handle these odd misaligned huge-page-mapped-by-pte
    cases, page_vma_mapped_walk() falls short by returning false prematurely
    when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
    are cases when a huge page may span the boundary, with ptes present in
    the next page table.

    Restructure page_vma_mapped_walk() as a loop to continue in these cases,
    while keeping its layout much as before. Add a step_forward() helper to
    advance pvmw->address across those boundaries: originally I tried to use
    mm's standard p?d_addr_end() macros, but hit the same crash 512 times
    less often: because of the way redundant levels are folded together, but
    folded differently in different configurations, it was just too
    difficult to use them correctly; and step_forward() is simpler anyway.
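
    The helper is small; roughly (a sketch matching the description above):

        static void step_forward(struct page_vma_mapped_walk *pvmw,
                                 unsigned long size)
        {
                /* advance to the next 'size'-aligned boundary */
                pvmw->address = (pvmw->address + size) & ~(size - 1);
                /* saturate rather than wrap around on overflow */
                if (!pvmw->address)
                        pvmw->address = ULONG_MAX;
        }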

    Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
    start, rather than later at next_pte.

    It's a little unnecessary overhead on the first call, but makes for a
    simpler loop in the following commit.

    Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
    and use "goto this_pte", in place of the "while (1)" loop at the end.

    Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: add a level of indentation to much of
    the body, making no functional change in this commit, but reducing the
    later diff when this is all converted to a loop.

    [hughd@google.com: page_vma_mapped_walk(): add a level of indentation fix]
    Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com

    Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: adjust the test for crossing page table
    boundary - I believe pvmw->address is always page-aligned, but nothing
    else here assumed that; and remember to reset pvmw->pte to NULL after
    unmapping the page table, though I never saw any bug from that.

    Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
    follow the same "return not_found, return not_found, return true"
    pattern as the block above it (note: returning not_found there is never
    premature, since existence or prior existence of huge pmd guarantees
    good alignment).

    Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
    use it in subsequent tests, instead of repeatedly dereferencing pointer.

    Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
    the way at the start, so no need to worry about it later.

    Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: "Kirill A. Shutemov"
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".

    I've marked all of these for stable: many are merely cleanups, but I
    think they are much better before the main fix than after.

    This patch (of 11):

    page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page
    was used, sometimes pvmw->page itself: use the local copy "page"
    throughout.

    Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
    Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
    Signed-off-by: Hugh Dickins
    Reviewed-by: Alistair Popple
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Yang Shi
    Cc: Wang Yugui
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Zi Yan
    Cc: Will Deacon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Jun, 2021

2 commits

  • Running certain tests with a DEBUG_VM kernel would crash within hours,
    on the total_mapcount BUG() in split_huge_page_to_list(), while trying
    to free up some memory by punching a hole in a shmem huge page: split's
    try_to_unmap() was unable to find all the mappings of the page (which,
    on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

    When that BUG() was changed to a WARN(), it would later crash on the
    VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
    mm/internal.h:vma_address(), used by rmap_walk_file() for
    try_to_unmap().

    vma_address() is usually correct, but there's a wraparound case when the
    vm_start address is unusually low, but vm_pgoff not so low:
    vma_address() chooses max(start, vma->vm_start), but that decides on the
    wrong address, because start has become almost ULONG_MAX.

    Rewrite vma_address() to be more careful about vm_pgoff; move the
    VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
    be safely used from page_mapped_in_vma() and page_address_in_vma() too.

    Add vma_address_end() to apply similar care to end address calculation,
    in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
    though it raises a question of whether callers would do better to supply
    pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

    An irritation is that their apparent generality breaks down on KSM
    pages, which cannot be located by the page->index that page_to_pgoff()
    uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte()
    for ksm pages") once discovered. I dithered over the best thing to do
    about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
    vma_address() and vma_address_end(); though the only place in danger of
    using it on them was try_to_unmap_one().

    Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
    head page, instead of thp_size(): to make the right calculation on a
    hugetlbfs page, whether or not THPs are configured. try_to_unmap() is
    used on hugetlbfs pages, but perhaps the wrong calculation never
    mattered.
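
    A sketch of the reworked calculation (illustrative only; the real helper
    also handles the case of a compound head page whose tail pages overlap
    the vma, and the KSM VM_BUG_ON mentioned above):

        static inline unsigned long
        vma_address(struct page *page, struct vm_area_struct *vma)
        {
                pgoff_t pgoff = page_to_pgoff(page);
                unsigned long address = -EFAULT;

                if (pgoff >= vma->vm_pgoff) {
                        address = vma->vm_start +
                                ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
                        /* beyond the vma, or wrapped through 0? */
                        if (address < vma->vm_start ||
                            address >= vma->vm_end)
                                address = -EFAULT;
                }
                return address;
        }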

    Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
    Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Jue Wang
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
    (!unmap_success): with dump_page() showing mapcount:1, but then its raw
    struct page output showing _mapcount ffffffff i.e. mapcount 0.

    And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
    it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
    and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
    all indicative of some mapcount difficulty in development here perhaps.
    But the !CONFIG_DEBUG_VM path handles the failures correctly and
    silently.

    I believe the problem is that once a racing unmap has cleared pte or
    pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
    from try_to_unmap() before the racing task has reached decrementing
    mapcount.

    Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
    follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
    TTU_SYNC to the options, and passing that from unmap_page().

    When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
    for both: the slight overhead added should rarely matter, except perhaps
    if splitting sparsely-populated multiply-mapped shmem. Once confident
    that bugs are fixed, TTU_SYNC here can be removed, and the race
    tolerated.
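
    The shape of the change is roughly this (a sketch, not the exact diff):
    unmap_page() adds TTU_SYNC to its flags, and try_to_unmap_one() maps it
    onto PVMW_SYNC:

        /* in unmap_page(); the full flag set here is illustrative */
        enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
                                   TTU_SYNC | TTU_IGNORE_MLOCK;

        /* in try_to_unmap_one() */
        if (flags & TTU_SYNC)
                pvmw.flags = PVMW_SYNC;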

    Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
    Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
    Signed-off-by: Hugh Dickins
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Jue Wang
    Cc: Kirill A. Shutemov
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Cc: Yang Shi
    Cc: Zi Yan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

07 May, 2021

1 commit

  • succed -> succeed in mm/hugetlb.c
    wil -> will in mm/mempolicy.c
    wit -> with in mm/page_alloc.c
    Retruns -> Returns in mm/page_vma_mapped.c
    confict -> conflict in mm/secretmem.c
    No functionality changed.

    Link: https://lkml.kernel.org/r/20210408140027.60623-1-lujialin4@huawei.com
    Signed-off-by: Lu Jialin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lu Jialin
     

16 Dec, 2020

1 commit

  • check_pte() needs a correct colon for kernel-doc markup; otherwise, with
    W=1, gcc emits the following warning:

        mm/page_vma_mapped.c:86: warning: Function parameter or member 'pvmw' not described in 'check_pte'
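
    For reference, kernel-doc expects each parameter to be introduced as
    "@name: description"; the fixed comment looks something like this
    (wording illustrative):

        /**
         * check_pte - check if the page is mapped at the given pte
         * @pvmw: page_vma_mapped_walk struct, holding the pte and page to check
         *
         * ...
         */
        static bool check_pte(struct page_vma_mapped_walk *pvmw)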

    Link: https://lkml.kernel.org/r/1605597167-25145-1-git-send-email-alex.shi@linux.alibaba.com
    Signed-off-by: Alex Shi
    Cc: Randy Dunlap
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Shi
     

15 Aug, 2020

2 commits

  • The thp prefix is more frequently used than hpage and we should be
    consistent between the various functions.

    [akpm@linux-foundation.org: fix mm/migrate.c]

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: Mike Kravetz
    Cc: David Hildenbrand
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • This function returns the number of bytes in a THP. It is like
    page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
    is disabled.
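
    A sketch of the helper as described, assuming a thp_order() that is 0
    when THP is compiled out, so the expression folds to PAGE_SIZE:

        static inline unsigned long thp_size(struct page *page)
        {
                return PAGE_SIZE << thp_order(page);
        }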

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: David Hildenbrand
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

01 Feb, 2020

1 commit

  • In check_pte(), the pfn of a normal, hugetlbfs or THP page needs to be
    compared. The current implementation applies the comparison as

    - normal 4K page: page_pfn <= pfn < page_pfn + 1
    - hugetlbfs page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR
    - THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR

    in pfn_in_hpage(). For a hugetlbfs page, it should be pfn == page_pfn.

    Now, rename pfn_in_hpage() to pfn_is_match() to highlight that the
    comparison is not only for THP, and compare these cases explicitly.

    There is no impact on current behaviour; this just makes the code
    clearer. Comparing a hugetlbfs page against the range
    page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusing.
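
    The explicit form is roughly (a sketch of the renamed helper):

        static inline bool pfn_is_match(struct page *page, unsigned long pfn)
        {
                unsigned long page_pfn = page_to_pfn(page);

                /* normal page and hugetlbfs page */
                if (!PageTransCompound(page) || PageHuge(page))
                        return page_pfn == pfn;

                /* THP can be referenced by any subpage */
                return pfn >= page_pfn && pfn - page_pfn < HPAGE_PMD_NR;
        }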

    Link: http://lkml.kernel.org/r/1578737885-8890-1-git-send-email-lixinhai.lxh@gmail.com
    Signed-off-by: Li Xinhai
    Acked-by: Kirill A. Shutemov
    Acked-by: Mike Kravetz
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Xinhai
     

25 Sep, 2019

1 commit

  • Patch series "Make working with compound pages easier", v2.

    These three patches add three helpers and convert the appropriate
    places to use them.

    This patch (of 3):

    It's unnecessarily hard to find out the size of a potentially huge page.
    Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).
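
    The helper is essentially a named wrapper around that expression:

        static inline unsigned long page_size(struct page *page)
        {
                return PAGE_SIZE << compound_order(page);
        }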

    Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

31 Oct, 2018

1 commit

  • Private ZONE_DEVICE pages use a special pte entry and thus are not
    present. Properly handle this case in map_pte(); it is already handled
    in check_pte(), and the map_pte() part was most probably lost in some
    rebase.

    Without this patch the slow migration path cannot migrate private
    ZONE_DEVICE memory back to regular memory. This was found after stress
    testing migration back to system memory, and it can ultimately lead to
    the CPU page faulting in a loop on the special swap entry.
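
    The non-migration path of map_pte() ends up looking roughly like this
    (sketch of the handling described above):

        if (is_swap_pte(*pvmw->pte)) {
                swp_entry_t entry;

                /* Handle un-addressable ZONE_DEVICE memory */
                entry = pte_to_swp_entry(*pvmw->pte);
                if (!is_device_private_entry(entry))
                        return false;
        } else if (!pte_present(*pvmw->pte)) {
                return false;
        }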

    Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.com
    Signed-off-by: Ralph Campbell
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Balbir Singh
    Cc: Andrew Morton
    Cc: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralph Campbell
     

23 Jan, 2018

1 commit

  • The new helper checks whether a pfn belongs to the page. For huge pages
    it checks whether the PFN is within the range covered by the huge page.

    The helper is used in check_pte(). The original code that the helper
    replaces had two calls to page_to_pfn(), and page_to_pfn() is relatively
    costly.

    Although current GCC is able to optimize the code down to one call, it's
    better to do this explicitly.

    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

22 Jan, 2018

1 commit

  • Tetsuo reported random crashes under memory pressure on a 32-bit x86
    system and tracked them down to the change that introduced
    page_vma_mapped_walk().

    The root cause of the issue is the faulty pointer math in check_pte().
    As ->pte may point to an arbitrary page, we have to check that it
    belongs to the section before doing the math; otherwise it may lead to
    weird results.

    It wasn't noticed until now because mem_map[] is virtually contiguous
    on flatmem or vmemmap sparsemem, so pointer arithmetic just works
    against all 'struct page' pointers. But with classic sparsemem it
    doesn't, because each section's memmap is allocated separately, and so
    consecutive pfns crossing two sections might have struct pages at
    completely unrelated addresses.

    Let's restructure the code a bit and replace the pointer arithmetic
    with operations on pfns.
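
    Illustratively, check_pte() then derives a pfn from the pte and compares
    pfns rather than struct page pointers (a sketch; the real code
    distinguishes migration and device-private swap entries explicitly):

        unsigned long page_pfn = page_to_pfn(pvmw->page);
        unsigned long pfn;

        if (pte_present(*pvmw->pte))
                pfn = pte_pfn(*pvmw->pte);
        else
                /* migration/device entries encode the pfn in the offset */
                pfn = swp_offset(pte_to_swp_entry(*pvmw->pte));

        /* compare pfns; pointer math across mem_map sections is invalid */
        return pfn >= page_pfn &&
               pfn - page_pfn < hpage_nr_pages(pvmw->page);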

    Signed-off-by: Kirill A. Shutemov
    Reported-and-tested-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Oct, 2017

2 commits

  • Loading the pmd without holding the pmd_lock exposes us to races with
    concurrent updaters of the page tables but, worse still, it also allows
    the compiler to cache the pmd value in a register and reuse it later on,
    even if we've performed a READ_ONCE in between and seen a more recent
    value.

    In the case of page_vma_mapped_walk, this leads to the following crash
    when the pmd loaded for the initial pmd_trans_huge check is all zeroes
    and a subsequent valid table entry is loaded by check_pmd. We then
    proceed into map_pte, but the compiler re-uses the zero entry inside
    pte_offset_map, resulting in a junk pointer being installed in
    pvmw->pte:

    PC is at check_pte+0x20/0x170
    LR is at page_vma_mapped_walk+0x2e0/0x540
    [...]
    Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
    Call trace:
    check_pte+0x20/0x170
    page_vma_mapped_walk+0x2e0/0x540
    page_mkclean_one+0xac/0x278
    rmap_walk_file+0xf0/0x238
    rmap_walk+0x64/0xa0
    page_mkclean+0x90/0xa8
    clear_page_dirty_for_io+0x84/0x2a8
    mpage_submit_page+0x34/0x98
    mpage_process_page_bufs+0x164/0x170
    mpage_prepare_extent_to_map+0x134/0x2b8
    ext4_writepages+0x484/0xe30
    do_writepages+0x44/0xe8
    __filemap_fdatawrite_range+0xbc/0x110
    file_write_and_wait_range+0x48/0xd8
    ext4_sync_file+0x80/0x4b8
    vfs_fsync_range+0x64/0xc0
    SyS_msync+0x194/0x1e8

    This patch fixes the problem by ensuring that READ_ONCE is used before
    the initial checks on the pmd, and this value is subsequently used when
    checking whether or not the pmd is present. pmd_check is removed and
    the pmd_present check is inlined directly.
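
    The fixed pattern is roughly (a sketch, not the exact diff):

        pvmw->pmd = pmd_offset(pud, pvmw->address);
        /*
         * Read the pmd once, so the compiler cannot reload a stale
         * value after the page table has been concurrently updated.
         */
        pmde = READ_ONCE(*pvmw->pmd);
        if (pmd_trans_huge(pmde)) {
                pvmw->ptl = pmd_lock(mm, pvmw->pmd);
                /* re-check *pvmw->pmd under the lock before using it */
        } else if (!pmd_present(pmde)) {
                return false;
        }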

    Link: http://lkml.kernel.org/r/1507222630-5839-1-git-send-email-will.deacon@arm.com
    Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
    Signed-off-by: Will Deacon
    Tested-by: Yury Norov
    Tested-by: Richard Ruigrok
    Acked-by: Kirill A. Shutemov
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • A non present pmd entry can appear after pmd_lock is taken in
    page_vma_mapped_walk(), even if THP migration is not enabled. The
    WARN_ONCE is unnecessary.

    Link: http://lkml.kernel.org/r/20171003142606.12324-1-zi.yan@sent.com
    Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
    Signed-off-by: Zi Yan
    Reported-by: Abdul Haleem
    Tested-by: Abdul Haleem
    Acked-by: Kirill A. Shutemov
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
     

09 Sep, 2017

2 commits

  • Allow unmapping and restoring the special swap entry used for
    un-addressable ZONE_DEVICE memory.

    Link: http://lkml.kernel.org/r/20170817000548.32038-17-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Kirill A. Shutemov
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • Add thp migration's core code, including conversions between a PMD entry
    and a swap entry, setting PMD migration entry, removing PMD migration
    entry, and waiting on PMD migration entries.

    This patch makes it possible to support thp migration. If you fail to
    allocate a destination page as a thp, you just split the source thp as
    we do now, and then enter the normal page migration. If you succeed in
    allocating a destination thp, you enter thp migration. Subsequent
    patches actually enable thp migration for each caller of page migration
    by allowing its get_new_page() callback to allocate thps.
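
    A rough sketch of the helpers involved (illustrative; the real code is
    guarded by CONFIG_ARCH_ENABLE_THP_MIGRATION):

        /* encode: turn a mapped huge page into a PMD migration entry */
        swp_entry_t entry = make_migration_entry(page, pmd_write(pmdval));
        pmd_t pmdswp = swp_entry_to_pmd(entry);

        /* decode: recover the swap entry from a migration pmd */
        entry = pmd_to_swp_entry(*pmd);

        /* a CPU faulting on a migration pmd just waits for it to finish */
        if (is_pmd_migration_entry(*pmd))
                pmd_migration_entry_wait(mm, pmd);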

    [zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
    Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
    [akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
    Signed-off-by: Zi Yan
    Acked-by: Kirill A. Shutemov
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
     

07 Jul, 2017

1 commit

  • A poisoned or migrated hugepage is stored as a swap entry in the page
    tables. On architectures that support hugepages consisting of
    contiguous page table entries (such as on arm64) this leads to ambiguity
    in determining the page table entry to return in huge_pte_offset() when
    a poisoned entry is encountered.

    Let's remove the ambiguity by adding a size parameter to convey
    additional information about the requested address. Also fixup the
    definition/usage of huge_pte_offset() throughout the tree.
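
    The resulting prototype gains the size argument; callers typically pass
    huge_page_size() of the relevant hstate:

        pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
                               unsigned long sz);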

    Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: James Hogan (odd fixer:METAG ARCHITECTURE)
    Cc: Ralf Baechle (supporter:MIPS)
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: "Kirill A. Shutemov"
    Cc: Hillf Danton
    Cc: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Punit Agrawal
     

08 Apr, 2017

1 commit

  • Doug Smythies reports oops with KSM in this backtrace, I've been seeing
    the same:

    page_vma_mapped_walk+0xe6/0x5b0
    page_referenced_one+0x91/0x1a0
    rmap_walk_ksm+0x100/0x190
    rmap_walk+0x4f/0x60
    page_referenced+0x149/0x170
    shrink_active_list+0x1c2/0x430
    shrink_node_memcg+0x67a/0x7a0
    shrink_node+0xe1/0x320
    kswapd+0x34b/0x720

    Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix
    remove_migration_pte() for ksm pages"), you cannot use page->index
    calculations on ksm pages.

    page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
    can lead it off the end of the page table, and into whatever nonsense is
    in the next page, ending as an oops inside check_pte()'s pte_page().

    KSM tells page_vma_mapped_walk() exactly where to look for the page, it
    does not need any page->index calculation: and that's so also for all
    the normal and file and anon pages - just not for THPs and their
    subpages. Get out early in most cases: instead of a PageKsm test, move
    down the earlier not-THP-page test, as suggested by Kirill.

    I'm also slightly worried that this loop can stray into other vmas, so
    added a vm_end test to prevent surprises; though I have not imagined
    anything worse than a very contrived case, in which a page mlocked in
    the next vma might be reclaimed because it is not mlocked in this vma.

    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils
    Signed-off-by: Hugh Dickins
    Reported-by: Doug Smythies
    Tested-by: Doug Smythies
    Reviewed-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

25 Feb, 2017

2 commits

  • For consistency, it is worth converting all page_check_address() callers
    to page_vma_mapped_walk(), so that we can drop the former.

    Link: http://lkml.kernel.org/r/20170129173858.45174-11-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Introduce a new interface to check if a page is mapped into a vma. It
    aims to address the shortcomings of page_check_address{,_transhuge}.

    The existing interface is not able to handle PTE-mapped THPs: it only
    finds the first PTE, and the rest are left unnoticed.

    page_vma_mapped_walk() iterates over all possible mappings of the page
    in the vma.
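
    Typical caller usage looks like this (a sketch of the pattern):

        struct page_vma_mapped_walk pvmw = {
                .page = page,
                .vma = vma,
                .address = address,
        };

        while (page_vma_mapped_walk(&pvmw)) {
                /*
                 * Here pvmw.pte (or pvmw.pmd for a PMD-mapped THP) points
                 * at one mapping of the page, with the matching page table
                 * lock held for this iteration.
                 */
        }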

    Link: http://lkml.kernel.org/r/20170129173858.45174-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov