13 Aug, 2020

2 commits

  • Patch series "mm: Page fault accounting cleanups", v5.

    This is v5 of the page fault accounting cleanup series. It originates from
    Gerald Schaefer's report a week ago of incorrect page fault accounting for
    retried page faults after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY
    for multiple times"):

    https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

    What this series did:

    - Correct page fault accounting: account a page fault (no matter whether
    it comes from #PF handling, gup, or anything else) only once, with the
    attempt that completes the fault. For example, page fault retries should
    not be counted in the page fault counters. The same applies to the perf
    events.

    - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad-hoc way across different archs.

    Case (1): for many archs it is done at the entry of the page fault
    handler, so it also covers e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this perf event
    to always follow case (1), which is the one that makes the most sense.
    And since we moved the accounting into handle_mm_fault(), the other two
    MAJ/MIN perf events are taken care of naturally.

    - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR). More information in patch 1.

    - Always account the page fault onto the one that triggered the page
    fault. This does not matter much for #PF handlings, but mostly for
    gup. More information on this in patch 25.

    Patchset layout:

    Patch 1: Introduce the accounting in handle_mm_fault(), not yet enabled.
    Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
    Patch 24: Enable the new accounting for the remaining outliers (gup, iommu, etc.)
    Patch 25: Clean up the GUP task_struct pointer since it is no longer needed

    This patch (of 25):

    This is a preparation patch to move page fault accounting into the
    general code in handle_mm_fault(). This includes both the per-task
    flt_maj/flt_min counters and the major/minor page fault perf events. To
    do this, the pt_regs pointer is passed into handle_mm_fault().

    PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
    handlers.

    So far, all the pt_regs pointers passed into handle_mm_fault() are NULL,
    which means this patch should have no intended functional change.
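
    The shape of the centralized helper this series adds in mm/memory.c is
    roughly the following (a simplified sketch, not the exact upstream code):

        /*
         * Count the fault only once, on the attempt that completes it, and
         * only when the arch #PF handler supplied a pt_regs pointer.
         */
        static void mm_account_fault(struct pt_regs *regs, unsigned long address,
                                     vm_fault_t ret, bool major)
        {
                /* Retried or failed attempts are not counted here. */
                if (!regs || (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
                        return;

                if (major) {
                        current->maj_flt++;
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
                } else {
                        current->min_flt++;
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
                }
        }

    PERF_COUNT_SW_PAGE_FAULTS itself stays at the top of each arch handler,
    typically as perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address).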

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Xu
    Signed-off-by: Andrew Morton
    Cc: Albert Ou
    Cc: Alexander Gordeev
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Geert Uytterhoeven
    Cc: Gerald Schaefer
    Cc: Greentime Hu
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: John Hubbard
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: "Luck, Tony"
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
    Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
    Signed-off-by: Linus Torvalds

    Peter Xu
     
  • Drop the repeated word "pages".

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Zi Yan
    Link: http://lkml.kernel.org/r/20200801173822.14973-4-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

11 Jul, 2020

1 commit

  • hmm_range_fault() returns an array of page frame numbers and flags for how
    the pages are mapped in the requested process' page tables. The PFN can be
    used to get the struct page with hmm_pfn_to_page() and the page size order
    can be determined with compound_order(page).

    However, if the page is larger than order 0 (PAGE_SIZE), there is no
    indication that a compound page is mapped by the CPU using a larger page
    size. Without this information, the caller can't safely use a large device
    PTE to map the compound page because the CPU might be using smaller PTEs
    with different read/write permissions.

    Add a new function hmm_pfn_to_map_order() to return the mapping size order
    so that callers know the pages are being mapped with consistent
    permissions and a large device page table mapping can be used if one is
    available.

    This will allow devices to optimize mapping the page into HW by avoiding
    or batching work for huge pages. For instance the dma_map can be done with
    a high order directly.
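
    A hypothetical driver loop using the new helper could look like this
    (sketch only; my_map_huge()/my_map_single() are made-up placeholders and
    alignment handling is omitted):

        unsigned long i = 0;

        while (i < npages) {
                unsigned long hmm_pfn = range->hmm_pfns[i];
                unsigned int order = hmm_pfn_to_map_order(hmm_pfn);
                struct page *page = hmm_pfn_to_page(hmm_pfn);

                if (order)      /* CPU maps this range with a larger page size */
                        my_map_huge(dev, page, order);
                else
                        my_map_single(dev, page);
                i += 1UL << order;
        }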

    Link: https://lore.kernel.org/r/20200701225352.9649-3-rcampbell@nvidia.com
    Signed-off-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Ralph Campbell
     

10 Jun, 2020

1 commit

  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.
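
    The new helpers look roughly like this (sketch based on the description
    above; mmap_sem was later renamed to mmap_lock):

        static inline void mmap_assert_locked(struct mm_struct *mm)
        {
                lockdep_assert_held(&mm->mmap_sem);
                VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
        }

        static inline void mmap_assert_write_locked(struct mm_struct *mm)
        {
                lockdep_assert_held_write(&mm->mmap_sem);
                VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
        }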

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

11 May, 2020

3 commits

  • Presumably the intent here was that hmm_range_fault() could put the data
    into some HW specific format and thus avoid some work. However, nothing
    actually does that, and it isn't clear how anything actually could do that
    as hmm_range_fault() provides CPU addresses which must be DMA mapped.

    Perhaps there is some special HW that does not need DMA mapping, but we
    don't have any examples of this, and the theoretical performance win of
    avoiding an extra scan over the pfns array doesn't seem worth the
    complexity. Plus pfns needs to be scanned anyhow to sort out any
    DEVICE_PRIVATE pages.

    This version replaces the uint64_t with an unsigned long containing a pfn
    and fixed flags. On input, flags is filled with the HMM_PFN_REQ_* values;
    on successful output it is filled with HMM_PFN_* values describing the
    state of the pages.
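
    As a rough illustration of the new calling convention (a sketch only:
    notifier registration, locking and retry are omitted, and NPAGES/addr are
    placeholders):

        unsigned long pfns[NPAGES];
        struct hmm_range range = {
                .hmm_pfns      = pfns,
                .start         = addr,
                .end           = addr + NPAGES * PAGE_SIZE,
                .default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
        };
        int ret;

        ret = hmm_range_fault(&range);
        if (!ret && (pfns[0] & HMM_PFN_VALID)) {
                struct page *page = hmm_pfn_to_page(pfns[0]);
                bool writable = pfns[0] & HMM_PFN_WRITE;

                /* ... map 'page' into the device, honouring 'writable' ... */
        }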

    amdgpu is simple to convert, it doesn't use snapshot and doesn't use
    per-page flags.

    nouveau uses only 16 hmm_pte entries at most (i.e. it fits in a few cache
    lines), and it sweeps over its pfns array a couple of times anyhow. It
    also has a nasty call chain before it reaches the dma map and hardware,
    suggesting performance isn't important:

    nouveau_svm_fault():
      args.i.m.method = NVIF_VMM_V0_PFNMAP
      nouveau_range_fault()
        nvif_object_ioctl()
          client->driver->ioctl()
            struct nvif_driver nvif_driver_nvkm:
              .ioctl = nvkm_client_ioctl
            nvkm_ioctl()
              nvkm_ioctl_path()
                nvkm_ioctl_v0[type].func(..)
                  nvkm_ioctl_mthd()
                    nvkm_object_mthd()
                      struct nvkm_object_func nvkm_uvmm:
                        .mthd = nvkm_uvmm_mthd
                      nvkm_uvmm_mthd()
                        nvkm_uvmm_mthd_pfnmap()
                          nvkm_vmm_pfn_map()
                            nvkm_vmm_ptes_get_map()
                              func == gp100_vmm_pgt_pfn
                              struct nvkm_vmm_desc_func gp100_vmm_desc_spt:
                                .pfn = gp100_vmm_pgt_pfn
                              nvkm_vmm_iter()
                                REF_PTES == func == gp100_vmm_pgt_pfn()
                                  dma_map_page()

    Link: https://lore.kernel.org/r/5-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
    Acked-by: Felix Kuehling
    Tested-by: Ralph Campbell
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    This is just an alias for HMM_PFN_ERROR; nothing cares that the error was
    because of a special page vs any other error case.

    Link: https://lore.kernel.org/r/4-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
    Acked-by: Felix Kuehling
    Reviewed-by: Christoph Hellwig
    Reviewed-by: John Hubbard
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • hmm_vma_walk->last is supposed to be updated after every write to the
    pfns, so that it can be returned by hmm_range_fault(). However, this is
    not done consistently. Fortunately nothing checks the return code of
    hmm_range_fault() for anything other than error.

    More importantly, last must be set before returning -EBUSY as it is used
    to prevent reading an output pfn as input flags when the loop restarts.

    For clarity and simplicity make hmm_range_fault() return 0 or -ERRNO. Only
    set last when returning -EBUSY.
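
    A caller then only needs to look at the error code, e.g. (sketch):

        ret = hmm_range_fault(range);
        if (ret == -EBUSY)
                goto again;     /* raced with an invalidation, retry */
        if (ret)
                return ret;
        /* success: every requested entry in range->pfns[] is valid */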

    Link: https://lore.kernel.org/r/2-v2-b4e84f444c7d+24f57-hmm_no_flags_jgg@mellanox.com
    Acked-by: Felix Kuehling
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

31 Mar, 2020

4 commits

    The pagewalker does not call most ops with a NULL vma; those are all
    routed to hmm_vma_walk_hole() via ops->pte_hole instead.

    Thus hmm_vma_fault() is only called with a NULL vma from
    hmm_vma_walk_hole(), so hoist the NULL vma check to there.

    Now it is clear that snapshotting with no vma is a HMM_PFN_ERROR as
    without a vma we have no path to call hmm_vma_fault().

    Link: https://lore.kernel.org/r/20200327200021.29372-10-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Most places that return an error code, like -EFAULT, do not set
    HMM_PFN_ERROR, only two places do this.

    Resolve this inconsistency by never setting the pfns on an error
    exit. This doesn't seem like a worthwhile thing to do anyhow.

    If for some reason it becomes important, it makes more sense to directly
    return the address of the failing page rather than have the caller scan
    for the HMM_PFN_ERROR.

    No caller inspects the pfns output array if hmm_range_fault() fails.

    Link: https://lore.kernel.org/r/20200327200021.29372-9-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    In hmm_vma_handle_pte() and hmm_vma_walk_hugetlb_entry(), if a fault
    happens then -EBUSY will be returned and the pfns input flags will have
    been destroyed.

    For hmm_vma_handle_pte() set HMM_PFN_NONE only on the success returns that
    don't otherwise store to pfns.

    For hmm_vma_walk_hugetlb_entry() all exit paths already set pfns, so
    remove the redundant store.

    Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
    Link: https://lore.kernel.org/r/20200327200021.29372-8-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • swp_offset() should not be called directly, the wrappers are supposed to
    abstract away the encoding of the device_private specific information in
    the swap entry.
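
    In other words, the device-private information should come out of the
    swapops.h wrappers rather than open-coded swp_offset() arithmetic,
    roughly:

        swp_entry_t entry = pte_to_swp_entry(pte);

        if (is_device_private_entry(entry)) {
                /* Use the wrapper instead of decoding swp_offset() by hand. */
                struct page *page = device_private_entry_to_page(entry);

                /* ... page->pgmap etc. are now available ... */
        }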

    Link: https://lore.kernel.org/r/20200327200021.29372-7-jgg@ziepe.ca
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Tested-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

28 Mar, 2020

4 commits

  • Now that flags are handled on a fine-grained per-page basis this global
    flag is redundant and has a confusing overlap with the pfn_flags_mask and
    default_flags.

    Normalize the HMM_FAULT_SNAPSHOT behavior into one place. Callers needing
    the SNAPSHOT behavior should set a pfn_flags_mask and default_flags that
    always results in a cleared HMM_PFN_VALID. Then no pages will be faulted,
    and HMM_FAULT_SNAPSHOT is not a special flow that overrides the masking
    mechanism.

    As this is the last flag, also remove the flags argument. If future flags
    are needed they can be part of the struct hmm_range function arguments.
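
    A snapshot-only caller now expresses that intent purely through the
    masking mechanism, roughly:

        /* Snapshot: never request HMM_PFN_VALID, so nothing is faulted. */
        range->default_flags = 0;
        range->pfn_flags_mask = 0;
        ret = hmm_range_fault(range);   /* note: no flags argument any more */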

    Link: https://lore.kernel.org/r/20200327200021.29372-5-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • Delete several functions that are never called, fix some desync between
    comments and structure content, toss the now out of date top of file
    header, and move one function only used by hmm.c into hmm.c

    Link: https://lore.kernel.org/r/20200327200021.29372-4-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    Using two bools instead of a flags return is unnecessary and leads to
    bugs. Returning a value is easier for the compiler to check and easier to
    pass around the code flow.

    Convert the two bools into flags and push the change to all callers.
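
    The resulting pattern is roughly the following (names approximate, shown
    for illustration):

        enum {
                HMM_NEED_FAULT       = 1 << 0,
                HMM_NEED_WRITE_FAULT = 1 << 1,
        };

        unsigned int required_fault;

        required_fault = hmm_pte_need_fault(hmm_vma_walk, pfns, cpu_flags);
        if (required_fault)
                return hmm_vma_fault(addr, end, required_fault, walk);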

    Link: https://lore.kernel.org/r/20200327200021.29372-3-jgg@ziepe.ca
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • The checking boils down to some racy check if the pagemap is still
    available or not. Instead of checking this, rely entirely on the
    notifiers, if a pagemap is destroyed then all pages that belong to it must
    be removed from the tables and the notifiers triggered.

    Link: https://lore.kernel.org/r/20200327200021.29372-2-jgg@ziepe.ca
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Tested-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

27 Mar, 2020

14 commits

  • hmm_range_fault() will succeed for any kind of device private memory, even
    if it doesn't belong to the calling entity. While nouveau has some crude
    checks for that, they are broken because they assume nouveau is the only
    user of device private memory. Fix this by passing in an expected pgmap
    owner in the hmm_range_fault structure.

    If a device_private page is found and doesn't match the owner, then it is
    treated as a non-present and non-faultable page.

    This prevents a bug in amdgpu, where it doesn't know how to handle
    device_private pages, but hmm_range_fault would return them anyhow.
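
    The matching is done with an opaque owner pointer supplied on both sides;
    a hedged sketch (my_drm_dev is a placeholder):

        /* The driver that created the device-private pagemap: */
        pgmap->owner = my_drm_dev;              /* before devm_memremap_pages() */

        /* The driver calling hmm_range_fault(): */
        range.dev_private_owner = my_drm_dev;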

    Fixes: 4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
    Link: https://lore.kernel.org/r/20200316193216.920734-5-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Ralph Campbell
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Remove the HMM_PFN_DEVICE_PRIVATE flag, no driver has ever set this flag
    on input, and the only place that uses it on output can be trivially
    changed to use is_device_private_page().

    This removes the ability to request that device_private pages are faulted
    back into system memory.

    Link: https://lore.kernel.org/r/20200316193216.920734-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
    There is no good reason for this split, as it just obfuscates the flow.

    Link: https://lore.kernel.org/r/20200316135310.899364-6-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Setting a pfns entry to NONE before returning -EBUSY is a bug that will
    cause corruption of the input flags on the next loop.

    There is just a single caller using hmm_vma_walk_hole_() for the non-fault
    case. Use hmm_pfns_fill() to fill the whole pfn array with zeroes in the
    only caller for the non-fault case and remove the non-fault path from
    hmm_vma_walk_hole_(). This avoids setting NONE before returning -EBUSY.

    Also rename the function to hmm_vma_fault() to better describe what it
    does.

    Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
    Link: https://lore.kernel.org/r/20200316135310.899364-5-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Remove the rather confusing goto label and just handle the fault case
    directly in the branch checking for it.

    Link: https://lore.kernel.org/r/20200316135310.899364-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The HMM_FAULT_ALLOW_RETRY isn't used anywhere in the tree. Remove it and
    the weird -EAGAIN handling where handle_mm_fault() drops the mmap_sem.

    Link: https://lore.kernel.org/r/20200316135310.899364-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • pmd_to_hmm_pfn_flags() already checks it and makes the cpu flags 0. If no
    fault is requested then the pfns should be returned with the not valid
    flags.

    It should not unconditionally fault if faulting is not requested.

    Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    Currently, if a special PTE is encountered, hmm_range_fault() immediately
    returns -EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
    uses).

    -EFAULT should only be returned after testing with hmm_pte_need_fault().

    Also pte_devmap() and pte_special() are exclusive, and there is no need to
    check IS_ENABLED, pte_special() is stubbed out to return false on
    unsupported architectures.

    Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • hmm_range_fault() should never return 0 if the caller requested a valid
    page, but the pfns output for that page would be HMM_PFN_ERROR.

    hmm_pte_need_fault() must always be called before setting HMM_PFN_ERROR to
    detect if the page is in faulting mode or not.

    Fix two cases in hmm_vma_walk_pmd() and reorganize some of the duplicated
    code.

    Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
    Fixes: da4c3c735ea4 ("mm/hmm/mirror: helper to snapshot CPU page table")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • The intention with this code is to determine if the caller required the
    pages to be valid, and if so, then take some action to make them valid.
    The action varies depending on the page type.

    In all cases, if the caller doesn't ask for the page, then
    hmm_range_fault() should not return an error.

    Revise the implementation to be clearer, and fix some bugs:

    - hmm_pte_need_fault() must always be called before testing fault or
    write_fault, otherwise the defaults of false apply and the if()'s don't
    work. This was missed on the is_migration_entry() branch.

    - -EFAULT should not be returned unless hmm_pte_need_fault() indicates a
    fault is required, i.e. snapshotting should not fail.

    - For !pte_present() the cpu_flags are always 0, except in the special
    case of is_device_private_entry(), so calling pte_to_hmm_pfn_flags() is
    confusing.

    Reorganize the flow so that it always follows the pattern of calling
    hmm_pte_need_fault() and then checking fault || write_fault.

    Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    All return paths that return -EFAULT must call hmm_range_need_fault() to
    determine if the user requires this page to be valid.

    If the page can never be made valid, even if the user later requires it
    (due to the vma flags in this case), then the return should be
    HMM_PFN_ERROR.

    Fixes: a3e0d41c2b1f ("mm/hmm: improve driver API to work and wait over a range")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • All success exit paths from the walker functions must set the pfns array.

    A migration entry with no required fault is a HMM_PFN_NONE return, just
    like the pte case.

    Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • This eventually calls into handle_mm_fault() which is a sleeping function.
    Release the lock first.

    hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
    need the lock.

    Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
    Cc: Steven Price
    Reviewed-by: Ralph Campbell
    Reviewed-by: Steven Price
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
    Many of the direct error returns skipped doing the pte_unmap(). All
    non-zero exit paths must unmap the pte.

    The pte_unmap() is split unnaturally like this because some of the error
    exit paths trigger a sleep and must release the lock before sleeping.

    Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
    Fixes: 53f5c3f489ec ("mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()")
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

04 Feb, 2020

2 commits

    The pte_hole() callback is called at multiple levels of the page tables.
    Code dumping the kernel page tables needs to know at what depth the
    missing entry is. Add this as an extra parameter to pte_hole(). When the
    depth isn't known (e.g. when processing a vma), -1 is passed.

    The depth that is reported is the actual level where the entry is missing
    (ignoring any folding that is in place), i.e. any levels where
    PTRS_PER_P?D is set to 1 are ignored.

    Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
    natural numbers as levels 2/3/4.
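
    After this change the callback signature carries the depth, using the
    numbering described above:

        /* depth: 0 == PGD, 1 == P4D, 2 == PUD, 3 == PMD, 4 == PTE,
         * or -1 when the level is unknown (e.g. when walking by vma). */
        int (*pte_hole)(unsigned long addr, unsigned long next,
                        int depth, struct mm_walk *walk);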

    Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
    Signed-off-by: Steven Price
    Tested-by: Zong Li
    Cc: Albert Ou
    Cc: Alexandre Ghiti
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: James Morse
    Cc: Jerome Glisse
    Cc: "Liang, Kan"
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vineet Gupta
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Price
     
  • pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
    ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were no
    users. We're about to add users so reintroduce them, along with
    p4d_entry() as we now have 5 levels of tables.

    Note that commit a00cc7d9dd93d66a ("mm, x86: add support for PUD-sized
    transparent hugepages") already re-added pud_entry() but with different
    semantics to the other callbacks. This commit reverts the semantics back
    to match the other callbacks.

    To support hmm.c which now uses the new semantics of pud_entry() a new
    member ('action') of struct mm_walk is added which allows the callbacks to
    either descend (ACTION_SUBTREE, the default), skip (ACTION_CONTINUE) or
    repeat the callback (ACTION_AGAIN). hmm.c is then updated to call
    pud_trans_huge_lock() itself and make use of the splitting/retry logic of
    the core code.

    After this change pud_entry() is called for all entries, not just
    transparent huge pages.
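
    A pud_entry() callback using the new 'action' mechanism might look like
    this (illustrative sketch; my_pud_entry is hypothetical):

        static int my_pud_entry(pud_t *pudp, unsigned long start,
                                unsigned long end, struct mm_walk *walk)
        {
                spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);

                if (!ptl) {
                        /* Not a huge PUD: let the core walk the PMDs below. */
                        walk->action = ACTION_SUBTREE;
                        return 0;
                }
                /* Handle the huge PUD here, then skip the lower levels. */
                spin_unlock(ptl);
                walk->action = ACTION_CONTINUE;
                return 0;
        }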

    [arnd@arndb.de: fix unused variable warning]
    Link: http://lkml.kernel.org/r/20200107204607.1533842-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20191218162402.45610-12-steven.price@arm.com
    Signed-off-by: Steven Price
    Signed-off-by: Arnd Bergmann
    Cc: Albert Ou
    Cc: Alexandre Ghiti
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Dave Hansen
    Cc: David S. Miller
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: James Morse
    Cc: Jerome Glisse
    Cc: "Liang, Kan"
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Zong Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Price
     

24 Nov, 2019

4 commits

  • These two functions have never been used since they were added.

    Link: https://lore.kernel.org/r/20191113134528.21187-1-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: John Hubbard
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • hmm_range_fault() calls find_vma() and walk_page_range() in a loop. This
    is unnecessary duplication since walk_page_range() calls find_vma() in a
    loop already.

    Simplify hmm_range_fault() by defining a walk_test() callback function to
    filter unhandled vmas.

    This also fixes a bug where hmm_range_fault() was not checking start >=
    vma->vm_start before checking vma->vm_flags so hmm_range_fault() could
    return an error based on the wrong vma for the requested range.

    It also fixes a bug where an error was returned when the vma has no read
    access and the caller did not request a fault; there shouldn't be any
    error return code in that case.
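
    A simplified sketch of such a test_walk()-style filter (the real hmm
    callback does more than this):

        static int hmm_vma_walk_test(unsigned long start, unsigned long end,
                                     struct mm_walk *walk)
        {
                struct vm_area_struct *vma = walk->vma;

                if (!(vma->vm_flags & VM_READ))
                        return 1;       /* non-zero: skip this vma entirely */
                return 0;               /* 0: walk this vma normally */
        }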

    Link: https://lore.kernel.org/r/20191104222141.5173-2-rcampbell@nvidia.com
    Signed-off-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Ralph Campbell
     
  • The only two users of this are now converted to use mmu_interval_notifier,
    delete all the code and update hmm.rst.

    Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca
    Reviewed-by: Jérôme Glisse
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     
  • hmm_mirror's handling of ranges does not use a sequence count which
    results in this bug:

    CPU0                                  CPU1
                                          hmm_range_wait_until_valid(range)
                                            valid == true
                                          hmm_range_fault(range)
    hmm_invalidate_range_start()
      range->valid = false
    hmm_invalidate_range_end()
      range->valid = true
                                          hmm_range_valid(range)
                                            valid == true

    Where the hmm_range_valid() should not have succeeded.

    Adding the required sequence count would make it nearly identical to the
    new mmu_interval_notifier. Instead replace the hmm_mirror stuff with
    mmu_interval_notifier.

    Co-existence of the two APIs is the first step.

    Link: https://lore.kernel.org/r/20191112202231.3856-4-jgg@ziepe.ca
    Reviewed-by: Jérôme Glisse
    Tested-by: Philip Yang
    Tested-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

30 Oct, 2019

1 commit

  • If a device driver like nouveau tries to use hmm_range_fault() to access
    the special shared zero page in system memory, hmm_range_fault() will
    return -EFAULT and kill the process.

    Allow hmm_range_fault() to return success (0) when the CPU pagetable entry
    points to the special shared zero page.

    page_to_pfn() and pfn_to_page() are defined on the zero page so just
    handle it like any other page.
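
    The fix boils down to excluding the zero page from the special-PTE error
    path, roughly (a sketch of the shape of the check, details abridged):

        if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
                /* Still an error for any other special PTE. */
                *pfn = range->values[HMM_PFN_SPECIAL];
                return -EFAULT;
        }
        /* The zero page falls through and is handled like any other page. */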

    Link: https://lore.kernel.org/r/20191023195515.13168-3-rcampbell@nvidia.com
    Signed-off-by: Ralph Campbell
    Reviewed-by: "Jérôme Glisse"
    Acked-by: David Hildenbrand
    Signed-off-by: Jason Gunthorpe

    Ralph Campbell
     

07 Sep, 2019

2 commits

  • The mm_walk structure currently mixed data and code. Split out the
    operations vectors into a new mm_walk_ops structure, and while we are
    changing the API also declare the mm_walk structure inside the
    walk_page_range and walk_page_vma functions.

    Based on patch from Linus Torvalds.
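
    After the split, a user supplies a const ops table and the walk state
    stays internal to the core, roughly (my_pmd_entry/my_pte_hole are
    placeholders):

        static const struct mm_walk_ops my_walk_ops = {
                .pmd_entry = my_pmd_entry,
                .pte_hole  = my_pte_hole,
        };

        ret = walk_page_range(mm, start, end, &my_walk_ops, private_data);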

    Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Thomas Hellstrom
    Reviewed-by: Steven Price
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
    Add a new header for the handful of users of the walk_page_range /
    walk_page_vma interface instead of polluting all users of mm.h with it.

    Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Thomas Hellstrom
    Reviewed-by: Steven Price
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

28 Aug, 2019

2 commits

  • Normally, callers to handle_mm_fault() are supposed to check the
    vma->vm_flags first. hmm_range_fault() checks for VM_READ but doesn't
    check for VM_WRITE if the caller requests a page to be faulted in with
    write permission (via the hmm_range.pfns[] value). If the vma is write
    protected, this can result in an infinite loop:

    hmm_range_fault()
      walk_page_range()
        ...
        hmm_vma_walk_hole()
          hmm_vma_walk_hole_()
            hmm_vma_do_fault()
              handle_mm_fault(FAULT_FLAG_WRITE)
                /* returns VM_FAULT_WRITE */
              /* returns -EBUSY */
            /* returns -EBUSY */
          /* returns -EBUSY */
      /* loops on -EBUSY and range->valid */

    Prevent this by checking for vma->vm_flags & VM_WRITE before calling
    handle_mm_fault().
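
    The guard is essentially the following (simplified sketch of the fix
    described above):

        if (write_fault && walk->vma && !(walk->vma->vm_flags & VM_WRITE))
                return -EPERM;  /* never loop faulting a write-protected vma */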

    Link: https://lore.kernel.org/r/20190823221753.2514-3-rcampbell@nvidia.com
    Signed-off-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Jason Gunthorpe

    Ralph Campbell
     
  • Although hmm_range_fault() calls find_vma() to make sure that a vma exists
    before calling walk_page_range(), hmm_vma_walk_hole() can still be called
    with walk->vma == NULL if the start and end address are not contained
    within the vma range.

    hmm_range_fault()                 /* calls find_vma() but no range check */
      walk_page_range()               /* calls find_vma(), sets walk->vma = NULL */
        __walk_page_range()
          walk_pgd_range()
            walk_p4d_range()
              walk_pud_range()
                hmm_vma_walk_hole()
                  hmm_vma_walk_hole_()
                    hmm_vma_do_fault()
                      handle_mm_fault(vma=0)

    Link: https://lore.kernel.org/r/20190823221753.2514-2-rcampbell@nvidia.com
    Signed-off-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jason Gunthorpe

    Ralph Campbell