07 Sep, 2019
2 commits
-
The mm_walk structure currently mixes data and code. Split out the
operations vectors into a new mm_walk_ops structure, and while we are
changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Based on patch from Linus Torvalds.
Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Thomas Hellstrom
Reviewed-by: Steven Price
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
Add a new header for the handful of users of the walk_page_range /
walk_page_vma interface instead of polluting all users of mm.h with it.
Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Thomas Hellstrom
Reviewed-by: Steven Price
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
28 Aug, 2019
2 commits
-
Normally, callers to handle_mm_fault() are supposed to check the
vma->vm_flags first. hmm_range_fault() checks for VM_READ but doesn't
check for VM_WRITE if the caller requests a page to be faulted in with
write permission (via the hmm_range.pfns[] value). If the vma is write
protected, this can result in an infinite loop:

  hmm_range_fault()
    walk_page_range()
      ...
      hmm_vma_walk_hole()
        hmm_vma_walk_hole_()
          hmm_vma_do_fault()
            handle_mm_fault(FAULT_FLAG_WRITE)
            /* returns VM_FAULT_WRITE */
          /* returns -EBUSY */
        /* returns -EBUSY */
      /* returns -EBUSY */
    /* loops on -EBUSY and range->valid */

Prevent this by checking for vma->vm_flags & VM_WRITE before calling
handle_mm_fault().
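The shape of the guard is roughly the following (illustrative only; the want_write_fault variable and the error code are placeholders, the actual patch may differ in detail):

  /* Refuse a write fault on a write-protected vma instead of looping: */
  if (want_write_fault && !(vma->vm_flags & VM_WRITE))
          return -EPERM;  /* propagated to the hmm_range_fault() caller */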
Link: https://lore.kernel.org/r/20190823221753.2514-3-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
Although hmm_range_fault() calls find_vma() to make sure that a vma exists
before calling walk_page_range(), hmm_vma_walk_hole() can still be called
with walk->vma == NULL if the start and end address are not contained
within the vma range.

  hmm_range_fault()            /* calls find_vma() but no range check */
    walk_page_range()          /* calls find_vma(), sets walk->vma = NULL */
      __walk_page_range()
        walk_pgd_range()
          walk_p4d_range()
            walk_pud_range()
              hmm_vma_walk_hole()
                hmm_vma_walk_hole_()
                  hmm_vma_do_fault()
                    handle_mm_fault(vma=0)

Link: https://lore.kernel.org/r/20190823221753.2514-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Signed-off-by: Jason Gunthorpe
23 Aug, 2019
1 commit
-
hmm_range_fault() may return NULL pages because some of the pfns are equal
to HMM_PFN_NONE. This happens randomly under memory pressure. The reason
is that, in the swapped-out page pte path, hmm_vma_handle_pte() doesn't
update the fault variable from cpu_flags, so it fails to call
hmm_vma_do_fault() to swap the page in.

The fix is to call hmm_pte_need_fault() to update the fault variable.
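Illustratively, the swap-entry branch of hmm_vma_handle_pte() gains a call of this shape (simplified; variable names taken loosely from the surrounding code):

  if (!pte_present(pte)) {
          /* Recompute whether this entry must be faulted in, rather than
             silently leaving the pfn as HMM_PFN_NONE: */
          hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0,
                             &fault, &write_fault);
          if (fault || write_fault)
                  goto fault;
  }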
Fixes: 74eee180b935 ("mm/hmm/mirror: device page fault handler")
Link: https://lore.kernel.org/r/20190815205227.7949-1-Philip.Yang@amd.com
Signed-off-by: Philip Yang
Reviewed-by: "Jérôme Glisse"
Signed-off-by: Jason Gunthorpe
20 Aug, 2019
1 commit
-
This is a significant simplification: it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.

mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.

It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, which eliminates all the krefs inside
the mmu_notifier callbacks.

The remaining krefs in the range code were overly cautious; drivers are
already not permitted to free the mirror while a range exists.
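For illustration, the per-mm lifetime now follows the mmu_notifier_get()/mmu_notifier_put() pattern instead of a kref hung off mm->hmm (sketch only; the mm variable is a placeholder):

  struct mmu_notifier *mn;

  /* Returns the existing registration for this mm, or allocates one
     through ops->alloc_notifier(): */
  mn = mmu_notifier_get(&hmm_mmu_notifier_ops, mm);
  if (IS_ERR(mn))
          return PTR_ERR(mn);

  /* ... later, instead of kfree()ing a kref'd struct hmm: */
  mmu_notifier_put(mn);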
Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig
Reviewed-by: Ralph Campbell
Tested-by: Ralph Campbell
Signed-off-by: Jason Gunthorpe
08 Aug, 2019
8 commits
-
Stub out the whole function and assign NULL to the .hugetlb_entry method
if CONFIG_HUGETLB_PAGE is not set, as the method won't ever be called in
that case.
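Illustratively, the result has the following shape (simplified):

  #ifdef CONFIG_HUGETLB_PAGE
  static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
                                        unsigned long start, unsigned long end,
                                        struct mm_walk *walk)
  {
          /* ... the real hugetlb handling ... */
  }
  #else
  #define hmm_vma_walk_hugetlb_entry NULL
  #endif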
Link: https://lore.kernel.org/r/20190806160554.14046-13-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
Stub out the whole function when CONFIG_TRANSPARENT_HUGEPAGE is not set to
make the function easier to read.
Link: https://lore.kernel.org/r/20190806160554.14046-12-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
We only need the special pud_entry walker if PUD-sized hugepages and pte
mappings are supported, else the common pagewalk code will take care of
the iteration. Not implementing this callback reduced the amount of code
compiled for non-x86 platforms, and also fixes compile failures with other
architectures when helpers like pud_pfn are not implemented.
Link: https://lore.kernel.org/r/20190806160554.14046-11-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
pte_index is an internal arch helper in various architectures, without
consistent semantics. Open code that calculation of a PMD index based on
the virtual address instead.
Link: https://lore.kernel.org/r/20190806160554.14046-10-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
The pagewalk code already passes the value as the hmask parameter.
Link: https://lore.kernel.org/r/20190806160554.14046-9-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
All users pass PAGE_SIZE here, and if we wanted to support single entries
for huge pages we should really just add a HMM_FAULT_HUGEPAGE flag that
uses the huge page size, instead of having the caller calculate that size
once just for the hmm code to verify it.
Link: https://lore.kernel.org/r/20190806160554.14046-8-hch@lst.de
Signed-off-by: Christoph Hellwig
Acked-by: Felix Kuehling
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
The start, end and page_shift values are all saved in the range structure,
so we might as well use that for argument passing.
Link: https://lore.kernel.org/r/20190806160554.14046-7-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Felix Kuehling
Signed-off-by: Jason Gunthorpe
-
Link: https://lore.kernel.org/r/20190806160554.14046-6-hch@lst.de
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
26 Jul, 2019
8 commits
-
Since hmm_range_fault() doesn't use the struct hmm_range vma field, remove
it.
Link: https://lore.kernel.org/r/20190726005650.2566-8-rcampbell@nvidia.com
Suggested-by: Jason Gunthorpe
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
walk_page_range() will only call hmm_vma_walk_hugetlb_entry() for
hugetlbfs pages and doesn't call hmm_vma_walk_pmd() in this case.

Therefore, it is safe to remove the check for vma->vm_flags & VM_HUGETLB
in hmm_vma_walk_pmd().
Link: https://lore.kernel.org/r/20190726005650.2566-7-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
Add a HMM_FAULT_SNAPSHOT flag so that hmm_range_snapshot can be merged
into the almost identical hmm_range_fault function.
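From the caller's point of view, the merge looks roughly like this (prototype simplified):

  long ret;

  /* Read-only snapshot of the current CPU page table state,
     formerly hmm_range_snapshot(): */
  ret = hmm_range_fault(&range, HMM_FAULT_SNAPSHOT);

  /* Previous hmm_range_fault() behaviour, faulting pages in as needed: */
  ret = hmm_range_fault(&range, 0);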
Link: https://lore.kernel.org/r/20190726005650.2566-5-rcampbell@nvidia.com
Signed-off-by: Christoph Hellwig
Signed-off-by: Ralph Campbell
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
This allows easier expansion to other flags, and also makes the callers a
little easier to read.
Link: https://lore.kernel.org/r/20190726005650.2566-4-rcampbell@nvidia.com
Signed-off-by: Christoph Hellwig
Signed-off-by: Ralph Campbell
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
A few more comments and minor programming style clean ups. There should
be no functional changes.
Link: https://lore.kernel.org/r/20190726005650.2566-3-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
-
The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes a
struct hmm_update which is a simplified version of struct
mmu_notifier_range. This is unnecessary so replace hmm_update with
mmu_notifier_range directly.
Link: https://lore.kernel.org/r/20190726005650.2566-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
[jgg: white space tuning]
Signed-off-by: Jason Gunthorpe
-
The magic dropping of mmap_sem when handle_mm_fault returns VM_FAULT_RETRY
is rather subtle. Add a comment explaining it.
Link: https://lore.kernel.org/r/20190724065258.16603-8-hch@lst.de
Tested-by: Ralph Campbell
Signed-off-by: Jason Gunthorpe
[hch: wrote a changelog]
Signed-off-by: Christoph Hellwig
-
We should not have two different error codes for the same
condition. EAGAIN must be reserved for the FAULT_FLAG_ALLOW_RETRY retry
case and signals to the caller that the mmap_sem has been unlocked.

Use EBUSY for the !valid case so that callers can get the locking right.
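A sketch of the caller-side distinction under the new convention (driver-side names such as mm and the again label are hypothetical, and the second argument to hmm_range_fault() is elided from the point being made):

  ret = hmm_range_fault(&range, 0);
  if (ret == -EBUSY) {
          /* The range became invalid while walking; mmap_sem handling is
             unchanged, so wait for the invalidation to finish and retry. */
          goto again;
  } else if (ret == -EAGAIN) {
          /* FAULT_FLAG_ALLOW_RETRY path: the mmap_sem was dropped by
             handle_mm_fault(), so re-take it before retrying. */
          down_read(&mm->mmap_sem);
          goto again;
  }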
Link: https://lore.kernel.org/r/20190724065258.16603-2-hch@lst.de
Tested-by: Ralph Campbell
Signed-off-by: Christoph Hellwig
Reviewed-by: Ralph Campbell
Reviewed-by: Jason Gunthorpe
Reviewed-by: Felix Kuehling
[jgg: elaborated commit message]
Signed-off-by: Jason Gunthorpe
15 Jul, 2019
1 commit
-
Pull HMM updates from Jason Gunthorpe:
"Improvements and bug fixes for the hmm interface in the kernel:- Improve clarity, locking and APIs related to the 'hmm mirror'
feature merged last cycle. In linux-next we now see AMDGPU and
nouveau to be using this API.- Remove old or transitional hmm APIs. These are hold overs from the
past with no users, or APIs that existed only to manage cross tree
conflicts. There are still a few more of these cleanups that didn't
make the merge window cut off.- Improve some core mm APIs:
- export alloc_pages_vma() for driver use
- refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
- refactor duplicative driver code into the core dev_pagemap
struct- Remove hmm wrappers of improved core mm APIs, instead have drivers
use the simplified API directly- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
mm: remove the HMM config option
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: simplify ZONE_DEVICE page private data
mm: remove hmm_devmem_add
mm: remove hmm_vma_alloc_locked_page
nouveau: use devm_memremap_pages directly
nouveau: use alloc_page_vma directly
PCI/P2PDMA: use the dev_pagemap internal refcount
device-dax: use the dev_pagemap internal refcount
memremap: provide an optional internal refcount in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: remove the data field in struct dev_pagemap
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: move dev_pagemap callbacks into a separate structure
memremap: validate the pagemap type passed to devm_memremap_pages
mm: factor out a devm_request_free_mem_region helper
mm: export alloc_pages_vma
...
03 Jul, 2019
14 commits
-
Christoph Hellwig says:
====================
Below is a series that cleans up the dev_pagemap interface so that it is
more easily usable, which removes the need to wrap it in hmm and thus
allows us to kill a lot of code.

Changes since v3:
- pull in "mm/swap: Fix release_pages() when releasing devmap pages" and
rebase the other patches on top of that
- fold the hmm_devmem_add_resource into the DEVICE_PUBLIC memory removal
patch
- remove _vm_normal_page as it isn't needed without DEVICE_PUBLIC memory
- pick up various ACKs

Changes since v2:
- fix nvdimm kunit build
- add a new memory type for device dax
- fix a few issues in intermediate patches that didn't show up in the end
result
- incorporate feedback from Michal Hocko, including killing of
  the DEVICE_PUBLIC memory type entirely

Changes since v1:
- rebase
- also switch p2pdma to the internal refcount
- add type checking for pgmap->type
- rename the migrate method to migrate_to_ram
- cleanup the altmap_valid flag
- various tidbits from the reviews
====================

Conflicts resolved by:
- Keeping Ira's version of the code in swap.c
- Using the delete for the section in hmm.rst
- Using the delete for the devmap code in hmm.c and .h

* branch 'hmm-devmem-cleanup.4': (24 commits)
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
mm: remove the HMM config option
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: simplify ZONE_DEVICE page private data
mm: remove hmm_devmem_add
mm: remove hmm_vma_alloc_locked_page
nouveau: use devm_memremap_pages directly
nouveau: use alloc_page_vma directly
PCI/P2PDMA: use the dev_pagemap internal refcount
device-dax: use the dev_pagemap internal refcount
memremap: provide an optional internal refcount in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: remove the data field in struct dev_pagemap
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: move dev_pagemap callbacks into a separate structure
memremap: validate the pagemap type passed to devm_memremap_pages
mm: factor out a devm_request_free_mem_region helper
mm: export alloc_pages_vma
...
Signed-off-by: Jason Gunthorpe
-
Required for dependencies in the next patches.
-
All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau
depend on it instead of the mix of a dummy dependency symbol plus the
actually selected one. Drop various odd dependencies, as the code is
pretty portable.
Signed-off-by: Christoph Hellwig
Reviewed-by: Ira Weiny
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
There isn't really much value add in the hmm_devmem_add wrapper any
more, as using devm_memremap_pages directly now is just as simple.
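For illustration, a driver now sets up the pagemap itself rather than going through hmm_devmem_add (a fragment; dev, res, pgmap, addr and mydrv_pagemap_ops are hypothetical driver-side names):

  pgmap->type = MEMORY_DEVICE_PRIVATE;
  pgmap->res = *res;               /* physical range reserved earlier */
  pgmap->ops = &mydrv_pagemap_ops; /* driver's dev_pagemap_ops table */

  addr = devm_memremap_pages(dev, pgmap);
  if (IS_ERR(addr))
          return PTR_ERR(addr);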
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
The only user of it has just been removed, and there wasn't really any need
to wrap a basic memory allocator to start with.
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
Add a flags field to struct dev_pagemap to replace the altmap_valid
boolean and be a little more extensible. Also add a pgmap_altmap() helper
to find the optional altmap, and clean up the code that uses the altmap by
using it.
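Illustratively, the helper is along these lines (simplified):

  static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
  {
          if (pgmap->flags & PGMAP_ALTMAP_VALID)
                  return &pgmap->altmap;
          return NULL;
  }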
Signed-off-by: Christoph Hellwig
Reviewed-by: Ira Weiny
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
struct dev_pagemap is always embedded into a containing structure, so
there is no need for an additional private data field.
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
This replaces the hacky ->fault callback, which is currently directly
called from common code through an hmm specific data structure as an
exercise in layering violations.
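A sketch of the driver side under the new callback (function and variable names are hypothetical):

  static vm_fault_t mydrv_migrate_to_ram(struct vm_fault *vmf)
  {
          /* Copy the DEVICE_PRIVATE page backing vmf->page to system RAM,
             then let the fault be retried against the new page. */
          return 0;       /* or VM_FAULT_SIGBUS on failure */
  }

  static const struct dev_pagemap_ops mydrv_pagemap_ops = {
          .page_free      = mydrv_page_free,      /* hypothetical */
          .migrate_to_ram = mydrv_migrate_to_ram,
  };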
Signed-off-by: Christoph Hellwig
Reviewed-by: Ralph Campbell
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
Just check if there is a ->page_free operation set and take care of the
static key enable, as well as the put using device managed resources.
Also check that a ->page_free is provided for the pgmap types that
require it, and check for a valid type as well while we are at it.

Note that this also fixes the fact that hmm never called
dev_pagemap_put_ops and thus would leave the slow path enabled forever,
even after a device driver unload or disable.
Signed-off-by: Christoph Hellwig
Reviewed-by: Ira Weiny
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
Passing the actual typed structure leads to more understandable code
vs just passing the ref member.
Reported-by: Logan Gunthorpe
Signed-off-by: Christoph Hellwig
Reviewed-by: Logan Gunthorpe
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
The dev_pagemap structure is growing too many callbacks. Move them into a
separate ops structure so that they are not duplicated for multiple
instances, and an attacker can't easily overwrite them.
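The separated table ends up roughly as follows once the rest of this series lands (simplified):

  struct dev_pagemap_ops {
          void (*page_free)(struct page *page);
          void (*kill)(struct dev_pagemap *pgmap);
          void (*cleanup)(struct dev_pagemap *pgmap);
          vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
  };

  struct dev_pagemap {
          /* ... */
          const struct dev_pagemap_ops *ops;
  };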
Signed-off-by: Christoph Hellwig
Reviewed-by: Logan Gunthorpe
Reviewed-by: Jason Gunthorpe
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
Keep the physical address allocation that hmm_devmem_add does with the
rest of the resource code, and allow future reuse of it without the hmm
wrapper.
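Illustrative use of the resulting helper (a fragment; dev, size and pgmap are hypothetical driver-side names):

  struct resource *res;

  /* Find and reserve a free physical range to host DEVICE_PRIVATE memory: */
  res = devm_request_free_mem_region(dev, &iomem_resource, size);
  if (IS_ERR(res))
          return PTR_ERR(res);
  pgmap->res = *res;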
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: John Hubbard
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
->mapping isn't even used by HMM users, and the field at the same offset
in the zone_device part of the union is declared as pad. (Which btw is
rather confusing, as DAX uses ->pgmap and ->mapping from two different
sides of the union, but DAX doesn't use hmm_devmem_free).
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: John Hubbard
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
-
The code hasn't been used since it was added to the tree, and doesn't
appear to actually be usable.
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Acked-by: Michal Hocko
Reviewed-by: Dan Williams
Tested-by: Dan Williams
Signed-off-by: Jason Gunthorpe
02 Jul, 2019
1 commit
-
This code is a trivial wrapper around device model helpers, which
should have been integrated into the driver device model usage from
the start, assuming it actually had users, which it never has had since
the code was added more than 1 1/2 years ago.
Signed-off-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Reviewed-by: John Hubbard
Reviewed-by: Dan Williams
Signed-off-by: Jason Gunthorpe
28 Jun, 2019
1 commit
-
If the trylock on hmm->mirrors_sem fails, the function will return
without decrementing the notifiers that were previously incremented. Since
the caller will not call invalidate_range_end() on EAGAIN this will result
in notifiers becoming permanently incremented and deadlock.

If sync_cpu_device_pagetables() required blocking, the function will
not return EAGAIN even though the device continues to touch the
pages. This is a violation of the mmu notifier contract.

Switch, and rename, the ranges_lock to a spin lock so we can reliably
obtain it without blocking during error unwind.

The error unwind is necessary since the notifiers count must be held
incremented across the call to sync_cpu_device_pagetables() as we cannot
allow the range to become marked valid by a parallel
invalidate_start/end() pair while doing sync_cpu_device_pagetables().
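A sketch of the unwind this enables (field names simplified, locking flavour elided):

  spin_lock(&hmm->ranges_lock);
  hmm->notifiers++;
  spin_unlock(&hmm->ranges_lock);

  if (mmu_notifier_range_blockable(nrange)) {
          down_read(&hmm->mirrors_sem);
  } else if (!down_read_trylock(&hmm->mirrors_sem)) {
          /* Must undo the count before returning -EAGAIN, which the
             spin lock lets us do without blocking: */
          spin_lock(&hmm->ranges_lock);
          hmm->notifiers--;
          spin_unlock(&hmm->ranges_lock);
          return -EAGAIN;
  }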
Signed-off-by: Jason Gunthorpe
Reviewed-by: Ralph Campbell
Reviewed-by: Christoph Hellwig
Tested-by: Philip Yang
25 Jun, 2019
1 commit
-
hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use-after-free race of the form:
       CPU0                                   CPU1
                                           hmm_release()
                                             up_write(&hmm->mirrors_sem);
 hmm_mirror_unregister(mirror)
  down_write(&hmm->mirrors_sem);
  up_write(&hmm->mirrors_sem);
  kfree(mirror)
                                             mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race, drivers are not allowed to
register/unregister mirrors from within a release op.
Signed-off-by: Jason Gunthorpe
Reviewed-by: Christoph Hellwig
Tested-by: Philip Yang