05 Mar, 2020

1 commit

  • commit f4000fdf435b8301a11cf85237c561047f8c4c72 upstream.

    Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
    only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
    This, combined with the fact that get_user_pages_fast() falls back to
    "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
    if you need FOLL_FORCE, you cannot call get_user_pages_fast().

    There does not appear to be any reason for filtering out FOLL_FORCE.
    There is nothing in the _fast() implementation that requires that we
    avoid writing to the pages. So it appears to have been an oversight.

    Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
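
    A minimal sketch of the relaxed check in internal_get_user_pages_fast()
    (illustrative, not the verbatim diff):

        if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
                                       FOLL_FORCE)))
                return -EINVAL;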

    Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com
    Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
    Signed-off-by: John Hubbard
    Reviewed-by: Leon Romanovsky
    Reviewed-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Daniel Vetter
    Cc: Dan Williams
    Cc: Hans Verkuil
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jerome Glisse
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    John Hubbard
     

19 Oct, 2019

1 commit

  • In several routines, the "flags" argument is incorrectly named "write".
    Change it to "flags".

    Also, in one place, the misnaming led to an actual bug:
    "flags & FOLL_WRITE" is required, rather than just "flags".
    (That problem was flagged by krobot, in v1 of this patch.)

    Also, change the flags argument from int, to unsigned int.

    You can see that this was a simple oversight, because the
    calling code passes "flags" to the fifth argument:

    gup_pgd_range():
    ...
            if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
                             PGDIR_SHIFT, next, flags, pages, nr))

    ...which, until this patch, the callees referred to as "write".

    Also, change two lines to avoid checkpatch line length
    complaints, and another line to fix another oversight
    that checkpatch called out: missing "int" on pdshift.
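
    The related one-line bug fix, sketched against gup_hugepte() (illustrative):

        /* the permission check must test the FOLL_WRITE bit, not the whole mask */
        if (!pte_access_permitted(pte, flags & FOLL_WRITE))
                return 0;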

    Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
    Fixes: b798bec4741b ("mm/gup: change write parameter to flags in fast walk")
    Signed-off-by: John Hubbard
    Reported-by: kbuild test robot
    Suggested-by: Kirill A. Shutemov
    Suggested-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Ira Weiny
    Cc: Christoph Hellwig
    Cc: Aneesh Kumar K.V
    Cc: Keith Busch
    Cc: Shuah Khan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     

26 Sep, 2019

1 commit

  • This patch is a part of a series that extends the kernel ABI to allow
    passing tagged user pointers (with the top byte set to something other
    than 0x00) as syscall arguments.

    mm/gup.c provides a kernel interface that accepts user addresses and
    manipulates user pages directly (for example get_user_pages, which is used
    by the futex syscall). Since a user can provide tagged addresses, we
    need to handle this case.

    Add untagging to gup.c functions that use user addresses for vma lookups.
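
    The conversions follow a simple pattern; a hedged sketch of what the
    affected lookups do before touching the vma:

        start = untagged_addr(start);
        vma = find_vma(mm, start);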

    Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Khalid Aziz
    Reviewed-by: Vincenzo Frascino
    Reviewed-by: Kees Cook
    Reviewed-by: Catalin Marinas
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Auger
    Cc: Felix Kuehling
    Cc: Jens Wiklander
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

25 Sep, 2019

3 commits

  • Introduce a new foll_flag: FOLL_SPLIT_PMD. As the name says,
    FOLL_SPLIT_PMD splits the huge pmd for the given mm_struct; the underlying
    huge page stays as-is.

    FOLL_SPLIT_PMD is useful for cases where we need to use regular pages, but
    would like to switch back to the huge page and huge pmd later. One such
    example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe.
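
    A rough sketch of how follow_pmd_mask() handles the new flag (simplified;
    locking and error handling omitted):

        if (flags & FOLL_SPLIT_PMD) {
                spin_unlock(ptl);
                split_huge_pmd(vma, pmd, address);      /* the pmd is split ... */
                return follow_page_pte(vma, address, pmd, flags,
                                       &ctx->pgmap);    /* ... the huge page stays */
        }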

    Link: http://lkml.kernel.org/r/20190815164525.1848545-4-songliubraving@fb.com
    Signed-off-by: Song Liu
    Reviewed-by: Oleg Nesterov
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • From: John Hubbard
    Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

    Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
    v3.

    There are about 50+ patches in my tree [2], and I'll be sending out the
    remaining ones in a few more groups:

    * The block/bio related changes (Jerome mostly wrote those, but I've had
    to move stuff around extensively, and add a little code)

    * mm/ changes

    * other subsystem patches

    * an RFC that shows the current state of the tracking patch set. That
    can only be applied after all call sites are converted, but it's good to
    get an early look at it.

    This is part of a tree-wide conversion, as described in fc1d8e7cca2d ("mm:
    introduce put_user_page*(), placeholder versions").

    This patch (of 3):

    Provide more capable variation of put_user_pages_dirty_lock(), and delete
    put_user_pages_dirty(). This is based on the following:

    1. Lots of call sites become simpler if a bool is passed into
    put_user_page*(), instead of making the call site choose which
    put_user_page*() variant to call.

    2. Christoph Hellwig's observation that set_page_dirty_lock() is
    usually correct, and set_page_dirty() is usually a bug, or at least
    questionable, within a put_user_page*() calling chain.

    This leads to the following API choices:

    * put_user_pages_dirty_lock(page, npages, make_dirty)

    * There is no put_user_pages_dirty(). You have to
    hand code that, in the rare case that it's
    required.
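
    The resulting interface, in sketch form:

        void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
                                       bool make_dirty);

        /* a typical converted call site (names illustrative) */
        put_user_pages_dirty_lock(pages, npages, dirty);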

    [jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
    Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Cc: Matthew Wilcox
    Cc: Jan Kara
    Cc: Christoph Hellwig
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.
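
    The substitution, side by side (sketch; the local variable name is
    illustrative):

        refs = 1 << compound_order(page);       /* before */
        refs = compound_nr(page);               /* after  */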

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

17 Jul, 2019

1 commit

  • ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
    with the long-out-of-date comment can lead to the impression than an
    architecture may just enable it (since __add_pages() now "comprehends
    device memory" for itself) and expect things to work.

    In practice, however, ZONE_DEVICE users have little chance of
    functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
    that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
    dependency so the real situation is clearer.

    Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
    Signed-off-by: Robin Murphy
    Acked-by: Dan Williams
    Reviewed-by: Ira Weiny
    Acked-by: Oliver O'Halloran
    Reviewed-by: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Catalin Marinas
    Cc: David Hildenbrand
    Cc: Jerome Glisse
    Cc: Michal Hocko
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     

15 Jul, 2019

1 commit

  • Pull HMM updates from Jason Gunthorpe:
    "Improvements and bug fixes for the hmm interface in the kernel:

    - Improve clarity, locking and APIs related to the 'hmm mirror'
    feature merged last cycle. In linux-next we now see AMDGPU and
    nouveau to be using this API.

    - Remove old or transitional hmm APIs. These are hold overs from the
    past with no users, or APIs that existed only to manage cross tree
    conflicts. There are still a few more of these cleanups that didn't
    make the merge window cut off.

    - Improve some core mm APIs:
    - export alloc_pages_vma() for driver use
    - refactor into devm_request_free_mem_region() to manage
    DEVICE_PRIVATE resource reservations
    - refactor duplicative driver code into the core dev_pagemap
    struct

    - Remove hmm wrappers of improved core mm APIs, instead have drivers
    use the simplified API directly

    - Remove DEVICE_PUBLIC

    - Simplify the kconfig flow for the hmm users and core code"

    * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits)
    mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
    mm: remove the HMM config option
    mm: sort out the DEVICE_PRIVATE Kconfig mess
    mm: simplify ZONE_DEVICE page private data
    mm: remove hmm_devmem_add
    mm: remove hmm_vma_alloc_locked_page
    nouveau: use devm_memremap_pages directly
    nouveau: use alloc_page_vma directly
    PCI/P2PDMA: use the dev_pagemap internal refcount
    device-dax: use the dev_pagemap internal refcount
    memremap: provide an optional internal refcount in struct dev_pagemap
    memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
    memremap: remove the data field in struct dev_pagemap
    memremap: add a migrate_to_ram method to struct dev_pagemap_ops
    memremap: lift the devmap_enable manipulation into devm_memremap_pages
    memremap: pass a struct dev_pagemap to ->kill and ->cleanup
    memremap: move dev_pagemap callbacks into a separate structure
    memremap: validate the pagemap type passed to devm_memremap_pages
    mm: factor out a devm_request_free_mem_region helper
    mm: export alloc_pages_vma
    ...

    Linus Torvalds
     

13 Jul, 2019

14 commits

  • Several mips builds generate the following build warning.

    mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used

    The function is declared unconditionally but only called from behind
    various ifdefs. Mark it __maybe_unused.
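
    The fix, sketched:

        static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
                                                    struct page **pages)
        {
                /* body unchanged; the annotation only silences the warning on
                 * configurations that never call this helper */
        }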

    Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net
    Signed-off-by: Guenter Roeck
    Reviewed-by: Andrew Morton
    Cc: Stephen Rothwell
    Cc: Robin Murphy
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guenter Roeck
     
  • If we end up without a PGD or PUD entry backing the gate area, don't BUG
    -- just fail gracefully.

    It's not entirely implausible that this could happen some day on x86. It
    doesn't right now even with an execute-only emulated vsyscall page because
    the fixmap shares the PUD, but the core mm code shouldn't rely on that
    particular detail to avoid OOPSing.
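
    A sketch of the idea in get_gate_page() (abbreviated):

        pgd = pgd_offset_gate(mm, address);
        if (pgd_none(*pgd))
                return -EFAULT;         /* was: BUG_ON(pgd_none(*pgd)) */
        p4d = p4d_offset(pgd, address);
        if (p4d_none(*p4d))
                return -EFAULT;         /* was: BUG_ON(p4d_none(*p4d)) */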

    Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Reviewed-by: Kees Cook
    Reviewed-by: Andrew Morton
    Cc: Florian Weimer
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Both hugetlb and thp pages live on the same migration type of pageblock,
    since they are allocated from a free_list[]. Based on this fact, it is
    enough to check a single subpage to decide the migration type of the whole
    huge page. This saves (2M/4K - 1) loop iterations for a pmd huge page on
    x86, and similarly on other archs.

    Furthermore, when executing isolate_huge_page(), it avoids taking the
    global hugetlb_lock many times, and avoids needlessly removing/adding the
    page on the local cma_page_list.
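
    The shape of the loop after the change (hedged sketch; isolation and error
    handling omitted):

        for (i = 0; i < nr_pages; i += step) {
                struct page *head = compound_head(pages[i]);

                /* one migratetype check per huge page, then skip its tails */
                step = (1 << compound_order(head)) - (pages[i] - head);
                if (!is_migrate_cma_page(head))
                        continue;
                /* isolate the whole huge page once */
        }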

    [akpm@linux-foundation.org: make `i' and `step' unsigned]
    Link: http://lkml.kernel.org/r/1561612545-28997-1-git-send-email-kernelfans@gmail.com
    Signed-off-by: Pingfan Liu
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Cc: Mike Rapoport
    Cc: "Kirill A. Shutemov"
    Cc: Thomas Gleixner
    Cc: John Hubbard
    Cc: "Aneesh Kumar K.V"
    Cc: Christoph Hellwig
    Cc: Keith Busch
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pingfan Liu
     
  • All other get_user_page_fast cases mark the page referenced, so do this
    here as well.
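
    The change is effectively a one-liner in gup_hugepte() (sketch):

        SetPageReferenced(head);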

    Link: http://lkml.kernel.org/r/20190625143715.1689-17-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This applies the overflow fixes from 8fde12ca79aff ("mm: prevent
    get_user_pages() from overflowing page refcount") to the powerpc hugepd
    code and brings it back in sync with the other GUP cases.
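
    In sketch form, the hugepd walker now takes its references through the
    overflow-aware helper introduced by 8fde12ca79aff (illustrative):

        head = try_get_compound_head(pte_page(pte), refs);
        if (!head)
                return 0;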

    Link: http://lkml.kernel.org/r/20190625143715.1689-16-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • While only powerpc supports the hugepd case, the code is pretty generic
    and I'd like to keep all GUP internals in one place.

    Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in
    get_user_pages_fast, so reject all other flags.
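
    The added check, roughly (sketch):

        if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
                return -EINVAL;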

    Link: http://lkml.kernel.org/r/20190625143715.1689-14-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Always build mm/gup.c so that we don't have to provide separate nommu
    stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs
    used when HAVE_FAST_GUP is not set into the main implementations, which
    then never call the fast path if HAVE_FAST_GUP is not set.

    This also ensures the new put_user_pages* helpers are available for nommu,
    as those are currently missing, which would create a problem as soon as we
    actually grew users for it.
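
    A sketch of the merged fast-path call (simplified):

        if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
            gup_fast_permitted(start, end)) {
                local_irq_disable();
                gup_pgd_range(addr, end, gup_flags, pages, &nr);
                local_irq_enable();
                ret = nr;
        }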

    Link: http://lkml.kernel.org/r/20190625143715.1689-13-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This moves the actually exported functions towards the end of the file,
    and reorders some functions to be in more logical blocks as a preparation
    for moving various stubs inline into the main functionality using
    IS_ENABLED().

    Link: http://lkml.kernel.org/r/20190625143715.1689-12-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We only support the generic GUP now, so rename the config option to
    be more clear, and always use the mm/Kconfig definition of the
    symbol and select it from the arch Kconfigs.

    Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The split low/high access is the only non-READ_ONCE version of gup_get_pte
    that did show up in the various arch implementations. Lift it to common
    code and drop the ifdef based arch override.
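
    The lifted variant, roughly as it lands in mm/gup.c behind
    CONFIG_GUP_GET_PTE_LOW_HIGH (sketch):

        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                pte_t pte;

                do {
                        pte.pte_low = ptep->pte_low;
                        smp_rmb();
                        pte.pte_high = ptep->pte_high;
                        smp_rmb();
                } while (unlikely(pte.pte_low != ptep->pte_low));

                return pte;
        }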

    Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Pass in the already calculated end value instead of recomputing it, and
    leave the end > start check in the callers instead of duplicating them in
    the arch code.

    Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: James Hogan
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "switch the remaining architectures to use generic GUP", v4.

    A series to switch mips, sh and sparc64 to use the generic GUP code so
    that we only have one codebase to touch for further improvements to this
    code.

    This patch (of 16):

    This will allow sparc64, or any future architecture with memory tagging to
    override its tags for get_user_pages and get_user_pages_fast.
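
    The hook is a no-op unless an architecture overrides it; a sketch of the
    default definition:

        #ifndef untagged_addr
        #define untagged_addr(addr) (addr)
        #endif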

    Link: http://lkml.kernel.org/r/20190625143715.1689-2-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Khalid Aziz
    Reviewed-by: Jason Gunthorpe
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: David Miller
    Cc: Nicholas Piggin
    Cc: Khalid Aziz
    Cc: Andrey Konovalov
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • follow_page_mask() is only used in gup.c, make it static.

    Link: http://lkml.kernel.org/r/20190510190831.GA4061@bharath12345-Inspiron-5559
    Signed-off-by: Bharath Vedartham
    Reviewed-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bharath Vedartham
     

03 Jul, 2019

1 commit

  • The code hasn't been used since it was added to the tree, and doesn't
    appear to actually be usable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Acked-by: Michal Hocko
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

02 Jun, 2019

1 commit

  • When get_user_pages*() is called with pages = NULL, the processing of
    VM_FAULT_RETRY terminates early without actually retrying to fault-in all
    the pages.

    If the pages in the requested range belong to a VMA that has userfaultfd
    registered, handle_userfault() returns VM_FAULT_RETRY *after* user space
    has populated the page, but for the gup pre-fault case there's no actual
    retry and the caller will get no pages although they are present.

    This issue was uncovered when running post-copy memory restore in CRIU
    after d9c9ce34ed5c ("x86/fpu: Fault-in user stack if
    copy_fpstate_to_sigframe() fails").

    After this change, the copying of FPU state to the sigframe switched from
    copy_to_user() variants which caused a real page fault to get_user_pages()
    with pages parameter set to NULL.

    In post-copy mode of CRIU, the destination memory is managed with
    userfaultfd and lack of the retry for pre-fault case in get_user_pages()
    causes a crash of the restored process.

    Making the pre-fault behavior of get_user_pages() the same as the "normal"
    one fixes the issue.
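
    The essence of the fix, as a hedged sketch: the retry path no longer bails
    out early when pages == NULL; only the pointer bookkeeping is skipped:

        if (likely(pages))
                pages += ret;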

    Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
    Fixes: d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    Signed-off-by: Mike Rapoport
    Tested-by: Andrei Vagin [https://travis-ci.org/avagin/linux/builds/533184940]
    Tested-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Sebastian Andrzej Siewior
    Cc: Borislav Petkov
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
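
    For a file like mm/gup.c this amounts to a single new header line:

        // SPDX-License-Identifier: GPL-2.0-only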

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

5 commits

  • A discussion of the overall problem is below.

    As mentioned in patch 0001, the steps to fix the problem are:

    1) Provide put_user_page*() routines, intended to be used
    for releasing pages that were pinned via get_user_pages*().

    2) Convert all of the call sites for get_user_pages*(), to
    invoke put_user_page*(), instead of put_page(). This involves dozens of
    call sites, and will take some time.

    3) After (2) is complete, use get_user_pages*() and put_user_page*() to
    implement tracking of these pages. This tracking will be separate from
    the existing struct page refcounting.

    4) Use the tracking and identification of these pages, to implement
    special handling (especially in writeback paths) when the pages are
    backed by a filesystem.

    Overview
    ========

    Some kernel components (file systems, device drivers) need to access
    memory that is specified via process virtual address. For a long time,
    the API to achieve that was get_user_pages ("GUP") and its variations.
    However, GUP has critical limitations that have been overlooked; in
    particular, GUP does not interact correctly with filesystems in all
    situations. That means that file-backed memory + GUP is a recipe for
    potential problems, some of which have already occurred in the field.

    GUP was first introduced for Direct IO (O_DIRECT), allowing filesystem
    code to get the struct page behind a virtual address and to let storage
    hardware perform a direct copy to or from that page. This is a
    short-lived access pattern, and as such, the window for a concurrent
    writeback of GUP'd page was small enough that there were not (we think)
    any reported problems. Also, userspace was expected to understand and
    accept that Direct IO was not synchronized with memory-mapped access to
    that data, nor with any process address space changes such as munmap(),
    mremap(), etc.

    Over the years, more GUP uses have appeared (virtualization, device
    drivers, RDMA) that can keep the pages they get via GUP for a long period
    of time (seconds, minutes, hours, days, ...). This long-term pinning
    makes an underlying design problem more obvious.

    In fact, there are a number of key problems inherent to GUP:

    Interactions with file systems
    ==============================

    File systems expect to be able to write back data, both to reclaim pages,
    and for data integrity. Allowing other hardware (NICs, GPUs, etc) to gain
    write access to the file memory pages means that such hardware can dirty
    the pages, without the filesystem being aware. This can, in some cases
    (depending on filesystem, filesystem options, block device, block device
    options, and other variables), lead to data corruption, and also to kernel
    bugs of the form:

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
    ext4_writepage
    __writepage
    write_cache_pages
    ext4_writepages
    do_writepages
    __writeback_single_inode
    writeback_sb_inodes
    __writeback_inodes_wb
    wb_writeback
    wb_workfn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

    ...which is due to the file system asserting that there are still buffer
    heads attached:

    ({ \
    BUG_ON(!PagePrivate(page)); \
    ((struct buffer_head *)page_private(page)); \
    })

    Dave Chinner's description of this is very clear:

    "The fundamental issue is that ->page_mkwrite must be called on every
    write access to a clean file backed page, not just the first one.
    How long the GUP reference lasts is irrelevant, if the page is clean
    and you need to dirty it, you must call ->page_mkwrite before it is
    marked writeable and dirtied. Every. Time."

    This is just one symptom of the larger design problem: real filesystems
    that actually write to a backing device, do not actually support
    get_user_pages() being called on their pages, and letting hardware write
    directly to those pages--even though that pattern has been going on since
    about 2005 or so.

    Long term GUP
    =============

    Long term GUP is an issue when FOLL_WRITE is specified to GUP (so, a
    writeable mapping is created), and the pages are file-backed. That can
    lead to filesystem corruption. What happens is that when a file-backed
    page is being written back, it is first mapped read-only in all of the CPU
    page tables; the file system then assumes that nobody can write to the
    page, and that the page content is therefore stable. Unfortunately, the
    GUP callers generally do not monitor changes to the CPU page tables; they
    instead assume that the following pattern is safe (it's not):

    get_user_pages()

    Hardware can keep a reference to those pages for a very long time,
    and write to it at any time. Because "hardware" here means "devices
    that are not a CPU", this activity occurs without any interaction with
    the kernel's file system code.

    for each page
    set_page_dirty
    put_page()

    In fact, the GUP documentation even recommends that pattern.

    Anyway, the file system assumes that the page is stable (nothing is
    writing to the page), and that is a problem: stable page content is
    necessary for many filesystem actions during writeback, such as checksum,
    encryption, RAID striping, etc. Furthermore, filesystem features like COW
    (copy on write) or snapshot also rely on being able to use a new page for
    that memory range inside the file.

    Corruption during write back is clearly possible here. To solve that, one
    idea is to identify pages that have active GUP, so that we can use a
    bounce page to write stable data to the filesystem. The filesystem would
    work on the bounce page, while any of the active GUP might write to the
    original page. This would avoid the stable page violation problem, but
    note that it is only part of the overall solution, because other problems
    remain.

    Other filesystem features that need to replace the page with a new one can
    be inhibited for pages that are GUP-pinned. This will, however, alter and
    limit some of those filesystem features. The only fix for that would be
    to require GUP users to monitor and respond to CPU page table updates.
    Subsystems such as ODP and HMM do this, for example. This aspect of the
    problem is still under discussion.

    Direct IO
    =========

    Direct IO can cause corruption, if userspace does Direct-IO that writes to
    a range of virtual addresses that are mmap'd to a file. The pages written
    to are file-backed pages that can be under write back, while the Direct IO
    is taking place. Here, Direct IO races with a write back: it calls GUP
    before page_mkclean() has replaced the CPU pte with a read-only entry.
    The race window is pretty small, which is probably why years have gone by
    before we noticed this problem: Direct IO is generally very quick, and
    tends to finish up before the filesystem gets around to doing anything with
    the page contents. However, it's still a real problem. The solution is
    to never let GUP return pages that are under write back, but instead,
    force GUP to take a write fault on those pages. That way, GUP will
    properly synchronize with the active write back. This does not change the
    required GUP behavior, it just avoids that race.

    Details
    =======

    Introduces put_user_page(), which simply calls put_page(). This provides
    a way to update all get_user_pages*() callers, so that they call
    put_user_page(), instead of put_page().

    Also introduces put_user_pages(), and a few dirty/locked variations, as a
    replacement for release_pages(), and also as a replacement for open-coded
    loops that release multiple pages. These may be used for subsequent
    performance improvements, via batching of pages to be released.
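
    A sketch of the helpers as introduced (put_user_page() is a thin wrapper
    for now; the real tracking comes later in the series):

        static inline void put_user_page(struct page *page)
        {
                put_page(page);
        }

        void put_user_pages(struct page **pages, unsigned long npages);
        void put_user_pages_dirty(struct page **pages, unsigned long npages);
        void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);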

    This is the first step of fixing a problem (also described in [1] and [2])
    with interactions between get_user_pages ("gup") and filesystems.

    Problem description: let's start with a bug report. Below, is what
    happens sometimes, under memory pressure, when a driver pins some pages
    via gup, and then marks those pages dirty, and releases them. Note that
    the gup documentation actually recommends that pattern. The problem is
    that the filesystem may do a writeback while the pages were gup-pinned,
    and then the filesystem believes that the pages are clean. So, when the
    driver later marks the pages as dirty, that conflicts with the
    filesystem's page tracking and results in a BUG(), like this one that I
    experienced:

    kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
    backtrace:
    ext4_writepage
    __writepage
    write_cache_pages
    ext4_writepages
    do_writepages
    __writeback_single_inode
    writeback_sb_inodes
    __writeback_inodes_wb
    wb_writeback
    wb_workfn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

    ...which is due to the file system asserting that there are still buffer
    heads attached:

    ({ \
    BUG_ON(!PagePrivate(page)); \
    ((struct buffer_head *)page_private(page)); \
    })

    Dave Chinner's description of this is very clear:

    "The fundamental issue is that ->page_mkwrite must be called on
    every write access to a clean file backed page, not just the first
    one. How long the GUP reference lasts is irrelevant, if the page is
    clean and you need to dirty it, you must call ->page_mkwrite before it
    is marked writeable and dirtied. Every. Time."

    This is just one symptom of the larger design problem: real filesystems
    that actually write to a backing device, do not actually support
    get_user_pages() being called on their pages, and letting hardware write
    directly to those pages--even though that pattern has been going on since
    about 2005 or so.

    The steps to fix it are:

    1) (This patch): provide put_user_page*() routines, intended to be used
    for releasing pages that were pinned via get_user_pages*().

    2) Convert all of the call sites for get_user_pages*(), to
    invoke put_user_page*(), instead of put_page(). This involves dozens of
    call sites, and will take some time.

    3) After (2) is complete, use get_user_pages*() and put_user_page*() to
    implement tracking of these pages. This tracking will be separate from
    the existing struct page refcounting.

    4) Use the tracking and identification of these pages, to implement
    special handling (especially in writeback paths) when the pages are
    backed by a filesystem.

    [1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
    [2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

    Link: http://lkml.kernel.org/r/20190327023632.13307-2-jhubbard@nvidia.com
    Signed-off-by: John Hubbard
    Reviewed-by: Jan Kara
    Reviewed-by: Mike Rapoport [docs]
    Reviewed-by: Ira Weiny
    Reviewed-by: Jérôme Glisse
    Reviewed-by: Christoph Lameter
    Tested-by: Ira Weiny
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jason Gunthorpe
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Ralph Campbell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard
     
  • DAX pages were previously unprotected from longterm pins when users called
    get_user_pages_fast().

    Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall back to
    regular GUP processing if a DEVMAP page is encountered.
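
    A sketch of the check in the fast pte walk (simplified):

        if (pte_devmap(pte)) {
                if (unlikely(flags & FOLL_LONGTERM))
                        goto pte_unmap;         /* fall back to slow GUP */
                pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
        }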

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-5-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • To facilitate additional options to get_user_pages_fast() change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.
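
    A typical call-site conversion looks like this (sketch):

        get_user_pages_fast(start, nr_pages, 1, pages);             /* before */
        get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);    /* after  */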

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • In order to support more options in the GUP fast walk, change the write
    parameter to flags throughout the call stack.

    This patch does not change functionality and passes FOLL_WRITE where write
    was previously used.

    Link: http://lkml.kernel.org/r/20190328084422.29911-3-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-3-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Dan Williams
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • Patch series "Add FOLL_LONGTERM to GUP fast and use it".

    HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
    advantages. These pages can be held for a significant time. But
    get_user_pages_fast() does not protect against mapping FS DAX pages.

    Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
    retains the performance while also adding the FS DAX checks. XDP has also
    shown interest in using this functionality.[1]

    In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
    and remove the specialized get_user_pages_longterm call.

    [1] https://lkml.org/lkml/2019/3/19/939

    "longterm" is a relative thing and at this point is probably a misnomer.
    This is really flagging a pin which is going to be given to hardware and
    can't move. I've thought of a couple of alternative names but I think we
    have to settle on if we are going to use FL_LAYOUT or something else to
    solve the "longterm" problem. Then I think we can change the flag to a
    better name.

    Secondly, it depends on how often you are registering memory. I have
    spoken with some RDMA users who consider MR in the performance path...
    For the overall application performance. I don't have the numbers as the
    tests for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they reworking the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    This patch (of 7):

    This patch starts a series which aims to support FOLL_LONGTERM in
    get_user_pages_fast(). Some callers would like to do a longterm (user
    controlled) pin of pages with the fast variant of GUP for performance
    purposes.

    Rather than have a separate get_user_pages_longterm() call, introduce
    FOLL_LONGTERM and change the longterm callers to use it.
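
    A call-site conversion then looks roughly like this (sketch):

        ret = get_user_pages_longterm(start, npages, gup_flags, pages, vmas);
        /* becomes */
        ret = get_user_pages(start, npages, gup_flags | FOLL_LONGTERM, pages, vmas);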

    This patch does not change any functionality. In the short term
    "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
    in particular has been blocked. However, callers of get_user_pages_fast()
    were not "protected".

    FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
    requires vmas to determine if DAX is in use.

    NOTE: In merging with the CMA changes we opt to change the
    get_user_pages() call in check_and_migrate_cma_pages() to a call of
    __get_user_pages_locked() on the newly migrated pages. This makes the
    code read better in that we are calling __get_user_pages_locked() on the
    pages before and after a potential migration.

    As a side effect some of the interfaces are cleaned up, but this is not the
    primary purpose of the series.

    In review[1] it was asked:

    > This I don't get - if you do lock down long term mappings performance
    > of the actual get_user_pages call shouldn't matter to start with.
    >
    > What do I miss?

    A couple of points.

    First "longterm" is a relative thing and at this point is probably a
    misnomer. This is really flagging a pin which is going to be given to
    hardware and can't move. I've thought of a couple of alternative names
    but I think we have to settle on if we are going to use FL_LAYOUT or
    something else to solve the "longterm" problem. Then I think we can
    change the flag to a better name.

    Second, it depends on how often you are registering memory. I have spoken
    with some RDMA users who consider MR in the performance path... For the
    overall application performance. I don't have the numbers as the tests
    for HFI1 were done a long time ago. But there was a significant
    advantage. Some of which is probably due to the fact that you don't have
    to hold mmap_sem.

    Finally, architecturally I think it would be good for everyone to use
    *_fast. There are patches submitted to the RDMA list which would allow
    the use of *_fast (they reworking the use of mmap_sem) and as soon as they
    are accepted I'll submit a patch to convert the RDMA core as well. Also
    to this point others are looking to use *_fast.

    As an aside, Jason pointed out in my previous submission that *_fast and
    *_unlocked look very much the same. I agree and I think further cleanup
    will be coming. But I'm focused on getting the final solution for DAX at
    the moment.

    [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

    [ira.weiny@intel.com: v3]
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Andrew Morton
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Jason Gunthorpe
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Rich Felker
    Cc: Yoshinori Sato
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: James Hogan
    Cc: Dan Williams
    Cc: Mike Marshall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
     

15 Apr, 2019

2 commits

  • Merge page ref overflow branch.

    Jann Horn reported that he can overflow the page ref count with
    sufficient memory (and a filesystem that is intentionally extremely
    slow).

    Admittedly it's not exactly easy. To have more than four billion
    references to a page requires a minimum of 32GB of kernel memory just
    for the pointers to the pages, much less any metadata to keep track of
    those pointers. Jann needed a total of 140GB of memory and a specially
    crafted filesystem that leaves all reads pending (in order to not ever
    free the page references and just keep adding more).

    Still, we have a fairly straightforward way to limit the two obvious
    user-controllable sources of page references: direct-IO like page
    references gotten through get_user_pages(), and the splice pipe page
    duplication. So let's just do that.

    * branch page-refs:
    fs: prevent page refcount overflow in pipe_buf_get
    mm: prevent get_user_pages() from overflowing page refcount
    mm: add 'try_get_page()' helper function
    mm: make page ref count overflow check tighter and more explicit

    Linus Torvalds
     
  • If the page refcount wraps around past zero, it will be freed while
    there are still four billion references to it. One of the possible
    avenues for an attacker to try to make this happen is by doing direct IO
    on a page multiple times. This patch makes get_user_pages() refuse to
    take a new page reference if there are already more than two billion
    references to the page.
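
    The guard, in sketch form (see the try_get_page() helper added on the same
    branch):

        static inline __must_check bool try_get_page(struct page *page)
        {
                page = compound_head(page);
                if (WARN_ON_ONCE(page_ref_count(page) <= 0))
                        return false;
                page_ref_inc(page);
                return true;
        }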

    Reported-by: Jann Horn
    Acked-by: Matthew Wilcox
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     


06 Mar, 2019

1 commit

  • This patch updates get_user_pages_longterm to migrate pages allocated
    out of CMA region. This makes sure that we don't keep non-movable pages
    (due to page reference count) in the CMA area.

    This will be used by ppc64 in a later patch to avoid pinning pages in the
    CMA region. ppc64 uses the CMA region for allocating the hardware page
    table (hash page table), and not being able to migrate pages out of the
    CMA region results in page table allocation failures.

    One case where we hit this easily is when a guest uses a VFIO passthrough
    device. VFIO locks all of the guest's memory, and if the guest memory is
    backed by the CMA region it becomes unmovable, fragmenting the CMA and
    possibly preventing other guests from allocating a large enough hash page
    table.

    NOTE: We allocate the new page without using __GFP_THISNODE
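
    A rough sketch of where this hooks in; the helper name is from the patch,
    but the exact call shape here is illustrative:

        /* in get_user_pages_longterm(), after the pages have been pinned */
        nr_pages = check_and_migrate_cma_pages(start, nr_pages, pages, vmas);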

    Link: http://lkml.kernel.org/r/20190114095438.32470-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Alexey Kardashevskiy
    Cc: Andrea Arcangeli
    Cc: David Gibson
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

13 Feb, 2019

1 commit

  • For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
    on x86. So the function works as long as hugetlb is configured.
    However, dax doesn't depend on hugetlb.

    Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
    Signed-off-by: Yu Zhao
    Reviewed-by: Jan Kara
    Cc: Dan Williams
    Cc: Huang Ying
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: "Michael S . Tsirkin"
    Cc: John Hubbard
    Cc: Wei Yang
    Cc: Mike Rapoport
    Cc: Andrea Arcangeli
    Cc: "Kirill A . Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     

11 Feb, 2019

1 commit

  • The 'write' parameter is unused in gup_fast_permitted() so remove it.
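
    The prototype change, sketched:

        /* before */
        static inline bool gup_fast_permitted(unsigned long start, int nr_pages,
                                              int write);
        /* after */
        static inline bool gup_fast_permitted(unsigned long start, int nr_pages);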

    Signed-off-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20190210223424.13934-1-ira.weiny@intel.com
    Signed-off-by: Ingo Molnar

    Ira Weiny
     

06 Jan, 2019

1 commit

  • Merge more updates from Andrew Morton:

    - procfs updates

    - various misc bits

    - lib/ updates

    - epoll updates

    - autofs

    - fatfs

    - a few more MM bits

    * emailed patches from Andrew Morton : (58 commits)
    mm/page_io.c: fix polled swap page in
    checkpatch: add Co-developed-by to signature tags
    docs: fix Co-Developed-by docs
    drivers/base/platform.c: kmemleak ignore a known leak
    fs: don't open code lru_to_page()
    fs/: remove caller signal_pending branch predictions
    mm/: remove caller signal_pending branch predictions
    arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
    kernel/sched/: remove caller signal_pending branch predictions
    kernel/locking/mutex.c: remove caller signal_pending branch predictions
    mm: select HAVE_MOVE_PMD on x86 for faster mremap
    mm: speed up mremap by 20x on large regions
    mm: treewide: remove unused address argument from pte_alloc functions
    initramfs: cleanup incomplete rootfs
    scripts/gdb: fix lx-version string output
    kernel/kcov.c: mark write_comp_data() as notrace
    kernel/sysctl: add panic_print into sysctl
    panic: add options to print system info when panic happens
    bfs: extra sanity checking and static inode bitmap
    exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
    ...

    Linus Torvalds
     


04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted in this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
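
    The bulk of the change is mechanical (sketch):

        /* before */
        if (!access_ok(VERIFY_WRITE, buf, count))
                return -EFAULT;
        /* after */
        if (!access_ok(buf, count))
                return -EFAULT;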

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iter_iov code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Dec, 2018

1 commit

  • Commit df06b37ffe5a ("mm/gup: cache dev_pagemap while pinning pages")
    attempted to operate on each page that get_user_pages had retrieved. In
    order to do that, it created a common exit point from the routine.
    However, one case was missed, which this patch fixes up.

    Also, there was still an unnecessary shadow declaration (with a
    different type) of the "ret" variable, which this patch removes.

    Keith's description of the situation is:

    This also fixes a potentially leaked dev_pagemap reference count if a
    failure occurs when an iteration crosses a vma boundary. I don't think
    it's normal to have different vma's on a user's mapped zone device
    memory, but good to fix anyway.

    I actually thought that this code:

    /* first iteration or cross vma bound */
    if (!vma || start >= vma->vm_end) {
            vma = find_extend_vma(mm, start);
            if (!vma && in_gate_area(mm, start)) {
                    ret = get_gate_page(mm, start & PAGE_MASK,
                                        gup_flags, &vma,
                                        pages ? &pages[i] : NULL);
                    if (ret)
                            goto out;

    dealt with the "you're trying to pin the gate page, as part of this
    call", rather than the generic case of crossing a vma boundary. (I
    think there's a fine point that I must be overlooking.) But it's still a
    valid case, either way.
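
    The common exit point that the missed case is now routed through looks
    roughly like this (sketch):

        out:
                if (ctx.pgmap)
                        put_dev_pagemap(ctx.pgmap);
                return i ? i : ret;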

    Link: http://lkml.kernel.org/r/20181121081402.29641-2-jhubbard@nvidia.com
    Fixes: df06b37ffe5a4 ("mm/gup: cache dev_pagemap while pinning pages")
    Signed-off-by: John Hubbard
    Reviewed-by: Keith Busch
    Cc: Dan Williams
    Cc: Kirill A. Shutemov
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Hubbard