17 Jul, 2018

1 commit

  • commit bb177a732c4369bb58a1fe1df8f552b6f0f7db5f upstream.

    syzbot has noticed that a specially crafted library can easily hit
    VM_BUG_ON in __mm_populate

    kernel BUG at mm/gup.c:1242!
    invalid opcode: 0000 [#1] SMP
    CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
    RIP: 0010:__mm_populate+0x1e2/0x1f0
    Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
    Call Trace:
    vm_brk_flags+0xc3/0x100
    vm_brk+0x1f/0x30
    load_elf_library+0x281/0x2e0
    __ia32_sys_uselib+0x170/0x1e0
    do_fast_syscall_32+0xca/0x420
    entry_SYSENTER_compat+0x70/0x7f

    The reason is that the length of the new brk is not page aligned when we
    try to populate it. There is no reason to bug on that though.
    do_brk_flags already aligns the length properly so the mapping is
    expanded as it should. All we need is to tell mm_populate about it.
    Besides that there is absolutely no reason to bug_on in the first
    place. The worst thing that could happen is that the last page wouldn't
    get populated and that is far from putting the system into an
    inconsistent state.

    Fix the issue by moving the length sanitization code from do_brk_flags
    up to vm_brk_flags. The only other caller of do_brk_flags is the brk
    syscall entry and it makes sure to provide the proper length, so there
    is no need for sanitization and we can use do_brk_flags without it.

    Also remove the bogus BUG_ONs.

    [osalvador@techadventures.net: fix up vm_brk_flags s@request@len@]
    Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: syzbot
    Tested-by: Tetsuo Handa
    Reviewed-by: Oscar Salvador
    Cc: Zi Yan
    Cc: "Aneesh Kumar K.V"
    Cc: Dan Williams
    Cc: "Kirill A. Shutemov"
    Cc: Michael S. Tsirkin
    Cc: Al Viro
    Cc: "Huang, Ying"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
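
    For illustration, a condensed sketch of the shape of the fix described
    above (not the verbatim upstream diff; the mmap_sem handling and the
    userfaultfd unmap list are elided):

        /*
         * vm_brk_flags() now page-aligns the requested length itself, so
         * do_brk_flags() and mm_populate() both see the same aligned size
         * and __mm_populate() no longer needs a BUG_ON for odd lengths.
         */
        int vm_brk_flags(unsigned long addr, unsigned long request,
                         unsigned long flags)
        {
                unsigned long len = PAGE_ALIGN(request);
                int ret;

                if (len < request)              /* rounding up overflowed */
                        return -ENOMEM;
                if (!len)
                        return 0;

                ret = do_brk_flags(addr, len, flags);   /* wants an aligned len */
                if (!ret && (current->mm->def_flags & VM_LOCKED))
                        mm_populate(addr, len);         /* aligned range, no BUG_ON */
                return ret;
        }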
     

03 Jul, 2018

1 commit

  • commit a9b6de77b1a3ff729f7bfc54b2e17711776a416c upstream.

    get_user_pages_fast() for device pages is missing the typical validation
    that all page references have been taken while the mapping was valid.
    Without this validation truncate operations can not reliably coordinate
    against new page reference events like O_DIRECT.

    Cc:
    Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
    Reported-by: Jan Kara
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

19 May, 2018

1 commit

  • commit 7f7ccc2ccc2e70c6054685f5e3522efa81556830 upstream.

    proc_pid_cmdline_read() and environ_read() directly access the target
    process' VM to retrieve the command line and environment. If this
    process remaps these areas onto a file via mmap(), the requesting
    process may experience various issues such as extra delays if the
    underlying device is slow to respond.

    Let's simply refuse to access file-backed areas in these functions.
    For this we add a new FOLL_ANON gup flag that is passed to all calls
    to access_remote_vm(). The code already takes care of such failures
    (including unmapped areas). Accesses via /proc/pid/mem were not
    changed though.

    This was assigned CVE-2018-1120.

    Note for stable backports: the patch may apply to kernels prior to 4.11
    but silently miss one location; it must be checked that no call to
    access_remote_vm() keeps zero as the last argument.

    Reported-by: Qualys Security Advisory
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Cc: stable@vger.kernel.org
    Signed-off-by: Willy Tarreau
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Willy Tarreau
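
    Roughly, the change has two halves (a sketch based on the description
    above, not the verbatim diff; the local variable names in the /proc
    snippet are illustrative): a new FOLL_ANON gup flag that makes the VMA
    checks in mm/gup.c reject file-backed areas, and /proc callers that pass
    it via access_remote_vm().

        /* mm/gup.c, check_vma_flags(), abridged: fail file-backed VMAs
         * when the caller asked for anonymous memory only. */
        if ((gup_flags & FOLL_ANON) && !vma_is_anonymous(vma))
                return -EFAULT;

        /* fs/proc/base.c, e.g. environ_read()/cmdline read, abridged: */
        retval = access_remote_vm(mm, src, page, this_len, FOLL_ANON);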
     

19 Apr, 2018

1 commit

  • commit c61611f70958d86f659bca25c02ae69413747a8d upstream.

    get_user_pages_fast is supposed to be a faster drop-in equivalent of
    get_user_pages. As such, callers expect it to return a negative return
    code when passed an invalid address, and never expect it to return 0
    when passed a positive number of pages, since its documentation says:

    * Returns number of pages pinned. This may be fewer than the number
    * requested. If nr_pages is 0 or negative, returns 0. If no pages
    * were pinned, returns -errno.

    When get_user_pages_fast falls back on get_user_pages this is exactly
    what happens. Unfortunately the implementation is inconsistent: it
    returns 0 if passed a kernel address, confusing callers: for example,
    the following is pretty common but does not appear to do the right thing
    with a kernel address:

        ret = get_user_pages_fast(addr, 1, writeable, &page);
        if (ret < 0)
                return ret;

    Change get_user_pages_fast to return -EFAULT when supplied a kernel
    address to make it match expectations.

    All callers have been audited for consistency with the documented
    semantics.

    Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
    Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
    Signed-off-by: Michael S. Tsirkin
    Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
    Reviewed-by: Andrew Morton
    Cc: Kirill A. Shutemov
    Cc: Huang Ying
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thorsten Leemhuis
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michael S. Tsirkin
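
    Sketched against the description above (not the verbatim diff; note
    that access_ok() still took a VERIFY_* argument in this era), the top
    of the generic get_user_pages_fast() now fails early instead of
    returning 0:

        int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                                struct page **pages)
        {
                unsigned long len = (unsigned long)nr_pages << PAGE_SHIFT;

                start &= PAGE_MASK;
                if (nr_pages <= 0)
                        return 0;

                /* a kernel address now yields -EFAULT rather than 0 */
                if (unlikely(!access_ok(VERIFY_WRITE, (void __user *)start, len)))
                        return -EFAULT;

                /* ... IRQ-disabled fast walk elided; fall back to the slow path ... */
                return get_user_pages_unlocked(start, nr_pages, pages,
                                               write ? FOLL_WRITE : 0);
        }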
     

10 Dec, 2017

1 commit

  • [ Upstream commit 5b65c4677a57a1d4414212f9995aa0e46a21ff80 ]

    The 0-day test bot found a performance regression that was tracked down to
    switching x86 to the generic get_user_pages_fast() implementation:

    http://lkml.kernel.org/r/20170710024020.GA26389@yexl-desktop

    The regression was caused by the fact that we now use local_irq_save() +
    local_irq_restore() in get_user_pages_fast() to disable interrupts.
    The x86 implementation used local_irq_disable() + local_irq_enable().

    The fix is to make get_user_pages_fast() use local_irq_disable(),
    leaving local_irq_save() for __get_user_pages_fast() that can be called
    with interrupts disabled.

    Numbers for pinning a gigabyte of memory, one page a time, 20 repeats:

    Before: Average: 14.91 ms, stddev: 0.45 ms
    After: Average: 10.76 ms, stddev: 0.18 ms

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Huang Ying
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thorsten Leemhuis
    Cc: linux-mm@kvack.org
    Fixes: e585513b76f7 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
    Link: http://lkml.kernel.org/r/20170908215603.9189-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
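
    The distinction described above, side by side (a sketch; gup_pgd_range()
    is the common page-table walker both entry points share after this
    change):

        /* __get_user_pages_fast(): may be called with IRQs already off,
         * so it must save and restore the previous IRQ state. */
        local_irq_save(flags);
        gup_pgd_range(addr, end, write, pages, &nr);
        local_irq_restore(flags);

        /* get_user_pages_fast(): only called with IRQs on, so the cheaper
         * disable/enable pair is sufficient. */
        local_irq_disable();
        gup_pgd_range(addr, end, write, pages, &nr);
        local_irq_enable();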
     

05 Dec, 2017

1 commit

  • commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream.

    Patch series "introduce get_user_pages_longterm()", v2.

    Here is a new get_user_pages api for cases where a driver intends to
    keep an elevated page count indefinitely. This is distinct from usages
    like iov_iter_get_pages where the elevated page counts are transient.
    The iov_iter_get_pages cases immediately turn around and submit the
    pages to a device driver which will put_page when the i/o operation
    completes (under kernel control).

    In the longterm case userspace is responsible for dropping the page
    reference at some undefined point in the future. This is untenable for
    filesystem-dax case where the filesystem is in control of the lifetime
    of the block / page and needs reasonable limits on how long it can wait
    for pages in a mapping to become idle.

    Fixing filesystems to actually wait for dax pages to be idle before
    blocks from a truncate/hole-punch operation are repurposed is saved for
    a later patch series.

    Also, allowing longterm registration of dax mappings is a future patch
    series that introduces a "map with lease" semantic where the kernel can
    revoke a lease and force userspace to drop its page references.

    I have also tagged these for -stable to purposely break cases that might
    assume that longterm memory registrations for filesystem-dax mappings
    were supported by the kernel. The behavior regression this policy
    change implies is one of the reasons we maintain the "dax enabled.
    Warning: EXPERIMENTAL, use at your own risk" notification when mounting
    a filesystem in dax mode.

    It is worth noting the device-dax interface does not suffer the same
    constraints since it does not support file space management operations
    like hole-punch.

    This patch (of 4):

    Until there is a solution to the dma-to-dax vs truncate problem it is
    not safe to allow long-standing memory registrations against
    filesystem-dax vmas. Device-dax vmas do not have this problem and are
    explicitly allowed.

    This is temporary until a "memory registration with layout-lease"
    mechanism can be implemented for the affected sub-systems (RDMA and
    V4L2).

    [akpm@linux-foundation.org: use kcalloc()]
    Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
    Signed-off-by: Dan Williams
    Suggested-by: Christoph Hellwig
    Cc: Doug Ledford
    Cc: Hal Rosenstock
    Cc: Inki Dae
    Cc: Jan Kara
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Joonyoung Shim
    Cc: Kyungmin Park
    Cc: Mauro Carvalho Chehab
    Cc: Mel Gorman
    Cc: Ross Zwisler
    Cc: Sean Hefty
    Cc: Seung-Woo Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
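
    A condensed sketch of the new helper's behaviour as described above
    (simplified: the real function also handles a NULL vmas array by
    allocating a temporary one with kcalloc(), per the fixup note):

        long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
                                     unsigned int gup_flags, struct page **pages,
                                     struct vm_area_struct **vmas)
        {
                long rc, i;

                rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
                for (i = 0; i < rc; i++) {
                        if (!vma_is_fsdax(vmas[i]))
                                continue;
                        /* filesystem-dax vma: drop every reference taken
                         * and refuse the long-term registration. */
                        while (rc > 0)
                                put_page(pages[--rc]);
                        return -EOPNOTSUPP;
                }
                return rc;
        }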
     

09 Sep, 2017

2 commits

  • Platforms with an advanced system bus (like CAPI or CCIX) allow device
    memory to be accessible from the CPU in a cache-coherent fashion. Add a
    new type of ZONE_DEVICE memory to represent it. The use cases are the
    same as for un-addressable device memory but without all the corner
    cases.

    Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Paul E. McKenney
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Balbir Singh
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • When THP migration is being used, memory management code needs to handle
    pmd migration entries properly. This patch uses !pmd_present() or
    is_swap_pmd() (depending on whether pmd_none() needs separate code or
    not) to check pmd migration entries at the places where a pmd entry is
    present.

    Since pmd-related code uses split_huge_page(), split_huge_pmd(),
    pmd_trans_huge(), pmd_trans_unstable(), or
    pmd_none_or_trans_huge_or_clear_bad(), this patch:

    1. adds pmd migration entry split code in split_huge_pmd(),

    2. takes care of pmd migration entries whenever pmd_trans_huge() is present,

    3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.

    Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
    is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
    them.

    Until this commit, a pmd entry should be:
    1. pointing to a pte page,
    2. is_swap_pmd(),
    3. pmd_trans_huge(),
    4. pmd_devmap(), or
    5. pmd_none().

    Signed-off-by: Zi Yan
    Cc: Kirill A. Shutemov
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
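
    The helper this series leans on is tiny (a sketch of the is_swap_pmd()
    definition and a typical call-site pattern; the exact call sites vary
    per path):

        static inline int is_swap_pmd(pmd_t pmd)
        {
                /* present means THP/devmap/pte-page; none means empty;
                 * everything else is a (migration) swap entry. */
                return !pmd_none(pmd) && !pmd_present(pmd);
        }

        /* typical use: take the pmd lock for anything huge or migrating */
        if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
                /* ... handle under pmd_lock(), splitting or waiting ... */ ;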
     

07 Sep, 2017

1 commit

  • These functions are the only bits of generic code that use
    {pud,pmd}_pfn() without checking for CONFIG_TRANSPARENT_HUGEPAGE. This
    works fine on x86, the only arch with devmap support, since the *_pfn()
    functions are always defined there, but this isn't true for every
    architecture.

    Link: http://lkml.kernel.org/r/20170626063833.11094-1-oohall@gmail.com
    Signed-off-by: Oliver O'Halloran
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oliver O'Halloran
     

07 Jul, 2017

5 commits

  • When speculatively taking references to a hugepage using
    page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
    page returned by pmd_page() is the head page. Although normally true,
    this assumption doesn't hold when the hugepage comprises successive
    page table entries, such as when using the contiguous bit on arm64 at
    the PTE or PMD level.

    This can be addressed by ensuring that the page passed to
    page_cache_add_speculative() is the real head or by de-referencing the
    head page within the function.

    We take the first approach to keep the usage pattern aligned with
    page_cache_get_speculative() where users already pass the appropriate
    page, i.e., the de-referenced head.

    Apply the same logic to fix gup_huge_[pud|pgd]() as well.

    [punit.agrawal@arm.com: fix arm64 ltp failure]
    Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com
    Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Aneesh Kumar K.V
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Naoya Horiguchi
    Cc: Mark Rutland
    Cc: Hillf Danton
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Punit Agrawal
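
    Abridged sketch of the fixed gup_huge_pmd() flow described above (the
    gup_huge_pud()/gup_huge_pgd() variants follow the same pattern):

        page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
        /* ... record 'refs' tail pages into pages[] ... */

        /* take the speculative reference on the real compound head rather
         * than assuming pmd_page() already points at it */
        head = compound_head(pmd_page(orig));
        if (!page_cache_add_speculative(head, refs)) {
                *nr -= refs;
                return 0;
        }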
     
  • When operating on hugepages with DEBUG_VM enabled, the GUP code checks
    the compound head for each tail page prior to calling
    page_cache_add_speculative. This is broken, because on the fast-GUP
    path (where we don't hold any page table locks) we can be racing with a
    concurrent invocation of split_huge_page_to_list.

    split_huge_page_to_list deals with this race by using page_ref_freeze to
    freeze the page and force concurrent GUPs to fail whilst the component
    pages are modified. This modification includes clearing the
    compound_head field for the tail pages, so checking this prior to a
    successful call to page_cache_add_speculative can lead to false
    positives: In fact, page_cache_add_speculative *already* has this check
    once the page refcount has been successfully updated, so we can simply
    remove the broken calls to VM_BUG_ON_PAGE.

    Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com
    Signed-off-by: Will Deacon
    Signed-off-by: Punit Agrawal
    Acked-by: Steve Capper
    Acked-by: Kirill A. Shutemov
    Cc: Aneesh Kumar K.V
    Cc: Catalin Marinas
    Cc: Naoya Horiguchi
    Cc: Mark Rutland
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • Architectures like ppc64 support hugepage sizes that are not mapped to
    any of the page table levels. Instead they add an alternate page
    table entry format called hugepage directory (hugepd). hugepd indicates
    that the page table entry maps to a set of hugetlb pages. Add support
    for this in the generic follow_page_mask code. We already support this
    format in the generic gup code.

    The default implementation prints a warning and returns NULL. We will
    add ppc64 support in later patches.

    Link: http://lkml.kernel.org/r/1494926612-23928-7-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Anshuman Khandual
    Cc: Naoya Horiguchi
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • ppc64 supports pgd hugetlb entries. Add code to handle hugetlb pgd
    entries to follow_page_mask so that ppc64 can switch to it to handle
    hugetlb entries.

    Link: http://lkml.kernel.org/r/1494926612-23928-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Aneesh Kumar K.V
    Cc: Naoya Horiguchi
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • This makes the code easier to read. No functional changes in this
    patch. In a followup patch, we will be updating follow_page_mask to
    handle the hugetlb hugepd format so that archs like ppc64 can switch to
    the generic version. This split helps in doing that nicely.

    Link: http://lkml.kernel.org/r/1494926612-23928-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Naoya Horiguchi
    Cc: Anshuman Khandual
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

22 Jun, 2017

1 commit


19 Jun, 2017

1 commit

  • Stack guard page is a useful feature to reduce a risk of stack smashing
    into a different mapping. We have been using a single page gap which
    is sufficient to prevent having stack adjacent to a different mapping.
    But this seems to be insufficient in the light of the stack usage in
    userspace. E.g. glibc uses alloca() as large as 64kB in many commonly
    used functions. Others use constructs like gid_t buffer[NGROUPS_MAX],
    which is 256kB, or stack strings with MAX_ARG_STRLEN.

    This will become especially dangerous for suid binaries with the default
    unlimited stack size, because those applications can be tricked into
    consuming a large portion of the stack and a single glibc call could
    then jump over the guard page. These attacks are not theoretical,
    unfortunately.

    Make those attacks less probable by increasing the stack guard gap
    to 1MB (on systems with 4k pages; but make it depend on the page size
    because systems with larger base pages might cap stack allocations in
    the PAGE_SIZE units) which should cover larger alloca() and VLA stack
    allocations. It is obviously not a full fix because the problem is
    somehow inherent, but it should reduce attack space a lot.

    One could argue that the gap size should be configurable from userspace,
    but that can be done later when somebody finds that the new 1MB is wrong
    for some special case applications. For now, add a kernel command line
    option (stack_guard_gap) to specify the stack gap size (in page units).

    Implementation wise, first delete all the old code for stack guard page:
    because although we could get away with accounting one extra page in a
    stack vma, accounting a larger gap can break userspace - case in point,
    a program run with "ulimit -S -v 20000" failed when the 1MB gap was
    counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
    and strict non-overcommit mode.

    Instead of keeping the gap inside the stack vma, maintain the stack
    guard gap as a gap between vmas: using vm_start_gap() in place of
    vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just
    those few places which need to respect the gap - mainly
    arch_get_unmapped_area(), and the vma tree's subtree_gap support for
    that.

    Original-patch-by: Oleg Nesterov
    Original-patch-by: Michal Hocko
    Signed-off-by: Hugh Dickins
    Acked-by: Michal Hocko
    Tested-by: Helge Deller # parisc
    Signed-off-by: Linus Torvalds

    Hugh Dickins
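
    The vm_start_gap() helper mentioned above ends up looking roughly like
    this (a sketch; stack_guard_gap defaults to 256 pages, i.e. 1MB with
    4kB pages, and is overridable via the stack_guard_gap= command line
    option):

        static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
        {
                unsigned long vm_start = vma->vm_start;

                if (vma->vm_flags & VM_GROWSDOWN) {
                        vm_start -= stack_guard_gap;
                        if (vm_start > vma->vm_start)   /* underflow wrapped */
                                vm_start = 0;
                }
                return vm_start;
        }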
     

13 Jun, 2017

1 commit

  • This patch provides all the callbacks required by the generic
    get_user_pages_fast() code and switches x86 over - and removes
    the platform specific implementation.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

03 Jun, 2017

1 commit

  • KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the
    FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
    finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a
    special case. (check_user_page_hwpoison())

    When huge pages are involved, this doesn't work so well.
    get_user_pages() calls follow_hugetlb_page(), which stops early if it
    receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
    -EFAULT to the caller. The step to map this to -EHWPOISON based on the
    FOLL_ flags is missing. The hwpoison special case is skipped, and
    -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

    Instead, move this VM_FAULT_ to errno mapping code into a header file
    and use it from faultin_page() and follow_hugetlb_page().

    With this, KVM works as expected.

    This isn't a problem for arm64 today as we haven't enabled
    MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
    too, so I think this should be a fix. This doesn't apply earlier than
    stable's v4.11.1 due to all sorts of cleanup.

    [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
    suggested.
    Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
    Signed-off-by: James Morse
    Acked-by: Punit Agrawal
    Acked-by: Naoya Horiguchi
    Cc: "Kirill A . Shutemov"
    Cc: [4.11.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morse
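
    The shared VM_FAULT_* to errno mapping reads roughly as follows (a
    sketch of the vm_fault_to_errno() helper as described above):

        static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
        {
                if (vm_fault & VM_FAULT_OOM)
                        return -ENOMEM;
                if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
                        return (foll_flags & FOLL_HWPOISON) ?
                                -EHWPOISON : -EFAULT;
                if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
                        return -EFAULT;
                return 0;
        }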
     

04 May, 2017

1 commit

  • MIPS just got changed to only accept a pointer argument for access_ok(),
    causing one warning in drivers/scsi/pmcraid.c. I tried changing x86 the
    same way and found the same warning in __get_user_pages_fast() and
    nowhere else in the kernel during randconfig testing:

    mm/gup.c: In function '__get_user_pages_fast':
    mm/gup.c:1578:6: error: passing argument 1 of '__chk_range_not_ok' makes pointer from integer without a cast [-Werror=int-conversion]

    It would probably be a good idea to enforce type-safety in general, so
    let's change this file to not cause a warning if we do that.

    I don't know why the warning did not appear on MIPS.

    Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
    Link: http://lkml.kernel.org/r/20170421162659.3314521-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Cc: Alexander Viro
    Acked-by: Ingo Molnar
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Lorenzo Stoakes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

23 Apr, 2017

1 commit

  • This reverts commit 2947ba054a4dabbd82848728d765346886050029.

    Dan Williams reported dax-pmem kernel warnings with the following signature:

    WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
    percpu ref (dax_pmem_percpu_release [dax_pmem])
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Michal Hocko
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dann.frazier@canonical.com
    Cc: dave.hansen@intel.com
    Cc: steve.capper@linaro.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

18 Mar, 2017

7 commits

  • This patch provides all the callbacks required by the generic
    get_user_pages_fast() code and switches x86 over - and removes
    the platform specific implementation.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
    [ Minor readability edits. ]
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86, get_user_pages_fast() does a couple of sanity checks to see if we can
    call __get_user_pages_fast() for the range.

    This kind of wrapping protection should be useful for the generic code too.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-7-kirill.shutemov@linux.intel.com
    [ Small readability edits. ]
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's
    only implemented on x86. On non-x86, the new code will be compiled out.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dan Williams
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    Unlike generic GUP_fast(), the x86 version makes all pages it touches
    referenced. It seems required for GRU and EPT.

    See the following commit:

    8ee53820edfd ("thp: mmu_notifier_test_young")

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-5-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86 PAE, a page table entry is larger than sizeof(long), so we would
    need to provide a helper that can read the entry atomically.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-4-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
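
    The helper ends up split on a config symbol so that PAE can read the
    two halves of the entry with ordering barriers (a sketch; the config
    symbol and the pte_low/pte_high field names here are taken from x86 PAE
    and are an assumption about the exact upstream naming):

        #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                pte_t pte;

                do {
                        pte.pte_low  = ptep->pte_low;
                        smp_rmb();
                        pte.pte_high = ptep->pte_high;
                        smp_rmb();
                } while (unlikely(pte.pte_low != ptep->pte_low));

                return pte;
        }
        #else
        static inline pte_t gup_get_pte(pte_t *ptep)
        {
                return READ_ONCE(*ptep);
        }
        #endif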
     
  • This is a preparation patch for the transition of x86 to the generic GUP_fast()
    implementation.

    On x86, we would need to do additional permission checks to determine if
    access is allowed.

    Let's abstract it out into separate helpers.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
  • The only arch that defines it to something meaningful is x86.
    But x86 doesn't use the generic GUP_fast() implementation -- the
    only place where the callback is called.

    Let's drop it.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Aneesh Kumar K . V
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dann Frazier
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Steve Capper
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

13 Mar, 2017

1 commit

  • gup_p4d_range() should call gup_pud_range(), not itself.

    [ This was not noticed on x86: this is the HAVE_GENERIC_RCU_GUP code
    used by arm[64] and powerpc - Linus ]

    Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
    Signed-off-by: Kirill A. Shutemov
    Reported-by: Chris Packham
    Reported-by: Anton Blanchard
    Acked-by: Michal Hocko
    Acked-by: Mark Rutland
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
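
    The bug and fix in context (condensed sketch of gup_p4d_range(); the
    surrounding helpers are abridged):

        static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
                                 int write, struct page **pages, int *nr)
        {
                unsigned long next;
                p4d_t *p4dp = p4d_offset(&pgd, addr);

                do {
                        p4d_t p4d = READ_ONCE(*p4dp);

                        next = p4d_addr_end(addr, end);
                        if (p4d_none(p4d))
                                return 0;
                        /* descend one level: the bug was recursing into
                         * gup_p4d_range() here instead of gup_pud_range() */
                        if (!gup_pud_range(p4d, addr, next, write, pages, nr))
                                return 0;
                } while (p4dp++, addr = next, addr != end);

                return 1;
        }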
     

10 Mar, 2017

1 commit


02 Mar, 2017

1 commit


25 Feb, 2017

2 commits

  • Do the prot_none/FOLL_NUMA check after we are sure this is a THP pte.
    Archs can implement prot_none such that it can return true for regular
    pmd entries.

    Link: http://lkml.kernel.org/r/1487498326-8734-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hillf Danton
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
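
    In follow_page_mask() terms the reordering looks roughly like this (a
    sketch; only the relevant checks are shown):

        /* only treat pmd_protnone() as a NUMA hinting fault once the entry
         * is known to be a transparent huge pmd; some architectures can
         * report prot_none for regular pmd entries as well */
        if (likely(!pmd_trans_huge(*pmd)))
                return follow_page_pte(vma, address, pmd, flags);

        if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
                return no_page_table(vma, flags);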
     
  • The current transparent hugepage code only supports PMDs. This patch
    adds support for transparent use of PUDs with DAX. It does not include
    support for anonymous pages. x86 support code also added.

    Most of this patch simply parallels the work that was done for huge
    PMDs. The only major difference is how the new ->pud_entry method in
    mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
    whereas the ->pud_entry method works along with either ->pmd_entry or
    ->pte_entry. The pagewalk code takes care of locking the PUD before
    calling ->pud_walk, so handlers do not need to worry whether the PUD is
    stable.

    [dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
    Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
    [dave.jiang@intel.com: native_pud_clear missing on i386 build]
    Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Tested-by: Alexander Kapshuk
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

23 Feb, 2017

1 commit

  • Add support for VM_FAULT_RETRY to follow_hugetlb_page() so that
    get_user_pages_unlocked/locked and "nonblocking/FOLL_NOWAIT" features
    will work on hugetlbfs.

    This is required for fully functional userfaultfd non-present support on
    hugetlbfs.

    Link: http://lkml.kernel.org/r/20161216144821.5183-25-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Kravetz
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

15 Dec, 2016

2 commits

  • Unexport the low-level __get_user_pages_unlocked() function and replace
    invocations with calls to more appropriate higher-level functions.

    In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked()
    with get_user_pages_unlocked() since we can now pass gup_flags.

    In async_pf_execute() and process_vm_rw_single_vec() we need to pass
    different tsk, mm arguments so get_user_pages_remote() is the sane
    replacement in these cases (having added manual acquisition and release
    of mmap_sem.)

    Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH
    flag. However, this flag was originally silently dropped by commit
    1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this
    appears to have been unintentional and reintroducing it is therefore not
    an issue.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Jan Kara
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • Patch series "mm: unexport __get_user_pages_unlocked()".

    This patch series continues the cleanup of get_user_pages*() functions
    taking advantage of the fact we can now pass gup_flags as we please.

    It firstly adds an additional 'locked' parameter to
    get_user_pages_remote() to allow for its callers to utilise
    VM_FAULT_RETRY functionality. This is necessary as the invocation of
    __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
    this and no other existing higher level function would allow it to do
    so.

    Secondly existing callers of __get_user_pages_unlocked() are replaced
    with the appropriate higher-level replacement -
    get_user_pages_unlocked() if the current task and memory descriptor are
    referenced, or get_user_pages_remote() if other task/memory descriptors
    are referenced (having acquiring mmap_sem.)

    This patch (of 2):

    Add an int *locked parameter to get_user_pages_remote() to allow
    VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

    Taking into account the previous adjustments to get_user_pages*()
    functions allowing for the passing of gup_flags, we are now in a
    position where __get_user_pages_unlocked() need only be exported for its
    ability to allow VM_FAULT_RETRY behaviour. This adjustment allows us to
    subsequently unexport __get_user_pages_unlocked() as well as allowing
    for future flexibility in the use of get_user_pages_remote().

    [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
    Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Jan Kara
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

13 Dec, 2016

2 commits

  • In the previous round of get_user_pages* changes, the comments attached
    to __get_user_pages_unlocked() and get_user_pages_unlocked() were
    rendered incorrect; this patch corrects them.

    In addition the get_user_pages_unlocked() comment seems to have already
    been outdated as it referred to tsk, mm parameters which were removed in
    c12d2da5 ("mm/gup: Remove the macro overload API migration helpers from
    the get_user*() APIs"), this patch fixes this also.

    Link: http://lkml.kernel.org/r/20161025233435.5338-1-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • Make vma_permits_fault() static as it is only used in mm/gup.c

    This fixes a sparse warning.

    Link: http://lkml.kernel.org/r/20161017122353.31598-1-tklauser@distanz.ch
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     

25 Oct, 2016

1 commit

  • This patch unexports the low-level __get_user_pages() function.

    Recent refactoring of the get_user_pages* functions allows flags to be
    passed through get_user_pages(), which eliminates the need for access to
    this function from its one user, kvm.

    We can see that the two calls to get_user_pages() which replace
    __get_user_pages() in kvm_main.c are equivalent by examining their call
    stacks:

    get_user_page_nowait():
      get_user_pages(start, 1, flags, page, NULL)
        __get_user_pages_locked(current, current->mm, start, 1, page,
                                NULL, NULL, false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, start, 1,
                           flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

    check_user_page_hwpoison():
      get_user_pages(addr, 1, flags, NULL, NULL)
        __get_user_pages_locked(current, current->mm, addr, 1, NULL,
                                NULL, NULL, false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, addr, 1,
                           flags | FOLL_TOUCH, NULL, NULL, NULL)

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Paolo Bonzini
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

19 Oct, 2016

2 commits

  • This removes the 'write' and 'force' from get_user_pages_remote() and
    replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
    callers as use of this flag can result in surprising behaviour (and
    hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Reviewed-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • This removes the 'write' and 'force' from get_user_pages() and replaces
    them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
    as use of this flag can result in surprising behaviour (and hence bugs)
    within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Christian König
    Acked-by: Jesper Nilsson
    Acked-by: Michal Hocko
    Reviewed-by: Jan Kara
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
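
    The resulting API change, in prototype form (a sketch of the
    before/after get_user_pages() signature from this era;
    get_user_pages_remote() gets the same treatment with its leading tsk/mm
    arguments kept):

        /* before: separate write/force ints */
        long get_user_pages(unsigned long start, unsigned long nr_pages,
                            int write, int force, struct page **pages,
                            struct vm_area_struct **vmas);

        /* after: a single gup_flags word, with FOLL_WRITE / FOLL_FORCE
         * spelled out explicitly by each caller */
        long get_user_pages(unsigned long start, unsigned long nr_pages,
                            unsigned int gup_flags, struct page **pages,
                            struct vm_area_struct **vmas);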