
02 Mar, 2017

4 commits

  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …sched/numa_balancing.h>

    We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
    maps to <linux/sched.h> to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder <linux/sched/mm.h> file that just maps to
    <linux/sched.h> to make this patch obviously correct and bisectable (a
    sketch of such a placeholder follows this entry).

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
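
    A minimal sketch of what such a placeholder header looks like; the
    guard name is an assumption, but the body is just a redirect to
    <linux/sched.h> as described above:

        /* include/linux/sched/mm.h -- placeholder sketch, not verbatim */
        #ifndef _LINUX_SCHED_MM_H
        #define _LINUX_SCHED_MM_H

        /*
         * For now this header simply maps back to <linux/sched.h>; the
         * mm-related APIs listed above are moved here by later patches.
         */
        #include <linux/sched.h>

        #endif /* _LINUX_SCHED_MM_H */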
     

25 Feb, 2017

8 commits

  • Patch series "Numabalancing preserve write fix", v2.

    This patch series addresses an issue w.r.t. THP migration and the
    autonuma preserve-write feature. migrate_misplaced_transhuge_page()
    cannot deal with concurrent modification of the page. It does a page
    copy without following the migration pte sequence. IIUC, this was done
    to keep the migration simpler, and at the time of implementation we
    didn't have THP page cache, which would have required a more elaborate
    migration scheme. That means THP autonuma migration expects the
    protnone with saved write to be done such that both kernel and user
    cannot update the page content. This patch series enables archs like
    ppc64 to do that. We are good with the hash translation mode with the
    current code, because we never create a hardware page table entry for
    a protnone pte.

    This patch (of 2):

    Autonuma preserves the write permission across a numa fault to avoid
    taking a write fault after a numa fault (commit b191f9b106ea "mm: numa:
    preserve PTE write permissions across a NUMA hinting fault").
    Architectures can implement protnone in different ways, and some may
    choose to do so by clearing the Read/Write/Exec bits of the pte.
    Setting the write bit on such a pte can result in wrong behaviour. Fix
    this up by allowing the architecture to override how the write bit is
    saved on a protnone pte (a sketch of the generic fallbacks follows
    this entry).

    [aneesh.kumar@linux.vnet.ibm.com: don't mark pte saved write in case of dirty_accountable]
    Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    [aneesh.kumar@linux.vnet.ibm.com: v3]
    Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1487050314-3892-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Michael Neuling
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
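
    A minimal sketch of the generic fallbacks this enables, assuming the
    pte_savedwrite()/pte_mk_savedwrite() helper names and illustrative
    variable names; by default the saved-write bit is just the ordinary
    write bit, and an architecture like ppc64 can override these:

        /* Sketch: default definitions an arch may override. */
        #ifndef pte_savedwrite
        #define pte_savedwrite(pte)     pte_write(pte)
        #endif

        #ifndef pte_mk_savedwrite
        #define pte_mk_savedwrite(pte)  pte_mkwrite(pte)
        #endif

        /*
         * NUMA hinting fault path, conceptually: remember whether the pte
         * was writable before it became protnone, so the write bit can be
         * restored later without taking an extra write fault.
         */
        bool was_writable = pte_savedwrite(oldpte);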
     
  • Architectures like ppc64 use a privilege access bit to mark a pte
    non-accessible. This implies that the kernel can do a copy_to_user to
    an address marked for a numa fault. It also implies that there can be
    a parallel hardware update of the pte. set_pte_at() cannot be used in
    such scenarios. Hence switch the pte update to a ptep_get_and_clear()
    and set_pte_at() combination (the pattern is sketched after this
    entry).

    [akpm@linux-foundation.org: remove unwanted ppc change, per Aneesh]
    Link: http://lkml.kernel.org/r/1486400776-28114-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
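
    A rough sketch of the resulting update pattern (context variables are
    illustrative, not verbatim kernel code): clear the pte first so a
    parallel hardware update cannot race with the modification, then
    install the new value:

        pte_t oldpte, newpte;

        oldpte = ptep_get_and_clear(mm, addr, ptep); /* pte now clear; HW cannot update it */
        newpte = pte_modify(oldpte, newprot);        /* e.g. turn it into a protnone pte   */
        set_pte_at(mm, addr, ptep, newpte);          /* publish the final pte in one step  */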
     
  • Fix whitespace issues, extraneous braces.

    Link: http://lkml.kernel.org/r/1485992240-10986-5-git-send-email-me@tobin.cc
    Signed-off-by: Tobin C Harding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobin C Harding
     
  • This patch fixes the sparse warning "Using plain integer as NULL
    pointer" by replacing the assignment of 0 to a pointer with a NULL
    assignment.

    Link: http://lkml.kernel.org/r/1485992240-10986-2-git-send-email-me@tobin.cc
    Signed-off-by: Tobin C Harding
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobin C Harding
     
  • Since the introduction of FAULT_FLAG_SIZE to the vm_fault flags,
    getting the flags set and removed at the correct locations has been
    somewhat painful. More than one kernel oops was introduced due to the
    difficulty of getting the placement correct.

    Remove the flag values and introduce an input parameter to huge_fault
    that indicates the size of the page entry. This makes the code easier
    to trace and should avoid the issues we saw with the fault flags,
    where removal of the flag was necessary in the fallback paths. (The
    resulting handler shape is sketched after this entry.)

    Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Tested-by: Dan Williams
    Reviewed-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
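
    The resulting handler shape, sketched from the description above (the
    enum and member names reflect what this series introduces; treat the
    details as illustrative):

        /* The size of the faulting entry is now an explicit argument. */
        enum page_entry_size {
                PE_SIZE_PTE = 0,
                PE_SIZE_PMD,
                PE_SIZE_PUD,
        };

        struct vm_operations_struct {
                /* ... */
                int (*huge_fault)(struct vm_fault *vmf,
                                  enum page_entry_size pe_size);
                /* ... */
        };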
     
  • The current transparent hugepage code only supports PMDs. This patch
    adds support for transparent use of PUDs with DAX. It does not include
    support for anonymous pages. x86 support code is also added.

    Most of this patch simply parallels the work that was done for huge
    PMDs. The only major difference is how the new ->pud_entry method in
    mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
    whereas the ->pud_entry method works along with either ->pmd_entry or
    ->pte_entry (see the sketch after this entry). The pagewalk code takes
    care of locking the PUD before calling ->pud_entry, so handlers do not
    need to worry about whether the PUD is stable.

    [dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
    Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
    [dave.jiang@intel.com: native_pud_clear missing on i386 build]
    Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Tested-by: Alexander Kapshuk
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
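
    A sketch of the pagewalk callbacks described above (the struct is
    abridged; only the entry callbacks are shown):

        struct mm_walk {
                int (*pud_entry)(pud_t *pud, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk);
                int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk);
                int (*pte_entry)(pte_t *pte, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk);
                /* ... remaining callbacks and walk state ... */
        };

    The pagewalk core takes the PUD lock before invoking ->pud_entry, so a
    handler can treat the PUD as stable for the duration of the callback.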
     
  • Patch series "1G transparent hugepage support for device dax", v2.

    The following series implements support for 1G transparent hugepages
    on x86 for device dax. The bulk of the code was written by Matthew
    Wilcox a while back supporting transparent 1G hugepages for fs DAX. I
    have forward ported the relevant bits to 4.10-rc. The current
    submission has only the necessary code to support device DAX.

    Comments from Dan Williams: So the motivation and intended users of
    this functionality mirror the motivation and users of 1GB page support
    in hugetlbfs. Given the expected capacities of persistent memory
    devices, an in-memory database may want to reduce TLB pressure beyond
    what it can already achieve with 2MB mappings of a device-dax file. We
    have customer feedback to that effect, as Willy mentioned in his
    previous version of these patches [1].

    [1]: https://lkml.org/lkml/2016/1/31/52

    Comments from Nilesh @ Oracle:

    There are applications which have a process model; if you assume
    10,000 processes attempting to mmap all of the 6TB of memory available
    on a server, we are looking at the following:

    processes : 10,000
    memory : 6TB
    pte @ 4K page size: 8 bytes per 4K of memory * #processes = (6TB / 4K) * 8 * 10,000 = 12GB * 10,000 = 120,000GB
    pmd @ 2M page size: 120,000GB / 512 = ~240GB
    pud @ 1G page size: 240GB / 512 = ~480MB

    As you can see, with 2M pages this system will use up an exorbitant
    amount of DRAM just to hold the page tables, but 1G pages finally
    bring it down to a reasonable level. Memory sizes will keep
    increasing, so this number will keep increasing. (The arithmetic is
    reproduced as a small program after this entry.)

    An argument can be made to convert the applications from a process
    model to a thread model, but in the real world that may not always be
    practical. Hopefully this helps explain the use case where this is
    valuable.

    This patch (of 3):

    In preparation for adding the ability to handle PUD pages, convert
    vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The
    vm_fault structure is extended to include a union of the different page
    table pointers that may be needed, and three flag bits are reserved to
    indicate which type of pointer is in the union.

    [ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
    Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
    [dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
    Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Signed-off-by: Ross Zwisler
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
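
    The page-table arithmetic from the cover letter above, reproduced as a
    small stand-alone program (binary units and the example process count
    are carried over from the text as assumptions):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long mem   = 6ULL << 40; /* 6TB mapped per process */
                unsigned long long procs = 10000;      /* number of processes    */
                unsigned long long esz   = 8;          /* bytes per table entry  */

                /* one leaf entry per 4K, 2M or 1G of mapped memory, per process */
                unsigned long long pte = mem / (4ULL << 10) * esz * procs;
                unsigned long long pmd = mem / (2ULL << 20) * esz * procs;
                unsigned long long pud = mem / (1ULL << 30) * esz * procs;

                printf("4K pages: %llu GB of page tables\n", pte >> 30); /* 120000, the 120,000GB above */
                printf("2M pages: %llu GB\n", pmd >> 30);                /* 234, the ~240GB above       */
                printf("1G pages: %llu MB\n", pud >> 20);                /* 468, the ~480MB above       */
                return 0;
        }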
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
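
    In prototype terms the change amounts to the following (old form shown
    for contrast; the vma is reachable as vmf->vma):

        /* Before: the vma was passed separately, although vmf carries it. */
        int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
        int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
        int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);

        /* After: handlers take only the vm_fault and use vmf->vma. */
        int (*fault)(struct vm_fault *vmf);
        int (*page_mkwrite)(struct vm_fault *vmf);
        int (*pfn_mkwrite)(struct vm_fault *vmf);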
     

23 Feb, 2017

7 commits

  • There are no users of zap_page_range() who want a non-NULL 'details'
    argument. Let's drop it (the resulting prototype is sketched after
    this entry).

    Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
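
    The prototype change this implies, sketched (the pre-change signature
    is recalled rather than quoted and may differ in detail):

        /* Before: every caller passed details == NULL anyway. */
        void zap_page_range(struct vm_area_struct *vma, unsigned long start,
                            unsigned long size, struct zap_details *details);

        /* After: the unused parameter is gone. */
        void zap_page_range(struct vm_area_struct *vma, unsigned long start,
                            unsigned long size);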
     
  • details == NULL would give the same functionality as
    .check_swap_entries == true.

    Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • The only user of ignore_dirty is the oom-reaper. But it doesn't
    really use it.

    ignore_dirty only has an effect on file pages mapped with a dirty pte.
    But the oom-reaper skips shared VMAs, so there's no way we can dirty a
    file pte in them.

    Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • The new routine copy_huge_page_from_user() uses kmap_atomic() to map
    PAGE_SIZE pages. However, this prevents page faults in the subsequent
    call to copy_from_user(). This is OK in the case where the routine is
    called with mmap_sem held. However, in other cases we want to allow
    page faults. So, add a new argument allow_pagefault to indicate
    whether the routine should allow page faults (the mapping pattern is
    sketched after this entry).

    [dan.carpenter@oracle.com: unmap the correct pointer]
    Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda
    [akpm@linux-foundation.org: kunmap() takes a page*, per Hugh]
    Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Dan Carpenter
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Rapoport
    Cc: Pavel Emelyanov
    Cc: Hugh Dickins
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
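
    A sketch of the per-page copy with the new flag (variable names are
    illustrative): kmap_atomic() disables page faults for the copy, while
    kmap() allows them.

        void *kaddr;
        unsigned long rc;

        if (allow_pagefault)
                kaddr = kmap(dst_page);        /* copy_from_user() may fault and sleep */
        else
                kaddr = kmap_atomic(dst_page); /* atomic mapping: faults are disabled  */

        rc = copy_from_user(kaddr, usr_src, PAGE_SIZE);

        if (allow_pagefault)
                kunmap(dst_page);              /* kunmap() takes the page              */
        else
                kunmap_atomic(kaddr);          /* kunmap_atomic() takes the mapping    */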
     
  • userfaultfd UFFDIO_COPY allows user level code to copy data to a page at
    fault time. The data is copied from user space to a newly allocated
    huge page. The new routine copy_huge_page_from_user performs this copy.

    Link: http://lkml.kernel.org/r/20161216144821.5183-17-aarcange@redhat.com
    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Hillf Danton
    Cc: Michael Rapoport
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • pmd_fault() and related functions really only need the vmf parameter since
    the additional parameters are all included in the vmf struct. Remove the
    additional parameter and simplify pmd_fault() and friends.

    Link: http://lkml.kernel.org/r/1484085142-2297-8-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Dave Jiang
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Dave Jiang
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Instead of passing in multiple parameters in the pmd_fault() handler,
    a vmf can be passed in just like a fault() handler. This will simplify
    code and remove the need for the actual pmd fault handlers to allocate a
    vmf. Related functions are also modified to do the same.

    [dave.jiang@intel.com: fix issue with xfs_tests stall when DAX option is off]
    Link: http://lkml.kernel.org/r/148469861071.195597.3619476895250028518.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/1484085142-2297-7-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Dave Jiang
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

11 Jan, 2017

2 commits

  • Currently dax_mapping_entry_mkclean() fails to clean and write protect
    the pmd_t of a DAX PMD entry during an *sync operation. This can result
    in data loss in the following sequence:

    1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the
    pmd_t dirty and writeable
    2) fsync, flushing out PMD data and cleaning the radix tree entry. We
    currently fail to mark the pmd_t as clean and write protected.
    3) more mmap writes to the PMD. These don't cause any page faults since
    the pmd_t is dirty and writeable. The radix tree entry remains clean.
    4) fsync, which fails to flush the dirty PMD data because the radix tree
    entry was clean.
    5) crash - dirty data that should have been fsync'd as part of 4) could
    still have been in the processor cache, and is lost.

    Fix this by marking the pmd_t clean and write protected in
    dax_mapping_entry_mkclean(), which is called as part of the fsync
    operation 2). This will cause the writes in step 3) above to generate
    page faults where we'll re-dirty the PMD radix tree entry, resulting in
    flushes in the fsync that happens in step 4).

    Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
    Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
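
    Conceptually, the fix performs for the PMD what is already done for
    PTEs; a rough sketch of the clean-and-write-protect step (helper usage
    assumed, not verbatim kernel code):

        pmd_t pmd;

        flush_cache_page(vma, address, pfn);
        pmd = pmdp_huge_clear_flush(vma, address, pmdp); /* returns the old PMD */
        pmd = pmd_wrprotect(pmd);   /* the next mmap write will fault again ... */
        pmd = pmd_mkclean(pmd);     /* ... and re-dirty the radix tree entry    */
        set_pmd_at(vma->vm_mm, address, pmdp, pmd);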
     
  • Patch series "Write protect DAX PMDs in *sync path".

    Currently dax_mapping_entry_mkclean() fails to clean and write protect
    the pmd_t of a DAX PMD entry during an *sync operation. This can result
    in data loss, as detailed in patch 2.

    This series is based on Dan's "libnvdimm-pending" branch, which is the
    current home for Jan's "dax: Page invalidation fixes" series. You can
    find a working tree here:

    https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean

    This patch (of 2):

    Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
    huge page PMD leaf to be found and returned.

    Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Suggested-by: Dave Hansen
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
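
    The helper's shape as far as it can be inferred from the description
    above; the exact parameter list is an assumption:

        /*
         * Sketch (assumed signature): on success either *ptepp or *pmdpp
         * is set, and *ptlp is the page-table lock the caller must drop.
         */
        int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
                           pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);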
     

08 Jan, 2017

1 commit

  • 4.10-rc loadtest (even on x86, and even without THPCache) fails with
    "fork: Cannot allocate memory" or some such; and /proc/meminfo shows
    PageTables growing.

    Commit 953c66c2b22a ("mm: THP page cache support for ppc64") that got
    merged in rc1 removed the freeing of an unused preallocated pagetable
    after do_fault_around() has called map_pages().

    This is usually a good optimization, so that the followup doesn't have
    to reallocate one; but it's not sufficient to shift the freeing into
    alloc_set_pte(), since there are failure cases (most commonly
    VM_FAULT_RETRY) which never reach finish_fault().

    Check and free it at the outer level in do_fault(), then we don't need
    to worry in alloc_set_pte(), and can restore that to how it was (I
    cannot find any reason to pte_free() under lock as it was doing).

    And fix a separate pagetable leak, or crash, introduced by the same
    change, that could only show up on some ppc64: why does do_set_pmd()'s
    failure case attempt to withdraw a pagetable when it never deposited
    one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
    Residue of an earlier implementation, perhaps? Delete it.

    Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64")
    Cc: Aneesh Kumar K.V
    Cc: Kirill A. Shutemov
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Michael Neuling
    Cc: Paul Mackerras
    Cc: Balbir Singh
    Cc: Andrew Morton
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
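
    The outer-level check described above, sketched (field and helper
    names as used in that code path):

        /* End of do_fault(): drop a preallocated page table nobody used. */
        if (vmf->prealloc_pte) {
                pte_free(vma->vm_mm, vmf->prealloc_pte);
                vmf->prealloc_pte = NULL;
        }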
     


15 Dec, 2016

13 commits

  • Merge more updates from Andrew Morton:

    - a few misc things

    - kexec updates

    - DMA-mapping updates to better support networking DMA operations

    - IPC updates

    - various MM changes to improve DAX fault handling

    - lots of radix-tree changes, mainly to the test suite. All leading up
    to reimplementing the IDA/IDR code to be a wrapper layer over the
    radix-tree. However the final trigger-pulling patch is held off for
    4.11.

    * emailed patches from Andrew Morton : (114 commits)
    radix tree test suite: delete unused rcupdate.c
    radix tree test suite: add new tag check
    radix-tree: ensure counts are initialised
    radix tree test suite: cache recently freed objects
    radix tree test suite: add some more functionality
    idr: reduce the number of bits per level from 8 to 6
    rxrpc: abstract away knowledge of IDR internals
    tpm: use idr_find(), not idr_find_slowpath()
    idr: add ida_is_empty
    radix tree test suite: check multiorder iteration
    radix-tree: fix replacement for multiorder entries
    radix-tree: add radix_tree_split_preload()
    radix-tree: add radix_tree_split
    radix-tree: add radix_tree_join
    radix-tree: delete radix_tree_range_tag_if_tagged()
    radix-tree: delete radix_tree_locate_item()
    radix-tree: improve multiorder iterators
    btrfs: fix race in btrfs_free_dummy_fs_info()
    radix-tree: improve dump output
    radix-tree: make radix_tree_find_next_bit more useful
    ...

    Linus Torvalds
     
  • Currently the PTE gets updated in wp_pfn_shared() after
    dax_pfn_mkwrite() has released the corresponding radix tree entry
    lock. When we want to write-protect the PTE on cache flush, we need
    the PTE modification to happen under the radix tree entry lock to
    ensure consistent updates of the PTE and the radix tree (standard
    faults use the page lock to ensure this consistency). So move the
    update of the PTE bit into dax_pfn_mkwrite().

    Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • DAX will need to implement its own version of page_check_address(). To
    avoid duplicating page table walking code, export follow_pte() which
    does what we need.

    Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently finish_mkwrite_fault() returns 0 when the PTE got changed
    before we acquired the PTE lock, and VM_FAULT_WRITE when we succeeded
    in modifying the PTE. This is somewhat confusing since 0 generally
    means success; it is also inconsistent with finish_fault(), which
    returns 0 on success. Change finish_mkwrite_fault() to return 0 on
    success and VM_FAULT_NOPAGE when the PTE changed. Practically, there
    should be no behavioral difference since we bail out of the fault the
    same way regardless of whether we return 0, VM_FAULT_NOPAGE, or
    VM_FAULT_WRITE. Also note that VM_FAULT_WRITE has no effect for shared
    mappings since the only two places that check it - KSM and GUP - care
    about private mappings only. Generally the meaning of VM_FAULT_WRITE
    for shared mappings is not well defined and we should probably clean
    that up.

    Link: http://lkml.kernel.org/r/1479460644-25076-17-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Acked-by: Kirill A. Shutemov
    Cc: Ross Zwisler
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Provide a helper function for finishing write faults due to the PTE
    being read-only. The helper will be used by DAX to avoid the need to
    complicate generic MM code with DAX locking specifics.

    Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The write-shared fault handling in wp_page_reuse() is needed only by
    wp_page_shared(). Move that handling into wp_page_shared() to make
    wp_page_reuse() simpler and to avoid the strange situation where we
    sometimes pass in a locked page and sometimes an unlocked one, etc.

    Link: http://lkml.kernel.org/r/1479460644-25076-15-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • So far we set vmf->page during WP faults only when we needed to pass it
    to the ->page_mkwrite handler. Set it in all the cases now and use that
    instead of passing page pointer explicitly around.

    Link: http://lkml.kernel.org/r/1479460644-25076-14-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • We will need more information in the ->page_mkwrite() helper for DAX to
    be able to fully finish faults there. Pass vm_fault structure to
    do_page_mkwrite() and use it there so that information propagates
    properly from upper layers.

    Link: http://lkml.kernel.org/r/1479460644-25076-13-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently we duplicate handling of shared write faults in
    wp_page_reuse() and do_shared_fault(). Factor them out into a common
    function.

    Link: http://lkml.kernel.org/r/1479460644-25076-12-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Move the final handling of COW faults from generic code into the DAX
    fault handler. That way generic code doesn't have to be aware of the
    peculiarities of DAX locking, so remove that knowledge and make the
    locking functions private to fs/dax.c.

    Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Ross Zwisler
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Introduce finish_fault() as a helper function for finishing page
    faults. It is a rather thin wrapper around alloc_set_pte(), but since
    we'd want to call this from DAX code or filesystems, it is still
    useful to avoid some boilerplate code (a usage sketch follows this
    entry).

    Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
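
    Roughly how a read-fault path would use the new helper (a sketch of
    the shape, not the exact call sites): the ->fault handler produces
    vmf->page, and finish_fault() maps it.

        int ret;

        ret = __do_fault(vmf);          /* fills vmf->page via the fs ->fault  */
        if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
                return ret;

        ret |= finish_fault(vmf);       /* thin wrapper around alloc_set_pte() */
        unlock_page(vmf->page);
        return ret;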
     
  • Patch series "dax: Clear dirty bits after flushing caches", v5.

    Patchset to clear dirty bits from radix tree of DAX inodes when caches
    for corresponding pfns have been flushed. In principle, these patches
    enable handlers to easily update PTEs and do other work necessary to
    finish the fault without duplicating the functionality present in the
    generic code. I'd like to thank Kirill and Ross for reviews of the
    series!

    This patch (of 20):

    To allow full handling of COW faults add memcg field to struct vm_fault
    and a return value of ->fault() handler meaning that COW fault is fully
    handled and memcg charge must not be canceled. This will allow us to
    remove knowledge about special DAX locking from the generic fault code.

    Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Add orig_pte field to vm_fault structure to allow ->page_mkwrite
    handlers to fully handle the fault.

    This also allows us to save some passing of extra arguments around.

    Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara