02 Apr, 2017
1 commit
-
…ller/parisc-linux into uaccess.parisc
29 Mar, 2017
1 commit
-
Signed-off-by: Al Viro
10 Mar, 2017
2 commits
-
For full 5-level paging we need a helper to allocate a p4d page table.
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
-
Convert all non-architecture-specific code to 5-level paging.
It's mostly mechanical: add handling of one more page table level in
the places where we deal with pud_t.
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
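A minimal sketch of the allocation path after these two commits, assuming
the standard pgd/p4d/pud/pmd helpers from the series; the wrapper function
itself is hypothetical:

  /* Walk down to (and allocate) the pmd level, with the extra p4d step. */
  static pmd_t *walk_alloc_pmd(struct mm_struct *mm, unsigned long addr)
  {
          pgd_t *pgd = pgd_offset(mm, addr);
          p4d_t *p4d = p4d_alloc(mm, pgd, addr);  /* the new p4d helper */
          pud_t *pud;

          if (!p4d)
                  return NULL;
          pud = pud_alloc(mm, p4d, addr);         /* now takes a p4d, not a pgd */
          if (!pud)
                  return NULL;
          return pmd_alloc(mm, pud, addr);
  }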
02 Mar, 2017
4 commits
-
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder file that just maps to to make this patch
obviously correct and bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
…sched/numa_balancing.h>
We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder file that just maps to to make this patch
obviously correct and bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
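A sketch of the trivial placeholder header this commit describes; the
<linux/sched/mm.h> path is inferred from the APIs listed above and should
be treated as an assumption:

  /* include/linux/sched/mm.h -- placeholder until the code actually moves */
  #ifndef _LINUX_SCHED_MM_H
  #define _LINUX_SCHED_MM_H

  #include <linux/sched.h>  /* everything still lives in sched.h for now */

  #endif /* _LINUX_SCHED_MM_H */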
25 Feb, 2017
8 commits
-
Patch series "Numabalancing preserve write fix", v2.
This patch series addresses an issue w.r.t. THP migration and the autonuma
preserve write feature. migrate_misplaced_transhuge_page() cannot deal
with concurrent modification of the page. It does a page copy without
following the migration pte sequence. IIUC, this was done to keep the
migration simpler, and at the time of implementation we didn't have THP
page cache, which would have required a more elaborate migration scheme.
That means thp autonuma migration expects protnone with saved write
to be done such that both kernel and user cannot update the page
content. This patch series enables archs like ppc64 to do that. We are
good with the hash translation mode with the current code, because we
never create a hardware page table entry for a protnone pte.
This patch (of 2):
Autonuma preserves the write permission across numa fault to avoid
taking a writefault after a numa fault (commit b191f9b106ea "mm: numa:
preserve PTE write permissions across a NUMA hinting fault").
Architectures can implement protnone in different ways and some may
choose to implement that by clearing the Read/Write/Exec bits of the pte.
Setting the write bit on such a pte can result in wrong behaviour. Fix
this up by allowing the arch to override how to save the write bit on a
protnone pte.
[aneesh.kumar@linux.vnet.ibm.com: don't mark pte saved write in case of dirty_accountable]
Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
[aneesh.kumar@linux.vnet.ibm.com: v3]
Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1487050314-3892-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Michael Neuling
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
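A hedged sketch of the override point this patch describes: generic code
defaults the "saved write" helpers to the plain write-bit operations, and
an arch like ppc64 overrides them. The helper names are assumptions:

  /* Generic fallbacks; an architecture may define its own versions. */
  #ifndef pte_savedwrite
  #define pte_savedwrite(pte)     pte_write(pte)    /* query the saved write bit */
  #endif

  #ifndef pte_mk_savedwrite
  #define pte_mk_savedwrite(pte)  pte_mkwrite(pte)  /* stash write across protnone */
  #endif
-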
Architectures like ppc64 use a privilege access bit to mark ptes non
accessible. This implies that the kernel can do a copy_to_user to an
address marked for numa fault. This also implies that there can be a
parallel hardware update for the pte. set_pte_at cannot be used in such
scenarios. Hence switch the pte update to use a ptep_get_and_clear and
set_pte_at combination.
[akpm@linux-foundation.org: remove unwanted ppc change, per Aneesh]
Link: http://lkml.kernel.org/r/1486400776-28114-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V
Acked-by: Rik van Riel
Acked-by: Mel Gorman
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
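A minimal sketch of the update pattern this commit switches to, so that a
parallel hardware update of the pte cannot be lost (the surrounding
variable names are illustrative):

  ptent = ptep_get_and_clear(mm, addr, pte);  /* atomically take the old pte */
  ptent = pte_modify(ptent, newprot);         /* apply the new protection */
  set_pte_at(mm, addr, pte, ptent);           /* publish the updated pte */
-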
Fix whitespace issues, extraneous braces.
Link: http://lkml.kernel.org/r/1485992240-10986-5-git-send-email-me@tobin.cc
Signed-off-by: Tobin C Harding
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch fixes sparse warning: Using plain integer as NULL pointer.
Replaces assignment of 0 to pointer with NULL assignment.
Link: http://lkml.kernel.org/r/1485992240-10986-2-git-send-email-me@tobin.cc
Signed-off-by: Tobin C Harding
Acked-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Since the introduction of FAULT_FLAG_SIZE to the vm_fault flag, it has
been somewhat painful to get the flags set and removed at the correct
locations. More than one kernel oops was introduced due to the
difficulty of getting the placement correct.
Remove the flag values and introduce an input parameter to huge_fault
that indicates the size of the page entry. This makes the code easier
to trace and should avoid the issues we see with the fault flags, where
removal of the flag was necessary in the fallback paths.
Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang
Tested-by: Dan Williams
Reviewed-by: Jan Kara
Cc: Matthew Wilcox
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Ross Zwisler
Cc: Kirill A. Shutemov
Cc: Nilesh Choudhury
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
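A sketch of the resulting interface; the enum name and members follow the
description above but should be read as assumptions:

  enum page_entry_size {
          PE_SIZE_PTE = 0,
          PE_SIZE_PMD,
          PE_SIZE_PUD,
  };

  /* in struct vm_operations_struct: the size now arrives as a parameter */
  int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
-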
The current transparent hugepage code only supports PMDs. This patch
adds support for transparent use of PUDs with DAX. It does not include
support for anonymous pages. x86 support code is also added.
Most of this patch simply parallels the work that was done for huge
PMDs. The only major difference is how the new ->pud_entry method in
mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
whereas the ->pud_entry method works along with either ->pmd_entry or
->pte_entry. The pagewalk code takes care of locking the PUD before
calling ->pud_entry, so handlers do not need to worry whether the PUD is
stable.
[dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
[dave.jiang@intel.com: native_pud_clear missing on i386 build]
Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox
Signed-off-by: Dave Jiang
Tested-by: Alexander Kapshuk
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Dan Williams
Cc: Ross Zwisler
Cc: Kirill A. Shutemov
Cc: Nilesh Choudhury
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
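A hedged sketch of a ->pud_entry callback as described above; the pagewalk
code has already locked the PUD, so the handler may test it safely:

  static int sketch_pud_entry(pud_t *pud, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
  {
          if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
                  /* handle the huge PUD; it is stable under the walker's lock */
                  return 0;
          }
          /* not huge: the walk continues into ->pmd_entry / ->pte_entry */
          return 0;
  }
-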
Patch series "1G transparent hugepage support for device dax", v2.
The following series implements support for 1G transparent hugepages on
x86 for device dax. The bulk of the code was written by Matthew Wilcox a
while back supporting transparent 1G hugepages for fs DAX. I have
forward ported the relevant bits to 4.10-rc. The current submission has
only the necessary code to support device DAX.
Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs. Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file. We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].
[1]: https://lkml.org/lkml/2016/1/31/52
Comments from Nilesh @ Oracle:
There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:
processes : 10,000
memory : 6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB
As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally bring
it down to a reasonable level. Memory sizes will keep increasing, so
this number will keep increasing.
An argument can be made to convert the applications from the process model
to the thread model, but in the real world that may not always be practical.
Hopefully this helps explain the use case where this is valuable.
This patch (of 3):
In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.
[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox
Signed-off-by: Dave Jiang
Signed-off-by: Ross Zwisler
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Dan Williams
Cc: Kirill A. Shutemov
Cc: Nilesh Choudhury
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Dave Jiang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.
Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang
Signed-off-by: Arnd Bergmann
Reviewed-by: Ross Zwisler
Cc: Theodore Ts'o
Cc: Darrick J. Wong
Cc: Matthew Wilcox
Cc: Dave Hansen
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
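A minimal sketch of a handler after this change; the vma now travels
inside the vm_fault, and the handler body is illustrative:

  static int sketch_fault(struct vm_fault *vmf)
  {
          struct vm_area_struct *vma = vmf->vma;  /* no separate vma argument */

          /* ... handle the fault against vma ... */
          return VM_FAULT_NOPAGE;
  }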
23 Feb, 2017
7 commits
-
There are no users of zap_page_range() who want a non-NULL 'details'.
Let's drop it.
Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Cc: Tetsuo Handa
Cc: Peter Zijlstra
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
detail == NULL would give the same functionality as
.check_swap_entries==true.
Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Cc: Tetsuo Handa
Cc: Peter Zijlstra
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The only user of ignore_dirty is the oom-reaper. But it doesn't really use
it.
ignore_dirty only has an effect on file pages mapped with a dirty pte. But
the oom-reaper skips shared VMAs, so there's no way we can dirty a file pte
in them.
Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Cc: Tetsuo Handa
Cc: Peter Zijlstra
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The new routine copy_huge_page_from_user() uses kmap_atomic() to map
PAGE_SIZE pages. However, this prevents page faults in the subsequent
call to copy_from_user(). This is OK in the case where the routine is
called with mmap_sem held. However, in another case we want to allow
page faults. So, add a new argument allow_pagefault to indicate if the
routine should allow page faults.
[dan.carpenter@oracle.com: unmap the correct pointer]
Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda
[akpm@linux-foundation.org: kunmap() takes a page*, per Hugh]
Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.com
Signed-off-by: Mike Kravetz
Signed-off-by: Andrea Arcangeli
Signed-off-by: Dan Carpenter
Cc: "Dr. David Alan Gilbert"
Cc: Hillf Danton
Cc: Michael Rapoport
Cc: Mike Rapoport
Cc: Pavel Emelyanov
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
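A hedged sketch of the mapping choice described above; the function name
and exact signature are assumptions based on the text:

  static long sketch_copy_subpage(struct page *page, const void __user *usr,
                                  bool allow_pagefault)
  {
          void *kaddr;
          long rc;

          if (allow_pagefault)
                  kaddr = kmap(page);         /* page faults stay enabled */
          else
                  kaddr = kmap_atomic(page);  /* page faults are disabled */

          rc = copy_from_user(kaddr, usr, PAGE_SIZE);  /* bytes NOT copied */

          if (allow_pagefault)
                  kunmap(page);               /* kunmap() takes the page */
          else
                  kunmap_atomic(kaddr);
          return rc;
  }
-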
userfaultfd UFFDIO_COPY allows user level code to copy data to a page at
fault time. The data is copied from user space to a newly allocated
huge page. The new routine copy_huge_page_from_user performs this copy.
Link: http://lkml.kernel.org/r/20161216144821.5183-17-aarcange@redhat.com
Signed-off-by: Mike Kravetz
Signed-off-by: Andrea Arcangeli
Cc: "Dr. David Alan Gilbert"
Cc: Hillf Danton
Cc: Michael Rapoport
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
pmd_fault() and related functions really only need the vmf parameter since
the additional parameters are all included in the vmf struct. Remove the
additional parameter and simplify pmd_fault() and friends.
Link: http://lkml.kernel.org/r/1484085142-2297-8-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang
Reviewed-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: Dave Chinner
Cc: Dave Jiang
Cc: Matthew Wilcox
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Instead of passing in multiple parameters in the pmd_fault() handler,
a vmf can be passed in just like a fault() handler. This will simplify
code and remove the need for the actual pmd fault handlers to allocate a
vmf. Related functions are also modified to do the same.
[dave.jiang@intel.com: fix issue with xfs_tests stall when DAX option is off]
Link: http://lkml.kernel.org/r/148469861071.195597.3619476895250028518.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/1484085142-2297-7-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang
Reviewed-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: Dave Chinner
Cc: Matthew Wilcox
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jan, 2017
2 commits
-
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pmd_t of a DAX PMD entry during an *sync operation. This can result
in data loss in the following sequence:
1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the
pmd_t dirty and writeable
2) fsync, flushing out PMD data and cleaning the radix tree entry. We
currently fail to mark the pmd_t as clean and write protected.
3) more mmap writes to the PMD. These don't cause any page faults since
the pmd_t is dirty and writeable. The radix tree entry remains clean.
4) fsync, which fails to flush the dirty PMD data because the radix tree
entry was clean.
5) crash - dirty data that should have been fsync'd as part of 4) could
still have been in the processor cache, and is lost.
Fix this by marking the pmd_t clean and write protected in
dax_mapping_entry_mkclean(), which is called as part of the fsync
operation 2). This will cause the writes in step 3) above to generate
page faults where we'll re-dirty the PMD radix tree entry, resulting in
flushes in the fsync that happens in step 4).
Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: Alexander Viro
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jan Kara
Cc: Matthew Wilcox
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "Write protect DAX PMDs in *sync path".
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pmd_t of a DAX PMD entry during an *sync operation. This can result
in data loss, as detailed in patch 2.
This series is based on Dan's "libnvdimm-pending" branch, which is the
current home for Jan's "dax: Page invalidation fixes" series. You can
find a working tree here:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean
This patch (of 2):
Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
huge page PMD leaf to be found and returned.
Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler
Suggested-by: Dave Hansen
Cc: Alexander Viro
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jan Kara
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
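A hedged sketch (not the actual fs/dax.c code) of how such a helper lets
the fsync path clean and write protect whichever leaf it finds; the
follow_pte_pmd() signature here is an assumption based on the description:

  static void sketch_mkclean(struct vm_area_struct *vma, unsigned long address)
  {
          struct mm_struct *mm = vma->vm_mm;
          pte_t *ptep = NULL;
          pmd_t *pmdp = NULL;
          spinlock_t *ptl;

          if (follow_pte_pmd(mm, address, &ptep, &pmdp, &ptl))
                  return;                     /* nothing mapped here */

          if (pmdp) {                         /* huge PMD leaf */
                  pmd_t pmd = pmdp_huge_clear_flush(vma, address, pmdp);
                  set_pmd_at(mm, address, pmdp, pmd_wrprotect(pmd_mkclean(pmd)));
          } else {                            /* regular PTE leaf */
                  pte_t pte = ptep_clear_flush(vma, address, ptep);
                  set_pte_at(mm, address, ptep, pte_wrprotect(pte_mkclean(pte)));
          }
          spin_unlock(ptl);
  }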
08 Jan, 2017
1 commit
-
4.10-rc loadtest (even on x86, and even without THPCache) fails with
"fork: Cannot allocate memory" or some such; and /proc/meminfo shows
PageTables growing.
Commit 953c66c2b22a ("mm: THP page cache support for ppc64") that got
merged in rc1 removed the freeing of an unused preallocated pagetable
after do_fault_around() has called map_pages().
This is usually a good optimization, so that the followup doesn't have
to reallocate one; but it's not sufficient to shift the freeing into
alloc_set_pte(), since there are failure cases (most commonly
VM_FAULT_RETRY) which never reach finish_fault().
Check and free it at the outer level in do_fault(), then we don't need
to worry in alloc_set_pte(), and can restore that to how it was (I
cannot find any reason to pte_free() under lock as it was doing).
And fix a separate pagetable leak, or crash, introduced by the same
change, that could only show up on some ppc64: why does do_set_pmd()'s
failure case attempt to withdraw a pagetable when it never deposited
one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
Residue of an earlier implementation, perhaps? Delete it.
Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64")
Cc: Aneesh Kumar K.V
Cc: Kirill A. Shutemov
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Balbir Singh
Cc: Andrew Morton
Signed-off-by: Hugh Dickins
Signed-off-by: Linus Torvalds
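A minimal sketch of the outer-level fix described above (the dispatch
helper name is hypothetical):

  static int sketch_do_fault(struct vm_fault *vmf)
  {
          int ret = do_fault_dispatch(vmf);  /* hypothetical: read/cow/shared */

          /* Every path, including the VM_FAULT_RETRY failures that never
           * reach finish_fault(), passes through here, so an unused
           * preallocated page table can be freed safely. */
          if (vmf->prealloc_pte) {
                  pte_free(vmf->vma->vm_mm, vmf->prealloc_pte);
                  vmf->prealloc_pte = NULL;
          }
          return ret;
  }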
25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
	$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
15 Dec, 2016
13 commits
-
Merge more updates from Andrew Morton:
- a few misc things
- kexec updates
- DMA-mapping updates to better support networking DMA operations
- IPC updates
- various MM changes to improve DAX fault handling
- lots of radix-tree changes, mainly to the test suite. All leading up
to reimplementing the IDA/IDR code to be a wrapper layer over the
radix-tree. However the final trigger-pulling patch is held off for
4.11.
* emailed patches from Andrew Morton: (114 commits)
radix tree test suite: delete unused rcupdate.c
radix tree test suite: add new tag check
radix-tree: ensure counts are initialised
radix tree test suite: cache recently freed objects
radix tree test suite: add some more functionality
idr: reduce the number of bits per level from 8 to 6
rxrpc: abstract away knowledge of IDR internals
tpm: use idr_find(), not idr_find_slowpath()
idr: add ida_is_empty
radix tree test suite: check multiorder iteration
radix-tree: fix replacement for multiorder entries
radix-tree: add radix_tree_split_preload()
radix-tree: add radix_tree_split
radix-tree: add radix_tree_join
radix-tree: delete radix_tree_range_tag_if_tagged()
radix-tree: delete radix_tree_locate_item()
radix-tree: improve multiorder iterators
btrfs: fix race in btrfs_free_dummy_fs_info()
radix-tree: improve dump output
radix-tree: make radix_tree_find_next_bit more useful
...
-
Currently PTE gets updated in wp_pfn_shared() after dax_pfn_mkwrite()
has released corresponding radix tree entry lock. When we want to
writeprotect PTE on cache flush, we need PTE modification to happen
under radix tree entry lock to ensure consistent updates of PTE and
radix tree (standard faults use page lock to ensure this consistency).
So move update of PTE bit into dax_pfn_mkwrite().
Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Cc: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
DAX will need to implement its own version of page_check_address(). To
avoid duplicating page table walking code, export follow_pte() which
does what we need.
Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Cc: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently finish_mkwrite_fault() returns 0 when PTE got changed before
we acquired PTE lock and VM_FAULT_WRITE when we succeeded in modifying
the PTE. This is somewhat confusing since 0 generally means success; it
is also inconsistent with finish_fault(), which returns 0 on success.
Change finish_mkwrite_fault() to return 0 on success and VM_FAULT_NOPAGE
when PTE changed. Practically, there should be no behavioral difference
since we bail out from the fault the same way regardless whether we
return 0, VM_FAULT_NOPAGE, or VM_FAULT_WRITE. Also note that
VM_FAULT_WRITE has no effect for shared mappings since the only two
places that check it - KSM and GUP - care about private mappings only.
Generally the meaning of VM_FAULT_WRITE for shared mappings is not well
defined and we should probably clean that up.
Link: http://lkml.kernel.org/r/1479460644-25076-17-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Acked-by: Kirill A. Shutemov
Cc: Ross Zwisler
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Provide a helper function for finishing write faults due to PTE being
read-only. The helper will be used by DAX to avoid the need of
complicating generic MM code with DAX locking specifics.
Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
wp_page_reuse() handles write shared faults which is needed only in
wp_page_shared(). Move the handling only into that location to make
wp_page_reuse() simpler and avoid a strange situation when we sometimes
pass in a locked page, sometimes an unlocked one, etc.
Link: http://lkml.kernel.org/r/1479460644-25076-15-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
So far we set vmf->page during WP faults only when we needed to pass it
to the ->page_mkwrite handler. Set it in all the cases now and use that
instead of passing page pointer explicitly around.
Link: http://lkml.kernel.org/r/1479460644-25076-14-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We will need more information in the ->page_mkwrite() helper for DAX to
be able to fully finish faults there. Pass vm_fault structure to
do_page_mkwrite() and use it there so that information propagates
properly from upper layers.
Link: http://lkml.kernel.org/r/1479460644-25076-13-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently we duplicate handling of shared write faults in
wp_page_reuse() and do_shared_fault(). Factor them out into a common
function.
Link: http://lkml.kernel.org/r/1479460644-25076-12-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move final handling of COW faults from generic code into DAX fault
handler. That way generic code doesn't have to be aware of
peculiarities of DAX locking so remove that knowledge and make locking
functions private to fs/dax.c.
Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Acked-by: Kirill A. Shutemov
Reviewed-by: Ross Zwisler
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Introduce finish_fault() as a helper function for finishing page faults.
It is a rather thin wrapper around alloc_set_pte() but since we'd want to
call this from DAX code or filesystems, it is still useful to avoid some
boilerplate code.
Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
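A brief sketch of the intended use, per the description above; the
surrounding handler is illustrative:

  static int sketch_fs_fault(struct vm_fault *vmf, struct page *page)
  {
          vmf->page = page;          /* page prepared by the filesystem/DAX */
          return finish_fault(vmf);  /* thin wrapper around alloc_set_pte() */
  }
-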
Patch series "dax: Clear dirty bits after flushing caches", v5.
Patchset to clear dirty bits from radix tree of DAX inodes when caches
for corresponding pfns have been flushed. In principle, these patches
enable handlers to easily update PTEs and do other work necessary to
finish the fault without duplicating the functionality present in the
generic code. I'd like to thank Kirill and Ross for reviews of the
series!
This patch (of 20):
To allow full handling of COW faults add memcg field to struct vm_fault
and a return value of ->fault() handler meaning that COW fault is fully
handled and memcg charge must not be canceled. This will allow us to
remove knowledge about special DAX locking from the generic fault code.
Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Add orig_pte field to vm_fault structure to allow ->page_mkwrite
handlers to fully handle the fault.
This also allows us to save some passing of extra arguments around.
Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Ross Zwisler
Acked-by: Kirill A. Shutemov
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds