15 Dec, 2016

40 commits

  • radix_tree_join() was freeing nodes with a non-zero ->exceptional count,
    and radix_tree_split() wasn't zeroing ->exceptional when it allocated
    the new node. Fix this by making all callers of radix_tree_node_alloc()
    pass in the new counts (and some other always-initialised fields), which
    will prevent the problem from recurring if we decide to do something
    similar in future.
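
    A sketch of the shape of the fix (the parameter list and body are my
    approximation of the merged code, ignoring the preload fast path):

        static struct radix_tree_node *
        radix_tree_node_alloc(struct radix_tree_root *root,
                              struct radix_tree_node *parent,
                              unsigned int shift, unsigned int offset,
                              unsigned int count, unsigned int exceptional)
        {
                struct radix_tree_node *node =
                        kmem_cache_alloc(radix_tree_node_cachep,
                                         root_gfp_mask(root));

                if (node) {
                        /* Counts are set before anyone can see the node,
                         * so it can never be freed with stale values. */
                        node->shift = shift;
                        node->offset = offset;
                        node->count = count;
                        node->exceptional = exceptional;
                        node->parent = parent;
                        node->private_data = NULL;
                }
                return node;
        }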

    Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The kmem_cache_alloc() implementation simply allocates new memory from
    malloc() and calls the ctor, which zeroes out the entire object. This
    means it cannot spot bugs where an object isn't properly reinitialised
    before being freed.

    Add a small (11 objects) cache before freeing objects back to malloc.
    This is enough to let us write a test to catch it, although the memory
    allocator is now aware of the structure of the radix tree node, since it
    chains free objects through ->private_data (like the percpu cache does).
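
    A self-contained model of the scheme (illustrative only, not the
    test-suite code):

        #include <stdlib.h>

        struct node {
                void *private_data;     /* doubles as the free-list link */
                /* ... payload the user must reinitialise ... */
        };

        #define CACHE_SIZE 11
        static struct node *cache_head;
        static int cache_size;

        static void node_free(struct node *n)
        {
                if (cache_size < CACHE_SIZE) {
                        n->private_data = cache_head;   /* chain in place */
                        cache_head = n;
                        cache_size++;
                } else {
                        free(n);
                }
        }

        static struct node *node_alloc(void)
        {
                struct node *n = cache_head;

                if (n) {
                        cache_head = n->private_data;
                        cache_size--;
                        return n;       /* NOT zeroed: stale contents now
                                         * expose missing reinitialisation */
                }
                return calloc(1, sizeof(*n));   /* fresh memory is zeroed,
                                                 * as the ctor would do */
        }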

    Link: http://lkml.kernel.org/r/1481667692-14500-2-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • IDR needs more functionality from the kernel: kmalloc()/kfree(), and
    xchg().
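
    A plausible set of userspace shims (assumed, not the exact test-suite
    definitions):

        #include <stdlib.h>

        #define kmalloc(size, gfp)      malloc(size)
        #define kfree(ptr)              free(ptr)

        /* GCC atomic builtin standing in for the kernel's xchg() */
        #define xchg(ptr, val)  __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST)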

    Link: http://lkml.kernel.org/r/1480369871-5271-67-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In preparation for merging the IDR and radix tree, reduce the fanout at
    each level from 256 to 64. If this causes a performance problem then a
    bisect will point to this commit, and we'll have a better idea about
    what we might do to fix it.
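
    If this refers to the IDR's per-level fanout, the change presumably
    boils down to the level-width constants (names as in
    include/linux/idr.h):

        #define IDR_BITS 6                      /* was 8: 2^6 = 64 slots */
        #define IDR_SIZE (1 << IDR_BITS)
        #define IDR_MASK ((1 << IDR_BITS) - 1)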

    Link: http://lkml.kernel.org/r/1480369871-5271-66-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
    to IDR_SIZE.
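
    A usage sketch (the surrounding context is invented for illustration):

        /* Save and restore the "next id" hint without poking at IDR
         * internals such as IDR_SIZE: */
        unsigned int cursor = idr_get_cursor(&my_idr);
        /* ... checkpoint, restore ... */
        idr_set_cursor(&my_idr, cursor);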

    Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Reviewed-by: David Howells
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • idr_find_slowpath() is not intended to be part of the public API; it's
    an implementation detail. There's no reason to skip straight to the
    slowpath here.
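
    The caller-side change is presumably just the public wrapper (the TPM
    identifiers are quoted from memory):

        chip = idr_find(&dev_nums_idr, chip_num); /* not idr_find_slowpath() */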

    Link: http://lkml.kernel.org/r/1480369871-5271-64-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Peter Huewe
    Cc: Marcel Selhorst
    Cc: Jarkko Sakkinen
    Cc: Jason Gunthorpe
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Two of the USB Gadgets were poking around in the internals of struct ida
    in order to determine if it is empty. Add the appropriate abstraction.
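
    The new helper is roughly (assuming the ida-wraps-an-idr layout of the
    time):

        static inline bool ida_is_empty(struct ida *ida)
        {
                return idr_is_empty(&ida->idr);
        }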

    Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Acked-by: Konstantin Khlebnikov
    Tested-by: Kirill A. Shutemov
    Cc: Ross Zwisler
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Michal Nazarewicz
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The random iteration test currently only inserts order-0 entries.
    Update it to insert entries with orders ranging from 0 to 7. Also
    make the maximum index configurable, make some variables static, make
    the test duration variable, remove some useless spinning, and add a
    fifth thread which calls tag_tagged_items().
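
    A sketch of what a random multiorder insertion might look like
    (item_insert_order() is a test-suite helper; the exact distribution is
    assumed):

        int order = rand() % 8;                  /* orders 0..7 */
        unsigned long index = rand() & ~((1UL << order) - 1);

        item_insert_order(&tree, index, order);  /* index aligned to order */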

    Link: http://lkml.kernel.org/r/1480369871-5271-62-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • When replacing an entry with NULL, we need to delete any sibling
    entries. This also fixes the accounting when deleting exceptional
    entries, fixes a bug in radix_tree_iter_replace() where we would fail
    to remove entirely freed nodes, and fixes an accounting bug when
    switching between normal and exceptional entries with replace_slot.
    Test cases are added for all of these bugs.

    Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Calculate how many nodes we need to allocate to split an old_order entry
    into multiple entries, each of size new_order. The test suite checks
    that we allocated exactly the right number of nodes; neither too many
    (checked by rtp->nr == 0), nor too few (checked by comparing
    nr_allocated before and after the call to radix_tree_split()).
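
    My back-of-the-envelope model of the arithmetic (not the kernel's
    code; assumes RADIX_TREE_MAP_SHIFT == 6):

        /* One node is needed for every slot one level above each shift at
         * which the new (smaller) entries will live. */
        static unsigned long nodes_for_split(int old_order, int new_order)
        {
                unsigned long nodes = 0;
                int shift;

                for (shift = new_order / 6 * 6; shift <= old_order - 6;
                     shift += 6)
                        nodes += 1UL << (old_order - (shift + 6));
                return nodes;
        }

    For example, splitting an order-12 entry into order-0 entries needs 65
    nodes: one at shift 6 and 64 at shift 0.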

    Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This new function splits a larger multiorder entry into smaller
    entries (which may themselves be multiorder entries). These entries
    are initialised to RADIX_TREE_RETRY so that RCU walkers seeing this
    state aren't confused. The caller should then call
    radix_tree_for_each_slot() and radix_tree_replace_slot() in order to
    turn these retry entries into the intended new entries. Tags are
    replicated from the original multiorder entry into each new entry.
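
    A sketch of that calling pattern (locking and error handling omitted;
    new_item() is a hypothetical constructor, and a real caller would also
    bound the loop to the old entry's range):

        radix_tree_split(root, index, new_order);
        radix_tree_for_each_slot(slot, root, &iter, index) {
                /* turn each RADIX_TREE_RETRY placeholder into a real entry */
                radix_tree_replace_slot(root, slot, new_item(iter.index));
        }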

    Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This new function allows for the replacement of many smaller entries in
    the radix tree with one larger multiorder entry. From the point of view
    of an RCU walker, they may see a mixture of the smaller entries and the
    large entry during the same walk, but they will never see NULL for an
    index which was populated before the join.
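
    Usage sketch (signature as I understand it; error handling omitted):

        /* Replace every entry covering the 2^order indices around 'index'
         * with a single multiorder entry: */
        radix_tree_join(root, index, order, new_entry);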

    Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This is an exceptionally complicated function with just one caller
    (tag_pages_for_writeback), yet we devote a large portion of the test
    suite's runtime to testing it. By introducing the new function
    radix_tree_iter_tag_set(), we can eliminate all of the complexity
    while keeping the performance. The caller can now use a fairly
    standard radix_tree_for_each() loop, and it doesn't need to worry
    about tricksy things like 'start' wrapping.

    The test suite continues to spend a large amount of time investigating
    this function, but now it's testing the underlying primitives such as
    radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
    which are also used by other parts of the kernel.
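
    The caller now looks approximately like this (reconstructed from
    memory of tag_pages_for_writeback(); details such as the batch size
    are approximate):

        spin_lock_irq(&mapping->tree_lock);
        radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, start,
                                   PAGECACHE_TAG_DIRTY) {
                if (iter.index > end)
                        break;
                radix_tree_iter_tag_set(&mapping->page_tree, &iter,
                                        PAGECACHE_TAG_TOWRITE);
                if (++tagged % WRITEBACK_TAG_BATCH)
                        continue;
                /* resume *before* dropping the lock, per the new convention */
                slot = radix_tree_iter_resume(slot, &iter);
                spin_unlock_irq(&mapping->tree_lock);
                cond_resched();
                spin_lock_irq(&mapping->tree_lock);
        }
        spin_unlock_irq(&mapping->tree_lock);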

    Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This rather complicated function can be better implemented as an
    iterator. It has only one caller, so move the functionality to the only
    place that needs it. Update the test suite to follow the same pattern.

    Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Acked-by: Konstantin Khlebnikov
    Tested-by: Kirill A. Shutemov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This fixes several interlinked problems with the iterators in the
    presence of multiorder entries.

    1. radix_tree_iter_next() would only advance by one slot, which would
    result in the iterators returning the same entry more than once if
    there were sibling entries.

    2. radix_tree_next_slot() could return an internal pointer instead of
    a user pointer if a tagged multiorder entry was immediately followed by
    an entry of lower order.

    3. radix_tree_next_slot() expanded to a lot more code than it used to
    when multiorder support was compiled in. And I wasn't comfortable with
    entry_to_node() being in a header file.

    Fixing radix_tree_iter_next() for the presence of sibling entries
    necessarily involves examining the contents of the radix tree, so we now
    need to pass 'slot' to radix_tree_iter_next(), and we need to change the
    calling convention so it is called *before* dropping the lock which
    protects the tree. Also rename it to radix_tree_iter_resume(), as some
    people thought it was necessary to call radix_tree_iter_next() each time
    around the loop.

    radix_tree_next_slot() becomes closer to how it looked before multiorder
    support was introduced. It only checks to see if the next entry in the
    chunk is a sibling entry or a pointer to a node; this should be rare
    enough that handling this case out of line is not a performance impact
    (and such impact is amortised by the fact that the entry we just
    processed was a multiorder entry). Also, radix_tree_next_slot() used to
    force a new chunk lookup for untagged entries, which is more expensive
    than the out of line sibling entry skipping.
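
    The new calling convention, approximately (need_to_drop_lock stands in
    for whatever condition the caller has):

        radix_tree_for_each_slot(slot, root, &iter, 0) {
                /* ... use *slot ... */
                if (need_to_drop_lock) {
                        /* called BEFORE the lock is dropped, and only when
                         * actually needed - not once per iteration */
                        slot = radix_tree_iter_resume(slot, &iter);
                        spin_unlock_irq(&tree_lock);
                        /* the tree may be modified here */
                        spin_lock_irq(&tree_lock);
                }
        }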

    Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • We drop the lock which protects the radix tree, so we must call
    radix_tree_iter_next() in order to avoid a modification to the tree
    invalidating the iterator state.

    Link: http://lkml.kernel.org/r/1480369871-5271-54-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Print the indices of the entries as unsigned (instead of signed)
    integers and print the parent node of each entry to help navigate around
    larger trees where the layout is not quite so obvious. Print the
    indices covered by a node. Rearrange the order of fields printed so the
    indices and parents line up for each type of entry.

    Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Since this function is specialised to the radix tree, have
    radix_tree_find_next_bit() take the node and tag and calculate the
    address of the bitmap itself, rather than leaving that to the caller.
    Likewise, there is no need to pass in the size of the bitmap.
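
    Roughly, the signature change (my reconstruction):

        /* before: the caller computed the bitmap address and size */
        static unsigned long radix_tree_find_next_bit(const unsigned long *addr,
                        unsigned long size, unsigned long offset);

        /* after: both are derived from the node and tag internally */
        static unsigned long radix_tree_find_next_bit(struct radix_tree_node *node,
                        unsigned int tag, unsigned long offset);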

    Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Similar to node_tag_clear(), factor node_tag_set() out of
    radix_tree_range_tag_if_tagged().

    Link: http://lkml.kernel.org/r/1480369871-5271-51-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • I want to be able to reference node->parent after freeing node.

    Currently node->parent is in a union with rcu_head, so it is overwritten
    when the node is put on the RCU list. We know that private_list is not
    referenced after the node is freed, so it is safe for these two members
    to share space.
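
    The resulting layout, approximately:

        struct radix_tree_node {
                unsigned char shift;
                unsigned char offset;
                unsigned int count;
                unsigned int exceptional;
                struct radix_tree_node *parent; /* valid even after free */
                void *private_data;
                union {
                        struct list_head private_list; /* while live */
                        struct rcu_head rcu_head;      /* while pending free */
                };
                void __rcu *slots[RADIX_TREE_MAP_SIZE];
                unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_MAP_SIZE];
        };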

    Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Remove the old find_next_bit code in favour of linking in the find_bit
    code from tools/lib.

    Link: http://lkml.kernel.org/r/1480369871-5271-48-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • I need the following functions for the radix tree:

    bitmap_fill
    bitmap_empty
    bitmap_full

    Copy the implementations from include/linux/bitmap.h.
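
    For reference, the copies are approximately (without the kernel's
    small-constant fast paths):

        static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
        {
                memset(dst, 0xff, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
        }

        static inline int bitmap_empty(const unsigned long *src, unsigned int nbits)
        {
                return find_first_bit(src, nbits) == nbits;
        }

        static inline int bitmap_full(const unsigned long *src, unsigned int nbits)
        {
                return find_first_zero_bit(src, nbits) == nbits;
        }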

    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This probably doubles the size of each item allocated by the test
    suite, but it lets us check a few more things, and may be needed for
    upcoming API changes that require the caller to pass in the order of
    the entry.
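
    Presumably the test-suite item simply grows an order field, roughly:

        struct item {
                unsigned long index;
                unsigned int order;
        };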

    Link: http://lkml.kernel.org/r/1480369871-5271-46-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • item_kill_tree() assumes that everything in the tree is a pointer to a
    struct item, which is annoying when testing the behaviour of exceptional
    entries. Fix it to delete exceptional entries on the assumption they
    don't need to be freed.

    Link: http://lkml.kernel.org/r/1480369871-5271-45-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Calling rcu_barrier() allows all of the rcu-freed memory to be actually
    returned to the pool, and allows nr_allocated to return to 0. As well
    as allowing diffs between runs to be more useful, it also lets us
    pinpoint leaks more effectively.
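
    A sketch of the per-test pattern this enables (nr_allocated and
    item_kill_tree() are test-suite names):

        item_kill_tree(&tree);
        rcu_barrier();                  /* drain pending call_rcu() frees */
        assert(nr_allocated == 0);      /* anything left is a real leak */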

    Link: http://lkml.kernel.org/r/1480369871-5271-44-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This adds a simple benchmark for the iterator, similar to the one I
    used for commit 78c1d78488a3 ("radix-tree: introduce bit-optimized
    iterator").

    Building with 'make BENCHMARK=1' sets the radix tree order to 6, which
    allows performance comparable to in-kernel performance.

    Link: http://lkml.kernel.org/r/1480369871-5271-43-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Each thread needs to register itself with RCU, otherwise the reading
    thread's read lock has no effect and the freeing thread will free the
    memory in the tree without waiting for the read lock to be dropped.
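
    The required pattern with liburcu, sketched:

        static void *reader_fn(void *arg)
        {
                rcu_register_thread();  /* otherwise rcu_read_lock() is a
                                         * no-op for this thread */
                rcu_read_lock();
                /* ... walk the tree ... */
                rcu_read_unlock();
                rcu_unregister_thread();
                return NULL;
        }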

    Link: http://lkml.kernel.org/r/1480369871-5271-42-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Instead of reseeding the random number generator every time around the
    loop in big_gang_check(), seed it at the beginning of execution. Use
    rand_r() and an independent base seed for each thread in
    iteration_test() so they don't stomp all over each other's state.
    Since this particular test depends on the kernel scheduler, the
    iteration test can't be reproduced based purely on the random seed,
    but at least it won't pollute the other tests.

    Print the seed, and allow the seed to be specified so that a run which
    hits a problem can be reproduced.
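
    Sketch of the per-thread scheme (variable names assumed):

        /* every thread derives its own stream from the printed base seed */
        unsigned int seed = base_seed + thread_index;

        unsigned long index = rand_r(&seed) % MAX_INDEX;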

    Link: http://lkml.kernel.org/r/1480369871-5271-41-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • It can be a source of mild concern when the test suite shows that we're
    leaking nodes. While poring over the source code looking for leaks can
    lead to some fascinating bugs being discovered, sometimes the leak is
    simply that these nodes were preallocated and are sitting on the per-CPU
    list. Free them by calling the CPU dead callback.

    Link: http://lkml.kernel.org/r/1480369871-5271-40-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Rather than simply NOPping out preempt_enable() and preempt_disable(),
    keep track of preempt_count and display it regularly in case either
    the test suite or the code under test is forgetting to balance the
    enables and disables. This only found one test case that forgot to
    re-enable preemption, but it's a possibility worth checking for.
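
    The shims presumably become counting rather than empty, something like:

        extern int preempt_count;       /* displayed regularly by the harness */

        #define preempt_disable()       (preempt_count++)
        #define preempt_enable()        (preempt_count--)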

    Link: http://lkml.kernel.org/r/1480369871-5271-39-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In order to test the preload code, it is necessary to fail GFP_ATOMIC
    allocations, which requires defining GFP_KERNEL and GFP_ATOMIC properly.
    Remove the obsolete __GFP_WAIT and copy the definitions of the __GFP
    flags which are used from the kernel include files. We also need the
    real definition of gfpflags_allow_blocking() to persuade the radix tree
    to actually use its preallocated nodes.
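
    The key piece is the blocking test; a simplified sketch (flag values
    and flag sets abridged from memory of the 4.9-era headers):

        #define __GFP_DIRECT_RECLAIM    0x400000u
        #define GFP_KERNEL              (__GFP_DIRECT_RECLAIM /* | ... */)
        #define GFP_ATOMIC              (0 /* no direct reclaim: may not block */)

        static inline bool gfpflags_allow_blocking(gfp_t gfp_flags)
        {
                return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
        }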

    Link: http://lkml.kernel.org/r/1480369871-5271-38-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Patch series "Radix tree patches for 4.10", v3.

    Mostly these are improvements; the only bug fixes in here relate to
    multiorder entries (which are unused in the 4.9 tree).

    This patch (of 32):

    The radix tree uses its own buggy WARN_ON_ONCE. Replace it with the
    definition from asm-generic/bug.h.

    Link: http://lkml.kernel.org/r/1480369871-5271-37-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Currently we never clear dirty tags in DAX mappings, and thus the
    address ranges to flush keep accumulating. Now that we have locking
    of radix tree entries, we have all the locking necessary to reliably
    clear the radix tree dirty tag when flushing caches for the
    corresponding address range. Similarly to page_mkclean(), we also
    have to write-protect pages to get a page fault when the page is next
    written to, so that we can mark the entry dirty again.
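
    The flush path now does, roughly (function names approximate; entry
    locking elided):

        dax_mapping_entry_mkclean(mapping, index, pfn); /* write-protect PTEs */
        wb_cache_pmem(kaddr, size);                     /* flush the cache */

        spin_lock_irq(&mapping->tree_lock);
        radix_tree_tag_clear(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
        spin_unlock_irq(&mapping->tree_lock);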

    Link: http://lkml.kernel.org/r/1479460644-25076-21-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently the PTE gets updated in wp_pfn_shared() after
    dax_pfn_mkwrite() has released the corresponding radix tree entry
    lock. When we want to write-protect PTEs on cache flush, we need the
    PTE modification to happen under the radix tree entry lock, to ensure
    consistent updates of the PTE and the radix tree (standard faults use
    the page lock to ensure this consistency). So move the update of the
    PTE bit into dax_pfn_mkwrite().

    Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently, flushing of caches for DAX mappings was ignoring the entry
    lock. So far this was OK (modulo a bug whereby a difference in the
    entry lock could cause cache flushing to be mistakenly skipped), but
    in the following patches we will write-protect PTEs on cache flushing
    and clear dirty tags, for which we will need more exclusion. So do
    the cache flushing under an entry lock. This allows us to remove one
    lock-unlock pair of mapping->tree_lock as a bonus.

    Link: http://lkml.kernel.org/r/1479460644-25076-19-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • DAX will need to implement its own version of page_check_address().
    To avoid duplicating page table walking code, export follow_pte(),
    which does what we need.
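
    A usage sketch (signature as exported at the time, to the best of my
    recollection):

        pte_t *ptep;
        spinlock_t *ptl;

        if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
                return;                 /* nothing mapped at this address */
        /* ... inspect or modify *ptep under ptl ... */
        pte_unmap_unlock(ptep, ptl);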

    Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently finish_mkwrite_fault() returns 0 when the PTE got changed
    before we acquired the PTE lock, and VM_FAULT_WRITE when we succeeded
    in modifying the PTE. This is somewhat confusing, since 0 generally
    means success; it is also inconsistent with finish_fault(), which
    returns 0 on success. Change finish_mkwrite_fault() to return 0 on
    success and VM_FAULT_NOPAGE when the PTE changed. Practically, there
    should be no behavioural difference, since we bail out of the fault
    the same way regardless of whether we return 0, VM_FAULT_NOPAGE, or
    VM_FAULT_WRITE. Also note that VM_FAULT_WRITE has no effect for
    shared mappings, since the only two places that check it - KSM and
    GUP - care about private mappings only. Generally the meaning of
    VM_FAULT_WRITE for shared mappings is not well defined and we should
    probably clean that up.

    Link: http://lkml.kernel.org/r/1479460644-25076-17-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Acked-by: Kirill A. Shutemov
    Cc: Ross Zwisler
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Provide a helper function for finishing write faults due to the PTE
    being read-only. The helper will be used by DAX to avoid complicating
    generic MM code with DAX locking specifics.

    Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • wp_page_reuse() contains handling of write-shared faults that is
    needed only by wp_page_shared(). Move that handling into the only
    place that needs it, which makes wp_page_reuse() simpler and avoids
    the odd situation where we sometimes pass in a locked page and
    sometimes an unlocked one.

    Link: http://lkml.kernel.org/r/1479460644-25076-15-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Ross Zwisler
    Acked-by: Kirill A. Shutemov
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara