21 May, 2016

40 commits

  • These don't belong in radix-tree.h any more than PAGECACHE_TAG_* do.
    Let's try to maintain the idea that radix-tree simply implements an
    abstract data type.

    Signed-off-by: NeilBrown
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Now that the shift amount is stored in the node, radix_tree_descend()
    can calculate offset itself from index, which removes several lines of
    code from each of the tree walkers.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
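
The offset calculation described above can be sketched as follows. This is a minimal illustration, not the kernel code: the `sketch_` names, the cut-down node structure, and the choice of 6 bits per level are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants: 6 bits per level gives 64 slots per node. */
#define SKETCH_MAP_SHIFT 6
#define SKETCH_MAP_SIZE  (1UL << SKETCH_MAP_SHIFT)
#define SKETCH_MAP_MASK  (SKETCH_MAP_SIZE - 1)

/* Hypothetical, stripped-down node holding only what this sketch needs. */
struct sketch_node {
	unsigned char shift;            /* how far to shift index at this level */
	void *slots[SKETCH_MAP_SIZE];
};

/* Work out which slot of 'node' covers 'index'. */
static unsigned int sketch_offset(const struct sketch_node *node,
				  unsigned long index)
{
	return (unsigned int)((index >> node->shift) & SKETCH_MAP_MASK);
}

/* Step down one level: report the offset and return that slot's contents. */
static void *sketch_descend(const struct sketch_node *node,
			    unsigned long index, unsigned int *offset)
{
	assert(offset != NULL);
	*offset = sketch_offset(node, index);
	return node->slots[*offset];
}
```

Because the shift lives in the node, a caller only passes the node and the index; it no longer has to track the current shift and compute the offset itself before each step down.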
     
  • In addition to replacing the entry, we also clear all associated tags.
    This is really a one-off special for page_cache_tree_delete() which had
    far too much detailed knowledge about how the radix tree works.

    For efficiency, factor node_tag_clear() out of radix_tree_tag_clear().
    It can be used by radix_tree_delete_item() as well as
    radix_tree_replace_clear_tags().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • 1. Rename the existing variable 'slot' to 'child'.
    2. Introduce a new variable called 'slot' which is the address of the
       slot we're dealing with. This lets us simplify the tree insertion,
       and removes the recalculation of 'slot' at the end of the function.
    3. Using 'slot' in the sibling pointer insertion part makes the code
       more readable.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_range_tag_if_tagged to name the nodes parent, node
    and child instead of node & slot.

    Use parent->offset instead of playing games with 'upindex'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_next_chunk to use 'child' instead of 'slot' as the
    name of the child node. Also use node_maxindex() where it makes sense.

    The 'rnode' variable was unnecessary; it doesn't overlap in usage with
    'node', so we can just use 'node' the whole way through the function.

    Improve the testcase to start the walk from every index in the carefully
    constructed tree, and to accept any index within the range covered by
    the entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the more standard 'node' and 'child' instead of 'to_free' and
    'slot'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • As with indirect_to_ptr(), ptr_to_indirect() and
    RADIX_TREE_INDIRECT_PTR, change radix_tree_is_indirect_ptr() to
    radix_tree_is_internal_node().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Mirrors the earlier commit introducing node_to_entry().

    Also change the type returned to be a struct radix_tree_node pointer.
    That lets us simplify a couple of places in the radix tree shrink &
    extend paths where we could convert an entry into a pointer, modify the
    node, then convert the pointer back into an entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • ptr_to_indirect() was a bad name. What it really means is "Convert this
    pointer to a node into an entry suitable for storing in the radix tree".
    So node_to_entry() seemed like a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
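
The idea of converting a node pointer into a storable entry, and back again, can be sketched with low-bit pointer tagging. This is an illustration under assumptions: the `sketch_` names, the tag value of 1, and the requirement that nodes are at least 2-byte aligned are chosen for the example and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tag bit marking an entry as a pointer to an internal node. */
#define SKETCH_INTERNAL_NODE ((uintptr_t)1)

/* Hypothetical node type; any type with alignment >= 2 leaves bit 0 free. */
struct sketch_node {
	int dummy;
};

/* Convert a node pointer into an entry suitable for storing in a slot. */
static void *sketch_node_to_entry(struct sketch_node *node)
{
	assert(((uintptr_t)node & SKETCH_INTERNAL_NODE) == 0);
	return (void *)((uintptr_t)node | SKETCH_INTERNAL_NODE);
}

/* Recover the node pointer from an entry by clearing the tag bit. */
static struct sketch_node *sketch_entry_to_node(void *entry)
{
	return (struct sketch_node *)((uintptr_t)entry & ~SKETCH_INTERNAL_NODE);
}

/* Does this entry refer to an internal node rather than user data? */
static int sketch_is_internal_node(const void *entry)
{
	return ((uintptr_t)entry & SKETCH_INTERNAL_NODE) != 0;
}
```

Naming the two directions node-to-entry and entry-to-node makes it explicit at each call site which representation a variable holds.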
     
  • The name RADIX_TREE_INDIRECT_PTR doesn't really match the meaning.
    RADIX_TREE_INTERNAL_NODE is a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The only remaining references to root->height were in extend and shrink,
    where it was updated. Now we can remove it entirely.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • verify_node() can use node->shift instead of the height.

    tree_verify_min_height() can be converted over to using node_maxindex()
    and shift_maxindex() instead of radix_tree_maxindex().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
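
A rough sketch of how a maximum index can be derived from a shift rather than a height. The `sketch_` names and the 6-bit map shift are assumptions for illustration; only the function names node_maxindex() and shift_maxindex() come from the commit message.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative constants: 64 slots per node. */
#define SKETCH_MAP_SHIFT 6
#define SKETCH_MAP_SIZE  (1UL << SKETCH_MAP_SHIFT)

/* Hypothetical node carrying only the shift. */
struct sketch_node {
	unsigned char shift;
};

/*
 * Largest index reachable below a node whose slots are selected with
 * 'shift': each of the 64 slots spans 2^shift indices.
 */
static unsigned long sketch_shift_maxindex(unsigned int shift)
{
	assert(shift < sizeof(unsigned long) * CHAR_BIT);
	return (SKETCH_MAP_SIZE << shift) - 1;
}

/* Same calculation, reading the shift from the node. */
static unsigned long sketch_node_maxindex(const struct sketch_node *node)
{
	return sketch_shift_maxindex(node->shift);
}
```

With a shift of 0 the node covers indices 0 to 63; with a shift of 6 it covers 0 to 4095.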
     
  • If radix_tree_shrink returns whether it managed to shrink, then
    __radix_tree_delete_node doesn't need to query the tree to find out
    whether it did any work or not.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • node->shift represents the shift necessary for looking in the slots
    array at this level. It is equal to the old (node->height - 1) *
    RADIX_TREE_MAP_SHIFT.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
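
The relationship between the old height and the new shift can be written out directly. In this sketch the helper names and the value 6 for the map shift are assumptions; the formula itself is the one stated in the commit message.

```c
#include <assert.h>

/* Illustrative value; stands in for RADIX_TREE_MAP_SHIFT. */
#define SKETCH_MAP_SHIFT 6

/* shift = (height - 1) * MAP_SHIFT, valid for height >= 1. */
static unsigned int sketch_height_to_shift(unsigned int height)
{
	assert(height >= 1);
	return (height - 1) * SKETCH_MAP_SHIFT;
}

/* Inverse mapping, valid when shift is a multiple of MAP_SHIFT. */
static unsigned int sketch_shift_to_height(unsigned int shift)
{
	assert(shift % SKETCH_MAP_SHIFT == 0);
	return shift / SKETCH_MAP_SHIFT + 1;
}
```

A node at the bottom level (height 1) has shift 0, and each level above adds one map shift.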
     
  • Neither piece of information we're storing in node->path can be larger
    than 64, so store each in its own unsigned char instead of shifting and
    masking to store them both in an unsigned int.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
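
To illustrate the trade-off, here is a sketch contrasting a packed field with two separate bytes. The structure names, field names, and the 8-bit split are hypothetical; the commit only states that both values fit in an unsigned char.

```c
#include <assert.h>

/* Hypothetical packing: low 8 bits hold one value, the rest the other. */
#define SKETCH_LOW_BITS 8
#define SKETCH_LOW_MASK ((1U << SKETCH_LOW_BITS) - 1)

/* Before: two small values packed into one unsigned int. */
struct packed_path {
	unsigned int path;
};

static void packed_set(struct packed_path *p, unsigned int height,
		       unsigned int offset)
{
	assert(height <= 64 && offset <= 64);
	p->path = height | (offset << SKETCH_LOW_BITS);
}

static unsigned int packed_height(const struct packed_path *p)
{
	return p->path & SKETCH_LOW_MASK;
}

static unsigned int packed_offset(const struct packed_path *p)
{
	return p->path >> SKETCH_LOW_BITS;
}

/* After: one unsigned char per value, read and written directly. */
struct split_path {
	unsigned char height;
	unsigned char offset;
};
```

The split form needs no accessor helpers at all, and on common ABIs it occupies no more space than the packed form.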
     
  • Typos, whitespace, grammar, line length, using the correct types, etc.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • radix_tree_is_indirect_ptr() is an internal API. The correct call to
    use is radix_tree_deref_retry() which has the appropriate unlikely()
    annotation.

    Fixes: c6400ba7e13a ("drivers/hwspinlock: fix race between radix tree insertion and lookup")
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The multiorder support is a sufficiently large feature to be worth
    adding copyright lines for.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • - Print which indices are covered by every leaf entry
    - Print sibling entries
    - Print the node pointer instead of the slot entry
    - Build by default in userspace, and make it accessible to the test-suite

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • I had previously decided that tagging a single multiorder entry would
    count as tagging 2^order entries for the purposes of 'nr_to_tag'. I now
    believe that decision to be a mistake, and it should count as a single
    entry. That's more likely to be what callers expect.

    When walking back up the tree from a newly-tagged entry, the current
    code assumed we were starting from the lowest level of the tree; if we
    have a multiorder entry whose order is at least RADIX_TREE_MAP_SHIFT,
    then we need to shift the index by 'shift' before we start walking
    back up the tree, or we will end up not setting tags on higher entries,
    and then mistakenly thinking that entries below a certain point in the
    tree are not tagged.

    If the first index we examine is a sibling entry of a tagged multiorder
    entry, we were not tagging it. We need to examine the canonical entry,
    and the easiest way to do that is to use radix_tree_descend(). We then
    have to skip over sibling slots when looking for the next entry in the
    tree or we will end up walking back to the canonical entry.

    Add several tests for radix_tree_range_tag_if_tagged().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Add a unit test that provides coverage for the bug fixed in the commit
    entitled "radix-tree: rewrite radix_tree_locate_item fix" from Hugh
    Dickins. I've verified that this test fails before his patch due to
    miscalculated 'index' values in __locate() in lib/radix-tree.c, and
    passes with his fix.

    Link: http://lkml.kernel.org/r/1462307263-20623-1-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_locate_item(). Modify the locate tests to test multiorder
    entries too.

    [hughd@google.com: radix_tree_locate_item() is often returning the wrong index]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1605012108490.1166@eggly.anvils
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If the radix tree user attempted to insert a colliding entry with an
    existing multiorder entry, then radix_tree_create() could encounter a
    sibling entry when walking down the tree to look for a slot. Use
    radix_tree_descend() to fix the problem, and add a test-case to make
    sure the problem doesn't come back in future.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Add a generic test for multi-order tag verification, and call it using
    several different configurations.

    This test creates a multi-order radix tree using the given index and
    order, and then sets, checks and clears tags using the indices covered
    by the single multi-order radix tree entry.

    With the various calls done by this test we verify root multi-order
    entries without siblings, multi-order entries without siblings in a
    radix tree node, as well as multi-order entries with siblings of various
    sizes.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_get().

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_clear().

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_set().

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Add a unit test to verify that we can iterate over multi-order entries
    properly via a radix_tree_for_each_slot() loop.

    This was done with a single, somewhat complicated configuration that was
    meant to test many of the various corner cases having to do with
    multi-order entries:

    - An iteration could begin at a sibling entry, and we need to return the
      canonical entry.
    - We could have entries of various orders in the same slots[] array.
    - We could have multi-order entries at a nonzero height, followed by
      indirect pointers to more radix tree nodes later in that same slots[]
      array.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • This enables the macros radix_tree_for_each_slot() and friends to be
    used with multi-order entries.

    The way that this works is that we treat all entries in a given slots[]
    array as a single chunk. If the index given to radix_tree_next_chunk()
    happens to point us to a sibling entry, we will back up iter->index so
    that it points to the canonical entry, and that will be the place where
    we start our iteration.

    As we're processing a chunk in radix_tree_next_slot(), we process
    canonical entries, skip over sibling entries, and restart the chunk
    lookup if we find a non-sibling indirect pointer. This drops back to
    the radix_tree_next_chunk() code, which will re-walk the tree and look
    for another chunk.

    This allows us to properly handle multi-order entries mixed with other
    entries that are at various heights in the radix tree.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
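
The chunk walk described above can be modelled on a single slots[] array. This is a simplified sketch, not the kernel iterator: it assumes a sibling entry is a tagged pointer to the canonical slot in the same node, uses an 8-slot node, and omits the restart that the real code performs when it meets a pointer to a lower-level node. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOTS    8
#define SKETCH_INTERNAL ((uintptr_t)1)

struct sketch_node {
	void *slots[SKETCH_SLOTS];
};

/* Model a sibling entry as a tagged pointer to the canonical slot. */
static void *sketch_make_sibling(struct sketch_node *node, size_t canonical)
{
	assert(canonical < SKETCH_SLOTS);
	return (void *)((uintptr_t)&node->slots[canonical] | SKETCH_INTERNAL);
}

/* A sibling is a tagged entry that points back into this node's slots. */
static int sketch_is_sibling(const struct sketch_node *node, const void *entry)
{
	uintptr_t p = (uintptr_t)entry;
	uintptr_t lo = (uintptr_t)&node->slots[0];
	uintptr_t hi = lo + sizeof(node->slots);

	if (!(p & SKETCH_INTERNAL))
		return 0;
	p &= ~SKETCH_INTERNAL;
	return p >= lo && p < hi;
}

/* If 'offset' holds a sibling, back up to the canonical slot's offset. */
static size_t sketch_canonical_offset(const struct sketch_node *node,
				      size_t offset)
{
	const void *entry = node->slots[offset];
	uintptr_t target, base;

	if (!sketch_is_sibling(node, entry))
		return offset;
	target = (uintptr_t)entry & ~SKETCH_INTERNAL;
	base = (uintptr_t)&node->slots[0];
	return (size_t)(target - base) / sizeof(void *);
}

/*
 * Visit canonical entries starting at 'start'. Siblings and empty slots
 * are skipped; the offsets visited are written to 'out'.
 */
static size_t sketch_walk(const struct sketch_node *node, size_t start,
			  size_t *out)
{
	size_t n = 0;
	size_t i;

	for (i = sketch_canonical_offset(node, start); i < SKETCH_SLOTS; i++) {
		const void *entry = node->slots[i];

		if (entry == NULL || sketch_is_sibling(node, entry))
			continue;
		out[n++] = i;
	}
	return n;
}
```

Starting the walk at a sibling slot yields the canonical entry first, which matches the behaviour the commit message describes for an iteration that begins inside a multi-order entry.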
     
  • These BUG_ON tests are to ensure that all the tags are clear when
    inserting a new entry. If we insert a multiorder entry, we'll end up
    looking at the tags for a different node, and so the BUG_ON can end up
    triggering spuriously.

    Also, we now have three tags, not two, so check all three are clear, and
    check all the root tags with a single call to BUG_ON since the bits are
    stored contiguously.

    Include a test-case to ensure this problem does not reoccur.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite __radix_tree_lookup().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Setting the indirect bit on the user data entry used to be unambiguous
    because the tree walking code knew not to expect internal nodes in the
    last level of the tree. Multiorder entries can appear at any level of
    the tree, and a leaf with the indirect bit set is indistinguishable from
    a pointer to a node.

    Introduce a special entry (RADIX_TREE_RETRY) which is neither a valid
    user entry, nor a valid pointer to a node. The radix_tree_deref_retry()
    function continues to work the same way, but tree walking code can
    distinguish it from a pointer to a node.

    Also fix the condition for setting slot->parent to NULL; it does not
    matter what height the tree is, it only matters whether slot is an
    indirect pointer. Move this code above the comment which is referring
    to the assignment to root->rnode.

    Also fix the condition for preventing the tree from shrinking to a
    single entry if it's a multiorder entry.

    Add a test-case to the test suite that checks that the tree goes back
    down to its original height after an item is inserted & deleted from a
    higher index in the tree.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
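
One way to picture a retry marker that is neither user data nor a node pointer is sketched below. The encoding used here, the internal-node tag bit set on a NULL pointer, is an assumption made for the illustration and is not stated in the commit message; the `sketch_` names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INTERNAL_NODE ((uintptr_t)1)

/* Assumed encoding: the tag bit with no pointer bits set. */
#define SKETCH_RETRY ((void *)SKETCH_INTERNAL_NODE)

enum sketch_kind {
	SKETCH_EMPTY,
	SKETCH_USER,
	SKETCH_NODE,
	SKETCH_RETRY_MARK
};

/* Tree-walking code can tell the marker apart from a real node pointer. */
static enum sketch_kind sketch_classify(const void *entry)
{
	if (entry == NULL)
		return SKETCH_EMPTY;
	if (entry == SKETCH_RETRY)
		return SKETCH_RETRY_MARK;
	if ((uintptr_t)entry & SKETCH_INTERNAL_NODE)
		return SKETCH_NODE;
	return SKETCH_USER;
}

/* A reader-side check that only looks at the tag bit, as before. */
static int sketch_deref_retry(const void *entry)
{
	return ((uintptr_t)entry & SKETCH_INTERNAL_NODE) != 0;
}
```

Under this encoding a reader that checks only the tag bit still sees "retry" for the marker, while a walker that inspects the full value never mistakes it for a node to descend into.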
     
  • Test suite infrastructure for working with multiorder entries.

    The test itself is pretty basic: Add an entry, check that all expected
    indices return that entry and that indices around that entry don't
    return an entry. Then delete the entry and check no index returns that
    entry. Tests a few edge conditions including the multiorder entry at
    index 0 and at a higher index. Also tests deleting through an alias as
    well as through the canonical index.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The current code will insert entries at each level, as if we're going to
    add a new entry at the bottom level, so we then get an -EEXIST when we
    try to insert the entry into the tree. The best way to fix this is to
    not check 'order' when inserting into an empty tree.

    We still need to 'extend' the tree to the height necessary for the maximum
    index corresponding to this entry, so pass that value to
    radix_tree_extend() rather than the index we're asked to create, or we
    won't create a tree that's deep enough.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
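
The "maximum index corresponding to this entry" can be computed from the index and the order. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>

/*
 * An entry of order 'order' covers 2^order consecutive indices, so the
 * last index it covers has all of the low 'order' bits set.
 */
static unsigned long sketch_entry_maxindex(unsigned long index,
					   unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return index | ((1UL << order) - 1);
}
```

For example, an order-6 entry at index 64 covers indices 64 to 127, so the tree must be extended far enough to hold index 127 rather than just 64.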
     
  • All the tree walking functions start with some variant of this code;
    centralise it in one place so we're not chasing subtly different bugs
    everywhere.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Now that sibling pointers are handled explicitly, there is no purpose
    served by restricting the order to be >= RADIX_TREE_MAP_SHIFT.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If we deleted an entry through an index which looked up a sibling
    pointer, we'd end up zeroing out the wrong slots in the node. Use
    get_slot_offset() to find the right slot.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
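
Finding a slot's offset from its address is plain pointer arithmetic against the start of the node's slots[] array. The sketch below assumes a cut-down node type and a hypothetical helper name; only get_slot_offset() is named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAP_SIZE 64

/* Hypothetical node holding only the slots array. */
struct sketch_node {
	void *slots[SKETCH_MAP_SIZE];
};

/* Offset of 'slot' within 'parent', derived from the slot's address. */
static size_t sketch_slot_offset(const struct sketch_node *parent,
				 void * const *slot)
{
	ptrdiff_t off = slot - parent->slots;

	assert(off >= 0 && off < SKETCH_MAP_SIZE);
	return (size_t)off;
}
```

Deriving the offset from the slot actually found, rather than from the index the caller supplied, is what keeps a delete through an aliased index from touching the wrong slots.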
     
  • The subtraction was the wrong way round, leading to undefined behaviour
    (shift by an amount larger than the size of the type).

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The code I previously added to enable multiorder radix tree entries was
    untested and therefore buggy. This commit adds the support functions
    that Ross and I decided were necessary over a four-week period of
    iterating various designs.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox