10 Dec, 2016

1 commit

  • This reverts commit 53855d10f4567a0577360b6448d52a863929775b.

    It shouldn't have come in yet - it depends on the changes in linux-next
    that will come in during the next merge window. As Matthew Wilcox says,
    the test suite is broken with the current state without the revert.

    Requested-by: Matthew Wilcox
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Dec, 2016

1 commit

  • Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
    suite as it adds a call to cpuhp_setup_state_nocalls() which is not
    currently emulated in the test suite. Add it, and delete the emulation
    of the old CPU hotplug mechanism.

    Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Oct, 2016

1 commit

  • When the underflow checks were added to workingset_node_shadow_dec(),
    they triggered immediately:

    kernel BUG at ./include/linux/swap.h:276!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
    soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
    CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
    Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
    task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
    RIP: page_cache_tree_insert+0xf1/0x100
    Call Trace:
    __add_to_page_cache_locked+0x12e/0x270
    add_to_page_cache_lru+0x4e/0xe0
    mpage_readpages+0x112/0x1d0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1ad/0x290
    force_page_cache_readahead+0xaa/0x100
    page_cache_sync_readahead+0x3f/0x50
    generic_file_read_iter+0x5af/0x740
    blkdev_read_iter+0x35/0x40
    __vfs_read+0xe1/0x130
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x13/0x8f
    Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 0b e8 88 68 ef ff 0f 1f 84 00
    RIP page_cache_tree_insert+0xf1/0x100

    This is a long-standing bug in the way shadow entries are accounted in
    the radix tree nodes. The shrinker needs to know when radix tree nodes
    contain only shadow entries, no pages, so node->count is split in half
    to count shadows in the upper bits and pages in the lower bits.

    Unfortunately, the radix tree implementation doesn't know of this and
    assumes all entries are in node->count. When there is a shadow entry
    directly in root->rnode and the tree is later extended, the radix tree
    implementation will copy that entry into the new node and bump its
    node->count, i.e. increases the page count bits. Once the shadow gets
    removed and we subtract from the upper counter, node->count underflows
    and triggers the warning. Afterwards, without node->count reaching 0
    again, the radix tree node is leaked.

    Limit shadow entries to when we have actual radix tree nodes and can
    count them properly. That means we lose the ability to detect refaults
    from files that had only the first page faulted in at eviction time.

    Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
    Signed-off-by: Johannes Weiner
    Reported-and-tested-by: Linus Torvalds
    Reviewed-by: Jan Kara
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
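A minimal sketch of the accounting mismatch described in this commit, under stated assumptions: the 8-bit split, the struct layout, and every function name below are hypothetical and are not taken from the kernel source. It only illustrates how a counter that is split into "pages" and "shadows" halves goes wrong when generic code increments the raw value for what is really a shadow entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical split: low 8 bits count pages, upper bits count shadows. */
#define COUNT_SHIFT 8u
#define COUNT_MASK  ((1u << COUNT_SHIFT) - 1u)

struct node {
	unsigned int count;
};

static unsigned int node_pages(const struct node *n)
{
	return n->count & COUNT_MASK;
}

static unsigned int node_shadows(const struct node *n)
{
	return n->count >> COUNT_SHIFT;
}

/*
 * Stand-in for the generic tree code copying an entry into a new node.
 * It is unaware of the split, so it bumps the raw counter, which lands
 * in the "pages" half even when the entry is a shadow.
 */
static void tree_extend_copy_entry(struct node *n)
{
	assert(node_pages(n) < COUNT_MASK);
	n->count++;
}

/*
 * Shadow removal with an underflow check. Returns false where the
 * kernel check would fire, instead of subtracting from an empty half.
 */
static bool shadow_dec(struct node *n)
{
	if (node_shadows(n) == 0)
		return false;
	n->count -= 1u << COUNT_SHIFT;
	return true;
}
```

Copying one shadow entry with tree_extend_copy_entry() and then calling shadow_dec() on the same node returns false: the entry was recorded in the pages half, so the shadows half is already zero when the shadow is removed.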
     

26 Sep, 2016

1 commit

  • The fixes to the radix tree test suite show that the multi-order case is
    broken. The basic reason is that the radix tree code uses tagged
    pointers with the "internal" bit in the low bits, and calculating the
    pointer indices was supposed to mask off those bits. But gcc will
    notice that we then use the index to re-create the pointer, and will
    avoid doing the arithmetic and use the tagged pointer directly.

    This cleans the code up, using the existing is_sibling_entry() helper to
    validate the sibling pointer range (instead of open-coding it), and
    using entry_to_node() to mask off the low tag bit from the pointer. And
    once you do that, you might as well just use the now cleaned-up pointer
    directly.

    [ Side note: the multi-order code isn't actually ever used in the kernel
    right now, and the only reason I didn't just delete all that code is
    that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
    It also converts shmem-with-huge-pages and hugetlb to them.

    I'm okay with converting it to other mechanism, but I need
    something. (I looked into Konstantin's RFC patchset[2]. It looks
    okay, but I don't feel myself qualified to review it as I don't
    know much about radix-tree internals.)"

    [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
    [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

    Reported-by: Matthew Wilcox
    Cc: Andrew Morton
    Cc: Ross Zwisler
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Cedric Blancher
    Signed-off-by: Linus Torvalds

    Linus Torvalds
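The tagged-pointer scheme this commit relies on can be sketched as follows. The helper names entry_to_node() and is_sibling_entry() come from the commit message, but the bodies, the node layout, the MAP_SIZE value, and the other helpers are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE     64             /* hypothetical slots per node */
#define INTERNAL_BIT ((uintptr_t)1) /* low tag bit marking an internal entry */

struct node {
	void *slots[MAP_SIZE];
};

/* Tag a pointer so it can be stored in a slot as an internal entry. */
static void *node_to_entry(void *ptr)
{
	assert(((uintptr_t)ptr & INTERNAL_BIT) == 0);
	return (void *)((uintptr_t)ptr | INTERNAL_BIT);
}

/* Mask off the low tag bit to recover the real pointer. */
static void *entry_to_node(void *entry)
{
	return (void *)((uintptr_t)entry & ~INTERNAL_BIT);
}

static bool is_internal(void *entry)
{
	return ((uintptr_t)entry & INTERNAL_BIT) != 0;
}

/*
 * In this sketch a sibling entry is an internal entry whose untagged
 * pointer lands inside the parent's own slots[] array, i.e. it points
 * back at the canonical slot rather than at a child node.
 */
static bool is_sibling_entry(struct node *parent, void *entry)
{
	uintptr_t p  = (uintptr_t)entry_to_node(entry);
	uintptr_t lo = (uintptr_t)parent->slots;
	uintptr_t hi = (uintptr_t)(parent->slots + MAP_SIZE);

	return is_internal(entry) && p >= lo && p < hi;
}
```

Once the tag has been masked off, the resulting pointer can be used directly, with no need to turn it into an index and back again.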
     

03 Aug, 2016

1 commit

  • Radix trees may be used not only for storing page cache pages, so
    unconditionally accounting radix tree nodes to the current memory cgroup
    is bad: if a radix tree node is used for storing data shared among
    different cgroups we risk pinning dead memory cgroups forever.

    So let's only account radix tree nodes if it was explicitly requested by
    passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to
    account page cache entries, so mark mapping->page_tree accordingly.

    Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
    Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: [4.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

27 Jul, 2016

1 commit

  • The new helper is similar to radix_tree_maybe_preload(), but tries to
    preload the number of nodes required to insert (1 << order) contiguous
    naturally-aligned elements.

    This is required to push huge pages into pagecache.

    Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

21 May, 2016

34 commits

  • Now that the shift amount is stored in the node, radix_tree_descend()
    can calculate offset itself from index, which removes several lines of
    code from each of the tree walkers.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In addition to replacing the entry, we also clear all associated tags.
    This is really a one-off special for page_cache_tree_delete() which had
    far too much detailed knowledge about how the radix tree works.

    For efficiency, factor node_tag_clear() out of radix_tree_tag_clear(). It
    can be used by radix_tree_delete_item() as well as
    radix_tree_replace_clear_tags().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • 1. Rename the existing variable 'slot' to 'child'.
    2. Introduce a new variable called 'slot' which is the address of the
    slot we're dealing with. This lets us simplify the tree insertion,
    and removes the recalculation of 'slot' at the end of the function.
    3. Using 'slot' in the sibling pointer insertion part makes the code
    more readable.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_range_tag_if_tagged to name the nodes parent, node
    and child instead of node & slot.

    Use parent->offset instead of playing games with 'upindex'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_next_chunk to use 'child' instead of 'slot' as the
    name of the child node. Also use node_maxindex() where it makes sense.

    The 'rnode' variable was unnecessary; it doesn't overlap in usage with
    'node', so we can just use 'node' the whole way through the function.

    Improve the testcase to start the walk from every index in the carefully
    constructed tree, and to accept any index within the range covered by
    the entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the more standard 'node' and 'child' instead of 'to_free' and
    'slot'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • As with indirect_to_ptr(), ptr_to_indirect() and
    RADIX_TREE_INDIRECT_PTR, change radix_tree_is_indirect_ptr() to
    radix_tree_is_internal_node().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Mirrors the earlier commit introducing node_to_entry().

    Also change the type returned to be a struct radix_tree_node pointer.
    That lets us simplify a couple of places in the radix tree shrink &
    extend paths where we could convert an entry into a pointer, modify the
    node, then convert the pointer back into an entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • ptr_to_indirect() was a bad name. What it really means is "Convert this
    pointer to a node into an entry suitable for storing in the radix tree".
    So node_to_entry() seemed like a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The name RADIX_TREE_INDIRECT_PTR doesn't really match the meaning.
    RADIX_TREE_INTERNAL_NODE is a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The only remaining references to root->height were in extend and shrink,
    where it was updated. Now we can remove it entirely.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If radix_tree_shrink returns whether it managed to shrink, then
    __radix_tree_delete_node doesn't need to query the tree to find out
    whether it did any work or not.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • node->shift represents the shift necessary for looking in the slots
    array at this level. It is equal to the old (node->height - 1) *
    RADIX_TREE_MAP_SHIFT.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
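The relationship stated in this commit, and the offset calculation it enables for the tree walkers, can be written out as a short sketch. MAP_SHIFT is assumed to be 6 here (64 slots per node), and the function names are hypothetical.

```c
#include <assert.h>

#define MAP_SHIFT 6u                 /* assumed: 64 slots per node */
#define MAP_SIZE  (1ul << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* The shift for a node equals (height - 1) * MAP_SHIFT. */
static unsigned int shift_from_height(unsigned int height)
{
	assert(height >= 1);
	return (height - 1) * MAP_SHIFT;
}

/* With the shift stored in the node, the slot offset follows from the index. */
static unsigned int slot_offset(unsigned long index, unsigned int shift)
{
	return (unsigned int)((index >> shift) & MAP_MASK);
}
```

For index 0x1234 in a tree of height 3, the walk uses offsets 1, 8 and 52 at shifts 12, 6 and 0 respectively.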
     
  • Neither piece of information we're storing in node->path can be larger
    than 64, so store each in its own unsigned char instead of shifting and
    masking to store them both in an unsigned int.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
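A sketch of the two layouts contrasted in this commit. The bit width, the field names, and the helper names are hypothetical; the point is only that two values that each fit in a byte need no shifting or masking when each has its own unsigned char.

```c
#include <assert.h>

/* Before (hypothetical layout): both values packed into one unsigned int. */
#define HEIGHT_BITS 8u
#define HEIGHT_MASK ((1u << HEIGHT_BITS) - 1u)

static unsigned int path_pack(unsigned int offset, unsigned int height)
{
	assert(height <= HEIGHT_MASK);
	return (offset << HEIGHT_BITS) | height;
}

static unsigned int path_height(unsigned int path)
{
	return path & HEIGHT_MASK;
}

static unsigned int path_offset(unsigned int path)
{
	return path >> HEIGHT_BITS;
}

/* After: each value in its own byte, read and written directly. */
struct node_pos {
	unsigned char height;
	unsigned char offset;
};

static struct node_pos pos_make(unsigned int offset, unsigned int height)
{
	struct node_pos pos;

	assert(offset <= 255 && height <= 255);
	pos.height = (unsigned char)height;
	pos.offset = (unsigned char)offset;
	return pos;
}
```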
     
  • Typos, whitespace, grammar, line length, using the correct types, etc.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The multiorder support is a sufficiently large feature to be worth
    adding copyright lines for.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • - Print which indices are covered by every leaf entry
    - Print sibling entries
    - Print the node pointer instead of the slot entry
    - Build by default in userspace, and make it accessible to the test-suite

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • I had previously decided that tagging a single multiorder entry would
    count as tagging 2^order entries for the purposes of 'nr_to_tag'. I now
    believe that decision to be a mistake, and it should count as a single
    entry. That's more likely to be what callers expect.

    When walking back up the tree from a newly-tagged entry, the current
    code assumed we were starting from the lowest level of the tree; if we
    have a multiorder entry with an order at least RADIX_TREE_MAP_SHIFT in
    size then we need to shift the index by 'shift' before we start walking
    back up the tree, or we will end up not setting tags on higher entries,
    and then mistakenly thinking that entries below a certain point in the
    tree are not tagged.

    If the first index we examine is a sibling entry of a tagged multiorder
    entry, we were not tagging it. We need to examine the canonical entry,
    and the easiest way to do that is to use radix_tree_descend(). We then
    have to skip over sibling slots when looking for the next entry in the
    tree or we will end up walking back to the canonical entry.

    Add several tests for radix_tree_range_tag_if_tagged().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite
    radix_tree_locate_item(). Modify the locate tests to test multiorder
    entries too.

    [hughd@google.com: radix_tree_locate_item() is often returning the wrong index]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1605012108490.1166@eggly.anvils
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If the radix tree user attempted to insert a colliding entry with an
    existing multiorder entry, then radix_tree_create() could encounter a
    sibling entry when walking down the tree to look for a slot. Use
    radix_tree_descend() to fix the problem, and add a test-case to make
    sure the problem doesn't come back in future.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_get()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_clear()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_set()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • This enables the macros radix_tree_for_each_slot() and friends to be
    used with multi-order entries.

    The way that this works is that we treat all entries in a given slots[]
    array as a single chunk. If the index given to radix_tree_next_chunk()
    happens to point us to a sibling entry, we will back up iter->index so
    that it points to the canonical entry, and that will be the place where
    we start our iteration.

    As we're processing a chunk in radix_tree_next_slot(), we process
    canonical entries, skip over sibling entries, and restart the chunk
    lookup if we find a non-sibling indirect pointer. This drops back to
    the radix_tree_next_chunk() code, which will re-walk the tree and look
    for another chunk.

    This allows us to properly handle multi-order entries mixed with other
    entries that are at various heights in the radix tree.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • These BUG_ON tests are to ensure that all the tags are clear when
    inserting a new entry. If we insert a multiorder entry, we'll end up
    looking at the tags for a different node, and so the BUG_ON can end up
    triggering spuriously.

    Also, we now have three tags, not two, so check all three are clear, and
    check all the root tags with a single call to BUG_ON since the bits are
    stored contiguously.

    Include a test-case to ensure this problem does not reoccur.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
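Because the tag bits are stored contiguously, all of them can be tested against one mask, which is what allows a single check. A minimal sketch follows, assuming three tags kept in adjacent bits of a flags word; the bit position (ROOT_TAG_SHIFT) and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TAGS       3u
#define ROOT_TAG_SHIFT 24u /* hypothetical position of the first tag bit */
#define ROOT_TAG_MASK  (((1u << MAX_TAGS) - 1u) << ROOT_TAG_SHIFT)

/* Set one tag bit in the flags word. */
static unsigned int root_tag_set(unsigned int flags, unsigned int tag)
{
	assert(tag < MAX_TAGS);
	return flags | (1u << (tag + ROOT_TAG_SHIFT));
}

/* One mask test covers all three tags at once. */
static bool root_tags_clear(unsigned int flags)
{
	return (flags & ROOT_TAG_MASK) == 0;
}
```

Bits outside the tag range do not affect the result, so unrelated flags stored in the same word are ignored by the check.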
     
  • Use the new multi-order support functions to rewrite __radix_tree_lookup()

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Setting the indirect bit on the user data entry used to be unambiguous
    because the tree walking code knew not to expect internal nodes in the
    last level of the tree. Multiorder entries can appear at any level of
    the tree, and a leaf with the indirect bit set is indistinguishable from
    a pointer to a node.

    Introduce a special entry (RADIX_TREE_RETRY) which is neither a valid
    user entry, nor a valid pointer to a node. The radix_tree_deref_retry()
    function continues to work the same way, but tree walking code can
    distinguish it from a pointer to a node.

    Also fix the condition for setting slot->parent to NULL; it does not
    matter what height the tree is, it only matters whether slot is an
    indirect pointer. Move this code above the comment which is referring
    to the assignment to root->rnode.

    Also fix the condition for preventing the tree from shrinking to a
    single entry if it's a multiorder entry.

    Add a test-case to the test suite that checks that the tree goes back
    down to its original height after an item is inserted & deleted from a
    higher index in the tree.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The current code will insert entries at each level, as if we're going to
    add a new entry at the bottom level, so we then get an -EEXIST when we
    try to insert the entry into the tree. The best way to fix this is to
    not check 'order' when inserting into an empty tree.

    We still need to 'extend' the tree to the height necessary for the maximum
    index corresponding to this entry, so pass that value to
    radix_tree_extend() rather than the index we're asked to create, or we
    won't create a tree that's deep enough.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
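The reason for passing the maximum index rather than the requested index can be sketched as follows. This assumes 64 slots per node and expresses tree depth as the shift of the top node; all function names are hypothetical, and the loop is an illustration rather than the kernel's extend logic.

```c
#include <assert.h>
#include <limits.h>

#define MAP_SHIFT 6u                 /* assumed: 64 slots per node */
#define MAP_SIZE  (1ul << MAP_SHIFT)

/* Largest index covered by an entry of the given order starting at index. */
static unsigned long entry_max_index(unsigned long index, unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return index | ((1ul << order) - 1);
}

/* Largest index addressable when the top node has this shift. */
static unsigned long shift_maxindex(unsigned int shift)
{
	return (MAP_SIZE << shift) - 1;
}

/* Smallest top-node shift whose tree can address maxindex. */
static unsigned int shift_needed(unsigned long maxindex)
{
	unsigned int shift = 0;

	while (shift_maxindex(shift) < maxindex)
		shift += MAP_SHIFT;
	return shift;
}
```

For an order-9 entry at index 0, sizing the tree from the index alone gives a shift of 0, which covers only indices 0 to 63. Sizing it from entry_max_index(0, 9), which is 511, gives a shift of 6, deep enough to cover the whole entry.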
     
  • All the tree walking functions start with some variant of this code;
    centralise it in one place so we're not chasing subtly different bugs
    everywhere.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Now that sibling pointers are handled explicitly, there is no purpose
    served by restricting the order to be >= RADIX_TREE_MAP_SHIFT.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If we deleted an entry through an index which looked up a sibling
    pointer, we'd end up zeroing out the wrong slots in the node. Use
    get_slot_offset() to find the right slot.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The subtraction was the wrong way round, leading to undefined behaviour
    (shift by an amount larger than the size of the type).

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The code I previously added to enable multiorder radix tree entries was
    untested and therefore buggy. This commit adds the support functions
    that Ross and I decided were necessary over a four-week period of
    iterating various designs.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • I've been receiving increasingly concerned notes from 0day about how
    much my recent changes have been bloating the radix tree. Make it
    happier by only including multiorder support if
    CONFIG_TRANSPARENT_HUGEPAGES is set.

    This is an independent Kconfig option, so other radix tree users can
    also set it if they have a need.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox