31 May, 2016

2 commits

  • Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
    adding to the result.

    Fixes: 2b1b0d66704a ("lib/uuid.c: introduce a few more generic helpers")
    Signed-off-by: Bjørn Mork
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Linus Torvalds

    Bjørn Mork
     
  • It appears that somehow I missed a test of the latest UUID rework which
    landed in the kernel. Present a small test module to avoid such cases
    in the future.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was noting suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash"

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add
    microblaze: Add
    m68k: Add
    : Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    : Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of .

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin
     

27 May, 2016

1 commit

  • With netconsole (at least) the pr_err("... disablingn") call can
    recurse back into the dma-debug code, where it'll try to grab
    free_entries_lock again. Avoid the problem by doing the printk after
    dropping the lock.

    Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.com
    Signed-off-by: Ville Syrjälä
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ville Syrjälä
     

26 May, 2016

2 commits

  • Pull vfs iov_iter regression fix from Al Viro:
    "Fix for braino in 'fold checks into iterate_and_advance()'"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    do "fold checks into iterate_and_advance()" right

    Linus Torvalds
     
  • the only case when we should skip the iterate_and_advance() guts
    is when nothing's left in the iterator, _not_ just when requested
    amount is 0. Said guts will do nothing in the latter case anyway;
    the problem we tried to deal with in the aforementioned commit is
    that when there's nothing left *and* the amount requested is 0,
    we might end up deferencing one iovec too many; the value we fetch
    from there is discarded in that case, but theoretically it might
    oops if the iovec array ends exactly at the end of page with the
    next page not mapped.

    Bailing out on zero size requested had an unexpected side effect -
    zero-length segment in the beginning of iovec array ended up
    throwing do_loop_readv_writev() into infinite spin; we do not
    advance past the empty segment at all. Reproducer is trivial:
    echo '#include ' >a.c
    echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c
    cc a.c && ./a.out

    Al Viro
     

24 May, 2016

1 commit

  • With VT=n, the kernel build fails with:

    drivers/built-in.o: In function `kgdboc_pre_exp_handler':
    kgdboc.c:(.text+0x7b5aa): undefined reference to `fg_console'
    kgdboc.c:(.text+0x7b5ce): undefined reference to `vc_cons'
    kgdboc.c:(.text+0x7b5d5): undefined reference to `vc_cons'

    kgdboc.o is built when KGDB_SERIAL_CONSOLE is set. So make
    KGDB_SERIAL_CONSOLE depend on HW_CONSOLE which includes those symbols.

    Link: http://lkml.kernel.org/r/1459412955-4696-1-git-send-email-jslaby@suse.cz
    Signed-off-by: Jiri Slaby
    Reported-by: "Jim Davis"
    Acked-by: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

21 May, 2016

32 commits

  • Merge more updates from Andrew Morton:

    - the rest of MM

    - KASAN updates

    - procfs updates

    - exit, fork updates

    - printk updates

    - lib/ updates

    - radix-tree testsuite updates

    - checkpatch updates

    - kprobes updates

    - a few other misc bits

    * emailed patches from Andrew Morton : (162 commits)
    samples/kprobes: print out the symbol name for the hooks
    samples/kprobes: add a new module parameter
    kprobes: add the "tls" argument for j_do_fork
    init/main.c: simplify initcall_blacklisted()
    fs/efs/super.c: fix return value
    checkpatch: improve --git shortcut
    checkpatch: reduce number of `git log` calls with --git
    checkpatch: add support to check already applied git commits
    checkpatch: add --list-types to show message types to show or ignore
    checkpatch: advertise the --fix and --fix-inplace options more
    checkpatch: whine about ACCESS_ONCE
    checkpatch: add test for keywords not starting on tabstops
    checkpatch: improve CONSTANT_COMPARISON test for structure members
    checkpatch: add PREFER_IS_ENABLED test
    lib/GCD.c: use binary GCD algorithm instead of Euclidean
    radix-tree: free up the bottom bit of exceptional entries for reuse
    dax: move RADIX_DAX_ definitions to dax.c
    radix-tree: make radix_tree_descend() more useful
    radix-tree: introduce radix_tree_replace_clear_tags()
    radix-tree: tidy up __radix_tree_create()
    ...

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here's the "big" driver core update for 4.7-rc1.

    Mostly just debugfs changes, the long-known and messy races with
    removing debugfs files should be fixed thanks to the great work of
    Nicolai Stange. We also have some isa updates in here (the x86
    maintainers told me to take it through this tree), a new warning when
    we run out of dynamic char major numbers, and a few other assorted
    changes, details in the shortlog.

    All have been in linux-next for some time with no reported issues"

    * tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
    Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case"
    gpio: ws16c48: Utilize the ISA bus driver
    gpio: 104-idio-16: Utilize the ISA bus driver
    gpio: 104-idi-48: Utilize the ISA bus driver
    gpio: 104-dio-48e: Utilize the ISA bus driver
    watchdog: ebc-c384_wdt: Utilize the ISA bus driver
    iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros
    iio: stx104: Add X86 dependency to STX104 Kconfig option
    Documentation: Add ISA bus driver documentation
    isa: Implement the max_num_isa_dev macro
    isa: Implement the module_isa_driver macro
    pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
    isa: Decouple X86_32 dependency from the ISA Kconfig option
    driver-core: use 'dev' argument in dev_dbg_ratelimited stub
    base: dd: don't remove driver_data in -EPROBE_DEFER case
    kernfs: Move faulting copy_user operations outside of the mutex
    devcoredump: add scatterlist support
    debugfs: unproxify files created through debugfs_create_u32_array()
    debugfs: unproxify files created through debugfs_create_blob()
    debugfs: unproxify files created through debugfs_create_bool()
    ...

    Linus Torvalds
     
  • The binary GCD algorithm is based on the following facts:
    1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
    2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
    3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

    Even on x86 machines with reasonable division hardware, the binary
    algorithm runs about 25% faster (80% the execution time) than the
    division-based Euclidian algorithm.

    On platforms like Alpha and ARMv6 where division is a function call to
    emulation code, it's even more significant.

    There are two variants of the code here, depending on whether a fast
    __ffs (find least significant set bit) instruction is available. This
    allows the unpredictable branches in the bit-at-a-time shifting loop to
    be eliminated.

    If fast __ffs is not available, the "even/odd" GCD variant is used.

    I use the following code to benchmark:

    #include
    #include
    #include
    #include
    #include
    #include

    #define swap(a, b) \
    do { \
    a ^= b; \
    b ^= a; \
    a ^= b; \
    } while (0)

    unsigned long gcd0(unsigned long a, unsigned long b)
    {
    unsigned long r;

    if (a < b) {
    swap(a, b);
    }

    if (b == 0)
    return a;

    while ((r = a % b) != 0) {
    a = b;
    b = r;
    }

    return b;
    }

    unsigned long gcd1(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd2(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    unsigned long gcd3(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);
    if (b == 1)
    return r & -r;

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == 1)
    return r & -r;
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd4(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;
    if (b == r)
    return r;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == r)
    return r;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
    gcd0, gcd1, gcd2, gcd3, gcd4,
    };

    #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

    #if defined(__x86_64__)

    #define rdtscll(val) do { \
    unsigned long __a,__d; \
    __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
    (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<= start)
    ret = end - start;
    else
    ret = ~0ULL - start + 1 + end;

    *res = gcd_res;
    return ret;
    }

    #else

    static inline struct timespec read_time(void)
    {
    struct timespec time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
    return time;
    }

    static inline unsigned long long diff_time(struct timespec start, struct timespec end)
    {
    struct timespec temp;

    if ((end.tv_nsec - start.tv_nsec) < 0) {
    temp.tv_sec = end.tv_sec - start.tv_sec - 1;
    temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
    } else {
    temp.tv_sec = end.tv_sec - start.tv_sec;
    temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }

    return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
    }

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
    unsigned long a, unsigned long b, unsigned long *res)
    {
    struct timespec start, end;
    unsigned long gcd_res;

    start = read_time();
    gcd_res = gcd(a, b);
    end = read_time();

    *res = gcd_res;
    return diff_time(start, end);
    }

    #endif

    static inline unsigned long get_rand()
    {
    if (sizeof(long) == 8)
    return (unsigned long)rand() << 32 | rand();
    else
    return rand();
    }

    int main(int argc, char **argv)
    {
    unsigned int seed = time(0);
    int loops = 100;
    int repeats = 1000;
    unsigned long (*res)[TEST_ENTRIES];
    unsigned long long elapsed[TEST_ENTRIES];
    int i, j, k;

    for (;;) {
    int opt = getopt(argc, argv, "n:r:s:");
    /* End condition always first */
    if (opt == -1)
    break;

    switch (opt) {
    case 'n':
    loops = atoi(optarg);
    break;
    case 'r':
    repeats = atoi(optarg);
    break;
    case 's':
    seed = strtoul(optarg, NULL, 10);
    break;
    default:
    /* You won't actually get here. */
    break;
    }
    }

    res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
    memset(elapsed, 0, sizeof(elapsed));

    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    /* Do we have args? */
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    unsigned long long min_elapsed[TEST_ENTRIES];
    for (k = 0; k < repeats; k++) {
    for (i = 0; i < TEST_ENTRIES; i++) {
    unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
    if (k == 0 || min_elapsed[i] > tmp)
    min_elapsed[i] = tmp;
    }
    }
    for (i = 0; i < TEST_ENTRIES; i++)
    elapsed[i] += min_elapsed[i];
    }

    for (i = 0; i < TEST_ENTRIES; i++)
    printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

    k = 0;
    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    for (i = 1; i < TEST_ENTRIES; i++) {
    if (res[j][i] != res[j][0])
    break;
    }
    if (i < TEST_ENTRIES) {
    if (k == 0) {
    k = 1;
    fprintf(stderr, "Error:\n");
    }
    fprintf(stderr, "gcd(%lu, %lu): ", a, b);
    for (i = 0; i < TEST_ENTRIES; i++)
    fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
    }
    }

    if (k == 0)
    fprintf(stderr, "PASS\n");

    free(res);

    return 0;
    }

    Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 10174
    gcd1: elapsed 2120
    gcd2: elapsed 2902
    gcd3: elapsed 2039
    gcd4: elapsed 2812
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9309
    gcd1: elapsed 2280
    gcd2: elapsed 2822
    gcd3: elapsed 2217
    gcd4: elapsed 2710
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9589
    gcd1: elapsed 2098
    gcd2: elapsed 2815
    gcd3: elapsed 2030
    gcd4: elapsed 2718
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9914
    gcd1: elapsed 2309
    gcd2: elapsed 2779
    gcd3: elapsed 2228
    gcd4: elapsed 2709
    PASS

    [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
    Signed-off-by: Zhaoxiu Zeng
    Signed-off-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaoxiu Zeng
     
  • Now that the shift amount is stored in the node, radix_tree_descend()
    can calculate offset itself from index, which removes several lines of
    code from each of the tree walkers.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In addition to replacing the entry, we also clear all associated tags.
    This is really a one-off special for page_cache_tree_delete() which had
    far too much detailed knowledge about how the radix tree works.

    For efficiency, factor node_tag_clear() out of radix_tree_tag_clear() It
    can be used by radix_tree_delete_item() as well as
    radix_tree_replace_clear_tags().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • 1. Rename the existing variable 'slot' to 'child'.
    2. Introduce a new variable called 'slot' which is the address of the
    slot we're dealing with. This lets us simplify the tree insertion,
    and removes the recalculation of 'slot' at the end of the function.
    3. Using 'slot' in the sibling pointer insertion part makes the code
    more readable.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_range_tag_if_tagged to name the nodes parent, node
    and child instead of node & slot.

    Use parent->offset instead of playing games with 'upindex'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Convert radix_tree_next_chunk to use 'child' instead of 'slot' as the
    name of the child node. Also use node_maxindex() where it makes sense.

    The 'rnode' variable was unnecessary; it doesn't overlap in usage with
    'node', so we can just use 'node' the whole way through the function.

    Improve the testcase to start the walk from every index in the carefully
    constructed tree, and to accept any index within the range covered by
    the entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the more standard 'node' and 'child' instead of 'to_free' and
    'slot'.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • As with indirect_to_ptr(), ptr_to_indirect() and
    RADIX_TREE_INDIRECT_PTR, change radix_tree_is_indirect_ptr() to
    radix_tree_is_internal_node().

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Mirrors the earlier commit introducing node_to_entry().

    Also change the type returned to be a struct radix_tree_node pointer.
    That lets us simplify a couple of places in the radix tree shrink &
    extend paths where we could convert an entry into a pointer, modify the
    node, then convert the pointer back into an entry.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • ptr_to_indirect() was a bad name. What it really means is "Convert this
    pointer to a node into an entry suitable for storing in the radix tree".
    So node_to_entry() seemed like a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The name RADIX_TREE_INDIRECT_PTR doesn't really match the meaning.
    RADIX_TREE_INTERNAL_NODE is a better name.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The only remaining references to root->height were in extend and shrink,
    where it was updated. Now we can remove it entirely.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If radix_tree_shrink returns whether it managed to shrink, then
    __radix_tree_delete_node doesn't ned to query the tree to find out
    whether it did any work or not.

    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • node->shift represents the shift necessary for looking in the slots
    array at this level. It is equal to the old (node->height - 1) *
    RADIX_TREE_MAP_SHIFT.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Neither piece of information we're storing in node->path can be larger
    than 64, so store each in its own unsigned char instead of shifting and
    masking to store them both in an unsigned int.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Typos, whitespace, grammar, line length, using the correct types, etc.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The multiorder support is a sufficiently large feature to be worth
    adding copyrigt lines for.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • - Print which indices are covered by every leaf entry
    - Print sibling entries
    - Print the node pointer instead of the slot entry
    - Build by default in userspace, and make it accessible to the test-suite

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • I had previously decided that tagging a single multiorder entry would
    count as tagging 2^order entries for the purposes of 'nr_to_tag'. I now
    believe that decision to be a mistake, and it should count as a single
    entry. That's more likely to be what callers expect.

    When walking back up the tree from a newly-tagged entry, the current
    code assumed we were starting from the lowest level of the tree; if we
    have a multiorder entry with an order at least RADIX_TREE_MAP_SHIFT in
    size then we need to shift the index by 'shift' before we start walking
    back up the tree, or we will end up not setting tags on higher entries,
    and then mistakenly thinking that entries below a certain point in the
    tree are not tagged.

    If the first index we examine is a sibling entry of a tagged multiorder
    entry, we were not tagging it. We need to examine the canonical entry,
    and the easiest way to do that is to use radix_tree_descend(). We then
    have to skip over sibling slots when looking for the next entry in the
    tree or we will end up walking back to the canonical entry.

    Add several tests for radix_tree_range_tag_if_tagged().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite
    radix_tree_locate_item(). Modify the locate tests to test multiorder
    entries too.

    [hughd@google.com: radix_tree_locate_item() is often returning the wrong index]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1605012108490.1166@eggly.anvils
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • If the radix tree user attempted to insert a colliding entry with an
    existing multiorder entry, then radix_tree_create() could encounter a
    sibling entry when walking down the tree to look for a slot. Use
    radix_tree_descend() to fix the problem, and add a test-case to make
    sure the problem doesn't come back in future.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_get()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_clear()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Use the new multi-order support functions to rewrite
    radix_tree_tag_set()

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • This enables the macros radix_tree_for_each_slot() and friends to be
    used with multi-order entries.

    The way that this works is that we treat all entries in a given slots[]
    array as a single chunk. If the index given to radix_tree_next_chunk()
    happens to point us to a sibling entry, we will back up iter->index so
    that it points to the canonical entry, and that will be the place where
    we start our iteration.

    As we're processing a chunk in radix_tree_next_slot(), we process
    canonical entries, skip over sibling entries, and restart the chunk
    lookup if we find a non-sibling indirect pointer. This drops back to
    the radix_tree_next_chunk() code, which will re-walk the tree and look
    for another chunk.

    This allows us to properly handle multi-order entries mixed with other
    entries that are at various heights in the radix tree.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Matthew Wilcox
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • These BUG_ON tests are to ensure that all the tags are clear when
    inserting a new entry. If we insert a multiorder entry, we'll end up
    looking at the tags for a different node, and so the BUG_ON can end up
    triggering spuriously.

    Also, we now have three tags, not two, so check all three are clear, and
    check all the root tags with a single call to BUG_ON since the bits are
    stored contiguously.

    Include a test-case to ensure this problem does not reoccur.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the new multi-order support functions to rewrite __radix_tree_lookup()

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Setting the indirect bit on the user data entry used to be unambiguous
    because the tree walking code knew not to expect internal nodes in the
    last level of the tree. Multiorder entries can appear at any level of
    the tree, and a leaf with the indirect bit set is indistinguishable from
    a pointer to a node.

    Introduce a special entry (RADIX_TREE_RETRY) which is neither a valid
    user entry, nor a valid pointer to a node. The radix_tree_deref_retry()
    function continues to work the same way, but tree walking code can
    distinguish it from a pointer to a node.

    Also fix the condition for setting slot->parent to NULL; it does not
    matter what height the tree is, it only matters whether slot is an
    indirect pointer. Move this code above the comment which is referring
    to the assignment to root->rnode.

    Also fix the condition for preventing the tree from shrinking to a
    single entry if it's a multiorder entry.

    Add a test-case to the test suite that checks that the tree goes back
    down to its original height after an item is inserted & deleted from a
    higher index in the tree.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The current code will insert entries at each level, as if we're going to
    add a new entry at the bottom level, so we then get an -EEXIST when we
    try to insert the entry into the tree. The best way to fix this is to
    not check 'order' when inserting into an empty tree.

    We still need to 'extend' the tree to the height necessary for the maximum
    index corresponding to this entry, so pass that value to
    radix_tree_extend() rather than the index we're asked to create, or we
    won't create a tree that's deep enough.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • All the tree walking functions start with some variant of this code;
    centralise it in one place so we're not chasing subtly different bugs
    everywhere.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox