07 Feb, 2018

3 commits

  • We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
    It's essentially a joined iteration in search for a non-zero bit, which is
    currently implemented as a lookup join (find a nonzero bit on the lhs,
    lookup the rhs to see if it's set there).

    Implement a direct join (find a nonzero bit on the incrementally built
    join). Also add generic bitmap benchmarks in the new `test_find_bit`
    module for new function (see `find_next_and_bit` in [2] and [3] below).

    For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
    faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
    usage. Note that on Arm, the new pure-C implementation still outperforms
    the old one that uses a mix of C and asm (`find_next_bit`) [3].

    [1] Approximate benchmark code:

    ```
    unsigned long src1p[nr_cpumask_longs] = {pattern1};
    unsigned long src2p[nr_cpumask_longs] = {pattern2};
    for (/*a bunch of repetitions*/) {
    for (int n = -1; n ]
    Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
    Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
    Signed-off-by: Clement Courbet
    Signed-off-by: Geert Uytterhoeven
    Cc: Yury Norov
    Cc: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton

    Signed-off-by: Linus Torvalds

    Clement Courbet
     
  • As suggested in review comments:
    * printk: align numbers using whitespaces instead of tabs;
    * return error value from init() to avoid calling rmmod if testing again;
    * use ktime_get instead of get_cycles as some arches don't support it;

    The output in dmesg (on QEMU arm64):
    [ 38.823430] Start testing find_bit() with random-filled bitmap
    [ 38.845358] find_next_bit: 20138448 ns, 163968 iterations
    [ 38.856217] find_next_zero_bit: 10615328 ns, 163713 iterations
    [ 38.863564] find_last_bit: 7111888 ns, 163967 iterations
    [ 40.944796] find_first_bit: 2081007216 ns, 163968 iterations
    [ 40.944975]
    [ 40.944975] Start testing find_bit() with sparse bitmap
    [ 40.945268] find_next_bit: 73216 ns, 656 iterations
    [ 40.967858] find_next_zero_bit: 22461008 ns, 327025 iterations
    [ 40.968047] find_last_bit: 62320 ns, 656 iterations
    [ 40.978060] find_first_bit: 9889360 ns, 656 iterations

    Link: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
    Signed-off-by: Yury Norov
    Tested-by: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Clement Courbet
    Cc: Matthew Wilcox
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • As suggested in review comments, rename test_find_bit.c to
    find_bit_benchmark.c.

    Link: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
    Signed-off-by: Yury Norov
    Tested-by: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Clement Courbet
    Cc: Matthew Wilcox
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov