12 May, 2018

1 commit

  • test_find_first_bit() is intentionally sub-optimal, and may cause a soft
    lockup on some systems because it runs for a long time. Decrease the
    length of the bitmap it traverses to avoid the lockup.

    With the change below, the test's execution time doesn't exceed 0.2 seconds
    on my test system.
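
    A minimal userspace re-creation of the issue and the fix, assuming a
    hypothetical BITMAP_LEN constant and test helper (the divisor is
    illustrative, not the exact value used in the patch):

    ```
    #include <stdio.h>

    #define BITS_PER_LONG   (8 * sizeof(unsigned long))
    #define BITMAP_LEN      (4096UL * 8 * 10)   /* hypothetical full length */

    static unsigned long bitmap[BITMAP_LEN / BITS_PER_LONG];

    /* Naive linear scan: O(n) per call, so repeated calls get expensive. */
    static unsigned long first_set_bit(const unsigned long *map, unsigned long len)
    {
        for (unsigned long bit = 0; bit < len; bit++)
            if (map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
                return bit;
        return len;
    }

    /* Intentionally sub-optimal test: re-scan from the start for every set bit. */
    static void test_find_first_bit(unsigned long *map, unsigned long len)
    {
        unsigned long bit, cnt = 0;

        while ((bit = first_set_bit(map, len)) < len) {
            map[bit / BITS_PER_LONG] &= ~(1UL << (bit % BITS_PER_LONG));
            cnt++;
        }
        printf("found %lu bits in a %lu-bit bitmap\n", cnt, len);
    }

    int main(void)
    {
        for (unsigned long i = 0; i < BITMAP_LEN / BITS_PER_LONG; i++)
            bitmap[i] = ~0UL;

        /* The fix: traverse only a fraction of the bitmap so the quadratic
         * test finishes quickly and can't trip the soft-lockup watchdog. */
        test_find_first_bit(bitmap, BITMAP_LEN / 10);
        return 0;
    }
    ```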

    Link: http://lkml.kernel.org/r/20180420171949.15710-1-ynorov@caviumnetworks.com
    Fixes: 4441fca0a27f5 ("lib: test module for find_*_bit() functions")
    Signed-off-by: Yury Norov
    Reviewed-by: Andrew Morton
    Reported-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     

07 Feb, 2018

3 commits

  • We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
    It's essentially a joined iteration in search of a non-zero bit, which is
    currently implemented as a lookup join (find a nonzero bit on the lhs, then
    look up the rhs to see if it's set there).

    Implement a direct join (find a nonzero bit on the incrementally built
    join). Also add generic bitmap benchmarks in the new `test_find_bit`
    module for the new function (see `find_next_and_bit` in [2] and [3] below).
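
    A simplified userspace sketch of the two strategies (the helper names here
    are made up; this is not the kernel's find_next_bit()/find_next_and_bit()
    source):

    ```
    #define BITS_PER_LONG   (8 * sizeof(unsigned long))

    static int test_bit(unsigned long bit, const unsigned long *map)
    {
        return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1;
    }

    /* Next set bit in a single bitmap, scanning upward from 'start'. */
    static unsigned long next_bit(const unsigned long *map, unsigned long size,
                                  unsigned long start)
    {
        for (unsigned long bit = start; bit < size; bit++)
            if (test_bit(bit, map))
                return bit;
        return size;
    }

    /* Lookup join: find a set bit in src1, then look it up in src2. */
    static unsigned long next_and_bit_lookup(const unsigned long *src1,
                                             const unsigned long *src2,
                                             unsigned long size,
                                             unsigned long start)
    {
        unsigned long bit = next_bit(src1, size, start);

        while (bit < size && !test_bit(bit, src2))
            bit = next_bit(src1, size, bit + 1);
        return bit;
    }

    /* Direct join: AND the operands word by word and search the result. */
    static unsigned long next_and_bit_direct(const unsigned long *src1,
                                             const unsigned long *src2,
                                             unsigned long size,
                                             unsigned long start)
    {
        for (unsigned long i = start / BITS_PER_LONG; i * BITS_PER_LONG < size; i++) {
            unsigned long word = src1[i] & src2[i];

            /* In the first word, ignore bits below 'start'. */
            if (i == start / BITS_PER_LONG && start % BITS_PER_LONG)
                word &= ~0UL << (start % BITS_PER_LONG);

            if (word) {
                unsigned long bit = i * BITS_PER_LONG + __builtin_ctzl(word);
                return bit < size ? bit : size;
            }
        }
        return size;
    }
    ```

    With a word-at-a-time direct join like this, cpumask_next_and() reduces to
    a single find_next_and_bit() call over the two cpumask bitmaps.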

    For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
    faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
    usage. Note that on Arm, the new pure-C implementation still outperforms
    the old one that uses a mix of C and asm (`find_next_bit`) [3].

    [1] Approximate benchmark code:

    ```
    unsigned long src1p[nr_cpumask_longs] = {pattern1};
    unsigned long src2p[nr_cpumask_longs] = {pattern2};
    for (/*a bunch of repetitions*/) {
        for (int n = -1; n <= nr_cpu_ids; ++n) {
            unsigned long result = cpumask_next_and(n, src1p, src2p);
            asm volatile("" : "+rm"(result)); /* keep the call from being optimized out */
        }
    }
    ```

    Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
    Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
    Signed-off-by: Clement Courbet
    Signed-off-by: Geert Uytterhoeven
    Cc: Yury Norov
    Cc: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Clement Courbet
     
  • As suggested in review comments:
    * printk: align numbers using whitespace instead of tabs;
    * return an error from init() so the module isn't left loaded and rmmod
      isn't needed before testing again;
    * use ktime_get() instead of get_cycles(), as some arches don't support
      the latter (a sketch of the timing loop follows the output below);

    The output in dmesg (on QEMU arm64):
    [ 38.823430] Start testing find_bit() with random-filled bitmap
    [ 38.845358] find_next_bit: 20138448 ns, 163968 iterations
    [ 38.856217] find_next_zero_bit: 10615328 ns, 163713 iterations
    [ 38.863564] find_last_bit: 7111888 ns, 163967 iterations
    [ 40.944796] find_first_bit: 2081007216 ns, 163968 iterations
    [ 40.944975]
    [ 40.944975] Start testing find_bit() with sparse bitmap
    [ 40.945268] find_next_bit: 73216 ns, 656 iterations
    [ 40.967858] find_next_zero_bit: 22461008 ns, 327025 iterations
    [ 40.968047] find_last_bit: 62320 ns, 656 iterations
    [ 40.978060] find_first_bit: 9889360 ns, 656 iterations
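
    A minimal sketch of the ktime-based timing loop mentioned in the bullets
    above, assuming a find_next_bit()-style benchmark; it mirrors the shape of
    the output lines but is not the module's verbatim source:

    ```
    #include <linux/bitops.h>
    #include <linux/init.h>
    #include <linux/ktime.h>
    #include <linux/printk.h>

    static int __init test_find_next_bit(const unsigned long *bitmap,
                                         unsigned long len)
    {
        unsigned long i = 0, cnt = 0;
        ktime_t time;

        /* ktime_get() is available on all arches, unlike get_cycles(). */
        time = ktime_get();
        while ((i = find_next_bit(bitmap, len, i)) < len) {
            i++;
            cnt++;
        }
        time = ktime_sub(ktime_get(), time);

        /* Plain field-width padding aligns the numbers without tabs. */
        pr_err("find_next_bit: %18llu ns, %6lu iterations\n",
               (unsigned long long)ktime_to_ns(time), cnt);

        return 0;
    }
    ```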

    Link: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
    Signed-off-by: Yury Norov
    Tested-by: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Clement Courbet
    Cc: Matthew Wilcox
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • As suggested in review comments, rename test_find_bit.c to
    find_bit_benchmark.c.

    Link: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
    Signed-off-by: Yury Norov
    Tested-by: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Clement Courbet
    Cc: Matthew Wilcox
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov