07 Feb, 2018
3 commits
-
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
It's essentially a joined iteration in search for a non-zero bit, which is
currently implemented as a lookup join (find a nonzero bit on the lhs,
lookup the rhs to see if it's set there).Implement a direct join (find a nonzero bit on the incrementally built
join). Also add generic bitmap benchmarks in the new `test_find_bit`
module for new function (see `find_next_and_bit` in [2] and [3] below).For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
usage. Note that on Arm, the new pure-C implementation still outperforms
the old one that uses a mix of C and asm (`find_next_bit`) [3].[1] Approximate benchmark code:
```
unsigned long src1p[nr_cpumask_longs] = {pattern1};
unsigned long src2p[nr_cpumask_longs] = {pattern2};
for (/*a bunch of repetitions*/) {
for (int n = -1; n ]
Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
Signed-off-by: Clement Courbet
Signed-off-by: Geert Uytterhoeven
Cc: Yury Norov
Cc: Geert Uytterhoeven
Cc: Alexey Dobriyan
Cc: Rasmus Villemoes
Signed-off-by: Andrew MortonSigned-off-by: Linus Torvalds
-
As suggested in review comments:
* printk: align numbers using whitespaces instead of tabs;
* return error value from init() to avoid calling rmmod if testing again;
* use ktime_get instead of get_cycles as some arches don't support it;The output in dmesg (on QEMU arm64):
[ 38.823430] Start testing find_bit() with random-filled bitmap
[ 38.845358] find_next_bit: 20138448 ns, 163968 iterations
[ 38.856217] find_next_zero_bit: 10615328 ns, 163713 iterations
[ 38.863564] find_last_bit: 7111888 ns, 163967 iterations
[ 40.944796] find_first_bit: 2081007216 ns, 163968 iterations
[ 40.944975]
[ 40.944975] Start testing find_bit() with sparse bitmap
[ 40.945268] find_next_bit: 73216 ns, 656 iterations
[ 40.967858] find_next_zero_bit: 22461008 ns, 327025 iterations
[ 40.968047] find_last_bit: 62320 ns, 656 iterations
[ 40.978060] find_first_bit: 9889360 ns, 656 iterationsLink: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
Signed-off-by: Yury Norov
Tested-by: Geert Uytterhoeven
Cc: Alexey Dobriyan
Cc: Clement Courbet
Cc: Matthew Wilcox
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As suggested in review comments, rename test_find_bit.c to
find_bit_benchmark.c.Link: http://lkml.kernel.org/r/20171124143040.a44jvhmnaiyedg2i@yury-thinkpad
Signed-off-by: Yury Norov
Tested-by: Geert Uytterhoeven
Cc: Alexey Dobriyan
Cc: Clement Courbet
Cc: Matthew Wilcox
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds