07 Feb, 2018

1 commit

  • We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
    It's essentially a joined iteration in search for a non-zero bit, which is
    currently implemented as a lookup join (find a nonzero bit on the lhs,
    lookup the rhs to see if it's set there).

    Implement a direct join (find a nonzero bit on the incrementally built
    join). Also add generic bitmap benchmarks in the new `test_find_bit`
    module for the new function (see `find_next_and_bit` in [2] and [3] below).
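    The difference between the two join strategies can be illustrated with a
    minimal user-space sketch. This is not the kernel's actual implementation;
    the function name, the 64-bit-long assumption, and the use of
    `__builtin_ctzl` are all illustrative. The key point is that the two
    source words are ANDed first and the bit search runs on the combined word:

    ```c
    #include <stdio.h>

    #define BITS_PER_LONG (8 * sizeof(unsigned long))

    /* Hypothetical sketch of the "direct join": AND the two words on the
     * fly and search the combined word, instead of finding a set bit in
     * addr1 and then checking whether addr2 has it too (the old "lookup
     * join"). Names and structure are illustrative only. */
    static unsigned long find_next_and_bit_sketch(const unsigned long *addr1,
                                                  const unsigned long *addr2,
                                                  unsigned long nbits,
                                                  unsigned long start)
    {
        if (start >= nbits)
            return nbits;

        unsigned long i = start / BITS_PER_LONG;
        /* Mask off bits below `start` in the first combined word. */
        unsigned long word = (addr1[i] & addr2[i]) &
                             (~0UL << (start % BITS_PER_LONG));

        while (!word) {
            if (++i >= (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG)
                return nbits;          /* no common set bit found */
            word = addr1[i] & addr2[i];
        }
        unsigned long bit = i * BITS_PER_LONG + __builtin_ctzl(word);
        return bit < nbits ? bit : nbits;
    }

    int main(void)
    {
        unsigned long a[1] = { 0xB4 };  /* bits 2, 4, 5, 7 */
        unsigned long b[1] = { 0x71 };  /* bits 0, 4, 5, 6 */
        /* common set bits: 4 and 5 */
        printf("%lu\n", find_next_and_bit_sketch(a, b, 64, 0));  /* 4 */
        printf("%lu\n", find_next_and_bit_sketch(a, b, 64, 5));  /* 5 */
        printf("%lu\n", find_next_and_bit_sketch(a, b, 64, 6));  /* 64 */
        return 0;
    }
    ```

    The win comes from never touching a word of `addr2` unless the matching
    word of `addr1` is also being examined, and from searching only bits that
    are set in both operands.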

    For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
    faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
    usage. Note that on Arm, the new pure-C implementation still outperforms
    the old one that uses a mix of C and asm (`find_next_bit`) [3].

    [1] Approximate benchmark code:

    ```
    unsigned long src1p[nr_cpumask_longs] = {pattern1};
    unsigned long src2p[nr_cpumask_longs] = {pattern2};
    for (/*a bunch of repetitions*/) {
      for (int n = -1; n <= nr_cpu_ids; ++n) {
        asm volatile("" : "+rm"(src1p)); // prevent any optimization
        asm volatile("" : "+rm"(src2p));
        unsigned long result = cpumask_next_and(n, src1p, src2p);
        asm volatile("" : "+rm"(result));
      }
    }
    ```
    Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
    Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
    Signed-off-by: Clement Courbet
    Signed-off-by: Geert Uytterhoeven
    Cc: Yury Norov
    Cc: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton

    Signed-off-by: Linus Torvalds


25 Feb, 2017

1 commit

  • This saves 32 bytes on my x86-64 build, mostly due to alignment
    considerations and sharing more code between find_next_bit and
    find_next_zero_bit, but it does save a couple of instructions.

    There are really two parts to this commit:
    - First, the first half of the test, (!nbits), is trivially a subset of
    the second half, (start >= nbits): since nbits and start are both
    unsigned, nbits == 0 already implies start >= nbits.
    - Second, while looking at the disassembly, I noticed that GCC was
    predicting the branch taken. Since this is a failure case, it's
    clearly the less likely of the two branches, so add an unlikely() to
    override GCC's heuristics.
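    The two parts can be sketched in isolation. This is a hypothetical
    stand-alone demonstration of the check, not the kernel's actual code;
    the function name `clamp_start` is invented for illustration:

    ```c
    #include <stdio.h>

    /* Common idiom for GCC's branch-prediction hint. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Old test: if (!nbits || start >= nbits) return nbits;
     * Because both operands are unsigned, nbits == 0 forces
     * start >= nbits, so a single comparison covers both halves, and
     * unlikely() marks the out-of-range case as the cold branch. */
    static unsigned long clamp_start(unsigned long nbits, unsigned long start)
    {
        if (unlikely(start >= nbits))
            return nbits;
        return start;  /* in range: the real search would proceed here */
    }

    int main(void)
    {
        printf("%lu\n", clamp_start(0, 0));    /* nbits == 0 -> returns 0 */
        printf("%lu\n", clamp_start(64, 70));  /* start past end -> 64 */
        printf("%lu\n", clamp_start(64, 3));   /* in range -> 3 */
        return 0;
    }
    ```

    Dropping the redundant comparison and nudging GCC toward the common
    in-range path is what shrinks the generated code and saves the couple
    of instructions mentioned above.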

    [mawilcox@microsoft.com: v2]
    Link: http://lkml.kernel.org/r/1483709016-1834-1-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Acked-by: Yury Norov
    Acked-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds


17 Apr, 2015

1 commit

  • This file contains the implementations of all the find_*_bit{,_le}
    variants, so giving it a more generic name seems reasonable.

    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
