17 Oct, 2020

1 commit

  • kernel.h has been used as a dumping ground for all kinds of stuff for a
    long time. This is an attempt to start cleaning it up by splitting out
    the min()/max() et al. helpers.

    At the same time, convert users in the header and lib folders to use the
    new header. For the time being, include the new header back into
    kernel.h to avoid twisted, indirect includes for other existing users.
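
    The helpers being split out are the familiar comparison macros; as a
    rough userspace sketch (simplified and hypothetical, without the strict
    type-checking that the kernel versions perform):

    ```c
    #include <assert.h>

    /* Naive stand-ins for the kernel's min()/max() helpers.  The real
     * macros in the new header also type-check their arguments; these
     * simplified versions evaluate each argument twice, so they must not
     * be used with expressions that have side effects. */
    #define min(a, b) ((a) < (b) ? (a) : (b))
    #define max(a, b) ((a) > (b) ? (a) : (b))
    #define clamp(val, lo, hi) min(max((val), (lo)), (hi))
    ```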

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Cc: "Rafael J. Wysocki"
    Cc: Steven Rostedt
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Cc: Linus Torvalds
    Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

01 Feb, 2020

3 commits

  • It saves 25% of .text for arm64, and more for BE architectures.

    Before:
    $ size lib/find_bit.o
    text data bss dec hex filename
    1012 56 0 1068 42c lib/find_bit.o

    After:
    $ size lib/find_bit.o
    text data bss dec hex filename
    776 56 0 832 340 lib/find_bit.o

    Link: http://lkml.kernel.org/r/20200103202846.21616-3-yury.norov@gmail.com
    Signed-off-by: Yury Norov
    Cc: Thomas Gleixner
    Cc: Allison Randal
    Cc: William Breathitt Gray
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • _find_next_bit and _find_next_bit_le are very similar functions. It's
    possible to join them by adding one parameter and a couple of simple
    checks. This simplifies maintenance and makes it possible to shrink the
    size of .text by un-inlining the unified function (in the following
    patch).
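
    The unification idea can be sketched in userspace as follows (a
    hypothetical, simplified version assuming a 64-bit long: one body
    parameterized by byte order, with the word swapped only when the LE
    variant runs on a big-endian host):

    ```c
    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Fetch one bitmap word; for the _le variant on a big-endian host the
     * word is byte-swapped so the same search body serves both variants. */
    static unsigned long load_word(const unsigned long *addr,
                                   unsigned long idx, int is_le)
    {
        unsigned long w = addr[idx];
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        if (is_le)
            w = __builtin_bswap64(w); /* sketch assumes 64-bit long */
    #endif
        (void)is_le;
        return w;
    }

    /* One function body replaces the two near-identical search loops. */
    static unsigned long find_next_bit_common(const unsigned long *addr,
                                              unsigned long nbits,
                                              unsigned long start, int is_le)
    {
        while (start < nbits) {
            unsigned long w = load_word(addr, start / BITS_PER_LONG, is_le);
            /* mask off bits below 'start' within the current word */
            w &= ~0UL << (start % BITS_PER_LONG);
            if (w) {
                unsigned long pos = start - start % BITS_PER_LONG +
                                    __builtin_ctzl(w);
                return pos < nbits ? pos : nbits;
            }
            start = start - start % BITS_PER_LONG + BITS_PER_LONG;
        }
        return nbits;
    }
    ```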

    Link: http://lkml.kernel.org/r/20200103202846.21616-2-yury.norov@gmail.com
    Signed-off-by: Yury Norov
    Cc: Allison Randal
    Cc: Joe Perches
    Cc: Thomas Gleixner
    Cc: William Breathitt Gray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     
  • ext2_swab() is defined locally in lib/find_bit.c. However, it is not
    specific to ext2, nor to bitmaps.

    There are many potential users of it, so rename it to just swab() and
    move it to include/uapi/linux/swab.h.

    The ABI guarantees that the size of unsigned long corresponds to
    BITS_PER_LONG, therefore drop the unneeded cast.
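
    A userspace sketch of such a word-width byte swap (hypothetical
    simplified version; the kernel helper dispatches on BITS_PER_LONG):

    ```c
    /* Byte-swap a full unsigned long, picking the right builtin for the
     * word width of the host. */
    static unsigned long swab_long(unsigned long x)
    {
    #if defined(__SIZEOF_LONG__) && __SIZEOF_LONG__ == 8
        return __builtin_bswap64(x);
    #else
        return __builtin_bswap32(x);
    #endif
    }
    ```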

    Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com
    Signed-off-by: Yury Norov
    Cc: Allison Randal
    Cc: Joe Perches
    Cc: Thomas Gleixner
    Cc: William Breathitt Gray
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov
     

05 Dec, 2019

1 commit

  • Patch series "Introduce the for_each_set_clump8 macro", v18.

    While adding GPIO get_multiple/set_multiple callback support for various
    drivers, I noticed a recurring looping pattern that would be useful to
    standardize as a macro.

    This patchset introduces the for_each_set_clump8 macro and utilizes it
    in several GPIO drivers. The for_each_set_clump8 macro facilitates a
    for-loop syntax that iterates over a memory region in entire groups of
    set bits at a time.

    For example, suppose you would like to iterate over a 32-bit integer 8
    bits at a time, skipping over 8-bit groups with no set bit, where
    XXXXXXXX represents the current 8-bit group:

    Example: 10111110 00000000 11111111 00110011
    First loop: 10111110 00000000 11111111 XXXXXXXX
    Second loop: 10111110 00000000 XXXXXXXX 00110011
    Third loop: XXXXXXXX 00000000 11111111 00110011

    Each iteration of the loop returns the next 8-bit group that has at
    least one set bit.

    The for_each_set_clump8 macro has four parameters:

    * start: set to the bit offset of the current clump
    * clump: set to the current clump value
    * bits: bitmap to search within
    * size: bitmap size in number of bits

    In this version of the patchset, the for_each_set_clump macro has been
    reimplemented and simplified based on the suggestions provided by Rasmus
    Villemoes and Andy Shevchenko in the version 4 submission.

    In particular, the function of the for_each_set_clump macro has been
    restricted to handle only 8-bit clumps; the drivers that use the
    for_each_set_clump macro only handle 8-bit ports so a generic
    for_each_set_clump implementation is not necessary. Thus, a solution
    for large clumps (i.e. those larger than the width of a bitmap word)
    can be postponed until a driver appears that actually requires such a
    generic for_each_set_clump implementation.

    For what it's worth, a semi-generic for_each_set_clump (i.e. for clumps
    smaller than the width of a bitmap word) can be implemented by simply
    replacing the hardcoded '8' and '0xFF' instances with respective
    variables. I have not yet had a need for such an implementation, and
    since it falls short of a true generic for_each_set_clump function, I
    have decided to forgo such an implementation for now.

    In addition, the bitmap_get_value8 and bitmap_set_value8 functions are
    introduced to get and set 8-bit values respectively. Their use is based
    on the behavior suggested in the patchset version 4 review.

    This patch (of 14):

    This macro iterates for each 8-bit group of bits (clump) with set bits,
    within a bitmap memory region. For each iteration, "start" is set to
    the bit offset of the found clump, while the respective clump value is
    stored to the location pointed by "clump". Additionally, the
    bitmap_get_value8 and bitmap_set_value8 functions are introduced to
    respectively get and set an 8-bit value in a bitmap memory region.
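
    The semantics above can be sketched in plain userspace C (a hypothetical
    simplified version: it relies on 8 dividing BITS_PER_LONG, so clumps
    never straddle word boundaries):

    ```c
    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Read the 8-bit group starting at bit offset 'start'. */
    static unsigned long bitmap_get_value8(const unsigned long *bits,
                                           unsigned long start)
    {
        return (bits[start / BITS_PER_LONG] >> (start % BITS_PER_LONG))
               & 0xFF;
    }

    /* Find the next 8-bit group at or after 'offset' with at least one set
     * bit; store its value in *clump and return its bit offset, or 'size'
     * when no further clump exists. */
    static unsigned long find_next_clump8(unsigned long *clump,
                                          const unsigned long *bits,
                                          unsigned long size,
                                          unsigned long offset)
    {
        for (; offset < size; offset += 8) {
            unsigned long val = bitmap_get_value8(bits, offset);
            if (val) {
                *clump = val;
                return offset;
            }
        }
        return size;
    }

    #define for_each_set_clump8(start, clump, bits, size)                  \
        for ((start) = find_next_clump8(&(clump), (bits), (size), 0);      \
             (start) < (size);                                             \
             (start) = find_next_clump8(&(clump), (bits), (size),          \
                                        (start) + 8))
    ```

    Run on the example pattern above (0xBE00FF33), the loop visits offsets
    0, 8 and 24 and skips the all-zero group at offset 16.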

    [gustavo@embeddedor.com: fix potential sign-extension overflow]
    Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor
    [akpm@linux-foundation.org: s/ULL/UL/, per Joe]
    [vilhelm.gray@gmail.com: add for_each_set_clump8 documentation]
    Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com
    Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.com
    Signed-off-by: William Breathitt Gray
    Signed-off-by: Gustavo A. R. Silva
    Suggested-by: Andy Shevchenko
    Suggested-by: Rasmus Villemoes
    Suggested-by: Lukas Wunner
    Tested-by: Andy Shevchenko
    Cc: Arnd Bergmann
    Cc: Linus Walleij
    Cc: Bartosz Golaszewski
    Cc: Masahiro Yamada
    Cc: Geert Uytterhoeven
    Cc: Phil Reid
    Cc: Geert Uytterhoeven
    Cc: Mathias Duckeck
    Cc: Morten Hein Tiljeset
    Cc: Sean Nyekjaer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    William Breathitt Gray
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Feb, 2018

1 commit

  • We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
    It's essentially a joined iteration in search for a non-zero bit, which is
    currently implemented as a lookup join (find a nonzero bit on the lhs,
    lookup the rhs to see if it's set there).

    Implement a direct join (find a nonzero bit on the incrementally built
    join). Also add generic bitmap benchmarks in the new `test_find_bit`
    module for the new function (see `find_next_and_bit` in [2] and [3]
    below).

    For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
    faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory
    usage. Note that on Arm, the new pure-C implementation still outperforms
    the old one that uses a mix of C and asm (`find_next_bit`) [3].
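
    The direct join can be sketched as follows (a hypothetical simplified
    userspace version of a `find_next_and_bit`-style helper: AND the two
    bitmaps one word at a time and search the combined word, instead of
    finding a bit in one bitmap and probing the other):

    ```c
    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    static unsigned long find_next_and_bit_sketch(const unsigned long *addr1,
                                                  const unsigned long *addr2,
                                                  unsigned long nbits,
                                                  unsigned long start)
    {
        while (start < nbits) {
            /* build the join incrementally: AND one word at a time */
            unsigned long w = addr1[start / BITS_PER_LONG] &
                              addr2[start / BITS_PER_LONG];
            /* mask off bits below 'start' within the current word */
            w &= ~0UL << (start % BITS_PER_LONG);
            if (w) {
                unsigned long pos = start - start % BITS_PER_LONG +
                                    __builtin_ctzl(w);
                return pos < nbits ? pos : nbits;
            }
            start = start - start % BITS_PER_LONG + BITS_PER_LONG;
        }
        return nbits;
    }
    ```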

    [1] Approximate benchmark code:

    ```
    unsigned long src1p[nr_cpumask_longs] = {pattern1};
    unsigned long src2p[nr_cpumask_longs] = {pattern2};
    for (/*a bunch of repetitions*/) {
        for (int n = -1; n < nr_cpu_ids; ++n) {
            asm volatile("" : "+rm"(src1p)); // prevent any optimization
            asm volatile("" : "+rm"(src2p));
            unsigned long result = cpumask_next_and(n, src1p, src2p);
            asm volatile("" : "+rm"(result));
        }
    }
    ```
    Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
    Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
    Signed-off-by: Clement Courbet
    Signed-off-by: Geert Uytterhoeven
    Cc: Yury Norov
    Cc: Geert Uytterhoeven
    Cc: Alexey Dobriyan
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton

    Signed-off-by: Linus Torvalds

    Clement Courbet
     

25 Feb, 2017

1 commit

  • This saves 32 bytes on my x86-64 build, mostly due to alignment
    considerations and sharing more code between find_next_bit and
    find_next_zero_bit, but it does save a couple of instructions.

    There are really two parts to this commit:
    - First, the first half of the test (!nbits) is trivially a subset of
    the second half (start >= nbits), since nbits and start are both
    unsigned.
    - Second, while looking at the disassembly, I noticed that GCC was
    predicting the branch taken. Since this is a failure case, it's
    clearly the less likely of the two branches, so add an unlikely() to
    override GCC's heuristics.
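
    The resulting entry check can be sketched as (a hypothetical standalone
    illustration, not the exact kernel code):

    ```c
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Since both arguments are unsigned, nbits == 0 implies
     * start >= nbits, so this single comparison covers the old
     * (!nbits || start >= nbits) test; unlikely() marks the early
     * return as the cold path. */
    static unsigned long check_find_args(unsigned long start,
                                         unsigned long nbits)
    {
        if (unlikely(start >= nbits))
            return nbits;   /* empty range: report "not found" */
        return start;       /* caller continues the search from start */
    }
    ```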

    [mawilcox@microsoft.com: v2]
    Link: http://lkml.kernel.org/r/1483709016-1834-1-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Acked-by: Yury Norov
    Acked-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

17 Apr, 2015

1 commit

  • This file contains the implementations of all the find_*_bit{,_le}
    helpers, so giving it a more generic name looks reasonable.

    Signed-off-by: Yury Norov
    Reviewed-by: Rasmus Villemoes
    Reviewed-by: George Spelvin
    Cc: Alexey Klimov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Hannes Frederic Sowa
    Cc: Lai Jiangshan
    Cc: Mark Salter
    Cc: AKASHI Takahiro
    Cc: Thomas Graf
    Cc: Valentin Rothberg
    Cc: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yury Norov