16 Feb, 2019

1 commit

  • The upcoming GCC 9 release extends the -Wmissing-attributes warnings
    (enabled by -Wall) to C and aliases: it warns when particular function
    attributes are missing in the aliases but not in their target.

    In particular, it triggers here because crc32_le_base/__crc32c_le_base
    aren't __pure while their target crc32_le/__crc32c_le are.

    These aliases are used by architectures as a fallback in accelerated
    versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32()
    routines weak so they can be overridden").

    Therefore, being fallbacks, it is likely that even if the aliases
    were called from C, there wouldn't be any optimizations possible.
    Currently, the only user is arm64, which calls this from asm.

    Still, marking the aliases as __pure makes sense for documentation
    purposes and possible future optimizations, and it also silences
    the warning.

    Acked-by: Ard Biesheuvel
    Tested-by: Laura Abbott
    Signed-off-by: Miguel Ojeda

    Miguel Ojeda
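
The attribute mismatch described above can be sketched in plain C. This is only an illustration, assuming GCC or Clang on an ELF target; the names demo_pure, demo_crc32_le and demo_crc32_le_base are hypothetical stand-ins for the kernel's __pure macro and its CRC32 symbols, and the kernel's weak linkage is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __pure macro. */
#define demo_pure __attribute__((pure))

/* Target function: pure, because the result depends only on its inputs. */
uint32_t demo_pure demo_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return crc;
}

/*
 * Alias: repeating the pure attribute keeps it consistent with its target,
 * which is what -Wmissing-attributes checks for.
 */
uint32_t demo_pure demo_crc32_le_base(uint32_t, const unsigned char *, size_t)
    __attribute__((alias("demo_crc32_le")));
```

Dropping demo_pure from the alias declaration is the situation the warning is meant to flag.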
     

10 Sep, 2018

1 commit


27 Jul, 2018

2 commits


27 Sep, 2017

1 commit


25 Feb, 2017

1 commit

  • Extract the crc32 test code into its own source file, so that it can be
    compiled either as a loadable module or built into the kernel.

    Link: http://lkml.kernel.org/r/1483470276-10517-1-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

03 Aug, 2016

1 commit

  • The crc32 test function measures the elapsed time in nanoseconds, but
    uses 'struct timespec' for that. We want to remove timespec from the
    kernel for y2038 compatibility, and ktime_get_ns() also helps make the
    code simpler here.

    It is also slightly better to use monotonic time, as we are only
    interested in the time difference.

    Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Cc: "David S . Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
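
A userspace analogue of the idea above, assuming a POSIX system with clock_gettime(): the monotonic clock is read once and folded into a single 64-bit nanosecond count, so the elapsed time becomes a plain subtraction. The helper names are hypothetical; in the kernel the equivalent reading comes from ktime_get_ns().

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock as one 64-bit nanosecond value. */
static uint64_t demo_monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Elapsed time needs no tv_sec/tv_nsec borrow handling. */
static uint64_t demo_elapsed_ns(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A measurement is then two calls to demo_monotonic_ns() around the code under test, followed by demo_elapsed_ns().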
     

26 Jun, 2014

3 commits

  • In case they help the compiler.

    Signed-off-by: George Spelvin
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    George Spelvin
     
  • So it gets discarded after the selftest.

    Signed-off-by: George Spelvin
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    George Spelvin
     
  • There's no need for a full 32x32 matrix, when rows before the last are
    just shifted copies of the rows after them.

    There's still room for improvement (especially on X86 processors with
    CRC32 and PCLMUL instructions), but this is a large step in the
    right direction [which is in particular useful for its current user,
    namely SCTP checksumming over multiple skb frags[] entries, i.e. in
    IPVS balancing when other CRC32 offloads are not available].

    The internal primitive is now called crc32_generic_shift and takes one
    less argument; the XOR with crc2 is done in inline wrappers.

    Signed-off-by: George Spelvin
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    George Spelvin
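
A minimal sketch of the shift-based approach described above, not the kernel's exact code: the CRC is treated as a polynomial over GF(2) and multiplied by x^(8*len) with square-and-multiply, so no 32x32 matrix is needed. All demo_* names are hypothetical, and the CRC helper applies no pre or post inversion (the caller chooses the seed).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLY_LE 0xEDB88320u

/* Bit-at-a-time reflected CRC32, no pre/post inversion. */
static uint32_t demo_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? DEMO_POLY_LE : 0);
    }
    return crc;
}

/* Multiply x by y modulo the polynomial (bit 0 is the x^31 coefficient). */
static uint32_t demo_gf2_multiply(uint32_t x, uint32_t y, uint32_t modulus)
{
    uint32_t product = (x & 1) ? y : 0;

    for (int i = 0; i < 31; i++) {
        product = (product >> 1) ^ ((product & 1) ? modulus : 0);
        x >>= 1;
        product ^= (x & 1) ? y : 0;
    }
    return product;
}

/* Advance crc as if len zero bytes had been appended to the data. */
static uint32_t demo_crc32_shift(uint32_t crc, size_t len, uint32_t polynomial)
{
    uint32_t power = polynomial; /* x^32 modulo the polynomial */

    /* Handle the leftover 0..3 bytes one bit at a time. */
    for (int i = 0; i < 8 * (int)(len & 3); i++)
        crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);

    len >>= 2; /* remaining length in 32-bit words */
    while (len) {
        if (len & 1)
            crc = demo_gf2_multiply(crc, power, polynomial);
        len >>= 1;
        if (len)
            power = demo_gf2_multiply(power, power, polynomial);
    }
    return crc;
}

/* The XOR with crc2 lives in a thin wrapper around the shift primitive. */
static uint32_t demo_crc32_le_combine(uint32_t crc1, uint32_t crc2, size_t len2)
{
    return demo_crc32_shift(crc1, len2, DEMO_POLY_LE) ^ crc2;
}
```

Here crc1 is the CRC of the first block with the caller's seed, and crc2 is the CRC of the second block with seed 0.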
     

05 Jun, 2014

1 commit


05 Nov, 2013

2 commits

  • We can safely reduce the number of test cases by a tenth.
    There is no particular need to run as many as we're running
    now for crc32{,c}_combine. That still leaves us with ~8000 tests
    for people who run kernels with crc selftests enabled,
    which is perfectly fine.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Fengguang reports that when crc32 selftests are running on startup, on
    some systems (e.g. 32-bit ones) we can get a CPU stall like "INFO: rcu_sched
    self-detected stall on CPU { 0} (t=2101 jiffies g=4294967081 c=4294967080
    q=41)". As this is not intended, add a cond_resched() at the end of a
    test case to fix it. Introduced by efba721f63 ("lib: crc32: add test cases
    for crc32{, c}_combine routines").

    Reported-by: Fengguang Wu
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

04 Nov, 2013

3 commits

  • We already have 100 test cases for the crcs themselves, so split the test
    buffer with a priori known checksums, and test the crc of two blocks
    against the crc of the whole block for the same results.

    Output/result with CONFIG_CRC32_SELFTEST=y:

    [ 2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
    [ 2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec
    [ 2.687383] crc32c: CRC_LE_BITS = 64
    [ 2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec
    [ 7.336771] crc32_combine: 113072 self tests passed
    [ 12.050479] crc32c_combine: 113072 self tests passed
    [ 17.633089] alg: No test for crc32 (crc32-pclmul)

    Signed-off-by: Daniel Borkmann
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch adds a combinator to merge two or more crc32{,c}s
    into a new one. This is useful for checksum computations of
    fragmented skbs that use crc32/crc32c as checksums.

    The GF(2) arithmetic for combining them was taken and slightly
    modified from zlib. Passing only the two crcs is insufficient:
    the length of the second piece is also needed for merging.
    The code is made generic, so that only the polynomial needs
    to be passed for crc32_le and crc32c_le respectively.

    Signed-off-by: Daniel Borkmann
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller

    Daniel Borkmann
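
For orientation, the zlib-style combinator mentioned above can be sketched as follows. This is a simplified illustration rather than the code from the patch: a 32x32 bit matrix represents "append one zero bit", it is squared repeatedly, and the matrices matching the set bits of len2 are applied to crc1. The demo_* names are hypothetical, and the polynomial is passed in so the same routine serves both CRC variants.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_GF2_DIM 32

/* Bit-at-a-time reflected CRC, no pre/post inversion, generic polynomial. */
static uint32_t demo_crc_le(uint32_t crc, const unsigned char *p, size_t len,
                            uint32_t polynomial)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);
    }
    return crc;
}

/* Multiply the bit vector vec by the matrix mat over GF(2). */
static uint32_t demo_gf2_matrix_times(const uint32_t *mat, uint32_t vec)
{
    uint32_t sum = 0;

    while (vec) {
        if (vec & 1)
            sum ^= *mat;
        vec >>= 1;
        mat++;
    }
    return sum;
}

/* square = mat * mat */
static void demo_gf2_matrix_square(uint32_t *square, const uint32_t *mat)
{
    for (int n = 0; n < DEMO_GF2_DIM; n++)
        square[n] = demo_gf2_matrix_times(mat, mat[n]);
}

/* Merge crc1 (first block) and crc2 (second block, len2 bytes long). */
static uint32_t demo_crc_combine(uint32_t crc1, uint32_t crc2, size_t len2,
                                 uint32_t polynomial)
{
    uint32_t even[DEMO_GF2_DIM], odd[DEMO_GF2_DIM];
    uint32_t row = 1;

    if (len2 == 0)
        return crc1;

    /* Operator for one zero bit. */
    odd[0] = polynomial;
    for (int n = 1; n < DEMO_GF2_DIM; n++) {
        odd[n] = row;
        row <<= 1;
    }
    demo_gf2_matrix_square(even, odd); /* two zero bits */
    demo_gf2_matrix_square(odd, even); /* four zero bits */

    /* Apply len2 zero bytes to crc1; the first square yields one byte. */
    do {
        demo_gf2_matrix_square(even, odd);
        if (len2 & 1)
            crc1 = demo_gf2_matrix_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0)
            break;
        demo_gf2_matrix_square(odd, even);
        if (len2 & 1)
            crc1 = demo_gf2_matrix_times(odd, crc1);
        len2 >>= 1;
    } while (len2 != 0);

    return crc1 ^ crc2;
}
```

The length argument is what tells the routine how far crc1 has to be advanced before the two values can be XORed, which is why two crcs alone are not enough.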
     
  • This is nothing more than a whitespace cleanup, as 80 chars is not a
    hard but a soft limit, and wrapping otherwise makes the test cases array
    look really ugly. So fix it up.

    Signed-off-by: Daniel Borkmann
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 Sep, 2013

1 commit


06 Oct, 2012

1 commit

  • Fix the const sections for the code generated by crc32 table. There's
    no ro version of the cacheline aligned section, so we cannot put in
    const data without a conflict. Just don't make the crc tables const for
    now.

    [ak@linux.intel.com: some fixes and new description]
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Joe Mario
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Mario
     

31 Jul, 2012

1 commit

  • Variables t4, t5, t6 and t7 are only used when CRC_LE_BITS != 32. Fix
    the following compilation warnings:

    lib/crc32.c: In function 'crc32_body':
    lib/crc32.c:77:55: warning: unused variable 't7'
    lib/crc32.c:77:41: warning: unused variable 't6'
    lib/crc32.c:77:27: warning: unused variable 't5'
    lib/crc32.c:77:13: warning: unused variable 't4'

    Signed-off-by: Thiago Rafael Becker
    Cc: "Darrick J. Wong"
    Cc: Bob Pearson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Rafael Becker
     

24 Mar, 2012

11 commits

  • Add self-test code for crc32c.

    Signed-off-by: Darrick J. Wong
    Cc: Bob Pearson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • Reuse the existing crc32 code to stamp out a crc32c implementation.

    Signed-off-by: Darrick J. Wong
    Cc: Herbert Xu
    Cc: Bob Pearson
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
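
The idea of stamping out a second CRC from the same code can be illustrated with a generic routine that takes the polynomial as a parameter. This is a bit-at-a-time sketch with hypothetical demo_* names, not the table-driven kernel code; the two constants are the standard reflected polynomials for CRC32 and CRC32C.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CRC32_POLY_LE  0xEDB88320u /* CRC32 (IEEE 802.3), reflected */
#define DEMO_CRC32C_POLY_LE 0x82F63B78u /* CRC32C (Castagnoli), reflected */

/* One generic body, parameterized by the polynomial. */
static uint32_t demo_crc_le_generic(uint32_t crc, const unsigned char *p,
                                    size_t len, uint32_t polynomial)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);
    }
    return crc;
}

/* Thin wrappers select the polynomial. */
static uint32_t demo_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
    return demo_crc_le_generic(crc, p, len, DEMO_CRC32_POLY_LE);
}

static uint32_t demo_crc32c_le(uint32_t crc, const unsigned char *p, size_t len)
{
    return demo_crc_le_generic(crc, p, len, DEMO_CRC32C_POLY_LE);
}
```

With a seed of 0xFFFFFFFF and a final inversion, both wrappers reproduce the usual check values for the string "123456789".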
     
  • Add a comment at the top of crc32.c

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • Add two changes that improve the performance of x86 systems.

    1. Replace the main loop with an incrementing counter. This change
    improves the performance of the selftest by about 5-6% on Nehalem CPUs.
    The apparent reason is that the compiler can use the loop index to
    perform an indexed memory access. This is reported to make performance
    on PowerPC CPUs worse.

    2. Replace the rem_len loop with an incrementing counter. This change
    improves the performance of the selftest, which has more than the usual
    number of occurrences, by about 1-2% on x86 CPUs. In actual workloads
    the length is most often a multiple of 4 bytes and this code does not
    get executed as often, if at all. Again, this change is reported to make
    performance on PowerPC worse.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • Add slicing-by-8 algorithm to the existing slicing-by-4 algorithm. This
    consists of:

    - extend largest BITS size from 32 to 64
    - extend tables from tab[4][256] to up to tab[8][256]
    - Add code for inner loop.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
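
A compact sketch of slicing-by-8 along the lines listed above, assuming the reflected CRC32 polynomial. It is not the kernel's implementation: bytes are loaded one at a time so the code is independent of host endianness, and the demo_* names are hypothetical. Row t of the table holds the CRC of a byte followed by t zero bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLY_LE 0xEDB88320u

static uint32_t demo_tab8[8][256];

/* Build row 0 bit by bit, then derive each row from the previous one. */
static void demo_init_tab8(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;

        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? DEMO_POLY_LE : 0);
        demo_tab8[0][i] = crc;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 8; t++)
            demo_tab8[t][i] = (demo_tab8[t - 1][i] >> 8) ^
                              demo_tab8[0][demo_tab8[t - 1][i] & 0xff];
}

/* Load four bytes as a little-endian 32-bit value. */
static uint32_t demo_load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t demo_crc32_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Inner loop: consume 64 bits per iteration with eight table lookups. */
    while (len >= 8) {
        uint32_t lo = crc ^ demo_load_le32(p);
        uint32_t hi = demo_load_le32(p + 4);

        crc = demo_tab8[7][lo & 0xff] ^ demo_tab8[6][(lo >> 8) & 0xff] ^
              demo_tab8[5][(lo >> 16) & 0xff] ^ demo_tab8[4][lo >> 24] ^
              demo_tab8[3][hi & 0xff] ^ demo_tab8[2][(hi >> 8) & 0xff] ^
              demo_tab8[1][(hi >> 16) & 0xff] ^ demo_tab8[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    /* Tail: fall back to one byte at a time. */
    while (len--)
        crc = demo_tab8[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}
```

demo_init_tab8() must run once before the first call to demo_crc32_slice8().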
     
  • crc32.c provides a choice of one of several algorithms for computing the
    LSB and MSB versions of the CRC32 checksum based on the parameters
    CRC_LE_BITS and CRC_BE_BITS.

    In the original version the values 1, 2, 4 and 8 respectively selected
    versions of the algorithm that computed the crc 1, 2, 4 and 32 bits at a
    time.

    This patch series adds a new version that computes the CRC 64 bits at a
    time. To make things easier to understand the parameter has been
    reinterpreted to actually stand for the number of bits processed in each
    step of the algorithm so that the old value 8 has been replaced with the
    value 32.

    This also allows us to add in a widely used crc algorithm that computes
    the crc 8 bits at a time called the Sarwate algorithm.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
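
The 8-bits-at-a-time variant mentioned above (the Sarwate algorithm) can be sketched with a single 256-entry table. This is an illustrative version for the reflected CRC32 polynomial with hypothetical demo_* names, not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLY_LE 0xEDB88320u

static uint32_t demo_sarwate_tab[256];

/* Entry i is the CRC of the single byte i, computed bit by bit. */
static void demo_sarwate_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;

        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? DEMO_POLY_LE : 0);
        demo_sarwate_tab[i] = crc;
    }
}

/* One table lookup per input byte. */
static uint32_t demo_crc32_sarwate(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = demo_sarwate_tab[(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}
```

demo_sarwate_init() fills the table once; after that each byte costs one lookup, one shift and two XORs.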
     
  • crc32.c in its original version freely mixed u32, __le32 and __be32 types
    which caused warnings from sparse with __CHECK_ENDIAN__. This patch fixes
    these by forcing the types to u32.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • Misc cleanup of lib/crc32.c and related files.

    - remove unnecessary header files.

    - straighten out some convoluted ifdef's

    - rewrite some references to 2 dimensional arrays as 1 dimensional
    arrays to make them correct. I.e. replace tab[i] with tab[0][i].

    - a few trivial whitespace changes

    - fix a warning in gen_crc32tables.c caused by a mismatch in the type of
    the pointer passed to output table. Since the table is only used at
    kernel compile time, it is simpler to make the table big enough to hold
    the largest column size used. One cannot make the column size smaller
    in output_table because it has to be used by both the le and be tables
    and they can have different column sizes.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • Replace the unit test provided in crc32.c, which doesn't have a makefile
    and doesn't compile with current headers, with a simpler self test
    routine that also gives a measure of performance and runs at module init
    time. The self test option can be enabled through a configuration
    option CONFIG_CRC32_SELFTEST.

    The test stresses the pre and post loops and is thus not very realistic
    since actual uses will likely have addresses and lengths that are at
    least 4 byte aligned. However, the main loop is long enough so that the
    performance is dominated by that loop.

    The expected values for crc32_le and crc32_be were generated with the
    original version of crc32.c using CRC_LE_BITS = 8 and CRC_BE_BITS = 8.
    These values were then used to check all the values of the BITS
    parameters in both the original and new versions.

    The performance results show some variability from run to run in spite
    of attempts to both warm the cache and reduce the amount of OS noise by
    limiting interrupts during the test. To get comparable results and to
    analyse options wrt performance the best time reported over a small
    sample of runs has been taken.

    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • Move a long comment from lib/crc32.c to Documentation/crc32.txt where it
    will more likely get read.

    Edited the resulting document to add an explanation of the slicing-by-n
    algorithm.

    [djwong@us.ibm.com: minor changelog tweaks]
    [akpm@linux-foundation.org: fix typo, per George]
    Signed-off-by: George Spelvin
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     
  • This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out
    a software crc32c implementation. It removes the crc32c implementation
    in crypto/ in favor of using the stamped-out one in lib/. There is also
    a change to Kconfig so that the kernel builder can pick an
    implementation best suited for the hardware.

    The motivation for this patchset is that I am working on adding full
    metadata checksumming to ext4. As far as performance impact of adding
    checksumming goes, I see nearly no change with a standard mail server
    ffsb simulation. On a test that involves only file creation and
    deletion and extent tree writes, I see a drop of about 50 percent with
    the current kernel crc32c implementation; this improves to a drop of
    about 20 percent with the enclosed crc32c code.

    When metadata is a small fraction of total IO, as is usually the case,
    this new implementation doesn't help much. However, when we are doing IO
    that is almost all metadata (such as rm -rf'ing a tree), then this patch
    speeds up the operation substantially.

    Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this
    patchset should improve their speed as well. I have not yet quantified
    that, however. This latest submission combines Bob's patches from late
    August 2011 with mine so that they can be one coherent patch set.
    Please excuse my inability to combine some of the patches; I've been
    advised to leave Bob's patches alone and build atop them instead. :/

    Since the last posting, I've also collected some crc32c test results on
    a bunch of different x86/powerpc/sparc platforms. The results can be
    viewed here: http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c"
    columns describe the performance of the kernel's current crc32 and
    crc32c software implementations. The "crc32c-by8-le" column shows
    crc32c performance with this patchset applied. I expect crc32
    performance to be roughly the same.

    The two _boost columns at the right side of the spreadsheet show how much
    faster the new implementation is over the old one. As you can see, crc32
    rises substantially, and crc32c experiences a huge increase.

    This patch:

    - remove trailing whitespace from lib/crc32.c
    - remove trailing whitespace from lib/crc32defs.h

    [djwong@us.ibm.com: changelog tweaks]
    Signed-off-by: Bob Pearson
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Pearson
     

11 Jan, 2012

1 commit

  • By taking a pointer reference to each row in the crc table matrix, one
    can shorten the inner loop by a few instructions.

    Signed-off-by: Joakim Tjernlund
    Cc: Bob Pearson
    Cc: Frank Zago
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

26 May, 2010

1 commit


25 May, 2010

2 commits

  • Since crc32.c contains a nifty test program that can be executed in user
    space, make sure endian detection works reliably in user space too.

    Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     
  • Precompute more crc32 values (0xcc00, 0xcc0000 and 0xcc000000) into tables.
    This increases the table size from 1KB to 4KB but the performance benefit
    makes it worth it:

    28% faster on MPC8321, 266 MHz
    2x faster on Core 2 Duo, 3.1GHz

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
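
The table growth from 1KB to 4KB can be made concrete with a slicing-by-4 sketch: four rows of 256 32-bit entries, where row t holds the CRC of a byte followed by t zero bytes. This is an assumption-laden illustration with hypothetical demo_* names, written for the reflected polynomial and with byte-wise loads so it does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLY_LE 0xEDB88320u

/* 4 rows x 256 entries x 4 bytes = 4KB, versus 1KB for a single row. */
static uint32_t demo_tab4[4][256];

static void demo_init_tab4(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;

        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? DEMO_POLY_LE : 0);
        demo_tab4[0][i] = crc;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            demo_tab4[t][i] = (demo_tab4[t - 1][i] >> 8) ^
                              demo_tab4[0][demo_tab4[t - 1][i] & 0xff];
}

static uint32_t demo_crc32_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Main loop: 32 bits per iteration, four independent lookups. */
    while (len >= 4) {
        crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        crc = demo_tab4[3][crc & 0xff] ^ demo_tab4[2][(crc >> 8) & 0xff] ^
              demo_tab4[1][(crc >> 16) & 0xff] ^ demo_tab4[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    /* Tail: remaining 0..3 bytes, one at a time. */
    while (len--)
        crc = demo_tab4[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}
```

The four lookups in the main loop do not depend on each other, which is a plausible reason for the speedups quoted above.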
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit


16 Dec, 2009

1 commit


03 Feb, 2008

1 commit


19 Oct, 2007

1 commit

  • To be consistent with the use of attributes in the rest of the kernel
    replace all use of __attribute_pure__ with __pure and delete the definition
    of __attribute_pure__.

    Signed-off-by: Ralf Baechle
    Cc: Russell King
    Acked-by: Mauro Carvalho Chehab
    Cc: Bryan Wu
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle