22 Nov, 2019

1 commit


17 Nov, 2019

13 commits

  • This implementation is the fastest available x86_64 implementation, and
    unlike Sandy2x, it doesn't require use of the floating point registers at
    all. Instead it makes use of BMI2 and ADX, available on recent
    microarchitectures. The implementation was written by Armando
    Faz-Hernández with contributions (upstream) from Samuel Neves and me,
    in addition to further changes in the kernel implementation from us.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Samuel Neves
    Co-developed-by: Samuel Neves
    [ardb: - move to arch/x86/crypto
    - wire into lib/crypto framework
    - implement crypto API KPP hooks ]
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • Expose the generic Curve25519 library via the crypto API KPP interface.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • These implementations from Samuel Neves support AVX and AVX-512VL.
    Originally this used AVX-512F, but Skylake thermal throttling made
    AVX-512VL more attractive and possible to do with negligible difference.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Samuel Neves
    Co-developed-by: Samuel Neves
    [ardb: move to arch/x86/crypto, wire into lib/crypto framework]
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • Wire up our newly added Blake2s implementation via the shash API.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for
    MIPS authored by Andy Polyakov, a prior 64-bit only version of which has been
    contributed by him to the OpenSSL project. The file 'poly1305-mips.pl' is taken
    straight from this upstream GitHub repository [0] at commit
    d22ade312a7af958ec955620b0d241cf42c37feb, and already contains all the changes
    required to build it as part of a Linux kernel module.

    [0] https://github.com/dot-asm/cryptogams

    Co-developed-by: Andy Polyakov
    Signed-off-by: Andy Polyakov
    Co-developed-by: René van Dorst
    Signed-off-by: René van Dorst
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Implement the arch init/update/final Poly1305 library routines in the
    accelerated SIMD driver for x86 so they are accessible to users of
    the Poly1305 library interface as well.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Remove the dependency on the generic Poly1305 driver. Instead, depend
    on the generic library so that we only reuse code without pulling in
    the generic skcipher implementation as well.

    While at it, remove the logic that prefers the non-SIMD path for short
    inputs - this is no longer necessary after recent FPU handling changes
    on x86.

    Since this removes the last remaining user of the routines exported
    by the generic shash driver, unexport them and make them static.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Move the core Poly1305 routines shared between the generic Poly1305
    shash driver and the Adiantum and NHPoly1305 drivers into a separate
    library so that using just these pieces does not pull in the crypto
    API pieces of the generic Poly1305 routine.

    In a subsequent patch, we will augment this generic library with
    init/update/final routines so that the Poly1305 algorithm can be used
    directly without the need for using the crypto API's shash abstraction.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
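
The init/update/final split described in the Poly1305 library commit above can be illustrated outside the kernel. The following is a minimal sketch in Python, assuming the textbook big-integer formulation of Poly1305 from RFC 8439; the class name `Poly1305` and its method names are hypothetical and are not the kernel's library interface. It is meant only to show how a streaming interface buffers partial blocks, not how the kernel code is written.

```python
# Illustrative Poly1305 with an init/update/final shape.
# Assumption: big-integer arithmetic as in RFC 8439, not the kernel's limb code.

P = (1 << 130) - 5                                # the Poly1305 prime
R_CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff      # clamping mask for r


class Poly1305:
    def __init__(self, key: bytes):
        """'init': split the 32-byte one-time key into r (clamped) and s."""
        if len(key) != 32:
            raise ValueError("Poly1305 key must be 32 bytes")
        self.r = int.from_bytes(key[:16], "little") & R_CLAMP
        self.s = int.from_bytes(key[16:], "little")
        self.acc = 0
        self.buf = b""

    def _absorb(self, acc: int, block: bytes) -> int:
        # Append the 0x01 byte above the block, add, multiply by r mod P.
        n = int.from_bytes(block + b"\x01", "little")
        return (acc + n) * self.r % P

    def update(self, data: bytes) -> "Poly1305":
        """'update': consume full 16-byte blocks, keep the remainder buffered."""
        self.buf += data
        while len(self.buf) >= 16:
            self.acc = self._absorb(self.acc, self.buf[:16])
            self.buf = self.buf[16:]
        return self

    def final(self) -> bytes:
        """'final': absorb any partial block, add s, emit the 16-byte tag."""
        acc = self.acc
        if self.buf:
            acc = self._absorb(acc, self.buf)
        return ((acc + self.s) & ((1 << 128) - 1)).to_bytes(16, "little")


if __name__ == "__main__":
    key = bytes(range(32))                        # demo key, not a test vector
    tag = Poly1305(key).update(b"hello ").update(b"world").final()
    print(tag.hex())
```

Feeding the message in one call or in several pieces yields the same tag, which is the property a caller of an init/update/final interface relies on.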
     
  • This integrates the accelerated MIPS 32r2 implementation of ChaCha
    into both the API and library interfaces of the kernel crypto stack.

    The significance of this is that, in addition to becoming available
    as an accelerated library implementation, it can also be used by
    existing crypto API code such as Adiantum (for block encryption on
    ultra low performance cores) or IPsec using chacha20poly1305. These
    are use cases that have already opted into using the abstract crypto
    API. In order to support Adiantum, the core assembler routine has
    been adapted to take the round count as a function argument rather
    than hardcoding it to 20.

    Co-developed-by: René van Dorst
    Signed-off-by: René van Dorst
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
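
The commit above mentions passing the round count as an argument instead of hardcoding 20, which is what lets reduced-round users such as Adiantum share the same core. A minimal sketch of that idea in Python follows, assuming the standard ChaCha quarter round from RFC 8439; the function names `quarter_round` and `chacha_permute` are hypothetical and do not mirror the MIPS assembler routine or the kernel's C code.

```python
# Illustrative ChaCha permutation with a configurable round count.
# Assumption: standard ChaCha quarter round; names are made up for this sketch.

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """Apply one ChaCha quarter round in place to words a, b, c, d of s."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_permute(state: list, nrounds: int) -> list:
    """Return a permuted copy of the 16-word state after nrounds rounds.

    nrounds is a parameter (e.g. 20, 12 or 8) rather than a constant.
    """
    if len(state) != 16 or nrounds % 2 != 0:
        raise ValueError("need 16 words and an even round count")
    x = list(state)
    for _ in range(nrounds // 2):
        # Column round.
        quarter_round(x, 0, 4, 8, 12)
        quarter_round(x, 1, 5, 9, 13)
        quarter_round(x, 2, 6, 10, 14)
        quarter_round(x, 3, 7, 11, 15)
        # Diagonal round.
        quarter_round(x, 0, 5, 10, 15)
        quarter_round(x, 1, 6, 11, 12)
        quarter_round(x, 2, 7, 8, 13)
        quarter_round(x, 3, 4, 9, 14)
    return x


if __name__ == "__main__":
    demo_state = list(range(16))                  # arbitrary demo input
    print([hex(w) for w in chacha_permute(demo_state, 12)][:4])
```

This covers only the permutation; a full block function would also add the original state to the permuted one and serialize the words in little-endian order.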
     
  • Wire the existing x86 SIMD ChaCha code into the new ChaCha library
    interface, so that users of the library interface will get the
    accelerated version when available.

    Given that calls into the library API will always go through the
    routines in this module if it is enabled, switch to static keys
    to select the optimal implementation available (which may be none
    at all, in which case we defer to the generic implementation for
    all invocations).

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • In preparation for extending the x86 ChaCha driver to also expose the ChaCha
    library interface, drop the dependency on the chacha_generic crypto driver
    as a non-SIMD fallback, and depend on the generic ChaCha library directly.
    This way, we only pull in the code we actually need, without registering
    a set of ChaCha skciphers that we will never use.

    Since turning the FPU on and off is cheap these days, simplify the SIMD
    routine by dropping the per-page yield, which makes for a cleaner switch
    to the library API as well. This also allows us to invoke the skcipher
    walk routines in non-atomic mode.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Currently, our generic ChaCha implementation consists of a permute
    function in lib/chacha.c that operates on the 64-byte ChaCha state
    directly [and which is always included into the core kernel since it
    is used by the /dev/random driver], and the crypto API plumbing to
    expose it as a skcipher.

    In order to support in-kernel users that need the ChaCha streamcipher
    but have no need [or tolerance] for going through the abstractions of
    the crypto API, let's expose the streamcipher bits via a library API
    as well, in a way that permits the implementation to be superseded by
    an architecture specific one if provided.

    So move the streamcipher code into a separate module in lib/crypto,
    and expose the init() and crypt() routines to users of the library.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • In preparation for introducing a set of crypto library interfaces, tidy
    up the Makefile and split off the Kconfig symbols into a separate file.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

01 Nov, 2019

2 commits

  • Now that the blkcipher algorithm type has been removed in favor of
    skcipher, rename the crypto_blkcipher kernel module to crypto_skcipher,
    and rename the config options accordingly:

    CONFIG_CRYPTO_BLKCIPHER => CONFIG_CRYPTO_SKCIPHER
    CONFIG_CRYPTO_BLKCIPHER2 => CONFIG_CRYPTO_SKCIPHER2

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • The patch brings support of several BLAKE2 variants (2b with various
    digest lengths). The keyed digest is supported, using tfm->setkey call.
    The in-tree user will be btrfs (for checksumming), we're going to use
    the BLAKE2b-256 variant.

    The code is the reference implementation taken from the official sources and
    modified in terms of kernel coding style (whitespace, comments, uintXX_t
    -> uXX types, removed unused prototypes and #ifdefs, removed testing
    code, changed secure_zero_memory -> memzero_explicit, used own helpers
    for unaligned reads/writes and rotations).

    Further changes removed sanity checks of key length or output size,
    these values are verified in the crypto API callbacks or hardcoded in
    shash_alg and not exposed to users.

    Signed-off-by: David Sterba
    Signed-off-by: Herbert Xu

    David Sterba
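
To make the "various digest lengths" and keyed-digest points in the BLAKE2 commit above concrete, here is a small sketch using Python's standard `hashlib.blake2b`, which is a userspace implementation and not the kernel driver. The helper name `blake2b_digest` and the sample key are assumptions for illustration; the list of sizes (20, 32, 48 and 64 bytes) is likewise chosen for the example.

```python
# Illustrative BLAKE2b usage with selectable digest size and optional key.
# Assumption: Python's hashlib stands in for the kernel shash interface.
import hashlib


def blake2b_digest(data: bytes, digest_size: int = 32, key: bytes = b"") -> bytes:
    """Hash data with BLAKE2b, producing digest_size bytes.

    A non-empty key (up to 64 bytes) turns this into a keyed digest,
    loosely analogous to calling setkey before hashing.
    """
    return hashlib.blake2b(data, digest_size=digest_size, key=key).digest()


if __name__ == "__main__":
    payload = b"example block contents"
    for size in (20, 32, 48, 64):
        print(size, blake2b_digest(payload, digest_size=size).hex())
    # Keyed variant with a made-up demo key.
    print(blake2b_digest(payload, digest_size=32, key=b"demo key").hex())
```

The digest size is a parameter of the hash itself, so a 32-byte digest is not a truncation of the 64-byte one.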
     

25 Oct, 2019

1 commit

  • Convert the glue code for the PowerPC SPE implementations of AES-ECB,
    AES-CBC, AES-CTR, and AES-XTS from the deprecated "blkcipher" API to the
    "skcipher" API. This is needed in order for the blkcipher API to be
    removed.

    Tested with:

    export ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
    make mpc85xx_defconfig
    cat >> .config << EOF
    # CONFIG_MODULES is not set
    # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
    CONFIG_DEBUG_KERNEL=y
    CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
    CONFIG_CRYPTO_AES=y
    CONFIG_CRYPTO_CBC=y
    CONFIG_CRYPTO_CTR=y
    CONFIG_CRYPTO_ECB=y
    CONFIG_CRYPTO_XTS=y
    CONFIG_CRYPTO_AES_PPC_SPE=y
    EOF
    make olddefconfig
    make -j32
    qemu-system-ppc -M mpc8544ds -cpu e500 -nographic \
        -kernel arch/powerpc/boot/zImage \
        -append cryptomgr.fuzz_iterations=1000

    Note that xts-ppc-spe still fails the comparison tests due to the lack
    of ciphertext stealing support. This is not addressed by this patch.

    This patch also cleans up the code by making ->encrypt() and ->decrypt()
    call a common function for each of ECB, CBC, and XTS, and by using a
    clearer way to compute the length to process at each step.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

23 Oct, 2019

3 commits

  • Convert the glue code for the SPARC64 DES opcodes implementations of
    DES-ECB, DES-CBC, 3DES-ECB, and 3DES-CBC from the deprecated "blkcipher"
    API to the "skcipher" API. This is needed in order for the blkcipher
    API to be removed.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Convert the glue code for the SPARC64 Camellia opcodes implementations
    of Camellia-ECB and Camellia-CBC from the deprecated "blkcipher" API to
    the "skcipher" API. This is needed in order for the blkcipher API to be
    removed.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Convert the glue code for the SPARC64 AES opcodes implementations of
    AES-ECB, AES-CBC, and AES-CTR from the deprecated "blkcipher" API to the
    "skcipher" API. This is needed in order for the blkcipher API to be
    removed.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

10 Oct, 2019

1 commit

  • Now that the Clang compiler has taken it upon itself to police the
    compiler command line, and reject combinations of arguments it views
    as incompatible, the AEGIS128 code no longer builds correctly, and errors
    out like this:

    clang-10: warning: ignoring extension 'crypto' because the 'armv7-a'
    architecture does not support it [-Winvalid-command-line-argument]

    So let's switch to armv8-a instead, which matches the crypto-neon-fp-armv8
    FPU profile we specify. Since neither was actually supported by GCC
    versions before 4.8, let's tighten the Kconfig dependencies as well so
    we won't run into errors when building with an ancient compiler.

    Signed-off-by: Ard Biesheuvel
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Tested-by: Nick Desaulniers
    Reported-by:
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

22 Sep, 2019

1 commit

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - crypto and DM crypt advances that allow the crypto API to reclaim
      implementation details that do not belong in DM crypt. The wrapper
      template for ESSIV generation that was factored out will also be used
      by fscrypt in the future.

    - Add root hash pkcs#7 signature verification to the DM verity target.

    - Add a new "clone" DM target that allows for efficient remote
      replication of a device.

    - Enhance DM bufio's cache to be tailored to each client based on use.
      Clients that make heavy use of the cache get more of it, and those
      that use less have reduced cache usage.

    - Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the
      version number of a DM target (even if the associated module isn't
      yet loaded).

    - Fix invalid memory access in DM zoned target.

    - Fix the max_discard_sectors limit advertised by the DM raid target;
      it was mistakenly storing the limit in bytes rather than sectors.

    - Small optimizations and cleanups in DM writecache target.

    - Various fixes and cleanups in DM core, DM raid1 and space map portion
      of DM persistent data library.

    * tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    dm: introduce DM_GET_TARGET_VERSION
    dm bufio: introduce a global cache replacement
    dm bufio: remove old-style buffer cleanup
    dm bufio: introduce a global queue
    dm bufio: refactor adjust_total_allocated
    dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer
    dm: add clone target
    dm raid: fix updating of max_discard_sectors limit
    dm writecache: skip writecache_wait for pmem mode
    dm stats: use struct_size() helper
    dm crypt: omit parsing of the encapsulated cipher
    dm crypt: switch to ESSIV crypto API template
    crypto: essiv - create wrapper template for ESSIV generation
    dm space map common: remove check for impossible sm_find_free() return value
    dm raid1: use struct_size() with kzalloc()
    dm writecache: optimize performance by sorting the blocks for writeback_all
    dm writecache: add unlikely for getting two block with same LBA
    dm writecache: remove unused member pointer in writeback_struct
    dm zoned: fix invalid memory access
    dm verity: add root hash pkcs#7 signature verification
    ...

    Linus Torvalds
     

04 Sep, 2019

1 commit

  • Implement a template that wraps a (skcipher,shash) or (aead,shash) tuple
    so that we can consolidate the ESSIV handling in fscrypt and dm-crypt and
    move it into the crypto API. This will result in better test coverage, and
    will allow future changes to make the bare cipher interface internal to the
    crypto subsystem, in order to increase robustness of the API against misuse.

    Signed-off-by: Ard Biesheuvel
    Acked-by: Herbert Xu
    Tested-by: Milan Broz
    Signed-off-by: Mike Snitzer

    Ard Biesheuvel
     

22 Aug, 2019

3 commits

  • Drop the duplicate generic sha256 (and sha224) implementation from
    crypto/sha256_generic.c and use the implementation from
    lib/crypto/sha256.c instead.

    "diff -u lib/crypto/sha256.c sha256_generic.c" shows that the core
    sha256_transform function from both implementations is identical and
    the other code is functionally identical too.

    Suggested-by: Eric Biggers
    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Before this commit, lib/crypto/sha256.c was only used in the s390 and
    x86 purgatory code. Make it suitable for generic use:

    * Export interesting symbols
    * Add -D__DISABLE_EXPORTS to CFLAGS_sha256.o for purgatory builds to
      avoid the exports for the purgatory builds
    * Add to lib/crypto/Makefile and crypto/Kconfig

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Another one for the cipher museum: split off DES core processing into
    a separate module so other drivers (mostly for crypto accelerators)
    can reuse the code without pulling in the generic DES cipher itself.
    This will also permit the cipher interface to be made private to the
    crypto API itself once we move the only user in the kernel (CIFS) to
    this library interface.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

15 Aug, 2019

1 commit

  • Provide an accelerated implementation of aegis128 by wiring up the
    SIMD hooks in the generic driver to an implementation based on NEON
    intrinsics, which can be compiled to both ARM and arm64 code.

    This results in a performance of 2.2 cycles per byte on Cortex-A53,
    which is a performance increase of ~11x compared to the generic
    code.

    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

02 Aug, 2019

1 commit

  • This reverts commit ecc8bc81f2fb3976737ef312f824ba6053aa3590
    ("crypto: aegis128 - provide a SIMD implementation based on NEON
    intrinsics") and commit 7cdc0ddbf74a19cecb2f0e9efa2cae9d3c665189
    ("crypto: aegis128 - add support for SIMD acceleration").

    They cause compile errors on platforms other than ARM because
    the mechanism to selectively compile the SIMD code is broken.

    Reported-by: Heiko Carstens
    Reported-by: Stephen Rothwell
    Signed-off-by: Herbert Xu

    Herbert Xu
     

27 Jul, 2019

1 commit

  • To help avoid confusion, add a comment to ghash-generic.c which explains
    the convention that the kernel's implementation of GHASH uses.

    Also update the Kconfig help text and module descriptions to call GHASH
    a "hash function" rather than a "message digest", since the latter
    normally means a real cryptographic hash function, which GHASH is not.

    Cc: Pascal Van Leeuwen
    Signed-off-by: Eric Biggers
    Reviewed-by: Ard Biesheuvel
    Acked-by: Pascal Van Leeuwen
    Signed-off-by: Herbert Xu

    Eric Biggers
     

26 Jul, 2019

7 commits

  • Provide an accelerated implementation of aegis128 by wiring up the
    SIMD hooks in the generic driver to an implementation based on NEON
    intrinsics, which can be compiled to both ARM and arm64 code.

    This results in a performance of 2.2 cycles per byte on Cortex-A53,
    which is a performance increase of ~11x compared to the generic
    code.

    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Three variants of AEGIS were proposed for the CAESAR competition, and
    only one was selected for the final portfolio: AEGIS128.

    The other variants, AEGIS128L and AEGIS256, are not likely to ever turn
    up in networking protocols or other places where interoperability
    between Linux and other systems is a concern, nor are they likely to
    be subjected to further cryptanalysis. However, uninformed users may
    think that AEGIS128L (which is faster) is equally fit for use.

    So let's remove them now, before anyone starts using them and we are
    forced to support them forever.

    Note that there are no known flaws in the algorithms or in any of these
    implementations, but they have simply outlived their usefulness.

    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • MORUS was not selected as a winner in the CAESAR competition, which
    is not surprising since it is considered to be cryptographically
    broken [0]. (Note that this is not an implementation defect, but a
    flaw in the underlying algorithm). Since it is unlikely to be in use
    currently, let's remove it before we're stuck with it.

    [0] https://eprint.iacr.org/2019/172.pdf

    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Drop aes-generic's version of crypto_aes_expand_key(), and switch to
    the key expansion routine provided by the AES library. AES key expansion
    is not performance critical, and it is better to have a single version
    shared by all AES implementations.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • The AES assembler code for x86 isn't actually faster than code
    generated by the compiler from aes_generic.c, and considering
    the disproportionate maintenance burden of assembler code on
    x86, it is better just to drop it entirely. Modern x86 systems
    will use AES-NI anyway, and given that the modules being removed
    have a dependency on aes_generic already, we can remove them
    without running the risk of regressions.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • The AES-NI code contains fallbacks for invocations that occur from a
    context where the SIMD unit is unavailable, which really only occurs
    when running in softirq context that was entered from a hard IRQ that
    was taken while running kernel code that was already using the FPU.

    That means performance is not really a consideration, and we can just
    use the new library code for this use case, which has a smaller
    footprint and is believed to be time invariant. This will allow us to
    drop the non-SIMD asm routines in a subsequent patch.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Take the existing small footprint and mostly time invariant C code
    and turn it into an AES library that can be used for non-performance
    critical, casual use of AES, and as a fallback for, e.g., SIMD code
    that needs a secondary path that can be taken in contexts where the
    SIMD unit is off limits (e.g., in hard interrupts taken from kernel
    context)

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

20 Jun, 2019

1 commit


06 Jun, 2019

1 commit

  • xxhash is currently implemented as a self-contained module in /lib.
    This patch enables that module to be used as part of the generic kernel
    crypto framework. It adds a simple wrapper to the 64bit version.

    I've also added test vectors (with help from Nick Terrell). The upstream
    xxhash code is tested by running hashing operations on random 222 byte
    data with seed values of 0 and a prime number. The upstream test
    suite can be found at https://github.com/Cyan4973/xxHash/blob/cf46e0c/xxhsum.c#L664

    Essentially hashing is run on data of length 0,1,14,222 with the
    aforementioned seed values 0 and prime 2654435761. The particular random
    222 byte string was provided to me by Nick Terrell by reading
    /dev/random and the checksums were calculated by the upstream xxsum
    utility with the following bash script:

    dd if=/dev/random of=TEST_VECTOR bs=1 count=222

    for a in 0 1; do
        for l in 0 1 14 222; do
            for s in 0 2654435761; do
                echo algo $a length $l seed $s;
                head -c $l TEST_VECTOR | ~/projects/kernel/xxHash/xxhsum -H$a -s$s
            done
        done
    done

    This produces output as follows:

    algo 0 length 0 seed 0
    02cc5d05 stdin
    algo 0 length 0 seed 2654435761
    02cc5d05 stdin
    algo 0 length 1 seed 0
    25201171 stdin
    algo 0 length 1 seed 2654435761
    25201171 stdin
    algo 0 length 14 seed 0
    c1d95975 stdin
    algo 0 length 14 seed 2654435761
    c1d95975 stdin
    algo 0 length 222 seed 0
    b38662a6 stdin
    algo 0 length 222 seed 2654435761
    b38662a6 stdin
    algo 1 length 0 seed 0
    ef46db3751d8e999 stdin
    algo 1 length 0 seed 2654435761
    ac75fda2929b17ef stdin
    algo 1 length 1 seed 0
    27c3f04c2881203a stdin
    algo 1 length 1 seed 2654435761
    4a15ed26415dfe4d stdin
    algo 1 length 14 seed 0
    3d33dc700231dfad stdin
    algo 1 length 14 seed 2654435761
    ea5f7ddef9a64f80 stdin
    algo 1 length 222 seed 0
    5f3d3c08ec2bef34 stdin
    algo 1 length 222 seed 2654435761
    6a9df59664c7ed62 stdin

    algo 1 is the 64-bit (xx64) variant; algo 0 is the 32-bit variant, which
    is currently not hooked up.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Nikolay Borisov
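
The seeded test vectors in the xxhash commit above can be partly cross-checked without the kernel or the `xxhsum` tool. The following is a sketch of the 64-bit xxHash algorithm in pure Python, written from the publicly documented algorithm and not taken from the kernel's /lib implementation; the function name `xxh64` is an assumption for this example. Because the random 222-byte string is not reproduced in the commit message, only the two zero-length checksums from the listing can be compared.

```python
# Illustrative pure-Python XXH64 (seeded), for cross-checking short vectors.
# Assumption: follows the published XXH64 description; not the kernel code.

M64 = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & M64


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & M64
    return (rotl64(acc, 31) * P1) & M64


def _merge(h: int, v: int) -> int:
    h ^= _round(0, v)
    return (h * P1 + P4) & M64


def xxh64(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        v = [(seed + P1 + P2) & M64, (seed + P2) & M64,
             seed & M64, (seed - P1) & M64]
        while i + 32 <= n:
            for j in range(4):
                lane = int.from_bytes(data[i + 8 * j:i + 8 * j + 8], "little")
                v[j] = _round(v[j], lane)
            i += 32
        h = (rotl64(v[0], 1) + rotl64(v[1], 7)
             + rotl64(v[2], 12) + rotl64(v[3], 18)) & M64
        for acc in v:
            h = _merge(h, acc)
    else:
        h = (seed + P5) & M64
    h = (h + n) & M64
    # Tail: remaining 8-byte words, then one 4-byte word, then single bytes.
    while i + 8 <= n:
        h ^= _round(0, int.from_bytes(data[i:i + 8], "little"))
        h = (rotl64(h, 27) * P1 + P4) & M64
        i += 8
    if i + 4 <= n:
        h ^= (int.from_bytes(data[i:i + 4], "little") * P1) & M64
        h = (rotl64(h, 23) * P2 + P3) & M64
        i += 4
    while i < n:
        h ^= (data[i] * P5) & M64
        h = (rotl64(h, 11) * P1) & M64
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & M64
    h ^= h >> 29
    h = (h * P3) & M64
    h ^= h >> 32
    return h


if __name__ == "__main__":
    for seed in (0, 2654435761):
        print("length 0 seed", seed, format(xxh64(b"", seed), "016x"))
```

The zero-length results can be compared with the "algo 1 length 0" lines in the listing; longer inputs would need the original TEST_VECTOR file.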
     

30 May, 2019

2 commits