24 Mar, 2019

1 commit

  • commit 62fecf295e3c48be1b5f17c440b93875b9adb4d6 upstream.

    The SIMD routine ported from x86 used to have a special code path
    for inputs < 16 bytes, which got lost somewhere along the way.
    Instead, the current glue code aligns the input pointer to permit
    the NEON routine to use special versions of the vld1 instructions
    that assume 16 byte alignment, but this could result in inputs of
    less than 16 bytes being passed in. This not only fails the new
    extended tests that Eric has implemented, it also results in the
    code reading past the end of the input, which could lead to crashes
    when fewer than 16 bytes of input sit at the end of a page that is
    followed by an unmapped page.

    So update the glue code to only invoke the NEON routine if the
    input is at least 16 bytes.
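
    A sketch of the resulting glue hook, assuming the usual shash update
    shape (the 16-byte threshold is the point of the fix; helper names
    approximate):

    static int crct10dif_update(struct shash_desc *desc, const u8 *data,
                                unsigned int length)
    {
            u16 *crc = shash_desc_ctx(desc);

            if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && may_use_simd()) {
                    kernel_neon_begin();
                    *crc = crc_t10dif_pmull(*crc, data, length);
                    kernel_neon_end();
            } else {
                    *crc = crc_t10dif_generic(*crc, data, length);
            }

            return 0;
    }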

    Reported-by: Eric Biggers
    Reviewed-by: Eric Biggers
    Fixes: 1d481f1cd892 ("crypto: arm/crct10dif - port x86 SSE implementation to ARM")
    Cc: # v4.10+
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

14 Nov, 2018

1 commit

  • commit 578bdaabd015b9b164842c3e8ace9802f38e7ecc upstream.

    These are unused, undesired, and have never actually been used by
    anybody. The original authors of this code have changed their mind about
    its inclusion. While originally proposed for disk encryption on low-end
    devices, the idea was discarded [1] in favor of something else before
    that could really get going. Therefore, this patch removes Speck.

    [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659

    Signed-off-by: Jason A. Donenfeld
    Acked-by: Eric Biggers
    Cc: stable@vger.kernel.org
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Jason A. Donenfeld
     

24 Aug, 2018

1 commit

  • Almost all files in the kernel are either plain text or UTF-8 encoded. A
    couple however are ISO_8859-1, usually just a few characters in C
    comments, for historic reasons.

    This converts them all to UTF-8 for consistency.

    Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Acked-by: Simon Horman [IPVS portion]
    Acked-by: Jonathan Cameron [IIO]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Rob Herring
    Cc: Joe Perches
    Cc: Arnd Bergmann
    Cc: Samuel Ortiz
    Cc: "David S. Miller"
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

09 Jul, 2018

3 commits

  • Some ahash algorithms set .cra_type = &crypto_ahash_type. But this is
    redundant with the C structure type ('struct ahash_alg'), and
    crypto_register_ahash() already sets the .cra_type automatically.
    Apparently the useless assignment has just been copy+pasted around.

    So, remove the useless assignment from all the ahash algorithms.

    This patch shouldn't change any actual behavior.
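
    The assignment is redundant because registration already does the
    equivalent of the following (paraphrased from the ahash registration
    path; sanity checks elided):

    static int ahash_prepare_alg(struct ahash_alg *alg)
    {
            struct crypto_alg *base = &alg->halg.base;

            /* ... size checks elided ... */
            base->cra_type = &crypto_ahash_type;
            base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK;
            base->cra_flags |= CRYPTO_ALG_TYPE_AHASH;

            return 0;
    }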

    Signed-off-by: Eric Biggers
    Acked-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
    is redundant with the C structure type ('struct ahash_alg'), and
    crypto_register_ahash() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the ahash algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Acked-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this
    is redundant with the C structure type ('struct shash_alg'), and
    crypto_register_shash() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the shash algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

01 Jul, 2018

1 commit

  • Building the kernel with CONFIG_THUMB2_KERNEL=y and
    CONFIG_CRYPTO_SPECK_NEON set fails with the following errors:

    arch/arm/crypto/speck-neon-core.S: Assembler messages:

    arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here -- `bic sp,#0xf'
    arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here -- `bic sp,#0xf'
    arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here -- `bic sp,#0xf'
    arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here -- `bic sp,#0xf'

    The problem is that the 'bic' instruction can't operate on the 'sp'
    register in Thumb2 mode. Fix it by using a temporary register. This
    isn't in the main loop, so the performance difference is negligible.
    This also matches what aes-neonbs-core.S does.

    Reported-by: Stefan Agner
    Fixes: ede9622162fa ("crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS")
    Signed-off-by: Eric Biggers
    Acked-by: Ard Biesheuvel
    Reviewed-by: Stefan Agner
    Signed-off-by: Herbert Xu

    Eric Biggers
     

31 May, 2018

1 commit

  • Several source files have been taken from OpenSSL. In some of them a
    comment that "permission to use under GPL terms is granted" was
    included below a contradictory license statement. In several cases,
    there was no indication that the license of the code was compatible
    with the GPLv2.

    This change clarifies the licensing for all of these files. I've
    confirmed with the author (Andy Polyakov) that a) he has licensed the
    files with the GPLv2 comment under that license and b) that he's also
    happy to license the other files under GPLv2 too. In one case, the
    file is already contained in his CRYPTOGAMS bundle, which has a GPLv2
    option, and so no special measures are needed.

    In all cases, the license status of code has been clarified by making
    the GPLv2 license prominent.

    The .S files have been regenerated from the updated .pl files.

    This is a comment-only change. No code is changed.

    Signed-off-by: Adam Langley
    Signed-off-by: Herbert Xu

    Adam Langley
     

23 Mar, 2018

1 commit

  • The decision to rebuild .S_shipped is made based on the relative
    timestamps of .S_shipped and .pl files but git makes this essentially
    random. This means that the perl script might run anyway (usually at
    most once per checkout), defeating the whole purpose of _shipped.

    Fix by skipping the rule unless explicit make variables are provided:
    REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO.

    The original behaviour can produce nasty occasional build failures
    downstream, for example for toolchains with a broken perl. The fix is
    kept minimally intrusive to make it easier to push into stable.

    Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379

    Signed-off-by: Leonard Crestez
    Cc:
    Reviewed-by: Masahiro Yamada
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Leonard Crestez
     

22 Feb, 2018

2 commits

  • Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
    128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
    Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
    encrypted/decrypted (doing one cipher round for all the blocks, then the
    next round, etc.), then goes through XTS postprocessing.

    Performance depends on the processor, but the NEON code can be about
    3 times faster than the generic code. For example, on an ARMv7
    processor we observe the following throughput with Speck128/256-XTS:

    xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
    xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s

    In comparison to AES-256-XTS without the Cryptography Extensions:

    xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
    xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
    xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s

    Speck64/128-XTS is even faster:

    xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s

    Note that as with the generic code, only the Speck128 and Speck64
    variants are supported. Also, for now only the XTS mode of operation is
    supported, to target the disk and file encryption use cases. The NEON
    code also only handles the portion of the data that is evenly divisible
    into 128-byte chunks, with any remainder handled by a C fallback. Of
    course, other modes of operation could be added later if needed, and/or
    the NEON code could be updated to handle other buffer sizes.

    The XTS specification is only defined for AES which has a 128-bit block
    size, so for the GF(2^64) math needed for Speck64-XTS we use the
    reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
    paper. Of course, when possible users should use Speck128-XTS, but even
    that may be too slow on some processors; Speck64-XTS can be faster.
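
    For illustration, multiplying the 64-bit tweak by x under that
    polynomial reduces to a shift and a conditional XOR with 0x1B
    (= x^4 + x^3 + x + 1); a minimal sketch (helper name hypothetical):

    /* multiply a Speck64-XTS tweak by x in GF(2^64), reducing by
     * x^64 + x^4 + x^3 + x + 1 */
    static inline u64 speck64_xts_mul_x(u64 t)
    {
            return (t << 1) ^ ((t & (1ULL << 63)) ? 0x1B : 0);
    }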

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Move the AES inverse S-box to the .rodata section
    where it is safe from abuse by speculation.

    Signed-off-by: Jinbum Park
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jinbum Park
     

12 Jan, 2018

1 commit

  • We need to consistently enforce that keyed hashes cannot be used without
    setting the key. To do this we need a reliable way to determine whether
    a given hash algorithm is keyed or not. AF_ALG currently does this by
    checking for the presence of a ->setkey() method. However, this is
    actually slightly broken because the CRC-32 algorithms implement
    ->setkey() but can also be used without a key. (The CRC-32 "key" is not
    actually a cryptographic key but rather represents the initial state.
    If not overridden, then a default initial state is used.)

    Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
    indicates that the algorithm has a ->setkey() method, but it is not
    required to be called. Then set it on all the CRC-32 algorithms.

    The same also applies to the Adler-32 implementation in Lustre.

    Also, the cryptd and mcryptd templates have to pass through the flag
    from their underlying algorithm.
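
    On the driver side this is a one-line addition to each algorithm
    definition; a sketch of the relevant fragment (names and values
    illustrative):

    .base = {
            .cra_name        = "crc32",
            .cra_driver_name = "crc32-arm-ce",
            .cra_priority    = 200,
            .cra_flags       = CRYPTO_ALG_OPTIONAL_KEY,
            .cra_blocksize   = 1,
            .cra_module      = THIS_MODULE,
    }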

    Cc: stable@vger.kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

11 Dec, 2017

1 commit

  • Fix ptr_ret.cocci warnings:
    arch/arm/crypto/aes-neonbs-glue.c:184:1-3: WARNING: PTR_ERR_OR_ZERO can be used
    arch/arm/crypto/aes-neonbs-glue.c:261:1-3: WARNING: PTR_ERR_OR_ZERO can be used

    Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

    Generated by: scripts/coccinelle/api/ptr_ret.cocci
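
    The transformation is mechanical; the pattern it collapses looks like
    this (variable name illustrative):

    /* before */
    if (IS_ERR(simd))
            return PTR_ERR(simd);
    return 0;

    /* after */
    return PTR_ERR_OR_ZERO(simd);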

    Signed-off-by: Vasyl Gomonovych
    Signed-off-by: Herbert Xu

    Gomonovych, Vasyl
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to
    a file was done in a spreadsheet of side by side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if ...)
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

04 Aug, 2017

3 commits

  • For the final round, avoid the expanded and padded lookup tables
    exported by the generic AES driver. Instead, for encryption, we can
    perform byte loads from the same table we used for the inner rounds,
    which will still be hot in the caches. For decryption, use the inverse
    AES Sbox directly, which is 4x smaller than the inverse lookup table
    exported by the generic driver.

    This should significantly reduce the Dcache footprint of our code,
    which makes the code more robust against timing attacks. It does not
    introduce any additional module dependencies, given that we already
    rely on the core AES module for the shared key expansion routines.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Implement a NEON fallback for systems that do support NEON but have
    no support for the optional 64x64->128 polynomial multiplication
    instruction that is part of the ARMv8 Crypto Extensions. It is based
    on the paper "Fast Software Polynomial Multiplication on ARM Processors
    Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
    Ricardo Dahab (https://hal.inria.fr/hal-01506572)

    On a 32-bit guest executing under KVM on a Cortex-A57, the new code is
    not only 4x faster than the generic table based GHASH driver, it is also
    time invariant. (Note that the existing vmull.p64 code is 16x faster on
    this core).

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • There are quite a number of occurrences in the kernel of the pattern

    if (dst != src)
            memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
    crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);

    or

    crypto_xor(keystream, src, nbytes);
    memcpy(dst, keystream, nbytes);

    where crypto_xor() is preceded or followed by a memcpy() invocation
    that is only there because crypto_xor() uses its output parameter as
    one of the inputs. To avoid having to add new instances of this pattern
    in the arm64 code, which will be refactored to implement non-SIMD
    fallbacks, add an alternative implementation called crypto_xor_cpy(),
    taking separate input and output arguments. This removes the need for
    the separate memcpy().
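
    With the new helper, the second pattern above becomes a single call;
    crypto_xor_cpy() takes the destination first, followed by the two
    sources:

    /* before */
    crypto_xor(keystream, src, nbytes);
    memcpy(dst, keystream, nbytes);

    /* after */
    crypto_xor_cpy(dst, keystream, src, nbytes);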

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

09 Mar, 2017

1 commit

  • Currently, the bit sliced NEON AES code for ARM has a link time
    dependency on the scalar ARM asm implementation, which it uses as a
    fallback to perform CBC encryption and the encryption of the initial
    XTS tweak.

    The bit sliced NEON code is both fast and time invariant, which makes
    it a reasonable default on hardware that supports it. However, the
    ARM asm code it pulls in is not time invariant, and due to the way it
    is linked in, cannot be overridden by the new generic time invariant
    driver. In fact, it will not be used at all, given that the ARM asm
    code registers itself as a cipher with a priority that exceeds the
    priority of the fixed time cipher.

    So remove the link time dependency, and allocate the fallback cipher
    via the crypto API. Note that this requires this driver's module_init
    call to be replaced with late_initcall, so that the (possibly generic)
    fallback cipher is guaranteed to be available when the builtin test
    is performed at registration time.
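
    A sketch of the shape of this change, with context and symbol names
    approximated:

    static int cbc_init(struct crypto_skcipher *tfm)
    {
            struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);

            /* resolve the fallback cipher at runtime via the crypto API
             * instead of calling the scalar asm code directly */
            ctx->enc_tfm = crypto_alloc_cipher("aes", 0, 0);

            return PTR_ERR_OR_ZERO(ctx->enc_tfm);
    }

    /* register late so that a (possibly generic) "aes" implementation is
     * already available when the self-test runs at registration time */
    late_initcall(aes_init);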

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

01 Mar, 2017

2 commits

  • The accelerated CRC32 module for ARM may use either the scalar CRC32
    instructions, the NEON 64x64 to 128 bit polynomial multiplication
    (vmull.p64) instruction, or both, depending on what the current CPU
    supports.

    However, this also requires support in binutils, and as it turns out,
    versions of binutils exist that support the vmull.p64 instruction but
    not the crc32 instructions.

    So refactor the Makefile logic so that this module only gets built if
    binutils has support for both.

    Signed-off-by: Ard Biesheuvel
    Acked-by: Jon Hunter
    Tested-by: Jon Hunter
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Annotate a vmov instruction with an explicit element size of 32 bits.
    This is inferred by recent toolchains, but apparently, older versions
    need some help figuring this out.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

03 Feb, 2017

3 commits

  • The ARM bit sliced AES core code uses the IV buffer to pass the final
    keystream block back to the glue code if the input is not a multiple of
    the block size, so that the asm code does not have to deal with anything
    except 16 byte blocks. This is done under the assumption that the outgoing
    IV is meaningless anyway in this case, given that chaining is no longer
    possible under these circumstances.

    However, as it turns out, the CCM driver does expect the IV to retain
    a value that is equal to the original IV except for the counter value,
    and even interprets byte zero as a length indicator, which may result
    in memory corruption if the IV is overwritten with something else.

    So use a separate buffer to return the final keystream block.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Remove the unnecessary alignmask: it is much more efficient to deal with
    the misalignment in the core algorithm than relying on the crypto API to
    copy the data to a suitably aligned buffer.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Remove the unnecessary alignmask: it is much more efficient to deal with
    the misalignment in the core algorithm than relying on the crypto API to
    copy the data to a suitably aligned buffer.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

23 Jan, 2017

1 commit

  • The GNU assembler for ARM version 2.22 or older fails to infer the
    element size from the vmov instructions, and aborts the build in
    the following way:

    .../aes-neonbs-core.S: Assembler messages:
    .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
    .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
    .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
    .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
    .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
    .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
    .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
    .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

    Fix this by setting the element size explicitly, by replacing vmov with
    vmov.32.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

13 Jan, 2017

4 commits

  • The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
    which means we can no longer use 'tt' as a register alias on recent
    versions of binutils for ARM. So replace the alias with 'ttab'.

    Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This replaces the unwieldy generated implementation of bit-sliced AES
    in CBC/CTR/XTS modes that originated in the OpenSSL project with a
    new version that is heavily based on the OpenSSL implementation, but
    has a number of advantages over the old version:
    - it does not rely on the scalar AES cipher that also originated in the
    OpenSSL project and contains redundant lookup tables and key schedule
    generation routines (which we already have in crypto/aes_generic.)
    - it uses the same expanded key schedule for encryption and decryption,
    reducing the size of the per-key data structure by 1696 bytes
    - it adds an implementation of AES in ECB mode, which can be wrapped by
    other generic chaining mode implementations
    - it moves the handling of corner cases that are non-critical to
    performance to the glue layer written in C
    - it was written directly in assembler rather than generated from a Perl
    script

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This replaces the scalar AES cipher that originates in the OpenSSL project
    with a new implementation that is ~15% (*) faster (on modern cores), and
    reuses the lookup tables and the key schedule generation routines from the
    generic C implementation (which is usually compiled in anyway due to
    networking and other subsystems depending on it).

    Note that the bit sliced NEON code for AES still depends on the scalar cipher
    that this patch replaces, so it is not removed entirely yet.

    * On Cortex-A57, the cost drops from 17.0 to 14.9 cycles per byte
    for 128-bit keys.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This is a straight port to ARM/NEON of the x86 SSE3 implementation
    of the ChaCha20 stream cipher. It uses the new skcipher walksize
    attribute to process the input in strides of 4x the block size.
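
    The walksize attribute is part of struct skcipher_alg; a sketch of the
    relevant fields (function names approximate):

    static struct skcipher_alg alg = {
            .base.cra_name   = "chacha20",
            /* ... other fields elided ... */
            .chunksize       = CHACHA20_BLOCK_SIZE,
            .walksize        = 4 * CHACHA20_BLOCK_SIZE,  /* 4 blocks per NEON call */
            .setkey          = crypto_chacha20_setkey,
            .encrypt         = chacha20_neon,
            .decrypt         = chacha20_neon,
    };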

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

28 Dec, 2016

1 commit

  • This patch reverts the following commits:

    8621caa0d45e731f2e9f5889ff5bb384fcd6e059
    8096667273477e735b0072b11a6d617ccee45e5f

    I should not have applied them because they had already been
    obsoleted by a subsequent patch series. They also cause a build
    failure because of the subsequent commit 9ae433bc79f9.

    Fixes: 9ae433bc79f ("crypto: chacha20 - convert generic and...")
    Signed-off-by: Herbert Xu

    Herbert Xu
     

07 Dec, 2016

2 commits

    This is a combination of the Intel algorithm implemented using SSE
    and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
    the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
    version 8 of the architecture. Two versions of the above combo are
    provided, one for CRC32 and one for CRC32C.

    The PMULL/NEON algorithm is faster, but operates on blocks of at least
    64 bytes, and on multiples of 16 bytes only. For the remaining input,
    or for all input on systems that lack the PMULL 64x64->128 instructions,
    the CRC32 instructions will be used.
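
    A sketch of the resulting dispatch logic (helper names and thresholds
    are illustrative):

    static u32 crc32_update_combined(u32 crc, const u8 *data, unsigned int len)
    {
            if (len >= 64 && may_use_simd()) {
                    unsigned int chunk = round_down(len, 16);

                    kernel_neon_begin();
                    crc = crc32_pmull_le(data, chunk, crc);   /* PMULL/NEON path */
                    kernel_neon_end();
                    data += chunk;
                    len -= chunk;
            }
            /* remaining bytes, or everything if PMULL cannot be used */
            return crc32_armv8_le(crc, data, len);
    }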

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This is a transliteration of the Intel algorithm implemented
    using SSE and PCLMULQDQ instructions that resides in the file
    arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
    operate on buffers that are 16 byte aligned (but of any size).

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel