24 Mar, 2019
1 commit
-
commit 62fecf295e3c48be1b5f17c440b93875b9adb4d6 upstream.
The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to permit
the NEON routine to use special versions of the vld1 instructions
that assume 16 byte alignment, but this could result in inputs of
less than 16 bytes to be passed in. This not only fails the new
extended tests that Eric has implemented, it also results in the
code reading past the end of the input, which could potentially
result in crashes when dealing with less than 16 bytes of input
at the end of a page which is followed by an unmapped page.So update the glue code to only invoke the NEON routine if the
input is at least 16 bytes.Reported-by: Eric Biggers
Reviewed-by: Eric Biggers
Fixes: 1d481f1cd892 ("crypto: arm/crct10dif - port x86 SSE implementation to ARM")
Cc: # v4.10+
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
14 Nov, 2018
1 commit
-
commit 578bdaabd015b9b164842c3e8ace9802f38e7ecc upstream.
These are unused, undesired, and have never actually been used by
anybody. The original authors of this code have changed their mind about
its inclusion. While originally proposed for disk encryption on low-end
devices, the idea was discarded [1] in favor of something else before
that could really get going. Therefore, this patch removes Speck.[1] https://marc.info/?l=linux-crypto-vger&m=153359499015659
Signed-off-by: Jason A. Donenfeld
Acked-by: Eric Biggers
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
24 Aug, 2018
1 commit
-
Almost all files in the kernel are either plain text or UTF-8 encoded. A
couple however are ISO_8859-1, usually just a few characters in a C
comments, for historic reasons.This converts them all to UTF-8 for consistency.
Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann
Acked-by: Simon Horman [IPVS portion]
Acked-by: Jonathan Cameron [IIO]
Acked-by: Michael Ellerman [powerpc]
Acked-by: Rob Herring
Cc: Joe Perches
Cc: Arnd Bergmann
Cc: Samuel Ortiz
Cc: "David S. Miller"
Cc: Rob Herring
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Aug, 2018
2 commits
-
The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Merge mainline to pick up c7513c2a2714 ("crypto/arm64: aes-ce-gcm -
add missing kernel_neon_begin/end pair").
09 Jul, 2018
3 commits
-
Some ahash algorithms set .cra_type = &crypto_ahash_type. But this is
redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the .cra_type automatically.
Apparently the useless assignment has just been copy+pasted around.So, remove the useless assignment from all the ahash algorithms.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers
Acked-by: Gilad Ben-Yossef
Signed-off-by: Herbert Xu -
Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
is redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.So, remove the useless assignment from all the ahash algorithms.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers
Acked-by: Gilad Ben-Yossef
Signed-off-by: Herbert Xu -
Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this
is redundant with the C structure type ('struct shash_alg'), and
crypto_register_shash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.So, remove the useless assignment from all the shash algorithms.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu
01 Jul, 2018
1 commit
-
Building the kernel with CONFIG_THUMB2_KERNEL=y and
CONFIG_CRYPTO_SPECK_NEON set fails with the following errors:arch/arm/crypto/speck-neon-core.S: Assembler messages:
arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here -- `bic sp,#0xf'The problem is that the 'bic' instruction can't operate on the 'sp'
register in Thumb2 mode. Fix it by using a temporary register. This
isn't in the main loop, so the performance difference is negligible.
This also matches what aes-neonbs-core.S does.Reported-by: Stefan Agner
Fixes: ede9622162fa ("crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS")
Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
Reviewed-by: Stefan Agner
Signed-off-by: Herbert Xu
31 May, 2018
1 commit
-
Several source files have been taken from OpenSSL. In some of them a
comment that "permission to use under GPL terms is granted" was
included below a contradictory license statement. In several cases,
there was no indication that the license of the code was compatible
with the GPLv2.This change clarifies the licensing for all of these files. I've
confirmed with the author (Andy Polyakov) that a) he has licensed the
files with the GPLv2 comment under that license and b) that he's also
happy to license the other files under GPLv2 too. In one case, the
file is already contained in his CRYPTOGAMS bundle, which has a GPLv2
option, and so no special measures are needed.In all cases, the license status of code has been clarified by making
the GPLv2 license prominent.The .S files have been regenerated from the updated .pl files.
This is a comment-only change. No code is changed.
Signed-off-by: Adam Langley
Signed-off-by: Herbert Xu
07 Apr, 2018
1 commit
-
GNU Make automatically deletes intermediate files that are updated
in a chain of pattern rules.Example 1) %.dtb.o
Acked-by: Frank Rowand
Acked-by: Ingo Molnar
23 Mar, 2018
1 commit
-
The decision to rebuild .S_shipped is made based on the relative
timestamps of .S_shipped and .pl files but git makes this essentially
random. This means that the perl script might run anyway (usually at
most once per checkout), defeating the whole purpose of _shipped.Fix by skipping the rule unless explicit make variables are provided:
REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO.This can produce nasty occasional build failures downstream, for example
for toolchains with broken perl. The solution is minimally intrusive to
make it easier to push into stable.Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379
Signed-off-by: Leonard Crestez
Cc:
Reviewed-by: Masahiro Yamada
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
22 Feb, 2018
2 commits
-
Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.The performance depends on the processor but can be about 3 times faster
than the generic code. For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/sIn comparison to AES-256-XTS without the Cryptography Extensions:
xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/sSpeck64/128-XTS is even faster:
xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s
Note that as with the generic code, only the Speck128 and Speck64
variants are supported. Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases. The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback. Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper. Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu -
Move the AES inverse S-box to the .rodata section
where it is safe from abuse by speculation.Signed-off-by: Jinbum Park
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
12 Jan, 2018
1 commit
-
We need to consistently enforce that keyed hashes cannot be used without
setting the key. To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not. AF_ALG currently does this by
checking for the presence of a ->setkey() method. However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key. (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called. Then set it on all the CRC-32 algorithms.The same also applies to the Adler-32 implementation in Lustre.
Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu
11 Dec, 2017
1 commit
-
Fix ptr_ret.cocci warnings:
arch/arm/crypto/aes-neonbs-glue.c:184:1-3: WARNING: PTR_ERR_OR_ZERO can be used
arch/arm/crypto/aes-neonbs-glue.c:261:1-3: WARNING: PTR_ERR_OR_ZERO can be usedUse PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Vasyl Gomonovych
Signed-off-by: Herbert Xu
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
04 Aug, 2017
3 commits
-
For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572)On a 32-bit guest executing under KVM on a Cortex-A57, the new code is
not only 4x faster than the generic table based GHASH driver, it is also
time invariant. (Note that the existing vmull.p64 code is 16x faster on
this core).Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
There are quite a number of occurrences in the kernel of the pattern
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);or
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
01 Jun, 2017
5 commits
-
Make the module autoloadable by tying it to the CPU feature bits that
describe whether the optional instructions it relies on are implemented
by the current CPU.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
09 Mar, 2017
1 commit
-
Currently, the bit sliced NEON AES code for ARM has a link time
dependency on the scalar ARM asm implementation, which it uses as a
fallback to perform CBC encryption and the encryption of the initial
XTS tweak.The bit sliced NEON code is both fast and time invariant, which makes
it a reasonable default on hardware that supports it. However, the
ARM asm code it pulls in is not time invariant, and due to the way it
is linked in, cannot be overridden by the new generic time invariant
driver. In fact, it will not be used at all, given that the ARM asm
code registers itself as a cipher with a priority that exceeds the
priority of the fixed time cipher.So remove the link time dependency, and allocate the fallback cipher
via the crypto API. Note that this requires this driver's module_init
call to be replaced with late_initcall, so that the (possibly generic)
fallback cipher is guaranteed to be available when the builtin test
is performed at registration time.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
01 Mar, 2017
2 commits
-
The accelerated CRC32 module for ARM may use either the scalar CRC32
instructions, the NEON 64x64 to 128 bit polynomial multiplication
(vmull.p64) instruction, or both, depending on what the current CPU
supports.However, this also requires support in binutils, and as it turns out,
versions of binutils exist that support the vmull.p64 instruction but
not the crc32 instructions.So refactor the Makefile logic so that this module only gets built if
binutils has support for both.Signed-off-by: Ard Biesheuvel
Acked-by: Jon Hunter
Tested-by: Jon Hunter
Signed-off-by: Herbert Xu -
Annotate a vmov instruction with an explicit element size of 32 bits.
This is inferred by recent toolchains, but apparently, older versions
need some help figuring this out.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
03 Feb, 2017
3 commits
-
The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.So use a separate buffer to return the final keystream block.
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
23 Jan, 2017
1 commit
-
The GNU assembler for ARM version 2.22 or older fails to infer the
element size from the vmov instructions, and aborts the build in
the following way;.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
13 Jan, 2017
4 commits
-
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
This replaces the unwieldy generated implementation of bit-sliced AES
in CBC/CTR/XTS modes that originated in the OpenSSL project with a
new version that is heavily based on the OpenSSL implementation, but
has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the
OpenSSL project and contains redundant lookup tables and key schedule
generation routines (which we already have in crypto/aes_generic.)
- it uses the same expanded key schedule for encryption and decryption,
reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by
other generic chaining mode implementations
- it moves the handling of corner cases that are non critical to performance
to the glue layer written in C
- it was written directly in assembler rather than generated from a Perl
scriptSigned-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
for 128-bit keys.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
28 Dec, 2016
1 commit
-
This patch reverts the following commits:
8621caa0d45e731f2e9f5889ff5bb384fcd6e059
8096667273477e735b0072b11a6d617ccee45e5fI should not have applied them because they had already been
obsoleted by a subsequent patch series. They also cause a build
failure because of the subsequent commit 9ae433bc79f9.Fixes: 9ae433bc79f ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu
27 Dec, 2016
1 commit
-
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
07 Dec, 2016
2 commits
-
This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu -
This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16 byte aligned (but of any size)Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu