Eric Lee / smarc-fsl-linux-kernel

24 Mar, 2019

1 commit

0beb34b86 crypto: arm/crct10dif - revert to C code for short inputs

commit 62fecf295e3c48be1b5f17c440b93875b9adb4d6 upstream.

The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to permit
the NEON routine to use special versions of the vld1 instructions
that assume 16 byte alignment, but this could result in inputs of
less than 16 bytes being passed in. This not only fails the new
extended tests that Eric has implemented, it also results in the
code reading past the end of the input, which could potentially
result in crashes when dealing with less than 16 bytes of input
at the end of a page which is followed by an unmapped page.

So update the glue code to only invoke the NEON routine if the
input is at least 16 bytes.
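The length check described above can be sketched in portable C. This is an illustration only, not the kernel's glue code: `crc_t10dif_simd_stub` is a hypothetical stand-in for the NEON routine, `simd_usable` is an assumed flag for "SIMD may be used in this context", and the scalar routine is a plain bitwise CRC-T10DIF (polynomial 0x8bb7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest input the vector routine may be handed (16 bytes, per the text). */
#define CRC_T10DIF_SIMD_MIN_SIZE 16U

/* Portable bitwise CRC-T10DIF: polynomial 0x8bb7, MSB first, no reflection. */
static uint16_t crc_t10dif_scalar(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8bb7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Hypothetical stand-in for the NEON routine; it must never see < 16 bytes. */
static uint16_t crc_t10dif_simd_stub(uint16_t crc, const uint8_t *buf, size_t len)
{
	assert(len >= CRC_T10DIF_SIMD_MIN_SIZE);
	return crc_t10dif_scalar(crc, buf, len);
}

/* Glue: take the vector path only for inputs of at least 16 bytes. */
static uint16_t crc_t10dif_update(uint16_t crc, const uint8_t *buf, size_t len,
				  int simd_usable)
{
	if (len >= CRC_T10DIF_SIMD_MIN_SIZE && simd_usable)
		return crc_t10dif_simd_stub(crc, buf, len);
	return crc_t10dif_scalar(crc, buf, len);
}
```

With this shape, a 9-byte input always takes the scalar path regardless of `simd_usable`, so the vector routine can never read past a short buffer.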

Reported-by: Eric Biggers
Reviewed-by: Eric Biggers
Fixes: 1d481f1cd892 ("crypto: arm/crct10dif - port x86 SSE implementation to ARM")
Cc: # v4.10+
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman

Ard Biesheuvel
2019-03-24 03:09:54 +0800

14 Nov, 2018

1 commit

3252b60cf crypto: speck - remove Speck

commit 578bdaabd015b9b164842c3e8ace9802f38e7ecc upstream.

These are unused, undesired, and have never actually been used by
anybody. The original authors of this code have changed their mind about
its inclusion. While originally proposed for disk encryption on low-end
devices, the idea was discarded [1] in favor of something else before
that could really get going. Therefore, this patch removes Speck.

[1] https://marc.info/?l=linux-crypto-vger&m=153359499015659

Signed-off-by: Jason A. Donenfeld
Acked-by: Eric Biggers
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman

Jason A. Donenfeld
2018-11-14 03:08:46 +0800

24 Aug, 2018

1 commit

3723c6324 treewide: convert ISO_8859-1 text comments to utf-8

Almost all files in the kernel are either plain text or UTF-8 encoded. A
couple however are ISO_8859-1, usually just a few characters in C
comments, for historic reasons.

This converts them all to UTF-8 for consistency.

Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann
Acked-by: Simon Horman [IPVS portion]
Acked-by: Jonathan Cameron [IIO]
Acked-by: Michael Ellerman [powerpc]
Acked-by: Rob Herring
Cc: Joe Perches
Cc: Arnd Bergmann
Cc: Samuel Ortiz
Cc: "David S. Miller"
Cc: Rob Herring
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2018-08-24 09:48:43 +0800

03 Aug, 2018

2 commits

4e34e51f4 crypto: arm/chacha20 - always use vrev for 16-bit rotates

The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.
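The reason a single vrev32.16 can replace the shift pair is that rotating a 32-bit word by 16 bits is the same as swapping its two 16-bit halves. A minimal scalar sketch in C (the function names are illustrative and not taken from the kernel source):

```c
#include <assert.h>
#include <stdint.h>

/* Generic left rotate built from two shifts, analogous to vshl + vsri. */
static uint32_t rotl32_shift(uint32_t x, unsigned int n)
{
	assert(n > 0 && n < 32);	/* a shift by 32 would be undefined */
	return (x << n) | (x >> (32 - n));
}

/* Rotate by 16 expressed as a swap of the 16-bit halves of the word,
 * which is what vrev32.16 does within each 32-bit lane. */
static uint32_t rotl32_by16_swap(uint32_t x)
{
	uint16_t lo = (uint16_t)x;
	uint16_t hi = (uint16_t)(x >> 16);

	return ((uint32_t)lo << 16) | hi;
}
```

Both functions map 0x12345678 to 0x56781234; the swap form needs one operation per vector instead of two.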

Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Eric Biggers
2018-08-03 18:06:05 +0800
c5f5aeef9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux

Merge mainline to pick up c7513c2a2714 ("crypto/arm64: aes-ce-gcm -
add missing kernel_neon_begin/end pair").

Herbert Xu
2018-08-03 17:55:12 +0800

09 Jul, 2018

3 commits

c87a405e3 crypto: ahash - remove useless setting of cra_type

Some ahash algorithms set .cra_type = &crypto_ahash_type. But this is
redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the .cra_type automatically.
Apparently the useless assignment has just been copy+pasted around.

So, remove the useless assignment from all the ahash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers
Acked-by: Gilad Ben-Yossef
Signed-off-by: Herbert Xu

Eric Biggers
2018-07-09 00:30:26 +0800
6a38f6224 crypto: ahash - remove useless setting of type flags

Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
is redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the ahash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers
Acked-by: Gilad Ben-Yossef
Signed-off-by: Herbert Xu

Eric Biggers
2018-07-09 00:30:25 +0800
e50944e21 crypto: shash - remove useless setting of type flags

Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this
is redundant with the C structure type ('struct shash_alg'), and
crypto_register_shash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the shash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu

Eric Biggers
2018-07-09 00:30:24 +0800

01 Jul, 2018

1 commit

a068b94d7 crypto: arm/speck - fix building in Thumb2 mode

Building the kernel with CONFIG_THUMB2_KERNEL=y and
CONFIG_CRYPTO_SPECK_NEON set fails with the following errors:

arch/arm/crypto/speck-neon-core.S: Assembler messages:

arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here -- `bic sp,#0xf'
arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here -- `bic sp,#0xf'

The problem is that the 'bic' instruction can't operate on the 'sp'
register in Thumb2 mode. Fix it by using a temporary register. This
isn't in the main loop, so the performance difference is negligible.
This also matches what aes-neonbs-core.S does.

Reported-by: Stefan Agner
Fixes: ede9622162fa ("crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS")
Signed-off-by: Eric Biggers
Acked-by: Ard Biesheuvel
Reviewed-by: Stefan Agner
Signed-off-by: Herbert Xu

Eric Biggers
2018-07-01 23:31:46 +0800

31 May, 2018

1 commit

c2e415fe7 crypto: clarify licensing of OpenSSL asm code

Several source files have been taken from OpenSSL. In some of them a
comment that "permission to use under GPL terms is granted" was
included below a contradictory license statement. In several cases,
there was no indication that the license of the code was compatible
with the GPLv2.

This change clarifies the licensing for all of these files. I've
confirmed with the author (Andy Polyakov) that a) he has licensed the
files with the GPLv2 comment under that license and b) that he's also
happy to license the other files under GPLv2 too. In one case, the
file is already contained in his CRYPTOGAMS bundle, which has a GPLv2
option, and so no special measures are needed.

In all cases, the license status of code has been clarified by making
the GPLv2 license prominent.

The .S files have been regenerated from the updated .pl files.

This is a comment-only change. No code is changed.

Signed-off-by: Adam Langley
Signed-off-by: Herbert Xu

Adam Langley
2018-05-31 00:13:44 +0800

07 Apr, 2018

1 commit

54a702f70 kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers

GNU Make automatically deletes intermediate files that are updated
in a chain of pattern rules.

Example 1) %.dtb.o
Acked-by: Frank Rowand
Acked-by: Ingo Molnar

Masahiro Yamada
2018-04-07 18:04:02 +0800

23 Mar, 2018

1 commit

6aaf49b49 crypto: arm,arm64 - Fix random regeneration of S_shipped

The decision to rebuild .S_shipped is made based on the relative
timestamps of .S_shipped and .pl files but git makes this essentially
random. This means that the perl script might run anyway (usually at
most once per checkout), defeating the whole purpose of _shipped.

Fix by skipping the rule unless explicit make variables are provided:
REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO.

This can produce nasty occasional build failures downstream, for example
for toolchains with broken perl. The solution is minimally intrusive to
make it easier to push into stable.

Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379

Signed-off-by: Leonard Crestez
Cc:
Reviewed-by: Masahiro Yamada
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Leonard Crestez
2018-03-23 23:43:19 +0800

22 Feb, 2018

2 commits

ede962216 crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64. Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster
than the generic code. For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:

xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s
xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s
xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s

Speck64/128-XTS is even faster:

xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64
variants are supported. Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases. The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback. Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper. Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
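The GF(2^64) step mentioned above, multiplying the tweak by x modulo x^64 + x^4 + x^3 + x + 1, can be sketched as follows. This assumes the tweak is held in a single 64-bit integer with bit i as the coefficient of x^i; the kernel code's actual data layout is not described in the text, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a GF(2^64) element by x. When the x^63 coefficient is set, the
 * shifted-out x^64 term is replaced by x^4 + x^3 + x + 1, i.e. 0x1b. */
static uint64_t gf64_mul_x(uint64_t t)
{
	uint64_t carry = t >> 63;

	return (t << 1) ^ (carry ? 0x1bULL : 0);
}

/* Small self-check of the reduction behaviour. */
static void gf64_self_test(void)
{
	assert(gf64_mul_x(1) == 2);				/* x^0 * x = x^1 */
	assert(gf64_mul_x(0x8000000000000000ULL) == 0x1b);	/* x^63 * x reduces */
}
```

The constant 0x1b is simply the low terms of the polynomial written as bits (binary 11011).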

Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu

Eric Biggers
2018-02-22 22:16:55 +0800
4ff8b1dd8 crypto: arm/aes-cipher - move S-box to .rodata section

Move the AES inverse S-box to the .rodata section
where it is safe from abuse by speculation.

Signed-off-by: Jinbum Park
Acked-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Jinbum Park
2018-02-22 22:16:19 +0800

12 Jan, 2018

1 commit

a208fa8f3 crypto: hash - annotate algorithms taking optional key

We need to consistently enforce that keyed hashes cannot be used without
setting the key. To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not. AF_ALG currently does this by
checking for the presence of a ->setkey() method. However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key. (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)

Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called. Then set it on all the CRC-32 algorithms.
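The distinction the flag introduces can be sketched with a reduced, hypothetical algorithm descriptor. The struct, the flag value and the helper below are illustrative assumptions, not the kernel's definitions; only the rule itself (a ->setkey() method no longer implies that a key is mandatory) comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value; the real bit assignment is not given in the text. */
#define SKETCH_ALG_OPTIONAL_KEY 0x1u

/* Hypothetical, heavily reduced stand-in for a hash algorithm descriptor. */
struct hash_alg_sketch {
	unsigned int flags;
	int (*setkey)(const unsigned char *key, size_t len);
};

/* Trivial setkey used to mark an algorithm as "has a setkey method". */
static int sketch_setkey(const unsigned char *key, size_t len)
{
	(void)key;
	(void)len;
	return 0;
}

/* A key is mandatory only if setkey exists and the optional flag is clear. */
static bool hash_needs_key(const struct hash_alg_sketch *alg)
{
	assert(alg != NULL);
	return alg->setkey != NULL && !(alg->flags & SKETCH_ALG_OPTIONAL_KEY);
}
```

Under this rule a CRC-32 style algorithm (setkey present, flag set) reports that no key is needed, while an HMAC-style algorithm (setkey present, flag clear) still does.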

The same also applies to the Adler-32 implementation in Lustre.

Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers
Signed-off-by: Herbert Xu

Eric Biggers
2018-01-12 20:03:35 +0800

11 Dec, 2017

1 commit

26d85e5f3 crypto: arm/aes-neonbs - Use PTR_ERR_OR_ZERO()

Fix ptr_ret.cocci warnings:
arch/arm/crypto/aes-neonbs-glue.c:184:1-3: WARNING: PTR_ERR_OR_ZERO can be used
arch/arm/crypto/aes-neonbs-glue.c:261:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
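For readers unfamiliar with the idiom, here is a userspace sketch of what the helper collapses. The `sketch_` functions are simplified re-implementations written for illustration; they mimic the convention of encoding small negative errno values in a pointer, but they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest errno value assumed to be encodable in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Decode the errno value carried by an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* One call replacing: if (is_err(p)) return ptr_err(p); return 0; */
static inline int sketch_ptr_err_or_zero(const void *p)
{
	return sketch_is_err(p) ? (int)sketch_ptr_err(p) : 0;
}
```

A function that ends with the two-step pattern can then end with a single `return sketch_ptr_err_or_zero(p);`.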

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych
Signed-off-by: Herbert Xu

Gomonovych, Vasyl
2017-12-11 19:36:56 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

04 Aug, 2017

3 commits

0d149ce67 crypto: arm/aes - avoid expanded lookup tables in the final round

For the final round, avoid the expanded and padded lookup tables
exported by the generic AES driver. Instead, for encryption, we can
perform byte loads from the same table we used for the inner rounds,
which will still be hot in the caches. For decryption, use the inverse
AES Sbox directly, which is 4x smaller than the inverse lookup table
exported by the generic driver.

This should significantly reduce the Dcache footprint of our code,
which makes the code more robust against timing attacks. It does not
introduce any additional module dependencies, given that we already
rely on the core AES module for the shared key expansion routines.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-08-04 09:27:25 +0800
3759ee057 crypto: arm/ghash - add NEON accelerated fallback for vmull.p64

Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572)

On a 32-bit guest executing under KVM on a Cortex-A57, the new code is
not only 4x faster than the generic table based GHASH driver, it is also
time invariant. (Note that the existing vmull.p64 code is 16x faster on
this core).

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-08-04 09:27:24 +0800
45fe93dff crypto: algapi - make crypto_xor() take separate dst and src arguments

There are quite a number of occurrences in the kernel of the pattern

if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);

or

crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);

where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().
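A byte-wise sketch of the three-argument form shows why the extra copy disappears. `xor_cpy_sketch` is an illustrative name and a deliberately naive loop; the kernel's crypto_xor_cpy() is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst = src1 ^ src2, byte by byte. Because dst is a separate argument,
 * the caller does not need a memcpy() before or after the XOR. */
static void xor_cpy_sketch(uint8_t *dst, const uint8_t *src1,
			   const uint8_t *src2, size_t len)
{
	assert(len == 0 || (dst && src1 && src2));

	for (size_t i = 0; i < len; i++)
		dst[i] = src1[i] ^ src2[i];
}
```

With this form, the second pattern quoted above reduces to one call such as `xor_cpy_sketch(dst, keystream, src, nbytes)`. Passing `dst` equal to one of the sources gives back the old in-place behaviour.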

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-08-04 09:27:15 +0800

01 Jun, 2017

5 commits

2a9faf8b7 crypto: arm/crc32 - enable module autoloading based on CPU feature bits

Make the module autoloadable by tying it to the CPU feature bits that
describe whether the optional instructions it relies on are implemented
by the current CPU.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-06-01 12:55:42 +0800
a83ff88be crypto: arm/sha2-ce - enable module autoloading based on CPU feature bits

Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-06-01 12:55:41 +0800
bd56f95ea crypto: arm/sha1-ce - enable module autoloading based on CPU feature bits

Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-06-01 12:55:40 +0800
c9d9f608b crypto: arm/ghash-ce - enable module autoloading based on CPU feature bits

Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-06-01 12:55:39 +0800
4d8061a59 crypto: arm/aes-ce - enable module autoloading based on CPU feature bits

Make the module autoloadable by tying it to the CPU feature bit that
describes whether the optional instructions it relies on are implemented
by the current CPU.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-06-01 12:55:38 +0800

09 Mar, 2017

1 commit

b56f5cbc7 crypto: arm/aes-neonbs - resolve fallback cipher at runtime

Currently, the bit sliced NEON AES code for ARM has a link time
dependency on the scalar ARM asm implementation, which it uses as a
fallback to perform CBC encryption and the encryption of the initial
XTS tweak.

The bit sliced NEON code is both fast and time invariant, which makes
it a reasonable default on hardware that supports it. However, the
ARM asm code it pulls in is not time invariant, and due to the way it
is linked in, cannot be overridden by the new generic time invariant
driver. In fact, it will not be used at all, given that the ARM asm
code registers itself as a cipher with a priority that exceeds the
priority of the fixed time cipher.

So remove the link time dependency, and allocate the fallback cipher
via the crypto API. Note that this requires this driver's module_init
call to be replaced with late_initcall, so that the (possibly generic)
fallback cipher is guaranteed to be available when the builtin test
is performed at registration time.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-03-09 18:34:16 +0800

01 Mar, 2017

2 commits

efa7cebdb crypto: arm/crc32 - add build time test for CRC instruction support

The accelerated CRC32 module for ARM may use either the scalar CRC32
instructions, the NEON 64x64 to 128 bit polynomial multiplication
(vmull.p64) instruction, or both, depending on what the current CPU
supports.

However, this also requires support in binutils, and as it turns out,
versions of binutils exist that support the vmull.p64 instruction but
not the crc32 instructions.

So refactor the Makefile logic so that this module only gets built if
binutils has support for both.

Signed-off-by: Ard Biesheuvel
Acked-by: Jon Hunter
Tested-by: Jon Hunter
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-03-01 19:47:53 +0800
1fb1683cb crypto: arm/crc32 - fix build error with outdated binutils

Annotate a vmov instruction with an explicit element size of 32 bits.
This is inferred by recent toolchains, but apparently, older versions
need some help figuring this out.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-03-01 19:47:51 +0800

03 Feb, 2017

3 commits

1a20b9661 crypto: arm/aes - don't use IV buffer to return final keystream block

The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-02-03 18:16:21 +0800
4a70b5262 crypto: arm/chacha20 - remove cra_alignmask

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-02-03 18:16:19 +0800
1465fb13d crypto: arm/aes-ce - remove cra_alignmask

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-02-03 18:16:16 +0800

23 Jan, 2017

1 commit

13954e788 crypto: arm/aes-neonbs - fix issue with v2.22 and older assembler

The GNU assembler for ARM version 2.22 or older fails to infer the
element size from the vmov instructions, and aborts the build in
the following way:

.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-01-23 22:50:25 +0800

13 Jan, 2017

4 commits

658fa754c crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code

The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.

Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-01-13 18:47:21 +0800
cc477bf64 crypto: arm/aes - replace bit-sliced OpenSSL NEON code

This replaces the unwieldy generated implementation of bit-sliced AES
in CBC/CTR/XTS modes that originated in the OpenSSL project with a
new version that is heavily based on the OpenSSL implementation, but
has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the
OpenSSL project and contains redundant lookup tables and key schedule
generation routines (which we already have in crypto/aes_generic.)
- it uses the same expanded key schedule for encryption and decryption,
reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by
other generic chaining mode implementations
- it moves the handling of corner cases that are non critical to performance
to the glue layer written in C
- it was written directly in assembler rather than generated from a Perl
script

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-01-13 18:27:31 +0800
81edb4262 crypto: arm/aes - replace scalar AES cipher

This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).

Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.

* On Cortex-A57, the cost drops from 17.0 to 14.9 cycles per byte
for 128-bit keys.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-01-13 00:26:50 +0800
afaf712e9 crypto: arm/chacha20 - implement NEON version based on SSE3 code

This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2017-01-13 00:26:48 +0800

28 Dec, 2016

1 commit

5386e5d1f Revert "crypto: arm64/ARM: NEON accelerated ChaCha20"

This patch reverts the following commits:

8621caa0d45e731f2e9f5889ff5bb384fcd6e059
8096667273477e735b0072b11a6d617ccee45e5f

I should not have applied them because they had already been
obsoleted by a subsequent patch series. They also cause a build
failure because of the subsequent commit 9ae433bc79f9.

Fixes: 9ae433bc79f ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu

Herbert Xu
2016-12-28 17:39:26 +0800

27 Dec, 2016

1 commit

809666727 crypto: arm/chacha20 - implement NEON version based on SSE3 code

This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2016-12-27 17:47:29 +0800

07 Dec, 2016

2 commits

d0a3431a7 crypto: arm/crc32 - accelerated support based on x86 SSE implementation

This is a combination of the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.
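The division of labour between the two code paths can be sketched as a length split. This only models the size rule stated above (at least 64 bytes, multiples of 16 only); it ignores pointer alignment and CPU feature checks, and `split_crc32_input` is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Split an input length into the part the PMULL/NEON routine could take
 * and the remainder left for the scalar CRC32 instructions. */
static void split_crc32_input(size_t len, size_t *simd_len, size_t *scalar_len)
{
	assert(simd_len != NULL && scalar_len != NULL);

	/* The vector routine wants >= 64 bytes, in multiples of 16. */
	*simd_len = (len >= 64) ? (len & ~(size_t)15) : 0;
	*scalar_len = len - *simd_len;
}
```

For a 100-byte input this gives 96 bytes to the vector path and 4 bytes to the scalar path, while a 63-byte input is handled entirely by the scalar path.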

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2016-12-07 20:01:24 +0800
1d481f1cd crypto: arm/crct10dif - port x86 SSE implementation to ARM

This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16 byte aligned (but of any size).

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2016-12-07 20:01:21 +0800