Eric Lee / smarc-fsl-linux-kernel

25 Oct, 2019

1 commit

528282630 crypto: aegis128 - duplicate init() and final() hooks in SIMD code

In order to speed up aegis128 processing even more, duplicate the init()
and final() routines as SIMD versions in their entirety. This results
in a 2x speedup on ARM Cortex-A57 for ~1500 byte packets (using AES
instructions).

Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2019-10-25 23:06:05 +0800

30 Aug, 2019

1 commit

389139b34 crypto: arm64/aegis128 - use explicit vector load for permute vectors

When building the new aegis128 NEON code in big endian mode, Clang
complains about the const uint8x16_t permute vectors in the following
way:

  crypto/aegis128-neon-inner.c:58:40: warning: vector initializers are not
      compatible with NEON intrinsics in big endian mode
      [-Wnonportable-vector-initialization]
  static const uint8x16_t shift_rows = {
                                       ^
  crypto/aegis128-neon-inner.c:58:40: note: consider using vld1q_u8() to
      initialize a vector from memory, or vcombine_u8(vcreate_u8(), vcreate_u8())
      to initialize from integer constants

Since the same issue applies to the uint8x16x4_t loads of the AES Sbox,
update those references as well. However, since GCC does not implement
the vld1q_u8_x4() intrinsic, switch from IS_ENABLED() to a preprocessor
conditional to conditionally include this code.

Reported-by: Nathan Chancellor
Signed-off-by: Ard Biesheuvel
Tested-by: Nathan Chancellor
Signed-off-by: Herbert Xu

Ard Biesheuvel
2019-08-30 16:05:27 +0800

15 Aug, 2019

2 commits

198429631 crypto: arm64/aegis128 - implement plain NEON version

Provide a version of the core AES transform to the aegis128 SIMD
code that does not rely on the special AES instructions, but uses
plain NEON instructions instead. This allows the SIMD version of
the aegis128 driver to be used on arm64 systems that do not
implement those instructions (which are not mandatory in the
architecture), such as the Raspberry Pi 3.

Since GCC makes a mess of this when using the tbl/tbx intrinsics
to perform the sbox substitution, preload the Sbox into v16..v31
in this case and use inline asm to emit the tbl/tbx instructions.
Clang does not support this approach, nor does it require it, since
it does a much better job at code generation, so there we use the
intrinsics as usual.
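
To make the tbl/tbx approach more concrete, here is a scalar model of how
a 256-entry table lookup can be composed from four 64-byte lookups. This
is only a sketch: tbl64(), tbx64() and lookup256() are hypothetical names
that mimic the per-lane behaviour of the four-register tbl/tbx
instructions, and any 256-byte table can stand in for the AES Sbox.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 4-register TBL lane: out-of-range index yields zero. */
static uint8_t tbl64(const uint8_t table[64], uint8_t idx)
{
	return idx < 64 ? table[idx] : 0;
}

/* Model of a 4-register TBX lane: out-of-range index keeps 'prev'. */
static uint8_t tbx64(uint8_t prev, const uint8_t table[64], uint8_t idx)
{
	return idx < 64 ? table[idx] : prev;
}

/*
 * Look up w in a 256-byte table using four 64-byte windows. For any w,
 * exactly one of w, w - 0x40, w - 0x80, w - 0xc0 (mod 256) is below 64,
 * so exactly one step writes the result and the others leave it alone.
 */
uint8_t lookup256(const uint8_t table[256], uint8_t w)
{
	uint8_t v;

	v = tbl64(table, w);
	v = tbx64(v, table + 0x40, (uint8_t)(w - 0x40));
	v = tbx64(v, table + 0x80, (uint8_t)(w - 0x80));
	v = tbx64(v, table + 0xc0, (uint8_t)(w - 0xc0));
	return v;
}

/* Check the composed lookup against direct indexing for all inputs. */
void lookup256_selfcheck(const uint8_t table[256])
{
	for (int i = 0; i < 256; i++)
		assert(lookup256(table, (uint8_t)i) == table[i]);
}
```

With sixteen lanes handled per instruction, the whole table occupies
sixteen 128-bit registers, which is consistent with the Sbox being
preloaded into v16..v31 in the GCC case.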

Cc: Nick Desaulniers
Signed-off-by: Ard Biesheuvel
Acked-by: Nick Desaulniers
Signed-off-by: Herbert Xu

Ard Biesheuvel
2019-08-15 19:52:15 +0800

a4397635a crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics

Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.

This results in a performance of 2.2 cycles per byte on Cortex-A53,
which is a performance increase of ~11x compared to the generic
code.

Reviewed-by: Ondrej Mosnacek
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2019-08-15 19:52:15 +0800

02 Aug, 2019

1 commit

c9f1fd4f2 Revert "crypto: aegis128 - add support for SIMD acceleration"

This reverts commit ecc8bc81f2fb3976737ef312f824ba6053aa3590
("crypto: aegis128 - provide a SIMD implementation based on NEON
intrinsics") and commit 7cdc0ddbf74a19cecb2f0e9efa2cae9d3c665189
("crypto: aegis128 - add support for SIMD acceleration").

They cause compile errors on platforms other than ARM because
the mechanism to selectively compile the SIMD code is broken.

Reported-by: Heiko Carstens
Reported-by: Stephen Rothwell
Signed-off-by: Herbert Xu

Herbert Xu
2019-08-02 11:31:35 +0800

26 Jul, 2019

1 commit

ecc8bc81f crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics

Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.

This results in a performance of 2.2 cycles per byte on Cortex-A53,
which is a performance increase of ~11x compared to the generic
code.

Reviewed-by: Ondrej Mosnacek
Signed-off-by: Ard Biesheuvel
Signed-off-by: Herbert Xu

Ard Biesheuvel
2019-07-26 13:03:58 +0800