Eric Lee / smarc-fsl-linux-kernel | Embedian Git Server

01 Mar, 2019

1 commit

1ad3935b3 lib/raid6: use vdupq_n_u8 to avoid endianness warnings

Clang warns: vector initializers are not compatible with NEON intrinsics
in big endian mode [-Wnonportable-vector-initialization]

While this is usually a valid concern, it is not an issue here, since
we're initializing the uint8x16_t (16 uint8_t lanes) with the same value
in every lane, so lane order does not matter.

Instead, use vdupq_n_u8, which both compilers (GCC and Clang) lower into
a single movi instruction: https://godbolt.org/z/vBrgzt

This avoids the static storage for a constant value.
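
As a rough illustration of the change described above, the sketch below broadcasts one byte into all 16 lanes with vdupq_n_u8 instead of spelling out a 16-element initializer. The helper name splat16 and the value 0x1d in the usage note are assumptions made for this example, not taken from the commit; a memset fallback is included only so the sketch also builds on targets without NEON.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: fill a 16-byte buffer with one repeated byte. */
static void splat16(uint8_t out[16], uint8_t value)
{
#if defined(__ARM_NEON)
    /* Broadcast the scalar into every lane. Because all lanes hold the
     * same value, the result does not depend on lane (endian) order. */
    uint8x16_t v = vdupq_n_u8(value);
    vst1q_u8(out, v);
#else
    /* Portable fallback (assumption: only here to keep the sketch
     * buildable without NEON). */
    memset(out, value, 16);
#endif
}
```

Calling splat16(buf, 0x1d) leaves every byte of buf equal to 0x1d, whichever branch is compiled.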

Link: https://github.com/ClangBuiltLinux/linux/issues/214
Suggested-by: Nathan Chancellor
Reviewed-by: Ard Biesheuvel
Signed-off-by: Nick Desaulniers
Signed-off-by: Catalin Marinas

ndesaulniers@google.com
2019-03-01 01:44:51 +0800

10 Aug, 2017

1 commit

35129dde8 md/raid6: use faster multiplication for ARM NEON delta syndrome

The P/Q left side optimization in the delta syndrome simply involves
repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
can accelerate this substantially by performing up to 4 such operations
at once, using the NEON instructions for polynomial multiplication.
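
The idea can be sketched in scalar C, assuming the usual RAID-6 field GF(2^8) with reduction polynomial 0x11d. The function names are hypothetical, and the bit loop stands in for what a NEON polynomial multiply would do per lane; this is not the kernel's code. Multiplying by x^4 shifts four bits out of the byte, and those bits are folded back in by a carry-less multiply with 0x1d, since x^8 is congruent to 0x1d in this field.

```c
#include <stdint.h>

/* Multiply by x in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1. */
static uint8_t gf_mul_x(uint8_t v)
{
    uint8_t mask = (v & 0x80) ? 0xff : 0x00;
    return (uint8_t)((uint8_t)(v << 1) ^ (mask & 0x1d));
}

/* Multiply by x^4 in one step (hypothetical scalar model). */
static uint8_t gf_mul_x4(uint8_t v)
{
    uint8_t hi = v >> 4;            /* bits that overflow past x^7 */
    uint8_t lo = (uint8_t)(v << 4); /* bits that stay in the byte  */
    uint8_t red = 0;

    /* Carry-less multiply of hi (degree <= 3) by 0x1d (degree 4).
     * The product has degree <= 7, so it needs no further reduction. */
    for (int bit = 0; bit < 4; bit++)
        if (hi & (1u << bit))
            red ^= (uint8_t)(0x1d << bit);

    return (uint8_t)(lo ^ red);
}
```

For every byte value, gf_mul_x4(v) gives the same result as applying gf_mul_x four times, which is what lets one wide step replace four narrow ones.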

Results on a Cortex-A57 running in 64-bit mode:

Before:
-------
raid6: neonx1 xor() 1680 MB/s
raid6: neonx2 xor() 2286 MB/s
raid6: neonx4 xor() 3162 MB/s
raid6: neonx8 xor() 3389 MB/s

After:
------
raid6: neonx1 xor() 2281 MB/s
raid6: neonx2 xor() 3362 MB/s
raid6: neonx4 xor() 3787 MB/s
raid6: neonx8 xor() 4239 MB/s

While we're at it, simplify MASK() by using a signed shift rather than
a vector compare involving a temp register.
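
A scalar sketch of the two ways to build such a mask follows; the function names are hypothetical and this is not the kernel's MASK() itself. The shift variant relies on right-shifting a negative signed value being an arithmetic shift, which is implementation-defined in C but is how GCC and Clang behave; the NEON counterpart would presumably be a signed per-lane shift such as vshrq_n_s8.

```c
#include <stdint.h>

/* Compare-based mask: 0xff if the top bit is set, else 0x00. */
static uint8_t mask_compare(uint8_t v)
{
    return (v & 0x80) ? 0xff : 0x00;
}

/* Shift-based mask: reinterpret as signed and shift right by 7, so
 * the sign bit is copied into every bit position. Assumes an
 * arithmetic right shift for negative values (true for GCC/Clang). */
static uint8_t mask_shift(uint8_t v)
{
    return (uint8_t)((int8_t)v >> 7);
}
```

Both functions agree for all 256 inputs on such compilers, and the shift form needs no zero constant to compare against.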

Signed-off-by: Ard Biesheuvel
Signed-off-by: Catalin Marinas

Ard Biesheuvel
2017-08-10 01:51:57 +0800

01 Sep, 2015

1 commit

0e833e697 md/raid6: delta syndrome for ARM NEON

This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.
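
To illustrate what an XOR (delta) syndrome update amounts to, here is a minimal scalar sketch for a single changed data disk. It assumes the usual RAID-6 definitions, P as the XOR of all data blocks and Q as the sum of x^i * D_i in GF(2^8) with polynomial 0x11d. All names are hypothetical, and the kernel routine works on a range of disks with NEON vectors rather than byte by byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by x in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1. */
static uint8_t gf_mul_x(uint8_t v)
{
    uint8_t mask = (v & 0x80) ? 0xff : 0x00;
    return (uint8_t)((uint8_t)(v << 1) ^ (mask & 0x1d));
}

/* Multiply by x^n by repeated multiplication by x. */
static uint8_t gf_mul_xn(uint8_t v, unsigned n)
{
    while (n--)
        v = gf_mul_x(v);
    return v;
}

/* Fold a rewrite of data disk `disk` into existing P and Q blocks
 * without reading the other data disks (read-modify-write). */
static void xor_syndrome_one(uint8_t *p, uint8_t *q,
                             const uint8_t *old_data,
                             const uint8_t *new_data,
                             unsigned disk, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++) {
        uint8_t delta = old_data[i] ^ new_data[i];

        p[i] ^= delta;                  /* P changes by the delta itself */
        q[i] ^= gf_mul_xn(delta, disk); /* Q changes by x^disk * delta   */
    }
}
```

Because both P and Q are linear in the data, applying the delta this way yields the same parity as recomputing it from all disks.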

Relative performance on a Cortex-A57 based system:

raid6: int64x1 gen() 905 MB/s
raid6: int64x1 xor() 881 MB/s
raid6: int64x2 gen() 1343 MB/s
raid6: int64x2 xor() 1286 MB/s
raid6: int64x4 gen() 1896 MB/s
raid6: int64x4 xor() 1321 MB/s
raid6: int64x8 gen() 1773 MB/s
raid6: int64x8 xor() 1165 MB/s
raid6: neonx1 gen() 1834 MB/s
raid6: neonx1 xor() 1278 MB/s
raid6: neonx2 gen() 2528 MB/s
raid6: neonx2 xor() 1942 MB/s
raid6: neonx4 gen() 2888 MB/s
raid6: neonx4 xor() 2334 MB/s
raid6: neonx8 gen() 2957 MB/s
raid6: neonx8 xor() 2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabled

Cc: Markus Stockhausen
Cc: Neil Brown
Signed-off-by: Ard Biesheuvel
Signed-off-by: NeilBrown

Ard Biesheuvel
2015-09-01 01:29:05 +0800

09 Jul, 2013

1 commit

7d11965dd lib/raid6: add ARM-NEON accelerated syndrome calculation

Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.
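
For readers unfamiliar with the syndrome calculation being accelerated, the following scalar sketch computes P and Q one byte at a time using Horner's rule, assuming GF(2^8) with polynomial 0x11d. The function names and argument layout are assumptions for illustration; the NEON versions apply the same per-byte steps to 16 bytes per register, with the loop body unrolled 1, 2, 4 or 8 times.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by x in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1. */
static uint8_t gf_mul_x(uint8_t v)
{
    uint8_t mask = (v & 0x80) ? 0xff : 0x00;
    return (uint8_t)((uint8_t)(v << 1) ^ (mask & 0x1d));
}

/* Compute P = D_0 ^ D_1 ^ ... and Q = D_0 ^ x*D_1 ^ x^2*D_2 ^ ...
 * for `ndata` data blocks (ndata >= 1) of `bytes` bytes each. */
static void gen_syndrome(int ndata, size_t bytes,
                         const uint8_t *const *data,
                         uint8_t *p, uint8_t *q)
{
    for (size_t i = 0; i < bytes; i++) {
        /* Start from the highest-numbered data disk (Horner's rule). */
        uint8_t wp = data[ndata - 1][i];
        uint8_t wq = wp;

        for (int d = ndata - 2; d >= 0; d--) {
            uint8_t wd = data[d][i];

            wp ^= wd;                              /* running XOR     */
            wq = (uint8_t)(gf_mul_x(wq) ^ wd);     /* Q = Q*x + D_d   */
        }
        p[i] = wp;
        q[i] = wq;
    }
}
```

With two data blocks {0x01, 0x80} and {0x02, 0x80}, this gives P = {0x03, 0x00} and Q = {0x05, 0x9d}.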

Signed-off-by: Ard Biesheuvel
Acked-by: Nicolas Pitre
Cc: hpa@linux.intel.com

Ard Biesheuvel
2013-07-09 05:09:18 +0800