01 Mar, 2019
1 commit
-
Clang warns: vector initializers are not compatible with NEON intrinsics
in big endian mode [-Wnonportable-vector-initialization]While this is usually the case, it's not an issue for this case since
we're initializing the uint8x16_t (16x uint8_t's) with the same value.Instead, use vdupq_n_u8 which both compilers lower into a single movi
instruction: https://godbolt.org/z/vBrgztThis avoids the static storage for a constant value.
Link: https://github.com/ClangBuiltLinux/linux/issues/214
Suggested-by: Nathan Chancellor
Reviewed-by: Ard Biesheuvel
Signed-off-by: Nick Desaulniers
Signed-off-by: Catalin Marinas
10 Aug, 2017
1 commit
-
The P/Q left side optimization in the delta syndrome simply involves
repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
can accelerate this substantially by performing up to 4 such operations
at once, using the NEON instructions for polynomial multiplication.Results on a Cortex-A57 running in 64-bit mode:
Before:
-------
raid6: neonx1 xor() 1680 MB/s
raid6: neonx2 xor() 2286 MB/s
raid6: neonx4 xor() 3162 MB/s
raid6: neonx8 xor() 3389 MB/sAfter:
------
raid6: neonx1 xor() 2281 MB/s
raid6: neonx2 xor() 3362 MB/s
raid6: neonx4 xor() 3787 MB/s
raid6: neonx8 xor() 4239 MB/sWhile we're at it, simplify MASK() by using a signed shift rather than
a vector compare involving a temp register.Signed-off-by: Ard Biesheuvel
Signed-off-by: Catalin Marinas
01 Sep, 2015
1 commit
-
This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.Relative performance on a Cortex-A57 based system:
raid6: int64x1 gen() 905 MB/s
raid6: int64x1 xor() 881 MB/s
raid6: int64x2 gen() 1343 MB/s
raid6: int64x2 xor() 1286 MB/s
raid6: int64x4 gen() 1896 MB/s
raid6: int64x4 xor() 1321 MB/s
raid6: int64x8 gen() 1773 MB/s
raid6: int64x8 xor() 1165 MB/s
raid6: neonx1 gen() 1834 MB/s
raid6: neonx1 xor() 1278 MB/s
raid6: neonx2 gen() 2528 MB/s
raid6: neonx2 xor() 1942 MB/s
raid6: neonx4 gen() 2888 MB/s
raid6: neonx4 xor() 2334 MB/s
raid6: neonx8 gen() 2957 MB/s
raid6: neonx8 xor() 2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabledCc: Markus Stockhausen
Cc: Neil Brown
Signed-off-by: Ard Biesheuvel
Signed-off-by: NeilBrown
09 Jul, 2013
1 commit
-
Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.Signed-off-by: Ard Biesheuvel
Acked-by: Nicolas Pitre
Cc: hpa@linux.intel.com