20 Mar, 2018

1 commit

  • This patch uses the vpermxor instruction to optimise the raid6 Q
    syndrome. This instruction was made available with POWER8, ISA version
    2.07. It allows for both vperm and vxor instructions to be done in a
    single instruction. This has been tested for correctness on a ppc64le
    vm with a basic RAID6 setup containing 5 drives.

    The performance benchmarks are from the raid6test in the
    /lib/raid6/test directory. These results are from an IBM Firestone
    machine with ppc64le architecture. The benchmark results show a 35%
    speed increase over the best existing algorithm for powerpc (altivec).
    The raid6test has also been run on a big-endian ppc64 vm to ensure it
    also works for big-endian architectures.

    Performance benchmarks:
    raid6: altivecx4 gen() 18773 MB/s
    raid6: altivecx8 gen() 19438 MB/s

    raid6: vpermxor4 gen() 25112 MB/s
    raid6: vpermxor8 gen() 26279 MB/s

    Signed-off-by: Matt Brown
    Reviewed-by: Daniel Axtens
    [mpe: Add VPERMXOR macro so we can build with old binutils]
    Signed-off-by: Michael Ellerman

    Matt Brown
     

29 Aug, 2016

1 commit

  • Using vector registers is slightly faster:

    raid6: vx128x8 gen() 19705 MB/s
    raid6: vx128x8 xor() 11886 MB/s
    raid6: using algorithm vx128x8 gen() 19705 MB/s
    raid6: .... xor() 11886 MB/s, rmw enabled

    vs the software algorithms:

    raid6: int64x1 gen() 3018 MB/s
    raid6: int64x1 xor() 1429 MB/s
    raid6: int64x2 gen() 4661 MB/s
    raid6: int64x2 xor() 3143 MB/s
    raid6: int64x4 gen() 5392 MB/s
    raid6: int64x4 xor() 3509 MB/s
    raid6: int64x8 gen() 4441 MB/s
    raid6: int64x8 xor() 3207 MB/s
    raid6: using algorithm int64x4 gen() 5392 MB/s
    raid6: .... xor() 3509 MB/s, rmw enabled

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

09 Jul, 2013

1 commit

  • Rebased/reworked a patch contributed by Rob Herring that uses
    NEON intrinsics to perform the RAID-6 syndrome calculations.
    It uses the existing unroll.awk code to generate several
    unrolled versions of which the best performing one is selected
    at boot time.

    Signed-off-by: Ard Biesheuvel
    Acked-by: Nicolas Pitre
    Cc: hpa@linux.intel.com

    Ard Biesheuvel
     

30 Aug, 2010

1 commit