07 Jan, 2019

1 commit

  • Pull more Kbuild updates from Masahiro Yamada:

    - improve boolinit.cocci and use_after_iter.cocci semantic patches

    - fix alignment for kallsyms

    - move 'asm goto' compiler test to Kconfig and clean up jump_label
    CONFIG option

    - generate asm-generic wrappers automatically if arch does not
    implement mandatory UAPI headers

    - remove redundant generic-y defines

    - misc cleanups

    * tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kconfig: rename generated .*conf-cfg to *conf-cfg
    kbuild: remove unnecessary stubs for archheader and archscripts
    kbuild: use assignment instead of define ... endef for filechk_* rules
    arch: remove redundant UAPI generic-y defines
    kbuild: generate asm-generic wrappers if mandatory headers are missing
    arch: remove stale comments "UAPI Header export list"
    riscv: remove redundant kernel-space generic-y
    kbuild: change filechk to surround the given command with { }
    kbuild: remove redundant target cleaning on failure
    kbuild: clean up rule_dtc_dt_yaml
    kbuild: remove UIMAGE_IN and UIMAGE_OUT
    jump_label: move 'asm goto' support test to Kconfig
    kallsyms: lower alignment on ARM
    scripts: coccinelle: boolinit: drop warnings on named constants
    scripts: coccinelle: check for redeclaration
    kconfig: remove unused "file" field of yylval union
    nds32: remove redundant kernel-space generic-y
    nios2: remove unneeded HAS_DMA define

    Linus Torvalds
     

06 Jan, 2019

1 commit


03 Jan, 2019

1 commit

  • Pull the pending 4.21 changes for md from Shaohua.

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md: fix raid10 hang issue caused by barrier
    raid10: refactor common wait code from regular read/write request
    md: remvoe redundant condition check
    lib/raid6: add option to skip algo benchmarking
    lib/raid6: sort algos in rough performance order
    lib/raid6: check for assembler SSSE3 support
    lib/raid6: avoid __attribute_const__ redefinition
    lib/raid6: add missing include for raid6test
    md: remove set but not used variable 'bi_rdev'

    Jens Axboe
     

21 Dec, 2018

3 commits

  • This is helpful for systems where fast startup time is important.
    It is especially nice to avoid benchmarking RAID functions that are
    never used (for example, BTRFS selects RAID6_PQ even if the parity RAID
    mode is not in use).

    This saves 250+ milliseconds of boot time on modern x86 and ARM systems
    with a dozen or more available implementations.

    The new option is defaulted to 'y' to match the previous behavior of
    always benchmarking on init.

    Signed-off-by: Daniel Verkamp
    Signed-off-by: Shaohua Li

    Daniel Verkamp
     
  • Sort the list of RAID6 algorithms in roughly decreasing order of
    expected performance: newer instruction sets first (within each
    architecture) and wider unrollings first.

    This doesn't make any difference right now, since all functions are
    benchmarked; a follow-up change will make use of this by optionally
    choosing the first valid function rather than testing all of them.

    The Itanium raid6_intx{16,32} entries are also moved down to be near the
    other raid6_intx entries for clarity.

    Signed-off-by: Daniel Verkamp
    Signed-off-by: Shaohua Li

    Daniel Verkamp
     
  • Allow the x86 SSSE3 recovery function to be tested in raid6test.

    Signed-off-by: Daniel Verkamp
    Signed-off-by: Shaohua Li

    Daniel Verkamp
     

20 Dec, 2018

1 commit

  • We cannot build these files with clang as it does not allow altivec
    instructions in assembly when -msoft-float is passed.

    Jinsong Ji wrote:
    > We currently disable Altivec/VSX support when enabling soft-float. So
    > any usage of vector builtins will break.
    >
    > Enable Altivec/VSX with soft-float may need quite some clean up work, so
    > I guess this is currently a limitation.
    >
    > Removing -msoft-float will make it work (and we are lucky that no
    > floating point instructions will be generated as well).

    This is a workaround until the issue is resolved in clang.

    Link: https://bugs.llvm.org/show_bug.cgi?id=31177
    Link: https://github.com/ClangBuiltLinux/linux/issues/239
    Signed-off-by: Joel Stanley
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Michael Ellerman

    Joel Stanley
     

07 Nov, 2018

1 commit

  • The lib/raid6/test fails to build the neon objects
    on arm64 because the correct machine type is 'aarch64'.

    Once this is correctly enabled, the neon recovery objects
    need to be added to the build.

    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Jeremy Linton
    Signed-off-by: Catalin Marinas

    Jeremy Linton
     

04 Jul, 2018

1 commit

  • In the quest to remove all stack VLA usage from the kernel[1], this moves
    the "$#" replacement from being an argument to being inside the function,
    which avoids generating VLAs.

    [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

    Signed-off-by: Kees Cook
    Signed-off-by: Martin Schwidefsky

    Kees Cook
     

08 Apr, 2018

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "Notable changes:

    - Support for 4PB user address space on 64-bit, opt-in via mmap().

    - Removal of POWER4 support, which was accidentally broken in 2016
    and no one noticed, and blocked use of some modern instructions.

    - Workarounds so that the hypervisor can enable Transactional Memory
    on Power9.

    - A series to disable the DAWR (Data Address Watchpoint Register) on
    Power9.

    - More information displayed in the meltdown/spectre_v1/v2 sysfs
    files.

    - A vpermxor (Power8 Altivec) implementation for the raid6 Q
    Syndrome.

    - A big series to make the allocation of our pacas (per cpu area),
    kernel page tables, and per-cpu stacks NUMA aware when using the
    Radix MMU on Power9.

    And as usual many fixes, reworks and cleanups.

    Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
    Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
    Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
    Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
    Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
    Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
    Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
    Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
    Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
    Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
    Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
    Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
    Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

    * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
    powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
    powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
    powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
    powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
    Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
    powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
    powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
    cxl: Fix possible deadlock when processing page faults from cxllib
    powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
    powerpc/mm/radix: Update command line parsing for disable_radix
    powerpc/mm/radix: Parse disable_radix commandline correctly.
    powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
    powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
    powerpc/mm/keys: Update documentation and remove unnecessary check
    powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
    powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
    powerpc/powernv: Always stop secondaries before reboot/shutdown
    powerpc: hard disable irqs in smp_send_stop loop
    powerpc: use NMI IPI for smp_send_stop
    powerpc/powernv: Fix SMT4 forcing idle code
    ...

    Linus Torvalds
     

06 Apr, 2018

1 commit

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kfifo: fix inaccurate comment
    tools/thermal: tmon: fix for segfault
    net: Spelling s/stucture/structure/
    edd: don't spam log if no EDD information is present
    Documentation: Fix early-microcode.txt references after file rename
    tracing: Block comments should align the * on each line
    treewide: Fix typos in printk
    GenWQE: Fix a typo in two comments
    treewide: Align function definition open/close braces

    Linus Torvalds
     

26 Mar, 2018

2 commits

  • The Tile architecture is getting removed, so we no longer need this either.

    Acked-by: Ard Biesheuvel
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • Some functions definitions have either the initial open brace and/or
    the closing brace outside of column 1.

    Move those braces to column 1.

    This allows various function analyzers like gnu complexity to work
    properly for these modified functions.

    Signed-off-by: Joe Perches
    Acked-by: Andy Shevchenko
    Acked-by: Paul Moore
    Acked-by: Alex Deucher
    Acked-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Acked-by: Alexandre Belloni
    Acked-by: Martin K. Petersen
    Acked-by: Takashi Iwai
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Rafael J. Wysocki
    Acked-by: Nicolin Chen
    Acked-by: Martin K. Petersen
    Acked-by: Steven Rostedt (VMware)
    Signed-off-by: Jiri Kosina

    Joe Perches
     

20 Mar, 2018

2 commits

  • Previously the raid6 test Makefile did not build the POWER specific files
    (altivec and vpermxor).
    This patch fixes the bug, so that all appropriate files for powerpc are built.

    This patch also fixes the missing and mismatched ifdef statements to allow the
    altivec.uc file to be built correctly.

    Signed-off-by: Matt Brown
    Signed-off-by: Michael Ellerman

    Matt Brown
     
  • This patch uses the vpermxor instruction to optimise the raid6 Q
    syndrome. This instruction was made available with POWER8, ISA version
    2.07. It allows for both vperm and vxor instructions to be done in a
    single instruction. This has been tested for correctness on a ppc64le
    vm with a basic RAID6 setup containing 5 drives.

    The performance benchmarks are from the raid6test in the
    /lib/raid6/test directory. These results are from an IBM Firestone
    machine with ppc64le architecture. The benchmark results show a 35%
    speed increase over the best existing algorithm for powerpc (altivec).
    The raid6test has also been run on a big-endian ppc64 vm to ensure it
    also works for big-endian architectures.

    Performance benchmarks:
    raid6: altivecx4 gen() 18773 MB/s
    raid6: altivecx8 gen() 19438 MB/s

    raid6: vpermxor4 gen() 25112 MB/s
    raid6: vpermxor8 gen() 26279 MB/s

    Signed-off-by: Matt Brown
    Reviewed-by: Daniel Axtens
    [mpe: Add VPERMXOR macro so we can build with old binutils]
    Signed-off-by: Michael Ellerman

    Matt Brown
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Sep, 2017

1 commit

  • Pull MD updates from Shaohua Li:
    "This update mainly fixes bugs:

    - Make raid5 ppl support several ppl from Pawel

    - Several raid5-cache bug fixes from Song

    - Bitmap fixes from Neil and Me

    - One raid1/10 regression fix since 4.12 from Me

    - Other small fixes and cleanup"

    * tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md/bitmap: disable bitmap_resize for file-backed bitmaps.
    raid5-ppl: Recovery support for multiple partial parity logs
    md: Runtime support for multiple ppls
    md/raid0: attach correct cgroup info in bio
    lib/raid6: align AVX512 constants to 512 bits, not bytes
    raid5: remove raid5_build_block
    md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
    md: replace seq_release_private with seq_release
    md: notify about new spare disk in the container
    md/raid1/10: reset bio allocated from mempool
    md/raid5: release/flush io in raid5_do_work()
    md/bitmap: copy correct data for bitmap super

    Linus Torvalds
     

26 Aug, 2017

1 commit


10 Aug, 2017

2 commits

  • Provide a NEON accelerated implementation of the recovery algorithm,
    which supersedes the default byte-by-byte one.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    Ard Biesheuvel
     
  • The P/Q left side optimization in the delta syndrome simply involves
    repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
    that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
    can accelerate this substantially by performing up to 4 such operations
    at once, using the NEON instructions for polynomial multiplication.

    Results on a Cortex-A57 running in 64-bit mode:

    Before:
    -------
    raid6: neonx1 xor() 1680 MB/s
    raid6: neonx2 xor() 2286 MB/s
    raid6: neonx4 xor() 3162 MB/s
    raid6: neonx8 xor() 3389 MB/s

    After:
    ------
    raid6: neonx1 xor() 2281 MB/s
    raid6: neonx2 xor() 3362 MB/s
    raid6: neonx4 xor() 3787 MB/s
    raid6: neonx8 xor() 4239 MB/s

    While we're at it, simplify MASK() by using a signed shift rather than
    a vector compare involving a temp register.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    Ard Biesheuvel
     

16 May, 2017

1 commit

  • The raid6_gfexp table represents {2}^n values for 0 < 256. The
    Linux async_tx framework pass values from raid6_gfexp as coefficients
    for each source to prep_dma_pq() callback of DMA channel with PQ
    capability. This creates problem for RAID6 offload engines (such as
    Broadcom SBA) which take disk position (i.e. log of {2}) instead of
    multiplicative cofficients from raid6_gfexp table.

    This patch adds raid6_gflog table having log-of-2 value for any given
    x such that 0 < 256. For any given disk coefficient x, the
    corresponding disk position is given by raid6_gflog[x]. The RAID6
    offload engine driver can use this newly added raid6_gflog table to
    get disk position from multiplicative coefficient.

    Signed-off-by: Anup Patel
    Reviewed-by: Scott Branden
    Reviewed-by: Ray Jui
    Acked-by: Shaohua Li
    Signed-off-by: Vinod Koul

    Anup Patel
     

08 Nov, 2016

1 commit


08 Oct, 2016

1 commit

  • Pull MD updates from Shaohua Li:
    "This update includes:

    - new AVX512 instruction based raid6 gen/recovery algorithm

    - a couple of md-cluster related bug fixes

    - fix a potential deadlock

    - set nonrotational bit for raid array with SSD

    - set correct max_hw_sectors for raid5/6, which hopefuly can improve
    performance a little bit

    - other minor fixes"

    * tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md: set rotational bit
    raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
    raid5: handle register_shrinker failure
    raid5: fix to detect failure of register_shrinker
    md: fix a potential deadlock
    md/bitmap: fix wrong cleanup
    raid5: allow arbitrary max_hw_sectors
    lib/raid6: Add AVX512 optimized xor_syndrome functions
    lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
    lib/raid6: Add AVX512 optimized recovery functions
    lib/raid6: Add AVX512 optimized gen_syndrome functions
    md-cluster: make resync lock also could be interruptted
    md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
    md-cluster: convert the completion to wait queue
    md-cluster: protect md_find_rdev_nr_rcu with rcu lock
    md-cluster: clean related infos of cluster
    md: changes for MD_STILL_CLOSED flag
    md-cluster: remove some unnecessary dlm_unlock_sync
    md-cluster: use FORCEUNLOCK in lockres_free
    md-cluster: call md_kick_rdev_from_array once ack failed

    Linus Torvalds
     

27 Sep, 2016

1 commit

  • Specifying the aligned attributes to the char data[NDISKS][PAGE_SIZE],
    char recovi[PAGE_SIZE] and char recovi[PAGE_SIZE] arrays, so that all
    malloc memory is page boundary aligned.

    Without these alignment attributes, the test causes a segfault in
    userspace when the NDISKS are changed to 4 from 16.

    The RAID stripes will be page aligned anyway, so we want to test what
    the kernel actually will execute.

    Cc: H. Peter Anvin
    Cc: Yu-cheng Yu
    Signed-off-by: Gayatri Kammela
    Reviewed-by: H. Peter Anvin
    Signed-off-by: Shaohua Li

    Gayatri Kammela
     

22 Sep, 2016

4 commits

  • Optimize RAID6 xor_syndrome functions to take advantage of the 512-bit
    ZMM integer instructions introduced in AVX512.

    AVX512 optimized xor_syndrome functions, which is simply based on sse2.c
    written by hpa.

    The patch was tested and benchmarked before submission on
    a hardware that has AVX512 flags to support such instructions

    Cc: H. Peter Anvin
    Cc: Jim Kukunas
    Cc: Fenghua Yu
    Cc: Megha Dey
    Signed-off-by: Gayatri Kammela
    Reviewed-by: Fenghua Yu
    Signed-off-by: Shaohua Li

    Gayatri Kammela
     
  • Adding avx512 gen_syndrome and recovery functions so as to allow code to
    be compiled and tested successfully in userspace.

    This patch is tested in userspace and improvement in performace is
    observed.

    Cc: H. Peter Anvin
    Cc: Jim Kukunas
    Cc: Fenghua Yu
    Signed-off-by: Megha Dey
    Signed-off-by: Gayatri Kammela
    Reviewed-by: Fenghua Yu
    Signed-off-by: Shaohua Li

    Gayatri Kammela
     
  • Optimize RAID6 recovery functions to take advantage of
    the 512-bit ZMM integer instructions introduced in AVX512.

    AVX512 optimized recovery functions, which is simply based
    on recov_avx2.c written by Jim Kukunas

    This patch was tested and benchmarked before submission on
    a hardware that has AVX512 flags to support such instructions

    Cc: Jim Kukunas
    Cc: H. Peter Anvin
    Cc: Fenghua Yu
    Signed-off-by: Megha Dey
    Signed-off-by: Gayatri Kammela
    Reviewed-by: Fenghua Yu
    Signed-off-by: Shaohua Li

    Gayatri Kammela
     
  • Optimize RAID6 gen_syndrom functions to take advantage of
    the 512-bit ZMM integer instructions introduced in AVX512.

    AVX512 optimized gen_syndrom functions, which is simply based
    on avx2.c written by Yuanhan Liu and sse2.c written by hpa.

    The patch was tested and benchmarked before submission on
    a hardware that has AVX512 flags to support such instructions

    Cc: H. Peter Anvin
    Cc: Jim Kukunas
    Cc: Fenghua Yu
    Signed-off-by: Megha Dey
    Signed-off-by: Gayatri Kammela
    Reviewed-by: Fenghua Yu
    Signed-off-by: Shaohua Li

    Gayatri Kammela
     

01 Sep, 2016

1 commit


29 Aug, 2016

1 commit

  • Using vector registers is slightly faster:

    raid6: vx128x8 gen() 19705 MB/s
    raid6: vx128x8 xor() 11886 MB/s
    raid6: using algorithm vx128x8 gen() 19705 MB/s
    raid6: .... xor() 11886 MB/s, rmw enabled

    vs the software algorithms:

    raid6: int64x1 gen() 3018 MB/s
    raid6: int64x1 xor() 1429 MB/s
    raid6: int64x2 gen() 4661 MB/s
    raid6: int64x2 xor() 3143 MB/s
    raid6: int64x4 gen() 5392 MB/s
    raid6: int64x4 xor() 3509 MB/s
    raid6: int64x8 gen() 4441 MB/s
    raid6: int64x8 xor() 3207 MB/s
    raid6: using algorithm int64x4 gen() 5392 MB/s
    raid6: .... xor() 3509 MB/s, rmw enabled

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

01 Dec, 2015

1 commit

  • The enable_kernel_*() functions leave the relevant MSR bits enabled
    until we exit the kernel sometime later. Create disable versions
    that wrap the kernel use of FP, Altivec VSX or SPE.

    While we don't want to disable it normally for performance reasons
    (MSR writes are slow), it will be used for a debug boot option that
    does this and catches bad uses in other areas of the kernel.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     

01 Sep, 2015

1 commit

  • This implements XOR syndrome calculation using NEON intrinsics.
    As before, the module can be built for ARM and arm64 from the
    same source.

    Relative performance on a Cortex-A57 based system:

    raid6: int64x1 gen() 905 MB/s
    raid6: int64x1 xor() 881 MB/s
    raid6: int64x2 gen() 1343 MB/s
    raid6: int64x2 xor() 1286 MB/s
    raid6: int64x4 gen() 1896 MB/s
    raid6: int64x4 xor() 1321 MB/s
    raid6: int64x8 gen() 1773 MB/s
    raid6: int64x8 xor() 1165 MB/s
    raid6: neonx1 gen() 1834 MB/s
    raid6: neonx1 xor() 1278 MB/s
    raid6: neonx2 gen() 2528 MB/s
    raid6: neonx2 xor() 1942 MB/s
    raid6: neonx4 gen() 2888 MB/s
    raid6: neonx4 xor() 2334 MB/s
    raid6: neonx8 gen() 2957 MB/s
    raid6: neonx8 xor() 2232 MB/s
    raid6: using algorithm neonx8 gen() 2957 MB/s
    raid6: .... xor() 2232 MB/s, rmw enabled

    Cc: Markus Stockhausen
    Cc: Neil Brown
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: NeilBrown

    Ard Biesheuvel
     

24 Jun, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - disable the 32-bit vdso when building LE, so we can build with a
    64-bit only toolchain.

    - EEH fixes from Gavin & Richard.

    - enable the sys_kcmp syscall from Laurent.

    - sysfs control for fastsleep workaround from Shreyas.

    - expose OPAL events as an irq chip by Alistair.

    - MSI ops moved to pci_controller_ops by Daniel.

    - fix for kernel to userspace backtraces for perf from Anton.

    - merge pseries and pseries_le defconfigs from Cyril.

    - CXL in-kernel API from Mikey.

    - OPAL prd driver from Jeremy.

    - fix for DSCR handling & tests from Anshuman.

    - Powernv flash mtd driver from Cyril.

    - dynamic DMA Window support on powernv from Alexey.

    - LLVM clang fixes & workarounds from Anton.

    - reworked version of the patch to abort syscalls when transactional.

    - fix the swap encoding to support 4TB, from Aneesh.

    - various fixes as usual.

    - Freescale updates from Scott: Highlights include more 8xx
    optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
    t1024/t1023 support, and various fixes and cleanup.

    * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
    cxl: Fix typo in debug print
    cxl: Add CXL_KERNEL_API config option
    powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
    powerpc/mm: Change the swap encoding in pte.
    powerpc/mm: PTE_RPN_MAX is not used, remove the same
    powerpc/tm: Abort syscalls in active transactions
    powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
    powerpc/include: Add opal-prd to installed uapi headers
    powerpc/powernv: fix construction of opal PRD messages
    powerpc/powernv: Increase opal-irqchip initcall priority
    powerpc: Make doorbell check preemption safe
    powerpc/powernv: pnv_init_idle_states() should only run on powernv
    macintosh/nvram: Remove as unused
    powerpc: Don't use gcc specific options on clang
    powerpc: Don't use -mno-strict-align on clang
    powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
    powerpc: Only use -mabi=altivec if toolchain supports it
    powerpc: Fix duplicate const clang warning in user access code
    vfio: powerpc/spapr: Support Dynamic DMA windows
    vfio: powerpc/spapr: Register memory and define IOMMU v2
    ...

    Linus Torvalds
     

11 Jun, 2015

1 commit


19 May, 2015

1 commit

  • We already have fpu/types.h, move i387.h to fpu/api.h.

    The file name has become a misnomer anyway: it offers generic FPU APIs,
    but is not limited to i387 functionality.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 Apr, 2015

4 commits

  • The second and (last) optimized XOR syndrome calculation. This version
    supports right and left side optimization. All CPUs with architecture
    older than Haswell will benefit from it.

    It should be noted that SSE2 movntdq kills performance for memory areas
    that are read and written simultaneously in chunks smaller than cache
    line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR
    functions.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • Start the algorithms with the very basic one. It is left and right
    optimized. That means we can avoid all calculations for unneeded pages
    above the right stop offset. For pages below the left start offset we
    still need the syndrome multiplication but without reading data pages.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • It is always helpful to have a test tool in place if we implement
    new data critical algorithms. So add some test routines to the raid6
    checker that can prove if the new xor_syndrome() works as expected.

    Run through all permutations of start/stop pages per algorithm and
    simulate a xor_syndrome() assisted rmw run. After each rmw check if
    the recovery algorithm still confirms that the stripe is fine.

    Signed-off-by: Markus Stockhausen
    Signed-off-by: NeilBrown

    Markus Stockhausen
     
  • v3: s-o-b comment, explanation of performance and descision for
    the start/stop implementation

    Implementing rmw functionality for RAID6 requires optimized syndrome
    calculation. Up to now we can only generate a complete syndrome. The
    target P/Q pages are always overwritten. With this patch we provide
    a framework for inplace P/Q modification. In the first place simply
    fill those functions with NULL values.

    xor_syndrome() has two additional parameters: start & stop. These
    will indicate the first and last page that are changing during a
    rmw run. That makes it possible to avoid several unneccessary loops
    and speed up calculation. The caller needs to implement the following
    logic to make the functions work.

    1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
    blocks inside P/Q between (and including) start and end.

    2) modify any block with start
    Signed-off-by: NeilBrown

    Markus Stockhausen
     

04 Feb, 2015

1 commit