08 Oct, 2020

1 commit

  • Clang warns:

    crypto/xor.c:101:4: warning: variable 'count' is uninitialized when used
    here [-Wuninitialized]
    count++;
    ^~~~~
    crypto/xor.c:86:17: note: initialize the variable 'count' to silence
    this warning
    int i, j, count;
    ^
    = 0
    1 warning generated.

    After the refactoring to use ktime that happened in this function, count
    is only assigned, never read. Just remove the variable to get rid of the
    warning.

    Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
    Link: https://github.com/ClangBuiltLinux/linux/issues/1171
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Douglas Anderson
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Nathan Chancellor
     

02 Oct, 2020

2 commits

  • Currently, we use the jiffies counter as a time source, by staring at
    it until a HZ period elapses, and then staring at it again and performing
    as many XOR operations as we can at the same time until another HZ
    period elapses, so that we can calculate the throughput. This takes
    longer than necessary, and depends on HZ, which is undesirable, since
    HZ is system dependent.

    Let's use the ktime interface instead, and use it to time a fixed
    number of XOR operations, which can be done much faster, and makes
    the time spent depend on the performance level of the system itself,
    which is much more reasonable. To ensure that we have the resolution
    we need even on systems with 32 kHz time sources, while not spending too
    much time in the benchmark on a slow CPU, let's switch to 3 attempts of
    800 repetitions each: that way, we will only misidentify algorithms that
    perform within 10% of each other as the fastest if they are faster than
    10 GB/s to begin with, which is not expected to occur on systems with
    such coarse clocks.
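
    The timing scheme described above can be sketched in userspace C as
    follows. This is only an illustration under stated assumptions, not the
    kernel code: clock_gettime(CLOCK_MONOTONIC) stands in for ktime, and
    BENCH_SIZE, xor_words(), now_ns() and bench_xor() are hypothetical names.
    Only the 3 attempts of 800 repetitions come from the text above.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BENCH_SIZE 4096 /* bytes XORed per repetition (assumed value) */
#define REPS       800  /* repetitions per attempt, as in the text above */
#define TRIES      3    /* attempts; the fastest one is kept */

/* Hypothetical XOR routine under test: dst ^= src over 'bytes' bytes. */
static void xor_words(size_t bytes, unsigned long *dst,
                      const unsigned long *src)
{
    for (size_t i = 0; i < bytes / sizeof(unsigned long); i++)
        dst[i] ^= src[i];
}

/* Monotonic clock in nanoseconds; a userspace stand-in for ktime. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time a fixed amount of work and return the throughput in MB/s. */
static unsigned int bench_xor(void (*fn)(size_t, unsigned long *,
                                         const unsigned long *))
{
    static unsigned long a[BENCH_SIZE / sizeof(unsigned long)];
    static unsigned long b[BENCH_SIZE / sizeof(unsigned long)];
    uint64_t min_ns = UINT64_MAX;

    for (int t = 0; t < TRIES; t++) {
        uint64_t start = now_ns();

        for (int r = 0; r < REPS; r++) {
            fn(BENCH_SIZE, a, b);
            /* GCC/Clang extension: keep the loop from being optimized away */
            __asm__ volatile("" ::: "memory");
        }

        uint64_t diff = now_ns() - start;
        if (diff < min_ns)
            min_ns = diff;
    }

    if (min_ns == 0) /* clock too coarse to see any elapsed time */
        min_ns = 1;

    /* bytes per nanosecond times 1000 gives MB/s (1 MB = 10^6 bytes) */
    return (unsigned int)((uint64_t)REPS * BENCH_SIZE * 1000 / min_ns);
}
```

    Calling bench_xor(xor_words) returns one MB/s figure per routine. Since
    the amount of work is fixed, the time spent depends on how fast the
    machine is rather than on a tick period.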

    On ThunderX2, I get the following results:

    Before:

    [72625.956765] xor: measuring software checksum speed
    [72625.993104] 8regs : 10169.000 MB/sec
    [72626.033099] 32regs : 12050.000 MB/sec
    [72626.073095] arm64_neon: 11100.000 MB/sec
    [72626.073097] xor: using function: 32regs (12050.000 MB/sec)

    After:

    [72599.650216] xor: measuring software checksum speed
    [72599.651188] 8regs : 10491 MB/sec
    [72599.652006] 32regs : 12345 MB/sec
    [72599.652871] arm64_neon : 11402 MB/sec
    [72599.652873] xor: using function: 32regs (12345 MB/sec)

    Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/
    Signed-off-by: Ard Biesheuvel
    Reviewed-by: Douglas Anderson
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Currently, the XOR module performs its boot time benchmark at core
    initcall time when it is built-in, to ensure that the RAID code can
    make use of it when it is built-in as well.

    Let's defer this to a later stage during the boot, to avoid impacting
    the overall boot time of the system. Instead, just pick an arbitrary
    implementation from the list, and use that as the preliminary default.
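
    A minimal sketch of the two-stage selection described above, assuming a
    singly linked list of candidates. All names here (struct xor_impl,
    register_impl(), select_preliminary(), select_fastest()) are hypothetical
    and chosen for illustration; they are not the kernel's identifiers.

```c
#include <stddef.h>

/* Hypothetical implementation record; field names are assumptions. */
struct xor_impl {
    const char *name;
    unsigned int speed; /* MB/s, filled in by a later benchmark */
    struct xor_impl *next;
};

static struct xor_impl *impl_list;   /* all registered candidates */
static struct xor_impl *active_impl; /* the one currently in use */

/* Add a candidate to the head of the list. */
static void register_impl(struct xor_impl *impl)
{
    impl->next = impl_list;
    impl_list = impl;
}

/* Early stage: no benchmark yet, so take whatever heads the list. */
static void select_preliminary(void)
{
    active_impl = impl_list;
}

/* Later stage: once speeds have been measured, switch to the fastest. */
static void select_fastest(void)
{
    for (struct xor_impl *i = impl_list; i; i = i->next)
        if (!active_impl || i->speed > active_impl->speed)
            active_impl = i;
}
```

    Callers that run between the two stages get a working, though possibly
    not the fastest, implementation.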

    Reviewed-by: Douglas Anderson
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 or at your option any
    later version you should have received a copy of the gnu general
    public license for example usr src linux copying if not write to the
    free software foundation inc 675 mass ave cambridge ma 02139 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 20 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520170858.552543146@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 Nov, 2017

1 commit

  • Convert all allocations that used a NOTRACK flag to stop using it.

    Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     

31 Aug, 2016

1 commit


24 Aug, 2016

1 commit


11 Oct, 2012

1 commit


24 May, 2012

1 commit

  • Pull md updates from NeilBrown:
    "It's been a busy cycle for md - lots of fun stuff here.. if you like
    this kind of thing :-)

    Main features:
    - RAID10 arrays can be reshaped - adding and removing devices and
    changing chunks (not 'far' array though)
    - allow RAID5 arrays to be reshaped with a backup file (not tested
    yet, but the principle works fine for RAID10).
    - arrays can be reshaped while a bitmap is present - you no longer
    need to remove it first
    - SSSE3 support for RAID6 syndrome calculations

    and of course a number of minor fixes etc."

    * tag 'md-3.5' of git://neil.brown.name/md: (56 commits)
    md/bitmap: record the space available for the bitmap in the superblock.
    md/raid10: Remove extras after reshape to smaller number of devices.
    md/raid5: improve removal of extra devices after reshape.
    md: check the return of mddev_find()
    MD RAID1: Further conditionalize 'fullsync'
    DM RAID: Use md_error() in place of simply setting Faulty bit
    DM RAID: Record and handle missing devices
    DM RAID: Set recovery flags on resume
    md/raid5: Allow reshape while a bitmap is present.
    md/raid10: resize bitmap when required during reshape.
    md: allow array to be resized while bitmap is present.
    md/bitmap: make sure reshape request are reflected in superblock.
    md/bitmap: add bitmap_resize function to allow bitmap resizing.
    md/bitmap: use DIV_ROUND_UP instead of open-code
    md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap'
    md/bitmap: make bitmap bitops atomic.
    md/bitmap: make _page_attr bitops atomic.
    md/bitmap: merge bitmap_file_unmap and bitmap_file_put.
    md/bitmap: remove async freeing of bitmap file.
    md/bitmap: convert some spin_lock_irqsave to spin_lock_irq
    ...

    Linus Torvalds
     

22 May, 2012

2 commits


09 Apr, 2012

1 commit

  • Currently, it says

    [ 1.015541] xor: automatically using best checksumming function: generic_sse
    [ 1.040769] generic_sse: 6679.000 MB/sec
    [ 1.045377] xor: using function: generic_sse (6679.000 MB/sec)

    and repeats the function name three times unnecessarily. Change it into

    [ 1.015115] xor: automatically using best checksumming function:
    [ 1.040794] generic_sse: 6680.000 MB/sec

    and save us a line in dmesg.

    No functional change.

    Cc: Herbert Xu
    Signed-off-by: Borislav Petkov
    Signed-off-by: Herbert Xu

    Borislav Petkov
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

15 Jun, 2009

1 commit


31 Mar, 2009

1 commit


13 Jul, 2007

2 commits

  • The async_tx api provides methods for describing a chain of asynchronous
    bulk memory transfers/transforms with support for inter-transactional
    dependencies. It is implemented as a dmaengine client that smooths over
    the details of different hardware offload engine implementations. Code
    that is written to the api can optimize for asynchronous operation and the
    api will fit the chain of operations to the available offload resources.

    I imagine that any piece of ADMA hardware would register with the
    'async_*' subsystem, and a call to async_X would be routed as
    appropriate, or be run in-line. - Neil Brown

    async_tx exploits the capabilities of struct dma_async_tx_descriptor to
    provide an api of the following general format:

    struct dma_async_tx_descriptor *
    async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
                      dma_async_tx_callback cb_fn, void *cb_param)
    {
        struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
        struct dma_device *device = chan ? chan->device : NULL;
        int int_en = cb_fn ? 1 : 0;
        struct dma_async_tx_descriptor *tx = device ?
            device->device_prep_dma_<operation>(chan, len, int_en) : NULL;

        if (tx) { /* run asynchronously */
            ...
            tx->tx_set_dest(addr, tx, index);
            ...
            tx->tx_set_src(addr, tx, index);
            ...
            async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
        } else { /* run synchronously */
            ...

            ...
            async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
        }

        return tx;
    }

    async_tx_find_channel() returns a capable channel from its pool. The
    channel pool is organized as a per-cpu array of channel pointers. The
    async_tx_rebalance() routine is tasked with managing these arrays. In the
    uniprocessor case async_tx_rebalance() tries to spread responsibility
    evenly over channels of similar capabilities. For example if there are two
    copy+xor channels, one will handle copy operations and the other will
    handle xor. In the SMP case async_tx_rebalance() attempts to spread the
    operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
    channel0 while cpu1 gets copy channel 1 and xor channel 1. When a
    dependency is specified async_tx_find_channel defaults to keeping the
    operation on the same channel. A xor->copy->xor chain will stay on one
    channel if it supports both operation types, otherwise the transaction will
    transition between a copy and a xor resource.

    Currently the raid5 implementation in the MD raid456 driver has been
    converted to the async_tx api. A driver for the offload engines on the
    Intel Xscale series of I/O processors, iop-adma, is provided in a later
    commit. With the iop-adma driver and async_tx, raid456 is able to offload
    copy, xor, and xor-zero-sum operations to hardware engines.

    On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
    improvement) and sequential reads to a degraded array (40 - 55%
    improvement). For the other cases performance was roughly equal, +/- a few
    percentage points. On a x86-smp platform the performance of the async_tx
    implementation (in synchronous mode) was also +/- a few percentage points
    of the original implementation. According to 'top' on iop342 CPU
    utilization drops from ~50% to ~15% during a 'resync' while the speed
    according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.

    The tiobench command line used for testing was: tiobench --size 2048
    --block 4096 --block 131072 --dir /mnt/raid --numruns 5
    * iop342 had 1GB of memory available

    Details:
    * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
    async_tx_find_channel a static inline routine that always returns NULL
    * when a callback is specified for a given transaction an interrupt will
    fire at operation completion time and the callback will occur in a
    tasklet. If the channel does not support interrupts then a live
    polling wait will be performed
    * the api is written as a dmaengine client that requests all available
    channels
    * In support of dependencies the api implicitly schedules channel-switch
    interrupts. The interrupt triggers the cleanup tasklet which causes
    pending operations to be scheduled on the next channel
    * Xor engines treat an xor destination address differently than a software
    xor routine. To the software routine the destination address is an implied
    source, whereas engines treat it as a write-only destination. This patch
    modifies the xor_blocks routine to take an explicit destination address
    to mirror the hardware.
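
    The difference in how the destination is treated, per the last item
    above, can be sketched as follows. Both functions are illustrative
    stand-ins with hypothetical names and signatures; they are neither the
    kernel's xor_blocks() nor any engine's real behaviour in detail.

```c
#include <stddef.h>

/*
 * Software-style XOR: the destination is an implied source, so its
 * previous contents take part in the result (dest ^= s0 ^ s1 ^ ...).
 */
static void xor_software_style(unsigned int src_count, size_t bytes,
                               unsigned char *dest, unsigned char **srcs)
{
    for (unsigned int s = 0; s < src_count; s++)
        for (size_t i = 0; i < bytes; i++)
            dest[i] ^= srcs[s][i];
}

/*
 * Engine-style XOR: the destination is write-only, so its previous
 * contents are ignored (dest = s0 ^ s1 ^ ...).
 */
static void xor_engine_style(unsigned int src_count, size_t bytes,
                             unsigned char *dest, unsigned char **srcs)
{
    for (size_t i = 0; i < bytes; i++) {
        unsigned char acc = 0;

        for (unsigned int s = 0; s < src_count; s++)
            acc ^= srcs[s][i];
        dest[i] = acc;
    }
}
```

    With an explicit destination argument, a caller that wants the old
    destination contents included can pass the destination buffer as one
    of the sources.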

    Changelog:
    * fixed a leftover debug print
    * don't allow callbacks in async_interrupt_cond
    * fixed xor_block changes
    * fixed usage of ASYNC_TX_XOR_DROP_DEST
    * drop dma mapping methods, suggested by Chris Leech
    * printk warning fixups from Andrew Morton
    * don't use inline in C files, Adrian Bunk
    * select the API when MD is enabled
    * BUG_ON xor source counts
    Signed-off-by: Dan Williams
    Acked-By: NeilBrown

    Dan Williams
     
  • The async_tx api tries to use a dma engine for an operation, but will fall
    back to an optimized software routine otherwise. Xor support is
    implemented using the raid5 xor routines. For organizational purposes this
    routine is moved to a common area.

    The following fixes are also made:
    * rename xor_block => xor_blocks, suggested by Adrian Bunk
    * ensure that xor.o initializes before md.o in the built-in case
    * checkpatch.pl fixes
    * mark calibrate_xor_blocks __init, Adrian Bunk

    Cc: Adrian Bunk
    Cc: NeilBrown
    Cc: Herbert Xu
    Signed-off-by: Dan Williams

    Dan Williams