27 Jul, 2012

1 commit

  • Pull crypto updates from Herbert Xu:

    - Fixed algorithm construction hang when self-test fails.
    - Added SHA variants to talitos AEAD list.
    - New driver for Exynos random number generator.
    - Performance enhancements for arc4.
    - Added hwrng support to caam.
    - Added ahash support to caam.
    - Fixed bad kfree in aesni-intel.
    - Allow aesni-intel in FIPS mode.
    - Added atmel driver with support for AES/3DES/SHA.
    - Bug fixes for mv_cesa.
    - CRC hardware driver for BF60x family processors.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits)
    crypto: twofish-avx - remove useless instruction
    crypto: testmgr - add aead cbc aes hmac sha1,256,512 test vectors
    crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms
    crypto: talitos - export the talitos_submit function
    crypto: talitos - move talitos structures to header file
    crypto: atmel - add new tests to tcrypt
    crypto: atmel - add Atmel SHA1/SHA256 driver
    crypto: atmel - add Atmel DES/TDES driver
    crypto: atmel - add Atmel AES driver
    ARM: AT91SAM9G45: add crypto peripherals
    crypto: testmgr - allow aesni-intel and ghash_clmulni-intel in fips mode
    hwrng: exynos - Add support for Exynos random number generator
    crypto: aesni-intel - fix wrong kfree pointer
    crypto: caam - ERA retrieval and printing for SEC device
    crypto: caam - Using alloc_coherent for caam job rings
    crypto: algapi - Fix hang on crypto allocation
    crypto: arc4 - now arc needs blockcipher support
    crypto: caam - one tasklet per job ring
    crypto: caam - consolidate memory barriers from job ring en/dequeue
    crypto: caam - only query h/w in job ring dequeue path
    ...

    Linus Torvalds
     

11 Jul, 2012

3 commits

  • Test vectors were generated starting from existing CBC(AES) test vectors
    (RFC3602, NIST SP800-38A) and adding HMAC(SHA*) computed with Crypto++ and
    double-checked with HashCalc.

    Signed-off-by: Horia Geanta
    Signed-off-by: Herbert Xu

    Horia Geanta
     
  • - set sg buffer size equal to message size
    - add cfb & ofb tests for AES, DES & TDES

    Signed-off-by: Nicolas Royer
    Acked-by: Nicolas Ferre
    Acked-by: Eric Bénard
    Tested-by: Eric Bénard
    Signed-off-by: Herbert Xu

    Nicolas Royer
     
  • Patch 863b557a88f8c033f7419fabafef4712a5055f85 added NULL entries
    for intel accelerated drivers but did not mark these as fips allowed.
    This causes a panic if running tests with fips=1.

    For ghash, fips_allowed flag was added in patch
    18c0ebd2d8194cce4b3f67e2903fa01bea892cbc.

    Without patch, "modprobe tcrypt" fails with
    alg: skcipher: Failed to load transform for cbc-aes-aesni: -2
    cbc-aes-aesni: cbc(aes) alg self test failed in fips mode!
    (panic)

    Also add the missing cryptd(__driver-cbc-aes-aesni) and
    cryptd(__driver-gcm-aes-aesni) tests to complement the
    null tests above; otherwise the system complains with
    alg: No test for __cbc-aes-aesni (cryptd(__driver-cbc-aes-aesni))
    alg: No test for __gcm-aes-aesni (cryptd(__driver-gcm-aes-aesni))

    Signed-off-by: Milan Broz
    Signed-off-by: Paul Wouters
    Signed-off-by: Herbert Xu

    Milan Broz
     

30 Jun, 2012

1 commit

  • This patch adds the following structure:

    struct netlink_kernel_cfg {
            unsigned int    groups;
            void            (*input)(struct sk_buff *skb);
            struct mutex    *cb_mutex;
    };

    That can be passed to netlink_kernel_create to set optional configurations
    for netlink kernel sockets.

    I've populated this structure by looking for NULL and zero parameters in the
    existing code. The remaining parameters that always need to be set are still
    left in the original interface.

    That includes optional parameters for the netlink socket creation. This allows
    easy extensibility of this interface in the future.

    This patch also adapts all callers to use this new interface.
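
    As a rough user-space sketch of the pattern described above (not the
    kernel interface itself): required arguments stay positional, optional
    ones travel in the configuration structure. The names demo_sock,
    demo_netlink_create and DEMO_MIN_GROUPS are hypothetical, and both the
    minimum group count and the handling of a NULL cfg are assumptions made
    for illustration.

```c
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles outside the kernel. */
struct sk_buff;
struct mutex;

/* Same three optional fields as the structure quoted in the commit message. */
struct netlink_kernel_cfg {
    unsigned int groups;
    void (*input)(struct sk_buff *skb);
    struct mutex *cb_mutex;
};

/* Hypothetical result type, used here only to make the effect visible. */
struct demo_sock {
    int unit;
    unsigned int groups;
    void (*input)(struct sk_buff *skb);
    struct mutex *cb_mutex;
};

#define DEMO_MIN_GROUPS 32u /* assumed default, for illustration only */

/*
 * Hypothetical stand-in for a create call: 'unit' is always required,
 * everything optional comes from 'cfg'. A NULL cfg is assumed to mean
 * "use defaults for all optional settings".
 */
struct demo_sock demo_netlink_create(int unit,
                                     const struct netlink_kernel_cfg *cfg)
{
    struct demo_sock sk = { unit, DEMO_MIN_GROUPS, NULL, NULL };

    if (cfg) {
        if (cfg->groups > sk.groups)
            sk.groups = cfg->groups;
        sk.input = cfg->input;
        sk.cb_mutex = cfg->cb_mutex;
    }
    return sk;
}
```

    A caller that only needs more multicast groups would set .groups and
    leave the other fields zeroed; adding a new optional field later does
    not change the create call's argument list.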

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

27 Jun, 2012

10 commits


22 Jun, 2012

1 commit

  • It has been observed that sometimes the crypto allocation code
    will get stuck for 60 seconds or multiples thereof. This is
    usually caused by an algorithm failing to pass the self-test.

    If an algorithm fails to be constructed, we will immediately notify
    all larval waiters. However, if it succeeds in construction, but
    then fails the self-test, we won't notify anyone at all.

    This patch fixes this by merging the notification in the case
    where the algorithm fails to be constructed with that of the
    case where it passes the self-test. This way, regardless of
    what happens, we'll give the larval waiters an answer.
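
    The control-flow change can be sketched with a toy model. This is a
    simplified illustration, not the code from the patch: demo_larval,
    demo_register_old and demo_register_new are hypothetical names, and
    "notified" stands in for waking the larval waiters.

```c
/* Hypothetical stand-in for a larval that other tasks are waiting on. */
struct demo_larval {
    int notified; /* 1 once the waiters have been given an answer */
    int usable;   /* the answer: 1 if the algorithm may be used */
};

static void demo_notify(struct demo_larval *l, int usable)
{
    l->notified = 1;
    l->usable = usable;
}

/* Old flow: a failed self-test falls through without any notification. */
void demo_register_old(struct demo_larval *l, int constructed, int selftest_ok)
{
    if (!constructed) {
        demo_notify(l, 0);
        return;
    }
    if (selftest_ok)
        demo_notify(l, 1);
    /* Self-test failed: nobody is told, so waiters sit until they time out. */
}

/* New flow: one notification point that covers every outcome. */
void demo_register_new(struct demo_larval *l, int constructed, int selftest_ok)
{
    demo_notify(l, constructed && selftest_ok);
}
```

    In the old flow the combination "constructed, self-test failed" leaves
    notified at 0, which corresponds to the 60-second stalls described
    above.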

    Signed-off-by: Herbert Xu

    Herbert Xu
     

14 Jun, 2012

3 commits

  • This patch changes u8 in struct arc4_ctx and variables to u32 (as AMD seems
    to have problems with u8 arrays). Below are tcrypt results of the old 1-byte
    block cipher versus ecb(arc4) with u8 and ecb(arc4) with u32.

    tcrypt results, x86-64 (speed ratios: new-u32/old, new-u8/old):

                      u32    u8
    AMD Phenom II   : x3.6   x2.7
    Intel Core 2    : x2.0   x1.9

    tcrypt results, i386 (speed ratios: new-u32/old, new-u8/old):

                      u32    u8
    Intel Atom N260 : x1.5   x1.4

    Cc: Jon Oberheide
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Currently arc4.c provides a simple one-byte blocksize cipher which is wrapped
    by the ecb() module, giving function call overhead on every encrypted byte.
    This patch adds ecb(arc4) directly into arc4.c for higher performance.
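
    To illustrate why a whole-buffer routine avoids the per-byte call
    overhead, here is a minimal user-space sketch of RC4 that processes a
    complete buffer in one call and keeps its state in 32-bit words, as the
    neighbouring u8-to-u32 commit does. It is not the kernel's arc4.c; the
    names demo_arc4_ctx, demo_arc4_set_key and demo_arc4_crypt are
    hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Cipher state; 32-bit entries instead of bytes. */
struct demo_arc4_ctx {
    uint32_t S[256];
    uint32_t x, y;
};

/* Standard RC4 key schedule. key_len must be at least 1. */
void demo_arc4_set_key(struct demo_arc4_ctx *ctx,
                       const uint8_t *key, size_t key_len)
{
    uint32_t i, j = 0;

    for (i = 0; i < 256; i++)
        ctx->S[i] = i;

    for (i = 0; i < 256; i++) {
        uint32_t a = ctx->S[i];

        j = (j + a + key[i % key_len]) & 0xff;
        ctx->S[i] = ctx->S[j];
        ctx->S[j] = a;
    }
    ctx->x = 0;
    ctx->y = 0;
}

/*
 * Encrypts or decrypts 'len' bytes in a single call. The indices live in
 * local variables for the whole loop and are written back once at the end.
 */
void demo_arc4_crypt(struct demo_arc4_ctx *ctx,
                     uint8_t *out, const uint8_t *in, size_t len)
{
    uint32_t x = ctx->x, y = ctx->y;

    while (len--) {
        uint32_t a, b;

        x = (x + 1) & 0xff;
        a = ctx->S[x];
        y = (y + a) & 0xff;
        b = ctx->S[y];
        ctx->S[x] = b;
        ctx->S[y] = a;
        *out++ = *in++ ^ (uint8_t)ctx->S[(a + b) & 0xff];
    }
    ctx->x = x;
    ctx->y = y;
}
```

    Because RC4 is a stream cipher, calling demo_arc4_crypt() again with a
    freshly keyed context on the ciphertext recovers the plaintext.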

    tcrypt results (speed ratios: new/old):

    AMD Phenom II, x86-64 : x2.7
    Intel Core 2, x86-64 : x1.9
    Intel Atom N260, i386 : x1.4

    Cc: Jon Oberheide
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

12 Jun, 2012

4 commits

  • This patch adds an x86_64/avx assembler implementation of the Serpent block
    cipher. The implementation is very similar to the sse2 implementation and
    processes eight blocks in parallel. Because of the new non-destructive
    three-operand syntax, all move instructions can be removed, which gives a
    small performance increase.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    Intel Core i5-2500 CPU (fam:6, model:42, step:7)

    serpent-avx-x86_64 vs. serpent-sse2-x86_64
    128bit key: (lrw:256bit) (xts:256bit)
    size   ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B    1.03x    1.01x    1.01x    1.01x    1.00x    1.00x    1.00x    1.00x    1.00x    1.01x
    64B    1.00x    1.00x    1.00x    1.00x    1.00x    0.99x    1.00x    1.01x    1.00x    1.00x
    256B   1.05x    1.03x    1.00x    1.02x    1.05x    1.06x    1.05x    1.02x    1.05x    1.02x
    1024B  1.05x    1.02x    1.00x    1.02x    1.05x    1.06x    1.05x    1.03x    1.05x    1.02x
    8192B  1.05x    1.02x    1.00x    1.02x    1.06x    1.06x    1.04x    1.03x    1.04x    1.02x

    256bit key: (lrw:384bit) (xts:512bit)
    size   ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B    1.01x    1.00x    1.01x    1.01x    1.00x    1.00x    0.99x    1.03x    1.01x    1.01x
    64B    1.00x    1.00x    1.00x    1.00x    1.00x    1.00x    1.00x    1.01x    1.00x    1.02x
    256B   1.05x    1.02x    1.00x    1.02x    1.05x    1.02x    1.04x    1.05x    1.05x    1.02x
    1024B  1.06x    1.02x    1.00x    1.02x    1.07x    1.06x    1.05x    1.04x    1.05x    1.02x
    8192B  1.05x    1.02x    1.00x    1.02x    1.06x    1.06x    1.04x    1.05x    1.05x    1.02x

    serpent-avx-x86_64 vs aes-asm (8kB block):
             128bit  256bit
    ecb-enc  1.26x   1.73x
    ecb-dec  1.20x   1.64x
    cbc-enc  0.33x   0.45x
    cbc-dec  1.24x   1.67x
    ctr-enc  1.32x   1.76x
    ctr-dec  1.32x   1.76x
    lrw-enc  1.20x   1.60x
    lrw-dec  1.15x   1.54x
    xts-enc  1.22x   1.64x
    xts-dec  1.17x   1.57x

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • The AVX implementation of the twofish cipher processes 8 blocks in parallel,
    so we need to make test vectors larger to check parallel code paths. Test
    vectors are also large enough to deal with 16-block parallel implementations
    which may occur in the future.

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • This patch adds an x86_64/avx assembler implementation of the Twofish block
    cipher. The implementation processes eight blocks in parallel (two 4 block
    chunk AVX operations). The table-lookups are done in general-purpose registers.
    For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way
    module are called. A good performance increase is provided for blocksizes
    greater than or equal to 128B.
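
    The fallback between code paths can be pictured as a greedy split of a
    request into 8-way, 3-way and single-block calls. The sketch below only
    counts how many calls of each kind such a split would produce; it is an
    illustration under that assumption, not the module's glue code, and
    demo_batches and demo_split are hypothetical names.

```c
#include <stddef.h>

#define DEMO_BLOCK_SIZE 16u /* Twofish block size in bytes */

/* Number of calls made to each code path for one request. */
struct demo_batches {
    size_t by8; /* 8-block AVX calls */
    size_t by3; /* 3-block calls (3way fallback) */
    size_t by1; /* single-block calls */
};

/* Greedily hand out whole blocks to the widest path that still fits. */
struct demo_batches demo_split(size_t nbytes)
{
    struct demo_batches b = { 0, 0, 0 };
    size_t blocks = nbytes / DEMO_BLOCK_SIZE;

    b.by8 = blocks / 8;
    blocks %= 8;
    b.by3 = blocks / 3;
    blocks %= 3;
    b.by1 = blocks;
    return b;
}
```

    With 16-byte blocks, 128B is the smallest request that fills one 8-way
    call, which matches the size threshold quoted above for the speed-up.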

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    Intel Core i5-2500 CPU (fam:6, model:42, step:7)

    twofish-avx-x86_64 vs. twofish-x86_64-3way
    128bit key: (lrw:256bit) (xts:256bit)
    size   ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B    0.96x    0.97x    1.00x    0.95x    0.97x    0.97x    0.96x    0.95x    0.95x    0.98x
    64B    0.99x    0.99x    1.00x    0.99x    0.98x    0.98x    0.99x    0.98x    0.99x    0.98x
    256B   1.20x    1.21x    1.00x    1.19x    1.15x    1.14x    1.19x    1.20x    1.18x    1.19x
    1024B  1.29x    1.30x    1.00x    1.28x    1.23x    1.24x    1.26x    1.28x    1.26x    1.27x
    8192B  1.31x    1.32x    1.00x    1.31x    1.25x    1.25x    1.28x    1.29x    1.28x    1.30x

    256bit key: (lrw:384bit) (xts:512bit)
    size   ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B    0.96x    0.96x    1.00x    0.96x    0.97x    0.98x    0.95x    0.95x    0.95x    0.96x
    64B    1.00x    0.99x    1.00x    0.98x    0.98x    1.01x    0.98x    0.98x    0.98x    0.98x
    256B   1.20x    1.21x    1.00x    1.21x    1.15x    1.15x    1.19x    1.20x    1.18x    1.19x
    1024B  1.29x    1.30x    1.00x    1.28x    1.23x    1.23x    1.26x    1.27x    1.26x    1.27x
    8192B  1.31x    1.33x    1.00x    1.31x    1.26x    1.26x    1.29x    1.29x    1.28x    1.30x

    twofish-avx-x86_64 vs aes-asm (8kB block):
             128bit  256bit
    ecb-enc  1.19x   1.63x
    ecb-dec  1.18x   1.62x
    cbc-enc  0.75x   1.03x
    cbc-dec  1.23x   1.67x
    ctr-enc  1.24x   1.65x
    ctr-dec  1.24x   1.65x
    lrw-enc  1.15x   1.53x
    lrw-dec  1.14x   1.52x
    xts-enc  1.16x   1.56x
    xts-dec  1.16x   1.56x

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • Signed-off-by: Sonic Zhang
    Acked-by: Mike Frysinger
    Signed-off-by: Herbert Xu

    Sonic Zhang
     

24 May, 2012

2 commits

  • Pull md updates from NeilBrown:
    "It's been a busy cycle for md - lots of fun stuff here.. if you like
    this kind of thing :-)

    Main features:
    - RAID10 arrays can be reshaped - adding and removing devices and
    changing chunks (not 'far' array though)
    - allow RAID5 arrays to be reshaped with a backup file (not tested
    yet, but the principle works fine for RAID10).
    - arrays can be reshaped while a bitmap is present - you no longer
    need to remove it first
    - SSSE3 support for RAID6 syndrome calculations

    and of course a number of minor fixes etc."

    * tag 'md-3.5' of git://neil.brown.name/md: (56 commits)
    md/bitmap: record the space available for the bitmap in the superblock.
    md/raid10: Remove extras after reshape to smaller number of devices.
    md/raid5: improve removal of extra devices after reshape.
    md: check the return of mddev_find()
    MD RAID1: Further conditionalize 'fullsync'
    DM RAID: Use md_error() in place of simply setting Faulty bit
    DM RAID: Record and handle missing devices
    DM RAID: Set recovery flags on resume
    md/raid5: Allow reshape while a bitmap is present.
    md/raid10: resize bitmap when required during reshape.
    md: allow array to be resized while bitmap is present.
    md/bitmap: make sure reshape request are reflected in superblock.
    md/bitmap: add bitmap_resize function to allow bitmap resizing.
    md/bitmap: use DIV_ROUND_UP instead of open-code
    md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap'
    md/bitmap: make bitmap bitops atomic.
    md/bitmap: make _page_attr bitops atomic.
    md/bitmap: merge bitmap_file_unmap and bitmap_file_put.
    md/bitmap: remove async freeing of bitmap file.
    md/bitmap: convert some spin_lock_irqsave to spin_lock_irq
    ...

    Linus Torvalds
     
  • Pull crypto updates from Herbert Xu:
    - New cipher/hash driver for ARM ux500.
    - Code clean-up for aesni-intel.
    - Misc fixes.

    Fixed up conflicts in arch/arm/mach-ux500/devices-common.h, where quite
    frankly some of it made no sense at all (the pull brought in a
    declaration for the dbx500_add_platform_device_noirq() function, which
    neither exists nor is used anywhere).

    Also some trivial add-add context conflicts in the Kconfig file in
    drivers/{char/hw_random,crypto}/

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: aesni-intel - move more common code to ablk_init_common
    crypto: aesni-intel - use crypto_[un]register_algs
    crypto: ux500 - Cleanup hardware identification
    crypto: ux500 - Update DMA handling for 3.4
    mach-ux500: crypto - core support for CRYP/HASH module.
    crypto: ux500 - Add driver for HASH hardware
    crypto: ux500 - Add driver for CRYP hardware
    hwrng: Kconfig - modify default state for atmel-rng driver
    hwrng: omap - use devm_request_and_ioremap
    crypto: crypto4xx - move up err_request_irq label
    crypto, xor: Sanitize checksumming function selection output
    crypto: caam - add backward compatible string sec4.0

    Linus Torvalds
     

22 May, 2012

2 commits


15 May, 2012

1 commit


24 Apr, 2012

1 commit


21 Apr, 2012

1 commit


13 Apr, 2012

1 commit


11 Apr, 2012

1 commit


10 Apr, 2012

1 commit


09 Apr, 2012

1 commit

  • Currently, it says

    [ 1.015541] xor: automatically using best checksumming function: generic_sse
    [ 1.040769] generic_sse: 6679.000 MB/sec
    [ 1.045377] xor: using function: generic_sse (6679.000 MB/sec)

    and repeats the function name three times unnecessarily. Change it into

    [ 1.015115] xor: automatically using best checksumming function:
    [ 1.040794] generic_sse: 6680.000 MB/sec

    and save us a line in dmesg.

    No functional change.

    Cc: Herbert Xu
    Signed-off-by: Borislav Petkov
    Signed-off-by: Herbert Xu

    Borislav Petkov
     

05 Apr, 2012

1 commit

  • The current code only increments the upper 64 bits of the SHA-512 byte
    counter when the number of bytes hashed happens to hit 2^64 exactly.

    This patch increments the upper 64 bits whenever the lower 64 bits
    overflow.
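
    The difference between the two carry checks can be shown with a small
    user-space sketch. It assumes a 128-bit counter kept as two 64-bit
    halves; byte_count128, count_add_buggy and count_add_fixed are
    hypothetical names, not identifiers from the patched code.

```c
#include <stdint.h>

/* 128-bit byte counter split into a low and a high 64-bit half. */
struct byte_count128 {
    uint64_t lo;
    uint64_t hi;
};

/* Buggy form: carries only when the low half lands exactly on zero. */
void count_add_buggy(struct byte_count128 *c, uint64_t len)
{
    c->lo += len;
    if (c->lo == 0)
        c->hi++;
}

/*
 * Fixed form: after an unsigned addition, a result smaller than the
 * addend means the low half wrapped around, so carry into the high half.
 */
void count_add_fixed(struct byte_count128 *c, uint64_t len)
{
    c->lo += len;
    if (c->lo < len)
        c->hi++;
}
```

    Starting from a low half of 2^64 - 2 and adding 4 bytes wraps the low
    half to 2: the buggy check misses the carry because the result is not
    zero, while the fixed check sees 2 < 4 and increments the high half.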

    Signed-off-by: Kent Yoder
    Cc: stable@kernel.org
    Signed-off-by: Herbert Xu

    Kent Yoder
     

03 Apr, 2012

1 commit

  • Pull crypto fixes from Herbert Xu:
    - Fix for CPU hotplug hang in padata.
    - Avoid using cpu_active inappropriately in pcrypt and padata.
    - Fix for user-space algorithm lookup hang with IV generators.
    - Fix for netlink dump of algorithms where stuff went missing due to
    incorrect calculation of message size.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: user - Fix size of netlink dump message
    crypto: user - Fix lookup of algorithms with IV generator
    crypto: pcrypt - Use the online cpumask as the default
    padata: Fix cpu hotplug
    padata: Use the online cpumask as the default
    padata: Add a reference to the api documentation

    Linus Torvalds
     

02 Apr, 2012

1 commit


29 Mar, 2012

3 commits

  • The default netlink message size limit might be exceeded when dumping a
    lot of algorithms to userspace. As a result, not all of the instantiated
    algorithms are dumped to userspace. So calculate an upper bound on the
    message size and call netlink_dump_start() with that value.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • We look up algorithms with crypto_alg_mod_lookup() when instantiating via
    crypto_add_alg(). However, algorithms that are wrapped by an IV generator
    (e.g. aead or genicv type algorithms) need special care. The userspace
    process hangs until it gets a timeout when we use crypto_alg_mod_lookup()
    to look up these algorithms. So export the lookup functions for these
    algorithms and use them in crypto_add_alg().

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • We use the active cpumask to determine the superset of cpus
    to use for parallelization. However, the active cpumask is
    for internal usage of the scheduler and therefore not the
    appropriate cpumask for these purposes. So use the online
    cpumask instead.

    Reported-by: Peter Zijlstra
    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert