21 Oct, 2011

27 commits


20 Oct, 2011

1 commit


22 Sep, 2011

5 commits

  • Include to pick up the declarations for crypto_aes_encrypt_x86
    and crypto_aes_decrypt_x86 to quiet the sparse noise:

    warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static?
    warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Acked-by: Mandeep Singh Baines
    Signed-off-by: Herbert Xu

    H Hartley Sweeten
     
  • Patch adds an x86_64 assembly implementation of blowfish. Two sets of
    assembler functions are provided. The first set is the regular 'one block
    at a time' encrypt/decrypt functions. The second is the 'four blocks at a
    time' functions, which improve performance on out-of-order CPUs. On
    in-order CPUs the 4-way functions should perform the same as the 1-way
    functions.

    Summary of the tcrypt benchmarks:

    Blowfish assembler vs blowfish C (256bit 8kb block ECB)
    encrypt: 2.2x speed
    decrypt: 2.3x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CBC)
    encrypt: 1.12x speed
    decrypt: 2.5x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CTR)
    encrypt: 2.5x speed

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt

    Tests were run on:
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 10
    model name : AMD Phenom(tm) II X6 1055T Processor
    stepping : 0

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Add a ctr(blowfish) speed test to obtain results for the blowfish x86_64
    assembly patch.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Rename blowfish to blowfish_generic so that assembler versions of blowfish
    cipher can autoload. Module alias 'blowfish' is added.

    Also fix checkpatch warnings.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch splits up the blowfish crypto routine into a common part (key setup)
    which will be used by blowfish crypto modules (x86_64 assembly and generic-c).

    Also fixes errors/warnings reported by checkpatch.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
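
The split into 'one-block' and 'four-block' functions in the blowfish x86_64 entry above implies a dispatch loop in the caller: hand four blocks at a time to the wide routine while enough data remains, then finish with the single-block routine. This only works when blocks are independent, as in ECB, which would be consistent with the smaller CBC encrypt gain in the benchmarks, since CBC encryption chains each block on the previous one. A minimal userspace sketch in C, assuming hypothetical names (toy_encrypt_1way, toy_encrypt_4way, toy_ecb_encrypt, struct toy_stats) and a toy XOR transform standing in for the cipher; it is not the kernel's glue code:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 8 /* Blowfish operates on 64-bit (8-byte) blocks */

/* Counts how often each path ran, so the dispatch is observable. */
struct toy_stats {
    size_t calls_4way;
    size_t calls_1way;
};

/* Hypothetical one-block routine: a toy XOR, NOT Blowfish. */
static void toy_encrypt_1way(uint8_t *dst, const uint8_t *src, uint8_t key)
{
    for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
        dst[i] = src[i] ^ key;
}

/*
 * Hypothetical four-block routine. Real assembler would interleave the
 * four independent blocks; this stand-in simply loops over them.
 */
static void toy_encrypt_4way(uint8_t *dst, const uint8_t *src, uint8_t key)
{
    for (size_t b = 0; b < 4; b++)
        toy_encrypt_1way(dst + b * TOY_BLOCK_SIZE,
                         src + b * TOY_BLOCK_SIZE, key);
}

/*
 * ECB-style dispatch: use the 4-way routine while at least four blocks
 * remain, then the 1-way routine for the rest. Returns the number of
 * bytes processed (whole blocks only; a trailing partial block is left).
 */
static size_t toy_ecb_encrypt(uint8_t *dst, const uint8_t *src, size_t len,
                              uint8_t key, struct toy_stats *st)
{
    size_t done = 0;

    while (len - done >= 4 * TOY_BLOCK_SIZE) {
        toy_encrypt_4way(dst + done, src + done, key);
        done += 4 * TOY_BLOCK_SIZE;
        st->calls_4way++;
    }
    while (len - done >= TOY_BLOCK_SIZE) {
        toy_encrypt_1way(dst + done, src + done, key);
        done += TOY_BLOCK_SIZE;
        st->calls_1way++;
    }
    return done;
}
```

For a 7-block input, this sketch takes the 4-way path once and the 1-way path three times.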
     

20 Aug, 2011

1 commit


16 Aug, 2011

1 commit

  • On Tue, Aug 16, 2011 at 03:22:34PM +1000, Stephen Rothwell wrote:
    >
    > After merging the final tree, today's linux-next build (powerpc
    > allyesconfig) produced this warning:
    >
    > In file included from security/integrity/ima/../integrity.h:16:0,
    > from security/integrity/ima/ima.h:27,
    > from security/integrity/ima/ima_policy.c:20:
    > include/crypto/sha.h:86:10: warning: 'struct shash_desc' declared inside parameter list
    > include/crypto/sha.h:86:10: warning: its scope is only this definition or declaration, which is probably not what you want
    >
    > Introduced by commit 7c390170b493 ("crypto: sha1 - export sha1_update for
    > reuse"). I guess you need to include crypto/hash.h in crypto/sha.h.

    This patch fixes this by providing a declaration for struct shash_desc.

    Reported-by: Stephen Rothwell
    Signed-off-by: Herbert Xu

    Herbert Xu
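
The warning quoted in the entry above appears when a struct type is first mentioned inside a prototype's parameter list: its scope is then limited to that one prototype. Declaring the struct at file scope before the prototype avoids this without including the full header. A minimal sketch, assuming a hypothetical function toy_sha1_update and a hypothetical one-field layout for the struct; the real struct shash_desc is defined in crypto/hash.h and looks different:

```c
#include <stddef.h>

/* Forward declaration: gives the (still incomplete) type file scope. */
struct shash_desc;

/*
 * A prototype may take a pointer to an incomplete type. Without the
 * declaration above, the compiler would warn that 'struct shash_desc'
 * is declared inside the parameter list. (Hypothetical function name.)
 */
int toy_sha1_update(struct shash_desc *desc, const unsigned char *data,
                    size_t len);

/*
 * The full definition can come later or from another header.
 * This layout is an assumption made purely for illustration.
 */
struct shash_desc {
    size_t bytes_seen;
};

int toy_sha1_update(struct shash_desc *desc, const unsigned char *data,
                    size_t len)
{
    (void)data; /* the sketch only counts bytes, it does not hash */
    desc->bytes_seen += len;
    return 0;
}
```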
     

15 Aug, 2011

1 commit


10 Aug, 2011

4 commits

  • This is an assembler implementation of the SHA1 algorithm using the
    Supplemental SSE3 (SSSE3) instructions or, when available, the
    Advanced Vector Extensions (AVX).

    Testing with the tcrypt module shows the raw hash performance is up to
    2.3 times faster than the C implementation, using 8k data blocks on a
    Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
    faster.

    Since this implementation uses SSE/YMM registers it cannot safely be
    used in every situation, e.g. while an IRQ interrupts a kernel thread.
    The implementation falls back to the generic SHA1 variant if using
    the SSE/YMM registers is not possible.

    With this algorithm I was able to increase the throughput of a single
    IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
    the SSSE3 variant -- a speedup of +34.8%.

    Saving and restoring SSE/YMM state might make the actual throughput
    fluctuate when there are FPU-intensive userland applications running.
    For example, measuring the performance using iperf2 directly on the
    machine under test gives wobbling numbers because iperf2 uses the FPU
    for each packet to check if the reporting interval has expired (in the
    above test I got min/max/avg: 402/484/464 Mbit/s).

    Using this algorithm on an IPsec gateway gives much more reasonable and
    stable numbers, albeit not as high as in the directly connected case.
    Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
    FTB-8510:

    frame size  sha1-generic    sha1-ssse3   delta
       64 byte   37.5 Mbit/s   37.5 Mbit/s    0.0%
      128 byte   56.3 Mbit/s   62.5 Mbit/s  +11.0%
      256 byte   87.5 Mbit/s  100.0 Mbit/s  +14.3%
      512 byte  131.3 Mbit/s  150.0 Mbit/s  +14.2%
     1024 byte  162.5 Mbit/s  193.8 Mbit/s  +19.3%
     1280 byte  175.0 Mbit/s  212.5 Mbit/s  +21.4%
     1420 byte  175.0 Mbit/s  218.7 Mbit/s  +25.0%
     1518 byte  150.0 Mbit/s  181.2 Mbit/s  +20.8%

    The throughput for the largest frame size is lower than for the
    previous size because the IP packets need to be fragmented in this
    case to make their way through the IPsec tunnel.

    Signed-off-by: Mathias Krause
    Cc: Maxim Locktyukhin
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • Export the update function as crypto_sha1_update() so that the same
    algorithm does not have to be reimplemented for each SHA-1
    implementation. This way the generic SHA-1 implementation can be used as
    a fallback for other implementations that fail to run under certain
    circumstances, such as needing an FPU context while executing in IRQ
    context.

    Signed-off-by: Mathias Krause
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • The completion callback will free the request so we must remove it from
    the completion list before calling the callback.

    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
     
  • Allow the crypto engines to be matched from device tree bindings.

    Cc: devicetree-discuss@lists.ozlabs.org
    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
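
The fallback described in the SHA1 SSSE3 entry above (use the SIMD routine only when the SSE/YMM registers may be touched, otherwise call the exported generic update) can be sketched in userspace C. Everything here is a stand-in: toy_fpu_usable, toy_update_simd, toy_update_generic, toy_update and struct toy_state are hypothetical names, and the "hash" is a plain byte sum rather than SHA-1. The flag stands in for whatever check the kernel code performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state: a running byte sum plus counters showing which path ran. */
struct toy_state {
    uint32_t sum;
    unsigned generic_calls;
    unsigned simd_calls;
};

/* Hypothetical flag: true when the SIMD registers may safely be used. */
static bool toy_fpu_usable = true;

/* Stand-in for the generic C implementation. */
static void toy_update_generic(struct toy_state *s, const uint8_t *data,
                               size_t len)
{
    for (size_t i = 0; i < len; i++)
        s->sum += data[i];
    s->generic_calls++;
}

/* Stand-in for the accelerated routine; it must give the same result. */
static void toy_update_simd(struct toy_state *s, const uint8_t *data,
                            size_t len)
{
    for (size_t i = 0; i < len; i++)
        s->sum += data[i];
    s->simd_calls++;
}

/* Dispatch: prefer the fast path, fall back when it is not allowed. */
static void toy_update(struct toy_state *s, const uint8_t *data, size_t len)
{
    if (toy_fpu_usable)
        toy_update_simd(s, data, len);
    else
        toy_update_generic(s, data, len);
}
```

Because both paths update the same state in the same way, a caller can switch between them from one call to the next, which is what makes the exported generic update usable as a fallback.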