21 Oct, 2011
27 commits
-
Add tests for parallel blowfish-x86_64 code paths.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
hifn_795x works only on 32-bit systems, so remove the detection at module
load time and catch non-32-bit systems at build time.
Signed-off-by: Richard Weinberger
Signed-off-by: Herbert Xu -
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
rc[0] is unused because rounds are counted from 1.
Save a u64!
Signed-off-by: Alexey Dobriyan
Signed-off-by: Herbert Xu -
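The rc[0] observation above can be illustrated with a small C sketch. It assumes a hypothetical cipher with `ROUNDS` rounds, made-up round constants and a hypothetical `toy_round()` mixing step; it is not the kernel's cipher code. It only shows that shrinking the table to one entry per round and counting rounds from 0 gives the same result while saving one u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUNDS 10 /* hypothetical round count */

/* Old layout: ROUNDS + 1 entries; rc_old[0] is never read. */
static const uint64_t rc_old[ROUNDS + 1] = {
	0, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa
};

/* New layout: exactly one (made-up) constant per round. */
static const uint64_t rc_new[ROUNDS] = {
	0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa
};

/* Compile-time check: dropping rc[0] saves exactly one u64. */
static_assert(sizeof(rc_old) == sizeof(rc_new) + sizeof(uint64_t),
	      "dropping rc[0] saves exactly one u64");

/* Hypothetical round function that mixes a round constant into the state. */
static uint64_t toy_round(uint64_t state, uint64_t rc)
{
	return (state ^ rc) * 0x9E3779B97F4A7C15ULL; /* wraps mod 2^64 */
}

/* Rounds counted from 1, so the table is indexed directly with r. */
uint64_t run_old(uint64_t state)
{
	for (int r = 1; r <= ROUNDS; r++)
		state = toy_round(state, rc_old[r]);
	return state;
}

/* Rounds counted from 0, so no table slot is wasted. */
uint64_t run_new(uint64_t state)
{
	for (int r = 0; r < ROUNDS; r++)
		state = toy_round(state, rc_new[r]);
	return state;
}

/* Bytes saved by the smaller table. */
size_t bytes_saved(void)
{
	return sizeof(rc_old) - sizeof(rc_new);
}
```

Both loops feed the same constant sequence into the rounds, because `rc_old[r] == rc_new[r - 1]` for every round.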
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
We add a report function pointer to struct crypto_type. This function
pointer is used by the crypto userspace configuration API to report
crypto algorithms to userspace.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
This patch adds a basic userspace configuration API for the crypto layer.
With it, it is possible to instantiate, remove and show crypto
algorithms from userspace.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcoming crypto userspace configuration API needs
to remove the spawns on top of an algorithm, so export
crypto_remove_final.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcoming crypto userspace configuration API needs
to remove the spawns on top of an algorithm, so export
crypto_remove_spawns.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcoming crypto user configuration API needs to identify
crypto instances. This patch adds a flag that is set if the
algorithm is an instance built from templates.
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Patch adds a 3-way parallel x86_64 assembly implementation of twofish as a new
module. The new assembler functions process data in three-block chunks, improving
cipher performance on out-of-order CPUs.
Patch has been tested with tcrypt and automated filesystem tests.
Summary of the tcrypt benchmarks:
Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
encrypt: 1.3x speed
decrypt: 1.3x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
encrypt: 1.07x speed
decrypt: 1.4x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
encrypt: 1.4x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
encrypt: 1.0x speed
decrypt: 1.0x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
encrypt: 0.84x speed
decrypt: 1.09x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
encrypt: 1.15x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
Userspace tests were also run on:
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping : 11
Userspace test results:
Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
encrypt: 1.27x
decrypt: 1.25x
Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
encrypt: 1.36x
decrypt: 1.36x
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
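The three-block chunking described in the twofish 3-way patch can be sketched as an ECB loop in C. The loop hands three blocks at a time to the wide routine and falls back to the one-block routine for the tail. `toy_enc_blk()` and `toy_enc_blk_3way()` are hypothetical stand-ins that use a trivial XOR "cipher", not twofish, and the real assembler function names and signatures are not given here. Only the 16-byte block size is taken from twofish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16 /* twofish block size in bytes */

/* Call counters so the chunking behaviour can be observed. */
static unsigned int calls_1way, calls_3way;

/* Hypothetical one-block primitive: a toy XOR "cipher", NOT twofish. */
static void toy_enc_blk(uint8_t key, uint8_t *dst, const uint8_t *src)
{
	calls_1way++;
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		dst[i] = src[i] ^ key;
}

/*
 * Hypothetical three-block primitive. A real 3-way implementation
 * interleaves the rounds of three independent blocks so that an
 * out-of-order CPU can overlap them; this toy just processes three blocks.
 */
static void toy_enc_blk_3way(uint8_t key, uint8_t *dst, const uint8_t *src)
{
	calls_3way++;
	for (size_t i = 0; i < 3 * BLOCK_SIZE; i++)
		dst[i] = src[i] ^ key;
}

/* ECB over nbytes (a multiple of BLOCK_SIZE): 3-way chunks, then 1-way tail. */
void ecb_encrypt(uint8_t key, uint8_t *dst, const uint8_t *src, size_t nbytes)
{
	assert(nbytes % BLOCK_SIZE == 0);

	while (nbytes >= 3 * BLOCK_SIZE) {
		toy_enc_blk_3way(key, dst, src);
		src += 3 * BLOCK_SIZE;
		dst += 3 * BLOCK_SIZE;
		nbytes -= 3 * BLOCK_SIZE;
	}
	while (nbytes >= BLOCK_SIZE) {
		toy_enc_blk(key, dst, src);
		src += BLOCK_SIZE;
		dst += BLOCK_SIZE;
		nbytes -= BLOCK_SIZE;
	}
}
```

For example, five blocks produce one 3-way call followed by two 1-way calls. ECB blocks are independent, so the result matches five 1-way calls. Chained modes such as CBC encryption cannot be parallelised this way, which fits the small CBC encrypt gain reported above.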
This is needed by the 3-way twofish patch to be able to easily use the one-block
assembler functions. As the glue code is shared between i586 and x86_64, apply the
change to the i586 assembler too. Also export the assembler functions for the
3-way parallel twofish module.
CC: Joachim Fritschi
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
This patch adds an improved F-macro for 4-way parallel functions. With the new
F-macro for 4-way parallel functions, blowfish sees a ~15% improvement in
speed tests on AMD Phenom II (~5% on Intel Xeon E7330).
However, when used in the 1-way blowfish function, the new macro would be ~10%
slower than the original, so the old F-macro is kept for 1-way functions.
Patch cleans up the old F-macro as it is no longer needed in the 4-way part.
Patch also renames register macros to reduce stack usage.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
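Both F-macros compute Blowfish's round function, F(x) = ((S0[a] + S1[b]) ^ S2[c]) + S3[d] mod 2^32. Here a is the most significant byte of x and d the least significant. A plain C reference version is shown below. The S-boxes are left zeroed as placeholders, since real contents come from key setup. The register allocation and 4-way interleaving of the assembler macros are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Blowfish S-boxes: 4 x 256 32-bit words, filled by key setup (zero here). */
static uint32_t S[4][256];

/* Documents the table size: 4 KiB of S-box data per key. */
static_assert(sizeof(S) == 4 * 256 * sizeof(uint32_t), "4 x 256 u32 S-boxes");

/* Blowfish round function; uint32_t arithmetic wraps mod 2^32. */
uint32_t bf_F(uint32_t x)
{
	uint8_t a = (uint8_t)(x >> 24); /* most significant byte */
	uint8_t b = (uint8_t)(x >> 16);
	uint8_t c = (uint8_t)(x >> 8);
	uint8_t d = (uint8_t)x;         /* least significant byte */

	return ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
}
```

The four table lookups per call are independent memory loads. This is the kind of work an out-of-order CPU can overlap when four blocks are processed side by side.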
20 Oct, 2011
1 commit
-
The picoxcell crypto driver requires the clk API, but the platform in
mainline does not currently support it. Add an explicit dependency on
HAVE_CLK to avoid build breakage.
Signed-off-by: Jamie Iles
Signed-off-by: Herbert Xu
22 Sep, 2011
5 commits
-
Include to pick up the declarations for crypto_aes_encrypt_x86
and crypto_aes_decrypt_x86 to quiet the sparse noise:
warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static?
warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten
Acked-by: Mandeep Singh Baines
Signed-off-by: Herbert Xu -
Patch adds an x86_64 assembly implementation of blowfish. Two sets of assembler
functions are provided. The first set is regular 'one-block-at-a-time'
encrypt/decrypt functions. The second is 'four-blocks-at-a-time' functions that
gain a performance increase on out-of-order CPUs. Performance of the 4-way
functions should be equal to the 1-way functions on in-order CPUs.
Summary of the tcrypt benchmarks:
Blowfish assembler vs blowfish C (256bit 8kb block ECB)
encrypt: 2.2x speed
decrypt: 2.3x speed
Blowfish assembler vs blowfish C (256bit 8kb block CBC)
encrypt: 1.12x speed
decrypt: 2.5x speed
Blowfish assembler vs blowfish C (256bit 8kb block CTR)
encrypt: 2.5x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
stepping : 0
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Add a ctr(blowfish) speed test to obtain results for the blowfish x86_64 assembly
patch.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Rename blowfish to blowfish_generic so that assembler versions of the blowfish
cipher can autoload. Module alias 'blowfish' is added.
Also fix checkpatch warnings.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Patch splits up the blowfish crypto routine into a common part (key setup),
which will be used by blowfish crypto modules (x86_64 assembly and generic C).
Also fixes errors/warnings reported by checkpatch.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
20 Aug, 2011
1 commit
-
As cryptd is depended on by other algorithms such as aesni-intel,
it needs to be registered before them. When everything is built
as modules, this occurs naturally. However, for this to work when
they are built-in, we need to use subsys_initcall in cryptd.
Tested-by: Josh Boyer
Signed-off-by: Herbert Xu
16 Aug, 2011
1 commit
-
On Tue, Aug 16, 2011 at 03:22:34PM +1000, Stephen Rothwell wrote:
>
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) produced this warning:
>
> In file included from security/integrity/ima/../integrity.h:16:0,
> from security/integrity/ima/ima.h:27,
> from security/integrity/ima/ima_policy.c:20:
> include/crypto/sha.h:86:10: warning: 'struct shash_desc' declared inside parameter list
> include/crypto/sha.h:86:10: warning: its scope is only this definition or declaration, which is probably not what you want
>
> Introduced by commit 7c390170b493 ("crypto: sha1 - export sha1_update for
> reuse"). I guess you need to include crypto/hash.h in crypto/sha.h.
This patch fixes this by providing a declaration for struct shash_desc.
Reported-by: Stephen Rothwell
Signed-off-by: Herbert Xu
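The fix relies on a C scoping rule. A struct tag first named inside a prototype's parameter list gets prototype scope only, so each such prototype refers to a different, unusable type. A file-scope forward declaration placed before the prototype avoids this. In the sketch below, `toy_update()` and the struct's `total` field are hypothetical. The sketch does not reproduce include/crypto/sha.h or the real struct shash_desc layout.

```c
#include <assert.h>
#include <stddef.h>

/*
 * File-scope forward declaration. Without it, the prototype below would
 * introduce a 'struct shash_desc' visible only inside its own parameter
 * list, which is what produces the "declared inside parameter list" warning.
 */
struct shash_desc;

/* Hypothetical prototype that, like a header, only passes a pointer. */
int toy_update(struct shash_desc *desc, const unsigned char *data, size_t len);

/* Later (e.g. in another header), the full definition completes the type. */
struct shash_desc {
	size_t total; /* hypothetical field for this sketch */
};

int toy_update(struct shash_desc *desc, const unsigned char *data, size_t len)
{
	assert(desc != NULL);
	(void)data; /* a real update would hash the data */
	desc->total += len;
	return 0;
}
```

A forward declaration is enough because the header only uses pointers to the struct. Including crypto/hash.h would also work, but it pulls in more than the prototype needs.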
15 Aug, 2011
1 commit
-
Fix a get/put_cpu() imbalance in the error case when qp == NULL
Signed-off-by: Thomas Meyer
Signed-off-by: Herbert Xu
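In the kernel, get_cpu() disables preemption and must be balanced by put_cpu() on every exit path. The userspace sketch below shows the shape of such a fix. `toy_get_cpu()` and `toy_put_cpu()` are hypothetical stand-ins that only track nesting depth. `lookup_qp()`, `struct queue` and the `-ENODEV` return value are also hypothetical, not taken from the patched driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins: track preemption-disable nesting depth. */
static int preempt_depth;

static int toy_get_cpu(void)
{
	preempt_depth++;
	return 0; /* pretend we are running on CPU 0 */
}

static void toy_put_cpu(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

struct queue {
	int pending;
};

static struct queue cpu_queues[1];

/* Hypothetical per-CPU queue lookup that may fail and return NULL. */
static struct queue *lookup_qp(int cpu, int available)
{
	return available ? &cpu_queues[cpu] : NULL;
}

int submit(int available)
{
	int cpu = toy_get_cpu();
	struct queue *qp = lookup_qp(cpu, available);

	if (qp == NULL) {
		toy_put_cpu(); /* the fix: balance get_cpu() on the error path */
		return -ENODEV;
	}
	qp->pending++;
	toy_put_cpu();
	return 0;
}
```

Without the `toy_put_cpu()` call in the error branch, `preempt_depth` would stay at 1 after a failed lookup. That leftover depth is the imbalance the commit fixes.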
10 Aug, 2011
4 commits
-
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).
Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
faster.
Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant if using
the SSE/YMM registers is not possible.
With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant, a speedup of +34.8%.
Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU-intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).
Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:
frame size   sha1-generic   sha1-ssse3    delta
  64 byte     37.5 MBit/s    37.5 MBit/s    0.0%
 128 byte     56.3 MBit/s    62.5 MBit/s  +11.0%
 256 byte     87.5 MBit/s   100.0 MBit/s  +14.3%
 512 byte    131.3 MBit/s   150.0 MBit/s  +14.2%
1024 byte    162.5 MBit/s   193.8 MBit/s  +19.3%
1280 byte    175.0 MBit/s   212.5 MBit/s  +21.4%
1420 byte    175.0 MBit/s   218.7 MBit/s  +25.0%
1518 byte    150.0 MBit/s   181.2 MBit/s  +20.8%
The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.
Signed-off-by: Mathias Krause
Cc: Maxim Locktyukhin
Signed-off-by: Herbert Xu -
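The fallback described in the SSSE3/AVX SHA1 commit amounts to a dispatch on whether vector registers may be used in the current context. On x86 the kernel provides helpers such as irq_fpu_usable() and kernel_fpu_begin()/kernel_fpu_end() for this purpose. The userspace sketch below uses hypothetical stand-ins instead: the flag `simd_usable` and the functions `sha1_blocks_simd()` and `sha1_blocks_generic()` only record which path ran, and no real SHA-1 is computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sha1_path { PATH_NONE, PATH_SIMD, PATH_GENERIC };

/* Hypothetical: whether SSE/YMM state may be touched in this context. */
static bool simd_usable = true;
static enum sha1_path last_path = PATH_NONE;

/* Hypothetical stand-in for the SSSE3/AVX block function. */
static void sha1_blocks_simd(const unsigned char *data, size_t len)
{
	(void)data;
	(void)len;
	last_path = PATH_SIMD;
}

/* Hypothetical stand-in for the generic C update (cf. crypto_sha1_update()). */
static void sha1_blocks_generic(const unsigned char *data, size_t len)
{
	(void)data;
	(void)len;
	last_path = PATH_GENERIC;
}

/* Use the vector code only when it is safe; otherwise fall back to generic C. */
void sha1_update(const unsigned char *data, size_t len)
{
	assert(data != NULL || len == 0);

	if (!simd_usable) {
		sha1_blocks_generic(data, len);
		return;
	}
	/* Kernel code would save the FPU state here (kernel_fpu_begin()). */
	sha1_blocks_simd(data, len);
	/* ...and restore it here (kernel_fpu_end()). */
}
```

Because both paths keep the same hash state format, switching between them mid-stream is safe. This is why the next commit exports the generic update function for reuse.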
Export the update function as crypto_sha1_update() so that the same
algorithm does not have to be reimplemented for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as a fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.
Signed-off-by: Mathias Krause
Signed-off-by: Herbert Xu -
The completion callback will free the request, so we must remove it from
the completion list before calling the callback.
Cc: Herbert Xu
Signed-off-by: Jamie Iles
Signed-off-by: Herbert Xu -
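The ordering in the completion-list fix matters because the callback may free the request that holds the list node. Unlinking it afterwards would touch freed memory. The sketch below shows the safe order with a hypothetical singly linked list and a hypothetical `struct req`, not the picoxcell driver's actual structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request type linked into a completion list. */
struct req {
	struct req *next;
	void (*complete)(struct req *r);
};

static struct req *completion_list;
static int completed;

/* Example callback: like a real completion handler, it frees the request. */
static void free_on_complete(struct req *r)
{
	completed++;
	free(r);
}

/* Allocate a request and push it onto the completion list. */
int queue_req(void)
{
	struct req *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->complete = free_on_complete;
	r->next = completion_list;
	completion_list = r;
	return 0;
}

/* Drain the list: unlink each request BEFORE invoking its callback. */
void process_completions(void)
{
	while (completion_list) {
		struct req *r = completion_list;

		completion_list = r->next; /* remove from the list first */
		r->next = NULL;
		assert(r->complete != NULL);
		r->complete(r); /* r may be freed here; do not touch it afterwards */
	}
}
```

Reading `r->next` after `r->complete(r)` would be a use-after-free. Tools such as AddressSanitizer would typically flag that variant.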
Allow the crypto engines to be matched from device tree bindings.
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Herbert Xu
Signed-off-by: Jamie Iles
Signed-off-by: Herbert Xu