21 Feb, 2012
3 commits
-
commit f2ea0f5f04c97b48c88edccba52b0682fbe45087 upstream.
Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: Alexey Dobriyan
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman -
commit 3a92d687c8015860a19213e3c102cad6b722f83c upstream.
Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice. As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman -
commit 58d7d18b5268febb8b1391c6dffc8e2aaa751fcd upstream.
The previous patch used the modulus operator over a power of 2
unnecessarily which may produce suboptimal binary code. This
patch changes changes them to binary ands instead.Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
04 Feb, 2012
2 commits
-
commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 upstream.
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
Consequently, keeping all W[80] array on stack is unnecessary,
only 16 values are really needed.Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).Line by line explanation:
* BLEND_OP
array is "circular" now, all indexes have to be modulo 16.
Round number is positive, so remainder operation should be
without surprises.* initial full message scheduling is trimmed to first 16 values which
come from data block, the rest is calculated before it's needed.* original loop body is unrolled version of new SHA512_0_15 and
SHA512_16_79 macros, unrolling was done to not do explicit variable
renaming. Otherwise it's the very same code after preprocessing.
See sha1_transform() code which does the same trick.Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512).See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdfSigned-off-by: Alexey Dobriyan
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman -
commit 84e31fdb7c797a7303e0cc295cb9bc8b73fb872d upstream.
commit f9e2bca6c22d75a289a349f869701214d63b5060
aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
created global message schedule area.If sha512_update will ever be entered twice, hash will be silently
calculated incorrectly.Probably the easiest way to notice incorrect hashes being calculated is
to run 2 ping floods over AH with hmac(sha512):#!/usr/sbin/setkey -f
flush;
spdflush;
add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
spdadd IP1 IP2 any -P out ipsec ah/transport//require;
spdadd IP2 IP1 any -P in ipsec ah/transport//require;XfrmInStateProtoError will start ticking with -EBADMSG being returned
from ah_input(). This never happens with, say, hmac(sha1).With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
with multiple bidirectional ping flood streams like it doesn't tick
with SHA-1.After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
This is OK for simple loads, for something more heavy, stack reduction will be done
separatedly.Signed-off-by: Alexey Dobriyan
Signed-off-by: Herbert Xu
Signed-off-by: Greg Kroah-Hartman
12 Nov, 2011
1 commit
-
* git://github.com/herbertx/crypto:
crypto: algapi - Fix build problem with NET disabled
crypto: user - Fix rwsem leak in crypto_user
11 Nov, 2011
1 commit
-
The report functions use NLA_PUT so we need to ensure that NET
is enabled.Reported-by: Luis Henriques
Signed-off-by: Herbert Xu
07 Nov, 2011
1 commit
-
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
02 Nov, 2011
2 commits
-
The list_empty case in crypto_alg_match() will return without calling
up_read() on crypto_alg_sem. We could do the "goto out" routine, but the
function will clearly do the right thing with that test simply removed.Signed-off-by: Jonathan Corbet
Signed-off-by: Herbert Xu -
* git://github.com/herbertx/crypto: (48 commits)
crypto: user - Depend on NET instead of selecting it
crypto: user - Add dependency on NET
crypto: talitos - handle descriptor not found in error path
crypto: user - Initialise match in crypto_alg_match
crypto: testmgr - add twofish tests
crypto: testmgr - add blowfish test-vectors
crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
crypto: twofish-x86_64-3way - fix ctr blocksize to 1
crypto: blowfish-x86_64 - fix ctr blocksize to 1
crypto: whirlpool - count rounds from 0
crypto: Add userspace report for compress type algorithms
crypto: Add userspace report for cipher type algorithms
crypto: Add userspace report for rng type algorithms
crypto: Add userspace report for pcompress type algorithms
crypto: Add userspace report for nivaead type algorithms
crypto: Add userspace report for aead type algorithms
crypto: Add userspace report for givcipher type algorithms
crypto: Add userspace report for ablkcipher type algorithms
crypto: Add userspace report for blkcipher type algorithms
crypto: Add userspace report for ahash type algorithms
...
01 Nov, 2011
2 commits
-
Selecting NET causes all sorts of issues, including a dependency
loop involving bluetooth. This patch makes it a dependency instead.Signed-off-by: Herbert Xu
-
Part of the include cleanups means that the implicit
inclusion of module.h via device.h is going away. So
fix things up in advance.Signed-off-by: Paul Gortmaker
26 Oct, 2011
1 commit
-
Since the configuration interface relies on netlink we need to
select NET.Signed-off-by: Herbert Xu
21 Oct, 2011
24 commits
-
We need to default match to 0 as otherwise it may lead to a false
positive.Signed-off-by: Herbert Xu
-
Add tests for parallel twofish-x86_64-3way code paths.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Add tests for parallel blowfish-x86_64 code paths.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
rc[0] is unused because rounds are counted from 1.
Save an u64!Signed-off-by: Alexey Dobriyan
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
We add a report function pointer to struct crypto_type. This function
pointer is used from the crypto userspace configuration API to report
crypto algorithms to userspace.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
This patch adds a basic userspace configuration API for the crypto layer.
With this it is possible to instantiate, remove and to show crypto
algorithms from userspace.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcomming crypto usrerspace configuration api needs
to remove the spawns on top on an algorithm, so export
crypto_remove_final.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcomming crypto usrerspace configuration api needs
to remove the spawns on top on an algorithm, so export
crypto_remove_spawns.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
The upcomming crypto user configuration api needs to identify
crypto instances. This patch adds a flag that is set if the
algorithm is an instance that is build from templates.Signed-off-by: Steffen Klassert
Signed-off-by: Herbert Xu -
Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three blocks chunks, improving
cipher performance on out-of-order CPUs.Patch has been tested with tcrypt and automated filesystem tests.
Summary of the tcrypt benchmarks:
Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
encrypt: 1.3x speed
decrypt: 1.3x speedTwofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
encrypt: 1.07x speed
decrypt: 1.4x speedTwofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
encrypt: 1.4x speedTwofish 3-way-asm vs AES asm (128bit 8kb block ECB)
encrypt: 1.0x speed
decrypt: 1.0x speedTwofish 3-way-asm vs AES asm (128bit 8kb block CBC)
encrypt: 0.84x speed
decrypt: 1.09x speedTwofish 3-way-asm vs AES asm (128bit 8kb block CTR)
encrypt: 1.15x speedFull output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txtTests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T ProcessorAlso userspace test were run on:
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping : 11Userspace test results:
Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
encrypt: 1.27x
decrypt: 1.25xEncryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
encrypt: 1.36x
decrypt: 1.36xSigned-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
The ghash_update function passes a pointer to gf128mul_4k_lle which will
be NULL if ghash_setkey is not called or if the most recent call to
ghash_setkey failed to allocate memory. This causes an oops. Fix this
up by returning an error code in the null case.This is trivially triggered from unprivileged userspace through the
AF_ALG interface by simply writing to the socket without setting a key.The ghash_final function has a similar issue, but triggering it requires
a memory allocation failure in ghash_setkey _after_ at least one
successful call to ghash_update.BUG: unable to handle kernel NULL pointer dereference at 00000670
IP: [] gf128mul_4k_lle+0x23/0x60 [gf128mul]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ghash_generic gf128mul algif_hash af_alg nfs lockd nfs_acl sunrpc bridge ipv6 stp llcPid: 1502, comm: hashatron Tainted: G W 3.1.0-rc9-00085-ge9308cf #32 Bochs Bochs
EIP: 0060:[] EFLAGS: 00000202 CPU: 0
EIP is at gf128mul_4k_lle+0x23/0x60 [gf128mul]
EAX: d69db1f0 EBX: d6b8ddac ECX: 00000004 EDX: 00000000
ESI: 00000670 EDI: d6b8ddac EBP: d6b8ddc8 ESP: d6b8dda4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process hashatron (pid: 1502, ti=d6b8c000 task=d6810000 task.ti=d6b8c000)
Stack:
00000000 d69db1f0 00000163 00000000 d6b8ddc8 c101a520 d69db1f0 d52aa000
00000ff0 d6b8dde8 d88d310f d6b8a3f8 d52aa000 00001000 d88d502c d6b8ddfc
00001000 d6b8ddf4 c11676ed d69db1e8 d6b8de24 c11679ad d52aa000 00000000
Call Trace:
[] ? kmap_atomic_prot+0x37/0xa6
[] ghash_update+0x85/0xbe [ghash_generic]
[] crypto_shash_update+0x18/0x1b
[] shash_ahash_update+0x22/0x36
[] shash_async_update+0xb/0xd
[] hash_sendpage+0xba/0xf2 [algif_hash]
[] kernel_sendpage+0x39/0x4e
[] ? 0xd88cdfff
[] sock_sendpage+0x37/0x3e
[] ? kernel_sendpage+0x4e/0x4e
[] pipe_to_sendpage+0x56/0x61
[] splice_from_pipe_feed+0x58/0xcd
[] ? splice_from_pipe_begin+0x10/0x10
[] __splice_from_pipe+0x36/0x55
[] ? splice_from_pipe_begin+0x10/0x10
[] splice_from_pipe+0x51/0x64
[] ? default_file_splice_write+0x2c/0x2c
[] generic_splice_sendpage+0x13/0x15
[] ? splice_from_pipe_begin+0x10/0x10
[] do_splice_from+0x5d/0x67
[] sys_splice+0x2bf/0x363
[] ? sysenter_exit+0xf/0x16
[] ? trace_hardirqs_on_caller+0x10e/0x13f
[] sysenter_do_call+0x12/0x32
Code: 83 c4 0c 5b 5e 5f c9 c3 55 b9 04 00 00 00 89 e5 57 8d 7d e4 56 53 8d 5d e4 83 ec 18 89 45 e0 89 55 dc 0f b6 70 0f c1 e6 04 01 d6 a5 be 0f 00 00 00 4e 89 d8 e8 48 ff ff ff 8b 45 e0 89 da 0f
EIP: [] gf128mul_4k_lle+0x23/0x60 [gf128mul] SS:ESP 0068:d6b8dda4
CR2: 0000000000000670
---[ end trace 4eaa2a86a8e2da24 ]---
note: hashatron[1502] exited with preempt_count 1
BUG: scheduling while atomic: hashatron/1502/0x10000002
INFO: lockdep is turned off.
[...]Signed-off-by: Nick Bowler
Cc: stable@kernel.org [2.6.37+]
Signed-off-by: Herbert Xu
22 Sep, 2011
3 commits
-
Patch adds x86_64 assembly implementation of blowfish. Two set of assembler
functions are provided. First set is regular 'one-block at time'
encrypt/decrypt functions. Second is 'four-block at time' functions that
gain performance increase on out-of-order CPUs. Performance of 4-way
functions should be equal to 1-way functions with in-order CPUs.Summary of the tcrypt benchmarks:
Blowfish assembler vs blowfish C (256bit 8kb block ECB)
encrypt: 2.2x speed
decrypt: 2.3x speedBlowfish assembler vs blowfish C (256bit 8kb block CBC)
encrypt: 1.12x speed
decrypt: 2.5x speedBlowfish assembler vs blowfish C (256bit 8kb block CTR)
encrypt: 2.5x speedFull output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txtTests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
stepping : 0Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Add ctr(blowfish) speed test to receive results for blowfish x86_64 assembly
patch.Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu -
Rename blowfish to blowfish_generic so that assembler versions of blowfish
cipher can autoload. Module alias 'blowfish' is added.Also fix checkpatch warnings.
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu