19 Feb, 2012

1 commit


16 Feb, 2012

1 commit

  • Use standard ror64() instead of hand-written.
    There is no standard ror64, so create it.

    The difference is shift value being "unsigned int" instead of uint64_t
    (for which there is no reason). gcc starts to emit native ROR instructions
    which it doesn't do for some reason currently. This should make the code
    faster.

    Patch survives in-tree crypto test and ping flood with hmac(sha512) on.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     

14 Feb, 2012

1 commit

  • This updates the sha512 fix so that it doesn't cause excessive stack
    usage on i386. This is done by reverting to the original code, and
    avoiding the W duplication by moving its initialisation into the loop.

    As the underlying code is in fact the one that we have used for years,
    I'm pushing this now instead of postponing to the next cycle.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: sha512 - Avoid stack bloat on i386
    crypto: sha512 - Use binary and instead of modulus

    Linus Torvalds
     

05 Feb, 2012

1 commit

  • Unfortunately in reducing W from 80 to 16 we ended up unrolling
    the loop twice. As gcc has issues dealing with 64-bit ops on
    i386 this means that we end up using even more stack space (>1K).

    This patch solves the W reduction by moving LOAD_OP/BLEND_OP
    into the loop itself, thus avoiding the need to duplicate it.

    While the stack space still isn't great (>0.5K) it is at least
    in the same ball park as the amount of stack used for our C sha1
    implementation.

    Note that this patch basically reverts to the original code so
    the diff looks bigger than it really is.

    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Herbert Xu
     

26 Jan, 2012

2 commits


15 Jan, 2012

3 commits

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     
  • For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
    Consequently, keeping all W[80] array on stack is unnecessary,
    only 16 values are really needed.

    Using W[16] instead of W[80] greatly reduces stack usage
    (~750 bytes to ~340 bytes on x86_64).

    Line by line explanation:
    * BLEND_OP
    array is "circular" now, all indexes have to be modulo 16.
    Round number is positive, so remainder operation should be
    without surprises.

    * initial full message scheduling is trimmed to first 16 values which
    come from data block, the rest is calculated before it's needed.

    * original loop body is unrolled version of new SHA512_0_15 and
    SHA512_16_79 macros, unrolling was done to not do explicit variable
    renaming. Otherwise it's the very same code after preprocessing.
    See sha1_transform() code which does the same trick.

    Patch survives in-tree crypto test and original bugreport test
    (ping flood with hmac(sha512).

    See FIPS 180-2 for SHA-512 definition
    http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     
  • commit f9e2bca6c22d75a289a349f869701214d63b5060
    aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
    created global message schedule area.

    If sha512_update will ever be entered twice, hash will be silently
    calculated incorrectly.

    Probably the easiest way to notice incorrect hashes being calculated is
    to run 2 ping floods over AH with hmac(sha512):

    #!/usr/sbin/setkey -f
    flush;
    spdflush;
    add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
    add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
    spdadd IP1 IP2 any -P out ipsec ah/transport//require;
    spdadd IP2 IP1 any -P in ipsec ah/transport//require;

    XfrmInStateProtoError will start ticking with -EBADMSG being returned
    from ah_input(). This never happens with, say, hmac(sha1).

    With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
    with multiple bidirectional ping flood streams like it doesn't tick
    with SHA-1.

    After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
    This is OK for simple loads, for something more heavy, stack reduction will be done
    separatedly.

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     

11 Jan, 2012

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (54 commits)
    crypto: gf128mul - remove leftover "(EXPERIMENTAL)" in Kconfig
    crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
    crypto: serpent-sse2 - select LRW and XTS
    crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
    crypto: twofish-x86_64-3way - select LRW and XTS
    crypto: xts - remove dependency on EXPERIMENTAL
    crypto: lrw - remove dependency on EXPERIMENTAL
    crypto: picoxcell - fix boolean and / or confusion
    crypto: caam - remove DECO access initialization code
    crypto: caam - fix polarity of "propagate error" logic
    crypto: caam - more desc.h cleanups
    crypto: caam - desc.h - convert spaces to tabs
    crypto: talitos - convert talitos_error to struct device
    crypto: talitos - remove NO_IRQ references
    crypto: talitos - fix bad kfree
    crypto: convert drivers/crypto/* to use module_platform_driver()
    char: hw_random: convert drivers/char/hw_random/* to use module_platform_driver()
    crypto: serpent-sse2 - should select CRYPTO_CRYPTD
    crypto: serpent - rename serpent.c to serpent_generic.c
    crypto: serpent - cleanup checkpatch errors and warnings
    ...

    Linus Torvalds
     

20 Dec, 2011

5 commits


30 Nov, 2011

3 commits


21 Nov, 2011

3 commits

  • Patch adds LRW support for serpent-sse2 by using lrw_crypt(). Patch has been
    tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Benchmark results with tcrypt:

    Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
    size lrw-enc lrw-dec
    16B 1.00x 0.96x
    64B 1.01x 1.01x
    256B 3.01x 2.97x
    1024B 3.39x 3.33x
    8192B 3.35x 3.33x

    AMD Phenom II 1055T (x86_64) (fam:16, model:10):
    size lrw-enc lrw-dec
    16B 0.98x 1.03x
    64B 1.01x 1.04x
    256B 2.10x 2.14x
    1024B 2.28x 2.33x
    8192B 2.30x 2.33x

    Intel Atom N270 (i586):
    size lrw-enc lrw-dec
    16B 0.97x 0.97x
    64B 1.47x 1.50x
    256B 1.72x 1.69x
    1024B 1.88x 1.81x
    8192B 1.84x 1.79x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in four block chunks.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Intel Atom N270:

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x
    64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x
    256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x
    1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x
    8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt

    Userspace test results:

    Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
    encrypt: 2.35x
    decrypt: 2.54x

    Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
    encrypt: 1.82x
    decrypt: 2.51x

    Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
    encrypt: 2.99x
    decrypt: 3.48x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in eigth block chunks (two 4 block chunk SSE2 operations
    in parallel to improve performance on out-of-order CPUs). Glue code is based
    on one from AES-NI implementation, so requests from irq context are redirected
    to cryptd.

    v2:
    - add missing include of linux/module.h
    (appearently crypto.h used to include module.h, which changed for 3.2 by
    commit 7c926402a7e8c9b279968fd94efec8700ba3859e)

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    AMD Phenom II 1055T (fam:16, model:10):

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16B 1.03x 1.01x 1.03x 1.05x 1.00x 0.99x
    64B 1.00x 1.01x 1.02x 1.04x 1.02x 1.01x
    256B 2.34x 2.41x 0.99x 2.43x 2.39x 2.40x
    1024B 2.51x 2.57x 1.00x 2.59x 2.56x 2.56x
    8192B 2.50x 2.54x 1.00x 2.55x 2.57x 2.57x

    Intel Celeron T1600 (fam:6, model:15, step:13):

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16B 0.97x 0.97x 1.01x 1.01x 1.01x 1.02x
    64B 1.00x 1.00x 1.00x 1.02x 1.01x 1.01x
    256B 3.41x 3.35x 1.00x 3.39x 3.42x 3.44x
    1024B 3.75x 3.72x 0.99x 3.74x 3.75x 3.75x
    8192B 3.70x 3.68x 0.99x 3.68x 3.69x 3.69x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-sse2.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-sse2.txt

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

14 Nov, 2011

2 commits


12 Nov, 2011

1 commit


11 Nov, 2011

1 commit


09 Nov, 2011

15 commits