24 Mar, 2012

1 commit


22 Mar, 2012

2 commits

  • Pull crypto update from Herbert Xu:
    "* sha512 bug fixes (already in your tree).
    * SHA224/SHA384 AEAD support in caam.
    * X86-64 optimised version of Camellia.
    * Tegra AES support.
    * Bulk algorithm registration interface to make driver registration easier.
    * padata race fixes.
    * Misc fixes."

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
    padata: Fix race on sequence number wrap
    padata: Fix race in the serialization path
    crypto: camellia - add assembler implementation for x86_64
    crypto: camellia - rename camellia.c to camellia_generic.c
    crypto: camellia - fix checkpatch warnings
    crypto: camellia - rename camellia module to camellia_generic
    crypto: tcrypt - add more camellia tests
    crypto: testmgr - add more camellia test vectors
    crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
    crypto: twofish-x86_64/i586 - set alignmask to zero
    crypto: blowfish-x86_64 - set alignmask to zero
    crypto: serpent-sse2 - combine ablk_*_init functions
    crypto: blowfish-x86_64 - use crypto_[un]register_algs
    crypto: twofish-x86_64-3way - use crypto_[un]register_algs
    crypto: serpent-sse2 - use crypto_[un]register_algs
    crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
    crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
    crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
    crypto: caam - fix gcc 4.6 warning
    crypto: Add bulk algorithm registration interface
    ...

    Linus Torvalds
     
  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

20 Mar, 2012

1 commit


14 Mar, 2012

7 commits

  • Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
    functions are provided. First set is regular 'one-block at time' encrypt/decrypt
    functions. Second is 'two-block at time' functions that gain performance increase
    on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
    functions with in-order CPUs.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    AMD Phenom II 1055T (fam:16, model:10):

    camellia-asm vs camellia_generic:
    128bit key: (lrw:256bit) (xts:256bit)
    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x
    64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x
    256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x
    1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x
    8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x

    256bit key: (lrw:384bit) (xts:512bit)
    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x
    64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x
    256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x
    1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x
    8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x

    camellia-asm vs aes-asm (8kB block):
    128bit 256bit
    ecb-enc 1.15x 1.22x
    ecb-dec 1.16x 1.16x
    cbc-enc 0.85x 0.90x
    cbc-dec 1.20x 1.23x
    ctr-enc 1.28x 1.30x
    ctr-dec 1.27x 1.28x
    lrw-enc 1.12x 1.16x
    lrw-dec 1.08x 1.10x
    xts-enc 1.11x 1.15x
    xts-dec 1.14x 1.15x

    Intel Core2 T8100 (fam:6, model:23, step:6):

    camellia-asm vs camellia_generic:
    128bit key: (lrw:256bit) (xts:256bit)
    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x
    64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x
    256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x
    1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x
    8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x

    256bit key: (lrw:384bit) (xts:512bit)
    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x
    64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x
    256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x
    1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x
    8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x

    camellia-asm vs aes-asm (8kB block):
    128bit 256bit
    ecb-enc 1.17x 1.21x
    ecb-dec 1.17x 1.20x
    cbc-enc 0.80x 0.82x
    cbc-dec 1.22x 1.24x
    ctr-enc 1.25x 1.26x
    ctr-dec 1.25x 1.26x
    lrw-enc 1.14x 1.18x
    lrw-dec 1.13x 1.17x
    xts-enc 1.14x 1.18x
    xts-dec 1.14x 1.17x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Fix checkpatch warnings before renaming file.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Rename camellia module to camellia_generic to allow optimized assembler
    implementations to autoload with module-alias.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Add tests for CTR, LRW and XTS modes.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • New ECB, CBC, CTR, LRW and XTS test vectors for camellia. Larger ECB/CBC test
    vectors needed for parallel 2-way camellia implementation.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • camellia_setup_tail() applies 'inverse of the last half of P-function' to
    subkeys, which is unneeded if keys are applied directly to yl/yr in
    CAMELLIA_ROUNDSM.

    Patch speeds up key setup and should speed up CAMELLIA_ROUNDSM as applying
    key to yl/yr early has less register dependencies.

    Quick tcrypt camellia results:
    x86_64, AMD Phenom II, ~5% faster
    x86_64, Intel Core 2, ~0.5% faster
    i386, Intel Atom N270, ~1% faster

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

27 Feb, 2012

1 commit


19 Feb, 2012

1 commit


16 Feb, 2012

2 commits


14 Feb, 2012

1 commit

  • This updates the sha512 fix so that it doesn't cause excessive stack
    usage on i386. This is done by reverting to the original code, and
    avoiding the W duplication by moving its initialisation into the loop.

    As the underlying code is in fact the one that we have used for years,
    I'm pushing this now instead of postponing to the next cycle.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: sha512 - Avoid stack bloat on i386
    crypto: sha512 - Use binary and instead of modulus

    Linus Torvalds
     

05 Feb, 2012

2 commits

  • We declare 'exact' without initializing it and then do:

    [...]
    if (strlen(p->cru_driver_name))
    exact = 1;

    if (priority && !exact)
    return -EINVAL;

    [...]

    If the first 'if' is not true, then the second will test an
    uninitialized 'exact'.
    As far as I can tell, what we want is for 'exact' to be initialized to
    0 (zero/false).

    Signed-off-by: Jesper Juhl
    Acked-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Jesper Juhl
     
  • Unfortunately in reducing W from 80 to 16 we ended up unrolling
    the loop twice. As gcc has issues dealing with 64-bit ops on
    i386 this means that we end up using even more stack space (>1K).

    This patch solves the W reduction by moving LOAD_OP/BLEND_OP
    into the loop itself, thus avoiding the need to duplicate it.

    While the stack space still isn't great (>0.5K) it is at least
    in the same ball park as the amount of stack used for our C sha1
    implementation.

    Note that this patch basically reverts to the original code so
    the diff looks bigger than it really is.

    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Herbert Xu
     

26 Jan, 2012

3 commits


15 Jan, 2012

4 commits

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     
  • Use standard ror64() instead of hand-written.
    There is no standard ror64, so create it.

    The difference is shift value being "unsigned int" instead of uint64_t
    (for which there is no reason). gcc starts to emit native ROR instructions
    which it doesn't do for some reason currently. This should make the code
    faster.

    Patch survives in-tree crypto test and ping flood with hmac(sha512) on.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     
  • For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
    Consequently, keeping all W[80] array on stack is unnecessary,
    only 16 values are really needed.

    Using W[16] instead of W[80] greatly reduces stack usage
    (~750 bytes to ~340 bytes on x86_64).

    Line by line explanation:
    * BLEND_OP
    array is "circular" now, all indexes have to be modulo 16.
    Round number is positive, so remainder operation should be
    without surprises.

    * initial full message scheduling is trimmed to first 16 values which
    come from data block, the rest is calculated before it's needed.

    * original loop body is unrolled version of new SHA512_0_15 and
    SHA512_16_79 macros, unrolling was done to not do explicit variable
    renaming. Otherwise it's the very same code after preprocessing.
    See sha1_transform() code which does the same trick.

    Patch survives in-tree crypto test and original bugreport test
    (ping flood with hmac(sha512).

    See FIPS 180-2 for SHA-512 definition
    http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     
  • commit f9e2bca6c22d75a289a349f869701214d63b5060
    aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
    created global message schedule area.

    If sha512_update will ever be entered twice, hash will be silently
    calculated incorrectly.

    Probably the easiest way to notice incorrect hashes being calculated is
    to run 2 ping floods over AH with hmac(sha512):

    #!/usr/sbin/setkey -f
    flush;
    spdflush;
    add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
    add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
    spdadd IP1 IP2 any -P out ipsec ah/transport//require;
    spdadd IP2 IP1 any -P in ipsec ah/transport//require;

    XfrmInStateProtoError will start ticking with -EBADMSG being returned
    from ah_input(). This never happens with, say, hmac(sha1).

    With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
    with multiple bidirectional ping flood streams like it doesn't tick
    with SHA-1.

    After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
    This is OK for simple loads, for something more heavy, stack reduction will be done
    separatedly.

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Alexey Dobriyan
     

11 Jan, 2012

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (54 commits)
    crypto: gf128mul - remove leftover "(EXPERIMENTAL)" in Kconfig
    crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
    crypto: serpent-sse2 - select LRW and XTS
    crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
    crypto: twofish-x86_64-3way - select LRW and XTS
    crypto: xts - remove dependency on EXPERIMENTAL
    crypto: lrw - remove dependency on EXPERIMENTAL
    crypto: picoxcell - fix boolean and / or confusion
    crypto: caam - remove DECO access initialization code
    crypto: caam - fix polarity of "propagate error" logic
    crypto: caam - more desc.h cleanups
    crypto: caam - desc.h - convert spaces to tabs
    crypto: talitos - convert talitos_error to struct device
    crypto: talitos - remove NO_IRQ references
    crypto: talitos - fix bad kfree
    crypto: convert drivers/crypto/* to use module_platform_driver()
    char: hw_random: convert drivers/char/hw_random/* to use module_platform_driver()
    crypto: serpent-sse2 - should select CRYPTO_CRYPTD
    crypto: serpent - rename serpent.c to serpent_generic.c
    crypto: serpent - cleanup checkpatch errors and warnings
    ...

    Linus Torvalds
     

20 Dec, 2011

5 commits


30 Nov, 2011

3 commits


21 Nov, 2011

3 commits

  • Patch adds LRW support for serpent-sse2 by using lrw_crypt(). Patch has been
    tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Benchmark results with tcrypt:

    Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
    size lrw-enc lrw-dec
    16B 1.00x 0.96x
    64B 1.01x 1.01x
    256B 3.01x 2.97x
    1024B 3.39x 3.33x
    8192B 3.35x 3.33x

    AMD Phenom II 1055T (x86_64) (fam:16, model:10):
    size lrw-enc lrw-dec
    16B 0.98x 1.03x
    64B 1.01x 1.04x
    256B 2.10x 2.14x
    1024B 2.28x 2.33x
    8192B 2.30x 2.33x

    Intel Atom N270 (i586):
    size lrw-enc lrw-dec
    16B 0.97x 0.97x
    64B 1.47x 1.50x
    256B 1.72x 1.69x
    1024B 1.88x 1.81x
    8192B 1.84x 1.79x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in four block chunks.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Intel Atom N270:

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x
    64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x
    256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x
    1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x
    8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt

    Userspace test results:

    Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
    encrypt: 2.35x
    decrypt: 2.54x

    Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
    encrypt: 1.82x
    decrypt: 2.51x

    Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
    encrypt: 2.99x
    decrypt: 3.48x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in eigth block chunks (two 4 block chunk SSE2 operations
    in parallel to improve performance on out-of-order CPUs). Glue code is based
    on one from AES-NI implementation, so requests from irq context are redirected
    to cryptd.

    v2:
    - add missing include of linux/module.h
    (appearently crypto.h used to include module.h, which changed for 3.2 by
    commit 7c926402a7e8c9b279968fd94efec8700ba3859e)

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    AMD Phenom II 1055T (fam:16, model:10):

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16B 1.03x 1.01x 1.03x 1.05x 1.00x 0.99x
    64B 1.00x 1.01x 1.02x 1.04x 1.02x 1.01x
    256B 2.34x 2.41x 0.99x 2.43x 2.39x 2.40x
    1024B 2.51x 2.57x 1.00x 2.59x 2.56x 2.56x
    8192B 2.50x 2.54x 1.00x 2.55x 2.57x 2.57x

    Intel Celeron T1600 (fam:6, model:15, step:13):

    size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
    16B 0.97x 0.97x 1.01x 1.01x 1.01x 1.02x
    64B 1.00x 1.00x 1.00x 1.02x 1.01x 1.01x
    256B 3.41x 3.35x 1.00x 3.39x 3.42x 3.44x
    1024B 3.75x 3.72x 0.99x 3.74x 3.75x 3.75x
    8192B 3.70x 3.68x 0.99x 3.68x 3.69x 3.69x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-sse2.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-sse2.txt

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

14 Nov, 2011

2 commits


12 Nov, 2011

1 commit