20 Dec, 2011

2 commits


21 Nov, 2011

5 commits

  • LRW/XTS patches for serpent-sse2 forgot to add this. CRYPTO_TFM_REQ_MAY_SLEEP
    should be cleared as sleeping between kernel_fpu_begin()/kernel_fpu_end() is
    not allowed.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds XTS support for serpent-sse2 by using xts_crypt(). Patch has been
    tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
    size xts-enc xts-dec
    16B 0.98x 1.00x
    64B 1.00x 1.01x
    256B 2.78x 2.75x
    1024B 3.30x 3.26x
    8192B 3.39x 3.30x

    AMD Phenom II 1055T (x86_64) (fam:16, model:10):
    size xts-enc xts-dec
    16B 1.05x 1.02x
    64B 1.04x 1.03x
    256B 2.10x 2.05x
    1024B 2.34x 2.35x
    8192B 2.34x 2.40x

    Intel Atom N270 (i586):
    size xts-enc xts-dec
    16B 0.95x 0.96x
    64B 1.53x 1.50x
    256B 1.72x 1.75x
    1024B 1.88x 1.87x
    8192B 1.86x 1.83x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds LRW support for serpent-sse2 by using lrw_crypt(). Patch has been
    tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
    size lrw-enc lrw-dec
    16B 1.00x 0.96x
    64B 1.01x 1.01x
    256B 3.01x 2.97x
    1024B 3.39x 3.33x
    8192B 3.35x 3.33x

    AMD Phenom II 1055T (x86_64) (fam:16, model:10):
    size lrw-enc lrw-dec
    16B 0.98x 1.03x
    64B 1.01x 1.04x
    256B 2.10x 2.14x
    1024B 2.28x 2.33x
    8192B 2.30x 2.33x

    Intel Atom N270 (i586):
    size lrw-enc lrw-dec
    16B 0.97x 0.97x
    64B 1.47x 1.50x
    256B 1.72x 1.69x
    1024B 1.88x 1.81x
    8192B 1.84x 1.79x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in four block chunks.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    Intel Atom N270:

    size   ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec
    16     0.95x    1.12x    1.02x    1.07x    0.97x    0.98x
    64     1.73x    1.82x    1.08x    1.82x    1.72x    1.73x
    256    2.08x    2.00x    1.04x    2.07x    1.99x    2.01x
    1024   2.28x    2.18x    1.05x    2.23x    2.17x    2.20x
    8192   2.28x    2.13x    1.05x    2.23x    2.18x    2.20x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt

    Userspace test results:

    Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
    encrypt: 2.35x
    decrypt: 2.54x

    Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
    encrypt: 1.82x
    decrypt: 2.51x

    Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
    encrypt: 2.99x
    decrypt: 3.48x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler
    functions crypt data in eight block chunks (two 4 block chunk SSE2 operations
    in parallel to improve performance on out-of-order CPUs). Glue code is based
    on one from AES-NI implementation, so requests from irq context are redirected
    to cryptd.

    v2:
    - add missing include of linux/module.h
    (apparently crypto.h used to include module.h, which changed for 3.2 by
    commit 7c926402a7e8c9b279968fd94efec8700ba3859e)

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

    AMD Phenom II 1055T (fam:16, model:10):

    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec
    16B     1.03x    1.01x    1.03x    1.05x    1.00x    0.99x
    64B     1.00x    1.01x    1.02x    1.04x    1.02x    1.01x
    256B    2.34x    2.41x    0.99x    2.43x    2.39x    2.40x
    1024B   2.51x    2.57x    1.00x    2.59x    2.56x    2.56x
    8192B   2.50x    2.54x    1.00x    2.55x    2.57x    2.57x

    Intel Celeron T1600 (fam:6, model:15, step:13):

    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec
    16B     0.97x    0.97x    1.01x    1.01x    1.01x    1.02x
    64B     1.00x    1.00x    1.00x    1.02x    1.01x    1.01x
    256B    3.41x    3.35x    1.00x    3.39x    3.42x    3.44x
    1024B   3.75x    3.72x    0.99x    3.74x    3.75x    3.75x
    8192B   3.70x    3.68x    0.99x    3.68x    3.69x    3.69x

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-sse2.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-generic.txt
    http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-sse2.txt

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
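
In the x86_64 tables above, the 16B and 64B rows sit near 1.00x while 256B and larger show a clear speedup. One plausible reading is that requests shorter than eight 16-byte blocks (128 bytes) never reach the parallel routine. The sketch below only counts how a request would be split; `split_request` and `struct call_stats` are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE      16  /* Serpent block size in bytes */
#define PARALLEL_BLOCKS 8   /* blocks per wide call, as described for x86_64 */

/* Hypothetical counters standing in for the real cipher routines. */
struct call_stats {
    size_t wide_calls;    /* chunks that would go to the 8-block SSE2 routine */
    size_t single_calls;  /* leftover blocks handled one at a time */
};

/* Split a request of whole blocks into 8-block chunks plus a one-block tail. */
static struct call_stats split_request(size_t nbytes)
{
    struct call_stats st = {0, 0};
    size_t nblocks;

    assert(nbytes % BLOCK_SIZE == 0);
    nblocks = nbytes / BLOCK_SIZE;

    st.wide_calls = nblocks / PARALLEL_BLOCKS;
    st.single_calls = nblocks % PARALLEL_BLOCKS;
    return st;
}
```

Under these assumptions a 64-byte request yields no wide calls at all, whereas a 256-byte request is covered entirely by two wide calls.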
     

09 Nov, 2011

2 commits

  • Patch adds XTS support for twofish-x86_64-3way by using xts_crypt(). Patch has
    been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (twofish-3way/twofish-asm speed ratios):

    Intel Celeron T1600 (fam:6, model:15, step:13):

    size xts-enc xts-dec
    16B 0.98x 1.00x
    64B 1.14x 1.15x
    256B 1.23x 1.25x
    1024B 1.26x 1.29x
    8192B 1.28x 1.30x

    AMD Phenom II 1055T (fam:16, model:10):

    size xts-enc xts-dec
    16B 1.03x 1.03x
    64B 1.13x 1.16x
    256B 1.20x 1.20x
    1024B 1.22x 1.22x
    8192B 1.22x 1.21x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds LRW support for twofish-x86_64-3way by using lrw_crypt(). Patch has
    been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmarks results (twofish-3way/twofish-asm speed ratios):

    Intel Celeron T1600 (fam:6, model:15, step:13):

    size lrw-enc lrw-dec
    16B 0.99x 1.00x
    64B 1.17x 1.17x
    256B 1.26x 1.27x
    1024B 1.30x 1.31x
    8192B 1.31x 1.32x

    AMD Phenom II 1055T (fam:16, model:10):

    size lrw-enc lrw-dec
    16B 1.06x 1.01x
    64B 1.08x 1.14x
    256B 1.19x 1.20x
    1024B 1.21x 1.22x
    8192B 1.23x 1.24x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
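
For readers unfamiliar with XTS: each block gets its own tweak, and the tweak for the next block is the current one multiplied by x in GF(2^128). The function below is a minimal userspace sketch of that doubling step in the usual little-endian convention; `xts_tweak_double` is a hypothetical name, and this is not the code path used by xts_crypt().

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a 128-bit XTS tweak by x in GF(2^128); byte 0 is least significant. */
static void xts_tweak_double(uint8_t t[16])
{
    uint8_t carry = 0;
    int i;

    assert(t != 0);
    for (i = 0; i < 16; i++) {
        uint8_t next = (uint8_t)(t[i] >> 7);   /* bit shifted out of this byte */
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87;  /* reduce modulo x^128 + x^7 + x^2 + x + 1 */
}
```

Because every block depends only on its own tweak, blocks can be handed to a multi-block cipher routine in groups, which is what makes a 3-way or 8-way implementation useful for this mode.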
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

01 Nov, 2011

1 commit


21 Oct, 2011

6 commits

  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
    module. New assembler functions crypt data in three block chunks, improving
    cipher performance on out-of-order CPUs.

    Patch has been tested with tcrypt and automated filesystem tests.

    Summary of the tcrypt benchmarks:

    Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
    encrypt: 1.3x speed
    decrypt: 1.3x speed

    Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
    encrypt: 1.07x speed
    decrypt: 1.4x speed

    Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
    encrypt: 1.4x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
    encrypt: 1.0x speed
    decrypt: 1.0x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
    encrypt: 0.84x speed
    decrypt: 1.09x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
    encrypt: 1.15x speed

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt

    Tests were run on:
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 10
    model name : AMD Phenom(tm) II X6 1055T Processor

    Userspace tests were also run on:
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
    stepping : 11

    Userspace test results:

    Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
    encrypt: 1.27x
    decrypt: 1.25x

    Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
    encrypt: 1.36x
    decrypt: 1.36x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • This is needed by the 3-way twofish patch to be able to easily use one block
    assembler functions. As glue code is shared between i586/x86_64, apply the
    change to i586 assembler too. Also export assembler functions for the
    3-way parallel twofish module.

    CC: Joachim Fritschi
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • This patch adds improved F-macro for 4-way parallel functions. With new
    F-macro for 4-way parallel functions, blowfish sees ~15% improvement in
    speed tests on AMD Phenom II (~5% on Intel Xeon E7330).

    However when used in 1-way blowfish function new macro would be ~10%
    slower than original, so old F-macro is kept for 1-way functions.
    Patch cleans up old F-macro as it is no longer needed in 4-way part.

    Patch also does register macro renaming to reduce stack usage.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

22 Sep, 2011

2 commits

  • Include <asm/aes.h> to pick up the declarations for crypto_aes_encrypt_x86
    and crypto_aes_decrypt_x86 to quiet the sparse noise:

    warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static?
    warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Acked-by: Mandeep Singh Baines
    Signed-off-by: Herbert Xu

    H Hartley Sweeten
     
  • Patch adds x86_64 assembly implementation of blowfish. Two sets of assembler
    functions are provided. First set is regular 'one-block at a time'
    encrypt/decrypt functions. Second is 'four-block at a time' functions that
    gain performance increase on out-of-order CPUs. Performance of 4-way
    functions should be equal to 1-way functions with in-order CPUs.

    Summary of the tcrypt benchmarks:

    Blowfish assembler vs blowfish C (256bit 8kb block ECB)
    encrypt: 2.2x speed
    decrypt: 2.3x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CBC)
    encrypt: 1.12x speed
    decrypt: 2.5x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CTR)
    encrypt: 2.5x speed

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt

    Tests were run on:
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 10
    model name : AMD Phenom(tm) II X6 1055T Processor
    stepping : 0

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

10 Aug, 2011

1 commit

  • This is an assembler implementation of the SHA1 algorithm using the
    Supplemental SSE3 (SSSE3) instructions or, when available, the
    Advanced Vector Extensions (AVX).

    Testing with the tcrypt module shows the raw hash performance is up to
    2.3 times faster than the C implementation, using 8k data blocks on a
    Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
    faster.

    Since this implementation uses SSE/YMM registers it cannot safely be
    used in every situation, e.g. while an IRQ interrupts a kernel thread.
    The implementation falls back to the generic SHA1 variant, if using
    the SSE/YMM registers is not possible.

    With this algorithm I was able to increase the throughput of a single
    IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
    the SSSE3 variant -- a speedup of +34.8%.

    Saving and restoring SSE/YMM state might make the actual throughput
    fluctuate when there are FPU intensive userland applications running.
    For example, measuring the performance using iperf2 directly on the
    machine under test gives wobbling numbers because iperf2 uses the FPU
    for each packet to check if the reporting interval has expired (in the
    above test I got min/max/avg: 402/484/464 MBit/s).

    Using this algorithm on an IPsec gateway gives much more reasonable and
    stable numbers, albeit not as high as in the directly connected case.
    Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
    FTB-8510:

    frame size   sha1-generic   sha1-ssse3     delta
    64 byte      37.5 MBit/s    37.5 MBit/s    0.0%
    128 byte     56.3 MBit/s    62.5 MBit/s    +11.0%
    256 byte     87.5 MBit/s    100.0 MBit/s   +14.3%
    512 byte     131.3 MBit/s   150.0 MBit/s   +14.2%
    1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
    1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
    1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
    1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%

    The throughput for the largest frame size is lower than for the
    previous size because the IP packets need to be fragmented in this
    case to make their way through the IPsec tunnel.

    Signed-off-by: Mathias Krause
    Cc: Maxim Locktyukhin
    Signed-off-by: Herbert Xu

    Mathias Krause
     

30 Jun, 2011

1 commit


18 May, 2011

1 commit

  • Fix build error on i386 by moving function prototypes:

    arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_init':
    arch/x86/crypto/aesni-intel_glue.c:1263: error: implicit declaration of function 'crypto_fpu_init'
    arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_exit':
    arch/x86/crypto/aesni-intel_glue.c:1373: error: implicit declaration of function 'crypto_fpu_exit'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Herbert Xu

    Randy Dunlap
     

16 May, 2011

1 commit

  • Loading fpu without aesni-intel does nothing. Loading aesni-intel
    without fpu causes modes like xts to fail. (Unloading
    aesni-intel will restore those modes.)

    One solution would be to make aesni-intel depend on fpu, but it
    seems cleaner to just combine the modules.

    This is probably responsible for bugs like:
    https://bugzilla.redhat.com/show_bug.cgi?id=589390

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Herbert Xu

    Andy Lutomirski
     

27 Mar, 2011

1 commit


19 Mar, 2011

1 commit


18 Mar, 2011

1 commit


16 Feb, 2011

1 commit


23 Jan, 2011

1 commit

  • There's a small memory leak in
    arch/x86/crypto/aesni-intel_glue.c::rfc4106_set_hash_subkey(). If the call
    to kmalloc() fails and returns NULL then the memory allocated previously
    by ablkcipher_request_alloc() is not freed when we leave the function.

    I could have just added a call to ablkcipher_request_free() before we
    return -ENOMEM, but that started to look too much like the code we
    already had at the end of the function, so I chose instead to rework the
    code a bit so that there are now a few labels at the end that we goto when
    various allocations fail, so we don't have to repeat the same blocks of
    code (this also reduces the object code size slightly).

    Signed-off-by: Jesper Juhl
    Signed-off-by: Herbert Xu

    Jesper Juhl
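
The label-based unwinding described above can be sketched in userspace C as follows. This is an illustration of the general pattern, not the code from rfc4106_set_hash_subkey(): `alloc_step`, `setup_with_unwind` and the `fail_at` parameter are hypothetical, and malloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator: returns NULL when step == fail_at to simulate failure. */
static void *alloc_step(int step, int fail_at, size_t size)
{
    return step == fail_at ? NULL : malloc(size);
}

/* Three dependent allocations; each failure jumps to the label that
 * frees exactly what has been allocated so far. */
static int setup_with_unwind(int fail_at)
{
    void *tfm, *req, *data;
    int ret = -ENOMEM;

    assert(fail_at >= 0 && fail_at <= 3);

    tfm = alloc_step(1, fail_at, 32);
    if (!tfm)
        goto out;

    req = alloc_step(2, fail_at, 64);
    if (!req)
        goto out_free_tfm;

    data = alloc_step(3, fail_at, 128);
    if (!data)
        goto out_free_req;

    /* ... the real work would happen here ... */
    ret = 0;

    free(data);
out_free_req:
    free(req);
out_free_tfm:
    free(tfm);
out:
    return ret;
}
```

The success path falls through the same labels as the error paths, so each resource has a single release site and no cleanup block is duplicated.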
     

14 Jan, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
    hwrng: via_rng - Fix memory scribbling on some CPUs
    crypto: padlock - Move padlock.h into include/crypto
    hwrng: via_rng - Fix asm constraints
    crypto: n2 - use __devexit not __exit in n2_unregister_algs
    crypto: mark crypto workqueues CPU_INTENSIVE
    crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
    crypto: ripemd - Set module author and update email address
    crypto: omap-sham - backlog handling fix
    crypto: gf128mul - Remove experimental tag
    crypto: af_alg - fix af_alg memory_allocated data type
    crypto: aesni-intel - Fixed build with binutils 2.16
    crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
    net: Add missing lockdep class names for af_alg
    include: Install linux/if_alg.h for user-space crypto API
    crypto: omap-aes - checkpatch --file warning fixes
    crypto: omap-aes - initialize aes module once per request
    crypto: omap-aes - unnecessary code removed
    crypto: omap-aes - error handling implementation improved
    crypto: omap-aes - redundant locking is removed
    crypto: omap-aes - DMA initialization fixes for OMAP off mode
    ...

    Linus Torvalds
     

15 Dec, 2010

1 commit


13 Dec, 2010

1 commit


29 Nov, 2010

1 commit


27 Nov, 2010

1 commit

  • The AES-NI instructions are also available in legacy mode so the 32-bit
    architecture may profit from those, too.

    To illustrate the performance gain here's a short summary of a dm-crypt
    speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
    implementations:

    x86:            i586         aes-ni       delta
    ECB, 256 bit:   93.8 MB/s    123.3 MB/s   +31.4%
    CBC, 256 bit:   84.8 MB/s    262.3 MB/s   +209.3%
    LRW, 256 bit:   108.6 MB/s   222.1 MB/s   +104.5%
    XTS, 256 bit:   105.0 MB/s   205.5 MB/s   +95.7%

    Additionally, due to some minor optimizations, the 64-bit version also
    got a minor performance gain as seen below:

    x86-64:         old impl.    new impl.    delta
    ECB, 256 bit:   121.1 MB/s   123.0 MB/s   +1.5%
    CBC, 256 bit:   285.3 MB/s   290.8 MB/s   +1.9%
    LRW, 256 bit:   263.7 MB/s   265.3 MB/s   +0.6%
    XTS, 256 bit:   251.1 MB/s   255.3 MB/s   +1.7%

    Signed-off-by: Mathias Krause
    Reviewed-by: Huang Ying
    Signed-off-by: Herbert Xu

    Mathias Krause
     

13 Nov, 2010

1 commit

  • This patch adds an optimized RFC4106 AES-GCM implementation for 64-bit
    kernels. It supports 128-bit AES key size. This leverages the crypto
    AEAD interface type to facilitate a combined AES & GCM operation to
    be implemented in assembly code. The assembly code leverages Intel(R)
    AES New Instructions and the PCLMULQDQ instruction.

    Signed-off-by: Adrian Hoban
    Signed-off-by: Tadeusz Struk
    Signed-off-by: Gabriele Paoloni
    Signed-off-by: Aidan O'Mahony
    Signed-off-by: Erdinc Ozturk
    Signed-off-by: James Guilford
    Signed-off-by: Wajdi Feghali
    Signed-off-by: Herbert Xu

    Tadeusz Struk
     

03 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

13 Mar, 2010

1 commit

  • Andrew Morton reported that AES-NI CTR optimization failed to compile
    with gas 2.16.1, the error message is as follows:

    arch/x86/crypto/aesni-intel_asm.S: Assembler messages:
    arch/x86/crypto/aesni-intel_asm.S:752: Error: suffix or operands invalid for `movq'
    arch/x86/crypto/aesni-intel_asm.S:753: Error: suffix or operands invalid for `movq'

    To fix this, a gas macro is defined to assemble movq with 64bit
    general purpose registers and XMM registers. The macro will generate
    the raw .byte sequence for needed instructions.

    Reported-by: Andrew Morton
    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
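
To make the "raw .byte sequence" concrete, the sketch below computes the five bytes that encode a 64-bit movq between a general purpose register and an XMM register (operand-size prefix, REX.W, two opcode bytes, ModRM). It is an illustration in C rather than the gas macro from the patch; `encode_movq` and `enum movq_dir` are hypothetical names, and registers are passed by their hardware numbers (rax = 0, xmm0 = 0, and so on).

```c
#include <assert.h>
#include <stdint.h>

enum movq_dir { MOVQ_GPR_TO_XMM, MOVQ_XMM_TO_GPR };

/* Write the 5-byte encoding of a 64-bit movq between a GPR and an XMM register. */
static void encode_movq(enum movq_dir dir, unsigned gpr, unsigned xmm,
                        uint8_t out[5])
{
    assert(gpr < 16 && xmm < 16);

    out[0] = 0x66;                                       /* mandatory prefix */
    out[1] = (uint8_t)(0x48 | ((xmm >> 3) << 2)          /* REX.W, REX.R for xmm8+ */
                            | (gpr >> 3));               /* REX.B for r8+ */
    out[2] = 0x0f;
    out[3] = dir == MOVQ_GPR_TO_XMM ? 0x6e : 0x7e;       /* direction selects opcode */
    out[4] = (uint8_t)(0xc0 | ((xmm & 7) << 3) | (gpr & 7)); /* register-direct ModRM */
}
```

Emitting these bytes directly sidesteps the question of whether a given assembler version accepts the movq mnemonic with this operand combination.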
     

10 Mar, 2010

1 commit

  • To take advantage of the hardware pipeline implementation of AES-NI
    instructions, CTR mode encryption/decryption is implemented in ASM to
    schedule multiple AES-NI instructions one after another. This way, some
    latency of AES-NI instructions can be eliminated.

    Performance testing based on dm-crypt shows a 50% reduction in
    encryption/decryption time.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
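
The reason CTR mode suits this kind of scheduling is that each keystream block is derived from its own counter value, so several blocks can be prepared before any of them is encrypted. A minimal sketch of the counter handling, assuming a 128-bit big-endian counter; `ctr_inc_be` and `ctr_fill_blocks` are hypothetical helpers and no AES is performed here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Increment a 128-bit big-endian counter block in place. */
static void ctr_inc_be(uint8_t ctr[16])
{
    int i;

    assert(ctr != 0);
    for (i = 15; i >= 0; i--) {
        if (++ctr[i] != 0)
            break;  /* no carry into the next more significant byte */
    }
}

/* Write n consecutive counter blocks; each one could then be encrypted
 * independently of the others, e.g. several at a time. */
static void ctr_fill_blocks(uint8_t ctr[16], uint8_t blocks[][16], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        memcpy(blocks[i], ctr, 16);
        ctr_inc_be(ctr);
    }
}
```

With the inputs laid out this way, an implementation is free to issue the cipher rounds for several blocks back to back instead of waiting for one block to finish.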
     

09 Feb, 2010

1 commit

  • In particular, several occurrences of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

01 Dec, 2009

1 commit