21 Oct, 2011

10 commits

  • We add a report function pointer to struct crypto_type. This function
    pointer is used from the crypto userspace configuration API to report
    crypto algorithms to userspace.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • This patch adds a basic userspace configuration API for the crypto layer.
    With this it is possible to instantiate, remove, and show crypto
    algorithms from userspace.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • The upcoming crypto userspace configuration API needs
    to remove the spawns on top of an algorithm, so export
    crypto_remove_final.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • The upcoming crypto userspace configuration API needs
    to remove the spawns on top of an algorithm, so export
    crypto_remove_spawns.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • The upcoming crypto user configuration API needs to identify
    crypto instances. This patch adds a flag that is set if the
    algorithm is an instance that is built from templates.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • Patch adds a 3-way parallel x86_64 assembly implementation of twofish as a
    new module. The new assembler functions process data in three-block chunks,
    improving cipher performance on out-of-order CPUs.

    Patch has been tested with tcrypt and automated filesystem tests.

    Summary of the tcrypt benchmarks:

    Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
    encrypt: 1.3x speed
    decrypt: 1.3x speed

    Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
    encrypt: 1.07x speed
    decrypt: 1.4x speed

    Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
    encrypt: 1.4x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
    encrypt: 1.0x speed
    decrypt: 1.0x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
    encrypt: 0.84x speed
    decrypt: 1.09x speed

    Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
    encrypt: 1.15x speed

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt

    Tests were run on:
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 10
    model name : AMD Phenom(tm) II X6 1055T Processor

    Userspace tests were also run on:
    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
    stepping : 11

    Userspace test results:

    Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
    encrypt: 1.27x
    decrypt: 1.25x

    Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
    encrypt: 1.36x
    decrypt: 1.36x

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • This is needed by the 3-way twofish patch to be able to easily use the
    one-block assembler functions. As the glue code is shared between
    i586/x86_64, apply the change to the i586 assembler too. Also export the
    assembler functions for the 3-way parallel twofish module.

    CC: Joachim Fritschi
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • This patch adds improved F-macro for 4-way parallel functions. With new
    F-macro for 4-way parallel functions, blowfish sees ~15% improvement in
    speed tests on AMD Phenom II (~5% on Intel Xeon E7330).

    However, when used in the 1-way blowfish function the new macro would be
    ~10% slower than the original, so the old F-macro is kept for 1-way
    functions. Patch cleans up the old F-macro as it is no longer needed in
    the 4-way part.

    Patch also does register macro renaming to reduce stack usage.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
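
The three-block chunking described in the twofish 3-way commit above can be
sketched in C as follows. This is a minimal illustration, not the kernel code:
toy_encrypt_1way, toy_encrypt_3way, and toy_ecb_encrypt are hypothetical names,
and the "cipher" is a plain XOR stand-in so the example stays short. It shows
only the dispatch pattern, assuming ECB mode: full groups of three blocks go to
the 3-way routine, and any remainder falls back to the one-block routine.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16 /* twofish operates on 128-bit blocks */

/* Hypothetical one-block routine; XOR stands in for the real cipher. */
static void toy_encrypt_1way(uint8_t key, uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = src[i] ^ key;
}

/*
 * Hypothetical 3-way routine: handles three independent blocks per call.
 * A real assembly version would interleave the three blocks' instructions
 * so an out-of-order CPU can overlap their dependency chains.
 */
static void toy_encrypt_3way(uint8_t key, uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < 3 * BLOCK_SIZE; i++)
        dst[i] = src[i] ^ key;
}

/* ECB over nblocks blocks; returns how many 3-way calls were made. */
static size_t toy_ecb_encrypt(uint8_t key, uint8_t *dst, const uint8_t *src,
                              size_t nblocks)
{
    size_t calls_3way = 0;

    /* Fast path: consume three blocks at a time. */
    while (nblocks >= 3) {
        toy_encrypt_3way(key, dst, src);
        dst += 3 * BLOCK_SIZE;
        src += 3 * BLOCK_SIZE;
        nblocks -= 3;
        calls_3way++;
    }

    /* Tail: at most two leftover blocks, one at a time. */
    while (nblocks > 0) {
        toy_encrypt_1way(key, dst, src);
        dst += BLOCK_SIZE;
        src += BLOCK_SIZE;
        nblocks--;
    }

    return calls_3way;
}
```

With seven input blocks, this sketch makes two 3-way calls and one 1-way call.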
     

20 Oct, 2011

1 commit


22 Sep, 2011

5 commits

  • Include the header that declares crypto_aes_encrypt_x86
    and crypto_aes_decrypt_x86 to quiet the sparse noise:

    warning: symbol 'crypto_aes_encrypt_x86' was not declared. Should it be static?
    warning: symbol 'crypto_aes_decrypt_x86' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Acked-by: Mandeep Singh Baines
    Signed-off-by: Herbert Xu

    H Hartley Sweeten
     
  • Patch adds x86_64 assembly implementation of blowfish. Two sets of assembler
    functions are provided. The first set is the regular 'one block at a time'
    encrypt/decrypt functions. The second is 'four blocks at a time' functions
    that gain performance on out-of-order CPUs. On in-order CPUs, performance
    of the 4-way functions should be equal to that of the 1-way functions.

    Summary of the tcrypt benchmarks:

    Blowfish assembler vs blowfish C (256bit 8kb block ECB)
    encrypt: 2.2x speed
    decrypt: 2.3x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CBC)
    encrypt: 1.12x speed
    decrypt: 2.5x speed

    Blowfish assembler vs blowfish C (256bit 8kb block CTR)
    encrypt: 2.5x speed

    Full output:
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
    http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt

    Tests were run on:
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 10
    model name : AMD Phenom(tm) II X6 1055T Processor
    stepping : 0

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Add ctr(blowfish) speed test to receive results for blowfish x86_64 assembly
    patch.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Rename blowfish to blowfish_generic so that assembler versions of blowfish
    cipher can autoload. Module alias 'blowfish' is added.

    Also fix checkpatch warnings.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Patch splits up the blowfish crypto routine into a common part (key setup)
    which will be used by blowfish crypto modules (x86_64 assembly and generic-c).

    Also fixes errors/warnings reported by checkpatch.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
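
The gap between CBC encrypt (1.12x) and CBC decrypt (2.5x) in the blowfish
benchmarks above is consistent with how CBC is defined: each ciphertext block
is computed as C[i] = E(P[i] ^ C[i-1]), so encryption is a serial chain, while
decryption computes P[i] = D(C[i]) ^ C[i-1], where every D(C[i]) is independent
and can be batched. The sketch below illustrates that difference under stated
assumptions: toy_enc and toy_dec are a hypothetical invertible toy function on
64-bit blocks (blowfish's block size), not blowfish itself, and the buffers
must not overlap.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned r) { return (x << r) | (x >> (64 - r)); }
static uint64_t rotr64(uint64_t x, unsigned r) { return (x >> r) | (x << (64 - r)); }

/* Hypothetical toy block function and its inverse (NOT a real cipher). */
static uint64_t toy_enc(uint64_t k, uint64_t p) { return rotl64(p ^ k, 13); }
static uint64_t toy_dec(uint64_t k, uint64_t c) { return rotr64(c, 13) ^ k; }

/* CBC encryption: block i needs ciphertext i-1, so this loop is serial. */
static void toy_cbc_encrypt(uint64_t k, uint64_t iv, uint64_t *c,
                            const uint64_t *p, size_t n)
{
    uint64_t prev = iv;

    for (size_t i = 0; i < n; i++) {
        prev = toy_enc(k, p[i] ^ prev);
        c[i] = prev;
    }
}

/* CBC decryption: the four toy_dec() calls per group do not depend on
 * each other, which is what a 4-way routine can exploit. */
static void toy_cbc_decrypt(uint64_t k, uint64_t iv, uint64_t *p,
                            const uint64_t *c, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t d0 = toy_dec(k, c[i]);
        uint64_t d1 = toy_dec(k, c[i + 1]);
        uint64_t d2 = toy_dec(k, c[i + 2]);
        uint64_t d3 = toy_dec(k, c[i + 3]);

        p[i]     = d0 ^ (i ? c[i - 1] : iv);
        p[i + 1] = d1 ^ c[i];
        p[i + 2] = d2 ^ c[i + 1];
        p[i + 3] = d3 ^ c[i + 2];
    }

    /* Remaining blocks, one at a time. */
    for (; i < n; i++)
        p[i] = toy_dec(k, c[i]) ^ (i ? c[i - 1] : iv);
}
```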
     

20 Aug, 2011

1 commit


16 Aug, 2011

1 commit

  • On Tue, Aug 16, 2011 at 03:22:34PM +1000, Stephen Rothwell wrote:
    >
    > After merging the final tree, today's linux-next build (powerpc
    > allyesconfig) produced this warning:
    >
    > In file included from security/integrity/ima/../integrity.h:16:0,
    > from security/integrity/ima/ima.h:27,
    > from security/integrity/ima/ima_policy.c:20:
    > include/crypto/sha.h:86:10: warning: 'struct shash_desc' declared inside parameter list
    > include/crypto/sha.h:86:10: warning: its scope is only this definition or declaration, which is probably not what you want
    >
    > Introduced by commit 7c390170b493 ("crypto: sha1 - export sha1_update for
    > reuse"). I guess you need to include crypto/hash.h in crypto/sha.h.

    This patch fixes this by providing a declaration for struct shash_desc.

    Reported-by: Stephen Rothwell
    Signed-off-by: Herbert Xu

    Herbert Xu
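
The warning quoted above appears when a struct tag is first seen inside a
function prototype's parameter list: the compiler then treats it as a new type
whose scope ends with that prototype. A forward declaration at file scope
avoids this without pulling in the full header. The sketch below shows the
pattern with hypothetical names (demo_desc, demo_update); it is an
illustration of the technique, not the contents of include/crypto/sha.h.

```c
#include <stddef.h>

/* Forward declaration: makes the tag visible at file scope, so the
 * prototype below refers to this type rather than declaring a new one. */
struct demo_desc;

int demo_update(struct demo_desc *desc, const unsigned char *data, size_t len);

/* The full definition can come later, typically from another header. */
struct demo_desc {
    size_t total;
};

int demo_update(struct demo_desc *desc, const unsigned char *data, size_t len)
{
    (void)data; /* a real implementation would hash the data */
    desc->total += len;
    return 0;
}
```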
     

15 Aug, 2011

1 commit


10 Aug, 2011

6 commits

  • This is an assembler implementation of the SHA1 algorithm using the
    Supplemental SSE3 (SSSE3) instructions or, when available, the
    Advanced Vector Extensions (AVX).

    Testing with the tcrypt module shows the raw hash performance is up to
    2.3 times faster than the C implementation, using 8k data blocks on a
    Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
    faster.

    Since this implementation uses SSE/YMM registers it cannot safely be
    used in every situation, e.g. while an IRQ interrupts a kernel thread.
    The implementation falls back to the generic SHA1 variant, if using
    the SSE/YMM registers is not possible.

    With this algorithm I was able to increase the throughput of a single
    IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
    the SSSE3 variant -- a speedup of +34.8%.

    Saving and restoring SSE/YMM state might make the actual throughput
    fluctuate when there are FPU-intensive userland applications running.
    For example, measuring the performance using iperf2 directly on the
    machine under test gives wobbling numbers because iperf2 uses the FPU
    for each packet to check if the reporting interval has expired (in the
    above test I got min/max/avg: 402/484/464 MBit/s).

    Using this algorithm on an IPsec gateway gives much more reasonable and
    stable numbers, albeit not as high as in the directly connected case.
    Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
    FTB-8510:

    frame size   sha1-generic     sha1-ssse3    delta
       64 byte    37.5 MBit/s    37.5 MBit/s     0.0%
      128 byte    56.3 MBit/s    62.5 MBit/s   +11.0%
      256 byte    87.5 MBit/s   100.0 MBit/s   +14.3%
      512 byte   131.3 MBit/s   150.0 MBit/s   +14.2%
     1024 byte   162.5 MBit/s   193.8 MBit/s   +19.3%
     1280 byte   175.0 MBit/s   212.5 MBit/s   +21.4%
     1420 byte   175.0 MBit/s   218.7 MBit/s   +25.0%
     1518 byte   150.0 MBit/s   181.2 MBit/s   +20.8%

    The throughput for the largest frame size is lower than for the
    previous size because the IP packets need to be fragmented in this
    case to make their way through the IPsec tunnel.

    Signed-off-by: Mathias Krause
    Cc: Maxim Locktyukhin
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • Export the update function as crypto_sha1_update() to not have the need
    to reimplement the same algorithm for each SHA-1 implementation. This
    way the generic SHA-1 implementation can be used as fallback for other
    implementations that fail to run under certain circumstances, like the
    need for an FPU context while executing in IRQ context.

    Signed-off-by: Mathias Krause
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • The completion callback will free the request so we must remove it from
    the completion list before calling the callback.

    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
     
  • Allow the crypto engines to be matched from device tree bindings.

    Cc: devicetree-discuss@lists.ozlabs.org
    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
     
  • For using the device tree probing we use a connection ID for the
    clk_get() operation.

    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
     
  • Use a platform ID table and a single platform_driver. It's neater and
    makes the device tree addition easier and more consistent. Rename the
    match values to be inline with what they'll be in the device tree
    bindings. There aren't any current in-tree users of the existing device
    names.

    Cc: Herbert Xu
    Signed-off-by: Jamie Iles
    Signed-off-by: Herbert Xu

    Jamie Iles
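
The ordering fix described above (remove the request from the completion list
before calling a callback that frees it) can be sketched as follows. This is a
minimal userspace illustration with hypothetical names (demo_req,
demo_run_completions, demo_push) and a simple singly linked list; it does not
reproduce the driver's actual data structures.

```c
#include <stdlib.h>

struct demo_req {
    struct demo_req *next;
    void (*complete)(struct demo_req *req);
};

static int demo_completed; /* counts callbacks, for demonstration only */

/* Completion callback that frees the request, as in the commit message. */
static void demo_complete_and_free(struct demo_req *req)
{
    demo_completed++;
    free(req);
}

/* Drain the completion list. Each request is unlinked while it is still
 * valid; after the callback returns, the request is never touched again. */
static void demo_run_completions(struct demo_req **head)
{
    while (*head) {
        struct demo_req *req = *head;

        *head = req->next;  /* unlink first */
        req->next = NULL;
        req->complete(req); /* may free req */
    }
}

/* Helper: allocate a request and push it onto the list. */
static struct demo_req *demo_push(struct demo_req **head)
{
    struct demo_req *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    req->complete = demo_complete_and_free;
    req->next = *head;
    *head = req;
    return req;
}
```

Calling the callback first and reading req->next afterwards would dereference
freed memory, which is the bug class the commit addresses.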
     

03 Aug, 2011

1 commit

  • When loading aes via the module alias, a padlock module failing to
    load due to missing hardware is not particularly notable. With
    v2.6.27-rc1~1107^2~14 (crypto: padlock - Make module loading quieter
    when hardware isn't available, 2008-07-03), the padlock-aes module
    suppresses the relevant messages when the "quiet" flag is in use; but
    better to suppress this particular message completely, since the
    administrator can already distinguish such errors by the absence of a
    message indicating initialization failing or succeeding.

    This avoids occasional messages in syslog of the form

    padlock_aes: VIA PadLock not detected.

    Signed-off-by: Jonathan Nieder
    Signed-off-by: Herbert Xu

    Jonathan Nieder
     

02 Aug, 2011

14 commits

  • exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which
    frees the memory.

    However select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
    so it continues to kill other tasks even if we have the oom-killed
    task freeing its memory.

    Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
    the tasks which have already passed exit_notify() to ensure a zombie
    with TIF_MEMDIE set can't block oom-killer. Alternatively we could
    probably clear TIF_MEMDIE after exit_mmap().

    Signed-off-by: Oleg Nesterov
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (23 commits)
    regulator: Improve WM831x DVS VSEL selection algorithm
    regulator: Bootstrap wm831x DVS VSEL value from ON VSEL if not already set
    regulator: Set up GPIO for WM831x VSEL before enabling VSEL mode
    regulator: Add EPEs to the MODULE_ALIAS() for wm831x-dcdc
    regulator: Fix WM831x DCDC DVS VSEL bootstrapping
    regulator: Fix WM831x regulator ID lookups for multiple WM831xs
    regulator: Fix argument format type errors in error prints
    regulator: Fix memory leak in set_machine_constraints() error paths
    regulator: Make core more chatty about some errors
    regulator: tps65910: Fix array access out of bounds bug
    regulator: tps65910: Add missing breaks in switch/case
    regulator: tps65910: Fix a memory leak in tps65910_probe error path
    regulator: TWL: Remove entry of RES_ID for 6030 macros
    ASoC: tlv320aic3x: Add correct hw registers to Line1 cross connect muxes
    regulator: Add basic per consumer debugfs
    regulator: Add rdev_crit() macro
    regulator: Refactor supply implementation to work as regular consumers
    regulator: Include the device name in the microamps_requested_ file
    regulator: Increase the limit on sysfs file names
    regulator: Properly register dummy regulator driver
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
    ext4: prevent memory leaks from ext4_mb_init_backend() on error path
    ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
    ext4: use ext4_msg() instead of printk in mballoc
    ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
    ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
    ext4: use the correct error exit path in ext4_init_inode_table()
    ext4: add missing kfree() on error return path in add_new_gdb()
    ext4: change umode_t in tracepoint headers to be an explicit __u16
    ext4: fix races in ext4_sync_parent()
    ext4: Fix overflow caused by missing cast in ext4_fallocate()
    ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
    ext4: simplify parameters of reserve_backup_gdb()
    ext4: simplify parameters of add_new_gdb()
    ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
    ext4: simplify journal handling in setup_new_group_blocks()
    ext4: let setup_new_group_blocks() set multiple bits at a time
    ext4: fix a typo in ext4_group_extend()
    ext4: let ext4_group_add_blocks() handle 0 blocks quickly
    ext4: let ext4_group_add_blocks() return an error code
    ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
    ...

    Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify
    the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
    function for the new simplified calling convention, while commit
    dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to
    indirect.c") moved the function to another file.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
    VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
    VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
    VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
    switch posix_acl_chmod() to umode_t
    switch posix_acl_from_mode() to umode_t
    switch posix_acl_equiv_mode() to umode_t *
    switch posix_acl_create() to umode_t *
    block: initialise bd_super in bdget()
    vfs: avoid call to inode_lru_list_del() if possible
    vfs: avoid taking inode_hash_lock on pipes and sockets
    vfs: conditionally call inode_wb_list_del()
    VFS: Fix automount for negative autofs dentries
    Btrfs: load the key from the dir item in readdir into a fake dentry
    devtmpfs: missing initialialization in never-hit case
    hppfs: missing include

    Linus Torvalds
     
  • * 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (37 commits)
    Improve slave/cyclic DMA engine documentation
    dmaengine: pl08x: handle the rest of enums in pl08x_width
    DMA: PL08x: cleanup selection of burst size
    DMA: PL08x: avoid recalculating cctl at each prepare
    DMA: PL08x: cleanup selection of buswidth
    DMA: PL08x: constify plchan->cd and plat->slave_channels
    DMA: PL08x: separately store source/destination cctl
    DMA: PL08x: separately store source/destination slave address
    DMA: PL08x: clean up LLI debugging
    DMA: PL08x: select LLI bus only once per LLI setup
    DMA: PL08x: remove unused constants
    ARM: mxs-dma: reset after disable channel
    dma: intel_mid_dma: remove redundant pci_set_drvdata calls
    dma: mxs-dma: fix unterminated platform_device_id table
    dmaengine: pl330: make platform data optional
    dmaengine: imx-sdma: return proper error if kzalloc fails
    pch_dma: Fix CTL register access issue
    dmaengine: mxs-dma: skip request_irq for NO_IRQ
    dmaengine/coh901318: fix slave submission semantics
    dmaengine/ste_dma40: allow memory buswidth/burst to be configured
    ...

    Fix trivial whitespace conflict in drivers/dma/mv_xor.c

    Linus Torvalds
     
  • * 'gpiolib' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] Hook up gpiolib support

    Linus Torvalds
     
  • * 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    efivars: Introduce PSTORE_EFI_ATTRIBUTES
    efivars: Use string functions in pstore_write
    efivars: introduce utf16_strncmp
    efivars: String functions
    efi: Add support for using efivars as a pstore backend
    pstore: Allow the user to explicitly choose a backend
    pstore: Make "part" unsigned
    pstore: Add extra context for writes and erases
    pstore: Extend API for more flexibility in new backends

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
    kdb,kgdb: Allow arbitrary kgdb magic knock sequences
    kdb: Remove all references to DOING_KGDB2
    kdb,kgdb: Implement switch and pass buffer from kdb -> gdb
    kdb: cleanup unused variables missed in the original kdb merge

    Linus Torvalds
     
  • In ext4_mb_init(), if the s_locality_group allocation fails it will
    currently cause the allocations made in ext4_mb_init_backend() to
    be leaked. Moving the ext4_mb_init_backend() allocation after the
    s_locality_group allocation avoids that problem.

    Signed-off-by: Yu Jian
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Yu Jian
     
  • Signed-off-by: Yu Jian
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Yu Jian
     
  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The first packet that gdb sends when the kernel is in kdb mode seems
    to change with every release of gdb. Instead of continuing to add
    many different gdb packets, change kdb to automatically look for
    anything that looks like a gdb packet.

    Example 1 cold start test:
    echo g > /proc/sysrq-trigger
    $D#44+

    Example 2 cold start test:
    echo g > /proc/sysrq-trigger
    $3#33

    The second one should re-enter kdb's shell right away and is purely a
    test.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • The DOING_KGDB2 was originally a state variable for one of the two
    ways to automatically transition from kdb to kgdb. Purge all these
    variables and just use one single state for the transition.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • When switching from kdb mode to kgdb mode, packets were getting lost
    depending on the size of the fifo queue of the serial chip. When gdb
    initially connects while the kernel is in kdb mode, kdb should hand its
    entire character buffer over to the gdbstub when switching connections.

    Previously kdb was zero'ing out the character buffer and this could
    lead to gdb failing to connect at all, or a lengthy pause could occur
    on the initial connect.

    Signed-off-by: Jason Wessel

    Jason Wessel
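
The two test strings in the "magic knock" commit above, $D#44+ and $3#33,
follow the gdb remote serial protocol framing: a '$', the payload, a '#', and
two hex digits holding the payload's byte sum modulo 256 ('D' is 0x44 and '3'
is 0x33). A check for "anything that looks like a gdb packet" could be
sketched as below. looks_like_gdb_packet and hex_val are hypothetical names,
and this is an illustration of the framing rule, not the detection logic kdb
actually uses.

```c
#include <stddef.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F')
        return ch - 'A' + 10;
    return -1;
}

/* Return 1 if buf starts with a well-formed "$payload#xx" packet whose
 * checksum matches, 0 otherwise. Trailing bytes (such as '+') are ignored. */
static int looks_like_gdb_packet(const char *buf)
{
    unsigned char sum = 0;
    const char *p;
    int hi, lo;

    if (buf[0] != '$')
        return 0;

    /* Sum the payload bytes; unsigned char arithmetic wraps modulo 256. */
    for (p = buf + 1; *p && *p != '#'; p++)
        sum += (unsigned char)*p;
    if (*p != '#')
        return 0;

    hi = hex_val(p[1]);
    if (hi < 0)
        return 0; /* also stops here if the string ends after '#' */
    lo = hex_val(p[2]);
    if (lo < 0)
        return 0;

    return ((hi << 4) | lo) == sum;
}
```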