05 Oct, 2012

1 commit

  • Pull crypto update from Herbert Xu:
    - Optimised AES/SHA1 for ARM.
    - IPsec ESN support in talitos and caam.
    - x86_64/avx implementation of cast5/cast6.
    - Add/use multi-algorithm registration helpers where possible.
    - Added IBM Power7+ in-Nest support.
    - Misc fixes.

    Fix up trivial conflicts in crypto/Kconfig due to the sparc64 crypto
    config options being added next to the new ARM ones.

    [ Side note: cut-and-paste duplicate help texts make those conflicts
    harder to read than necessary, thanks to git being smart about
    minimizing conflicts and maximizing the common parts... ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
    crypto: x86/glue_helper - fix storing of new IV in CBC encryption
    crypto: cast5/avx - fix storing of new IV in CBC encryption
    crypto: tcrypt - add missing tests for camellia and ghash
    crypto: testmgr - make test_aead also test 'dst != src' code paths
    crypto: testmgr - make test_skcipher also test 'dst != src' code paths
    crypto: testmgr - add test vectors for CTR mode IV increasement
    crypto: testmgr - add test vectors for partial ctr(cast5) and ctr(cast6)
    crypto: testmgr - allow non-multi page and multi page skcipher tests from same test template
    crypto: caam - increase TRNG clocks per sample
    crypto, tcrypt: remove local_bh_disable/enable() around local_irq_disable/enable()
    crypto: tegra-aes - fix error return code
    crypto: crypto4xx - fix error return code
    crypto: hifn_795x - fix error return code
    crypto: ux500 - fix error return code
    crypto: caam - fix error IDs for SEC v5.x RNG4
    hwrng: mxc-rnga - Access data via structure
    hwrng: mxc-rnga - Adapt clocks to new i.mx clock framework
    crypto: caam - add IPsec ESN support
    crypto: 842 - remove .cra_list initialization
    Revert "[CRYPTO] cast6: inline bloat--"
    ...

    Linus Torvalds
     

03 Oct, 2012

3 commits

  • Asking for this option on x86 seems a bit pointless.

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     
  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increase the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) fewer segments for the driver to process, b) fewer calls to the
    page allocator, and c) less wasted space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.
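
The fallback behaviour described in item 10 can be pictured with a small Python model. This is only a sketch of the idea, not the kernel code: `try_alloc`, `alloc_fragment_page` and `flaky_allocator` are hypothetical names, and a 4096-byte page size is assumed so that order 3 corresponds to 32K.

```python
PAGE_SIZE = 4096   # assumed page size for this illustration
MAX_ORDER = 3      # 2**3 pages = 32K, the upper bound named in item 10


def alloc_fragment_page(try_alloc, max_order=MAX_ORDER):
    """Try the largest order first, then fall back to smaller ones.

    try_alloc(order) is a hypothetical stand-in for the page allocator:
    it returns a buffer of PAGE_SIZE << order bytes, or None on failure.
    """
    for order in range(max_order, -1, -1):
        buf = try_alloc(order)
        if buf is not None:
            return order, buf
    raise MemoryError("no pages available")


def flaky_allocator(order):
    # Simulate fragmented memory: only order 0 and order 1 succeed.
    return bytearray(PAGE_SIZE << order) if order <= 1 else None


order, buf = alloc_fragment_page(flaky_allocator)
```

With the simulated allocator, the order-3 and order-2 attempts fail and the request is satisfied at order 1, so the caller still gets a usable (smaller) fragment buffer instead of an error.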

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     
  • Pull sparc updates from David Miller:
    "Largely this is simply adding support for the Niagara 4 cpu.

    Major areas are perf events (chip now supports 4 counters and can
    monitor any event on each counter), crypto (opcodes are available for
    sha1, sha256, sha512, md5, crc32c, AES, DES, CAMELLIA, and Kasumi
    although the last is unsupported since we lack a generic crypto layer
    Kasumi implementation), and an optimized memcpy.

    Finally some cleanups by Peter Senna Tschudin."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (47 commits)
    sparc64: Fix trailing whitespace in NG4 memcpy.
    sparc64: Fix comment type in NG4 copy from user.
    sparc64: Add SPARC-T4 optimized memcpy.
    drivers/sbus/char: removes unnecessary semicolon
    arch/sparc/kernel/pci_sun4v.c: removes unnecessary semicolon
    sparc64: Fix function argument comment in camellia_sparc64_key_expand asm.
    sparc64: Fix IV handling bug in des_sparc64_cbc_decrypt
    sparc64: Add auto-loading mechanism to crypto-opcode drivers.
    sparc64: Add missing pr_fmt define to crypto opcode drivers.
    sparc64: Adjust crypto priorities.
    sparc64: Use cpu_pgsz_mask for linear kernel mapping config.
    sparc64: Probe cpu page size support more portably.
    sparc64: Support 2GB and 16GB page sizes for kernel linear mappings.
    sparc64: Fix bugs in unrolled 256-bit loops.
    sparc64: Avoid code duplication in crypto assembler.
    sparc64: Unroll CTR crypt loops in AES driver.
    sparc64: Unroll ECB decryption loops in AES driver.
    sparc64: Unroll ECB encryption loops in AES driver.
    sparc64: Add ctr mode support to AES driver.
    sparc64: Move AES driver over to a methods based implementation.
    ...

    Linus Torvalds
     

27 Sep, 2012

7 commits


15 Sep, 2012

1 commit

  • Conflicts:
    net/netfilter/nfnetlink_log.c
    net/netfilter/xt_LOG.c

    Rather easy conflict resolution: the 'net' tree had bug fixes to make
    sure we check whether a socket is a time-wait one and elide the
    logging code if so.

    Whereas on the 'net-next' side we are calculating the UID and GID from
    the creds using different interfaces due to the user namespace changes
    from Eric Biederman.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Sep, 2012

2 commits

  • The authenc code doesn't deal with zero-length associated data
    correctly and ends up constructing a zero-length sg entry which
    causes a crash when it's fed into the crypto system.

    This patch fixes this by avoiding the code-path that triggers
    the SG construction if we have no associated data.

    This isn't the optimal fix, as it means that we'll end up
    using the fallback code-path even when we could still execute
    the digest function. However, this isn't a big deal as nobody
    but the test path would supply zero-length associated data.
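
The shape of this fix can be sketched in Python. This is a toy model under stated assumptions, not the authenc code: `build_sg` and `plan_hash_input` are hypothetical helpers, and the path labels "chained" and "fallback" are invented for the illustration.

```python
def build_sg(*buffers):
    """Hypothetical scatterlist builder that, like the crash described
    above, cannot tolerate zero-length entries."""
    for buf in buffers:
        if len(buf) == 0:
            raise ValueError("zero-length sg entry")
    return list(buffers)


def plan_hash_input(assoc, ciphertext):
    """Decide how to feed data to the digest.

    With no associated data, skip the path that would build an sg entry
    for it and take the fallback path instead.
    """
    if len(assoc) == 0:
        return "fallback", build_sg(ciphertext)
    return "chained", build_sg(assoc, ciphertext)
```

Calling `plan_hash_input(b"", b"ct")` returns the fallback plan without ever constructing an empty entry, which mirrors the trade-off described above: correctness first, at the cost of not using the faster path in a case that only test code is expected to hit.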

    Reported-by: Romain Francoise
    Signed-off-by: Herbert Xu
    Tested-by: Romain Francoise

    Herbert Xu
     
  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers to portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 Sep, 2012

1 commit


07 Sep, 2012

4 commits

  • .cra_list initialization is unneeded and has been removed from all other
    crypto modules except 842.

    Cc: Robert Jennings
    Signed-off-by: Jussi Kivilinna
    Acked-by: Seth Jennings
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • This reverts commit e6ccc727f30a02670f6a00df6d548942bc988f43.

    The above commit caused a performance regression for CAST6. Reverting it
    gives the following increase in tcrypt speed tests (revert-vs-old ratios).

    AMD Phenom II X6 1055T, x86-64:

    size   ecb           cbc           ctr           lrw           xts
           enc    dec    enc    dec    enc    dec    enc    dec    enc    dec
    16b    1.15x  1.17x  1.16x  1.17x  1.16x  1.16x  1.14x  1.19x  1.05x  1.07x
    64b    1.19x  1.23x  1.20x  1.22x  1.19x  1.19x  1.16x  1.24x  1.12x  1.12x
    256b   1.21x  1.24x  1.22x  1.24x  1.20x  1.20x  1.17x  1.21x  1.16x  1.14x
    1kb    1.21x  1.25x  1.22x  1.24x  1.21x  1.21x  1.18x  1.22x  1.17x  1.15x
    8kb    1.21x  1.25x  1.22x  1.24x  1.21x  1.21x  1.18x  1.22x  1.18x  1.15x

    Cc: Ilpo Järvinen
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Fix "symbol 'x' was not declared. Should it be static?" sparse warnings.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Add assembler versions of AES and SHA1 for ARM platforms. This has provided
    up to a 50% improvement in IPsec/TCP throughput for tunnels using AES128/SHA1.

    Platform   CPU Speed   Endian   Before (bps)   After (bps)   Improvement

    IXP425     533 MHz     big      11217042       15566294      ~38%
    KS8695     166 MHz     little   3828549        5795373       ~51%
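
The improvement column can be reproduced from the before/after figures. A short Python check, using only the numbers in the table (the variable and function names are just for this example):

```python
# (before_bps, after_bps) as reported in the table above
results = {
    "IXP425": (11217042, 15566294),
    "KS8695": (3828549, 5795373),
}


def improvement_percent(before, after):
    """Relative throughput gain, in percent."""
    return (after - before) / before * 100.0


improvements = {
    name: improvement_percent(before, after)
    for name, (before, after) in results.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}%")
```

The computed gains come out slightly above the quoted figures, which suggests the percentages in the table were rounded down.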

    Signed-off-by: David McCullough
    Signed-off-by: Herbert Xu

    David McCullough
     

29 Aug, 2012

1 commit


26 Aug, 2012

1 commit


23 Aug, 2012

2 commits


21 Aug, 2012

4 commits


20 Aug, 2012

1 commit

  • …NI hardware pipelines

    Use parallel LRW and XTS encryption facilities to better utilize AES-NI
    hardware pipelines and gain extra performance.

    Tcrypt benchmark results (async), old vs new ratios:

    Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7)

    aes:128bit
    lrw:256bit xts:256bit
    size    lrw-enc  lrw-dec  xts-enc  xts-dec
    16B     0.99x    1.00x    1.22x    1.19x
    64B     1.38x    1.50x    1.58x    1.61x
    256B    2.04x    2.02x    2.27x    2.29x
    1024B   2.56x    2.54x    2.89x    2.92x
    8192B   2.85x    2.99x    3.40x    3.23x

    aes:192bit
    lrw:320bit xts:384bit
    size    lrw-enc  lrw-dec  xts-enc  xts-dec
    16B     1.08x    1.08x    1.16x    1.17x
    64B     1.48x    1.54x    1.59x    1.65x
    256B    2.18x    2.17x    2.29x    2.28x
    1024B   2.67x    2.67x    2.87x    3.05x
    8192B   2.93x    2.84x    3.28x    3.33x

    aes:256bit
    lrw:384bit xts:512bit
    size    lrw-enc  lrw-dec  xts-enc  xts-dec
    16B     1.07x    1.07x    1.18x    1.19x
    64B     1.56x    1.56x    1.70x    1.71x
    256B    2.22x    2.24x    2.46x    2.46x
    1024B   2.76x    2.77x    3.13x    3.05x
    8192B   2.99x    3.05x    3.40x    3.30x

    Cc: Huang Ying <ying.huang@intel.com>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

    Jussi Kivilinna
     

01 Aug, 2012

12 commits

  • This patch adds the 842 cryptographic API driver that
    submits compression requests to the 842 hardware compression
    accelerator driver (nx-compress).

    If the hardware accelerator goes offline for any reason
    (dynamic disable, migration, etc...), this driver will use LZO
    as a software failover for all future compression requests.
    For decompression requests, the 842 hardware driver contains
    a software implementation of the 842 decompressor to support
    the decompression of data that was compressed before the accelerator
    went offline.
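
The failover policy described above can be modelled in a few lines of Python. This is a sketch only: `Accelerator` and `FailoverCompressor` are hypothetical classes, and `zlib` stands in for both the hardware 842 format and the LZO software path, since neither is available in the Python standard library.

```python
import zlib


class Accelerator:
    """Hypothetical stand-in for the hardware compressor."""

    def __init__(self):
        self.online = True

    def compress(self, data):
        if not self.online:
            raise OSError("accelerator offline")
        return zlib.compress(data)  # placeholder for the hardware format


class FailoverCompressor:
    """Use the accelerator until it fails once, then stay on software."""

    def __init__(self, hw):
        self.hw = hw
        self.use_software = False

    def compress(self, data):
        if not self.use_software:
            try:
                return "hw", self.hw.compress(data)
            except OSError:
                # Sticky switch: all future requests go to software.
                self.use_software = True
        return "sw", zlib.compress(data, 1)  # placeholder for LZO
```

Each result is tagged with the path that produced it because the two paths emit different formats; that is also why the text above keeps a software decompressor around for data compressed before the accelerator went offline.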

    Signed-off-by: Robert Jennings
    Signed-off-by: Seth Jennings
    Signed-off-by: Herbert Xu

    Seth Jennings
     
  • This patch adds an x86_64/avx assembler implementation of the Cast6 block
    cipher. The implementation processes eight blocks in parallel (two 4-block
    chunk AVX operations). The table lookups are done in general-purpose registers.
    For small block sizes the functions from the generic module are called. A good
    performance increase is provided for block sizes greater than or equal to 128B.
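
As a rough illustration of the dispatch described above (not the actual assembler glue code), the following Python sketch splits a request into eight-block wide calls and single-block leftovers. The 16-byte block size is CAST6's 128-bit block; the function and constant names are made up for the example.

```python
BLOCK_SIZE = 16       # CAST6 block size in bytes (128 bits)
PARALLEL_BLOCKS = 8   # blocks handled per call on the wide path


def split_work(nbytes):
    """Return (wide_calls, single_block_calls) for an nbytes request.

    Wide calls model the eight-block AVX path; single-block calls model
    falling back to the generic implementation for the remainder.
    """
    nblocks = nbytes // BLOCK_SIZE
    wide_calls = nblocks // PARALLEL_BLOCKS
    single_calls = nblocks % PARALLEL_BLOCKS
    return wide_calls, single_calls


for size in (16, 64, 128, 256):
    print(size, split_work(size))
```

Under this model a request first fills a complete wide call at 8 x 16 = 128 bytes, which lines up with the threshold quoted above; smaller requests are served entirely by single-block calls.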

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    Intel Core i5-2500 CPU (fam:6, model:42, step:7)

    cast6-avx-x86_64 vs. cast6-generic
    128bit key: (lrw:256bit) (xts:256bit)
    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B     0.97x    1.00x    1.01x    1.01x    0.99x    0.97x    0.98x    1.01x    0.96x    0.98x
    64B     0.98x    0.99x    1.02x    1.01x    0.99x    1.00x    1.01x    0.99x    1.00x    0.99x
    256B    1.77x    1.84x    0.99x    1.85x    1.77x    1.77x    1.70x    1.74x    1.69x    1.72x
    1024B   1.93x    1.95x    0.99x    1.96x    1.93x    1.93x    1.84x    1.85x    1.89x    1.87x
    8192B   1.91x    1.95x    0.99x    1.97x    1.95x    1.91x    1.86x    1.87x    1.93x    1.90x

    256bit key: (lrw:384bit) (xts:512bit)
    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec  lrw-enc  lrw-dec  xts-enc  xts-dec
    16B     0.97x    0.99x    1.02x    1.01x    0.98x    0.99x    1.00x    1.00x    0.98x    0.98x
    64B     0.98x    0.99x    1.01x    1.00x    1.00x    1.00x    1.01x    1.01x    0.97x    1.00x
    256B    1.77x    1.83x    1.00x    1.86x    1.79x    1.78x    1.70x    1.76x    1.71x    1.69x
    1024B   1.92x    1.95x    0.99x    1.96x    1.93x    1.93x    1.83x    1.86x    1.89x    1.87x
    8192B   1.94x    1.95x    0.99x    1.97x    1.95x    1.95x    1.87x    1.87x    1.93x    1.91x

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • New ECB, CBC, CTR, LRW and XTS testvectors for cast6. We need larger
    testvectors to check parallel code paths in the optimized implementation. Tests
    have also been added to the tcrypt module.

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • Rename cast6 module to cast6_generic to allow autoloading of optimized
    implementations. Generic functions and s-boxes are exported to be able to use
    them within optimized implementations.

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • This patch adds an x86_64/avx assembler implementation of the Cast5 block
    cipher. The implementation processes sixteen blocks in parallel (four 4-block
    chunk AVX operations). The table lookups are done in general-purpose registers.
    For small block sizes the functions from the generic module are called. A good
    performance increase is provided for block sizes greater than or equal to 128B.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    Intel Core i5-2500 CPU (fam:6, model:42, step:7)

    cast5-avx-x86_64 vs. cast5-generic
    64bit key:
    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec
    16B     0.99x    0.99x    1.00x    1.00x    1.02x    1.01x
    64B     1.00x    1.00x    0.98x    1.00x    1.01x    1.02x
    256B    2.03x    2.01x    0.95x    2.11x    2.12x    2.13x
    1024B   2.30x    2.24x    0.95x    2.29x    2.35x    2.35x
    8192B   2.31x    2.27x    0.95x    2.31x    2.39x    2.39x

    128bit key:
    size    ecb-enc  ecb-dec  cbc-enc  cbc-dec  ctr-enc  ctr-dec
    16B     0.99x    0.99x    1.00x    1.00x    1.01x    1.01x
    64B     1.00x    1.00x    0.98x    1.01x    1.02x    1.01x
    256B    2.17x    2.13x    0.96x    2.19x    2.19x    2.19x
    1024B   2.29x    2.32x    0.95x    2.34x    2.37x    2.38x
    8192B   2.35x    2.32x    0.95x    2.35x    2.39x    2.39x

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • New ECB, CBC and CTR testvectors for cast5. We need larger testvectors to check
    parallel code paths in the optimized implementation. Tests have also been added
    to the tcrypt module.

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • Rename cast5 module to cast5_generic to allow autoloading of optimized
    implementations. Generic functions and s-boxes are exported to be able to use
    them within optimized implementations.

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried
     
  • Initialization of cra_list is currently mixed: most ciphers initialize this
    field and most shashes do not. Initialization, however, is not needed at all,
    since cra_list is initialized/overwritten in __crypto_register_alg() with
    list_add(). Therefore perform cleanup to remove all unneeded initializations
    of this field in 'crypto/'.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Combine all shash algs to be registered and use new crypto_[un]register_shashes
    functions. This simplifies init/exit code.
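
Why a bulk helper simplifies init/exit code can be seen in a small Python model of all-or-nothing registration. This is not the kernel API: `registry`, `register_one` and `register_many` are hypothetical names, and the rollback-on-failure behaviour shown here is my understanding of what such a helper is meant to provide.

```python
registry = set()


def register_one(name):
    """Register a single algorithm; fail on duplicates."""
    if name in registry:
        raise ValueError(f"{name} already registered")
    registry.add(name)


def register_many(names):
    """Register every name, or none of them.

    If any registration fails, the ones that already succeeded are
    rolled back in reverse order before the error is re-raised.
    """
    done = []
    try:
        for name in names:
            register_one(name)
            done.append(name)
    except ValueError:
        for name in reversed(done):
            registry.discard(name)
        raise
```

Without such a helper, each module's init function has to carry its own chain of error labels to undo partial registration; with it, init reduces to a single call on an array of algorithms, and exit to the matching bulk unregister.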

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Combine all shash algs to be registered and use new crypto_[un]register_shashes
    functions. This simplifies init/exit code.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Combine all shash algs to be registered and use new crypto_[un]register_shashes
    functions. This simplifies init/exit code.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     
  • Combine all shash algs to be registered and use new crypto_[un]register_shashes
    functions. This simplifies init/exit code.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna