26 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    fs: Merge split strings
    treewide: fix potentially dangerous trailing ';' in #defined values/expressions
    uwb: Fix misspelling of neighbourhood in comment
    net, netfilter: Remove redundant goto in ebt_ulog_packet
    trivial: don't touch files that are removed in the staging tree
    lib/vsprintf: replace link to Draft by final RFC number
    doc: Kconfig: `to be' -> `be'
    doc: Kconfig: Typo: square -> squared
    doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
    drivers/net: static should be at beginning of declaration
    drivers/media: static should be at beginning of declaration
    drivers/i2c: static should be at beginning of declaration
    XTENSA: static should be at beginning of declaration
    SH: static should be at beginning of declaration
    MIPS: static should be at beginning of declaration
    ARM: static should be at beginning of declaration
    rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
    Update my e-mail address
    PCIe ASPM: forcedly -> forcibly
    gma500: push through device driver tree
    ...

    Fix up trivial conflicts:
    - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
    - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
    - drivers/net/r8169.c (just context changes)

    Linus Torvalds
     

11 Jul, 2011

1 commit


30 Jun, 2011

1 commit


16 May, 2011

1 commit

  • Loading fpu without aesni-intel does nothing. Loading aesni-intel
    without fpu causes modes like xts to fail. (Unloading
    aesni-intel will restore those modes.)

    One solution would be to make aesni-intel depend on fpu, but it
    seems cleaner to just combine the modules.

    This is probably responsible for bugs like:
    https://bugzilla.redhat.com/show_bug.cgi?id=589390

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Herbert Xu

    Andy Lutomirski
     

28 Dec, 2010

1 commit


29 Nov, 2010

1 commit

  • Add missing dependency on NET since we require sockets for our
    interface.

    Should really be a select but kconfig doesn't like that:

    net/Kconfig:6:error: found recursive dependency: NET -> NETWORK_FILESYSTEMS -> AFS_FS -> AF_RXRPC -> CRYPTO -> CRYPTO_USER_API_HASH -> CRYPTO_USER_API -> NET

    Reported-by: Zimny Lech
    Signed-off-by: Herbert Xu

    Herbert Xu
     

27 Nov, 2010

1 commit

  • The AES-NI instructions are also available in legacy mode so the 32-bit
    architecture may profit from those, too.

    To illustrate the performance gain here's a short summary of a dm-crypt
    speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
    implementations:

    x86:                  i586       aes-ni    delta
    ECB, 256 bit:     93.8 MB/s   123.3 MB/s   +31.4%
    CBC, 256 bit:     84.8 MB/s   262.3 MB/s  +209.3%
    LRW, 256 bit:    108.6 MB/s   222.1 MB/s  +104.5%
    XTS, 256 bit:    105.0 MB/s   205.5 MB/s   +95.7%

    Additionally, due to some minor optimizations, the 64-bit version also
    got a minor performance gain as seen below:

    x86-64:           old impl.    new impl.    delta
    ECB, 256 bit:    121.1 MB/s   123.0 MB/s    +1.5%
    CBC, 256 bit:    285.3 MB/s   290.8 MB/s    +1.9%
    LRW, 256 bit:    263.7 MB/s   265.3 MB/s    +0.6%
    XTS, 256 bit:    251.1 MB/s   255.3 MB/s    +1.7%

    Signed-off-by: Mathias Krause
    Reviewed-by: Huang Ying
    Signed-off-by: Herbert Xu

    Mathias Krause
     

26 Nov, 2010

1 commit

  • This patch adds the af_alg plugin for symmetric key ciphers,
    corresponding to the ablkcipher kernel operation type.

    Keys can optionally be set through the setsockopt interface.

    Once a sendmsg call occurs without MSG_MORE no further writes
    may be made to the socket until all previous data has been read.

    IVs and whether encryption/decryption is performed can be
    set through the setsockopt interface or as a control message
    to sendmsg.

    The interface is completely synchronous; all operations are
    carried out in recvmsg(2) and will complete prior to the system
    call returning.

    The splice(2) interface supports reading the user-space data directly
    without copying (except that the Crypto API itself may copy the data
    if alignment is off).

    The recvmsg(2) interface supports directly writing to user-space
    without additional copying, i.e., the kernel crypto interface will
    receive the user-space address as its output SG list.

    Thanks to Miloslav Trmac for reviewing this and contributing
    fixes and improvements.

    Signed-off-by: Herbert Xu
    Acked-by: David S. Miller

    Herbert Xu
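
The skcipher flow described above (key via setsockopt, IV and direction as a control message to sendmsg, result read back with recv) can be sketched from user space with Python's `socket` module, which exposes AF_ALG on Linux. This is a minimal sketch, not part of the patch: the helper name `af_alg_aes_cbc` is hypothetical, the choice of `cbc(aes)` is an assumption, and the function returns `None` when AF_ALG or the algorithm is not available on the running system.

```python
import socket


def af_alg_aes_cbc(key, iv, data, decrypt=False):
    """Hypothetical helper: run cbc(aes) through AF_ALG.

    Returns the processed bytes, or None if AF_ALG (or the algorithm)
    is not available on this system.
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    op_type = socket.ALG_OP_DECRYPT if decrypt else socket.ALG_OP_ENCRYPT
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # The algorithm type string selects the af_alg plugin.
            alg.bind(("skcipher", "cbc(aes)"))
            # The key is set on the bound socket through setsockopt.
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
            op, _ = alg.accept()
            with op:
                # IV and direction travel as control messages to sendmsg.
                # No MSG_MORE is passed, so the data must be read back
                # before anything else is written to this socket.
                op.sendmsg_afalg([data], op=op_type, iv=iv)
                return op.recv(len(data))
    except OSError:
        return None


if __name__ == "__main__":
    key = bytes(range(16))
    iv = bytes(16)
    ciphertext = af_alg_aes_cbc(key, iv, b"sixteen byte msg")
    print(ciphertext.hex() if ciphertext is not None else "AF_ALG unavailable")
```

The input length is kept at a multiple of the AES block size because CBC through this interface does no padding of its own.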
     

19 Nov, 2010

2 commits

  • This patch adds the af_alg plugin for hash, corresponding to
    the ahash kernel operation type.

    Keys can optionally be set through the setsockopt interface.

    Each sendmsg call will finalise the hash unless sent with a MSG_MORE
    flag.

    Partial hash states can be cloned using accept(2).

    The interface is completely synchronous; all operations will
    complete prior to the system call returning.

    Both sendmsg(2) and splice(2) support reading the user-space
    data directly without copying (except that the Crypto API itself
    may copy the data if alignment is off).

    For now only the splice(2) interface supports performing digest
    instead of init/update/final. In future the sendmsg(2) interface
    will also be modified to use digest/finup where possible so that
    hardware that cannot return a partial hash state can still benefit
    from this interface.

    Thanks to Miloslav Trmac for reviewing this and contributing
    fixes and improvements.

    Signed-off-by: Herbert Xu
    Acked-by: David S. Miller
    Tested-by: Martin Willi

    Herbert Xu
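
As an illustration of the MSG_MORE behaviour described above, the following user-space sketch feeds a message in pieces and reads the digest once the final piece is sent without MSG_MORE. It uses Python's `socket` module on Linux; the helper name `af_alg_sha256` and the choice of `sha256` are assumptions for the example, and the function returns `None` where AF_ALG is unavailable.

```python
import hashlib
import socket


def af_alg_sha256(chunks):
    """Hypothetical helper: hash a non-empty list of byte chunks via AF_ALG.

    Returns the 32-byte digest, or None if AF_ALG (or sha256 through it)
    is not available on this system.
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))
            op, _ = alg.accept()
            with op:
                # Every chunk except the last carries MSG_MORE, so the
                # hash is not finalised yet.
                for chunk in chunks[:-1]:
                    op.sendall(chunk, socket.MSG_MORE)
                # The last send has no MSG_MORE and finalises the hash.
                op.sendall(chunks[-1])
                return op.recv(32)
    except OSError:
        return None


if __name__ == "__main__":
    digest = af_alg_sha256([b"a", b"bc"])
    if digest is None:
        print("AF_ALG unavailable")
    else:
        # Cross-check against the user-space implementation.
        print(digest == hashlib.sha256(b"abc").digest())
```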
     
  • This patch creates the backbone of the user-space interface for
    the Crypto API, through a new socket family AF_ALG.

    Each session corresponds to one or more connections obtained from
    that socket. The number depends on the number of inputs/outputs
    of that particular type of operation. For most types there will
    be a single connection/file descriptor that is used for both input
    and output. AEAD is one of the few that require two inputs.

    Each algorithm type will provide its own implementation that plugs
    into af_alg. They're keyed using a string such as "skcipher" or
    "hash".

    IOW this patch only contains the boring bits that are required
    to hold everything together.

    Thanks to Miloslav Trmac for reviewing this and contributing
    fixes and improvements.

    Signed-off-by: Herbert Xu
    Acked-by: David S. Miller
    Tested-by: Martin Willi

    Herbert Xu
     

12 Sep, 2010

1 commit

    Below is a patch to update the broken web addresses in crypto/*
    that I could locate. Some are just simple typos that needed to be
    fixed, and some had a change in location altogether.
    Let me know if any of them need to be changed.

    Signed-off-by: Justin P. Mattock
    Signed-off-by: Herbert Xu

    Justin P. Mattock
     

03 Sep, 2010

1 commit


06 Aug, 2010

2 commits

  • On Thu, Aug 05, 2010 at 07:01:21PM -0700, Linus Torvalds wrote:
    > On Thu, Aug 5, 2010 at 6:40 PM, Herbert Xu wrote:
    > >
    > > -config CRYPTO_MANAGER_TESTS
    > > - bool "Run algolithms' self-tests"
    > > - default y
    > > - depends on CRYPTO_MANAGER2
    > > +config CRYPTO_MANAGER_DISABLE_TESTS
    > > + bool "Disable run-time self tests"
    > > + depends on CRYPTO_MANAGER2 && EMBEDDED
    >
    > Why do you still want to force-enable those tests? I was going to
    > complain about the "default y" anyway, now I'm _really_ complaining,
    > because you've now made it impossible to disable those tests. Why?

    As requested, this patch sets the default to y and removes the
    EMBEDDED dependency.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • This patch fixes a serious bug in the test disabling patch where
    it can cause a spurious load of the cryptomgr module even when
    it's compiled in.

    It also negates the test disabling option so that its absence
    causes tests to be enabled.

    The Kconfig option is also now behind EMBEDDED.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

03 Jun, 2010

2 commits


29 Mar, 2010

1 commit


08 Mar, 2010

1 commit


05 Feb, 2010

1 commit


07 Jan, 2010

1 commit


27 Oct, 2009

1 commit


19 Oct, 2009

1 commit

  • PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
    carry-less multiplication. More information about PCLMULQDQ can be
    found at:

    http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

    Because PCLMULQDQ changes XMM state, its usage must be enclosed with
    kernel_fpu_begin/end, which can be used only in process context. The
    acceleration is therefore implemented as crypto_ahash; that is, requests
    in soft IRQ context will be deferred to the cryptd kernel thread.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

02 Sep, 2009

1 commit


20 Aug, 2009

1 commit

  • What about something like this? It defaults the CPRNG to m and makes FIPS
    dependent on the CPRNG. That way you get a module build by default, but you can
    change it to y manually during config and still satisfy the dependency, and if
    you select N it disables FIPS as well. I rather like that better than making
    FIPS a tristate. I just tested it out here and it seems to work well. Let me
    know what you think.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     

13 Aug, 2009

1 commit

  • This reverts commit 215ccd6f55a2144bd553e0a3d12e1386f02309fd.

    It causes CPRNG and everything selected by it to be built-in
    whenever FIPS is enabled. The problem is that it is selecting
    a tristate from a bool, which is usually not what is intended.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

06 Aug, 2009

2 commits

    Remove the dedicated GHASH implementation in GCM, and use the GHASH
    digest algorithm instead. This will make GCM use a hardware-accelerated
    GHASH implementation automatically if available.

    The ahash interface is used instead of shash, because some
    hardware-accelerated GHASH implementations need an asynchronous interface.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
  • GHASH is implemented as a shash algorithm. The actual implementation
    is copied from gcm.c. This makes it possible to add
    architecture/hardware-accelerated GHASH implementations.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

21 Jun, 2009

1 commit

    The ANSI CPRNG has no dependence on FIPS support. FIPS support, however,
    requires the use of the CPRNG. Adjust that dependency relationship in Kconfig.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     

19 Jun, 2009

1 commit


02 Jun, 2009

2 commits

    Because the kernel_fpu_begin() and kernel_fpu_end() operations are too
    slow, the performance gain of general mode implementation + aes-aesni
    is almost entirely offset.

    AES-NI support for more modes is implemented as follows:

    - Add a new AES algorithm implementation named __aes-aesni without
    kernel_fpu_begin/end()

    - Use fpu(<mode>(AES)) to provide the kernel_fpu_begin/end() invocation

    - Add <mode>(AES) ablkcipher, which uses cryptd(fpu(<mode>(AES))) to
    defer encryption/decryption to cryptd context when in soft_irq context.

    Support for ctr, lrw, pcbc and xts is now added.

    Performance testing based on dm-crypt shows that encryption/decryption time
    can be reduced to 50% of the general mode implementation + aes-aesni
    implementation.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
    Blkciphers touching the FPU need to be enclosed by kernel_fpu_begin() and
    kernel_fpu_end(). If these are invoked in the cipher algorithm
    implementation, they will be invoked for each block, which hurts
    performance because they are "slow" operations. This patch implements
    the "fpu" template, which makes these operations be invoked once per
    request instead.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

04 Mar, 2009

3 commits

  • Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     
  • Signed-off-by: Geert Uytterhoeven
    Cc: James Morris
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     
  • The current "comp" crypto interface supports one-shot (de)compression only,
    i.e. the whole data buffer to be (de)compressed must be passed at once, and
    the whole (de)compressed data buffer will be received at once.
    In several use-cases (e.g. compressed file systems that store files in big
    compressed blocks), this workflow is not suitable.
    Furthermore, the "comp" type doesn't provide for the configuration of
    (de)compression parameters, and always allocates workspace memory for both
    compression and decompression, which may waste memory.

    To solve this, add a "pcomp" partial (de)compression interface that provides
    the following operations:
    - crypto_compress_{init,update,final}() for compression,
    - crypto_decompress_{init,update,final}() for decompression,
    - crypto_{,de}compress_setup(), to configure (de)compression parameters
    (incl. allocating workspace memory).

    The (de)compression methods take a struct comp_request, which was modeled
    after the z_stream object in zlib, and contains buffer pointer and length
    pairs for input and output.

    The setup methods take an opaque parameter pointer and length pair. Parameters
    are supposed to be encoded using netlink attributes, whose meanings depend on
    the actual (name of the) (de)compression algorithm.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
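
The init/update/final pattern described above can be illustrated in user space with Python's `zlib` streaming objects, which wrap the same z_stream idea that comp_request was modeled on. This is only an analogy under stated assumptions: it does not call the kernel "pcomp" interface, and the function names `compress_stream` and `decompress_stream` are hypothetical.

```python
import zlib


def compress_stream(chunks, level=6):
    """Compress an iterable of byte chunks without holding all input at once."""
    comp = zlib.compressobj(level)            # "setup"/"init": parameters chosen here
    out = [comp.compress(c) for c in chunks]  # "update": one call per input chunk
    out.append(comp.flush())                  # "final": emit whatever is still buffered
    return b"".join(out)


def decompress_stream(chunks):
    """Decompress an iterable of byte chunks piece by piece."""
    decomp = zlib.decompressobj()                 # "init"
    out = [decomp.decompress(c) for c in chunks]  # "update"
    out.append(decomp.flush())                    # "final"
    return b"".join(out)


if __name__ == "__main__":
    data = b"partial (de)compression " * 100
    pieces = [data[i:i + 64] for i in range(0, len(data), 64)]
    packed = compress_stream(pieces)
    # Feed the compressed stream back in small pieces as well.
    packed_pieces = [packed[i:i + 16] for i in range(0, len(packed), 16)]
    print(decompress_stream(packed_pieces) == data)
```

Separate compressor and decompressor objects also mirror the point about workspace memory: each direction allocates only the state it needs.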
     

19 Feb, 2009

3 commits

    keventd_wq has a potential starvation problem, so use a dedicated
    kcrypto_wq instead.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
    The original cryptd thread implementation has a scalability issue; this
    patch solves it with a per-CPU thread implementation.

    struct cryptd_queue is defined to be a per-CPU queue, which holds one
    struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
    struct crypto_queue holds all requests for the CPU, and a struct
    work_struct is used to run all requests for the CPU.

    Testing based on dm-crypt on an Intel Core 2 E6400 (two cores) machine
    shows a 19.2% performance gain. The testing script is as follows:

    -------------------- script begin ---------------------------
    #!/bin/sh

    dmc_create()
    {
    # Create a crypt device using dmsetup
    dmsetup create $2 --table "0 `blockdev --getsize $1` crypt cbc(aes-asm)?cryptd?plain:plain babebabebabebabebabebabebabebabe 0 $1 0"
    }

    dmsetup remove crypt0
    dmsetup remove crypt1

    dd if=/dev/zero of=/dev/ram0 bs=1M count=4 >& /dev/null
    dd if=/dev/zero of=/dev/ram1 bs=1M count=4 >& /dev/null

    dmc_create /dev/ram0 crypt0
    dmc_create /dev/ram1 crypt1

    cat >tr.sh <<EOF
    for n in \$(seq 10); do
    dd if=/dev/dm-0 of=/dev/null >& /dev/null &
    dd if=/dev/dm-1 of=/dev/null >& /dev/null &
    done
    wait
    EOF

    for n in $(seq 10); do
    /usr/bin/time sh tr.sh
    done
    rm tr.sh
    -------------------- script end ---------------------------

    The separator of the dm-crypt parameter is changed from "-" to "?", because
    "-" is also used in some cipher driver names, and cryptd needs to specify
    the cipher driver name instead of the cipher name.

    The test result on an Intel Core2 E6400 (two cores) is as follows:

    without patch:
    -----------------wo begin --------------------------
    0.04user 0.38system 0:00.39elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6566minor)pagefaults 0swaps
    0.07user 0.35system 0:00.35elapsed 121%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6567minor)pagefaults 0swaps
    0.06user 0.34system 0:00.30elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.37system 0:00.36elapsed 119%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6607minor)pagefaults 0swaps
    0.06user 0.36system 0:00.35elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.37system 0:00.31elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6594minor)pagefaults 0swaps
    0.04user 0.34system 0:00.30elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6597minor)pagefaults 0swaps
    0.06user 0.32system 0:00.31elapsed 125%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6571minor)pagefaults 0swaps
    0.06user 0.34system 0:00.31elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6581minor)pagefaults 0swaps
    0.05user 0.38system 0:00.31elapsed 138%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6600minor)pagefaults 0swaps
    -----------------wo end --------------------------

    with patch:
    ------------------w begin --------------------------
    0.02user 0.31system 0:00.24elapsed 141%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6554minor)pagefaults 0swaps
    0.05user 0.34system 0:00.31elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6606minor)pagefaults 0swaps
    0.07user 0.33system 0:00.26elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6559minor)pagefaults 0swaps
    0.07user 0.32system 0:00.26elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.34system 0:00.26elapsed 150%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6603minor)pagefaults 0swaps
    0.03user 0.36system 0:00.31elapsed 124%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.04user 0.35system 0:00.26elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6586minor)pagefaults 0swaps
    0.03user 0.37system 0:00.27elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.04user 0.36system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6594minor)pagefaults 0swaps
    0.04user 0.35system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6557minor)pagefaults 0swaps
    ------------------w end --------------------------

    The median elapsed time is:
    wo cryptwq: 0.31
    w cryptwq: 0.26

    The performance gain is about (0.31-0.26)/0.26 = 0.192.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
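
The 19.2% figure in the message above can be re-derived from the listed timings. The sketch below copies the ten elapsed times from each run, takes the median of each set, and applies the same formula as the commit message; the variable names are chosen here for illustration.

```python
from statistics import median

# Elapsed times (seconds) copied from the "without patch" run.
without_patch = [0.39, 0.35, 0.30, 0.36, 0.35, 0.31, 0.30, 0.31, 0.31, 0.31]
# Elapsed times (seconds) copied from the "with patch" run.
with_patch = [0.24, 0.31, 0.26, 0.26, 0.26, 0.31, 0.26, 0.27, 0.26, 0.26]

median_without = median(without_patch)
median_with = median(with_patch)

# Same formula as the commit message: gain relative to the patched time.
gain = (median_without - median_with) / median_with

print(f"median without patch: {median_without:.2f}")
print(f"median with patch:    {median_with:.2f}")
print(f"gain:                 {gain:.3f}")
```

Note that the gain is expressed relative to the patched time; dividing by the unpatched median instead would give a smaller percentage.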
     
  • Use dedicated workqueue for crypto subsystem

    A dedicated workqueue named kcrypto_wq is created to be used by the crypto
    subsystem. The system shared keventd_wq is not suitable for
    encryption/decryption, because of a potential starvation problem.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

18 Feb, 2009

1 commit

    Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
    instructions that are going to be introduced in the next generation of
    Intel processors, as of 2009. These instructions enable fast and secure
    data encryption and decryption, using the Advanced Encryption Standard
    (AES), defined by FIPS Publication number 197. The architecture
    introduces six instructions that offer full hardware support for
    AES. Four of them support high performance data encryption and
    decryption, and the other two instructions support the AES key
    expansion procedure.

    The white paper can be downloaded from:

    http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

    AES may be used in soft_irq context, but MMX/SSE context cannot be
    touched safely in soft_irq context. So in_interrupt() is checked; if
    in IRQ or soft_irq context, the general x86_64 implementation is used
    instead.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

25 Dec, 2008

2 commits