07 Jan, 2010

1 commit


27 Oct, 2009

1 commit


19 Oct, 2009

1 commit

  • PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
    carry-less multiplication. More information about PCLMULQDQ can be
    found at:

    http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

    PCLMULQDQ changes XMM state, so its use must be enclosed in
    kernel_fpu_begin/end, which can be used only in process context.
    The acceleration is therefore implemented as a crypto_ahash; that is,
    requests in soft IRQ context are deferred to the cryptd kernel thread.
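
    For readers unfamiliar with carry-less multiplication, the following is a
    minimal userspace sketch of the operation, not the kernel code. It multiplies
    two integers as polynomials over GF(2), so partial products are combined with
    XOR instead of addition. The function name clmul is hypothetical, and the
    reduction modulo the GHASH field polynomial is deliberately left out.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            # XOR replaces addition, so no carries propagate between bits.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 gives 0b101.
print(bin(clmul(0b11, 0b11)))
```

    An ordinary integer multiply would give 0b1001 for the same inputs; the
    difference is the carry that the carry-less product drops.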

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

02 Sep, 2009

1 commit


20 Aug, 2009

1 commit

  • What about something like this? It defaults the CPRNG to m and makes FIPS
    dependent on the CPRNG. That way you get a module build by default, but you can
    change it to y manually during config and still satisfy the dependency, and if
    you select N it disables FIPS as well. I rather like that better than making
    FIPS a tristate. I just tested it out here and it seems to work well. Let me
    know what you think.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     

13 Aug, 2009

1 commit

  • This reverts commit 215ccd6f55a2144bd553e0a3d12e1386f02309fd.

    It causes CPRNG and everything selected by it to be built-in
    whenever FIPS is enabled. The problem is that it is selecting
    a tristate from a bool, which is usually not what is intended.
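
    The effect described above can be illustrated with a small model. This is a
    simplified sketch of how "select" behaves, assuming only that a selected
    symbol can never end up lower than any symbol selecting it; real Kconfig
    also evaluates dependencies and visibility, which are ignored here. The
    names N, M, Y and effective_value are hypothetical.

```python
# Tristate values in the order Kconfig ranks them: n < m < y.
N, M, Y = 0, 1, 2


def effective_value(user_choice, selectors):
    """Value a symbol ends up with: 'select' imposes a lower bound equal to
    the highest value among the symbols that select it."""
    return max([user_choice] + list(selectors))


# FIPS is a bool, so when it is enabled its value is y. If it selects the
# CPRNG, the CPRNG is forced to y even though the user asked for m.
print(effective_value(M, [Y]))  # prints 2, i.e. built-in
```

    With no selector enabled, the same call returns the user's own choice,
    which is the behaviour the revert restores.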

    Signed-off-by: Herbert Xu

    Herbert Xu
     

06 Aug, 2009

2 commits

  • Remove the dedicated GHASH implementation in GCM and use the GHASH
    digest algorithm instead. This makes GCM use a hardware-accelerated
    GHASH implementation automatically if one is available.

    The ahash interface is used instead of shash because some
    hardware-accelerated GHASH implementations need an asynchronous interface.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
  • GHASH is implemented as a shash algorithm. The actual implementation
    is copied from gcm.c. This makes it possible to add
    architecture- or hardware-accelerated GHASH implementations.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

21 Jun, 2009

1 commit

  • The ANSI CPRNG has no dependence on FIPS support. FIPS support, however,
    requires the use of the CPRNG. Adjust that dependency relationship in Kconfig.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     

19 Jun, 2009

1 commit


02 Jun, 2009

2 commits

  • Because kernel_fpu_begin() and kernel_fpu_end() are slow, almost all of
    the performance gain of the general mode implementation + aes-aesni
    is cancelled out.

    AES-NI support for more modes is implemented as follows:

    - Add a new AES algorithm implementation named __aes-aesni without
    kernel_fpu_begin/end()

    - Use fpu(<mode>(AES)) to provide the kernel_fpu_begin/end() calls

    - Add a <mode>(AES) ablkcipher, which uses cryptd(fpu(<mode>(AES))) to
    defer encryption and decryption to cryptd context when called in
    soft_irq context.

    Support for ctr, lrw, pcbc and xts is added.

    Performance testing based on dm-crypt shows that encryption/decryption
    time can be reduced to 50% of that of the general mode implementation +
    aes-aesni implementation.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
  • Blkcipher implementations that touch the FPU need to be enclosed by
    kernel_fpu_begin() and kernel_fpu_end(). If these are invoked in the
    cipher algorithm implementation, they are invoked for each block, which
    hurts performance because they are "slow" operations. This patch
    implements the "fpu" template, which invokes these operations once per
    request instead.
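
    The difference between per-block and per-request bracketing can be sketched
    in userspace as follows. This is an illustration only: FpuCounter,
    toy_cipher and both encrypt functions are hypothetical stand-ins, and the
    "cipher" is a byte-wise XOR rather than AES.

```python
class FpuCounter:
    """Stand-in for kernel_fpu_begin()/kernel_fpu_end(); counts begin calls."""

    def __init__(self):
        self.pairs = 0

    def begin(self):
        self.pairs += 1

    def end(self):
        pass


def toy_cipher(block: bytes) -> bytes:
    # Placeholder transformation, not a real cipher.
    return bytes(b ^ 0xFF for b in block)


def encrypt_per_block(blocks, fpu):
    """Bracket every single block, as a cipher-level implementation would."""
    out = []
    for block in blocks:
        fpu.begin()
        out.append(toy_cipher(block))
        fpu.end()
    return out


def encrypt_per_request(blocks, fpu):
    """Bracket the whole request once, as the "fpu" template does."""
    fpu.begin()
    out = [toy_cipher(block) for block in blocks]
    fpu.end()
    return out


request = [bytes([i]) * 16 for i in range(8)]  # one request of 8 blocks
per_block, per_request = FpuCounter(), FpuCounter()
assert encrypt_per_block(request, per_block) == encrypt_per_request(request, per_request)
print(per_block.pairs, per_request.pairs)  # prints "8 1"
```

    Both variants produce the same output; only the number of begin/end pairs
    differs, and it grows with the request size in the per-block case.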

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

04 Mar, 2009

3 commits

  • Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     
  • Signed-off-by: Geert Uytterhoeven
    Cc: James Morris
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     
  • The current "comp" crypto interface supports one-shot (de)compression only,
    i.e. the whole data buffer to be (de)compressed must be passed at once, and
    the whole (de)compressed data buffer will be received at once.
    In several use-cases (e.g. compressed file systems that store files in big
    compressed blocks), this workflow is not suitable.
    Furthermore, the "comp" type doesn't provide for the configuration of
    (de)compression parameters, and always allocates workspace memory for both
    compression and decompression, which may waste memory.

    To solve this, add a "pcomp" partial (de)compression interface that provides
    the following operations:
    - crypto_compress_{init,update,final}() for compression,
    - crypto_decompress_{init,update,final}() for decompression,
    - crypto_{,de}compress_setup(), to configure (de)compression parameters
    (incl. allocating workspace memory).

    The (de)compression methods take a struct comp_request, which is modeled
    on the z_stream object in zlib and contains buffer pointer and length
    pairs for input and output.

    The setup methods take an opaque parameter pointer and length pair. Parameters
    are supposed to be encoded using netlink attributes, whose meanings depend on
    the actual (name of the) (de)compression algorithm.
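
    The init/update/final workflow can be illustrated in userspace with
    Python's zlib module, which exposes the same z_stream style of streaming.
    This is an analogy and not the kernel "pcomp" API; the helper names
    compress_stream and decompress_stream are hypothetical.

```python
import zlib


def compress_stream(chunks, level=6):
    comp = zlib.compressobj(level)        # "init": allocate state, set parameters
    out = []
    for chunk in chunks:
        out.append(comp.compress(chunk))  # "update": feed one partial input buffer
    out.append(comp.flush())              # "final": drain the remaining output
    return b"".join(out)


def decompress_stream(chunks):
    decomp = zlib.decompressobj()         # "init"
    out = [decomp.decompress(chunk) for chunk in chunks]  # "update"
    out.append(decomp.flush())            # "final"
    return b"".join(out)


data = b"partial (de)compression " * 1000
pieces = [data[i:i + 4096] for i in range(0, len(data), 4096)]
packed = compress_stream(pieces)

# The compressed stream can be consumed in arbitrary pieces as well.
packed_pieces = [packed[i:i + 64] for i in range(0, len(packed), 64)]
assert decompress_stream(packed_pieces) == data
```

    Neither side ever needs the whole input or output in one buffer, which is
    the property the one-shot "comp" interface lacks.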

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     

19 Feb, 2009

3 commits

  • keventd_wq has a potential starvation problem, so use the dedicated
    kcrypto_wq instead.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
  • The original cryptd thread implementation has a scalability issue; this
    patch solves it with a per-CPU thread implementation.

    struct cryptd_queue is defined to be a per-CPU queue, which holds one
    struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
    struct crypto_queue holds all requests for the CPU, and a struct
    work_struct is used to run all requests for the CPU.

    Testing based on dm-crypt on an Intel Core 2 E6400 (two cores) machine
    shows a 19.2% performance gain. The test script is as follows:

    -------------------- script begin ---------------------------
    #!/bin/sh

    dmc_create()
    {
    # Create a crypt device using dmsetup
    dmsetup create $2 --table "0 `blockdev --getsize $1` crypt cbc(aes-asm)?cryptd?plain:plain babebabebabebabebabebabebabebabe 0 $1 0"
    }

    dmsetup remove crypt0
    dmsetup remove crypt1

    dd if=/dev/zero of=/dev/ram0 bs=1M count=4 >& /dev/null
    dd if=/dev/zero of=/dev/ram1 bs=1M count=4 >& /dev/null

    dmc_create /dev/ram0 crypt0
    dmc_create /dev/ram1 crypt1

    cat >tr.sh <<EOF
    for n in $(seq 10); do
    dd if=/dev/dm-0 of=/dev/null >& /dev/null &
    dd if=/dev/dm-1 of=/dev/null >& /dev/null &
    done
    wait
    EOF

    for n in $(seq 10); do
    /usr/bin/time sh tr.sh
    done
    rm tr.sh
    -------------------- script end ---------------------------

    The separator of the dm-crypt parameter is changed from "-" to "?", because
    "-" is also used in some cipher driver names, and cryptd needs to specify
    the cipher driver name instead of the cipher name.

    The test results on an Intel Core2 E6400 (two cores) are as follows:

    without patch:
    -----------------wo begin --------------------------
    0.04user 0.38system 0:00.39elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6566minor)pagefaults 0swaps
    0.07user 0.35system 0:00.35elapsed 121%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6567minor)pagefaults 0swaps
    0.06user 0.34system 0:00.30elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.37system 0:00.36elapsed 119%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6607minor)pagefaults 0swaps
    0.06user 0.36system 0:00.35elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.37system 0:00.31elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6594minor)pagefaults 0swaps
    0.04user 0.34system 0:00.30elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6597minor)pagefaults 0swaps
    0.06user 0.32system 0:00.31elapsed 125%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6571minor)pagefaults 0swaps
    0.06user 0.34system 0:00.31elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6581minor)pagefaults 0swaps
    0.05user 0.38system 0:00.31elapsed 138%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6600minor)pagefaults 0swaps
    -----------------wo end --------------------------

    with patch:
    ------------------w begin --------------------------
    0.02user 0.31system 0:00.24elapsed 141%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6554minor)pagefaults 0swaps
    0.05user 0.34system 0:00.31elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6606minor)pagefaults 0swaps
    0.07user 0.33system 0:00.26elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6559minor)pagefaults 0swaps
    0.07user 0.32system 0:00.26elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.05user 0.34system 0:00.26elapsed 150%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6603minor)pagefaults 0swaps
    0.03user 0.36system 0:00.31elapsed 124%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.04user 0.35system 0:00.26elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6586minor)pagefaults 0swaps
    0.03user 0.37system 0:00.27elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6562minor)pagefaults 0swaps
    0.04user 0.36system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6594minor)pagefaults 0swaps
    0.04user 0.35system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+6557minor)pagefaults 0swaps
    ------------------w end --------------------------

    The median elapsed time is:
    wo cryptwq: 0.31
    w cryptwq: 0.26

    The performance gain is about (0.31-0.26)/0.26 = 0.192.
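
    The figures above can be rechecked from the elapsed times in the two
    listings. The sketch below copies those ten values per run by hand and
    recomputes the middle value and the relative gain; the variable names are
    hypothetical.

```python
from statistics import median

# Elapsed seconds taken from the "wo" and "w" listings above.
without_patch = [0.39, 0.35, 0.30, 0.36, 0.35, 0.31, 0.30, 0.31, 0.31, 0.31]
with_patch = [0.24, 0.31, 0.26, 0.26, 0.26, 0.31, 0.26, 0.27, 0.26, 0.26]

median_without = median(without_patch)
median_with = median(with_patch)

# Gain expressed relative to the patched run, as in the commit message.
gain = (median_without - median_with) / median_with

print(median_without, median_with, f"{gain:.3f}")  # prints "0.31 0.26 0.192"
```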

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     
  • Use dedicated workqueue for crypto subsystem

    A dedicated workqueue named kcrypto_wq is created for use by the crypto
    subsystem. The system-wide shared keventd_wq is not suitable for
    encryption/decryption because of a potential starvation problem.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

18 Feb, 2009

1 commit

  • Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
    instructions that are going to be introduced in the next generation of
    Intel processors, as of 2009. These instructions enable fast and secure
    data encryption and decryption, using the Advanced Encryption Standard
    (AES), defined by FIPS Publication number 197. The architecture
    introduces six instructions that offer full hardware support for
    AES. Four of them support high performance data encryption and
    decryption, and the other two instructions support the AES key
    expansion procedure.

    The white paper can be downloaded from:

    http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

    AES may be used in soft_irq context, but MMX/SSE context cannot be
    touched safely in soft_irq context. So in_interrupt() is checked; if
    in IRQ or soft_irq context, the general x86_64 implementation is used
    instead.

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying
     

25 Dec, 2008

14 commits


10 Dec, 2008

1 commit

  • If we have at least one algorithm built-in then it no longer makes
    sense for the testing framework, and hence cryptomgr, to be a
    module. It should be either on or off, i.e., built-in or disabled.

    This just happens to stop a potential runaway modprobe loop that
    seems to trigger on at least one distro.

    With fixes from Evgeniy Polyakov.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

29 Aug, 2008

6 commits

  • This patch makes the IV generators use the new RNG interface so
    that the user can pick an RNG other than the default get_random_bytes.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • This patch adds a random number generator interface as well as a
    cryptographic pseudo-random number generator based on AES. It is
    meant to be used in cases where a deterministic CPRNG is required.

    One of the first applications will be as an input in the IPsec IV
    generation process.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     
  • Add the ability to turn FIPS-compliant mode on or off at boot

    In order to be FIPS compliant, several checks may need to be performed that may
    be construed as unnecessary in a non-compliant mode. This patch allows us to set
    a kernel flag indicating that we are running in a FIPS-compliant mode from boot
    up. It also exports that mode information to user space via a sysctl
    (/proc/sys/crypto/fips_enabled).

    Tested successfully by me.

    Signed-off-by: Neil Horman
    Signed-off-by: Herbert Xu

    Neil Horman
     
  • This patch moves the newly created alg_test infrastructure into
    cryptomgr. This shall allow us to use it for testing at algorithm
    registrations.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • From the NHM (Nehalem) processor onward, Intel processors support a
    hardware-accelerated CRC32c algorithm through the new CRC32 instruction in
    the SSE 4.2 instruction set.
    The patch detects the availability of the feature and chooses the most
    appropriate way to calculate the CRC32c checksum.
    Byte code instructions are used for compiler compatibility.
    No MMX / XMM registers are involved in the implementation.
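
    For reference, the checksum that the CRC32 instruction accelerates can be
    written as a bit-by-bit software routine. The sketch below is a plain
    userspace version using the reflected Castagnoli polynomial; it is not the
    kernel's generic or accelerated implementation, and the function name
    crc32c is chosen here for illustration.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32c; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # prints 0xe3069283
```

    The hardware instruction processes up to a machine word of input per step
    instead of one bit, which is where the speedup comes from.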

    Signed-off-by: Austin Zhang
    Signed-off-by: Kent Liu
    Signed-off-by: Herbert Xu

    Austin Zhang
     
  • Instead of tabs there were two spaces.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Herbert Xu

    Adrian Bunk