18 Apr, 2019

1 commit

  • Use subsys_initcall for registration of all templates and generic
    algorithm implementations, rather than module_init. Then change
    cryptomgr to use arch_initcall, to place it before the subsys_initcalls.

    This is needed so that when both a generic and optimized implementation
    of an algorithm are built into the kernel (not loadable modules), the
    generic implementation is registered before the optimized one.
    Otherwise, the self-tests for the optimized implementation are unable to
    allocate the generic implementation for the new comparison fuzz tests.

    Note that on arm, a side effect of this change is that self-tests for
    generic implementations may run before the unaligned access handler has
    been installed. So, unaligned accesses will crash the kernel. This is
    arguably a good thing as it makes it easier to detect that type of bug.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
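
The ordering described in this commit can be sketched in userspace C. This is only a simulation, under the assumption that built-in initcalls run in ascending level order and in link order within a level; `struct initcall`, `run_order()` and the entry names are hypothetical and are not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Subset of the kernel's initcall levels; lower values run earlier.
 * For built-in code, module_init() maps to the device level. */
enum level { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

/* Hypothetical record describing one registered initcall. */
struct initcall {
    const char *name;
    enum level level;
};

/* Registrations after the change, deliberately listed in an
 * unfavourable link order (optimized first). */
static const struct initcall after_change[] = {
    { "optimized", LEVEL_DEVICE }, /* still module_init */
    { "generic",   LEVEL_SUBSYS }, /* now subsys_initcall */
    { "cryptomgr", LEVEL_ARCH },   /* now arch_initcall */
};

/* Fill out[] with the names in the order the calls would run:
 * ascending level, link order preserved within a level. */
static void run_order(const struct initcall *calls, size_t n,
                      const char **out)
{
    size_t k = 0;

    for (int lvl = 0; lvl <= 7; lvl++)
        for (size_t i = 0; i < n; i++)
            if ((int)calls[i].level == lvl)
                out[k++] = calls[i].name;
}
```

Calling `run_order(after_change, 3, out)` yields cryptomgr, then generic, then optimized, regardless of link order. If generic and optimized shared one level, as they did with module_init, the result would depend on link order alone.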
     

22 Mar, 2019

2 commits

  • In chacha_docrypt(), use crypto_xor_cpy() instead of crypto_xor().
    This avoids having to memcpy() the src buffer to the dst buffer.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
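
A minimal userspace sketch of the two patterns this commit contrasts. The helpers `xor_inplace()` and `xor_cpy()` are hypothetical stand-ins for crypto_xor() and crypto_xor_cpy(), and the two `encrypt_*` functions are simplified illustrations rather than the actual chacha_docrypt() code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for crypto_xor(): dst ^= src, in place. */
static void xor_inplace(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Stand-in for crypto_xor_cpy(): dst = src1 ^ src2. */
static void xor_cpy(uint8_t *dst, const uint8_t *src1,
                    const uint8_t *src2, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src1[i] ^ src2[i];
}

/* Old pattern: copy src into dst, then XOR the keystream into dst. */
static void encrypt_two_pass(uint8_t *dst, const uint8_t *src,
                             const uint8_t *stream, size_t len)
{
    if (dst != src)
        memcpy(dst, src, len);
    xor_inplace(dst, stream, len);
}

/* New pattern: read src and the keystream once, write dst once. */
static void encrypt_one_pass(uint8_t *dst, const uint8_t *src,
                             const uint8_t *stream, size_t len)
{
    xor_cpy(dst, src, stream, len);
}
```

Both functions produce the same output; the one-pass form simply skips the extra copy of the source buffer.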
     
  • The arm64 implementations of ChaCha and XChaCha are failing the extra
    crypto self-tests following my patches to test the !may_use_simd() code
    paths, which previously were untested. The problem is as follows:

    When !may_use_simd(), the arm64 NEON implementations fall back to the
    generic implementation, which uses the skcipher_walk API to iterate
    through the src/dst scatterlists. Due to how the skcipher_walk API
    works, walk.stride is set from the skcipher_alg actually being used,
    which in this case is the arm64 NEON algorithm. Thus walk.stride is
    5*CHACHA_BLOCK_SIZE, not CHACHA_BLOCK_SIZE.

    This unnecessarily large stride shouldn't cause an actual problem.
    However, the generic implementation computes round_down(nbytes,
    walk.stride). round_down() assumes the round amount is a power of 2,
    which 5*CHACHA_BLOCK_SIZE is not, so it gives the wrong result.

    This causes the following case in skcipher_walk_done() to be hit,
    causing a WARN() and failing the encryption operation:

    if (WARN_ON(err)) {
            /* unexpected case; didn't process all bytes */
            err = -EINVAL;
            goto finish;
    }

    Fix it by rounding down to CHACHA_BLOCK_SIZE instead of walk.stride.

    (Or we could replace round_down() with rounddown(), but that would add a
    slow division operation every time, which I think we should avoid.)

    Fixes: 2fe55987b262 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
    Cc: # v5.0+
    Signed-off-by: Eric Biggers
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Eric Biggers
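
The rounding problem in this commit can be reproduced in userspace. The sketch below assumes CHACHA_BLOCK_SIZE is 64 and uses two hypothetical helpers, `round_down_pow2()` and `rounddown_any()`, that mimic the mask-based round_down() and the division-based rounddown() respectively; they are not the kernel macros themselves.

```c
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64 /* assumed ChaCha block size in bytes */

/* Mask-based rounding, like round_down(): cheap, but only correct
 * when y is a power of 2. */
static size_t round_down_pow2(size_t x, size_t y)
{
    return x & ~(y - 1);
}

/* Division-based rounding, like rounddown(): correct for any
 * nonzero y, at the cost of a modulo operation. */
static size_t rounddown_any(size_t x, size_t y)
{
    return x - (x % y);
}
```

For example, with nbytes = 400 and a stride of 5*CHACHA_BLOCK_SIZE = 320, `round_down_pow2(400, 320)` returns 128, whereas `rounddown_any(400, 320)` returns the expected 320. Rounding to CHACHA_BLOCK_SIZE instead, `round_down_pow2(400, 64)` returns 384, which is a correct multiple of the block size.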
     

20 Nov, 2018

2 commits

  • Now that the generic implementation of ChaCha20 has been refactored to
    allow varying the number of rounds, add support for XChaCha12, which is
    the XSalsa construction applied to ChaCha12. ChaCha12 is one of the
    three ciphers specified by the original ChaCha paper
    (https://cr.yp.to/chacha/chacha-20080128.pdf: "ChaCha, a variant of
    Salsa20"), alongside ChaCha8 and ChaCha20. ChaCha12 is faster than
    ChaCha20 and has a lower, though still large, security margin.

    We need XChaCha12 support so that it can be used in the Adiantum
    encryption mode, which enables disk/file encryption on low-end mobile
    devices where AES-XTS is too slow as the CPUs lack AES instructions.

    We'd prefer XChaCha20 (the more popular variant), but it's too slow on
    some of our target devices, so at least in some cases we do need the
    XChaCha12-based version. In more detail, the problem is that Adiantum
    is still much slower than we're happy with, and encryption still has a
    quite noticeable effect on the feel of low-end devices. Users and
    vendors push back hard against encryption that degrades the user
    experience, which always risks encryption being disabled entirely. So
    we need to choose the fastest option that gives us a solid margin of
    security, and here that's XChaCha12. The best known attack on ChaCha
    breaks only 7 rounds and has 2^235 time complexity, so ChaCha12's
    security margin is still better than AES-256's. Much has been learned
    about cryptanalysis of ARX ciphers since Salsa20 was originally designed
    in 2005, and it now seems we can be comfortable with a smaller number of
    rounds. The eSTREAM project also suggests the 12-round version of
    Salsa20 as providing the best balance among the different variants:
    combining very good performance with a "comfortable margin of security".

    Note that it would be trivial to add vanilla ChaCha12 in addition to
    XChaCha12. However, it's unneeded for now and therefore is omitted.

    As discussed in the patch that introduced XChaCha20 support, I
    considered splitting the code into separate chacha-common, chacha20,
    xchacha20, and xchacha12 modules, so that these algorithms could be
    enabled/disabled independently. However, since nearly all the code is
    shared anyway, I ultimately decided there would have been little benefit
    to the added complexity.

    Reviewed-by: Ard Biesheuvel
    Acked-by: Martin Willi
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • In preparation for adding XChaCha12 support, rename/refactor
    chacha20-generic to support different numbers of rounds. The
    justification for needing XChaCha12 support is explained in more detail
    in the patch "crypto: chacha - add XChaCha12 support".

    The only difference between ChaCha{8,12,20} is the number of rounds;
    all other parts of the algorithm are the same. Therefore, remove the
    "20" from all definitions, structures, functions, files, etc. that
    will be shared by all ChaCha versions.

    Also make ->setkey() store the round count in the chacha_ctx (previously
    chacha20_ctx). The generic code then passes the round count through to
    chacha_block(). There will be a ->setkey() function for each explicitly
    allowed round count; the encrypt/decrypt functions will be the same. I
    decided not to do it the opposite way (same ->setkey() function for all
    round counts, with different encrypt/decrypt functions) because that
    would have required more boilerplate code in architecture-specific
    implementations of ChaCha and XChaCha.

    Reviewed-by: Ard Biesheuvel
    Acked-by: Martin Willi
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
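
The design described in this commit, one ->setkey() per allowed round count with shared encrypt/decrypt code, might look roughly like the userspace sketch below. The struct layout, the function names, the `-1` error value and the `load_le32()` helper are assumptions made for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define CHACHA_KEY_SIZE 32 /* assumed 256-bit key */

/* Hypothetical per-transform context: key words plus round count. */
struct chacha_ctx {
    uint32_t key[8];
    int nrounds;
};

/* Load a 32-bit little-endian word from a byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Shared helper: validate the key size, store key and round count. */
static int chacha_setkey(struct chacha_ctx *ctx, const uint8_t *key,
                         size_t keysize, int nrounds)
{
    if (keysize != CHACHA_KEY_SIZE)
        return -1; /* stand-in for an -EINVAL style error */
    for (int i = 0; i < 8; i++)
        ctx->key[i] = load_le32(key + 4 * i);
    ctx->nrounds = nrounds;
    return 0;
}

/* One thin setkey wrapper per explicitly allowed round count. */
static int chacha20_setkey(struct chacha_ctx *ctx, const uint8_t *key,
                           size_t keysize)
{
    return chacha_setkey(ctx, key, keysize, 20);
}

static int chacha12_setkey(struct chacha_ctx *ctx, const uint8_t *key,
                           size_t keysize)
{
    return chacha_setkey(ctx, key, keysize, 12);
}
```

With this shape, the encrypt/decrypt path only needs to read `ctx->nrounds`, so an architecture-specific implementation can reuse the same routines for every variant.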