25 Sep, 2020

1 commit

  • There is no reason for the chacha20poly1305 SG miter code to use
    kmap instead of kmap_atomic, as the critical section doesn't sleep
    anyway. So we can simply get rid of the preemptible check and
    set SG_MITER_ATOMIC unconditionally.

    Even if we need to re-enable preemption to lower latency, we should
    be doing that by interrupting the SG miter walk rather than using
    kmap.

    Reported-by: Linus Torvalds
    Signed-off-by: Herbert Xu

    Herbert Xu
     

16 Jul, 2020

2 commits


08 May, 2020

2 commits

  • sounds very generic and important, like it's the
    header to include if you're doing cryptographic hashing in the kernel.
    But actually it only includes the library implementation of the SHA-1
    compression function (not even the full SHA-1). This should basically
    never be used anymore; SHA-1 is no longer considered secure, and there
    are much better ways to do cryptographic hashing in the kernel.

    Most files that include this header don't actually need it. So in
    preparation for removing it, remove all these unneeded includes of it.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • The SHA-256 / SHA-224 library functions can't fail, so remove the
    useless return value.

    As long as the declarations are being changed anyway, also fix some
    parameter names in the declarations to match the definitions.

    Signed-off-by: Eric Biggers
    Reviewed-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Eric Biggers
     

20 Mar, 2020

1 commit

  • Previously, passing in chunks of 2, 3, or 4, followed by any additional
    chunks, would result in the chacha state counter getting out of sync,
    resulting in incorrect encryption/decryption, which is a pretty nasty
    crypto vuln: "why do images look weird on webpages?" WireGuard users
    never experienced this before, because we have always, out of tree, used
    a different crypto library, until the recent Frankenzinc addition. This
    commit fixes the issue by advancing the pointers and state counter by
    the actual size processed. It also fixes up a bug in the (optional,
    costly) stride test that prevented it from running on arm64.

    Fixes: b3aad5bad26a ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
    Reported-and-tested-by: Emil Renner Berthing
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org # v5.5+
    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
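
    The counter bookkeeping behind this fix can be illustrated with a small
    userspace C sketch. It is not the arm64 glue code: `toy_block()` is a
    hypothetical stand-in for the ChaCha block function (it is not ChaCha),
    and `xor_up_to_n_blocks()` is a hypothetical multi-block routine that,
    like a SIMD routine with a fixed maximum stride, may consume less than
    it is handed. The loop in `stream_xor()` advances the counter and both
    pointers by what was actually processed, so feeding the same data in
    differently sized chunks yields the same output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64
#define MAX_BLOCKS_PER_CALL 5 /* hypothetical: routine handles at most 5 blocks */

/* Hypothetical keystream block: NOT ChaCha, just a deterministic function
 * of the block counter, so any counter bug changes the output. */
static void toy_block(uint8_t out[BLOCK], uint32_t counter)
{
	for (int i = 0; i < BLOCK; i++)
		out[i] = (uint8_t)(counter * 131u + (uint32_t)i * 7u + 1u);
}

/* Hypothetical multi-block routine: XORs at most MAX_BLOCKS_PER_CALL blocks
 * starting at 'counter' and returns how many bytes it actually consumed. */
static size_t xor_up_to_n_blocks(uint8_t *dst, const uint8_t *src, size_t len,
				 uint32_t counter)
{
	size_t done = 0;
	uint8_t ks[BLOCK];

	for (int b = 0; b < MAX_BLOCKS_PER_CALL && done < len; b++) {
		size_t n = len - done < BLOCK ? len - done : BLOCK;

		toy_block(ks, counter + (uint32_t)b);
		for (size_t i = 0; i < n; i++)
			dst[done + i] = src[done + i] ^ ks[i];
		done += n;
	}
	return done;
}

/* Encrypt len bytes, advancing counter and pointers by the amount actually
 * processed. Chained calls must pass whole blocks except for the last one. */
void stream_xor(uint8_t *dst, const uint8_t *src, size_t len, uint32_t *counter)
{
	while (len > 0) {
		size_t done = xor_up_to_n_blocks(dst, src, len, *counter);

		*counter += (uint32_t)((done + BLOCK - 1) / BLOCK); /* blocks used */
		dst += done;
		src += done;
		len -= done;
	}
}
```

    Encrypting 10 blocks in one call, or as chunks of 2, 3, 4 and 1 blocks,
    produces identical ciphertext and leaves the counter at 10 either way;
    advancing by a fixed stride instead would reuse or skip keystream blocks.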
     

14 Feb, 2020

1 commit

  • This code assigns src_len (size_t) to sl (int), which causes problems
    when src_len is very large. Probably nobody in the kernel should be
    passing this much data to chacha20poly1305 all in one go anyway, so I
    don't think we need to change the algorithm or introduce larger types
    or anything. But we should at least error out early in this case and
    print a warning so that we get reports if this does happen and can look
    into why anybody is possibly passing it that much data or if they're
    accidentally passing -1 or similar.

    Fixes: d95312a3ccc0 ("crypto: lib/chacha20poly1305 - reimplement crypt_from_sg() routine")
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org # 5.5+
    Signed-off-by: Jason A. Donenfeld
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
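
    A userspace C sketch of the early-exit check, assuming a hypothetical
    helper named `crypt_sg_len_ok()`: the `size_t` length is compared
    against `INT_MAX` before it is narrowed to `int`, and a warning is
    printed so that such calls get reported. In the kernel, a `WARN_ON()`
    on the same condition would serve the reporting role; `fprintf(stderr,
    ...)` stands in for it here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: refuse lengths that cannot be represented as an int,
 * and warn so that callers passing huge values (or -1) get noticed. */
bool crypt_sg_len_ok(size_t src_len, int *sl)
{
	if (src_len > INT_MAX) {
		fprintf(stderr, "warning: src_len %zu exceeds INT_MAX\n", src_len);
		return false; /* error out early instead of truncating */
	}
	*sl = (int)src_len; /* cannot overflow after the check above */
	return true;
}
```

    With this check, a length of `(size_t)-1` is rejected instead of
    silently turning into a negative `int`.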
     

22 Jan, 2020

1 commit

  • When this was originally ported, the 12-byte nonce vectors were left out
    to keep things simple. I agree that we neither need nor want a library
    interface for 12-byte nonces. But these test vectors were specially
    crafted to look at issues in the underlying primitives and related
    interactions. Therefore, we actually want to keep around all of the
    test vectors, and simply have a helper function to test them with.

    Secondly, the sglist-based chunking code in the library interface is
    rather complicated, so this adds a developer-only test for ensuring that
    all the bookkeeping is correct, across a wide array of possibilities.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
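
    The developer-only chunking test follows a general pattern: feed the
    same input through an incremental API split at every possible boundary
    and check that each result matches a one-shot call. Below is a minimal
    userspace C sketch of that pattern. The function under test
    (`toy_update()`/`toy_final()`) is a hypothetical 16-byte-block checksum
    with a partial-block buffer, chosen only because it has the same kind of
    bookkeeping to get wrong; it is not the kernel's test or the
    chacha20poly1305 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK 16

/* Hypothetical incremental checksum over 16-byte blocks (NOT a real MAC). */
struct toy_ctx {
	uint32_t acc;
	uint8_t buf[TOY_BLOCK];
	size_t buflen;
};

static void toy_compress(struct toy_ctx *ctx, const uint8_t *blk)
{
	for (int i = 0; i < TOY_BLOCK; i++)
		ctx->acc = ctx->acc * 31u + blk[i];
	ctx->acc ^= ctx->acc >> 7; /* per-block step, so boundaries matter */
}

static void toy_init(struct toy_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

static void toy_update(struct toy_ctx *ctx, const uint8_t *p, size_t len)
{
	while (len > 0) {
		size_t n = TOY_BLOCK - ctx->buflen;

		if (n > len)
			n = len;
		memcpy(ctx->buf + ctx->buflen, p, n);
		ctx->buflen += n;
		p += n;
		len -= n;
		if (ctx->buflen == TOY_BLOCK) {
			toy_compress(ctx, ctx->buf);
			ctx->buflen = 0;
		}
	}
}

static uint32_t toy_final(struct toy_ctx *ctx)
{
	if (ctx->buflen > 0) { /* zero-pad the trailing partial block */
		memset(ctx->buf + ctx->buflen, 0, TOY_BLOCK - ctx->buflen);
		toy_compress(ctx, ctx->buf);
	}
	return ctx->acc;
}

/* Split msg into three chunks at every (i, j) boundary pair and check that
 * each incremental result matches the one-shot result. */
bool all_splits_agree(const uint8_t *msg, size_t len)
{
	struct toy_ctx ctx;
	uint32_t want;

	toy_init(&ctx);
	toy_update(&ctx, msg, len);
	want = toy_final(&ctx);

	for (size_t i = 0; i <= len; i++) {
		for (size_t j = i; j <= len; j++) {
			toy_init(&ctx);
			toy_update(&ctx, msg, i);
			toy_update(&ctx, msg + i, j - i);
			toy_update(&ctx, msg + j, len - j);
			if (toy_final(&ctx) != want)
				return false;
		}
	}
	return true;
}
```

    Self-consistency alone does not prove correctness, so a harness like
    this complements known-answer vectors rather than replacing them.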
     

16 Jan, 2020

3 commits

  • If CRYPTO_CURVE25519 is y, CRYPTO_LIB_CURVE25519_GENERIC will be
    y, but CRYPTO_LIB_CURVE25519 may be set to m. This causes build
    errors:

    lib/crypto/curve25519-selftest.o: In function `curve25519':
    curve25519-selftest.c:(.text.unlikely+0xc): undefined reference to `curve25519_arch'
    lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
    curve25519-selftest.c:(.init.text+0x17e): undefined reference to `curve25519_base_arch'

    This is because the curve25519 self-test code is being controlled
    by the GENERIC option rather than the overall CURVE25519 option,
    as is the case with blake2s. To recap, the GENERIC and ARCH options
    for CURVE25519 are internal only and selected by users such as
    the Crypto API, or the externally visible CURVE25519 option which
    in turn is selected by wireguard. The self-test is specific to the
    external CURVE25519 option and should not be enabled by the
    Crypto API.

    This patch fixes this by splitting the GENERIC module from the
    CURVE25519 module with the latter now containing just the self-test.

    Reported-by: Hulk Robot
    Fixes: aa127963f1ca ("crypto: lib/curve25519 - re-add selftests")
    Signed-off-by: Herbert Xu
    Reviewed-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • These x86_64 vectorized implementations support AVX, AVX-2, and AVX-512F.
    The AVX-512F implementation is disabled on Skylake, due to throttling,
    but it is quite fast on >= Cannonlake.

    On the left are cycle counts on a Core i7 6700HQ using the AVX-2
    codepath, comparing this implementation ("new") to the implementation in
    the current crypto api ("old"). On the right are benchmarks on a Xeon
    Gold 5120 using the AVX-512 codepath. The new implementation is faster
    on all benchmarks.

              AVX-2          AVX-512
           -----------     -----------
    size    old    new      old    new
    ----   ----   ----     ----   ----
       0     70     68       74     70
      16     92     90       96     92
      32    134    104      136    106
      48    172    120      184    124
      64    218    136      218    138
      80    254    158      260    160
      96    298    174      300    176
     112    342    192      342    194
     128    388    212      384    212
     144    428    228      420    226
     160    466    246      464    248
     176    510    264      504    264
     192    550    282      544    282
     208    594    302      582    300
     224    628    316      624    318
     240    676    334      662    338
     256    716    354      708    358
     272    764    374      748    372
     288    802    352      788    358
     304    420    366      422    370
     320    428    360      432    364
     336    484    378      486    380
     352    426    384      434    390
     368    478    400      480    408
     384    488    394      490    398
     400    542    408      542    412
     416    486    416      492    426
     432    534    430      538    436
     448    544    422      546    432
     464    600    438      600    448
     480    540    448      548    456
     496    594    464      594    476
     512    602    456      606    470
     528    656    476      656    480
     544    600    480      606    498
     560    650    494      652    512
     576    664    490      662    508
     592    714    508      716    522
     608    656    514      664    538
     624    708    532      710    552
     640    716    524      720    516
     656    770    536      772    526
     672    716    548      722    544
     688    770    562      768    556
     704    774    552      778    556
     720    826    568      832    568
     736    768    574      780    584
     752    822    592      826    600
     768    830    584      836    560
     784    884    602      888    572
     800    828    610      838    588
     816    884    628      884    604
     832    888    618      894    598
     848    942    632      946    612
     864    884    644      896    628
     880    936    660      942    644
     896    948    652      952    608
     912   1000    664     1004    616
     928    942    676      954    634
     944    994    690     1000    646
     960   1002    680     1008    646
     976   1054    694     1062    658
     992   1002    706     1012    674
    1008   1052    720     1058    690

    This commit wires in the prior implementation from Andy, and makes the
    following changes to be suitable for kernel land.

    - Some cosmetic and structural changes, like renaming labels to
      .Lname, constants, and other Linux conventions, as well as making
      the code easy for us to maintain moving forward.

    - CPU feature checking is done in C by the glue code.

    - We avoid jumping into the middle of functions, to appease objtool,
      and instead parameterize shared code.

    - We maintain frame pointers so that stack traces make sense.

    - We remove the dependency on the perl xlate code, which transforms
      the output into things that assemblers we don't care about use.

    Importantly, none of our changes affect the arithmetic or core code, but
    just involve the differing environment of kernel space.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Samuel Neves
    Co-developed-by: Samuel Neves
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
    depending on the platform -- come from Andrew Moon's public domain
    poly1305-donna portable code, modified for usage in the kernel. The
    precomputation in the 32-bit version and the use of 64x64 multiplies in
    the 64-bit version make these perform better than the code they replace.
    Moon's code is also very widespread and has received many eyeballs of
    scrutiny.

    There's a bit of interference with the x86 implementation, which
    relies on internal details of the old scalar implementation. In the next
    commit, the x86 implementation will be replaced with a faster one that
    doesn't rely on this, so none of this matters much. But for now, to keep
    this passing the tests, we inline the bits of the old implementation
    that the x86 implementation relied on. Also, since we now support a
    slightly larger key space, via the union, some offsets had to be fixed
    up.

    Nonce calculation was folded in with the emit function, to take
    advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
    nonce handling in emit, so this path was conditionalized. We also
    introduced a new struct, poly1305_core_key, to represent the precise
    amount of space that particular implementation uses.

    Testing with kbench9000, depending on the CPU, the update function for
    the 32x32 version has been improved by 4%-7%, and for the 64x64 by
    19%-30%. The 32x32 gains are small, but I think there's great value in
    having a parallel implementation to the 64x64 one so that the two can be
    compared side-by-side as nice stand-alone units.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
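
    For readers unfamiliar with the 32x32 structure described in the
    poly1305-donna commit above, here is a compact userspace sketch in the
    style of poly1305-donna-32, written for this note rather than taken from
    the kernel or from Moon's code. The 130-bit accumulator and the clamped
    key `r` are held as five 26-bit limbs, the products `5*r[i]` are
    precomputed because 2^130 ≡ 5 (mod 2^130 - 5), and every 26x26-bit
    product fits comfortably in a 64-bit sum. The expected tag in the test is
    the Poly1305 vector from RFC 7539, section 2.5.2. The 64x64 variant uses
    the same scheme with three limbs (44, 44 and 42 bits) and 128-bit
    products.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* One-shot Poly1305 using five 26-bit limbs, donna-32 style sketch. */
void poly1305_sketch(uint8_t mac[16], const uint8_t *m, size_t len,
		     const uint8_t key[32])
{
	const uint64_t M = 0x3ffffff; /* 26-bit limb mask */
	/* Clamp r while splitting it into 26-bit limbs. */
	const uint64_t r0 = le32(key + 0) & 0x3ffffff;
	const uint64_t r1 = (le32(key + 3) >> 2) & 0x3ffff03;
	const uint64_t r2 = (le32(key + 6) >> 4) & 0x3ffc0ff;
	const uint64_t r3 = (le32(key + 9) >> 6) & 0x3f03fff;
	const uint64_t r4 = (le32(key + 12) >> 8) & 0x00fffff;
	/* Precomputed 5*r: product limbs past 2^130 fold back times 5. */
	const uint64_t s1 = r1 * 5, s2 = r2 * 5, s3 = r3 * 5, s4 = r4 * 5;
	uint64_t h0 = 0, h1 = 0, h2 = 0, h3 = 0, h4 = 0;
	uint64_t d0, d1, d2, d3, d4, c, g0, g1, g2, g3, g4, mask, f;

	while (len > 0) {
		uint8_t b[17] = { 0 };
		size_t n = len < 16 ? len : 16;

		memcpy(b, m, n);
		b[n] = 1; /* append the 0x01 pad byte (bit 128 for full blocks) */
		h0 += le32(b + 0) & M;
		h1 += (le32(b + 3) >> 2) & M;
		h2 += (le32(b + 6) >> 4) & M;
		h3 += (le32(b + 9) >> 6) & M;
		h4 += (le32(b + 12) >> 8) | ((uint64_t)b[16] << 24);

		/* h *= r (mod 2^130 - 5) */
		d0 = h0 * r0 + h1 * s4 + h2 * s3 + h3 * s2 + h4 * s1;
		d1 = h0 * r1 + h1 * r0 + h2 * s4 + h3 * s3 + h4 * s2;
		d2 = h0 * r2 + h1 * r1 + h2 * r0 + h3 * s4 + h4 * s3;
		d3 = h0 * r3 + h1 * r2 + h2 * r1 + h3 * r0 + h4 * s4;
		d4 = h0 * r4 + h1 * r3 + h2 * r2 + h3 * r1 + h4 * r0;

		/* Partial carry propagation back into 26-bit limbs. */
		c = d0 >> 26; h0 = d0 & M;
		d1 += c; c = d1 >> 26; h1 = d1 & M;
		d2 += c; c = d2 >> 26; h2 = d2 & M;
		d3 += c; c = d3 >> 26; h3 = d3 & M;
		d4 += c; c = d4 >> 26; h4 = d4 & M;
		h0 += c * 5; c = h0 >> 26; h0 &= M;
		h1 += c;

		m += n;
		len -= n;
	}

	/* Full carry, then compute g = h - p and select it if h >= p. */
	c = h1 >> 26; h1 &= M;
	h2 += c; c = h2 >> 26; h2 &= M;
	h3 += c; c = h3 >> 26; h3 &= M;
	h4 += c; c = h4 >> 26; h4 &= M;
	h0 += c * 5; c = h0 >> 26; h0 &= M;
	h1 += c;

	g0 = h0 + 5; c = g0 >> 26; g0 &= M;
	g1 = h1 + c; c = g1 >> 26; g1 &= M;
	g2 = h2 + c; c = g2 >> 26; g2 &= M;
	g3 = h3 + c; c = g3 >> 26; g3 &= M;
	g4 = h4 + c - (1ULL << 26);
	mask = (g4 >> 63) - 1; /* all ones if no borrow (h >= p), else zero */
	h0 = (h0 & ~mask) | (g0 & mask);
	h1 = (h1 & ~mask) | (g1 & mask);
	h2 = (h2 & ~mask) | (g2 & mask);
	h3 = (h3 & ~mask) | (g3 & mask);
	h4 = (h4 & ~mask) | (g4 & mask);

	/* Repack h mod 2^128 into 32-bit words and add s = key[16..31]. */
	h0 = (h0 | (h1 << 26)) & 0xffffffff;
	h1 = ((h1 >> 6) | (h2 << 20)) & 0xffffffff;
	h2 = ((h2 >> 12) | (h3 << 14)) & 0xffffffff;
	h3 = ((h3 >> 18) | (h4 << 8)) & 0xffffffff;
	f = h0 + le32(key + 16);             put_le32(mac + 0, (uint32_t)f);
	f = h1 + le32(key + 20) + (f >> 32); put_le32(mac + 4, (uint32_t)f);
	f = h2 + le32(key + 24) + (f >> 32); put_le32(mac + 8, (uint32_t)f);
	f = h3 + le32(key + 28) + (f >> 32); put_le32(mac + 12, (uint32_t)f);
}
```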
     

27 Dec, 2019

1 commit

  • Somehow these were dropped when Zinc was being integrated, which is
    problematic, because testing the library interface for Curve25519 is
    important. This commit simply adds them back and wires them in in the
    same way that the blake2s selftests are wired in.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     

22 Nov, 2019

1 commit


17 Nov, 2019

13 commits

  • Reimplement the library routines to perform chacha20poly1305 en/decryption
    on scatterlists, without [ab]using the [deprecated] blkcipher interface,
    which is rather heavyweight and does things we don't really need.

    Instead, we use the sg_miter API in a novel and clever way, to iterate
    over the scatterlist in-place (i.e., source == destination, which is the
    only way this library is expected to be used). That way, we don't have to
    iterate over two scatterlists in parallel.

    Another optimization is that, instead of relying on the blkcipher walker
    to present the input in suitable chunks, we recognize that ChaCha is a
    streamcipher, and so we can simply deal with partial blocks by keeping a
    block of cipherstream on the stack and using crypto_xor() to mix it with
    the in/output.

    Finally, we omit the scatterwalk_map_and_copy() call if the last element of
    the scatterlist covers the MAC as well (which is the common case),
    avoiding the need to walk the scatterlist and kmap() the page twice.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This incorporates the chacha20poly1305 from the Zinc library, retaining
    the library interface, but replacing the implementation with calls into
    the code that already existed in the kernel's crypto API.

    Note that this library API does not implement RFC7539 fully, given that
    it is limited to 64-bit nonces. (The 96-bit nonce version, which was
    only part of the selftest, has been removed, along with the 96-bit nonce
    test vectors that only tested the selftest and not the actual library.)

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Arnd reports that the 32-bit generic library code for Curve25519 ends
    up using an excessive amount of stack space when built with Clang:

    lib/crypto/curve25519-fiat32.c:756:6: error: stack frame size
    of 1384 bytes in function 'curve25519_generic'
    [-Werror,-Wframe-larger-than=]

    Let's give some hints to the compiler regarding which routines should
    not be inlined, to prevent it from running out of registers and spilling
    to the stack. The resulting code performs identically under both GCC
    and Clang, and makes the warning go away.

    Suggested-by: Arnd Bergmann
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This contains two formally verified C implementations of the Curve25519
    scalar multiplication function, one for 32-bit systems, and one for
    64-bit systems whose compiler supports efficient 128-bit integer types.
    Not only are these implementations formally verified, but they are also
    the fastest available C implementations. They have been modified to be
    friendly to kernel space and to be generally less horrendous-looking,
    but still an effort has been made to retain their formally verified
    characteristic, and so the C might look slightly unidiomatic.

    The 64-bit version comes from HACL*: https://github.com/project-everest/hacl-star
    The 32-bit version comes from Fiat: https://github.com/mit-plv/fiat-crypto

    Information: https://cr.yp.to/ecdh.html

    Signed-off-by: Jason A. Donenfeld
    [ardb: - move from lib/zinc to lib/crypto
    - replace .c #includes with Kconfig based object selection
    - drop simd handling and simplify support for per-arch versions ]
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • The C implementation was originally based on Samuel Neves' public
    domain reference implementation but has since been heavily modified
    for the kernel. We're able to do compile-time optimizations by moving
    some scaffolding around the final function into the header file.

    Information: https://blake2.net/

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Samuel Neves
    Co-developed-by: Samuel Neves
    [ardb: - move from lib/zinc to lib/crypto
    - remove simd handling
    - rewrote selftest for better coverage
    - use fixed digest length for blake2s_hmac() and rename to
    blake2s256_hmac() ]
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for
    MIPS authored by Andy Polyakov, a prior 64-bit only version of which has been
    contributed by him to the OpenSSL project. The file 'poly1305-mips.pl' is taken
    straight from this upstream GitHub repository [0] at commit
    d22ade312a7af958ec955620b0d241cf42c37feb, and already contains all the changes
    required to build it as part of a Linux kernel module.

    [0] https://github.com/dot-asm/cryptogams

    Co-developed-by: Andy Polyakov
    Signed-off-by: Andy Polyakov
    Co-developed-by: René van Dorst
    Signed-off-by: René van Dorst
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation
    for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL
    project. The file 'poly1305-armv4.pl' is taken straight from this upstream
    GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f,
    and already contains all the changes required to build it as part of a
    Linux kernel module.

    [0] https://github.com/dot-asm/cryptogams

    Co-developed-by: Andy Polyakov
    Signed-off-by: Andy Polyakov
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation
    for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL
    project. The file 'poly1305-armv8.pl' is taken straight from this upstream
    GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f,
    and already contains all the changes required to build it as part of a
    Linux kernel module.

    [0] https://github.com/dot-asm/cryptogams

    Co-developed-by: Andy Polyakov
    Signed-off-by: Andy Polyakov
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Implement the arch init/update/final Poly1305 library routines in the
    accelerated SIMD driver for x86 so they are accessible to users of
    the Poly1305 library interface as well.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Expose the existing generic Poly1305 code via an init/update/final
    library interface so that callers are not required to go through
    the crypto API's shash abstraction to access it. At the same time,
    make some preparations so that the library implementation can be
    superseded by an accelerated arch-specific version in the future.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Move the core Poly1305 routines shared between the generic Poly1305
    shash driver and the Adiantum and NHPoly1305 drivers into a separate
    library so that using just these pieces does not pull in the crypto
    API pieces of the generic Poly1305 routine.

    In a subsequent patch, we will augment this generic library with
    init/update/final routines so that the Poly1305 algorithm can be used
    directly without the need for using the crypto API's shash abstraction.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Currently, our generic ChaCha implementation consists of a permute
    function in lib/chacha.c that operates on the 64-byte ChaCha state
    directly [and which is always included into the core kernel since it
    is used by the /dev/random driver], and the crypto API plumbing to
    expose it as a skcipher.

    In order to support in-kernel users that need the ChaCha streamcipher
    but have no need [or tolerance] for going through the abstractions of
    the crypto API, let's expose the streamcipher bits via a library API
    as well, in a way that permits the implementation to be superseded by
    an architecture-specific one if provided.

    So move the streamcipher code into a separate module in lib/crypto,
    and expose the init() and crypt() routines to users of the library.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • In preparation for introducing a set of crypto library interfaces, tidy
    up the Makefile and split off the Kconfig symbols into a separate file.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
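
    The partial-block handling described in the commit that reimplements the
    chacha20poly1305 scatterlist routines (keeping one block of keystream on
    the stack and XOR-ing it into the data) can be sketched in userspace C.
    `toy_keystream_block()` is a hypothetical stand-in for the ChaCha block
    function, and `xor_bytes()` plays the role `crypto_xor()` plays in the
    kernel; neither is kernel code. Because leftover keystream carries over
    between calls, data can arrive in segments of any length, as it would
    from a scatterlist walk, and is processed in place.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_BLOCK 64

/* Hypothetical keystream source: NOT ChaCha, only a function of the counter. */
static void toy_keystream_block(uint8_t out[KS_BLOCK], uint32_t counter)
{
	for (int i = 0; i < KS_BLOCK; i++)
		out[i] = (uint8_t)((counter + 1u) * 73u ^ (uint32_t)i * 29u);
}

/* Userspace stand-in for crypto_xor(): dst ^= src. */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
	while (len--)
		*dst++ ^= *src++;
}

struct stream_state {
	uint32_t counter;     /* next keystream block to generate */
	uint8_t ks[KS_BLOCK]; /* current keystream block */
	size_t used;          /* bytes of ks already consumed */
};

static void stream_init(struct stream_state *st)
{
	st->counter = 0;
	st->used = KS_BLOCK; /* nothing buffered yet */
}

/* En/decrypt one segment in place; segments may have any length. */
static void stream_crypt_segment(struct stream_state *st, uint8_t *buf,
				 size_t len)
{
	while (len > 0) {
		size_t n;

		if (st->used == KS_BLOCK) {
			toy_keystream_block(st->ks, st->counter++);
			st->used = 0;
		}
		n = KS_BLOCK - st->used;
		if (n > len)
			n = len;
		xor_bytes(buf, st->ks + st->used, n);
		st->used += n;
		buf += n;
		len -= n;
	}
}
```

    Processing 200 bytes as one segment or as segments of 1, 63, 65, 7 and
    64 bytes gives the same result, and running the same stream a second
    time restores the plaintext.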
     

05 Sep, 2019

2 commits

  • lib/crypto/sha256.c and include/crypto/sha256_base.h define
    99% identical functions to init a sha256_state struct for sha224 or
    sha256 use.

    This commit moves the functions from lib/crypto/sha256.c to
    include/crypto/sha.h (making them static inline) and makes the
    sha224/256_base_init static inline functions from
    include/crypto/sha256_base.h wrappers around the now also
    static inline include/crypto/sha.h functions.

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • The generic sha256 implementation from lib/crypto/sha256.c uses data
    structs defined in crypto/sha.h, so let's move the function prototypes
    there too.

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
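
    The shape of the refactor that moves the sha224/sha256 init functions
    into include/crypto/sha.h can be sketched as follows: the init helpers
    become `static inline` in a header, and the `*_base_init()` helpers
    become thin wrappers around them. The struct and the `_sketch` names are
    hypothetical simplifications, not the kernel's `struct sha256_state` or
    its shash-based signatures; the initial hash values are the standard
    ones from FIPS 180-4.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified state; the kernel's struct differs. */
struct sha256_state_sketch {
	uint32_t state[8];
	uint64_t count;
	uint8_t buf[64];
};

/* Would live in a header as static inline, shared by library and API code. */
static inline void sha256_init_sketch(struct sha256_state_sketch *sctx)
{
	static const uint32_t iv[8] = {
		0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
		0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
	};
	memcpy(sctx->state, iv, sizeof(iv));
	sctx->count = 0;
}

static inline void sha224_init_sketch(struct sha256_state_sketch *sctx)
{
	static const uint32_t iv[8] = {
		0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939,
		0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4,
	};
	memcpy(sctx->state, iv, sizeof(iv));
	sctx->count = 0;
}

/* The crypto API side then only wraps the shared helper. */
static inline int sha256_base_init_sketch(struct sha256_state_sketch *sctx)
{
	sha256_init_sketch(sctx);
	return 0;
}
```

    Keeping a single definition of the initial values avoids the two
    near-identical copies the commit message describes.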
     

30 Aug, 2019

1 commit


22 Aug, 2019

5 commits

  • Add sha224 support to the lib/crypto/sha256 library code. This will allow
    us to replace both the sha256 and sha224 parts of crypto/sha256_generic.c
    when we remove the code duplication in further patches in this series.

    Suggested-by: Eric Biggers
    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Before this commit, lib/crypto/sha256.c was only used in the s390 and
    x86 purgatory code; make it suitable for generic use:

    * Export interesting symbols
    * Add -D__DISABLE_EXPORTS to CFLAGS_sha256.o for purgatory builds to
      avoid the exports for the purgatory builds
    * Add to lib/crypto/Makefile and crypto/Kconfig

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Use get/put_unaligned_be32 in lib/crypto/sha256.c to load / store data
    so that it can be used with unaligned buffers too, making it more generic.

    And use memzero_explicit for better clearing of sensitive data.

    Note that, unlike other patches in this series, this commit actually
    makes functional changes to the sha256 code as used by the purgatory code.

    This fully aligns the lib/crypto/sha256.c sha256 implementation with the
    one from crypto/sha256_generic.c allowing us to remove the latter in
    further patches in this series.

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Generic crypto implementations belong under lib/crypto not directly in
    lib, likewise the header should be in include/crypto, not include/linux.

    Note that the code in lib/crypto/sha256.c is not yet available for
    generic use after this commit, it is still only used by the s390 and x86
    purgatory code. Making it suitable for generic use is done in further
    patches in this series.

    Signed-off-by: Hans de Goede
    Signed-off-by: Herbert Xu

    Hans de Goede
     
  • Another one for the cipher museum: split off DES core processing into
    a separate module so other drivers (mostly for crypto accelerators)
    can reuse the code without pulling in the generic DES cipher itself.
    This will also permit the cipher interface to be made private to the
    crypto API itself once we move the only user in the kernel (CIFS) to
    this library interface.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
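
    The two ideas in the commit that switches lib/crypto/sha256.c to
    get/put_unaligned_be32 and memzero_explicit can be illustrated with
    portable userspace equivalents. `load_be32()`, `store_be32()`,
    `zero_explicit()` and `load_block_words()` are hypothetical helpers
    written for this sketch, not kernel functions: byte-wise loads and stores
    work at any alignment, and writing through a `volatile` pointer is one
    common userspace way to keep the compiler from eliding a final wipe of
    sensitive data (in the kernel, memzero_explicit() serves that purpose).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit word from a possibly unaligned address. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Write a big-endian 32-bit word to a possibly unaligned address. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Clear sensitive data; volatile stores are not dropped as dead writes. */
static void zero_explicit(void *buf, size_t len)
{
	volatile uint8_t *p = buf;

	while (len--)
		*p++ = 0;
}

/* Load the 16 message words of a SHA-256 block from any offset. */
static void load_block_words(uint32_t w[16], const uint8_t *block)
{
	for (int i = 0; i < 16; i++)
		w[i] = load_be32(block + 4 * i);
}
```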
     

09 Aug, 2019

1 commit


26 Jul, 2019

2 commits


20 Jun, 2019

1 commit