13 Jul, 2018

6 commits

  • Cast *val* to u64 in order to give the compiler complete
    information about the proper arithmetic to use.

    Note that this variable is used in a context that expects an
    expression of type u64 (64 bits, unsigned), while the following
    expression is currently being evaluated using 32-bit arithmetic:

    val << bit_pos

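    For illustration only (the variable names below are placeholders, not
    the driver's actual code), the fix amounts to something like:

    /* before: the shift is evaluated in 32 bits and can overflow */
    u64 mask = val << bit_pos;

    /* after: cast first so the shift is evaluated in 64-bit arithmetic */
    u64 mask = (u64)val << bit_pos;
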
    Addresses-Coverity-ID: 1467425 ("Unintentional integer overflow")
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Herbert Xu

    Gustavo A. R. Silva
     
  • Add a new CCP/PSP PCI device ID and new PSP register offsets.

    Signed-off-by: Tom Lendacky
    Acked-by: Gary R Hook
    Reviewed-by: Brijesh Singh
    Signed-off-by: Herbert Xu

    Tom Lendacky
     
  • In preparation for adding a new PSP device ID that uses different register
    offsets, add register offset values to the PSP version data, then update
    the code to use these new register offset values.

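    For illustration only, per-device register offsets carried in the version
    data could look along these lines (struct and field names are assumptions,
    not necessarily the driver's actual definitions):

    struct psp_vdata {
        const unsigned int feature_reg;    /* feature register offset          */
        const unsigned int inten_reg;      /* interrupt enable register offset */
        const unsigned int intsts_reg;     /* interrupt status register offset */
    };

    /* register accesses then go through the per-version offset, e.g.: */
    iowrite32(0, psp->io_regs + psp->vdata->inten_reg);
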
    Signed-off-by: Tom Lendacky
    Acked-by: Gary R Hook
    Reviewed-by: Brijesh Singh
    Signed-off-by: Herbert Xu

    Tom Lendacky
     
  • Remove some #defines for register offsets that are not used. This lessens
    the changes required when register offsets change between versions of the
    device.

    Signed-off-by: Tom Lendacky
    Acked-by: Gary R Hook
    Reviewed-by: Brijesh Singh
    Signed-off-by: Herbert Xu

    Tom Lendacky
     
  • Add a dev_notice() message to the PSP initialization to report when the
    PSP initialization has succeeded and the PSP is enabled.

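    For illustration, the notification could be as simple as the following
    (the exact message text is an assumption):

    dev_notice(dev, "psp enabled\n");
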
    Signed-off-by: Tom Lendacky
    Acked-by: Gary R Hook
    Signed-off-by: Herbert Xu

    Tom Lendacky
     
  • The wait_event() function is used to detect command completion. The
    interrupt handler will set the wait condition variable when the interrupt
    is triggered. However, the variable used for wait_event() is initialized
    after the command has been submitted, which can create a race condition
    with the interrupt handler and result in the wait_event() never returning.
    Move the initialization of the wait condition variable to just before
    command submission.

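    A minimal sketch of the pattern being fixed (function and variable names
    are placeholders, not the driver's actual code):

    /* Racy: the IRQ handler may set cmd_done between command submission
     * and the initialization below, so the completion event is lost.
     */
    submit_command(psp);
    cmd_done = 0;
    wait_event(cmd_queue, cmd_done);

    /* Fixed: initialize the wait condition before submitting the command. */
    cmd_done = 0;
    submit_command(psp);
    wait_event(cmd_queue, cmd_done);
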
    Fixes: 200664d5237f ("crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support")
    Cc: stable@vger.kernel.org # 4.16.x-
    Signed-off-by: Tom Lendacky
    Reviewed-by: Brijesh Singh
    Acked-by: Gary R Hook
    Signed-off-by: Herbert Xu

    Tom Lendacky
     

09 Jul, 2018

19 commits

  • A debug print about register status post interrupt can happen
    quite often. Rate limit it to avoid cluttering the log.

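    For illustration, this is the usual pattern for such a change (the
    message itself is an assumption):

    dev_dbg_ratelimited(dev, "register status after interrupt: 0x%08X\n", irr);
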
    Signed-off-by: Gilad Ben-Yossef
    Reported-by: Geert Uytterhoeven
    Tested-by: Geert Uytterhoeven
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     
  • The ccree driver implemented the NIST 800-38A CBC-CS2 ciphertext format,
    which only reverses the last two blocks if the amount of stolen ciphertext
    is non-zero. Move it to the kernel's chosen format of CBC-CS3, which swaps
    the final blocks unconditionally, and rename it to "cts" now that it
    complies with the kernel format and passes the self-tests.

    Ironically, the CryptoCell REE HW does just that, so the fix is dropping
    the code that forced it to use plain CBC if the ciphertext was block
    aligned.

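    To illustrate the difference between the two formats (a pseudocode-style
    sketch, not the driver's code; swap_last_two_blocks() is a made-up helper):

    /* CBC-CS2: swap the last two ciphertext blocks only when the final
     * block is partial, i.e. some ciphertext was "stolen".
     */
    if (final_len != AES_BLOCK_SIZE)
        swap_last_two_blocks(ct);

    /* CBC-CS3: swap the last two ciphertext blocks unconditionally. */
    swap_last_two_blocks(ct);
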
    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     
  • Remove legacy code no longer used by anything.

    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     
  • We were copying our last cipher block into the request for use as the IV
    for all modes of operation. Fix this by choosing the behaviour based on
    the mode of operation used: copy the ciphertext for CBC, update the
    counter for CTR.

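    A rough sketch of the distinction (names are placeholders and
    ctr_advance() is a made-up helper, not the driver's actual code):

    if (cipher_mode == CBC) {
        /* next IV is the last ciphertext block produced */
        memcpy(req->iv, last_ciphertext_block, ivsize);
    } else if (cipher_mode == CTR) {
        /* next IV is the counter advanced by the number of blocks processed */
        ctr_advance(req->iv, nbytes / AES_BLOCK_SIZE);
    }
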
    CC: stable@vger.kernel.org
    Fixes: 63ee04c8b491 ("crypto: ccree - add skcipher support")
    Reported-by: Hadar Gat
    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     
  • The testmgr hash tests were testing init, digest, update and final
    methods but not the finup method. Add a test for this one too.

    While doing this, make sure we only run the partial tests once with
    the digest tests and skip them with the final and finup tests since
    they are the same.

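    For reference, finup() is expected to produce the same result as update()
    followed by final(); roughly (illustrative, using the shash API):

    /* crypto_shash_finup(desc, data, len, out) should be equivalent to: */
    crypto_shash_update(desc, data, len);
    crypto_shash_final(desc, out);
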
    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     
  • The finup() operation was incorrect: padding was missing.
    Fix this by setting the ccree HW to enable padding.

    Signed-off-by: Hadar Gat
    [ gilad@benyossef.com: refactored for better code sharing ]
    Signed-off-by: Gilad Ben-Yossef
    Cc: stable@vger.kernel.org
    Signed-off-by: Herbert Xu

    Hadar Gat
     
  • Some crypto API users allocating a tfm with crypto_alloc_$FOO() are also
    specifying the type flags for $FOO, e.g. crypto_alloc_shash() with
    CRYPTO_ALG_TYPE_SHASH. But, that's redundant since the crypto API will
    override any specified type flag/mask with the correct ones.

    So, remove the unneeded flags.

    This patch shouldn't change any actual behavior.

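    For illustration, with a shash this means a call like the first line
    below can simply drop the type flag (the algorithm name is chosen
    arbitrarily):

    tfm = crypto_alloc_shash("sha256", CRYPTO_ALG_TYPE_SHASH, 0);  /* redundant flag */
    tfm = crypto_alloc_shash("sha256", 0, 0);                      /* equivalent      */
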
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Some skcipher algorithms set .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER. But
    this is redundant with the C structure type ('struct skcipher_alg'), and
    crypto_register_skcipher() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the skcipher algorithms.

    This patch shouldn't change any actual behavior.

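    For illustration, the line being deleted from each skcipher_alg is a base
    cra_flags assignment of roughly this shape (surrounding fields omitted):

    .base.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER,  /* redundant: set automatically
                                                    by crypto_register_skcipher() */
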
    Signed-off-by: Eric Biggers
    Acked-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Some aead algorithms set .cra_flags = CRYPTO_ALG_TYPE_AEAD. But this is
    redundant with the C structure type ('struct aead_alg'), and
    crypto_register_aead() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the aead algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Some ahash algorithms set .cra_type = &crypto_ahash_type. But this is
    redundant with the C structure type ('struct ahash_alg'), and
    crypto_register_ahash() already sets the .cra_type automatically.
    Apparently the useless assignment has just been copy+pasted around.

    So, remove the useless assignment from all the ahash algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Acked-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
    is redundant with the C structure type ('struct ahash_alg'), and
    crypto_register_ahash() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the ahash algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Acked-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this
    is redundant with the C structure type ('struct shash_alg'), and
    crypto_register_shash() already sets the type flag automatically,
    clearing any type flag that was already there. Apparently the useless
    assignment has just been copy+pasted around.

    So, remove the useless assignment from all the shash algorithms.

    This patch shouldn't change any actual behavior.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • With all the crypto modules enabled on x86, and with a CPU that supports
    AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
    the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
    highest priority. However, these implementations only perform well when
    many hash requests are being submitted concurrently, filling all 8 AVX-2
    lanes. Otherwise, they are incredibly slow, as they waste time waiting
    for more requests to arrive before proceeding to execute each request.

    For example, here are the speeds I see hashing 4096-byte buffers with a
    single thread on a Haswell-based processor:

               generic     avx2        mb (multibuffer)
               -------     --------    ----------------
    sha1       602 MB/s    997 MB/s    0.61 MB/s
    sha256     228 MB/s    412 MB/s    0.61 MB/s
    sha512     312 MB/s    559 MB/s    0.61 MB/s

    So, the multibuffer implementation is 500 to 1000 times slower than the
    other implementations. Note that with smaller buffers or more update()s
    per digest, the difference would be even greater.

    I believe the vast majority of people are in the boat where the
    multibuffer code is much slower, and only a small minority are doing the
    highly parallel, hashing-intensive, latency-flexible workloads (maybe
    IPsec on servers?) where the multibuffer code may be beneficial. Yet,
    people often aren't familiar with all the crypto config options and so
    the multibuffer code may inadvertently be built into the kernel.

    Also the multibuffer code apparently hasn't been very well tested,
    seeing as it was sometimes computing the wrong SHA-256 digest.

    So, let's make the multibuffer algorithms low priority. Users who want
    to use them can either request them explicitly by driver name, or use
    NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • sha512-generic and sha384-generic had a cra_priority of 0, so it wasn't
    possible to have a lower priority SHA-512 or SHA-384 implementation, as
    is desired for sha512_mb which is only useful under certain workloads
    and is otherwise extremely slow. Change them to priority 100, which is
    the priority used for many of the other generic algorithms.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • sha256-generic and sha224-generic had a cra_priority of 0, so it wasn't
    possible to have a lower priority SHA-256 or SHA-224 implementation, as
    is desired for sha256_mb which is only useful under certain workloads
    and is otherwise extremely slow. Change them to priority 100, which is
    the priority used for many of the other generic algorithms.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • sha1-generic had a cra_priority of 0, so it wasn't possible to have a
    lower priority SHA-1 implementation, as is desired for sha1_mb which is
    only useful under certain workloads and is otherwise extremely slow.
    Change it to priority 100, which is the priority used for many of the
    other generic algorithms.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • "arch/x86/crypto/sha*-mb" needs a trailing slash, since it refers to
    directories. Otherwise get_maintainer.pl doesn't find the entry.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2()
    copies the SHA-256 digest state from sha256_mb_mgr::args::digest to
    job_sha256::result_digest. Consequently, the sha256_mb algorithm
    sometimes calculates the wrong digest. Fix it.

    Reproducer using AF_ALG:

    #include <assert.h>
    #include <linux/if_alg.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static const __u8 expected[32] =
        "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02"
        "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7";

    int main()
    {
        int fd;
        struct sockaddr_alg addr = {
            .salg_family = AF_ALG,
            .salg_type = "hash",
            .salg_name = "sha256_mb",
        };
        __u8 data[4096] = { 0 };
        __u8 digest[32];
        int ret;
        int i;

        fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(fd, (void *)&addr, sizeof(addr));
        fork();
        fd = accept(fd, 0, 0);
        do {
            ret = write(fd, data, 4096);
            assert(ret == 4096);
            ret = read(fd, digest, 32);
            assert(ret == 32);
        } while (memcmp(digest, expected, 32) == 0);

        printf("wrong digest: ");
        for (i = 0; i < 32; i++)
            printf("%02x", digest[i]);
        printf("\n");
    }

    Output was:

    wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7

    Fixes: 172b1d6b5a93 ("crypto: sha256-mb - fix ctx pointer and digest copy")
    Cc: stable@vger.kernel.org # v4.8+
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • The main goal of this patch is to improve driver performance by moving
    the crypto request from a list to an RDR ring shadow.

    This is possible since there is one producer and one consumer for this
    RDR request shadow and one ring descriptor is left unused.
    This change eliminates the use of a spinlock when accessing the
    descriptor ring and the need to dynamically allocate memory per crypto
    request.

    The crypto request is placed in the first RDR shadow descriptor only
    if there are enough descriptors; when the result handler is invoked,
    it fetches the first result descriptor from the RDR shadow.

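    As a rough illustration of the single-producer/single-consumer idea
    (structure and field names are assumptions, not the driver's actual
    definitions):

    struct rdr_req_shadow {
        struct crypto_async_request *req[RING_SIZE];  /* one slot per descriptor */
        unsigned int write;   /* advanced only by the submitter (producer)       */
        unsigned int read;    /* advanced only by the result handler (consumer)  */
    };

    /* With one slot always left unused, (write == read) means empty and
     * ((write + 1) % RING_SIZE == read) means full, so no spinlock and no
     * per-request allocation is needed.
     */
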
    Signed-off-by: Ofer Heifetz
    Signed-off-by: Antoine Tenart
    Signed-off-by: Herbert Xu

    Ofer Heifetz