29 Jan, 2021

8 commits


22 Jan, 2021

16 commits

  • Taking ownership of the FPU in kernel mode disables preemption, and
    this may result in excessive scheduling blackouts if the size of the
    data being processed on the FPU is unbounded.

    Given that taking and releasing the FPU is cheap these days on x86, we
    can limit the impact of this issue easily for skcipher implementations,
    by moving the FPU begin/end calls inside the skcipher walk processing
    loop. Considering that skcipher walks operate on at most one page at a
    time, doing so fully mitigates this issue.

    This also permits the skcipher walk logic to use non-atomic kmalloc()
    calls etc so we can change the 'atomic' bool argument in the calls to
    skcipher_walk_virt() to false as well.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
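
The bounded-section idea in the entry above can be illustrated with a small userspace sketch. It is not the kernel code: `fake_fpu_begin()`/`fake_fpu_end()` are hypothetical stand-ins for `kernel_fpu_begin()`/`kernel_fpu_end()`, `CHUNK_SIZE` is an assumed 4096-byte page, and the "cipher" is a toy XOR. It only shows how moving begin/end inside the loop caps the work done per section.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u /* assumed page size; a walk step covers at most one page */

/* Demo bookkeeping: how much work ran inside a single "FPU section". */
struct fpu_stats {
    size_t sections;        /* number of begin/end pairs */
    size_t max_section_len; /* most bytes processed in one section */
};

/* Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static void fake_fpu_begin(struct fpu_stats *st)
{
    st->sections++;
}

static void fake_fpu_end(struct fpu_stats *st, size_t done)
{
    if (done > st->max_section_len)
        st->max_section_len = done;
}

/* Toy "cipher": XOR every byte with a constant. */
static void toy_xor(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
}

/* Old shape: one section around the whole request, however large. */
static void process_one_section(uint8_t *buf, size_t len, struct fpu_stats *st)
{
    fake_fpu_begin(st);
    toy_xor(buf, len);
    fake_fpu_end(st, len);
}

/* New shape: begin/end inside the loop, so a section never exceeds CHUNK_SIZE. */
static void process_per_chunk(uint8_t *buf, size_t len, struct fpu_stats *st)
{
    while (len > 0) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;

        fake_fpu_begin(st);
        toy_xor(buf, n);
        fake_fpu_end(st, n);
        buf += n;
        len -= n;
    }
}
```

With a 10000-byte buffer, the first variant runs one section covering all 10000 bytes, while the second runs three sections of at most 4096 bytes each.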
     
  • Indirect calls are very expensive on x86, so use a static call to set
    the system-wide AES-NI/CTR asm helper.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • src_size and aad_size are defined as u32, so the following expressions are
    currently being evaluated using 32-bit arithmetic:

    bit_len = src_size * 8;
    ...
    bit_len = aad_size * 8;

    However, bit_len is used afterwards in a context that expects a valid
    64-bit value (the lower and upper 32-bit words of bit_len are extracted
    and written to hw).

    In order to make sure the correct bit length is generated and the 32-bit
    multiplication does not wrap around, cast src_size and aad_size to u64.

    Signed-off-by: Ovidiu Panait
    Acked-by: Daniele Alessandrelli
    Signed-off-by: Herbert Xu

    Ovidiu Panait
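
The wrap-around described above is easy to reproduce in plain C. The sketch below assumes a platform where `int` is 32 bits wide; the function names are hypothetical, and `lower32()`/`upper32()` merely mimic the way the two halves of `bit_len` are extracted before being written to hardware.

```c
#include <stdint.h>

/* Pre-fix shape: the multiplication is carried out in 32 bits and wraps
 * before the result is widened to 64 bits. */
static uint64_t bit_len_wrapping(uint32_t size)
{
    return size * 8;
}

/* Post-fix shape: widen first, then multiply in 64 bits. */
static uint64_t bit_len_widened(uint32_t size)
{
    return (uint64_t)size * 8;
}

/* Hypothetical helpers that split a 64-bit value into its 32-bit words. */
static uint32_t lower32(uint64_t v)
{
    return (uint32_t)v;
}

static uint32_t upper32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

For a size of 0x20000000 bytes (512 MiB), the wrapping variant yields 0, whereas the widened variant yields 0x100000000, i.e. an upper word of 1 and a lower word of 0.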
     
  • With no mod_exit function, users are unable to unload the module after
    use. I'm not aware of any reason why module unloading should be
    prohibited for this one, so this commit simply adds an empty exit
    function.

    Reported-and-tested-by: John Donnelly
    Acked-by: Ard Biesheuvel
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • CPT offload module utilises the linux crypto framework to offload
    crypto processing. This patch registers supported algorithms by
    calling registration functions provided by the kernel crypto API.

    The module currently supports:
    - AES block cipher in CBC, ECB and XTS modes.
    - 3DES block cipher in CBC and ECB modes.
    - AEAD algorithms:
      authenc(hmac(sha1),cbc(aes)),
      authenc(hmac(sha256),cbc(aes)),
      authenc(hmac(sha384),cbc(aes)),
      authenc(hmac(sha512),cbc(aes)),
      authenc(hmac(sha1),ecb(cipher_null)),
      authenc(hmac(sha256),ecb(cipher_null)),
      authenc(hmac(sha384),ecb(cipher_null)),
      authenc(hmac(sha512),ecb(cipher_null)),
      rfc4106(gcm(aes)).

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • Attach LFs to CPT VF to process the crypto requests and register
    LF interrupts.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • Add support for the Marvell OcteonTX2 CPT virtual function
    driver. This patch includes probe, PCI specific initialization
    and interrupt handling.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • Adds support for querying engine capabilities and adds a new mailbox
    to share those capabilities with the VF driver.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • CPT RVU Local Functions (LFs) need to be attached to the
    PF/VF to submit instructions to CPT.
    This patch adds the interface to initialize and attach
    the LFs. It also adds an interface to register the LFs'
    interrupts.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • CPT includes microcoded GigaCypher symmetric engines (SEs), IPsec
    symmetric engines (IEs), and asymmetric engines (AEs).
    Each engine receives CPT instructions from the engine groups it has
    subscribed to. This patch loads microcode, configures three engine
    groups (one for SEs, one for IEs and one for AEs), and configures
    all engines.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • Adds 'sriov_configure' to enable/disable virtual functions (VFs).
    Also initializes the VF <=> PF mailbox IRQs and registers handlers
    for processing these mailbox messages.

    Admin function (AF) handles resource allocation and configuration for
    PFs and their VFs. PFs request the AF directly, via mailboxes.
    Unlike PFs, VFs cannot send a mailbox request directly. A VF sends
    mailbox messages to its parent PF, with which it shares a mailbox
    region. The PF then forwards these messages to the AF. After handling
    the request, the AF sends a response back to the VF, through the PF.

    This patch adds support for this 'VF <=> PF <=> AF' mailbox
    communication.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • In the resource virtualization unit (RVU), each PF shares a 64KB
    reserved memory region with the AF (admin function) for
    communication. This patch initializes the PF <=> AF mailbox IRQs
    and registers handlers for processing these communication messages.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • Adds skeleton for the Marvell OcteonTX2 CPT physical function
    driver which includes probe, PCI specific initialization and
    hardware register defines.
    RVU defines are present in the AF driver
    (drivers/net/ethernet/marvell/octeontx2/af); header files from the
    AF driver are included here to avoid duplication.

    Signed-off-by: Suheil Chandran
    Signed-off-by: Lukasz Bartosik
    Signed-off-by: Srujana Challa
    Signed-off-by: Herbert Xu

    Srujana Challa
     
  • The accelerated, instruction based implementations of SHA1, SHA2 and
    SHA3 are autoloaded based on CPU capabilities, given that the code is
    modest in size, and widely used, which means that resolving the algo
    name, loading all compatible modules and picking the one with the
    highest priority is taken to be suboptimal.

    However, if these algorithms are requested before this CPU feature
    based matching and autoloading occurs, these modules are not even
    considered, and we end up with suboptimal performance.

    So add the missing module aliases for the various SHA implementations.

    Cc:
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • This patch fixes a number of sparse warnings in the bcm driver.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Unlike many other structure types defined in the crypto API, the
    'shash_desc' structure is permitted to live on the stack, which
    implies its contents may not be accessed by DMA masters. (This is
    due to the fact that the stack may be located in the vmalloc area,
    which requires a different virtual-to-physical translation than the
    one implemented by the DMA subsystem)

    Our definition of CRYPTO_MINALIGN_ATTR is based on ARCH_KMALLOC_MINALIGN,
    which may take DMA constraints into account on architectures that support
    non-cache coherent DMA such as ARM and arm64. In this case, the value is
    chosen to reflect the largest cacheline size in the system, in order to
    ensure that explicit cache maintenance as required by non-coherent DMA
    masters does not affect adjacent, unrelated slab allocations. On arm64,
    this value is currently set at 128 bytes.

    This means that applying CRYPTO_MINALIGN_ATTR to struct shash_desc is both
    unnecessary (as it is never used for DMA), and undesirable, given that it
    wastes stack space (on arm64, performing the alignment costs 112 bytes in
    the worst case, and the hole between the 'tfm' and '__ctx' members takes
    up another 120 bytes, resulting in an increased stack footprint of up to
    232 bytes.) So instead, let's switch to the minimum SLAB alignment, which
    does not take DMA constraints into account.

    Note that this is a no-op for x86.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
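
The 120-byte hole mentioned above can be observed with `offsetof()`. The following sketch assumes GCC or Clang (for the `aligned` attribute) and a 64-bit target; the struct names are hypothetical simplifications of `shash_desc`, and `DMA_MINALIGN` just hard-codes the 128-byte arm64 value quoted in the message.

```c
#include <stddef.h>

#define DMA_MINALIGN 128 /* value the commit message cites for arm64 */

/* Context area aligned for DMA, as with CRYPTO_MINALIGN_ATTR. */
struct desc_dma_aligned {
    void *tfm;
    void *ctx[] __attribute__((aligned(DMA_MINALIGN)));
};

/* Context area aligned only as strictly as an 'unsigned long long',
 * used here as an assumed stand-in for the minimum slab alignment. */
struct desc_slab_aligned {
    void *tfm;
    void *ctx[] __attribute__((aligned(__alignof__(unsigned long long))));
};

/* Bytes of padding between the end of 'tfm' and the start of 'ctx'. */
#define HOLE(type) (offsetof(struct type, ctx) - sizeof(void *))
```

With 8-byte pointers, `HOLE(desc_dma_aligned)` evaluates to 120, matching the figure in the commit message, and `HOLE(desc_slab_aligned)` evaluates to 0.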
     

14 Jan, 2021

16 commits

  • Add the following additional dependencies for CRYPTO_DEV_KEEMBAY_OCS_HCU:

    - HAS_IOMEM to prevent build failures

    - ARCH_KEEMBAY to prevent asking the user about this driver when
      configuring a kernel without Intel Keem Bay platform support.

    Signed-off-by: Daniele Alessandrelli
    Signed-off-by: Herbert Xu

    Daniele Alessandrelli
     
  • The first argument to WARN() is a condition and the second argument
    is the message string, so this WARN() will only display the
    __func__ part of the message.

    Fixes: ae832e329a8d ("crypto: keembay-ocs-hcu - Add HMAC support")
    Signed-off-by: Dan Carpenter
    Acked-by: Daniele Alessandrelli
    Signed-off-by: Herbert Xu

    Dan Carpenter
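
The effect of the misplaced argument can be reproduced with a userspace mock. This is only a sketch: `MOCK_WARN()` and `mock_warn()` are hypothetical and merely copy the argument order of the kernel macro (condition first, format second), capturing the message in a buffer instead of logging it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char warn_buf[128]; /* holds the most recent "warning" text */

/* Hypothetical mock: formats the message only when cond is true. */
static int mock_warn(int cond, const char *fmt, ...)
{
    if (cond) {
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(warn_buf, sizeof(warn_buf), fmt, ap);
        va_end(ap);
    }
    return cond;
}

/* Same argument order as the kernel macro: condition, then format. */
#define MOCK_WARN(cond, ...) mock_warn(!!(cond), __VA_ARGS__)

/* Helper for inspecting what was actually "printed". */
static int last_warning_is(const char *expected)
{
    return strcmp(warn_buf, expected) == 0;
}

static void buggy_report(void)
{
    /* The format string lands in the condition slot (a non-NULL pointer,
     * so always true) and __func__ is used as the format. */
    MOCK_WARN("%s: unexpected state\n", __func__);
}

static void fixed_report(void)
{
    MOCK_WARN(1, "%s: unexpected state\n", __func__);
}
```

After `buggy_report()` the buffer contains only the function name; after `fixed_report()` it contains the full message.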
     
  • The Camellia, Serpent and Twofish related header files only contain
    declarations that are shared between different implementations of the
    respective algorithms residing under arch/x86/crypto, and none of their
    contents should be used elsewhere. So move the header files into the
    same location, and use local #includes instead.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • All dependencies on the x86 glue helper module have been replaced by
    local instantiations of the new ECB/CBC preprocessor helper macros, so
    the glue helper module can be retired.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Replace the glue helper dependency with implementations of ECB and CBC
    based on the new CPP macros, which avoid the need for indirect calls.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Replace the glue helper dependency with implementations of ECB and CBC
    based on the new CPP macros, which avoid the need for indirect calls.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Replace the glue helper dependency with implementations of ECB and CBC
    based on the new CPP macros, which avoid the need for indirect calls.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Replace the glue helper dependency with implementations of ECB and CBC
    based on the new CPP macros, which avoid the need for indirect calls.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Replace the glue helper dependency with implementations of ECB and CBC
    based on the new CPP macros, which avoid the need for indirect calls.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • The x86 glue helper module is starting to show its age:
    - It relies heavily on function pointers to invoke asm helper functions that
      operate on fixed input sizes that are relatively small. This means the
      performance is severely impacted by retpolines.
    - It goes to great lengths to amortize the cost of kernel_fpu_begin()/end()
      over as much work as possible, which is no longer necessary now that FPU
      save/restore is done lazily, and doing so may cause unbounded scheduling
      blackouts due to the fact that enabling the FPU in kernel mode disables
      preemption.
    - The CBC mode decryption helper makes backward strides through the input, in
      order to avoid a single block size memcpy() between chunks. Consuming the
      input in this manner is highly likely to defeat any hardware prefetchers,
      so it is better to go through the data linearly, and perform the extra
      memcpy() where needed (which is turned into direct loads and stores by the
      compiler anyway). Note that benchmarks won't show this effect, given that
      the memory they use is always cache hot.
    - It implements blockwise XOR in terms of le128 pointers, which imply an
      alignment that is not guaranteed by the API, violating the C standard.

    GCC does not seem to be smart enough to elide the indirect calls when the
    function pointers are passed as arguments to static inline helper routines
    modeled after the existing ones. So instead, let's create some CPP macros
    that encapsulate the core of the ECB and CBC processing, so we can wire
    them up for existing users of the glue helper module, i.e., Camellia,
    Serpent, Twofish and CAST6.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
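
To illustrate the difference between the function-pointer approach and the macro approach, here is a minimal userspace sketch. It is not the macro set added by the commit: `TOY_ECB_WALK()`, the toy block functions and the 16-byte block size are all hypothetical, and the "cipher" just adds or subtracts 1 per byte. The point is only that the macro pastes the helper's name into the loop, so each call site names its target directly.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical stand-ins for the per-block asm helpers. */
static void toy_enc_blk(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < TOY_BLOCK_SIZE; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

static void toy_dec_blk(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < TOY_BLOCK_SIZE; i++)
        dst[i] = (uint8_t)(src[i] - 1);
}

/* Function-pointer shape: each block is reached through 'fn', which is an
 * indirect call unless the compiler manages to see through the pointer. */
static void ecb_indirect(void (*fn)(uint8_t *, const uint8_t *),
                         uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    for (; nblocks > 0; nblocks--) {
        fn(dst, src);
        dst += TOY_BLOCK_SIZE;
        src += TOY_BLOCK_SIZE;
    }
}

/* Macro shape: 'func' is substituted textually, so the loop body contains
 * a direct call to the named helper. */
#define TOY_ECB_WALK(func, dst, src, nblocks) do {          \
    uint8_t *d_ = (dst);                                    \
    const uint8_t *s_ = (src);                              \
    for (size_t n_ = (nblocks); n_ > 0; n_--) {             \
        func(d_, s_);                                       \
        d_ += TOY_BLOCK_SIZE;                               \
        s_ += TOY_BLOCK_SIZE;                               \
    }                                                       \
} while (0)

static void toy_ecb_encrypt(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    TOY_ECB_WALK(toy_enc_blk, dst, src, nblocks);
}

static void toy_ecb_decrypt(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    TOY_ECB_WALK(toy_dec_blk, dst, src, nblocks);
}
```

Both shapes produce the same output; the difference lies in how the per-block helper is reached, which is what matters when indirect calls are expensive.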
     
  • Blowfish in counter mode is never used in the kernel, so there
    is no point in keeping an accelerated implementation around.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • DES or Triple DES in counter mode is never used in the kernel, so there
    is no point in keeping an accelerated implementation around.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • The glue helper's CTR routines are no longer used, so drop them.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Twofish in CTR mode is never used by the kernel directly, and is highly
    unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
    the accelerated CTR mode implementation, and instead, rely on the CTR
    template and the bare cipher.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • CAST6 in CTR mode is never used by the kernel directly, and is highly
    unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
    the accelerated CTR mode implementation, and instead, rely on the CTR
    template and the bare cipher.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • CAST5 in CTR mode is never used by the kernel directly, and is highly
    unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
    the accelerated CTR mode implementation, and instead, rely on the CTR
    template and the bare cipher.

    Acked-by: Eric Biggers
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel