27 Nov, 2020

4 commits

  • Wiring the SIMD code into the generic driver has the unfortunate side
    effect that the tcrypt testing code cannot distinguish them, and will
    therefore not use the latter to fuzz test the former, as it does for
    other algorithms.

    So let's refactor the code a bit so we can register two implementations:
    aegis128-generic and aegis128-simd.

    Signed-off-by: Ard Biesheuvel
    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Instead of calculating the tag and returning it to the caller on
    decryption, use a SIMD compare and min across vector to perform
    the comparison. This is slightly more efficient, and removes the
    need on the caller's part to wipe the tag from memory if the
    decryption failed.

    While at it, switch to unsigned int when passing cryptlen and
    assoclen - we don't support input sizes where it matters anyway.

    Signed-off-by: Ard Biesheuvel
    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Avoid copying the tail block via a stack buffer if the total size
    exceeds a single AEGIS block. In this case, we can use overlapping
    loads and stores and NEON permutation instructions instead, which
    leads to a modest performance improvement on some cores (< 5%),
    and is slightly cleaner. Note that we still need to use a stack
    buffer if the entire input is smaller than 16 bytes, given that
    we cannot use 16 byte NEON loads and stores safely in this case.

    Signed-off-by: Ard Biesheuvel
    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • The AEGIS spec mentions explicitly that the security guarantees hold
    only if the resulting plaintext and tag of a failed decryption are
    withheld. So ensure that we abide by this.

    While at it, drop the unused struct aead_request *req parameter from
    crypto_aegis128_process_crypt().

    Reviewed-by: Ondrej Mosnacek
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
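
The tag check described in the second commit of this group (a bytewise compare followed by a min across the vector) can be illustrated with a scalar C sketch. This is not the NEON code from the patch: `tag_matches()` is a hypothetical helper written for illustration only. Each lane becomes 0xFF when the two bytes are equal and 0x00 otherwise, so the minimum over all lanes is 0xFF only if every byte matched.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar emulation of "vector compare + min across vector".
 * Hypothetical helper, not a kernel API.
 * Returns 1 if the tags match and 0 otherwise. The loop contains no
 * branch that depends on the tag contents.
 */
int tag_matches(const uint8_t *computed, const uint8_t *received, size_t len)
{
	uint8_t min = 0xFF;	/* identity element for the running minimum */
	size_t i;

	for (i = 0; i < len; i++) {
		/* Nonzero if and only if the two bytes differ. */
		uint8_t diff = computed[i] ^ received[i];
		/* Lane compare: 0xFF if equal, 0x00 if not. */
		uint8_t eq = (uint8_t)(((unsigned int)diff - 1u) >> 8);
		/* Lanes are only 0x00 or 0xFF, so AND equals the minimum. */
		min &= eq;
	}
	return min == 0xFF;
}
```

Because the caller only ever sees the match result, the computed tag never has to be handed back on a failed decryption, which is the property the commit relies on.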
     

20 Nov, 2020

13 commits

  • This patch fixes the following smatch warnings:
    drivers/crypto/allwinner/sun8i-ce/sun8i-ce-hash.c:412
    sun8i_ce_hash_run() warn: possible memory leak of 'result'
    Note: "buf" is leaked as well.

    Furthermore, in case of ENOMEM, crypto_finalize_hash_request() was not
    called, which was an error.

    Fixes: 56f6d5aee88d ("crypto: sun8i-ce - support hash algorithms")
    Reported-by: kernel test robot
    Reported-by: Dan Carpenter
    Signed-off-by: Corentin Labbe
    Signed-off-by: Herbert Xu

    Corentin Labbe
     
  • There are a couple of spelling mistakes in two crypto Kconfig files.
    Fix these.

    Signed-off-by: Colin Ian King
    Signed-off-by: Herbert Xu

    Colin Ian King
     
  • Add support for QAT 4xxx devices.

    Signed-off-by: Giovanni Cabiddu
    Reviewed-by: Fiona Trahe
    Signed-off-by: Herbert Xu

    Giovanni Cabiddu
     
  • Add a hook to initialize the vector routing table with the default
    values before MSIx is enabled.
    The new function set_msix_rttable() is called only if present in the
    struct adf_hw_device_data of the device. This is to allow for QAT
    devices that do not support that functionality.

    Signed-off-by: Giovanni Cabiddu
    Reviewed-by: Fiona Trahe
    Signed-off-by: Herbert Xu

    Giovanni Cabiddu
     
  • Introduce support for devices that require multiple firmware images.
    If a device requires more than one firmware image to operate, load each
    image to the appropriate Acceleration Engine (AE).

    Signed-off-by: Giovanni Cabiddu
    Reviewed-by: Fiona Trahe
    Signed-off-by: Herbert Xu

    Giovanni Cabiddu
     
  • pm_runtime_enable() changes the device's power disable depth.
    A matching operation that undoes this change is therefore needed on
    the error handling path to keep the depth balanced.

    Fixes: f7b2b5dd6a62a ("crypto: omap-aes - add error check for pm_runtime_get_sync")
    Signed-off-by: Zhang Qilong
    Signed-off-by: Herbert Xu

    Zhang Qilong
     
  • The patch 'irqchip/gic-v3-its: Balance initial LPI affinity across CPUs'
    binds the IRQ to an uncertain CPU. If an IRQ is bound to the CPU used by
    the thread that is sending requests, the throughput is halved.

    So allocate a 'work_queue' and set it as 'WQ_UNBOUND' so that the back
    half work can run on a different CPU.

    Signed-off-by: Yang Shen
    Reviewed-by: Zaibo Xu
    Reviewed-by: Zhou Wang
    Signed-off-by: Herbert Xu

    Yang Shen
     
  • This patch moves the curve25519_selftest into curve25519.h so
    we don't get a warning from gcc complaining about a missing
    prototype.

    Reported-by: kernel test robot
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
    and <crypto/sha3.h> contains declarations for SHA-3.

    This organization is inconsistent, but more importantly SHA-1 is no
    longer considered to be cryptographically secure. So to the extent
    possible, SHA-1 shouldn't be grouped together with any of the other SHA
    versions, and usage of it should be phased out.

    Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
    <crypto/sha2.h>, and make everyone explicitly specify whether they want
    the declarations for SHA-1, SHA-2, or both.

    This avoids making the SHA-1 declarations visible to files that don't
    want anything to do with SHA-1. It also prepares for potentially moving
    sha1.h into a new insecure/ or dangerous/ directory.

    Signed-off-by: Eric Biggers
    Acked-by: Ard Biesheuvel
    Acked-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Clang warns:

    drivers/crypto/amcc/crypto4xx_core.c:921:60: warning: operator '?:' has
    lower precedence than '|'; '|' will be evaluated first
    [-Wbitwise-conditional-parentheses]
    (crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
    drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
    around the '|' expression to silence this warning
    (crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
    ^
    )
    drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
    around the '?:' expression to evaluate it first
    (crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
    ^
    (
    1 warning generated.

    It looks like this should have been a logical OR so that
    PD_CTL_HASH_FINAL gets added to the w bitmask if crypto_tfm_alg_type
    is either CRYPTO_ALG_TYPE_AHASH or CRYPTO_ALG_TYPE_AEAD. Change the
    operator so that everything works properly.

    Fixes: 4b5b79998af6 ("crypto: crypto4xx - fix stalls under heavy load")
    Link: https://github.com/ClangBuiltLinux/linux/issues/1198
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Christian Lamparter
    Signed-off-by: Herbert Xu

    Nathan Chancellor
     
  • Wang Qing reports that IS_ERR_OR_NULL() should be matched with
    PTR_ERR_OR_ZERO(), not PTR_ERR().

    As it turns out, the error path always returns an error code,
    i.e. NULL is never returned.
    Update the code accordingly - s/IS_ERR_OR_NULL/IS_ERR.

    Reported-by: Wang Qing
    Signed-off-by: Horia Geantă
    Signed-off-by: Herbert Xu

    Horia Geantă
     
  • Instead of copying the calculated authentication tag to memory and
    calling crypto_memneq() to verify it, use vector bytewise compare and
    min across vector instructions to decide whether the tag is valid. This
    is more efficient, and given that the tag is only transiently held in a
    NEON register, it is also safer, given that calculated tags for failed
    decryptions should be withheld.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Fix an error in the AEAD authentication key setting process. If the soft
    shash function is used, the driver needs to use the digest size in place
    of the user input key length.

    Signed-off-by: Kai Ye
    Signed-off-by: Herbert Xu

    Kai Ye
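
The precedence issue behind the Clang warning in this group can be shown in isolation. In the sketch below, `alg_type`, `CTL_HOST_READY` and `CTL_HASH_FINAL` are hypothetical stand-ins, not the crypto4xx definitions. Because `|` binds more tightly than `?:`, everything to the left of the `?` that is joined by `|` becomes the condition. The explicit parentheses show how each form is grouped.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types and flags. */
enum alg_type { ALG_CIPHER, ALG_AHASH, ALG_AEAD };
#define CTL_HOST_READY 0x1u
#define CTL_HASH_FINAL 0x4u

/* Bitwise OR of two comparisons: works, since both operands are 0 or 1. */
uint32_t ctl_bitwise(enum alg_type t)
{
	return CTL_HOST_READY |
	       (((t == ALG_AHASH) | (t == ALG_AEAD)) ? CTL_HASH_FINAL : 0u);
}

/* Logical OR states the intent directly and does not trigger the warning. */
uint32_t ctl_logical(enum alg_type t)
{
	return CTL_HOST_READY |
	       ((t == ALG_AHASH || t == ALG_AEAD) ? CTL_HASH_FINAL : 0u);
}

/*
 * Grouping that results when the parentheses around the conditional are
 * left out: the nonzero flag becomes part of the condition, so the
 * condition is always true and CTL_HOST_READY is lost from the result.
 */
uint32_t ctl_missing_parens(enum alg_type t)
{
	return (CTL_HOST_READY | (t == ALG_AEAD)) ? CTL_HASH_FINAL : 0u;
}
```

With operands that are plain comparisons, the bitwise and logical forms return the same value. The logical form is still preferable because it matches the intent described in the commit message.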
     

13 Nov, 2020

23 commits

  • Based on lessons learnt from optimizing the 32-bit version of this driver,
    we can simplify the arm64 version considerably, by reordering the final
    two stores when the last block is not a multiple of 64 bytes. This removes
    the need to use permutation instructions to calculate the elements that are
    clobbered by the final overlapping store, given that the store of the
    penultimate block now follows it, and that one carries the correct values
    for those elements already.

    While at it, simplify the overlapping loads as well, by calculating the
    address of the final overlapping load upfront, and switching to this
    address for every load that would otherwise extend past the end of the
    source buffer.

    There is no impact on performance, but the resulting code is substantially
    smaller and easier to follow.

    Cc: Eric Biggers
    Cc: "Jason A . Donenfeld"
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • Add support for the QAT gen4 devices in the firmware loader.

    Signed-off-by: Jack Xu
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add support for broadcasting mode in firmware loader to enable the next
    generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add support for shared ustore mode. This is required by the next
    generation of QAT devices to share the same fw image across engines.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Introduce a new API, qat_uclo_set_cfg_ae_mask(), to allow the load of the
    firmware image to a subset of Acceleration Engines (AEs). This is
    required by the next generation of QAT devices to be able to load
    different firmware images to the device.

    Signed-off-by: Jack Xu
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add firmware control unit (FCU) CSRs to chip info so the firmware
    authentication code is common between all devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add support for CSS3K, which uses RSA3K as image signature algorithm,
    to support the next generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Use ae_mask to decide which Acceleration Engine (AE) to target in AE
    related operations, instead of a sequential loop, to skip AEs that are
    fused out.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add null pointer check when freeing the memory for firmware.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add misc control CSR to chip info since the CSR offset will be different
    in the next generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add the wake up event to chip info since this value will be different
    in the next generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add global clock enable CSR to the chip info since the CSR offset
    will be different in the next generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add reset CSR offset and mask to chip info since they are different
    in new QAT devices. This also simplifies the reset/clrReset functions
    by using the reset mask.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add the local memory size to the chip info since the size of this memory
    will be different in the next generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Add support for local memory lm2 and lm3, which are introduced in the next
    generation of QAT devices.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Introduce the next neighbor (NN) capability in chip_info as NN registers
    are not supported in certain SKUs of QAT.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Modify condition in qat_uclo_wr_mimage() to use a capability of the
    device (sram_visible), rather than the device ID, so the check is not
    specific to devices of the same type.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Introduce the chip info structure which contains device specific
    information. The initialization path has been split between common and
    hardware specific in order to facilitate the introduction of the next
    generation hardware.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Replace long expressions with local variables in the functions
    qat_uclo_wr_uimage_page(), qat_uclo_init_globals() and
    qat_uclo_init_umem_seg() to improve readability.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Refactor qat_uclo_set_ae_mode() by moving the logic that sets the AE
    modes to a separate function, qat_hal_set_modes().

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Move the definition of ICP_QAT_AE_OFFSET, ICP_QAT_CAP_OFFSET,
    LOCAL_TO_XFER_REG_OFFSET and ICP_QAT_EP_OFFSET from qat_hal.c to
    icp_qat_hal.h to avoid the definition of generation specific constants
    in qat_hal.c.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Include the offset of GLOBAL_CSR directly into the enum hal_global_csr
    and remove the macros SET_GLB_CSR/GET_GLB_CSR to simplify the global CSR
    access.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
     
  • Change the API and the behaviour of the qat_hal_start() function.
    With this change, the function starts all acceleration engines (AEs)
    under the hood, and there is no longer any need to call it for each engine.

    Signed-off-by: Jack Xu
    Co-developed-by: Wojciech Ziemba
    Signed-off-by: Wojciech Ziemba
    Reviewed-by: Giovanni Cabiddu
    Signed-off-by: Herbert Xu

    Jack Xu
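
The ae_mask commit in this group replaces a sequential loop over engine numbers with a walk over the set bits of a mask. A minimal user-space sketch of that idea follows. `enabled_aes()` and the 32-bit mask width are assumptions made for illustration; kernel code typically relies on bit-iteration helpers such as for_each_set_bit() instead of an open-coded loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the indices of the engines whose bit is set in ae_mask.
 * Hypothetical helper: a cleared bit stands for an engine that is
 * fused out. 'out' must have room for max_ae entries.
 * Returns the number of indices stored.
 */
size_t enabled_aes(uint32_t ae_mask, unsigned int max_ae, unsigned int *out)
{
	size_t n = 0;
	unsigned int ae;

	for (ae = 0; ae < max_ae && ae < 32; ae++) {
		if (!(ae_mask & (UINT32_C(1) << ae)))
			continue;	/* engine not present: skip it */
		out[n++] = ae;
	}
	return n;
}
```

For example, a mask of 0x2D with max_ae set to 8 yields engines 0, 2, 3 and 5, so per-engine operations never touch engines 1 or 4.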