13 Feb, 2019

1 commit

  • [ Upstream commit 0a6a40c2a8c184a2fb467efacfb1cd338d719e0b ]

    In the "aes-fixed-time" AES implementation, disable interrupts while
    accessing the S-box, in order to make cache-timing attacks more
    difficult. Previously it was possible for the CPU to be interrupted
    while the S-box was loaded into L1 cache, potentially evicting the
    cachelines and causing later table lookups to be time-variant.

    In tests I did on x86 and ARM, this doesn't affect performance
    significantly. Responsiveness is potentially a concern, but interrupts
    are only disabled for a single AES block.

    Note that even after this change, the implementation still isn't
    necessarily guaranteed to be constant-time; see
    https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
    of the many difficulties involved in writing truly constant-time AES
    software. But it's valuable to make such attacks more difficult.

    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Eric Biggers
     

19 Jun, 2017

1 commit


11 Feb, 2017

1 commit

  • Lookup table based AES is sensitive to timing attacks, which is due to
    the fact that such table lookups are data dependent, and the fact that
    8 KB worth of tables covers a significant number of cachelines on any
    architecture, resulting in an exploitable correlation between the key
    and the processing time for known plaintexts.

    For network facing algorithms such as CTR, CCM or GCM, this presents a
    security risk, which is why arch specific AES ports are typically time
    invariant, either through the use of special instructions, or by using
    SIMD algorithms that don't rely on table lookups.

    For generic code, this is difficult to achieve without losing too much
    performance, but we can improve the situation significantly by switching
    to an implementation that only needs 256 bytes of table data (the actual
    S-box itself), which can be prefetched at the start of each block to
    eliminate data dependent latencies.

    This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
    ordinary generic AES driver manages 18 cycles per byte on this
    hardware). Decryption is substantially slower.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel