23 Jan, 2017

1 commit

  • A lot of asm-optimized routines in arch/x86/crypto/ keep their
    constants in .data. This is wrong; they should be in .rodata.

    Many of these constants are the same in different modules.
    For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
    exists in at least half a dozen places.

    There is a way to let the linker merge them and keep just one copy.
    The rule is that mergeable objects of different sizes must not share
    a section: if you put them all into one .rodata section, they lose
    their "mergeability".

    GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
    or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used.
    This patch does the same:

    .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

    It is important that all data in such a section consists of 16-byte
    elements, not larger ones, and that there is no implicit use of one
    element from another (e.g. code relying on two constants being
    adjacent in memory).

    When this is not the case, use non-mergeable section:

    .section .rodata[.VAR_NAME], "a", @progbits
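
    As a concrete illustration, a 16-byte constant declared under the
    mergeable scheme looks roughly like the sketch below (SHUF_MASK and
    its value come from the examples above; the exact layout in any
    given module may differ):

        /*
         * Sketch: one 16-byte constant in its own mergeable section.
         * "aM" marks the section allocatable and mergeable; the
         * trailing 16 is the element size the linker deduplicates on.
         */
        .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
        .align 16
        SHUF_MASK:
            .octa 0x000102030405060708090A0B0C0D0E0F

    When several object files emit byte-identical 16-byte entries this
    way, the linker keeps a single copy; that is why POLY and TWOONE
    each appear twice at the same address in the System.map excerpt
    below.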

    This reduces .data by ~15 kbytes:

        text    data     bss      dec    hex filename
    11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o
    11112095 2690672 2630712 16433479 fac147 vmlinux.o

    Merged objects are visible in System.map:

    ffffffff81a28810 r POLY
    ffffffff81a28810 r POLY
    ffffffff81a28820 r TWOONE
    ffffffff81a28820 r TWOONE
    ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK

    CC: Herbert Xu
    CC: Josh Poimboeuf
    CC: Xiaodong Liu
    CC: Megha Dey
    CC: linux-crypto@vger.kernel.org
    CC: x86@kernel.org
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Denys Vlasenko
     

24 Feb, 2016

1 commit

  • The crypto code has several callable non-leaf functions which don't
    honor CONFIG_FRAME_POINTER, which can result in bad stack traces.

    Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
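
    The fix uses the FRAME_BEGIN/FRAME_END macros from <asm/frame.h>,
    which expand to frame-pointer setup and teardown when
    CONFIG_FRAME_POINTER is enabled and to nothing otherwise. A minimal
    sketch of the pattern (my_crypto_helper and some_other_func are
    made-up names for illustration, not ones of the patched routines):

        #include <asm/frame.h>
        #include <linux/linkage.h>

        /*
         * Sketch: a callable non-leaf asm function gets a stack frame
         * so frame-pointer stack walks can traverse it.
         */
        ENTRY(my_crypto_helper)
            FRAME_BEGIN          /* frame setup if enabled, else no-op */
            call some_other_func /* non-leaf: this call is why a frame is needed */
            FRAME_END            /* matching teardown */
            ret
        ENDPROC(my_crypto_helper)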

    Signed-off-by: Josh Poimboeuf
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bernd Petrovitsch
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Chris J Arges
    Cc: David S. Miller
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Herbert Xu
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Namhyung Kim
    Cc: Pedro Alves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

01 Apr, 2014

1 commit

  • The GHASH setkey() function uses SSE registers but fails to call
    kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
    then having to deal with the restriction that they cannot be called from
    interrupt context, move the setkey() implementation to the C domain.

    Note that setkey() does not use any particular SSE features and is not
    expected to become a performance bottleneck.
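
    For reference, a C-domain setkey() only has to multiply the hash key
    by 'x' in GF(2^128), which is a pair of 64-bit shifts and an XOR. A
    sketch along the lines of the ghash-clmulni glue code (the context
    struct and details are illustrative, not a verbatim copy of the
    patch):

        #include <crypto/b128ops.h>
        #include <crypto/internal/hash.h>

        #define GHASH_BLOCK_SIZE 16

        struct ghash_ctx {
            be128 shash;    /* pre-processed hash key, consumed by the asm */
        };

        static int ghash_setkey(struct crypto_shash *tfm,
                                const u8 *key, unsigned int keylen)
        {
            struct ghash_ctx *ctx = crypto_shash_ctx(tfm);
            be128 *x = (be128 *)key;
            u64 a, b;

            if (keylen != GHASH_BLOCK_SIZE)
                return -EINVAL;

            /* Multiply H by 'x' in GF(2^128): 64-bit shifts and an
             * XOR only - no SSE registers, no kernel_fpu_begin(). */
            a = be64_to_cpu(x->a);
            b = be64_to_cpu(x->b);

            ctx->shash.a = (__be64)((b << 1) | (a >> 63));
            ctx->shash.b = (__be64)((a << 1) | (b >> 63));

            if (a >> 63)
                ctx->shash.b ^= ((__be64)0xc2) << 56;

            return 0;
        }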

    Signed-off-by: Ard Biesheuvel
    Acked-by: H. Peter Anvin
    Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation)
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

19 Oct, 2009

1 commit

  • PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
    carry-less multiplication. More information about PCLMULQDQ can be
    found at:

    http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

    Because PCLMULQDQ changes the XMM state, its usage must be enclosed
    by kernel_fpu_begin()/kernel_fpu_end(), which can be used only in
    process context. The acceleration is therefore implemented as a
    crypto_ahash: requests arriving in soft IRQ context are deferred to
    the cryptd kernel thread.
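
    The deferral boils down to one test in each ahash callback: if the
    FPU is usable in the current context, run the PCLMULQDQ shash
    directly; otherwise re-queue the request on the cryptd-backed tfm.
    A sketch of the update path, following the shape of the
    ghash-clmulni glue code (the context struct and header locations
    reflect current kernels and are illustrative):

        #include <crypto/cryptd.h>
        #include <crypto/internal/hash.h>
        #include <linux/string.h>
        #include <asm/fpu/api.h>    /* irq_fpu_usable() */

        struct ghash_async_ctx {
            struct cryptd_ahash *cryptd_tfm;    /* cryptd-backed fallback tfm */
        };

        static int ghash_async_update(struct ahash_request *req)
        {
            struct ahash_request *cryptd_req = ahash_request_ctx(req);
            struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
            struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);

            if (!irq_fpu_usable()) {
                /* Soft IRQ: XMM state may not be touched here, so
                 * defer the request to the cryptd kernel thread. */
                memcpy(cryptd_req, req, sizeof(*req));
                ahash_request_set_tfm(cryptd_req, &ctx->cryptd_tfm->base);
                return crypto_ahash_update(cryptd_req);
            } else {
                /* Process context: run the PCLMULQDQ shash directly. */
                struct shash_desc *desc = cryptd_shash_desc(cryptd_req);
                return shash_ahash_update(req, desc);
            }
        }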

    Signed-off-by: Huang Ying
    Signed-off-by: Herbert Xu

    Huang Ying