18 Oct, 2019

2 commits

  • These are all functions which are invoked from elsewhere, so annotate
    them as global using the new SYM_FUNC_START and convert their ENDPROCs
    to SYM_FUNC_END.

    Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the
    last users.
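
    A minimal sketch of the conversion (function name hypothetical):

    ```asm
    /* Before: */
    ENTRY(some_cipher_func)
    	...
    	ret
    ENDPROC(some_cipher_func)

    /* After: SYM_FUNC_START marks a global function, and SYM_FUNC_END
       balances it for tools that generate debuginfo: */
    SYM_FUNC_START(some_cipher_func)
    	...
    	ret
    SYM_FUNC_END(some_cipher_func)
    ```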

    Signed-off-by: Jiri Slaby
    Signed-off-by: Borislav Petkov
    Reviewed-by: Rafael J. Wysocki [hibernate]
    Reviewed-by: Boris Ostrovsky [xen bits]
    Acked-by: Herbert Xu [crypto]
    Cc: Allison Randal
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Andy Shevchenko
    Cc: Ard Biesheuvel
    Cc: Armijn Hemel
    Cc: Cao jin
    Cc: Darren Hart
    Cc: Dave Hansen
    Cc: "David S. Miller"
    Cc: Enrico Weigelt
    Cc: Greg Kroah-Hartman
    Cc: Herbert Xu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jim Mattson
    Cc: Joerg Roedel
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kate Stewart
    Cc: "Kirill A. Shutemov"
    Cc: kvm ML
    Cc: Len Brown
    Cc: linux-arch@vger.kernel.org
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: Mark Rutland
    Cc: Matt Fleming
    Cc: Paolo Bonzini
    Cc: Pavel Machek
    Cc: Peter Zijlstra
    Cc: platform-driver-x86@vger.kernel.org
    Cc: "Radim Krčmář"
    Cc: Sean Christopherson
    Cc: Stefano Stabellini
    Cc: "Steven Rostedt (VMware)"
    Cc: Thomas Gleixner
    Cc: Vitaly Kuznetsov
    Cc: Wanpeng Li
    Cc: Wei Huang
    Cc: x86-ml
    Cc: xen-devel@lists.xenproject.org
    Cc: Xiaoyao Li
    Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz

    Jiri Slaby
     
  • Use the newly added SYM_FUNC_START_LOCAL to annotate beginnings of all
    functions which do not have ".globl" annotation, but their endings are
    annotated by ENDPROC. This is needed to balance ENDPROC for tools that
    generate debuginfo.

    These function names are not prefixed with ".L", as they might appear
    in call traces and would no longer be visible after such a change.

    To be symmetric, the functions' ENDPROCs are converted to the new
    SYM_FUNC_END.
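
    A minimal sketch (helper name hypothetical):

    ```asm
    /* Before: a non-.globl helper whose ending is annotated but whose
       beginning is not: */
    __enc_round:
    	...
    	ret
    ENDPROC(__enc_round)

    /* After: the start annotation balances the end for debuginfo tools,
       without an ".L" prefix so the name stays visible in call traces: */
    SYM_FUNC_START_LOCAL(__enc_round)
    	...
    	ret
    SYM_FUNC_END(__enc_round)
    ```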

    Signed-off-by: Jiri Slaby
    Signed-off-by: Borislav Petkov
    Cc: "David S. Miller"
    Cc: Herbert Xu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: linux-arch@vger.kernel.org
    Cc: linux-crypto@vger.kernel.org
    Cc: Thomas Gleixner
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20191011115108.12392-7-jslaby@suse.cz

    Jiri Slaby
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details you
    should have received a copy of the gnu general public license along
    with this program if not write to the free software foundation inc
    59 temple place suite 330 boston ma 02111 1307 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1334 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Sep, 2017

1 commit

  • Using RBP as a temporary register breaks the frame pointer convention
    and breaks stack traces when unwinding from an interrupt in the
    crypto code.

    Use R13 instead of RBP. Both are callee-saved registers, so the
    substitution is straightforward.
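
    A minimal sketch of the substitution (surrounding code hypothetical):

    ```asm
    pushq	%r13		/* r13 is callee-saved, so preserve it */
    /* ... use %r13 wherever %rbp was previously used as a temporary,
       leaving %rbp holding the frame pointer for the unwinder ... */
    popq	%r13
    ```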

    Reported-by: Eric Biggers
    Reported-by: Peter Zijlstra
    Tested-by: Eric Biggers
    Acked-by: Eric Biggers
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Herbert Xu

    Josh Poimboeuf
     

23 Jan, 2017

1 commit

  • A lot of asm-optimized routines in arch/x86/crypto/ keep their
    constants in .data. This is wrong; they should be in .rodata.

    Many of these constants are the same in different modules. For
    example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
    exists in at least half a dozen places.

    There is a way to let the linker merge them and use just one copy.
    The rules are as follows: mergeable objects of different sizes must
    not share sections. You can't put them all in one .rodata section,
    or they will lose their "mergeability".

    GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
    or ".rodata.cstSIZE." if -fdata-sections is used.
    This patch does the same:

    .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

    It is important that all data in such a section consists of 16-byte
    elements, not larger ones, and that there is no implicit use of one
    element from another.

    When this is not the case, use non-mergeable section:

    .section .rodata[.VAR_NAME], "a", @progbits
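
    For example, the 16-byte shuffle mask mentioned above can be defined
    in a mergeable section like this:

    ```asm
    .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
    .align 16
    SHUF_MASK:
    	.octa 0x000102030405060708090A0B0C0D0E0F
    ```

    The "aM" flags mark the section allocatable and mergeable, and the
    trailing 16 is the entity size the linker uses when deduplicating.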

    This reduces .data by ~15 kbytes:

        text    data     bss      dec    hex filename
    11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o
    11112095 2690672 2630712 16433479 fac147 vmlinux.o

    Merged objects are visible in System.map:

    ffffffff81a28810 r POLY
    ffffffff81a28810 r POLY
    ffffffff81a28820 r TWOONE
    ffffffff81a28820 r TWOONE
    ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK

    CC: Herbert Xu
    CC: Josh Poimboeuf
    CC: Xiaodong Liu
    CC: Megha Dey
    CC: linux-crypto@vger.kernel.org
    CC: x86@kernel.org
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Denys Vlasenko
     

24 Feb, 2016

1 commit

  • The crypto code has several callable non-leaf functions which don't
    honor CONFIG_FRAME_POINTER, which can result in bad stack traces.

    Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
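
    A minimal sketch using the FRAME_BEGIN/FRAME_END macros from
    <asm/frame.h> (function name hypothetical):

    ```asm
    #include <asm/frame.h>

    ENTRY(some_cipher_func)
    	FRAME_BEGIN		/* push %rbp; mov %rsp, %rbp when
    				   CONFIG_FRAME_POINTER=y, else empty */
    	...
    	FRAME_END		/* pop %rbp, or empty */
    	ret
    ENDPROC(some_cipher_func)
    ```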

    Signed-off-by: Josh Poimboeuf
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bernd Petrovitsch
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Chris J Arges
    Cc: David S. Miller
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Herbert Xu
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Namhyung Kim
    Cc: Pedro Alves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

25 Apr, 2013

1 commit

  • Change twofish-avx to use the new XTS code, for smaller stack usage
    and a small boost to performance.

    tcrypt results, with Intel i5-2450M:
               enc     dec
    16B       1.03x   1.02x
    64B       0.91x   0.91x
    256B      1.10x   1.09x
    1024B     1.12x   1.11x
    8192B     1.12x   1.11x

    Since XTS is practically always used with data blocks of 512 bytes or
    more, I chose not to use twofish-3way for block sizes smaller than
    128 bytes. This causes slower results in tcrypt for 64-byte blocks.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

20 Jan, 2013

1 commit


24 Oct, 2012

1 commit

  • Introduce new assembler functions to avoid the use of temporary stack
    buffers in the glue code. This also allows the use of vector
    instructions for xoring output in CTR and CBC modes and for
    constructing IVs for CTR mode.
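
    A hypothetical sketch of the idea for CBC decryption (register
    assignments and buffer layout invented for illustration): with blocks
    decrypted in xmm registers, the chaining xor happens in vector
    registers instead of going through a temporary stack buffer:

    ```asm
    /* %rdx points at the IV immediately followed by the ciphertext */
    vpxor	(%rdx), %xmm0, %xmm0		/* P0 = D(C0) ^ IV */
    vpxor	0x10(%rdx), %xmm1, %xmm1	/* P1 = D(C1) ^ C0 */
    ```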

    ECB mode sees a ~0.2% decrease in speed because one extra function
    call was added. CBC mode decryption and CTR mode benefit from the
    vector operations and gain ~3%.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

07 Sep, 2012

1 commit

  • This patch replaces 'movb' instructions with 'movzbl' to break false
    register dependencies and interleaves instructions better for
    out-of-order scheduling.
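
    The difference, sketched (addressing mode hypothetical):

    ```asm
    /* movb writes only the low byte, so the full register keeps a false
       dependency on its previous contents: */
    movb	(%rsi,%rax), %cl

    /* movzbl zero-extends the byte into the whole 32-bit register,
       breaking that dependency for the out-of-order scheduler: */
    movzbl	(%rsi,%rax), %ecx
    ```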

    Tested on Intel Core i5-2450M and AMD FX-8100.

    tcrypt ECB results:

    Intel Core i5-2450M:

    size     old-vs-new      new-vs-3way     old-vs-3way
              enc    dec      enc    dec      enc    dec
    256      1.12x  1.13x    1.36x  1.37x    1.21x  1.22x
    1k       1.14x  1.14x    1.48x  1.49x    1.29x  1.31x
    8k       1.14x  1.14x    1.50x  1.52x    1.32x  1.33x

    AMD FX-8100:

    size     old-vs-new      new-vs-3way     old-vs-3way
              enc    dec      enc    dec      enc    dec
    256      1.10x  1.11x    1.01x  1.01x    0.92x  0.91x
    1k       1.11x  1.12x    1.08x  1.07x    0.97x  0.96x
    8k       1.11x  1.13x    1.10x  1.08x    0.99x  0.97x

    [v2]
    - Do the instruction interleaving another way, to avoid adding new
      FPU<=>CPU register moves, as these cause a performance drop on
      Bulldozer.
    - Further interleaving improvements for better out-of-order
      scheduling.

    Tested-by: Borislav Petkov
    Cc: Johannes Goetzfried
    Signed-off-by: Jussi Kivilinna
    Signed-off-by: Herbert Xu

    Jussi Kivilinna
     

11 Jul, 2012

1 commit


12 Jun, 2012

1 commit

  • This patch adds an x86_64/AVX assembler implementation of the Twofish
    block cipher. The implementation processes eight blocks in parallel
    (two 4-block chunk AVX operations). The table lookups are done in
    general-purpose registers. For small block sizes the 3-way parallel
    functions from the twofish-x86_64-3way module are called. A good
    performance increase is provided for block sizes greater than or
    equal to 128 bytes.

    Patch has been tested with tcrypt and automated filesystem tests.

    Tcrypt benchmark results:

    Intel Core i5-2500 CPU (fam:6, model:42, step:7)

    twofish-avx-x86_64 vs. twofish-x86_64-3way
    128bit key: (lrw:256bit) (xts:256bit)
    size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B      0.96x   0.97x   1.00x   0.95x   0.97x   0.97x   0.96x   0.95x   0.95x   0.98x
    64B      0.99x   0.99x   1.00x   0.99x   0.98x   0.98x   0.99x   0.98x   0.99x   0.98x
    256B     1.20x   1.21x   1.00x   1.19x   1.15x   1.14x   1.19x   1.20x   1.18x   1.19x
    1024B    1.29x   1.30x   1.00x   1.28x   1.23x   1.24x   1.26x   1.28x   1.26x   1.27x
    8192B    1.31x   1.32x   1.00x   1.31x   1.25x   1.25x   1.28x   1.29x   1.28x   1.30x

    256bit key: (lrw:384bit) (xts:512bit)
    size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
    16B      0.96x   0.96x   1.00x   0.96x   0.97x   0.98x   0.95x   0.95x   0.95x   0.96x
    64B      1.00x   0.99x   1.00x   0.98x   0.98x   1.01x   0.98x   0.98x   0.98x   0.98x
    256B     1.20x   1.21x   1.00x   1.21x   1.15x   1.15x   1.19x   1.20x   1.18x   1.19x
    1024B    1.29x   1.30x   1.00x   1.28x   1.23x   1.23x   1.26x   1.27x   1.26x   1.27x
    8192B    1.31x   1.33x   1.00x   1.31x   1.26x   1.26x   1.29x   1.29x   1.28x   1.30x

    twofish-avx-x86_64 vs aes-asm (8kB block):
    128bit 256bit
    ecb-enc 1.19x 1.63x
    ecb-dec 1.18x 1.62x
    cbc-enc 0.75x 1.03x
    cbc-dec 1.23x 1.67x
    ctr-enc 1.24x 1.65x
    ctr-dec 1.24x 1.65x
    lrw-enc 1.15x 1.53x
    lrw-dec 1.14x 1.52x
    xts-enc 1.16x 1.56x
    xts-dec 1.16x 1.56x

    Signed-off-by: Johannes Goetzfried
    Signed-off-by: Herbert Xu

    Johannes Goetzfried