18 Oct, 2019
2 commits
-
These are all functions which are invoked from elsewhere, so annotate them as global using the new SYM_FUNC_START, and their ENDPROCs with SYM_FUNC_END.

Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the last users.

Signed-off-by: Jiri Slaby
Signed-off-by: Borislav Petkov
Reviewed-by: Rafael J. Wysocki [hibernate]
Reviewed-by: Boris Ostrovsky [xen bits]
Acked-by: Herbert Xu [crypto]
Cc: Allison Randal
Cc: Andrey Ryabinin
Cc: Andy Lutomirski
Cc: Andy Shevchenko
Cc: Ard Biesheuvel
Cc: Armijn Hemel
Cc: Cao jin
Cc: Darren Hart
Cc: Dave Hansen
Cc: "David S. Miller"
Cc: Enrico Weigelt
Cc: Greg Kroah-Hartman
Cc: Herbert Xu
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jim Mattson
Cc: Joerg Roedel
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kate Stewart
Cc: "Kirill A. Shutemov"
Cc: kvm ML
Cc: Len Brown
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland
Cc: Matt Fleming
Cc: Paolo Bonzini
Cc: Pavel Machek
Cc: Peter Zijlstra
Cc: platform-driver-x86@vger.kernel.org
Cc: "Radim Krčmář"
Cc: Sean Christopherson
Cc: Stefano Stabellini
Cc: "Steven Rostedt (VMware)"
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Cc: Wei Huang
Cc: x86-ml
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li
Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
-
Use the newly added SYM_FUNC_START_LOCAL to annotate the beginnings of all functions which do not have a ".globl" annotation, but whose endings are annotated by ENDPROC. This is needed to balance ENDPROC for tools that generate debuginfo.

These function names are not prepended with ".L" as they might appear in call traces and they wouldn't be visible after such a change.

To be symmetric, the functions' ENDPROCs are converted to the new SYM_FUNC_END.

Signed-off-by: Jiri Slaby
Signed-off-by: Borislav Petkov
Cc: "David S. Miller"
Cc: Herbert Xu
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: Thomas Gleixner
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191011115108.12392-7-jslaby@suse.cz
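The conversion these two annotation commits describe looks roughly like the following (an illustrative sketch, not a diff from the actual tree; `my_cipher_func` and `__my_helper` are made-up names):

```asm
/* Before: old-style annotations */
ENTRY(my_cipher_func)                 /* global function */
	ret
ENDPROC(my_cipher_func)

/* After: new annotations from <linux/linkage.h> */
SYM_FUNC_START(my_cipher_func)        /* global, replaces ENTRY */
	ret
SYM_FUNC_END(my_cipher_func)          /* balances START, replaces ENDPROC */

SYM_FUNC_START_LOCAL(__my_helper)     /* local: no .globl, but still typed
                                         and sized for debuginfo tools */
	ret
SYM_FUNC_END(__my_helper)
```

Note that the local names deliberately do not get a ".L" prefix, so they remain visible in call traces.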
31 May, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman
20 Sep, 2017
1 commit
-
Using RBP as a temporary register breaks frame pointer convention and breaks stack traces when unwinding from an interrupt in the crypto code.

Use R13 instead of RBP. Both are callee-saved registers, so the substitution is straightforward.

Reported-by: Eric Biggers
Reported-by: Peter Zijlstra
Tested-by: Eric Biggers
Acked-by: Eric Biggers
Signed-off-by: Josh Poimboeuf
Signed-off-by: Herbert Xu
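A minimal sketch of the substitution (hypothetical code, not the actual patch): since R13 is callee-saved like RBP, the only extra cost is saving and restoring it, while RBP stays reserved for the frame pointer.

```asm
/* Before: RBP used as a temporary, breaking frame-pointer unwinding */
	movq	%rax, %rbp          /* clobbers the frame pointer */

/* After: R13 carries the temporary instead; it is callee-saved,
   so it must be preserved across the function */
	pushq	%r13
	movq	%rax, %r13
	/* ... use %r13 as scratch ... */
	popq	%r13
```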
23 Jan, 2017
1 commit
-
A lot of asm-optimized routines in arch/x86/crypto/ keep their constants in .data. This is wrong; they should be in .rodata.

Many of these constants are the same in different modules. For example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F exists in at least half a dozen places.

There is a way to let the linker merge them and use just one copy. The rules are as follows: mergeable objects of different sizes should not share sections. You can't put them all in one .rodata section, or they will lose "mergeability".

GCC puts its mergeable constants in ".rodata.cstSIZE" sections, or ".rodata.cstSIZE." if -fdata-sections is used. This patch does the same:

.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

It is important that all data in such a section consists of 16-byte elements, not larger ones, and that there is no implicit use of one element from another.

When this is not the case, use a non-mergeable section:

.section .rodata[.VAR_NAME], "a", @progbits
This reduces .data by ~15 kbytes:

    text    data     bss      dec    hex filename
11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o
11112095 2690672 2630712 16433479 fac147 vmlinux.o

Merged objects are visible in System.map:
ffffffff81a28810 r POLY
ffffffff81a28810 r POLY
ffffffff81a28820 r TWOONE
ffffffff81a28820 r TWOONE
ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK
CC: Herbert Xu
CC: Josh Poimboeuf
CC: Xiaodong Liu
CC: Megha Dey
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu
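Putting the rules together, a constant such as the shuffle mask above would be declared like this (a sketch of the pattern the commit describes, not a specific file from the tree):

```asm
/* Mergeable: the "M" flag plus entsize 16 promise the linker that this
   section holds only 16-byte objects, so identical copies from
   different modules can be folded into one */
.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
.align 16
SHUF_MASK:
	.octa 0x000102030405060708090A0B0C0D0E0F

/* Non-mergeable fallback for data that doesn't fit the rules */
.section .rodata.my_table, "a", @progbits
```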
24 Feb, 2016
1 commit
-
The crypto code has several callable non-leaf functions which don't honor CONFIG_FRAME_POINTER, which can result in bad stack traces.

Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
Signed-off-by: Josh Poimboeuf
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Bernd Petrovitsch
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chris J Arges
Cc: David S. Miller
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Herbert Xu
Cc: Jiri Slaby
Cc: Linus Torvalds
Cc: Michal Marek
Cc: Namhyung Kim
Cc: Pedro Alves
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
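In the kernel this is done with the FRAME_BEGIN/FRAME_END helpers from asm/frame.h, which expand to frame-pointer setup only when CONFIG_FRAME_POINTER is set and to nothing otherwise. A sketch of the pattern (the function name is illustrative):

```asm
#include <asm/frame.h>

ENTRY(my_crypto_func)
	FRAME_BEGIN	/* push %rbp; mov %rsp, %rbp when CONFIG_FRAME_POINTER=y */
	/* ... body may call other functions; the unwinder stays happy ... */
	FRAME_END	/* pop %rbp */
	ret
ENDPROC(my_crypto_func)
```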
25 Apr, 2013
1 commit
-
Change twofish-avx to use the new XTS code, for smaller stack usage and a small boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of twofish-3way for block sizes smaller than 128 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
20 Jan, 2013
1 commit
-
Signed-off-by: Jussi Kivilinna
Acked-by: David S. Miller
Signed-off-by: Herbert Xu
24 Oct, 2012
1 commit
-
Introduce new assembler functions to avoid the use of temporary stack buffers in glue code. This also allows the use of vector instructions for xoring output in CTR and CBC modes and for construction of IVs for CTR mode.

ECB mode sees a ~0.2% decrease in speed because of one added extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%.

Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
07 Sep, 2012
1 commit
-
The patch replaces 'movb' instructions with 'movzbl' to break false register dependencies and interleaves instructions better for out-of-order scheduling.

Tested on Intel Core i5-2450M and AMD FX-8100.
tcrypt ECB results:
Intel Core i5-2450M:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.12x 1.13x 1.36x 1.37x 1.21x 1.22x
1k 1.14x 1.14x 1.48x 1.49x 1.29x 1.31x
8k     1.14x 1.14x   1.50x 1.52x   1.32x 1.33x

AMD FX-8100:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.10x 1.11x 1.01x 1.01x 0.92x 0.91x
1k 1.11x 1.12x 1.08x 1.07x 0.97x 0.96x
8k     1.11x 1.13x   1.10x 1.08x   0.99x 0.97x

[v2]
- Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause a performance drop on Bulldozer.
- Further interleaving improvements for better out-of-order scheduling.

Tested-by: Borislav Petkov
Cc: Johannes Goetzfried
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
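The movb-to-movzbl change matters because writing only the low byte of a register leaves the upper bits live, so the CPU must merge the old and new contents, creating a false dependency on the register's previous value; a zero-extending load writes the whole register and breaks that chain. Illustratively:

```asm
/* Before: byte load merges into the existing RAX value, creating a
   false dependency on whatever last wrote the register */
	movb	(%rsi), %al

/* After: zero-extending load writes all of EAX/RAX, so the CPU can
   rename the register and schedule the load independently */
	movzbl	(%rsi), %eax
```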
11 Jul, 2012
1 commit
-
The register %rdx is written, but never read till the end of the encryption routine. Therefore let's delete the useless instruction.

Signed-off-by: Johannes Goetzfried
Signed-off-by: Herbert Xu
12 Jun, 2012
1 commit
-
This patch adds an x86_64/avx assembler implementation of the Twofish block cipher. The implementation processes eight blocks in parallel (two 4-block chunk AVX operations). The table lookups are done in general-purpose registers. For small block sizes the 3way-parallel functions from the twofish-x86_64-3way module are called. A good performance increase is provided for block sizes greater than or equal to 128B.

Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
twofish-avx-x86_64 vs. twofish-x86_64-3way
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.97x 1.00x 0.95x 0.97x 0.97x 0.96x 0.95x 0.95x 0.98x
64B 0.99x 0.99x 1.00x 0.99x 0.98x 0.98x 0.99x 0.98x 0.99x 0.98x
256B 1.20x 1.21x 1.00x 1.19x 1.15x 1.14x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.24x 1.26x 1.28x 1.26x 1.27x
8192B  1.31x 1.32x 1.00x 1.31x 1.25x 1.25x 1.28x 1.29x 1.28x 1.30x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.96x 1.00x 0.96x 0.97x 0.98x 0.95x 0.95x 0.95x 0.96x
64B 1.00x 0.99x 1.00x 0.98x 0.98x 1.01x 0.98x 0.98x 0.98x 0.98x
256B 1.20x 1.21x 1.00x 1.21x 1.15x 1.15x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.23x 1.26x 1.27x 1.26x 1.27x
8192B  1.31x 1.33x 1.00x 1.31x 1.26x 1.26x 1.29x 1.29x 1.28x 1.30x

twofish-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.19x 1.63x
ecb-dec 1.18x 1.62x
cbc-enc 0.75x 1.03x
cbc-dec 1.23x 1.67x
ctr-enc 1.24x 1.65x
ctr-dec 1.24x 1.65x
lrw-enc 1.15x 1.53x
lrw-dec 1.14x 1.52x
xts-enc 1.16x 1.56x
xts-dec 1.16x 1.56x

Signed-off-by: Johannes Goetzfried
Signed-off-by: Herbert Xu