18 Oct, 2019
2 commits
-
These are all functions which are invoked from elsewhere, so annotate them as global using the new SYM_FUNC_START, and their ENDPROCs with SYM_FUNC_END.

Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the last users.

Signed-off-by: Jiri Slaby
Signed-off-by: Borislav Petkov
Reviewed-by: Rafael J. Wysocki [hibernate]
Reviewed-by: Boris Ostrovsky [xen bits]
Acked-by: Herbert Xu [crypto]
Cc: Allison Randal
Cc: Andrey Ryabinin
Cc: Andy Lutomirski
Cc: Andy Shevchenko
Cc: Ard Biesheuvel
Cc: Armijn Hemel
Cc: Cao jin
Cc: Darren Hart
Cc: Dave Hansen
Cc: "David S. Miller"
Cc: Enrico Weigelt
Cc: Greg Kroah-Hartman
Cc: Herbert Xu
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jim Mattson
Cc: Joerg Roedel
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kate Stewart
Cc: "Kirill A. Shutemov"
Cc: kvm ML
Cc: Len Brown
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland
Cc: Matt Fleming
Cc: Paolo Bonzini
Cc: Pavel Machek
Cc: Peter Zijlstra
Cc: platform-driver-x86@vger.kernel.org
Cc: "Radim Krčmář"
Cc: Sean Christopherson
Cc: Stefano Stabellini
Cc: "Steven Rostedt (VMware)"
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Cc: Wei Huang
Cc: x86-ml
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li
Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
-
Use the newly added SYM_FUNC_START_LOCAL to annotate the beginnings of all functions which do not have a ".globl" annotation, but whose endings are annotated by ENDPROC. This is needed to balance ENDPROC for tools that generate debuginfo.

These function names are not prepended with ".L" as they might appear in call traces and they wouldn't be visible after such a change.

To be symmetric, the functions' ENDPROCs are converted to the new SYM_FUNC_END.

Signed-off-by: Jiri Slaby
Signed-off-by: Borislav Petkov
Cc: "David S. Miller"
Cc: Herbert Xu
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: Thomas Gleixner
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191011115108.12392-7-jslaby@suse.cz
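The conversion these two annotation commits describe looks roughly like the following (an illustrative sketch, not a diff from the actual tree; `my_cipher_func` and `__my_helper` are made-up names):

```asm
/* Before: old-style annotations */
ENTRY(my_cipher_func)                 /* global function */
	ret
ENDPROC(my_cipher_func)

/* After: new annotations from <linux/linkage.h> */
SYM_FUNC_START(my_cipher_func)        /* global, replaces ENTRY */
	ret
SYM_FUNC_END(my_cipher_func)          /* balances START, replaces ENDPROC */

SYM_FUNC_START_LOCAL(__my_helper)     /* local: no .globl, but still typed
                                         and sized for debuginfo tools */
	ret
SYM_FUNC_END(__my_helper)
```

Note that the local names deliberately do not get a ".L" prefix, so they remain visible in call traces.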
31 May, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Reviewed-by: Richard Fontana
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman
20 Sep, 2017
1 commit
-
Using RBP as a temporary register breaks frame pointer convention and breaks stack traces when unwinding from an interrupt in the crypto code.

Use R13 instead of RBP. Both are callee-saved registers, so the substitution is straightforward.

Reported-by: Eric Biggers
Reported-by: Peter Zijlstra
Tested-by: Eric Biggers
Acked-by: Eric Biggers
Signed-off-by: Josh Poimboeuf
Signed-off-by: Herbert Xu
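A minimal sketch of the substitution (hypothetical code, not the actual patch): since R13 is callee-saved like RBP, the only extra cost is saving and restoring it, while RBP stays reserved for the frame pointer.

```asm
/* Before: RBP used as a temporary, breaking frame-pointer unwinding */
	movq	%rax, %rbp          /* clobbers the frame pointer */

/* After: R13 carries the temporary instead; it is callee-saved,
   so it must be preserved across the function */
	pushq	%r13
	movq	%rax, %r13
	/* ... use %r13 as scratch ... */
	popq	%r13
```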
23 Jan, 2017
1 commit
-
A lot of asm-optimized routines in arch/x86/crypto/ keep their constants in .data. This is wrong; they should be in .rodata.

Many of these constants are the same in different modules. For example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F exists in at least half a dozen places.

There is a way to let the linker merge them and use just one copy. The rules are as follows: mergeable objects of different sizes should not share sections. You can't put them all in one .rodata section, or they will lose "mergeability".

GCC puts its mergeable constants in ".rodata.cstSIZE" sections, or ".rodata.cstSIZE." if -fdata-sections is used. This patch does the same:

.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

It is important that all data in such a section consists of 16-byte elements, not larger ones, and that there is no implicit use of one element from another.

When this is not the case, use a non-mergeable section:

.section .rodata[.VAR_NAME], "a", @progbits
This reduces .data by ~15 kbytes:

    text    data     bss      dec    hex filename
11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o
11112095 2690672 2630712 16433479 fac147 vmlinux.o

Merged objects are visible in System.map:
ffffffff81a28810 r POLY
ffffffff81a28810 r POLY
ffffffff81a28820 r TWOONE
ffffffff81a28820 r TWOONE
ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK
CC: Herbert Xu
CC: Josh Poimboeuf
CC: Xiaodong Liu
CC: Megha Dey
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu
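Putting the rules together, a constant such as the shuffle mask above would be declared like this (a sketch of the pattern the commit describes, not a specific file from the tree):

```asm
/* Mergeable: the "M" flag plus entsize 16 promise the linker that this
   section holds only 16-byte objects, so identical copies from
   different modules can be folded into one */
.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
.align 16
SHUF_MASK:
	.octa 0x000102030405060708090A0B0C0D0E0F

/* Non-mergeable fallback for data that doesn't fit the rules */
.section .rodata.my_table, "a", @progbits
```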
24 Feb, 2016
1 commit
-
The crypto code has several callable non-leaf functions which don't honor CONFIG_FRAME_POINTER, which can result in bad stack traces.

Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
Signed-off-by: Josh Poimboeuf
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Bernd Petrovitsch
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chris J Arges
Cc: David S. Miller
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Herbert Xu
Cc: Jiri Slaby
Cc: Linus Torvalds
Cc: Michal Marek
Cc: Namhyung Kim
Cc: Pedro Alves
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
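In the kernel this is done with the FRAME_BEGIN/FRAME_END helpers from asm/frame.h, which expand to frame-pointer setup only when CONFIG_FRAME_POINTER is set and to nothing otherwise. A sketch of the pattern (the function name is illustrative):

```asm
#include <asm/frame.h>

ENTRY(my_crypto_func)
	FRAME_BEGIN	/* push %rbp; mov %rsp, %rbp when CONFIG_FRAME_POINTER=y */
	/* ... body may call other functions; the unwinder stays happy ... */
	FRAME_END	/* pop %rbp */
	ret
ENDPROC(my_crypto_func)
```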
25 Apr, 2013
1 commit
-
Change twofish-avx to use the new XTS code, for smaller stack usage and a small boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of twofish-3way for block sizes smaller than 128 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
20 Jan, 2013
1 commit
-
Signed-off-by: Jussi Kivilinna
Acked-by: David S. Miller
Signed-off-by: Herbert Xu
24 Oct, 2012
1 commit
-
Introduce new assembler functions to avoid the use of temporary stack buffers in glue code. This also allows the use of vector instructions for xoring output in CTR and CBC modes and for construction of IVs for CTR mode.

ECB mode sees a ~0.2% decrease in speed because of one added extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%.

Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
07 Sep, 2012
1 commit
-
The patch replaces 'movb' instructions with 'movzbl' to break false register dependencies and interleaves instructions better for out-of-order scheduling.

Tested on Intel Core i5-2450M and AMD FX-8100.
tcrypt ECB results:
Intel Core i5-2450M:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.12x 1.13x 1.36x 1.37x 1.21x 1.22x
1k 1.14x 1.14x 1.48x 1.49x 1.29x 1.31x
8k     1.14x 1.14x   1.50x 1.52x   1.32x 1.33x

AMD FX-8100:
size old-vs-new new-vs-3way old-vs-3way
enc dec enc dec enc dec
256 1.10x 1.11x 1.01x 1.01x 0.92x 0.91x
1k 1.11x 1.12x 1.08x 1.07x 0.97x 0.96x
8k     1.11x 1.13x   1.10x 1.08x   0.99x 0.97x

[v2]
- Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause a performance drop on Bulldozer.
- Further interleaving improvements for better out-of-order scheduling.

Tested-by: Borislav Petkov
Cc: Johannes Goetzfried
Signed-off-by: Jussi Kivilinna
Signed-off-by: Herbert Xu
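The movb-to-movzbl change matters because writing only the low byte of a register leaves the upper bits live, so the CPU must merge the old and new contents, creating a false dependency on the register's previous value; a zero-extending load writes the whole register and breaks that chain. Illustratively:

```asm
/* Before: byte load merges into the existing RAX value, creating a
   false dependency on whatever last wrote the register */
	movb	(%rsi), %al

/* After: zero-extending load writes all of EAX/RAX, so the CPU can
   rename the register and schedule the load independently */
	movzbl	(%rsi), %eax
```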
11 Jul, 2012
1 commit
-
The register %rdx is written, but never read till the end of the encryption routine. Therefore let's delete the useless instruction.

Signed-off-by: Johannes Goetzfried
Signed-off-by: Herbert Xu
12 Jun, 2012
1 commit
-
This patch adds an x86_64/avx assembler implementation of the Twofish block cipher. The implementation processes eight blocks in parallel (two 4-block chunk AVX operations). The table lookups are done in general-purpose registers. For small block sizes the 3way-parallel functions from the twofish-x86_64-3way module are called. A good performance increase is provided for block sizes greater than or equal to 128B.

Patch has been tested with tcrypt and automated filesystem tests.
Tcrypt benchmark results:
Intel Core i5-2500 CPU (fam:6, model:42, step:7)
twofish-avx-x86_64 vs. twofish-x86_64-3way
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.97x 1.00x 0.95x 0.97x 0.97x 0.96x 0.95x 0.95x 0.98x
64B 0.99x 0.99x 1.00x 0.99x 0.98x 0.98x 0.99x 0.98x 0.99x 0.98x
256B 1.20x 1.21x 1.00x 1.19x 1.15x 1.14x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.24x 1.26x 1.28x 1.26x 1.27x
8192B  1.31x 1.32x 1.00x 1.31x 1.25x 1.25x 1.28x 1.29x 1.28x 1.30x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.96x 1.00x 0.96x 0.97x 0.98x 0.95x 0.95x 0.95x 0.96x
64B 1.00x 0.99x 1.00x 0.98x 0.98x 1.01x 0.98x 0.98x 0.98x 0.98x
256B 1.20x 1.21x 1.00x 1.21x 1.15x 1.15x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.23x 1.26x 1.27x 1.26x 1.27x
8192B  1.31x 1.33x 1.00x 1.31x 1.26x 1.26x 1.29x 1.29x 1.28x 1.30x

twofish-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.19x 1.63x
ecb-dec 1.18x 1.62x
cbc-enc 0.75x 1.03x
cbc-dec 1.23x 1.67x
ctr-enc 1.24x 1.65x
ctr-dec 1.24x 1.65x
lrw-enc 1.15x 1.53x
lrw-dec 1.14x 1.52x
xts-enc 1.16x 1.56x
xts-dec 1.16x 1.56x

Signed-off-by: Johannes Goetzfried
Signed-off-by: Herbert Xu