18 Apr, 2019

2 commits

  • Add the Elliptic Curve Russian Digital Signature Algorithm (GOST R
    34.10-2012, RFC 7091, ISO/IEC 14888-3), one of the Russian (and, since
    2018, CIS) cryptographic standard algorithms (collectively called GOST
    algorithms). Only signature verification is supported, with the intent
    that it be used in IMA.

    Summary of the changes:

    * crypto/Kconfig:
    - EC-RDSA is added to the Public-key cryptography section.

    * crypto/Makefile:
    - ecrdsa objects are added.

    * crypto/asymmetric_keys/x509_cert_parser.c:
    - Recognize EC-RDSA and Streebog OIDs.

    * include/linux/oid_registry.h:
    - EC-RDSA OIDs are added to the enum. Two curve OIDs that are not yet
    implemented are also added for possible later extension (so as not to
    change the numbering and grouping).

    * crypto/ecc.c:
    - Kenneth MacKay's copyright date is updated to 2014, because
    vli_mmod_slow, ecc_point_add and ecc_point_mult_shamir are based on
    his code from micro-ecc.
    - Functions needed for ecrdsa are EXPORT_SYMBOL'ed.
    - New functions:
    vli_is_negative - helper to determine the sign of a vli;
    vli_from_be64 - unpack a big-endian array into a vli (used for
    a signature);
    vli_from_le64 - unpack a little-endian array into a vli (used for
    a public key);
    vli_uadd, vli_usub - add/subtract a u64 value to/from a vli (used
    for increment/decrement);
    mul_64_64 - optimized to use __int128 where appropriate; this speeds
    up point multiplication (and, as a consequence, signature
    verification) by a factor of 1.5-2;
    vli_umult - multiply a vli by a small value (speeds up point
    multiplication by another factor of 1.5-2, depending on vli sizes);
    vli_mmod_special - modular reduction for one form of pseudo-Mersenne
    primes (used for the "A" curves);
    vli_mmod_special2 - modular reduction for another form of
    pseudo-Mersenne primes (used for the "B" curves);
    vli_mmod_barrett - modular reduction using a pre-computed value
    (used for the "C" curve);
    vli_mmod_slow - more general modular reduction, which is much slower
    (used when the modulus is the subgroup order);
    vli_mod_mult_slow - modular multiplication;
    ecc_point_add - add two points;
    ecc_point_mult_shamir - multiply two points by scalars and add the
    results in one combined pass (this gives a speedup by another factor
    of 2 compared to two separate multiplications; see the toy sketch
    after this summary);
    ecc_is_pubkey_valid_partial - an additional sanity check is added.
    - vli_mmod_fast is updated with a non-strict heuristic that calls the
    optimal modular reduction function depending on the prime value;
    - All computations for the previously defined (two NIST) curves
    should be unaffected.

    * crypto/ecc.h:
    - Newly exported functions are documented.

    * crypto/ecrdsa_defs.h:
    - Five curves are defined.

    * crypto/ecrdsa.c:
    - Signature verification is implemented.

    * crypto/ecrdsa_params.asn1, crypto/ecrdsa_pub_key.asn1:
    - Templates for BER decoder for EC-RDSA parameters and public key.
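
    For illustration, here is a toy C sketch of the combined double-and-add
    loop behind ecc_point_mult_shamir (Shamir's trick). Integers mod a
    small prime stand in for curve points so the result can be checked
    directly; all names here are illustrative, not the kernel's API.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in: "points" are integers mod P and "point addition" is
     * modular addition, so u1*g + u2*q can be checked directly.  The loop
     * mirrors Shamir's trick: one shared double-and-add pass that adds g,
     * q or the precomputed g+q depending on the current bit pair. */
    #define P 1000003ULL

    static uint64_t shamir_mult(uint64_t u1, uint64_t g,
                                uint64_t u2, uint64_t q)
    {
            uint64_t sum = (g + q) % P;     /* precomputed g + q */
            uint64_t r = 0;
            int i, b1, b2;

            for (i = 63; i >= 0; i--) {
                    r = (r * 2) % P;        /* "point doubling" */
                    b1 = (u1 >> i) & 1;
                    b2 = (u2 >> i) & 1;
                    if (b1 && b2)
                            r = (r + sum) % P;
                    else if (b1)
                            r = (r + g) % P;
                    else if (b2)
                            r = (r + q) % P;
            }
            return r;
    }

    int main(void)
    {
            /* One combined pass equals two separate multiplications. */
            printf("%llu == %llu\n",
                   (unsigned long long)shamir_mult(123, 5, 456, 7),
                   (unsigned long long)((123 * 5 + 456 * 7) % P));
            return 0;
    }

    The real code does the same thing with point doubling and point
    addition, which is why one combined pass costs roughly half of two
    separate scalar multiplications.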

    Cc: linux-integrity@vger.kernel.org
    Signed-off-by: Vitaly Chikunov
    Signed-off-by: Herbert Xu

    Vitaly Chikunov
     
  • ecc.c has algorithms that can be used by both ecdh and ecrdsa, so make
    it a separate module. Add CRYPTO_ECC to Kconfig. EXPORT_SYMBOL and
    document what seems appropriate. Move the ecc_point and ecc_curve
    structs from ecc_curve_defs.h into ecc.h.

    No code changes.

    Signed-off-by: Vitaly Chikunov
    Signed-off-by: Herbert Xu

    Vitaly Chikunov
     

08 Mar, 2019

1 commit

  • To prevent any issues with persistent data, separate lzo-rle from lzo so
    that it is treated as a separate algorithm, and lzo is still available.
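
    As a minimal illustration of "treated as a separate algorithm", kernel
    code can now select it by name through the comp API (a hedged sketch;
    error handling trimmed):

    #include <linux/crypto.h>

    /* "lzo-rle" now resolves independently of "lzo", so users such as
     * zram can pick either one by name. */
    static struct crypto_comp *get_lzorle(void)
    {
            return crypto_alloc_comp("lzo-rle", 0, 0);
    }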

    Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com
    Signed-off-by: Dave Rodgman
    Cc: David S. Miller
    Cc: Greg Kroah-Hartman
    Cc: Herbert Xu
    Cc: Markus F.X.J. Oberhumer
    Cc: Matt Sealey
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Richard Purdie
    Cc: Sergey Senozhatsky
    Cc: Sonny Rao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Rodgman
     

20 Nov, 2018

3 commits

  • Add support for the Adiantum encryption mode. Adiantum was designed by
    Paul Crowley and is specified by our paper:

    Adiantum: length-preserving encryption for entry-level processors
    (https://eprint.iacr.org/2018/720.pdf)

    See our paper for full details; this patch only provides an overview.

    Adiantum is a tweakable, length-preserving encryption mode designed for
    fast and secure disk encryption, especially on CPUs without dedicated
    crypto instructions. Adiantum encrypts each sector using the XChaCha12
    stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
    function, and an invocation of the AES-256 block cipher on a single
    16-byte block. On CPUs without AES instructions, Adiantum is much
    faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
    Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
    and decryption about 5 times faster.

    Adiantum is a specialization of the more general HBSH construction. Our
    earlier proposal, HPolyC, was also an HBSH specialization, but it used a
    different εA∆U hash function, one based on Poly1305 only. Adiantum's
    εA∆U hash function, which is based primarily on the "NH" hash function
    like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
    consequently, Adiantum is about 20% faster than HPolyC.

    This speed comes with no loss of security: Adiantum is provably just as
    secure as HPolyC, in fact slightly *more* secure. Like HPolyC,
    Adiantum's security is reducible to that of XChaCha12 and AES-256,
    subject to a security bound. XChaCha12 itself has a security reduction
    to ChaCha12. Therefore, one need not "trust" Adiantum; one need only
    trust ChaCha12 and AES-256. Note that the εA∆U hash function is only
    used for its proven combinatorial properties, so it cannot be "broken".

    Adiantum is also a true wide-block encryption mode, so flipping any
    plaintext bit in the sector scrambles the entire ciphertext, and vice
    versa. No other such mode is available in the kernel currently; doing
    the same with XTS scrambles only 16 bytes. Adiantum also supports
    arbitrary-length tweaks and naturally supports any length input >= 16
    bytes without needing "ciphertext stealing".

    For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
    order to make encryption feasible on the widest range of devices.
    Although the 20-round variant is quite popular, the best known attacks
    on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
    security margin; in fact, larger than AES-256's. 12-round Salsa20 is
    also the eSTREAM recommendation. For the block cipher, Adiantum uses
    AES-256, despite it having a lower security margin than XChaCha12 and
    needing table lookups, due to AES's extensive adoption and analysis
    making it the obvious first choice. Nevertheless, for flexibility this
    patch also permits the "adiantum" template to be instantiated with
    XChaCha20 and/or with an alternate block cipher.
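
    For reference, here is a minimal sketch of requesting the default
    instantiation described above from kernel code (assumes
    CONFIG_CRYPTO_ADIANTUM is enabled; error handling trimmed):

    #include <crypto/skcipher.h>

    static struct crypto_skcipher *get_adiantum(void)
    {
            /* The XChaCha12 + AES-256 instantiation of the template; an
             * alternate stream or block cipher may be named instead. */
            return crypto_alloc_skcipher("adiantum(xchacha12,aes)", 0, 0);
    }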

    We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
    where currently the only other suitable options are block cipher modes
    such as AES-XTS. A big problem with this is that many low-end mobile
    devices (e.g. Android Go phones sold primarily in developing countries,
    as well as some smartwatches) still have CPUs that lack AES
    instructions, e.g. ARM Cortex-A7. Sadly, AES-XTS encryption is much too
    slow to be viable on these devices. We did find that some "lightweight"
    block ciphers are fast enough, but these suffer from problems such as
    not having much cryptanalysis or being too controversial.

    The ChaCha stream cipher has excellent performance but is insecure to
    use directly for disk encryption, since each sector's IV is reused each
    time it is overwritten. Even restricting the threat model to offline
    attacks only isn't enough, since modern flash storage devices don't
    guarantee that "overwrites" are really overwrites, due to wear-leveling.
    Adiantum avoids this problem by constructing a
    "tweakable super-pseudorandom permutation"; this is the strongest
    possible security model for length-preserving encryption.

    Of course, storing random nonces along with the ciphertext would be the
    ideal solution. But doing that with existing hardware and filesystems
    runs into major practical problems; in most cases it would require data
    journaling (like dm-integrity) which severely degrades performance.
    Thus, for now length-preserving encryption is still needed.

    Signed-off-by: Eric Biggers
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash
    function used in the Adiantum encryption mode.

    CONFIG_NHPOLY1305 is not selectable by itself since there won't be any
    real reason to enable it without also enabling Adiantum support.

    Signed-off-by: Eric Biggers
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Eric Biggers
     
  • In preparation for adding XChaCha12 support, rename/refactor
    chacha20-generic to support different numbers of rounds. The
    justification for needing XChaCha12 support is explained in more detail
    in the patch "crypto: chacha - add XChaCha12 support".

    The only difference between ChaCha{8,12,20} is the number of rounds;
    all other parts of the algorithm are the same. Therefore,
    remove the "20" from all definitions, structures, functions, files, etc.
    that will be shared by all ChaCha versions.

    Also make ->setkey() store the round count in the chacha_ctx (previously
    chacha20_ctx). The generic code then passes the round count through to
    chacha_block(). There will be a ->setkey() function for each explicitly
    allowed round count; the encrypt/decrypt functions will be the same. I
    decided not to do it the opposite way (same ->setkey() function for all
    round counts, with different encrypt/decrypt functions) because that
    would have required more boilerplate code in architecture-specific
    implementations of ChaCha and XChaCha.
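
    A hedged sketch of the resulting shape of the generic code (names
    approximate the kernel's; treat the details as illustrative):

    #include <crypto/internal/skcipher.h>
    #include <linux/errno.h>
    #include <linux/string.h>

    /* One context for every variant: the round count is fixed at setkey
     * time and passed down to the block function by the shared
     * encrypt/decrypt path. */
    struct chacha_ctx {
            u32 key[8];
            int nrounds;
    };

    static int chacha_setkey(struct crypto_skcipher *tfm, const u8 *key,
                             unsigned int keysize, int nrounds)
    {
            struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);

            if (keysize != 32)
                    return -EINVAL;
            memcpy(ctx->key, key, keysize); /* sketch: the real code stores
                                             * little-endian 32-bit words */
            ctx->nrounds = nrounds;
            return 0;
    }

    /* One thin wrapper per explicitly allowed round count. */
    static int chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
                               unsigned int keysize)
    {
            return chacha_setkey(tfm, key, keysize, 20);
    }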

    Reviewed-by: Ard Biesheuvel
    Acked-by: Martin Willi
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

04 Sep, 2018

2 commits

  • As it turns out, the AVX2 multibuffer SHA routines are currently
    broken [0], in a way that would have likely been noticed if this
    code were in wide use. Since the code is too complicated to be
    maintained by anyone except the original authors, and since the
    performance benefits for real-world use cases are debatable to
    begin with, it is better to drop it entirely for the moment.

    [0] https://marc.info/?l=linux-crypto-vger&m=153476243825350&w=2

    Suggested-by: Eric Biggers
    Cc: Megha Dey
    Cc: Tim Chen
    Cc: Geert Uytterhoeven
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     
  • These are unused, undesired, and have never actually been used by
    anybody. The original authors of this code have changed their mind about
    its inclusion. While originally proposed for disk encryption on low-end
    devices, the idea was discarded [1] in favor of something else before
    that could really get going. Therefore, this patch removes Speck.

    [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659

    Signed-off-by: Jason A. Donenfeld
    Acked-by: Eric Biggers
    Cc: stable@vger.kernel.org
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     

31 May, 2018

1 commit

  • Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for
    MORUS") accidentally presented the glue code as usable by different
    architectures, but it is in fact only usable on x86.

    This patch moves it under arch/x86/crypto and adds 'depends on X86' to
    the Kconfig options and also removes the prompt to hide these internal
    options from the user.

    Reported-by: kbuild test robot
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ondrej Mosnacek
     

19 May, 2018

3 commits

  • This patch adds a common glue code for optimized implementations of
    MORUS AEAD algorithms.

    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ondrej Mosnacek
     
  • This patch adds the generic implementation of the MORUS family of AEAD
    algorithms (MORUS-640 and MORUS-1280). The original authors of MORUS
    are Hongjun Wu and Tao Huang.

    At the time of writing, MORUS is one of the finalists in CAESAR, an
    open competition intended to select a portfolio of alternatives to
    the problematic AES-GCM:

    https://competitions.cr.yp.to/caesar-submissions.html
    https://competitions.cr.yp.to/round3/morusv2.pdf

    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ondrej Mosnacek
     
  • This patch adds the generic implementation of the AEGIS family of AEAD
    algorithms (AEGIS-128, AEGIS-128L, and AEGIS-256). The original
    authors of AEGIS are Hongjun Wu and Bart Preneel.

    At the time of writing, AEGIS is one of the finalists in CAESAR, an
    open competition intended to select a portfolio of alternatives to
    the problematic AES-GCM:

    https://competitions.cr.yp.to/caesar-submissions.html
    https://competitions.cr.yp.to/round3/aegisv11.pdf

    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Herbert Xu

    Ondrej Mosnacek
     

21 Apr, 2018

1 commit

  • Adds zstd support to crypto and scompress. Only supports the default
    level.

    Previously we held off on this patch, since there weren't any users.
    Now zram is ready for zstd support, but depends on CONFIG_CRYPTO_ZSTD,
    which isn't defined until this patch is in. I also see a patch adding
    zstd to pstore [0], which depends on crypto zstd.

    [0] lkml.kernel.org/r/9c9416b2dff19f05fb4c35879aaa83d11ff72c92.1521626182.git.geliangtang@gmail.com

    Signed-off-by: Nick Terrell
    Signed-off-by: Herbert Xu

    Nick Terrell
     

07 Apr, 2018

2 commits

  • Our convention is to distinguish file types by suffixes with a period
    as a separator.

    *-asn1.[ch] is a different pattern from other generated sources such
    as *.lex.c, *.tab.[ch], *.dtb.S, etc. More confusing, files with
    '-asn1.[ch]' are generated files, but '_asn1.[ch]' are checked-in
    files:
    net/netfilter/nf_conntrack_h323_asn1.c
    include/linux/netfilter/nf_conntrack_h323_asn1.h
    include/linux/sunrpc/gss_asn1.h

    Rename generated files to *.asn1.[ch] for consistency.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
  • Clean up these patterns from the top Makefile to omit 'clean-files'
    in each Makefile.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

16 Mar, 2018

1 commit

  • Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016).

    SM4 (GBT.32907-2016) is a cryptographic standard issued by the
    Organization of State Commercial Administration of China (OSCCA)
    as an authorized cryptographic algorithm for use within China.

    SMS4 was originally created for use in protecting wireless
    networks, and is mandated in the Chinese National Standard for
    Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
    (GB.15629.11-2003).

    Signed-off-by: Gilad Ben-Yossef
    Signed-off-by: Herbert Xu

    Gilad Ben-Yossef
     

09 Mar, 2018

1 commit

  • TPM security routines require encryption and decryption with AES in
    CFB mode, so add it to the Linux Crypto schemes. CFB is basically a
    one time pad where the pad is generated initially from the encrypted
    IV and then subsequently from the encrypted previous block of
    ciphertext. The pad is XOR'd into the plain text to get the final
    ciphertext.

    https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CFB
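
    A minimal self-contained sketch of that construction (a hypothetical
    single-block encryptor stands in for AES; full blocks only):

    #include <stddef.h>
    #include <stdint.h>

    #define BLK 16

    /* Hypothetical single-block primitive standing in for AES-256. */
    typedef void (*blk_enc_t)(const uint8_t in[BLK], uint8_t out[BLK]);

    /* CFB as described above: the pad for block 0 is E(IV), and for
     * block i > 0 it is E(ciphertext[i-1]); the plaintext is XORed
     * with the pad. */
    static void cfb_encrypt(blk_enc_t enc, const uint8_t iv[BLK],
                            const uint8_t *pt, uint8_t *ct, size_t nblocks)
    {
            const uint8_t *prev = iv;
            uint8_t pad[BLK];
            size_t i;
            int j;

            for (i = 0; i < nblocks; i++) {
                    enc(prev, pad);                 /* pad = E(prev) */
                    for (j = 0; j < BLK; j++)
                            ct[i * BLK + j] = pt[i * BLK + j] ^ pad[j];
                    prev = &ct[i * BLK];            /* chain on ciphertext */
            }
    }

    Decryption generates the same pads from the ciphertext, which is why
    only the encryption direction of the block cipher is ever needed.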

    Signed-off-by: James Bottomley
    Signed-off-by: Herbert Xu

    James Bottomley
     

22 Feb, 2018

1 commit

  • Add a generic implementation of Speck, including the Speck128 and
    Speck64 variants. Speck is a lightweight block cipher that can be much
    faster than AES on processors that don't have AES instructions.

    We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
    option for dm-crypt and fscrypt on Android, for low-end mobile devices
    with older CPUs such as ARMv7 which don't have the Cryptography
    Extensions. Currently, such devices are unencrypted because AES is not
    fast enough, even when the NEON bit-sliced implementation of AES is
    used. Other AES alternatives such as Twofish, Threefish, Camellia,
    CAST6, and Serpent aren't fast enough either; it seems that only a
    modern ARX cipher can provide sufficient performance on these devices.

    This is a replacement for our original proposal
    (https://patchwork.kernel.org/patch/10101451/) which was to offer
    ChaCha20 for these devices. However, the use of a stream cipher for
    disk/file encryption with no space to store nonces would have been much
    more insecure than we thought initially, given that it would be used on
    top of flash storage as well as potentially on top of F2FS, neither of
    which is guaranteed to overwrite data in-place.

    Speck has been somewhat controversial due to its origin. Nevertheless,
    it has a straightforward design (it's an ARX cipher), and it appears to
    be the leading software-optimized lightweight block cipher currently,
    with the most cryptanalysis. It's also easy to implement without side
    channels, unlike AES. Moreover, we only intend Speck to be used when
    the status quo is no encryption, due to AES not being fast enough.
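
    To make "ARX" concrete, here is one Speck128 round as a sketch from
    the published specification (rotation constants 8 and 3; the key
    schedule is omitted):

    #include <stdint.h>

    static inline uint64_t ror64(uint64_t v, unsigned int n)
    {
            return (v >> n) | (v << (64 - n));
    }

    static inline uint64_t rol64(uint64_t v, unsigned int n)
    {
            return (v << n) | (v >> (64 - n));
    }

    /* Add, rotate, xor: no table lookups or data-dependent branches,
     * which is what makes constant-time implementations straightforward. */
    static void speck128_round(uint64_t *x, uint64_t *y, uint64_t k)
    {
            *x = ror64(*x, 8);
            *x += *y;
            *x ^= k;
            *y = rol64(*y, 3);
            *y ^= *x;
    }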

    We've also considered a novel length-preserving encryption mode based on
    ChaCha20 and Poly1305. While theoretically attractive, such a mode
    would be a brand new crypto construction and would be more complicated
    and difficult to implement efficiently in comparison to Speck-XTS.

    There is confusion about the byte and word orders of Speck, since the
    original paper doesn't specify them. But we have implemented it using
    the orders the authors recommended in a correspondence with them. The
    test vectors are taken from the original paper but were mapped to byte
    arrays using the recommended byte and word orders.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

20 Jan, 2018

1 commit

  • My last bugfix added -Os on the command line, which unfortunately caused
    a build regression on powerpc in some configurations.

    I've done some more analysis of the original problem and found a
    slightly different workaround that avoids this regression and also
    results in better performance on gcc-7.0: -fcode-hoisting is an
    optimization step that was added in gcc-7 and that causes worse
    performance here on all gcc-7 versions.

    This disables -fcode-hoisting on all compilers that understand the
    option. For gcc-7.1 and 7.2 I found the same performance as with my
    previous patch (using -Os); on gcc-7.0 it was even better. On gcc-8 I
    could see no change in performance from this patch. In theory, code
    hoisting should not be able to make things better for the AES cipher,
    so leaving it disabled for gcc-8 only serves to simplify the Makefile
    change.
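
    The kbuild change presumably amounts to a per-object flag like this
    (a sketch; cc-option drops the flag on compilers that don't
    understand it):

    # crypto/Makefile (sketch)
    CFLAGS_aes_generic.o := $(call cc-option,-fno-code-hoisting)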

    Reported-by: kbuild test robot
    Link: https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg30418.html
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
    Fixes: 148b974deea9 ("crypto: aes-generic - build with -Os on gcc-7+")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Herbert Xu

    Arnd Bergmann
     

12 Jan, 2018

1 commit

  • While testing other changes, I discovered that gcc-7.2.1 produces badly
    optimized code for aes_encrypt/aes_decrypt. This is especially true when
    CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
    large stack usage that in turn might cause kernel stack overflows:

    crypto/aes_generic.c: In function 'aes_encrypt':
    crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
    crypto/aes_generic.c: In function 'aes_decrypt':
    crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

    I verified that this problem exists on all architectures that are
    supported by gcc-7.2, though arm64 in particular is less affected than
    the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
    stack usage but still produce worse code than earlier versions for this
    file, apparently because of optimization passes that generally provide
    a substantial improvement in object code quality but understandably fail
    to find any shortcuts in the AES algorithm.

    Possible workarounds include

    a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier
    patch I tried, which reliably fixed the stack usage, but caused a
    serious performance regression in some versions, as later testing
    found.

    b) disabling UBSAN on this file or all ciphers, as suggested by Ard
    Biesheuvel. This would lead to massively better crypto performance in
    UBSAN-enabled kernels and avoid the stack usage, but there is a concern
    over whether we should exclude arbitrary files from UBSAN at all.

    c) Forcing the optimization level in a different way. Similar to a),
    but rather than deselecting specific optimization stages,
    this now uses "gcc -Os" for this file, regardless of the
    CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable
    workaround for the stack consumption on all architectures, and I've
    retested the performance results now on x86, cycles/byte (lower is
    better) for cbc(aes-generic) with 256 bit keys:

                  -O2    -Os
    gcc-6.3.1    14.9   15.1
    gcc-7.0.1    14.7   15.3
    gcc-7.1.1    15.3   14.7
    gcc-7.2.1    16.8   15.9
    gcc-8.0.0    15.5   15.6

    This implements option c) by forcing -Os on all compiler versions
    starting with gcc-7.1. As a workaround for PR83356, it would
    only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows
    better performance on gcc-7.1 without UBSAN, it seems appropriate to
    use the faster version here as well.

    Side note: during testing, I also played with the AES code in libressl,
    which had a similar performance regression from gcc-6 to gcc-7.2,
    but was three times slower overall. It might be interesting to
    investigate that further and possibly port the Linux implementation
    into that.

    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
    Cc: Richard Biener
    Cc: Jakub Jelinek
    Cc: Ard Biesheuvel
    Signed-off-by: Arnd Bergmann
    Acked-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Arnd Bergmann
     

15 Nov, 2017

1 commit

  • Pull crypto updates from Herbert Xu:
    "Here is the crypto update for 4.15:

    API:

    - Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
    This change touches code outside the crypto API.
    - Reset settings when empty string is written to rng_current.

    Algorithms:

    - Add OSCCA SM3 secure hash.

    Drivers:

    - Remove old mv_cesa driver (replaced by marvell/cesa).
    - Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
    - Add ccm/gcm AES in crypto4xx.
    - Add support for BCM7278 in iproc-rng200.
    - Add hash support on Exynos in s5p-sss.
    - Fix fallback-induced error in vmx.
    - Fix output IV in atmel-aes.
    - Fix empty GCM hash in mediatek.

    Others:

    - Fix DoS potential in lib/mpi.
    - Fix potential out-of-order issues with padata"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
    lib/mpi: call cond_resched() from mpi_powm() loop
    crypto: stm32/hash - Fix return issue on update
    crypto: dh - Remove pointless checks for NULL 'p' and 'g'
    crypto: qat - Clean up error handling in qat_dh_set_secret()
    crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
    crypto: dh - Don't permit 'p' to be 0
    crypto: dh - Fix double free of ctx->p
    hwrng: iproc-rng200 - Add support for BCM7278
    dt-bindings: rng: Document BCM7278 RNG200 compatible
    crypto: chcr - Replace _manual_ swap with swap macro
    crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]
    hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume
    crypto: atmel - remove empty functions
    crypto: ecdh - remove empty exit()
    MAINTAINERS: update maintainer for qat
    crypto: caam - remove unused param of ctx_map_to_sec4_sg()
    crypto: caam - remove unneeded edesc zeroization
    crypto: atmel-aes - Reset the controller before each use
    crypto: atmel-aes - properly set IV after {en,de}crypt
    hwrng: core - Reset user selected rng by writing "" to rng_current
    ...

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    <5 lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Jun, 2017

1 commit

  • Add support for generating ecc private keys.

    Generation of ecc private keys is helpful in a user-space to kernel
    ecdh offload because the keys are not revealed to user-space. Private
    key generation is also helpful to implement forward secrecy.

    If the user provides a NULL ecc private key, the kernel will generate it
    and further use it for ecdh.
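
    A hedged sketch of what this looks like from the caller's side
    (struct layout and helpers approximate the kernel's; curve selection
    and buffer sizing are simplified):

    #include <crypto/ecdh.h>
    #include <crypto/kpp.h>
    #include <linux/errno.h>

    /* An empty private key asks the kernel to generate one. */
    static int set_generated_key(struct crypto_kpp *tfm)
    {
            struct ecdh p = { .key = NULL, .key_size = 0 };
            char buf[128];
            unsigned int len = crypto_ecdh_key_len(&p);
            int ret;

            if (len > sizeof(buf))
                    return -EINVAL;
            ret = crypto_ecdh_encode_key(buf, len, &p);
            if (ret)
                    return ret;
            return crypto_kpp_set_secret(tfm, buf, len);
    }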

    Move ecdh's object files below drbg's. drbg must be present in the kernel
    at the time of calling.

    Signed-off-by: Tudor Ambarus
    Reviewed-by: Stephan Müller
    Signed-off-by: Herbert Xu

    Tudor-Dan Ambarus
     

11 Feb, 2017

2 commits

  • An ancient gcc bug (first reported in 2003) has apparently resurfaced
    on MIPS, where kernelci.org reports an overly large stack frame in the
    whirlpool hash algorithm:

    crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

    With some testing in different configurations, I'm seeing large
    variations in stack frame size, up to 1500 bytes for what should need
    around 300 bytes at most. I also checked the reference implementation,
    which is essentially the same code but also comes with some test and
    benchmarking infrastructure.

    It seems that recent compiler versions on at least arm, arm64 and powerpc
    have a partial fix for this problem by enabling "-fsched-pressure", but
    even with that fix they still suffer from the issue to a certain
    degree. Some
    testing on arm64 shows that the time needed to hash a given amount of
    data is roughly proportional to the stack frame size here, which makes
    sense given that the wp512 implementation is doing lots of loads for
    table lookups, and the problem with the overly large stack is a result
    of doing a lot more loads and stores for spilled registers (as seen from
    inspecting the object code).

    Disabling -fschedule-insns consistently fixes the problem for wp512.
    Across my collection of cross-compilers, the results are consistently
    better or identical when comparing the stack sizes in this function,
    though some architectures (notably x86) have schedule-insns disabled
    by default.

    The four columns are:
    default: -O2
    press:   -O2 -fsched-pressure
    nopress: -O2 -fschedule-insns -fno-sched-pressure
    nosched: -O2 -fno-schedule-insns (disables sched-pressure)

                                 default  press  nopress  nosched
    alpha-linux-gcc-4.9.3           1136    848     1136      176
    am33_2.0-linux-gcc-4.9.3        2100   2076     2100     2104
    arm-linux-gnueabi-gcc-4.9.3      848    848     1048      352
    cris-linux-gcc-4.9.3             272    272      272      272
    frv-linux-gcc-4.9.3             1128   1000     1128      280
    hppa64-linux-gcc-4.9.3          1128    336     1128      184
    hppa-linux-gcc-4.9.3             644    308      644      276
    i386-linux-gcc-4.9.3             352    352      352      352
    m32r-linux-gcc-4.9.3             720    656      720      268
    microblaze-linux-gcc-4.9.3      1108    604     1108      256
    mips64-linux-gcc-4.9.3          1328    592     1328      208
    mips-linux-gcc-4.9.3            1096    624     1096      240
    powerpc64-linux-gcc-4.9.3       1088    432     1088      160
    powerpc-linux-gcc-4.9.3         1080    584     1080      224
    s390-linux-gcc-4.9.3             456    456      624      360
    sh3-linux-gcc-4.9.3              292    292      292      292
    sparc64-linux-gcc-4.9.3          992    240      992      208
    sparc-linux-gcc-4.9.3            680    592      680      312
    x86_64-linux-gcc-4.9.3           224    240      272      224
    xtensa-linux-gcc-4.9.3          1152    704     1152      304

    aarch64-linux-gcc-7.0.0          224    224     1104      208
    arm-linux-gnueabi-gcc-7.0.1      824    824     1048      352
    mips-linux-gcc-7.0.0            1120    648     1120      272
    x86_64-linux-gcc-7.0.1           240    240      304      240

    arm-linux-gnueabi-gcc-4.4.7      840      -        -      392
    arm-linux-gnueabi-gcc-4.5.4      784    728      784      320
    arm-linux-gnueabi-gcc-4.6.4      736    728      736      304
    arm-linux-gnueabi-gcc-4.7.4      944    784      944      352
    arm-linux-gnueabi-gcc-4.8.5      464    464      760      352
    arm-linux-gnueabi-gcc-4.9.3      848    848     1048      352
    arm-linux-gnueabi-gcc-5.3.1      824    824     1064      336
    arm-linux-gnueabi-gcc-6.1.1      808    808     1056      344
    arm-linux-gnueabi-gcc-7.0.1      824    824     1048      352

    Trying the same test for serpent-generic, the picture is a bit different,
    and while -fno-schedule-insns is generally better here than the default,
    -fsched-pressure wins overall, so I picked that instead.

                                 default  press  nopress  nosched
    alpha-linux-gcc-4.9.3           1392    864     1392      960
    am33_2.0-linux-gcc-4.9.3         536    524      536      528
    arm-linux-gnueabi-gcc-4.9.3      552    552      776      536
    cris-linux-gcc-4.9.3             528    528      528      528
    frv-linux-gcc-4.9.3              536    400      536      504
    hppa64-linux-gcc-4.9.3           524    208      524      480
    hppa-linux-gcc-4.9.3             768    472      768      508
    i386-linux-gcc-4.9.3             564    564      564      564
    m32r-linux-gcc-4.9.3             712    576      712      532
    microblaze-linux-gcc-4.9.3       724    392      724      512
    mips64-linux-gcc-4.9.3           720    384      720      496
    mips-linux-gcc-4.9.3             728    384      728      496
    powerpc64-linux-gcc-4.9.3        704    304      704      480
    powerpc-linux-gcc-4.9.3          704    296      704      480
    s390-linux-gcc-4.9.3             560    560      592      536
    sh3-linux-gcc-4.9.3              540    540      540      540
    sparc64-linux-gcc-4.9.3          544    352      544      496
    sparc-linux-gcc-4.9.3            544    344      544      496
    x86_64-linux-gcc-4.9.3           528    536      576      528
    xtensa-linux-gcc-4.9.3           752    544      752      544

    aarch64-linux-gcc-7.0.0          432    432      656      480
    arm-linux-gnueabi-gcc-7.0.1      616    616      808      536
    mips-linux-gcc-7.0.0             720    464      720      488
    x86_64-linux-gcc-7.0.1           536    528      600      536

    arm-linux-gnueabi-gcc-4.4.7      592      -        -      440
    arm-linux-gnueabi-gcc-4.5.4      776    448      776      544
    arm-linux-gnueabi-gcc-4.6.4      776    448      776      544
    arm-linux-gnueabi-gcc-4.7.4      768    448      768      544
    arm-linux-gnueabi-gcc-4.8.5      488    488      776      544
    arm-linux-gnueabi-gcc-4.9.3      552    552      776      536
    arm-linux-gnueabi-gcc-5.3.1      552    552      776      536
    arm-linux-gnueabi-gcc-6.1.1      560    560      776      536
    arm-linux-gnueabi-gcc-7.0.1      616    616      808      536

    I did not do any runtime tests with serpent, so it is possible that stack
    frame size does not directly correlate with runtime performance here and
    it actually makes things worse, but it's more likely to help here, and
    the reduced stack frame size is probably enough reason to apply the patch,
    especially given that the crypto code is often used in deep call chains.
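
    The Makefile change presumably boils down to per-object flags along
    these lines (a sketch; cc-option drops flags the compiler does not
    understand):

    # crypto/Makefile (sketch)
    CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)
    CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)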

    Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
    Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
    Cc: Ralf Baechle
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Herbert Xu

    Arnd Bergmann
     
  • Lookup table based AES is sensitive to timing attacks, which is due to
    the fact that such table lookups are data dependent, and the fact that
    8 KB worth of tables covers a significant number of cachelines on any
    architecture, resulting in an exploitable correlation between the key
    and the processing time for known plaintexts.

    For network facing algorithms such as CTR, CCM or GCM, this presents a
    security risk, which is why arch specific AES ports are typically time
    invariant, either through the use of special instructions, or by using
    SIMD algorithms that don't rely on table lookups.

    For generic code, this is difficult to achieve without losing too much
    performance, but we can improve the situation significantly by switching
    to an implementation that only needs 256 bytes of table data (the actual
    S-box itself), which can be prefetched at the start of each block to
    eliminate data dependent latencies.
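
    A sketch of that prefetch idea (the cacheline size and names are
    illustrative; reading through a volatile pointer keeps the loads
    from being optimized away):

    #define CACHELINE_BYTES 64

    /* Touch every cacheline of the 256-byte S-box before the first
     * data-dependent lookup, so later accesses hit the L1 cache and the
     * timing no longer correlates with key or plaintext bytes. */
    static void prefetch_sbox(const volatile unsigned char sbox[256])
    {
            unsigned int i;

            for (i = 0; i < 256; i += CACHELINE_BYTES)
                    (void)sbox[i];
    }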

    This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
    ordinary generic AES driver manages 18 cycles per byte on this
    hardware). Decryption is substantially slower.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Herbert Xu

    Ard Biesheuvel
     

28 Nov, 2016

1 commit

  • This patch adds the simd skcipher helper which is meant to be
    a replacement for ablk helper. It replaces the underlying blkcipher
    interface with skcipher, and also presents the top-level algorithm
    as an skcipher.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

18 Jul, 2016

1 commit

  • This patch removes the old crypto_grab_skcipher helper and replaces
    it with crypto_grab_skcipher2.

    As this is the final entry point into givcipher this patch also
    removes all traces of the top-level givcipher interface, including
    all implicit IV generators such as chainiv.

    The bottom-level givcipher interface remains until the drivers
    using it are converted.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

23 Jun, 2016

2 commits