Eric Lee / smarc-fsl-linux-kernel

16 Oct, 2016

1 commit

9ffc66941 Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).

At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin

Linus Torvalds
2016-10-16 01:03:15 +0800

15 Oct, 2016

1 commit

84d69848c Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull kbuild updates from Michal Marek:

- EXPORT_SYMBOL for asm source by Al Viro.

This does bring a regression, because genksyms no longer generates
checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
working on a patch to fix this.

Plus, we are talking about functions like strcpy(), which rarely
change prototypes.

- Fixes for PPC fallout of the above by Stephen Rothwell and Nick
Piggin

- fixdep speedup by Alexey Dobriyan.

- preparatory work by Nick Piggin to allow architectures to build with
-ffunction-sections, -fdata-sections and --gc-sections

- CONFIG_THIN_ARCHIVES support by Stephen Rothwell

- fix for filenames with colons in the initramfs source by me.

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
initramfs: Escape colons in depfile
ppc: there is no clear_pages to export
powerpc/64: whitelist unresolved modversions CRCs
kbuild: -ffunction-sections fix for archs with conflicting sections
kbuild: add arch specific post-link Makefile
kbuild: allow archs to select link dead code/data elimination
kbuild: allow architectures to use thin archives instead of ld -r
kbuild: Regenerate genksyms lexer
kbuild: genksyms fix for typeof handling
fixdep: faster CONFIG_ search
ia64: move exports to definitions
sparc32: debride memcpy.S a bit
[sparc] unify 32bit and 64bit string.h
sparc: move exports to definitions
ppc: move exports to definitions
arm: move exports to definitions
s390: move exports to definitions
m68k: move exports to definitions
alpha: move exports to actual definitions
x86: move exports to actual definitions
...

Linus Torvalds
2016-10-15 05:26:58 +0800

11 Oct, 2016

1 commit

38addce8b gcc-plugins: Add latent_entropy plugin ... Browse Code »

This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
possible, hoping to capitalize on any possible variation in CPU operation
(due to runtime data differences, hardware differences, SMP ordering,
thermal timing variation, cache behavior, etc).

At the very least, this plugin is a much more comprehensive example for
how to manipulate kernel code using the gcc plugin internals.

The need for very-early boot entropy tends to be very architecture or
system design specific, so this plugin is more suited for those sorts
of special cases. The existing kernel RNG already attempts to extract
entropy from reliable runtime variation, but this plugin takes the idea to
a logical extreme by permuting a global variable based on any variation
in code execution (e.g. a different value (and permutation function)
is used to permute the global based on loop count, case statement,
if/then/else branching, etc).

To do this, the plugin starts by inserting a local variable in every
marked function. The plugin then adds logic so that the value of this
variable is modified by randomly chosen operations (add, xor and rol) and
random values (gcc generates separate static values for each location at
compile time and also injects the stack pointer at runtime). The resulting
value depends on the control flow path (e.g., loops and branches taken).

Before the function returns, the plugin mixes this local variable into
the latent_entropy global variable. The value of this global variable
is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
though it does not credit any bytes of entropy to the pool; the contents
of the global are just used to mix the pool.

Additionally, the plugin can pre-initialize arrays with build-time
random contents, so that two different kernel builds running on identical
hardware will not have the same starting values.

Signed-off-by: Emese Revfy
[kees: expanded commit message and code comments]
Signed-off-by: Kees Cook

Emese Revfy
2016-10-11 05:51:44 +0800

22 Sep, 2016

1 commit

0f4c4af06 kbuild: -ffunction-sections fix for archs with conflicting sections ... Browse Code »

Enabling -ffunction-sections modified the generic linker script to
pull .text.* sections into regular TEXT_TEXT section, conflicting
with some architectures. Revert that change and require archs that
enable the option to ensure they have no conflicting section names,
and do the appropriate merging.

Reported-by: Guenter Roeck
Tested-by: Guenter Roeck
Fixes: b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin
Signed-off-by: Michal Marek

Nicholas Piggin
2016-09-22 20:37:14 +0800

15 Sep, 2016

1 commit

d4b80afbb Merge branch 'linus' into x86/asm, to pick up recent fixes ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2016-09-15 14:24:53 +0800

09 Sep, 2016

2 commits

b67067f11 kbuild: allow archs to select link dead code/data elimination ... Browse Code »

Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.

On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet, so
these savings may be reduced if there are bugs in the link.

text data bss dec filename
11169741 1180744 1923176 14273661 vmlinux
10445269 1004127 1919707 13369103 vmlinux.dce

~700K text, ~170K data, 6% removed from kernel image size.

Signed-off-by: Nicholas Piggin
Signed-off-by: Michal Marek

Nicholas Piggin
2016-09-09 16:47:00 +0800
a5967db9a kbuild: allow architectures to use thin archives instead of ld -r ... Browse Code »

ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.

Alan Modra has recommended the kernel use thin archives for linking.
This is an alternative and means that the linker has more information
available to it when it links the kernel.

This patch enables a config option architectures can select, which
causes all built-in.o files to be built as thin archives. built-in.o
files in subdirectories do not get symbol table or index attached,
which improves speed and size. The final link pass creates a
built-in.o archive in the root output directory which includes the
symbol table and index. The linker then uses takes this file to link.

The --whole-archive linker option is required, because the linker now
has visibility to every individual object file, and it will otherwise
just completely avoid including those without external references
(consider a file with EXPORT_SYMBOL or initcall or hardware exceptions
as its only entry points). The traditional built works "by luck" as
built-in.o files are large enough that they're going to get external
references. However this optimisation is unpredictable for the kernel
(due to above external references), ineffective at culling unused, and
costly because the .o files have to be searched for references.
Superior alternatives for link-time culling should be used instead.

Build characteristics for inclink vs thinarc, on a small powerpc64le
pseries VM with a modest .config:

inclink thinarc
sizes
vmlinux 15 618 680 15 625 028
sum of all built-in.o 56 091 808 1 054 334
sum excluding root built-in.o 151 430

find -name built-in.o | xargs rm ; time make vmlinux
real 22.772s 21.143s
user 13.280s 13.430s
sys 4.310s 2.750s

- Final kernel pulled in only about 6K more, which shows how
ineffective the object file culling is.
- Build performance looks improved due to less pagecache activity.
On IO constrained systems it could be a bigger win.
- Build size saving is significant.

Side note, the toochain understands archives, so there's some tricks,
$ ar t built-in.o # list all files you linked with
$ size built-in.o # and their sizes
$ objdump -d built-in.o # disassembly (unrelocated) with filenames

Implementation by sfr, minor tweaks by npiggin.

Signed-off-by: Stephen Rothwell
Signed-off-by: Nicholas Piggin
Signed-off-by: Michal Marek

Stephen Rothwell
2016-09-09 16:31:19 +0800

08 Sep, 2016

1 commit

4fadd04d5 seccomp: Remove 2-phase API documentation ... Browse Code »

Fixes: 8112c4f140fa ("seccomp: remove 2-phase API")

Signed-off-by: Mickaël Salaün
Acked-by: Kees Cook
Cc: Andy Lutomirski
Cc: James Morris
Signed-off-by: James Morris
Signed-off-by: Kees Cook

Mickaël Salaün
2016-09-08 00:25:05 +0800

24 Aug, 2016

1 commit

ba14a194a fork: Add generic vmalloced stack support ... Browse Code »

If CONFIG_VMAP_STACK=y is selected, kernel stacks are allocated with
__vmalloc_node_range().

Grsecurity has had a similar feature (called GRKERNSEC_KSTACKOVERFLOW=y)
for a long time.

Signed-off-by: Andy Lutomirski
Acked-by: Michal Hocko
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Dmitry Vyukov
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/14c07d4fd173a5b117f51e8b939f9f4323e39899.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar

Andy Lutomirski
2016-08-24 18:11:41 +0800

09 Aug, 2016

1 commit

1eccfa090 Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull usercopy protection from Kees Cook:
"Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
copy_from_user bounds checking for most architectures on SLAB and
SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mm: SLUB hardened usercopy support
mm: SLAB hardened usercopy support
s390/uaccess: Enable hardened usercopy
sparc/uaccess: Enable hardened usercopy
powerpc/uaccess: Enable hardened usercopy
ia64/uaccess: Enable hardened usercopy
arm64/uaccess: Enable hardened usercopy
ARM: uaccess: Enable hardened usercopy
x86/uaccess: Enable hardened usercopy
mm: Hardened usercopy
mm: Implement stack frame object validation
mm: Add is_migrate_cma_page

Linus Torvalds
2016-08-09 05:48:14 +0800

03 Aug, 2016

1 commit

f716a85cd Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull kbuild updates from Michal Marek:

- GCC plugin support by Emese Revfy from grsecurity, with a fixup from
Kees Cook. The plugins are meant to be used for static analysis of
the kernel code. Two plugins are provided already.

- reduction of the gcc commandline by Arnd Bergmann.

- IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

- bin2c fix by Michael Tautschnig

- setlocalversion fix by Wolfram Sang

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
gcc-plugins: disable under COMPILE_TEST
kbuild: Abort build on bad stack protector flag
scripts: Fix size mismatch of kexec_purgatory_size
kbuild: make samples depend on headers_install
Kbuild: don't add obj tree in additional includes
Kbuild: arch: look for generated headers in obtree
Kbuild: always prefix objtree in LINUXINCLUDE
Kbuild: avoid duplicate include path
Kbuild: don't add ../../ to include path
vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
kconfig.h: use already defined macros for IS_REACHABLE() define
export.h: use __is_defined() to check if __KSYM_* is defined
kconfig.h: use __is_defined() to check if MODULE is defined
kbuild: setlocalversion: print error to STDERR
Add sancov plugin
Add Cyclomatic complexity GCC plugin
GCC plugin infrastructure
Shared library support

Linus Torvalds
2016-08-03 04:37:12 +0800

27 Jul, 2016

2 commits

a519167e7 gcc-plugins: disable under COMPILE_TEST ... Browse Code »

Since adding the gcc plugin development headers is required for the
gcc plugin support, we should ease into this new kernel build dependency
more slowly. For now, disable the gcc plugins under COMPILE_TEST so that
all*config builds will skip it.

Signed-off-by: Kees Cook
Signed-off-by: Michal Marek

Kees Cook
2016-07-27 06:08:54 +0800
0f60a8efe mm: Implement stack frame object validation ... Browse Code »

This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.

This is based on code from PaX.

Signed-off-by: Kees Cook

Kees Cook
2016-07-27 05:41:47 +0800

25 Jun, 2016

1 commit

b235beea9 Clarify naming of thread info/stack allocators ... Browse Code »

We've had the thread info allocated together with the thread stack for
most architectures for a long time (since the thread_info was split off
from the task struct), but that is about to change.

But the patches that move the thread info to be off-stack (and a part of
the task struct instead) made it clear how confused the allocator and
freeing functions are.

Because the common case was that we share an allocation with the thread
stack and the thread_info, the two pointers were identical. That
identity then meant that we would have things like

ti = alloc_thread_info_node(tsk, node);
...
tsk->stack = ti;

which certainly _worked_ (since stack and thread_info have the same
value), but is rather confusing: why are we assigning a thread_info to
the stack? And if we move the thread_info away, the "confusing" code
just gets to be entirely bogus.

So remove all this confusion, and make it clear that we are doing the
stack allocation by renaming and clarifying the function names to be
about the stack. The fact that the thread_info then shares the
allocation is an implementation detail, and not really about the
allocation itself.

This is a pure renaming and type fix: we pass in the same pointer, it's
just that we clarify what the pointer means.

The ia64 code that actually only has one single allocation (for all of
task_struct, thread_info and kernel thread stack) now looks a bit odd,
but since "tsk->stack" is actually not even used there, that oddity
doesn't matter. It would be a separate thing to clean that up, I
intentionally left the ia64 changes as a pure brute-force renaming and
type change.

Acked-by: Andy Lutomirski
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-06-25 06:09:37 +0800

18 Jun, 2016

1 commit

3a4955111 isa: Allow ISA-style drivers on modern systems ... Browse Code »

Several modern devices, such as PC/104 cards, are expected to run on
modern systems via an ISA bus interface. Since ISA is a legacy interface
for most modern architectures, ISA support should remain disabled in
general. Support for ISA-style drivers should be enabled on a per driver
basis.

To allow ISA-style drivers on modern systems, this patch introduces the
ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
build conditionally on the ISA_BUS_API Kconfig option, which defaults to
the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
ISA_BUS_API Kconfig option to be selected on architectures which do not
enable ISA (e.g. X86_64).

The ISA_BUS Kconfig option is currently only implemented for X86
architectures. Other architectures may have their own ISA_BUS Kconfig
options added as required.

Reviewed-by: Guenter Roeck
Signed-off-by: William Breathitt Gray
Acked-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

William Breathitt Gray
2016-06-18 11:21:12 +0800

08 Jun, 2016

3 commits

543c37cb1 Add sancov plugin ... Browse Code »

The sancov gcc plugin inserts a __sanitizer_cov_trace_pc() call
at the start of basic blocks.

This plugin is a helper plugin for the kcov feature. It supports
all gcc versions with plugin support (from gcc-4.5 on).
It is based on the gcc commit "Add fuzzing coverage support" by Dmitry Vyukov
(https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=231296).

Signed-off-by: Emese Revfy
Acked-by: Kees Cook
Signed-off-by: Michal Marek

Emese Revfy
2016-06-08 04:57:10 +0800
0dae776c6 Add Cyclomatic complexity GCC plugin ... Browse Code »

Add a very simple plugin to demonstrate the GCC plugin infrastructure. This GCC
plugin computes the cyclomatic complexity of each function.

The complexity M of a function's control flow graph is defined as:
M = E - N + 2P
where
E = the number of edges
N = the number of nodes
P = the number of connected components (exit nodes).

Signed-off-by: Emese Revfy
Acked-by: Kees Cook
Signed-off-by: Michal Marek

Emese Revfy
2016-06-08 04:57:10 +0800
6b90bd4ba GCC plugin infrastructure ... Browse Code »

This patch allows to build the whole kernel with GCC plugins. It was ported from
grsecurity/PaX. The infrastructure supports building out-of-tree modules and
building in a separate directory. Cross-compilation is supported too.
Currently the x86, arm, arm64 and uml architectures enable plugins.

The directory of the gcc plugins is scripts/gcc-plugins. You can use a file or a directory
there. The plugins compile with these options:
* -fno-rtti: gcc is compiled with this option so the plugins must use it too
* -fno-exceptions: this is inherited from gcc too
* -fasynchronous-unwind-tables: this is inherited from gcc too
* -ggdb: it is useful for debugging a plugin (better backtrace on internal
errors)
* -Wno-narrowing: to suppress warnings from gcc headers (ipa-utils.h)
* -Wno-unused-variable: to suppress warnings from gcc headers (gcc_version
variable, plugin-version.h)

The infrastructure introduces a new Makefile target called gcc-plugins. It
supports all gcc versions from 4.5 to 6.0. The scripts/gcc-plugin.sh script
chooses the proper host compiler (gcc-4.7 can be built by either gcc or g++).
This script also checks the availability of the included headers in
scripts/gcc-plugins/gcc-common.h.

The gcc-common.h header contains frequently included headers for GCC plugins
and it has a compatibility layer for the supported gcc versions.

The gcc-generate-*-pass.h headers automatically generate the registration
structures for GIMPLE, SIMPLE_IPA, IPA and RTL passes.

Note that 'make clean' keeps the *.so files (only the distclean or mrproper
targets clean all) because they are needed for out-of-tree modules.

Based on work created by the PaX Team.

Signed-off-by: Emese Revfy
Acked-by: Kees Cook
Signed-off-by: Michal Marek

Emese Revfy
2016-06-08 04:57:10 +0800

29 May, 2016

2 commits

7e0fb73c5 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux ... Browse Code »

Pull string hash improvements from George Spelvin:
"This series does several related things:

- Makes the dcache hash (fs/namei.c) useful for general kernel use.

(Thanks to Bruce for noticing the zero-length corner case)

- Converts the string hashes in to use the
above.

- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.

- Rids the world of the bad hash multipliers in hash_32.

This finishes the job started in commit 689de1d6ca95 ("Minimal
fix-up of bad hashing behavior of hash_64()")

The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.

The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.

- Overhauls the dcache hash mixing.

The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)

- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.

- Sort out partial_name_hash().

The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:

- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"

Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)

On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add
microblaze: Add
m68k: Add
: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to

Linus Torvalds
2016-05-29 07:15:25 +0800
468a94285 <linux/hash.h>: Add support for architecture-specific functions ... Browse Code »

This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of .

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin
Cc: Geert Uytterhoeven
Cc: Greg Ungerer
Cc: Andreas Schwab
Cc: Philippe De Muyter
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis
Cc: Michal Simek
Cc: Yoshinori Sato
Cc: uclinux-h8-devel@lists.sourceforge.jp

George Spelvin
2016-05-29 03:48:31 +0800

21 May, 2016

3 commits

fff7fb0b2 lib/GCD.c: use binary GCD algorithm instead of Euclidean ... Browse Code »

The binary GCD algorithm is based on the following facts:
1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available. This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

#include
#include
#include
#include
#include
#include

#define swap(a, b) \
do { \
a ^= b; \
b ^= a; \
a ^= b; \
} while (0)

unsigned long gcd0(unsigned long a, unsigned long b)
{
unsigned long r;

if (a < b) {
swap(a, b);
}

if (b == 0)
return a;

while ((r = a % b) != 0) {
a = b;
b = r;
}

return b;
}

unsigned long gcd1(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

b >>= __builtin_ctzl(b);

for (;;) {
a >>= __builtin_ctzl(a);
if (a == b)
return a << __builtin_ctzl(r);

if (a < b)
swap(a, b);
a -= b;
}
}

unsigned long gcd2(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

r &= -r;

while (!(b & r))
b >>= 1;

for (;;) {
while (!(a & r))
a >>= 1;
if (a == b)
return a;

if (a < b)
swap(a, b);
a -= b;
a >>= 1;
if (a & r)
a += b;
a >>= 1;
}
}

unsigned long gcd3(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

b >>= __builtin_ctzl(b);
if (b == 1)
return r & -r;

for (;;) {
a >>= __builtin_ctzl(a);
if (a == 1)
return r & -r;
if (a == b)
return a << __builtin_ctzl(r);

if (a < b)
swap(a, b);
a -= b;
}
}

unsigned long gcd4(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

r &= -r;

while (!(b & r))
b >>= 1;
if (b == r)
return r;

for (;;) {
while (!(a & r))
a >>= 1;
if (a == r)
return r;
if (a == b)
return a;

if (a < b)
swap(a, b);
a -= b;
a >>= 1;
if (a & r)
a += b;
a >>= 1;
}
}

static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
gcd0, gcd1, gcd2, gcd3, gcd4,
};

#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

#if defined(__x86_64__)

#define rdtscll(val) do { \
unsigned long __a,__d; \
__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<= start)
ret = end - start;
else
ret = ~0ULL - start + 1 + end;

*res = gcd_res;
return ret;
}

#else

static inline struct timespec read_time(void)
{
struct timespec time;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
return time;
}

static inline unsigned long long diff_time(struct timespec start, struct timespec end)
{
struct timespec temp;

if ((end.tv_nsec - start.tv_nsec) < 0) {
temp.tv_sec = end.tv_sec - start.tv_sec - 1;
temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
} else {
temp.tv_sec = end.tv_sec - start.tv_sec;
temp.tv_nsec = end.tv_nsec - start.tv_nsec;
}

return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
}

static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
unsigned long a, unsigned long b, unsigned long *res)
{
struct timespec start, end;
unsigned long gcd_res;

start = read_time();
gcd_res = gcd(a, b);
end = read_time();

*res = gcd_res;
return diff_time(start, end);
}

#endif

static inline unsigned long get_rand()
{
if (sizeof(long) == 8)
return (unsigned long)rand() << 32 | rand();
else
return rand();
}

int main(int argc, char **argv)
{
unsigned int seed = time(0);
int loops = 100;
int repeats = 1000;
unsigned long (*res)[TEST_ENTRIES];
unsigned long long elapsed[TEST_ENTRIES];
int i, j, k;

for (;;) {
int opt = getopt(argc, argv, "n:r:s:");
/* End condition always first */
if (opt == -1)
break;

switch (opt) {
case 'n':
loops = atoi(optarg);
break;
case 'r':
repeats = atoi(optarg);
break;
case 's':
seed = strtoul(optarg, NULL, 10);
break;
default:
/* You won't actually get here. */
break;
}
}

res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
memset(elapsed, 0, sizeof(elapsed));

srand(seed);
for (j = 0; j < loops; j++) {
unsigned long a = get_rand();
/* Do we have args? */
unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
unsigned long long min_elapsed[TEST_ENTRIES];
for (k = 0; k < repeats; k++) {
for (i = 0; i < TEST_ENTRIES; i++) {
unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
if (k == 0 || min_elapsed[i] > tmp)
min_elapsed[i] = tmp;
}
}
for (i = 0; i < TEST_ENTRIES; i++)
elapsed[i] += min_elapsed[i];
}

for (i = 0; i < TEST_ENTRIES; i++)
printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

k = 0;
srand(seed);
for (j = 0; j < loops; j++) {
unsigned long a = get_rand();
unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
for (i = 1; i < TEST_ENTRIES; i++) {
if (res[j][i] != res[j][0])
break;
}
if (i < TEST_ENTRIES) {
if (k == 0) {
k = 1;
fprintf(stderr, "Error:\n");
}
fprintf(stderr, "gcd(%lu, %lu): ", a, b);
for (i = 0; i < TEST_ENTRIES; i++)
fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
}
}

if (k == 0)
fprintf(stderr, "PASS\n");

free(res);

return 0;
}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 10174
gcd1: elapsed 2120
gcd2: elapsed 2902
gcd3: elapsed 2039
gcd4: elapsed 2812
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9309
gcd1: elapsed 2280
gcd2: elapsed 2822
gcd3: elapsed 2217
gcd4: elapsed 2710
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9589
gcd1: elapsed 2098
gcd2: elapsed 2815
gcd3: elapsed 2030
gcd4: elapsed 2718
PASS
zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
gcd0: elapsed 9914
gcd1: elapsed 2309
gcd2: elapsed 2779
gcd3: elapsed 2228
gcd4: elapsed 2709
PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng
Signed-off-by: George Spelvin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhaoxiu Zeng
2016-05-21 08:58:30 +0800
42a0bb3f7 printk/nmi: generic solution for safe printk in NMI ... Browse Code »

printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").

The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited. We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers. These are not easy to avoid.

This patch reuses most of the code and makes it generic. It is useful
for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers. There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock. It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven
Rostedt. It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter(). This is achieved by the new
HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures. We need to clean up NMI
handling there first. Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see

https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek
Suggested-by: Peter Zijlstra
Suggested-by: Steven Rostedt
Cc: Jan Kara
Acked-by: Russell King [arm part]
Cc: Daniel Thompson
Cc: Jiri Kosina
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: David Miller
Cc: Daniel Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Mladek
2016-05-21 08:58:30 +0800
5f56a5dfd exit_thread: remove empty bodies ... Browse Code »

Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.

This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.

[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby
Cc: "David S. Miller"
Cc: "H. Peter Anvin"
Cc: "James E.J. Bottomley"
Cc: Aurelien Jacquiot
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Chen Liqin
Cc: Chris Metcalf
Cc: Chris Zankel
Cc: David Howells
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Guan Xuetao
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: James Hogan
Cc: Jeff Dike
Cc: Jesper Nilsson
Cc: Jiri Slaby
Cc: Jonas Bonn
Cc: Koichi Yasutake
Cc: Lennox Wu
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Martin Schwidefsky
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Mikael Starvik
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Richard Henderson
Cc: Richard Kuo
Cc: Richard Weinberger
Cc: Russell King
Cc: Steven Miao
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2016-05-21 08:58:30 +0800

29 Feb, 2016

1 commit

b9ab5ebb1 objtool: Add CONFIG_STACK_VALIDATION option ... Browse Code »

Add a CONFIG_STACK_VALIDATION option which will run "objtool check" for
each .o file to ensure the validity of its stack metadata.

Signed-off-by: Josh Poimboeuf
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Bernd Petrovitsch
Cc: Borislav Petkov
Cc: Chris J Arges
Cc: Jiri Slaby
Cc: Linus Torvalds
Cc: Michal Marek
Cc: Namhyung Kim
Cc: Pedro Alves
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/92baab69a6bf9bc7043af0bfca9fb964a1d45546.1456719558.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2016-02-29 15:35:13 +0800

21 Jan, 2016

2 commits

e1c7e3245 dma-mapping: always provide the dma_map_ops based implementation ... Browse Code »

Move the generic implementation to now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig
Cc: "David S. Miller"
Cc: Aurelien Jacquiot
Cc: Chris Metcalf
Cc: David Howells
Cc: Geert Uytterhoeven
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Helge Deller
Cc: James Hogan
Cc: Jesper Nilsson
Cc: Koichi Yasutake
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Mikael Starvik
Cc: Steven Miao
Cc: Vineet Gupta
Cc: Christian Borntraeger
Cc: Joerg Roedel
Cc: Sebastian Ott
Signed-off-by: Valentin Rothberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2016-01-21 09:09:18 +0800
0d4a619b6 dma-mapping: make the generic coherent dma mmap implementation optional ... Browse Code »

This series converts all remaining architectures to use dma_map_ops and
the generic implementation of the DMA API. This not only simplifies the
code a lot, but also prepares for possible future changes like more
generic non-iommu dma_ops implementations or generic per-device
dma_map_ops.

This patch (of 16):

We have a couple architectures that do not want to support this code, so
add another Kconfig symbol that disables the code similar to what we do
for the nommu case.

Signed-off-by: Christoph Hellwig
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Steven Miao
Cc: Ley Foon Tan
Cc: David Howells
Cc: Koichi Yasutake
Cc: Chris Metcalf
Cc: "David S. Miller"
Cc: Aurelien Jacquiot
Cc: Geert Uytterhoeven
Cc: Helge Deller
Cc: James Hogan
Cc: Jesper Nilsson
Cc: Mark Salter
Cc: Mikael Starvik
Cc: Vineet Gupta
Cc: Christian Borntraeger
Cc: Joerg Roedel
Cc: Sebastian Ott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2016-01-21 09:09:18 +0800

15 Jan, 2016

1 commit

d07e22597 mm: mmap: add new /proc tunable for mmap_base ASLR ... Browse Code »

Address Space Layout Randomization (ASLR) provides a barrier to
exploitation of user-space processes in the presence of security
vulnerabilities by making it more difficult to find desired code/data
which could help an attack. This is done by adding a random offset to
the location of regions in the process address space, with a greater
range of potential offset values corresponding to better protection/a
larger search-space for brute force, but also to greater potential for
fragmentation.

The offset added to the mmap_base address, which provides the basis for
the majority of the mappings for a process, is set once on process exec
in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
which reflect, hopefully, the best compromise for all systems. The
trade-off between increased entropy in the offset value generation and
the corresponding increased variability in address space fragmentation
is not absolute, however, and some platforms may tolerate higher amounts
of entropy. This patch introduces both new Kconfig values and a sysctl
interface which may be used to change the amount of entropy used for
offset generation on a system.

The direct motivation for this change was in response to the
libstagefright vulnerabilities that affected Android, specifically to
information provided by Google's project zero at:

http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

The attack presented therein, by Google's project zero, specifically
targeted the limited randomness used to generate the offset added to the
mmap_base address in order to craft a brute-force-based attack.
Concretely, the attack was against the mediaserver process, which was
limited to respawning every 5 seconds, on an arm device. The hard-coded
8 bits used resulted in an average expected success rate of defeating
the mmap ASLR after just over 10 minutes (128 tries at 5 seconds a
piece). With this patch, and an accompanying increase in the entropy
value to 16 bits, the same attack would take an average expected time of
over 45 hours (32768 tries), which makes it both less feasible and more
likely to be noticed.

The introduced Kconfig and sysctl options are limited by per-arch
minimum and maximum values, the minimum of which was chosen to match the
current hard-coded value and the maximum of which was chosen so as to
give the greatest flexibility without generating an invalid mmap_base
address, generally a 3-4 bits less than the number of bits in the
user-space accessible virtual address space.

When decided whether or not to change the default value, a system
developer should consider that mmap_base address could be placed
anywhere up to 2^(value) bits away from the non-randomized location,
which would introduce variable-sized areas above and below the mmap_base
address such that the maximum vm_area_struct size may be reduced,
preventing very large allocations.

This patch (of 4):

ASLR only uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures. This value was chosen to
prevent a poorly chosen value from dividing the address space in such a
way as to prevent large allocations. This may not be an issue on all
platforms. Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.

Signed-off-by: Daniel Cashman
Cc: Russell King
Acked-by: Kees Cook
Cc: Ingo Molnar
Cc: Jonathan Corbet
Cc: Don Zickus
Cc: Eric W. Biederman
Cc: Heinrich Schuchardt
Cc: Josh Poimboeuf
Cc: Kirill A. Shutemov
Cc: Naoya Horiguchi
Cc: Andrea Arcangeli
Cc: Mel Gorman
Cc: Thomas Gleixner
Cc: David Rientjes
Cc: Mark Salyzyn
Cc: Jeff Vander Stoep
Cc: Nick Kralevich
Cc: Catalin Marinas
Cc: Will Deacon
Cc: "H. Peter Anvin"
Cc: Hector Marco-Gisbert
Cc: Borislav Petkov
Cc: Ralf Baechle
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Cashman
2016-01-15 08:00:49 +0800

11 Sep, 2015

1 commit

2965faa5e kexec: split kexec_load syscall from kexec core code ... Browse Code »

There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young
Cc: Eric W. Biederman
Cc: Vivek Goyal
Cc: Petr Tesarik
Cc: Theodore Ts'o
Cc: Josh Boyer
Cc: David Howells
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2015-09-11 04:29:01 +0800

06 Sep, 2015

1 commit

7d9071a09 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"In this one:

- d_move fixes (Eric Biederman)

- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)

- switch of sb_writers to percpu rwsem (Oleg Nesterov)

- superblock scalability (Josef Bacik and Dave Chinner)

- swapon(2) race fix (Hugh Dickins)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...

Linus Torvalds
2015-09-06 11:34:28 +0800

15 Aug, 2015

1 commit

bf3eac84c percpu-rwsem: kill CONFIG_PERCPU_RWSEM ... Browse Code »

Remove CONFIG_PERCPU_RWSEM, the next patch adds the unconditional
user of percpu_rw_semaphore.

Signed-off-by: Oleg Nesterov

Oleg Nesterov
2015-08-15 19:52:11 +0800

03 Aug, 2015

1 commit

1987c947d locking/static_keys: Add selftest ... Browse Code »

Add a little selftest that validates all combinations.

Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2015-08-03 17:34:16 +0800

18 Jul, 2015

1 commit

5aaeb5c01 x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86 ... Browse Code »

Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2015-07-18 09:42:51 +0800

26 Jun, 2015

1 commit

3033f14ab clone: support passing tls argument via C rather than pt_regs magic ... Browse Code »

clone has some of the quirkiest syscall handling in the kernel, with a
pile of special cases, historical curiosities, and architecture-specific
calling conventions. In particular, clone with CLONE_SETTLS accepts a
parameter "tls" that the C entry point completely ignores and some
assembly entry points overwrite; instead, the low-level arch-specific
code pulls the tls parameter out of the arch-specific register captured
as part of pt_regs on entry to the kernel. That's a massive hack, and
it makes the arch-specific code only work when called via the specific
existing syscall entry points; because of this hack, any new clone-like
system call would have to accept an identical tls argument in exactly
the same arch-specific position, rather than providing a unified system
call entry point across architectures.

The first patch allows architectures to handle the tls argument via
normal C parameter passing, if they opt in by selecting
HAVE_COPY_THREAD_TLS. The second patch makes 32-bit and 64-bit x86 opt
into this.

These two patches came out of the clone4 series, which isn't ready for
this merge window, but these first two cleanup patches were entirely
uncontroversial and have acks. I'd like to go ahead and submit these
two so that other architectures can begin building on top of this and
opting into HAVE_COPY_THREAD_TLS. However, I'm also happy to wait and
send these through the next merge window (along with v3 of clone4) if
anyone would prefer that.

This patch (of 2):

clone with CLONE_SETTLS accepts an argument to set the thread-local
storage area for the new thread. sys_clone declares an int argument
tls_val in the appropriate point in the argument list (based on the
various CLONE_BACKWARDS variants), but doesn't actually use or pass along
that argument. Instead, sys_clone calls do_fork, which calls
copy_process, which calls the arch-specific copy_thread, and copy_thread
pulls the corresponding syscall argument out of the pt_regs captured at
kernel entry (knowing what argument of clone that architecture passes tls
in).

Apart from being awful and inscrutable, that also only works because only
one code path into copy_thread can pass the CLONE_SETTLS flag, and that
code path comes from sys_clone with its architecture-specific
argument-passing order. This prevents introducing a new version of the
clone system call without propagating the same architecture-specific
position of the tls argument.

However, there's no reason to pull the argument out of pt_regs when
sys_clone could just pass it down via C function call arguments.

Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
and a new copy_thread_tls that accepts the tls parameter as an additional
unsigned long (syscall-argument-sized) argument. Change sys_clone's tls
argument to an unsigned long (which does not change the ABI), and pass
that down to copy_thread_tls.

Architectures that don't opt into copy_thread_tls will continue to ignore
the C argument to sys_clone in favor of the pt_regs captured at kernel
entry, and thus will be unable to introduce new versions of the clone
syscall.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett
Acked-by: Andy Lutomirski
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Thiago Macieira
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josh Triplett
2015-06-26 08:00:38 +0800

17 Apr, 2015

1 commit

d19d5efd8 Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux ... Browse Code »

Pull powerpc updates from Michael Ellerman:

- Numerous minor fixes, cleanups etc.

- More EEH work from Gavin to remove its dependency on device_nodes.

- Memory hotplug implemented entirely in the kernel from Nathan
Fontenot.

- Removal of redundant CONFIG_PPC_OF by Kevin Hao.

- Rewrite of VPHN parsing logic & tests from Greg Kurz.

- A fix from Nish Aravamudan to reduce memory usage by clamping
nodes_possible_map.

- Support for pstore on powernv from Hari Bathini.

- Removal of old powerpc specific byte swap routines by David Gibson.

- Fix from Vasant Hegde to prevent the flash driver telling you it was
flashing your firmware when it wasn't.

- Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

- Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
Stancek.

- Some fixes for migration from Tyrel Datwyler.

- A new syscall to switch the cpu endian by Michael Ellerman.

- Large series from Wei Yang to implement SRIOV, reviewed and acked by
Bjorn.

- A fix for the OPAL sensor driver from Cédric Le Goater.

- Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

- Large series from Daniel Axtens to make our PCI hooks per PHB rather
than per machine.

- Small patch from Sam Bobroff to explicitly abort non-suspended
transactions on syscalls, plus a test to exercise it.

- Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

- Small patch to enable the hard lockup detector from Anton Blanchard.

- Fix from Dave Olson for missing L2 cache information on some CPUs.

- Some fixes from Michael Ellerman to get Cell machines booting again.

- Freescale updates from Scott: Highlights include BMan device tree
nodes, an MSI erratum workaround, a couple minor performance
improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
powerpc/powermac: Fix build error seen with powermac smp builds
powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
powerpc/cell: Fix iommu breakage caused by controller_ops change
powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
powerpc/pseries: Correct memory hotplug locking
powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
powerpc: Add ppc64 hard lockup detector support
oprofile: Disable oprofile NMI timer on ppc64
powerpc/perf/hv-24x7: Add missing put_cpu_var()
powerpc/perf/hv-24x7: Break up single_24x7_request
powerpc/perf/hv-24x7: Define update_event_count()
powerpc/perf/hv-24x7: Whitespace cleanup
powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
powerpc/perf/hv-24x7: Rename hv_24x7_event_update
powerpc/perf/hv-24x7: Move debug prints to separate function
powerpc/perf/hv-24x7: Drop event_24x7_request()
powerpc/perf/hv-24x7: Use pr_devel() to log message
...

Conflicts:
tools/testing/selftests/powerpc/Makefile
tools/testing/selftests/powerpc/tm/Makefile

Linus Torvalds
2015-04-17 02:53:32 +0800

15 Apr, 2015

4 commits

204db6ed1 mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE ... Browse Code »

The arch_randomize_brk() function is used on several architectures,
even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
the architectures that support it, while still handling CONFIG_COMPAT_BRK.

Signed-off-by: Kees Cook
Cc: Hector Marco-Gisbert
Cc: Russell King
Reviewed-by: Ingo Molnar
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Alexander Viro
Cc: Oleg Nesterov
Cc: Andy Lutomirski
Cc: "David A. Long"
Cc: Andrey Ryabinin
Cc: Arun Chandran
Cc: Yann Droneaud
Cc: Min-Hua Chen
Cc: Paul Burton
Cc: Alex Smith
Cc: Markos Chandras
Cc: Vineeth Vijayan
Cc: Jeff Bailey
Cc: Michael Holzheu
Cc: Ben Hutchings
Cc: Behan Webster
Cc: Ismael Ripoll
Cc: Jan-Simon Mller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2015-04-15 07:49:05 +0800
2b68f6cae mm: expose arch_mmap_rnd when available ... Browse Code »

When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it
(which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
already supports a separated ET_DYN ASLR from mmap ASLR without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook
Cc: Hector Marco-Gisbert
Cc: Russell King
Reviewed-by: Ingo Molnar
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Alexander Viro
Cc: Oleg Nesterov
Cc: Andy Lutomirski
Cc: "David A. Long"
Cc: Andrey Ryabinin
Cc: Arun Chandran
Cc: Yann Droneaud
Cc: Min-Hua Chen
Cc: Paul Burton
Cc: Alex Smith
Cc: Markos Chandras
Cc: Vineeth Vijayan
Cc: Jeff Bailey
Cc: Michael Holzheu
Cc: Ben Hutchings
Cc: Behan Webster
Cc: Ismael Ripoll
Cc: Jan-Simon Mller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2015-04-15 07:49:05 +0800
0ddab1d2e lib/ioremap.c: add huge I/O map capability interfaces ... Browse Code »

Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
I/O mappings with pud/pmd are enabled on the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

A new kernel option "nohugeiomap" is also added, so that user can disable
the huge I/O map capabilities when necessary.

Signed-off-by: Toshi Kani
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Arnd Bergmann
Cc: Dave Hansen
Cc: Robert Elliott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Toshi Kani
2015-04-15 07:49:04 +0800
235a8f028 mm: define default PGTABLE_LEVELS to two ... Browse Code »

By this time all architectures which support more than two page table
levels should be covered. This patch add default definiton of
PGTABLE_LEVELS equal 2.

We also add assert to detect inconsistence between CONFIG_PGTABLE_LEVELS
and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.

Signed-off-by: Kirill A. Shutemov
Tested-by: Guenter Roeck
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: "David S. Miller"
Cc: "H. Peter Anvin"
Cc: "James E.J. Bottomley"
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Chris Metcalf
Cc: David Howells
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Ingo Molnar
Cc: Jeff Dike
Cc: Kirill A. Shutemov
Cc: Koichi Yasutake
Cc: Martin Schwidefsky
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Richard Weinberger
Cc: Russell King
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2015-04-15 07:49:02 +0800

11 Apr, 2015

1 commit

af9feebe6 oprofile: Disable oprofile NMI timer on ppc64 ... Browse Code »

We want to enable the hard lockup detector on ppc64, but right now
that enables the oprofile NMI timer too.

We'd prefer not to enable the oprofile NMI timer, it adds another
element to our PMU testing and it requires us to increase our
exported symbols (eg cpu_khz).

Modify the config entry for OPROFILE_NMI_TIMER to disable it on PPC64.

Signed-off-by: Anton Blanchard
Acked-by: Robert Richter
Signed-off-by: Michael Ellerman

Anton Blanchard
2015-04-11 18:49:27 +0800

04 Sep, 2014

1 commit

ff27f38e0 seccomp: Document two-phase seccomp and arch-provided seccomp_data ... Browse Code »

The description of how archs should implement seccomp filters was
still strictly correct, but it failed to describe the newly
available optimizations.

Signed-off-by: Andy Lutomirski
Signed-off-by: Kees Cook

Andy Lutomirski
2014-09-04 05:58:17 +0800