25 Jun, 2016

1 commit

  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because in the common case the thread stack and the thread_info share
    a single allocation, the two pointers were identical. That identity
    then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.
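
    As a before/after sketch (assuming the renamed allocator is
    alloc_thread_stack_node(), per the stack-oriented naming described;
    surrounding code simplified):

    /* before: allocator named after thread_info, result stored in ->stack */
    struct thread_info *ti = alloc_thread_info_node(tsk, node);
    tsk->stack = ti;

    /* after: allocator named after what is actually being allocated */
    unsigned long *stack = alloc_thread_stack_node(tsk, node);
    tsk->stack = stack;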

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. Cleaning that up would be a separate change; I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jun, 2016

1 commit

  • Several modern devices, such as PC/104 cards, are expected to run on
    modern systems via an ISA bus interface. Since ISA is a legacy interface
    for most modern architectures, ISA support should remain disabled in
    general. Support for ISA-style drivers should be enabled on a
    per-driver basis.

    To allow ISA-style drivers on modern systems, this patch introduces the
    ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
    build conditionally on the ISA_BUS_API Kconfig option, which defaults to
    the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
    ISA_BUS_API Kconfig option to be selected on architectures which do not
    enable ISA (e.g. X86_64).

    The ISA_BUS Kconfig option is currently only implemented for X86
    architectures. Other architectures may have their own ISA_BUS Kconfig
    options added as required.

    Reviewed-by: Guenter Roeck
    Signed-off-by: William Breathitt Gray
    Acked-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    William Breathitt Gray
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was nothing suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash.

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"
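
    For reference on the multiplier change above, a minimal sketch of the
    post-series hash_32(), using the full-width golden-ratio constants
    this series puts into <linux/hash.h> (GOLDEN_RATIO_64 being
    0x61C8864680B583EBull; userspace types used here for illustration):

    #include <stdint.h>

    #define GOLDEN_RATIO_32 0x61C88647u

    /* multiply by the golden ratio and keep the top "bits" bits */
    static inline uint32_t hash_32(uint32_t val, unsigned int bits)
    {
        return (val * GOLDEN_RATIO_32) >> (32 - bits);
    }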

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add <asm/hash.h>
    microblaze: Add <asm/hash.h>
    m68k: Add <asm/hash.h>
    <linux/hash.h>: Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to <linux/stringhash.h>

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of <asm/hash.h>.

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.
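
    A hedged sketch of what an <asm/hash.h> override looks like under this
    scheme (illustrative, not any real architecture's file; kernel u32
    type assumed):

    #define HAVE_ARCH__HASH_32 1    /* value 1: must equal the generic result */

    static inline u32 __hash_32(u32 val)
    {
        /* an arch would use shift-and-add tricks here; because the
         * HAVE_* symbol is defined to 1, the self-test verifies this
         * matches the generic multiply */
        return val * 0x61C88647;
    }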

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin
     

21 May, 2016

3 commits

  • The binary GCD algorithm is based on the following facts:
    1. If both a and b are even, then gcd(a,b) = 2 * gcd(a/2, b/2)
    2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
    3. If both a and b are odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

    Even on x86 machines with reasonable division hardware, the binary
    algorithm runs about 25% faster (80% of the execution time) than the
    division-based Euclidean algorithm.

    On platforms like Alpha and ARMv6 where division is a function call to
    emulation code, it's even more significant.

    There are two variants of the code here, depending on whether a fast
    __ffs (find least significant set bit) instruction is available. This
    allows the unpredictable branches in the bit-at-a-time shifting loop to
    be eliminated.

    If fast __ffs is not available, the "even/odd" GCD variant is used.

    I use the following code to benchmark:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <getopt.h>

    #define swap(a, b) \
    do { \
        a ^= b; \
        b ^= a; \
        a ^= b; \
    } while (0)

    unsigned long gcd0(unsigned long a, unsigned long b)
    {
        unsigned long r;

        if (a < b) {
            swap(a, b);
        }

        if (b == 0)
            return a;

        while ((r = a % b) != 0) {
            a = b;
            b = r;
        }

        return b;
    }

    unsigned long gcd1(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        b >>= __builtin_ctzl(b);

        for (;;) {
            a >>= __builtin_ctzl(a);
            if (a == b)
                return a << __builtin_ctzl(r);

            if (a < b)
                swap(a, b);
            a -= b;
        }
    }

    unsigned long gcd2(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        r &= -r;

        while (!(b & r))
            b >>= 1;

        for (;;) {
            while (!(a & r))
                a >>= 1;
            if (a == b)
                return a;

            if (a < b)
                swap(a, b);
            a -= b;
            a >>= 1;
            if (a & r)
                a += b;
            a >>= 1;
        }
    }

    unsigned long gcd3(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        b >>= __builtin_ctzl(b);
        if (b == 1)
            return r & -r;

        for (;;) {
            a >>= __builtin_ctzl(a);
            if (a == 1)
                return r & -r;
            if (a == b)
                return a << __builtin_ctzl(r);

            if (a < b)
                swap(a, b);
            a -= b;
        }
    }

    unsigned long gcd4(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        r &= -r;

        while (!(b & r))
            b >>= 1;
        if (b == r)
            return r;

        for (;;) {
            while (!(a & r))
                a >>= 1;
            if (a == r)
                return r;
            if (a == b)
                return a;

            if (a < b)
                swap(a, b);
            a -= b;
            a >>= 1;
            if (a & r)
                a += b;
            a >>= 1;
        }
    }

    static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
        gcd0, gcd1, gcd2, gcd3, gcd4,
    };

    #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

    #if defined(__x86_64__)

    #define rdtscll(val) do { \
        unsigned long __a, __d; \
        __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
        (val) = ((unsigned long long)__a) | (((unsigned long long)__d) << 32); \
    } while (0)

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
                                                 unsigned long a, unsigned long b, unsigned long *res)
    {
        unsigned long long start, end;
        unsigned long long ret;
        unsigned long gcd_res;

        rdtscll(start);
        gcd_res = gcd(a, b);
        rdtscll(end);

        if (end >= start)
            ret = end - start;
        else
            ret = ~0ULL - start + 1 + end;

        *res = gcd_res;
        return ret;
    }

    #else

    static inline struct timespec read_time(void)
    {
        struct timespec time;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
        return time;
    }

    static inline unsigned long long diff_time(struct timespec start, struct timespec end)
    {
        struct timespec temp;

        if ((end.tv_nsec - start.tv_nsec) < 0) {
            temp.tv_sec = end.tv_sec - start.tv_sec - 1;
            temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
        } else {
            temp.tv_sec = end.tv_sec - start.tv_sec;
            temp.tv_nsec = end.tv_nsec - start.tv_nsec;
        }

        return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
    }

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
                                                 unsigned long a, unsigned long b, unsigned long *res)
    {
        struct timespec start, end;
        unsigned long gcd_res;

        start = read_time();
        gcd_res = gcd(a, b);
        end = read_time();

        *res = gcd_res;
        return diff_time(start, end);
    }

    #endif

    static inline unsigned long get_rand()
    {
        if (sizeof(long) == 8)
            return (unsigned long)rand() << 32 | rand();
        else
            return rand();
    }

    int main(int argc, char **argv)
    {
        unsigned int seed = time(0);
        int loops = 100;
        int repeats = 1000;
        unsigned long (*res)[TEST_ENTRIES];
        unsigned long long elapsed[TEST_ENTRIES];
        int i, j, k;

        for (;;) {
            int opt = getopt(argc, argv, "n:r:s:");
            /* End condition always first */
            if (opt == -1)
                break;

            switch (opt) {
            case 'n':
                loops = atoi(optarg);
                break;
            case 'r':
                repeats = atoi(optarg);
                break;
            case 's':
                seed = strtoul(optarg, NULL, 10);
                break;
            default:
                /* You won't actually get here. */
                break;
            }
        }

        res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
        memset(elapsed, 0, sizeof(elapsed));

        srand(seed);
        for (j = 0; j < loops; j++) {
            unsigned long a = get_rand();
            /* Do we have args? */
            unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
            unsigned long long min_elapsed[TEST_ENTRIES];
            for (k = 0; k < repeats; k++) {
                for (i = 0; i < TEST_ENTRIES; i++) {
                    unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
                    if (k == 0 || min_elapsed[i] > tmp)
                        min_elapsed[i] = tmp;
                }
            }
            for (i = 0; i < TEST_ENTRIES; i++)
                elapsed[i] += min_elapsed[i];
        }

        for (i = 0; i < TEST_ENTRIES; i++)
            printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

        k = 0;
        srand(seed);
        for (j = 0; j < loops; j++) {
            unsigned long a = get_rand();
            unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
            for (i = 1; i < TEST_ENTRIES; i++) {
                if (res[j][i] != res[j][0])
                    break;
            }
            if (i < TEST_ENTRIES) {
                if (k == 0) {
                    k = 1;
                    fprintf(stderr, "Error:\n");
                }
                fprintf(stderr, "gcd(%lu, %lu): ", a, b);
                for (i = 0; i < TEST_ENTRIES; i++)
                    fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
            }
        }

        if (k == 0)
            fprintf(stderr, "PASS\n");

        free(res);

        return 0;
    }

    Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64", this got:

    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 10174
    gcd1: elapsed 2120
    gcd2: elapsed 2902
    gcd3: elapsed 2039
    gcd4: elapsed 2812
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9309
    gcd1: elapsed 2280
    gcd2: elapsed 2822
    gcd3: elapsed 2217
    gcd4: elapsed 2710
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9589
    gcd1: elapsed 2098
    gcd2: elapsed 2815
    gcd3: elapsed 2030
    gcd4: elapsed 2718
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9914
    gcd1: elapsed 2309
    gcd2: elapsed 2779
    gcd3: elapsed 2228
    gcd4: elapsed 2709
    PASS

    [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
    Signed-off-by: Zhaoxiu Zeng
    Signed-off-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaoxiu Zeng
     
  • printk() takes some locks and cannot be used safely in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited, so the number of messages in NMI context should still be kept
    to a minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering NMI context and reset
    when leaving it. It queues IRQ work to copy the messages into the main
    ring buffer in a safe context.
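
    A minimal sketch of that switch (function and variable names from this
    patch; the bodies are simplified):

    void printk_nmi_enter(void)
    {
        /* redirect printk() to the per-CPU NMI buffer */
        this_cpu_write(printk_func, vprintk_nmi);
    }

    void printk_nmi_exit(void)
    {
        /* back to the normal path */
        this_cpu_write(printk_func, vprintk_default);
    }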

    __printk_nmi_flush() copies all available messages and resets the
    buffer. A simple cmpxchg operation is then enough to stay synchronized
    with writers. A spinlock is also used to stay synchronized with other
    flushers.

    We no longer use seq_buf because it depends on an external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The exceptions are the MN10300 and Xtensa architectures. We need to
    clean up NMI handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Define HAVE_EXIT_THREAD for archs which want to do something in
    exit_thread. For others, let's define exit_thread as an empty inline.

    This is a cleanup before we change the prototype of exit_thread to
    accept a task parameter.
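
    A hedged sketch of the header-side shape this describes (simplified):

    #ifdef CONFIG_HAVE_EXIT_THREAD
    extern void exit_thread(void);
    #else
    static inline void exit_thread(void)
    {
    }
    #endif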

    [akpm@linux-foundation.org: fix mips]
    Signed-off-by: Jiri Slaby
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Steven Miao
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

29 Feb, 2016

1 commit

  • Add a CONFIG_STACK_VALIDATION option which will run "objtool check" for
    each .o file to ensure the validity of its stack metadata.

    Signed-off-by: Josh Poimboeuf
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bernd Petrovitsch
    Cc: Borislav Petkov
    Cc: Chris J Arges
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Namhyung Kim
    Cc: Pedro Alves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/92baab69a6bf9bc7043af0bfca9fb964a1d45546.1456719558.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

21 Jan, 2016

2 commits

  • Move the generic implementation to <linux/dma-mapping.h> now that all
    architectures support it, and remove the HAVE_DMA_ATTR Kconfig symbol
    now that everyone supports it.

    [valentinrothberg@gmail.com: remove leftovers in Kconfig]
    Signed-off-by: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Koichi Yasutake
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Steven Miao
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Valentin Rothberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This series converts all remaining architectures to use dma_map_ops and
    the generic implementation of the DMA API. This not only simplifies the
    code a lot, but also prepares for possible future changes like more
    generic non-iommu dma_ops implementations or generic per-device
    dma_map_ops.

    This patch (of 16):

    We have a couple architectures that do not want to support this code, so
    add another Kconfig symbol that disables the code similar to what we do
    for the nommu case.

    Signed-off-by: Christoph Hellwig
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Ley Foon Tan
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Chris Metcalf
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Geert Uytterhoeven
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

15 Jan, 2016

1 commit

  • Address Space Layout Randomization (ASLR) provides a barrier to
    exploitation of user-space processes in the presence of security
    vulnerabilities by making it more difficult to find desired code/data
    which could help an attack. This is done by adding a random offset to
    the location of regions in the process address space, with a greater
    range of potential offset values corresponding to better protection/a
    larger search-space for brute force, but also to greater potential for
    fragmentation.

    The offset added to the mmap_base address, which provides the basis for
    the majority of the mappings for a process, is set once on process exec
    in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
    which reflect, hopefully, the best compromise for all systems. The
    trade-off between increased entropy in the offset value generation and
    the corresponding increased variability in address space fragmentation
    is not absolute, however, and some platforms may tolerate higher amounts
    of entropy. This patch introduces both new Kconfig values and a sysctl
    interface which may be used to change the amount of entropy used for
    offset generation on a system.

    The direct motivation for this change was in response to the
    libstagefright vulnerabilities that affected Android, specifically to
    information provided by Google's project zero at:

    http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

    The attack presented therein, by Google's project zero, specifically
    targeted the limited randomness used to generate the offset added to the
    mmap_base address in order to craft a brute-force-based attack.
    Concretely, the attack was against the mediaserver process, which was
    limited to respawning every 5 seconds, on an ARM device. The hard-coded
    8 bits used resulted in an average expected success rate of defeating
    the mmap ASLR after just over 10 minutes (128 tries at 5 seconds
    apiece). With this patch, and an accompanying increase in the entropy
    value to 16 bits, the same attack would take an average expected time of
    over 45 hours (32768 tries), which makes it both less feasible and more
    likely to be noticed.

    The introduced Kconfig and sysctl options are limited by per-arch
    minimum and maximum values, the minimum of which was chosen to match the
    current hard-coded value and the maximum of which was chosen so as to
    give the greatest flexibility without generating an invalid mmap_base
    address, generally 3-4 bits less than the number of bits in the
    user-space accessible virtual address space.

    When deciding whether or not to change the default value, a system
    developer should consider that mmap_base address could be placed
    anywhere up to 2^(value) bits away from the non-randomized location,
    which would introduce variable-sized areas above and below the mmap_base
    address such that the maximum vm_area_struct size may be reduced,
    preventing very large allocations.

    This patch (of 4):

    ASLR only uses as few as 8 bits to generate the random offset for the
    mmap base address on 32 bit architectures. This value was chosen to
    prevent a poorly chosen value from dividing the address space in such a
    way as to prevent large allocations. This may not be an issue on all
    platforms. Allow the specification of a minimum number of bits so that
    platforms desiring greater ASLR protection may determine where to place
    the trade-off.

    Signed-off-by: Daniel Cashman
    Cc: Russell King
    Acked-by: Kees Cook
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Don Zickus
    Cc: Eric W. Biederman
    Cc: Heinrich Schuchardt
    Cc: Josh Poimboeuf
    Cc: Kirill A. Shutemov
    Cc: Naoya Horiguchi
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: David Rientjes
    Cc: Mark Salyzyn
    Cc: Jeff Vander Stoep
    Cc: Nick Kralevich
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Hector Marco-Gisbert
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Cashman
     

11 Sep, 2015

1 commit

  • There are two kexec load syscalls: kexec_load and kexec_file_load.
    kexec_file_load has been split out into kernel/kexec_file.c. In this
    patch I split the kexec_load syscall code into kernel/kexec.c.

    And add a new kconfig option KEXEC_CORE, so we can disable kexec_load
    and use kexec_file_load only, or vice versa.

    The original requirement is from Ted Ts'o: he wants the kexec kernel
    signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled, but
    kexec-tools can use the kexec_load syscall to bypass that checking.

    Vivek Goyal proposed to create a common kconfig option so users can
    compile in only one syscall for loading the kexec kernel.
    KEXEC/KEXEC_FILE select KEXEC_CORE so that old config files still work.

    Because there is generic code that needs CONFIG_KEXEC_CORE, I updated
    all the architecture Kconfigs with the new option KEXEC_CORE, and let
    KEXEC select KEXEC_CORE in the arch Kconfigs. Also updated the general
    kernel code related to the kexec_load syscall.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

06 Sep, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this one:

    - d_move fixes (Eric Biederman)

    - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
    handling ought to be fixed)

    - switch of sb_writers to percpu rwsem (Oleg Nesterov)

    - superblock scalability (Josef Bacik and Dave Chinner)

    - swapon(2) race fix (Hugh Dickins)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
    vfs: Test for and handle paths that are unreachable from their mnt_root
    dcache: Reduce the scope of i_lock in d_splice_alias
    dcache: Handle escaped paths in prepend_path
    mm: fix potential data race in SyS_swapon
    inode: don't softlockup when evicting inodes
    inode: rename i_wb_list to i_io_list
    sync: serialise per-superblock sync operations
    inode: convert inode_sb_list_lock to per-sb
    inode: add hlist_fake to avoid the inode hash lock in evict
    writeback: plug writeback at a high level
    change sb_writers to use percpu_rw_semaphore
    shift percpu_counter_destroy() into destroy_super_work()
    percpu-rwsem: kill CONFIG_PERCPU_RWSEM
    percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
    percpu-rwsem: introduce percpu_down_read_trylock()
    document rwsem_release() in sb_wait_write()
    fix the broken lockdep logic in __sb_start_write()
    introduce __sb_writers_{acquired,release}() helpers
    ufs_inode_get{frag,block}(): get rid of 'phys' argument
    ufs_getfrag_block(): tidy up a bit
    ...

    Linus Torvalds
     

03 Aug, 2015

1 commit

  • Add a little selftest that validates all combinations.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Jul, 2015

1 commit

  • Don't burden architectures without dynamic task_struct sizing
    with the overhead of dynamic sizing.

    Also optimize the x86 code a bit by caching task_struct_size.

    Acked-and-Tested-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Jun, 2015

1 commit

  • clone has some of the quirkiest syscall handling in the kernel, with a
    pile of special cases, historical curiosities, and architecture-specific
    calling conventions. In particular, clone with CLONE_SETTLS accepts a
    parameter "tls" that the C entry point completely ignores and some
    assembly entry points overwrite; instead, the low-level arch-specific
    code pulls the tls parameter out of the arch-specific register captured
    as part of pt_regs on entry to the kernel. That's a massive hack, and
    it makes the arch-specific code only work when called via the specific
    existing syscall entry points; because of this hack, any new clone-like
    system call would have to accept an identical tls argument in exactly
    the same arch-specific position, rather than providing a unified system
    call entry point across architectures.

    The first patch allows architectures to handle the tls argument via
    normal C parameter passing, if they opt in by selecting
    HAVE_COPY_THREAD_TLS. The second patch makes 32-bit and 64-bit x86 opt
    into this.

    These two patches came out of the clone4 series, which isn't ready for
    this merge window, but these first two cleanup patches were entirely
    uncontroversial and have acks. I'd like to go ahead and submit these
    two so that other architectures can begin building on top of this and
    opting into HAVE_COPY_THREAD_TLS. However, I'm also happy to wait and
    send these through the next merge window (along with v3 of clone4) if
    anyone would prefer that.

    This patch (of 2):

    clone with CLONE_SETTLS accepts an argument to set the thread-local
    storage area for the new thread. sys_clone declares an int argument
    tls_val in the appropriate point in the argument list (based on the
    various CLONE_BACKWARDS variants), but doesn't actually use or pass along
    that argument. Instead, sys_clone calls do_fork, which calls
    copy_process, which calls the arch-specific copy_thread, and copy_thread
    pulls the corresponding syscall argument out of the pt_regs captured at
    kernel entry (knowing what argument of clone that architecture passes tls
    in).

    Apart from being awful and inscrutable, that also only works because only
    one code path into copy_thread can pass the CLONE_SETTLS flag, and that
    code path comes from sys_clone with its architecture-specific
    argument-passing order. This prevents introducing a new version of the
    clone system call without propagating the same architecture-specific
    position of the tls argument.

    However, there's no reason to pull the argument out of pt_regs when
    sys_clone could just pass it down via C function call arguments.

    Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
    and a new copy_thread_tls that accepts the tls parameter as an additional
    unsigned long (syscall-argument-sized) argument. Change sys_clone's tls
    argument to an unsigned long (which does not change the ABI), and pass
    that down to copy_thread_tls.

    Architectures that don't opt into copy_thread_tls will continue to ignore
    the C argument to sys_clone in favor of the pt_regs captured at kernel
    entry, and thus will be unable to introduce new versions of the clone
    syscall.
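
    A sketch of the resulting declarations (per the description above;
    exact types and placement simplified):

    #ifdef CONFIG_HAVE_COPY_THREAD_TLS
    /* opted-in architectures receive tls as an ordinary C argument */
    extern int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
                               unsigned long arg, struct task_struct *p,
                               unsigned long tls);
    #else
    /* legacy path: tls is ignored here and fished out of pt_regs */
    extern int copy_thread(unsigned long clone_flags, unsigned long sp,
                           unsigned long arg, struct task_struct *p);
    #define copy_thread_tls(clone_flags, sp, arg, p, tls) \
            copy_thread(clone_flags, sp, arg, p)
    #endif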

    Patch co-authored by Josh Triplett and Thiago Macieira.

    Signed-off-by: Josh Triplett
    Acked-by: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thiago Macieira
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

17 Apr, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Numerous minor fixes, cleanups etc.

    - More EEH work from Gavin to remove its dependency on device_nodes.

    - Memory hotplug implemented entirely in the kernel from Nathan
    Fontenot.

    - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

    - Rewrite of VPHN parsing logic & tests from Greg Kurz.

    - A fix from Nish Aravamudan to reduce memory usage by clamping
    nodes_possible_map.

    - Support for pstore on powernv from Hari Bathini.

    - Removal of old powerpc specific byte swap routines by David Gibson.

    - Fix from Vasant Hegde to prevent the flash driver telling you it was
    flashing your firmware when it wasn't.

    - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

    - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
    Stancek.

    - Some fixes for migration from Tyrel Datwyler.

    - A new syscall to switch the cpu endian by Michael Ellerman.

    - Large series from Wei Yang to implement SRIOV, reviewed and acked by
    Bjorn.

    - A fix for the OPAL sensor driver from Cédric Le Goater.

    - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

    - Large series from Daniel Axtens to make our PCI hooks per PHB rather
    than per machine.

    - Small patch from Sam Bobroff to explicitly abort non-suspended
    transactions on syscalls, plus a test to exercise it.

    - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

    - Small patch to enable the hard lockup detector from Anton Blanchard.

    - Fix from Dave Olson for missing L2 cache information on some CPUs.

    - Some fixes from Michael Ellerman to get Cell machines booting again.

    - Freescale updates from Scott: Highlights include BMan device tree
    nodes, an MSI erratum workaround, a couple minor performance
    improvements, config updates, and misc fixes/cleanup.

    * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
    powerpc/powermac: Fix build error seen with powermac smp builds
    powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
    powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
    powerpc/cell: Fix iommu breakage caused by controller_ops change
    powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
    powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
    powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
    powerpc/pseries: Correct memory hotplug locking
    powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
    powerpc: Add ppc64 hard lockup detector support
    oprofile: Disable oprofile NMI timer on ppc64
    powerpc/perf/hv-24x7: Add missing put_cpu_var()
    powerpc/perf/hv-24x7: Break up single_24x7_request
    powerpc/perf/hv-24x7: Define update_event_count()
    powerpc/perf/hv-24x7: Whitespace cleanup
    powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
    powerpc/perf/hv-24x7: Rename hv_24x7_event_update
    powerpc/perf/hv-24x7: Move debug prints to separate function
    powerpc/perf/hv-24x7: Drop event_24x7_request()
    powerpc/perf/hv-24x7: Use pr_devel() to log message
    ...

    Conflicts:
    tools/testing/selftests/powerpc/Makefile
    tools/testing/selftests/powerpc/tm/Makefile

    Linus Torvalds
     

15 Apr, 2015

4 commits

  • The arch_randomize_brk() function is used on several architectures,
    even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
    tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
    the architectures that support it, while still handling CONFIG_COMPAT_BRK.

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Möller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • When an architecture fully supports randomizing the ELF load location,
    a per-arch mmap_rnd() function is used to find a randomized mmap base.
    In preparation for randomizing the location of ET_DYN binaries
    separately from mmap, this renames and exports these functions as
    arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
    for describing this feature on architectures that support it
    (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
    already supports a separated ET_DYN ASLR from mmap ASLR without the
    ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).
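
    A sketch of the renamed hook and a typical caller (the caller is
    simplified from an arch_pick_mmap_layout() implementation):

    unsigned long arch_mmap_rnd(void);    /* per-arch entropy source */

    /* when picking the mmap base on exec: */
    unsigned long random_factor = 0;

    if (current->flags & PF_RANDOMIZE)
        random_factor = arch_mmap_rnd();
    mm->mmap_base = mmap_base(random_factor);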

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Möller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
    I/O mappings with pud/pmd are enabled in the kernel.

    ioremap_huge_init() calls arch_ioremap_pud_supported() and
    arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

    A new kernel option "nohugeiomap" is also added, so that users can
    disable the huge I/O map capabilities when necessary.
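
    A hedged sketch of the wiring described (simplified; variable names
    illustrative):

    static int __read_mostly ioremap_huge_disabled;   /* set by "nohugeiomap" */
    static int __read_mostly ioremap_pud_capable;
    static int __read_mostly ioremap_pmd_capable;

    void __init ioremap_huge_init(void)
    {
        if (!ioremap_huge_disabled) {
            ioremap_pud_capable = arch_ioremap_pud_supported();
            ioremap_pmd_capable = arch_ioremap_pmd_supported();
        }
    }

    static inline int ioremap_pud_enabled(void) { return ioremap_pud_capable; }
    static inline int ioremap_pmd_enabled(void) { return ioremap_pmd_capable; }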

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • By this time all architectures which support more than two page table
    levels should be covered. This patch adds a default definition of
    PGTABLE_LEVELS equal to 2.

    We also add an assert to detect inconsistency between
    CONFIG_PGTABLE_LEVELS and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.
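
    A hedged sketch of the kind of consistency check meant here (not the
    exact assert from the patch): a folded PMD implies at most two levels,
    a folded PUD at most three:

    #if defined(__PAGETABLE_PMD_FOLDED) && CONFIG_PGTABLE_LEVELS > 2
    #error "CONFIG_PGTABLE_LEVELS inconsistent with __PAGETABLE_PMD_FOLDED"
    #endif
    #if defined(__PAGETABLE_PUD_FOLDED) && CONFIG_PGTABLE_LEVELS > 3
    #error "CONFIG_PGTABLE_LEVELS inconsistent with __PAGETABLE_PUD_FOLDED"
    #endif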

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Guenter Roeck
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jeff Dike
    Cc: Kirill A. Shutemov
    Cc: Koichi Yasutake
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Apr, 2015

1 commit

  • We want to enable the hard lockup detector on ppc64, but right now
    that enables the oprofile NMI timer too.

    We'd prefer not to enable the oprofile NMI timer, it adds another
    element to our PMU testing and it requires us to increase our
    exported symbols (eg cpu_khz).

    Modify the config entry for OPROFILE_NMI_TIMER to disable it on PPC64.

    Signed-off-by: Anton Blanchard
    Acked-by: Robert Richter
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     

19 Jul, 2014

1 commit

  • This adds the new "seccomp" syscall with both an "operation" and "flags"
    parameter for future expansion. The third argument is a pointer value,
    used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
    be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

    In addition to the TSYNC flag later in this patch series, there is a
    non-zero chance that this syscall could be used for configuring a fixed
    argument area for seccomp-tracer-aware processes to pass syscall arguments
    in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
    for this syscall. Additionally, this syscall uses operation, flags,
    and user pointer for arguments because strictly passing arguments via
    a user pointer would mean seccomp itself would be unable to trivially
    filter the seccomp syscall itself.
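
    A hedged userspace usage sketch (constants from the uapi headers; a
    raw syscall since libc has no wrapper yet; error handling omitted):

    #include <linux/seccomp.h>
    #include <linux/filter.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int install_filter(struct sock_fprog *prog)
    {
        /* equivalent to prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) */
        return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, prog);
    }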

    Signed-off-by: Kees Cook
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Andy Lutomirski

    Kees Cook
     

20 Dec, 2013

2 commits

  • This changes the stack protector config option into a choice of
    "None", "Regular", and "Strong":

    CONFIG_CC_STACKPROTECTOR_NONE
    CONFIG_CC_STACKPROTECTOR_REGULAR
    CONFIG_CC_STACKPROTECTOR_STRONG

    "Regular" means the old CONFIG_CC_STACKPROTECTOR=y option.

    "Strong" is a new mode introduced by this patch. With "Strong" the
    kernel is built with -fstack-protector-strong (available in
    gcc 4.9 and later). This option increases the coverage of the stack
    protector without the heavy performance hit of -fstack-protector-all.

    For reference, the stack protector options available in gcc are:

    -fstack-protector-all:
    Adds the stack-canary saving prefix and stack-canary checking
    suffix to _all_ function entry and exit. Results in substantial
    use of stack space for saving the canary for deep stack users
    (e.g. historically xfs), and measurable (though shockingly still
    low) performance hit due to all the saving/checking. Really not
    suitable for sane systems, and was entirely removed as an option
    from the kernel many years ago.

    -fstack-protector:
    Adds the canary save/check to functions that define an 8
    (--param=ssp-buffer-size=N, N=8 by default) or more byte local
    char array. Traditionally, stack overflows happened with
    string-based manipulations, so this was a way to find those
    functions. Very few total functions actually get the canary; no
    measurable performance or size overhead.

    -fstack-protector-strong
    Adds the canary for a wider set of functions, since it's not
    just those with strings that have ultimately been vulnerable to
    stack-busting. With this superset, more functions end up with a
    canary, but it still remains small compared to all functions
    with only a small change in performance. Based on the original
    design document, a function gets the canary when it contains any
    of:

    - local variable's address used as part of the right hand side
    of an assignment or function argument
    - local variable is an array (or union containing an array),
    regardless of array type or length
    - uses register local variables

    https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU
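
    A hedged illustration of the coverage difference (example functions,
    not from the patch): with -fstack-protector only buf_user() gets a
    canary, because of its char array; -strong also covers addr_escapes(),
    whose local's address is passed as an argument:

    #include <stdio.h>
    #include <unistd.h>

    void buf_user(void)
    {
        char buf[16];                   /* 8+ byte char array => canary */
        if (read(0, buf, sizeof(buf)) < 0)
            perror("read");
    }

    void addr_escapes(int *out)
    {
        int x = 0;
        if (scanf("%d", &x) == 1)       /* &x as function argument */
            *out = x;
    }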

    Find below a comparison of "size" and "objdump" output when built with
    gcc-4.9 in three configurations:

    - defconfig
    11430641 kernel text size
    36110 function bodies

    - defconfig + CONFIG_CC_STACKPROTECTOR_REGULAR
    11468490 kernel text size (+0.33%)
    1015 of 36110 functions are stack-protected (2.81%)

    - defconfig + CONFIG_CC_STACKPROTECTOR_STRONG via this patch
    11692790 kernel text size (+2.24%)
    7401 of 36110 functions are stack-protected (20.5%)

    With -strong, ARM's compressed boot code now triggers stack
    protection, so a static guard was added. Since this is only used
    during decompression and was never used before, the exposure
    here is very small. Once it switches to the full kernel, the
    stack guard is back to normal.

    Chrome OS has been using -fstack-protector-strong for its kernel
    builds for the last 8 months with no problems.

    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Cc: Michal Marek
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Paul Mundt
    Cc: James Hogan
    Cc: Stephen Rothwell
    Cc: Shawn Guo
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/1387481759-14535-3-git-send-email-keescook@chromium.org
    [ Improved the changelog and descriptions some more. ]
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • Instead of duplicating the CC_STACKPROTECTOR Kconfig and
    Makefile logic in each architecture, switch to using
    HAVE_CC_STACKPROTECTOR and keep everything in one place. This
    retains the x86-specific bug verification scripts.

    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Cc: Michal Marek
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Paul Mundt
    Cc: James Hogan
    Cc: Stephen Rothwell
    Cc: Shawn Guo
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/1387481759-14535-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     

12 Nov, 2013

1 commit

  • Pull timer changes from Ingo Molnar:
    "Main changes in this cycle were:

    - Updated full dynticks support.

    - Event stream support for architected (ARM) timers.

    - ARM clocksource driver updates.

    - Move arm64 to using the generic sched_clock framework & resulting
    cleanup in the generic sched_clock code.

    - Misc fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
    clocksource: sun4i: remove IRQF_DISABLED
    clocksource: sun4i: Report the minimum tick that we can program
    clocksource: sun4i: Select CLKSRC_MMIO
    clocksource: Provide timekeeping for efm32 SoCs
    clocksource: em_sti: convert to clk_prepare/unprepare
    time: Fix signedness bug in sysfs_get_uname() and its callers
    timekeeping: Fix some trivial typos in comments
    alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
    clocksource: arch_timer: Do not register arch_sys_counter twice
    timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
    sched_clock: Remove sched_clock_func() hook
    arch_timer: Move to generic sched_clock framework
    clocksource: tcb_clksrc: Remove IRQF_DISABLED
    clocksource: tcb_clksrc: Improve driver robustness
    clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
    clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
    clocksource: dw_apb_timer_of: Mark a few more functions as __init
    clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
    arm: zynq: Enable arm_global_timer
    ...

    Linus Torvalds
     

01 Oct, 2013

1 commit

  • If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().

    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user since irq_exit() is called in
    a mostly empty stack. So the stack merge in this case looks acceptable.

    Stack overruns still have a chance to happen if hardirqs have more
    opportunities to nest, but then it's another problem to solve.

    So let's adapt the irq exit's softirq stack on top of a new Kconfig symbol
    that can be defined when irq_exit() runs on the irq stack. That way
    we can spare some stack switch on irq processing and all the cache
    issues that come along.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     

30 Sep, 2013

1 commit

  • With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
    to use that feature, arch code should be audited to ensure there are no
    races in concurrent read/write of cputime_t. For example,
    reading/writing 64-bit cputime_t on some 32-bit arches may require
    multiple accesses for low and high value parts, so proper locking
    is needed to protect against concurrent accesses.

    Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
    enable after they've been audited for potential races.

    This option is automatically enabled on 64-bit platforms.
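
    A hedged illustration of the torn-read hazard being audited for
    (userspace-style sketch, not kernel code):

    #include <stdint.h>

    struct acct { uint32_t lo, hi; };   /* a 64-bit count, split on 32-bit */

    uint64_t torn_read(volatile struct acct *a)
    {
        /* if a writer carries from lo into hi between these two loads,
         * the reader sees a value that never existed; a seqlock or a
         * native 64-bit load is needed */
        uint32_t lo = a->lo;
        uint32_t hi = a->hi;
        return ((uint64_t)hi << 32) | lo;
    }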

    Feature requested by Frederic Weisbecker.

    Signed-off-by: Kevin Hilman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Paul E. McKenney
    Cc: Arm Linux
    Signed-off-by: Frederic Weisbecker

    Kevin Hilman
     

28 Sep, 2013

1 commit

  • Linus suggested to replace

    #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
    #define arch_mutex_cpu_relax() cpu_relax()
    #endif

    with just a simple

    #ifndef arch_mutex_cpu_relax
    # define arch_mutex_cpu_relax() cpu_relax()
    #endif

    to get rid of CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX. Architectures can then
    simply define arch_mutex_cpu_relax if they want an architecture-specific
    function, instead of also having to add a select statement in their
    Kconfig.
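
    An architecture that wants its own variant now simply defines the
    macro in its own headers, along the lines of (hedged example):

    #define arch_mutex_cpu_relax() barrier()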

    Suggested-by: Linus Torvalds
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

14 Aug, 2013

1 commit

  • Fix inadvertent breakage in the clone syscall ABI for Microblaze that
    was introduced in commit f3268edbe6fe ("microblaze: switch to generic
    fork/vfork/clone").

    The Microblaze syscall ABI for clone takes the parent tid address in the
    4th argument; the third argument slot is used for the stack size. The
    incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
    slot.

    This commit restores the original ABI so that existing userspace libc
    code will work correctly.

    All kernel versions from v3.8-rc1 were affected.
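
    For reference, a sketch of the restored argument order (the kernel's
    CLONE_BACKWARDS3 layout; types simplified):

    long sys_clone(unsigned long clone_flags, unsigned long newsp,
                   int stack_size,              /* 3rd slot: stack size */
                   int __user *parent_tidptr,   /* 4th slot: parent tid */
                   int __user *child_tidptr,
                   unsigned long tls);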

    Signed-off-by: Michal Simek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Simek
     

04 Jul, 2013

1 commit

  • The soft-dirty is a bit on a PTE which helps to track which pages a task
    writes to. In order to do this tracking one should

    1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
    2. Wait some time.
    3. Read soft-dirty bits (the 55th bit in /proc/PID/pagemap2 entries)

    To do this tracking, the writable bit is cleared from PTEs when the
    soft-dirty bit is cleared. Thus, after this, when the task tries to
    modify a page at some virtual address, a #PF occurs and the kernel sets
    the soft-dirty bit on the respective PTE.

    Note, that although all the task's address space is marked as r/o after
    the soft-dirty bits clear, the #PF-s that occur after that are processed
    fast. This is so, since the pages are still mapped to physical memory,
    and thus all the kernel does is find this fact out and put the
    writable, dirty and soft-dirty bits back on the PTE.
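
    A hedged sketch of step 3 above, testing the soft-dirty bit in an
    entry read from /proc/PID/pagemap2:

    #include <stdint.h>

    static int soft_dirty(uint64_t pagemap_entry)
    {
        return (pagemap_entry >> 55) & 1;   /* bit 55: soft-dirty */
    }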

    Another thing to note, is that when mremap moves PTEs they are marked
    with soft-dirty as well, since from the user perspective mremap modifies
    the virtual memory at mremap's new address.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

06 May, 2013

1 commit

  • Pull module updates from Rusty Russell:
    "We get rid of the general module prefix confusion with a binary config
    option, fix a remove/insert race which Never Happens, and (my
    favorite) handle the case when we have too many modules for a single
    commandline. Seriously, the kernel is full, please go away!"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
    X.509: Support parse long form of length octets in Authority Key Identifier
    module: don't unlink the module until we've removed all exposure.
    kernel: kallsyms: memory override issue, need check destination buffer length
    MODSIGN: do not send garbage to stderr when enabling modules signature
    modpost: handle huge numbers of modules.
    modpost: add -T option to read module names from file/stdin.
    modpost: minor cleanup.
    genksyms: pass symbol-prefix instead of arch
    module: fix symbol versioning with symbol prefixes
    CONFIG_SYMBOL_PREFIX: cleanup.

    Linus Torvalds
     

05 May, 2013

1 commit

  • commit d1669912 (idle: Implement generic idle function) added a new
    generic idle along with support for hlt/nohlt command line options to
    override default idle loop behavior. However, the command-line
    processing is never compiled.

    The command-line handling is wrapped by CONFIG_GENERIC_IDLE_POLL_SETUP
    and arches that use this feature select it in their Kconfigs.
    However, no Kconfig definition was created for this option, so it is
    never enabled, and therefore command-line override of the idle-loop
    behavior is broken after migrating to the generic idle loop.

    To fix, add a Kconfig definition for GENERIC_IDLE_POLL_SETUP.

    Tested on ARM (OMAP4/Panda) which enables the command-line overrides
    by default.

    Signed-off-by: Kevin Hilman
    Reviewed-by: Srivatsa S. Bhat
    Cc: Linus Torvalds
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Magnus Damm
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linaro-kernel@lists.linaro.org
    Link: http://lkml.kernel.org/r/1366849153-25564-1-git-send-email-khilman@linaro.org
    Signed-off-by: Thomas Gleixner

    Kevin Hilman
     

01 May, 2013

1 commit

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

15 Mar, 2013

1 commit

  • We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
    "_". But Al Viro broke this in "consolidate cond_syscall and
    SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
    do so.

    Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
    prefix something with it. So various places define helpers which are
    defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

    1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
    2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
    3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
    4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
    5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
    6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
    7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
    CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
    for pasting.

    (arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

    Let's solve this properly:
    1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
    2) Make linux/export.h usable from asm.
    3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
    4) Make everyone use them.
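
    A sketch of the resulting helpers (close to what export.h ends up
    with; simplified):

    #ifdef CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
    #define __VMLINUX_SYMBOL(x)     _##x
    #define __VMLINUX_SYMBOL_STR(x) "_" #x
    #else
    #define __VMLINUX_SYMBOL(x)     x
    #define __VMLINUX_SYMBOL_STR(x) #x
    #endif

    /* indirect, so macro arguments are expanded before pasting */
    #define VMLINUX_SYMBOL(x)     __VMLINUX_SYMBOL(x)
    #define VMLINUX_SYMBOL_STR(x) __VMLINUX_SYMBOL_STR(x)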

    Signed-off-by: Rusty Russell
    Reviewed-by: James Hogan
    Tested-by: James Hogan (metag)

    Rusty Russell