Eric Lee / smarc-fsl-linux-kernel

13 Jul, 2017

23 commits

7c43a657a test_sysctl: test against int proc_dointvec() array support ... Browse Code »

Add a few initial respective tests for an array:

o Echoing values separated by spaces works
o Echoing only first elements will set first elements
o Confirm PAGE_SIZE limit still applies even if an array is used

Link: http://lkml.kernel.org/r/20170630224431.17374-7-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
2920fad3a test_sysctl: add simple proc_douintvec() case ... Browse Code »

Test against a simple proc_douintvec() case. While at it, add a test
against UINT_MAX. Make sure UINT_MAX works, and UINT_MAX+1 will fail
and that negative values are not accepted.

Link: http://lkml.kernel.org/r/20170630224431.17374-6-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
eb965eda1 test_sysctl: add simple proc_dointvec() case ... Browse Code »

Test against a simple proc_dointvec() case. While at it, add a test
against INT_MAX. Make sure INT_MAX works, and INT_MAX+1 will fail.
Also test negative values work.

Link: http://lkml.kernel.org/r/20170630224431.17374-5-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
1c0357c84 test_sysctl: test against PAGE_SIZE for int ... Browse Code »

Add the following tests to ensure we do not regress:

o Test using a buffer full of space (PAGE_SIZE-1) followed by a
single digit works

o Test using a buffer full of spaces (PAGE_SIZE or over) will fail

As tests increase instead of unloading the module and reloading it we
can just do a shell reset_vals() with a reset to values we know are set
at init on the driver.

Link: http://lkml.kernel.org/r/20170630224431.17374-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
64b671204 test_sysctl: add generic script to expand on tests ... Browse Code »

This adds a generic script to let us more easily add more tests cases.
Since we really have only two types of tests cases just fold them into
the one file. Each test unit is now identified into its separate
function:

# ./sysctl.sh -l
Test ID list:

TEST_ID x NUM_TEST
TEST_ID: Test ID
NUM_TESTS: Number of recommended times to run the test

0001 x 1 - tests proc_dointvec_minmax()
0002 x 1 - tests proc_dostring()

For now we start off with what we had before, and run only each test
once. We can now watch a test case until it fails:

./sysctl.sh -w 0002

We can also run a test case x number of times, say we want to run a test
case 100 times:

./sysctl.sh -c 0001 100

To run a test case only once, for example:

./sysctl.sh -s 0002

The default settings are specified at the top of sysctl.sh.

Link: http://lkml.kernel.org/r/20170630224431.17374-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
9308f2f9e test_sysctl: add dedicated proc sysctl test driver ... Browse Code »

The existing tools/testing/selftests/sysctl/ tests include two test
cases, but these use existing production kernel sysctl interfaces. We
want to expand test coverage but we can't just be looking for random
safe production values to poke at, that's just insane!

Instead just dedicate a test driver for debugging purposes and port the
existing scripts to use it. This will make it easier for further tests
to be added.

Subsequent patches will extend our test coverage for sysctl.

The stress test driver uses a new license (GPL on Linux, copyleft-next
outside of Linux). Linus was fine with this [0] and later due to Ted's
and Alans's request ironed out an "or" language clause to use [1] which
is already present upstream.

[0] https://lkml.kernel.org/r/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLsOu0Og@mail.gmail.com
[1] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com

Link: http://lkml.kernel.org/r/20170630224431.17374-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Acked-by: Kees Cook
Cc: "Eric W. Biederman"
Cc: Shuah Khan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
61d9b56a8 sysctl: add unsigned int range support ... Browse Code »

To keep parity with regular int interfaces provide the an unsigned int
proc_douintvec_minmax() which allows you to specify a range of allowed
valid numbers.

Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an
actual user for that.

Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Acked-by: Kees Cook
Cc: Subash Abhinov Kasiviswanathan
Cc: Heinrich Schuchardt
Cc: Kees Cook
Cc: "David S. Miller"
Cc: Ingo Molnar
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
4f2fec00a sysctl: simplify unsigned int support ... Browse Code »

Commit e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32
fields") added proc_douintvec() to start help adding support for
unsigned int, this however was only half the work needed. Two fixes
have come in since then for the following issues:

o Printing the values shows a negative value, this happens since
do_proc_dointvec() and this uses proc_put_long()

This was fixed by commit 5380e5644afbba9 ("sysctl: don't print negative
flag for proc_douintvec").

o We can easily wrap around the int values: UINT_MAX is 4294967295, if
we echo in 4294967295 + 1 we end up with 0, using 4294967295 + 2 we
end up with 1.
o We echo negative values in and they are accepted

This was fixed by commit 425fffd886ba ("sysctl: report EINVAL if value
is larger than UINT_MAX for proc_douintvec").

It still also failed to be added to sysctl_check_table()... instead of
adding it with the current implementation just provide a proper and
simplified unsigned int support without any array unsigned int support
with no negative support at all.

Historically sysctl proc helpers have supported arrays, due to the
complexity this adds though we've taken a step back to evaluate array
users to determine if its worth upkeeping for unsigned int. An
evaluation using Coccinelle has been done to perform a grammatical
search to ask ourselves:

o How many sysctl proc_dointvec() (int) users exist which likely
should be moved over to proc_douintvec() (unsigned int) ?
Answer: about 8
- Of these how many are array users ?
Answer: Probably only 1
o How many sysctl array users exist ?
Answer: about 12

This last question gives us an idea just how popular arrays: they are not.
Array support should probably just be kept for strings.

The identified uint ports are:

drivers/infiniband/core/ucma.c - max_backlog
drivers/infiniband/core/iwcm.c - default_backlog
net/core/sysctl_net_core.c - rps_sock_flow_sysctl()
net/netfilter/nf_conntrack_timestamp.c - nf_conntrack_timestamp -- bool
net/netfilter/nf_conntrack_acct.c nf_conntrack_acct -- bool
net/netfilter/nf_conntrack_ecache.c - nf_conntrack_events -- bool
net/netfilter/nf_conntrack_helper.c - nf_conntrack_helper -- bool
net/phonet/sysctl.c proc_local_port_range()

The only possible array users is proc_local_port_range() but it does not
seem worth it to add array support just for this given the range support
works just as well. Unsigned int support should be desirable more for
when you *need* more than INT_MAX or using int min/max support then does
not suffice for your ranges.

If you forget and by mistake happen to register an unsigned int proc
entry with an array, the driver will fail and you will get something as
follows:

sysctl table check failed: debug/test_sysctl//uint_0002 array now allowed
CPU: 2 PID: 1342 Comm: modprobe Tainted: G W E
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Call Trace:
dump_stack+0x63/0x81
__register_sysctl_table+0x350/0x650
? kmem_cache_alloc_trace+0x107/0x240
__register_sysctl_paths+0x1b3/0x1e0
? 0xffffffffc005f000
register_sysctl_table+0x1f/0x30
test_sysctl_init+0x10/0x1000 [test_sysctl]
do_one_initcall+0x52/0x1a0
? kmem_cache_alloc_trace+0x107/0x240
do_init_module+0x5f/0x200
load_module+0x1867/0x1bd0
? __symbol_put+0x60/0x60
SYSC_finit_module+0xdf/0x110
SyS_finit_module+0xe/0x10
entry_SYSCALL_64_fastpath+0x1e/0xad
RIP: 0033:0x7f042b22d119

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Link: http://lkml.kernel.org/r/20170519033554.18592-5-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Suggested-by: Alexey Dobriyan
Cc: Subash Abhinov Kasiviswanathan
Cc: Liping Zhang
Cc: Alexey Dobriyan
Cc: Heinrich Schuchardt
Cc: Kees Cook
Cc: "David S. Miller"
Cc: Ingo Molnar
Cc: Al Viro
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
d383d4847 sysctl: fold sysctl_writes_strict checks into helper ... Browse Code »

The mode sysctl_writes_strict positional checks keep being copy and pasted
as we add new proc handlers. Just add a helper to avoid code duplication.

Link: http://lkml.kernel.org/r/20170519033554.18592-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Suggested-by: Kees Cook
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
a19ac3374 sysctl: kdoc'ify sysctl_writes_strict ... Browse Code »

Document the different sysctl_writes_strict modes in code.

Link: http://lkml.kernel.org/r/20170519033554.18592-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
89c5b53b1 sysctl: fix lax sysctl_check_table() sanity check ... Browse Code »

Patch series "sysctl: few fixes", v5.

I've been working on making kmod more deterministic, and as I did that I
couldn't help but notice a few issues with sysctl. My end goal was just
to fix unsigned int support, which back then was completely broken.
Liping Zhang has sent up small atomic fixes, however it still missed yet
one more fix and Alexey Dobriyan had also suggested to just drop array
support given its complexity.

I have inspected array support using Coccinelle and indeed its not that
popular, so if in fact we can avoid it for new interfaces, I agree its
best.

I did develop a sysctl stress driver but will hold that off for another
series.

This patch (of 5):

Commit 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks")
improved sanity checks considerbly, however the enhancements on
sysctl_check_table() meant adding a functional change so that only the
last table entry's sanity error is propagated. It also changed the way
errors were propagated so that each new check reset the err value, this
means only last sanity check computed is used for an error. This has
been in the kernel since v3.4 days.

Fix this by carrying on errors from previous checks and iterations as we
traverse the table and ensuring we keep any error from previous checks.
We keep iterating on the table even if an error is found so we can
complain for all errors found in one shot. This works as -EINVAL is
always returned on error anyway, and the check for error is any non-zero
value.

Fixes: 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks")
Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Luis R. Rodriguez
2017-07-13 07:26:00 +0800
a711bdc09 kexec/kdump: minor Documentation updates for arm64 and Image ... Browse Code »

Minor updates in Documentation for arm64 as relocatable kernel. Also
this patch updates documentation for using uncompressed image "Image"
which is used for ARM64.

Link: http://lkml.kernel.org/r/1495104793-6563-1-git-send-email-Bharat.Bhushan@nxp.com
Signed-off-by: Bharat Bhushan
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Jonathan Corbet
Cc: AKASHI Takahiro
Cc: Pratyush Anand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bharat Bhushan
2017-07-13 07:26:00 +0800
1229384f5 kdump: protect vmcoreinfo data under the crash memory ... Browse Code »

Currently vmcoreinfo data is updated at boot time subsys_initcall(), it
has the risk of being modified by some wrong code during system is
running.

As a result, vmcore dumped may contain the wrong vmcoreinfo. Later on,
when using "crash", "makedumpfile", etc utility to parse this vmcore, we
probably will get "Segmentation fault" or other unexpected errors.

E.g. 1) wrong code overwrites vmcoreinfo_data; 2) further crashes the
system; 3) trigger kdump, then we obviously will fail to recognize the
crash context correctly due to the corrupted vmcoreinfo.

Now except for vmcoreinfo, all the crash data is well
protected(including the cpu note which is fully updated in the crash
path, thus its correctness is guaranteed). Given that vmcoreinfo data
is a large chunk prepared for kdump, we better protect it as well.

To solve this, we relocate and copy vmcoreinfo_data to the crash memory
when kdump is loading via kexec syscalls. Because the whole crash
memory will be protected by existing arch_kexec_protect_crashkres()
mechanism, we naturally protect vmcoreinfo_data from write(even read)
access under kernel direct mapping after kdump is loaded.

Since kdump is usually loaded at the very early stage after boot, we can
trust the correctness of the vmcoreinfo data copied.

On the other hand, we still need to operate the vmcoreinfo safe copy
when crash happens to generate vmcoreinfo_note again, we rely on vmap()
to map out a new kernel virtual address and update to use this new one
instead in the following crash_save_vmcoreinfo().

BTW, we do not touch vmcoreinfo_note, because it will be fully updated
using the protected vmcoreinfo_data after crash which is surely correct
just like the cpu crash note.

Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang
Tested-by: Michael Holzheu
Cc: Benjamin Herrenschmidt
Cc: Dave Young
Cc: Eric Biederman
Cc: Hari Bathini
Cc: Juergen Gross
Cc: Mahesh Salgaonkar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xunlei Pang
2017-07-13 07:26:00 +0800
5203f4995 powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr ... Browse Code »

vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we
should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE.

Like explained in commit 77019967f06b ("kdump: fix exported size of
vmcoreinfo note"), it should not affect the actual function, but we
better fix it, also this change should be safe and backward compatible.

After this, we can get rid of variable vmcoreinfo_max_size, let's use
the corresponding macros directly, fewer variables means more safety for
vmcoreinfo operation.

[xlpang@redhat.com: fix build warning]
Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang
Reviewed-by: Mahesh Salgaonkar
Reviewed-by: Dave Young
Cc: Hari Bathini
Cc: Benjamin Herrenschmidt
Cc: Eric Biederman
Cc: Juergen Gross
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xunlei Pang
2017-07-13 07:25:59 +0800
203e9e412 kexec: move vmcoreinfo out of the kernel's .bss section ... Browse Code »

As Eric said,
"what we need to do is move the variable vmcoreinfo_note out of the
kernel's .bss section. And modify the code to regenerate and keep this
information in something like the control page.

Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures. I clearly was not
watching closely the data someone decided to keep this silly thing in
the kernel's .bss section."

This patch allocates extra pages for these vmcoreinfo_XXX variables, one
advantage is that it enhances some safety of vmcoreinfo, because
vmcoreinfo now is kept far away from other kernel data structures.

Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang
Tested-by: Michael Holzheu
Reviewed-by: Juergen Gross
Suggested-by: Eric Biederman
Cc: Benjamin Herrenschmidt
Cc: Dave Young
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xunlei Pang
2017-07-13 07:25:59 +0800
112166f88 kernel/fork.c: virtually mapped stacks: do not disable interrupts ... Browse Code »

The reason to disable interrupts seems to be to avoid switching to a
different processor while handling per cpu data using individual loads and
stores. If we use per cpu RMV primitives we will not have to disable
interrupts.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
Signed-off-by: Christoph Lameter
Cc: Andy Lutomirski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2017-07-13 07:25:59 +0800
91a90140f mm/memory.c: mark create_huge_pmd() inline to prevent build failure ... Browse Code »

With gcc 4.1.2:

mm/memory.o: In function `create_huge_pmd':
memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page'

Interestingly, create_huge_pmd() is emitted in the assembler output, but
never called.

Converting transparent_hugepage_enabled() from a macro to a static
inline function reduced the ability of the compiler to remove unused
code.

Fix this by marking create_huge_pmd() inline.

Fixes: 16981d763501c0e0 ("mm: improve readability of transparent_hugepage_enabled()")
Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven
Acked-by: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geert Uytterhoeven
2017-07-13 07:25:59 +0800
c7acec713 kernel.h: handle pointers to arrays better in container_of() ... Browse Code »

If the first parameter of container_of() is a pointer to a
non-const-qualified array type (and the third parameter names a
non-const-qualified array member), the local variable __mptr will be
defined with a const-qualified array type. In ISO C, these types are
incompatible. They work as expected in GNU C, but some versions will
issue warnings. For example, GCC 4.9 produces the warning
"initialization from incompatible pointer type".

Here is an example of where the problem occurs:

-------------------------------------------------------
#include
#include

MODULE_LICENSE("GPL");

struct st {
int a;
char b[16];
};

static int __init example_init(void) {
struct st t = { .a = 101, .b = "hello" };
char (*p)[16] = &t.b;
struct st *x = container_of(p, struct st, b);
printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x);
return 0;
}

static void __exit example_exit(void) {
}

module_init(example_init);
module_exit(example_exit);
-------------------------------------------------------

Building the module with gcc-4.9 results in these warnings (where '{m}'
is the module source and '{k}' is the kernel source):

-------------------------------------------------------
In file included from {m}/example.c:1:0:
{m}/example.c: In function `example_init':
{k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
^
{m}/example.c:14:17: note: in expansion of macro `container_of'
struct st *x = container_of(p, struct st, b);
^
{k}/include/linux/kernel.h:854:48: warning: (near initialization for `x')
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
^
{m}/example.c:14:17: note: in expansion of macro `container_of'
struct st *x = container_of(p, struct st, b);
^
-------------------------------------------------------

Replace the type checking performed by the macro to avoid these
warnings. Make sure `*(ptr)` either has type compatible with the
member, or has type compatible with `void`, ignoring qualifiers. Raise
compiler errors if this is not true. This is stronger than the previous
behaviour, which only resulted in compiler warnings for a type mismatch.

[arnd@arndb.de: fix new warnings for container_of()]
Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk
Signed-off-by: Ian Abbott
Signed-off-by: Arnd Bergmann
Acked-by: Michal Nazarewicz
Acked-by: Kees Cook
Cc: Hidehiro Kawai
Cc: Borislav Petkov
Cc: Rasmus Villemoes
Cc: Johannes Berg
Cc: Peter Zijlstra
Cc: Alexander Potapenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Abbott
2017-07-13 07:25:59 +0800
0a2c13d9c include/linux/dcache.h: use unsigned chars in struct name_snapshot ... Browse Code »

"kernel.h: handle pointers to arrays better in container_of()" triggers:

In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/syscalls.h:71,
from fs/dcache.c:17:
fs/dcache.c: In function 'release_dentry_name_snapshot':
include/linux/compiler.h:542:38: error: call to '__compiletime_assert_305' declared with attribute error: pointer type mismatch in container_of()
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
include/linux/compiler.h:525:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^
include/linux/compiler.h:542:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^
include/linux/kernel.h:860:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^
fs/dcache.c:305:7: note: in expansion of macro 'container_of'
p = container_of(name->name, struct external_name, name[0]);

Switch name_snapshot to use unsigned chars, matching struct qstr and
struct external_name.

Link: http://lkml.kernel.org/r/20170710152134.0f78c1e6@canb.auug.org.au
Signed-off-by: Stephen Rothwell
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Rothwell
2017-07-13 07:25:59 +0800
235b84fc8 Merge branch 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux ... Browse Code »

Pull i2c updates from Wolfram Sang:
"This pull request contains:

- i2c core reorganization. One source file became too monolithic. It
is now split up, yet we still have the same named object as the
final output. This should ease maintenance.

- new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX

- designware driver gained slave mode support

- xgene-slimpro driver gained ACPI support

- bigger overhaul for pca-platform driver

- the algo-bit module now supports messages with enforced STOP

- slightly bigger than usual set of driver updates and improvements

and with much appreciated quality assurance from Andy Shevchenko"

* 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
i2c: Provide a stub for i2c_detect_slave_mode()
i2c: designware: Let slave adapter support be optional
i2c: designware: Make HW init functions static
i2c: designware: fix spelling mistakes
i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
i2c: pca-platform: correctly set algo_data.reset_chip
i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
i2c: designware: enable SLAVE in platform module
i2c: designware: add SLAVE mode functions
i2c: zx2967: drop COMPILE_TEST dependency
i2c: zx2967: always use the same device when printing errors
i2c: pca-platform: use dev_warn/dev_info instead of printk
i2c: pca-platform: use device managed allocations
i2c: pca-platform: add devicetree awareness
i2c: pca-platform: switch to struct gpio_desc
dt-bindings: add bindings for i2c-pca-platform
i2c: cadance: fix ctrl/addr reg write order
i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
dt: bindings: add documentation for zx2967 family i2c controller
i2c: algo-bit: add support for I2C_M_STOP
...

Linus Torvalds
2017-07-13 01:04:56 +0800
fb4e3beef Merge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu ... Browse Code »

Pull IOMMU updates from Joerg Roedel:
"This update comes with:

- Support for lockless operation in the ARM io-pgtable code.

This is an important step to solve the scalability problems in the
common dma-iommu code for ARM

- Some Errata workarounds for ARM SMMU implemenations

- Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.

The code suffered from very high flush rates, with the new
implementation the flush rate is down to ~1% of what it was before

- Support for amd_iommu=off when booting with kexec.

The problem here was that the IOMMU driver bailed out early without
disabling the iommu hardware, if it was enabled in the old kernel

- The Rockchip IOMMU driver is now available on ARM64

- Align the return value of the iommu_ops->device_group call-backs to
not miss error values

- Preempt-disable optimizations in the Intel VT-d and common IOVA
code to help Linux-RT

- Various other small cleanups and fixes"

* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
iommu/vt-d: Constify intel_dma_ops
iommu: Warn once when device_group callback returns NULL
iommu/omap: Return ERR_PTR in device_group call-back
iommu: Return ERR_PTR() values from device_group call-backs
iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
iommu/vt-d: Don't disable preemption while accessing deferred_flush()
iommu/iova: Don't disable preempt around this_cpu_ptr()
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
iommu/arm-smmu-v3: Remove io-pgtable spinlock
iommu/arm-smmu: Remove io-pgtable spinlock
iommu/io-pgtable-arm-v7s: Support lockless operation
iommu/io-pgtable-arm: Support lockless operation
iommu/io-pgtable: Introduce explicit coherency
iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
...

Linus Torvalds
2017-07-13 01:00:04 +0800
6b1c776d3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs updates from Miklos Szeredi:
"This work from Amir introduces the inodes index feature, which
provides:

- hardlinks are not broken on copy up

- infrastructure for overlayfs NFS export

This also fixes constant st_ino for samefs case for lower hardlinks"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
ovl: mark parent impure and restore timestamp on ovl_link_up()
ovl: document copying layers restrictions with inodes index
ovl: cleanup orphan index entries
ovl: persistent overlay inode nlink for indexed inodes
ovl: implement index dir copy up
ovl: move copy up lock out
ovl: rearrange copy up
ovl: add flag for upper in ovl_entry
ovl: use struct copy_up_ctx as function argument
ovl: base tmpfile in workdir too
ovl: factor out ovl_copy_up_inode() helper
ovl: extract helper to get temp file in copy up
ovl: defer upper dir lock to tempfile link
ovl: hash overlay non-dir inodes by copy up origin
ovl: cleanup bad and stale index entries on mount
ovl: lookup index entry for copy up origin
ovl: verify index dir matches upper dir
ovl: verify upper root dir matches lower root dir
ovl: introduce the inodes index dir feature
ovl: generalize ovl_create_workdir()
...

Linus Torvalds
2017-07-13 00:28:55 +0800
58c7ffc07 fix a braino in compat_sys_getrlimit() ... Browse Code »

Reported-and-tested-by: Meelis Roos
Fixes: commit d9e968cb9f84 "getrlimit()/setrlimit(): move compat to native"
Signed-off-by: Al Viro
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds

Al Viro
2017-07-13 00:15:00 +0800

12 Jul, 2017

7 commits

3b06b1a74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc ... Browse Code »

Pull sparc fixes from David Miller:

- Fix symbol version generation for assembler on sparc, from
Nagarathnam Muthusamy.

- Fix compound page handling in gup_huge_pmd(), from Nitin Gupta.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix gup_huge_pmd
Adding the type of exported symbols
sed regex in Makefile.build requires line break between exported symbols
Adding asm-prototypes.h for genksyms to generate crc

Linus Torvalds
2017-07-12 12:34:24 +0800
130568d5e Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull more block updates from Jens Axboe:
"This is a followup for block changes, that didn't make the initial
pull request. It's a bit of a mixed bag, this contains:

- A followup pull request from Sagi for NVMe. Outside of fixups for
NVMe, it also includes a series for ensuring that we properly
quiesce hardware queues when browsing live tags.

- Set of integrity fixes from Dmitry (mostly), fixing various issues
for folks using DIF/DIX.

- Fix for a bug introduced in cciss, with the req init changes. From
Christoph.

- Fix for a bug in BFQ, from Paolo.

- Two followup fixes for lightnvm/pblk from Javier.

- Depth fix from Ming for blk-mq-sched.

- Also from Ming, performance fix for mtip32xx that was introduced
with the dynamic initialization of commands"

* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
block: call bio_uninit in bio_endio
nvmet: avoid unneeded assignment of submit_bio return value
nvme-pci: add module parameter for io queue depth
nvme-pci: compile warnings in nvme_alloc_host_mem()
nvmet_fc: Accept variable pad lengths on Create Association LS
nvme_fc/nvmet_fc: revise Create Association descriptor length
lightnvm: pblk: remove unnecessary checks
lightnvm: pblk: control I/O flow also on tear down
cciss: initialize struct scsi_req
null_blk: fix error flow for shared tags during module_init
block: Fix __blkdev_issue_zeroout loop
nvme-rdma: unconditionally recycle the request mr
nvme: split nvme_uninit_ctrl into stop and uninit
virtio_blk: quiesce/unquiesce live IO when entering PM states
mtip32xx: quiesce request queues to make sure no submissions are inflight
nbd: quiesce request queues to make sure no submissions are inflight
nvme: kick requeue list when requeueing a request instead of when starting the queues
nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
...

Linus Torvalds
2017-07-12 06:36:52 +0800
908b852df Merge tag 'smb3-security-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes and sane default from Steve French:
"Upgrade default dialect to more secure SMB3 from older cifs dialect"

* tag 'smb3-security-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Clean up unused variables in smb2pdu.c
[SMB3] Improve security, move default dialect to SMB3 from old CIFS
[SMB3] Remove ifdef since SMB3 (and later) now STRONGLY preferred
CIFS: Reconnect expired SMB sessions
CIFS: Display SMB2 error codes in the hex format
cifs: Use smb 2 - 3 and cifsacl mount options setacl function
cifs: prototype declaration and definition to set acl for smb 2 - 3 and cifsacl mount options

Linus Torvalds
2017-07-12 05:04:48 +0800
3bf7878f0 Merge tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client ... Browse Code »

Pull ceph updates from Ilya Dryomov:
"The main item here is support for v12.y.z ("Luminous") clusters:
RESEND_ON_SPLIT, RADOS_BACKOFF, OSDMAP_PG_UPMAP and CRUSH_CHOOSE_ARGS
feature bits, and various other changes in the RADOS client protocol.

On top of that we have a new fsc mount option to allow supplying
fscache uniquifier (similar to NFS) and the usual pile of filesystem
fixes from Zheng"

* tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client: (44 commits)
libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS
libceph: osd_state is 32 bits wide in luminous
crush: remove an obsolete comment
crush: crush_init_workspace starts with struct crush_work
libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()
crush: implement weight and id overrides for straw2
libceph: apply_upmap()
libceph: compute actual pgid in ceph_pg_to_up_acting_osds()
libceph: pg_upmap[_items] infrastructure
libceph: ceph_decode_skip_* helpers
libceph: kill __{insert,lookup,remove}_pg_mapping()
libceph: introduce and switch to decode_pg_mapping()
libceph: don't pass pgid by value
libceph: respect RADOS_BACKOFF backoffs
libceph: make DEFINE_RB_* helpers more general
libceph: avoid unnecessary pi lookups in calc_target()
libceph: use target pi for calc_target() calculations
libceph: always populate t->target_{oid,oloc} in calc_target()
libceph: make sure need_resend targets reflect latest map
libceph: delete from need_resend_linger before check_linger_pool_dne()
...

Linus Torvalds
2017-07-12 03:12:28 +0800
07d306c83 Merge git://www.linux-watchdog.org/linux-watchdog ... Browse Code »

Pull watchdog updates from Wim Van Sebroeck:

- Add Renesas RZ/A WDT Watchdog driver

- STM32 Independent WatchDoG (IWDG) support

- UniPhier watchdog support

- Add F71868 support

- Add support for NCT6793D and NCT6795D

- dw_wdt: add reset lines support

- core: add option to avoid early handling of watchdog

- core: introduce watchdog_worker_should_ping helper

- Cleanups and improvements for sama5d4, intel-mid_wdt, s3c2410_wdt,
orion_wdt, gpio_wdt, it87_wdt, meson_wdt, davinci_wdt, bcm47xx_wdt,
zx2967_wdt, cadence_wdt

* git://www.linux-watchdog.org/linux-watchdog: (32 commits)
watchdog: introduce watchdog_worker_should_ping helper
watchdog: uniphier: add UniPhier watchdog driver
dt-bindings: watchdog: add description for UniPhier WDT controller
watchdog: cadence_wdt: make of_device_ids const.
watchdog: zx2967: constify zx2967_wdt_ops.
watchdog: bcm47xx_wdt: constify bcm47xx_wdt_hard_ops and bcm47xx_wdt_soft_ops
watchdog: davinci: Add missing clk_disable_unprepare().
watchdog: davinci: Handle return value of clk_prepare_enable
watchdog: meson: Handle return value of clk_prepare_enable
watchdog: it87: Add support for various Super-IO chips
watchdog: it87: Use infrastructure to stop watchdog on reboot
watchdog: it87: Drop support for resetting watchdog though CIR and Game port
watchdog: it87: Convert to use watchdog core infrastructure
watchdog: it87: Drop FSF mailing address
watchdog: dw_wdt: get reset lines from dt
watchdog: bindings: dw_wdt: add reset lines
watchdog: w83627hf: Add support for NCT6793D and NCT6795D
watchdog: core: add option to avoid early handling of watchdog
watchdog: f71808e_wdt: Add F71868 support
watchdog: Add STM32 IWDG driver
...

Linus Torvalds
2017-07-12 00:59:37 +0800
a3ddacbae Merge tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux… ... Browse Code »

…/kernel/git/bleung/chrome-platform

Pull chrome platform updates from Benson Leung:
"Changes in this pull request are around catching up cros_ec with the
internal chromeos-kernel versions of cros_ec, cros_ec_lpc, and
cros_ec_lightbar.

Also, switching maintainership from olof to bleung"

* tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
platform/chrome : Add myself as Maintainer
platform/chrome: cros_ec_lightbar - hide unused PM functions
cros_ec: Don't signal wake event for non-wake host events
cros_ec: Fix deadlock when EC is not responsive at probe
cros_ec: Don't return error when checking command version
platform/chrome: cros_ec_lightbar - Avoid I2C xfer to EC during suspend
platform/chrome: cros_ec_lightbar - Add userspace lightbar control bit to EC
platform/chrome: cros_ec_lightbar - Control of suspend/resume lightbar sequence
platform/chrome: cros_ec_lightbar - Add lightbar program feature to sysfs
platform/chrome: cros_ec_lpc: Add MKBP events support over ACPI
platform/chrome: cros_ec_lpc: Add power management ops
platform/chrome: cros_ec_lpc: Add support for GOOG004 ACPI device
platform/chrome: cros_ec_lpc: Add support for mec1322 EC
platform/chrome: cros_ec_lpc: Add R/W helpers to LPC protocol variants
mfd: cros_ec: Add support for dumping panic information
cros_ec_debugfs: Pass proper struct sizes to cros_ec_cmd_xfer()
mfd: cros_ec: add debugfs, console log file
mfd: cros_ec: Add EC console read structures definitions
mfd: cros_ec: Add helper for event notifier.

Linus Torvalds
2017-07-12 00:55:47 +0800
a01881773 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu ... Browse Code »

Pull x86nommu update from Greg Ungerer:
"Only a single change, to remove old Kconfig options from defconfigs"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: defconfig: Cleanup from old Kconfig options

Linus Torvalds
2017-07-12 00:52:56 +0800

11 Jul, 2017

10 commits

9967468c0 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:

- most of the rest of MM

- KASAN updates

- lib/ updates

- checkpatch updates

- some binfmt_elf changes

- various misc bits

* emailed patches from Andrew Morton : (115 commits)
kernel/exit.c: avoid undefined behaviour when calling wait4()
kernel/signal.c: avoid undefined behaviour in kill_something_info
binfmt_elf: safely increment argv pointers
s390: reduce ELF_ET_DYN_BASE
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
arm: move ELF_ET_DYN_BASE to 4MB
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
fs, epoll: short circuit fetching events if thread has been killed
checkpatch: improve multi-line alignment test
checkpatch: improve macro reuse test
checkpatch: change format of --color argument to --color[=WHEN]
checkpatch: silence perl 5.26.0 unescaped left brace warnings
checkpatch: improve tests for multiple line function definitions
checkpatch: remove false warning for commit reference
checkpatch: fix stepping through statements with $stat and ctx_statement_block
checkpatch: [HLP]LIST_HEAD is also declaration
checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
checkpatch: improve the unnecessary OOM message test
lib/bsearch.c: micro-optimize pivot position calculation
...

Linus Torvalds
2017-07-11 07:58:42 +0800
dd83c161f kernel/exit.c: avoid undefined behaviour when calling wait4() ... Browse Code »

wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9

The related calltrace is as follows:

negation of -2147483648 cannot be represented in type 'int':
CPU: 9 PID: 16482 Comm: zj Tainted: G B ---- ------- 3.10.0-327.53.58.71.x86_64+ #66
Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
Call Trace:
dump_stack+0x19/0x1b
ubsan_epilogue+0xd/0x50
__ubsan_handle_negate_overflow+0x109/0x14e
SyS_wait4+0x1cb/0x1e0
system_call_fastpath+0x16/0x1b

Exclude the overflow to avoid the UBSAN warning.

Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Aneesh Kumar K.V
Cc: Kirill A. Shutemov
Cc: Xishi Qiu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

zhongjiang
2017-07-11 07:32:36 +0800
4ea77014a kernel/signal.c: avoid undefined behaviour in kill_something_info ... Browse Code »

When running kill(72057458746458112, 0) in userspace I hit the following
issue.

UBSAN: Undefined behaviour in kernel/signal.c:1462:11
negation of -2147483648 cannot be represented in type 'int':
CPU: 226 PID: 9849 Comm: test Tainted: G B ---- ------- 3.10.0-327.53.58.70.x86_64_ubsan+ #116
Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
Call Trace:
dump_stack+0x19/0x1b
ubsan_epilogue+0xd/0x50
__ubsan_handle_negate_overflow+0x109/0x14e
SYSC_kill+0x43e/0x4d0
SyS_kill+0xe/0x10
system_call_fastpath+0x16/0x1b

Add code to avoid the UBSAN detection.

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang
Cc: Oleg Nesterov
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Xishi Qiu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

zhongjiang
2017-07-11 07:32:36 +0800
67c6777a5 binfmt_elf: safely increment argv pointers ... Browse Code »

When building the argv/envp pointers, the envp is needlessly
pre-incremented instead of just continuing after the argv pointers are
finished. In some (likely impossible) race where the strings could be
changed from userspace between copy_strings() and here, it might be
possible to confuse the envp position. Instead, just use sp like
everything else.

Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast
Signed-off-by: Kees Cook
Cc: Rik van Riel
Cc: Daniel Micay
Cc: Qualys Security Advisory
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Alexander Viro
Cc: Dmitry Safonov
Cc: Andy Lutomirski
Cc: Grzegorz Andrejczuk
Cc: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
a73dc5370 s390: reduce ELF_ET_DYN_BASE ... Browse Code »

Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided). For s390 the
position could be 0x10000, but that is needlessly close to the NULL
address.

Link: http://lkml.kernel.org/r/1498154792-49952-5-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: James Hogan
Cc: Pratyush Anand
Cc: Ingo Molnar
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
47ebb09d5 powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB ... Browse Code »

Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided).

Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook
Tested-by: Michael Ellerman
Acked-by: Michael Ellerman
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: James Hogan
Cc: Pratyush Anand
Cc: Ingo Molnar
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
02445990a arm64: move ELF_ET_DYN_BASE to 4GB / 4MB ... Browse Code »

Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers. On 32-bit use 4MB, to match ARM.
This could be 0x8000, the standard ET_EXEC load address, but that is
needlessly close to the NULL address, and anyone running arm compat PIE
will have an MMU, so the tight mapping is not needed.

Link: http://lkml.kernel.org/r/1498251600-132458-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook
Cc: Ard Biesheuvel
Cc: Catalin Marinas
Cc: Mark Rutland
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
6a9af90a3 arm: move ELF_ET_DYN_BASE to 4MB ... Browse Code »

Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

4MB is chosen here mainly to have parity with x86, where this is the
traditional minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided).

For ARM the position could be 0x8000, the standard ET_EXEC load address,
but that is needlessly close to the NULL address, and anyone running PIE
on 32-bit ARM will have an MMU, so the tight mapping is not needed.

Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: James Hogan
Cc: Pratyush Anand
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Alexander Viro
Cc: Andy Lutomirski
Cc: Daniel Micay
Cc: Dmitry Safonov
Cc: Grzegorz Andrejczuk
Cc: Kees Cook
Cc: Masahiro Yamada
Cc: Qualys Security Advisory
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
eab09532d binfmt_elf: use ELF_ET_DYN_BASE only for PIE ... Browse Code »

The ELF_ET_DYN_BASE position was originally intended to keep loaders
away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
/bin/cat" might cause the subsequent load of /bin/cat into where the
loader had been loaded.)

With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
ELF_ET_DYN_BASE continued to be used since the kernel was only looking
at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the
top 1/3rd of the TASK_SIZE, a substantial portion of the address space
is unused.

For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
loaded above the mmap region. This means they can be made to collide
(CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
pathological stack regions.

Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
region in all cases, and will now additionally avoid programs falling
back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
if it would have collided with the stack, now it will fail to load
instead of falling back to the mmap region).

To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
are loaded into the mmap region, leaving space available for either an
ET_EXEC binary with a fixed location or PIE being loaded into mmap by
the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
which means architectures can now safely lower their values without risk
of loaders colliding with their subsequently loaded programs.

For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
the entire 32-bit address space for 32-bit pointers.

Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
suggestions on how to implement this solution.

Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR")
Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast
Signed-off-by: Kees Cook
Acked-by: Rik van Riel
Cc: Daniel Micay
Cc: Qualys Security Advisory
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Alexander Viro
Cc: Dmitry Safonov
Cc: Andy Lutomirski
Cc: Grzegorz Andrejczuk
Cc: Masahiro Yamada
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: James Hogan
Cc: Martin Schwidefsky
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Pratyush Anand
Cc: Russell King
Cc: Will Deacon
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2017-07-11 07:32:36 +0800
c257a340e fs, epoll: short circuit fetching events if thread has been killed ... Browse Code »

We've encountered zombies that are waiting for a thread to exit that are
looping in ep_poll() almost endlessly although there is a pending
SIGKILL as a result of a group exit.

This happens because we always find ep_events_available() and fetch more
events and never are able to check for signal_pending() that would break
from the loop and return -EINTR.

Special case fatal signals and break immediately to guarantee that we
loop to fetch more events and delay making a timely exit.

It would also be possible to simply move the check for signal_pending()
higher than checking for ep_events_available(), but there have been no
reports of delayed signal handling other than SIGKILL preventing zombies
from exiting that would be fixed by this.

It fixes an issue for us where we have witnessed zombies sticking around
for at least O(minutes), but considering the code has been like this
forever and nobody else has complained that I have found, I would simply
queue it up for 4.12.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705031722350.76784@chino.kir.corp.google.com
Signed-off-by: David Rientjes
Cc: Alexander Viro
Cc: Jan Kara
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2017-07-11 07:32:36 +0800