24 May, 2016

40 commits

  • shmat and shmdt rely on mmap_sem for write. If the waiting task gets
    killed by the oom killer, it blocks oom_reaper from asynchronously
    reclaiming the address space and reduces the chances of timely OOM
    resolution. Wait for the lock in killable mode and return EINTR if the
    task is killed while waiting.
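
    A minimal sketch of the resulting pattern, assuming a syscall path
    that previously took the lock with a plain down_write() (context
    simplified, not the exact ipc/shm.c code):

        if (down_write_killable(&mm->mmap_sem))
                return -EINTR;  /* fatal signal pending; let the task die */

        /* ... modify the address space ... */

        up_write(&mm->mmap_sem);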

    Signed-off-by: Michal Hocko
    Acked-by: Davidlohr Bueso
    Acked-by: Vlastimil Babka
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • dup_mmap needs to lock the mmap_sem of current's mm for write. If the
    waiting task gets killed by the oom killer, it blocks oom_reaper from
    asynchronously reclaiming the address space and reduces the chances of
    timely OOM resolution. Wait for the lock in killable mode and return
    EINTR if the task is killed while waiting.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY rely on mmap_sem
    for write. If the waiting task gets killed by the oom killer while
    operating on its own mm, it blocks oom_reaper from asynchronously
    reclaiming the address space and reduces the chances of timely OOM
    resolution. Wait for the lock in killable mode and return EINTR if the
    task is killed while waiting. This also expedites the return to
    userspace and do_exit even if the mm is remote.

    Signed-off-by: Michal Hocko
    Acked-by: Oleg Nesterov
    Acked-by: Vlastimil Babka
    Cc: Petr Cermak
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Now that all the callers handle vm_brk failure, we can change it to
    wait for mmap_sem in killable mode, so that oom_reaper does not get
    blocked just because vm_brk is stuck behind mmap_sem readers.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Oleg Nesterov
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • load_elf_library doesn't handle vm_brk failure, although nothing
    indicates it cannot: the function is already allowed to fail due to
    vm_mmap failures. This may not be a problem now, but a later patch will
    make vm_brk killable (i.e. waiting for mmap_sem for write will become
    killable), so the failure will become more probable.
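
    A hedged sketch of the caller-side change, assuming vm_brk still
    returns an address-or-error value checked via IS_ERR_VALUE (variable
    and label names are illustrative, not the exact binfmt_elf code):

        retval = vm_brk(start, len);
        if (IS_ERR_VALUE(retval))
                goto out_free;  /* fail the library load, don't continue */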

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • vm_brk is allowed to fail, but load_aout_binary simply ignores the
    error and happily continues. I haven't noticed any problem from that
    in real life, but later patches will make the failure more likely
    because vm_brk will become killable (i.e. waiting for mmap_sem for
    write will become killable), so we should be more careful now.

    The error handling should be quite straightforward because there are
    already calls to vm_mmap which check the error properly. The only
    notable exception is set_brk, which is called after the beyond_if
    label. But nothing indicates that it cannot be moved above set_binfmt:
    the two do not depend on each other, and this way we fail before
    set_binfmt alters reference counting.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Almost all current users of vm_munmap ignore the return value, so they
    do not handle potential errors. This means that some VMAs might stay
    behind. This patch doesn't try to solve those potential problems.
    Quite the contrary, it adds a new failure mode by using
    down_write_killable in vm_munmap. This should be safer than other
    failure modes, though, because the process is guaranteed to die as
    soon as it leaves the kernel, and exit_mmap will clean up the whole
    address space.

    This will help in OOM conditions, when the oom victim might be stuck
    waiting for mmap_sem for write, which in turn can block oom_reaper,
    which relies on mmap_sem for read to make forward progress and reclaim
    the address space of the victim.

    Signed-off-by: Michal Hocko
    Cc: Oleg Nesterov
    Cc: "Kirill A. Shutemov"
    Cc: Konstantin Khlebnikov
    Cc: Andrea Arcangeli
    Cc: Alexander Viro
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • All the callers of vm_mmap seem to check for failure already and bail
    out in one way or another on error, which means that we can change it
    to use the killable version of vm_mmap_pgoff and return -EINTR if the
    current task gets killed while waiting for mmap_sem. This also means
    that vm_mmap_pgoff can be killable by default and drop the additional
    parameter.

    This will help in OOM conditions, when the oom victim might be stuck
    waiting for mmap_sem for write, which in turn can block oom_reaper,
    which relies on mmap_sem for read to make forward progress and reclaim
    the address space of the victim.

    Please note that load_elf_binary ignores the vm_mmap error in the
    current->personality & MMAP_PAGE_ZERO case, but that shouldn't be a
    problem because the address is not used anywhere and we never return
    to userspace if we got killed.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Andrea Arcangeli
    Cc: Al Viro
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This is a follow-up to the oom_reaper work [1]. As asynchronous OOM
    killing depends on mmap_sem for read, we would really appreciate it if
    a holder for write did not stand in the way. This patchset changes
    many down_write calls to be killable, to help the cases where the
    writer is blocked waiting for readers to release the lock, and so help
    __oom_reap_task to process the oom victim.

    Most of the patches are really trivial because the lock is held in
    shallow syscall paths, where we can return EINTR trivially and allow
    the current task to die (note that EINTR will never reach userspace,
    as the task has a fatal signal pending). Others seem easy as well,
    because the callers already handle fatal errors, bail out, and return
    to userspace, which should be sufficient to handle the failure
    gracefully. I am not familiar with all those code paths, so a deeper
    review is really appreciated.

    As this work touches several areas which are not directly connected, I
    have tried to keep the CC list as small as possible; people who I
    believed would be familiar are CCed only on the specific patches (all
    should have received the cover letter, though).

    This patchset is based on linux-next and depends on
    down_write_killable for rw_semaphores, which was merged into the tip
    locking/rwsem branch and is part of the next tree. I guess it would be
    easiest to route these patches via mmotm because of the dependency on
    the tip tree, but if the respective maintainers prefer another way, I
    have no objections.

    I haven't covered all of the down_write(mm->mmap_sem) instances here:

    $ git grep "down_write(.*\)" next/master | wc -l
    98
    $ git grep "down_write(.*\)" | wc -l
    62

    I have tried to cover those which should be relatively easy to review in
    this series because this alone should be a nice improvement. Other
    places can be changed on top.

    [0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
    [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org

    This patch (of 18):

    This is the first step in making mmap_sem write waiters killable. It
    focuses on the trivial ones, which take the lock early after entering
    the syscall and do not change any state before doing so.

    Therefore it is very easy to change them to use down_write_killable
    and immediately return with -EINTR. This will allow the waiter to pass
    away without blocking mmap_sem, which might be required for others to
    make forward progress. E.g. the oom reaper will need the lock for
    reading to dismantle the OOM victim's address space.

    The only tricky function in this patch is vm_mmap_pgoff, which has
    many call sites via vm_mmap. To reduce the risk, keep vm_mmap with the
    original non-killable semantics for now.

    vm_munmap callers do not bother to check the return value, so for
    simplicity open-code the killable variant into the munmap syscall path
    for now.
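
    A rough sketch of that open-coded syscall path (details such as
    profiling hooks omitted; not the exact mm/mmap.c code):

        SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
        {
                int ret;
                struct mm_struct *mm = current->mm;

                if (down_write_killable(&mm->mmap_sem))
                        return -EINTR;
                ret = do_munmap(mm, addr, len);
                up_write(&mm->mmap_sem);
                return ret;
        }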

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Add myself as a co-maintainer for scripts/gdb, supporting Jan Kiszka.

    Link: http://lkml.kernel.org/r/fb5d34ce563f33d2f324f26f592b24ded30032ee.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • The recent fixes to lx-dmesg now allow the command to print
    successfully on Python 3; however, the Python interpreter wraps the
    bytes for each line with a b'' marker.

    To remove this, we need to decode the line; .decode() defaults to
    'UTF-8'.

    Link: http://lkml.kernel.org/r/d67ccf93f2479c94cb3399262b9b796e0dbefcf2.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Acked-by: Dom Cote
    Tested-by: Dom Cote
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • When built against Python 3, GDB differs in the return type of its
    read_memory function, causing the lx-dmesg command to fail.

    Now that we have an improved read_u16() we can use the new
    read_memoryview() abstraction to make lx-dmesg return valid data on
    both current Python APIs.

    Tested with Python 3.4 and 2.7.
    Tested with gdb 7.7.

    Link: http://lkml.kernel.org/r/28477b727ff7fe3101fd4e426060e8a68317a639.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Dom Cote
    [kieran@bingham.xyz: Adjusted commit log to better reflect code changes]
    Tested-by: Kieran Bingham (Py2.7,Py3.4,GDB10)
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dom Cote
     
  • Change the read_u16 function so it accepts both 'str' and 'bytes' as
    the type of its argument.

    When calling read_memory() from the gdb API, the format used to return
    the data differs depending on whether gdb was built with Python 2.7 or
    3.X ('str' for 2.7, 'bytes' for 3.X).

    Add a read_memoryview() function to be able to get a 'memoryview'
    object back from read_memory() with both Python 2.7 and 3.X.

    Tested with Python 3.4 and 2.7.
    Tested with gdb 7.7.

    Link: http://lkml.kernel.org/r/73621f564503137a002a639d174e4fb35f73f462.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Dom Cote
    Tested-by: Kieran Bingham (Py2.7,Py3.4,GDB10)
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dom Cote
     
  • The tasks module already provides helpers to find the task struct by
    PID, and the thread_info by task struct; however, these are cumbersome
    to use on the gdb command line.

    Wrap the two together in a single extra helper that allows exploring
    the thread_info from a PID value.

    Link: http://lkml.kernel.org/r/dadc5667f053ec811eb3e3033d99d937fedbc93b.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Provide a worked example for utilising the lx_radix_tree_lookup function

    Link: http://lkml.kernel.org/r/e786008ac5aec4b84198812805b326d718bdeb4b.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Linux makes use of the radix tree data structure to store pointers
    indexed by integer values. This structure is utilised across many
    parts of the kernel, including the IRQ descriptor tables and several
    filesystems.

    This module provides a method to lookup values from a structure given
    its head node.

    Usage:

    The function lx_radix_tree_lookup must be given a symbol of type
    struct radix_tree_root, and an index into that tree.

    The object returned is a generic integer value, and must be cast
    correctly to the type based on the storage in the data structure.

    For example, to print the irq descriptor in the sparse irq_desc_tree at
    index 18, try the following:

    (gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18)

    Link: http://lkml.kernel.org/r/d2028c55e50cf95a9b7f8ca0d11885174b0cc709.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • We won't see more than 2 billion CPUs any time soon, and having cpu_list
    return long makes the output of lx-cpus a bit ugly.

    Link: http://lkml.kernel.org/r/dcb45c3b0a59e0fd321fa56ff7aa398458c689b3.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     
  • The Linux kernel provides macros for iterating over the values in the
    cpu_list masks. By providing some commonly used masks, we can mirror
    the kernel's helper macros with easy-to-use generators.

    Link: http://lkml.kernel.org/r/d045c6599771ada1999d49612ee30fd2f9acf17f.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • lx-mounts will identify current mount points based on the 'init_task'
    namespace by default, as we do not yet have a kernel thread list
    implementation to select the currently running thread.

    Optionally, a user can specify a PID to list from that process'
    namespace

    Link: http://lkml.kernel.org/r/e614c7bc32d2350b4ff1627ec761a7148e65bfe6.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Provide iomem_resource and ioports_resource printers and command
    hooks.

    It can be quite interesting to halt the kernel as it boots and watch
    this list as it is being populated.

    It should also be useful in the event that a kernel is not booting, as
    you can identify which memory resources have been registered.

    Link: http://lkml.kernel.org/r/f0a6b9fa9c92af4d7ed2e7343ccc84150e9c6fc5.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Walk the VFS entries, prepending the iname strings to generate a full
    VFS path name from a dentry.

    Link: http://lkml.kernel.org/r/4328fdb2d15ba7f1b21ad21c2eecc38d9cfc4d13.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • If CONFIG_MODULES is not enabled, lx-lsmod tries to find a non-existent
    symbol and generates an unfriendly traceback:

    (gdb) lx-lsmod
    Address Module Size Used by
    Traceback (most recent call last):
    File "scripts/gdb/linux/modules.py", line 75, in invoke
    for module in module_list():
    File "scripts/gdb/linux/modules.py", line 24, in module_list
    module_ptr_type = module_type.get_type().pointer()
    File "scripts/gdb/linux/utils.py", line 28, in get_type
    self._type = gdb.lookup_type(self._name)
    gdb.error: No struct type named module.
    Error occurred in Python command: No struct type named module.

    Catch the error and return an empty module_list() for a clean command
    output as follows:

    (gdb) lx-lsmod
    Address Module Size Used by
    (gdb)

    Link: http://lkml.kernel.org/r/94d533819437408b85ae5864f939dd7ca6fbfcd6.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • If we attempt to read a value that is not available to GDB, an exception
    is raised. Most of the time, this is a good thing; however on occasion
    we will want to be able to determine if a symbol is available.

    By catching the exception and simply returning None, we can determine
    whether we tried to read an invalid value, without the exception
    taking the execution context away from us.

    Link: http://lkml.kernel.org/r/c72b25c06fc66e1d68371154097e2cbb112555d8.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Simplify the module list functions with the new list_for_each_entry
    abstractions

    Link: http://lkml.kernel.org/r/ad0101c9391088608166fcec26af179868973d86.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Facilitate walking linked lists by providing a generator that returns
    the dereferenced and type-cast objects from a kernel linked list.

    Link: http://lkml.kernel.org/r/2b0998564e6e5abe53585d466f87e491331fd2a4.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • Some macros and defines are needed when parsing memory, and unless the
    kernel is compiled with -g3 they are not available in the debug
    symbols.

    We use the pre-processor here to extract constants into a dedicated
    module for the Linux debugger extensions.

    The top-level Kbuild is used to call in and generate the constants
    file, while maintaining dependencies on autogenerated files in
    include/generated.

    Link: http://lkml.kernel.org/r/bc3df9c25f57ea72177c066a51a446fc19e2c27f.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Kieran Bingham
    Signed-off-by: Jan Kiszka
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kieran Bingham
     
  • This takes the MODULE_REF_BASE into account.

    Link: http://lkml.kernel.org/r/d926d2d54caa034adb964b52215090cbdb875249.1462865983.git.jan.kiszka@siemens.com
    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     
  • This option was replaced by PAGE_COUNTER which is selected by MEMCG.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Arnd Bergmann
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Use kmemdup when another buffer is immediately copied into the
    allocated region. It replaces an allocation followed by a memcpy with
    a single call to kmemdup.
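
    An illustrative before/after (buffer names hypothetical):

        /* before: allocate, then copy */
        p = kmalloc(len, GFP_KERNEL);
        if (!p)
                return -ENOMEM;
        memcpy(p, src, len);

        /* after: one call expresses the same intent */
        p = kmemdup(src, len, GFP_KERNEL);
        if (!p)
                return -ENOMEM;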

    [akpm@linux-foundation.org: remove unneeded cast to void*]
    Link: http://lkml.kernel.org/r/1463665743-16269-1-git-send-email-falakreyaz@gmail.com
    Signed-off-by: Muhammad Falak R Wani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muhammad Falak R Wani
     
  • The first version of this patch was posted to LKML by Ben Hutchings
    ~6 months ago, but no further action was taken.

    Ben's original message:

    : rtsx_usb_ms creates a task that mostly sleeps, but tasks in
    : uninterruptible sleep still contribute to the load average (for
    : bug-compatibility with Unix). A load average of ~1 on a system that
    : should be idle is somewhat alarming.
    :
    : Change the sleep to be interruptible, but still ignore signals.
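
    A minimal sketch of the idea (the driver's actual wait loop is more
    involved): only uninterruptible sleeps count toward the load average,
    and the kthread has no signals to deliver, so behaviour is otherwise
    unchanged.

        /* before: counts toward the load average */
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule();

        /* after: does not, and the kthread ignores signals anyway */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule();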

    References: https://bugs.debian.org/765717
    Link: http://lkml.kernel.org/r/b49f95ae83057efa5d96f532803cba47@natalenko.name
    Signed-off-by: Oleksandr Natalenko
    Cc: Oleg Nesterov
    Cc: Ben Hutchings
    Cc: Lee Jones
    Cc: Wolfram Sang
    Cc: Roger Tseng
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleksandr Natalenko
     
  • Lots of little changes were needed to clean these up: remove the
    four-byte pointer assumption and traverse the pid queue properly.
    Also consolidate the traceback code into a single function instead of
    having three copies of it.

    Link: http://lkml.kernel.org/r/1462926655-9390-1-git-send-email-minyard@acm.org
    Signed-off-by: Corey Minyard
    Acked-by: Baoquan He
    Cc: Vivek Goyal
    Cc: Haren Myneni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Corey Minyard
     
  • …unprotect)_crashkres()

    Commit 3f625002581b ("kexec: introduce a protection mechanism for the
    crashkernel reserved memory") introduced a mechanism for protecting
    the crash kernel reserved memory similar to the previous
    crash_map/unmap_reserved_pages() implementation; the new one is more
    generic in name and cleaner in code (besides, some arches may not be
    allowed to unmap the pgtable).

    Therefore, this patch consolidates them, using the new
    arch_kexec_protect(unprotect)_crashkres() to replace the former
    crash_map/unmap_reserved_pages(), which by now is used only by s390.

    The consolidation requires the crash memory to be mapped initially;
    this is done in machine_kdump_pm_init(), which runs after
    reserve_crashkernel(). Once the kdump kernel is loaded, the new
    arch_kexec_protect_crashkres() implemented for s390 actually unmaps
    the pgtable as before.

    Signed-off-by: Xunlei Pang
    Signed-off-by: Michael Holzheu
    Acked-by: Michael Holzheu
    Cc: Heiko Carstens
    Cc: "Eric W. Biederman"
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Dave Young
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     
  • There is a lot of work to be done in the kexec_load function: not only
    allocating structs and loading the initramfs, but also various
    miscellaneous tasks.

    To make it clearer, wrap a new function, do_kexec_load, which
    allocates the structs and loads the initramfs, and do the preparatory
    work in kexec_load.

    Signed-off-by: Minfei Huang
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • For some architectures, kexec must map the reserved pages and then use
    them when we try to start the kdump service.

    kexec may return directly, without unmapping the reserved pages, if it
    fails while starting the service. To fix this, we make a matched pair
    of map/unmap calls in both the normal path and the error path.

    This patch only affects s390. Other architectures don't implement the
    crash_unmap_reserved_pages and crash_map_reserved_pages interface.

    It isn't an urgent patch. The kernel works without any risk even
    though the reserved pages are not unmapped before returning in the
    error path.
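
    A hedged sketch of the pairing (the helper name is illustrative, not
    the actual kexec_load flow):

        crash_map_reserved_pages();
        ret = load_crash_segments(image);  /* hypothetical helper */
        crash_unmap_reserved_pages();      /* runs on success and error */
        return ret;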

    Signed-off-by: Minfei Huang
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • Implement the protection method for the crash kernel memory reservation
    for the 64-bit x86 kdump.
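
    One plausible shape for the x86_64 hooks, assuming the reserved region
    is described by crashk_res and covered by the kernel direct mapping (a
    hedged sketch, not necessarily the exact implementation):

        void arch_kexec_protect_crashkres(void)
        {
                int nr_pages =
                        (crashk_res.end - crashk_res.start + 1) >> PAGE_SHIFT;

                set_memory_ro((unsigned long)__va(crashk_res.start), nr_pages);
        }

        void arch_kexec_unprotect_crashkres(void)
        {
                int nr_pages =
                        (crashk_res.end - crashk_res.start + 1) >> PAGE_SHIFT;

                set_memory_rw((unsigned long)__va(crashk_res.start), nr_pages);
        }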

    Signed-off-by: Xunlei Pang
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     
  • If some kernel (or module) code path stamps on the crash reserved
    memory (which is already mapped by the kernel) where the second
    kernel's data has been loaded, the kdump kernel will probably fail to
    boot when a panic happens (or even when it doesn't), leaving the
    culprit at large; this is unacceptable.

    The patch introduces a mechanism for detecting such cases:

    1) After each crash kexec load, it simply marks the reserved memory
    regions read-only, since we no longer access them after that. When
    something stamps on a region, the first kernel will panic and trigger
    kdump. The weak arch_kexec_protect_crashkres() is introduced to do
    the actual protection.

    2) To allow multiple loads, once 1) is done we also need to re-mark
    the reserved memory read-write each time a kdump-related system call
    is made. The weak arch_kexec_unprotect_crashkres() is introduced to
    do the actual unprotection.

    An architecture can provide its specific implementation by overriding
    arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(),
    as sketched below.
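
    A sketch of the weak-default pattern: the generic kexec code provides
    empty hooks which architectures may override (matching the mechanism
    described above; exact placement in the generic code is assumed):

        void __weak arch_kexec_protect_crashkres(void)
        {}

        void __weak arch_kexec_unprotect_crashkres(void)
        {}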

    Signed-off-by: Xunlei Pang
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     
  • remove_arg_zero() does free_arg_page() for no reason. This was needed
    before, and only if CONFIG_MMU=y: see commit 4fc75ff4816c ("exec: fix
    remove_arg_zero"), where install_arg_page() was called for every page
    != NULL in the bprm->page[] array. Today install_arg_page() is gone,
    and free_arg_page() is a nop after commit b6a2fea39318 ("mm: variable
    length argument support").

    CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus doesn't
    need remove_arg_zero()->free_arg_page() either; apart from
    get_arg_page() it never checks whether the page in bprm->page[] was
    allocated or not, so the "extra" non-freed page is fine. OTOH, this
    free_arg_page() adds a minor pessimization: the caller is going to do
    copy_strings_kernel() right after remove_arg_zero(), which will likely
    need to re-allocate the same page again.

    And as hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong,
    because we are going to increment bprm->p once again before returning,
    so CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in
    this page.

    NOTE: remove_arg_zero() assumes that argv[0] is null-terminated, but
    this is not necessarily true. copy_strings() does "len =
    strnlen_user(...)", then copy_from_user(len), but another thread or a
    debugger can overwrite the trailing '0' in between. AFAICS nothing
    really bad can happen, because we always have the null-terminated
    bprm->filename copied by the first copy_strings_kernel(), but perhaps
    we should change this code to check "bprm->p < bprm->exec" anyway,
    and/or change copy_strings() to ensure that the last byte in the
    string is always zero.

    Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
    Signed-off-by: Oleg Nesterov
    Reported-by: hujunjie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Linux preallocates the task structs of the idle tasks for all possible
    CPUs. This currently means they all end up on node 0. It also implies
    that the cache lines used by MWAIT, which are around the flags field
    in the task struct, are all located on node 0.

    We see a noticeable performance improvement on Knights Landing CPUs
    when the cache lines used for MWAIT are located in the local nodes of
    the CPUs using them. I would expect this to give a (likely slight)
    improvement on other systems too.

    The patch places the idle task on the node of its CPU by passing the
    right target node to copy_process().
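
    A hedged sketch of the plumbing; copy_process_on_node() stands in for
    the real copy_process() call, which takes several more arguments:

        struct task_struct *fork_idle(int cpu)
        {
                struct task_struct *task;

                /* was effectively node 0; now the CPU's home node */
                task = copy_process_on_node(cpu_to_node(cpu));
                if (!IS_ERR(task))
                        init_idle(task, cpu);
                return task;
        }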

    [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1]
    Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org
    Signed-off-by: Andi Kleen
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • All users of siginmask() must ensure that sig < SIGRTMIN. sig_fatal()
    doesn't, and this is wrong:

    UBSAN: Undefined behaviour in kernel/signal.c:911:6
    shift exponent 32 is too large for 32-bit type 'long unsigned int'

    The patch doesn't add the necessary check to sig_fatal(); it moves the
    check into siginmask() and updates the other callers.
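
    A sketch of the reworked macro, close to (but not necessarily
    identical to) the final version; the point is that the shift is never
    evaluated for out-of-range signals:

        #define siginmask(sig, mask) \
                ((sig) > 0 && (sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))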

    Link: http://lkml.kernel.org/r/20160517195052.GA15187@redhat.com
    Reported-by: Meelis Roos
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Use pr_<level>() instead of printk(KERN_<LEVEL> ...).
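
    An illustrative conversion (the message text is hypothetical):

        /* before */
        printk(KERN_WARNING "something went wrong\n");

        /* after */
        pr_warn("something went wrong\n");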

    Signed-off-by: Wang Xiaoqiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Xiaoqiang