12 Oct, 2016

40 commits

  • This patch makes it possible to create a freezable kthread worker via a
    new @flags parameter. It allows avoiding an init work in some kthreads.

    It currently does not affect the behavior of kthread_worker_fn(),
    but it might enable some optimizations or fixes eventually.

    I currently do not know of any other use for the @flags
    parameter, but I believe that we will want more flags
    in the future.

    Finally, I hope that it will not cause confusion with the @flags member
    in struct kthread. Well, I guess that we will want to rework the
    basic kthreads implementation once all kthreads are converted into
    kthread workers or workqueues. It is possible that we will merge
    the two structures.
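
    As a rough sketch of how the new parameter might be used (assuming the
    KTW_FREEZABLE flag and the kthread_create_worker() signature from later
    in this series):

        struct kthread_worker *worker;

        /* Create a freezable worker; no explicit init work is needed. */
        worker = kthread_create_worker(KTW_FREEZABLE, "my_worker");
        if (IS_ERR(worker))
                return PTR_ERR(worker);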

    Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • There are situations when we need to modify the delay of a delayed
    kthread work, for example, when the work depends on an event and the
    initial delay acts as a timeout; then we want to queue the work
    immediately when the event happens.

    This patch implements kthread_mod_delayed_work(), as inspired by workqueues.
    It cancels the timer, removes the work from any worker list and queues it
    again with the given timeout.

    A very special case is when the work is being canceled at the same time.
    It might happen because of the regular kthread_cancel_delayed_work_sync()
    or by another kthread_mod_delayed_work(). In this case, we do nothing and
    let the other operation win. This should not normally happen as the caller
    is supposed to synchronize these operations in a reasonable way.
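
    A sketch of the timeout pattern described above (the worker and delayed
    work variables are hypothetical):

        /* Arm a 5 second timeout when the request is submitted ... */
        kthread_queue_delayed_work(worker, &timeout_dwork, 5 * HZ);

        /* ... and run the work immediately once the event arrives,
         * instead of waiting for the full timeout: */
        kthread_mod_delayed_work(worker, &timeout_dwork, 0);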

    Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We are going to use kthread workers more widely and sometimes we will need
    to make sure that the work is neither pending nor running.

    This patch implements cancel_*_sync() operations as inspired by
    workqueues. We are synchronized against the other operations via the
    worker lock, we use del_timer_sync(), and a counter counts parallel
    cancel operations; therefore the implementation can be simpler.

    First, we check if a worker is assigned. If not, the work has never been
    queued after it was initialized.

    Second, we take the worker lock. It must be the right one: the work must
    not be assigned to another worker unless it has been reinitialized in
    between.

    Third, we try to cancel the timer if it exists. The timer is deleted
    synchronously to make sure that the timer callback is not running. We
    need to temporarily release worker->lock to avoid a possible deadlock
    with the callback. In the meantime, we set the work->canceling counter
    to block any queuing.

    Fourth, we try to remove the work from a worker list. It might be
    the list of either normal or delayed works.

    Fifth, if the work is running, we call kthread_flush_work(). It might
    take an arbitrary time. We need to release worker->lock again. In the
    meantime, we again block any queuing via the canceling counter.

    As already mentioned, the check for a pending kthread work is done under
    a lock. Compared with workqueues, we do not need to fight for a single
    PENDING bit to block other operations, so we do not suffer from the
    thundering herd problem, and all parallel canceling jobs can use
    kthread_flush_work(). Any queuing is blocked until the counter reaches
    zero.
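
    Usage mirrors the workqueue counterparts; a minimal sketch (the work and
    dwork variables are assumed to have been initialized and queued earlier):

        /* Make sure the work is neither pending nor running. */
        kthread_cancel_work_sync(&work);

        /* Same for a delayed work; a pending timer is canceled too. */
        kthread_cancel_delayed_work_sync(&dwork);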

    Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • We are going to use kthread_worker more widely and delayed works
    will be pretty useful.

    The implementation is inspired by workqueues. It uses a timer to queue
    the work after the requested delay. If the delay is zero, the work is
    queued immediately.

    Compared with workqueues, each work is associated with a single worker
    (kthread), so the implementation can be much simpler. In particular, we
    use worker->lock to synchronize all the operations with the work. We do
    not need any atomic operations on a flags variable.

    In fact, we do not need any state variable at all. Instead, we add a
    list of delayed works to the worker. A pending work is then listed
    either in the list of queued works or in the list of delayed works, and
    the existing check for pending works covers the delayed ones as well.

    A work must not be assigned to another worker unless reinitialized.
    Therefore the timer handler can expect that dwork->work.worker is valid
    and simply take the lock. We just add some sanity checks to help with
    debugging potential misuse.
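
    A minimal sketch of defining and queuing a delayed kthread work
    (assuming kthread_init_delayed_work() mirrors kthread_init_work()):

        static void my_timeout_fn(struct kthread_work *work)
        {
                /* runs in the worker kthread after the delay */
        }

        static struct kthread_delayed_work dwork;

        /* in some setup code: */
        kthread_init_delayed_work(&dwork, my_timeout_fn);
        kthread_queue_delayed_work(worker, &dwork, msecs_to_jiffies(100));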

    Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Nothing currently prevents a work from being queued to one kthread
    worker while it is already running on another one. This means that the
    work might run in parallel on more than one worker. Also, some
    operations, e.g. flush, are not reliable.

    This problem will be even more visible after we add the
    kthread_cancel_work() function. It will take only "work" as a parameter
    and will use worker->lock to synchronize with others.

    Well, normally this is not a problem because the API users are sane.
    But bugs might happen and users also might be crazy.

    This patch adds a warning when we try to insert the work for another
    worker. It does not fully prevent the misuse, because that would make
    the code much more complicated without a big benefit. The same warning
    is also added to kthread_flush_work(), replacing the repeated attempts
    to get the right lock.

    A side effect is that one needs to explicitly reinitialize the work if it
    must be queued into another worker. This is needed, for example, when the
    worker is stopped and started again. It is a bit inconvenient, but it
    looks like a good compromise between stability and complexity.

    I have double checked all existing users of the kthread worker API and
    they all seem to initialize the work after the worker is started.

    Just for completeness, the patch adds a check that the work is not already
    in a queue.

    The patch also puts all the checks into a separate function. It will be
    reused when implementing delayed works.
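
    For illustration, re-using a work with a different worker after a
    restart might look like this sketch (function and variable names are
    hypothetical):

        /* worker A was stopped; queue the same work on worker B */
        kthread_init_work(&work, my_work_fn);   /* explicit reinit */
        kthread_queue_work(worker_b, &work);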

    Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • The current kthread worker users call flush() and stop() explicitly.
    The new kthread_destroy_worker() function does the same, plus it frees
    the kthread_worker struct, in one call.

    It is supposed to be used together with kthread_create_worker*() that
    allocates struct kthread_worker.
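
    A sketch of the intended pairing (the @flags argument assumes the final
    signature from this series; error handling elided):

        struct kthread_worker *worker;

        worker = kthread_create_worker(0, "my_worker");
        /* ... queue and flush works ... */
        kthread_destroy_worker(worker);         /* flush + stop + free */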

    Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Kthread workers are currently created using the classic kthread API,
    namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
    parameter.

    This patch defines kthread_create_worker() and
    kthread_create_worker_on_cpu() functions that hide implementation details.

    They enforce using kthread_worker_fn() for the main thread. But I doubt
    that there are any plans to create an alternative. In fact, I think that
    we do not want any alternative main thread, because it would be hard to
    keep it consistent with the rest of the kthread worker API.

    The naming and function of kthread_create_worker() is inspired by the
    workqueues API like the rest of the kthread worker API.

    The kthread_create_worker_on_cpu() variant is motivated by the original
    kthread_create_on_cpu(). Note that per-CPU kthread workers need to be
    bound already when they are created; it makes life easier.
    kthread_bind() cannot be used later on an already running worker.

    This patch does _not_ convert existing kthread workers. The kthread
    worker API needs more improvements first, e.g. a function to destroy the
    worker.

    IMPORTANT:

    kthread_create_worker_on_cpu() allows any format of the worker name, in
    contrast to kthread_create_on_cpu(). The good thing is that it is more
    generic. The bad thing is that most users will need to pass the CPU
    number in two parameters, e.g. kthread_create_worker_on_cpu(cpu,
    "helper/%d", cpu).

    To be honest, the main motivation was to avoid the need for an empty
    va_list. The only legal way was to create a helper function that would be
    called with an empty list. Other attempts caused compilation warnings or
    even errors on different architectures.

    There were also other alternatives, for example, using a #define or
    splitting __kthread_create_worker(). The chosen solution looked the
    least ugly.
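
    For example (a sketch; the extra @flags argument assumes the final
    signature from this series):

        struct kthread_worker *worker;

        worker = kthread_create_worker_on_cpu(cpu, 0, "helper/%d", cpu);
        if (IS_ERR(worker))
                return PTR_ERR(worker);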

    Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • kthread_create_on_node() implements a bunch of logic to create the
    kthread. It is already called by kthread_create_on_cpu().

    We are going to extend the kthread worker API and will need to call
    kthread_create_on_node() with va_list args there.

    This patch is only a refactoring and does not modify the existing
    behavior.

    Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e
    ("kthread: Implement park/unpark facility"). It is currently used only
    when enabling a new CPU. For this purpose, the newly created kthread has to
    be parked.

    The CPU binding is a bit tricky: the kthread is parked when the CPU has
    not been allowed yet, and the CPU is bound when the kthread is unparked.

    The function would be useful for more per-CPU kthreads, e.g.
    bnx2fc_thread, fcoethread. For this purpose, the newly created kthread
    should stay in the uninterruptible state.

    This patch moves the parking into smpboot code and binds the thread
    already when it is created. The function can then be used universally,
    and the behavior is consistent with kthread_create() and
    kthread_create_on_node().

    Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Reviewed-by: Thomas Gleixner
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • It is good practice to prefix function names with the name of the
    subsystem.

    The kthread worker API is a mix of classic kthreads and workqueues. Each
    worker has a dedicated kthread. It runs a generic function that processes
    queued works. It is implemented as part of the kthread subsystem.

    This patch renames the existing kthread worker API to use
    the corresponding name from the workqueues API prefixed by
    kthread_:

    __init_kthread_worker() -> __kthread_init_worker()
    init_kthread_worker() -> kthread_init_worker()
    init_kthread_work() -> kthread_init_work()
    insert_kthread_work() -> kthread_insert_work()
    queue_kthread_work() -> kthread_queue_work()
    flush_kthread_work() -> kthread_flush_work()
    flush_kthread_worker() -> kthread_flush_worker()

    Note that the names of DEFINE_KTHREAD_WORK*() macros stay
    as they are. It is common that the "DEFINE_" prefix has
    precedence over the subsystem names.

    Note that the INIT() macros and the init() functions use different
    naming schemes. There is no perfect solution; there are several reasons
    for the chosen one:

    + "init" in the function names stands for the verb "initialize"
    aka "initialize worker". While "INIT" in the macro names
    stands for the noun "INITIALIZER" aka "worker initializer".

    + INIT() macros are used only in DEFINE() macros

    + init() functions are used close to the other kthread()
    functions. It looks much better if all the functions
    use the same scheme.

    + There will also be kthread_destroy_worker(), which will be used close
    to kthread_cancel_work(). It is related to the init() function. Again,
    it looks better if all functions use the same naming scheme.

    + There are several precedents for such init() function names, e.g.
    amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(),
    regmap_init_mmio_clk().

    + It is not a strong argument, but the naming was inconsistent even
    before.
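
    At a call site, the rename is purely mechanical, e.g.:

        /* before */
        init_kthread_worker(&worker);
        queue_kthread_work(&worker, &work);

        /* after */
        kthread_init_worker(&worker);
        kthread_queue_work(&worker, &work);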

    [arnd@arndb.de: fix linux-next merge conflict]
    Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
    Suggested-by: Andrew Morton
    Signed-off-by: Petr Mladek
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Patch series "kthread: Kthread worker API improvements"

    The intention of this patchset is to make it easier to manipulate and
    maintain kthreads. In particular, I want to replace all the custom main
    loops with a generic one. I also want to make the kthreads sleep in a
    consistent state, in a common place, when there is no work.

    This patch (of 11):

    It is good practice to prefix function names with the name of the
    subsystem.

    This patch fixes the name of probe_kthread_data(). The other wrong
    function names are part of the kthread worker API and will be fixed
    separately.

    Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Suggested-by: Andrew Morton
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Jiri Kosina
    Cc: Borislav Petkov
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Vim, with the omnicppcomplete (#1) plugin, can do code completion using
    information built by ctags. Add the flags needed by omnicppcomplete (#2)
    to get completion on structure members.

    1: https://github.com/vim-scripts/omnicppcomplete
    2: https://github.com/vim-scripts/OmniCppComplete/blob/master/doc/omnicppcomplete.txt#L93

    Link: http://lkml.kernel.org/r/20160830191546.4469-1-mathieu.maret@gmail.com
    Signed-off-by: Mathieu Maret
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Maret
     
  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.
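
    A sketch of a converted call site (the surrounding context is
    hypothetical; the signatures mirror the virtual-address variants):

        /* e.g. early allocator code where 'base' may be in highmem */
        kmemleak_alloc_phys(base, size, 0, GFP_KERNEL);

        /* later, when part of the range is handed back */
        kmemleak_free_part_phys(base, freed_size);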

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • KASLR memory randomization can randomize the base of the physical
    memory mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
    (VMEMMAP_START). Add these variables to VMCOREINFO so that tools can
    easily identify the base of each memory section.
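
    A sketch of what the export might look like in
    arch_crash_save_vmcoreinfo(), assuming the generic VMCOREINFO_NUMBER()
    helper (the actual patch may format the values differently):

        VMCOREINFO_NUMBER(PAGE_OFFSET);
        VMCOREINFO_NUMBER(VMALLOC_START);
        VMCOREINFO_NUMBER(VMEMMAP_START);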

    Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com
    Signed-off-by: Thomas Garnier
    Acked-by: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Garnier
     
  • In a CONFIG_PREEMPT=n kernel, a softlockup was observed while executing
    the for loop in exit_sem(). Apparently it is possible for the loop to
    take quite a long time, and it does not contain a scheduling point.
    Since the code executes under an RCU read-side section, this may also
    cause RCU stalls, which in turn block synchronize_rcu() operations and
    more or less destabilise the whole system.

    Fix this by introducing a cond_resched() at the beginning of the loop.
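
    A minimal sketch of the fix in exit_sem() (ipc/sem.c); the loop body is
    elided:

        for (;;) {
                /* Undo lists can be very long; without this, a
                 * CONFIG_PREEMPT=n kernel can soft lock up here. */
                cond_resched();

                rcu_read_lock();
                /* ... look up and process the next undo entry,
                 *     breaking out when the list is empty ... */
                rcu_read_unlock();
        }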

    So this patch fixes the following:

    NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
    CPU: 10 PID: 18119 Comm: httpd Tainted: G O 4.4.20-clouder2 #6
    Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
    task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
    RIP: 0010:[] [] _raw_spin_lock+0x17/0x30
    RSP: 0018:ffff881c95553e40 EFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
    RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
    RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
    R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
    R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
    FS: 0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
    Call Trace:
    ? exit_sem+0x7c/0x280
    do_exit+0x338/0xb40
    do_group_exit+0x43/0xd0
    SyS_exit_group+0x14/0x20
    entry_SYSCALL_64_fastpath+0x16/0x6e

    Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
    Signed-off-by: Nikolay Borisov
    Cc: Herton R. Krzesinski
    Cc: Fabian Frederick
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
     
  • Blocked tasks queued in q_senders waiting for their message to fit in the
    queue are blindly awoken every time we think there's a remote chance this
    might happen. This could cause numerous (and expensive -- thundering
    herd-ish) bogus wakeups if the queue is still really full. Adding to the
    scheduling cost/overhead, there's also the fact that we need to take the
    ipc object lock and requeue ourselves in the q_senders list.

    By keeping track of the blocked sender's message size, we can know in
    advance whether the wakeup ought to occur or not. Otherwise, to maintain
    the current wakeup order, we just move it to the tail. This is exactly
    what occurs right now if the sender needs to go back to sleep.

    The case of EIDRM is left completely untouched, as we need to wakeup all
    the tasks, and shouldn't be playing games in the first place.

    This patch was seen to save ~15% in context switches on the 'msgctl10'
    LTP testcase (average of ten runs). Although these tests are really
    about functionality (as opposed to performance), it does show the direct
    benefits of the optimization.
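
    A sketch of the idea (field and helper names are assumptions):

        struct msg_sender {
                struct list_head        list;
                struct task_struct      *tsk;
                size_t                  msgsz;  /* size of the blocked msg */
        };

        /* wakeup-path sketch: only wake a sender whose message now
         * fits; otherwise it is requeued at the tail */
        if (!msg_fits_inqueue(msq, msr->msgsz))
                continue;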

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • ... 'tis annoying.

    Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Currently, wake_qs in sysv msg queues are used only for the receiver
    tasks that are blocked on the queue. But sender tasks blocked due to
    queue size constraints are still awoken with the ipc object lock held,
    which can be a problem, particularly for small queues, and is far from
    gracious for -rt (just like it was for the receiver side).

    The paths that actually wake up a sender are obviously related to either
    getting rid of the queue or to (some) space being freed up after a
    receiver takes a msg (msgrcv). Furthermore, with the exception of
    msgrcv, we can always piggy-back on expunge_all, which has its own tasks
    of msgrcv, we can always piggy-back on expunge_all that has its own tasks
    lined-up for waking. Finally, upon unlinking the message, it should be no
    problem delaying the wakeups a bit until after we've released the lock.
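
    The deferred-wakeup pattern is the standard lockless wake_q one; a
    minimal sketch (lock and task variables are placeholders; in v4.8 the
    init macro is WAKE_Q(), later renamed DEFINE_WAKE_Q()):

        WAKE_Q(wake_q);

        spin_lock(&msq->q_perm.lock);
        /* ... unlink the message, decide which senders can proceed ... */
        wake_q_add(&wake_q, blocked_task);
        spin_unlock(&msq->q_perm.lock);

        wake_up_q(&wake_q);     /* the actual wakeups, outside the lock */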

    Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This patch moves the wake_up_process() invocation out from under the
    ipc global lock by making use of a lockless wake_q. With this change,
    the waiter is woken up once the message has been assigned, and it does
    not need to loop on SMP if the message points to NULL. In the signal
    case we still need to check the pointer under the lock to verify the
    state.

    This change should also avoid the introduction of preempt_disable() in
    -RT, which would otherwise be needed to avoid a busy loop polling for
    the NULL -> !NULL change when the waiter has a higher priority than the
    waker.

    By making use of wake_qs, the logic of sysv msg queues is greatly
    simplified (and very well suited as we can batch lockless wakeups),
    particularly around the lockless receive algorithm.

    This has been tested with Manfred's pmsg-shared tool on an "AMD A10-7800
    Radeon R7, 12 Compute Cores 4C+8G":

    test | before | after | diff
    -----------------|------------|------------|----------
    pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
    pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
    pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %
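
    A heavily simplified sketch of the receiver side after the change
    (variable names are assumptions; surrounding details elided):

        /* after sleeping: the sender publishes the message pointer
         * before issuing the wake_q wakeup, so no busy-loop on NULL
         * is needed */
        msg = READ_ONCE(msr_d.r_msg);
        if (msg != ERR_PTR(-EAGAIN))
                return msg;     /* message was pipelined to us */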

    Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
    race:

    sem_lock has a fast path that allows parallel simple operations.
    There are two reasons why a simple operation cannot run in parallel:
    - a non-simple operation is ongoing (sma->sem_perm.lock held)
    - a complex operation is sleeping (sma->complex_count != 0)

    As both facts are stored independently, a thread can bypass the current
    checks by sleeping in the right positions. See below for more details
    (or kernel bugzilla 105651).

    The patch fixes that by creating one variable (complex_mode)
    that tracks both reasons why parallel operations are not possible.

    The patch also updates stale documentation regarding the locking.

    With regards to stable kernels:
    The patch is required for all kernels that include the
    commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

    The alternative is to revert the patch that introduced the race.

    The patch is safe for backporting, i.e. it makes no assumptions
    about memory barriers in spin_unlock_wait().

    Background:
    Here is the race of the current implementation:

    Thread A: (simple op)
    - does the first "sma->complex_count == 0" test

    Thread B: (complex op)
    - does sem_lock(): This includes an array scan. But the scan can't
    find Thread A, because Thread A does not own sem->lock yet.
    - the thread does the operation, increases complex_count,
    drops sem_lock, sleeps

    Thread A:
    - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
    - sleeps before the complex_count test

    Thread C: (complex op)
    - does sem_lock (no array scan, complex_count==1)
    - wakes up Thread B.
    - decrements complex_count

    Thread A:
    - does the complex_count test

    Bug:
    Now both thread A and thread C operate on the same array, without
    any synchronization.
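
    A condensed sketch of the fix (based on the description above; the
    actual sem_lock() dance has more detail):

        /* in struct sem_array: one flag now tracks both reasons why
         * the per-semaphore fast path must not be used */
        bool complex_mode;      /* set while complex ops run or sleep */

        /* simple ops in sem_lock(): */
        if (!READ_ONCE(sma->complex_mode)) {
                spin_lock(&sem->lock);
                if (!smp_load_acquire(&sma->complex_mode))
                        return sops->sem_num;   /* fast path succeeded */
                spin_unlock(&sem->lock);        /* raced, fall back */
        }
        /* slow path: take the global sma->sem_perm.lock */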

    Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
    Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
    Reported-by:
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc:
    Cc: [3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • There's no point in collecting coverage from lib/stackdepot.c, as it is
    not a function of syscall inputs. Disabling kcov instrumentation for that
    file will reduce the coverage noise level.

    Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • As of Android N, SECCOMP is required. Without it, we will get a
    mediaextractor error:

    E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument

    Link: http://lkml.kernel.org/r/20160908185934.18098-3-robh@kernel.org
    Signed-off-by: Rob Herring
    Acked-by: John Stultz
    Cc: Amit Pundir
    Cc: Dmitry Shmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • Android won't boot without SELinux enabled, so make it the default.

    Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org
    Signed-off-by: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • CONFIG_MD is in the recommended fragment, but dependent options like
    DM_CRYPT and DM_VERITY are in base. The result is that the options in
    base don't get enabled when applying both the base and recommended
    fragments. Move all the options to recommended.

    Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org
    Signed-off-by: Rob Herring
    Acked-by: John Stultz
    Cc: Amit Pundir
    Cc: Dmitry Shmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • Option is long gone, see commit 5d9efa7ee99e ("ipv6: Remove privacy
    config option.")

    Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Relay avoids calling wake_up_interruptible(), to wake readers/consumers
    waiting for the generation of new data, from the context of the process
    which produced the data. This is apparently done to prevent a possible
    deadlock in case the scheduler itself is generating data for the relay
    after acquiring rq->lock.

    The following patch used a timer (scheduled at the next jiffy) to
    delegate the wakeup to another context:
    commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7
    Author: Tom Zanussi
    Date: Wed May 9 02:34:01 2007 -0700

    relay: use plain timer instead of delayed work

    relay doesn't need to use schedule_delayed_work() for waking readers
    when a simple timer will do.

    Scheduling a plain timer at the next jiffy boundary to do the wakeup
    causes a significant wakeup latency for the userspace client, which
    makes relay less suitable for high-frequency low-payload use cases where
    the data is generated at a very high rate, like multiple sub-buffers
    getting filled within a millisecond. Moreover, the timer is re-scheduled
    on every newly produced sub-buffer, so the timer keeps getting pushed
    out if sub-buffers are filled in very quick succession (less than a
    jiffy between the filling of two sub-buffers). As a result, relay runs
    out of sub-buffers to store the new data.

    By using irq_work, the wakeup of the userspace client blocked in the
    poll call is done at the earliest opportunity (through a self IPI or the
    next timer tick), enabling it to always consume the data in time. This
    also makes relay consistent with the printk and (trace) ring buffers, as
    they too use irq_work for the deferred wakeup of readers.

    [arnd@arndb.de: select CONFIG_IRQ_WORK]
    Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Akash Goel
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • CONFIG_NO_HZ currently only sets the default value of the dynticks
    config, so if the PPS kernel consumer needs periodic timer ticks it
    should depend on !CONFIG_NO_HZ_COMMON instead of !CONFIG_NO_HZ.

    Otherwise it is possible to enable it even on a tickless system which
    has CONFIG_NO_HZ not set and CONFIG_NO_HZ_IDLE (or CONFIG_NO_HZ_FULL)
    set.

    Link: http://lkml.kernel.org/r/57E2B769.50202@maciej.szmigiero.name
    Signed-off-by: Maciej S. Szmigiero
    Acked-by: Rodolfo Giometti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maciej S. Szmigiero
     
  • Daniel Walker reported problems which happen when the
    crash_kexec_post_notifiers kernel option is enabled
    (https://lkml.org/lkml/2015/6/24/44).

    In that case, smp_send_stop() is called before entering kdump routines
    which assume other CPUs are still online. As a result, kdump routines
    fail to save the other CPUs' registers. Additionally, on MIPS OCTEON,
    it fails to stop the watchdog timer.

    To fix this problem, call a new kdump-friendly function,
    crash_smp_send_stop(), instead of smp_send_stop() when
    crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
    function that by default just calls smp_send_stop(); architecture code
    should override it so that kdump can work appropriately. This patch
    provides the MIPS version.

    Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
    Link: http://lkml.kernel.org/r/20160810080950.11028.28000.stgit@sysi4-13.yrl.intra.hitachi.co.jp
    Signed-off-by: Hidehiro Kawai
    Reported-by: Daniel Walker
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Cc: Masami Hiramatsu
    Cc: Daniel Walker
    Cc: Xunlei Pang
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: David Vrabel
    Cc: Toshi Kani
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Aaro Koskinen
    Cc: "Steven J. Hill"
    Cc: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • Daniel Walker reported problems which happen when the
    crash_kexec_post_notifiers kernel option is enabled
    (https://lkml.org/lkml/2015/6/24/44).

    In that case, smp_send_stop() is called before entering kdump routines
    which assume other CPUs are still online. As a result, on x86, kdump
    routines fail to save the other CPUs' registers and to disable
    virtualization extensions.

    To fix this problem, call a new kdump-friendly function,
    crash_smp_send_stop(), instead of smp_send_stop() when
    crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
    function that by default just calls smp_send_stop(); architecture code
    should override it so that kdump can work appropriately. This patch
    only provides the x86-specific version.

    For Xen's PV kernel, just keep the current behavior.

    NOTES:

    - The right solution would be to place crash_smp_send_stop() before the
    __crash_kexec() invocation in all cases and remove smp_send_stop(), but
    we can't do that until all architectures implement their own
    crash_smp_send_stop()

    - crash_smp_send_stop()-like work is still needed by
    machine_crash_shutdown() because crash_kexec() can be called without
    entering panic()
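
    The generic weak default is tiny; a sketch close to what such a
    fallback looks like in kernel/panic.c:

        /* Overridden by architectures that can stop CPUs in a
         * kdump-friendly way; the default keeps the old behavior. */
        void __weak crash_smp_send_stop(void)
        {
                static int cpus_stopped;

                if (cpus_stopped)
                        return;

                smp_send_stop();
                cpus_stopped = 1;
        }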

    Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
    Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
    Signed-off-by: Hidehiro Kawai
    Reported-by: Daniel Walker
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Cc: Masami Hiramatsu
    Cc: Daniel Walker
    Cc: Xunlei Pang
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: David Vrabel
    Cc: Toshi Kani
    Cc: Ralf Baechle
    Cc: David Daney
    Cc: Aaro Koskinen
    Cc: "Steven J. Hill"
    Cc: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call in the
    nvme driver on the path that returns BLK_MQ_RQ_QUEUE_BUSY (not for the
    BLK_MQ_RQ_QUEUE_ERROR path).

    Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Reviewed-by: Gabriel Krisman Bertazi
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • Add support for the DMA_ATTR_NO_WARN attribute in the powerpc iommu
    code.

    Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Acked-by: Michael Ellerman
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • Introduce the DMA_ATTR_NO_WARN attribute, and document it.
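
    A sketch of a call site that opts out of allocation-failure warnings
    because it has its own retry path (the driver context is hypothetical):

        nents = dma_map_sg_attrs(dev, sgl, count, DMA_TO_DEVICE,
                                 DMA_ATTR_NO_WARN);
        if (!nents)
                return -EBUSY;  /* the caller will retry later */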

    Link: http://lkml.kernel.org/r/1470092390-25451-2-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira
    Cc: Keith Busch
    Cc: Jens Axboe
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Krzysztof Kozlowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mauricio Faria de Oliveira
     
  • All call sites of randomize_range() have been updated to use the much
    simpler and more robust randomize_addr(). Remove the now-unnecessary
    code.

    Link: http://lkml.kernel.org/r/20160803233913.32511-8-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.
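
    The typical conversion looks like this sketch (RANGE stands in for each
    caller's constant):

        /* before */
        addr = randomize_range(start, start + RANGE, 0) ? : start;

        /* after: an address in [start, start + RANGE), or start
         * itself on failure */
        addr = randomize_addr(start, RANGE);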

    Link: http://lkml.kernel.org/r/20160803233913.32511-7-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-6-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-5-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Will Deacon
    Acked-by: Kees Cook
    Cc: "Russell King - ARM Linux"
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-4-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Russell King - ARM Linux"
    Cc: "Theodore Ts'o"
    Cc: Catalin Marinas
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Currently, all callers to randomize_range() set the length to 0 and
    calculate end by adding a constant to the start address. We can simplify
    the API to remove a bunch of needless checks and variables.

    Use the new randomize_addr(start, range) call to set the requested
    address.

    Link: http://lkml.kernel.org/r/20160803233913.32511-3-jason@lakedaemon.net
    Signed-off-by: Jason Cooper
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • To date, all callers of randomize_range() have set the length to 0, and
    check for a zero return value. For the current callers, the only way to
    get zero returned is if end <= start. Since they all add a constant to
    the start address, this is unnecessary. We can simplify the API to
    remove a bunch of needless checks and variables by returning an address
    in the range [start, start + range).

    Cc: Nick Kralevich
    Cc: Jeffrey Vander Stoep
    Cc: Daniel Cashman
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Cooper
     
  • Fix a coccinelle warning about duplicating the existing memdup_user()
    function.
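
    memdup_user() replaces the open-coded allocate-and-copy pattern; a
    sketch:

        /* instead of kmalloc() + copy_from_user() + error unwinding: */
        buf = memdup_user(uptr, len);
        if (IS_ERR(buf))
                return PTR_ERR(buf);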

    Link: http://lkml.kernel.org/r/20160811151737.20140-1-alexandre.bounine@idt.com
    Link: https://lkml.org/lkml/2016/8/11/29
    Signed-off-by: Alexandre Bounine
    Reported-by: kbuild test robot
    Cc: Matt Porter
    Cc: Andre van Herk
    Cc: Barry Wood
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine