Eric Lee / smarc-fsl-linux-kernel

26 Aug, 2017

1 commit

30d6e0a41 futex: Remove duplicated code and fix undefined behaviour ... Browse Code »

There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby
Signed-off-by: Thomas Gleixner
Acked-by: Russell King
Acked-by: Michael Ellerman (powerpc)
Acked-by: Heiko Carstens [s390]
Acked-by: Chris Metcalf [for tile]
Reviewed-by: Darren Hart (VMware)
Reviewed-by: Will Deacon [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt
Cc: Max Filippov
Cc: Paul Mackerras
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller
Cc: "James E.J. Bottomley"
Cc: Catalin Marinas
Cc: Matt Turner
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu
Cc: Arnd Bergmann
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky
Cc: Stafford Horne
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson
Cc: Chris Zankel
Cc: Michal Simek
Cc: Tony Luck
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta
Cc: Ralf Baechle
Cc: Richard Kuo
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller"
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz

Jiri Slaby
2017-08-26 04:49:59 +0800

10 Aug, 2017

2 commits

388f8e127 Merge branch 'linus' into locking/core, to pick up fixes ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2017-08-10 18:20:53 +0800
48fb6f4db futex: Remove unnecessary warning from get_futex_key ... Browse Code »

Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.

Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key. Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.

kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task: ffff80001e271780 task.stack: ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [] lr : [] pstate: 80000145

The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.

This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning. The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.

#include
#include
#include
#include
#include
#include
#include
#include

#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];

void *mem;

#define MEM_PROT (PROT_READ | PROT_WRITE)
#define MEM_SIZE 65536

static int futex_wrapper(int *uaddr, int op, int val,
const struct timespec *timeout,
int *uaddr2, int val3)
{
syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

void *poll_futex(void *unused)
{
for (;;) {
futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
}
}

int main(int argc, char *argv[])
{
int i;

mem = mmap(NULL, MEM_SIZE, MEM_PROT,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);

printf("Mapping @ %p\n", mem);

printf("Creating futex threads...\n");

for (i = 0; i < NR_FUTEX_THREADS; i++)
pthread_create(&threads[i], NULL, poll_futex, NULL);

printf("Flipping mapping...\n");
for (;;) {
mmap(mem, MEM_SIZE, MEM_PROT,
MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

return 0;
}

Reported-and-tested-by: Mark Rutland
Signed-off-by: Mel Gorman
Acked-by: Peter Zijlstra (Intel)
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Linus Torvalds

Mel Gorman
2017-08-10 05:00:54 +0800

01 Aug, 2017

1 commit

bc2eecd7e futex: Allow for compiling out PI support ... Browse Code »

This makes it possible to preserve basic futex support and compile out the
PI support when RT mutexes are not available.

Signed-off-by: Nicolas Pitre
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Darren Hart
Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr

Nicolas Pitre
2017-08-01 20:36:35 +0800

19 Jul, 2017

1 commit

e06fdaf40 Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull structure randomization updates from Kees Cook:
"Now that IPC and other changes have landed, enable manual markings for
randstruct plugin, including the task_struct.

This is the rest of what was staged in -next for the gcc-plugins, and
comes in three patches, largest first:

- mark "easy" structs with __randomize_layout

- mark task_struct with an optional anonymous struct to isolate the
__randomize_layout section

- mark structs to opt _out_ of automated marking (which will come
later)

And, FWIW, this continues to pass allmodconfig (normal and patched to
enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
s390 for me"

* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: opt-out externally exposed function pointer structs
task_struct: Allow randomized layout
randstruct: Mark various structs for randomization

Linus Torvalds
2017-07-19 23:55:18 +0800

04 Jul, 2017

1 commit

650fc870a Merge tag 'docs-4.13' of git://git.lwn.net/linux ... Browse Code »

Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:

- Conversion of a bunch of security documentation into RST

- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.

- The usual collection of fixes and minor updates"

* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...

Linus Torvalds
2017-07-04 12:13:25 +0800

01 Jul, 2017

1 commit

3859a271a randstruct: Mark various structs for randomization ... Browse Code »

This marks many critical kernel structures for randomization. These are
structures that have been targeted in the past in security exploits, or
contain functions pointers, pointers to function pointer tables, lists,
workqueues, ref-counters, credentials, permissions, or are otherwise
sensitive. This initial list was extracted from Brad Spengler/PaX Team's
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

Left out of this list is task_struct, which requires special handling
and will be covered in a subsequent patch.

Signed-off-by: Kees Cook

Kees Cook
2017-07-01 03:00:51 +0800

20 Jun, 2017

1 commit

ac6424b98 sched/wait: Rename wait_queue_t => wait_queue_entry_t ... Browse Code »

Rename:

wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-06-20 18:18:27 +0800

16 May, 2017

1 commit

7b4ff1adb mutex, futex: adjust kernel-doc markups to generate ReST ... Browse Code »

There are a few issues on some kernel-doc markups that was
causing troubles with kernel-doc output on ReST format:

./kernel/futex.c:492: WARNING: Inline emphasis start-string without end-string.
./kernel/futex.c:1264: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:1721: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2338: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2426: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2899: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2972: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix them.

No functional changes.

Acked-by: Darren Hart (VMware)
Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2017-05-16 19:43:25 +0800

15 Apr, 2017

1 commit

38fcd06e9 futex: Clarify mark_wake_futex memory barrier usage ... Browse Code »

Clarify the scenario described in mark_wake_futex requiring the
smp_store_release(). Update the comment to explicitly refer to the
plist_del now under __unqueue_futex() (previously plist_del was in the
same function as the comment).

Signed-off-by: Darren Hart (VMware)
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20170414223138.GA4222@fury
Signed-off-by: Thomas Gleixner

Darren Hart (VMware)
2017-04-15 22:03:46 +0800

14 Apr, 2017

2 commits

94ffac5d8 futex: Fix small (and harmless looking) inconsistencies ... Browse Code »

During (post-commit) review Darren spotted a few minor things. One
(harmless AFAICT) type inconsistency and a comment that wasn't as
clear as hoped.

Reported-by: Darren Hart (VMWare)
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart (VMware)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2017-04-14 16:30:07 +0800
97181f9bd futex: Avoid freeing an active timer ... Browse Code »

Alexander reported a hrtimer debug_object splat:

ODEBUG: free active (active state 0) object type: hrtimer hint: hrtimer_wakeup (kernel/time/hrtimer.c:1423)

debug_object_free (lib/debugobjects.c:603)
destroy_hrtimer_on_stack (kernel/time/hrtimer.c:427)
futex_lock_pi (kernel/futex.c:2740)
do_futex (kernel/futex.c:3399)
SyS_futex (kernel/futex.c:3447 kernel/futex.c:3415)
do_syscall_64 (arch/x86/entry/common.c:284)
entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:249)

Which was caused by commit:

cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")

... losing the hrtimer_cancel() in the shuffle. Where previously the
hrtimer_cancel() was done by rt_mutex_slowlock() we now need to do it
manually.

Reported-by: Alexander Levin
Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Fixes: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704101802370.2906@nanos
Signed-off-by: Ingo Molnar

Thomas Gleixner
2017-04-14 16:29:53 +0800

04 Apr, 2017

2 commits

aa2bfe553 rtmutex: Clean up ... Browse Code »

Previous patches changed the meaning of the return value of
rt_mutex_slowunlock(); update comments and code to reflect this.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170323150216.255058238@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-04-04 17:44:05 +0800
2a1c60299 rtmutex: Deboost before waking up the top waiter ... Browse Code »

We should deboost before waking the high-priority task, such that we
don't run two tasks with the same "state" (priority, deadline,
sched_class, etc).

In order to make sure the boosting task doesn't start running between
unlock and deboost (due to 'spurious' wakeup), we move the deboost
under the wait_lock, that way its serialized against the wait loop in
__rt_mutex_slowlock().

Doing the deboost early can however lead to priority-inversion if
current would get preempted after the deboost but before waking our
high-prio task, hence we disable preemption before doing deboost, and
enabling it after the wake up is over.

This gets us the right semantic order, but most importantly however;
this change ensures pointer stability for the next patch, where we
have rt_mutex_setprio() cache a pointer to the top-most waiter task.
If we, as before this change, do the wakeup first and then deboost,
this pointer might point into thin air.

[peterz: Changelog + patch munging]
Suggested-by: Peter Zijlstra
Signed-off-by: Xunlei Pang
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Steven Rostedt
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170323150216.110065320@infradead.org
Signed-off-by: Thomas Gleixner

Xunlei Pang
2017-04-04 17:44:05 +0800

24 Mar, 2017

12 commits

56222b212 futex: Drop hb->lock before enqueueing on the rtmutex ... Browse Code »

When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
chain code will (falsely) report a deadlock and BUG.

The problem is that it hold hb->lock (now an rt_mutex) while doing
task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
interleaved just right with futex_unlock_pi() leads it to believe to see an
AB-BA deadlock.

Task1 (holds rt_mutex, Task2 (does FUTEX_LOCK_PI)
does FUTEX_UNLOCK_PI)

lock hb->lock
lock rt_mutex (as per start_proxy)
lock hb->lock

Which is a trivial AB-BA.

It is not an actual deadlock, because it won't be holding hb->lock by the
time it actually blocks on the rt_mutex, but the chainwalk code doesn't
know that and it would be a nightmare to handle this gracefully.

To avoid this problem, do the same as in futex_unlock_pi() and drop
hb->lock after acquiring wait_lock. This still fully serializes against
futex_unlock_pi(), since adding to the wait_list does the very same lock
dance, and removing it holds both locks.

Aside of solving the RT problem this makes the lock and unlock mechanism
symetric and reduces the hb->lock held time.

Reported-and-tested-by: Sebastian Andrzej Siewior
Suggested-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:14:59 +0800
bebe5b514 futex: Futex_unlock_pi() determinism ... Browse Code »

The problem with returning -EAGAIN when the waiter state mismatches is that
it becomes very hard to proof a bounded execution time on the
operation. And seeing that this is a RT operation, this is somewhat
important.

While in practise; given the previous patch; it will be very unlikely to
ever really take more than one or two rounds, proving so becomes rather
hard.

However, now that modifying wait_list is done while holding both hb->lock
and wait_lock, the scenario can be avoided entirely by acquiring wait_lock
while still holding hb-lock. Doing a hand-over, without leaving a hole.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:10 +0800
cfafcd117 futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() ... Browse Code »

By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
modifications are done under both hb->lock and wait_lock.

This closes the obvious interleave pattern between futex_lock_pi() and
futex_unlock_pi(), but not entirely so. See below:

Before:

futex_lock_pi() futex_unlock_pi()
unlock hb->lock

lock hb->lock
unlock hb->lock

lock rt_mutex->wait_lock
unlock rt_mutex_wait_lock
-EAGAIN

lock rt_mutex->wait_lock
list_add
unlock rt_mutex->wait_lock

schedule()

lock rt_mutex->wait_lock
list_del
unlock rt_mutex->wait_lock

-EAGAIN

lock hb->lock

After:

futex_lock_pi() futex_unlock_pi()

lock hb->lock
lock rt_mutex->wait_lock
list_add
unlock rt_mutex->wait_lock
unlock hb->lock

schedule()
lock hb->lock
unlock hb->lock
lock hb->lock
lock rt_mutex->wait_lock
list_del
unlock rt_mutex->wait_lock

lock rt_mutex->wait_lock
unlock rt_mutex_wait_lock
-EAGAIN

unlock hb->lock

It does however solve the earlier starvation/live-lock scenario which got
introduced with the -EAGAIN since unlike the before scenario; where the
-EAGAIN happens while futex_unlock_pi() doesn't hold any locks; in the
after scenario it happens while futex_unlock_pi() actually holds a lock,
and then it is serialized on that lock.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:09 +0800
38d589f2f futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ... Browse Code »

With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
parts, such that only the actual blocking can be done without hb->lock
held.

Split split_mutex_finish_proxy_lock() into two parts, one that does the
blocking and one that does remove_waiter() when the lock acquire failed.

When the rtmutex was acquired successfully the waiter can be removed in the
acquisiton path safely, since there is no concurrency on the lock owner.

This means that, except for futex_lock_pi(), all wait_list modifications
are done with both hb->lock and wait_lock held.

[bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:09 +0800
50809358d futex,rt_mutex: Introduce rt_mutex_init_waiter() ... Browse Code »

Since there's already two copies of this code, introduce a helper now
before adding a third one.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:09 +0800
16ffa12d7 futex: Pull rt_mutex_futex_unlock() out from under hb->lock ... Browse Code »

There's a number of 'interesting' problems, all caused by holding
hb->lock while doing the rt_mutex_unlock() equivalient.

Notably:

- a PI inversion on hb->lock; and,

- a SCHED_DEADLINE crash because of pointer instability.

The previous changes:

- changed the locking rules to cover {uval,pi_state} with wait_lock.

- allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in
turn allows to rely on wait_lock atomicity completely.

- simplified the waiter conundrum.

It's now sufficient to hold rtmutex::wait_lock and a reference on the
pi_state to protect the state consistency, so hb->lock can be dropped
before calling rt_mutex_futex_unlock().

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:08 +0800
73d786bd0 futex: Rework inconsistent rt_mutex/futex_q state ... Browse Code »

There is a weird state in the futex_unlock_pi() path when it interleaves
with a concurrent futex_lock_pi() at the point where it drops hb->lock.

In this case, it can happen that the rt_mutex wait_list and the futex_q
disagree on pending waiters, in particular rt_mutex will find no pending
waiters where futex_q thinks there are. In this case the rt_mutex unlock
code cannot assign an owner.

The futex side fixup code has to cleanup the inconsistencies with quite a
bunch of interesting corner cases.

Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
situation occurs. This then gives the futex_lock_pi() code the opportunity
to continue and the retried futex_unlock_pi() will now observe a coherent
state.

The only problem is that this breaks RT timeliness guarantees. That
is, consider the following scenario:

T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)

CPU0

T1
lock_pi()
queue_me()
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:08 +0800
bf92cf3a5 futex: Cleanup refcounting ... Browse Code »

Add a put_pit_state() as counterpart for get_pi_state() so the refcounting
becomes consistent.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:08 +0800
734009e96 futex: Change locking rules ... Browse Code »

Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
creates another set of problems, especially priority inversions on RT where
hb->lock becomes a rt_mutex itself.

The rt_mutex::wait_lock is the most obvious protection for keeping the
futex user space value and the kernel internal pi_state in sync.

Rework and document the locking so rt_mutex::wait_lock is held accross all
operations which modify the user space value and the pi state.

This allows to invoke rt_mutex_unlock() (including deboost) without holding
hb->lock as a next step.

Nothing yet relies on the new locking rules.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:07 +0800
5293c2efd futex,rt_mutex: Provide futex specific rt_mutex API ... Browse Code »

Part of what makes futex_unlock_pi() intricate is that
rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
rt_mutex::wait_lock.

This means it cannot rely on the atomicy of wait_lock, which would be
preferred in order to not rely on hb->lock so much.

The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can
race with the rt_mutex fastpath, however futexes have their own fast path.

Since futexes already have a bunch of separate rt_mutex accessors, complete
that set and implement a rt_mutex variant without fastpath for them.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:07 +0800
1b367ece0 futex: Use smp_store_release() in mark_wake_futex() ... Browse Code »

Since the futex_q can dissapear the instruction after assigning NULL,
this really should be a RELEASE barrier. That stops loads from hitting
dead memory too.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:06 +0800
499f5aca2 futex: Cleanup variable names for futex_top_waiter() ... Browse Code »

futex_top_waiter() returns the top-waiter on the pi_mutex. Assinging
this to a variable 'match' totally obscures the code.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-24 02:10:06 +0800

15 Mar, 2017

2 commits

9bbb25afe futex: Add missing error handling to FUTEX_REQUEUE_PI ... Browse Code »

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-15 04:45:36 +0800
c236c8e95 futex: Fix potential use-after-free in FUTEX_REQUEUE_PI ... Browse Code »

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Darren Hart
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2017-03-15 04:45:36 +0800

02 Mar, 2017

2 commits

6e84f3152 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:28 +0800
84f001e15 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/wake_q.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:26 +0800

28 Feb, 2017

1 commit

f1f100764 mm: add new mmgrab() helper ... Browse Code »

Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum
Acked-by: Michal Hocko
Acked-by: Peter Zijlstra (Intel)
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vegard Nossum
2017-02-28 10:43:48 +0800

13 Feb, 2017

1 commit

25f71d1c3 futex: Move futex_init() to core_initcall ... Browse Code »

The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.

The user mode helper is triggered by device init calls and the executable
might use the futex syscall.

futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.

If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.

Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.

[ tglx: Rewrote changelog ]

Signed-off-by: Yang Yang
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner

Yang Yang
2017-02-13 23:12:22 +0800

26 Dec, 2016

1 commit

2456e8553 ktime: Get rid of the union ... Browse Code »

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:22 +0800

21 Nov, 2016

1 commit

194a6b5b9 sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q ... Browse Code »

Currently the wake_q data structure is defined by the WAKE_Q() macro.
This macro, however, looks like a function doing something as "wake" is
a verb. Even checkpatch.pl was confused as it reported warnings like

WARNING: Missing a blank line after declarations
#548: FILE: kernel/futex.c:3665:
+ int ret;
+ WAKE_Q(wake_q);

This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
what the macro is doing and eliminates the checkpatch.pl warnings.

Signed-off-by: Waiman Long
Acked-by: Davidlohr Bueso
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
[ Resolved conflict and added missing rename. ]
Signed-off-by: Ingo Molnar

Waiman Long
2016-11-21 17:29:01 +0800

05 Sep, 2016

1 commit

e8b61b3f2 futex: Add some more function commentry ... Browse Code »

Add some more comments and reformat existing ones to kernel doc style.

Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Reviewed-by: Darren Hart
Link: http://lkml.kernel.org/r/1464770609-30168-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-09-05 23:20:18 +0800

30 Jul, 2016

1 commit

784bdf3bb futex: Assume all mappings are private on !MMU systems ... Browse Code »

To quote Rick why there is no need for shared mapping on !MMU systems:

|With MMU, shared futex keys need to identify the physical backing for
|a memory address because it may be mapped at different addresses in
|different processes (or even multiple times in the same process).
|Without MMU this cannot happen. You only have physical addresses. So
|the "private futex" behavior of using the virtual address as the key
|is always correct (for both shared and private cases) on nommu
|systems.

This patch disables the FLAGS_SHARED in a way that allows the compiler to
remove that code.

[bigeasy: Added changelog ]
Reported-by: Rich Felker
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/20160729143230.GA21715@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-07-30 00:44:14 +0800

09 Jun, 2016

1 commit

077fa7aed futex: Calculate the futex key based on a tail page for file-based futexes ... Browse Code »

Mike Galbraith reported that the LTP test case futex_wake04 was broken
by commit 65d8fc777f6d ("futex: Remove requirement for lock_page()
in get_futex_key()").

This test case uses futexes backed by hugetlbfs pages and so there is an
associated inode with a futex stored on such pages. The problem is that
the key is being calculated based on the head page index of the hugetlbfs
page and not the tail page.

Prior to the optimisation, the page lock was used to stabilise mappings and
pin the inode is file-backed which is overkill. If the page was a compound
page, the head page was automatically looked up as part of the page lock
operation but the tail page index was used to calculate the futex key.

After the optimisation, the compound head is looked up early and the page
lock is only relied upon to identify truncated pages, special pages or a
shmem page moving to swapcache. The head page is looked up because without
the page lock, special care has to be taken to pin the inode correctly.
However, the tail page is still required to calculate the futex key so
this patch records the tail page.

On vanilla 4.6, the output of the test case is;

futex_wake04 0 TINFO : Hugepagesize 2097152
futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.

With the patch applied

futex_wake04 0 TINFO : Hugepagesize 2097152
futex_wake04 1 TPASS : Hi hydra, thread2 awake!

Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
Reported-and-tested-by: Mike Galbraith
Signed-off-by: Mel Gorman
Acked-by: Peter Zijlstra (Intel)
Reviewed-by: Davidlohr Bueso
Cc: Sebastian Andrzej Siewior
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de
Signed-off-by: Thomas Gleixner

Mel Gorman
2016-06-09 01:23:54 +0800

23 May, 2016

1 commit

bd28b1459 x86: remove more uaccess_32.h complexity ... Browse Code »

I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.

For example, the 32-bit version of "__copy_from_user_inatomic()" is
mostly the special cases for the constant size, and it's actually almost
never relevant. Most users aren't actually using a constant size
anyway, and the few cases that do small constant copies are better off
just using __get_user() instead.

So get rid of the unnecessary complexity.

Signed-off-by: Linus Torvalds

Linus Torvalds
2016-05-23 08:21:27 +0800

21 Apr, 2016

1 commit

fe1bce9e2 futex: Acknowledge a new waiter in counter before plist ... Browse Code »

Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.

Suggested-by: Peter Zijlstra
Signed-off-by: Davidlohr Bueso
Cc: Davidlohr Bueso
Cc: bigeasy@linutronix.de
Cc: dvhart@infradead.org
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner

Davidlohr Bueso
2016-04-21 17:06:09 +0800

20 Apr, 2016

1 commit

89e9e66ba futex: Handle unlock_pi race gracefully ... Browse Code »

If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:

CPU0 CPU1
uval = get_user(futex)
lock(hb)
lock(hb)
futex |= FUTEX_WAITERS
....
unlock(hb)

cmpxchg(futex, uval, newval)

So the cmpxchg fails and returns -EINVAL to user space, which is wrong because
the futex value is valid.

To handle this (yes, yet another) corner case gracefully, check for a flag
change and retry.

[ tglx: Massaged changelog and slightly reworked implementation ]

Fixes: ccf9e6a80d9e ("futex: Make unlock_pi more robust")
Signed-off-by: Sebastian Andrzej Siewior
Cc: stable@vger.kernel.org
Cc: Davidlohr Bueso
Cc: Darren Hart
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner

Sebastian Andrzej Siewior
2016-04-20 18:33:13 +0800