14 Jan, 2017

28 commits

  • Since we need to change the implementation, stop exposing internals.

    Provide KREF_INIT() to allow static initialization of struct kref.
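
    A minimal sketch of what this enables (the containing struct and its
    name are illustrative, not from the patch):

    #include <linux/kref.h>

    /* A hypothetical refcounted object. */
    struct foo {
            struct kref ref;
    };

    /* Static initialization without poking at struct kref internals: */
    static struct foo default_foo = {
            .ref = KREF_INIT(1),
    };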

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add a minimal test run (modprobe test-ww_mutex) to the kselftests
    CI framework.

    Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-9-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-8-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Check that ww_mutexes can detect cyclic deadlocks (generalised ABBA
    cycles) and resolve them by lock reordering.
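
    For reference, the usage pattern these tests exercise is the standard
    ww_mutex backoff dance, roughly like this two-lock sketch (class and
    object names are illustrative):

    static DEFINE_WW_CLASS(my_class);

    struct ww_acquire_ctx ctx;
    int ret;

    ww_acquire_init(&ctx, &my_class);

    ret = ww_mutex_lock(&obj_a->lock, &ctx);
    if (!ret) {
            ret = ww_mutex_lock(&obj_b->lock, &ctx);
            if (ret == -EDEADLK) {
                    /* Cycle detected: drop what we hold, then reacquire
                     * the contended lock in stamp order and retry. */
                    ww_mutex_unlock(&obj_a->lock);
                    ww_mutex_lock_slow(&obj_b->lock, &ctx);
                    ret = ww_mutex_lock(&obj_a->lock, &ctx);
            }
    }
    ww_acquire_done(&ctx);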

    Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-7-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-6-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-5-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-4-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Although ww_mutexes degenerate into mutexes, it would be useful to
    torture the deadlock handling between multiple ww_mutexes in addition to
    torturing the regular mutexes.

    Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Nicolai Hähnle
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161201114711.28697-3-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Between conflicting macro parameters, passing the wrong name to
    __MUTEX_INITIALIZER and a stray '\', #define __WW_MUTEX_INITIALIZER was
    very unhappy.

    One (unnecessary) change was to pass &ww_class explicitly instead of
    implicitly taking the address of the class within the macro.
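
    With the initializer repaired, a static definition along these lines
    should build (names are illustrative; note the explicit &):

    static DEFINE_WW_CLASS(test_ww_class);
    static DEFINE_WW_MUTEX(test_mutex, &test_ww_class);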

    Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 1b375dc30710 ("mutex: Move ww_mutex definitions to ww_mutex.h")
    Link: http://lkml.kernel.org/r/20161201114711.28697-2-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson
     
  • Document the invariants we maintain for the wait list of ww_mutexes.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-13-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • Help catch cases where mutex_lock is used directly on w/w mutexes, which
    otherwise result in the w/w tasks reading uninitialized data.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-12-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • Lock stealing is less beneficial for w/w mutexes since we may just end up
    backing off if we stole from a thread with an earlier acquire stamp that
    already holds another w/w mutex that we also need. So don't spin
    optimistically unless we are sure that there is no other waiter that might
    cause us to back off.

    Median timings taken of a contention-heavy GPU workload:

    Before:

    real 0m52.946s
    user 0m7.272s
    sys 1m55.964s

    After:

    real 0m53.086s
    user 0m7.360s
    sys 1m46.204s

    This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner
    according to perf, but my attempts to further reduce this spinning based on
    various heuristics all lead to an increase in measured wall time despite
    the decrease in sys time.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-11-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • In the following scenario, thread #1 should back off its attempt to lock
    ww1 and unlock ww2 (assuming the acquire context stamps are ordered
    accordingly).

    Thread #0                           Thread #1
    ---------                           ---------
                                        successfully lock ww2
    set ww1->base.owner
                                        attempt to lock ww1
                                        confirm ww1->ctx == NULL
                                        enter mutex_spin_on_owner
    set ww1->ctx

    What was likely to happen previously is:

    attempt to lock ww2
    refuse to spin because
      ww2->ctx != NULL
    schedule()
                                        detect thread #0 is off CPU
                                        stop optimistic spin
                                        return -EDEADLK
                                        unlock ww2
                                        wakeup thread #0
    lock ww2

    Now, we are more likely to see:

                                        detect ww1->ctx != NULL
                                        stop optimistic spin
                                        return -EDEADLK
                                        unlock ww2
    successfully lock ww2

    ... because thread #1 will stop its optimistic spin as soon as possible.

    The whole scenario is quite unlikely, since it requires thread #1 to get
    between thread #0 setting the owner and setting the ctx. But since we're
    idling here anyway, the additional check is basically free.

    Found by inspection.
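
    A hedged sketch of the added check (the helper name and exact shape
    are illustrative, not the literal diff):

    /*
     * While optimistically spinning on the owner of a ww_mutex, also
     * watch ww->ctx: once the owner publishes an acquire context, a
     * spinner that itself holds ww_mutexes may have to back off, so
     * stop spinning and take the slow path right away.
     */
    static inline bool ww_keep_spinning(struct ww_mutex *ww,
                                        struct ww_acquire_ctx *ww_ctx)
    {
            return !ww_ctx || !READ_ONCE(ww->ctx);
    }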

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-10-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • Instead of inlining __mutex_lock_common() 5 times, once for each
    {state,ww} variant, reduce this to two: ww and !ww.

    Then add __always_inline to mutex_optimistic_spin(), so that that will
    get inlined all 4 remaining times, for all {waiter,ww} variants.

    text    data    bss     dec     hex     filename
    6301    0       0       6301    189d    defconfig-build/kernel/locking/mutex.o
    4053    0       0       4053    fd5     defconfig-build/kernel/locking/mutex.o
    4257    0       0       4257    10a1    defconfig-build/kernel/locking/mutex.o

    This reduces total text size and better separates the ww and !ww mutex
    code generation.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • …hen acquiring the lock

    The wait list is sorted by stamp order, and the only waiting task that may
    have to back off is the first waiter with a context.

    The regular slow path does not have to wake any other tasks at all, since
    all other waiters that would have to back off were either woken up when
    the waiter was added to the list, or detected the condition before they
    added themselves.

    Median timings taken of a contention-heavy GPU workload:

    Without this series:

    real 0m59.900s
    user 0m7.516s
    sys 2m16.076s

    With changes up to and including this patch:

    real 0m52.946s
    user 0m7.272s
    sys 1m55.964s

    Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Maarten Lankhorst <dev@mblankhorst.nl>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-9-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Nicolai Hähnle
     
  • While adding our task as a waiter, detect if another task should back off
    because of us.

    With this patch, we establish the invariant that the wait list contains
    at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter
    will be the first waiter with a context.

    Since only waiters with ww_ctx->acquired > 0 have to back off, this allows
    us to be much more economical with wakeups.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-8-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • Add regular waiters in stamp order. Keep adding waiters that have no
    context in FIFO order and take care not to starve them.

    While adding our task as a waiter, back off if we detect that there is
    a waiter with a lower stamp in front of us.

    Make sure to call lock_contended even when we back off early.

    For w/w mutexes, being first in the wait list is only stable when
    taking the lock without a context. Therefore, the purpose of the first
    flag is split into two: 'first' remains to indicate whether we want to
    spin optimistically, while 'handoff' indicates that we should be
    prepared to accept a handoff.

    For w/w locking with a context, we always accept handoffs after the
    first schedule(), to handle the following sequence of events:

    1. Task #0 unlocks and hands off to Task #2 which is first in line

    2. Task #1 adds itself in front of Task #2

    3. Task #2 wakes up and must accept the handoff even though it is no
    longer first in line
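
    The stamp ordering used above boils down to a wrap-safe comparison of
    the acquire contexts' stamps, roughly as below (the helper spelling
    follows this series; treat the exact signature as approximate):

    /* Returns true if ctx 'a' is younger (was started later) than 'b'. */
    static inline bool __ww_ctx_stamp_after(struct ww_acquire_ctx *a,
                                            struct ww_acquire_ctx *b)
    {
            return (signed long)(a->stamp - b->stamp) > 0;
    }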

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Nicolai Hähnle
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-7-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • Keep the documentation in the header file since there is no good place
    for it in mutex.c: there are two rather different implementations with
    different EXPORT_SYMBOLs for each function.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Nicolai Hähnle
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-6-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • We will add a new field to struct mutex_waiter. This field must be
    initialized for all waiters if any waiter uses the use_ww_ctx path.

    So there is a trade-off: Keep ww_mutex locking without a context on
    the faster non-use_ww_ctx path, at the cost of adding the
    initialization to all mutex locks (including non-ww_mutexes), or avoid
    the additional cost for non-ww_mutex locks, at the cost of adding
    additional checks to the use_ww_ctx path.

    We take the latter choice. It may be worth eliminating the users of
    ww_mutex_lock(lock, NULL), but there are a lot of them.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-5-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • The function will be re-used in subsequent patches.

    Signed-off-by: Nicolai Hähnle
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Chris Wilson
    Cc: Andrew Morton
    Cc: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Maarten Lankhorst
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dri-devel@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/1482346000-9927-4-git-send-email-nhaehnle@gmail.com
    Signed-off-by: Ingo Molnar

    Nicolai Hähnle
     
  • While reviewing the ww_mutex patches, I noticed that it was still
    possible to (incorrectly) succeed for (incorrect) code like:

    mutex_lock(&a);
    mutex_lock(&a);

    This was possible if the second mutex_lock() would block (as expected)
    but then receive a spurious wakeup. At that point it would find itself
    at the front of the queue, request a handoff and instantly claim
    ownership and continue, since owner would point to itself.

    Avoid this scenario and simplify the code by introducing a third low
    bit to signal handoff pickup. So once we request handoff, unlock
    clears the handoff bit and sets the pickup bit along with the new
    owner.

    This also removes the need for the .handoff argument to
    __mutex_trylock(), since that becomes superfluous with PICKUP.

    In order to guarantee enough low bits, ensure task_struct alignment is
    at least L1_CACHE_BYTES (which seems a good idea regardless).
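
    For reference, the owner word now carries three low bits alongside the
    task_struct pointer, roughly as below (a sketch of the flag layout;
    the surrounding accessors are omitted):

    /*
     * Bit0: waiters present
     * Bit1: handoff to the first waiter has been requested by unlock
     * Bit2: handoff done, waiting for the designated owner to pick it up
     */
    #define MUTEX_FLAG_WAITERS      0x01
    #define MUTEX_FLAG_HANDOFF      0x02
    #define MUTEX_FLAG_PICKUP       0x04

    #define MUTEX_FLAGS             0x07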

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 9d659ae14b54 ("locking/mutex: Add lock handoff to avoid starvation")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The use of any kind of wait queue is overkill for pcpu-rwsems. While
    one option would be to use the lighter simple (swait) flavor, even
    that is more than pcpu-rwsems need. For one, we do not care about
    any sort of queueing: the only (rare) time writers (and readers, for
    that matter) are queued is when trying to acquire the regular
    contended rw_sem, and there cannot be any further queueing as writers
    are serialized by the rw_sem in the first place.

    Given that percpu_down_write() must not be called after exit_notify(),
    we can replace the bulky waitqueue with rcuwait such that a writer
    can wait for its turn to take the lock. As such, we can avoid the
    queue handling and locking overhead.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • rcuwait provides support for (single) RCU-safe task wait/wake
    functionality, with the caveat that it must not be called after
    exit_notify(); this avoids racing with the RCU delayed_put_task_struct
    callbacks, task_struct not being RCU-aware in this context -- for which
    we similarly have the task_rcu_dereference() magic, but with different
    return semantics, which can conflict with the wakeup side.

    The interfaces are quite straightforward:

    rcuwait_wait_event()
    rcuwait_wake_up()

    More details are in the comments, but it is worth mentioning at least
    that users must provide proper serialization when waiting on a
    condition and must avoid corrupting a concurrent waiter. Care must also
    be taken with the ordering between the task and the condition when
    issuing the wakeup -- we cannot afford to miss wakeups. When porting
    users, this is for example already a given when using waitqueues, since
    everything is done under the q->lock. As such, rcuwait can remove
    sources of non-preemptible unbounded work for realtime.
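
    A minimal sketch of the intended usage (the container struct and the
    condition are made up for illustration):

    #include <linux/rcuwait.h>

    struct my_gate {
            struct rcuwait waiter;
            bool ready;
    };

    static void gate_init(struct my_gate *g)
    {
            rcuwait_init(&g->waiter);
            g->ready = false;
    }

    /* Waiter side: the current task parks until 'ready' becomes true. */
    static void gate_wait(struct my_gate *g)
    {
            rcuwait_wait_event(&g->waiter, READ_ONCE(g->ready));
    }

    /* Waker side: publish the condition first, then wake the waiter. */
    static void gate_open(struct my_gate *g)
    {
            WRITE_ONCE(g->ready, true);
            rcuwait_wake_up(&g->waiter);
    }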

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This is a nasty interface and setting the state of a foreign task must
    not be done. As of the following commit:

    be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()")

    ... everyone in the kernel calls set_task_state() with current, allowing
    the helper to be removed.

    However, as the comment indicates, it is still around for those archs
    where computing current is more expensive than using a pointer, at least
    in theory. An important arch that is affected is arm64; however, this
    has now been addressed [1] and performance is up to par, making no
    difference between the two calls.

    Of all the callers, if any, it's the locking bits that would care most
    about this -- ie: we end up passing a tsk pointer to a lot of the lock
    slowpath, and setting ->state on that. The following numbers are based
    on two tests: a custom ad-hoc microbenchmark that just measures
    latencies (for ~65 million calls) between set_task_state() and
    set_current_state().

    Secondly, for a higher-level overview, an unlink microbenchmark was
    used, which pounds on a single file with open, close, unlink combos
    with increasing thread counts (up to 4x ncpus). While the workload is
    quite unrealistic, it does contend a lot on the inode mutex (now an
    rwsem).

    [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

    == 1. x86-64 ==

    Avg runtime set_task_state(): 601 msecs
    Avg runtime set_current_state(): 552 msecs

    vanilla dirty
    Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%)
    Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%)
    Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%)
    Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%)
    Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%)
    Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%)
    Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%)
    Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%)
    Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%)
    Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%)
    Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%)
    Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%)
    Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)

    == 2. ppc64le ==

    Avg runtime set_task_state(): 938 msecs
    Avg runtime set_current_state: 940 msecs

    vanilla dirty
    Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%)
    Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%)
    Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%)
    Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%)
    Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%)
    Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%)
    Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%)
    Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%)
    Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%)
    Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%)
    Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%)
    Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%)
    Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%)
    Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%)
    Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%)
    Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)

    x86-64 (known to be fast for get_current()/this_cpu_read_stable()
    caching) and ppc64 (with paca) show similar improvements in the unlink
    microbenchmarks. The small delta for ppc64 (2 ms) does not reflect the
    gains on the unlink runs. In the case of x86, there was a decent amount
    of variation in the latency runs, but always within a 20 to 50 ms
    increase; ppc was more constant.
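
    The conversion itself is mechanical; roughly, call sites go from the
    removed helper to the current-only one:

    /* Before: setting ->state through a task pointer (always == current). */
    set_task_state(tsk, TASK_UNINTERRUPTIBLE);

    /* After: only the current task's state may be set. */
    set_current_state(TASK_UNINTERRUPTIBLE);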

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This patch effectively replaces the tsk pointer dereference (which is
    obviously == current) with direct use of the get_current() macro. This
    is to make the removal of setting foreign task states smoother and
    painfully obvious. It is a performance win on some archs such as
    x86-64 and ppc64. On a microbenchmark that calls set_task_state() vs
    set_current_state(), and on an inode rwsem pounding benchmark doing
    unlink:

    == 1. x86-64 ==

    Avg runtime set_task_state(): 601 msecs
    Avg runtime set_current_state(): 552 msecs

    vanilla dirty
    Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%)
    Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%)
    Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%)
    Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%)
    Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%)
    Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%)
    Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%)
    Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%)
    Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%)
    Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%)
    Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%)
    Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%)
    Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)

    == 2. ppc64le ==

    Avg runtime set_task_state(): 938 msecs
    Avg runtime set_current_state: 940 msecs

    vanilla dirty
    Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%)
    Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%)
    Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%)
    Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%)
    Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%)
    Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%)
    Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%)
    Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%)
    Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%)
    Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%)
    Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%)
    Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%)
    Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%)
    Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%)
    Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%)
    Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This patch effectively replaces the tsk pointer dereference (which is
    obviously == current) with direct use of the get_current() macro. This
    is to make the removal of setting foreign task states smoother and
    painfully obvious. It is a performance win on some archs such as
    x86-64 and ppc64 -- arm64 is no longer an issue.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This patch effectively replaces the tsk pointer dereference (which is
    obviously == current) with direct use of the get_current() macro. In
    this case, do_exit() always passes current to exit_mm(), hence we can
    simply get rid of the argument. This is also a performance win on some
    archs such as x86-64 and ppc64 -- arm64 is no longer an issue.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • This is a follow-up of commit:

    cfd8983f03c7b2 ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")

    The static_key structure 'paravirt_ticketlocks_enabled' is now removed as it is
    no longer used.

    As a result, the init functions kvm_spinlock_init_jump() and
    xen_init_spinlocks_jump() are also removed.

    A simple build and boot test was done to verify it.

    Signed-off-by: Waiman Long
    Reviewed-by: Boris Ostrovsky
    Cc: Alok Kataria
    Cc: Chris Wright
    Cc: Jeremy Fitzhardinge
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rusty Russell
    Cc: Thomas Gleixner
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1484252878-1962-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     

12 Jan, 2017

6 commits

  • First of all, the %*ph specifier allows dumping data in hex format
    using a pointer to a buffer, which is suitable to use here.

    Besides that, Thomas suggested moving the message to critical level and
    replacing __FILE__ with an explicit mention of "jumplabel".
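
    For reference, %*ph takes an int length and a buffer pointer and prints
    up to 64 bytes as hex. An illustrative (not verbatim) call site:

    unsigned char insn[JUMP_LABEL_NOP_SIZE];    /* bytes read from the site */

    pr_crit("jump label: unexpected op at %p [%*ph]\n",
            addr, (int)sizeof(insn), insn);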

    Signed-off-by: Andy Shevchenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170110164354.47372-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Ingo Molnar

    Andy Shevchenko
     
  • If the prev node is not in a running state or its vCPU is preempted,
    we can give up our vCPU time slices in pv_wait_node() ASAP.
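
    A hedged paraphrase of the check (the helper below is made up for
    illustration; struct pv_node, vcpu_running and vcpu_is_preempted() are
    the real kernel symbols):

    /* Spinning on a predecessor that cannot run only burns our slice. */
    static bool pv_prev_can_run(struct pv_node *prev)
    {
            return READ_ONCE(prev->state) == vcpu_running &&
                   !vcpu_is_preempted(prev->cpu);
    }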

    Signed-off-by: Pan Xinhui
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: longman@redhat.com
    Link: http://lkml.kernel.org/r/1484035006-6787-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
    [ Fixed typos in the changelog, removed ugly linebreak from the code. ]
    Signed-off-by: Ingo Molnar

    Pan Xinhui
     
  • The spin_lock_bh_nested() API is defined but is not used anywhere
    in the kernel. So all spin_lock_bh_nested() and related APIs are
    now removed.

    Signed-off-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1483975612-16447-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Merge fixes from Andrew Morton:
    "27 fixes.

    There are three patches that aren't actually fixes. They're simple
    function renamings which are nice-to-have in mainline as ongoing net
    development depends on them."

    * akpm: (27 commits)
    timerfd: export defines to userspace
    mm/hugetlb.c: fix reservation race when freeing surplus pages
    mm/slab.c: fix SLAB freelist randomization duplicate entries
    zram: support BDI_CAP_STABLE_WRITES
    zram: revalidate disk under init_lock
    mm: support anonymous stable page
    mm: add documentation for page fragment APIs
    mm: rename __page_frag functions to __page_frag_cache, drop order from drain
    mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
    mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
    mm: don't dereference struct page fields of invalid pages
    mailmap: add codeaurora.org names for nameless email commits
    signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
    mm: pmd dirty emulation in page fault handler
    ipc/sem.c: fix incorrect sem_lock pairing
    lib/Kconfig.debug: fix frv build failure
    mm: get rid of __GFP_OTHER_NODE
    mm: fix remote numa hits statistics
    mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
    ocfs2: fix crash caused by stale lvb with fsdlm plugin
    ...

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Fix rtlwifi crash, from Larry Finger.

    2) Memory disclosure in appletalk ipddp routing code, from Vlad
    Tsyrklevich.

    3) r8152 can erroneously split an RX packet into multiple URBs if the
    Rx FIFO is not empty when we suspend. Fix this by waiting for the
    FIFO to empty before suspending. From Hayes Wang.

    4) Two GRO fixes (enter slow path when not enough SKB tail room exists,
    disable frag0 optimizations when there are IPV6 extension headers)
    from Eric Dumazet and Herbert Xu.

    5) A series of mlx5e bug fixes (do source UDP port offloading for
    tunnels properly, IP fragment matching fixes, handling firmware
    errors properly when installing TC rules, etc.) from Saeed Mahameed,
    Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel
    Jurgens.

    6) Two VRF fixes from David Ahern (don't skip multipath selection for
    VRF paths, disallow VRF to be configured with table ID 0).

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
    net: vrf: do not allow table id 0
    net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
    sctp: Fix spelling mistake: "Atempt" -> "Attempt"
    net: ipv4: Fix multipath selection with vrf
    cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
    gro: use min_t() in skb_gro_reset_offset()
    net/mlx5: Only cancel recovery work when cleaning up device
    net/mlx5e: Remove WARN_ONCE from adaptive moderation code
    net/mlx5e: Un-register uplink representor on nic_disable
    net/mlx5e: Properly handle FW errors while adding TC rules
    net/mlx5e: Fix kbuild warnings for uninitialized parameters
    net/mlx5e: Set inline mode requirements for matching on IP fragments
    net/mlx5e: Properly get address type of encapsulation IP headers
    net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
    net/mlx5e: Warn when rejecting offload attempts of IP tunnels
    net/mlx5e: Properly handle offloading of source udp port for IP tunnels
    gro: Disable frag0 optimization on IPv6 ext headers
    gro: Enter slow-path if there is no tailroom
    mlx4: Return EOPNOTSUPP instead of ENOTSUPP
    net/af_iucv: don't use paged skbs for TX on HiperSockets
    ...

    Linus Torvalds
     
  • Pull crypto fix from Herbert Xu:
    "This fixes a regression in aesni that renders it useless if it's
    built-in with a modular pcbc configuration"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: aesni - Fix failure when built-in with modular pcbc

    Linus Torvalds
     

11 Jan, 2017

6 commits

  • Frank reported that vrf devices can be created with a table id of 0.
    This breaks many of the run time table id checks and should not be
    allowed. Detect this condition at create time and fail with EINVAL.

    Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
    Reported-by: Frank Kellermann
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • When a Marvell 88E1512 PHY is connected to a NIC in SGMII mode, the
    fiber page is used for the SGMII host-side connection. The PHY driver
    notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page
    for the link status, and ends up reading the MAC-side status instead of
    the outgoing (copper) link. This leads to incorrect results reported
    via ethtool.

    If the PHY is connected via SGMII to the host, ignore the fiber page.
    However, continue to allow the existing power management code to
    suspend and resume the fiber page.

    Fixes: 6cfb3bcc0641 ("Marvell phy: check link status in case of fiber link.")
    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Trivial fix to spelling mistake in WARN_ONCE message

    Signed-off-by: Colin Ian King
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • fib_select_path does not call fib_select_multipath if oif is set in the
    flow struct. For VRF use cases oif is always set, so multipath route
    selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
    check similar to what is done in fib_table_lookup.

    Add saddr and proto to the flow struct for the fib lookup done by the
    VRF driver to better match hash computation for a flow.
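
    A hedged sketch of what the lookup key ends up carrying (the struct
    flowi4 field names and FLOWI_FLAG_SKIP_NH_OIF are real; vrf_dev, ip4h
    and the surrounding call site are paraphrased):

    struct flowi4 fl4 = {
            /* oif is always set for VRF ... */
            .flowi4_oif   = vrf_dev->ifindex,
            /* ... so ask fib_select_path() to skip the oif check: */
            .flowi4_flags = FLOWI_FLAG_SKIP_NH_OIF,
            /* feed more of the flow into the multipath hash: */
            .flowi4_proto = ip4h->protocol,
            .daddr        = ip4h->daddr,
            .saddr        = ip4h->saddr,
    };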

    Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
    not right when CONFIG_NET is disabled and there is no socket interface:

    warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)

    I don't know what the correct solution for this is, but simply removing
    the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
    'if NET' section avoids the warning and does not produce other build
    errors.

    Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
    so we shall use min_t() instead of min() to avoid a compiler error.
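
    For reference, min() in the kernel is type-checked and breaks the build
    when its two arguments have different types; min_t() casts both sides
    to the named type (values below are illustrative):

    unsigned int a = 128;
    long b = 96;

    unsigned int lo = min_t(unsigned int, a, b);  /* min(a, b) would not build */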

    Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom")
    Reported-by: kernel test robot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet