04 Sep, 2013

1 commit

  • Pull core/locking changes from Ingo Molnar:
    "Main changes:

    - another mutex optimization, from Davidlohr Bueso

    - improved lglock lockdep tracking, from Michel Lespinasse

    - [ assorted smaller updates, improvements, cleanups. ]"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    generic-ipi/locking: Fix misleading smp_call_function_any() description
    hung_task debugging: Print more info when reporting the problem
    mutex: Avoid label warning when !CONFIG_MUTEX_SPIN_ON_OWNER
    mutex: Do not unnecessarily deal with waiters
    mutex: Fix/document access-once assumption in mutex_can_spin_on_owner()
    lglock: Update lockdep annotations to report recursive local locks
    lockdep: Introduce lock_acquire_exclusive()/shared() helper macros

    Linus Torvalds
     

31 Jul, 2013

1 commit

  • The check needs to be for > 1, because ctx->acquired is already incremented.
    This will prevent ww_mutex_lock_slow from returning -EDEADLK and not locking
    the mutex. It caused a lot of false gpu lockups on radeon with
    CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y because a function that shouldn't be able
    to return -EDEADLK did.
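
    For illustration, a sketch of the corrected condition as it sits right
    after the lock has been acquired in the debug path (function and field
    names follow the mainline CONFIG_DEBUG_WW_MUTEX_SLOWPATH code; details
    may differ):

    /*
     * ctx->acquired was already incremented for the lock we just took,
     * so only inject -EDEADLK when at least one *other* lock is held;
     * the old "> 0" check wrongly fired for ww_mutex_lock_slow().
     */
    if (!ret && ctx->acquired > 1)
            return ww_mutex_deadlock_injection(lock, ctx);

    return ret;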

    Signed-off-by: Maarten Lankhorst
    Signed-off-by: Peter Zijlstra
    Cc: Alex Deucher
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/51F775B5.201@canonical.com
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     

26 Jul, 2013

1 commit


23 Jul, 2013

1 commit

  • Upon entering the slowpath, we immediately attempt to acquire
    the lock by checking if it is already unlocked. If we are lucky
    enough that this is the case, then we don't need to deal with
    any waiter related logic.

    Furthermore any checks for an empty wait_list are unnecessary as
    we already know that count is non-negative and hence no one is
    waiting for the lock.

    Move the count check and xchg calls to be done before any
    waiters are set up - including waiter debugging. Upon failure to
    acquire the lock, the xchg sets the counter to 0, instead of -1
    as it was originally. This can be done here since we set it back
    to -1 right at the beginning of the loop so other waiters are
    woken up when the lock is released.
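
    Roughly, the resulting top of the slowpath looks like this (a simplified
    sketch, not the exact diff; the "skip_wait" label is assumed to sit past
    all waiter setup):

    /*
     * Once more, can we acquire the lock cheaply, before doing any
     * waiter setup or debug bookkeeping?  On failure the xchg leaves
     * the count at 0; the wait loop below resets it to -1 so other
     * waiters still get woken on unlock.
     */
    if (atomic_read(&lock->count) >= 0 &&
        atomic_xchg(&lock->count, 0) == 1)
            goto skip_wait;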

    When tested on an 8-socket (80-core) system against a vanilla
    3.10-rc1 kernel, this patch provides some small performance
    benefits (+2-6%). While these could be considered within the
    noise level, the average percentages were stable across multiple
    runs and no performance regressions were seen. Two big winners,
    for small numbers of users (10-100), were the short and compute
    workloads, which saw +19.36% and +15.76% gains in jobs per minute.

    Also change some break statements to 'goto slowpath', which IMO
    makes the code a little more intuitive to read.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Rik van Riel
    Acked-by: Maarten Lankhorst
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1372450398.2106.1.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

22 Jul, 2013

1 commit

  • mutex_can_spin_on_owner() is technically broken in that it would
    in theory allow the compiler to load lock->owner twice, seeing a
    pointer the first time and a NULL pointer the second time.

    Linus pointed out that a compiler has to be seriously broken to
    not compile this correctly - but nevertheless this change
    is correct as it will better document the implementation.
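
    The documented pattern amounts to loading the owner pointer exactly once
    (a sketch, close to the resulting helper):

    static inline int mutex_can_spin_on_owner(struct mutex *lock)
    {
            struct task_struct *owner;
            int retval = 1;

            rcu_read_lock();
            owner = ACCESS_ONCE(lock->owner);   /* one load, used for both tests */
            if (owner)
                    retval = owner->on_cpu;     /* only spin if the owner is running */
            rcu_read_unlock();

            return retval;
    }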

    Signed-off-by: Peter Zijlstra
    Acked-by: Davidlohr Bueso
    Acked-by: Waiman Long
    Acked-by: Linus Torvalds
    Acked-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Cc: Paul E. McKenney
    Cc: David Howells
    Link: http://lkml.kernel.org/r/20130719183101.GA20909@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Jul, 2013

1 commit

  • Move the definitions for wound/wait mutexes out to a separate
    header, ww_mutex.h. This reduces clutter in mutex.h, and
    increases readability.

    Suggested-by: Linus Torvalds
    Signed-off-by: Maarten Lankhorst
    Acked-by: Peter Zijlstra
    Acked-by: Rik van Riel
    Acked-by: Maarten Lankhorst
    Cc: Dave Airlie
    Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com
    [ Tidied up the code a bit. ]
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     

26 Jun, 2013

3 commits

  • Injects EDEADLK conditions at pseudo-random intervals, with
    exponential backoff up to UINT_MAX (to ensure that every lock
    operation still completes in a reasonable time).

    This way we can test the wound slowpath even for ww mutex users
    where contention is never expected, and the ww deadlock
    avoidance algorithm is only needed for correctness against
    malicious userspace. An example would be protecting kernel
    modesetting properties, which thanks to single-threaded X isn't
    really expected to contend, ever.

    I've looked into using the CONFIG_FAULT_INJECTION
    infrastructure, but decided against it for two reasons:

    - EDEADLK handling is mandatory for ww mutex users and should
    never affect the outcome of a syscall. This is in contrast to -ENOMEM
    injection. So fine-grained configurability isn't required.

    - The fault injection framework only allows setting a simple
    probability of failure. Now the probability that a ww mutex acquire
    stage with N locks will never complete (due to too many injected
    EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
    operations for the completely uncontended case would be O(exp(N)).
    The per-acquire-ctx exponential backoff solution chosen here only
    results in O(log N) overhead due to injection and so O(log N * N)
    lock operations. This way we can fail with high probability (and so
    have good test coverage even for fancy backoff and lock acquisition
    paths) without running into pathological cases.

    Note that EDEADLK will only ever be injected when we managed to
    acquire the lock. This prevents any behaviour changes for users
    which rely on the EALREADY semantics.
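
    The injection boils down to roughly the following, keyed off two counters
    in the acquire context (a sketch of the CONFIG_DEBUG_WW_MUTEX_SLOWPATH
    helper; details may differ):

    if (ctx->deadlock_inject_countdown-- == 0) {
            unsigned tmp = ctx->deadlock_inject_interval;

            /* exponential backoff, capped so that every lock operation
             * still completes in a reasonable time */
            if (tmp > UINT_MAX/4)
                    tmp = UINT_MAX;
            else
                    tmp = tmp*2 + tmp + tmp/2;

            ctx->deadlock_inject_interval  = tmp;
            ctx->deadlock_inject_countdown = tmp;
            ctx->contending_lock = lock;

            /* we only get here after the lock was actually acquired,
             * so EALREADY semantics are unaffected */
            ww_mutex_unlock(lock);
            return -EDEADLK;
    }

    return 0;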

    Signed-off-by: Daniel Vetter
    Signed-off-by: Maarten Lankhorst
    Acked-by: Peter Zijlstra
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
    Signed-off-by: Ingo Molnar

    Daniel Vetter
     
  • Wound/wait mutexes are used when multiple lock acquisitions of a
    similar type can be done in an arbitrary order. The deadlock
    handling used here is called wait/wound in the RDBMS literature:
    the older task waits until it can acquire the contended lock. The
    younger task needs to back off and drop all the locks it is
    currently holding, i.e. the younger task is wounded.

    For full documentation please read Documentation/ww-mutex-design.txt.
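
    A minimal usage sketch of the API (illustrative only: my_ww_class and the
    two-lock helper are made-up names, and error paths other than -EDEADLK
    are trimmed):

    static DEFINE_WW_CLASS(my_ww_class);

    static int lock_two(struct ww_mutex *a, struct ww_mutex *b,
                        struct ww_acquire_ctx *ctx)
    {
            int ret;

            ww_acquire_init(ctx, &my_ww_class);

            /* first lock of the context: nothing held yet, no back-off */
            ret = ww_mutex_lock(a, ctx);

            while (!ret && (ret = ww_mutex_lock(b, ctx)) == -EDEADLK) {
                    /* we are the younger (wounded) transaction: drop what
                     * we hold, sleep until the contended lock is ours,
                     * then go back and retake the other one */
                    ww_mutex_unlock(a);
                    ww_mutex_lock_slow(b, ctx);
                    swap(a, b);
                    ret = 0;
            }

            if (!ret)
                    ww_acquire_done(ctx);
            return ret;
    }

    Both locks are later dropped with ww_mutex_unlock() and the context is
    torn down with ww_acquire_fini().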

    References: https://lwn.net/Articles/548909/
    Signed-off-by: Maarten Lankhorst
    Acked-by: Daniel Vetter
    Acked-by: Rob Clark
    Acked-by: Peter Zijlstra
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     
  • This will allow me to call functions that have multiple
    arguments if the fastpath fails. This is required to support
    ticket mutexes, because they need to be able to pass an extra
    argument to the fail function.

    Originally I duplicated the functions by adding
    __mutex_fastpath_lock_retval_arg. This ended up being just a
    duplication of the existing function, so a way to test if
    fastpath was called ended up being better.

    This also cleaned up the reservation mutex patch somewhat by
    making it possible to call an atomic_set instead of atomic_xchg,
    and making it easier to detect if the wrong unlock function was
    previously used.
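
    The resulting calling pattern looks roughly like this (a sketch along
    the lines of the ww_mutex lock wrapper this enables; exact function
    names differ between the mutex flavours):

    int __sched ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
    {
            int ret;

            might_sleep();

            /* the fastpath now only reports success or failure ... */
            ret = __mutex_fastpath_lock_retval(&lock->base.count);
            if (likely(!ret)) {
                    ww_mutex_set_context_fastpath(lock, ctx);
                    return 0;
            }

            /* ... so the caller can pick a slowpath that takes the
             * extra ctx argument */
            return __ww_mutex_lock_slowpath(lock, ctx);
    }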

    Signed-off-by: Maarten Lankhorst
    Acked-by: Peter Zijlstra
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: robclark@gmail.com
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     

19 Apr, 2013

4 commits

  • Linus suggested that probably all the supported architectures can
    allow a negative mutex count without incorrect behavior, so we can
    then back out the architecture specific change and allow the
    mutex count to go to any negative number. That should further
    reduce contention for non-x86 architectures.

    Suggested-by: Linus Torvalds
    Signed-off-by: Waiman Long
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Chandramouleeswaran Aswin
    Cc: Davidlohr Bueso
    Cc: Norton Scott J
    Cc: Rik van Riel
    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • The current mutex spinning code (with the MUTEX_SPIN_ON_OWNER
    option turned on) allows multiple tasks to spin on a single mutex
    concurrently. A potential problem with the current approach is
    that when the mutex becomes available, all the spinning tasks
    will try to acquire the mutex more or less simultaneously. As a
    result, there will be a lot of cacheline bouncing especially on
    systems with a large number of CPUs.

    This patch tries to reduce this kind of contention by putting
    the mutex spinners into a queue so that only the first one in
    the queue will try to acquire the mutex. This will reduce
    contention and allow all the tasks to move forward faster.

    The queuing of mutex spinners is done using an MCS lock based
    implementation, which further reduces contention on the mutex
    cacheline compared to a similar ticket spinlock based
    implementation. This patch adds a new field to the mutex data
    structure for holding the MCS lock. This expands the mutex size
    by 8 bytes for 64-bit systems and 4 bytes for 32-bit systems.
    This overhead is avoided if the MUTEX_SPIN_ON_OWNER option is
    turned off.
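
    A simplified sketch of the MCS-style queue used for the spinners (close
    to the patch, with some details trimmed):

    struct mspin_node {
            struct mspin_node *next;
            int                locked;      /* 1 if lock acquired */
    };

    static void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
    {
            struct mspin_node *prev;

            node->locked = 0;
            node->next   = NULL;

            prev = xchg(lock, node);        /* queue ourselves at the tail */
            if (prev == NULL)
                    return;                 /* queue was empty: we own the lock */

            ACCESS_ONCE(prev->next) = node;
            smp_wmb();
            while (!ACCESS_ONCE(node->locked))  /* spin on our own cacheline only */
                    arch_mutex_cpu_relax();
    }

    static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
    {
            struct mspin_node *next = ACCESS_ONCE(node->next);

            if (!next) {
                    /* no successor queued: try to release the lock outright */
                    if (cmpxchg(lock, node, NULL) == node)
                            return;
                    /* a successor is in the middle of queueing itself */
                    while (!(next = ACCESS_ONCE(node->next)))
                            arch_mutex_cpu_relax();
            }
            ACCESS_ONCE(next->locked) = 1;      /* hand the lock to the next spinner */
            smp_wmb();
    }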

    The following table shows the jobs per minute (JPM) scalability
    data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
    numactl command is used to restrict the running of the fserver
    workloads to 1/2/4/8 nodes with hyperthreading off.

    +-----------------+-----------+-----------+-------------+----------+
    |  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
    |                 | w/o patch |  patch 1  | patches 1&2 |  1->1&2  |
    +-----------------+------------------------------------------------+
    |                 |             User Range 1100 - 2000              |
    +-----------------+------------------------------------------------+
    | 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
    | 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
    | 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
    | 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
    +-----------------+------------------------------------------------+
    |                 |              User Range 200 - 1000              |
    +-----------------+------------------------------------------------+
    | 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
    | 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
    | 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
    | 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
    +-----------------+-----------+-----------+-------------+----------+

    At the low user range of 10-100, the JPM differences were within
    +/-1%, so they are not that interesting.

    The fserver workload uses mutex spinning extensively. With just
    the mutex change in the first patch, there is no noticeable
    change in performance. Rather, there is a slight drop in
    performance. This mutex spinning patch more than recovers the
    lost performance and shows a significant increase of +30% at
    high user load with the full 8 nodes. Similar improvements were
    also seen in a 3.8 kernel.

    The table below shows the %time spent by different kernel
    functions as reported by perf when running the fserver workload
    at 1500 users with all 8 nodes.

    +-----------------------+-----------+---------+-------------+
    | Function              |  % time   | % time  |   % time    |
    |                       | w/o patch | patch 1 | patches 1&2 |
    +-----------------------+-----------+---------+-------------+
    | __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
    | __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
    | mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
    | mspin_lock            |    N/A    |   N/A   |    9.90%    |
    | __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
    | _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
    +-----------------------+-----------+---------+-------------+

    The fserver workload for an 8-node system is dominated by the
    contention in the read/write lock. Mutex contention also plays a
    role. With the first patch only, mutex contention is down (as
    shown by the __mutex_lock_slowpath figure), which helps a little
    bit. We saw only a few percent improvement with that.

    By applying patch 2 as well, the single mutex_spin_on_owner
    figure is now split out into an additional mspin_lock figure.
    The time increases from 3.42% to 11.23%. It shows a great
    reduction in contention among the spinners leading to a 30%
    improvement. The time ratio 9.9/2.33=4.3 indicates that there
    are on average 4+ spinners waiting in the spin_lock loop for
    each spinner in the mutex_spin_on_owner loop. Contention in
    other locking functions also goes down by quite a lot.

    The table below shows the performance change of both patches 1 &
    2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
    hyperthreading off).

    +--------------+---------------+----------------+-----------------+
    | Workload     | mean % change | mean % change  |  mean % change  |
    |              | 10-100 users  | 200-1000 users | 1100-2000 users |
    +--------------+---------------+----------------+-----------------+
    | alltests     |       0.0%    |      -0.8%     |      +0.6%      |
    | five_sec     |      -0.3%    |      +0.8%     |      +0.8%      |
    | high_systime |      +0.4%    |      +2.4%     |      +2.1%      |
    | new_fserver  |      +0.1%    |     +14.1%     |     +34.2%      |
    | shared       |      -0.5%    |      -0.3%     |      -0.4%      |
    | short        |      -1.7%    |      -9.8%     |      -8.3%      |
    +--------------+---------------+----------------+-----------------+

    The short workload is the only one that shows a decline in
    performance probably due to the spinner locking and queuing
    overhead.

    Signed-off-by: Waiman Long
    Reviewed-by: Davidlohr Bueso
    Acked-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Chandramouleeswaran Aswin
    Cc: Norton Scott J
    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • In the __mutex_lock_common() function, an initial entry into
    the lock slow path will cause two atomic_xchg instructions to be
    issued. Together with the atomic decrement in the fast path, a
    total of three atomic read-modify-write instructions will be
    issued in rapid succession. This can cause a lot of cache
    bouncing when many tasks are trying to acquire the mutex at the
    same time.

    This patch will reduce the number of atomic_xchg instructions
    used by checking the counter value first before issuing the
    instruction. The atomic_read() function is just a simple memory
    read. The atomic_xchg() function, on the other hand, can be up
    to two orders of magnitude or even more in cost when compared with
    atomic_read(). By using atomic_read() to check the value first
    before calling atomic_xchg(), we can avoid a lot of unnecessary
    cache coherency traffic. The only downside with this change is
    that a task on the slow path will have a tiny bit less chance of
    getting the mutex when competing with another task in the fast
    path.

    The same is true for the atomic_cmpxchg() function in the
    mutex-spin-on-owner loop. So an atomic_read() is also performed
    before calling atomic_cmpxchg().
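
    In code, the idea is simply to guard the expensive read-modify-write
    with a plain read (a hypothetical helper, shown with the x86-style check
    that tolerates any negative count):

    static inline bool mutex_try_grab(struct mutex *lock)
    {
            /* cheap shared read first: skip the RMW when it cannot succeed */
            if (atomic_read(&lock->count) < 0)
                    return false;

            /* costly exclusive read-modify-write, now with a good chance */
            return atomic_xchg(&lock->count, -1) == 1;
    }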

    The mutex locking and unlocking code for the x86 architecture
    can allow any negative number to be used in the mutex count to
    indicate that some tasks are waiting for the mutex. I am not so
    sure if that is the case for the other architectures. So the
    default is to avoid atomic_xchg() if the count has already been
    set to -1. For x86, the check is modified to include all
    negative numbers to cover a larger set of cases.

    The following table shows the jobs per minute (JPM) scalability
    data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
    numactl command is used to restrict the running of the
    high_systime workloads to 1/2/4/8 nodes with hyperthreading on
    and off.

    +-----------------+-----------+------------+----------+
    |  Configuration  | Mean JPM  |  Mean JPM  | % Change |
    |                 | w/o patch | with patch |          |
    +-----------------+-----------------------------------+
    |                 |      User Range 1100 - 2000       |
    +-----------------+-----------------------------------+
    | 8 nodes, HT on  |   36980   |   148590   | +301.8%  |
    | 8 nodes, HT off |   42799   |   145011   | +238.8%  |
    | 4 nodes, HT on  |   61318   |   118445   |  +51.1%  |
    | 4 nodes, HT off |  158481   |   158592   |   +0.1%  |
    | 2 nodes, HT on  |  180602   |   173967   |   -3.7%  |
    | 2 nodes, HT off |  198409   |   198073   |   -0.2%  |
    | 1 node , HT on  |  149042   |   147671   |   -0.9%  |
    | 1 node , HT off |  126036   |   126533   |   +0.4%  |
    +-----------------+-----------------------------------+
    |                 |       User Range 200 - 1000       |
    +-----------------+-----------------------------------+
    | 8 nodes, HT on  |   41525   |   122349   | +194.6%  |
    | 8 nodes, HT off |   49866   |   124032   | +148.7%  |
    | 4 nodes, HT on  |   66409   |   106984   |  +61.1%  |
    | 4 nodes, HT off |  119880   |   130508   |   +8.9%  |
    | 2 nodes, HT on  |  138003   |   133948   |   -2.9%  |
    | 2 nodes, HT off |  132792   |   131997   |   -0.6%  |
    | 1 node , HT on  |  116593   |   115859   |   -0.6%  |
    | 1 node , HT off |  104499   |   104597   |   +0.1%  |
    +-----------------+-----------+------------+----------+

    At the low user range of 10-100, the JPM differences were within
    +/-1%, so they are not that interesting.

    An AIM7 benchmark run has a pretty large run-to-run variance due
    to the random nature of the subtests executed, so a difference of
    less than +/-5% may not be really significant.

    This patch improves high_systime workload performance at 4 nodes
    and up by maintaining transaction rates without significant
    drop-off at high node count. The patch has practically no
    impact on 1- and 2-node systems.

    The table below shows the percentage time (as reported by perf
    record -a -s -g) spent on the __mutex_lock_slowpath() function
    by the high_systime workload at 1500 users for 2/4/8-node
    configurations with hyperthreading off.

    +---------------+-----------------+------------------+---------+
    | Configuration | %Time w/o patch | %Time with patch | %Change |
    +---------------+-----------------+------------------+---------+
    | 8 nodes       |     65.34%      |      0.69%       |  -99%   |
    | 4 nodes       |      8.70%      |      1.02%       |  -88%   |
    | 2 nodes       |      0.41%      |      0.32%       |  -22%   |
    +---------------+-----------------+------------------+---------+

    It is obvious that the dramatic performance improvement at 8
    nodes was due to the drastic cut in the time spent within the
    __mutex_lock_slowpath() function.

    The table below shows the improvements in other AIM7 workloads
    (at 8 nodes, hyperthreading off).

    +--------------+---------------+----------------+-----------------+
    | Workload     | mean % change | mean % change  |  mean % change  |
    |              | 10-100 users  | 200-1000 users | 1100-2000 users |
    +--------------+---------------+----------------+-----------------+
    | alltests     |      +0.6%    |    +104.2%     |    +185.9%      |
    | five_sec     |      +1.9%    |      +0.9%     |      +0.9%      |
    | fserver      |      +1.4%    |      -7.7%     |      +5.1%      |
    | new_fserver  |      -0.5%    |      +3.2%     |      +3.1%      |
    | shared       |     +13.1%    |    +146.1%     |    +181.5%      |
    | short        |      +7.4%    |      +5.0%     |      +4.2%      |
    +--------------+---------------+----------------+-----------------+

    Signed-off-by: Waiman Long
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Chandramouleeswaran Aswin
    Cc: Norton Scott J
    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler
    feature bit was really just an early hack to make with/without
    mutex-spinning testable. So it is no longer necessary.

    This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and
    moves the mutex spinning code from kernel/sched/core.c back to
    kernel/mutex.c, which is where it belongs.

    Signed-off-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Chandramouleeswaran Aswin
    Cc: Davidlohr Bueso
    Cc: Norton Scott J
    Cc: Rik van Riel
    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     

08 Feb, 2013

1 commit


01 Mar, 2012

1 commit


31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

25 May, 2011

1 commit

  • In order to convert i_mmap_lock to a mutex we need a mutex equivalent to
    spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation.

    As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of
    the locking pattern where an outer lock serializes the acquisition order
    of nested locks. That is, if every time you lock multiple locks A, say A1
    and A2, you first acquire N, the order of acquiring A1 and A2 is
    irrelevant.
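
    A usage sketch of the annotation (the mapping/vma names below are only
    illustrative):

    mutex_lock(&mapping->outer_mutex);      /* "N": serializes who may take
                                             * the nested locks below */

    mutex_lock_nest_lock(&vma1->mutex, &mapping->outer_mutex);   /* "A1" */
    mutex_lock_nest_lock(&vma2->mutex, &mapping->outer_mutex);   /* "A2" */

    With the nest_lock annotation, lockdep knows the A1/A2 ordering cannot
    produce a deadlock and will not report it as one.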

    Signed-off-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

24 Apr, 2011

1 commit

  • Neil Brown pointed out that lock_depth somehow escaped the BKL
    removal work. Let's get rid of it now.

    Note that the perf scripting utilities still have a bunch of
    code for dealing with common_lock_depth in tracepoints; I have
    left that in place in case anybody wants to use that code with
    older kernels.

    Suggested-by: Neil Brown
    Signed-off-by: Jonathan Corbet
    Cc: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110422111910.456c0e84@bike.lwn.net
    Signed-off-by: Ingo Molnar

    Jonathan Corbet
     

14 Apr, 2011

1 commit

  • Since we now have p->on_cpu unconditionally available, use it to
    re-implement mutex_spin_on_owner.

    Requested-by: Thomas Gleixner
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl

    Peter Zijlstra
     

31 Mar, 2011

1 commit


26 Nov, 2010

1 commit

  • The spinning mutex implementation uses cpu_relax() in busy loops as a
    compiler barrier. Depending on the architecture, cpu_relax() may do more
    than needed in these specific mutex spin loops. On System z we also give
    up the time slice of the virtual cpu in cpu_relax(), which prevents
    effective spinning on the mutex.

    This patch replaces cpu_relax() in the spinning mutex code with
    arch_mutex_cpu_relax(), which can be defined by each architecture that
    selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
    this patch should not affect other architectures than System z for now.
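
    The hook itself is a one-line override with a generic fallback, roughly:

    /* include/linux/mutex.h: default for everyone else */
    #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
    #define arch_mutex_cpu_relax()  cpu_relax()
    #endif

    /* an architecture selecting HAVE_ARCH_MUTEX_CPU_RELAX can then supply
     * a cheaper definition, e.g. a plain compiler barrier on System z */
    #define arch_mutex_cpu_relax()  barrier()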

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Gerald Schaefer
     

03 Sep, 2010

1 commit


19 May, 2010

1 commit

  • Currently, we can hit a nasty case with optimistic
    spinning on mutexes:

    CPU A tries to take a mutex, while holding the BKL

    CPU B tries to take the BKL while holding the mutex

    This looks like an AB-BA scenario but, in practice, is
    allowed and happens due to the auto-release-on-schedule()
    nature of the BKL.

    In that case, the optimistic spinning code can get us
    into a situation where instead of going to sleep, A
    will spin waiting for B who is spinning waiting for
    A, and the only way out of that loop is the
    need_resched() test in mutex_spin_on_owner().

    This patch fixes it by completely disabling spinning
    if we own the BKL. This adds one more detail to the
    extensive list of reasons why it's a bad idea for
    kernel code to be holding the BKL.
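
    The fix boils down to bailing out of the optimistic-spin path when the
    caller holds the BKL (a sketch; at the time lock_depth >= 0 meant "BKL
    held"):

    /*
     * If we own the BKL, the mutex owner may well be blocked on us
     * releasing it (the BKL auto-releases on schedule()), so spinning
     * could livelock: take the normal sleeping slowpath instead.
     */
    if (unlikely(current->lock_depth >= 0))
            goto slowpath;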

    Signed-off-by: Tony Breeds
    Acked-by: Linus Torvalds
    Acked-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc:
    LKML-Reference:
    [ added an unlikely() attribute to the branch ]
    Signed-off-by: Ingo Molnar

    Tony Breeds
     

03 Dec, 2009

1 commit


11 Jun, 2009

2 commits


11 May, 2009

1 commit


06 May, 2009

1 commit


30 Apr, 2009

1 commit

  • include/linux/mutex.h:136: warning: 'mutex_lock' declared inline after being called
    include/linux/mutex.h:136: warning: previous declaration of 'mutex_lock' was here

    uninline it.

    [ Impact: clean up and uninline, address compiler warning ]

    Signed-off-by: Andrew Morton
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Eric Paris
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrew Morton
     

29 Apr, 2009

1 commit


21 Apr, 2009

1 commit


10 Apr, 2009

1 commit

  • Impact: performance regression fix for s390

    The adaptive spinning mutexes will not always do what one would expect on
    virtualized architectures like s390. Especially the cpu_relax() loop in
    mutex_spin_on_owner might hurt if the mutex-holding cpu has been scheduled
    away by the hypervisor.

    We would end up in a cpu_relax() loop when there is no chance that the
    state of the mutex changes until the target cpu has been scheduled again by
    the hypervisor.

    For that reason we should change the default behaviour to no-spin on s390.

    We do have an instruction which allows us to yield the current cpu in
    favour of a different target cpu. Also we have an instruction which
    allows us to figure out if the target cpu is physically backed.

    However we need to do some performance tests until we can come up with
    a solution that will do the right thing on s390.

    Signed-off-by: Heiko Carstens
    Acked-by: Peter Zijlstra
    Cc: Martin Schwidefsky
    Cc: Christian Borntraeger
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

06 Apr, 2009

1 commit

  • Impact: build fix

    mutex_lock() was defined inline in kernel/mutex.c, but wasn't
    declared so in <linux/mutex.h>. This didn't cause a problem until
    checkin 3a2d367d9aabac486ac4444c6c7ec7a1dab16267 added the
    atomic_dec_and_mutex_lock() inline in between declaration and
    definition.

    This broke building with CONFIG_ALLOW_WARNINGS=n, e.g. make
    allnoconfig.

    Neither the source code nor the allnoconfig binary output shows any
    internal references to mutex_lock() in kernel/mutex.c, so
    presumably this "inline" is now-useless legacy.

    Cc: Eric Paris
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     

15 Jan, 2009

4 commits

  • Spin more aggressively. This is less fair but also markedly faster.

    The numbers:

    * dbench 50 (higher is better):
    spin 1282MB/s
    v10 548MB/s
    v10 no wait 1868MB/s

    * 4k creates (numbers in files/second, higher is better):
    spin avg 200.60 median 193.20 std 19.71 high 305.93 low 186.82
    v10 avg 180.94 median 175.28 std 13.91 high 229.31 low 168.73
    v10 no wait avg 232.18 median 222.38 std 22.91 high 314.66 low 209.12

    * File stats (numbers in seconds, lower is better):
    spin 2.27s
    v10 5.1s
    v10 no wait 1.6s

    ( The source changes are smaller than they look, I just moved the
    need_resched checks in __mutex_lock_common after the cmpxchg. )

    Signed-off-by: Chris Mason
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Chris Mason
     
  • Change mutex contention behaviour such that it will sometimes busy wait on
    acquisition - moving its behaviour closer to that of spinlocks.

    This concept got ported to mainline from the -rt tree, where it was originally
    implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.

    Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
    gave a 345% boost for VFS scalability on my testbox:

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 296604

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 85870

    The key criterion for the busy wait is that the lock owner has to be running on
    a (different) cpu. The idea is that as long as the owner is running, there is a
    fair chance it'll release the lock soon, and thus we'll be better off spinning
    instead of blocking/scheduling.

    Since regular mutexes (as opposed to rtmutexes) do not atomically track the
    owner, we add the owner in a non-atomic fashion and deal with the races in
    the slowpath.
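
    Conceptually, the spin loop added to the slowpath looks like this
    (a simplified sketch; the real code also handles owner changes, RT tasks
    and lockdep bookkeeping):

    for (;;) {
            struct task_struct *owner;

            /* while the owner is running on another cpu, it is likely to
             * release the lock soon, so keep spinning on it */
            owner = ACCESS_ONCE(lock->owner);
            if (owner && !mutex_spin_on_owner(lock, owner))
                    break;                  /* owner stopped running: block */

            if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
                    mutex_set_owner(lock);
                    preempt_enable();
                    return 0;               /* acquired without sleeping */
            }

            if (need_resched())
                    break;                  /* give the cpu to someone else */

            cpu_relax();
    }
    /* fall through to the normal blocking slowpath */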

    Furthermore, to ease the testing of the performance impact of this new
    code, there is a means to disable this behaviour at runtime (without
    having to reboot the system), when scheduler debugging is enabled
    (CONFIG_SCHED_DEBUG=y), by issuing the following command:

    # echo NO_OWNER_SPIN > /debug/sched_features

    This command re-enables spinning again (this is also the default):

    # echo OWNER_SPIN > /debug/sched_features

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The problem is that dropping the spinlock right before schedule is a voluntary
    preemption point and can cause a schedule, right after which we schedule again.

    Fix this inefficiency by keeping preemption disabled until we schedule;
    do this by explicitly disabling preemption and providing a schedule()
    variant that assumes preemption is already disabled.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Remove a local variable by combining an assignment and test in one.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Nov, 2008

1 commit


20 Oct, 2008

1 commit


29 Jul, 2008

1 commit