22 Sep, 2016

2 commits

  • Provide a down_read()/up_read() variant that keeps preemption disabled
    over the whole thing, when possible.

    This avoids a needless preemption point for constructs such as:

    percpu_down_read(&global_rwsem);
    spin_lock(&lock);
    ...
    spin_unlock(&lock);
    percpu_up_read(&global_rwsem);

    Such a construct perturbs timings. In particular, it was found to cure
    a performance regression in a follow-up patch in fs/locks.c. (A usage
    sketch combining this and the following entry appears after this list.)

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide a static initializer and a standard locking assertion method.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: oleg@redhat.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: tj@kernel.org
    Cc: viro@ZenIV.linux.org.uk
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
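
    A minimal usage sketch tying the two changes in this list together,
    assuming the interfaces they describe (DEFINE_STATIC_PERCPU_RWSEM(),
    percpu_rwsem_assert_held(), and the preemption-disabled read-side
    variants); this is an illustration, not code from the patches themselves:

    #include <linux/percpu-rwsem.h>
    #include <linux/spinlock.h>

    /* Static initialization, no explicit percpu_init_rwsem() call needed. */
    static DEFINE_STATIC_PERCPU_RWSEM(global_rwsem);
    static DEFINE_SPINLOCK(lock);

    static void read_section(void)
    {
            /*
             * Keep preemption disabled across the whole read section so the
             * spin_lock()/spin_unlock() pair does not add a preemption point
             * between the two rwsem operations.
             */
            percpu_down_read_preempt_disable(&global_rwsem);
            spin_lock(&lock);
            percpu_rwsem_assert_held(&global_rwsem); /* standard locking assertion */
            /* ... */
            spin_unlock(&lock);
            percpu_up_read_preempt_enable(&global_rwsem);
    }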
     

10 Aug, 2016

1 commit

  • Currently the percpu-rwsem switches to (global) atomic ops while a
    writer is waiting, which can be quite a while and slows down
    releasing the readers.

    This patch cures this problem by ordering the reader-state vs
    reader-count (see the comments in __percpu_down_read() and
    percpu_down_write()). This changes a global atomic op into a full
    memory barrier, which doesn't have the global cacheline contention.

    This also enables using the percpu-rwsem with rcu_sync disabled in order
    to bias the implementation differently, reducing the writer latency by
    adding some cost to readers. (A simplified model of the reader/writer
    ordering appears after this entry.)

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    [ Fixed modular build. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
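
    A deliberately simplified model of that ordering (illustrative only:
    read_count and readers_block are stand-ins for whatever the real
    implementation uses, the slow path and the writer's wait are omitted,
    and the real fast path gets its ordering from RCU-sched rather than an
    unconditional smp_mb()):

    #include <linux/percpu.h>
    #include <linux/preempt.h>
    #include <linux/compiler.h>
    #include <asm/barrier.h>

    static DEFINE_PER_CPU(unsigned int, read_count);
    static int readers_block;

    /* Reader: publish the per-CPU count, then check for a pending writer. */
    static bool reader_try_fast_path(void)
    {
            bool ok;

            preempt_disable();
            __this_cpu_inc(read_count);
            /*
             * Full barrier instead of a global atomic op; pairs with the
             * writer's barrier below. Either the writer sees our increment,
             * or we see readers_block and back out to the slow path.
             */
            smp_mb();
            ok = !READ_ONCE(readers_block);
            if (!ok)
                    __this_cpu_dec(read_count); /* back out; take the rwsem instead */
            preempt_enable();
            return ok;
    }

    /* Writer: block the fast path, then wait for the per-CPU counts to drain. */
    static void writer_block_readers(void)
    {
            WRITE_ONCE(readers_block, 1);
            smp_mb(); /* pairs with the reader-side barrier above */
            /* ... now sum read_count over all CPUs and wait for it to reach zero ... */
    }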
     

07 Oct, 2015

1 commit

  • Currently down_write/up_write call synchronize_sched_expedited()
    twice, which is evil. Change this code to rely on the rcu_sync primitives.
    This avoids the _expedited "big hammer", and it can be faster in
    the contended case, or even when a single thread does
    down_write/up_write in a loop.

    Of course, a single down_write() will take more time, but on the other
    hand it will be much more friendly to the whole system.

    To simplify the review, this patch doesn't update the comments; they are
    fixed by the next change. (A minimal rcu_sync usage sketch appears after
    this entry.)

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Oleg Nesterov
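
    A sketch of the resulting shape, using the rcu_sync API (rcu_sync_enter(),
    rcu_sync_exit(), rcu_sync_is_idle()); the struct layout and field names
    here are illustrative, and initialization plus the reader slow path are
    omitted:

    #include <linux/rcu_sync.h>
    #include <linux/rcupdate.h>
    #include <linux/rwsem.h>
    #include <linux/percpu.h>

    struct pcpu_rwsem_sketch {
            struct rcu_sync         rss;            /* tracks "a writer is pending/active" */
            unsigned int __percpu   *fast_read_ctr;
            struct rw_semaphore     rw_sem;
    };

    static void sketch_down_read(struct pcpu_rwsem_sketch *brw)
    {
            rcu_read_lock_sched();
            if (likely(rcu_sync_is_idle(&brw->rss)))
                    __this_cpu_inc(*brw->fast_read_ctr); /* no writer: stay lock-free */
            /* else: fall back to down_read(&brw->rw_sem) (not shown) */
            rcu_read_unlock_sched();
    }

    static void sketch_down_write(struct pcpu_rwsem_sketch *brw)
    {
            /*
             * rcu_sync_enter() only waits for a grace period when the state
             * actually changes, so writers no longer pay for
             * synchronize_sched_expedited() twice per down_write()/up_write().
             */
            rcu_sync_enter(&brw->rss);
            down_write(&brw->rw_sem);
            /* ... wait for the remaining fast-path readers to drain (not shown) ... */
    }

    static void sketch_up_write(struct pcpu_rwsem_sketch *brw)
    {
            up_write(&brw->rw_sem);
            rcu_sync_exit(&brw->rss); /* lazily re-enables the reader fast path */
    }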
     

15 Aug, 2015

2 commits


18 Dec, 2012

3 commits

  • Add lockdep annotations. Not only can this help to find potential
    problems; we also do not want false warnings if, say, a task takes two
    different percpu_rw_semaphores for reading. In other words, at least
    ->rw_sem should not use a single lock class.

    This patch exposes this internal lock to lockdep so that it represents the
    whole percpu_rw_semaphore. This way we do not need to add another "fake"
    ->lockdep_map and lock_class_key. More importantly, this also makes the
    output from lockdep much more understandable if it finds a problem.

    In short, with this patch, from lockdep's point of view percpu_down_read()
    and percpu_up_read() acquire/release ->rw_sem for reading, which matches
    the actual semantics. This abuses __up_read(), but I hope this is fine;
    in fact I'd like to have down_read_no_lockdep() as well, since
    percpu_down_read_recursive_readers() will need it.

    Signed-off-by: Oleg Nesterov
    Cc: Anton Arapov
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Mikulas Patocka
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • percpu_rw_semaphore->writer_mutex was only added to simplify the initial
    rewrite; the only thing it protects is clear_fast_ctr(), which otherwise
    could be called by multiple writers. ->rw_sem is enough to serialize the
    writers.

    Kill this mutex and add "atomic_t write_ctr" instead. The writers
    increment/decrement this counter; the readers check that it is zero
    instead of calling mutex_is_locked().

    Move atomic_add(clear_fast_ctr(), slow_read_ctr) under down_write() to
    avoid the race with other writers. This is a bit sub-optimal: only the
    first writer needs this, and we do not need to exclude the readers at this
    stage. But this is simple, and we do not want another internal lock until
    we add more features.

    And this speeds up the write-contended case. Before this patch, racing
    writers sleep in synchronize_sched_expedited() sequentially; with this
    patch, multiple synchronize_sched_expedited() calls can "overlap" with
    each other. Note: we can do more optimizations; this is only the first
    step.

    Signed-off-by: Oleg Nesterov
    Cc: Anton Arapov
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Mikulas Patocka
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently the writer does msleep() plus synchronize_sched() 3 times to
    acquire/release the semaphore, and during this time the readers are
    blocked completely, even if the "write" section has not actually started
    or has already finished.

    With this patch, down_write/up_write does synchronize_sched() twice, and
    down_read/up_read are still possible during this time; they just use the
    slow path.

    percpu_down_write() first forces the readers to use the rw_semaphore and
    increment the "slow" counter to take the lock for reading; then it
    takes that rw_semaphore for writing and blocks the readers.

    Also, with this patch the code relies on the documented behaviour of
    synchronize_sched(); it doesn't try to pair synchronize_sched() with a
    barrier. (A sketch of the overall scheme described in this list appears
    after it.)

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Paul E. McKenney
    Cc: Linus Torvalds
    Cc: Mikulas Patocka
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Srikar Dronamraju
    Cc: Ananth N Mavinakayanahalli
    Cc: Anton Arapov
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
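
    A sketch of the scheme the three entries above describe: per-CPU fast
    counters gated by write_ctr, a slow counter under ->rw_sem, lockdep
    treating the whole thing as ->rw_sem held for reading, and two
    synchronize_sched() calls per write cycle. This paraphrases the commit
    text; the actual kernel source differs in detail (error handling, the
    up_read() path and the CONFIG_DEBUG_LOCK_ALLOC conditionals are omitted):

    #include <linux/percpu.h>
    #include <linux/rwsem.h>
    #include <linux/rcupdate.h>
    #include <linux/atomic.h>
    #include <linux/wait.h>
    #include <linux/lockdep.h>
    #include <linux/kernel.h>

    struct pcpu_rwsem_2012 {
            unsigned int __percpu   *fast_read_ctr;
            atomic_t                write_ctr;      /* "a writer is pending"; replaces writer_mutex */
            atomic_t                slow_read_ctr;
            struct rw_semaphore     rw_sem;
            wait_queue_head_t       write_waitq;
    };

    /* Fast path: only valid while no writer is pending. */
    static bool update_fast_ctr(struct pcpu_rwsem_2012 *brw, int val)
    {
            bool success = false;

            preempt_disable();
            if (likely(!atomic_read(&brw->write_ctr))) {
                    __this_cpu_add(*brw->fast_read_ctr, val);
                    success = true;
            }
            preempt_enable();
            return success;
    }

    static void sketch_down_read(struct pcpu_rwsem_2012 *brw)
    {
            if (likely(update_fast_ctr(brw, +1))) {
                    /* tell lockdep we hold ->rw_sem for reading; matches the real semantics */
                    rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 0, _RET_IP_);
                    return;
            }
            /* slow path: a writer is pending or active */
            down_read(&brw->rw_sem);
            atomic_inc(&brw->slow_read_ctr);
            __up_read(&brw->rw_sem); /* drop the lock but keep lockdep's "held" state */
    }

    /* Sum and clear the per-CPU fast counters; only called under down_write(). */
    static int clear_fast_ctr(struct pcpu_rwsem_2012 *brw)
    {
            int sum = 0, cpu;

            for_each_possible_cpu(cpu) {
                    sum += per_cpu(*brw->fast_read_ctr, cpu);
                    per_cpu(*brw->fast_read_ctr, cpu) = 0;
            }
            return sum;
    }

    static void sketch_down_write(struct pcpu_rwsem_2012 *brw)
    {
            atomic_inc(&brw->write_ctr);    /* force new readers onto the slow path */
            synchronize_sched();            /* 1st: readers now see write_ctr != 0 */
            down_write(&brw->rw_sem);       /* serialize writers, block the slow readers */
            atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
            wait_event(brw->write_waitq, !atomic_read(&brw->slow_read_ctr));
    }

    static void sketch_up_write(struct pcpu_rwsem_2012 *brw)
    {
            up_write(&brw->rw_sem);         /* new readers may proceed, slow path only */
            synchronize_sched();            /* 2nd: then it is safe to re-enable the fast path */
            atomic_dec(&brw->write_ctr);
    }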
     

28 Nov, 2012

1 commit


29 Oct, 2012

2 commits

  • Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
    instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.

    This is an optimization. The RCU-protected region is very small, so
    there will be no latency problems if we disable preemption in this region.

    So we use rcu_read_lock_sched / rcu_read_unlock_sched, which translate
    to preempt_disable / preempt_enable. This is smaller (and supposedly
    faster) than the preemptible rcu_read_lock / rcu_read_unlock.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • This patch introduces a new barrier pair, light_mb() and heavy_mb(), for
    percpu rw semaphores.

    This patch fixes a bug in percpu-rw-semaphores where a barrier was
    missing in percpu_up_write.

    This patch improves performance on the read path of
    percpu-rw-semaphores: on non-x86 CPUs, there was an smp_mb() in
    percpu_up_read. This patch changes it to a compiler barrier and removes
    the "#if defined(X86) ..." condition. (A sketch of the resulting barrier
    pairing appears after this list.)

    From: Lai Jiangshan
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
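
    A plausible shape of the read path after these two changes. This is
    reconstructed from the descriptions above, not copied from the patches:
    the struct and field names are made up, the slow path and down_write are
    omitted, and heavy_mb() is realized here via synchronize_sched() so that
    it can pair with a mere compiler barrier on the read side:

    #include <linux/percpu.h>
    #include <linux/mutex.h>
    #include <linux/rcupdate.h>
    #include <linux/compiler.h>

    #define light_mb()      barrier()               /* read path: compiler barrier only */
    #define heavy_mb()      synchronize_sched()     /* write path: implies a full barrier on every CPU */

    struct pcpu_rwsem_barriers {
            unsigned int __percpu   *counters;
            bool                    locked;
            struct mutex            mtx;
    };

    static void sketch_down_read(struct pcpu_rwsem_barriers *p)
    {
            rcu_read_lock_sched();  /* i.e. preempt_disable(): small, non-sleeping region */
            if (likely(!p->locked)) {
                    this_cpu_inc(*p->counters);
                    rcu_read_unlock_sched();
                    light_mb();     /* keep the critical section after the "locked" check */
                    return;
            }
            rcu_read_unlock_sched();
            /* writer active: take the slow path under p->mtx (not shown) */
    }

    static void sketch_up_read(struct pcpu_rwsem_barriers *p)
    {
            light_mb();             /* keep the critical section before the decrement */
            this_cpu_dec(*p->counters);
    }

    static void sketch_up_write(struct pcpu_rwsem_barriers *p)
    {
            heavy_mb();     /* the barrier percpu_up_write() was missing: order writes before unlock */
            p->locked = false;
            mutex_unlock(&p->mtx);
    }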
     

26 Sep, 2012

1 commit

  • This avoids cache line bouncing when many processes lock the semaphore
    for read.

    New percpu lock implementation

    The lock consists of an array of percpu unsigned integers, a boolean
    variable and a mutex.

    When we take the lock for read, we enter an RCU read-side section and
    check the "locked" variable. If it is false, we increase a percpu counter
    on the current CPU and exit the RCU section. If "locked" is true, we exit
    the RCU section, take the mutex and drop it (this waits until the writer
    has finished), and retry.

    Unlocking for read just decreases the percpu variable. Note that we can
    unlock on a different CPU than the one where we locked; in this case the
    counter underflows. The sum of all percpu counters represents the number
    of processes that hold the lock for read.

    When we need to lock for write, we take the mutex, set the "locked"
    variable to true, and synchronize RCU. Since RCU has been synchronized,
    no process can create new read locks. We then wait until the sum of the
    percpu counters is zero; when it is, there are no readers in the critical
    section. (A sketch of this scheme appears below.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
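
    A sketch of the scheme just described (illustrative only; the names are
    made up and the merged code differs in detail):

    #include <linux/percpu.h>
    #include <linux/mutex.h>
    #include <linux/rcupdate.h>
    #include <linux/delay.h>

    struct pcpu_rwsem_v0 {
            unsigned int __percpu   *counters;      /* sum over CPUs = number of readers */
            bool                    locked;         /* a writer holds (or is taking) the lock */
            struct mutex            mtx;            /* serializes writers; readers wait on it when locked */
    };

    static void sketch_down_read(struct pcpu_rwsem_v0 *p)
    {
            for (;;) {
                    rcu_read_lock();
                    if (likely(!p->locked)) {
                            this_cpu_inc(*p->counters);     /* fast path: no writer */
                            rcu_read_unlock();
                            return;
                    }
                    rcu_read_unlock();
                    mutex_lock(&p->mtx);    /* wait for the writer to finish... */
                    mutex_unlock(&p->mtx);  /* ...then retry the fast path */
            }
    }

    static void sketch_up_read(struct pcpu_rwsem_v0 *p)
    {
            /* may run on a different CPU than the matching down_read; only the sum matters */
            this_cpu_dec(*p->counters);
    }

    static unsigned int sketch_count_readers(struct pcpu_rwsem_v0 *p)
    {
            unsigned int sum = 0;
            int cpu;

            for_each_possible_cpu(cpu)
                    sum += *per_cpu_ptr(p->counters, cpu);
            return sum;
    }

    static void sketch_down_write(struct pcpu_rwsem_v0 *p)
    {
            mutex_lock(&p->mtx);
            p->locked = true;
            synchronize_rcu();      /* after this, no new fast-path readers can appear */
            while (sketch_count_readers(p)) /* wait for the existing readers to drain */
                    msleep(1);
    }

    static void sketch_up_write(struct pcpu_rwsem_v0 *p)
    {
            p->locked = false;
            mutex_unlock(&p->mtx);
    }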