06 Dec, 2016

1 commit

  • Since commit:

    4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")

    printk() requires KERN_CONT to continue log messages. Lots of printk()
    calls in lockdep.c and print_ip_sym() don't have it. As a result, lockdep
    reports are completely messed up.

    Add missing KERN_CONT and inline print_ip_sym() where necessary.

    Example of a messed up report:

    0-rc5+ #41 Not tainted
    -------------------------------------------------------
    syz-executor0/5036 is trying to acquire lock:
    (
    rtnl_mutex
    ){+.+.+.}
    , at:
    [] rtnl_lock+0x1c/0x20
    but task is already holding lock:
    (
    &net->packet.sklist_lock
    ){+.+...}
    , at:
    [] packet_diag_dump+0x1a6/0x1920
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #3
    (
    &net->packet.sklist_lock
    +.+...}
    ...

    Without this patch all scripts that parse kernel bug reports are broken.
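
    As a rough illustration (hypothetical variables, not the actual lockdep
    diff), a continued line needs KERN_CONT on every printk() that extends
    the current record:

        /*
         * Sketch only: printing one logical report line in two printk()
         * calls.  Without KERN_CONT on the second call, the two calls
         * become separate log records and the line is split.
         */
        printk(KERN_INFO "trying to acquire lock: ");
        printk(KERN_CONT "(%s){%s}, at: %pS\n", lock_name, usage_str, (void *)ip);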

    Signed-off-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: andreyknvl@google.com
    Cc: aryabinin@virtuozzo.com
    Cc: joe@perches.com
    Cc: syzkaller@googlegroups.com
    Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
    Signed-off-by: Ingo Molnar

    Dmitry Vyukov
     

02 Dec, 2016

2 commits

    While debugging the rtmutex unlock vs. dequeue race, Will suggested using
    READ_ONCE() in rt_mutex_owner() as it might race against the
    cmpxchg_release() in unlock_rt_mutex_safe().

    Will: "It's a minor thing which will most likely not matter in practice"

    A careful search did not unearth an actual problem in today's code, but
    it's better to be safe than surprised.
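
    For reference, a sketch of the READ_ONCE() variant of the helper (modeled
    on rt_mutex_owner() in kernel/locking/rtmutex_common.h; treat it as
    illustrative rather than the exact diff):

        static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
        {
                unsigned long owner = (unsigned long) READ_ONCE(lock->owner);

                return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
        }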

    Suggested-by: Will Deacon
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: David Daney
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • David reported a futex/rtmutex state corruption. It's caused by the
    following problem:

    CPU0                        CPU1                        CPU2

    l->owner=T1
                                rt_mutex_lock(l)
                                lock(l->wait_lock)
                                l->owner = T1 | HAS_WAITERS;
                                enqueue(T2)
                                boost()
                                  unlock(l->wait_lock)
                                schedule()

                                                            rt_mutex_lock(l)
                                                            lock(l->wait_lock)
                                                            l->owner = T1 | HAS_WAITERS;
                                                            enqueue(T3)
                                                            boost()
                                                              unlock(l->wait_lock)
                                                            schedule()
                                signal(->T2)                signal(->T3)
                                lock(l->wait_lock)
                                dequeue(T2)
                                deboost()
                                  unlock(l->wait_lock)
                                                            lock(l->wait_lock)
                                                            dequeue(T3)
                                                              ===> wait list is now empty
                                                            deboost()
                                                              unlock(l->wait_lock)
                                lock(l->wait_lock)
                                fixup_rt_mutex_waiters()
                                  if (wait_list_empty(l)) {
                                    owner = l->owner & ~HAS_WAITERS;
                                    l->owner = owner
                                      ==> l->owner = T1
                                  }

                                                            lock(l->wait_lock)
    rt_mutex_unlock(l)                                      fixup_rt_mutex_waiters()
                                                              if (wait_list_empty(l)) {
                                                                owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
        ===> Success (l->owner = NULL)
                                                                l->owner = owner
                                                                  ==> l->owner = T1
                                                              }

    That means the problem is caused by fixup_rt_mutex_waiters(), which does
    the RMW to clear the waiters bit unconditionally when there are no waiters
    in the rtmutex's rbtree.

    This can be fatal: A concurrent unlock can release the rtmutex in the
    fastpath because the waiters bit is not set. If the cmpxchg() gets in the
    middle of the RMW operation then the previous owner, which just unlocked
    the rtmutex, is set as the owner again when the write takes place after the
    successful cmpxchg().

    The solution is rather trivial: verify that the owner member of the rtmutex
    has the waiters bit set before clearing it. This does not require a
    cmpxchg() or other atomic operations because the waiters bit can only be
    set and cleared with the rtmutex wait_lock held. It's also safe against the
    fast path unlock attempt. The unlock attempt via cmpxchg() will either see
    the bit set and take the slowpath or see the bit cleared and release it
    atomically in the fastpath.
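
    A minimal sketch of that idea (simplified; the upstream helper in
    kernel/locking/rtmutex.c carries a much more detailed comment):

        static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
        {
                unsigned long owner, *p = (unsigned long *) &lock->owner;

                if (rt_mutex_has_waiters(lock))
                        return;

                /*
                 * Only clear the bit if it is actually set.  Both the check
                 * and the store happen under lock->wait_lock, so no atomic
                 * RMW is required.
                 */
                owner = READ_ONCE(*p);
                if (owner & RT_MUTEX_HAS_WAITERS)
                        WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
        }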

    It's remarkable that the test program provided by David triggers on ARM64
    and MIPS64 really quickly, but refuses to reproduce on x86-64, even though
    the problem exists there as well. That refusal might explain why this was
    not discovered earlier, despite the bug existing since day one of the
    rtmutex implementation more than 10 years ago.

    Thanks to David for meticulously instrumenting the code and providing the
    information that made it possible to decode this subtle problem.

    Reported-by: David Daney
    Tested-by: David Daney
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Steven Rostedt
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Will Deacon
    Cc: stable@vger.kernel.org
    Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
    Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

19 Nov, 2016

1 commit


22 Sep, 2016

3 commits

    It is now unused; remove it before someone else thinks it's a good idea
    to use this.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    stop_two_cpus() and stop_cpus() use stop_cpus_lock to avoid deadlock:
    we need to ensure that the stopper functions can't be queued "backwards"
    from one another. This doesn't look nice; if we use lglock, then we do not
    really need stopper->lock, and cpu_stop_queue_work() could use
    lg_local_lock() under local_irq_save().

    OTOH it would be even better to avoid lglock in stop_machine.c and remove
    lg_double_lock(). This patch adds "bool stop_cpus_in_progress" set/cleared
    by queue_stop_cpus_work(), and changes cpu_stop_queue_two_works() to busy
    wait until it is cleared.

    queue_stop_cpus_work() sets stop_cpus_in_progress = T locklessly, but after
    it queues a work on CPU1 it must be visible to stop_two_cpus(CPU1, CPU2),
    which checks it under the same lock. And since stop_two_cpus() holds the
    2nd lock too, queue_stop_cpus_work() can not clear stop_cpus_in_progress
    if it is also going to queue a work on CPU2; it needs to take that 2nd
    lock to do this.
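
    A heavily simplified sketch of the scheme (stop_work_for() is a
    hypothetical per-CPU work lookup, and all error handling is omitted):

        static bool stop_cpus_in_progress;

        /* stop_cpus() side: flag the bulk-queueing window (lockless). */
        static void queue_stop_cpus_work_sketch(const struct cpumask *cpumask)
        {
                unsigned int cpu;

                stop_cpus_in_progress = true;
                for_each_cpu(cpu, cpumask)
                        cpu_stop_queue_work(cpu, stop_work_for(cpu));
                stop_cpus_in_progress = false;
        }

        /* stop_two_cpus() side: busy wait until that window has closed. */
        static void wait_for_stop_cpus_sketch(void)
        {
                while (unlikely(READ_ONCE(stop_cpus_in_progress)))
                        cpu_relax();
        }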

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20151121181148.GA433@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
    cmpxchg_release() is more lightweight than cmpxchg() on some archs (e.g.
    PPC). Moreover, in __pv_queued_spin_unlock() we only need a RELEASE in
    the fast path (pairing with *_try_lock() or *_lock()), and the slow path
    already has smp_store_release(). So it's safe to use cmpxchg_release() here.
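
    A hedged sketch of what the fast path looks like with the release variant
    (modeled on kernel/locking/qspinlock_paravirt.h of that era):

        __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
        {
                struct __qspinlock *l = (void *)lock;
                u8 locked;

                /*
                 * RELEASE ordering is sufficient: it pairs with the ACQUIRE
                 * on the lock word in the *_try_lock()/*_lock() fast paths.
                 */
                locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
                if (likely(locked == _Q_LOCKED_VAL))
                        return;

                __pv_queued_spin_unlock_slowpath(lock, locked);
        }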

    Suggested-by: Boqun Feng
    Signed-off-by: Pan Xinhui
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: paulmck@linux.vnet.ibm.com
    Cc: paulus@samba.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: waiman.long@hpe.com
    Link: http://lkml.kernel.org/r/1474277037-15200-2-git-send-email-xinhui.pan@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Pan Xinhui
     

18 Aug, 2016

3 commits

    When wanting to wake up readers, __rwsem_mark_wake() currently
    iterates the wait_list twice while looking to wake up the first N
    queued reader-tasks. While this can be quite inefficient, it was
    done such that an awoken reader would be first and foremost
    acknowledged by the lock counter.

    Keeping the same logic, we can further benefit from the use of
    wake_qs and avoid entirely the first wait_list iteration that sets
    the counter, as wake_up_process() isn't going to occur right away;
    we therefore maintain the counter->list order of going about
    things.

    Other than saving cycles with O(n) "scanning", this change also
    nicely cleans up a good chunk of __rwsem_mark_wake(), making it
    both visually cleaner and less tedious to read.

    For example, the following improvements were seen on some
    will-it-scale microbenchmarks, on a 48-core Haswell:

    v4.7 v4.7-rwsem-v1
    Hmean signal1-processes-8 5792691.42 ( 0.00%) 5771971.04 ( -0.36%)
    Hmean signal1-processes-12 6081199.96 ( 0.00%) 6072174.38 ( -0.15%)
    Hmean signal1-processes-21 3071137.71 ( 0.00%) 3041336.72 ( -0.97%)
    Hmean signal1-processes-48 3712039.98 ( 0.00%) 3708113.59 ( -0.11%)
    Hmean signal1-processes-79 4464573.45 ( 0.00%) 4682798.66 ( 4.89%)
    Hmean signal1-processes-110 4486842.01 ( 0.00%) 4633781.71 ( 3.27%)
    Hmean signal1-processes-141 4611816.83 ( 0.00%) 4692725.38 ( 1.75%)
    Hmean signal1-processes-172 4638157.05 ( 0.00%) 4714387.86 ( 1.64%)
    Hmean signal1-processes-203 4465077.80 ( 0.00%) 4690348.07 ( 5.05%)
    Hmean signal1-processes-224 4410433.74 ( 0.00%) 4687534.43 ( 6.28%)

    Stddev signal1-processes-8 6360.47 ( 0.00%) 8455.31 ( 32.94%)
    Stddev signal1-processes-12 4004.98 ( 0.00%) 9156.13 (128.62%)
    Stddev signal1-processes-21 3273.14 ( 0.00%) 5016.80 ( 53.27%)
    Stddev signal1-processes-48 28420.25 ( 0.00%) 26576.22 ( -6.49%)
    Stddev signal1-processes-79 22038.34 ( 0.00%) 18992.70 (-13.82%)
    Stddev signal1-processes-110 23226.93 ( 0.00%) 17245.79 (-25.75%)
    Stddev signal1-processes-141 6358.98 ( 0.00%) 7636.14 ( 20.08%)
    Stddev signal1-processes-172 9523.70 ( 0.00%) 4824.75 (-49.34%)
    Stddev signal1-processes-203 13915.33 ( 0.00%) 9326.33 (-32.98%)
    Stddev signal1-processes-224 15573.94 ( 0.00%) 10613.82 (-31.85%)

    Other runs that saw improvements include context_switch and pipe; and
    as expected, this is particularly highlighted on larger thread counts
    as it becomes more expensive to walk the list twice.

    No change in wakeup ordering or semantics.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: dave@stgolabs.net
    Cc: jason.low2@hpe.com
    Cc: wanpeng.li@hotmail.com
    Link: http://lkml.kernel.org/r/1470384285-32163-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Our rwsem code (xadd, at least) is rather well documented, but
    there are a few really annoying comments in there that serve
    no purpose and we shouldn't bother with them.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: dave@stgolabs.net
    Cc: jason.low2@hpe.com
    Cc: wanpeng.li@hotmail.com
    Link: http://lkml.kernel.org/r/1470384285-32163-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
    We currently return a rw_semaphore structure, which is the
    same lock that was passed in as the function's argument in the
    first place. While there are several functions that use this
    return-value convention, the callers use it, for example, for
    things like ERR_PTR. This is not the case for __rwsem_mark_wake();
    in addition, this function is really about the lock waiters
    (which we know exist at this point), so it's somewhat odd to be
    returning the sem structure.
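
    Roughly, the result is a signature along these lines (hedged sketch, not
    the verbatim diff):

        static void __rwsem_mark_wake(struct rw_semaphore *sem,
                                      enum rwsem_wake_type wake_type,
                                      struct wake_q_head *wake_q);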

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: dave@stgolabs.net
    Cc: jason.low2@hpe.com
    Cc: wanpeng.li@hotmail.com
    Link: http://lkml.kernel.org/r/1470384285-32163-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

10 Aug, 2016

5 commits

    Currently the percpu-rwsem switches to (global) atomic ops while a
    writer is waiting; this can last quite a while and slows down
    releasing the readers.

    This patch cures this problem by ordering the reader-state vs
    reader-count (see the comments in __percpu_down_read() and
    percpu_down_write()). This changes a global atomic op into a full
    memory barrier, which doesn't have the global cacheline contention.

    This also enables using the percpu-rwsem with rcu_sync disabled in order
    to bias the implementation differently, reducing the writer latency by
    adding some cost to readers.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    [ Fixed modular build. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    Currently there is overlap between the pvqspinlock wait_again and
    spurious_wakeup stat counters. Because of lock stealing, it is
    no longer possible to accurately determine if a spurious wakeup has
    happened in the queue head. As they track both the queue node and
    queue head status, it is also hard to tell how many of those come
    from the queue head and how many from the queue node.

    This patch changes the accounting rules so that spurious wakeup is
    only tracked in the queue node. The wait_again count, however, is
    only tracked in the queue head when the vCPU failed to acquire the
    lock after a vCPU kick. This should give a much better indication of
    the wait-kick dynamics in the queue node and the queue head.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boqun Feng
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Pan Xinhui
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464713631-1066-2-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Restructure pv_queued_spin_steal_lock() as I found it hard to read.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long

    Peter Zijlstra
     
    It's obviously wrong to set stat to NULL, so let's remove that assignment.
    Otherwise it is always zero when we check the latency of kick/wake.

    Signed-off-by: Pan Xinhui
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1468405414-3700-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Pan Xinhui
     
  • When the lock holder vCPU is racing with the queue head:

    CPU 0 (lock holder)                  CPU1 (queue head)
    ===================                  =================
    spin_lock();                         spin_lock();
      pv_kick_node():                      pv_wait_head_or_lock():
                                             if (!lp) {
                                               lp = pv_hash(lock, pn);
                                               xchg(&l->locked, _Q_SLOW_VAL);
                                             }
                                             WRITE_ONCE(pn->state, vcpu_halted);
        cmpxchg(&pn->state,
                vcpu_halted, vcpu_hashed);
        WRITE_ONCE(l->locked, _Q_SLOW_VAL);
        (void)pv_hash(lock, pn);

    In this case, the lock holder inserts the pv_node of the queue head into
    the hash table and sets _Q_SLOW_VAL unnecessarily. This patch avoids that
    by restoring/setting the vcpu_hashed state after adaptive lock spinning
    fails.

    Signed-off-by: Wanpeng Li
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Pan Xinhui
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1468484156-4521-1-git-send-email-wanpeng.li@hotmail.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     

26 Jul, 2016

1 commit

  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     

07 Jul, 2016

1 commit


27 Jun, 2016

1 commit

  • queued_spin_lock_slowpath() should not worry about another
    queued_spin_lock_slowpath() running in interrupt context and
    changing node->count by accident, because node->count keeps
    the same value every time we enter/leave queued_spin_lock_slowpath().

    On some architectures this_cpu_dec() will save/restore irq flags,
    which has high overhead. Use the much cheaper __this_cpu_dec() instead.
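
    Concretely, the change is along these lines (sketch of the release path
    in kernel/locking/qspinlock.c at the time):

        -       this_cpu_dec(mcs_nodes[0].count);
        +       __this_cpu_dec(mcs_nodes[0].count);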

    Signed-off-by: Pan Xinhui
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hpe.com
    Link: http://lkml.kernel.org/r/1465886247-3773-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
    [ Rewrote changelog. ]
    Signed-off-by: Ingo Molnar

    Pan Xinhui
     

24 Jun, 2016

1 commit


16 Jun, 2016

3 commits

  • Now that we have fetch_add() we can stop using add_return() - val.
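
    The general rewrite this enables looks like the following (illustrative
    pattern only, not the verbatim diff):

        -       old = atomic_long_add_return(delta, &sem->count) - delta;
        +       old = atomic_long_fetch_add(delta, &sem->count);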

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    The only reason for the current code is to make GCC emit only the
    "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
    the result); do so more cleanly.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • These functions have been deprecated for a while and there is only the
    one user left, convert and kill.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boqun Feng
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Jun, 2016

2 commits

    Introduce smp_acquire__after_ctrl_dep(); this construct is not
    uncommon, but the lack of this barrier is.

    Use it to better express smp_rmb() uses in WRITE_ONCE(), the IPC
    semaphore code and the qspinlock code.
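
    The pattern the new helper names is roughly this (hedged sketch; ptr and
    val are placeholders):

        /*
         * Spin until *ptr becomes non-zero.  The branch on the loaded value
         * is a control dependency; the helper upgrades it to ACQUIRE so that
         * later memory accesses cannot be hoisted above the successful test.
         */
        while (!(val = READ_ONCE(*ptr)))
                cpu_relax();
        smp_acquire__after_ctrl_dep();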

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This new form allows using hardware assisted waiting.

    Some hardware (ARM64 and x86) allows monitoring an address for changes,
    so by providing a pointer we can use this to replace the cpu_relax()
    with hardware-optimized methods in the future.
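
    A hedged usage sketch of the pointer-taking form (flag is a hypothetical
    shared variable; VAL names the freshly loaded value inside the condition):

        /*
         * Wait until flag becomes non-zero, with ACQUIRE semantics.  Because
         * the address is passed in, an architecture can implement the wait
         * with WFE/monitor-style instructions instead of a cpu_relax() loop.
         */
        val = smp_cond_load_acquire(&flag, VAL != 0);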

    Requested-by: Will Deacon
    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Jun, 2016

11 commits

    This patch moves the owner loading and checking code entirely inside of
    rwsem_spin_on_owner() to simplify the logic of the
    rwsem_optimistic_spin() loop.

    Suggested-by: Peter Hurley
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Peter Hurley
    Cc: Andrew Morton
    Cc: Dave Chinner
    Cc: Davidlohr Bueso
    Cc: Douglas Hatch
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463534783-38814-6-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • In __rwsem_do_wake(), the reader wakeup code will assume a writer
    has stolen the lock if the active reader/writer count is not 0.
    However, this is not as reliable an indicator as the original
    "< RWSEM_WAITING_BIAS" check. If another reader is present, the code
    will still break out and exit even if the writer is gone. This patch
    changes it to check the same "< RWSEM_WAITING_BIAS" condition to
    reduce the chance of false positives.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Peter Hurley
    Cc: Andrew Morton
    Cc: Dave Chinner
    Cc: Davidlohr Bueso
    Cc: Douglas Hatch
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463534783-38814-5-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Without using WRITE_ONCE(), the compiler can potentially break a
    write into multiple smaller ones (store tearing). So a read from the
    same data by another task concurrently may return a partial result.
    This can result in a kernel crash if the data is a memory address
    that is being dereferenced.

    This patch changes all writes to rwsem->owner to use WRITE_ONCE()
    to make sure that store tearing will not happen. READ_ONCE() may
    not be needed for rwsem->owner as long as the value is only used for
    comparison and not dereferencing.
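
    A sketch of the resulting helpers (modeled on kernel/locking/rwsem.h):

        static inline void rwsem_set_owner(struct rw_semaphore *sem)
        {
                WRITE_ONCE(sem->owner, current);
        }

        static inline void rwsem_clear_owner(struct rw_semaphore *sem)
        {
                WRITE_ONCE(sem->owner, NULL);
        }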

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Dave Chinner
    Cc: Davidlohr Bueso
    Cc: Douglas Hatch
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Hurley
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463534783-38814-3-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Currently, it is not possible to determine for sure if a reader
    owns a rwsem by looking at the content of the rwsem data structure.
    This patch adds a new state RWSEM_READER_OWNED to the owner field
    to indicate that readers currently own the lock. This enables us to
    address the following 2 issues in the rwsem optimistic spinning code:

    1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

    2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL, wasting
    CPU cycles if some of the readers are sleeping.

    Adding the new state will allow optimistic spinning to go forward as
    long as the owner field is not RWSEM_READER_OWNED and the owner, if
    set, is running, but to stop immediately when that state has been reached.
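
    A hedged sketch of the new state and its setter (names as used above;
    details may differ from the actual patch):

        /* Special owner value: the rwsem is currently owned by readers. */
        #define RWSEM_READER_OWNED      ((struct task_struct *)1UL)

        static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
        {
                /*
                 * Avoid dirtying the owner cacheline if it already carries
                 * the reader-owned marker.
                 */
                if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
                        WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
        }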

    On a 4-socket Haswell machine running a 4.6-rc1 based kernel, the
    fio test with multithreaded randrw and randwrite tests on the same
    file on an XFS partition on top of an NVDIMM was run; the aggregated
    bandwidths before and after the patch were as follows:

    Test        BW before patch   BW after patch   % change
    ----        ---------------   --------------   --------
    randrw      988 MB/s          1192 MB/s        +21%
    randwrite   1513 MB/s         1623 MB/s        +7.3%

    The perf profiles of the rwsem_down_write_failed() function in randrw
    before and after the patch were:

    19.95% 5.88% fio [kernel.vmlinux] [k] rwsem_down_write_failed
    14.20% 1.52% fio [kernel.vmlinux] [k] rwsem_down_write_failed

    The actual CPU cycles spent in rwsem_down_write_failed() dropped from
    5.88% to 1.52% after the patch.

    The xfstests suite was also run and no regressions were observed.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Jason Low
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Dave Chinner
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Hurley
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
    Convert the rwsem count variable to an atomic_long_t since we use it
    as an atomic variable. This also allows us to remove the
    rwsem_atomic_{add,update}() "abstraction", which would now be an unnecessary
    level of indirection. In follow-up patches, we also remove the
    rwsem_atomic_{add,update}() definitions across the various architectures.
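
    Illustrative sketch of the kind of conversion involved (not the verbatim
    diff):

        struct rw_semaphore {
        -       long count;
        +       atomic_long_t count;
                struct list_head wait_list;
                raw_spinlock_t wait_lock;
                /* ... */
        };

        /* and a former rwsem_atomic_add(delta, sem) call site becomes: */
                atomic_long_add(delta, &sem->count);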

    Suggested-by: Peter Zijlstra
    Signed-off-by: Jason Low
    [ Build warning fixes on various architectures. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Paul E. McKenney
    Cc: Peter Hurley
    Cc: Terry Rudd
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
    I figured we need to document the spin_is_locked() and
    spin_unlock_wait() constraints somewhere.

    Ideally 'someone' would rewrite Documentation/atomic_ops.txt and we
    could find a place in there. But currently that document is stale to
    the point of hardly being useful.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boqun Feng
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Pan Xinhui
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • While going over the code I noticed that xchg_tail() is a RELEASE but
    had no obvious pairing commented.

    It pairs with a somewhat unique address dependency through
    decode_tail().

    So the store-release of xchg_tail() is paired by the address
    dependency of the load of xchg_tail() followed by the dereference from
    the pointer computed from that load.

    The @old -> @prev transformation itself is pure, and therefore does
    not depend on external state, so that is immaterial wrt. ordering.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Pan Xinhui
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • While this prior commit:

    54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")

    ... fixes spin_is_locked() and spin_unlock_wait() for the usage
    in ipc/sem and netfilter, it does not in fact work right for the
    usage in task_work and futex.

    So while the 2 locks crossed problem:

        spin_lock(A)                    spin_lock(B)
        if (!spin_is_locked(B))         spin_unlock_wait(A)
          foo()                           foo();

    ... works with the smp_mb() injected by both spin_is_locked() and
    spin_unlock_wait(), this is not sufficient for:

        flag = 1;
        smp_mb();                       spin_lock()
        spin_unlock_wait()              if (!flag)
                                          // add to lockless list
        // iterate lockless list

    ... because in this scenario, the store from spin_lock() can be delayed
    past the load of flag, uncrossing the variables and losing the
    guarantee.

    This patch reworks spin_is_locked() and spin_unlock_wait() to work in
    both cases by exploiting the observation that while the lock byte
    store can be delayed, the contender must have registered itself
    visibly in other state contained in the word.

    It also allows for architectures to override both functions, as PPC
    and ARM64 have an additional issue for which we currently have no
    generic solution.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boqun Feng
    Cc: Davidlohr Bueso
    Cc: Giovanni Gherdovich
    Cc: Linus Torvalds
    Cc: Pan Xinhui
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: stable@vger.kernel.org # v4.2 and later
    Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    One warning should be enough to motivate someone to fix this. It is
    possible that this happens more than once, which starts flooding the
    output. Later the prints will be suppressed, so we only get half of it,
    and depending on the console system used that might not be helpful.
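
    The change described boils down to switching the warning to its _ONCE
    variant, along these lines (illustrative; the exact condition is whatever
    the affected check tests):

        -       WARN_ON(in_irq() || in_interrupt());
        +       WARN_ON_ONCE(in_irq() || in_interrupt());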

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464356838-1755-1-git-send-email-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     
    Use __jhash_mix() to mix the class_idx into the class_key. This
    function provides better mixing than the previously used home-grown
    mix function.

    Leave hashing to the professionals :-)
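
    A sketch of the mixing step (modeled on the lockdep chain-key iteration;
    treat the details as illustrative):

        static inline u64 iterate_chain_key(u64 key, u32 idx)
        {
                u32 k0 = key, k1 = key >> 32;

                __jhash_mix(idx, k0, k1); /* macro: modifies its arguments */

                return k0 | (u64)k1 << 32;
        }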

    Suggested-by: George Spelvin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jun, 2016

5 commits

    The mutex owner can be read and written locklessly.
    Use WRITE_ONCE when setting and clearing the owner field
    in order to avoid optimizations such as store tearing. This
    avoids situations where the owner field gets written to with
    multiple stores and another thread could concurrently read
    and use a partially written owner value.

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Davidlohr Bueso
    Acked-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Terry Rudd
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463782776.2479.9.camel@j-VirtualBox
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • When acquiring the rwsem write lock in the slowpath, we first try
    to set count to RWSEM_WAITING_BIAS. When that is successful,
    we then atomically add the RWSEM_WAITING_BIAS in cases where
    there are other tasks on the wait list. This causes write lock
    operations to often issue multiple atomic operations.

    We can instead make the list_is_singular() check first, and then
    set the count accordingly, so that we issue at most 1 atomic
    operation when acquiring the write lock and reduce unnecessary
    cacheline contention.

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Ivan Kokshaysky
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Paul E. McKenney
    Cc: Peter Hurley
    Cc: Peter Zijlstra
    Cc: Richard Henderson
    Cc: Terry Rudd
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/1463445486-16078-2-git-send-email-jason.low2@hpe.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Readers that are awoken will expect a nil ->task indicating
    that a wakeup has occurred. Because of the way readers are
    implemented, there's a small chance that the waiter will never
    block in the slowpath (rwsem_down_read_failed), and we therefore
    require some form of reference counting to avoid the following
    scenario:

    rwsem_down_read_failed()               rwsem_wake()
      get_task_struct();
      spin_lock_irq(&wait_lock);
      list_add_tail(&waiter.list)
      spin_unlock_irq(&wait_lock);
                                             raw_spin_lock_irqsave(&wait_lock)
                                             __rwsem_do_wake()
      while (1) {
        set_task_state(TASK_UNINTERRUPTIBLE);
                                               waiter->task = NULL
        if (!waiter.task) // true
          break;
        schedule() // never reached

      __set_task_state(TASK_RUNNING);
      do_exit();
                                               wake_up_process(tsk); // boom

    ... and therefore race with do_exit() when the caller returns.

    There is also a mismatch between the smp_mb() and its documentation,
    in that the serialization is done between reading the task and the
    nil store. Furthermore, in addition to having the overlapping of
    loads and stores to waiter->task guaranteed to be ordered within
    that CPU, both wake_up_process() originally and now wake_q_add()
    already imply barriers upon successful calls, which serves the
    comment.

    Now, as an alternative to perhaps inverting the checks in the blocker
    side (which has its own penalty in that schedule is unavoidable),
    with lockless wakeups this situation is naturally addressed and we
    can just use the refcount held by wake_q_add(), instead of doing so
    explicitly. Of course, we must guarantee that the nil store is done
    as the _last_ operation in that the task must already be marked for
    deletion to not fall into the race above. Spurious wakeups are also
    handled transparently in that the task's reference is only removed
    when wake_up_q() is actually called _after_ the nil store.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hpe.com
    Cc: dave@stgolabs.net
    Cc: jason.low2@hp.com
    Cc: peter@hurleysoftware.com
    Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • As wake_qs gain users, we can teach rwsems about them such that
    waiters can be awoken without the wait_lock. This is for both
    readers and writers, the former being the ideal candidate
    as we can batch the wakeups, shortening the critical region that
    much more -- i.e. a writer task blocking a bunch of tasks waiting to
    service page-faults (mmap_sem readers).

    In general applying wake_qs to rwsem (xadd) is not difficult as
    the wait_lock is intended to be released soon _anyways_, with
    the exception of when a writer slowpath will proactively wakeup
    any queued readers if it sees that the lock is owned by a reader,
    in which case we simply do the wakeups with the lock held (see comment
    in __rwsem_down_write_failed_common()).

    Similar to other locking primitives, delaying the waiter being
    awoken does, at least in theory, allow the lock to be stolen in
    the case of writers; however, no harm was seen in this (in fact
    lock stealing tends to be a _good_ thing in most workloads), and
    this is a tiny window anyway.

    Some page-fault (pft) and mmap_sem intensive benchmarks show some
    pretty constant reduction in systime (by up to ~8 and ~10%) on a
    2-socket, 12 core AMD box. In addition, on an 8-core Westmere doing
    page allocations (page_test):

    aim9:
                              4.6-rc6             4.6-rc6-rwsemv2
    Min page_test 378167.89 ( 0.00%) 382613.33 ( 1.18%)
    Min exec_test 499.00 ( 0.00%) 502.67 ( 0.74%)
    Min fork_test 3395.47 ( 0.00%) 3537.64 ( 4.19%)
    Hmean page_test 395433.06 ( 0.00%) 414693.68 ( 4.87%)
    Hmean exec_test 499.67 ( 0.00%) 505.30 ( 1.13%)
    Hmean fork_test 3504.22 ( 0.00%) 3594.95 ( 2.59%)
    Stddev page_test 17426.57 ( 0.00%) 26649.92 (-52.93%)
    Stddev exec_test 0.47 ( 0.00%) 1.41 (-199.05%)
    Stddev fork_test 63.74 ( 0.00%) 32.59 ( 48.86%)
    Max page_test 429873.33 ( 0.00%) 456960.00 ( 6.30%)
    Max exec_test 500.33 ( 0.00%) 507.66 ( 1.47%)
    Max fork_test 3653.33 ( 0.00%) 3650.90 ( -0.07%)

                              4.6-rc6             4.6-rc6-rwsemv2
    User 1.12 0.04
    System 0.23 0.04
    Elapsed 727.27 721.98

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hpe.com
    Cc: dave@stgolabs.net
    Cc: jason.low2@hp.com
    Cc: peter@hurleysoftware.com
    Link: http://lkml.kernel.org/r/1463165787-25937-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Recursive locking for ww_mutexes was originally conceived as an
    exception. However, it is heavily used by the DRM atomic modesetting
    code. Currently, the recursive deadlock is checked after we have queued
    up for a busy-spin and as we never release the lock, we spin until
    kicked, whereupon the deadlock is discovered and reported.

    A simple solution for the now common problem is to move the recursive
    deadlock discovery to the first action when taking the ww_mutex.
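
    A hedged sketch of the idea, checking for self-recursion up front in the
    ww_mutex lock path (treat names and exact placement as approximate):

        if (use_ww_ctx) {
                struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);

                /*
                 * This context already holds the ww_mutex: report the
                 * recursive deadlock immediately instead of spinning on it.
                 */
                if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
                        return -EALREADY;
        }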

    Suggested-by: Maarten Lankhorst
    Signed-off-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Maarten Lankhorst
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk
    Signed-off-by: Ingo Molnar

    Chris Wilson