22 Nov, 2016
1 commit
-
An over-committed guest with more vCPUs than pCPUs has a heavy overload
in osq_lock().This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends
up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for
vCPU-A to run and unlock the osq lock.Use the new vcpu_is_preempted(cpu) interface to detect if a vCPU is
currently running or not, and break out of the spin-loop if so.test case:
$ perf record -a perf bench sched messaging -g 400 -p && perf report
before patch:
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
2.49% sched-messaging [kernel.vmlinux] [k] system_callafter patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
4.12% sched-messaging [kernel.vmlinux] [k] system_call
3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
2.00% sched-messaging [kernel.vmlinux] [k] osq_lockSuggested-by: Boqun Feng
Tested-by: Juergen Gross
Signed-off-by: Pan Xinhui
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Christian Borntraeger
Acked-by: Paolo Bonzini
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: benh@kernel.crashing.org
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Translated to English. ]
Signed-off-by: Ingo Molnar
16 Nov, 2016
1 commit
-
With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax().Signed-off-by: Christian Borntraeger
Signed-off-by: Peter Zijlstra (Intel)
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Nicholas Piggin
Cc: Noam Camus
Cc: Peter Zijlstra
Cc: Russell King
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar
18 Dec, 2015
1 commit
-
The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):mutex_optimistic_spin+0x9c/0x1d0
__mutex_lock_slowpath+0x44/0x158
mutex_lock+0x54/0x58
kernfs_iop_permission+0x38/0x70
__inode_permission+0x88/0xd8
inode_permission+0x30/0x6c
link_path_walk+0x68/0x4d4
path_openat+0xb4/0x2bc
do_filp_open+0x74/0xd0
do_sys_open+0x14c/0x228
SyS_openat+0x3c/0x48
el0_svc_naked+0x24/0x28This is because in osq_lock we initialise the node for the current CPU:
node->locked = 0;
node->next = NULL;
node->cpu = curr;and then publish the current CPU in the lock tail:
old = atomic_xchg_acquire(&lock->tail, curr);
Once the update to lock->tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
lock tail is updated. This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
Reported-by: David Daney
Reported-by: Andrew Pinski
Tested-by: Andrew Pinski
Acked-by: Davidlohr Bueso
Signed-off-by: Will Deacon
Signed-off-by: Peter Zijlstra (Intel)
Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
Signed-off-by: Linus Torvalds
18 Sep, 2015
1 commit
-
... by using acquire/release for ops around the lock->tail. As such,
weakly ordered archs can benefit from more relaxed use of barriers
when issuing atomics.Signed-off-by: Davidlohr Bueso
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Waiman Long
Link: http://lkml.kernel.org/r/1442216244-4409-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar
24 Feb, 2015
1 commit
-
With the new standardized functions, we can replace all
ACCESS_ONCE() calls across relevant locking - this includes
lockref and seqlock while at it.ACCESS_ONCE() does not work reliably on non-scalar types.
For example gcc 4.6 and 4.7 might remove the volatile tag
for such accesses during the SRA (scalar replacement of
aggregates) step:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
Update the new calls regardless of if it is a scalar type,
this is cleaner than having three alternatives.Signed-off-by: Davidlohr Bueso
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Thomas Gleixner
Cc: Paul E. McKenney
Link: http://lkml.kernel.org/r/1424662301.6539.18.camel@stgolabs.net
Signed-off-by: Ingo Molnar
14 Jan, 2015
2 commits
-
Both mutexes and rwsems took a performance hit when we switched
over from the original mcs code to the cancelable variant (osq).
The reason being the use of smp_load_acquire() when polling for
node->locked. This is not needed as reordering is not an issue,
as such, relax the barrier semantics. Paul describes the scenario
nicely: https://lkml.org/lkml/2013/11/19/405- If we start polling before the insertion is complete, all that
happens is that the first few polls have no chance of seeing a lock
grant.- Ordering the polling against the initialization -- the above
xchg() is already doing that for us.The smp_load_acquire() when unqueuing make sense. In addition,
we don't need to worry about leaking the critical region as
osq is only used internally.This impacts both regular and large levels of concurrency,
ie on a 40 core system with a disk intensive workload:disk-1 804.83 ( 0.00%) 828.16 ( 2.90%)
disk-61 8063.45 ( 0.00%) 18181.82 (125.48%)
disk-121 7187.41 ( 0.00%) 20119.17 (179.92%)
disk-181 6933.32 ( 0.00%) 20509.91 (195.82%)
disk-241 6850.81 ( 0.00%) 20397.80 (197.74%)
disk-301 6815.22 ( 0.00%) 20287.58 (197.68%)
disk-361 7080.40 ( 0.00%) 20205.22 (185.37%)
disk-421 7076.13 ( 0.00%) 19957.33 (182.04%)
disk-481 7083.25 ( 0.00%) 19784.06 (179.31%)
disk-541 7038.39 ( 0.00%) 19610.92 (178.63%)
disk-601 7072.04 ( 0.00%) 19464.53 (175.23%)
disk-661 7010.97 ( 0.00%) 19348.23 (175.97%)
disk-721 7069.44 ( 0.00%) 19255.33 (172.37%)
disk-781 7007.58 ( 0.00%) 19103.14 (172.61%)
disk-841 6981.18 ( 0.00%) 18964.22 (171.65%)
disk-901 6968.47 ( 0.00%) 18826.72 (170.17%)
disk-961 6964.61 ( 0.00%) 18708.02 (168.62%)Signed-off-by: Davidlohr Bueso
Signed-off-by: Peter Zijlstra (Intel)
Cc: "Paul E. McKenney"
Cc: Thomas Gleixner
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/1420573509-24774-7-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar -
We have two flavors of the MCS spinlock: standard and cancelable (OSQ).
While each one is independent of the other, we currently mix and match
them. This patch:- Moves the OSQ code out of mcs_spinlock.h (which only deals with the traditional
version) into include/linux/osq_lock.h. No unnecessary code is added to the
more global header file, anything locks that make use of OSQ must include
it anyway.- Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.
- Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock
if there is support for it.Signed-off-by: Davidlohr Bueso
Signed-off-by: Peter Zijlstra (Intel)
Cc: Thomas Gleixner
Cc: "Paul E. McKenney"
Cc: Jason Low
Cc: Linus Torvalds
Cc: Mikulas Patocka
Cc: Waiman Long
Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar