26 May, 2017
1 commit
-
Saw these compile errors on SPARC when queued rwlock feature is enabled.
CC kernel/locking/qrwlock.o
kernel/locking/qrwlock.c: In function ‘queued_read_lock_slowpath’:
kernel/locking/qrwlock.c:89: error: implicit declaration of function ‘arch_spin_lock’
kernel/locking/qrwlock.c:102: error: implicit declaration of function ‘arch_spin_unlock’
make[4]: *** [kernel/locking/qrwlock.o] Error 1

Include spinlock.h in qrwlock.c to fix it.
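A minimal sketch of the fix, showing only the added line (the file's
surrounding includes are omitted, and the exact header path is assumed
to be the generic one):

    /* kernel/locking/qrwlock.c: pull in the arch_spin_lock() /
     * arch_spin_unlock() declarations directly instead of relying on
     * another header to include them indirectly. */
    #include <linux/spinlock.h>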
Signed-off-by: Babu Moger
Reviewed-by: Håkon Bugge
Reviewed-by: Jane Chu
Reviewed-by: Shannon Nelson
Reviewed-by: Vijay Kumar
Signed-off-by: David S. Miller
16 Nov, 2016
1 commit
-
With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax(). (A sketch follows this entry.)

Signed-off-by: Christian Borntraeger
Signed-off-by: Peter Zijlstra (Intel)
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Nicholas Piggin
Cc: Noam Camus
Cc: Peter Zijlstra
Cc: Russell King
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar
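A hypothetical call site illustrating the mechanical rename (this loop
is illustrative, not a specific hunk from the patch):

    /* Spin until another CPU sets the flag; only the relax
     * primitive changes. */
    static void wait_for_flag(atomic_t *flag)
    {
            while (!atomic_read(flag))
                    cpu_relax();    /* was: cpu_relax_lowlatency(); */
    }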
16 Jun, 2016
1 commit
-
The only reason for the current code is to make GCC emit only the
"LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
the result); this patch achieves the same in a cleaner way. (A sketch
follows this entry.)

Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Waiman Long
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
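A sketch of the cleanup, assuming the old code subtracted the bias back
out of atomic_add_return() so that GCC could fold everything into the
LOCK XADD result; atomic_fetch_add_acquire() returns the old value
directly, so no arithmetic on the result is needed:

    /* Before: rely on GCC folding the subtraction into the XADD. */
    cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;

    /* After: fetch_add already returns the pre-add value. */
    cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);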
18 Sep, 2015
1 commit
-
... trivial, but reads a little nicer when we name our
actual primitive 'lock'.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Waiman Long
Link: http://lkml.kernel.org/r/1442216244-4409-1-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar
12 Aug, 2015
1 commit
-
The qrwlock implementation is slightly heavy in its use of memory
barriers, mainly through the use of _cmpxchg() and _return() atomics, which
imply full barrier semantics.

This patch modifies the qrwlock code to use the more relaxed atomic
routines so that we can reduce the unnecessary barrier overhead on
weakly-ordered architectures. (A sketch follows this entry.)

Signed-off-by: Will Deacon
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1438880084-18856-7-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar
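A sketch of the kind of relaxation described above, using the writer
fast path as an example (acquire semantics on the successful transition
are sufficient; a full barrier on every attempt is not):

    /* Before: cmpxchg with full barrier semantics. */
    if (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0)
            return;

    /* After: acquire ordering only, cheaper on weakly-ordered archs. */
    if (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0)
            return;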
03 Aug, 2015
1 commit
-
Currently, a reader will check first to make sure that the writer mode
byte is cleared before incrementing the reader count. That waiting is
not really necessary. It increases the latency in the reader/writer
to reader transition and reduces reader performance.

This patch eliminates that waiting (a sketch follows this entry). It
also has the side effect
of reducing the chance of writer lock stealing and improving the
fairness of the lock. Using a locking microbenchmark, a 10-threads 5M
locking loop of mostly readers (RW ratio = 10,000:1) has the following
performance numbers in a Haswell-EX box:

Kernel         Locking Rate (Kops/s)
------         ---------------------
4.1.1          15,063,081
4.1.1+patch    17,241,552 (+14.4%)

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Douglas Hatch
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: Will Deacon
Link: http://lkml.kernel.org/r/1436459543-29126-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar
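A sketch of the reader slowpath change described above (the code shape
is illustrative, not the exact diff):

    /* Before: wait for the writer mode byte to clear, then count in. */
    while (atomic_read(&lock->cnts) & _QW_WMASK)
            cpu_relax_lowlatency();
    cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;

    /* After: count in immediately; the reader still spins afterwards
     * if a writer actually holds the lock, but the extra up-front
     * wait on the writer byte is gone. */
    cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;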
06 Jul, 2015
2 commits
-
The qrwlock is fair in the process context, but becomes unfair in
the interrupt context to support use cases like the tasklist_lock.

The current code isn't that well-documented on what happens when
in the interrupt context. The rspin_until_writer_unlock() will only
spin if the writer has gotten the lock. If the writer is still in the
waiting state, the increment in the reader count will cause the writer
to remain in the waiting state and the new interrupt context reader
will get the lock and return immediately. The current code, however,
does an additional read of the lock value which is not necessary, as
the information is already available from the fast path. This may
sometimes cause an additional cacheline transfer when the lock is
highly contended.

This patch passes the lock value obtained in the fast path to the
slow path to eliminate the additional read (a sketch follows this
entry). It also documents the action for the interrupt context
readers more clearly.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Will Deacon
Cc: Arnd Bergmann
Cc: Douglas Hatch
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1434729002-57724-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar
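A sketch of the interface change described above, assuming the reader
fast path keeps the just-read lock value in a local variable (the code
shape is illustrative, not the exact diff):

    /* Fast path: hand the value we already read to the slowpath, so
     * the slowpath need not re-read the (contended) lock word. */
    cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
    if (likely(!(cnts & _QW_WMASK)))
            return;
    queued_read_lock_slowpath(lock, cnts);  /* was: ...slowpath(lock); */

-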
To sync up with the naming convention used in qspinlock, all the
qrwlock functions were renamed to start with "queued" instead of
"queue".

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnd Bergmann
Cc: Douglas Hatch
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Thomas Gleixner
Cc: Will Deacon
Link: http://lkml.kernel.org/r/1434729002-57724-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar
19 Jun, 2015
1 commit
-
The current cmpxchg() loop in setting the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers
causing possibly extra cmpxchg() operations that are wasteful. This
patch changes the code to do a byte cmpxchg() to eliminate contention
with new readers. (A sketch follows this entry.)

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

Threads    w/o patch    with patch    % change
-------    ---------    ----------    --------
      2          990           895       -9.6%
      3         2136          1912      -10.5%
      4         3166          2830      -10.6%
      5         3953          3629       -8.2%
      6         4628          4405       -4.8%
      7         5344          5197       -2.8%
      8         6065          6004       -1.0%
      9         6826          6811       -0.2%
     10         7599          7599        0.0%
     15         9757          9766       +0.1%
     20        13767         13817       +0.4%

With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.

Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Borislav Petkov
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1433863153-30722-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar
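A sketch of the byte cmpxchg described above, assuming a __qrwlock view
that exposes the writer mode byte (names follow the qrwlock code of
that era, but treat the shape as illustrative):

    /* Before: full-word cmpxchg; every reader-count update makes the
     * comparison fail and forces an extra retry. */
    for (;;) {
            cnts = atomic_read(&lock->cnts);
            if (!(cnts & _QW_WMASK) &&
                (atomic_cmpxchg(&lock->cnts, cnts,
                                cnts | _QW_WAITING) == cnts))
                    break;
            cpu_relax_lowlatency();
    }

    /* After: cmpxchg only the writer byte, so concurrent reader-count
     * updates no longer interfere with setting _QW_WAITING. */
    struct __qrwlock *l = (struct __qrwlock *)lock;
    for (;;) {
            if (!READ_ONCE(l->wmode) &&
                (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
                    break;
            cpu_relax_lowlatency();
    }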
12 May, 2015
1 commit
-
To be consistent with the queued spinlocks which use
CONFIG_QUEUED_SPINLOCKS config parameter, the one for the queued
rwlocks is now renamed to CONFIG_QUEUED_RWLOCKS.

Signed-off-by: Waiman Long
Cc: Borislav Petkov
Cc: Douglas Hatch
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Scott J Norton
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1431367031-36697-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar
17 Jul, 2014
1 commit
-
The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call is in the mutex header,
any users must include mutex.h and the naming is misleading as well.

This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z. (A sketch follows this entry.)

Signed-off-by: Davidlohr Bueso
Signed-off-by: Peter Zijlstra
Cc: Andrew Morton
Cc: Anton Blanchard
Cc: Aurelien Jacquiot
Cc: Benjamin Herrenschmidt
Cc: Bharat Bhushan
Cc: Catalin Marinas
Cc: Chen Liqin
Cc: Chris Metcalf
Cc: Christian Borntraeger
Cc: Chris Zankel
Cc: David Howells
Cc: David S. Miller
Cc: Deepthi Dharwar
Cc: Dominik Dingel
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Guan Xuetao
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Hirokazu Takata
Cc: Ivan Kokshaysky
Cc: James E.J. Bottomley
Cc: James Hogan
Cc: Jason Wang
Cc: Jesper Nilsson
Cc: Joe Perches
Cc: Jonas Bonn
Cc: Joseph Myers
Cc: Kees Cook
Cc: Koichi Yasutake
Cc: Lennox Wu
Cc: Linus Torvalds
Cc: Mark Salter
Cc: Martin Schwidefsky
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Neuling
Cc: Michal Simek
Cc: Mikael Starvik
Cc: Nicolas Pitre
Cc: Paolo Bonzini
Cc: Paul Burton
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Paul Mackerras
Cc: Qais Yousef
Cc: Qiaowei Ren
Cc: Rafael Wysocki
Cc: Ralf Baechle
Cc: Richard Henderson
Cc: Richard Kuo
Cc: Russell King
Cc: Steven Miao
Cc: Steven Rostedt
Cc: Stratos Karafotis
Cc: Tim Chen
Cc: Tony Luck
Cc: Vasily Kulikov
Cc: Vineet Gupta
Cc: Vineet Gupta
Cc: Waiman Long
Cc: Will Deacon
Cc: Wolfram Sang
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar
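A sketch of the per-arch definitions described above (the s390 variant
shown is an assumption based on the commit's rationale, not quoted from
the patch):

    /* Most architectures, in asm/processor.h: the low-latency variant
     * is just the regular cpu_relax(). */
    #define cpu_relax_lowlatency()  cpu_relax()

    /* s390: cpu_relax() may yield to the hypervisor, so the
     * low-latency variant avoids that. */
    #define cpu_relax_lowlatency()  barrier()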
06 Jun, 2014
1 commit
-
This rwlock uses the arch_spin_lock_t as a waitqueue, and assuming the
arch_spin_lock_t is a fair lock (ticket, MCS, etc.) the resulting
rwlock is a fair lock.

It fits in the same 8 bytes as the regular rwlock_t by folding the
reader and writer count into a single integer, using the remaining 4
bytes for the arch_spinlock_t.

Architectures that can single-copy address bytes can optimize
queue_write_unlock() with a 0 write to the LSB (the write count); a
sketch follows the table below.

Performance as measured by Davidlohr Bueso (rwlock_t -> qrwlock_t):
+--------------+-------------+---------------+
| Workload     | #users      | delta         |
+--------------+-------------+---------------+
| alltests     | > 1400      | -4.83%        |
| custom       | 0-100,> 100 | +1.43%,-1.57% |
| high_systime | > 1000      | -2.61%        |
| shared       | all         | +0.32%        |
+--------------+-------------+---------------+

http://www.stgolabs.net/qrwlock-stuff/aim7-results-vs-rwsem_optsin/
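A sketch of the layout and the byte-store unlock described above (field
names follow the asm-generic qrwlock of that era; treat the unlock
variant as illustrative):

    typedef struct qrwlock {
            atomic_t        cnts;   /* writer count in the LSB,
                                     * reader count above it */
            arch_spinlock_t lock;   /* fair waitqueue for contenders */
    } arch_rwlock_t;

    /* On architectures with single-copy atomic byte stores, the writer
     * can release the lock by zeroing just the write-count byte. */
    static inline void queue_write_unlock(struct qrwlock *lock)
    {
            smp_store_release((u8 *)&lock->cnts, 0);
    }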
Signed-off-by: Waiman Long
[peterz: near complete rewrite]
Signed-off-by: Peter Zijlstra
Cc: Arnd Bergmann
Cc: Linus Torvalds
Cc: "Paul E.McKenney"
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-gac1nnl3wvs2ij87zv2xkdzq@git.kernel.org
Signed-off-by: Ingo Molnar