08 Jun, 2019

1 commit

  • The lockref cmpxchg loop is unbounded as long as the spinlock is not
    taken. Depending on the hardware implementation of compare-and-swap,
    a high number of loop retries can occur.

    Add an upper bound to the loop to force the fallback to spinlocks
    after some time. A retry value of 100 should not impact any hardware
    that does not have this issue.

    With the retry limit, the performance of an open-close test case
    improved by 60-70% on ThunderX2.
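
    In outline, the bounded fast path looks like the sketch below: plain
    user-space C11 with an illustrative bit layout (bit 0 standing in for
    the spinlock), not the kernel's actual lockref code.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Bit 0 models the spinlock; bits 1..63 hold the reference count. */
      #define LOCK_BIT             1ULL
      #define COUNT_ONE            2ULL
      #define CMPXCHG_LOOP_RETRIES 100  /* upper bound added by this patch */

      static bool lockref_get_fast(_Atomic uint64_t *lock_count)
      {
          uint64_t old = atomic_load_explicit(lock_count,
                                              memory_order_relaxed);
          int retry = CMPXCHG_LOOP_RETRIES;

          while (!(old & LOCK_BIT)) {          /* spinlock not taken */
              /* Try to bump the count; a failed CAS reloads 'old'. */
              if (atomic_compare_exchange_weak(lock_count, &old,
                                               old + COUNT_ONE))
                  return true;                 /* lockless increment won */
              if (--retry == 0)
                  break;                       /* bound hit: stop spinning */
          }
          return false;                        /* caller takes the spinlock */
      }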

    Suggested-by: Linus Torvalds
    Signed-off-by: Jan Glauber
    Signed-off-by: Linus Torvalds

    Jan Glauber
     

13 Apr, 2018

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
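
    For reference, the identifier is a single comment at the very top of a
    file; the C++-style form is used in .c files and the block-comment form
    in headers, for example:

      // SPDX-License-Identifier: GPL-2.0
      /* SPDX-License-Identifier: GPL-2.0 */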

    This patch is based on work done by Thomas Gleixner, Kate Stewart, and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and where references to the
    license had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be
    applied to a file was done in a spreadsheet of side-by-side results
    from the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files, created by Philippe Ombredanne.
    Philippe prepared the base worksheet and did an initial spot review
    of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to each file. She confirmed any determination that was
    not immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging
    were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
      lines of source.
    - The file already had some variant of a license header in it (even if <5
      lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

16 Nov, 2016

1 commit

  • With the s390 special case of a yielding cpu_relax() implementation gone,
    we can now remove all users of cpu_relax_lowlatency() and replace them
    with cpu_relax().

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Nicholas Piggin
    Cc: Noam Camus
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
    Signed-off-by: Ingo Molnar

    Christian Borntraeger
     

12 Aug, 2015

1 commit

  • cmpxchg64_relaxed() is now defined by linux/atomic.h, so we can
    remove our local definition from the lockref code.

    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1438880084-18856-5-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

24 Feb, 2015

1 commit

  • With the new standardized functions, we can replace all
    ACCESS_ONCE() calls across the relevant locking code; this includes
    lockref and seqlock.

    ACCESS_ONCE() does not work reliably on non-scalar types.
    For example, gcc 4.6 and 4.7 might remove the volatile tag
    for such accesses during the SRA (scalar replacement of
    aggregates) step:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

    Update the calls regardless of whether the type is scalar;
    this is cleaner than having three alternatives.
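
    A sketch of the difference (simplified; the real definitions live in
    include/linux/compiler.h, and the real READ_ONCE() also keeps volatile
    accesses for scalar sizes):

      /* The old macro is a plain volatile cast; for aggregate (struct)
       * arguments, gcc 4.6/4.7's SRA pass could drop the volatile. */
      #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

      /* Simplified sketch of READ_ONCE(): copying through a union
       * survives SRA for scalar and aggregate types alike. */
      #define READ_ONCE(x)                                          \
      ({                                                            \
              union { typeof(x) val; char c[sizeof(x)]; } u;        \
              __builtin_memcpy(u.c, (const void *)&(x), sizeof(x)); \
              u.val;                                                \
      })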

    Signed-off-by: Davidlohr Bueso
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1424662301.6539.18.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

26 Jan, 2015

1 commit


17 Jul, 2014

1 commit

  • The arch_mutex_cpu_relax() function, introduced by 34b133f, is
    hacky and ugly. It was added a few years ago to address the fact
    that common cpu_relax() calls include yielding on s390, and thus
    impact the optimistic spinning functionality of mutexes. Nowadays
    we use this function well beyond mutexes: rwsem, qrwlock, mcs and
    lockref. Since the macro that defines the call is in the mutex header,
    any users must include mutex.h, and the naming is misleading as well.

    This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
    only if you can do it with very low latency") and (ii) defines it in
    each arch's asm/processor.h local header, just like for regular cpu_relax
    functions. On all archs except s390, cpu_relax_lowlatency is simply
    cpu_relax, and thus we can take it out of mutex.h. While this can seem
    redundant, I believe it is a good choice, as it allows us to move
    arch-specific logic out of generic locking primitives and enables future
    archs to transparently define it, similarly to System z.
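
    The resulting per-arch definitions are one-liners; a sketch of the two
    cases (illustrative, following the description above):

      /* every arch except s390, in its asm/processor.h: */
      #define cpu_relax_lowlatency() cpu_relax()

      /* s390, whose regular cpu_relax() yields the virtual cpu: */
      #define cpu_relax_lowlatency() barrier()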

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Anton Blanchard
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Bharat Bhushan
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: David Howells
    Cc: David S. Miller
    Cc: Deepthi Dharwar
    Cc: Dominik Dingel
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Hirokazu Takata
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: James Hogan
    Cc: Jason Wang
    Cc: Jesper Nilsson
    Cc: Joe Perches
    Cc: Jonas Bonn
    Cc: Joseph Myers
    Cc: Kees Cook
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Linus Torvalds
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Neuling
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Nicolas Pitre
    Cc: Paolo Bonzini
    Cc: Paul Burton
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Paul Mackerras
    Cc: Qais Yousef
    Cc: Qiaowei Ren
    Cc: Rafael Wysocki
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Russell King
    Cc: Steven Miao
    Cc: Steven Rostedt
    Cc: Stratos Karafotis
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Vasily Kulikov
    Cc: Vineet Gupta
    Cc: Vineet Gupta
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: Wolfram Sang
    Cc: adi-buildroot-devel@lists.sourceforge.net
    Cc: linux390@de.ibm.com
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-am33-list@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-c6x-dev@linux-c6x.org
    Cc: linux-cris-kernel@axis.com
    Cc: linux-hexagon@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux@lists.openrisc.net
    Cc: linux-m32r-ja@ml.linux-m32r.org
    Cc: linux-m32r@ml.linux-m32r.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-metag@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

28 Nov, 2013

1 commit


15 Nov, 2013

1 commit


11 Nov, 2013

1 commit

  • Pull gfs2 updates from Steven Whitehouse:
    "The main feature of interest this time is quota updates. There are
    some clean ups and some patches to use the new generic lru list code.

    There is still plenty of scope for some further changes in due course -
    faster lookups of quota structures is very much on the todo list.
    Also, a start has been made towards the more tricky issue of using the
    generic lru code with glocks, but that will have to be completed in a
    subsequent merge window.

    The other, more minor feature is that there have been a number of
    performance patches which relate to block allocation. In particular,
    they will improve performance when the disk is nearly full"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Use generic list_lru for quota
    GFS2: Rename quota qd_lru_lock qd_lock
    GFS2: Use reflink for quota data cache
    GFS2: Use lockref for glocks
    GFS2: Protect quota sync generation
    GFS2: Inline qd_trylock into gfs2_quota_unlock
    GFS2: Make two similar quota code fragments into a function
    GFS2: Remove obsolete quota tunable
    GFS2: Move gfs2_icbit_munge into quota.c
    GFS2: Speed up starting point selection for block allocation
    GFS2: Add allocation parameters structure
    GFS2: Clean up reservation removal
    GFS2: fix dentry leaks
    GFS2: new function gfs2_rbm_incr
    GFS2: Introduce rbm field bii
    GFS2: Do not reset flags on active reservations
    GFS2: introduce bi_blocks for optimization
    GFS2: optimize rbm_from_block wrt bi_start
    GFS2: d_splice_alias() can't return error

    Linus Torvalds
     

15 Oct, 2013

1 commit

  • Currently glocks have an atomic reference count and also a spinlock
    which covers various internal fields, such as the state. The intent of
    this patch is to replace the spinlock and the atomic reference count
    with a lockref structure. This contains a spinlock which we can continue
    to use as before, and a reference counter which is used in conjunction
    with the spinlock to replace the previous atomic counter.

    As a result of this there are some new rules for reference counting on
    glocks. We need to distinguish between reference count changes under
    gl_spin (which are now just increment or decrement of the new counter,
    provided the count cannot hit zero) and those which are outside of
    gl_spin, but which now take gl_spin internally.

    The conversion is relatively straightforward. There is probably some
    further clean-up which can be done, but the priority at this stage is to
    make the change in as simple a manner as possible.

    A consequence of this change is that the reference count is being
    decoupled from the lru list processing. This should allow future
    adoption of the lru_list code with glocks in due course.

    The reason for using the "dead" state, rather than just relying on 0
    being the "invalid state", is so that 0 ref counts can be allowed in due
    course. The intent is to eventually be able to remove the ref count
    changes which are currently hidden away in state_change().
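
    In outline, the conversion swaps two fields for one embedded lockref;
    the sketch below is simplified, with field names taken from the
    description above rather than the full gfs2 headers, and the two
    snapshots are not meant to compile together:

      /* before: separate lock and atomic reference count */
      struct gfs2_glock {
              spinlock_t gl_spin;         /* covers internal fields */
              atomic_t   gl_ref;          /* reference count */
              /* ... */
      };

      /* after: a lockref provides both; under the spinlock the count may
       * be changed non-atomically, provided it cannot hit zero */
      struct gfs2_glock {
              struct lockref gl_lockref;  /* .lock replaces gl_spin */
              /* ... */
      };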

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

28 Sep, 2013

2 commits

  • Make use of arch_mutex_cpu_relax() so architectures can override the
    default cpu_relax() semantics.
    This is especially useful for s390, where cpu_relax() means that we
    yield() the current (virtual) cpu; that is very expensive and would
    contradict the whole purpose of the lockless cmpxchg loop.
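
    The override mechanism is a simple fallback define; a sketch of how it
    worked at the time (the config symbol is my recollection of the mutex
    header, so treat it as illustrative):

      /* include/linux/mutex.h: architectures that do not provide their
       * own arch_mutex_cpu_relax() get plain cpu_relax() */
      #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
      #define arch_mutex_cpu_relax() cpu_relax()
      #endif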

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • The 64-bit cmpxchg operation on the lockref is ordered by virtue of
    hazarding between the cmpxchg operation and the reference count
    manipulation. On weakly ordered memory architectures (such as ARM), it
    can be of great benefit to omit the barrier instructions where they are
    not needed.

    This patch moves the lockless lockref code over to a cmpxchg64_relaxed
    operation, which doesn't provide barrier semantics. If the operation
    isn't defined, we simply #define it as the usual 64-bit cmpxchg macro.
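
    The fallback is exactly the usual #ifndef dance, per the description
    above (as added to lib/lockref.c):

      #ifndef cmpxchg64_relaxed
      # define cmpxchg64_relaxed cmpxchg64
      #endif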

    Cc: Waiman Long
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
     

21 Sep, 2013

1 commit

  • The cmpxchg() function tends not to support 64-bit arguments on 32-bit
    architectures. This could be either due to use of unsigned long
    arguments (like on ARM) or lack of instruction support (cmpxchgq on
    x86). However, these architectures may implement a specific cmpxchg64()
    function to provide 64-bit cmpxchg support instead.

    Since the lockref code requires a 64-bit cmpxchg and relies on the
    architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64
    instead of cmpxchg and allow 32-bit architectures to make use of the
    lockless lockref implementation.

    Cc: Waiman Long
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
     

08 Sep, 2013

2 commits

  • The only actual current lockref user (dcache) uses zero reference counts
    even for perfectly live dentries, because it's a cache: there may not be
    any users, but that doesn't mean that we want to throw away the dentry.

    At the same time, the dentry cache does have a notion of a truly "dead"
    dentry that we must not even increment the reference count of, because
    we have pruned it and it is not valid.

    Currently that distinction is not visible in the lockref itself, and the
    dentry cache validation uses "lockref_get_or_lock()" to either get a new
    reference to a dentry that already had existing references (and thus
    cannot be dead), or get the dentry lock so that we can then verify the
    dentry and increment the reference count under the lock if that
    verification was successful.

    That's all somewhat complicated.

    This adds the concept of being "dead" to the lockref itself, by simply
    using a count that is negative. This allows a usage scenario where we
    can increment the refcount of a dentry without having to validate it,
    and pushing the special "we killed it" case into the lockref code.

    The dentry code itself doesn't actually use this yet, and it's probably
    too late in the merge window to do that code (the dentry_kill() code
    with its "should I decrement the count" logic really is pretty complex
    code), but let's introduce the concept at the lockref level now.
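
    A sketch of the new primitive (simplified from lib/lockref.c; the
    magic value is illustrative of "any negative count means dead"):

      /*
       * lockref_mark_dead - mark the lockref dead, so lockless users
       * refuse to take new references. Must hold the spinlock.
       */
      void lockref_mark_dead(struct lockref *lockref)
      {
              assert_spin_locked(&lockref->lock);
              lockref->count = -128;
      }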

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The code got rewritten, but the comments got copied as-is from older
    versions, and as a result the argument name in the comment didn't
    actually match the code any more.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Sep, 2013

1 commit

  • While we are likely to succeed and break out of this loop, it isn't
    guaranteed. We should be power- and thread-friendly if we do have to
    go around for a second (or third, or more) attempt.

    Signed-off-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Luck, Tony
     

03 Sep, 2013

2 commits

  • Instead of taking the spinlock, the lockless versions atomically check
    that the lock is not taken, and do the reference count update using a
    cmpxchg() loop. This is semantically identical to doing the reference
    count update protected by the lock, but avoids the "wait for lock"
    contention that you get when accesses to the reference count are
    contended.

    Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
    Even when the lockref reference counts are updated atomically with
    cmpxchg, the fact that they also verify the state of the spinlock means
    that the lockless updates can never happen while somebody else holds the
    spinlock.

    So while "lockref_put_or_lock()" looks a lot like just another name for
    "atomic_dec_and_lock()", and both optimize to lockless updates, they are
    fundamentally different: the decrement done by atomic_dec_and_lock() is
    truly independent of any lock (as long as it doesn't decrement to zero),
    so a locked region can still see the count change.

    The lockref structure, in contrast, really is a *locked* reference
    count. If you hold the spinlock, the reference count will be stable and
    you can modify the reference count without using atomics, because even
    the lockless updates will see and respect the state of the lock.

    In order to enable the cmpxchg lockless code, the architecture needs to
    do three things:

    (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
    in an aligned u64, and have a "cmpxchg()" implementation that works
    on such a u64 data type.

    (2) define a helper function to test for a spinlock being unlocked
    ("arch_spin_value_unlocked()")

    (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
    Kconfig file.

    This enables it for x86-64 (but not 32-bit, we'd need to make sure
    cmpxchg() turns into the proper cmpxchg8b in order to enable it for
    32-bit mode).
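
    Requirement (1) is visible directly in the data structure: the lock and
    the count share a single aligned 64-bit word that the cmpxchg targets
    (simplified from include/linux/lockref.h):

      struct lockref {
              union {
      #ifdef CONFIG_CMPXCHG_LOCKREF
                      aligned_u64 lock_count;  /* single cmpxchg() target */
      #endif
                      struct {
                              spinlock_t lock;
                              unsigned int count;
                      };
              };
      };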

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • They aren't very good to inline, since they already call external
    functions (the spinlock code), and we're going to create rather more
    complicated versions of them that can do the reference count updates
    locklessly.

    Signed-off-by: Linus Torvalds

    Linus Torvalds