11 Jun, 2020

1 commit

  • Currently instrumentation of atomic primitives is done at the architecture
    level, while composites or fallbacks are provided at the generic level.

    The result is that there are no uninstrumented variants of the
    fallbacks. Since such variants are now needed to isolate text poke
    from any form of instrumentation, invert this ordering.

    Doing this means moving the instrumentation into the generic code as
    well as having (for now) two variants of the fallbacks.
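
    For illustration, an instrumented generic wrapper ends up looking
    roughly like this (an editorial sketch, not the generated header
    verbatim; the exact instrumentation hook names have varied over time):

    static __always_inline void
    atomic_inc(atomic_t *v)
    {
            /* report the access to the sanitizers, then defer to the arch op */
            instrument_atomic_write(v, sizeof(*v));
            arch_atomic_inc(v);
    }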

    Notes:

    - the various *cond_read* primitives are not proper fallbacks
    and got moved into linux/atomic.h. No arch_ variants are
    generated because the base primitives smp_cond_load*()
    are instrumented.

    - once all architectures are moved over to arch_atomic_, one of the
    fallback variants can be removed and some 2300 lines reclaimed.

    - atomic_{read,set}*() are no longer double-instrumented

    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: Mark Rutland
    Link: https://lkml.kernel.org/r/20200505134058.769149955@linutronix.de

    Peter Zijlstra
     

01 Nov, 2018

1 commit

  • As a step to ensuring the atomic* APIs are consistent, switch to fallbacks
    generated by gen-atomic-fallback.sh.

    These are checked in rather than generated with Kbuild, since:

    * This allows inspection of the atomics with git grep and ctags on a
    pristine tree, which Linus strongly prefers being able to do.

    * The fallbacks are not affected by machine details or configuration
    options, so it is not necessary to regenerate them to take these into
    account.

    * These are included by files required *very* early in the build process
    (e.g. for generating bounds.h), and we'd rather not complicate the
    top-level Kbuild file with dependencies.

    The new fallback header should be equivalent to the old fallbacks in
    <linux/atomic.h>, but:

    * It is formatted a little differently due to scripting ensuring things
    are more regular than they used to be.

    * Fallbacks are now expanded in-place as static inline functions rather
    than macros.

    * The prototypes for fallbacks are arranged consistently with the return
    type on a separate line to try to keep to a sensible line length.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: catalin.marinas@arm.com
    Cc: linuxdrivers@attotech.com
    Cc: dvyukov@google.com
    Cc: Boqun Feng
    Cc: arnd@arndb.de
    Cc: aryabinin@virtuozzo.com
    Cc: glider@google.com
    Link: http://lkml.kernel.org/r/20180904104830.2975-3-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     

25 Jul, 2018

1 commit

  • Currently architectures can override __atomic_op_*() to define the barriers
    used before/after a relaxed atomic when used to build acquire/release/fence
    variants.

    This has the unfortunate property of requiring the architecture to define the
    full wrapper for the atomics, rather than just the barriers they care about,
    and gets in the way of generating atomics which can be easily read.

    Instead, this patch has architectures define an optional set of barriers:

    * __atomic_acquire_fence()
    * __atomic_release_fence()
    * __atomic_pre_full_fence()
    * __atomic_post_full_fence()

    ... which <linux/atomic.h> uses to build the wrappers.
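
    As a rough sketch, the acquire form of an operation is then composed
    from its _relaxed form plus the architecture's hook, along the lines of
    the __atomic_op_acquire() wrapper:

    #define __atomic_op_acquire(op, args...)                            \
    ({                                                                  \
            typeof(op##_relaxed(args)) __ret = op##_relaxed(args);      \
            __atomic_acquire_fence();                                   \
            __ret;                                                      \
    })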

    It would be nice if we could undef these, along with the __atomic_op_*()
    wrappers, but that would break the cmpxchg() wrappers, which are written
    in the preprocessor. Undefs would have been nice, but alas.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrea Parri
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: andy.shevchenko@gmail.com
    Cc: arnd@arndb.de
    Cc: aryabinin@virtuozzo.com
    Cc: catalin.marinas@arm.com
    Cc: dvyukov@google.com
    Cc: glider@google.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: peter@hurleysoftware.com
    Link: http://lkml.kernel.org/r/20180716113017.3909-7-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     

21 Jun, 2018

11 commits

  • The ifdeffery for atomic*_{fetch_,}andnot() is unlike that for all the
    other atomics. If atomic*_andnot() is not defined, the corresponding
    atomic*_fetch_andnot() is assumed to not be defined.

    Additionally, the fallbacks for the various ordering cases are written
    much later in atomic.h as static inlines.

    This isn't problematic today, but gets in the way of scripting the
    generation of atomics. To prepare for scripting, this patch:

    * Switches to separate ifdefs for atomic*_andnot() and
    atomic*_fetch_andnot(), updating implementations as appropriate.

    * Moves the fallbacks into the standard ifdefs, as macro expansions
    rather than static inlines.

    * Removes trivial andnot implementations from architectures, where these
    are superseded by core code.
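
    A sketch of the resulting fallbacks (assuming the usual atomic_and()
    and atomic_fetch_and() primitives; not the verbatim kernel code):

    #ifndef atomic_andnot
    #define atomic_andnot(i, v)             atomic_and(~(int)(i), (v))
    #endif

    #ifndef atomic_fetch_andnot
    #define atomic_fetch_andnot(i, v)       atomic_fetch_and(~(int)(i), (v))
    #endif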

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-19-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • The conditional inc/dec ops differ for atomic_t and atomic64_t:

    - atomic_inc_unless_positive() is optional for atomic_t, and doesn't exist for atomic64_t.
    - atomic_dec_unless_negative() is optional for atomic_t, and doesn't exist for atomic64_t.
    - atomic_dec_if_positive() is optional for atomic_t, and is mandatory for atomic64_t.

    Let's make these consistently optional for both. At the same time, let's
    clean up the existing fallbacks to use atomic_try_cmpxchg().
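
    For example, the atomic_inc_unless_negative() fallback then takes
    roughly this shape (a sketch of the try_cmpxchg() pattern):

    static inline bool atomic_inc_unless_negative(atomic_t *v)
    {
            int c = atomic_read(v);

            do {
                    if (unlikely(c < 0))
                            return false;
            } while (!atomic_try_cmpxchg(v, &c, c + 1));

            return true;
    }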

    The instrumented atomics are updated accordingly.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-18-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Many of the inc/dec ops are mandatory, but for most architectures inc/dec are
    simply trivial wrappers around their corresponding add/sub ops.

    Let's make all the inc/dec ops optional, so that we can get rid of these
    boilerplate wrappers.
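
    The boilerplate in question amounts to wrappers of roughly this shape,
    which the generic header can now supply as fallbacks (sketch):

    #ifndef atomic_inc
    #define atomic_inc(v)                   atomic_add(1, (v))
    #endif

    #ifndef atomic_dec_return
    #define atomic_dec_return(v)            atomic_sub_return(1, (v))
    #endif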

    The instrumented atomics are updated accordingly.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Palmer Dabbelt
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-17-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Some of the atomics return the result of a test applied after the atomic
    operation, and almost all architectures implement these as trivial
    wrappers around the underlying atomic. Specifically:

    * _inc_and_test(v) is (_inc_return(v) == 0)
    * _dec_and_test(v) is (_dec_return(v) == 0)
    * _sub_and_test(i, v) is (_sub_return(i, v) == 0)
    * _add_negative(i, v) is (_add_return(i, v) < 0)

    Rather than have these definitions duplicated in all architectures, with
    minor inconsistencies in formatting and documentation, let's make these
    operations optional, with default fallbacks as above. Implementations
    must now provide a preprocessor symbol.
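
    A sketch of the default fallbacks, guarded by the preprocessor symbol
    an architecture defines when it provides its own implementation:

    #ifndef atomic_inc_and_test
    #define atomic_inc_and_test(v)          (atomic_inc_return(v) == 0)
    #endif

    #ifndef atomic_add_negative
    #define atomic_add_negative(i, v)       (atomic_add_return((i), (v)) < 0)
    #endif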

    The instrumented atomics are updated accordingly.

    Both x86 and m68k have custom implementations, which are left as-is,
    given preprocessor symbols to avoid being overridden.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Geert Uytterhoeven
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Palmer Dabbelt
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Architectures with atomic64_fetch_add_unless() provide a preprocessor
    symbol if they do so, and all other architectures have trivial C
    implementations of atomic64_add_unless() which are near-identical.

    Let's unify the trivial definitions of atomic64_fetch_add_unless() in
    <linux/atomic.h>, so that we always have both
    atomic64_fetch_add_unless() and atomic64_add_unless() with less
    boilerplate code.

    This means that atomic64_add_unless() is always implemented in core
    code, and the instrumented atomics are updated accordingly.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-15-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Currently all architectures must implement atomic_fetch_add_unless(),
    with common code providing atomic_add_unless(). Architectures must also
    implement atomic64_add_unless() directly, with no corresponding
    atomic64_fetch_add_unless().

    This divergence is unfortunate, and means that the APIs for atomic_t,
    atomic64_t, and atomic_long_t differ.

    In preparation for unifying things, with architectures providing
    atomic64_fetch_add_unless, this patch adds a generic
    atomic64_add_unless() which will use atomic64_fetch_add_unless(). The
    instrumented atomics are updated to take this case into account.
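
    The generic definition reduces to a one-liner (sketch): the add was
    skipped only if the old value was already u.

    static inline bool atomic64_add_unless(atomic64_t *v, long long a, long long u)
    {
            return atomic64_fetch_add_unless(v, a, u) != u;
    }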

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Albert Ou
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Boqun Feng
    Cc: Ivan Kokshaysky
    Cc: Linus Torvalds
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vineet Gupta
    Link: https://lore.kernel.org/lkml/20180621121321.4761-8-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Several architectures have a near-identical implementation based
    on atomic_read() and atomic_cmpxchg(), which we can instead define in
    <linux/atomic.h>, so let's do so, using something close to the existing
    x86 implementation with try_cmpxchg().
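
    Something close to the following (a sketch of the try_cmpxchg() loop,
    not the verbatim kernel code):

    static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
    {
            int c = atomic_read(v);

            do {
                    if (unlikely(c == u))
                            break;
            } while (!atomic_try_cmpxchg(v, &c, c + a));

            return c;
    }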

    Where an architecture provides its own atomic_fetch_add_unless(), it
    must define a preprocessor symbol for it. The instrumented atomics are
    updated accordingly.

    Note that arch/arc's existing atomic_fetch_add_unless() had redundant
    barriers, as these are already present in its atomic_cmpxchg()
    implementation.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Geert Uytterhoeven
    Reviewed-by: Will Deacon
    Acked-by: Geert Uytterhoeven
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Palmer Dabbelt
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: Vineet Gupta
    Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • We define a trivial fallback for atomic_inc_not_zero(), but don't do
    the same for atomic64_inc_not_zero(), leading most architectures to
    define the same boilerplate.

    Let's add a fallback in <linux/atomic.h>, and remove the redundant
    implementations. Note that atomic64_add_unless() is always defined in
    <linux/atomic.h>, and promotes its arguments to the requisite types, so
    we need not do this explicitly.
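
    The fallback reduces to a one-liner on top of atomic64_add_unless()
    (sketch):

    #ifndef atomic64_inc_not_zero
    #define atomic64_inc_not_zero(v)        atomic64_add_unless((v), 1, 0)
    #endif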

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Palmer Dabbelt
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • Some of the atomics return a status value, which is a boolean value
    describing whether the operation was performed. To make it clear that
    this is a boolean value, let's update the common fallbacks to return
    bool, fixing up the return values and comments likewise.

    At the same time, let's simplify the description of the operations in
    their respective comments.

    The instrumented atomics and generic atomic64 implementation are updated
    accordingly.

    Note that atomic64_dec_if_positive() doesn't follow the usual test op
    pattern, and returns the would-be decremented value. This is not
    changed.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-5-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • While documentation suggests atomic_inc_not_zero_hint() will perform better
    than atomic_inc_not_zero(), this is unlikely to be the case. No architectures
    implement atomic_inc_not_zero_hint() directly, and thus it either falls back to
    atomic_inc_not_zero(), or a loop using atomic_cmpxchg().

    Whenever the hint does not match the value in memory, the repeated use of
    atomic_cmpxchg() will be more expensive than the read that
    atomic_inc_not_zero_hint() attempts to avoid. For architectures with LL/SC
    atomics, a read cannot be avoided, and it would always be better to use
    atomic_inc_not_zero() directly. For other architectures, their own
    atomic_inc_not_zero() is likely to be more optimal than an atomic_cmpxchg()
    loop regardless.

    Generally, atomic_inc_not_zero_hint() is liable to perform worse than
    atomic_inc_not_zero(). Further, atomic_inc_not_zero_hint() only exists
    for atomic_t, and not atomic64_t or atomic_long_t, and there is only one
    user in the kernel tree.

    Given all this, let's remove atomic_inc_not_zero_hint(), and migrate the
    existing user over to atomic_inc_not_zero().

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-4-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     
  • While __atomic_add_unless() was originally intended as a building-block
    for atomic_add_unless(), it's now used in a number of places around the
    kernel. It's the only common atomic operation named __atomic*(), rather
    than atomic_*(), and for consistency it would be better named
    atomic_fetch_add_unless().

    This lack of consistency is slightly confusing, and gets in the way of
    scripting atomics. Given that, let's clean things up and promote it to
    an official part of the atomics API, in the form of
    atomic_fetch_add_unless().

    This patch converts definitions and invocations over to the new name,
    including the instrumented version, using the following script:

    ----
    git grep -w __atomic_add_unless | while read line; do
    sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
    done
    git grep -w __arch_atomic_add_unless | while read line; do
    sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
    done
    ----

    Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
    be introduced by later patches.

    There should be no functional change as a result of this patch.

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Acked-by: Geert Uytterhoeven
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Palmer Dabbelt
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     

27 Apr, 2018

1 commit

  • Whilst we currently provide smp_cond_load_acquire() and
    atomic_cond_read_acquire(), there are cases where the ACQUIRE semantics are
    not required because of a subsequent fence or release operation once the
    conditional loop has exited.

    This patch adds relaxed versions of the conditional spinning primitives
    to avoid unnecessary barrier overhead on architectures such as arm64.
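
    The relaxed forms mirror the existing acquire ones; roughly:

    #define atomic_cond_read_relaxed(v, c)  smp_cond_load_relaxed(&(v)->counter, (c))
    #define atomic_cond_read_acquire(v, c)  smp_cond_load_acquire(&(v)->counter, (c))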

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

07 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Oct, 2017

1 commit

  • smp_cond_load_acquire() provides a way to spin on a variable with acquire
    semantics until some conditional expression involving the variable is
    satisfied. Architectures such as arm64 can potentially enter a low-power
    state, waking up only when the value of the variable changes, which
    reduces the system impact of tight polling loops.

    This patch makes the same interface available to users of atomic_t,
    atomic64_t and atomic_long_t, rather than require messy accesses to the
    structure internals.

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra
    Cc: Boqun Feng
    Cc: Jeremy.Linton@arm.com
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1507810851-306-3-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

10 Aug, 2017

1 commit

  • Since its inception, our understanding of ACQUIRE, esp. as applied to
    spinlocks, has changed somewhat. Also, I wonder if, with a simple
    change, we cannot make it provide more.

    The problem with the comment is that the STORE done by spin_lock isn't
    itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
    it and cross with any prior STORE, rendering the default WMB
    insufficient (pointed out by Alan).

    Now, this is only really a problem on PowerPC and ARM64, both of
    which already defined smp_mb__before_spinlock() as a smp_mb().

    At the same time, we can get a much stronger construct if we place
    that same barrier _inside_ the spin_lock(). In that case we upgrade
    the RCpc spinlock to an RCsc. That would make all schedule() calls
    fully transitive against one another.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Alan Stern
    Cc: Benjamin Herrenschmidt
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Oleg Nesterov
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

30 Mar, 2017

1 commit

  • Dmitry noted that the new atomic_try_cmpxchg() primitive is broken when
    the old pointer doesn't point to the local stack.

    He writes:

    "Consider a classical lock-free stack push:

    node->next = atomic_read(&head);
    do {
    } while (!atomic_try_cmpxchg(&head, &node->next, node));

    This code is broken with the current implementation, the problem is
    with unconditional update of *__po.

    In case of success it writes the same value back into *__po, but after
    a successful cmpxchg we might have lost ownership of some memory
    locations, potentially including what __po points to. The same holds
    for the re-read of *__po."

    He also points out that this makes it surprisingly different from the
    similar C/C++ atomic operation.
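
    The fix is to write back into *__po only when the cmpxchg failed; a
    sketch of the corrected wrapper:

    #define __atomic_try_cmpxchg(type, _p, _po, _n)                     \
    ({                                                                  \
            typeof(_po) __po = (_po);                                   \
            typeof(*(_po)) __r, __o = *__po;                            \
            __r = atomic_cmpxchg##type((_p), __o, (_n));                \
            if (unlikely(__r != __o))                                   \
                    *__po = __r;                                        \
            likely(__r == __o);                                         \
    })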

    After investigating the code-gen differences caused by this patch; and
    a number of alternatives (Linus dislikes this interface lots), we
    arrived at these results (size x86_64-defconfig/vmlinux):

    GCC-6.3.0:

    10735757 cmpxchg
    10726413 try_cmpxchg
    10730509 try_cmpxchg + patch
    10730445 try_cmpxchg-linus

    GCC-7 (20170327):

    10709514 cmpxchg
    10704266 try_cmpxchg
    10704266 try_cmpxchg + patch
    10704394 try_cmpxchg-linus

    From this we see that the patch has the advantage of better code-gen
    on GCC-7 and keeps the interface roughly consistent with the C
    language variant.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Fixes: a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Mar, 2017

1 commit

  • Add a new cmpxchg interface:

    bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);

    Where the boolean return value reports the result of the compare, and
    thus whether the exchange happened; in case of failure, the new value
    of *ptr is returned in *val.

    This allows simplification/improvement of loops like:

    for (;;) {
            new = val $op $imm;
            old = cmpxchg(ptr, val, new);
            if (old == val)
                    break;
            val = old;
    }

    into:

    do {
    } while (!try_cmpxchg(ptr, &val, val $op $imm));

    while also generating better code (GCC6 and onwards).

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Jul, 2016

1 commit

  • With the inclusion of atomic FETCH-OP variants, many places in the
    kernel can make use of atomic_fetch_$op() in callers that need the
    value/state from _before_ the operation.

    Peter Zijlstra laid out the machinery but we are still missing the
    simpler dec,inc() calls (which future patches will make use of).

    This patch only deals with the generic code, as at least right now
    no arch actually implements them -- which is similar to what the
    OP-RETURN primitives currently do.
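
    With no architecture providing them, the generic header simply maps
    them onto the existing fetch_add/fetch_sub calls; roughly:

    #ifndef atomic_fetch_inc
    #define atomic_fetch_inc(v)     atomic_fetch_add(1, (v))
    #endif

    #ifndef atomic_fetch_dec
    #define atomic_fetch_dec(v)     atomic_fetch_sub(1, (v))
    #endif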

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: James.Bottomley@HansenPartnership.com
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: awalls@md.metrocast.net
    Cc: bp@alien8.de
    Cc: cw00.choi@samsung.com
    Cc: davem@davemloft.net
    Cc: dledford@redhat.com
    Cc: dougthompson@xmission.com
    Cc: gregkh@linuxfoundation.org
    Cc: hans.verkuil@cisco.com
    Cc: heiko.carstens@de.ibm.com
    Cc: jikos@kernel.org
    Cc: kys@microsoft.com
    Cc: mchehab@osg.samsung.com
    Cc: pfg@sgi.com
    Cc: schwidefsky@de.ibm.com
    Cc: sean.hefty@intel.com
    Cc: sumit.semwal@linaro.org
    Link: http://lkml.kernel.org/r/20160628215651.GA20048@linux-80c1.suse
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

16 Jun, 2016

4 commits

  • These functions have been deprecated for a while and there is only the
    one user left, convert and kill.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Boqun Feng
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since all architectures have this implemented now natively, remove this
    dead code.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • …relaxed,_acquire,_release}()

    Now that all the architectures have implemented support for these new
    atomic primitives add on the generic infrastructure to expose and use
    it.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Peter Zijlstra
     
  • We should only expand the atomic64 relaxed bits once we've included
    all relevant headers. So move it down until after we potentially
    include asm-generic/atomic64.h.

    In practice this will not have made a difference so far, since the
    generic bits will not define _relaxed versions.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

05 May, 2016

1 commit

  • All the atomic operations have their arguments the wrong way around;
    make atomic_fetch_or() consistent and flip them.
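
    That is, the prototype moves from pointer-first to the value-first
    convention the other atomics use (an editorial sketch of the
    before/after, not the verbatim patch):

    /* before */
    static inline int atomic_fetch_or(atomic_t *v, int mask);

    /* after, matching atomic_or(int i, atomic_t *v) and friends */
    static inline int atomic_fetch_or(int mask, atomic_t *v);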

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 Mar, 2016

2 commits

  • This patch functionally reverts:

    5fd7a09cfb8c ("atomic: Export fetch_or()")

    During the merge Linus observed that the generic version of fetch_or()
    was messy:

    " This makes the ugly "fetch_or()" macro that the scheduler used
    internally a new generic helper, and does a bad job at it. "

    e23604edac2a Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Now that we have introduced atomic_fetch_or(), fetch_or() is only used
    by the scheduler in order to deal with thread_info flags which type
    can vary across architectures.

    Let's confine fetch_or() back to the scheduler so that we encourage
    future users to use the more robust and well typed atomic_t version
    instead.

    While at it, fetch_or() gets robustified, pasting improvements from a
    previous patch by Ingo Molnar that avoids needless expression
    re-evaluations in the loop.

    Reported-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1458830281-4255-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This is intended to replace the type-generic fetch_or(), which brings a
    lot of issues such as macro-induced block-variable aliasing and sloppy
    types. Not to mention fetch_or() doesn't belong to any namespace, adding
    even more confusion.

    So let's provide an atomic_t version. Current and future users of fetch_or()
    are thus encouraged to use atomic_t.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1458830281-4255-2-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

20 Mar, 2016

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "This was delayed a day or two by some build-breakage on old toolchains
    which we've now fixed.

    There's two PCI commits both acked by Bjorn.

    There's one commit to mm/hugepage.c which is (co)authored by Kirill.

    Highlights:
    - Restructure Linux PTE on Book3S/64 to Radix format from Paul
    Mackerras
    - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
    Kumar K.V
    - Add POWER9 cputable entry from Michael Neuling
    - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
    - Add support for new ftrace ABI on ppc64le from Torsten Duwe

    Various cleanups & minor fixes from:
    - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
    Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
    Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.

    General:
    - atomics: Allow architectures to define their own __atomic_op_*
    helpers from Boqun Feng
    - Implement atomic{, 64}_*_return_* variants and acquire/release/
    relaxed variants for (cmp)xchg from Boqun Feng
    - Add powernv_defconfig from Jeremy Kerr
    - Fix BUG_ON() reporting in real mode from Balbir Singh
    - Add xmon command to dump OPAL msglog from Andrew Donnellan
    - Add xmon command to dump process/task similar to ps(1) from Douglas
    Miller
    - Clean up memory hotplug failure paths from David Gibson

    pci/eeh:
    - Redesign SR-IOV on PowerNV to give absolute isolation between VFs
    from Wei Yang.
    - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
    - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
    - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
    - MAINTAINERS: Update EEH details and maintainership from Russell
    Currey

    cxl:
    - Support added to the CXL driver for running on both bare-metal and
    hypervisor systems, from Christophe Lombard and Frederic Barrat.
    - Ignore probes for virtual afu pci devices from Vaibhav Jain

    perf:
    - Export Power8 generic and cache events to sysfs from Sukadev
    Bhattiprolu
    - hv-24x7: Fix usage with chip events, display change in counter
    values, display domain indices in sysfs, eliminate domain suffix in
    event names, from Sukadev Bhattiprolu

    Freescale:
    - Updates from Scott: "Highlights include 8xx optimizations, 32-bit
    checksum optimizations, 86xx consolidation, e5500/e6500 cpu
    hotplug, more fman and other dt bits, and minor fixes/cleanup"

    * tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
    powerpc: Fix unrecoverable SLB miss during restore_math()
    powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
    powerpc/rcpm: Fix build break when SMP=n
    powerpc/book3e-64: Use hardcoded mttmr opcode
    powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
    powerpc/T104xRDB: add tdm riser card node to device tree
    powerpc32: PAGE_EXEC required for inittext
    powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
    powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
    powerpc/86xx: Introduce and use common dtsi
    powerpc/86xx: Update device tree
    powerpc/86xx: Move dts files to fsl directory
    powerpc/86xx: Switch to kconfig fragments approach
    powerpc/86xx: Update defconfigs
    powerpc/86xx: Consolidate common platform code
    powerpc32: Remove one insn in mulhdu
    powerpc32: small optimisation in flush_icache_range()
    powerpc: Simplify test in __dma_sync()
    powerpc32: move xxxxx_dcache_range() functions inline
    powerpc32: Remove clear_pages() and define clear_page() inline
    ...

    Linus Torvalds
     

17 Feb, 2016

1 commit


13 Feb, 2016

1 commit

  • Export fetch_or() that's implemented and used internally by the
    scheduler. We are going to use it for NO_HZ so make it generally
    available.

    Reviewed-by: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Chris Metcalf
    Cc: Ingo Molnar
    Cc: Luiz Capitulino
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

04 Nov, 2015

1 commit

  • This seems to be a mis-reading of how alpha memory ordering works, and
    is not backed up by the alpha architecture manual. The helper functions
    don't do anything special on any other architectures, and the arguments
    that support them being safe on other architectures also argue that they
    are safe on alpha.

    Basically, the "control dependency" is between a previous read and a
    subsequent write that is dependent on the value read. Even if the
    subsequent write is actually done speculatively, there is no way that
    such a speculative write could be made visible to other cpu's until it
    has been committed, which requires validating the speculation.

    Note that most weakly ordered architectures (very much including alpha)
    do not guarantee any ordering relationship between two loads that depend
    on each other on a control dependency:

    read A
    if (val == 1)
    read B

    because the conditional may be predicted, and the "read B" may be
    speculatively moved up to before reading the value A. So we require the
    user to insert a smp_rmb() between the two accesses to be correct:

    read A;
    if (A == 1)
    smp_rmb()
    read B

    Alpha is further special in that it can break that ordering even if the
    *address* of B depends on the read of A, because the cacheline that is
    read later may be stale unless you have a memory barrier in between the
    pointer read and the read of the value behind a pointer:

    read ptr
    read offset(ptr)

    whereas all other weakly ordered architectures guarantee that the data
    dependency (as opposed to just a control dependency) will order the two
    accesses. As a result, alpha needs a "smp_read_barrier_depends()" in
    between those two reads for them to be ordered.

    The control dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
    had was a control dependency to a subsequent *write*, however, and
    nobody can finalize such a subsequent write without having actually done
    the read. And were you to write such a value to a "stale" cacheline
    (the way the unordered reads came to be), that would seem to lose the
    write entirely.

    So the things that make alpha able to re-order reads even more
    aggressively than other weak architectures do not seem to be relevant
    for a subsequent write. Alpha memory ordering may be strange, but
    there's no real indication that it is *that* strange.

    Also, the alpha architecture reference manual very explicitly talks
    about the definition of "Dependence Constraints" in section 5.6.1.7,
    where a preceding read dominates a subsequent write.

    Such a dependence constraint admittedly does not impose a BEFORE (alpha
    architecture term for globally visible ordering), but it does guarantee
    that there can be no "causal loop". I don't see how you could avoid
    such a loop if another cpu could see the stored value and then impact
    the value of the first read. Put another way: the read and the write
    could not be seen as being out of order wrt other cpus.

    So I do not see how these "x_ctrl()" functions can currently be necessary.

    I may have to eat my words at some point, but in the absence of clear
    proof that alpha actually needs this, or indeed even an explanation of
    how alpha could _possibly_ need it, I do not believe these functions are
    called for.

    And if it turns out that alpha really _does_ need a barrier for this
    case, that barrier still should not be "smp_read_barrier_depends()".
    We'd have to make up some new speciality barrier just for alpha, along
    with the documentation for why it really is necessary.

    Cc: Peter Zijlstra
    Cc: Paul E McKenney
    Cc: Dmitry Vyukov
    Cc: Will Deacon
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Oct, 2015

1 commit

  • Similar to what we have for regular add/sub calls. For now, no actual arch
    implements them, so everyone falls back to the default atomics... iow,
    nothing changes. These will be used in future primitives.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Paul E.McKenney
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1443643395-17016-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

23 Sep, 2015

2 commits

  • Provide atomic_read_ctrl() to mirror READ_ONCE_CTRL(), such that we can
    more conveniently use atomics in control dependencies.

    Since we can assume atomic_read() implies a READ_ONCE(), we must only
    emit an extra smp_read_barrier_depends() in order to upgrade to
    READ_ONCE_CTRL() semantics.
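
    A sketch of the resulting helper:

    static inline int atomic_read_ctrl(const atomic_t *v)
    {
            int val = atomic_read(v);
            smp_read_barrier_depends();     /* Enforce control dependency. */
            return val;
    }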

    Requested-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: oleg@redhat.com
    Link: http://lkml.kernel.org/r/20150918115637.GM3604@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • When adding the atomic bitops, I seem to have forgotten about
    atomic_long_t, fix this.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Aug, 2015

1 commit

  • Whilst porting the generic qrwlock code over to arm64, it became
    apparent that any portable locking code needs finer-grained control of
    the memory-ordering guarantees provided by our atomic routines.

    In particular: xchg, cmpxchg, {add,sub}_return are often used in
    situations where full barrier semantics (currently the only option
    available) are not required. For example, when a reader increments a
    reader count to obtain a lock, checking the old value to see if a writer
    was present, only acquire semantics are strictly needed.

    This patch introduces three new ordering semantics for these operations:

    - *_relaxed: No ordering guarantees. This is similar to what we have
    already for the non-return atomics (e.g. atomic_add).

    - *_acquire: ACQUIRE semantics, similar to smp_load_acquire.

    - *_release: RELEASE semantics, similar to smp_store_release.

    In memory-ordering speak, this means that the acquire/release semantics
    are RCpc as opposed to RCsc. Consequently a RELEASE followed by an
    ACQUIRE does not imply a full barrier, as already documented in
    memory-barriers.txt.

    Currently, all the new macros are conditionally mapped to the full-mb
    variants; however, if the *_relaxed version is provided by the
    architecture, the acquire/release variants are constructed by
    supplementing the relaxed routine with an explicit barrier.
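
    When only the _relaxed form is provided, the acquire variant is built
    roughly as follows (a sketch; at this point the generic wrapper used
    smp_mb__after_atomic() as the supplementing barrier):

    #define __atomic_op_acquire(op, args...)                            \
    ({                                                                  \
            typeof(op##_relaxed(args)) __ret = op##_relaxed(args);      \
            smp_mb__after_atomic();                                     \
            __ret;                                                      \
    })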

    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1438880084-18856-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

27 Jul, 2015

2 commits


13 Aug, 2014

1 commit

  • It's been a while and there are no in-tree users left, so remove the
    deprecated barriers.

    Signed-off-by: Peter Zijlstra
    Cc: Chen, Gong
    Cc: Jacob Pan
    Cc: Joe Perches
    Cc: John Sullivan
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Srinivas Pandruvada
    Cc: Theodore Ts'o
    Signed-off-by: Ingo Molnar

    Peter Zijlstra