26 Feb, 2010

1 commit

  • This patch replaces atomic64_32.c with two assembly implementations,
    one for 386/486 machines using pushf/cli/popf and one for 586+ machines
    using cmpxchg8b.

    The cmpxchg8b implementation provides the following advantages over the
    current one:

    1. Implements atomic64_add_unless, atomic64_dec_if_positive and
    atomic64_inc_not_zero

    2. Uses the ZF flag set by cmpxchg8b instead of doing a separate
    comparison (see the loop sketched right after this list)
    3. Uses custom register calling conventions that reduce or eliminate
    register moves to suit cmpxchg8b

    4. Reads the initial value instead of using cmpxchg8b to do that.
    Currently we use lock xaddl and movl, which seems to be the
    fastest approach.

    5. Does not use the lock prefix for atomic64_set: 64-bit writes
    are already atomic, so we don't need it. We still need the prefix
    for atomic64_read, to avoid restoring a value changed in the
    meantime.

    6. Allocates registers as well or better than gcc
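
    As an illustration of points 2 and 3, here is a C inline-asm
    sketch of the ZF trick (the real implementation is a pure .S
    file; the function name below is hypothetical). On failure,
    cmpxchg8b clears ZF and reloads %edx:%eax with the current
    value, so a single jne both tests the outcome and restarts the
    loop with fresh data:

    static inline long long add_return_sketch(long long delta, long long *p)
    {
            long long old = *p;             /* plain read seeds the loop */

            asm volatile("1:\n\t"
                         "movl %%eax, %%ebx\n\t"   /* new = old + delta */
                         "movl %%edx, %%ecx\n\t"
                         "addl %2, %%ebx\n\t"
                         "adcl %3, %%ecx\n\t"
                         "lock; cmpxchg8b %1\n\t"
                         "jne 1b"           /* ZF clear: retry, no cmpl */
                         : "+A" (old), "+m" (*p)
                         : "g" ((unsigned)delta),
                           "g" ((unsigned)(delta >> 32))
                         : "ebx", "ecx", "memory", "cc");
            return old + delta;
    }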

    The 386 implementation provides support for 386 and 486 machines.
    386/486 SMP is not supported (we dropped it), but such support can be
    added easily if desired.

    A pure assembly implementation is required due to the custom
    calling conventions, the desire to use %ebp in atomic64_add_return
    (we need 7 registers...), and the ability to use pushf/popf in the
    386 code without an intermediate pop/push.

    The parameter names are changed to match the convention in atomic_64.h

    Changes in v3 (due to rebasing to tip/x86/asm):
    - Patches atomic64_32.h instead of atomic_32.h
    - Uses the CALL alternative mechanism from commit
    1b1d9258181bae199dc940f4bd0298126b9a73d9

    Changes in v2:
    - Merged 386 and cx8 support in the same patch
    - 386 support now done in assembly, C code no longer used at all
    - cmpxchg64 is used for atomic64_cmpxchg
    - stop using macros, use one-line inline functions instead
    - miscellaneous changes and improvements

    Signed-off-by: Luca Barbieri
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Luca Barbieri
     

04 Jul, 2009

4 commits

  • Now that atomic64_read() is lightweight (no register pressure
    and a small icache footprint), we can inline it again.

    Also use the "=&A" constraint instead of "+A" to avoid a warning
    about the uninitialized 'res' variable (gcc had to force 0 into
    eax/edx).

    $ size vmlinux.prev vmlinux.after
    text data bss dec hex filename
    4908667 451676 1684868 7045211 6b805b vmlinux.prev
    4908651 451676 1684868 7045195 6b804b vmlinux.after
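
    The inlined helper looks roughly like this (a reconstruction,
    not the verbatim patch). Whatever happens to be in ecx:ebx
    serves both as the guess and as the value stored back on a
    match, so any contents work; "=&A" declares %edx:%eax as an
    early-clobber output, which silences the warning without the
    forced zeroing:

    typedef struct { unsigned long long counter; } atomic64_t;

    static inline unsigned long long atomic64_read_sketch(atomic64_t *ptr)
    {
            unsigned long long res;

            asm volatile("movl %%ebx, %%eax\n\t"  /* guess = ecx:ebx */
                         "movl %%ecx, %%edx\n\t"
                         "lock; cmpxchg8b %1"  /* match: rewrites the same
                                                  value; mismatch: loads
                                                  the current value */
                         : "=&A" (res)
                         : "m" (ptr->counter)
                         : "memory", "cc");
            return res;
    }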

    Signed-off-by: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    [ Also fix typo in atomic64_set() export ]
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the 'old_val' variable is confusingly named
    in these functions - the correct name is 'new_val'.

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Remove the read-first logic from atomic64_xchg() and simplify
    the loop.

    This function was the last user of __atomic64_read() - remove it.

    Also, change the 'real_val' assumption from the somewhat quirky
    1ULL << 32 value to the (just as arbitrary, but simpler) value
    of 0.
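
    The simplified function has roughly this shape (a sketch built
    on the atomic64_cmpxchg() primitive mentioned elsewhere in this
    log):

    u64 atomic64_xchg_sketch(atomic64_t *ptr, u64 new_val)
    {
            u64 old_val, real_val = 0;      /* arbitrary first guess */

            do {
                    old_val = real_val;
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);  /* mismatch: retry with the
                                               value cmpxchg returned */

            return old_val;
    }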

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • atomic64_t primitives are used by a handful of drivers,
    so export the APIs consistently. These were inlined
    before.

    Also mark atomic64_32.o as a core object, so that its symbols
    stay available even if nothing in the core kernel references them.
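
    Exporting an out-of-line primitive amounts to an EXPORT_SYMBOL()
    next to each definition - a hypothetical excerpt (the read shown
    here uses the arbitrary-'old' cmpxchg trick from the entries
    below); making the object "core" is presumably the obj-y versus
    lib-y distinction in arch/x86/lib/Makefile:

    #include <linux/module.h>

    u64 atomic64_read(atomic64_t *ptr)
    {
            /* any 'old' guess yields an atomic read, see below */
            return atomic64_cmpxchg(ptr, 0, 0);
    }
    EXPORT_SYMBOL(atomic64_read);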

    Cc: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Jul, 2009

7 commits

  • Optimize atomic64_read() as a special open-coded
    cmpxchg8b variant. This generates nicer code:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    431 0 0 431 1af atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    2bdfd4bd1f6b7b61b7fc127aef90ce3b atomic64_32.o.after.asm

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that atomic64_xchg() uses atomic_read(), which
    happens to work because atomic_read() is a macro, so the .counter
    value gets read as a u64 on 32-bit too - but this is really bogus
    and serious bugs are waiting to happen.

    Fix atomic64_xchg() to use __atomic64_read() instead.

    No code changed:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.after.asm
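
    The trap, sketched (macro bodies reconstructed from the
    description above): both macros expand to the same field access,
    and only the declared type of .counter decides the width of the
    load. A typed atomic_read() function would reject an atomic64_t
    argument outright; the macro accepts it silently.

    typedef struct { int counter; } atomic_t;
    typedef struct { unsigned long long counter; } atomic64_t;

    #define atomic_read(v)      ((v)->counter)  /* meant for atomic_t */
    #define __atomic64_read(v)  ((v)->counter)  /* same expansion, but
                                                   names the 64-bit
                                                   intent explicitly */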

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • cmpxchg8b is a huge instruction in terms of register footprint;
    we almost never want to inline it, not even within the same
    code module.

    GCC 4.3 still messes this up for two functions, underestimating
    the true cost of the instruction - so annotate the two key
    functions to reduce the bloat:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    1763 0 0 1763 6e3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after
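
    The annotation itself is a single keyword: the kernel's noinline
    (from its compiler headers) expands to the GCC attribute. A
    hypothetical example - the log above does not name the two
    functions:

    #define noinline __attribute__((noinline))  /* as in the kernel
                                                   compiler headers */

    noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr);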

    Cc: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Linus noted (based on Eric Dumazet's numbers) that we would
    probably be better off not trying an atomic_read() in
    atomic64_add_return(), but instead intentionally letting the
    first cmpxchg8b fail - to get a cache-friendly 'give me
    ownership of this cacheline' transaction. That can then be
    followed by the real cmpxchg8b, which sets the value local
    to the CPU.
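
    The loop then takes the same guess-and-retry shape as the
    atomic64_xchg() sketch earlier in this log, with the addition
    folded in (again a sketch, not the verbatim patch):

    u64 atomic64_add_return_sketch(u64 delta, atomic64_t *ptr)
    {
            u64 old_val, new_val, real_val = 0;  /* likely-wrong guess */

            do {
                    old_val = real_val;
                    new_val = old_val + delta;
                    /* first pass: an ownership transaction that also
                       fetches the real value; next pass: the real swap */
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);

            return new_val;
    }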

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rewrite cmpxchg8b() to use a generic "+m" constraint instead of
    a hard-coded %edi register, to increase the compiler's freedom
    in code generation and possibly produce better code.
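
    A sketch of the rewritten helper (reconstructed; u32/u64 come
    from <linux/types.h>). The "+m" operand lets gcc pick any
    addressing mode for the target instead of pinning its address
    in %edi:

    #include <linux/types.h>

    static inline u64 cmpxchg8b_sketch(u64 *ptr, u64 old, u64 new_val)
    {
            asm volatile("lock; cmpxchg8b %1"
                         : "+A" (old),  /* expected value in, result out */
                           "+m" (*ptr)  /* no dedicated pointer register */
                         : "b" ((u32)new_val),
                           "c" ((u32)(new_val >> 32))
                         : "memory");
            return old;
    }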

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the 32-bit version of atomic64_read() was
    being overly complex, re-reading the value and doing a retry
    loop over it.

    Instead we can just rely on cmpxchg8b returning either the new
    value or the current value.

    We can use any 'old' value, which will be faster as it can be
    loaded via immediates. If that value does not match the real
    value in memory, the instruction is faster still, and the CPU
    can avoid dirtying the cacheline since nothing gets written
    back.
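
    In code (a sketch; the 1ULL << 32 guess is arbitrary, chosen
    only because it loads cheaply and is unlikely to match):

    static u64 atomic64_read_sketch(atomic64_t *ptr)
    {
            u64 old = 1ULL << 32;   /* probably != the stored value */

            /* mismatch (the common case): returns the current value
               with no write; match: rewrites the identical value -
               correct either way */
            return atomic64_cmpxchg(ptr, old, old);
    }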

    Reported-by: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noted that the atomic64_t primitives are currently all
    inlines, which is crazy because these functions have a large
    register footprint anyway.

    Move them to a separate file: arch/x86/lib/atomic64_32.c

    Also, while at it, rename all uses of 'unsigned long long' to
    the much shorter u64.

    This makes the appearance of the prototypes a lot nicer - and
    it also uncovered a few bugs where (yet unused) API variants
    had 'long' as their return type instead of u64.
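
    For example (a representative pair, not the full list):

    /* before */
    unsigned long long atomic64_cmpxchg(atomic64_t *ptr,
                                        unsigned long long old_val,
                                        unsigned long long new_val);
    /* after */
    u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);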

    [ More intrusive changes are not yet done in this patch. ]

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar