01 Oct, 2009

2 commits

  • Conditionally compile cmpxchg8b_emu.o and EXPORT_SYMBOL(cmpxchg8b_emu).

    This reduces the kernel size a bit.

    Signed-off-by: Eric Dumazet
    Cc: Arjan van de Ven
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • cmpxchg64() today generates, to quote Linus, "barf bag" code.

    cmpxchg64() is about to get used in the scheduler to fix a bug there,
    but it's a prerequisite that cmpxchg64() first be made non-sucking.

    This patch turns cmpxchg64() into an efficient implementation that
    uses the alternative() mechanism to just use the raw instruction on
    all modern systems.

    Note: the fallback is NOT SMP-safe, just like the current fallback
    is not SMP-safe. (Interested parties with i486-based SMP systems
    are welcome to submit fix patches for that.)
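
    A minimal sketch of the mechanism (close in spirit to the patch, but
    treat the exact constraints as illustrative): alternative_io() patches
    in the raw locked instruction on CPUs that have the CX8 feature, and
    calls the emulation routine on i486:

    #define cmpxchg64(ptr, o, n)                                    \
    ({                                                              \
            __typeof__(*(ptr)) __ret;                               \
            alternative_io("call cmpxchg8b_emu",                    \
                           "lock; cmpxchg8b (%%esi)",               \
                           X86_FEATURE_CX8,                         \
                           "=A" (__ret),                            \
                           "S" ((ptr)), "0" ((u64)(o)),             \
                           "b" ((unsigned int)(n)),                 \
                           "c" ((unsigned int)((u64)(n) >> 32)));   \
            __ret;                                                  \
    })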

    Signed-off-by: Arjan van de Ven
    Acked-by: Linus Torvalds
    [ fixed asm constraint bug ]
    Fixed-by: Eric Dumazet
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

05 Sep, 2009

1 commit

  • Change msr-reg.o to obj-y (it will be included in virtually every
    kernel since it is used by the initialization code for AMD processors)
    and add a separate C file to export its symbols to modules, so that
    msr.ko can use them; on uniprocessors we bypass the helper functions
    in msr.o and use the accessor functions directly via inlines.
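
    A sketch of the export shim, a tiny C file whose only job is to make
    the msr-reg.S helpers visible to msr.ko (file name and exact symbol
    list here are assumptions for illustration):

    /* arch/x86/lib/msr-reg-export.c (name assumed) */
    #include <linux/module.h>
    #include <asm/msr.h>

    EXPORT_SYMBOL(native_rdmsr_safe_regs);
    EXPORT_SYMBOL(native_wrmsr_safe_regs);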

    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Cc: Borislav Petkov

    H. Peter Anvin
     

04 Sep, 2009

1 commit

  • The macro was defined in the 32-bit path as well - breaking the
    build on 32-bit platforms:

    arch/x86/lib/msr-reg.S: Assembler messages:
    arch/x86/lib/msr-reg.S:53: Error: Bad macro parameter list
    arch/x86/lib/msr-reg.S:100: Error: invalid character '_' in mnemonic
    arch/x86/lib/msr-reg.S:101: Error: invalid character '_' in mnemonic

    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 Jul, 2009

3 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: reset oom_cfqq in cfq_set_request()
    block: fix sg SG_DXFER_TO_FROM_DEV regression
    block: call blk_scsi_ioctl_init()
    Fix congestion_wait() sync/async vs read/write confusion

    Linus Torvalds
     
  • Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf report: Add "Fractal" mode output - support callchains with relative overhead rate
    perf_counter tools: callchains: Manage the cumul hits on the fly
    perf report: Change default callchain parameters
    perf report: Use a modifiable string for default callchain options
    perf report: Warn on callchain output request from non-callchain file
    x86: atomic64: Inline atomic64_read() again
    x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
    x86: atomic64: Improve atomic64_xchg()
    x86: atomic64: Export APIs to modules
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
    x86: atomic64: Fix unclean type use in atomic64_xchg()
    x86: atomic64: Make atomic_read() type-safe
    x86: atomic64: Reduce size of functions
    x86: atomic64: Improve atomic64_add_return()
    x86: atomic64: Improve cmpxchg8b()
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
    x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
    perf report: Annotate variable initialization
    ...

    Linus Torvalds
     
  • Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
    the bdi congestion wait queue logic, causing us to wait on congestion
    for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.
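
    The shape of the fix, sketched (the timeout value is illustrative):

    /* before: 1 (== WRITE) selected the sync wait queue by accident */
    congestion_wait(WRITE, HZ / 50);

    /* after: ask for the async queue explicitly */
    congestion_wait(BLK_RW_ASYNC, HZ / 50);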

    Signed-off-by: Jens Axboe

    Jens Axboe
     

04 Jul, 2009

4 commits

  • Now that atomic64_read() is lightweight (no register pressure and
    a small icache footprint), we can inline it again.

    Also use the "=&A" constraint instead of "+A" to avoid a warning
    about the uninitialized 'res' variable. (With "+A", gcc had to
    force 0 into eax/edx.)

    $ size vmlinux.prev vmlinux.after
    text data bss dec hex filename
    4908667 451676 1684868 7045211 6b805b vmlinux.prev
    4908651 451676 1684868 7045195 6b804b vmlinux.after
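
    A sketch of the re-inlined read; "=&A" marks edx:eax as an
    early-clobber output, so gcc no longer warns about 'res' being
    used uninitialized:

    static inline u64 atomic64_read(atomic64_t *ptr)
    {
            u64 res;

            /* The mov pair makes the compare value equal the store
             * value: a "successful" cmpxchg8b rewrites the same value,
             * a failed one loads the current value into edx:eax. */
            asm volatile("mov %%ebx, %%eax\n\t"
                         "mov %%ecx, %%edx\n\t"
                         LOCK_PREFIX "cmpxchg8b %1"
                         : "=&A" (res)
                         : "m" (*ptr));

            return res;
    }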

    Signed-off-by: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    [ Also fix typo in atomic64_set() export ]
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the variable name 'old_val' is
    confusingly named in these functions - the correct
    naming is 'new_val'.

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Remove the read-first logic from atomic64_xchg() and simplify
    the loop.

    This function was the last user of __atomic64_read() - remove it.

    Also, change the 'real_val' assumption from the somewhat quirky
    1ULL << 32 value to the (just as arbitrary, but simpler) value
    of 0.
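
    The simplified loop, sketched: start from an arbitrary guess (0) and
    let cmpxchg hand back the real value whenever the guess is wrong:

    u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
    {
            u64 old_val, real_val = 0;

            do {
                    old_val = real_val;
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);

            return old_val;
    }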

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • atomic64_t primitives are used by a handful of drivers,
    so export the APIs consistently. These were inlined
    before.

    Also mark atomic64_32.o a core object, so that the symbols
    are available even if not linked to core kernel pieces.

    Cc: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Jul, 2009

8 commits

  • Optimize atomic64_read() as a special open-coded
    cmpxchg8b variant. This generates nicer code:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    431 0 0 431 1af atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    2bdfd4bd1f6b7b61b7fc127aef90ce3b atomic64_32.o.after.asm

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • While examining symbol generation in perf_counter tools, I
    noticed that copy_to_user() had no size in vmlinux's symtab.

    Signed-off-by: Mike Galbraith
    Acked-by: Alexander van Heukelum
    Acked-by: Cyrill Gorcunov
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Linus noticed that atomic64_xchg() uses atomic_read(), which
    happens to work because atomic_read() is a macro so the
    .counter value gets u64-read on 32-bit too - but this is really
    bogus and serious bugs are waiting to happen.

    Fix atomic64_xchg() to use __atomic64_read() instead.

    No code changed:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.after.asm

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • cmpxchg8b is a huge instruction in terms of register footprint;
    we almost never want to inline it, not even within the same
    code module.

    GCC 4.3 still messes up for two functions, underestimating the
    true cost of this instruction - so annotate two key functions
    to reduce the bloat:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    1763 0 0 1763 6e3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after
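
    The annotation itself is a one-word attribute; a sketch (the log does
    not name the two functions, so the target here is illustrative):

    /* keep the register-hungry helper out of line regardless of
     * gcc's inlining heuristics */
    static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new);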

    Cc: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Linus noted (based on Eric Dumazet's numbers) that we would
    probably be better off not trying an atomic_read() in
    atomic64_add_return(), but instead intentionally letting the first
    cmpxchg8b fail - to get a cache-friendly 'give me ownership
    of this cacheline' transaction. That can then be followed
    by the real cmpxchg8b, which sets the value local to the CPU.
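
    Sketched per that description: real_val = 0 makes the first cmpxchg
    fail almost always, but the failed attempt already acquires the
    cacheline for ownership:

    u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
    {
            u64 old_val, new_val, real_val = 0;

            do {
                    old_val = real_val;
                    new_val = old_val + delta;
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);

            return new_val;
    }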

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rewrite cmpxchg8b() to not use %edi register but a generic "+m"
    constraint, to increase compiler freedom in code generation and
    possibly better code.
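
    The rewritten helper, sketched: "+A" keeps the old value in edx:eax,
    and "+m" lets gcc pick any addressing mode for the target instead of
    pinning the pointer into %edi:

    static u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
    {
            asm volatile(LOCK_PREFIX "cmpxchg8b %1"
                         : "+A" (old), "+m" (*ptr)
                         : "b" ((u32)new), "c" ((u32)(new >> 32)));
            return old;
    }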

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the 32-bit version of atomic64_read() was
    overly complex, re-reading the value and retrying in a loop.

    Instead we can just rely on cmpxchg8b returning either the new
    value or returning the current value.

    We can use any 'old' value, which will be faster as it can be
    loaded via immediates. Picking a value that is unlikely to equal
    the real value in memory makes the instruction faster.

    This also has the advantage that the CPU could avoid dirtying
    the cacheline.
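
    A sketch of the resulting read, using a cmpxchg8b() helper like the
    one in the previous entry; when the compare fails, the instruction
    loads the current value into edx:eax without storing anything:

    u64 atomic64_read(atomic64_t *ptr)
    {
            u64 old = 1LL << 32;    /* cheap immediate, unlikely to match */

            return cmpxchg8b(ptr, old, old);
    }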

    Reported-by: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noted that the atomic64_t primitives are currently all
    inlines, which is crazy because these functions have a large
    register footprint anyway.

    Move them to a separate file: arch/x86/lib/atomic64_32.c

    Also, while at it, rename all uses of 'unsigned long long' to
    the much shorter u64.

    This makes the appearance of the prototypes a lot nicer - and
    it also uncovered a few bugs where (yet unused) API variants
    had 'long' as their return type instead of u64.

    [ More intrusive changes are not yet done in this patch. ]

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Jun, 2009

1 commit

  • delay_tsc() needs rdtsc_barrier() to provide a proper delay.

    Output from a test driver that uses the HPET to cross-check the
    delay provided by udelay():

    Before:
    [ 86.794363] Expected delay 5us actual 4679ns
    [ 87.154362] Expected delay 5us actual 698ns
    [ 87.514162] Expected delay 5us actual 4539ns
    [ 88.653716] Expected delay 5us actual 4539ns
    [ 94.664106] Expected delay 10us actual 9638ns
    [ 95.049351] Expected delay 10us actual 10126ns
    [ 95.416110] Expected delay 10us actual 9568ns
    [ 95.799216] Expected delay 10us actual 9638ns
    [ 103.624104] Expected delay 10us actual 9707ns
    [ 104.020619] Expected delay 10us actual 768ns
    [ 104.419951] Expected delay 10us actual 9707ns

    After:
    [ 50.983320] Expected delay 5us actual 5587ns
    [ 51.261807] Expected delay 5us actual 5587ns
    [ 51.565715] Expected delay 5us actual 5657ns
    [ 51.861171] Expected delay 5us actual 5587ns
    [ 52.164704] Expected delay 5us actual 5726ns
    [ 52.487457] Expected delay 5us actual 5657ns
    [ 52.789338] Expected delay 5us actual 5726ns
    [ 57.119680] Expected delay 10us actual 10755ns
    [ 57.893997] Expected delay 10us actual 10615ns
    [ 58.261287] Expected delay 10us actual 10755ns
    [ 58.620505] Expected delay 10us actual 10825ns
    [ 58.941035] Expected delay 10us actual 10755ns
    [ 59.320903] Expected delay 10us actual 10615ns
    [ 61.306311] Expected delay 10us actual 10755ns
    [ 61.520542] Expected delay 10us actual 10615ns
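
    A sketch of the barrier placement (loop structure simplified):
    without rdtsc_barrier(), the CPU may execute rdtsc speculatively and
    too early, which shows up above as e.g. 698ns instead of ~5us:

    static void delay_tsc(unsigned long loops)
    {
            unsigned long long bclock, now;

            rdtsc_barrier();
            rdtscll(bclock);
            do {
                    rep_nop();              /* pause: be nice to siblings */
                    rdtsc_barrier();
                    rdtscll(now);
            } while ((now - bclock) < loops);
    }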

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: H. Peter Anvin

    Pallipadi, Venkatesh
     

21 Jun, 2009

1 commit

  • The discussion about using "access_ok()" in get_user_pages_fast() (see
    commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
    'access_ok()' as a range check in get_user_pages_fast()" for details and
    end result), made us notice that x86-64 was really being very sloppy
    about virtual address checking.

    So be way more careful and straightforward about masking x86-64 virtual
    addresses:

    - All the VIRTUAL_MASK* variants now cover half of the address
    space; it's not like we can use the full mask on a signed
    integer, and the larger mask just invites mistakes when
    applying it to either half of the 48-bit address space.

    - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
    obvious when it transforms a file offset into a
    (kernel-half) virtual address.

    - Unify/simplify the 32-bit and 64-bit USER_DS definition to
    be based on TASK_SIZE_MAX.

    This cleanup and more careful/obvious user virtual address checking also
    uncovered a buglet in the x86-64 implementation of strnlen_user(): it
    would do an "access_ok()" check on the whole potential area, even if the
    string itself was much shorter, and thus return an error even for valid
    strings. Our sloppy checking had hidden this.

    So this fixes 'strnlen_user()' to do this properly, the same way we
    already handled user strings in 'strncpy_from_user()'. Namely by just
    checking the first byte, and then relying on fault handling for the
    rest. That always works, since we impose a guard page that cannot be
    mapped at the end of the user space address space (and even if we
    didn't, we'd have the address space hole).
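
    A sketch of that approach (not the verbatim patch): validate only
    the first byte and let the guard page terminate the walk:

    long strnlen_user(const char __user *str, long n)
    {
            /* if even the first byte is out of range, fail early */
            if (!access_ok(VERIFY_READ, str, 1))
                    return 0;
            /* the rest relies on fault handling / exception fixups */
            return __strnlen_user(str, n);
    }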

    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Jun, 2009

2 commits

  • Provide for concurrent MSR writes on all the CPUs in the cpumask. Also,
    add a temporary workaround for smp_call_function_many(), which skips the
    CPU we're executing on.

    Bart: zero out rv struct which is allocated on stack.
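
    A sketch of the workaround (helper name illustrative):
    smp_call_function_many() does not run the function on the calling
    CPU, so that CPU is handled by hand when it is in the mask:

    static void __rwmsr_on_cpus(const struct cpumask *mask,
                                void (*msr_func)(void *info), void *info)
    {
            int this_cpu = get_cpu();       /* disable preemption */

            if (cpumask_test_cpu(this_cpu, mask))
                    msr_func(info);         /* run it locally ourselves */
            smp_call_function_many(mask, msr_func, info, 1);
            put_cpu();
    }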

    CC: H. Peter Anvin
    Signed-off-by: Borislav Petkov
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Borislav Petkov
     
  • Add a struct representing a 64-bit MSR pair consisting of low and high
    register halves, and convert msr_info to use it. Also, rename msr-on-cpu.c
    to msr.c.

    Side note: Put the cpumask.h include in __KERNEL__ space, thus fixing an
    allmodconfig build failure in the headers_check target.
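
    A sketch of such a pair (field names are assumptions for
    illustration):

    struct msr {
            union {
                    struct {
                            u32 l;  /* low register half */
                            u32 h;  /* high register half */
                    };
                    u64 q;          /* the full 64-bit value */
            };
    };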

    CC: H. Peter Anvin
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     

21 Jan, 2009

1 commit

  • Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions

    Hugh Dickins noticed that strncpy_from_user() was miscompiled
    in some circumstances with gcc 4.3.

    Thanks to Hugh's excellent analysis it was easy to track down.

    Hugh writes:

    > Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
    > except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
    > and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
    > might_fault() there, which hides the issue): using a
    > gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
    >
    > It generates the following:
    >
    > 0000000000000000 <strncpy_from_user>:
    > 0: 48 89 d1 mov %rdx,%rcx
    > 3: 48 85 c9 test %rcx,%rcx
    > 6: 74 0e je 16
    > 8: ac lods %ds:(%rsi),%al
    > 9: aa stos %al,%es:(%rdi)
    > a: 84 c0 test %al,%al
    > c: 74 05 je 13
    > e: 48 ff c9 dec %rcx
    > 11: 75 f5 jne 8
    > 13: 48 29 c9 sub %rcx,%rcx
    > 16: 48 89 c8 mov %rcx,%rax
    > 19: c3 retq
    >
    > Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
    > (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
    > Isn't it returning 0 when it ought to be returning strlen?

    The asm constraints for the strncpy_from_user() result were missing an
    early clobber, which tells gcc that the last output arguments
    are written before all input arguments are read.

    Also add more early clobbers in the rest of the file and fix 32-bit
    usercopy.c in the same way.
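
    A minimal illustration of why the '&' matters (demo code, not the
    kernel's): this asm writes %0 before reading %2, so without the
    early clobber gcc could allocate %0 and %2 to the same register and
    feed the asm a clobbered input:

    static inline long add_demo(long a, long b)
    {
            long out;

            asm("mov %1, %0\n\t"    /* writes out early... */
                "add %2, %0"        /* ...then still reads b */
                : "=&r" (out)       /* '&' = early clobber */
                : "r" (a), "r" (b));
            return out;
    }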

    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin
    [ since this API is rarely used and no in-kernel user relies on a 'len'
    return value (they only rely on negative return values), this miscompile
    was never noticed in the field. But it's worth fixing nevertheless. ]
    Signed-off-by: Ingo Molnar

    Andi Kleen
     

10 Sep, 2008

1 commit

  • copy_to/from_user and all its variants (except the atomic ones) can take a
    page fault and perform non-trivial work like taking mmap_sem and entering
    the filesystem/pagecache.

    Unfortunately, this often escapes lockdep because a common pattern is to
    use it to read in some arguments just set up from userspace, or to write
    data back to a hot buffer. In those cases, page reclaim is unlikely to
    get a window in which to make copy_*_user fault.

    With the new might_lock primitives, add some annotations to x86. I don't
    know if I caught all possible faulting points (it's a bit of a maze, and I
    didn't really look at 32-bit). But this is a starting point.

    Boots and runs OK so far.
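
    The shape of such an annotation, sketched (helper name illustrative):

    static inline unsigned long
    copy_from_user(void *to, const void __user *from, unsigned long n)
    {
            might_fault();  /* lockdep: may fault, take mmap_sem, sleep */
            return __arch_copy_from_user(to, from, n);
    }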

    Signed-off-by: Nick Piggin
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Nick Piggin
     

04 Sep, 2008

1 commit

  • Impact: performance optimization

    I did some re-benchmarking with modern compilers, and dropping
    -funroll-loops makes the function consistently run a few percent
    faster. So drop that flag.

    Thanks to Richard Guenther for a hint.

    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
