07 Jan, 2012

1 commit

  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    x86: Fix atomic64_xxx_cx8() functions
    x86: Fix and improve cmpxchg_double{,_local}()
    x86_64, asm: Optimise fls(), ffs() and fls64()
    x86, bitops: Move fls64.h inside __KERNEL__
    x86: Fix and improve percpu_cmpxchg{8,16}b_double()
    x86: Report cpb and eff_freq_ro flags correctly
    x86/i386: Use less assembly in strlen(), speed things up a bit
    x86: Use the same node_distance for 32 and 64-bit
    x86: Fix rflags in FAKE_STACK_FRAME
    x86: Clean up and extend do_int3()
    x86: Call do_notify_resume() with interrupts enabled
    x86/div64: Add a micro-optimization shortcut if base is power of two
    x86-64: Cleanup some assembly entry points
    x86-64: Slightly shorten line system call entry and exit paths
    x86-64: Reduce amount of redundant code generated for invalidate_interruptNN
    x86-64: Slightly shorten int_ret_from_sys_call
    x86, efi: Convert efi_phys_get_time() args to physical addresses
    x86: Default to vsyscall=emulate
    x86-64: Set siginfo and context on vsyscall emulation faults
    x86: consolidate xchg and xadd macros
    ...

    Linus Torvalds
     

13 Dec, 2011

1 commit

  • The current i386 strlen() hardcodes a NOT/DEC sequence. DEC is
    reported to be suboptimal on Core2. So put only the REPNE SCASB
    sequence in assembly and let the compiler do the rest.

    The difference in generated code is like below (MCORE2=y):

    push %edi
    mov $0xffffffff,%ecx
    mov %eax,%edi
    xor %eax,%eax
    repnz scas %es:(%edi),%al
    not %ecx

    - dec %ecx
    - mov %ecx,%eax
    + lea -0x1(%ecx),%eax

    pop %edi
    ret

    Signed-off-by: Alexey Dobriyan
    Cc: Linus Torvalds
    Cc: Jan Beulich
    Link: http://lkml.kernel.org/r/20111211181319.GA17097@p183.telecom.by
    Signed-off-by: Ingo Molnar

    Alexey Dobriyan
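
    A hedged user-space sketch of the approach described in this entry,
    using GCC extended inline asm: only the REPNE SCASB scan stays in
    assembly, and the compiler is left to turn "~count - 1" into NOT/DEC
    or NOT/LEA as it sees fit. The kernel's actual routine lives in
    arch/x86/lib/string_32.c and differs in detail.

    #include <stddef.h>

    static size_t strlen_scasb(const char *s)
    {
            unsigned long count = -1UL;     /* scan "forever" */

            /* repne scasb advances the destination index register until
             * it finds the NUL byte in %al, decrementing the counter
             * once per byte examined. */
            asm("repne scasb"
                : "+D" (s), "+c" (count)
                : "a" (0)
                : "memory", "cc");

            return ~count - 1;              /* iterations taken, minus the NUL */
    }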
     

05 Dec, 2011

2 commits

  • Since the new Intel Software Developer's Manual introduces a new
    format for the AVX instruction set (including AVX2), update
    x86-opcode-map.txt to reflect those changes.

    Signed-off-by: Masami Hiramatsu
    Cc: "H. Peter Anvin"
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloud
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • To reduce the memory usage of the attribute table, the x86 instruction
    decoder puts the "Group" attribute only in the "no-last-prefix"
    attribute table (the same as the vex_p == 0 case).

    Thus, the decoder should consult the no-last-prefix table first and,
    only if the opcode is not a group, move on to the "with-last-prefix"
    table (vex_p != 0).

    However, in the current implementation inat_get_avx_attribute()
    looks up the with-last-prefix table directly. So, when decoding a
    grouped AVX instruction, the decoder fails to find the correct group
    because there is no "Group" attribute in that table. This ends up
    mis-decoding instructions, as Ingo reported in
    http://thread.gmane.org/gmane.linux.kernel/1214103

    This patch fixes it to check the no-last-prefix table first even for
    an AVX instruction, and to take an attribute from the
    "with-last-prefix" table only if the opcode is not a group.

    Reported-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: "H. Peter Anvin"
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloud
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
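
    A simplified, self-contained C sketch of the lookup order this fix
    establishes. The table layout, the INAT_GROUP flag value and the
    function name are illustrative stand-ins, not the real inat.c code.

    #include <stdio.h>

    typedef unsigned int insn_attr_t;
    #define INAT_GROUP 0x100u               /* illustrative "is a group" flag */

    /* Toy attribute tables: index 0 is the no-last-prefix (vex_p == 0)
     * table, indices 1..3 are the with-last-prefix tables. */
    static const insn_attr_t avx_table[4][256] = {
            [0][0x71] = INAT_GROUP | 0x1,   /* a grouped opcode */
            [1][0x71] = 0x2,                /* per-prefix variant, no group flag */
    };

    static insn_attr_t get_avx_attribute(unsigned char opcode, unsigned int vex_p)
    {
            insn_attr_t attr = avx_table[0][opcode];     /* no-last-prefix first */

            if (!(attr & INAT_GROUP) && vex_p)
                    attr = avx_table[vex_p][opcode];     /* then with-last-prefix */

            return attr;
    }

    int main(void)
    {
            /* keeps the GROUP attribute instead of losing it */
            printf("attr=%#x\n", get_avx_attribute(0x71, 2));
            return 0;
    }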
     

10 Oct, 2011

1 commit

  • Harden the x86 instruction decoder against invalid-length
    instructions. This adds a length check to every byte-read site; if the
    read would exceed MAX_INSN_SIZE, the decoder returns immediately. This
    can happen when decoding a user-space binary.

    Callers can check whether this happened by testing whether the
    corresponding insn.*.got member is set.

    Signed-off-by: Masami Hiramatsu
    Cc: Stephane Eranian
    Cc: Andi Kleen
    Cc: acme@redhat.com
    Cc: ming.m.lin@intel.com
    Cc: robert.richter@amd.com
    Cc: ravitillo@lbl.gov
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111007133155.10933.58577.stgit@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
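
    A rough user-space sketch of the bounds-checked byte-read pattern
    described above. The structure and field names are illustrative, not
    the real insn.c helpers, but the idea is the same: never read past
    MAX_INSN_SIZE and let the caller detect truncation via a "got" flag.

    #include <stdio.h>

    #define MAX_INSN_SIZE 16

    struct insn_buf {
            const unsigned char *kaddr;     /* start of the instruction bytes */
            const unsigned char *next;      /* next byte to consume */
            const unsigned char *end;       /* kaddr + MAX_INSN_SIZE */
            int got;                        /* set once a field decoded completely */
    };

    /* Bounds-checked byte read: fails instead of running past
     * MAX_INSN_SIZE, which can happen on garbage user-space bytes. */
    static int get_next_byte(struct insn_buf *b, unsigned char *out)
    {
            if (b->next >= b->end)
                    return -1;              /* caller sees .got == 0 and bails out */
            *out = *b->next++;
            return 0;
    }

    int main(void)
    {
            unsigned char bytes[MAX_INSN_SIZE] = { 0x90 };  /* NOP + padding */
            struct insn_buf b = { bytes, bytes, bytes + sizeof(bytes), 0 };
            unsigned char op;

            if (get_next_byte(&b, &op) == 0) {
                    b.got = 1;
                    printf("opcode byte: %#x\n", op);
            }
            return 0;
    }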
     

27 Jul, 2011

1 commit

  • This allows us to move code currently duplicated in the
    per-architecture atomic headers (atomic_inc_not_zero() for now) into
    the generic atomic header.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
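
    A user-space sketch, using C11 atomics, of the kind of helper that can
    now live once in a generic header instead of being duplicated per
    architecture. The kernel's real implementation is built on its own
    atomic primitives, so treat this only as an illustration of the
    semantics.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Increment *v unless its current value is u; return non-zero if the
     * add happened. */
    static int atomic_add_unless(atomic_int *v, int a, int u)
    {
            int c = atomic_load(v);

            while (c != u) {
                    /* c is refreshed with the current value on failure */
                    if (atomic_compare_exchange_weak(v, &c, c + a))
                            return 1;
            }
            return 0;
    }

    #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)

    int main(void)
    {
            atomic_int zero = 0, one = 1;

            /* prints "0 1": the zero counter is left alone */
            printf("%d %d\n", atomic_inc_not_zero(&zero), atomic_inc_not_zero(&one));
            return 0;
    }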
     

23 Jul, 2011

2 commits

  • * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-64, vdso: Do not allocate memory for the vDSO
    clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
    x86, vdso: Drop now wrong comment
    Document the vDSO and add a reference parser
    ia64: Replace clocksource.fsys_mmio with generic arch data
    x86-64: Move vread_tsc and vread_hpet into the vDSO
    clocksource: Replace vread with generic arch data
    x86-64: Add --no-undefined to vDSO build
    x86-64: Allow alternative patching in the vDSO
    x86: Make alternative instruction pointers relative
    x86-64: Improve vsyscall emulation CS and RIP handling
    x86-64: Emulate legacy vsyscalls
    x86-64: Fill unused parts of the vsyscall page with 0xcc
    x86-64: Remove vsyscall number 3 (venosys)
    x86-64: Map the HPET NX
    x86-64: Remove kernel.vsyscall64 sysctl
    x86-64: Give vvars their own page
    x86-64: Document some of entry_64.S
    x86-64: Fix alignment of jiffies variable

    Linus Torvalds
     
  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix write lock scalability 64-bit issue
    x86: Unify rwsem assembly implementation
    x86: Unify rwlock assembly implementation
    x86, asm: Fix binutils 2.16 issue with __USER32_CS
    x86, asm: Cleanup thunk_64.S
    x86, asm: Flip RESTORE_ARGS arguments logic
    x86, asm: Flip SAVE_ARGS arguments logic
    x86, asm: Thin down SAVE/RESTORE_* asm macros

    Linus Torvalds
     

22 Jul, 2011

1 commit

  • copy_from_user_nmi() is used in oprofile and perf. Move it to the
    library code, next to other functions like copy_from_user(). As this
    is x86 code shared by 32 and 64 bit, create a new file usercopy.c for
    the unified code.

    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com
    Signed-off-by: Ingo Molnar

    Robert Richter
     

21 Jul, 2011

3 commits

  • With the write lock path simply subtracting RW_LOCK_BIAS there
    is, on large systems, the theoretical possibility of overflowing
    the 32-bit value that was used so far (namely if 128 or more
    CPUs manage to do the subtraction, but don't get to do the
    inverse addition in the failure path quickly enough).

    A first measure is to modify RW_LOCK_BIAS itself - with the new
    value chosen, it is good for up to 2048 CPUs each allowed to
    nest over 2048 times on the read path without causing an issue.
    Quite possibly it would even be sufficient to adjust the bias a
    little further, assuming that allowing for significantly less
    nesting would suffice.

    However, as the original value chosen allowed for even more
    nesting levels, to support more than 2048 CPUs (possible
    currently only for 64-bit kernels) the lock itself gets widened
    to 64 bits.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
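
    A small arithmetic illustration of the overflow window described
    above, assuming the historical 32-bit bias value of 0x01000000; the
    exact new bias and lock layout are in the patch itself.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            const int64_t old_bias = 0x01000000;  /* assumed pre-patch RW_LOCK_BIAS */
            int64_t delta = 128 * old_bias;       /* 128 writers, no failure-path re-add yet */

            /* The accumulated subtraction already equals 2^31, which a signed
             * 32-bit lock word cannot represent; a 64-bit word has ample room. */
            printf("128 * %#llx = %#llx (INT32_MAX is %#lx)\n",
                   (long long)old_bias, (long long)delta, (long)INT32_MAX);

            /* The figure quoted above: 2048 CPUs nesting the read path 2048 deep. */
            printf("2048 * 2048 = %lld read-side slots\n", 2048LL * 2048LL);
            return 0;
    }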
     
  • Rather than having two functionally identical implementations
    for 32- and 64-bit configurations, use the previously extended
    assembly abstractions to fold the two rwsem implementations into
    a shared one.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Rather than having two functionally identical implementations
    for 32- and 64-bit configurations, extend the existing assembly
    abstractions enough to fold the two rwlock implementations into
    a shared one.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

04 Jun, 2011

1 commit

  • Drop thunk_ra macro in favor of an additional argument to the thunk
    macro since their bodies are almost identical. Do a whitespace scrubbing
    and use CFI-aware macros for full annotation.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1306873314-32523-5-git-send-email-bp@alien8.de
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
     

20 May, 2011

1 commit

  • …inus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, apic: Print verbose error interrupt reason on apic=debug

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Demacro CONFIG_PARAVIRT cpu accessors

    * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix mrst sparse complaints
    x86: Fix spelling error in the memcpy() source code comment
    x86, mpparse: Remove unnecessary variable

    Linus Torvalds
     

18 May, 2011

6 commits

  • As reported in BZ #30352:

    https://bugzilla.kernel.org/show_bug.cgi?id=30352

    there's a kernel bug related to reading the last allowed page on x86_64.

    The _copy_to_user() and _copy_from_user() functions use the following
    check for address limit:

    if (buf + size >= limit)
            fail();

    while it should be more permissive:

    if (buf + size > limit)
            fail();

    That's because size is the number of bytes being read/written
    starting at (and including) the buf address, so the copy function
    never actually touches the limit address, even when
    "buf + size == limit".

    Following program fails to use the last page as buffer
    due to the wrong limit check:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/socket.h>

    #define PAGE_SIZE (4096)
    #define LAST_PAGE ((void*)(0x7fffffffe000))

    int main(void)
    {
            int fds[2], err;
            void *ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                             MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);

            assert(ptr == LAST_PAGE);
            err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
            assert(err == 0);
            err = send(fds[0], ptr, PAGE_SIZE, 0);
            perror("send");
            assert(err == PAGE_SIZE);
            err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
            perror("recv");
            assert(err == PAGE_SIZE);
            return 0;
    }

    The other place checking the addr limit is the access_ok() function,
    which is working properly. There's just a misleading comment
    for the __range_not_ok() macro - which this patch fixes as well.

    The last page of the user-space address range is a guard page, and
    Brian Gerst observed that the guard page itself is unusable anyway due
    to an erratum on K8 CPUs (#121 Sequential Execution Across
    Non-Canonical Boundary Causes Processor Hang).

    However, the test code is using the last valid page before the guard page.
    The bug is that the last byte before the guard page can't be read
    because of the off-by-one error. The guard page is left in place.

    This bug would normally not show up because the last page is
    part of the process stack and never accessed via syscalls.

    Signed-off-by: Jiri Olsa
    Acked-by: Brian Gerst
    Acked-by: Linus Torvalds
    Cc:
    Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
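
    A tiny demonstration of the boundary arithmetic, with illustrative
    addresses standing in for the real user-space limit:

    #include <stdio.h>

    int main(void)
    {
            unsigned long limit = 0x7ffffffff000UL;   /* illustrative address limit */
            unsigned long buf   = limit - 0x1000;     /* last valid page */
            unsigned long size  = 0x1000;

            /* Bytes touched are buf .. buf+size-1, so buf+size == limit is fine. */
            printf("old check rejects: %d\n", buf + size >= limit);  /* 1: wrongly fails */
            printf("new check rejects: %d\n", buf + size >  limit);  /* 0: allowed */
            return 0;
    }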
     
  • Support memset() with enhanced rep stosb. On processors supporting enhanced
    REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
    overrides the fast string alternative memset_c and the original function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
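
    A hedged user-space sketch of the core of such a memset: on ERMS
    hardware a bare rep stosb is the whole fill. The kernel selects
    between variants via the alternatives mechanism; this is only the
    inner loop, not memset_c_e itself.

    #include <stddef.h>
    #include <stdio.h>

    static void *memset_erms(void *dst, int c, size_t n)
    {
            void *d = dst;

            /* store AL to [rdi], rcx times */
            asm volatile("rep stosb"
                         : "+D" (d), "+c" (n)
                         : "a" (c)
                         : "memory");
            return dst;
    }

    int main(void)
    {
            char buf[32];

            memset_erms(buf, 'x', sizeof(buf) - 1);
            buf[31] = '\0';
            printf("%s\n", buf);
            return 0;
    }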
     
  • Support memmove() by enhanced rep movsb. On processors supporting enhanced
    REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
    overrides the original function.

    The patch doesn't change the backward memmove case to use enhanced rep
    movsb.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     
  • Support memcpy() with enhanced rep movsb. On processors supporting
    enhanced rep movsb, the alternative memcpy() function using enhanced
    rep movsb overrides the original function and the fast string
    function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     
  • Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB.
    On processors supporting enhanced REP MOVSB/STOSB, the alternative
    copy_user_enhanced_fast_string function using enhanced rep movsb overrides the
    original function and the fast string function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     
  • Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
    REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
    Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
    STOSB).

    Support clear_page() with rep stosb on processors supporting enhanced
    REP MOVSB/STOSB. On such processors, the alternative clear_page_c_e
    function using enhanced REP STOSB overrides the original function and
    the fast string function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
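
    A small user-space check for the feature bit named above
    (CPUID.7.0.EBX[9]), using the cpuid.h helper shipped with recent
    GCC/Clang:

    #include <cpuid.h>
    #include <stdio.h>

    static int has_erms(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* leaf 7, subleaf 0: structured extended feature flags */
            if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                    return 0;
            return !!(ebx & (1u << 9));     /* bit 9: enhanced REP MOVSB/STOSB */
    }

    int main(void)
    {
            printf("ERMS: %s\n", has_erms() ? "yes" : "no");
            return 0;
    }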
     

26 Jan, 2011

1 commit

  • memmove_64.c only implements the memmove() function, which is written
    entirely in inline assembly, so it doesn't make sense to keep that
    assembly code in a .c file.

    Currently memmove() doesn't store its return value in rax, which can
    cause problems if a caller uses the return value. The patch fixes this
    issue.

    Signed-off-by: Fenghua Yu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     

25 Sep, 2010

1 commit

  • The movs instruction combines data accesses to accelerate copying,
    but there are two cases we need to be careful about:

    1. movs needs a long latency to start up, so for small copies we use
    general mov instructions to copy the data.
    2. movs is not good for the unaligned case; even if the source offset
    is 0x10 and the destination offset is 0x0, we avoid movs and handle
    the case with general mov instructions.

    Signed-off-by: Ma Ling
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Ma Ling
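
    A rough user-space illustration of that policy: plain moves for short
    or mutually misaligned copies, rep movsl for the bulk of larger
    aligned ones. The 64-byte cutoff is invented, not the patch's actual
    threshold, and the real routine is assembly.

    #include <stddef.h>
    #include <stdio.h>

    static void *memcpy_sketch(void *dst, const void *src, size_t n)
    {
            unsigned char *d = dst;
            const unsigned char *s = src;
            size_t dwords, tail;

            if (n < 64 || (((unsigned long)d ^ (unsigned long)s) & 3)) {
                    while (n--)
                            *d++ = *s++;            /* general mov path */
                    return dst;
            }

            while ((unsigned long)d & 3) {          /* align the destination */
                    *d++ = *s++;
                    n--;
            }
            dwords = n / 4;
            tail = n % 4;
            asm volatile("rep movsl"                /* bulk 4-byte string move */
                         : "+D" (d), "+S" (s), "+c" (dwords)
                         : : "memory");
            while (tail--)
                    *d++ = *s++;                    /* copy the remainder */
            return dst;
    }

    int main(void)
    {
            char a[128] = "hello, movs", b[128];

            memcpy_sketch(b, a, sizeof(a));
            printf("%s\n", b);
            return 0;
    }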
     

24 Aug, 2010

2 commits

  • All read operations after the allocation stage can run speculatively,
    while all write operations run in program order; if the addresses
    differ, a read may run before an older write, otherwise it waits until
    the write commits. However, the CPU doesn't check every address bit,
    so a read can fail to recognize a different address even when the two
    are in different pages. For example, if rsi is 0xf004 and rdi is
    0xe008, the following sequence incurs a big latency penalty:

    1. movq (%rsi), %rax
    2. movq %rax, (%rdi)
    3. movq 8(%rsi), %rax
    4. movq %rax, 8(%rdi)

    If %rsi and %rdi really were in the same memory page, there would be a
    true read-after-write dependence, because instruction 2 writes 0x008
    and instruction 3 reads 0x00c; the two accesses partially overlap.
    Here they are actually in different pages and there is no real
    dependence, but without checking every address bit the CPU may assume
    they are in the same page, so instruction 3 has to wait for
    instruction 2 to write its data from the write buffer into the cache
    and then load the data from the cache; the time spent on that read is
    comparable to an mfence instruction. We can avoid this by reordering
    the operations as follows:

    1. movq 8(%rsi), %rax
    2. movq %rax, 8(%rdi)
    3. movq (%rsi), %rax
    4. movq %rax, (%rdi)

    Now instruction 3 reads 0x004 while instruction 2 writes address
    0x010, so there is no apparent dependence. On Core2 this gives a 1.83x
    speedup over the original instruction sequence. In this patch we first
    handle small sizes (less than 20 bytes), then jump to the different
    copy modes. Based on our micro-benchmark, for small sizes from 1 to
    127 bytes we got up to a 2x improvement, and up to a 1.5x improvement
    for 1024 bytes on Core i7. (We used our own micro-benchmark and will
    do further testing according to your requirements.)

    Signed-off-by: Ma Ling
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Ma Ling
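
    A minimal sketch of the reordered inner copy from the entry above,
    kept in inline asm so the compiler cannot undo the high-quadword-first
    ordering; the real 64-bit memcpy does far more (prefetching, more
    registers in flight, tail handling).

    #include <stddef.h>
    #include <stdio.h>

    /* Within each 16-byte chunk, copy the high quadword before the low
     * one so a load is never issued right behind a store whose partial
     * address bits overlap it (the false read-after-write case). */
    static void copy_16byte_chunks(void *dst, const void *src, size_t chunks)
    {
            while (chunks--) {
                    asm volatile("movq 8(%[s]), %%rax\n\t"
                                 "movq %%rax, 8(%[d])\n\t"
                                 "movq (%[s]), %%rax\n\t"
                                 "movq %%rax, (%[d])"
                                 :
                                 : [s] "r" (src), [d] "r" (dst)
                                 : "rax", "memory");
                    src = (const char *)src + 16;
                    dst = (char *)dst + 16;
            }
    }

    int main(void)
    {
            char a[32] = "0123456789abcdef0123456789abcde", b[32];

            copy_16byte_chunks(b, a, 2);
            printf("%s\n", b);
            return 0;
    }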
     
  • memmove() allows the source and destination addresses to overlap,
    while memcpy() is not required to handle overlap. Therefore,
    explicitly implement memmove() in both the forward and backward
    directions, to give us the freedom to optimize memcpy().

    Signed-off-by: Ma Ling
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Ma, Ling
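
    A minimal C sketch of the forward/backward split this change
    introduces, with the conventional return-the-destination contract; the
    kernel version is optimized assembly, this only illustrates the
    direction choice.

    #include <stddef.h>
    #include <stdio.h>

    static void *memmove_sketch(void *dst, const void *src, size_t n)
    {
            unsigned char *d = dst;
            const unsigned char *s = src;

            if (d < s || d >= s + n) {
                    while (n--)                    /* forward copy is safe */
                            *d++ = *s++;
            } else {
                    d += n;                        /* backward copy for overlap */
                    s += n;
                    while (n--)
                            *--d = *--s;
            }
            return dst;
    }

    int main(void)
    {
            char buf[] = "abcdef";

            memmove_sketch(buf + 2, buf, 4);       /* overlapping move */
            printf("%s\n", buf);                   /* prints "ababcd" */
            return 0;
    }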
     

14 Aug, 2010

1 commit