16 Apr, 2012

2 commits

  • This can happen if the instruction is much longer than the maximum length,
    or if insn->opnd_bytes is manually changed.

    This patch also fixes warnings from the -Wswitch-default flag.

    Reported-by: Prashanth Nageshappa
    Signed-off-by: Masami Hiramatsu
    Cc: Linus Torvalds
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Linux-mm
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Christoph Hellwig
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Anton Arapov
    Cc: Srikar Dronamraju
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120413032427.32577.42602.stgit@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • The 'max' range needs to be unsigned, since the size of the user address
    space is bigger than 2GB.

    We know that 'count' is positive in 'long' (that is checked in the
    caller), so we will truncate 'max' down to something that fits in a
    signed long, but before we actually do that, that comparison needs to be
    done in unsigned.

    Bug introduced in commit 92ae03f2ef99 ("x86: merge 32/64-bit versions of
    'strncpy_from_user()' and speed it up"). On x86-64 you can't trigger
    this, since the user address space is much smaller than 63 bits, and on
    x86-32 it works in practice, since you would seldom hit the strncpy
    limits anyway.

    While I had actually tested the corner-cases, I had only tested them on
    x86-64. Besides, I had only worried about the case of a pointer *close*
    to the end of the address space, rather than really far away from it ;)

    This also changes the "we hit the user-specified maximum" to return
    'res', for the trivial reason that gcc seems to generate better code
    that way. 'res' and 'count' are the same in that case, so it really
    doesn't matter which one we return.
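
    As a rough standalone sketch of that ordering (illustrative only, not the
    kernel code; clamp_to_count() is a made-up helper):

    #include <stdio.h>

    static long clamp_to_count(unsigned long max, long count)
    {
            /* 'count' is known to be positive; compare in unsigned,
             * then truncate so the result fits in a signed long. */
            if (max > (unsigned long)count)
                    max = count;
            return (long)max;
    }

    int main(void)
    {
            /* e.g. on x86-32, 'max' can be bigger than LONG_MAX (2GB) */
            printf("%ld\n", clamp_to_count(0xbfff0000UL, 128));
            return 0;
    }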

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Apr, 2012

1 commit

  • This merges the 32- and 64-bit versions of the x86 strncpy_from_user()
    by just rewriting it in C rather than the ancient inline asm versions
    that used lodsb/stosb and had been duplicated for (trivial) differences
    between the 32-bit and 64-bit versions.

    While doing that, it also speeds them up by doing the accesses a word at
    a time. Finally, the new routines also properly handle the case of
    hitting the end of the address space, which we have never done correctly
    before (fs/namei.c has a hack around it for that reason).

    Despite all these improvements, it actually removes more lines than it
    adds, due to the de-duplication. Also, we no longer export (or define)
    the legacy __strncpy_from_user() function (that was defined to not do
    the user permission checks), since it's not actually used anywhere, and
    the user address space checks are built in to the new code.

    Other architecture maintainers have been notified that the old hack in
    fs/namei.c will be going away in the 3.5 merge window, in case they
    copied the x86 approach of being a bit cavalier about the end of the
    address space.
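
    As a rough illustration of the word-at-a-time idea mentioned above (a
    sketch, not the kernel implementation; constants shown for 64-bit words):

    #include <stdint.h>
    #include <stdio.h>

    /* A word contains a zero byte iff this expression is non-zero. */
    #define ONE_BYTES  0x0101010101010101ULL
    #define HIGH_BITS  0x8080808080808080ULL

    static uint64_t has_zero_byte(uint64_t v)
    {
            return (v - ONE_BYTES) & ~v & HIGH_BITS;
    }

    int main(void)
    {
            /* "aaaaaaaa" has no NUL byte, "aaa\0aaaa" does */
            printf("%d %d\n",
                   has_zero_byte(0x6161616161616161ULL) != 0,
                   has_zero_byte(0x6161610061616161ULL) != 0);
            return 0;
    }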

    Cc: linux-arch@vger.kernel.org
    Cc: Ingo Molnar
    Cc: Peter Anvin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Mar, 2012

2 commits

  • Pull x86/atomic changes from Ingo Molnar.

    * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: atomic64 assembly improvements
    x86: Adjust asm constraints in atomic64 wrappers

    Linus Torvalds
     
  • Pull x86/asm changes from Ingo Molnar

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Include probe_roms.h in probe_roms.c
    x86/32: Print control and debug registers for kernel context
    x86: Tighten dependencies of CPU_SUP_*_32
    x86/numa: Improve internode cache alignment
    x86: Fix the NMI nesting comments
    x86-64: Improve insn scheduling in SAVE_ARGS_IRQ
    x86-64: Fix CFI annotations for NMI nesting code
    bitops: Add missing parentheses to new get_order macro
    bitops: Optimise get_order()
    bitops: Adjust the comment on get_order() to describe the size==0 case
    x86/spinlocks: Eliminate TICKET_MASK
    x86-64: Handle byte-wise tail copying in memcpy() without a loop
    x86-64: Fix memcpy() to support sizes of 4Gb and above
    x86-64: Fix memset() to support sizes of 4Gb and above
    x86-64: Slightly shorten copy_page()

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

10 Mar, 2012

1 commit

  • Commit f0fbf0abc093 ("x86: integrate delay functions") converted
    delay_tsc() into a random delay generator for 64 bit. The reason is
    that it merged the mostly identical versions of delay_32.c and
    delay_64.c, but the subtle difference in the result was:

    static void delay_tsc(unsigned long loops)
    {
    -       unsigned bclock, now;
    +       unsigned long bclock, now;

    Now the function uses rdtscl() which returns the lower 32bit of the
    TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64
    bit this fails when the lower 32bit are close to wrap around when
    bclock is read, because the following check

    if ((now - bclock) >= loops)
            break;

    evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0
    because the unsigned long (now - bclock) of these values results in
    0xffffffff00000001 which is definitely larger than the loops
    value. That explains Tvrtko's observation:

    "Because I am seeing udelay(500) (_occasionally_) being short, and
    that by delaying for some duration between 0us (yep) and 491us."

    Make those variables explicitly u32 again, so this works for both 32
    and 64 bit.
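
    The effect is easy to reproduce in plain C (a standalone illustration,
    not the kernel code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t bclock32 = 0xffffffffu, now32 = 0;
            uint64_t bclock64 = 0xffffffffull, now64 = 0;

            /* 32-bit arithmetic wraps: one tick elapsed, as intended. */
            printf("u32: %u\n", (uint32_t)(now32 - bclock32));

            /* 64-bit arithmetic does not wrap at 32 bits, so the elapsed
             * time looks huge and the delay loop exits early. */
            printf("u64: %llu\n", (unsigned long long)(now64 - bclock64));
            return 0;
    }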

    Reported-by: Tvrtko Ursulin
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org # >= 2.6.27
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

11 Feb, 2012

1 commit

  • Fix decoding of grouped AVX instructions with VEX pp bits, which should
    be handled the same way as last-prefixes. This fixes the warnings below,
    seen in the posttest with CONFIG_CRYPTO_SHA1_SSSE3=y.

    Warning: arch/x86/tools/test_get_len found difference at :ffffffff810d5fc0
    Warning: ffffffff810d6069: c5 f9 73 de 04 vpsrldq $0x4,%xmm6,%xmm0
    Warning: objdump says 5 bytes, but insn_get_length() says 4
    ...

    With this change, test_get_len can decode it correctly.

    $ arch/x86/tools/test_get_len -v -y
    ffffffff810d6069: c5 f9 73 de 04 vpsrldq $0x4,%xmm6,%xmm0
    Succeed: decoded and checked 1 instructions

    Reported-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20120210053340.30429.73410.stgit@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

27 Jan, 2012

2 commits

  • While hard to measure, reducing the number of possibly/likely
    mis-predicted branches can generally be expected to be slightly
    better.

    Contrary to what might be apparent at first glance, this also doesn't
    grow the function size (the alignment gap to the next function just
    gets smaller).

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/4F218584020000780006F422@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • While currently there doesn't appear to be any reachable in-tree
    case where such large memory blocks may be passed to memcpy(),
    we already had hit the problem in our Xen kernels. Just like
    done recently for mmeset(), rather than working around it,
    prevent others from falling into the same trap by fixing this
    long standing limitation.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/4F21846F020000780006F3FA@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

26 Jan, 2012

1 commit

  • While currently there doesn't appear to be any reachable in-tree
    case where such large memory blocks may be passed to memset()
    (alloc_bootmem() being the primary non-reachable one, as it gets
    called with suitably large sizes in FLATMEM configurations), we
    have recently hit the problem a second time in our Xen kernels.

    Rather than working around it a second time, prevent others from
    falling into the same trap by fixing this long standing
    limitation.
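
    The flavour of the limitation, as a standalone illustration (assuming the
    byte count was handled as a 32-bit quantity):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t len = 5ULL << 30;              /* 5 GiB requested */
            uint32_t truncated = (uint32_t)len;     /* high bits lost  */

            printf("requested %llu bytes, handled %u bytes\n",
                   (unsigned long long)len, truncated);
            return 0;
    }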

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/4F05D992020000780006AA09@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

21 Jan, 2012

2 commits

  • In the "xchg" implementation, %ebx and %ecx don't need to be copied
    into %eax and %edx respectively (this is only necessary when desiring
    to only read the stored value).

    In the "add_unless" implementation, swapping the use of %ecx and %esi
    for passing arguments allows %esi to become an input only (i.e.
    permitting the register to be re-used to address the same object
    without reload).

    In "{add,sub}_return", doing the initial read64 through the passed in
    %ecx decreases a register dependency.

    In "inc_not_zero", a branch can be eliminated by or-ing together the
    two halves of the current (64-bit) value, and code size can be further
    reduced by adjusting the arithmetic slightly.
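
    The zero test behind the inc_not_zero trick, as a standalone sketch (on
    32-bit the 64-bit value lives in two registers, so a single OR decides
    the whole test):

    #include <stdint.h>
    #include <stdio.h>

    static int is_zero64(uint32_t lo, uint32_t hi)
    {
            /* Both halves are zero iff their OR is zero. */
            return (lo | hi) == 0;
    }

    int main(void)
    {
            printf("%d %d\n", is_zero64(0, 0), is_zero64(1, 0));
            return 0;
    }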

    v2: Undo the folding of "xchg" and "set".

    Signed-off-by: Jan Beulich
    Link: http://lkml.kernel.org/r/4F19A2BC020000780006E0DC@nat28.tlf.novell.com
    Cc: Luca Barbieri
    Cc: Eric Dumazet
    Signed-off-by: H. Peter Anvin

    Jan Beulich
     
  • Eric pointed out overly restrictive constraints in atomic64_set(), but
    there are issues throughout the file. In the cited case, %ebx and %ecx
    are inputs only (don't get changed by either of the two low level
    implementations). This was also the case elsewhere.

    Further, in many cases early-clobber indicators were missing.

    Finally, the previous implementation rolled a custom alternative
    instruction macro from scratch, rather than using alternative_call()
    (which was introduced with the commit that the description of the
    change in question actually refers to). Adjusting has the benefit of
    not hiding referenced symbols from the compiler, which however requires
    them to be declared not just in the exporting source file (which, as a
    desirable side effect, in turn allows that exporting file to become a
    real 5-line stub).

    This patch does not eliminate the overly restrictive memory clobbers,
    however: Doing so would occasionally make the compiler set up a second
    register for accessing the memory object (to satisfy the added "m"
    constraint), and it's not clear which of the two non-optimal
    alternatives is better.
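
    A minimal example of what an early-clobber marker buys (illustrative GCC
    inline asm for x86, not the atomic64 code; add_twice() is a made-up
    function):

    #include <stdio.h>

    static int add_twice(int a, int b)
    {
            int tmp;

            /* 'tmp' is written before the last read of 'b', so it is
             * declared "=&r": without the '&', the compiler could
             * legally place 'tmp' and 'b' in the same register. */
            asm("mov %[a], %[t]\n\t"
                "add %[b], %[t]\n\t"
                "add %[b], %[t]"
                : [t] "=&r" (tmp)
                : [a] "r" (a), [b] "r" (b));
            return tmp;
    }

    int main(void)
    {
            printf("%d\n", add_twice(1, 2));        /* prints 5 */
            return 0;
    }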

    v2: Re-do the declaration and exporting of the internal symbols.

    Reported-by: Eric Dumazet
    Signed-off-by: Jan Beulich
    Link: http://lkml.kernel.org/r/4F19A2A5020000780006E0D9@nat28.tlf.novell.com
    Cc: Luca Barbieri
    Signed-off-by: H. Peter Anvin

    Jan Beulich
     

20 Jan, 2012

1 commit

  • …-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/accounting, proc: Fix /proc/stat interrupts sum

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT
    x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore
    x86/kprobes: Fix typo transferred from Intel manual

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
    x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
    x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
    x86/kconfig: Move the ZONE_DMA entry under a menu
    x86/UV2: Add accounting for BAU strong nacks
    x86/UV2: Ack BAU interrupt earlier
    x86/UV2: Remove stale no-resources test for UV2 BAU
    x86/UV2: Work around BAU bug
    x86/UV2: Fix BAU destination timeout initialization
    x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode
    x86: Get rid of dubious one-bit signed bitfield

    Linus Torvalds
     

18 Jan, 2012

1 commit

  • The Intel documentation at

    http://software.intel.com/file/36945

    shows the ANDN opcode and Group 17 with encodings f2 and f3
    respectively. The current version of x86-opcode-map.txt shows them
    with f3 and f4. Unless someone can point to documentation which shows
    the currently used encoding, the following patch should be applied.

    Signed-off-by: Ulrich Drepper
    Link: http://lkml.kernel.org/r/CAOPLpQdq5SuVo9=023CYhbFLAX9rONyjmYq7jJkqc5xwctW5eA@mail.gmail.com
    Signed-off-by: H. Peter Anvin

    Ulrich Drepper
     

16 Jan, 2012

1 commit

  • The arch/x86/lib/x86-opcode-map.txt file [used by the
    kprobes instruction decoder] contains the line:

    af: SCAS/W/D/Q rAX,Xv

    This is what the Intel manuals show, but it's not correct.
    The 'X' stands for:

    Memory addressed by the DS:rSI register pair (for example, MOVS, CMPS, OUTS, or LODS).

    On the other hand 'Y' means (also see the ae byte entry for
    SCASB):

    Memory addressed by the ES:rDI register pair (for example, MOVS, CMPS, INS, STOS, or SCAS).

    Signed-off-by: Ulrich Drepper
    Acked-by: Masami Hiramatsu
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/CAOPLpQfytPyDEBF1Hbkpo7ovUerEsstVGxBr%3DEpDL-BKEMaqLA@mail.gmail.com
    Signed-off-by: Ingo Molnar

    Ulrich Drepper
     

07 Jan, 2012

1 commit

  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    x86: Fix atomic64_xxx_cx8() functions
    x86: Fix and improve cmpxchg_double{,_local}()
    x86_64, asm: Optimise fls(), ffs() and fls64()
    x86, bitops: Move fls64.h inside __KERNEL__
    x86: Fix and improve percpu_cmpxchg{8,16}b_double()
    x86: Report cpb and eff_freq_ro flags correctly
    x86/i386: Use less assembly in strlen(), speed things up a bit
    x86: Use the same node_distance for 32 and 64-bit
    x86: Fix rflags in FAKE_STACK_FRAME
    x86: Clean up and extend do_int3()
    x86: Call do_notify_resume() with interrupts enabled
    x86/div64: Add a micro-optimization shortcut if base is power of two
    x86-64: Cleanup some assembly entry points
    x86-64: Slightly shorten line system call entry and exit paths
    x86-64: Reduce amount of redundant code generated for invalidate_interruptNN
    x86-64: Slightly shorten int_ret_from_sys_call
    x86, efi: Convert efi_phys_get_time() args to physical addresses
    x86: Default to vsyscall=emulate
    x86-64: Set siginfo and context on vsyscall emulation faults
    x86: consolidate xchg and xadd macros
    ...

    Linus Torvalds
     

06 Jan, 2012

1 commit

  • %r13 got saved and restored without ever getting touched, so
    there's no need to do so.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/4F05D9F9020000780006AA0D@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

13 Dec, 2011

1 commit

  • The current i386 strlen() hardcodes a NOT/DEC sequence, and DEC is
    reported to be suboptimal on Core2. So, keep only the REPNE SCASB
    sequence in assembly and let the compiler do the rest.

    The difference in generated code is like below (MCORE2=y):

    :
    push %edi
    mov $0xffffffff,%ecx
    mov %eax,%edi
    xor %eax,%eax
    repnz scas %es:(%edi),%al
    not %ecx

    - dec %ecx
    - mov %ecx,%eax
    + lea -0x1(%ecx),%eax

    pop %edi
    ret
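
    In userspace terms the idea looks roughly like this (a sketch using GCC
    inline asm, not the kernel implementation; my_strlen() is a made-up
    name):

    #include <stddef.h>
    #include <stdio.h>

    static size_t my_strlen(const char *s)
    {
            size_t count = (size_t)-1;
            const char *p = s;

            /* Only the scan itself stays in assembly... */
            asm("repne scasb"
                : "+D" (p), "+c" (count)
                : "a" (0)
                : "cc", "memory");

            /* ...the NOT/DEC (or LEA) fixup is left to the compiler. */
            return ~count - 1;
    }

    int main(void)
    {
            printf("%zu\n", my_strlen("hello"));    /* prints 5 */
            return 0;
    }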

    Signed-off-by: Alexey Dobriyan
    Cc: Linus Torvalds
    Cc: Jan Beulich
    Link: http://lkml.kernel.org/r/20111211181319.GA17097@p183.telecom.by
    Signed-off-by: Ingo Molnar

    Alexey Dobriyan
     

05 Dec, 2011

2 commits

  • Since the new Intel Software Developer's Manual introduces a
    new format for the AVX instruction set (including AVX2),
    x86-opcode-map.txt needs to be updated to match those
    changes.

    Signed-off-by: Masami Hiramatsu
    Cc: "H. Peter Anvin"
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloud
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • To reduce the memory usage of the attribute tables, the x86 instruction
    decoder puts the "Group" attribute only in the "no-last-prefix"
    attribute table (the same as the vex_p == 0 case).

    Thus, the decoder should look up the no-last-prefix table first, and
    only if the entry is not a group, move on to the "with-last-prefix"
    table (vex_p != 0).

    However, in the current implementation inat_get_avx_attribute()
    looks up the with-last-prefix table directly. So, when decoding
    a grouped AVX instruction, the decoder fails to find the correct
    group because there is no "Group" attribute in that table.
    This ends up mis-decoding instructions, as Ingo
    reported in http://thread.gmane.org/gmane.linux.kernel/1214103

    This patch fixes it to check the no-last-prefix table first
    even for an AVX instruction, and to take the attribute from the
    "with-last-prefix" table only if the entry is not a group.

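    The intended lookup order, as a standalone sketch (hypothetical tables
    and group flag, not the kernel's inat API):

    #include <stdio.h>

    #define ATTR_GROUP 0x1  /* hypothetical "entry is a group" flag */

    /* Hypothetical stand-ins for the two attribute tables. */
    static const unsigned table_no_pfx[256]   = { [0x73] = ATTR_GROUP };
    static const unsigned table_with_pfx[256] = { 0 };

    static unsigned get_avx_attribute(unsigned char opcode, int vex_p)
    {
            unsigned attr = table_no_pfx[opcode];

            /* Groups are recorded only in the no-last-prefix table, so
             * keep that attribute; otherwise fall back to the
             * with-last-prefix table. */
            if (!(attr & ATTR_GROUP) && vex_p)
                    attr = table_with_pfx[opcode];
            return attr;
    }

    int main(void)
    {
            printf("opcode 0x73, pp=1 -> %#x\n", get_avx_attribute(0x73, 1));
            return 0;
    }
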
    Reported-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Cc: "H. Peter Anvin"
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloud
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

10 Oct, 2011

1 commit

  • Harden the x86 insn decoder against invalid-length
    instructions. This adds a length check at each byte-read
    site; if the read would exceed MAX_INSN_SIZE, the decoder returns
    immediately. This can happen when decoding a user-space binary.

    The caller can check whether this happened by testing whether the
    corresponding insn.*.got member is set.
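
    The hardening pattern, as a rough standalone sketch (hypothetical struct
    and helper names, not the kernel's struct insn):

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_INSN_SIZE 16        /* architectural maximum on x86 */

    struct decoder {
            const uint8_t *buf;     /* possibly short input buffer  */
            unsigned int len;       /* bytes actually available     */
            unsigned int next;      /* current read offset          */
            int got;                /* set once a field was read OK */
    };

    static int get_next_byte(struct decoder *d, uint8_t *out)
    {
            /* Validate the offset before every single byte read. */
            if (d->next >= d->len || d->next >= MAX_INSN_SIZE)
                    return -1;      /* caller sees d->got unset */
            *out = d->buf[d->next++];
            d->got = 1;
            return 0;
    }

    int main(void)
    {
            const uint8_t bytes[] = { 0x90 };       /* nop */
            struct decoder d = { bytes, sizeof(bytes), 0, 0 };
            uint8_t b;

            printf("%d %d\n", get_next_byte(&d, &b), get_next_byte(&d, &b));
            return 0;
    }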

    Signed-off-by: Masami Hiramatsu
    Cc: Stephane Eranian
    Cc: Andi Kleen
    Cc: acme@redhat.com
    Cc: ming.m.lin@intel.com
    Cc: robert.richter@amd.com
    Cc: ravitillo@lbl.gov
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111007133155.10933.58577.stgit@localhost.localdomain
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

23 Jul, 2011

2 commits

  • * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-64, vdso: Do not allocate memory for the vDSO
    clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
    x86, vdso: Drop now wrong comment
    Document the vDSO and add a reference parser
    ia64: Replace clocksource.fsys_mmio with generic arch data
    x86-64: Move vread_tsc and vread_hpet into the vDSO
    clocksource: Replace vread with generic arch data
    x86-64: Add --no-undefined to vDSO build
    x86-64: Allow alternative patching in the vDSO
    x86: Make alternative instruction pointers relative
    x86-64: Improve vsyscall emulation CS and RIP handling
    x86-64: Emulate legacy vsyscalls
    x86-64: Fill unused parts of the vsyscall page with 0xcc
    x86-64: Remove vsyscall number 3 (venosys)
    x86-64: Map the HPET NX
    x86-64: Remove kernel.vsyscall64 sysctl
    x86-64: Give vvars their own page
    x86-64: Document some of entry_64.S
    x86-64: Fix alignment of jiffies variable

    Linus Torvalds
     
  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix write lock scalability 64-bit issue
    x86: Unify rwsem assembly implementation
    x86: Unify rwlock assembly implementation
    x86, asm: Fix binutils 2.16 issue with __USER32_CS
    x86, asm: Cleanup thunk_64.S
    x86, asm: Flip RESTORE_ARGS arguments logic
    x86, asm: Flip SAVE_ARGS arguments logic
    x86, asm: Thin down SAVE/RESTORE_* asm macros

    Linus Torvalds
     

22 Jul, 2011

1 commit

  • copy_from_user_nmi() is used in oprofile and perf. Move it alongside
    other library functions like copy_from_user(). As this is x86 code for 32
    and 64 bits, create a new file, usercopy.c, for the unified code.

    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com
    Signed-off-by: Ingo Molnar

    Robert Richter
     

21 Jul, 2011

3 commits

  • With the write lock path simply subtracting RW_LOCK_BIAS there
    is, on large systems, the theoretical possibility of overflowing
    the 32-bit value that was used so far (namely if 128 or more
    CPUs manage to do the subtraction, but don't get to do the
    inverse addition in the failure path quickly enough).

    A first measure is to modify RW_LOCK_BIAS itself - with the new
    value chosen, it is good for up to 2048 CPUs each allowed to
    nest over 2048 times on the read path without causing an issue.
    Quite possibly it would even be sufficient to adjust the bias a
    little further, assuming that allowing for significantly less
    nesting would suffice.

    However, as the original value chosen allowed for even more
    nesting levels, to support more than 2048 CPUs (possible
    currently only for 64-bit kernels) the lock itself gets widened
    to 64 bits.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Rather than having two functionally identical implementations
    for 32- and 64-bit configurations, use the previously extended
    assembly abstractions to fold the rwsem two implementations into
    a shared one.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Rather than having two functionally identical implementations
    for 32- and 64-bit configurations, extend the existing assembly
    abstractions enough to fold the two rwlock implementations into
    a shared one.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

04 Jun, 2011

1 commit

  • Drop thunk_ra macro in favor of an additional argument to the thunk
    macro since their bodies are almost identical. Do a whitespace scrubbing
    and use CFI-aware macros for full annotation.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1306873314-32523-5-git-send-email-bp@alien8.de
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
     

20 May, 2011

1 commit

  • …inus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, apic: Print verbose error interrupt reason on apic=debug

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Demacro CONFIG_PARAVIRT cpu accessors

    * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix mrst sparse complaints
    x86: Fix spelling error in the memcpy() source code comment
    x86, mpparse: Remove unnecessary variable

    Linus Torvalds
     

18 May, 2011

4 commits

  • As reported in BZ #30352:

    https://bugzilla.kernel.org/show_bug.cgi?id=30352

    there's a kernel bug related to reading the last allowed page on x86_64.

    The _copy_to_user() and _copy_from_user() functions use the following
    check for address limit:

    if (buf + size >= limit)
    fail();

    while it should be more permissive:

    if (buf + size > limit)
    fail();

    That's because the size represents the number of bytes being
    read/written from/to the buf address AND including the buf address.
    So the copy function will actually never touch the limit
    address even if "buf + size == limit".

    Following program fails to use the last page as buffer
    due to the wrong limit check:

    #include <assert.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/socket.h>

    #define PAGE_SIZE (4096)
    #define LAST_PAGE ((void*)(0x7fffffffe000))

    int main()
    {
            int fds[2], err;
            void *ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                             MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);

            assert(ptr == LAST_PAGE);
            err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
            assert(err == 0);
            err = send(fds[0], ptr, PAGE_SIZE, 0);
            perror("send");
            assert(err == PAGE_SIZE);
            err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
            perror("recv");
            assert(err == PAGE_SIZE);
            return 0;
    }

    The other place checking the addr limit is the access_ok() function,
    which is working properly. There's just a misleading comment
    for the __range_not_ok() macro - which this patch fixes as well.

    The last page of the user-space address range is a guard page, and
    Brian Gerst observed that the guard page itself is needed due to an
    erratum on K8 cpus (#121 Sequential Execution Across Non-Canonical
    Boundary Causes Processor Hang).

    However, the test code is using the last valid page before the guard page.
    The bug is that the last byte before the guard page can't be read
    because of the off-by-one error. The guard page is left in place.

    This bug would normally not show up because the last page is
    part of the process stack and never accessed via syscalls.

    Signed-off-by: Jiri Olsa
    Acked-by: Brian Gerst
    Acked-by: Linus Torvalds
    Cc:
    Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • Support memset() with enhanced rep stosb. On processors supporting enhanced
    REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
    overrides the fast string alternative memset_c and the original function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     
  • Support memmove() by enhanced rep movsb. On processors supporting enhanced
    REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
    overrides the original function.

    The patch doesn't change the backward memmove case to use enhanced rep
    movsb.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu
     
  • Support memcpy() with enhanced rep movsb. On processors supporting
    enhanced rep movsb, the alternative memcpy() function using enhanced rep
    movsb overrides the original function and the fast string function.

    Signed-off-by: Fenghua Yu
    Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin

    Fenghua Yu