Eric Lee / smarc-fsl-linux-kernel

22 Aug, 2012

1 commit

eb48c0714 mm: hugetlbfs: correctly populate shared pmd

Each page mapped in a process's address space must be correctly
accounted for in _mapcount. Normally the rules for this are
straightforward but hugetlbfs page table sharing is different. The page
table pages at the PMD level are reference counted while the mapcount
remains the same.

If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:

kernel BUG at mm/filemap.c:135!
invalid opcode: 0000 [#1] SMP
CPU 22
Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
RIP: 0010:[] [] __delete_from_page_cache+0x15d/0x170
Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
Call Trace:
delete_from_page_cache+0x40/0x80
truncate_hugepages+0x115/0x1f0
hugetlbfs_evict_inode+0x18/0x30
evict+0x9f/0x1b0
iput_final+0xe3/0x1e0
iput+0x3e/0x50
d_kill+0xf8/0x110
dput+0xe2/0x1b0
__fput+0x162/0x240

During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte. The logic is if
the PMD page is the same, they must be shared. This assumes that the
sharing is between the parent and child. However, if the sharing is
with a different process entirely then this check fails as in this
diagram:

parent
  |
  ------------>pmd
               src_pte----------> data page
                                      ^
other--------->pmd--------------------|
                ^
child-----------|
               dst_pte

For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other. This is
possible due to the following style of race.

PROC A                                  PROC B
copy_hugetlb_page_range                 copy_hugetlb_page_range
  src_pte == huge_pte_offset              src_pte == huge_pte_offset
  !src_pte so no sharing                  !src_pte so no sharing

(time passes)

hugetlb_fault                           hugetlb_fault
  huge_pte_alloc                          huge_pte_alloc
    huge_pmd_share                          huge_pmd_share
      LOCK(i_mmap_mutex)
      find nothing, no sharing
      UNLOCK(i_mmap_mutex)
                                              LOCK(i_mmap_mutex)
                                              find nothing, no sharing
                                              UNLOCK(i_mmap_mutex)
    pmd_alloc                               pmd_alloc
    LOCK(instantiation_mutex)
    fault
    UNLOCK(instantiation_mutex)
                                            LOCK(instantiation_mutex)
                                            fault
                                            UNLOCK(instantiation_mutex)

These two processes are now pointing to the same data page but are not
sharing page tables because the opportunity was missed. When either
process later forks, the src_pte == dst_pte check is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table, leading to the BUG_ON.

This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd. This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.
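
The idea of doing the lookup and the populate step in one critical section can be sketched in userspace C. This is only an illustration, not the kernel patch: struct pmd_page, struct mapping and share_or_alloc() are hypothetical names, and a pthread mutex stands in for i_mmap_mutex.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shareable PMD page; not a kernel type. */
struct pmd_page {
    int refcount;
};

/* Hypothetical per-file state: one lock guards both lookup and install. */
struct mapping {
    pthread_mutex_t lock;     /* plays the role of i_mmap_mutex */
    struct pmd_page *shared;  /* NULL until someone populates it */
};

/*
 * Look up a shareable page table and, if none exists, allocate and
 * publish one before dropping the lock. Because lookup and install sit
 * in the same critical section, a second caller cannot "find nothing"
 * once the first caller has decided to allocate.
 */
struct pmd_page *share_or_alloc(struct mapping *m)
{
    struct pmd_page *p;

    pthread_mutex_lock(&m->lock);
    p = m->shared;
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        if (p != NULL)
            m->shared = p;
    }
    if (p != NULL)
        p->refcount++;  /* sharing is reference counted */
    pthread_mutex_unlock(&m->lock);
    return p;
}
```

With this shape, two racing callers end up with the same page and a reference count of two, instead of two private pages that map the same data.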

Race identified and changelog written mostly by Mel Gorman.

[akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman
Tested-by: Larry Woodman
Reviewed-by: Mel Gorman
Signed-off-by: Michal Hocko
Reviewed-by: Rik van Riel
Cc: David Gibson
Cc: Ken Chen
Cc: Cong Wang
Cc: Hillf Danton
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2012-08-22 07:45:02 +0800

21 Aug, 2012

2 commits

c71a35520 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar.

An x32 socket ABI fix with a -stable backport tag among other fixes.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x32: Use compat shims for {g,s}etsockopt
Revert "x86-64/efi: Use EFI to deal with platform wall clock"
x86, apic: fix broken legacy interrupts in the logical apic mode
x86, build: Globally set -fno-pic
x86, avx: don't use avx instructions with "noxsave" boot param

Linus Torvalds
2012-08-21 01:36:18 +0800
f78602ab7 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: disable PEBS on a guest entry.
perf/x86: Add Intel Westmere-EX uncore support
perf/x86: Fixes for Nehalem-EX uncore driver
perf, x86: Fix uncore_types_exit section mismatch

Linus Torvalds
2012-08-21 01:34:21 +0800

19 Aug, 2012

1 commit

515c7af85 x32: Use compat shims for {g,s}etsockopt

Some of the arguments to {g,s}etsockopt are passed in userland pointers.
If we try to use the 64bit entry point, we end up sometimes failing.

For example, dhcpcd doesn't run in x32:
# dhcpcd eth0
dhcpcd[1979]: version 5.5.6 starting
dhcpcd[1979]: eth0: broadcasting for a lease
dhcpcd[1979]: eth0: open_socket: Invalid argument
dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor

The code in particular is getting back EINVAL when doing:
  struct sock_fprog pf;
  setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));

Diving into the kernel code, we can see:
include/linux/filter.h:
  struct sock_fprog {
      unsigned short len;
      struct sock_filter __user *filter;
  };

net/core/sock.c:
  case SO_ATTACH_FILTER:
      ret = -EINVAL;
      if (optlen == sizeof(struct sock_fprog)) {
          struct sock_fprog fprog;

          ret = -EFAULT;
          if (copy_from_user(&fprog, optval, sizeof(fprog)))
              break;

          ret = sk_attach_filter(&fprog, sk);
      }
      break;

arch/x86/syscalls/syscall_64.tbl:
  54 common setsockopt sys_setsockopt
  55 common getsockopt sys_getsockopt

So for x64, sizeof(sock_fprog) is 16 bytes. For x86/x32, it's 8 bytes.
This comes down to the pointer being 32bit for x32, which means we need
to do structure size translation. But since x32 comes in directly to
sys_setsockopt, it doesn't get translated like x86.
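
The size mismatch and the translation a compat shim has to do can be sketched as follows. This is a userspace illustration, not the kernel's compat code: struct fprog_compat, struct fprog_native and fprog_from_compat() are hypothetical names, pointers are modelled as fixed-width integers, and the sizes assume an LP64 x86-64 build where uint64_t is 8-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Layout an ILP32 caller (x86 or x32) passes in: 32-bit pointer, 8 bytes. */
struct fprog_compat {
    uint16_t len;
    uint32_t filter;  /* user pointer, 32 bits wide */
};

/* Layout a 64-bit kernel expects: 64-bit pointer, 16 bytes with padding. */
struct fprog_native {
    uint16_t len;
    uint64_t filter;  /* user pointer, 64 bits wide */
};

/*
 * Widen the compat layout to the native one, which is the kind of
 * translation a compat shim performs before calling the common handler.
 * Returns 0 on success, -1 if optlen does not match the compat size.
 */
int fprog_from_compat(const void *optval, size_t optlen,
                      struct fprog_native *out)
{
    struct fprog_compat in;

    if (optlen != sizeof(in))
        return -1;
    memcpy(&in, optval, sizeof(in));
    out->len = in.len;
    out->filter = in.filter;  /* zero-extend the 32-bit pointer */
    return 0;
}
```

Without such a step, an 8-byte optlen from an x32 caller fails the optlen == sizeof(struct sock_fprog) check shown above, which matches the EINVAL that dhcpcd reports.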

After changing the syscall table and rebuilding glibc with the new kernel
headers, dhcp runs fine in an x32 userland.

Oddly, it seems like Linus noted the same thing during the initial port,
but I guess that was missed/lost along the way:
https://lkml.org/lkml/2011/8/26/452

[ hpa: tagging for -stable since this is an ABI fix. ]

Bugzilla: https://bugs.gentoo.org/423649
Reported-by: Mads
Signed-off-by: Mike Frysinger
Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org
Cc: H. J. Lu
Cc: v3.4..v3.5
Signed-off-by: H. Peter Anvin

Mike Frysinger
2012-08-19 05:15:39 +0800

17 Aug, 2012

1 commit

ad54e4611 Merge tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fix from Konrad Rzeszutek Wilk:
"Way back in v3.5 we added a mechanism to populate back pages that were
released (they overlapped with MMIO regions), but neglected to reserve
the proper amount of virtual space for extend_brk to work properly.

Coincidentally some other commit aligned the _brk space to larger area
so I didn't trigger this until it was run on a machine with more than
2GB of MMIO space."

* On machines with large MMIO/PCI E820 spaces we fail to boot b/c
we failed to pre-allocate large enough virtual space for extend_brk.

* tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.

Linus Torvalds
2012-08-17 02:31:59 +0800

15 Aug, 2012

2 commits

f026cfa82 Revert "x86-64/efi: Use EFI to deal with platform wall clock"

This reverts commit bacef661acdb634170a8faddbc1cf28e8f8b9eee.

This commit has been found to cause serious regressions on a number of
ASUS machines at the least. We probably need to provide a 1:1 map in
addition to the EFI virtual memory map in order for this to work.

Signed-off-by: H. Peter Anvin
Reported-and-bisected-by: Jérôme Carretero
Cc: Jan Beulich
Cc: Matt Fleming
Cc: Matthew Garrett
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu

H. Peter Anvin
2012-08-15 00:58:25 +0800
f1c630018 x86, apic: fix broken legacy interrupts in the logical apic mode

Recent commit 332afa656e76458ee9cf0f0d123016a0658539e4 cleaned up
a workaround that updates irq_cfg domain for legacy irq's that
are handled by the IO-APIC. This was assuming that the recent
changes in assign_irq_vector() were sufficient to remove the workaround.

But this broke a couple of AMD platforms. One of them seems to be
sending interrupts to the offline cpu's, resulting in spurious
"No irq handler for vector xx (irq -1)" messages when those cpu's come online.
And the other platform seems to always send the interrupt to the last logical
CPU (cpu-7). Recent changes had an unintended side effect of using only logical
cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts), so the
legacy interrupts were no longer routed to cpu-7 on that AMD
platform, resulting in a boot hang.

For now, reintroduce the removed workaround (essentially not allowing the
vector to change for legacy irq's when the IO-APIC starts to handle the irq, which
also addresses the unintended side effect of just specifying cpu-0 in the
IO-APIC RTE for those irq's during boot).

Reported-and-tested-by: Robert Richter
Reported-and-tested-by: Borislav Petkov
Signed-off-by: Suresh Siddha
Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin

Suresh Siddha
2012-08-15 00:52:20 +0800

14 Aug, 2012

4 commits

26a4f3c08 perf/x86: disable PEBS on a guest entry.

If a PMU counter has PEBS enabled, it is not enough to disable the counter
on guest entry, since a PEBS memory write can overshoot guest entry
and corrupt guest memory. Disabling PEBS during guest entry solves
the problem.

Tested-by: David Ahern
Signed-off-by: Gleb Natapov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20120809085234.GI3341@redhat.com
Signed-off-by: Thomas Gleixner

Gleb Natapov
2012-08-14 01:01:04 +0800
cb37af771 perf/x86: Add Intel Westmere-EX uncore support

The Westmere-EX uncore is similar to the Nehalem-EX uncore. The
differences are:
- Westmere-EX uncore has 10 instances of Cbox. The MSRs for Cbox8
and Cbox9 in the Westmere-EX aren't contiguous with Cbox 0~7.
- The fvid field in the ZDP_CTL_FVC register in the Mbox is
different. It's 5 bits in the Nehalem-EX, 6 bits in the
Westmere-EX.

Signed-off-by: Yan, Zheng
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1344229882-3907-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner

Yan, Zheng
2012-08-14 01:01:04 +0800
ebb6cc035 perf/x86: Fixes for Nehalem-EX uncore driver

This patch includes the following fixes and updates:
- Only some events in the Sbox and Mbox can use the match/mask
  registers; add code to check this.
- The format definitions for the xbr_mm_cfg and xbr_match registers
  in the Rbox are wrong: xbr_mm_cfg should use 32 bits, xbr_match
  should use 64 bits.
- Clean up the Rbox code. Compute the addresses of extra registers in
  the enable_event function instead of the hw_config function.
  This simplifies the code in nhmex_rbox_alter_er().

Signed-off-by: Yan, Zheng
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1344229882-3907-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner

Yan, Zheng
2012-08-14 01:01:03 +0800
cffa59baa perf, x86: Fix uncore_types_exit section mismatch

Fix the following section mismatch:

WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x7ad9): Section mismatch in reference from the function uncore_types_exit() to the function .init.text:uncore_type_exit()

The function uncore_types_exit() references the function __init
uncore_type_exit(). This is often because uncore_types_exit lacks a
__init annotation or the annotation of uncore_type_exit is wrong.

Caused by 14371cce03c2 ("perf: Add generic PCI uncore PMU device
support").

Cc: Zheng Yan
Cc: Ingo Molnar
Signed-off-by: Borislav Petkov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner

Borislav Petkov
2012-08-14 01:01:03 +0800

11 Aug, 2012

1 commit

484d90eec x86, build: Globally set -fno-pic

GCC built with nonstandard options can enable -fpic by default.
We never want this for 32-bit kernels and it will break the build.

[ hpa: Notably the Android toolchain apparently does this. ]

Change-Id: Iaab7d66e598b1c65ac4a4f0229eca2cd3d0d2898
Signed-off-by: Andrew Boie
Link: http://lkml.kernel.org/r/1344624546-29691-1-git-send-email-andrew.p.boie@intel.com
Signed-off-by: H. Peter Anvin

Andrew Boie
2012-08-11 07:12:30 +0800

09 Aug, 2012

1 commit

c6fd893da x86, avx: don't use avx instructions with "noxsave" boot param

Clear the AVX and AVX2 features along with the XSAVE feature bits
as part of parsing the "noxsave" parameter.

Fixes the kernel boot panic with "noxsave" boot parameter.

We could have checked cpu_has_osxsave along with cpu_has_avx etc, but Peter
mentioned clearing the feature bits will be better for uses like
static_cpu_has() etc.
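
The effect of clearing dependent feature bits together can be sketched like this. The bit positions and the apply_noxsave() and can_use_avx() helpers are hypothetical and chosen only for illustration; they are not the kernel's cpufeature definitions.

```c
#include <stdint.h>

/* Hypothetical bit positions for illustration; not the kernel's values. */
enum {
    FEAT_XSAVE = 1u << 0,
    FEAT_AVX   = 1u << 1,
    FEAT_AVX2  = 1u << 2,
    FEAT_SSE2  = 1u << 3  /* unrelated feature, must survive */
};

/* Clear XSAVE and every feature that depends on it in one step. */
uint32_t apply_noxsave(uint32_t features)
{
    return features & ~(uint32_t)(FEAT_XSAVE | FEAT_AVX | FEAT_AVX2);
}

/* A user of AVX only needs to test the AVX bit afterwards. */
int can_use_avx(uint32_t features)
{
    return (features & FEAT_AVX) != 0;
}
```

Because the dependent bits are gone after parsing, code that tests only the AVX bit does not need a second check for XSAVE support.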

Signed-off-by: Suresh Siddha
Link: http://lkml.kernel.org/r/1343755754.2041.2.camel@sbsiddha-desk.sc.intel.com
Cc: # v3.5
Signed-off-by: H. Peter Anvin

Suresh Siddha
2012-08-09 04:41:42 +0800

04 Aug, 2012

4 commits

d8579fd83 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull ACPI and power management fixes from Len Brown:
"A 3.3 sleep regression fixed, numa bugfix, plus some minor cleanups"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPI processor: Fix tick_broadcast_mask online/offline regression
ACPI: Only count valid srat memory structures
ACPI: Untangle a return statement for better readability
ACPI / PCI: Do not try to acquire _OSC control if that is hopeless
ACPI: delete _GTS/_BFS support
ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler'
ACPI: replace strlen("string") with sizeof("string") -1
ACPI / PM: Fix build warning in sleep.c for CONFIG_ACPI_SLEEP unset

Linus Torvalds
2012-08-04 05:10:00 +0800
d79095eee Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM bug fixes from Marcelo Tosatti:
- Fix DS/ES segment register corruption on x86_32.
- Fix kvmclock wallclock migration offset.
- Fix PIT interrupt ACK vs system reset logic bug.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Fix ds/es corruption on i386 with preemption
KVM: x86: apply kvmclock offset to guest wall clock time
KVM: PIC: call ack notifiers for irqs that are dropped form irr

Linus Torvalds
2012-08-04 02:21:29 +0800
1ca0049f2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Various fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, kcmp: The kcmp system call can be common
arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case
x86/mce: Add quirk for instruction recovery on Sandy Bridge processors
x86/mce: Move MCACOD defines from mce-severity.c to
x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
x86, nops: Missing break resulting in incorrect selection on Intel
x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental

Linus Torvalds
2012-08-04 01:59:36 +0800
bd463a060 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Fix merge window fallout and fix sleep profiling (this was always
broken, so it's not a fix for the merge window - we can skip this one
from the head of the tree)."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/trace: Add ability to set a target task for events
perf/x86: Fix USER/KERNEL tagging of samples properly
perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit

Linus Torvalds
2012-08-04 01:57:20 +0800

03 Aug, 2012

3 commits

9d0b01a1b Merge branches 'delete-gts-bfs', 'misc', 'novell-bugzilla-757888-numa' and 'osc-pcie' into base

Len Brown
2012-08-03 12:31:23 +0800
095adbb64 ACPI: Only count valid srat memory structures

Otherwise you could run into:
WARN_ON in numa_register_memblks(), because node_possible_map is zero

References: https://bugzilla.novell.com/show_bug.cgi?id=757888

On this machine (ProLiant ML570 G3) the SRAT table contains:
- No processor affinities
- One memory affinity structure (which is set disabled)

CC: Per Jessen
CC: Andi Kleen
Signed-off-by: Thomas Renninger
Signed-off-by: Len Brown

Thomas Renninger
2012-08-03 12:15:53 +0800
fc6bdb59a Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc

Pull OLPC platform updates from Andres Salomon:
"These move the OLPC Embedded Controller driver out of
arch/x86/platform and into drivers/platform/olpc.

OLPC machines are now ARM-based (which means lots of x86 and ARM
changes), but are typically pretty self-contained.. so it makes more
sense to go through a separate OLPC tree after getting the appropriate
review/ACKs."

* 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc:
x86: OLPC: move s/r-related EC cmds to EC driver
Platform: OLPC: move global variables into priv struct
Platform: OLPC: move debugfs support from x86 EC driver
x86: OLPC: switch over to using new EC driver on x86
Platform: OLPC: add a suspended flag to the EC driver
Platform: OLPC: turn EC driver into a platform_driver
Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it
drivers: OLPC: update various drivers to include olpc-ec.h
Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver

Linus Torvalds
2012-08-03 02:52:39 +0800

02 Aug, 2012

6 commits

5bc6f9888 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.

When we release pages back during bootup:

Freeing 9d-100 pfn range: 99 pages freed
Freeing 9cf36-9d0d2 pfn range: 412 pages freed
Freeing 9f6bd-9f6bf pfn range: 2 pages freed
Freeing 9f714-9f7bf pfn range: 171 pages freed
Freeing 9f7e0-9f7ff pfn range: 31 pages freed
Freeing 9f800-100000 pfn range: 395264 pages freed
Released 395979 pages of unused memory

We then try to populate those pages back. In the P2M tree however
the space for those leafs must be reserved - as such we use extend_brk.
We reserve 8MB of _brk space, which means we can fit over
1048576 PFNs - which is more than we should ever need.
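
The PFN figure can be checked with a short calculation. The sketch below assumes 4 KiB pages and 8-byte P2M entries (a 64-bit build); both constants and the p2m_pfns_covered() helper are assumptions made for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Assumed values: 4 KiB pages and 8-byte P2M entries on a 64-bit build. */
#define PAGE_BYTES      4096ULL
#define P2M_ENTRY_BYTES 8ULL

/* Number of PFNs whose P2M leaf entries fit in reserve_bytes. */
uint64_t p2m_pfns_covered(uint64_t reserve_bytes)
{
    uint64_t leaf_pages = reserve_bytes / PAGE_BYTES;
    uint64_t entries_per_leaf = PAGE_BYTES / P2M_ENTRY_BYTES;

    return leaf_pages * entries_per_leaf;
}
```

Under those assumptions 8 MB holds 2048 leaf pages of 512 entries each, which comfortably covers the 395979 pages released in the log above.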

Without this, on certain compilation of the kernel we would hit:

(XEN) domain_crash_sync called from entry.S
(XEN) CPU: 0
(XEN) RIP: e033:[]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff81a7c000 rbx: 000000000000003d rcx: 0000000000001000
(XEN) rdx: ffffffff81a7b000 rsi: 0000000000001000 rdi: 0000000000001000
(XEN) rbp: ffffffff81801cd8 rsp: ffffffff81801c98 r8: 0000000000100000
(XEN) r9: ffffffff81a7a000 r10: 0000000000000001 r11: 0000000000000003
(XEN) r12: 0000000000000004 r13: 0000000000000004 r14: 000000000000003d
(XEN) r15: 00000000000001e8 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000125803000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81801c98:

.. which is extend_brk hitting a BUG_ON.

Interestingly enough, most of the time we are not going to hit this
b/c the _brk space is quite large (v3.5):
ffffffff81a25000 B __brk_base
ffffffff81e43000 B __brk_limit
= ~4MB.

vs earlier kernels (with this back-ported), the space is smaller:
ffffffff81a25000 B __brk_base
ffffffff81a7b000 B __brk_limit
= 344 kBytes.

where we would certainly hit this and hit extend_brk.
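
The two sizes quoted above follow directly from the symbol addresses. A minimal sketch of the subtraction, where brk_bytes() is a hypothetical helper:

```c
#include <stdint.h>

/* Size in bytes of the region between __brk_base and __brk_limit. */
uint64_t brk_bytes(uint64_t brk_base, uint64_t brk_limit)
{
    return brk_limit - brk_base;
}
```

For the v3.5 addresses this gives 0x41e000 bytes (a little over 4 MB), and for the earlier kernel 0x56000 bytes, which is 344 KiB. Both are well short of the 8 MB the patch reserves.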

Note that git commit c3d93f880197953f86ab90d9da4744e926b38e33
(xen: populate correct number of pages when across mem boundary (v2))
exposed this bug.

[v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]

CC: stable@vger.kernel.org #only for 3.5
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2012-08-02 22:39:53 +0800
1871e845e Merge branch 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"This patch set contains mostly fixes and cleanups. The UML tty driver
uses now tty_port and is no longer broken like hell :-)"

* 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add arch/x86/um to MAINTAINERS
um: pass siginfo to guest process
um: fix ubd_file_size for read-only files
um: pull interrupt_end() into userspace()
um: split syscall_trace(), pass pt_regs to it
um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs
um: set BLK_CGROUP=y in defconfig
um: remove count_lock
um: fully use tty_port
um: Remove dead code
um: remove line_ioctl()
TTY: um/line, use tty from tty_port
TTY: um/line, add tty_port

Linus Torvalds
2012-08-02 07:45:02 +0800
aa67f6096 KVM: VMX: Fix ds/es corruption on i386 with preemption

Commit b2da15ac26a0c ("KVM: VMX: Optimize %ds, %es reload") broke i386
in the following scenario:

vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
  (ds.rpl, es.rpl cleared by hardware)

interrupt
  push ds, es  # pushes bad ds, es
  schedule
    vmx_vcpu_put
      vmx_load_host_state
        reload ds, es (with __USER_DS)
    pop ds, es  # of other thread's stack
    iret
# other thread runs
interrupt
  push ds, es
  schedule  # back in vcpu thread
  pop ds, es  # now with rpl=0
  iret
...
vcpu_put
resume_userspace
iret  # clears ds, es due to mismatched rpl

(instead of resume_userspace, we might return with SYSEXIT and then
take an exception; when the exception IRETs we end up with cleared
ds, es)

Fix by avoiding the optimization on i386 and reloading ds, es on the
lightweight exit path.

Reported-by: Chris Clayron
Signed-off-by: Avi Kivity
Signed-off-by: Marcelo Tosatti

Avi Kivity
2012-08-02 07:23:57 +0800
eaf4ce6c5 x86-64, kcmp: The kcmp system call can be common

We already use the same system call handler for i386 and x86-64, there
is absolutely no reason x32 can't use the same system call, too.

Signed-off-by: H. Peter Anvin
Cc: H.J. Lu
Cc: Cyrill Gorcunov
Cc: v3.5
Link: http://lkml.kernel.org/n/tip-vwzk3qbcr3yjyxjg2j38vgy9@git.kernel.org

H. Peter Anvin
2012-08-02 07:01:06 +0800
a3170d2ec um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs

Signed-off-by: Al Viro
Signed-off-by: Richard Weinberger

Al Viro
2012-08-02 05:33:16 +0800
4b6486659 KVM: x86: apply kvmclock offset to guest wall clock time

When a guest migrates to a new host, the system time difference from the
previous host is used in the updates to the kvmclock system time visible
to the guest, resulting in a continuation of correct kvmclock based guest
timekeeping.

The wall clock component of the kvmclock provided time is currently not
updated with this same time offset. Since the Linux guest caches the
wall clock based time, this discrepancy is not noticed until the guest is
rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in
correct guest time after a reboot.
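
Applying a nanosecond offset to a seconds/nanoseconds wall clock value can be sketched as below. This is only an illustration of the arithmetic: struct wall_time and wall_time_add_ns() are hypothetical, and the sign with which the kernel applies kvmclock_offset is not stated in this message, so the caller chooses it here.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical wall clock value: whole seconds plus 0..NSEC_PER_SEC-1 ns. */
struct wall_time {
    int64_t sec;
    int64_t nsec;
};

/*
 * Shift t by offset_ns (which may be negative) and renormalize so that
 * nsec ends up in the range [0, NSEC_PER_SEC).
 */
struct wall_time wall_time_add_ns(struct wall_time t, int64_t offset_ns)
{
    int64_t total = t.nsec + offset_ns;

    t.sec += total / NSEC_PER_SEC;
    t.nsec = total % NSEC_PER_SEC;
    if (t.nsec < 0) {  /* C division truncates toward zero */
        t.nsec += NSEC_PER_SEC;
        t.sec -= 1;
    }
    return t;
}
```

The point of the patch is that the wall clock and the kvmclock system time are shifted by the same offset, so a value computed after a guest reboot agrees with the one the guest had cached before.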

Cc: Zachary Amsden
Signed-off-by: Bruce Rogers
Signed-off-by: Marcelo Tosatti

Bruce Rogers
2012-08-02 04:23:50 +0800

01 Aug, 2012

6 commits

1fcfd08bd x86: OLPC: move s/r-related EC cmds to EC driver

The new EC driver calls platform-specific suspend and resume hooks; run
XO-1-specific EC commands from there, rather than deep in s/r code. If we
attempt to run EC commands after the new EC driver has suspended, it is
refused by the ec->suspended checks.

Signed-off-by: Andres Salomon
Acked-by: Paul Fox
Reviewed-by: Thomas Gleixner

Andres Salomon
2012-08-01 11:27:31 +0800
6cca83d49 Platform: OLPC: move debugfs support from x86 EC driver

There's nothing about the debugfs interface for the EC driver that is
architecture-specific, so move it into the arch-independent driver.

The code is mostly unchanged with the exception of renamed variables, coding
style changes, and API updates.

Signed-off-by: Andres Salomon
Acked-by: Paul Fox
Reviewed-by: Thomas Gleixner

Andres Salomon
2012-08-01 11:27:31 +0800
85f90cf6c x86: OLPC: switch over to using new EC driver on x86

This uses the new EC driver framework in drivers/platform/olpc. The
XO-1 and XO-1.5-specific code is still in arch/x86, but the generic stuff
(including a new workqueue; no more running EC commands with IRQs disabled!)
can be shared with other architectures.

Signed-off-by: Andres Salomon
Acked-by: Paul Fox
Reviewed-by: Thomas Gleixner

Andres Salomon
2012-08-01 11:27:30 +0800
3bf9428f2 drivers: OLPC: update various drivers to include olpc-ec.h

Switch over to using olpc-ec.h in multiple steps, so as not to break builds.
This covers every driver that calls olpc_ec_cmd().

Signed-off-by: Andres Salomon
Acked-by: Paul Fox
Reviewed-by: Thomas Gleixner

Andres Salomon
2012-08-01 11:27:29 +0800
392a325c4 Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver

The OLPC EC driver has outgrown arch/x86/platform/. It's time to both
share common code amongst different architectures, as well as move it out
of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of
code with the x86 code.

Signed-off-by: Andres Salomon
Acked-by: Paul Fox
Reviewed-by: Thomas Gleixner

Andres Salomon
2012-08-01 11:27:29 +0800
bca1a5c0e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
"The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
fixes) now that Arnaldo is back from vacation."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
uprobes: __replace_page() needs munlock_vma_page()
uprobes: Rename vma_address() and make it return "unsigned long"
uprobes: Fix register_for_each_vma()->vma_address() check
uprobes: Introduce vaddr_to_offset(vma, vaddr)
uprobes: Teach build_probe_list() to consider the range
uprobes: Remove insert_vm_struct()->uprobe_mmap()
uprobes: Remove copy_vma()->uprobe_mmap()
uprobes: Fix overflow in vma_address()/find_active_uprobe()
uprobes: Suppress uprobe_munmap() from mmput()
uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
uprobes: Clean up and document write_opcode()->lock_page(old_page)
uprobes: Kill write_opcode()->lock_page(new_page)
uprobes: __replace_page() should not use page_address_in_vma()
uprobes: Don't recheck vma/f_mapping in write_opcode()
perf/x86: Fix missing struct before structure name
perf/x86: Fix format definition of SNB-EP uncore QPI box
perf/x86: Make bitfield unsigned
perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
perf/x86: Add Intel Nehalem-EX uncore support
perf/x86: Fix typo in format definition of uncore PCU filter
...

Linus Torvalds
2012-08-01 06:34:13 +0800

31 Jul, 2012

6 commits

d07bdfd32 perf/x86: Fix USER/KERNEL tagging of samples properly

Some PMUs don't provide a full register set for their sample,
specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide
'better' than regular interrupt accuracy.

In this case we use the interrupt regs as basis and over-write some
fields (typically IP) with different information.

The perf core however uses user_mode() to distinguish user/kernel
samples, user_mode() relies on regs->cs. If the interrupt skid pushed
us over a boundary the new IP might not be in the same domain as the
interrupt.

Commit ce5c1fe9a9e ("perf/x86: Fix USER/KERNEL tagging of samples")
tried to fix this by making the perf core use kernel_ip(). This
however is wrong (TM), as pointed out by Linus, since it doesn't allow
for VM86 and non-zero based segments in IA32 mode.

Therefore, provide a new helper to set the regs->ip field,
set_linear_ip(), which massages the regs into a suitable state
assuming the provided IP is in fact a linear address.

Also modify perf_instruction_pointer() and perf_callchain_user() to
deal with segment base offsets.

Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twins
Signed-off-by: Ingo Molnar

Peter Zijlstra
2012-07-31 23:02:04 +0800
7740dfc03 perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit

i386 allmodconfig:

arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_hrtimer':
arch/x86/kernel/cpu/perf_event_intel_uncore.c:728: warning: integer overflow in expression
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_start_hrtimer':
arch/x86/kernel/cpu/perf_event_intel_uncore.c:735: warning: integer overflow in expression
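
The warning arises when an interval in nanoseconds is computed in a type that is only 32 bits wide on i386. A minimal sketch of the problem and of the widening fix, assuming a 60 second interval (the exact value is an assumption, and interval_ns() and overflows_int32() are hypothetical helpers):

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L  /* fits in 32 bits on its own */

/* Widen before multiplying so the product is computed in 64 bits. */
int64_t interval_ns(int seconds)
{
    return (int64_t)seconds * NSEC_PER_SEC;
}

/* True if v cannot be represented in a signed 32-bit integer. */
int overflows_int32(int64_t v)
{
    return v > INT32_MAX || v < INT32_MIN;
}
```

Two seconds in nanoseconds still fits in 32 bits, but sixty seconds does not, so the constant expression has to be 64-bit on every architecture.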

Signed-off-by: Andrew Morton
Cc: Zheng Yan
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-h84qlqj02zrojmxxybzmy9hi@git.kernel.org
Signed-off-by: Ingo Molnar

Andrew Morton
2012-07-31 23:02:03 +0800
3b6961ba8 ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler'

cd74257b974d6d26442c97891c4d05772748b177
patched up GTS/BFS -- a feature we want to remove.
So revert it (by hand, due to conflict in sleep.h)
to prepare for GTS/BFS removal.

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Len Brown
2012-07-31 09:10:16 +0800
c1d7e01d7 ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION

Rather than #define the options manually in the architecture code, add
Kconfig options for them and select them there instead. This also allows
us to select the compat IPC version parsing automatically for platforms
using the old compat IPC interface.

Reported-by: Andrew Morton
Signed-off-by: Will Deacon
Cc: Arnd Bergmann
Cc: Chris Metcalf
Cc: Catalin Marinas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Will Deacon
2012-07-31 08:25:21 +0800
4ed940d4c firmware_map: make firmware_map_add_early() argument consistent with firmware_map_add_hotplug()

There are two ways to create /sys/firmware/memmap/X sysfs entries:

- firmware_map_add_early
  When the system starts, it is called from e820_reserve_resources()
- firmware_map_add_hotplug
  When memory is hot plugged, it is called from add_memory()

But these functions are called with inconsistent values for the end
argument, as below:

- end argument of firmware_map_add_early() : start + size - 1
- end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to "start + size". Even after applying the patch,
the /sys/firmware/memmap/X/end file content does not change.
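
One way to keep the exported value stable while callers pass an exclusive end is to convert once, inside the common helper. The sketch below assumes that approach; struct fw_entry and fw_entry_make() are hypothetical names, not the kernel's firmware_map code.

```c
#include <stdint.h>

/* Hypothetical map entry; 'end' is the last byte of the range, inclusive. */
struct fw_entry {
    uint64_t start;
    uint64_t end;
};

/*
 * Every caller passes an exclusive end (start + size). The conversion to
 * the inclusive value shown to userspace happens in exactly one place.
 */
struct fw_entry fw_entry_make(uint64_t start, uint64_t end_exclusive)
{
    struct fw_entry e = { start, end_exclusive - 1 };

    return e;
}
```

With a single convention at the call sites, the boot-time and hotplug paths produce the same entry for the same range.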

[akpm@linux-foundation.org: clarify comments]
Signed-off-by: Yasuaki Ishimatsu
Reviewed-by: Dave Hansen
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasuaki Ishimatsu
2012-07-31 08:25:17 +0800
7463449b8 atomic64_test: simplify the #ifdef for atomic64_dec_if_positive() test

Introduce CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE and use this instead
of the multitude of #if defined() checks in atomic64_test.c
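
For readers unfamiliar with the operation being gated, its semantics can be sketched with C11 atomics. This is a userspace illustration, not an architecture implementation: dec_if_positive() is a hypothetical name, and in the kernel a single check of CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE would decide whether the corresponding test is built.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Decrement *v only if the result stays non-negative. Returns the old
 * value minus one whether or not the store happened, so a negative
 * return value means "not decremented".
 */
int64_t dec_if_positive(_Atomic int64_t *v)
{
    int64_t old = atomic_load(v);
    int64_t dec;

    do {
        dec = old - 1;
        if (dec < 0)
            break;  /* leave *v untouched */
        /* On failure, 'old' is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(v, &old, dec));

    return dec;
}
```

Starting from 2, successive calls return 1, 0 and then -1, and the variable stays at 0 after the last call.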

Signed-off-by: Catalin Marinas
Cc: Russell King
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Catalin Marinas
2012-07-31 08:25:16 +0800

27 Jul, 2012

2 commits

4cb38750d Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.

It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replace it with an ordinary IPI function call."

Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros

Linus Torvalds
2012-07-27 04:17:17 +0800
0a2fe19cc Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/efi changes from Ingo Molnar:
"This tree adds an EFI bootloader handover protocol, which, once
supported on the bootloader side, will make bootup faster and might
result in simpler bootloaders.

The other change activates the EFI wall clock time accessors on x86-64
as well, instead of the legacy RTC readout."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Handover Protocol
x86-64/efi: Use EFI to deal with platform wall clock

Linus Torvalds
2012-07-27 04:13:25 +0800