Eric Lee / smarc-fsl-linux-kernel

09 Jan, 2012

1 commit

f93ea7338 Merge branches 'iommu/page-sizes' and 'iommu/group-id' into next ... Browse Code »

Conflicts:
drivers/iommu/amd_iommu.c
drivers/iommu/intel-iommu.c
include/linux/iommu.h

Joerg Roedel
2012-01-09 20:06:28 +0800

30 Dec, 2011

1 commit

7578ed02e Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix raw_spin_unlock_irqrestore() usage
oprofile, arm/sh: Fix oprofile_arch_exit() linkage issue

Linus Torvalds
2011-12-30 09:09:16 +0800

26 Dec, 2011

1 commit

4d25a066b KVM: Don't automatically expose the TSC deadline timer in cpuid ... Browse Code »

Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.

This is broken in several ways:
- if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
deadline timer feature, a guest that uses the feature will break
- live migration to older host kernels that don't support the TSC deadline
timer will cause the feature to be pulled from under the guest's feet;
breaking it
- guests that are broken wrt the feature will fail.

Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.

Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]

Reported-by: Alexey Zaytsev
Tested-by: Alexey Zaytsev
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2011-12-26 19:27:44 +0800

25 Dec, 2011

1 commit

0924ab2cf KVM: x86: Prevent starting PIT timers in the absence of irqchip support ... Browse Code »

User space may create the PIT and forgets about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
IP: [] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
[] pit_do_work+0x51/0xd0 [kvm]
[] process_one_work+0x111/0x4d0
[] worker_thread+0x152/0x340
[] kthread+0x7e/0x90
[] kernel_thread_helper+0x4/0x10

Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.

Signed-off-by: Jan Kiszka
Signed-off-by: Marcelo Tosatti

Jan Kiszka
2011-12-25 23:13:18 +0800

24 Dec, 2011

1 commit

2e64694de perf/x86: Fix raw_spin_unlock_irqrestore() usage ... Browse Code »

Use raw_spin_unlock_irqrestore() as equivalent to
raw_spin_lock_irqsave().

Signed-off-by: Robert Richter
Cc: Stephane Eranian
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1324646665-13334-1-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar

Robert Richter
2011-12-24 00:57:01 +0800

22 Dec, 2011

1 commit

ecefc36b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Add a flow_cache_flush_deferred function
ipv4: reintroduce route cache garbage collector
net: have ipconfig not wait if no dev is available
sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
asix: new device id
davinci-cpdma: fix locking issue in cpdma_chan_stop
sctp: fix incorrect overflow check on autoclose
r8169: fix Config2 MSIEnable bit setting.
llc: llc_cmsg_rcv was getting called after sk_eat_skb.
net: bpf_jit: fix an off-one bug in x86_64 cond jump target
iwlwifi: update SCD BC table for all SCD queues
Revert "Bluetooth: Revert: Fix L2CAP connection establishment"
Bluetooth: Clear RFCOMM session timer when disconnecting last channel
Bluetooth: Prevent uninitialized data access in L2CAP configuration
iwlwifi: allow to switch to HT40 if not associated
iwlwifi: tx_sync only on PAN context
mwifiex: avoid double list_del in command cancel path
ath9k: fix max phy rate at rate control init
nfc: signedness bug in __nci_request()
iwlwifi: do not set the sequence control bit is not needed

Linus Torvalds
2011-12-22 10:29:26 +0800

20 Dec, 2011

2 commits

13f541c10 x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT ... Browse Code »

When printing the code bytes in show_registers(), the markers around the
byte at the fault address could make the printk() format string look
like a valid log level and facility code. This would prevent this byte
from being printed and result in a spurious newline:

[ 7555.765589] Code: 8b 32 e9 94 00 00 00 81 7d 00 ff 00 00 00 0f 87 96 00 00 00 48 8b 83 c0 00 00 00 44 89 e2 44 89 e6 48 89 df 48 8b 80 d8 02 00 00
[ 7555.765683] 8b 48 28 48 89 d0 81 e2 ff 0f 00 00 48 c1 e8 0c 48 c1 e0 04

Add KERN_CONT where needed, and elsewhere in show_registers() for
consistency.

Signed-off-by: Clemens Ladisch
Link: http://lkml.kernel.org/r/4EEFA7AE.9020407@ladisch.de
Signed-off-by: H. Peter Anvin

Clemens Ladisch
2011-12-20 05:09:56 +0800
a03ffcf87 net: bpf_jit: fix an off-one bug in x86_64 cond jump target ... Browse Code »
1

x86 jump instruction size is 2 or 5 bytes (near/long jump), not 2 or 6
bytes.

In case a conditional jump is followed by a long jump, conditional jump
target is one byte past the start of target instruction.

Signed-off-by: Markus Kötter
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Markus Kötter
2011-12-20 04:47:29 +0800

16 Dec, 2011

2 commits

42ebfc61c Merge branch 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/konrad/xen

* 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/swiotlb: Use page alignment for early buffer allocation.
xen: only limit memory map to maximum reservation for domain 0.

Linus Torvalds
2011-12-16 02:52:40 +0800
d3db72812 xen: only limit memory map to maximum reservation for domain 0. ... Browse Code »
1

d312ae878b6a "xen: use maximum reservation to limit amount of usable RAM"
clamped the total amount of RAM to the current maximum reservation. This is
correct for dom0 but is not correct for guest domains. In order to boot a guest
"pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
future memory expansion the guest must derive max_pfn from the e820 provided by
the toolstack and not the current maximum reservation (which can reflect only
the current maximum, not the guest lifetime max). The existing algorithm
already behaves this correctly if we do not artificially limit the maximum
number of pages for the guest case.

For a guest booted with maxmem=512, memory=128 this results in:
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
-[ 0.000000] Xen: 0000000000100000 - 0000000008100000 (usable)
-[ 0.000000] Xen: 0000000008100000 - 0000000020800000 (unusable)
+[ 0.000000] Xen: 0000000000100000 - 0000000020800000 (usable)
...
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
-[ 0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
+[ 0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
[ 0.000000] initial memory mapped : 0 - 027ff000
[ 0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
-[ 0.000000] init_memory_mapping: 0000000000000000-0000000008100000
-[ 0.000000] 0000000000 - 0008100000 page 4k
-[ 0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
+[ 0.000000] init_memory_mapping: 0000000000000000-0000000020800000
+[ 0.000000] 0000000000 - 0020800000 page 4k
+[ 0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
[ 0.000000] xen: setting RW the range 27e8000 - 27ff000
[ 0.000000] 0MB HIGHMEM available.
-[ 0.000000] 129MB LOWMEM available.
-[ 0.000000] mapped low ram: 0 - 08100000
-[ 0.000000] low ram: 0 - 08100000
+[ 0.000000] 520MB LOWMEM available.
+[ 0.000000] mapped low ram: 0 - 20800000
+[ 0.000000] low ram: 0 - 20800000

With this change "xl mem-set 512M" will successfully increase the
guest RAM (by reducing the balloon).

There is no change for dom0.

Reported-and-Tested-by: George Shuklin
Signed-off-by: Ian Campbell
Cc: stable@kernel.org
Reviewed-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk

Ian Campbell
2011-12-16 00:24:02 +0800

13 Dec, 2011

1 commit

e1ad783b1 Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid" ... Browse Code »

This hangs my MacBook Air at boot time; I get no console
messages at all. I reverted this on top of -rc5 and my machine
boots again.

This reverts commit e8c7106280a305e1ff2a3a8a4dfce141469fb039.

Signed-off-by: Matt Fleming
Signed-off-by: Keith Packard
Acked-by: H. Peter Anvin
Cc: Matthew Garrett
Cc: Zhang Rui
Cc: Huang Ying
Cc: Linus Torvalds
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console
Signed-off-by: Ingo Molnar

Keith Packard
2011-12-13 01:25:56 +0800

10 Dec, 2011

2 commits

6d3e32e63 x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware ... Browse Code »

efi_call_phys_prelog() sets up a 1:1 mapping of the physical address
range in swapper_pg_dir. Instead of replacing then restoring entries
in swapper_pg_dir we should be using initial_page_table which already
contains the 1:1 mapping.

It's safe to blindly switch back to swapper_pg_dir in the epilog
because the physical EFI routines are only called before
efi_enter_virtual_mode(), e.g. before any user processes have been
forked. Therefore, we don't need to track which pgd was in %cr3 when
we entered the prelog.

The previous code actually contained a bug because it assumed that the
kernel was loaded at a physical address within the first 8MB of ram,
usually at 0x100000. However, this isn't the case with a
CONFIG_RELOCATABLE=y kernel which could have been loaded anywhere in
the physical address space.

Also delete the ancient (and bogus) comments about the page table
being restored after the lock is released. There is no locking.

Cc: Matthew Garrett
Cc: Darrent Hart
Signed-off-by: Matt Fleming
Link: http://lkml.kernel.org/r/1323346250.3894.74.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin

Matt Fleming
2011-12-10 08:39:11 +0800
a776878d6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Calling __pa() with an ioremap()ed address is invalid
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
x86/intel_mid: Kconfig select fix
x86/intel_mid: Fix the Kconfig for MID selection

Linus Torvalds
2011-12-10 06:45:12 +0800

09 Dec, 2011

3 commits

b6999b191 thp: add compound tail page _mapcount when mapped ... Browse Code »
87

With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd. If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.

So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.

This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1

kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
put_page+0x15/0x37
kvm_release_pfn_clean+0x31/0x36
kvm_iommu_put_pages+0x94/0xb1
kvm_iommu_unmap_memslots+0x80/0xb6
kvm_assign_device+0xba/0x117
kvm_vm_ioctl_assigned_device+0x301/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song
Reviewed-by: Andrea Arcangeli
Cc: Andi Kleen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Youquan Song
2011-12-09 23:50:28 +0800
e8c710628 x86, efi: Calling __pa() with an ioremap()ed address is invalid ... Browse Code »

If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:

BUG: unable to handle kernel paging request at f7f22280
IP: [] reserve_ram_pages_type+0x89/0x210
[...]

Call Trace:
[] ? page_is_ram+0x1a/0x40
[] reserve_memtype+0xdf/0x2f0
[] set_memory_uc+0x49/0xa0
[] efi_enter_virtual_mode+0x1c2/0x3aa
[] start_kernel+0x291/0x2f2
[] ? loglevel+0x1b/0x1b
[] i386_start_kernel+0xbf/0xc8

A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,

https://bugzilla.redhat.com/show_bug.cgi?id=748516

Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.

Signed-off-by: Matt Fleming
Cc: H. Peter Anvin
Cc: Matthew Garrett
Cc: Zhang Rui
Cc: Huang Ying
Cc: Linus Torvalds
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar

Matt Fleming
2011-12-09 15:32:26 +0800
2ded6e6a9 x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked ... Browse Code »
1

When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.

On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of the irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.

This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.

Signed-off-by: Mark Langsdorf
Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
Tested-by: Andreas Herrmann
Signed-off-by: Andreas Herrmann
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org

Mark Langsdorf
2011-12-09 04:47:22 +0800

06 Dec, 2011

8 commits

4e2b1c4f5 x86/intel_mid: Kconfig select fix ... Browse Code »

If we select a symbol it should have a type declared first
otherwise in some situations the config tools get upset. They
are currently perhaps a bit too resilient which is why this
wasn't noticed initially.

Signed-off-by: Alan Cox
Link: http://lkml.kernel.org/r/20111206132811.4041.32549.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar

Alan Cox
2011-12-06 21:40:50 +0800
dd1375253 x86/intel_mid: Fix the Kconfig for MID selection ... Browse Code »

We currently fail to build on CONFIG_X86_INTEL_MID=y and
CONFIG_X86_MRST unset.

We could build all the bits to make generic MID work if you
picked MID platform alone but that's really silly. Instead use
select and two variables.

This looks a bit daft right now but once we add a Medfield
selection it'll start to look a good deal more sensible.

Reported-by: Ingo Molnar
Reported-by: Stanislaw Gruszka
Signed-off-by: Alan Cox
Link: http://lkml.kernel.org/r/20111205231433.28811.51297.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar

Alan Cox
2011-12-06 18:28:36 +0800
45e713efe Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intr_remapping: Fix section mismatch in ir_dev_scope_init()
intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev()
x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions
x86, AMD: Correct align_va_addr documentation
x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms
x86/mrst: Battery fixes
x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode
x86: Fix "Acer Aspire 1" reboot hang
x86/mtrr: Resolve inconsistency with Intel processor manual
x86: Document rdmsr_safe restrictions
x86, microcode: Fix the failure path of microcode update driver init code
Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup
x86/mpparse: Account for bus types other than ISA and PCI
x86, mrst: Change the pmic_gpio device type to IPC
mrst: Added some platform data for the SFI translations
x86,mrst: Power control commands update
x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
x86, UV: Fix UV2 hub part number
x86: Add user_mode_vm check in stack_overflow_check

Linus Torvalds
2011-12-06 08:54:15 +0800
232ea3445 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix loss of notification with multi-event
perf, x86: Force IBS LVT offset assignment for family 10h
perf, x86: Disable PEBS on SandyBridge chips
trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
perf session: Fix crash with invalid CPU list
perf python: Fix undefined symbol problem
perf/x86: Enable raw event access to Intel offcore events
perf: Don't use -ENOSPC for out of PMU resources
perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
perf/x86: Fix PEBS instruction unwind
oprofile, x86: Fix crash when unloading module (nmi timer mode)
oprofile: Fix crash when unloading module (hr timer mode)

Linus Torvalds
2011-12-06 08:54:00 +0800
7125facea Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, x86: Avoid unnecessary overflow in sched_clock
sched: Fix buglet in return_cfs_rq_runtime()
sched: Avoid SMT siblings in select_idle_sibling() if possible
sched: Set the command name of the idle tasks in SMP kernels
sched, rt: Provide means of disabling cross-cpu bandwidth sharing
sched: Document wait_for_completion_*() return values
sched_fair: Fix a typo in the comment describing update_sd_lb_stats
sched: Add a comment to effective_load() since it's a pain

Linus Torvalds
2011-12-06 08:50:24 +0800
f62ef5f3e x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions ... Browse Code »

I've received complaints that the numa_node attribute for family
15h model 00-0fh (e.g. Interlagos) northbridge functions shows
-1 instead of the proper node ID.

Correct this with attached quirks (similar to quirks for other
AMD CPU families used in multi-socket systems).

Signed-off-by: Andreas Herrmann
Cc: Frank Arnold
Cc: Borislav Petkov
Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com
Signed-off-by: Ingo Molnar

Andreas Herrmann
2011-12-06 01:13:11 +0800
35d476996 x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms ... Browse Code »

Intel MID x86 platforms have a memory mapped virtual RTC
instead. No MID platform have the default ports (and
accessing them may do weird stuff).

Signed-off-by: Mathias Nyman
Signed-off-by: Alan Cox
Cc: feng.tang@intel.com
Cc: Feng Tang
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Mathias Nyman
2011-12-06 00:09:21 +0800
2cd1c8d4d x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode ... Browse Code »
1

Fix an outstanding issue that has been reported since 2.6.37.
Under a heavy loaded machine processing "fork()" calls could
crash with:

BUG: unable to handle kernel paging request at f573fc8c
IP: [] swap_count_continued+0x104/0x180
*pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000 Oops: 0000 [#1] SMP
Modules linked in:
Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[] EFLAGS: 00210246 CPU: 3
EIP is at swap_count_continued+0x104/0x180
.. snip..
Call Trace:
[] ? __swap_duplicate+0xc2/0x160
[] ? pte_mfn_to_pfn+0x87/0xe0
[] ? swap_duplicate+0x14/0x40
[] ? copy_pte_range+0x45b/0x500
[] ? copy_page_range+0x195/0x200
[] ? dup_mmap+0x1c6/0x2c0
[] ? dup_mm+0xa8/0x130
[] ? copy_process+0x98a/0xb30
[] ? do_fork+0x4f/0x280
[] ? getnstimeofday+0x43/0x100
[] ? sys_clone+0x30/0x40
[] ? ptregs_clone+0x15/0x48
[] ? syscall_call+0x7/0xb

The problem is that in copy_page_range() we turn lazy mode on,
and then in swap_entry_free() we call swap_count_continued()
which ends up in:

map = kmap_atomic(page, KM_USER0) + offset;

and then later we touch *map.

Since we are running in batched mode (lazy) we don't actually
set up the PTE mappings and the kmap_atomic is not done
synchronously and ends up trying to dereference a page that has
not been set.

Looking at kmap_atomic_prot_pfn(), it uses
'arch_flush_lazy_mmu_mode' and doing the same in
kmap_atomic_prot() and __kunmap_atomic() makes the problem go
away.

Interestingly, commit b8bcfe997e4615 ("x86/paravirt: remove lazy
mode in interrupts") removed part of this to fix an interrupt
issue - but it went to far and did not consider this scenario.

Signed-off-by: Konrad Rzeszutek Wilk
Cc: Peter Zijlstra
Cc: Jeremy Fitzhardinge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Konrad Rzeszutek Wilk
2011-12-06 00:06:34 +0800

05 Dec, 2011

16 commits

f1b23714c Merge branch 'ucode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Browse Code »

Ingo Molnar
2011-12-05 23:38:51 +0800
1ef038909 x86: Fix "Acer Aspire 1" reboot hang ... Browse Code »
1

Looks like on some Acer Aspire 1s with older bioses, reboot via bios
fails. It works on my machine, (with BIOS version 0.3310) but
not on some others (BIOS version 0.3309).

There's a log of problems at:

https://bbs.archlinux.org/viewtopic.php?id=124136

This patch adds a different callback to the reboot quirk table,
to allow rebooting via keybaord controller.

Reported-by: Uroš Vampl
Tested-by: Vasily Khoruzhick
Signed-off-by: Peter Chubb
Cc: Don Zickus
Cc: Peter Zijlstra
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1323093233-9481-1-git-send-email-anarsoul@gmail.com
Signed-off-by: Ingo Molnar

Peter Chubb
2011-12-05 22:06:17 +0800
8dbf4a300 x86/mtrr: Resolve inconsistency with Intel processor manual ... Browse Code »

Following is from Notes of section 11.5.3 of Intel processor
manual available at:

http://www.intel.com/Assets/PDF/manual/325384.pdf

For the Pentium 4 and Intel Xeon processors, after the sequence of
steps given above has been executed, the cache lines containing the
code between the end of the WBINVD instruction and before the
MTRRS have actually been disabled may be retained in the cache
hierarchy. Here, to remove code from the cache completely, a
second WBINVD instruction must be executed after the MTRRs have
been disabled.

This patch provides resolution for that.

Ideally, I will like to make changes only for Pentium 4 and Xeon
processors. But, I am not finding easier way to do it.
And, extra wbinvd() instruction does not hurt much for other
processors.

Signed-off-by: Ajaykumar Hotchandani
Cc: Linus Torvalds
Cc: Arjan van de Ven
Cc: Lucas De Marchi
Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com
Signed-off-by: Ingo Molnar

Ajaykumar Hotchandani
2011-12-05 22:06:15 +0800
ce37defc0 x86: Document rdmsr_safe restrictions ... Browse Code »
43

Recently, I got bitten by using rdmsr_safe too early in the boot
process. Document its shortcomings for future reference.

Link: http://lkml.kernel.org/r/4ED5B70F.606@lwfinger.net
Signed-off-by: Borislav Petkov

Borislav Petkov
2011-12-05 21:28:37 +0800
bd3990639 x86, microcode: Fix the failure path of microcode update driver init code ... Browse Code »

The microcode update driver's initialization code does not handle
failures correctly. This patch fixes this issue.

Signed-off-by: Jan Beulich
Signed-off-by: Srivatsa S. Bhat
Link: http://lkml.kernel.org/r/20111107123530.12164.31227.stgit@srivatsabhat.in.ibm.com
Link: http://lkml.kernel.org/r/4ED8E2270200007800065120@nat28.tlf.novell.com
Signed-off-by: Borislav Petkov

Srivatsa S. Bhat
2011-12-05 21:21:01 +0800
644ddf588 Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup ... Browse Code »

TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup
is done.

Signed-off-by: Prarit Bhargava
Acked-by: David Rientjes
Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar

Prarit Bhargava
2011-12-05 20:48:50 +0800
9e6866686 x86/mpparse: Account for bus types other than ISA and PCI ... Browse Code »
1

In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit
and 64-bit versions of MP_bus_info were rearranged to match each
other better. Unfortunately it introduced a regression: prior
to that change we used to always set the mp_bus_not_pci bit,
then clear it if we found a PCI bus. After it, we set
mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave
it alone otherwise.

In the cases of ISA and PCI, there's not much difference. But
ISA is not the only non-PCI bus, so it's better to always set
mp_bus_not_pci and clear it only for PCI.

Without this change, Dan's Dell PowerEdge 4200 panics on boot
with a log indicating interrupt routing trouble unless the
"noapic" option is supplied. With this change, the machine
boots reliably without "noapic".

Fixes http://bugs.debian.org/586494

Reported-bisected-and-tested-by: Dan McGrath
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org # 2.6.26+
Cc: Dan McGrath
Cc: Alexey Starikovskiy
[jrnieder@gmail.com: clarified commit message]
Signed-off-by: Jonathan Nieder
Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net
Signed-off-by: Ingo Molnar

Bjorn Helgaas
2011-12-05 20:46:27 +0800
efa221268 x86, mrst: Change the pmic_gpio device type to IPC ... Browse Code »

In latest firmware's SFI tables, pmic_gpio has been set to
IPC type of device, so we need handle it too.

Signed-off-by: Feng Tang
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Feng Tang
2011-12-05 19:42:15 +0800
28744b3e9 mrst: Added some platform data for the SFI translations ... Browse Code »

Add SFI glue for the following devices:

tca6416: a gpio expander compatible with max7315
mpu3050: gyro sensor

Both of these actual drivers are already upstream

Signed-off-by: Jekyll Lai
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Jekyll Lai
2011-12-05 19:42:13 +0800
48bc55621 x86,mrst: Power control commands update ... Browse Code »

On the Intel MID devices SCU commands are issued to manage power
off and the like. We need to issue different ones for
non-Lincroft based devices.

Signed-off-by: Alek Du
Signed-off-by: Jacob Pan
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Jacob Pan
2011-12-05 19:42:11 +0800
6be30bb7d x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot ... Browse Code »

Dell OptiPlex 990 is known to require PCI reboot, so add it to
the reboot blacklist in pci_reboot_dmi_table[].

Signed-off-by: Rafael J. Wysocki
Link: http://lkml.kernel.org/r/201111160019.51303.rjw@sisk.pl
Signed-off-by: Ingo Molnar

Rafael J. Wysocki
2011-12-05 19:20:43 +0800
b495e039b x86, UV: Fix UV2 hub part number ... Browse Code »

There was a mixup when the SGI UV2 hub chip was sent to be
fabricated, and it ended up with the wrong part number in the
HRP_NODE_ID mmr. Future versions of the chip will (may) have the
correct part number. Change the UV infrastructure to recognize
both part numbers as valid IDs of a UV2 hub chip.

Signed-off-by: Jack Steiner
Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com
Signed-off-by: Ingo Molnar

Jack Steiner
2011-12-05 18:49:52 +0800
69682b625 x86: Add user_mode_vm check in stack_overflow_check ... Browse Code »

The kernel stack overflow is checked in stack_overflow_check(),
which may wrongly detect the overflow if the stack pointer in
user space points to the kernel stack intentionally or
accidentally. So, the actual overflow is never detected after
this misdetection because WARN_ONCE() is used on the detection
of it.

This patch adds user-mode-vm checking before it to avoid this
problem and bails out early if the user stack is used.

Signed-off-by: Mitsuo Hayasaka
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Randy Dunlap
Link: http://lkml.kernel.org/r/20111129060821.11076.55315.stgit@ltc219.sdl.hitachi.co.jp
Signed-off-by: Ingo Molnar
Cc: "H. Peter Anvin"

Mitsuo Hayasaka
2011-12-05 18:28:25 +0800
16e5294e5 perf, x86: Force IBS LVT offset assignment for family 10h ... Browse Code »

On AMD family 10h we see firmware bug messages like the following:

[Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
[Firmware Bug]: using offset 1 for IBS interrupts
[Firmware Bug]: workaround enabled for IBS LVT offset
perf: AMD IBS detected (0x00000007)

We always see this, since the offsets are not assigned by the BIOS for
this family. Force LVT offset assignment in this case. If the OS
assignment fails, fallback to BIOS settings and try to setup this.

The fallback to BIOS settings weakens the family check since
force_ibs_eilvt_setup() may fail e.g. in case of virtual machines.
But setup may still succeed if BIOS offsets are correct.

Other families don't have a workaround implemented that assigns LVT
offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that
families.

With the patch the [Firmware Bug] messages vanish. We see now:

IBS: LVT offset 1 assigned
perf: AMD IBS detected (0x00000007)

Signed-off-by: Robert Richter
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com
Signed-off-by: Ingo Molnar

Robert Richter
2011-12-05 16:32:59 +0800
6a600a8b8 perf, x86: Disable PEBS on SandyBridge chips ... Browse Code »

Cc: Stephane Eranian
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-12-05 16:32:38 +0800
8e8da023f x86: Fix boot failures on older AMD CPU's ... Browse Code »

People with old AMD chips are getting hung boots, because commit
bcb80e53877c ("x86, microcode, AMD: Add microcode revision to
/proc/cpuinfo") moved the microcode detection too early into
"early_init_amd()".

At that point we are *so* early in the booth that the exception tables
haven't even been set up yet, so the whole

rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

doesn't actually work: if the rdmsr does a GP fault (due to non-existant
MSR register on older CPU's), we can't fix it up yet, and the boot fails.

Fix it by simply moving the code to a slightly later point in the boot
(init_amd() instead of early_init_amd()), since the kernel itself
doesn't even really care about the microcode patchlevel at this point
(or really ever: it's made available to user space in /proc/cpuinfo, and
updated if you do a microcode load).

Reported-tested-and-bisected-by: Larry Finger
Tested-by: Bob Tracy
Acked-by: Borislav Petkov
Cc: Ingo Molnar
Cc: Srivatsa S. Bhat
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-12-05 03:57:09 +0800