Eric Lee / smarc-fsl-linux-kernel

10 Dec, 2011

1 commit

a776878d6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Calling __pa() with an ioremap()ed address is invalid
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
x86/intel_mid: Kconfig select fix
x86/intel_mid: Fix the Kconfig for MID selection

Linus Torvalds
2011-12-10 06:45:12 +0800

09 Dec, 2011

3 commits

b6999b191 thp: add compound tail page _mapcount when mapped

With the 3.2-rc kernel, IOMMU 2M pages in KVM work. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB pages go through gup_huge_pud() while 2M
pages go through gup_huge_pmd(). If compound pages are used and the page
is a tail page, gup_huge_pmd() increments _mapcount to record that the
tail page is mapped, while gup_huge_pud() does not.

So when the mapped page is released, the kernel oopses because the
page is not marked as mapped.

This patch adds the same tail-page handling for compound pages to the
1GB huge page path, so that it matches the 2M page path.
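A minimal userspace model of the bookkeeping described above, assuming a drastically simplified page structure: `struct model_page` and the `model_*` helpers are hypothetical and are not the kernel's `struct page`, `gup_huge_pud()` or `put_compound_page()`. It only shows why a tail page pinned without recording it as mapped trips a check when it is released.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified compound page; NOT the kernel's struct page. */
struct model_page {
	struct model_page *head;  /* NULL for the head page itself */
	int mapcount;             /* models the tail page's _mapcount tracking */
	int refcount;             /* models the head page's reference count */
};

/* Per-tail step the 2M path already performs: reference the head and
 * record that this tail page is mapped. */
static void model_get_huge_page_tail(struct model_page *page)
{
	page->head->refcount++;
	page->mapcount++;
}

/* Pin `nr` subpages starting at `first`. record_tail == false models the
 * 1GB path before the fix; true models the 2M path (and 1GB after it). */
static void model_gup_huge(struct model_page *pages, size_t first, size_t nr,
			   bool record_tail)
{
	for (size_t i = first; i < first + nr; i++) {
		if (!pages[i].head)
			pages[i].refcount++;                 /* head page */
		else if (record_tail)
			model_get_huge_page_tail(&pages[i]);
		else
			pages[i].head->refcount++;           /* tail, _mapcount not recorded */
	}
}

/* Release a pinned tail page; returns false where the kernel would BUG(). */
static bool model_put_tail_page(struct model_page *page)
{
	if (page->mapcount <= 0)
		return false;
	page->mapcount--;
	page->head->refcount--;
	return true;
}
```

Pinning through the old path and then releasing a tail page fails the check, while the fixed path keeps pin and release balanced.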

To reproduce:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1

kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
put_page+0x15/0x37
kvm_release_pfn_clean+0x31/0x36
kvm_iommu_put_pages+0x94/0xb1
kvm_iommu_unmap_memslots+0x80/0xb6
kvm_assign_device+0xba/0x117
kvm_vm_ioctl_assigned_device+0x301/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song
Reviewed-by: Andrea Arcangeli
Cc: Andi Kleen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Youquan Song
2011-12-09 23:50:28 +0800
e8c710628 x86, efi: Calling __pa() with an ioremap()ed address is invalid

If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:

BUG: unable to handle kernel paging request at f7f22280
IP: [] reserve_ram_pages_type+0x89/0x210
[...]

Call Trace:
[] ? page_is_ram+0x1a/0x40
[] reserve_memtype+0xdf/0x2f0
[] set_memory_uc+0x49/0xa0
[] efi_enter_virtual_mode+0x1c2/0x3aa
[] start_kernel+0x291/0x2f2
[] ? loglevel+0x1b/0x1b
[] i386_start_kernel+0xbf/0xc8

A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=748516

Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.
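A simplified model of the mapping decision described above, made up front from the memory descriptor instead of calling set_memory_uc() afterwards. `EFI_MEMORY_WB` (0x8) and `EFI_RUNTIME_SERVICES_DATA` (6) follow the UEFI specification; the enum and `model_efi_choose_mapping()` are hypothetical and are not the kernel's efi_enter_virtual_mode() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bit and memory type values from the UEFI specification. */
#define EFI_MEMORY_WB              0x8ULL
#define EFI_RUNTIME_SERVICES_DATA  6

/* Hypothetical labels for the three ways a region can end up mapped. */
enum model_efi_map {
	MODEL_MAP_NOCACHE,   /* stands in for ioremap_nocache() */
	MODEL_MAP_CACHE,     /* stands in for ioremap_cache() */
	MODEL_MAP_DIRECT,    /* stands in for the direct kernel mapping */
};

static enum model_efi_map model_efi_choose_mapping(uint64_t attribute,
						   uint32_t type,
						   bool is_x86_64)
{
	/* No write-back attribute: map uncached from the start. */
	if (!(attribute & EFI_MEMORY_WB))
		return MODEL_MAP_NOCACHE;
	/* On x86_64, runtime services data must stay out of vmalloc space. */
	if (is_x86_64 && type == EFI_RUNTIME_SERVICES_DATA)
		return MODEL_MAP_DIRECT;
	return MODEL_MAP_CACHE;
}
```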

Signed-off-by: Matt Fleming
Cc: H. Peter Anvin
Cc: Matthew Garrett
Cc: Zhang Rui
Cc: Huang Ying
Cc: Linus Torvalds
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar

Matt Fleming
2011-12-09 15:32:26 +0800
2ded6e6a9 x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked

When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.

On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.

This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.
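A minimal model of the change, assuming a fake register instead of real MMIO: `TN_ENABLE` is bit 2 of an HPET timer's configuration register (the kernel's `HPET_TN_ENABLE`), while `struct model_hpet` and the `model_*` functions are hypothetical. The point is that the switchover happens inside the mask operation itself, not in a later interrupt handler.

```c
#include <stdint.h>

/* Interrupt-enable bit of an HPET timer configuration register. */
#define TN_ENABLE 0x004u

/* Hypothetical model state; not the kernel's data structures. */
struct model_hpet {
	uint32_t timer1_cfg;  /* timer 1 configuration register */
	unsigned rtc_flags;   /* RTC interrupt sources still requested */
};

/* Hand irq8 back to the RTC by clearing TN_ENABLE on timer 1. */
static void model_disable_rtc_channel(struct model_hpet *h)
{
	h->timer1_cfg &= ~TN_ENABLE;
}

/* Mask RTC interrupt bits; once none remain, release irq8 immediately
 * rather than waiting for the next HPET interrupt. */
static void model_mask_rtc_irq_bit(struct model_hpet *h, unsigned bit_mask)
{
	h->rtc_flags &= ~bit_mask;
	if (!h->rtc_flags)
		model_disable_rtc_channel(h);
}
```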

Signed-off-by: Mark Langsdorf
Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
Tested-by: Andreas Herrmann
Signed-off-by: Andreas Herrmann
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org

Mark Langsdorf
2011-12-09 04:47:22 +0800

06 Dec, 2011

8 commits

4e2b1c4f5 x86/intel_mid: Kconfig select fix

If we select a symbol, it should have a type declared first;
otherwise, in some situations, the config tools get upset. They
are currently perhaps a bit too resilient, which is why this
wasn't noticed initially.

Signed-off-by: Alan Cox
Link: http://lkml.kernel.org/r/20111206132811.4041.32549.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar

Alan Cox
2011-12-06 21:40:50 +0800
dd1375253 x86/intel_mid: Fix the Kconfig for MID selection

We currently fail to build with CONFIG_X86_INTEL_MID=y and
CONFIG_X86_MRST unset.

We could build all the bits to make generic MID work if you
picked MID platform alone but that's really silly. Instead use
select and two variables.

This looks a bit daft right now but once we add a Medfield
selection it'll start to look a good deal more sensible.

Reported-by: Ingo Molnar
Reported-by: Stanislaw Gruszka
Signed-off-by: Alan Cox
Link: http://lkml.kernel.org/r/20111205231433.28811.51297.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar

Alan Cox
2011-12-06 18:28:36 +0800
45e713efe Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intr_remapping: Fix section mismatch in ir_dev_scope_init()
intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev()
x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions
x86, AMD: Correct align_va_addr documentation
x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms
x86/mrst: Battery fixes
x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode
x86: Fix "Acer Aspire 1" reboot hang
x86/mtrr: Resolve inconsistency with Intel processor manual
x86: Document rdmsr_safe restrictions
x86, microcode: Fix the failure path of microcode update driver init code
Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup
x86/mpparse: Account for bus types other than ISA and PCI
x86, mrst: Change the pmic_gpio device type to IPC
mrst: Added some platform data for the SFI translations
x86,mrst: Power control commands update
x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
x86, UV: Fix UV2 hub part number
x86: Add user_mode_vm check in stack_overflow_check

Linus Torvalds
2011-12-06 08:54:15 +0800
232ea3445 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix loss of notification with multi-event
perf, x86: Force IBS LVT offset assignment for family 10h
perf, x86: Disable PEBS on SandyBridge chips
trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
perf session: Fix crash with invalid CPU list
perf python: Fix undefined symbol problem
perf/x86: Enable raw event access to Intel offcore events
perf: Don't use -ENOSPC for out of PMU resources
perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
perf/x86: Fix PEBS instruction unwind
oprofile, x86: Fix crash when unloading module (nmi timer mode)
oprofile: Fix crash when unloading module (hr timer mode)

Linus Torvalds
2011-12-06 08:54:00 +0800
7125facea Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, x86: Avoid unnecessary overflow in sched_clock
sched: Fix buglet in return_cfs_rq_runtime()
sched: Avoid SMT siblings in select_idle_sibling() if possible
sched: Set the command name of the idle tasks in SMP kernels
sched, rt: Provide means of disabling cross-cpu bandwidth sharing
sched: Document wait_for_completion_*() return values
sched_fair: Fix a typo in the comment describing update_sd_lb_stats
sched: Add a comment to effective_load() since it's a pain

Linus Torvalds
2011-12-06 08:50:24 +0800
f62ef5f3e x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions

I've received complaints that the numa_node attribute for family
15h model 00-0fh (e.g. Interlagos) northbridge functions shows
-1 instead of the proper node ID.

Correct this with attached quirks (similar to quirks for other
AMD CPU families used in multi-socket systems).

Signed-off-by: Andreas Herrmann
Cc: Frank Arnold
Cc: Borislav Petkov
Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com
Signed-off-by: Ingo Molnar

Andreas Herrmann
2011-12-06 01:13:11 +0800
35d476996 x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms

Intel MID x86 platforms have a memory-mapped virtual RTC
instead. No MID platform has the default ports (and
accessing them may do weird stuff).

Signed-off-by: Mathias Nyman
Signed-off-by: Alan Cox
Cc: feng.tang@intel.com
Cc: Feng Tang
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Mathias Nyman
2011-12-06 00:09:21 +0800
2cd1c8d4d x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode

Fix an outstanding issue that has been reported since 2.6.37.
On a heavily loaded machine, processing fork() calls could
crash with:

BUG: unable to handle kernel paging request at f573fc8c
IP: [] swap_count_continued+0x104/0x180
*pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[] EFLAGS: 00210246 CPU: 3
EIP is at swap_count_continued+0x104/0x180
.. snip..
Call Trace:
[] ? __swap_duplicate+0xc2/0x160
[] ? pte_mfn_to_pfn+0x87/0xe0
[] ? swap_duplicate+0x14/0x40
[] ? copy_pte_range+0x45b/0x500
[] ? copy_page_range+0x195/0x200
[] ? dup_mmap+0x1c6/0x2c0
[] ? dup_mm+0xa8/0x130
[] ? copy_process+0x98a/0xb30
[] ? do_fork+0x4f/0x280
[] ? getnstimeofday+0x43/0x100
[] ? sys_clone+0x30/0x40
[] ? ptregs_clone+0x15/0x48
[] ? syscall_call+0x7/0xb

The problem is that in copy_page_range() we turn lazy mode on,
and then in swap_entry_free() we call swap_count_continued()
which ends up in:

map = kmap_atomic(page, KM_USER0) + offset;

and then later we touch *map.

Since we are running in batched (lazy) mode, we don't actually
set up the PTE mappings right away: the kmap_atomic() is not done
synchronously, and we end up trying to dereference a page whose
PTE has not been set yet.

Looking at kmap_atomic_prot_pfn(), it uses
'arch_flush_lazy_mmu_mode' and doing the same in
kmap_atomic_prot() and __kunmap_atomic() makes the problem go
away.

Interestingly, commit b8bcfe997e4615 ("x86/paravirt: remove lazy
mode in interrupts") removed part of this to fix an interrupt
issue, but it went too far and did not consider this scenario.
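A toy model of the failure and the fix, assuming a tiny "page table" array and a queue of pending writes in place of the paravirt lazy MMU machinery; every `model_*` name is hypothetical. In lazy mode, PTE writes are only queued, so reading the mapping right after installing it sees stale data unless the batch is flushed first, which is what adding arch_flush_lazy_mmu_mode() to the kmap path achieves.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES 16

/* Hypothetical model of batched ("lazy") PTE updates. */
struct model_mmu {
	bool lazy;                          /* batching enabled */
	unsigned long pte[MODEL_NR_PTES];   /* what the "hardware" sees */
	size_t pending_idx[MODEL_NR_PTES];  /* queued writes: target slot */
	unsigned long pending_val[MODEL_NR_PTES];
	size_t nr_pending;
};

/* Apply all queued writes in order. */
static void model_flush(struct model_mmu *m)
{
	for (size_t i = 0; i < m->nr_pending; i++)
		m->pte[m->pending_idx[i]] = m->pending_val[i];
	m->nr_pending = 0;
}

static void model_set_pte(struct model_mmu *m, size_t idx, unsigned long val)
{
	if (!m->lazy) {
		m->pte[idx] = val;   /* synchronous update */
		return;
	}
	if (m->nr_pending == MODEL_NR_PTES)
		model_flush(m);      /* batch full: drain it first */
	m->pending_idx[m->nr_pending] = idx;
	m->pending_val[m->nr_pending] = val;
	m->nr_pending++;
}

/* Install a mapping and return what the caller would see when it touches
 * it. flush == true models the fixed kmap_atomic path. */
static unsigned long model_kmap_atomic(struct model_mmu *m, size_t idx,
				       unsigned long val, bool flush)
{
	model_set_pte(m, idx, val);
	if (flush)
		model_flush(m);
	return m->pte[idx];
}
```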

Signed-off-by: Konrad Rzeszutek Wilk
Cc: Peter Zijlstra
Cc: Jeremy Fitzhardinge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Konrad Rzeszutek Wilk
2011-12-06 00:06:34 +0800

05 Dec, 2011

16 commits

f1b23714c Merge branch 'ucode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent

Ingo Molnar
2011-12-05 23:38:51 +0800
1ef038909 x86: Fix "Acer Aspire 1" reboot hang

It looks like on some Acer Aspire 1s with older BIOSes, reboot via BIOS
fails. It works on my machine (BIOS version 0.3310) but
not on some others (BIOS version 0.3309).

There's a log of problems at:

https://bbs.archlinux.org/viewtopic.php?id=124136

This patch adds a different callback to the reboot quirk table,
to allow rebooting via the keyboard controller.
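A sketch of how a DMI-keyed quirk table selects a reboot method, merging this Acer entry and the Dell OptiPlex 990 entry from later in this log into one hypothetical table. The DMI strings are illustrative, not copied from the kernel; the kernel's real tables use `struct dmi_system_id` entries with callbacks, and the PCI quirk lives in a separate table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reboot methods; DEFAULT means "no quirk applies". */
enum model_reboot { MODEL_REBOOT_DEFAULT, MODEL_REBOOT_KBD, MODEL_REBOOT_PCI };

struct model_dmi_quirk {
	const char *sys_vendor;
	const char *product_name;
	enum model_reboot method;
};

/* Illustrative entries only. */
static const struct model_dmi_quirk model_reboot_quirks[] = {
	{ "Acer",      "AOA110",       MODEL_REBOOT_KBD },
	{ "Dell Inc.", "OptiPlex 990", MODEL_REBOOT_PCI },
};

/* First matching entry wins; substring matching, like DMI_MATCH(). */
static enum model_reboot model_pick_reboot(const char *vendor,
					   const char *product)
{
	size_t n = sizeof(model_reboot_quirks) / sizeof(model_reboot_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		if (strstr(vendor, model_reboot_quirks[i].sys_vendor) &&
		    strstr(product, model_reboot_quirks[i].product_name))
			return model_reboot_quirks[i].method;
	}
	return MODEL_REBOOT_DEFAULT;
}
```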

Reported-by: Uroš Vampl
Tested-by: Vasily Khoruzhick
Signed-off-by: Peter Chubb
Cc: Don Zickus
Cc: Peter Zijlstra
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1323093233-9481-1-git-send-email-anarsoul@gmail.com
Signed-off-by: Ingo Molnar

Peter Chubb
2011-12-05 22:06:17 +0800
8dbf4a300 x86/mtrr: Resolve inconsistency with Intel processor manual

The following is from the Notes to section 11.5.3 of the Intel processor
manual, available at:

http://www.intel.com/Assets/PDF/manual/325384.pdf

For the Pentium 4 and Intel Xeon processors, after the sequence of
steps given above has been executed, the cache lines containing the
code between the end of the WBINVD instruction and before the
MTRRS have actually been disabled may be retained in the cache
hierarchy. Here, to remove code from the cache completely, a
second WBINVD instruction must be executed after the MTRRs have
been disabled.

This patch adds that second WBINVD.

Ideally, I would like to make this change only for Pentium 4 and Xeon
processors, but I could not find an easy way to do so.
An extra wbinvd() instruction does not hurt other processors
much.

Signed-off-by: Ajaykumar Hotchandani
Cc: Linus Torvalds
Cc: Arjan van de Ven
Cc: Lucas De Marchi
Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com
Signed-off-by: Ingo Molnar

Ajaykumar Hotchandani
2011-12-05 22:06:15 +0800
ce37defc0 x86: Document rdmsr_safe restrictions

Recently, I got bitten by using rdmsr_safe too early in the boot
process. Document its shortcomings for future reference.

Link: http://lkml.kernel.org/r/4ED5B70F.606@lwfinger.net
Signed-off-by: Borislav Petkov

Borislav Petkov
2011-12-05 21:28:37 +0800
bd3990639 x86, microcode: Fix the failure path of microcode update driver init code

The microcode update driver's initialization code does not handle
failures correctly. This patch fixes this issue.
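The message does not say which failures were mishandled. As a generic illustration only, this is the usual kernel error-unwinding idiom (a `goto` ladder that releases, in reverse order, exactly what was acquired) applied to three hypothetical setup steps; the `model_*` functions are stand-ins, not the microcode driver's real init sequence.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of held resources, plus failure-injection switches. */
struct model_state {
	bool device, sysfs, notifier;
	bool fail_sysfs, fail_notifier;
};

static int model_register_device(struct model_state *s)
{
	s->device = true;
	return 0;
}

static void model_unregister_device(struct model_state *s)
{
	s->device = false;
}

static int model_create_sysfs(struct model_state *s)
{
	if (s->fail_sysfs)
		return -EIO;
	s->sysfs = true;
	return 0;
}

static void model_remove_sysfs(struct model_state *s)
{
	s->sysfs = false;
}

static int model_register_notifier(struct model_state *s)
{
	if (s->fail_notifier)
		return -EIO;
	s->notifier = true;
	return 0;
}

/* Each failure jumps to the label that undoes only the steps that succeeded. */
static int model_init(struct model_state *s)
{
	int err;

	err = model_register_device(s);
	if (err)
		return err;

	err = model_create_sysfs(s);
	if (err)
		goto out_device;

	err = model_register_notifier(s);
	if (err)
		goto out_sysfs;

	return 0;

out_sysfs:
	model_remove_sysfs(s);
out_device:
	model_unregister_device(s);
	return err;
}
```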

Signed-off-by: Jan Beulich
Signed-off-by: Srivatsa S. Bhat
Link: http://lkml.kernel.org/r/20111107123530.12164.31227.stgit@srivatsabhat.in.ibm.com
Link: http://lkml.kernel.org/r/4ED8E2270200007800065120@nat28.tlf.novell.com
Signed-off-by: Borislav Petkov

Srivatsa S. Bhat
2011-12-05 21:21:01 +0800
644ddf588 Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup

TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup
is done.

Signed-off-by: Prarit Bhargava
Acked-by: David Rientjes
Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar

Prarit Bhargava
2011-12-05 20:48:50 +0800
9e6866686 x86/mpparse: Account for bus types other than ISA and PCI

In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit
and 64-bit versions of MP_bus_info were rearranged to match each
other better. Unfortunately it introduced a regression: prior
to that change we used to always set the mp_bus_not_pci bit,
then clear it if we found a PCI bus. After it, we set
mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave
it alone otherwise.

In the cases of ISA and PCI, there's not much difference. But
ISA is not the only non-PCI bus, so it's better to always set
mp_bus_not_pci and clear it only for PCI.
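A minimal model of the corrected rule, assuming a plain `bool` array in place of the kernel's `mp_bus_not_pci` bitmap; `model_mp_bus_info()` is hypothetical. MP table bus type strings are space-padded, so only the leading "PCI" is compared. An EISA bus, for instance, now ends up marked non-PCI instead of being left untouched.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_BUSES 256

/* Hypothetical per-bus flag: true when the bus is not PCI. */
static bool model_bus_not_pci[MODEL_MAX_BUSES];

/* Always mark the bus non-PCI first, then clear the flag only for PCI. */
static void model_mp_bus_info(unsigned bus_id, const char *bus_type)
{
	if (bus_id >= MODEL_MAX_BUSES)
		return;
	model_bus_not_pci[bus_id] = true;
	if (strncmp(bus_type, "PCI", 3) == 0)
		model_bus_not_pci[bus_id] = false;
}
```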

Without this change, Dan's Dell PowerEdge 4200 panics on boot
with a log indicating interrupt routing trouble unless the
"noapic" option is supplied. With this change, the machine
boots reliably without "noapic".

Fixes http://bugs.debian.org/586494

Reported-bisected-and-tested-by: Dan McGrath
Signed-off-by: Bjorn Helgaas
Cc: stable@vger.kernel.org # 2.6.26+
Cc: Dan McGrath
Cc: Alexey Starikovskiy
[jrnieder@gmail.com: clarified commit message]
Signed-off-by: Jonathan Nieder
Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net
Signed-off-by: Ingo Molnar

Bjorn Helgaas
2011-12-05 20:46:27 +0800
efa221268 x86, mrst: Change the pmic_gpio device type to IPC

In the latest firmware's SFI tables, pmic_gpio is declared as an
IPC-type device, so we need to handle it too.

Signed-off-by: Feng Tang
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Feng Tang
2011-12-05 19:42:15 +0800
28744b3e9 mrst: Added some platform data for the SFI translations

Add SFI glue for the following devices:

tca6416: a gpio expander compatible with max7315
mpu3050: gyro sensor

The drivers for both devices are already upstream.

Signed-off-by: Jekyll Lai
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Jekyll Lai
2011-12-05 19:42:13 +0800
48bc55621 x86,mrst: Power control commands update

On the Intel MID devices SCU commands are issued to manage power
off and the like. We need to issue different ones for
non-Lincroft based devices.

Signed-off-by: Alek Du
Signed-off-by: Jacob Pan
Signed-off-by: Alan Cox
Signed-off-by: Ingo Molnar

Jacob Pan
2011-12-05 19:42:11 +0800
6be30bb7d x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot

Dell OptiPlex 990 is known to require PCI reboot, so add it to
the reboot blacklist in pci_reboot_dmi_table[].

Signed-off-by: Rafael J. Wysocki
Link: http://lkml.kernel.org/r/201111160019.51303.rjw@sisk.pl
Signed-off-by: Ingo Molnar

Rafael J. Wysocki
2011-12-05 19:20:43 +0800
b495e039b x86, UV: Fix UV2 hub part number

There was a mixup when the SGI UV2 hub chip was sent to be
fabricated, and it ended up with the wrong part number in the
HRP_NODE_ID mmr. Future versions of the chip will (may) have the
correct part number. Change the UV infrastructure to recognize
both part numbers as valid IDs of a UV2 hub chip.
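The change amounts to accepting either ID. A trivial sketch, using placeholder constants since the commit message does not give the real part numbers; `model_is_uv2_hub()` is hypothetical, and reading the field out of the HRP_NODE_ID MMR is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only, not the real UV2 part numbers. */
#define MODEL_UV2_PART_NUMBER      0x0001u
#define MODEL_UV2_PART_NUMBER_ALT  0x0002u

/* Accept both the intended part number and the one that was fabricated. */
static bool model_is_uv2_hub(uint16_t part_number)
{
	return part_number == MODEL_UV2_PART_NUMBER ||
	       part_number == MODEL_UV2_PART_NUMBER_ALT;
}
```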

Signed-off-by: Jack Steiner
Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com
Signed-off-by: Ingo Molnar

Jack Steiner
2011-12-05 18:49:52 +0800
69682b625 x86: Add user_mode_vm check in stack_overflow_check

The kernel stack overflow is checked in stack_overflow_check(),
which may wrongly detect an overflow if the user-space stack
pointer points into the kernel stack, intentionally or
accidentally. Because the detection uses WARN_ONCE(), a real
overflow is never reported after such a misdetection.

This patch adds a user_mode_vm() check before it to avoid this
problem, bailing out early if the user stack is in use.
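A simplified model of the check with the added early return, assuming a flat `[base, base + size]` kernel stack and a hypothetical fixed guard size; `struct model_regs` and `model_near_stack_overflow()` stand in for `struct pt_regs`, `user_mode_vm()` and the real check. Since the real report uses WARN_ONCE(), one false positive from a user-mode sp would otherwise hide every later genuine overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard size in bytes at the bottom of the kernel stack. */
#define MODEL_STACK_GUARD 128u

/* Hypothetical register snapshot. */
struct model_regs {
	bool from_user;  /* interrupted context was user mode */
	uintptr_t sp;    /* its stack pointer */
};

/* True if sp lies in the guard area at the bottom of the kernel stack. */
static bool model_near_stack_overflow(const struct model_regs *regs,
				      uintptr_t base, uintptr_t size)
{
	/* The added check: a user-space sp says nothing about the kernel
	 * stack, even if it happens to point into it. */
	if (regs->from_user)
		return false;

	return regs->sp >= base && regs->sp <= base + size &&
	       regs->sp < base + MODEL_STACK_GUARD;
}
```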

Signed-off-by: Mitsuo Hayasaka
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Randy Dunlap
Link: http://lkml.kernel.org/r/20111129060821.11076.55315.stgit@ltc219.sdl.hitachi.co.jp
Signed-off-by: Ingo Molnar
Cc: "H. Peter Anvin"

Mitsuo Hayasaka
2011-12-05 18:28:25 +0800
16e5294e5 perf, x86: Force IBS LVT offset assignment for family 10h

On AMD family 10h we see firmware bug messages like the following:

[Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
[Firmware Bug]: using offset 1 for IBS interrupts
[Firmware Bug]: workaround enabled for IBS LVT offset
perf: AMD IBS detected (0x00000007)

We always see this, since the BIOS does not assign the offsets for
this family. Force LVT offset assignment in this case. If the OS
assignment fails, fall back to the BIOS settings and try to set it up.

The fallback to BIOS settings weakens the family check, since
force_ibs_eilvt_setup() may fail, e.g. in virtual machines.
But setup may still succeed if the BIOS offsets are correct.

Other families don't have a workaround implemented that assigns LVT
offsets. It's OK to drop the call to force_ibs_eilvt_setup() for those
families.

With the patch, the [Firmware Bug] messages vanish. We now see:

IBS: LVT offset 1 assigned
perf: AMD IBS detected (0x00000007)

Signed-off-by: Robert Richter
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com
Signed-off-by: Ingo Molnar

Robert Richter
2011-12-05 16:32:59 +0800
6a600a8b8 perf, x86: Disable PEBS on SandyBridge chips

Cc: Stephane Eranian
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-12-05 16:32:38 +0800
8e8da023f x86: Fix boot failures on older AMD CPU's

People with old AMD chips are getting hung boots, because commit
bcb80e53877c ("x86, microcode, AMD: Add microcode revision to
/proc/cpuinfo") moved the microcode detection too early into
"early_init_amd()".

At that point we are *so* early in the boot that the exception tables
haven't even been set up yet, so the whole

rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);

doesn't actually work: if the rdmsr does a GP fault (due to a non-existent
MSR register on older CPUs), we can't fix it up yet, and the boot fails.

Fix it by simply moving the code to a slightly later point in the boot
(init_amd() instead of early_init_amd()), since the kernel itself
doesn't even really care about the microcode patchlevel at this point
(or really ever: it's made available to user space in /proc/cpuinfo, and
updated if you do a microcode load).

Reported-tested-and-bisected-by: Larry Finger
Tested-by: Bob Tracy
Acked-by: Borislav Petkov
Cc: Ingo Molnar
Cc: Srivatsa S. Bhat
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-12-05 03:57:09 +0800

04 Dec, 2011

1 commit

e5fd47bfa xen/pm_idle: Make pm_idle be default_idle under Xen.

The idea behind commit d91ee5863b71 ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call, disable_cpuidle(),
which would make pm_idle not be molested by other code. It prevents
pm_idle from being set to cpuidle_idle_call (which is excellent).

But in select_idle_routine() and idle_setup(), pm_idle can still
be set to any of amd_e400_idle, mwait_idle or default_idle. This
depends on some CPU flags (MWAIT) and, for AMD, on the type of CPU.

In the mwait_idle case, we can hit instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT flag, and we get:

Brought up 2 CPUs
invalid opcode: 0000 [#1] SMP

Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
RIP: e030:[] [] mwait_idle+0x6f/0xb4
...
Call Trace:
[] cpu_idle+0xae/0xe8
[] cpu_bringup_and_idle+0xe/0x10
RIP [] mwait_idle+0x6f/0xb4
RSP

In the amd_e400_idle case we don't get such spectacular crashes, but we
do end up making an MSR access which is trapped by the hypervisor, and then
follow it up with a yield hypercall. This means we end up going to the
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen. This patch
does that.
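A simplified model of the selection order described above, with the Xen check placed first so it restores the pre-3.0 behaviour of always using default_idle there. The enum, `struct model_cpu` and `model_select_idle()` are hypothetical stand-ins for the kernel's select_idle_routine()/idle_setup() logic and CPU feature checks.

```c
#include <stdbool.h>

/* Hypothetical labels for the idle routines named in the text. */
enum model_idle { MODEL_IDLE_DEFAULT, MODEL_IDLE_MWAIT, MODEL_IDLE_AMD_E400 };

/* Hypothetical description of the CPU and environment. */
struct model_cpu {
	bool has_mwait;         /* MWAIT advertised (a hypervisor may set it) */
	bool amd_e400_erratum;  /* AMD parts that would use amd_e400_idle */
	bool xen;               /* running under Xen */
};

static enum model_idle model_select_idle(const struct model_cpu *c)
{
	if (c->xen)
		return MODEL_IDLE_DEFAULT;   /* ignore MWAIT/AMD quirks under Xen */
	if (c->has_mwait)
		return MODEL_IDLE_MWAIT;
	if (c->amd_e400_erratum)
		return MODEL_IDLE_AMD_E400;
	return MODEL_IDLE_DEFAULT;
}
```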

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Linus Torvalds

Konrad Rzeszutek Wilk
2011-12-04 02:49:58 +0800

22 Nov, 2011

1 commit

cc11f9edd fix braino in um patchset (mea culpa)

wrong register returned...

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2011-11-22 04:10:21 +0800

21 Nov, 2011

1 commit

a4cc3889f Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM guest: prevent tracing recursion with kvmclock
Revert "KVM: PPC: Add support for explicit HIOR setting"
KVM: VMX: Check for automatic switch msr table overflow
KVM: VMX: Add support for guest/host-only profiling
KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
KVM: s390: announce SYNC_MMU
KVM: s390: Fix tprot locking
KVM: s390: handle SIGP sense running intercepts
KVM: s390: Fix RUNNING flag misinterpretation

Linus Torvalds
2011-11-21 06:57:43 +0800

20 Nov, 2011

1 commit

95ef1e529 KVM guest: prevent tracing recursion with kvmclock

Prevent tracing of preempt_disable() in get_cpu_var() in
kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled,
preempt_disable/enable() are traced and this causes the function_graph
tracer to go into an infinite recursion. By open coding the
preempt_disable() around the get_cpu_var(), we can use the notrace
version which prevents preempt_disable/enable() from being traced and
prevents the recursion.

Based on a similar patch for Xen from Jeremy Fitzhardinge.

Tested-by: Gleb Natapov
Acked-by: Steven Rostedt
Signed-off-by: Avi Kivity

Avi Kivity
2011-11-20 16:53:48 +0800

19 Nov, 2011

1 commit

5c6b4e84c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
random: Fix handing of arch_get_random_long in get_random_bytes()
x86: Call stop_machine_text_poke() on all CPUs
x86, ioapic: Only print ioapic debug information for IRQs belonging to an ioapic chip
x86/mrst: Avoid reporting wrong nmi status
x86/mrst: Add support for Penwell clock calibration
x86/apic: Allow use of lapic timer early calibration result
x86/apic: Do not clear nr_irqs_gsi if no legacy irqs
x86/platform: Add a wallclock_init func to x86_platforms ops
x86/mce: Make mce_chrdev_ops 'static const'

Linus Torvalds
2011-11-19 08:16:18 +0800

18 Nov, 2011

1 commit

b68445238 Merge branch 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

* 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen-gntalloc: signedness bug in add_grefs()
xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
xen-gntdev: integer overflow in gntdev_alloc_map()
xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
xen/balloon: Avoid OOM when requesting highmem
xen: Remove hanging references to CONFIG_XEN_PLATFORM_PCI
xen: map foreign pages for shared rings by updating the PTEs directly

Linus Torvalds
2011-11-18 23:18:07 +0800

17 Nov, 2011

6 commits

e7fc6f93b KVM: VMX: Check for automatic switch msr table overflow

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2011-11-17 22:28:09 +0800
d7cd97964 KVM: VMX: Add support for guest/host-only profiling

Support guest/host-only profiling by switching perf MSRs on
guest entry if needed.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2011-11-17 22:28:00 +0800
8bf00a529 KVM: VMX: add support for switching of PERF_GLOBAL_CTRL

Some CPUs have special support for switching the PERF_GLOBAL_CTRL MSR.
Add logic to detect whether such support exists and works properly, and
extend the MSR switching code to use it if available. Also extend the
number of generic MSR switching entries to 8.
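A sketch of a fixed-size MSR auto-switch list with 8 entries, as the text describes, including the refusal to grow past the end that the "Check for automatic switch msr table overflow" commit in this log is about. The struct names and `model_add_atomic_switch_msr()` are hypothetical; this is not KVM's actual VMX code.

```c
#include <errno.h>
#include <stdint.h>

#define NR_AUTOLOAD_MSRS 8   /* entry count given in the commit text */

struct model_msr_entry {
	uint32_t index;
	uint64_t guest_val, host_val;
};

/* Hypothetical list of MSRs switched atomically on VM entry/exit. */
struct model_msr_autoload {
	unsigned nr;
	struct model_msr_entry e[NR_AUTOLOAD_MSRS];
};

/* Update the entry for `msr` if present, else append it; refuse rather
 * than write past the array when the table is full. */
static int model_add_atomic_switch_msr(struct model_msr_autoload *m,
				       uint32_t msr, uint64_t guest_val,
				       uint64_t host_val)
{
	unsigned i;

	for (i = 0; i < m->nr; i++)
		if (m->e[i].index == msr)
			break;

	if (i == NR_AUTOLOAD_MSRS)
		return -ENOSPC;
	if (i == m->nr)
		m->nr++;

	m->e[i].index = msr;
	m->e[i].guest_val = guest_val;
	m->e[i].host_val = host_val;
	return 0;
}
```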

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2011-11-17 22:27:54 +0800
4cecf6d40 sched, x86: Avoid unnecessary overflow in sched_clock

(Added the missing signed-off-by line)

After hundreds of days, the __cycles_2_ns calculation in sched_clock
overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.

We can decompose the TSC value into the quotient and remainder of its
division by 2^CYC2NS_SCALE_FACTOR, and then use these to convert it into nanoseconds.
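A worked sketch of the decomposition, assuming the kernel's x86 scale factor of 10 (ns = (cyc * cyc2ns) >> 10) and omitting the per-CPU cyc2ns offset. Writing cyc = quot * 2^10 + rem gives (cyc * scale) >> 10 == quot * scale + ((rem * scale) >> 10) exactly, because the quot * scale * 2^10 term is divisible by 2^10. With a 1 GHz TSC the scale is 1 << 10, so the naive product passes 2^64 once cyc reaches 2^54, roughly 208 days of cycles. The function names are illustrative.

```c
#include <stdint.h>

/* ns = (cyc * scale) >> CYC2NS_SCALE_FACTOR */
#define CYC2NS_SCALE_FACTOR 10

/* Naive form: cyc * scale wraps modulo 2^64 for large cyc. */
static uint64_t cycles_2_ns_naive(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}

/* Split form: never forms the full product, same result when no overflow. */
static uint64_t cycles_2_ns_split(uint64_t cyc, uint64_t scale)
{
	uint64_t quot = cyc >> CYC2NS_SCALE_FACTOR;
	uint64_t rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);

	return quot * scale + ((rem * scale) >> CYC2NS_SCALE_FACTOR);
}
```

With scale = 1024, cyc = 1ULL << 60 yields 0 from the naive form and 1ULL << 60 from the split form; for small inputs such as cyc = 1500, scale = 512 both return 750.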

Signed-off-by: Salman Qazi
Acked-by: John Stultz
Reviewed-by: Paul Turner
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar

Salman Qazi
2011-11-17 02:51:25 +0800
90d4f5534 xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.

PVHVM running with more than 32 vcpus and pv_irq/pv_time enabled
needs VCPU placement to work, or else it will soft lockup.

CC: stable@kernel.org
Acked-by: Stefano Stabellini
Signed-off-by: Zhenzhong Duan
Signed-off-by: Konrad Rzeszutek Wilk

Zhenzhong Duan
2011-11-17 01:13:44 +0800
cd12909cb xen: map foreign pages for shared rings by updating the PTEs directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs. This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel
Acked-by: Andrew Morton
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Michal Simek

David Vrabel
2011-11-17 01:13:08 +0800