18 May, 2010
2 commits
-
…el/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Reduce stack_trace usage
lockdep: No need to disable preemption in debug atomic ops
lockdep: Actually _dec_ in debug_atomic_dec
lockdep: Provide off case for redundant_hardirqs_on increment
lockdep: Simplify debug atomic ops
lockdep: Fix redundant_hardirqs_on incremented with irqs enabled
lockstat: Make lockstat counting per cpu
i8253: Convert i8253_lock to raw_spinlock -
…/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Add amd_iommu=off command line option
iommu-api: Remove iommu_{un}map_range functions
x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
kvm: Change kvm_iommu_map_pages to map large pages
VT-d: Change {un}map_range functions to implement {un}map interface
iommu-api: Add ->{un}map callbacks to iommu_ops
iommu-api: Add iommu_map and iommu_unmap functions
iommu-api: Rename ->{un}map function pointers to ->{un}map_range
17 May, 2010
1 commit
-
In preparation for removing volatile from the atomic_t definition, this
patch adds a volatile cast to all the atomic read functions.Signed-off-by: Anton Blanchard
Signed-off-by: Linus Torvalds
15 May, 2010
5 commits
-
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mrst: Don't blindly access extended config space -
Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform. The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.This fixes booting certain VMware configurations with CONFIG_MRST=y.
Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.Reported-and-tested-by: Petr Vandrovec
Signed-off-by: H. Peter Anvin
Acked-by: Jacob Pan
Acked-by: Jesse Barnes
LKML-Reference: -
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, k8: Fix build error when K8_NB is disabled
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
x86: Fix fake apicid to node mapping for numa emulation -
When running a quest kernel on xen we get:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU
RIP: 0010:[] [] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
[] ? sync_supers_timer_fn+0x0/0x1c
[] ? mod_timer+0x23/0x25
[] ? arm_supers_timer+0x34/0x36
[] ? hrtimer_get_next_event+0xa7/0xc3
[] ? get_next_timer_interrupt+0x19a/0x20d
[] get_cpu_leaves+0x5c/0x232
[] ? sched_clock_local+0x1c/0x82
[] ? sched_clock_tick+0x75/0x7a
[] generic_smp_call_function_single_interrupt+0xae/0xd0
[] smp_call_function_single_interrupt+0x18/0x27
[] call_function_single_interrupt+0x13/0x20
[] ? notifier_call_chain+0x14/0x63
[] ? native_safe_halt+0xc/0xd
[] ? default_idle+0x36/0x53
[] cpu_idle+0xaa/0xe4
[] rest_init+0x7e/0x80
[] start_kernel+0x40e/0x419
[] x86_64_start_reservations+0xb3/0xb7
[] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 70 38 48 8d 8d 5c ff
ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP [] cpuid4_cache_lookup_regs+0x2ca/0x3df
RSP
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.Check if northbridge devices are present and do not enable the feature
if there are none.[ hpa: backported to 2.6.34 ]
Signed-off-by: Frank Arnold
LKML-Reference:
Acked-by: Borislav Petkov
Signed-off-by: H. Peter Anvin
Cc: -
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.Signed-off-by: Borislav Petkov
LKML-Reference:
Signed-off-by: H. Peter Anvin
Cc:
14 May, 2010
1 commit
-
If host CPU is exposed to a guest the OSVW MSRs are not guaranteed
to be present and a GP fault occurs. Thus checking the feature flag is
essential.Cc: # .32.x .33.x
Signed-off-by: Andreas Herrmann
LKML-Reference:
Signed-off-by: H. Peter Anvin
13 May, 2010
3 commits
-
As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.Signed-off-by: Jan Kiszka
KVM-Stable-Tag
Acked-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti -
cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.Signed-off-by: Dongxiao Xu
Signed-off-by: Marcelo Tosatti -
This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.Cc: Jan Kiszka
Cc: Gleb Natapov
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity
11 May, 2010
3 commits
-
Conflicts:
arch/x86/kernel/amd_iommu.c -
This patch adds a command line option to tell the AMD IOMMU
driver to not initialize any IOMMU it finds.Signed-off-by: Joerg Roedel
-
Fix kprobe/x86 to check removed int3 when failing to get kprobe
from hlist. Since we have a time window between checking int3
exists on probed address and getting kprobe on that address,
we can have following scenario:-------
CPU1 CPU2
hit int3
check int3 exists
remove int3
remove kprobe from hlist
get kprobe from hlist
no kprobe->OOPS!
-------This patch moves int3 checking if there is no kprobe on that
address for fixing this problem as follows:------
CPU1 CPU2
hit int3
remove int3
remove kprobe from hlist
get kprobe from hlist
no kprobe->check int3 exists
->rollback&retry
------Signed-off-by: Masami Hiramatsu
Acked-by: Ananth N Mavinakayanahalli
Cc: systemtap
Cc: DLE
Cc: Dave Anderson
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
06 May, 2010
1 commit
-
With NUMA emulation, it's possible for a single cpu to be bound
to multiple nodes since more than one may have affinity if
allocated on a physical node that is local to the cpu.APIC ids must therefore be mapped to the lowest node ids to
maintain generic kernel use of functions such as cpu_to_node()
that determine device affinity. For example, if a device has
proximity to physical node 1, for instance, and a cpu happens to
be mapped to a higher emulated node id 8, the proximity may not
be correctly determined by comparison in generic code even
though the cpu may be truly local and allocated on physical node 1.When this happens, the true topology of the machine isn't
accurately represented in the emulated environment; although
this isn't critical to the system's uptime, any generic code
that is NUMA aware benefits from the physical topology being
accurately represented.This can affect any system that maps multiple APIC ids to a
single node and is booted with numa=fake=N where N is greater
than the number of physical nodes.Signed-off-by: David Rientjes
Cc: Yinghai Lu
Cc: Suresh Siddha
LKML-Reference:
Signed-off-by: Ingo Molnar
05 May, 2010
2 commits
-
…git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
powernow-k8: Fix frequency reporting
x86: Fix parse_reservetop() build failure on certain configs
x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
x86: Fix 'reservetop=' functionality -
The x86_64 call_rwsem_wait() treats the active state counter part of the
R/W semaphore state as being 16-bit when it's actually 32-bit (it's half
of the 64-bit state). It should do "decl %edx" not "decw %dx".Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
03 May, 2010
3 commits
-
With F10, model 10, all valid frequencies are in the ACPI _PST table.
Cc: # 33.x 32.x
Signed-off-by: Mark Langsdorf
LKML-Reference:
Signed-off-by: Borislav Petkov
Reviewed-by: Thomas Renninger
Signed-off-by: H. Peter Anvin
Signed-off-by: Ingo Molnar -
Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
fixup_early_ioremap() call to parse_reservetop() and declared it
in io.h.But asm/io.h was only included indirectly - and on some configs
not at all, causing a build failure on those configs.Cc: Liang Li
Cc: Konrad Rzeszutek Wilk
Cc: Yinghai Lu
Cc: Jeremy Fitzhardinge
Cc: Wang Chen
Cc: "H. Peter Anvin"
Cc: Andrew Morton
LKML-Reference:
Signed-off-by: Ingo Molnar
01 May, 2010
1 commit
-
Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move(). It is possible that xen guests have
irq_desc->chip_data = NULL.Test for NULL chip_data pointer before attempting to complete an irq move.
Signed-off-by: Prarit Bhargava
LKML-Reference:
Acked-by: Suresh Siddha
Signed-off-by: H. Peter Anvin
Cc: [2.6.33]
30 Apr, 2010
1 commit
-
When specifying the 'reservetop=0xbadc0de' kernel parameter,
the kernel will stop booting due to a early_ioremap bug that
relates to commit 8827247ff.The root cause of boot failure problem is the value of
'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
But later in setup_arch, the function 'parse_early_param' will
modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.The simplest fix might be use __fix_to_virt(idx0) to get updated
value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
old value from slot_virt[slot] directly.Changelog since v0:
-v1: When reservetop being handled then FIXADDR_TOP get
adjusted, Hence check prev_map then re-initialize slot_virt and
PMD based on new FIXADDR_TOP.-v2: place fixup_early_ioremap hence call early_ioremap_init in
reserve_top_address to re-initialize slot_virt and
corresponding PMD when parse_reservertop-v3: move fixup_early_ioremap out of reserve_top_address to make
sure other clients of reserve_top_address like xen/lguest won't
brokenSigned-off-by: Liang Li
Tested-by: Konrad Rzeszutek Wilk
Acked-by: Yinghai Lu
Acked-by: Jeremy Fitzhardinge
Cc: Wang Chen
Cc: "H. Peter Anvin"
Cc: Andrew Morton
LKML-Reference:
[ fixed three small cleanliness details in fixup_early_ioremap() ]
Signed-off-by: Ingo Molnar
29 Apr, 2010
2 commits
-
…git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Disable large pages on CPUs with Atom erratum AAE44
x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
x86, mrst: Conditionally register cpu hotplug notifier for apbt -
ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN. Linux has
been computing Address Spaces as [_MIN to _MIN + _LEN - 1]. Based on the
tests in the bug reports below, Windows apparently uses [_MIN to _MAX].Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
doesn't matter which way we compute the end. But of course, there are
BIOSes that don't follow this rule, and we're better off if Linux handles
those exceptions the same way as Windows.This patch makes Linux use [_MIN to _MAX], as Windows seems to do. This
effectively reverts d558b483d5 and 03db42adfe and replaces them with
simpler code.https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)Signed-off-by: Bjorn Helgaas
Signed-off-by: Jesse Barnes
27 Apr, 2010
1 commit
-
When we move a PCI device or assign resources to a device not configured
by the BIOS, we want to avoid the BIOS region below 1MB. Note that if the
BIOS places devices below 1MB, we leave them there.See https://bugzilla.kernel.org/show_bug.cgi?id=15744
and https://bugzilla.kernel.org/show_bug.cgi?id=15841Tested-by: Andy Isaacson
Tested-by: Andy Bailey
Signed-off-by: Bjorn Helgaas
Signed-off-by: Jesse Barnes
25 Apr, 2010
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: Ensure we re-enable devices on resume
x86/PCI: parse additional host bridge window resource types
PCI: revert broken device warning
PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions -
This is a standalone version of VMware Balloon driver. Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation). In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor. Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.There were also some concerns whether current ballooning technique is the
right thing. If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline. The driver will only activate
if host is VMware so everyone else should not be affected at all.Signed-off-by: Dmitry Torokhov
Cc: Avi Kivity
Cc: Jeremy Fitzhardinge
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Apr, 2010
2 commits
-
Atom erratum AAE44/AAF40/AAG38/AAH41:
"If software clears the PS (page size) bit in a present PDE (page
directory entry), that will cause linear addresses mapped through this
PDE to use 4-KByte pages instead of using a large page after old TLB
entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then
it may fetch from a different physical address than specified by
either the old large page translation or the new 4-KByte page
translation. This erratum may also cause speculative code fetches from
incorrect addresses."[http://download.intel.com/design/processor/specupdt/319536.pdf]
Where as commit 211b3d03c7400f48a781977a50104c9d12f4e229 seems to
workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
opportunity for the bug to occur and does not totally remove it. This
patch disables mixed 4K/4MB page tables totally avoiding the page
splitting and not tripping this processor issue.This is based on an original patch by Colin King.
Originally-by: Colin Ian King
Cc: Colin Ian King
Cc: Ingo Molnar
Signed-off-by: H. Peter Anvin
LKML-Reference:
Cc: -
When we do a thread switch, we clear the outgoing FS/GS base if the
corresponding selector is nonzero. This is taken by __switch_to() as
an entry invariant; it does not verify that it is true on entry.
However, copy_thread() doesn't enforce this constraint, which can
result in inconsistent results after fork().Make copy_thread() match the behavior of __switch_to().
Reported-and-tested-by: Samuel Thibault
Signed-off-by: H. Peter Anvin
LKML-Reference:
Cc:
23 Apr, 2010
1 commit
-
This adds support for Memory24, Memory32, and Memory32Fixed descriptors in
PCI host bridge _CRS.I experimentally determined that Windows (2008 R2) accepts these descriptors
and treats them as windows that are forwarded to the PCI bus, e.g., if
it finds any PCI devices with BARs outside the windows, it moves them into
the windows.I don't know whether any machines actually use these descriptors in PCI
host bridge _CRS methods, but if any exist and they're new enough that we
automatically turn on "pci=use_crs", they will work with Windows but not
with Linux.Here are the details: https://bugzilla.kernel.org/show_bug.cgi?id=15817
Signed-off-by: Bjorn Helgaas
Signed-off-by: Jesse Barnes
22 Apr, 2010
1 commit
-
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix TSS size check for 16-bit tasks
KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
KVM: Increase NR_IOBUS_DEVS limit to 200
KVM: fix the handling of dirty bitmaps to avoid overflows
KVM: MMU: fix kvm_mmu_zap_page() and its calling path
KVM: VMX: Save/restore rflags.vm correctly in real mode
KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
KVM: take srcu lock before call to complete_pio()
21 Apr, 2010
4 commits
-
A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity -
APB timer is used on Moorestown platforms but not on a standard PC.
If APB timer code is compiled in but not initialized at run-time due
to lack of FW reported SFI table, kernel would panic when the non-boot
CPUs are offlined and notifier is called.https://bugzilla.kernel.org/show_bug.cgi?id=15786
This patch ensures CPU hotplug notifier for APB timer is only registered
when the APBT timer block is initialized.Signed-off-by: Jacob Pan
LKML-Reference:
Signed-off-by: H. Peter Anvin -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix unsafe frame rewinding with hot regs fetching -
Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.Reported-and-tested-by: Andy Isaacson
Signed-off-by: Christoph Hellwig
Signed-off-by: Linus Torvalds
20 Apr, 2010
4 commits
-
Int is not long enough to store the size of a dirty bitmap.
This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti -
This patch fix:
- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walkingKVM-Stable-Tag.
Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti -
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode. The means that the following sequenceKVM_SET_REGS (rflags.vm=1)
KVM_SET_SREGS (cr0.pe=1)Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().
Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.Signed-off-by: Avi Kivity
Signed-off-by: Marcelo Tosatti -
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.Signed-off-by: Andre Przywara
Signed-off-by: Avi Kivity