Eric Lee / smarc-fsl-linux-kernel

28 Jun, 2009

2 commits

84261923d KVM: protect concurrent make_all_cpus_request

make_all_cpus_request contains a race condition which can
trigger a false request-completed status, as follows:

CPU0                                          CPU1

if (test_and_set_bit(req,&vcpu->requests))
   ....                                       if (test_and_set_bit(req,&vcpu->requests))
   ..                                             return
proceed to smp_call_function_many(wait=1)

Use a spinlock to serialize concurrent CPUs.
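
The early return in the diagram can be illustrated in user space. This is a minimal sketch, not the kernel code: `fake_vcpu` and `claim_request` are hypothetical names, and a C11 `atomic_flag` stands in for one bit of `vcpu->requests`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a single request bit in vcpu->requests. */
struct fake_vcpu {
    atomic_flag request_pending;
};

/*
 * Returns true if this caller set the bit and is therefore the one that
 * must notify the other CPUs and wait. Returns false if the bit was
 * already set, in which case the caller skips the wait entirely.
 */
static bool claim_request(struct fake_vcpu *vcpu)
{
    assert(vcpu != NULL);
    return !atomic_flag_test_and_set(&vcpu->request_pending);
}
```

The first caller gets `true` and goes on to wait; a second caller arriving before that wait finishes gets `false` and returns at once, reporting the request as complete even though it is not. Serializing the callers closes that window.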

Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2009-06-28 19:10:29 +0800
e244584fe KVM: Fix dirty bit tracking for slots with large pages

When a slot is already allocated and is asked to be tracked, we need
to break up its large pages.

This code flushes the MMU when someone asks a slot to start dirty bit
tracking.

Cc: stable@kernel.org
Signed-off-by: Izik Eidus
Signed-off-by: Avi Kivity

Izik Eidus
2009-06-28 19:10:29 +0800

12 Jun, 2009

1 commit

aee74f3bb kvm: remove the duplicated cpumask_clear

zalloc_cpumask_var already cleared it.

Signed-off-by: Yinghai Lu
Signed-off-by: Linus Torvalds

Yinghai Lu
2009-06-12 11:04:37 +0800

10 Jun, 2009

25 commits

09f8ca74a KVM: Prevent overflow in largepages calculation

If userspace specifies a memory slot that is larger than 8 petabytes, it
could overflow the largepages variable.
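
The 8 petabyte figure is consistent with simple arithmetic if one assumes 2 MiB large pages and a 32-bit counter; neither assumption is stated in the commit message. A user-space sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: large pages are 2 MiB (2^21 bytes). */
#define LARGE_PAGE_SHIFT 21

/* Hypothetical helper: large pages needed to cover slot_bytes bytes. */
static uint64_t large_pages_for(uint64_t slot_bytes)
{
    assert(slot_bytes > 0);
    return ((slot_bytes - 1) >> LARGE_PAGE_SHIFT) + 1;
}

/* What the count becomes when stored in a 32-bit variable (assumed width). */
static uint32_t truncate_to_32_bits(uint64_t count)
{
    return (uint32_t)count;
}
```

A slot of 2^53 bytes (8 PiB) needs 2^32 large pages, which no longer fits in 32 bits and wraps to 0.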

Cc: stable@kernel.org
Signed-off-by: Avi Kivity

Avi Kivity
2009-06-10 20:18:16 +0800
ac04527f7 KVM: Disable large pages on misaligned memory slots

If a slot's guest physical address and host virtual address are unequal (mod
large page size), then we would erroneously try to back guest large pages
with host large pages. Detect this misalignment and disable large page
support for the troublesome slot.
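
The "unequal mod large page size" test can be sketched as follows. This is an illustration only, assuming 4 KiB pages and 512 small pages per large page; `slot_is_misaligned` and the constants are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 KiB small pages, 512 small pages per large page. */
#define PAGE_SHIFT 12
#define PAGES_PER_LARGE_PAGE 512

static_assert((PAGES_PER_LARGE_PAGE & (PAGES_PER_LARGE_PAGE - 1)) == 0,
              "mask trick below needs a power of two");

/*
 * True when the guest physical address and the host virtual address sit
 * at different offsets within a large page, so a guest large page cannot
 * be backed by a single host large page.
 */
static bool slot_is_misaligned(uint64_t guest_phys_addr, uint64_t host_virt_addr)
{
    uint64_t guest_frame = guest_phys_addr >> PAGE_SHIFT;
    uint64_t host_frame = host_virt_addr >> PAGE_SHIFT;

    /* XOR leaves set bits only where the two frame numbers differ. */
    return ((guest_frame ^ host_frame) & (PAGES_PER_LARGE_PAGE - 1)) != 0;
}
```

Two addresses that are both offset by the same amount within a large page still count as aligned with each other, which is why the comparison is on the difference rather than on each address separately.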

Cc: stable@kernel.org
Signed-off-by: Avi Kivity

Avi Kivity
2009-06-10 20:17:58 +0800
b43b1901a KVM: take mmu_lock when updating a deleted slot

kvm_handle_hva relies on mmu_lock protection to safely access
the memslot structures.

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2009-06-10 16:48:54 +0800
547de29e5 KVM: protect assigned dev workqueue, int handler and irq acker

kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
interrupt handler function. It does:

	if (dev->host_irq_disabled) {
		enable_irq(dev->host_irq);
		dev->host_irq_disabled = false;
	}

If an interrupt triggers before the dev->host_irq_disabled assignment,
it will disable the interrupt and set dev->host_irq_disabled to true.

On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
it.

Other than that, having the interrupt handler and work handlers run in
parallel sounds like asking for trouble (could not spot any obvious
problem, but better not to have to; it's fragile).

CC: sheng.yang@intel.com
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2009-06-10 16:48:53 +0800
efbc100c2 KVM: Trivial format fix in setup_routing_entry()

Remove extra tab.

Signed-off-by: Chris Wright
Signed-off-by: Avi Kivity

Chris Wright
2009-06-10 16:48:50 +0800
8e1c18157 KVM: VMX: Disable VMX when system shutdown

Intel TXT (Trusted Execution Technology) requires VMX to be off on all CPUs
in order to work at system shutdown.

CC: Joseph Cihula
Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:50 +0800
522c68c44 KVM: Enable snooping control for supported hardware

Memory aliases with different memory types are a problem for the guest. For a guest
without an assigned device, the memory type of guest memory is always the
same as the host's (WB); but with an assigned device, some part of memory may be used
for DMA and then set to an uncacheable memory type (UC/WC), which conflicts with the
host memory type and is a potential issue.

Snooping control can guarantee cache correctness for memory that goes through the
DMA engine of VT-d.

[avi: fix build on ia64]

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:50 +0800
78646121e KVM: Fix interrupt unhalting a vcpu when it shouldn't

kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
whether the interrupt window is actually open.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-06-10 16:48:33 +0800
09cec7548 KVM: Timer event should not unconditionally unhalt vcpu.

Currently timer events are processed before entering guest mode. Move this
processing to the main vcpu event loop, since timer events should be processed
even while the vcpu is halted. A timer may cause an interrupt/NMI to be injected,
and only then will the vcpu be unhalted.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-06-10 16:48:33 +0800
f00be0cae KVM: MMU: do not free active mmu pages in free_mmu_pages()

free_mmu_pages() should only undo what alloc_mmu_pages() does.
Free mmu pages from the generic VM destruction function, kvm_destroy_vm().

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-06-10 16:48:30 +0800
e56d532f2 KVM: Device assignment framework rework

After discussion with Marcelo, we decided to rework the device assignment framework
together. The old problem is that the kernel logic is unnecessarily complex, so Marcelo
suggested splitting it in a more elegant way:

1. Split host IRQ assignment and guest IRQ assignment, and let userspace determine the
combination. Also discard the msi2intx parameter; userspace can specify
KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx conversion.

2. Split IRQ assignment and deassignment. Introduce two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.

This patch also fixes the reversed _IOR vs _IOW in the definition (by deprecating the
old interface).

[avi: replace homemade bitcount() by hweight_long()]

Signed-off-by: Marcelo Tosatti
Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:29 +0800
58c2dde17 KVM: APIC: get rid of deliver_bitmask

Deliver interrupt during destination matching loop.

Signed-off-by: Gleb Natapov
Acked-by: Xiantao Zhang
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2009-06-10 16:48:27 +0800
e1035715e KVM: change the way how lowest priority vcpu is calculated

The new way does not require additional loop over vcpus to calculate
the one with lowest priority as one is chosen during delivery bitmap
construction.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2009-06-10 16:48:27 +0800
343f94fe4 KVM: consolidate ioapic/ipi interrupt delivery logic

Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2009-06-10 16:48:27 +0800
a53c17d21 KVM: ioapic/msi interrupt delivery consolidation

ioapic_deliver() and kvm_set_msi() have code duplication. Move
the code into ioapic_deliver_entry() function and call it from
both places.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2009-06-10 16:48:27 +0800
6da7e3f64 KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts

Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions.

Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti

Gleb Natapov
2009-06-10 16:48:26 +0800
74a3a8f15 KVM: Merge kvm_ioapic_get_delivery_bitmask into kvm_get_intr_delivery_bitmask

Gleb fixed bitmap ops usage in kvm_ioapic_get_delivery_bitmask.

Sheng merged the two functions and fixed several issues in
kvm_get_intr_delivery_bitmask:
1. deliver_bitmask is a bitmap rather than an unsigned long integer.
2. The lowest priority target bitmap was calculated incorrectly.
3. Prevent a potential NULL dereference.
4. The declaration in include/kvm_host.h caused a powerpc compilation warning.
5. Add a warning for guest broadcast interrupts with lowest priority delivery mode.
6. Remove the duplicate bitmap cleanup in the caller of kvm_get_intr_delivery_bitmask.

Signed-off-by: Gleb Natapov
Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti

Sheng Yang
2009-06-10 16:48:26 +0800
d510d6cc6 KVM: Enable MSI-X for KVM assigned device

This patch finally enables MSI-X.

What we need for MSI-X:
1. Intercept one page in the MMIO region of the device, so that we can get the
guest's desired MSI-X table and set up the real one. This is now done by the guest and
transferred to the kernel using the KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY ioctls.

2. Information for the incoming interrupt. One device can now have more than one
interrupt, and they are all handled by one workqueue structure, so we need to
identify them. The previous patch enabled gsi_msg_pending_bitmap to get this done.

3. A mapping from host IRQ to guest gsi, as well as from guest gsi to the real MSI/MSI-X
message address/data. We use the same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.

What we lack for now:
1. The PCI spec says nothing can exist with the MSI-X table in the same page of the
MMIO region, except pending bits. The patch ignores pending bits as a first
step (so they are always 0 - no pending).

2. The PCI spec allows the MSI-X table to be changed dynamically. That means the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
doesn't support this, and Linux doesn't work this way either.

3. The patch doesn't implement MSI-X mask all and mask single entry. I will
implement the former in driver/pci/msi.c later. For single entry, userspace
should have the responsibility to handle it.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:23 +0800
2350bd1f6 KVM: Add MSI-X interrupt injection logic

We have to handle more than one interrupt with one handler for MSI-X. Avi
suggested using a flag to indicate pending interrupts, so here it is.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:23 +0800
c1e015142 KVM: Ioctls for init MSI-X entry

Introduce two ioctls, KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.

These two ioctls are used by userspace to specify the guest device's MSI-X entry
number and to correlate MSI-X entries with GSIs during the initialization stage.

MSI-X should be fully initialized before it is enabled.

Changing the MSI-X entry number is not supported for now.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:23 +0800
bfd349d07 KVM: bit ops for deliver_bitmap

It's also convenient when we extend the number of vcpus KVM supports in the future.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:22 +0800
110c2faeb KVM: Update intr delivery func to accept unsigned long* bitmap

Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is
increased.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:22 +0800
e5871be0f KVM: Change API of kvm_ioapic_get_delivery_bitmask

In order to use with bit ops.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:22 +0800
116191b69 KVM: Unify the delivery of IOAPIC and MSI interrupts

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:22 +0800
cf9e4e15e KVM: Split IOAPIC structure

Prepare to reuse ioapic_redir_entry for MSI.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-06-10 16:48:21 +0800

09 Jun, 2009

1 commit

8437a6177 kvm: fix kvm reboot crash when MAXSMP is used

One system was found to crash during reboot with kvm and MAXSMP:
Sending all processes the KILL signal... done
Please stand by while rebooting the system...
[ 1721.856538] md: stopping all md devices.
[ 1722.852139] kvm: exiting hardware virtualization
[ 1722.854601] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1722.872219] IP: [] hardware_disable+0x4c/0xb4
[ 1722.877955] PGD 0
[ 1722.880042] Oops: 0000 [#1] SMP
[ 1722.892548] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/vendor
[ 1722.900977] CPU 9
[ 1722.912606] Modules linked in:
[ 1722.914226] Pid: 0, comm: swapper Not tainted 2.6.30-rc7-tip-01843-g2305324-dirty #299 ...
[ 1722.932589] RIP: 0010:[] [] hardware_disable+0x4c/0xb4
[ 1722.942709] RSP: 0018:ffffc900010b6ed8 EFLAGS: 00010046
[ 1722.956121] RAX: 0000000000000000 RBX: ffffc9000e253140 RCX: 0000000000000009
[ 1722.972202] RDX: 000000000000b020 RSI: ffffc900010c3220 RDI: ffffffffffffd790
[ 1722.977399] RBP: ffffc900010b6f08 R08: 0000000000000000 R09: 0000000000000000
[ 1722.995149] R10: 00000000000004b8 R11: 966912b6c78fddbd R12: 0000000000000009
[ 1723.011551] R13: 000000000000b020 R14: 0000000000000009 R15: 0000000000000000
[ 1723.019898] FS: 0000000000000000(0000) GS:ffffc900010b3000(0000) knlGS:0000000000000000
[ 1723.034389] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1723.041164] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
[ 1723.056192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1723.072546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1723.080562] Process swapper (pid: 0, threadinfo ffff88107e464000, task ffff88047e5a2550)
[ 1723.096144] Stack:
[ 1723.099071] 0000000000000046 ffffc9000e253168 966912b6c78fddbd ffffc9000e253140
[ 1723.115471] ffff880c7d4304d0 ffffc9000e253168 ffffc900010b6f28 ffffffff81011022
[ 1723.132428] ffffc900010b6f48 966912b6c78fddbd ffffc900010b6f48 ffffffff8100b83b
[ 1723.141973] Call Trace:
[ 1723.142981] [] kvm_arch_hardware_disable+0x26/0x3c
[ 1723.158153] [] hardware_disable+0x3f/0x55
[ 1723.172168] [] generic_smp_call_function_interrupt+0x76/0x13c
[ 1723.178836] [] smp_call_function_interrupt+0x3a/0x5e
[ 1723.194689] [] call_function_interrupt+0x13/0x20
[ 1723.199750] [] ? acpi_idle_enter_c1+0xd3/0xf4
[ 1723.217508] [] ? acpi_idle_enter_c1+0xcd/0xf4
[ 1723.232172] [] ? acpi_idle_enter_bm+0xe7/0x2ce
[ 1723.235141] [] ? __atomic_notifier_call_chain+0x0/0xac
[ 1723.253381] [] ? menu_select+0x58/0xd2
[ 1723.258179] [] ? cpuidle_idle_call+0xa4/0xf3
[ 1723.272828] [] ? cpu_idle+0xb8/0x101
[ 1723.277085] [] ? start_secondary+0x1bc/0x1d7
[ 1723.293708] Code: b0 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 8b 04 cd 30 ee 27 82 49 89 cc 49 89 d5 48 8b 04 10 48 8d b8 90 d7 ff ff 8b 87 70 28 00 00 48 8d 98 90 d7 ff ff eb 16 e8 e9 fe ff ff
[ 1723.335524] RIP [] hardware_disable+0x4c/0xb4
[ 1723.342076] RSP
[ 1723.352021] CR2: 0000000000000000
[ 1723.354348] ---[ end trace e2aec53dae150aa1 ]---

It turns out that we need to clear cpus_hardware_enabled in that case.

Reported-and-tested-by: Yinghai Lu
Signed-off-by: Yinghai Lu
Signed-off-by: Rusty Russell

Avi Kivity
2009-06-09 21:00:28 +0800

08 Jun, 2009

1 commit

a4c0364be KVM: Explicity initialize cpus_hardware_enabled

Under CONFIG_MAXSMP, cpus_hardware_enabled is allocated from the heap and
not statically initialized. This causes a crash on reboot when kvm thinks
vmx is enabled on random nonexistent cpus and accesses nonexistent percpu
lists.

Fix by explicitly clearing the variable.

Cc: stable@kernel.org
Reported-and-tested-by: Yinghai Lu
Signed-off-by: Avi Kivity

Avi Kivity
2009-06-08 15:50:46 +0800

22 Apr, 2009

2 commits

4cd481f68 KVM: Fix overlapping check for memory slots

When checking for overlapping slots on registration of a new one, kvm
currently also considers zero-length (i.e. deleted) slots and rejects
requests incorrectly. This ultimately prevents user space from joining slots.
Fix the check by skipping deleted slots and advertise this via
KVM_CAP_JOIN_MEMORY_REGIONS_WORKS.
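
An overlap check that skips deleted slots might look like the sketch below. The struct and function names are hypothetical, and the sketch leaves out details a real implementation would need, such as ignoring the slot that is being replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory slot: a range of guest frame numbers. */
struct mem_slot {
    uint64_t base_gfn;
    uint64_t npages;   /* 0 means the slot has been deleted */
};

/* True if candidate overlaps any live (non-deleted) slot in slots[]. */
static bool slot_conflicts(const struct mem_slot *slots, size_t count,
                           const struct mem_slot *candidate)
{
    assert(candidate != NULL);
    if (candidate->npages == 0)
        return false;                 /* an empty range overlaps nothing */

    for (size_t i = 0; i < count; i++) {
        const struct mem_slot *s = &slots[i];

        if (s->npages == 0)
            continue;                 /* skip deleted slots */

        /* Half-open ranges [base, base + npages) intersect. */
        if (candidate->base_gfn < s->base_gfn + s->npages &&
            s->base_gfn < candidate->base_gfn + candidate->npages)
            return true;
    }
    return false;
}
```

Without the `npages == 0` skip, a deleted slot whose base falls strictly inside the new range would satisfy the interval test and cause the request to be rejected.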

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka
Signed-off-by: Avi Kivity

Jan Kiszka
2009-04-22 18:52:09 +0800
99894a799 KVM: MMU: Fix off-by-one calculating large page count

The large page initialization code concludes there are two large pages spanned
by a slot covering 1 (small) page starting at gfn 1. This is incorrect, and
also results in incorrect write_count initialization in some cases (base = 1,
npages = 513 for example).
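
One way to count the large pages a slot spans without the off-by-one is to compare the large page index of its first and last small page. This is a sketch assuming 512 small pages per large page; `large_pages_spanned` is a hypothetical name and not necessarily how the patch computes it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512 small pages per large page (4 KiB and 2 MiB pages). */
#define PAGES_PER_LARGE_PAGE 512

/* Number of large pages touched by the gfn range [base_gfn, base_gfn + npages). */
static uint64_t large_pages_spanned(uint64_t base_gfn, uint64_t npages)
{
    assert(npages > 0);

    uint64_t first = base_gfn / PAGES_PER_LARGE_PAGE;
    uint64_t last = (base_gfn + npages - 1) / PAGES_PER_LARGE_PAGE;

    return last - first + 1;
}
```

With this formula a single page at gfn 1 spans one large page, and the base = 1, npages = 513 case spans two (gfns 1 through 513 fall in large pages 0 and 1).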

Cc: stable@kernel.org
Signed-off-by: Avi Kivity

Avi Kivity
2009-04-22 18:52:09 +0800

24 Mar, 2009

8 commits

36463146f KVM: Get support IRQ routing entry counts

In capability probing ioctl.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-03-24 17:03:14 +0800
cded19f39 KVM: fix sparse warnings: Should it be static?

Impact: Make symbols static.

Fix these sparse warnings:
arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?

Signed-off-by: Hannes Eder
Signed-off-by: Avi Kivity

Hannes Eder
2009-03-24 17:03:14 +0800
6b08035f3 KVM: ia64: Fix the build errors due to lack of macros related to MSI.

Include the newly introduced msidef.h to solve the build issues.

Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity

Xiantao Zhang
2009-03-24 17:03:13 +0800
4a906e49f KVM: fix kvm_vm_ioctl_deassign_device

Only assigned_dev_id needs to be set for deassignment; use
match->flags to judge and deassign it.

Acked-by: Mark McLoughlin
Signed-off-by: Weidong Han
Signed-off-by: Avi Kivity

Weidong Han
2009-03-24 17:03:12 +0800
71450f788 KVM: Report IRQ injection status for MSI delivered interrupts

Return the number of CPUs the interrupt was successfully injected into, or -1 if
none.

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-03-24 17:03:11 +0800
4925663a0 KVM: Report IRQ injection status to userspace.

IRQ injection status is either -1 (if no CPU was found
that should accept the interrupt, because the IRQ was masked or
the ioapic was misconfigured or ...) or >= 0, in which case the
number indicates how many CPUs the interrupt was injected into.
If the value is 0, it means that the interrupt was coalesced
and probably should be reinjected.
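
A caller could interpret the status value along these lines. The enum and function are hypothetical illustrations of the three cases described above, not part of the KVM interface.

```c
#include <assert.h>

/* Hypothetical labels for the three cases the commit message describes. */
enum irq_outcome {
    IRQ_NOT_DELIVERED,  /* status == -1: no CPU should accept the interrupt */
    IRQ_COALESCED,      /* status == 0: coalesced, probably worth reinjecting */
    IRQ_INJECTED        /* status > 0: injected into that many CPUs */
};

static enum irq_outcome classify_irq_status(int status)
{
    /* The message documents only -1 and values >= 0. */
    assert(status >= -1);

    if (status < 0)
        return IRQ_NOT_DELIVERED;
    if (status == 0)
        return IRQ_COALESCED;
    return IRQ_INJECTED;
}
```

A device model that cares about lost ticks would typically act only on the coalesced case, since that is the one the message says should probably be reinjected.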

Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity

Gleb Natapov
2009-03-24 17:03:11 +0800
fc5659c8c KVM: MMU: handle compound pages in kvm_is_mmio_pfn

The function kvm_is_mmio_pfn is called before put_page is called on a
page by KVM. This is a problem when this function is called on a
struct page which is part of a compound page. It does not test the
reserved flag of the compound page but of the struct page within the
compound page. This is a problem when KVM works with hugepages allocated
at boot time. These pages have the reserved bit set in all tail pages.
Only the flag in the compound head is cleared. KVM would not put such a
page, which results in a memory leak.

Signed-off-by: Joerg Roedel
Acked-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Joerg Roedel
2009-03-24 17:03:09 +0800
79950e107 KVM: Use irq routing API for MSI

Merge the MSI userspace interface with the IRQ routing table. Note that the API has
changed, and the IRQ routing table will be the only interface kvm-userspace
supports.

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2009-03-24 17:03:09 +0800