01 Aug, 2010

23 commits

  • This patch converts unnecessary divide and modulo operations
    in the KVM large page related code into logical operations.
    This allows gfn_t to be converted to u64 without breaking
    32-bit builds.
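
    As a rough sketch of the kind of change involved (the constant and
    helpers below are hypothetical, not the actual KVM code): with a
    power-of-two number of base pages per huge page, a 64-bit
    divide/modulo pair becomes a shift and mask, which needs no libgcc
    helper (__udivdi3 and friends) on 32-bit hosts:

        #include <linux/types.h>

        #define HPAGE_GFN_SHIFT 9  /* e.g. 2MB huge page = 512 4KB pages */

        static inline u64 hpage_index(u64 gfn)
        {
                return gfn >> HPAGE_GFN_SHIFT;                /* was: gfn / 512 */
        }

        static inline u64 hpage_offset(u64 gfn)
        {
                return gfn & ((1ULL << HPAGE_GFN_SHIFT) - 1); /* was: gfn % 512 */
        }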

    Signed-off-by: Joerg Roedel
    Signed-off-by: Marcelo Tosatti

    Joerg Roedel
     
  • This patch fixes the following warning.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    include/linux/kvm_host.h:259 invoked rcu_dereference_check() without
    protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    no locks held by qemu-system-x86/29679.

    stack backtrace:
    Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200
    Call Trace:
    [] lockdep_rcu_dereference+0xa8/0xb1
    [] kvm_iommu_unmap_memslots+0xc9/0xde [kvm]
    [] kvm_iommu_unmap_guest+0x40/0x4e [kvm]
    [] kvm_arch_destroy_vm+0x1a/0x186 [kvm]
    [] kvm_put_kvm+0x110/0x167 [kvm]
    [] kvm_vcpu_release+0x18/0x1c [kvm]
    [] fput+0x22a/0x3a0
    [] filp_close+0xb4/0xcd
    [] put_files_struct+0x1b7/0x36b
    [] ? put_files_struct+0x48/0x36b
    [] ? do_raw_spin_unlock+0x118/0x160
    [] exit_files+0x6d/0x75
    [] do_exit+0x47d/0xc60
    [] ? _raw_spin_unlock_irq+0x30/0x36
    [] do_group_exit+0xcf/0x134
    [] get_signal_to_deliver+0x732/0x81d
    [] ? cpu_clock+0x4e/0x60
    [] do_notify_resume+0x117/0xc43
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? sys_rt_sigtimedwait+0x2b5/0x3bf
    [] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [] ? sysret_signal+0x5/0x3d
    [] int_signal+0x12/0x17
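
    The warning fires because the memslots pointer is RCU-protected and
    was dereferenced without the SRCU read lock held. A minimal sketch
    of the shape of such a fix (assuming kvm->srcu guards memslots, as
    in KVM of this era; the function body is illustrative, not the
    exact patch):

        static int kvm_iommu_unmap_memslots(struct kvm *kvm)
        {
                int idx;

                /* Makes the rcu_dereference() in kvm_memslots() legitimate. */
                idx = srcu_read_lock(&kvm->srcu);
                /* ... walk kvm_memslots(kvm) and unmap each slot ... */
                srcu_read_unlock(&kvm->srcu, idx);

                return 0;
        }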

    Signed-off-by: Sheng Yang
    Signed-off-by: Marcelo Tosatti

    Sheng Yang
     
  • We just introduced generic functions to handle shadow pages on PPC.
    This patch makes the respective backends make use of them, getting
    rid of a lot of duplicate code along the way.

    Signed-off-by: Alexander Graf
    Signed-off-by: Marcelo Tosatti

    Alexander Graf
     
  • Currently the shadow paging code keeps an array of entries it knows
    about. Whenever the guest invalidates an entry, we loop through the
    whole array, trying to invalidate matching parts.

    While this is a really simple implementation, it is probably the
    most inefficient one possible. So instead, let's keep an array of
    lists around that are indexed by a hash. This way each PTE can be
    added with 4 list_add() calls and removed with 4 list_del()
    invocations, and the search only needs to loop through entries that
    share the same hash.

    This patch implements said lookup and exports generic functions
    that both the 32-bit and 64-bit backends can use.
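
    A minimal sketch of the idea (the types, sizes, and hash below are
    hypothetical stand-ins, not the actual PPC backend): each shadow
    PTE entry sits on several hash-bucket lists, one per lookup key, so
    invalidation only walks the bucket that shares the hash:

        #include <linux/list.h>

        #define HPTE_HASH_BITS 4
        #define HPTE_HASH_NUM  (1 << HPTE_HASH_BITS)

        struct hpte_cache {
                struct list_head list_pte;   /* hashed by effective address */
                /* ... three more list_heads for the other lookup keys ... */
                u64 eaddr;
        };

        static struct list_head hpte_hash_pte[HPTE_HASH_NUM];

        static inline unsigned int hash_eaddr(u64 eaddr)
        {
                return (eaddr >> PAGE_SHIFT) & (HPTE_HASH_NUM - 1);
        }

        /* Adding an entry is one list_add() per hashed list. */
        static void hpte_cache_add(struct hpte_cache *pte)
        {
                list_add(&pte->list_pte, &hpte_hash_pte[hash_eaddr(pte->eaddr)]);
                /* ... list_add() the other three list_heads ... */
        }

        /* Invalidation walks only the matching bucket. */
        static void invalidate_eaddr(u64 eaddr)
        {
                struct hpte_cache *pte, *tmp;
                struct list_head *bucket = &hpte_hash_pte[hash_eaddr(eaddr)];

                list_for_each_entry_safe(pte, tmp, bucket, list_pte)
                        if (pte->eaddr == eaddr)
                                list_del(&pte->list_pte); /* plus the other list_del()s */
        }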

    Signed-off-by: Alexander Graf
    Signed-off-by: Marcelo Tosatti

    Alexander Graf
     
  • Clean up this function, since we already get the direct sp's access.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • If the mapping is writable but the dirty flag is not set, we find
    the read-only direct sp and set up the mapping through it. Then,
    when a write #PF occurs, we mark this mapping writable inside the
    read-only direct sp, and from then on other genuinely read-only
    mappings can happily write through it without a #PF.

    This may break the guest's COW.

    Fix this by re-installing the mapping when the write #PF occurs.
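
    A very loose sketch of the fix's shape (all helpers here are
    hypothetical, for illustration only): on a write #PF that hits a
    mapping installed through a read-only direct sp, drop the spte and
    rebuild the mapping through a writable direct sp, instead of
    flipping the write bit inside the shared read-only sp:

        if (write_fault && sp->role.direct && !sp_is_writable(sp)) {
                drop_spte(vcpu->kvm, sptep);    /* hypothetical helper */
                /* fall through: re-map via a direct sp with write access */
        }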

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • In non-direct mapping, we mark the sp as 'direct' when we map the
    guest's large page, but its access is encoded entirely from the
    upper page-structure entries and does not include the last mapping
    level, which causes an access conflict.

    For example, consider this mapping:

                  [W]
                / PDE1 -> |---|
          P [W]           |   |  LPA
                \ PDE2 -> |---|
                  [R]

    P has two children, PDE1 and PDE2, and both PDE1 and PDE2 map the
    same large page (LPA). P's access is WR, PDE1's access is WR, and
    PDE2's access is RO (considering only read/write permissions here).

    When the guest accesses PDE1, we create a direct sp for LPA; the
    sp's access comes from P, i.e. WR, so we mark the ptes in this sp
    as writable.

    Then, when the guest accesses PDE2, we find LPA's shadow page, the
    same one as PDE1's, and mark its ptes as RO.

    So, when the guest accesses PDE1 again, an incorrect #PF occurs.

    Fix this by encoding the last mapping level's access into the
    direct shadow page.
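
    A minimal sketch (the access names are hypothetical;
    kvm_mmu_get_page() is shown with roughly its signature from this
    era): fold the last level's permissions into the access used to
    look up the direct sp, so PDE1 (WR) and PDE2 (RO) resolve to
    different shadow pages for LPA:

        unsigned access = upper_access & last_level_access; /* e.g. WR & RO = RO */

        sp = kvm_mmu_get_page(vcpu, gfn, addr, level, 1 /* direct */,
                              access, sptep);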

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • When we sync many unsync sps at one time (in mmu_sync_children()),
    we may map an spte writable. This is dangerous if one unsync sp's
    mapped gfn is another unsync page's gfn.

    For example:

    SP1.pte[0] = P
    SP2.gfn's pfn = P
    [SP1.pte[0] = SP2.gfn's pfn]

    First, we write-protect SP1 and SP2, but SP1 and SP2 are still
    unsync sps.

    Then we sync SP1 first. It detects that SP1.pte[0].gfn has only one
    unsync sp, namely SP2, so it maps it writable. But we plan to sync
    SP2 soon, and at this point SP2->unsync is no longer reliable: we
    later sync SP2, but SP2->gfn is already writable.

    So the final result is: SP2 is a synced page, but SP2.gfn is
    writable.

    This bug can corrupt the guest's page tables. Fix it by marking the
    mapping read-only if the mapped gfn has shadow pages.
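
    A rough sketch of the rule being added (the helper is
    hypothetical): when installing an spte during sync, grant write
    access only if the target gfn is not itself backed by shadow pages,
    whether sync or unsync; a later write #PF can always upgrade the
    mapping:

        if (writable && !gfn_has_shadow_pages(vcpu->kvm, gfn)) /* hypothetical */
                spte |= PT_WRITABLE_MASK;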

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • Some guest device drivers may leverage "Non-Snoop" I/O and
    explicitly execute WBINVD or CLFLUSH on a RAM region. Since
    migration may occur before the WBINVD or CLFLUSH, we need to
    maintain data consistency either by:
    1: flushing the cache (wbinvd) when the guest is scheduled out, if
    there is no wbinvd exit, or
    2: executing wbinvd on all dirty physical CPUs when the guest's
    wbinvd exits (see the sketch below).
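
    A minimal sketch of strategy 2 (vcpu->arch.wbinvd_dirty_mask is
    assumed here to track the physical CPUs the vcpu has run on since
    the last flush; treat the details as illustrative):

        static void wbinvd_ipi(void *unused)
        {
                wbinvd();
        }

        static int handle_wbinvd(struct kvm_vcpu *vcpu)
        {
                /* Flush caches on every dirty physical CPU, then reset. */
                smp_call_function_many(vcpu->arch.wbinvd_dirty_mask,
                                       wbinvd_ipi, NULL, 1);
                cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
                return 1;
        }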

    Signed-off-by: Yaozu (Eddie) Dong
    Signed-off-by: Sheng Yang
    Signed-off-by: Marcelo Tosatti

    Sheng Yang
     
  • Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     
  • No need to reload the mmu in between two different vcpu->requests checks.

    kvm_mmu_reload() may trigger KVM_REQ_TRIPLE_FAULT, but that will be caught
    during atomic guest entry later.

    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     
  • Older versions of 32-bit linux have a "Checking 'hlt' instruction"
    test where they repeatedly call the 'hlt' instruction, and then
    expect a timer interrupt to kick the CPU out of halt. This happens
    before any LAPIC or IOAPIC setup, which means that all of the
    APICs are in virtual wire mode at this point. Unfortunately,
    the current implementation of virtual wire mode is hardcoded to
    only kick the BSP, so if a crash+kexec occurs on a different
    vcpu, it will never get kicked.

    This patch makes pic_unlock() do the equivalent of
    kvm_irq_delivery_to_apic() for the IOAPIC code. That is, it runs
    through all of the vcpus looking for one that is in virtual wire
    mode. In the normal case where LAPICs and IOAPICs are configured,
    this won't be used at all. In the bootstrap phase of a modern
    OS, before the LAPICs and IOAPICs are configured, this will have
    exactly the same behavior as today; VCPU0 is always looked at
    first, so it will always get out of the loop after the first
    iteration. This will only go through the loop more than once
    during a kexec/kdump, in which case it will only do it a few times
    until the kexec'ed kernel programs the LAPIC and IOAPIC.
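
    A sketch of that loop (kvm_for_each_vcpu(), kvm_apic_accept_pic_intr()
    and kvm_vcpu_kick() are KVM helpers of this era; the surrounding
    wakeup bookkeeping is elided):

        struct kvm_vcpu *vcpu;
        int i;

        kvm_for_each_vcpu(i, vcpu, s->kvm) {
                if (kvm_apic_accept_pic_intr(vcpu)) {
                        kvm_vcpu_kick(vcpu);
                        break;  /* normally VCPU0, found on the first pass */
                }
        }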

    Signed-off-by: Chris Lalancette
    Signed-off-by: Avi Kivity

    Chris Lalancette
     
  • kvm_ia64_sync_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log()
    which copies ia64's arch specific dirty bitmap to general one in memslot.
    So doing sanity checks in this function is unnatural. We move these
    checks outside of it and change the prototype appropriately.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • kvm_get_dirty_log() calls copy_to_user(). So we need to narrow the
    dirty_log_lock spin_lock section so that it does not include this
    call.
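
    The usual shape of such a fix (a sketch; the buffer names are
    hypothetical): snapshot and clear the bitmap under the lock, then
    call copy_to_user(), which may fault and sleep, only after
    unlocking:

        spin_lock(&kvm->arch.dirty_log_lock);
        memcpy(snapshot, dirty_bitmap, n);   /* take a snapshot under the lock */
        memset(dirty_bitmap, 0, n);
        spin_unlock(&kvm->arch.dirty_log_lock);

        if (copy_to_user(log->dirty_bitmap, snapshot, n))
                return -EFAULT;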

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
     
  • When a guest sets its SR entry to invalid, we may still find a
    corresponding entry in a BAT. So we need to make sure we're not
    faulting on invalid SR entries, but instead just claim them to be
    BAT resolved.

    This resolves breakage experienced when using libogc-based guests.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • The linux kernel already provides a hash function. Let's reuse that
    instead of reinventing the wheel!
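
    For instance, <linux/hash.h> provides hash_64(), which multiplies
    by a golden-ratio prime and keeps the top bits; a hand-rolled hash
    can typically be replaced one-for-one (the shift and width below
    are illustrative):

        #include <linux/hash.h>

        #define HPTE_HASH_BITS 4   /* illustrative bucket-count width */

        static inline unsigned int hash_eaddr(u64 eaddr)
        {
                return hash_64(eaddr >> PAGE_SHIFT, HPTE_HASH_BITS);
        }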

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • Initially we had to search for pte entries to invalidate them. Since
    the logic has improved since then, we can just get rid of the search
    function.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • is_hwpoison_address() accesses the page table, so the caller must
    hold current->mm->mmap_sem in read mode. Fix its usage in kvm's
    hva_to_pfn() accordingly.

    Also add a comment to is_hwpoison_address() to remind other users.
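
    A minimal sketch of the caller-side rule (illustrative; the exact
    surrounding hva_to_pfn() logic is elided, and hwpoison_pfn is a
    hypothetical stand-in for the special poison pfn):

        int poisoned;

        down_read(&current->mm->mmap_sem);
        poisoned = is_hwpoison_address(addr);   /* walks the page table */
        up_read(&current->mm->mmap_sem);

        if (poisoned)
                return hwpoison_pfn;            /* hypothetical special pfn */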

    Reported-by: Avi Kivity
    Signed-off-by: Huang Ying
    Signed-off-by: Avi Kivity

    Huang Ying
     
  • Enable Intel(R) Advanced Vector Extensions (AVX) for the guest.

    The detection of the AVX feature includes testing the OSXSAVE bit.
    When the OSXSAVE bit is not set, an AVX instruction results in #UD
    even if AVX is supported, so we are safe to expose the AVX bits to
    the guest directly.
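
    For reference, a small sketch of the reasoning (bit positions per
    the Intel SDM; the helper is hypothetical): AVX is CPUID.01H:ECX[28]
    and OSXSAVE is CPUID.01H:ECX[27]. If the guest has not enabled
    CR4.OSXSAVE, the CPU reports OSXSAVE as 0 and VEX-encoded AVX
    instructions raise #UD regardless of the AVX bit:

        #define FEAT_OSXSAVE (1u << 27)   /* CPUID.01H:ECX */
        #define FEAT_AVX     (1u << 28)   /* CPUID.01H:ECX */

        static int guest_can_use_avx(u32 cpuid_1_ecx)
        {
                return (cpuid_1_ecx & FEAT_AVX) &&
                       (cpuid_1_ecx & FEAT_OSXSAVE);
        }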

    Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Sheng Yang
     
  • If a process with a memory slot is COWed, the page will change its address
    (despite having an elevated reference count). This breaks internal memory
    slots which have their physical addresses loaded into vmcs registers (see
    the APIC access memory slot).

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • May be used for distinguishing between internal and user slots, or for sorting
    slots in size order.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
    each request generates a large number of unneeded atomics if a bit is set.

    Replace with a separate test/clear sequence. This is safe since there is
    no clear_bit() outside the vcpu thread.
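
    The resulting pattern looks roughly like this (a sketch of the
    idea, not the exact patch): a cheap non-atomic test_bit() first,
    and the atomic clear_bit() only when the bit is actually set:

        static inline bool check_request(int req, struct kvm_vcpu *vcpu)
        {
                if (test_bit(req, &vcpu->requests)) {
                        clear_bit(req, &vcpu->requests);
                        return true;
                }
                return false;
        }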

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Makes it a little more readable and hackable.

    Signed-off-by: Avi Kivity

    Avi Kivity