23 Sep, 2010
1 commit
-
When we reboot, we disable vmx extensions, or otherwise INIT gets blocked.
If a task on another cpu hits a vmx instruction, it will fault if vmx is
disabled. We trap that to avoid a nasty oops and spin until the reboot
completes.

Problem is, we sleep with interrupts disabled. This blocks smp_send_stop()
from running, and the reboot process halts.

Fix by enabling interrupts before spinning.
KVM-Stable-Tag.
Signed-off-by: Avi Kivity
Signed-off-by: Marcelo Tosatti
10 Sep, 2010
1 commit
-
The CPU_STARTING callback was added upstream with the intention
of being used for KVM, specifically for the hardware enablement
that must be done before we can run in hardware virt. It had
bugs on the x86_64 architecture at the time, where it was called
after CPU_ONLINE. The arches have since merged and the bug is
gone.

It might be noted that other features should probably start making
use of this callback; microcode updates in particular, which
might be fixing important errata, would be best applied before
beginning to run user tasks.
Signed-off-by: Zachary Amsden
Signed-off-by: Marcelo Tosatti
02 Aug, 2010
2 commits
-
Currently, if a guest accesses an address that belongs to a memory slot but
is not backed by a page, or the page is read-only, KVM treats it as an MMIO
access. Remove that capability. It was never part of the interface and should
not be relied upon.
Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
-
They are not used outside of the file.
Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
01 Aug, 2010
11 commits
-
This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows converting gfn_t to u64 without breaking 32-bit
builds.
Signed-off-by: Joerg Roedel
Signed-off-by: Marcelo Tosatti
-
is_hwpoison_address accesses the page table, so the caller must hold
current->mm->mmap_sem in read mode. So fix its usage in hva_to_pfn of
kvm accordingly.

Comment is_hwpoison_address to remind other users.
Reported-by: Avi Kivity
Signed-off-by: Huang Ying
Signed-off-by: Avi Kivity
-
May be used for distinguishing between internal and user slots, or for sorting
slots in size order.
Signed-off-by: Avi Kivity
-
Makes it a little more readable and hackable.
Signed-off-by: Avi Kivity
-
As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.
Signed-off-by: Avi Kivity
-
When the user passes in a NULL mask, pass it on from the ioctl
handler.

Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen
Signed-off-by: Avi Kivity
-
The type of '*new.rmap' is not 'struct page *'; fix it.
Signed-off-by: Lai Jiangshan
Signed-off-by: Marcelo Tosatti
-
Signed-off-by: Avi Kivity
-
Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.
Signed-off-by: Avi Kivity
-
All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.

This patch only updates generic ioctls and leaves arch specific ioctls alone.
Signed-off-by: Avi Kivity
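
The locking change described in the two commits above can be pictured with a small userspace analogy. This is only a sketch under stated assumptions: `vcpu_sketch`, `vcpu_ioctl_sketch`, the `SKETCH_*` command numbers and the use of a pthread mutex are all hypothetical stand-ins, not the kernel's actual structures or locking primitives. It shows the lock being taken once in the dispatcher so that individual handlers no longer lock on their own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a vcpu; not the kernel's struct kvm_vcpu. */
struct vcpu_sketch {
    pthread_mutex_t mutex;
    long regs;
};

/* Hypothetical command numbers, for illustration only. */
enum { SKETCH_GET_REGS = 1, SKETCH_SET_REGS = 2 };

static void vcpu_sketch_init(struct vcpu_sketch *v)
{
    pthread_mutex_init(&v->mutex, NULL);
    v->regs = 0;
}

/* Handlers assume the caller already holds v->mutex. */
static long do_get_regs(struct vcpu_sketch *v) { return v->regs; }
static long do_set_regs(struct vcpu_sketch *v, long val) { v->regs = val; return 0; }

/* The dispatcher takes the lock once, so no handler locks by itself. */
static long vcpu_ioctl_sketch(struct vcpu_sketch *v, int cmd, long arg)
{
    long r;

    assert(v != NULL);
    pthread_mutex_lock(&v->mutex);
    switch (cmd) {
    case SKETCH_GET_REGS:
        r = do_get_regs(v);
        break;
    case SKETCH_SET_REGS:
        r = do_set_regs(v, arg);
        break;
    default:
        r = -1; /* unknown command */
        break;
    }
    pthread_mutex_unlock(&v->mutex);
    return r;
}
```

A caller would initialise the structure with vcpu_sketch_init() and then route every command through vcpu_ioctl_sketch(); a newly added handler gets the locking for free.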
-
In common cases, a guest SRAO MCE causes the corresponding poisoned page
to be unmapped and SIGBUS to be sent to QEMU-KVM; QEMU-KVM then relays
the MCE to the guest OS.

But it is reported that if the poisoned page is accessed in the guest
after unmapping and before the MCE is relayed to the guest OS, userspace
will be killed.

The reason is as follows. Because the poisoned page has been unmapped,
a guest access causes a guest exit and kvm_mmu_page_fault is
called. kvm_mmu_page_fault cannot get the poisoned page for the fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, the poisoned page is accessed again, and userspace
is then killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
and does not try kernel and user space MMIO processing for the poisoned
page.

[xiao: fix warning introduced by avi]
Reported-by: Max Asbock
Signed-off-by: Huang Ying
Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
19 May, 2010
1 commit
-
vmx and svm vcpus have different contents and therefore may have different
alignment requirements. Let each specify its required alignment.
Signed-off-by: Avi Kivity
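
As a rough userspace illustration of letting each backend state its own alignment, the sketch below passes a size and an alignment together and allocates accordingly. The names `backend_sketch` and `alloc_vcpu_sketch` are hypothetical, and C11 `aligned_alloc()` merely stands in for whatever allocator the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-backend description: how big a vcpu is and how it
 * must be aligned. */
struct backend_sketch {
    size_t vcpu_size;
    size_t vcpu_align; /* must be a power of two */
};

/* Allocate one vcpu-sized object honouring the backend's alignment.
 * The size is rounded up because aligned_alloc() expects a multiple of
 * the alignment. Returns NULL on failure; free with free(). */
static void *alloc_vcpu_sketch(const struct backend_sketch *b)
{
    size_t align = b->vcpu_align;
    size_t size;

    assert(align != 0 && (align & (align - 1)) == 0);
    size = (b->vcpu_size + align - 1) & ~(align - 1);
    return aligned_alloc(align, size);
}
```

Two backends with different requirements would simply pass different `vcpu_align` values; the allocation path stays shared.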
17 May, 2010
7 commits
-
As Avi pointed out, the test-bit part of mark_page_dirty() was important
in the days of shadow paging, but EPT and NPT have now become
common and the chance of faulting a page more than once per iteration is
small. So let's remove the bit test to avoid the extra access.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
-
On CPU_UP_CANCELED, hardware_enable() has not been called on the CPU
which is going up, because raw_notifier_call_chain(CPU_ONLINE)
has not been called for this cpu.

Drop the handling for CPU_UP_CANCELED.
Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
-
The RCU/SRCU API has already changed to support proving RCU usage.
I got the following dmesg when PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or the family API.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
#0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
[] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
[] __kvm_set_memory_region+0x636/0x6e2 [kvm]
[] kvm_set_memory_region+0x37/0x50 [kvm]
[] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
[] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
[] ? unlock_page+0x27/0x2c
[] ? __do_fault+0x3a9/0x3e1
[] kvm_vm_ioctl+0x364/0x38d [kvm]
[] ? up_read+0x23/0x3d
[] vfs_ioctl+0x32/0xa6
[] do_vfs_ioctl+0x495/0x4db
[] ? fget_light+0xc2/0x241
[] ? do_sys_open+0x104/0x116
[] ? retint_swapgs+0xe/0x13
[] sys_ioctl+0x47/0x6a
[] system_call_fastpath+0x16/0x1b

Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
-
This patch limits the number of pages per memory slot so that
we no longer need to take extra care about type issues.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
-
kvm_coalesced_mmio_init() keeps holding the addresses of a coalesced
mmio ring page and dev even after it has freed them.

Also, if this function fails, though that might be rare, it suggests
the system is in a serious state, so we'd better stop the work
following kvm_create_vm().

This patch fixes these problems.
We move the coalesced mmio's initialization out of kvm_create_vm().
This seems natural because it includes a registration which
can be done only when the vm is successfully created.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
-
This patch changes the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced mmio dev exists.
Signed-off-by: Wei Yongjun
Signed-off-by: Marcelo Tosatti
-
This patch does:
- no need to call tracepoint_synchronize_unregister() when the kvm module
  is unloaded, since ftrace can handle it
- clean up ftrace's macro
Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
25 Apr, 2010
1 commit
-
Marcelo introduced gfn_to_hva_memslot() when he implemented
gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too.

Note: also remove parentheses next to return as checkpatch said to do.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
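
The refactoring above, where the slot lookup delegates to a per-slot helper, can be sketched in plain C. Everything here is an assumption made for illustration: `memslot_sketch`, `slot_gfn_to_hva` and `lookup_gfn_to_hva` are hypothetical names, the page size is assumed to be 4 KiB, and returning 0 on a miss is a simplification rather than the kernel's own error convention.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical, simplified memory slot. */
struct memslot_sketch {
    uint64_t base_gfn;       /* first guest frame number in the slot */
    uint64_t npages;         /* number of pages in the slot */
    uint64_t userspace_addr; /* host virtual address of the first page */
};

/* Per-slot translation: the caller guarantees gfn lies inside the slot. */
static uint64_t slot_gfn_to_hva(const struct memslot_sketch *slot, uint64_t gfn)
{
    assert(gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages);
    return slot->userspace_addr + ((gfn - slot->base_gfn) << SKETCH_PAGE_SHIFT);
}

/* Generic translation: find the slot, then reuse the per-slot helper.
 * Returns 0 when no slot contains gfn (a simplification). */
static uint64_t lookup_gfn_to_hva(const struct memslot_sketch *slots,
                                  int nslots, uint64_t gfn)
{
    for (int i = 0; i < nslots; i++) {
        const struct memslot_sketch *s = &slots[i];

        if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return slot_gfn_to_hva(s, gfn);
    }
    return 0;
}
```

Keeping the address arithmetic in one helper means both lookup paths cannot drift apart.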
21 Apr, 2010
1 commit
-
I got this dmesg because srcu_read_lock() is missing in
kvm_mmu_notifier_release().

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/3100:
#0: (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_release+0x38/0xdf
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_mmu_zap_all+0x21/0x5e [kvm]

stack backtrace:
Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] unalias_gfn+0x56/0xab [kvm]
[] gfn_to_memslot+0x16/0x25 [kvm]
[] gfn_to_rmap+0x17/0x6e [kvm]
[] rmap_remove+0xa0/0x19d [kvm]
[] kvm_mmu_zap_page+0x109/0x34d [kvm]
[] kvm_mmu_zap_all+0x35/0x5e [kvm]
[] kvm_arch_flush_shadow+0x16/0x22 [kvm]
[] kvm_mmu_notifier_release+0x15/0x17 [kvm]
[] __mmu_notifier_release+0x88/0xdf
[] ? __mmu_notifier_release+0x38/0xdf
[] ? exit_mm+0xe0/0x115
[] exit_mmap+0x2c/0x17e
[] mmput+0x2d/0xd4
[] exit_mm+0x108/0x115
[...]

Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
20 Apr, 2010
1 commit
-
Int is not long enough to store the size of a dirty bitmap.
This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
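
A minimal sketch of the size calculation described above, written as ordinary userspace C. The helper names `dirty_bitmap_bytes` and `mark_dirty_sketch` are hypothetical and are not claimed to match the wrapper the patch introduces; the point is only that the size and the bit offset are carried in unsigned long rather than int.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a dirty bitmap covering npages pages, rounded up to
 * a whole number of longs. Written to avoid overflow in the rounding. */
static unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    unsigned long nlongs = npages / SKETCH_BITS_PER_LONG +
                           (npages % SKETCH_BITS_PER_LONG != 0);

    return nlongs * sizeof(unsigned long);
}

/* Set the bit for page rel_gfn (relative to the slot start) using a
 * long-sized offset instead of an int. */
static void mark_dirty_sketch(unsigned long *bitmap, unsigned long npages,
                              unsigned long rel_gfn)
{
    assert(rel_gfn < npages);
    bitmap[rel_gfn / SKETCH_BITS_PER_LONG] |=
        1UL << (rel_gfn % SKETCH_BITS_PER_LONG);
}
```

A caller would size the bitmap buffer with dirty_bitmap_bytes() and pass the same page count to mark_dirty_sketch() as its bound.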
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there, i.e. if only gfp is used,
  gfp.h; if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.

2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
01 Mar, 2010
13 commits
-
The code relies on kvm->requests_lock inhibiting preemption.
Noted by Jan Kiszka.
Signed-off-by: Avi Kivity
-
This patch introduces a generic function to find out the
host page size for a given gfn. This function is needed by
the kvm iommu code. This patch also simplifies the x86
host_mapping_level function.
Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity
-
The commit 0953ca73 "KVM: Simplify coalesced mmio initialization"
allocates kvm_coalesced_mmio_ring in kvm_coalesced_mmio_init(), but
didn't discard the original allocation...
Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti
-
cleanup_srcu_struct on VM destruction remains broken:
BUG: unable to handle kernel paging request at ffffffffffffffff
IP: [] srcu_read_lock+0x16/0x21
RIP: 0010:[] [] srcu_read_lock+0x16/0x21
Call Trace:
[] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
[] kvm_vcpu_uninit+0x9/0x15 [kvm]
[] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
[] kvm_arch_destroy_vm+0x78/0x111 [kvm]
[] kvm_put_kvm+0xd4/0xfe [kvm]

Move it to kvm_arch_destroy_vm.
Signed-off-by: Marcelo Tosatti
Reported-by: Jan Kiszka
-
Signed-off-by: Marcelo Tosatti
-
Signed-off-by: Marcelo Tosatti
-
Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti
-
Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti
-
So it's possible to iommu map a memslot before making it visible to
kvm.
Signed-off-by: Marcelo Tosatti
-
Which takes a memslot pointer instead of using kvm->memslots.
To be used by SRCU conversion later.
Signed-off-by: Marcelo Tosatti
-
Required for SRCU conversion later.
Signed-off-by: Marcelo Tosatti
-
Have a pointer to an allocated region inside struct kvm.
[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf
Signed-off-by: Marcelo Tosatti
-
- add destructor function
- move related allocation into constructor
- add stubs for !CONFIG_KVM_MMIO
Signed-off-by: Avi Kivity