02 Aug, 2010
4 commits
-
Devices register mask notifiers using a gsi, but the irqchip knows about
irqchip/pin pairs, so the conversion from irqchip/pin to gsi should be done
before looking up the mask notifier to call.
Signed-off-by: Gleb Natapov
Signed-off-by: Marcelo Tosatti
-
Currently, if a guest accesses an address that belongs to a memory slot but is
not backed by a page, or the page is read-only, KVM treats it like an MMIO
access. Remove that capability; it was never part of the interface and should
not be relied upon.
Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
-
They are not used outside of the file.
Signed-off-by: Gleb Natapov
Signed-off-by: Avi Kivity
-
For 32-bit machines where the physical address width is
larger than the virtual address width, the frame number types
in KVM may overflow. Fix this by changing them to u64.
[sfr: fix build on 32-bit ppc]
Signed-off-by: Joerg Roedel
Signed-off-by: Stephen Rothwell
Signed-off-by: Marcelo Tosatti
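A minimal, standalone sketch of the failure mode described above (illustrative
numbers, not the KVM patch itself): on a 32-bit host with wider physical
addresses, converting a frame number back to an address in 32-bit arithmetic
silently truncates once the address exceeds 4 GiB.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
        uint64_t gpa = 0x180000000ULL;                  /* a 6 GiB guest physical address */
        uint32_t gfn_narrow = (uint32_t)(gpa >> PAGE_SHIFT);
        uint64_t gfn_wide = gpa >> PAGE_SHIFT;

        /* 32-bit multiply wraps: prints 0x80000000 instead of 0x180000000. */
        printf("narrow: %#llx\n", (unsigned long long)(gfn_narrow * 4096u));
        /* A 64-bit frame number round-trips correctly. */
        printf("wide:   %#llx\n", (unsigned long long)(gfn_wide << PAGE_SHIFT));
        return 0;
}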
01 Aug, 2010
15 commits
-
This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows gfn_t to be converted to u64 without breaking 32-bit builds.
Signed-off-by: Joerg Roedel
Signed-off-by: Marcelo Tosatti
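A hedged, generic illustration of the conversion described above (names and
the huge-page shift are assumptions, not the actual KVM hunk): for a
power-of-two page count, division and modulo can be replaced by shift and
mask, which keeps 32-bit builds free of 64-bit divide/modulo code once gfn_t
becomes u64.

#include <stdint.h>

typedef uint64_t gfn_t;

#define HPAGE_GFN_SHIFT 9                               /* 512 4 KiB pages per 2 MiB page */
#define PAGES_PER_HPAGE ((gfn_t)1 << HPAGE_GFN_SHIFT)

/* Before: 64-bit divide and modulo. */
static inline gfn_t hpage_index_div(gfn_t gfn)  { return gfn / PAGES_PER_HPAGE; }
static inline gfn_t hpage_offset_mod(gfn_t gfn) { return gfn % PAGES_PER_HPAGE; }

/* After: equivalent logical operations, valid because the divisor is a power of two. */
static inline gfn_t hpage_index_shr(gfn_t gfn)  { return gfn >> HPAGE_GFN_SHIFT; }
static inline gfn_t hpage_offset_and(gfn_t gfn) { return gfn & (PAGES_PER_HPAGE - 1); }
-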
This patch fixes the following warning.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/kvm_host.h:259 invoked rcu_dereference_check() without
protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
no locks held by qemu-system-x86/29679.
stack backtrace:
Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200
Call Trace:
[] lockdep_rcu_dereference+0xa8/0xb1
[] kvm_iommu_unmap_memslots+0xc9/0xde [kvm]
[] kvm_iommu_unmap_guest+0x40/0x4e [kvm]
[] kvm_arch_destroy_vm+0x1a/0x186 [kvm]
[] kvm_put_kvm+0x110/0x167 [kvm]
[] kvm_vcpu_release+0x18/0x1c [kvm]
[] fput+0x22a/0x3a0
[] filp_close+0xb4/0xcd
[] put_files_struct+0x1b7/0x36b
[] ? put_files_struct+0x48/0x36b
[] ? do_raw_spin_unlock+0x118/0x160
[] exit_files+0x6d/0x75
[] do_exit+0x47d/0xc60
[] ? _raw_spin_unlock_irq+0x30/0x36
[] do_group_exit+0xcf/0x134
[] get_signal_to_deliver+0x732/0x81d
[] ? cpu_clock+0x4e/0x60
[] do_notify_resume+0x117/0xc43
[] ? trace_hardirqs_on+0xd/0xf
[] ? sys_rt_sigtimedwait+0x2b5/0x3bf
[] ? trace_hardirqs_off_thunk+0x3a/0x3c
[] ? sysret_signal+0x5/0x3d
[] int_signal+0x12/0x17
Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti
-
is_hwpoison_address accesses the page table, so the caller must hold
current->mm->mmap_sem in read mode. Fix its usage in KVM's hva_to_pfn
accordingly, and add a comment to is_hwpoison_address to remind other users.
Reported-by: Avi Kivity
Signed-off-by: Huang Ying
Signed-off-by: Avi Kivity
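A hedged sketch of the locking rule stated above (the wrapper is hypothetical,
not the actual hva_to_pfn change): any caller of is_hwpoison_address() takes
mmap_sem for read around the call.

#include <linux/mm.h>
#include <linux/sched.h>

/* Hypothetical wrapper; is_hwpoison_address() walks page tables and
 * therefore requires mmap_sem held in read mode. */
static int addr_is_hwpoisoned(unsigned long addr)
{
        int poisoned;

        down_read(&current->mm->mmap_sem);
        poisoned = is_hwpoison_address(addr);
        up_read(&current->mm->mmap_sem);

        return poisoned;
}
-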
May be used for distinguishing between internal and user slots, or for sorting
slots in size order.
Signed-off-by: Avi Kivity
-
Makes it a little more readable and hackable.
Signed-off-by: Avi Kivity
-
As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.
Signed-off-by: Avi Kivity
-
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.
Signed-off-by: Chris Lalancette
Signed-off-by: Marcelo Tosatti
-
No real bugs in this one.
Signed-off-by: Andi Kleen
Signed-off-by: Avi Kivity
-
When the user passes in a NULL mask, pass this on from the ioctl handler.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen
Signed-off-by: Avi Kivity
-
The type of '*new.rmap' is not 'struct page *'; fix it.
Signed-off-by: Lai Jiangshan
Signed-off-by: Marcelo Tosatti
-
Signed-off-by: Avi Kivity
-
Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.
Signed-off-by: Avi Kivity
-
All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.
This patch only updates generic ioctls and leaves arch specific ioctls alone.
Signed-off-by: Avi Kivity
-
Remove this check in an effort to allow kvm guests to run without
root privileges. This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the I/O actually occurs through the PCI
sysfs interface.
Signed-off-by: Alex Williamson
Signed-off-by: Marcelo Tosatti
-
In common cases, a guest SRAO MCE will cause the corresponding poisoned page
to be unmapped and SIGBUS to be sent to QEMU-KVM, and QEMU-KVM will then relay
the MCE to the guest OS.
But it is reported that if the poisoned page is accessed in the guest
after unmapping and before the MCE is relayed to the guest OS, userspace will
be killed.
The reason is as follows. Because the poisoned page has been unmapped, the
guest access will cause a guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault cannot get the poisoned page for the fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, the poisoned page is accessed again, and userspace
is killed by force_sig_info.
To fix the bug, make kvm_mmu_page_fault send a HWPOISON signal to QEMU-KVM
and not try kernel and user space MMIO processing for the poisoned page.
[xiao: fix warning introduced by avi]
Reported-by: Max Asbock
Signed-off-by: Huang Ying
Signed-off-by: Xiao Guangrong
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity
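A hedged sketch of the signalling side of the fix (the helper name and exact
call site are assumptions, not the verbatim KVM hunk): report the poisoned
guest page to the faulting task as SIGBUS with a machine-check si_code,
rather than falling through to MMIO emulation.

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/signal.h>

/* Illustrative helper: tell userspace (QEMU-KVM) that the page backing a
 * guest address is hardware-poisoned. */
static void sketch_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
{
        siginfo_t info;

        info.si_signo = SIGBUS;
        info.si_errno = 0;
        info.si_code = BUS_MCEERR_AR;                   /* action required: poisoned page */
        info.si_addr = (void __user *)address;
        info.si_addr_lsb = PAGE_SHIFT;                  /* poison granularity */

        send_sig_info(SIGBUS, &info, tsk);
}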
11 Jun, 2010
1 commit
-
Read ioapic->irr inside the ioapic->lock protected section.
KVM-Stable-Tag
Signed-off-by: Marcelo Tosatti
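A hedged sketch of the pattern above (struct trimmed down to the relevant
fields, names illustrative): the irr snapshot has to be taken while
ioapic->lock is held, so the value cannot change under the code that uses it.

#include <linux/spinlock.h>
#include <linux/types.h>

struct ioapic_sketch {
        spinlock_t lock;
        u32 irr;
};

static void ioapic_deliver_pin(struct ioapic_sketch *ioapic, int pin)
{
        u32 old_irr;

        spin_lock(&ioapic->lock);
        old_irr = ioapic->irr;                  /* snapshot taken under the lock */
        ioapic->irr |= 1u << pin;
        if (!(old_irr & (1u << pin))) {
                /* ... deliver the newly raised interrupt ... */
        }
        spin_unlock(&ioapic->lock);
}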
09 Jun, 2010
1 commit
-
This is obviously a left-over from the old interface taking the size.
Apparently a mostly harmless issue with the current iommu_unmap
implementation.
Signed-off-by: Jan Kiszka
Acked-by: Joerg Roedel
Signed-off-by: Avi Kivity
22 May, 2010
1 commit
-
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
KVM: x86: Add missing locking to arch specific vcpu ioctls
KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
KVM: MMU: Segregate shadow pages with different cr0.wp
KVM: x86: Check LMA bit before set_efer
KVM: Don't allow lmsw to clear cr0.pe
KVM: Add cpuid.txt file
KVM: x86: Tell the guest we'll warn it about tsc stability
x86, paravirt: don't compute pvclock adjustments if we trust the tsc
x86: KVM guest: Try using new kvm clock msrs
KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
KVM: x86: add new KVMCLOCK cpuid feature
KVM: x86: change msr numbers for kvmclock
x86, paravirt: Add a global synchronization point for pvclock
x86, paravirt: Enable pvclock flags in vcpu_time_info structure
KVM: x86: Inject #GP with the right rip on efer writes
KVM: SVM: Don't allow nested guest to VMMCALL into host
KVM: x86: Fix exception reinjection forced to true
KVM: Fix wallclock version writing race
KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
...
19 May, 2010
1 commit
-
vmx and svm vcpus have different contents and therefore may have different
alignment requirements. Let each specify its required alignment.
Signed-off-by: Avi Kivity
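A hedged sketch of the idea (function and cache names are illustrative, not
the actual kvm_init change): let each vendor module pass its own vcpu size and
alignment down to the slab cache that backs vcpu allocations.

#include <linux/errno.h>
#include <linux/slab.h>

static struct kmem_cache *vcpu_cache;

/* Illustrative only: create the vcpu cache with a caller-supplied alignment
 * instead of assuming one value fits both vmx and svm. */
static int sketch_init_vcpu_cache(unsigned int vcpu_size, unsigned int vcpu_align)
{
        /* Fall back to a sane default if the caller does not care. */
        if (!vcpu_align)
                vcpu_align = __alignof__(long);

        vcpu_cache = kmem_cache_create("kvm_vcpu_sketch", vcpu_size, vcpu_align,
                                       0, NULL);
        return vcpu_cache ? 0 : -ENOMEM;
}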
18 May, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Add amd_iommu=off command line option
iommu-api: Remove iommu_{un}map_range functions
x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
kvm: Change kvm_iommu_map_pages to map large pages
VT-d: Change {un}map_range functions to implement {un}map interface
iommu-api: Add ->{un}map callbacks to iommu_ops
iommu-api: Add iommu_map and iommu_unmap functions
iommu-api: Rename ->{un}map function pointers to ->{un}map_range
17 May, 2010
8 commits
-
As Avi pointed out, the test-bit part of mark_page_dirty() was important
in the days of shadow paging, but EPT and NPT have since become
common and the chance of faulting a page more than once per iteration is
small. So let's remove the test bit to avoid the extra access.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
-
At CPU_UP_CANCELED, hardware_enable() has not been called on the CPU
which is going up, because raw_notifier_call_chain(CPU_ONLINE)
has not been called for this cpu.
Drop the handling for CPU_UP_CANCELED.
Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
-
The RCU/SRCU APIs have already changed for proving RCU usage.
I got the following dmesg when PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or family APIs.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
#0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]
stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
[] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
[] __kvm_set_memory_region+0x636/0x6e2 [kvm]
[] kvm_set_memory_region+0x37/0x50 [kvm]
[] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
[] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
[] ? unlock_page+0x27/0x2c
[] ? __do_fault+0x3a9/0x3e1
[] kvm_vm_ioctl+0x364/0x38d [kvm]
[] ? up_read+0x23/0x3d
[] vfs_ioctl+0x32/0xa6
[] do_vfs_ioctl+0x495/0x4db
[] ? fget_light+0xc2/0x241
[] ? do_sys_open+0x104/0x116
[] ? retint_swapgs+0xe/0x13
[] sys_ioctl+0x47/0x6a
[] system_call_fastpath+0x16/0x1b
Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
-
This patch limits the number of pages per memory slot so that we are
free from extra care about type issues.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
-
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced
MMIO ring page and dev even after it has freed them.
Also, if this function fails, although it might be rare, it suggests the
system is in a serious state, so we had better stop the work that follows
kvm_create_vm().
This patch clears up these problems: we move the coalesced MMIO
initialization out of kvm_create_vm(). This seems natural because it
includes a registration which can be done only when the vm is successfully
created.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
-
Free IRQs and disable MSI-X upon failure.
Cc: Avi Kivity
Signed-off-by: Jing Zhang
Signed-off-by: Marcelo Tosatti
-
This patch changes the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced MMIO dev exists.
Signed-off-by: Wei Yongjun
Signed-off-by: Marcelo Tosatti
-
This patch does:
- no need to call tracepoint_synchronize_unregister() when the kvm module
  is unloaded, since ftrace can handle it
- cleanup ftrace's macro
Signed-off-by: Xiao Guangrong
Signed-off-by: Avi Kivity
13 May, 2010
1 commit
-
kvm_set_irq is used from non-sleepable contexts, so convert the ioapic from
a mutex to a spinlock.
KVM-Stable-Tag.
Tested-by: Ralf Bonenkamp
Signed-off-by: Marcelo Tosatti
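A hedged sketch of the conversion above (struct and field names abbreviated,
not the actual ioapic code): a spinlock can be taken from atomic,
non-sleepable context such as the paths that end up in kvm_set_irq, whereas a
mutex may sleep.

#include <linux/spinlock.h>
#include <linux/types.h>

struct ioapic_state_sketch {
        spinlock_t lock;        /* was: struct mutex lock */
        u32 irr;
};

static void ioapic_set_irr_bit(struct ioapic_state_sketch *ioapic, int pin)
{
        spin_lock(&ioapic->lock);       /* was: mutex_lock(&ioapic->lock) */
        ioapic->irr |= 1u << pin;
        spin_unlock(&ioapic->lock);     /* was: mutex_unlock(&ioapic->lock) */
}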
11 May, 2010
1 commit
-
Conflicts:
arch/x86/kernel/amd_iommu.c
25 Apr, 2010
1 commit
-
Marcelo introduced gfn_to_hva_memslot() when he implemented
gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too.
Note: also remove the parentheses next to return, as checkpatch said to do.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Avi Kivity
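A hedged sketch of the refactoring described above (simplified types and
helper bodies, not the exact KVM code): gfn_to_hva() can reuse the per-slot
helper instead of open-coding the same address computation.

/* Illustrative, simplified types; the real struct kvm_memory_slot has more
 * fields and the real gfn_to_hva() takes a struct kvm pointer. */
typedef unsigned long long gfn_t;

struct memslot_sketch {
        gfn_t base_gfn;
        unsigned long npages;
        unsigned long userspace_addr;
};

#define SKETCH_PAGE_SHIFT 12

static unsigned long gfn_to_hva_memslot_sketch(struct memslot_sketch *slot, gfn_t gfn)
{
        return slot->userspace_addr +
               (unsigned long)(gfn - slot->base_gfn) * (1UL << SKETCH_PAGE_SHIFT);
}

/* After the change, gfn_to_hva() just looks up the slot and defers to the
 * helper. */
static unsigned long gfn_to_hva_sketch(struct memslot_sketch *slot, gfn_t gfn)
{
        if (!slot || gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
                return 0;       /* stand-in for bad_hva() */
        return gfn_to_hva_memslot_sketch(slot, gfn);
}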
21 Apr, 2010
1 commit
-
I got this dmesg because srcu_read_lock() is missing in
kvm_mmu_notifier_release().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/3100:
#0: (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_release+0x38/0xdf
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_mmu_zap_all+0x21/0x5e [kvm]
stack backtrace:
Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] unalias_gfn+0x56/0xab [kvm]
[] gfn_to_memslot+0x16/0x25 [kvm]
[] gfn_to_rmap+0x17/0x6e [kvm]
[] rmap_remove+0xa0/0x19d [kvm]
[] kvm_mmu_zap_page+0x109/0x34d [kvm]
[] kvm_mmu_zap_all+0x35/0x5e [kvm]
[] kvm_arch_flush_shadow+0x16/0x22 [kvm]
[] kvm_mmu_notifier_release+0x15/0x17 [kvm]
[] __mmu_notifier_release+0x88/0xdf
[] ? __mmu_notifier_release+0x38/0xdf
[] ? exit_mm+0xe0/0x115
[] exit_mmap+0x2c/0x17e
[] mmput+0x2d/0xd4
[] exit_mm+0x108/0x115
[...]
Signed-off-by: Lai Jiangshan
Signed-off-by: Avi Kivity
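A hedged sketch of the fix pattern implied above (the callback body is
abbreviated; kvm_arch_flush_shadow is taken from the trace): take the SRCU
read lock around the shadow flush so the memslot dereferences inside it are
covered.

#include <linux/kvm_host.h>
#include <linux/mmu_notifier.h>
#include <linux/srcu.h>

/* Sketch only; the real callback is kvm_mmu_notifier_release() in
 * virt/kvm/kvm_main.c. */
static void sketch_mmu_notifier_release(struct mmu_notifier *mn,
                                        struct mm_struct *mm)
{
        struct kvm *kvm = container_of(mn, struct kvm, mmu_notifier);
        int idx;

        idx = srcu_read_lock(&kvm->srcu);       /* the missing piece */
        kvm_arch_flush_shadow(kvm);
        srcu_read_unlock(&kvm->srcu, idx);
}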
20 Apr, 2010
1 commit
-
Int is not long enough to store the size of a dirty bitmap.
This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.
Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.
Signed-off-by: Takuya Yoshikawa
Signed-off-by: Marcelo Tosatti
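A hedged sketch of the kind of wrapper described above (the name and rounding
convention are assumptions, not the verbatim patch): compute the bitmap size
in bytes as an unsigned long, rounding the page count up to a multiple of the
word size.

#include <linux/kernel.h>
#include <linux/kvm_host.h>

/* Illustrative helper: bytes needed for a slot's dirty bitmap, kept in an
 * unsigned long so a huge slot does not overflow an int. */
static inline unsigned long dirty_bitmap_bytes_sketch(struct kvm_memory_slot *memslot)
{
        return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
}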
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h, making everything defined by the two files
universally available and complicating inclusion dependencies.
The percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of the conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there. ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.
2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to the implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
   editing them, as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).
   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as a bisection point.
Given the fact that I had only a couple of failures from the tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
08 Mar, 2010
1 commit
-
This patch changes the implementation of
kvm_iommu_map_pages to map the pages with the host page size
into the IO virtual address space.
Signed-off-by: Joerg Roedel
Acked-by: Avi Kivity
01 Mar, 2010
1 commit
-
The code relies on kvm->requests_lock inhibiting preemption.
Noted by Jan Kiszka.
Signed-off-by: Avi Kivity