27 Dec, 2011

12 commits


26 Sep, 2011

1 commit

  • Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
    is to call the read or write callback for each device registered
    on the bus until we find a device which handles it.

    Since the number of devices on a bus can be significant due to ioeventfds
    and coalesced MMIO zones, this leads to a lot of overhead on each IO
    operation.

    Instead of registering devices, we now register ranges which point to
    a device. Lookup is done using an efficient bsearch instead of a linear
    search.

    A performance test was conducted by comparing the exit count per second
    with 200 ioeventfds created on one byte while the guest continuously
    accesses a different byte (triggering usermode exits).
    Before the patch the guest achieved 259k exits per second; after the
    patch it does 274k exits per second.

    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Sasha Levin
    Signed-off-by: Avi Kivity

    Sasha Levin
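
    A minimal, self-contained illustration of the range-plus-bsearch idea
    (the struct and function names below are simplified stand-ins, not KVM's
    actual definitions, and lookups here match a registered (addr, len) pair
    exactly, while the kernel's comparison is a little more involved):

    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified stand-in for KVM's per-bus bookkeeping. */
    struct io_range {
        unsigned long long addr;    /* start of the range (GPA or port) */
        int len;                    /* length in bytes */
        void *dev;                  /* device handling this range */
    };

    /* Ranges are kept sorted by start address, then length. */
    static int range_cmp(const void *a, const void *b)
    {
        const struct io_range *r1 = a, *r2 = b;

        if (r1->addr != r2->addr)
            return r1->addr < r2->addr ? -1 : 1;
        return r1->len - r2->len;
    }

    /* O(log n) lookup instead of asking every registered device in turn. */
    static void *bus_find_dev(struct io_range *ranges, size_t nr,
                              unsigned long long addr, int len)
    {
        struct io_range key = { addr, len, NULL };
        struct io_range *r = bsearch(&key, ranges, nr, sizeof(key), range_cmp);

        return r ? r->dev : NULL;
    }

    int main(void)
    {
        static int dev_a, dev_b;
        struct io_range bus[] = {
            { 0x2000, 4, &dev_b },
            { 0x1000, 1, &dev_a },
        };
        size_t nr = sizeof(bus) / sizeof(bus[0]);

        /* "Registration" keeps the array sorted. */
        qsort(bus, nr, sizeof(bus[0]), range_cmp);

        printf("hit:  %p\n", bus_find_dev(bus, nr, 0x2000, 4));
        printf("miss: %p\n", bus_find_dev(bus, nr, 0x3000, 1));
        return 0;
    }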
     

24 Jul, 2011

2 commits

  • The idea is from Avi:

    | We could cache the result of a miss in an spte by using a reserved bit, and
    | checking the page fault error code (or seeing if we get an ept violation or
    | ept misconfiguration), so if we get repeated mmio on a page, we don't need to
    | search the slot list/tree.
    | (https://lkml.org/lkml/2011/2/22/221)

    When the page fault is caused by mmio, we cache the info in the shadow page
    table, and also set the reserved bits in the shadow page table, so if the mmio
    is caused again, we can quickly identify it and emulate it directly.

    Searching for an mmio gfn in the memslots is heavy since we need to walk all
    memslots; this feature reduces that cost, and also avoids walking the guest
    page table for soft mmu.

    [jan: fix operator precedence issue]

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Xiao Guangrong
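
    A rough userspace illustration of the caching idea only, not the kernel's
    actual spte encoding: a software-chosen "reserved" bit marks the entry as
    a cached mmio miss, and the gfn plus access bits are packed into the
    remaining bits so a repeated fault can skip the memslot walk.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative layout only; the real spte masks and shifts differ. */
    #define MMIO_FLAG        (1ULL << 51)   /* pretend this is a reserved bit */
    #define MMIO_GFN_SHIFT   12
    #define MMIO_ACCESS_MASK 0x7ULL

    static uint64_t make_mmio_spte(uint64_t gfn, uint64_t access)
    {
        return MMIO_FLAG | (gfn << MMIO_GFN_SHIFT) | (access & MMIO_ACCESS_MASK);
    }

    static bool is_mmio_spte(uint64_t spte)
    {
        return spte & MMIO_FLAG;
    }

    static uint64_t mmio_spte_gfn(uint64_t spte)
    {
        return (spte & ~MMIO_FLAG) >> MMIO_GFN_SHIFT;
    }

    int main(void)
    {
        uint64_t spte = make_mmio_spte(0xfee00, 0x3);

        /* On a repeated fault the reserved-bit pattern is recognized and we
         * can go straight to emulation instead of walking the memslots. */
        if (is_mmio_spte(spte))
            printf("cached mmio fault: gfn=%#llx access=%llu\n",
                   (unsigned long long)mmio_spte_gfn(spte),
                   (unsigned long long)(spte & MMIO_ACCESS_MASK));
        return 0;
    }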
     
    If the page fault is caused by mmio, the gfn cannot be found in the memslots,
    and 'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to
    identify the mmio page fault.
    And, to clarify the meaning of an mmio pfn, we return the fault page instead
    of the bad page when the gfn is not allowed to be prefetched.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
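
    In outline (hypothetical names, not the exact kernel helpers), the fault
    path only needs a sentinel comparison:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    /* Hypothetical sentinel; in the kernel it is the frame of a dedicated
     * "bad page" set up at module init. */
    static const pfn_t bad_pfn = (pfn_t)-1;

    /* gfn_to_pfn() returning the sentinel means the gfn has no memslot,
     * so the fault is treated as mmio and the access is emulated. */
    static bool pfn_is_mmio(pfn_t pfn)
    {
        return pfn == bad_pfn;
    }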
     

12 Jul, 2011

4 commits

    Introduce a kvm_read_guest_cached() function in addition to the write one we
    already have.

    [ by glauber: export function signature in kvm header ]

    Signed-off-by: Gleb Natapov
    Signed-off-by: Glauber Costa
    Acked-by: Rik van Riel
    Tested-by: Eric Munson
    Signed-off-by: Avi Kivity

    Gleb Natapov
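
    A hedged usage sketch of the cache API as exported from the KVM headers
    around this time (struct shared_blob is a made-up example payload, and
    the kvm_gfn_to_hva_cache_init() signature has changed in later kernels):

    #include <linux/kvm_host.h>

    struct shared_blob {               /* hypothetical guest/host shared data */
        u64 flags;
    };

    static struct gfn_to_hva_cache blob_cache;

    static int blob_init(struct kvm *kvm, gpa_t gpa)
    {
        /* Resolve gpa -> hva once and remember it in the cache. */
        return kvm_gfn_to_hva_cache_init(kvm, &blob_cache, gpa);
    }

    static int blob_read(struct kvm *kvm, struct shared_blob *blob)
    {
        /* The new read-side counterpart of kvm_write_guest_cached(). */
        return kvm_read_guest_cached(kvm, &blob_cache, blob, sizeof(*blob));
    }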
     
  • KVM has an ioctl to define which signal mask should be used while running
    inside VCPU_RUN. At least for big endian systems, this mask is different
    on 32-bit and 64-bit systems (though the size is identical).

    Add a compat wrapper that converts the mask to whatever the kernel accepts,
    allowing 32-bit kvm user space to set signal masks.

    This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
    a 32-bit userland.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
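
    The endianness problem can be shown in plain C: a 64-bit signal mask is
    built from two 32-bit words, and on a big-endian host those words sit in
    the opposite order, so a 32-bit mask copied in verbatim lands in the
    wrong half. The conversion below only illustrates the idea; it is not
    the kernel's compat helper.

    #include <stdint.h>
    #include <stdio.h>

    /* 32-bit (compat) mask: two 32-bit words, lowest signals in sig[0]. */
    struct compat_mask { uint32_t sig[2]; };

    /* Build the 64-bit mask explicitly instead of memcpy()ing the words,
     * so the result is correct regardless of host endianness. */
    static uint64_t compat_to_native(struct compat_mask m)
    {
        return (uint64_t)m.sig[0] | ((uint64_t)m.sig[1] << 32);
    }

    int main(void)
    {
        struct compat_mask m = { { 1u << 16, 0 } };   /* some low signal bit */

        printf("native mask: %#llx\n",
               (unsigned long long)compat_to_native(m));
        return 0;
    }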
     
    So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
    it fails. Move this confusing responsibility back into the hands of
    kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected;
    all other archs cannot fail.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
    Simply use __copy_to_user/__clear_user to write the guest page, since we have
    already verified the user address when the memslot was set.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
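
    In kernel terms the reasoning is roughly this (a sketch, not the exact
    KVM code): access_ok() was already paid when the memslot was registered,
    so the per-access path can use the unchecked double-underscore variants.

    #include <linux/errno.h>
    #include <linux/uaccess.h>

    /* 'hva' was validated with access_ok() at memslot registration time,
     * so __copy_to_user()/__clear_user(), which skip that check, suffice. */
    static int write_guest_page_sketch(void __user *hva, const void *data,
                                       int len)
    {
        if (data)
            return __copy_to_user(hva, data, len) ? -EFAULT : 0;

        /* Clearing a page follows the same reasoning. */
        return __clear_user(hva, len) ? -EFAULT : 0;
    }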
     

06 Jun, 2011

1 commit

  • It doesn't make sense to ever see a half-initialized kvm structure on
    mmu notifier callbacks. Previously, 85722cda changed the ordering to
    ensure that the mmu_lock was initialized before mmu notifier
    registration, but there is still a race where the mmu notifier could
    come in and try accessing other portions of struct kvm before they are
    initialized.

    Solve this by moving the mmu notifier registration to occur after the
    structure is completely initialized.

    Google-Bug-Id: 452199
    Signed-off-by: Mike Waychison
    Signed-off-by: Avi Kivity

    Mike Waychison
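
    The shape of the fix, as a sketch rather than the literal kvm_create_vm():
    everything the notifier callbacks might touch is set up before the
    notifier is registered, because a callback can fire as soon as
    registration returns (kvm_init_mmu_notifier() is KVM's thin wrapper
    around mmu_notifier_register()).

    #include <linux/kvm_host.h>
    #include <linux/slab.h>

    static struct kvm *create_vm_sketch(void)
    {
        struct kvm *kvm = kzalloc(sizeof(*kvm), GFP_KERNEL);

        if (!kvm)
            return NULL;

        spin_lock_init(&kvm->mmu_lock);
        /* ... initialize memslots, buses, lists, counters ... */

        /* Last step: only now may notifier callbacks safely run. */
        if (kvm_init_mmu_notifier(kvm)) {
            /* ... tear down everything set up above ... */
            kfree(kvm);
            return NULL;
        }

        return kvm;
    }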
     

26 May, 2011

1 commit

  • fa3d315a "KVM: Validate userspace_addr of memslot when registered" introduced
    this new warning on s390:

    kvm_main.c: In function '__kvm_set_memory_region':
    kvm_main.c:654:7: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast
    arch/s390/include/asm/uaccess.h:53:19: note: expected 'const void *' but argument is of type '__u64'

    Add the missing cast to get rid of it again...

    Cc: Takuya Yoshikawa
    Signed-off-by: Heiko Carstens
    Signed-off-by: Avi Kivity

    Heiko Carstens
     

22 May, 2011

2 commits

    As the trace below shows, an mmu_notifier callback can be invoked immediately
    after registration, so KVM has to initialize kvm->mmu_lock before registering
    it.

    BUG: spinlock bad magic on CPU#0, kswapd0/342
    lock: ffff8800af8c4000, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    Pid: 342, comm: kswapd0 Not tainted 2.6.39-rc5+ #1
    Call Trace:
    [] spin_bug+0x9c/0xa3
    [] do_raw_spin_lock+0x29/0x13c
    [] ? flush_tlb_others_ipi+0xaf/0xfd
    [] _raw_spin_lock+0x9/0xb
    [] kvm_mmu_notifier_clear_flush_young+0x2c/0x66 [kvm]
    [] __mmu_notifier_clear_flush_young+0x2b/0x57
    [] page_referenced_one+0x88/0xea
    [] page_referenced+0x1fc/0x256
    [] shrink_page_list+0x187/0x53a
    [] shrink_inactive_list+0x1e0/0x33d
    [] ? determine_dirtyable_memory+0x15/0x27
    [] ? call_function_single_interrupt+0xe/0x20
    [] shrink_zone+0x322/0x3de
    [] ? zone_watermark_ok_safe+0xe2/0xf1
    [] kswapd+0x516/0x818
    [] ? shrink_zone+0x3de/0x3de
    [] kthread+0x7d/0x85
    [] kernel_thread_helper+0x4/0x10
    [] ? __init_kthread_worker+0x37/0x37
    [] ? gs_change+0xb/0xb

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Avi Kivity

    OGAWA Hirofumi
     
  • This way, we can avoid checking the user space address many times when
    we read the guest memory.

    Although we could do the same for writes if we checked which slots are
    writable, we do not care about writes for now: reading guest memory happens
    more often than writing to it.

    [avi: change VERIFY_READ to VERIFY_WRITE]

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Avi Kivity

    Takuya Yoshikawa
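
    Roughly, the check added at registration time looks like this (a sketch;
    older kernels' access_ok() took a VERIFY_* argument, and the cast of the
    __u64 userspace_addr to a user pointer is what the s390 warning fix
    above is about):

    #include <linux/errno.h>
    #include <linux/kvm_host.h>
    #include <linux/uaccess.h>

    /* Validate the user address range once, when the slot is set, so that
     * later guest reads through this slot need no per-access access_ok(). */
    static int validate_memslot_addr(const struct kvm_userspace_memory_region *mem)
    {
        if (!access_ok(VERIFY_WRITE,
                       (void __user *)(unsigned long)mem->userspace_addr,
                       mem->memory_size))
            return -EINVAL;
        return 0;
    }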
     

11 May, 2011

1 commit


06 Apr, 2011

1 commit

  • If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to
    avoid sleeping on IO. The check for hwpoison is done at the same time;
    otherwise check_user_page_hwpoison() would call GUP again and put the
    vcpu to sleep.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
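
    The gist, as a sketch: when the caller is asynchronous, add FOLL_NOWAIT
    so get_user_pages does not sleep on IO, and fold the hwpoison check into
    the same call via FOLL_HWPOISON. gup_one_page() below is a hypothetical
    stand-in for the __get_user_pages-style call actually used, whose exact
    signature differs between kernel versions.

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Hypothetical wrapper standing in for a __get_user_pages-style call. */
    extern int gup_one_page(struct task_struct *tsk, struct mm_struct *mm,
                            unsigned long start, unsigned int gup_flags,
                            struct page **page);

    static int pin_guest_page_sketch(unsigned long hva, bool async, bool write,
                                     struct page **page)
    {
        /* hwpoison is checked here instead of via a second GUP call. */
        unsigned int flags = FOLL_GET | FOLL_HWPOISON;

        if (write)
            flags |= FOLL_WRITE;
        if (async)
            flags |= FOLL_NOWAIT;   /* don't sleep on IO; the async path retries */

        return gup_one_page(current, current->mm, hva, flags, page);
    }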
     

26 Mar, 2011

1 commit


24 Mar, 2011

3 commits

    As a preparation for removing the ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
    asm-generic/bitops/le.h is only intended to be included directly from
    asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h,
    which implement the generic ext2 or minix bit operations.

    This stops including asm-generic/bitops/le.h directly and uses ext2
    non-atomic bit operations instead.

    It seems odd to use ext2_set_bit() in kvm, but it will be replaced with
    __set_bit_le() after introducing little-endian bit operations for all
    architectures. This indirect step is necessary to maintain bisectability
    for some architectures which have their own little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
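
    For example, marking a page dirty in a per-slot bitmap uses the ext2
    helper for now and the generic little-endian helper once it exists
    (sketch; the function name here is illustrative):

    #include <linux/bitops.h>

    /* Interim step taken by this patch: the ext2 helper is already a
     * little-endian bit operation on every architecture. */
    static void set_dirty_bit_sketch(unsigned long *dirty_bitmap,
                                     unsigned long rel_gfn)
    {
        ext2_set_bit(rel_gfn, dirty_bitmap);
        /* After the follow-up series this becomes:
         *     __set_bit_le(rel_gfn, dirty_bitmap);
         */
    }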
     
  • KVM uses a sysdev class and a sysdev for executing kvm_suspend()
    after interrupts have been turned off on the boot CPU (during system
    suspend) and for executing kvm_resume() before turning on interrupts
    on the boot CPU (during system resume). However, since both of these
    functions ignore their arguments, the entire mechanism may be
    replaced with a struct syscore_ops object which is simpler.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Avi Kivity

    Rafael J. Wysocki
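
    A sketch of the replacement (struct syscore_ops and register_syscore_ops()
    are the real core API; the callback bodies and the helper name here are
    placeholders):

    #include <linux/syscore_ops.h>

    /* syscore callbacks run on the boot CPU with interrupts disabled,
     * which is exactly when kvm_suspend()/kvm_resume() need to run. */
    static int kvm_suspend_sketch(void)
    {
        /* disable virtualization hardware on this CPU */
        return 0;
    }

    static void kvm_resume_sketch(void)
    {
        /* re-enable virtualization hardware on this CPU */
    }

    static struct syscore_ops kvm_syscore_ops = {
        .suspend = kvm_suspend_sketch,
        .resume  = kvm_resume_sketch,
    };

    /* Called once at module init instead of registering a sysdev class. */
    static void kvm_register_pm_hooks(void)
    {
        register_syscore_ops(&kvm_syscore_ops);
    }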
     

18 Mar, 2011

8 commits

    Code under this lock requires non-preemptibility. Ensure this also holds
    on -rt by converting it to a raw spinlock.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
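
    The conversion itself is mechanical (sketch): the lock type and every
    operation switch to the raw_ variants, which remain true spinning locks
    even where PREEMPT_RT turns ordinary spinlocks into sleeping locks.

    #include <linux/spinlock.h>

    /* Before: static DEFINE_SPINLOCK(example_lock); spin_lock(&example_lock); */
    static DEFINE_RAW_SPINLOCK(example_lock);   /* stays a real spinlock on -rt */

    static void critical_section_sketch(void)
    {
        raw_spin_lock(&example_lock);
        /* ... code that must not be preempted ... */
        raw_spin_unlock(&example_lock);
    }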
     
  • Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
    slowdowns of certain workloads, we instead use yield_to to get
    another VCPU in the same KVM guest to run sooner.

    This seems to give a 10-15% speedup in certain workloads.

    Signed-off-by: Rik van Riel
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Rik van Riel
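
    In outline (a sketch, not the final kvm_vcpu_on_spin(); vcpu_task() is a
    stand-in for resolving the task currently running a vcpu, see the task
    tracking entry below), the spinning vcpu donates its timeslice to a
    sibling vcpu's task via yield_to() instead of sleeping:

    #include <linux/kvm_host.h>
    #include <linux/sched.h>

    /* Stand-in: returns the task running 'vcpu' with a reference held. */
    extern struct task_struct *vcpu_task(struct kvm_vcpu *vcpu);

    static void vcpu_on_spin_sketch(struct kvm_vcpu *me)
    {
        struct kvm *kvm = me->kvm;
        struct kvm_vcpu *vcpu;
        int i;

        kvm_for_each_vcpu(i, vcpu, kvm) {
            struct task_struct *task;
            int yielded;

            if (vcpu == me)
                continue;
            task = vcpu_task(vcpu);
            if (!task)
                continue;

            /* Hand the rest of our timeslice to this vcpu's task. */
            yielded = yield_to(task, 1);
            put_task_struct(task);
            if (yielded)
                break;
        }
    }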
     
  • Keep track of which task is running a KVM vcpu. This helps us
    figure out later what task to wake up if we want to boost a
    vcpu that got preempted.

    Unfortunately there are no guarantees that the same task
    always keeps the same vcpu, so we can only track the task
    across a single "run" of the vcpu.

    Signed-off-by: Rik van Riel
    Signed-off-by: Avi Kivity

    Rik van Riel
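
    A sketch of the bookkeeping, assuming the pid field this patch adds to
    struct kvm_vcpu: a struct pid reference is safer than a raw task pointer
    because the task may exit while the vcpu stays around.

    #include <linux/kvm_host.h>
    #include <linux/pid.h>
    #include <linux/sched.h>

    /* Refresh the record each time a (possibly different) task enters run. */
    static void vcpu_record_task_sketch(struct kvm_vcpu *vcpu)
    {
        struct pid *oldpid = vcpu->pid;

        if (oldpid != task_pid(current)) {
            vcpu->pid = get_task_pid(current, PIDTYPE_PID);
            put_pid(oldpid);
        }
    }

    /* Later (e.g. for directed yield) resolve the pid back to a task.
     * The caller must drop the returned reference with put_task_struct(). */
    static struct task_struct *vcpu_task_sketch(struct kvm_vcpu *vcpu)
    {
        return vcpu->pid ? get_pid_task(vcpu->pid, PIDTYPE_PID) : NULL;
    }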
     
    is_hwpoison_address only checks whether the page table entry is
    hwpoisoned, regardless of the memory page mapped, while __get_user_pages
    checks both.

    QEMU will clear the poisoned page table entry (via unmap/map) to make
    it possible to allocate a new memory page for the virtual address
    across guest reboots. But it is also possible that the underlying
    memory page is kept poisoned even after the corresponding page table
    entry is cleared, that is, a new memory page cannot be allocated.
    __get_user_pages can catch these situations.

    Signed-off-by: Huang Ying
    Signed-off-by: Marcelo Tosatti

    Huang Ying
     
    Now that we have 'vcpu->mode' to judge whether we need to send an IPI to
    other cpus, which is very precise, checking the request bit is needless,
    and we can drop the spinlock as a collateral benefit.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • Currently we keep track of only two states: guest mode and host
    mode. This patch adds an "exiting guest mode" state that tells
    us that an IPI will happen soon, so unless we need to wait for the
    IPI, we can avoid it completely.

    Also:
    1: There is no need to atomically read/write ->mode in the vcpu's own
       thread.

    2: Reorganize struct kvm_vcpu to explicitly put ->mode and ->requests
       in the same cache line.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
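
    The saving comes from a single atomic exchange on ->mode (sketch; the
    state names follow the description above): only if the vcpu is still in
    guest mode do we take responsibility for the kick, otherwise either an
    IPI is already on its way or the vcpu will notice the request before the
    next guest entry.

    #include <linux/kvm_host.h>

    static bool need_ipi_for_request_sketch(struct kvm_vcpu *vcpu)
    {
        /* Atomically flip IN_GUEST_MODE -> EXITING_GUEST_MODE; the winner
         * of this race is the one that must send the IPI. */
        return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE)
                == IN_GUEST_MODE;
    }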
     
  • Get rid of this warning:

    CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

    The only caller of the function is within a !CONFIG_S390 section, so add the
    same ifdef around kvm_create_dirty_bitmap() as well.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Marcelo Tosatti

    Heiko Carstens
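
    The fix is just a matching preprocessor guard (sketch):

    #include <linux/kvm_host.h>

    /* The only caller sits inside a !CONFIG_S390 block, so guard the
     * definition the same way to silence the unused-function warning. */
    #ifndef CONFIG_S390
    static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
    {
        /* ... allocate memslot->dirty_bitmap ... */
        return 0;
    }
    #endif /* !CONFIG_S390 */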
     
  • Instead, drop large mappings, which were the reason we dropped shadow.

    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     

14 Jan, 2011

3 commits

    Clean up some code with the common compound_trans_head helper.

    Signed-off-by: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Marcelo Tosatti
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
    For GRU and EPT, we need gup-fast to set the referenced bit too (this is why
    it's correct to return 0 when shadow_access_mask is zero: it requires
    gup-fast to set the referenced bit). qemu-kvm access already sets the
    young bit in the pte if it isn't zero-copy; if it's zero-copy or a shadow
    paging EPT minor fault, we rely on gup-fast to signal that the page is in
    use...

    We also need to check the young bits on the secondary pagetables for NPT
    and not nested shadow mmu as the data may never get accessed again by the
    primary pte.

    Without this closer accuracy, we'd have to remove the heuristic that
    avoids collapsing hugepages in hugepage virtual regions that have not even
    a single subpage in use.

    ->test_young is fully backwards compatible with GRU and other usages that
    don't have young bits in pagetables set by the hardware and that should
    nuke the secondary mmu mappings when ->clear_flush_young runs, just like
    EPT does.

    Removing the heuristic that checks the young bit in
    khugepaged/collapse_huge_page completely probably isn't so bad either,
    but I thought it was worth keeping, and this makes it reliable.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
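
    On the KVM side this surfaces as a new mmu_notifier hook (sketch of the
    wiring, handler body elided): ->test_young reports the accessed/young
    state of the secondary (EPT/NPT) page tables without clearing or
    flushing anything, which is what the khugepaged heuristic needs.

    #include <linux/mmu_notifier.h>

    /* Report, without clearing, whether the secondary page tables have seen
     * an access for this address. */
    static int test_young_sketch(struct mmu_notifier *mn,
                                 struct mm_struct *mm,
                                 unsigned long address)
    {
        /* ... walk the shadow/EPT/NPT entries for 'address' and return
         *     non-zero if an accessed bit is set ... */
        return 0;
    }

    static const struct mmu_notifier_ops mmu_notifier_ops_sketch = {
        /* existing handlers (.clear_flush_young etc.) elided */
        .test_young = test_young_sketch,
    };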
     
  • This should work for both hugetlbfs and transparent hugepages.

    [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
    Signed-off-by: Andrea Arcangeli
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli