24 Mar, 2011

3 commits

  • As a preparation for removing ext2 non-atomic bit operations from
    asm/bitops.h, this converts ext2 non-atomic bit operations to
    little-endian bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
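
    To illustrate what the conversion targets, here is a minimal userspace
    model of a little-endian bit operation (a sketch only: the kernel's
    __set_bit_le()/test_bit_le() work on unsigned long words rather than
    bytes):

        #include <stdint.h>
        #include <stddef.h>

        /*
         * Little-endian bit numbering is defined on the byte stream, so
         * the result is the same on big- and little-endian hosts: bit 0
         * is the least significant bit of byte 0, bit 8 of byte 1, etc.
         */
        static inline void set_bit_le(size_t nr, uint8_t *addr)
        {
                addr[nr / 8] |= (uint8_t)(1u << (nr % 8));
        }

        static inline int test_bit_le(size_t nr, const uint8_t *addr)
        {
                return (addr[nr / 8] >> (nr % 8)) & 1;
        }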
     
  • asm-generic/bitops/le.h is only intended to be included directly from
    asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h,
    which implement the generic ext2 and minix bit operations.

    This stops including asm-generic/bitops/le.h directly and uses ext2
    non-atomic bit operations instead.

    It seems odd to use ext2_set_bit() in kvm, but it will be replaced
    with __set_bit_le() after little-endian bit operations are introduced
    for all architectures. This indirect step is necessary to maintain
    bisectability for the architectures which have their own little-endian
    bit operations.

    Signed-off-by: Akinobu Mita
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • KVM uses a sysdev class and a sysdev for executing kvm_suspend()
    after interrupts have been turned off on the boot CPU (during system
    suspend) and for executing kvm_resume() before turning on interrupts
    on the boot CPU (during system resume). However, since both of these
    functions ignore their arguments, the entire mechanism may be replaced
    with a struct syscore_ops object, which is simpler.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Avi Kivity

    Rafael J. Wysocki
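
    A hedged sketch of the simpler mechanism, using the real
    register_syscore_ops() API; the callback bodies are placeholders, not
    KVM's actual implementation:

        #include <linux/syscore_ops.h>

        /* Called with interrupts disabled on the boot CPU during suspend. */
        static int kvm_suspend(void)
        {
                /* placeholder: disable virtualization extensions */
                return 0;
        }

        /* Called before interrupts are re-enabled during resume. */
        static void kvm_resume(void)
        {
                /* placeholder: re-enable virtualization extensions */
        }

        static struct syscore_ops kvm_syscore_ops = {
                .suspend = kvm_suspend,
                .resume  = kvm_resume,
        };

        static int __init kvm_pm_init(void)
        {
                /* no sysdev class/device boilerplate needed */
                register_syscore_ops(&kvm_syscore_ops);
                return 0;
        }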
     

18 Mar, 2011

9 commits

  • The RCU use in kvm_irqfd_deassign is tricky: we have rcu_assign_pointer
    but no synchronize_rcu; synchronize_rcu is done by kvm_irq_routing_update,
    with which we share a spinlock.

    Fix up a comment in an attempt to make this clearer.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
     
  • Code under this lock requires non-preemptibility. Ensure this also
    holds on -rt by converting it to a raw spinlock.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
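
    The conversion pattern looks roughly like this (sketch; on PREEMPT_RT
    a raw_spinlock_t keeps spinning while a plain spinlock_t becomes a
    sleeping lock):

        #include <linux/spinlock.h>

        static DEFINE_RAW_SPINLOCK(kvm_lock);  /* was: DEFINE_SPINLOCK(kvm_lock) */

        static void critical_section(void)
        {
                raw_spin_lock(&kvm_lock);      /* was: spin_lock(&kvm_lock) */
                /* ... must stay non-preemptible, even on -rt ... */
                raw_spin_unlock(&kvm_lock);
        }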
     
  • Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
    slowdowns of certain workloads, we instead use yield_to to get
    another VCPU in the same KVM guest to run sooner.

    This seems to give a 10-15% speedup in certain workloads.

    Signed-off-by: Rik van Riel
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Rik van Riel
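
    A hedged sketch of the directed yield (simplified; the real loop also
    remembers where it last yielded and keeps trying the next vcpu on
    failure):

        void kvm_vcpu_on_spin(struct kvm_vcpu *me)
        {
                struct kvm *kvm = me->kvm;
                struct kvm_vcpu *vcpu;
                int i;

                kvm_for_each_vcpu(i, vcpu, kvm) {
                        struct task_struct *task;

                        if (vcpu == me)
                                continue;
                        /* task tracking: see the next commit below */
                        task = get_pid_task(vcpu->pid, PIDTYPE_PID);
                        if (!task)
                                continue;
                        /* hand our timeslice to the chosen vcpu's task */
                        if (yield_to(task, 1)) {
                                put_task_struct(task);
                                break;
                        }
                        put_task_struct(task);
                }
        }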
     
  • Keep track of which task is running a KVM vcpu. This helps us
    figure out later what task to wake up if we want to boost a
    vcpu that got preempted.

    Unfortunately there are no guarantees that the same task
    always keeps the same vcpu, so we can only track the task
    across a single "run" of the vcpu.

    Signed-off-by: Rik van Riel
    Signed-off-by: Avi Kivity

    Rik van Riel
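
    A hedged sketch of the tracking itself, assuming a struct pid pointer
    in struct kvm_vcpu; holding a pid reference instead of a task_struct
    avoids pinning a task that has already exited:

        /* In vcpu_load(), i.e. whenever a task (re)takes the vcpu: */
        struct pid *oldpid = vcpu->pid;
        struct pid *newpid = get_task_pid(current, PIDTYPE_PID);

        if (oldpid != newpid) {
                rcu_assign_pointer(vcpu->pid, newpid);  /* reader: kvm_vcpu_on_spin() */
                synchronize_rcu();
                put_pid(oldpid);
        } else {
                put_pid(newpid);        /* drop the extra reference */
        }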
     
  • is_hwpoison_address only checks whether the page table entry is
    hwpoisoned, regardless of the memory page it maps, while
    __get_user_pages checks both.

    QEMU clears the poisoned page table entry (via unmap/map) to make it
    possible to allocate a new memory page for the virtual address across
    guest reboots. But it is also possible that the underlying memory page
    stays poisoned even after the corresponding page table entry is
    cleared, that is, a new memory page cannot be allocated.
    __get_user_pages can catch these situations.

    Signed-off-by: Huang Ying
    Signed-off-by: Marcelo Tosatti

    Huang Ying
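
    A hedged sketch of the check on the KVM side, assuming this era's mm
    interface where __get_user_pages() takes FOLL_* flags and reports
    poison via -EHWPOISON; the sentinel handling is illustrative:

        struct page *page;
        unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON |
                             (write_fault ? FOLL_WRITE : 0);
        int npages;

        /* catches a poisoned pte *and* a poisoned underlying page */
        npages = __get_user_pages(current, current->mm, addr, 1,
                                  flags, &page, NULL, NULL);
        if (npages == -EHWPOISON)
                return page_to_pfn(hwpoison_page);  /* illustrative sentinel */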
     
  • Now that we have 'vcpu->mode' to judge whether we need to send an IPI
    to other cpus, which is exact, checking the request bit is needless,
    and the spinlock can be dropped as collateral.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
     
  • Currently we keep track of only two states: guest mode and host
    mode. This patch adds an "exiting guest mode" state that tells
    us that an IPI will happen soon, so unless we need to wait for the
    IPI, we can avoid it completely.

    Also:

    1. There is no need to read/write ->mode atomically from the vcpu's
    own thread.

    2. Reorganize struct kvm_vcpu so that ->mode and ->requests explicitly
    share the same cache line.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
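
    A sketch close to the code this adds (simplified): the third state
    lets a would-be sender detect that an exit is already imminent and
    skip the IPI:

        enum {
                OUTSIDE_GUEST_MODE,
                IN_GUEST_MODE,
                EXITING_GUEST_MODE,
        };

        /*
         * Returns the mode the vcpu was in; only the caller that observes
         * IN_GUEST_MODE (and wins the cmpxchg) must send the IPI, everyone
         * else can rely on the exit that is already on its way.
         */
        static int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
        {
                return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
        }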
     
  • Get rid of this warning:

    CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

    The only caller of the function is within a !CONFIG_S390 section, so add the
    same ifdef around kvm_create_dirty_bitmap() as well.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Marcelo Tosatti

    Heiko Carstens
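
    The shape of the fix (sketch):

        #ifndef CONFIG_S390
        static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
        {
                /* ... body unchanged ... */
        }
        #endif /* !CONFIG_S390 */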
     
  • Instead, drop large mappings, which were the reason we dropped shadow.

    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     

14 Jan, 2011

3 commits

  • Cleanup some code with common compound_trans_head helper.

    Signed-off-by: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Marcelo Tosatti
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • For GRU and EPT, we need gup-fast to set the referenced bit too (this
    is why it's correct to return 0 when shadow_access_mask is zero: it
    requires gup-fast to set the referenced bit). A qemu-kvm access
    already sets the young bit in the pte if it isn't zero-copy; if it's
    zero-copy or a shadow-paging/EPT minor fault, we rely on gup-fast to
    signal that the page is in use.

    We also need to check the young bits on the secondary pagetables for
    NPT, and not for the nested shadow mmu, as the data may never get
    accessed again by the primary pte.

    Without this closer accuracy, we'd have to remove the heuristic that
    avoids collapsing hugepages in hugepage virtual regions that don't
    have even a single subpage in use.

    ->test_young is fully backwards-compatible with GRU and other usages
    that don't have young bits in pagetables set by the hardware and that
    should nuke the secondary mmu mappings when ->clear_flush_young runs,
    just like EPT does.

    Removing the heuristic that checks the young bit in
    khugepaged/collapse_huge_page completely probably wouldn't be so bad
    either, but I thought it was worth keeping, and this makes it reliable.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This should work for both hugetlbfs and transparent hugepages.

    [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
    Signed-off-by: Andrea Arcangeli
    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

12 Jan, 2011

24 commits

  • Since vmx blocks INIT signals, we disable virtualization extensions during
    reboot. This leads to virtualization instructions faulting; we trap these
    faults and spin while the reboot continues.

    Unfortunately spinning on a non-preemptible kernel may block a task that
    reboot depends on; this causes the reboot to hang.

    Fix by skipping over the instruction and hoping for the best.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Quote from Avi:
    | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
    | that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page()
    | can consult the bit and force a flush if set.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
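
    A hedged sketch of the mechanism that follows from the quote (the
    tlbs_dirty name is taken from this series; treat the details as
    illustrative):

        /* instead of flushing immediately, record that tlbs went stale: */
        vcpu->kvm->tlbs_dirty++;

        /* in kvm_mmu_notifier_invalidate_page(): */
        need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
        if (need_tlb_flush)
                kvm_flush_remote_tlbs(kvm);     /* also clears tlbs_dirty */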
     
  • Store irq routing table pointer in the irqfd object,
    and use that to inject MSI directly without bouncing out to
    a kernel thread.

    While we touch this structure, rearrange irqfd fields to make fastpath
    better packed for better cache utilization.

    This also adds some comments about locking rules and rcu usage in code.

    Some notes on the design:
    - Use pointer into the rt instead of copying an entry,
    to make it possible to use rcu, thus side-stepping
    locking complexities. We also save some memory this way.
    - The old workqueue code is still used for level irqs. I don't think
    we DTRT with level anyway; however, it seems easier to keep the code
    around, as it has been thought through and debugged, and to fix level
    later rather than ripping it out and re-instating it afterwards.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Marcelo Tosatti
    Acked-by: Gregory Haskins
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
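
    A hedged sketch of the resulting fast path in the irqfd wakeup handler
    (the irq_entry field name follows this series; error handling omitted):

        struct kvm_kernel_irq_routing_entry *irq;

        rcu_read_lock();
        irq = rcu_dereference(irqfd->irq_entry);
        if (irq)
                /* cached MSI route: inject directly from the wakeup path */
                kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
        else
                /* no route cached (e.g. level irq): keep the workqueue */
                schedule_work(&irqfd->inject);
        rcu_read_unlock();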
     
  • The naming convention of the hardware_[dis|en]able family is a little
    confusing because only hardware_[dis|en]able_all use the _nolock
    suffix.

    Renaming the current hardware_[dis|en]able() to *_nolock() and adding
    hardware_[dis|en]able() wrapper functions which take kvm_lock for them
    reduces the confusion.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
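
    The resulting shape (sketch; kvm_lock was still a plain spinlock at
    this point, see the 18 Mar raw-spinlock commit above):

        static void hardware_enable(void *junk)
        {
                spin_lock(&kvm_lock);
                hardware_enable_nolock(junk);
                spin_unlock(&kvm_lock);
        }

        static void hardware_disable(void *junk)
        {
                spin_lock(&kvm_lock);
                hardware_disable_nolock(junk);
                spin_unlock(&kvm_lock);
        }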
     
  • In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
    This patch adds missing protection for CPU_DYING case.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • Any arch not supporting device assignment will also not build
    assigned-dev.c, so testing for KVM_CAP_DEVICE_DEASSIGNMENT is
    pointless; KVM_CAP_ASSIGN_DEV_IRQ is unconditionally set. Moreover,
    add a default case for dispatching the ioctl.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • The guest may change states that pci_reset_function does not touch,
    so we had better save/restore the assigned device across guest usage.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • Cosmetic change, but it helps to correlate IRQs with PCI devices.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • This improves IRQ forwarding for assigned devices: by using the
    kernel's threaded IRQ scheme, we can get rid of the latency-prone work
    queue and simplify the code at the same time.

    Moreover, we no longer have to hold assigned_dev_lock while raising the
    guest IRQ, which can be a lengthy operation as we may have to iterate
    over all VCPUs. The lock is now only used for synchronizing masking vs.
    unmasking of INTx-type IRQs, and is thus renamed to intx_lock.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
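
    A hedged sketch of the conversion, built on the kernel's
    request_threaded_irq(); the handler below is a stand-in for the real
    one:

        static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
        {
                struct kvm_assigned_dev_kernel *dev = dev_id;

                /* raise the guest IRQ; no assigned_dev_lock held here */
                kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 1);
                return IRQ_HANDLED;
        }

        /* NULL hard handler + IRQF_ONESHOT keeps the line masked until
         * the threaded handler has run, replacing the old work queue. */
        r = request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
                                 IRQF_ONESHOT, dev->irq_name, dev);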
     
  • When we deassign a guest IRQ, clear the potentially asserted guest line.
    There might be no chance for the guest to do this, specifically if we
    switch from INTx to MSI mode.

    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • IA64 support forces us to abstract the allocation of the kvm structure.
    But instead of mixing this up with arch-specific initialization and
    doing the same on destruction, split both steps. This allows moving
    generic destruction calls into generic code.

    It also fixes error clean-up on failures of kvm_create_vm for IA64.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • In kvm_async_pf_wakeup_all(), we add a dummy apf to vcpu->async_pf.done
    without holding vcpu->async_pf.lock; this will break if we are handling
    apfs at the same time.

    Also use 'list_empty_careful()' instead of 'list_empty()'.

    Signed-off-by: Xiao Guangrong
    Acked-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
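
    The shape of the fix (sketch): take the lock around the insertion of
    the dummy apf, and use the careful variant for the lockless emptiness
    check elsewhere:

        spin_lock(&vcpu->async_pf.lock);
        list_add_tail(&work->link, &vcpu->async_pf.done);
        spin_unlock(&vcpu->async_pf.lock);

        /* lockless check on another path: */
        if (list_empty_careful(&vcpu->async_pf.done))
                return;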
     
  • If there is no need to inject an async #PF into the PV guest, we can
    handle more completed apfs at one time, so we can retry the guest #PF
    as early as possible.

    Signed-off-by: Xiao Guangrong
    Acked-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • Let's use the newly introduced vzalloc().

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Jesper Juhl
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
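
    The pattern being replaced (sketch; the allocation site shown is
    illustrative):

        /* before: allocate, then zero by hand */
        x = vmalloc(size);
        if (x)
                memset(x, 0, size);

        /* after: one call does both */
        x = vzalloc(size);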
     
  • Fixes this:

    CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
    arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm':
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r'

    Signed-off-by: Heiko Carstens
    Signed-off-by: Marcelo Tosatti

    Heiko Carstens
     
  • Fixes this:

    CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
    arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'

    Signed-off-by: Heiko Carstens
    Signed-off-by: Marcelo Tosatti

    Heiko Carstens
     
  • Currently we are using vmalloc() for all dirty bitmaps, even if they
    are small enough, say, less than K bytes.

    We use kmalloc() if the dirty bitmap size is less than or equal to
    PAGE_SIZE so that we can avoid vmalloc area usage for VGA.

    This will also make the logging start/stop faster.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
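
    A sketch of the size-based choice (helper name from kvm_main.c; error
    handling trimmed):

        static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
        {
                unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);

                /* VGA-sized slots fit in a page: avoid vmalloc space */
                if (dirty_bytes > PAGE_SIZE)
                        memslot->dirty_bitmap = vzalloc(dirty_bytes);
                else
                        memslot->dirty_bitmap = kzalloc(dirty_bytes, GFP_KERNEL);

                return memslot->dirty_bitmap ? 0 : -ENOMEM;
        }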
     
  • Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap
    by vmalloc() which will be used in the next logging, and this has been
    having a bad effect on VGA and live migration: vmalloc() consumes extra
    systime, triggers tlb flushes, etc.

    This patch resolves this issue by pre-allocating one more bitmap and switching
    between two bitmaps during dirty logging.

    Performance improvement:
    I measured performance for the VGA update case with trace-cmd.
    The result was 1.5 times faster than the original.

    In the case of live migration, the improvement ratio depends on the workload
    and the guest memory size. In general, the larger the memory size is the more
    benefits we get.

    Note:
    This does not change other architectures' logic, but the allocation
    size becomes twice as large. This will increase the actual memory
    consumption only when the new size changes the number of pages
    allocated by vmalloc().

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
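
    A hedged sketch of the swap in x86's kvm_vm_ioctl_get_dirty_log(),
    assuming a dirty_bitmap_head field that points at the doubled
    allocation (n is the bitmap size in bytes):

        /* pick whichever half of the doubled allocation is not live */
        dirty_bitmap = memslot->dirty_bitmap_head;
        if (memslot->dirty_bitmap == dirty_bitmap)
                dirty_bitmap += n / sizeof(long);
        memset(dirty_bitmap, 0, n);

        /* swap it in; the previously live half is copied to userspace,
         * and no vmalloc() happens on this path anymore */
        memslot->dirty_bitmap = dirty_bitmap;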
     
  • This makes it easy to change the way of allocating/freeing dirty bitmaps.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Fernando Luis Vazquez Cao
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
     
  • Add tracepoint for userspace exit.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
    to writable if host pte allows it.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
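
    A hedged sketch of the call site (locals illustrative): the guest
    fault's write intent is passed straight to gup, and a read fault may
    still end up with a writable mapping when the host pte allows it:

        /* require write access only when the guest fault was a write */
        npages = get_user_pages_fast(addr, 1, write_fault, page);

        /* on a read fault, KVM can additionally probe whether the host
         * pte is already writable and, if so, map the page writable
         * (upgrade path omitted in this sketch) */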
     
  • Improve vma handling code readability in hva_to_pfn() and fix
    async pf handling code to properly check vma returned by find_vma().

    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • Send an async page fault to a PV guest if it accesses swapped-out
    memory. The guest will choose another task to run upon receiving the
    fault.

    Allow async page fault injection only when the guest is in user mode,
    since otherwise the guest may be in a non-sleepable context and will
    not be able to reschedule.

    The vcpu will be halted if the guest faults on the same page again or
    if the vcpu executes kernel code.

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov