Eric Lee / smarc-fsl-linux-kernel

29 Jul, 2008

2 commits

e930bffe9 KVM: Synchronize guest physical memory map to host virtual memory map

Synchronize changes to host virtual addresses which are part of
a KVM memory slot to the KVM shadow mmu. This allows pte operations
like swapping, page migration, and madvise() to transparently work
with KVM.
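Keeping the shadow MMU in sync means that a host virtual address reported by an MMU notifier has to be mapped back to the guest frame it backs. The following is a minimal sketch of that lookup. It assumes a memslot described by a base guest frame number, a page count and the host virtual address where the slot is mapped, plus 4 KiB pages. The struct and function names are hypothetical and only loosely modeled on KVM's memslots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical memslot: guest frames [base_gfn, base_gfn + npages)
 * are backed by host virtual memory starting at userspace_addr. */
struct memslot_sketch {
    uint64_t base_gfn;
    uint64_t npages;
    uintptr_t userspace_addr;
};

/* Translate a host virtual address (as reported by an MMU notifier)
 * into the guest frame number it backs. Returns false if no slot
 * covers the address. */
bool hva_to_gfn(const struct memslot_sketch *slots, size_t nslots,
                uintptr_t hva, uint64_t *gfn)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        const struct memslot_sketch *s = &slots[i];
        uintptr_t start = s->userspace_addr;
        uintptr_t end = start + (uintptr_t)(s->npages << SKETCH_PAGE_SHIFT);
        if (hva >= start && hva < end) {
            *gfn = s->base_gfn + ((hva - start) >> SKETCH_PAGE_SHIFT);
            return true;
        }
    }
    return false;
}
```

Once the frame is known, the shadow PTEs that map it can be dropped or updated. A linear scan is enough for a handful of slots.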

Signed-off-by: Andrea Arcangeli
Signed-off-by: Avi Kivity

Andrea Arcangeli
2008-07-29 17:33:53 +0800
604b38ac0 KVM: Allow browsing memslots with mmu_lock

This allows reading memslots with only the mmu_lock held, for MMU
notifiers that run in atomic context with mmu_lock held.

Signed-off-by: Andrea Arcangeli
Signed-off-by: Avi Kivity

Andrea Arcangeli
2008-07-29 17:33:50 +0800

25 Jul, 2008

1 commit

7d9dbca34 flag parameters: anon_inode_getfd extension

This patch just extends the anon_inode_getfd interface to take an additional
parameter with a flag value. The flag value is passed on to
get_unused_fd_flags in anticipation of its use with the O_CLOEXEC flag.

No actual semantic changes here, the changed callers all pass 0 for now.
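The flag is meant to let a caller eventually request O_CLOEXEC for the new descriptor. The effect of that flag can be observed from userspace with plain POSIX calls. The sketch below opens /dev/null, which is an arbitrary file unrelated to anonymous inodes, and checks FD_CLOEXEC. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if the descriptor will be closed automatically on exec(). */
bool fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    assert(flags != -1);
    return (flags & FD_CLOEXEC) != 0;
}

/* Open /dev/null with or without O_CLOEXEC; returns the descriptor. */
int open_devnull(bool cloexec)
{
    return open("/dev/null", O_RDONLY | (cloexec ? O_CLOEXEC : 0));
}
```

Setting the flag when the descriptor is created closes the window in which another thread could fork and exec before a separate fcntl(F_SETFD) call.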

[akpm@linux-foundation.org: KVM fix]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800

20 Jul, 2008

11 commits

597a5f551 KVM: Adjust smp_call_function_mask() callers to new requirements

smp_call_function_mask() now complains when called in a preemptible context;
adjust its callers accordingly.

Signed-off-by: Avi Kivity

Avi Kivity
2008-07-20 19:29:54 +0800
34d4cb8fc KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction

Flush the shadow mmu before removing regions to avoid stale entries.

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-07-20 17:42:40 +0800
eff0114ac KVM: s390: dont allocate dirty bitmap

This patch #ifdefs the bitmap array for dirty tracking. We don't have dirty
tracking on s390 today, and we'd love to use our storage keys to store the
dirty information for migration. Therefore, we won't need this array at all;
and because our vmalloc space is limited, allocating it limits the number of
guests we can run.
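To give a sense of the vmalloc cost being avoided, the sketch below computes the size of a per-slot dirty bitmap. It assumes one bit per guest page, rounded up to a whole number of unsigned longs. That rounding is an assumption of this sketch rather than a detail taken from the commit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a dirty bitmap with one bit per guest page,
 * rounded up to a whole number of unsigned longs. */
size_t dirty_bitmap_bytes(size_t npages)
{
    size_t bits = BITS_PER_LONG_SKETCH;
    size_t rounded = (npages + bits - 1) / bits * bits;
    return rounded / CHAR_BIT;
}
```

For example, a 1 GiB slot with 4 KiB pages has 262144 pages, which gives 32768 bytes (32 KiB) of bitmap per slot. This cost multiplies with every slot of every guest.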

Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity

Carsten Otte
2008-07-20 17:42:36 +0800
9ef621d3b KVM: Support mixed endian machines

Currently kvmtrace is not portable: a trace file cannot be copied from a
big-endian target to a little-endian workstation for analysis. With this
patch, the kernel writes metadata containing a magic number to the trace
log, and 64-bit words are stored as u64 instead of a pair of u32s.
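On the reader side, the magic number tells an analysis tool whether the file was written with the opposite byte order. Below is a minimal sketch of that check. The magic constant is hypothetical and is not kvmtrace's actual value. The byte swaps are written out by hand so that the code stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRACE_MAGIC_SKETCH 0x12345678u /* hypothetical magic value */

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Inspect the magic word as read from the file in host byte order.
 * Sets *need_swap and returns true if the file is recognised. */
bool trace_check_magic(uint32_t raw_magic, bool *need_swap)
{
    if (raw_magic == TRACE_MAGIC_SKETCH) {
        *need_swap = false;
        return true;
    }
    if (raw_magic == swap32(TRACE_MAGIC_SKETCH)) {
        *need_swap = true;
        return true;
    }
    return false;
}

/* Decode a 64-bit field stored as a single u64. */
uint64_t trace_read_u64(uint64_t raw, bool need_swap)
{
    return need_swap ? swap64(raw) : raw;
}
```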

Signed-off-by: Tan Li
Acked-by: Jerone Young
Acked-by: Hollis Blanchard
Signed-off-by: Avi Kivity

Tan, Li
2008-07-20 17:42:32 +0800
5f94c1741 KVM: Add coalesced MMIO support (common part)

This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.

Coalesced MMIO introduces two ioctls that define which MMIO zones can be
coalesced:

- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
It takes one parameter (struct kvm_coalesced_mmio_zone), which defines
a memory area where MMIOs can be coalesced until the next switch to
user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).
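The two zone operations can be modeled as a small fixed-size table. The sketch below is built on assumptions. ZONE_MAX_SKETCH stands in for KVM_COALESCED_MMIO_ZONE_MAX, whose real value is not given here. The types and functions are hypothetical, and "inside the given bounds" is read as "entirely contained in them".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_MAX_SKETCH 4 /* stand-in for KVM_COALESCED_MMIO_ZONE_MAX */

struct zone_sketch {
    uint64_t addr;
    uint32_t size;
};

struct zone_table_sketch {
    struct zone_sketch zone[ZONE_MAX_SKETCH];
    size_t nb_zones;
};

/* Register one zone; fails once the table is full. */
bool zone_register(struct zone_table_sketch *t, struct zone_sketch z)
{
    if (t->nb_zones >= ZONE_MAX_SKETCH)
        return false;
    t->zone[t->nb_zones++] = z;
    return true;
}

/* Remove every zone lying entirely inside [b.addr, b.addr + b.size).
 * A removed slot is refilled from the end of the array and re-checked.
 * Returns the number of zones removed. */
size_t zone_unregister(struct zone_table_sketch *t, struct zone_sketch b)
{
    size_t removed = 0;
    size_t i = 0;
    while (i < t->nb_zones) {
        const struct zone_sketch *z = &t->zone[i];
        if (z->addr >= b.addr && z->addr + z->size <= b.addr + b.size) {
            t->zone[i] = t->zone[--t->nb_zones];
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```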

The userspace client can check whether the kernel supports coalesced MMIO by
asking ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
This call returns 0 if coalesced MMIO is not supported, or otherwise the page
offset at which the ring buffer is stored.
The page offset depends on the architecture.

After an ioctl(KVM_RUN), the first page of the KVM memory mapping holds
a kvm_run structure. The offset returned for KVM_CAP_COALESCED_MMIO is the
position of the coalesced MMIO ring, in units of PAGE_SIZE, relative to
the start of the kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.
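The steps above reduce to two pieces of userspace arithmetic: locating the ring at the returned page offset from kvm_run, and draining the entries between the consumer and producer indices. The sketch below makes several assumptions. It uses local mirror structs instead of <linux/kvm.h> and assumes 4 KiB pages. It also assumes that userspace consumes from `first` while the kernel fills at `last`. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u /* assumption: 4 KiB pages */

/* Local mirrors of the ring layout; real code would use the
 * kvm_coalesced_mmio structures from <linux/kvm.h>. */
struct mmio_entry_sketch {
    uint64_t phys_addr;
    uint32_t len;
    uint32_t pad;
    uint8_t data[8];
};

struct mmio_ring_sketch {
    uint32_t first; /* next entry userspace will consume */
    uint32_t last;  /* next entry the kernel will fill */
    struct mmio_entry_sketch entry[];
};

/* Entries that fit in one page after the ring header. */
#define RING_MAX_SKETCH \
    ((PAGE_SIZE_SKETCH - sizeof(struct mmio_ring_sketch)) / \
     sizeof(struct mmio_entry_sketch))

/* 'offset' is the KVM_CHECK_EXTENSION result, in pages from kvm_run. */
struct mmio_ring_sketch *ring_from_run(void *run, int offset)
{
    assert(offset > 0);
    return (struct mmio_ring_sketch *)((char *)run +
                                       (size_t)offset * PAGE_SIZE_SKETCH);
}

/* Drain pending entries, calling 'apply' (if non-NULL) on each one.
 * A real consumer shared with the kernel also needs a write barrier
 * before publishing the new 'first'. Returns the number drained. */
size_t ring_drain(struct mmio_ring_sketch *ring,
                  void (*apply)(const struct mmio_entry_sketch *))
{
    size_t n = 0;
    while (ring->first != ring->last) {
        if (apply)
            apply(&ring->entry[ring->first]);
        ring->first = (uint32_t)((ring->first + 1) % RING_MAX_SKETCH);
        n++;
    }
    return n;
}
```

The ring is drained in order, so coalesced writes are replayed in the order in which the guest issued them.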

[akio: fix oops during guest shutdown]

Signed-off-by: Laurent Vivier
Signed-off-by: Akio Takebe
Signed-off-by: Avi Kivity

Laurent Vivier
2008-07-20 17:42:31 +0800
92760499d KVM: kvm_io_device: extend in_range() to manage len and write attribute

Modify the in_range() member of struct kvm_io_device to also receive the
access length and the type of I/O (read or write).

This allows kvm_io_device to be used with coalesced MMIO.
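With the length available, a device can reject accesses that straddle the end of its window. With the direction available, a device that only buffers writes can decline reads. The sketch below shows such a check. The struct and the writes_only flag are hypothetical illustrations, not kvm_io_device fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device window [base, base + size). */
struct io_window_sketch {
    uint64_t base;
    uint64_t size;
    bool writes_only; /* e.g. a device that only coalesces writes */
};

/* Accept the access only if all 'len' bytes fall inside the window and,
 * for a writes_only device, only if the access is a write. The
 * subtraction form avoids overflow near the top of the address space. */
bool window_in_range(const struct io_window_sketch *w, uint64_t addr,
                     int len, bool is_write)
{
    assert(len > 0);
    if (w->writes_only && !is_write)
        return false;
    return addr >= w->base && addr - w->base + (uint64_t)len <= w->size;
}
```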

Signed-off-by: Laurent Vivier
Signed-off-by: Avi Kivity

Laurent Vivier
2008-07-20 17:42:30 +0800
3419ffc8e KVM: IOAPIC/LAPIC: Enable NMI support

[avi: fix ia64 build breakage]

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2008-07-20 17:42:25 +0800
7cc888307 KVM: Remove decache_vcpus_on_cpu() and related callbacks

Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity

Avi Kivity
2008-07-20 17:42:25 +0800
4ecac3fd6 KVM: Handle virtualization instruction #UD faults during reboot

KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtualization
instruction will #UD on execution.

Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.

Signed-off-by: Avi Kivity

Avi Kivity
2008-07-20 17:41:43 +0800
2e2e3738a KVM: Handle vma regions with no backing page

This patch allows VMAs that contain no backing page to be used for guest
memory. This is useful for assigning mmio regions to a guest.

Signed-off-by: Anthony Liguori
Signed-off-by: Avi Kivity

Anthony Liguori
2008-07-20 17:40:49 +0800
1e1c65e03 KVM: remove long -> void *user -> long cast

kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long. This seems unnecessary.

According to objdump the binary code on x86 is unchanged by this patch.

Signed-off-by: Christian Borntraeger
Signed-off-by: Avi Kivity

Christian Borntraeger
2008-07-20 17:40:46 +0800

16 Jul, 2008

1 commit

1a781a777 Merge branch 'generic-ipi' into generic-ipi-for-linus

Conflicts:

arch/powerpc/Kconfig
arch/s390/kernel/time.c
arch/x86/kernel/apic_32.c
arch/x86/kernel/cpu/perfctr-watchdog.c
arch/x86/kernel/i8259_64.c
arch/x86/kernel/ldt.c
arch/x86/kernel/nmi_64.c
arch/x86/kernel/smpboot.c
arch/x86/xen/smp.c
include/asm-x86/hw_irq_32.h
include/asm-x86/hw_irq_64.h
include/asm-x86/mach-default/irq_vectors.h
include/asm-x86/mach-voyager/irq_vectors.h
include/asm-x86/smp.h
kernel/Makefile

Signed-off-by: Ingo Molnar

Ingo Molnar
2008-07-16 03:55:59 +0800

06 Jul, 2008

1 commit

35baff256 KVM: IOAPIC: Fix level-triggered irq injection hang

The "remote_irr" variable is used to indicate an interrupt
which has been received by the LAPIC, but not acked.

In our EOI handler, we unset remote_irr and re-inject the
interrupt if the interrupt line is still asserted.

However, we do not set remote_irr when injecting, so if
kvm_ioapic_set_irq() is called again, we go ahead and call
ioapic_service(). This means that IRR is re-asserted even though
the interrupt is currently in service (i.e. LAPIC IRR is cleared
and ISR/TMR set).

The issue with this is that when the currently executing
interrupt handler finishes and writes LAPIC EOI, then TMR is
unset and EOI sent to the IOAPIC. Since IRR is now asserted,
but TMR is not, then when the second interrupt is handled,
no EOI is sent and if there is any pending interrupt, it is
not re-injected.
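The intended behavior for a single level-triggered pin can be summarized by a small state model. Delivery sets remote_irr. Further line activity does not re-deliver while remote_irr is set. The EOI clears remote_irr and re-delivers if the line is still asserted. This is a simplified sketch and not KVM's ioapic.c. It assumes that the LAPIC always accepts the interrupt, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one level-triggered IOAPIC pin. */
struct pin_sketch {
    bool line;       /* current level of the interrupt line */
    bool remote_irr; /* delivered to the LAPIC, EOI not yet received */
    int injected;    /* number of times the interrupt was delivered */
};

/* Deliver the interrupt and mark it as in service. */
static void pin_service(struct pin_sketch *p)
{
    p->injected++;
    p->remote_irr = true;
}

/* Device changes the line level; never re-deliver while in service. */
void pin_set_irq(struct pin_sketch *p, bool level)
{
    p->line = level;
    if (level && !p->remote_irr)
        pin_service(p);
}

/* Guest EOI: clear remote_irr, re-deliver if the line is still high. */
void pin_eoi(struct pin_sketch *p)
{
    assert(p->remote_irr);
    p->remote_irr = false;
    if (p->line)
        pin_service(p);
}
```

In this model, a second assertion of the line while the first interrupt is in service is absorbed. It is picked up again by the EOI instead of being delivered twice.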

This fixes a hang only seen while running mke2fs -j on an
8GB virtio disk backed by a fully sparse raw file, with
aliguori's "avoid fragmented virtio-blk transfers by copying"
changes.

Signed-off-by: Mark McLoughlin
Acked-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Mark McLoughlin
2008-07-06 16:05:35 +0800

26 Jun, 2008

2 commits

15c8b6c1a on_each_cpu(): kill unused 'retry' parameter

It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge
Reviewed-by: Paul E. McKenney
Signed-off-by: Jens Axboe

Jens Axboe
2008-06-26 17:24:38 +0800
8691e5a8f smp_call_function: get rid of the unused nonatomic/retry argument

It's never used and the comments refer to nonatomic and retry
interchangeably. So get rid of it.

Acked-by: Jeremy Fitzhardinge
Signed-off-by: Jens Axboe

Jens Axboe
2008-06-26 17:24:35 +0800

24 Jun, 2008

1 commit

4fa6b9c5d KVM: ioapic: fix lost interrupt when changing a device's irq

The ioapic acknowledge path translates interrupt vectors to irqs. It
currently uses a first match algorithm, stopping when it finds the first
redirection table entry containing the vector. That fails however if the
guest changes the irq to a different line, leaving the old redirection table
entry in place (though masked). The result is that interrupts do not reach
the guest.

Fix by always scanning the entire redirection table.
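The fix can be illustrated with a table scan that visits every redirection entry using the acknowledged vector, instead of stopping at the first match. The sketch below assumes a 24-entry table. The types and function are hypothetical, and the EOI processing for each matching pin is reduced to recording the pin number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REDIR_ENTRIES_SKETCH 24 /* assumption: 24-pin IOAPIC */

struct redir_entry_sketch {
    uint8_t vector;
    int masked;
};

/* Acknowledge 'vector' by visiting every redirection entry that uses it.
 * A stale, masked entry left behind when the guest moved the device
 * therefore cannot hide the live one. Matching pins are written to
 * 'acked'; the return value is their count. */
size_t ack_vector(const struct redir_entry_sketch *tbl, uint8_t vector,
                  int acked[REDIR_ENTRIES_SKETCH])
{
    size_t n = 0;
    assert(tbl != NULL);
    for (int pin = 0; pin < REDIR_ENTRIES_SKETCH; pin++) {
        if (tbl[pin].vector == vector)
            acked[n++] = pin; /* real code would run the EOI logic here */
    }
    return n;
}
```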

Signed-off-by: Avi Kivity

Avi Kivity
2008-06-24 17:23:55 +0800

07 Jun, 2008

1 commit

ff4b9df87 KVM: IOAPIC: only set remote_irr if interrupt was injected

There's a bug in the IOAPIC code for level-triggered interrupts. It's
relatively easy to trigger by sharing (virtio-blk + usbtablet was the
testcase, initially reported by Gerd von Egidy).

The "remote_irr" variable is used to indicate accepted but not yet acked
interrupts. It's cleared from the EOI handler.

The problem is that the EOI handler clears remote_irr unconditionally, even
if it reinjected another pending interrupt.

In that case, kvm_ioapic_set_irq() proceeds to ioapic_service() which
sets remote_irr even if it failed to inject (since the IRR was high due
to EOI reinjection).

Since the TMR bit has been cleared by the first EOI, the second one
fails to clear remote_irr.

The end result is a dead interrupt line.

Fix it by setting remote_irr only if a new pending interrupt has been
generated (and the TMR bit for the vector in question set).

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-06-07 02:32:39 +0800

18 May, 2008

1 commit

e5c239cfd KVM: Fix kvm_vcpu_block() task state race

There's still a race in kvm_vcpu_block(), if a wake_up_interruptible()
call happens before the task state is set to TASK_INTERRUPTIBLE:

CPU0                                    CPU1

kvm_vcpu_block

add_wait_queue

kvm_cpu_has_interrupt = 0
                                        set interrupt
                                        if (waitqueue_active())
                                                wake_up_interruptible()

kvm_cpu_has_pending_timer
kvm_arch_vcpu_runnable
signal_pending

set_current_state(TASK_INTERRUPTIBLE)
schedule()

Can be fixed by using prepare_to_wait() which sets the task state before
testing for the wait condition.
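The same lost-wakeup problem exists in userspace. There the usual remedy is a mutex plus a condition variable: the condition is tested with the lock held, and pthread_cond_wait() releases the lock atomically as it sleeps. This is an analogue and not the kernel mechanism. It plays the role that prepare_to_wait() plays by setting the task state before the condition test. The names below are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace analogue of a vcpu waiting for an interrupt. */
struct waiter_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool pending; /* the wake-up condition ("interrupt pending") */
};

void waiter_init(struct waiter_sketch *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->pending = false;
}

/* Sleeper: test and sleep under the lock, so a kick that lands between
 * the test and the sleep cannot be lost. */
void waiter_block(struct waiter_sketch *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->pending)
        pthread_cond_wait(&w->cond, &w->lock);
    w->pending = false;
    pthread_mutex_unlock(&w->lock);
}

/* Waker: set the condition, then signal, under the same lock. */
void waiter_kick(struct waiter_sketch *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

static void *kick_thread(void *arg)
{
    waiter_kick(arg);
    return NULL;
}

/* Kick from a second thread while this one blocks; returns 0 on success. */
int waiter_demo(void)
{
    struct waiter_sketch w;
    pthread_t t;
    waiter_init(&w);
    if (pthread_create(&t, NULL, kick_thread, &w) != 0)
        return -1;
    waiter_block(&w);
    pthread_join(t, NULL);
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return 0;
}
```

Whichever thread runs first, waiter_block() returns: either it sees pending already set, or it is sleeping when the signal arrives.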

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-05-18 19:37:12 +0800

04 May, 2008

1 commit

0d1502989 KVM: Export necessary function for EPT

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity

Sheng Yang
2008-05-04 19:44:40 +0800

02 May, 2008

1 commit

2030a42ce [PATCH] sanitize anon_inode_getfd()

a) none of the callers even looks at inode or file returned by anon_inode_getfd()
b) any caller that would try to look at those would be racy, since by the time
it returns we might have raced with close() from another thread and that
file would be pining for fjords.

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:50 +0800

27 Apr, 2008

14 commits

66c0b394f KVM: kill file->f_count abuse in kvm

Use kvm's own refcounting instead of playing with ->filp->f_count.
That will allow getting rid of a lot of crap in anon_inode_getfd() and
kill a race in kvm_dev_ioctl_create_vm() (file might have been closed
immediately by another thread, so ->filp might point to already freed
struct file when we get around to setting it).

Signed-off-by: Al Viro
Signed-off-by: Avi Kivity

Al Viro
2008-04-27 23:21:46 +0800
76f7c8790 KVM: Rename debugfs_dir to kvm_debugfs_dir

It's a globally exported symbol now.

Signed-off-by: Hollis Blanchard
Signed-off-by: Avi Kivity

Hollis Blanchard
2008-04-27 23:21:36 +0800
62d9f0dbc KVM: add ioctls to save/store mpstate

So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti
Signed-off-by: Christian Borntraeger
Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-04-27 23:21:16 +0800
3d80840d9 KVM: hlt emulation should take in-kernel APIC/PIT timers into account

Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.

Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:

CPU0                                    CPU1
if (waitqueue_active(wq))
                                        add_wait_queue()
                                        if (!atomic_read(pit_timer->pending))
                                                schedule()
atomic_inc(pit_timer->pending)
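The required order is symmetric. The timer side publishes the event before it checks for sleepers. The vcpu side registers as a sleeper before it re-checks the event. Under that order, at least one side sees the other. The sketch below expresses this with C11 sequentially consistent atomics standing in for the kernel's atomics and barriers. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct timer_wait_sketch {
    atomic_int pending; /* stand-in for pit_timer->pending */
    atomic_bool queued; /* stand-in for "a waiter is on the queue" */
};

/* Timer side: publish the event first, then look for a sleeper.
 * Returns true if a wake-up must be sent. */
bool timer_fire(struct timer_wait_sketch *s)
{
    atomic_fetch_add(&s->pending, 1); /* seq_cst */
    return atomic_load(&s->queued);   /* seq_cst */
}

/* vcpu side: register as a sleeper first, then re-check the event.
 * Returns true if it may go to sleep. */
bool vcpu_prepare_sleep(struct timer_wait_sketch *s)
{
    atomic_store(&s->queued, true);        /* seq_cst */
    return atomic_load(&s->pending) == 0;  /* seq_cst */
}
```

With sequential consistency, the outcome "the timer sees no sleeper and the vcpu sees no pending event" is impossible. That is exactly the interleaving shown above.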

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-04-27 17:04:11 +0800
d4c9ff2d1 KVM: Add kvm trace userspace interface

This interface allows a user space application to read the trace of
KVM-related events through relayfs.

Signed-off-by: Feng (Eric) Liu
Signed-off-by: Avi Kivity

Feng(Eric) Liu
2008-04-27 17:01:22 +0800
35149e212 KVM: MMU: Don't assume struct page for x86

This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty(). Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results. When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory. However, it should make those things pretty
straightforward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those. I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]
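The overflow fixed by the added casts is the classic 32-bit shift problem. On a host where the page frame number is a 32-bit type, shifting it left by the page shift before widening drops the high bits of any physical address at or above 4 GiB. The sketch below models the 32-bit type with uint32_t and assumes 4 KiB pages. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12 /* assumption: 4 KiB pages */

/* Buggy: the shift happens in 32 bits, then the truncated result widens. */
uint64_t pfn_to_hpa_buggy(uint32_t pfn)
{
    return pfn << PAGE_SHIFT_SKETCH;
}

/* Fixed: widen to 64 bits first, then shift. */
uint64_t pfn_to_hpa(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT_SKETCH;
}
```

For pfn 0x123456, the fixed version yields 0x123456000, while the buggy one silently yields 0x23456000.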

Signed-off-by: Anthony Liguori
Signed-off-by: Avi Kivity

Anthony Liguori
2008-04-27 17:01:15 +0800
d39f13b0d KVM: add vm refcounting

The main purpose of adding these functions is the ability to release the
spinlock that protects the kvm list while still being able to operate on
a specific kvm safely.
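The pattern works like this. A caller takes a reference on a VM while it still holds the list lock, drops the lock, works on the VM, and then drops the reference. The last reference frees the object. Below is a minimal userspace sketch of that get/put scheme with C11 atomics. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical VM object with an embedded reference count. */
struct vm_sketch {
    atomic_int users;
    /* other per-VM state would live here */
};

struct vm_sketch *vm_create(void)
{
    struct vm_sketch *vm = calloc(1, sizeof(*vm));
    if (vm)
        atomic_init(&vm->users, 1); /* reference held by the creator */
    return vm;
}

/* Take a reference, e.g. while still holding the global list lock. */
void vm_get(struct vm_sketch *vm)
{
    int old = atomic_fetch_add(&vm->users, 1);
    assert(old > 0); /* must not resurrect a VM being freed */
}

/* Drop a reference; the last one frees the VM. Returns 1 if freed. */
int vm_put(struct vm_sketch *vm)
{
    if (atomic_fetch_sub(&vm->users, 1) == 1) {
        free(vm);
        return 1;
    }
    return 0;
}
```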

Signed-off-by: Izik Eidus
Signed-off-by: Avi Kivity

Izik Eidus
2008-04-27 17:00:56 +0800
3e4bb3ac9 KVM: Use kzalloc to avoid allocating kvm_regs from kernel stack

Since kvm_regs is too big to allocate on the kernel stack on ia64,
use kzalloc to allocate it.

Signed-off-by: Xiantao Zhang
Signed-off-by: Avi Kivity

Xiantao Zhang
2008-04-27 16:53:26 +0800
05da45583 KVM: MMU: large page support

Create large page mappings if the guest PTEs are marked as such and
the underlying memory is hugetlbfs-backed. If the large page contains
write-protected pages, a large pte is not used.
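The eligibility rule in the paragraph above can be sketched as a single predicate. The guest PTE must be large, the backing memory must be hugetlbfs, and no frame in the aligned large frame containing the gfn may be write-protected. The sketch assumes 2 MiB large pages over 4 KiB base pages (512 frames) and a plain per-frame flag array. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_LARGE_SKETCH 512u /* assumption: 2 MiB / 4 KiB */

/* Decide whether guest frame 'gfn' may be mapped with a large pte.
 * 'wp' holds one flag per guest frame (true = write-protected). */
bool can_use_large_pte(uint64_t gfn, bool guest_pte_large, bool hugetlb_backed,
                       const bool *wp, size_t nframes)
{
    uint64_t first = gfn & ~(uint64_t)(PAGES_PER_LARGE_SKETCH - 1);
    if (!guest_pte_large || !hugetlb_backed)
        return false;
    if (first + PAGES_PER_LARGE_SKETCH > nframes)
        return false; /* large frame would run past the tracked frames */
    for (uint64_t i = first; i < first + PAGES_PER_LARGE_SKETCH; i++)
        if (wp[i])
            return false;
    return true;
}
```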

Gives a consistent 2% improvement for data copies on a RAM-mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-04-27 16:53:25 +0800
2e53d63ac KVM: MMU: ignore zapped root pagetables

Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.

Signed-off-by: Marcelo Tosatti
Cc: Anthony Liguori
Signed-off-by: Avi Kivity

Marcelo Tosatti
2008-04-27 16:53:25 +0800
0aac03f07 KVM: Disable pagefaults during copy_from_user_inatomic()

With CONFIG_PREEMPT=n, this is needed in order to prevent the fault-in
code from sleeping.

Signed-off-by: Andrea Arcangeli
Signed-off-by: Avi Kivity

Andrea Arcangeli
2008-04-27 16:53:18 +0800
adb1ff467 KVM: Limit vcpu mmap size to one page on non-x86

The second page is only needed on archs that support pio.

Noted by Carsten Otte.

Signed-off-by: Avi Kivity

Avi Kivity
2008-04-27 16:53:17 +0800
09566765e KVM: Only x86 has pio

Signed-off-by: Avi Kivity

Avi Kivity
2008-04-27 16:53:15 +0800
5c5027425 KVM: constify function pointer tables

Signed-off-by: Jan Engelhardt
Signed-off-by: Avi Kivity

Jan Engelhardt
2008-04-27 16:53:15 +0800

04 Mar, 2008

2 commits

8c35f237f KVM: Route irq 0 to vcpu 0 exclusively

Some Linux versions allow the timer interrupt to be processed by more than
one cpu, leading to hangs due to tsc instability. Work around the issue
by only dispatching the interrupt to vcpu 0.
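The workaround amounts to overriding the computed destination set for one specific irq. Below is a minimal sketch, assuming that destinations are represented as a bitmask with bit n standing for vcpu n. The function name is hypothetical, and this sketch always sends irq 0 to vcpu 0, whatever the original destination set was.

```c
#include <assert.h>
#include <stdint.h>

/* 'dest' is a bitmask of candidate vcpus (bit n = vcpu n).
 * The timer interrupt (irq 0) always goes to vcpu 0 only. */
uint32_t route_irq(int irq, uint32_t dest)
{
    assert(irq >= 0);
    return irq == 0 ? 1u : dest;
}
```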

Problem analyzed (and patch tested) by Sheng Yang.

Signed-off-by: Avi Kivity

Avi Kivity
2008-03-04 21:19:48 +0800
72dc67a69 KVM: remove the usage of the mmap_sem for the protection of the memory slots.

This patch replaces the mmap_sem lock for the memory slots with a new
kvm private lock. It is needed because until now there were cases where
kvm accessed user memory while holding the mmap semaphore.

Signed-off-by: Izik Eidus
Signed-off-by: Avi Kivity

Izik Eidus
2008-03-04 21:19:40 +0800

09 Feb, 2008

1 commit

8b88b0998 libfs: allow error return from simple attributes

Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already, which is the original user of the simple attributes. This
patch merges the temporarily forked attributes in spufs back into the
main ones and allows them to return errors.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig
Cc:
Cc: Arnd Bergmann
Cc: Greg KH
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2008-02-09 01:22:34 +0800