29 Apr, 2008
1 commit
-
Add a proper extern for late_time_init in include/linux/init.h
Signed-off-by: Adrian Bunk
Acked-by: Ingo Molnar
Cc: Thomas Gleixner
Cc: john stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
9 commits
-
If there's no VSA2 (i.e., if we're using tinybios or OpenFirmware), use the
GLIU's P2D Range Offset Descriptor to determine how much memory we have
available for the framebuffer.

Originally based on a patch by Jordan Crouse. Tested with OpenFirmware;
Pascal informs me that tinybios has a stub that fills in P2D_RO0.

Signed-off-by: Andres Salomon
Cc: Jordan Crouse
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
..Rather than using magic constants.
Signed-off-by: Andres Salomon
Cc: Jordan Crouse
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is generic VSA2 detection. It's used by OLPC to determine whether or not
the BIOS contains VSA2, but since other BIOSes are coming out that don't use
the VSA (i.e., tinybios), it might end up being useful for others.

Signed-off-by: Andres Salomon
Acked-by: Alan Cox
Cc: Jordan Crouse
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This cleans up a few MSR-using drivers in the following manner:
- Ensures MSRs are all defined in asm/geode.h, rather than in misc
  places
- Makes the naming consistent; cs553[56] ones begin with MSR_,
  GX-specific ones start with MSR_GX_, and LX-specific ones start
  with MSR_LX_. Also, make the names match the data sheet.
- Use MSR names rather than numbers in source code
- Document the fact that the LX's MSR_PADSEL has the wrong value
  in the data sheet. That's, uh, good to note.

Signed-off-by: Andres Salomon
Acked-by: Jordan Crouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Huge ptes have a special type on s390 and cannot be handled with the standard
pte functions in certain cases, e.g. because of a different location of the
invalid bit. This patch adds some new architecture-specific functions to
hugetlb common code, as a prerequisite for the s390 large page support.

This won't affect other architectures in functionality, but I need to add some
new dummy inline functions to the headers.

Acked-by: Martin Schwidefsky
Signed-off-by: Gerald Schaefer
Cc: Paul Mundt
Cc: "Luck, Tony"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A cow break on a hugetlbfs page with page_count > 1 will set a new pte with
set_huge_pte_at(), without any tlb flush operation. The old pte will remain in
the tlb and subsequent write access to the page will result in a page fault
loop, for as long as it may take until the tlb is flushed from somewhere else.
This patch introduces an architecture-specific huge_ptep_clear_flush()
function, which is called before the set_huge_pte_at() in hugetlb_cow().

ATTENTION: This is just a nop on all architectures for now, the s390
implementation will come with our large page patch later. Other architectures
should define their own huge_ptep_clear_flush() if needed.

Acked-by: Martin Schwidefsky
Signed-off-by: Gerald Schaefer
Cc: Paul Mundt
Cc: "Luck, Tony"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
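The fault loop described in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the one-entry "TLB", the `struct pte` layout, and every `*_model` function are hypothetical stand-ins invented for illustration, not the kernel's hugetlb code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's definitions. */
struct pte { unsigned long pfn; bool writable; };
struct tlb_entry { struct pte pte; bool valid; };

static struct pte page_table;   /* a single page table entry */
static struct tlb_entry tlb;    /* a single cached translation */

/* Translate through the TLB, refilling from the page table on a miss. */
static struct pte translate(void)
{
    if (!tlb.valid) {
        tlb.pte = page_table;
        tlb.valid = true;
    }
    return tlb.pte;
}

/* Model of set_huge_pte_at(): updates the page table only. */
static void set_huge_pte_at_model(struct pte newpte)
{
    page_table = newpte;
}

/* Model of huge_ptep_clear_flush(): clears the pte and drops the TLB entry. */
static void huge_ptep_clear_flush_model(void)
{
    page_table = (struct pte){ 0 };
    tlb.valid = false;
}

/*
 * Model of a cow break. Returns true if a write after the break would
 * succeed, i.e. the translation seen by the "CPU" is writable.
 */
static bool cow_break(bool flush_first)
{
    struct pte ro = { .pfn = 1, .writable = false };
    struct pte rw = { .pfn = 2, .writable = true };

    page_table = ro;
    tlb.valid = false;
    (void)translate();              /* prime the TLB with the read-only pte */

    if (flush_first)
        huge_ptep_clear_flush_model();
    set_huge_pte_at_model(rw);

    return translate().writable;
}
```

In this model, `cow_break(false)` still sees the stale read-only translation, which corresponds to the repeated write fault, while `cow_break(true)` picks up the new writable pte.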
This patch moves all architecture functions for hugetlb to architecture header
files (include/asm-foo/hugetlb.h) and converts all macros to inline functions.
It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE,
ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE,
ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK.

Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase
readability and maintainability, at the price of some code duplication. An
asm-generic common part would have reduced the loc, but we would end up with
new ARCH_HAS_xxx defines eventually.

Acked-by: Martin Schwidefsky
Signed-off-by: Gerald Schaefer
Cc: Paul Mundt
Cc: "Luck, Tony"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
model (which is more dynamic than most). Instead, they had proposed to
implement it with an additional path through vm_normal_page(), using a bit in
the pte to determine whether or not the page should be refcounted:

    vm_normal_page()
    {
        ...
        if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
            if (vma->vm_flags & VM_MIXEDMAP) {
    #ifdef s390
                if (!mixedmap_refcount_pte(pte))
                    return NULL;
    #else
                if (!pfn_valid(pfn))
                    return NULL;
    #endif
                goto out;
            }
        ...
    }

This is fine, however if we are allowed to use a bit in the pte to determine
refcountedness, we can use that to _completely_ replace all the vma based
schemes. So instead of adding more cases to the already complex vma-based
scheme, we can have a clearly separate and simple pte-based scheme (and get
slightly better code generation in the process):

    vm_normal_page()
    {
    #ifdef s390
        if (!mixedmap_refcount_pte(pte))
            return NULL;
        return pte_page(pte);
    #else
        ...
    #endif
    }

And finally, we may rather make this concept usable by any architecture rather
than making it s390 only, so implement a new type of pte state for this.
Unfortunately the old vma based code must stay, because some architectures may
not be able to spare pte bits. This makes vm_normal_page a little bit more
ugly than we would like, but the 2 cases are clearly separate.

So introduce a pte_special pte state, and use it in mm/memory.c. It is
currently a noop for all architectures, so this doesn't actually result in any
compiled code changes to mm/memory.o.

BTW:
I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
The reason is that, regardless of where vm_normal_page is actually
implemented, the *abstraction* is still exactly the same. Also, while it
depends on whether the architecture has pte_special or not, those are the
only two possible cases, and it really isn't an arch specific function --
the role of the arch code should be to provide primitive functions and
accessors with which to build the core code; pte_special does that. We do
not want architectures to know or care about vm_normal_page itself, and
we definitely don't want them being able to invent something new there
out of sight of mm/ code. If we made vm_normal_page an arch function, then
we have to make vm_insert_mixed (next patch) an arch function too. So I
don't think moving it to arch code fundamentally improves any abstractions,
while it does practically make the code more difficult to follow, for both
mm and arch developers, and easier to misuse.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin
Acked-by: Carsten Otte
Cc: Jared Hulbert
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
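To make the pte-based scheme from the commit above concrete, here is a minimal user-space sketch. Everything in it is an assumption made for illustration: the `pte_t` layout, the choice of bit 0 as the special bit, the tiny `mem_map` array, and the `vm_normal_page_model()` name are hypothetical and do not reflect any real architecture's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these definitions are the kernel's. */
typedef struct { uint64_t val; } pte_t;
struct page { int refcount; };

#define PTE_SPECIAL_BIT (1ULL << 0)  /* assumed spare software bit */
#define PFN_SHIFT       12
#define NR_PAGES        16

static struct page mem_map[NR_PAGES];

static bool pte_special(pte_t pte)
{
    return (pte.val & PTE_SPECIAL_BIT) != 0;
}

static pte_t pte_mkspecial(pte_t pte)
{
    pte.val |= PTE_SPECIAL_BIT;
    return pte;
}

static pte_t pfn_pte(uint64_t pfn)
{
    pte_t pte = { pfn << PFN_SHIFT };
    return pte;
}

static uint64_t pte_pfn(pte_t pte)
{
    return pte.val >> PFN_SHIFT;
}

/*
 * pte-based variant: a special pte has no refcounted struct page behind
 * it, so return NULL; otherwise return the page for the pfn. No vma
 * flags are consulted at all.
 */
static struct page *vm_normal_page_model(pte_t pte)
{
    if (pte_special(pte))
        return NULL;
    assert(pte_pfn(pte) < NR_PAGES);
    return &mem_map[pte_pfn(pte)];
}
```

With these definitions, `vm_normal_page_model(pfn_pte(3))` returns `&mem_map[3]`, while the same pte passed through `pte_mkspecial()` yields `NULL`.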
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
KVM: kill file->f_count abuse in kvm
KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
KVM: SVM: remove selective CR0 comment
KVM: SVM: remove now obsolete FIXME comment
KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
KVM: export kvm_lapic_set_tpr() to modules
KVM: SVM: sync TPR value to V_TPR field in the VMCB
KVM: ppc: PowerPC 440 KVM implementation
KVM: Add MAINTAINERS entry for PowerPC KVM
KVM: ppc: Add DCR access information to struct kvm_run
ppc: Export tlb_44x_hwater for KVM
KVM: Rename debugfs_dir to kvm_debugfs_dir
KVM: x86 emulator: fix lea to really get the effective address
KVM: x86 emulator: fix smsw and lmsw with a memory operand
KVM: x86 emulator: initialize src.val and dst.val for register operands
KVM: SVM: force a new asid when initializing the vmcb
KVM: fix kvm_vcpu_kick vs __vcpu_run race
KVM: add ioctls to save/store mpstate
KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
...
27 Apr, 2008
30 commits
-
So userspace can save/restore the mpstate during migration.
[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti
Signed-off-by: Christian Borntraeger
Signed-off-by: Carsten Otte
Signed-off-by: Avi Kivity -
We wish to export it to userspace, so move it into the kvm namespace.
Signed-off-by: Avi Kivity
-
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu
Signed-off-by: Avi Kivity -
To properly forward an MCE that occurred while the guest is running to the host, we
have to intercept this exception and call the host handler by hand. This is
implemented by this patch.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity -
This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty(). Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results. When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory. However, it should make those things pretty
straightforward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those. I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]
Signed-off-by: Anthony Liguori
Signed-off-by: Avi Kivity -
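The "fix overflow when shifting left pfns by adding casts" note in the commit above refers to a common pitfall that can be sketched in plain C. The example assumes a 32-bit pfn type (as `unsigned long` would be on a 32-bit host) and a `PAGE_SHIFT` of 12; the type name `pfn32_t` and both function names are hypothetical and not taken from the KVM source.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumption: a 32-bit pfn type, standing in for a 32-bit host's pfn. */
typedef uint32_t pfn32_t;

/*
 * Buggy variant: the shift is evaluated in 32 bits, so the high bits are
 * lost before the result is widened to 64 bits.
 */
static uint64_t pfn_to_phys_truncating(pfn32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Fixed variant: widen first, then shift. */
static uint64_t pfn_to_phys(pfn32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

For a pfn of `0x100000` (the frame at the 4 GiB boundary), the truncating variant wraps to 0, whereas the cast variant gives `0x100000000`.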
The kvm_host.h file for x86 declares the functions kvm_set_cr[0348]. In the
header file their second parameter is named cr0 in all cases. This patch
renames the parameters so that they match the function name.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity -
Unify slots_lock acquisition around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
 for indefinite periods]

Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity -
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm. It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus
Signed-off-by: Avi Kivity -
Signed-off-by: Izik Eidus
Signed-off-by: Avi Kivity -
Signed-off-by: Avi Kivity
-
It will allow external users to call it. It is mainly
useful for routines that override a machine_ops
field for their own special purposes, but want to call the
normal shutdown routine after they're done.

Signed-off-by: Glauber Costa
Signed-off-by: Avi Kivity -
This patch allows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops.

Signed-off-by: Glauber Costa
Signed-off-by: Avi Kivity -
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti
Signed-off-by: Andrew Morton
Signed-off-by: Avi Kivity -
Signed-off-by: Avi Kivity
-
Add basic KVM paravirt support. Avoid vm-exits on IO delays.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity -
Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity -
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]
Signed-off-by: Sheng Yang
Signed-off-by: Marcelo Tosatti
Tested-and-Acked-by: Alex Davis
Signed-off-by: Avi Kivity -
Names like 'set_cr3()' look dangerously close to affecting the host.
Signed-off-by: Avi Kivity
-
Create large page mappings if the guest PTEs are marked as such and
the underlying memory is hugetlbfs backed. If the largepage contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity -
Mark zapped root pagetables as invalid and ignore such pages during lookup.
This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.

Signed-off-by: Marcelo Tosatti
Cc: Anthony Liguori
Signed-off-by: Avi Kivity -
Signed-off-by: Amit Shah
Signed-off-by: Avi Kivity -
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.

The area is binary compatible with xen, as we use the same shadow_info
structure.

[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]

Signed-off-by: Glauber de Oliveira Costa
Signed-off-by: Marcelo Tosatti
Signed-off-by: Avi Kivity -
The load_pdptrs() function is required in the SVM module for NPT support.
Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity -
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity -
This patch gives the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.

Signed-off-by: Joerg Roedel
Signed-off-by: Avi Kivity -
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.

[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]

Signed-off-by: Sheng Yang
Signed-off-by: Avi Kivity -
Signed-off-by: Yaozu (Eddie) Dong
Signed-off-by: Avi Kivity -
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d15612c61c2e26a169567becf088e71b8ff

The problem is that arch idle routines are somewhat inconsistent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra
Tested-by: Bob Copeland
Signed-off-by: Ingo Molnar -
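The two rules for idle routines in the commit above (entered with IRQs disabled, exit with IRQs enabled) can be expressed as a checkable contract. The following is a user-space sketch only: the global `irqs_enabled` flag and all `*_model` names are hypothetical stand-ins for the real IRQ state and arch idle routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

static void local_irq_disable_model(void) { irqs_enabled = false; }
static void local_irq_enable_model(void)  { irqs_enabled = true; }

typedef void (*idle_fn)(void);

/* A conforming idle routine: entered with IRQs off, leaves them on. */
static void halt_idle_model(void)
{
    assert(!irqs_enabled);      /* rule 1: entered with IRQs disabled */
    local_irq_enable_model();   /* rule 2: exit with IRQs enabled */
}

/* A non-conforming routine that returns without enabling IRQs. */
static void broken_idle_model(void)
{
    assert(!irqs_enabled);
}

/*
 * Call an idle routine under the convention and report whether it
 * honoured the exit rule. IRQs are re-enabled afterwards either way so
 * the model stays in a sane state.
 */
static bool run_idle_checked(idle_fn idle)
{
    bool ok;

    local_irq_disable_model();
    idle();
    ok = irqs_enabled;
    local_irq_enable_model();
    return ok;
}
```

Here `run_idle_checked(halt_idle_model)` reports success, while `run_idle_checked(broken_idle_model)` flags the routine that leaves IRQs disabled on exit.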
…nux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
x86_64/mm: check and print vmemmap allocation continuous
x86_64: fix setup_node_bootmem to support big mem excluding with memmap
x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
mm: allow reserve_bootmem() cross nodes
mm: offset align in alloc_bootmem()
mm: fix alloc_bootmem_core to use fast searching for all nodes
mm: make mem_map allocation continuous -
Typical case: a four-socket system where every node has 4g of ram, and we are using:
    memmap=10g$4g
to mask out memory on node1 and node2.

When numa is enabled, early_node_mem is used to get node_data and node_bootmap.
If it cannot get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get a buffer from previous nodes.

So check it and print out some info about it.

We need to move early_res_to_bootmem into every setup_node_bootmem,
and it takes the range that the node has. Otherwise alloc_bootmem could return
an address that was reserved early.

Depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu
Signed-off-by: Ingo Molnar
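For readers unfamiliar with the `memmap=10g$4g` syntax used in the commit above: the `nn$ss` form marks the region from `ss` to `ss+nn` as reserved. The sketch below computes that range in user space. It is an approximation for illustration, not the kernel's parser; `parse_size()`, `parse_memmap_reserved()`, and `struct mem_range` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mem_range { uint64_t start, end; };  /* [start, end) in bytes */

/* Parse a number with an optional K/M/G suffix; *end points past it. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *e;
    uint64_t v = strtoull(s, &e, 0);

    switch (*e) {
    case 'g': case 'G':
        v <<= 10;
        /* fall through */
    case 'm': case 'M':
        v <<= 10;
        /* fall through */
    case 'k': case 'K':
        v <<= 10;
        e++;
        break;
    default:
        break;
    }
    *end = e;
    return v;
}

/*
 * Parse the value of a "memmap=nn$ss" option (the part after '=').
 * Returns 0 and fills *out on success, -1 if the string is not of the
 * reserved-region form.
 */
static int parse_memmap_reserved(const char *arg, struct mem_range *out)
{
    const char *p;
    uint64_t size, start;

    assert(arg && out);

    size = parse_size(arg, &p);
    if (*p != '$')
        return -1;
    start = parse_size(p + 1, &p);
    if (*p != '\0')
        return -1;

    out->start = start;
    out->end = start + size;
    return 0;
}
```

For `"10g$4g"` this yields the range from 4 GiB to 14 GiB, which on the four-node, 4g-per-node layout described above covers all of node1 and node2 and the first half of node3.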