Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

20 Jul, 2007

11 commits

efffbeee5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild

* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (33 commits)
xtensa: use DATA_DATA in xtensa
powerpc: add missing DATA_DATA to powerpc
cris: use DATA_DATA in cris
kallsyms: remove usage of memmem and _GNU_SOURCE from scripts/kallsyms.c
kbuild: use -fno-optimize-sibling-calls unconditionally
kconfig: reset generated values only if Kconfig and .config agree.
kbuild: fix the warning when running make tags
kconfig: strip 'CONFIG_' automatically in kernel configuration search
kbuild: use POSIX BRE in headers install target
Whitelist references from __dbe_table to .init
modpost white list pattern adjustment
kbuild: do section mismatch check on full vmlinux
kbuild: whitelist references from variables named _timer to .init.text
kbuild: remove hardcoded _logo names from modpost
kbuild: remove hardcoded apic_es7000 from modpost
kbuild: warn about references from .init.text to .exit.text
kbuild: consolidate section checks
kbuild: refactor code in modpost to improve maintainability
kbuild: ignore section mismatch warnings originating from .note section
kbuild: .paravirtprobe section is obsolete, so modpost doesn't need to handle it
...

Linus Torvalds
2007-07-20 05:28:19 +0800
c0d121720 drivers/edac: add new nmi rescan

Provides a way for NMI-reported errors on x86 to notify the EDAC
subsystem of pending ECC errors by writing to a software state variable.

Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select the error detection
mode via a module parameter, eliminating the kernel compile option.
Please review/test. Thx!

Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.
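The handoff described above can be sketched in userspace C as a single flag that the NMI path sets and the EDAC poller consumes. This is a minimal sketch under assumptions: the names `nmi_ecc_pending`, `nmi_report_ecc()` and `edac_poll_nmi()` are hypothetical, not the identifiers in the patch, and C11 atomics stand in for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical software state variable shared by the NMI path and EDAC.
 * In the kernel it would live in the always-built-in "EDAC stub", so it
 * exists even when the EDAC chip drivers are modules. */
static atomic_int nmi_ecc_pending = 0;

/* NMI context: do the bare minimum and just record that an error occurred. */
void nmi_report_ecc(void)
{
    atomic_store(&nmi_ecc_pending, 1);
}

/* Normal context (EDAC poller): read and clear the flag in one step, so
 * an NMI that arrives between the check and the clear is not lost. */
bool edac_poll_nmi(void)
{
    return atomic_exchange(&nmi_ecc_pending, 0) != 0;
}
```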

Signed-off-by: Dave Jiang
Signed-off-by: Douglas Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Jiang
2007-07-20 01:04:53 +0800
6db7016d1 lguest: the asm offsets

These are the structure offsets required by lg.ko's switcher.S.

Unfortunately we don't have infrastructure for private asm-offsets
creation.

Signed-off-by: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-20 01:04:52 +0800
d7e28ffe6 lguest: the host code

This is the code for the "lg.ko" module, which allows lguest guests to
be launched.

[akpm@linux-foundation.org: update for futex-new-private-futexes]
[akpm@linux-foundation.org: build fix]
[jmorris@namei.org: lguest: use hrtimers]
[akpm@linux-foundation.org: x86_64 build fix]
Signed-off-by: Rusty Russell
Cc: Andi Kleen
Cc: Eric Dumazet
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-20 01:04:52 +0800
5992b6dac lguest: export symbols for lguest as a module

lguest does some fairly low-level things to support a host, which
normal modules don't need:

math_state_restore:
When the guest triggers a Device Not Available fault, we need
to be able to restore the FPU.

__put_task_struct:
We need to hold a reference to another task for inter-guest
I/O, and put_task_struct() is an inline function which calls
__put_task_struct.

access_process_vm:
We need to access another task for inter-guest I/O.

map_vm_area & __get_vm_area:
We need to map the switcher shim (i.e. the monitor) at 0xFFC01000.

Signed-off-by: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-20 01:04:52 +0800
cbe87121f i386: Put allocated ELF notes in read-only data segment

This changes the i386 linker script and the asm-generic macro it uses so that
ELF note sections with SHF_ALLOC set are linked into the kernel image along
with other read-only data. The PT_NOTE also points to their location.

This paves the way for putting useful build-time information into ELF notes
that can be found easily later in a kernel memory dump.

Signed-off-by: Roland McGrath
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2007-07-20 01:04:47 +0800
f34e3b61f use the new percpu interface for shared data

Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.

This separates the percpu data which is referenced frequently by other
cpus from the local-only percpu data.

Signed-off-by: Fenghua Yu
Acked-by: Suresh Siddha
Cc: Rusty Russell
Cc: Christoph Lameter
Cc: "Luck, Tony"
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fenghua Yu
2007-07-20 01:04:45 +0800
5fb7dc37d define new percpu interface for shared data

The per cpu data section contains two types of data: one set which is
exclusively accessed by the local cpu, and another set which is per cpu
but also shared by remote cpus. In the current kernel, these two sets are
not clearly separated out. This can cause the same cacheline to be
shared between the two sets of data, resulting in unnecessary bouncing
of the cacheline between cpus.

One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. The padding at both ends
will likely waste some memory, and the interface to achieve this is not
clean.

This patch:

Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. This cleanly differentiates the local-only
data from the remotely accessed data.
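The goal of keeping remotely accessed per cpu data off the cachelines used by local-only data can be illustrated in plain C11. This sketch uses `alignas` on a struct member rather than the patch's linker section, and assumes a 64-byte cacheline; the struct and field names are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed cacheline size; the kernel derives this per architecture. */
#define CACHE_LINE 64

/* Data touched only by the owning CPU. */
struct local_only {
    unsigned long hot_counter;
    unsigned long scratch;
};

/* Data that remote CPUs also read or write. Aligning the member makes
 * every instance start on its own cacheline, and sizeof() is rounded up
 * to the alignment, so nothing else can share that line. */
struct remote_shared {
    alignas(CACHE_LINE) unsigned long remote_counter;
};

/* One CPU's slice of per-CPU data: the local-only and the remotely
 * shared parts never occupy the same cacheline. */
struct percpu_area {
    struct local_only local;
    struct remote_shared shared;
};
```

The cost the commit message mentions is visible here: `struct remote_shared` occupies a full cacheline even though it holds a single counter.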

Signed-off-by: Fenghua Yu
Acked-by: Suresh Siddha
Cc: Rusty Russell
Cc: Christoph Lameter
Cc:
Cc: "Luck, Tony"
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fenghua Yu
2007-07-20 01:04:44 +0800
77afcf78a PM: Integrate beeping flag with existing acpi_sleep flags

Move "debug during resume from s2ram" into the variable we already use
for real-mode flags to simplify code. It also closes a nasty trap for
the user in acpi_sleep_setup: the order of parameters actually mattered
there, with acpi_sleep=s3_bios,s3_mode doing something different from
acpi_sleep=s3_mode,s3_bios.
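A minimal sketch of why a single flags word makes option parsing order-independent: each recognized option only ORs in its own bit. The bit values and the function `parse_acpi_sleep()` are illustrative assumptions, not the kernel's acpi_sleep_setup code.

```c
#include <string.h>

/* Illustrative bit values for the real-mode flags word. */
#define S3_BIOS_FLAG 0x1
#define S3_MODE_FLAG 0x2
#define S3_BEEP_FLAG 0x4

/* Hypothetical parser for an "acpi_sleep=" argument such as
 * "s3_bios,s3_mode". Because each token only sets its own bit, the
 * result is the same regardless of the order of the tokens. */
unsigned long parse_acpi_sleep(const char *arg)
{
    unsigned long flags = 0;

    while (*arg) {
        size_t len = strcspn(arg, ",");   /* length of the current token */

        if (len == 7 && strncmp(arg, "s3_bios", 7) == 0)
            flags |= S3_BIOS_FLAG;
        else if (len == 7 && strncmp(arg, "s3_mode", 7) == 0)
            flags |= S3_MODE_FLAG;
        else if (len == 7 && strncmp(arg, "s3_beep", 7) == 0)
            flags |= S3_BEEP_FLAG;
        /* unknown tokens are ignored */

        arg += len;
        if (*arg == ',')
            arg++;                        /* skip the separator */
    }
    return flags;
}
```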

Signed-off-by: Pavel Machek
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Machek
2007-07-20 01:04:43 +0800
5a60d6235 PM: Optional beeping during resume from suspend to RAM

Add a feature allowing the user to make the system beep during a resume from
suspend to RAM, on x86_64 and i386.

This is useful for users with broken resume from RAM, so that they can
verify whether control reaches the kernel after a wake-up event.

Signed-off-by: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nigel Cunningham
2007-07-20 01:04:43 +0800
83c54070e mm: fault feedback #2

This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
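With fault results as bit flags, a caller can test error and accounting conditions with independent masks instead of a switch over mutually exclusive codes. The sketch assumes flag names modeled on the kernel's `VM_FAULT_*` constants; the numeric values and the helper `account_fault()` are illustrative, not taken from this patch.

```c
/* Illustrative fault-result bits (values are assumptions). */
#define VM_FAULT_OOM    0x0001
#define VM_FAULT_SIGBUS 0x0002
#define VM_FAULT_MAJOR  0x0004
#define VM_FAULT_WRITE  0x0008
#define VM_FAULT_ERROR  (VM_FAULT_OOM | VM_FAULT_SIGBUS)

struct fault_stats {
    unsigned long maj_flt;
    unsigned long min_flt;
};

/* Hypothetical caller-side handling of a handle_mm_fault()-style result.
 * Returns 0 on success, -1 if the fault must become an error; the error
 * check and the major/minor accounting are separate bit tests. */
int account_fault(unsigned int fault, struct fault_stats *st)
{
    if (fault & VM_FAULT_ERROR)
        return -1;          /* caller picks OOM vs SIGBUS handling */

    if (fault & VM_FAULT_MAJOR)
        st->maj_flt++;
    else
        st->min_flt++;
    return 0;
}
```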

[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Bryan Wu
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: "Luck, Tony"
Cc: Hirokazu Takata
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Cc: Greg Ungerer
Cc: Matthew Wilcox
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Acked-by: Kyle McMartin
Acked-by: Haavard Skinnemoen
Acked-by: Ralf Baechle
Acked-by: Andi Kleen
Signed-off-by: Andrew Morton
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800

19 Jul, 2007

9 commits

97405fe26 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup:
[PATCH] x86: do not recompile boot for each build
[x86 setup] Save/restore DS around invocations of INT 10h
[x86 setup] VGA: Clear the Protect bit before setting the vertical height
[x86 setup] Fix assembly constraints
[x86 setup] build/tools.c: fix comment
[x86 setup] MAINTAINERS: document x86 setup code git tree

Linus Torvalds
2007-07-19 03:13:02 +0800
a10d9a71b i386: fixup TRACE_IRQ breakage

The TRACE_IRQS_ON function in iret_exc: calls a C function without
ensuring that the segments are set properly. Move the trace function and
the enabling of interrupts into the C stub.

Signed-off-by: Peter Zijlstra
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-07-19 03:09:01 +0800
29eb51101 Handle bogus %cs selector in single-step instruction decoding

The code for LDT segment selectors was not robust in the face of a bogus
selector set in %cs via ptrace before the single-step was done.

Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds

Roland McGrath
2007-07-19 03:09:01 +0800
3fbc54165 [PATCH] x86: do not recompile boot for each build

Keep the arch/i386/boot directory from being rebuilt every time.

Signed-off-by: Sam Ravnborg
Signed-off-by: H. Peter Anvin

Sam Ravnborg
2007-07-19 02:36:17 +0800
8c027ae2d [x86 setup] Save/restore DS around invocations of INT 10h

There exists at least one card, Trident TVGA8900CL (BIOS dated 1992/9/8)
which clobbers DS when "scrolling in an SVGA text mode of more than
800x600 pixels." Although we are extremely unlikely to run into that
situation, it is cheap insurance to save and restore DS, and it only adds
a grand total of 50 bytes to the output.

Pointed out by Etienne Lorrain.

Cc: Etienne Lorrain
Signed-off-by: H. Peter Anvin

H. Peter Anvin
2007-07-19 02:36:17 +0800
7ad37df02 [x86 setup] VGA: Clear the Protect bit before setting the vertical height

If the user has asked for the vertical height registers to be recomputed
by setting bit 15 in the video mode number, we do so without clearing the
Protect bit in the Vertical Retrace Register before setting the Overflow
register. As a result, if the VGA BIOS had set the Protect bit, the
write to the Overflow register will be dropped, and bits [9:8] of the
vertical height will be left unchanged.
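The ordering problem can be shown with a simulated CRTC register file instead of real port I/O. This sketch assumes the standard VGA layout (Protect is bit 7 of CR11; bits 8 and 9 of the vertical display end live in bits 1 and 6 of the Overflow register CR07; the low 8 bits are in CR12). The helpers are hypothetical, and the simplified write-protect rule ignores the one CR07 bit that real hardware keeps writable.

```c
#include <stdint.h>

#define CRTC_OVERFLOW  0x07   /* high bits of several vertical values */
#define CRTC_VRETR_END 0x11   /* Vertical Retrace End; bit 7 = Protect */
#define CRTC_VDISP_END 0x12   /* low 8 bits of vertical display end */
#define CRTC_PROTECT   0x80

/* Simulated CRTC registers standing in for outb() to the VGA ports. */
static uint8_t crtc[0x19];

/* Simplified hardware rule: with Protect set, writes to CR00-CR07 are dropped. */
static void crtc_write(uint8_t idx, uint8_t val)
{
    if (idx <= CRTC_OVERFLOW && (crtc[CRTC_VRETR_END] & CRTC_PROTECT))
        return;
    crtc[idx] = val;
}

/* Program the 10-bit vertical display end. */
void set_vdisp_end(unsigned int v)
{
    uint8_t ov;

    /* The fix: clear Protect first so the Overflow write is not dropped. */
    crtc_write(CRTC_VRETR_END, crtc[CRTC_VRETR_END] & ~CRTC_PROTECT);

    ov = crtc[CRTC_OVERFLOW] & ~0x42;   /* clear bits 1 and 6 */
    ov |= ((v >> 8) & 1) << 1;          /* bit 8 -> CR07 bit 1 */
    ov |= ((v >> 9) & 1) << 6;          /* bit 9 -> CR07 bit 6 */
    crtc_write(CRTC_OVERFLOW, ov);
    crtc_write(CRTC_VDISP_END, v & 0xff);
}

/* Reassemble the 10-bit value from the simulated registers. */
unsigned int get_vdisp_end(void)
{
    unsigned int ov = crtc[CRTC_OVERFLOW];

    return crtc[CRTC_VDISP_END] | ((ov >> 1) & 1) << 8 | ((ov >> 6) & 1) << 9;
}
```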

This is a bug imported from the assembly version of this code. It was
pointed out by Etienne Lorrain.

Cc: Etienne Lorrain
Signed-off-by: H. Peter Anvin

H. Peter Anvin
2007-07-19 02:36:17 +0800
5593eaa85 [x86 setup] Fix assembly constraints

Fix incorrect assembly constraints. In particular, fix memory
constraints used inside push..pop, which can cause invalid operation
since gcc may generate %esp-relative references.

Additionally:

outl() should have "dN" not "dn".

query_mca() shouldn't list 16/32-bit registers in an 8-bit only
context.

has_eflag(): the "mask" is only used well after both the stack pointer
and the output registers have been touched; this requires the output
registers to be earlyclobbers (=&) and the input to exclude memory (so
"ri", not "g").

Thanks to Etienne Lorrain and Chuck Ebbert for prompting this review.

Cc: Etienne Lorrain
Cc: Chuck Ebbert
Signed-off-by: H. Peter Anvin

H. Peter Anvin
2007-07-19 02:36:17 +0800
9aa3909c0 [x86 setup] build/tools.c: fix comment

Correct a comment in arch/i386/boot/build/tools.c; we now build the
kernel from only two components instead of three, since the boot
sector has been integrated into the setup code.

Signed-off-by: H. Peter Anvin

H. Peter Anvin
2007-07-19 02:36:17 +0800
d756d10e2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: extent macros cleanup
Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions.
ext4: remove extra IS_RDONLY() check
ext4: Use is_power_of_2()
Use zero_user_page() in ext4 where possible
ext4: Remove 65000 subdirectory limit
ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields
ext4: Add nanosecond timestamps
jbd2: Move jbd2-debug file to debugfs
jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG
ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices
ext4: Make extents code sanely handle on-disk corruption
ext4: copy i_flags to inode flags on write
ext4: Enable extents by default
Change on-disk format to support 2^15 uninitialized extents
write support for preallocated blocks
fallocate support in ext4
sys_fallocate() implementation on i386, x86_64 and powerpc

Linus Torvalds
2007-07-19 01:32:00 +0800

18 Jul, 2007

20 commits

dfdcdd42f xen: disable all non-virtual drivers

A domU Xen environment has no non-virtual drivers, so make sure
they're all disabled at once.

Signed-off-by: Jeremy Fitzhardinge
Cc: Rusty Russell

Jeremy Fitzhardinge
2007-07-18 23:47:46 +0800
9ec2b804e xen: use iret directly when possible

Most of the time we can simply use the iret instruction to exit the
kernel, rather than having to use the iret hypercall - the only
exception is if we're returning into vm86 mode, or from delivering an
NMI (which we don't support yet).

When running native, iret has the behaviour of testing for a pending
interrupt atomically with re-enabling interrupts. Unfortunately
there's no way to do this with Xen, so there's a window in which we
could get a recursive exception after enabling events but before
actually returning to userspace.

This causes a problem: if the nested interrupt causes one of the
task's TIF_WORK_MASK flags to be set, they will not be checked again
before returning to userspace. This means that pending work may be
left pending indefinitely, until the process enters and leaves the
kernel again. The net effect is that a pending signal or reschedule
event could be delayed for an unbounded amount of time.

To deal with this, the xen event upcall handler checks to see if the
EIP is within the critical section of the iret code, after events
are (potentially) enabled up to the iret itself. If it's within this
range, it calls the iret critical section fixup, which adjusts the
stack to deal with any unrestored registers, and then shifts the
stack frame up to replace the previous invocation.
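The range test at the heart of the fixup can be sketched as follows. The bounds would come from assembler labels around the critical section; here `struct crit_section` and `in_iret_critical_section()` are hypothetical, and the end bound is treated as exclusive by assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of the iret critical section. */
struct crit_section {
    uintptr_t start;   /* first instruction after events may be enabled */
    uintptr_t end;     /* first address past the window (exclusive)     */
};

/* Decide whether an event upcall interrupted the exit path inside the
 * critical window. If so, the handler must fix up the stack frame rather
 * than simply return, or pending TIF_WORK_MASK work could be missed. */
bool in_iret_critical_section(const struct crit_section *cs, uintptr_t eip)
{
    return eip >= cs->start && eip < cs->end;
}
```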

Signed-off-by: Jeremy Fitzhardinge

Jeremy Fitzhardinge
2007-07-18 23:47:46 +0800
600b2fc24 xen: suppress abs symbol warnings for unused reloc pointers

arch/i386/xen/xen-asm.S defines some small pieces of code which are
used to implement a few paravirt_ops. They're designed so they can be
used either in-place, or be inline patched into their callsites if
there's enough space.

Some of those operations need to make calls out (specifically, if you
re-enable events [interrupts], and there's a pending event at that
time). These calls need the call instruction to be relocated if the
code is patched inline. In this case xen_foo_reloc is a
section-relative symbol which points to xen_foo's required relocation.

Other operations have no need of a relocation, and so their
corresponding xen_bar_reloc is absolute 0. These are the cases which
are triggering the warning.

This patch adds those symbols to the list of safe abs symbols.

Signed-off-by: Jeremy Fitzhardinge
Cc: Adrian Bunk

Jeremy Fitzhardinge
2007-07-18 23:47:45 +0800
6487673b8 xen: Attempt to patch inline versions of common operations

This patch adds the mechanism to allow us to patch inline versions of
common operations.

The implementations of the direct-access versions save_fl, restore_fl,
irq_enable and irq_disable are now in assembler, and the same code is
used for both out of line and inline uses.

Signed-off-by: Jeremy Fitzhardinge
Cc: Chris Wright
Cc: Keir Fraser

Jeremy Fitzhardinge
2007-07-18 23:47:45 +0800
60223a326 xen: Place vcpu_info structure into per-cpu memory

An experimental patch for Xen allows guests to place their vcpu_info
structs anywhere. We try to use this to place the vcpu_info into the
PDA, which allows direct access.

If this works, then switch to using direct access operations for
irq_enable, disable, save_fl and restore_fl.

Signed-off-by: Jeremy Fitzhardinge
Cc: Chris Wright
Cc: Keir Fraser

Jeremy Fitzhardinge
2007-07-18 23:47:45 +0800
3e2b8fbee xen: handle external requests for shutdown, reboot and sysrq

The guest domain can be asked to shut down or reboot itself, or have a
sysrq key injected, via xenbus. This patch adds a watcher for those
events, and does the appropriate action.

Signed-off-by: Jeremy Fitzhardinge
Cc: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:45 +0800
fefa629ab xen: machine operations

Make the appropriate hypercalls to halt and reboot the virtual machine.

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:45 +0800
b536b4b96 xen: use the hvc console infrastructure for Xen console

Implement a Xen back-end for hvc console.

* * *
Add early printk support via hvc console, enable using
"earlyprintk=xen" on the kernel command line.

From: Gerd Hoffmann
Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright
Acked-by: Ingo Molnar
Acked-by: Olof Johansson

Jeremy Fitzhardinge
2007-07-18 23:47:44 +0800
8b84ad942 xen: hack to prevent bad segment register reload

The hypervisor saves and restores the segment registers as part of the
state it saves while context switching. If, during a context switch,
the next process doesn't use the TLS segments, it invalidates the GDT
entry, causing the segment register reload to fault. This fault
effectively doubles the cost of a context switch.

This patch is a band-aid workaround which clears the usermode %gs
after it has been saved for the previous process, but before it gets
reloaded for the next, and it avoids having the hypervisor attempt to
erroneously reload it.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:44 +0800
d66bf8fcf xen: lazy-mmu operations

This patch uses the lazy-mmu hooks to batch mmu operations where
possible. This is primarily useful for batching operations applied to
active pagetables, which happens during mprotect, munmap, mremap and
the like (mmap does not do bulk pagetable operations, so it isn't
helped).
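A minimal sketch of lazy-mode batching, assuming a fixed-size queue that is flushed with one (simulated) hypercall when lazy mode ends or the queue fills. All names and the batch size are hypothetical; the real paravirt hooks and the Xen multicall interface differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32   /* assumed batch capacity */

/* One queued pagetable update: where to write and what to write. */
struct mmu_update_req {
    uint64_t ptr;
    uint64_t val;
};

struct mmu_batch {
    struct mmu_update_req req[BATCH_MAX];
    size_t count;
    int lazy;               /* nonzero while in lazy MMU mode */
    unsigned long flushes;  /* simulated hypercalls issued    */
};

/* Stand-in for a single hypercall carrying every queued update. */
static void batch_flush(struct mmu_batch *b)
{
    if (b->count == 0)
        return;
    b->flushes++;
    b->count = 0;
}

void lazy_mmu_enter(struct mmu_batch *b) { b->lazy = 1; }

void lazy_mmu_leave(struct mmu_batch *b)
{
    batch_flush(b);         /* queued updates must reach the hypervisor */
    b->lazy = 0;
}

/* Queue an update; outside lazy mode, or when the queue is full,
 * it is flushed immediately. */
void queue_update(struct mmu_batch *b, uint64_t ptr, uint64_t val)
{
    b->req[b->count].ptr = ptr;
    b->req[b->count].val = val;
    b->count++;
    if (!b->lazy || b->count == BATCH_MAX)
        batch_flush(b);
}
```

An mprotect over ten pages would then cost one flush instead of ten.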

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:44 +0800
f120f13ea xen: Add support for preemption

Add Xen support for preemption. This is mostly a cleanup of existing
preempt_enable/disable calls, or just comments to explain the current
usage.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:44 +0800
f87e4cac4 xen: SMP guest support

This is a fairly straightforward Xen implementation of smp_ops.

Xen has its own IPI mechanisms, and has no dependency on any
APIC-based IPI. The smp_ops hooks and the flush_tlb_others pv_op
allow a Xen guest to avoid all APIC code in arch/i386 (the only apic
operation is a single apic_read for the apic version number).

One subtle point which needs to be addressed is unpinning pagetables
when another cpu may have a lazy tlb reference to the pagetable. Xen
will not allow an in-use pagetable to be unpinned, so we must find any
other cpus with a reference to the pagetable and get them to shoot
down their references.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright
Cc: Benjamin LaHaise
Cc: Ingo Molnar
Cc: Andi Kleen

Jeremy Fitzhardinge
2007-07-18 23:47:44 +0800
ab5502888 xen: Implement sched_clock

Implement xen_sched_clock, which returns the number of ns the current
vcpu has actually been in an unstolen state (i.e. running or blocked,
vs runnable-but-not-running, or offline) since boot.
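As a sketch, the unstolen time is the sum of the running and blocked runstate counters. The runstate indices below are modeled on Xen's `vcpu_runstate_info` but are an assumption here, and the sketch omits the time accrued in the current state since its last update.

```c
#include <stdint.h>

/* Runstate indices (assumed, modeled on Xen's runstate info). */
enum { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE, RS_MAX };

struct runstate {
    uint64_t time[RS_MAX];   /* cumulative ns spent in each state */
};

/* Unstolen time: ns the vcpu was running or voluntarily blocked.
 * Runnable-but-not-running and offline time are excluded. */
uint64_t unstolen_ns(const struct runstate *rs)
{
    return rs->time[RS_RUNNING] + rs->time[RS_BLOCKED];
}
```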

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Wright
Cc: john stultz

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
f91a8b447 xen: Account for stolen time

This patch accounts for the time stolen from our VCPUs. Stolen time is
time where a vcpu is runnable and could be running, but all available
physical CPUs are being used for something else.

This accounting gets run on each timer interrupt, just as a way to get
it run relatively often, and when interesting things are going on.
Stolen time is not really used by much in the kernel; it is reported
in /proc/stat, and that's about it.
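A sketch of per-tick stolen-time accounting: remember the previous stolen total and report the growth since the last timer interrupt. It counts runnable time and, as an assumption, offline time as stolen; the names and runstate indices are hypothetical.

```c
#include <stdint.h>

/* Runstate indices (assumed, modeled on Xen's runstate info). */
enum { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE, RS_MAX };

struct runstate {
    uint64_t time[RS_MAX];   /* cumulative ns per state */
};

/* Per-vcpu snapshot from the previous timer interrupt. */
struct stolen_acct {
    uint64_t last_stolen;    /* runnable + offline total at last tick */
};

/* Called on each timer interrupt: return the ns stolen since the previous
 * call, i.e. the growth of time spent runnable-but-not-running or offline. */
uint64_t account_stolen(struct stolen_acct *a, const struct runstate *rs)
{
    uint64_t stolen = rs->time[RS_RUNNABLE] + rs->time[RS_OFFLINE];
    uint64_t delta = stolen - a->last_stolen;

    a->last_stolen = stolen;
    return delta;
}
```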

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Wright
Cc: john stultz
Cc: Rik van Riel

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
9a4029fd3 xen: ignore RW mapping of RO pages in pagetable_init

When setting up the initial pagetable, which includes mappings of all
low physical memory, ignore a mapping which tries to set the RW bit on
an RO pte. An RO pte indicates a page which is part of the current
pagetable, and so it cannot be allowed to become RW.

Once xen_pagetable_setup_done is called, set_pte reverts to its normal
behaviour.
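The filter described above can be sketched as a mask on the new pte value. This assumes x86 pte bit positions (present = bit 0, RW = bit 1) and interprets "ignore" as keeping the RW bit at its old value; `filter_early_pte()` and `pagetable_setup_done` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x001ULL   /* x86 present bit */
#define PTE_RW      0x002ULL   /* x86 writable bit */

/* Set once the initial pagetable is complete. */
static bool pagetable_setup_done = false;

/* Early set_pte filter: while building the initial mappings, a present
 * pte may not gain RW it did not already have, since an RO pte marks a
 * page belonging to the live pagetable. The mask keeps every bit of the
 * new value except RW, which is limited to the old RW setting. */
uint64_t filter_early_pte(uint64_t old_pte, uint64_t new_pte)
{
    if (!pagetable_setup_done && (old_pte & PTE_PRESENT))
        return new_pte & ((old_pte & PTE_RW) | ~PTE_RW);
    return new_pte;   /* normal behaviour */
}
```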

Signed-off-by: Jeremy Fitzhardinge
Acked-by: Chris Wright
Cc: ebiederm@xmission.com (Eric W. Biederman)

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
f4f97b3ea xen: Complete pagetable pinning

Xen requires all active pagetables to be marked read-only. When the
base of the pagetable is loaded into %cr3, the hypervisor validates
the entire pagetable and only allows the load to proceed if it all
checks out.

This is pretty slow, so to mitigate this cost Xen has a notion of
pinned pagetables. Pinned pagetables are pagetables which are
considered to be active even if no processor's cr3 is pointing to them.
This means that they must remain read-only and all updates are validated
by the hypervisor. This makes context switches much cheaper, because
the hypervisor doesn't need to revalidate the pagetable each time.

This also adds a new paravirt hook which is called during setup once
the zones and memory allocator have been initialized. When the
init_mm pagetable is first built, the struct page array does not yet
exist, and so there's nowhere to put the init_mm pagetable's PG_pinned
flags. Once the zones are initialized and the struct page array
exists, we can set the PG_pinned flags for those pages.

This patch also adds the Xen support for pte pages allocated out of
highmem (highpte) by implementing xen_kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright
Cc: Zach Amsden

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
e738fca8d xen: configuration

Put config options for Xen after the core pieces are in place.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
15c84731d xen: time implementation

Xen maintains a base clock which measures nanoseconds since system
boot. This is provided to guests via a shared page which contains a
base time in ns, a tsc timestamp at that point and tsc frequency
parameters. Guests can compute the current time by reading the tsc
and using it to extrapolate the current time from the basetime. The
hypervisor makes sure that the frequency parameters are updated
regularly, particularly if the tsc changes rate or stops.
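The extrapolation can be sketched as a scaled TSC delta added to the base time. The struct fields are modeled on Xen's `vcpu_time_info` and should be treated as assumptions; a real reader would also retry while the structure's version counter shows an update in progress, which this sketch omits.

```c
#include <stdint.h>

/* Per-vcpu time parameters published by the hypervisor (assumed shape). */
struct vcpu_time_info {
    uint64_t tsc_timestamp;      /* TSC value when system_time was sampled */
    uint64_t system_time;        /* ns since boot at tsc_timestamp         */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point ns-per-tick factor   */
    int8_t   tsc_shift;          /* pre-scale applied to the TSC delta     */
};

/* Convert a TSC delta to ns: shift, then multiply by a 32.32 factor.
 * The multiply is split into high and low halves so that
 * (delta * mul) >> 32 is computed without a 96-bit intermediate. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* Extrapolate the current time from the last published base time. */
uint64_t xen_time_ns(const struct vcpu_time_info *t, uint64_t tsc_now)
{
    return t->system_time +
           scale_delta(tsc_now - t->tsc_timestamp, t->tsc_to_system_mul,
                       t->tsc_shift);
}
```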

This is implemented as a clocksource, so the interface to the rest of
the kernel is a simple clocksource which simply returns the current
time directly in nanoseconds.

Xen also provides a simple timer mechanism, which allows a timeout to
be set in the future. When that time arrives, a timer event is sent
to the guest. There are two timer interfaces:
- An old one which also delivers a stream of (unused) ticks at 100Hz,
and on the same event, the actual timer events. The 100Hz ticks
cause a lot of spurious wakeups, but are basically harmless.
- The new timer interface doesn't have the 100Hz ticks, and can also
fail if the specified time is in the past.

This code presents the Xen timer as a clockevent driver, and uses the
new interface by preference.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright
Cc: Ingo Molnar
Cc: Thomas Gleixner

Jeremy Fitzhardinge
2007-07-18 23:47:43 +0800
e46cdb66c xen: event channels

Xen implements interrupts in terms of event channels. Each guest
domain gets 1024 event channels which can be used for a variety of
purposes, such as Xen timer events, inter-domain events,
inter-processor events (IPI) or for real hardware IRQs.

Within the kernel, we map the event channels to IRQs, and implement
the whole interrupt handling using a Xen irq_chip.

Rather than setting NR_IRQS to 1024 under PARAVIRT in order to
accommodate Xen, we create a dynamic mapping between event channels and
IRQs. Ideally, Linux will eventually move towards dynamically
allocating per-irq structures, and we can use a 1:1 mapping between
event channels and irqs.
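A minimal sketch of the dynamic mapping: an event channel gets the first free IRQ on first use and keeps it afterwards. Apart from the 1024 channels stated above, the sizes and all function names are hypothetical.

```c
#define NR_EVENT_CHANNELS 1024   /* per-domain event channels */
#define MAX_IRQS          224    /* hypothetical number of Linux IRQs */

static int evtchn_to_irq[NR_EVENT_CHANNELS];
static int irq_to_evtchn[MAX_IRQS];

/* Mark every channel and every IRQ as unbound. */
void evtchn_map_init(void)
{
    for (int i = 0; i < NR_EVENT_CHANNELS; i++)
        evtchn_to_irq[i] = -1;
    for (int i = 0; i < MAX_IRQS; i++)
        irq_to_evtchn[i] = -1;
}

/* Return the IRQ bound to an event channel, allocating the first free
 * IRQ on first use. Returns -1 for an invalid channel or if no IRQ is free. */
int bind_evtchn_sketch(int evtchn)
{
    if (evtchn < 0 || evtchn >= NR_EVENT_CHANNELS)
        return -1;
    if (evtchn_to_irq[evtchn] != -1)
        return evtchn_to_irq[evtchn];   /* already bound */

    for (int irq = 0; irq < MAX_IRQS; irq++) {
        if (irq_to_evtchn[irq] == -1) {
            irq_to_evtchn[irq] = evtchn;
            evtchn_to_irq[evtchn] = irq;
            return irq;
        }
    }
    return -1;
}
```

Only channels actually in use consume an IRQ, which is why NR_IRQS does not need to grow to 1024.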

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright
Cc: Ingo Molnar
Cc: Eric W. Biederman

Jeremy Fitzhardinge
2007-07-18 23:47:42 +0800
3b827c1b3 xen: virtual mmu

Xen pagetable handling, including the machinery to implement direct
pagetables.

Xen presents the real CPU's pagetables directly to guests, with no
added shadowing or other layer of abstraction. Naturally this means
the hypervisor must maintain close control over what the guest can put
into the pagetable.

When the guest modifies the pte/pmd/pgd, it must convert its
domain-specific notion of a "physical" pfn into a global machine frame
number (mfn) before inserting the entry into the pagetable. Xen will
check to make sure the domain is allowed to create a mapping of the
given mfn.
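A sketch of the pfn-to-mfn conversion when building a pte, assuming a small hypothetical physical-to-machine table, 4 KiB pages, and flags held in the low 12 bits of the entry.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTE_FLAGS_MASK 0xfffULL   /* low 12 bits hold pte flags */

/* Hypothetical physical-to-machine table: index is the guest's
 * pseudo-physical frame number, value is the real machine frame number. */
static const uint64_t p2m[] = { 0x1a0, 0x3f7, 0x052, 0x900 };

/* Build a pte holding a machine frame rather than a guest pfn. Xen still
 * checks that the domain may map the mfn when the entry is installed. */
uint64_t make_machine_pte(uint64_t pfn, uint64_t flags)
{
    if (pfn >= sizeof(p2m) / sizeof(p2m[0]))
        return 0;   /* unknown pfn: return a non-present pte */
    return (p2m[pfn] << PAGE_SHIFT) | (flags & PTE_FLAGS_MASK);
}
```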

Xen also requires that all mappings the guest has of its own active
pagetable are read-only. This is relatively easy to implement in
Linux because all pagetables share the same pte pages for kernel
mappings, so updating the pte in one pagetable will implicitly update
the mapping in all pagetables.

Normally a pagetable becomes active when you point to it with cr3 (or
the Xen equivalent), but when you do so, Xen must check the whole
pagetable for correctness, which is clearly a performance problem.

Xen solves this with pinning which keeps a pagetable effectively
active even if it's currently unused, which means that all the normal
update rules are enforced. This means that it need not revalidate the
pagetable when loading cr3.

This patch has a first-cut implementation of pinning, but it is more
fully implemented in a later patch.

Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Chris Wright

Jeremy Fitzhardinge
2007-07-18 23:47:42 +0800