18 Oct, 2007

1 commit

  • * 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    xfs: eagerly remove vmap mappings to avoid upsetting Xen
    xen: add some debug output for failed multicalls
    xen: fix incorrect vcpu_register_vcpu_info hypercall argument
    xen: ask the hypervisor how much space it needs reserved
    xen: lock pte pages while pinning/unpinning
    xen: deal with stale cr3 values when unpinning pagetables
    xen: add batch completion callbacks
    xen: yield to IPI target if necessary
    Clean up duplicate includes in arch/i386/xen/
    remove dead code in pgtable_cache_init
    paravirt: clean up lazy mode handling
    paravirt: refactor struct paravirt_ops into smaller pv_*_ops

    Linus Torvalds
     

17 Oct, 2007

3 commits

  • Instead of using magic macros for boot_params access, simply use the
    boot_params structure.

    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
    1. enter lazy cpu mode
    2. leave lazy cpu mode
    3. enter lazy mmu mode
    4. leave lazy mmu mode
    5. flush pending batched operations

    This complicates each paravirt backend, since it needs to deal with
    all the possible state transitions, handling flushing, etc. In
    particular, flushing is quite distinct from the other 4 functions, and
    seems to just cause complication.

    This patch removes the set_lazy_mode operation, and adds "enter" and
    "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic
    associated with entering and leaving lazy states is now in common code
    (basically BUG_ONs to make sure that no mode is current when entering
    a lazy mode, and make sure that the mode is current when leaving).
    Also, flush is handled in a common way, by simply leaving and
    re-entering the lazy mode.

    The result is that the Xen, lguest and VMI lazy mode implementations
    are much simpler.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Cc: Zach Amsden
    Cc: Rusty Russell
    Cc: Avi Kivity
    Cc: Anthony Liguory
    Cc: "Glauber de Oliveira Costa"
    Cc: Jun Nakajima

    Jeremy Fitzhardinge
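
The enter/leave scheme described in this commit can be pictured with a small userspace sketch. It is not the kernel code: the enum, the `backend_*` hooks and the flush counter are hypothetical names, and `assert` stands in for `BUG_ON`. It only illustrates the idea that the state checks live in common code and that a flush is a leave followed by a re-enter.

```c
#include <assert.h>

/* Hypothetical stand-ins for the lazy-mode state; names are illustrative. */
enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

static enum lazy_mode current_mode = LAZY_NONE;
static int backend_flushes;   /* how often the backend flushed its batch */

/* Backend hooks: a real backend would start or submit a batch here. */
static void backend_enter(void) { }
static void backend_leave(void) { backend_flushes++; }

/* Common code: no lazy mode may be active when entering one. */
static void enter_lazy(enum lazy_mode mode)
{
    assert(current_mode == LAZY_NONE);   /* plays the role of BUG_ON */
    current_mode = mode;
    backend_enter();
}

/* Common code: the mode being left must be the active one. */
static void leave_lazy(enum lazy_mode mode)
{
    assert(current_mode == mode);        /* plays the role of BUG_ON */
    backend_leave();
    current_mode = LAZY_NONE;
}

/* Flush needs no backend hook of its own: leave, then re-enter. */
static void flush_lazy(void)
{
    enum lazy_mode mode = current_mode;

    if (mode != LAZY_NONE) {
        leave_lazy(mode);
        enter_lazy(mode);
    }
}
```

With this shape a backend only implements the two hooks; after `enter_lazy(LAZY_MMU)`, a call to `flush_lazy()` submits the pending batch once and leaves the caller still in MMU lazy mode.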
     
  • This patch refactors the paravirt_ops structure into groups of
    functionally related ops:

    pv_info - random info, rather than function entrypoints
    pv_init_ops - functions used at boot time (some for module_init too)
    pv_misc_ops - lazy mode, which didn't fit well anywhere else
    pv_time_ops - time-related functions
    pv_cpu_ops - various privileged instruction ops
    pv_irq_ops - operations for managing interrupt state
    pv_apic_ops - APIC operations
    pv_mmu_ops - operations for managing pagetables

    There are several motivations for this:

    1. Some of these ops will be general to all x86, and some will be
    i386/x86-64 specific. This makes it easier to share common stuff
    while allowing separate implementations where needed.

    2. At the moment we must export all of paravirt_ops, but modules only
    need selected parts of it. This allows us to export on a case by case
    basis (and also choose which export license we want to apply).

    3. Functional groupings make things a bit more readable.

    Struct paravirt_ops is now only used as a template to generate
    patch-site identifiers, and to extract function pointers for inserting
    into jmp/calls when patching. It is only instantiated when needed.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Zach Amsden
    Cc: Avi Kivity
    Cc: Anthony Liguory
    Cc: "Glauber de Oliveira Costa"
    Cc: Jun Nakajima

    Jeremy Fitzhardinge
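
A rough sketch of the grouping described in this commit, under loud assumptions: the two op groups are trimmed to two members each, the `demo_*` functions, the `PATCH_ID` macro and `get_patch_target()` are hypothetical, and the layout relies on every member being a function pointer of the same size. It shows how a template structure can yield patch-site identifiers via `offsetof` while the individual groups stay separate objects.

```c
#include <stddef.h>
#include <string.h>

typedef void (*pv_fn)(void);

/* Heavily trimmed op groups; the real structures have many more members. */
struct pv_cpu_ops {
    pv_fn clts;
    pv_fn io_delay;
};

struct pv_irq_ops {
    pv_fn irq_disable;
    pv_fn irq_enable;
};

/* Template only: used for layout, never kept around as a global. */
struct paravirt_ops {
    struct pv_cpu_ops pv_cpu_ops;
    struct pv_irq_ops pv_irq_ops;
};

/* A patch-site identifier is the op's slot index within the template. */
#define PATCH_ID(op) (offsetof(struct paravirt_ops, op) / sizeof(pv_fn))

/* Placeholder implementations standing in for native operations. */
static void demo_clts(void) { }
static void demo_io_delay(void) { }
static void demo_irq_disable(void) { }
static void demo_irq_enable(void) { }

/* Separate objects, so each group could be exported on its own terms. */
static struct pv_cpu_ops pv_cpu_ops = {
    .clts = demo_clts,
    .io_delay = demo_io_delay,
};
static struct pv_irq_ops pv_irq_ops = {
    .irq_disable = demo_irq_disable,
    .irq_enable = demo_irq_enable,
};

/* Instantiate the template only when needed and read out one slot. */
static pv_fn get_patch_target(size_t id)
{
    struct paravirt_ops tmpl = {
        .pv_cpu_ops = pv_cpu_ops,
        .pv_irq_ops = pv_irq_ops,
    };
    pv_fn fn;

    memcpy(&fn, (unsigned char *)&tmpl + id * sizeof(pv_fn), sizeof(fn));
    return fn;
}
```

In this sketch `PATCH_ID(pv_irq_ops.irq_enable)` names a patch site, and `get_patch_target()` turns that identifier back into the function pointer to insert into a call or jmp.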
     

25 Sep, 2007

1 commit

  • The assembly templates for lguest guest patching are in the .init.text
    section. This means that modules get patched with "cc cc cc cc" or similar
    junk.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

13 Sep, 2007

1 commit

  • One of the very first things lguest_init() does is a memcpy. On
    Athlon/Duron/K7 or CyrixIII/VIA-C3 or Geode GX/LX, this tries to use
    MMX.

    memcpy -> _mmx_memcpy -> kernel_fpu_begin -> clts -> paravirt_ops.clts

    But we haven't set paravirt_ops.clts yet, so we do the native version
    and crash. The simplest solution is to use __memcpy.

    Thanks to Michael Rasenberger for the bug report.

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

31 Aug, 2007

1 commit

  • If the stack pointer is 0xc057a000, then the first stack page is at
    0xc0579000 (the stack pointer is decremented before use). Not
    calculating this correctly caused guests with CONFIG_DEBUG_PAGEALLOC=y
    to be killed with a "bad stack page" message: the initial kernel stack
    was just preceding the .smp_locks section which
    CONFIG_DEBUG_PAGEALLOC marks read-only when freeing.

    Thanks to Frederik Deweerdt for the bug report!

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
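
The arithmetic in this commit message can be checked with a short sketch. The function names and the 4096-byte page size are assumptions for illustration, and `stack_page_naive()` is only a guess at the kind of miscalculation described, not the code that was actually fixed.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed i386 page size */

/* Illustrative mistake: round the stack pointer itself down to a page. */
static uint32_t stack_page_naive(uint32_t sp)
{
    return sp & ~(PAGE_SIZE - 1);
}

/*
 * The stack pointer is decremented before use, so the first byte that
 * can be written is at sp - 1. Round that address down instead.
 */
static uint32_t stack_page(uint32_t sp)
{
    return (sp - 1) & ~(PAGE_SIZE - 1);
}
```

For a stack pointer of 0xc057a000, `stack_page()` gives 0xc0579000, the page named in the commit message, while the naive version gives 0xc057a000, which is the page after the stack.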
     

24 Aug, 2007

1 commit


12 Aug, 2007

2 commits

  • Commit 19d36ccdc34f5ed444f8a6af0cbfdb6790eb1177 "x86: Fix alternatives
    and kprobes to remap write-protected kernel text" uses code which is
    being patched for patching.

    In particular, paravirt_ops does patching in two stages: first it
    calls paravirt_ops.patch, then it fills any remaining instructions
    with nop_out(). nop_out calls text_poke() which calls
    lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
    that call site is one of the places we patch.

    If we always do patching as one single call to text_poke(), we only
    need to make sure we're not patching the memcpy in text_poke itself.
    This means the prototype to paravirt_ops.patch needs to change, to
    marshal the new code into a buffer rather than patching in place as it
    does now. It also means all patching goes through text_poke(), which
    is known to be safe (apply_alternatives is also changed to make a
    single patch).

    AK: fix compilation on x86-64 (bad rusty!)
    AK: fix boot on x86-64 (sigh)
    AK: merged with other patches

    Signed-off-by: Rusty Russell
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
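
The "marshal into a buffer, then poke once" approach from this commit can be sketched in userspace as follows. Everything here is an assumption made for illustration: `fake_text_poke()` is a stand-in and not the kernel's text_poke(), the `patch_fn` prototype is hypothetical, and the 16-byte buffer is an arbitrary bound.

```c
#include <stddef.h>
#include <string.h>

#define NOP_BYTE 0x90        /* x86 one-byte nop */
#define MAX_PATCH_LEN 16     /* assumed upper bound on a patch site */

/* Hypothetical patch hook: marshals new code into buf, returns bytes used. */
typedef size_t (*patch_fn)(unsigned char *buf, size_t len);

static int poke_calls;       /* how many times the patch site was written */

/* Userspace stand-in for text_poke(): the single writer of the site. */
static void fake_text_poke(unsigned char *addr, const unsigned char *code,
                           size_t len)
{
    poke_calls++;
    memcpy(addr, code, len);
}

/* Example hook: replace the site with a one-byte cli (0xfa). */
static size_t patch_cli(unsigned char *buf, size_t len)
{
    if (len < 1)
        return 0;
    buf[0] = 0xfa;
    return 1;
}

static void apply_patch(unsigned char *site, size_t len, patch_fn patch)
{
    unsigned char buf[MAX_PATCH_LEN];
    size_t used;

    if (len > sizeof(buf))
        return;                     /* site too large for this sketch */

    memcpy(buf, site, len);         /* start from the existing bytes */
    used = patch(buf, len);         /* hook fills the buffer, not the site */
    if (used > len)
        used = len;
    memset(buf + used, NOP_BYTE, len - used);   /* pad the rest with nops */

    fake_text_poke(site, buf, len); /* one write, after all code is built */
}
```

Because the hook and the nop padding only touch the buffer, nothing executes half-patched code while the replacement is being assembled; the live site changes in exactly one call.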
     
  • Files using bits from paravirt.h should explicitly include it rather than
    relying on it being pulled in by something else.

    Signed-off-by: Jes Sorensen
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jes Sorensen
     

09 Aug, 2007

2 commits

  • If a Guest makes a hypercall which sets a GDT entry to not present, we
    currently set any segment registers using that GDT entry to 0.
    Unfortunately, this is not sufficient: there are other ways of
    altering GDT entries which will cause a fault.

    The correct solution is to do what Linux does: let them set any GDT value
    they want and handle the #GP when popping causes a fault. This has
    the added benefit of making our Switcher slightly more robust in the
    case of any other bugs which cause it to fault.

    We kill the Guest if it causes a fault in the Switcher: it's the
    Guest's responsibility to make sure it's not using segments when it
    changes them.

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest uses a host-supplied wallclock-based clocksource when the TSC
    is not reliable. As this is already in nanoseconds, I naively used a
    multiplier of 1 and a shift of 0.

    But update_wall_time() in its infinite wisdom decides to adjust the
    clock a little (where does it think it's getting a more accurate time
    from?)

    It will happily tweak the multiplier... to 0, then -1.

    So the "fix" is to use a shift of 22 like everyone else, and a
    multiplier of 1 << 22.

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
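
Why a multiplier of 1 is fragile can be seen from the usual clocksource conversion, ns = (cycles * mult) >> shift. The helper below is a hypothetical userspace sketch (`cycles_to_ns` is not a kernel function name) that applies that formula to the two mult/shift pairs mentioned in the commit message.

```c
#include <stdint.h>

/* Clocksource-style conversion: ns = (cycles * mult) >> shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

With mult = 1 and shift = 0, lowering the multiplier by one step gives 0, so the clock stops entirely. With mult = 1 << 22 and shift = 22 the conversion is still the identity for a nanosecond source, but one step of adjustment changes the rate by only about one part in four million.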
     

07 Aug, 2007

1 commit

  • Lguest drivers need to default to "Y" otherwise they're never selected
    for new builds. (We don't bother prompting, because they're less than
    4k combined, and implied by selecting lguest support).

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

30 Jul, 2007

1 commit


29 Jul, 2007

2 commits

  • A non-periodic clock_event_device and the "jiffies" clock don't mix well:
    tick_handle_periodic() can go into an infinite loop.

    Currently lguest guests use the jiffies clock when the TSC is
    unusable. Instead, make the Host write the current time into the lguest
    page on every interrupt. This doesn't cost much but is more precise
    and at least as accurate as the jiffies clock. It also gets rid of
    the GET_WALLCLOCK hypercall.

    Also, delay setting sched_clock until our clock is set up, otherwise
    the early printk timestamps can go backwards (not harmful, just ugly).

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Jason Yeh sent his crashing .config: bzImages made with
    CONFIG_RELOCATABLE=y put the relocs where the BSS is expected, and we
    crash with unusual results such as:

    lguest: unhandled trap 14 at 0xc0122ae1 (0xa9)

    Relying on BSS being zero was merely laziness on my part, and
    unfortunately, lguest doesn't go through the normal startup path (which
    does this in asm).

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

27 Jul, 2007

7 commits


22 Jul, 2007

1 commit

  • We need to make sure that the clockevent devices are resumed before
    the tick is resumed. The current resume logic does not guarantee this.

    Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
    event devices before resuming the tick / oneshot functionality.

    Fixup the existing users.

    Thanks to Nigel Cunningham for tracking down a long standing thinko,
    which affected the jinxed VAIO.

    [akpm@linux-foundation.org: xen build fix]
    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
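
The ordering requirement in this commit can be illustrated with a small userspace model. The structure is a trimmed, hypothetical version of a clock event device, and `demo_set_mode()`, `demo_tick_resume()` and `resume_sequence()` are invented names; only the mode constants follow the commit message. The point is that the device sees CLOCK_EVT_MODE_RESUME before the tick is restarted on it.

```c
enum clock_event_mode {
    CLOCK_EVT_MODE_UNUSED,
    CLOCK_EVT_MODE_SHUTDOWN,
    CLOCK_EVT_MODE_PERIODIC,
    CLOCK_EVT_MODE_ONESHOT,
    CLOCK_EVT_MODE_RESUME,
};

/* Trimmed, hypothetical device: just enough state to show the ordering. */
struct clock_event_device {
    int hw_ready;                 /* hardware reprogrammed after suspend? */
    enum clock_event_mode mode;
    void (*set_mode)(enum clock_event_mode mode,
                     struct clock_event_device *dev);
};

/* Driver callback: existing users gain a case for the new mode. */
static void demo_set_mode(enum clock_event_mode mode,
                          struct clock_event_device *dev)
{
    switch (mode) {
    case CLOCK_EVT_MODE_RESUME:
        dev->hw_ready = 1;        /* restore state lost over suspend */
        break;
    case CLOCK_EVT_MODE_UNUSED:
    case CLOCK_EVT_MODE_SHUTDOWN:
    case CLOCK_EVT_MODE_PERIODIC:
    case CLOCK_EVT_MODE_ONESHOT:
        break;
    }
    dev->mode = mode;
}

/* Restart the tick; fails (-1) if the device was never resumed. */
static int demo_tick_resume(struct clock_event_device *dev)
{
    if (!dev->hw_ready)
        return -1;
    dev->set_mode(CLOCK_EVT_MODE_PERIODIC, dev);
    return 0;
}

/* Resume the device first, then the tick that depends on it. */
static int resume_sequence(struct clock_event_device *dev)
{
    dev->set_mode(CLOCK_EVT_MODE_RESUME, dev);
    return demo_tick_resume(dev);
}
```

Calling `demo_tick_resume()` directly on a device that has not been resumed fails in this model, which mirrors the missing guarantee the commit addresses; `resume_sequence()` succeeds because it enforces the order.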
     

21 Jul, 2007

3 commits


20 Jul, 2007

4 commits

  • It's void __user *, not void * __user...

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • This is the Kconfig and Makefile to allow lguest to actually be
    compiled.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • This is the code for the "lg.ko" module, which allows lguest guests to
    be launched.

    [akpm@linux-foundation.org: update for futex-new-private-futexes]
    [akpm@linux-foundation.org: build fix]
    [jmorris@namei.org: lguest: use hrtimers]
    [akpm@linux-foundation.org: x86_64 build fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need
    VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's
    5000 lines and self-contained.

    Performance is ok, but not great (-30% on kernel compile). But given its
    hackability, I expect this to improve, along with the paravirt_ops code which
    it supplies a complete example for. There's also a 64-bit version being
    worked on and other craziness.

    But most of all, lguest is awesome fun! Too much of the kernel is a big ball
    of hair. lguest is simple enough to dive into and hack, plus has some warts
    which scream "fork me!".

    This patch:

    This is the code and headers required to make an i386 kernel an lguest guest.

    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell