22 Jul, 2011

1 commit

  • The Host used to create some page tables for the Guest to use at the
    top of Guest memory; it would then tell the Guest where this was. In
    particular, it created linear mappings for 0 and 0xC0000000 addresses
    because lguest used to switch to its real page tables quite late in
    boot.

    However, since d50d8fe19, Linux has initialized the boot page tables in
    head_32.S even before the "are we lguest?" boot jump. So now we can
    simplify things: the Host pagetable code assumes a 1:1 linear mapping
    until the Guest first calls the LHCALL_NEW_PGTABLE hypercall, which we
    now do before we reach C code.

    This also means that the Host doesn't need to know anything about the
    Guest's PAGE_OFFSET. (Non-Linux guests might not even have such a
    thing).

    Signed-off-by: Rusty Russell

    Rusty Russell
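
As a rough illustration of the "1:1 until the first LHCALL_NEW_PGTABLE" idea, here is a minimal userspace sketch. Everything in it is hypothetical: the `toy_*` names, the single-level table and the 16-page limit are assumptions made for the example and are not the real lguest structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT32_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES     16u

/* Hypothetical per-Guest state; not the real lguest structures. */
struct toy_guest {
    bool linear_pages;          /* true until the first "new pgtable" call */
    uint32_t frame[TOY_NPAGES]; /* toy single-level table: page -> frame */
};

void toy_guest_init(struct toy_guest *g)
{
    g->linear_pages = true;
    for (uint32_t i = 0; i < TOY_NPAGES; i++)
        g->frame[i] = 0;
}

/* Stand-in for the Host side of LHCALL_NEW_PGTABLE. */
void toy_new_pgtable(struct toy_guest *g, const uint32_t *frames)
{
    for (uint32_t i = 0; i < TOY_NPAGES; i++)
        g->frame[i] = frames[i];
    g->linear_pages = false;
}

/* Translate a Guest virtual address to a Guest physical address. */
uint32_t toy_guest_pa(const struct toy_guest *g, uint32_t vaddr)
{
    /* Before the first hypercall the Host assumes virtual == physical. */
    if (g->linear_pages)
        return vaddr;

    uint32_t vpn = vaddr >> TOY_PAGE_SHIFT;
    assert(vpn < TOY_NPAGES);
    return (g->frame[vpn] << TOY_PAGE_SHIFT) | (vaddr & TOY_PAGE_MASK);
}
```

Note that nothing here needs to know the Guest's PAGE_OFFSET: the translation is either the identity or whatever the Guest's own tables say.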
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to the
    implementation .h or the embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
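
The first rule the script applies (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as below. The real script is the Python file linked above; this is only an illustrative C rendering, and the token lists and the `needed_include()` name are assumptions, not the patterns the script actually matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative token lists only; the real script's patterns are not
 * given in the commit message. */
static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC" };

int contains_any(const char *src, const char *const *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(src, tokens[i]))
            return 1;
    return 0;
}

/* Return the header a source text should include directly, or NULL if
 * it uses neither facility. slab wins over gfp because slab.h already
 * pulls in gfp.h. */
const char *needed_include(const char *src)
{
    assert(src != NULL);
    if (contains_any(src, slab_tokens,
                     sizeof(slab_tokens) / sizeof(slab_tokens[0])))
        return "linux/slab.h";
    if (contains_any(src, gfp_tokens,
                     sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
        return "linux/gfp.h";
    return NULL;
}
```

A plain substring match like this would also fire on comments and strings, which is one reason the manual review passes described above were still needed.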
     

30 Jul, 2009

2 commits


17 Jul, 2009

1 commit


01 Jul, 2009

1 commit

  • Change the eventfd interface to de-couple the eventfd memory context from
    the file pointer instance.

    Without such a change, there is no clean, race-free way to handle the
    POLLHUP event sent when the last instance of the file* goes away. Also,
    the internal eventfd APIs now use the eventfd context instead of the
    file*.

    This patch is required by KVM's IRQfd code, which is still under
    development.

    Signed-off-by: Davide Libenzi
    Cc: Gregory Haskins
    Cc: Rusty Russell
    Cc: Benjamin LaHaise
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
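
The decoupling can be pictured as a reference-counted context that outlives the file which created it. The sketch below is a userspace model under that assumption; the `toy_*` names are hypothetical and the kernel's real structure and locking are not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of an eventfd context with its own lifetime. */
struct toy_eventfd_ctx {
    unsigned int refs; /* one per holder: the file, plus in-kernel users */
    uint64_t count;    /* the eventfd counter itself */
};

/* Create a context holding the file's initial reference. */
struct toy_eventfd_ctx *toy_ctx_new(void)
{
    struct toy_eventfd_ctx *ctx = calloc(1, sizeof(*ctx));
    if (ctx)
        ctx->refs = 1;
    return ctx;
}

/* An in-kernel user (e.g. an irqfd-like consumer) takes its own reference. */
struct toy_eventfd_ctx *toy_ctx_get(struct toy_eventfd_ctx *ctx)
{
    ctx->refs++;
    return ctx;
}

/* Drop a reference; returns 1 if this was the last one and the context
 * was freed, 0 otherwise. */
int toy_ctx_put(struct toy_eventfd_ctx *ctx)
{
    assert(ctx->refs > 0);
    if (--ctx->refs == 0) {
        free(ctx);
        return 1;
    }
    return 0;
}

/* Signalling works on the context, never on the file. */
void toy_ctx_signal(struct toy_eventfd_ctx *ctx, uint64_t n)
{
    ctx->count += n;
}
```

With this shape, the last close of the file only drops one reference, so a consumer that still holds the context can react to the hangup without touching freed memory.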
     

12 Jun, 2009

8 commits

  • We no longer need an efficient mechanism to force the Guest back into
    host userspace, as each device is serviced without bothering the main
    Guest process (aka. the Launcher).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
    an address: the main Launcher process returns with this address, and figures
    out what device to run.

    A far nicer model is to let processes bind an eventfd to an address: if we
    find one, we simply signal the eventfd.

    Signed-off-by: Rusty Russell
    Cc: Davide Libenzi

    Rusty Russell
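
A minimal userspace sketch of "bind an eventfd to an address" follows. It uses the real Linux `eventfd(2)` interface for the signalling, but the table layout and the `toy_bind()` / `toy_notify()` names are assumptions made for illustration and do not reflect the Host's actual data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define TOY_MAX_BINDINGS 8

/* Hypothetical address -> eventfd table. */
struct toy_binding {
    unsigned long addr;
    int fd;
};

struct toy_eventfd_map {
    struct toy_binding map[TOY_MAX_BINDINGS];
    unsigned int num;
};

/* Attach an eventfd to a notify address; returns 0, or -1 if full. */
int toy_bind(struct toy_eventfd_map *m, unsigned long addr, int fd)
{
    assert(fd >= 0);
    if (m->num == TOY_MAX_BINDINGS)
        return -1;
    m->map[m->num].addr = addr;
    m->map[m->num].fd = fd;
    m->num++;
    return 0;
}

/* Stand-in for handling LHCALL_NOTIFY: returns 1 if an eventfd was
 * signalled, 0 if nothing is bound (the caller would fall back to the
 * Launcher), -1 if the write failed. */
int toy_notify(const struct toy_eventfd_map *m, unsigned long addr)
{
    for (unsigned int i = 0; i < m->num; i++) {
        if (m->map[i].addr == addr) {
            uint64_t one = 1;
            if (write(m->map[i].fd, &one, sizeof(one)) !=
                (ssize_t)sizeof(one))
                return -1;
            return 1;
        }
    }
    return 0;
}
```

Whichever thread is blocked in `read()` on that eventfd then wakes up and services the device, without the main Launcher process being involved.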
     
  • We currently only allow the Launcher process to send interrupts, but
    as we already send interrupts from the hrtimer, it's a simple matter of
    extracting that code into a common set_interrupt routine.

    As we switch to a thread per virtqueue, this avoids a bottleneck through the
    main Launcher process.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • This version requires that host and guest have the same PAE status.
    NX cap is not offered to the guest, yet.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
    (That's really what it is, and the confusion gets worse with PAE support)

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell
    Reported-by: Jeremy Fitzhardinge

    Matias Zabaljauregui
     
  • If GDT_ENTRIES were ever > 256, this could become a problem.

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • lguest never checked for pending interrupts when enabling interrupts, and
    things still worked. However, it makes a significant difference to TCP
    performance, so it's time we fixed it by introducing a pending_irq flag
    and checking it on irq_restore and irq_enable.

    These two routines are now too big to patch into the 8/10 bytes
    patch space, so we drop that code.

    Note: The high latency on interrupt delivery had a very curious
    effect: once everything else was optimized, networking without GSO was
    faster than networking with GSO, since more interrupts were sent and
    hence there was a greater chance of one getting through to the Guest!

    Note2: (Almost) Closing the same loophole for iret doesn't have any
    measurable effect, so I'm leaving that patch for the moment.

    Before:
    1GB tcpblast Guest->Host: 30.7 seconds
    1GB tcpblast Guest->Host (no GSO): 76.0 seconds

    After:
    1GB tcpblast Guest->Host: 6.8 seconds
    1GB tcpblast Guest->Host (no GSO): 27.8 seconds

    Signed-off-by: Rusty Russell

    Rusty Russell
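
The pending-flag idea can be modelled in a few lines. This is a toy state machine, not the Guest's real code: the structure, the `toy_*` names and the way "delivery" is counted are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the Guest's virtual interrupt state. */
struct toy_irq_state {
    bool enabled;           /* virtual interrupt-enable flag */
    bool pending;           /* an interrupt arrived while disabled */
    unsigned int delivered; /* how many interrupts reached the Guest */
};

/* Host side: deliver now if possible, otherwise just mark it pending. */
void toy_raise_irq(struct toy_irq_state *s)
{
    if (s->enabled)
        s->delivered++;
    else
        s->pending = true;
}

/* Stand-in for asking the Host to deliver what it has been holding. */
void toy_flush_pending(struct toy_irq_state *s)
{
    if (s->pending) {
        s->pending = false;
        s->delivered++;
    }
}

void toy_irq_disable(struct toy_irq_state *s)
{
    s->enabled = false;
}

/* The fix: check the pending flag as soon as interrupts are enabled. */
void toy_irq_enable(struct toy_irq_state *s)
{
    s->enabled = true;
    toy_flush_pending(s);
}

/* Same check on restore, but only if the saved state had them enabled. */
void toy_irq_restore(struct toy_irq_state *s, bool was_enabled)
{
    s->enabled = was_enabled;
    if (was_enabled)
        toy_flush_pending(s);
}
```

Without the flush in the enable and restore paths, a held interrupt would sit there until some unrelated event made the Host look again, which is the latency the numbers above measure.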
     
  • When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
    that a timer or the Waker will wake_up_process() us.

    But we do it in a stupid way, leaving a classic missing wakeup race.

    So split maybe_do_interrupt() into interrupt_pending() and
    try_deliver_interrupt(), and check interrupt_pending() and the
    "break_out" flag before calling schedule().

    Signed-off-by: Rusty Russell

    Rusty Russell
     

19 Apr, 2009

1 commit

  • Fixes guest crash 'lguest: bad read address 0x4800000 len 256'

    The new per-cpu allocator ends up handing a non-linear address to
    write_gdt_entry. We do __pa() on it, and hand it to the host, which
    kills us.

    I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
    code, but had no pressing reason until now.

    Signed-off-by: Rusty Russell
    Cc: lguest@ozlabs.org

    Rusty Russell
     

30 Mar, 2009

1 commit


30 Dec, 2008

1 commit


27 May, 2008

1 commit

  • Add pte_flags() to extract the flags from a pte. This is a special
    case of pte_val() which is only guaranteed to return the pte's flags
    correctly; the page number may be corrupted or missing.

    The intent is to allow paravirt implementations to return pte flags
    without having to do any translation of the page number (most notably,
    Xen).

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
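
A small sketch of the distinction, assuming a 32-bit non-PAE style entry where the low 12 bits are flags. The `toy_*` names and the mask are illustrative; only the Present (bit 0) and Read/Write (bit 1) positions are taken from the x86 page table format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_SHIFT      12
#define TOY_PTE_FLAGS_MASK ((UINT32_C(1) << TOY_PTE_SHIFT) - 1)
#define TOY_PAGE_PRESENT   UINT32_C(0x001)
#define TOY_PAGE_RW        UINT32_C(0x002)

typedef struct { uint32_t pte; } toy_pte_t;

/* Only the flag bits are returned, so a paravirt backend never has to
 * translate the page number just to answer a flag query. */
uint32_t toy_pte_flags(toy_pte_t pte)
{
    return pte.pte & TOY_PTE_FLAGS_MASK;
}

/* Flag tests can be built on toy_pte_flags() alone. */
bool toy_pte_present(toy_pte_t pte)
{
    return (toy_pte_flags(pte) & TOY_PAGE_PRESENT) != 0;
}

bool toy_pte_write(toy_pte_t pte)
{
    return (toy_pte_flags(pte) & TOY_PAGE_RW) != 0;
}
```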
     

19 Apr, 2008

1 commit


30 Jan, 2008

16 commits


25 Oct, 2007

2 commits


23 Oct, 2007

3 commits

  • Jes complains that page table code still uses lgread_u32 even though
    it now uses general kernel pte types. The best thing to do is to
    generalize lgread_u32 and lgwrite_u32.

    This means we lose the efficiency of get_user(). We could potentially
    regain it if we used __copy_from_user instead of copy_from_user, but
    I'm not certain that our range check is equivalent to access_ok() on
    all platforms.

    Signed-off-by: Rusty Russell
    Acked-by: Jes Sorensen

    Rusty Russell
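
The generalization amounts to "range-check, then copy any number of bytes" instead of a fixed 32-bit accessor. A userspace sketch, with Guest memory modelled as a plain byte array; the `toy_*` names are hypothetical, and zero-filling the destination on failure is a choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of Guest memory as seen by the Host. */
struct toy_guest_mem {
    uint8_t *base; /* start of the Guest's memory */
    size_t limit;  /* size of the Guest's memory in bytes */
};

/* Overflow-safe check that [addr, addr + len) lies inside the Guest. */
bool toy_addr_ok(const struct toy_guest_mem *g, size_t addr, size_t len)
{
    return addr <= g->limit && len <= g->limit - addr;
}

/* Generic read: works for a u32, a pte, or any other type. */
bool toy_lgread(const struct toy_guest_mem *g, void *dst,
                size_t addr, size_t len)
{
    if (!toy_addr_ok(g, addr, len)) {
        memset(dst, 0, len); /* never hand back uninitialized data */
        return false;
    }
    memcpy(dst, g->base + addr, len);
    return true;
}

/* Generic write, same range check. */
bool toy_lgwrite(const struct toy_guest_mem *g, size_t addr,
                 const void *src, size_t len)
{
    if (!toy_addr_ok(g, addr, len))
        return false;
    memcpy(g->base + addr, src, len);
    return true;
}
```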
     
  • This patch gets rid of the old lguest host I/O infrastructure and
    replaces it with a single hypercall "LHCALL_NOTIFY" which takes an
    address.

    The main change is the removal of io.c: that mainly did inter-guest
    I/O, which virtio doesn't yet support.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • 1) This allows us to get a lot closer to booting bzImages.

    2) It means we don't have to know page_offset.

    3) The Guest needs to modify the boot pagetables to create the
    PAGE_OFFSET mapping before jumping to C code.

    4) guest_pa() walks the page tables rather than using page_offset.

    5) We don't use page_offset to figure out whether to emulate: it was
    always kinda questionable, and won't work for instructions done
    before remapping (bzImage unpacking in particular).

    6) We still want the kernel address for tlb flushing: have the initial
    hypercall give us that, too.

    Signed-off-by: Rusty Russell

    Rusty Russell
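
Point 4 (walking the page tables instead of subtracting page_offset) can be sketched for the classic two-level, non-PAE x86 layout: 10 bits of top-level index, 10 bits of pte index, 12 bits of offset. The `walk_*` names and the word-array model of Guest memory are assumptions for the example, not the Host's real guest_pa().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WALK_PRESENT   UINT32_C(0x001)
#define WALK_ADDR_MASK (~UINT32_C(0xfff))

/* Toy Guest physical memory: 32-bit words indexed by (paddr / 4). */
struct walk_mem {
    const uint32_t *words;
    uint32_t nwords;
};

/* Bounds-checked read of one page table entry. */
bool walk_read(const struct walk_mem *mem, uint32_t paddr, uint32_t *out)
{
    uint32_t idx = paddr / 4;

    if (idx >= mem->nwords)
        return false;
    *out = mem->words[idx];
    return true;
}

/* Translate vaddr using the tables rooted at physical address pgdir.
 * Returns false if either level is missing or not present. */
bool walk_guest_pa(const struct walk_mem *mem, uint32_t pgdir,
                   uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pgd, pte;

    /* Top 10 bits select the top-level entry. */
    if (!walk_read(mem, pgdir + (vaddr >> 22) * 4, &pgd) ||
        !(pgd & WALK_PRESENT))
        return false;

    /* Next 10 bits select the pte within that page. */
    if (!walk_read(mem, (pgd & WALK_ADDR_MASK) +
                   ((vaddr >> 12) & 0x3ff) * 4, &pte) ||
        !(pte & WALK_PRESENT))
        return false;

    /* Low 12 bits are the offset within the page. */
    *paddr = (pte & WALK_ADDR_MASK) | (vaddr & UINT32_C(0xfff));
    return true;
}
```

Because the answer comes from the Guest's own tables, the same routine works both before and after the Guest installs its PAGE_OFFSET mapping.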