27 Oct, 2011

2 commits

  • run_guest tries to freeze the current process after it has handled
    pending interrupts and before it calls lguest_arch_run_guest.
    This doesn't work nicely if the task has been killed while frozen,
    because we want to handle that signal as soon as possible.
    Let's move try_to_freeze before we check for a pending signal so that we
    can get out of the loop as soon as possible.

    Signed-off-by: Michal Hocko
    Acked-by: Rusty Russell
    Signed-off-by: Rusty Russell

    Michal Hocko
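
    The reordering described in this commit can be pictured with a small
    userspace simulation. This is only a sketch: struct task_sim,
    try_to_freeze_sim() and loop_pass() are hypothetical stand-ins, not the
    kernel's run_guest code, and the freezer is reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task state the real loop inspects. */
struct task_sim {
    bool freeze_requested;  /* the freezer wants this task to stop */
    bool killed_in_freezer; /* a fatal signal arrives while it is frozen */
    bool signal_pending;    /* what a pending-signal check would report */
};

/* Stand-in for try_to_freeze(): the task may come back with a signal. */
static void try_to_freeze_sim(struct task_sim *t)
{
    assert(t != NULL);
    if (t->freeze_requested) {
        t->freeze_requested = false;
        if (t->killed_in_freezer)
            t->signal_pending = true;
    }
}

/*
 * One pass of a run_guest-like loop. Returns true if the pass noticed the
 * signal and left the loop, false if it went on to run the Guest.
 */
static bool loop_pass(struct task_sim *t, bool freeze_before_signal_check)
{
    if (freeze_before_signal_check)
        try_to_freeze_sim(t);   /* new position */

    if (t->signal_pending)
        return true;            /* get out of the loop */

    if (!freeze_before_signal_check)
        try_to_freeze_sim(t);   /* old position, after the signal check */

    return false;               /* would go on to run the Guest */
}
```

    With the old ordering, a task killed while frozen still runs the Guest
    once more before the signal is seen; with the new ordering the same
    pass exits immediately.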
     
  • We actually can run under KVM, as it doesn't paravirtualize anything we
    need to use; reduce the check to checking we are the normal ringlevel.

    Reported-by: Stefanos Geraggelos
    Signed-off-by: Rusty Russell

    Rusty Russell
     

22 Jul, 2011

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

23 Sep, 2009

1 commit


30 Jul, 2009

2 commits

  • Every so often, after code shuffles, I need to go through and unbitrot
    the Lguest Journey (see drivers/lguest/README). Since we now use RCU in
    a simple form in one place I took the opportunity to expand that explanation.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul McKenney

    Rusty Russell
     
  • I don't really notice it (except to begrudge the extra vertical
    space), but Ingo does. And he pointed out that since one excuse for
    lguest is as a teaching tool, it should set a good example.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     

12 Jun, 2009

5 commits

  • We no longer need an efficient mechanism to force the Guest back into
    host userspace, as each device is serviced without bothering the main
    Guest process (aka. the Launcher).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
    an address: the main Launcher process returns with this address, and figures
    out what device to run.

    A far nicer model is to let processes bind an eventfd to an address: if we
    find one, we simply signal the eventfd.

    Signed-off-by: Rusty Russell
    Cc: Davide Libenzi

    Rusty Russell
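
    A minimal userspace sketch of the "bind an eventfd to an address"
    model follows. It assumes Linux's eventfd(2); struct eventfd_map,
    bind_eventfd() and notify() are hypothetical names for illustration,
    not the lguest interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BINDINGS 8  /* arbitrary limit for this sketch */

struct binding {
    unsigned long addr; /* Guest address passed with the notify */
    int fd;             /* eventfd to signal for that address */
};

struct eventfd_map {
    size_t num;
    struct binding map[MAX_BINDINGS];
};

/* Bind an eventfd to an address. Returns 0, or -1 if the table is full. */
static int bind_eventfd(struct eventfd_map *m, unsigned long addr, int fd)
{
    assert(m->num <= MAX_BINDINGS);
    if (m->num == MAX_BINDINGS)
        return -1;
    m->map[m->num].addr = addr;
    m->map[m->num].fd = fd;
    m->num++;
    return 0;
}

/*
 * Handle a notify for addr. Returns 1 if a bound eventfd was signalled,
 * 0 if nothing is bound (the caller would fall back to returning the
 * address to the Launcher), or -1 if the write failed.
 */
static int notify(const struct eventfd_map *m, unsigned long addr)
{
    uint64_t one = 1;

    for (size_t i = 0; i < m->num; i++) {
        if (m->map[i].addr != addr)
            continue;
        if (write(m->map[i].fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
            return -1;
        return 1;
    }
    return 0;
}
```

    A device thread blocked in read() on its eventfd wakes up with the
    number of notifications since its last read, so the main Launcher
    process never has to be involved.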
     
  • Map switcher with executable page table entries.
    (This bug didn't matter before PAE and hence NX support -- RR)

    Signed-off-by: Matias Zabaljauregui
    Signed-off-by: Rusty Russell

    Matias Zabaljauregui
     
  • lguest never checked for pending interrupts when enabling interrupts, and
    things still worked. However, it makes a significant difference to TCP
    performance, so it's time we fixed it by introducing a pending_irq flag
    and checking it on irq_restore and irq_enable.

    These two routines are now too big to patch into the 8/10 bytes
    patch space, so we drop that code.

    Note: The high latency on interrupt delivery had a very curious
    effect: once everything else was optimized, networking without GSO was
    faster than networking with GSO, since more interrupts were sent and
    hence a greater chance of one getting through to the Guest!

    Note2: (Almost) Closing the same loophole for iret doesn't have any
    measurable effect, so I'm leaving that patch for the moment.

    Before:
    1GB tcpblast Guest->Host: 30.7 seconds
    1GB tcpblast Guest->Host (no GSO): 76.0 seconds

    After:
    1GB tcpblast Guest->Host: 6.8 seconds
    1GB tcpblast Guest->Host (no GSO): 27.8 seconds

    Signed-off-by: Rusty Russell

    Rusty Russell
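
    The pending_irq check can be sketched as below. This is an
    illustrative userspace model only: struct irq_state and the *_sketch
    functions are assumed names, and the hypercall is replaced by a
    counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the interrupt state shared with the Host. */
struct irq_state {
    bool irq_enabled;    /* the Guest's virtual interrupt flag */
    bool pending_irq;    /* set by the Host when it could not deliver */
    unsigned int hcalls; /* counts trips into the Host, for illustration */
};

/* Stand-in for a hypercall that lets the Host deliver what is queued. */
static void ask_host_to_deliver(struct irq_state *s)
{
    s->hcalls++;
    s->pending_irq = false;
}

/* Enable interrupts, then check whether one was held back meanwhile. */
static void irq_enable_sketch(struct irq_state *s)
{
    assert(s != NULL);
    s->irq_enabled = true;
    if (s->pending_irq)
        ask_host_to_deliver(s);
}

/* Restore a saved flag; only a restore to "enabled" needs the check. */
static void irq_restore_sketch(struct irq_state *s, bool enabled)
{
    assert(s != NULL);
    s->irq_enabled = enabled;
    if (enabled && s->pending_irq)
        ask_host_to_deliver(s);
}
```

    The extra test and call are what make these routines larger than the
    few bytes available for inline patching.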
     
  • When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
    that a timer or the Waker will wake_up_process() us.

    But we do it in a stupid way, leaving a classic missing wakeup race.

    So split maybe_do_interrupt() into interrupt_pending() and
    try_deliver_interrupt(), and check interrupt_pending() and the
    "break_out" flag before calling schedule.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

30 Mar, 2009

1 commit


30 Jan, 2009

1 commit


29 Jul, 2008

1 commit


28 Mar, 2008

1 commit


11 Mar, 2008

1 commit

  • Robert Bragg's 5dc331852848a38ca00a2817e5b98a1d0561b116 tightened
    (ie. fixed) the checking in __get_vm_area, and it broke lguest.

    lguest should pass the exact "end" it wants, not some random constant
    (it was possible previously that it would actually get an address
    different from SWITCHER_ADDR).

    Also, Fabio Checconi pointed out that we should make sure we're not
    hitting the fixmap area.

    Signed-off-by: Rusty Russell
    Cc: Robert Bragg

    Rusty Russell
     

30 Jan, 2008

8 commits


25 Oct, 2007

1 commit


23 Oct, 2007

7 commits

  • Jes complains that page table code still uses lgread_u32 even though
    it now uses general kernel pte types. The best thing to do is to
    generalize lgread_u32 and lgwrite_u32.

    This means we lose the efficiency of get_user(). We could potentially
    regain it if we used __copy_from_user instead of copy_from_user, but
    I'm not certain that our range check is equivalent to access_ok() on
    all platforms.

    Signed-off-by: Rusty Russell
    Acked-by: Jes Sorensen

    Rusty Russell
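
    A generalized, range-checked read and write might look like the
    sketch below. It is a userspace model under stated assumptions:
    struct guest_mem, guest_range_ok(), guest_read() and guest_write()
    are hypothetical names, and memcpy() stands in for the kernel's
    user-copy routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct guest_mem {
    unsigned char *base; /* where Guest memory starts in this process */
    size_t limit;        /* size of Guest memory in bytes */
};

/* True if [addr, addr + len) lies inside Guest memory, without overflow. */
static bool guest_range_ok(const struct guest_mem *g, size_t addr, size_t len)
{
    return addr < g->limit && len <= g->limit - addr;
}

/* Copy len bytes out of the Guest; zero dst and return -1 on a bad range. */
static int guest_read(const struct guest_mem *g, void *dst,
                      size_t addr, size_t len)
{
    assert(dst != NULL);
    if (!guest_range_ok(g, addr, len)) {
        memset(dst, 0, len); /* never hand back uninitialized data */
        return -1;
    }
    memcpy(dst, g->base + addr, len);
    return 0;
}

/* Copy len bytes into the Guest; return -1 on a bad range. */
static int guest_write(struct guest_mem *g, size_t addr,
                       const void *src, size_t len)
{
    assert(src != NULL);
    if (!guest_range_ok(g, addr, len))
        return -1;
    memcpy(g->base + addr, src, len);
    return 0;
}
```

    Because the length is a parameter, the same pair of helpers serves
    32-bit values, page table entries of any width, and whole structures.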
     
  • This patch gets rid of the old lguest host I/O infrastructure and
    replaces it with a single hypercall "LHCALL_NOTIFY" which takes an
    address.

    The main change is the removal of io.c: that mainly did inter-guest
    I/O, which virtio doesn't yet support.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).

    This patch allows Guests to specify what system call vector they want,
    and we try to reserve it. We only allow one non-Linux system call
    vector, to try to avoid DoS on the Host.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Currently we look at the "trapnum" to see if the Guest wants a
    hypercall. But once the hypercall is done we have to reset trapnum to
    a bogus value, otherwise if we exit to userspace and return, we'd run
    the same hypercall twice (that was a nasty bug to find!).

    This has two main effects:

    1) When Jes's patch changes the hypercall args to be a generic "struct
    hcall_args" we simply change the type of "lg->hcall". It's set by
    arch code, so if it has to copy args or something it can do so, and
    point "hcall" into lg->arch somewhere.

    2) Async hypercalls only get run when an actual hypercall is pending.
    This simplifies the code a little and is a more logical semantic.

    Signed-off-by: Rusty Russell

    Rusty Russell
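
    One way to picture a pending-hypercall pointer such as "lg->hcall"
    is the sketch below. The struct layout and function names are
    assumptions made for illustration, not the lguest definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic argument block, filled in by arch code. */
struct hcall_args {
    unsigned long arg0, arg1, arg2, arg3;
};

struct guest {
    struct hcall_args *hcall; /* non-NULL only while a hypercall is pending */
    struct hcall_args regs;   /* where this sketch's "arch code" keeps args */
    unsigned int hcalls_run;  /* counts executed hypercalls, for illustration */
};

/* Arch side: the trap was a hypercall, so record where its args live. */
static void arch_note_hypercall(struct guest *g, unsigned long call)
{
    assert(g != NULL);
    g->regs.arg0 = call;
    g->hcall = &g->regs;
}

/* Core side: run the pending hypercall, if any, exactly once. */
static void do_hypercalls(struct guest *g)
{
    assert(g != NULL);
    if (!g->hcall)
        return;          /* nothing pending */
    g->hcalls_run++;     /* the real code would dispatch on the args here */
    g->hcall = NULL;     /* done: a later pass must not repeat it */
}
```

    Clearing the pointer plays the role that resetting trapnum to a bogus
    value used to play: leaving for userspace and coming back cannot run
    the same hypercall twice.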
     
    Separate i386 architecture-specific code from core.c, move it to
    x86/core.c, and add an x86/lguest.h header file to match.

    Signed-off-by: Jes Sorensen
    Signed-off-by: Rusty Russell

    Jes Sorensen
     
  • Back when we had all the Guest state in the switcher, we had a fixed
    array of them. This is no longer necessary.

    If we switch the network code to using random_ether_addr (46 bits is
    enough to avoid clashes), we can get rid of the concept of "guest id"
    altogether.

    Signed-off-by: Rusty Russell

    Rusty Russell
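
    The "46 bits" figure comes from a 48-bit MAC address with two bits
    fixed. A userspace sketch of that idea follows; it uses rand() purely
    for illustration, and random_ether_addr_sketch() is a hypothetical
    name, not the kernel helper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6 /* bytes in an Ethernet address */

/* Fill addr with a random unicast, locally administered MAC address. */
static void random_ether_addr_sketch(uint8_t addr[MAC_LEN])
{
    assert(addr != NULL);
    for (size_t i = 0; i < MAC_LEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff);

    addr[0] &= 0xfe; /* clear the multicast bit */
    addr[0] |= 0x02; /* set the locally administered bit */
}
```

    The remaining 46 bits are random, which is why no per-Guest id is
    needed to keep addresses distinct.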
     
  • In order to avoid problematic special linking of the Launcher, we give
    the Host an offset: this means we can use any memory region in the
    Launcher as Guest memory rather than insisting on mmap() at 0.

    The result is quite pleasing: a number of casts are replaced with
    simple additions.

    Signed-off-by: Rusty Russell

    Rusty Russell
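
    The offset scheme reduces address translation to an addition and a
    subtraction, roughly as sketched here. struct launcher_mem and both
    helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct launcher_mem {
    unsigned char *guest_base; /* Launcher address of Guest physical 0 */
    size_t guest_limit;        /* size of the Guest memory region */
};

/* Guest physical address -> pointer inside the Launcher. */
static void *from_guest_phys_sketch(const struct launcher_mem *m, size_t gpa)
{
    assert(gpa < m->guest_limit);
    return m->guest_base + gpa;
}

/* Pointer inside the Launcher -> Guest physical address. */
static size_t to_guest_phys_sketch(const struct launcher_mem *m, const void *p)
{
    const unsigned char *q = p;

    assert(q >= m->guest_base && q < m->guest_base + m->guest_limit);
    return (size_t)(q - m->guest_base);
}
```

    Any region the Launcher obtains, for example from an ordinary mmap()
    call, can serve as guest_base, so no special link address is needed.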
     

17 Oct, 2007

1 commit

  • This patch refactors the paravirt_ops structure into groups of
    functionally related ops:

    pv_info - random info, rather than function entrypoints
    pv_init_ops - functions used at boot time (some for module_init too)
    pv_misc_ops - lazy mode, which didn't fit well anywhere else
    pv_time_ops - time-related functions
    pv_cpu_ops - various privileged instruction ops
    pv_irq_ops - operations for managing interrupt state
    pv_apic_ops - APIC operations
    pv_mmu_ops - operations for managing pagetables

    There are several motivations for this:

    1. Some of these ops will be general to all x86, and some will be
    i386/x86-64 specific. This makes it easier to share common stuff
    while allowing separate implementations where needed.

    2. At the moment we must export all of paravirt_ops, but modules only
    need selected parts of it. This allows us to export on a case by case
    basis (and also choose which export license we want to apply).

    3. Functional groupings make things a bit more readable.

    Struct paravirt_ops is now only used as a template to generate
    patch-site identifiers, and to extract function pointers for inserting
    into jmp/calls when patching. It is only instantiated when needed.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Zach Amsden
    Cc: Avi Kivity
    Cc: Anthony Liguory
    Cc: "Glauber de Oliveira Costa"
    Cc: Jun Nakajima

    Jeremy Fitzhardinge
     

09 Aug, 2007

1 commit

    If a Guest makes a hypercall which sets a GDT entry to not present, we
    currently set any segment registers using that GDT entry to 0.
    Unfortunately, this is not sufficient: there are other ways of
    altering GDT entries which will cause a fault.

    The correct solution is to do what Linux does: let them set any GDT value
    they want and handle the #GP when popping causes a fault. This has
    the added benefit of making our Switcher slightly more robust in the
    case of any other bugs which cause it to fault.

    We kill the Guest if it causes a fault in the Switcher: it's the
    Guest's responsibility to make sure it's not using segments when it
    changes them.

    Signed-off-by: Rusty Russell
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

27 Jul, 2007

4 commits


20 Jul, 2007

1 commit

  • This is the code for the "lg.ko" module, which allows lguest guests to
    be launched.

    [akpm@linux-foundation.org: update for futex-new-private-futexes]
    [akpm@linux-foundation.org: build fix]
    [jmorris@namei.org: lguest: use hrtimers]
    [akpm@linux-foundation.org: x86_64 build fix]
    Signed-off-by: Rusty Russell
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell