12 Jan, 2012

2 commits

  • Make sure the interrupt is allocated correctly by lguest_setup_irq (check the
    return value of irq_alloc_desc_at for -ENOMEM)

    Signed-off-by: Stratos Psomadakis
    Signed-off-by: Rusty Russell (cleanups and commentary)

    Stratos Psomadakis
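A minimal userspace sketch of the check described above. It assumes, as in the kernel, that the allocator returns the descriptor number on success or a negative errno on failure. It also assumes that -EEXIST (descriptor already allocated) should not count as a failure. `fake_irq_alloc_desc_at()` and `setup_irq_sketch()` are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_IRQS 16

/* Hypothetical allocator state: which descriptors exist, and whether
 * the next allocation should fail with -ENOMEM (for testing). */
static bool desc_allocated[SKETCH_NR_IRQS];
static bool simulate_oom;

/* Stand-in for irq_alloc_desc_at(): returns irq on success, or a
 * negative errno (-EINVAL, -EEXIST, -ENOMEM) on failure. */
static int fake_irq_alloc_desc_at(unsigned int irq)
{
	if (irq >= SKETCH_NR_IRQS)
		return -EINVAL;
	if (desc_allocated[irq])
		return -EEXIST;
	if (simulate_oom)
		return -ENOMEM;
	desc_allocated[irq] = true;
	return (int)irq;
}

/* Hypothetical setup routine: propagate allocation failures instead
 * of carrying on with a descriptor that was never allocated. */
static int setup_irq_sketch(unsigned int irq)
{
	int err = fake_irq_alloc_desc_at(irq);

	/* Assumption: an already-allocated descriptor is fine to reuse. */
	if (err < 0 && err != -EEXIST)
		return err;

	/* ...the real code would install the irq chip and handler here... */
	return 0;
}
```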
     
  • We were cheating with our barriers: using the SMP ones rather than the
    real device ones. That was fine until rpmsg came along, which is
    used to talk to a real device (a non-SMP CPU).

    Unfortunately, just putting back the real barriers (reverting
    d57ed95d) causes a performance regression on virtio-pci. In
    particular, Amos reports that with netbench's TCP_RR over virtio_net,
    CPU utilization increased by up to 35% while throughput went down by
    up to 14%.

    By comparison, the effect of this branch is within the noise.

    Reference: https://lkml.org/lkml/2011/12/11/22

    Signed-off-by: Rusty Russell

    Rusty Russell
     

17 Nov, 2011

1 commit


01 Nov, 2011

1 commit


22 Jul, 2011

1 commit

  • We used to notify the Host every time we updated a device's status. However,
    it only really needs to know when we're resetting the device, or failed to
    initialize it, or when we've finished our feature negotiation.

    In particular, we used to wait for VIRTIO_CONFIG_S_DRIVER_OK in the
    status byte before starting the device service threads. But this
    corresponds to the successful finish of device initialization, which
    might (like virtio_blk's partition scanning) use the device. So we
    had a hack: if they used the device before we expected, we started
    the threads anyway.

    Now we hook into the finalize_features hook in the Guest: at that
    point we tell the Launcher that it can rely on the features we have
    acked. On the Launcher side, we look at the status at that point, and
    start servicing the device.

    Signed-off-by: Rusty Russell

    Rusty Russell
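A minimal sketch of the new notification policy. It assumes the standard virtio status bit values (ACKNOWLEDGE = 1, DRIVER = 2, DRIVER_OK = 4, FAILED = 0x80). All function names are hypothetical. Ordinary status writes stay silent. Reset (status 0) and failure notify immediately. The "features finalized" notification comes from a separate hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard virtio device status bits. */
#define S_ACKNOWLEDGE 0x01
#define S_DRIVER      0x02
#define S_DRIVER_OK   0x04
#define S_FAILED      0x80

static unsigned int host_notifications;

static void notify_host(void) { host_notifications++; }

/* Does writing this status byte need an immediate notification? */
static bool status_needs_notify(uint8_t status)
{
	if (status == 0)          /* 0 means "reset" */
		return true;
	if (status & S_FAILED)    /* Guest gave up on the device */
		return true;
	return false;             /* ACKNOWLEDGE, DRIVER, DRIVER_OK: silent */
}

static void set_status_sketch(uint8_t *status_byte, uint8_t status)
{
	*status_byte = status;
	if (status_needs_notify(status))
		notify_host();
}

/* Called once feature negotiation is complete: from here the Host may
 * rely on the acked features and start servicing the device. */
static void finalize_features_sketch(void) { notify_host(); }
```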
     

14 Apr, 2010

1 commit

  • This is a partial revert of 4cd8b5e2a159 "lguest: use KVM hypercalls";
    we revert to using (just as questionable but more reliable) int $15 for
    hypercalls. I didn't revert the register mapping, so we still use the
    same calling convention as kvm.

    KVM in more recent incarnations stopped injecting a fault when a guest
    tried to use the VMCALL instruction from ring 1, so lguest under kvm
    fails to make hypercalls. It was nice to share code with our KVM
    cousins, but this was overreach.

    Signed-off-by: Rusty Russell
    Cc: Matias Zabaljauregui
    Cc: Avi Kivity

    Rusty Russell
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered
    (alphabetical, Christmas tree, reverse Christmas tree), or at the end
    if there doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to the
    implementation .h or embedding .c file was more appropriate. This step
    added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    7, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
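The selection rule from step 1 can be sketched as below. This is not the sweep script, which is Python and lives at the URL above. It is a hypothetical C illustration of "gfp only -> gfp.h, any slab use -> slab.h". The token lists are small illustrative samples, not the script's real patterns. slab.h wins whenever slab is used because, as noted above, it pulls in gfp.h itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative (incomplete) token samples, not the script's lists. */
static const char *const slab_tokens[] = { "kmalloc", "kzalloc", "kfree", "kmem_cache_" };
static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC", "__get_free_page", "alloc_pages" };

static int uses_any(const char *src, const char *const *toks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, toks[i]))
			return 1;
	return 0;
}

/* Return the single header a file needs, or NULL if it needs neither. */
static const char *header_needed(const char *src)
{
	if (uses_any(src, slab_tokens, sizeof(slab_tokens) / sizeof(slab_tokens[0])))
		return "linux/slab.h";
	if (uses_any(src, gfp_tokens, sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
		return "linux/gfp.h";
	return NULL;
}
```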
     

30 Jul, 2009

3 commits

  • I've been doing this for years, and akpm picked me up on it about 12
    months ago. lguest partly serves as example code, so let's do it Right.

    Also, remove two unused fields in struct vblk_info in the example launcher.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     
  • Every so often, after code shuffles, I need to go through and unbitrot
    the Lguest Journey (see drivers/lguest/README). Since we now use RCU in
    a simple form in one place, I took the opportunity to expand that explanation.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul McKenney

    Rusty Russell
     
  • I don't really notice it (except to begrudge the extra vertical
    space), but Ingo does. And he pointed out that since one purpose of
    lguest is to serve as a teaching tool, it should set a good example.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
     

12 Jun, 2009

2 commits


30 Mar, 2009

1 commit


09 Mar, 2009

1 commit

  • Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y

    We now need to call irq_to_desc_alloc_cpu() before
    set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no
    kmalloc available).

    So do it as we use interrupts instead. This also means we only
    allocate for IRQs we use, which was the intent of CONFIG_SPARSE_IRQ anyway.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar

    Rusty Russell
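A minimal userspace sketch of allocate-on-first-use. It assumes a fixed-size table of descriptor pointers that starts empty, as with CONFIG_SPARSE_IRQ, and an allocator that is only usable after early boot. The names are hypothetical, and calloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_IRQS 32

struct irq_desc_sketch {
	unsigned int irq;
	const char *handler_name;
};

/* Sparse table: no descriptor exists until an IRQ is actually used. */
static struct irq_desc_sketch *irq_table[SKETCH_NR_IRQS];

/* Called when a device starts using an IRQ (not from early init_IRQ,
 * where no allocator is available yet). Returns the descriptor, or
 * NULL for an out-of-range irq or on allocation failure. */
static struct irq_desc_sketch *irq_desc_get_or_alloc(unsigned int irq)
{
	if (irq >= SKETCH_NR_IRQS)
		return NULL;
	if (!irq_table[irq]) {
		struct irq_desc_sketch *d = calloc(1, sizeof(*d));
		if (!d)
			return NULL;
		d->irq = irq;
		d->handler_name = "level";
		irq_table[irq] = d;
	}
	return irq_table[irq];
}

/* How many descriptors have been allocated so far. */
static unsigned int irqs_allocated(void)
{
	unsigned int n = 0;
	for (unsigned int i = 0; i < SKETCH_NR_IRQS; i++)
		n += irq_table[i] != NULL;
	return n;
}
```

Only IRQs that are actually used consume memory, which is the point made in the commit above.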
     

07 Jan, 2009

1 commit


30 Dec, 2008

3 commits


25 Aug, 2008

1 commit


25 Jul, 2008

2 commits


30 May, 2008

2 commits

  • Anthony Liguori points out that three different transports use the virtio code,
    but each one keeps its own counter to set the virtio_device's index field. In
    theory (though not in current practice) this means that names could be
    duplicated, and that risk grows as more transports are created.

    So we move the selection of the unique virtio_device.index into the common code
    in virtio.c, which has the side-benefit of removing duplicate code.

    The only complexity is that lguest and S/390 use the index to uniquely identify
    the device in case of catastrophic failure before register_virtio_device() is
    called: now we use the offset within the descriptor page as a unique identifier
    for the printks.

    Signed-off-by: Rusty Russell
    Cc: Christian Borntraeger
    Cc: Martin Schwidefsky
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: Chris Lalancette
    Cc: Anthony Liguori

    Rusty Russell
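A minimal sketch of moving index selection into common code. It assumes a single counter owned by the core registration function rather than one counter per transport. `register_device_sketch()`, `desc_page_offset()` and the struct are hypothetical, not the virtio.c API, and locking is ignored. The second helper shows the fallback identifier used before registration: the device's offset within the descriptor page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct vdev_sketch {
	unsigned int index;   /* unique across all transports */
	char name[16];
};

/* One counter in the common code, shared by every transport. */
static unsigned int next_dev_index;

static void register_device_sketch(struct vdev_sketch *dev)
{
	dev->index = next_dev_index++;
	snprintf(dev->name, sizeof(dev->name), "virtio%u", dev->index);
}

/* Before registration there is no index yet: identify the device by
 * its byte offset within the descriptor page instead. */
static ptrdiff_t desc_page_offset(const void *page, const void *desc)
{
	return (const char *)desc - (const char *)page;
}
```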
     
  • Thanks to Jon Corbet & LWN. Only took me a day to join the dots.

    Host->Guest netcat before (with unnecessarily large receive buffers):
    1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s

    After:
    1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s

    Signed-off-by: Rusty Russell

    Rusty Russell
     

02 May, 2008

2 commits

  • This brings us closer to Real Life, where we'd examine the device
    features once it's set the DRIVER_OK status bit.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • A recent proposed feature addition to the virtio block driver revealed
    some flaws in the API: in particular, we assume that feature
    negotiation is complete once a driver's probe function returns.

    There is nothing in the API to require this, however, and even I
    didn't notice when it was violated.

    So instead, we require the driver to specify what features it supports
    in a table; we can then move the feature negotiation into the virtio
    core. The intersection of device and driver features is presented in
    a new 'features' bitmap in the struct virtio_device.

    Note that this highlights the difference between Linux unsigned-long
    bitmaps where each unsigned long is in native endian, and a
    straight-forward little-endian array of bytes.

    Drivers can still remove feature bits in their probe routine if they
    really have to.

    API changes:
    - dev->config->feature() no longer gets and acks a feature.
    - drivers should advertise their features in the 'feature_table' field
    - use virtio_has_feature() for extra sanity when checking feature bits

    Signed-off-by: Rusty Russell

    Rusty Russell
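A minimal sketch of negotiation in the core and of the bitmap distinction noted above. It assumes the device offers its features as a little-endian byte array (bit n lives in byte n/8, bit n%8) and the driver advertises a table of feature bit numbers. The core builds a native unsigned-long bitmap holding only the bits both sides support. All names are hypothetical. `has_feature()` only mirrors the spirit of virtio_has_feature(), including a sanity check on the bit number.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_BITS  64
#define BITS_PER_UL   (sizeof(unsigned long) * CHAR_BIT)
#define FEATURE_LONGS ((FEATURE_BITS + BITS_PER_UL - 1) / BITS_PER_UL)

/* Device side: little-endian byte array, independent of CPU endianness. */
static int le_bytes_test_bit(const uint8_t *bytes, unsigned int bit)
{
	return (bytes[bit / 8] >> (bit % 8)) & 1;
}

/* Native side: array of unsigned long, each word in CPU byte order. */
static void ul_set_bit(unsigned long *map, unsigned int bit)
{
	map[bit / BITS_PER_UL] |= 1UL << (bit % BITS_PER_UL);
}

static int has_feature(const unsigned long *map, unsigned int bit)
{
	assert(bit < FEATURE_BITS);   /* extra sanity on the bit number */
	return (map[bit / BITS_PER_UL] >> (bit % BITS_PER_UL)) & 1;
}

/* Core negotiation: features = device offer AND driver table. */
static void negotiate(const uint8_t *device_bytes,
		      const unsigned int *driver_table, size_t n,
		      unsigned long *features)
{
	for (size_t i = 0; i < FEATURE_LONGS; i++)
		features[i] = 0;
	for (size_t i = 0; i < n; i++) {
		unsigned int bit = driver_table[i];
		if (bit < FEATURE_BITS && le_bytes_test_bit(device_bytes, bit))
			ul_set_bit(features, bit);
	}
}
```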
     

28 Mar, 2008

1 commit


09 Feb, 2008

1 commit

  • Using "attr" twice is not OK, because it effectively prohibits such
    container_of() on variables not named "attr".

    Signed-off-by: Alexey Dobriyan
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

04 Feb, 2008

3 commits

  • A reset function solves three problems:

    1) It allows us to renegotiate features, e.g. if we want to upgrade a
    guest driver without rebooting the guest.

    2) It gives us a clean way of shutting down virtqueues: after a reset,
    we know that the buffers won't be used by the host, and

    3) It helps the guest recover from messed-up drivers.

    So we remove the ->shutdown hook, and the only way we now remove
    feature bits is via reset.

    We leave it to the driver to do the reset before it deletes queues:
    the balloon driver, for example, needs to chat to the host in its
    remove function.

    Signed-off-by: Rusty Russell

    Rusty Russell
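A minimal sketch of the remove ordering described above. It assumes a transport ops table with a reset hook and a queue-deletion hook. The struct and function names are hypothetical. The driver may still talk to the host first, as the balloon driver does. It must then reset before freeing queues, so the host can no longer touch their buffers.

```c
#include <assert.h>
#include <stdint.h>

struct dev_sketch;

/* Hypothetical transport operations. */
struct config_ops_sketch {
	void (*reset)(struct dev_sketch *dev);
	void (*del_vqs)(struct dev_sketch *dev);
};

struct dev_sketch {
	const struct config_ops_sketch *ops;
	uint8_t status;
	int vqs_live;      /* queues still allocated */
	int host_may_use;  /* host may still read/write our buffers */
};

static void sketch_reset(struct dev_sketch *dev)
{
	dev->status = 0;        /* writing 0 means "reset" */
	dev->host_may_use = 0;  /* after reset the host won't use our buffers */
}

static void sketch_del_vqs(struct dev_sketch *dev)
{
	/* Freeing queues the host may still use would be a bug. */
	assert(!dev->host_may_use);
	dev->vqs_live = 0;
}

static const struct config_ops_sketch sketch_ops = {
	.reset = sketch_reset,
	.del_vqs = sketch_del_vqs,
};

/* Driver remove(): final conversation with the host, then reset,
 * then delete the queues. */
static void driver_remove_sketch(struct dev_sketch *dev)
{
	/* ...e.g. a balloon driver tells the host what it gives back... */
	dev->ops->reset(dev);
	dev->ops->del_vqs(dev);
}
```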
     
  • It seems that virtio_net wants to disable callbacks (interrupts) before
    calling netif_rx_schedule(), so we can't use the return value to do so.

    Rename "restart" to "cb_enable" and introduce a "cb_disable" hook: the
    callback now returns void, rather than a boolean.

    Signed-off-by: Rusty Russell

    Rusty Russell
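A minimal single-threaded sketch of why the driver wants a separate disable hook. Its callback disables further callbacks and schedules polling, and the poll loop re-enables them when done. We assume cb_enable() returns false if buffers arrived in the meantime, in which case the poller disables callbacks again and keeps draining. The queue structure and hook signatures are hypothetical, and `poll_scheduled` merely stands in for netif_rx_schedule().

```c
#include <assert.h>
#include <stdbool.h>

struct vq_cb_sketch {
	bool cb_enabled;
	unsigned int pending;    /* used buffers waiting to be processed */
	unsigned int processed;
	bool poll_scheduled;
};

static void cb_disable(struct vq_cb_sketch *vq) { vq->cb_enabled = false; }

/* Returns false if buffers are already pending, so the caller
 * should keep polling rather than wait for a callback. */
static bool cb_enable(struct vq_cb_sketch *vq)
{
	vq->cb_enabled = true;
	return vq->pending == 0;
}

/* The virtqueue callback returns void; the driver itself disables
 * further callbacks before scheduling a poll. */
static void driver_callback(struct vq_cb_sketch *vq)
{
	cb_disable(vq);
	vq->poll_scheduled = true;
}

/* Poll routine: drain the queue, then re-enable callbacks; if more
 * work raced in, disable again and keep draining. */
static void driver_poll(struct vq_cb_sketch *vq)
{
	vq->poll_scheduled = false;
	for (;;) {
		while (vq->pending) {
			vq->pending--;
			vq->processed++;
		}
		if (cb_enable(vq))
			break;
		cb_disable(vq);
	}
}
```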
     
  • Previously we used a type/len pair within the config space, but this
    seems overkill. We now simply define a structure which represents the
    layout in the config space: the config space can now only be extended
    at the end.

    The main driver-visible changes:
    1) We indicate what fields are present with an explicit feature bit.
    2) Virtqueues are explicitly numbered, and not in the config space.

    Signed-off-by: Rusty Russell

    Rusty Russell
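A minimal sketch of a fixed-layout config space, assuming a block-device-like layout in which a field is only valid if its feature bit is set. The struct, field names and bit numbers are illustrative assumptions, not the real virtio_blk definitions. Fields are located with offsetof() into one structure, so the layout can only grow by appending at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config layout: new fields may only be appended. */
struct blk_config_sketch {
	uint64_t capacity;   /* always present */
	uint32_t size_max;   /* valid only if FEAT_SIZE_MAX is set */
	uint32_t seg_max;    /* valid only if FEAT_SEG_MAX is set */
};

/* Illustrative feature bit numbers. */
#define FEAT_SIZE_MAX 1
#define FEAT_SEG_MAX  2

/* Stand-in for a transport's "read config bytes" operation. */
static void config_get(const uint8_t *space, size_t off, void *buf, size_t len)
{
	memcpy(buf, space + off, len);
}

/* Read seg_max if the feature bit says the field is present. */
static uint32_t read_seg_max(const uint8_t *space, uint32_t features,
			     uint32_t fallback)
{
	uint32_t v;

	if (!(features & (1u << FEAT_SEG_MAX)))
		return fallback;
	config_get(space, offsetof(struct blk_config_sketch, seg_max), &v, sizeof(v));
	return v;
}
```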
     

19 Nov, 2007

1 commit

  • The virtio code never hooked through the ->remove callback. Although
    no one supports device removal at the moment, this code is already
    needed for module unloading.

    This of course also revealed bugs in virtio_blk, virtio_net and lguest
    unloading paths.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

12 Nov, 2007

1 commit

  • The virtio descriptor rings of size N-1 were nicely set up to be
    aligned to an N-byte boundary. But as Anthony Liguori points out, the
    free-running indices used by virtio require that the sizes be a power
    of 2, otherwise we get problems on wrap (demonstrated with lguest).

    So we replace the clever "2^n-1" scheme with a simple "align to page
    boundary" scheme: this means that all virtio rings take at least two
    pages, but it's safer than guessing cache alignment.

    Signed-off-by: Rusty Russell

    Rusty Russell
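A worked sketch of both points, assuming the ring layout of the time: 16-byte descriptors, an available ring of 2 + num 16-bit words, and a used ring of two 16-bit words plus num 8-byte elements. First, free-running 16-bit indices are mapped to slots with idx % num. That mapping only stays continuous across the 65535-to-0 wrap when num divides 65536, i.e. is a power of two. Second, aligning the used ring to a page boundary means every ring spans at least two pages. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does slot = idx % num advance by exactly one when the 16-bit
 * free-running index wraps from 65535 to 0? */
static bool wraps_cleanly(unsigned int num)
{
	uint16_t last = UINT16_MAX;
	uint16_t next = (uint16_t)(last + 1);        /* wraps to 0 */
	return next % num == (last % num + 1) % num;
}

/* Ring size with the used ring aligned to 'align' (a power of 2). */
static size_t vring_size_sketch(unsigned int num, size_t align)
{
	size_t desc  = 16 * (size_t)num;             /* descriptor table */
	size_t avail = 2 * (2 + (size_t)num);        /* flags, idx, ring[num] */
	size_t used  = 2 * 2 + 8 * (size_t)num;      /* flags, idx, ring[num] */

	assert(align && (align & (align - 1)) == 0);
	return ((desc + avail + align - 1) & ~(align - 1)) + used;
}
```

For example, with num = 255 index 65535 maps to slot 0 and the next index (0) maps to slot 0 again, so a slot is reused. With num = 256 the sequence goes 255, then 0, as it should. With 4096-byte alignment even a one-entry ring needs 4108 bytes.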
     

25 Oct, 2007

1 commit


23 Oct, 2007

1 commit

  • This makes lguest able to use the virtio devices.

    We change the device descriptor page from a simple array to a variable
    length "type, config_len, status, config data..." format, and
    implement virtio_config_ops to read from that config data.

    We use the virtio ring implementation for an efficient Guest <-> Host
    virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
    host when it changes.

    We also use LHCALL_NOTIFY on kernel addresses for very very early
    console output. We could have another hypercall, but this hack works
    quite well.

    Signed-off-by: Rusty Russell

    Rusty Russell
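A minimal sketch of walking a variable-length descriptor page laid out as "type, config_len, status, config data...". It assumes one byte each for type, config_len and status, and a type of 0 marking the end of the list. These field widths, the terminator and the function names are assumptions for illustration, not the lguest launcher's exact format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_PAGE_SIZE 4096

/* Count device records; stop at type 0 or at a truncated record. */
static unsigned int count_devices(const uint8_t *page)
{
	size_t off = 0;
	unsigned int n = 0;

	while (off + 3 <= DESC_PAGE_SIZE && page[off] != 0) {
		size_t rec_len = 3 + (size_t)page[off + 1];  /* header + config */
		if (off + rec_len > DESC_PAGE_SIZE)
			break;
		n++;
		off += rec_len;
	}
	return n;
}

/* Return device idx's config data and its length, or NULL. */
static const uint8_t *device_config(const uint8_t *page, unsigned int idx,
				    uint8_t *config_len)
{
	size_t off = 0;

	while (off + 3 <= DESC_PAGE_SIZE && page[off] != 0) {
		size_t rec_len = 3 + (size_t)page[off + 1];
		if (off + rec_len > DESC_PAGE_SIZE)
			return NULL;
		if (idx-- == 0) {
			*config_len = page[off + 1];
			return page + off + 3;
		}
		off += rec_len;
	}
	return NULL;
}
```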