27 Dec, 2011

1 commit

  • My testing version of Smatch complains that addr and len come from
    the user and they can wrap. The path is:
    -> kvm_vm_ioctl()
    -> kvm_vm_ioctl_unregister_coalesced_mmio()
    -> coalesced_mmio_in_range()

    I don't know what the implications are of wrapping here, but we may
    as well fix it, if only to silence the warning.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Marcelo Tosatti

    Dan Carpenter
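
    For illustration, a minimal sketch of an overflow-safe range check in
    the spirit of this fix; the field names follow the upstream
    kvm_coalesced_mmio_zone layout, but the actual patch may differ:

        static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
                                           gpa_t addr, int len)
        {
                /* (addr, len) must fall fully inside (zone.addr, zone.size) */
                if (len < 0)
                        return 0;
                if (addr + len < addr)          /* addr + len wrapped past zero */
                        return 0;
                if (addr < dev->zone.addr)
                        return 0;
                if (addr + len > dev->zone.addr + dev->zone.size)
                        return 0;
                return 1;
        }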
     

26 Sep, 2011

3 commits

  • Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
    is to call the read or write callback for each device registered
    on the bus until we find a device which handles it.

    Since the number of devices on a bus can be significant due to ioeventfds
    and coalesced MMIO zones, this leads to a lot of overhead on each IO
    operation.

    Instead of registering devices, we now register ranges which point to
    a device. Lookup is done using an efficient bsearch instead of a
    linear search.

    The performance test compared exit counts per second: 200 ioeventfds
    were created on one byte, while the guest continuously accessed a
    different byte (triggering usermode exits). Before the patch the
    guest achieved 259k exits per second; after the patch it achieves
    274k exits per second.

    Cc: Avi Kivity
    Cc: Marcelo Tosatti
    Signed-off-by: Sasha Levin
    Signed-off-by: Avi Kivity

    Sasha Levin
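
    For illustration, a sketch of the range-based lookup described above;
    the struct and comparator mirror the shape of the patch but are not
    the literal code:

        struct kvm_io_range {
                gpa_t addr;
                int len;
                struct kvm_io_device *dev;
        };

        /* order ranges by start address, then by length */
        static int kvm_io_bus_sort_cmp(const void *p1, const void *p2)
        {
                const struct kvm_io_range *r1 = p1, *r2 = p2;

                if (r1->addr != r2->addr)
                        return r1->addr < r2->addr ? -1 : 1;
                if (r1->len != r2->len)
                        return r1->len < r2->len ? -1 : 1;
                return 0;
        }

        /* registration keeps bus->range[] sorted with sort(), so the hot
         * path can find the owning device with bsearch() instead of a
         * linear scan:
         *
         *     range = bsearch(&key, bus->range, bus->dev_count,
         *                     sizeof(key), kvm_io_bus_sort_cmp);
         */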
     
  • This patch changes coalesced mmio to create one mmio device per
    zone instead of handling all zones in one device.

    Doing so enables us to take advantage of existing locking and prevents
    a race condition between coalesced mmio registration/unregistration
    and lookups.

    Suggested-by: Avi Kivity
    Signed-off-by: Sasha Levin
    Signed-off-by: Marcelo Tosatti

    Sasha Levin
     
  • Move the check for available ring entries to within the spinlock.
    This allows working with a larger number of VCPUs and reduces
    premature exits when many VCPUs are in use.

    Cc: Avi Kivity
    Cc: Ingo Molnar
    Cc: Marcelo Tosatti
    Cc: Pekka Enberg
    Signed-off-by: Sasha Levin
    Signed-off-by: Marcelo Tosatti

    Sasha Levin
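
    The shape of the change, sketched with upstream-style names: the
    free-entry check must happen under the same lock that protects the
    ring insert, otherwise two VCPUs can both pass the check and fight
    over the same slot:

        spin_lock(&dev->lock);

        /* check for room only while holding the lock */
        if (!coalesced_mmio_has_room(dev)) {
                spin_unlock(&dev->lock);
                return -EOPNOTSUPP;
        }

        /* copy data into the first free entry of the ring */
        ring->coalesced_mmio[ring->last].phys_addr = addr;
        ring->coalesced_mmio[ring->last].len = len;
        memcpy(ring->coalesced_mmio[ring->last].data, val, len);
        smp_wmb();
        ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
        spin_unlock(&dev->lock);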
     

17 May, 2010

2 commits

  • kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced
    mmio ring page and dev even after it has freed them.

    Also, if this function fails, rare though that may be, it suggests
    the system is in a serious state, so we had better stop the work that
    follows kvm_create_vm().

    This patch fixes both problems.

    We move the coalesced mmio initialization out of kvm_create_vm().
    This seems natural because it includes a registration which can be
    done only when the vm is successfully created.

    Signed-off-by: Takuya Yoshikawa
    Signed-off-by: Marcelo Tosatti

    Takuya Yoshikawa
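
    A sketch of the two fixes, under stated assumptions (the helper names
    and the exact creation path are illustrative): clear the cached
    pointers when freeing, and abort VM creation when init fails:

        void kvm_coalesced_mmio_free(struct kvm *kvm)
        {
                if (kvm->coalesced_mmio_ring)
                        free_page((unsigned long)kvm->coalesced_mmio_ring);
                kvm->coalesced_mmio_ring = NULL;        /* was left dangling */
        }

        /* ...and in the VM creation path: */
        r = kvm_coalesced_mmio_init(kvm);
        if (r < 0) {
                kvm_put_kvm(kvm);       /* stop instead of limping on */
                return r;
        }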
     
  • This patch changes the errno of the KVM_[UN]REGISTER_COALESCED_MMIO
    ioctls from -EINVAL to -ENXIO when no coalesced mmio dev exists.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Marcelo Tosatti

    Wei Yongjun
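
    The change itself is a one-line error path, roughly:

        struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;

        if (dev == NULL)
                return -ENXIO;  /* was -EINVAL; "no such device" fits better */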
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h, making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare
    for this change by updating users of gfp and slab facilities to
    include those headers directly instead of assuming availability.
    As this conversion needs to touch a large number of source files,
    the following script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. gfp.h if only gfp is
    used, slab.h if slab is used.

    * When the script inserts a new include, it looks at the include
    blocks and tries to place the new include so that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered:
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on the arch to make
    things build (like ipr on powerpc/64, which failed due to missing
    writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied
    as a separate patch and serve as a bisection point.

    Given that I had only a couple of failures from the build tests in
    step 7, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
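
    The practical effect on a single file, illustrated with a hypothetical
    driver: code calling the slab allocator can no longer rely on percpu.h
    pulling in slab.h transitively and must include it directly:

        /* hypothetical-driver.c: previously compiled only because
         * <linux/percpu.h> dragged in <linux/slab.h> implicitly */
        #include <linux/percpu.h>
        #include <linux/slab.h>         /* now explicit: kzalloc(), kfree() */

        static void *make_buffer(size_t n)
        {
                return kzalloc(n, GFP_KERNEL);
        }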
     

10 Sep, 2009

8 commits

  • Today kvm_io_bus_register_dev() returns void and will internally BUG_ON
    if it fails. We want to create dynamic MMIO/PIO entries driven from
    userspace later in the series, so we need to enhance the code to be more
    robust with the following changes:

    1) Add a return value to the registration function
    2) Fix up all the callsites to check the return code, handle any
    failures, and percolate the error up to the caller.
    3) Add an unregister function that collapses holes in the array

    Signed-off-by: Gregory Haskins
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Gregory Haskins
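
    A sketch of changes (1) and (3) against the flat device array of the
    time; illustrative rather than the literal patch:

        /* (1) registration can now fail instead of BUG_ON-ing */
        int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
                                    struct kvm_io_device *dev)
        {
                if (bus->dev_count >= NR_IOBUS_DEVS)
                        return -ENOSPC;
                bus->devs[bus->dev_count++] = dev;
                return 0;
        }

        /* (3) unregistration collapses the hole left in the array */
        void kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
                                       struct kvm_io_device *dev)
        {
                int i;

                for (i = 0; i < bus->dev_count; i++)
                        if (bus->devs[i] == dev) {
                                bus->devs[i] = bus->devs[--bus->dev_count];
                                break;
                        }
        }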
     
  • This changes bus accesses to use the high-level kvm_io_bus_read() and
    kvm_io_bus_write() functions. in_range now becomes unused, so it is
    removed from device ops in favor of read/write callbacks performing
    range checks internally.

    This allows aliasing (mostly for in-kernel virtio), as well as better error
    handling by making it possible to pass errors up to userspace.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
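
    The resulting callback shape, sketched (names follow the spirit of the
    patch): each device checks the range itself and signals "not mine"
    with an error code the bus can act on:

        static int coalesced_mmio_write(struct kvm_io_device *this,
                                        gpa_t addr, int len, const void *val)
        {
                struct kvm_coalesced_mmio_dev *dev = to_mmio(this);

                if (!coalesced_mmio_in_range(dev, addr, len))
                        return -EOPNOTSUPP;     /* bus tries the next device */

                /* ... queue the access in the ring ... */
                return 0;
        }

        /* kvm_io_bus_write() walks the devices until one returns 0 and
         * passes an error up (ultimately to userspace) if none do. */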
     
  • Use slots_lock to protect the device list on the bus. slots_lock is
    already taken for read everywhere, so we only need to take it for
    write when registering devices. This is in preparation for removing
    in_range and the kvm->lock around it.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
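
    The locking rule, sketched: the I/O fast path already holds slots_lock
    for read, so only the rare registration path needs the write side:

        /* registration (rare): take slots_lock for write */
        down_write(&kvm->slots_lock);
        ret = kvm_io_bus_register_dev(bus, dev);
        up_write(&kvm->slots_lock);

        /* lookups on the I/O path (hot) keep taking it for read:
         *
         *     down_read(&kvm->slots_lock);
         *     ...scan the device list...
         *     up_read(&kvm->slots_lock);
         */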
     
  • Switch coalesced mmio to slots_lock. slots_lock is already taken for
    read everywhere, so we only need to take it for write when changing
    zones. This is in preparation for removing in_range and the
    kvm->lock around it.

    [avi: fix build]

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
     
  • Move coalesced_mmio locking to its own device, instead of relying on
    kvm->lock.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Instead of checking whether we'll wrap around, calculate how many entries
    are available, and check whether we have enough (just one) for the pending
    mmio.

    By itself, this doesn't change anything, but it paves the way for making
    this function lockless.

    Signed-off-by: Avi Kivity

    Avi Kivity
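
    The computation, sketched from the ring layout described above (first
    is the consumer index, last the producer, and one slot always stays
    unused so that full and empty are distinguishable):

        static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev)
        {
                struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
                unsigned avail;

                /* unsigned modular arithmetic handles last < first naturally */
                avail = (ring->first - ring->last - 1) % KVM_COALESCED_MMIO_MAX;

                return avail >= 1;      /* enough room for one pending mmio */
        }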
     
  • We modernize the io_device code so that we use container_of() instead
    of dev->private, and move the vtable to a separate ops structure
    (which theoretically allows better caching for multiple instances of
    the same ops structure).

    Signed-off-by: Gregory Haskins
    Acked-by: Chris Wright
    Signed-off-by: Avi Kivity

    Gregory Haskins
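
    The pattern being adopted, sketched: embed the kvm_io_device in the
    private structure, recover the container with container_of(), and
    share one const ops structure between instances:

        struct kvm_coalesced_mmio_dev {
                struct kvm_io_device dev;       /* embedded, not pointed to */
                struct kvm *kvm;
                /* ... */
        };

        static struct kvm_coalesced_mmio_dev *to_mmio(struct kvm_io_device *dev)
        {
                /* recover the enclosing object from the embedded member */
                return container_of(dev, struct kvm_coalesced_mmio_dev, dev);
        }

        /* one shared, read-only vtable for every instance */
        static const struct kvm_io_device_ops coalesced_mmio_ops = {
                .write      = coalesced_mmio_write,
                .destructor = coalesced_mmio_destructor,
        };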
     
  • We invoke kfree() on a data member instead of the structure. This
    works today because the kvm_io_device is the first element of the
    private structure, but this could change in the future, so let's
    clean this up.

    Signed-off-by: Gregory Haskins
    Acked-by: Chris Wright
    Signed-off-by: Avi Kivity

    Gregory Haskins
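
    The bug pattern, sketched: freeing through the embedded member's
    address only works while that member sits at offset zero of the
    enclosing structure:

        static void coalesced_mmio_destructor(struct kvm_io_device *this)
        {
                struct kvm_coalesced_mmio_dev *dev = to_mmio(this);

                /* before: kfree(this) -- correct only while the
                 * kvm_io_device is the first field of *dev */
                kfree(dev);     /* free the enclosing structure instead */
        }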
     

20 Jul, 2008

1 commit

  • This patch adds all needed structures to coalesce MMIOs.
    Until an architecture uses it, it is not compiled.

    Coalesced MMIO introduces two ioctls to define the MMIO zones that
    can be coalesced:

    - KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
    It takes one parameter (struct kvm_coalesced_mmio_zone) which defines
    a memory area where MMIOs can be coalesced until the next switch to
    user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

    - KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
    the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

    The userspace client can check kernel coalesced MMIO availability by asking
    ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
    The ioctl() will return 0 if the capability is not supported, or the
    page offset where the ring buffer will be stored.
    The page offset depends on the architecture.

    After an ioctl(KVM_RUN), the first page of the mapped KVM memory
    points to a kvm_run structure. The offset given by
    KVM_CAP_COALESCED_MMIO is an offset to the coalesced MMIO ring,
    expressed in PAGE_SIZE units relative to the start of the kvm_run
    structure. The MMIO ring buffer is defined by the structure
    kvm_coalesced_mmio_ring.

    [akio: fix oops during guest shutdown]

    Signed-off-by: Laurent Vivier
    Signed-off-by: Akio Takebe
    Signed-off-by: Avi Kivity

    Laurent Vivier
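
    A userspace sketch of the flow described above, with error handling
    elided; kvm_fd/vm_fd/vcpu_fd, the example address, and mmap_size
    (obtained via KVM_GET_VCPU_MMAP_SIZE) are assumptions:

        struct kvm_coalesced_mmio_zone zone = {
                .addr = 0xf0000000,     /* hypothetical MMIO region */
                .size = 0x1000,
        };
        struct kvm_run *run;
        struct kvm_coalesced_mmio_ring *ring;
        int off;

        ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);

        /* 0 if unsupported, otherwise the ring's offset in pages */
        off = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO);

        run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, vcpu_fd, 0);
        ring = (struct kvm_coalesced_mmio_ring *)
                       ((char *)run + off * sysconf(_SC_PAGESIZE));

        /* after KVM_RUN returns, drain the coalesced entries */
        while (ring->first != ring->last) {
                struct kvm_coalesced_mmio *m =
                        &ring->coalesced_mmio[ring->first];
                /* handle m->phys_addr, m->len, m->data here */
                ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }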