13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

27 Jul, 2010

4 commits

  • This patchset:

    PV guests under Xen are running in a non-contiguous memory architecture.

    When PCI pass-through is utilized, this necessitates an IOMMU for
    translating bus (DMA) to virtual and vice-versa and also providing a
    mechanism to have contiguous pages for device drivers operations (say DMA
    operations).

    Specifically, under Xen the Linux idea of pages is an illusion. It
    assumes that pages start at zero and go up to the available memory. To
    help with that, the Linux Xen MMU provides a lookup mechanism to
    translate the page frame numbers (PFN) to machine frame numbers (MFN)
    and vice-versa. The MFN are the "real" frame numbers. Furthermore
    memory is not contiguous. Xen hypervisor stitches memory for guests
    from different pools, which means there is no guarantee that PFN==MFN
    and PFN+1==MFN+1. Lastly with Xen 4.0, pages (in debug mode) are
    allocated in descending order (high to low), meaning the guest might
    never get any MFNs under the 4GB mark.

    Signed-off-by: Konrad Rzeszutek Wilk
    Acked-by: Jeremy Fitzhardinge
    Cc: FUJITA Tomonori
    Cc: Albert Herranz
    Cc: Ian Campbell

    Konrad Rzeszutek Wilk
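
The PFN/MFN relationship described in the commit message above can be illustrated with a small model. This is only a sketch: the table contents are made-up values, and the helper names (`pfn_to_mfn`, `mfn_to_pfn`, `is_machine_contiguous`, `below_4gb`) are chosen for illustration rather than taken from the kernel sources.

```python
# Hypothetical guest with 4 pages. Index = PFN (what Linux sees),
# value = MFN (the "real" machine frame). The MFN values are invented.
p2m = [0x1A3F0, 0x0072C, 0x1A3F1, 0x12B000]

# Reverse table: MFN -> PFN.
m2p = {mfn: pfn for pfn, mfn in enumerate(p2m)}


def pfn_to_mfn(pfn):
    return p2m[pfn]


def mfn_to_pfn(mfn):
    return m2p[mfn]


def is_machine_contiguous(first_pfn, count):
    """True if `count` consecutive PFNs map to consecutive MFNs."""
    base = pfn_to_mfn(first_pfn)
    return all(pfn_to_mfn(first_pfn + i) == base + i for i in range(count))


def below_4gb(mfn, page_shift=12):
    """True if the machine frame lies under the 4GB mark (4K pages assumed)."""
    return (mfn << page_shift) < (1 << 32)
```

In this model PFN 0 and PFN 1 are adjacent from the guest's point of view but not in machine memory, which is why a buffer that a device will DMA into needs either translation or an explicitly contiguous region.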
     
  • When a pagetable is about to be destroyed, we notify Xen so that the
    hypervisor can clear the related shadow pagetable.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     
  • Add a xen_emul_unplug command line option to the kernel to unplug
    xen emulated disks and nics.

    Set the default value of xen_emul_unplug depending on whether or
    not the Xen PV frontends and the Xen platform PCI driver have
    been compiled for this kernel (modules or built-in are both OK).

    The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
    even if the host platform doesn't support unplug.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     
  • Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
    clockevent device on all vcpus, use the xen wallclock time as wallclock
    instead of rtc and use xen_clocksource as clocksource.
    The pv clock algorithm needs to work correctly for the xen_clocksource
    and xen wallclock to be usable; only modern Xen versions offer a
    reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).

    Using the hpet as clocksource means a VMEXIT every time we read/write to
    the hpet mmio addresses; pvclock gives us a better rating without
    VMEXITs. The same goes for the xen wallclock and xen_vcpuop_clockevent.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Don Dutile
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     

23 Jul, 2010

4 commits

  • Suspend/resume requires a few different things on HVM: the suspend
    hypercall is different, and we don't need to save/restore memory-related
    settings, except for the shared info page and the callback mechanism.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     
  • Add the xen pci platform device driver that is responsible
    for initializing the grant table and xenbus in PV on HVM mode.
    A few changes to xenbus and grant table are necessary to allow the delayed
    initialization in HVM mode.
    Grant table needs a few additional modifications to work in HVM mode.

    The Xen PCI platform device raises an irq every time an event has been
    delivered to us. However these interrupts are only delivered to vcpu 0.
    The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
    that is a little wrapper around __xen_evtchn_do_upcall, the traditional
    Xen upcall handler, the very same used with traditional PV guests.

    When running on HVM the event channel upcall is never called while in
    progress because it is a normal Linux irq handler (and we cannot switch
    the irq chip wholesale to the Xen PV ones as we are running QEMU and
    might have passed in PCI devices), therefore we cannot be sure that
    evtchn_upcall_pending is 0 when returning.
    For this reason, if evtchn_upcall_pending is set by Xen we need to loop
    again over the pending event channels; otherwise we might lose some
    event channel deliveries.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
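
The re-check loop described in the commit message above can be sketched as follows. This is a simplified single-threaded model, not the kernel code: the `vcpu` dictionary, its `pending_ports` list and the `handle_event` callback are assumptions made for illustration; only the `evtchn_upcall_pending` flag name comes from the commit message.

```python
def hvm_evtchn_upcall(vcpu, handle_event):
    """Process pending event channels, looping until no new upcall is flagged."""
    handled = []
    while True:
        # Clear the flag first, so anything Xen sets while we are
        # processing is noticed when we re-check below.
        vcpu["evtchn_upcall_pending"] = 0

        # Take a snapshot of the ports that are pending right now.
        pending, vcpu["pending_ports"] = vcpu["pending_ports"], []
        for port in pending:
            handle_event(port)
            handled.append(port)

        # If the flag was set again in the meantime, go around once more;
        # returning here would drop those deliveries.
        if not vcpu["evtchn_upcall_pending"]:
            break
    return handled
```

In this model, an event that arrives while another is being handled is picked up by the second pass instead of being lost.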
     
  • Set the callback to receive evtchns from Xen, using the
    callback vector delivery mechanism.

    The traditional way for receiving event channel notifications from Xen
    is via the interrupts from the platform PCI device.
    The callback vector is a newer alternative that allows us to receive
    notifications on any vcpu and doesn't need any PCI support: we allocate
    a vector exclusively to receive events, and in the vector handler we don't
    need to interact with the vlapic, so we avoid a VMEXIT.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Jeremy Fitzhardinge

    Sheng Yang
     
  • Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Sheng Yang
    Signed-off-by: Stefano Stabellini

    Jeremy Fitzhardinge
     

08 Jun, 2010

2 commits

  • A memory region must be physically contiguous in order to be accessed
    through DMA. This patch adds xen_create_contiguous_region, which
    ensures a region of contiguous virtual memory is also physically
    contiguous.

    Based on Stephen Tweedie's port of the 2.6.18-xen version.

    Remove contiguous_bitmap[] as it's no longer needed.

    Ported from linux-2.6.18-xen.hg 707:e410857fd83c

    [ Impact: add Xen-internal API to make pages phys-contig ]

    Signed-off-by: Alex Nixon
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Alex Nixon
     
  • * xen_create_contiguous_region needs access to the balloon lock to
    ensure memory doesn't change under its feet, so expose the balloon
    lock
    * Change the name of the lock to xen_reservation_lock, to reflect its
    now less-specific usage.

    [ Impact: cleanup ]

    Signed-off-by: Alex Nixon
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Konrad Rzeszutek Wilk

    Alex Nixon
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
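
The first rule in the list above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as below. This is not the slabh-sweep.py script itself: the function name `needed_include` and the symbol patterns are assumptions chosen for illustration, and the real script may well detect usage differently.

```python
import re

# Assumed, non-exhaustive patterns for "gfp usage" and "slab usage".
GFP_RE = re.compile(r"\b(GFP_[A-Z_]+|__GFP_[A-Z_]+|alloc_pages?|__get_free_pages?)\b")
SLAB_RE = re.compile(r"\b(k[mzc]alloc|kfree|kmem_cache_\w+)\b")


def needed_include(source):
    """Return the single header a file should include, or None.

    slab usage wins, because slab.h pulls in gfp.h; gfp.h is only
    needed on its own when no slab facility is used.
    """
    if SLAB_RE.search(source):
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None
```

A file that calls `kmalloc(n, GFP_KERNEL)` would get only slab.h under this rule, while one that only calls `alloc_page(GFP_KERNEL)` would get gfp.h.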

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

05 Nov, 2009

1 commit


31 Mar, 2009

7 commits


09 Jan, 2009

1 commit

  • The xenfs filesystem exports various interfaces to usermode. Initially
    this exports a file to allow usermode to interact with xenbus/xenstore.

    Traditionally this appeared in /proc/xen. Rather than extending procfs,
    this patch adds a backward-compat mountpoint on /proc/xen, and provides
    a xenfs filesystem which can be mounted there.

    Signed-off-by: Alex Zeffertt
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Zeffertt
     

17 Dec, 2008

1 commit


03 Oct, 2008

1 commit


21 Aug, 2008

1 commit

  • A spinlock can be interrupted while spinning, so make sure we preserve
    the previous lock of interest if we're taking a lock from within an
    interrupt handler.

    We also need to deal with the case where the blocking path gets
    interrupted between testing to see if the lock is free and actually
    blocking. If we get interrupted there and end up in the state where
    the lock is free but the irq isn't pending, then we'll block
    indefinitely in the hypervisor. The fix is to make sure that any
    nested lock-takers will always leave the irq pending if there's any
    chance the outer lock became free.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Jan Beulich
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
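
The "preserve the previous lock of interest" idea from the commit message above can be sketched as a save/restore pair. This is a minimal model under stated assumptions: the per-cpu state is a plain dictionary, and the helper names `spinning_lock` and `unspinning_lock` are used here for illustration only.

```python
def spinning_lock(cpu_state, lock):
    """Announce that this cpu is now spinning on `lock`.

    Returns whatever lock the cpu was already spinning on (or None),
    so a nested lock-taker can put it back afterwards.
    """
    prev = cpu_state.get("spinning_on")
    cpu_state["spinning_on"] = lock
    return prev


def unspinning_lock(cpu_state, prev):
    """Restore the lock of interest that was active before we started."""
    cpu_state["spinning_on"] = prev
```

An interrupt handler that takes lock B while the interrupted code was spinning on lock A calls `spinning_lock` (getting A back as the previous value) and later `unspinning_lock` with that value, so the outer spinner is still registered as waiting on A.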
     

16 Jul, 2008

4 commits

  • The standard ticket spinlocks are very expensive in a virtual
    environment, because their performance depends on Xen's scheduler
    giving vcpus time in the order that they're supposed to take the
    spinlock.

    This implements a Xen-specific spinlock, which should be much more
    efficient.

    The fast-path is essentially the old Linux-x86 locks, using a single
    lock byte. The locker decrements the byte; if the result is 0, then
    they have the lock. If the lock is negative, then locker must spin
    until the lock is positive again.

    When there's contention, the locker spins for 2^16[*] iterations waiting
    to get the lock. If it fails to get the lock in that time, it adds
    itself to the contention count in the lock and blocks on a per-cpu
    event channel.

    When unlocking the spinlock, the locker looks to see if there's anyone
    blocked waiting for the lock by checking for a non-zero waiter count.
    If there's a waiter, it traverses the per-cpu "lock_spinners"
    variable, which contains which lock each CPU is waiting on. It picks
    one CPU waiting on the lock and sends it an event to wake it up.

    This allows efficient fast-path spinlock operation, while allowing
    spinning vcpus to give up their processor time while waiting for a
    contended lock.

    [*] 2^16 iterations is the threshold at which 98% of locks have been
    taken, according to Thomas Friebel's Xen Summit talk "Preventing Guests
    from Spinning Around". Therefore, we'd expect the lock and unlock slow
    paths to be entered only 2% of the time.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Jens Axboe
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Petr Tesarik
    Cc: Virtualization
    Cc: Xen devel
    Cc: Thomas Friebel
    Cc: Nick Piggin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
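
The fast path, spin threshold and wake-up scan described in the commit message above can be modelled roughly as follows. This is a single-threaded sketch, not the kernel implementation: the test-then-decrement is not atomic here, "blocking" is represented by a return value instead of an event channel, and all names other than `lock_spinners` are assumptions for illustration.

```python
SPIN_THRESHOLD = 1 << 16  # 2^16 iterations, the figure quoted in the commit


class XenByteLock:
    def __init__(self):
        self.lock = 1      # 1 = free; 0 or below = held
        self.waiters = 0   # cpus that gave up spinning and blocked


def spin_lock(lock, cpu, lock_spinners, spin_limit=SPIN_THRESHOLD):
    """Try to take the lock; give up and 'block' after spin_limit tries."""
    for _ in range(spin_limit):
        if lock.lock > 0:
            lock.lock -= 1
            if lock.lock == 0:
                return "acquired"
    # Slow path: register as a waiter and record which lock we want.
    lock.waiters += 1
    lock_spinners[cpu] = lock
    return "blocked"


def spin_unlock(lock, lock_spinners):
    """Release the lock; return the cpu to kick, or None if nobody waits."""
    lock.lock = 1
    if lock.waiters == 0:
        return None
    for cpu, waiting_on in lock_spinners.items():
        if waiting_on is lock:
            return cpu
    return None


def wake(lock, cpu, lock_spinners, spin_limit=SPIN_THRESHOLD):
    """Model a blocked cpu receiving its event and retrying."""
    del lock_spinners[cpu]
    lock.waiters -= 1
    return spin_lock(lock, cpu, lock_spinners, spin_limit)
```

The uncontended case touches only the lock byte; the waiter count and the per-cpu table are consulted only once a locker has exceeded the spin threshold.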
     
  • fix:

    arch/x86/xen/built-in.o: In function `set_page_prot':
    enlighten.c:(.text+0x111d): undefined reference to `xen_raw_printk'
    arch/x86/xen/built-in.o: In function `xen_start_kernel':
    : undefined reference to `xen_raw_console_write'
    arch/x86/xen/built-in.o: In function `xen_start_kernel':
    : undefined reference to `xen_raw_console_write'

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Copy 64-bit definitions of various interface structures into place.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • add xen_timer_resume() hook.

    Timer resume should be done after the event channel is resumed.
    Add a xen_arch_resume() hook for when ipi becomes usable after resume.
    After resume, some cpu-specific resources must be reinitialized
    on ia64 that can't be set by another cpu.

    However, the available hooks are run only once, on a single cpu, so an
    ipi has to be used.

    During stop_machine_run() ipi can't be used because interrupts are masked.
    So add another hook after stop_machine_run().
    Another approach might be to use a resume hook which is run by
    device_resume(). However, device_resume() may also be executed on
    the suspend error recovery path.

    So it would be necessary to determine whether it is executed on the real
    resume path or the error recovery path.

    Signed-off-by: Isaku Yamahata
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Isaku Yamahata
     

25 Jun, 2008

3 commits

  • Xen has a pte update function which will update a pte while preserving
    its accessed and dirty bits. This means that ptep_modify_prot_start() can be
    implemented as a simple read of the pte value. The hardware may
    update the pte in the meantime, but ptep_modify_prot_commit() updates it while
    preserving any changes that may have happened in the meantime.

    The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
    mmu mode.

    The mmu_update hypercall can take a batch of updates to perform, but
    this code doesn't make particular use of that feature, in favour of
    using generic multicall batching to get them all into the hypervisor.

    The net effect of this is that each mprotect pte update turns from two
    expensive trap-and-emulate faults into the hypervisor into a single
    hypercall whose cost is amortized in a batched multicall.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Linus Torvalds
    Acked-by: Hugh Dickins
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Conflicts:

    arch/x86/xen/enlighten.c
    arch/x86/xen/mmu.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • This patch updates the xen guest to use the pvclock structs
    and helper functions.

    Signed-off-by: Gerd Hoffmann
    Acked-by: Jeremy Fitzhardinge
    Signed-off-by: Avi Kivity

    Gerd Hoffmann
     

02 Jun, 2008

1 commit


29 May, 2008

1 commit

  • -tip testing found the following build breakage:

    drivers/built-in.o: In function `xen_suspend':
    manage.c:(.text+0x4390f): undefined reference to `xen_console_resume'

    with this config:

    http://redhat.com/~mingo/misc/config-Thu_May_29_09_23_16_CEST_2008.bad

    i have bisected it down to:

    | commit 0e91398f2a5d4eb6b07df8115917d0d1cf3e9b58
    | Author: Jeremy Fitzhardinge
    | Date: Mon May 26 23:31:27 2008 +0100
    |
    | xen: implement save/restore

    the problem is that drivers/xen/manage.c is built unconditionally if
    CONFIG_XEN is enabled and its xen_suspend() makes use of
    xen_console_resume(), but drivers/char/hvc_xen.c, where
    xen_console_resume() is implemented, is only built if CONFIG_HVC_XEN=y
    as well.

    i have solved this by providing a NOP implementation for
    xen_console_resume() in the !CONFIG_HVC_XEN case.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

27 May, 2008

7 commits

  • Hook into the device model to make sure that timekeeping's resume handler
    is called. This deals with our clocksource's non-monotonicity over the
    save/restore. Explicitly call clock_has_changed() to make sure that
    all the timers get retriggered properly.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
     
  • This patch implements Xen save/restore and migration.

    Saving is triggered via xenbus, which is polled in
    drivers/xen/manage.c. When a suspend request comes in, the kernel
    prepares itself for saving by:

    1 - Freeze all processes. This is primarily to prevent any
    partially-completed pagetable updates from confusing the suspend
    process. If CONFIG_PREEMPT isn't defined, then this isn't necessary.

    2 - Suspend xenbus and other devices

    3 - Stop_machine, to make sure all the other vcpus are quiescent. The
    Xen tools require the domain to run its save off vcpu0.

    4 - Within the stop_machine state, it pins any unpinned pgds (under
    construction or destruction), canonicalizes various other
    pieces of state (mostly converting mfns to pfns), and finally

    5 - Suspend the domain

    Restore reverses the steps used to save the domain, ending when all
    the frozen processes are thawed.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
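
The ordering in the commit message above (five steps down, then the same steps undone in reverse) can be written out as a small table. This is only a sketch of the sequence: the step labels are paraphrased from the message, and the wording of the matching "undo" labels is an assumption.

```python
# (step taken while saving, matching step taken while restoring)
SUSPEND_STEPS = [
    ("freeze processes", "thaw processes"),
    ("suspend xenbus and devices", "resume xenbus and devices"),
    ("enter stop_machine", "leave stop_machine"),
    ("pin pgds and canonicalize mfns to pfns", "convert pfns back to mfns"),
    ("suspend the domain", "domain resumed"),
]


def save_and_restore(log):
    """Append the save steps in order, then the restore steps in reverse."""
    for down, _ in SUSPEND_STEPS:
        log.append(down)
    for _, up in reversed(SUSPEND_STEPS):
        log.append(up)
    return log
```

The first step taken on save (freezing processes) is therefore the last one undone on restore, matching the statement that restore ends when all the frozen processes are thawed.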
     
  • Add code to:

    1. Deal with the console page being canonicalized. During save, the
    console's mfn in the start_info structure is canonicalized to a pfn.
    In order to deal with that, we always use a copy of the pfn and
    indirect off that all the time. However, we fall back to using the
    mfn if the pfn hasn't been initialized yet.

    2. Restore the console event channel, and rebind it to the existing irq.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
     
  • Add rebind_evtchn_irq(), which will rebind a device driver's existing
    irq to a new event channel on restore. Since the new event channel
    will be masked and bound to vcpu0, we update the state accordingly and
    unmask the irq once everything is set up.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge
     
  • Add xen handle related definitions for xen memory which ia64/xen needs.
    Pointer arguments for ia64/xen hypercalls are passed as pseudo physical
    addresses (guest physical addresses), so it is required to convert
    guest kernel virtual addresses into pseudo physical addresses.
    The xen guest handle represents such arguments.

    Signed-off-by: Isaku Yamahata
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Isaku Yamahata
     
  • The pvfb backend indicates dynamic mode support by creating node
    feature_resize with a non-zero value in its xenstore directory.
    xen-fbfront sends a resize notification event on mode change. Fully
    backwards compatible both ways.

    Framebuffer size and initial resolution can be controlled through
    kernel parameter xen_fbfront.video. The backend enforces a separate
    size limit, which it advertises in node videoram in its xenstore
    directory.

    xen-kbdfront gets the maximum screen resolution from nodes width and
    height in the backend's xenstore directory instead of hardcoding it.

    Additional goodie: support for larger framebuffers (512M on a 64-bit
    system with 4K pages).

    Changing the number of bits per pixels dynamically is not supported,
    yet.

    Ported from
    http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/92f7b3144f41
    http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/bfc040135633

    Signed-off-by: Pat Campbell
    Signed-off-by: Markus Armbruster
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Markus Armbruster
     
  • Add z-axis motion to pointer events. Backward compatible, because
    there's space for the z-axis in union xenkbd_in_event, and old
    backends zero it.

    Derived from
    http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/57dfe0098000
    http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/1edfea26a2a9
    http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/c3ff0b26f664

    Signed-off-by: Pat Campbell
    Signed-off-by: Markus Armbruster
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Markus Armbruster