29 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: pxa27x_keypad - remove input_free_device() in pxa27x_keypad_remove()
    Input: mousedev - fix regression of inverting axes
    Input: uinput - add devname alias to allow module on-demand load
    Input: hil_kbd - fix compile error
    USB: drop tty argument from usb_serial_handle_sysrq_char()
    Input: sysrq - drop tty argument from handle_sysrq()
    Input: sysrq - drop tty argument from sysrq ops handlers

    Linus Torvalds
     

25 Aug, 2010

2 commits

  • Xen events are logically edge triggered, as Xen only calls the event
    upcall when an event is newly set, but not continuously as it remains set.
    As a result, use handle_edge_irq rather than handle_level_irq.

    This has the important side-effect of fixing a long-standing bug of
    events getting lost if:
    - an event's interrupt handler is running
    - the event is migrated to a different vcpu
    - the event is re-triggered

    The most noticeable symptom of these lost events is occasional lockups
    of blkfront.

    Many thanks to Tom Kopec and Daniel Stodden for tracking this down.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Tom Kopec
    Cc: Daniel Stodden
    Cc: Stable Kernel

    Jeremy Fitzhardinge
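
    The lost-event scenario in the entry above can be illustrated with a toy
    model. This is only a sketch, not kernel code: ToyFlow and count_handled
    are hypothetical names, and the model assumes, as the message says, that
    an event is signalled once when it is newly set and is not re-asserted.
    An edge-style flow latches a trigger that arrives while the handler is
    running and replays it; a flow that records nothing loses it.

```python
class ToyFlow:
    """Toy model of an interrupt flow handler (hypothetical, not kernel code)."""

    def __init__(self, latch_while_running):
        self.latch_while_running = latch_while_running
        self.in_progress = False
        self.pending = False
        self.handled = 0

    def trigger(self, retrigger_mid_handler=False):
        if self.in_progress:
            # A new event arrived while the handler is still running.
            if self.latch_while_running:
                self.pending = True   # edge-style: remember it for replay
            return                    # otherwise nothing is recorded
        self.in_progress = True
        while True:
            self.pending = False
            self.handled += 1         # stands in for the real handler body
            if retrigger_mid_handler:
                retrigger_mid_handler = False
                self.trigger()        # simulate the event firing again
            if not self.pending:
                break
        self.in_progress = False


def count_handled(latch_while_running):
    """Fire one event that re-triggers mid-handler; return handler runs."""
    flow = ToyFlow(latch_while_running)
    flow.trigger(retrigger_mid_handler=True)
    return flow.handled
```

    With latching, both events are handled; without it, the second one is
    silently dropped, which is the kind of loss the message describes.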
     
  • IPIs and VIRQs are inherently per-cpu event types, so treat them as such:
    - use a specific percpu irq_chip implementation, and
    - handle them with handle_percpu_irq

    This makes the path for delivering these interrupts more efficient
    (no masking/unmasking, no locks), and it avoids problems with attempts
    to migrate them.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     

21 Aug, 2010

1 commit

  • Sysrq operations do not accept tty argument anymore so no need to pass
    it to us.

    [Stephen Rothwell : fix build breakage in drm code
    caused by sysrq using bool but not including linux/types.h]

    [Sachin Sant : fix build breakage in s390 keyboard
    driver]

    Acked-by: Alan Cox
    Acked-by: Jason Wessel
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Dmitry Torokhov

    Dmitry Torokhov
     

13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

11 Aug, 2010

1 commit

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcoming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     

08 Aug, 2010

1 commit

  • According to the comments, this is how it was done years ago, although
    the code apparently took an xbt pointer from elsewhere back then. The
    code was removed because of consistency issues: cancellation won't roll
    back the saved xbdev->state.

    Still, unsolicited writes to the state field remain an issue,
    especially if device shutdown takes thread synchronization, and subtle
    races cause accidental recreation of the device node.

    Fixed by reintroducing the transaction. An internal one is sufficient,
    so the xbdev->state value remains consistent.

    Also fixes the original hack to prevent infinite recursion. Instead of
    bailing out on the first attempt to switch to Closing, the code now
    checks call depth.

    Signed-off-by: Daniel Stodden
    Signed-off-by: Jeremy Fitzhardinge

    Daniel Stodden
     

07 Aug, 2010

1 commit


05 Aug, 2010

4 commits

  • * xen/xenbus:
    implement O_NONBLOCK for /proc/xen/xenbus
    xenbus: do not hold transaction_mutex when returning to userspace

    Jeremy Fitzhardinge
     
  • * upstream/pvhvm:
    Introduce CONFIG_XEN_PVHVM compile option
    blkfront: do not create a PV cdrom device if xen_hvm_guest
    support multiple .discard.* sections to avoid section type conflicts
    xen/pvhvm: fix build problem when !CONFIG_XEN
    xenfs: enable for HVM domains too
    x86: Call HVMOP_pagetable_dying on exit_mmap.
    x86: Unplug emulated disks and nics.
    x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
    xen: Fix find_unbound_irq in presence of ioapic irqs.
    xen: Add suspend/resume support for PV on HVM guests.
    xen: Xen PCI platform device driver.
    x86/xen: event channels delivery on HVM.
    x86: early PV on HVM features initialization.
    xen: Add support for HVM hypercalls.

    Conflicts:
    arch/x86/xen/enlighten.c
    arch/x86/xen/time.c

    Jeremy Fitzhardinge
     
  • * upstream/core:
    xen/panic: use xen_reboot and fix smp_send_stop
    Xen: register panic notifier to take crashes of xen guests on panic
    xen: support large numbers of CPUs with vcpu info placement
    xen: drop xen_sched_clock in favour of using plain wallclock time
    pvops: do not notify callers from register_xenstore_notifier
    xen: make sure pages are really part of domain before freeing
    xen: release unused free memory

    Jeremy Fitzhardinge
     
  • Currently register_xenstore_notifier notifies the caller during the
    registration itself if xenstore is believed to be ready. This behaviour
    causes problems to PV on HVM guests, in which case callers should be
    notified by xenbus_probe only after the platform pci driver is loaded.
    We already make sure xenbus_probe is called at the right time, calling
    it either from device_initcall (PV case) or from the platform pci
    driver initialization (HVM case) so we don't need this additional
    notification.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
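
    A minimal sketch of the behaviour the entry above describes, assuming a
    simplified notifier chain. XenstoreNotifierChain, register, probe and
    demo are hypothetical stand-ins, not the kernel API. The point is only
    that registering records the callback, and the probe step alone
    delivers the notification.

```python
class XenstoreNotifierChain:
    """Hypothetical stand-in for the xenstore notifier chain."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        # Registration only records the callback; it never calls it,
        # even if xenstore already looks ready.
        self.callbacks.append(callback)

    def probe(self):
        # Stand-in for xenbus_probe(): the single place that notifies.
        for callback in self.callbacks:
            callback()


def demo():
    """Register one callback and report what it saw before and after probe."""
    seen = []
    chain = XenstoreNotifierChain()
    chain.register(lambda: seen.append("frontend notified"))
    before_probe = list(seen)
    chain.probe()
    return before_probe, seen
```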
     

30 Jul, 2010

1 commit


29 Jul, 2010

1 commit

  • In general the semantics of IPIs are that they are expected to
    continue functioning after dpm_suspend_noirq().

    Specifically I have seen a deadlock between the callfunc IPI and the
    stop machine used by xen's do_suspend() routine. If one CPU has already
    called dpm_suspend_noirq() then there is a window where it can be sent
    a callfunc IPI before all the other CPUs have entered stop_cpu().

    If this happens then the first CPU ends up spinning in stop_cpu()
    waiting for the other to rendezvous in state STOPMACHINE_PREPARE while
    the other is spinning in csd_lock_wait().

    Signed-off-by: Ian Campbell
    Cc: Jeremy Fitzhardinge
    Cc: xen-devel@lists.xensource.com
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Ian Campbell
     

27 Jul, 2010

4 commits

  • This patchset:

    PV guests under Xen are running in a non-contiguous memory architecture.

    When PCI pass-through is utilized, this necessitates an IOMMU for
    translating bus (DMA) to virtual and vice-versa and also providing a
    mechanism to have contiguous pages for device driver operations (say DMA
    operations).

    Specifically, under Xen the Linux idea of pages is an illusion. It
    assumes that pages start at zero and go up to the available memory. To
    help with that, the Linux Xen MMU provides a lookup mechanism to
    translate the page frame numbers (PFN) to machine frame numbers (MFN)
    and vice-versa. The MFNs are the "real" frame numbers. Furthermore,
    memory is not contiguous. The Xen hypervisor stitches memory for guests
    from different pools, which means there is no guarantee that PFN==MFN
    and PFN+1==MFN+1. Lastly, with Xen 4.0, pages (in debug mode) are
    allocated in descending order (high to low), meaning the guest might
    never get any MFNs under the 4GB mark.

    Signed-off-by: Konrad Rzeszutek Wilk
    Acked-by: Jeremy Fitzhardinge
    Cc: FUJITA Tomonori
    Cc: Albert Herranz
    Cc: Ian Campbell

    Konrad Rzeszutek Wilk
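
    The PFN-to-MFN point in the entry above can be made concrete with a
    small sketch. The table contents and the helper names (pfn_to_mfn,
    is_machine_contiguous, P2M) are made up for illustration and do not
    mirror the kernel's data structures. It shows why a range that looks
    contiguous to the guest may not be contiguous in machine memory, which
    is what a DMA-capable device cares about.

```python
def pfn_to_mfn(p2m, pfn):
    """Look up the machine frame backing a guest page frame."""
    return p2m[pfn]


def is_machine_contiguous(p2m, first_pfn, count):
    """True if PFNs first_pfn..first_pfn+count-1 map to consecutive MFNs."""
    base = pfn_to_mfn(p2m, first_pfn)
    return all(pfn_to_mfn(p2m, first_pfn + i) == base + i for i in range(count))


# Made-up table: the index is the PFN, the value is the MFN.
P2M = [0x8003, 0x8002, 0x1200, 0x1201, 0x1202]
```

    Here PFNs 0 and 1 are adjacent for the guest but not in machine memory,
    while PFNs 2 to 4 happen to be backed by consecutive machine frames.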
     
  • Signed-off-by: Jeremy Fitzhardinge

    Jeremy Fitzhardinge
     
  • Add a xen_emul_unplug command line option to the kernel to unplug
    xen emulated disks and nics.

    Set the default value of xen_emul_unplug depending on whether or
    not the Xen PV frontends and the Xen platform PCI driver have
    been compiled for this kernel (modules or built-in are both OK).

    The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
    even if the host platform doesn't support unplug.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     
  • This patch implements O_NONBLOCK for /proc/xen/xenbus. It is a simple
    matter of returning -EAGAIN instead of waiting on a queue.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jeremy Fitzhardinge

    Paolo Bonzini
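
    As a rough illustration of the entry above, a non-blocking read can be
    sketched as follows. read_reply and its queue argument are hypothetical;
    the sketch only models the "fail with EAGAIN instead of waiting" choice
    and deliberately leaves the blocking wait out.

```python
import errno
from collections import deque


def read_reply(replies, nonblocking):
    """Return the next queued reply.

    With nonblocking set and nothing queued, fail with EAGAIN instead of
    waiting. The blocking wait itself is left out of this sketch.
    """
    if not replies:
        if nonblocking:
            raise BlockingIOError(errno.EAGAIN, "no reply queued")
        raise NotImplementedError("a blocking reader would sleep here")
    return replies.popleft()
```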
     

23 Jul, 2010

5 commits

  • Don't break the assumption that the first 16 irqs are ISA irqs;
    make sure that the irq is actually free before using it.

    Use dynamic_irq_init_keep_chip_data instead of
    dynamic_irq_init so that chip_data is not NULL (a NULL chip_data breaks
    setup_vector_irq).

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
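
    The allocation rule in the entry above might be sketched like this. The
    search order and the in_use set are assumptions made for illustration;
    only the two constraints (leave the first 16 irqs alone, and check that
    a candidate is really free) come from the message.

```python
NR_ISA_IRQS = 16  # the first 16 irqs are assumed to be ISA irqs


def find_unbound_irq(in_use, nr_irqs):
    """Return the lowest free irq at or above NR_ISA_IRQS.

    in_use is a set of irq numbers that already belong to someone,
    for example irqs claimed by an IO-APIC.
    """
    for irq in range(NR_ISA_IRQS, nr_irqs):
        if irq not in in_use:
            return irq
    raise RuntimeError("no free irq")
```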
     
  • Suspend/resume requires a few different things on HVM: the suspend
    hypercall is different, and we don't need to save/restore memory-related
    settings, except for the shared info page and the callback mechanism.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     
  • Add the xen pci platform device driver that is responsible
    for initializing the grant table and xenbus in PV on HVM mode.
    A few changes to xenbus and grant table are necessary to allow the
    delayed initialization in HVM mode.
    Grant table needs a few additional modifications to work in HVM mode.

    The Xen PCI platform device raises an irq every time an event has been
    delivered to us. However these interrupts are only delivered to vcpu 0.
    The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall,
    which is a small wrapper around __xen_evtchn_do_upcall, the traditional
    Xen upcall handler, the very same one used with traditional PV guests.

    When running on HVM the event channel upcall is never called while in
    progress because it is a normal Linux irq handler (and we cannot switch
    the irq chip wholesale to the Xen PV ones as we are running QEMU and
    might have passed in PCI devices), therefore we cannot be sure that
    evtchn_upcall_pending is 0 when returning.
    For this reason, if evtchn_upcall_pending is set by Xen we need to loop
    again on the event channels set pending, otherwise we might lose some
    event channel deliveries.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
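
    The "loop again if evtchn_upcall_pending is set" rule in the entry above
    can be sketched with a toy model. ToyVcpu, do_upcall and the late_events
    parameter are hypothetical; the model only shows that re-checking the
    flag after each pass picks up events raised while earlier ones were
    being handled.

```python
class ToyVcpu:
    """Toy model of the per-vcpu upcall flag and pending event set."""

    def __init__(self):
        self.upcall_pending = False
        self.pending_ports = set()

    def raise_event(self, port):
        # What the hypervisor side does in this model.
        self.pending_ports.add(port)
        self.upcall_pending = True


def do_upcall(vcpu, handle, late_events=()):
    """Drain events until no new upcall was flagged during a pass."""
    late = list(late_events)
    passes = 0
    while vcpu.upcall_pending:
        vcpu.upcall_pending = False
        passes += 1
        for port in sorted(vcpu.pending_ports):
            vcpu.pending_ports.discard(port)
            handle(port)
            if late:
                # Simulate an event arriving while we are still handling.
                vcpu.raise_event(late.pop(0))
    return passes
```

    A single pass would have returned with port 2 still pending; the outer
    loop is what delivers it.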
     
  • Set the callback to receive evtchns from Xen, using the
    callback vector delivery mechanism.

    The traditional way for receiving event channel notifications from Xen
    is via the interrupts from the platform PCI device.
    The callback vector is a newer alternative that allows us to receive
    notifications on any vcpu and doesn't need any PCI support: we allocate
    a vector exclusively to receive events, in the vector handler we don't
    need to interact with the vlapic, therefore we avoid a VMEXIT.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Jeremy Fitzhardinge

    Sheng Yang
     
  • Initialize basic pv on hvm features by adding a new Xen HVM specific
    hypervisor_x86 structure.

    Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM
    because the backends are not available.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Yaozu (Eddie) Dong
    Signed-off-by: Jeremy Fitzhardinge

    Sheng Yang
     

08 Jun, 2010

1 commit

  • * xen_create_contiguous_region needs access to the balloon lock to
    ensure memory doesn't change under its feet, so expose the balloon
    lock
    * Change the name of the lock to xen_reservation_lock, to imply its
    now less-specific usage.

    [ Impact: cleanup ]

    Signed-off-by: Alex Nixon
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Konrad Rzeszutek Wilk

    Alex Nixon
     

03 Jun, 2010

1 commit

  • Since the device we are resuming could be the device containing the
    swap device we should ensure that the allocation cannot cause
    IO.

    On resume, this path is triggered when the running system tries to
    continue using its devices. If it cannot then the resume will fail;
    to try to avoid this we let it dip into the emergency pools.

    The majority of these changes were made when linux-2.6.18-xen.hg
    changeset e8b49cfbdac0 was ported upstream in
    a144ff09bc52ef3f3684ed23eadc9c7c0e57b3aa but somehow this hunk was
    dropped.

    Signed-off-by: Ian Campbell
    Acked-by: Jeremy Fitzhardinge
    Cc: Stable Kernel # .32.x

    Ian Campbell
     

25 May, 2010

1 commit

  • Fix build error when CONFIG_MAGIC_SYSRQ is not enabled:

    drivers/xen/manage.c:223: error: implicit declaration of function 'handle_sysrq'

    Signed-off-by: Randy Dunlap
    Acked-by: Jeremy Fitzhardinge
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

07 May, 2010

1 commit

  • Reimplement stop_machine using cpu_stop. As cpu stoppers are
    guaranteed to be available for all online cpus,
    stop_machine_create/destroy() are no longer necessary and removed.

    With resource management and synchronization handled by cpu_stop, the
    new implementation is much simpler. Asking the cpu_stop to execute
    the stop_cpu() state machine on all online cpus with cpu hotplug
    disabled is enough.

    stop_machine itself doesn't need to manage any global resources
    anymore, so all per-instance information is rolled into struct
    stop_machine_data and the mutex and all static data variables are
    removed.

    The previous implementation created and destroyed RT workqueues as
    necessary, which made stop_machine() calls highly expensive on very
    large machines. According to Dimitri Sivanich, preventing the dynamic
    creation/destruction makes booting more than twice as fast on very
    large machines. cpu_stop resources are preallocated for all online
    cpus and should have the same effect.

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
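
    The include-selection rule described in the entry above ("if only gfp is
    used, gfp.h, if slab is used, slab.h") can be sketched as below. This is
    not the slabh-sweep.py script: the regular expressions and the
    needed_include helper are assumptions chosen for illustration, and the
    real script's detection rules are not shown in the message.

```python
import re

# Hypothetical patterns; the real script's rules are not shown here.
SLAB_USE = re.compile(r"\b(kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\s*\(")
GFP_USE = re.compile(r"\bGFP_[A-Z]+\b")


def needed_include(source):
    """Pick the header a file should include, or None if it needs neither."""
    if SLAB_USE.search(source):
        return "linux/slab.h"   # slab.h in turn includes gfp.h
    if GFP_USE.search(source):
        return "linux/gfp.h"
    return None
```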
     

08 Mar, 2010

1 commit

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     

07 Mar, 2010

1 commit

  • Currently the xen support drivers are displayed in the main Device Drivers
    menu of the config tools instead of in their own sub-menu, so move them to
    their own sub-menu, as the rest of the driver world does.

    This keeps the main Device Drivers menu from becoming messy.

    Signed-off-by: Randy Dunlap
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

19 Feb, 2010

1 commit

  • Right now xen's use of the x86 and ia64 handle_irq is just bizarre and very
    fragile as it is very non-obvious the function exists and is used by
    code out in drivers/.... Luckily using handle_irq is completely unnecessary,
    and we can just use the generic irq apis instead.

    This still leaves drivers/xen/events.c as a problematic user of the generic
    irq apis, since it has "static struct irq_info irq_info[NR_IRQS]", but that
    can be fixed some other time.

    Signed-off-by: Eric W. Biederman
    LKML-Reference:
    Acked-by: Jeremy Fitzhardinge
    Cc: Ian Campbell
    Signed-off-by: H. Peter Anvin

    Eric W. Biederman
     

13 Jan, 2010

1 commit

  • In 65f63384 "xen: improve error handling in do_suspend" I said:
    - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
    nested in the obvious way.
    and changed the ordering of the calls as so:
        BEFORE                  AFTER
        xs_suspend              dpm_suspend_noirq
        dpm_suspend_noirq       xs_suspend
        *SUSPEND*               *SUSPEND*
        dpm_resume_noirq        dpm_resume_noirq
        xs_resume               xs_resume
    Clearly this is not an improvement and I was talking rubbish.

    In particular the new ordering is susceptible to a hang if a xenstore write is
    in progress at the point at which the suspend kicks in. When the suspend
    process calls xs_suspend it tries to take the request_mutex but if a write is
    in progress it could be looping in xenbus_xs.c:read_reply() waiting for
    something to arrive on &xs_state.reply_list while holding the request_mutex
    (taken in the caller of read_reply).

    However if we have done dpm_suspend_noirq before xs_suspend then we won't get
    any more xenstore interrupts and process_msg() will never be woken up to add
    anything to the reply_list.

    Fix this by calling xs_suspend before dpm_suspend_noirq. If dpm_suspend_noirq
    fails then make sure we go through the xs_suspend_cancel() code path.

    Signed-off-by: Ian Campbell
    Acked-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
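
    The corrected ordering from the entry above can be written down as a
    small trace. This is a sketch only: do_suspend here is a hypothetical
    Python function that records call names and does not perform any of
    the real work. It shows xs_suspend running before dpm_suspend_noirq,
    and the xs_suspend_cancel path being taken when dpm_suspend_noirq
    fails.

```python
def do_suspend(dpm_suspend_noirq_ok=True):
    """Return the order of calls in a toy version of the fixed sequence."""
    calls = ["xs_suspend"]              # quiesce xenstore first
    calls.append("dpm_suspend_noirq")   # then turn off device interrupts
    if not dpm_suspend_noirq_ok:
        calls.append("xs_suspend_cancel")
        return calls
    calls.append("*SUSPEND*")
    calls.append("dpm_resume_noirq")
    calls.append("xs_resume")
    return calls
```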
     

12 Dec, 2009

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
    PCI: fix coding style issue in pci_save_state()
    PCI: add pci_request_acs
    PCI: fix BUG_ON triggered by logical PCIe root port removal
    PCI: remove ifdefed pci_cleanup_aer_correct_error_status
    PCI: unconditionally clear AER uncorr status register during cleanup
    x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
    PCI: portdrv: remove redundant definitions
    PCI: portdrv: remove unnecessary struct pcie_port_data
    PCI: portdrv: minor cleanup for pcie_port_device_register
    PCI: portdrv: add missing irq cleanup
    PCI: portdrv: enable device before irq initialization
    PCI: portdrv: cleanup service irqs initialization
    PCI: portdrv: check capabilities first
    PCI: portdrv: move PME capability check
    PCI: portdrv: remove redundant pcie type calculation
    PCI: portdrv: cleanup pcie_device registration
    PCI: portdrv: remove redundant pcie_port_device_probe
    PCI: Always set prefetchable base/limit upper32 registers
    PCI: read-modify-write the pcie device control register when initiating pcie flr
    PCI: show dma_mask bits in /sys
    ...

    Fixed up conflicts in:
    arch/x86/kernel/amd_iommu_init.c
    drivers/pci/dmar.c
    drivers/pci/hotplug/acpiphp_glue.c

    Linus Torvalds
     

11 Dec, 2009

1 commit

  • * 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    xen: try harder to balloon up under memory pressure.
    Xen balloon: fix totalram_pages counting.
    xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
    xen: improve error handling in do_suspend.
    xen: don't leak IRQs over suspend/resume.
    xen: call clock resume notifier on all CPUs
    xen: use iret for return from 64b kernel to 32b usermode
    xen: don't call dpm_resume_noirq() with interrupts disabled.
    xen: register runstate info for boot CPU early
    xen: register runstate on secondary CPUs
    xen: register timer interrupt with IRQF_TIMER
    xen: correctly restore pfn_to_mfn_list_list after resume
    xen: restore runstate_info even if !have_vcpu_info_placement
    xen: re-register runstate area earlier on resume.
    xen: wait up to 5 minutes for device connection
    xen: improvement to wait_for_devices()
    xen: fix is_disconnected_device/exists_disconnected_device
    xen/xenbus: make DEVICE_ATTR()s static

    Linus Torvalds
     

05 Dec, 2009

2 commits

  • Currently if the balloon driver is unable to increase the guest's
    reservation it assumes the failure was due to reaching its full
    allocation, gives up on the ballooning operation and records the limit
    it reached as the "hard limit". The driver will not try again until
    the target is set again (even to the same value).

    However it is possible that ballooning has in fact failed due to
    memory pressure in the host and therefore it is desirable to keep
    attempting to reach the target in case memory becomes available. The
    most likely scenario is that some guests are ballooning down while
    others are ballooning up and therefore there is temporary memory
    pressure while things stabilise. You would not expect a well behaved
    toolstack to ask a domain to balloon to more than its allocation nor
    would you expect it to deliberately over-commit memory by setting
    balloon targets which exceed the total host memory.

    This patch drops the concept of a hard limit and causes the balloon
    driver to retry increasing the reservation on a timer in the same
    manner as when decreasing the reservation.

    Also if we partially succeed in increasing the reservation
    (i.e. receive fewer pages than we asked for) then we may as well keep
    those pages rather than returning them to Xen.

    Signed-off-by: Ian Campbell
    Cc: Stable Kernel

    Ian Campbell
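
    A minimal sketch of the retry behaviour described in the entry above,
    assuming a simplified worker. balloon_step, balloon_until_done and the
    grant callback are hypothetical names; the real driver works on pages
    and timers rather than plain integers. The sketch keeps partially
    granted pages and asks for a retry whenever the target has not been
    reached, with no hard limit recorded.

```python
def balloon_step(current, target, grant):
    """One pass of a toy balloon worker growing towards target.

    grant(n) stands in for the hypervisor and returns how many of the n
    requested pages it actually handed over (possibly fewer, possibly 0).
    Returns (new_current, retry_needed).
    """
    wanted = target - current
    if wanted <= 0:
        return current, False
    got = grant(wanted)
    current += got                  # keep whatever was granted
    return current, current < target


def balloon_until_done(current, target, grants):
    """Run balloon_step once per 'timer tick', one grant limit per tick."""
    for limit in grants:
        current, retry = balloon_step(current, target, lambda n: min(n, limit))
        if not retry:
            break
    return current
```

    In the second call of the usage below, the middle tick grants nothing,
    yet the worker still reaches its target once memory becomes available.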
     
  • Change totalram_pages when a single page is added/removed to the
    ballooned list. This avoids totalram_pages being set erroneously to
    max_pfn at boot.

    Signed-off-by: Gianluca Guida
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Gianluca Guida
     

04 Dec, 2009

4 commits

  • I have observed cases where the implicit stop_machine_destroy() done by
    stop_machine() hangs while destroying the workqueues, specifically in
    kthread_stop(). This seems to be because timer ticks are not restarted
    until after stop_machine() returns.

    Fortunately stop_machine provides a facility to pre-create/post-destroy
    the workqueues so use this to ensure that workqueues are only destroyed
    after everything is really up and running again.

    I only actually observed this failure with 2.6.30. It seems that newer
    kernels are somehow more robust against doing kthread_stop() without timer
    interrupts (I tried some backports of some likely looking candidates but
    did not track down the commit which added this robustness). However this
    change seems like a reasonable belt&braces thing to do.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • The existing error handling has a few issues:
    - If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
    - If dpm_suspend_noirq() fails it exits without resuming xenbus.
    - If stop_machine() fails it exits without resuming xenbus or calling
    dpm_resume_end().
    - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
    nested in the obvious way.

    Fix by ensuring each failure case goto's the correct label. Treat a failure of
    stop_machine() as a cancelled suspend in order to follow the correct resume
    path.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • On resume irq_info[*].evtchn is reset to 0 since event channel mappings
    are not preserved over suspend/resume. The other contents of irq_info
    are preserved to allow rebind_evtchn_irq() to function.

    However when a device resumes it will try to unbind from the
    previous IRQ (e.g. blkfront goes blkfront_resume() -> blkif_free() ->
    unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
    check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
    device will then continue to resume and allocate a new IRQ, eventually
    leading to find_unbound_irq() panic()ing.

    Fix this by changing unbind_from_irq() to handle teardown of interrupts
    which have type!=IRQT_UNBOUND but are not currently bound to a specific
    event channel.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
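
    The IRQ leak in the entry above can be illustrated with a toy irq_info
    table. The dictionary layout, the constants and the
    handle_unbound_channel switch are assumptions made for this sketch and
    do not mirror the kernel structures. It contrasts bailing out when no
    event channel is bound with still tearing down an irq whose type is
    not IRQT_UNBOUND.

```python
IRQT_UNBOUND = "unbound"
IRQT_EVTCHN = "evtchn"


def unbind_from_irq(irq_info, irq, handle_unbound_channel):
    """Release irq in a toy irq_info table; return True if it was freed.

    irq_info maps irq -> {"type": ..., "evtchn": ...}; evtchn 0 means
    "no event channel currently bound", as after a resume.
    """
    info = irq_info[irq]
    if info["evtchn"] != 0:
        info["evtchn"] = 0            # close the live event channel
    elif not handle_unbound_channel:
        return False                  # old behaviour: bail out, irq leaks
    if info["type"] != IRQT_UNBOUND:
        info["type"] = IRQT_UNBOUND   # make the irq reusable
    return True
```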
     
  • dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
    context. Don't call it from within the stop-machine function, but just
    afterwards, since we're resuming anyway, regardless of what happened.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge