15 Feb, 2013

1 commit

  • This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.

    We are doing this b/c on 32-bit PVonHVM with older hypervisors
    (Xen 4.1) it ends up botching up the start_info. This is bad b/c
    we use it for the time keeping, and the timekeeping code loops
    forever - as the version field never changes. Olaf says to
    revert it, so let's do that.
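
The hang described above comes from a versioned read that retries until it sees a stable snapshot. The following is a minimal user-space sketch of that pattern, not the kernel's pvclock code: the struct layout, the `demo_` names and the retry bound are all assumptions made for illustration, and memory barriers are omitted. It shows why a version field that never becomes even and stable makes an unbounded loop spin forever, and how a bounded variant would report failure instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the time fields published by the
 * hypervisor. This is NOT the real Xen/pvclock layout. */
struct demo_time_info {
    volatile uint32_t version;     /* odd while an update is in progress */
    volatile uint64_t system_time; /* payload protected by the version */
};

/* Try to read a consistent snapshot of system_time.
 * Gives up after max_tries attempts (an assumption of this sketch; the
 * commit message describes a loop that never gives up).
 * Returns true and stores the value in *out on success. */
static bool demo_read_time(const struct demo_time_info *ti,
                           unsigned int max_tries, uint64_t *out)
{
    assert(ti != NULL && out != NULL);

    for (unsigned int i = 0; i < max_tries; i++) {
        uint32_t before = ti->version;
        uint64_t t = ti->system_time;
        uint32_t after = ti->version;

        /* Accept only if no update was in progress and none happened. */
        if ((before & 1u) == 0 && before == after) {
            *out = t;
            return true;
        }
    }
    return false; /* version never settled: the page is probably bogus */
}
```

With a version stuck at an odd value (for example all ones), `demo_read_time` returns false after `max_tries` attempts, whereas an unbounded loop would never return.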

    Acked-by: Olaf Hering
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

02 Nov, 2012

1 commit

  • This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
    ("xen PVonHVM: move shared_info to MMIO before kexec").

    Currently kexec in a PVonHVM guest fails with a triple fault because the
    new kernel overwrites the shared info page. The exact failure depends on
    the size of the kernel image. This patch moves the pfn from RAM into an
    E820 reserved memory area.

    The pfn containing the shared_info is located somewhere in RAM. This will
    cause trouble if the current kernel is doing a kexec boot into a new
    kernel. The new kernel (and its startup code) can not know where the pfn
    is, so it can not reserve the page. The hypervisor will continue to update
    the pfn, and as a result memory corruption occurs in the new kernel.

    The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
    E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
    starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
    will usually not allocate areas up to FE700000, so FE700000 is expected
    to work also with older toolstacks.
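
The address arithmetic behind this choice can be checked with a small sketch. The range FC000000-FFFFFFFF, the address FE700000 and the 1MB size come from the message above; the 4 KiB page size and every `DEMO_`/`demo_` name are assumptions of this example and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT      12           /* assumption: 4 KiB pages */
#define DEMO_RESERVED_START  0xFC000000u  /* E820 reserved range, from the text */
#define DEMO_RESERVED_END    0xFFFFFFFFu  /* inclusive end of that range */
#define DEMO_GUEST_AREA      0xFE700000u  /* area kept for guest use */
#define DEMO_GUEST_AREA_SIZE 0x00100000u  /* 1 MB */

/* True if [start, start + size) lies entirely inside the reserved range.
 * The "last >= start" test rejects regions that wrap past 4 GiB. */
static bool demo_in_reserved(uint32_t start, uint32_t size)
{
    assert(size > 0);
    uint32_t last = start + (size - 1u);

    return last >= start &&
           start >= DEMO_RESERVED_START &&
           last <= DEMO_RESERVED_END;
}

/* Convert a guest physical address to its page frame number. */
static uint32_t demo_addr_to_pfn(uint32_t addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}
```

Under these assumptions the guest area sits inside the reserved range, and its first page has pfn 0xFE700.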

    In Xen3 there is no reserved area at a fixed location. If the guest is
    started on such old hosts the shared_info page will be placed in RAM. As
    a result kexec can not be used.

    Signed-off-by: Olaf Hering
    Signed-off-by: Konrad Rzeszutek Wilk

    Olaf Hering
     

17 Aug, 2012

1 commit

  • This reverts commit 00e37bdb0113a98408de42db85be002f21dbffd3.

    During shutdown of PVHVM guests with more than 2 VCPUs on certain
    machines we can hit the race where the replaced shared_info is not
    replaced fast enough and the PV time clock retries reading the same
    area over and over without any success and is stuck in an
    infinite loop.

    Acked-by: Olaf Hering
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

20 Jul, 2012

1 commit

  • Currently kexec in a PVonHVM guest fails with a triple fault because the
    new kernel overwrites the shared info page. The exact failure depends on
    the size of the kernel image. This patch moves the pfn from RAM into
    MMIO space before the kexec boot.

    The pfn containing the shared_info is located somewhere in RAM. This
    will cause trouble if the current kernel is doing a kexec boot into a
    new kernel. The new kernel (and its startup code) can not know where the
    pfn is, so it can not reserve the page. The hypervisor will continue to
    update the pfn, and as a result memory corruption occurs in the new
    kernel.

    One way to work around this issue is to allocate a page in the
    xen-platform pci device's BAR memory range. But pci init is done very
    late and the shared_info page is already in use very early to read the
    pvclock. So moving the pfn from RAM to MMIO is racy because some code
    paths on other vcpus could access the pfn during the small window when
    the old pfn is moved to the new pfn. There is even a small window where
    the old pfn is not backed by a mfn, and during that time all reads
    return -1.

    Because it is not known upfront where the MMIO region is located it can
    not be used right from the start in xen_hvm_init_shared_info.

    To minimise trouble the move of the pfn is done shortly before kexec.
    This does not eliminate the race because all vcpus are still online when
    the syscore_ops will be called. But hopefully there is no work pending
    at this point in time. Also the syscore_op is run last which reduces the
    risk further.

    Signed-off-by: Olaf Hering
    Signed-off-by: Konrad Rzeszutek Wilk

    Olaf Hering
     

26 Feb, 2011

2 commits


02 Dec, 2010

1 commit


27 Jul, 2010

1 commit

  • Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
    clockevent device on all vcpus, use the xen wallclock time as wallclock
    instead of rtc and use xen_clocksource as clocksource.
    The pv clock algorithm needs to work correctly for the xen_clocksource
    and xen wallclock to be usable; only modern Xen versions offer a
    reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).

    Using the hpet as clocksource means a VMEXIT every time we read/write to
    the hpet mmio addresses; pvclock gives us a better rating without
    VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent.

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Don Dutile
    Signed-off-by: Jeremy Fitzhardinge

    Stefano Stabellini
     

23 Jul, 2010

1 commit


03 Jun, 2010

1 commit

  • The core suspend/resume code is run from stop_machine on CPU0 but
    parts of the suspend/resume machinery (including xen_arch_resume) are
    run on whichever CPU happened to schedule the xenwatch kernel thread.

    As part of the non-core resume code xen_arch_resume is called in order
    to restart the timer tick on non-boot processors. The boot processor
    itself is taken care of by core timekeeping code.

    xen_arch_resume uses smp_call_function which does not call the given
    function on the current processor. This means that we can end up with
    one CPU not receiving timer ticks if the xenwatch thread happened to
    be scheduled on CPU > 0.

    Use on_each_cpu instead of smp_call_function to ensure the timer tick
    is resumed everywhere.
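
The difference between the two calls can be modelled in user space. This is only a sketch: the `demo_` helpers below imitate the semantics described above ("all other CPUs" versus "all CPUs") with plain loops, and the fixed CPU count and the boolean `ticking` array are assumptions of the example. The real smp_call_function and on_each_cpu use IPIs and take different arguments.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4 /* assumption: a small fixed number of CPUs */

typedef void (*demo_fn)(int cpu, void *info);

/* Models smp_call_function(): run fn on every CPU except the caller's. */
static void demo_call_others(int this_cpu, demo_fn fn, void *info)
{
    assert(this_cpu >= 0 && this_cpu < DEMO_NR_CPUS);
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        if (cpu != this_cpu)
            fn(cpu, info);
    }
}

/* Models on_each_cpu(): run fn on every CPU, including the caller's. */
static void demo_call_all(int this_cpu, demo_fn fn, void *info)
{
    demo_call_others(this_cpu, fn, info);
    fn(this_cpu, info);
}

/* Stand-in for restarting the timer tick on one CPU. */
static void demo_resume_tick(int cpu, void *info)
{
    bool *ticking = info;
    ticking[cpu] = true;
}

/* Count how many CPUs currently have a running tick. */
static int demo_count_ticking(const bool *ticking)
{
    int n = 0;
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        n += ticking[cpu] ? 1 : 0;
    return n;
}
```

If CPU 0 is already ticking (handled by core timekeeping) and the caller runs on CPU 2, `demo_call_others` leaves CPU 2 without a tick, while `demo_call_all` covers every CPU.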

    Signed-off-by: Ian Campbell
    Acked-by: Jeremy Fitzhardinge
    Cc: Stable Kernel # .32.x

    Ian Campbell
     

04 Dec, 2009

2 commits

  • tick_resume() is never called on secondary processors. Presumably this
    is because they are offlined for suspend on native and so this is
    normally taken care of in the CPU onlining path. Under Xen we keep all
    CPUs online over a suspend.

    This patch papers over the issue for me but I will investigate a more
    generic, less hacky, way of doing the same.

    tick_suspend is also only called on the boot CPU which I presume should
    be fixed too.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel
    Cc: Thomas Gleixner

    Ian Campbell
     
  • pvops kernels >= 2.6.30 can currently only be saved and restored once. The
    second attempt to save results in:

    ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
    ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
    ERROR Internal error: Failed to map/save the p2m frame list

    I finally narrowed it down to:

    commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
    Author: Jeremy Fitzhardinge
    Date: Fri Feb 27 15:34:59 2009 -0800

    xen: split construction of p2m mfn tables from registration

    Build the p2m_mfn_list_list early with the rest of the p2m table, but
    register it later when the real shared_info structure is in place.

    Signed-off-by: Jeremy Fitzhardinge

    The unforeseen side-effect of this change was to cause the mfn list list to not
    be rebuilt on resume. Prior to this change it would have been rebuilt via
    xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().

    Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     

23 Jan, 2009

1 commit

  • Impact: build fix

    This build error:

    arch/x86/xen/suspend.c:22: error: implicit declaration of function 'fix_to_virt'
    arch/x86/xen/suspend.c:22: error: 'FIX_PARAVIRT_BOOTMAP' undeclared (first use in this function)
    arch/x86/xen/suspend.c:22: error: (Each undeclared identifier is reported only once
    arch/x86/xen/suspend.c:22: error: for each function it appears in.)

    triggers because the hardirq.h unification removed an implicit fixmap.h
    include - on which arch/x86/xen/suspend.c depended. Add the fixmap.h
    include explicitly.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

17 Dec, 2008

1 commit


16 Jul, 2008

1 commit

  • add xen_timer_resume() hook.

    Timer resume should be done after event channel is resumed.
    add xen_arch_resume() hook when ipi becomes usable after resume.
    After resume, some cpu specific resource must be reinitialized
    on ia64 that can't be set by another cpu.

    However, the available hooks are run only once, on one cpu, so that ipi has
    to be used.

    During stop_machine_run() ipi can't be used because interrupt is masked.
    So add another hook after stop_machine_run().
    Another approach might be use resume hook which is run by
    device_resume(). However device_resume() may be executed on
    suspend error recovery path.

    So it is necessary to determine whether it is executed on real resume path
    or error recovery path.

    Signed-off-by: Isaku Yamahata
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Isaku Yamahata
     

02 Jun, 2008

2 commits

  • On resume, the vcpu timer modes will not be restored. The timer
    infrastructure doesn't do this for us, since it assumes the cpus
    are offline. We can just poke the other vcpus into the right mode
    directly though.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • If we're using vcpu_info mapping, then make sure it's restored on all
    processors before releasing them from stop_machine.

    The only complication is that if this fails, we can't continue because
    we've already made assumptions that the mapping is available (baked in
    calls to the _direct versions of the functions, for example).

    Fortunately this can only happen with a 32-bit hypervisor, which may
    possibly run out of mapping space. On a 64-bit hypervisor, this is a
    non-issue.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     

27 May, 2008

1 commit

  • This patch implements Xen save/restore and migration.

    Saving is triggered via xenbus, which is polled in
    drivers/xen/manage.c. When a suspend request comes in, the kernel
    prepares itself for saving by:

    1 - Freeze all processes. This is primarily to prevent any
    partially-completed pagetable updates from confusing the suspend
    process. If CONFIG_PREEMPT isn't defined, then this isn't necessary.

    2 - Suspend xenbus and other devices

    3 - Stop_machine, to make sure all the other vcpus are quiescent. The
    Xen tools require the domain to run its save off vcpu0.

    4 - Within the stop_machine state, it pins any unpinned pgds (under
    construction or destruction), canonicalizes various other
    pieces of state (mostly converting mfns to pfns), and finally

    5 - Suspend the domain

    Restore reverses the steps used to save the domain, ending when all
    the frozen processes are thawed.
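
The ordering above can be summarised as a small sketch. The step names and helper functions are hypothetical labels for the five numbered steps, not identifiers from drivers/xen/manage.c. The restore helper assumes that only the four preparation steps need undoing, in reverse order, once the domain comes back from suspend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the numbered steps in the commit message. */
enum demo_step {
    DEMO_FREEZE_PROCESSES,   /* 1: freeze all processes */
    DEMO_SUSPEND_DEVICES,    /* 2: suspend xenbus and other devices */
    DEMO_STOP_MACHINE,       /* 3: quiesce the other vcpus */
    DEMO_CANONICALIZE_STATE, /* 4: pin pgds, convert mfns to pfns */
    DEMO_SUSPEND_DOMAIN,     /* 5: suspend the domain */
    DEMO_NR_STEPS
};

/* Fill order[] with the steps in save order. Returns the step count. */
static size_t demo_save_order(enum demo_step *order, size_t cap)
{
    assert(order != NULL && cap >= DEMO_NR_STEPS);
    for (size_t i = 0; i < DEMO_NR_STEPS; i++)
        order[i] = (enum demo_step)i;
    return DEMO_NR_STEPS;
}

/* Fill order[] with the steps to undo on restore, last prepared first.
 * Assumption: the suspend itself (step 5) has nothing to undo. */
static size_t demo_restore_order(enum demo_step *order, size_t cap)
{
    size_t n = DEMO_SUSPEND_DOMAIN; /* number of preparation steps */

    assert(order != NULL && cap >= n);
    for (size_t i = 0; i < n; i++)
        order[i] = (enum demo_step)(n - 1 - i);
    return n;
}
```

In this model restore begins by undoing the canonicalization and ends by thawing the frozen processes, matching the last sentence of the description.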

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Thomas Gleixner

    Jeremy Fitzhardinge