14 Dec, 2015

1 commit

  • Using MMUEXT_TLB_FLUSH_MULTI doesn't buy us much since the hypervisor
    will likely perform the same IPIs as the guest would have.

    More importantly, using MMUEXT_INVLPG_MULTI may fail to invalidate the
    guest's address on a remote CPU (when, for example, a VCPU from another
    guest is running there).

    Signed-off-by: Boris Ostrovsky
    Suggested-by: Jan Beulich
    Signed-off-by: David Vrabel

    Boris Ostrovsky
     

28 Oct, 2015

1 commit


09 Sep, 2015

1 commit

  • The privcmd code mixes the usage of GFN and MFN within the same
    functions, which makes the code difficult to understand when you only
    work with auto-translated guests.

    The privcmd driver only deals with GFNs, so replace all mentions of
    MFN with GFN.

    The ioctl structure used to map foreign pages has been left unchanged
    given that userspace is using it. Nonetheless, add a comment to
    explain the expected value within the "mfn" field.

    Signed-off-by: Julien Grall
    Reviewed-by: David Vrabel
    Signed-off-by: David Vrabel

    Julien Grall
     

20 Aug, 2015

4 commits

  • Check whether the hypervisor-supplied p2m list is placed at a location
    which conflicts with the target E820 map. If this is the case, relocate
    it to a previously unused area that is compliant with the E820 map.

    As the p2m list might be huge (up to several GB) and is required to be
    mapped virtually, set up a temporary mapping for the copied list.

    For pvh domains just delete the p2m related information from start
    info instead of reserving the p2m memory, as we don't need it at all.

    For 32 bit kernels adjust the memblock_reserve() parameters in order
    to cover the page tables only. This requires reserving the start_info
    page with its own memblock_reserve() call.
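    The conflict test underlying the relocation decision is a plain
    half-open range overlap check. A minimal user-space sketch (function
    and variable names are illustrative, not the kernel's):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Two half-open ranges [a, a+alen) and [b, b+blen) overlap iff
     * each one starts before the other one ends. */
    static int ranges_overlap(uint64_t a, uint64_t alen,
                              uint64_t b, uint64_t blen)
    {
        return a < b + blen && b < a + alen;
    }

    int main(void)
    {
        /* p2m list at 256..288 MiB conflicts with RAM at 0..512 MiB */
        assert(ranges_overlap(256ULL << 20, 32ULL << 20,
                              0, 512ULL << 20));
        /* relocated above the RAM entry: no conflict */
        assert(!ranges_overlap(512ULL << 20, 32ULL << 20,
                               0, 512ULL << 20));
        printf("ok\n");
        return 0;
    }
    ```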

    Signed-off-by: Juergen Gross
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Some special pages containing interfaces to xen are currently only
    reserved implicitly. The memblock_reserve() call meant to reserve the
    p2m list supplied by xen in fact reserves not only the p2m list itself,
    but also further pages up to the start of the xen-built page tables.

    To be able to move the p2m list to another pfn range, which is needed
    for support of huge RAM, this memblock_reserve() must be split up to
    cover all affected reserved pages explicitly.

    The affected pages are:
    - start_info page
    - xenstore ring (might be missing, mfn is 0 in this case)
    - console ring (not for initial domain)

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Check whether the page tables built by the domain builder are at
    memory addresses which are in conflict with the target memory map.
    If this is the case just panic instead of running into problems
    later.

    Signed-off-by: Juergen Gross
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Direct Xen to place the initial P->M table outside of the initial
    mapping, as otherwise the 1G (implementation) / 2G (theoretical)
    restriction on the size of the initial mapping limits the amount
    of memory a domain can be handed initially.

    As the initial P->M table is copied rather early during boot to
    domain private memory and its initial virtual mapping is dropped,
    the easiest way to avoid virtual address conflicts with other
    addresses in the kernel is to use a user address area for the
    virtual address of the initial P->M table. This allows us to just
    throw away the page tables of the initial mapping after the copy
    without having to care about address invalidation.

    It should be noted that this patch won't enable a pv-domain to USE
    more than 512 GB of RAM. It just enables it to be started with a
    P->M table covering more memory. This is especially important for
    being able to boot a Dom0 on a system with more than 512 GB memory.
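    The 512 GB figure follows directly from the p2m geometry: each entry
    is 8 bytes and describes one 4 KiB page, so a P->M table confined to
    a 1 GiB initial-mapping budget can describe at most 512 GiB. A quick
    check of that arithmetic:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t GiB = 1ULL << 30;
        const uint64_t entry_size = 8;     /* one 64-bit mfn per entry */
        const uint64_t page_size  = 4096;  /* each entry covers one page */

        /* A P->M table that must fit inside the 1 GiB initial mapping: */
        uint64_t entries = GiB / entry_size;
        uint64_t covered = entries * page_size;

        assert(covered == 512 * GiB);      /* hence the 512 GB limit */
        printf("1 GiB of p2m covers %llu GiB of RAM\n",
               (unsigned long long)(covered / GiB));
        return 0;
    }
    ```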

    Signed-off-by: Juergen Gross
    Based-on-patch-by: Jan Beulich
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: David Vrabel

    Juergen Gross
     

17 Apr, 2015

1 commit

  • Pull xen features and fixes from David Vrabel:

    - use a single source list of hypercalls, generating other tables etc.
    at build time.

    - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests.

    - significant performance improvements to guest save/restore/migration.

    - scsiback/front save/restore support.

    - infrastructure for multi-page xenbus rings.

    - misc fixes.

    * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pci: Try harder to get PXM information for Xen
    xenbus_client: Extend interface to support multi-page ring
    xen-pciback: also support disabling of bus-mastering and memory-write-invalidate
    xen: support suspend/resume in pvscsi frontend
    xen: scsiback: add LUN of restored domain
    xen-scsiback: define a pr_fmt macro with xen-pvscsi
    xen/mce: fix up xen_late_init_mcelog() error handling
    xen/privcmd: improve performance of MMAPBATCH_V2
    xen: unify foreign GFN map/unmap for auto-xlated physmap guests
    x86/xen/apic: WARN with details.
    x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs
    xen/pciback: Don't print scary messages when unsupported by hypervisor.
    xen: use generated hypercall symbols in arch/x86/xen/xen-head.S
    xen: use generated hypervisor symbols in arch/x86/xen/trace.c
    xen: synchronize include/xen/interface/xen.h with xen
    xen: build infrastructure for generating hypercall depending symbols
    xen: balloon: Use static attribute groups for sysfs entries
    xen: pcpu: Use static attribute groups for sysfs entry

    Linus Torvalds
     

15 Apr, 2015

1 commit


16 Mar, 2015

2 commits

  • Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and the older V1 version) map
    multiple frames at a time rather than one at a time, even though the
    pages may be non-consecutive GFNs.

    xen_remap_foreign_mfn_array() is added which maps an array of GFNs
    (instead of a consecutive range of GFNs).

    Since per-frame errors are returned in an array, privcmd must set the
    MMAPBATCH_V1 error bits as part of the "report errors" phase, after
    all the frames are mapped.
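    The "report errors" phase for V1 folds per-frame errors into high bits
    of the returned frame entries. A hedged sketch of that folding step
    (the error-bit constants mirror the privcmd ABI but should be treated
    as illustrative here, as should the function names):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Error encodings in the top nibble of each V1 frame entry
     * (illustrative values). */
    #define MMAPBATCH_MFN_ERROR    0xf0000000U  /* mapping failed */
    #define MMAPBATCH_PAGED_ERROR  0x80000000U  /* frame paged out */

    /* After all frames are mapped, fold the per-frame error array
     * back into the user-visible V1 frame list. */
    static void set_v1_err_bits(uint32_t *frames, const int *errs, int n)
    {
        for (int i = 0; i < n; i++) {
            if (errs[i] == -2 /* stand-in for -ENOENT */)
                frames[i] |= MMAPBATCH_PAGED_ERROR;
            else if (errs[i] != 0)
                frames[i] |= MMAPBATCH_MFN_ERROR;
        }
    }

    int main(void)
    {
        uint32_t frames[3] = { 0x1000, 0x2000, 0x3000 };
        int errs[3] = { 0, -1, -2 };

        set_v1_err_bits(frames, errs, 3);
        assert(frames[0] == 0x1000);            /* success: untouched */
        assert((frames[1] & 0xf0000000U) == MMAPBATCH_MFN_ERROR);
        assert(frames[2] & MMAPBATCH_PAGED_ERROR);
        printf("ok\n");
        return 0;
    }
    ```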

    Migrate times are significantly improved (when using a PV toolstack
    domain). For example, for an idle 12 GiB PV guest:

              Before      After
    real   0m38.179s  0m26.868s
    user   0m15.096s  0m13.652s
    sys    0m28.988s  0m18.732s

    Signed-off-by: David Vrabel
    Reviewed-by: Stefano Stabellini

    David Vrabel
     
  • Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and
    unmap foreign GFNs using the same method (updating the physmap).
    Unify the arm and x86 implementations into one common implementation.

    Note that on arm and arm64, the correct error code will be returned
    (instead of always -EFAULT) and map/unmap failure warnings are no
    longer printed. These changes are required if the foreign domain is
    paging (-ENOENT failures are expected and must be propagated up to the
    caller).

    Signed-off-by: David Vrabel
    Reviewed-by: Stefano Stabellini

    David Vrabel
     

28 Jan, 2015

2 commits


17 Dec, 2014

1 commit

  • Pull additional xen update from David Vrabel:
    "Xen: additional features for 3.19-rc0

    - Linear p2m for x86 PV guests which simplifies the p2m code,
    improves performance and will allow for > 512 GB PV guests in the
    future.

    A last-minute, configuration-specific issue was discovered with this
    change, which is why it was not included in my previous pull request.
    This has now been fixed and tested"

    * tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen: switch to post-init routines in xen mmu.c earlier
    Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
    xen: annotate xen_set_identity_and_remap_chunk() with __init
    xen: introduce helper functions to do safe read and write accesses
    xen: Speed up set_phys_to_machine() by using read-only mappings
    xen: switch to linear virtual mapped sparse p2m list
    xen: Hide get_phys_to_machine() to be able to tune common path
    x86: Introduce function to get pmd entry pointer
    xen: Delay invalidating extra memory
    xen: Delay m2p_override initialization
    xen: Delay remapping memory of pv-domain
    xen: use common page allocation function in p2m.c
    xen: Make functions static
    xen: fix some style issues in p2m.c

    Linus Torvalds
     

11 Dec, 2014

2 commits

  • With the virtually mapped linear p2m list, the post-init mmu
    operations must be used for setting up the p2m mappings, since in the
    case of CONFIG_FLATMEM the init routines may trigger BUGs.

    paging_init() sets up all the infrastructure needed to switch to the
    post-init mmu ops done by xen_post_allocator_init(). With the virtually
    mapped linear p2m list we need some mmu ops during setup of this list,
    so we have to switch to the correct mmu ops as soon as possible.

    The p2m list is usable from the beginning; only expanding it requires
    the new linear mapping to be established. So the call of
    xen_remap_memory() had to be introduced, but this is not because the
    mmu ops require it.

    Summing it up: not calling xen_post_allocator_init() directly after
    paging_init() was conceptually wrong from the beginning; it just didn't
    matter up to now, as no functions used between the two calls needed any
    critical mmu ops (e.g. alloc_pte). This has changed now, so I
    corrected it.

    Reported-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Pull x86 vdso updates from Ingo Molnar:
    "Various vDSO updates from Andy Lutomirski, mostly cleanups and
    reorganization to improve maintainability, but also some
    micro-optimizations and robustization changes"

    * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
    x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
    x86_64,vsyscall: Make vsyscall emulation configurable
    x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
    x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
    x86,vdso: Use LSL unconditionally for vgetcpu
    x86: vdso: Fix build with older gcc
    x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
    x86_64/vdso: Remove jiffies from the vvar page
    x86/vdso: Make the PER_CPU segment 32 bits
    x86/vdso: Make the PER_CPU segment start out accessed
    x86/vdso: Change the PER_CPU segment to use struct desc_struct
    x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
    x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c

    Linus Torvalds
     

04 Dec, 2014

4 commits

  • At start of day the Xen hypervisor presents a contiguous mfn list
    to a pv-domain. In order to support sparse memory this mfn list is
    accessed via a three level p2m tree built early in the boot process.
    Whenever the system needs the mfn associated with a pfn this tree is
    used to find the mfn.

    Instead of using a software-walked tree to access a specific mfn
    list entry, this patch creates a virtual address area for the
    entire possible mfn list, including memory holes. The holes are
    covered by mapping a pre-defined page consisting only of "invalid
    mfn" entries. Access to an mfn entry is then possible by just using
    the virtual base address of the mfn list and the pfn as an index
    into that list. This speeds up the (hot) path of determining the
    mfn of a pfn.
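    The difference between the two lookup schemes can be sketched in user
    space (levels, sizes and names are illustrative, not the kernel's
    actual layout):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    #define INVALID_MFN (~0UL)

    /* Old scheme (sketch): walk a 3-level tree to find the mfn. */
    static unsigned long tree_lookup(unsigned long ***top, unsigned long pfn)
    {
        unsigned long **mid = top[pfn >> 18];          /* 9+9 bits/level */
        unsigned long *leaf = mid ? mid[(pfn >> 9) & 511] : NULL;
        return leaf ? leaf[pfn & 511] : INVALID_MFN;
    }

    /* New scheme: the whole list is one virtually contiguous array;
     * holes are backed by a shared "all invalid mfn" page. */
    static unsigned long linear_lookup(const unsigned long *p2m,
                                       unsigned long pfn)
    {
        return p2m[pfn];   /* a single load on the hot path */
    }

    int main(void)
    {
        static unsigned long leaf[512];
        static unsigned long *mid[512];
        static unsigned long **top[512];
        static unsigned long linear[512];

        leaf[5] = 0xabcd;
        mid[0] = leaf;
        top[0] = mid;
        linear[5] = 0xabcd;

        /* Both schemes return the same mfn; the linear one is cheaper. */
        assert(tree_lookup(top, 5) == linear_lookup(linear, 5));
        printf("ok\n");
        return 0;
    }
    ```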

    Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
    showed following improvements:

    Elapsed time:  32:50 ->  32:35
    System:        18:07 ->  17:47
    User:         104:00 -> 103:30

    Tested with following configurations:
    - 64 bit dom0, 8GB RAM
    - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
    - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
    the patch)
    - 32 bit domU, ballooning up and down
    - 32 bit domU, save and restore
    - 32 bit domU with PCI passthrough
    - 64 bit domU, 8 GB, 2049 MB, 5000 MB
    - 64 bit domU, ballooning up and down
    - 64 bit domU, save and restore
    - 64 bit domU with PCI passthrough

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Today get_phys_to_machine() is always called when the mfn for a pfn
    is to be obtained. Add an inline wrapper __pfn_to_mfn() so that, once
    the switch to a linear mapped p2m list has been done, calling
    get_phys_to_machine() can be avoided when possible.
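    The shape of such a wrapper can be sketched in user space (sizes,
    counters and names here are illustrative stand-ins, not the kernel's):

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define INVALID_MFN (~0UL)

    static unsigned long xen_p2m_size = 8;   /* toy list size */
    static unsigned long p2m_list[8];
    static unsigned long slow_calls;

    /* Stand-in for the out-of-line slow path. */
    static unsigned long get_phys_to_machine(unsigned long pfn)
    {
        (void)pfn;
        slow_calls++;
        return INVALID_MFN;
    }

    /* The wrapper: one bounds check plus one load on the fast path;
     * the slow path is only taken for out-of-range pfns. */
    static inline unsigned long __pfn_to_mfn(unsigned long pfn)
    {
        if (pfn < xen_p2m_size)
            return p2m_list[pfn];
        return get_phys_to_machine(pfn);
    }

    int main(void)
    {
        p2m_list[7] = 0x77;
        assert(__pfn_to_mfn(7) == 0x77 && slow_calls == 0);
        assert(__pfn_to_mfn(100) == INVALID_MFN && slow_calls == 1);
        printf("ok\n");
        return 0;
    }
    ```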

    Signed-off-by: Juergen Gross
    Reviewed-by: David Vrabel
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Early in the boot process the memory layout of a pv-domain is changed
    to match the E820 map (either the host one for Dom0 or the Xen one)
    regarding placement of RAM and PCI holes. This requires removing memory
    pages initially located at positions not suitable for RAM and adding
    them later at higher addresses where no restrictions apply.

    To be able to operate on the hypervisor-supplied p2m list until a
    virtually mapped linear p2m list can be constructed, remapping must
    be delayed until virtual memory management is initialized, as the
    initial p2m list can't be extended without limit at physical memory
    initialization time due to its fixed structure.

    A further advantage is the reduction in complexity and code volume as
    we don't have to be careful regarding memory restrictions during p2m
    updates.

    Signed-off-by: Juergen Gross
    Reviewed-by: David Vrabel
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • In arch/x86/xen/p2m.c three different allocation functions for
    obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
    or __get_free_page(). Which of those functions is used depends on the
    progress of the boot process of the system.

    Introduce a common allocation routine that dynamically selects the
    allocation function to call based on the boot progress. This allows
    moving initialization steps without having to care about changing
    allocation calls.
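    The dispatch pattern is simple: one entry point picks the allocator
    that is usable at the current boot stage. A user-space sketch with
    malloc standing in for the real allocators (stage names and the
    tracking variable are illustrative):

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Boot progress decides which allocator is usable. */
    enum boot_stage { STAGE_BRK, STAGE_BOOTMEM, STAGE_SLAB };
    static enum boot_stage stage = STAGE_BRK;

    static const char *last_alloc;   /* records which path was taken */

    static void *brk_alloc(void)     { last_alloc = "extend_brk";      return malloc(4096); }
    static void *bootmem_alloc(void) { last_alloc = "alloc_bootmem";   return malloc(4096); }
    static void *page_alloc(void)    { last_alloc = "__get_free_page"; return malloc(4096); }

    /* One entry point; callers never care how far boot has progressed. */
    static void *alloc_p2m_page(void)
    {
        switch (stage) {
        case STAGE_BRK:     return brk_alloc();
        case STAGE_BOOTMEM: return bootmem_alloc();
        default:            return page_alloc();
        }
    }

    int main(void)
    {
        free(alloc_p2m_page());
        assert(strcmp(last_alloc, "extend_brk") == 0);

        stage = STAGE_SLAB;
        free(alloc_p2m_page());
        assert(strcmp(last_alloc, "__get_free_page") == 0);
        printf("ok\n");
        return 0;
    }
    ```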

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     

16 Nov, 2014

1 commit

  • With the dynamic mapping between cache modes and pgprot values it is
    now possible to use all cache modes via the Xen hypervisor PAT settings
    in a pv domain.

    All that needs to be done is to read the PAT configuration MSR and set
    up the translation tables accordingly.
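    The IA32_PAT MSR packs eight memory types, one per byte (only the low
    3 bits of each byte are used). Decoding it is a matter of shifting and
    masking; a sketch checked against the architectural power-on default:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* IA32_PAT holds eight 3-bit memory types, one per byte. */
    static unsigned int pat_entry(uint64_t pat_msr, int idx)
    {
        return (pat_msr >> (idx * 8)) & 0x7;
    }

    int main(void)
    {
        /* Power-on default: WB(6), WT(4), UC-(7), UC(0), repeated. */
        const uint64_t pat = 0x0007040600070406ULL;

        assert(pat_entry(pat, 0) == 6);  /* WB  */
        assert(pat_entry(pat, 1) == 4);  /* WT  */
        assert(pat_entry(pat, 2) == 7);  /* UC- */
        assert(pat_entry(pat, 3) == 0);  /* UC  */
        printf("ok\n");
        return 0;
    }
    ```

    Building the cachemode-to-pgprot translation tables then amounts to
    mapping each decoded entry to the PAT/PCD/PWT bit combination that
    selects it.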

    Signed-off-by: Juergen Gross
    Reviewed-by: David Vrabel
    Reviewed-by: Konrad Rzeszutek Wilk
    Reviewed-by: Thomas Gleixner
    Cc: stefan.bader@canonical.com
    Cc: xen-devel@lists.xensource.com
    Cc: ville.syrjala@linux.intel.com
    Cc: jbeulich@suse.com
    Cc: toshi.kani@hp.com
    Cc: plagnioj@jcrosoft.com
    Cc: tomi.valkeinen@ti.com
    Cc: bhelgaas@google.com
    Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.com
    Signed-off-by: Thomas Gleixner

    Juergen Gross
     

04 Nov, 2014

1 commit

  • This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT.
    Turning it off completely disables vsyscall emulation, saving ~3.5k
    for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall
    page), some tiny amount of core mm code that supports a gate area,
    and possibly 4k for a wasted pagetable. The latter is because the
    vsyscall addresses are misaligned and fit poorly in the fixmap.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Josh Triplett
    Cc: Konrad Rzeszutek Wilk
    Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     

23 Oct, 2014

1 commit

  • The 3 level p2m tree for the Xen tools is constructed very early at
    boot by calling xen_build_mfn_list_list(). Memory needed for this tree
    is allocated via extend_brk().

    As this tree (unlike the kernel-internal p2m tree) is only needed
    for domain save/restore, live migration and crash dump analysis, it
    doesn't matter whether it is constructed very early or just some
    milliseconds later when memory allocation is possible by other means.

    This patch moves the call of xen_build_mfn_list_list() just after
    calling xen_pagetable_p2m_copy() simplifying this function, too, as it
    doesn't have to bother with two parallel trees now. The same applies
    for some other internal functions.

    While simplifying code, make early_can_reuse_p2m_middle() static and
    drop the unused second parameter. p2m_mid_identity_mfn can be removed
    as well, it isn't used either.

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     

23 Sep, 2014

1 commit

  • Since mfn_to_pfn() returns the correct PFN for identity mappings (as
    used for MMIO regions), the use of _PAGE_IOMAP is not required in
    pte_mfn_to_pfn().

    Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it
    in pte_mfn_to_pfn().

    This will allow _PAGE_IOMAP to be removed, making it available for
    future use.

    Signed-off-by: David Vrabel
    Reviewed-by: Konrad Rzeszutek Wilk

    David Vrabel
     

10 Sep, 2014

1 commit

  • When RANDOMIZE_BASE (KASLR) is enabled, or when the sum of all loaded
    modules exceeds 512 MiB, loading modules fails with a warning (and
    hence a vmalloc allocation failure) because the PTEs for the
    newly-allocated vmalloc address space are not zero.

    WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
    vmap_page_range_noflush+0x2a1/0x360()

    This is caused by xen_setup_kernel_pagetables() copying
    level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
    entries.

    Without KASLR, the normal kernel image size only covers the first half
    of level2_kernel_pgt and module space starts after that.

    L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel
    [256..511]->module
    [511]->level2_fixmap_pgt[ 0..505]->module

    This allows 512 MiB of module vmalloc space to be used before
    having to use the corrupted level2_fixmap_pgt entries.

    With KASLR enabled, the kernel image uses the full PUD range of 1G and
    module space starts in the level2_fixmap_pgt. So basically:

    L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
    [511]->level2_fixmap_pgt[0..505]->module

    And now no module vmalloc space can be used without using the corrupt
    level2_fixmap_pgt entries.

    Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
    and setting level1_fixmap_pgt as read-only.

    A number of comments were also using the wrong L3 offset for
    level2_kernel_pgt. These have been corrected.
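    The 512 MiB headroom follows from the page-table geometry: each
    level-2 (PMD) entry maps 2 MiB, and without KASLR the kernel image
    only uses half of level2_kernel_pgt. A quick arithmetic check:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t PMD_SIZE = 2ULL << 20; /* one L2 entry maps 2 MiB */

        /* Without KASLR the kernel occupies level2_kernel_pgt[0..255],
         * leaving entries 256..511 for modules before the corrupted
         * level2_fixmap_pgt entries are ever reached. */
        uint64_t module_headroom = (512 - 256) * PMD_SIZE;
        assert(module_headroom == 512ULL << 20);  /* 512 MiB */

        printf("module headroom without KASLR: %llu MiB\n",
               (unsigned long long)(module_headroom >> 20));
        return 0;
    }
    ```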

    Signed-off-by: Stefan Bader
    Signed-off-by: David Vrabel
    Reviewed-by: Boris Ostrovsky
    Cc: stable@vger.kernel.org

    Stefan Bader
     

05 Jun, 2014

1 commit

  • Pull x86 vdso updates from Peter Anvin:
    "Vdso cleanups and improvements largely from Andy Lutomirski. This
    makes the vdso a lot less ''special''"

    * 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/vdso, build: Make LE access macros clearer, host-safe
    x86/vdso, build: Fix cross-compilation from big-endian architectures
    x86/vdso, build: When vdso2c fails, unlink the output
    x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
    x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
    x86, mm: Improve _install_special_mapping and fix x86 vdso naming
    mm, fs: Add vm_ops->name as an alternative to arch_vma_name
    x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
    x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
    x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
    x86, vdso: Move the 32-bit vdso special pages after the text
    x86, vdso: Reimplement vdso.so preparation in build-time C
    x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
    x86, vdso: Clean up 32-bit vs 64-bit vdso params
    x86, mm: Ensure correct alignment of the fixmap

    Linus Torvalds
     

27 May, 2014

1 commit

  • When running as a dom0 in PVH mode, foreign pfns that are accessed
    must be added to our p2m, which is managed by xen. This is done via
    the XENMEM_add_to_physmap_range hypercall. This is needed for the
    toolstack building guests and mapping guest memory, for xentrace
    mapping xen pages, etc.

    Signed-off-by: Mukesh Rathor
    Signed-off-by: David Vrabel

    Mukesh Rathor
     

15 May, 2014

1 commit

  • _PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the
    pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate
    the MFN. If mfn_pte() is used instead, the p2m look up is avoided and
    the use of _PAGE_IOMAP is no longer needed.
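    The distinction between the two helpers can be sketched with a toy
    p2m table (frame numbers, shifts and the table itself are illustrative
    stand-ins):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FRAME_SHIFT 12

    /* toy p2m: pseudo-physical frame -> machine frame */
    static uint64_t p2m[16] = { [2] = 9 };

    /* pfn_pte translates the frame through the p2m ... */
    static uint64_t pfn_pte(uint64_t pfn, uint64_t prot)
    {
        return (p2m[pfn] << FRAME_SHIFT) | prot;
    }

    /* ... while mfn_pte takes the machine frame directly, so callers
     * that already hold an MFN (as the remap code does) skip the
     * lookup entirely. */
    static uint64_t mfn_pte(uint64_t mfn, uint64_t prot)
    {
        return (mfn << FRAME_SHIFT) | prot;
    }

    int main(void)
    {
        /* same PTE, but the second form needs no p2m access */
        assert(pfn_pte(2, 0x3) == mfn_pte(9, 0x3));
        printf("ok\n");
        return 0;
    }
    ```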

    Signed-off-by: David Vrabel
    Reviewed-by: Konrad Rzeszutek Wilk
    Tested-by: Konrad Rzeszutek Wilk

    David Vrabel
     

06 May, 2014

1 commit


03 Apr, 2014

1 commit

  • Pull x86 vdso changes from Peter Anvin:
    "This is the revamp of the 32-bit vdso and the associated cleanups.

    This adds timekeeping support to the 32-bit vdso that we already have
    in the 64-bit vdso. Although 32-bit x86 is legacy, it is likely to
    remain in the embedded space for a very long time to come.

    This removes the traditional COMPAT_VDSO support; the configuration
    variable is reused for simply removing the 32-bit vdso, which will
    produce correct results but obviously suffer a performance penalty.
    Only one beta version of glibc was affected, but that version was
    unfortunately included in one OpenSUSE release.

    This is not the end of the vdso cleanups. Stefani and Andy have
    agreed to continue work for the next kernel cycle; in fact Andy has
    already produced another set of cleanups that came too late for this
    cycle.

    An incidental, but arguably important, change is that this ensures
    that unused space in the VVAR page is properly zeroed. It wasn't
    before, and would contain whatever garbage was left in memory by BIOS
    or the bootloader. Since the VVAR page is accessible to user space
    this had the potential of information leaks"

    * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    x86, vdso: Fix the symbol versions on the 32-bit vDSO
    x86, vdso, build: Don't rebuild 32-bit vdsos on every make
    x86, vdso: Actually discard the .discard sections
    x86, vdso: Fix size of get_unmapped_area()
    x86, vdso: Finish removing VDSO32_PRELINK
    x86, vdso: Move more vdso definitions into vdso.h
    x86: Load the 32-bit vdso in place, just like the 64-bit vdsos
    x86, vdso32: handle 32 bit vDSO larger one page
    x86, vdso32: Disable stack protector, adjust optimizations
    x86, vdso: Zero-pad the VVAR page
    x86, vdso: Add 32 bit VDSO time support for 64 bit kernel
    x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
    x86, vdso: Patch alternatives in the 32-bit VDSO
    x86, vdso: Introduce VVAR marco for vdso32
    x86, vdso: Cleanup __vdso_gettimeofday()
    x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro
    x86, vdso: __vdso_clock_gettime() cleanup
    x86, vdso: Revamp vclock_gettime.c
    mm: Add new func _install_special_mapping() to mmap.c
    x86, vdso: Make vsyscall_gtod_data handling x86 generic
    ...

    Linus Torvalds
     

25 Mar, 2014

1 commit

  • This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

    PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
    is set and pseudo-physical addresses if _PAGE_PRESENT is clear.

    This is because during a domain save/restore (migration) the page
    table entries are "canonicalised" and "uncanonicalised"; i.e., MFNs
    are converted to PFNs during domain save so that on a restore the page
    table entries may be rewritten with the new MFNs on the destination.
    This canonicalisation is only done for PTEs that are present.

    This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
    _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
    migrated as-is which would result in unexpected behaviour in the
    destination domain. Either a) the MFN would be translated to the
    wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
    because the MFN is no longer owned by the domain; or c) the present
    bit would not get set.

    Symptoms include "Bad page" reports when munmapping after migrating a
    domain.
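    The invariant the revert restores can be sketched with a toy m2p
    table (the table, shift and flag values are illustrative; only the
    "present PTEs get canonicalised, non-present ones don't" rule is
    taken from the text above):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define _PAGE_PRESENT 0x1ULL
    #define FRAME_SHIFT   12

    /* toy m2p table: machine frame -> pseudo-physical frame */
    static uint64_t m2p[16] = { [3] = 7 };

    /* Save-time canonicalisation (sketch): only PTEs with
     * _PAGE_PRESENT set have their MFN rewritten to a PFN. */
    static uint64_t canonicalise(uint64_t pte)
    {
        if (!(pte & _PAGE_PRESENT))
            return pte;                     /* left as-is on save */
        uint64_t mfn = pte >> FRAME_SHIFT;
        return (m2p[mfn] << FRAME_SHIFT) | (pte & 0xfff);
    }

    int main(void)
    {
        uint64_t present     = (3ULL << FRAME_SHIFT) | _PAGE_PRESENT;
        uint64_t not_present = (3ULL << FRAME_SHIFT); /* e.g. PROT_NONE */

        assert(canonicalise(present) >> FRAME_SHIFT == 7); /* MFN -> PFN */
        /* a non-present PTE is migrated untouched, so it must already
         * hold a PFN, not an MFN -- which is what the revert enforces */
        assert(canonicalise(not_present) == not_present);
        printf("ok\n");
        return 0;
    }
    ```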

    Signed-off-by: David Vrabel
    Acked-by: Konrad Rzeszutek Wilk
    Cc: [3.12+]

    David Vrabel
     

14 Mar, 2014

1 commit

  • Checkin

    b0b49f2673f0 x86, vdso: Remove compat vdso support

    ... removed the VDSO from the fixmap, and thus FIX_VDSO; remove a
    stray reference in Xen.

    Found by Fengguang Wu's test robot.

    Reported-by: Fengguang Wu
    Cc: Andy Lutomirski
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     

11 Feb, 2014

1 commit

  • Steven Noonan forwarded a users report where they had a problem starting
    vsftpd on a Xen paravirtualized guest, with this in dmesg:

    BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067
    page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0
    page flags: 0x2ffc0000000014(referenced|dirty)
    addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74
    CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
    Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
    Disabling lock debugging due to kernel taint
    BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
    BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

    The issue could not be reproduced under an HVM instance with the same
    kernel, so it appears to be exclusive to paravirtual Xen guests. He
    bisected the problem to commit 1667918b6483 ("mm: numa: clear numa
    hinting information on mprotect") that was also included in 3.12-stable.

    The problem was related to how xen translates ptes because it was not
    accounting for the _PAGE_NUMA bit. This patch splits pte_present to add
    a pteval_present helper for use by xen so both bare metal and xen use
    the same code when checking if a PTE is present.
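    The helper's shape is a single shared mask test. A sketch with
    illustrative bit values (the real definitions live in the x86
    pgtable headers, and on real x86 _PAGE_NUMA aliased _PAGE_PROTNONE):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative bit values, not the kernel's. */
    #define _PAGE_PRESENT  0x001ULL
    #define _PAGE_PROTNONE 0x100ULL
    #define _PAGE_NUMA     0x200ULL

    /* The shared helper: a PTE is "present" to generic code if any of
     * these bits is set, so Xen's PTE translation must apply the same
     * test instead of checking _PAGE_PRESENT alone. */
    static int pteval_present(uint64_t pteval)
    {
        return (pteval & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA)) != 0;
    }

    int main(void)
    {
        assert(pteval_present(_PAGE_PRESENT));
        assert(pteval_present(_PAGE_NUMA));   /* the case the bug missed */
        assert(pteval_present(_PAGE_PROTNONE));
        assert(!pteval_present(0));
        printf("ok\n");
        return 0;
    }
    ```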

    [mgorman@suse.de: wrote changelog, proposed minor modifications]
    [akpm@linux-foundation.org: fix typo in comment]
    Reported-by: Steven Noonan
    Tested-by: Steven Noonan
    Signed-off-by: Elena Ufimtseva
    Signed-off-by: Mel Gorman
    Reviewed-by: David Vrabel
    Acked-by: Konrad Rzeszutek Wilk
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

31 Jan, 2014

1 commit

  • Pull x86 asmlinkage (LTO) changes from Peter Anvin:
    "This patchset adds more infrastructure for link time optimization
    (LTO).

    This patchset was pulled into my tree late because of a
    miscommunication (part of the patchset was picked up by other
    maintainers). However, the patchset is strictly build-related and
    seems to be okay in testing"

    * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, asmlinkage, xen: Fix type of NMI
    x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible
    x86: Use inline assembler instead of global register variable to get sp
    x86, asmlinkage, paravirt: Make paravirt thunks global
    x86, asmlinkage, paravirt: Don't rely on local assembler labels
    x86, asmlinkage, lguest: Fix C functions used by inline assembler

    Linus Torvalds
     

30 Jan, 2014

1 commit

  • The paravirt thunks use a hack of using a static reference to a static
    function to reference that function from the top level statement.

    This assumes that gcc always generates static function names in a specific
    format, which is not necessarily true.

    Simply make these functions global and asmlinkage or __visible. This way the
    static __used variables are not needed and everything works.

    Functions with arguments are __visible to keep the register calling
    convention on 32bit.

    Changed in paravirt and in all users (Xen and vsmp)

    v2: Use __visible for functions with arguments

    Cc: Jeremy Fitzhardinge
    Cc: Ido Yariv
    Signed-off-by: Andi Kleen
    Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     

06 Jan, 2014

4 commits

  • We also optimize one - the TLB flush. The native operation would
    needlessly IPI offline VCPUs causing extra wakeups. Using the
    Xen one avoids that and lets the hypervisor determine which
    VCPU needs the TLB flush.

    Signed-off-by: Mukesh Rathor
    Signed-off-by: Konrad Rzeszutek Wilk

    Mukesh Rathor
     
  • .. which are surprisingly small compared to the amount for PV code.

    PVH uses mostly native mmu ops; we keep the generic (native_*) ones
    for the majority and just override the baremetal ones we need.

    At startup we are running with pre-allocated page tables courtesy
    of the toolstack, but we still need to graft them into the Linux
    initial pagetables. However, there is no need to unpin/pin them or
    change them to R/O or R/W.

    Note that the xen_pagetable_init due to 7836fec9d0994cc9c9150c5a33f0eb0eb08a335a
    "xen/mmu/p2m: Refactor the xen_pagetable_init code." does not
    need any changes - we just need to make sure that xen_post_allocator_init
    does not alter the pvops from the default native one.

    Signed-off-by: Mukesh Rathor
    Signed-off-by: Konrad Rzeszutek Wilk
    Acked-by: Stefano Stabellini

    Mukesh Rathor
     
  • Stefano noticed that the code runs only under 64-bit, so the
    comments about 32-bit are pointless.

    Also, change the handling of the case where xen_revector_p2m_tree
    returns the same value, either because it could not allocate a
    swath of space to put the new P2M in, or because it had already
    been called once. In such cases we return early from the function.

    Signed-off-by: Konrad Rzeszutek Wilk
    Acked-by: Stefano Stabellini

    Konrad Rzeszutek Wilk
     
  • The revectoring and copying of the P2M only happens when
    !auto-xlat and on 64-bit builds. This is not obvious from
    the code, so let's have separate 32-bit and 64-bit functions.

    We also invert the check for auto-xlat to make the code
    flow simpler.

    Suggested-by: Stefano Stabellini
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     

15 Nov, 2013

1 commit

  • Pull Xen updates from Konrad Rzeszutek Wilk:
    "This has tons of fixes and two major features which are concentrated
    around the Xen SWIOTLB library.

    The short is that the tracing facility (just one function) has
    been added to SWIOTLB to make it easier to track I/O progress.
    Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver
    "is used to translate physical to machine and machine to physical
    addresses of foreign[guest] pages for DMA operations" (Stefano) when
    booting under hardware without proper IOMMU.

    There are also bug-fixes, cleanups, compile warning fixes, etc.

    The commit times for some of the commits are a bit fresh - that is b/c
    we wanted to make sure we have the Acks from the ARM folks - which,
    with the string of back-to-back conferences, took a bit of time. Rest
    assured - the code has been stewing in #linux-next for some time.

    Features:
    - SWIOTLB has tracing added when doing bounce buffer.
    - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to
    safely program real devices for DMA operations when running as a
    guest on Xen on ARM, without IOMMU support. [*1]
    - xen_raw_printk works with PVHVM guests if needed.

    Bug-fixes:
    - Make memory ballooning work under HVM with large MMIO region.
    - Inform hypervisor of MCFG regions found in ACPI DSDT.
    - Remove deprecated IRQF_DISABLED.
    - Remove deprecated __cpuinit.

    [*1]:
    "On arm and arm64 all Xen guests, including dom0, run with second
    stage translation enabled. As a consequence, when dom0 programs a
    device for a DMA operation it is going to use (pseudo) physical
    addresses instead of machine addresses. This work introduces two trees
    to track physical to machine and machine to physical mappings of
    foreign pages. Local pages are assumed mapped 1:1 (physical address
    == machine address). It enables the SWIOTLB-Xen driver on ARM and
    ARM64, so that Linux can translate physical addresses to machine
    addresses for dma operations when necessary. " (Stefano)"

    * tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits)
    xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m
    arm,arm64/include/asm/io.h: define struct bio_vec
    swiotlb-xen: missing include dma-direction.h
    pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI
    arm: make SWIOTLB available
    xen: delete new instances of added __cpuinit
    xen/balloon: Set balloon's initial state to number of existing RAM pages
    xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
    xen: remove deprecated IRQF_DISABLED
    x86/xen: remove deprecated IRQF_DISABLED
    swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
    swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
    grant-table: call set_phys_to_machine after mapping grant refs
    arm,arm64: do not always merge biovec if we are running on Xen
    swiotlb: print a warning when the swiotlb is full
    swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
    xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
    tracing/events: Fix swiotlb tracepoint creation
    swiotlb-xen: use xen_alloc/free_coherent_pages
    xen: introduce xen_alloc/free_coherent_pages
    ...

    Linus Torvalds