15 Apr, 2013
1 commit
-
Pull x86 fixes from Ingo Molnar:
"Misc fixes"* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
x86/mm/cpa/selftest: Fix false positive in CPA self test
x86/mm/cpa: Convert noop to functional fix
x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
11 Apr, 2013
1 commit
-
Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable(), which may have a performance impact.
Since lazy MMU is not used on bare metal, we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
an environment.
[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
updates" may cause a minor performance regression on
bare metal. This patch resolves that performance regression. It is
somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: Boris Ostrovsky
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer
Tested-by: Konrad Rzeszutek Wilk
Acked-by: Borislav Petkov
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: H. Peter Anvin
Cc: SEE NOTE ABOVE
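As a rough illustration of the approach (the field and macro names below are assumptions modeled on the kernel's pvops infrastructure, not the exact upstream diff):

/*
 * Sketch: route arch_flush_lazy_mmu_mode() through a paravirt op whose
 * native backend is paravirt_nop, so the paravirt patching machinery
 * can eliminate the call entirely on bare metal.
 */
struct pv_lazy_ops {
	void (*enter)(void);
	void (*leave)(void);
	void (*flush)(void);	/* assumed new hook for flushing lazy MMU updates */
};

static inline void arch_flush_lazy_mmu_mode(void)
{
	/* Patched out (no-op) when .flush == paravirt_nop on bare metal. */
	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
}

On Xen the hook would point at the real flush routine, while the native template leaves it as paravirt_nop, so the preempt_enable()/disable() cost described above disappears on bare metal.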
03 Apr, 2013
1 commit
-
Occasionally on a DL380 G4 the guest would crash quite early with this:
(XEN) d244:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from ffffffff84dc7000:
(XEN) L4[0x1ff] = 00000000c3f18067 0000000000001789
(XEN) L3[0x1fe] = 00000000c3f14067 000000000000178d
(XEN) L2[0x026] = 00000000dc8b2067 0000000000004def
(XEN) L1[0x1c7] = 00100000dc8da067 0000000000004dc7
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 244 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.1.3OVM x86_64 debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e033:[]
(XEN) RFLAGS: 0000000000000216 EM: 1 CONTEXT: pv guest
(XEN) rax: 0000000000000000 rbx: ffffffff81785f88 rcx: 000000000000003f
(XEN) rdx: 0000000000000000 rsi: 00000000dc8da063 rdi: ffffffff84dc7000
The offending code shows it to be a loop writing the value zero
(%rax) through %rdi (the L4 provided by Xen):
0: 44 00 00 add %r8b,(%rax)
3: 31 c0 xor %eax,%eax
5: b9 40 00 00 00 mov $0x40,%ecx
a: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
11: 00 00
13: ff c9 dec %ecx
15:* 48 89 07 mov %rax,(%rdi)
Signed-off-by: Konrad Rzeszutek Wilk
28 Mar, 2013
1 commit
-
We move the setting of write_cr3 from the early bootup variant
(see git commit 0cc9129d75ef8993702d97ab0e49542c15ac6ab9
"x86-64, xen, mmu: Provide an early version of write_cr3.")
to a more appropriate location.
This new location sets all of the other non-early variants
of pvops calls - and, most importantly, is before the
alternative_asm mechanism kicks in.
Signed-off-by: Konrad Rzeszutek Wilk
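Roughly, the relocation amounts to something like this (the enclosing function here is a hypothetical placeholder, not the actual upstream hunk):

/*
 * Sketch: once the initial page tables are in place, switch the pvops
 * hook from the early boot variant to the full xen_write_cr3, together
 * with the other non-early pvops and before alternatives patching runs.
 */
static void __init xen_post_pagetable_setup(void)	/* hypothetical helper */
{
	pv_mmu_ops.write_cr3 = &xen_write_cr3;
}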
23 Feb, 2013
1 commit
-
With commit 8170e6bed465 ("x86, 64bit: Use a #PF handler to materialize
early mappings on demand") we started hitting an early bootup crash
where the Xen hypervisor would inform us that:
(XEN) d7:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffea000005b2d0:
(XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 7 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]----
.. that Xen was unable to context switch back to dom0.
Looking at the calling stack we find:
[] xen_get_user_pgd+0x5a
[] xen_write_cr3+0x77
[] init_mem_mapping+0x1f9
[] setup_arch+0x742
[] printk+0x48
We are trying to figure out whether we need to update the user PGD as
well. Please keep in mind that under 64-bit PV guests we have a limited
amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel
and user-space. As such the Linux pvops'fied version of write_cr3
checks if it has to update the user-space cr3 as well.
That clearly is not needed during early bootup. The recent changes (see
above git commit) streamline the x86 page table allocation to be much
simpler (And also incidentally the #PF handler ends up in spirit being
similar to how the Xen toolstack sets up the initial page-tables).
The fix is to have an early-bootup version of write_cr3 that just loads the
kernel %cr3. The later version - which also handles user-page
modifications - will be used after the initial page tables have been
set up.
[ hpa: removed a redundant #ifdef and made the new function __init.
Also note that x86-32 already has such an early xen_write_cr3. ]
Tested-by: "H. Peter Anvin"
Reported-by: Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk
Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin
Signed-off-by: Linus Torvalds
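A sketch of what such an early variant can look like (helper names follow the surrounding Xen mmu code but are assumptions and may not match the exact upstream hunk):

/*
 * Sketch of an early-boot write_cr3: load only the kernel %cr3 and skip
 * the user-space PGD handling that the full xen_write_cr3 performs.
 */
static void __init xen_write_cr3_init(unsigned long cr3)
{
	BUG_ON(preemptible());

	xen_mc_batch();				/* disables interrupts */

	/* Update the percpu copy while interrupts are off, atomic wrt IPIs. */
	this_cpu_write(xen_cr3, cr3);

	__xen_write_cr3(true, cr3);		/* kernel cr3 only, no user PGD */

	xen_mc_issue(PARAVIRT_LAZY_CPU);	/* interrupts restored */
}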
30 Jan, 2013
1 commit
-
Coming patches to x86/mm2 require the changes and advanced baseline in
x86/boot.
Resolved Conflicts:
arch/x86/kernel/setup.c
mm/nobootmem.c
Signed-off-by: H. Peter Anvin
14 Dec, 2012
1 commit
-
Pull Xen updates from Konrad Rzeszutek Wilk:
- Add necessary infrastructure to make balloon driver work under ARM.
- Add /dev/xen/privcmd interfaces to work with ARM and PVH.
- Improve Xen PCIBack wild-card parsing.
- Add Xen ACPI PAD (Processor Aggregator) support - so can offline/
online sockets depending on the power consumption.
- PVHVM + kexec = use an E820_RESV region for the shared region so we
don't overwrite said region during kexec reboot.
- Cleanups, compile fixes.
Fix up some trivial conflicts due to the balloon driver now working on
ARM, and there were changes next to the previous work-arounds that are
now gone.
* tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/PVonHVM: fix compile warning in init_hvm_pv_info
xen: arm: implement remap interfaces needed for privcmd mappings.
xen: correctly use xen_pfn_t in remap_domain_mfn_range.
xen: arm: enable balloon driver
xen: balloon: allow PVMMU interfaces to be compiled out
xen: privcmd: support autotranslated physmap guests.
xen: add pages parameter to xen_remap_domain_mfn_range
xen/acpi: Move the xen_running_on_version_or_later function.
xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h
xen/acpi: Fix compile error by missing decleration for xen_domain.
xen/acpi: revert pad config check in xen_check_mwait
xen/acpi: ACPI PAD driver
xen-pciback: reject out of range inputs
xen-pciback: simplify and tighten parsing of device IDs
xen PVonHVM: use E820_Reserved area for shared_info
29 Nov, 2012
2 commits
-
For Xen on ARM a PFN is 64 bits so we need to use the appropriate
type here.
Signed-off-by: Ian Campbell
Acked-by: Stefano Stabellini
Signed-off-by: Konrad Rzeszutek Wilk
[v2: include the necessary header,
Reported-by: Fengguang Wu ]
-
Also introduce xen_unmap_domain_mfn_range. These are the parts of
Mukesh's "xen/pvh: Implement MMU changes for PVH" which are also
needed as a baseline for ARM privcmd support.
The original patch was:
Signed-off-by: Mukesh Rathor
Signed-off-by: Konrad Rzeszutek Wilk
This derivative is also:
Signed-off-by: Ian Campbell
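For reference, the interface shape this baseline provides looks roughly like this (parameter lists are approximations inferred from the commit subjects above; see the headers for the authoritative prototypes):

/* Approximate prototypes of the remap/unmap pair discussed above. */
int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
			       unsigned long addr,
			       xen_pfn_t mfn, int nr,
			       pgprot_t prot, unsigned int domid,
			       struct page **pages);

int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
			       int numpgs, struct page **pages);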
18 Nov, 2012
1 commit
-
Page table areas are pre-mapped now after
x86, mm: setup page table in top-down
x86, mm: Remove early_memremap workaround for page table accessing on 64bit
mapping_pagetable_reserve is not used anymore, so remove it.
Also remove the operation in mask_rw_pte(), as the modified alloc_low_page
always returns pages that are already mapped; moreover
xen_alloc_pte_init, xen_alloc_pmd_init, etc., will mark the page RO
before hooking it into the pagetable automatically.
-v2: add changelog about mask_rw_pte() from Stefano.
Signed-off-by: Yinghai Lu
Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org
Cc: Stefano Stabellini
Signed-off-by: H. Peter Anvin
01 Nov, 2012
1 commit
-
As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.
This patch gives around a 50% speed improvement when migrating
idle guests from one host to another.
Oracle-bug: 14630170
CC: stable@vger.kernel.org
Tested-by: Jingjie Jiang
Suggested-by: Mukesh Rathor
Signed-off-by: Konrad Rzeszutek Wilk
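A minimal sketch of the hypercall this uses (the real implementation batches it through the multicall machinery, which is omitted here):

/*
 * Sketch: ask the hypervisor to flush the TLB on all of the guest's
 * vCPUs in one go, instead of IPI-ing every vCPU to do a local flush.
 */
static void xen_flush_tlb_all(void)
{
	struct mmuext_op op = { .cmd = MMUEXT_TLB_FLUSH_ALL };

	/* The hypervisor flushes where needed; no vCPU is woken needlessly. */
	HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
}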
12 Oct, 2012
1 commit
-
Pull Xen fixes from Konrad Rzeszutek Wilk:
"This has four bug-fixes and one tiny feature that I forgot to put
initially in my tree due to an oversight.
The feature is for kdump kernels to speed up the /proc/vmcore reading.
There is a pfn_is_ram helper function that the different platforms can
register for. We are now doing that.
The bug-fixes cover some embarrassing struct pv_cpu_ops variables
being set to NULL on Xen (but not baremetal). We had a similar issue
in the past with {write|read}_msr_safe and this fills the three
missing ones. The other bug-fix is to make the console output (hvc)
be capable of dealing with misbehaving backends and not fall flat on
its face. Lastly, a quirk for older XenBus implementations that came
with an ancient v3.4 hypervisor (so RHEL5 based) - reading of certain
non-existent attributes just hangs the guest during bootup - so we
take the precaution of not doing that on such older installations.
Feature:
- Register a pfn_is_ram helper to speed up reading of /proc/vmcore.
Bug-fixes:
- Three pvops call for Xen were undefined causing BUG_ONs.
- Add a quirk so that the shutdown watches (used by kdump) are not
used with older Xen (3.4).
- Fix ungraceful state transition for the HVC console."
* tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches.
xen/bootup: allow {read|write}_cr8 pvops call.
xen/bootup: allow read_tscp call for Xen PV guests.
xen pv-on-hvm: add pfn_is_ram helper for kdump
xen/hvc: handle backend CLOSED without CLOSING
09 Oct, 2012
1 commit
-
A long time ago, in v2.4, VM_RESERVED kept the swapout process off the VMA;
currently it has lost its original meaning but still has some effects:
 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes the reserved_vm counter from mm_struct. Seems like nobody
cares about it; it is not exported into userspace directly, it only
reduces the total_vm shown in proc.
Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
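An illustrative conversion in a driver would look like this (which flags to keep depends on which of the four effects the driver actually relies on):

/* Before: */
vma->vm_flags |= VM_RESERVED;
/* After: keep only the flags whose effects are actually relied upon. */
vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;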
Signed-off-by: Konstantin Khlebnikov
Cc: Alexander Viro
Cc: Carsten Otte
Cc: Chris Metcalf
Cc: Cyrill Gorcunov
Cc: Eric Paris
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Ingo Molnar
Cc: James Morris
Cc: Jason Baron
Cc: Kentaro Takeda
Cc: Matt Helsley
Cc: Nick Piggin
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Suresh Siddha
Cc: Tetsuo Handa
Cc: Venkatesh Pallipadi
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2012
1 commit
-
Register a pfn_is_ram helper to speed up reading /proc/vmcore in the kdump
kernel. See commit message of 997c136f518c ("fs/proc/vmcore.c: add hook
to read_from_oldmem() to check for non-ram pages") for details.
It makes use of a new hvmop, HVMOP_get_mem_type, which was introduced in
xen 4.2 (23298:26413986e6e0) and backported to 4.1.1.
The new function is currently only enabled for reading /proc/vmcore.
Later it will be used also for the kexec kernel. Since that requires
more changes in the generic kernel, make it static for the time being.
Signed-off-by: Olaf Hering
Signed-off-by: Konrad Rzeszutek Wilk
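A hedged sketch of the registration side; register_oldmem_pfn_is_ram() is the hook from the commit cited above, while the Xen-side helper name here is hypothetical:

/*
 * Sketch: register a callback so /proc/vmcore can skip non-RAM PFNs.
 * xen_pfn_type_is_ram() is a hypothetical helper that would query the
 * hypervisor via HVMOP_get_mem_type; return 1 for RAM, 0 otherwise.
 */
static int xen_oldmem_pfn_is_ram(unsigned long pfn)
{
	return xen_pfn_type_is_ram(pfn);	/* hypothetical helper */
}

static int __init xen_register_vmcore_hook(void)
{
	if (xen_hvm_domain())
		register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
	return 0;
}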
12 Sep, 2012
6 commits
-
* stable/128gb.v5.1:
xen/mmu: If the revector fails, don't attempt to revector anything else.
xen/p2m: When revectoring deal with holes in the P2M array.
xen/mmu: Release just the MFN list, not MFN list and part of pagetables.
xen/mmu: Remove from __ka space PMD entries for pagetables.
xen/mmu: Copy and revector the P2M tree.
xen/p2m: Add logic to revector a P2M tree to use __va leafs.
xen/mmu: Recycle the Xen provided L4, L3, and L2 pages
xen/mmu: For 64-bit do not call xen_map_identity_early
xen/mmu: use copy_page instead of memcpy.
xen/mmu: Provide comments describing the _ka and _va aliasing issue
xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything.
Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas."
xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain.
xen/x86: Use memblock_reserve for sensitive areas.
xen/p2m: Fix the comment describing the P2M tree.
Conflicts:
arch/x86/xen/mmu.c
The pagetable_init is the old xen_pagetable_setup_done and xen_pagetable_setup_start
rolled into one.
Signed-off-by: Konrad Rzeszutek Wilk
-
…/tip into stable/for-linus-3.7
* 'x86/platform' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (9690 commits)
x86: Document x86_init.paging.pagetable_init()
x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done()
x86: Move paging_init() call to x86_init.paging.pagetable_init()
x86: Rename pagetable_setup_start() to pagetable_init()
x86: Remove base argument from x86_init.paging.pagetable_setup_start
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
...
-
At this stage x86_init.paging.pagetable_setup_done is only used in the
XEN case. Move its content into the x86_init.paging.pagetable_init setup
function and remove the now unused x86_init.paging.pagetable_setup_done
remaining infrastructure.
Signed-off-by: Attilio Rao
Acked-by:
Cc:
Cc:
Cc:
Link: http://lkml.kernel.org/r/1345580561-8506-5-git-send-email-attilio.rao@citrix.com
Signed-off-by: Thomas Gleixner
-
Move the paging_init() call to the platform specific pagetable_init()
function, so we can get rid of the extra pagetable_setup_done()
function pointer.
Signed-off-by: Attilio Rao
Acked-by:
Cc:
Cc:
Cc:
Link: http://lkml.kernel.org/r/1345580561-8506-4-git-send-email-attilio.rao@citrix.com
Signed-off-by: Thomas Gleixner
-
In preparation for unifying the pagetable_setup_start() and
pagetable_setup_done() setup functions, rename appropriately all the
infrastructure related to pagetable_setup_start().
Signed-off-by: Attilio Rao
Acked-by:
Cc:
Cc:
Cc:
Link: http://lkml.kernel.org/r/1345580561-8506-3-git-send-email-attilio.rao@citrix.com
Signed-off-by: Thomas Gleixner
-
We either use swapper_pg_dir or the argument is unused. Preparatory
patch to simplify platform pagetable setup further.
Signed-off-by: Attilio Rao
Acked-by:
Cc:
Cc:
Cc:
Link: http://lkml.kernel.org/r/1345580561-8506-2-git-send-email-attilio.rao@citrix.com
Signed-off-by: Thomas Gleixner
06 Sep, 2012
1 commit
-
Callers of xen_remap_domain_range() need to know if the remap failed
because the frame is currently paged out, so they can retry the remap
later on. Return -ENOENT in this case.
This assumes that the error codes returned by Xen are a subset of
those used by the kernel. It is unclear if this is defined as part of
the hypercall ABI.
Acked-by: Andres Lagar-Cavilla
Signed-off-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk
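A hypothetical caller-side retry loop this enables (not code from the patch; the argument list is approximate for this point in the tree):

int rc;		/* surrounding declarations elided in this sketch */
do {
	rc = xen_remap_domain_mfn_range(vma, addr, mfn, nr, prot, domid);
	if (rc == -ENOENT)
		cond_resched();	/* frame is paged out; give Xen time to page it in */
} while (rc == -ENOENT);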
05 Sep, 2012
1 commit
-
While TLB_FLUSH_ALL gets passed as 'end' argument to
flush_tlb_others(), the Xen code was made to check its 'start'
parameter. That may give an incorrect op.cmd of MMUEXT_INVLPG_MULTI
instead of MMUEXT_TLB_FLUSH_MULTI, which then causes some pages to not
be flushed from the TLB.
This patch fixes this issue.
Reported-by: Jan Beulich
Signed-off-by: Alex Shi
Acked-by: Jan Beulich
Tested-by: Yongjie Ren
Signed-off-by: Konrad Rzeszutek Wilk
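A sketch of the corrected check (variable names approximate the xen_flush_tlb_others() code and are not the exact upstream diff):

/* Decide based on 'end' (which carries TLB_FLUSH_ALL), not 'start'. */
if (end != TLB_FLUSH_ALL && (end - start) <= PAGE_SIZE) {
	args->op.cmd = MMUEXT_INVLPG_MULTI;
	args->op.arg1.linear_addr = start;
} else {
	args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
}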
23 Aug, 2012
10 commits
-
If the P2M revectoring were to fail, we would try to continue on by
cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap
is greedy and erases the PMD on both boundaries. Since the P2M
array can share the PMD, we would wipe out part of the __ka
that is still used in the P2M tree to point to P2M leafs.
This fixes it by bypassing the revectoring and continuing on.
If the revectoring fails, a nice WARN is printed so we can still
troubleshoot this.
Signed-off-by: Konrad Rzeszutek Wilk
-
We call memblock_reserve for [start of mfn list] -> [PMD aligned end
of mfn list] instead of ->
-
Please first read the description in "xen/mmu: Copy and revector the
P2M tree."At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).
The xen_remove_p2m_tree and the code around it have ripped out the __ka for
the old P2M array.
Here we continue doing it up to where the Xen page-tables were.
It is safe to do it, as the page-tables are addressed using __va.
For good measure we delete anything that is within MODULES_VADDR
and up to the end of the PMD.
At this point the __ka only contains PMD entries for the start
of the kernel up to __brk.
[v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug]
Signed-off-by: Konrad Rzeszutek Wilk
-
Please first read the description in "xen/p2m: Add logic to revector a
P2M tree to use __va leafs" patch.
The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
copies the contents of the old one into it, and returns the new one.
At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).
We have revectored the P2M tree (and the one for save/restore as well)
to use shiny new __va addresses for the new MFNs. The xen_start_info
has been taken care of already in 'xen_setup_kernel_pagetable()' and
xen_start_info->shared_info in 'xen_setup_shared_info()', so
we are free to roam and delete PMD entries - which is exactly what
we are going to do. We rip out the __ka for the old P2M array.[v1: Fix smatch warnings]
[v2: memset was doing 0 instead of 0xff]
Signed-off-by: Konrad Rzeszutek Wilk
-
As we are not using them - we end up only using the L1 pagetables
and grafting those to our page-tables.[v1: Per Stefano's suggestion squashed two commits]
[v2: Per Stefano's suggestion simplified loop]
[v3: Fix smatch warnings]
[v4: Add more comments]
Acked-by: Stefano Stabellini
Signed-off-by: Konrad Rzeszutek Wilk
-
Because we do not need it. During startup Xen provides
us with all the initial memory mappings that we need to function.
The initial memory mapped is up to the bootstack, which means
we can reference using __ka up to 4.f):
(from xen/interface/xen.h):
4. This the order of bootstrap elements in the initial virtual region:
a. relocated kernel image
b. initial ram disk [mod_start, mod_len]
c. list of allocated page frames [mfn_list, nr_pages]
d. start_info_t structure [register ESI (x86)]
e. bootstrap page tables [pt_base, CR3 (x86)]
f. bootstrap stack [register ESP (x86)]
(initial ram disk may be omitted).
[v1: More comments in git commit]
Acked-by: Stefano Stabellini
Signed-off-by: Konrad Rzeszutek Wilk
-
After all, this is what it is there for.
Acked-by: Jan Beulich
Signed-off-by: Konrad Rzeszutek Wilk
-
Which is that the level2_kernel_pgt (__ka virtual addresses)
and level2_ident_pgt (__va virtual addresses) contain the same
PMD entries. So if you modify a PTE in __ka, it will be reflected
in __va (and vice-versa).
Signed-off-by: Konrad Rzeszutek Wilk
-
We don't need to return the new PGD - as we do not use it.
Acked-by: Stefano Stabellini
Signed-off-by: Konrad Rzeszutek Wilk
-
This patch removes the "return -ENOSYS" for auto_translated_physmap
guests from privcmd_mmap, thus it allows ARM guests to issue privcmd
mmap calls. However privcmd mmap calls are still going to fail for HVM
and hybrid guests on x86 because the xen_remap_domain_mfn_range
implementation is currently PV only.
Changes in v2:
- better commit message;
- return -EINVAL from xen_remap_domain_mfn_range if
auto_translated_physmap.
Signed-off-by: Stefano Stabellini
Acked-by: Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk
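The v2 change described above amounts to roughly this in the PV-only remap implementation (sketch only):

/* Bail out explicitly for auto-translated guests instead of rejecting
 * the privcmd mmap up front. */
if (xen_feature(XENFEAT_auto_translated_physmap))
	return -EINVAL;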
27 Jul, 2012
1 commit
-
Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.
It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replaces them with an ordinary IPI function call."
Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros
20 Jul, 2012
2 commits
-
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out. In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).
This is obviously safe instead of having a valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
Signed-off-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk
-
In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE). Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test     Before     After       Improvement
-----------------------------------------------------
lat_pagefault     3.18 us    2.32 us     27%
lat_proc fork     356 us     313.3 us    11%
Signed-off-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk
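A sketch of the single-hypercall fallback (the function name is made up for illustration; the helpers are the usual Xen mmu ones, and this is not the exact upstream hunk):

/*
 * Sketch: write one PTE with a single mmu_update hypercall instead of
 * trapping twice (once per dword on 32-bit) through native_set_pte().
 */
static void xen_set_pte_hypercall(pte_t *ptep, pte_t pteval)
{
	struct mmu_update u;

	u.ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
	u.val = pte_val_ma(pteval);

	if (HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0)
		BUG();
}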
28 Jun, 2012
1 commit
-
x86 has no flush_tlb_range support at the instruction level. Currently
flush_tlb_range is just implemented by flushing the whole TLB. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush a few lines from the TLB, we can get a performance gain from later
accesses to the remaining TLB lines.
But the 'invlpg' instruction costs a lot of time. Its execution time can
compete with cr3 rewriting, and is even a bit higher on SNB CPUs.
So, on a CPU with 512 4KB TLB entries, the balance point is at:
(512 - X) * 100ns (assumed TLB refill cost) =
        X (TLB flush entries) * 100ns (assumed invlpg cost)
Here, X is 256, that is, 1/2 of the 512 entries.
But with the mysterious CPU pre-fetcher and page miss handler unit, the
assumed TLB refill cost is far lower than 100ns in sequential access. And
2 HT siblings in one core make memory access faster if they are
accessing the same memory. So, in the patch, I just do the change when
the number of target entries is less than 1/16 of the whole active TLB
entries. Actually, I have no data to support the percentage '1/16', so any
suggestions are welcomed.
As to hugetlb, I guess that due to the smaller page table and fewer active
TLB entries, I didn't see a benefit via my benchmark, so no optimizing for now.
My micro benchmark shows that in ideal scenarios the reading performance
improves by 70 percent. And in the worst scenario, the reading/writing
performance is similar to the unpatched 3.4-rc4 kernel.
Here is the reading data on my 2P * 4cores * HT NHM EP machine, with THP
'always':
Multi-thread testing, '-t' parameter is the thread number:
                    with patch    unpatched 3.4-rc4
./mprotect -t 1         14ns          24ns
./mprotect -t 2         13ns          22ns
./mprotect -t 4         12ns          19ns
./mprotect -t 8         14ns          16ns
./mprotect -t 16        28ns          26ns
./mprotect -t 32        54ns          51ns
./mprotect -t 128      200ns         199ns
Single process with sequential flushing and memory accessing:
                    with patch    unpatched 3.4-rc4
./mprotect               7ns          11ns
./mprotect -p 4096 -l 8 -n 10240
                        21ns          21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin
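A sketch of the 1/16 heuristic described above (identifier names follow the series' subjects, e.g. tlb_flushall_shift, and are approximate rather than the exact upstream code):

/* If the range covers more than act_entries >> tlb_flushall_shift
 * pages, a full flush is cheaper than per-page invlpg. */
if ((end - start) >> PAGE_SHIFT > (act_entries >> tlb_flushall_shift)) {
	local_flush_tlb();			/* full flush */
} else {
	for (addr = start; addr < end; addr += PAGE_SIZE)
		__flush_tlb_single(addr);	/* one invlpg per page */
}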
25 May, 2012
1 commit
-
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector
support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial
domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak a PIRQ value instead of
reusing the existing one."
Fix up add-add conflicts in arch/x86/xen/enlighten.c due to the addition of
the apic ipi interface next to the new apic_id functions.
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not map the same GSI twice in PVHVM guests.
hvc_xen: NULL dereference on allocation failure
xen: Add selfballoning memory reservation tunable.
xenbus: Add support for xenbus backend in stub domain
xen/smp: unbind irqworkX when unplugging vCPUs.
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/gnttab: add deferred freeing logic
debugfs: Add support to print u32 array in debugfs
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
23 May, 2012
1 commit
-
Pull x86/apic changes from Ingo Molnar:
"Most of the changes are about helping virtualized guest kernels
achieve better performance."Fix up trivial conflicts with the iommu updates to arch/x86/kernel/apic/io_apic.c
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Implement EIO micro-optimization
x86/apic: Add apic->eoi_write() callback
x86/apic: Use symbolic APIC_EOI_ACK
x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it
x86/xen/apic: Add missing #include
x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ
x86/apic: Fix UP boot crash
x86: Conditionally update time when ack-ing pending irqs
xen/apic: implement io apic read with hypercall
Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
xen/x86: Implement x86_apic_ops
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
08 May, 2012
2 commits
-
* stable/autoballoon.v5.2:
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
-
In xen_memory_setup(), if a page that is being released has a VA
mapping, this must also be updated. Otherwise, the page will not be
released completely -- it will still be referenced in Xen and won't be
freed until the mapping is removed, and this prevents it from being
reallocated at a different PFN.
This was already being done for the ISA memory region in
xen_ident_map_ISA() but on many systems this was omitting a few pages
as many systems marked a few pages below the ISA memory region as
reserved in the e820 map.
This fixes errors such as:
(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
Signed-off-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk
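A sketch of the VA-mapping update described above (the helper name is hypothetical; the real change lives in the xen_memory_setup() release path):

/*
 * Sketch: before handing a page back to Xen, clear any kernel virtual
 * mapping of it so nothing keeps the frame referenced.
 */
static void xen_clear_va_mapping(unsigned long pfn)	/* hypothetical helper */
{
	unsigned long va = (unsigned long)__va(pfn << PAGE_SHIFT);

	/* Make the direct-map PTE not-present for this frame. */
	if (HYPERVISOR_update_va_mapping(va, __pte_ma(0), 0))
		BUG();
}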