11 Sep, 2015
2 commits
-
Merge third patch-bomb from Andrew Morton:
- even more of the rest of MM
- lib/ updates
- checkpatch updates
- small changes to a few scruffy filesystems
- kmod fixes/cleanups
- kexec updates
- a dma-mapping cleanup series from hch
* emailed patches from Andrew Morton: (81 commits)
dma-mapping: consolidate dma_set_mask
dma-mapping: consolidate dma_supported
dma-mapping: cosolidate dma_mapping_error
dma-mapping: consolidate dma_{alloc,free}_noncoherent
dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
mm: make sure all file VMAs have ->vm_ops set
mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
mm: mark most vm_operations_struct const
namei: fix warning while make xmldocs caused by namei.c
ipc: convert invalid scenarios to use WARN_ON
zlib_deflate/deftree: remove bi_reverse()
lib/decompress_unlzma: Do a NULL check for pointer
lib/decompressors: use real out buf size for gunzip with kernel
fs/affs: make root lookup from blkdev logical size
sysctl: fix int -> unsigned long assignments in INT_MIN case
kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
kexec: align crash_notes allocation to make it be inside one physical page
kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
kexec: split kexec_load syscall from kexec core code
...
-
In the scope of the idle memory tracking feature, which is introduced by
the following patch, we need to clear the referenced/accessed bit not only
in primary, but also in secondary ptes. The latter is required in order
to estimate the WSS (working set size) of KVM VMs. At the same time we
want to avoid flushing the TLB, because it is quite expensive and it
won't really affect the final result.

Currently, there is no function for clearing the pte young bit that
would meet our requirements, so this patch introduces one. To achieve
that we have to add a new mmu-notifier callback, clear_young, since
there is no method for testing-and-clearing a secondary pte without
flushing the TLB. The new method is not mandatory and is currently only
implemented by KVM.

Signed-off-by: Vladimir Davydov
Reviewed-by: Andres Lagar-Cavilla
Acked-by: Paolo Bonzini
Cc: Minchan Kim
Cc: Raghavendra K T
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Greg Thelen
Cc: Michel Lespinasse
Cc: David Rientjes
Cc: Pavel Emelyanov
Cc: Cyrill Gorcunov
Cc: Jonathan Corbet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
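The primitive being introduced can be modelled in plain C. This is an illustration only: the bit position, names, and pte representation here are assumptions, not the kernel's actual per-arch pte layout. The point is a test-and-clear of the accessed bit that deliberately skips the TLB flush.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative accessed-bit position; real pte layouts are per-arch. */
#define PTE_YOUNG (UINT64_C(1) << 5)

/* Model of a clear_young-style primitive: report whether the young
 * (accessed) bit was set and clear it in place, without a TLB flush.
 * Skipping the flush only risks a briefly stale young bit, which is
 * acceptable when estimating a working set. */
static bool test_and_clear_young(uint64_t *pte)
{
    bool young = (*pte & PTE_YOUNG) != 0;

    *pte &= ~PTE_YOUNG;
    return young;
}
```

The same shape applies to the secondary (KVM) ptes via the new mmu-notifier callback; only the pte lookup differs.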
08 Sep, 2015
1 commit
-
We were taking the exit path after checking ue->flags and the return
value of setup_routing_entry(), but 'e' was not freed in case of a
failure.

Signed-off-by: Sudip Mukherjee
Signed-off-by: Paolo Bonzini
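The shape of the fix can be sketched with a small userspace model (the names and the allocation-tracking counter are illustrative, not the kernel code): every failure path taken after the allocation must flow through the label that frees it.

```c
#include <stdlib.h>

static int live_allocs;  /* illustrative counter to observe leaks */

static void *track_alloc(size_t n) { live_allocs++; return malloc(n); }
static void track_free(void *p)    { live_allocs--; free(p); }

/* Model of the fixed error path: 'e' is released on both success and
 * failure, instead of an early return that leaks it. */
static int setup_entry(int fail)
{
    int r = -1;
    void *e = track_alloc(32);

    if (!e)
        return -1;
    if (fail)
        goto out;   /* previously an early return here leaked 'e' */
    r = 0;
out:
    track_free(e);
    return r;
}
```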
06 Sep, 2015
3 commits
-
Tracepoint for dynamic halt_poll_ns, fired on every potential change.
Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini -
There is a downside to always-poll: polling still happens for idle
vCPUs, which can waste CPU time. This patchset adds the ability to
adjust halt_poll_ns dynamically, growing halt_poll_ns when a short halt
is detected, and shrinking it when a long halt is detected.

There are two new kernel parameters for changing halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.

                        no-poll   always-poll   dynamic-poll
------------------------------------------------------------
Idle (nohz) vCPU %c0    0.15%     0.3%          0.2%
Idle (250HZ) vCPU %c0   1.1%      4.6%~14%      1.2%
TCP_RR latency          34us      27us          26.7us

"Idle (X) vCPU %c0" is the percent of time the physical cpu spent in
c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the
guest was tickless. (250HZ) means the guest was ticking at 250HZ.

The big win is with ticking operating systems. Running the Linux guest
with nohz=off (and HZ=250), we save 3.4%~12.8% CPUs/second and get
close to no-polling overhead levels by using dynamic-poll. The savings
should be even higher for higher frequency ticks.

Suggested-by: David Matlack
Signed-off-by: Wanpeng Li
[Simplify the patch. - Paolo]
Signed-off-by: Paolo Bonzini -
Change halt_poll_ns into a per-VCPU variable, seeded from the module
parameter, to allow greater flexibility.

Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini
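The grow/shrink policy from this series can be modelled as follows. The parameter defaults, the starting value, the cap, and the treatment of a zero shrink factor are assumptions for illustration, not the exact kernel values.

```c
/* Illustrative module-parameter defaults and bounds. */
static unsigned int halt_poll_ns_grow = 2;
static unsigned int halt_poll_ns_shrink = 0;
#define HALT_POLL_NS_START 10000
#define HALT_POLL_NS_MAX   500000

/* Short halt observed: polling nearly paid off, so poll for longer. */
static unsigned int grow_halt_poll_ns(unsigned int val)
{
    if (val == 0)
        val = HALT_POLL_NS_START;
    else
        val *= halt_poll_ns_grow;
    if (val > HALT_POLL_NS_MAX)
        val = HALT_POLL_NS_MAX;
    return val;
}

/* Long halt observed: polling was wasted CPU, so back off. In this
 * model a shrink factor of 0 means "stop polling at once". */
static unsigned int shrink_halt_poll_ns(unsigned int val)
{
    if (halt_poll_ns_shrink == 0)
        return 0;
    return val / halt_poll_ns_shrink;
}
```

With these defaults a vCPU that keeps halting briefly ramps up its polling window geometrically, while one long halt drops it straight back to zero.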
23 Aug, 2015
1 commit
-
Patch queue for ppc - 2015-08-22
Highlights for KVM PPC this time around:
- Book3S: A few bug fixes
- Book3S: Allow micro-threading on POWER8
12 Aug, 2015
7 commits
-
In order to remove the crude hack where we sneak the masked bit
into the timer's control register, make use of the phys_irq_map
API to control the active state of the interrupt.

This causes some limited changes to allow for potential error
propagation.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
Virtual interrupts mapped to a HW interrupt should only be triggered
from inside the kernel. Otherwise, you could end up confusing the
kernel's (and the GIC's) state machine.

Rearrange the injection path so that kvm_vgic_inject_irq is
used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is
used for mapped interrupts. The latter should only be called from
inside the kernel (timer, irqfd).

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
In order to control the active state of an interrupt, introduce
a pair of accessors allowing the state to be set/queried.

This only affects the logical state, and the HW state will only be
applied at world-switch time.

Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
To allow a HW interrupt to be injected into a guest, we lookup the
guest virtual interrupt in the irq_phys_map list, and if we have
a match, encode both interrupts in the LR.

We also mark the interrupt as "active" at the host distributor level.
On guest EOI on the virtual interrupt, the host interrupt will be
deactivated.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
In order to be able to feed physical interrupts to a guest, we need
to be able to establish the virtual-physical mapping between the two
worlds.

The mappings are kept in a set of RCU lists, indexed by virtual
interrupts.
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
We only set the irq_queued flag for level interrupts, meaning
that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
for all interrupts.

This will allow us to inject edge HW interrupts, for which the
state ACTIVE+PENDING is not allowed.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier -
Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
field, we can encode that information into the list registers.

This patch provides implementations for both GICv2 and GICv3.
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
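As a sketch of what "encoding that information into the list registers" looks like on the GICv2 side: per the GICv2 architecture spec, a list register carries the virtual id in bits [9:0], the physical id in bits [19:10], and an HW flag in bit 31. The helper name below is illustrative, not the kernel's.

```c
#include <stdint.h>

/* GICv2 GICH_LR fields (per the GICv2 architecture spec). */
#define GICH_LR_ID_MASK      0x3ffu
#define GICH_LR_PHYSID_SHIFT 10
#define GICH_LR_HW           (1u << 31)

/* Build a list register entry for a virtual interrupt backed by a
 * hardware interrupt: with the HW bit set, the guest's EOI of the
 * virtual interrupt deactivates the physical one. */
static uint32_t make_hw_lr(uint32_t virt_irq, uint32_t phys_irq)
{
    return GICH_LR_HW
         | ((phys_irq & GICH_LR_ID_MASK) << GICH_LR_PHYSID_SHIFT)
         | (virt_irq & GICH_LR_ID_MASK);
}
```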
30 Jul, 2015
1 commit
-
Signed-off-by: Paolo Bonzini
29 Jul, 2015
1 commit
-
This is another remnant of ia64 support.
Signed-off-by: Paolo Bonzini
10 Jul, 2015
1 commit
-
If there are no assigned devices, the guest PAT is not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.

Hook into VFIO and legacy device assignment so that they
provide this information to KVM.

Reviewed-by: Alex Williamson
Tested-by: Joerg Roedel
Signed-off-by: Paolo Bonzini
04 Jul, 2015
1 commit
-
Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems. First, with the addition of the static_key the
preempt-notifier API needs to sleep; we do however need to hold off
preemption while modifying the preempt notifier list, otherwise a
preemption could observe an inconsistent list state. KVM correctly
registers and unregisters preempt notifiers with preemption disabled,
so the sleep caused dmesg splats.

Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put). With a single uniprocessor guest the static
key would move between 0 and 1 continuously, hitting the slow path on
every userspace exit.

To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.

Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs
Reported-by: Takashi Iwai
Tested-by: Takashi Iwai
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Paolo Bonzini
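The new API can be modelled in userspace, reducing the static key to a plain flag. The function names mirror the description above, but the bodies and the toggle counter are illustrative: the idea is that the expensive key toggle happens only on the 0 <-> 1 transitions of a reference count.

```c
static int key_enabled;     /* stands in for the static_key */
static int notifier_refs;   /* reference count across users */
static int slow_path_hits;  /* counts key toggles, for illustration */

/* Frequent register/unregister pairs no longer hit the slow path
 * each time; only the first user and the last user toggle the key. */
static void preempt_notifier_inc(void)
{
    if (notifier_refs++ == 0) {
        key_enabled = 1;
        slow_path_hits++;
    }
}

static void preempt_notifier_dec(void)
{
    if (--notifier_refs == 0) {
        key_enabled = 0;
        slow_path_hits++;
    }
}
```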
25 Jun, 2015
1 commit
-
Pull arm64 updates from Catalin Marinas:
"Mostly refactoring/clean-up:

- CPU ops and PSCI (Power State Coordination Interface) refactoring
  following the merging of the arm64 ACPI support, together with
  handling of Trusted (secure) OS instances

- Using fixmap for permanent FDT mapping, removing the initial dtb
  placement requirements (within 512MB from the start of the kernel
  image). This required moving the FDT self reservation out of the
  memreserve processing

- Idmap (1:1 mapping used for MMU on/off) handling clean-up

- Removing flush_cache_all() - not safe on ARM unless the MMU is off.
  Last stages of CPU power down/up are handled by firmware already

- "Alternatives" (run-time code patching) refactoring and support for
  immediate branch patching, GICv3 CPU interface access

- User faults handling clean-up

And some fixes:

- Fix for VDSO building with broken ELF toolchains
- Fix another case of init_mm.pgd usage for user mappings (during
  ASID roll-over broadcasting)
- Fix for FPSIMD reloading after CPU hotplug
- Fix for missing syscall trace exit
- Workaround for .inst asm bug
- Compat fix for switching the user tls tpidr_el0 register"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
arm64: use private ratelimit state along with show_unhandled_signals
arm64: show unhandled SP/PC alignment faults
arm64: vdso: work-around broken ELF toolchains in Makefile
arm64: kernel: rename __cpu_suspend to keep it aligned with arm
arm64: compat: print compat_sp instead of sp
arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
arm64: entry: fix context tracking for el0_sp_pc
arm64: defconfig: enable memtest
arm64: mm: remove reference to tlb.S from comment block
arm64: Do not attempt to use init_mm in reset_context()
arm64: KVM: Switch vgic save/restore to alternative_insn
arm64: alternative: Introduce feature for GICv3 CPU interface
arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
arm64: fix bug for reloading FPSIMD state after CPU hotplug.
arm64: kernel thread don't need to save fpsimd context.
arm64: fix missing syscall trace exit
arm64: alternative: Work around .inst assembler bugs
arm64: alternative: Merge alternative-asm.h into alternative.h
arm64: alternative: Allow immediate branch as alternative instruction
arm64: Rework alternate sequence for ARM erratum 845719
...
19 Jun, 2015
4 commits
-
Tabs rather than spaces
Signed-off-by: Kevin Mulvey
Signed-off-by: Paolo Bonzini -
fix brace spacing
Signed-off-by: Kevin Mulvey
Signed-off-by: Paolo Bonzini -
The allocation size of the kvm_irq_routing_table depends on
the number of irq routing entries because they are all
allocated with one kzalloc call.

When the irq routing table gets bigger this requires high order
allocations which fail from time to time:

qemu-kvm: page allocation failure: order:4, mode:0xd0

This patch fixes this issue by breaking up the allocation of the
table and its entries into individual kzalloc calls. These could all
be satisfied with order-0 allocations, which are less likely to fail.

The downside of this change is the lower performance, because of more
calls to kzalloc. But given how rarely kvm_set_irq_routing is called
in the lifetime of a guest, it doesn't really matter much.

Signed-off-by: Joerg Roedel
[Avoid sparse warning through rcu_access_pointer. - Paolo]
Signed-off-by: Paolo Bonzini -
KVM/ARM changes for v4.2:
- Proper guest time accounting
- FP access fix for 32bit
- The usual pile of GIC fixes
- PSCI fixes
- Random cleanups
18 Jun, 2015
1 commit
-
Back in the days, vgic.c used to have an intimate knowledge of
the actual GICv2. These days, this has been abstracted away into
hardware-specific backends.

Remove the now useless arm-gic.h #include directive, making it
clear that GICv2 specific code doesn't belong here.

Signed-off-by: Marc Zyngier
17 Jun, 2015
2 commits
-
Commit fd1d0ddf2ae9 (KVM: arm/arm64: check IRQ number on userland
injection) rightly limited the range of interrupts userspace can
inject in a guest, but failed to consider the (unlikely) case where
a guest is configured with 1024 interrupts.

In this case, interrupts ranging from 1020 to 1023 are unusable,
as they have a special meaning for the GIC CPU interface.

Make sure that these numbers cannot be used as an IRQ. Also delete
a redundant (and similarly buggy) check in kvm_set_irq.

Reported-by: Peter Maydell
Cc: Andre Przywara
Cc: # 4.1, 4.0, 3.19, 3.18
Signed-off-by: Marc Zyngier -
If a GICv3-enabled guest tries to configure Group0, we print a
warning on the console (because we don't support Group0 interrupts).

This is fairly pointless, and would allow a guest to spam the
console. Let's just drop the warning.

Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier
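The range check from the userland-injection fix above can be sketched as follows. The reserved range 1020-1023 comes from the GIC architecture (e.g. 1023 is the spurious interrupt ID); the helper name and signature are illustrative.

```c
#include <stdbool.h>

/* IDs 1020-1023 have special meanings on the GIC CPU interface, so
 * they must never be accepted as injectable IRQ numbers, even when
 * the guest is configured with the maximum of 1024 interrupts. */
#define GIC_SPECIAL_ID_BASE 1020

static bool vgic_irq_num_valid(unsigned int irq, unsigned int nr_irqs)
{
    return irq < nr_irqs && irq < GIC_SPECIAL_ID_BASE;
}
```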
12 Jun, 2015
1 commit
-
So far, we configured the world-switch by having a small array
of pointers to the save and restore functions, depending on the
GIC used on the platform.

Loading these values each time is a bit silly (they never change),
and it makes sense to rely on instruction patching instead.

This leads to a nice cleanup of the code.
Acked-by: Will Deacon
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier
Signed-off-by: Catalin Marinas
10 Jun, 2015
1 commit
-
Commit 47a98b15ba7c ("arm/arm64: KVM: support for un-queuing active
IRQs") introduced handling of the GICD_I[SC]ACTIVER registers,
but only for the GICv2 emulation. For the sake of completeness and
as this is a pre-requisite for save/restore of the GICv3 distributor
state, we should also emulate their handling in the distributor and
redistributor frames of an emulated GICv3.

Acked-by: Christoffer Dall
Signed-off-by: Andre Przywara
Signed-off-by: Marc Zyngier
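GICD_ISACTIVER/GICD_ICACTIVER behave like the GIC's other write-1-to-set / write-1-to-clear register pairs. A minimal model of one 32-interrupt bank, with the state kept in a plain bitmap for illustration:

```c
#include <stdint.h>

static uint32_t active_state;  /* one bit per interrupt in the bank */

/* GICD_ISACTIVER: writing 1 to a bit marks that interrupt active;
 * 0 bits are ignored. */
static void write_isactiver(uint32_t val) { active_state |= val; }

/* GICD_ICACTIVER: writing 1 to a bit clears the active state;
 * 0 bits are ignored. */
static void write_icactiver(uint32_t val) { active_state &= ~val; }

/* Reads of either register return the current active bitmap. */
static uint32_t read_activer(void) { return active_state; }
```

This read/write symmetry is what makes the pair usable for save/restore of the distributor state.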
05 Jun, 2015
2 commits
-
Only two ioctls have to be modified; the address space id is
placed in the higher 16 bits of their slot id argument.

As of this patch, no architecture defines more than one
address space; x86 will be the first.

Reviewed-by: Radim Krčmář
Signed-off-by: Paolo Bonzini -
We need to hide SMRAM from guests not running in SMM. Therefore, all
uses of kvm_read_guest* and kvm_write_guest* must be changed to use
different address spaces, depending on whether the VCPU is in system
management mode. We need to introduce a new family of functions for
this purpose.

For now, the VCPU-based functions have the same behavior as the
existing per-VM ones; they just accept a different type for the
first argument. Later, however, they will be changed to use one of
many "struct kvm_memslots" stored in struct kvm, through an
architecture hook. VM-based functions will unconditionally use the
first memslots pointer.

Whenever possible, this patch introduces slot-based functions with an
__ prefix, with two wrappers for generic and vcpu-based actions.
The exceptions are kvm_read_guest and kvm_write_guest, which are
copied into the new functions kvm_vcpu_read_guest and
kvm_vcpu_write_guest.

Reviewed-by: Radim Krčmář
Signed-off-by: Paolo Bonzini
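A sketch of the slot-id packing described two entries up, where the ioctl change places the address space id in the upper 16 bits of the u32 slot argument. The helper names are illustrative, not KVM's:

```c
#include <stdint.h>

/* The u32 slot argument carries both the address space id (upper
 * 16 bits) and the slot id proper (lower 16 bits). */
static inline uint32_t pack_slot(uint16_t as_id, uint16_t id)
{
    return ((uint32_t)as_id << 16) | id;
}

static inline uint16_t slot_as_id(uint32_t slot) { return slot >> 16; }
static inline uint16_t slot_id(uint32_t slot)    { return slot & 0xffff; }
```

Because address space 0 packs to the unchanged slot number, existing userspace that never sets the upper bits keeps working.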
28 May, 2015
4 commits
-
Signed-off-by: Paolo Bonzini
-
Most of the functions that wrap it can be rewritten without it, except
for gfn_to_pfn_prot. Just inline it into gfn_to_pfn_prot, and rewrite
the other functions on top of gfn_to_pfn_memslot*.

Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini -
The memory slot is already available from gfn_to_memslot_dirty_bitmap.
Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic
agnostic of multiple VCPU address spaces.

Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini -
This lets the function access the new memory slot without going through
kvm_memslots and id_to_memslot. It will simplify the code when more
than one address space will be supported.

Unfortunately, the "const"ness of the new argument must be cast
away in two places. Fixing KVM to accept const struct kvm_memory_slot
pointers would require modifications in pretty much all architectures,
and is left for later.

Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini
26 May, 2015
4 commits
-
Prepare for the case of multiple address spaces.
Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini -
Architecture-specific helpers are not supposed to muck with
struct kvm_userspace_memory_region contents. Add const to
enforce this.

In order to eliminate the only write in __kvm_set_memory_region,
the cleaning of deleted slots is pulled up from update_memslots
to __kvm_set_memory_region.

Reviewed-by: Takuya Yoshikawa
Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini -
kvm_memslots provides lockdep checking. Use it consistently instead of
explicit dereferencing of kvm->memslots.

Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini -
kvm_alloc_memslots is extracted out of previously scattered code
that was in kvm_init_memslots_id and kvm_create_vm.

kvm_free_memslot and kvm_free_memslots are the new names of
kvm_free_physmem and kvm_free_physmem_slot, but they also take
an explicit pointer to struct kvm_memslots.

This will simplify the transition to multiple address spaces,
each represented by one pointer to struct kvm_memslots.

Reviewed-by: Takuya Yoshikawa
Reviewed-by: Radim Krcmar
Signed-off-by: Paolo Bonzini
20 May, 2015
1 commit
-
gfn_to_pfn_async is used in just one place, and because of x86-specific
treatment that place will need to look at the memory slot. Hence inline
it into try_async_pf and export __gfn_to_pfn_memslot.

The patch also switches the subsequent call to gfn_to_pfn_prot to use
__gfn_to_pfn_memslot. This is a small optimization. Finally, remove
the now-unused async argument of __gfn_to_pfn.

Signed-off-by: Paolo Bonzini
08 May, 2015
1 commit
-
On cpu hotplug only KVM emits an unconditional message that its
notifier has been called. It can certainly be assumed that calling cpu
hotplug notifiers works, so there is no added value in KVM printing a
message.

If an error happens on cpu online, KVM will still emit a warning.
So let's remove this superfluous message.
Signed-off-by: Heiko Carstens
Signed-off-by: Paolo Bonzini