Eric Lee / smarc-fsl-linux-kernel

22 Mar, 2016

3 commits

4ae3cb3a2 KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs() ... Browse Code »

smp_load_acquire() is enough here and it's cheaper than smp_mb().
Adding a comment about reusing memory barrier of kvm_make_all_cpus_request()
here to keep order between modifications to the page tables and reading mode.

Signed-off-by: Lan Tianyu
Signed-off-by: Paolo Bonzini

Lan Tianyu
2016-03-22 23:38:33 +0800
a30a05091 KVM: Replace smp_mb() with smp_mb_after_atomic() in the kvm_make_all_cpus_request() ... Browse Code »

Signed-off-by: Lan Tianyu
Signed-off-by: Paolo Bonzini

Lan Tianyu
2016-03-22 23:38:30 +0800
e9ad4ec83 KVM: fix spin_lock_init order on x86 ... Browse Code »

Moving the initialization earlier is needed in 4.6 because
kvm_arch_init_vm is now using mmu_lock, causing lockdep to
complain:

[ 284.440294] INFO: trying to register non-static key.
[ 284.445259] the code is fine but needs lockdep annotation.
[ 284.450736] turning off the locking correctness validator.
...
[ 284.528318] [] lock_acquire+0xd3/0x240
[ 284.533733] [] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.541467] [] _raw_spin_lock+0x41/0x80
[ 284.546960] [] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.554707] [] kvm_page_track_register_notifier+0x20/0x60 [kvm]
[ 284.562281] [] kvm_mmu_init_vm+0x20/0x30 [kvm]
[ 284.568381] [] kvm_arch_init_vm+0x1ea/0x200 [kvm]
[ 284.574740] [] kvm_dev_ioctl+0xbf/0x4d0 [kvm]

However, it also helps fixing a preexisting problem, which is why this
patch is also good for stable kernels: kvm_create_vm was incrementing
current->mm->mm_count but not decrementing it at the out_err label (in
case kvm_init_mmu_notifier failed). The new initialization order makes
it possible to add the required mmdrop without adding a new error label.

Cc: stable@vger.kernel.org
Reported-by: Borislav Petkov
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2016-03-22 19:02:51 +0800

21 Mar, 2016

1 commit

643ad15d4 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

There's a background article at LWN.net:

https://lwn.net/Articles/643797/

The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.

This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).

This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:

mmap(..., PROT_EXEC);

or

mprotect(ptr, sz, PROT_EXEC);

(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.

So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.

We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.

There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.

Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...

Linus Torvalds
2016-03-21 10:08:56 +0800

17 Mar, 2016

1 commit

10dc37476 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.

ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.

PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).

s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.

x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...

Linus Torvalds
2016-03-17 00:55:35 +0800

15 Mar, 2016

1 commit

d4e796152 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:

- Make schedstats a runtime tunable (disabled by default) and
optimize it via static keys.

As most distributions enable CONFIG_SCHEDSTATS=y due to its
instrumentation value, this is a nice performance enhancement.
(Mel Gorman)

- Implement 'simple waitqueues' (swait): these are just pure
waitqueues without any of the more complex features of full-blown
waitqueues (callbacks, wake flags, wake keys, etc.). Simple
waitqueues have less memory overhead and are faster.

Use simple waitqueues in the RCU code (in 4 different places) and
for handling KVM vCPU wakeups.

(Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
Marcelo Tosatti)

- sched/numa enhancements (Rik van Riel)

- NOHZ performance enhancements (Rik van Riel)

- Various sched/deadline enhancements (Steven Rostedt)

- Various fixes (Peter Zijlstra)

- ... and a number of other fixes, cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
sched/cputime: Fix steal_account_process_tick() to always return jiffies
sched/deadline: Remove dl_new from struct sched_dl_entity
Revert "kbuild: Add option to turn incompatible pointer check into error"
sched/deadline: Remove superfluous call to switched_to_dl()
sched/debug: Fix preempt_disable_ip recording for preempt_disable()
sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
time, acct: Drop irq save & restore from __acct_update_integrals()
acct, time: Change indentation in __acct_update_integrals()
sched, time: Remove non-power-of-two divides from __acct_update_integrals()
sched/rt: Kick RT bandwidth timer immediately on start up
sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
sched/debug: Move sched_domain_sysctl to debug.c
sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
sched/rt: Fix PI handling vs. sched_setscheduler()
sched/core: Remove duplicated sched_group_set_shares() prototype
sched/fair: Consolidate nohz CPU load update code
sched/fair: Avoid using decay_load_missed() with a negative value
sched/deadline: Always calculate end of period on sched_yield()
sched/cgroup: Fix cgroup entity load tracking tear-down
rcu: Use simple wait queues where possible in rcutree
...

Linus Torvalds
2016-03-15 10:14:06 +0800

09 Mar, 2016

11 commits

313f636d5 kvm: cap halt polling at exactly halt_poll_ns ... Browse Code »

When growing halt-polling, there is no check that the poll time exceeds
the limit. It's possible for vcpu->halt_poll_ns grow once past
halt_poll_ns, and stay there until a halt which takes longer than
vcpu->halt_poll_ns. For example, booting a Linux guest with
halt_poll_ns=11000:

... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)

Signed-off-by: David Matlack
Fixes: aca6ff29c4063a8d467cdee241e6b3bf7dc4a171
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini

David Matlack
2016-03-09 18:54:14 +0800
ab92f3087 Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD ... Browse Code »

KVM/ARM updates for 4.6

- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code

Conflicts:
include/uapi/linux/kvm.h

Paolo Bonzini
2016-03-09 18:50:42 +0800
0d98d00b8 arm64: KVM: vgic-v3: Reset LRs at boot time ... Browse Code »

In order to let the GICv3 code be more lazy in the way it
accesses the LRs, it is necessary to start with a clean slate.

Let's reset the LRs on each CPU when the vgic is probed (which
includes a round trip to EL2...).

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:24:09 +0800
1b8e83c04 arm64: KVM: vgic-v3: Avoid accessing ICH registers ... Browse Code »

Just like on GICv2, we're a bit hammer-happy with GICv3, and access
them more often than we should.

Adopt a policy similar to what we do for GICv2, only save/restoring
the minimal set of registers. As we don't access the registers
linearly anymore (we may skip some), the convoluted accessors become
slightly simpler, and we can drop the ugly indexing macro that
tended to confuse the reviewers.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:24:04 +0800
667a87a92 KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit ... Browse Code »

The GICD_SGIR register lives a long way from the beginning of
the handler array, which is searched linearly. As this is hit
pretty often, let's move it up. This saves us some precious
cycles when the guest is generating IPIs.

Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:24:03 +0800
cc1daf0b8 KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit ... Browse Code »

So far, we're always writing all possible LRs, setting the empty
ones with a zero value. This is obvious doing a lot of work for
nothing, and we're better off clearing those we've actually
dirtied on the exit path (it is very rare to inject more than one
interrupt at a time anyway).

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:23:56 +0800
d6400d774 KVM: arm/arm64: vgic-v2: Reset LRs at boot time ... Browse Code »

In order to let make the GICv2 code more lazy in the way it
accesses the LRs, it is necessary to start with a clean slate.

Let's reset the LRs on each CPU when the vgic is probed.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:23:00 +0800
f8cfbce1b KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty ... Browse Code »

On exit, any empty LR will be signaled in GICH_ELRSR*. Which
means that we do not have to save it, and we can just clear
its state in the in-memory copy.

Take this opportunity to move the LR saving code into its
own function.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:22:24 +0800
2a1044f8b KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function ... Browse Code »

In order to make the saving path slightly more readable and
prepare for some more optimizations, let's move the GICH_ELRSR
saving to its own function.

No functional change.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:22:23 +0800
c813bb17f KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required ... Browse Code »

Next on our list of useless accesses is the maintenance interrupt
status registers (GICH_MISR, GICH_EISR{0,1}).

It is pointless to save them if we haven't asked for a maintenance
interrupt the first place, which can only happen for two reasons:
- Underflow: GICH_HCR_UIE will be set,
- EOI: GICH_LR_EOI will be set.

These conditions can be checked on the in-memory copies of the regs.
Should any of these two condition be valid, we must read GICH_MISR.
We can then check for GICH_MISR_EOI, and only when set read
GICH_EISR*.

This means that in most case, we don't have to save them at all.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:22:21 +0800
59f00ff9a KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers ... Browse Code »

GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
But we're equaly bad, as we make a point in accessing them even if
we don't have any interrupt in flight.

A good solution is to first find out if we have anything useful to
write into the GIC, and if we don't, to simply not do it. This
involves tracking which LRs actually have something valid there.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-09 12:22:20 +0800

04 Mar, 2016

1 commit

b2740d353 KVM: ensure __gfn_to_pfn_memslot initializes *writable ... Browse Code »

For the kvm_is_error_hva, ubsan complains if the uninitialized writable
is passed to __direct_map, even though the value itself is not used
(__direct_map goes to mmu_set_spte->set_spte->set_mmio_spte but never
looks at that argument).

Ensuring that __gfn_to_pfn_memslot initializes *writable is cheap and
avoids this kind of issue.

Signed-off-by: Paolo Bonzini

Paolo Bonzini
2016-03-04 19:35:20 +0800

01 Mar, 2016

13 commits

9b4a30044 KVM: arm/arm64: timer: Add active state caching ... Browse Code »

Programming the active state in the (re)distributor can be an
expensive operation so it makes some sense to try and reduce
the number of accesses as much as possible. So far, we
program the active state on each VM entry, but there is some
opportunity to do less.

An obvious solution is to cache the active state in memory,
and only program it in the HW when conditions change. But
because the HW can also change things under our feet (the active
state can transition from 1 to 0 when the guest does an EOI),
some precautions have to be taken, which amount to only caching
an "inactive" state, and always programing it otherwise.

With this in place, we observe a reduction of around 700 cycles
on a 2GHz GICv2 platform for a NULL hypercall.

Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-01 02:34:22 +0800
bb0c70bcc arm64: KVM: Add a new vcpu device control group for PMUv3 ... Browse Code »

To configure the virtual PMUv3 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.

After configuring the PMUv3, call the vcpu ioctl with attribute
KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.

Signed-off-by: Shannon Zhao
Acked-by: Peter Maydell
Reviewed-by: Andrew Jones
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
808e73814 arm64: KVM: Add a new feature bit for PMUv3 ... Browse Code »

To support guest PMUv3, use one bit of the VCPU INIT feature array.
Initialize the PMU when initialzing the vcpu with that bit and PMU
overflow interrupt set.

Signed-off-by: Shannon Zhao
Acked-by: Peter Maydell
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
5f0a714a2 arm64: KVM: Free perf event of PMU when destroying vcpu ... Browse Code »

When KVM frees VCPU, it needs to free the perf_event of PMU.

Signed-off-by: Shannon Zhao
Reviewed-by: Marc Zyngier
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
2aa36e984 arm64: KVM: Reset PMU state when resetting vcpu ... Browse Code »

When resetting vcpu, it needs to reset the PMU state to initial status.

Signed-off-by: Shannon Zhao
Reviewed-by: Marc Zyngier
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
b02386eb7 arm64: KVM: Add PMU overflow interrupt routing ... Browse Code »

When calling perf_event_create_kernel_counter to create perf_event,
assign a overflow handler. Then when the perf event overflows, set the
corresponding bit of guest PMOVSSET register. If this counter is enabled
and its interrupt is enabled as well, kick the vcpu to sync the
interrupt.

On VM entry, if there is counter overflowed and interrupt level is
changed, inject the interrupt with corresponding level. On VM exit, sync
the interrupt level as well if it has been changed.

Signed-off-by: Shannon Zhao
Reviewed-by: Marc Zyngier
Reviewed-by: Andrew Jones
Reviewed-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
76993739c arm64: KVM: Add helper to handle PMCR register bits ... Browse Code »

According to ARMv8 spec, when writing 1 to PMCR.E, all counters are
enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are
disabled. When writing 1 to PMCR.P, reset all event counters, not
including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to
zero.

Signed-off-by: Shannon Zhao
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:21 +0800
7a0adc706 arm64: KVM: Add access handler for PMSWINC register ... Browse Code »

Add access handler which emulates writing and reading PMSWINC
register and add support for creating software increment event.

Signed-off-by: Shannon Zhao
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:20 +0800
76d883c4e arm64: KVM: Add access handler for PMOVSSET and PMOVSCLR register ... Browse Code »

Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMOVSSET or PMOVSCLR register.

When writing non-zero value to PMOVSSET, the counter and its interrupt
is enabled, kick this vcpu to sync PMU interrupt.

Signed-off-by: Shannon Zhao
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:20 +0800
7f7663587 arm64: KVM: PMU: Add perf event map and introduce perf event creating function ... Browse Code »

When we use tools like perf on host, perf passes the event type and the
id of this event type category to kernel, then kernel will map them to
hardware event number and write this number to PMU PMEVTYPER_EL0
register. When getting the event number in KVM, directly use raw event
type to create a perf_event for it.

Signed-off-by: Shannon Zhao
Reviewed-by: Marc Zyngier
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:20 +0800
96b0eebcc arm64: KVM: Add access handler for PMCNTENSET and PMCNTENCLR register ... Browse Code »

Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMCNTENSET or PMCNTENCLR register.

When writing to PMCNTENSET, call perf_event_enable to enable the perf
event. When writing to PMCNTENCLR, call perf_event_disable to disable
the perf event.

Signed-off-by: Shannon Zhao
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:20 +0800
051ff581c arm64: KVM: Add access handler for event counter register ... Browse Code »

These kind of registers include PMEVCNTRn, PMCCNTR and PMXEVCNTR which
is mapped to PMEVCNTRn.

The access handler translates all aarch32 register offsets to aarch64
ones and uses vcpu_sys_reg() to access their values to avoid taking care
of big endian.

When reading these registers, return the sum of register value and the
value perf event counts.

Signed-off-by: Shannon Zhao
Reviewed-by: Andrew Jones
Signed-off-by: Marc Zyngier

Shannon Zhao
2016-03-01 02:34:20 +0800
6d50d54cd arm64: KVM: Move vgic-v2 and timer save/restore to virt/kvm/arm/hyp ... Browse Code »

We already have virt/kvm/arm/ containing timer and vgic stuff.
Add yet another subdirectory to contain the hyp-specific files
(timer and vgic again).

Acked-by: Christoffer Dall
Signed-off-by: Marc Zyngier

Marc Zyngier
2016-03-01 02:34:18 +0800

29 Feb, 2016

1 commit

6aa447bcb Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes ... Browse Code »

Signed-off-by: Ingo Molnar

Ingo Molnar
2016-02-29 16:42:07 +0800

25 Feb, 2016

2 commits

8577370fb KVM: Use simple waitqueue for vcpu->wq ... Browse Code »

The problem:

On -rt, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04:

./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:

"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed."

Before:

min max mean std
count 1000.000000 1000.000000 1000.000000 1000.000000
mean 5162.596000 2019270.084000 5824.491541 20681.645558
std 75.431231 622607.723969 89.575700 6492.272062
min 4466.000000 23928.000000 5537.926500 585.864966
25% 5163.000000 1613252.750000 5790.132275 16683.745433
50% 5175.000000 2281919.000000 5834.654000 23151.990026
75% 5190.000000 2382865.750000 5861.412950 24148.206168
max 5228.000000 4175158.000000 6254.827300 46481.048691

After
min max mean std
count 1000.000000 1000.00000 1000.000000 1000.000000
mean 5143.511000 2076886.10300 5813.312474 21207.357565
std 77.668322 610413.09583 86.541500 6331.915127
min 4427.000000 25103.00000 5529.756600 559.187707
25% 5148.000000 1691272.75000 5784.889825 17473.518244
50% 5160.000000 2308328.50000 5832.025000 23464.837068
75% 5172.000000 2393037.75000 5853.177675 24223.969976
max 5222.000000 3922458.00000 6186.720500 42520.379830

[Patch was originaly based on the swait implementation found in the -rt
tree. Daniel ported it to mainline's version and gathered the
benchmark numbers for tscdeadline_latency test.]

Signed-off-by: Daniel Wagner
Acked-by: Peter Zijlstra (Intel)
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng
Cc: Marcelo Tosatti
Cc: Steven Rostedt
Cc: Paul Gortmaker
Cc: Paolo Bonzini
Cc: "Paul E. McKenney"
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner

Marcelo Tosatti
2016-02-25 18:27:16 +0800
0fb00d326 Merge tag 'kvm-arm-for-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git… ... Browse Code »

…/kvmarm/kvmarm into kvm-master

KVM/ARM fixes for 4.5-rc6

- Fix per-vcpu vgic bitmap allocation
- Do not give copy random memory on MMIO read
- Fix GICv3 APR register restore order

Paolo Bonzini
2016-02-25 16:53:55 +0800

24 Feb, 2016

2 commits

d7444794a KVM: async_pf: do not warn on page allocation failures ... Browse Code »

In async_pf we try to allocate with NOWAIT to get an element quickly
or fail. This code also handle failures gracefully. Lets silence
potential page allocation failures under load.

qemu-system-s39: page allocation failure: order:0,mode:0x2200000
[...]
Call Trace:
([] show_trace+0xf8/0x148)
[] show_stack+0x62/0xe8
[] dump_stack+0x70/0x98
[] warn_alloc_failed+0xd2/0x148
[] __alloc_pages_nodemask+0x94e/0xb38
[] new_slab+0x382/0x400
[] ___slab_alloc.constprop.30+0x2dc/0x378
[] kmem_cache_alloc+0x160/0x1d0
[] kvm_setup_async_pf+0x6c/0x198
[] kvm_arch_vcpu_ioctl_run+0xd48/0xd58
[] kvm_vcpu_ioctl+0x372/0x690
[] do_vfs_ioctl+0x3be/0x510
[] SyS_ioctl+0xa4/0xb8
[] system_call+0xd6/0x264
[] 0x3ffa24fa06a

Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger
Reviewed-by: Dominik Dingel
Signed-off-by: Paolo Bonzini

Christian Borntraeger
2016-02-24 21:47:46 +0800
236cf17c2 KVM: arm/arm64: vgic: Ensure bitmaps are long enough ... Browse Code »

When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
bits we need by 8 to figure out how many bytes to allocate. However,
bitmap elements are always accessed as unsigned longs, and if we didn't
happen to allocate a size such that size % sizeof(unsigned long) == 0,
bitmap accesses may go past the end of the allocation.

When using KASAN (which does byte-granular access checks), this results
in a continuous stream of BUGs whenever these bitmaps are accessed:

=============================================================================
BUG kmalloc-128 (Tainted: G B ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
INFO: Object 0xffffffc935769500 @offset=1280 fp=0x (null)

Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G B 4.4.0+ #17
Hardware name: ARM Juno development board (r1) (DT)
Call trace:
[] dump_backtrace+0x0/0x280
[] show_stack+0x14/0x20
[] dump_stack+0x100/0x188
[] print_trailer+0xfc/0x168
[] object_err+0x3c/0x50
[] kasan_report_error+0x244/0x558
[] __asan_report_load8_noabort+0x48/0x50
[] __bitmap_or+0xc0/0xc8
[] kvm_vgic_flush_hwstate+0x1bc/0x650
[] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
[] kvm_vcpu_ioctl+0x474/0xa68
[] do_vfs_ioctl+0x5b8/0xcb0
[] SyS_ioctl+0x8c/0xa0
[] el0_svc_naked+0x24/0x28
Memory state around the buggy address:
ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fix the issue by always allocating a multiple of sizeof(unsigned long),
as we do elsewhere in the vgic code.

Fixes: c1bfb577a ("arm/arm64: KVM: vgic: switch to dynamic allocation")
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier
Acked-by: Christoffer Dall
Signed-off-by: Mark Rutland
Signed-off-by: Marc Zyngier

Mark Rutland
2016-02-24 03:02:48 +0800

23 Feb, 2016

2 commits

433da8602 KVM: async_pf: use list_first_entry ... Browse Code »

To make the intention clearer, use list_first_entry instead of
list_entry.

Signed-off-by: Geliang Tang
Signed-off-by: Paolo Bonzini

Geliang Tang
2016-02-23 22:40:55 +0800
e6e3b5a64 KVM: use list_for_each_entry_safe ... Browse Code »

Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.

Signed-off-by: Geliang Tang
Signed-off-by: Paolo Bonzini

Geliang Tang
2016-02-23 22:40:53 +0800

17 Feb, 2016

1 commit

6b6de68c6 KVM: halt_polling: improve grow/shrink settings ... Browse Code »

Right now halt_poll_ns can be change during runtime. The
grow and shrink factors can only be set during module load.
Lets fix several aspects of grow shrink:
- make grow/shrink changeable by root
- make all variables unsigned int
- read the variables once to prevent races

Signed-off-by: Christian Borntraeger
Signed-off-by: Paolo Bonzini

Christian Borntraeger
2016-02-17 01:48:29 +0800

16 Feb, 2016

1 commit

d4edcf0d5 mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm ... Browse Code »

We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called. For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

get_user_pages()
get_user_pages_unlocked()
get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.

Signed-off-by: Dave Hansen
Reviewed-by: Thomas Gleixner
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Kirill A. Shutemov
Cc: Linus Torvalds
Cc: Naoya Horiguchi
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Srikar Dronamraju
Cc: Vlastimil Babka
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
Signed-off-by: Ingo Molnar

Dave Hansen
2016-02-16 17:11:12 +0800