27 Oct, 2018

1 commit

  • Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with
    blockable invalidate callbacks").

    MMU_INVALIDATE_DOES_NOT_BLOCK was the only flag in use and it is no
    longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for
    mmu notifiers"). We now have full support for per-range !blockable
    behavior, so we can drop the stop-gap workaround which the per-notifier
    flag was used for.

    Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: David Rientjes
    Cc: Boris Ostrovsky
    Cc: Jerome Glisse
    Cc: Juergen Gross
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

26 Oct, 2018

1 commit

  • Pull KVM updates from Radim Krčmář:
    "ARM:
    - Improved guest IPA space support (32 to 52 bits)

    - RAS event delivery for 32bit

    - PMU fixes

    - Guest entry hardening

    - Various cleanups

    - Port of dirty_log_test selftest

    PPC:
    - Nested HV KVM support for radix guests on POWER9. The performance
    is much better than with PR KVM. Migration and arbitrary levels of
    nesting are supported.

    - Disable nested HV-KVM on early POWER9 chips that need a particular
    hardware bug workaround

    - One VM per core mode to prevent potential data leaks

    - PCI pass-through optimization

    - Merge the ppc-kvm topic branch and kvm-ppc-fixes to get a better base

    s390:
    - Initial version of AP crypto virtualization via vfio-mdev

    - Improvements for vfio-ap

    - Set the host program identifier

    - Optimize page table locking

    x86:
    - Enable nested virtualization by default

    - Implement Hyper-V IPI hypercalls

    - Improve #PF and #DB handling

    - Allow guests to use Enlightened VMCS

    - Add migration selftests for VMCS and Enlightened VMCS

    - Allow coalesced PIO accesses

    - Add an option to perform nested VMCS host state consistency check
    through hardware

    - Automatic tuning of lapic_timer_advance_ns

    - Many fixes, minor improvements, and cleanups"

    * tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
    Revert "kvm: x86: optimize dr6 restore"
    KVM: PPC: Optimize clearing TCEs for sparse tables
    x86/kvm/nVMX: tweak shadow fields
    selftests/kvm: add missing executables to .gitignore
    KVM: arm64: Safety check PSTATE when entering guest and handle IL
    KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
    arm/arm64: KVM: Enable 32 bits kvm vcpu events support
    arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
    KVM: arm64: Fix caching of host MDCR_EL2 value
    KVM: VMX: enable nested virtualization by default
    KVM/x86: Use 32bit xor to clear registers in svm.c
    kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
    kvm: vmx: Defer setting of DR6 until #DB delivery
    kvm: x86: Defer setting of CR2 until #PF delivery
    kvm: x86: Add payload operands to kvm_multiple_exception
    kvm: x86: Add exception payload fields to kvm_vcpu_events
    kvm: x86: Add has_payload and payload to kvm_queued_exception
    KVM: Documentation: Fix omission in struct kvm_vcpu_events
    KVM: selftests: add Enlightened VMCS test
    ...

    Linus Torvalds
     

24 Oct, 2018

1 commit

  • …iederm/user-namespace

    Pull siginfo updates from Eric Biederman:
    "I have been slowly sorting out siginfo and this is the culmination of
    that work.

    The primary result is that in several ways the signal infrastructure
    has been made less error-prone. The code has been updated so that
    manually specifying SEND_SIG_FORCED is never necessary. The conversion
    to the new siginfo sending functions is now complete, which makes it
    difficult to send a signal without filling in the proper siginfo
    fields.

    At the tail end of the patchset comes the optimization of decreasing
    the size of struct siginfo in the kernel from 128 bytes to about 48
    bytes on 64bit. The fundamental observation that enables this is that,
    by definition, none of the known ways to use struct siginfo uses the
    extra bytes.

    This comes at the cost of a small user-space-observable difference.
    For the rare case of siginfo being injected into the kernel, only what
    can be copied into kernel_siginfo is delivered to the destination; the
    rest of the bytes are set to 0. For cases where the signal and the
    si_code are known this is safe, because we know those bytes are not
    used. For cases where the signal and si_code combination is unknown,
    the bits that won't fit into struct kernel_siginfo are tested to
    verify they are zero, and the send fails if they are not.

    I made an extensive search through userspace code and I could not find
    anything that would break because of the above change. If it turns out
    I did break something it will take just the revert of a single change
    to restore kernel_siginfo to the same size as userspace siginfo.

    Testing did reveal dependencies on preferring the signo passed to
    sigqueueinfo over si->signo, so I bit the bullet and added the
    complexity necessary to handle that case.

    Testing also revealed bad things can happen if a negative signal
    number is passed into the system calls. Something no sane application
    will do but something a malicious program or a fuzzer might do. So I
    have fixed the code that performs the bounds checks to ensure negative
    signal numbers are handled"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
    signal: Guard against negative signal numbers in copy_siginfo_from_user32
    signal: Guard against negative signal numbers in copy_siginfo_from_user
    signal: In sigqueueinfo prefer sig not si_signo
    signal: Use a smaller struct siginfo in the kernel
    signal: Distinguish between kernel_siginfo and siginfo
    signal: Introduce copy_siginfo_from_user and use it's return value
    signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
    signal: Fail sigqueueinfo if si_signo != sig
    signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
    signal/unicore32: Use force_sig_fault where appropriate
    signal/unicore32: Generate siginfo in ucs32_notify_die
    signal/unicore32: Use send_sig_fault where appropriate
    signal/arc: Use force_sig_fault where appropriate
    signal/arc: Push siginfo generation into unhandled_exception
    signal/ia64: Use force_sig_fault where appropriate
    signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
    signal/ia64: Use the generic force_sigsegv in setup_frame
    signal/arm/kvm: Use send_sig_mceerr
    signal/arm: Use send_sig_fault where appropriate
    signal/arm: Use force_sig_fault where appropriate
    ...

    Linus Torvalds
     

23 Oct, 2018

1 commit

  • Pull arm64 updates from Catalin Marinas:
    "Apart from some new arm64 features and clean-ups, this also contains
    the core mmu_gather changes for tracking the levels of the page table
    being cleared and a minor update to the generic
    compat_sys_sigaltstack() introducing COMPAT_SIGMINSTKSZ.

    Summary:

    - Core mmu_gather changes which allow tracking the levels of
    page-table being cleared together with the arm64 low-level flushing
    routines

    - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
    mitigate Spectre-v4 dynamically without trapping to EL3 firmware

    - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack

    - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4

    - Support for Common Not Private (CnP) translations allowing threads
    of the same CPU to share the TLB entries

    - Accelerated crc32 routines

    - Move swapper_pg_dir to the rodata section

    - Trap WFI instruction executed in user space

    - ARM erratum 1188874 workaround (arch_timer)

    - Miscellaneous fixes and clean-ups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
    arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
    arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
    arm64: cpufeature: Fix handling of CTR_EL0.IDC field
    arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
    Documentation/arm64: HugeTLB page implementation
    arm64: mm: Use __pa_symbol() for set_swapper_pgd()
    arm64: Add silicon-errata.txt entry for ARM erratum 1188873
    Revert "arm64: uaccess: implement unsafe accessors"
    arm64: mm: Drop the unused cpu parameter
    MAINTAINERS: fix bad sdei paths
    arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
    arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
    arm64: xen: Use existing helper to check interrupt status
    arm64: Use daifflag_restore after bp_hardening
    arm64: daifflags: Use irqflags functions for daifflags
    arm64: arch_timer: avoid unused function warning
    arm64: Trap WFI executed in userspace
    arm64: docs: Document SSBS HWCAP
    arm64: docs: Fix typos in ELF hwcaps
    arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
    ...

    Linus Torvalds
     

19 Oct, 2018

1 commit


18 Oct, 2018

3 commits

  • The commit 539aee0edb9f ("KVM: arm64: Share the parts of
    get/set events useful to 32bit") shares the get/set events
    helper for arm64 and arm32, but forgot to share the cap
    extension code.

    User space will check whether KVM supports vcpu events by
    checking the KVM_CAP_VCPU_EVENTS extension.
    Acked-by: James Morse
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Dongjiu Geng
    Signed-off-by: Marc Zyngier

    Dongjiu Geng
     
    Rename kvm_arch_dev_ioctl_check_extension() to
    kvm_arch_vm_ioctl_check_extension(), because it has no
    relationship with devices.

    Renaming this function makes the code more readable.

    Cc: James Morse
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Dongjiu Geng
    Signed-off-by: Marc Zyngier

    Dongjiu Geng
     
    At boot time, KVM stashes the host MDCR_EL2 value, but only does this
    when the kernel is not running in hyp mode (i.e. is non-VHE). In the
    VHE case the stash is skipped, so the stashed value of MDCR_EL2.HPMN
    is left at zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour.

    Since we use this value to derive the MDCR_EL2 value when switching
    to/from a guest, after a guest has been run the performance counters
    do not behave as expected. This has been observed to result in accesses
    via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
    counters, resulting in events not being counted. In these cases, only
    the fixed-purpose cycle counter appears to work as expected.

    Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

    Cc: Christoffer Dall
    Cc: James Morse
    Cc: Will Deacon
    Cc: stable@vger.kernel.org
    Fixes: 1e947bad0b63b351 ("arm64: KVM: Skip HYP setup when already running in HYP")
    Tested-by: Robin Murphy
    Signed-off-by: Mark Rutland
    Signed-off-by: Marc Zyngier

    Mark Rutland
     

17 Oct, 2018

4 commits

    The original comment is a little hard to understand.

    No functional change; just amend the comment slightly.

    Signed-off-by: Wei Yang
    Signed-off-by: Paolo Bonzini

    Wei Yang
     
    Coalesced pio is based on coalesced mmio and can be used for ports
    such as the rtc port, the pci-host config ports and so on.

    The rtc port is a good candidate: some versions of Windows guests
    access the rtc frequently because they use it as the system tick.
    The guest accesses the rtc like this: write a register index to port
    0x70, then write or read data through port 0x71. The write to port
    0x70 only selects the index and does nothing else, so we can use
    coalesced pio to handle this case and reduce VM-exit time.

    A virtual machine also accesses the pci-host config ports frequently
    during startup and shutdown, so setting these ports as coalesced pio
    reduces startup and shutdown time.

    Without the patch, the vm-exit times of accessing rtc 0x70 and piix
    0xcf8, measured with perf tools (guest OS: Windows 7 64-bit):

    IO Port Access  Samples  Samples%  Time%  Min Time  Max Time  Avg time
    0x70:POUT            86    30.99%  74.59%      9us      29us  10.75us (+- 3.41%)
    0xcf8:POUT         1119     2.60%   2.12%   2.79us   56.83us   3.41us (+- 2.23%)

    With the patch:

    IO Port Access  Samples  Samples%  Time%  Min Time  Max Time  Avg time
    0x70:POUT           106    32.02%  29.47%      0us      10us   1.57us (+- 7.38%)
    0xcf8:POUT         1065     1.67%   0.28%   0.41us   65.44us   0.66us (+- 10.55%)

    Signed-off-by: Peng Hao
    Signed-off-by: Paolo Bonzini

    Peng Hao
     
    update_memslots() is only called by __kvm_set_memory_region(), in which
    "change" is already calculated and indicates how to adjust
    slots->used_slots:

    * increase by one if it is KVM_MR_CREATE
    * decrease by one if it is KVM_MR_DELETE
    * no change for the others

    This patch adjusts slots->used_slots in update_memslots() based on the
    "change" value instead of re-calculating it from scratch.

    Signed-off-by: Wei Yang
    Signed-off-by: Paolo Bonzini

    Wei Yang
     
    We can use 'NULL' to represent the 'all cpus' case in
    kvm_make_vcpus_request_mask() and avoid building a vCPU mask with
    all vCPUs set.

    Suggested-by: Radim Krčmář
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Roman Kagan
    Signed-off-by: Paolo Bonzini

    Vitaly Kuznetsov
     

03 Oct, 2018

4 commits

    PageTransCompoundMap() returns true for both hugetlbfs and THP
    hugepages. This behaviour incorrectly causes stage 2 faults for
    unsupported hugepage sizes (e.g., a 64K hugepage with 4K pages) to be
    treated as THP faults.

    Tighten the check to filter out hugetlbfs pages. This also leads to
    consistently mapping all unsupported hugepage sizes as PTE level
    entries at stage 2.

    Signed-off-by: Punit Agrawal
    Reviewed-by: Suzuki Poulose
    Cc: Christoffer Dall
    Cc: Marc Zyngier
    Cc: stable@vger.kernel.org # v4.13+
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
    __cpu_init_stage2 doesn't do anything anymore on arm64, and is
    totally nonsensical if running VHE (as VHE is 64bit only).

    Reviewed-by: Eric Auger
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • VM tends to be a very overloaded term in KVM, so let's keep it
    to describe the virtual machine. For the virtual memory setup,
    let's use the "stage2" suffix.

    Reviewed-by: Eric Auger
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
    So far we have restricted the IPA size of the VM to the default
    value (40 bits). Now that we can manage the IPA size per VM and
    support dynamic stage2 page tables, we can allow VMs to have a
    larger IPA. This patch introduces the maximum IPA size
    supported on the host, which is decided by the following factors:

    1) The maximum PARange supported by the CPUs - this can be inferred
    from the system-wide safe value.
    2) The maximum PA size supported by the host kernel (48 vs 52).
    3) The number of levels in the host page table (as we base our
    stage2 tables on the host table helpers).

    Since the stage2 page table code depends on the stage1
    page table, we always ensure that:

    Number of Levels at Stage1 >= Number of Levels at Stage2

    So we limit the IPA to make sure that the above condition
    is satisfied. This affects the following combinations
    of VA_BITS and IPA for different page sizes:

    Host configuration | Unsupported IPA ranges
    39bit VA, 4K       | [44, 48]
    36bit VA, 16K      | [41, 48]
    42bit VA, 64K      | [47, 52]

    Supporting the above combinations would need independent stage2
    page table manipulation code, which would require substantial
    changes. We could pursue that solution independently and
    switch the page table code once we have it ready.

    Cc: Catalin Marinas
    Cc: Marc Zyngier
    Cc: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Suzuki K Poulose
     

01 Oct, 2018

5 commits

    Add support for handling 52bit guest physical addresses in the
    VGIC layer. So far we have limited the guest physical address
    to 48bits, by explicitly masking the upper bits. This patch
    removes that restriction. We do not have to check if the host
    supports 52bit as the gpa is always validated during an access
    (e.g., kvm_{read/write}_guest, kvm_is_visible_gfn()).
    The ITS table save-restore is also unaffected by the
    enhancement: the DTE entries already store bits [51:8]
    of the ITT_addr (with a 256-byte alignment).

    Cc: Marc Zyngier
    Cc: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Kristina Martsenko
    [ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Kristina Martsenko
     
    Right now the stage2 page table for a VM is hard coded, assuming
    an IPA of 40bits. As we are about to add support for a per-VM IPA,
    prepare the stage2 page table helpers to accept the kvm instance
    so they can make the right decision for the VM. No functional changes.
    Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE, and moves
    some of the definitions in arm32 to align with arm64.
    Also drops the _AC() specifier on constants wherever possible.

    Cc: Christoffer Dall
    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Suzuki K Poulose
     
  • Allow the arch backends to perform VM specific initialisation.
    This will be later used to handle IPA size configuration and per-VM
    VTCR configuration on arm64.

    Cc: Marc Zyngier
    Cc: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Suzuki K Poulose
     
    On a 4-level page table, a pgd entry can be empty, unlike on a
    3-level page table. Remove the spurious WARN_ON() in stage_get_pud().

    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Suzuki K Poulose
     
    So far we have only supported a 3-level page table with a fixed IPA
    of 40bits, where the PUD is folded. With 4-level page tables, we need
    to check whether the PUD entry is valid. Fix stage2_flush_memslot()
    to do this check before walking down the table.

    Acked-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Suzuki K Poulose
    Signed-off-by: Marc Zyngier

    Suzuki K Poulose
     

28 Sep, 2018

1 commit


18 Sep, 2018

1 commit

    We rely on the cpufeature framework to detect and enable CNP, so for
    KVM we need to patch hyp to set the CNP bit just before TTBR0_EL2 gets
    written.

    For the guest we encode the CNP bit while building the vttbr, so we
    don't need to bother with it in the world switch.

    Reviewed-by: James Morse
    Acked-by: Catalin Marinas
    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Catalin Marinas

    Vladimir Murzin
     

07 Sep, 2018

2 commits

  • kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
    deal with. Drop the now obsolete code.

    Fixes: fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2")
    Cc: James Hogan
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • When triggering a CoW, we unmap the RO page via an MMU notifier
    (invalidate_range_start), and then populate the new PTE using another
    one (change_pte). In the meantime, we'll have copied the old page
    into the new one.

    The problem is that the data for the new page is sitting in the
    cache, and should the guest have an uncached mapping to that page
    (or its MMU off), following accesses will bypass the cache.

    In a way, this is similar to what happens on a translation fault:
    We need to clean the page to the PoC before mapping it. So let's just
    do that.

    This fixes a KVM unit test regression observed on a HiSilicon platform,
    and subsequently reproduced on Seattle.

    Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation fault")
    Cc: stable@vger.kernel.org # v4.16+
    Reported-by: Mike Galbraith
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     

23 Aug, 2018

3 commits

  • Pull second set of KVM updates from Paolo Bonzini:
    "ARM:
    - Support for Group0 interrupts in guests
    - Cache management optimizations for ARMv8.4 systems
    - Userspace interface for RAS
    - Fault path optimization
    - Emulated physical timer fixes
    - Random cleanups

    x86:
    - fixes for L1TF
    - a new test case
    - non-support for SGX (inject the right exception in the guest)
    - fix lockdep false positive"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
    KVM: VMX: fixes for vmentry_l1d_flush module parameter
    kvm: selftest: add dirty logging test
    kvm: selftest: pass in extra memory when create vm
    kvm: selftest: include the tools headers
    kvm: selftest: unify the guest port macros
    tools: introduce test_and_clear_bit
    KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
    KVM: vmx: Inject #UD for SGX ENCLS instruction in guest
    KVM: vmx: Add defines for SGX ENCLS exiting
    x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush()
    x86: kvm: avoid unused variable warning
    KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR
    KVM: arm/arm64: Skip updating PTE entry if no change
    KVM: arm/arm64: Skip updating PMD entry if no change
    KVM: arm: Use true and false for boolean values
    KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled
    KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h
    KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses
    KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses
    KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs
    ...

    Linus Torvalds
     
  • Merge more updates from Andrew Morton:

    - the rest of MM

    - procfs updates

    - various misc things

    - more y2038 fixes

    - get_maintainer updates

    - lib/ updates

    - checkpatch updates

    - various epoll updates

    - autofs updates

    - hfsplus

    - some reiserfs work

    - fatfs updates

    - signal.c cleanups

    - ipc/ updates

    * emailed patches from Andrew Morton : (166 commits)
    ipc/util.c: update return value of ipc_getref from int to bool
    ipc/util.c: further variable name cleanups
    ipc: simplify ipc initialization
    ipc: get rid of ids->tables_initialized hack
    lib/rhashtable: guarantee initial hashtable allocation
    lib/rhashtable: simplify bucket_table_alloc()
    ipc: drop ipc_lock()
    ipc/util.c: correct comment in ipc_obtain_object_check
    ipc: rename ipcctl_pre_down_nolock()
    ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
    ipc: reorganize initialization of kern_ipc_perm.seq
    ipc: compute kern_ipc_perm.id under the ipc lock
    init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
    fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
    adfs: use timespec64 for time conversion
    kernel/sysctl.c: fix typos in comments
    drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
    fork: don't copy inconsistent signal handler state to child
    signal: make get_signal() return bool
    signal: make sigkill_pending() return bool
    ...

    Linus Torvalds
     
  • There are several blockable mmu notifiers which might sleep in
    mmu_notifier_invalidate_range_start and that is a problem for the
    oom_reaper because it needs to guarantee a forward progress so it cannot
    depend on any sleepable locks.

    Currently we simply back off and mark an oom victim with blockable mmu
    notifiers as done after a short sleep. That can result in selecting a new
    oom victim prematurely because the previous one still hasn't torn its
    memory down yet.

    We can do much better though. Even if mmu notifiers use sleepable locks
    there is no reason to automatically assume those locks are held. Moreover
    the majority of notifiers only care about a portion of the address space,
    and there is absolutely zero reason to fail when we are unmapping an
    unrelated range. Many notifiers do really block and wait for HW, which is
    harder to handle, and there we have to bail out.

    This patch handles the low-hanging fruit:
    __mmu_notifier_invalidate_range_start gets a blockable flag, and callbacks
    are not allowed to sleep if the flag is set to false. This is achieved by
    using trylock instead of the sleepable lock for most callbacks and
    continuing as long as we do not block down the call chain.

    I think we can improve that even further because there is a common pattern
    to do a range lookup first and then do something about that. The first
    part can be done without a sleeping lock in most cases AFAICS.

    The oom_reaper end then simply retries if there is at least one notifier
    which couldn't make any progress in !blockable mode. A retry loop is
    already implemented to wait for the mmap_sem and this is basically the
    same thing.

    The simplest way for driver developers to test this code path is to wrap
    userspace code which uses these notifiers into a memcg and set the hard
    limit to hit the oom. This can be done e.g. by having the test fault in
    all the mmu-notifier-managed memory and then setting the hard limit to
    something really small. Then we watch for a proper process tear-down.

    [akpm@linux-foundation.org: coding style fixes]
    [akpm@linux-foundation.org: minor code simplification]
    Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Christian König # AMD notifiers
    Acked-by: Leon Romanovsky # mlx and umem_odp
    Reported-by: David Rientjes
    Cc: "David (ChunMing) Zhou"
    Cc: Paolo Bonzini
    Cc: Alex Deucher
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Joonas Lahtinen
    Cc: Rodrigo Vivi
    Cc: Doug Ledford
    Cc: Jason Gunthorpe
    Cc: Mike Marciniszyn
    Cc: Dennis Dalessandro
    Cc: Sudeep Dutt
    Cc: Ashutosh Dixit
    Cc: Dimitri Sivanich
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: "Jérôme Glisse"
    Cc: Andrea Arcangeli
    Cc: Felix Kuehling
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Aug, 2018

2 commits

  • …marm/kvmarm into HEAD

    KVM/arm updates for 4.19

    - Support for Group0 interrupts in guests
    - Cache management optimizations for ARMv8.4 systems
    - Userspace interface for RAS, allowing error retrieval and injection
    - Fault path optimization
    - Emulated physical timer fixes
    - Random cleanups

    Paolo Bonzini
     
  • …iederm/user-namespace

    Pull core signal handling updates from Eric Biederman:
    "It was observed that a periodic timer in combination with a
    sufficiently expensive fork could prevent fork from ever completing.
    This contains the changes to remove the need for that restart.

    This set of changes is split into several parts:

    - The first part makes PIDTYPE_TGID a proper pid type instead of
    something only for very special cases. The part starts using
    PIDTYPE_TGID enough so that in __send_signal, where signals are
    actually delivered, we know if the signal is being sent to a group
    of processes or just a single process.

    - With that prep work out of the way the logic in fork is modified so
    that fork logically makes signals received while it is running
    appear to be received after the fork completes"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
    signal: Don't send signals to tasks that don't exist
    signal: Don't restart fork when signals come in.
    fork: Have new threads join on-going signal group stops
    fork: Skip setting TIF_SIGPENDING in ptrace_init_task
    signal: Add calculate_sigpending()
    fork: Unconditionally exit if a fatal signal is pending
    fork: Move and describe why the code examines PIDNS_ADDING
    signal: Push pid type down into complete_signal.
    signal: Push pid type down into __send_signal
    signal: Push pid type down into send_signal
    signal: Pass pid type into do_send_sig_info
    signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
    signal: Pass pid type into group_send_sig_info
    signal: Pass pid and pid type into send_sigqueue
    posix-timers: Noralize good_sigevent
    signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
    pid: Implement PIDTYPE_TGID
    pids: Move the pgrp and session pid pointers from task_struct to signal_struct
    kvm: Don't open code task_pid in kvm_vcpu_ioctl
    pids: Compute task_tgid using signal->leader_pid
    ...

    Linus Torvalds
     

20 Aug, 2018

1 commit

  • Pull first set of KVM updates from Paolo Bonzini:
    "PPC:
    - minor code cleanups

    x86:
    - PCID emulation and CR3 caching for shadow page tables
    - nested VMX live migration
    - nested VMCS shadowing
    - optimized IPI hypercall
    - some optimizations

    ARM will come next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
    kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
    KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
    KVM: X86: Implement PV IPIs in linux guest
    KVM: X86: Add kvm hypervisor init time platform setup callback
    KVM: X86: Implement "send IPI" hypercall
    KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
    KVM: x86: Skip pae_root shadow allocation if tdp enabled
    KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
    KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
    KVM: vmx: move struct host_state usage to struct loaded_vmcs
    KVM: vmx: compute need to reload FS/GS/LDT on demand
    KVM: nVMX: remove a misleading comment regarding vmcs02 fields
    KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
    KVM: vmx: add dedicated utility to access guest's kernel_gs_base
    KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
    KVM: vmx: refactor segmentation code in vmx_save_host_state()
    kvm: nVMX: Fix fault priority for VMX operations
    kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
    ...

    Linus Torvalds
     

15 Aug, 2018

1 commit

  • Pull arm64 updates from Will Deacon:
    "A bunch of good stuff in here. Worth noting is that we've pulled in
    the x86/mm branch from -tip so that we can make use of the core
    ioremap changes which allow us to put down huge mappings in the
    vmalloc area without screwing up the TLB. Much of the positive
    diffstat is because of the rseq selftest for arm64.

    Summary:

    - Wire up support for qspinlock, replacing our trusty ticket lock
    code

    - Add an IPI to flush_icache_range() to ensure that stale
    instructions fetched into the pipeline are discarded along with the
    I-cache lines

    - Support for the GCC "stackleak" plugin

    - Support for restartable sequences, plus an arm64 port for the
    selftest

    - Kexec/kdump support on systems booting with ACPI

    - Rewrite of our syscall entry code in C, which allows us to zero the
    GPRs on entry from userspace

    - Support for chained PMU counters, allowing 64-bit event counters to
    be constructed on current CPUs

    - Ensure scheduler topology information is kept up-to-date with CPU
    hotplug events

    - Re-enable support for huge vmalloc/IO mappings now that the core
    code has the correct hooks to use break-before-make sequences

    - Miscellaneous, non-critical fixes and cleanups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
    arm64: alternative: Use true and false for boolean values
    arm64: kexec: Add comment to explain use of __flush_icache_range()
    arm64: sdei: Mark sdei stack helper functions as static
    arm64, kaslr: export offset in VMCOREINFO ELF notes
    arm64: perf: Add cap_user_time aarch64
    efi/libstub: Only disable stackleak plugin for arm64
    arm64: drop unused kernel_neon_begin_partial() macro
    arm64: kexec: machine_kexec should call __flush_icache_range
    arm64: svc: Ensure hardirq tracing is updated before return
    arm64: mm: Export __sync_icache_dcache() for xen-privcmd
    drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
    arm64: Add support for STACKLEAK gcc plugin
    arm64: Add stack information to on_accessible_stack
    drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
    arm64: fix ACPI dependencies
    rseq/selftests: Add support for arm64
    arm64: acpi: fix alignment fault in accessing ACPI
    efi/arm: map UEFI memory map even w/o runtime services enabled
    efi/arm: preserve early mapping of UEFI memory map longer for BGRT
    drivers: acpi: add dependency of EFI for arm64
    ...

    Linus Torvalds
     

13 Aug, 2018

2 commits

  • When there is contention on faulting in a particular page table entry
    at stage 2, the break-before-make requirement of the architecture can
    lead to additional refaulting due to TLB invalidation.

    Avoid this by skipping a page table update if the new value of the PTE
    matches the previous value.

    Cc: stable@vger.kernel.org
    Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
    Reviewed-by: Suzuki Poulose
    Acked-by: Christoffer Dall
    Signed-off-by: Punit Agrawal
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     
    Contention on updating a PMD entry by a large number of vcpus can lead
    to duplicate work when handling stage 2 page faults. As the page table
    update follows the break-before-make requirement of the architecture,
    it can lead to repeated refaults due to clearing the entry and
    flushing the TLBs.

    This problem is more likely when -

    * there are a large number of vcpus
    * the mapping is a large block mapping

    such as when using PMD hugepages (512MB) with 64k pages.

    Fix this by skipping the page table update if there is no change in
    the entry being updated.

    Cc: stable@vger.kernel.org
    Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
    Reviewed-by: Suzuki Poulose
    Acked-by: Christoffer Dall
    Signed-off-by: Punit Agrawal
    Signed-off-by: Marc Zyngier

    Punit Agrawal
     

12 Aug, 2018

3 commits

    kvm_vgic_sync_hwstate is only called with IRQs disabled.
    There is thus no need to call spin_lock_irqsave/restore in
    vgic_fold_lr_state and vgic_prune_ap_list.

    This patch replaces them with the non-irq-safe versions.

    Signed-off-by: Jia He
    Acked-by: Christoffer Dall
    [maz: commit message tidy-up]
    Signed-off-by: Marc Zyngier

    Jia He
     
  • DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3,
    so let's move it to vgic.h

    Signed-off-by: Jia He
    [maz: commit message tidy-up]
    Signed-off-by: Marc Zyngier

    Jia He
     
  • Although vgic-v3 now supports Group0 interrupts, it still doesn't
    deal with Group0 SGIs. As usual with the GIC, nothing is simple:

    - ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
    with KVM (as per 8.1.10, Non-secure EL1 access)

    - ICC_SGI0R can only generate Group0 SGIs

    - ICC_ASGI1R sees its scope refocussed to generate only Group0
    SGIs (as per the note at the bottom of Table 8-14)

    We only support Group1 SGIs so far, so no material change.

    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

06 Aug, 2018

3 commits

  • We are currently cutting hva_to_pfn_fast short if we do not want an
    immediate exit, which is represented by !async && !atomic. However,
    this is unnecessary, and __get_user_pages_fast is *much* faster
    because the regular get_user_pages takes pmd_lock/pte_lock.
    In fact, when many CPUs take a nested vmexit at the same time
    the contention on those locks is visible, and this patch removes
    about 25% (compared to 4.18) from vmexit.flat on a 16 vCPU
    nested guest.

    Suggested-by: Andrea Arcangeli
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • This patch provides a way for platforms to register a hypervisor
    remote TLB flush callback, which helps optimize TLB flushing among
    vcpus in the nested virtualization case.

    Signed-off-by: Lan Tianyu
    Signed-off-by: Paolo Bonzini

    Tianyu Lan
     
  • Use the fast CR3 switch mechanism to locklessly change the MMU root
    page when switching between L1 and L2. The switch from L2 to L1 should
    always go through the fast path, while the switch from L1 to L2 should
    go through the fast path if L1's CR3/EPTP for L2 hasn't changed
    since the last time.

    Signed-off-by: Junaid Shahid
    Signed-off-by: Paolo Bonzini

    Junaid Shahid