25 May, 2020

1 commit


12 May, 2020

5 commits

  • $(AS) is not used anywhere in the kernel build, hence commit
    aa824e0c962b ("kbuild: remove AS variable") killed it.

    Remove the left-over code in arch/{arm,arm64}/Makefile.

    Signed-off-by: Masahiro Yamada
    Reviewed-by: Nathan Chancellor
    Acked-by: Will Deacon

    Masahiro Yamada
     
  • Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from
    -Wframe-larger-than="), 'make ARCH=unicore32 clean' emits error
    messages as follows:

    $ make ARCH=unicore32 clean
    gcc: error: missing argument to '-Wframe-larger-than='
    gcc: error: missing argument to '-Wframe-larger-than='

    We do not care about compiler flags when cleaning.

    Use the '=' operator for lazy expansion because we do not use
    GNU_LIBC_A or GNU_LIBGCC_A when cleaning.

    Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than=")
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Nick Desaulniers

    Masahiro Yamada
     
  • 'make ARCH=h8300 clean' emits error messages as follows:

    $ make ARCH=h8300 clean
    gcc: error: missing argument to '-Wframe-larger-than='
    gcc: error: unrecognized command line option '-mint32'

    You can suppress the second one by setting the correct CROSS_COMPILE=,
    but we should not require any compiler for cleaning.

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     
  • 'make ARCH=hexagon clean' emits an error message as follows:

    $ make ARCH=hexagon clean
    gcc: error: unrecognized command line option '-G0'

    You can suppress it by setting the correct CROSS_COMPILE=,
    but we should not require any compiler for cleaning.

    Signed-off-by: Masahiro Yamada
    Acked-by: Brian Cain

    Masahiro Yamada
     
  • Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from
    -Wframe-larger-than="), 'make ARCH=um clean' emits an error message
    as follows:

    $ make ARCH=um clean
    gcc: error: missing argument to '-Wframe-larger-than='

    We do not care about compiler flags when cleaning.

    Use the '=' operator for lazy expansion because we do not use
    LDFLAGS_pcap.o or LDFLAGS_vde.o when cleaning.

    While I was here, I removed the redundant -r option because it
    already exists in the recipe.

    Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than=")
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Nathan Chancellor
    Tested-by: Nathan Chancellor [build]

    Masahiro Yamada
     

11 May, 2020

2 commits

  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes for x86:

    - Ensure that direct mapping alias is always flushed when changing
    page attributes. The optimization for small ranges failed to do so
    when the virtual address was in the vmalloc or module space.

    - Unbreak the trace event registration for syscalls without arguments
    caused by the refactoring of the SYSCALL_DEFINE0() macro.

    - Move the printk in the TSC deadline timer code to a place where it
    is guaranteed to only be called once during boot and cannot be
    rearmed by clearing warn_once after boot. If it's invoked post boot
    then lockdep rightfully complains about a potential deadlock as the
    calling context is different.

    - A series of fixes for objtool and the ORC unwinder addressing a
    variety of small issues:

    - Stack offset tracking for indirect CFAs in objtool ignored
    subsequent pushes and pops

    - Repair the unwind hints in the register clearing entry ASM code

    - Make the unwinding in the low level exit to usermode code stop
    after switching to the trampoline stack. The unwind hint is no
    longer valid and the ORC unwinder emits a warning as it can't
    find the registers anymore.

    - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
    which caused objtool to generate bogus ORC data.

    - Prevent unwinder warnings when dumping the stack of a
    non-current task as there is no way to be sure about the
    validity because the dumped stack can be a moving target.

    - Make the ORC unwinder behave the same way as the frame pointer
    unwinder when dumping an inactive task's stack and do not skip
    the first frame.

    - Prevent ORC unwinding before ORC data has been initialized

    - Immediately terminate unwinding when an unknown ORC entry type
    is found.

    - Prevent premature stop of the unwinder caused by IRET frames.

    - Fix another infinite loop in objtool caused by a negative
    offset which was not caught.

    - Address a few build warnings in the ORC unwinder and add
    missing static/ro_after_init annotations"

    * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
    x86/apic: Move TSC deadline timer debug printk
    ftrace/x86: Fix trace event registration for syscalls without arguments
    x86/mm/cpa: Flush direct map alias during cpa
    objtool: Fix infinite loop in for_offset_range()
    x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
    x86/unwind/orc: Fix error path for bad ORC entry type
    x86/unwind/orc: Prevent unwinding before ORC initialization
    x86/unwind/orc: Don't skip the first frame for inactive tasks
    x86/unwind: Prevent false warnings for non-current tasks
    x86/unwind/orc: Convert global variables to static
    x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
    x86/entry/64: Fix unwind hints in __switch_to_asm()
    x86/entry/64: Fix unwind hints in kernel exit path
    x86/entry/64: Fix unwind hints in register clearing code
    objtool: Fix stack offset tracking for indirect CFAs

    Linus Torvalds
     
  • Pull locking fix from Thomas Gleixner:
    "A single fix for the fallout of the recent futex uaccess rework.

    With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser()
    correctly and emits a 'maybe uninitialized' warning. While we usually
    ignore compiler stupidity the conditional store is pointless anyway
    because the correct case has to store. For the fault case the extra
    store does no harm"

    * tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ARM: futex: Address build warning

    Linus Torvalds
     

10 May, 2020

1 commit

  • Pull RISC-V fixes from Palmer Dabbelt:
    "A smattering of fixes and cleanups:

    - Dead code removal.

    - Exporting riscv_cpuid_to_hartid_mask for modules.

    - Per-CPU tracking of ISA features.

    - Setting max_pfn correctly when probing memory.

    - Adding a note to the VDSO so glibc can check the kernel's version
    without a uname().

    - A fix to force the bootloader to initialize the boot spin tables,
    which still get used as a fallback when SBI-0.1 is enabled"

    * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
    RISC-V: Remove unused code from STRICT_KERNEL_RWX
    riscv: force __cpu_up_ variables to put in data section
    riscv: add Linux note to vdso
    riscv: set max_pfn to the PFN of the last page
    RISC-V: Remove N-extension related defines
    RISC-V: Add bitmap reprensenting ISA features common across CPUs
    RISC-V: Export riscv_cpuid_to_hartid_mask() API

    Linus Torvalds
     

08 May, 2020

4 commits

  • Merge misc fixes from Andrew Morton:
    "14 fixes and one selftest to verify the ipc fixes herein"

    * emailed patches from Andrew Morton:
    mm: limit boost_watermark on small zones
    ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
    mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
    epoll: atomically remove wait entry on wake up
    kselftests: introduce new epoll60 testcase for catching lost wakeups
    percpu: make pcpu_alloc() aware of current gfp context
    mm/slub: fix incorrect interpretation of s->offset
    scripts/gdb: repair rb_first() and rb_last()
    eventpoll: fix missing wakeup for ovflist in ep_poll_callback
    arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
    scripts/decodecode: fix trapping instruction formatting
    kernel/kcov.c: fix typos in kcov_remote_start documentation
    mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
    mm, memcg: fix error return value of mem_cgroup_css_alloc()
    ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()

    Linus Torvalds
     
  • When trying to lock read-only pages, sev_pin_memory() fails because
    FOLL_WRITE is used as the flag for get_user_pages_fast().

    Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
    write 'bool'") updated the get_user_pages_fast() call sites to use
    flags, but incorrectly updated the call in sev_pin_memory(). As the
    original coding of this call was correct, revert the change made by that
    commit.

    Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
    Signed-off-by: Janakarajan Natarajan
    Signed-off-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Cc: Paolo Bonzini
    Cc: Sean Christopherson
    Cc: Vitaly Kuznetsov
    Cc: Wanpeng Li
    Cc: Jim Mattson
    Cc: Joerg Roedel
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H . Peter Anvin"
    Cc: Mike Marshall
    Cc: Brijesh Singh
    Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
    Signed-off-by: Linus Torvalds

    Janakarajan Natarajan
     
  • Pull arm64 fix from Catalin Marinas:
    "Avoid potential NULL dereference in huge_pte_alloc() on pmd_alloc()
    failure"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: hugetlb: avoid potential NULL dereference

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "Bugfixes, mostly for ARM and AMD, and more documentation.

    Slightly bigger than usual because I couldn't send out what was
    pending for rc4, but there is nothing worrisome going on. I have more
    fixes pending for guest debugging support (gdbstub) but I will send
    them next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
    KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
    KVM: selftests: Fix build for evmcs.h
    kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
    KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
    docs/virt/kvm: Document configuring and running nested guests
    KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
    kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
    KVM: x86: Fixes posted interrupt check for IRQs delivery modes
    KVM: SVM: fill in kvm_run->debug.arch.dr[67]
    KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
    KVM: arm64: Fix 32bit PC wrap-around
    KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
    KVM: arm64: Save/restore sp_el0 as part of __guest_enter
    KVM: arm64: Delete duplicated label in invalid_vector
    KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
    KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
    KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
    KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
    KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
    KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
    ...

    Linus Torvalds
     

07 May, 2020

3 commits

  • The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may
    pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:

    | CC arch/arm64/mm/pageattr.o
    | CC arch/arm64/mm/hugetlbpage.o
    | from arch/arm64/mm/hugetlbpage.c:10:
    | arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
    | ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
    | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
    | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
    | |arch/arm64/mm/hugetlbpage.c:232:10:
    | |./arch/arm64/include/asm/pgtable-types.h:28:24:
    | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
    | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’

    This can only occur when the kernel cannot allocate a page, and so is
    unlikely to happen in practice before other systems start failing.

    We can avoid this by bailing out if pmd_alloc() fails, as we do earlier
    in the function if pud_alloc() fails.
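
    The bail-out pattern described above can be sketched in plain C. This is
    a userspace illustration under stated assumptions, not the arm64 code:
    alloc_level(), walk_alloc() and the fail_at parameter are hypothetical
    stand-ins for pud_alloc(), pmd_alloc() and pte_alloc_map().

    ```c
    #include <stddef.h>

    /* Backing storage for the three hypothetical table levels. */
    static int table_storage[3];

    /* Hypothetical stand-in for pud_alloc()/pmd_alloc(): returns NULL when
     * asked to simulate an allocation failure at this level. */
    static int *alloc_level(int level, int fail_at)
    {
        return (level == fail_at) ? NULL : &table_storage[level];
    }

    /* Walk three levels and return NULL as soon as any allocation fails, so
     * the next level never receives a NULL pointer. A negative fail_at means
     * that no allocation fails. */
    static int *walk_alloc(int fail_at)
    {
        int *pudp = alloc_level(0, fail_at);
        if (!pudp)
            return NULL;

        int *pmdp = alloc_level(1, fail_at);
        if (!pmdp)      /* the kind of check the fix adds */
            return NULL;

        *pmdp = 1;      /* models the dereference; pmdp is non-NULL here */
        return alloc_level(2, fail_at);
    }
    ```

    Without the second check, a failure at level 1 would reach the
    dereference with a NULL pointer, which is what the analyzer flagged.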

    Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
    Signed-off-by: Mark Rutland
    Reported-by: Kyrill Tkachov
    Cc: # 4.5.x-
    Cc: Will Deacon
    Signed-off-by: Catalin Marinas

    Mark Rutland
     
  • Stephen reported the following build warning on an ARM multi_v7_defconfig
    build with GCC 9.2.1:

    kernel/futex.c: In function 'do_futex':
    kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
    1676 | return oldval == cmparg;
    | ~~~~~~~^~~~~~~~~
    kernel/futex.c:1652:6: note: 'oldval' was declared here
    1652 | int oldval, ret;
    | ^~~~~~

    introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser()
    calling conventions change").

    While that change should not make any difference it confuses GCC which
    fails to work out that oldval is not referenced when the return value is
    not zero.

    GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the
    early return, the issue is with the assembly macros. GCC fails to detect
    that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT
    which makes oldval uninteresting. The store to the callsite supplied oldval
    pointer is conditional on ret == 0.

    The straightforward way to solve this is to make the store unconditional.

    Aside from addressing the build warning this makes sense anyway because it
    removes the conditional from the fastpath. In the error case the stored
    value is uninteresting and the extra store does not matter at all.
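
    A minimal sketch of the idea in portable C, assuming a simplified model
    of the operation: futex_op_model() and its fault parameter are
    hypothetical and stand in for the assembly macros, which are not
    reproduced here.

    ```c
    #include <errno.h>

    /* Hypothetical model of an atomic user-space operation: on success it
     * reads the old value and returns 0, on a fault it returns -EFAULT. */
    static int futex_op_model(int fault, int cur_val, int *oval)
    {
        int tmp = 0;
        int ret = fault ? -EFAULT : 0;

        if (!fault)
            tmp = cur_val;

        /* Before the fix this store was guarded by "if (!ret)". Storing
         * unconditionally means *oval is written on every path; in the
         * error case the caller ignores the stored value anyway. */
        *oval = tmp;
        return ret;
    }
    ```

    The caller still has to check the return value before trusting *oval;
    only the store itself has become unconditional.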

    Reported-by: Stephen Rothwell
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a potential scheduling latency problem for the algorithms
    used by WireGuard"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: arch/nhpoly1305 - process in explicit 4k chunks
    crypto: arch/lib - limit simd usage to 4k chunks

    Linus Torvalds
     

06 May, 2020

5 commits

  • …it/kvms390/linux into HEAD

    KVM: s390: Fix for running nested under z/VM

    There are circumstances when running nested under z/VM that would trigger a
    WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this
    code more robust and flexible, but just returning instead of WARNING makes
    the guest bootable again.

    Paolo Bonzini
     
  • KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared
    as supported. My wild guess is that userspaces like QEMU are using "#ifdef
    KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be
    wrong because the compilation host may not be the runtime host.

    The userspace might still want to keep the old "#ifdef" though to not break the
    guest debug on old kernels.

    Signed-off-by: Peter Xu
    Message-Id:
    [Do the same for PPC and s390. - Paolo]
    Signed-off-by: Paolo Bonzini

    Peter Xu
     
  • Using CPUID data can be useful for the processor compatibility
    check, but that's it. Using it to compute guest-reserved bits
    can have both false positives (such as LA57 and UMIP which we
    are already handling) and false negatives: in particular, with
    this patch we no longer allow a KVM guest to set CR4.PKE
    when CR4.PKE is clear on the host.

    Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU")
    Reported-by: Jim Mattson
    Tested-by: Jim Mattson
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so
    that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the
    RSB has always clobbered RFLAGS, its current incarnation just happens to
    clear CF and ZF in the process. Relying on the macro to clear CF and
    ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation:
    Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such
    that the ZF flag is always set.

    Reported-by: Qian Cai
    Cc: Rick Edgecombe
    Cc: Peter Zijlstra (Intel)
    Cc: Josh Poimboeuf
    Cc: stable@vger.kernel.org
    Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit")
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • This patch removes the unused functions set_kernel_text_rw/ro.
    Currently, they are not invoked from anywhere and no other architecture
    (except arm) uses this code. Even on ARM, these functions are not invoked
    from anywhere currently.

    Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support")
    Signed-off-by: Atish Patra
    Reviewed-by: Zong Li
    Signed-off-by: Palmer Dabbelt

    Atish Patra
     

05 May, 2020

10 commits

  • In LPAR we will only get an intercept for FC==3 for the PQAP
    instruction. Running nested under z/VM can result in other intercepts as
    well as ECA_APIE is an effective bit: If one hypervisor layer has
    turned this bit off, the end result will be that we will get intercepts for
    all function codes. Usually the first one will be a query like PQAP(QCI).
    So the WARN_ON_ONCE is not right. Let us simply remove it.

    Cc: Pierre Morel
    Cc: Tony Krowiak
    Cc: stable@vger.kernel.org # v5.3+
    Fixes: e5282de93105 ("s390: ap: kvm: add PQAP interception for AQIC")
    Link: https://lore.kernel.org/kvm/20200505083515.2720-1-borntraeger@de.ibm.com
    Reported-by: Qian Cai
    Signed-off-by: Christian Borntraeger
    Reviewed-by: David Hildenbrand
    Reviewed-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger

    Christian Borntraeger
     
  • Put __cpu_up_stack_pointer and __cpu_up_task_pointer in the data section.
    Currently, these two variables are put in the bss section, so there is a
    potential risk that secondary harts read an uninitialized value before
    the main hart finishes clearing the bss. In this case, all secondary
    harts would pass the waiting loop and enable the MMU before the main hart
    has set up the page table.

    This issue happens on random booting of multiple harts, which means
    it will manifest for BBL and OpenSBI v0.6 (or older version). In OpenSBI
    v0.7 (or higher version), we have HSM extension so all the secondary harts
    are brought-up by Linux kernel in an orderly fashion. This means we don't
    need this change for OpenSBI v0.7 (or higher version).

    Signed-off-by: Zong Li
    Reviewed-by: Greentime Hu
    Reviewed-by: Anup Patel
    Reviewed-by: Atish Patra
    Signed-off-by: Palmer Dabbelt

    Zong Li
     
  • The Linux note in the vdso allows glibc to check the running kernel
    version without having to issue the uname syscall.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Palmer Dabbelt

    Andreas Schwab
     
  • The current max_pfn is equal to zero. As a result, users cannot get some
    page information through /proc, such as kpagecount, in the v5.6 kernel
    because of new sanity checks. The following messages are displayed by the
    stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t
    1" on a HiFive Unleashed board.

    # stress-ng --verbose --physpage 1 -t 1
    stress-ng: debug: [109] 4 processors online, 4 processors configured
    stress-ng: info: [109] dispatching hogs: 1 physpage
    stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0
    stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0
    stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found
    stress-ng: debug: [109] cache allocate: default cache size: 2048K
    stress-ng: debug: [109] starting stressors
    stress-ng: debug: [109] 1 stressor spawned
    stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0)
    stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success)
    stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
    ...
    stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
    stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0)
    stress-ng: debug: [109] process [110] terminated
    stress-ng: info: [109] successful run completed in 1.00s
    #

    After applying this patch, the kernel can pass the test.

    # stress-ng --verbose --physpage 1 -t 1
    stress-ng: debug: [104] 4 processors online, 4 processors configured
    stress-ng: info: [104] dispatching hogs: 1 physpage
    stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs
    stress-ng: debug: [104] cache allocate: default cache size: 2048K
    stress-ng: debug: [104] starting stressors
    stress-ng: debug: [104] 1 stressor spawned
    stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0)
    stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0)
    stress-ng: debug: [104] process [105] terminated
    stress-ng: info: [104] successful run completed in 1.01s
    #

    Cc: stable@vger.kernel.org
    Signed-off-by: Vincent Chen
    Reviewed-by: Anup Patel
    Reviewed-by: Yash Shah
    Tested-by: Yash Shah
    Signed-off-by: Palmer Dabbelt

    Vincent Chen
     
  • The RISC-V N-extension is still in draft state hence remove
    N-extension related defines from asm/csr.h.

    Signed-off-by: Anup Patel
    Signed-off-by: Palmer Dabbelt

    Anup Patel
     
  • This patch adds the riscv_isa bitmap, which represents Host ISA features
    common across all Host CPUs. The riscv_isa is not the same as elf_hwcap
    because elf_hwcap will only have ISA features relevant for user-space
    apps whereas riscv_isa will have ISA features relevant to both kernel
    and user-space apps.

    One use case for the riscv_isa bitmap is in the KVM hypervisor, where
    we will use it to do the following operations:

    1. Check whether hypervisor extension is available
    2. Find ISA features that need to be virtualized (e.g. floating
    point support, vector extension, etc.)
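
    The "common across all Host CPUs" idea can be sketched as the
    intersection of per-CPU bitmaps. This is an illustration only: the
    helper names below are hypothetical, and the encoding (bit
    letter - 'a' for a single-letter extension) is an assumption made for
    the sketch rather than something taken from the patch.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed encoding: bit (letter - 'a') stands for a one-letter extension. */
    static uint32_t isa_bit(char ext)
    {
        return UINT32_C(1) << (ext - 'a');
    }

    /* Build a bitmap from a string of lowercase extension letters. */
    static uint32_t isa_from_string(const char *letters)
    {
        uint32_t mask = 0;

        for (; *letters; letters++)
            if (*letters >= 'a' && *letters <= 'z')
                mask |= isa_bit(*letters);
        return mask;
    }

    /* Features common to all CPUs: the AND of every per-CPU bitmap. */
    static uint32_t isa_common(const uint32_t *per_cpu, size_t ncpus)
    {
        uint32_t common = ncpus ? ~UINT32_C(0) : 0;

        for (size_t i = 0; i < ncpus; i++)
            common &= per_cpu[i];
        return common;
    }

    /* Check whether a one-letter extension is present in a bitmap. */
    static bool isa_has(uint32_t isa, char ext)
    {
        return (isa & isa_bit(ext)) != 0;
    }
    ```

    With this model, a hypervisor-style check for the 'h' extension only
    succeeds when every CPU reports it.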

    Signed-off-by: Anup Patel
    Signed-off-by: Atish Patra
    Reviewed-by: Alexander Graf
    Signed-off-by: Palmer Dabbelt

    Anup Patel
     
  • The riscv_cpuid_to_hartid_mask() API should be exported to allow
    building KVM RISC-V as a loadable module.

    Signed-off-by: Anup Patel
    Reviewed-by: Palmer Dabbelt
    Signed-off-by: Palmer Dabbelt

    Anup Patel
     
  • Commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
    the following infinite loop:

    BUG: stack guard page was hit at 000000008f595917 \
    (stack is 00000000bdefe5a4..00000000ae2b06f5)
    kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
    RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
    Call Trace:
    irqfd_resampler_ack+0x32/0x90 [kvm]
    kvm_notify_acked_irq+0x62/0xd0 [kvm]
    kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
    ioapic_set_irq+0x20e/0x240 [kvm]
    kvm_ioapic_set_irq+0x5c/0x80 [kvm]
    kvm_set_irq+0xbb/0x160 [kvm]
    ? kvm_hv_set_sint+0x20/0x20 [kvm]
    irqfd_resampler_ack+0x32/0x90 [kvm]
    kvm_notify_acked_irq+0x62/0xd0 [kvm]
    kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
    ioapic_set_irq+0x20e/0x240 [kvm]
    kvm_ioapic_set_irq+0x5c/0x80 [kvm]
    kvm_set_irq+0xbb/0x160 [kvm]
    ? kvm_hv_set_sint+0x20/0x20 [kvm]
    ....

    The re-entrancy happens because the irq state is the OR of
    the interrupt state and the resamplefd state. That is, we don't
    want to show the state as 0 until we've had a chance to set the
    resamplefd. But if the interrupt has _not_ gone low then
    ioapic_set_irq is invoked again, causing an infinite loop.

    This can only happen for a level-triggered interrupt, otherwise
    irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
    and then low. Fortunately, in the case of level-triggered interrupts the
    VMEXIT already happens because TMR is set. Thus, fix the bug by
    restricting the lazy invocation of the ack notifier to edge-triggered
    interrupts, the only ones that need it.

    Tested-by: Suravee Suthikulpanit
    Reported-by: borisvk@bstnet.org
    Suggested-by: Paolo Bonzini
    Link: https://www.spinics.net/lists/kvm/msg213512.html
    Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Current logic incorrectly uses the enum ioapic_irq_destination_types
    to check the posted interrupt destination types. However, the value was
    set using APIC_DM_XXX macros, which are left-shifted by 8 bits.

    Fix this by using APIC_DM_FIXED and APIC_DM_LOWEST instead.
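
    The mismatch can be shown with a small model. The MODEL_* constants
    below are hypothetical names chosen for this sketch; their values only
    mirror the description above (enum-style values 0 and 1 versus the same
    values left-shifted by 8 bits) and are not copied from the kernel
    headers.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Enum-style destination types (unshifted), as in the buggy check. */
    #define MODEL_DEST_FIXED     UINT32_C(0)
    #define MODEL_DEST_LOWEST    UINT32_C(1)

    /* APIC_DM-style delivery modes: the same values shifted left by 8. */
    #define MODEL_APIC_DM_FIXED  (UINT32_C(0) << 8)
    #define MODEL_APIC_DM_LOWEST (UINT32_C(1) << 8)

    /* Buggy check: compares a shifted value against unshifted constants. */
    static bool postable_buggy(uint32_t delivery_mode)
    {
        return delivery_mode == MODEL_DEST_FIXED ||
               delivery_mode == MODEL_DEST_LOWEST;
    }

    /* Fixed check: compares against constants in the same encoding. */
    static bool postable_fixed(uint32_t delivery_mode)
    {
        return delivery_mode == MODEL_APIC_DM_FIXED ||
               delivery_mode == MODEL_APIC_DM_LOWEST;
    }
    ```

    In this model the fixed mode is zero in both encodings, so only the
    lowest-priority mode exposes the difference between the two checks.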

    Fixes: (fdcf75621375 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
    Cc: Alexander Graf
    Signed-off-by: Suravee Suthikulpanit
    Message-Id:
    Reviewed-by: Maxim Levitsky
    Tested-by: Maxim Levitsky
    Signed-off-by: Paolo Bonzini

    Suravee Suthikulpanit
     
  • …kvmarm/kvmarm into kvm-master

    KVM/arm fixes for Linux 5.7, take #2

    - Fix compilation with Clang
    - Correctly initialize GICv4.1 in the absence of a virtual ITS
    - Move SP_EL0 save/restore to the guest entry/exit code
    - Handle PC wrap around on 32bit guests, and narrow all 32bit
    registers on userspace access

    Paolo Bonzini
     

04 May, 2020

2 commits

  • The corresponding code was added for VMX in commit 42dbaa5a057
    ("KVM: x86: Virtualize debug registers", 2008-12-15) but never for AMD.
    Fix this.

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • Use BUG() in the impossible-to-hit default case when switching on the
    scope of INVEPT to squash a warning with clang 11 due to clang treating
    the BUG_ON() as conditional.

    >> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free'
    is used uninitialized whenever 'if' condition is false
    [-Wsometimes-uninitialized]
    BUG_ON(1);

    Reported-by: kbuild test robot
    Fixes: ce8fe7b77bd8 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT")
    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

03 May, 2020

1 commit

  • Fix the following warnings seen with !CONFIG_MODULES:

    arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable]
    29 | static struct orc_entry *cur_orc_table = __start_orc_unwind;
    | ^~~~~~~~~~~~~
    arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable]
    28 | static int *cur_orc_ip_table = __start_orc_unwind_ip;
    | ^~~~~~~~~~~~~~~~

    Fixes: 153eb2223c79 ("x86/unwind/orc: Convert global variables to static")
    Reported-by: Stephen Rothwell
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Linux Next Mailing List
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble

    Josh Poimboeuf
     

02 May, 2020

3 commits

  • Pull arm64 fix from Catalin Marinas:
    "Add -fasynchronous-unwind-tables to the vDSO CFLAGS"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: vdso: Add -fasynchronous-unwind-tables to cflags

    Linus Torvalds
     
  • Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a
    lockdep splat due to a lock order violation between hrtimer_base::lock and
    console_sem, when the 'once' condition is reset via
    /sys/kernel/debug/clear_warn_once after boot.

    The initial printk cannot trigger this because that happens during boot
    when the local APIC timer is set up on the boot CPU.

    Prevent it by moving the printk to a place which is guaranteed to be only
    called once during boot.

    Mark the deadline timer check related functions and data __init while at
    it.

    Reported-by: Leon Romanovsky
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and
    simply defines __abi_sys_$NAME as alias of __do_sys_$NAME.

    As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match
    the declared trace event name.

    See also commit 1c758a2202a6 ("tracing/x86: Update syscall trace events to
    handle new prefixed syscall func names").

    Add __do_sys_ to the valid prefixes which are checked in
    arch_syscall_match_sym_name().
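
    Prefix-based matching of this kind can be sketched as below. This is
    not the kernel's arch_syscall_match_sym_name(): match_syscall_sym() is
    a hypothetical helper, and every prefix in the table other than
    "__do_sys_" is an assumption added to make the example complete.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Return true if sym is name with its "sys_" prefix replaced by one of
     * the accepted prefixes, e.g. "__do_sys_getpid" for "sys_getpid". */
    static bool match_syscall_sym(const char *sym, const char *name)
    {
        /* Assumed prefix list; only "__do_sys_" comes from the text above. */
        static const char *const prefixes[] = {
            "__x64_sys_", "__ia32_sys_", "__do_sys_", "sys_",
        };

        if (strncmp(name, "sys_", 4) != 0)
            return false;
        name += 4;  /* compare only the part after "sys_" */

        for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
            size_t len = strlen(prefixes[i]);

            if (strncmp(sym, prefixes[i], len) == 0 &&
                strcmp(sym + len, name) == 0)
                return true;
        }
        return false;
    }
    ```

    Dropping "__do_sys_" from the table reproduces the failure described
    above: the symbol returned for an argument-less syscall no longer
    matches any trace event name.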

    Fixes: d2b5de495ee9 ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Thomas Gleixner
    Acked-by: Steven Rostedt (VMware)
    Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz

    Konstantin Khlebnikov
     

01 May, 2020

3 commits

  • In the unlikely event that a 32bit vcpu traps into the hypervisor
    on an instruction that is located right at the end of the 32bit
    range, the emulation of that instruction is going to increment
    PC past the 32bit range. This isn't great, as userspace can then
    observe this value and get a bit confused.

    Conversely, userspace can do things like (in the context of a 64bit
    guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
    set PC to a 64bit value, change PSTATE to AArch32-USR, and observe
    that PC hasn't been truncated. More confusion.

    Fix both by:
    - truncating PC increments for 32bit guests
    - sanitizing all 32bit regs every time a core reg is changed by
    userspace and PSTATE indicates a 32bit mode.
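
    Both halves of the fix amount to narrowing a 64bit value to 32 bits
    when the vcpu is in a 32bit mode. A minimal sketch, assuming a
    simplified vcpu model: struct model_vcpu and both helpers are
    hypothetical and do not correspond to KVM's data structures.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified vcpu state. */
    struct model_vcpu {
        uint64_t pc;
        bool is_32bit;  /* models "PSTATE indicates a 32bit mode" */
    };

    /* Advance PC past an emulated instruction; wrap at 4GiB for 32bit. */
    static void model_skip_instr(struct model_vcpu *vcpu, unsigned int len)
    {
        vcpu->pc += len;
        if (vcpu->is_32bit)
            vcpu->pc = (uint32_t)vcpu->pc;
    }

    /* Userspace write of PC: narrow the value when in a 32bit mode. */
    static void model_set_pc_from_user(struct model_vcpu *vcpu, uint64_t pc)
    {
        vcpu->pc = vcpu->is_32bit ? (uint32_t)pc : pc;
    }
    ```

    In this model, a 32bit vcpu at 0xfffffffc that skips a 4-byte
    instruction ends up at 0 instead of 0x100000000.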

    Cc: stable@vger.kernel.org
    Acked-by: Will Deacon
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • As an optimization, cpa_flush() was changed to optionally only flush
    the range in @cpa if it was small enough. However, this range does
    not include any direct map aliases changed in cpa_process_alias(). So
    small set_memory_() calls that touch that alias don't get the direct
    map changes flushed. This situation can happen when the virtual
    address taking variants are passed an address in vmalloc or modules
    space.

    In these cases, force a full TLB flush.

    Note this issue does not extend to cases where the set_memory_() calls are
    passed a direct map address, or page array, etc, as the primary target. In
    those cases the direct map would be flushed.

    Fixes: 935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net

    Rick Edgecombe
     
  • On arm64 Linux, gcc has used -fasynchronous-unwind-tables -funwind-tables
    by default since gcc-8, so now the de facto platform ABI is to allow
    unwinding from async signal handlers.

    However on bare metal targets (aarch64-none-elf), and on old gcc,
    async and sync unwind tables are not enabled by default to avoid
    runtime memory costs.

    This means if linux is built with a baremetal toolchain the vdso.so
    may not have unwind tables which breaks the gcc platform ABI guarantee
    in userspace.

    Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o
    cflags to address the ABI change.

    Fixes: 28b1a824a4f4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
    Cc: Will Deacon
    Reported-by: Szabolcs Nagy
    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Catalin Marinas

    Vincenzo Frascino