06 Sep, 2019

1 commit


03 Sep, 2019

1 commit

  • When the 'start' parameter is >= 0xFF000000 on 32-bit
    systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
    fill_gva_list() gets into an infinite loop.

    With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
    and always compares as less than end. Memory is filled with
    guest virtual addresses until the system crashes.

    Fix this by never incrementing 'cur' to be larger than 'end'.
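
    The wrap-around and the clamped increment can be illustrated with a
    small user-space sketch. This is not the kernel's fill_gva_list():
    FLUSH_UNIT, count_chunks_buggy() and count_chunks_fixed() are
    hypothetical names, 32-bit arithmetic is modeled with uint32_t, the
    16 MiB unit is an assumption, and an iteration cap stands in for the
    crash.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flush granularity (16 MiB); stands in for HV_TLB_FLUSH_UNIT. */
#define FLUSH_UNIT 0x01000000u

/*
 * Buggy pattern: 'cur' wraps to a small value after the addition, so
 * 'cur < end' stays true. 'limit' caps the iterations so that this
 * sketch terminates instead of looping forever.
 */
unsigned count_chunks_buggy(uint32_t start, uint32_t end, unsigned limit)
{
    uint32_t cur = start;
    unsigned n = 0;

    do {
        n++;
        cur += FLUSH_UNIT;      /* may wrap around */
    } while (cur < end && n < limit);

    return n;
}

/* Fixed pattern: advance by at most the remaining distance to 'end'. */
unsigned count_chunks_fixed(uint32_t start, uint32_t end)
{
    uint32_t cur = start;
    unsigned n = 0;

    assert(start < end);
    do {
        uint32_t diff = end - cur;

        n++;
        cur += diff < FLUSH_UNIT ? diff : FLUSH_UNIT;   /* never passes 'end' */
    } while (cur < end);

    return n;
}
```

    With start = 0xFF000000 and end = 0xFF001000 the buggy variant only
    stops at the cap, while the fixed variant finishes after one chunk.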

    Reported-by: Jong Hyun Park
    Signed-off-by: Tianyu Lan
    Reviewed-by: Michael Kelley
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 2ffd9e33ce4a ("x86/hyper-v: Use hypercall for remote TLB flush")
    Signed-off-by: Ingo Molnar

    Tianyu Lan
     

02 Sep, 2019

4 commits

  • Identical to __put_user(); the __get_user() argument evaluation will also
    leak UBSAN crud into the __uaccess_begin() / __uaccess_end() region.
    While uncommon this was observed to happen for:

    drivers/xen/gntdev.c: if (__get_user(old_status, batch->status[i]))

    where UBSAN added array bound checking.
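
    The ordering problem can be modeled in user space. In the sketch
    below, everything is hypothetical: in_region stands in for the
    window between __uaccess_begin() and __uaccess_end(),
    checked_slot() stands in for an instrumented argument expression,
    and GET_LEAKY()/GET_FIXED() are illustrative macros, not the
    kernel's __get_user().

```c
#include <assert.h>
#include <stdbool.h>

static bool in_region;          /* models the open uaccess window */
static bool evaluated_inside;   /* set if an argument ran inside the window */

static void region_begin(void) { in_region = true; }
static void region_end(void)   { in_region = false; }

/* Models an argument expression carrying instrumentation (e.g. a bounds check). */
int *checked_slot(int *array, int i)
{
    if (in_region)
        evaluated_inside = true;
    return &array[i];
}

/* Leaky: the pointer expression is expanded inside the window. */
#define GET_LEAKY(dst, ptr_expr) \
    do { region_begin(); (dst) = *(ptr_expr); region_end(); } while (0)

/* Fixed: evaluate the pointer expression first, then open the window. */
#define GET_FIXED(dst, ptr_expr) \
    do { int *p_ = (ptr_expr); region_begin(); (dst) = *p_; region_end(); } while (0)

/* Returns true if the argument expression was evaluated inside the window. */
bool argument_evaluated_inside(bool use_fixed)
{
    int data[2] = { 10, 20 };
    int value = 0;

    evaluated_inside = false;
    if (use_fixed) {
        GET_FIXED(value, checked_slot(data, 1));
    } else {
        GET_LEAKY(value, checked_slot(data, 1));
    }
    assert(value == 20);
    return evaluated_inside;
}
```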

    This complements commit:

    6ae865615fc4 ("x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation")

    Tested-by: Sedat Dilek
    Reported-by: Randy Dunlap
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Josh Poimboeuf
    Reviewed-by: Thomas Gleixner
    Cc: broonie@kernel.org
    Cc: sfr@canb.auug.org.au
    Cc: akpm@linux-foundation.org
    Cc: Randy Dunlap
    Cc: mhocko@suse.cz
    Cc: Josh Poimboeuf
    Link: https://lkml.kernel.org/r/20190829082445.GM2369@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • Commit

    a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else")

    now zeroes the secure boot setting information (enabled/disabled/...)
    passed by the boot loader or by the kernel's EFI handover mechanism.

    The problem manifests itself with signed kernels using the EFI handoff
    protocol with grub and the kernel loses the information whether secure
    boot is enabled in the firmware, i.e., the log message "Secure boot
    enabled" becomes "Secure boot could not be determined".

    efi_main() in arch/x86/boot/compressed/eboot.c sets this field early but it
    is subsequently zeroed by the above referenced commit.

    Include boot_params.secure_boot in the preserve field list.

    [ bp: restructure commit message and massage. ]

    Fixes: a90118c445cc ("x86/boot: Save fields explicitly, zero out everything else")
    Signed-off-by: John S. Gruber
    Signed-off-by: Borislav Petkov
    Reviewed-by: John Hubbard
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Juergen Gross
    Cc: Mark Brown
    Cc: stable
    Cc: Thomas Gleixner
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/CAPotdmSPExAuQcy9iAHqX3js_fc4mMLQOTr5RBGvizyCOPcTQQ@mail.gmail.com

    John S. Gruber
     
  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes for x86:

    - Fix the bogus detection of 32bit user mode for uretprobes which
    caused corruption of the user return address resulting in
    application crashes. In the uprobes handler in_ia32_syscall() is
    obviously always returning false on a 64bit kernel. Use
    user_64bit_mode() instead which works correctly.

    - Prevent large page splitting when ftrace flips RW/RO on the kernel
    text which caused iTLB performance issues. Ftrace wants to be
    converted to text_poke() which avoids the problem, but for now
    allow large page preservation in the static protections check when
    the change request spans a full large page.

    - Prevent arch_dynirq_lower_bound() from returning 0 when the IOAPIC
    is configured via device tree. In the device tree case the GSI 1:1
    mapping is meaningless therefore the lower bound which protects the
    GSI range on ACPI machines is irrelevant. Return the lower bound
    which the core hands to the function instead of blindly returning 0
    which causes the core to allocate the invalid virtual interrupt
    number 0 which in turn prevents all drivers from allocating and
    requesting an interrupt.

    - Remove the bogus initialization of LDR and DFR in the 32bit bigsmp
    APIC driver. That uses physical destination mode where LDR/DFR are
    ignored, but the initialization and the missing clear of LDR caused
    the APIC to be left in an inconsistent state on kexec/reboot.

    - Clear LDR when clearing the APIC registers so the APIC is in a well
    defined state.

    - Initialize variables properly in the find_trampoline_placement()
    code.

    - Silence GCC9 build warning for the real mode part of the build"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
    x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
    x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
    x86/apic: Include the LDR when clearing out APIC registers
    x86/apic: Do not initialize LDR and DFR for bigsmp
    uprobes/x86: Fix detection of 32-bit user mode
    x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "Two fixes for perf x86 hardware implementations:

    - Restrict the period on Nehalem machines to prevent perf from
    hogging the CPU

    - Prevent the AMD IBS driver from overwriting the hardware controlled
    and pre-seeded reserved bits (0-6) in the count register which
    caused a sample bias for dispatched micro-ops"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
    perf/x86/intel: Restrict period on Nehalem

    Linus Torvalds
     

01 Sep, 2019

2 commits

  • Pull tracing fixes from Steven Rostedt:
    "Small fixes and minor cleanups for tracing:

    - Make exported ftrace function not static

    - Fix NULL pointer dereference in reading probes as they are created

    - Fix NULL pointer dereference in k/uprobe clean up path

    - Various documentation fixes"

    * tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Correct kdoc formats
    ftrace/x86: Remove mcount() declaration
    tracing/probe: Fix null pointer dereference
    tracing: Make exported ftrace_set_clr_event non-static
    ftrace: Check for successful allocation of hash
    ftrace: Check for empty hash and comment the race with registering probes
    ftrace: Fix NULL pointer dereference in t_probe_next()

    Linus Torvalds
     
  • Pull RISC-V fix from Paul Walmsley:
    "One significant fix for 32-bit RISC-V systems:

    Fix the RV32 memory map to prevent userspace from corrupting the
    FIXMAP area. Without this patch, the system can crash very early
    during the boot"

    * tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
    RISC-V: Fix FIXMAP area corruption on RV32 systems

    Linus Torvalds
     

31 Aug, 2019

4 commits

  • Pull KVM fixes from Radim Krčmář:
    "PPC:
    - Fix bug which could leave locks held in the host on return to a
    guest.

    x86:
    - Prevent infinitely looping emulation of a failing syscall while
    single stepping.

    - Do not crash the host when nesting is disabled"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: Don't update RIP or do single-step on faulting emulation
    KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
    KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling

    Linus Torvalds
     
  • Commit 562e14f72292 ("ftrace/x86: Remove mcount support") removed the
    support for using mcount, so we could remove the mcount() declaration
    to clean up.

    Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian

    Signed-off-by: Jisheng Zhang
    Signed-off-by: Steven Rostedt (VMware)

    Jisheng Zhang
     
  • Pull ARM fixes from Russell King:
    "Three fixes for ARM this time around:

    - A fix for update_sections_early() to cope with NULL ->mm pointers.

    - A correction to the backtrace code to allow proper backtraces.

    - Reinforcement of pfn_valid() with PFNs >= 4GiB"

    * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
    ARM: 8901/1: add a criteria for pfn_valid of arm
    ARM: 8897/1: check stmfd instruction using right shift
    ARM: 8874/1: mm: only adjust sections of valid mm structures

    Linus Torvalds
     
  • Pull ARM SoC fixes from Arnd Bergmann:
    "The majority of the fixes this time are for OMAP hardware, here is a
    breakdown of the significant changes:

    Various device tree bug fixes:
    - TI am57xx boards need a voltage level fix to avoid damaging SD
    cards
    - vf610-bk4 fails to detect its flash due to an incorrect description
    - meson-g12a USB phy configuration fails
    - meson-g12b reboot should not power off the SD card
    - Some corrections for apparently harmless differences from the
    documentation.

    Regression fixes:
    - ams-delta FIQ interrupts broke in 5.3
    - TI am3/am4 mmc controllers broke in 5.2

    The logic_pio driver (used on some Huawei ARM servers) got a few bug
    fixes for reliability.

    And a couple of compile-time warning fixes"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
    soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
    soc: ti: pm33xx: Make two symbols static
    soc: ti: pm33xx: Fix static checker warnings
    ARM: OMAP: dma: Mark expected switch fall-throughs
    ARM: dts: Fix incomplete dts data for am3 and am4 mmc
    bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
    ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
    ARM: dts: dra74x: Fix iodelay configuration for mmc3
    ARM: dts: am335x: Fix UARTs length
    ARM: OMAP2+: Fix omap4 errata warning on other SoCs
    bus: hisi_lpc: Add .remove method to avoid driver unbind crash
    bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
    lib: logic_pio: Add logic_pio_unregister_range()
    lib: logic_pio: Avoid possible overlap for unregistering regions
    lib: logic_pio: Fix RCU usage
    arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
    arm64: dts: meson-g12a-sei510: enable IR controller
    arm64: dts: meson-g12a: add missing dwc2 phy-names
    ARM: dts: vf610-bk4: Fix qspi node description
    ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
    ...

    Linus Torvalds
     

30 Aug, 2019

6 commits

  • When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
    sample bias, IBS hardware preloads the least significant 7 bits of
    current count (IbsOpCurCnt) with random values, such that, after the
    interrupt is handled and counting resumes, the next sample taken
    will be slightly perturbed.

    The current count bitfield is in the IBS execution control h/w register,
    alongside the maximum count field.

    Currently, the IBS driver writes that register with the maximum count,
    leaving zeroes to fill the current count field, thereby overwriting
    the random bits the hardware preloaded for itself.

    Fix the driver to actually retain and carry those random bits from the
    read of the IBS control register, through to its write, instead of
    overwriting the lower current count bits with zeroes.
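
    The difference between the two register writes can be sketched
    with plain bit masks. The field positions below (maximum count in
    bits 0-15, the seven preloaded current-count bits starting at bit
    32) and the names MAX_CNT_MASK, CUR_CNT_RAND_MASK,
    build_ctl_buggy() and build_ctl_fixed() are assumptions made for
    illustration; they are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CNT_MASK      0x000000000000FFFFull   /* assumed: maximum count field */
#define CUR_CNT_RAND_MASK (0x7Full << 32)         /* assumed: low 7 bits of current count */

/* Buggy: only the maximum count is written, the preloaded bits become zero. */
uint64_t build_ctl_buggy(uint64_t read_val, uint64_t max_cnt)
{
    (void)read_val;
    return max_cnt & MAX_CNT_MASK;
}

/* Fixed: carry the preloaded low bits from the value read back into the write. */
uint64_t build_ctl_fixed(uint64_t read_val, uint64_t max_cnt)
{
    assert((max_cnt & ~MAX_CNT_MASK) == 0);     /* caller passes a field-sized value */
    return (max_cnt & MAX_CNT_MASK) | (read_val & CUR_CNT_RAND_MASK);
}
```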

    Tested with:

    perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0

    'perf annotate' output before:

    15.70 65: addsd %xmm0,%xmm1
    17.30 add $0x1,%rax
    15.88 cmp %rdx,%rax
    je 82
    17.32 72: test $0x1,%al
    jne 7c
    7.52 movapd %xmm1,%xmm0
    5.90 jmp 65
    8.23 7c: sqrtsd %xmm1,%xmm0
    12.15 jmp 65

    'perf annotate' output after:

    16.63 65: addsd %xmm0,%xmm1
    16.82 add $0x1,%rax
    16.81 cmp %rdx,%rax
    je 82
    16.69 72: test $0x1,%al
    jne 7c
    8.30 movapd %xmm1,%xmm0
    8.13 jmp 65
    8.24 7c: sqrtsd %xmm1,%xmm0
    8.39 jmp 65

    Tested on Family 15h and 17h machines.

    Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
    and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
    affect their operation.

    It is unknown why commit db98c5faf8cb ("perf/x86: Implement 64-bit
    counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
    field; the number of preloaded random bits has always been 7, AFAICT.

    Signed-off-by: Kim Phillips
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: "Arnaldo Carvalho de Melo"
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Thomas Gleixner
    Cc: "Borislav Petkov"
    Cc: Stephane Eranian
    Cc: Alexander Shishkin
    Cc: "Namhyung Kim"
    Cc: "H. Peter Anvin"
    Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com

    Kim Phillips
     
  • We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
    some cases when using perf:

    perfevents: irq loop stuck!
    WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
    ...
    RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
    ...
    Call Trace:

    ? perf_event_nmi_handler+0x2e/0x50
    ? intel_pmu_save_and_restart+0x50/0x50
    perf_event_nmi_handler+0x2e/0x50
    nmi_handle+0x6e/0x120
    default_do_nmi+0x3e/0x100
    do_nmi+0x102/0x160
    end_repeat_nmi+0x16/0x50
    ...
    ? native_write_msr+0x6/0x20
    ? native_write_msr+0x6/0x20

    intel_pmu_enable_event+0x1ce/0x1f0
    x86_pmu_start+0x78/0xa0
    x86_pmu_enable+0x252/0x310
    __perf_event_task_sched_in+0x181/0x190
    ? __switch_to_asm+0x41/0x70
    ? __switch_to_asm+0x35/0x70
    ? __switch_to_asm+0x41/0x70
    ? __switch_to_asm+0x35/0x70
    finish_task_switch+0x158/0x260
    __schedule+0x2f6/0x840
    ? hrtimer_start_range_ns+0x153/0x210
    schedule+0x32/0x80
    schedule_hrtimeout_range_clock+0x8a/0x100
    ? hrtimer_init+0x120/0x120
    ep_poll+0x2f7/0x3a0
    ? wake_up_q+0x60/0x60
    do_epoll_wait+0xa9/0xc0
    __x64_sys_epoll_wait+0x1a/0x20
    do_syscall_64+0x4e/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fdeb1e96c03
    ...
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: acme@kernel.org
    Cc: Josh Hunt
    Cc: bpuranda@akamai.com
    Cc: mingo@redhat.com
    Cc: jolsa@redhat.com
    Cc: tglx@linutronix.de
    Cc: namhyung@kernel.org
    Cc: alexander.shishkin@linux.intel.com
    Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com

    Josh Hunt
     
  • ftrace does not use text_poke() for enabling trace functionality. It uses
    its own mechanism and flips the whole kernel text to RW and back to RO.

    The CPA rework removed a loop based check of 4k pages which tried to
    preserve a large page by checking each 4k page whether the change would
    actually cover all pages in the large page.

    This resulted in endless loops for nothing as in testing it turned out that
    it actually never preserved anything. Of course testing failed to include
    ftrace, which is the one and only case which benefited from the 4k loop.

    As a consequence enabling function tracing or ftrace based kprobes results
    in a full 4k split of the kernel text, which affects iTLB performance.

    The kernel RO protection is the only valid case where this can actually
    preserve large pages.

    All other static protections (RO data, data NX, PCI, BIOS) are truly
    static. So a conflict with those protections which results in a split
    should only ever happen when a change of memory next to a protected region
    is attempted. But these conflicts are rightfully splitting the large page
    to preserve the protected regions. In fact a change to the protected
    regions themselves is a bug and is warned about.

    Add an exception for the static protection check for kernel text RO when
    the region to be changed spans a full large page, which allows the large
    mappings to be preserved. This also prevents the syslog from being spammed
    about CPA violations when ftrace is used.

    The exception needs to be removed once ftrace has switched over to
    text_poke(), which avoids the whole issue.

    Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
    Reported-by: Song Liu
    Signed-off-by: Thomas Gleixner
    Tested-by: Song Liu
    Reviewed-by: Song Liu
    Acked-by: Peter Zijlstra (Intel)
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de

    Thomas Gleixner
     
  • …kernel/git/gustavoars/linux

    Pull fallthrough fixes from Gustavo A. R. Silva:
    "Fix fall-through warnings on arc and nds32 for multiple
    configurations"

    * tag 'Wimplicit-fallthrough-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
    nds32: Mark expected switch fall-throughs
    ARC: unwind: Mark expected switch fall-through

    Linus Torvalds
     
  • Mark switch cases where we are expecting to fall through.

    This patch fixes the following warnings (Building: allmodconfig nds32):

    include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
    arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
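
    For reference, a minimal sketch of what such a marking can look
    like. steps_for_level() is a hypothetical function, and the comment
    form shown is one that GCC's -Wimplicit-fallthrough is documented to
    recognize at its default level; the actual patch may use a different
    spelling.

```c
#include <assert.h>

/* Higher levels deliberately run the steps of all lower levels as well. */
int steps_for_level(int level)
{
    int steps = 0;

    switch (level) {
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
        /* fall through */
    case 0:
        steps++;
        break;
    default:
        break;
    }

    return steps;
}
```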

    Reported-by: Michael Ellerman
    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     
  • Mark switch cases where we are expecting to fall through.

    This patch fixes the following warnings (Building: haps_hs_defconfig arc):

    arch/arc/kernel/unwind.c: In function ‘read_pointer’:
    ./include/linux/compiler.h:328:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
    do { \
    ^
    ./include/linux/compiler.h:338:2: note: in expansion of macro ‘__compiletime_assert’
    __compiletime_assert(condition, msg, prefix, suffix)
    ^~~~~~~~~~~~~~~~~~~~
    ./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
    _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
    ^~~~~~~~~~~~~~~~~~~
    ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
    ^~~~~~~~~~~~~~~~~~
    ./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
    BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
    ^~~~~~~~~~~~~~~~
    arch/arc/kernel/unwind.c:573:3: note: in expansion of macro ‘BUILD_BUG_ON’
    BUILD_BUG_ON(sizeof(u32) != sizeof(value));
    ^~~~~~~~~~~~
    arch/arc/kernel/unwind.c:575:2: note: here
    case DW_EH_PE_native:
    ^~~~

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

29 Aug, 2019

2 commits

  • pfn_valid() can return a wrong result when given an invalid pfn whose
    physical address exceeds BITS_PER_LONG bits, as the MSBs are trimmed when
    the pfn is shifted.

    The issue originally arose from the call stack below, which corresponds to
    an access of /proc/kpageflags from userspace with an invalid pfn parameter
    and leads to a kernel panic.

    [46886.723249] c7 [] (stable_page_flags) from []
    [46886.723264] c7 [] (kpageflags_read) from []
    [46886.723280] c7 [] (proc_reg_read) from []
    [46886.723290] c7 [] (__vfs_read) from []
    [46886.723301] c7 [] (vfs_read) from []
    [46886.723315] c7 [] (SyS_pread64) from []
    (ret_fast_syscall+0x0/0x28)
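
    The truncation can be reproduced in user space. In this sketch the
    32-bit address type phys32_t, the 256 MiB RAM range in
    addr_is_ram() and both pfn_valid_*() helpers are hypothetical
    stand-ins; the shift-back comparison is one way to detect the lost
    bits and is not claimed to be the exact check used by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint32_t phys32_t;      /* models a 32-bit physical address */

/* Stand-in for the memory map lookup: pretend RAM covers [0, 256 MiB). */
static int addr_is_ram(phys32_t addr)
{
    return addr < 0x10000000u;
}

/* Buggy: the upper bits of the pfn are lost in the shift. */
int pfn_valid_buggy(uint32_t pfn)
{
    phys32_t addr = (phys32_t)(pfn << PAGE_SHIFT);

    return addr_is_ram(addr);
}

/* Fixed: reject the pfn if shifting back does not reproduce it. */
int pfn_valid_fixed(uint32_t pfn)
{
    phys32_t addr = (phys32_t)(pfn << PAGE_SHIFT);

    if ((addr >> PAGE_SHIFT) != pfn)
        return 0;

    return addr_is_ram(addr);
}
```

    A pfn such as 0x00100001 aliases to address 0x1000 after the shift,
    so the buggy helper accepts it while the fixed one rejects it.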

    Signed-off-by: Zhaoyang Huang
    Signed-off-by: Russell King

    zhaoyang
     
  • Currently, the various virtual memory areas of Linux RISC-V are organized
    in increasing order of their virtual addresses as follows:
    1. User space area (This is the lowest area and starts at 0x0)
    2. FIXMAP area
    3. VMALLOC area
    4. Kernel area (This is the highest area and starts at PAGE_OFFSET)

    The maximum size of the user space area is represented by TASK_SIZE.

    On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
    user space area to overlap the FIXMAP area. This allows user space apps
    to potentially corrupt the FIXMAP area and kernel OF APIs will crash
    whenever they access corrupted FDT in the FIXMAP area.

    On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
    happen to overlap so we don't see any FIXMAP area corruptions.

    This patch fixes FIXMAP area corruption on RV32 systems by setting
    TASK_SIZE to FIXADDR_START. We also move FIXADDR_TOP, FIXADDR_SIZE,
    and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
    header includes.
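
    The overlap follows directly from the ordering of the areas. The
    sketch below uses made-up RV32-like addresses (the numeric values of
    FIXADDR_START and FIXADDR_TOP are assumptions, as is placing
    VMALLOC_START directly at FIXADDR_TOP) and a hypothetical helper
    user_overlaps_fixmap() to show why the choice of TASK_SIZE matters.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, low to high: user | FIXMAP | VMALLOC | kernel. */
#define FIXADDR_START 0x9DC00000u   /* assumed value */
#define FIXADDR_TOP   0x9E000000u   /* assumed value */
#define VMALLOC_START FIXADDR_TOP   /* assumed: VMALLOC begins where FIXMAP ends */

/*
 * User space spans [0, task_size). It reaches into the FIXMAP area
 * whenever it extends past FIXADDR_START.
 */
int user_overlaps_fixmap(uint32_t task_size)
{
    assert(FIXADDR_START < FIXADDR_TOP);
    return task_size > FIXADDR_START;
}
```

    Under these assumptions, a TASK_SIZE of VMALLOC_START overlaps the
    FIXMAP area, while a TASK_SIZE of FIXADDR_START does not.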

    Signed-off-by: Anup Patel
    Tested-by: Alistair Francis
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Paul Walmsley

    Anup Patel
     

28 Aug, 2019

5 commits

  • One of the very few warnings I have in the current build comes from
    arch/x86/boot/edd.c, where I get the following with a gcc9 build:

    arch/x86/boot/edd.c: In function ‘query_edd’:
    arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
    148 | mbrptr = boot_params.edd_mbr_sig_buffer;
    | ^~~~~~~~~~~

    This warning triggers because we throw away all the CFLAGS and then make
    a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
    added in the following commit is not present:

    6f303d60534c ("gcc-9: silence 'address-of-packed-member' warning")

    The simplest solution for now is to adjust the warning for this version
    of CFLAGS as well, but it would definitely make sense to examine whether
    REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
    in the compiler flags environment automatically.

    Signed-off-by: Linus Torvalds
    Acked-by: Borislav Petkov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Linus Torvalds
     
  • Don't advance RIP or inject a single-step #DB if emulation signals a
    fault. This logic applies to all state updates that are conditional on
    clean retirement of the emulation instruction, e.g. updating RFLAGS was
    previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
    EFLAGS on faulting emulation").

    Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
    ctxt->_eip until emulation "retires" anyways. Skipping #DB injection
    fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
    invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation
    overwriting the #UD with #DB and thus restarting the bad SYSCALL over
    and over.

    Cc: Nadav Amit
    Cc: stable@vger.kernel.org
    Reported-by: Andy Lutomirski
    Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
    Signed-off-by: Sean Christopherson
    Signed-off-by: Radim Krčmář

    Sean Christopherson
     
  • If kvm_intel is loaded with the nested=0 parameter, an attempt to perform
    KVM_GET_SUPPORTED_HV_CPUID results in an OOPS as the nested_get_evmcs_version
    hook in kvm_x86_ops is NULL (we assign it in nested_vmx_hardware_setup() and
    this only happens in case nested is enabled).

    Check that kvm_x86_ops->nested_get_evmcs_version is not NULL before
    calling it. With this, we can remove the stub from svm as it is no
    longer needed.
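
    The pattern of guarding an optional hook can be sketched as
    follows. struct ops, get_version and query_version() are
    hypothetical names, and returning 0 when the hook is absent is an
    assumption of this sketch rather than a statement about the KVM
    code.

```c
#include <assert.h>
#include <stddef.h>

struct ops {
    int (*get_version)(void);   /* optional hook; may be NULL */
};

static int version_one(void)
{
    return 1;
}

const struct ops ops_with_hook    = { .get_version = version_one };
const struct ops ops_without_hook = { .get_version = NULL };

/* Returns the version reported by the hook, or 0 if no hook is installed. */
int query_version(const struct ops *ops)
{
    assert(ops != NULL);

    if (!ops->get_version)
        return 0;               /* hook absent: do not dereference NULL */

    return ops->get_version();
}
```

    Because the caller checks the pointer, implementations that have
    nothing to report no longer need to provide an empty stub.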

    Fixes: e2e871ab2f02 ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper")
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Jim Mattson
    Signed-off-by: Radim Krčmář

    Vitaly Kuznetsov
     
  • Pull ARC updates from Vineet Gupta:

    - support for Edge Triggered IRQs in ARC IDU intc

    - other fixes here and there

    * tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    arc: prefer __section from compiler_attributes.h
    dt-bindings: IDU-intc: Add support for edge-triggered interrupts
    dt-bindings: IDU-intc: Clean up documentation
    ARCv2: IDU-intc: Add support for edge-triggered interrupts
    ARC: unwind: Mark expected switch fall-throughs
    ARC: [plat-hsdk]: allow to switch between AXI DMAC port configurations
    ARC: fix typo in setup_dma_ops log message
    ARCv2: entry: early return from exception need not clear U & DE bits

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Use 32-bit index for tail calls in s390 bpf JIT, from Ilya
    Leoshkevich.

    2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
    SMC from Jason Baron.

    3) ipv6_mc_may_pull() should return 0 for malformed packets, not
    -EINVAL. From Stefano Brivio.

    4) Don't forget to unpin umem xdp pages in error path of
    xdp_umem_reg(). From Ivan Khoronzhuk.

    5) Fix sta object leak in mac80211, from Johannes Berg.

    6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
    switches. From Florian Fainelli.

    7) Revert DMA sync removal from r8169 which was causing regressions on
    some MIPS Loongson platforms. From Heiner Kallweit.

    8) Use after free in flow dissector, from Jakub Sitnicki.

    9) Fix NULL derefs of net devices during ICMP processing across
    collect_md tunnels, from Hangbin Liu.

    10) proto_register() memory leaks, from Zhang Lin.

    11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
    from John Fastabend.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
    r8152: Set memory to all 0xFFs on failed reg reads
    openvswitch: Fix conntrack cache with timeout
    ipv4: mpls: fix mpls_xmit for iptunnel
    nexthop: Fix nexthop_num_path for blackhole nexthops
    net: rds: add service level support in rds-info
    net: route dump netlink NLM_F_MULTI flag missing
    s390/qeth: reject oversized SNMP requests
    sock: fix potential memory leak in proto_register()
    MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
    xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
    ipv4/icmp: fix rt dst dev null pointer dereference
    openvswitch: Fix log message in ovs conntrack
    bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
    bpf: fix use after free in prog symbol exposure
    bpf: fix precision tracking in presence of bpf2bpf calls
    flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
    Revert "r8169: remove not needed call to dma_sync_single_for_device"
    ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
    net/ncsi: Fix the payload copying for the request coming from Netlink
    qed: Add cleanup in qed_slowpath_start()
    ...

    Linus Torvalds
     

27 Aug, 2019

7 commits

  • KVM/PPC fix for 5.3

    - Fix bug which could leave locks locked in the host on return
    to a guest.

    Radim Krčmář
     
  • Gustavo noticed that 'new' can be left uninitialized if 'bios_start'
    happens to be less than or equal to 'entry->addr + entry->size'.

    Initialize the variable at the beginning of the iteration to the current value
    of 'bios_start'.
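
    A reduced sketch of the pattern, assuming a simplified loop:
    struct region, clamp_to_region() and the 4 KiB alignment are
    hypothetical and only loosely modeled on the description above, not
    on the actual find_trampoline_placement() code.

```c
#include <assert.h>
#include <stddef.h>

struct region {
    unsigned long addr;
    unsigned long size;
};

#define PAGE_SIZE 4096ul

/*
 * Walk the regions from last to first and return a page-aligned address
 * at or below 'limit' inside the first region that starts below it.
 * Returns 0 if no region qualifies.
 */
unsigned long clamp_to_region(const struct region *r, size_t n, unsigned long limit)
{
    for (size_t i = n; i-- > 0; ) {
        unsigned long new = limit;      /* initialized on every iteration */

        if (limit <= r[i].addr)
            continue;                   /* region lies entirely above limit */

        if (limit > r[i].addr + r[i].size)
            new = r[i].addr + r[i].size;
        /* otherwise limit <= addr + size and 'new' keeps the value of 'limit' */

        return new & ~(PAGE_SIZE - 1);
    }

    return 0;
}
```

    Without the initializer, the path where 'limit' is less than or
    equal to the end of the region would read an indeterminate value.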

    Fixes: 0a46fff2f910 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table")
    Reported-by: "Gustavo A. R. Silva"
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box

    Kirill A. Shutemov
     
  • H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from
    a guest. Although we verify correctness of TCEs before we do anything
    with the existing tables, there is a small window when a check in
    kvmppc_tce_validate might pass and right after that the guest alters
    the page of TCEs, causing an early exit from the handler and leaving
    srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap)
    (real mode) locked.

    This fixes the bug by jumping to the common exit code with an appropriate
    unlock.
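
    The shape of the bug and of the fix can be shown with a counter
    standing in for the lock. All names here (lock(), unlock(),
    process_buggy(), process_fixed()) are hypothetical, and a negative
    entry stands in for a TCE that fails validation on the second look.

```c
#include <assert.h>
#include <stddef.h>

static int lock_depth;          /* models the held lock */

static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

int current_lock_depth(void)
{
    return lock_depth;
}

/* Buggy: the early return on a bad entry leaves the lock held. */
int process_buggy(const int *entries, size_t n)
{
    lock();
    for (size_t i = 0; i < n; i++) {
        if (entries[i] < 0)
            return -1;
    }
    unlock();
    return 0;
}

/* Fixed: every exit path goes through the common unlock label. */
int process_fixed(const int *entries, size_t n)
{
    int ret = 0;

    lock();
    for (size_t i = 0; i < n; i++) {
        if (entries[i] < 0) {
            ret = -1;
            goto out_unlock;
        }
    }
out_unlock:
    unlock();
    return ret;
}
```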

    Cc: stable@vger.kernel.org # v4.11+
    Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
    Signed-off-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras

    Alexey Kardashevskiy
     
  • Although APIC initialization will typically clear out the LDR before
    setting it, the APIC cleanup code should reset the LDR.

    This was discovered with a 32-bit KVM guest jumping into a kdump
    kernel. The stale bits in the LDR triggered a bug in the KVM APIC
    implementation which caused the destination mapping for VCPUs to be
    corrupted.

    Note that this isn't intended to paper over the KVM APIC bug. The kernel
    has to clear the LDR when resetting the APIC registers except when X2APIC
    is enabled.

    This lacks a Fixes tag because the missing LDR clear goes way back into
    pre-git history.

    [ tglx: Made x2apic_enabled a function call as required ]

    Signed-off-by: Bandan Das
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com

    Bandan Das
     
  • Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
    bigsmp APIC implementation uses physical destination mode, but it
    nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
    multiple bits being set.

    This does not cause a functional problem because LDR and DFR are ignored
    when physical destination mode is active, but it triggered a problem on a
    32-bit KVM guest which jumps into a kdump kernel.

    The multiple bits set unearthed a bug in the KVM APIC implementation. The
    code which creates the logical destination map for VCPUs ignores the
    disabled state of the APIC and ends up overwriting an existing valid entry
    and as a result, APIC calibration hangs in the guest during kdump
    initialization.

    Remove the bogus LDR/DFR initialization.

    This is not intended to work around the KVM APIC bug. The LDR/DFR
    initialization is wrong on its own.

    The issue goes back into the pre-git history. The Fixes tag is the commit
    in the bitkeeper import which introduced bigsmp support in 2003.

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

    Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
    Suggested-by: Thomas Gleixner
    Signed-off-by: Bandan Das
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com

    Bandan Das
     
  • Reported-by: Sedat Dilek
    Suggested-by: Josh Poimboeuf
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Vineet Gupta

    Nick Desaulniers
     
  • This adds support for an optional extra interrupt cell to specify edge
    vs level triggered. It is backward compatible with dts files with only
    one cell, and will default to level-triggered in such a case.

    Note that I had to make a change to idu_irq_set_affinity as well, as
    this function was setting the interrupt type to "level" unconditionally,
    since this was the only type supported previously.

    Signed-off-by: Mischa Jonker
    Reviewed-by: Vineet Gupta
    Signed-off-by: Vineet Gupta

    Mischa Jonker
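
The one-cell/two-cell handling described in the IDU change above can be sketched roughly as follows. This is a user-space illustration, not the driver code: `parse_irq_cells`, `struct irq_spec`, and the `TRIGGER_*` values are hypothetical names, and the encoding of the second cell (0 = level, 1 = edge) is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trigger encoding for this sketch; not the kernel's IRQ_TYPE_* values. */
enum trigger {
    TRIGGER_LEVEL = 0,
    TRIGGER_EDGE  = 1
};

struct irq_spec {
    uint32_t     hwirq;    /* first cell: hardware interrupt number */
    enum trigger trigger;  /* optional second cell: trigger type */
};

/*
 * Translate a devicetree-style interrupt specifier of one or two cells.
 * Returns 0 on success and -1 if the specifier is malformed.
 */
int parse_irq_cells(const uint32_t *cells, size_t count, struct irq_spec *out)
{
    if (cells == NULL || out == NULL || count < 1 || count > 2)
        return -1;

    out->hwirq = cells[0];

    /* One-cell specifiers keep the old behaviour: level-triggered. */
    out->trigger = TRIGGER_LEVEL;

    if (count == 2) {
        if (cells[1] != (uint32_t)TRIGGER_LEVEL &&
            cells[1] != (uint32_t)TRIGGER_EDGE)
            return -1;
        out->trigger = (enum trigger)cells[1];
    }

    return 0;
}
```

With a one-cell specifier such as `{24}` the sketch reports level-triggered, which is the backward-compatible default the commit describes.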
     

26 Aug, 2019

6 commits

  • 32-bit processes running on a 64-bit kernel are not always detected
    correctly, causing the process to crash when uretprobes are installed.

    The reason for the crash is that in_ia32_syscall() is used to determine the
    process's mode, which only works correctly when called from a syscall.

    In the case of uretprobes, however, the function is called from an exception
    and always returns 'false' on a 64-bit kernel. In consequence this leads to
    corruption of the process's return address.

    Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
    is correct in any situation.

    [ tglx: Add a comment and the following historical info ]

    This should have been detected by the rename which happened in commit

    abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")

    which states in the changelog:

    The is_ia32_task()/is_x32_task() function names are a big misnomer: they
    suggest that the compat-ness of a system call is a task property, which
    is not true, the compatness of a system call purely depends on how it
    was invoked through the system call layer.
    .....

    and then it went and blindly renamed every call site.

    Sadly enough this was already mentioned here:

    8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and
    arch_uretprobe_hijack_return_addr()")

    where the changelog says:

    TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
    not necessarily mean 32bit. Fortunately syscall-like insns can't be
    probed so it actually works, but it would be better to rename and
    use is_ia32_frame().

    and goes all the way back to:

    0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")

    Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
    process on a 64bit kernel....

    Fixes: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")
    Signed-off-by: Sebastian Mayr
    Signed-off-by: Thomas Gleixner
    Cc: Masami Hiramatsu
    Cc: Dmitry Safonov
    Cc: Oleg Nesterov
    Cc: Srikar Dronamraju
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st

    Sebastian Mayr
     
  • Rahul Tanwar reported the following bug on DT systems:

    > 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
    > updated to the end of hardware IRQ numbers but this is done only when IOAPIC
    > configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
    > a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
    > comes from devicetree.
    >
    > See dtb_add_ioapic() in arch/x86/kernel/devicetree.c
    >
    > In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
    > remains at its zero-initialized value. This means that for OF based systems,
    > virtual IRQ base will get set to zero.

    Such systems will very likely not even boot.

    For DT enabled machines ioapic_dynirq_base is irrelevant and not
    updated, so simply map the IRQ base 1:1 instead.

    Reported-by: Rahul Tanwar
    Tested-by: Rahul Tanwar
    Tested-by: Andy Shevchenko
    Signed-off-by: Thomas Gleixner
    Cc: Alexander Shishkin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: alan@linux.intel.com
    Cc: bp@alien8.de
    Cc: cheol.yong.kim@intel.com
    Cc: qi-ming.wu@intel.com
    Cc: rahul.tanwar@intel.com
    Cc: rppt@linux.ibm.com
    Cc: tony.luck@intel.com
    Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Pull UML fix from Richard Weinberger:
    "Fix time travel mode"

    * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: fix time travel mode

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A few fixes for x86:

    - Fix a boot regression caused by the recent bootparam sanitizing
    change, which escaped the attention of all people who reviewed that
    code.

    - Address a boot problem on machines with broken E820 tables caused
    by an underflow which ended up placing the trampoline start at
    physical address 0.

    - Handle machines which do not advertise a legacy timer of any form,
    but need calibration of the local APIC timer gracefully by making
    the calibration routine independent from the tick interrupt. Marked
    for stable as well as there seem to be quite a few new laptops
    rolled out which expose this.

    - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are
    affected by broken firmware which does not initialize RDRAND
    correctly after resume. Add a command line parameter to override
    this for machines which either do not use suspend/resume or have a
    fixed BIOS. Unfortunately there is no way to detect this on boot,
    so the only safe decision is to turn it off by default.

    - Prevent RFLAGS from being clobbered in CALL_NOSPEC on 32bit which
    caused fast KVM instruction emulation to break.

    - Explain the Intel CPU model naming convention so that the repeating
    discussions come to an end"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
    x86/boot: Fix boot regression caused by bootparam sanitizing
    x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
    x86/boot/compressed/64: Fix boot on machines with broken E820 table
    x86/apic: Handle missing global clockevent gracefully
    x86/cpu: Explain Intel model naming convention

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "Two small fixes for kprobes and perf:

    - Prevent a deadlock in kprobe_optimizer() caused by reverse lock
    ordering

    - Fix a comment typo"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Fix potential deadlock in kprobe_optimizer()
    perf/x86: Fix typo in comment

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "11 fixes"

    Mostly VM fixes, one psi polling fix, and one parisc build fix.

    * emailed patches from Andrew Morton:
    mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
    mm/zsmalloc.c: fix race condition in zs_destroy_pool
    mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
    mm, page_owner: handle THP splits correctly
    userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
    psi: get poll_work to run when calling poll syscall next time
    mm: memcontrol: flush percpu vmevents before releasing memcg
    mm: memcontrol: flush percpu vmstats before releasing memcg
    parisc: fix compilation errrors
    mm, page_alloc: move_freepages should not examine struct page of reserved memory
    mm/z3fold.c: fix race between migration and destruction

    Linus Torvalds
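
For the uretprobe fix at the top of this day's list, the difference between the two checks can be modelled in a few lines. This is a simplified user-space model, not kernel code: `struct task_model` and both helper functions are hypothetical, and the two booleans merely stand in for what `user_64bit_mode()` and `in_ia32_syscall()` would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the two kernel helpers look at. */
struct task_model {
    bool user_code_is_64bit;  /* stand-in for user_64bit_mode(regs) */
    bool in_compat_syscall;   /* stand-in for in_ia32_syscall() */
};

/*
 * Syscall-based check: only meaningful while a syscall is in progress.
 * When entered from an exception it reports "not compat", so the return
 * address slot is assumed to be 8 bytes wide.
 */
size_t ret_addr_size_syscall_based(const struct task_model *t)
{
    return t->in_compat_syscall ? 4 : 8;
}

/* Mode-based check: derived from the user register state, valid in any context. */
size_t ret_addr_size_mode_based(const struct task_model *t)
{
    return t->user_code_is_64bit ? 8 : 4;
}
```

For a 32-bit process that hits a probe through an exception, both fields are false in this model: the syscall-based helper yields 8 while the mode-based helper yields 4, which matches the return-address corruption the changelog describes.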
     

25 Aug, 2019

2 commits

  • Pull dma-mapping fixes from Christoph Hellwig:
    "Two fixes for regressions in this merge window:

    - select the Kconfig symbols for the noncoherent dma arch helpers on
    arm if swiotlb is selected, not just for LPAE, to not break the Xen
    build, which uses swiotlb indirectly through swiotlb-xen

    - fix the page allocator fallback in dma_alloc_contiguous if the CMA
    allocation fails"

    * tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
    dma-direct: fix zone selection after an unaddressable CMA allocation
    arm: select the dma-noncoherent symbols for all swiotlb builds

    Linus Torvalds
     
  • Commit 0cfaee2af3a0 ("include/asm-generic/5level-fixup.h: fix variable
    'p4d' set but not used") converted a few functions from macros to static
    inline, which causes parisc to complain:

    In file included from include/asm-generic/4level-fixup.h:38:0,
    from arch/parisc/include/asm/pgtable.h:5,
    from arch/parisc/include/asm/io.h:6,
    from include/linux/io.h:13,
    from sound/core/memory.c:9:
    include/asm-generic/5level-fixup.h:14:18: error: unknown type name 'pgd_t'; did you mean 'pid_t'?
    #define p4d_t pgd_t
    ^
    include/asm-generic/5level-fixup.h:24:28: note: in expansion of macro 'p4d_t'
    static inline int p4d_none(p4d_t p4d)
    ^~~~~

    It is because "4level-fixup.h" is included before "asm/page.h" where
    "pgd_t" is defined.

    Link: http://lkml.kernel.org/r/20190815205305.1382-1-cai@lca.pw
    Fixes: 0cfaee2af3a0 ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used")
    Signed-off-by: Qian Cai
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
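
The ordering dependency behind this error can be reproduced in isolation. In the sketch below the `pgd_t` typedef is a stand-in for the definition in "asm/page.h", and its field layout is an assumption. A macro that mentions `p4d_t` is only expanded where it is used, whereas a static inline function needs the underlying type at the point where the function is defined.

```c
/* Stand-in for the definition in "asm/page.h"; the field layout is an assumption. */
typedef struct {
    unsigned long pgd;
} pgd_t;

/* As in the error output above: p4d_t is just another name for pgd_t. */
#define p4d_t pgd_t

/*
 * Because this is a function rather than a macro, the compiler must know
 * pgd_t here. Moving the typedef below this definition reproduces the
 * "unknown type name 'pgd_t'" error.
 */
static inline int p4d_none(p4d_t p4d)
{
    (void)p4d;  /* the folded level is never empty in this sketch */
    return 0;
}
```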