01 Oct, 2020

12 commits

  • commit 86a82ae0b5095ea24c55898a3f025791e7958b21 upstream.

    Several people reported in the kernel bugzilla that between v4.12 and v4.13
    the magic which works around broken hardware and BIOSes to find the proper
    timer interrupt delivery mode stopped working for some older affected
    platforms which need to fall back to ExtINT delivery mode.

    The reason is that the core code changed to keep track of the masked and
    disabled state of an interrupt line more accurately to avoid the expensive
    hardware operations.

    That broke an assumption in i8259_make_irq() which invokes

    disable_irq_nosync();
    irq_set_chip_and_handler();
    enable_irq();

    Up to v4.12 this worked because enable_irq() unconditionally unmasked the
    interrupt line, but after the state tracking improvements this is no
    longer the case because the IO/APIC uses lazy disabling. So the line state
    is unmasked which means that enable_irq() does not call into the new irq
    chip to unmask it.

    In principle this is a shortcoming of the core code, but it's more than
    unclear whether the core code should try to reset state. At least this
    cannot be done unconditionally as that would break other existing use cases
    where the chip type is changed, e.g. when changing the trigger type, but
    the callers expect the state to be preserved.

    As the way check_timer() switches the delivery modes is truly
    unique, the obvious fix is to simply unmask the i8259 manually after
    changing the mode to ExtINT delivery and switching the irq chip to the
    legacy PIC.
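
    The state mismatch described above can be modelled in user space. This
    is only a sketch under stated assumptions: every toy_* name is
    hypothetical and none of this is the kernel's genirq code. It shows why
    an enable that trusts the tracked state never reaches the newly
    installed chip, and why a manual unmask is needed.

```c
#include <stdbool.h>

/* Toy model of one irq chip: only tracks whether the hardware line is masked. */
struct toy_chip {
    bool hw_masked;
};

/* Toy model of the core's per-interrupt bookkeeping (hypothetical layout). */
struct toy_irq {
    struct toy_chip *chip;  /* chip currently driving the line */
    bool state_masked;      /* what the core believes about the line */
    bool disabled;          /* logical disable state */
};

/* Lazy disable: mark the interrupt disabled but leave the hardware unmasked. */
static void toy_disable_irq_lazy(struct toy_irq *irq)
{
    irq->disabled = true;
}

/* Swap the chip without touching the tracked state. */
static void toy_set_chip(struct toy_irq *irq, struct toy_chip *chip)
{
    irq->chip = chip;
}

/* Enable that only touches the hardware if the core thinks the line is masked. */
static void toy_enable_irq(struct toy_irq *irq)
{
    irq->disabled = false;
    if (irq->state_masked) {
        irq->chip->hw_masked = false;
        irq->state_masked = false;
    }
}

/* The manual step: unmask the new chip directly. */
static void toy_unmask_chip(struct toy_chip *chip)
{
    chip->hw_masked = false;
}
```

    In this model, switching from an unmasked chip to a masked one between
    the lazy disable and the enable leaves the new chip masked until
    toy_unmask_chip() is called on it.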

    Note that the Fixes tag is not really precise, but identifies the commit
    which broke the assumptions in the IO/APIC and i8259 code and that's the
    kernel version to which this needs to be backported.

    Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls")
    Reported-by: p_c_chan@hotmail.com
    Reported-by: ecm4@mail.com
    Reported-by: perdigao1@yahoo.com
    Reported-by: matzes@users.sourceforge.net
    Reported-by: rvelascog@gmail.com
    Signed-off-by: Thomas Gleixner
    Tested-by: p_c_chan@hotmail.com
    Tested-by: matzes@users.sourceforge.net
    Cc: stable@vger.kernel.org
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit a1cd6c2ae47ee10ff21e62475685d5b399e2ed4a upstream.

    If we copy less than 8 bytes and if the destination crosses a cache
    line, __copy_user_flushcache would invalidate only the first cache line.

    This patch makes it invalidate the second cache line as well.
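
    To see why a copy shorter than 8 bytes can still span two cache lines,
    the following sketch counts the lines touched by a destination range.
    It assumes a 64-byte cache line, and the toy_* helpers are hypothetical
    illustrations, not the code in __copy_user_flushcache.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64u  /* assumed cache line size in bytes */

/* Start address of the cache line containing addr. */
static uintptr_t toy_line_of(uintptr_t addr)
{
    return addr & ~(uintptr_t)(TOY_CACHE_LINE - 1);
}

/* Number of cache lines touched by [dst, dst + len), for len > 0. */
static unsigned toy_lines_touched(uintptr_t dst, size_t len)
{
    uintptr_t first = toy_line_of(dst);
    uintptr_t last = toy_line_of(dst + len - 1);

    return (unsigned)((last - first) / TOY_CACHE_LINE) + 1;
}
```

    A 7-byte copy starting at offset 60 covers bytes 60 to 66, so a flush
    that only handles the line containing the first byte misses the second
    line.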

    Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations")
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Ingo Molnar
    Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • [ Upstream commit 4bb05f30483fd21ea5413eaf1182768f251cf625 ]

    The INVD instruction intercept performs emulation. Emulation can't be done
    on an SEV guest because the guest memory is encrypted.

    Provide a dedicated intercept routine for the INVD intercept. And since
    the instruction is emulated as a NOP, just skip it instead.

    Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
    Signed-off-by: Tom Lendacky
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Tom Lendacky
     
  • [ Upstream commit 8d214c481611b29458a57913bd786f0ac06f0605 ]

    Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
    Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
    toggled inadvertently skipped the MMU context reset due to the mask
    of bits that triggers PDPTR loads also being used to trigger MMU context
    resets.
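
    The distinction between the two masks can be sketched as follows. The
    CR4 bit positions are taken from the Intel SDM; the TOY_* macros and
    toy_* helpers are hypothetical names, and the sketch ignores the
    additional condition that PDPTRs are only reloaded when PAE paging is
    in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit positions as documented in the Intel SDM. */
#define TOY_CR4_PSE  (UINT64_C(1) << 4)
#define TOY_CR4_PAE  (UINT64_C(1) << 5)
#define TOY_CR4_PGE  (UINT64_C(1) << 7)
#define TOY_CR4_SMEP (UINT64_C(1) << 20)
#define TOY_CR4_SMAP (UINT64_C(1) << 21)
#define TOY_CR4_PKE  (UINT64_C(1) << 22)

/* Bits whose modification triggers a PDPTE load. */
#define TOY_PDPTR_BITS \
    (TOY_CR4_PSE | TOY_CR4_PAE | TOY_CR4_PGE | TOY_CR4_SMEP)

/* Bits whose modification requires an MMU context reset: a superset. */
#define TOY_MMU_ROLE_BITS \
    (TOY_PDPTR_BITS | TOY_CR4_SMAP | TOY_CR4_PKE)

/* True if the CR4 change touches a bit that triggers a PDPTE load. */
static bool toy_needs_pdptr_reload(uint64_t old_cr4, uint64_t new_cr4)
{
    return ((old_cr4 ^ new_cr4) & TOY_PDPTR_BITS) != 0;
}

/* True if the CR4 change touches a bit that affects the MMU context. */
static bool toy_needs_mmu_reset(uint64_t old_cr4, uint64_t new_cr4)
{
    return ((old_cr4 ^ new_cr4) & TOY_MMU_ROLE_BITS) != 0;
}
```

    With separate masks, toggling only SMAP or PKE requests an MMU context
    reset without requesting a PDPTR reload.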

    Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
    Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
    Cc: Jim Mattson
    Cc: Peter Shier
    Cc: Oliver Upton
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Sean Christopherson
     
  • [ Upstream commit a7ef9ba986b5fae9d80f8a7b31db0423687efe4e ]

    Prevent the compiler from uninlining and creating traceable/probable
    functions as this is invoked _after_ context tracking switched to
    CONTEXT_USER and rcu idle.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexandre Chartre
    Acked-by: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20200505134340.902709267@linutronix.de
    Signed-off-by: Sasha Levin

    Thomas Gleixner
     
  • [ Upstream commit fede8076aab4c2280c673492f8f7a2e87712e8b4 ]

    KVM is not handling the case where EIP wraps around the 32-bit address
    space (that is, outside long mode). This is needed both in vmx.c
    and in emulate.c. SVM with NRIPS is okay, but it can still print
    an error to dmesg due to integer overflow.
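
    The wraparound can be illustrated with a small helper. This is a
    simplified sketch, not KVM's code: toy_next_ip is a hypothetical name,
    and the single long_mode flag stands in for the real mode checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Advance the instruction pointer past an instruction of insn_len bytes.
 * Outside long mode the result is truncated to 32 bits so that EIP wraps
 * around modulo 2^32.
 */
static uint64_t toy_next_ip(uint64_t ip, unsigned insn_len, bool long_mode)
{
    uint64_t next = ip + insn_len;

    if (!long_mode)
        next = (uint32_t)next;
    return next;
}
```

    For a 3-byte instruction at 0xfffffffe, the truncated result is 1,
    whereas an untruncated addition would yield 0x100000001.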

    Reported-by: Nick Peterson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Paolo Bonzini
     
  • [ Upstream commit 7289fdb5dcdbc5155b5531529c44105868a762f2 ]

    Fixes a NULL pointer dereference, caused by the PIT firing an interrupt
    before the interrupt table has been initialized.

    SET_PIT2 can race with the creation of the IRQchip. In particular,
    if SET_PIT2 is called with a low PIT timer period (after the creation of
    the IOAPIC, but before the instantiation of the irq routes), the PIT can
    fire an interrupt at an uninitialized table.

    Signed-off-by: Steve Rutherford
    Signed-off-by: Jon Cargille
    Reviewed-by: Jim Mattson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Steve Rutherford
     
  • [ Upstream commit edec6e015a02003c2af0ce82c54ea016b5a9e3f0 ]

    apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
    started later with HRTIMER_MODE_ABS, which may cause the following warning
    in PREEMPT_RT kernel.

    WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
    CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
    Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
    RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
    Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
    b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
    c5 a0 26 bf 93 e9 a1 fd ff ff 0b e9 fd fc ff
    ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
    RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
    RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
    R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
    FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
    Call Trace:
    ? kvm_release_pfn_clean+0x22/0x60 [kvm]
    start_sw_timer+0x85/0x230 [kvm]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
    vmx_pre_block+0x1cb/0x260 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_vmexit+0x1b/0x30 [kvm_intel]
    ? vmx_vmexit+0xf/0x30 [kvm_intel]
    ? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
    ? kvm_apic_has_interrupt+0x46/0x80 [kvm]
    kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
    ? _raw_spin_unlock_irqrestore+0x18/0x50
    ? _copy_to_user+0x2c/0x30
    kvm_vcpu_ioctl+0x235/0x660 [kvm]
    ? rt_spin_unlock+0x2c/0x50
    do_vfs_ioctl+0x3e4/0x650
    ? __fget+0x7a/0xa0
    ksys_ioctl+0x67/0x90
    __x64_sys_ioctl+0x1a/0x20
    do_syscall_64+0x4d/0x120
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f4027cc54a7
    Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
    00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
    00 00 00 b8 10 00 00 00 0f 05 3d 01 f0 ff ff
    73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
    RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
    RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
    RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
    R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
    --[ end trace 0000000000000002 ]--

    Signed-off-by: He Zhe
    Reviewed-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    He Zhe
     
  • [ Upstream commit 16171bffc829272d5e6014bad48f680cb50943d9 ]

    Alex Shi reported the pkey macros above arch_set_user_pkey_access()
    to be unused. They are unused, and even refer to a nonexistent
    CONFIG option.

    But, they might have served a good use, which was to ensure that
    the code does not try to set values that would not fit in the
    PKRU register. As it stands, a too-large 'pkey' value would
    be likely to silently overflow the u32 new_pkru_bits.

    Add a check to look for overflows. Also add a comment to remind
    any future developer to closely examine the types used to store
    pkey values if arch_max_pkey() ever changes.
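
    A range check of this kind might look like the sketch below. It
    assumes the PKRU layout of two bits per key (access-disable and
    write-disable) in a 32-bit register, which gives 16 keys. The TOY_*
    macros and toy_set_pkey_bits are hypothetical names, not the kernel's
    arch_set_user_pkey_access().

```c
#include <stdint.h>

#define TOY_PKRU_BITS_PER_PKEY 2
#define TOY_MAX_PKEY 16       /* 16 keys * 2 bits fill the 32-bit PKRU */
#define TOY_PKRU_AD_BIT 0x1u  /* access disable */
#define TOY_PKRU_WD_BIT 0x2u  /* write disable */

/*
 * Replace the two permission bits of pkey in *pkru with new_bits.
 * Returns 0 on success, or -1 (leaving *pkru untouched) if pkey or
 * new_bits would not fit.
 */
static int toy_set_pkey_bits(uint32_t *pkru, int pkey, uint32_t new_bits)
{
    const uint32_t mask = TOY_PKRU_AD_BIT | TOY_PKRU_WD_BIT;
    unsigned shift;

    if (pkey < 0 || pkey >= TOY_MAX_PKEY)
        return -1;
    if (new_bits & ~mask)
        return -1;

    shift = (unsigned)pkey * TOY_PKRU_BITS_PER_PKEY;
    *pkru &= ~(mask << shift);
    *pkru |= new_bits << shift;
    return 0;
}
```

    Without the first check, a pkey of 16 would produce a shift of 32,
    which does not fit in the 32-bit value.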

    This boots and passes the x86 pkey selftests.

    Reported-by: Alex Shi
    Signed-off-by: Dave Hansen
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com
    Signed-off-by: Sasha Levin

    Dave Hansen
     
  • [ Upstream commit c9dfd3fb08352d439f0399b6fabe697681d2638c ]

    While mapping the eVMCS, KVM dereferences ->memslots without holding
    ->srcu or ->slots_lock when accessing the hv assist page. This patch fixes
    it by moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where
    the SRCU is already taken.

    It can be reproduced by running kvm's evmcs_test selftest.

    =============================
    warning: suspicious rcu usage
    5.6.0-rc1+ #53 tainted: g w ioe
    -----------------------------
    ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by evmcs_test/8507:
    #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at:
    kvm_vcpu_ioctl+0x85/0x680 [kvm]

    stack backtrace:
    cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53
    hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016
    call trace:
    dump_stack+0x68/0x9b
    kvm_read_guest_cached+0x11d/0x150 [kvm]
    kvm_hv_get_assist_page+0x33/0x40 [kvm]
    nested_enlightened_vmentry+0x2c/0x60 [kvm_intel]
    nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel]
    nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel]
    vmx_vcpu_run+0x67a/0xe60 [kvm_intel]
    vcpu_enter_guest+0x35e/0x1bc0 [kvm]
    kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm]
    kvm_vcpu_ioctl+0x370/0x680 [kvm]
    ksys_ioctl+0x235/0x850
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x77/0x780
    entry_syscall_64_after_hwframe+0x49/0xbe

    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    wanpeng li
     
  • [ Upstream commit 147f1a1fe5d7e6b01b8df4d0cbd6f9eaf6b6c73b ]

    The "u" field in the event has three states, -1/0/1. Using u8 however means that
    comparison with -1 will always fail, so change to signed char.
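
    The failing comparison follows from C's integer promotions, as this
    small sketch shows. The struct and function names are hypothetical and
    only mirror the "u" field described above.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_event_u8 { uint8_t u; };      /* unsigned 8-bit field */
struct toy_event_sc { signed char u; };  /* signed 8-bit field */

/* Store -1 in an unsigned 8-bit field, then compare against -1. */
static bool toy_u8_is_minus_one(void)
{
    struct toy_event_u8 e;

    e.u = (uint8_t)-1;   /* stored as 255 */
    return e.u == -1;    /* 255 promoted to int, never equal to -1 */
}

/* Same test with a signed field. */
static bool toy_sc_is_minus_one(void)
{
    struct toy_event_sc e;

    e.u = -1;
    return e.u == -1;    /* -1 promoted to int, compares equal */
}
```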

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Paolo Bonzini
     
  • [ Upstream commit 6f599d84231fd27e42f4ca2a786a6641e8cddf00 ]

    On x86, purgatory() copies the first 640K of memory to a backup region
    because the kernel needs those first 640K for the real mode trampoline
    during boot, among others.

    However, when SME is enabled, the kernel cannot properly copy the old
    memory to the backup area but reads only its encrypted contents. The
    result is that the crash tool gets invalid pointers when parsing vmcore:

    crash> kmem -s|grep -i invalid
    kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
    kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
    crash>

    So reserve the remaining low 1M memory when the crashkernel option is
    specified (after reserving real mode memory) so that allocated memory
    does not fall into the low 1M area and thus the copying of the contents
    of the first 640k to a backup region in purgatory() can be avoided
    altogether.

    This way, it does not need to be included in crash dumps or used for
    anything except the trampolines that must live in the low 1M.

    [ bp: Heavily rewrite commit message, flip check logic in
    crash_reserve_low_1M().]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    Cc: bhe@redhat.com
    Cc: Dave Young
    Cc: d.hatayama@fujitsu.com
    Cc: dhowells@redhat.com
    Cc: ebiederm@xmission.com
    Cc: horms@verge.net.au
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jürgen Gross
    Cc: kexec@lists.infradead.org
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: vgoyal@redhat.com
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20191108090027.11082-2-lijiang@redhat.com
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=204793
    Signed-off-by: Sasha Levin

    Lianbo Jiang
     

23 Sep, 2020

1 commit

  • commit 09e43968db40c33a73e9ddbfd937f46d5c334924 upstream.

    The x86-64 psABI [0] specifies special relocation types
    (R_X86_64_[REX_]GOTPCRELX) for indirection through the Global Offset
    Table, semantically equivalent to R_X86_64_GOTPCREL, which the linker
    can take advantage of for optimization (relaxation) at link time. This
    is supported by LLD and binutils versions 2.26 onwards.

    The compressed kernel is position-independent code, however, when using
    LLD or binutils versions before 2.27, it must be linked without the -pie
    option. In this case, the linker may optimize certain instructions into
    a non-position-independent form, by converting foo@GOTPCREL(%rip) to $foo.

    This potential issue has been present with LLD and binutils-2.26 for a
    long time, but it has never manifested itself before now:

    - LLD and binutils-2.26 only relax
          movq foo@GOTPCREL(%rip), %reg
      to
          leaq foo(%rip), %reg
      which is still position-independent, rather than
          mov $foo, %reg
      which is permitted by the psABI when -pie is not enabled.

    - GCC happens to only generate GOTPCREL relocations on mov instructions.

    - Clang does generate GOTPCREL relocations on non-mov instructions, but
    when building the compressed kernel, it uses its integrated assembler
    (due to the redefinition of KBUILD_CFLAGS dropping -no-integrated-as),
    which has so far defaulted to not generating the GOTPCRELX
    relocations.

    Nick Desaulniers reports [1,2]:

    "A recent change [3] to a default value of configuration variable
    (ENABLE_X86_RELAX_RELOCATIONS OFF -> ON) in LLVM now causes Clang's
    integrated assembler to emit R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX
    relocations. LLD will relax instructions with these relocations based
    on whether the image is being linked as position independent or not.
    When not, then LLD will relax these instructions to use absolute
    addressing mode (R_RELAX_GOT_PC_NOPIC). This causes kernels built with
    Clang and linked with LLD to fail to boot."

    Patch series [4] is a solution to allow the compressed kernel to be
    linked with -pie unconditionally, but even if merged is unlikely to be
    backported. As a simple solution that can be applied to stable as well,
    prevent the assembler from generating the relaxed relocation types using
    the -mrelax-relocations=no option. For ease of backporting, do this
    unconditionally.

    [0] https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/linker-optimization.tex#L65
    [1] https://lore.kernel.org/lkml/20200807194100.3570838-1-ndesaulniers@google.com/
    [2] https://github.com/ClangBuiltLinux/linux/issues/1121
    [3] https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0
    [4] https://lore.kernel.org/lkml/20200731202738.2577854-1-nivedita@alum.mit.edu/

    Reported-by: Nick Desaulniers
    Signed-off-by: Arvind Sankar
    Signed-off-by: Ingo Molnar
    Tested-by: Nick Desaulniers
    Tested-by: Sedat Dilek
    Acked-by: Ard Biesheuvel
    Reviewed-by: Nick Desaulniers
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200812004308.1448603-1-nivedita@alum.mit.edu
    Signed-off-by: Greg Kroah-Hartman

    Arvind Sankar
     

17 Sep, 2020

2 commits

  • commit 99b82a1437cb31340dbb2c437a2923b9814a7b15 upstream.

    According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
    Don't report an internal error and freeze the guest when event delivery
    causes an APIC-access exit; it is handleable and the event will be
    re-injected during the next vmentry.

    Signed-off-by: Wanpeng Li
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Wanpeng Li
     
  • commit 973c096f6a85e5b5f2a295126ba6928d9a6afd45 upstream.

    Yunhai Zhang recently fixed a VGA software scrollback bug in commit
    ebfdfeeae8c0 ("vgacon: Fix for missing check in scrollback handling"),
    but that then made people look more closely at some of this code, and
    there were more problems on the vgacon side, but also the fbcon software
    scrollback.

    We don't really have anybody who maintains this code - probably because
    nobody actually _uses_ it any more. Sure, people still use both VGA and
    the framebuffer consoles, but they are no longer the main user
    interfaces to the kernel, and haven't been for decades, so these kinds
    of extra features end up bitrotting and not really being used.

    So rather than try to maintain a likely unused set of code, I'll just
    aggressively remove it, and see if anybody even notices. Maybe there
    are people who haven't jumped on the whole GUI bandwagon yet, and think
    it's just a fad. And maybe those people use the scrollback code.

    If that turns out to be the case, we can resurrect this again, once
    we've found the sucker^Wmaintainer for it who actually uses it.

    Reported-by: NopNop Nop
    Tested-by: Willy Tarreau
    Cc: 张云海
    Acked-by: Andy Lutomirski
    Acked-by: Willy Tarreau
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

10 Sep, 2020

2 commits

  • commit 2356bb4b8221d7dc8c7beb810418122ed90254c9 upstream.

    On i386, the order of parameters passed in registers is eax, edx, and ecx
    (as per the regparm(3) calling convention).

    Change the mapping in regs_get_kernel_argument(), so that arg1=ax,
    arg2=dx, and arg3=cx.
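
    The mapping can be sketched with a toy register structure. Both
    struct toy_regs and toy_get_kernel_argument are hypothetical stand-ins
    for struct pt_regs and regs_get_kernel_argument(); the index is assumed
    to be zero-based, and stack-passed arguments are not modelled.

```c
/* Hypothetical subset of the i386 general-purpose registers. */
struct toy_regs {
    unsigned long ax, bx, cx, dx;
};

/*
 * Return the n-th (zero-based) register-passed argument under the
 * regparm(3) convention: eax, edx, ecx. Returns 0 for n >= 3, where a
 * real implementation would have to look at the stack instead.
 */
static unsigned long toy_get_kernel_argument(const struct toy_regs *regs,
                                             unsigned n)
{
    switch (n) {
    case 0:
        return regs->ax;
    case 1:
        return regs->dx;
    case 2:
        return regs->cx;
    default:
        return 0;
    }
}
```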

    Running the selftests testcase kprobes_args_use.tc shows the result
    as passed.

    Fixes: 3c88ee194c28 ("x86: ptrace: Add function argument access API")
    Signed-off-by: Vamshi K Sthambamkadi
    Signed-off-by: Borislav Petkov
    Acked-by: Masami Hiramatsu
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200828113242.GA1424@cosmos
    Signed-off-by: Greg Kroah-Hartman

    Vamshi K Sthambamkadi
     
  • [ Upstream commit ccae0f36d500aef727f98acd8d0601e6b262a513 ]

    Commit:

    cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")

    uses "-1" as the starting node ID, which causes the strange kernel log as
    follows, when "numa=fake=32G" is added to the kernel command line:

    Faking node -1 at [mem 0x0000000000000000-0x0000000893ffffff] (35136MB)
    Faking node 0 at [mem 0x0000001840000000-0x000000203fffffff] (32768MB)
    Faking node 1 at [mem 0x0000000894000000-0x000000183fffffff] (64192MB)
    Faking node 2 at [mem 0x0000002040000000-0x000000283fffffff] (32768MB)
    Faking node 3 at [mem 0x0000002840000000-0x000000303fffffff] (32768MB)

    And finally the kernel crashes:

    BUG: Bad page state in process swapper pfn:00011
    page:(____ptrval____) refcount:0 mapcount:1 mapping:(____ptrval____) index:0x55cd7e44b270 pfn:0x11
    failed to read mapping contents, not a valid kernel address?
    flags: 0x5(locked|uptodate)
    raw: 0000000000000005 000055cd7e44af30 000055cd7e44af50 0000000100000006
    raw: 000055cd7e44b270 000055cd7e44b290 0000000000000000 000055cd7e44b510
    page dumped because: page still charged to cgroup
    page->mem_cgroup:000055cd7e44b510
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
    Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
    Call Trace:
    dump_stack+0x57/0x80
    bad_page.cold+0x63/0x94
    __free_pages_ok+0x33f/0x360
    memblock_free_all+0x127/0x195
    mem_init+0x23/0x1f5
    start_kernel+0x219/0x4f5
    secondary_startup_64+0xb6/0xc0

    Fix this bug via using 0 as the starting node ID. This restores the
    original behavior before cc9aec03e58f.

    [ mingo: Massaged the changelog. ]

    Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")
    Signed-off-by: "Huang, Ying"
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200904061047.612950-1-ying.huang@intel.com
    Signed-off-by: Sasha Levin

    Huang Ying
     

03 Sep, 2020

1 commit

  • commit 52d6b926aabc47643cd910c85edb262b7f44c168 upstream.

    There is a race when taking a CPU offline. Current code looks like this:

    native_cpu_disable()
    {
            ...
            apic_soft_disable();
            /*
             * Any existing set bits for pending interrupt to
             * this CPU are preserved and will be sent via IPI
             * to another CPU by fixup_irqs().
             */
            cpu_disable_common();
            {
                    ....
                    /*
                     * Race window happens here. Once local APIC has been
                     * disabled any new interrupts from the device to
                     * the old CPU are lost
                     */
                    fixup_irqs(); // Too late to capture anything in IRR.
                    ...
            }
    }

    The fix is to disable the APIC *after* cpu_disable_common().

    Testing was done with a USB NIC that provided a source of frequent
    interrupts. A script migrated interrupts to a specific CPU and
    then took that CPU offline.

    Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
    Reported-by: Evan Green
    Signed-off-by: Ashok Raj
    Signed-off-by: Thomas Gleixner
    Tested-by: Mathias Nyman
    Tested-by: Evan Green
    Reviewed-by: Evan Green
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
    Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Ashok Raj
     

26 Aug, 2020

5 commits

  • commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

    The 'flags' field of 'struct mmu_notifier_range' is used to indicate
    whether invalidate_range_{start,end}() are permitted to block. In the
    case of kvm_mmu_notifier_invalidate_range_start(), this field is not
    forwarded on to the architecture-specific implementation of
    kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
    whether or not to block.

    Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
    architectures are aware as to whether or not they are permitted to block.

    Cc: Marc Zyngier
    Cc: Suzuki K Poulose
    Cc: James Morse
    Signed-off-by: Will Deacon
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • [ Upstream commit ee87e1557c42dc9c2da11c38e11b87c311569853 ]

    ../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
    ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
    acpi_noirq_set();

    Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
    Signed-off-by: Randy Dunlap
    Reviewed-by: Juergen Gross
    Cc: Andy Shevchenko
    Cc: Bjorn Helgaas
    Cc: Konrad Rzeszutek Wilk
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-pci@vger.kernel.org
    Signed-off-by: Juergen Gross
    Signed-off-by: Sasha Levin

    Randy Dunlap
     
  • [ Upstream commit cb957adb4ea422bd758568df5b2478ea3bb34f35 ]

    See the SDM, volume 3, section 4.4.1:

    If PAE paging would be in use following an execution of MOV to CR0 or
    MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
    CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
    the PDPTEs are loaded from the address in CR3.

    Fixes: b9baba8614890 ("KVM, pkeys: expose CPUID/CR4 to guest")
    Cc: Huaitong Han
    Signed-off-by: Jim Mattson
    Reviewed-by: Peter Shier
    Reviewed-by: Oliver Upton
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     
  • [ Upstream commit 427890aff8558eb4326e723835e0eae0e6fe3102 ]

    See the SDM, volume 3, section 4.4.1:

    If PAE paging would be in use following an execution of MOV to CR0 or
    MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
    CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
    the PDPTEs are loaded from the address in CR3.

    Fixes: 0be0226f07d14 ("KVM: MMU: fix SMAP virtualization")
    Cc: Xiao Guangrong
    Signed-off-by: Jim Mattson
    Reviewed-by: Peter Shier
    Reviewed-by: Oliver Upton
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Jim Mattson
     
  • commit eefb8c124fd969e9a174ff2bedff86aa305a7438 upstream.

    Introduce a new READELF variable to top-level Makefile, so the name of
    readelf binary can be specified.

    Before this change the name of the binary was hardcoded to
    "$(CROSS_COMPILE)readelf" which might not be present for every
    toolchain.

    This allows building with the LLVM Object Reader by passing the make
    parameter READELF=llvm-readelf.

    Link: https://github.com/ClangBuiltLinux/linux/issues/771
    Signed-off-by: Dmitry Golovin
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Golovin
     

21 Aug, 2020

3 commits

  • [ Upstream commit 4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431 ]

    This fixes a problem introduced by commit:

    5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")

    that perf event sysfs attributes for psys RAPL domain are missing.

    Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
    Signed-off-by: Zhang Rui
    Signed-off-by: Ingo Molnar
    Reviewed-by: Kan Liang
    Reviewed-by: Len Brown
    Acked-by: Jiri Olsa
    Link: https://lore.kernel.org/r/20200811153149.12242-2-rui.zhang@intel.com
    Signed-off-by: Sasha Levin

    Zhang Rui
     
  • [ Upstream commit 7d98585860d845e36ee612832a5ff021f201dbaf ]

    The frequency descriptor of the Lightning Mountain SoC doesn't have all the
    frequency entries, resulting in the below failure, which causes a kernel hang:

    Error MSR_FSB_FREQ index 15 is unknown
    tsc: Fast TSC calibration failed

    So, add all the frequency entries in the Lightning Mountain SoC frequency
    descriptor.

    Fixes: 0cc5359d8fd45 ("x86/cpu: Update init data for new Airmont CPU model")
    Fixes: 812c2d7506fd ("x86/tsc_msr: Use named struct initializers")
    Signed-off-by: Dilip Kota
    Signed-off-by: Ingo Molnar
    Reviewed-by: Andy Shevchenko
    Link: https://lore.kernel.org/r/211c643ae217604b46cbec43a2c0423946dc7d2d.1596440057.git.eswara.kota@linux.intel.com
    Signed-off-by: Sasha Levin

    Dilip Kota
     
  • commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream.

    John reported that on a RK3288 system the perf per CPU interrupts are all
    affine to CPU0 and provided the analysis:

    "It looks like what happens is that because the interrupts are not per-CPU
    in the hardware, armpmu_request_irq() calls irq_force_affinity() while
    the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
    IRQF_NOBALANCING.

    Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
    irq_setup_affinity() which returns early because IRQF_PERCPU and
    IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."

    This was broken by the recent commit which blocked interrupt affinity
    setting in hardware before activation of the interrupt. While this works in
    general, it does not work for this particular case. As, contrary to the
    initial analysis, not all interrupt chip drivers implement an activate
    callback, the safe cure is to make the deferred interrupt affinity setting
    at activation time opt-in.

    Implement the necessary core logic and make the two irqchip implementations
    for which this is required opt-in. In hindsight this would have been the
    right thing to do, but ...

    Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
    Reported-by: John Keeping
    Signed-off-by: Thomas Gleixner
    Tested-by: Marc Zyngier
    Acked-by: Marc Zyngier
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

19 Aug, 2020

5 commits

  • commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream.

    Commit 711419e504eb ("irqdomain: Add the missing assignment of
    domain->fwnode for named fwnode") unintentionally caused a dangling pointer
    page fault issue on firmware nodes that were freed after IRQ domain
    allocation. Commit e3beca48a45b fixed that dangling pointer issue by only
    freeing the firmware node after an IRQ domain allocation failure. That fix
    no longer frees the firmware node immediately, but leaves the firmware node
    allocated after the domain is removed.

    The firmware node must be kept around through irq_domain_remove, but should be
    freed afterwards.

    Add the missing free operations after domain removal where appropriate.
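
    The required ordering can be sketched with a toy model of the two
    objects. All toy_* names are hypothetical and stand in for the IRQ
    domain and its firmware node; the only point is that the node pointer
    is saved before the domain is removed and the node is freed afterwards.

```c
#include <stdlib.h>

struct toy_fwnode { int id; };
struct toy_domain { struct toy_fwnode *fwnode; };

static int toy_live_fwnodes;  /* number of firmware nodes not yet freed */

static struct toy_fwnode *toy_alloc_fwnode(int id)
{
    struct toy_fwnode *fn = malloc(sizeof(*fn));

    if (!fn)
        return NULL;
    fn->id = id;
    toy_live_fwnodes++;
    return fn;
}

static void toy_free_fwnode(struct toy_fwnode *fn)
{
    if (!fn)
        return;
    free(fn);
    toy_live_fwnodes--;
}

static struct toy_domain *toy_domain_create(struct toy_fwnode *fn)
{
    struct toy_domain *d = malloc(sizeof(*d));

    if (d)
        d->fwnode = fn;
    return d;
}

/* Removing the domain does not free the firmware node it refers to. */
static void toy_domain_remove(struct toy_domain *d)
{
    free(d);
}

/* Teardown: save the node, remove the domain, then free the node. */
static void toy_teardown(struct toy_domain *d)
{
    struct toy_fwnode *fn = d->fwnode;

    toy_domain_remove(d);
    toy_free_fwnode(fn);
}
```

    Calling only toy_domain_remove() in this model leaves
    toy_live_fwnodes at 1, which corresponds to the firmware node staying
    allocated after domain removal.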

    Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
    Signed-off-by: Jon Derrick
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andy Shevchenko
    Acked-by: Bjorn Helgaas # drivers/pci
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Jon Derrick
     
  • [ Upstream commit 44069737ac9625a0f02f0f7f5ab96aae4cd819bc ]

    Clang's integrated assembler complains "invalid reassignment of
    non-absolute variable 'var_ddq_add'" while assembling
    arch/x86/crypto/aes_ctrby8_avx-x86_64.S. It was because var_ddq_add was
    reassigned with non-absolute values several times, which IAS did not
    support. We can avoid the reassignment by replacing the uses of
    var_ddq_add with its definitions accordingly to have compatibility
    with IAS.

    Link: https://github.com/ClangBuiltLinux/linux/issues/1008
    Reported-by: Sedat Dilek
    Reported-by: Fangrui Song
    Tested-by: Sedat Dilek # build+boot Linux v5.7.5; clang v11.0.0-git
    Signed-off-by: Jian Cai
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Jian Cai
     
  • [ Upstream commit 8ab49526b53d3172d1d8dd03a75c7d1f5bd21239 ]

    syzbot found its way into x86_fsgsbase_read_task() and triggered this oops:

    KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
    CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
    RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
    Call Trace:
    putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
    genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
    genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
    copy_regset_from_user include/linux/regset.h:326 [inline]
    ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
    compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
    __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
    __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
    __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
    do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
    __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
    do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
    entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

    This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
    or GS for a task with no LDT and something tries to read the base before
    a return to usermode notices the bad selector and fixes it.

    The fix is to make sure ldt pointer is not NULL.
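
A minimal userspace sketch of that kind of guard, assuming a simplified table layout. `struct ldt_model` and `ldt_read_base()` are hypothetical names for illustration; the kernel's real structures and descriptor decoding differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a per-process descriptor table. */
struct ldt_model {
    unsigned int nr_entries;
    const uint64_t *bases; /* one segment base per entry */
};

/*
 * Return the base for entry idx, or 0 when the task has no table at all
 * or the index is out of range.
 */
static uint64_t ldt_read_base(const struct ldt_model *ldt, unsigned int idx)
{
    if (ldt == NULL || idx >= ldt->nr_entries)
        return 0;
    return ldt->bases[idx];
}
```

Checking the pointer before the bounds test matters because the bounds test itself dereferences the table.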

    Fixes: 07e1d88adaae ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
    Co-developed-by: Jann Horn
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Andy Lutomirski
    Cc: Chang S. Bae
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Markus T Metzger
    Cc: Peter Zijlstra
    Cc: Ravi Shankar
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit 3347c8a079d67af21760a78cc5f2abbcf06d9571 ]

    When building with LLVM_IAS=1, which means using Clang's Integrated
    Assembler (IAS) from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from
    GNU/binutils, I see the following breakage in Debian/testing AMD64:

    :15:74: error: too many positional arguments
    PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
    ^
    arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation
    GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
    ^
    :47:2: error: unknown use of instruction mnemonic without a size suffix
    GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
    ^
    arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation
    GCM_ENC_DEC dec
    ^
    :15:74: error: too many positional arguments
    PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
    ^
    arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation
    GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
    ^
    :47:2: error: unknown use of instruction mnemonic without a size suffix
    GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
    ^
    arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation
    GCM_ENC_DEC enc

    Craig Topper suggested to me in ClangBuiltLinux issue #1050:

    > I think the "too many positional arguments" is because the parser isn't able
    > to handle the trailing commas.
    >
    > The "unknown use of instruction mnemonic" is because the macro was named
    > GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with
    > GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the
    > macro instantiation, but llvm doesn't.

    First, I removed the trailing comma in the PRECOMPUTE line.

    Second, I substituted:
    1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec
    2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc

    With these changes I was able to build with LLVM_IAS=1 and boot on bare metal.

    I confirmed that this works with Linux-kernel v5.7.5 final.

    NOTE: This patch is on top of Linux v5.7 final.

    Thanks to Craig and especially Nick for double-checking and his comments.

    Suggested-by: Craig Topper
    Suggested-by: Nick Desaulniers
    Reviewed-by: Nick Desaulniers
    Cc: "ClangBuiltLinux"
    Link: https://github.com/ClangBuiltLinux/linux/issues/1050
    Link: https://bugs.llvm.org/show_bug.cgi?id=24494
    Signed-off-by: Sedat Dilek
    Signed-off-by: Herbert Xu
    Signed-off-by: Sasha Levin

    Sedat Dilek
     
  • [ Upstream commit 5d7f7d1d5e01c22894dee7c9c9266500478dca99 ]

    The original code is a nop as i_mce.status is or'ed with part of itself,
    fix it.
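
Why such an OR is a no-op can be shown with a small sketch. It assumes the intent was to clear one flag bit; `FLAG_UC_MODEL` and both helper names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

#define FLAG_UC_MODEL (1ULL << 61) /* hypothetical flag bit */

/* OR-ing a value with a subset of its own bits can never clear a bit. */
static uint64_t clear_flag_buggy(uint64_t status)
{
    status |= (status & ~FLAG_UC_MODEL);
    return status;
}

/* Plain assignment of the masked value actually clears the bit. */
static uint64_t clear_flag_fixed(uint64_t status)
{
    status = (status & ~FLAG_UC_MODEL);
    return status;
}
```

For any `x`, `x | (x & m)` equals `x`, so the first helper always returns its input unchanged.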

    Fixes: a1300e505297 ("x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts")
    Signed-off-by: Zhenzhong Duan
    Signed-off-by: Borislav Petkov
    Acked-by: Yazen Ghannam
    Link: https://lkml.kernel.org/r/20200611023238.3830-1-zhenzhong.duan@gmail.com
    Signed-off-by: Sasha Levin

    Zhenzhong Duan
     

05 Aug, 2020

4 commits

  • commit bdd65589593edd79b6a12ce86b3b7a7c6dae5208 upstream.

    0day reported a possible circular locking dependency:

    Chain exists of:
    &irq_desc_lock_class --> console_owner --> &port_lock_key

    Possible unsafe locking scenario:

    CPU0 CPU1
    ---- ----
    lock(&port_lock_key);
    lock(console_owner);
    lock(&port_lock_key);
    lock(&irq_desc_lock_class);

    The reason for this is a printk() in the i8259 interrupt chip driver
    which is invoked with the irq descriptor lock held, which reverses the
    lock operations vs. printk() from arbitrary contexts.

    Switch the printk() to printk_deferred() to avoid that.

    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87365abt2v.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit d2286ba7d574ba3103a421a2f9ec17cb5b0d87a1 upstream.

    Prevent setting the tscdeadline timer if the lapic is hw disabled.

    Fixes: bce87cce88 (KVM: x86: consolidate different ways to test for in-kernel LAPIC)
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Wanpeng Li
     
  • [ Upstream commit 039a7a30ec102ec866d382a66f87f6f7654f8140 ]

    If a user task's stack is empty, or if it only has user regs, ORC
    reports it as a reliable empty stack. But arch_stack_walk_reliable()
    incorrectly treats it as unreliable.

    That happens because the only success path for user tasks is inside the
    loop, which only iterates on non-empty stacks. Generally, a user task
    must end in a user regs frame, but an empty stack is an exception to
    that rule.

    Thanks to commit 71c95825289f ("x86/unwind/orc: Fix error handling in
    __unwind_start()"), unwind_start() now sets state->error appropriately.
    So now for both ORC and FP unwinders, unwind_done() and !unwind_error()
    always means the end of the stack was successfully reached. So the
    success path for kthreads is no longer needed -- it can also be used for
    empty user tasks.

    Reported-by: Wang ShaoBo
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Wang ShaoBo
    Link: https://lkml.kernel.org/r/f136a4e5f019219cbc4f4da33b30c2f44fa65b84.1594994374.git.jpoimboe@redhat.com
    Signed-off-by: Sasha Levin

    Josh Poimboeuf
     
  • [ Upstream commit 372a8eaa05998cd45b3417d0e0ffd3a70978211a ]

    The ORC unwinder fails to unwind newly forked tasks which haven't yet
    run on the CPU. It correctly reads the 'ret_from_fork' instruction
    pointer from the stack, but it incorrectly interprets that value as a
    call stack address rather than a "signal" one, so the address gets
    incorrectly decremented in the call to orc_find(), resulting in bad ORC
    data.

    Fix it by forcing 'ret_from_fork' frames to be signal frames.
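
The effect of the decrement can be illustrated with a toy lookup table. This is a sketch under assumed addresses; the table, `find_func()` and `unwind_lookup()` are hypothetical and do not reproduce the kernel's ORC data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address ranges [start, end) for two adjacent functions. */
struct func_range {
    uintptr_t start;
    uintptr_t end;
};

static const struct func_range func_table[] = {
    { 0x1000, 0x1040 }, /* index 0: some function placed before */
    { 0x1040, 0x1080 }, /* index 1: stands in for ret_from_fork */
};

/* Return the index of the function containing addr, or -1 if none. */
static int find_func(uintptr_t addr)
{
    size_t i;

    for (i = 0; i < sizeof(func_table) / sizeof(func_table[0]); i++) {
        if (addr >= func_table[i].start && addr < func_table[i].end)
            return (int)i;
    }
    return -1;
}

/*
 * A call return address points just past the call instruction, so it is
 * decremented before the lookup. A "signal" style address is the exact
 * instruction to resume at and must be looked up unchanged.
 */
static int unwind_lookup(uintptr_t ip, bool signal_frame)
{
    return find_func(signal_frame ? ip : ip - 1);
}
```

With the saved instruction pointer at the very first byte of the function (0x1040 here), the decremented lookup resolves to the preceding function, while the signal-style lookup finds the right one.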

    Reported-by: Wang ShaoBo
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Tested-by: Wang ShaoBo
    Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com
    Signed-off-by: Sasha Levin

    Josh Poimboeuf
     

29 Jul, 2020

3 commits

  • commit de2b41be8fcccb2f5b6c480d35df590476344201 upstream.

    On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
    page-aligned, but the end of the .bss..page_aligned section is not
    guaranteed to be page-aligned.

    As a result, objects from other .bss sections may end up on the same 4k
    page as the idt_table, and will accidentally get mapped read-only during
    boot, causing unexpected page-faults when the kernel writes to them.

    This could be worked around by making the objects in the page aligned
    sections page sized, but that's wrong.

    Explicit sections which store only page aligned objects have an implicit
    guarantee that the object is alone in the page in which it is placed. That
    works for all objects except the last one. That's inconsistent.

    Enforcing page sized objects for these sections would wreck memory
    sanitizers, because the object becomes artificially larger than it should
    be and out-of-bounds accesses become legitimate.

    Align the end of the .bss..page_aligned and .data..page_aligned section on
    page-size so all objects placed in these sections are guaranteed to have
    their own page.
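
The arithmetic behind this can be checked with a short sketch. The 256 entries and 2048 bytes come from the changelog above; the 4096-byte page size and all helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_MODEL   4096u /* assumed 4k page */
#define IDT_ENTRIES_MODEL 256u
#define IDT_ENTRY_BYTES   8u    /* 2048 bytes / 256 entries */

/* Round x up to the next multiple of a, where a is a power of two. */
static size_t align_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

static size_t idt_table_bytes(void)
{
    return (size_t)IDT_ENTRIES_MODEL * IDT_ENTRY_BYTES;
}

/*
 * Bytes left on the last page after an object of the given size. Without
 * end-of-section alignment, objects from other sections can land here.
 */
static size_t spare_bytes_on_last_page(size_t object_bytes)
{
    return align_up(object_bytes, PAGE_SIZE_MODEL) - object_bytes;
}
```

For the table described above, half of the page remains free, which is the space that other .bss objects could occupy unless the section end is padded to a page boundary.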

    [ tglx: Amended changelog ]

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     
  • [ Upstream commit 81e96851ea32deb2c921c870eecabf335f598aeb ]

    The clang integrated assembler requires the 'cmp' instruction to
    have an explicit size suffix here:

    arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl')
    cmp $0xffffffff,-24(%ebp)
    ^

    Make this a 32-bit comparison, which it was clearly meant to be.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Nick Desaulniers
    Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de
    Signed-off-by: Sasha Levin

    Arnd Bergmann
     
  • [ Upstream commit e3beca48a45b5e0e6e6a4e0124276b8248dcc9bb ]

    Quite a few non-OF/ACPI users of irqdomains allocate firmware nodes of type
    IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
    creating the irqdomain. The only purpose of these FW nodes is to convey
    name information. When this was introduced the core code did not store the
    pointer to the node in the irqdomain. A recent change stored the firmware
    node pointer in irqdomain for other reasons and failed to notice that the
    usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
    are broken by this. Storing a dangling pointer is dangerous itself, but in
    case that the domain is destroyed later on this leads to a double free.

    Remove the freeing of the firmware node after creating the irqdomain from
    all affected call sites to cure this.

    Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
    Reported-by: Andy Shevchenko
    Signed-off-by: Thomas Gleixner
    Acked-by: Bjorn Helgaas
    Acked-by: Marc Zyngier
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
    Signed-off-by: Sasha Levin

    Thomas Gleixner
     

22 Jul, 2020

2 commits

  • commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.

    Setting interrupt affinity on inactive interrupts is inconsistent when
    hierarchical irq domains are enabled. The core code should just store the
    affinity and not call into the irq chip driver for inactive interrupts
    because the chip drivers may not be in a state to handle such requests.

    X86 has a hacky workaround for that, but no other irq chip has one, which
    causes problems e.g. on GIC V3 ITS.

    Instead of adding more ugly hacks all over the place, solve the problem in
    the core code. If the affinity is set on an inactive interrupt then:

    - Store it in the irq descriptor's affinity mask
    - Update the effective affinity to reflect that so user space has
    a consistent view
    - Don't call into the irq chip driver

    This is the core equivalent of the X86 workaround and works correctly
    because the affinity setting is established in the irq chip when the
    interrupt is activated later on.
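
The three steps listed above can be modelled in userspace as follows. This is a simplified sketch: `struct irq_model` and the functions are hypothetical, and a real chip may pick an effective affinity that is narrower than the requested mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy interrupt descriptor; CPU masks are plain 64-bit bitmaps. */
struct irq_model {
    bool activated;
    uint64_t affinity;  /* requested mask */
    uint64_t effective; /* mask reported to user space */
    int chip_calls;     /* how often the chip driver was invoked */
};

/* Stand-in for the irq chip callback. */
static void chip_set_affinity(struct irq_model *irq, uint64_t mask)
{
    irq->chip_calls++;
    irq->effective = mask;
}

static void set_affinity(struct irq_model *irq, uint64_t mask)
{
    irq->affinity = mask;
    if (!irq->activated) {
        /* Inactive: record the setting, but do not call the chip. */
        irq->effective = mask;
        return;
    }
    chip_set_affinity(irq, mask);
}

/* Activation establishes the stored setting in the chip. */
static void activate(struct irq_model *irq)
{
    irq->activated = true;
    chip_set_affinity(irq, irq->affinity);
}
```

Setting the mask before `activate()` leaves `chip_calls` at zero while both masks already reflect the request; the chip is programmed once, at activation.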

    Note that this is only effective when hierarchical irq domains are enabled
    by the architecture. Doing it unconditionally would break legacy irq chip
    implementations.

    For hierarchical irq domains this works correctly as none of the drivers can
    have a dependency on affinity setting in inactive state by design.

    Remove the X86 workaround as it is no longer required.

    Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
    Reported-by: Ali Saidi
    Signed-off-by: Thomas Gleixner
    Tested-by: Ali Saidi
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
    Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 5714ee50bb4375bd586858ad800b1d9772847452 upstream.

    This fixes a regression encountered while running the
    gdb.base/corefile.exp test in GDB's test suite.

    In my testing, the typo prevented the sw_reserved field of struct
    fxregs_state from being output to the kernel XSAVES area. Thus the
    correct mask corresponding to XCR0 was not present in the core file for
    GDB to interrogate, resulting in the following behavior:

    [kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core
    Reading symbols from testsuite/outputs/gdb.base/corefile/corefile...
    [New LWP 232880]

    warning: Unexpected size of section `.reg-xstate/232880' in core file.

    With the typo fixed, the test works again as expected.

    Signed-off-by: Kevin Buettner
    Fixes: 9e4636545933 ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized")
    Cc: Al Viro
    Cc: Dave Airlie
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kevin Buettner