Eric Lee / smarc-fsl-linux-kernel

01 Oct, 2020

12 commits

6890a6f56 x86/ioapic: Unbreak check_timer() ... Browse Code »

commit 86a82ae0b5095ea24c55898a3f025791e7958b21 upstream.

Several people reported in the kernel bugzilla that between v4.12 and v4.13
the magic which works around broken hardware and BIOSes to find the proper
timer interrupt delivery mode stopped working for some older affected
platforms which need to fall back to ExtINT delivery mode.

The reason is that the core code changed to keep track of the masked and
disabled state of an interrupt line more accurately to avoid the expensive
hardware operations.

That broke an assumption in i8259_make_irq() which invokes

disable_irq_nosync();
irq_set_chip_and_handler();
enable_irq();

Up to v4.12 this worked because enable_irq() unconditionally unmasked the
interrupt line, but after the state tracking improvements this is not
longer the case because the IO/APIC uses lazy disabling. So the line state
is unmasked which means that enable_irq() does not call into the new irq
chip to unmask it.

In principle this is a shortcoming of the core code, but it's more than
unclear whether the core code should try to reset state. At least this
cannot be done unconditionally as that would break other existing use cases
where the chip type is changed, e.g. when changing the trigger type, but
the callers expect the state to be preserved.

As the way how check_timer() is switching the delivery modes is truly
unique, the obvious fix is to simply unmask the i8259 manually after
changing the mode to ExtINT delivery and switching the irq chip to the
legacy PIC.

Note, that the fixes tag is not really precise, but identifies the commit
which broke the assumptions in the IO/APIC and i8259 code and that's the
kernel version to which this needs to be backported.

Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls")
Reported-by: p_c_chan@hotmail.com
Reported-by: ecm4@mail.com
Reported-by: perdigao1@yahoo.com
Reported-by: matzes@users.sourceforge.net
Reported-by: rvelascog@gmail.com
Signed-off-by: Thomas Gleixner
Tested-by: p_c_chan@hotmail.com
Tested-by: matzes@users.sourceforge.net
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-10-01 19:18:22 +0800
361a4b17e arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback ... Browse Code »

commit a1cd6c2ae47ee10ff21e62475685d5b399e2ed4a upstream.

If we copy less than 8 bytes and if the destination crosses a cache
line, __copy_user_flushcache would invalidate only the first cache line.

This patch makes it invalidate the second cache line as well.

Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations")
Signed-off-by: Mikulas Patocka
Signed-off-by: Andrew Morton
Reviewed-by: Dan Williams
Cc: Jan Kara
Cc: Jeff Moyer
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Toshi Kani
Cc: "H. Peter Anvin"
Cc: Al Viro
Cc: Thomas Gleixner
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc: Ingo Molnar
Cc:
Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Mikulas Patocka
2020-10-01 19:18:21 +0800
5d4431c9d KVM: SVM: Add a dedicated INVD intercept routine ... Browse Code »

[ Upstream commit 4bb05f30483fd21ea5413eaf1182768f251cf625 ]

The INVD instruction intercept performs emulation. Emulation can't be done
on an SEV guest because the guest memory is encrypted.

Provide a dedicated intercept routine for the INVD intercept. And since
the instruction is emulated as a NOP, just skip it instead.

Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Signed-off-by: Tom Lendacky
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Tom Lendacky
2020-10-01 19:18:21 +0800
16788dc19 KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE ... Browse Code »

[ Upstream commit 8d214c481611b29458a57913bd786f0ac06f0605 ]

Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
toggled inadvertantly skipped the MMU context reset due to the mask
of bits that triggers PDPTR loads also being used to trigger MMU context
resets.

Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson
Cc: Peter Shier
Cc: Oliver Upton
Signed-off-by: Sean Christopherson
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Sean Christopherson
2020-10-01 19:18:21 +0800
bc65336ac x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline ... Browse Code »

[ Upstream commit a7ef9ba986b5fae9d80f8a7b31db0423687efe4e ]

Prevent the compiler from uninlining and creating traceable/probable
functions as this is invoked _after_ context tracking switched to
CONTEXT_USER and rcu idle.

Signed-off-by: Thomas Gleixner
Reviewed-by: Alexandre Chartre
Acked-by: Peter Zijlstra
Link: https://lkml.kernel.org/r/20200505134340.902709267@linutronix.de
Signed-off-by: Sasha Levin

Thomas Gleixner
2020-10-01 19:18:09 +0800
97cf50cc4 KVM: x86: handle wrap around 32-bit address space ... Browse Code »

[ Upstream commit fede8076aab4c2280c673492f8f7a2e87712e8b4 ]

KVM is not handling the case where EIP wraps around the 32-bit address
space (that is, outside long mode). This is needed both in vmx.c
and in emulate.c. SVM with NRIPS is okay, but it can still print
an error to dmesg due to integer overflow.

Reported-by: Nick Peterson
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Paolo Bonzini
2020-10-01 19:18:00 +0800
4f4e29789 KVM: Remove CREATE_IRQCHIP/SET_PIT2 race ... Browse Code »

[ Upstream commit 7289fdb5dcdbc5155b5531529c44105868a762f2 ]

Fixes a NULL pointer dereference, caused by the PIT firing an interrupt
before the interrupt table has been initialized.

SET_PIT2 can race with the creation of the IRQchip. In particular,
if SET_PIT2 is called with a low PIT timer period (after the creation of
the IOAPIC, but before the instantiation of the irq routes), the PIT can
fire an interrupt at an uninitialized table.

Signed-off-by: Steve Rutherford
Signed-off-by: Jon Cargille
Reviewed-by: Jim Mattson
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Steve Rutherford
2020-10-01 19:17:55 +0800
048892dfa KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context ... Browse Code »

[ Upstream commit edec6e015a02003c2af0ce82c54ea016b5a9e3f0 ]

apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
started later with HRTIMER_MODE_ABS, which may cause the following warning
in PREEMPT_RT kernel.

WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
c5 a0 26 bf 93 e9 a1 fd ff ff 0b e9 fd fc ff
ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
Call Trace:
? kvm_release_pfn_clean+0x22/0x60 [kvm]
start_sw_timer+0x85/0x230 [kvm]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
vmx_pre_block+0x1cb/0x260 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
? kvm_apic_has_interrupt+0x46/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
? _raw_spin_unlock_irqrestore+0x18/0x50
? _copy_to_user+0x2c/0x30
kvm_vcpu_ioctl+0x235/0x660 [kvm]
? rt_spin_unlock+0x2c/0x50
do_vfs_ioctl+0x3e4/0x650
? __fget+0x7a/0xa0
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x4d/0x120
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4027cc54a7
Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
00 00 00 b8 10 00 00 00 0f 05 3d 01 f0 ff ff
73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
--[ end trace 0000000000000002 ]--

Signed-off-by: He Zhe
Message-Id:
Reviewed-by: Wanpeng Li
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

He Zhe
2020-10-01 19:17:45 +0800
e8b95c29c x86/pkeys: Add check for pkey "overflow" ... Browse Code »

[ Upstream commit 16171bffc829272d5e6014bad48f680cb50943d9 ]

Alex Shi reported the pkey macros above arch_set_user_pkey_access()
to be unused. They are unused, and even refer to a nonexistent
CONFIG option.

But, they might have served a good use, which was to ensure that
the code does not try to set values that would not fit in the
PKRU register. As it stands, a too-large 'pkey' value would
be likely to silently overflow the u32 new_pkru_bits.

Add a check to look for overflows. Also add a comment to remind
any future developer to closely examine the types used to store
pkey values if arch_max_pkey() ever changes.

This boots and passes the x86 pkey selftests.

Reported-by: Alex Shi
Signed-off-by: Dave Hansen
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com
Signed-off-by: Sasha Levin

Dave Hansen
2020-10-01 19:17:35 +0800
484de771d KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow ... Browse Code »

[ Upstream commit c9dfd3fb08352d439f0399b6fabe697681d2638c ]

For the duration of mapping eVMCS, it derefences ->memslots without holding
->srcu or ->slots_lock when accessing hv assist page. This patch fixes it by
moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where the SRCU
is already taken.

It can be reproduced by running kvm's evmcs_test selftest.

=============================
warning: suspicious rcu usage
5.6.0-rc1+ #53 tainted: g w ioe
-----------------------------
./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by evmcs_test/8507:
#0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at:
kvm_vcpu_ioctl+0x85/0x680 [kvm]

stack backtrace:
cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53
hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016
call trace:
dump_stack+0x68/0x9b
kvm_read_guest_cached+0x11d/0x150 [kvm]
kvm_hv_get_assist_page+0x33/0x40 [kvm]
nested_enlightened_vmentry+0x2c/0x60 [kvm_intel]
nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel]
nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel]
vmx_vcpu_run+0x67a/0xe60 [kvm_intel]
vcpu_enter_guest+0x35e/0x1bc0 [kvm]
kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm]
kvm_vcpu_ioctl+0x370/0x680 [kvm]
ksys_ioctl+0x235/0x850
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x77/0x780
entry_syscall_64_after_hwframe+0x49/0xbe

Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

wanpeng li
2020-10-01 19:17:35 +0800
d1da39644 KVM: x86: fix incorrect comparison in trace event ... Browse Code »

[ Upstream commit 147f1a1fe5d7e6b01b8df4d0cbd6f9eaf6b6c73b ]

The "u" field in the event has three states, -1/0/1. Using u8 however means that
comparison with -1 will always fail, so change to signed char.

Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Paolo Bonzini
2020-10-01 19:17:35 +0800
04f4f0950 x86/kdump: Always reserve the low 1M when the crashkernel option is specified ... Browse Code »

[ Upstream commit 6f599d84231fd27e42f4ca2a786a6641e8cddf00 ]

On x86, purgatory() copies the first 640K of memory to a backup region
because the kernel needs those first 640K for the real mode trampoline
during boot, among others.

However, when SME is enabled, the kernel cannot properly copy the old
memory to the backup area but reads only its encrypted contents. The
result is that the crash tool gets invalid pointers when parsing vmcore:

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

So reserve the remaining low 1M memory when the crashkernel option is
specified (after reserving real mode memory) so that allocated memory
does not fall into the low 1M area and thus the copying of the contents
of the first 640k to a backup region in purgatory() can be avoided
altogether.

This way, it does not need to be included in crash dumps or used for
anything except the trampolines that must live in the low 1M.

[ bp: Heavily rewrite commit message, flip check logic in
crash_reserve_low_1M().]

Signed-off-by: Lianbo Jiang
Signed-off-by: Borislav Petkov
Cc: bhe@redhat.com
Cc: Dave Young
Cc: d.hatayama@fujitsu.com
Cc: dhowells@redhat.com
Cc: ebiederm@xmission.com
Cc: horms@verge.net.au
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jürgen Gross
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: vgoyal@redhat.com
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191108090027.11082-2-lijiang@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204793
Signed-off-by: Sasha Levin

Lianbo Jiang
2020-10-01 19:17:18 +0800

23 Sep, 2020

1 commit

463a0d4c1 x86/boot/compressed: Disable relocation relaxation ... Browse Code »

commit 09e43968db40c33a73e9ddbfd937f46d5c334924 upstream.

The x86-64 psABI [0] specifies special relocation types
(R_X86_64_[REX_]GOTPCRELX) for indirection through the Global Offset
Table, semantically equivalent to R_X86_64_GOTPCREL, which the linker
can take advantage of for optimization (relaxation) at link time. This
is supported by LLD and binutils versions 2.26 onwards.

The compressed kernel is position-independent code, however, when using
LLD or binutils versions before 2.27, it must be linked without the -pie
option. In this case, the linker may optimize certain instructions into
a non-position-independent form, by converting foo@GOTPCREL(%rip) to $foo.

This potential issue has been present with LLD and binutils-2.26 for a
long time, but it has never manifested itself before now:

- LLD and binutils-2.26 only relax
movq foo@GOTPCREL(%rip), %reg
to
leaq foo(%rip), %reg
which is still position-independent, rather than
mov $foo, %reg
which is permitted by the psABI when -pie is not enabled.

- GCC happens to only generate GOTPCREL relocations on mov instructions.

- CLang does generate GOTPCREL relocations on non-mov instructions, but
when building the compressed kernel, it uses its integrated assembler
(due to the redefinition of KBUILD_CFLAGS dropping -no-integrated-as),
which has so far defaulted to not generating the GOTPCRELX
relocations.

Nick Desaulniers reports [1,2]:

"A recent change [3] to a default value of configuration variable
(ENABLE_X86_RELAX_RELOCATIONS OFF -> ON) in LLVM now causes Clang's
integrated assembler to emit R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX
relocations. LLD will relax instructions with these relocations based
on whether the image is being linked as position independent or not.
When not, then LLD will relax these instructions to use absolute
addressing mode (R_RELAX_GOT_PC_NOPIC). This causes kernels built with
Clang and linked with LLD to fail to boot."

Patch series [4] is a solution to allow the compressed kernel to be
linked with -pie unconditionally, but even if merged is unlikely to be
backported. As a simple solution that can be applied to stable as well,
prevent the assembler from generating the relaxed relocation types using
the -mrelax-relocations=no option. For ease of backporting, do this
unconditionally.

[0] https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/linker-optimization.tex#L65
[1] https://lore.kernel.org/lkml/20200807194100.3570838-1-ndesaulniers@google.com/
[2] https://github.com/ClangBuiltLinux/linux/issues/1121
[3] https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0
[4] https://lore.kernel.org/lkml/20200731202738.2577854-1-nivedita@alum.mit.edu/

Reported-by: Nick Desaulniers
Signed-off-by: Arvind Sankar
Signed-off-by: Ingo Molnar
Tested-by: Nick Desaulniers
Tested-by: Sedat Dilek
Acked-by: Ard Biesheuvel
Reviewed-by: Nick Desaulniers
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200812004308.1448603-1-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman

Arvind Sankar
2020-09-23 18:40:46 +0800

17 Sep, 2020

2 commits

a86743ebe KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit ... Browse Code »

commit 99b82a1437cb31340dbb2c437a2923b9814a7b15 upstream.

According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
Don't report internal error and freeze guest when event delivery causes
an APIC-access exit, it is handleable and the event will be re-injected
during the next vmentry.

Signed-off-by: Wanpeng Li
Message-Id:
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman

Wanpeng Li
2020-09-17 19:47:54 +0800
087b6cb17 vgacon: remove software scrollback support ... Browse Code »

commit 973c096f6a85e5b5f2a295126ba6928d9a6afd45 upstream.

Yunhai Zhang recently fixed a VGA software scrollback bug in commit
ebfdfeeae8c0 ("vgacon: Fix for missing check in scrollback handling"),
but that then made people look more closely at some of this code, and
there were more problems on the vgacon side, but also the fbcon software
scrollback.

We don't really have anybody who maintains this code - probably because
nobody actually _uses_ it any more. Sure, people still use both VGA and
the framebuffer consoles, but they are no longer the main user
interfaces to the kernel, and haven't been for decades, so these kinds
of extra features end up bitrotting and not really being used.

So rather than try to maintain a likely unused set of code, I'll just
aggressively remove it, and see if anybody even notices. Maybe there
are people who haven't jumped on the whole GUI badnwagon yet, and think
it's just a fad. And maybe those people use the scrollback code.

If that turns out to be the case, we can resurrect this again, once
we've found the sucker^Wmaintainer for it who actually uses it.

Reported-by: NopNop Nop
Tested-by: Willy Tarreau
Cc: 张云海
Acked-by: Andy Lutomirski
Acked-by: Willy Tarreau
Reviewed-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Linus Torvalds
2020-09-17 19:47:54 +0800

10 Sep, 2020

2 commits

cc6c4d81d tracing/kprobes, x86/ptrace: Fix regs argument order for i386 ... Browse Code »

commit 2356bb4b8221d7dc8c7beb810418122ed90254c9 upstream.

On i386, the order of parameters passed on regs is eax,edx,and ecx
(as per regparm(3) calling conventions).

Change the mapping in regs_get_kernel_argument(), so that arg1=ax
arg2=dx, and arg3=cx.

Running the selftests testcase kprobes_args_use.tc shows the result
as passed.

Fixes: 3c88ee194c28 ("x86: ptrace: Add function argument access API")
Signed-off-by: Vamshi K Sthambamkadi
Signed-off-by: Borislav Petkov
Acked-by: Masami Hiramatsu
Acked-by: Peter Zijlstra (Intel)
Cc:
Link: https://lkml.kernel.org/r/20200828113242.GA1424@cosmos
Signed-off-by: Greg Kroah-Hartman

Vamshi K Sthambamkadi
2020-09-10 01:12:30 +0800
920d9ffcd x86, fakenuma: Fix invalid starting node ID ... Browse Code »

[ Upstream commit ccae0f36d500aef727f98acd8d0601e6b262a513 ]

Commit:

cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")

uses "-1" as the starting node ID, which causes the strange kernel log as
follows, when "numa=fake=32G" is added to the kernel command line:

Faking node -1 at [mem 0x0000000000000000-0x0000000893ffffff] (35136MB)
Faking node 0 at [mem 0x0000001840000000-0x000000203fffffff] (32768MB)
Faking node 1 at [mem 0x0000000894000000-0x000000183fffffff] (64192MB)
Faking node 2 at [mem 0x0000002040000000-0x000000283fffffff] (32768MB)
Faking node 3 at [mem 0x0000002840000000-0x000000303fffffff] (32768MB)

And finally the kernel crashes:

BUG: Bad page state in process swapper pfn:00011
page:(____ptrval____) refcount:0 mapcount:1 mapping:(____ptrval____) index:0x55cd7e44b270 pfn:0x11
failed to read mapping contents, not a valid kernel address?
flags: 0x5(locked|uptodate)
raw: 0000000000000005 000055cd7e44af30 000055cd7e44af50 0000000100000006
raw: 000055cd7e44b270 000055cd7e44b290 0000000000000000 000055cd7e44b510
page dumped because: page still charged to cgroup
page->mem_cgroup:000055cd7e44b510
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
Call Trace:
dump_stack+0x57/0x80
bad_page.cold+0x63/0x94
__free_pages_ok+0x33f/0x360
memblock_free_all+0x127/0x195
mem_init+0x23/0x1f5
start_kernel+0x219/0x4f5
secondary_startup_64+0xb6/0xc0

Fix this bug via using 0 as the starting node ID. This restores the
original behavior before cc9aec03e58f.

[ mingo: Massaged the changelog. ]

Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability")
Signed-off-by: "Huang, Ying"
Signed-off-by: Ingo Molnar
Link: https://lore.kernel.org/r/20200904061047.612950-1-ying.huang@intel.com
Signed-off-by: Sasha Levin

Huang Ying
2020-09-10 01:12:28 +0800

03 Sep, 2020

1 commit

1adf8c19f x86/hotplug: Silence APIC only after all interrupts are migrated ... Browse Code »

commit 52d6b926aabc47643cd910c85edb262b7f44c168 upstream.

There is a race when taking a CPU offline. Current code looks like this:

native_cpu_disable()
{
...
apic_soft_disable();
/*
* Any existing set bits for pending interrupt to
* this CPU are preserved and will be sent via IPI
* to another CPU by fixup_irqs().
*/
cpu_disable_common();
{
....
/*
* Race window happens here. Once local APIC has been
* disabled any new interrupts from the device to
* the old CPU are lost
*/
fixup_irqs(); // Too late to capture anything in IRR.
...
}
}

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green
Signed-off-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Tested-by: Mathias Nyman
Tested-by: Evan Green
Reviewed-by: Evan Green
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
Signed-off-by: Greg Kroah-Hartman

Ashok Raj
2020-09-03 17:27:06 +0800

26 Aug, 2020

5 commits

e1818ffcc KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() ... Browse Code »

commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc:
Cc: Marc Zyngier
Cc: Suzuki K Poulose
Cc: James Morse
Signed-off-by: Will Deacon
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Will Deacon
Signed-off-by: Greg Kroah-Hartman

Will Deacon
2020-08-26 16:41:08 +0800
dc0d58e28 Fix build error when CONFIG_ACPI is not set/enabled: ... Browse Code »

[ Upstream commit ee87e1557c42dc9c2da11c38e11b87c311569853 ]

../arch/x86/pci/xen.c: In function ‘pci_xen_init’:
../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
acpi_noirq_set();

Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef")
Signed-off-by: Randy Dunlap
Reviewed-by: Juergen Gross
Cc: Andy Shevchenko
Cc: Bjorn Helgaas
Cc: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xenproject.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Juergen Gross
Signed-off-by: Sasha Levin

Randy Dunlap
2020-08-26 16:41:05 +0800
6e2aa034d kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode ... Browse Code »

[ Upstream commit cb957adb4ea422bd758568df5b2478ea3bb34f35 ]

See the SDM, volume 3, section 4.4.1:

If PAE paging would be in use following an execution of MOV to CR0 or
MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
the PDPTEs are loaded from the address in CR3.

Fixes: b9baba8614890 ("KVM, pkeys: expose CPUID/CR4 to guest")
Cc: Huaitong Han
Signed-off-by: Jim Mattson
Reviewed-by: Peter Shier
Reviewed-by: Oliver Upton
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Jim Mattson
2020-08-26 16:41:04 +0800
46713f3d6 kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode ... Browse Code »

[ Upstream commit 427890aff8558eb4326e723835e0eae0e6fe3102 ]

See the SDM, volume 3, section 4.4.1:

If PAE paging would be in use following an execution of MOV to CR0 or
MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
the PDPTEs are loaded from the address in CR3.

Fixes: 0be0226f07d14 ("KVM: MMU: fix SMAP virtualization")
Cc: Xiao Guangrong
Signed-off-by: Jim Mattson
Reviewed-by: Peter Shier
Reviewed-by: Oliver Upton
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Sasha Levin

Jim Mattson
2020-08-26 16:41:03 +0800
fa84d9f31 x86/boot: kbuild: allow readelf executable to be specified ... Browse Code »

commit eefb8c124fd969e9a174ff2bedff86aa305a7438 upstream.

Introduce a new READELF variable to top-level Makefile, so the name of
readelf binary can be specified.

Before this change the name of the binary was hardcoded to
"$(CROSS_COMPILE)readelf" which might not be present for every
toolchain.

This allows to build with LLVM Object Reader by using make parameter
READELF=llvm-readelf.

Link: https://github.com/ClangBuiltLinux/linux/issues/771
Signed-off-by: Dmitry Golovin
Reviewed-by: Nick Desaulniers
Signed-off-by: Masahiro Yamada
Signed-off-by: Nick Desaulniers
Signed-off-by: Greg Kroah-Hartman

Dmitry Golovin
2020-08-26 16:40:46 +0800

21 Aug, 2020

3 commits

64d358a9a perf/x86/rapl: Fix missing psys sysfs attributes ... Browse Code »

[ Upstream commit 4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431 ]

This fixes a problem introduced by commit:

5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")

that perf event sysfs attributes for psys RAPL domain are missing.

Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
Signed-off-by: Zhang Rui
Signed-off-by: Ingo Molnar
Reviewed-by: Kan Liang
Reviewed-by: Len Brown
Acked-by: Jiri Olsa
Link: https://lore.kernel.org/r/20200811153149.12242-2-rui.zhang@intel.com
Signed-off-by: Sasha Levin

Zhang Rui
2020-08-21 19:05:38 +0800
8d7633b5a x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC ... Browse Code »

[ Upstream commit 7d98585860d845e36ee612832a5ff021f201dbaf ]

Frequency descriptor of Lightning Mountain SoC doesn't have all the
frequency entries so resulting in the below failure causing a kernel hang:

Error MSR_FSB_FREQ index 15 is unknown
tsc: Fast TSC calibration failed

So, add all the frequency entries in the Lightning Mountain SoC frequency
descriptor.

Fixes: 0cc5359d8fd45 ("x86/cpu: Update init data for new Airmont CPU model")
Fixes: 812c2d7506fd ("x86/tsc_msr: Use named struct initializers")
Signed-off-by: Dilip Kota
Signed-off-by: Ingo Molnar
Reviewed-by: Andy Shevchenko
Link: https://lore.kernel.org/r/211c643ae217604b46cbec43a2c0423946dc7d2d.1596440057.git.eswara.kota@linux.intel.com
Signed-off-by: Sasha Levin

Dilip Kota
2020-08-21 19:05:36 +0800
a11f42496 genirq/affinity: Make affinity setting if activated opt-in ... Browse Code »

commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream.

John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:

"It looks like what happens is that because the interrupts are not per-CPU
in the hardware, armpmu_request_irq() calls irq_force_affinity() while
the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
IRQF_NOBALANCING.

Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."

This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.

Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...

Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping
Signed-off-by: Thomas Gleixner
Tested-by: Marc Zyngier
Acked-by: Marc Zyngier
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-08-21 19:05:20 +0800

19 Aug, 2020

5 commits

a3ec61c84 irqdomain/treewide: Free firmware node after domain removal ... Browse Code »

commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream.

Commit 711419e504eb ("irqdomain: Add the missing assignment of
domain->fwnode for named fwnode") unintentionally caused a dangling pointer
page fault issue on firmware nodes that were freed after IRQ domain
allocation. Commit e3beca48a45b fixed that dangling pointer issue by only
freeing the firmware node after an IRQ domain allocation failure. That fix
no longer frees the firmware node immediately, but leaves the firmware node
allocated after the domain is removed.

The firmware node must be kept around through irq_domain_remove, but should be
freed it afterwards.

Add the missing free operations after domain removal where where appropriate.

Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Signed-off-by: Jon Derrick
Signed-off-by: Thomas Gleixner
Reviewed-by: Andy Shevchenko
Acked-by: Bjorn Helgaas # drivers/pci
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
Signed-off-by: Greg Kroah-Hartman

Jon Derrick
2020-08-19 14:16:27 +0800
5ef739b7a crypto: aesni - add compatibility with IAS ... Browse Code »

[ Upstream commit 44069737ac9625a0f02f0f7f5ab96aae4cd819bc ]

Clang's integrated assembler complains "invalid reassignment of
non-absolute variable 'var_ddq_add'" while assembling
arch/x86/crypto/aes_ctrby8_avx-x86_64.S. It was because var_ddq_add was
reassigned with non-absolute values several times, which IAS did not
support. We can avoid the reassignment by replacing the uses of
var_ddq_add with its definitions accordingly to have compatilibility
with IAS.

Link: https://github.com/ClangBuiltLinux/linux/issues/1008
Reported-by: Sedat Dilek
Reported-by: Fangrui Song
Tested-by: Sedat Dilek # build+boot Linux v5.7.5; clang v11.0.0-git
Signed-off-by: Jian Cai
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin

Jian Cai
2020-08-19 14:16:22 +0800
c44efee6e x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task ... Browse Code »

[ Upstream commit 8ab49526b53d3172d1d8dd03a75c7d1f5bd21239 ]

syzbot found its way in 86_fsgsbase_read_task() and triggered this oops:

KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
Call Trace:
putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
copy_regset_from_user include/linux/regset.h:326 [inline]
ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
__do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
__se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
__ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
__do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
or GS for a task with no LDT and something tries to read the base before
a return to usermode notices the bad selector and fixes it.

The fix is to make sure ldt pointer is not NULL.

Fixes: 07e1d88adaae ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
Co-developed-by: Jann Horn
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Acked-by: Andy Lutomirski
Cc: Chang S. Bae
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Markus T Metzger
Cc: Peter Zijlstra
Cc: Ravi Shankar
Cc: Rik van Riel
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Eric Dumazet
2020-08-19 14:16:22 +0800
8b8d17d9f crypto: aesni - Fix build with LLVM_IAS=1 ... Browse Code »

[ Upstream commit 3347c8a079d67af21760a78cc5f2abbcf06d9571 ]

When building with LLVM_IAS=1 means using Clang's Integrated Assembly (IAS)
from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from GNU/binutils
I see the following breakage in Debian/testing AMD64:

:15:74: error: too many positional arguments
PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
^
arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation
GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
^
:47:2: error: unknown use of instruction mnemonic without a size suffix
GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
^
arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation
GCM_ENC_DEC dec
^
:15:74: error: too many positional arguments
PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
^
arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation
GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
^
:47:2: error: unknown use of instruction mnemonic without a size suffix
GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
^
arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation
GCM_ENC_DEC enc

Craig Topper suggested me in ClangBuiltLinux issue #1050:

> I think the "too many positional arguments" is because the parser isn't able
> to handle the trailing commas.
>
> The "unknown use of instruction mnemonic" is because the macro was named
> GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with
> GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the
> macro instantiation, but llvm doesn't.

First, I removed the trailing comma in the PRECOMPUTE line.

Second, I substituted:
1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec
2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc

With these changes I was able to build with LLVM_IAS=1 and boot on bare metal.

I confirmed that this works with Linux-kernel v5.7.5 final.

NOTE: This patch is on top of Linux v5.7 final.

Thanks to Craig and especially Nick for double-checking and his comments.

Suggested-by: Craig Topper
Suggested-by: Craig Topper
Suggested-by: Nick Desaulniers
Reviewed-by: Nick Desaulniers
Cc: "ClangBuiltLinux"
Link: https://github.com/ClangBuiltLinux/linux/issues/1050
Link: https://bugs.llvm.org/show_bug.cgi?id=24494
Signed-off-by: Sedat Dilek
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin

Sedat Dilek
2020-08-19 14:16:00 +0800
072d1300f x86/mce/inject: Fix a wrong assignment of i_mce.status ... Browse Code »

[ Upstream commit 5d7f7d1d5e01c22894dee7c9c9266500478dca99 ]

The original code is a nop as i_mce.status is or'ed with part of itself,
fix it.

Fixes: a1300e505297 ("x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts")
Signed-off-by: Zhenzhong Duan
Signed-off-by: Borislav Petkov
Acked-by: Yazen Ghannam
Link: https://lkml.kernel.org/r/20200611023238.3830-1-zhenzhong.duan@gmail.com
Signed-off-by: Sasha Levin

Zhenzhong Duan
2020-08-19 14:15:53 +0800

05 Aug, 2020

4 commits

395685467 x86/i8259: Use printk_deferred() to prevent deadlock ... Browse Code »

commit bdd65589593edd79b6a12ce86b3b7a7c6dae5208 upstream.

0day reported a possible circular locking dependency:

Chain exists of:
&irq_desc_lock_class --> console_owner --> &port_lock_key

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&port_lock_key);
lock(console_owner);
lock(&port_lock_key);
lock(&irq_desc_lock_class);

The reason for this is a printk() in the i8259 interrupt chip driver
which is invoked with the irq descriptor lock held, which reverses the
lock operations vs. printk() from arbitrary contexts.

Switch the printk() to printk_deferred() to avoid that.

Reported-by: kernel test robot
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87365abt2v.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-08-05 15:59:52 +0800
01ac46c6b KVM: LAPIC: Prevent setting the tscdeadline timer if the lapic is hw disabled ... Browse Code »

commit d2286ba7d574ba3103a421a2f9ec17cb5b0d87a1 upstream.

Prevent setting the tscdeadline timer if the lapic is hw disabled.

Fixes: bce87cce88 (KVM: x86: consolidate different ways to test for in-kernel LAPIC)
Cc:
Signed-off-by: Wanpeng Li
Message-Id:
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman

Wanpeng Li
2020-08-05 15:59:52 +0800
5f4e6b874 x86/stacktrace: Fix reliable check for empty user task stacks ... Browse Code »

[ Upstream commit 039a7a30ec102ec866d382a66f87f6f7654f8140 ]

If a user task's stack is empty, or if it only has user regs, ORC
reports it as a reliable empty stack. But arch_stack_walk_reliable()
incorrectly treats it as unreliable.

That happens because the only success path for user tasks is inside the
loop, which only iterates on non-empty stacks. Generally, a user task
must end in a user regs frame, but an empty stack is an exception to
that rule.

Thanks to commit 71c95825289f ("x86/unwind/orc: Fix error handling in
__unwind_start()"), unwind_start() now sets state->error appropriately.
So now for both ORC and FP unwinders, unwind_done() and !unwind_error()
always means the end of the stack was successfully reached. So the
success path for kthreads is no longer needed -- it can also be used for
empty user tasks.

Reported-by: Wang ShaoBo
Signed-off-by: Josh Poimboeuf
Signed-off-by: Thomas Gleixner
Tested-by: Wang ShaoBo
Link: https://lkml.kernel.org/r/f136a4e5f019219cbc4f4da33b30c2f44fa65b84.1594994374.git.jpoimboe@redhat.com
Signed-off-by: Sasha Levin

Josh Poimboeuf
2020-08-05 15:59:51 +0800
32344d299 x86/unwind/orc: Fix ORC for newly forked tasks ... Browse Code »

[ Upstream commit 372a8eaa05998cd45b3417d0e0ffd3a70978211a ]

The ORC unwinder fails to unwind newly forked tasks which haven't yet
run on the CPU. It correctly reads the 'ret_from_fork' instruction
pointer from the stack, but it incorrectly interprets that value as a
call stack address rather than a "signal" one, so the address gets
incorrectly decremented in the call to orc_find(), resulting in bad ORC
data.

Fix it by forcing 'ret_from_fork' frames to be signal frames.

Reported-by: Wang ShaoBo
Signed-off-by: Josh Poimboeuf
Signed-off-by: Thomas Gleixner
Tested-by: Wang ShaoBo
Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com
Signed-off-by: Sasha Levin

Josh Poimboeuf
2020-08-05 15:59:51 +0800

29 Jul, 2020

3 commits

697bd3e4a x86, vmlinux.lds: Page-align end of ..page_aligned sections ... Browse Code »

commit de2b41be8fcccb2f5b6c480d35df590476344201 upstream.

On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
page-aligned, but the end of the .bss..page_aligned section is not
guaranteed to be page-aligned.

As a result, objects from other .bss sections may end up on the same 4k
page as the idt_table, and will accidentially get mapped read-only during
boot, causing unexpected page-faults when the kernel writes to them.

This could be worked around by making the objects in the page aligned
sections page sized, but that's wrong.

Explicit sections which store only page aligned objects have an implicit
guarantee that the object is alone in the page in which it is placed. That
works for all objects except the last one. That's inconsistent.

Enforcing page sized objects for these sections would wreckage memory
sanitizers, because the object becomes artificially larger than it should
be and out of bound access becomes legit.

Align the end of the .bss..page_aligned and .data..page_aligned section on
page-size so all objects places in these sections are guaranteed to have
their own page.

[ tglx: Amended changelog ]

Signed-off-by: Joerg Roedel
Signed-off-by: Thomas Gleixner
Reviewed-by: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman

Joerg Roedel
2020-07-29 16:18:45 +0800
90e78ec7d x86: math-emu: Fix up 'cmp' insn for clang ias ... Browse Code »

[ Upstream commit 81e96851ea32deb2c921c870eecabf335f598aeb ]

The clang integrated assembler requires the 'cmp' instruction to
have a length prefix here:

arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl')
cmp $0xffffffff,-24(%ebp)
^

Make this a 32-bit comparison, which it was clearly meant to be.

Signed-off-by: Arnd Bergmann
Signed-off-by: Thomas Gleixner
Reviewed-by: Nick Desaulniers
Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de
Signed-off-by: Sasha Levin

Arnd Bergmann
2020-07-29 16:18:40 +0800
36f735554 irqdomain/treewide: Keep firmware node unconditionally allocated ... Browse Code »

[ Upstream commit e3beca48a45b5e0e6e6a4e0124276b8248dcc9bb ]

Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in irqdomain for other reasons and missed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous itself, but in
case that the domain is destroyed later on this leads to a double free.

Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.

Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: Andy Shevchenko
Signed-off-by: Thomas Gleixner
Acked-by: Bjorn Helgaas
Acked-by: Marc Zyngier
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin

Thomas Gleixner
2020-07-29 16:18:28 +0800

22 Jul, 2020

2 commits

9f8d3d2f7 genirq/affinity: Handle affinity setting on inactive interrupts correctly ... Browse Code »

commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.

Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.

X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.

Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:

- Store it in the irq descriptors affinity mask
- Update the effective affinity to reflect that so user space has
a consistent view
- Don't call into the irq chip driver

This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.

Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.

For hierarchial irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.

Remove the X86 workaround as it is not longer required.

Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: Ali Saidi
Signed-off-by: Thomas Gleixner
Tested-by: Ali Saidi
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2020-07-22 15:33:16 +0800
742b79562 copy_xstate_to_kernel: Fix typo which caused GDB regression ... Browse Code »

commit 5714ee50bb4375bd586858ad800b1d9772847452 upstream.

This fixes a regression encountered while running the
gdb.base/corefile.exp test in GDB's test suite.

In my testing, the typo prevented the sw_reserved field of struct
fxregs_state from being output to the kernel XSAVES area. Thus the
correct mask corresponding to XCR0 was not present in the core file for
GDB to interrogate, resulting in the following behavior:

[kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core
Reading symbols from testsuite/outputs/gdb.base/corefile/corefile...
[New LWP 232880]

warning: Unexpected size of section `.reg-xstate/232880' in core file.

With the typo fixed, the test works again as expected.

Signed-off-by: Kevin Buettner
Fixes: 9e4636545933 ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized")
Cc: Al Viro
Cc: Dave Airlie
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Kevin Buettner
2020-07-22 15:33:04 +0800