31 Jan, 2019
1 commit
-
commit 9f08890ab906abaf9d4c1bad8111755cbd302260 upstream.
Right now there is only a pvclock_pvti_cpu0_va() which is defined
on kvmclock since:

commit dac16fba6fc5
("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

The only user of this interface so far is kvm. This commit adds a
setter function for the pvti page and moves pvclock_pvti_cpu0_va
to pvclock, which is a more generic place to have it; and would
allow other PV clocksources to use it, such as Xen.

While moving pvclock_pvti_cpu0_va into pvclock, rename also this
function to pvclock_get_pvti_cpu0_va (including its call sites)
to be symmetric with the setter (pvclock_set_pvti_cpu0_va).
Signed-off-by: Joao Martins
Acked-by: Andy Lutomirski
Acked-by: Paolo Bonzini
Acked-by: Thomas Gleixner
Signed-off-by: Boris Ostrovsky
Signed-off-by: Juergen Gross
Signed-off-by: Greg Kroah-Hartman
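The getter/setter pairing described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct pvti_page` is a hypothetical stand-in for the kernel's pvti structure, and the bodies are a guess at the general shape, not the code from the commit. Only the two function names come from the message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU pvclock time info page. */
struct pvti_page {
    unsigned int version;
    unsigned long long system_time;
};

/* Address of the CPU 0 pvti page; NULL until a PV clocksource sets it. */
static struct pvti_page *pvti_cpu0_va;

/* Setter: called by whichever PV clocksource (kvmclock, Xen, ...) owns the page. */
void pvclock_set_pvti_cpu0_va(struct pvti_page *pvti)
{
    pvti_cpu0_va = pvti;
}

/* Getter: callers that need the page no longer depend on kvmclock itself. */
struct pvti_page *pvclock_get_pvti_cpu0_va(void)
{
    return pvti_cpu0_va;
}
```

Keeping the pointer behind a setter in the generic pvclock layer is what would let a second clocksource register its own page without touching the readers.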
13 Oct, 2018
3 commits
-
commit 02e425668f5c9deb42787d10001a3b605993ad15 upstream.
When I added the missing memory outputs, I failed to update the
index of the first argument (ebx) on 32-bit builds, which broke the
fallbacks. Somehow I must have screwed up my testing or gotten
lucky.

Add another test to cover gettimeofday() as well.
Signed-off-by: Andy Lutomirski
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: stable@vger.kernel.org
Fixes: 715bd9d12f84 ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks")
Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 4f166564014aba65ad6f15b612f6711fd0f117ee upstream.
When I fixed the vDSO build to use inline retpolines, I messed up
the Makefile logic and made it unconditional. It should have
depended on CONFIG_RETPOLINE and on the availability of compiler
support. This broke the build on some older compilers.
Reported-by: nikola.ciprich@linuxbox.cz
Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: David Woodhouse
Cc: Linus Torvalds
Cc: Matt Rickard
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: jason.vas.dias@gmail.com
Cc: stable@vger.kernel.org
Fixes: 2e549b2ee0e3 ("x86/vdso: Fix vDSO build if a retpoline is emitted")
Link: http://lkml.kernel.org/r/08a1f29f2c238dd1f493945e702a521f8a5aa3ae.1538540801.git.luto@kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 715bd9d12f84d8f5cc8ad21d888f9bc304a8eb0b upstream.
The syscall fallbacks in the vDSO have incorrect asm constraints.
They are not marked as writing to their outputs -- instead, they are
marked as clobbering "memory", which is useless. In particular, gcc
is smart enough to know that the timespec parameter hasn't escaped,
so a memory clobber doesn't clobber it. And passing a pointer as an
asm *input* does not tell gcc that the pointed-to value is changed.

Add in the fact that the asm instructions weren't volatile, and gcc
was free to omit them entirely unless their sole output (the return
value) is used. Which it is (phew!), but that stops happening with
some upcoming patches.

As a trivial example, the following code:

    void test_fallback(struct timespec *ts)
    {
        vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
    }

compiles to:

    00000000000000c0 <test_fallback>:
      c0: c3    retq

To add insult to injury, the RCX and R11 clobbers on 64-bit
builds were missing.

The "memory" clobber is also unnecessary -- no ordering with respect to
other memory operations is needed, but that's going to be fixed in a
separate not-for-stable patch.
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
1 commit
-
[ Upstream commit 6709812f094d96543b443645c68daaa32d3d3e77 ]
Sadly, other than claimed in:
a368d7fd2a ("x86/entry/64: Add instruction suffix")
... there are two more instances which want to be adjusted.
As said there, omitting suffixes from instructions in AT&T mode is bad
practice when operand size cannot be determined by the assembler from
register operands, and is likely going to be warned about by upstream
gas in the future (mine does already).

Add the other missing suffixes here as well.
Signed-off-by: Jan Beulich
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/5B3A02DD02000078001CFB78@prv1-mh.provo.novell.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
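The suffix rule can be illustrated with a small inline-asm sketch in C. This is not the entry code touched by the commit; `increment_in_memory` is a hypothetical helper. The operand is a memory location, so the assembler cannot infer the operand size from a register name, and the explicit `l` suffix states that a 32-bit increment is intended.

```c
/* Hypothetical helper: increment a 32-bit value held in memory. */
static int increment_in_memory(int value)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "incl", not "inc": the memory operand alone does not give the size. */
    __asm__ ("incl %0" : "+m" (value) : : "cc");
    return value;
#else
    /* Portable stand-in for non-x86 targets. */
    return value + 1;
#endif
}
```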
05 Sep, 2018
1 commit
-
commit 2e549b2ee0e358bc758480e716b881f9cabedb6a upstream.
Currently, if the vDSO ends up containing an indirect branch or
call, GCC will emit the "external thunk" style of retpoline, and it
will fail to link.

Fix it by building the vDSO with inline retpoline thunks.
I haven't seen any reports of this triggering on an unpatched
kernel.
Fixes: commit 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Acked-by: Matt Rickard
Cc: Borislav Petkov
Cc: Jason Vas Dias
Cc: David Woodhouse
Cc: Peter Zijlstra
Cc: Andi Kleen
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/c76538cd3afbe19c6246c2d1715bc6a60bd63985.1534448381.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman
06 Aug, 2018
1 commit
-
commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream.
error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.

This makes error_entry simpler and makes error_exit more robust.

It also fixes a nasty bug. Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:

    ALLOC_PT_GPREGS_ON_STACK
    SAVE_C_REGS
    SAVE_EXTRA_REGS
    ENCODE_FRAME_POINTER
    jmp error_exit

And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.

Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:

    commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")

With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.

I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.

[ Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed. ]

[ Note to stable maintainers: this should probably get applied to all
kernels. If you're nervous about that, a more conservative fix to
add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
also fix the problem. ]
Reported-and-tested-by: M. Vefa Bicakci
Signed-off-by: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: Dominik Brodowski
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
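The idea of reading the user vs. kernel status from regs->cs instead of a side flag can be sketched in C. The struct and function names below are hypothetical, and the selector values in the usage note are only the conventional x86-64 Linux ones. The low two bits of the saved CS selector hold the privilege level of the interrupted context, so a nonzero value means the frame came from user mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the saved register frame. */
struct saved_frame {
    uint64_t ip;
    uint64_t cs;   /* code segment selector saved at entry */
    uint64_t flags;
};

/* Privilege level lives in the low two bits of CS: 0 = kernel, 3 = user. */
static bool frame_is_user_mode(const struct saved_frame *regs)
{
    return (regs->cs & 3) != 0;
}
```

For example, a frame with `cs = 0x33` would be classified as user mode and one with `cs = 0x10` as kernel mode. Because the answer is derived from the frame itself, an entry path that skips error_entry cannot leave a stale flag behind.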
29 Mar, 2018
2 commits
-
commit 31ad7f8e7dc94d3b85ccf9b6141ce6dfd35a1781 upstream.
Writing to it directly does not work for Xen PV guests.
Fixes: 49275fef986a ("x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy")
Signed-off-by: Boris Ostrovsky
Signed-off-by: Thomas Gleixner
Reviewed-by: Juergen Gross
Acked-by: Andy Lutomirski
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180319143154.3742-1-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit d8ba61ba58c88d5207c1ba2f7d9a2280e7d03be9 upstream.
There's nothing IST-worthy about #BP/int3. We don't allow kprobes
in the small handful of places in the kernel that run at CPL0 with
an invalid stack, and 32-bit kernels have used normal interrupt
gates for #BP forever.

Furthermore, we don't allow kprobes in places that have usergs while
in kernel mode, so "paranoid" is also unnecessary.
Signed-off-by: Andy Lutomirski
Signed-off-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
15 Mar, 2018
3 commits
-
commit d1c99108af3c5992640aa2afa7d2e88c3775c06e upstream.
This reverts commit 1dde7415e99933bb7293d6b2843752cbdb43ec11. By putting
the RSB filling out of line and calling it, we waste one RSB slot for
returning from the function itself, which means one fewer actual function
call we can make if we're doing the Skylake abomination of call-depth
counting.

It also changed the number of RSB stuffings we do on vmexit from 32,
which was correct, to 16. Let's just stop with the bikeshedding; it
didn't actually *fix* anything anyway.
Signed-off-by: David Woodhouse
Acked-by: Thomas Gleixner
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-4-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit ced5d0bf603fa0baee8ea889e1d70971fd210894 upstream.
On some x86 CPU microarchitectures using 'xorq' to clear general-purpose
registers is slower than 'xorl'. As 'xorl' is sufficient to clear all
64 bits of these registers due to zero-extension [*], switch the x86
64-bit entry code to use 'xorl'.

No change in functionality and no change in code size.

[*] According to Intel 64 and IA-32 Architecture Software Developer's
Manual, section 3.4.1.1, the result of 32-bit operands are
"zero-extended to a 64-bit result in the destination general-purpose
register." The AMD64 Architecture Programmer’s Manual Volume 3,
Appendix B.1, describes the same behaviour.
Suggested-by: Denys Vlasenko
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net
[ Improved on the changelog a bit. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
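The zero-extension behaviour that the xorl change relies on can be checked from user space with a short inline-asm sketch. `clear_with_xorl` is a hypothetical helper, not kernel code. The `%k0` operand modifier selects the 32-bit name of whatever 64-bit register the compiler picked, so the instruction emitted is a 32-bit XOR even though the C variable is 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: clear a 64-bit register using a 32-bit XOR. */
static uint64_t clear_with_xorl(uint64_t value)
{
#if defined(__x86_64__)
    /* Writing the 32-bit half zero-extends into the full 64-bit register. */
    __asm__ ("xorl %k0, %k0" : "+r" (value) : : "cc");
    return value;
#else
    /* Portable stand-in for non-x86-64 targets. */
    (void)value;
    return 0;
#endif
}
```

Passing an all-ones value makes the point visible: if the upper 32 bits were preserved, the result would be nonzero.
-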
commit 9e809d15d6b692fa061d74be7aaab1c79f6784b8 upstream.
Play a little trick in the generic PUSH_AND_CLEAR_REGS macro
to insert the GP registers "above" the original return address.

This allows us to (re-)insert the macro in error_entry() and
paranoid_entry() and to remove it from the idtentry macro. This
reduces the static footprint significantly:

     text    data     bss     dec     hex filename
    24307       0       0   24307    5ef3 entry_64.o-orig
    20987       0       0   20987    51fb entry_64.o

Co-developed-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net
[ Small tweaks to comments. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
22 Feb, 2018
14 commits
-
commit e48657573481a5dff7cfdc3d57005c80aa816500 upstream.
Josh Poimboeuf noticed the following bug:
"The paranoid exit code only restores the saved CR3 when it switches back
to the user GS. However, even in the kernel GS case, it's possible that
it needs to restore a user CR3, if for example, the paranoid exception
occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and
SWAPGS."

Josh also confirmed via targeted testing that it's possible to hit this bug.
Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch.
The reason we haven't seen this bug reported by users yet is probably because
"paranoid" entry points are limited to the following cases:

    idtentry double_fault do_double_fault has_error_code=1 paranoid=2
    idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
    idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
    idtentry machine_check do_mce has_error_code=0 paranoid=1

Amongst those entry points only machine_check is one that will interrupt an
IRQS-off critical section asynchronously - and machine check events are rare.

The other main asynchronous entries are NMI entries, which can be very high-freq
with perf profiling, but they are special: they don't use the 'idtentry' macro but
are open coded and restore user CR3 unconditionally so don't have this bug.
Reported-and-tested-by: Josh Poimboeuf
Reviewed-by: Andy Lutomirski
Acked-by: Thomas Gleixner
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Linus Torvalds
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20180214073910.boevmg65upbk3vqb@gmail.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit b498c261107461d5c42140dfddd05df83d8ca078 upstream.
That macro was touched around 2.5.8 times, judging by the full history
linux repo, but it was unused even then. Get rid of it already.
Signed-off-by: Borislav Petkov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux@dominikbrodowski.net
Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnic
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit b3ccefaed922529e6a67de7b30af5aa38c76ace9 upstream.
With the following commit:
f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
... one of my suggested improvements triggered a frame pointer warning:
arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup
The warning is correct for the build-time code, but it's actually not
relevant at runtime because of paravirt patching. The paravirt swapgs
call gets replaced with either a SWAPGS instruction or NOPs at runtime.

Go back to the previous behavior by removing the ELF function annotation
for paranoid_entry() and adding an unwind hint, which effectively
silences the warning.
Reported-by: kbuild test robot
Signed-off-by: Josh Poimboeuf
Cc: Dominik Brodowski
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@treble
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 92816f571af81e9a71cc6f3dc8ce1e2fcdf7b6b8 upstream.
... same as the other macros in arch/x86/entry/calling.h
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit dde3036d62ba3375840b10ab9ec0d568fd773b07 upstream.
Previously, error_entry() and paranoid_entry() saved the GP registers
onto stack space previously allocated by its callers. Combine these two
steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro
for that.

This adds a significant amount of text size. However, Ingo Molnar points
out that:

    "these numbers also _very_ significantly over-represent the
    extra footprint. The assumptions that resulted in
    us compressing the IRQ entry code have changed very
    significantly with the new x86 IRQ allocation code we
    introduced in the last year:

    - IRQ vectors are usually populated in tightly clustered
      groups.

      With our new vector allocator code the typical per CPU
      allocation percentage on x86 systems is ~3 device vectors
      and ~10 fixed vectors out of ~220 vectors - i.e. a very
      low ~6% utilization (!). [...]

      The days where we allocated a lot of vectors on every
      CPU and the compression of the IRQ entry code text
      mattered are over.

    - Another issue is that only a small minority of vectors
      is frequent enough to actually matter to cache utilization
      in practice: 3-4 key IPIs and 1-2 device IRQs at most - and
      those vectors tend to be tightly clustered as well into about
      two groups, and are probably already on 2-3 cache lines in
      practice.

      For the common case of 'cache cold' IRQs it's the depth of
      the call chain and the fragmentation of the resulting I$
      that should be the main performance limit - not the overall
      size of it.

    - The CPU side cost of IRQ delivery is still very expensive
      even in the best, most cached case, as in 'over a thousand
      cycles'. So much stuff is done that maybe contemporary x86
      IRQ entry microcode already prefetches the IDT entry and its
      expected call target address."[*]

[*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com

The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need
modification. Previously, %rsp was manually decreased by 15*8; with
this patch, %rsp is decreased by 15 pushq instructions.

[jpoimboe@redhat.com: unwind hint improvements]
Suggested-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 30907fd13bb593202574bb20af58d67c70a1ee14 upstream.
entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use
PUSH_AND_CLEAR_REGS instead of opencoded variants thereof. Due to
the interleaving, the additional XOR-based clearing of R8 and R9
in entry_SYSCALL_64_after_hwframe() should not have any noticeable
negative implications.
Suggested-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 3f01daecd545e818098d84fd1ad43e19a508d705 upstream.
Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS.
This macro uses PUSH instead of MOV and should therefore be faster, at
least on newer CPUs.
Suggested-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit f7bafa2b05ef25eda1d9179fd930b0330cf2b7d1 upstream.
Same as is done for syscalls, interleave XOR with PUSH instructions
for exceptions/interrupts, in order to minimize the cost of the
additional instructions required for register clearing.
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 502af0d70843c2a9405d7ba1f79b4b0305aaf5f5 upstream.
The two special, opencoded cases for POP_C_REGS can be handled by ASM
macros.
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 2e3f0098bc45f710a2f4cbcc94b80a1fae7a99a1 upstream.
All current code paths call SAVE_C_REGS and then immediately
SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV
sequences properly.

While at it, remove the macros to save all except specific registers,
as these macros have been unused for a long time.
Suggested-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 3ac6d8c787b835b997eb23e43e09aa0895ef7d58 upstream.
Clear the 'extra' registers on entering the 64-bit kernel for exceptions
and interrupts. The common registers are not cleared since they are
likely clobbered well before they can be exploited in a speculative
execution attack.
Originally-From: Andi Kleen
Signed-off-by: Dan Williams
Cc:
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/151787989146.7847.15749181712358213254.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog and the code comments. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 14b1fcc62043729d12e8ae00f8297ab2ffe9fa91 upstream.
The comment is confusing since the path is taken when
CONFIG_PAGE_TABLE_ISOLATION=y is disabled (while the comment says it is not
taken).
Signed-off-by: Nadav Amit
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: nadav.amit@gmail.com
Link: http://lkml.kernel.org/r/20180209170638.15161-1-namit@vmware.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 6b8cf5cc9965673951f1ab3f0e3cf23d06e3e2ee upstream.
At entry userspace may have populated registers with values that could
otherwise be useful in a speculative execution attack. Clear them to
minimize the kernel's attack surface.
Originally-From: Andi Kleen
Signed-off-by: Dan Williams
Cc:
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/151787989697.7847.4083702787288600552.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 8e1eb3fa009aa7c0b944b3c8b26b07de0efb3200 upstream.
At entry userspace may have (maliciously) populated the extra registers
outside the syscall calling convention with arbitrary values that could
be useful in a speculative execution (Spectre style) attack.

Clear these registers to minimize the kernel's attack surface.
Note, this only clears the extra registers and not the unused
registers for syscalls less than 6 arguments, since those registers are
likely to be clobbered well before their values could be put to use
under speculation.

Note, Linus found that the XOR instructions can be executed with
minimized cost if interleaved with the PUSH instructions, and Ingo's
analysis found that R10 and R11 should be included in the register
clearing beyond the typical 'extra' syscall calling convention
registers.
Suggested-by: Linus Torvalds
Reported-by: Andi Kleen
Signed-off-by: Dan Williams
Cc:
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/151787988577.7847.16733592218894189003.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog and the code comments. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
08 Feb, 2018
5 commits
-
commit 2fbd7af5af8665d18bcefae3e9700be07e22b681
The syscall table base is a user controlled function pointer in kernel
space. Use array_index_nospec() to prevent any out of bounds speculation.

While retpoline prevents speculating into a userspace directed target it
does not stop the pointer de-reference, the concern is leaking memory
relative to the syscall table base, by observing instruction cache
behavior.
Reported-by: Linus Torvalds
Signed-off-by: Dan Williams
Signed-off-by: Thomas Gleixner
Cc: linux-arch@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Andy Lutomirski
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman -
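The index-clamping idea behind array_index_nospec() can be sketched in portable C. This is a simplified illustration, not the kernel's implementation: the real helper uses architecture-specific code so that the compiler cannot turn the mask back into a branch, which plain C cannot guarantee. All names below (`index_mask`, `clamp_index`, the three-entry `table`, `dispatch`) are hypothetical. The sketch assumes the array size is at most `LONG_MAX`.

```c
#include <limits.h>
#include <stddef.h>

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    /* The top bit of (index | (size - 1 - index)) is set iff index >= size. */
    unsigned long out_of_range =
        (index | (size - 1UL - index)) >> (sizeof(unsigned long) * CHAR_BIT - 1);

    return out_of_range - 1UL;   /* 0 -> ~0UL, 1 -> 0 */
}

/* In-range indices pass through; out-of-range indices collapse to 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}

/* Hypothetical three-entry dispatch table standing in for a syscall table. */
typedef long (*handler_fn)(void);
static long handler0(void) { return 100; }
static long handler1(void) { return 101; }
static long handler2(void) { return 102; }
static const handler_fn table[] = { handler0, handler1, handler2 };
#define NR_HANDLERS (sizeof(table) / sizeof(table[0]))

static long dispatch(unsigned long nr)
{
    if (nr >= NR_HANDLERS)
        return -1;                       /* architectural bounds check */
    nr = clamp_index(nr, NR_HANDLERS);   /* bound the index used after the check */
    return table[nr]();
}
```

The bounds check alone decides the architectural result; the mask is there so that, even if the branch is mispredicted, the index used for the table load stays inside the array.
-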
commit 37a8f7c38339b22b69876d6f5a0ab851565284e3
The TS_COMPAT bit is very hot and is accessed from code paths that mostly
also touch thread_info::flags. Move it into struct thread_info to improve
cache locality.

The only reason it was in thread_struct is that there was a brief period
during which arch-specific fields were not allowed in struct thread_info.

Linus suggested further changing:

    ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

to:

    if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))
        ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

on the theory that frequently dirtying the cacheline even in pure 64-bit
code that never needs to modify status hurts performance. That could be a
reasonable followup patch, but I suspect it matters less on top of this
patch.
Suggested-by: Linus Torvalds
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Acked-by: Linus Torvalds
Cc: Borislav Petkov
Cc: Kernel Hardening
Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit d1f7732009e0549eedf8ea1db948dc37be77fd46
With the fast path removed there is no point in splitting the push of the
normal and the extra register set. Just push the extra regs right away.

[ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ]
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Borislav Petkov
Cc: Linus Torvalds
Cc: Kernel Hardening
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 21d375b6b34ff511a507de27bf316b3dde6938d9
The SYSCALL64 fast path was a nice, if small, optimization back in the good
old days when syscalls were actually reasonably fast. Now there is PTI to
slow everything down, and indirect branches are verboten, making everything
messier. The retpoline code in the fast path is particularly nasty.

Just get rid of the fast path. The slow path is barely slower.
[ tglx: Split out the 'push all extra regs' part ]
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Borislav Petkov
Cc: Linus Torvalds
Cc: Kernel Hardening
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman -
commit 1dde7415e99933bb7293d6b2843752cbdb43ec11
Simplify it to call an asm-function instead of pasting 41 insn bytes at
every call site. Also, add alignment to the macro as suggested here:

https://support.google.com/faqs/answer/7625886
[dwmw2: Clean up comments, let it clobber %ebx and just tell the compiler]
Signed-off-by: Borislav Petkov
Signed-off-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-3-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman
24 Jan, 2018
2 commits
-
commit 6f41c34d69eb005e7848716bbcafc979b35037d5 upstream.
The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.

Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.
Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Reviewed-by: David Woodhouse
Niced-by: Peter Zijlstra
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
Signed-off-by: Greg Kroah-Hartman -
commit c995efd5a740d9cbafbf58bde4973e8b50b4d761 upstream.
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.

This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.

Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.

On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.

[ tglx: Added missing vendor check and slightly massaged comments and
changelog ]
Signed-off-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Acked-by: Arjan van de Ven
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Josh Poimboeuf
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Kees Cook
Cc: Tim Chen
Cc: Greg Kroah-Hartman
Cc: Paul Turner
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman
17 Jan, 2018
2 commits
-
commit f10ee3dcc9f0aba92a5c4c064628be5200765dc2 upstream.
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.

This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.

While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.

This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but that's irrelevant as the AMD APM clearly asks for clearing
the reserved bits.

Make sure that on non PCID machines bit 11 is not set by the page table
switching code.

Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.

That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: stable
Cc: Borislav Petkov
Cc: Andy Lutomirski
Cc: Willy Tarreau
Cc: David Woodhouse
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
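
The fix described in this commit message can be sketched in C as follows. This is only an illustration under stated assumptions, not the kernel's code: the real switch lives in the low level ASM entry code, and the names USER_PGTABLE_MASK, USER_PCID_MASK and user_cr3() are hypothetical. The sketch also assumes that the kernel CR3 value has bit 12 clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; the kernel's own macros may differ. */
#define USER_PGTABLE_MASK  (UINT64_C(1) << 12)  /* selects the user part of the page directory */
#define USER_PCID_MASK     (UINT64_C(1) << 11)  /* selects the PCID paired with the user page tables */

/*
 * Compute the CR3 value used when switching to the user page tables.
 * Bit 12 is always set. Bit 11 is set only when the CPU supports PCID,
 * so it stays clear on machines where it is a reserved bit.
 */
static uint64_t user_cr3(uint64_t kernel_cr3, bool cpu_has_pcid)
{
    uint64_t cr3;

    /* Assumption of this sketch: the kernel CR3 value has bit 12 clear. */
    assert((kernel_cr3 & USER_PGTABLE_MASK) == 0);

    cr3 = kernel_cr3 | USER_PGTABLE_MASK;
    if (cpu_has_pcid)
        cr3 |= USER_PCID_MASK;

    return cr3;
}
```

With this split, a caller on a machine without PCID never sees bit 11 set, at the cost of one extra OR on PCID machines, which mirrors the "extra orq" trade-off mentioned in the changelog.
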
Signed-off-by: Greg Kroah-Hartman
-
commit 2641f08bb7fc63a636a2b18173221d7040a3512e upstream.
Convert indirect jumps in core 32/64bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.

Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.

Signed-off-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Acked-by: Arjan van de Ven
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Josh Poimboeuf
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Kees Cook
Cc: Tim Chen
Cc: Greg Kroah-Hartman
Cc: Paul Turner
Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman
05 Jan, 2018
1 commit
-
commit d7732ba55c4b6a2da339bb12589c515830cfac2c upstream.
The preparation for PTI which added CR3 switching to the entry code
misplaced the CR3 switch in entry_SYSCALL_compat().

With PTI enabled the entry code tries to access a per cpu variable after
switching to kernel GS. This fails because that variable is not mapped to
user space. This results in a double fault and in the worst case a kernel
crash.

Move the switch ahead of the access and clobber RSP which has been saved
already.

Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching")
Reported-by: Lars Wendler
Reported-by: Laura Abbott
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Greg KH
Cc: Boris Ostrovsky
Cc: Juergen Gross
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
Signed-off-by: Greg Kroah-Hartman
03 Jan, 2018
4 commits
-
commit 21e94459110252d41b45c0c8ba50fd72a664d50c upstream.
Most NMI/paranoid exceptions will not in fact change pagetables and would
thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3
writes.

Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the
kernel mappings and now that we track which user PCIDs need flushing we can
avoid those too when possible.

This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both
sites have plenty available.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
-
commit 6fd166aae78c0ab738d49bda653cbd9e3b1491cf upstream.
We can use PCID to retain the TLBs across CR3 switches; including those now
part of the user/kernel switch. This increases performance of kernel
entry/exit at the cost of more expensive/complicated TLB flushing.

Now that we have two address spaces, one for kernel and one for user space,
we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
(just like we use the PFN LSB for the PGD). Since we do TLB invalidation
from kernel space, the existing code will only invalidate the kernel PCID,
we augment that by marking the corresponding user PCID invalid, and upon
switching back to userspace, use a flushing CR3 write for the switch.

In order to access the user_pcid_flush_mask we use PER_CPU storage, which
means the previously established SWAPGS vs CR3 ordering is now mandatory
and required.

Having to do this memory access does require additional registers, most
sites have a functioning stack and we can spill one (RAX), sites without
functional stack need to otherwise provide the second scratch register.

Note: PCID is generally available on Intel Sandybridge and later CPUs.
Note: Up until this point TLB flushing was broken in this series.

Based-on-code-from: Dave Hansen
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar
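
The user PCID tracking described in this commit message can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the kernel's implementation: the real logic is in the ASM entry code and uses PER_CPU storage, whereas this sketch uses a plain global. The function names and the direct use of an ASID as the low PCID bits are hypothetical. The only architectural facts relied on are that the PCID occupies the low 12 bits of CR3 (so bit 11 is its top bit) and that, with PCID enabled, setting bit 63 in a CR3 write requests that no TLB flush be done.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_NOFLUSH     (UINT64_C(1) << 63)  /* CR3 write keeps TLB entries for the PCID */
#define USER_PCID_FLAG  (UINT64_C(1) << 11)  /* top PCID bit marks a user PCID */
#define MAX_ASIDS       16u                  /* hypothetical limit for this sketch */

/* Stand-in for the per-CPU user_pcid_flush_mask; one bit per ASID. */
static uint16_t user_pcid_flush_mask;

/*
 * Called when kernel code invalidates TLB entries for an address space:
 * only the kernel PCID is flushed directly, so remember that the
 * matching user PCID is stale.
 */
static void mark_user_pcid_stale(unsigned int asid)
{
    assert(asid < MAX_ASIDS);
    user_pcid_flush_mask |= (uint16_t)(1u << asid);
}

/*
 * Build the CR3 value for the switch back to user space. A stale user
 * PCID gets a flushing write (bit 63 clear), and its bit in the mask
 * is cleared. Otherwise the write can be NOFLUSH.
 */
static uint64_t exit_to_user_cr3(uint64_t user_pgd_pa, unsigned int asid)
{
    uint64_t cr3;

    assert(asid < MAX_ASIDS);
    cr3 = user_pgd_pa | USER_PCID_FLAG | asid;

    if (user_pcid_flush_mask & (1u << asid)) {
        user_pcid_flush_mask &= (uint16_t)~(1u << asid);
        return cr3;                 /* flushing CR3 write */
    }
    return cr3 | CR3_NOFLUSH;       /* TLB contents are retained */
}
```

Reading and updating the mask is the memory access that makes the SWAPGS vs CR3 ordering mandatory in the real entry code, and it is why a second scratch register is needed there.
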
Signed-off-by: Greg Kroah-Hartman
-
commit 85900ea51577e31b186e523c8f4e068c79ecc7d3 upstream.
Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
space visible page tables.

[ tglx: Hide unused functions (Patch by Arnd Bergmann) ]

Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
-
commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream.
Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.

Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman