05 Oct, 2017
1 commit
-
commit 814fb7bb7db5433757d76f4c4502c96fc53b0b5e upstream.
On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
set a task's extended state (xstate) or "FPU" registers. ptrace() can
set them for another task using the PTRACE_SETREGSET request with
NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
In either case, registers can be set to any value, but the kernel
assumes that the XSAVE area itself remains valid in the sense that the
CPU can restore it.However, in the case where the kernel is using the uncompacted xstate
format (which it does whenever the XSAVES instruction is unavailable),
it was possible for userspace to set the xcomp_bv field in the
xstate_header to an arbitrary value. However, all bits in that field
are reserved in the uncompacted case, so when switching to a task with
nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault. This
caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit. In
addition, since the error is otherwise ignored, the FPU registers from
the task previously executing on the CPU were leaked.Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
the uncompacted case, and returning an error otherwise.The reason for validating xcomp_bv rather than simply overwriting it
with 0 is that we want userspace to see an error if it (incorrectly)
provides an XSAVE area in compacted format rather than in uncompacted
format.Note that as before, in case of error we clear the task's FPU state.
This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
better to return an error before changing anything. But it seems the
"clear on error" behavior is fine for now, and it's a little tricky to
do otherwise because it would mean we couldn't simply copy the full
userspace state into kernel memory in one __copy_from_user().This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
RIP: 0010:__switch_to+0x5b5/0x5d0
RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
Call Trace:
Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1fHere is a C reproducer. The expected behavior is that the program spin
forever with no output. However, on a buggy kernel running on a
processor with the "xsave" feature but without the "xsaves" feature
(e.g. Sandy Bridge through Broadwell for Intel), within a second or two
the program reports that the xmm registers were corrupted, i.e. were not
restored correctly. With CONFIG_X86_DEBUG_FPU=y it also hits the above
kernel warning.#define _GNU_SOURCE
#include
#include
#include
#include
#include
#include
#include
#includeint main(void)
{
int pid = fork();
uint64_t xstate[512];
struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };if (pid == 0) {
bool tracee = true;
for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
tracee = (fork() != 0);
uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
asm volatile(" movdqu %0, %%xmm0\n"
" mov %0, %%rbx\n"
"1: movdqu %%xmm0, %0\n"
" mov %0, %%rax\n"
" cmp %%rax, %%rbx\n"
" je 1b\n"
: "+m" (xmm0) : : "rax", "rbx", "xmm0");
printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n",
tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
} else {
usleep(100000);
ptrace(PTRACE_ATTACH, pid, 0, 0);
wait(NULL);
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
xstate[65] = -1;
ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
ptrace(PTRACE_CONT, pid, 0, 0);
wait(NULL);
}
return 1;
}Note: the program only tests for the bug using the ptrace() system call.
The bug can also be reproduced using the rt_sigreturn() system call, but
only when called from a 32-bit program, since for 64-bit programs the
kernel restores the FPU state from the signal frame by doing XRSTOR
directly from userspace memory (with proper error checking).Reported-by: Dmitry Vyukov
Signed-off-by: Eric Biggers
Reviewed-by: Kees Cook
Reviewed-by: Rik van Riel
Acked-by: Dave Hansen
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Eric Biggers
Cc: Fenghua Yu
Cc: Kevin Hao
Cc: Linus Torvalds
Cc: Michael Halcrow
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Wanpeng Li
Cc: Yu-cheng Yu
Cc: kernel-hardening@lists.openwall.com
Fixes: 0b29643a5843 ("x86/xsaves: Change compacted format xsave area header")
Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
04 Aug, 2016
1 commit
-
The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.I noticed two cases where config_enabled() is used against a tristate
option:- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Acked-by: Kees Cook
Cc: Stas Sergeev
Cc: Matt Redfearn
Cc: Joshua Kinard
Cc: Jiri Slaby
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Markos Chandras
Cc: "Dmitry V. Levin"
Cc: yu-cheng yu
Cc: James Hogan
Cc: Brian Gerst
Cc: Johannes Berg
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Will Drewry
Cc: Nikolay Martynov
Cc: Huacai Chen
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Daniel Borkmann
Cc: Leonid Yegoshin
Cc: Rafal Milecki
Cc: James Cowgill
Cc: Greg Kroah-Hartman
Cc: Ralf Baechle
Cc: Alex Smith
Cc: Adam Buchbinder
Cc: Qais Yousef
Cc: Jiang Liu
Cc: Mikko Rapeli
Cc: Paul Gortmaker
Cc: Denys Vlasenko
Cc: Brian Norris
Cc: Hidehiro Kawai
Cc: "Luis R. Rodriguez"
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Roland McGrath
Cc: Paul Burton
Cc: Kalle Valo
Cc: Viresh Kumar
Cc: Tony Wu
Cc: Huaitong Han
Cc: Sumit Semwal
Cc: Alexei Starovoitov
Cc: Juergen Gross
Cc: Jason Cooper
Cc: "David S. Miller"
Cc: Oleg Nesterov
Cc: Andrea Gelmini
Cc: David Woodhouse
Cc: Marc Zyngier
Cc: Rabin Vincent
Cc: "Maciej W. Rozycki"
Cc: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2016
1 commit
-
When the kernel is using XSAVES compacted format, we cannot do
__copy_from_user() from a signal frame, which has standard-format data.
Fix it by using copyin_to_xsaves(), which converts between formats and
filters out all supervisor states that we do not allow userspace to
write.Signed-off-by: Yu-cheng Yu
Signed-off-by: Fenghua Yu
Reviewed-by: Dave Hansen
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Ravi V Shankar
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1468253937-40008-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Ingo Molnar
18 Jun, 2016
3 commits
-
XSAVES is a kernel instruction and uses a compacted format. When working
with user space, the kernel should provide standard-format, non-supervisor
state data. We cannot do __copy_to_user() from a compacted-format kernel
xstate area to a signal frame.Dave Hansen proposes this method to simplify copy xstate directly to user.
This patch is based on an earlier patch from Fenghua Yu
Originally-from: Fenghua Yu
Signed-off-by: Yu-cheng Yu
Reviewed-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Quentin Casasnovas
Cc: Ravi V. Shankar
Cc: Sai Praneeth Prakhya
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/c36f419d525517d04209a28dd8e1e5af9000036e.1463760376.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar -
… it from 'fpu_user_xstate_size'
User space uses standard format xsave area. fpstate in signal frame
should have standard format size.To explicitly distinguish between xstate size in kernel space and the
one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'.Cleanup only, no change in functionality.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org> -
The kernel xstate area can be in standard or compacted format;
it is always in standard format for user mode. When XSAVES is
enabled, the kernel uses the compacted format and it is necessary
to use a separate fpu_user_xstate_size for signal/ptrace frames.Signed-off-by: Fenghua Yu
[ Rebased the patch and cleaned up the naming. ]
Signed-off-by: Yu-cheng Yu
Reviewed-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Quentin Casasnovas
Cc: Ravi V. Shankar
Cc: Sai Praneeth Prakhya
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar
08 Jun, 2016
1 commit
-
I've been carrying this patch around for a bit and it's helped me
solve at least a couple FPU-related bugs. In addition to using
it for debugging, I also drug it out because using AVX (and
AVX2/AVX-512) can have serious power consequences for a modern
core. It's very important to be able to figure out who is using
it.It's also insanely useful to go out and see who is using a given
feature, like MPX or Memory Protection Keys. If you, for
instance, want to find all processes using protection keys, you
can do:echo 'xfeatures & 0x200' > filter
Since 0x200 is the protection keys feature bit.
Note that this touches the KVM code. KVM did a CREATE_TRACE_POINTS
and then included a bunch of random headers. If anyone one of
those included other tracepoints, it would have defined the *OTHER*
tracepoints. That's bogus, so move it to the right place.Signed-off-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Denys Vlasenko
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Quentin Casasnovas
Cc: Steven Rostedt
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20160601174220.3CDFB90E@viggo.jf.intel.com
Signed-off-by: Ingo Molnar
12 Nov, 2015
1 commit
-
(This should have gone to LKML originally. Sorry for the extra
noise, folks on the cc.)Background:
Signal frames on x86 have two formats:
1. For 32-bit executables (whether on a real 32-bit kernel or
under 32-bit emulation on a 64-bit kernel) we have a
'fpregset_t' that includes the "FSAVE" registers.2. For 64-bit executables (on 64-bit kernels obviously), the
'fpregset_t' is smaller and does not contain the "FSAVE"
state.When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.Problem:
save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable. This is for real 32-bit and
ia32 emulation.But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog(). But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed). This leads to the FPU/XSAVE state being initialized.For MPX, this leads to the most permissive state and means we
silently lose bounds violations. I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state. I'm not sure why
no one has spotted this bug.I believe this was broken by:
72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
way back in 2012.
Signed-off-by: Dave Hansen
Cc:
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar
04 Nov, 2015
1 commit
-
Pull x86 sigcontext header cleanups from Ingo Molnar:
"This series reorganizes and cleans up various aspects of the main
sigcontext UAPI headers, such as unifying the data structures and
updating/adding lots of comments to explain all the ABI details and
quirks. The headers can now also be built in user-space standalone"* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Clean up too long lines
x86/headers: Remove references on the kernel side
x86/headers: Remove direct sigcontext32.h uses
x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
x86/headers: Make sigcontext pointers bit independent
x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
x86/headers: Unify register type definitions between 32-bit compat and i386
x86/headers: Use ABI types consistently in sigcontext*.h
x86/headers: Separate out legacy user-space structure definitions
x86/headers: Clean up and better document uapi/asm/sigcontext.h
x86/headers: Clean up uapi/asm/sigcontext32.h
x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
14 Sep, 2015
1 commit
-
There are two concepts that have some confusing naming:
1. Extended State Component numbers (currently called
XFEATURE_BIT_*)
2. Extended State Component masks (currently called XSTATE_*)The numbers are (currently) from 0-9. State component 3 is the
bounds registers for MPX, for instance.But when we want to enable "state component 3", we go set a bit
in XCR0. The bit we set is 1<
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Tim Chen
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar
08 Sep, 2015
1 commit
-
Remove uses of _fpstate_ia32 from the kernel, and move the
legacy _fpstate_ia32 definition to the user-space only portion
of the header.Acked-by: Mikko Rapeli
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar
19 May, 2015
3 commits
-
This cleans up the call sites and the function a bit,
and also makes it more symmetric with the other high
level FPU state handling functions.It's still only valid for the current task, as we copy
to the FPU registers of the current CPU.No change in functionality.
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar -
Use these consistent names:
struct fregs_state # was: i387_fsave_struct
struct fxregs_state # was: i387_fxsave_struct
struct swregs_state # was: i387_soft_struct
struct xregs_state # was: xsave_struct
union fpregs_state # was: thread_xstateCc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar -
fpu/xstate.c has a lot of generic FPU signal frame handling routines,
move them into a separate file: fpu/signal.c.Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar