05 Oct, 2017

1 commit

  • commit 814fb7bb7db5433757d76f4c4502c96fc53b0b5e upstream.

    On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
    set a task's extended state (xstate) or "FPU" registers. ptrace() can
    set them for another task using the PTRACE_SETREGSET request with
    NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
    In either case, registers can be set to any value, but the kernel
    assumes that the XSAVE area itself remains valid in the sense that the
    CPU can restore it.

    However, in the case where the kernel is using the uncompacted xstate
    format (which it does whenever the XSAVES instruction is unavailable),
    it was possible for userspace to set the xcomp_bv field in the
    xstate_header to an arbitrary value. However, all bits in that field
    are reserved in the uncompacted case, so when switching to a task with
    nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault. This
    caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit. In
    addition, since the error is otherwise ignored, the FPU registers from
    the task previously executing on the CPU were leaked.

    Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
    the uncompacted case, and returning an error otherwise.

    The reason for validating xcomp_bv rather than simply overwriting it
    with 0 is that we want userspace to see an error if it (incorrectly)
    provides an XSAVE area in compacted format rather than in uncompacted
    format.

    Note that as before, in case of error we clear the task's FPU state.
    This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
    better to return an error before changing anything. But it seems the
    "clear on error" behavior is fine for now, and it's a little tricky to
    do otherwise because it would mean we couldn't simply copy the full
    userspace state into kernel memory in one __copy_from_user().

    This bug was found by syzkaller, which hit the above-mentioned
    WARN_ON_FPU():

    WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
    RIP: 0010:__switch_to+0x5b5/0x5d0
    RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
    RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
    RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
    RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
    R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
    FS: 00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
    Call Trace:
    Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f

    Here is a C reproducer. The expected behavior is that the program spin
    forever with no output. However, on a buggy kernel running on a
    processor with the "xsave" feature but without the "xsaves" feature
    (e.g. Sandy Bridge through Broadwell for Intel), within a second or two
    the program reports that the xmm registers were corrupted, i.e. were not
    restored correctly. With CONFIG_X86_DEBUG_FPU=y it also hits the above
    kernel warning.

    #define _GNU_SOURCE
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    int main(void)
    {
    int pid = fork();
    uint64_t xstate[512];
    struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };

    if (pid == 0) {
    bool tracee = true;
    for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
    tracee = (fork() != 0);
    uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
    asm volatile(" movdqu %0, %%xmm0\n"
    " mov %0, %%rbx\n"
    "1: movdqu %%xmm0, %0\n"
    " mov %0, %%rax\n"
    " cmp %%rax, %%rbx\n"
    " je 1b\n"
    : "+m" (xmm0) : : "rax", "rbx", "xmm0");
    printf("BUG: xmm registers corrupted! tracee=%d, xmm0=%08X%08X%08X%08X\n",
    tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
    } else {
    usleep(100000);
    ptrace(PTRACE_ATTACH, pid, 0, 0);
    wait(NULL);
    ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
    xstate[65] = -1;
    ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
    ptrace(PTRACE_CONT, pid, 0, 0);
    wait(NULL);
    }
    return 1;
    }

    Note: the program only tests for the bug using the ptrace() system call.
    The bug can also be reproduced using the rt_sigreturn() system call, but
    only when called from a 32-bit program, since for 64-bit programs the
    kernel restores the FPU state from the signal frame by doing XRSTOR
    directly from userspace memory (with proper error checking).

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Biggers
    Reviewed-by: Kees Cook
    Reviewed-by: Rik van Riel
    Acked-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Eric Biggers
    Cc: Fenghua Yu
    Cc: Kevin Hao
    Cc: Linus Torvalds
    Cc: Michael Halcrow
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Cc: Yu-cheng Yu
    Cc: kernel-hardening@lists.openwall.com
    Fixes: 0b29643a5843 ("x86/xsaves: Change compacted format xsave area header")
    Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
    Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
     

04 Aug, 2016

1 commit

  • The use of config_enabled() against config options is ambiguous. In
    practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
    author might have used it for the meaning of IS_ENABLED(). Using
    IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
    clearer.

    This commit replaces config_enabled() with IS_ENABLED() where possible.
    This commit is only touching bool config options.

    I noticed two cases where config_enabled() is used against a tristate
    option:

    - config_enabled(CONFIG_HWMON)
    [ drivers/net/wireless/ath/ath10k/thermal.c ]

    - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
    [ drivers/gpu/drm/gma500/opregion.c ]

    I did not touch them because they should be converted to IS_BUILTIN()
    in order to keep the logic, but I was not sure it was the authors'
    intention.

    Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Kees Cook
    Cc: Stas Sergeev
    Cc: Matt Redfearn
    Cc: Joshua Kinard
    Cc: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Markos Chandras
    Cc: "Dmitry V. Levin"
    Cc: yu-cheng yu
    Cc: James Hogan
    Cc: Brian Gerst
    Cc: Johannes Berg
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: Will Drewry
    Cc: Nikolay Martynov
    Cc: Huacai Chen
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Daniel Borkmann
    Cc: Leonid Yegoshin
    Cc: Rafal Milecki
    Cc: James Cowgill
    Cc: Greg Kroah-Hartman
    Cc: Ralf Baechle
    Cc: Alex Smith
    Cc: Adam Buchbinder
    Cc: Qais Yousef
    Cc: Jiang Liu
    Cc: Mikko Rapeli
    Cc: Paul Gortmaker
    Cc: Denys Vlasenko
    Cc: Brian Norris
    Cc: Hidehiro Kawai
    Cc: "Luis R. Rodriguez"
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Dave Hansen
    Cc: "Kirill A. Shutemov"
    Cc: Roland McGrath
    Cc: Paul Burton
    Cc: Kalle Valo
    Cc: Viresh Kumar
    Cc: Tony Wu
    Cc: Huaitong Han
    Cc: Sumit Semwal
    Cc: Alexei Starovoitov
    Cc: Juergen Gross
    Cc: Jason Cooper
    Cc: "David S. Miller"
    Cc: Oleg Nesterov
    Cc: Andrea Gelmini
    Cc: David Woodhouse
    Cc: Marc Zyngier
    Cc: Rabin Vincent
    Cc: "Maciej W. Rozycki"
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

11 Jul, 2016

1 commit

  • When the kernel is using XSAVES compacted format, we cannot do
    __copy_from_user() from a signal frame, which has standard-format data.
    Fix it by using copyin_to_xsaves(), which converts between formats and
    filters out all supervisor states that we do not allow userspace to
    write.

    Signed-off-by: Yu-cheng Yu
    Signed-off-by: Fenghua Yu
    Reviewed-by: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Ravi V Shankar
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1468253937-40008-2-git-send-email-fenghua.yu@intel.com
    Signed-off-by: Ingo Molnar

    Yu-cheng Yu
     

18 Jun, 2016

3 commits

  • XSAVES is a kernel instruction and uses a compacted format. When working
    with user space, the kernel should provide standard-format, non-supervisor
    state data. We cannot do __copy_to_user() from a compacted-format kernel
    xstate area to a signal frame.

    Dave Hansen proposes this method to simplify copy xstate directly to user.

    This patch is based on an earlier patch from Fenghua Yu

    Originally-from: Fenghua Yu
    Signed-off-by: Yu-cheng Yu
    Reviewed-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Quentin Casasnovas
    Cc: Ravi V. Shankar
    Cc: Sai Praneeth Prakhya
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/c36f419d525517d04209a28dd8e1e5af9000036e.1463760376.git.yu-cheng.yu@intel.com
    Signed-off-by: Ingo Molnar

    Yu-cheng Yu
     
  • … it from 'fpu_user_xstate_size'

    User space uses standard format xsave area. fpstate in signal frame
    should have standard format size.

    To explicitly distinguish between xstate size in kernel space and the
    one in user space, we rename 'xstate_size' to 'fpu_kernel_xstate_size'.

    Cleanup only, no change in functionality.

    Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
    [ Rebased the patch and cleaned up the naming. ]
    Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
    Reviewed-by: Dave Hansen <dave.hansen@intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
    Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
    Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/2ecbae347a5152d94be52adf7d0f3b7305d90d99.1463760376.git.yu-cheng.yu@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Fenghua Yu
     
  • The kernel xstate area can be in standard or compacted format;
    it is always in standard format for user mode. When XSAVES is
    enabled, the kernel uses the compacted format and it is necessary
    to use a separate fpu_user_xstate_size for signal/ptrace frames.

    Signed-off-by: Fenghua Yu
    [ Rebased the patch and cleaned up the naming. ]
    Signed-off-by: Yu-cheng Yu
    Reviewed-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Quentin Casasnovas
    Cc: Ravi V. Shankar
    Cc: Sai Praneeth Prakhya
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/8756ec34dabddfc727cda5743195eb81e8caf91c.1463760376.git.yu-cheng.yu@intel.com
    Signed-off-by: Ingo Molnar

    Fenghua Yu
     

08 Jun, 2016

1 commit

  • I've been carrying this patch around for a bit and it's helped me
    solve at least a couple FPU-related bugs. In addition to using
    it for debugging, I also drug it out because using AVX (and
    AVX2/AVX-512) can have serious power consequences for a modern
    core. It's very important to be able to figure out who is using
    it.

    It's also insanely useful to go out and see who is using a given
    feature, like MPX or Memory Protection Keys. If you, for
    instance, want to find all processes using protection keys, you
    can do:

    echo 'xfeatures & 0x200' > filter

    Since 0x200 is the protection keys feature bit.

    Note that this touches the KVM code. KVM did a CREATE_TRACE_POINTS
    and then included a bunch of random headers. If anyone one of
    those included other tracepoints, it would have defined the *OTHER*
    tracepoints. That's bogus, so move it to the right place.

    Signed-off-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Quentin Casasnovas
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20160601174220.3CDFB90E@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

12 Nov, 2015

1 commit

  • (This should have gone to LKML originally. Sorry for the extra
    noise, folks on the cc.)

    Background:

    Signal frames on x86 have two formats:

    1. For 32-bit executables (whether on a real 32-bit kernel or
    under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

    2. For 64-bit executables (on 64-bit kernels obviously), the
    'fpregset_t' is smaller and does not contain the "FSAVE"
    state.

    When creating the signal frame, we have to be aware of whether
    we are running a 32 or 64-bit executable so we create the
    correct format signal frame.

    Problem:

    save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
    called for a 32-bit executable. This is for real 32-bit and
    ia32 emulation.

    But, fpu__init_prepare_fx_sw_frame() only initializes
    'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
    32-bit kernels.

    This leads to really wierd situations where 32-bit programs
    lose their extended state when returning from a signal handler.
    The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
    out to userspace in save_xstate_epilog(). But when returning
    from the signal, the kernel errors out in check_for_xstate()
    when it does not see FP_XSTATE_MAGIC1 present (because it was
    zeroed). This leads to the FPU/XSAVE state being initialized.

    For MPX, this leads to the most permissive state and means we
    silently lose bounds violations. I think this would also mean
    that we could lose *ANY* FPU/SSE/AVX state. I'm not sure why
    no one has spotted this bug.

    I believe this was broken by:

    72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

    way back in 2012.

    Signed-off-by: Dave Hansen
    Cc:
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@sr71.net
    Cc: fenghua.yu@intel.com
    Cc: yu-cheng.yu@intel.com
    Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

04 Nov, 2015

1 commit

  • Pull x86 sigcontext header cleanups from Ingo Molnar:
    "This series reorganizes and cleans up various aspects of the main
    sigcontext UAPI headers, such as unifying the data structures and
    updating/adding lots of comments to explain all the ABI details and
    quirks. The headers can now also be built in user-space standalone"

    * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/headers: Clean up too long lines
    x86/headers: Remove references on the kernel side
    x86/headers: Remove direct sigcontext32.h uses
    x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
    x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
    x86/headers: Make sigcontext pointers bit independent
    x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
    x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
    x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
    x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
    x86/headers: Unify register type definitions between 32-bit compat and i386
    x86/headers: Use ABI types consistently in sigcontext*.h
    x86/headers: Separate out legacy user-space structure definitions
    x86/headers: Clean up and better document uapi/asm/sigcontext.h
    x86/headers: Clean up uapi/asm/sigcontext32.h
    x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h

    Linus Torvalds
     

14 Sep, 2015

1 commit

  • There are two concepts that have some confusing naming:
    1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
    2. Extended State Component masks (currently called XSTATE_*)

    The numbers are (currently) from 0-9. State component 3 is the
    bounds registers for MPX, for instance.

    But when we want to enable "state component 3", we go set a bit
    in XCR0. The bit we set is 1<
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: dave@sr71.net
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
    [ Ported to v4.3-rc1. ]
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

08 Sep, 2015

1 commit

  • Remove uses of _fpstate_ia32 from the kernel, and move the
    legacy _fpstate_ia32 definition to the user-space only portion
    of the header.

    Acked-by: Mikko Rapeli
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 May, 2015

3 commits

  • This cleans up the call sites and the function a bit,
    and also makes it more symmetric with the other high
    level FPU state handling functions.

    It's still only valid for the current task, as we copy
    to the FPU registers of the current CPU.

    No change in functionality.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Use these consistent names:

    struct fregs_state # was: i387_fsave_struct
    struct fxregs_state # was: i387_fxsave_struct
    struct swregs_state # was: i387_soft_struct
    struct xregs_state # was: xsave_struct
    union fpregs_state # was: thread_xstate

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • fpu/xstate.c has a lot of generic FPU signal frame handling routines,
    move them into a separate file: fpu/signal.c.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar