27 Apr, 2015

1 commit

  • AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
    SS == 0 results in an invalid usermode state in which SS is apparently
    equal to __USER_DS but causes #SS if used.

    Work around the issue by setting SS to __KERNEL_DS in __switch_to, thus
    ensuring that SYSRET never happens with SS set to NULL.

    This was exposed by a recent vDSO cleanup.

    Fixes: e7d6eefaaa44 ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
    Signed-off-by: Andy Lutomirski
    Cc: Peter Anvin
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Brian Gerst
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

09 Apr, 2015

1 commit

  • The change that affected how execve clears EXTRA_REGS missed
    32-bit execve syscalls.

    Fix this by using the 64-bit execve stub epilogue for them too.

    Run-tested.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

06 Apr, 2015

1 commit

  • The 'pax' argument is unnecessary. Instead, store the RAX value
    directly in regs.

    This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
    was changed to return an error code instead of EAX directly:

    https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174

    In 2007, when sigaltstack syscall support was added, the return
    value of restore_sigcontext() was changed to carry the memory-copying
    failure code.

    But instead of putting 'ax' into regs->ax directly, it was carried
    in via a pointer and then returned, where the generic syscall return
    code copied it to regs->ax.

    So there was never any deeper reason for this suboptimal pattern; it
    was simply never noticed after being introduced.
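    The before/after calling shape can be sketched as a user-space C
    model. The names regs_model, sigcontext_model, restore_old(),
    restore_new(), sigreturn_old(), sigreturn_new() and syscall_exit()
    are hypothetical simplifications, not the kernel's code: only RAX and
    RIP are modeled, and the badframe error handling is elided.

```c
#include <assert.h>

/*
 * Hypothetical, heavily reduced stand-ins for struct pt_regs and
 * struct sigcontext: only the fields needed to show the pattern.
 */
struct regs_model {
	unsigned long ax;
	unsigned long ip;
};

struct sigcontext_model {
	unsigned long ax;
	unsigned long ip;
};

/* Old shape: RAX leaves through the extra 'pax' out-parameter. */
static int restore_old(struct regs_model *regs,
		       const struct sigcontext_model *sc, unsigned long *pax)
{
	regs->ip = sc->ip;
	*pax = sc->ax;
	return 0;		/* nonzero would signal a copy failure */
}

/* New shape: RAX goes straight into regs; the return value is only an error code. */
static int restore_new(struct regs_model *regs,
		       const struct sigcontext_model *sc)
{
	regs->ip = sc->ip;
	regs->ax = sc->ax;
	return 0;
}

/* Old sigreturn: hands the restored RAX back as the syscall result. */
static long sigreturn_old(struct regs_model *regs,
			  const struct sigcontext_model *sc)
{
	unsigned long ax;

	if (restore_old(regs, sc, &ax))
		return -1;	/* real code goes to badframe; elided here */
	return (long)ax;
}

/* New sigreturn: RAX is already in regs, so just return it. */
static long sigreturn_new(struct regs_model *regs,
			  const struct sigcontext_model *sc)
{
	if (restore_new(regs, sc))
		return -1;
	return (long)regs->ax;
}

/* Models the generic syscall return path copying the result to regs->ax. */
static void syscall_exit(struct regs_model *regs, long ret)
{
	regs->ax = (unsigned long)ret;
}
```

    In both versions regs->ax ends up holding the restored value; the new
    shape just drops the round trip through a pointer and a return value.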

    Signed-off-by: Brian Gerst
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
    Signed-off-by: Ingo Molnar

    Brian Gerst
     

03 Apr, 2015

1 commit

  • SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
    with usergs and IRQs on. That means that we rely on STI to
    correctly mask interrupts for one instruction. This is okay by
    itself, but the semantics with respect to NMIs are unclear.

    Avoid the whole issue by using SYSRETL instead. For background,
    Intel CPUs don't allow SYSCALL from compat mode, but they do
    allow SYSRETL back to compat mode. Go figure.

    To avoid doing too much at once, this doesn't revamp the calling
    convention. We still return with EBP, EDX, and ECX on the user
    stack.

    Oddly, this seems to be 30 cycles or so faster. Avoiding POPFQ
    and STI will account for under half of that, I think, so my best
    guess is that Intel just optimizes SYSRET much better than
    SYSEXIT.

    Signed-off-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

01 Apr, 2015

1 commit

  • This mimics the recent similar 64-bit change.
    Saves ~110 bytes of code.

    Patch was run-tested on 32- and 64-bit, on Intel and AMD CPUs.
    I also looked at the diff of entry_64.o disassembly, to have
    a different view of the changes.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

27 Mar, 2015

2 commits

  • There are a couple of syscall argument zero-extension instructions in
    the 32-bit compat entry code, and it was mentioned that people keep
    trying to optimize them out, introducing bugs.

    Make them more visible, and add a "do not remove" comment.
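    Why these instructions must stay can be shown with a small C model,
    assuming the upper 32 bits of an argument register may hold stale
    data when a 32-bit caller enters the kernel. zero_extend32() and
    handler_accepts_len() are hypothetical helpers; in the entry code the
    extension is done with 32-bit register moves in assembly, which on
    x86-64 clear the upper half of the destination register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "movl %reg32,%reg32"-style zero-extension: keep the low
 * 32 bits the compat caller actually passed, force the upper 32 to 0.
 */
static inline uint64_t zero_extend32(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}

/* Hypothetical 64-bit handler that range-checks a length argument. */
static int handler_accepts_len(uint64_t len)
{
	return len <= 4096;
}
```

    Without the extension, a handler would see 0xdeadbeef00000010 rather
    than 16 and reject (or misinterpret) a perfectly valid argument.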

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427452582-21624-3-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • The existing comment has proven to be not very clear.

    Replace it with a comment similar to the one we now have in the 64-bit
    syscall entry point. (Three instances, one per 32-bit syscall entry).

    In the INT80 entry point's CFI annotations, replace mysterious
    expressions with numeric constants. In this case, raw numbers
    look more understandable.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427452582-21624-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

25 Mar, 2015

4 commits

  • The THREAD_INFO() macro has a somewhat confusingly generic name,
    defined in a generic .h C header file. It also does not make it
    clear that it constructs a memory operand for use in assembly
    code.

    Rename it to ASM_THREAD_INFO() to make it all glaringly
    obvious at first glance.

    Acked-by: Borislav Petkov
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Before:

    movl TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d

    After:

    movl THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d

    The field offset becomes the macro's first argument, turning it into
    a clear thread_info accessor.

    No code changed:

    md5:
    fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm
    fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asm

    e39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm
    e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asm

    Acked-by: Andy Lutomirski
    Acked-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • PER_CPU_VAR(kernel_stack) was set up in a way where it points
    five stack slots below the top of stack.

    Presumably, this was done to avoid one "sub $5*8,%rsp" in the
    syscall/sysenter code paths, where the iret frame has to be
    created by hand.

    Ironically, none of them benefits from this optimization,
    since all of them need to allocate additional data on stack
    (struct pt_regs), so they still have to perform subtraction.

    This patch eliminates KERNEL_STACK_OFFSET.

    PER_CPU_VAR(kernel_stack) now points directly to top of stack.
    pt_regs allocations are adjusted to allocate iret frame as well.
    Hopefully we can merge it later with 32-bit specific
    PER_CPU_VAR(cpu_current_top_of_stack) variable...

    The net result in generated code is that constants in several
    instructions change.

    This change is necessary for changing struct pt_regs creation
    in SYSCALL64 code path from MOV to PUSH instructions.

    Signed-off-by: Denys Vlasenko
    Acked-by: Borislav Petkov
    Acked-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • This changes the THREAD_INFO() definition and all its callsites
    so that they do not count stack position from
    (top of stack - KERNEL_STACK_OFFSET), but from top of stack.

    Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
    are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
    - "calculate thread_info's address using information that
    rsp is SIZEOF_PTREGS bytes below top of stack".

    While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
    "((off)-THREAD_SIZE)(reg)". The form without parentheses
    falsely looks like we invoke THREAD_SIZE() macro.

    Improve comment atop THREAD_INFO macro definition.

    This patch does not change generated code (verified by objdump).
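    The address arithmetic behind the old and new macro forms can be
    modeled in C. This is a sketch under stated assumptions: THREAD_SIZE
    is taken as 16 KiB, thread_info is assumed to sit at the base of the
    kernel stack (as on x86 at the time), and RIP_OFFSET assumes the ip
    slot is the 17th eight-byte field of pt_regs. thread_info_old() and
    thread_info_new() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative x86-64 values; THREAD_SIZE in particular is an assumption. */
#define THREAD_SIZE		(16 * 1024UL)
#define SIZEOF_PTREGS		(21 * 8UL)	/* 21 eight-byte slots */
#define RIP_OFFSET		(16 * 8UL)	/* offset of the ip slot in pt_regs */
#define KERNEL_STACK_OFFSET	(5 * 8UL)	/* the offset being removed */

/* "Why RIP??": the ip slot sits exactly KERNEL_STACK_OFFSET below the end. */
static_assert(RIP_OFFSET + KERNEL_STACK_OFFSET == SIZEOF_PTREGS,
	      "RIP + KERNEL_STACK_OFFSET == SIZEOF_PTREGS");

/*
 * New THREAD_INFO(reg, off) == ((off)-THREAD_SIZE)(reg): 'reg' is 'off'
 * bytes below the top of stack, so reg + off is the top, and subtracting
 * THREAD_SIZE lands on the stack base where thread_info lives.
 */
static uintptr_t thread_info_new(uintptr_t reg, uintptr_t off)
{
	return reg + off - THREAD_SIZE;
}

/* Old form: 'off' was counted from (top of stack - KERNEL_STACK_OFFSET). */
static uintptr_t thread_info_old(uintptr_t reg, uintptr_t off)
{
	return reg + KERNEL_STACK_OFFSET + off - THREAD_SIZE;
}
```

    With %rsp pointing at the start of a full pt_regs, the old
    THREAD_INFO(%rsp,RIP) and the new THREAD_INFO(%rsp,SIZEOF_PTREGS)
    resolve to the same address, which is why generated code is unchanged.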

    Signed-off-by: Denys Vlasenko
    Acked-by: Borislav Petkov
    Acked-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

23 Mar, 2015

1 commit

  • Both the execve() and sigreturn() family of syscalls have the
    ability to change registers in ways that may not be compatible
    with the syscall path they were called from.

    In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
    and some bits in eflags.

    These syscalls have stubs that are hardcoded to jump to the IRET path,
    and not return to the original syscall path.

    The following commit:

    76f5df43cab5e76 ("Always allocate a complete "struct pt_regs" on the kernel stack")

    recently changed this for some 32-bit compat syscalls, but introduced a bug where
    execve from a 32-bit program to a 64-bit program would fail because it still returned
    via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.

    This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
    that the IRET path is always taken on exit to userspace.
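    The effect on exit-path selection can be sketched as a tiny C model,
    assuming (as this commit relies on) that any pending work flag diverts
    the exit away from SYSRET/SYSEXIT to the slow path that ends in IRET.
    The flag names and values, choose_exit() and
    after_execve_or_sigreturn() are hypothetical, not the kernel's
    TIF_* definitions.

```c
#include <assert.h>

/* Hypothetical work-flag bits; not the kernel's real TIF_* values. */
#define WORK_NOTIFY_RESUME	(1u << 0)
#define WORK_SIGPENDING		(1u << 1)

enum exit_path { EXIT_SYSRET, EXIT_IRET };

/*
 * Model: the fast SYSRET/SYSEXIT exit is only used when no work is
 * pending; otherwise the slow path runs and returns with IRET, which
 * can restore arbitrary CS, SS and flags.
 */
static enum exit_path choose_exit(unsigned int work)
{
	return work ? EXIT_IRET : EXIT_SYSRET;
}

/* What the patch does for execve() and sigreturn(), in model form. */
static unsigned int after_execve_or_sigreturn(unsigned int work)
{
	return work | WORK_NOTIFY_RESUME;
}
```

    Setting a flag rather than special-casing the stubs keeps the decision
    in one place: the existing work check already knows how to fall back
    to IRET.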

    Signed-off-by: Brian Gerst
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
    [ Improved the changelog and comments. ]
    Signed-off-by: Ingo Molnar

    Brian Gerst
     

06 Mar, 2015

2 commits

  • It has nothing to do with init -- there's only one TSS per cpu.

    Other names considered include:

    - current_tss: Confusing because we never switch the tss.
    - singleton_tss: Too long.

    This patch was generated with 's/init_tss/cpu_tss/g'. Followup
    patches will fix INIT_TSS and INIT_TSS_IST by hand.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • The ia32 sysenter code loaded the top of the kernel stack into
    rsp by loading kernel_stack and then adjusting it. It can be
    simplified to just read sp0 directly.

    This requires the addition of a new asm-offsets entry for sp0.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

05 Mar, 2015

5 commits

  • The last instance of "mysterious" SS+8 constant is replaced by
    SIZEOF_PTREGS.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • Use of a small macro - one with conditional expansion - does
    more harm than good. It obfuscates code, with minimal code
    reuse.

    For example, because of obfuscation it's not obvious that
    in 'ia32_sysenter_target', we can optimize loading of r9 -
    currently it is loaded with a detour through ebp.

    This patch folds the IA32_ARG_FIXUP macro into its callers.

    No code changes.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
    Moreover, they differ in 32- and 64-bit mode.

    What is saved? What is not? Is rsp set? Are interrupts disabled?
    People tend to not remember these details well enough.

    This patch adds comments which explain in detail
    what registers are modified by each of these instructions.

    The comments are placed immediately before corresponding
    entry and exit points.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • ARGOFFSET is zero now, removing it changes no code.

    A few macros lost "offset" parameter, since it is always zero
    now too.

    No code changes - verified with objdump.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • The 64-bit entry code was using six fewer stack slots by not
    saving/restoring registers which are callee-preserved according
    to the C ABI, and was not allocating space for them.

    Only when syscalls needed a complete "struct pt_regs" was
    the complete area allocated and filled in.

    As an additional twist, on interrupt entry a "slightly less
    truncated pt_regs" trick is used, to make nested interrupt
    stacks easier to unwind.

    This proved to be a source of significant obfuscation and subtle
    bugs. For example, 'stub_fork' had to pop the return address,
    extend the struct, save registers, and push return address back.
    Ugly. 'ia32_ptregs_common' pops return address and "returns" via
    jmp insn, throwing a wrench into CPU return stack cache.

    This patch changes the code to always allocate a complete
    "struct pt_regs" on the kernel stack. The saving of registers
    is still done lazily.

    "Partial pt_regs" trick on interrupt stack is retained.

    Macros which manipulate "struct pt_regs" on stack are reworked:

    - ALLOC_PT_GPREGS_ON_STACK allocates the structure.

    - SAVE_C_REGS saves to it those registers which are clobbered
    by C code.

    - SAVE_EXTRA_REGS saves to it all other registers.

    - Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
    reverse it.

    'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
    with the return pointer.

    LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
    instead of magic numbers.

    'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
    SAVE_EXTRA_REGS instead of having it open-coded yet again.
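    The layout that ALLOC_PT_GPREGS_ON_STACK reserves can be checked with
    a C mirror of the structure. The field order below is reproduced as an
    illustration of the x86-64 struct pt_regs of this period and assumes
    an LP64 build (8-byte unsigned long); pt_regs_model is a hypothetical
    name, and the kernel's own header remains authoritative.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs_model {
	/* callee-preserved in the C ABI: filled by SAVE_EXTRA_REGS */
	unsigned long r15, r14, r13, r12, bp, bx;
	/* clobbered by C code: filled by SAVE_C_REGS */
	unsigned long r11, r10, r9, r8, ax, cx, dx, si, di;
	/* syscall number, error code or IRQ number, depending on entry */
	unsigned long orig_ax;
	/* iret frame: pushed by hardware, or built by hand on syscall entry */
	unsigned long ip, cs, flags, sp, ss;
};

#define SIZEOF_PTREGS sizeof(struct pt_regs_model)

static_assert(SIZEOF_PTREGS == 21 * 8, "21 eight-byte slots");
/* The six slots the old truncated frame did not allocate. */
static_assert(offsetof(struct pt_regs_model, r11) == 6 * 8,
	      "callee-preserved block is six slots");
/* The "mysterious" SS+8 constant is simply the end of the structure. */
static_assert(offsetof(struct pt_regs_model, ss) + 8 == SIZEOF_PTREGS,
	      "SS+8 == SIZEOF_PTREGS");
```

    Putting the callee-preserved registers first is what lets the saving
    stay lazy: SAVE_C_REGS fills the upper part of an always-complete
    frame, and SAVE_EXTRA_REGS fills the bottom six slots only when needed.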

    Patch was run-tested: 64-bit executables, 32-bit executables,
    strace works.

    Timing tests did not show measurable difference in 32-bit
    and 64-bit syscalls.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
    Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

04 Mar, 2015

4 commits


13 Feb, 2015

1 commit

  • If an attacker can cause a controlled kernel stack overflow, overwriting
    the restart block is a very juicy exploit target. This is because the
    restart_block is held in the same memory allocation as the kernel stack.

    Moving the restart block to struct task_struct prevents this exploit by
    making the restart_block harder to locate.

    Note that there are other fields in thread_info that are also easy
    targets, at least on some architectures.

    It's also a decent simplification, since the restart code is more or less
    identical on all architectures.

    [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
    Signed-off-by: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: David Miller
    Acked-by: Richard Weinberger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Richard Kuo
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Oleg Nesterov
    Cc: Guenter Roeck
    Signed-off-by: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

14 Jan, 2015

1 commit

  • The values of these two constants are the same; their meanings differ.

    Acked-by: Borislav Petkov
    CC: Linus Torvalds
    CC: Oleg Nesterov
    CC: "H. Peter Anvin"
    CC: Borislav Petkov
    CC: Frederic Weisbecker
    CC: X86 ML
    CC: Alexei Starovoitov
    CC: Will Drewry
    CC: Kees Cook
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski

    Denys Vlasenko
     

14 Dec, 2014

1 commit

  • Hook up x86-64, i386 and x32 ABIs.

    Signed-off-by: David Drysdale
    Cc: Meredydd Luff
    Cc: Shuah Khan
    Cc: "Eric W. Biederman"
    Cc: Andy Lutomirski
    Cc: Alexander Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Kees Cook
    Cc: Arnd Bergmann
    Cc: Rich Felker
    Cc: Christoph Hellwig
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Drysdale
     

09 Dec, 2014

1 commit


20 Nov, 2014

1 commit


01 Nov, 2014

1 commit

  • Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
    reads out of bounds, causing the NT fix to be unreliable. But, and
    this is much, much worse, if your stack is somehow just below the
    top of the direct map (or a hole), you read out of bounds and crash.

    Excerpt from the crash:

    [ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296

    2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)

    That read is deterministically above the top of the stack. I
    thought I even single-stepped through this code when I wrote it to
    check the offset, but I clearly screwed it up.

    Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
    Reported-by: Rusty Russell
    Cc: stable@vger.kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

20 Oct, 2014

1 commit

  • Pull audit updates from Eric Paris:
    "So this change across a whole bunch of arches really solves one basic
    problem. We want to audit when seccomp is killing a process. seccomp
    hooks in before the audit syscall entry code. audit_syscall_entry
    took as an argument the arch of the given syscall. Since the arch is
    part of what makes a syscall number meaningful it's an important part
    of the record, but it isn't available when seccomp shoots the
    syscall...

    For most arches we have a better way to get the arch (syscall_get_arch()).
    So the solution was twofold: Implement syscall_get_arch() everywhere
    there is audit which didn't have it. Use syscall_get_arch() in the
    seccomp audit code. Having syscall_get_arch() everywhere meant it was
    a useless flag on the stack and we could get rid of it for the typical
    syscall entry.

    The other changes inside the audit system aren't grand, fixed some
    records that had invalid spaces. Better locking around the task comm
    field. Removing some dead functions and structs. Make some things
    static. Really minor stuff"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: rename audit_log_remove_rule to disambiguate for trees
    audit: cull redundancy in audit_rule_change
    audit: WARN if audit_rule_change called illegally
    audit: put rule existence check in canonical order
    next: openrisc: Fix build
    audit: get comm using lock to avoid race in string printing
    audit: remove open_arg() function that is never used
    audit: correct AUDIT_GET_FEATURE return message type
    audit: set nlmsg_len for multicast messages.
    audit: use union for audit_field values since they are mutually exclusive
    audit: invalid op= values for rules
    audit: use atomic_t to simplify audit_serial()
    kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
    audit: reduce scope of audit_log_fcaps
    audit: reduce scope of audit_net_id
    audit: arm64: Remove the audit arch argument to audit_syscall_entry
    arm64: audit: Add audit hook in syscall_trace_enter/exit()
    audit: x86: drop arch from __audit_syscall_entry() interface
    sparc: implement is_32bit_task
    sparc: properly conditionalize use of TIF_32BIT
    ...

    Linus Torvalds
     

14 Oct, 2014

1 commit


09 Oct, 2014

1 commit


07 Oct, 2014

1 commit

  • The NT flag doesn't do anything in long mode other than causing IRET
    to #GP. Oddly, CPL3 code can still set NT using popf.

    Entry via hardware or software interrupt clears NT automatically, so
    the only relevant entries are fast syscalls.

    If user code causes kernel code to run with NT set, then there's at
    least some (small) chance that it could cause trouble. For example,
    user code could cause a call to EFI code with NT set, and who knows
    what would happen? Apparently some games on Wine sometimes do
    this (!), and, if an IRET return happens, they will segfault. That
    segfault cannot be handled, because signal delivery fails, too.

    This patch programs the CPU to clear NT on entry via SYSCALL (both
    32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
    in software on entry via SYSENTER.

    To save a few cycles, this borrows a trick from Jan Beulich in Xen:
    it checks whether NT is set before trying to clear it. As a result,
    it seems to have very little effect on SYSENTER performance on my
    machine.
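    The check-before-clear idea can be modeled in C. In the entry code
    this is roughly a test-and-branch in assembly, with a rarely taken slow
    path that rewrites the flags; for SYSCALL the CPU clears NT itself via
    the syscall flag mask. clear_nt_if_set() is a hypothetical name, and
    the function only models the control flow, not the register writes.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_NT	0x4000UL	/* bit 14, Nested Task */

/*
 * Only pay for the (slow) flags rewrite when NT is actually set,
 * which should almost never happen. Returns 1 if a write was needed.
 */
static int clear_nt_if_set(uint64_t *flags)
{
	if (!(*flags & X86_EFLAGS_NT))
		return 0;		/* common case: no write at all */
	*flags &= ~X86_EFLAGS_NT;
	return 1;
}
```

    Because the common case is a single test that falls through, the cost
    on an ordinary SYSENTER is close to zero.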

    There's another minor bug fix in here: it looks like the CFI
    annotations were wrong if CONFIG_AUDITSYSCALL=n.

    Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

    I haven't touched anything on 32-bit kernels.

    The syscall mask change comes from a variant of this patch by Anish
    Bhatt.

    Note to stable maintainers: there is no known security issue here.
    A misguided program can set NT and cause the kernel to try and fail
    to deliver SIGSEGV, crashing the program. This patch fixes Far Cry
    on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

    Reported-by: Anish Bhatt
    Signed-off-by: Andy Lutomirski
    Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
    Signed-off-by: H. Peter Anvin

    Andy Lutomirski
     

24 Sep, 2014

1 commit

  • Since the arch is found locally in __audit_syscall_entry(), there is no need to
    pass it in as a parameter. Delete it from the parameter list.

    x86* was the only arch to call __audit_syscall_entry() directly and did so from
    assembly code.

    Signed-off-by: Richard Guy Briggs
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-audit@redhat.com
    Signed-off-by: Eric Paris

    ---

    As this patch relies on changes in the audit tree, I think it
    appropriate to send it through my tree rather than the x86 tree.

    Richard Guy Briggs
     

06 May, 2014

1 commit

  • Currently, vdso.so files are prepared and analyzed by a combination
    of objcopy, nm, some linker script tricks, and some simple ELF
    parsers in the kernel. Replace all of that with plain C code that
    runs at build time.

    All five vdso images now generate .c files that are compiled and
    linked into the kernel image.

    This should cause only one userspace-visible change: the loaded vDSO
    images are stripped more heavily than they used to be. Everything
    outside the loadable segment is dropped. In particular, this causes
    the section table and section name strings to be missing. This
    should be fine: real dynamic loaders don't load or inspect these
    tables anyway. The result is roughly equivalent to eu-strip's
    --strip-sections option.

    The purpose of this change is to enable the vvar and hpet mappings
    to be moved to the page following the vDSO load segment. Currently,
    it is possible for the section table to extend into the page after
    the load segment, so, if we map it, it risks overlapping the vvar or
    hpet page. This happens whenever the load segment is just under a
    multiple of PAGE_SIZE.

    The only real subtlety here is that the old code had a C file with
    inline assembler that did 'call VDSO32_vsyscall' and a linker script
    that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
    likely worked by accident: the linker script entry defines a symbol
    associated with an address as opposed to an alias for the real
    dynamic symbol __kernel_vsyscall. That caused ld to relocate the
    reference at link time instead of leaving an interposable dynamic
    relocation. Since the VDSO32_vsyscall hack is no longer needed, I
    now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
    work. vdso2c will generate an error and abort the build if the
    resulting image contains any dynamic relocations, so we won't
    silently generate bad vdso images.

    (Dynamic relocations are a problem because nothing will even attempt
    to relocate the vdso.)
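    A build-time guard of this kind can be sketched by walking the image's
    dynamic table. This is a sketch in the spirit of vdso2c, not its
    actual code: check_no_dynamic_relocs() is a hypothetical name, it
    assumes a native-endian 64-bit image already in memory, and the exact
    set of rejected tags in the real tool may differ.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Reject any dynamic tag that implies dynamic relocations, since
 * nothing will ever relocate the vDSO. Returns 0 if clean, -1 otherwise.
 */
static int check_no_dynamic_relocs(const Elf64_Dyn *dyn, size_t n)
{
	for (size_t i = 0; i < n && dyn[i].d_tag != DT_NULL; i++) {
		switch (dyn[i].d_tag) {
		case DT_REL:
		case DT_RELSZ:
		case DT_RELENT:
		case DT_RELA:
		case DT_RELASZ:
		case DT_RELAENT:
		case DT_JMPREL:
		case DT_TEXTREL:
			return -1;	/* a real tool would fail the build here */
		default:
			break;
		}
	}
	return 0;
}
```

    Failing at build time is the only safe option: a vDSO with pending
    relocations would load without complaint and then misbehave at runtime.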

    Signed-off-by: Andy Lutomirski
    Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
    Signed-off-by: H. Peter Anvin

    Andy Lutomirski
     

09 Nov, 2013

4 commits


05 Sep, 2013

1 commit