27 Apr, 2015
1 commit
-
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski
Cc: Peter Anvin
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: Brian Gerst
Signed-off-by: Linus Torvalds
09 Apr, 2015
1 commit
-
The change which affected how execve clears EXTRA_REGS missed
32-bit execve syscalls.Fix this by using 64-bit execve stub epilogue for them too.
Run-tested.
Signed-off-by: Denys Vlasenko
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar
06 Apr, 2015
1 commit
-
The 'pax' argument is unnecesary. Instead, store the RAX value
directly in regs.This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
was changed to return an error code instead of EAX directly:https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174
In 2007 sigaltstack syscall support was added, where the return
value of restore_sigcontext() was changed to carry the memory-copying
failure code.But instead of putting 'ax' into regs->ax directly, it was carried
in via a pointer and then returned, where the generic syscall return
code copied it to regs->ax.So there was never any deeper reason for this suboptimal pattern, it
was simply never noticed after being introduced.Signed-off-by: Brian Gerst
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar
03 Apr, 2015
1 commit
-
SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
with usergs and IRQs on. That means that we rely on STI to
correctly mask interrupts for one instruction. This is okay by
itself, but the semantics with respect to NMIs are unclear.Avoid the whole issue by using SYSRETL instead. For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do
allow SYSRETL back to compat mode. Go figure.To avoid doing too much at once, this doesn't revamp the calling
convention. We still return with EBP, EDX, and ECX on the user
stack.Oddly this seems to be 30 cycles or so faster. Avoiding POPFQ
and STI will account for under half of that, I think, so my best
guess is that Intel just optimizes SYSRET much better than
SYSEXIT.Signed-off-by: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
Signed-off-by: Ingo Molnar
01 Apr, 2015
1 commit
-
This mimics the recent similar 64-bit change.
Saves ~110 bytes of code.Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
I also looked at the diff of entry_64.o disassembly, to have
a different view of the changes.Signed-off-by: Denys Vlasenko
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar
27 Mar, 2015
2 commits
-
There are a couple of syscall argument zero-extension instructions in
the 32-bit compat entry code, and it was mentioned that people keep
trying to optimize them out, introducing bugs.Make them more visible, and add a "do not remove" comment.
Signed-off-by: Denys Vlasenko
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1427452582-21624-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar -
The existing comment has proven to be not very clear.
Replace it with a comment similar to the one we now have in the 64-bit
syscall entry point. (Three instances, one per 32-bit syscall entry).In the INT80 entry point's CFI annotations, replace mysterious
expressions with numric constants. In this case, raw numbers
look more understandable.Signed-off-by: Denys Vlasenko
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1427452582-21624-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar
25 Mar, 2015
4 commits
-
The THREAD_INFO() macro has a somewhat confusingly generic name,
defined in a generic .h C header file. It also does not make it
clear that it constructs a memory operand for use in assembly
code.Rename it to ASM_THREAD_INFO() to make it all glaringly
obvious on first glance.Acked-by: Borislav Petkov
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
Signed-off-by: Ingo Molnar -
Before:
TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
After:
movl THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d
to turn it into a clear thread_info accessor.
No code changed:
md5:
fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm
fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asme39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm
e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asmAcked-by: Andy Lutomirski
Acked-by: Denys Vlasenko
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
Signed-off-by: Ingo Molnar -
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...Net result in generated code is that constants in several insns
are changed.This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.Signed-off-by: Denys Vlasenko
Acked-by: Borislav Petkov
Acked-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar -
This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.Improve comment atop THREAD_INFO macro definition.
This patch does not change generated code (verified by objdump).
Signed-off-by: Denys Vlasenko
Acked-by: Borislav Petkov
Acked-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar
23 Mar, 2015
1 commit
-
Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.The following commit:
76f5df43cab5e76 ("Always allocate a complete "struct pt_regs" on the kernel stack")
recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.Signed-off-by: Brian Gerst
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar
06 Mar, 2015
2 commits
-
It has nothing to do with init -- there's only one TSS per cpu.
Other names considered include:
- current_tss: Confusing because we never switch the tss.
- singleton_tss: Too long.This patch was generated with 's/init_tss/cpu_tss/g'. Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar -
The ia32 sysenter code loaded the top of the kernel stack into
rsp by loading kernel_stack and then adjusting it. It can be
simplified to just read sp0 directly.This requires the addition of a new asm-offsets entry for sp0.
Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar
05 Mar, 2015
5 commits
-
The last instance of "mysterious" SS+8 constant is replaced by
SIZEOF_PTREGS.Message-Id:
Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Steven Rostedt
Cc: Will Drewry
Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar -
Use of a small macro - one with conditional expansion - does
more harm than good. It obfuscates code, with minimal code
reuse.For example, because of obfuscation it's not obvious that
in 'ia32_sysenter_target', we can optimize loading of r9 -
currently it is loaded with a detour through ebp.This patch folds the IA32_ARG_FIXUP macro into its callers.
No code changes.
Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Will Drewry
Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar -
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.This patch adds comments which explain in detail
what registers are modified by each of these instructions.The comments are placed immediately before corresponding
entry and exit points.Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Thomas Gleixner
Cc: Will Drewry
Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar -
ARGOFFSET is zero now, removing it changes no code.
A few macros lost "offset" parameter, since it is always zero
now too.No code changes - verified with objdump.
Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Thomas Gleixner
Cc: Will Drewry
Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar -
The 64-bit entry code was using six stack slots less by not
saving/restoring registers which are callee-preserved according
to the C ABI, and was not allocating space for them.Only when syscalls needed a complete "struct pt_regs" was
the complete area allocated and filled in.As an additional twist, on interrupt entry a "slightly less
truncated pt_regs" trick is used, to make nested interrupt
stacks easier to unwind.This proved to be a source of significant obfuscation and subtle
bugs. For example, 'stub_fork' had to pop the return address,
extend the struct, save registers, and push return address back.
Ugly. 'ia32_ptregs_common' pops return address and "returns" via
jmp insn, throwing a wrench into CPU return stack cache.This patch changes the code to always allocate a complete
"struct pt_regs" on the kernel stack. The saving of registers
is still done lazily."Partial pt_regs" trick on interrupt stack is retained.
Macros which manipulate "struct pt_regs" on stack are reworked:
- ALLOC_PT_GPREGS_ON_STACK allocates the structure.
- SAVE_C_REGS saves to it those registers which are clobbered
by C code.- SAVE_EXTRA_REGS saves to it all other registers.
- Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
reverse it.'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
with the return pointer.LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
SAVE_EXTRA_REGS instead of having it open-coded yet again.Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
Cc: Alexei Starovoitov
Cc: Borislav Petkov
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Will Drewry
Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar
04 Mar, 2015
4 commits
-
Signed-off-by: Ingo Molnar
-
The check against lastcomm is racy, and the message it produces
isn't necessary. vm86 support can be disabled on a 32-bit
kernel also, and doesn't have this message. Switch to
sys_ni_syscall instead.Signed-off-by: Brian Gerst
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1425439896-8322-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar -
Combine the 32-bit syscall tables into one file.
Signed-off-by: Brian Gerst
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar -
compat_ni_syscall() does the same thing as sys_ni_syscall().
Signed-off-by: Brian Gerst
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1425439896-8322-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar
13 Feb, 2015
1 commit
-
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.It's also a decent simplification, since the restart code is more or less
identical on all architectures.[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski
Cc: Thomas Gleixner
Cc: Al Viro
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Kees Cook
Cc: David Miller
Acked-by: Richard Weinberger
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Vineet Gupta
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Steven Miao
Cc: Mark Salter
Cc: Aurelien Jacquiot
Cc: Mikael Starvik
Cc: Jesper Nilsson
Cc: David Howells
Cc: Richard Kuo
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Michal Simek
Cc: Ralf Baechle
Cc: Jonas Bonn
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Acked-by: Michael Ellerman (powerpc)
Tested-by: Michael Ellerman (powerpc)
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Chen Liqin
Cc: Lennox Wu
Cc: Chris Metcalf
Cc: Guan Xuetao
Cc: Chris Zankel
Cc: Max Filippov
Cc: Oleg Nesterov
Cc: Guenter Roeck
Signed-off-by: James Hogan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jan, 2015
1 commit
-
The values of these two constants are the same, the meaning is different.
Acked-by: Borislav Petkov
CC: Linus Torvalds
CC: Oleg Nesterov
CC: "H. Peter Anvin"
CC: Borislav Petkov
CC: Frederic Weisbecker
CC: X86 ML
CC: Alexei Starovoitov
CC: Will Drewry
CC: Kees Cook
CC: linux-kernel@vger.kernel.org
Signed-off-by: Denys Vlasenko
Signed-off-by: Andy Lutomirski
14 Dec, 2014
1 commit
-
Hook up x86-64, i386 and x32 ABIs.
Signed-off-by: David Drysdale
Cc: Meredydd Luff
Cc: Shuah Khan
Cc: "Eric W. Biederman"
Cc: Andy Lutomirski
Cc: Alexander Viro
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Kees Cook
Cc: Arnd Bergmann
Cc: Rich Felker
Cc: Christoph Hellwig
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2014
1 commit
20 Nov, 2014
1 commit
-
Signed-off-by: Al Viro
01 Nov, 2014
1 commit
-
Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
reads out of bounds, causing the NT fix to be unreliable. But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.Excerpt from the crash:
[ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296
2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)
That read is deterministically above the top of the stack. I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski
Signed-off-by: Linus Torvalds
20 Oct, 2014
1 commit
-
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
14 Oct, 2014
1 commit
-
Pull x86 fixes from Ingo Molnar:
"Misc smaller fixes that missed the v3.17 cycle"* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Add arch/x86/purgatory/ make generated files to gitignore
x86: Fix section conflict for numachip
x86: Reject x32 executables if x32 ABI not supported
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, boot, kaslr: Fix nuisance warning on 32-bit builds
09 Oct, 2014
1 commit
-
... rather than doing that in the guts of ->load_binary().
[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]Signed-off-by: Al Viro
07 Oct, 2014
1 commit
-
The NT flag doesn't do anything in long mode other than causing IRET
to #GP. Oddly, CPL3 code can still set NT using popf.Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble. For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen? Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault. That
segfault cannot be handled, because signal delivery fails, too.This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it. As a result,
it seems to have very little effect on SYSENTER performance on my
machine.There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
I haven't touched anything on 32-bit kernels.
The syscall mask change comes from a variant of this patch by Anish
Bhatt.Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program. This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275Cc:
Reported-by: Anish Bhatt
Signed-off-by: Andy Lutomirski
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin
24 Sep, 2014
1 commit
-
Since the arch is found locally in __audit_syscall_entry(), there is no need to
pass it in as a parameter. Delete it from the parameter list.x86* was the only arch to call __audit_syscall_entry() directly and did so from
assembly code.Signed-off-by: Richard Guy Briggs
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-audit@redhat.com
Signed-off-by: Eric Paris---
As this patch relies on changes in the audit tree, I think it
appropriate to send it through my tree rather than the x86 tree.
06 May, 2014
1 commit
-
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel. Replace all of that with plain C code that
runs at build time.All five vdso images now generate .c files that are compiled and
linked in to the kernel image.This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be. Everything
outside the loadable segment is dropped. In particular, this causes
the section table and section name strings to be missing. This
should be fine: real dynamic loaders don't load or inspect these
tables anyway. The result is roughly equivalent to eu-strip's
--strip-sections option.The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment. Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page. This happens whenever the load segment is just under a
multiple of PAGE_SIZE.The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall. That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation. Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work. vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)Signed-off-by: Andy Lutomirski
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin
09 Nov, 2013
4 commits
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
just getting rid of bitrot
Signed-off-by: Al Viro
05 Sep, 2013
1 commit
-
Pull x86 SMAP fixes from Ingo Molnar:
"Fixes for Intel SMAP support, to fix SIGSEGVs during bootup"* 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP
x86, smap: Handle csum_partial_copy_*_user()