30 Dec, 2017
31 commits
-
commit 613e396bc0d4c7604fba23256644e78454c68cf6 upstream.
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.While at it get rid of the #ifdeffery and provide proper stub functions.
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 92a0f81d89571e3e8759366e050ee05cc545ef99 upstream.
Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
and 0-day already hit a case where the fixmap PTEs were cleared by
cleanup_highmap().Aside of that the fixmap API is a pain as it's all backwards.
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit ed1bbc40a0d10e0c5c74fe7bdc6298295cf40255 upstream.
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 1a3b0caeb77edeac5ce5fa05e6a61c474c9a9745 upstream.
Unclutter tlbflush.h a little.
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit dd95f1a4b5ca904c78e6a097091eb21436478abb upstream.
There are effectively two ASID types:
1. The one stored in the mmu_context that goes from 0..5
2. The one programmed into the hardware that goes from 1..6This consolidates the locations where converting between the two (by doing
a +1) to a single place which gives us a nice place to comment.
PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware
ASID to flush for the userspace mapping.Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit cb0a9144a744e55207e24dcef812f05cd15a499a upstream.
First, it's nice to remove the magic numbers.
Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID
space. The space is currently unused, but add a comment to spell out this
new restriction.Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 50fb83a62cf472dc53ba23bd3f7bd6c1b2b3b53e upstream.
For flushing the TLB, the ASID which has been programmed into the hardware
must be known. That differs from what is in 'cpu_tlbstate'.Add functions to transform the 'cpu_tlbstate' values into to the one
programmed into the hardware (CR3).It's not easy to include mmu_context.h into tlbflush.h, so just move the
CR3 building over to tlbflush.h.Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 3f67af51e56f291d7417d77c4f67cd774633c5e1 upstream.
Per popular request..
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit b5fc6d943808b570bdfbec80f40c6b3855f1c48b upstream.
atomic64_inc_return() already implies smp_mb() before and after.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit a501686b2923ce6f2ff2b1d0d50682c6411baf72 upstream.
__flush_tlb_single() is for user mappings, __flush_tlb_one() for
kernel mappings.Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 23cb7d46f371844c004784ad9552a57446f73e5a upstream.
Commit:
ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
... grubbed into tlbflush internals without coherent explanation.
Since it says its a precaution and the SDM doesn't mention anything like
this, take it out back.Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: fenghua.yu@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 3e46e0f5ee3643a1239be9046c7ba6c66ca2b329 upstream.
Since uv_flush_tlb_others() implements flush_tlb_others() which is
about flushing user mappings, we should use __flush_tlb_single(),
which too is about flushing user mappings.Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Andrew Banman
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Mike Travis
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 4fe2d8b11a370af286287a2661de9d4e6c9a145a upstream.
If the kernel oopses while on the trampoline stack, it will print
"" even if SYSENTER is not involved. That is rather confusing.The "SYSENTER" stack is used for a lot more than SYSENTER now. Give it a
better string to display in stack dumps, and rename the kernel code to
match.Also move the 32-bit code over to the new naming even though it still uses
the entry stack only for SYSENTER.Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit e8ffe96e5933d417195268478479933d56213a3f upstream.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 upstream.
The old docs had the vsyscall range wrong and were missing the fixmap.
Fix both.There used to be 8 MB reserved for future vsyscalls, but that's long gone.
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Dave Hansen
Cc: David Laight
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: Kirill A. Shutemov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit a4828f81037f491b2cc986595e3a969a6eeb2fb5 upstream.
The LDT is inherited across fork() or exec(), but that makes no sense
at all because exec() is supposed to start the process clean.The reason why this happens is that init_new_context_ldt() is called from
init_new_context() which obviously needs to be called for both fork() and
exec().It would be surprising if anything relies on that behaviour, so it seems to
be safe to remove that misfeature.Split the context initialization into two parts. Clear the LDT pointer and
initialize the mutex from the general context init and move the LDT
duplication to arch_dup_mmap() which is only called on fork().Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Andy Lutomirsky
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit c2b3496bb30bd159e9de42e5c952e1f1f33c9a77 upstream.
The LDT is duplicated on fork() and on exec(), which is wrong as exec()
should start from a clean state, i.e. without LDT. To fix this the LDT
duplication code will be moved into arch_dup_mmap() which is only called
for fork().This introduces a locking problem. arch_dup_mmap() holds mmap_sem of the
parent process, but the LDT duplication code needs to acquire
mm->context.lock to access the LDT data safely, which is the reverse lock
order of write_ldt() where mmap_sem nests into context.lock.Solve this by introducing a new rw semaphore which serializes the
read/write_ldt() syscall operations and use context.lock to protect the
actual installment of the LDT descriptor.So context.lock stabilizes mm->context.ldt and can nest inside of the new
semaphore or mmap_sem.Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Andy Lutomirsky
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit c10e83f598d08046dd1ebc8360d4bb12d802d51b upstream.
In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
allowed to fail. Fix up all instances.Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andy Lutomirski
Cc: Andy Lutomirsky
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: Dave Hansen
Cc: David Laight
Cc: Denys Vlasenko
Cc: Eduardo Valentin
Cc: Greg KH
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 4831b779403a836158917d59a7ca880483c67378 upstream.
If something goes wrong with pagetable setup, vsyscall=native will
accidentally fall back to emulation. Make it warn and fail so that we
notice.Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 49275fef986abfb8b476e4708aaecc07e7d3e087 upstream.
The kernel is very erratic as to which pagetables have _PAGE_USER set. The
vsyscall page gets lucky: it seems that all of the relevant pagetables are
among the apparently arbitrary ones that set _PAGE_USER. Rather than
relying on chance, just explicitly set _PAGE_USER.This will let us clean up pagetable setup to stop setting _PAGE_USER. The
added code can also be reused by pagetable isolation to manage the
_PAGE_USER bit in the usermode tables.[ tglx: Folded paravirt fix from Juergen Gross ]
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Hansen
Cc: David Laight
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 146122e24bdf208015d629babba673e28d090709 upstream.
The address hints are a trainwreck. The array entry numbers have to kept
magically in sync with the actual hints, which is doomed as some of the
array members are initialized at runtime via the entry numbers.Designated initializers have been around before this code was
implemented....Use the entry numbers to populate the address hints array and add the
missing bits and pieces. Split 32 and 64 bit for readability sake.Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit c05344947b37f7cda726e802457370bc6eac4d26 upstream.
The check for a present page in printk_prot():
if (!pgprot_val(prot)) {
/* Not present */is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
analyzing mapping correctness. Check for the present bit to make an
informed decision.Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 7bbcbd3d1cdcbacd0f9f8dc4c98d550972f1ca30 upstream.
The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
and NR_CPUS=512, because the fixmap area becomes too big.Limit the number of CPUs with BIGSMP to 64, which is already way to big for
32-bit, but it's at least a working limitation.We performed a quick survey of 32-bit-only machines that might be affected
by this change negatively, but found none.Signed-off-by: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 32d0b95300db03c2b23b2ea2c94769a4a138e79d upstream.
[note, only the inat.h portion, to get objtool back in sync - gregkh]
b0caa8c8c6bbc422bc3c32b64852d6d618f32b49 Mon Sep 17 00:00:00 2001
When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most of
the cases, the segment base address will be zero as in USER_DS/USER32_DS.
However, it may be possible that a user space program defines its own
segments via a local descriptor table. In such a case, the segment base
address may not be zero. Thus, the segment base address is needed to
calculate correctly the linear address.If running in protected mode, the segment selector to be used when
computing a linear address is determined by either any of segment override
prefixes in the instruction or inferred from the registers involved in the
computation of the effective address; in that order. Also, there are cases
when the segment override prefixes shall be ignored (i.e., code segments
are always selected by the CS segment register; string instructions always
use the ES segment register when using rDI register as operand). In long
mode, segment registers are ignored, except for FS and GS. In these two
cases, base addresses are obtained from the respective MSRs.For clarity, this process can be split into four steps (and an equal
number of functions): determine if segment prefixes overrides can be used;
parse the segment override prefixes, and use them if found; if not found
or cannot be used, use the default segment registers associated with the
operand registers. Once the segment register to use has been identified,
read its value to obtain the segment selector.The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into a pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
the respective segment registers.In order to identify the segment registers, a new set of #defines is
introduced. It also includes two special identifiers. One of them
indicates when the default segment register associated with instruction
operands shall be used. Another one indicates that the contents of the
segment register shall be ignored; this identifier is used when in long
mode.Improvements-by: Borislav Petkov
Signed-off-by: Ricardo Neri
Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Cc: "Michael S. Tsirkin"
Cc: Peter Zijlstra
Cc: Dave Hansen
Cc: ricardo.neri@intel.com
Cc: Adrian Hunter
Cc: Paul Gortmaker
Cc: Huang Rui
Cc: Qiaowei Ren
Cc: Shuah Khan
Cc: Kees Cook
Cc: Jonathan Corbet
Cc: Jiri Slaby
Cc: Dmitry Vyukov
Cc: "Ravi V. Shankar"
Cc: Chris Metcalf
Cc: Brian Gerst
Cc: Arnaldo Carvalho de Melo
Cc: Andy Lutomirski
Cc: Colin Ian King
Cc: Chen Yucong
Cc: Adam Buchbinder
Cc: Vlastimil Babka
Cc: Lorenzo Stoakes
Cc: Masami Hiramatsu
Cc: Paolo Bonzini
Cc: Andrew Morton
Cc: Thomas Garnier
Link: https://lkml.kernel.org/r/1509135945-13762-14-git-send-email-ricardo.neri-calderon@linux.intel.com
Cc: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit f5b5fab1780c98b74526dbac527574bd02dc16f8 upstream
Update x86-opcode-map.txt based on the October 2017 Intel SDM publication.
Fix INVPID to INVVPID.
Add UD0 and UD1 instruction opcodes.Also sync the objtool and perf tooling copies of this file.
Signed-off-by: Randy Dunlap
Acked-by: Masami Hiramatsu
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 14c47b54b0d9389e3ca0718e805cdd90c5a4303a upstream.
The new ORC unwinder breaks the build of a 64-bit kernel on a 32-bit
host. Building the kernel on a i386 or x32 host fails with:orc_dump.c: In function 'orc_dump':
orc_dump.c:105:26: error: passing argument 2 of 'elf_getshdrnum' from incompatible pointer type [-Werror=incompatible-pointer-types]
if (elf_getshdrnum(elf, &nr_sections)) {
^
In file included from /usr/local/include/gelf.h:32:0,
from elf.h:22,
from warn.h:26,
from orc_dump.c:20:
/usr/local/include/libelf.h:304:12: note: expected 'size_t * {aka unsigned int *}' but argument is of type 'long unsigned int *'
extern int elf_getshdrnum (Elf *__elf, size_t *__dst);
^~~~~~~~~~~~~~
orc_dump.c:190:17: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'Elf64_Sxword {aka long long int}' [-Werror=format=]
printf("%s+%lx:", name, rela.r_addend);
~~^ ~~~~~~~~~~~~~
%llxFix the build failure.
Another problem is that if the user specifies HOSTCC or HOSTLD
variables, they are ignored in the objtool makefile. Change the
Makefile to respect these variables.Signed-off-by: Mikulas Patocka
Signed-off-by: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Sven Joachim
Cc: Thomas Gleixner
Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Link: http://lkml.kernel.org/r/19f0e64d8e07e30a7b307cd010eb780c404fe08d.1512252895.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit a356d2ae50790f49858ebed35da9e206336fafee upstream.
objtool grew this new warning:
Warning: synced file at 'tools/objtool/arch/x86/include/asm/inat.h' differs from latest kernel version at 'arch/x86/include/asm/inat.h'
which upstream header grew new INAT_SEG_* definitions.
Sync up the tooling version of the header.
Reported-by: Linus Torvalds
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 9eb719855f6c9b21eb5889d9ac2ca1c60527ad89 upstream.
Stephen Rothwell reported this cross-compilation build failure:
| In file included from orc_dump.c:19:0:
| orc.h:21:10: fatal error: asm/orc_types.h: No such file or directory
| ...Caused by:
6a77cff819ae ("objtool: Move synced files to their original relative locations")
Use the proper arch header files location, not the host-arch location.
Bisected-by: Stephen Rothwell
Reported-by: Stephen Rothwell
Signed-off-by: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Linux-Next Mailing List
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20171108030152.bd76eahiwjwjt3kp@treble
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 3bd51c5a371de917e4e7401c9df006b5998579df upstream.
Replace the nasty diff checks in the objtool Makefile with a clean bash
script, and make the warnings more specific.Heavily inspired by tools/perf/check-headers.sh.
Suggested-by: Ingo Molnar
Signed-off-by: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/ab015f15ccd8c0c6008493c3c6ee3d495eaf2927.1509974346.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
commit 6a77cff819ae3e31992bde6432c9b5720748a89b upstream.
This will enable more straightforward comparisons, and it also makes the
files 100% identical.Suggested-by: Ingo Molnar
Signed-off-by: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/407b2aaa317741f48fcf821592c0e96ab3be1890.1509974346.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
This reverts commit 9704f8147e88213f2fa580f713b42b08a4f1a7d2 which was
upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a.Shouldn't have been here, sorry about that.
Reported-by: Chris Rankin
Reported-by: Willy Tarreau
Cc: Ido Schimmel
Cc: Ozgur
Cc: Wei Wang
Cc: Martin KaFai Lau
Cc: Eric Dumazet
Cc: David S. Miller
Cc: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
25 Dec, 2017
9 commits
-
commit d15155824c5014803d91b829736d249c500bdda6 upstream.
linux/compiler.h is included indirectly by linux/types.h via
uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
-> uapi/linux/stddef.h and is needed to provide a proper definition of
offsetof.Unfortunately, compiler.h requires a definition of
smp_read_barrier_depends() for defining lockless_dereference() and soon
for defining READ_ONCE(), which means that all
users of READ_ONCE() will need to include asm/barrier.h to avoid splats
such as:In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from arch/h8300/kernel/asm-offsets.c:11:
include/linux/list.h: In function 'list_empty':
>> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
^A better alternative is to include asm/barrier.h in linux/compiler.h,
but this requires a type definition for "bool" on some architectures
(e.g. x86), which is defined later by linux/types.h. Type "bool" is also
used directly in linux/compiler.h, so the whole thing is pretty fragile.This patch splits compiler.h in two: compiler_types.h contains type
annotations, definitions and the compiler-specific parts, whereas
compiler.h #includes compiler-types.h and additionally defines macros
such as {READ,WRITE.ACCESS}_ONCE().uapi/linux/stddef.h and linux/linkage.h are then moved over to include
linux/compiler_types.h, which fixes the build for h8 and blackfin.Signed-off-by: Will Deacon
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 2255f8d520b0a318fc6d387d0940854b2f522a7f ]
These tests should cover the following cases:
- MOV with both zero-extended and sign-extended immediates
- implicit truncation of register contents via ALU32/MOV32
- implicit 32-bit truncation of ALU32 output
- oversized register source operand for ALU32 shift
- right-shift of a number that could be positive or negative
- map access where adding the operation size to the offset causes signed
32-bit overflow
- direct stack access at a ~4GiB offsetAlso remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
that should fail independent of what flags userspace passes.Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Alexei Starovoitov
[ Upstream commit bb7f0f989ca7de1153bd128a40a71709e339fa03 ]
There were various issues related to the limited size of integers used in
the verifier:
- `off + size` overflow in __check_map_access()
- `off + reg->off` overflow in check_mem_access()
- `off + reg->var_off.value` overflow or 32-bit truncation of
`reg->var_off.value` in check_mem_access()
- 32-bit truncation in check_stack_boundary()Make sure that any integer math cannot overflow by not allowing
pointer math with large values.Also reduce the scope of "scalar op scalar" tracking.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 179d1c5602997fef5a940c6ddcf31212cbfebd14 ]
This could be made safe by passing through a reference to env and checking
for env->allow_ptr_leaks, but it would only work one way and is probably
not worth the hassle - not doing it will not directly lead to program
rejection.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit a5ec6ae161d72f01411169a938fa5f8baea16e8f ]
Force strict alignment checks for stack pointers because the tracking of
stack spills relies on it; unaligned stack accesses can lead to corruption
of spilled registers, which is exploitable.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
Prevent indirect stack accesses at non-constant addresses, which would
permit reading and corrupting spilled pointers.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 468f6eafa6c44cb2c5d8aad35e12f06c240a812a ]
32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
Adjust the verifier accordingly.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 0c17d1d2c61936401f4702e1846e2c19b200f958 ]
Properly handle register truncation to a smaller size.
The old code first mirrors the clearing of the high 32 bits in the bitwise
tristate representation, which is correct. But then, it computes the new
arithmetic bounds as the intersection between the old arithmetic bounds and
the bounds resulting from the bitwise tristate representation. Therefore,
when coerce_reg_to_32() is called on a number with bounds
[0xffff'fff8, 0x1'0000'0007], the verifier computes
[0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
This is incorrect: The truncated number could also be in the range [0, 7],
and no meaningful arithmetic bounds can be computed in that case apart from
the obvious [0, 0xffff'ffff].Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.Debian assigned CVE-2017-16996 for this issue.
v2:
- flip the mask during arithmetic bounds calculation (Ben Hutchings)
v3:
- add CVE number (Ben Hutchings)Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Signed-off-by: Jann Horn
Acked-by: Edward Cree
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman