30 Dec, 2017

31 commits

  • commit 613e396bc0d4c7604fba23256644e78454c68cf6 upstream.

    init_espfix_bsp() needs to be invoked before the page table isolation
    initialization. Move it into mm_init() which is the place where pti_init()
    will be added.

    While at it, get rid of the #ifdeffery and provide proper stub functions.

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 92a0f81d89571e3e8759366e050ee05cc545ef99 upstream.

    Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
    and 0-day already hit a case where the fixmap PTEs were cleared by
    cleanup_highmap().

    Aside from that, the fixmap API is a pain as it's all backwards.

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit ed1bbc40a0d10e0c5c74fe7bdc6298295cf40255 upstream.

    Separate the cpu_entry_area code out of cpu/common.c and the fixmap.

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 1a3b0caeb77edeac5ce5fa05e6a61c474c9a9745 upstream.

    Unclutter tlbflush.h a little.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit dd95f1a4b5ca904c78e6a097091eb21436478abb upstream.

    There are effectively two ASID types:

    1. The one stored in the mmu_context that goes from 0..5
    2. The one programmed into the hardware that goes from 1..6

    This consolidates the locations where converting between the two (by doing
    a +1) to a single place which gives us a nice place to comment.
    PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware
    ASID to flush for the userspace mapping.
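
The conversion described above can be sketched in a few lines of C. This is a minimal illustration, not the kernel's code: the helper names `hw_asid()` and `sw_asid()` and the constant `NR_DYN_ASIDS` are hypothetical, and the only facts taken from the text are the two ranges (0..5 and 1..6) and the +1 offset.

```c
#include <stdint.h>

/* Assumption: six dynamic ASIDs, matching the 0..5 range quoted above. */
#define NR_DYN_ASIDS 6

/*
 * Hypothetical helper: map the software ASID kept in the mmu_context
 * (0..5) to the value programmed into the hardware (1..6).
 * Keeping the +1 in one place is the point of the commit.
 */
static inline uint16_t hw_asid(uint16_t asid)
{
    return (uint16_t)(asid + 1);
}

/* Hypothetical inverse: hardware value (1..6) back to software ASID. */
static inline uint16_t sw_asid(uint16_t hw)
{
    return (uint16_t)(hw - 1);
}
```

With a single helper, any later user that needs the hardware value for a given ASID can call it instead of repeating the offset.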

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit cb0a9144a744e55207e24dcef812f05cd15a499a upstream.

    First, it's nice to remove the magic numbers.

    Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID
    space. The space is currently unused, but add a comment to spell out this
    new restriction.

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit 50fb83a62cf472dc53ba23bd3f7bd6c1b2b3b53e upstream.

    For flushing the TLB, the ASID which has been programmed into the hardware
    must be known. That differs from what is in 'cpu_tlbstate'.

    Add functions to transform the 'cpu_tlbstate' values into the one
    programmed into the hardware (CR3).

    It's not easy to include mmu_context.h into tlbflush.h, so just move the
    CR3 building over to tlbflush.h.
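
As a rough sketch of what "building CR3" from a software ASID involves: with PCID enabled, the low 12 bits of CR3 carry the hardware ASID and the upper bits carry the page-table base. The function name `build_cr3_sketch()` and the mask name are hypothetical, and the +1 offset is the software-to-hardware conversion assumed from the ASID discussion in this series.

```c
#include <stdint.h>

/* Low 12 bits of CR3 hold the PCID when CR4.PCIDE is set. */
#define CR3_PCID_MASK_SKETCH 0xFFFull

/*
 * Hypothetical helper: combine a page-aligned physical PGD address
 * with the hardware ASID derived from a software ASID (sw + 1).
 */
static inline uint64_t build_cr3_sketch(uint64_t pgd_phys, uint16_t sw_asid)
{
    uint64_t hw = (uint64_t)sw_asid + 1;   /* software 0..5 -> hardware 1..6 */

    return (pgd_phys & ~CR3_PCID_MASK_SKETCH) | (hw & CR3_PCID_MASK_SKETCH);
}
```

A TLB flush routine would need the same hardware value, which is why the text wants the transformation available where the flush code lives.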

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit 3f67af51e56f291d7417d77c4f67cd774633c5e1 upstream.

    Per popular request..

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit b5fc6d943808b570bdfbec80f40c6b3855f1c48b upstream.

    atomic64_inc_return() already implies smp_mb() before and after.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit a501686b2923ce6f2ff2b1d0d50682c6411baf72 upstream.

    __flush_tlb_single() is for user mappings, __flush_tlb_one() for
    kernel mappings.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 23cb7d46f371844c004784ad9552a57446f73e5a upstream.

    Commit:

    ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")

    ... grubbed into tlbflush internals without coherent explanation.

    Since it says it's a precaution and the SDM doesn't mention anything like
    this, take it out back.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: fenghua.yu@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 3e46e0f5ee3643a1239be9046c7ba6c66ca2b329 upstream.

    Since uv_flush_tlb_others() implements flush_tlb_others() which is
    about flushing user mappings, we should use __flush_tlb_single(),
    which too is about flushing user mappings.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: Andrew Banman
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Mike Travis
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 4fe2d8b11a370af286287a2661de9d4e6c9a145a upstream.

    If the kernel oopses while on the trampoline stack, it will print
    "<SYSENTER>" even if SYSENTER is not involved. That is rather confusing.

    The "SYSENTER" stack is used for a lot more than SYSENTER now. Give it a
    better string to display in stack dumps, and rename the kernel code to
    match.

    Also move the 32-bit code over to the new naming even though it still uses
    the entry stack only for SYSENTER.

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit e8ffe96e5933d417195268478479933d56213a3f upstream.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 upstream.

    The old docs had the vsyscall range wrong and were missing the fixmap.
    Fix both.

    There used to be 8 MB reserved for future vsyscalls, but that's long gone.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit a4828f81037f491b2cc986595e3a969a6eeb2fb5 upstream.

    The LDT is inherited across fork() or exec(), but that makes no sense
    at all because exec() is supposed to start the process clean.

    The reason why this happens is that init_new_context_ldt() is called from
    init_new_context() which obviously needs to be called for both fork() and
    exec().

    It would be surprising if anything relies on that behaviour, so it seems to
    be safe to remove that misfeature.

    Split the context initialization into two parts. Clear the LDT pointer and
    initialize the mutex from the general context init and move the LDT
    duplication to arch_dup_mmap() which is only called on fork().

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: dan.j.williams@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit c2b3496bb30bd159e9de42e5c952e1f1f33c9a77 upstream.

    The LDT is duplicated on fork() and on exec(), which is wrong as exec()
    should start from a clean state, i.e. without LDT. To fix this the LDT
    duplication code will be moved into arch_dup_mmap() which is only called
    for fork().

    This introduces a locking problem. arch_dup_mmap() holds mmap_sem of the
    parent process, but the LDT duplication code needs to acquire
    mm->context.lock to access the LDT data safely, which is the reverse lock
    order of write_ldt() where mmap_sem nests into context.lock.

    Solve this by introducing a new rw semaphore which serializes the
    read/write_ldt() syscall operations and use context.lock to protect the
    actual installment of the LDT descriptor.

    So context.lock stabilizes mm->context.ldt and can nest inside of the new
    semaphore or mmap_sem.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: dan.j.williams@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit c10e83f598d08046dd1ebc8360d4bb12d802d51b upstream.

    In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
    allowed to fail. Fix up all instances.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: dan.j.williams@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 4831b779403a836158917d59a7ca880483c67378 upstream.

    If something goes wrong with pagetable setup, vsyscall=native will
    accidentally fall back to emulation. Make it warn and fail so that we
    notice.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit 49275fef986abfb8b476e4708aaecc07e7d3e087 upstream.

    The kernel is very erratic as to which pagetables have _PAGE_USER set. The
    vsyscall page gets lucky: it seems that all of the relevant pagetables are
    among the apparently arbitrary ones that set _PAGE_USER. Rather than
    relying on chance, just explicitly set _PAGE_USER.

    This will let us clean up pagetable setup to stop setting _PAGE_USER. The
    added code can also be reused by pagetable isolation to manage the
    _PAGE_USER bit in the usermode tables.

    [ tglx: Folded paravirt fix from Juergen Gross ]

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit 146122e24bdf208015d629babba673e28d090709 upstream.

    The address hints are a trainwreck. The array entry numbers have to be kept
    magically in sync with the actual hints, which is doomed as some of the
    array members are initialized at runtime via the entry numbers.

    Designated initializers have been around before this code was
    implemented....

    Use the entry numbers to populate the address hints array and add the
    missing bits and pieces. Split 32 and 64 bit for readability's sake.
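
To illustrate why designated initializers remove the sync problem: when each array slot is named by its entry number, reordering the enum cannot silently shift the table, and runtime fix-ups by entry number always hit the intended slot. The sketch below is illustrative only. The enum members, the struct layout and the addresses are assumptions chosen for the example, not the kernel's actual table.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry numbers; every array slot below is named by one of these. */
enum address_markers_idx {
    USER_SPACE_NR = 0,
    KERNEL_SPACE_NR,
    VMALLOC_START_NR,
    END_OF_SPACE_NR,
    NR_MARKERS
};

struct addr_marker {
    uint64_t start_address;
    const char *name;
};

/* Designated initializers tie each entry to its index explicitly. */
static struct addr_marker address_markers[NR_MARKERS] = {
    [USER_SPACE_NR]    = { 0x0ull,                "User Space" },
    [KERNEL_SPACE_NR]  = { 0xffff800000000000ull, "Kernel Space" },
    [VMALLOC_START_NR] = { 0x0ull,                "vmalloc() Area" }, /* set at runtime */
    [END_OF_SPACE_NR]  = { UINT64_MAX,            NULL },
};

/* Runtime fix-up by entry number, as the text describes. */
static void set_marker_start(enum address_markers_idx idx, uint64_t start)
{
    address_markers[idx].start_address = start;
}
```

With positional initializers, inserting a new enum member in the middle would make the runtime fix-up write to the wrong entry. Here the compiler places each initializer by name.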

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit c05344947b37f7cda726e802457370bc6eac4d26 upstream.

    The check for a present page in printk_prot():

        if (!pgprot_val(prot)) {
                /* Not present */

    is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
    the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
    analyzing mapping correctness. Check for the present bit to make an
    informed decision.
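
The difference between the two checks can be shown with a small stand-alone sketch. The bit positions below follow the x86 page-table layout as commonly documented (present is bit 0, accessed is bit 5, and the PROT_NONE marker shares bit 8 with the global bit); the macro and function names are hypothetical and the constants should be treated as assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_PRESENT   (1ull << 0)
#define SK_PAGE_ACCESSED  (1ull << 5)
#define SK_PAGE_GLOBAL    (1ull << 8)
/* Assumption: on non-present entries bit 8 doubles as the PROT_NONE marker. */
#define SK_PAGE_PROTNONE  SK_PAGE_GLOBAL
#define SK_PAGE_NONE      (SK_PAGE_PROTNONE | SK_PAGE_ACCESSED)

/* Old logic: treats an entry as "not present" only if all bits are zero. */
static inline bool old_not_present(uint64_t prot)
{
    return prot == 0;
}

/* New logic: looks at the present bit itself. */
static inline bool new_not_present(uint64_t prot)
{
    return !(prot & SK_PAGE_PRESENT);
}
```

A PAGE_NONE entry is non-zero, so the old test falls through to the decoder, which then reads bit 8 as "GLB". The present-bit test classifies it correctly.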

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 7bbcbd3d1cdcbacd0f9f8dc4c98d550972f1ca30 upstream.

    The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
    and NR_CPUS=512, because the fixmap area becomes too big.

    Limit the number of CPUs with BIGSMP to 64, which is already way too big for
    32-bit, but it's at least a working limitation.

    We performed a quick survey of 32-bit-only machines that might be affected
    by this change negatively, but found none.

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 32d0b95300db03c2b23b2ea2c94769a4a138e79d upstream.

    [note, only the inat.h portion, to get objtool back in sync - gregkh]

    When computing a linear address and segmentation is used, we need to know
    the base address of the segment involved in the computation. In most of
    the cases, the segment base address will be zero as in USER_DS/USER32_DS.
    However, it may be possible that a user space program defines its own
    segments via a local descriptor table. In such a case, the segment base
    address may not be zero. Thus, the segment base address is needed to
    correctly calculate the linear address.

    If running in protected mode, the segment selector to be used when
    computing a linear address is determined either by a segment override
    prefix in the instruction or, failing that, inferred from the registers
    involved in the computation of the effective address, in that order. Also, there are cases
    when the segment override prefixes shall be ignored (i.e., code segments
    are always selected by the CS segment register; string instructions always
    use the ES segment register when using rDI register as operand). In long
    mode, segment registers are ignored, except for FS and GS. In these two
    cases, base addresses are obtained from the respective MSRs.

    For clarity, this process can be split into four steps (and an equal
    number of functions): determine if segment override prefixes can be used;
    parse the segment override prefixes, and use them if found; if not found
    or cannot be used, use the default segment registers associated with the
    operand registers. Once the segment register to use has been identified,
    read its value to obtain the segment selector.

    The method to obtain the segment selector depends on several factors. In
    32-bit builds, segment selectors are saved into a pt_regs structure
    when switching to kernel mode. The same is also true for virtual-8086
    mode. In 64-bit builds, segmentation is mostly ignored, except when
    running a program in 32-bit legacy mode. In this case, CS and SS can be
    obtained from pt_regs. DS, ES, FS and GS can be read directly from
    the respective segment registers.

    In order to identify the segment registers, a new set of #defines is
    introduced. It also includes two special identifiers. One of them
    indicates when the default segment register associated with instruction
    operands shall be used. Another one indicates that the contents of the
    segment register shall be ignored; this identifier is used when in long
    mode.

    Improvements-by: Borislav Petkov
    Signed-off-by: Ricardo Neri
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Cc: "Michael S. Tsirkin"
    Cc: Peter Zijlstra
    Cc: Dave Hansen
    Cc: ricardo.neri@intel.com
    Cc: Adrian Hunter
    Cc: Paul Gortmaker
    Cc: Huang Rui
    Cc: Qiaowei Ren
    Cc: Shuah Khan
    Cc: Kees Cook
    Cc: Jonathan Corbet
    Cc: Jiri Slaby
    Cc: Dmitry Vyukov
    Cc: "Ravi V. Shankar"
    Cc: Chris Metcalf
    Cc: Brian Gerst
    Cc: Arnaldo Carvalho de Melo
    Cc: Andy Lutomirski
    Cc: Colin Ian King
    Cc: Chen Yucong
    Cc: Adam Buchbinder
    Cc: Vlastimil Babka
    Cc: Lorenzo Stoakes
    Cc: Masami Hiramatsu
    Cc: Paolo Bonzini
    Cc: Andrew Morton
    Cc: Thomas Garnier
    Link: https://lkml.kernel.org/r/1509135945-13762-14-git-send-email-ricardo.neri-calderon@linux.intel.com
    Cc: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ricardo Neri
     
  • commit f5b5fab1780c98b74526dbac527574bd02dc16f8 upstream

    Update x86-opcode-map.txt based on the October 2017 Intel SDM publication.
    Fix INVPID to INVVPID.
    Add UD0 and UD1 instruction opcodes.

    Also sync the objtool and perf tooling copies of this file.

    Signed-off-by: Randy Dunlap
    Acked-by: Masami Hiramatsu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Randy Dunlap
     
  • commit 14c47b54b0d9389e3ca0718e805cdd90c5a4303a upstream.

    The new ORC unwinder breaks the build of a 64-bit kernel on a 32-bit
    host. Building the kernel on a i386 or x32 host fails with:

    orc_dump.c: In function 'orc_dump':
    orc_dump.c:105:26: error: passing argument 2 of 'elf_getshdrnum' from incompatible pointer type [-Werror=incompatible-pointer-types]
    if (elf_getshdrnum(elf, &nr_sections)) {
    ^
    In file included from /usr/local/include/gelf.h:32:0,
    from elf.h:22,
    from warn.h:26,
    from orc_dump.c:20:
    /usr/local/include/libelf.h:304:12: note: expected 'size_t * {aka unsigned int *}' but argument is of type 'long unsigned int *'
    extern int elf_getshdrnum (Elf *__elf, size_t *__dst);
    ^~~~~~~~~~~~~~
    orc_dump.c:190:17: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'Elf64_Sxword {aka long long int}' [-Werror=format=]
    printf("%s+%lx:", name, rela.r_addend);
    ~~^ ~~~~~~~~~~~~~
    %llx

    Fix the build failure.

    Another problem is that if the user specifies HOSTCC or HOSTLD
    variables, they are ignored in the objtool makefile. Change the
    Makefile to respect these variables.
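
The format-string half of the failure comes from assuming that a 64-bit ELF addend is a `long`. One portable way to print such a value on both 32-bit and 64-bit hosts is the `<inttypes.h>` macros, sketched below. This is not necessarily how the commit fixes it, and `format_reloc()` is a hypothetical helper. The pointer-type half has the analogous remedy: declare the section count as `size_t`, since that is what the `elf_getshdrnum()` prototype quoted in the error expects.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "name+addend:" into buf.
 * PRIx64 expands to the right length modifier for a 64-bit value
 * regardless of whether 'long' is 32 or 64 bits on the build host.
 * Returns whatever snprintf() returns.
 */
static int format_reloc(char *buf, size_t len, const char *name, int64_t addend)
{
    return snprintf(buf, len, "%s+%" PRIx64 ":", name, (uint64_t)addend);
}
```

Calling `format_reloc(buf, sizeof(buf), "foo", 0x10)` writes `foo+10:` into `buf` on either kind of host.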

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sven Joachim
    Cc: Thomas Gleixner
    Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
    Link: http://lkml.kernel.org/r/19f0e64d8e07e30a7b307cd010eb780c404fe08d.1512252895.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • commit a356d2ae50790f49858ebed35da9e206336fafee upstream.

    objtool grew this new warning:

    Warning: synced file at 'tools/objtool/arch/x86/include/asm/inat.h' differs from latest kernel version at 'arch/x86/include/asm/inat.h'

    because the upstream header grew new INAT_SEG_* definitions.

    Sync up the tooling version of the header.

    Reported-by: Linus Torvalds
    Cc: Josh Poimboeuf
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ingo Molnar
     
  • commit 9eb719855f6c9b21eb5889d9ac2ca1c60527ad89 upstream.

    Stephen Rothwell reported this cross-compilation build failure:

    | In file included from orc_dump.c:19:0:
    | orc.h:21:10: fatal error: asm/orc_types.h: No such file or directory
    | ...

    Caused by:

    6a77cff819ae ("objtool: Move synced files to their original relative locations")

    Use the proper arch header files location, not the host-arch location.

    Bisected-by: Stephen Rothwell
    Reported-by: Stephen Rothwell
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Linux-Next Mailing List
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20171108030152.bd76eahiwjwjt3kp@treble
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Josh Poimboeuf
     
  • commit 3bd51c5a371de917e4e7401c9df006b5998579df upstream.

    Replace the nasty diff checks in the objtool Makefile with a clean bash
    script, and make the warnings more specific.

    Heavily inspired by tools/perf/check-headers.sh.

    Suggested-by: Ingo Molnar
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/ab015f15ccd8c0c6008493c3c6ee3d495eaf2927.1509974346.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Josh Poimboeuf
     
  • commit 6a77cff819ae3e31992bde6432c9b5720748a89b upstream.

    This will enable more straightforward comparisons, and it also makes the
    files 100% identical.

    Suggested-by: Ingo Molnar
    Signed-off-by: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/407b2aaa317741f48fcf821592c0e96ab3be1890.1509974346.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Josh Poimboeuf
     
  • This reverts commit 9704f8147e88213f2fa580f713b42b08a4f1a7d2 which was
    upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a.

    Shouldn't have been here, sorry about that.

    Reported-by: Chris Rankin
    Reported-by: Willy Tarreau
    Cc: Ido Schimmel
    Cc: Ozgur
    Cc: Wei Wang
    Cc: Martin KaFai Lau
    Cc: Eric Dumazet
    Cc: David S. Miller
    Cc: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Dec, 2017

9 commits

  • Greg Kroah-Hartman
     
  • commit d15155824c5014803d91b829736d249c500bdda6 upstream.

    linux/compiler.h is included indirectly by linux/types.h via
    uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
    -> uapi/linux/stddef.h and is needed to provide a proper definition of
    offsetof.

    Unfortunately, compiler.h requires a definition of
    smp_read_barrier_depends() for defining lockless_dereference() and soon
    for defining READ_ONCE(), which means that all
    users of READ_ONCE() will need to include asm/barrier.h to avoid splats
    such as:

    In file included from include/uapi/linux/stddef.h:1:0,
    from include/linux/stddef.h:4,
    from arch/h8300/kernel/asm-offsets.c:11:
    include/linux/list.h: In function 'list_empty':
    >> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
    smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
    ^

    A better alternative is to include asm/barrier.h in linux/compiler.h,
    but this requires a type definition for "bool" on some architectures
    (e.g. x86), which is defined later by linux/types.h. Type "bool" is also
    used directly in linux/compiler.h, so the whole thing is pretty fragile.

    This patch splits compiler.h in two: compiler_types.h contains type
    annotations, definitions and the compiler-specific parts, whereas
    compiler.h #includes compiler_types.h and additionally defines macros
    such as {READ,WRITE,ACCESS}_ONCE().

    uapi/linux/stddef.h and linux/linkage.h are then moved over to include
    linux/compiler_types.h, which fixes the build for h8 and blackfin.

    Signed-off-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • From: Jann Horn

    [ Upstream commit 2255f8d520b0a318fc6d387d0940854b2f522a7f ]

    These tests should cover the following cases:

    - MOV with both zero-extended and sign-extended immediates
    - implicit truncation of register contents via ALU32/MOV32
    - implicit 32-bit truncation of ALU32 output
    - oversized register source operand for ALU32 shift
    - right-shift of a number that could be positive or negative
    - map access where adding the operation size to the offset causes signed
    32-bit overflow
    - direct stack access at a ~4GiB offset

    Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
    that should fail independent of what flags userspace passes.

    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Alexei Starovoitov

    [ Upstream commit bb7f0f989ca7de1153bd128a40a71709e339fa03 ]

    There were various issues related to the limited size of integers used in
    the verifier:
    - `off + size` overflow in __check_map_access()
    - `off + reg->off` overflow in check_mem_access()
    - `off + reg->var_off.value` overflow or 32-bit truncation of
      `reg->var_off.value` in check_mem_access()
    - 32-bit truncation in check_stack_boundary()

    Make sure that any integer math cannot overflow by not allowing
    pointer math with large values.
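
    The class of bug listed above can be illustrated with a minimal sketch.
    This is not the verifier code: the helper names and the SKETCH_MAX_OFF
    limit are assumptions made for illustration only. The first helper does
    the `off + size` math in 32 bits and is fooled by wrap-around; the second
    refuses large operands before doing the math in 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limit on operand magnitude; an assumption for this sketch. */
#define SKETCH_MAX_OFF (1 << 29)

/* Hypothetical helper: 32-bit math, so off + size wraps modulo 2^32. */
static bool in_bounds_wrapping(uint32_t off, uint32_t size, uint32_t limit)
{
    return off + size <= limit;
}

/*
 * Hypothetical helper: reject large operands first, then add in 64 bits.
 * With both operands below SKETCH_MAX_OFF the sum cannot overflow.
 */
static bool in_bounds_checked(int64_t off, int64_t size, uint32_t limit)
{
    if (off < 0 || size <= 0)
        return false;
    if (off >= SKETCH_MAX_OFF || size >= SKETCH_MAX_OFF)
        return false;
    return off + size <= (int64_t)limit;
}

/* Usage: a huge offset slips past the wrapping check but not the other. */
void bounds_examples(void)
{
    assert(in_bounds_wrapping(0xfffffff0u, 0x20u, 64u));
    assert(!in_bounds_checked(0xfffffff0LL, 0x20, 64u));
    assert(in_bounds_checked(8, 8, 64u));
}
```

    Rejecting large operands up front keeps every later addition far away
    from the overflow boundary, so individual call sites need no special
    overflow handling.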

    Also reduce the scope of "scalar op scalar" tracking.

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Reported-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    [ Upstream commit 179d1c5602997fef5a940c6ddcf31212cbfebd14 ]

    This could be made safe by passing through a reference to env and checking
    for env->allow_ptr_leaks, but it would only work one way and is probably
    not worth the hassle - not doing it will not directly lead to program
    rejection.

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    [ Upstream commit a5ec6ae161d72f01411169a938fa5f8baea16e8f ]

    Force strict alignment checks for stack pointers because the tracking of
    stack spills relies on it; unaligned stack accesses can lead to corruption
    of spilled registers, which is exploitable.

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    Prevent indirect stack accesses at non-constant addresses, which would
    permit reading and corrupting spilled pointers.

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    [ Upstream commit 468f6eafa6c44cb2c5d8aad35e12f06c240a812a ]

    32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
    Adjust the verifier accordingly.
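
    As a rough illustration of the semantics the verifier has to model, the
    sketch below contrasts a 64-bit add with a 32-bit add whose result is
    zero-extended. The function names are hypothetical and are not taken
    from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit ALU add: full-width operands and result. */
static uint64_t sketch_alu64_add(uint64_t dst, uint64_t src)
{
    return dst + src;
}

/*
 * 32-bit ALU add: only the low 32 bits of each operand take part, and the
 * 32-bit result is zero-extended into the 64-bit register.
 */
static uint64_t sketch_alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* Usage: the same inputs give different results in the two widths. */
void alu_examples(void)
{
    assert(sketch_alu64_add(0xffffffffULL, 1) == 0x100000000ULL);
    assert(sketch_alu32_add(0xffffffffULL, 1) == 0);
    assert(sketch_alu32_add(0x100000005ULL, 2) == 7);
}
```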

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    [ Upstream commit 0c17d1d2c61936401f4702e1846e2c19b200f958 ]

    Properly handle register truncation to a smaller size.

    The old code first mirrors the clearing of the high 32 bits in the bitwise
    tristate representation, which is correct. But then, it computes the new
    arithmetic bounds as the intersection between the old arithmetic bounds and
    the bounds resulting from the bitwise tristate representation. Therefore,
    when coerce_reg_to_32() is called on a number with bounds
    [0xffff'fff8, 0x1'0000'0007], the verifier computes
    [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
    This is incorrect: The truncated number could also be in the range [0, 7],
    and no meaningful arithmetic bounds can be computed in that case apart from
    the obvious [0, 0xffff'ffff].
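
    The example above can be reproduced with a small sketch. The struct and
    function names are hypothetical; the first function models the flawed
    intersection described here, and the second shows one sound way to
    truncate unsigned bounds. It is an illustration, not the patched
    kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit bounds of a tracked value (hypothetical type). */
struct sketch_range {
    uint64_t min;
    uint64_t max;
};

/* Flawed: intersect the old bounds with [0, 0xffffffff]. */
static struct sketch_range truncate_by_intersection(struct sketch_range r)
{
    if (r.min > UINT32_MAX)
        r.min = UINT32_MAX;
    if (r.max > UINT32_MAX)
        r.max = UINT32_MAX;
    return r;
}

/*
 * Sound: if min and max share their upper 32 bits, every value in between
 * does too, so the low halves still bound the truncated value. Otherwise
 * nothing better than [0, 0xffffffff] is known.
 */
static struct sketch_range truncate_sound(struct sketch_range r)
{
    struct sketch_range out = { 0, UINT32_MAX };

    if ((r.min >> 32) == (r.max >> 32)) {
        out.min = (uint32_t)r.min;
        out.max = (uint32_t)r.max;
    }
    return out;
}

/* Usage: 0x100000003 lies in the input range but truncates to 3. */
void truncation_examples(void)
{
    struct sketch_range in = { 0xfffffff8ULL, 0x100000007ULL };
    struct sketch_range bad = truncate_by_intersection(in);
    struct sketch_range good = truncate_sound(in);
    uint64_t v = (uint32_t)0x100000003ULL;

    assert(v < bad.min);                      /* escapes the flawed bounds */
    assert(v >= good.min && v <= good.max);   /* covered by the sound ones */
}
```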

    Starting with v4.14, this is exploitable by unprivileged users as long as
    the unprivileged_bpf_disabled sysctl isn't set.

    Debian assigned CVE-2017-16996 for this issue.

    v2:
    - flip the mask during arithmetic bounds calculation (Ben Hutchings)
    v3:
    - add CVE number (Ben Hutchings)

    Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
    Signed-off-by: Jann Horn
    Acked-by: Edward Cree
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann