13 Jan, 2019

2 commits

  • [ Upstream commit 254eb5505ca0ca749d3a491fc6668b6c16647a99 ]

    The LDT remap placement has been changed. It's now placed before the direct
    mapping in the kernel virtual address space for both paging modes.

    Change address markers order accordingly.

    Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: dave.hansen@linux.intel.com
    Cc: luto@kernel.org
    Cc: peterz@infradead.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: bhe@redhat.com
    Cc: hans.van.kranenburg@mendix.com
    Cc: linux-mm@kvack.org
    Cc: xen-devel@lists.xenproject.org
    Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Sasha Levin

    Kirill A. Shutemov
     
  • [ Upstream commit 16877a5570e0c5f4270d5b17f9bab427bcae9514 ]

    There is a guard hole at the beginning of the kernel address space, also
    used by hypervisors. It occupies 16 PGD entries.

    This reserved range is not defined explicitly; it is calculated relative
    to other entities: the direct mapping and the user space ranges.

    The calculation was broken by recent changes to the kernel memory layout:
    the LDT remap range is now mapped before the direct mapping, which makes
    the calculation invalid.

    The breakage leads to a crash on Xen dom0 boot [1].

    Define the reserved range explicitly. It's part of the kernel ABI
    (hypervisors expect it to be stable) and must not depend on changes in the
    rest of the kernel memory layout.

    [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html
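
    A minimal standalone sketch of defining the range explicitly, assuming
    4-level paging (PGDIR_SHIFT = 39); the macro names are illustrative
    stand-ins rather than the exact definitions added by the patch:

    #include <stdio.h>

    #define PGDIR_SHIFT          39                     /* 4-level paging */
    #define GUARD_HOLE_PGD_ENTRY (-256UL)               /* first PGD slot of the kernel half */
    #define GUARD_HOLE_SIZE      (16UL << PGDIR_SHIFT)  /* 16 PGD entries */
    #define GUARD_HOLE_BASE_ADDR (GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT)
    #define GUARD_HOLE_END_ADDR  (GUARD_HOLE_BASE_ADDR + GUARD_HOLE_SIZE)

    int main(void)
    {
            /* prints 0xffff800000000000 - 0xffff880000000000 on x86-64 */
            printf("guard hole: 0x%016lx - 0x%016lx\n",
                   GUARD_HOLE_BASE_ADDR, GUARD_HOLE_END_ADDR);
            return 0;
    }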

    Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
    Reported-by: Hans van Kranenburg
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Tested-by: Hans van Kranenburg
    Reviewed-by: Juergen Gross
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: dave.hansen@linux.intel.com
    Cc: luto@kernel.org
    Cc: peterz@infradead.org
    Cc: boris.ostrovsky@oracle.com
    Cc: bhe@redhat.com
    Cc: linux-mm@kvack.org
    Cc: xen-devel@lists.xenproject.org
    Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Sasha Levin

    Kirill A. Shutemov
     

10 Jan, 2019

2 commits

  • commit ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d upstream.

    Commit:

    f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

    addressed a case where __flush_tlb_all() is called without preemption
    being disabled. It also left a warning to catch other cases where
    preemption is not disabled.

    That warning triggers for the memory hotplug path which is also used for
    persistent memory enabling:

    WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
    RIP: 0010:__flush_tlb_all+0x1b/0x3a
    [..]
    Call Trace:
    phys_pud_init+0x29c/0x2bb
    kernel_physical_mapping_init+0xfc/0x219
    init_memory_mapping+0x1a5/0x3b0
    arch_add_memory+0x2c/0x50
    devm_memremap_pages+0x3aa/0x610
    pmem_attach_disk+0x585/0x700 [nd_pmem]

    Andy wondered why a path that can sleep was using __flush_tlb_all() [1],
    and Dave confirmed that a TLB flush is expected when modifying /
    invalidating existing PTE entries, but not for initial population [2].
    Drop the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
    expectation that this path only ever populates empty entries for the
    linear map. Note that at linear map teardown time there is a call to the
    all-CPU flush_tlb_all() to invalidate the removed mappings.

    [1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
    [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com

    [ mingo: Minor readability edits. ]

    Suggested-by: Dave Hansen
    Reported-by: Andy Lutomirski
    Signed-off-by: Dan Williams
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Kirill A. Shutemov
    Cc:
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Sebastian Andrzej Siewior
    Cc: Thomas Gleixner
    Cc: dave.hansen@intel.com
    Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
    Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • commit 5b5e4d623ec8a34689df98e42d038a3b594d2ff9 upstream.

    Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
    the system is deemed affected by the L1TF vulnerability. Even though the limit
    is quite high for most deployments it seems to be too restrictive for
    deployments which are willing to live with the mitigation disabled.

    We have a customer deploying 8x 6.4TB PCIe/NVMe SSD swap devices, which is
    clearly over the limit.

    Drop the swap restriction when l1tf=off is specified. It also doesn't make
    much sense to warn about too much memory for the l1tf mitigation when it is
    forcefully disabled by the administrator.
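
    A minimal standalone sketch of the resulting decision (types, names, and
    numbers are stand-ins, not the kernel code): the clamp is applied only when
    the CPU is affected by L1TF and the mitigation has not been explicitly
    disabled.

    #include <stdbool.h>
    #include <stdio.h>

    enum l1tf_mitigations { L1TF_MITIGATION_OFF, L1TF_MITIGATION_DEFAULT };

    static unsigned long long swap_limit_pages(bool cpu_has_l1tf_bug,
                                               enum l1tf_mitigations l1tf_mitigation,
                                               unsigned long long generic_limit,
                                               unsigned long long l1tf_limit)
    {
            /* l1tf=off: the administrator opted out, so do not clamp or warn */
            if (cpu_has_l1tf_bug && l1tf_mitigation != L1TF_MITIGATION_OFF &&
                l1tf_limit < generic_limit)
                    return l1tf_limit;
            return generic_limit;
    }

    int main(void)
    {
            unsigned long long generic = 64ULL << 28;   /* arbitrary generic limit (pages) */
            unsigned long long l1tf    = 16ULL << 28;   /* ~MAX_PA/2 derived limit (pages) */

            printf("default:  %llu pages\n",
                   swap_limit_pages(true, L1TF_MITIGATION_DEFAULT, generic, l1tf));
            printf("l1tf=off: %llu pages\n",
                   swap_limit_pages(true, L1TF_MITIGATION_OFF, generic, l1tf));
            return 0;
    }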

    [ tglx: Folded the documentation delta change ]

    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Michal Hocko
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Andi Kleen
    Acked-by: Jiri Kosina
    Cc: Linus Torvalds
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc:
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     

06 Dec, 2018

2 commits

  • commit 4c71a2b6fd7e42814aa68a6dec88abf3b42ea573 upstream

    The IBPB speculation barrier is issued from switch_mm() when the kernel
    switches to a user space task with a different mm than the user space task
    which ran last on the same CPU.

    An additional optimization is to avoid IBPB when the incoming task can be
    ptraced by the outgoing task. This optimization only works when switching
    directly between two user space tasks. When switching from a kernel task to
    a user space task the optimization fails because the previous task cannot
    be accessed anymore. So in quite a few scenarios the optimization just
    adds overhead.

    The upcoming conditional IBPB support will issue IBPB only for user space
    tasks which have the TIF_SPEC_IB bit set. This requires handling the
    following cases:

    1) Switch from a user space task (potential attacker) which has
    TIF_SPEC_IB set to a user space task (potential victim) which has
    TIF_SPEC_IB not set.

    2) Switch from a user space task (potential attacker) which has
    TIF_SPEC_IB not set to a user space task (potential victim) which has
    TIF_SPEC_IB set.

    This needs to be optimized for the case where the IBPB can be avoided when
    only kernel threads ran in between user space tasks which belong to the
    same process.

    The current check of whether two tasks belong to the same context uses the
    tasks' context id. While correct, it's simpler to use the mm pointer
    because it allows mangling the TIF_SPEC_IB bit into it. The context id
    based mechanism requires extra storage, which creates worse code.

    When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into
    the per CPU storage which is used to track the last user space mm which was
    running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of
    the incoming task to make the decision whether IBPB needs to be issued or
    not to cover the two cases above.

    As conditional IBPB is going to be the default, remove the dubious ptrace
    check for the IBPB always case and simply issue IBPB always when the
    process changes.

    Move the storage to a different place in the struct as the original one
    created a hole.
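
    A standalone sketch of the bit mangling and the resulting decision (the
    structure name, helper names, and the exact condition are illustrative;
    mm pointers are word aligned, so bit 0 is free):

    #include <stdint.h>
    #include <stdio.h>

    #define LAST_USER_MM_IBPB 0x1UL          /* TIF_SPEC_IB folded into bit 0 */

    struct mm_struct { long dummy; };

    /* store the outgoing task's TIF_SPEC_IB state in bit 0 of its mm pointer */
    static uintptr_t mangle_mm(struct mm_struct *mm, int tif_spec_ib)
    {
            return (uintptr_t)mm | (tif_spec_ib ? LAST_USER_MM_IBPB : 0);
    }

    /* barrier needed when the mangled value changes and either side set the bit */
    static int ibpb_needed(uintptr_t prev_mangled, uintptr_t next_mangled)
    {
            return prev_mangled != next_mangled &&
                   ((prev_mangled | next_mangled) & LAST_USER_MM_IBPB);
    }

    int main(void)
    {
            struct mm_struct attacker, victim;

            printf("%d\n", ibpb_needed(mangle_mm(&attacker, 1), mangle_mm(&victim, 0))); /* 1 */
            printf("%d\n", ibpb_needed(mangle_mm(&victim, 0), mangle_mm(&victim, 0)));   /* 0 */
            return 0;
    }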

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream

    Currently, IBPB is only issued in cases when switching into a non-dumpable
    process, the rationale being to protect such 'important and security
    sensitive' processes (such as GPG) from data leaking into a different
    userspace process via spectre v2.

    This is however completely insufficient to provide proper userspace-to-userspace
    spectrev2 protection, as any process can poison branch buffers before being
    scheduled out, and the newly scheduled process immediately becomes a spectrev2
    victim.

    In order to minimize the performance impact (for use cases that do require
    spectrev2 protection), issue the barrier only in cases when switching between
    processes where the victim can't be ptraced by the potential attacker (as in
    such cases, the attacker doesn't have to bother with branch buffers at all).

    [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
    PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
    fine-grained ]

    Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
    Originally-by: Tim Chen
    Signed-off-by: Jiri Kosina
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: "WoodhouseDavid"
    Cc: Andi Kleen
    Cc: "SchauflerCasey"
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman

    Jiri Kosina
     

14 Nov, 2018

1 commit

  • commit f77084d96355f5fba8e2c1fb3a51a393b1570de7 upstream.

    The WARN_ON_ONCE(__read_cr3() != build_cr3()) in switch_mm_irqs_off()
    triggers every once in a while during a snapshotted system upgrade.

    The warning triggers since commit decab0888e6e ("x86/mm: Remove
    preempt_disable/enable() from __native_flush_tlb()"). The callchain is:

    get_page_from_freelist() -> post_alloc_hook() -> __kernel_map_pages()

    with CONFIG_DEBUG_PAGEALLOC enabled.

    Disable preemption during CR3 reset / __flush_tlb_all() and add a comment
    explaining why preemption has to be disabled, so it won't be removed
    accidentally.

    Add another preemptible() check in __flush_tlb_all() to catch callers with
    preemption enabled when PGE is enabled, because the PGE-enabled path does
    not trigger the warning in __native_flush_tlb(). Suggested by Andy
    Lutomirski.

    Fixes: decab0888e6e ("x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()")
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181017103432.zgv46nlu3hc7k4rq@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Andrzej Siewior
     

04 Oct, 2018

3 commits

  • [ Upstream commit ff924c5a1ec7548825cc2d07980b03be4224ffac ]

    Fix the section mismatch warning in arch/x86/mm/pti.c:

    WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte()
    The function pti_clone_pgtable() references
    the function __init pti_user_pagetable_walk_pte().
    This is often because pti_clone_pgtable lacks a __init
    annotation or the annotation of pti_user_pagetable_walk_pte is wrong.
    FATAL: modpost: Section mismatches detected.

    Fixes: 85900ea51577 ("x86/pti: Map the vsyscall page if needed")
    Reported-by: kbuild test robot
    Signed-off-by: Randy Dunlap
    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Randy Dunlap
     
  • commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.

    We hit a kernel panic when enabling earlycon, because the fixmap address of
    earlycon is not statically set up.

    Currently the static fixmap setup in head_64.S only covers 2M of virtual
    address space, while the fixmap could actually fall within a 4M space with
    different kernel configurations, e.g. when VSYSCALL emulation is disabled.

    So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
    and add a build time check to ensure that the fixmap is covered by the
    initial static page tables.
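
    A standalone sketch of the sizing and a build-time style check (the
    2M-per-PMD constant is the usual x86-64 value; the assertion stands in for
    the kernel's BUILD_BUG_ON-style check):

    #include <stdio.h>

    #define PMD_SIZE        (2UL << 20)   /* 2M mapped per PMD on x86-64 */
    #define FIXMAP_PMD_NUM  2             /* statically mapped PMDs for the fixmap */

    /* stand-in for a build-time assertion in the kernel */
    _Static_assert(FIXMAP_PMD_NUM * PMD_SIZE >= (4UL << 20),
                   "early fixmap not covered by the initial static page tables");

    int main(void)
    {
            printf("early fixmap coverage: %lu MiB\n",
                   (FIXMAP_PMD_NUM * PMD_SIZE) >> 20);
            return 0;
    }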

    Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable")
    Suggested-by: Thomas Gleixner
    Signed-off-by: Feng Tang
    Signed-off-by: Thomas Gleixner
    Tested-by: kernel test robot
    Reviewed-by: Juergen Gross (Xen parts)
    Cc: H Peter Anvin
    Cc: Peter Zijlstra
    Cc: Michal Hocko
    Cc: Yinghai Lu
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Andy Lutomirski
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Feng Tang
     
  • [ Upstream commit 3b6c62f363a19ce82bf378187ab97c9dc01e3927 ]

    Without this change the distance table calculation for emulated nodes
    may use the wrong numa node and report an incorrect distance.

    Signed-off-by: Dan Williams
    Cc: David Rientjes
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/153089328103.27680.14778434392225818887.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

26 Sep, 2018

3 commits

  • [ Upstream commit 935232ce28dfabff1171e5a7113b2d865fa9ee63 ]

    The addr counter will overflow if the last PMD of the address space is
    cloned, resulting in an endless loop.

    Check for that and bail out of the loop when it happens.
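
    A standalone illustration of the wrap-around and the guard (constants and
    the loop shape are stand-ins; the real code walks page-table entries):

    #include <stdio.h>

    #define PMD_SIZE (2UL << 20)

    int main(void)
    {
            unsigned long addr = 0UL - PMD_SIZE;   /* start of the last PMD */
            unsigned long next = addr + PMD_SIZE;  /* wraps around to 0 */

            /* without this check, "addr < end" style loops never terminate */
            if (next < addr) {
                    printf("wrap-around at 0x%lx, bailing out\n", addr);
                    return 0;
            }
            printf("next PMD at 0x%lx\n", next);
            return 0;
    }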

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Tested-by: Pavel Machek
    Cc: "H . Peter Anvin"
    Cc: linux-mm@kvack.org
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Jiri Kosina
    Cc: Boris Ostrovsky
    Cc: Brian Gerst
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: Andrea Arcangeli
    Cc: Waiman Long
    Cc: "David H . Gutteridge"
    Cc: joro@8bytes.org
    Link: https://lkml.kernel.org/r/1531906876-13451-25-git-send-email-joro@8bytes.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     
  • [ Upstream commit 8c934e01a7ce685d98e970880f5941d79272c654 ]

    pti_user_pagetable_walk_pmd() can return NULL, so the return value should
    be checked to prevent a NULL pointer dereference.

    Add the check and a warning when the PMD allocation fails.

    Signed-off-by: Jiang Biao
    Signed-off-by: Thomas Gleixner
    Cc: dave.hansen@linux.intel.com
    Cc: luto@kernel.org
    Cc: hpa@zytor.com
    Cc: albcamus@gmail.com
    Cc: zhong.weidong@zte.com.cn
    Link: https://lkml.kernel.org/r/1532045192-49622-2-git-send-email-jiang.biao2@zte.com.cn
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jiang Biao
     
  • [ Upstream commit b2b7d986a89b6c94b1331a909de1217214fb08c1 ]

    pti_user_pagetable_walk_p4d() can return NULL, so the return value should
    be checked to prevent a NULL pointer dereference.

    Add the check and a warning when the P4D allocation fails.

    Signed-off-by: Jiang Biao
    Signed-off-by: Thomas Gleixner
    Cc: dave.hansen@linux.intel.com
    Cc: luto@kernel.org
    Cc: hpa@zytor.com
    Cc: albcamus@gmail.com
    Cc: zhong.weidong@zte.com.cn
    Link: https://lkml.kernel.org/r/1532045192-49622-1-git-send-email-jiang.biao2@zte.com.cn
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jiang Biao
     

20 Sep, 2018

1 commit

  • [ Upstream commit 6863ea0cda8725072522cd78bda332d9a0b73150 ]

    It is perfectly okay to take page-faults, especially on the
    vmalloc area while executing an NMI handler. Remove the
    warning.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Tested-by: David H. Gutteridge
    Cc: "H . Peter Anvin"
    Cc: linux-mm@kvack.org
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Jiri Kosina
    Cc: Boris Ostrovsky
    Cc: Brian Gerst
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: Andrea Arcangeli
    Cc: Waiman Long
    Cc: Pavel Machek
    Cc: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: joro@8bytes.org
    Link: https://lkml.kernel.org/r/1532533683-5988-2-git-send-email-joro@8bytes.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     

05 Sep, 2018

3 commits

  • commit 4012e77a903d114f915fc607d6d2ed54a3d6c9b1 upstream.

    A NMI can hit in the middle of context switching or in the middle of
    switch_mm_irqs_off(). In either case, CR3 might not match current->mm,
    which could cause copy_from_user_nmi() and friends to read the wrong
    memory.

    Fix it by adding a new nmi_uaccess_okay() helper and checking it in
    copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Rik van Riel
    Cc: Nadav Amit
    Cc: Borislav Petkov
    Cc: Jann Horn
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit b0a182f875689647b014bc01d36b340217792852 upstream.

    Two users have reported [1] that they have an "extremely unlikely" system
    with more than MAX_PA/2 memory and L1TF mitigation is not effective. In
    fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to
    holes in the e820 map, the main region is almost 500MB over the 32GB limit:

    [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable

    Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the
    500MB revealed that there's an off-by-one error in the check in
    l1tf_select_mitigation().

    l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range
    check in the mitigation path does not take this into account.

    Instead of amending the range check, make l1tf_pfn_limit() return the first
    PFN which is over the limit, which is less error prone. Adjust the other
    users accordingly.

    [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
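
    A standalone illustration of the two return-value semantics (the helper
    names and the 36-bit physical limit are assumptions for the example): the
    new variant equals the old value plus one, so range checks can use a plain
    >= without an implicit +1.

    #include <stdio.h>

    #define PAGE_SHIFT 12

    /* old semantics: last usable pfn, inclusive */
    static unsigned long long last_usable_pfn(int pa_bits)
    {
            return (1ULL << (pa_bits - 1 - PAGE_SHIFT)) - 1;
    }

    /* new semantics: first pfn over the limit, exclusive */
    static unsigned long long pfn_limit(int pa_bits)
    {
            return 1ULL << (pa_bits - 1 - PAGE_SHIFT);
    }

    int main(void)
    {
            int pa_bits = 36;                             /* 64GB physical limit */
            unsigned long long pfn = pfn_limit(pa_bits);  /* first pfn of the upper half */

            printf("old: pfn 0x%llx over the limit: %d (check must remember the +1)\n",
                   pfn, pfn > last_usable_pfn(pa_bits));
            printf("new: pfn 0x%llx over the limit: %d\n",
                   pfn, pfn >= pfn_limit(pa_bits));
            return 0;
    }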

    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Reported-by: George Anchev
    Reported-by: Christopher Snowhill
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Cc: "H . Peter Anvin"
    Cc: Linus Torvalds
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Michal Hocko
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     
  • commit 9df9516940a61d29aedf4d91b483ca6597e7d480 upstream.

    On 32bit PAE kernels on 64bit hardware with enough physical bits,
    l1tf_pfn_limit() will overflow unsigned long. This in turn affects
    max_swapfile_size() and can lead to swapon returning -EINVAL. This has been
    observed in a 32bit guest with 42 bits physical address size, where
    max_swapfile_size() overflows exactly to 1 << 32, thus zero, and produces
    the following warning to dmesg:

    [ 6.396845] Truncating oversized swap area, only using 0k out of 2047996k

    Fix this by using unsigned long long instead.
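
    A standalone illustration of the truncation, assuming the 42-bit physical
    address space from the report; the 3-bit shift stands in for the
    swap-offset headroom (offsets start at bit 9, not 12), and the fixed-width
    types emulate a 32-bit vs. a 64-bit 'unsigned long':

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    int main(void)
    {
            int pa_bits = 42;
            int extra_shift = PAGE_SHIFT - 9;   /* swap offsets start at bit 9 */

            /* 32-bit "unsigned long": 2^29 << 3 wraps to exactly 1 << 32 == 0 */
            uint32_t limit32 = (uint32_t)1 << (pa_bits - 1 - PAGE_SHIFT);
            uint32_t pages32 = limit32 << extra_shift;

            /* unsigned long long keeps the value intact */
            uint64_t limit64 = (uint64_t)1 << (pa_bits - 1 - PAGE_SHIFT);
            uint64_t pages64 = limit64 << extra_shift;

            printf("32-bit arithmetic: %u swap pages allowed\n", (unsigned)pages32);
            printf("64-bit arithmetic: %llu swap pages allowed\n",
                   (unsigned long long)pages64);
            return 0;
    }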

    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Reported-by: Dominique Leuenberger
    Reported-by: Adrian Schroeter
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Acked-by: Andi Kleen
    Acked-by: Michal Hocko
    Cc: "H . Peter Anvin"
    Cc: Linus Torvalds
    Cc: Dave Hansen
    Cc: Michal Hocko
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180820095835.5298-1-vbabka@suse.cz
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     

18 Aug, 2018

3 commits

  • commit 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e upstream.

    ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
    a pud / pmd map. The following preconditions are met at their entry.
    - All pte entries for a target pud/pmd address range have been cleared.
    - System-wide TLB purges have been performed for a target pud/pmd address
    range.

    The preconditions assure that there is no stale TLB entry for the range.
    Speculation may not cache TLB entries since it requires all levels of page
    entries, including ptes, to have the P and A bits set for an associated
    address. However, speculation may cache pud/pmd entries (paging-structure
    caches) when they have the P bit set.

    Add a system-wide TLB purge (INVLPG) to a single page after clearing
    a pud/pmd entry's P-bit.

    SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
    states that:
    INVLPG invalidates all paging-structure caches associated with the
    current PCID regardless of the linear addresses to which they correspond.

    Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
    Signed-off-by: Toshi Kani
    Signed-off-by: Thomas Gleixner
    Cc: mhocko@suse.com
    Cc: akpm@linux-foundation.org
    Cc: hpa@zytor.com
    Cc: cpandya@codeaurora.org
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Joerg Roedel
    Cc: stable@vger.kernel.org
    Cc: Andrew Morton
    Cc: Michal Hocko
    Cc: "H. Peter Anvin"
    Cc:
    Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     
  • commit 785a19f9d1dd8a4ab2d0633be4656653bd3de1fc upstream.

    The following kernel panic was observed on ARM64 platform due to a stale
    TLB entry.

    1. ioremap with 4K size, a valid pte page table is set.
    2. iounmap it, its pte entry is set to 0.
    3. ioremap the same address with 2M size, update its pmd entry with
    a new value.
    4. CPU may hit an exception because the old pmd entry is still in TLB,
    which leads to a kernel panic.

    Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
    table") has addressed this panic by falling back to pte mappings in the
    above case on ARM64.

    To support pmd mappings in all cases, TLB purge needs to be performed
    in this case on ARM64.

    Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
    so that a TLB purge can be added later in separate patches.

    [toshi.kani@hpe.com: merge changes, rewrite patch description]
    Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
    Signed-off-by: Chintan Pandya
    Signed-off-by: Toshi Kani
    Signed-off-by: Thomas Gleixner
    Cc: mhocko@suse.com
    Cc: akpm@linux-foundation.org
    Cc: hpa@zytor.com
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Will Deacon
    Cc: Joerg Roedel
    Cc: stable@vger.kernel.org
    Cc: Andrew Morton
    Cc: Michal Hocko
    Cc: "H. Peter Anvin"
    Cc:
    Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
    Signed-off-by: Greg Kroah-Hartman

    Chintan Pandya
     
  • commit f967db0b9ed44ec3057a28f3b28efc51df51b835 upstream.

    ioremap() supports pmd mappings on x86-PAE. However, kernel's pmd
    tables are not shared among processes on x86-PAE. Therefore, any
    update to sync'd pmd entries needs re-syncing. Freeing a pte page
    also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

    Disable free page handling on x86-PAE. pud_free_pmd_page() and
    pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
    This assures that ioremap() does not update sync'd pmd entries at the
    cost of falling back to pte mappings.

    Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
    Reported-by: Joerg Roedel
    Signed-off-by: Toshi Kani
    Signed-off-by: Thomas Gleixner
    Cc: mhocko@suse.com
    Cc: akpm@linux-foundation.org
    Cc: hpa@zytor.com
    Cc: cpandya@codeaurora.org
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: stable@vger.kernel.org
    Cc: Andrew Morton
    Cc: Michal Hocko
    Cc: "H. Peter Anvin"
    Cc:
    Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.kani@hpe.com
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     

16 Aug, 2018

8 commits

  • commit 792adb90fa724ce07c0171cbc96b9215af4b1045 upstream.

    The introduction of generic_max_swapfile_size and arch-specific versions has
    broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
    'generic_max_swapfile_size'. Fix it by compiling the x86-specific
    max_swapfile_size() only with CONFIG_SWAP=y.
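
    A minimal standalone mock of the shape of the fix; CONFIG_SWAP is toggled
    by hand here and the function bodies are placeholders, not the kernel's:

    #include <stdio.h>

    #define CONFIG_SWAP 1   /* comment out to emulate CONFIG_SWAP=n */

    #ifdef CONFIG_SWAP
    static unsigned long generic_max_swapfile_size(void)
    {
            return 1UL << 20;   /* placeholder value */
    }

    /* the arch override only exists when its generic counterpart does */
    static unsigned long max_swapfile_size(void)
    {
            return generic_max_swapfile_size();   /* would clamp for L1TF here */
    }
    #endif

    int main(void)
    {
    #ifdef CONFIG_SWAP
            printf("swap limit: %lu pages\n", max_swapfile_size());
    #else
            puts("no swap support built in");
    #endif
            return 0;
    }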

    Reported-by: Tomas Pruzina
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     
  • commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream

    The mmio tracer sets io mapping PTEs and PMDs to non-present when enabled
    without inverting the address bits, which makes the PTE entry vulnerable
    to L1TF.

    Make it use the right low level macros to actually invert the address bits
    to protect against L1TF.

    In principle this could be avoided because MMIO tracing is not likely to be
    enabled on production machines, but the fix is straightforward and for
    consistency's sake it's better to get rid of the open coded PTE manipulation.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     
  • commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream

    set_memory_np() is used to mark kernel mappings not present, but it has
    its own open coded mechanism which does not have the L1TF protection of
    inverting the address bits.

    Replace the open coded PTE manipulation with the L1TF protecting low level
    PTE routines.

    Passes the CPA self test.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     
  • commit 447ae316670230d7d29430e2cbf1f5db4f49d14c upstream

    The next patch in this series will have to make the definition of
    irq_cpustat_t available to entering_irq().

    Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
    dependencies like

    asm/smp.h
    asm/apic.h
    asm/hardirq.h
    linux/irq.h
    linux/topology.h
    linux/smp.h
    asm/smp.h

    or

    linux/gfp.h
    linux/mmzone.h
    asm/mmzone.h
    asm/mmzone_64.h
    asm/smp.h
    asm/apic.h
    asm/hardirq.h
    linux/irq.h
    linux/irqdesc.h
    linux/kobject.h
    linux/sysfs.h
    linux/kernfs.h
    linux/idr.h
    linux/gfp.h

    and others.

    This causes compilation errors because of the header guards becoming
    effective in the second inclusion: symbols/macros that had been defined
    before wouldn't be available to intermediate headers in the #include chain
    anymore.

    A possible workaround would be to move the definition of irq_cpustat_t
    into its own header and include that from both, asm/hardirq.h and
    asm/apic.h.

    However, this wouldn't solve the real problem, namely asm/hardirq.h
    unnecessarily pulling in all the linux/irq.h cruft: nothing in
    asm/hardirq.h itself requires it. Also, note that there are some other
    archs, like e.g. arm64, which don't have that #include in their
    asm/hardirq.h.

    Remove the linux/irq.h #include from x86' asm/hardirq.h.

    Fix resulting compilation errors by adding appropriate #includes to *.c
    files as needed.

    Note that some of these *.c files could be cleaned up a bit with respect to
    their set of #includes, but that is better done in separate patches, if
    at all.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream

    The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
    offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
    offset. The lower word is zeroed, thus systems with less than 4GB memory are
    safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
    to L1TF; with even more memory, the swap offset also influences the address.
    This might be a problem with 32bit PAE guests running on large 64bit hosts.

    By continuing to keep the whole swap entry in either the high or the low
    32bit word of the PTE we would limit the swap size too much. Thus this
    patch uses the whole PAE PTE with the same layout as the 64bit version
    does. The macros just become a bit tricky since they assume the
    arch-dependent swp_entry_t to be 32bit.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     
  • commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream

    The previous patch has limited swap file size so that large offsets cannot
    clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.

    It assumed that offsets are encoded starting with bit 12, same as pfn. But
    on x86_64, offsets are encoded starting with bit 9.

    Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
    and 256TB with 46bit MAX_PA.
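
    The numbers from the message, worked out as a standalone example (MAX_PA/2
    in bytes, then raised by the 3 spare offset bits):

    #include <stdio.h>

    int main(void)
    {
            int max_pa_bits[] = { 42, 46 };

            for (int i = 0; i < 2; i++) {
                    unsigned long long old_limit = 1ULL << (max_pa_bits[i] - 1); /* MAX_PA/2 */
                    unsigned long long new_limit = old_limit << 3;               /* 3 more bits */

                    printf("MAX_PA = %d bits: %4llu TB -> %4llu TB\n", max_pa_bits[i],
                           old_limit >> 40, new_limit >> 40);
            }
            return 0;
    }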

    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     
  • commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream

    For the L1TF workaround it's necessary to limit the swap file size to below
    MAX_PA/2, so that the higher bits of the inverted swap offset never point
    to valid memory.

    Add a mechanism for the architecture to override the swap file size check
    in swapfile.c and add an x86-specific max swapfile check function that
    enforces that limit.

    The check is only enabled if the CPU is vulnerable to L1TF.

    In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system
    with 46bit PA it is 32TB. The limit is only per individual swap file, so
    it's always possible to exceed these limits with multiple swap files or
    partitions.

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Michal Hocko
    Acked-by: Dave Hansen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     
  • commit 42e4089c7890725fcd329999252dc489b72f2921 upstream

    For L1TF, PROT_NONE mappings are protected by inverting the PFN in the page
    table entry. This sets the high bits in the CPU's address space, thus
    making sure an unmapped entry does not point to valid cached memory.

    Some server system BIOSes put the MMIO mappings high up in the physical
    address space. If such a high mapping was mapped to unprivileged users
    they could attack low memory by setting such a mapping to PROT_NONE. This
    could happen through a special device driver which is not access
    protected. Normal /dev/mem is of course access protected.

    To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.

    Valid page mappings are allowed because the system is then unsafe anyways.

    It's not expected that users commonly use PROT_NONE on MMIO. But to
    minimize any impact this is only enforced if the mapping actually refers to
    a high MMIO address (defined as the MAX_PA-1 bit being set); the check is
    also skipped for root.

    For mmaps this is straightforward and can be handled in vm_insert_pfn and
    in remap_pfn_range().

    For mprotect it's a bit trickier. At the point where the actual PTEs are
    accessed a lot of state has been changed and it would be difficult to undo
    on an error. Since this is an uncommon case, use a separate early page
    table walk pass for MMIO PROT_NONE mappings that checks for this condition
    early. For non-MMIO and non-PROT_NONE mappings there are no changes.
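
    A standalone sketch of the resulting policy (the helpers stand in for
    pfn_valid()/capable(); the cutoff expression and names are illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    /* stand-ins for the real kernel helpers */
    static bool pfn_is_ram(unsigned long long pfn) { (void)pfn; return false; }
    static bool caller_is_root(void)               { return false; }

    static bool pfn_prot_none_allowed(unsigned long long pfn, int max_pa_bits)
    {
            unsigned long long high_cutoff = 1ULL << (max_pa_bits - 1 - PAGE_SHIFT);

            if (pfn_is_ram(pfn))        /* valid page mappings stay allowed */
                    return true;
            if (caller_is_root())       /* check skipped for root */
                    return true;
            return pfn < high_cutoff;   /* refuse PROT_NONE on high MMIO */
    }

    int main(void)
    {
            /* 46-bit MAX_PA: MMIO at 1TB is fine, MMIO at 32TB (top half) is refused */
            printf("%d\n", pfn_prot_none_allowed(1ULL << (40 - PAGE_SHIFT), 46));
            printf("%d\n", pfn_prot_none_allowed(1ULL << (45 - PAGE_SHIFT), 46));
            return 0;
    }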

    Signed-off-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Acked-by: Dave Hansen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

03 Jul, 2018

1 commit

  • commit 2bdce74412c249ac01dfe36b6b0043ffd7a5361e upstream.

    Hussam reports:

    I was poking around and for no real reason, I did cat /dev/mem and
    strings /dev/mem. Then I saw the following warning in dmesg. I saved it
    and rebooted immediately.

    memremap attempted on mixed range 0x000000000009c000 size: 0x1000
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 11810 at kernel/memremap.c:98 memremap+0x104/0x170
    [..]
    Call Trace:
    xlate_dev_mem_ptr+0x25/0x40
    read_mem+0x89/0x1a0
    __vfs_read+0x36/0x170

    The memremap() implementation checks for attempts to remap System RAM
    with MEMREMAP_WB and instead redirects those mapping attempts to the
    linear map. However, that only works if the physical address range
    being remapped is page aligned. In low memory we have situations like
    the following:

    00000000-00000fff : Reserved
    00001000-0009fbff : System RAM
    0009fc00-0009ffff : Reserved

    ...where System RAM intersects Reserved ranges on a sub-page
    granularity.

    Given that devmem_is_allowed() special cases any attempt to map System
    RAM in the first 1MB of memory, replace page_is_ram() with the more
    precise region_intersects() to trap attempts to map disallowed ranges.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199999
    Link: http://lkml.kernel.org/r/152856436164.18127.2847888121707136898.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: 92281dee825f ("arch: introduce memremap()")
    Signed-off-by: Dan Williams
    Reported-by: Hussam Al-Tayeb
    Tested-by: Hussam Al-Tayeb
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

30 May, 2018

2 commits

  • [ Upstream commit 639d6aafe437a7464399d2a77d006049053df06f ]

    __ro_after_init data gets stuck in the .rodata section. That's normally
    fine because the kernel itself manages the R/W properties.

    But, if we run __change_page_attr() on an area which is __ro_after_init,
    the .rodata checks will trigger and force the area to be immediately
    read-only, even if it is early-ish in boot. This caused problems when
    trying to clear the _PAGE_GLOBAL bit for these areas in the PTI code:
    it cleared _PAGE_GLOBAL like I asked, but also took it upon itself
    to clear _PAGE_RW. The kernel then oopses the next time it writes to
    a __ro_after_init data structure.

    To fix this, add the kernel_set_to_readonly check, just like we have
    for kernel text, just a few lines below in this function.

    Signed-off-by: Dave Hansen
    Acked-by: Kees Cook
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: David Woodhouse
    Cc: Greg Kroah-Hartman
    Cc: Hugh Dickins
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Nadav Amit
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • [ Upstream commit e3e288121408c3abeed5af60b87b95c847143845 ]

    The pmd_set_huge() and pud_set_huge() functions are used from
    the generic ioremap() code to establish large mappings where this
    is possible.

    But the generic ioremap() code does not check whether the
    PMD/PUD entries are already populated with a non-leaf entry,
    so that any page-table pages these entries point to will be
    lost.

    Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
    BUG_ON() in vmalloc_sync_one() when PMD entries are synced
    from swapper_pg_dir to the current page-table. This happens
    because the PMD entry from swapper_pg_dir was promoted to a
    huge-page entry while the current PGD still contains the
    non-leaf entry. Because both entries are present and point
    to a different page, the BUG_ON() triggers.

    This was actually triggered with pti-x32 enabled in a KVM
    virtual machine by the graphics driver.

    A real and better fix for that would be to improve the
    page-table handling in the generic ioremap() code. But that is
    out-of-scope for this patch-set and left for later work.

    Reported-by: David H. Gutteridge
    Signed-off-by: Joerg Roedel
    Reviewed-by: Thomas Gleixner
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: Jiri Kosina
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Peter Zijlstra
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     

23 May, 2018

1 commit

  • commit 0a0b152083cfc44ec1bb599b57b7aab41327f998 upstream.

    I got a bug report that the following code (roughly) was
    causing a SIGSEGV:

    mprotect(ptr, size, PROT_EXEC);
    mprotect(ptr, size, PROT_NONE);
    mprotect(ptr, size, PROT_READ);
    *ptr = 100;

    The problem is hit when the mprotect(PROT_EXEC)
    call implicitly assigns a protection key to the VMA and makes
    that key ACCESS_DENY|WRITE_DENY. The PROT_NONE mprotect()
    fails to remove the protection key, and the PROT_NONE->
    PROT_READ transition leaves the PTE usable, but the pkey is
    still in place, leaving the memory inaccessible.

    To fix this, we ensure that we always "override" the pkey
    at mprotect() if the VMA does not have execute-only
    permissions, but the VMA has the execute-only pkey.

    We had a check for PROT_READ/WRITE, but it did not work
    for PROT_NONE. This entirely removes the PROT_* checks,
    which ensures that PROT_NONE now works.

    Reported-by: Shakeel Butt
    Signed-off-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Michael Ellermen
    Cc: Peter Zijlstra
    Cc: Ram Pai
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Cc: stable@vger.kernel.org
    Fixes: 62b5f7d013f ("mm/core, x86/mm/pkeys: Add execute-only protection keys support")
    Link: http://lkml.kernel.org/r/20180509171351.084C5A71@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     

26 Apr, 2018

1 commit

  • [ Upstream commit 595dd46ebfc10be041a365d0a3fa99df50b6ba73 ]

    Commit:

    df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")

    ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
    However, accessing the vsyscall user page will cause an SMAP fault.

    Replacing memcpy() with copy_from_user() fixes this bug, but adding
    a common way to handle this sort of user page may be useful in the future.

    Currently, only the vsyscall page requires KCORE_USER.

    Signed-off-by: Jia Zhang
    Reviewed-by: Jiri Olsa
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jia Zhang
     

29 Mar, 2018

2 commits

  • commit 28ee90fe6048fa7b7ceaeb8831c0e4e454a4cf89 upstream.

    Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
    clear a given pud/pmd entry and free up lower level page table(s).

    The address range associated with the pud/pmd entry must have been
    purged by INVLPG.

    Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
    Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
    Signed-off-by: Toshi Kani
    Reported-by: Lei Li
    Cc: Michal Hocko
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     
  • commit b6bdb7517c3d3f41f20e5c2948d6bc3f8897394e upstream.

    On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
    create pud/pmd mappings. A kernel panic was observed on arm64 systems
    with Cortex-A75 in the following steps as described by Hanjun Guo.

    1. ioremap a 4K size, valid page table will build,
    2. iounmap it, pte0 will set to 0;
    3. ioremap the same address with 2M size, pgd/pmd is unchanged,
    then set the a new value for pmd;
    4. pte0 is leaked;
    5. CPU may meet exception because the old pmd is still in TLB,
    which will lead to kernel panic.

    This panic is not reproducible on x86. INVLPG, called from iounmap,
    purges all levels of entries associated with the purged address on x86.
    x86 still has a memory leak.

    The patch changes the ioremap path to free unmapped page table(s) since
    doing so in the unmap path has the following issues:

    - The iounmap() path is shared with vunmap(). Since vmap() only
    supports pte mappings, making vunmap() to free a pte page is an
    overhead for regular vmap users as they do not need a pte page freed
    up.

    - Checking if all entries in a pte page are cleared in the unmap path
    is racy, and serializing this check is expensive.

    - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
    Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
    purge.

    Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
    clear a given pud/pmd entry and free up a page for the lower level
    entries.

    This patch implements their stub functions on x86 and arm64, which work
    as a workaround.

    [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
    Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
    Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
    Reported-by: Lei Li
    Signed-off-by: Toshi Kani
    Cc: Catalin Marinas
    Cc: Wang Xuefeng
    Cc: Will Deacon
    Cc: Hanjun Guo
    Cc: Michal Hocko
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Matthew Wilcox
    Cc: Chintan Pandya
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     

21 Mar, 2018

1 commit

  • commit 18a955219bf7d9008ce480d4451b6b8bf4483a22 upstream.

    Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS
    is not set since the function inadvertently uses pXn_huge(), which always
    returns 0 in this case. ioremap() does not depend on CONFIG_HUGETLBFS.

    Fix vmalloc_fault() to call pXd_large() instead.
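
    A standalone illustration of the difference, using stand-in helpers (the
    real pmd_huge()/pmd_large() take a pmd_t; the PSE bit position is the
    usual x86 one):

    #include <stdbool.h>
    #include <stdio.h>

    #define _PAGE_PRESENT (1UL << 0)
    #define _PAGE_PSE     (1UL << 7)   /* "large page" bit in a PMD/PUD entry */

    /* with CONFIG_HUGETLBFS=n, pmd_huge() compiles down to "always false" */
    static bool pmd_huge_stub(unsigned long pmdval)  { (void)pmdval; return false; }

    /* pmd_large() just looks at the PSE bit, independent of hugetlbfs */
    static bool pmd_large_stub(unsigned long pmdval) { return pmdval & _PAGE_PSE; }

    int main(void)
    {
            unsigned long large_ioremap_pmd = 0x200000UL | _PAGE_PRESENT | _PAGE_PSE;

            printf("pmd_huge():  %d  (misses the 2M mapping)\n",
                   pmd_huge_stub(large_ioremap_pmd));
            printf("pmd_large(): %d\n", pmd_large_stub(large_ioremap_pmd));
            return 0;
    }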

    Fixes: f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages properly")
    Reported-by: Gratian Crisan
    Signed-off-by: Toshi Kani
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: Borislav Petkov
    Cc: Andy Lutomirski
    Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.kani@hpe.com
    Signed-off-by: Greg Kroah-Hartman

    Toshi Kani
     

15 Mar, 2018

2 commits

  • commit 531bb52a869a9c6e08c8d17ba955fcbfc18037ad upstream.

    This is boot code and thus Spectre-safe: we run this _way_ before userspace
    comes along to have a chance to poison our branch predictor.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Acked-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Woodhouse
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Tom Lendacky
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 3b3a9268bba62b35a29bafe0931715b1725fdf26 upstream.

    This comment referred to a conditional call to kmemcheck_hide() that was
    here until commit 4950276672fc ("kmemcheck: remove annotations").

    Now that kmemcheck has been removed, it doesn't make sense anymore.

    Signed-off-by: Jann Horn
    Acked-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20180219175039.253089-1-jannh@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

09 Mar, 2018

1 commit

  • commit 945fd17ab6bab8a4d05da6c3170519fbcfe62ddb upstream.

    The separation of the cpu_entry_area from the fixmap missed the fact that
    on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
    initial_page_table by the previous synchronizations.

    This results in suspend/resume failures because 32bit utilizes initial page
    table for resume. The absence of the cpu_entry_area mapping results in a
    triple fault, aka. insta reboot.

    With PAE enabled this works by chance because the PGD entry which covers
    the fixmap and other parts incidentally provides the cpu_entry_area
    mapping as well.

    Synchronize the initial page table after setting up the cpu entry
    area. Instead of adding yet another copy of the same code, move it to a
    function and invoke it from the various places.

    It needs to be investigated if the existing calls in setup_arch() and
    setup_per_cpu_areas() can be replaced by the later invocation from
    setup_cpu_entry_areas(), but that's beyond the scope of this fix.

    Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
    Reported-by: Woody Suwalski
    Signed-off-by: Thomas Gleixner
    Tested-by: Woody Suwalski
    Cc: William Grant
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 Feb, 2018

1 commit

  • [ Upstream commit 6d60ce384d1d5ca32b595244db4077a419acc687 ]

    If something calls ioremap() with an address not aligned to PAGE_SIZE, the
    returned address might not be aligned either. This led to a probe being
    registered on exactly the returned address, while the entire page was armed
    for mmiotracing.

    On calling iounmap() the address passed to unregister_kmmio_probe() was
    PAGE_SIZE aligned by the caller leading to a complete freeze of the
    machine.

    We should always page align addresses while (un)registering mappings,
    because the mmiotracer works on top of pages, not mappings. We still keep
    track of the probes based on their real addresses and lengths though,
    because the mmiotrace still needs to know which memory regions are mapped.

    Also move the call to mmiotrace_iounmap() to before page aligning the
    address, so that all probes are unregistered properly; otherwise the kernel
    ends up failing memory allocations randomly after disabling the mmiotracer.

    Tested-by: Lyude
    Signed-off-by: Karol Herbst
    Acked-by: Pekka Paalanen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: nouveau@lists.freedesktop.org
    Link: http://lkml.kernel.org/r/20171127075139.4928-1-kherbst@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Karol Herbst