23 Oct, 2015

1 commit

  • commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream.

    Unused space between the end of __ex_table and the start of
    rodata can be left W+x in the kernel page tables. Extend the
    setting of the NX bit to cover this gap by starting from
    text_end rather than rodata_start.
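
    A minimal sketch of the change in mark_rodata_ro()
    (arch/x86/mm/init_64.c), using the variable names of the
    surrounding kernel code:

    unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
    unsigned long rodata_start = PFN_ALIGN(&__start_rodata);
    unsigned long all_end = PFN_ALIGN(&_end);

    /* before: left [text_end, rodata_start) writable and executable */
    set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);

    /* after: the gap before rodata_start is covered as well */
    set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);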

    Before:
    ---[ High Kernel Mapping ]---
    0xffffffff80000000-0xffffffff81000000 16M pmd
    0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
    0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
    0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte
    0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
    0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
    0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
    0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
    0xffffffff82200000-0xffffffffa0000000 478M pmd

    After:
    ---[ High Kernel Mapping ]---
    0xffffffff80000000-0xffffffff81000000 16M pmd
    0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
    0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
    0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte
    0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
    0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
    0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
    0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
    0xffffffff82200000-0xffffffffa0000000 478M pmd

    Signed-off-by: Stephen Smalley
    Acked-by: Kees Cook
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Smalley
     

30 Sep, 2015

1 commit

  • commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream.

    The variable pmd_idx is not initialized for the first iteration of the
    for loop.

    Assign the proper value which indexes the start address.
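
    A sketch of the fix, assuming it lands in the loop setup of
    page_table_range_init_count() (arch/x86/mm/init_32.c), the
    function introduced by the commit in the Fixes tag below:

    vaddr = start;
    pgd_idx = pgd_index(vaddr);
    pmd_idx = pmd_index(vaddr);     /* previously left uninitialized */

    for ( ; pgd_idx < PTRS_PER_PGD && vaddr != end; pgd_idx++) {
            for ( ; pmd_idx < PTRS_PER_PMD && vaddr != end; pmd_idx++)
                    /* count the kmap pmd entries in this range */;
            pmd_idx = 0;    /* later pgds start from the first pmd */
    }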

    Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once'
    Signed-off-by: Minfei Huang
    Cc: tony.luck@intel.com
    Cc: wangnan0@huawei.com
    Cc: david.vrabel@citrix.com
    Reviewed-by: yinghai@kernel.org
    Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Minfei Huang
     

11 Aug, 2015

4 commits

  • commit bbc03778b9954a2ec93baed63718e4df0192f130 upstream.

    flush_tlb_info->flush_start/end are both normal virtual
    addresses. When calculating 'nr_pages' (only used for the
    tracepoint), I neglected to put parentheses in.

    Thanks to David Koufaty for pointing this out.
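
    The bug is pure operator precedence (sketch):

    /* before: division binds tighter than subtraction */
    nr_pages = f->flush_end - f->flush_start / PAGE_SIZE;

    /* after: */
    nr_pages = (f->flush_end - f->flush_start) / PAGE_SIZE;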

    Signed-off-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@sr71.net
    Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit d4f86beacc21d538dc41e1fc75a22e084f547edf upstream.

    While populating the zero shadow, the wrong bits were used in the
    upper-level page tables: __PAGE_KERNEL_RO, which was used for
    pgd/pud/pmd entries, has _PAGE_BIT_GLOBAL set. The Global bit is
    present only in the lowest level of the page translation hierarchy
    (the ptes), and it should be zero in the upper levels.

    This bug doesn't seem to cause any trouble on Intel CPUs, while on
    AMD CPUs it causes a kernel crash on boot.

    Use the _KERNPG_TABLE bits for pgds/puds/pmds to fix this.
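
    Roughly (sketch; pmd level shown, pgd/pud analogous):

    /* before: _PAGE_GLOBAL leaks into a non-leaf entry */
    set_pmd(pmd, __pmd(__pa_nodebug(kasan_zero_pte) | __PAGE_KERNEL_RO));

    /* after: proper table bits for a non-leaf entry */
    set_pmd(pmd, __pmd(__pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE));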

    Reported-by: Borislav Petkov
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Popov
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-5-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • commit 241d2c54c62fa0939fc9a9512b48ac3434e90a89 upstream.

    load_cr3() doesn't cause a TLB flush if PGE is enabled.

    This may cause tons of false positive reports spamming the kernel
    to death.

    To fix this, __flush_tlb_all() should be called explicitly after
    CR3 is changed.
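
    The shape of the fix (sketch):

    load_cr3(init_level4_pgt);
    /* a CR3 reload leaves global (PGE) TLB entries intact; flush them too */
    __flush_tlb_all();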

    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Popov
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Borislav Petkov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-4-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • commit 5d5aa3cfca5cf74cd928daf3674642e6004328d1 upstream.

    Currently the KASAN shadow region page tables are created without
    taking the physical offset (phys_base) into account. This causes a
    kernel halt when phys_base is not zero.

    So let's initialize the KASAN shadow region page tables in
    kasan_early_init() using __pa_nodebug(), which takes phys_base
    into account.

    This patch also separates x86_64_start_kernel() from the KASAN
    low-level details by moving kasan_map_early_shadow(init_level4_pgt)
    into kasan_early_init().
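
    A condensed sketch of the resulting kasan_early_init() (pte level
    shown; the pmd/pud zero tables are filled the same way):

    void __init kasan_early_init(void)
    {
            int i;
            /* __pa_nodebug() takes phys_base into account */
            pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;

            for (i = 0; i < PTRS_PER_PTE; i++)
                    kasan_zero_pte[i] = __pte(pte_val);

            /* ... kasan_zero_pmd[] / kasan_zero_pud[] analogously ... */

            kasan_map_early_shadow(early_level4_pgt);
            kasan_map_early_shadow(init_level4_pgt); /* moved from x86_64_start_kernel() */
    }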

    Remove the comment before clear_bss(), which no longer added much
    to code readability; describing all the new ordering dependencies
    there would be too verbose.

    Signed-off-by: Alexander Popov
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Borislav Petkov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Alexander Popov
     

04 Aug, 2015

1 commit

  • commit a89652769470d12cd484ee3d3f7bde0742be8d96 upstream.

    MPX sets up a private anonymous mapping, but uses vma->vm_ops too.
    This can confuse the core VM, as it relies on vma->vm_ops to
    distinguish file VMAs from anonymous ones.

    As a result we get SIGBUS, because handle_pte_fault() thinks it's
    a file VMA without vm_ops->fault and doesn't know how to handle
    the situation properly.

    Let's fix that by not setting ->vm_ops.

    We don't really need ->vm_ops here: an MPX VMA can be detected
    with the VM_MPX flag, and vma_merge() will not merge an MPX VMA
    with a non-MPX VMA, because ->vm_flags won't match.

    The only thing left is the name of the VMA. I'm not sure whether
    it's part of the ABI or we can just drop it. The patch keeps it by
    providing arch_vma_name() on x86.
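
    The x86 helper is a one-liner keyed off VM_MPX (sketch matching
    the description above):

    const char *arch_vma_name(struct vm_area_struct *vma)
    {
            if (vma->vm_flags & VM_MPX)
                    return "[mpx]";
            return NULL;
    }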

    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@sr71.net
    Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

07 May, 2015

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
    two build fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu: Always restore_xinit_state() when use_eager_cpu()
    x86: Make cpu_tss available to external modules
    efi: Fix error handling in add_sysfs_runtime_map_entry()
    x86/spinlocks: Fix regression in spinlock contention detection
    x86/mm: Clean up types in xlate_dev_mem_ptr()
    x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
    efivarfs: Ensure VariableName is NUL-terminated

    Linus Torvalds
     

20 Apr, 2015

1 commit

  • Pavel Machek reported the following compiler warning on
    x86/32 CONFIG_HIGHMEM64G=y builds:

    arch/x86/mm/ioremap.c:344:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

    Clean up the types in this function by using a single natural type for
    internal calculations (unsigned long), to make it more apparent what's
    happening, and also to remove fragile casts.

    Reported-by: Pavel Machek
    Cc: jgross@suse.com
    Cc: roland@purestorage.com
    Link: http://lkml.kernel.org/r/20150416080440.GA507@amd
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

15 Apr, 2015

6 commits

  • Memtest is a simple feature which fills the memory with a given
    set of patterns and validates memory contents; if bad memory
    regions are detected, it reserves them via the memblock API. Since
    the memblock API is widely used by other architectures, this
    feature can be enabled outside of the x86 world.

    This patch set promotes memtest to live under generic mm umbrella and
    enables memtest feature for arm/arm64.

    It was reported that this patch set was useful for tracking down an issue
    with some errant DMA on an arm64 platform.

    This patch (of 6):

    There is nothing platform dependent in the core memtest code, so other
    platforms might benefit from this feature too.

    [linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
    Signed-off-by: Vladimir Murzin
    Acked-by: Will Deacon
    Tested-by: Mark Rutland
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Murzin
     
  • When an architecture fully supports randomizing the ELF load location,
    a per-arch mmap_rnd() function is used to find a randomized mmap base.
    In preparation for randomizing the location of ET_DYN binaries
    separately from mmap, this renames and exports these functions as
    arch_mmap_rnd(). It additionally introduces
    CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on
    architectures that support it (a superset of
    ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports ET_DYN
    ASLR separated from mmap ASLR without the
    ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).
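
    On x86 the renamed helper ends up looking roughly like this
    (sketch based on the x86 mmap code of this period):

    unsigned long arch_mmap_rnd(void)
    {
            unsigned long rnd;

            /*
             *  8 bits of randomness in 32-bit mmaps, 20 address space bits
             * 28 bits of randomness in 64-bit mmaps, 40 address space bits
             */
            if (mmap_is_ia32())
                    rnd = (unsigned long)get_random_int() % (1 << 8);
            else
                    rnd = (unsigned long)get_random_int() % (1 << 28);

            return rnd << PAGE_SHIFT;
    }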

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Mller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • In preparation for splitting out ET_DYN ASLR, this refactors the use of
    mmap_rnd() to be used similarly to arm, and extracts the checking of
    PF_RANDOMIZE.

    Signed-off-by: Kees Cook
    Reviewed-by: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Implement huge KVA mapping interfaces on x86.

    On x86, MTRRs can override PAT memory types with 4KB granularity.
    When using a huge page, MTRRs can override the memory type of the
    huge page, which may lead to a performance penalty. The processor
    can also behave in an undefined manner if a huge page is mapped to
    a memory range that MTRRs have mapped with multiple different
    memory types. Therefore, the mapping code falls back to using
    smaller page sizes, down to 4KB, when a mapping range is covered
    by a non-WB type of MTRR. WB-type MTRRs have no effect on the PAT
    memory types.

    pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see
    if a given range is covered by MTRRs. MTRR_TYPE_WRBACK indicates
    that the range is either covered by a WB MTRR, or not covered at
    all while the MTRR default type is WB. 0xFF indicates that MTRRs
    are disabled.
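
    A condensed sketch of the PUD-level helper this patch adds (the
    PMD version is analogous; mtrr_type_lookup() here is this era's
    u8-returning variant):

    int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
    {
            u8 mtrr;

            /* fall back to smaller pages if non-WB MTRRs cover the range */
            mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
            if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
                    return 0;

            prot = pgprot_4k_2_large(prot);

            set_pte((pte_t *)pud, pfn_pte(
                    (u64)addr >> PAGE_SHIFT,
                    __pgprot(pgprot_val(prot) | _PAGE_PSE)));

            return 1;
    }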

    HAVE_ARCH_HUGE_VMAP is selected when X86_64, or X86_32 with
    X86_PAE, is set. X86_32 without X86_PAE is not supported, since
    such a config is unlikely to benefit from this feature, and an
    issue was found in testing.

    [fengguang.wu@intel.com: ioremap_pud_capable can be static]
    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Implement huge I/O mapping capability interfaces for ioremap() on x86.

    IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT
    on x86/32, which overrides the default value defined in
    <linux/vmalloc.h>.

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • We want to use the number of page table levels to define
    mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS.
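
    The new symbol is a plain Kconfig int, roughly:

    config PGTABLE_LEVELS
            int
            default 4 if X86_64
            default 3 if X86_PAE
            default 2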

    Signed-off-by: Kirill A. Shutemov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Tested-by: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

14 Apr, 2015

2 commits

  • Pull x86 fix from Ingo Molnar:
    "Leftover from 4.0

    Fix a local stack variable corruption with certain kdump usage
    patterns (Dave Young)"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/numa: Fix kernel stack corruption in numa_init()->numa_clear_kernel_node_hotplug()

    Linus Torvalds
     
  • Pull x86 mm changes from Ingo Molnar:
    "The main changes in this cycle were:

    - reduce the x86/32 PAE per task PGD allocation overhead from 4K to
    0.032k (Fenghua Yu)

    - early_ioremap/memunmap() usage cleanups (Juergen Gross)

    - gbpages support cleanups (Luis R Rodriguez)

    - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to
    increase randomization by 3 bits (per bootup) (Hector
    Marco-Gisbert)

    - misc fixlets"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Improve AMD Bulldozer ASLR workaround
    x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion
    init.h: Clean up the __setup()/early_param() macros
    x86/mm: Simplify probe_page_size_mask()
    x86/mm: Further simplify 1 GB kernel linear mappings handling
    x86/mm: Use early_param_on_off() for direct_gbpages
    init.h: Add early_param_on_off()
    x86/mm: Simplify enabling direct_gbpages
    x86/mm: Use IS_ENABLED() for direct_gbpages
    x86/mm: Unexport set_memory_ro() and set_memory_rw()
    x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c
    x86/mm: Use early_memunmap() instead of early_iounmap()
    x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases
    x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes

    Linus Torvalds
     

07 Apr, 2015

1 commit

  • I got the below kernel panic during a kdump test on a Thinkpad
    T420 laptop:

    [ 0.000000] No NUMA configuration found
    [ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff]
    [ 0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] [] dump_stack+0x45/0x57
    [ 0.000000] [] panic+0xd0/0x204
    [ 0.000000] [] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
    [ 0.000000] [] __stack_chk_fail+0x1b/0x20
    [ 0.000000] [] numa_clear_kernel_node_hotplug+0xe6/0xf2
    [ 0.000000] [] numa_init+0x1a5/0x520
    [ 0.000000] [] x86_numa_init+0x19/0x3d
    [ 0.000000] [] initmem_init+0x9/0xb
    [ 0.000000] [] setup_arch+0x94f/0xc82
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] ? printk+0x55/0x6b
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] start_kernel+0xe8/0x4d6
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c
    [ 0.000000] [] x86_64_start_kernel+0x161/0x184
    [ 0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta

    This is caused by writing over the end of the numa mask bitmap
    in numa_clear_kernel_node().

    numa_clear_kernel_node() tries to set the node id in a mask bitmap,
    iterating over all reserved regions and assuming that every region
    has a valid nid.

    This assumption is not true, because there's an exception for some
    graphics memory quirks; see trim_snb_memory() in
    arch/x86/kernel/setup.c.

    It is easy to reproduce the bug in a kdump kernel, because the
    kdump kernel uses pre-reserved memory instead of the whole memory,
    but kexec passes other reserved memory ranges to the 2nd kernel as
    well. Like below in my test:

    kdump kernel ram 0x2d000000 - 0x37bfffff
    One of the reserved regions: 0x40000000 - 0x40100000, which
    includes 0x40004000, a page excluded in trim_snb_memory(). For
    this memblock-reserved region the nid is not set; it is still the
    default value MAX_NUMNODES. Later node_set() sets bit MAX_NUMNODES
    and thus the stack corruption happens.

    This also happens when booting with a mem= kernel command line
    during my test.

    Fix it by adding a check: do not call node_set() when nid is
    MAX_NUMNODES.
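
    The added check, roughly (sketch; names per the patch):

    /* in numa_clear_kernel_node_hotplug(): */
    for_each_memblock(reserved, mb_region) {
            /* NUMA-unaware quirk reservations keep nid == MAX_NUMNODES */
            if (mb_region->nid != MAX_NUMNODES)
                    node_set(mb_region->nid, numa_kernel_nodes);
    }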

    Signed-off-by: Dave Young
    Reviewed-by: Yasuaki Ishimatsu
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: bhe@redhat.com
    Cc: qiuxishi@huawei.com
    Link: http://lkml.kernel.org/r/20150407134132.GA23522@dhcp-16-198.nay.redhat.com
    Signed-off-by: Ingo Molnar

    Dave Young
     

23 Mar, 2015

2 commits

  • user_mode_vm() and user_mode() are now the same. Change all callers
    of user_mode_vm() to user_mode().

    The next patch will remove the definition of user_mode_vm.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brad Spengler
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
    [ Merged to a more recent kernel. ]
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • This is slightly shorter and slightly faster. It's also more
    correct: the split between user and kernel addresses is
    TASK_SIZE_MAX, regardless of ti->flags.
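
    Illustrative sketch (not the literal diff): TASK_SIZE on x86_64
    depends on the task's 32-bit flag, while TASK_SIZE_MAX is a
    compile-time constant, so the kernel/user check becomes a simple
    comparison:

    /* user/kernel split is fixed; no need to consult ti->flags */
    static int fault_in_kernel_space(unsigned long address)
    {
            return address >= TASK_SIZE_MAX;
    }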

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brad Spengler
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/09156b63bad90a327827003c9e53faa82ef4c56e.1426728647.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

05 Mar, 2015

6 commits

  • The initialization of these two arrays is a bit difficult to follow:
    restructure it optically so that a 2D structure shows which bit in
    the PTE is set and which not.

    Also improve on comments a bit.

    No code or data changed:

    # arch/x86/mm/init.o:

    text data bss dec hex filename
    4585 424 29776 34785 87e1 init.o.before
    4585 424 29776 34785 87e1 init.o.after

    md5:
    a82e11ff58bcfd0af3a94662a701f65d init.o.before.asm
    a82e11ff58bcfd0af3a94662a701f65d init.o.after.asm

    Reviewed-by: Juergen Gross <jgross@suse.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Jan Beulich <JBeulich@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Luis R. Rodriguez <mcgrof@suse.com>
    Cc: Toshi Kani <toshi.kani@hp.com>
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150305082135.GB5969@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Now that we've simplified the gbpages config space, move the
    'page_size_mask' initialization into probe_page_size_mask(),
    right next to the PSE and PGE enablement lines.

    Cc: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • It's a bit pointless to allow Kconfig configuration for 1GB kernel
    mappings; it's already hidden behind a 'default y' and CONFIG_EXPERT.

    Remove this complication and simplify the code by renaming
    CONFIG_ENABLE_DIRECT_GBPAGES to CONFIG_X86_DIRECT_GBPAGES and
    document the DEBUG_PAGE_ALLOC and KMEMCHECK quirks.

    Cc: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The enabler / disabler is pretty simple: just use the provided
    wrappers, which lets us easily relate the variable to the
    associated Kconfig entry.

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-5-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     
  • direct_gbpages can be force-enabled as an early parameter but not
    really take effect when DEBUG_PAGEALLOC or KMEMCHECK is enabled.
    You can also enable direct_gbpages right now on an x86_64 system
    whose CPU doesn't actually support the feature. In both cases
    PG_LEVEL_1G won't actually be enabled, but direct_gbpages is used
    in other areas under the assumption that PG_LEVEL_1G was set. Fix
    this by putting together all the requirements which make this
    feature sensible to enable, and only enabling it (finally flipping
    on PG_LEVEL_1G and leaving it set) when they all hold.

    The feature is then only possible to enable on sensible builds, as
    defined by the new ENABLE_DIRECT_GBPAGES. If the CPU has support
    for it, you can enable it either with the DIRECT_GBPAGES option or
    with the early kernel parameter. If a platform has support for it,
    you can always force-disable it as well.

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-3-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     
  • Replace #ifdef eyesore with IS_ENABLED() use.
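
    The change is roughly of this shape (sketch):

    /* before: */
    int direct_gbpages
    #ifdef CONFIG_DIRECT_GBPAGES
                    = 1
    #endif
    ;

    /* after: */
    int direct_gbpages = IS_ENABLED(CONFIG_DIRECT_GBPAGES);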

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-2-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     

28 Feb, 2015

1 commit

  • This effectively unexports set_memory_ro() and set_memory_rw()
    functions, and thus reverts:

    a03352d2c1dc ("x86: export set_memory_ro and set_memory_rw").

    They were introduced for debugging purposes in e1000e, but no
    module user is in the mainline kernel (anymore?), and we
    explicitly do not want modules to use these functions, as they
    e.g. protect eBPF (interpreted & JIT'ed) images from malicious
    modifications or bugs.

    Outside of the eBPF scope, I believe other set_memory_*()
    functions should also be unexported on x86 for modules.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Bruce Allan
    Cc: H. Peter Anvin
    Cc: Jesse Brandeburg
    Cc: Thomas Gleixner
    Cc: davem@davemloft.net
    Link: http://lkml.kernel.org/r/a064393a0a5d319eebde5c761cfd743132d4f213.1425040940.git.daniel@iogearbox.net
    Signed-off-by: Ingo Molnar

    Daniel Borkmann
     

22 Feb, 2015

1 commit

  • Pull misc x86 fixes from Ingo Molnar:
    "This contains:

    - EFI fixes
    - a boot printout fix
    - ASLR/kASLR fixes
    - intel microcode driver fixes
    - other misc fixes

    Most of the linecount comes from an EFI revert"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
    x86/microcode/intel: Handle truncated microcode images more robustly
    x86/microcode/intel: Guard against stack overflow in the loader
    x86, mm/ASLR: Fix stack randomization on 64-bit systems
    x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
    x86/mm/ASLR: Propagate base load address calculation
    Documentation/x86: Fix path in zero-page.txt
    x86/apic: Fix the devicetree build in certain configs
    Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
    x86/efi: Avoid triple faults during EFI mixed mode calls

    Linus Torvalds
     

19 Feb, 2015

6 commits

  • Pull ASLR and kASLR fixes from Borislav Petkov:

    - Add a global flag announcing KASLR state so that relevant code can do
    informed decisions based on its setting. (Jiri Kosina)

    - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The issue is that the stack for processes is not properly randomized on
    64 bit architectures due to an integer overflow.

    The affected function is randomize_stack_top() in file
    "fs/binfmt_elf.c":

    static unsigned long randomize_stack_top(unsigned long stack_top)
    {
            unsigned int random_variable = 0;

            if ((current->flags & PF_RANDOMIZE) &&
                !(current->personality & ADDR_NO_RANDOMIZE)) {
                    random_variable = get_random_int() & STACK_RND_MASK;
                    random_variable <<= PAGE_SHIFT;
            }
            return PAGE_ALIGN(stack_top) - random_variable;
    }

    random_variable is a 32-bit unsigned int; on 64-bit systems
    STACK_RND_MASK is 22 bits wide, so shifting left by PAGE_SHIFT
    (12) overflows it and throws away entropy.

    Signed-off-by: Ismael Ripoll
    [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
    Signed-off-by: Kees Cook
    Cc:
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Al Viro
    Fixes: CVE-2015-1593
    Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
    Signed-off-by: Borislav Petkov

    Hector Marco-Gisbert
     
  • With 32-bit non-PAE kernels, we have 2 page sizes available
    (at most): 4k and 4M.

    Enabling PAE replaces that 4M size with a 2M one (which 64-bit
    systems use too).

    But, when booting a 32-bit non-PAE kernel, in one of our
    early-boot printouts, we say:

    init_memory_mapping: [mem 0x00000000-0x000fffff]
    [mem 0x00000000-0x000fffff] page 4k
    init_memory_mapping: [mem 0x37000000-0x373fffff]
    [mem 0x37000000-0x373fffff] page 2M
    init_memory_mapping: [mem 0x00100000-0x36ffffff]
    [mem 0x00100000-0x003fffff] page 4k
    [mem 0x00400000-0x36ffffff] page 2M
    init_memory_mapping: [mem 0x37400000-0x377fdfff]
    [mem 0x37400000-0x377fdfff] page 4k

    Which is obviously wrong: there is no 2M page available. This is
    probably because of a badly-named variable in the map_range code:
    PG_LEVEL_2M.

    Instead of renaming all the PG_LEVEL_2M's, this patch just fixes
    the printout:

    init_memory_mapping: [mem 0x00000000-0x000fffff]
    [mem 0x00000000-0x000fffff] page 4k
    init_memory_mapping: [mem 0x37000000-0x373fffff]
    [mem 0x37000000-0x373fffff] page 4M
    init_memory_mapping: [mem 0x00100000-0x36ffffff]
    [mem 0x00100000-0x003fffff] page 4k
    [mem 0x00400000-0x36ffffff] page 4M
    init_memory_mapping: [mem 0x37400000-0x377fdfff]
    [mem 0x37400000-0x377fdfff] page 4k
    BRK [0x03206000, 0x03206fff] PGTABLE

    Signed-off-by: Dave Hansen
    Cc: Pekka Enberg
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com
    Signed-off-by: Borislav Petkov

    Dave Hansen
     
  • Setting the flag not just when the feature is available, but also
    clearing it when it isn't, is for consistency, and may allow Xen
    to drop its custom clearing of the flag (unless it needs it
    cleared earlier than this code executes). Note that the change is
    benign on ix86, as the flag starts out clear there.

    Signed-off-by: Jan Beulich
    Cc: Andy Lutomirski
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/54C215D10200007800058912@mail.emea.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • STRICT_DEVMEM and PAT produce the same failure when accessing
    /dev/mem, which is quite confusing to the user. Make the printk
    messages different to lessen the confusion.

    Signed-off-by: Pavel Machek
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Ingo Molnar

    Pavel Machek
     
  • With more embedded systems emerging using Quark, among other
    things, the 32-bit kernel matters again. A 32-bit machine and
    kernel use PAE paging, which currently wastes at least 4K of
    memory per process on Linux, because we have to reserve an entire
    page to support a single 32-byte PGD structure. It would be a very
    good thing if we could eliminate that wastage.

    PAE paging is used to access more than 4GB of memory on x86-32,
    and it is required for NX.

    In this patch, we still allocate one page for the pgd for a Xen
    domain and for the 64-bit kernel, because a one-page pgd is
    assumed in those cases. But we can save memory by allocating only
    a 32-byte pgd for a 32-bit PAE kernel when it is not running as a
    Xen domain.
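
    A condensed sketch of the allocation path (names follow the patch;
    details trimmed):

    /* a PAE top-level pgd holds 4 entries: 4 * 8 bytes = 32 bytes */
    #define PGD_SIZE        (PTRS_PER_PGD * sizeof(pgd_t))
    #define PGD_ALIGN       32

    static struct kmem_cache *pgd_cache;    /* 32-byte objects */

    static pgd_t *_pgd_alloc(void)
    {
            /*
             * A PAE kernel running as a Xen domain does not share the
             * kernel pmd and still needs a whole page for the pgd.
             */
            if (!SHARED_KERNEL_PMD)
                    return (pgd_t *)__get_free_page(PGALLOC_GFP);

            return kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
    }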

    Signed-off-by: Fenghua Yu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Christoph Lameter
    Cc: Dave Hansen
    Cc: Glenn Williamson
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua.yu@intel.com
    Signed-off-by: Ingo Molnar

    Fenghua Yu
     

17 Feb, 2015

1 commit

  • Pull x86 perf updates from Ingo Molnar:
    "This series tightens up RDPMC permissions: currently even highly
    sandboxed x86 execution environments (such as seccomp) have permission
    to execute RDPMC, which may leak various perf events / PMU state such
    as timing information and other CPU execution details.

    This 'all is allowed' RDPMC mode is still preserved as the
    (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is
    that RDPMC access is only allowed if a perf event is mmap-ed (which is
    needed to correctly interpret RDPMC counter values in any case).

    As a side effect of these changes CR4 handling is cleaned up in the
    x86 code and a shadow copy of the CR4 value is added.

    The extra CR4 manipulation adds ~ […]"

    * […] of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
    perf/x86: Only allow rdpmc if a perf_event is mapped
    perf: Pass the event to arch_perf_update_userpage()
    perf: Add pmu callbacks to track event mapping and unmapping
    x86: Add a comment clarifying LDT context switching
    x86: Store a per-cpu shadow copy of CR4
    x86: Clean up cr4 manipulation

    Linus Torvalds
     

14 Feb, 2015

3 commits

  • This feature lets us detect out-of-bounds accesses to global
    variables. It works both for globals in the kernel image and for
    globals in modules. Currently it won't work for symbols in
    user-specified sections (e.g. __init, __read_mostly, ...).

    The idea is simple: the compiler pads each global variable with a
    redzone and adds constructors invoking the
    __asan_register_globals() function. Information about each global
    variable (address, size, size with redzone, ...) is passed to
    __asan_register_globals() so we can poison the variable's redzone.

    This patch also forces module_alloc() to return 8*PAGE_SIZE-aligned
    addresses, making shadow memory handling
    (kasan_module_alloc()/kasan_module_free()) simpler. Such alignment
    guarantees that each shadow page backing the modules' address
    space corresponds to only one module_alloc() allocation.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Stack instrumentation allows detecting out-of-bounds memory
    accesses to variables allocated on the stack. The compiler adds
    redzones around every variable on the stack and poisons the
    redzones in the function prologue.

    Such an approach significantly increases stack usage, so all
    in-kernel stack sizes were doubled.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • This patch adds the arch-specific code for the kernel address
    sanitizer.

    16TB of virtual address space is used for shadow memory. It's
    located in the range [ffffec0000000000 - fffffc0000000000],
    between vmemmap and the %esp fixup stacks.

    At an early stage we map the whole shadow region with the zero
    page. Later, after pages are mapped to the direct-mapping address
    range, we unmap zero pages from the corresponding shadow (see
    kasan_map_shadow()) and allocate and map real shadow memory,
    reusing the vmemmap_populate() function.
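
    For reference, the shadow translation this layout implies: each
    shadow byte covers 8 bytes of memory (hence 16TB of shadow), and
    the mapping is a shift plus an offset (sketch; KASAN_SHADOW_OFFSET
    is provided by the arch):

    #define KASAN_SHADOW_SCALE_SHIFT        3

    static inline void *kasan_mem_to_shadow(const void *addr)
    {
            return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                    + KASAN_SHADOW_OFFSET;
    }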

    Also replace __pa() with __pa_nodebug() before the shadow is
    initialized: with CONFIG_DEBUG_VIRTUAL=y, __pa() makes an external
    function call (__phys_addr), which is instrumented, so
    __asan_load() could be called before the shadow area is
    initialized.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Jim Davis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin