23 Oct, 2015

1 commit

  • commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream.

    Unused space between the end of __ex_table and the start of
    rodata can be left W+x in the kernel page tables. Extend the
    setting of the NX bit to cover this gap by starting from
    text_end rather than rodata_start.
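
    A minimal sketch of the change in mark_rodata_ro()
    (arch/x86/mm/init_64.c), using the variable names of the
    surrounding kernel code:

    unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
    unsigned long rodata_start = PFN_ALIGN(&__start_rodata);
    unsigned long all_end = PFN_ALIGN(&_end);

    /* before: left [text_end, rodata_start) writable and executable */
    set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);

    /* after: the gap before rodata_start is covered as well */
    set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);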

    Before:
    ---[ High Kernel Mapping ]---
    0xffffffff80000000-0xffffffff81000000 16M pmd
    0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
    0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
    0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte
    0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
    0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
    0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
    0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
    0xffffffff82200000-0xffffffffa0000000 478M pmd

    After:
    ---[ High Kernel Mapping ]---
    0xffffffff80000000-0xffffffff81000000 16M pmd
    0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
    0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
    0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte
    0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
    0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
    0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
    0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
    0xffffffff82200000-0xffffffffa0000000 478M pmd

    Signed-off-by: Stephen Smalley
    Acked-by: Kees Cook
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Smalley
     

30 Sep, 2015

1 commit

  • commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream.

    The variable pmd_idx is not initialized for the first iteration of the
    for loop.

    Assign the proper value which indexes the start address.
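
    A sketch of the fix, assuming it lands in the loop setup of
    page_table_range_init_count() (arch/x86/mm/init_32.c), the
    function introduced by the commit in the Fixes tag below:

    vaddr = start;
    pgd_idx = pgd_index(vaddr);
    pmd_idx = pmd_index(vaddr);     /* previously left uninitialized */

    for ( ; pgd_idx < PTRS_PER_PGD && vaddr != end; pgd_idx++) {
            for ( ; pmd_idx < PTRS_PER_PMD && vaddr != end; pmd_idx++)
                    /* count the kmap pmd entries in this range */;
            pmd_idx = 0;    /* later pgds start from the first pmd */
    }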

    Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once'
    Signed-off-by: Minfei Huang
    Cc: tony.luck@intel.com
    Cc: wangnan0@huawei.com
    Cc: david.vrabel@citrix.com
    Reviewed-by: yinghai@kernel.org
    Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Minfei Huang
     

11 Aug, 2015

4 commits

  • commit bbc03778b9954a2ec93baed63718e4df0192f130 upstream.

    flush_tlb_info->flush_start/end are both normal virtual
    addresses. When calculating 'nr_pages' (only used for the
    tracepoint), I neglected to put parentheses in.

    Thanks to David Koufaty for pointing this out.
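
    The bug is pure operator precedence (sketch):

    /* before: division binds tighter than subtraction */
    nr_pages = f->flush_end - f->flush_start / PAGE_SIZE;

    /* after: */
    nr_pages = (f->flush_end - f->flush_start) / PAGE_SIZE;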

    Signed-off-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@sr71.net
    Link: http://lkml.kernel.org/r/20150720230153.9E834081@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit d4f86beacc21d538dc41e1fc75a22e084f547edf upstream.

    While populating the zero shadow, the wrong bits were used in the
    upper-level page tables: __PAGE_KERNEL_RO, which was used for
    pgd/pud/pmd entries, has _PAGE_BIT_GLOBAL set. The Global bit is
    present only in the lowest level of the page translation hierarchy
    (the ptes), and it should be zero in the upper levels.

    This bug doesn't seem to cause any trouble on Intel CPUs, while on
    AMD CPUs it causes a kernel crash on boot.

    Use the _KERNPG_TABLE bits for pgds/puds/pmds to fix this.
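
    Roughly (sketch; pmd level shown, pgd/pud analogous):

    /* before: _PAGE_GLOBAL leaks into a non-leaf entry */
    set_pmd(pmd, __pmd(__pa_nodebug(kasan_zero_pte) | __PAGE_KERNEL_RO));

    /* after: proper table bits for a non-leaf entry */
    set_pmd(pmd, __pmd(__pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE));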

    Reported-by: Borislav Petkov
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Popov
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-5-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • commit 241d2c54c62fa0939fc9a9512b48ac3434e90a89 upstream.

    load_cr3() doesn't cause a TLB flush if PGE is enabled.

    This may cause tons of false positive reports spamming the kernel
    to death.

    To fix this, __flush_tlb_all() should be called explicitly after
    CR3 is changed.
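
    The shape of the fix (sketch):

    load_cr3(init_level4_pgt);
    /* a CR3 reload leaves global (PGE) TLB entries intact; flush them too */
    __flush_tlb_all();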

    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Popov
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Borislav Petkov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-4-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andrey Ryabinin
     
  • commit 5d5aa3cfca5cf74cd928daf3674642e6004328d1 upstream.

    Currently the KASAN shadow region page tables are created without
    taking the physical offset (phys_base) into account. This causes a
    kernel halt when phys_base is not zero.

    So let's initialize the KASAN shadow region page tables in
    kasan_early_init() using __pa_nodebug(), which takes phys_base
    into account.

    This patch also separates x86_64_start_kernel() from the KASAN
    low-level details by moving kasan_map_early_shadow(init_level4_pgt)
    into kasan_early_init().
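
    A condensed sketch of the resulting kasan_early_init() (pte level
    shown; the pmd/pud zero tables are filled the same way):

    void __init kasan_early_init(void)
    {
            int i;
            /* __pa_nodebug() takes phys_base into account */
            pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;

            for (i = 0; i < PTRS_PER_PTE; i++)
                    kasan_zero_pte[i] = __pte(pte_val);

            /* ... kasan_zero_pmd[] / kasan_zero_pud[] analogously ... */

            kasan_map_early_shadow(early_level4_pgt);
            kasan_map_early_shadow(init_level4_pgt); /* moved from x86_64_start_kernel() */
    }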

    Remove the comment before clear_bss(), which no longer added much
    to code readability; describing all the new ordering dependencies
    there would be too verbose.

    Signed-off-by: Alexander Popov
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Borislav Petkov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Alexander Popov
     

04 Aug, 2015

1 commit

  • commit a89652769470d12cd484ee3d3f7bde0742be8d96 upstream.

    MPX sets up a private anonymous mapping, but uses vma->vm_ops too.
    This can confuse the core VM, as it relies on vma->vm_ops to
    distinguish file VMAs from anonymous ones.

    As a result we get SIGBUS, because handle_pte_fault() thinks it's
    a file VMA without vm_ops->fault and doesn't know how to handle
    the situation properly.

    Let's fix that by not setting ->vm_ops.

    We don't really need ->vm_ops here: an MPX VMA can be detected
    with the VM_MPX flag, and vma_merge() will not merge an MPX VMA
    with a non-MPX VMA, because ->vm_flags won't match.

    The only thing left is the name of the VMA. I'm not sure whether
    it's part of the ABI or we can just drop it. The patch keeps it by
    providing arch_vma_name() on x86.
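
    The x86 helper is a one-liner keyed off VM_MPX (sketch matching
    the description above):

    const char *arch_vma_name(struct vm_area_struct *vma)
    {
            if (vma->vm_flags & VM_MPX)
                    return "[mpx]";
            return NULL;
    }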

    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@sr71.net
    Link: http://lkml.kernel.org/r/20150720212958.305CC3E9@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

07 May, 2015

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
    two build fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu: Always restore_xinit_state() when use_eager_cpu()
    x86: Make cpu_tss available to external modules
    efi: Fix error handling in add_sysfs_runtime_map_entry()
    x86/spinlocks: Fix regression in spinlock contention detection
    x86/mm: Clean up types in xlate_dev_mem_ptr()
    x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
    efivarfs: Ensure VariableName is NUL-terminated

    Linus Torvalds
     

20 Apr, 2015

1 commit

  • Pavel Machek reported the following compiler warning on
    x86/32 CONFIG_HIGHMEM64G=y builds:

    arch/x86/mm/ioremap.c:344:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

    Clean up the types in this function by using a single natural type for
    internal calculations (unsigned long), to make it more apparent what's
    happening, and also to remove fragile casts.

    Reported-by: Pavel Machek
    Cc: jgross@suse.com
    Cc: roland@purestorage.com
    Link: http://lkml.kernel.org/r/20150416080440.GA507@amd
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

15 Apr, 2015

6 commits

  • Memtest is a simple feature which fills the memory with a given
    set of patterns and validates memory contents; if bad memory
    regions are detected, it reserves them via the memblock API. Since
    the memblock API is widely used by other architectures, this
    feature can be enabled outside of the x86 world.

    This patch set promotes memtest to live under generic mm umbrella and
    enables memtest feature for arm/arm64.

    It was reported that this patch set was useful for tracking down an issue
    with some errant DMA on an arm64 platform.

    This patch (of 6):

    There is nothing platform dependent in the core memtest code, so other
    platforms might benefit from this feature too.

    [linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
    Signed-off-by: Vladimir Murzin
    Acked-by: Will Deacon
    Tested-by: Mark Rutland
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Murzin
     
  • When an architecture fully supports randomizing the ELF load location,
    a per-arch mmap_rnd() function is used to find a randomized mmap base.
    In preparation for randomizing the location of ET_DYN binaries
    separately from mmap, this renames and exports these functions as
    arch_mmap_rnd(). It additionally introduces
    CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on
    architectures that support it (a superset of
    ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports ET_DYN
    ASLR separated from mmap ASLR without the
    ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).
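
    On x86 the renamed helper ends up looking roughly like this
    (sketch based on the x86 mmap code of this period):

    unsigned long arch_mmap_rnd(void)
    {
            unsigned long rnd;

            /*
             *  8 bits of randomness in 32-bit mmaps, 20 address space bits
             * 28 bits of randomness in 64-bit mmaps, 40 address space bits
             */
            if (mmap_is_ia32())
                    rnd = (unsigned long)get_random_int() % (1 << 8);
            else
                    rnd = (unsigned long)get_random_int() % (1 << 28);

            return rnd << PAGE_SHIFT;
    }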

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Mller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • In preparation for splitting out ET_DYN ASLR, this refactors the use of
    mmap_rnd() to be used similarly to arm, and extracts the checking of
    PF_RANDOMIZE.

    Signed-off-by: Kees Cook
    Reviewed-by: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Implement huge KVA mapping interfaces on x86.

    On x86, MTRRs can override PAT memory types with 4KB granularity.
    When using a huge page, MTRRs can override the memory type of the
    huge page, which may lead to a performance penalty. The processor
    can also behave in an undefined manner if a huge page is mapped to
    a memory range that MTRRs have mapped with multiple different
    memory types. Therefore, the mapping code falls back to using
    smaller page sizes, down to 4KB, when a mapping range is covered
    by a non-WB type of MTRR. WB-type MTRRs have no effect on the PAT
    memory types.

    pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see
    if a given range is covered by MTRRs. MTRR_TYPE_WRBACK indicates
    that the range is either covered by a WB MTRR, or not covered at
    all while the MTRR default type is WB. 0xFF indicates that MTRRs
    are disabled.
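
    A condensed sketch of the PUD-level helper this patch adds (the
    PMD version is analogous; mtrr_type_lookup() here is this era's
    u8-returning variant):

    int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
    {
            u8 mtrr;

            /* fall back to smaller pages if non-WB MTRRs cover the range */
            mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
            if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
                    return 0;

            prot = pgprot_4k_2_large(prot);

            set_pte((pte_t *)pud, pfn_pte(
                    (u64)addr >> PAGE_SHIFT,
                    __pgprot(pgprot_val(prot) | _PAGE_PSE)));

            return 1;
    }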

    HAVE_ARCH_HUGE_VMAP is selected when X86_64, or X86_32 with
    X86_PAE, is set. X86_32 without X86_PAE is not supported, since
    such a config is unlikely to benefit from this feature, and an
    issue was found in testing.

    [fengguang.wu@intel.com: ioremap_pud_capable can be static]
    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Implement huge I/O mapping capability interfaces for ioremap() on x86.

    IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT
    on x86/32, which overrides the default value defined in
    <linux/vmalloc.h>.

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • We want to use the number of page table levels to define
    mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS.
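
    The new symbol is a plain Kconfig int, roughly:

    config PGTABLE_LEVELS
            int
            default 4 if X86_64
            default 3 if X86_PAE
            default 2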

    Signed-off-by: Kirill A. Shutemov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Tested-by: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

14 Apr, 2015

2 commits

  • Pull x86 fix from Ingo Molnar:
    "Leftover from 4.0

    Fix a local stack variable corruption with certain kdump usage
    patterns (Dave Young)"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/numa: Fix kernel stack corruption in numa_init()->numa_clear_kernel_node_hotplug()

    Linus Torvalds
     
  • Pull x86 mm changes from Ingo Molnar:
    "The main changes in this cycle were:

    - reduce the x86/32 PAE per task PGD allocation overhead from 4K to
    0.032k (Fenghua Yu)

    - early_ioremap/memunmap() usage cleanups (Juergen Gross)

    - gbpages support cleanups (Luis R Rodriguez)

    - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to
    increase randomization by 3 bits (per bootup) (Hector
    Marco-Gisbert)

    - misc fixlets"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Improve AMD Bulldozer ASLR workaround
    x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion
    init.h: Clean up the __setup()/early_param() macros
    x86/mm: Simplify probe_page_size_mask()
    x86/mm: Further simplify 1 GB kernel linear mappings handling
    x86/mm: Use early_param_on_off() for direct_gbpages
    init.h: Add early_param_on_off()
    x86/mm: Simplify enabling direct_gbpages
    x86/mm: Use IS_ENABLED() for direct_gbpages
    x86/mm: Unexport set_memory_ro() and set_memory_rw()
    x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c
    x86/mm: Use early_memunmap() instead of early_iounmap()
    x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases
    x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes

    Linus Torvalds
     

07 Apr, 2015

1 commit

  • I got the below kernel panic during a kdump test on a Thinkpad
    T420 laptop:

    [ 0.000000] No NUMA configuration found
    [ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff]
    [ 0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] [] dump_stack+0x45/0x57
    [ 0.000000] [] panic+0xd0/0x204
    [ 0.000000] [] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
    [ 0.000000] [] __stack_chk_fail+0x1b/0x20
    [ 0.000000] [] numa_clear_kernel_node_hotplug+0xe6/0xf2
    [ 0.000000] [] numa_init+0x1a5/0x520
    [ 0.000000] [] x86_numa_init+0x19/0x3d
    [ 0.000000] [] initmem_init+0x9/0xb
    [ 0.000000] [] setup_arch+0x94f/0xc82
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] ? printk+0x55/0x6b
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] start_kernel+0xe8/0x4d6
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] ? early_idt_handlers+0x120/0x120
    [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c
    [ 0.000000] [] x86_64_start_kernel+0x161/0x184
    [ 0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta

    This is caused by writing over the end of the numa mask bitmap
    in numa_clear_kernel_node().

    numa_clear_kernel_node() tries to set the node id in a mask bitmap,
    iterating over all reserved regions and assuming that every region
    has a valid nid.

    This assumption is not true, because there's an exception for some
    graphics memory quirks; see trim_snb_memory() in
    arch/x86/kernel/setup.c.

    It is easy to reproduce the bug in a kdump kernel, because the
    kdump kernel uses pre-reserved memory instead of the whole memory,
    but kexec passes other reserved memory ranges to the 2nd kernel as
    well. Like below in my test:

    kdump kernel ram 0x2d000000 - 0x37bfffff
    One of the reserved regions: 0x40000000 - 0x40100000, which
    includes 0x40004000, a page excluded in trim_snb_memory(). For
    this memblock-reserved region the nid is not set; it is still the
    default value MAX_NUMNODES. Later node_set() sets bit MAX_NUMNODES
    and thus the stack corruption happens.

    This also happens when booting with a mem= kernel command line
    during my test.

    Fix it by adding a check: do not call node_set() when nid is
    MAX_NUMNODES.
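
    The added check, roughly (sketch; names per the patch):

    /* in numa_clear_kernel_node_hotplug(): */
    for_each_memblock(reserved, mb_region) {
            /* NUMA-unaware quirk reservations keep nid == MAX_NUMNODES */
            if (mb_region->nid != MAX_NUMNODES)
                    node_set(mb_region->nid, numa_kernel_nodes);
    }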

    Signed-off-by: Dave Young
    Reviewed-by: Yasuaki Ishimatsu
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: bhe@redhat.com
    Cc: qiuxishi@huawei.com
    Link: http://lkml.kernel.org/r/20150407134132.GA23522@dhcp-16-198.nay.redhat.com
    Signed-off-by: Ingo Molnar

    Dave Young
     

23 Mar, 2015

2 commits

  • user_mode_vm() and user_mode() are now the same. Change all callers
    of user_mode_vm() to user_mode().

    The next patch will remove the definition of user_mode_vm.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brad Spengler
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
    [ Merged to a more recent kernel. ]
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • This is slightly shorter and slightly faster. It's also more
    correct: the split between user and kernel addresses is
    TASK_SIZE_MAX, regardless of ti->flags.
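
    Illustrative sketch (not the literal diff): TASK_SIZE on x86_64
    depends on the task's 32-bit flag, while TASK_SIZE_MAX is a
    compile-time constant, so the kernel/user check becomes a simple
    comparison:

    /* user/kernel split is fixed; no need to consult ti->flags */
    static int fault_in_kernel_space(unsigned long address)
    {
            return address >= TASK_SIZE_MAX;
    }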

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brad Spengler
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/09156b63bad90a327827003c9e53faa82ef4c56e.1426728647.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

05 Mar, 2015

6 commits

  • The initialization of these two arrays is a bit difficult to follow:
    restructure it optically so that a 2D structure shows which bit in
    the PTE is set and which not.

    Also improve on comments a bit.

    No code or data changed:

    # arch/x86/mm/init.o:

    text data bss dec hex filename
    4585 424 29776 34785 87e1 init.o.before
    4585 424 29776 34785 87e1 init.o.after

    md5:
    a82e11ff58bcfd0af3a94662a701f65d init.o.before.asm
    a82e11ff58bcfd0af3a94662a701f65d init.o.after.asm

    Reviewed-by: Juergen Gross <jgross@suse.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Jan Beulich <JBeulich@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Luis R. Rodriguez <mcgrof@suse.com>
    Cc: Toshi Kani <toshi.kani@hp.com>
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150305082135.GB5969@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Now that we've simplified the gbpages config space, move the
    'page_size_mask' initialization into probe_page_size_mask(),
    right next to the PSE and PGE enablement lines.

    Cc: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • It's a bit pointless to allow Kconfig configuration for 1GB kernel
    mappings; it's already hidden behind a 'default y' and CONFIG_EXPERT.

    Remove this complication and simplify the code by renaming
    CONFIG_ENABLE_DIRECT_GBPAGES to CONFIG_X86_DIRECT_GBPAGES and
    document the DEBUG_PAGE_ALLOC and KMEMCHECK quirks.

    Cc: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The enabler / disabler is pretty simple: just use the provided
    wrappers, which lets us easily relate the variable to the
    associated Kconfig entry.

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-5-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     
  • direct_gbpages can be force-enabled as an early parameter but not
    really take effect when DEBUG_PAGEALLOC or KMEMCHECK is enabled.
    You can also enable direct_gbpages right now on an x86_64 system
    whose CPU doesn't actually support the feature. In both cases
    PG_LEVEL_1G won't actually be enabled, but direct_gbpages is used
    in other areas under the assumption that PG_LEVEL_1G was set. Fix
    this by putting together all the requirements which make this
    feature sensible to enable, and only enabling it (finally flipping
    on PG_LEVEL_1G and leaving it set) when they all hold.

    The feature is then only possible to enable on sensible builds, as
    defined by the new ENABLE_DIRECT_GBPAGES. If the CPU has support
    for it, you can enable it either with the DIRECT_GBPAGES option or
    with the early kernel parameter. If a platform has support for it,
    you can always force-disable it as well.

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-3-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     
  • Replace #ifdef eyesore with IS_ENABLED() use.
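
    The change is roughly of this shape (sketch):

    /* before: */
    int direct_gbpages
    #ifdef CONFIG_DIRECT_GBPAGES
                    = 1
    #endif
    ;

    /* after: */
    int direct_gbpages = IS_ENABLED(CONFIG_DIRECT_GBPAGES);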

    Signed-off-by: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: David Vrabel
    Cc: Dexuan Cui
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: JBeulich@suse.com
    Cc: Jan Beulich
    Cc: Joonsoo Kim
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Pavel Machek
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Toshi Kani
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Cc: julia.lawall@lip6.fr
    Link: http://lkml.kernel.org/r/1425518654-3403-2-git-send-email-mcgrof@do-not-panic.com
    Signed-off-by: Ingo Molnar

    Luis R. Rodriguez
     

28 Feb, 2015

1 commit

  • This effectively unexports set_memory_ro() and set_memory_rw()
    functions, and thus reverts:

    a03352d2c1dc ("x86: export set_memory_ro and set_memory_rw").

    They were introduced for debugging purposes in e1000e, but no
    module user is in the mainline kernel (anymore?), and we
    explicitly do not want modules to use these functions, as they
    e.g. protect eBPF (interpreted & JIT'ed) images from malicious
    modifications or bugs.

    Outside of the eBPF scope, I believe other set_memory_*()
    functions should also be unexported on x86 for modules.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Bruce Allan
    Cc: H. Peter Anvin
    Cc: Jesse Brandeburg
    Cc: Thomas Gleixner
    Cc: davem@davemloft.net
    Link: http://lkml.kernel.org/r/a064393a0a5d319eebde5c761cfd743132d4f213.1425040940.git.daniel@iogearbox.net
    Signed-off-by: Ingo Molnar

    Daniel Borkmann
     

22 Feb, 2015

1 commit

  • Pull misc x86 fixes from Ingo Molnar:
    "This contains:

    - EFI fixes
    - a boot printout fix
    - ASLR/kASLR fixes
    - intel microcode driver fixes
    - other misc fixes

    Most of the linecount comes from an EFI revert"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
    x86/microcode/intel: Handle truncated microcode images more robustly
    x86/microcode/intel: Guard against stack overflow in the loader
    x86, mm/ASLR: Fix stack randomization on 64-bit systems
    x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
    x86/mm/ASLR: Propagate base load address calculation
    Documentation/x86: Fix path in zero-page.txt
    x86/apic: Fix the devicetree build in certain configs
    Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
    x86/efi: Avoid triple faults during EFI mixed mode calls

    Linus Torvalds
     

19 Feb, 2015

6 commits

  • Pull ASLR and kASLR fixes from Borislav Petkov:

    - Add a global flag announcing KASLR state so that relevant code can do
    informed decisions based on its setting. (Jiri Kosina)

    - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The issue is that the stack for processes is not properly randomized on
    64 bit architectures due to an integer overflow.

    The affected function is randomize_stack_top() in file
    "fs/binfmt_elf.c":

    static unsigned long randomize_stack_top(unsigned long stack_top)
    {
            unsigned int random_variable = 0;

            if ((current->flags & PF_RANDOMIZE) &&
                !(current->personality & ADDR_NO_RANDOMIZE)) {
                    random_variable = get_random_int() & STACK_RND_MASK;
                    random_variable <<= PAGE_SHIFT;
            }
            return PAGE_ALIGN(stack_top) - random_variable;
    }

    random_variable is a 32-bit unsigned int; on 64-bit systems
    STACK_RND_MASK is 22 bits wide, so shifting left by PAGE_SHIFT
    (12) overflows it and throws away entropy.

    Signed-off-by: Ismael Ripoll
    [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
    Signed-off-by: Kees Cook
    Cc:
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Al Viro
    Fixes: CVE-2015-1593
    Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
    Signed-off-by: Borislav Petkov

    Hector Marco-Gisbert
     
  • With 32-bit non-PAE kernels, we have 2 page sizes available
    (at most): 4k and 4M.

    Enabling PAE replaces that 4M size with a 2M one (which 64-bit
    systems use too).

    But, when booting a 32-bit non-PAE kernel, in one of our
    early-boot printouts, we say:

    init_memory_mapping: [mem 0x00000000-0x000fffff]
    [mem 0x00000000-0x000fffff] page 4k
    init_memory_mapping: [mem 0x37000000-0x373fffff]
    [mem 0x37000000-0x373fffff] page 2M
    init_memory_mapping: [mem 0x00100000-0x36ffffff]
    [mem 0x00100000-0x003fffff] page 4k
    [mem 0x00400000-0x36ffffff] page 2M
    init_memory_mapping: [mem 0x37400000-0x377fdfff]
    [mem 0x37400000-0x377fdfff] page 4k

    Which is obviously wrong: there is no 2M page available. This is
    probably because of a badly-named variable in the map_range code:
    PG_LEVEL_2M.

    Instead of renaming all the PG_LEVEL_2M's, this patch just fixes
    the printout:

    init_memory_mapping: [mem 0x00000000-0x000fffff]
    [mem 0x00000000-0x000fffff] page 4k
    init_memory_mapping: [mem 0x37000000-0x373fffff]
    [mem 0x37000000-0x373fffff] page 4M
    init_memory_mapping: [mem 0x00100000-0x36ffffff]
    [mem 0x00100000-0x003fffff] page 4k
    [mem 0x00400000-0x36ffffff] page 4M
    init_memory_mapping: [mem 0x37400000-0x377fdfff]
    [mem 0x37400000-0x377fdfff] page 4k
    BRK [0x03206000, 0x03206fff] PGTABLE

    Signed-off-by: Dave Hansen
    Cc: Pekka Enberg
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com
    Signed-off-by: Borislav Petkov

    Dave Hansen
     
  • Setting the flag not just when the feature is available, but also
    clearing it when it isn't, is for consistency, and may allow Xen
    to drop its custom clearing of the flag (unless it needs it
    cleared earlier than this code executes). Note that the change is
    benign on ix86, as the flag starts out clear there.

    Signed-off-by: Jan Beulich
    Cc: Andy Lutomirski
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/54C215D10200007800058912@mail.emea.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • STRICT_DEVMEM and PAT produce the same failure when accessing
    /dev/mem, which is quite confusing to the user. Make the printk
    messages different to lessen the confusion.

    Signed-off-by: Pavel Machek
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Ingo Molnar

    Pavel Machek
     
  • With more embedded systems emerging using Quark, among other
    things, the 32-bit kernel matters again. A 32-bit machine and
    kernel use PAE paging, which currently wastes at least 4K of
    memory per process on Linux, because we have to reserve an entire
    page to support a single 32-byte PGD structure. It would be a very
    good thing if we could eliminate that wastage.

    PAE paging is used to access more than 4GB of memory on x86-32,
    and it is required for NX.

    In this patch, we still allocate one page for the pgd for a Xen
    domain and for the 64-bit kernel, because a one-page pgd is
    assumed in those cases. But we can save memory by allocating only
    a 32-byte pgd for a 32-bit PAE kernel when it is not running as a
    Xen domain.
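
    A condensed sketch of the allocation path (names follow the patch;
    details trimmed):

    /* a PAE top-level pgd holds 4 entries: 4 * 8 bytes = 32 bytes */
    #define PGD_SIZE        (PTRS_PER_PGD * sizeof(pgd_t))
    #define PGD_ALIGN       32

    static struct kmem_cache *pgd_cache;    /* 32-byte objects */

    static pgd_t *_pgd_alloc(void)
    {
            /*
             * A PAE kernel running as a Xen domain does not share the
             * kernel pmd and still needs a whole page for the pgd.
             */
            if (!SHARED_KERNEL_PMD)
                    return (pgd_t *)__get_free_page(PGALLOC_GFP);

            return kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
    }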

    Signed-off-by: Fenghua Yu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Christoph Lameter
    Cc: Dave Hansen
    Cc: Glenn Williamson
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua.yu@intel.com
    Signed-off-by: Ingo Molnar

    Fenghua Yu
     

17 Feb, 2015

1 commit

  • Pull x86 perf updates from Ingo Molnar:
    "This series tightens up RDPMC permissions: currently even highly
    sandboxed x86 execution environments (such as seccomp) have permission
    to execute RDPMC, which may leak various perf events / PMU state such
    as timing information and other CPU execution details.

    This 'all is allowed' RDPMC mode is still preserved as the
    (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is
    that RDPMC access is only allowed if a perf event is mmap-ed (which is
    needed to correctly interpret RDPMC counter values in any case).

    As a side effect of these changes CR4 handling is cleaned up in the
    x86 code and a shadow copy of the CR4 value is added.

    The extra CR4 manipulation adds ~ […]"

    * […] of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
    perf/x86: Only allow rdpmc if a perf_event is mapped
    perf: Pass the event to arch_perf_update_userpage()
    perf: Add pmu callbacks to track event mapping and unmapping
    x86: Add a comment clarifying LDT context switching
    x86: Store a per-cpu shadow copy of CR4
    x86: Clean up cr4 manipulation

    Linus Torvalds
     

14 Feb, 2015

3 commits

  • This feature lets us detect out-of-bounds accesses to global
    variables. It works both for globals in the kernel image and for
    globals in modules. Currently it won't work for symbols in
    user-specified sections (e.g. __init, __read_mostly, ...).

    The idea is simple: the compiler pads each global variable with a
    redzone and adds constructors invoking the
    __asan_register_globals() function. Information about each global
    variable (address, size, size with redzone, ...) is passed to
    __asan_register_globals() so we can poison the variable's redzone.

    This patch also forces module_alloc() to return 8*PAGE_SIZE-aligned
    addresses, making shadow memory handling
    (kasan_module_alloc()/kasan_module_free()) simpler. Such alignment
    guarantees that each shadow page backing the modules' address
    space corresponds to only one module_alloc() allocation.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Stack instrumentation allows detecting out-of-bounds memory
    accesses to variables allocated on the stack. The compiler adds
    redzones around every variable on the stack and poisons the
    redzones in the function prologue.

    Such an approach significantly increases stack usage, so all
    in-kernel stack sizes were doubled.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • This patch adds the arch-specific code for the kernel address
    sanitizer.

    16TB of virtual address space is used for shadow memory. It's
    located in the range [ffffec0000000000 - fffffc0000000000],
    between vmemmap and the %esp fixup stacks.

    At an early stage we map the whole shadow region with the zero
    page. Later, after pages are mapped to the direct-mapping address
    range, we unmap zero pages from the corresponding shadow (see
    kasan_map_shadow()) and allocate and map real shadow memory,
    reusing the vmemmap_populate() function.
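
    For reference, the shadow translation this layout implies: each
    shadow byte covers 8 bytes of memory (hence 16TB of shadow), and
    the mapping is a shift plus an offset (sketch; KASAN_SHADOW_OFFSET
    is provided by the arch):

    #define KASAN_SHADOW_SCALE_SHIFT        3

    static inline void *kasan_mem_to_shadow(const void *addr)
    {
            return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                    + KASAN_SHADOW_OFFSET;
    }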

    Also replace __pa() with __pa_nodebug() before the shadow is
    initialized: with CONFIG_DEBUG_VIRTUAL=y, __pa() makes an external
    function call (__phys_addr), which is instrumented, so
    __asan_load() could be called before the shadow area is
    initialized.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Jim Davis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin