14 Nov, 2018

1 commit

  • [ Upstream commit 9c1442a9d039a1a3302fa93e9a11001c5f23b624 ]

    We currently align the end of the compressed image to a multiple of
    16. However, the PE-COFF header included in the EFI stub says that
    the file alignment is 32 bytes, and when adding an EFI signature to
    the file it must first be padded to this alignment.

    sbsigntool commands warn about this:

    warning: file-aligned section .text extends beyond end of file
    warning: checksum areas are greater than image size. Invalid section table?

    Worse, pesign (at least when creating a detached signature) uses the
    hash of the unpadded file, resulting in an invalid signature if
    padding is required.

    Avoid both these problems by increasing alignment to 32 bytes when
    CONFIG_EFI_STUB is enabled.
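
    The arithmetic involved is plain round-up-to-alignment; a minimal
    standalone sketch (round_up() here is illustrative, not the build
    tool's actual helper):

        #include <stdio.h>

        /* Round size up to a power-of-two alignment: 32 bytes when
         * CONFIG_EFI_STUB is enabled, 16 bytes otherwise. */
        static unsigned long round_up(unsigned long size, unsigned long align)
        {
                return (size + align - 1) & ~(align - 1);
        }

        int main(void)
        {
                unsigned long sz = 6822917;     /* arbitrary image size */

                printf("padded to 16: %lu\n", round_up(sz, 16));
                printf("padded to 32: %lu\n", round_up(sz, 32));
                return 0;
        }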

    Signed-off-by: Ben Hutchings
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     

05 Sep, 2018

1 commit

  • [ Upstream commit 92a4728608a8fd228c572bc8ff50dd98aa0ddf2a ]

    Dirk Gouders reported that two consecutive "make" invocations on an
    already compiled tree will show alternating behaviors:

    $ make
    CALL scripts/checksyscalls.sh
    DESCEND objtool
    CHK include/generated/compile.h
    DATAREL arch/x86/boot/compressed/vmlinux
    Kernel: arch/x86/boot/bzImage is ready (#48)
    Building modules, stage 2.
    MODPOST 165 modules

    $ make
    CALL scripts/checksyscalls.sh
    DESCEND objtool
    CHK include/generated/compile.h
    LD arch/x86/boot/compressed/vmlinux
    ZOFFSET arch/x86/boot/zoffset.h
    AS arch/x86/boot/header.o
    LD arch/x86/boot/setup.elf
    OBJCOPY arch/x86/boot/setup.bin
    OBJCOPY arch/x86/boot/vmlinux.bin
    BUILD arch/x86/boot/bzImage
    Setup is 15644 bytes (padded to 15872 bytes).
    System is 6663 kB
    CRC 3eb90f40
    Kernel: arch/x86/boot/bzImage is ready (#48)
    Building modules, stage 2.
    MODPOST 165 modules

    He bisected it back to:

    commit 98f78525371b ("x86/boot: Refuse to build with data relocations")

    The root cause was the use of the "if_changed" kbuild function multiple
    times for the same target. It was designed to be used only once per
    target; otherwise it effectively always triggers, flipping back and
    forth between the two commands recorded by "if_changed". Instead,
    this patch merges the two commands into a single function to get stable
    build artifacts (i.e. .vmlinux.cmd), and a single build behavior.
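
    Why a twice-used "if_changed" flip-flops can be modeled in a few lines
    of standalone C, with a single recorded string standing in for the
    .vmlinux.cmd file (an illustrative model, not kbuild itself):

        #include <stdio.h>
        #include <string.h>

        static char recorded[64];       /* stands in for .vmlinux.cmd */

        /* Re-run iff the command differs from the one recorded last time. */
        static int if_changed(const char *cmd)
        {
                if (strcmp(recorded, cmd) != 0) {
                        snprintf(recorded, sizeof(recorded), "%s", cmd);
                        return 1;
                }
                return 0;
        }

        int main(void)
        {
                /* Two rules record different commands for one target, so
                 * each invocation sees the other's record and re-runs. */
                for (int pass = 1; pass <= 3; pass++) {
                        if (if_changed("ld ..."))
                                printf("pass %d: LD re-ran\n", pass);
                        if (if_changed("datarel-check ..."))
                                printf("pass %d: DATAREL re-ran\n", pass);
                }
                return 0;
        }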

    Bisected-and-Reported-by: Dirk Gouders
    Fix-Suggested-by: Masahiro Yamada
    Signed-off-by: Kees Cook
    Reviewed-by: Masahiro Yamada
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

23 May, 2018

1 commit

  • commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream.

    Mixed mode allows a kernel built for x86_64 to interact with 32-bit
    EFI firmware, but requires all struct definitions to be laid out
    carefully when it comes to pointer sizes.

    'struct efi_pci_io_protocol_32' currently uses a 'void *' for the
    'romimage' field, which will be interpreted as a 64-bit field
    on such kernels, potentially resulting in bogus memory references
    and subsequent crashes.
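
    A condensed sketch of the problem (only the field named in this
    changelog is shown; neighboring members and their exact types are
    elided or illustrative):

        typedef unsigned int u32;

        struct efi_pci_io_protocol_32_broken {
                /* ... other protocol members ... */
                void *romimage;         /* 8 bytes on an x86_64 kernel */
        };

        struct efi_pci_io_protocol_32_fixed {
                /* ... other protocol members ... */
                u32 romimage;           /* fixed 4 bytes, matching what the
                                         * 32-bit firmware actually wrote */
        };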

    Tested-by: Hans de Goede
    Signed-off-by: Ard Biesheuvel
    Cc:
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

29 Mar, 2018

1 commit

  • commit c55b8550fa57ba4f5e507be406ff9fc2845713e8 upstream.

    Since the x86-64 kernel must be aligned to 2MB, refuse to boot the
    kernel if the alignment of the LOAD segment isn't a multiple of 2MB.
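
    A standalone sketch of the kind of check this implies, using the libc
    ELF definitions (the kernel's real check lives in its boot path and
    differs in detail):

        #include <elf.h>
        #include <stdbool.h>

        #define MIN_KERNEL_ALIGN        (2UL * 1024 * 1024)

        /* Accept only LOAD segments aligned to a multiple of 2MB. */
        static bool load_segment_ok(const Elf64_Phdr *phdr)
        {
                if (phdr->p_type != PT_LOAD)
                        return true;    /* only LOAD segments matter */
                return (phdr->p_align % MIN_KERNEL_ALIGN) == 0;
        }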

    Signed-off-by: H.J. Lu
    Cc: Andy Shevchenko
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Juergen Gross
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    H.J. Lu
     

03 Jan, 2018

1 commit

  • commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream.

    Add the initial files for kernel page table isolation, with a minimal init
    function and the boot time detection for this misfeature.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

20 Dec, 2017

2 commits

  • commit 6d7e0ba2d2be9e50cccba213baf07e0e183c1b24 upstream.

    If the machine does not support the paging mode for which the kernel was
    compiled, the boot process cannot continue.

    It's not possible to let the kernel detect the mismatch as it does not even
    reach the point where cpu features can be evaluated due to a triple fault in
    the KASLR setup.

    Instead of an instantaneous silent reboot, emit an error message which
    tells the user why the boot fails.
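
    For reference, the hardware capability involved can be probed from
    CPUID leaf 7: ECX bit 16 is LA57 (5-level paging). A standalone sketch
    of the probe (the decompressor's actual check happens far earlier and
    is written differently):

        #include <cpuid.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned int a, b, c, d;

                __cpuid_count(7, 0, a, b, c, d);
                printf("la57 (5-level paging) %ssupported\n",
                       (c & (1u << 16)) ? "" : "not ");
                return 0;
        }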

    Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
    Reported-by: Borislav Petkov
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Tested-by: Borislav Petkov
    Cc: Andi Kleen
    Cc: Andy Lutomirski
    Cc: linux-mm@kvack.org
    Cc: Cyrill Gorcunov
    Cc: Linus Torvalds
    Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     
  • commit 08529078d8d9adf689bf39cc38d53979a0869970 upstream.

    Prerequisite for fixing the current problem of instantaneous reboots when a
    5-level paging kernel is booted on 4-level paging hardware.

    At the same time this change prepares the decompression code for boot-time
    switching between 4- and 5-level paging.

    [ tglx: Folded the GCC < 5 fix. ]

    Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y")
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Andy Lutomirski
    Cc: linux-mm@kvack.org
    Cc: Cyrill Gorcunov
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if <5
    lines).
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Sep, 2017

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Transparently fall back to other poweroff method(s) if EFI poweroff
    fails (and returns)

    - Use separate PE/COFF section headers for the RX and RW parts of the
    ARM stub loader so that the firmware can use strict mapping
    permissions

    - Add support for requesting the firmware to wipe RAM at warm reboot

    - Increase the size of the random seed obtained from UEFI so CRNG
    fast init can complete earlier

    - Update the EFI framebuffer address if it points to a BAR that gets
    moved by the PCI resource allocation code

    - Enable "reset attack mitigation" of TPM environments: this is
    enabled if the kernel is configured with
    CONFIG_RESET_ATTACK_MITIGATION=y.

    - Clang related fixes

    - Misc cleanups, constification, refactoring, etc"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi/bgrt: Use efi_mem_type()
    efi: Move efi_mem_type() to common code
    efi/reboot: Make function pointer orig_pm_power_off static
    efi/random: Increase size of firmware supplied randomness
    efi/libstub: Enable reset attack mitigation
    firmware/efi/esrt: Constify attribute_group structures
    firmware/efi: Constify attribute_group structures
    firmware/dcdbas: Constify attribute_group structures
    arm/efi: Split zImage code and data into separate PE/COFF sections
    arm/efi: Replace open coded constants with symbolic ones
    arm/efi: Remove pointless dummy .reloc section
    arm/efi: Remove forbidden values from the PE/COFF header
    drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
    efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
    efi/arm/arm64: Add missing assignment of efi.config_table
    efi/libstub/arm64: Set -fpie when building the EFI stub
    efi/libstub/arm64: Force 'hidden' visibility for section markers
    efi/libstub/arm64: Use hidden attribute for struct screen_info reference
    efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP

    Linus Torvalds
     

05 Sep, 2017

4 commits

  • Pull x86 apic updates from Thomas Gleixner:
    "This update provides:

    - Cleanup of the IDT management including the removal of the extra
    tracing IDT. A first step to cleanup the vector management code.

    - The removal of the paravirt op adjust_exception_frame. This is a
    XEN specific issue, but merged through this branch to avoid nasty
    merge collisions

    - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
    disabled the TSC DEADLINE timer in CPUID.

    - Adjust a debug message in the ioapic code to print out the
    information correctly"

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
    x86/idt: Fix the X86_TRAP_BP gate
    x86/xen: Get rid of paravirt op adjust_exception_frame
    x86/eisa: Add missing include
    x86/idt: Remove superfluous ALIGNment
    x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
    x86/idt: Remove the tracing IDT leftovers
    x86/idt: Hide set_intr_gate()
    x86/idt: Simplify alloc_intr_gate()
    x86/idt: Deinline setup functions
    x86/idt: Remove unused functions/inlines
    x86/idt: Move interrupt gate initialization to IDT code
    x86/idt: Move APIC gate initialization to tables
    x86/idt: Move regular trap init to tables
    x86/idt: Move IST stack based traps to table init
    x86/idt: Move debug stack init to table based
    x86/idt: Switch early trap init to IDT tables
    x86/idt: Prepare for table based init
    x86/idt: Move early IDT setup out of 32-bit asm
    x86/idt: Move early IDT handler setup to IDT code
    x86/idt: Consolidate IDT invalidation
    ...

    Linus Torvalds
     
  • Pull x86 mm changes from Ingo Molnar:
    "PCID support, 5-level paging support, Secure Memory Encryption support

    The main changes in this cycle are support for three new, complex
    hardware features of x86 CPUs:

    - Add 5-level paging support, which is a new hardware feature on
    upcoming Intel CPUs allowing up to 128 PB of virtual address space
    and 4 PB of physical RAM space - a 512-fold increase over the old
    limits. (Supercomputers of the future forecasting hurricanes on an
    ever warming planet can certainly make good use of more RAM.)

    Many of the necessary changes went upstream in previous cycles,
    v4.14 is the first kernel that can enable 5-level paging.

    This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
    default.

    (By Kirill A. Shutemov)

    - Add 'encrypted memory' support, which is a new hardware feature on
    upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
    RAM to be encrypted and decrypted (mostly) transparently by the
    CPU, with a little help from the kernel to transition to/from
    encrypted RAM. Such RAM should be more secure against various
    attacks like RAM access via the memory bus and should make the
    radio signature of memory bus traffic harder to intercept (and
    decrypt) as well.

    This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
    by default.

    (By Tom Lendacky)

    - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
    hardware feature that attaches an address space tag to TLB entries
    and thus allows to skip TLB flushing in many cases, even if we
    switch mm's.

    (By Andy Lutomirski)

    All three of these features were in the works for a long time, and
    it's coincidence of the three independent development paths that they
    are all enabled in v4.14 at once"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
    x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
    x86/mm: Use pr_cont() in dump_pagetable()
    x86/mm: Fix SME encryption stack ptr handling
    kvm/x86: Avoid clearing the C-bit in rsvd_bits()
    x86/CPU: Align CR3 defines
    x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
    acpi, x86/mm: Remove encryption mask from ACPI page protection type
    x86/mm, kexec: Fix memory corruption with SME on successive kexecs
    x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
    x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
    x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
    x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
    x86/mm: Allow userspace have mappings above 47-bit
    x86/mm: Prepare to expose larger address space to userspace
    x86/mpx: Do not allow MPX if we have mappings above 47-bit
    x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
    x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
    x86/mm/dump_pagetables: Fix printout of p4d level
    x86/mm/dump_pagetables: Generalize address normalization
    x86/boot: Fix memremap() related build failure
    ...

    Linus Torvalds
     
  • Pull x86 boot updates from Ingo Molnar:
    "The main changes are KASL related fixes and cleanups: in particular we
    now exclude certain physical memory ranges as KASLR randomization
    targets that have proven to be unreliable (early-)RAM on some firmware
    versions"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice
    x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address
    efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor
    x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()
    x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()
    x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()

    Linus Torvalds
     
  • Pull x86 asm updates from Ingo Molnar:

    - Introduce the ORC unwinder, which can be enabled via
    CONFIG_ORC_UNWINDER=y.

    The ORC unwinder is a lightweight, Linux kernel specific debuginfo
    implementation, which aims to be DWARF done right for unwinding.
    Objtool is used to generate the ORC unwinder tables during build, so
    the data format is flexible and kernel internal: there's no
    dependency on debuginfo created by an external toolchain.

    The ORC unwinder is almost two orders of magnitude faster than the
    (out of tree) DWARF unwinder - which is important for perf call graph
    profiling. It is also significantly simpler and is coded defensively:
    there has not been a single ORC related kernel crash so far, even
    with early versions. (knock on wood!)

    But the main advantage is that enabling the ORC unwinder allows
    CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
    measurably:

    With frame pointers disabled, GCC does not have to add frame pointer
    instrumentation code to every function in the kernel. The kernel's
    .text size decreases by about 3.2%, resulting in better cache
    utilization and fewer instructions executed, resulting in a broad
    kernel-wide speedup. Average speedup of system calls should be
    roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
    a speedup of 5-10% for some function execution intense workloads.

    The main cost of the unwinder is that the unwinder data has to be
    stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
    config - which is a modest cost on modern x86 systems.

    Given how young the ORC unwinder code is, it's not enabled by default
    - but given the performance advantages the plan is to eventually make
    it the default unwinder on x86.

    See Documentation/x86/orc-unwinder.txt for more details.

    - Remove lguest support: its intended role was that of a temporary
    proof of concept for virtualization, plus its removal will enable the
    reduction (removal) of the paravirt API as well, so Rusty agreed to
    its removal. (Juergen Gross)

    - Clean up and fix FSGS related functionality (Andy Lutomirski)

    - Clean up IO access APIs (Andy Shevchenko)

    - Enhance the symbol namespace (Jiri Slaby)

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
    objtool: Handle GCC stack pointer adjustment bug
    x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
    x86/fpu/math-emu: Add ENDPROC to functions
    x86/boot/64: Extract efi_pe_entry() from startup_64()
    x86/boot/32: Extract efi_pe_entry() from startup_32()
    x86/lguest: Remove lguest support
    x86/paravirt/xen: Remove xen_patch()
    objtool: Fix objtool fallthrough detection with function padding
    x86/xen/64: Fix the reported SS and CS in SYSCALL
    objtool: Track DRAP separately from callee-saved registers
    objtool: Fix validate_branch() return codes
    x86: Clarify/fix no-op barriers for text_poke_bp()
    x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
    selftests/x86/fsgsbase: Test selectors 1, 2, and 3
    x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
    x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
    x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
    x86/asm/32: Fix regs_get_register() on segment registers
    x86/xen/64: Rearrange the SYSCALL entries
    x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
    ...

    Linus Torvalds
     

31 Aug, 2017

1 commit

  • x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_*
    and EFI_LOADER_* from KASLR's choice

    There's a potential bug in how we select the KASLR kernel address in
    the early boot code.

    The KASLR boot code currently chooses the kernel image's physical memory
    location from E820_TYPE_RAM regions by walking over all e820 entries.

    E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA
    as well, so those regions can end up hosting the kernel image. According to
    the UEFI spec, all memory regions marked as EfiBootServicesCode and
    EfiBootServicesData are available as free memory after the first call
    to ExitBootServices(), i.e. such regions should be usable for the
    kernel, per spec.

    In real life however, we have workarounds for broken x86 firmware,
    where we keep such regions reserved until SetVirtualAddressMap() is done.

    See the following code in should_map_region():

    static bool should_map_region(efi_memory_desc_t *md)
    {
            ...
            /*
             * Map boot services regions as a workaround for buggy
             * firmware that accesses them even when they shouldn't.
             *
             * See efi_{reserve,free}_boot_services().
             */
            if (md->type == EFI_BOOT_SERVICES_CODE ||
                md->type == EFI_BOOT_SERVICES_DATA)
                    return false;
            ...
    }

    This workaround suppressed a boot crash, but potential issues still
    remain because nothing prevents KASLR from overlapping the kernel
    image with those regions.

    So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never
    chosen as kernel memory for the workaround to work fine.

    Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because
    they can be used after ExitBootServices() as defined in EFI spec.

    As a result, we choose the kernel address only from EFI_CONVENTIONAL_MEMORY,
    which is the only memory type we know to be safely free.
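
    A hedged sketch of the resulting walker (names follow the commits
    referenced elsewhere in this log; details abridged):

        for (i = 0; i < nr_desc; i++) {
                efi_memory_desc_t *md = efi_early_memdesc_ptr(pmap, desc_size, i);

                /* Rejects EFI_BOOT_SERVICES_* and EFI_LOADER_* as well. */
                if (md->type != EFI_CONVENTIONAL_MEMORY)
                        continue;

                region.start = md->phys_addr;
                region.size = md->num_pages << EFI_PAGE_SHIFT;
                process_mem_region(&region, minimum, image_size);
        }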

    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Junichi Nomura <j-nomura@ce.jp.nec.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Garnier <thgarnie@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp
    [ Rewrote/fixed/clarified the changelog and the in code comments. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Naoya Horiguchi
     

29 Aug, 2017

5 commits

  • If a zero for the number of lines manages to slip through, scroll()
    may underflow some offset calculations, causing accesses outside the
    video memory.

    Make the check in __putstr() more pessimistic to prevent that.
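
    The guard amounts to bailing out when either screen dimension is zero,
    along these lines (a sketch, not the verbatim diff):

        void __putstr(const char *s)
        {
                if (lines == 0 || cols == 0)
                        return;         /* scroll() would underflow offsets */
                ...
        }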

    Signed-off-by: Jan H. Schönherr
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de
    Signed-off-by: Ingo Molnar

    Jan H. Schönherr
     
    The current slack space is not enough for LZ4, which has a worst-case
    overhead of 0.4% for data that cannot be further compressed. With an
    LZ4-compressed kernel with an embedded initrd, the output is likely
    to overwrite the input.

    Increase the slack space to avoid that.
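
    For scale: LZ4's worst case for incompressible input is bounded by
    size + size/255 + 16 bytes (the classic LZ4_COMPRESSBOUND formula),
    i.e. the ~0.4% above. A worked example for a 64 MB payload:

        #include <stdio.h>

        int main(void)
        {
                unsigned long in = 64UL << 20;                  /* 64 MB */
                unsigned long bound = in + in / 255 + 16;       /* LZ4 bound */

                printf("input %lu, worst-case output %lu (+%lu bytes)\n",
                       in, bound, bound - in);
                return 0;
        }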

    Signed-off-by: Jan H. Schönherr
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de
    Signed-off-by: Ingo Molnar

    Jan H. Schönherr
     
    Similarly to the 32-bit code, the efi_pe_entry() body is somehow squashed
    into startup_64().

    In the old days, we forced startup_64() to start at offset 0x200 and
    efi_pe_entry() to start at 0x210. But this requirement was removed a long
    time ago, in:

    99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code")

    The way it is now makes the code less readable and illogical. Given
    we can now safely extract the inlined efi_pe_entry() body from
    startup_64() into a separate function, we do so.

    We also annotate the function appropriately by ENTRY+ENDPROC.

    ABI offsets are preserved:

    0000000000000000 T startup_32
    0000000000000200 T startup_64
    0000000000000390 T efi64_stub_entry

    On the top-level, it looked like:

    .org 0x200
    ENTRY(startup_64)
    #ifdef CONFIG_EFI_STUB ; start of inlined
    jmp preferred_addr
    GLOBAL(efi_pe_entry)
    ... ; a lot of assembly (efi_pe_entry)
    leaq preferred_addr(%rax), %rax
    jmp *%rax
    preferred_addr:
    #endif ; end of inlined
    ... ; a lot of assembly (startup_64)
    ENDPROC(startup_64)

    And it is now converted into:

    .org 0x200
    ENTRY(startup_64)
    ... ; a lot of assembly (startup_64)
    ENDPROC(startup_64)

    #ifdef CONFIG_EFI_STUB
    ENTRY(efi_pe_entry)
    ... ; a lot of assembly (efi_pe_entry)
    leaq startup_64(%rax), %rax
    jmp *%rax
    ENDPROC(efi_pe_entry)
    #endif

    Signed-off-by: Jiri Slaby
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: David Woodhouse
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: ard.biesheuvel@linaro.org
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz
    Signed-off-by: Ingo Molnar

    Jiri Slaby
     
  • The efi_pe_entry() body is somehow squashed into startup_32(). In the old days,
    we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start
    at 0x10.

    But this requirement was removed a long time ago, in:

    99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code")

    The way it is now makes the code less readable and illogical. Given
    we can now safely extract the inlined efi_pe_entry() body from
    startup_32() into a separate function, we do so, separating it into the
    two functions as they are already marked: efi_pe_entry() + efi32_stub_entry().

    We also annotate the functions appropriately by ENTRY+ENDPROC.

    ABI offset is preserved:

    0000 128 FUNC GLOBAL DEFAULT 6 startup_32
    0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry
    00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry

    On the top-level, it looked like this:

    ENTRY(startup_32)
    #ifdef CONFIG_EFI_STUB ; start of inlined
    jmp preferred_addr
    ENTRY(efi_pe_entry)
    ... ; a lot of assembly (efi_pe_entry)
    ENTRY(efi32_stub_entry)
    ... ; a lot of assembly (efi32_stub_entry)
    leal preferred_addr(%eax), %eax
    jmp *%eax
    preferred_addr:
    #endif ; end of inlined
    ... ; a lot of assembly (startup_32)
    ENDPROC(startup_32)

    And it is now converted into:

    ENTRY(startup_32)
    ... ; a lot of assembly (startup_32)
    ENDPROC(startup_32)

    #ifdef CONFIG_EFI_STUB
    ENTRY(efi_pe_entry)
    ... ; a lot of assembly (efi_pe_entry)
    ENDPROC(efi_pe_entry)

    ENTRY(efi32_stub_entry)
    ... ; a lot of assembly (efi32_stub_entry)
    leal startup_32(%eax), %eax
    jmp *%eax
    ENDPROC(efi32_stub_entry)
    #endif

    Signed-off-by: Jiri Slaby
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: David Woodhouse
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: ard.biesheuvel@linaro.org
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz
    Signed-off-by: Ingo Molnar

    Jiri Slaby
     
    The first 32 bits of the gate struct are the same for 32 and 64 bit
    kernels.

    The 32-bit version uses desc_struct and no designated data structure,
    so we need different accessors for 32 and 64 bit kernels.

    Aside from that, the macros which are necessary to build the 32-bit
    gate descriptor are horrible to read.

    Unify the gate structs and switch all code fiddling with it over.
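
    The unified layout ends up looking roughly like this (a sketch; the
    authoritative definition is in arch/x86/include/asm/desc_defs.h):

        struct idt_bits {
                u16 ist : 3, zero : 5, type : 5, dpl : 2, p : 1;
        } __attribute__((packed));

        struct gate_struct {
                u16 offset_low;
                u16 segment;
                struct idt_bits bits;
                u16 offset_middle;
        #ifdef CONFIG_X86_64
                u32 offset_high;
                u32 reserved;
        #endif
        } __attribute__((packed));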

    Signed-off-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

26 Aug, 2017

2 commits

  • If a machine is reset while secrets are present in RAM, it may be
    possible for code executed after the reboot to extract those secrets
    from untouched memory. The Trusted Computing Group specified a mechanism
    for requesting that the firmware clear all RAM on reset before booting
    another OS. This is done by setting the MemoryOverwriteRequestControl
    variable at startup. If userspace can ensure that all secrets are
    removed as part of a controlled shutdown, it can reset this variable to
    0 before triggering a hardware reboot.
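
    From the userspace side, clearing the request at controlled shutdown is
    a plain efivarfs write; a hedged standalone sketch (the GUID is assumed
    to be TCG's MEMORY_ONLY_RESET_CONTROL_GUID; efivarfs expects 4 bytes of
    attributes followed by the data, and the file may need its immutable
    flag cleared first):

        #include <stdio.h>

        int main(void)
        {
                /* NV | BS | RT attributes (0x7, LE), then the value 0 */
                const unsigned char buf[5] = { 0x07, 0x00, 0x00, 0x00, 0x00 };
                FILE *f = fopen("/sys/firmware/efi/efivars/"
                                "MemoryOverwriteRequestControl-"
                                "e20939be-32d4-41be-a150-897f85d49829", "wb");

                if (!f)
                        return 1;
                fwrite(buf, 1, sizeof(buf), f);
                return fclose(f);
        }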

    Signed-off-by: Matthew Garrett
    Signed-off-by: Ard Biesheuvel
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170825155019.6740-2-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar

    Matthew Garrett
     
  • Conflicts:
    arch/x86/kernel/head64.c
    arch/x86/mm/mmap.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

17 Aug, 2017

3 commits

  • Currently KASLR will parse all e820 entries of RAM type and add all
    candidate positions into the slots array. After that we choose one slot
    randomly as the new position which the kernel will be decompressed into
    and run at.

    On systems with EFI enabled, e820 memory regions are derived from EFI
    memory regions by combining adjacent regions.

    These EFI memory regions have various attributes, and the "mirrored"
    attribute is one of them. Physical memory regions whose descriptors in
    the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16)
    are mirrored. The address range mirroring feature of the kernel arranges
    such mirrored regions into normal zones and other regions into movable
    zones.

    With the mirroring feature enabled, the code and data of the kernel can only
    be located in the more reliable mirrored regions. However, the current KASLR
    code doesn't check EFI memory entries, and could choose a new kernel position
    in non-mirrored regions. This will break the intended functionality of the
    address range mirroring feature.

    To fix this, if EFI is detected, iterate over the EFI memory map and only
    pick mirrored regions as candidates for randomization slots. If EFI is
    disabled or no mirrored region is found, fall back to processing the e820
    memory map.
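
    In sketch form (abridged; helper names as used elsewhere in this log),
    the EFI pass only admits mirrored descriptors:

        if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
                region.start = md->phys_addr;
                region.size = md->num_pages << EFI_PAGE_SHIFT;
                process_mem_region(&region, minimum, image_size);
                efi_mirror_found = true;        /* suppresses the e820 pass */
        }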

    Signed-off-by: Baoquan He
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: ard.biesheuvel@linaro.org
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: keescook@chromium.org
    Cc: linux-efi@vger.kernel.org
    Cc: matt@codeblueprint.co.uk
    Cc: n-horiguchi@ah.jp.nec.com
    Cc: thgarnie@google.com
    Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com
    [ Rewrote most of the text. ]
    Signed-off-by: Ingo Molnar

    Baoquan He
     
  • The existing map iteration helper for_each_efi_memory_desc_in_map can
    only be used after the kernel initializes the EFI subsystem to set up
    struct efi_memory_map.

    Before that we also need to iterate over map descriptors which are stored
    in several intermediate structures, like struct efi_boot_memmap for
    arch-independent usage and struct efi_info for x86 arch only.

    Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
    replace several places where that primitive is open coded.
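
    The helper itself is little more than pointer arithmetic over the raw
    map; a sketch (desc_size comes from firmware and may be larger than
    sizeof(efi_memory_desc_t), which is why plain array indexing would be
    wrong):

        static inline efi_memory_desc_t *
        efi_early_memdesc_ptr(unsigned long map, unsigned long desc_size, int n)
        {
                return (efi_memory_desc_t *)(map + (n * desc_size));
        }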

    Signed-off-by: Baoquan He
    [ Various improvements to the text. ]
    Acked-by: Matt Fleming
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: ard.biesheuvel@linaro.org
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: keescook@chromium.org
    Cc: linux-efi@vger.kernel.org
    Cc: n-horiguchi@ah.jp.nec.com
    Cc: thgarnie@google.com
    Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1
    Signed-off-by: Ingo Molnar

    Baoquan He
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Jul, 2017

1 commit

    The clang warning 'address-of-packed-member' is disabled for the general
    kernel code; also disable it for the x86 boot code.

    This suppresses a bunch of warnings like this when building with clang:

    ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
    packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
    unaligned pointer value [-Waddress-of-packed-member]
    return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
    ^~~~~~~~~~~~~~~~~~~
    ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
    'this_cpu_read_stable'
    #define this_cpu_read_stable(var) percpu_stable_op("mov", var)
    ^~~
    ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
    'percpu_stable_op'
    : "p" (&(var)));
    ^~~

    Signed-off-by: Matthias Kaehlcke
    Cc: Doug Anderson
    Cc: Linus Torvalds
    Cc: Masahiro Yamada
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
    Signed-off-by: Ingo Molnar

    Matthias Kaehlcke
     

25 Jul, 2017

1 commit

    undef memcpy() and friends in boot/string.c so that the functions
    defined here will have the correct names, otherwise we end up
    trying to redefine __builtin_memcpy() etc.

    Surprisingly, GCC allows this (and, helpfully, discards the
    __builtin_ prefix from the function name when compiling it),
    but clang does not.

    Adding these #undef's appears to preserve what I assume was
    the original intent of the code.
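
    The shape of the fix in boot/string.c (a sketch):

        #undef memcpy
        #undef memset
        #undef memcmp

        void *memcpy(void *dst, const void *src, size_t len)
        {
                ...
        }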

    Signed-off-by: Michael Davidson
    Signed-off-by: Matthias Kaehlcke
    Acked-by: H. Peter Anvin
    Cc: Arnd Bergmann
    Cc: Bernhard.Rosenkranzer@linaro.org
    Cc: Greg Hackmann
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Nick Desaulniers
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170724235155.79255-1-mka@chromium.org
    Signed-off-by: Ingo Molnar

    Michael Davidson
     

20 Jul, 2017

1 commit

  • Every kernel build on x86 will result in some output:

    Setup is 13084 bytes (padded to 13312 bytes).
    System is 4833 kB
    CRC 6d35fa35
    Kernel: arch/x86/boot/bzImage is ready (#2)

    This shuts it up, so that 'make -s' is truly silent as long as
    everything works. Building without '-s' should produce unchanged
    output.

    Signed-off-by: Arnd Bergmann
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170719125310.2487451-6-arnd@arndb.de
    Signed-off-by: Ingo Molnar

    Arnd Bergmann
     

18 Jul, 2017

4 commits

  • Changes to the existing page table macros will allow the SME support to
    be enabled in a simple fashion with minimal changes to files that use these
    macros. Since the memory encryption mask will now be part of the regular
    pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
    _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
    without the encryption mask before SME becomes active. Two new pgprot()
    macros are defined to allow setting or clearing the page encryption mask.

    The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
    not support encryption for MMIO areas so this define removes the encryption
    mask from the page attribute.

    Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
    creating a physical address with the encryption mask. These are used when
    working with the cr3 register so that the PGD can be encrypted. The current
    __va() macro is updated so that the virtual address is generated based off
    of the physical address without the encryption mask thus allowing the same
    virtual address to be generated regardless of whether encryption is enabled
    for that physical location or not.

    Also, an early initialization function is added for SME. If SME is active,
    this function:

    - Updates the early_pmd_flags so that early page faults create mappings
    with the encryption mask.

    - Updates the __supported_pte_mask to include the encryption mask.

    - Updates the protection_map entries to include the encryption mask so
    that user-space allocations will automatically have the encryption mask
    applied.
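
    A sketch of the two address helpers described above (assuming the mask
    variable is named sme_me_mask, as in the rest of the SME series):

        #define __sme_pa(x)             (__pa(x) | sme_me_mask)
        #define __sme_pa_nodebug(x)     (__pa_nodebug(x) | sme_me_mask)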

    Signed-off-by: Tom Lendacky
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Dave Young
    Cc: Dmitry Vyukov
    Cc: Jonathan Corbet
    Cc: Konrad Rzeszutek Wilk
    Cc: Larry Woodman
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc: kasan-dev@googlegroups.com
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
    Signed-off-by: Ingo Molnar

    Tom Lendacky
     
    Now that process_e820_entry() is no longer limited to e820 entry
    processing, rename it to process_mem_region() and adjust the code
    comments accordingly.

    Signed-off-by: Baoquan He
    Acked-by: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: matt@codeblueprint.co.uk
    Cc: thgarnie@google.com
    Link: http://lkml.kernel.org/r/1499603862-11516-4-git-send-email-bhe@redhat.com
    Signed-off-by: Ingo Molnar

    Baoquan He
     
    This makes process_e820_entry() able to process any kind of memory
    region.

    Signed-off-by: Baoquan He
    Acked-by: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: matt@codeblueprint.co.uk
    Cc: thgarnie@google.com
    Link: http://lkml.kernel.org/r/1499603862-11516-3-git-send-email-bhe@redhat.com
    Signed-off-by: Ingo Molnar

    Baoquan He
     
    The original function process_e820_entry() only takes care of each
    e820 entry passed in.

    Move the E820_TYPE_RAM checking logic into process_e820_entries(),
    and remove the redundant local variable 'addr' definition in
    find_random_phys_addr().

    Signed-off-by: Baoquan He
    Acked-by: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: fanc.fnst@cn.fujitsu.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: matt@codeblueprint.co.uk
    Cc: thgarnie@google.com
    Link: http://lkml.kernel.org/r/1499603862-11516-2-git-send-email-bhe@redhat.com
    Signed-off-by: Ingo Molnar

    Baoquan He
     

13 Jul, 2017

1 commit

  • This adds support for compiling with a rough equivalent to the glibc
    _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
    overflow checks for string.h functions when the compiler determines the
    size of the source or destination buffer at compile-time. Unlike glibc,
    it covers buffer reads in addition to writes.

    GNU C __builtin_*_chk intrinsics are avoided because they would force a
    much more complex implementation. They aren't designed to detect read
    overflows and offer no real benefit when using an implementation based
    on inline checks. Inline checks don't add up to much code size and
    allow full use of the regular string intrinsics while avoiding the need
    for a bunch of _chk functions and per-arch assembly to avoid wrapper
    overhead.

    This detects various overflows at compile-time in various drivers and
    some non-x86 core kernel code. There will likely be issues caught in
    regular use at runtime too.
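
    A trimmed sketch of the inline-check pattern (helper names follow the
    fortify patch; not byte-for-byte the kernel code):

        __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
        {
                size_t p_size = __builtin_object_size(p, 0);
                size_t q_size = __builtin_object_size(q, 0);

                if (__builtin_constant_p(size)) {
                        if (p_size < size)
                                __write_overflow();     /* compile-time error */
                        if (q_size < size)
                                __read_overflow2();
                }
                if (p_size < size || q_size < size)
                        fortify_panic(__func__);        /* runtime check */
                return __builtin_memcpy(p, q, size);
        }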

    Future improvements left out of initial implementation for simplicity,
    as it's all quite optional and can be done incrementally:

    * Some of the fortified string functions (strncpy, strcat) don't yet
    place a limit on reads from the source based on __builtin_object_size of
    the source buffer.

    * Extending coverage to more string functions like strlcat.

    * It should be possible to optionally use __builtin_object_size(x, 1) for
    some functions (C strings) to detect intra-object overflows (like
    glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
    approach to avoid likely compatibility issues.

    * The compile-time checks should be made available via a separate config
    option which can be enabled by default (or always enabled) once enough
    time has passed to get the issues it catches fixed.

    Kees said:
    "This is great to have. While it was out-of-tree code, it would have
    blocked at least CVE-2016-3858 from being exploitable (improper size
    argument to strlcpy()). I've sent a number of fixes for
    out-of-bounds-reads that this detected upstream already"

    [arnd@arndb.de: x86: fix fortified memcpy]
    Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
    [keescook@chromium.org: avoid panic() in favor of BUG()]
    Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
    [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
    Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
    Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
    Signed-off-by: Daniel Micay
    Signed-off-by: Kees Cook
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Cc: Mark Rutland
    Cc: Daniel Axtens
    Cc: Rasmus Villemoes
    Cc: Andy Shevchenko
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Micay
     

04 Jul, 2017

3 commits

  • Pull x86 mm updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Continued work to add support for 5-level paging provided by future
    Intel CPUs. In particular we switch the x86 GUP code to the generic
    implementation. (Kirill A. Shutemov)

    - Continued work to add PCID CPU support to native kernels as well.
    In this round most of the focus is on reworking/refreshing the TLB
    flush infrastructure for the upcoming PCID changes. (Andy
    Lutomirski)"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
    x86/mm: Delete a big outdated comment about TLB flushing
    x86/mm: Don't reenter flush_tlb_func_common()
    x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
    x86/ftrace: Exclude functions in head64.c from function-tracing
    x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
    x86/mm: Remove reset_lazy_tlbstate()
    x86/ldt: Simplify the LDT switching logic
    x86/boot/64: Put __startup_64() into .head.text
    x86/mm: Add support for 5-level paging for KASLR
    x86/mm: Make kernel_physical_mapping_init() support 5-level paging
    x86/mm: Add sync_global_pgds() for configuration with 5-level paging
    x86/boot/64: Add support of additional page table level during early boot
    x86/boot/64: Rename init_level4_pgt and early_level4_pgt
    x86/boot/64: Rewrite startup_64() in C
    x86/boot/compressed: Enable 5-level paging during decompression stage
    x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
    x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
    x86/boot/efi: Cleanup initialization of GDT entries
    x86/asm: Fix comment in return_from_SYSCALL_64()
    x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
    ...

    Linus Torvalds
     
  • Pull x86 cleanups from Ingo Molnar:
    "Two small cleanups"

    * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/paravirt: Remove unnecessary return from void function
    x86/boot: Add missing strchr() declaration

    Linus Torvalds
     
  • Pull x86 boot updates from Ingo Molnar:
    "The main changes in this cycle were KASLR improvements for rare
    environments with special boot options, by Baoquan He. Also misc
    smaller changes/cleanups"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/debug: Extend the lower bound of crash kernel low reservations
    x86/boot: Remove unused copy_*_gs() functions
    x86/KASLR: Use the right memcpy() implementation
    Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
    x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
    x86/KASLR: Parse all 'memmap=' boot option entries

    Linus Torvalds
     

30 Jun, 2017

3 commits

    KASLR uses a hack to detect whether we booted via startup_32() or
    startup_64(): it checks what is loaded into cr3 and compares it to
    _pgtables. _pgtables is the array of page tables where early code
    allocates page tables from.

    KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but
    that's not true if we booted with 5-level paging enabled. In this case the
    top-level page table is allocated separately and only the first p4d page
    table is allocated from the array.

    Let's modify the check to cover both 4- and 5-level paging cases.

    The patch also renames 'level4p' to 'top_level_pgt' as it can now hold
    the page table for the 4th or 5th level, depending on the configuration.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     
    Kernel text KASLR is separated into physical address and virtual
    address randomization. For virtual address randomization, we only
    randomize to get an offset between 16M and KERNEL_IMAGE_SIZE.
    So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
    not the original kernel loading address 'output'.

    The bug will cause kernel boot failure if the kernel is loaded at a
    different position than the address, 16M, which is decided at compile
    time. Kexec/kdump is such a practical case.

    To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial
    value.
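
    The essence of the fix, in sketch form:

        /* was: virt_addr seeded from the physical load address 'output' */
        unsigned long virt_addr = LOAD_PHYSICAL_ADDR;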

    Tested-by: Dave Young
    Signed-off-by: Baoquan He
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately")
    Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
    Signed-off-by: Ingo Molnar

    Baoquan He
     
    For kernel text KASLR, the virtual address is confined to an area of 1G,
    [0xffffffff80000000, 0xffffffffc0000000). For the implementation of
    virtual address randomization, we only randomize to get an offset
    between 16M and 1G, then add this offset to the starting address,
    0xffffffff80000000. Here 16M is the offset which is decided at linking
    stage. So the value of the local variable 'virt_addr', which represents
    the offset, plus the kernel output size cannot exceed KERNEL_IMAGE_SIZE.

    Add a debug check for the offset. If out of bounds, print an error
    message and hang.
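
    A hedged sketch of the added check ('needed_size' is illustrative;
    the error text is approximate):

        if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
                error("Destination virtual address is beyond the kernel mapping area");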

    Suggested-by: Ingo Molnar
    Signed-off-by: Baoquan He
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com
    Signed-off-by: Ingo Molnar

    Baoquan He
     

24 Jun, 2017

1 commit

  • The Sparse static analyzer emits this warning:

    symbol 'strchr' was not declared. Should it be static?

    This patch adds the appropriate extern declaration to string.h
    to fix the warning.
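
    That is, a one-line prototype along these lines in
    arch/x86/boot/string.h:

        char *strchr(const char *s, int c);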

    Signed-off-by: Tommy Nguyen
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170623143601.GA20743@NoChina
    Signed-off-by: Ingo Molnar

    Tommy Nguyen
     

13 Jun, 2017

1 commit

    We need to cover two basic cases: when the bootloader left us in 32-bit
    mode and when the bootloader enabled long mode.

    The patch implements a unified codepath to enable 5-level paging for both
    cases. It means that when we start in 32-bit mode, we first enable long
    mode with 4-level paging and then switch over to 5-level paging.

    Switching from 4-level to 5-level paging is not trivial. We cannot do it
    directly. Setting LA57 in long mode would trigger #GP. So we need to
    switch off long mode first and then re-enable it with 5-level paging.

    NOTE: The need to switch off long mode means we are in trouble if the
    bootloader put us above the 4G boundary. If the bootloader wants to boot a
    5-level paging kernel, it has to put the kernel below 4G or enable 5-level
    paging on its own, so that we can avoid this step.

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20170606113133.22974-7-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov