03 Aug, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
    Kees Cook. The plugins are meant to be used for static analysis of
    the kernel code. Two plugins are provided already.

    - reduction of the GCC command line by Arnd Bergmann.

    - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada (see
    the sketch after this list)

    - bin2c fix by Michael Tautschnig

    - setlocalversion fix by Wolfram Sang
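
    As a quick illustration of the IS_ENABLED()/IS_REACHABLE() helpers (a
    hedged sketch; CONFIG_FOO_DEBUG, CONFIG_FOO_HELPER and
    foo_helper_notify() are hypothetical names):

    #include <linux/errno.h>
    #include <linux/kconfig.h>
    #include <linux/printk.h>

    int foo_helper_notify(void);    /* hypothetical, possibly modular helper */

    static int foo_notify(void)
    {
            /* IS_ENABLED() is 1 when the option is built-in or modular */
            if (IS_ENABLED(CONFIG_FOO_DEBUG))
                    pr_debug("foo: notifying\n");

            /* IS_REACHABLE() is 1 only when this code can actually link
             * against the helper (built-in, or both sides modular) */
            if (!IS_REACHABLE(CONFIG_FOO_HELPER))
                    return -ENODEV;

            return foo_helper_notify();
    }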

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    gcc-plugins: disable under COMPILE_TEST
    kbuild: Abort build on bad stack protector flag
    scripts: Fix size mismatch of kexec_purgatory_size
    kbuild: make samples depend on headers_install
    Kbuild: don't add obj tree in additional includes
    Kbuild: arch: look for generated headers in obtree
    Kbuild: always prefix objtree in LINUXINCLUDE
    Kbuild: avoid duplicate include path
    Kbuild: don't add ../../ to include path
    vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
    kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
    kconfig.h: use already defined macros for IS_REACHABLE() define
    export.h: use __is_defined() to check if __KSYM_* is defined
    kconfig.h: use __is_defined() to check if MODULE is defined
    kbuild: setlocalversion: print error to STDERR
    Add sancov plugin
    Add Cyclomatic complexity GCC plugin
    GCC plugin infrastructure
    Shared library support

    Linus Torvalds
     

26 Jul, 2016

2 commits

  • Pull x86 boot updates from Ingo Molnar:
    "The main changes:

    - add initial commits to randomize kernel memory section virtual
    addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
    (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)

    - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
    Cook)

    - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)

    - misc cleanups/fixes"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/boot: Simplify EBDA-vs-BIOS reservation logic
    x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
    x86/boot: Reorganize and clean up the BIOS area reservation code
    x86/mm: Do not reference phys addr beyond kernel
    x86/mm: Add memory hotplug support for KASLR memory randomization
    x86/mm: Enable KASLR for vmalloc memory regions
    x86/mm: Enable KASLR for physical mapping memory regions
    x86/mm: Implement ASLR for kernel memory regions
    x86/mm: Separate variable for trampoline PGD
    x86/mm: Add PUD VA support for physical mapping
    x86/mm: Update physical mapping variable names
    x86/mm: Refactor KASLR entropy functions
    x86/KASLR: Fix boot crash with certain memory configurations
    x86/boot/64: Add forgotten end of function marker
    x86/KASLR: Allow randomization below the load address
    x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
    x86/KASLR: Randomize virtual address separately
    x86/KASLR: Clarify identity map interface
    x86/boot: Refuse to build with data relocations
    x86/KASLR, x86/power: Remove x86 hibernation restrictions

    Linus Torvalds
     
  • Pull x86 mm updates from Ingo Molnar:
    "Various x86 low level modifications:

    - preparatory work to support virtually mapped kernel stacks (Andy
    Lutomirski)

    - support for 64-bit __get_user() on 32-bit kernels (Benjamin
    LaHaise)

    - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

    - MPX enhancements (Dave Hansen)

    - mremap() extension to allow remapping of the special VDSO vma, for
    purposes of user level context save/restore (Dmitry Safonov)

    - hweight and entry code cleanups (Borislav Petkov)

    - bitops code generation optimizations and cleanups with modern GCC
    (H. Peter Anvin)

    - syscall entry code optimizations (Paolo Bonzini)"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
    x86/mm/cpa: Add missing comment in populate_pdg()
    x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
    x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
    x86/smp: Remove unnecessary initialization of thread_info::cpu
    x86/smp: Remove stack_smp_processor_id()
    x86/uaccess: Move thread_info::addr_limit to thread_struct
    x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
    x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
    x86/dumpstack: When OOPSing, rewind the stack before do_exit()
    x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
    x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
    x86/dumpstack: Try harder to get a call trace on stack overflow
    x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
    x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
    x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
    x86/mm: Use pte_none() to test for empty PTE
    x86/mm: Disallow running with 32-bit PTEs to work around erratum
    x86/mm: Ignore A/D bits in pte/pmd/pud_none()
    x86/mm: Move swap offset/type up in PTE to work around erratum
    x86/entry: Inline enter_from_user_mode()
    ...

    Linus Torvalds
     

19 Jul, 2016

1 commit

  • There are very few files that need to add an -I$(obj) gcc flag for the
    preprocessor or the assembler. For C files, we always add these for both
    the objtree and srctree, but for the other ones we require the Makefile
    to add them, and Kbuild then adds it for both trees.

    As a preparation for changing the meaning of the -I$(obj) directive to
    only refer to the srctree, this changes the two instances in arch/x86 to
    use an explicit $(objtree) prefix where needed; otherwise we won't find
    the headers any more, as reported by the kbuild 0day builder.

    arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Michal Marek

    Arnd Bergmann
     

13 Jul, 2016

1 commit

  • The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
    Landing) has an erratum where a processor thread setting the Accessed
    or Dirty bits may not do so atomically against its checks for the
    Present bit. This may cause a thread (which is about to page fault)
    to set A and/or D, even though the Present bit had already been
    atomically cleared.

    These bits are truly "stray". In the case of the Dirty bit, the
    thread associated with the stray set was *not* allowed to write to
    the page. This means that we do not have to launder the bit(s); we
    can simply ignore them.

    If the PTE is used for storing a swap index or a NUMA migration index,
    the A bit could be misinterpreted as part of the swap type. The stray
    bits being set cause a software-cleared PTE to be interpreted as a
    swap entry. In some cases (like when the swap index ends up being
    for a non-existent swapfile), the kernel detects the stray value
    and WARN()s about it, but there is no guarantee that the kernel can
    always detect it.

    When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
    to move the swap PTE format around to avoid these troublesome bits.
    But, 32-bit non-PAE is tight on bits. So, disallow it from running
    on this hardware. I can't imagine anyone wanting to run 32-bit
    non-highmem kernels on this hardware, but disallowing them from
    running entirely is surely the safe thing to do.
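
    A minimal sketch of the "simply ignore them" approach, modeled on the
    companion x86/mm commits in this series (exact placement of the mask is
    hedged):

    /* A/D bits that the erratum may set spuriously */
    #define _PAGE_KNL_ERRATUM_MASK  (_PAGE_DIRTY | _PAGE_ACCESSED)

    static inline int pte_none(pte_t pte)
    {
            /* treat the PTE as "none" even if stray A/D bits were set */
            return !(pte.pte & ~(u64)_PAGE_KNL_ERRATUM_MASK);
    }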

    Signed-off-by: Dave Hansen
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: dave.hansen@intel.com
    Cc: linux-mm@kvack.org
    Cc: mhocko@suse.com
    Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

08 Jul, 2016

3 commits

  • Add the physical mapping in the list of randomized memory regions.

    The physical memory mapping holds most allocations from boot and heap
    allocators. Knowing the base address and physical memory size, an attacker
    can deduce the PDE virtual address for the vDSO memory page. This attack
    was demonstrated at CanSecWest 2016, in the following presentation:

    "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
    https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf

    (See second part of the presentation).

    The exploits used against Linux worked successfully against 4.6+ but
    fail with KASLR memory enabled:

    https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits

    Similar research was done at Google leading to this patch proposal.

    Variants exist that overwrite the ACLs of /proc or /sys objects, leading
    to elevation of privileges. These variants were tested against 4.6+.

    The page offset used by the compressed kernel retains the static value
    since it is not yet randomized during this boot stage.
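
    A sketch of how the physical mapping joins the randomized regions in
    arch/x86/mm/kaslr.c (field layout hedged from this series;
    page_offset_base holds the base of the physical mapping):

    /* regions whose base virtual addresses are randomized at boot */
    static __initdata struct kaslr_memory_region {
            unsigned long *base;    /* variable holding the region's base */
            unsigned long size_tb;  /* maximum region size, in terabytes */
    } kaslr_regions[] = {
            { &page_offset_base, 64 },      /* the physical memory mapping */
    };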

    Signed-off-by: Thomas Garnier
    Signed-off-by: Kees Cook
    Cc: Alexander Kuleshov
    Cc: Alexander Popov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Aneesh Kumar K.V
    Cc: Baoquan He
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: Jan Beulich
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Lv Zheng
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Xiao Guangrong
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Thomas Garnier
     
  • Move the KASLR entropy functions into arch/x86/lib to be used in early
    kernel boot for KASLR memory randomization.

    Signed-off-by: Thomas Garnier
    Signed-off-by: Kees Cook
    Cc: Alexander Kuleshov
    Cc: Alexander Popov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Aneesh Kumar K.V
    Cc: Baoquan He
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: Jan Beulich
    Cc: Joerg Roedel
    Cc: Jonathan Corbet
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Lv Zheng
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Xiao Guangrong
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Thomas Garnier
     
  • Ye Xiaolong reported this boot crash:

    |
    | XZ-compressed data is corrupt
    |
    | -- System halted
    |

    Fix the bug in mem_avoid_overlap() when finding the earliest overlap.

    Reported-and-tested-by: Ye Xiaolong
    Signed-off-by: Baoquan He
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Baoquan He
     

27 Jun, 2016

1 commit

  • Remove the unused variable 'efi': its value is never read. This fixes
    the following clang build warning:

    arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read

    Signed-off-by: Colin Ian King
    Signed-off-by: Matt Fleming
    Cc: Ard Biesheuvel
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar

    Colin Ian King
     

26 Jun, 2016

6 commits

  • Currently the kernel image physical address randomization's lower
    boundary is the original kernel load address.

    For bootloaders that load kernels into very high memory (e.g. kexec),
    this means randomization takes place in a very small window at the
    top of memory, ignoring the large region of physical memory below
    the load address.

    Since mem_avoid[] is already correctly tracking the regions that must be
    avoided, this patch changes the minimum address to whatever is less:
    512M (to conservatively avoid unknown things in lower memory) or the
    load address. Now, for example, if the kernel is loaded at 8G, [512M,
    8G) will be added to the list of possible physical memory positions.
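
    In sketch form, the new lower bound is computed with min() (placement
    within the KASLR code hedged):

    /* use the smaller of 512MB and the load address, so the region below
     * a high load address (e.g. kexec at 8G) is considered as well */
    min_addr = min(*output, 512UL << 20);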

    Signed-off-by: Yinghai Lu
    [ Rewrote the changelog, refactored the code to use min(). ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
    [ Edited the changelog some more, plus the code comment as well. ]
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • We want the physical address to be randomized anywhere between
    16MB and the top of physical memory (up to 64TB).

    This patch exchanges the prior slots[] array for the new slot_areas[]
    array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
    address offset for 64-bit. As before, process_e820_entry() walks
    memory and populates slot_areas[], splitting on any detected mem_avoid
    collisions.

    Finally, since the slots[] array and its associated functions are no
    longer needed, they are removed.

    Based on earlier patches by Baoquan He.

    Originally-from: Baoquan He
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • The current KASLR implementation randomizes the physical and virtual
    addresses of the kernel together (both are offset by the same amount). It
    calculates the delta of the physical address where vmlinux was linked
    to load and where it is finally loaded. If the delta is not equal to 0
    (i.e. the kernel was relocated), relocation handling needs to be done.

    On 64-bit, this patch randomizes both the physical address where kernel
    is decompressed and the virtual address where kernel text is mapped and
    will execute from. We now have two values being chosen, so the function
    arguments are reorganized to pass by pointer so they can be directly
    updated. Since relocation handling only depends on the virtual address,
    we must check the virtual delta, not the physical delta for processing
    kernel relocations. This also populates the page table for the new
    virtual address range. 32-bit does not support a separate virtual address,
    so it continues to use the physical offset for its virtual offset.

    The sanity checks done on the resulting kernel addresses are also
    updated, since they are potentially separate now.
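
    A sketch of the resulting relocation decision (names follow the
    surrounding series; the exact code is hedged):

    /* the delta that matters for relocations is now the virtual one */
    unsigned long delta = virt_addr - LOAD_PHYSICAL_ADDR;

    if (!delta) {
            debug_putstr("No relocation needed... ");
            return;
    }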

    [kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
    [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
    Signed-off-by: Baoquan He
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Baoquan He
     
  • This extracts the call to prepare_level4() into a top-level function
    that the user of the pagetable.c interface must call to initialize
    the new page tables. For clarity and to match the "finalize" function,
    it has been renamed to initialize_identity_maps(). This function also
    gains the initialization of mapping_info so we don't have to do it each
    time in add_identity_map().

    Additionally add copyright notice to the top, to make it clear that the
    bulk of the pagetable.c code was written by Yinghai, and that I just
    added bugs later. :)

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • The compressed kernel is built with -fPIC/-fPIE so that it can run in any
    location a bootloader happens to put it. However, since ELF relocation
    processing is not happening (and all the relocation information has
    already been stripped at link time), none of the code can use data
    relocations (e.g. static assignments of pointers). This is already noted
    in a warning comment at the top of misc.c, but this adds an explicit
    check for the condition during the linking stage to block any such bugs
    from appearing.

    If this was in place with the earlier bug in pagetable.c, the build
    would fail like this:

    ...
    CC arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
    error: arch/x86/boot/compressed/pagetable.o has data relocations!
    make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
    ...

    A clean build shows:

    ...
    CC arch/x86/boot/compressed/pagetable.o
    DATAREL arch/x86/boot/compressed/vmlinux
    LD arch/x86/boot/compressed/vmlinux
    ...

    Suggested-by: Ingo Molnar
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • With the following fix:

    70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")

    ... there is no longer a problem with hibernation resuming a
    KASLR-booted kernel image, so remove the restriction.

    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jonathan Corbet
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Linux PM list
    Cc: Logan Gunthorpe
    Cc: Pavel Machek
    Cc: Peter Zijlstra
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Cc: linux-doc@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
    Signed-off-by: Ingo Molnar

    Kees Cook
     

09 Jun, 2016

2 commits

  • Remove open-coded uses of set instructions to use CC_SET()/CC_OUT() in
    arch/x86/boot/boot.h.
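
    For example, one of the boot.h helpers after the conversion looks
    roughly like this (a sketch; CC_SET()/CC_OUT() expand to a GCC 6+
    flag-output constraint when available, or to an explicit SET
    instruction otherwise):

    static inline bool memcmp_fs(const void *s1, addr_t s2, size_t len)
    {
            bool diff;

            /* capture ZF via CC_SET()/CC_OUT() instead of open-coding setnz */
            asm volatile("fs; repe; cmpsb"
                         CC_SET(nz)
                         : CC_OUT(nz) (diff), "+D" (s1), "+S" (s2), "+c" (len));
            return diff;
    }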

    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1465414726-197858-10-git-send-email-hpa@linux.intel.com
    Reviewed-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)

    H. Peter Anvin
     
  • The gcc people have confirmed that using "bool" when combined with
    inline assembly is always treated as a byte-sized operand that can be
    assumed to be 0 or 1, which is exactly what the SET instruction
    emits. Change the output types and intermediate variables of as many
    operations as practical to "bool".
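
    A representative conversion, in sketch form (following the pattern of
    this series):

    /* SET produces a 0/1 byte, which maps exactly onto "bool" */
    static __always_inline bool variable_test_bit(long nr,
                                                  const unsigned long *addr)
    {
            bool oldbit;

            asm volatile("bt %2,%1"
                         CC_SET(c)
                         : CC_OUT(c) (oldbit)
                         : "m" (*(unsigned long *)addr), "Ir" (nr));
            return oldbit;
    }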

    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1465414726-197858-3-git-send-email-hpa@linux.intel.com
    Reviewed-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)

    H. Peter Anvin
     

27 May, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
    unexports symbols which are not used in the current config [Nicolas
    Pitre]

    - several kbuild rule cleanups [Masahiro Yamada]

    - warning option adjustments for gcov etc [Arnd Bergmann]

    - a few more small fixes

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
    kbuild: move -Wunused-const-variable to W=1 warning level
    kbuild: fix if_change and friends to consider argument order
    kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
    kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
    gcov: disable -Wmaybe-uninitialized warning
    gcov: disable tree-loop-im to reduce stack usage
    gcov: disable for COMPILE_TEST
    Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
    Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
    kbuild: forbid kernel directory to contain spaces and colons
    kbuild: adjust ksym_dep_filter for some cmd_* renames
    kbuild: Fix dependencies for final vmlinux link
    kbuild: better abstract vmlinux sequential prerequisites
    kbuild: fix call to adjust_autoksyms.sh when output directory specified
    kbuild: Get rid of KBUILD_STR
    kbuild: rename cmd_as_s_S to cmd_cpp_s_S
    kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
    kbuild: drop redundant "PHONY += FORCE"
    kbuild: delete unnecessary "@:"
    kbuild: mark help target as PHONY
    ...

    Linus Torvalds
     

17 May, 2016

1 commit

  • Pull x86 boot updates from Ingo Molnar:
    "The biggest changes in this cycle were:

    - prepare for more KASLR related changes, by restructuring, cleaning
    up and fixing the existing boot code. (Kees Cook, Baoquan He,
    Yinghai Lu)

    - simplify/concentrate subarch handling code, eliminate
    paravirt_enabled() usage. (Luis R Rodriguez)"

    * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    x86/KASLR: Clarify purpose of each get_random_long()
    x86/KASLR: Add virtual address choosing function
    x86/KASLR: Return earliest overlap when avoiding regions
    x86/KASLR: Add 'struct slot_area' to manage random_addr slots
    x86/boot: Add missing file header comments
    x86/KASLR: Initialize mapping_info every time
    x86/boot: Comment what finalize_identity_maps() does
    x86/KASLR: Build identity mappings on demand
    x86/boot: Split out kernel_ident_mapping_init()
    x86/boot: Clean up indenting for asm/boot.h
    x86/KASLR: Improve comments around the mem_avoid[] logic
    x86/boot: Simplify pointer casting in choose_random_location()
    x86/KASLR: Consolidate mem_avoid[] entries
    x86/boot: Clean up pointer casting
    x86/boot: Warn on future overlapping memcpy() use
    x86/boot: Extract error reporting functions
    x86/boot: Correctly bounds-check relocations
    x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
    x86/boot: Fix "run_size" calculation
    x86/boot: Calculate decompression size during boot not build
    ...

    Linus Torvalds
     

10 May, 2016

7 commits

  • KASLR will be calling get_random_long() twice, but the debug output
    won't distinguish between them. This patch adds a report on when it
    is fetching the physical vs virtual address. With this, once the virtual
    offset is separate, the report changes from:

    KASLR using RDTSC...
    KASLR using RDTSC...

    into:

    Physical KASLR using RDTSC...
    Virtual KASLR using RDTSC...

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • To support randomizing the kernel virtual address separately from the
    physical address, this patch adds find_random_virt_addr() to choose
    a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
    Since this address is virtual, not physical, we can place the kernel
    anywhere in this region, as long as it is aligned and (in the case of the
    kernel being larger than the slot size) placed with enough room to load
    the entire kernel image.

    For clarity and readability, find_random_addr() is renamed to
    find_random_phys_addr() and has "size" renamed to "image_size" to match
    find_random_virt_addr().
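
    A sketch of the new function, matching the description above (minor
    details hedged):

    static unsigned long find_random_virt_addr(unsigned long minimum,
                                               unsigned long image_size)
    {
            unsigned long slots, random_addr;

            minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
            image_size = ALIGN(image_size, CONFIG_PHYSICAL_ALIGN);

            /* count the aligned slots that can hold the whole image
             * between minimum and KERNEL_IMAGE_SIZE */
            slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
                     CONFIG_PHYSICAL_ALIGN + 1;

            random_addr = get_random_long() % slots;

            return random_addr * CONFIG_PHYSICAL_ALIGN + minimum;
    }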

    Signed-off-by: Baoquan He
    [ Rewrote changelog, refactored slot calculation for readability. ]
    [ Renamed find_random_phys_addr() and size argument. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Baoquan He
     
  • In preparation for being able to detect where to split up contiguous
    memory regions that overlap with memory regions to avoid, we need to
    pass back what the earliest overlapping region was. This modifies the
    overlap checker to return that information.

    Based on a separate mem_min_overlap() implementation by Baoquan He.
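
    In sketch form (trimmed to the mem_avoid[] walk), the checker now
    reports the earliest overlapping region through an out-parameter:

    static bool mem_avoid_overlap(struct mem_vector *img,
                                  struct mem_vector *overlap)
    {
            int i;
            unsigned long earliest = img->start + img->size;
            bool is_overlapping = false;

            for (i = 0; i < MEM_AVOID_MAX; i++) {
                    if (mem_overlaps(img, &mem_avoid[i]) &&
                        mem_avoid[i].start < earliest) {
                            *overlap = mem_avoid[i];  /* earliest hit so far */
                            earliest = overlap->start;
                            is_overlapping = true;
                    }
            }

            return is_overlapping;
    }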

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • In order to support KASLR moving the kernel anywhere in physical memory
    (which could be up to 64TB), we need to handle counting the potential
    randomization locations in a more efficient manner.

    In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
    randomization slots (64TB / 2MB) if CONFIG_PHYSICAL_ALIGN is 0x200000.
    Currently
    the starting address of candidate positions is stored into the slots[]
    array, one at a time. This method would cost too much memory and it's
    also very inefficient to get and save the slot information into the slot
    array one by one.

    This patch introduces 'struct slot_area' to manage each contiguous region
    of randomization slots. Each slot_area will contain the starting address
    and how many available slots are in this area. As with the original code,
    the slot_areas[] will avoid the mem_avoid[] regions.

    Since setup_data is a linked list, it could contain an unknown number
    of memory regions to be avoided, which could cause us to fragment
    the contiguous memory that the slot_area array is tracking. In normal
    operation this level of fragmentation will be extremely rare, but we
    choose a suitably large value (100) for the array. If setup_data forces
    the slot_area array to become highly fragmented and there are more
    slots available beyond the first 100 found, the rest will be ignored
    for KASLR selection.

    The function store_slot_info() is used to calculate the number of slots
    available in the passed-in memory region and stores it into slot_areas[]
    after adjusting for alignment and size requirements.
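
    A sketch of the structure and of store_slot_info() as described above:

    #define MAX_SLOT_AREA 100

    struct slot_area {
            unsigned long addr;     /* first usable address in the region */
            int num;                /* number of CONFIG_PHYSICAL_ALIGN slots */
    };

    static struct slot_area slot_areas[MAX_SLOT_AREA];
    static int slot_area_index;
    static unsigned long slot_max;

    static void store_slot_info(struct mem_vector *region,
                                unsigned long image_size)
    {
            struct slot_area slot_area;

            if (slot_area_index == MAX_SLOT_AREA)
                    return;         /* further fragments are ignored */

            slot_area.addr = region->start;
            slot_area.num = (region->size - image_size) /
                            CONFIG_PHYSICAL_ALIGN + 1;

            if (slot_area.num > 0) {
                    slot_areas[slot_area_index++] = slot_area;
                    slot_max += slot_area.num;
            }
    }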

    Signed-off-by: Baoquan He
    [ Rewrote changelog, squashed with new functions. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Baoquan He
     
  • There were some files with missing header comments. Since they are
    included from both compressed and regular kernels, make note of that.
    Also corrects a typo in the mem_avoid comments.

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • As it turns out, mapping_info DOES need to be initialized every time,
    because the pgt_data address could change during kernel relocation, so
    it cannot be assigned at build time.

    Without this, page tables were not being correctly updated, which could
    cause reboots when a physical address beyond 2G was chosen.
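
    In sketch form, the fix replaces a build-time struct initializer with
    per-call assignments:

    void add_identity_map(unsigned long start, unsigned long size)
    {
            /* set up on every call: pgt_data's address may have moved
             * during kernel relocation, so it cannot be baked in at
             * build time */
            mapping_info.alloc_pgt_page = alloc_pgt_page;
            mapping_info.context = &pgt_data;
            mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;

            /* ... map [start, start + size) into the identity tables ... */
    }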

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • It is not really obvious that finalize_identity_maps() doesn't do any
    finalization: it *actually* writes CR3 with the ident PGD. Comment that
    at the call site.

    Signed-off-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: bhe@redhat.com
    Cc: dyoung@redhat.com
    Cc: jkosina@suse.cz
    Cc: linux-tip-commits@vger.kernel.org
    Cc: luto@kernel.org
    Cc: vgoyal@redhat.com
    Cc: yinghai@kernel.org
    Link: http://lkml.kernel.org/r/20160507100541.GA24613@pd.tnic
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

07 May, 2016

3 commits

  • Currently KASLR only supports relocation in a small physical range (from
    16M to 1G), due to using the initial kernel page table identity mapping.
    To support ranges above this, we need to have an identity mapping for the
    desired memory range before we can decompress (and later run) the kernel.

    32-bit kernels already have the needed identity mapping. This patch adds
    identity mappings for the needed memory ranges on 64-bit kernels. This
    happens in two possible boot paths:

    If loaded via startup_32(), we need to set up the needed identity map.

    If loaded from a 64-bit bootloader, the bootloader will have already
    set up an identity mapping, and we'll start via the compressed kernel's
    startup_64(). In this case, the bootloader's page tables need to be
    avoided while selecting the new uncompressed kernel location. If not,
    the decompressor could overwrite them during decompression.

    To accomplish this, we could walk the pagetable and find every page
    that is used, and add them to mem_avoid, but this needs extra code and
    will require increasing the size of the mem_avoid array.

    Instead, we can create a new set of page tables for our own identity
    mapping. The pages for the new page tables will come from the _pagetable
    section of the compressed kernel, which means they are already contained
    in the mem_avoid array. To do this, we reuse the code
    from the uncompressed kernel's identity mapping routines.

    The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
    init_size, as now the compressed kernel's _rodata to _end will contribute
    to init_size.

    To handle the possible mappings, we need to increase the existing page
    table buffer size:

    When booting via startup_64(), we need to cover the old VO, params,
    cmdline and uncompressed kernel. In an extreme case we could have them
    all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
    We'll need 2 more pages for the first 2M of VGA RAM, and one more for
    the level-4 page table. This gets us to 19 pages total.

    When booting via startup_32(), KASLR could move the uncompressed kernel
    above 4G, so we need to create extra identity mappings, which should only
    need (2+2) pages at most when it is beyond the 512G boundary. So 19
    pages is sufficient for this case as well.

    The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
    names to maintain logical consistency with the existing BOOT_HEAP_SIZE
    and BOOT_STACK_SIZE defines.
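
    The resulting defines, roughly as they land in
    arch/x86/include/asm/boot.h (a sketch):

    /* minimum page table pages needed without KASLR */
    #define BOOT_INIT_PGT_SIZE      (6*4096)

    #ifdef CONFIG_RANDOMIZE_BASE
    /* the 19-page worst case worked out above; the 2 VGA-RAM pages are
     * only needed when the decompressor writes to the screen */
    # ifdef CONFIG_X86_VERBOSE_BOOTUP
    #  define BOOT_PGT_SIZE         (19*4096)
    # else
    #  define BOOT_PGT_SIZE         (17*4096)
    # endif
    #else
    # define BOOT_PGT_SIZE          BOOT_INIT_PGT_SIZE
    #endif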

    This patch is based on earlier patches from Yinghai Lu and Baoquan He.

    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jiri Kosina
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • This attempts to improve the comments that describe how the memory
    range used for decompression is avoided. Additionally uses an enum
    instead of raw numbers for the mem_avoid[] indexing.
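
    The indexing enum, in sketch form:

    enum mem_avoid_index {
            MEM_AVOID_ZO_RANGE = 0, /* ZO plus its heap and stack */
            MEM_AVOID_INITRD,
            MEM_AVOID_CMDLINE,
            MEM_AVOID_BOOTPARAMS,
            MEM_AVOID_MAX,
    };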

    Suggested-by: Borislav Petkov
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • Pass them down as 'unsigned long' directly and get rid of more casting and
    assignments.

    Signed-off-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: bhe@redhat.com
    Cc: dyoung@redhat.com
    Cc: linux-tip-commits@vger.kernel.org
    Cc: luto@kernel.org
    Cc: vgoyal@redhat.com
    Cc: yinghai@kernel.org
    Link: http://lkml.kernel.org/r/20160506115015.GI24044@pd.tnic
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

06 May, 2016

2 commits

  • The mem_avoid[] array is used to track positions that should be avoided (like
    the compressed kernel, decompression code, etc) when selecting a memory
    position for the randomly relocated kernel. Since ZO is now at the end of
    the decompression buffer and the decompression code (and its heap and
    stack) are at the front, we can safely consolidate the decompression entry,
    the heap entry, and the stack entry. The boot_params memory, however, could
    be elsewhere, so it should be explicitly included.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Baoquan He
    [ Rewrote changelog, cleaned up code comments. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Currently extract_kernel() defines the input and output buffer pointers
    as "unsigned char *" since that's effectively what they are. It passes
    these to the decompressor routine and to the ELF parser, which both
    logically deal with buffer pointers too. There is some casting ("unsigned
    long") done to validate the numerical value of the pointers, but it is
    relatively limited.

    However, choose_random_location() operates almost exclusively on the
    numerical representation of these pointers, so it ended up carrying
    a lot of "unsigned long" casts. With the future physical/virtual split
    these casts were going to multiply, so this attempts to solve the
    problem by doing all the casting in choose_random_location()'s entry
    and return instead of throughout the code. It adjusts argument names to
    be more meaningful, and changes one use of "choice" to "output" to make
    the future physical/virtual split more clear (i.e. "choice" should be
    strictly a function return value and not used as an intermediate).

    Suggested-by: Ingo Molnar
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: Yinghai Lu
    Cc: kernel-hardening@lists.openwall.com
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1462486436-3707-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     

03 May, 2016

2 commits

  • If an overlapping memcpy() is ever attempted, we should at least report
    it, in case it might lead to problems, so it could be changed to a
    memmove() call instead.
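
    A sketch of the check in the decompressor's string routines (warn()
    comes from the error reporting helpers in the next entry):

    void *memcpy(void *dest, const void *src, size_t n)
    {
            unsigned long destp = (unsigned long)dest;
            unsigned long srcp = (unsigned long)src;

            /* the ranges overlap: report it and fall back to memmove() */
            if (destp < srcp + n && srcp < destp + n) {
                    warn("Avoiding potentially unsafe overlapping memcpy()!");
                    return memmove(dest, src, n);
            }

            return __memcpy(dest, src, n);
    }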

    Suggested-by: Ingo Molnar
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Lasse Collin
    Cc: Linus Torvalds
    Cc: One Thousand Gnomes
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • Currently to use warn(), a caller would need to include misc.h. However,
    this means they would get the (unavailable during compressed boot)
    gcc built-in memcpy family of functions. But since string.c is defining
    these memcpy functions for use by misc.c, we end up in a weird circular
    dependency.

    To break this loop, move the error reporting functions outside of misc.c
    with their own header so that they can be independently included by
    other sources. Since the screen-writing routines use memmove(), keep the
    low-level *_putstr() functions in misc.c.

    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Lasse Collin
    Cc: Linus Torvalds
    Cc: One Thousand Gnomes
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yinghai Lu
    Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     

29 Apr, 2016

4 commits

  • Relocation handling performs bounds checking on the resulting calculated
    addresses. The existing code uses output_len (VO size plus relocs size) as
    the max address. This is not right since the max_addr check should stop at
    the end of VO and exclude bss, brk, etc., which follow. The valid range
    should be VO [_text, __bss_start] in the loaded physical address space.

    This patch adds an export for __bss_start in voffset.h and uses it to
    set the correct limit for max_addr.
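
    In sketch form, the relocation bounds become:

    /* valid targets lie inside the loaded VO image, ending at __bss_start */
    unsigned long min_addr = (unsigned long)output;
    unsigned long max_addr = min_addr + (VO___bss_start - VO__text);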

    Signed-off-by: Yinghai Lu
    [ Rewrote the changelog. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Since 'run_size' is now calculated in misc.c, the old script and associated
    argument passing is no longer needed. This patch removes them, and renames
    'run_size' to the more descriptive 'kernel_total_size'.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Baoquan He
    [ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Triplett
    Cc: Junjie Mao
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Currently, the "run_size" variable holds the total kernel size
    (size of code plus brk and bss) and is calculated via the shell script
    arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
    of the .bss and .brk sections from the vmlinux, and adds them as follows:

    run_size = $(( $offsetA + $sizeA + $sizeB ))

    However, this is not correct (it is too large). To illustrate, here's
    a walk-through of the script's calculation, compared to the correct way
    to find it.

    First, offsetA is found as the starting address of the first .bss or
    .brk section seen in the ELF file. The sizeA and sizeB values are the
    respective section sizes.

    [bhe@x1 linux]$ objdump -h vmlinux

    vmlinux: file format elf64-x86-64

    Sections:
    Idx Name Size VMA LMA File off Algn
    27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
    ALLOC
    28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
    ALLOC

    Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
    0x00027000. The resulting run_size is 0x145f000:

    0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

    However, if we instead examine the ELF LOAD program headers, we see a
    different picture.

    [bhe@x1 linux]$ readelf -l vmlinux

    Elf file type is EXEC (Executable file)
    Entry point 0x1000000
    There are 5 program headers, starting at offset 64

    Program Headers:
    Type Offset VirtAddr PhysAddr
    FileSiz MemSiz Flags Align
    LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
    0x0000000000b5e000 0x0000000000b5e000 R E 200000
    LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
    0x0000000000145000 0x0000000000145000 RW 200000
    LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
    0x0000000000018158 0x0000000000018158 RW 200000
    LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
    0x000000000016a000 0x0000000000301000 RWE 200000
    NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
    0x00000000000001bc 0x00000000000001bc 4

    Section to Segment mapping:
    Segment Sections...
    00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
    __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
    __modver
    01 .data .vvar
    02 .data..percpu
    03 .init.text .init.data .x86_cpu_dev.init .parainstructions
    .altinstructions .altinstr_replacement .iommu_table .apicdrivers
    .exit.text .smp_locks .bss .brk
    04 .notes

    As mentioned, run_size needs to be the size of the running kernel
    including .bss and .brk. We can see from the Section/Segment mapping
    above that .bss and .brk are included in segment 03 (which corresponds
    to the final LOAD program header). To find the run_size, we calculate
    the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
    and its MemSiz (0x0000000000301000), minus the physical load address of
    the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
    resulting run_size is 0x105f000:

    0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

    So, from this we can see that the existing run_size calculation is
    0x400000 too high. And, as it turns out, the correct run_size is
    actually equal to VO_end - VO_text, which is certainly easier to calculate.
    _end: 0xffffffff8205f000
    _text:0xffffffff81000000

    0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000

    As a result, run_size is a simple constant, so we don't need to pass it
    around; we already have voffset.h for such things. We can share voffset.h
    between misc.c and header.S instead of getting run_size in other ways.
    This patch moves voffset.h creation code to boot/compressed/Makefile,
    and switches misc.c to use the VO_end - VO_text calculation for run_size.
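
    As a minimal sketch of the result (the VO_* constants are generated
    into boot/voffset.h at build time; the values below are from the
    walk-through):

    #include "../voffset.h"         /* defines VO__text and VO__end */

    /* the running kernel spans [_text, _end], including .bss and .brk */
    unsigned long run_size = VO__end - VO__text;
    /* e.g. 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000 */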

    Dependence before:

    boot/header.S ==> boot/voffset.h ==> vmlinux
    boot/header.S ==> compressed/vmlinux ==> compressed/misc.c

    Dependence after:

    boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

    Signed-off-by: Yinghai Lu
    Signed-off-by: Baoquan He
    [ Rewrote the changelog. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Triplett
    Cc: Junjie Mao
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: lasse.collin@tukaani.org
    Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
    Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
    This doesn't work well because mkpiggy.c doesn't know the details of the
    decompressor in use. As a result, it can only make an estimation, which
    has risks:

    - output + output_len (VO) could be much bigger than input + input_len
    (ZO). In this case, the decompressed kernel plus relocs could overwrite
    the decompression code while it is running.

    - The head code of ZO could be bigger than z_extract_offset. In this case
    an overwrite could happen when the head code is running to move ZO to
    the end of buffer. Though currently the size of the head code is very
    small it's still a potential risk. Since there is no rule to limit the
    size of the head code of ZO, it runs the risk of suddenly becoming a
    (hard to find) bug.

    Instead, this moves the z_extract_offset calculation into header.S, and
    makes adjustments to be sure that the above two cases can never happen,
    and further corrects the comments describing the calculations.

    Since we have (in the previous patch) made ZO always be located against
    the end of decompression buffer, z_extract_offset is only used here to
    calculate an appropriate buffer size (INIT_SIZE), and is no longer used
    elsewhere. As such, it can be removed from voffset.h.

    Additionally clean up #if/#else #define to improve readability.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Baoquan He
    [ Rewrote the changelog and comments. ]
    Signed-off-by: Kees Cook
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Young
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: lasse.collin@tukaani.org
    Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu