07 Oct, 2019

1 commit

  • In commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we
    changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
    executable mappings.

    Then, people reported that it broke some binaries that had overlapping
    segments from the same file, and commit ad55eac74f20 ("elf: enforce
    MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
    overlaying elf segment cases. But only some - despite the summary line
    of that commit, it only did it when it also does a temporary brk vma for
    one obvious overlapping case.

    Now Russell King reports another overlapping case with old 32-bit x86
    binaries, which doesn't trigger that limited case. End result: we had
    better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.

    Yes, it's a sign of old binaries generated with old tool-chains, but we
    do pride ourselves on not breaking existing setups.

    This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
    and the old load_elf_library() use-cases, because nobody has reported
    breakage for those. Yet.

    Note that in all the cases seen so far, the overlapping elf sections
    seem to be just re-mapping of the same executable with different section
    attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE
    flag or similar, which acts like NOREPLACE, but allows just remapping
    the same executable file using different protection flags.

    It's not clear that would make a huge difference to anything, but if
    people really hate that "elf remaps over previous maps" behavior, maybe
    at least a more limited form of remapping would alleviate some concerns.

    Alternatively, we should take a look at our elf_map() logic to see if we
    end up not mapping things properly the first time.

    In the meantime, this is the minimal "don't do that then" patch while
    people hopefully think about it more.

    Reported-by: Russell King
    Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
    Fixes: ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments")
    Cc: Michal Hocko
    Cc: Kees Cook
    Signed-off-by: Linus Torvalds

    Linus Torvalds
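
    The difference between the two flags discussed in this commit can be
    observed from userspace. The following is a minimal sketch, not kernel
    code: map_one_page() and overlap_demo() are hypothetical helper names,
    and the fallback value for MAP_FIXED_NOREPLACE is an assumption that
    holds on x86 and most other architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback for older libc headers; 0x100000 is the value on x86 and most
 * other architectures (an assumption if you build elsewhere). */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical helper: map one anonymous page at addr (anywhere if NULL). */
static void *map_one_page(void *addr, int extra_flags)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	return mmap(addr, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
}

/*
 * Returns 1 when MAP_FIXED_NOREPLACE did not land on an already occupied
 * address while plain MAP_FIXED replaced the mapping at that same address.
 */
int overlap_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *first = map_one_page(NULL, 0);
	void *noreplace, *fixed;

	assert(first != MAP_FAILED);

	/* Expected to fail with EEXIST on kernels that know the flag. */
	noreplace = map_one_page(first, MAP_FIXED_NOREPLACE);
	if (noreplace != MAP_FAILED && noreplace != first)
		munmap(noreplace, len); /* older kernel: address used as a hint */

	/* MAP_FIXED silently replaces whatever is mapped at the address. */
	fixed = map_one_page(first, MAP_FIXED);
	munmap(first, len);

	return noreplace != first && fixed == first;
}
```

    Calling overlap_demo() from a test program shows why overlapping ELF
    segments load with MAP_FIXED but are rejected with MAP_FIXED_NOREPLACE.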
     

27 Sep, 2019

1 commit

  • When brk was moved for binaries without an interpreter, it should have
    been limited to ET_DYN only. In other words, the special case was an
    ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
    The bug manifested for giant static executables, where the brk would end
    up in the middle of the text area on 32-bit architectures.

    Reported-and-tested-by: Richard Kojedzinszky
    Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Kees Cook
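
    The condition described in this commit can be written as a small
    predicate. This is only a sketch under the commit message's wording:
    brk_needs_relocation() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical helper: brk is moved only for an ET_DYN image that has no
 * PT_INTERP (the loader executed directly). A static ET_EXEC binary also
 * lacks PT_INTERP, but must keep its brk where it is.
 */
bool brk_needs_relocation(unsigned int e_type, bool has_interp)
{
	return e_type == ET_DYN && !has_interp;
}
```

    With this check, a giant static ET_EXEC executable no longer has its
    brk relocated.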
     

25 Sep, 2019

1 commit

  • Patch series "Provide generic top-down mmap layout functions", v6.

    This series introduces generic functions to make top-down mmap layout
    easily accessible to architectures, in particular riscv which was the
    initial goal of this series. The generic implementation was taken from
    arm64 and used successively by arm, mips and finally riscv.

    Note that in addition the series fixes 2 issues:

    - stack randomization was taken into account even if not necessary.

    - [1] fixed an issue with mmap base which did not take into account
    randomization but did not report it to arm and mips, so by moving arm64
    into a generic library, this problem is now fixed for both
    architectures.

    This work is an effort to factorize architecture functions to avoid code
    duplication and oversights as in [1].

    [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html

    This patch (of 14):

    This preparatory commit moves this function so that further introduction
    of generic topdown mmap layout is contained only in mm/util.c.

    Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     

17 Jul, 2019

1 commit

  • "passed_fileno" variable was deleted 11 years ago in 2.6.25.

    Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2
    Fixes: d20894a23708 ("Remove a.out interpreter support in ELF loader")
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have MODULE_LICENSE("GPL*") inside which was used in the initial
    scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

9 commits

  • Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    made changes in the rare case when the ELF loader was directly invoked
    (e.g. to set a non-inheritable LD_LIBRARY_PATH, or to test new versions
    of the loader), by moving it into the mmap region to avoid both ET_EXEC
    and PIE binaries. This had the effect of also moving the brk region into
    mmap, which could lead to the stack and brk being arbitrarily close to
    each other. An unlucky process wouldn't get its requested stack size
    and stack allocations could end up scribbling on the heap.

    This is illustrated here. In the case of using the loader directly, brk
    (so helpfully identified as "[heap]") is allocated with the _loader_ not
    the binary. For example, with ASLR entirely disabled, you can see this
    more clearly:

    $ /bin/cat /proc/self/maps
    555555554000-55555555c000 r-xp 00000000 ... /bin/cat
    55555575b000-55555575c000 r--p 00007000 ... /bin/cat
    55555575c000-55555575d000 rw-p 00008000 ... /bin/cat
    55555575d000-55555577e000 rw-p 00000000 ... [heap]
    ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff7fff000 rw-p 00000000 ...
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    $ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
    ...
    7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat
    7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat
    7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat
    7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat
    7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap]
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
    nothing is there in the direct loader case (and ET_EXEC is still far
    away at 0x400000). Anything that ran before should still work (i.e.
    the ultimately-launched binary already had the brk very far from its
    text, so this should be no different from a COMPAT_BRK standpoint). The
    only risk I see here is if someone suddenly started to depend on the
    entire memory space below the mmap region being available when
    launching binaries via direct loader execs, which seems highly
    unlikely, I'd hope: this would mean a binary would _not_ work when
    exec()ed normally.

    (Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZE
    when randomization is turned on.)

    Link: http://lkml.kernel.org/r/20190422225727.GA21011@beast
    Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPSjA@mail.gmail.com
    Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    Signed-off-by: Kees Cook
    Reported-by: Ali Saidi
    Cc: Ali Saidi
    Cc: Guenter Roeck
    Cc: Michal Hocko
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Get "current_pt_regs" pointer right before usage.

    Space savings on x86_64:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180)
    Function old new delta
    load_elf_binary 5806 5626 -180 !!!

    Looks like the compiler doesn't know that "current_pt_regs" is a stable
    pointer (because it doesn't know that ->stack is stable) even though it
    knows that "current" is a stable pointer. So it saves it at the very
    beginning and then tries to carry it through a lot of code.

    Here is what happens here:

    load_elf_binary()
    ...
    mov rax,QWORD PTR gs:0x14c00
    mov r13,QWORD PTR [rax+0x18] # r13 = current->stack
    call kmem_cache_alloc # first kmalloc

    [980 bytes later!]

    # let's spill that sucker because we need a register
    # for "load_bias" calculations at
    #
    # if (interpreter) {
    # load_bias = ELF_ET_DYN_BASE;
    # if (current->flags & PF_RANDOMIZE)
    # load_bias += arch_mmap_rnd();
    # elf_flags |= elf_fixed;
    # }
    mov QWORD PTR [rsp+0x68],r13

    If this is not _the_ root cause it is still eeeeh.

    After the patch things become much simpler:

    mov rax, QWORD PTR gs:0x14c00 # current
    mov rdx, QWORD PTR [rax+0x18] # current->stack
    movq [rdx+0x3fb8], 0 # fill pt_regs
    ...
    call finalize_exec

    Link: http://lkml.kernel.org/r/20190419200343.GA19788@avx2
    Signed-off-by: Alexey Dobriyan
    Tested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There are two places where mapping protections are calculated: one for
    the executable, another for the interpreter -- factor them out into a
    common helper.

    ELF read and execute permission bits are swapped relative to Linux
    PROT_READ and PROT_EXEC, so the flags cannot simply be copied;
    microoptimizations are welcome!

    Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
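
    A userspace sketch of such a protection helper follows. It assumes the
    standard <elf.h> values (PF_X = 1, PF_W = 2, PF_R = 4) and the Linux
    values PROT_READ = 1, PROT_WRITE = 2, PROT_EXEC = 4; the function name
    is used here for illustration and the body is not copied from the
    kernel.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Translate ELF segment flags (p_flags) into mmap() protection bits.
 * The read and execute bits have swapped numeric values between the two
 * encodings, so each flag is tested and translated individually.
 */
int make_prot(uint32_t p_flags)
{
	int prot = 0;

	if (p_flags & PF_R)
		prot |= PROT_READ;
	if (p_flags & PF_W)
		prot |= PROT_WRITE;
	if (p_flags & PF_X)
		prot |= PROT_EXEC;
	return prot;
}
```

    The same helper can then serve both the executable and the interpreter
    mapping paths.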
     
  • Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Rewrite

    for (...) {
    if (->p_type == PT_INTERP) {
    ...
    break;
    }
    }

    loop into

    for (...) {
    if (->p_type != PT_INTERP)
    continue;
    ...
    break;
    }

    Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There is no reason for PT_INTERP filename to linger till the end of the
    whole loading process.

    Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Nikitas Angelinas
    Reviewed-by: Andrew Morton
    Cc: Mukesh Ojha
    [nikitas.angelinas@gmail.com: fix GPF when dereferencing invalid interpreter]
    Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • As pointed out by zoujc@lenovo.com, setup_arg_pages() already
    initialized current->mm->start_stack.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881
    Reported-by: zoujc@lenovo.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Mar, 2019

3 commits

  • Link: http://lkml.kernel.org/r/20190204202830.GC27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • [adobriyan@gmail.com: fixup compilation]
    Link: http://lkml.kernel.org/r/20190205064334.GA2152@avx2
    Link: http://lkml.kernel.org/r/20190204202800.GB27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Number of ELF program headers is 16-bit by spec, so total size
    comfortably fits into "unsigned int".

    Space savings: 7 bytes!

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7)
    Function old new delta
    load_elf_phdrs 137 130 -7

    Link: http://lkml.kernel.org/r/20190204202715.GA27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
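
    The arithmetic behind this claim can be checked in userspace. This is
    a sketch only: phdr_table_size() is a hypothetical helper, and it uses
    the 64-bit header type from <elf.h> (56 bytes per entry), so the
    largest possible table is 65535 * 56 = 3669960 bytes.

```c
#include <assert.h>
#include <elf.h>
#include <limits.h>
#include <stdint.h>

/* Even the largest table must fit: 65535 entries of sizeof(Elf64_Phdr). */
_Static_assert(UINT16_MAX * sizeof(Elf64_Phdr) <= UINT_MAX,
	       "program header table size must fit in unsigned int");

/* Hypothetical helper: size in bytes of a table with e_phnum entries. */
unsigned int phdr_table_size(uint16_t e_phnum)
{
	return (unsigned int)sizeof(Elf64_Phdr) * e_phnum;
}
```

    Because e_phnum cannot exceed 65535, no overflow check is needed on
    this multiplication.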
     

03 Oct, 2018

1 commit

  • Linus recently observed that if we did not worry about the padding
    member in struct siginfo it is only about 48 bytes, and 48 bytes is
    much nicer than 128 bytes for allocating on the stack and copying
    around in the kernel.

    The obvious thing of only adding the padding when userspace is
    including siginfo.h won't work as there are sigframe definitions in
    the kernel that embed struct siginfo.

    So split siginfo in two: kernel_siginfo and siginfo. The userspace
    definition keeps the traditional name, while the version that is used
    internally by the kernel, and that ultimately will not be padded to
    128 bytes, is called kernel_siginfo.

    I have put the definition of struct kernel_siginfo in
    include/linux/signal_types.h.

    A set of buildtime checks has been added to verify the two structures have
    the same field offsets.

    To make it easy to verify the change kernel_siginfo retains the same
    size as siginfo. The reduction in size comes in a following change.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
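
    Build-time checks of this kind can be sketched with offsetof() and
    _Static_assert. The two structures below are simplified, hypothetical
    stand-ins (three fields only), not the kernel's definitions; only the
    checking technique is illustrated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unpadded, kernel-internal layout. */
struct demo_kernel_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
};

/* Hypothetical userspace layout, padded to 128 bytes. */
struct demo_siginfo {
	union {
		struct {
			int si_signo;
			int si_errno;
			int si_code;
		};
		char pad[128];
	};
};

/* Fail the build if a field sits at different offsets in the two types. */
#define CHECK_OFFSET(field) \
	_Static_assert(offsetof(struct demo_siginfo, field) == \
		       offsetof(struct demo_kernel_siginfo, field), \
		       "offset mismatch: " #field)

CHECK_OFFSET(si_signo);
CHECK_OFFSET(si_errno);
CHECK_OFFSET(si_code);
```

    If a field were reordered in only one of the two structures, the
    compile would stop at the corresponding CHECK_OFFSET() line.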
     

14 Aug, 2018

1 commit

  • Pull MIPS updates from Paul Burton:
    "Here are the main MIPS changes for 4.19.

    An overview of the general architecture changes:

    - Massive DMA ops refactoring from Christoph Hellwig (huzzah for
    deleting crufty code!).

    - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes &
    corresponding regsets to expose DSP ASE & floating point mode state
    respectively, both for live debugging & core dumps.

    - We better optimize our code by hard-coding cpu_has_* macros at
    compile time where their values are known due to the ISA revision
    that the kernel build is targeting.

    - The EJTAG exception handler now better handles SMP systems, where
    it was previously possible for CPUs to clobber a register value
    saved by another CPU.

    - Our implementation of memset() gained a couple of fixes for MIPSr6
    systems to return correct values in some cases where stores fault.

    - We now implement ioremap_wc() using the uncached-accelerated cache
    coherency attribute where supported, which is detected during boot,
    and fall back to plain uncached access where necessary. The
    MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
    ioremap_cacheable_cow() are removed.

    - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
    systems by reworking the way we ensure remote CPUs that may be
    running threads within the affected process switch mode.

    - Systems using the MIPS Coherence Manager will now set the
    MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
    maintenance overhead when flushing the icache.

    - A few fixes were made for building with clang/LLVM, which now
    successfully builds kernels for many of our platforms.

    - Miscellaneous cleanups all over.

    And some platform-specific changes:

    - ar7 gained stubs for a few clock API functions to fix build
    failures for some drivers.

    - ath79 gained support for a few new SoCs, a few fixes & better
    gpio-keys support.

    - Ci20 now exposes its SPI bus using the spi-gpio driver.

    - The generic platform can now auto-detect a suitable value for
    PHYS_OFFSET based upon the memory map described by the device tree,
    allowing us to avoid wasting memory on page book-keeping for
    systems where RAM starts at a non-zero physical address.

    - Ingenic systems using the jz4740 platform code now link their
    vmlinuz higher to allow for kernels of a realistic size.

    - Loongson32 now builds the kernel targeting MIPSr1 rather than
    MIPSr2 to avoid CPU errata.

    - Loongson64 gains a couple of fixes, a workaround for a write
    buffering issue & support for the Loongson 3A R3.1 CPU.

    - Malta now uses the piix4-poweroff driver to handle powering down.

    - Microsemi Ocelot gained support for its SPI bus & NOR flash, its
    second MDIO bus and can now be supported by a FIT/.itb image.

    - Octeon saw a bunch of header cleanups which remove a lot of
    duplicate or unused code"

    * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits)
    MIPS: Remove remnants of UASM_ISA
    MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()
    MIPS: VDSO: Force link endianness
    MIPS: Always specify -EB or -EL when using clang
    MIPS: Use dins to simplify __write_64bit_c0_split()
    MIPS: Use read-write output operand in __write_64bit_c0_split()
    MIPS: Avoid using array as parameter to write_c0_kpgd()
    MIPS: vdso: Allow clang's --target flag in VDSO cflags
    MIPS: genvdso: Remove GOT checks
    MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
    MIPS: generic: Remove input symbols from defconfig
    MIPS: Delete unused code in linux32.c
    MIPS: Remove unused sys_32_mmap2
    MIPS: Remove nabi_no_regargs
    mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
    mips: dts: mscc: Add spi on Ocelot
    MIPS: Loongson: Merge load addresses
    MIPS: Loongson: Set Loongson32 to MIPS32R1
    MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
    MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
    ...

    Linus Torvalds
     

20 Jul, 2018

1 commit

  • The regset API documented in <linux/regset.h> defines -ENODEV as the
    result of the `->active' handler to be used where the feature requested
    is not available on the hardware found. However code handling core file
    note generation in `fill_thread_core_info' interprets any non-zero
    result from the `->active' handler as the regset requested being active.
    Consequently processing continues (and hopefully gracefully fails later
    on) rather than being abandoned right away for the regset requested.

    Fix the problem then by making the code proceed only if a positive
    result is returned from the `->active' handler.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Paul Burton
    Fixes: 4206d3aa1978 ("elf core dump: notes user_regset")
    Patchwork: https://patchwork.linux-mips.org/patch/19332/
    Cc: Alexander Viro
    Cc: James Hogan
    Cc: Ralf Baechle
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-kernel@vger.kernel.org

    Maciej W. Rozycki
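
    The change in how the `->active' result is interpreted can be
    illustrated with two predicates. Both function names are hypothetical
    and the sketch models only the comparison, not the surrounding core
    dump code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old check: any non-zero result, including -ENODEV, counted as active. */
bool regset_active_old(int active_result)
{
	return active_result != 0;
}

/* Fixed check: only a positive result means the regset is active. */
bool regset_active_fixed(int active_result)
{
	return active_result > 0;
}
```

    With the fixed predicate, an -ENODEV result causes the regset to be
    skipped right away instead of being processed further.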
     

15 Jul, 2018

1 commit

  • The current code does not make sure to page align bss before calling
    vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to
    the requested length not being correctly aligned.

    Let us make sure to align it properly.

    Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured
    for libc5.

    Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
    Signed-off-by: Oscar Salvador
    Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com
    Tested-by: Tetsuo Handa
    Acked-by: Kees Cook
    Cc: Michal Hocko
    Cc: Nicolas Pitre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
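
    The alignment step can be sketched as follows. The page size of 4096
    and the helper names are assumptions made for illustration; the kernel
    uses its own page-alignment macros.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/* Round addr up to the next page boundary. */
unsigned long demo_page_align(unsigned long addr)
{
	return (addr + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/*
 * Length to request from a vm_brk()-like call for the bss that follows
 * the file-backed data. Both ends are page aligned first, so the result
 * is always a whole number of pages (possibly zero).
 */
unsigned long demo_bss_len(unsigned long file_end, unsigned long mem_end)
{
	unsigned long start = demo_page_align(file_end);
	unsigned long end = demo_page_align(mem_end);

	return end > start ? end - start : 0;
}
```

    For example, demo_bss_len(0x1234, 0x3001) yields 0x2000, a length that
    is a multiple of the page size even though neither input was aligned.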
     

15 Jun, 2018

1 commit

  • Nobody ever tried to self destruct by unmapping whole address space at
    once:

    munmap((void *)0, (1ULL << 47) - 4096);

    Doing this produces 2 warnings for zero-length vmalloc allocations:

    a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14
    a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
    ...
    a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
    ...

    Fix is to switch to kvmalloc().

    Steps to reproduce:

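    // vsyscall=none
    #include <sys/mman.h>
    #include <sys/resource.h>
    int main(void)
    {
            /* allow a core dump of unlimited size */
            setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY});
            /* unmap the whole address space; the resulting crash dumps core */
            munmap((void *)0, (1ULL << 47) - 4096);
            return 0;
    }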

    Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

13 Jun, 2018

2 commits

  • The vmalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vmalloc(a * b)

    with:
    vmalloc(array_size(a, b))

    as well as handling cases of:

    vmalloc(a * b * c)

    with:

    vmalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vmalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vmalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vmalloc(C1 * C2 * C3, ...)
    |
    vmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vmalloc(C1 * C2, ...)
    |
    vmalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
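
    The point of wrapping the factors is that an overflowing product
    saturates to SIZE_MAX, which the allocator then refuses, instead of
    wrapping around to a small size. A userspace sketch of that behavior
    follows; the demo_ names are hypothetical and __builtin_mul_overflow()
    is a GCC/Clang builtin, so this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two factors, saturating to SIZE_MAX on overflow. */
size_t demo_array_size(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Three-factor variant with the same saturating behavior. */
size_t demo_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

    A caller such as malloc(demo_array_size(count, size)) then fails
    cleanly on overflow rather than returning an undersized buffer.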
     
  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
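
    To illustrate what the conversion above buys, here is a minimal
    userspace sketch, not kernel code. It assumes GCC or Clang for
    __builtin_mul_overflow(); alloc_array() and open_coded_bytes() are
    hypothetical names standing in for kmalloc_array() and for the
    open-coded kmalloc(a * b) size argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for kmalloc_array(): returns NULL
 * instead of allocating a too-small buffer when n * size overflows. */
void *alloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;	/* product does not fit in size_t */
	return malloc(bytes);
}

/* The open-coded form the script rewrites: the product wraps silently. */
size_t open_coded_bytes(size_t n, size_t size)
{
	return n * size;
}
```

    With n = SIZE_MAX and size = 2 the open-coded product wraps to
    SIZE_MAX - 1, while the checked helper refuses the request.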
     

21 Apr, 2018

1 commit

  • Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
    printing spurious messages under memory pressure due to map_addr == -ENOMEM.

    9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
    14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
    16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

    Complain only if -EEXIST, and use %px for printing the address.

    Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
    Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map")
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Andrei Vagin
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
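
    The value in parentheses in the log lines above is the errno-style
    return code printed as an unsigned 64-bit number. A small sketch of that
    reading and of the narrowed check follows; the function names are
    hypothetical and this is a userspace model, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>

/* Reinterprets an errno-style return value as the raw 64-bit pattern
 * that the old message printed in parentheses. */
unsigned long long as_printed_value(long map_addr)
{
	return (unsigned long long)map_addr;
}

/* Model of the fixed check: only -EEXIST means the range was already
 * mapped; -ENOMEM and other errors do not deserve the warning. */
int should_warn_overlap(long map_addr)
{
	return map_addr == -EEXIST;
}
```

    On Linux ENOMEM is 12, so as_printed_value(-ENOMEM) gives
    0xfffffffffffffff4, matching the spurious messages.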
     

12 Apr, 2018

3 commits

  • Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
    elf_map" applied, some ELF binaries in his environment fail to start
    with

    [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
    [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon

    The reason is that the above binary has overlapping elf segments:

    LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
    0x0000000000013a8c 0x0000000000013a8c R E 10000
    LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
    0x00000000000002c0 0x00000000000005e8 RW 10000
    LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
    0x0000000000000384 0x00000000000094a0 RW 10000

    That binary has two RW LOAD segments; the first crosses a page border
    into the second:

    0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

    Handle this situation by enforcing MAP_FIXED when we establish a
    temporary brk VMA to handle overlapping segments. All other mappings
    will still use MAP_FIXED_NOREPLACE.

    Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: Anshuman Khandual
    Reviewed-by: Khalid Aziz
    Cc: Andrei Vagin
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Stephen Rothwell
    Cc: Mark Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
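
    The overlap arithmetic in this message can be checked with a short
    sketch. It assumes a 64K page size (0x10000), taken from the alignment
    column of the program headers quoted above; the helper names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to the start of its page (page_size must be a
 * power of two). */
uint64_t page_start(uint64_t addr, uint64_t page_size)
{
	assert((page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* Returns 1 when the page holding the last byte of the first segment is
 * also the first page of the second segment. */
int segments_share_page(uint64_t vaddr1, uint64_t memsz1,
			uint64_t vaddr2, uint64_t page_size)
{
	uint64_t last_byte = vaddr1 + memsz1 - 1;

	return page_start(last_byte, page_size) == page_start(vaddr2, page_size);
}
```

    With the memory size 0x5e8 the second LOAD ends inside the page at
    0x10030000, where the third LOAD begins. With only its file size 0x2c0
    it would stop exactly at that page border.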
     
  • Both load_elf_interp and load_elf_binary rely on elf_map to map segments
    at a controlled address and they use MAP_FIXED to enforce that. This is,
    however, a dangerous practice prone to silent data corruption, which can
    even be exploitable.

    Let's take CVE-2017-1000253 as an example. At the time (before commit
    eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
    the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
    we could end up mapping over the existing stack with some luck.

    The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix
    bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved much
    further from the stack (eab09532d400 and later by c715b72c1ba4: "mm:
    revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive
    stack consumption early during execve fully stopped by da029c11e6b1
    ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
    safe and any attack should be impractical. On the other hand, this is
    too subtle an assumption: it can break quite easily and the breakage is
    hard to spot.

    I believe that the MAP_FIXED usage in load_elf_binary (et al.) is still
    fundamentally dangerous. Moreover, it shouldn't even be needed. We are
    at the early process stage, so there shouldn't be any unrelated mappings
    (except for stack and loader) and mmap for a given address should
    succeed even without MAP_FIXED. Something is terribly wrong if this is
    not the case and we should rather fail than silently corrupt the
    underlying mapping.

    Address this issue by changing MAP_FIXED to the newly added
    MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an
    existing mapping clashing with the requested one without clobbering it.

    [mhocko@suse.com: fix build]
    [akpm@linux-foundation.org: coding-style fixes]
    [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
    Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
    Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrei Vagin
    Signed-off-by: Michal Hocko
    Reviewed-by: Khalid Aziz
    Acked-by: Michael Ellerman
    Acked-by: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
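
    The behaviour described above can be observed from userspace. The
    sketch below is a Linux-only illustration, not the kernel's elf_map()
    code. The fallback value 0x100000 for MAP_FIXED_NOREPLACE is the
    asm-generic one and is an assumption for old libc headers; the function
    name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* assumption: asm-generic value */
#endif

/* Maps one page, then asks for the same address again with
 * MAP_FIXED_NOREPLACE. Returns 1 if the first mapping survived intact,
 * 0 if it was clobbered, -1 if the setup mmap() failed. *err receives
 * errno from the second mmap(), or 0 if that call succeeded. */
int noreplace_keeps_existing_mapping(int *err)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *first, *second;
	int intact;

	first = mmap(NULL, len, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;
	first[0] = 0xAB;

	second = mmap(first, len, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
	*err = (second == MAP_FAILED) ? errno : 0;

	/* Kernels older than 4.17 do not know the flag and treat the address
	 * as a hint, so the call may succeed at a different address. */
	if (second != MAP_FAILED && second != first)
		munmap(second, len);

	intact = (first[0] == 0xAB);
	munmap(first, len);
	return intact;
}
```

    On a kernel that knows the flag, the second mmap() is expected to fail
    with EEXIST and leave the first page untouched. Plain MAP_FIXED would
    instead replace it with a fresh zero page.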
     
  • Provide a final callback into fs/exec.c before start_thread() takes
    over, to handle any last-minute changes, like the coming restoration of
    the stack limit.

    Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Ben Hutchings
    Cc: Brad Spengler
    Cc: Greg KH
    Cc: Hugh Dickins
    Cc: "Jason A. Donenfeld"
    Cc: Laura Abbott
    Cc: Michal Hocko
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

07 Feb, 2018

1 commit

  • If vm.max_map_count is bumped above 2^26 (67+ mil) and the system has
    enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte
    VMAs), then it should be possible to overflow the 32-bit "size", pass the
    paranoia check, allocate very little vmalloc space and oops while writing
    into the vmalloc guard page...

    But I didn't test this, only the coredump of a regular process.

    Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
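
    A sketch of the wrap-around described above. The 2^26 threshold is
    consistent with a per-mapping factor of 64 bytes (2^26 * 64 = 2^32), but
    that factor is an assumption here, not something the message states.
    The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 64 bytes estimated per mapping. */
#define BYTES_PER_MAPPING 64u

/* Size estimate held in a 32-bit variable: wraps once count reaches 2^26. */
uint32_t estimate_size_32(uint32_t count)
{
	return count * BYTES_PER_MAPPING;
}

/* Same estimate widened before multiplying, so no wrap occurs. */
uint64_t estimate_size_64(uint32_t count)
{
	return (uint64_t)count * BYTES_PER_MAPPING;
}
```

    At count = 2^26 the 32-bit estimate is 0, so a later sanity check on
    the size sees a small, harmless-looking number.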
     

17 Nov, 2017

1 commit

  • Pull ARM updates from Russell King:

    - add support for ELF fdpic binaries on both MMU and noMMU platforms

    - linker script cleanups

    - support for compressed .data section for XIP images

    - discard memblock arrays when possible

    - various cleanups

    - atomic DMA pool updates

    - better diagnostics of missing/corrupt device tree

    - export information to allow userspace kexec tool to place images more
    intelligently, so that the device tree isn't overwritten by the
    booting kernel

    - make early_printk more efficient on semihosted systems

    - noMMU cleanups

    - SA1111 PCMCIA update in preparation for further cleanups

    * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits)
    ARM: 8719/1: NOMMU: work around maybe-uninitialized warning
    ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r"
    ARM: 8713/1: NOMMU: Support MPU in XIP configuration
    ARM: 8712/1: NOMMU: Use more MPU regions to cover memory
    ARM: 8711/1: V7M: Add support for MPU to M-class
    ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE
    ARM: 8709/1: NOMMU: Disallow MPU for XIP
    ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C
    ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers
    ARM: 8706/1: NOMMU: Move out MPU setup in separate module
    ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
    ARM: 8705/1: early_printk: use printascii() rather than printch()
    ARM: 8703/1: debug.S: move hexbuf to a writable section
    ARM: add additional table to compressed kernel
    ARM: decompressor: fix BSS size calculation
    pcmcia: sa1111: remove special sa1111 mmio accessors
    pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources
    ARM: better diagnostics with missing/corrupt dtb
    ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size()
    ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init
    ..

    Linus Torvalds
     

03 Nov, 2017

1 commit

  • Currently the regset API doesn't allow for the possibility that
    regsets (or at least, the amount of meaningful data in a regset)
    may change in size.

    In particular, this results in useless padding being added to
    coredumps if a regset's current size is smaller than its
    theoretical maximum size.

    This patch adds a get_size() function to struct user_regset.
    Individual regset implementations can implement this function to
    return the current size of the regset data. A regset_size()
    function is added to provide callers with an abstract interface for
    determining the size of a regset without needing to know whether
    the regset is dynamically sized or not.

    The only affected user of this interface is the ELF coredump code:
    This patch ports ELF coredump to dump regsets with their actual
    size in the coredump. This has no effect except for new regsets
    that are dynamically sized and provide a get_size() implementation.

    Signed-off-by: Dave Martin
    Reviewed-by: Catalin Marinas
    Cc: Oleg Nesterov
    Cc: Alexander Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dmitry Safonov
    Cc: H. J. Lu
    Signed-off-by: Will Deacon

    Dave Martin
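
    The fallback logic described above can be modelled in a few lines. This
    is a simplified userspace sketch: struct regset and struct task are
    stand-ins for the kernel's struct user_regset and task_struct, and
    half_live_size() is a hypothetical dynamically sized regset.

```c
#include <assert.h>
#include <stddef.h>

struct task;	/* opaque stand-in for the kernel's task_struct */

/* Simplified model of struct user_regset: n registers of `size` bytes,
 * plus the optional callback this patch describes. */
struct regset {
	unsigned int n;
	unsigned int size;
	unsigned int (*get_size)(struct task *target, const struct regset *rs);
};

/* Dynamic regsets report their current size; static ones fall back to
 * the theoretical maximum, n * size. */
unsigned int regset_size(struct task *target, const struct regset *rs)
{
	if (!rs->get_size)
		return rs->n * rs->size;
	return rs->get_size(target, rs);
}

/* Hypothetical dynamic regset: only half of the registers are live. */
unsigned int half_live_size(struct task *target, const struct regset *rs)
{
	(void)target;
	return (rs->n / 2) * rs->size;
}
```

    A caller such as the coredump code only ever calls regset_size() and
    does not need to know which kind of regset it is dumping.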
     

03 Oct, 2017

1 commit


15 Sep, 2017

1 commit

  • Pull more set_fs removal from Al Viro:
    "Christoph's 'use kernel_read and friends rather than open-coding
    set_fs()' series"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: unexport vfs_readv and vfs_writev
    fs: unexport vfs_read and vfs_write
    fs: unexport __vfs_read/__vfs_write
    lustre: switch to kernel_write
    gadget/f_mass_storage: stop messing with the address limit
    mconsole: switch to kernel_read
    btrfs: switch write_buf to kernel_write
    net/9p: switch p9_fd_read to kernel_write
    mm/nommu: switch do_mmap_private to kernel_read
    serial2002: switch serial2002_tty_write to kernel_{read/write}
    fs: make the buf argument to __kernel_write a void pointer
    fs: fix kernel_write prototype
    fs: fix kernel_read prototype
    fs: move kernel_read to fs/read_write.c
    fs: move kernel_write to fs/read_write.c
    autofs4: switch autofs4_write to __kernel_write
    ashmem: switch to ->read_iter

    Linus Torvalds
     

11 Sep, 2017

1 commit

  • On platforms where both ELF and ELF-FDPIC variants are available, the
    regular ELF loader will happily identify FDPIC binaries as proper ELF
    and load them without the necessary FDPIC fixups, resulting in an
    immediate user space crash. Let's prevent binfmt_elf from loading those
    binaries so binfmt_elf_fdpic has a chance to pick them up. For those
    architectures that don't define elf_check_fdpic(), a default version
    returning false is provided.

    Signed-off-by: Nicolas Pitre
    Acked-by: Mickael GUENE
    Tested-by: Vincent Abriou
    Tested-by: Andras Szemzo

    Nicolas Pitre
     

08 Sep, 2017

1 commit

  • Pull secureexec update from Kees Cook:
    "This series has the ultimate goal of providing a sane stack rlimit
    when running set*id processes.

    To do this, the bprm_secureexec LSM hook is collapsed into the
    bprm_set_creds hook so the secureexec-ness of an exec can be
    determined early enough to make decisions about rlimits and the
    resulting memory layouts. Other logic acting on the secureexec-ness of
    an exec is similarly consolidated. Capabilities needed some special
    handling, but the refactoring removed other special handling, so that
    was a wash"

    * tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    exec: Consolidate pdeath_signal clearing
    exec: Use sane stack rlimit under secureexec
    exec: Consolidate dumpability logic
    smack: Remove redundant pdeath_signal clearing
    exec: Use secureexec for clearing pdeath_signal
    exec: Use secureexec for setting dumpability
    LSM: drop bprm_secureexec hook
    commoncap: Move cap_elevated calculation into bprm_set_creds
    commoncap: Refactor to remove bprm_secureexec hook
    smack: Refactor to remove bprm_secureexec hook
    selinux: Refactor to remove bprm_secureexec hook
    apparmor: Refactor to remove bprm_secureexec hook
    binfmt: Introduce secureexec flag
    exec: Correct comments about "point of no return"
    exec: Rename bprm->cred_prepared to called_set_creds

    Linus Torvalds
     

05 Sep, 2017

1 commit

  • Use proper ssize_t and size_t types for the return value and count
    argument, move the offset last and make it an in/out argument like
    all other read/write helpers, and make the buf argument a void pointer
    to get rid of lots of casts in the callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Aug, 2017

1 commit

  • The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and
    randomize_stack_top() are not required.

    PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not
    set, no need to re-check after that.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Dmitry Safonov
    Cc: stable@vger.kernel.org
    Cc: Andy Lutomirski
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com

    Oleg Nesterov
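
    The reasoning above can be modelled as follows. The constants mirror
    the kernel's bit values but are redefined under MODEL_ names so the
    sketch stands alone; the function names are hypothetical and this is
    not the kernel's code.

```c
#include <assert.h>

#define MODEL_ADDR_NO_RANDOMIZE 0x0040000u	/* personality bit */
#define MODEL_PF_RANDOMIZE      0x00400000u	/* per-task flag */

/* What load_elf_binary() does, per the message: set PF_RANDOMIZE only
 * when the personality does not forbid randomization. */
unsigned int exec_task_flags(unsigned int personality)
{
	unsigned int flags = 0;

	if (!(personality & MODEL_ADDR_NO_RANDOMIZE))
		flags |= MODEL_PF_RANDOMIZE;
	return flags;
}

/* Later code only needs to test the task flag; re-checking the
 * personality bit adds nothing. */
int stack_is_randomized(unsigned int task_flags)
{
	return (task_flags & MODEL_PF_RANDOMIZE) != 0;
}
```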
     

02 Aug, 2017

1 commit

  • The bprm_secureexec hook can be moved earlier. Right now, it is called
    during create_elf_tables(), via load_binary(), via search_binary_handler(),
    via exec_binprm(). Nearly all (see exception below) state used by
    bprm_secureexec is created during the bprm_set_creds hook, called from
    prepare_binprm().

    For all LSMs (except commoncaps described next), only the first execution
    of bprm_set_creds takes any effect (they all check bprm->called_set_creds
    which prepare_binprm() sets after the first call to the bprm_set_creds
    hook). However, all these LSMs also only do anything with bprm_secureexec
    when they detected a secure state during their first run of bprm_set_creds.
    Therefore, it is functionally identical to move the detection into
    bprm_set_creds, since the results from secureexec here only need to be
    based on the first call to the LSM's bprm_set_creds hook.

    The single exception is that the commoncaps secureexec hook also examines
    euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
    via prepare_binprm(), which can be called multiple times (e.g.
    binfmt_script, binfmt_misc), and may clear the euid/egid for the final
    load (i.e. the script interpreter). However, while commoncaps specifically
    ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
    prepare_binprm() may get called, it needs to base the secureexec decision
    on the final call to bprm_set_creds. As a result, it will need special
    handling.

    To begin this refactoring, this adds the secureexec flag to the bprm
    struct, and calls the secureexec hook during setup_new_exec(). This is
    safe since all the cred work is finished (and past the point of no return).
    This explicit call will be removed in later patches once the hook has been
    removed.

    Cc: David Howells
    Signed-off-by: Kees Cook
    Reviewed-by: John Johansen
    Acked-by: Serge Hallyn
    Reviewed-by: James Morris

    Kees Cook
     

11 Jul, 2017

2 commits

  • When building the argv/envp pointers, the envp is needlessly
    pre-incremented instead of just continuing after the argv pointers are
    finished. In some (likely impossible) race where the strings could be
    changed from userspace between copy_strings() and here, it might be
    possible to confuse the envp position. Instead, just use sp like
    everything else.

    Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast
    Signed-off-by: Kees Cook
    Cc: Rik van Riel
    Cc: Daniel Micay
    Cc: Qualys Security Advisory
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Dmitry Safonov
    Cc: Andy Lutomirski
    Cc: Grzegorz Andrejczuk
    Cc: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
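
    A rough model of the layout being built: argc, the argv pointers, a
    NULL, the envp pointers and a final NULL, all written through a single
    advancing cursor. fill_pointer_table() is a hypothetical userspace
    helper, not the kernel's create_elf_tables().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes argc, the argv pointers, a NULL, the envp pointers and a final
 * NULL into `slots` using one cursor. Returns the number of slots used. */
size_t fill_pointer_table(uintptr_t *slots, size_t cap,
			  const uintptr_t *argv, size_t argc,
			  const uintptr_t *envp, size_t envc)
{
	uintptr_t *sp = slots;
	size_t i;

	assert(cap >= argc + envc + 3);
	*sp++ = (uintptr_t)argc;
	for (i = 0; i < argc; i++)
		*sp++ = argv[i];
	*sp++ = 0;			/* argv terminator */
	for (i = 0; i < envc; i++)	/* envp continues where argv ended */
		*sp++ = envp[i];
	*sp++ = 0;			/* envp terminator */
	return (size_t)(sp - slots);
}
```

    Because envp starts wherever the cursor stands after argv, its position
    never has to be computed in advance from a separately counted argc.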
     
  • The ELF_ET_DYN_BASE position was originally intended to keep loaders
    away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
    /bin/cat" might cause the subsequent load of /bin/cat into where the
    loader had been loaded.)

    With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
    ELF_ET_DYN_BASE continued to be used since the kernel was only looking
    at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the
    top 1/3rd of the TASK_SIZE, a substantial portion of the address space
    is unused.

    For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
    loaded above the mmap region. This means they can be made to collide
    (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
    pathological stack regions.

    Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
    region in all cases, and will now additionally avoid programs falling
    back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
    if it would have collided with the stack, now it will fail to load
    instead of falling back to the mmap region).

    To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
    are loaded into the mmap region, leaving space available for either an
    ET_EXEC binary with a fixed location or PIE being loaded into mmap by
    the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
    which means architectures can now safely lower their values without risk
    of loaders colliding with their subsequently loaded programs.

    For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
    the entire 32-bit address space for 32-bit pointers.

    Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
    suggestions on how to implement this solution.

    Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR")
    Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast
    Signed-off-by: Kees Cook
    Acked-by: Rik van Riel
    Cc: Daniel Micay
    Cc: Qualys Security Advisory
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Dmitry Safonov
    Cc: Andy Lutomirski
    Cc: Grzegorz Andrejczuk
    Cc: Masahiro Yamada
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: James Hogan
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Pratyush Anand
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
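
    The placement policy described above can be summarised in a small
    decision function. This is a simplified model, not the kernel's
    load_elf_binary(): the MODEL_ constants are local redefinitions (ET_EXEC
    is 2 and ET_DYN is 3 in the ELF specification, and 4GB is the 64-bit
    base suggested in the message), and pick_load_base() is a hypothetical
    name.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_ET_EXEC 2
#define MODEL_ET_DYN  3
#define MODEL_ELF_ET_DYN_BASE 0x100000000ULL	/* 4GB, for 64-bit */

/* Returns the base a program is loaded relative to. 0 means "no bias":
 * either the file's own fixed addresses (ET_EXEC) or wherever the mmap
 * region places it (a loader, i.e. ET_DYN without INTERP). */
uint64_t pick_load_base(int e_type, int has_interp)
{
	if (e_type == MODEL_ET_DYN && has_interp)
		return MODEL_ELF_ET_DYN_BASE;	/* PIE */
	return 0;
}
```

    Only the PIE case uses ELF_ET_DYN_BASE, which is why the value can be
    lowered without a loader landing on top of the program it loads next.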