01 Feb, 2020

8 commits

  • Unmapping whole address space at once with

    munmap(0, (1ULL<
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • array_size() macro will do overflow check anyway.

    Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Comment says ELF header is "too large to be on stack". 64 bytes on
    64-bit is not large by any means.

    Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
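
    The "64 bytes" figure above can be checked from userspace. A minimal
    sketch, assuming glibc's <elf.h>; the helper names are made up for
    illustration and are not kernel functions:

```c
#include <elf.h>
#include <stddef.h>

/* Size of the ELF file header for 64-bit objects (hypothetical helper). */
size_t elf64_header_size(void)
{
	return sizeof(Elf64_Ehdr);
}

/* Size of the ELF file header for 32-bit objects (hypothetical helper). */
size_t elf32_header_size(void)
{
	return sizeof(Elf32_Ehdr);
}
```

    On common ABIs the 64-bit header is 64 bytes and the 32-bit one is 52
    bytes, which is small enough for a stack variable.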
     
  • If some mapping goes past TASK_SIZE it will be rejected by kernel which
    means no such userspace binaries exist.

    Mark every such check as unlikely.

    Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
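
    A userspace sketch of what such a branch hint looks like. The macro
    and the address-space limit are stand-ins assumed for illustration
    (GCC/Clang builtin, 47-bit user space), not the kernel's definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Assumed address-space limit, for illustration only (47-bit user space). */
#define TASK_SIZE_SKETCH (UINT64_C(1) << 47)

/*
 * Returns true when [addr, addr + len) fits below the limit.
 * The rejection path is marked as the cold one.
 */
bool mapping_fits(uint64_t addr, uint64_t len)
{
	if (unlikely_sketch(addr > TASK_SIZE_SKETCH ||
			    len > TASK_SIZE_SKETCH - addr))
		return false;
	return true;
}
```

    The hint does not change the result, only which path the compiler
    lays out as the fall-through one.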
     
  • "current->mm" pointer is stable in general except few cases one of which
    is execve(2). Compiler can't treat it as stable but it _is_ stable most of
    the time. During ELF loading process ->mm becomes stable right after
    flush_old_exec().

    Help compiler by caching current->mm, otherwise it continues to refetch
    it.

    add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141)
    Function                                     old     new   delta
    elf_core_dump                               5062    5039     -23
    load_elf_binary                             5426    5308    -118

    Note: other cases are left as is because it is either pessimisation or
    no change in binary size.

    Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • ELF header is read into bprm->buf[] by generic execve code.

    Save a memcpy and allocate just one header for the interpreter instead
    of two headers (64 bytes instead of 128 on 64-bit).

    Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Only executable segments should be accounted to ->start_code just like
    they do to ->end_code (correctly).

    Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
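
    A sketch of the accounting rule described above, written against
    userspace <elf.h> types. The struct and function names are
    hypothetical, and using p_filesz for the end bound is an assumption
    of this sketch:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

struct code_range {
	uint64_t start_code;	/* lowest start of any executable segment */
	uint64_t end_code;	/* highest end of any executable segment */
};

/* Hypothetical helper: scan PT_LOAD headers, counting only PF_X ones. */
struct code_range find_code_range(const Elf64_Phdr *phdr, size_t n)
{
	struct code_range r = { UINT64_MAX, 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t start, end;

		if (phdr[i].p_type != PT_LOAD)
			continue;
		if (!(phdr[i].p_flags & PF_X))
			continue;	/* data segments do not move either bound */

		start = phdr[i].p_vaddr;
		end = phdr[i].p_vaddr + phdr[i].p_filesz;
		if (start < r.start_code)
			r.start_code = start;
		if (end > r.end_code)
			r.end_code = end;
	}
	return r;
}
```

    With a read-only segment at 0x1000 and an executable one at 0x2000,
    only the second contributes to either bound.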
     
  • Filling auxv vector as array with index (auxv[i++] = ...) generates
    terrible code. "saved_auxv" should be reworked because it is the worst
    member of mm_struct by size/usefulness ratio but do it later.

    Meanwhile help gcc a little with *auxv++ idiom.

    Space savings on x86_64:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127)
    Function                                     old     new   delta
    load_elf_binary                             5470    5343    -127

    Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
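
    The *auxv++ idiom mentioned above, as a standalone sketch. The
    function name, the macro name and the choice of entries are
    assumptions made for illustration; only the AT_* constants come from
    <elf.h>:

```c
#include <elf.h>
#include <stddef.h>

/*
 * Fill (id, value) pairs through a moving pointer instead of an index.
 * Returns the number of unsigned longs written, including the AT_NULL
 * terminator pair. The entries chosen are arbitrary examples.
 */
size_t fill_auxv_sketch(unsigned long *auxv, unsigned long pagesz,
			unsigned long entry)
{
	unsigned long *p = auxv;

#define NEW_AUX_ENT_SKETCH(id, val)		\
	do {					\
		*p++ = (id);			\
		*p++ = (val);			\
	} while (0)

	NEW_AUX_ENT_SKETCH(AT_PAGESZ, pagesz);
	NEW_AUX_ENT_SKETCH(AT_ENTRY, entry);
	NEW_AUX_ENT_SKETCH(AT_NULL, 0);
#undef NEW_AUX_ENT_SKETCH

	return (size_t)(p - auxv);
}
```

    The caller must provide room for all pairs; the element count is
    recovered at the end from the pointer difference rather than from a
    running index.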
     

05 Dec, 2019

2 commits


15 Nov, 2019

1 commit

  • We store elapsed time for a crashed process in struct elf_prstatus using
    'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
    incompatible with the kernel's idea of timeval since the structure layout
    no longer matches on 32-bit architectures.

    This changes the definition of the elf_prstatus structure to use
    __kernel_old_timeval instead, which is hardcoded to the currently used
    binary layout. There is no risk of overflow in y2038 though, because
    the time values are all relative times, and can store up to 68 years
    of process elapsed time.

    There is a risk of applications breaking at build time when they
    use the new kernel headers and expect the type to be exactly 'timeval'
    rather than a structure that has the same fields as before. Those
    applications have to be modified to deal with 64-bit time_t anyway.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
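
    The "68 years" figure follows from a signed 32-bit seconds field. A
    worked check, assuming 365.25-day years; the names are made up for
    this sketch:

```c
#include <stdint.h>

/* Largest value of a signed 32-bit seconds field. */
#define MAX_TV_SEC_32 INT32_MAX

/* Whole years that fit, using 365.25-day years (31,557,600 s each). */
int64_t max_elapsed_years(void)
{
	const int64_t seconds_per_year = 36525LL * 86400 / 100;

	return (int64_t)MAX_TV_SEC_32 / seconds_per_year;
}
```

    2147483647 / 31557600 is a little over 68, so a relative time only
    overflows for a process that has accumulated more than 68 years.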
     

07 Oct, 2019

1 commit

  • In commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we
    changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
    executable mappings.

    Then, people reported that it broke some binaries that had overlapping
    segments from the same file, and commit ad55eac74f20 ("elf: enforce
    MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
    overlaying elf segment cases. But only some - despite the summary line
    of that commit, it only did it when it also does a temporary brk vma for
    one obvious overlapping case.

    Now Russell King reports another overlapping case with old 32-bit x86
    binaries, which doesn't trigger that limited case. End result: we had
    better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.

    Yes, it's a sign of old binaries generated with old tool-chains, but we
    do pride ourselves on not breaking existing setups.

    This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
    and the old load_elf_library() use-cases, because nobody has reported
    breakage for those. Yet.

    Note that in all the cases seen so far, the overlapping elf sections
    seem to be just re-mapping of the same executable with different section
    attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE
    flag or similar, which acts like NOREPLACE, but allows just remapping
    the same executable file using different protection flags.

    It's not clear that would make a huge difference to anything, but if
    people really hate that "elf remaps over previous maps" behavior, maybe
    at least a more limited form of remapping would alleviate some concerns.

    Alternatively, we should take a look at our elf_map() logic to see if we
    end up not mapping things properly the first time.

    In the meantime, this is the minimal "don't do that then" patch while
    people hopefully think about it more.

    Reported-by: Russell King
    Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
    Fixes: ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments")
    Cc: Michal Hocko
    Cc: Kees Cook
    Signed-off-by: Linus Torvalds

    Linus Torvalds
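
    The difference between the two flags can be observed from userspace.
    A minimal sketch, assuming Linux 4.17 or later; the fallback flag
    value is the asm-generic one and the function name is hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* asm-generic value, Linux >= 4.17 */
#endif

/*
 * Map one anonymous page, then try to map over it at the same address
 * with the given extra flag (MAP_FIXED or MAP_FIXED_NOREPLACE).
 * Returns 0 if the second mapping succeeded, otherwise the errno value,
 * or -1 if the initial mapping could not be created.
 */
int remap_over_existing(int fixed_flag)
{
	const size_t len = 4096;
	int err = 0;
	void *first, *second;

	first = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;

	second = mmap(first, len, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS | fixed_flag, -1, 0);
	if (second == MAP_FAILED)
		err = errno;
	else if (second != first)
		munmap(second, len);	/* an old kernel ignored the flag */

	munmap(first, len);
	return err;
}
```

    MAP_FIXED silently replaces the existing page, while
    MAP_FIXED_NOREPLACE is expected to fail with EEXIST. Overlapping
    segments of an old binary hit the second behaviour.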
     

27 Sep, 2019

1 commit

  • When brk was moved for binaries without an interpreter, it should have
    been limited to ET_DYN only. In other words, the special case was an
    ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
    The bug manifested for giant static executables, where the brk would end
    up in the middle of the text area on 32-bit architectures.

    Reported-and-tested-by: Richard Kojedzinszky
    Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Kees Cook
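
    The corrected condition can be written as a small predicate. This is
    a sketch with a hypothetical function name, using userspace <elf.h>
    types rather than the kernel's:

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: treat the image as a directly executed loader
 * only when it is ET_DYN and carries no PT_INTERP program header.
 */
bool is_direct_loader_exec(const Elf64_Ehdr *ehdr, bool has_interp)
{
	return ehdr->e_type == ET_DYN && !has_interp;
}
```

    A static ET_EXEC binary also lacks PT_INTERP, but it fails the
    ET_DYN test and so keeps its brk where it was.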
     

25 Sep, 2019

1 commit

  • Patch series "Provide generic top-down mmap layout functions", v6.

    This series introduces generic functions to make top-down mmap layout
    easily accessible to architectures, in particular riscv which was the
    initial goal of this series. The generic implementation was taken from
    arm64 and used successively by arm, mips and finally riscv.

    Note that in addition the series fixes 2 issues:

    - stack randomization was taken into account even if not necessary.

    - [1] fixed an issue with mmap base which did not take into account
    randomization, but the fix was not carried over to arm and mips, so by
    moving the arm64 code into a generic library, this problem is now fixed
    for both architectures.

    This work is an effort to factorize architecture functions to avoid code
    duplication and oversights as in [1].

    [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html

    This patch (of 14):

    This preparatory commit moves this function so that further introduction
    of generic topdown mmap layout is contained only in mm/util.c.

    Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     

17 Jul, 2019

1 commit

  • "passed_fileno" variable was deleted 11 years ago in 2.6.25.

    Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2
    Fixes: d20894a23708 ("Remove a.out interpreter support in ELF loader")
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have MODULE_LICENSE("GPL*") inside which was used in the initial
    scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

9 commits

  • Commit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    made changes in the rare case when the ELF loader was directly invoked
    (e.g. to set a non-inheritable LD_LIBRARY_PATH, testing new versions of
    the loader), by moving into the mmap region to avoid both ET_EXEC and
    PIE binaries. This had the effect of also moving the brk region into
    mmap, which could lead to the stack and brk being arbitrarily close to
    each other. An unlucky process wouldn't get its requested stack size
    and stack allocations could end up scribbling on the heap.

    This is illustrated here. In the case of using the loader directly, brk
    (so helpfully identified as "[heap]") is allocated with the _loader_ not
    the binary. For example, with ASLR entirely disabled, you can see this
    more clearly:

    $ /bin/cat /proc/self/maps
    555555554000-55555555c000 r-xp 00000000 ... /bin/cat
    55555575b000-55555575c000 r--p 00007000 ... /bin/cat
    55555575c000-55555575d000 rw-p 00008000 ... /bin/cat
    55555575d000-55555577e000 rw-p 00000000 ... [heap]
    ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff7fff000 rw-p 00000000 ...
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    $ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps
    ...
    7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat
    7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat
    7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat
    7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat
    7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ...
    7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar]
    7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso]
    7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so
    7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap]
    7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack]

    The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since
    nothing is there in the direct loader case (and ET_EXEC is still far
    away at 0x400000). Anything that ran before should still work (i.e.
    the ultimately-launched binary already had the brk very far from its
    text, so this should be no different from a COMPAT_BRK standpoint). The
    only risk I see here is that if someone started to suddenly depend on
    the entire memory space lower than the mmap region being available when
    launching binaries via a direct loader exec, which seems highly
    unlikely, I'd hope: this would mean a binary would _not_ work when
    exec()ed normally.

    (Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZATION
    when randomization is turned on.)

    Link: http://lkml.kernel.org/r/20190422225727.GA21011@beast
    Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPSjA@mail.gmail.com
    Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    Signed-off-by: Kees Cook
    Reported-by: Ali Saidi
    Cc: Ali Saidi
    Cc: Guenter Roeck
    Cc: Michal Hocko
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Get "current_pt_regs" pointer right before usage.

    Space savings on x86_64:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180)
    Function                                     old     new   delta
    load_elf_binary                             5806    5626    -180 !!!

    Looks like the compiler doesn't know that "current_pt_regs" is stable
    pointer (because it doesn't know ->stack isn't) even though it knows
    that "current" is stable pointer. So it saves it in the very beginning
    and then tries to carry it through a lot of code.

    Here is what happens here:

    load_elf_binary()
    ...
    mov rax,QWORD PTR gs:0x14c00
    mov r13,QWORD PTR [rax+0x18] # r13 = current->stack
    call kmem_cache_alloc # first kmalloc

    [980 bytes later!]

    # let's spill that sucker because we need a register
    # for "load_bias" calculations at
    #
    # if (interpreter) {
    # load_bias = ELF_ET_DYN_BASE;
    # if (current->flags & PF_RANDOMIZE)
    # load_bias += arch_mmap_rnd();
    # elf_flags |= elf_fixed;
    # }
    mov QWORD PTR [rsp+0x68],r13

    If this is not _the_ root cause it is still eeeeh.

    After the patch things become much simpler:

    mov rax, QWORD PTR gs:0x14c00 # current
    mov rdx, QWORD PTR [rax+0x18] # current->stack
    movq [rdx+0x3fb8], 0 # fill pt_regs
    ...
    call finalize_exec

    Link: http://lkml.kernel.org/r/20190419200343.GA19788@avx2
    Signed-off-by: Alexey Dobriyan
    Tested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There are two places where mapping protections are calculated: one for
    executable, another one for interpreter -- take them out.

    The ELF read and execute permission bits are swapped relative to Linux
    PROT_READ and PROT_EXEC (PF_R is 4 and PF_X is 1, while PROT_READ is 1
    and PROT_EXEC is 4); microoptimizations are welcome!

    Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
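
    A sketch of a shared helper for the protection calculation, written
    for userspace with <elf.h> and <sys/mman.h>. The function name is
    chosen for this example and is not claimed to match the kernel's:

```c
#include <elf.h>
#include <sys/mman.h>

/* Translate ELF segment flags (PF_*) into mmap protection bits (PROT_*). */
int make_prot_sketch(unsigned int p_flags)
{
	int prot = 0;

	if (p_flags & PF_R)
		prot |= PROT_READ;
	if (p_flags & PF_W)
		prot |= PROT_WRITE;
	if (p_flags & PF_X)
		prot |= PROT_EXEC;
	return prot;
}
```

    The bit-by-bit translation is needed because the read and execute
    bits do not share values between the two namespaces, so the flags
    cannot simply be passed through.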
     
  • Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Rewrite

    for (...) {
            if (->p_type == PT_INTERP) {
                    ...
                    break;
            }
    }

    loop into

    for (...) {
            if (->p_type != PT_INTERP)
                    continue;
            ...
            break;
    }

    Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
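
    The rewritten loop shape, filled in as a compilable sketch. The
    function is hypothetical and works on a userspace array of
    Elf64_Phdr:

```c
#include <elf.h>
#include <stddef.h>

/*
 * Return the first PT_INTERP header, or NULL if there is none.
 * Non-matching entries are skipped up front, so the interesting
 * work sits at the top indentation level of the loop body.
 */
const Elf64_Phdr *find_interp_phdr(const Elf64_Phdr *phdr, size_t n)
{
	const Elf64_Phdr *found = NULL;

	for (size_t i = 0; i < n; i++) {
		if (phdr[i].p_type != PT_INTERP)
			continue;

		found = &phdr[i];
		break;
	}
	return found;
}
```

    Behaviour is identical to the nested form; only one level of
    indentation is saved for the body that handles the match.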
     
  • Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There is no reason for PT_INTERP filename to linger till the end of the
    whole loading process.

    Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Nikitas Angelinas
    Reviewed-by: Andrew Morton
    Cc: Mukesh Ojha
    [nikitas.angelinas@gmail.com: fix GPF when dereferencing invalid interpreter]
    Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • As pointed out by zoujc@lenovo.com, setup_arg_pages() already
    initialized current->mm->start_stack.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881
    Reported-by:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Mar, 2019

3 commits

  • Link: http://lkml.kernel.org/r/20190204202830.GC27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • [adobriyan@gmail.com: fixup compilation]
    Link: http://lkml.kernel.org/r/20190205064334.GA2152@avx2
    Link: http://lkml.kernel.org/r/20190204202800.GB27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Number of ELF program headers is 16-bit by spec, so total size
    comfortably fits into "unsigned int".

    Space savings: 7 bytes!

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7)
    Function                                     old     new   delta
    load_elf_phdrs                               137     130      -7

    Link: http://lkml.kernel.org/r/20190204202715.GA27482@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
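
    The size bound behind this change, as a worked check. A sketch for
    the 64-bit case using userspace <elf.h>; the function name is made up:

```c
#include <elf.h>
#include <stdint.h>

/* Worst-case byte size of a 64-bit program header table. */
unsigned int max_phdr_table_bytes(void)
{
	/* e_phnum is a 16-bit field, so at most UINT16_MAX entries. */
	return (unsigned int)UINT16_MAX * (unsigned int)sizeof(Elf64_Phdr);
}
```

    With 56-byte headers that is 65535 * 56 = 3669960 bytes, far below
    the range of a 32-bit "unsigned int".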
     

03 Oct, 2018

1 commit

  • Linus recently observed that if we did not worry about the padding
    member in struct siginfo it is only about 48 bytes, and 48 bytes is
    much nicer than 128 bytes for allocating on the stack and copying
    around in the kernel.

    The obvious thing of only adding the padding when userspace is
    including siginfo.h won't work as there are sigframe definitions in
    the kernel that embed struct siginfo.

    So split siginfo in two: kernel_siginfo and siginfo. The traditional
    name is kept for the userspace definition, while the version that is
    used internally by the kernel, and ultimately will not be padded to
    128 bytes, is called kernel_siginfo.

    The definition of struct kernel_siginfo I have put in include/signal_types.h

    A set of buildtime checks has been added to verify the two structures have
    the same field offsets.

    To make it easy to verify the change kernel_siginfo retains the same
    size as siginfo. The reduction in size comes in a following change.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

14 Aug, 2018

1 commit

  • Pull MIPS updates from Paul Burton:
    "Here are the main MIPS changes for 4.19.

    An overview of the general architecture changes:

    - Massive DMA ops refactoring from Christoph Hellwig (huzzah for
    deleting crufty code!).

    - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes &
    corresponding regsets to expose DSP ASE & floating point mode state
    respectively, both for live debugging & core dumps.

    - We better optimize our code by hard-coding cpu_has_* macros at
    compile time where their values are known due to the ISA revision
    that the kernel build is targeting.

    - The EJTAG exception handler now better handles SMP systems, where
    it was previously possible for CPUs to clobber a register value
    saved by another CPU.

    - Our implementation of memset() gained a couple of fixes for MIPSr6
    systems to return correct values in some cases where stores fault.

    - We now implement ioremap_wc() using the uncached-accelerated cache
    coherency attribute where supported, which is detected during boot,
    and fall back to plain uncached access where necessary. The
    MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
    ioremap_cacheable_cow() are removed.

    - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
    systems by reworking the way we ensure remote CPUs that may be
    running threads within the affected process switch mode.

    - Systems using the MIPS Coherence Manager will now set the
    MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
    maintenance overhead when flushing the icache.

    - A few fixes were made for building with clang/LLVM, which now
    successfully builds kernels for many of our platforms.

    - Miscellaneous cleanups all over.

    And some platform-specific changes:

    - ar7 gained stubs for a few clock API functions to fix build
    failures for some drivers.

    - ath79 gained support for a few new SoCs, a few fixes & better
    gpio-keys support.

    - Ci20 now exposes its SPI bus using the spi-gpio driver.

    - The generic platform can now auto-detect a suitable value for
    PHYS_OFFSET based upon the memory map described by the device tree,
    allowing us to avoid wasting memory on page book-keeping for
    systems where RAM starts at a non-zero physical address.

    - Ingenic systems using the jz4740 platform code now link their
    vmlinuz higher to allow for kernels of a realistic size.

    - Loongson32 now builds the kernel targeting MIPSr1 rather than
    MIPSr2 to avoid CPU errata.

    - Loongson64 gains a couple of fixes, a workaround for a write
    buffering issue & support for the Loongson 3A R3.1 CPU.

    - Malta now uses the piix4-poweroff driver to handle powering down.

    - Microsemi Ocelot gained support for its SPI bus & NOR flash, its
    second MDIO bus and can now be supported by a FIT/.itb image.

    - Octeon saw a bunch of header cleanups which remove a lot of
    duplicate or unused code"

    * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits)
    MIPS: Remove remnants of UASM_ISA
    MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()
    MIPS: VDSO: Force link endianness
    MIPS: Always specify -EB or -EL when using clang
    MIPS: Use dins to simplify __write_64bit_c0_split()
    MIPS: Use read-write output operand in __write_64bit_c0_split()
    MIPS: Avoid using array as parameter to write_c0_kpgd()
    MIPS: vdso: Allow clang's --target flag in VDSO cflags
    MIPS: genvdso: Remove GOT checks
    MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
    MIPS: generic: Remove input symbols from defconfig
    MIPS: Delete unused code in linux32.c
    MIPS: Remove unused sys_32_mmap2
    MIPS: Remove nabi_no_regargs
    mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
    mips: dts: mscc: Add spi on Ocelot
    MIPS: Loongson: Merge load addresses
    MIPS: Loongson: Set Loongson32 to MIPS32R1
    MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
    MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
    ...

    Linus Torvalds
     

20 Jul, 2018

1 commit

  • The regset API documentation defines -ENODEV as the
    result of the `->active' handler to be used where the feature requested
    is not available on the hardware found. However code handling core file
    note generation in `fill_thread_core_info' interprets any non-zero
    result from the `->active' handler as the regset requested being active.
    Consequently processing continues (and hopefully gracefully fails later
    on) rather than being abandoned right away for the regset requested.

    Fix the problem then by making the code proceed only if a positive
    result is returned from the `->active' handler.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Paul Burton
    Fixes: 4206d3aa1978 ("elf core dump: notes user_regset")
    Patchwork: https://patchwork.linux-mips.org/patch/19332/
    Cc: Alexander Viro
    Cc: James Hogan
    Cc: Ralf Baechle
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-kernel@vger.kernel.org

    Maciej W. Rozycki
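
    The fixed interpretation of the handler result, reduced to a
    predicate. This is a sketch with a hypothetical name; it ignores the
    case where a regset has no `->active' handler at all:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Interpret the result of a regset "active" callback:
 *   > 0  the regset is active, dump it
 *   == 0 the regset is inactive, skip it
 *   < 0  error such as -ENODEV (feature absent), skip it
 */
bool regset_should_dump(int active_result)
{
	return active_result > 0;
}
```

    A plain non-zero test would treat -ENODEV as "active", which is the
    behaviour the fix removes.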
     

15 Jul, 2018

1 commit

  • The current code does not make sure to page align bss before calling
    vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to
    the requested length not being correctly aligned.

    Let us make sure to align it properly.

    Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured
    for libc5.

    Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
    Signed-off-by: Oscar Salvador
    Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com
    Tested-by: Tetsuo Handa
    Acked-by: Kees Cook
    Cc: Michal Hocko
    Cc: Nicolas Pitre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
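
    Page alignment of the kind described above is a round-up to the next
    page boundary. A sketch assuming 4096-byte pages; the macro and
    function names are stand-ins, not the kernel's:

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH UINT64_C(4096)	/* assumed page size */

/* Round addr up to the next page boundary (no-op if already aligned). */
uint64_t page_align_sketch(uint64_t addr)
{
	return (addr + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}
```

    The mask trick relies on the page size being a power of two.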
     

15 Jun, 2018

1 commit

  • Nobody ever tried to self destruct by unmapping whole address space at
    once:

    munmap((void *)0, (1ULL << 47) - 4096);

    Doing this produces 2 warnings for zero-length vmalloc allocations:

    a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14
    a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
    ...
    a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
    ...

    Fix is to switch to kvmalloc().

    Steps to reproduce:

    // vsyscall=none
    #include <sys/mman.h>
    #include <sys/resource.h>
    int main(void)
    {
            setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY});
            munmap((void *)0, (1ULL << 47) - 4096);
            return 0;
    }

    Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

13 Jun, 2018

2 commits

  • The vmalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vmalloc(a * b)

    with:
    vmalloc(array_size(a, b))

    as well as handling cases of:

    vmalloc(a * b * c)

    with:

    vmalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vmalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vmalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vmalloc(C1 * C2 * C3, ...)
    |
    vmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vmalloc(C1 * C2, ...)
    |
    vmalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     
  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:

    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().
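
    As a rough userspace illustration of why the conversion matters (not
    the kernel implementation), the sketch below mimics the saturating
    behaviour of array_size()/array3_size() and the NULL-on-overflow
    behaviour of kmalloc_array(). The *_sketch names are hypothetical, and
    __builtin_mul_overflow assumes GCC or Clang:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for array_size(): a * b, or SIZE_MAX if the product overflows. */
static size_t array_size_sketch(size_t a, size_t b)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes))
        return SIZE_MAX;
    return bytes;
}

/* Stand-in for array3_size(): a * b * c, saturating at SIZE_MAX. */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
    size_t ab, abc;

    if (__builtin_mul_overflow(a, b, &ab))
        return SIZE_MAX;
    if (__builtin_mul_overflow(ab, c, &abc))
        return SIZE_MAX;
    return abc;
}

/* Stand-in for kmalloc_array(): refuse the request instead of wrapping. */
static void *malloc_array_sketch(size_t n, size_t size)
{
    size_t bytes = array_size_sketch(n, size);

    if (bytes == SIZE_MAX)
        return NULL;
    return malloc(bytes);
}
```

    With an open-coded "a * b", an overflowing product silently wraps to a
    small value and the allocation succeeds with too little memory; with
    the checked form the request fails instead.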

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

21 Apr, 2018

1 commit

  • Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
    printing spurious messages under memory pressure due to map_addr == -ENOMEM.

    9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
    14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
    16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

    Complain only if -EEXIST, and use %px for printing the address.
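
    For illustration only: the value in parentheses in the log lines above,
    fffffffffffffff4, is -12 (-ENOMEM) viewed as a 64-bit unsigned number.
    A minimal sketch of the "complain only if -EEXIST" test, using
    hypothetical helper names and userspace <errno.h> values, could look
    like:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-style convention: the top 4095 values encode a negated errno. */
static bool is_err_value(uint64_t addr)
{
    return addr >= (uint64_t)-4095;
}

/* Warn only when the mapping failed because the range was already mapped. */
static bool should_complain(uint64_t map_addr)
{
    return map_addr == (uint64_t)-EEXIST;
}
```

    Under this check an -ENOMEM result is still an error, but it no longer
    triggers the "memory is mapped already" message.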

    Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
    Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map")
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Andrei Vagin
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

12 Apr, 2018

3 commits

  • Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
    elf_map" applied, some ELF binaries in his environment fail to start
    with

    [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
    [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon

    The reason is that the above binary has overlapping elf segments:

    LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
    0x0000000000013a8c 0x0000000000013a8c R E 10000
    LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
    0x00000000000002c0 0x00000000000005e8 RW 10000
    LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
    0x0000000000000384 0x00000000000094a0 RW 10000

    That binary has two RW LOAD segments; the first crosses a page border
    into the second:

    0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

    Handle this situation by enforcing MAP_FIXED when we establish a
    temporary brk VMA to handle overlapping segments. All other mappings
    will still use MAP_FIXED_NOREPLACE.
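
    The overlap can be checked with a little arithmetic. The sketch below
    uses hypothetical helper names and assumes the 64 KiB page size implied
    by the 10000 alignment in the program headers above; it tests whether
    the last byte of one segment and the first byte of the next fall in the
    same page:

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr down to its page boundary; page_size must be a power of two. */
static uint64_t page_start(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * True when the last byte of segment A shares a page with the first byte
 * of segment B. Assumes a_memsz is non-zero.
 */
static bool segments_share_page(uint64_t a_vaddr, uint64_t a_memsz,
                                uint64_t b_vaddr, uint64_t page_size)
{
    uint64_t a_last = a_vaddr + a_memsz - 1;

    return page_start(a_last, page_size) == page_start(b_vaddr, page_size);
}
```

    For the second and third LOAD segments this gives the shared page
    0x10030000, which matches the address in the warning.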

    Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: Anshuman Khandual
    Reviewed-by: Khalid Aziz
    Cc: Andrei Vagin
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Stephen Rothwell
    Cc: Mark Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Both load_elf_interp and load_elf_binary rely on elf_map to map segments
    at a controlled address, and they use MAP_FIXED to enforce that. This is,
    however, a dangerous practice prone to silent data corruption, which can
    even be exploitable.

    Let's take CVE-2017-1000253 as an example. At the time (before commit
    eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
    ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
    the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
    we could end up mapping over the existing stack with some luck.
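
    As a worked example of the "only 1GB away" figure, assume a 32-bit
    layout with TASK_SIZE at 3 GiB (0xC0000000; this value is an assumption
    for illustration, not taken from the commit). The helper name below is
    hypothetical:

```c
#include <stdint.h>

/* The old placement rule quoted above: TASK_SIZE / 3 * 2. */
static uint64_t old_elf_et_dyn_base(uint64_t task_size)
{
    return task_size / 3 * 2;
}
```

    With the assumed TASK_SIZE this yields 0x80000000 (2 GiB), leaving
    1 GiB between the PIE load base and the top of the address space where
    the stack starts.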

    The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix
    bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved much
    further from the stack (eab09532d400 and later by c715b72c1ba4: "mm:
    revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive
    stack consumption early during execve fully stopped by da029c11e6b1
    ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
    safe and any attack should be impractical. On the other hand, this is
    just too subtle an assumption, so it can break quite easily and be hard
    to spot.

    I believe that the MAP_FIXED usage in load_elf_binary (et al.) is still
    fundamentally dangerous. Moreover, it shouldn't even be needed. We are
    at the early process stage, so there shouldn't be any unrelated mappings
    (except for stack and loader), and mmap for a given address should
    succeed even without MAP_FIXED. Something is terribly wrong if this is
    not the case, and we should rather fail than silently corrupt the
    underlying mapping.

    Address this issue by changing MAP_FIXED to the newly added
    MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an
    existing mapping clashing with the requested one without clobbering it.
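
    The difference is observable from userspace. The following is a minimal
    sketch (hypothetical helper name; Linux-only; the fallback flag value is
    an assumption taken from the generic uapi headers, for C libraries that
    do not define it):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000 /* assumed generic Linux value */
#endif

/*
 * Try to map len bytes of anonymous memory exactly at addr without
 * replacing anything. Returns 0 on success or a negative errno.
 */
static int map_noreplace(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    if (p == MAP_FAILED)
        return -errno;
    if (p != addr) {
        /* A kernel without the flag treats addr as a hint; undo and report. */
        munmap(p, len);
        return -EEXIST;
    }
    return 0;
}
```

    Calling map_noreplace() on a range that is already mapped fails and
    leaves the existing contents intact, whereas plain MAP_FIXED would
    silently replace them.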

    [mhocko@suse.com: fix build]
    [akpm@linux-foundation.org: coding-style fixes]
    [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
    Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
    Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrei Vagin
    Signed-off-by: Michal Hocko
    Reviewed-by: Khalid Aziz
    Acked-by: Michael Ellerman
    Acked-by: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Provide a final callback into fs/exec.c before start_thread() takes
    over, to handle any last-minute changes, like the coming restoration of
    the stack limit.

    Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Ben Hutchings
    Cc: Brad Spengler
    Cc: Greg KH
    Cc: Hugh Dickins
    Cc: "Jason A. Donenfeld"
    Cc: Laura Abbott
    Cc: Michal Hocko
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

07 Feb, 2018

1 commit

  • If vm.max_map_count is bumped above 2^26 (67+ mil) and the system has
    enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte
    VMAs), then it should be possible to overflow the 32-bit "size", pass the
    paranoia check, allocate very little vmalloc space and oops while writing
    into the vmalloc guard page...

    But I didn't test this, only the coredump of a regular process.
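
    To make the arithmetic concrete, here is a sketch under the assumption
    that the size estimate is a 32-bit count multiplied by a fixed 64 bytes
    per mapping (the factor suggested by the 2^26 threshold, since
    2^26 * 64 = 2^32). The names are hypothetical and this is not the
    kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_ESTIMATE 64u /* assumed per-mapping byte estimate */

/* Unchecked form: unsigned 32-bit arithmetic wraps modulo 2^32. */
static uint32_t note_size_wrapping(uint32_t count)
{
    return count * ENTRY_ESTIMATE;
}

/* Checked form: reject counts whose product would not fit in 32 bits. */
static bool note_size_checked(uint32_t count, uint32_t *size)
{
    if (count > UINT32_MAX / ENTRY_ESTIMATE)
        return false;
    *size = count * ENTRY_ESTIMATE;
    return true;
}
```

    With count = 2^26 the unchecked product wraps to 0, so a later "is the
    size too large?" test passes while the buffer is far too small.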

    Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan