07 Feb, 2018

1 commit

  • If vm.max_map_count is bumped above 2^26 (67+ million) and the system
    has enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with
    200-byte VMAs), then it should be possible to overflow the 32-bit
    "size", pass the paranoia check, allocate very little vmalloc space
    and oops while writing into the vmalloc guard page...

    But I didn't test this, only coredump of regular process.
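
    The wraparound can be sketched in userspace. The 64-byte per-record size
    below is illustrative only (chosen so the product lands exactly on 2^32;
    the commit cites ~200 bytes for the VMA struct itself), but the shape of
    the bug is the same: a 64-bit count times a per-entry size truncated into
    a 32-bit total.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t nvmas = 1ULL << 26;   /* just past the 67-million mark */
        uint64_t per_vma = 64;         /* illustrative record size only */
        uint64_t size64 = nvmas * per_vma;
        uint32_t size32 = (uint32_t)size64;   /* the overflowing "size" */

        assert(size64 == 4294967296ULL);  /* true total needs 33 bits */
        assert(size32 == 0);              /* 32-bit counter wraps to ~nothing,
                                             sailing past any paranoia check */
        printf("wrapped size: %u\n", size32);
        return 0;
    }
    ```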

    Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

17 Nov, 2017

1 commit

  • Pull ARM updates from Russell King:

    - add support for ELF fdpic binaries on both MMU and noMMU platforms

    - linker script cleanups

    - support for compressed .data section for XIP images

    - discard memblock arrays when possible

    - various cleanups

    - atomic DMA pool updates

    - better diagnostics of missing/corrupt device tree

    - export information to allow userspace kexec tool to place images more
    intelligently, so that the device tree isn't overwritten by the
    booting kernel

    - make early_printk more efficient on semihosted systems

    - noMMU cleanups

    - SA1111 PCMCIA update in preparation for further cleanups

    * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits)
    ARM: 8719/1: NOMMU: work around maybe-uninitialized warning
    ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r"
    ARM: 8713/1: NOMMU: Support MPU in XIP configuration
    ARM: 8712/1: NOMMU: Use more MPU regions to cover memory
    ARM: 8711/1: V7M: Add support for MPU to M-class
    ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE
    ARM: 8709/1: NOMMU: Disallow MPU for XIP
    ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C
    ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers
    ARM: 8706/1: NOMMU: Move out MPU setup in separate module
    ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
    ARM: 8705/1: early_printk: use printascii() rather than printch()
    ARM: 8703/1: debug.S: move hexbuf to a writable section
    ARM: add additional table to compressed kernel
    ARM: decompressor: fix BSS size calculation
    pcmcia: sa1111: remove special sa1111 mmio accessors
    pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources
    ARM: better diagnostics with missing/corrupt dtb
    ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size()
    ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init
    ...

    Linus Torvalds
     

03 Nov, 2017

1 commit

  • Currently the regset API doesn't allow for the possibility that
    regsets (or at least, the amount of meaningful data in a regset)
    may change in size.

    In particular, this results in useless padding being added to
    coredumps if a regset's current size is smaller than its
    theoretical maximum size.

    This patch adds a get_size() function to struct user_regset.
    Individual regset implementations can implement this function to
    return the current size of the regset data. A regset_size()
    function is added to provide callers with an abstract interface for
    determining the size of a regset without needing to know whether
    the regset is dynamically sized or not.

    The only affected user of this interface is the ELF coredump code:
    This patch ports ELF coredump to dump regsets with their actual
    size in the coredump. This has no effect except for new regsets
    that are dynamically sized and provide a get_size() implementation.
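
    The fallback logic can be modelled in plain C. Field names here are
    simplified and the kernel's real get_size() hook also receives the
    target task, so treat this as a sketch of the calling convention, not
    the kernel definition:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Simplified model of struct user_regset (not the kernel layout). */
    struct user_regset {
        unsigned int n;     /* number of slots */
        unsigned int size;  /* bytes per slot */
        /* returns the current size in bytes; NULL means fixed-size */
        unsigned int (*get_size)(const struct user_regset *rs);
    };

    /* Callers use this and never care whether the regset is dynamic. */
    static unsigned int regset_size(const struct user_regset *rs)
    {
        if (rs->get_size)
            return rs->get_size(rs);
        return rs->n * rs->size;    /* theoretical maximum, as before */
    }

    /* Pretend only 128 bytes of a large register file are live now. */
    static unsigned int dyn_size(const struct user_regset *rs)
    {
        (void)rs;
        return 128;
    }

    int main(void)
    {
        struct user_regset fixed = { .n = 32, .size = 8, .get_size = NULL };
        struct user_regset dyn = { .n = 512, .size = 8, .get_size = dyn_size };

        assert(regset_size(&fixed) == 256);  /* unchanged behaviour */
        assert(regset_size(&dyn) == 128);    /* no useless coredump padding */
        return 0;
    }
    ```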

    Signed-off-by: Dave Martin
    Reviewed-by: Catalin Marinas
    Cc: Oleg Nesterov
    Cc: Alexander Viro
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Dmitry Safonov
    Cc: H. J. Lu
    Signed-off-by: Will Deacon

    Dave Martin
     

15 Sep, 2017

1 commit

  • Pull more set_fs removal from Al Viro:
    "Christoph's 'use kernel_read and friends rather than open-coding
    set_fs()' series"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: unexport vfs_readv and vfs_writev
    fs: unexport vfs_read and vfs_write
    fs: unexport __vfs_read/__vfs_write
    lustre: switch to kernel_write
    gadget/f_mass_storage: stop messing with the address limit
    mconsole: switch to kernel_read
    btrfs: switch write_buf to kernel_write
    net/9p: switch p9_fd_read to kernel_write
    mm/nommu: switch do_mmap_private to kernel_read
    serial2002: switch serial2002_tty_write to kernel_{read/write}
    fs: make the buf argument to __kernel_write a void pointer
    fs: fix kernel_write prototype
    fs: fix kernel_read prototype
    fs: move kernel_read to fs/read_write.c
    fs: move kernel_write to fs/read_write.c
    autofs4: switch autofs4_write to __kernel_write
    ashmem: switch to ->read_iter

    Linus Torvalds
     

11 Sep, 2017

1 commit

  • On platforms where both ELF and ELF-FDPIC variants are available, the
    regular ELF loader will happily identify FDPIC binaries as proper ELF
    and load them without the necessary FDPIC fixups, resulting in an
    immediate user space crash. Let's prevent binfmt_elf from loading those
    binaries so binfmt_elf_fdpic has a chance to pick them up. For those
    architectures that don't define elf_check_fdpic(), a default version
    returning false is provided.
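
    The split can be sketched as follows. EF_EXAMPLE_FDPIC is a made-up
    e_flags bit standing in for an architecture's real FDPIC marker, and
    both check functions are illustrations of the described hook, not the
    kernel's actual macros:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define EF_EXAMPLE_FDPIC 0x1000u   /* hypothetical flag for this sketch */

    struct elf_hdr { uint32_t e_flags; };

    /* An FDPIC-capable architecture provides a real check... */
    static bool arch_check_fdpic(const struct elf_hdr *h)
    {
        return h->e_flags & EF_EXAMPLE_FDPIC;
    }

    /* ...while the default says "not FDPIC", so the regular ELF loader
     * keeps accepting everything it did before. */
    static bool default_check_fdpic(const struct elf_hdr *h)
    {
        (void)h;
        return false;
    }

    int main(void)
    {
        struct elf_hdr fdpic = { .e_flags = EF_EXAMPLE_FDPIC };
        struct elf_hdr plain = { .e_flags = 0 };

        /* binfmt_elf rejects this one, so binfmt_elf_fdpic can claim it */
        assert(arch_check_fdpic(&fdpic));
        assert(!arch_check_fdpic(&plain));
        assert(!default_check_fdpic(&fdpic));
        return 0;
    }
    ```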

    Signed-off-by: Nicolas Pitre
    Acked-by: Mickael GUENE
    Tested-by: Vincent Abriou
    Tested-by: Andras Szemzo

    Nicolas Pitre
     

08 Sep, 2017

1 commit

  • Pull secureexec update from Kees Cook:
    "This series has the ultimate goal of providing a sane stack rlimit
    when running set*id processes.

    To do this, the bprm_secureexec LSM hook is collapsed into the
    bprm_set_creds hook so the secureexec-ness of an exec can be
    determined early enough to make decisions about rlimits and the
    resulting memory layouts. Other logic acting on the secureexec-ness of
    an exec is similarly consolidated. Capabilities needed some special
    handling, but the refactoring removed other special handling, so that
    was a wash"

    * tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    exec: Consolidate pdeath_signal clearing
    exec: Use sane stack rlimit under secureexec
    exec: Consolidate dumpability logic
    smack: Remove redundant pdeath_signal clearing
    exec: Use secureexec for clearing pdeath_signal
    exec: Use secureexec for setting dumpability
    LSM: drop bprm_secureexec hook
    commoncap: Move cap_elevated calculation into bprm_set_creds
    commoncap: Refactor to remove bprm_secureexec hook
    smack: Refactor to remove bprm_secureexec hook
    selinux: Refactor to remove bprm_secureexec hook
    apparmor: Refactor to remove bprm_secureexec hook
    binfmt: Introduce secureexec flag
    exec: Correct comments about "point of no return"
    exec: Rename bprm->cred_prepared to called_set_creds

    Linus Torvalds
     

05 Sep, 2017

1 commit

  • Use proper ssize_t and size_t types for the return value and count
    argument, move the offset last and make it an in/out argument like
    all other read/write helpers, and make the buf argument a void pointer
    to get rid of lots of casts in the callers.
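
    The resulting calling convention (ssize_t return, void * buffer, size_t
    count, in/out offset passed last) can be modelled against an in-memory
    "file"; struct fake_file and demo_read() are inventions for this sketch,
    not kernel interfaces:

    ```c
    #include <assert.h>
    #include <string.h>
    #include <sys/types.h>

    struct fake_file { const char *data; size_t len; };

    static ssize_t demo_read(struct fake_file *f, void *buf, size_t count,
                             off_t *pos)
    {
        if (*pos >= (off_t)f->len)
            return 0;                       /* EOF */
        size_t avail = f->len - (size_t)*pos;
        if (count > avail)
            count = avail;                  /* short read near EOF */
        memcpy(buf, f->data + *pos, count);
        *pos += (off_t)count;               /* offset advances in place */
        return (ssize_t)count;
    }

    int main(void)
    {
        struct fake_file f = { "hello world", 11 };
        char buf[8];
        off_t pos = 0;

        assert(demo_read(&f, buf, 5, &pos) == 5);
        assert(pos == 5);
        assert(memcmp(buf, "hello", 5) == 0);
        assert(demo_read(&f, buf, 8, &pos) == 6);   /* short read */
        assert(pos == 11);
        assert(demo_read(&f, buf, 8, &pos) == 0);   /* EOF */
        return 0;
    }
    ```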

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

17 Aug, 2017

1 commit

  • The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and
    randomize_stack_top() are not required.

    PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not
    set, no need to re-check after that.
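
    The invariant being relied on can be sketched like this; compute_flags()
    is a stand-in for the relevant load_elf_binary() logic (the flag values
    match the usual <linux/personality.h> and kernel sched.h definitions):

    ```c
    #include <assert.h>

    #define ADDR_NO_RANDOMIZE 0x0040000u
    #define PF_RANDOMIZE      0x00400000u

    /* Stand-in for the load_elf_binary() decision described above. */
    static unsigned int compute_flags(unsigned int personality,
                                      int randomize_va_space)
    {
        unsigned int flags = 0;

        if (randomize_va_space && !(personality & ADDR_NO_RANDOMIZE))
            flags |= PF_RANDOMIZE;
        return flags;
    }

    int main(void)
    {
        /* PF_RANDOMIZE already encodes the ADDR_NO_RANDOMIZE decision, so
         * callers like stack_maxrandom_size() can test it alone. */
        assert(compute_flags(0, 1) == PF_RANDOMIZE);
        assert(compute_flags(ADDR_NO_RANDOMIZE, 1) == 0);
        assert(compute_flags(0, 0) == 0);
        return 0;
    }
    ```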

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Dmitry Safonov
    Cc: stable@vger.kernel.org
    Cc: Andy Lutomirski
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com

    Oleg Nesterov
     

02 Aug, 2017

1 commit

  • The bprm_secureexec hook can be moved earlier. Right now, it is called
    during create_elf_tables(), via load_binary(), via search_binary_handler(),
    via exec_binprm(). Nearly all (see exception below) state used by
    bprm_secureexec is created during the bprm_set_creds hook, called from
    prepare_binprm().

    For all LSMs (except commoncaps described next), only the first execution
    of bprm_set_creds takes any effect (they all check bprm->called_set_creds
    which prepare_binprm() sets after the first call to the bprm_set_creds
    hook). However, all these LSMs also only do anything with bprm_secureexec
    when they detected a secure state during their first run of bprm_set_creds.
    Therefore, it is functionally identical to move the detection into
    bprm_set_creds, since the results from secureexec here only need to be
    based on the first call to the LSM's bprm_set_creds hook.

    The single exception is that the commoncaps secureexec hook also examines
    euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
    via prepare_binprm(), which can be called multiple times (e.g.
    binfmt_script, binfmt_misc), and may clear the euid/egid for the final
    load (i.e. the script interpreter). However, while commoncaps specifically
    ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
    prepare_binprm() may get called, it needs to base the secureexec decision
    on the final call to bprm_set_creds. As a result, it will need special
    handling.

    To begin this refactoring, this adds the secureexec flag to the bprm
    struct, and calls the secureexec hook during setup_new_exec(). This is
    safe since all the cred work is finished (and past the point of no return).
    This explicit call will be removed in later patches once the hook has been
    removed.

    Cc: David Howells
    Signed-off-by: Kees Cook
    Reviewed-by: John Johansen
    Acked-by: Serge Hallyn
    Reviewed-by: James Morris

    Kees Cook
     

11 Jul, 2017

2 commits

  • When building the argv/envp pointers, the envp is needlessly
    pre-incremented instead of just continuing after the argv pointers are
    finished. In some (likely impossible) race where the strings could be
    changed from userspace between copy_strings() and here, it might be
    possible to confuse the envp position. Instead, just use sp like
    everything else.

    Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast
    Signed-off-by: Kees Cook
    Cc: Rik van Riel
    Cc: Daniel Micay
    Cc: Qualys Security Advisory
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Dmitry Safonov
    Cc: Andy Lutomirski
    Cc: Grzegorz Andrejczuk
    Cc: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The ELF_ET_DYN_BASE position was originally intended to keep loaders
    away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
    /bin/cat" might cause the subsequent load of /bin/cat into where the
    loader had been loaded.)

    With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
    ELF_ET_DYN_BASE continued to be used since the kernel was only looking
    at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the
    top 1/3rd of the TASK_SIZE, a substantial portion of the address space
    is unused.

    For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
    loaded above the mmap region. This means they can be made to collide
    (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
    pathological stack regions.

    Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
    region in all cases, and will now additionally avoid programs falling
    back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
    if it would have collided with the stack, now it will fail to load
    instead of falling back to the mmap region).

    To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
    are loaded into the mmap region, leaving space available for either an
    ET_EXEC binary with a fixed location or PIE being loaded into mmap by
    the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
    which means architectures can now safely lower their values without risk
    of loaders colliding with their subsequently loaded programs.

    For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
    the entire 32-bit address space for 32-bit pointers.
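
    The before/after arithmetic, with an illustrative x86-64-style TASK_SIZE
    (the exact value is architecture-specific):

    ```c
    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t task_size = 0x00007fffffffffffULL + 1;  /* illustrative */
        uint64_t old_base = (task_size / 3) * 2;  /* top-1/3rd convention */
        uint64_t new_base = 4ULL << 30;           /* proposed 4 GB base */

        /* Lowering the base moves PIE programs below the mmap region and
         * leaves everything above 4 GB free, while runtimes can still use
         * the whole 32-bit space for 32-bit pointers. */
        assert(new_base == 0x100000000ULL);
        assert(new_base < old_base);
        return 0;
    }
    ```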

    Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
    suggestions on how to implement this solution.

    Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR")
    Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast
    Signed-off-by: Kees Cook
    Acked-by: Rik van Riel
    Cc: Daniel Micay
    Cc: Qualys Security Advisory
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Alexander Viro
    Cc: Dmitry Safonov
    Cc: Andy Lutomirski
    Cc: Grzegorz Andrejczuk
    Cc: Masahiro Yamada
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: James Hogan
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Pratyush Anand
    Cc: Russell King
    Cc: Will Deacon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

23 Feb, 2017

1 commit

  • On 32-bit powerpc the ELF PLT sections of binaries (built with
    --bss-plt, or with a toolchain which defaults to it) look like this:

    [17] .sbss NOBITS 0002aff8 01aff8 000014 00 WA 0 0 4
    [18] .plt NOBITS 0002b00c 01aff8 000084 00 WAX 0 0 4
    [19] .bss NOBITS 0002b090 01aff8 0000a4 00 WA 0 0 4

    Which results in an ELF load header:

    Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
    LOAD 0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000

    This is all correct, the load region containing the PLT is marked as
    executable. Note that the PLT starts at 0002b00c but the file mapping
    ends at 0002aff8, so the PLT falls in the 0 fill section described by
    the load header, and after a page boundary.

    Unfortunately the generic ELF loader ignores the X bit in the load
    headers when it creates the 0 filled non-file backed mappings. It
    assumes all of these mappings are RW BSS sections, which is not the case
    for PPC.

    gcc/ld has an option (--secure-plt) to not do this, this is said to
    incur a small performance penalty.

    Currently, to support 32-bit binaries with PLT in BSS kernel maps
    *entire brk area* with executable rights for all binaries, even
    --secure-plt ones.

    Stop doing that.

    Teach the ELF loader to check the X bit in the relevant load header and
    create 0 filled anonymous mappings that are executable if the load
    header requests that.

    Test program showing the difference in /proc/$PID/maps:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[16 * 1024];
        char *p = malloc(123); /* make "[heap]" mapping appear */
        int fd = open("/proc/self/maps", O_RDONLY);
        int len = read(fd, buf, sizeof(buf));

        write(1, buf, len);
        printf("%p\n", p);
        return 0;
    }

    Compiled using: gcc -mbss-plt -m32 -Os test.c -otest

    Unpatched ppc64 kernel:
    00100000-00120000 r-xp 00000000 00:00 0 [vdso]
    0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so
    0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so
    0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so
    10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test
    10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test
    10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test
    10690000-106c0000 rwxp 00000000 00:00 0 [heap]
    f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so
    f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so
    f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so
    ffa90000-ffac0000 rw-p 00000000 00:00 0 [stack]
    0x10690008

    Patched ppc64 kernel:
    00100000-00120000 r-xp 00000000 00:00 0 [vdso]
    0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so
    0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so
    0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so
    10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test
    10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test
    10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test
    10180000-101b0000 rw-p 00000000 00:00 0 [heap]
    ^^^^ this has changed
    f7c60000-f7c90000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so
    f7c90000-f7ca0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so
    f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so
    ff860000-ff890000 rw-p 00000000 00:00 0 [stack]
    0x10180008

    The patch was originally posted in 2012 by Jason Gunthorpe
    and apparently ignored:

    https://lkml.org/lkml/2012/9/30/138

    Lightly run-tested.

    Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Denys Vlasenko
    Acked-by: Kees Cook
    Acked-by: Michael Ellerman
    Tested-by: Jason Gunthorpe
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Aneesh Kumar K.V"
    Cc: Oleg Nesterov
    Cc: Florian Weimer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     

01 Feb, 2017

3 commits

  • Use the new nsec based cputime accessors as part of the whole cputime
    conversion from cputime_t to nsecs.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Now that most cputime readers use the transition API which return the
    task cputime in old style cputime_t, we can safely store the cputime in
    nsecs. This will eventually make cputime statistics less opaque and more
    granular. Back and forth conversions between cputime_t and nsecs in
    order to deal with cputime_t's random granularity won't be needed anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This API returns a task's cputime in cputime_t in order to ease the
    conversion of cputime internals to use nsecs units instead. Blindly
    converting all cputime readers to use this API now will later let us
    convert more smoothly and step by step all these places to use the
    new nsec based cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

15 Jan, 2017

1 commit

  • If the last section of a core file ends with an unmapped or zero page,
    the size of the file does not correspond with the last dump_skip() call.
    gdb complains that the file is truncated and can be confusing to users.

    After all of the vma sections are written, make sure that the file size
    is no smaller than the current file position.

    This problem can be demonstrated with gdb's bigcore testcase on the
    sparc architecture.
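
    The fix can be modelled with a tiny in-memory stand-in for the dump
    state (struct dump, dump_skip() and finish_dump() are inventions for
    this sketch; the real code would extend the file, e.g. via truncation,
    rather than bump a counter):

    ```c
    #include <assert.h>
    #include <sys/types.h>

    /* A sparse dump where skipping moved the position past the file size. */
    struct dump { off_t pos; off_t size; };

    static void dump_skip(struct dump *d, off_t n) { d->pos += n; }

    static void finish_dump(struct dump *d)
    {
        /* After the last VMA, make the file at least as large as the
         * final position so gdb doesn't see a truncated core. */
        if (d->size < d->pos)
            d->size = d->pos;
    }

    int main(void)
    {
        struct dump d = { .pos = 8192, .size = 8192 };

        dump_skip(&d, 4096);        /* trailing unmapped/zero page */
        assert(d.size < d.pos);     /* file looks truncated here */
        finish_dump(&d);
        assert(d.size == 12288);    /* size now matches the last skip */
        return 0;
    }
    ```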

    Signed-off-by: Dave Kleikamp
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Al Viro

    Dave Kleikamp
     

13 Dec, 2016

1 commit

  • We have observed page allocations failures of order 4 during core dump
    while trying to allocate vma_filesz. This results in a useless core
    file of size 0. To improve reliability use vmalloc().

    Note that the vmalloc() allocation is bounded by sysctl_max_map_count,
    which is 65,530 by default. So with a 4k page size, and 8 bytes per
    seg, this is a max of 128 pages or an order 7 allocation. Other parts
    of the core dump path, such as fill_files_note() are already using
    vmalloc() for presumably similar reasons.
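
    The worst-case arithmetic from the paragraph above checks out as an
    order-7 allocation (assuming 4k pages and the default max_map_count):

    ```c
    #include <assert.h>

    int main(void)
    {
        long max_map_count = 65530;           /* default sysctl value */
        long bytes = max_map_count * 8;       /* 8 bytes per segment */
        long pages = (bytes + 4095) / 4096;   /* 4k pages, rounded up */

        int order = 0;
        while ((1L << order) < pages)
            order++;

        assert(pages == 128);
        assert(order == 7);   /* far beyond a reliable kmalloc order */
        return 0;
    }
    ```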

    Link: http://lkml.kernel.org/r/1479745791-17611-1-git-send-email-jbaron@akamai.com
    Signed-off-by: Jason Baron
    Cc: Al Viro

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     

15 Sep, 2016

1 commit

  • Killed PR_REG_SIZE and PR_REG_PTR macro as we can get regset size
    from regset view.
    I wish I could also kill PRSTATUS_SIZE nicely.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Dmitry Safonov
    Cc: 0x7f454c46@gmail.com
    Cc: linux-mm@kvack.org
    Cc: luto@kernel.org
    Cc: gorcunov@openvz.org
    Cc: xemul@virtuozzo.com
    Link: http://lkml.kernel.org/r/20160905133308.28234-5-dsafonov@virtuozzo.com
    Signed-off-by: Thomas Gleixner

    Dmitry Safonov
     

01 Sep, 2016

1 commit

  • We used to delay switching to the new credentials until after we had
    mapped the executable (and possible elf interpreter). That was kind of
    odd to begin with, since the new executable will actually then _run_
    with the new creds, but whatever.

    The bigger problem was that we also want to make sure that we turn off
    prof events and tracing before we start mapping the new executable
    state. So while this is a cleanup, it's also a fix for a possible
    information leak.

    Reported-by: Robert Święcki
    Tested-by: Peter Zijlstra
    Acked-by: David Howells
    Acked-by: Oleg Nesterov
    Acked-by: Andy Lutomirski
    Acked-by: Eric W. Biederman
    Cc: Willy Tarreau
    Cc: Kees Cook
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Aug, 2016

1 commit

  • A double-bug exists in the bss calculation code, where an overflow can
    happen in the "last_bss - elf_bss" calculation, but vm_brk internally
    aligns the argument, underflowing it, wrapping back around safe. We
    shouldn't depend on these bugs staying in sync, so this cleans up the
    bss padding handling to avoid the overflow.

    This moves the bss padzero() before the last_bss > elf_bss case, since
    the zero-filling of the ELF_PAGE should have nothing to do with the
    relationship of last_bss and elf_bss: any trailing portion should be
    zeroed, and a zero size is already handled by padzero().

    Then it handles the math on elf_bss vs last_bss correctly. These need
    to both be ELF_PAGE aligned to get the comparison correct, since that's
    the expected granularity of the mappings. Since elf_bss already had
    alignment-based padding happen in padzero(), the "start" of the new
    vm_brk() should be moved forward as done in the original code. However,
    since the "end" of the vm_brk() area will already become PAGE_ALIGNed in
    vm_brk() then last_bss should get aligned here to avoid hiding it as a
    side-effect.

    Additionally makes a cosmetic change to the initial last_bss calculation
    so it's easier to read in comparison to the load_addr calculation above
    it (i.e. the only difference is p_filesz vs p_memsz).
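
    The alignment comparison described above can be checked with a small
    model; the addresses are hypothetical (borrowed from the ppc section
    layout shown in a later entry of this log era) and ELF_PAGEALIGN is
    reproduced for a 4k ELF_MIN_ALIGN:

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define ELF_MIN_ALIGN 4096ULL
    #define ELF_PAGEALIGN(x) (((x) + ELF_MIN_ALIGN - 1) & ~(ELF_MIN_ALIGN - 1))

    int main(void)
    {
        /* File-backed data ends at elf_bss; zero-init extends to last_bss. */
        uint64_t elf_bss  = 0x2aff8;   /* hypothetical */
        uint64_t last_bss = 0x2b090;   /* hypothetical */

        uint64_t start = ELF_PAGEALIGN(elf_bss);   /* padzero() filled to here */
        uint64_t end   = ELF_PAGEALIGN(last_bss);

        /* Comparing aligned values avoids the raw "last_bss - elf_bss"
         * overflow: anonymous pages are mapped only when the bss really
         * spills past the last file-backed page. */
        assert(start == 0x2b000);
        assert(end   == 0x2c000);
        assert(end > start);   /* so vm_brk(start, end - start) is needed */
        return 0;
    }
    ```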

    Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org
    Signed-off-by: Kees Cook
    Reported-by: Hector Marco-Gisbert
    Cc: Ismael Ripoll Ripoll
    Cc: Alexander Viro
    Cc: "Kirill A. Shutemov"
    Cc: Oleg Nesterov
    Cc: Chen Gang
    Cc: Michal Hocko
    Cc: Konstantin Khlebnikov
    Cc: Andrea Arcangeli
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

08 Jun, 2016

1 commit

  • The offset in the core file used to be tracked with ->written field of
    the coredump_params structure. The field was retired in favour of
    file->f_pos.

    However, ->f_pos is not maintained for pipes which leads to breakage.

    Restore explicit tracking of the offset in coredump_params. Introduce
    ->pos field for this purpose since ->written was already reused.

    Fixes: a00839395103 ("get rid of coredump_params->written").

    Reported-by: Zbigniew Jędrzejewski-Szmek
    Signed-off-by: Mateusz Guzik
    Reviewed-by: Omar Sandoval
    Signed-off-by: Al Viro

    Mateusz Guzik
     

28 May, 2016

1 commit

  • The do_brk() and vm_brk() return value was "unsigned long" and returned
    the starting address on success, and an error value on failure. The
    reasons are entirely historical, and go back to it basically behaving
    like the mmap() interface does.

    However, nobody actually wanted that interface, and it causes totally
    pointless IS_ERR_VALUE() confusion.

    What every single caller actually wants is just the simpler integer
    return of zero for success and negative error number on failure.

    So just convert to that much clearer and more common calling convention,
    and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
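
    The two conventions side by side, as a userspace sketch (old_vm_brk()
    and new_vm_brk() are toy stand-ins; the IS_ERR_VALUE definition mirrors
    the kernel's MAX_ERRNO = 4095 convention):

    ```c
    #include <assert.h>
    #include <errno.h>

    /* Old mmap-like convention: start address on success, negative errno
     * smuggled through an unsigned long on failure. */
    #define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-4095)

    static unsigned long old_vm_brk(unsigned long addr, int fail)
    {
        return fail ? (unsigned long)-ENOMEM : addr;
    }

    /* New convention: plain 0 / -errno, like most kernel functions. */
    static int new_vm_brk(unsigned long addr, int fail)
    {
        (void)addr;
        return fail ? -ENOMEM : 0;
    }

    int main(void)
    {
        assert(!IS_ERR_VALUE(old_vm_brk(0x10000, 0)));
        assert(IS_ERR_VALUE(old_vm_brk(0x10000, 1)));  /* easy to misread */

        assert(new_vm_brk(0x10000, 0) == 0);
        assert(new_vm_brk(0x10000, 1) == -ENOMEM);     /* unambiguous */
        return 0;
    }
    ```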

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 May, 2016

1 commit

  • load_elf_library doesn't handle vm_brk failure although nothing really
    indicates it cannot do that because the function is allowed to fail due
    to vm_mmap failures already. This might not be a problem now, but a
    later patch will make vm_brk killable (that is, waiting on mmap_sem for
    write will become killable), so failure will become more probable.

    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 May, 2016

1 commit

  • Pull misc vfs cleanups from Al Viro:
    "Assorted cleanups and fixes all over the place"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    coredump: only charge written data against RLIMIT_CORE
    coredump: get rid of coredump_params->written
    ecryptfs_lookup(): try either only encrypted or plaintext name
    ecryptfs: avoid multiple aliases for directories
    bpf: reject invalid names right in ->lookup()
    __d_alloc(): treat NULL name as QSTR("/", 1)
    mtd: switch ubi_open_volume_path() to vfs_stat()
    mtd: switch open_mtd_by_chdev() to use of vfs_stat()

    Linus Torvalds
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

28 Feb, 2016

1 commit

  • Replace calls to get_random_int() followed by a cast to (unsigned long)
    with calls to get_random_long(). Also address a shifting bug which, in
    the case of x86, removed the entropy mask for mmap_rnd_bits values
    greater than 31.

    Signed-off-by: Daniel Cashman
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Arnd Bergmann
    Cc: Greg Kroah-Hartman
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: David S. Miller
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Al Viro
    Cc: Nick Kralevich
    Cc: Jeff Vander Stoep
    Cc: Mark Salyzyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Cashman
     

20 Jan, 2016

1 commit

  • Also pass any interpreter's file header to `arch_check_elf' so that any
    architecture handler can have a look at it if needed.

    Signed-off-by: Maciej W. Rozycki
    Acked-by: Andrew Morton
    Acked-by: Al Viro
    Cc: Matthew Fortune
    Cc: linux-mips@linux-mips.org
    Cc: linux-kernel@vger.kernel.org
    Patchwork: https://patchwork.linux-mips.org/patch/11478/
    Signed-off-by: Ralf Baechle

    Maciej W. Rozycki
     

12 Nov, 2015

1 commit

  • Pull vfs update from Al Viro:

    - misc stable fixes

    - trivial kernel-doc and comment fixups

    - remove never-used block_page_mkwrite() wrapper function, and rename
    the function that is _actually_ used to not have double underscores.

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: 9p: cache.h: Add #define of include guard
    vfs: remove stale comment in inode_operations
    vfs: remove unused wrapper block_page_mkwrite()
    binfmt_elf: Correct `arch_check_elf's description
    fs: fix writeback.c kernel-doc warnings
    fs: fix inode.c kernel-doc warning
    fs/pipe.c: return error code rather than 0 in pipe_write()
    fs/pipe.c: preserve alloc_file() error code
    binfmt_elf: Don't clobber passed executable's file header
    FS-Cache: Handle a write to the page immediately beyond the EOF marker
    cachefiles: perform test on s_blocksize when opening cache file.
    FS-Cache: Don't override netfs's primary_index if registering failed
    FS-Cache: Increase reference of parent after registering, netfs success
    debugfs: fix refcount imbalance in start_creating

    Linus Torvalds
     

11 Nov, 2015

2 commits

  • Correct `arch_check_elf's description, mistakenly copied and pasted from
    `arch_elf_pt_proc'.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Al Viro

    Maciej W. Rozycki
     
  • Do not clobber the buffer space passed from `search_binary_handler' and
    originally preloaded by `prepare_binprm' with the executable's file
    header by overwriting it with its interpreter's file header. Instead
    keep the buffer space intact and directly use the data structure locally
    allocated for the interpreter's file header, fixing a bug introduced in
    2.1.14 with loadable module support (linux-mips.org commit beb11695
    [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
    Adjust the amount of data read from the interpreter's file accordingly.

    This was not an issue before loadable module support, because back then
    `load_elf_binary' was executed only once for a given ELF executable,
    whether the function succeeded or failed.

    With loadable module support enabled, upon a failure of
    `load_elf_binary' -- which may for example be caused by architecture
    code rejecting an executable due to a missing hardware feature requested
    in the file header -- a module load is attempted and then the function
    reexecuted by `search_binary_handler'. With the executable's file
    header replaced with its interpreter's file header the executable can
    then be erroneously accepted in this subsequent attempt.

    Cc: stable@vger.kernel.org # all the way back
    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Al Viro

    Maciej W. Rozycki
     

10 Nov, 2015

1 commit

  • Add two new flags to the existing coredump mechanism for ELF files to
    allow us to explicitly filter DAX mappings. This is desirable because
    DAX mappings, like hugetlb mappings, have the potential to be very
    large.

    Update the coredump_filter documentation in
    Documentation/filesystems/proc.txt so that it addresses the new DAX
    coredump flags. Also update the documented default value of
    coredump_filter to be consistent with the core(5) man page. The
    documentation being updated talks about bit 4, Dump ELF headers, which
    is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
    kernel config. This kernel config option defaults to "y" if both ELF
    binaries and coredump are enabled.

    Signed-off-by: Ross Zwisler
    Acked-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Ross Zwisler
     

05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

24 Jun, 2015

1 commit