17 Oct, 2020

4 commits

  • In both binfmt_elf and binfmt_elf_fdpic, use a new helper
    dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
    VMA, if we have one) while protected by the mmap_lock, and then use that
    snapshot instead of walking the VMA list without locking.

    An alternative approach would be to keep the mmap_lock held across the
    entire core dumping operation; however, keeping the mmap_lock locked while
    we may be blocked for an unbounded amount of time (e.g. because we're
    dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
    blocks things like the ->release handler of userfaultfd, and we don't
    really want critical system daemons to grind to a halt just because
    someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
    something like that.

    Since both the normal ELF code and the FDPIC ELF code need this
    functionality (and if any other binfmt wants to add coredump support in
    the future, they'd probably need it, too), implement this with a common
    helper in fs/coredump.c.

    A downside of this approach is that we now need a bigger amount of kernel
    memory per userspace VMA in the normal ELF case, and that we need O(n)
    kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
    be terribly bad.

    There currently is a data race between stack expansion and anything that
    reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
    mitigate that for core dumping, take the mmap_lock in write mode when
    taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in
    read mode, we could end up with a corrupted core dump if someone does
    get_user_pages_remote() concurrently. Not really a major problem, but
    taking the mmap_lock either way works here, so we might as well avoid the
    issue.) (This doesn't do anything about the existing data races with stack
    expansion in other mm code.)

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly
    different code to figure out which VMAs should be dumped, and if so,
    whether the dump should contain the entire VMA or just its first page.

    Eliminate duplicate code by reworking the binfmt_elf version into a
    generic core dumping helper in coredump.c.

    As part of that, change the heuristic for detecting executable/library
    header pages to check whether the inode is executable instead of looking
    at the file mode.

    This is less problematic in terms of locking because it lets us avoid
    get_user() under the mmap_sem. (And arguably it looks nicer and makes
    more sense in generic code.)

    Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is
    only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA
    has been written to.

    Suggested-by: Linus Torvalds
    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of
    pages into the coredump file. Extract that logic into a common helper.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5.

    At the moment, we have that rather ugly mmget_still_valid() helper to work
    around : ELF core dumping doesn't
    take the mmap_sem while traversing the task's VMAs, and if anything (like
    userfaultfd) then remotely messes with the VMA tree, fireworks ensue. So
    at the moment we use mmget_still_valid() to bail out in any writers that
    might be operating on a remote mm's VMAs.

    With this series, I'm trying to get rid of the need for that as cleanly as
    possible. ("cleanly" meaning "avoid holding the mmap_lock across
    unbounded sleeps".)

    Patches 1, 2, 3 and 4 are relatively unrelated cleanups in the core
    dumping code.

    Patches 5 and 6 implement the main change: Instead of repeatedly accessing
    the VMA list with sleeps in between, we snapshot it at the start with
    proper locking, and then later we just use our copy of the VMA list. This
    ensures that the kernel won't crash, that VMA metadata in the coredump is
    consistent even in the presence of concurrent modifications, and that any
    virtual addresses that aren't being concurrently modified have their
    contents show up in the core dump properly.

    The disadvantage of this approach is that we need a bit more memory during
    core dumping for storing metadata about all VMAs.

    At the end of the series, patch 7 removes the old workaround for this
    issue (mmget_still_valid()).

    I have tested:

    - Creating a simple core dump on X86-64 still works.
    - The created coredump on X86-64 opens in GDB and looks plausible.
    - X86-64 core dumps contain the first page for executable mappings at
    offset 0, and don't contain the first page for non-executable file
    mappings or executable mappings at offset !=0.
    - NOMMU 32-bit ARM can still generate plausible-looking core dumps
    through the FDPIC implementation. (I can't test this with GDB because
    GDB is missing some structure definition for nommu ARM, but I've
    poked around in the hexdump and it looked decent.)

    This patch (of 7):

    dump_emit() is for kernel pointers, and VMAs describe userspace memory.
    Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS,
    even if it probably doesn't matter much on !MMU systems - especially given
    that it looks like we can just use the same get_dump_page() as on MMU if
    we move it out of the CONFIG_MMU block.

    One small change we have to make in get_dump_page() is to use
    __get_user_pages_locked() instead of __get_user_pages(), since the latter
    doesn't exist on nommu. On mmu builds, __get_user_pages_locked() will
    just call __get_user_pages() for us.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-1-jannh@google.com
    Link: http://lkml.kernel.org/r/20200827114932.3572699-2-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     

08 Aug, 2020

2 commits

  • Pull fdpick coredump update from Al Viro:
    "Switches fdpic coredumps away from original aout dumping primitives to
    the same kind of regset use as regular elf coredumps do"

    * 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [elf-fdpic] switch coredump to regsets
    [elf-fdpic] use elf_dump_thread_status() for the dumper thread as well
    [elf-fdpic] move allocation of elf_thread_status into elf_dump_thread_status()
    [elf-fdpic] coredump: don't bother with cyclic list for per-thread objects
    kill elf_fpxregs_t
    take fdpic-related parts of elf_prstatus out
    unexport linux/elfcore.h

    Linus Torvalds
     
  • Patch series "mm: cleanup usage of "

    Most architectures have very similar versions of pXd_alloc_one() and
    pXd_free_one() for intermediate levels of page table. These patches add
    generic versions of these functions in and enable
    use of the generic functions where appropriate.

    In addition, functions declared and defined in headers are
    used mostly by core mm and early mm initialization in arch and there is no
    actual reason to have the included all over the place.
    The first patch in this series removes unneeded includes of

    In the end it didn't work out as neatly as I hoped and moving
    pXd_alloc_track() definitions to would require
    unnecessary changes to arches that have custom page table allocations, so
    I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
    to mm/.

    This patch (of 8):

    In most cases header is required only for allocations of
    page table memory. Most of the .c files that include that header do not
    use symbols declared in and do not require that header.

    As for the other header files that used to include , it is
    possible to move that include into the .c file that actually uses symbols
    from and drop the include from the header file.

    The process was somewhat automated using

    sed -i -E '/[
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: Geert Uytterhoeven [m68k]
    Cc: Abdul Haleem
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Christophe Leroy
    Cc: Joerg Roedel
    Cc: Max Filippov
    Cc: Peter Zijlstra
    Cc: Satheesh Rajendran
    Cc: Stafford Horne
    Cc: Stephen Rothwell
    Cc: Steven Rostedt
    Cc: Joerg Roedel
    Cc: Matthew Wilcox
    Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

28 Jul, 2020

6 commits


11 Jun, 2020

1 commit

  • Pull misc uaccess updates from Al Viro:
    "Assorted uaccess patches for this cycle - the stuff that didn't fit
    into thematic series"

    * 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user()
    x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()
    user_regset_copyout_zero(): use clear_user()
    TEST_ACCESS_OK _never_ had been checked anywhere
    x86: switch cp_stat64() to unsafe_put_user()
    binfmt_flat: don't use __put_user()
    binfmt_elf_fdpic: don't use __... uaccess primitives
    binfmt_elf: don't bother with __{put,copy_to}_user()
    pselect6() and friends: take handling the combined 6th/7th args into helper

    Linus Torvalds
     

05 Jun, 2020

1 commit

  • Pull execve updates from Eric Biederman:
    "Last cycle for the Nth time I ran into bugs and quality of
    implementation issues related to exec that could not be easily be
    fixed because of the way exec is implemented. So I have been digging
    into exec and cleanup up what I can.

    I don't think I have exec sorted out enough to fix the issues I
    started with but I have made some headway this cycle with 4 sets of
    changes.

    - promised cleanups after introducing exec_update_mutex

    - trivial cleanups for exec

    - control flow simplifications

    - remove the recomputation of bprm->cred

    The net result is code that is a bit easier to understand and work
    with and a decrease in the number of lines of code (if you don't count
    the added tests)"

    * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
    exec: Compute file based creds only once
    exec: Add a per bprm->file version of per_clear
    binfmt_elf_fdpic: fix execfd build regression
    selftests/exec: Add binfmt_script regression test
    exec: Remove recursion from search_binary_handler
    exec: Generic execfd support
    exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
    exec: Move the call of prepare_binprm into search_binary_handler
    exec: Allow load_misc_binary to call prepare_binprm unconditionally
    exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
    exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
    exec: Teach prepare_exec_creds how exec treats uids & gids
    exec: Set the point of no return sooner
    exec: Move handling of the point of no return to the top level
    exec: Run sync_mm_rss before taking exec_update_mutex
    exec: Fix spelling of search_binary_handler in a comment
    exec: Move the comment from above de_thread to above unshare_sighand
    exec: Rename flush_old_exec begin_new_exec
    exec: Move most of setup_new_exec into flush_old_exec
    exec: In setup_new_exec cache current in the local variable me
    ...

    Linus Torvalds
     

04 Jun, 2020

1 commit


28 May, 2020

1 commit

  • The change to bprm->have_execfd was incomplete, leading
    to a build failure:

    fs/binfmt_elf_fdpic.c: In function 'create_elf_fdpic_tables':
    fs/binfmt_elf_fdpic.c:591:27: error: 'BINPRM_FLAGS_EXECFD' undeclared

    Change the last user of BINPRM_FLAGS_EXECFD in a corresponding
    way.

    Reported-by: Valdis Klētnieks
    Fixes: b8a61c9e7b4a ("exec: Generic execfd support")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Eric W. Biederman

    Arnd Bergmann
     

21 May, 2020

1 commit

  • Most of the support for passing the file descriptor of an executable
    to an interpreter already lives in the generic code and in binfmt_elf.
    Rework the fields in binfmt_elf that deal with executable file
    descriptor passing to make executable file descriptor passing a first
    class concept.

    Move the fd_install from binfmt_misc into begin_new_exec after the new
    creds have been installed. This means that accessing the file through
    /proc//fd/N is able to see the creds for the new executable
    before allowing access to the new executables files.

    Performing the install of the executables file descriptor after
    the point of no return also means that nothing special needs to
    be done on error. The exiting of the process will close all
    of it's open files.

    Move the would_dump from binfmt_misc into begin_new_exec right
    after would_dump is called on the bprm->file. This makes it
    obvious this case exists and that no nesting of bprm->file is
    currently supported.

    In binfmt_misc the movement of fd_install into generic code means
    that it's special error exit path is no longer needed.

    Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
    Acked-by: Linus Torvalds
    Reviewed-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

08 May, 2020

3 commits

  • There is and has been for a very long time been a lot more going on in
    flush_old_exec than just flushing the old state. After the movement
    of code from setup_new_exec there is a whole lot more going on than
    just flushing the old executables state.

    Rename flush_old_exec to begin_new_exec to more accurately reflect
    what this function does.

    Reviewed-by: Kees Cook
    Reviewed-by: Greg Ungerer
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • The two functions are now always called one right after the
    other so merge them together to make future maintenance easier.

    Reviewed-by: Kees Cook
    Reviewed-by: Greg Ungerer
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • In 2016 Linus moved install_exec_creds immediately after
    setup_new_exec, in binfmt_elf as a cleanup and as part of closing a
    potential information leak.

    Perform the same cleanup for the other binary formats.

    Different binary formats doing the same things the same way makes exec
    easier to reason about and easier to maintain.

    Greg Ungerer reports:
    > I tested the the whole series on non-MMU m68k and non-MMU arm
    > (exercising binfmt_flat) and it all tested out with no problems,
    > so for the binfmt_flat changes:
    Tested-by: Greg Ungerer

    Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
    Reviewed-by: Kees Cook
    Reviewed-by: Greg Ungerer
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

06 May, 2020

1 commit


15 Nov, 2019

1 commit

  • We store elapsed time for a crashed process in struct elf_prstatus using
    'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
    incompatible with the kernel's idea of timeval since the structure layout
    no longer matches on 32-bit architectures.

    This changes the definition of the elf_prstatus structure to use
    __kernel_old_timeval instead, which is hardcoded to the currently used
    binary layout. There is no risk of overflow in y2038 though, because
    the time values are all relative times, and can store up to 68 years
    of process elapsed time.

    There is a risk of applications breaking at build time when they
    use the new kernel headers and expect the type to be exactly 'timeval'
    rather than a structure that has the same fields as before. Those
    applications have to be modified to deal with 64-bit time_t anyway.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a * b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

12 Apr, 2018

1 commit

  • Provide a final callback into fs/exec.c before start_thread() takes
    over, to handle any last-minute changes, like the coming restoration of
    the stack limit.

    Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Ben Hutchings
    Cc: Ben Hutchings
    Cc: Brad Spengler
    Cc: Greg KH
    Cc: Hugh Dickins
    Cc: "Jason A. Donenfeld"
    Cc: Laura Abbott
    Cc: Michal Hocko
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

18 Nov, 2017

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted stuff, really no common topic here"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: grab the lock instead of blocking in __fd_install during resizing
    vfs: stop clearing close on exec when closing a fd
    include/linux/fs.h: fix comment about struct address_space
    fs: make fiemap work from compat_ioctl
    coda: fix 'kernel memory exposure attempt' in fsync
    pstore: remove unneeded unlikely()
    vfs: remove unneeded unlikely()
    stubs for mount_bdev() and kill_block_super() in !CONFIG_BLOCK case
    make vfs_ustat() static
    do_handle_open() should be static
    elf_fdpic: fix unused variable warning
    fold destroy_super() into __put_super()
    new helper: destroy_unused_super()
    fix address space warnings in ipc/
    acct.h: get rid of detritus

    Linus Torvalds
     

12 Oct, 2017

1 commit

  • The elf_fdpic code shows a harmless warning when built with MMU disabled,
    I ran into this now that fdpic is available on ARM randconfig builds
    since commit 50b2b2e691cd ("ARM: add ELF_FDPIC support").

    fs/binfmt_elf_fdpic.c: In function 'elf_fdpic_dump_segments':
    fs/binfmt_elf_fdpic.c:1501:17: error: unused variable 'addr' [-Werror=unused-variable]

    This adds another #ifdef around the variable declaration to shut up
    the warning.

    Fixes: e6c1baa9b562 ("convert the rest of binfmt_elf_fdpic to dump_emit()")
    Acked-by: Nicolas Pitre
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Arnd Bergmann
     

03 Oct, 2017

1 commit


15 Sep, 2017

1 commit

  • Pull more set_fs removal from Al Viro:
    "Christoph's 'use kernel_read and friends rather than open-coding
    set_fs()' series"

    * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: unexport vfs_readv and vfs_writev
    fs: unexport vfs_read and vfs_write
    fs: unexport __vfs_read/__vfs_write
    lustre: switch to kernel_write
    gadget/f_mass_storage: stop messing with the address limit
    mconsole: switch to kernel_read
    btrfs: switch write_buf to kernel_write
    net/9p: switch p9_fd_read to kernel_write
    mm/nommu: switch do_mmap_private to kernel_read
    serial2002: switch serial2002_tty_write to kernel_{read/write}
    fs: make the buf argument to __kernel_write a void pointer
    fs: fix kernel_write prototype
    fs: fix kernel_read prototype
    fs: move kernel_read to fs/read_write.c
    fs: move kernel_write to fs/read_write.c
    autofs4: switch autofs4_write to __kernel_write
    ashmem: switch to ->read_iter

    Linus Torvalds
     

11 Sep, 2017

2 commits

  • In elf_fdpic_map_file() there is a test to ensure the dynamic section in
    user space is properly terminated. However it does so by dereferencing
    a user address directly. Add proper user space accessor.

    Signed-off-by: Nicolas Pitre
    Acked-by: Mickael GUENE
    Tested-by: Vincent Abriou
    Tested-by: Andras Szemzo

    Nicolas Pitre
     
  • Provide the necessary changes to be able to execute ELF-FDPIC binaries
    on ARM systems with an MMU.

    The default for CONFIG_BINFMT_ELF_FDPIC is also set to n if the regular
    ELF loader is already configured so not to force FDPIC support on
    everyone. Given that CONFIG_BINFMT_ELF depends on CONFIG_MMU, this means
    CONFIG_BINFMT_ELF_FDPIC will still default to y when !MMU.

    Signed-off-by: Nicolas Pitre
    Acked-by: Mickael GUENE
    Tested-by: Vincent Abriou
    Tested-by: Andras Szemzo

    Nicolas Pitre
     

05 Sep, 2017

1 commit

  • Use proper ssize_t and size_t types for the return value and count
    argument, move the offset last and make it an in/out argument like
    all other read/write helpers, and make the buf argument a void pointer
    to get rid of lots of casts in the callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

02 Aug, 2017

1 commit

  • The bprm_secureexec hook can be moved earlier. Right now, it is called
    during create_elf_tables(), via load_binary(), via search_binary_handler(),
    via exec_binprm(). Nearly all (see exception below) state used by
    bprm_secureexec is created during the bprm_set_creds hook, called from
    prepare_binprm().

    For all LSMs (except commoncaps described next), only the first execution
    of bprm_set_creds takes any effect (they all check bprm->called_set_creds
    which prepare_binprm() sets after the first call to the bprm_set_creds
    hook). However, all these LSMs also only do anything with bprm_secureexec
    when they detected a secure state during their first run of bprm_set_creds.
    Therefore, it is functionally identical to move the detection into
    bprm_set_creds, since the results from secureexec here only need to be
    based on the first call to the LSM's bprm_set_creds hook.

    The single exception is that the commoncaps secureexec hook also examines
    euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
    via prepare_binprm(), which can be called multiple times (e.g.
    binfmt_script, binfmt_misc), and may clear the euid/egid for the final
    load (i.e. the script interpreter). However, while commoncaps specifically
    ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
    prepare_binprm() may get called, it needs to base the secureexec decision
    on the final call to bprm_set_creds. As a result, it will need special
    handling.

    To begin this refactoring, this adds the secureexec flag to the bprm
    struct, and calls the secureexec hook during setup_new_exec(). This is
    safe since all the cred work is finished (and past the point of no return).
    This explicit call will be removed in later patches once the hook has been
    removed.

    Cc: David Howells
    Signed-off-by: Kees Cook
    Reviewed-by: John Johansen
    Acked-by: Serge Hallyn
    Reviewed-by: James Morris

    Kees Cook
     

02 Mar, 2017

3 commits


01 Feb, 2017

3 commits

  • Use the new nsec based cputime accessors as part of the whole cputime
    conversion from cputime_t to nsecs.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Now that most cputime readers use the transition API which return the
    task cputime in old style cputime_t, we can safely store the cputime in
    nsecs. This will eventually make cputime statistics less opaque and more
    granular. Back and forth convertions between cputime_t and nsecs in order
    to deal with cputime_t random granularity won't be needed anymore.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This API returns a task's cputime in cputime_t in order to ease the
    conversion of cputime internals to use nsecs units instead. Blindly
    converting all cputime readers to use this API now will later let us
    convert more smoothly and step by step all these places to use the
    new nsec based cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

25 Dec, 2016

1 commit