30 Oct, 2020

1 commit

  • There is a regular need in the kernel to provide a way to declare having a
    dynamically sized set of trailing elements in a structure. Kernel code should
    always use “flexible array members”[1] for these cases. The older style of
    one-element or zero-length arrays should no longer be used[2].

    [1] https://en.wikipedia.org/wiki/Flexible_array_member
    [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
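
    A minimal userspace sketch of the flexible array member pattern described
    above. The names `sample_buf` and `sample_buf_alloc` are hypothetical, and
    malloc() stands in for a kernel allocator; in-kernel code would normally
    size such an allocation with the struct_size() helper rather than
    open-coded arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with a dynamically sized trailing array. */
struct sample_buf {
	size_t count;
	int items[];	/* flexible array member: must be the last field */
};

/*
 * Allocate a sample_buf with room for `count` zeroed items.
 * Returns NULL on overflow or allocation failure.
 */
struct sample_buf *sample_buf_alloc(size_t count)
{
	struct sample_buf *buf;

	/* Reject counts whose total size would overflow size_t. */
	if (count > (SIZE_MAX - sizeof(*buf)) / sizeof(buf->items[0]))
		return NULL;

	buf = malloc(sizeof(*buf) + count * sizeof(buf->items[0]));
	if (!buf)
		return NULL;

	buf->count = count;
	memset(buf->items, 0, count * sizeof(buf->items[0]));
	return buf;
}
```

    Unlike a one-element array, `sizeof(struct sample_buf)` here does not
    include any trailing element, so the size calculation needs no "minus one"
    adjustment.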
     

19 Oct, 2020

1 commit

  • create_elf_tables() runs after setup_new_exec(), so other tasks can
    already access our new mm and do things like process_madvise() on it. (At
    the time I'm writing this commit, process_madvise() is not in mainline
    yet, but has been in akpm's tree for some time.)

    While I believe that there are currently no APIs that would actually allow
    another process to mess up our VMA tree (process_madvise() is limited to
    MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm
    under which no syscalls have been executed yet), this seems like an
    accident waiting to happen.

    Let's make sure that we always take the mmap lock around GUP paths as long
    as another process might be able to see the mm.

    (Yes, this diff looks suspicious because we drop the lock before doing
    anything with `vma`, but that's because we actually don't do anything with
    it apart from the NULL check.)

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Michel Lespinasse
    Cc: "Eric W . Biederman"
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: Mauro Carvalho Chehab
    Cc: Sakari Ailus
    Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     

17 Oct, 2020

4 commits

  • In both binfmt_elf and binfmt_elf_fdpic, use a new helper
    dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
    VMA, if we have one) while protected by the mmap_lock, and then use that
    snapshot instead of walking the VMA list without locking.

    An alternative approach would be to keep the mmap_lock held across the
    entire core dumping operation; however, keeping the mmap_lock locked while
    we may be blocked for an unbounded amount of time (e.g. because we're
    dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
    blocks things like the ->release handler of userfaultfd, and we don't
    really want critical system daemons to grind to a halt just because
    someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
    something like that.

    Since both the normal ELF code and the FDPIC ELF code need this
    functionality (and if any other binfmt wants to add coredump support in
    the future, they'd probably need it, too), implement this with a common
    helper in fs/coredump.c.

    A downside of this approach is that we now need a bigger amount of kernel
    memory per userspace VMA in the normal ELF case, and that we need O(n)
    kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
    be terribly bad.

    There currently is a data race between stack expansion and anything that
    reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
    mitigate that for core dumping, take the mmap_lock in write mode when
    taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in
    read mode, we could end up with a corrupted core dump if someone does
    get_user_pages_remote() concurrently. Not really a major problem, but
    taking the mmap_lock either way works here, so we might as well avoid the
    issue.) (This doesn't do anything about the existing data races with stack
    expansion in other mm code.)

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
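
    The snapshot idea can be sketched in userspace as follows. This is only an
    illustration under stated assumptions: `region`, `addr_space`,
    `region_snap` and `snapshot_regions()` are hypothetical names, and a
    pthread mutex stands in for the mmap_lock (which the commit takes in write
    mode).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a VMA list and the mm that owns it. */
struct region {
	unsigned long start, end;
	struct region *next;
};

struct addr_space {
	pthread_mutex_t lock;	/* plays the role of the mmap_lock */
	struct region *head;
};

/* Fixed-size record kept per region after the lock is dropped. */
struct region_snap {
	unsigned long start, end;
};

/*
 * Copy the region list into a freshly allocated array while holding the
 * lock. Returns 0 on success and -1 on allocation failure. On success the
 * caller owns *out and can walk it without taking the lock again.
 */
int snapshot_regions(struct addr_space *as, struct region_snap **out,
		     size_t *count)
{
	struct region_snap *snap = NULL;
	struct region *r;
	size_t n = 0, i = 0;

	pthread_mutex_lock(&as->lock);
	for (r = as->head; r; r = r->next)
		n++;
	if (n) {
		snap = malloc(n * sizeof(*snap));
		if (!snap) {
			pthread_mutex_unlock(&as->lock);
			return -1;
		}
	}
	for (r = as->head; r; r = r->next) {
		snap[i].start = r->start;
		snap[i].end = r->end;
		i++;
	}
	pthread_mutex_unlock(&as->lock);

	*out = snap;
	*count = n;
	return 0;
}
```

    The slow part (writing the dump) then works only on the copied array, so
    the lock is never held across a potentially unbounded wait.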
     
  • At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly
    different code to figure out which VMAs should be dumped, and if so,
    whether the dump should contain the entire VMA or just its first page.

    Eliminate duplicate code by reworking the binfmt_elf version into a
    generic core dumping helper in coredump.c.

    As part of that, change the heuristic for detecting executable/library
    header pages to check whether the inode is executable instead of looking
    at the file content.

    This is less problematic in terms of locking because it lets us avoid
    get_user() under the mmap_sem. (And arguably it looks nicer and makes
    more sense in generic code.)

    Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is
    only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA
    has been written to.

    Suggested-by: Linus Torvalds
    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of
    pages into the coredump file. Extract that logic into a common helper.

    Signed-off-by: Jann Horn
    Signed-off-by: Andrew Morton
    Acked-by: Linus Torvalds
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Cc: "Eric W . Biederman"
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com
    Signed-off-by: Linus Torvalds

    Jann Horn
     
  • Patch series "Selecting Load Addresses According to p_align", v3.

    The current ELF loading mechanism provides page-aligned mappings. This
    can lead to the program being loaded in a way unsuitable for file-backed,
    transparent huge pages when handling PIE executables.

    While specifying -z,max-page-size=0x200000 to the linker will generate
    suitably aligned segments for huge pages on x86_64, the executable needs
    to be loaded at a suitably aligned address as well. This alignment
    requires the binary's cooperation, as distinct segments need to be
    appropriately padded to be eligible for THP.

    For binaries built with increased alignment, this limits the number of
    bits usable for ASLR, but provides some randomization over using fixed
    load addresses/non-PIE binaries.

    This patch (of 2):

    The current ELF loading mechanism provides page-aligned mappings. This
    can lead to the program being loaded in a way unsuitable for file-backed,
    transparent huge pages when handling PIE executables.

    For binaries built with increased alignment, this limits the number of
    bits usable for ASLR, but provides some randomization over using fixed
    load addresses/non-PIE binaries.

    Tested by verifying program with -Wl,-z,max-page-size=0x200000 loading.

    [akpm@linux-foundation.org: fix max() warning]
    [ckennelly@google.com: augment comment]
    Link: https://lkml.kernel.org/r/20200821233848.3904680-2-ckennelly@google.com

    Signed-off-by: Chris Kennelly
    Signed-off-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Alexey Dobriyan
    Cc: Song Liu
    Cc: David Rientjes
    Cc: Ian Rogers
    Cc: Hugh Dickins
    Cc: Suren Baghdasaryan
    Cc: Sandeep Patil
    Cc: Fangrui Song
    Cc: Nick Desaulniers
    Cc: "Kirill A. Shutemov"
    Cc: Mike Kravetz
    Cc: Shuah Khan
    Link: https://lkml.kernel.org/r/20200820170541.1132271-1-ckennelly@google.com
    Link: https://lkml.kernel.org/r/20200820170541.1132271-2-ckennelly@google.com
    Signed-off-by: Linus Torvalds

    Chris Kennelly
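
    A rough sketch of the alignment arithmetic involved, not the kernel's
    actual implementation. `sketch_phdr`, `max_load_alignment()`,
    `align_load_addr()` and `aslr_bits_lost()` are hypothetical names;
    skipping alignments that are not powers of two is an assumption of this
    sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PT_LOAD 1u	/* PT_LOAD is 1 in the ELF specification */

/* Hypothetical, trimmed-down program header: only the fields used here. */
struct sketch_phdr {
	uint32_t type;
	uint64_t align;		/* corresponds to p_align */
};

static int is_power_of_two(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Largest power-of-two p_align among loadable segments, at least page_size. */
uint64_t max_load_alignment(const struct sketch_phdr *ph, size_t n,
			    uint64_t page_size)
{
	uint64_t best = page_size;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ph[i].type != SKETCH_PT_LOAD)
			continue;
		if (!is_power_of_two(ph[i].align))
			continue;	/* assumption: ignore odd alignments */
		if (ph[i].align > best)
			best = ph[i].align;
	}
	return best;
}

/* Round a candidate load address down to the given power-of-two alignment. */
uint64_t align_load_addr(uint64_t addr, uint64_t alignment)
{
	return addr & ~(alignment - 1);
}

/* Number of low address bits that can no longer be randomized. */
unsigned int aslr_bits_lost(uint64_t alignment, uint64_t page_size)
{
	unsigned int bits = 0;

	while (alignment > page_size) {
		alignment >>= 1;
		bits++;
	}
	return bits;
}
```

    With 4 KiB pages and a 2 MiB p_align, the ratio is 512 = 2^9, so nine
    fewer address bits remain available for randomization.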
     

28 Jul, 2020

2 commits

  • all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not
    been defined on any architecture since 2010

    Signed-off-by: Al Viro

    Al Viro
     
  • Two new helpers: given a process and regset, dump into a buffer.
    regset_get() takes a buffer and size, regset_get_alloc() takes size
    and allocates a buffer.

    Return value in both cases is the amount of data actually dumped in
    case of success or -E... on error.

    In both cases the size is capped by regset->n * regset->size, so
    ->get() is called with offset 0 and size no more than what regset
    expects.

    binfmt_elf.c callers of ->get() are switched to using those; the other
    caller (copy_regset_to_user()) will need some preparations to switch.

    Signed-off-by: Al Viro

    Al Viro
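
    The size-capping and return-value convention described above can be
    modelled in userspace roughly as follows. Everything here is a
    hypothetical stand-in: `sketch_regset` only mimics the `n` and `size`
    fields, the task argument of the real helpers is omitted, and a plain
    memcpy() replaces the ->get() call.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a register set: n registers of `size` bytes each. */
struct sketch_regset {
	unsigned int n;
	unsigned int size;
	const unsigned char *regs;	/* backing storage of n * size bytes */
};

/*
 * Dump into a caller-supplied buffer. The requested size is capped at
 * n * size. Returns the number of bytes dumped, or a negative errno value.
 */
int sketch_regset_get(const struct sketch_regset *rs, unsigned int size,
		      void *buf)
{
	unsigned int cap = rs->n * rs->size;

	if (!buf)
		return -EINVAL;
	if (size > cap)
		size = cap;
	memcpy(buf, rs->regs, size);
	return (int)size;
}

/*
 * Same, but allocate the buffer. On success *out points to the new buffer,
 * which the caller must free().
 */
int sketch_regset_get_alloc(const struct sketch_regset *rs, unsigned int size,
			    void **out)
{
	unsigned int cap = rs->n * rs->size;
	void *buf;
	int ret;

	if (size > cap)
		size = cap;
	buf = malloc(size ? size : 1);
	if (!buf)
		return -ENOMEM;
	ret = sketch_regset_get(rs, size, buf);
	if (ret < 0) {
		free(buf);
		return ret;
	}
	*out = buf;
	return ret;
}
```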
     

11 Jun, 2020

1 commit

  • Pull misc uaccess updates from Al Viro:
    "Assorted uaccess patches for this cycle - the stuff that didn't fit
    into thematic series"

    * 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user()
    x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()
    user_regset_copyout_zero(): use clear_user()
    TEST_ACCESS_OK _never_ had been checked anywhere
    x86: switch cp_stat64() to unsafe_put_user()
    binfmt_flat: don't use __put_user()
    binfmt_elf_fdpic: don't use __... uaccess primitives
    binfmt_elf: don't bother with __{put,copy_to}_user()
    pselect6() and friends: take handling the combined 6th/7th args into helper

    Linus Torvalds
     

05 Jun, 2020

3 commits

  • Merge yet more updates from Andrew Morton:

    - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
    __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

    - Various other little subsystems

    * emailed patches from Andrew Morton: (127 commits)
    lib/ubsan.c: fix gcc-10 warnings
    tools/testing/selftests/vm: remove duplicate headers
    selftests: vm: pkeys: fix multilib builds for x86
    selftests: vm: pkeys: use the correct page size on powerpc
    selftests/vm/pkeys: override access right definitions on powerpc
    selftests/vm/pkeys: test correct behaviour of pkey-0
    selftests/vm/pkeys: introduce a sub-page allocator
    selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
    selftests/vm/pkeys: associate key on a mapped page and detect write violation
    selftests/vm/pkeys: associate key on a mapped page and detect access violation
    selftests/vm/pkeys: improve checks to determine pkey support
    selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
    selftests/vm/pkeys: fix number of reserved powerpc pkeys
    selftests/vm/pkeys: introduce powerpc support
    selftests/vm/pkeys: introduce generic pkey abstractions
    selftests: vm: pkeys: use the correct huge page size
    selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
    selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
    selftests/vm/pkeys: fix pkey_disable_clear()
    selftests: vm: pkeys: add helpers for pkey bits
    ...

    Linus Torvalds
     
  • The ifndef was added a long time ago to support archs that would define
    their own mapping function. The last user was the metag arch which was
    removed from the tree, and as such there are no users left. Let's kill
    it.

    Signed-off-by: Anthony Iliopoulos
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200402161543.4119-1-ailiop@suse.com
    Signed-off-by: Linus Torvalds

    Anthony Iliopoulos
     
  • Pull execve updates from Eric Biederman:
    "Last cycle for the Nth time I ran into bugs and quality of
    implementation issues related to exec that could not easily be
    fixed because of the way exec is implemented. So I have been digging
    into exec and cleaning up what I can.

    I don't think I have exec sorted out enough to fix the issues I
    started with but I have made some headway this cycle with 4 sets of
    changes.

    - promised cleanups after introducing exec_update_mutex

    - trivial cleanups for exec

    - control flow simplifications

    - remove the recomputation of bprm->cred

    The net result is code that is a bit easier to understand and work
    with and a decrease in the number of lines of code (if you don't count
    the added tests)"

    * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
    exec: Compute file based creds only once
    exec: Add a per bprm->file version of per_clear
    binfmt_elf_fdpic: fix execfd build regression
    selftests/exec: Add binfmt_script regression test
    exec: Remove recursion from search_binary_handler
    exec: Generic execfd support
    exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
    exec: Move the call of prepare_binprm into search_binary_handler
    exec: Allow load_misc_binary to call prepare_binprm unconditionally
    exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
    exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
    exec: Teach prepare_exec_creds how exec treats uids & gids
    exec: Set the point of no return sooner
    exec: Move handling of the point of no return to the top level
    exec: Run sync_mm_rss before taking exec_update_mutex
    exec: Fix spelling of search_binary_handler in a comment
    exec: Move the comment from above de_thread to above unshare_sighand
    exec: Rename flush_old_exec begin_new_exec
    exec: Move most of setup_new_exec into flush_old_exec
    exec: In setup_new_exec cache current in the local variable me
    ...

    Linus Torvalds
     

04 Jun, 2020

1 commit


02 Jun, 2020

2 commits

  • Pull uaccess/coredump updates from Al Viro:
    "set_fs() removal in coredump-related area - mostly Christoph's
    stuff..."

    * 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
    binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
    binfmt_elf: remove the set_fs in fill_siginfo_note
    signal: refactor copy_siginfo_to_user32
    powerpc/spufs: simplify spufs core dumping
    powerpc/spufs: stop using access_ok
    powerpc/spufs: fix copy_to_user while atomic

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "A sizeable pile of arm64 updates for 5.8.

    Summary below, but the big two features are support for Branch Target
    Identification and Clang's Shadow Call stack. The latter is currently
    arm64-only, but the high-level parts are all in core code so it could
    easily be adopted by other architectures pending toolchain support

    Branch Target Identification (BTI):

    - Support for ARMv8.5-BTI in both user- and kernel-space. This allows
    branch targets to limit the types of branch from which they can be
    called and additionally prevents branching to arbitrary code,
    although kernel support requires a very recent toolchain.

    - Function annotation via SYM_FUNC_START() so that assembly functions
    are wrapped with the relevant "landing pad" instructions.

    - BPF and vDSO updates to use the new instructions.

    - Addition of a new HWCAP and exposure of BTI capability to userspace
    via ID register emulation, along with ELF loader support for the
    BTI feature in .note.gnu.property.

    - Non-critical fixes to CFI unwind annotations in the sigreturn
    trampoline.

    Shadow Call Stack (SCS):

    - Support for Clang's Shadow Call Stack feature, which reserves
    platform register x18 to point at a separate stack for each task
    that holds only return addresses. This protects function return
    control flow from buffer overruns on the main stack.

    - Save/restore of x18 across problematic boundaries (user-mode,
    hypervisor, EFI, suspend, etc).

    - Core support for SCS, should other architectures want to use it
    too.

    - SCS overflow checking on context-switch as part of the existing
    stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.

    CPU feature detection:

    - Removed numerous "SANITY CHECK" errors when running on a system
    with mismatched AArch32 support at EL1. This is primarily a concern
    for KVM, which disabled support for 32-bit guests on such a system.

    - Addition of new ID registers and fields as the architecture has
    been extended.

    Perf and PMU drivers:

    - Minor fixes and cleanups to system PMU drivers.

    Hardware errata:

    - Unify KVM workarounds for VHE and nVHE configurations.

    - Sort vendor errata entries in Kconfig.

    Secure Monitor Call Calling Convention (SMCCC):

    - Update to the latest specification from Arm (v1.2).

    - Allow PSCI code to query the SMCCC version.

    Software Delegated Exception Interface (SDEI):

    - Unexport a bunch of unused symbols.

    - Minor fixes to handling of firmware data.

    Pointer authentication:

    - Add support for dumping the kernel PAC mask in vmcoreinfo so that
    the stack can be unwound by tools such as kdump.

    - Simplification of key initialisation during CPU bringup.

    BPF backend:

    - Improve immediate generation for logical and add/sub instructions.

    vDSO:

    - Minor fixes to the linker flags for consistency with other
    architectures and support for LLVM's unwinder.

    - Clean up logic to initialise and map the vDSO into userspace.

    ACPI:

    - Work around for an ambiguity in the IORT specification relating to
    the "num_ids" field.

    - Support _DMA method for all named components rather than only PCIe
    root complexes.

    - Minor other IORT-related fixes.

    Miscellaneous:

    - Initialise debug traps early for KGDB and fix KDB cacheflushing
    deadlock.

    - Minor tweaks to early boot state (documentation update, set
    TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
    KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
    KVM: arm64: Check advertised Stage-2 page size capability
    arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
    ACPI/IORT: Remove the unused __get_pci_rid()
    arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
    arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
    arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
    arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
    arm64/cpufeature: Introduce ID_MMFR5 CPU register
    arm64/cpufeature: Introduce ID_DFR1 CPU register
    arm64/cpufeature: Introduce ID_PFR2 CPU register
    arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
    arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
    arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
    arm64: mm: Add asid_gen_match() helper
    firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
    arm64: vdso: Fix CFI directives in sigreturn trampoline
    arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
    ...

    Linus Torvalds
     

29 May, 2020

1 commit

  • KMSAN reported uninitialized data being written to disk when dumping
    core. As a result, several kilobytes of kmalloc memory may be written
    to the core file and then read by a non-privileged user.

    Reported-by: sam
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc:
    Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com
    Link: https://github.com/google/kmsan/issues/76
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
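
    One common way to avoid this class of leak is to allocate zero-initialized
    memory whenever a buffer may be written out in full but filled only
    partially. The sketch below illustrates that general pattern in userspace;
    `alloc_note_buf()` is a hypothetical helper, and calloc() stands in for a
    zeroing kernel allocator. It is not a reproduction of the actual patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a buffer of buf_len bytes and copy payload_len bytes of payload
 * into its start. Because calloc() zeroes the allocation, the bytes after
 * the payload are defined (zero) even if the whole buffer is later written
 * to a file. Returns NULL if the payload does not fit or allocation fails.
 */
unsigned char *alloc_note_buf(size_t buf_len, const void *payload,
			      size_t payload_len)
{
	unsigned char *buf;

	if (payload_len > buf_len)
		return NULL;

	buf = calloc(1, buf_len ? buf_len : 1);
	if (!buf)
		return NULL;

	if (payload_len)
		memcpy(buf, payload, payload_len);
	return buf;
}
```

    With malloc() in place of calloc(), the tail of the buffer would hold
    whatever the allocator last stored there, which is the kind of stale data
    the report describes.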
     

21 May, 2020

1 commit

  • Most of the support for passing the file descriptor of an executable
    to an interpreter already lives in the generic code and in binfmt_elf.
    Rework the fields in binfmt_elf that deal with executable file
    descriptor passing to make executable file descriptor passing a first
    class concept.

    Move the fd_install from binfmt_misc into begin_new_exec after the new
    creds have been installed. This means that accessing the file through
    /proc/<pid>/fd/N is able to see the creds for the new executable
    before allowing access to the new executable's files.

    Performing the install of the executable's file descriptor after
    the point of no return also means that nothing special needs to
    be done on error. The exiting of the process will close all
    of its open files.

    Move the would_dump from binfmt_misc into begin_new_exec right
    after would_dump is called on the bprm->file. This makes it
    obvious this case exists and that no nesting of bprm->file is
    currently supported.

    In binfmt_misc the movement of fd_install into generic code means
    that its special error exit path is no longer needed.

    Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
    Acked-by: Linus Torvalds
    Reviewed-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

08 May, 2020

2 commits

  • There is, and has been for a very long time, a lot more going on in
    flush_old_exec than just flushing the old state. After the movement
    of code from setup_new_exec there is a whole lot more going on than
    just flushing the old executable's state.

    Rename flush_old_exec to begin_new_exec to more accurately reflect
    what this function does.

    Reviewed-by: Kees Cook
    Reviewed-by: Greg Ungerer
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • The two functions are now always called one right after the
    other so merge them together to make future maintenance easier.

    Reviewed-by: Kees Cook
    Reviewed-by: Greg Ungerer
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

06 May, 2020

2 commits

  • There is no logic in elf_core_dump itself or in the various arch helpers
    called from it which use uaccess routines on kernel pointers except for
    the file writes that are nicely encapsulated by using __kernel_write in
    dump_emit.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • The code in binfmt_elf.c is different from the rest of the code that
    processes siginfo, as it sends siginfo from a kernel buffer to a file
    rather than from kernel memory to userspace buffers. To remove its
    use of set_fs the code needs some different siginfo helpers.

    Add the helper copy_siginfo_to_external to copy from the kernel's
    internal siginfo layout to a buffer in the siginfo layout that
    userspace expects.

    Modify fill_siginfo_note to use copy_siginfo_to_external instead of
    set_fs and copy_siginfo_to_user.

    Update compat_binfmt_elf.c to use the previously added
    copy_siginfo_to_external32 to handle the compat case.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Eric W. Biederman
     

05 May, 2020

1 commit

  • Merge in user support for Branch Target Identification, which narrowly
    missed the cut for 5.7 after a late ABI concern.

    * for-next/bti-user:
    arm64: bti: Document behaviour for dynamically linked binaries
    arm64: elf: Fix allnoconfig kernel build with !ARCH_USE_GNU_PROPERTY
    arm64: BTI: Add Kconfig entry for userspace BTI
    mm: smaps: Report arm64 guarded pages in smaps
    arm64: mm: Display guarded pages in ptdump
    KVM: arm64: BTI: Reset BTYPE when skipping emulated instructions
    arm64: BTI: Reset BTYPE when skipping emulated instructions
    arm64: traps: Shuffle code to eliminate forward declarations
    arm64: unify native/compat instruction skipping
    arm64: BTI: Decode BYTPE bits when printing PSTATE
    arm64: elf: Enable BTI at exec based on ELF program properties
    elf: Allow arch to tweak initial mmap prot flags
    arm64: Basic Branch Target Identification support
    ELF: Add ELF program property parsing support
    ELF: UAPI and Kconfig additions for ELF program properties

    Will Deacon
     

08 Apr, 2020

4 commits

  • Static executables don't need to free a NULL pointer.

    It doesn't really matter because a static executable is not a common
    scenario, but do it anyway out of pedantry.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • PT_INTERP ELF header can be spared if executable is static.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • "loc" variable became just a wrapper for PT_INTERP ELF header after main
    ELF header was moved to "bprm->buf". Delete it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This replaces all remaining open encodings with is_vm_hugetlb_page().

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Alexander Viro
    Cc: Will Deacon
    Cc: "Aneesh Kumar K.V"
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Geert Uytterhoeven
    Cc: Guo Ren
    Cc: Mel Gorman
    Cc: Paul Burton
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

17 Mar, 2020

2 commits

  • An arch may want to tweak the mmap prot flags for an
    ELF executable's initial mappings. For example, arm64 is going to
    need to add PROT_BTI for executable pages in an ELF process whose
    executable is marked as using Branch Target Identification (an
    ARMv8.5-A control flow integrity feature).

    So that this can be done in a generic way, add a hook
    arch_elf_adjust_prot() to modify the prot flags as desired: arches
    can select CONFIG_HAVE_ELF_PROT and implement their own backend
    where necessary.

    By default, leave the prot flags unchanged.

    Signed-off-by: Mark Brown
    Signed-off-by: Dave Martin
    Reviewed-by: Catalin Marinas
    Reviewed-by: Kees Cook
    Signed-off-by: Catalin Marinas

    Dave Martin
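
    The shape of such a hook can be sketched as below. This is an illustration
    only: the function names, the `sketch_arch_state` type and the
    `SKETCH_PROT_BTI` value are assumptions of this sketch, not the kernel's
    definitions.

```c
#include <sys/mman.h>

/* Illustrative flag value; an architecture would supply its own PROT_BTI. */
#define SKETCH_PROT_BTI 0x10

/* Hypothetical per-exec state recorded while parsing the executable. */
struct sketch_arch_state {
	int uses_bti;	/* non-zero if the executable is marked as using BTI */
};

/* Default backend: leave the protection flags unchanged. */
int sketch_default_adjust_prot(int prot, const struct sketch_arch_state *state)
{
	(void)state;
	return prot;
}

/* Arch backend: add the extra flag, but only for executable mappings. */
int sketch_bti_adjust_prot(int prot, const struct sketch_arch_state *state)
{
	if (state->uses_bti && (prot & PROT_EXEC))
		prot |= SKETCH_PROT_BTI;
	return prot;
}
```

    The loader would call whichever backend is configured once per segment,
    just before mapping it, so architectures without special needs pay nothing
    beyond the pass-through default.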
     
  • ELF program properties will be needed for detecting whether to
    enable optional architecture or ABI features for a new ELF process.

    For now, there are no generic properties that we care about, so do
    nothing unless CONFIG_ARCH_USE_GNU_PROPERTY=y.

    Otherwise, check for the presence of properties using the
    PT_PROGRAM_PROPERTY phdrs entry (if any), and notify each property to
    the arch code.

    For now, the added code is not used.

    Signed-off-by: Mark Brown
    Signed-off-by: Dave Martin
    Reviewed-by: Kees Cook
    Signed-off-by: Catalin Marinas

    Dave Martin
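
    A sketch of what walking a GNU program property array looks like, assuming
    the 64-bit ELF layout in which each property is a 32-bit type, a 32-bit
    data size, the data, and padding to an 8-byte boundary. The function and
    macro names are hypothetical, values are read in host byte order, and
    treating a missing final pad as malformed is a choice made for this
    sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PROP_ALIGN 8u	/* property alignment for 64-bit ELF */

/* Same value as GNU_PROPERTY_AARCH64_FEATURE_1_AND in the ELF headers. */
#define SKETCH_FEATURE_PROP 0xc0000000u

typedef int (*sketch_prop_cb)(uint32_t type, const void *data,
			      uint32_t datasz, void *ctx);

/*
 * Walk a property array and hand each entry to cb. Returns 0 on success,
 * -1 if the array is malformed, or the first non-zero callback result.
 */
int sketch_parse_properties(const unsigned char *desc, size_t len,
			    sketch_prop_cb cb, void *ctx)
{
	size_t off = 0;

	while (off < len) {
		uint32_t type, datasz;
		size_t padded;
		int ret;

		if (len - off < 8)
			return -1;	/* truncated property header */
		memcpy(&type, desc + off, 4);
		memcpy(&datasz, desc + off + 4, 4);
		off += 8;

		if (datasz > len - off)
			return -1;	/* data runs past the end */
		ret = cb(type, desc + off, datasz, ctx);
		if (ret)
			return ret;

		padded = ((size_t)datasz + SKETCH_PROP_ALIGN - 1) &
			 ~(size_t)(SKETCH_PROP_ALIGN - 1);
		if (padded > len - off)
			return -1;	/* missing padding */
		off += padded;
	}
	return 0;
}

/* Example callback: record the feature word into the uint32_t at ctx. */
int sketch_record_feature(uint32_t type, const void *data, uint32_t datasz,
			  void *ctx)
{
	if (type == SKETCH_FEATURE_PROP && datasz == sizeof(uint32_t))
		memcpy(ctx, data, sizeof(uint32_t));
	return 0;
}
```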
     

01 Feb, 2020

8 commits

  • Unmapping whole address space at once with

    munmap(0, (1ULL<
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • array_size() macro will do overflow check anyway.

    Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Comment says ELF header is "too large to be on stack". 64 bytes on
    64-bit is not large by any means.

    Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If some mapping goes past TASK_SIZE it will be rejected by kernel which
    means no such userspace binaries exist.

    Mark every such check as unlikely.

    Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The "current->mm" pointer is stable in general except in a few cases, one
    of which is execve(2). The compiler can't treat it as stable, but it _is_
    stable most of the time. During the ELF loading process ->mm becomes stable
    right after flush_old_exec().

    Help compiler by caching current->mm, otherwise it continues to refetch
    it.

    add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141)
    Function old new delta
    elf_core_dump 5062 5039 -23
    load_elf_binary 5426 5308 -118

    Note: other cases are left as is because it is either pessimisation or
    no change in binary size.

    Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • ELF header is read into bprm->buf[] by generic execve code.

    Save a memcpy and allocate just one header for the interpreter instead
    of two headers (64 bytes instead of 128 on 64-bit).

    Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Only executable segments should be accounted to ->start_code just like
    they do to ->end_code (correctly).

    Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Filling auxv vector as array with index (auxv[i++] = ...) generates
    terrible code. "saved_auxv" should be reworked because it is the worst
    member of mm_struct by size/usefulness ratio but do it later.

    Meanwhile help gcc a little with *auxv++ idiom.

    Space savings on x86_64:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127)
    Function old new delta
    load_elf_binary 5470 5343 -127

    Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
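
    The two ways of filling the vector can be compared side by side in a small
    sketch. The function names and the `SKETCH_AT_*` macros are hypothetical
    (the numeric values match the usual AT_NULL, AT_PHNUM and AT_PAGESZ
    constants); whether a given compiler generates different code for the two
    forms is not something this sketch demonstrates.

```c
#include <stddef.h>

#define SKETCH_AT_NULL   0
#define SKETCH_AT_PHNUM  5
#define SKETCH_AT_PAGESZ 6

/* Indexed form: auxv[i++] = ... */
size_t fill_auxv_indexed(unsigned long *auxv, unsigned long pagesz,
			 unsigned long phnum)
{
	size_t i = 0;

	auxv[i++] = SKETCH_AT_PAGESZ;
	auxv[i++] = pagesz;
	auxv[i++] = SKETCH_AT_PHNUM;
	auxv[i++] = phnum;
	auxv[i++] = SKETCH_AT_NULL;	/* terminator entry */
	auxv[i++] = 0;
	return i;
}

/* Pointer form: *p++ = ... */
size_t fill_auxv_pointer(unsigned long *auxv, unsigned long pagesz,
			 unsigned long phnum)
{
	unsigned long *p = auxv;

	*p++ = SKETCH_AT_PAGESZ;
	*p++ = pagesz;
	*p++ = SKETCH_AT_PHNUM;
	*p++ = phnum;
	*p++ = SKETCH_AT_NULL;		/* terminator entry */
	*p++ = 0;
	return (size_t)(p - auxv);
}
```

    Both functions write the same six words and return the number of words
    written; only the addressing idiom differs.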
     

05 Dec, 2019

2 commits


15 Nov, 2019

1 commit

  • We store elapsed time for a crashed process in struct elf_prstatus using
    'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
    incompatible with the kernel's idea of timeval since the structure layout
    no longer matches on 32-bit architectures.

    This changes the definition of the elf_prstatus structure to use
    __kernel_old_timeval instead, which is hardcoded to the currently used
    binary layout. There is no risk of overflow in y2038 though, because
    the time values are all relative times, and can store up to 68 years
    of process elapsed time.

    There is a risk of applications breaking at build time when they
    use the new kernel headers and expect the type to be exactly 'timeval'
    rather than a structure that has the same fields as before. Those
    applications have to be modified to deal with 64-bit time_t anyway.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
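
    The 68-year figure follows from the range of a signed 32-bit seconds
    counter. A small worked check, using a hypothetical struct that mirrors
    the 32-bit layout (two 32-bit fields) and an average year of 365.25 days:

```c
#include <stdint.h>

/* Hypothetical mirror of the old timeval layout on 32-bit architectures. */
struct sketch_old_timeval32 {
	int32_t tv_sec;		/* seconds */
	int32_t tv_usec;	/* microseconds */
};

/* Seconds in an average year: 365.25 days * 86400 seconds per day. */
#define SKETCH_SECONDS_PER_YEAR 31557600LL

/* Whole years of elapsed time that fit in a signed 32-bit tv_sec. */
int64_t sketch_max_elapsed_years(void)
{
	return (int64_t)INT32_MAX / SKETCH_SECONDS_PER_YEAR;
}
```

    INT32_MAX is 2147483647 seconds, which divided by 31557600 gives just over
    68, so a relative time only overflows after roughly 68 years of process
    run time.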
     

07 Oct, 2019

1 commit

  • In commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we
    changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
    executable mappings.

    Then, people reported that it broke some binaries that had overlapping
    segments from the same file, and commit ad55eac74f20 ("elf: enforce
    MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
    overlaying elf segment cases. But only some - despite the summary line
    of that commit, it only did it when it also does a temporary brk vma for
    one obvious overlapping case.

    Now Russell King reports another overlapping case with old 32-bit x86
    binaries, which doesn't trigger that limited case. End result: we had
    better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.

    Yes, it's a sign of old binaries generated with old tool-chains, but we
    do pride ourselves on not breaking existing setups.

    This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
    and the old load_elf_library() use-cases, because nobody has reported
    breakage for those. Yet.

    Note that in all the cases seen so far, the overlapping elf sections
    seem to be just re-mapping of the same executable with different section
    attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE
    flag or similar, which acts like NOREPLACE, but allows just remapping
    the same executable file using different protection flags.

    It's not clear that would make a huge difference to anything, but if
    people really hate that "elf remaps over previous maps" behavior, maybe
    at least a more limited form of remapping would alleviate some concerns.

    Alternatively, we should take a look at our elf_map() logic to see if we
    end up not mapping things properly the first time.

    In the meantime, this is the minimal "don't do that then" patch while
    people hopefully think about it more.

    Reported-by: Russell King
    Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
    Fixes: ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments")
    Cc: Michal Hocko
    Cc: Kees Cook
    Signed-off-by: Linus Torvalds

    Linus Torvalds
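
    The behavioural difference between the two flags can be observed from
    userspace on Linux with a sketch like the one below. The helper names are
    hypothetical; the fallback value for MAP_FIXED_NOREPLACE is the one used
    on most Linux architectures and is only needed with older C library
    headers. On kernels that predate the flag it is treated as a hint, so the
    second helper reports 0 there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* value on most Linux architectures */
#endif

#define SKETCH_LEN 4096

/* Returns 1 if MAP_FIXED silently replaced an existing mapping, -1 on setup error. */
int fixed_replaces_overlap(void)
{
	void *first, *second;
	int same;

	first = mmap(NULL, SKETCH_LEN, PROT_READ,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;

	/* Map again at the same address: MAP_FIXED discards what was there. */
	second = mmap(first, SKETCH_LEN, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	same = (second == first);

	if (second != MAP_FAILED)
		munmap(second, SKETCH_LEN);
	if (second != first)
		munmap(first, SKETCH_LEN);
	return same;
}

/* Returns 1 if MAP_FIXED_NOREPLACE refused the overlap with EEXIST, 0 otherwise. */
int noreplace_refuses_overlap(void)
{
	void *first, *second;
	int refused;

	first = mmap(NULL, SKETCH_LEN, PROT_READ,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;

	second = mmap(first, SKETCH_LEN, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
	refused = (second == MAP_FAILED && errno == EEXIST);

	if (second != MAP_FAILED && second != first)
		munmap(second, SKETCH_LEN);
	munmap(first, SKETCH_LEN);
	return refused;
}
```

    An ELF file whose segments overlap relies on the first behaviour: the
    later segment is expected to map over part of the earlier one, which
    MAP_FIXED_NOREPLACE rejects.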