13 Jan, 2019

1 commit

  • commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream.

    This changes the fork(2) syscall to record the process start_time after
    initializing the basic task structure but still before making the new
    process visible to user-space.

    Technically, we could record the start_time anytime during fork(2). But
    this might lead to scenarios where a start_time is recorded long before
    a process becomes visible to user-space. For instance, with
    userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
    for an indefinite amount of time (and will, if this causes network
    access, or similar).

    By recording the start_time late, it much more closely reflects the
    point in time at which the process becomes live and can be observed by
    other processes.

    Lastly, this makes it much harder for user-space to predict and control
    the start_time it gets assigned. Previously, user-space could fork a
    process and stall it in copy_thread_tls() before its pid was allocated
    but after its start_time was recorded. This could be misused to cycle
    through PIDs later on and then resume the stalled fork(2), yielding a
    process with the same pid and start_time as a process that existed
    before. This could be used to circumvent security systems that identify
    processes by their pid+start_time combination.

    Even though user-space has always known that start_time recording is
    flaky (several projects are known to still rely on start_time-based
    identification regardless), recording the start_time late will help
    mitigate existing attacks and make it much harder for user-space to
    control the start_time a process gets assigned.
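
    To make the pid+start_time pairing concrete: user-space obtains the
    pair from /proc. The sketch below is illustrative only (it is not part
    of the patch); proc_start_time() is a hypothetical helper that reads
    field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical helper: return field 22 (starttime) of
     * /proc/<pid>/stat. Parsing starts after the last ')' so a comm
     * containing spaces or parentheses cannot shift the fields. */
    static long long proc_start_time(int pid)
    {
        char path[64], buf[1024], *p;
        long long starttime = -1;
        FILE *f;
        int i;

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        f = fopen(path, "r");
        if (!f)
            return -1;
        if (!fgets(buf, sizeof(buf), f))
            buf[0] = '\0';
        fclose(f);
        p = strrchr(buf, ')');
        for (i = 0; p && i < 20; i++)
            p = strchr(p + 1, ' ');      /* skip fields 3..21 */
        if (p)
            sscanf(p + 1, "%lld", &starttime);
        return starttime;
    }

    int main(void)
    {
        printf("pid %d, start_time %lld\n", getpid(),
               proc_start_time(getpid()));
        return 0;
    }

    A security system pairing the pid with this value is exactly what the
    late recording makes harder to spoof.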

    Reported-by: Jann Horn
    Signed-off-by: Tom Gundersen
    Signed-off-by: David Herrmann
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Herrmann
     

15 Sep, 2018

1 commit

  • [ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

    Before this change, if a multithreaded process forks while one of its
    threads is changing a signal handler using sigaction(), the memcpy() in
    copy_sighand() can race with the struct assignment in do_sigaction(). It
    isn't clear whether this can cause corruption of the userspace signal
    handler pointer, but it definitely can cause inconsistency between
    different fields of struct sigaction.

    Take the appropriate spinlock to avoid this.

    I have tested that this patch prevents inconsistency between sa_sigaction
    and sa_flags, which is possible before this patch.
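
    For illustration, here is a hypothetical user-space reproducer of the
    kind of race described above (a sketch, not the author's test case):
    one thread keeps flipping SIGUSR1 between a plain handler and an
    SA_SIGINFO handler while the main thread forks in a loop. Before the
    fix, the unlocked memcpy() in copy_sighand() could capture sa_sigaction
    from one disposition and sa_flags from the other in the child.

    #include <pthread.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void handler_a(int sig) { (void)sig; }
    static void handler_b(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si; (void)ctx;
    }

    static void *flip_thread(void *arg)
    {
        struct sigaction sa;

        (void)arg;
        for (;;) {
            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = handler_a;     /* plain handler, no SA_SIGINFO */
            sigaction(SIGUSR1, &sa, NULL);
            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = handler_b;   /* SA_SIGINFO-style handler */
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGUSR1, &sa, NULL);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, flip_thread, NULL);
        for (;;) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);   /* child: its copied sighand may be torn */
            waitpid(pid, NULL, 0);
        }
    }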

    Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
    Signed-off-by: Jann Horn
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Rik van Riel
    Cc: "Peter Zijlstra (Intel)"
    Cc: Kees Cook
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

03 Aug, 2018

1 commit

  • commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream.

    One of the classes of kernel stack content leaks[1] is exposing prior
    heap or stack contents when a new process stack is allocated. Normally,
    those stacks are not zeroed, and the old contents remain in place. In
    the face of stack content exposure flaws, those contents can leak to
    userspace.

    Fixing this will make the kernel no longer vulnerable to these flaws, as
    the stack will be wiped each time a stack is assigned to a new process.
    There's not a meaningful change in runtime performance; it almost looks
    like it provides a benefit.

    Performing back-to-back kernel builds before:
    Run times: 157.86 157.09 158.90 160.94 160.80
    Mean: 159.12
    Std Dev: 1.54

    and after:
    Run times: 159.31 157.34 156.71 158.15 160.81
    Mean: 158.46
    Std Dev: 1.46

    Instead of making this a build or runtime config, Andy Lutomirski
    recommended this just be enabled by default.

    [1] A noisy search for many kinds of stack content leaks can be seen here:
    https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak

    I did some more with perf and cycle counts on running 100,000 execs of
    /bin/true.

    before:
    Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
    Mean: 221015379122.60
    Std Dev: 4662486552.47

    after:
    Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
    Mean: 217745009865.40
    Std Dev: 5935559279.99

    It continues to look like it's faster, though the deviation is rather
    wide, but I'm not sure what I could do that would be less noisy. I'm
    open to ideas!

    Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
    Signed-off-by: Kees Cook
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Laura Abbott
    Cc: Rasmus Villemoes
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

22 Feb, 2018

1 commit

  • commit 75f296d93bcebcfe375884ddac79e30263a31766 upstream.

    Convert all allocations that used a NOTRACK flag to stop using it.

    Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Levin, Alexander (Sasha Levin)
     

30 Dec, 2017

1 commit

  • commit c10e83f598d08046dd1ebc8360d4bb12d802d51b upstream.

    In order to sanitize the LDT initialization on x86, arch_dup_mmap()
    must be allowed to fail. Fix up all instances.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: dan.j.williams@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

14 Oct, 2017

1 commit

  • Kmemleak considers any pointers on task stacks as references. This
    patch clears newly allocated and reused vmap stacks.

    Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

04 Oct, 2017

1 commit

    Drop the global lru lock in the isolate callback before calling
    zap_page_range(), which calls cond_resched(), and re-acquire the global
    lru lock before returning. Also change the return code to
    LRU_REMOVED_RETRY.

    Use mmput_async when we fail to acquire the mmap sem in an atomic
    context.

    Fix "BUG: sleeping function called from invalid context"
    errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

    Also restore mmput_async, which was initially introduced in commit
    ec8d7c14ea14 ("mm, oom_reaper: do not mmput synchronously from the oom
    reaper context"), and was removed in commit 212925802454 ("mm: oom: let
    oom_reap_task and exit_mmap run concurrently").

    Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com
    Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder")
    Signed-off-by: Sherry Yang
    Signed-off-by: Greg Kroah-Hartman
    Reported-by: Kyle Yan
    Acked-by: Arve Hjønnevåg
    Acked-by: Michal Hocko
    Cc: Martijn Coenen
    Cc: Todd Kjos
    Cc: Riley Andrews
    Cc: Ingo Molnar
    Cc: Vlastimil Babka
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Oleg Nesterov
    Cc: Hoeun Ryu
    Cc: Christopher Lameter
    Cc: Vegard Nossum
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sherry Yang
     

13 Sep, 2017

1 commit

  • Pull selinux updates from Paul Moore:
    "A relatively quiet period for SELinux, 11 patches with only two/three
    having any substantive changes.

    These noteworthy changes include another tweak to the NNP/nosuid
    handling, per-file labeling for cgroups, and an object class fix for
    AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or
    administrative updates (Stephen's email update explains the file
    explosion in the diffstat).

    Everything passes the selinux-testsuite"

    [ Also a couple of small patches from the security tree from Tetsuo
    Handa for Tomoyo and LSM cleanup. The separation of security policy
    updates wasn't all that clean - Linus ]

    * tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: constify nf_hook_ops
    selinux: allow per-file labeling for cgroupfs
    lsm_audit: update my email address
    selinux: update my email address
    MAINTAINERS: update the NetLabel and Labeled Networking information
    selinux: use GFP_NOWAIT in the AVC kmem_caches
    selinux: Generalize support for NNP/nosuid SELinux domain transitions
    selinux: genheaders should fail if too many permissions are defined
    selinux: update the selinux info in MAINTAINERS
    credits: update Paul Moore's info
    selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets
    tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst
    LSM: Remove security_task_create() hook.

    Linus Torvalds
     

09 Sep, 2017

2 commits

  • ... with the generic rbtree flavor instead. No changes
    in semantics whatsoever.

    Link: http://lkml.kernel.org/r/20170719014603.19029-10-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
    HMM provides 3 separate types of functionality:
    - Mirroring: synchronize CPU page table and device page table
    - Device memory: allocating struct page for device memory
    - Migration: migrating regular memory to device memory

    This patch introduces common helpers and definitions shared by all
    three.

    Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Evgeny Baskakov
    Signed-off-by: John Hubbard
    Signed-off-by: Mark Hairgrove
    Signed-off-by: Sherry Cheung
    Signed-off-by: Subhash Gutti
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Johannes Weiner
    Cc: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

07 Sep, 2017

2 commits

  • Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
    in the child process after fork. This differs from MADV_DONTFORK in one
    important way.

    If a child process accesses memory that was MADV_WIPEONFORK, it will get
    zeroes. The address ranges are still valid, they are just empty.

    If a child process accesses memory that was MADV_DONTFORK, it will get a
    segmentation fault, since those address ranges are no longer valid in
    the child after fork.

    Since MADV_DONTFORK also seems to be used to allow very large programs
    to fork in systems with strict memory overcommit restrictions, changing
    the semantics of MADV_DONTFORK might break existing programs.

    MADV_WIPEONFORK only works on private, anonymous VMAs.

    The use case is libraries that store or cache information, and want to
    know that they need to regenerate it in the child process after fork.

    Examples of this would be:
    - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
    check, which is too slow without a PID cache)
    - PKCS#11 API reinitialization check (mandated by specification)
    - glibc's upcoming PRNG (reseed after fork)
    - OpenSSL PRNG (reseed after fork)

    The security benefits of a forking server having a re-initialized PRNG
    in every child process are pretty obvious. However, because libraries
    have all kinds of internal state, and programs get compiled with many
    different versions of each library, it is unreasonable to expect
    calling programs to re-initialize everything manually after fork.

    A further complication is the proliferation of clone flags, programs
    bypassing glibc's functions to call clone directly, and programs calling
    unshare, causing the glibc pthread_atfork hook to not get called.

    It would be better to have the kernel take care of this automatically.

    The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
    MADV_WIPEONFORK.

    This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2
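
    A minimal usage sketch (assuming a kernel with MADV_WIPEONFORK support;
    the fallback #define assumes the generic/x86 value):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #ifndef MADV_WIPEONFORK
    #define MADV_WIPEONFORK 18   /* generic/x86 value; assumed if libc headers are older */
    #endif

    int main(void)
    {
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED)
            return 1;
        strcpy(p, "secret");
        if (madvise(p, 4096, MADV_WIPEONFORK)) {
            perror("madvise");   /* EINVAL on kernels without support */
            return 1;
        }
        if (fork() == 0) {
            /* the range is still mapped, but reads back as zeroes */
            printf("child sees:  \"%s\"\n", p);
            _exit(0);
        }
        wait(NULL);
        printf("parent sees: \"%s\"\n", p);   /* still "secret" */
        return 0;
    }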

    [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
    Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
    Signed-off-by: Rik van Riel
    Reported-by: Florian Weimer
    Reported-by: Colm MacCártaigh
    Reviewed-by: Mike Kravetz
    Cc: "H. Peter Anvin"
    Cc: "Kirill A. Shutemov"
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Cc: Helge Deller
    Cc: Kees Cook
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • This is purely required because exit_aio() may block and exit_mmap() may
    never start, if the oom_reap_task cannot start running on a mm with
    mm_users == 0.

    At the same time if the OOM reaper doesn't wait at all for the memory of
    the current OOM candidate to be freed by exit_mmap->unmap_vmas, it would
    generate a spurious OOM kill.

    If it weren't for exit_aio and similar blocking functions in the last
    mmput, it would be enough to change oom_reap_task(), when it finds
    mm_users == 0, to wait for a timeout or for __mmput to set MMF_OOM_SKIP
    itself; but exit_mmap is not the only problem here, so running
    exit_mmap and oom_reap_task concurrently is apparently warranted.

    It's a non-standard runtime: exit_mmap() runs without the mmap_sem, and
    oom_reap_task takes the mmap_sem for reading as usual (kind of like
    MADV_DONTNEED).

    The race between the two is solved with a combination of
    tsk_is_oom_victim() (serialized by task_lock) and MMF_OOM_SKIP
    (serialized by a dummy down_write/up_write cycle along the same lines
    as the ksm_exit method).

    If oom_reap_task() is running concurrently with exit_mmap, exit_mmap
    will wait for it to finish in down_write (before taking down mm
    structures that would make oom_reap_task fail with a use after free).

    If exit_mmap comes first, oom_reap_task() will skip the mm because
    MMF_OOM_SKIP is already set; by then all memory has already been freed,
    and the mm data structures may already have been taken down by
    free_pgtables.
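
    A user-space analogue of that dummy down_write/up_write cycle (a
    pthreads sketch, not kernel code; mmf_oom_skip merely stands in for the
    kernel's MMF_OOM_SKIP flag):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
    static bool mmf_oom_skip;          /* stands in for MMF_OOM_SKIP */

    static void *reaper(void *arg)     /* oom_reap_task() analogue */
    {
        (void)arg;
        pthread_rwlock_rdlock(&mmap_sem);   /* down_read(mmap_sem) */
        if (mmf_oom_skip)
            puts("reaper: skip, memory already freed");
        else
            puts("reaper: reaping concurrently with exit_mmap");
        pthread_rwlock_unlock(&mmap_sem);
        return NULL;
    }

    int main(void)                     /* exit_mmap() analogue */
    {
        pthread_t t;

        pthread_create(&t, NULL, reaper, NULL);
        /* ... unmap_vmas() frees the memory here ... */

        /* dummy write-lock cycle: waits out any reaper still holding
         * the lock for reading before teardown proceeds */
        pthread_rwlock_wrlock(&mmap_sem);
        mmf_oom_skip = true;
        pthread_rwlock_unlock(&mmap_sem);

        /* now safe to take down the mm structures (free_pgtables) */
        pthread_join(t, NULL);
        return 0;
    }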

    [aarcange@redhat.com: incremental one liner]
    Link: http://lkml.kernel.org/r/20170726164319.GC29716@redhat.com
    [rientjes@google.com: remove unused mmput_async]
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708141733130.50317@chino.kir.corp.google.com
    [aarcange@redhat.com: microoptimization]
    Link: http://lkml.kernel.org/r/20170817171240.GB5066@redhat.com
    Link: http://lkml.kernel.org/r/20170726162912.GA29716@redhat.com
    Fixes: 26db62f179d1 ("oom: keep mm of the killed task available")
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: David Rientjes
    Reported-by: David Rientjes
    Tested-by: David Rientjes
    Reviewed-by: Michal Hocko
    Cc: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

06 Sep, 2017

1 commit

  • Pull arm64 updates from Catalin Marinas:

    - VMAP_STACK support, allowing the kernel stacks to be allocated in the
    vmalloc space with a guard page for trapping stack overflows. One of
    the patches introduces THREAD_ALIGN and changes the generic
    alloc_thread_stack_node() to use this instead of THREAD_SIZE (no
    functional change for other architectures)

    - Contiguous PTE hugetlb support re-enabled (after being reverted a
    couple of times). We now have the semantics agreed in the generic mm
    layer together with API improvements so that the architecture code
    can distinguish between contiguous and non-contiguous huge PTEs

    - Initial support for persistent memory on ARM: DC CVAP instruction
    exposed to user space (HWCAP) and the in-kernel pmem API implemented

    - raid6 improvements for arm64: faster algorithm for the delta syndrome
    and implementation of the recovery routines using Neon

    - FP/SIMD refactoring and removal of support for Neon in interrupt
    context. This is in preparation for full SVE support

    - PTE accessors converted from inline asm to cmpxchg so that we can use
    LSE atomics if available (ARMv8.1)

    - Perf support for Cortex-A35 and A73

    - Non-urgent fixes and cleanups

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
    arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro
    arm64: introduce separated bits for mm_context_t flags
    arm64: hugetlb: Cleanup setup_hugepagesz
    arm64: Re-enable support for contiguous hugepages
    arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages
    arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages
    arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
    arm64: hugetlb: Add break-before-make logic for contiguous entries
    arm64: hugetlb: Spring clean huge pte accessors
    arm64: hugetlb: Introduce pte_pgprot helper
    arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
    arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
    arm64: dma-mapping: Mark atomic_pool as __ro_after_init
    arm64: dma-mapping: Do not pass data to gen_pool_set_algo()
    arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths
    arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
    arm64: Move PTE_RDONLY bit handling out of set_pte_at()
    kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg()
    arm64: Convert pte handling from inline asm to using (cmp)xchg
    arm64: neon/efi: Make EFI fpsimd save/restore variables static
    ...

    Linus Torvalds
     

01 Sep, 2017

1 commit

  • Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
    write killable") made it possible to kill a forking task while it is
    waiting to acquire its ->mmap_sem for write, in dup_mmap().

    However, it was overlooked that this introduced a new error path before
    the new mm_struct's ->uprobes_state.xol_area has been set to NULL after
    being copied from the old mm_struct by the memcpy in dup_mm(). For a
    task that has previously hit a uprobe tracepoint, this resulted in the
    'struct xol_area' being freed multiple times if the task was killed at
    just the right time while forking.

    Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather
    than in uprobe_dup_mmap().

    With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
    program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of
    ->exe_file causing use-after-free"), provided that a uprobe tracepoint
    has been set on the fork_thread() function. For example:

    $ gcc reproducer.c -o reproducer -lpthread
    $ nm reproducer | grep fork_thread
    0000000000400719 t fork_thread
    $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events
    $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
    $ ./reproducer

    Here is the use-after-free reported by KASAN:

    BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
    Read of size 8 at addr ffff8800320a8b88 by task reproducer/198

    CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
    Call Trace:
    dump_stack+0xdb/0x185
    print_address_description+0x7e/0x290
    kasan_report+0x23b/0x350
    __asan_report_load8_noabort+0x19/0x20
    uprobe_clear_state+0x1c4/0x200
    mmput+0xd6/0x360
    do_exit+0x740/0x1670
    do_group_exit+0x13f/0x380
    get_signal+0x597/0x17d0
    do_signal+0x99/0x1df0
    exit_to_usermode_loop+0x166/0x1e0
    syscall_return_slowpath+0x258/0x2c0
    entry_SYSCALL_64_fastpath+0xbc/0xbe

    ...

    Allocated by task 199:
    save_stack_trace+0x1b/0x20
    kasan_kmalloc+0xfc/0x180
    kmem_cache_alloc_trace+0xf3/0x330
    __create_xol_area+0x10f/0x780
    uprobe_notify_resume+0x1674/0x2210
    exit_to_usermode_loop+0x150/0x1e0
    prepare_exit_to_usermode+0x14b/0x180
    retint_user+0x8/0x20

    Freed by task 199:
    save_stack_trace+0x1b/0x20
    kasan_slab_free+0xa8/0x1a0
    kfree+0xba/0x210
    uprobe_clear_state+0x151/0x200
    mmput+0xd6/0x360
    copy_process.part.8+0x605f/0x65d0
    _do_fork+0x1a5/0xbd0
    SyS_clone+0x19/0x20
    do_syscall_64+0x22f/0x660
    return_from_SYSCALL_64+0x0/0x7a

    Note: without KASAN, you may instead see a "Bad page state" message, or
    simply a general protection fault.

    Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
    Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
    Signed-off-by: Eric Biggers
    Reported-by: Oleg Nesterov
    Acked-by: Oleg Nesterov
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Konstantin Khlebnikov
    Cc: Mark Rutland
    Cc: Michal Hocko
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Cc: [4.7+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

26 Aug, 2017

1 commit

  • Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
    write killable") made it possible to kill a forking task while it is
    waiting to acquire its ->mmap_sem for write, in dup_mmap().

    However, it was overlooked that this introduced a new error path before
    a reference is taken on the mm_struct's ->exe_file. Since the
    ->exe_file of the new mm_struct was already set to the old ->exe_file by
    the memcpy() in dup_mm(), it was possible for the mmput() in the error
    path of dup_mm() to drop a reference to ->exe_file which was never
    taken.

    This caused the struct file to later be freed prematurely.

    Fix it by updating mm_init() to NULL out the ->exe_file, in the same
    place it clears other things like the list of mmaps.

    This bug was found by syzkaller. It can be reproduced using the
    following C program:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *mmap_thread(void *_arg)
    {
        for (;;) {
            mmap(NULL, 0x1000000, PROT_READ,
                 MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        }
    }

    static void *fork_thread(void *_arg)
    {
        usleep(rand() % 10000);
        fork();
        return NULL;
    }

    int main(void)
    {
        fork();
        fork();
        fork();
        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                pthread_create(&t, NULL, mmap_thread, NULL);
                pthread_create(&t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

    No special kernel config options are needed. It usually causes a NULL
    pointer dereference in __remove_shared_vm_struct() during exit, or in
    dup_mmap() (which is usually inlined into copy_process()) during fork.
    Both are due to a vm_area_struct's ->vm_file being used after it's
    already been freed.

    Google Bug Id: 64772007

    Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
    Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
    Signed-off-by: Eric Biggers
    Tested-by: Mark Rutland
    Acked-by: Michal Hocko
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Cc: Konstantin Khlebnikov
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Cc: [v4.7+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

16 Aug, 2017

1 commit

  • In some cases, an architecture might wish its stacks to be aligned to a
    boundary larger than THREAD_SIZE. For example, using an alignment of
    double THREAD_SIZE can allow for stack overflows smaller than
    THREAD_SIZE to be detected by checking a single bit of the stack
    pointer.

    This patch allows architectures to override the alignment of VMAP'd
    stacks, by defining THREAD_ALIGN. Where not defined, this defaults to
    THREAD_SIZE, as is the case today.
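
    The arithmetic behind that single-bit check, as a hypothetical
    user-space illustration (THREAD_SIZE and the base address are made-up
    values; the real check lives in the architecture's entry code):

    #include <stdio.h>

    #define THREAD_SIZE  (1UL << 14)          /* assumed 16 KiB stacks */
    #define THREAD_ALIGN (2 * THREAD_SIZE)    /* double-size alignment */

    /* With the stack base aligned to THREAD_ALIGN, every valid stack
     * address in [base, base + THREAD_SIZE) has the THREAD_SIZE bit
     * clear; an overflow just below the base sets it, so one bit test
     * of the stack pointer detects the overflow. */
    static int overflowed(unsigned long sp)
    {
        return (sp & THREAD_SIZE) != 0;
    }

    int main(void)
    {
        unsigned long base = 0x40000000UL;    /* THREAD_ALIGN-aligned */

        printf("on-stack sp: %d\n", overflowed(base + 0x100));  /* 0 */
        printf("overflow sp: %d\n", overflowed(base - 0x10));   /* 1 */
        return 0;
    }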

    Signed-off-by: Mark Rutland
    Reviewed-by: Will Deacon
    Tested-by: Laura Abbott
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: James Morse
    Cc: linux-kernel@vger.kernel.org

    Mark Rutland
     

11 Aug, 2017

2 commits

  • Conflicts:
    include/linux/mm_types.h
    mm/huge_memory.c

    I removed the smp_mb__before_spinlock() like the following commit does:

    8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")

    and fixed up the affected commits.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Patch series "fixes of TLB batching races", v6.

    It turns out that Linux TLB batching mechanism suffers from various
    races. Races that are caused due to batching during reclamation were
    recently handled by Mel and this patch-set deals with others. The more
    fundamental issue is that concurrent updates of the page-tables allow
    for TLB flushes to be batched on one core, while another core changes
    the page-tables. This other core may assume a PTE change does not
    require a flush based on the updated PTE value, while it is unaware that
    TLB flushes are still pending.

    This behavior affects KSM (which may result in memory corruption) and
    MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior). A
    proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
    Memory corruption in KSM is harder to produce in practice, but was
    observed by hacking the kernel and adding a delay before flushing and
    replacing the KSM page.

    Finally, there is also one memory barrier missing, which may affect
    architectures with a weak memory model.

    This patch (of 7):

    Setting and clearing mm->tlb_flush_pending can be performed by multiple
    threads, since mmap_sem may only be acquired for read in
    task_numa_work(). If this happens, tlb_flush_pending might be cleared
    while one of the threads still changes PTEs and batches TLB flushes.

    This can lead to the same race between migration and
    change_protection_range() that led to the introduction of
    tlb_flush_pending. The result of this race was data corruption, which
    means that this patch also addresses a theoretically possible data
    corruption.

    An actual data corruption was not observed, yet the race was confirmed
    by adding an assertion to check that tlb_flush_pending is not set by
    two threads, adding artificial latency in change_protection_range(),
    and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
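
    A sketch of the direction of the fix (user-space C11 atomics, not the
    kernel code): track pending flushes with a counter instead of a plain
    flag, so one thread finishing cannot hide another thread's
    still-pending flush.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int tlb_flush_pending;    /* was effectively a bool */

    static void inc_tlb_flush_pending(void)
    {
        atomic_fetch_add(&tlb_flush_pending, 1);
    }

    static void dec_tlb_flush_pending(void)
    {
        atomic_fetch_sub(&tlb_flush_pending, 1);
    }

    static int flush_pending(void)
    {
        return atomic_load(&tlb_flush_pending) > 0;
    }

    int main(void)
    {
        inc_tlb_flush_pending();    /* thread A starts batching */
        inc_tlb_flush_pending();    /* thread B starts batching */
        dec_tlb_flush_pending();    /* A finishes and flushes */

        /* a plain flag would now claim "not pending" although B's
         * flush is still outstanding; the counter keeps it visible */
        printf("pending: %d\n", flush_pending());   /* prints 1 */
        return 0;
    }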

    Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
    Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
    Signed-off-by: Nadav Amit
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Acked-by: Minchan Kim
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Andrea Arcangeli
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Jeff Dike
    Cc: Martin Schwidefsky
    Cc: Mel Gorman
    Cc: Russell King
    Cc: Sergey Senozhatsky
    Cc: Tony Luck
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadav Amit
     

10 Aug, 2017

1 commit

  • Lockdep is a runtime locking correctness validator that detects and
    reports a deadlock or its possibility by checking dependencies between
    locks. It's useful since it does not report just an actual deadlock but
    also the possibility of a deadlock that has not actually happened yet.
    That enables problems to be fixed before they affect real systems.

    However, this facility is only applicable to typical locks, such as
    spinlocks and mutexes, which are normally released within the context
    in which they were acquired. Synchronization primitives like page locks
    or completions, which are allowed to be released in any context, also
    create dependencies and can cause a deadlock.

    So lockdep should track these locks to do a better job. The
    'crossrelease' implementation extends lockdep to track these primitives
    as well.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

13 Jul, 2017

3 commits

  • Use the ascii-armor canary to prevent unterminated C string overflows
    from being able to successfully overwrite the canary, even if they
    somehow obtain the canary value.

    Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
    tree.
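
    The idea in miniature (a hypothetical sketch; the mask value assumes a
    64-bit little-endian machine): zero the canary's low byte so a C string
    overflow, which stops at NUL, can never reproduce the full canary.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t canary = 0x1122334455667788ULL; /* pretend random value */

        canary &= ~0xffULL;     /* ascii-armor: force one NUL byte */
        printf("armored canary: 0x%016llx\n",
               (unsigned long long)canary);
        return 0;
    }

    An attacker who leaks the value still cannot write it back with an
    unterminated string copy, because the embedded NUL terminates the
    overflow first.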

    Link: http://lkml.kernel.org/r/20170524155751.424-3-riel@redhat.com
    Signed-off-by: Rik van Riel
    Acked-by: Kees Cook
    Cc: Daniel Micay
    Cc: "Theodore Ts'o"
    Cc: H. Peter Anvin
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Add a /proc/self/task/<current-tid>/fail-nth file that allows failing
    the 0-th, 1-st, 2-nd and so on calls systematically.
    Excerpt from the added documentation:

    "Write to this file of integer N makes N-th call in the current task
    fail (N is 0-based). Read from this file returns a single char 'Y' or
    'N' that says if the fault setup with a previous write to this file
    was injected or not, and disables the fault if it wasn't yet injected.
    Note that this file enables all types of faults (slab, futex, etc).
    This setting takes precedence over all other generic settings like
    probability, interval, times, etc. But per-capability settings (e.g.
    fail_futex/ignore-private) take precedence over it. This feature is
    intended for systematic testing of faults in a single system call. See
    an example below"

    Why add a new setting:
    1. Existing settings are global rather than per-task, so parallel
    testing is not possible.
    2. attr->interval is close, but it depends on attr->count, which is not
    reset to 0, so interval does not work as expected.
    3. Trying to model this with existing settings requires manipulating
    all of the probability, interval, times, space, and task-filter
    settings, plus the unexposed count and per-task make-it-fail files.
    4. Existing settings are per-failure-type, and the set of failure
    types is potentially expanding.
    5. make-it-fail can't be changed by an unprivileged user, and
    aggressive stress testing is better done as an unprivileged user.
    Similarly, this would require opening the debugfs files to the
    unprivileged user, who would need to reopen at least the 'times' file
    (not possible to pre-open before dropping privs).

    The proposed interface solves all of the above (see the example).

    We want to integrate this into the syzkaller fuzzer. A prototype found
    10 kernel bugs in its first day of use:

    https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance

    I've made the current interface work with all types of our sandboxes.
    For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
    make /proc entries non-root owned. So I am fine with the current
    version of the code.
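
    Driving the interface per the documentation quoted above (a sketch;
    assumes a kernel built with CONFIG_FAULT_INJECTION, and treats dup() as
    the 0th following call):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        char path[64];
        char res = '?';
        int fd;

        snprintf(path, sizeof(path), "/proc/self/task/%ld/fail-nth",
                 (long)syscall(SYS_gettid));
        fd = open(path, O_RDWR);
        if (fd < 0) {
            perror("open fail-nth");    /* fault injection not enabled? */
            return 1;
        }
        write(fd, "0", 1);      /* fail the 0th next call (0-based above) */
        if (dup(0) < 0)
            printf("dup() failed as requested\n");
        pread(fd, &res, 1, 0);  /* 'Y' if the fault was injected */
        printf("injected: %c\n", res);
        close(fd);
        return 0;
    }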

    [akpm@linux-foundation.org: fix build]
    Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Akinobu Mita
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • The reason to disable interrupts seems to be to avoid switching to a
    different processor while handling per-cpu data using individual loads
    and stores. If we use per-cpu RMW primitives we will not have to
    disable interrupts.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
    Signed-off-by: Christoph Lameter
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 Jul, 2017

1 commit

  • Pull scheduler fixes from Thomas Gleixner:
    "This scheduler update provides:

    - The (hopefully) final fix for the vtime accounting issues which
    were around for quite some time

    - Use types known to user space in UAPI headers to unbreak user space
    builds

    - Make load balancing respect the current scheduling domain again
    instead of evaluating unrelated CPUs"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
    sched/fair: Fix load_balance() affinity redo path
    sched/cputime: Accumulate vtime on top of nsec clocksource
    sched/cputime: Move the vtime task fields to their own struct
    sched/cputime: Rename vtime fields
    sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
    vtime, sched/cputime: Remove vtime_account_user()
    Revert "sched/cputime: Refactor the cputime_adjust() code"

    Linus Torvalds
     

05 Jul, 2017

2 commits

  • We are about to add vtime accumulation fields to the task struct. Let's
    avoid more bloatification and gather the vtime information into its own
    struct.

    Tested-by: Luiz Capitulino
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The current "snapshot" based naming on vtime fields suggests we record
    some past event but that's a low level picture of their actual purpose
    which comes out blurry. The real point of these fields is to run a basic
    state machine that tracks down cputime entry while switching between
    contexts.

    So lets reflect that with more meaningful names.

    Tested-by: Luiz Capitulino
    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

23 May, 2017

1 commit

  • If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but
    fails in copy_process() between calling dup_task_struct() and setting
    p->set_child_tid, then the value of p->set_child_tid will be inherited
    from the parent and get prematurely freed by free_kthread_struct().

    kthread()
    - worker_thread()
    - process_one_work()
    | - call_usermodehelper_exec_work()
    | - kernel_thread()
    | - _do_fork()
    | - copy_process()
    | - dup_task_struct()
    | - arch_dup_task_struct()
    | - tsk->set_child_tid = current->set_child_tid // implied
    | - ...
    | - goto bad_fork_*
    | - ...
    | - free_task(tsk)
    | - free_kthread_struct(tsk)
    | - kfree(tsk->set_child_tid)
    - ...
    - schedule()
    - __schedule()
    - wq_worker_sleeping()
    - kthread_data(task)->flags // UAF

    The problem started showing up with commit 1da5c46fa965 since it reused
    ->set_child_tid for the kthread worker data.

    A better long-term solution might be to get rid of the ->set_child_tid
    abuse. The comment in set_kthread_struct() also looks slightly wrong.

    Debugged-by: Jamie Iles
    Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed")
    Signed-off-by: Vegard Nossum
    Acked-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: Andy Lutomirski
    Cc: Frederic Weisbecker
    Cc: Jamie Iles
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
    Signed-off-by: Thomas Gleixner

    Vegard Nossum
     

14 May, 2017

1 commit

  • Imagine we have a pid namespace and a task from its parent's pid_ns,
    which called setns() to enter the pid namespace. The task is doing
    fork(), while the pid namespace's child reaper is dying. We have a race
    between them:

    Task from parent pid_ns            Child reaper
    copy_process()                     ..
      alloc_pid()                      ..
      ..                               zap_pid_ns_processes()
      ..                                 disable_pid_allocation()
      ..                                 read_lock(&tasklist_lock)
      ..                                 iterate over pids in pid_ns
      ..                                   kill tasks linked to pids
      ..                                 read_unlock(&tasklist_lock)
    write_lock_irq(&tasklist_lock);    ..
    attach_pid(p, PIDTYPE_PID);        ..
    ..                                 ..

    So the just-created task p won't receive the SIGKILL signal, and the
    pid namespace will be left in a contradictory state. Only a manual kill
    will help there, but does userspace care about this? I suppose most
    users just inject a task into a pid namespace and wait for a SIGCHLD
    from it.

    The patch fixes the problem. It simply checks for
    (pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
    We do it under the tasklist_lock, and can't skip
    PIDNS_HASH_ADDING as noted by Oleg:

    "zap_pid_ns_processes() does disable_pid_allocation()
    and then takes tasklist_lock to kill the whole namespace.
    Given that copy_process() checks PIDNS_HASH_ADDING
    under write_lock(tasklist) they can't race;
    if copy_process() takes this lock first, the new child will
    be killed, otherwise copy_process() can't miss
    the change in ->nr_hashed."

    If allocation is disabled, we just return -ENOMEM, as is done for such
    cases in alloc_pid().

    v2: Do not move disable_pid_allocation(), do not introduce a new
    variable in copy_process(), and simplify the patch as suggested by Oleg
    Nesterov. Take into account the double-irq-enable problem found by
    Eric W. Biederman.

    Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
    Signed-off-by: Kirill Tkhai
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Oleg Nesterov
    CC: Mike Rapoport
    CC: Michal Hocko
    CC: Andy Lutomirski
    CC: "Eric W. Biederman"
    CC: Andrei Vagin
    CC: Cyrill Gorcunov
    CC: Serge Hallyn
    Cc: stable@vger.kernel.org
    Acked-by: Oleg Nesterov
    Signed-off-by: Eric W. Biederman

    Kirill Tkhai
     

11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

09 May, 2017

2 commits

  • __vmalloc* allows users to provide gfp flags for the underlying
    allocation. This API is quite popular

    $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
    77

    The only problem is that many people are not aware that they really
    want to pass __GFP_HIGHMEM along with the other flags, because there is
    really no reason to consume precious low memory on CONFIG_HIGHMEM
    systems for pages which are mapped into the kernel vmalloc space. About
    half of the users don't use this flag, though. This is a sign that the
    API is unnecessarily complex.

    This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
    be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
    are simplified and drop the flag.

    Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Matthew Wilcox
    Cc: Al Viro
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Cristopher Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • With virtually mapped stacks, kernel stacks are allocated via vmalloc.

    In the current implementation, two stacks per cpu can be cached when
    tasks are freed, and the cached stacks are reused for task duplication.
    But the cached stacks may remain unfreed even when a cpu is offline.
    Add a cpu hotplug callback to free the cached stacks when a cpu goes
    offline, so that their pages are not wasted.

    Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com
    Signed-off-by: Hoeun Ryu
    Reviewed-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Mateusz Guzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hoeun Ryu
     

05 May, 2017

1 commit

  • …o 64 bits on 64-bit platforms

    The stack canary is an 'unsigned long' and should be fully initialized to
    random data rather than only 32 bits of random data.

    Signed-off-by: Daniel Micay <danielmicay@gmail.com>
    Acked-by: Arjan van de Ven <arjan@linux.intel.com>
    Acked-by: Rik van Riel <riel@redhat.com>
    Acked-by: Kees Cook <keescook@chromium.org>
    Cc: Arjan van Ven <arjan@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: kernel-hardening@lists.openwall.com
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Daniel Micay
     

03 May, 2017

2 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    IMA:
    - provide ">" and "<" operators for fowner/fuid/euid"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits)
    tpm: Fix reference count to main device
    tpm_tis: convert to using locality callbacks
    tpm: fix handling of the TPM 2.0 event logs
    tpm_crb: remove a cruft constant
    keys: select CONFIG_CRYPTO when selecting DH / KDF
    apparmor: Make path_max parameter readonly
    apparmor: fix parameters so that the permission test is bypassed at boot
    apparmor: fix invalid reference to index variable of iterator line 836
    apparmor: use SHASH_DESC_ON_STACK
    security/apparmor/lsm.c: set debug messages
    apparmor: fix boolreturn.cocci warnings
    Smack: Use GFP_KERNEL for smk_netlbl_mls().
    smack: fix double free in smack_parse_opts_str()
    KEYS: add SP800-56A KDF support for DH
    KEYS: Keyring asymmetric key restrict method with chaining
    KEYS: Restrict asymmetric key linkage using a specific keychain
    KEYS: Add a lookup_restriction function for the asymmetric key type
    KEYS: Add KEYCTL_RESTRICT_KEYRING
    KEYS: Consistent ordering for __key_link_begin and restrict check
    KEYS: Add an optional lookup_restriction hook to key_type
    ...

    Linus Torvalds
     
  • Pull livepatch updates from Jiri Kosina:

    - a per-task consistency model is being added for architectures that
    support reliable stack dumping (extending this currently rather
    trivial set is in the works).

    This extends the nature of the types of patches that can be applied
    by live patching infrastructure. The code stems from the design
    proposal made [1] back in November 2014. It's a hybrid of SUSE's
    kGraft and RH's kpatch, combining advantages of both: it uses
    kGraft's per-task consistency and syscall barrier switching combined
    with kpatch's stack trace switching. There are also a number of
    fallback options which make it quite flexible.

    Most of the heavy lifting was done by Josh Poimboeuf, with help from
    Miroslav Benes and Petr Mladek.

    [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

    - module load time patch optimization from Zhou Chengming

    - a few assorted small fixes

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add missing printk newlines
    livepatch: Cancel transition a safe way for immediate patches
    livepatch: Reduce the time of finding module symbols
    livepatch: make klp_mutex proper part of API
    livepatch: allow removal of a disabled patch
    livepatch: add /proc//patch_state
    livepatch: change to a per-task consistency model
    livepatch: store function sizes
    livepatch: use kstrtobool() in enabled_store()
    livepatch: move patching functions into patch.c
    livepatch: remove unnecessary object loaded check
    livepatch: separate enabled and patched states
    livepatch/s390: add TIF_PATCH_PENDING thread flag
    livepatch/s390: reorganize TIF thread flag bits
    livepatch/powerpc: add TIF_PATCH_PENDING thread flag
    livepatch/x86: add TIF_PATCH_PENDING thread flag
    livepatch: create temporary klp_update_patch_state() stub
    x86/entry: define _TIF_ALLWORK_MASK flags explicitly
    stacktrace/x86: add function for detecting reliable stack traces

    Linus Torvalds
     

02 May, 2017

1 commit

  • Pull perf updates from Ingo Molnar:
    "The main changes in this cycle were:

    Kernel side changes:

    - Kprobes and uprobes changes:
    - Make their trampolines read-only while they are used
    - Make UPROBES_EVENTS default-y which is the distro practice
    - Apply misc fixes and robustization to probe point insertion.

    - add support for AMD IOMMU events

    - extend hw events on Intel Goldmont CPUs

    - ... plus misc fixes and updates.

    Tooling side changes:

    - support s390 jump instructions in perf annotate (Christian
    Borntraeger)

    - vendor hardware events updates (Andi Kleen)

    - add argument support for SDT events in powerpc (Ravi Bangoria)

    - beautify the statx syscall arguments in 'perf trace' (Arnaldo
    Carvalho de Melo)

    - handle inline functions in callchains (Jin Yao)

    - enable sorting by srcline as key (Milian Wolff)

    - add 'brstackinsn' field in 'perf script' to reuse the x86
    instruction decoder used in the Intel PT code to study hot paths to
    samples (Andi Kleen)

    - add PERF_RECORD_NAMESPACES so that the kernel can record
    information required to associate samples to namespaces, helping in
    container problem characterization. (Hari Bathini)

    - allow sorting by symbol_size in 'perf report' and 'perf top'
    (Charles Baylis)

    - in perf stat, make system wide (-a) the default option if no target
    was specified and one of following conditions is met:
    - no workload specified (current behaviour)
    - a workload is specified but all requested events are system wide
    ones, like uncore ones. (Jiri Olsa)

    - ... plus lots of other updates, enhancements, cleanups and fixes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits)
    perf tools: Fix the code to strip command name
    tools arch x86: Sync cpufeatures.h
    tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel
    tools: Update asm-generic/mman-common.h copy from the kernel
    perf tools: Use just forward declarations for struct thread where possible
    perf tools: Add the right header to obtain PERF_ALIGN()
    perf tools: Remove poll.h and wait.h from util.h
    perf tools: Remove string.h, unistd.h and sys/stat.h from util.h
    perf tools: Remove stale prototypes from builtin.h
    perf tools: Remove string.h from util.h
    perf tools: Remove sys/ioctl.h from util.h
    perf tools: Remove a few more needless includes from util.h
    perf tools: Include sys/param.h where needed
    perf callchain: Move callchain specific routines from util.[ch]
    perf tools: Add compress.h for the *_decompress_to_file() headers
    perf mem: Fix display of data source snoop indication
    perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h
    perf kvm: Make function only used by 'perf kvm' static
    perf tools: Move timestamp routines from util.h to time-utils.h
    perf tools: Move units conversion/formatting routines to separate object
    ...

    Linus Torvalds