01 Oct, 2020

1 commit

  • [ Upstream commit 6914303824bb572278568330d72fc1f8f9814e67 ]

    This changes perf_event_set_clock to use the new exec_update_mutex
    instead of cred_guard_mutex.

    This should be safe, as the credentials are only used for reading.

    Signed-off-by: Bernd Edlinger
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sasha Levin

    Bernd Edlinger
     

26 Aug, 2020

1 commit

  • commit c17c3dc9d08b9aad9a55a1e53f205187972f448e upstream.

    syzbot crashed on the VM_BUG_ON_PAGE(PageTail) in munlock_vma_page(), when
    called from uprobes __replace_page(). Of the many ways to fix it, settle
    on not calling it when the page is PageCompound (since head and tail are
    equivalent in this context, PageCompound is the usual check in uprobes.c,
    and the prior use of FOLL_SPLIT_PMD will have cleared PageMlocked already).

    Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
    Reported-by: syzbot
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Reviewed-by: Srikar Dronamraju
    Acked-by: Song Liu
    Acked-by: Oleg Nesterov
    Cc: "Kirill A. Shutemov"
    Cc: [5.4+]
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008161338360.20413@eggly.anvils
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     

11 Aug, 2020

1 commit

  • commit 90c91dfb86d0ff545bd329d3ddd72c147e2ae198 upstream.

    Kan and Andi reported that we fail to kill rotation when the flexible
    events go empty, but the context does not.

    Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
    Reported-by: Andi Kleen
    Reported-by: Kan Liang
    Tested-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200305123851.GX2596@hirez.programming.kicks-ass.net
    Cc: Robin Murphy
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

29 Jul, 2020

1 commit

  • commit fe5ed7ab99c656bd2f5b79b49df0e9ebf2cead8a upstream.

    If a tracee is uprobed and it hits int3 inserted by debugger, handle_swbp()
    does send_sig(SIGTRAP, current, 0) which means si_code == SI_USER. This used
    to work when this code was written, but then GDB started to validate si_code
    and now it simply can't use breakpoints if the tracee has an active uprobe:

    # cat test.c
    void unused_func(void)
    {
    }

    int main(void)
    {
            return 0;
    }

    # gcc -g test.c -o test
    # perf probe -x ./test -a unused_func
    # perf record -e probe_test:unused_func gdb ./test -ex run
    GNU gdb (GDB) 10.0.50.20200714-git
    ...
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2
    (gdb)

    The tracee hits the internal breakpoint inserted by GDB to monitor shared
    library events but GDB misinterprets this SIGTRAP and reports a signal.

    Change handle_swbp() to use force_sig(SIGTRAP); this matches do_int3_user()
    and fixes the problem.

    This is the minimal fix for -stable. arch/x86/kernel/uprobes.c is equally
    wrong; it should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP),
    but that case doesn't confuse GDB and needs a separate x86-specific patch.

    Reported-by: Aaron Merey
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar
    Reviewed-by: Srikar Dronamraju
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

17 Jun, 2020

1 commit

  • commit 2ed6edd33a214bca02bd2b45e3fc3038a059436b upstream.

    Under rare circumstances, task_function_call() can repeatedly fail and
    cause a soft lockup.

    There is a slight race where the process is no longer running on the cpu
    we targeted by the time remote_function() runs. The code will simply
    try again. If we are very unlucky, this will continue to fail, until a
    watchdog fires. This can happen in a heavily loaded, multi-core virtual
    machine.

    Reported-by: syzbot+bb4935a5c09b5ff79940@syzkaller.appspotmail.com
    Signed-off-by: Barret Rhoden
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200414222920.121401-1-brho@google.com
    Signed-off-by: Greg Kroah-Hartman

    Barret Rhoden
     

11 Jun, 2020

1 commit

  • commit 013b2deba9a6b80ca02f4fafd7dedf875e9b4450 upstream.

    uprobe_write_opcode() must not cross a page boundary; prepare_uprobe()
    relies on arch_uprobe_analyze_insn(), which should validate "vaddr", but
    some architectures (csky, s390, and sparc) don't do this.

    We can remove the BUG_ON() check in prepare_uprobe() and validate the
    offset early in __uprobe_register(). The new IS_ALIGNED() check matches
    the alignment check in arch_prepare_kprobe() on supported architectures,
    so all insns should be aligned to UPROBE_SWBP_INSN_SIZE.

    Another problem is __update_ref_ctr(), which was wrong from the very
    beginning: it can read/write outside of the kmap'ed page unless "vaddr"
    is aligned to sizeof(short), so __uprobe_register() should check this too.

    Reported-by: Linus Torvalds
    Suggested-by: Linus Torvalds
    Signed-off-by: Oleg Nesterov
    Reviewed-by: Srikar Dronamraju
    Acked-by: Christian Borntraeger
    Tested-by: Sven Schnelle
    Cc: Steven Rostedt
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

02 May, 2020

1 commit

  • commit f3bed55e850926614b9898fe982f66d2541a36a5 upstream.

    Current logic yields the child task as the parent.

    Before:
    $ perf record bash -c "perf list > /dev/null"
    $ perf script -D |grep 'FORK\|EXIT'
    4387036190981094 0x5a70 [0x30]: PERF_RECORD_FORK(10472:10472):(10470:10470)
    4387036606207580 0xf050 [0x30]: PERF_RECORD_EXIT(10472:10472):(10472:10472)
    4387036607103839 0x17150 [0x30]: PERF_RECORD_EXIT(10470:10470):(10470:10470)

    Note the repeated values in the second line: (10472:10472):(10472:10472)

    After:
    383281514043 0x9d8 [0x30]: PERF_RECORD_FORK(2268:2268):(2266:2266)
    383442003996 0x2180 [0x30]: PERF_RECORD_EXIT(2268:2268):(2266:2266)
    383451297778 0xb70 [0x30]: PERF_RECORD_EXIT(2266:2266):(2265:2265)

    Fixes: 94d5d1b2d891 ("perf_counter: Report the cloning task as parent on perf_counter_fork()")
    Reported-by: KP Singh
    Signed-off-by: Ian Rogers
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200417182842.12522-1-irogers@google.com
    Signed-off-by: Greg Kroah-Hartman

    Ian Rogers
     

29 Apr, 2020

1 commit

  • [ Upstream commit d3296fb372bf7497b0e5d0478c4e7a677ec6f6e9 ]

    We hit the following warning when running tests on a kernel
    compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y:

    WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200
    CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3
    RIP: 0010:__get_user_pages_fast+0x1a4/0x200
    ...
    Call Trace:
    perf_prepare_sample+0xff1/0x1d90
    perf_event_output_forward+0xe8/0x210
    __perf_event_overflow+0x11a/0x310
    __intel_pmu_pebs_event+0x657/0x850
    intel_pmu_drain_pebs_nhm+0x7de/0x11d0
    handle_pmi_common+0x1b2/0x650
    intel_pmu_handle_irq+0x17b/0x370
    perf_event_nmi_handler+0x40/0x60
    nmi_handle+0x192/0x590
    default_do_nmi+0x6d/0x150
    do_nmi+0x2f9/0x3c0
    nmi+0x8e/0xd7

    While __get_user_pages_fast() is IRQ-safe, it calls access_ok(),
    which warns on:

    WARN_ON_ONCE(!in_task() && !pagefault_disabled())

    Peter suggested disabling page faults around __get_user_pages_fast(),
    which gets rid of the warning in the access_ok() call.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org
    Signed-off-by: Sasha Levin

    Jiri Olsa
     

11 Feb, 2020

1 commit

  • commit 003461559ef7a9bd0239bae35a22ad8924d6e9ad upstream.

    Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of
    a perf ring buffer may lead to an integer underflow in locked memory
    accounting. This may lead to undesired behaviors, such as failures in
    BPF map creation.

    Address this by adjusting the accounting logic to take into account the
    possibility that the amount of already locked memory may exceed the
    current limit.

    Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again")
    Suggested-by: Alexander Shishkin
    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc:
    Acked-by: Alexander Shishkin
    Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
    Signed-off-by: Greg Kroah-Hartman

    Song Liu
     

23 Jan, 2020

1 commit

  • commit da9ec3d3dd0f1240a48920be063448a2242dbd90 upstream.

    Vince reports a worrying issue:

    | so I was tracking down some odd behavior in the perf_fuzzer which turns
    | out to be because perf_event_open() sometimes returns 0 (indicating a file
    | descriptor of 0) even though as far as I can tell stdin is still open.

    ... and further the cause:

    | error is triggered if aux_sample_size has non-zero value.
    |
    | seems to be this line in kernel/events/core.c:
    |
    | if (perf_need_aux_event(event) && !perf_get_aux_event(event, group_leader))
    | goto err_locked;
    |
    | (note, err is never set)

    This seems to be a thinko in commit:

    ab43762ef010967e ("perf: Allow normal events to output AUX data")

    ... and we should probably return -EINVAL here, as this should only
    happen when the new event is mis-configured or does not have a
    compatible aux_event group leader.

    Fixes: ab43762ef010967e ("perf: Allow normal events to output AUX data")
    Reported-by: Vince Weaver
    Signed-off-by: Mark Rutland
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Acked-by: Alexander Shishkin
    Tested-by: Vince Weaver
    Signed-off-by: Greg Kroah-Hartman

    Mark Rutland
     

31 Dec, 2019

1 commit

  • [ Upstream commit 36b3db03b4741b8935b68fffc7e69951d8d70a89 ]

    Commit:

    5e6c3c7b1ec2 ("perf/aux: Fix tracking of auxiliary trace buffer allocation")

    tried to guess the correct combination of arithmetic operations that would
    undo the AUX buffer's mlock accounting, and failed, leaking the bottom part
    when an allocation needs to be charged partially to both user->locked_vm
    and mm->pinned_vm, eventually leaving the user with no locked bonus:

    $ perf record -e intel_pt//u -m1,128 uname
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.061 MB perf.data ]

    $ perf record -e intel_pt//u -m1,128 uname
    Permission error mapping pages.
    Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
    or try again with a smaller value of -m/--mmap_pages.
    (current value: 1,128)

    Fix this by subtracting both locked and pinned counts when AUX buffer is
    unmapped.

    Reported-by: Thomas Richter
    Tested-by: Thomas Richter
    Signed-off-by: Alexander Shishkin
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin

    Alexander Shishkin
     

13 Nov, 2019

6 commits

  • It looks like a "static inline" has been missed in front
    of the empty definition of perf_cgroup_switch() under
    certain configurations.

    Fixes the following sparse warning:

    kernel/events/core.c:1035:1: warning: symbol 'perf_cgroup_switch' was not declared. Should it be static?

    Signed-off-by: Ben Dooks (Codethink)
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20191106132527.19977-1-ben.dooks@codethink.co.uk
    Signed-off-by: Ingo Molnar

    Ben Dooks (Codethink)
     
  • Commit:

    313ccb9615948 ("perf: Allocate context task_ctx_data for child event")

    makes the inherit path skip over the current event in case of task_ctx_data
    allocation failure. This, however, is inconsistent with allocation failures
    in perf_event_alloc(), which would abort the fork.

    Correct this by returning an error code on task_ctx_data allocation
    failure and failing the fork in that case.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20191105075702.60319-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Commit

    ab43762ef0109 ("perf: Allow normal events to output AUX data")

    added the 'aux_output' bit to the attribute structure, which relies on
    AUX events and grouping, neither of which is supported for kernel
    events. This notwithstanding, attempts have been made to use it in
    kernel code, suggesting the necessity of an explicit hard -EINVAL.

    Fix this by rejecting attributes with aux_output set for kernel events.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20191030134731.5437-3-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • A comment is in a wrong place in perf_event_create_kernel_counter().
    Fix that.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20191030134731.5437-2-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Commit

    f733c6b508bc ("perf/core: Fix inheritance of aux_output groups")

    adds a NULL pointer dereference in case inherit_group() races with
    perf_release(), which causes the below crash:

    > BUG: kernel NULL pointer dereference, address: 000000000000010b
    > #PF: supervisor read access in kernel mode
    > #PF: error_code(0x0000) - not-present page
    > PGD 3b203b067 P4D 3b203b067 PUD 3b2040067 PMD 0
    > Oops: 0000 [#1] SMP KASAN
    > CPU: 0 PID: 315 Comm: exclusive-group Tainted: G B 5.4.0-rc3-00181-g72e1839403cb-dirty #878
    > RIP: 0010:perf_get_aux_event+0x86/0x270
    > Call Trace:
    > ? __perf_read_group_add+0x3b0/0x3b0
    > ? __kasan_check_write+0x14/0x20
    > ? __perf_event_init_context+0x154/0x170
    > inherit_task_group.isra.0.part.0+0x14b/0x170
    > perf_event_init_task+0x296/0x4b0

    Fix this by skipping over events that are getting closed, in the
    inheritance path.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: f733c6b508bc ("perf/core: Fix inheritance of aux_output groups")
    Link: https://lkml.kernel.org/r/20191101151248.47327-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • While discussing uncore event scheduling, I noticed we do not in fact
    seem to disallow making uncore-cgroup events. Such events make no
    sense whatsoever, because a cgroup is CPU-local state while an uncore
    PMU counts across a number of CPUs.

    Disallow them.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Oct, 2019

1 commit

  • Commit:

    1a5941312414c ("perf: Add wakeup watermark control to the AUX area")

    added attr.__reserved_2 padding, but forgot to add an ABI check to reject
    attributes with this field set. Fix that.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: adrian.hunter@intel.com
    Cc: mathieu.poirier@linaro.org
    Link: https://lkml.kernel.org/r/20191025121636.75182-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

27 Oct, 2019

1 commit

  • Pull perf fixes from Thomas Gleixner:
    "A set of perf fixes:

    kernel:

    - Unbreak the tracking of auxiliary buffer allocations which got
    imbalanced, causing resource limit failures.

    - Fix the fallout of splitting ToPA entries, which failed to shift
    the base entry PA correctly.

    - Use the correct context to lookup the AUX event when unmapping the
    associated AUX buffer so the event can be stopped and the buffer
    reference dropped.

    tools:

    - Fix buildid-cache mode setting in copyfile_mode_ns() when copying
    /proc/kcore

    - Fix freeing id arrays in the event list so the correct event is
    closed.

    - Sync sched.h and kvm.h headers with the kernel sources.

    - Link jvmti against tools/lib/ctype.o to have weak strlcpy().

    - Fix multiple memory and file descriptor leaks, found by coverity in
    perf annotate.

    - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
    by a static analysis tool"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/aux: Fix AUX output stopping
    perf/aux: Fix tracking of auxiliary trace buffer allocation
    perf/x86/intel/pt: Fix base for single entry topa
    perf kmem: Fix memory leak in compact_gfp_flags()
    tools headers UAPI: Sync sched.h with the kernel
    tools headers kvm: Sync kvm.h headers with the kernel sources
    tools headers kvm: Sync kvm headers with the kernel sources
    tools headers kvm: Sync kvm headers with the kernel sources
    perf c2c: Fix memory leak in build_cl_output()
    perf tools: Fix mode setting in copyfile_mode_ns()
    perf annotate: Fix multiple memory and file descriptor leaks
    perf tools: Fix resource leak of closedir() on the error paths
    perf evlist: Fix fix for freed id arrays
    perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()

    Linus Torvalds
     

22 Oct, 2019

1 commit

  • Commit:

    8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

    allows CAP_EXCLUSIVE events to be grouped with other events. Since all
    of those also happen to be AUX events (which is not the case the other
    way around, because of arch/s390), this changes the rules for stopping
    the output: the AUX event may no longer be on its PMU's context if it's
    grouped with a HW event, in which case it will be on that HW event's
    context instead. If that's the case, munmap() of the AUX buffer can't
    find and stop the AUX event, potentially leaving the last reference with
    the atomic context, which will then end up freeing the AUX buffer. This
    will then trip warnings.

    Fix this by using the context's PMU context when looking for events
    to stop, instead of the event's PMU context.

    Signed-off-by: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

21 Oct, 2019

1 commit

  • The following commit from the v5.4 merge window:

    d44248a41337 ("perf/core: Rework memory accounting in perf_mmap()")

    ... breaks auxiliary trace buffer tracking.

    If I run the command 'perf record -e rbd000' to record samples, saving
    them in the **auxiliary** trace buffer, then the value of 'locked_vm'
    goes negative after all trace buffers have been allocated and released:

    During allocation the values increase:

    [52.250027] perf_mmap user->locked_vm:0x87 pinned_vm:0x0 ret:0
    [52.250115] perf_mmap user->locked_vm:0x107 pinned_vm:0x0 ret:0
    [52.250251] perf_mmap user->locked_vm:0x188 pinned_vm:0x0 ret:0
    [52.250326] perf_mmap user->locked_vm:0x208 pinned_vm:0x0 ret:0
    [52.250441] perf_mmap user->locked_vm:0x289 pinned_vm:0x0 ret:0
    [52.250498] perf_mmap user->locked_vm:0x309 pinned_vm:0x0 ret:0
    [52.250613] perf_mmap user->locked_vm:0x38a pinned_vm:0x0 ret:0
    [52.250715] perf_mmap user->locked_vm:0x408 pinned_vm:0x2 ret:0
    [52.250834] perf_mmap user->locked_vm:0x408 pinned_vm:0x83 ret:0
    [52.250915] perf_mmap user->locked_vm:0x408 pinned_vm:0x103 ret:0
    [52.251061] perf_mmap user->locked_vm:0x408 pinned_vm:0x184 ret:0
    [52.251146] perf_mmap user->locked_vm:0x408 pinned_vm:0x204 ret:0
    [52.251299] perf_mmap user->locked_vm:0x408 pinned_vm:0x285 ret:0
    [52.251383] perf_mmap user->locked_vm:0x408 pinned_vm:0x305 ret:0
    [52.251544] perf_mmap user->locked_vm:0x408 pinned_vm:0x386 ret:0
    [52.251634] perf_mmap user->locked_vm:0x408 pinned_vm:0x406 ret:0
    [52.253018] perf_mmap user->locked_vm:0x408 pinned_vm:0x487 ret:0
    [52.253197] perf_mmap user->locked_vm:0x408 pinned_vm:0x508 ret:0
    [52.253374] perf_mmap user->locked_vm:0x408 pinned_vm:0x589 ret:0
    [52.253550] perf_mmap user->locked_vm:0x408 pinned_vm:0x60a ret:0
    [52.253726] perf_mmap user->locked_vm:0x408 pinned_vm:0x68b ret:0
    [52.253903] perf_mmap user->locked_vm:0x408 pinned_vm:0x70c ret:0
    [52.254084] perf_mmap user->locked_vm:0x408 pinned_vm:0x78d ret:0
    [52.254263] perf_mmap user->locked_vm:0x408 pinned_vm:0x80e ret:0

    The value of user->locked_vm increases up to a limit; beyond that, the
    memory is tracked by pinned_vm.

    During deallocation the size is subtracted from pinned_vm until it hits
    a limit. Then a larger value is subtracted from locked_vm, leading to a
    huge number (because the type is unsigned):

    [64.267797] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x78d
    [64.267826] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x70c
    [64.267848] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x68b
    [64.267869] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x60a
    [64.267891] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x589
    [64.267911] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x508
    [64.267933] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x487
    [64.267952] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x406
    [64.268883] perf_mmap_close mmap_user->locked_vm:0x307 pinned_vm:0x406
    [64.269117] perf_mmap_close mmap_user->locked_vm:0x206 pinned_vm:0x406
    [64.269433] perf_mmap_close mmap_user->locked_vm:0x105 pinned_vm:0x406
    [64.269536] perf_mmap_close mmap_user->locked_vm:0x4 pinned_vm:0x404
    [64.269797] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff84 pinned_vm:0x303
    [64.270105] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff04 pinned_vm:0x202
    [64.270374] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe84 pinned_vm:0x101
    [64.270628] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe04 pinned_vm:0x0

    This value sticks for the user until the system is rebooted, causing
    follow-on system calls that use the locked_vm resource limit to fail.

    Note: there is no issue using the normal trace buffer.

    In fact, the issue is in perf_mmap_close(). During allocation, auxiliary
    trace buffer memory is either tracked as 'extra' and added to
    'pinned_vm', or tracked as 'user_extra' and added to 'locked_vm'. This
    applies to normal and auxiliary trace buffers alike.

    However, in perf_mmap_close() all auxiliary trace buffer memory is
    subtracted from 'locked_vm' and never from 'pinned_vm'. This breaks the
    balance.

    Signed-off-by: Thomas Richter
    Acked-by: Peter Zijlstra
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: acme@kernel.org
    Cc: gor@linux.ibm.com
    Cc: hechaol@fb.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux-perf-users@vger.kernel.org
    Cc: songliubraving@fb.com
    Fixes: d44248a41337 ("perf/core: Rework memory accounting in perf_mmap()")
    Link: https://lkml.kernel.org/r/20191021083354.67868-1-tmricht@linux.ibm.com
    [ Minor readability edits. ]
    Signed-off-by: Ingo Molnar

    Thomas Richter
     

19 Oct, 2019

1 commit

  • Attaching uprobe to text section in THP splits the PMD mapped page table
    into PTE mapped entries. On uprobe detach, we would like to regroup PMD
    mapped page table entry to regain performance benefit of THP.

    However, the regroup is broken for perf_event based trace_uprobe. This
    is because perf_event based trace_uprobe calls uprobe_unregister twice
    on close: first in TRACE_REG_PERF_CLOSE, then in
    TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped
    page table entry, which is not the desired behavior.

    Fix this by only using FOLL_SPLIT_PMD for the uprobe register case.

    Add a WARN() to confirm uprobe unregister never works on huge pages,
    and abort the operation when this WARN() triggers.

    Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
    Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
    Signed-off-by: Song Liu
    Reviewed-by: Srikar Dronamraju
    Cc: Kirill A. Shutemov
    Cc: Oleg Nesterov
    Cc: Matthew Wilcox (Oracle)
    Cc: William Kucharski
    Cc: Yang Shi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     

09 Oct, 2019

2 commits

  • In perf_rotate_context(), when the first cpu flexible event fails to
    schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
    NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
    cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
    perf_event_sched_in() will skip all cpu flexible events because of the
    EVENT_FLEXIBLE bit.

    In the next call of perf_rotate_context(), cpu_rotate stays 1, and
    cpu_event stays NULL, so this process repeats. The end result is, flexible
    events on this cpu will not be scheduled (until another event being added
    to the cpuctx).

    Here is an easy repro of this issue. On Intel CPUs, where ref-cycles
    could only use one counter, run one pinned event for ref-cycles, one
    flexible event for ref-cycles, and one flexible event for cycles. The
    flexible ref-cycles is never scheduled, which is expected. However,
    because of this issue, the cycles event is never scheduled either.

    $ perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000

    time counts unit events
    1.000152973 15,412,480 ref-cycles:D
    1.000152973 ref-cycles (0.00%)
    1.000152973 cycles (0.00%)
    2.000486957 18,263,120 ref-cycles:D
    2.000486957 ref-cycles (0.00%)
    2.000486957 cycles (0.00%)

    To fix this, when the flexible_active list is empty, try to rotate the
    first event in the flexible_groups. Also, rename ctx_first_active() to
    ctx_event_to_rotate(), which is more accurate.

    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Thomas Gleixner
    Fixes: 8d5bce0c37fa ("perf/core: Optimize perf_rotate_context() event scheduling")
    Link: https://lkml.kernel.org/r/20191008165949.920548-1-songliubraving@fb.com
    Signed-off-by: Ingo Molnar

    Song Liu
     
  • perf_mmap() always increases user->locked_vm. As a result, "extra" could
    grow bigger than "user_extra", which doesn't make sense. Here is an
    example case:

    (Note: Assume "user_lock_limit" is very small.)

    | # of perf_mmap calls |vma->vm_mm->pinned_vm|user->locked_vm|
    | 0 | 0 | 0 |
    | 1 | user_extra | user_extra |
    | 2 | 3 * user_extra | 2 * user_extra|
    | 3 | 6 * user_extra | 3 * user_extra|
    | 4 | 10 * user_extra | 4 * user_extra|

    Fix this by maintaining proper user_extra and extra.

    Reviewed-By: Hechao Li
    Reported-by: Hechao Li
    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Jie Meng
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190904214618.3795672-1-songliubraving@fb.com
    Signed-off-by: Ingo Molnar

    Song Liu
     

07 Oct, 2019

1 commit

  • Commit:

    ab43762ef010 ("perf: Allow normal events to output AUX data")

    forgets to configure aux_output relation in the inherited groups, which
    results in child PEBS events forever failing to schedule.

    Fix this by setting up the AUX output link in the inheritance path.

    Signed-off-by: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20191004125729.32397-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

01 Oct, 2019

1 commit

  • Switch the perf_event_open() syscall from its own copying of
    struct perf_event_attr from userspace to the new dedicated
    copy_struct_from_user() helper.

    The change is very straightforward, and helps unify the syscall
    interface for struct-from-userspace syscalls.

    Signed-off-by: Aleksa Sarai
    Reviewed-by: Kees Cook
    Reviewed-by: Christian Brauner
    [christian.brauner@ubuntu.com: improve commit message]
    Link: https://lore.kernel.org/r/20191001011055.19283-5-cyphar@cyphar.com
    Signed-off-by: Christian Brauner

    Aleksa Sarai
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing an
    upstream implementation. It doesn't meet every distribution
    requirement, but it gets us much closer to not requiring external
    patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
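    As a practical sketch of the policy interface described in the pull
    message above (the bracketed value in the securityfs read marks the
    current level, and the level can only be raised at runtime, never
    lowered):

```
# On the kernel command line:
lockdown=integrity          # userland may not modify the running kernel
lockdown=confidentiality    # ...and may not extract confidential data from it

# At runtime, via securityfs (one-way: the level can only be raised):
cat /sys/kernel/security/lockdown     # e.g. "[none] integrity confidentiality"
echo integrity > /sys/kernel/security/lockdown
```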
     

27 Sep, 2019

1 commit

  • Pull more perf updates from Ingo Molnar:
    "The only kernel changes are comment typo fixes.

    The rest is mostly tooling fixes, but also new vendor event additions
    and updates, a bigger libperf/libtraceevent library and a header files
    reorganization that came in a bit late"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (108 commits)
    perf unwind: Fix libunwind build failure on i386 systems
    perf parser: Remove needless include directives
    perf build: Add detection of java-11-openjdk-devel package
    perf jvmti: Include JVMTI support for s390
    perf vendor events: Remove P8 HW events which are not supported
    perf evlist: Fix access of freed id arrays
    perf stat: Fix free memory access / memory leaks in metrics
    perf tools: Replace needless mmap.h with what is needed, event.h
    perf evsel: Move config terms to a separate header
    perf evlist: Remove unused perf_evlist__fprintf() method
    perf evsel: Introduce evsel_fprintf.h
    perf evsel: Remove need for symbol_conf in evsel_fprintf.c
    perf copyfile: Move copyfile routines to separate files
    libperf: Add perf_evlist__poll() function
    libperf: Add perf_evlist__add_pollfd() function
    libperf: Add perf_evlist__alloc_pollfd() function
    libperf: Add libperf_init() call to the tests
    libperf: Merge libperf_set_print() into libperf_init()
    libperf: Add libperf dependency for tests targets
    libperf: Use sys/types.h to get ssize_t, not unistd.h
    ...

    Linus Torvalds
     

25 Sep, 2019

3 commits

  • After all uprobes are removed from the huge page (with PTE pgtable), it is
    possible to collapse the pmd and benefit from THP again. This patch does
    the collapse by calling collapse_pte_mapped_thp().

    Link: http://lkml.kernel.org/r/20190815164525.1848545-7-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Reported-by: kbuild test robot
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Use the newly added FOLL_SPLIT_PMD in uprobe. This preserves the huge
    page when the uprobe is enabled. When the uprobe is disabled, newer
    instances of the same application could still benefit from huge pages.

    For the next step, we will enable khugepaged to regroup the pmd, so that
    existing instances of the application could also benefit from huge pages
    after the uprobe is disabled.

    Link: http://lkml.kernel.org/r/20190815164525.1848545-5-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Srikar Dronamraju
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     
  • Currently, uprobe swaps the target page with an anonymous page in both
    install_breakpoint() and remove_breakpoint(). When all uprobes on a page
    are removed, the given mm is still using an anonymous page (not the
    original page).

    This patch allows uprobe to use original page when possible (all uprobes
    on the page are already removed, and the original page is in page cache
    and uptodate).

    As suggested by Oleg, we unmap the old_page and let the original page
    fault in.

    Link: http://lkml.kernel.org/r/20190815164525.1848545-3-songliubraving@fb.com
    Signed-off-by: Song Liu
    Suggested-by: Oleg Nesterov
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
     

21 Sep, 2019

1 commit

  • Fix typos in a few functions' documentation comments.

    Signed-off-by: Roy Ben Shlomo
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: royb@sentinelone.com
    Link: http://lore.kernel.org/lkml/20190920171254.31373-1-royb@sentinelone.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Roy Ben Shlomo
     

18 Sep, 2019

1 commit

  • Pull core timer updates from Thomas Gleixner:
    "Timers and timekeeping updates:

    - A large overhaul of the posix CPU timer code which is a preparation
    for moving the CPU timer expiry out into task work so it can be
    properly accounted on the task/process.

    An update to the bogus permission checks will come later during the
    merge window as feedback was not complete before heading off for
    travel.

    - Switch the timerqueue code to use cached rbtrees and get rid of the
    homebrew caching of the leftmost node.

    - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
    single function

    - Implement the separation of hrtimers to be forced to expire in hard
    interrupt context even when PREEMPT_RT is enabled and mark the
    affected timers accordingly.

    - Implement a mechanism for hrtimers and the timer wheel to protect
    RT against priority inversion and live lock issues when a (hr)timer
    which should be canceled is currently executing the callback.
    Instead of infinitely spinning, the task which tries to cancel the
    timer blocks on a per cpu base expiry lock which is held and
    released by the (hr)timer expiry code.

    - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
    resulting in faster access to timekeeping functions.

    - Updates to various clocksource/clockevent drivers and their device
    tree bindings.

    - The usual small improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
    posix-cpu-timers: Fix permission check regression
    posix-cpu-timers: Always clear head pointer on dequeue
    hrtimer: Add a missing bracket and hide `migration_base' on !SMP
    posix-cpu-timers: Make expiry_active check actually work correctly
    posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
    tick: Mark sched_timer to expire in hard interrupt context
    hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
    x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
    posix-cpu-timers: Utilize timerqueue for storage
    posix-cpu-timers: Move state tracking to struct posix_cputimers
    posix-cpu-timers: Deduplicate rlimit handling
    posix-cpu-timers: Remove pointless comparisons
    posix-cpu-timers: Get rid of 64bit divisions
    posix-cpu-timers: Consolidate timer expiry further
    posix-cpu-timers: Get rid of zero checks
    rlimit: Rewrite non-sensical RLIMIT_CPU comment
    posix-cpu-timers: Respect INFINITY for hard RTTIME limit
    posix-cpu-timers: Switch thread group sampling to array
    posix-cpu-timers: Restructure expiry array
    posix-cpu-timers: Remove cputime_expires
    ...

    Linus Torvalds
     

17 Sep, 2019

2 commits

  • Pull scheduler updates from Ingo Molnar:

    - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
    Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
    Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

    As perf and the scheduler are getting bigger and more complex,
    document the status quo of current responsibilities and interests,
    and spread the review pain^H^H^H^H fun via an increase in the Cc:
    linecount generated by scripts/get_maintainer.pl. :-)

    - Add another series of patches that brings the -rt (PREEMPT_RT) tree
    closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
    into a new CONFIG_PREEMPTION category that will allow the eventual
    introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
    to go though.

    - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
    allow the finer shaping of CPU bandwidth usage.

    - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

    - Improve the behavior of high CPU count, high thread count
    applications running under cpu.cfs_quota_us constraints.

    - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

    - Improve CPU isolation housekeeping CPU allocation NUMA locality.

    - Fix deadline scheduler bandwidth calculations and logic when cpusets
    rebuilds the topology, or when it gets deadline-throttled while it's
    being offlined.

    - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
    setscheduler() system calls without creating global serialization.
    Add new synchronization between cpuset topology-changing events and
    the deadline acceptance tests in setscheduler(), which were broken
    before.

    - Rework the active_mm state machine to be less confusing and more
    optimal.

    - Rework (simplify) the pick_next_task() slowpath.

    - Improve load-balancing on AMD EPYC systems.

    - ... and misc cleanups, smaller fixes and improvements - please see
    the Git log for more details.

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
    sched/psi: Correct overly pessimistic size calculation
    sched/fair: Speed-up energy-aware wake-ups
    sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
    sched/uclamp: Update CPU's refcount on TG's clamp changes
    sched/uclamp: Use TG's clamps to restrict TASK's clamps
    sched/uclamp: Propagate system defaults to the root group
    sched/uclamp: Propagate parent clamps
    sched/uclamp: Extend CPU's cgroup controller
    sched/topology: Improve load balancing on AMD EPYC systems
    arch, ia64: Make NUMA select SMP
    sched, perf: MAINTAINERS update, add submaintainers and reviewers
    sched/fair: Use rq_lock/unlock in online_fair_sched_group
    cpufreq: schedutil: fix equation in comment
    sched: Rework pick_next_task() slow-path
    sched: Allow put_prev_task() to drop rq->lock
    sched/fair: Expose newidle_balance()
    sched: Add task_struct pointer to sched_class::set_curr_task
    sched: Rework CPU hotplug task selection
    sched/{rt,deadline}: Fix set_next_task vs pick_next_task
    sched: Fix kerneldoc comment for ia64_set_curr_task
    ...

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - Improved kprobes robustness

    - Intel PEBS support for PT hardware tracing

    - Other Intel PT improvements: high order pages memory footprint
    reduction and various related cleanups

    - Misc cleanups

    The perf tooling side has been very busy in this cycle, with over 300
    commits. This is an incomplete high-level summary of the many
    improvements done by over 30 developers:

    - Lots of updates to the following tools:

    'perf c2c'
    'perf config'
    'perf record'
    'perf report'
    'perf script'
    'perf test'
    'perf top'
    'perf trace'

    - Updates to libperf and libtraceevent, and a consolidation of the
    proliferation of x86 instruction decoder libraries.

    - Vendor event updates for Intel and PowerPC CPUs,

    - Updates to hardware tracing tooling for ARM and Intel CPUs,

    - ... and lots of other changes and cleanups - see the shortlog and
    Git log for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (322 commits)
    kprobes: Prohibit probing on BUG() and WARN() address
    perf/x86: Make more stuff static
    x86, perf: Fix the dependency of the x86 insn decoder selftest
    objtool: Ignore intentional differences for the x86 insn decoder
    objtool: Update sync-check.sh from perf's check-headers.sh
    perf build: Ignore intentional differences for the x86 insn decoder
    perf intel-pt: Use shared x86 insn decoder
    perf intel-pt: Remove inat.c from build dependency list
    perf: Update .gitignore file
    objtool: Move x86 insn decoder to a common location
    perf metricgroup: Support multiple events for metricgroup
    perf metricgroup: Scale the metric result
    perf pmu: Change convert_scale from static to global
    perf symbols: Move mem_info and branch_info out of symbol.h
    perf auxtrace: Uninline functions that touch perf_session
    perf tools: Remove needless evlist.h include directives
    perf tools: Remove needless evlist.h include directives
    perf tools: Remove needless thread_map.h include directives
    perf tools: Remove needless thread.h include directives
    perf tools: Remove needless map.h include directives
    ...

    Linus Torvalds
     

16 Sep, 2019

1 commit


06 Sep, 2019

1 commit

  • If the compiler's auto-initialization feature is disabled, i.e. neither
    -fplugin-arg-structleak_plugin-byref nor -ftrivial-auto-var-init=pattern
    is in effect, arch_hw_breakpoint may be used before initialization after:

    9a4903dde2c86 ("perf/hw_breakpoint: Split attribute parse and commit")

    On our ARM platform, the struct step_ctrl in arch_hw_breakpoint, which
    used to be zero-initialized by kzalloc(), may be used in
    arch_install_hw_breakpoint() without initialization.

    Signed-off-by: Mark-PK Tsai
    Cc: Alexander Shishkin
    Cc: Alix Wu
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: YJ Chiang
    Link: https://lkml.kernel.org/r/20190906060115.9460-1-mark-pk.tsai@mediatek.com
    [ Minor edits. ]
    Signed-off-by: Ingo Molnar

    Mark-PK Tsai
     

28 Aug, 2019

1 commit

  • In some cases, ordinary (non-AUX) events can generate data for AUX events.
    For example, PEBS events can come out as records in the Intel PT stream
    instead of their usual DS records, if configured to do so.

    One requirement for such events is to consistently schedule together, to
    ensure that the data from the "AUX output" events isn't lost while their
    corresponding AUX event is not scheduled. We use grouping to provide this
    guarantee: an "AUX output" event can be added to a group where an AUX event
    is a group leader, and provided that the former supports writing to the
    latter.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: kan.liang@linux.intel.com
    Link: https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shishkin@linux.intel.com

    Alexander Shishkin
     

20 Aug, 2019

1 commit


02 Aug, 2019

1 commit

  • To guarantee that the multiplexing mechanism and the hrtimer driven events
    work on PREEMPT_RT enabled kernels, it's required that the related hrtimers
    expire in hard interrupt context. Mark them so PREEMPT_RT kernels won't
    defer them to soft interrupt context.

    No functional change.

    [ tglx: Split out of larger combo patch. Added changelog ]

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20190726185753.169509224@linutronix.de

    Sebastian Andrzej Siewior