25 Jul, 2019

1 commit

  • Some hardware PMU drivers will override perf_event.cpu inside their
    event_init callback. This causes a lockdep splat when the event is
    initialized through the kernel API:

    WARNING: CPU: 0 PID: 250 at kernel/events/core.c:2917 ctx_sched_out+0x78/0x208
    pc : ctx_sched_out+0x78/0x208
    Call trace:
    ctx_sched_out+0x78/0x208
    __perf_install_in_context+0x160/0x248
    remote_function+0x58/0x68
    generic_exec_single+0x100/0x180
    smp_call_function_single+0x174/0x1b8
    perf_install_in_context+0x178/0x188
    perf_event_create_kernel_counter+0x118/0x160

    Fix this by calling perf_install_in_context() with event->cpu, just like
    perf_event_open() does.

    Signed-off-by: Leonard Crestez
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Frank Li
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/c4ebe0503623066896d7046def4d6b1e06e0eb2e.1563972056.git.leonard.crestez@nxp.com
    Signed-off-by: Ingo Molnar

    Leonard Crestez
     

13 Jul, 2019

2 commits

  • So far, we tried to disallow grouping exclusive events, for fear of the
    complications they would cause when moving between contexts. Specifically,
    moving a software group to a hardware context would violate the exclusivity
    rules if both groups contain matching exclusive events.

    This attempt was, however, unsuccessful: the check we have in the
    perf_event_open() syscall is both wrong (it looks at the wrong PMU) and
    insufficient (the group leader may still be exclusive), as can be
    illustrated by running:

    $ perf record -e '{intel_pt//,cycles}' uname
    $ perf record -e '{cycles,intel_pt//}' uname

    both of which ultimately succeed.

    Furthermore, we are completely free to trigger an exclusivity violation
    with:

    perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'

    Even though the (helpful) perf record tool will not allow that, the ABI
    will.

    The warning later in the perf_event_open() path will also not trigger, because
    it's also wrong.

    Fix all this by validating the original group before moving, getting rid
    of the broken safeguards, and placing a useful one in
    perf_install_in_context().

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: mathieu.poirier@linaro.org
    Cc: will.deacon@arm.com
    Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events")
    Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Syzcaller reported the following Use-after-Free bug:

    close()                             clone()

                                          copy_process()
                                            perf_event_init_task()
                                              perf_event_init_context()
                                                mutex_lock(parent_ctx->mutex)
                                                inherit_task_group()
                                                  inherit_group()
                                                    inherit_event()
                                                      mutex_lock(event->child_mutex)
                                                      // expose event on child list
                                                      list_add_tail()
                                                      mutex_unlock(event->child_mutex)
                                                mutex_unlock(parent_ctx->mutex)

                                                ...
                                                goto bad_fork_*

                                                bad_fork_cleanup_perf:
                                                  perf_event_free_task()

    perf_release()
      perf_event_release_kernel()
        list_for_each_entry()
          mutex_lock(ctx->mutex)
          mutex_lock(event->child_mutex)
          // event is from the failing inherit
          // on the other CPU
          perf_remove_from_context()
          list_move()
          mutex_unlock(event->child_mutex)
          mutex_unlock(ctx->mutex)

                                                  mutex_lock(ctx->mutex)
                                                  list_for_each_entry_safe()
                                                    // event already stolen
                                                  mutex_unlock(ctx->mutex)

                                                  delayed_free_task()
                                                    free_task()

        list_for_each_entry_safe()
          list_del()
          free_event()
            _free_event()
              // and so event->hw.target
              // is the already freed failed clone()
              if (event->hw.target)
                put_task_struct(event->hw.target)
                // WHOOPSIE, already quite dead

    Which puts the lie to the comment on perf_event_free_task():
    'unexposed, unused context', not so much.

    Which is a 'fun' confluence of fail; copy_process() doing an
    unconditional free_task() and not respecting refcounts, and perf having
    creative locking. In particular:

    82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")

    seems to have overlooked this 'fun' parade.

    Solve it by using the fact that detached events still have a reference
    count on their (previous) context. With this perf_event_free_task()
    can detect when events have escaped and wait for their destruction.

    Debugged-by: Alexander Shishkin
    Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jul, 2019

1 commit

  • Pull perf updates from Ingo Molnar:
    "The main changes in this cycle on the kernel side were:

    - CPU PMU and uncore driver updates to Intel Snow Ridge, IceLake,
    KabyLake, AmberLake and WhiskeyLake CPUs.

    - Rework the MSR probing infrastructure to make it more robust, make
    it work better on virtualized systems and to better expose it on
    sysfs.

    - Rework PMU attributes group support based on the feedback from
    Greg. The core sysfs patch that adds sysfs_update_groups() was
    acked by Greg.

    There's a lot of perf tooling changes as well, all around the place:

    - vendor updates to Intel, cs-etm (ARM), ARM64, s390,

    - various enhancements to Intel PT tooling support:
    - Improve CBR (Core to Bus Ratio) packets support.
    - Export power and ptwrite events to sqlite and postgresql.
    - Add support for decoding PEBS via PT packets.
    - Add support for samples to contain IPC ratio, collecting cycles
    information from CYC packets, showing the IPC info periodically
    - Allow using time ranges

    - lots of updates to perf pmu, perf stat, perf trace, eBPF support,
    perf record, perf diff, etc. - please see the shortlog and Git log
    for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (252 commits)
    tools arch x86: Sync asm/cpufeatures.h with the with the kernel
    tools build: Check if gettid() is available before providing helper
    perf jvmti: Address gcc string overflow warning for strncpy()
    perf python: Remove -fstack-protector-strong if clang doesn't have it
    perf annotate TUI browser: Do not use member from variable within its own initialization
    perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64
    perf evsel: Do not rely on errno values for precise_ip fallback
    perf thread: Allow references to thread objects after machine__exit()
    perf header: Assign proper ff->ph in perf_event__synthesize_features()
    tools arch kvm: Sync kvm headers with the kernel sources
    perf script: Allow specifying the files to process guest samples
    perf tools metric: Don't include duration_time in group
    perf list: Avoid extra : for --raw metrics
    perf vendor events intel: Metric fixes for SKX/CLX
    perf tools: Fix typos / broken sentences
    perf jevents: Add support for Hisi hip08 L3C PMU aliasing
    perf jevents: Add support for Hisi hip08 HHA PMU aliasing
    perf jevents: Add support for Hisi hip08 DDRC PMU aliasing
    perf pmu: Support more complex PMU event aliasing
    perf diff: Documentation -c cycles option
    ...

    Linus Torvalds
     

09 Jul, 2019

4 commits

  • …iederm/user-namespace

    Pull force_sig() argument change from Eric Biederman:
    "A source of error over the years has been that force_sig has taken a
    task parameter when it is only safe to use force_sig with the current
    task.

    The force_sig function is built for delivering synchronous signals
    such as SIGSEGV where the userspace application caused a synchronous
    fault (such as a page fault) and the kernel responded with a signal.

    Because the name force_sig does not make this clear, and because the
    force_sig takes a task parameter the function force_sig has been
    abused for sending other kinds of signals over the years. Slowly those
    have been fixed when the oopses have been tracked down.

    This set of changes fixes the remaining abusers of force_sig and
    carefully rips out the task parameter from force_sig and friends
    making this kind of error almost impossible in the future"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
    signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
    signal: Remove the signal number and task parameters from force_sig_info
    signal: Factor force_sig_info_to_task out of force_sig_info
    signal: Generate the siginfo in force_sig
    signal: Move the computation of force into send_signal and correct it.
    signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
    signal: Remove the task parameter from force_sig_fault
    signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
    signal: Explicitly call force_sig_fault on current
    signal/unicore32: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from __do_user_fault
    signal/arm: Remove tsk parameter from ptrace_break
    signal/nds32: Remove tsk parameter from send_sigtrap
    signal/riscv: Remove tsk parameter from do_trap
    signal/sh: Remove tsk parameter from force_sig_info_fault
    signal/um: Remove task parameter from send_sigtrap
    signal/x86: Remove task parameter from send_sigtrap
    signal: Remove task parameter from force_sig_mceerr
    signal: Remove task parameter from force_sig
    signal: Remove task parameter from force_sigsegv
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The changes in this cycle are:

    - RCU flavor consolidation cleanups and optimizations

    - Documentation updates

    - Miscellaneous fixes

    - SRCU updates

    - RCU-sync flavor consolidation

    - Torture-test updates

    - Linux-kernel memory-consistency-model updates, most notably the
    addition of plain C-language accesses"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
    tools/memory-model: Improve data-race detection
    tools/memory-model: Change definition of rcu-fence
    tools/memory-model: Expand definition of barrier
    tools/memory-model: Do not use "herd" to refer to "herd7"
    tools/memory-model: Fix comment in MP+poonceonces.litmus
    Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
    rcu: Don't return a value from rcu_assign_pointer()
    rcu: Force inlining of rcu_read_lock()
    rcu: Fix irritating whitespace error in rcu_assign_pointer()
    rcu: Upgrade sync_exp_work_done() to smp_mb()
    rcutorture: Upper case solves the case of the vanishing NULL pointer
    torture: Suppress propagating trace_printk() warning
    rcutorture: Dump trace buffer for callback pipe drain failures
    torture: Add --trust-make to suppress "make clean"
    torture: Make --cpus override idleness calculations
    torture: Run kernel build in source directory
    torture: Add function graph-tracing cheat sheet
    torture: Capture qemu output
    rcutorture: Tweak kvm options
    rcutorture: Add trivial RCU implementation
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "The timer and timekeeping departement delivers:

    Core:

    - The consolidation of the VDSO code into a generic library including
    the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
    route through the relevant maintainer trees and should end up in
    5.4.

    This gets rid of the unnecessary different copies of the same code
    and brings all architectures on the same level of VDSO
    functionality.

    - Make the NTP user space interface more robust by restricting the
    TAI offset to prevent undefined behaviour. Includes a selftest.

    - Validate user input in the compat settimeofday() syscall to catch
    invalid values which would be turned into valid values by a
    multiplication overflow

    - Consolidate the time accessors

    - Small fixes, improvements and cleanups all over the place

    Drivers:

    - Support for the NXP system counter, TI davinci timer

    - Move the Microsoft HyperV clocksource/events code into the
    drivers/clocksource directory so it can be shared between x86 and
    ARM64.

    - Overhaul of the Tegra driver

    - Delay timer support for IXP4xx

    - Small fixes, improvements and cleanups as usual"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    time: Validate user input in compat_settimeofday()
    timer: Document TIMER_PINNED
    clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
    clocksource/drivers: Make Hyper-V clocksource ISA agnostic
    MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
    hrtimer: Use a bullet for the returns bullet list
    arm64: vdso: Fix compilation with clang older than 8
    arm64: compat: Fix __arch_get_hw_counter() implementation
    arm64: Fix __arch_get_hw_counter() implementation
    lib/vdso: Make delta calculation work correctly
    MAINTAINERS: Add entry for the generic VDSO library
    arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
    arm64: vdso: Remove unnecessary asm-offsets.c definitions
    vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
    clocksource/drivers/davinci: Add support for clocksource
    clocksource/drivers/davinci: Add support for clockevents
    clocksource/drivers/tegra: Set up maximum-ticks limit properly
    clocksource/drivers/tegra: Cycles can't be 0
    clocksource/drivers/tegra: Restore base address before cleanup
    clocksource/drivers/tegra: Add verbose definition for 1MHz constant
    ...

    Linus Torvalds
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

29 Jun, 2019

1 commit

  • …k/linux-rcu into core/rcu

    Pull rcu/next + tools/memory-model changes from Paul E. McKenney:

    - RCU flavor consolidation cleanups and optimizations
    - Documentation updates
    - Miscellaneous fixes
    - SRCU updates
    - RCU-sync flavor consolidation
    - Torture-test updates
    - Linux-kernel memory-consistency-model updates, most notably the addition of plain C-language accesses

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

25 Jun, 2019

3 commits

  • Currently perf_rotate_context assumes that if the context's nr_events !=
    nr_active a rotation is necessary for perf event multiplexing. With
    cgroups, nr_events is the total count of events for all cgroups and
    nr_active will not include events in a cgroup other than the current
    task's. This makes rotation appear necessary for cgroups when it is not.

    Add a perf_event_context flag that is set when rotation is necessary.
    Clear the flag during sched_out and set it when a flexible sched_in
    fails due to resources.

    Signed-off-by: Ian Rogers
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com
    Signed-off-by: Ingo Molnar

    Ian Rogers
     
  • The perf fuzzer caused Skylake machine to crash:

    [ 9680.085831] Call Trace:
    [ 9680.088301]
    [ 9680.090363] perf_output_sample_regs+0x43/0xa0
    [ 9680.094928] perf_output_sample+0x3aa/0x7a0
    [ 9680.099181] perf_event_output_forward+0x53/0x80
    [ 9680.103917] __perf_event_overflow+0x52/0xf0
    [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0
    [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150
    [ 9680.117475] ? check_preempt_wakeup+0x181/0x230
    [ 9680.122091] ? check_preempt_curr+0x62/0x90
    [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140
    [ 9680.130355] ? try_to_wake_up+0x54/0x460
    [ 9680.134366] ? reweight_entity+0x15b/0x1a0
    [ 9680.138559] ? __queue_work+0x103/0x3f0
    [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270
    [ 9680.147194] ? timerqueue_del+0x1e/0x40
    [ 9680.151092] ? __remove_hrtimer+0x35/0x70
    [ 9680.155191] __hrtimer_run_queues+0x100/0x280
    [ 9680.159658] hrtimer_interrupt+0x100/0x220
    [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140
    [ 9680.168555] apic_timer_interrupt+0xf/0x20
    [ 9680.172756]

    The XMM registers can only be collected by PEBS hardware events on
    platforms with PEBS baseline support, e.g. Icelake; they cannot be
    collected by software/probe events.

    Add the capability flag PERF_PMU_CAP_EXTENDED_REGS to mark PMUs that
    support extended registers. For x86, the extended registers are the
    XMM registers.

    Add has_extended_regs() to check whether extended registers are
    requested.

    The generic code defines the mask of extended registers as 0 unless the
    arch headers override it.

    Originally-by: Peter Zijlstra (Intel)
    Reported-by: Vince Weaver
    Signed-off-by: Kan Liang
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers")
    Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kan Liang
     
  • perf_event_open() limits the sample_period to 63 bits. See:

    0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

    Make ioctl() consistent with it.

    Also, on PowerPC, a negative sample_period could cause recursive
    PMIs leading to a hang (reported when running perf-fuzzer).

    Signed-off-by: Ravi Bangoria
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: maddy@linux.vnet.ibm.com
    Cc: mpe@ellerman.id.au
    Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
    Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bangoria@linux.ibm.com
    Signed-off-by: Ingo Molnar

    Ravi Bangoria
     

17 Jun, 2019

1 commit

  • perf_sample_regs_user() uses 'current->mm' to test for the presence of
    userspace, but this is insufficient; consider use_mm().

    A better test is '!(current->flags & PF_KTHREAD)': exec() clears
    PF_KTHREAD after it sets the new ->mm but before it drops to userspace
    for the first time.

    Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process")

    Reported-by: Ravi Bangoria
    Reported-by: Young Xiao
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Michael Ellerman
    Cc: Naveen N. Rao
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jun, 2019

2 commits

  • Add an attr_update attribute group to struct pmu, to allow
    having multiple attribute groups for the same group name.

    This will allow us to update the "events" or "format"
    directories with attributes that depend on various
    HW conditions.

    For example, a group_format_extra group that updates the
    "format" directory only if the pmu version is 2 or higher:

    static umode_t
    exra_is_visible(struct kobject *kobj, struct attribute *attr, int i)
    {
            return x86_pmu.version >= 2 ? attr->mode : 0;
    }

    static struct attribute_group group_format_extra = {
            .name       = "format",
            .is_visible = exra_is_visible,
    };

    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190512155518.21468-3-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • Currently, a non-privileged user can only use uprobes with

    kernel.perf_event_paranoid = -1

    However, setting perf_event_paranoid to -1 leaks other users' processes
    to non-privileged uprobes.

    To introduce proper permission control of uprobes, we are building the
    following system:

    A daemon with CAP_SYS_ADMIN is in charge of creating uprobes via tracefs;
    Users ask the daemon to create uprobes;
    Then a user can attach a uprobe only to processes owned by that user.

    This patch allows a non-privileged user to attach a uprobe to processes
    owned by the user.

    The following example shows how to use a uprobe as a non-privileged
    user. It is based on Brendan's blog post [1].

    1. Create the uprobe as root:

    sudo perf probe -x 'readline%return +0($retval):string'

    2. Then a non-root user can use the uprobe:

    perf record -vvv -e probe_bash:readline__return -p sleep 20
    perf script

    [1] http://www.brendangregg.com/blog/2015-06-28/linux-ftrace-uprobe.html

    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190507161545.788381-1-songliubraving@fb.com
    Signed-off-by: Ingo Molnar

    Song Liu
     

24 May, 2019

4 commits

  • While the IRQ/NMI will nest, the nest-count will be invariant over the
    actual exception, since it will decrement equal to increment.

    This means we can -- carefully -- use a regular variable since the
    typical LOAD-STORE race doesn't exist (similar to preempt_count).

    This optimizes the ring-buffer for all LOAD-STORE architectures, since
    they need to use atomic ops to implement local_t.

    Suggested-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: mark.rutland@arm.com
    Cc: namhyung@kernel.org
    Cc: yabinc@google.com
    Link: http://lkml.kernel.org/r/20190517115418.481392777@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We must use {READ,WRITE}_ONCE() on rb->user_page data such that
    concurrent usage will see whole values. A few key sites were missing
    this.

    Suggested-by: Yabin Cui
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: mark.rutland@arm.com
    Cc: namhyung@kernel.org
    Fixes: 7b732a750477 ("perf_counter: new output ABI - part 1")
    Link: http://lkml.kernel.org/r/20190517115418.394192145@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Similar to how decrementing rb->next too early can cause data_head to
    (temporarily) be observed to go backward, so too can this happen when
    we increment too late.

    This barrier() ensures the rb->head load happens after the increment,
    both for the one in the 'goto again' path and for the one from
    perf_output_get_handle(), albeit very unlikely to matter for the
    latter.

    Suggested-by: Yabin Cui
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: mark.rutland@arm.com
    Cc: namhyung@kernel.org
    Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
    Link: http://lkml.kernel.org/r/20190517115418.309516009@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In perf_output_put_handle(), an IRQ/NMI can happen in below location and
    write records to the same ring buffer:

    ...
    local_dec_and_test(&rb->nest)
    ...                              <-- an IRQ/NMI can happen here
    rb->user_page->data_head = head;
    ...

    In this case, a value A is written to data_head in the IRQ, then a value
    B is written to data_head after the IRQ. And A > B. As a result,
    data_head is temporarily decreased from A to B. And a reader may see
    data_head < data_tail if it read the buffer frequently enough, which
    creates unexpected behaviors.

    This can be fixed by moving dec(&rb->nest) to after updating data_head,
    which prevents the IRQ/NMI above from updating data_head.

    [ Split up by peterz. ]

    Signed-off-by: Yabin Cui
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: mark.rutland@arm.com
    Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
    Link: http://lkml.kernel.org/r/20190517115418.224478157@infradead.org
    Signed-off-by: Ingo Molnar

    Yabin Cui
     

15 May, 2019

2 commits

  • This updates each existing invalidation to use the correct mmu notifier
    event that represents what is happening to the CPU page table. See the
    patch which introduced the events for the rationale behind this.

    Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     
  • CPU page table updates can happen for many reasons, not only as a result
    of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
    a result of kernel activities (memory compression, reclaim, migration,
    ...).

    Users of the mmu notifier API track changes to the CPU page table and
    take specific action for them. The current API only provides the range
    of virtual addresses affected by the change, not why the change is
    happening.

    This patchset does the initial mechanical conversion of all the places
    that call mmu_notifier_range_init to also provide the default
    MMU_NOTIFY_UNMAP event, as well as the vma if it is known (most
    invalidations happen against a given vma). Passing down the vma allows
    the users of the mmu notifier to inspect the new vma page protection.

    MMU_NOTIFY_UNMAP is always the safe default, as users of the mmu
    notifier should assume that everything in the range is going away when
    that event happens. A later patch converts the mm call paths to use more
    appropriate events for each call.

    This is done as two patches so that no call site is forgotten,
    especially as it uses the following coccinelle patch:
    [the coccinelle semantic patch did not survive extraction intact; only
    garbled fragments of its rules remained, so it is omitted here]

    Applied with:
    spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
    spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
    spatch --sp-file mmu-notifier.spatch --dir mm --in-place

    Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reviewed-by: Ralph Campbell
    Reviewed-by: Ira Weiny
    Cc: Christian König
    Cc: Joonas Lahtinen
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Jan Kara
    Cc: Andrea Arcangeli
    Cc: Peter Xu
    Cc: Felix Kuehling
    Cc: Jason Gunthorpe
    Cc: Ross Zwisler
    Cc: Dan Williams
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Michal Hocko
    Cc: Christian Koenig
    Cc: John Hubbard
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

08 May, 2019

1 commit

  • Pull printk updates from Petr Mladek:

    - Allow state reset of printk_once() calls.

    - Prevent crashes when dereferencing invalid pointers in vsprintf().
    Only the first byte is checked for simplicity.

    - Make vsprintf warnings consistent and inlined.

    - Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
    modifiers.

    - Some clean up of vsprintf and test_printf code.

    * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Make function pointer_string static
    vsprintf: Limit the length of inlined error messages
    vsprintf: Avoid confusion between invalid address and value
    vsprintf: Prevent crash when dereferencing invalid pointers
    vsprintf: Consolidate handling of unknown pointer specifiers
    vsprintf: Factor out %pO handler as kobject_string()
    vsprintf: Factor out %pV handler as va_format()
    vsprintf: Factor out %p[iI] handler as ip_addr_string()
    vsprintf: Do not check address of well-known strings
    vsprintf: Consistent %pK handling for kptr_restrict == 0
    vsprintf: Shuffle restricted_pointer()
    printk: Tie printk_once / printk_deferred_once into .data.once for reset
    treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
    lib/test_printf: Switch to bitmap_zalloc()

    Linus Torvalds
     

07 May, 2019

2 commits

  • Pull x86 mm updates from Ingo Molnar:
    "The changes in here are:

    - text_poke() fixes and an extensive set of executability lockdowns,
    to (hopefully) eliminate the last residual circumstances under
    which we are using W|X mappings even temporarily on x86 kernels.
    This required a broad range of surgery in text patching facilities,
    module loading, trampoline handling and other bits.

    - tweak page fault messages to be more informative and more
    structured.

    - remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
    default.

    - reduce KASLR granularity on 5-level paging kernels from 512 GB to
    1 GB.

    - misc other changes and updates"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/mm: Initialize PGD cache during mm initialization
    x86/alternatives: Add comment about module removal races
    x86/kprobes: Use vmalloc special flag
    x86/ftrace: Use vmalloc special flag
    bpf: Use vmalloc special flag
    modules: Use vmalloc special flag
    mm/vmalloc: Add flag for freeing of special permsissions
    mm/hibernation: Make hibernation handle unmapped pages
    x86/mm/cpa: Add set_direct_map_*() functions
    x86/alternatives: Remove the return value of text_poke_*()
    x86/jump-label: Remove support for custom text poker
    x86/modules: Avoid breaking W^X while loading modules
    x86/kprobes: Set instruction page as executable
    x86/ftrace: Set trampoline pages as executable
    x86/kgdb: Avoid redundant comparison of patched code
    x86/alternatives: Use temporary mm for text poking
    x86/alternatives: Initialize temporary mm for patching
    fork: Provide a function for copying init_mm
    uprobes: Initialize uprobes earlier
    x86/mm: Save debug registers when loading a temporary mm
    ...

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "The main kernel changes were:

    - add support for Intel's "adaptive PEBS v4" - which embeds LBR data
    in PEBS records and can thus batch up and reduce the IRQ (NMI) rate
    significantly - reducing overhead and making call-graph profiling
    less intrusive.

    - add Intel CPU core and uncore support updates for Tremont, Icelake,

    - extend the x86 PMU constraints scheduler with 'constraint ranges'
    to better support Icelake hw constraints,

    - make x86 call-chain support work better with CONFIG_FRAME_POINTER=y

    - misc other changes

    Tooling changes:

    - updates to the main tools: 'perf record', 'perf trace', 'perf
    stat'

    - updated Intel and S/390 vendor events

    - libtraceevent updates

    - misc other updates and fixes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
    perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER
    watchdog: Fix typo in comment
    perf/x86/intel: Add Tremont core PMU support
    perf/x86/intel/uncore: Add Intel Icelake uncore support
    perf/x86/msr: Add Icelake support
    perf/x86/intel/rapl: Add Icelake support
    perf/x86/intel/cstate: Add Icelake support
    perf/x86/intel: Add Icelake support
    perf/x86: Support constraint ranges
    perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
    perf/x86/intel: Support adaptive PEBS v4
    perf/x86/intel/ds: Extract code of event update in short period
    perf/x86/intel: Extract memory code PEBS parser for reuse
    perf/x86: Support outputting XMM registers
    perf/x86/intel: Force resched when TFA sysctl is modified
    perf/core: Add perf_pmu_resched() as global function
    perf/headers: Fix stale comment for struct perf_addr_filter
    perf/core: Make perf_swevent_init_cpu() static
    perf/x86: Add sanity checks to x86_schedule_events()
    perf/x86: Optimize x86_schedule_events()
    ...

    Linus Torvalds
     

03 May, 2019

1 commit

  • This recent commit:

    5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically")

    overlooked the fact that the previous one page granularity of the AUX buffer
    provided an implicit double buffering capability to the PMU driver, which
    went away when the entire buffer became one high-order page.

    Always make the full-trace mode AUX allocation at least two-part to preserve
    the previous behavior and allow the implicit double buffering to continue.

    Reported-by: Ammy Yi
    Signed-off-by: Alexander Shishkin
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: adrian.hunter@intel.com
    Fixes: 5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically")
    Link: http://lkml.kernel.org/r/20190503085536.24119-2-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     

30 Apr, 2019

1 commit

  • In order to have a separate address space for text poking, we need to
    duplicate init_mm early during start_kernel(). This, however, introduces
    a problem since uprobes functions are called from dup_mmap(), but
    uprobes is still not initialized in this early stage.

    Since uprobes initialization is necessary for fork, and since all the
    dependent initialization has been done when fork is initialized (percpu
    and vmalloc), move uprobes initialization to fork_init(). It does not
    seem that uprobes introduces any security problem for the poking_mm.

    Crash and burn if uprobes initialization fails, similarly to other early
    initializations. Change the init_probes() name to probes_init() to match
    the naming convention of other early initialization functions.

    Reported-by: kernel test robot
    Signed-off-by: Nadav Amit
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Rick Edgecombe
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: ard.biesheuvel@linaro.org
    Cc: deneen.t.dock@intel.com
    Cc: kernel-hardening@lists.openwall.com
    Cc: kristen@linux.intel.com
    Cc: linux_dti@icloud.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190426232303.28381-6-nadav.amit@gmail.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

16 Apr, 2019

5 commits

This patch adds perf_pmu_resched(), a global function that can be called
    to force rescheduling of events for a given PMU. The function locks
    both cpuctx and task_ctx internally. This will be used by a subsequent
    patch.

    Signed-off-by: Stephane Eranian
    [ Simplified the calling convention. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: kan.liang@intel.com
    Cc: nelson.dsouza@intel.com
    Cc: tonyj@suse.com
    Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The following commit:

    1627314fb54a33e ("perf: Suppress AUX/OVERWRITE records")

    has an unintended side-effect of also suppressing all AUX records with no flags
    and non-zero size, so all the regular records in the full trace mode.
    This breaks some use cases for people.

    Fix this by restoring "regular" AUX records.

    Reported-by: Ben Gainey
    Tested-by: Ben Gainey
    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 1627314fb54a33e ("perf: Suppress AUX/OVERWRITE records")
    Link: https://lkml.kernel.org/r/20190329091338.29999-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • The following recent commit:

    c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")

    changes the address filtering logic to communicate filter ranges to the PMU driver
    via a single address range object, instead of having the driver do the final bit of
    math.

    That change forgets to take into account kernel filters, which are not calculated
    the same way as DSO based filters.

    Fix that by passing the kernel filters the same way as file-based filters.
    This doesn't require any additional changes in the drivers.

    Reported-by: Adrian Hunter
    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")
    Link: https://lkml.kernel.org/r/20190329091212.29870-1-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

12 Apr, 2019

1 commit

  • Thomas-Mich Richter reported he triggered a WARN()ing from event_function_local()
    on his s390. The problem boils down to:

    CPU-A                              CPU-B

    perf_event_overflow()
      perf_event_disable_inatomic()
        @pending_disable = 1
        irq_work_queue();

    sched-out
      event_sched_out()
        @pending_disable = 0

                                       sched-in
                                       perf_event_overflow()
                                         perf_event_disable_inatomic()
                                           @pending_disable = 1;
                                           irq_work_queue(); // FAILS

    irq_work_run()
      perf_pending_event()
        if (@pending_disable)
          perf_event_disable_local(); // WHOOPS

    The problem exists in generic code, but s390 is particularly sensitive
    because it doesn't implement arch_irq_work_raise(), nor does it call
    irq_work_run() from its PMU interrupt handler (nor would that be
    sufficient in this case, because s390 also generates
    perf_event_overflow() from pmu::stop). Add to that the fact that s390
    is a virtual architecture and (virtual) CPU-A can stall long enough
    for the above race to happen, even if it would self-IPI.

    Adding an irq_work_sync() to event_sched_in() would work for all hardware
    PMUs that properly use irq_work_run(), but fails for software PMUs.

    Instead encode the CPU number in @pending_disable, such that we can
    tell which CPU requested the disable. This then allows us to detect
    the above scenario and even redirect the IPI to make up for the failed
    queue.

    Reported-by: Thomas-Mich Richter
    Tested-by: Thomas Richter
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Jiri Olsa
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Apr, 2019

1 commit

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    And verifying the result.

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
     

03 Apr, 2019

1 commit

  • 'make W=1' causes GCC to complain:

    kernel/events/core.c:11877:6: warning: no previous prototype for 'perf_swevent_init_cpu' [-Wmissing-prototypes]

    It's not referenced anywhere else, make it static.

    Signed-off-by: Valdis Kletnieks
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/28974.1552377997@turing-police
    Signed-off-by: Ingo Molnar

    Valdis Kletnieks
     

23 Mar, 2019

1 commit

  • …ux/kernel/git/acme/linux into perf/urgent

    Pull perf/core improvements and fixes from Arnaldo:

    kernel:

    Stephane Eranian :

    - Restore mmap record type correctly when handling PERF_RECORD_MMAP2
    events, as the same template is used for all the threads interested
    in mmap events, some may want just PERF_RECORD_MMAP, while some
    may want the extra info in MMAP2 records.

    perf probe:

    Adrian Hunter:

    - Fix getting the kernel map: since the changes related to x86 PTI
    entry trampoline handling, there is more than one kernel map.

    perf script:

    Andi Kleen:

    - Support insn output for normal samples, i.e.:

    perf script -F ip,sym,insn --xed

    Will fetch the sample IP from the thread address space and feed it
    to Intel's XED disassembler, producing lines such as:

    ffffffffa4068804 native_write_msr wrmsr
    ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx

    That match 'perf annotate's output.

    - Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
    addition to PERF_RECORD_SAMPLE.

    perf report:

    - Add a new --samples option to save a small random number of samples
    per hist entry, using a reservoir technique to select a representative
    number of samples.

    Then allow browsing the samples using 'perf script' as part of the hist
    entry context menu. This automatically adds the right filters, so only
    the thread or CPU of the sample is displayed. Then we use less' search
    functionality to directly jump to the time stamp of the selected sample.

    It uses different menus for assembler and source display. Assembler
    needs xed installed and source needs debuginfo.

    - Fix the UI browser scripts pop up menu when there are many scripts
    available.

    perf report:

    Andi Kleen:

    - Add 'time' sort option. E.g.:

    % perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
    ...
    0.67% 277061.87300 [.] _dl_start
    0.50% 277061.87300 [.] f1
    0.50% 277061.87300 [.] f2
    0.33% 277061.87300 [.] main
    0.29% 277061.87300 [.] _dl_lookup_symbol_x
    0.29% 277061.87300 [.] dl_main
    0.29% 277061.87300 [.] do_lookup_x
    0.17% 277061.87300 [.] _dl_debug_initialize
    0.17% 277061.87300 [.] _dl_init_paths
    0.08% 277061.87300 [.] check_match
    0.04% 277061.87300 [.] _dl_count_modids
    1.33% 277061.87400 [.] f1
    1.33% 277061.87400 [.] f2
    1.33% 277061.87400 [.] main
    1.17% 277061.87500 [.] main
    1.08% 277061.87500 [.] f1
    1.08% 277061.87500 [.] f2
    1.00% 277061.87600 [.] main
    0.83% 277061.87600 [.] f1
    0.83% 277061.87600 [.] f2
    1.00% 277061.87700 [.] main

    tools headers:

    Arnaldo Carvalho de Melo:

    - Update x86's syscall_64.tbl, no change in tools/perf behaviour.

    - Sync copies asm-generic/unistd.h and linux/in with the kernel sources.

    perf data:

    Jiri Olsa:

    - Prep work to support having perf.data stored as a directory, with one
    file per CPU, that ultimately will allow having one ring buffer reading
    thread per CPU.

    Vendor events:

    Martin Liška:

    - perf PMU events for AMD Family 17h.

    perf script python:

    Tony Jones:

    - Add python3 support for the remaining Intel PT related scripts, with
    these we should have a clean build of perf with python3 while still
    supporting the build with python2.

    libbpf:

    Arnaldo Carvalho de Melo:

    - Fix the build on uCLibc, adding the missing stdarg.h since we use
    va_list in one typedef.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Thomas Gleixner