22 Oct, 2019

1 commit

  • A race condition exists while initializing perf_trace_buf from
    perf_trace_init() and perf_kprobe_init().

    CPU0                                     CPU1
    perf_trace_init()
     mutex_lock(&event_mutex)
      perf_trace_event_init()
       perf_trace_event_reg()
        total_ref_count == 0
        buf = alloc_percpu()
       perf_trace_buf[i] = buf
       tp_event->class->reg() //fails        perf_kprobe_init()
      goto fail                               perf_trace_event_init()
                                               perf_trace_event_reg()
     fail:                                      total_ref_count == 0
      total_ref_count == 0
                                                buf = alloc_percpu()
                                                perf_trace_buf[i] = buf
                                                tp_event->class->reg()
                                                total_ref_count++

     free_percpu(perf_trace_buf[i])
     perf_trace_buf[i] = NULL

    Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0
    and skip the allocation, so perf_trace_buf stays NULL forever. perf_trace_buf
    can then be accessed from perf_trace_buf_alloc() without ever having been
    initialized. Acquiring event_mutex in perf_kprobe_init() before calling
    perf_trace_event_init() should fix this race.
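
    A minimal sketch of that approach (context and helper names taken from
    kernel/trace/trace_event_perf.c; the user-space copy of the probe name
    is elided):

    int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
    {
        struct trace_event_call *tp_event;
        char *func = NULL;
        int ret;

        /* ... copy attr.kprobe_func from user space into 'func' ... */

        tp_event = create_local_trace_kprobe(func,
                        (void *)(unsigned long)p_event->attr.kprobe_addr,
                        p_event->attr.probe_offset, is_retprobe);
        if (IS_ERR(tp_event)) {
                ret = PTR_ERR(tp_event);
                goto out;
        }

        mutex_lock(&event_mutex);   /* closes the race with perf_trace_init() */
        ret = perf_trace_event_init(tp_event, p_event);
        if (ret)
                destroy_local_trace_kprobe(tp_event);
        mutex_unlock(&event_mutex);
    out:
        kfree(func);
        return ret;
    }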

    The race caused the following bug:

    Unable to handle kernel paging request at virtual address 0000003106f2003c
    Mem abort info:
    ESR = 0x96000045
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    Data abort info:
    ISV = 0, ISS = 0x00000045
    CM = 0, WnR = 1
    user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
    [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
    Internal error: Oops: 96000045 [#1] PREEMPT SMP
    Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
    pstate: 80400005 (Nzcv daif +PAN -UAO)
    pc : __memset+0x20/0x1ac
    lr : memset+0x3c/0x50
    sp : ffffffc09319fc50

    __memset+0x20/0x1ac
    perf_trace_buf_alloc+0x140/0x1a0
    perf_trace_sys_enter+0x158/0x310
    syscall_trace_enter+0x348/0x7c0
    el0_svc_common+0x11c/0x368
    el0_svc_handler+0x12c/0x198
    el0_svc+0x8/0xc

    Ramdumps showed the following:
    total_ref_count = 3
    perf_trace_buf = (
    0x0 -> NULL,
    0x0 -> NULL,
    0x0 -> NULL,
    0x0 -> NULL)

    Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org

    Cc: stable@vger.kernel.org
    Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
    Acked-by: Song Liu
    Signed-off-by: Prateek Sood
    Signed-off-by: Steven Rostedt (VMware)

    Prateek Sood
     

18 Oct, 2019

1 commit

  • In current mainline, the degree of access to the perf_event_open(2)
    system call depends on the perf_event_paranoid sysctl. This has a
    number of limitations:

    1. The sysctl is only a single value. Many types of accesses are
    controlled based on that single value, making the control very limited
    and coarse-grained.
    2. The sysctl is global, so if it is changed, all processes get access
    to perf_event_open(2), opening the door to security issues.

    This patch adds LSM and SELinux access checking which will be used in
    Android to access perf_event_open(2) for the purposes of attaching BPF
    programs to tracepoints, perf profiling and other operations from
    userspace. These operations are intended for production systems.

    5 new LSM hooks are added:
    1. perf_event_open: This controls access during the perf_event_open(2)
    syscall itself. The hook is called from all the places where the
    perf_event_paranoid sysctl is checked, to keep it consistent with the
    sysctl. The hook gets passed a 'type' argument which controls CPU,
    kernel and tracepoint accesses (in this context, CPU, kernel and
    tracepoint have the same semantics as the perf_event_paranoid sysctl).
    Additionally, I added an 'open' type which is similar to the
    perf_event_paranoid == 3 patch carried in Android and several other
    distros but rejected in mainline [1] in 2016.

    2. perf_event_alloc: This allocates a new security object for the event
    which stores the current SID within the event. It will be useful when
    the perf event's FD is passed through IPC to another process which may
    try to read the FD. Appropriate security checks will limit access.

    3. perf_event_free: Called when the event is closed.

    4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.

    5. perf_event_write: Called from the ioctl(2) syscalls for the event.

    [1] https://lwn.net/Articles/696240/

    Since Peter had suggested LSM hooks in 2016 [1], I am adding his
    Suggested-by tag below.

    To use this patch, we set the perf_event_paranoid sysctl to -1 and then
    apply selinux checking as appropriate (default deny everything, and then
    add policy rules to give access to domains that need it). In the future
    we can remove the perf_event_paranoid sysctl altogether.
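
    As a rough illustration, a check site that today only consults the
    sysctl could defer to the new hook like this (the helper name and the
    PERF_SECURITY_TRACEPOINT constant are assumptions based on the
    description above):

    static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
    {
        /* Legacy coarse-grained gate first... */
        if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
                return -EPERM;

        /* ...then the per-type LSM decision (e.g. SELinux policy). */
        return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
    }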

    Suggested-by: Peter Zijlstra
    Co-developed-by: Peter Zijlstra
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: James Morris
    Cc: Arnaldo Carvalho de Melo
    Cc: rostedt@goodmis.org
    Cc: Yonghong Song
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: Alexei Starovoitov
    Cc: jeffv@google.com
    Cc: Jiri Olsa
    Cc: Daniel Borkmann
    Cc: primiano@google.com
    Cc: Song Liu
    Cc: rsavitski@google.com
    Cc: Namhyung Kim
    Cc: Matthew Garrett
    Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org

    Joel Fernandes (Google)
     

21 Feb, 2019

1 commit

  • The first version of this method was missing the check for
    `ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
    on error, so there was still a small memory leak in the error case.
    Fix it by using strndup_user() instead of open-coding it.
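
    The resulting copy looks roughly like this (a sketch of the
    perf_uprobe_init() path; error-code details may differ):

        path = strndup_user(u64_to_user_ptr(p_event->attr.uprobe_path),
                            PATH_MAX);
        if (IS_ERR(path))
                return PTR_ERR(path);   /* nothing to kfree() on error */
        if (path[0] == '\0') {
                kfree(path);
                return -EINVAL;
        }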

    Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com

    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Fixes: 0eadcc7a7bc0 ("perf/core: Fix perf_uprobe_init()")
    Reviewed-by: Masami Hiramatsu
    Acked-by: Song Liu
    Signed-off-by: Jann Horn
    Signed-off-by: Steven Rostedt (VMware)

    Jann Horn
     

11 Oct, 2018

1 commit

  • This patch enables uprobes with a reference counter in fd-based
    uprobes. The highest 32 bits of perf_event_attr.config are used to
    store the offset of the reference count (semaphore).

    Format information in /sys/bus/event_source/devices/uprobe/format/ is
    updated to reflect this new feature.
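
    From user space, the encoding would look something like this sketch
    (the PMU type must be read from /sys/bus/event_source/devices/uprobe/type;
    the shift follows the format description, field names per the uapi
    header):

    #include <linux/perf_event.h>
    #include <string.h>

    #define UPROBE_REF_CTR_OFFSET_SHIFT 32  /* config:32-63 per the format file */

    void setup_uprobe_attr(struct perf_event_attr *attr, __u32 uprobe_pmu_type,
                           const char *path, __u64 probe_offset,
                           __u64 ref_ctr_offset)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = uprobe_pmu_type;
        attr->uprobe_path = (__u64)(unsigned long)path;
        attr->probe_offset = probe_offset;
        /* reference counter (semaphore) offset in the high 32 bits */
        attr->config |= ref_ctr_offset << UPROBE_REF_CTR_OFFSET_SHIFT;
    }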

    Link: http://lkml.kernel.org/r/20181002053636.1896903-1-songliubraving@fb.com

    Cc: Oleg Nesterov
    Acked-by: Peter Zijlstra (Intel)
    Reviewed-and-tested-by: Ravi Bangoria
    Signed-off-by: Song Liu
    Signed-off-by: Steven Rostedt (VMware)

    Song Liu
     

10 Apr, 2018

2 commits

  • Similarly to the uprobe PMU fix in perf_kprobe_init(), fix error
    handling in perf_uprobe_init() as well.

    Reported-by: 范龙飞
    Signed-off-by: Song Liu
    Acked-by: Masami Hiramatsu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: e12f03d7031a ("perf/core: Implement the 'perf_kprobe' PMU")
    Signed-off-by: Ingo Molnar

    Song Liu
     
  • Fix error handling in perf_kprobe_init():

    ==================================================================
    BUG: KASAN: slab-out-of-bounds in strlen+0x8e/0xa0 lib/string.c:482
    Read of size 1 at addr ffff88003f9cc5c0 by task syz-executor2/23095

    CPU: 0 PID: 23095 Comm: syz-executor2 Not tainted 4.16.0+ #24
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xca/0x13e lib/dump_stack.c:113
    print_address_description+0x6e/0x2c0 mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report+0x256/0x380 mm/kasan/report.c:412
    strlen+0x8e/0xa0 lib/string.c:482
    kstrdup+0x21/0x70 mm/util.c:55
    alloc_trace_kprobe+0xc8/0x930 kernel/trace/trace_kprobe.c:325
    create_local_trace_kprobe+0x4f/0x3a0 kernel/trace/trace_kprobe.c:1438
    perf_kprobe_init+0x149/0x1f0 kernel/trace/trace_event_perf.c:264
    perf_kprobe_event_init+0xa8/0x120 kernel/events/core.c:8407
    perf_try_init_event+0xcb/0x2a0 kernel/events/core.c:9719
    perf_init_event kernel/events/core.c:9750 [inline]
    perf_event_alloc+0x1367/0x1e20 kernel/events/core.c:10022
    SYSC_perf_event_open+0x242/0x2330 kernel/events/core.c:10477
    do_syscall_64+0x198/0x640 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
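
    The underlying problem is that the user-space pointer in
    attr.kprobe_func reached kstrdup()/strlen() directly. A sketch of the
    kind of bounded copy the fix introduces (buffer size and error mapping
    assumed):

        char *func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);

        if (!func)
                return -ENOMEM;
        ret = strncpy_from_user(func,
                        u64_to_user_ptr(p_event->attr.kprobe_func),
                        KSYM_NAME_LEN);
        if (ret == KSYM_NAME_LEN)
                ret = -E2BIG;   /* symbol name did not fit */
        if (ret < 0)
                goto out;       /* error path frees 'func' */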

    Reported-by: 范龙飞
    Signed-off-by: Masami Hiramatsu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Thomas Gleixner
    Fixes: e12f03d7031a ("perf/core: Implement the 'perf_kprobe' PMU")
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

06 Feb, 2018

2 commits

  • This patch adds perf_uprobe support, following a similar pattern to
    the previous (kprobe) patch.

    Two functions, create_local_trace_uprobe() and
    destroy_local_trace_uprobe(), are added so a uprobe can be created and
    attached to the file descriptor returned by perf_event_open().

    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Yonghong Song
    Reviewed-by: Josef Bacik
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20171206224518.3598254-7-songliubraving@fb.com
    Signed-off-by: Ingo Molnar

    Song Liu
     
  • A new PMU type, perf_kprobe is added. Based on attr from perf_event_open(),
    perf_kprobe creates a kprobe (or kretprobe) for the perf_event. This
    kprobe is private to this perf_event, and thus not added to global
    lists, and not available in tracefs.

    Two functions, create_local_trace_kprobe() and
    destroy_local_trace_kprobe(), are added to create and destroy these
    local trace_kprobes.
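
    From user space, such a private kprobe might be opened like this
    sketch (the PMU type would be read from
    /sys/bus/event_source/devices/kprobe/type; the helper and error
    handling are illustrative):

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <string.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int open_local_kprobe(__u32 kprobe_pmu_type, const char *symbol)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = kprobe_pmu_type;            /* dynamic PMU type */
        attr.kprobe_func = (__u64)(unsigned long)symbol;
        attr.probe_offset = 0;                  /* probe at symbol+0 */

        return perf_event_open(&attr, -1 /* any pid */, 0 /* cpu 0 */, -1, 0);
    }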

    Signed-off-by: Song Liu
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Yonghong Song
    Reviewed-by: Josef Bacik
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20171206224518.3598254-6-songliubraving@fb.com
    Signed-off-by: Ingo Molnar

    Song Liu
     

17 Oct, 2017

3 commits

  • ops->flags _should_ be 0 at this point, so setting the flag using
    bitwise or is a bit daft.

    Link: http://lkml.kernel.org/r/20171011080224.315585202@infradead.org

    Requested-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt (VMware)

    Peter Zijlstra
     
  • The function-trace perf interface is a tad messed up. Where all
    the other trace perf interfaces use a single trace hook
    registration and use per-cpu RCU based hlist to iterate the events,
    function-trace actually needs multiple hook registrations in order to
    minimize function entry patching when filters are present.

    The end result is that we iterate events both on the trace hook and on
    the hlist, which results in reporting events multiple times.

    Since function-trace cannot use the regular scheme, fix it the other
    way around, use singleton hlists.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt (VMware)

    Peter Zijlstra
     
  • Revert commit:

    75e8387685f6 ("perf/ftrace: Fix double traces of perf on ftrace:function")

    The reason I instantly stumbled on that patch is that it only addresses the
    ftrace situation and doesn't mention the other _5_ places that use this
    interface. It doesn't explain why those don't have the problem and if not, why
    their solution doesn't work for ftrace.

    It doesn't, but this is just putting more duct tape on.

    Link: http://lkml.kernel.org/r/20171011080224.200565770@infradead.org

    Cc: Zhou Chengming
    Cc: Jiri Olsa
    Cc: Ingo Molnar
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt (VMware)

    Peter Zijlstra
     

29 Aug, 2017

1 commit

  • When running perf on the ftrace:function tracepoint, there is a bug
    which can be reproduced by:

    perf record -e ftrace:function -a sleep 20 &
    perf record -e ftrace:function ls
    perf script

    ls 10304 [005] 171.853235: ftrace:function: perf_output_begin
    ls 10304 [005] 171.853237: ftrace:function: perf_output_begin
    ls 10304 [005] 171.853239: ftrace:function: task_tgid_nr_ns
    ls 10304 [005] 171.853240: ftrace:function: task_tgid_nr_ns
    ls 10304 [005] 171.853242: ftrace:function: __task_pid_nr_ns
    ls 10304 [005] 171.853244: ftrace:function: __task_pid_nr_ns

    We can see that all the function traces are doubled.

    The problem is caused by an inconsistency between the registration
    function perf_ftrace_event_register() and the probe function
    perf_ftrace_function_call(): the former registers one probe for every
    perf_event, while the latter handles all perf_events on the current
    CPU. So when two perf_events are active on the current CPU, their
    traces are doubled.

    This patch therefore adds an extra "event" parameter to
    perf_tp_event(); when it is not NULL, sample data is sent only to that
    event.
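
    The resulting prototype looks roughly like this (a sketch; only the
    final 'event' parameter is new):

    void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
                       struct pt_regs *regs, struct hlist_head *head,
                       int rctx, struct task_struct *task,
                       struct perf_event *event);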

    Signed-off-by: Zhou Chengming
    Reviewed-by: Jiri Olsa
    Acked-by: Steven Rostedt (VMware)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: acme@kernel.org
    Cc: alexander.shishkin@linux.intel.com
    Cc: huawei.libin@huawei.com
    Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
    Signed-off-by: Ingo Molnar

    Zhou Chengming
     

18 May, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support SPI based w5100 devices, from Akinobu Mita.

    2) Partial Segmentation Offload, from Alexander Duyck.

    3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

    4) Allow cls_flower stats offload, from Amir Vadai.

    5) Implement bpf blinding, from Daniel Borkmann.

    6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
    actually using FASYNC these atomics are superfluous. From Eric
    Dumazet.

    7) Run TCP more preemptibly, also from Eric Dumazet.

    8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
    driver, from Gal Pressman.

    9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

    10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

    11) Support tunneling offloads in qed, from Manish Chopra.

    12) aRFS offloading in mlx5e, from Maor Gottlieb.

    13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
    Leitner.

    14) Add MSG_EOR support to TCP, this allows controlling packet
    coalescing on application record boundaries for more accurate
    socket timestamp sampling. From Martin KaFai Lau.

    15) Fix alignment of 64-bit netlink attributes across the board, from
    Nicolas Dichtel.

    16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

    17) Several conversions of drivers to ethtool ksettings, from Philippe
    Reynes.

    18) Checksum neutral ILA in ipv6, from Tom Herbert.

    19) Factorize all of the various marvell dsa drivers into one, from
    Vivien Didelot

    20) Add VF support to qed driver, from Yuval Mintz"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
    Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
    Revert "phy dp83867: Make rgmii parameters optional"
    r8169: default to 64-bit DMA on recent PCIe chips
    phy dp83867: Make rgmii parameters optional
    phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
    bpf: arm64: remove callee-save registers use for tmp registers
    asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
    switchdev: pass pointer to fib_info instead of copy
    net_sched: close another race condition in tcf_mirred_release()
    tipc: fix nametable publication field in nl compat
    drivers: net: Don't print unpopulated net_device name
    qed: add support for dcbx.
    ravb: Add missing free_irq() calls to ravb_close()
    qed: Remove a stray tab
    net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
    net: ethernet: fec-mpc52xx: use phydev from struct net_device
    bpf, doc: fix typo on bpf_asm descriptions
    stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
    net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
    net: ethernet: fs-enet: use phydev from struct net_device
    ...

    Linus Torvalds
     

08 Apr, 2016

2 commits

  • The split allows moving the expensive update of 'struct trace_entry'
    to a later phase. Repurpose the unused 1st argument of perf_tp_event()
    to indicate the event type.

    While splitting, use a temporary variable 'rctx' instead of '*rctx' to
    avoid unnecessary loads generated by the compiler due to
    -fno-strict-aliasing.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Avoid the memset in perf_fetch_caller_regs(), since it is on the
    critical path of all tracepoints. It is called from
    perf_sw_event_sched(), perf_event_task_sched_in() and all of
    perf_trace_##call with this_cpu_ptr(&__perf_regs[..]), which is
    zero-initialized by the percpu init logic, and the subsequent call to
    perf_arch_fetch_caller_regs() initializes the same fields on all
    archs. So we can safely drop the memset from all of the above cases
    and move it into perf_ftrace_function_call(), which calls it with a
    stack-allocated pt_regs.

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

31 Mar, 2016

1 commit

  • Currently we check the sample type for ftrace:function events even if
    the event is not created as a sampling event. That prevents creating
    an ftrace:function event in counting mode.

    Make sure we check sample types only for sampling events.
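
    A sketch of the change in the event permission check (surrounding
    function assumed):

        /* Counting events have no sample_type to validate. */
        if (!is_sampling_event(p_event))
                return 0;

        /* ... existing sample_type restrictions for ftrace:function ... */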

    Before:
    $ sudo perf stat -e ftrace:function ls
    ...

    Performance counter stats for 'ls':

    ftrace:function

    0.001983662 seconds time elapsed

    After:
    $ sudo perf stat -e ftrace:function ls
    ...

    Performance counter stats for 'ls':

    44,498 ftrace:function

    0.037534722 seconds time elapsed

    Suggested-by: Namhyung Kim
    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Steven Rostedt
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1458138873-1553-2-git-send-email-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

13 Jan, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Not much new with tracing for this release. Mostly just clean ups and
    minor fixes.

    Here's what else is new:

    - A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for
    those that want both.

    - New selftest to test the instance create and delete

    - Better debug output when ftrace fails"

    * tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
    ftrace: Fix the race between ftrace and insmod
    ftrace: Add infrastructure for delayed enabling of module functions
    x86: ftrace: Fix the comments for ftrace_modify_code_direct()
    tracing: Fix comment to use tracing_on over tracing_enable
    metag: ftrace: Fix the comments for ftrace_modify_code
    sh: ftrace: Fix the comments for ftrace_modify_code()
    ia64: ftrace: Fix the comments for ftrace_modify_code()
    ftrace: Clean up ftrace_module_init() code
    ftrace: Join functions ftrace_module_init() and ftrace_init_module()
    tracing: Introduce TRACE_EVENT_FN_COND macro
    tracing: Use seq_buf_used() in seq_buf_to_user() instead of len
    bpf: Constify bpf_verifier_ops structure
    ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too
    ftrace: Remove use of control list and ops
    ftrace: Fix output of enabled_functions for showing tramp
    ftrace: Fix a typo in comment
    ftrace: Show all tramps registered to a record on ftrace_bug()
    ftrace: Add variable ftrace_expected for archs to show expected code
    ftrace: Add new type to distinguish what kind of ftrace_bug()
    tracing: Update cond flag when enabling or disabling a trigger
    ...

    Linus Torvalds
     

24 Dec, 2015

1 commit

  • Currently perf has its own list function within the ftrace infrastructure
    that seems to be used only to allow for it to have per-cpu disabling as well
    as a check to make sure that it's not called while RCU is not watching. It
    uses something called the "control_ops" which is used to iterate over ops
    under it with the control_list_func().

    The problem is that this control_ops and control_list_func unnecessarily
    complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
    (FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
    that is special with the control ops and add the needed checks within the
    generic ftrace_list_func().

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

23 Nov, 2015

1 commit

  • There were still a number of references to my old Red Hat email
    address in the kernel source. Remove these while keeping the
    Red Hat copyright notices intact.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Jan, 2015

1 commit

  • Both Linus (most recent) and Steve (a while ago) reported that perf
    related callbacks have massive stack bloat.

    The problem is that software events need a pt_regs in order to
    properly report the event location and unwind stack. And because we
    could not assume one was present we allocated one on stack and filled
    it with minimal bits required for operation.

    Now, pt_regs is quite large, so this is undesirable. Furthermore it
    turns out that most sites actually have a pt_regs pointer available,
    making this even more onerous, as the stack space is pointless waste.

    This patch addresses the problem by observing that software events
    have well defined nesting semantics, therefore we can use static
    per-cpu storage instead of on-stack.

    Linus made the further observation that all but the scheduler callers
    of perf_sw_event() have a pt_regs available, so we change the regular
    perf_sw_event() to require a valid pt_regs (where it used to be
    optional) and add perf_sw_event_sched() for the scheduler.

    We have a scheduler specific call instead of a more generic _noregs()
    like construct because we can assume non-recursion from the scheduler
    and thereby simplify the code further (_noregs would have to put the
    recursion context call inline in order to ascertain which __perf_regs
    element to use).

    One last note on the implementation of perf_trace_buf_prepare(); we
    allow .regs = NULL for those cases where we already have a pt_regs
    pointer available and do not need another.
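
    A minimal sketch of the scheme (array depth and the recursion-context
    indexing are simplified; index 0 is used here since the scheduler is
    assumed non-recursive):

    DECLARE_PER_CPU(struct pt_regs, __perf_regs[4]);

    static __always_inline void
    perf_sw_event_sched(u32 event_id, u64 nr, u64 addr)
    {
        if (static_key_false(&perf_swevent_enabled[event_id])) {
                struct pt_regs *regs = this_cpu_ptr(&__perf_regs[0]);

                perf_fetch_caller_regs(regs);
                ___perf_sw_event(event_id, nr, regs, addr);
        }
    }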

    Reported-by: Linus Torvalds
    Reported-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Javi Merino
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Oleg Nesterov
    Cc: Paul Mackerras
    Cc: Petr Mladek
    Cc: Steven Rostedt
    Cc: Tom Zanussi
    Cc: Vaibhav Nagarnaik
    Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra (Intel)
     

28 Jul, 2014

1 commit

  • There's no need to check a cloned event's permission once the parent
    has already been checked.

    Also, the code is checking the 'current' process's permissions, which
    is not the owner process for cloned events, and thus could end up with
    the wrong permission check result.

    Reported-by: Alexander Yarygin
    Tested-by: Alexander Yarygin
    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Linus Torvalds
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

24 Apr, 2014

1 commit

  • Use the NOKPROBE_SYMBOL macro to protect functions from kprobes
    instead of the __kprobes annotation in ftrace. This applies the
    nokprobe_inline annotation in some cases, because NOKPROBE_SYMBOL()
    would inhibit inlining by referring to the symbol address.

    Signed-off-by: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20140417081828.26341.55152.stgit@ltc230.yrl.intra.hitachi.co.jp
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

11 Mar, 2014

2 commits

  • Recent issues with user space callchains processing within
    page fault handler tracing showed as Peter said 'there's
    just too much fail surface'.

    The user space stack dump is just another source of this issue.

    Related list discussions:
    http://marc.info/?t=139302086500001&r=1&w=2
    http://marc.info/?t=139301437300003&r=1&w=2

    Suggested-by: Peter Zijlstra
    Signed-off-by: Jiri Olsa
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Vince Weaver
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: H. Peter Anvin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1393775800-13524-3-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • Recent issues with user space callchains processing within
    page fault handler tracing showed as Peter said 'there's
    just too much fail surface'.

    Related list discussions:

    http://marc.info/?t=139302086500001&r=1&w=2
    http://marc.info/?t=139301437300003&r=1&w=2

    Suggested-by: Peter Zijlstra
    Signed-off-by: Jiri Olsa
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: "H. Peter Anvin"
    Cc: Vince Weaver
    Cc: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1393775800-13524-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

19 Nov, 2013

2 commits

  • The 64-bit attr.config value for perf trace events was being copied into
    an "int" before doing a comparison, meaning the top 32 bits were
    being truncated.

    As far as I can tell this didn't cause any errors, but it did mean
    it was possible to create valid aliases for all the tracepoint ids
    which I don't think was intended. (For example, 0xffffffff00000018
    and 0x18 both enable the same tracepoint).
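
    The fix is simply to keep the full width of attr.config (a sketch;
    variable name assumed):

        u64 event_id = p_event->attr.config;    /* was: int event_id = ...; */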

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • Vince's perf-trinity fuzzer found yet another 'interesting' problem.

    When we sample the irq_work_exit tracepoint with period==1 (or
    PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
    infinite event generation loop:

    ,->
    | irq_work_exit() ->
    | trace_irq_work_exit() ->
    | ...
    | __perf_event_overflow() -> (due to fasync)
    | irq_work_queue() -> (irq_work_list must be empty)
    '--------- arch_irq_work_raise()

    Similar things can happen due to regular poll() wakeups if we exceed
    the ring-buffer wakeup watermark, or have an event_limit.

    To avoid this, dis-allow sampling this particular tracepoint.

    In order to achieve this, create a special perf_perm function pointer
    for each event and call this (when set) on trying to create a
    tracepoint perf event.

    [ roasted: use expr... to allow for ',' in your expression ]
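
    The resulting usage is roughly (a sketch of the new macro applied to
    the tracepoint in question):

        TRACE_EVENT_PERF_PERM(irq_work_exit,
                is_sampling_event(p_event) ? -EPERM : 0);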

    Reported-by: Vince Weaver
    Tested-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Dave Jones
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Nov, 2013

1 commit

  • The current default perf paranoid level is "1", for which
    "perf_paranoid_kernel()" returns false, giving normal users access to
    any operations that use it. Unfortunately, this includes function
    tracing, and normal users should not be allowed to enable function
    tracing by default.

    The proper level is "-1" (full perf access), which is the only level
    at which "perf_paranoid_tracepoint_raw()" grants access. Use that
    check instead for enabling function tracing.
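
    A sketch of the corrected check (surrounding perf permission function
    assumed):

        if (ftrace_event_is_function(tp_event) &&
            perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
                return -EPERM;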

    Reported-by: Dave Jones
    Reported-by: Vince Weaver
    Tested-by: Vince Weaver
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Frederic Weisbecker
    Cc: stable@vger.kernel.org # 3.4+
    CVE: CVE-2013-2930
    Fixes: ced39002f5ea ("ftrace, perf: Add support to use function tracepoint in perf")
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

19 Jul, 2013

2 commits

  • Every perf_trace_buf_prepare() caller does
    WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
    almost the same.

    Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
    the meaning of _ONCE, but I think this is fine.

    - 4947014 2932448 10104832 17984294 1126b26 vmlinux
    + 4948422 2932448 10104832 17985702 11270a6 vmlinux

    on my build.
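
    A sketch of the centralized check inside perf_trace_buf_prepare():

        if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
                      "perf buffer not large enough"))
                return NULL;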

    Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
    make no sense if hlist_empty(head). Change perf_ftrace_function_call()
    to check event_function.perf_events beforehand.

    Link: http://lkml.kernel.org/r/20130617170204.GA19803@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

21 Aug, 2012

1 commit

  • …/acme/linux into perf/core

    Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

    * Fix include order for bison/flex-generated C files, from Ben Hutchings

    * Build fixes and documentation corrections from David Ahern

    * Group parsing support, from Jiri Olsa

    * UI/gtk refactorings and improvements from Namhyung Kim

    * NULL deref fix for perf script, from Namhyung Kim

    * Assorted cleanups from Robert Richter

    * Let O= makes handle relative paths, from Steven Rostedt

    * perf script python fixes, from Feng Tang.

    * Improve 'perf lock' error message when the needed tracepoints
    are not present, from David Ahern.

    * Initial bash completion support, from Frederic Weisbecker

    * Allow building without libelf, from Namhyung Kim.

    * Support DWARF CFI based unwind to have callchains when %bp
    based unwinding is not possible, from Jiri Olsa.

    * Symbol resolution fixes; while fixing support for PPC64 files with an
    .opt ELF section was the end goal, several fixes for code that handles
    all architectures and cleanups are included, from Cody Schafer.

    * Add a description for the JIT interface, from Andi Kleen.

    * Assorted fixes for Documentation and build in 32 bit, from Robert Richter

    * Add support for non-tracepoint events in perf script python, from Feng Tang

    * Cache the libtraceevent event_format associated to each evsel early, so that we
    avoid relookups, i.e. calling pevent_find_event repeatedly when processing
    tracepoint events.

    [ This is to reduce the surface contact with libtraceevent and make
    clear what it is that the perf tools need from that lib: so far,
    parsing the common and per-event fields. ]

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

31 Jul, 2012

1 commit

  • A few events are interesting not only for the current task. For
    example, sched_stat_* events are interesting for the task which wakes
    up. For this reason, it would be good if such events were delivered to
    a target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belongs to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, then it can be analyzed by
    perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

20 Jul, 2012

2 commits

  • Pass the pt_regs as the 4th parameter to the function tracer callback.

    Later patches that implement regs passing for the architectures will require
    having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
    to take the time to pass a full set of pt_regs to the ftrace_ops callback
    function. If the arch does not support it then it should pass NULL.

    If an arch can pass full regs, then it should define:
    ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1
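
    Together with the ops argument introduced in the next entry below, the
    callback signature becomes (a sketch):

    typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *op, struct pt_regs *regs);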

    Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org

    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently the function trace callback receives only the ip and
    parent_ip of the function that it traced. It would be more powerful to
    also pass the ftrace_ops that registered the function. This allows the
    same function to act differently depending on which ftrace_ops
    registered it.

    Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

22 Feb, 2012

3 commits

  • Adding support to filter function trace event via perf
    interface. It is now possible to use filter interface
    in the perf tool like:

    perf record -e ftrace:function --filter="(ip == mm_*)" ls

    The filter syntax is restricted to the 'ip' field only, and the
    following operators are accepted: '==', '!=', '||', ending up with
    filter strings like:

    ip == f1[, ]f2 ... || ip != f3[, ]f4 ...

    with comma ',' or space ' ' as a function separator. If the
    space ' ' is used as a separator, the right side of the
    assignment needs to be enclosed in double quotes '"', e.g.:

    perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
    perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
    perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls

    The '==' operator adds trace filter with same effect as would
    be added via set_ftrace_filter file.

    The '!=' operator adds trace filter with same effect as would
    be added via set_ftrace_notrace file.

    The right side of the '!=' and '==' operators is a list of functions
    or regexps to be added to the filter, separated by spaces.

    The '||' operator is used for connecting multiple filter definitions
    together. It is possible to have more than one '==' and '!='
    operators within one filter string.

    Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com

    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • Adding perf registration support for the ftrace function event,
    so it is now possible to register it via perf interface.

    The perf_event struct statically contains ftrace_ops as a handle
    for function tracer. The function tracer is registered/unregistered
    in open/close actions.

    To be efficient, we enable/disable the ftrace_ops each time the traced
    process is scheduled in/out (via the TRACE_REG_PERF_(ADD|DEL)
    handlers). This way tracing is enabled only when the process is
    running. This is intentionally done this way instead of via the
    event's hw state PERF_HES_STOPPED, which would not disable the
    ftrace_ops.

    It is now possible to use function trace within perf commands
    like:

    perf record -e ftrace:function ls
    perf stat -e ftrace:function ls

    Allowed only for root.

    Link: http://lkml.kernel.org/r/1329317514-8131-6-git-send-email-jolsa@redhat.com

    Acked-by: Frederic Weisbecker
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
    perf event schedule in/out actions.

    The add action is invoked for when the perf event is scheduled in,
    while the del action is invoked when the event is scheduled out.
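
    A sketch of the additions to enum trace_reg (existing members elided):

    enum trace_reg {
        TRACE_REG_REGISTER,
        TRACE_REG_UNREGISTER,
        /* ... perf register/open actions ... */
        TRACE_REG_PERF_ADD,     /* perf event scheduled in on a CPU */
        TRACE_REG_PERF_DEL,     /* perf event scheduled out */
    };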

    Link: http://lkml.kernel.org/r/1329317514-8131-4-git-send-email-jolsa@redhat.com

    Acked-by: Frederic Weisbecker
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa