01 Feb, 2019

1 commit

  • Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
    [ 13.007000] WARNING: possible circular locking dependency detected
    [ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
    [ 13.008124] ------------------------------------------------------
    [ 13.008624] test_progs/246 is trying to acquire lock:
    [ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
    [ 13.009770]
    [ 13.009770] but task is already holding lock:
    [ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
    [ 13.010877]
    [ 13.010877] which lock already depends on the new lock.
    [ 13.010877]
    [ 13.011532]
    [ 13.011532] the existing dependency chain (in reverse order) is:
    [ 13.012129]
    [ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
    [ 13.012582] perf_event_query_prog_array+0x9b/0x130
    [ 13.013016] _perf_ioctl+0x3aa/0x830
    [ 13.013354] perf_ioctl+0x2e/0x50
    [ 13.013668] do_vfs_ioctl+0x8f/0x6a0
    [ 13.014003] ksys_ioctl+0x70/0x80
    [ 13.014320] __x64_sys_ioctl+0x16/0x20
    [ 13.014668] do_syscall_64+0x4a/0x180
    [ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 13.015469]
    [ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
    [ 13.015910] perf_event_init_cpu+0x5a/0x90
    [ 13.016291] perf_event_init+0x1b2/0x1de
    [ 13.016654] start_kernel+0x2b8/0x42a
    [ 13.016995] secondary_startup_64+0xa4/0xb0
    [ 13.017382]
    [ 13.017382] -> #2 (pmus_lock){+.+.}:
    [ 13.017794] perf_event_init_cpu+0x21/0x90
    [ 13.018172] cpuhp_invoke_callback+0xb3/0x960
    [ 13.018573] _cpu_up+0xa7/0x140
    [ 13.018871] do_cpu_up+0xa4/0xc0
    [ 13.019178] smp_init+0xcd/0xd2
    [ 13.019483] kernel_init_freeable+0x123/0x24f
    [ 13.019878] kernel_init+0xa/0x110
    [ 13.020201] ret_from_fork+0x24/0x30
    [ 13.020541]
    [ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
    [ 13.021051] static_key_slow_inc+0xe/0x20
    [ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
    [ 13.021891] perf_trace_event_init+0x11f/0x250
    [ 13.022297] perf_trace_init+0x6b/0xa0
    [ 13.022644] perf_tp_event_init+0x25/0x40
    [ 13.023011] perf_try_init_event+0x6b/0x90
    [ 13.023386] perf_event_alloc+0x9a8/0xc40
    [ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
    [ 13.024173] do_syscall_64+0x4a/0x180
    [ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 13.024968]
    [ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
    [ 13.025434] __mutex_lock+0x86/0x970
    [ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
    [ 13.026215] bpf_probe_register+0x40/0x60
    [ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
    [ 13.027042] __do_sys_bpf+0x94f/0x1a90
    [ 13.027389] do_syscall_64+0x4a/0x180
    [ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 13.028171]
    [ 13.028171] other info that might help us debug this:
    [ 13.028171]
    [ 13.028807] Chain exists of:
    [ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
    [ 13.028807]
    [ 13.029666] Possible unsafe locking scenario:
    [ 13.029666]
    [ 13.030140]        CPU0                    CPU1
    [ 13.030510]        ----                    ----
    [ 13.030875]   lock(bpf_event_mutex);
    [ 13.031166]                                lock(&cpuctx_mutex);
    [ 13.031645]                                lock(bpf_event_mutex);
    [ 13.032135]   lock(tracepoints_mutex);
    [ 13.032441]
    [ 13.032441] *** DEADLOCK ***
    [ 13.032441]
    [ 13.032911] 1 lock held by test_progs/246:
    [ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
    [ 13.033909]
    [ 13.033909] stack backtrace:
    [ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
    [ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
    [ 13.035657] Call Trace:
    [ 13.035859] dump_stack+0x5f/0x8b
    [ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
    [ 13.036526] __lock_acquire+0x1158/0x1350
    [ 13.036852] ? lock_acquire+0x98/0x190
    [ 13.037154] lock_acquire+0x98/0x190
    [ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
    [ 13.037876] __mutex_lock+0x86/0x970
    [ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
    [ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
    [ 13.039028] ? __mutex_lock+0x86/0x970
    [ 13.039337] ? __mutex_lock+0x24a/0x970
    [ 13.039649] ? bpf_probe_register+0x1d/0x60
    [ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
    [ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
    [ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
    [ 13.041325] bpf_probe_register+0x40/0x60
    [ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
    [ 13.042068] ? __might_fault+0x3e/0x90
    [ 13.042374] __do_sys_bpf+0x94f/0x1a90
    [ 13.042678] do_syscall_64+0x4a/0x180
    [ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 13.043382] RIP: 0033:0x7f23b10a07f9
    [ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
    [ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
    [ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
    [ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
    [ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
    [ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

    Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister(),
    there is no need to take bpf_event_mutex too.
    bpf_event_mutex protects modifications to the prog array used in kprobe/perf bpf progs;
    bpf_raw_tracepoints don't need to take this mutex.
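
    A minimal sketch of the resulting attach path (assuming the fix simply stops
    taking the mutex here; the exact function layout in kernel/trace/bpf_trace.c
    may differ):

        /* raw tracepoint attach relies on tracepoints_mutex, taken inside
         * tracepoint_probe_register(), instead of bpf_event_mutex */
        int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
        {
                return __bpf_probe_register(btp, prog);
        }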

    Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

22 Jan, 2019

1 commit

  • For the original mode of operation it isn't needed, since we report back
    errors via PERF_RECORD_LOST records in the ring buffer, but for use in
    bpf_perf_event_output() it is convenient to return the errors, basically
    -ENOSPC.

    Currently bpf_perf_event_output() returns an error indication, but the last
    thing it does, pushing the sample to the ring buffer, can fail, and that
    failure won't be reported back to its users. Fix it.
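
    A sketch of the idea (assuming the internal output routine now returns an
    error code; exact signatures in kernel/events/core.c may differ):

        /* propagate the perf_output_begin() failure, typically -ENOSPC,
         * instead of silently dropping it */
        err = perf_event_output(event, sd, regs);
        if (err)
                return err;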

    Reported-by: Jamal Hadi Salim
    Tested-by: Jamal Hadi Salim
    Acked-by: Peter Zijlstra (Intel)
    Cc: Adrian Hunter
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/r/20190118150938.GN5823@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted in this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iter_iov code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.
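
    For illustration, a typical conversion looks like this (the surrounding
    snippet is a hypothetical caller; the two-argument access_ok() form is
    what this patch introduces):

        /* before: the 'type' argument was required but ignored */
        if (!access_ok(VERIFY_READ, uaddr, size))
                return -EFAULT;

        /* after: a pure range check */
        if (!access_ok(uaddr, size))
                return -EFAULT;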

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

19 Dec, 2018

1 commit

  • Distributions build drivers as modules, including network and filesystem
    drivers which export numerous tracepoints. This enables
    bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.

    Signed-off-by: Matt Mullins
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Matt Mullins
     

24 Nov, 2018

1 commit

    A format string consisting of "%p" or "%s" followed by an invalid
    specifier (e.g. "%p%\n" or "%s%") could pass the check, which
    would make format_decode() (lib/vsprintf.c) warn.
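
    For illustration, a bpf program hitting the issue might have used the
    helper like this (hypothetical sketch; the helper takes the format
    string and its size):

        char fmt[] = "%p%\n";   /* "%p" followed by an invalid specifier */
        bpf_trace_printk(fmt, sizeof(fmt), ptr);   /* previously passed the check */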

    Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
    Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
    Signed-off-by: Martynas Pumputis
    Signed-off-by: Daniel Borkmann

    Martynas Pumputis
     

05 Jun, 2018

1 commit

  • Commit bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id()
    helper") introduced a new helper bpf_get_current_cgroup_id().
    The helper has a dependency on CONFIG_CGROUPS.

    When CONFIG_CGROUPS is not defined, using the helper results in
    the following verifier error:
        kernel subsystem misconfigured func bpf_get_current_cgroup_id#80
    which is hard for users to interpret.
    Guarding the reference to bpf_get_current_cgroup_id_proto with
    CONFIG_CGROUPS results in the clearer message below:
        unknown func bpf_get_current_cgroup_id#80
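
    The fix is essentially a guard of this shape (a sketch; the exact
    placement in the tracing func-proto switch may differ):

        #ifdef CONFIG_CGROUPS
                case BPF_FUNC_get_current_cgroup_id:
                        return &bpf_get_current_cgroup_id_proto;
        #endif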

    Fixes: bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id() helper")
    Suggested-by: Daniel Borkmann
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

04 Jun, 2018

1 commit

  • bpf has been used extensively for tracing. For example, bcc
    contains an almost full set of bpf-based tools to trace kernel
    and user functions/events. Most tracing tools are currently
    either filtered based on pid or system-wide.

    Containers have been used quite extensively in industry and
    cgroup is often used together to provide resource isolation
    and protection. Several processes may run inside the same
    container. It is often desirable to get container-level tracing
    results as well, e.g. syscall count, function count, I/O
    activity, etc.

    This patch implements a new helper, bpf_get_current_cgroup_id(),
    which will return cgroup id based on the cgroup within which
    the current task is running.

    A later patch will provide an example to show that
    userspace can get the same cgroup id, so it can
    configure a filter or policy in the bpf program based on
    the task's cgroup id.

    The helper is currently implemented for tracing. It can
    be added to other program types as well when needed.
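
    A hedged usage sketch from a tracing program (target_cgid is a
    hypothetical filter value obtained from userspace):

        __u64 cgid = bpf_get_current_cgroup_id();
        if (cgid != target_cgid)
                return 0;   /* skip events from other containers */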

    Acked-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

03 Jun, 2018

1 commit

  • Wang reported that all the testcases for BPF_PROG_TYPE_PERF_EVENT
    program type in test_verifier report the following errors on x86_32:

    172/p unpriv: spill/fill of different pointers ldx FAIL
    Unexpected error message!
    0: (bf) r6 = r10
    1: (07) r6 += -8
    2: (15) if r1 == 0x0 goto pc+3
    R1=ctx(id=0,off=0,imm=0) R6=fp-8,call_-1 R10=fp0,call_-1
    3: (bf) r2 = r10
    4: (07) r2 += -76
    5: (7b) *(u64 *)(r6 +0) = r2
    6: (55) if r1 != 0x0 goto pc+1
    R1=ctx(id=0,off=0,imm=0) R2=fp-76,call_-1 R6=fp-8,call_-1 R10=fp0,call_-1 fp-8=fp
    7: (7b) *(u64 *)(r6 +0) = r1
    8: (79) r1 = *(u64 *)(r6 +0)
    9: (79) r1 = *(u64 *)(r1 +68)
    invalid bpf_context access off=68 size=8

    378/p check bpf_perf_event_data->sample_period byte load permitted FAIL
    Failed to load prog 'Permission denied'!
    0: (b7) r0 = 0
    1: (71) r0 = *(u8 *)(r1 +68)
    invalid bpf_context access off=68 size=1

    379/p check bpf_perf_event_data->sample_period half load permitted FAIL
    Failed to load prog 'Permission denied'!
    0: (b7) r0 = 0
    1: (69) r0 = *(u16 *)(r1 +68)
    invalid bpf_context access off=68 size=2

    380/p check bpf_perf_event_data->sample_period word load permitted FAIL
    Failed to load prog 'Permission denied'!
    0: (b7) r0 = 0
    1: (61) r0 = *(u32 *)(r1 +68)
    invalid bpf_context access off=68 size=4

    381/p check bpf_perf_event_data->sample_period dword load permitted FAIL
    Failed to load prog 'Permission denied'!
    0: (b7) r0 = 0
    1: (79) r0 = *(u64 *)(r1 +68)
    invalid bpf_context access off=68 size=8

    The reason is that struct pt_regs on x86_32 doesn't fully align to an 8 byte
    boundary due to its size of 68 bytes, so bpf_ctx_narrow_access_ok()
    bails out because off & (size_default - 1), that is 68 & 7,
    doesn't cleanly align in the case of sample_period access from struct
    bpf_perf_event_data. Hence the verifier wrongly thinks we might be doing an
    unaligned access even though the underlying arch can handle it just fine.
    Therefore adjust this down to machine size, and check and rewrite the
    offset for narrow access on that basis. We also need to fix the corresponding
    pe_prog_is_valid_access(), since we hit the check for off % size != 0
    (e.g. 68 % 8 -> 4) in the first and last test. With that in place, progs
    for tracing work on x86_32.

    Reported-by: Wang YanQing
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Tested-by: Wang YanQing
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

25 May, 2018

1 commit

  • Currently, suppose a userspace application has loaded a bpf program
    and attached it to a tracepoint/kprobe/uprobe, and a bpf
    introspection tool, e.g., bpftool, wants to show which bpf program
    is attached to which tracepoint/kprobe/uprobe. Such attachment
    information will be really useful to understand the overall bpf
    deployment in the system.

    There is a name field (16 bytes) for each program, which could
    be used to encode the attachment point. There are some drawbacks
    for this approaches. First, bpftool user (e.g., an admin) may not
    really understand the association between the name and the
    attachment point. Second, if one program is attached to multiple
    places, encoding a proper name which can imply all these
    attachments becomes difficult.

    This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
    Given a pid and fd, if the <pid, fd> pair is associated with a
    tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
        . prog_id
        . tracepoint name, or
        . k[ret]probe funcname + offset or kernel addr, or
        . u[ret]probe filename + offset
    to the userspace.
    The user can use "bpftool prog" to find more information about
    bpf program itself with prog_id.
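
    A hedged userspace sketch (field names follow the task_fd_query member
    of union bpf_attr; the buffer handling is illustrative):

        union bpf_attr attr = {};
        char buf[256];

        attr.task_fd_query.pid     = pid;
        attr.task_fd_query.fd      = fd;
        attr.task_fd_query.buf     = (__u64)(unsigned long)buf;
        attr.task_fd_query.buf_len = sizeof(buf);
        if (syscall(__NR_bpf, BPF_TASK_FD_QUERY, &attr, sizeof(attr)) == 0)
                printf("prog_id %u attached to %s\n",
                       attr.task_fd_query.prog_id, buf);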

    Acked-by: Martin KaFai Lau
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

30 Apr, 2018

1 commit

    Currently, the bpf_current_task_under_cgroup helper has a check where if
    the BPF program is running in_interrupt(), it will return -EINVAL. This
    prevents the helper from being used in many useful scenarios, particularly
    BPF programs attached to perf events.

    This commit removes the check. Tested in a few NMI (perf event) and
    softirq contexts; the helper returns the correct result.
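
    A hedged example of the now-possible usage from a perf_event program
    (cgroup_map is a hypothetical BPF_MAP_TYPE_CGROUP_ARRAY):

        /* returns 1 when the current task is under the cgroup at index 0 */
        if (bpf_current_task_under_cgroup(&cgroup_map, 0) != 1)
                return 0;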

    Signed-off-by: Teng Qin
    Signed-off-by: Alexei Starovoitov

    Teng Qin
     

29 Apr, 2018

1 commit

    Currently, stackmap and the bpf_get_stackid helper are provided
    for bpf programs to get the stack trace. This approach has
    a limitation though: if two stack traces have the same hash,
    only one will get stored in the stackmap table,
    so some stack traces are missing from the user's perspective.

    This patch implements a new helper, bpf_get_stack, which
    sends stack traces directly to the bpf program. The bpf program
    is able to see all stack traces, and can then do in-kernel
    processing or send stack traces to user space through
    a shared map or bpf_perf_event_output.
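
    A hedged usage sketch (flags 0 requests a kernel stack; "events" is a
    hypothetical perf event array map, and the buffer size is illustrative):

        __u64 stack[64];
        long len = bpf_get_stack(ctx, stack, sizeof(stack), 0);
        if (len > 0)
                bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                                      stack, len);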

    Acked-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

11 Apr, 2018

1 commit

  • syzbot reported a possible deadlock in perf_event_detach_bpf_prog.
    The error details:
    ======================================================
    WARNING: possible circular locking dependency detected
    4.16.0-rc7+ #3 Not tainted
    ------------------------------------------------------
    syz-executor7/24531 is trying to acquire lock:
    (bpf_event_mutex){+.+.}, at: [] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854

    but task is already holding lock:
    (&mm->mmap_sem){++++}, at: [] vm_mmap_pgoff+0x198/0x280 mm/util.c:353

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++}:
    __might_fault+0x13a/0x1d0 mm/memory.c:4571
    _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
    copy_to_user include/linux/uaccess.h:155 [inline]
    bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
    perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
    _perf_ioctl kernel/events/core.c:4750 [inline]
    perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
    vfs_ioctl fs/ioctl.c:46 [inline]
    do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
    SYSC_ioctl fs/ioctl.c:701 [inline]
    SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    -> #0 (bpf_event_mutex){+.+.}:
    lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
    __mutex_lock_common kernel/locking/mutex.c:756 [inline]
    __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
    perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
    perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
    _free_event+0xbdb/0x10f0 kernel/events/core.c:4116
    put_event+0x24/0x30 kernel/events/core.c:4204
    perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
    remove_vma+0xb4/0x1b0 mm/mmap.c:172
    remove_vma_list mm/mmap.c:2490 [inline]
    do_munmap+0x82a/0xdf0 mm/mmap.c:2731
    mmap_region+0x59e/0x15a0 mm/mmap.c:1646
    do_mmap+0x6c0/0xe00 mm/mmap.c:1483
    do_mmap_pgoff include/linux/mm.h:2223 [inline]
    vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
    SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
    SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
    SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
    SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    other info that might help us debug this:

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&mm->mmap_sem);
                                   lock(bpf_event_mutex);
                                   lock(&mm->mmap_sem);
      lock(bpf_event_mutex);

    *** DEADLOCK ***
    ======================================================

    The bug was introduced by commit f371b304f12e ("bpf/tracing: allow
    user space to query prog array on the same tp"), where copy_to_user,
    which requires mm->mmap_sem, is called inside the bpf_event_mutex lock.
    At the same time, during perf_event file descriptor close,
    mm->mmap_sem is held first and then the subsequent
    perf_event_detach_bpf_prog needs the bpf_event_mutex lock.
    Such a scenario caused a deadlock.

    As suggested by Daniel, moving copy_to_user out of the
    bpf_event_mutex lock should fix the problem.

    Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
    Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

31 Mar, 2018

1 commit

  • == The problem ==

    There are use-cases when a program of some type can be attached to
    multiple attach points and those attach points must have different
    permissions to access context or to call helpers.

    E.g. context structure may have fields for both IPv4 and IPv6 but it
    doesn't make sense to read from / write to IPv6 field when attach point
    is somewhere in IPv4 stack.

    Same applies to BPF-helpers: it may make sense to call some helper from
    some attach point, but not from other for same prog type.

    == The solution ==

    Introduce an `expected_attach_type` field in `struct bpf_attr` for
    the `BPF_PROG_LOAD` command. If the scenario described in "The problem"
    section is the case for some prog type, the field will be checked twice:

    1) At load time the prog type is checked to see if the attach type for it
    must be known to validate program permissions correctly. The prog will be
    rejected with EINVAL if that's the case and `expected_attach_type` is
    not specified or has an invalid value.

    2) At attach time `attach_type` is compared with `expected_attach_type`,
    if the prog type requires one, and, if they differ, the attach will
    be rejected with EINVAL.

    The `expected_attach_type` is now available as part of `struct bpf_prog`
    in both `bpf_verifier_ops->is_valid_access()` and
    `bpf_verifier_ops->get_func_proto()` and can be used to check context
    accesses and calls to helpers correspondingly.
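
    A hedged load-time sketch (the prog/attach type values are illustrative;
    this series adds e.g. BPF_PROG_TYPE_CGROUP_SOCK_ADDR with
    BPF_CGROUP_INET4_BIND):

        union bpf_attr attr = {};
        attr.prog_type            = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
        attr.expected_attach_type = BPF_CGROUP_INET4_BIND;
        /* ... insns, license, etc. ... */
        prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));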

    Initially the idea was discussed by Alexei Starovoitov and
    Daniel Borkmann here:
    https://marc.info/?l=linux-netdev&m=152107378717201&w=2

    Signed-off-by: Andrey Ignatov
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Andrey Ignatov
     

29 Mar, 2018

1 commit

  • Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
    kernel internal arguments of the tracepoints in their raw form.

    From the bpf program point of view, access to the arguments looks like:

        struct bpf_raw_tracepoint_args {
                __u64 args[0];
        };

        int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
        {
                // program can read args[N] where N depends on tracepoint
                // and statically verified at program load+attach time
        }

    The kprobe+bpf infrastructure allows programs to access function arguments.
    This feature allows programs to access raw tracepoint arguments.

    Similar to proposed 'dynamic ftrace events' there are no abi guarantees
    to what the tracepoints arguments are and what their meaning is.
    The program needs to type cast args properly and use bpf_probe_read()
    helper to access struct fields when argument is a pointer.

    For every tracepoint a __bpf_trace_##call function is prepared.
    In assembler it looks like:

        (gdb) disassemble __bpf_trace_xdp_exception
        Dump of assembler code for function __bpf_trace_xdp_exception:
          0xffffffff81132080 <+0>: mov  %ecx,%ecx
          0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3>

    where

        TRACE_EVENT(xdp_exception,
                TP_PROTO(const struct net_device *dev,
                         const struct bpf_prog *xdp, u32 act),

    The above assembler snippet is casting 32-bit 'act' field into 'u64'
    to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
    All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
    and in total this approach adds 7k bytes to .text.

    This approach gives the lowest possible overhead
    while calling trace_xdp_exception() from kernel C code and
    transitioning into bpf land.
    Since tracepoint+bpf are used at speeds of 1M+ events per second
    this is valuable optimization.

    The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
    that returns anon_inode FD of 'bpf-raw-tracepoint' object.

    The user space looks like:

        // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
        prog_fd = bpf_prog_load(...);
        // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
        raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

    Ctrl-C of tracing daemon or cmdline tool that uses this feature
    will automatically detach bpf program, unload it and
    unregister tracepoint probe.

    On the kernel side the __bpf_raw_tp_map section of pointers to
    tracepoint definition and to __bpf_trace_*() probe function is used
    to find a tracepoint with "xdp_exception" name and
    corresponding __bpf_trace_xdp_exception() probe function
    which are passed to tracepoint_probe_register() to connect probe
    with tracepoint.

    Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
    tracepoint mechanisms. perf_event_open() can be used in parallel
    on the same tracepoint.
    Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are
    permitted, each with its own bpf program. The kernel will execute
    all tracepoint probes and all attached bpf programs.

    In the future bpf_raw_tracepoints can be extended with
    query/introspection logic.

    __bpf_raw_tp_map section logic was contributed by Steven Rostedt

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Steven Rostedt (VMware)
    Acked-by: Steven Rostedt (VMware)
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

23 Mar, 2018

1 commit

  • Fun set of conflict resolutions here...

    For the mac80211 stuff, these were fortunately just parallel
    adds. Trivially resolved.

    In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
    function phy_disable_interrupts() earlier in the file, whilst in
    'net-next' the phy_error() call from this function was removed.

    In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
    'rt_table_id' member of rtable collided with a bug fix in 'net' that
    added a new struct member "rt_mtu_locked" which needs to be copied
    over here.

    The mlxsw driver conflict consisted of net-next separating
    the span code and definitions into separate files, whilst
    a 'net' bug fix made some changes to that moved code.

    The mlx5 infiniband conflict resolution was quite non-trivial,
    the RDMA tree's merge commit was used as a guide here, and
    here are their notes:

    ====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch. This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
    drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
    (IB/mlx5: Fix cleanup order on unload), added to for-rc, and
    commit b5ca15ad7e61 (IB/mlx5: Add proper representors support),
    added as part of the devel cycle, both needed to modify the
    init/de-init functions used by mlx5. To support the new
    representors, the new functions added by the cleanup patch
    needed to be made non-static, and the init/de-init list
    added by the representors patch needed to be modified to
    match the init/de-init list changes made by the cleanup
    patch.
    Updates:
    drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
    prototypes added by representors patch to reflect new function
    names as changed by cleanup patch
    drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
    stage list to match new order from cleanup patch
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Mar, 2018

1 commit

    Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
    added the helper bpf_perf_prog_read_value so that perf_event type programs
    can read the event counter and enabled/running time.
    This commit, however, introduced a bug which allows this helper
    to be used by tracepoint type programs. This is incorrect as
    bpf_perf_prog_read_value needs to access the perf_event through
    its bpf_perf_event_data_kern type context, which is not available
    for tracepoint type programs.

    This patch fixes the issue by separating the bpf_func_proto between
    tracepoint and perf_event type programs and removing bpf_perf_prog_read_value
    from the tracepoint func prototype.

    Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

08 Mar, 2018

1 commit

    This commit adds a new field "addr" to bpf_perf_event_data which can be
    read and used by bpf programs attached to perf events. The value of the
    field is copied from bpf_perf_event_data_kern.addr and contains the
    address value recorded by specifying sample_type with PERF_SAMPLE_ADDR
    when calling perf_event_open.
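
    A hedged sketch of a perf_event program reading the new field (the event
    must be opened with sample_type including PERF_SAMPLE_ADDR; SEC() comes
    from the usual bpf_helpers.h):

        SEC("perf_event")
        int prog(struct bpf_perf_event_data *ctx)
        {
                __u64 addr = ctx->addr;   /* sampled address, e.g. for memory events */
                return addr != 0;
        }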

    Signed-off-by: Teng Qin
    Signed-off-by: Daniel Borkmann

    Teng Qin
     

15 Feb, 2018

1 commit

    syzkaller tried to perform a prog query in perf_event_query_prog_array()
    where struct perf_event_query_bpf had an ids_len of 1,073,741,353,
    causing a warning due to a failed kcalloc() allocation out of the
    bpf_prog_array_copy_to_user() helper. Given we cannot attach more than
    64 programs to a perf event, there's no point in allowing huge ids_len.
    Therefore, allow a buffer that would fit the maximum number of ids and
    also add a __GFP_NOWARN to the temporary ids buffer.
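
    A sketch of the resulting bound (BPF_TRACE_MAX_PROGS is the existing
    64-program limit in kernel/trace/bpf_trace.c; exact error handling may
    differ):

        if (query.ids_len > BPF_TRACE_MAX_PROGS)
                return -E2BIG;
        /* the bounded allocation can no longer trigger the kcalloc warning */
        ids = kcalloc(ids_len, sizeof(u32), GFP_USER | __GFP_NOWARN);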

    Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
    Fixes: 0911287ce32b ("bpf: fix bpf_prog_array_copy_to_user() issues")
    Reported-by: syzbot+cab5816b0edbabf598b3@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

18 Jan, 2018

1 commit

    Currently, for the bpf_trace_printk helper, the fake ip address 0x1
    is used, with comments saying that the fake ip will not be printed.
    This is indeed true for 4.12 and earlier versions, but for
    4.13 and later versions, the ip address will be printed if
    it cannot be resolved with kallsyms. Running the samples/bpf/tracex5
    program, you will have the following in the debugfs
    trace_pipe output:
    ...
    <...>-1819 [003] .... 443.497877: 0x00000001: mmap
    <...>-1819 [003] .... 443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid)
    ...

    The kernel commit that changed this behavior is:
    commit feaf1283d11794b9d518fcfd54b6bf8bee1f0b4b
    Author: Steven Rostedt (VMware)
    Date: Thu Jun 22 17:04:55 2017 -0400

    tracing: Show address when function names are not found
    ...

    This patch changes the comment and also alters the fake ip
    address to 0x0, as users may think 0x1 has some special meaning
    while it doesn't. The new output:
    ...
    <...>-1799 [002] .... 25.953576: 0: mmap
    <...>-1799 [002] .... 25.953865: 0: read(fd=0, buf=00000000053936b5, size=512)
    ...

    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

13 Jan, 2018

3 commits

    The error-injection framework is not limited to use
    by kprobes or bpf; other kernel subsystems can use it
    freely for checking the safety of error injection, e.g.
    livepatch, ftrace etc.
    So separate the error-injection framework from kprobes.

    Some differences have been made:

    - "kprobe" word is removed from any APIs/structures.
    - BPF_ALLOW_ERROR_INJECTION() is renamed to
    ALLOW_ERROR_INJECTION() since it is no longer limited to BPF.
    - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
    feature. It is automatically enabled if the arch supports
    error injection feature for kprobe or ftrace etc.
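
    A hedged usage sketch of the renamed annotation (in later kernels the
    macro also takes an injectable error type; should_failslab() is a known
    user, but treat the exact form as an assumption):

        /* opt should_failslab() in for error injection */
        ALLOW_ERROR_INJECTION(should_failslab, ERRNO);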

    Signed-off-by: Masami Hiramatsu
    Reviewed-by: Josef Bacik
    Signed-off-by: Alexei Starovoitov

    Masami Hiramatsu
     
    Compare the instruction pointer with the original one on the
    stack, instead of using the per-cpu bpf_kprobe_override flag.

    This patch also consolidates reset_current_kprobe() and
    preempt_enable_no_resched() blocks. Those can be done
    in one place.

    Signed-off-by: Masami Hiramatsu
    Reviewed-by: Josef Bacik
    Signed-off-by: Alexei Starovoitov

    Masami Hiramatsu
     
    Check whether the error injectable event is on function entry or not.
    Currently it checks whether the event is ftrace-based kprobes or not,
    but that is wrong. It should check if the event is on the entry
    of the target function. Since error injection will override a function
    to just return with a modified return value, that operation must
    be done before the target function starts making a stack frame.

    As a side effect, bpf error injection no longer needs to depend on
    the function tracer. It can work with software-breakpoint based kprobe
    events too.

    Signed-off-by: Masami Hiramatsu
    Reviewed-by: Josef Bacik
    Signed-off-by: Alexei Starovoitov

    Masami Hiramatsu
     

18 Dec, 2017

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2017-12-18

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Allow arbitrary function calls from one BPF function to another BPF function.
    As of today, when writing BPF programs, __always_inline had to be used in
    the BPF C programs for all functions, unnecessarily causing LLVM to inflate
    code size. Handle this more naturally with support for BPF to BPF calls
    such that this __always_inline restriction can be overcome. As a result,
    it allows for better optimized code and finally makes it possible to
    introduce core BPF libraries in the future that can be reused across
    different projects. x86 and arm64 JIT support was added as well, from Alexei.

    2) Add infrastructure for tagging functions as error injectable and allow for
    BPF to return arbitrary error values when BPF is attached via kprobes on
    those. This way of injecting errors generically eases testing and debugging
    without having to recompile or restart the kernel. Tags for opting-in for
    this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.

    3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
    call for XDP programs. First part of this work adds handling of BPF
    capabilities included in the firmware, and the later patches add support
    to the nfp verifier part and JIT as well as some small optimizations,
    from Jakub.

    4) The bpftool now also gets support for basic cgroup BPF operations such
    as attaching, detaching and listing current BPF programs. As a requirement
    for the attach part, bpftool can now also load object files through
    'bpftool prog load'. This reuses libbpf which we have in the kernel tree
    as well. bpftool-cgroup man page is added along with it, from Roman.

    5) Back then commit e87c6bc3852b ("bpf: permit multiple bpf attachments for
    a single perf event") added support for attaching multiple BPF programs
    to a single perf event. Given they are configured through perf's ioctl()
    interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
    command in this work in order to return an array of one or multiple BPF
    prog ids that are currently attached, from Yonghong.

    6) Various minor fixes and cleanups to the bpftool's Makefile as well
    as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
    itself or prior installed documentation related to it, from Quentin.

    7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
    required for the test_dev_cgroup test case to run, from Naresh.

    8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.

    9) Fix libbpf's exit code from the Makefile when libelf was not found in
    the system, also from Jakub.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Dec, 2017

1 commit

  • Commit f371b304f12e ("bpf/tracing: allow user space to
    query prog array on the same tp") introduced a perf
    ioctl command to query prog array attached to the
    same perf tracepoint. The commit introduced a
    compilation error under certain config conditions, e.g.,
    (1). CONFIG_BPF_SYSCALL is not defined, or
    (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
    nor CONFIG_KPROBE_EVENTS is defined.

    Error message:
    kernel/events/core.o: In function `perf_ioctl':
    core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

    This patch fixes the error by guarding the real definition under
    CONFIG_BPF_EVENTS and providing a static inline dummy function
    when CONFIG_BPF_EVENTS is not defined.
    It renames the function from bpf_event_query_prog_array to
    perf_event_query_prog_array and moves the definition from linux/bpf.h
    to linux/trace_events.h so the definition is in proximity to
    other prog_array related functions.
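
    A sketch of the guard pattern (the stub's return value is my assumption;
    the real header may differ):

        #ifdef CONFIG_BPF_EVENTS
        int perf_event_query_prog_array(struct perf_event *event, void __user *info);
        #else
        static inline int
        perf_event_query_prog_array(struct perf_event *event, void __user *info)
        {
                return -EOPNOTSUPP;
        }
        #endif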

    Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
    Reported-by: Stephen Rothwell
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

13 Dec, 2017

3 commits

  • When tracing and networking programs are both attached in the
    system and both use event-output helpers that eventually call
    into perf_event_output(), then we could end up in a situation
    where the tracing attached program runs in user context while
    a cls_bpf program is triggered on that same CPU out of softirq
    context.

    Since both rely on the same per-cpu perf_sample_data, we could
    potentially corrupt it. This can only ever happen in a combination
    of the two types; all tracing programs use a bpf_prog_active
    counter to bail out in case a program is already running on
    that CPU out of a different context. XDP and cls_bpf programs
    by themselves don't have this issue as they run in the same
    context only. Therefore, split both perf_sample_data so they
    cannot be accessed from each other.
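
    A sketch of the split (the per-cpu variable names follow my reading of
    the fix and may differ):

        /* tracing programs keep their own per-cpu sample data ... */
        static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
        /* ... while the networking bpf_event_output() path uses a separate one */
        static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);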

    Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Tested-by: Song Liu
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
    Error injection is sloppy and very ad-hoc. BPF could fill this niche
    perfectly with its kprobe functionality. We could make sure errors are
    only triggered in specific call chains that we care about with very
    specific situations. Accomplish this with the bpf_override_function
    helper. This will modify the probed caller's return value to the
    specified value and set the PC to an override function that simply
    returns, bypassing the originally probed function. This gives us a nice
    clean way to implement systematic error injection for all of our code
    paths.
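
    A hedged example of the intended use from a kprobe program (the helper
    exposed to programs is bpf_override_return(); the probed function and
    error value are illustrative):

        SEC("kprobe/open_ctree")
        int override(struct pt_regs *ctx)
        {
                unsigned long rc = -12;   /* -ENOMEM */

                bpf_override_return(ctx, rc);   /* probed function returns -ENOMEM */
                return 0;
        }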

    Acked-by: Alexei Starovoitov
    Acked-by: Ingo Molnar
    Signed-off-by: Josef Bacik
    Signed-off-by: Alexei Starovoitov

    Josef Bacik
     
    Commit e87c6bc3852b ("bpf: permit multiple bpf attachments
    for a single perf event") added support for attaching multiple
    bpf programs to a single perf event.
    Although this provides flexibility, users may want to know
    what other bpf programs are attached to the same tp interface.
    Besides getting visibility into the underlying bpf system,
    such information may also help consolidate multiple bpf programs,
    understand potential performance issues due to a large array,
    and debug (e.g., one bpf program which overwrites the return code
    may impact subsequent program results).

    Commit 2541517c32be ("tracing, perf: Implement BPF programs
    attached to kprobes") utilized the existing perf ioctl
    interface and added the command PERF_EVENT_IOC_SET_BPF
    to attach a bpf program to a tracepoint. This patch adds a new
    ioctl command, given a perf event fd, to query the bpf program
    array attached to the same perf tracepoint event.

    The new uapi ioctl command:
    PERF_EVENT_IOC_QUERY_BPF

    The new uapi/linux/perf_event.h structure:
        struct perf_event_query_bpf {
                __u32 ids_len;
                __u32 prog_cnt;
                __u32 ids[0];
        };

    User space provides the buffer "ids" for the kernel to copy into.
    On return from the kernel, the number of available
    programs in the array is set in "prog_cnt".

    The usage:

        struct perf_event_query_bpf *query =
                malloc(sizeof(*query) + sizeof(u32) * ids_len);

        query->ids_len = ids_len;
        err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);
        if (err == 0) {
                /* query->prog_cnt is the number of available progs,
                 * number of progs in ids: (ids_len == 0) ? 0 : query->prog_cnt
                 */
        } else if (errno == ENOSPC) {
                /* query->ids_len number of progs copied,
                 * query->prog_cnt is the number of available progs
                 */
        } else {
                /* other errors */
        }

    Signed-off-by: Yonghong Song
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

23 Nov, 2017

3 commits

  • Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
    semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
    the compiler generates optimized BPF code when checking boundaries of an
    argument from C code. A typical example of this optimized code can be
    generated using the bpf_perf_event_output helper when operating on variable
    memory:

        /* len is a generic scalar */
        if (len > 0 && len <= 0x7fff)
                bpf_perf_event_output(ctx, &perf_map, 0, buf, len);

        113: (25) if r5 > 0x7ffe goto pc+6
        114: (bf) r1 = r6
        115: (18) r2 = 0xffff94e5f166c200
        117: (b7) r3 = 0
        118: (bf) r4 = r7
        119: (85) call bpf_perf_event_output#25
        R5 min value is negative, either use unsigned or 'var &= const'

    With this code, the verifier loses track of the variable.

    Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
    avoids this quite common case which leads to usability issues, and the
    compiler generates code that the verifier can more easily test:

        if (len <= 0x7fff)
                bpf_perf_event_output(ctx, &perf_map, 0, buf, len);

    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     
  • Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
    semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
    the compiler generates optimized BPF code when checking boundaries of an
    argument from C code. A typical example of this optimized code can be
    generated using the bpf_probe_read_str helper when operating on variable
    memory:

        /* len is a generic scalar */
        if (len > 0 && len <= 0x7fff)
                bpf_probe_read_str(p, len, s);

        253: (25) if r2 > 0x7ffe goto pc-42
        254: (bf) r1 = r7
        255: (79) r2 = *(u64 *)(r10 -88)
        256: (bf) r8 = r4
        257: (85) call bpf_probe_read_str#45
        R2 min value is negative, either use unsigned or 'var &= const'

    With this code, the verifier loses track of the variable.

    Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
    avoids this quite common case which leads to usability issues, and the
    compiler generates code that the verifier can more easily test:

        if (len <= 0x7fff)
                bpf_probe_read_str(p, len, s);

    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     
  • Commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to
    ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to
    simplify writing bpf programs by taking advantage of the new semantics
    introduced for ARG_CONST_SIZE_OR_ZERO which allows <!NULL, 0> arguments.

    In order to prevent the helper from actually passing a NULL pointer to
    probe_kernel_read, which can happen when <NULL, 0> is passed to the helper,
    the commit also introduced an explicit check against size == 0.

    After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type,
    bpf_probe_read can not receive a pair of <NULL, 0> arguments anymore, thus
    the check is not needed anymore and can be removed, since probe_kernel_read
    can correctly handle a <!NULL, 0> call. This also fixes the semantics of
    the helper before it gets officially released and bpf programs start
    relying on this check.

    Fixes: 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO")
    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Gianluca Borello
     

14 Nov, 2017

1 commit

    The bpf_probe_read helper's arg2 type is changed
    from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit
    a size-0 buffer. Together with the newer ARG_CONST_SIZE_OR_ZERO
    semantics, which allow a non-NULL buffer with size 0,
    this allows simpler bpf programs with verifier acceptance.
    The previous commit which changes ARG_CONST_SIZE_OR_ZERO semantics
    has details on examples.

    Signed-off-by: Yonghong Song
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Yonghong Song
     

11 Nov, 2017

2 commits

  • NACK'd by x86 maintainer.

    Signed-off-by: David S. Miller

    David S. Miller
     
    Error injection is sloppy and very ad-hoc. BPF could fill this niche
    perfectly with its kprobe functionality. We could make sure errors are
    only triggered in specific call chains that we care about with very
    specific situations. Accomplish this with the bpf_override_function
    helper. This will modify the probed caller's return value to the
    specified value and set the PC to an override function that simply
    returns, bypassing the originally probed function. This gives us a nice
    clean way to implement systematic error injection for all of our code
    paths.

    Acked-by: Alexei Starovoitov
    Signed-off-by: Josef Bacik
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Josef Bacik
     

01 Nov, 2017

1 commit

    During perf event attaching/detaching of bpf programs,
    the tp_event->prog_array change is protected by the
    bpf_event_mutex lock in both the attaching and detaching
    functions. Although tp_event->prog_array is an rcu
    pointer, rcu_dereference is not needed to access it
    since the mutex lock guarantees ordering.

    Verified through "make C=2" that the sparse
    locking check is still happy with the new change.

    Also change the label name in perf_event_{attach,detach}_bpf_prog
    from "out" to "unlock" to reflect the code action after the label.

    Signed-off-by: Yonghong Song
    Acked-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Yonghong Song
     

27 Oct, 2017

1 commit

  • commit afdb09c720b6 ("security: bpf: Add LSM hooks for bpf object related
    syscall") included linux/bpf.h in linux/security.h. As a result, bpf
    programs including bpf_helpers.h and some other header that ends up
    pulling in also security.h, such as several examples under samples/bpf,
    fail to compile because bpf_tail_call and bpf_get_stackid are now
    "redefined as different kind of symbol".

    From bpf.h:

    u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
    u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

    Whereas in bpf_helpers.h they are:

    static void (*bpf_tail_call)(void *ctx, void *map, int index);
    static int (*bpf_get_stackid)(void *ctx, void *map, int flags);

    Fix this by removing the unused declaration of bpf_tail_call and moving
    the declaration of bpf_get_stackid in bpf_trace.c, which is the only
    place where it's needed.

    Signed-off-by: Gianluca Borello
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Gianluca Borello
     

25 Oct, 2017

1 commit

  • This patch enables multiple bpf attachments for a
    kprobe/uprobe/tracepoint single trace event.
    Each trace_event keeps a list of attached perf events.
    When an event happens, all attached bpf programs will
    be executed based on the order of attachment.

    A global bpf_event_mutex lock is introduced to protect
    prog_array attaching and detaching. An alternative would be
    to introduce a mutex lock in every trace_event_call
    structure, but that takes a lot of extra memory.
    So a global bpf_event_mutex lock is a good compromise.

    The bpf prog detachment involves allocation of memory.
    If the allocation fails, a dummy do-nothing program
    will replace the to-be-detached program in place.

    Signed-off-by: Yonghong Song
    Acked-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Yonghong Song