03 Oct, 2020

2 commits

  • Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
    bpf_per_cpu_ptr() has the same semantics as per_cpu_ptr() in the kernel,
    except that it may return NULL. This happens when the cpu parameter is
    out of range, so the caller must check the returned value.
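
    A minimal usage sketch in BPF C (not from the commit; the runqueues
    percpu ksym and the section name are illustrative, following the
    selftest style):

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>

        extern const struct rq runqueues __ksym; /* percpu kernel var, via BTF */

        SEC("raw_tp/sys_enter")
        int dump_rq(const void *ctx)
        {
                u32 cpu = bpf_get_smp_processor_id();
                struct rq *rq;

                /* bpf_per_cpu_ptr() may return NULL for an out-of-range cpu. */
                rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, cpu);
                if (!rq)
                        return 0;
                bpf_printk("cpu %u nr_running %u", cpu, rq->nr_running);
                return 0;
        }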

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com

    Hao Luo
     
  • Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
    ksym so that further dereferences on the ksym can use the BTF info
    to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
    the verifier reads the btf_id stored in the insn[0]'s imm field and
    marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
    which is encoded in btf_vmlinux by pahole. If the VAR is not of a struct
    type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
    and the mem_size is resolved to the size of the VAR's type.

    From the VAR btf_id, the verifier can also read the address of the
    ksym's corresponding kernel var from kallsyms and use that to fill
    dst_reg.

    Therefore, the proper functionality of pseudo_btf_id depends on (1)
    kallsyms and (2) the encoding of kernel global VARs in pahole, which
    should be available since pahole v1.18.
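
    In BPF C this is driven by __ksym externs, for which libbpf emits the
    pseudo_btf_id ld_imm insn (a sketch; the chosen variables are only
    illustrative):

        extern const struct rq runqueues __ksym;     /* struct type: PTR_TO_BTF_ID */
        extern unsigned long long jiffies_64 __ksym; /* non-struct VAR: PTR_TO_MEM */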

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com

    Hao Luo
     

29 Sep, 2020

3 commits

  • A helper is added to allow seq file writing of kernel data
    structures using vmlinux BTF. Its signature is

        long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                                u32 btf_ptr_size, u64 flags);

    Flags and struct btf_ptr definitions/use are identical to the
    bpf_snprintf_btf helper, and the helper returns 0 on success
    or a negative error value.
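
    A sketch of its use from a task iterator program (following the
    selftest pattern; assumes vmlinux.h, bpf_helpers.h and
    bpf_core_read.h for bpf_core_type_id_kernel()):

        SEC("iter/task")
        int dump_task_struct(struct bpf_iter__task *ctx)
        {
                struct seq_file *seq = ctx->meta->seq;
                struct task_struct *task = ctx->task;
                struct btf_ptr ptr = {};

                if (!task)
                        return 0;

                ptr.type_id = bpf_core_type_id_kernel(struct task_struct);
                ptr.ptr = task;
                return bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
        }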

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • A helper is added to support tracing kernel type information in BPF
    using the BPF Type Format (BTF). Its signature is

        long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
                              u32 btf_ptr_size, u64 flags);

    struct btf_ptr * specifies

    - a pointer to the data to be traced;
    - the BTF id of the type of data pointed to; and
    - a flags field, provided for future use. These flags are not to
      be confused with the BTF_F_* flags below that control how the
      btf_ptr is displayed; the flags member of struct btf_ptr may be
      used to disambiguate types in kernel versus module BTF, etc.
      The main distinction is that these flags relate to the type and
      the information needed to identify it, not to how it is
      displayed.

    For example, a BPF program with a struct sk_buff *skb
    could do the following:

        static struct btf_ptr b = { };

        b.ptr = skb;
        b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
        bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);

    Default output looks like this:

        (struct sk_buff){
         .transport_header = (__u16)65535,
         .mac_header = (__u16)65535,
         .end = (sk_buff_data_t)192,
         .head = (unsigned char *)0x000000007524fd8b,
         .data = (unsigned char *)0x000000007524fd8b,
         .truesize = (unsigned int)768,
         .users = (refcount_t){
          .refs = (atomic_t){
           .counter = (int)1,
          },
         },
        }

    Flags modifying display are as follows:

    - BTF_F_COMPACT: no formatting around type information
    - BTF_F_NONAME: no struct/union member names/types
    - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values,
      equivalent to %px
    - BTF_F_ZERO: show zero-valued struct/union members;
      they are not displayed by default

    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • Generalize the "seq_show" seq file support in btf.c to support a
    generic show callback, of which we support two instances: the
    current seq file show, and a show with snprintf() behaviour which
    instead writes the type data to a supplied string.

    Both classes of show function call btf_type_show() with different
    targets: the seq file or the string to be written. In the string
    case we need to track additional data - the length left in the
    string to write, and the length we would have written (a la
    snprintf).

    By default show will display type information, field members and
    their types and values etc, and the information is indented
    based upon structure depth. Zeroed fields are omitted.

    Show however supports flags which modify its behaviour:

    BTF_SHOW_COMPACT - suppress newline/indent.
    BTF_SHOW_NONAME - suppress show of type and member names.
    BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
    BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
    BTF_SHOW_ZERO - show zeroed values (by default they are not shown).
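
    An illustrative sketch of that extra bookkeeping for the string
    target (the field names here are assumptions about the internal
    state, not a documented API):

        struct btf_show_snprintf_state {
                char *buf;      /* destination string */
                int len_left;   /* space remaining in buf */
                int len;        /* total length we would have written */
        };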

    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     

26 Aug, 2020

1 commit

  • Move the body of btf_resolve_size() into __btf_resolve_size(),
    keeping btf_resolve_size() public with just the first 3
    arguments, because the rest of the arguments are not
    used by outside callers.

    Following changes add more arguments which are not useful to
    outside callers; they will be added to the __btf_resolve_size()
    function.
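
    The resulting public shape, as a sketch based on the description
    above:

        int btf_resolve_size(const struct btf *btf,
                             const struct btf_type *type,
                             u32 *type_size);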

    Signed-off-by: Jiri Olsa
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200825192124.710397-4-jolsa@kernel.org

    Jiri Olsa
     

25 Jun, 2020

1 commit

  • To ensure btf_ctx_access() is safe, the verifier checks that the BTF
    arg type is an int, enum, or pointer. When the function does the
    BTF arg lookup it uses the calculation 'arg = off / 8', relying on
    the fact that registers are 8B. This requires that the first arg is
    in the first reg, the second in the second, and so on. However,
    in the default LLVM implementation an __int128 arg will consume two
    registers, so the arg layout assumed by the 'arg = off / 8'
    calculation becomes incorrect.
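
    For illustration (a hypothetical prototype, not from the patch):

        void f(u64 a, __int128 b, u64 c);

    By the default LLVM convention 'a' lands in the first register (ctx
    bytes 0-7), 'b' occupies the next two (bytes 8-23), and 'c' the
    fourth (bytes 24-31). An access at off 24 then computes
    arg = 24 / 8 = 3, which points past the three-argument prototype
    instead of at 'c'.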

    Because __int128 is uncommon, this patch applies the easiest fix and
    forces int types to be sizeof(u64) or smaller so that they fit in a
    single register.

    v2: remove unneeded parens per Andrii's feedback

    Fixes: 9e15db66136a1 ("bpf: Implement accurate raw_tp context access via BTF")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/159303723962.11287.13309537171132420717.stgit@john-Precision-5820-Tower

    John Fastabend
     

23 Jan, 2020

1 commit

  • Introduce dynamic program extensions. The users can load additional BPF
    functions and replace global functions in previously loaded BPF programs while
    these programs are executing.

    Global functions are verified individually by the verifier based on their types only.
    Hence a global function in the new program whose type matches an older function can
    safely replace that corresponding function.

    This new function/program is called 'an extension' of the old program. At load time
    the verifier uses the (attach_prog_fd, attach_btf_id) pair to identify the function
    to be replaced. The extension program's BPF program type is derived from the target
    program; technically, bpf_verifier_ops is copied from the target program.
    The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops.
    The extension program can call the same bpf helper functions as the target program.
    A single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program
    types. The verifier allows only one level of replacement, meaning that an
    extension program cannot recursively extend an extension. That also means that
    the maximum stack size increases from 512 to 1024 bytes and the maximum
    function nesting level from 8 to 16. The programs don't always consume that
    much. The stack usage is determined by the number of on-stack variables used by
    the program. The verifier could have enforced the 512 limit for the combined
    original plus extension program, but that would make for a difficult user
    experience. The main use case for extensions is to provide a generic mechanism
    to plug external programs into a policy program or function call chaining.

    The BPF trampoline is used to track both fentry/fexit and program extensions
    because both use the same nop slot at the beginning of every BPF
    function. Attaching fentry/fexit to a function that was replaced is not
    allowed. The opposite is true as well: replacing a function that is currently
    being analyzed with fentry/fexit is not allowed. The executable page allocated
    by the BPF trampoline is not used by program extensions. This inefficiency will
    be optimized in future patches.

    Function-by-function verification of global functions supports scalars and
    pointers to context only. Hence program extensions are supported only for that
    class of global functions. In the future the verifier will be extended with
    support for pointers to structures, arrays with sizes, etc.
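
    A minimal sketch of an extension, using the "freplace" section
    convention libbpf provides for this program type (the target
    function name here is hypothetical):

        SEC("freplace/handle_xdp_pkt")   /* global func in the target prog */
        int new_handle_xdp_pkt(struct xdp_md *ctx)
        {
                return XDP_PASS;         /* replacement policy */
        }

    The loader then supplies the (attach_prog_fd, attach_btf_id) pair
    described above to point the extension at the target program's
    function.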

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Andrii Nakryiko
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org

    Alexei Starovoitov
     

10 Jan, 2020

2 commits

  • The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value
    is a kernel struct with its func ptr implemented in bpf prog.
    This new map is the interface to register/unregister/introspect
    a bpf implemented kernel struct.

    The kernel struct is actually embedded inside another new struct
    (or called the "value" struct in the code). For example,
    "struct tcp_congestion_ops" is embedded in:

        struct bpf_struct_ops_tcp_congestion_ops {
                refcount_t refcnt;
                enum bpf_struct_ops_state state;
                struct tcp_congestion_ops data; /* the kernel subsystem struct */
        };

    The map value's state transitions as: INIT (map created) =>
    INUSE (map updated, i.e. reg) => TOBEFREE (map value deleted, i.e. unreg).

    The kernel subsystem needs to call bpf_struct_ops_get() and
    bpf_struct_ops_put() to manage the "refcnt" in the
    "struct bpf_struct_ops_XYZ". This patch uses a separate refcnt
    for the purpose of tracking the subsystem usage. Another approach
    is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
    the subsystem's usage by doing map->refcnt - map->usercnt to filter out
    the map-fd/pinned-map usage. However, that will also tie down the
    future semantics of map->refcnt and map->usercnt.

    The very first subsystem's refcnt (during reg()) holds one
    count to map->refcnt. When the very last subsystem's refcnt
    is gone, it will also release the map->refcnt. All bpf_prog will be
    freed when the map->refcnt reaches 0 (i.e. during map_free()).

    Here is how the bpftool map command will look:

        [root@arch-fb-vm1 bpf]# bpftool map show
        6: struct_ops name dctcp flags 0x0
                key 4B value 256B max_entries 1 memlock 4096B
                btf_id 6
        [root@arch-fb-vm1 bpf]# bpftool map dump id 6
        [{
                "value": {
                        "refcnt": {
                                "refs": {
                                        "counter": 1
                                }
                        },
                        "state": 1,
                        "data": {
                                "list": {
                                        "next": 0,
                                        "prev": 0
                                },
                                "key": 0,
                                "flags": 2,
                                "init": 24,
                                "release": 0,
                                "ssthresh": 25,
                                "cong_avoid": 30,
                                "set_state": 27,
                                "cwnd_event": 28,
                                "in_ack_event": 26,
                                "undo_cwnd": 29,
                                "pkts_acked": 0,
                                "min_tso_segs": 0,
                                "sndbuf_expand": 0,
                                "cong_control": 0,
                                "get_info": 0,
                                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                                ],
                                "owner": 0
                        }
                }
        }]

    Misc Notes:
    * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
      It does an in-place update on "*value" instead of returning a
      pointer to syscall.c. Otherwise, it would need a separate copy of
      the "zero" value for BPF_STRUCT_OPS_STATE_INIT to avoid races.

    * bpf_struct_ops_map_delete_elem() is also called without
      preempt_disable() from map_delete_elem(). This is because
      the "->unreg()" may require a sleepable context, e.g.
      tcp_unregister_congestion_control().

    * "const" is added to some of the existing "struct btf_func_model *"
      function args to avoid a compiler warning caused by this patch.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com

    Martin KaFai Lau
     
  • This patch allows the kernel's struct ops (i.e. func ptr) to be
    implemented in BPF. The first use case in this series is the
    "struct tcp_congestion_ops" which will be introduced in a
    latter patch.

    This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
    The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
    func ptr of a kernel struct. The attr->attach_btf_id is the btf id
    of a kernel struct. The attr->expected_attach_type is the member
    "index" of that kernel struct. The first member of a struct starts
    with member index 0. That will avoid ambiguity when a kernel struct
    has multiple func ptrs with the same func signature.

    For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
    to implement the "init" func ptr of the "struct tcp_congestion_ops".
    The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
    of the _running_ kernel. The attr->expected_attach_type is 3.

    The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
    by arch_prepare_bpf_trampoline that will be done in the next
    patch when introducing BPF_MAP_TYPE_STRUCT_OPS.

    "struct bpf_struct_ops" is introduced as a common interface for the kernel
    struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel
    struct will need to implement an instance of the "struct bpf_struct_ops".

    The supporting kernel struct also needs to implement a bpf_verifier_ops.
    During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
    bpf_verifier_ops by searching the attr->attach_btf_id.

    A new "btf_struct_access" is also added to the bpf_verifier_ops such
    that the supporting kernel struct can optionally provide its own specific
    check on accessing the func arg (e.g. provide limited write access).

    After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
    to initialize some values (e.g. the btf id of the supporting kernel
    struct) and it can only be done once the btf_vmlinux is available.

    The R0 check at BPF_EXIT is skipped for a BPF_PROG_TYPE_STRUCT_OPS prog
    if the return type of prog->aux->attach_func_proto is "void".
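
    A sketch of what such a program looks like in BPF C, using the
    struct_ops conventions libbpf grew alongside this series (the names
    are illustrative):

        SEC("struct_ops/bpf_dctcp_init")
        void BPF_PROG(bpf_dctcp_init, struct sock *sk)
        {
                /* implement the "init" func ptr */
        }

        SEC(".struct_ops")
        struct tcp_congestion_ops dctcp = {
                .init = (void *)bpf_dctcp_init,
                .name = "bpf_dctcp",
        };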

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com

    Martin KaFai Lau
     

16 Nov, 2019

1 commit

  • Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type,
    including their subprograms. This feature allows snooping on input and output
    packets in XDP and TC programs, including their return values. In order to do
    that the verifier needs to track types not only of vmlinux, but types of other
    BPF programs as well. The verifier also needs to translate uapi/linux/bpf.h
    types used by networking programs into kernel-internal BTF types used by
    FENTRY/FEXIT BPF programs. In some cases LLVM optimizations can remove
    arguments from BPF subprograms without adjusting the BTF info that the LLVM
    backend emits. When the BTF info disagrees with the actual types that the
    verifier sees, the BPF trampoline has to fall back to being conservative and
    treat all arguments as u64. The FENTRY/FEXIT program can still attach to such
    subprograms, but it won't be able to recognize pointer types like
    'struct sk_buff *' and it won't be able to pass them to bpf_skb_output() for
    dumping packets to user space. The FENTRY/FEXIT program would need to use
    bpf_probe_read_kernel() instead.

    The BPF_PROG_LOAD command is extended with an attach_prog_fd field. When it's
    set to zero, the attach_btf_id is one of the vmlinux BTF type ids. When
    attach_prog_fd points to a previously loaded BPF program, the attach_btf_id is
    the BTF type id of the main function or one of its subprograms.
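
    A userspace sketch of targeting another BPF program, via
    bpf_program__set_attach_target(), the libbpf API that later wrapped
    this uapi (the function name here is illustrative):

        /* prog is an fexit program; target_fd is a loaded XDP prog's fd */
        err = bpf_program__set_attach_target(prog, target_fd, "xdp_main_func");
        /* at load time this supplies attach_prog_fd and resolves attach_btf_id */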

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20191114185720.1641606-18-ast@kernel.org

    Alexei Starovoitov
     

25 Oct, 2019

1 commit

  • This patch makes a few changes to btf_ctx_access() to prepare
    it for the non-raw_tp use case, where the attach_btf_id is not
    necessarily a BTF_KIND_TYPEDEF.

    It moves the "btf_trace_" prefix check and typedef-follow logic to a new
    function "check_attach_btf_id()", which is called only once during
    bpf_check(). btf_ctx_access() only operates on a BTF_KIND_FUNC_PROTO
    type now. That should also be more efficient, since it is done only
    once instead of every time check_ctx_access() is called.

    "check_attach_btf_id()" needs to find the func_proto type from
    the attach_btf_id. It needs to store the result into the
    newly added prog->aux->attach_func_proto. func_proto
    btf type has no name, so a proper name should be stored into
    "attach_func_name" also.

    v2:
    - Move the "btf_trace_" check to an earlier verifier phase (Alexei)

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20191025001811.1718491-1-kafai@fb.com

    Martin KaFai Lau
     

17 Oct, 2019

1 commit

  • If in-kernel BTF exists, parse it and prepare 'struct btf *btf_vmlinux'
    for further use by the verifier.
    In-kernel BTF is trusted just like kallsyms and other build artifacts
    embedded into vmlinux.
    Yet run this BTF image through the BTF verifier to make sure
    that it is valid and that it wasn't mangled during the build.
    If the in-kernel BTF is incorrect, it means either gcc, pahole or the
    kernel is buggy. In such a case, disallow loading BPF programs.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org

    Alexei Starovoitov
     

10 Apr, 2019

1 commit

  • Given we'll be reusing BPF array maps for global data/bss/rodata
    sections, we need a way to associate a BTF DataSec type as a map's
    value type. In the usual cases we have the ugly BPF_ANNOTATE_KV_PAIR()
    macro hack, e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR"),
    to get the initial map-to-type association going. While more use cases
    for it are discouraged, it also won't work for global data, since
    the use of an array map is a BPF loader detail and therefore unknown
    at compilation time. For array maps with just a single entry we make
    an exception in terms of BTF in that the key type is declared optional
    if the value type is of DataSec type. LLVM is guaranteed to emit the
    latter, and this also aligns with how we regard global data maps: as
    just a plain buffer area, reusing existing map facilities to allow
    things like introspection with existing tools.
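
    For context, a hedged sketch of the globals that end up in such
    DataSec-backed maps (plain BPF C; which map each lands in is the
    loader's detail, as noted above):

        int pkt_count = 0;          /* .data section/map */
        const int debug_level = 2;  /* .rodata section/map */
        static int scratch;         /* .bss section/map */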

    Signed-off-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

02 Feb, 2019

1 commit

  • Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
    bpf programs serialize access to other variables.

    Example:

        struct hash_elem {
                int cnt;
                struct bpf_spin_lock lock;
        };

        struct hash_elem *val = bpf_map_lookup_elem(&hash_map, &key);
        if (val) {
                bpf_spin_lock(&val->lock);
                val->cnt++;
                bpf_spin_unlock(&val->lock);
        }

    Restrictions and safety checks:
    - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
    - BTF description of the map is mandatory for safety analysis.
    - a bpf program can take one bpf_spin_lock at a time, since two or
      more can cause deadlocks.
    - only one 'struct bpf_spin_lock' is allowed per map element.
      This drastically simplifies the implementation yet allows a bpf
      program to use any number of bpf_spin_locks.
    - while a bpf_spin_lock is held, calls (either bpf2bpf or helpers)
      are not allowed.
    - a bpf program must bpf_spin_unlock() before return.
    - a bpf program can access 'struct bpf_spin_lock' only via the
      bpf_spin_lock()/bpf_spin_unlock() helpers.
    - load/store into the 'struct bpf_spin_lock lock;' field is not
      allowed.
    - to use the bpf_spin_lock() helper, the BTF description of the map
      value must be a struct with a 'struct bpf_spin_lock anyname;'
      field at the top level. A nested lock inside another struct is
      not allowed.
    - the syscall map_lookup doesn't copy the bpf_spin_lock field to
      user space.
    - the syscall map_update and program map_update do not update the
      bpf_spin_lock field.
    - bpf_spin_lock cannot be on the stack or inside a networking
      packet. bpf_spin_lock can only be inside a HASH or ARRAY map
      value.
    - bpf_spin_lock is available to root only, and to all program types.
    - bpf_spin_lock is not allowed in inner maps of map-in-map.
    - ld_abs is not allowed inside a spin_lock-ed region.
    - tracing progs and socket filter progs cannot use bpf_spin_lock
      due to insufficient preemption checks.

    Implementation details:
    - the cgroup-bpf class of programs can nest with xdp/tc programs.
      Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
      Other solutions to avoid nested bpf_spin_lock are possible, like
      making sure that all networking progs run with softirq disabled.
      spin_lock_irqsave is the simplest and doesn't add overhead to
      programs that don't use it.
    - arch_spinlock_t is used when it's implemented as queued_spin_lock.
    - archs can force their own arch_spinlock_t.
    - on architectures where queued_spin_lock is not available and
      sizeof(arch_spinlock_t) != sizeof(__u32), a trivial lock is used.
    - the presence of bpf_spin_lock inside a map value could have been
      indicated via an extra flag during map_create, but specifying it
      via BTF is cleaner. It provides introspection for map key/value
      and reduces user mistakes.

    Next steps:
    - allow bpf_spin_lock in other map types (like cgroup local storage).
    - introduce a BPF_F_LOCK flag for the bpf_map_update() syscall and
      helper, to request the kernel to grab the bpf_spin_lock before
      rewriting the value. That will serialize access to map elements.

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

18 Dec, 2018

1 commit

  • Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
    local storage maps") added bpffs pretty print for cgroup
    local storage maps. That commit worked only for structs without the
    kind_flag set.

    This patch refactors the code so that pretty print also works for
    structs with the kind_flag set.

    Acked-by: Martin KaFai Lau
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Yonghong Song
     

15 Dec, 2018

1 commit

  • The current btf_name_by_offset() returns the "(anon)" type name for
    the offset == 0 case and "(invalid-name-offset)" for the out-of-bound
    offset case.

    That fits the internal BTF verbose log well, which focuses on types.
    For example, offset == 0 => "(anon)" => anonymous type/name.
    Returning non-NULL for the bad-offset case is needed
    during the BTF verification process, because the BTF verifier may
    complain about another field first, before discovering that the
    name_off is invalid.

    However, it may not be ideal for newer use cases which do not
    necessarily mean a type name. For example, when logging line_info
    in the BPF verifier in the next patch, it is better to log an
    empty src line instead of "(anon)".

    The existing btf_name_by_offset() is renamed to __btf_name_by_offset()
    and made static to btf.c.

    A new btf_name_by_offset() is added for generic context usage. It
    returns "\0" for name_off == 0 (note that btf->strings[0] is "\0")
    and NULL for an invalid offset. This allows the caller to decide
    what the best output is in its context.
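
    A sketch of the new semantics (simplified relative to the kernel's
    actual implementation in btf.c):

        const char *btf_name_by_offset(const struct btf *btf, u32 offset)
        {
                if (offset < btf->hdr.str_len)
                        return &btf->strings[offset]; /* offset 0 -> "\0" */
                return NULL;                          /* invalid offset */
        }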

    The new btf_name_by_offset() overlaps with btf_name_offset_valid().
    Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h
    API minimal. The existing btf_name_offset_valid() usage in btf.c could
    also be replaced later.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Martin KaFai Lau
     

13 Dec, 2018

1 commit

  • Implement bpffs pretty printing for cgroup local storage maps
    (both shared and per-cpu).
    Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):

    Shared:

        $ cat /sys/fs/bpf/map_2
        # WARNING!! The output is for debug purpose only
        # WARNING!! The output format will change
        {4294968594,1}: {9999,1039896}

    Per-cpu:

        $ cat /sys/fs/bpf/map_1
        # WARNING!! The output is for debug purpose only
        # WARNING!! The output format will change
        {4294968594,1}: {
                cpu0: {0,0,0,0,0}
                cpu1: {0,0,0,0,0}
                cpu2: {1,104,0,0,0}
                cpu3: {0,0,0,0,0}
        }

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     

10 Dec, 2018

1 commit

  • This patch adds bpf_line_info support.

    It accepts an array of bpf_line_info objects during BPF_PROG_LOAD.
    The "line_info", "line_info_cnt" and "line_info_rec_size" are added
    to the "union bpf_attr". The "line_info_rec_size" makes
    bpf_line_info extensible in the future.
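
    For reference, the uapi record this adds (from include/uapi/linux/bpf.h
    as of this series):

        struct bpf_line_info {
                __u32 insn_off;
                __u32 file_name_off;
                __u32 line_off;
                __u32 line_col;
        };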

    The new "check_btf_line()" ensures the userspace line_info is valid
    for the kernel to use.

    When the verifier is translating/patching the bpf_prog (through
    "bpf_patch_insn_single()"), the line_infos' insn_off is also
    adjusted by the newly added "bpf_adj_linfo()".

    If the bpf_prog is jited, this patch also provides the jited addrs (in
    aux->jited_linfo) for the corresponding line_info.insn_off.
    "bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo.
    It is currently called by the x86 jit. Other jits can also use
    "bpf_prog_fill_jited_linfo()", and that will be done in followup patches.
    In the future, if it is deemed necessary, a particular jit could also
    provide its own "bpf_prog_fill_jited_linfo()" implementation.

    A few "*line_info*" fields are added to the bpf_prog_info such
    that the user can get the xlated line_info back (i.e. the line_info
    with its insn_off reflecting the translated prog). The jited_line_info
    is available if the prog is jited. It is an array of __u64.
    If the prog is not jited, jited_line_info_cnt is 0.

    The verifier's verbose log with line_info will be done in
    a follow up patch.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Martin KaFai Lau
     

21 Nov, 2018

2 commits

  • Kernel test robot (lkp@intel.com) reports a compilation error at
    https://www.spinics.net/lists/netdev/msg534913.html
    introduced by commit 838e96904ff3 ("bpf: Introduce bpf_func_info").

    If CONFIG_BPF is defined and CONFIG_BPF_SYSCALL is not defined,
    the following error will appear:
    kernel/bpf/core.c:414: undefined reference to `btf_type_by_id'
    kernel/bpf/core.c:415: undefined reference to `btf_name_by_offset'

    When CONFIG_BPF_SYSCALL is not defined,
    let us define stub inline functions for btf_type_by_id()
    and btf_name_by_offset() in include/linux/btf.h.
    This way, the compilation failure can be avoided.
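
    A sketch of the stubs described above (the NULL returns are an
    assumption consistent with the callers being compiled out):

        #ifndef CONFIG_BPF_SYSCALL
        static inline const struct btf_type *
        btf_type_by_id(const struct btf *btf, u32 type_id)
        {
                return NULL;
        }

        static inline const char *
        btf_name_by_offset(const struct btf *btf, u32 offset)
        {
                return NULL;
        }
        #endif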

    Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
    Reported-by: kbuild test robot
    Cc: Martin KaFai Lau
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     
  • This patch adds an interface to load a program with the following
    additional information:
    . prog_btf_fd
    . func_info, func_info_rec_size and func_info_cnt
    where func_info provides the function range and type_id
    corresponding to each function.

    The func_info_rec_size is introduced in the UAPI to specify the
    struct bpf_func_info size passed from user space. This
    intends to make the bpf_func_info structure growable in the future.
    If the kernel gets a different bpf_func_info size from userspace,
    it will try to handle the user request with the part of bpf_func_info
    it can understand. In this patch, the kernel understands

        struct bpf_func_info {
                __u32 insn_offset;
                __u32 type_id;
        };

    If the user passes a bpf_func_info record size of 16 bytes, the
    kernel can still handle part of the records with the above definition.

    If the verifier agrees with the function ranges provided by the user,
    the bpf_prog ksym for each function will use the func name
    provided by the type_id, which is supposed to provide a better
    name, as it is not limited by the 16-byte program name
    limitation; this is better for bpf programs which contain
    multiple subprograms.

    The bpf_prog_info interface is also extended to
    return btf_id, func_info, func_info_rec_size and func_info_cnt
    to userspace, so userspace can print out the function prototype
    for each xlated function. The insn_offset in the returned
    func_info corresponds to the insn offset for xlated functions.
    With other jit related fields in bpf_prog_info, userspace can also
    print out function prototypes for each jited function.

    Signed-off-by: Yonghong Song
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

09 May, 2018

1 commit

  • This patch gives an ID to each loaded BTF. The ID is allocated by
    the idr like the existing prog-id and map-id.

    The btf_put(map->btf) is moved to __bpf_map_put() so that
    userspace can stop seeing the BTF ID ASAP when the last BTF
    refcnt is gone.

    It also makes BTF accessible from userspace through:
    1. the new BPF_BTF_GET_FD_BY_ID command. It is limited to CAP_SYS_ADMIN,
       which is in line with the BPF_BTF_LOAD cmd and the existing
       BPF_[MAP|PROG]_GET_FD_BY_ID cmds.
    2. the new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"

    Once the BTF ID handle is accessible from userspace, freeing a BTF
    object has to go through an rcu period. The BPF_BTF_GET_FD_BY_ID cmd
    can then be done under an rcu_read_lock() instead of taking a
    spin_lock.
    [Note: a similar rcu usage can be applied to the existing
    bpf_prog_get_fd_by_id() in a follow-up patch]

    When processing the BPF_BTF_GET_FD_BY_ID cmd,
    refcount_inc_not_zero() is needed because the BTF object
    could already be in the rcu dead row. btf_get() is
    removed since its usage is currently limited to btf.c
    alone; refcount_inc() is used directly instead.
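
    A userspace sketch of the new command (raw syscall; needs
    <linux/bpf.h>, <sys/syscall.h> and <unistd.h>):

        union bpf_attr attr = {};

        attr.btf_id = id;  /* a BTF ID, e.g. from "struct bpf_map_info" */
        int btf_fd = syscall(__NR_bpf, BPF_BTF_GET_FD_BY_ID,
                             &attr, sizeof(attr));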

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Martin KaFai Lau
     

20 Apr, 2018

4 commits

  • This patch adds BPF_OBJ_GET_INFO_BY_FD support to the BTF fd.
    The original BTF data, which was used to create the BTF fd during
    the earlier BPF_BTF_LOAD call, will be returned.

    The userspace is expected to allocate a buffer
    for info.info, and the buffer size is set in info.info_len before
    calling BPF_OBJ_GET_INFO_BY_FD.

    The original BTF data is copied to the userspace buffer (info.info).
    Only up to the user's specified info.info_len will be copied.

    The original BTF data size is set in info.info_len. The userspace
    needs to check whether it is bigger than its allocated buffer size.
    If it is, the userspace should realloc with the kernel-returned
    info.info_len and call BPF_OBJ_GET_INFO_BY_FD again.
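
    A userspace sketch of that realloc-and-retry pattern (raw syscall;
    needs <linux/bpf.h>, <sys/syscall.h>, <unistd.h> and <stdlib.h>;
    the initial buffer size is arbitrary):

        union bpf_attr attr = {};
        __u32 buf_len = 4096;
        void *buf = malloc(buf_len);

        attr.info.bpf_fd = btf_fd;
        attr.info.info_len = buf_len;
        attr.info.info = (__u64)(unsigned long)buf;
        syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));

        if (attr.info.info_len > buf_len) {
                /* The kernel reported the full BTF size: grow and retry. */
                buf = realloc(buf, attr.info.info_len);
                attr.info.info = (__u64)(unsigned long)buf;
                syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
        }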

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Martin KaFai Lau
     
  • This patch adds a BPF_BTF_LOAD command which
    1) loads and verifies the BTF (implemented in earlier patches)
    2) returns a BTF fd to userspace. In the next patch, the
    BTF fd can be specified during BPF_MAP_CREATE.

    It is currently limited to CAP_SYS_ADMIN.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Martin KaFai Lau
     
  • This patch adds pretty print capability for data with BTF type info.
    The current usage is to allow pretty print for a BPF map.

    The next few patches will allow a read() on a pinned map with BTF
    type info for its key and value.

    This patch uses the seq_printf() infra.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Martin KaFai Lau
     
  • After collecting all btf_type in the first pass in an earlier patch,
    the second pass (in this patch) can validate the reference types
    (e.g. the referring type does exist and it does not refer to itself).

    While checking the reference type, it also gathers other information (e.g.
    the size of an array). This info will be useful in checking the
    struct's members in a later patch. It will also be useful in doing
    pretty print later.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Martin KaFai Lau