22 Aug, 2020

1 commit

  • For bpf_map_elem and bpf_sk_local_storage bpf iterators,
    additional map_id should be shown for fdinfo and
    userspace query. For example, the following is for
    a bpf_map_elem iterator.
    $ cat /proc/1753/fdinfo/9
    pos: 0
    flags: 02000000
    mnt_id: 14
    link_type: iter
    link_id: 34
    prog_tag: 104be6d3fe45e6aa
    prog_id: 173
    target_name: bpf_map_elem
    map_id: 127
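
    A minimal user-space sketch of reading that field back, assuming the
    fdinfo text has already been read into a buffer. parse_map_id is a
    hypothetical helper written for this illustration; it is not a kernel
    or libbpf API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo-style text for a "map_id:" line and store its value.
 * Returns 0 on success, -1 if no such line exists.
 * Hypothetical helper for illustration only.
 */
static int parse_map_id(const char *fdinfo, unsigned int *map_id)
{
	const char *line = fdinfo;

	assert(fdinfo != NULL && map_id != NULL);
	while (line && *line) {
		/* Skip any indentation in front of the key. */
		while (*line == ' ' || *line == '\t')
			line++;
		/* The blank in the format matches spaces or a tab. */
		if (sscanf(line, "map_id: %u", map_id) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}
```

    Fed the listing above, the helper would store 127; for an iterator
    link that has no map_id line it returns -1.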

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200821184419.574240-1-yhs@fb.com

    Yonghong Song
     

07 Aug, 2020

1 commit

  • Commit a5cbe05a6673 ("bpf: Implement bpf iterator for
    map elements") added bpf iterator support for
    map elements. The map element bpf iterator requires
    info to identify a particular map. In the above
    commit, the attr->link_create.target_fd is used
    to carry map_fd and an enum bpf_iter_link_info
    is added to uapi to specify the target_fd actually
    representing a map_fd:
    enum bpf_iter_link_info {
            BPF_ITER_LINK_UNSPEC = 0,
            BPF_ITER_LINK_MAP_FD = 1,

            MAX_BPF_ITER_LINK_INFO,
    };

    This approach is extensible: the enumerator can grow
    to cover pid, cgroup_id, etc., and target_fd can be
    turned into a union for pid, cgroup_id, etc.
    In the future, however, more complex customization
    may be needed, e.g., tasks could be filtered
    by both cgroup_id and user_id.

    This patch changed the uapi to have fields
            __aligned_u64   iter_info;
            __u32           iter_info_len;
    for additional iter_info for link_create.
    The iter_info is defined as
    union bpf_iter_link_info {
            struct {
                    __u32   map_fd;
            } map;
    };

    This makes future extensions for additional customization
    easier. The bpf_iter_link_info is passed to the target
    callback for validation, so the generic bpf_iter
    framework no longer needs to deal with it.

    Note that map_fd = 0 will be considered invalid
    and -EBADF will be returned to user space.
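
    The map_fd rule can be modelled in user space as below. This is only
    a sketch: iter_link_info_model mirrors the union quoted above with
    uint32_t in place of __u32, and model_attach_map is a hypothetical
    stand-in for the target's validation callback, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the union from the commit message (illustration only). */
union iter_link_info_model {
	struct {
		uint32_t map_fd;
	} map;
};

/*
 * Hypothetical validation callback: a map_fd of 0 is treated as
 * "not provided" and rejected with -EBADF, as the note above states.
 */
static int model_attach_map(const union iter_link_info_model *linfo)
{
	assert(linfo != NULL);
	if (linfo->map.map_fd == 0)
		return -EBADF;
	return 0;	/* a real callback would go on to look up the map */
}
```

    Because the check lives in the target's callback, new union members
    could be validated by their own targets without touching generic code.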

    Fixes: a5cbe05a6673 ("bpf: Implement bpf iterator for map elements")
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com

    Yonghong Song
     

26 Jul, 2020

5 commits

  • The bpf iterators for array and percpu array
    maps are implemented. As with percpu hash maps, for a
    percpu array map the bpf program will receive values
    from all cpus.
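
    A sketch of how a buffer holding "values from all cpus" can be laid
    out. It assumes each per-CPU value is padded to a multiple of 8 bytes
    and stored back to back, which is the layout BPF per-CPU map lookups
    use from user space; that the iterator uses the same layout is an
    assumption here. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 8. */
static size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Bytes needed to hold one element's value for every CPU. */
static size_t percpu_buf_size(size_t value_size, unsigned int ncpus)
{
	return round_up8(value_size) * ncpus;
}

/* Address of the slot that holds the value for the given CPU. */
static const void *percpu_slot(const void *buf, size_t value_size,
			       unsigned int cpu)
{
	assert(buf != NULL);
	return (const uint8_t *)buf + round_up8(value_size) * cpu;
}
```

    With a 4-byte value and 4 CPUs, this model needs 32 bytes, and the
    value for CPU 3 starts at byte offset 24.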

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200723184115.590532-1-yhs@fb.com

    Yonghong Song
     
  • The bpf iterators for hash, percpu hash, lru hash
    and lru percpu hash are implemented. During link time,
    bpf_iter_reg->check_target() will check the map type
    and ensure that the key/value region accessed by the
    program is within the key/value size defined by the map.

    For percpu hash and lru percpu hash maps, the bpf program
    will receive values for all cpus. The map element
    bpf iterator infrastructure will prepare value
    properly before passing the value pointer to the
    bpf program.

    This patch set supports readonly map keys and
    read/write map values. It does not support deleting
    map elements, e.g., from hash tables. If there is
    a use case for this, the following mechanism can
    be used to support map deletion for hashtab, etc.
    - permit a new bpf program return value, e.g., 2,
      to let the bpf iterator know the map element should
      be removed.
    - since the bucket lock is taken, the map element will be
      queued.
    - once the bucket lock is released after all elements under
      this bucket are traversed, all to-be-deleted map
      elements can be deleted.
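
    The three steps above can be sketched in user space as follows. This
    models the proposal only; it is not implemented by this patch set.
    The return value name ITER_RET_DELETE, the bucket structure and the
    integer "locked" flag standing in for the bucket lock are all
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ITER_RET_DELETE	2	/* hypothetical "remove this element" */
#define MAX_PENDING	16

struct elem {
	int key;
	struct elem *next;
};

struct bucket {
	struct elem *head;
	int locked;		/* stands in for the bucket lock */
};

typedef int (*visit_fn)(const struct elem *e);

/* Deletion takes the lock itself, so it cannot run during the walk. */
static void bucket_unlink(struct bucket *b, struct elem *victim)
{
	struct elem **pp = &b->head;

	assert(!b->locked);
	b->locked = 1;
	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
	b->locked = 0;
}

/* Walk one bucket, queue requested deletions, apply them after unlock. */
static int bucket_walk(struct bucket *b, visit_fn visit)
{
	struct elem *pending[MAX_PENDING];
	int npending = 0;

	b->locked = 1;
	for (struct elem *e = b->head; e; e = e->next)
		if (visit(e) == ITER_RET_DELETE && npending < MAX_PENDING)
			pending[npending++] = e;
	b->locked = 0;

	for (int i = 0; i < npending; i++)
		bucket_unlink(b, pending[i]);
	return npending;	/* number of elements removed */
}

/* Example visitor: ask for every odd key to be removed. */
static int drop_odd(const struct elem *e)
{
	return (e->key % 2) ? ITER_RET_DELETE : 0;
}
```

    Queuing keeps the walk itself read-only with respect to the list, so
    the traversal never follows a pointer out of an element that has
    already been unlinked.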

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200723184114.590470-1-yhs@fb.com

    Yonghong Song
     
  • The bpf iterator for map elements is implemented.
    The bpf program will receive four parameters:
    bpf_iter_meta *meta: the meta data
    bpf_map *map: the bpf_map whose elements are traversed
    void *key: the key of one element
    void *value: the value of the same element

    Here, meta and map pointers are always valid, and
    key has register type PTR_TO_RDONLY_BUF_OR_NULL and
    value has register type PTR_TO_RDWR_BUF_OR_NULL.
    The kernel will track the access range of key and value
    during verification time. Later, these values will be compared
    against the values in the actual map to ensure all accesses
    are within range.
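
    A user-space model of what the _OR_NULL register types mean for the
    program author: key and value must be checked before use. The struct
    below is a simplified stand-in for the four parameters, not the real
    context layout, and model_prog with its 4-byte key and 8-byte value
    is a hypothetical example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the four program parameters. */
struct map_elem_ctx_model {
	const void *meta;	/* always valid */
	const void *map;	/* always valid */
	const void *key;	/* read-only buffer, may be NULL */
	void *value;		/* read/write buffer, may be NULL */
};

/*
 * Hypothetical program body for a map with 4-byte keys and 8-byte
 * values: adds the key to the value in place.
 */
static int model_prog(struct map_elem_ctx_model *ctx)
{
	assert(ctx != NULL && ctx->meta != NULL && ctx->map != NULL);

	/* The verifier would reject any access made before this check. */
	if (ctx->key == NULL || ctx->value == NULL)
		return 0;

	/* Key is only read; value is read and written. */
	*(uint64_t *)ctx->value += *(const uint32_t *)ctx->key;
	return 0;
}
```

    The 4-byte key read and 8-byte value access here are the kind of
    access ranges that would later be compared against the sizes of the
    map actually attached.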

    A new field iter_seq_info is added to bpf_map_ops which
    is used to add map type specific information, i.e., seq_ops,
    init/fini seq_file func and seq_file private data size.
    Subsequent patches will have actual implementation
    for bpf_map_ops->iter_seq_info.

    In user space, BPF_ITER_LINK_MAP_FD needs to be
    specified in prog attr->link_create.flags, which indicates
    that attr->link_create.target_fd is a map_fd.
    The reason for such an explicit flag is for possible
    future cases where one bpf iterator may allow more than
    one possible customization, e.g., pid and cgroup id for
    task_file.

    The current kernel internal implementation only allows
    the target to register at most one required bpf_iter_link_info.
    To support the above case, optional bpf_iter_link_info's
    would be needed: the target could be extended to register such
    link infos, and the user provided link_info would need to match
    one of those the target supports.

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com

    Yonghong Song
     
  • There is no functionality change for this patch.
    Struct bpf_iter_reg is used to register a bpf_iter target,
    which includes information for prog_load, link_create
    and seq_file creation.

    This patch puts the fields related to seq_file creation into
    a different structure. This will be useful for the map
    elements iterator, where one iterator covers different
    map types and different map types may have different
    seq_ops, init/fini private_data functions and
    private_data size.
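
    A user-space sketch of why grouping these fields helps: generic code
    can set up private data for any map type from one descriptor. The
    struct and function names are hypothetical, and seq_ops is left out
    of the model to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the seq_file creation fields. */
struct seq_info_model {
	int (*init_priv)(void *priv);
	void (*fini_priv)(void *priv);
	uint32_t priv_size;
};

/* Private state a hypothetical array-map iterator might keep. */
struct array_priv {
	uint32_t index;
};

static int array_init(void *priv)
{
	((struct array_priv *)priv)->index = 0;
	return 0;
}

static void array_fini(void *priv)
{
	(void)priv;	/* nothing to release in this model */
}

/* Each map type would supply its own descriptor like this one. */
static const struct seq_info_model array_seq_info = {
	.init_priv = array_init,
	.fini_priv = array_fini,
	.priv_size = sizeof(struct array_priv),
};

/* Generic open path: knows nothing about the map type. */
static void *model_open(const struct seq_info_model *info)
{
	void *priv;

	assert(info != NULL);
	priv = calloc(1, info->priv_size);
	if (priv && info->init_priv && info->init_priv(priv) != 0) {
		free(priv);
		return NULL;
	}
	return priv;
}

/* Generic release path. */
static void model_release(const struct seq_info_model *info, void *priv)
{
	if (priv && info->fini_priv)
		info->fini_priv(priv);
	free(priv);
}
```

    A hash-map iterator could provide a different descriptor with a
    larger private struct, and model_open would need no changes.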

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com

    Yonghong Song
     
  • Currently, the pos pointer in bpf iterator map/task/task_file
    seq_ops->start() is always incremented.
    This is incorrect. It should be incremented only if
    *pos is 0 (for SEQ_START_TOKEN), since these start()
    functions actually return the first real object.
    If *pos is not 0, start() merely finds the object
    based on the state in seq->private and does not really
    advance *pos. This patch fixed this issue
    by incrementing *pos only if it is 0.

    Note that the old *pos calculation, although not
    correct, does not affect correctness of bpf_iter
    as bpf_iter seq_file->read() does not support llseek.

    This patch also renamed "mid" in bpf_map iterator
    seq_file private data to "map_id" for better clarity.
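
    The corrected start() behaviour can be modelled in user space as
    below. The private-data struct, the cursor field and the function
    name are hypothetical; only the "increment *pos when it is 0" rule is
    taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the iteration state kept in seq->private. */
struct seq_priv_model {
	const int *ids;		/* objects to iterate over */
	size_t count;
	size_t cursor;		/* where the previous read stopped */
};

/*
 * Model of start(): return the current object, or NULL when done.
 * *pos is bumped only on the very first call, where it is 0.
 */
static const int *model_start(struct seq_priv_model *priv, long long *pos)
{
	assert(priv != NULL && pos != NULL);
	if (priv->cursor >= priv->count)
		return NULL;
	if (*pos == 0)
		++*pos;
	/* For *pos != 0 the object comes from saved state; pos is untouched. */
	return &priv->ids[priv->cursor];
}
```

    Calling model_start again after the cursor has moved returns the next
    object while leaving *pos as it was.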

    Fixes: 6086d29def80 ("bpf: Add bpf_map iterator")
    Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200722195156.4029817-1-yhs@fb.com

    Yonghong Song
     

22 Jul, 2020

1 commit

  • One additional field btf_id is added to struct
    bpf_ctx_arg_aux to store the precomputed btf_ids.
    The btf_id is computed at build time with
    BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions.
    All existing bpf iterators are changed to use
    pre-computed btf_ids.

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com

    Yonghong Song
     

14 May, 2020

2 commits

  • Commit b121b341e598 ("bpf: Add PTR_TO_BTF_ID_OR_NULL
    support") adds a field btf_id_or_null_non0_off to
    bpf_prog->aux structure to indicate that the
    first ctx argument is PTR_TO_BTF_ID reg_type and
    all others are PTR_TO_BTF_ID_OR_NULL.
    This approach does not really scale if we have
    other different reg types in the future, e.g.,
    a pointer to a buffer.

    This patch enables bpf_iter targets to register ctx argument
    reg types which may be different from the default one.
    For example, for pointers to structures, the default reg_type
    is PTR_TO_BTF_ID for tracing programs. The target can register
    a particular pointer type as PTR_TO_BTF_ID_OR_NULL, which can
    be used by the verifier to enforce accesses.
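
    A user-space sketch of the lookup this implies: a per-target table
    of overrides keyed by ctx offset, with a fall-back to the default
    reg type. The enum, struct and function names are hypothetical
    models, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the two register types mentioned above. */
enum reg_type_model {
	MODEL_PTR_TO_BTF_ID,
	MODEL_PTR_TO_BTF_ID_OR_NULL,
};

/* One registered override: the ctx argument at this offset. */
struct ctx_arg_model {
	uint32_t offset;
	enum reg_type_model reg_type;
};

/* Return the registered type for a ctx offset, or the default. */
static enum reg_type_model
model_ctx_reg_type(const struct ctx_arg_model *args, size_t nargs,
		   uint32_t offset)
{
	assert(args != NULL || nargs == 0);
	for (size_t i = 0; i < nargs; i++)
		if (args[i].offset == offset)
			return args[i].reg_type;
	return MODEL_PTR_TO_BTF_ID;	/* default for tracing programs */
}
```

    A table like this can later carry other reg types, such as a pointer
    to a buffer, without adding a dedicated field per type.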

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com

    Yonghong Song
     
  • Currently bpf_iter_reg_target takes parameters from the target
    and allocates memory to save them. This is not really
    necessary, especially since the information passed from
    targets to the bpf_iter manager may grow in the future.

    The patch refactors the code so target reg_info
    becomes static and bpf_iter manager can just take
    a reference to it.
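
    The idea can be sketched in user space as a registry that stores
    pointers to caller-owned static descriptors instead of copying them.
    Every name below is hypothetical and the fixed-size table is a
    simplification chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a target's registration info. */
struct reg_info_model {
	const char *target;
};

#define MAX_TARGETS 8

static const struct reg_info_model *targets[MAX_TARGETS];
static size_t ntargets;

/* Keep a reference to the caller's static info; nothing is copied. */
static int model_reg_target(const struct reg_info_model *info)
{
	assert(info != NULL);
	if (ntargets == MAX_TARGETS)
		return -1;
	targets[ntargets++] = info;
	return 0;
}

/* A target defines its info once, with static storage duration. */
static const struct reg_info_model map_target_info = {
	.target = "bpf_map",
};
```

    Adding a field to the descriptor then requires no change to the
    registration function, since only the pointer is stored.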

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com

    Yonghong Song
     

10 May, 2020

1 commit

  • Implement seq_file operations to traverse all bpf_maps.

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com

    Yonghong Song