26 Jul, 2019

2 commits

  • This will allow us to write tests for those flags.

    v2:
    * Swap kfree(data) and kfree(user_ctx) (Song Liu)

    Acked-by: Petar Penkov
    Acked-by: Willem de Bruijn
    Acked-by: Song Liu
    Cc: Song Liu
    Cc: Willem de Bruijn
    Cc: Petar Penkov
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov

    Stanislav Fomichev
     
  • C flow dissector supports input flags that tell it to customize parsing
    by either stopping early or trying to parse as deep as possible. Pass
    those flags to the BPF flow dissector so it can make the same
    decisions. In the next commits I'll add support for those flags to
    our reference bpf_flow.c.

    v3:
    * Export copy of flow dissector flags instead of moving (Alexei Starovoitov)

    Acked-by: Petar Penkov
    Acked-by: Willem de Bruijn
    Acked-by: Song Liu
    Cc: Song Liu
    Cc: Willem de Bruijn
    Cc: Petar Penkov
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov

    Stanislav Fomichev
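
    A minimal sketch of how a dissector might act on such input flags.
    The EX_* names, their values, and struct ex_pkt_state are assumptions
    made for illustration; the commit text only says that the flags either
    stop parsing early or push it as deep as possible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the exported flag copies. Names and values are
 * assumptions for this sketch, not taken from the commit text. */
enum {
	EX_F_PARSE_1ST_FRAG     = 1U << 0, /* parse as deep as possible */
	EX_F_STOP_AT_FLOW_LABEL = 1U << 1, /* stop early */
	EX_F_STOP_AT_ENCAP      = 1U << 2, /* stop early */
};

/* Hypothetical summary of what the dissector has seen so far. */
struct ex_pkt_state {
	bool is_first_fragment;
	bool has_flow_label;
	bool is_encapsulated;
};

/* Returns true if dissection should continue into the next header. */
bool ex_should_continue(const struct ex_pkt_state *st, uint32_t flags)
{
	if (st->is_first_fragment && !(flags & EX_F_PARSE_1ST_FRAG))
		return false;
	if (st->has_flow_label && (flags & EX_F_STOP_AT_FLOW_LABEL))
		return false;
	if (st->is_encapsulated && (flags & EX_F_STOP_AT_ENCAP))
		return false;
	return true;
}
```

    With no flags set, the sketch stops at a first fragment; passing
    EX_F_PARSE_1ST_FRAG lets it continue.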
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 107 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Steve Winslow
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.615055994@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


28 Apr, 2019

1 commit

  • After allowing a bpf prog to
    - directly read the skb->sk ptr
    - get the fullsock bpf_sock by "bpf_sk_fullsock()"
    - get the bpf_tcp_sock by "bpf_tcp_sock()"
    - get the listener sock by "bpf_get_listener_sock()"
    - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
    into different bpf running context.

    this patch is another effort to make bpf's network programming
    more intuitive to do (together with memory and performance benefit).

    When a bpf prog needs to store data for a sk, the current practice is to
    define a map with the usual 4-tuples (src/dst ip/port) as the key.
    If multiple bpf progs need to store different sk data, multiple maps
    have to be defined, wasting memory on the duplicated
    keys (i.e. 4 tuples here) in each of the bpf maps.
    [ The smallest key could be the sk pointer itself which requires
    some enhancement in the verifier and it is a separate topic. ]

    Also, the bpf prog needs to clean up the elem when sk is freed.
    Otherwise, the bpf map will become full and un-usable quickly.
    The sk-free tracking currently could be done during sk state
    transition (e.g. BPF_SOCK_OPS_STATE_CB).

    The size of the map needs to be predefined, which usually ends up
    as an over-provisioned map in production. Even if the map were resizable,
    sks naturally come and go already, so this potential resize
    operation is arguably redundant if the data can be directly connected
    to the sk itself instead of proxying through a bpf map.

    This patch introduces sk->sk_bpf_storage to provide local storage space
    at sk for bpf prog to use. The space will be allocated when the first bpf
    prog has created data for this particular sk.

    The design optimizes the bpf prog's lookup (and then optionally followed by
    an inline update). bpf_spin_lock should be used if the inline update needs
    to be protected.

    BPF_MAP_TYPE_SK_STORAGE:
    -----------------------
    To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
    this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can
    be created to fit different bpf progs' needs. The map enforces
    BTF to allow printing the sk-local-storage during a system-wise
    sk dump (e.g. "ss -ta") in the future.

    The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not to lookup/update/delete
    "sk-local-storage" data from a particular sk.
    Think of the map as a meta-data (or "type") of a "sk-local-storage". This
    particular "type" of "sk-local-storage" data can then be stored in any sk.

    The main purposes of this map are mostly:
    1. Define the size of a "sk-local-storage" type.
    2. Provide a similar syscall userspace API as the map (e.g. lookup/update,
    map-id, map-btf...etc.)
    3. Keep track of all sk's storages of this "type" and clean them up
    when the map is freed.

    sk->sk_bpf_storage:
    ------------------
    The main lookup/update/delete is done on sk->sk_bpf_storage (which
    is a "struct bpf_sk_storage"). When doing a lookup,
    the "map" pointer is now used as the "key" to search on the
    sk_storage->list. The "map" pointer is actually serving
    as the "type" of the "sk-local-storage" that is being
    requested.

    To allow very fast lookup, it should be as fast as looking up an
    array at a stable-offset. At the same time, it is not ideal to
    set a hard limit on the number of sk-local-storage "type" that the
    system can have. Hence, this patch takes a cache approach.
    The last search result from sk_storage->list is cached in
    sk_storage->cache[] which is a stable sized array. Each
    "sk-local-storage" type has a stable offset to the cache[] array.
    In the future, a map's flag could be introduced to do cache
    opt-out/enforcement if it became necessary.

    The cache size is 16 (i.e. 16 types of "sk-local-storage").
    Programs can share map. On the program side, having a few bpf_progs
    running in the networking hotpath is already a lot. The bpf_prog
    should have already consolidated the existing sock-key-ed map usage
    to minimize the map lookup penalty. 16 has enough runway to grow.

    All sk-local-storage data will be removed from sk->sk_bpf_storage
    during sk destruction.
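
    The cached lookup described in this section can be sketched in
    user-space C as follows. All ex_* names are hypothetical, and the
    sketch leaves out the synchronization that the real code needs; it only
    shows the map pointer acting as the key, with cache[] tried before the
    list walk.

```c
#include <stddef.h>

#define EX_CACHE_SIZE 16

/* Hypothetical stand-in for a BPF_MAP_TYPE_SK_STORAGE map ("type"). */
struct ex_map {
	unsigned int cache_idx; /* stable slot, must be < EX_CACHE_SIZE */
};

/* One piece of sk-local-storage data, linked into the sk's list. */
struct ex_elem {
	struct ex_elem *next;
	const struct ex_map *map; /* which "type" this elem belongs to */
	unsigned int data;
};

/* Hypothetical stand-in for what sk->sk_bpf_storage points to. */
struct ex_sk_storage {
	struct ex_elem *list;
	struct ex_elem *cache[EX_CACHE_SIZE];
};

struct ex_elem *ex_lookup(struct ex_sk_storage *st, const struct ex_map *map)
{
	struct ex_elem *e = st->cache[map->cache_idx];

	/* Fast path: the slot already holds an elem of this map. Maps may
	 * share a slot, so the map pointer is still compared. */
	if (e && e->map == map)
		return e;

	/* Slow path: walk the list and remember the result in the slot. */
	for (e = st->list; e; e = e->next) {
		if (e->map == map) {
			st->cache[map->cache_idx] = e;
			return e;
		}
	}
	return NULL;
}
```

    Because the slot index is fixed per map, the fast path is one array
    read plus one pointer comparison, which is the "array at a
    stable-offset" behaviour described in this section.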

    bpf_sk_storage_get() and bpf_sk_storage_delete():
    ------------------------------------------------
    Instead of using bpf_map_(lookup|update|delete)_elem(),
    the bpf prog needs to use the new helper bpf_sk_storage_get() and
    bpf_sk_storage_delete(). The verifier can then enforce the
    ARG_PTR_TO_SOCKET argument. bpf_sk_storage_get() can also
    "create" a new elem if one does not exist in the sk. It is done by
    the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be
    provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
    The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together,
    it has eliminated the potential use cases for an equivalent
    bpf_map_update_elem() API (for bpf_prog) in this patch.

    Misc notes:
    ----------
    1. map_get_next_key is not supported. From the userspace syscall
    perspective, the map has the socket fd as the key while the map
    can be shared by pinned-file or map-id.

    Since btf is enforced, the existing "ss" could be enhanced to pretty
    print the local-storage.

    Supporting a kernel defined btf with 4 tuples as the return key could
    be explored later also.

    2. The sk->sk_lock cannot be acquired. Atomic operations are used instead.
    e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
    Please refer to the source code comments for the details in
    synchronization cases and considerations.

    3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.

    Benchmark:
    ---------
    Here is the benchmark data collected by turning on
    the "kernel.bpf_stats_enabled" sysctl.
    Two bpf progs are tested:

    One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
    sk ptr as the key. (The verifier is modified to support sk ptr as the key.
    That should have shortened the key lookup time.)

    Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.

    Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
    each egress skb and then bump the cnt. netperf is used to drive
    data with 4096 connected UDP sockets.

    BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run)
    27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
    loaded_at 2019-04-15T13:46:39-0700 uid 0
    xlated 344B jited 258B memlock 4096B map_ids 16
    btf_id 5

    BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
    30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
    loaded_at 2019-04-15T13:47:54-0700 uid 0
    xlated 168B jited 156B memlock 4096B map_ids 17
    btf_id 6
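
    The per-run figures quoted above follow from dividing run_time_ns by
    run_cnt. A small sketch of that arithmetic, with a hypothetical helper
    name:

```c
#include <stdint.h>

/* Average nanoseconds per program run, kept as a double so the
 * fractional part is not truncated. */
double ex_avg_ns_per_run(uint64_t run_time_ns, uint64_t run_cnt)
{
	if (run_cnt == 0)
		return 0.0;
	return (double)run_time_ns / (double)run_cnt;
}
```

    For the two programs above this gives about 152.8 ns and 65.5 ns per
    run, in line with the quoted 152ns and 66ns.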

    Here is a high-level picture of how the objects are organized:

     sk
    ┌──────┐
    │      │
    │      │
    │      │
    │*sk_bpf_storage─────▶ bpf_sk_storage
    └──────┘                     ┌───────┐
                     ┌───────────┤ list  │
                     │           │       │
                     │           │       │
                     │           │       │
                     │           └───────┘
                     │
                     │     elem
                     │  ┌────────┐
                     ├─▶│ snode  │
                     │  ├────────┤
                     │  │  data  │          bpf_map
                     │  ├────────┤        ┌─────────┐
                     │  │map_node│◀─┬─────┤  list   │
                     │  └────────┘  │     │         │
                     │              │     │         │
                     │     elem     │     │         │
                     │  ┌────────┐  │     └─────────┘
                     └─▶│ snode  │  │
                        ├────────┤  │
       bpf_map          │  data  │  │
     ┌─────────┐        ├────────┤  │
     │  list   ├───────▶│map_node│  │
     │         │        └────────┘  │
     │         │                    │
     │         │           elem     │
     └─────────┘        ┌────────┐  │
                     ┌─▶│ snode  │  │
                     │  ├────────┤  │
                     │  │  data  │  │
                     │  ├────────┤  │
                     │  │map_node│◀─┘
                     │  └────────┘
                     │
                     │
                     │          ┌───────┐
     sk              └──────────│ list  │
    ┌──────┐                    │       │
    │      │                    │       │
    │      │                    │       │
    │      │                    └───────┘
    │*sk_bpf_storage───────▶bpf_sk_storage
    └──────┘

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov

    Martin KaFai Lau
     

27 Apr, 2019

1 commit

  • This tests that:
    * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
    uses either:
    * a variable offset to the tracepoint buffer, or
    * an offset beyond the size of the tracepoint buffer
    * a tracer can modify the buffer provided when attached to a writable
    tracepoint in bpf_prog_test_run

    Signed-off-by: Matt Mullins
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Matt Mullins
     

24 Apr, 2019

3 commits

  • Now that we use skb-less flow dissector let's return true nhoff and
    thoff. We used to adjust them by ETH_HLEN because that's how it was
    done in the skb case. For VLAN tests that looks confusing: nhoff is
    pointing to vlan parts :-\

    Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop
    if you think that it's too late at this point to fix it.

    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     
  • Now that we have bpf_flow_dissect which can work on raw data,
    use it when doing BPF_PROG_TEST_RUN for flow dissector.

    Simplifies bpf_prog_test_run_flow_dissector and allows us to
    test no-skb mode.

    Note, that previously, with bpf_flow_dissect_skb we used to call
    eth_type_trans which pulled L2 (ETH_HLEN) header and we explicitly called
    skb_reset_network_header. That means flow_keys->nhoff would be
    initialized to 0 (skb_network_offset) in init_flow_keys.
    Now we call bpf_flow_dissect with nhoff set to ETH_HLEN and need
    to undo it once the dissection is done to preserve the existing behavior.

    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     
  • struct bpf_flow_dissector has a small subset of sk_buff fields that
    flow dissector BPF program is allowed to access and an optional
    pointer to real skb. Real skb is used only in bpf_skb_load_bytes
    helper to read non-linear data.

    The real motivation for this is to be able to call flow dissector
    from eth_get_headlen context where we don't have an skb and need
    to dissect raw bytes.

    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     

12 Apr, 2019

2 commits

  • This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
    and be sure that nobody is erroneously setting ctx_{in,out}.

    Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
    Reported-by: Daniel Borkmann
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     
  • Commit b0b9395d865e ("bpf: support input __sk_buff context in
    BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in
    BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined
    for !CONFIG_BPF_SYSCALL:

    net/bpf/test_run.c: In function ‘bpf_ctx_init’:
    net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration]
    err = bpf_check_uarg_tail_zero(data_in, max_size, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~

    Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set.

    Reported-by: kbuild test robot
    Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     

11 Apr, 2019

1 commit

  • Add new set of arguments to bpf_attr for BPF_PROG_TEST_RUN:
    * ctx_in/ctx_size_in - input context
    * ctx_out/ctx_size_out - output context

    The intended use case is to pass some meta data to the test runs that
    operate on skb (this was brought up at the recent LPC).

    For programs that use bpf_prog_test_run_skb, support __sk_buff input and
    output. Initially, from input __sk_buff, copy _only_ cb and priority into
    skb, all other non-zero fields are prohibited (with EINVAL).
    If the user has set ctx_out/ctx_size_out, copy the potentially modified
    __sk_buff back to the userspace.

    We require all fields of input __sk_buff except the ones we explicitly
    support to be set to zero. The expectation is that in the future we might
    add support for more fields and we want to fail explicitly if the user
    runs the program on the kernel where we don't yet support them.

    The API is intentionally vague (i.e. we don't explicitly add __sk_buff
    to bpf_attr, but ctx_in) to potentially let other test_run types use
    this interface in the future (this can be xdp_md for xdp types for
    example).

    v4:
    * don't copy more than allowed in bpf_ctx_init [Martin]

    v3:
    * handle case where ctx_in is NULL, but ctx_out is not [Martin]
    * convert size==0 checks to ptr==NULL checks and add some extra ptr
    checks [Martin]

    v2:
    * Addressed comments from Martin Lau

    Signed-off-by: Stanislav Fomichev
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
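
    The "only cb and priority, everything else must be zero" rule from
    the commit above can be sketched as follows. struct ex_sk_buff is a
    reduced, hypothetical stand-in for the user-visible __sk_buff, and the
    function name is invented for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced, hypothetical stand-in for the user-visible __sk_buff. */
struct ex_sk_buff {
	uint32_t len;
	uint32_t mark;
	uint32_t priority;
	uint32_t ifindex;
	uint32_t cb[5];
	uint32_t hash;
};

/* Hypothetical stand-in for the fields copied into the kernel skb. */
struct ex_skb {
	uint32_t priority;
	uint32_t cb[5];
};

int ex_convert_ctx_to_skb(const struct ex_sk_buff *ctx, struct ex_skb *skb)
{
	struct ex_sk_buff rest;
	const unsigned char *p = (const unsigned char *)&rest;
	size_t i;

	if (!ctx)
		return 0; /* no input context supplied: nothing to copy */

	/* Clear the supported fields in a copy; anything still non-zero
	 * is a field this sketch does not support yet. */
	rest = *ctx;
	rest.priority = 0;
	memset(rest.cb, 0, sizeof(rest.cb));
	for (i = 0; i < sizeof(rest); i++)
		if (p[i])
			return -EINVAL;

	skb->priority = ctx->priority;
	memcpy(skb->cb, ctx->cb, sizeof(skb->cb));
	return 0;
}
```

    Rejecting unknown non-zero fields means a caller that relies on a
    field the running kernel does not handle gets an explicit error
    instead of having that field silently ignored.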
     

09 Mar, 2019

1 commit

  • Sparse warning below:

    sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/
    CHECK net/bpf//test_run.c
    net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer
    ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL pointer

    Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types")
    Acked-by: Yonghong Song
    Signed-off-by: Bo YU
    Signed-off-by: Daniel Borkmann

    Bo YU
     

05 Mar, 2019

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2019-03-04

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
    AF_XDP applications by offering higher-level APIs that hide many
    of the details of the AF_XDP uapi. Sample programs are converted
    over to this new interface as well, from Magnus.

    2) Introduce a new cant_sleep() macro for annotation of functions
    that cannot sleep and use it in BPF_PROG_RUN() to assert that
    BPF programs run under preemption disabled context, from Peter.

    3) Introduce per BPF prog stats in order to monitor the usage
    of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
    knob where monitoring tools can make use of this to efficiently
    determine the average cost of programs, from Alexei.

    4) Split up BPF selftest's test_progs similarly as we already
    did with test_verifier. This allows to further reduce merge
    conflicts in future and to get more structure into our
    quickly growing BPF selftest suite, from Stanislav.

    5) Fix a bug in BTF's dedup algorithm which can cause an infinite
    loop in some circumstances; also various BPF doc fixes and
    improvements, from Andrii.

    6) Various BPF sample cleanups and migration to libbpf in order
    to further isolate the old sample loader code (so we can get
    rid of it at some point), from Jakub.

    7) Add a new BPF helper for BPF cgroup skb progs that allows
    to set ECN CE code point and a Host Bandwidth Manager (HBM)
    sample program for limiting the bandwidth used by v2 cgroups,
    from Lawrence.

    8) Enable write access to skb->queue_mapping from tc BPF egress
    programs in order to let BPF pick TX queue, from Jesper.

    9) Fix a bug in BPF spinlock handling for map-in-map which did
    not propagate spin_lock_off to the meta map, from Yonghong.

    10) Fix a bug in the new per-CPU BPF prog counters to properly
    initialize stats for each CPU, from Eric.

    11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
    from Willem.

    12) Fix various BPF samples bugs in XDP and tracing progs,
    from Toke, Daniel and Yonghong.

    13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
    enforces it now everywhere, from Anders.

    14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
    get error handling working, from Dan.

    15) Fix bpftool documentation and auto-completion with regards
    to stream_{verdict,parser} attach types, from Alban.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Feb, 2019

1 commit

  • Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
    makes process unkillable. The problem is that when CONFIG_PREEMPT is
    enabled, we never see need_resched() return true. This is due to the
    fact that preempt_enable() (which we do in bpf_test_run_one on each
    iteration) now handles resched if it's needed.

    Let's disable preemption for the whole run, not per test. In this case
    we can properly see whether resched is needed.
    Let's also properly return -EINTR to the userspace in case of a signal
    interrupt.

    This is a follow up for a recently fixed issue in bpf_test_run, see
    commit df1a2cb7c74b ("bpf/test_run: fix unkillable
    BPF_PROG_TEST_RUN").

    Reported-by: syzbot
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     

25 Feb, 2019

1 commit

  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Feb, 2019

1 commit

  • Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
    makes process unkillable. The problem is that when CONFIG_PREEMPT is
    enabled, we never see need_resched() return true. This is due to the
    fact that preempt_enable() (which we do in bpf_test_run_one on each
    iteration) now handles resched if it's needed.

    Let's disable preemption for the whole run, not per test. In this case
    we can properly see whether resched is needed.
    Let's also properly return -EINTR to the userspace in case of a signal
    interrupt.

    See recent discussion:
    http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com

    I'll follow up with the same fix for bpf_prog_test_run_flow_dissector in
    bpf-next.

    Reported-by: syzbot
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
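
    A user-space sketch of the loop shape this commit describes:
    preemption is disabled once around the whole run, and inside the loop
    the code checks for a pending signal and for a reschedule request.
    struct ex_env and the ex_* helpers are hypothetical stand-ins for the
    kernel primitives, which are only named in comments.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical environment replacing the scheduler and signal state. */
struct ex_env {
	uint32_t runs;          /* program invocations so far */
	uint32_t signal_after;  /* pretend a signal arrives after N runs, 0 = never */
	uint32_t resched_calls; /* how often the loop yielded the CPU */
};

/* Pretend the scheduler wants the CPU back every 4 runs. */
bool ex_need_resched(const struct ex_env *env)
{
	return env->runs % 4 == 0;
}

bool ex_signal_pending(const struct ex_env *env)
{
	return env->signal_after && env->runs >= env->signal_after;
}

int ex_test_run(struct ex_env *env, uint32_t repeat)
{
	uint32_t i;
	int ret = 0;

	if (!repeat)
		repeat = 1;

	/* kernel: preempt_disable() once, for the whole run */
	for (i = 0; i < repeat; i++) {
		env->runs++; /* stand-in for running the bpf program */

		if (ex_signal_pending(env)) {
			ret = -EINTR;
			break;
		}
		if (ex_need_resched(env)) {
			/* kernel: enable preemption, reschedule, disable again */
			env->resched_calls++;
		}
	}
	/* kernel: preempt_enable() */
	return ret;
}
```

    With repeat=0xffffffff the sketch still returns as soon as the
    simulated signal is seen, which is the property the fix is after.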
     

29 Jan, 2019

1 commit


11 Dec, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-12-11

    The following pull-request contains BPF updates for your *net-next* tree.

    It has three minor merge conflicts, resolutions:

    1) tools/testing/selftests/bpf/test_verifier.c

    Take first chunk with alignment_prevented_execution.

    2) net/core/filter.c

    [...]
    case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
    case bpf_ctx_range(struct __sk_buff, wire_len):
    return false;
    [...]

    3) include/uapi/linux/bpf.h

    Take the second chunk for the two cases each.

    The main changes are:

    1) Add support for BPF line info via BTF and extend libbpf as well
    as bpftool's program dump to annotate output with BPF C code to
    facilitate debugging and introspection, from Martin.

    2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
    and all JIT backends, from Jiong.

    3) Improve BPF test coverage on archs with no efficient unaligned
    access by adding an "any alignment" flag to the BPF program load
    to forcefully disable verifier alignment checks, from David.

    4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
    proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.

    5) Extend tc BPF programs to use a new __sk_buff field called wire_len
    for more accurate accounting of packets going to wire, from Petar.

    6) Improve bpftool to allow dumping the trace pipe from it and add
    several improvements in bash completion and map/prog dump,
    from Quentin.

    7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
    kernel addresses and add a dedicated BPF JIT backend allocator,
    from Ard.

    8) Add a BPF helper function for IR remotes to report mouse movements,
    from Sean.

    9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
    member naming consistent with existing conventions, from Yonghong
    and Song.

    10) Misc cleanups and improvements in allowing to pass interface name
    via cmdline for xdp1 BPF example, from Matteo.

    11) Fix a potential segfault in BPF sample loader's kprobes handling,
    from Daniel T.

    12) Fix SPDX license in libbpf's README.rst, from Andrey.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Dec, 2018

1 commit


02 Dec, 2018

1 commit

  • After commit f42ee093be29 ("bpf/test_run: support cgroup local
    storage") the bpf_test_run() function may fail with -ENOMEM, if
    it's not possible to allocate memory for a cgroup local storage.

    This error shouldn't be mixed with the return value of the testing
    program. Let's add an additional argument with a pointer where to
    store the testing program's result; and make bpf_test_run()
    return either 0 or -ENOMEM.

    Fixes: f42ee093be29 ("bpf/test_run: support cgroup local storage")
    Reported-by: Dan Carpenter
    Suggested-by: Alexei Starovoitov
    Signed-off-by: Roman Gushchin
    Cc: Daniel Borkmann
    Cc: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
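
    A sketch of the interface split described above: the program's result
    goes through a pointer and the return value is reserved for setup
    errors. The ex_* names and the calloc() stand-in for cgroup local
    storage are assumptions for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*ex_prog_fn)(void *ctx);

/* A program whose legitimate result has the same bit pattern as
 * -ENOMEM, which is why one return channel is ambiguous. */
uint32_t ex_prog_ambiguous(void *ctx)
{
	(void)ctx;
	return (uint32_t)-ENOMEM;
}

/* Returns 0 or -ENOMEM; the program's own result is stored in *retval. */
int ex_test_run(ex_prog_fn prog, void *ctx, uint32_t repeat, uint32_t *retval)
{
	void *storage;
	uint32_t i;

	storage = calloc(1, 64); /* stand-in for cgroup local storage */
	if (!storage)
		return -ENOMEM;

	if (!repeat)
		repeat = 1;
	for (i = 0; i < repeat; i++)
		*retval = prog(ctx);

	free(storage);
	return 0;
}
```

    The caller can now tell "allocation failed" (non-zero return) apart
    from "the program returned a value that happens to look like an
    error" (zero return, value in *retval).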
     

20 Oct, 2018

1 commit

  • Tests are added to make sure CGROUP_SKB cannot access:
    tc_classid, data_meta, flow_keys

    and can read and write:
    mark, priority, and cb[0-4]

    and can read other fields.

    To make selftest with skb->sk work, a dummy sk is added in
    bpf_prog_test_run_skb().

    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Song Liu
     

01 Oct, 2018

1 commit

  • In order to introduce per-cpu cgroup storage, let's generalize
    bpf cgroup core to support multiple cgroup storage types.
    Potentially, per-node cgroup storage can be added later.

    This commit is mostly a formal change that replaces
    cgroup_storage pointer with an array of cgroup_storage pointers.
    It doesn't actually introduce a new storage type,
    it will be done later.

    Each bpf program is now able to have one cgroup storage of each type.

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Cc: Daniel Borkmann
    Cc: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     

03 Aug, 2018

1 commit

  • Allocate a temporary cgroup storage to use for bpf program test runs.

    Because the test program is not actually attached to a cgroup,
    the storage is allocated manually just for the execution
    of the bpf program.

    If the program is executed multiple times, the storage is not zeroed
    on each run, emulating multiple runs of the program, attached to
    a real cgroup.

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     

12 Jul, 2018

1 commit

  • syzkaller triggered several panics similar to the below:

    [...]
    [ 248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90
    [ 248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425
    [...]
    [ 248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13
    [ 248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018
    [ 248.865905] Call Trace:
    [ 248.865910] dump_stack+0xd6/0x185
    [ 248.865911] ? show_regs_print_info+0xb/0xb
    [ 248.865913] ? printk+0x9c/0xc3
    [ 248.865915] ? kmsg_dump_rewind_nolock+0xe4/0xe4
    [ 248.865919] print_address_description+0x6f/0x270
    [ 248.865920] kasan_report+0x25b/0x380
    [ 248.865922] ? _copy_to_user+0x5c/0x90
    [ 248.865924] check_memory_region+0x137/0x190
    [ 248.865925] kasan_check_read+0x11/0x20
    [ 248.865927] _copy_to_user+0x5c/0x90
    [ 248.865930] bpf_test_finish.isra.8+0x4f/0xc0
    [ 248.865932] bpf_prog_test_run_skb+0x6a0/0xba0
    [...]

    After scrubbing the BPF prog a bit from the noise, turns out it called
    bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing
    wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb()
    and the same skb thus keeps changing until the pskb_expand_head() called
    from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM.
    So upon return we'll basically have 0 headroom left yet blindly do the
    __skb_push() of 14 bytes and keep copying data from there in bpf_test_finish()
    out of bounds. Fix to check if we have enough headroom and if pskb_expand_head()
    fails, bail out with error.

    Another bug independent of this fix (but related in triggering above) is
    that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to
    its original state from input as otherwise repeating the same test in a
    loop won't work for benchmarking when underlying input buffer is getting
    changed by the prog each time and reused for the next run leading to
    unexpected results.

    Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
    Reported-by: syzbot+709412e651e55ed96498@syzkaller.appspotmail.com
    Reported-by: syzbot+54f39d6ab58f39720a55@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
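
    The fix described above boils down to "check headroom before pushing
    the 14-byte header back, and bail out if it cannot be made available".
    A user-space sketch with a hypothetical struct ex_buf in place of the
    skb:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define EX_ETH_HLEN 14

/* Hypothetical, much reduced stand-in for an skb's linear buffer. */
struct ex_buf {
	unsigned char *head; /* start of the allocation */
	unsigned char *data; /* start of packet data */
	size_t len;          /* bytes of packet data */
};

size_t ex_headroom(const struct ex_buf *b)
{
	return (size_t)(b->data - b->head);
}

/* Reallocate so that `extra` more bytes of headroom are available. */
int ex_expand_head(struct ex_buf *b, size_t extra)
{
	size_t room = ex_headroom(b);
	unsigned char *nhead = malloc(room + extra + b->len);

	if (!nhead)
		return -ENOMEM;
	memcpy(nhead + room + extra, b->data, b->len);
	free(b->head);
	b->head = nhead;
	b->data = nhead + room + extra;
	return 0;
}

/* Push the link-layer header back, but only once enough headroom is
 * guaranteed; on allocation failure report the error instead. */
int ex_push_l2(struct ex_buf *b)
{
	size_t room = ex_headroom(b);

	if (room < EX_ETH_HLEN) {
		int err = ex_expand_head(b, EX_ETH_HLEN - room);

		if (err)
			return err;
	}
	b->data -= EX_ETH_HLEN;
	b->len += EX_ETH_HLEN;
	return 0;
}
```

    Without the headroom check, the push would move data in front of
    head and a later copy-out would read outside the allocation.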
     

19 Apr, 2018

1 commit


01 Feb, 2018

1 commit

  • syzkaller was able to generate the following XDP program ...

    (18) r0 = 0x0
    (61) r5 = *(u32 *)(r1 +12)
    (04) (u32) r0 += (u32) 0
    (95) exit

    ... and trigger a NULL pointer dereference in ___bpf_prog_run()
    via bpf_prog_test_run_xdp() where this was attempted to run.

    Reason is that recent xdp_rxq_info addition to XDP programs
    updated all drivers, but not bpf_prog_test_run_xdp(), where
    xdp_buff is set up. Thus when context rewriter does the deref
    on the netdev it's NULL at runtime. Fix it by using xdp_rxq
    from loopback dev. __netif_get_rx_queue() helper can also be
    reused in various other locations later on.

    Fixes: 02dd3291b2f0 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
    Reported-by: syzbot+1eb094057b338eb1fc00@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Cc: Jesper Dangaard Brouer
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

27 Sep, 2017

2 commits

  • This work enables generic transfer of metadata from XDP into skb. The
    basic idea is that we can make use of the fact that the resulting skb
    must be linear and already comes with a larger headroom for supporting
    bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
    on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
    for adjusting a new pointer called xdp->data_meta. Thus, the packet has
    a flexible and programmable room for meta data, followed by the actual
    packet data. struct xdp_buff is therefore laid out that we first point
    to data_hard_start, then data_meta directly prepended to data followed
    by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
    account whether we have meta data already prepended and if so, memmove()s
    this along with the given offset provided there's enough room.

    xdp->data_meta is optional and programs are not required to use it. The
    rationale is that when we process the packet in XDP (e.g. as DoS filter),
    we can push further meta data along with it for the XDP_PASS case, and
    give the guarantee that a clsact ingress BPF program on the same device
    can pick this up for further post-processing. Since we work with skb
    there, we can also set skb->mark, skb->priority or other skb meta data
    out of BPF, thus having this scratch space generic and programmable
    allows for more flexibility than defining a direct 1:1 transfer of
    potentially new XDP members into skb (it's also more efficient as we
    don't need to initialize/handle each of such new members). The facility
    also works together with GRO aggregation. The scratch space at the head
    of the packet can be a multiple of 4 bytes, up to 32 bytes large. Drivers not
    yet supporting xdp->data_meta can simply be set up with xdp->data_meta
    as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
    such that the subsequent match against xdp->data for later access is
    guaranteed to fail.

    The verifier treats xdp->data_meta/xdp->data the same way as we treat
    xdp->data/xdp->data_end pointer comparisons. The requirement for doing
    the compare against xdp->data is that it hasn't been modified from its
    original address we got from ctx access. It may have a range marking
    already from prior successful xdp->data/xdp->data_end pointer comparisons
    though.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
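
    A sketch of the pointer layout and the size rules stated above
    (data_hard_start, then data_meta directly in front of data, then
    data_end; meta length a multiple of 4 and at most 32 bytes;
    data_meta = data + 1 marking a driver without support). The ex_*
    names and the specific error codes are assumptions of this sketch.

```c
#include <errno.h>
#include <stddef.h>

#define EX_META_MAX 32

/* Hypothetical stand-in for struct xdp_buff's pointers. */
struct ex_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data_meta;
	unsigned char *data;
	unsigned char *data_end;
};

/* Move data_meta by `offset` bytes: negative grows the meta area
 * towards data_hard_start, positive shrinks it towards data. */
int ex_xdp_adjust_meta(struct ex_xdp_buff *xdp, int offset)
{
	ptrdiff_t headroom, metalen, off = offset;

	/* data_meta beyond data marks a driver without meta support. */
	if (xdp->data_meta > xdp->data)
		return -ENOTSUP;

	headroom = xdp->data_meta - xdp->data_hard_start;
	metalen = (xdp->data - xdp->data_meta) - off;

	/* The new data_meta must stay within [data_hard_start, data]. */
	if (off < -headroom || metalen < 0)
		return -EINVAL;
	/* Meta area: multiple of 4 bytes, at most 32 bytes. */
	if (metalen > EX_META_MAX || (metalen & 3))
		return -EACCES;

	xdp->data_meta += off;
	return 0;
}
```

    A program would first grow the area with a negative offset and then
    compare data_meta + N against data before writing N bytes, mirroring
    the usual data / data_end bounds check.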
     
  • Just do the rename into bpf_compute_data_pointers() as we'll add
    one more pointer here to recompute.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

02 May, 2017

2 commits


02 Apr, 2017

1 commit

  • Development and testing of networking bpf programs is quite cumbersome.
    Despite availability of user space bpf interpreters the kernel is
    the ultimate authority and execution environment.
    Current test frameworks for TC include creation of netns, veth,
    qdiscs and use of various packet generators just to test functionality
    of a bpf program. XDP testing is even more complicated, since
    qemu needs to be started with gro/gso disabled and precise queue
    configuration, transferring of xdp program from host into guest,
    attaching to virtio/eth0 and generating traffic from the host
    while capturing the results from the guest.

    Moreover analyzing performance bottlenecks in XDP program is
    impossible in virtio environment, since cost of running the program
    is tiny compared to the overhead of virtio packet processing,
    so performance testing can only be done on physical nic
    with another server generating traffic.

    Furthermore ongoing changes to user space control plane of production
    applications cannot be run on the test servers leaving bpf programs
    stubbed out for testing.

    Last but not least, the upstream llvm changes are validated by the bpf
    backend testsuite which has no ability to test the code generated.

    To improve this situation introduce BPF_PROG_TEST_RUN command
    to test and performance benchmark bpf programs.

    Joint work with Daniel Borkmann.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Alexei Starovoitov