03 Aug, 2018

1 commit

  • The bpf_get_local_storage() helper function is used
    to get a pointer to the bpf local storage from a bpf program.

    It takes a pointer to a storage map and flags as arguments.
    Right now it accepts only cgroup storage maps, and the flags
    argument has to be 0. Later it can be extended to support
    other types of local storage, e.g. thread-local storage.
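
    For illustration, a minimal sketch of a cgroup program using the
    helper might look as follows (the map layout, names, and section
    annotations here are assumptions for the example, not part of this
    commit):

```c
/* Hypothetical sketch: count bytes per cgroup in local storage. */
struct bpf_map_def SEC("maps") byte_counter = {
	.type = BPF_MAP_TYPE_CGROUP_STORAGE,
	.key_size = sizeof(struct bpf_cgroup_storage_key),
	.value_size = sizeof(__u64),
};

SEC("cgroup/skb")
int count_bytes(struct __sk_buff *skb)
{
	/* Only cgroup storage maps are accepted, and flags must be 0. */
	__u64 *bytes = bpf_get_local_storage(&byte_counter, 0);

	__sync_fetch_and_add(bytes, skb->len);
	return 1; /* allow the packet */
}
```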

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     

04 Jun, 2018

1 commit

  • bpf has been used extensively for tracing. For example, bcc
    contains an almost full set of bpf-based tools to trace kernel
    and user functions/events. Most tracing tools currently either
    filter based on pid or operate system-wide.

    Containers have been used quite extensively in industry and
    cgroup is often used together to provide resource isolation
    and protection. Several processes may run inside the same
    container. It is often desirable to get container-level tracing
    results as well, e.g. syscall count, function count, I/O
    activity, etc.

    This patch implements a new helper, bpf_get_current_cgroup_id(),
    which returns the id of the cgroup the current task is running in.

    A later patch will provide an example to show that
    userspace can get the same cgroup id, so it could
    configure a filter or policy in the bpf program based on
    the task's cgroup id.

    The helper is currently implemented for tracing. It can
    be added to other program types as well when needed.
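
    As an illustrative sketch (the target_cgroup map and its setup are
    assumptions for this example), a tracing program could compare the
    helper's result against a cgroup id configured from userspace:

```c
/* Hypothetical sketch: skip events from tasks outside a target cgroup.
 * 'target_cgroup' is assumed to be a one-element array map that
 * userspace fills with the cgroup id it obtained for the container. */
SEC("kprobe/sys_write")
int trace_write(struct pt_regs *ctx)
{
	__u32 key = 0;
	__u64 cgid = bpf_get_current_cgroup_id();
	__u64 *target = bpf_map_lookup_elem(&target_cgroup, &key);

	if (!target || *target != cgid)
		return 0; /* task is not in the cgroup of interest */

	/* ... count the syscall, record I/O activity, etc. ... */
	return 0;
}
```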

    Acked-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Yonghong Song
     

23 Oct, 2016

1 commit

  • The use case is mainly for soreuseport to select sockets on the
    local NUMA node, but since the helper is generic, let's also add it
    for other networking and tracing program types.

    Suggested-by: Eric Dumazet
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

21 Sep, 2016

1 commit

  • This work implements direct packet access for helpers and direct packet
    write in a similar fashion as already available for XDP types via commits
    4acf6c0b84c9 ("bpf: enable direct packet data write for xdp progs") and
    6841de8b0d03 ("bpf: allow helpers access the packet directly"), and as a
    complementary feature to the already available direct packet read for tc
    (cls/act) programs.

    For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
    and bpf_csum_update(). The first is generally needed for both read and
    write, because accesses would otherwise be limited to the current linear
    skb head. Usually, when the data_end test fails, programs just bail out,
    or, in the direct read case, use bpf_skb_load_bytes() as an alternative
    to overcome this limitation. If such data sits in non-linear parts, we
    can just pull it in once with the new helper, retest and eventually
    access it.
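
    In a tc program, the resulting pull-and-retest pattern looks roughly
    like the sketch below (the required length and return codes are
    illustrative assumptions):

```c
/* Sketch: bail out vs. pull and retest before direct packet access. */
#define NEEDED 64 /* illustrative: bytes the program wants to access */

SEC("classifier")
int parse(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;

	if (data + NEEDED > data_end) {
		/* Data may sit in non-linear parts: pull it in once ... */
		if (bpf_skb_pull_data(skb, NEEDED))
			return TC_ACT_OK;
		/* ... then reload the pointers and retest, since the
		 * verifier invalidates prior checks when skb->data changes. */
		data = (void *)(long)skb->data;
		data_end = (void *)(long)skb->data_end;
		if (data + NEEDED > data_end)
			return TC_ACT_OK;
	}

	/* Direct read (and write) within the first NEEDED bytes is valid. */
	return TC_ACT_OK;
}
```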

    At the same time, this also makes sure the skb is uncloned, which is, of
    course, a necessary condition for direct write. As this needs to be an
    invariant for the write part only, the verifier detects writes and adds
    a prologue that is calling bpf_skb_pull_data() to effectively unclone the
    skb from the very beginning in case it is indeed cloned. The heuristic
    makes use of a similar trick that was done in 233577a22089 ("net: filter:
    constify detection of pkt_type_offset"). This comes at zero cost for other
    programs that do not use the direct write feature. Should a program use
    this feature only sparsely and has read access for the most parts with,
    for example, drop return codes, then such write action can be delegated
    to a tail called program for mitigating this cost of potential uncloning
    to a late point in time where it would have been paid similarly with the
    bpf_skb_store_bytes() as well. The advantage of direct write is that the
    writes are inlined, whereas the helper cannot make any length assumptions
    and thus needs to generate a call to memcpy() even for small sizes; the
    cost of the helper call itself with its sanity checks is also avoided. Plus, when
    direct read is already used, we don't need to cache or perform rechecks
    on the data boundaries (due to verifier invalidating previous checks for
    helpers that change skb->data), so more complex programs using rewrites
    can benefit from switching to direct read plus write.

    For direct packet access to helpers, we save the otherwise needed copy into
    a temp struct sitting on stack memory when use-case allows. Both facilities
    are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
    this to map helpers and csum_diff, and can successively enable other helpers
    where we find it makes sense. Helpers that definitely cannot be allowed for
    this are those that are part of bpf_helper_changes_skb_data() since they can
    change underlying data, and those that write into memory, as this could
    happen for packet-typed args when still cloned. The bpf_csum_update() helper
    accommodates the fact that we need to fix up checksum_complete when using
    direct write
    instead of bpf_skb_store_bytes(), meaning the programs can use available
    helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
    csum_block_add(), csum_block_sub() equivalents in eBPF together with the
    new helper. A usage example will be provided for iproute2's examples/bpf/
    directory.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Sep, 2016

2 commits

  • This work adds BPF_CALL_() macros and converts all the eBPF helper functions
    to use them, in a similar fashion like we do with SYSCALL_DEFINE() macros
    that are used today. Motivation for this is to hide all the register handling
    and all necessary casts from the user, so that it is done automatically in the
    background when adding a BPF_CALL_() call.

    This makes current helpers easier to review, eases writing future helpers,
    avoids getting the casting mess wrong, and allows for extending all helpers
    at once (f.e. build time checks, etc). It also helps to detect more easily
    in code reviews that unused registers are not instrumented in the code by
    accident, breaking compatibility with existing programs.

    BPF_CALL_() internals are quite similar to SYSCALL_DEFINE() ones, with some
    fundamental differences; for example, for generating the actual helper
    function that carries all u64 regs, we need to fill unused regs, so that we
    always end up with 5 u64 regs as arguments.
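
    As an illustration, roughly how a helper definition changes with the
    conversion (bodies elided; the "before" form is a schematic of the old
    raw-register style, not verbatim kernel code):

```c
/* Before: a raw function over five u64 registers with manual casts. */
static u64 bpf_skb_load_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
{
	const struct sk_buff *skb = (const struct sk_buff *)(long)r1;
	void *to = (void *)(long)r3;
	/* ... */
}

/* After: BPF_CALL_4() generates the u64 wrapper and casts for us. */
BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset,
	   void *, to, u32, len)
{
	/* ... */
}
```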

    I reviewed several 0-5 generated BPF_CALL_() variants of the .i results and
    they look all as expected. No sparse issue spotted. We let this also sit for a
    few days with Fengguang's kbuild test robot, and there were no issues seen. On
    s390, it barked on the "uses dynamic stack allocation" notice, which is an old
    one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
    to the call wrapper, just telling that the perf raw record/frag sits on stack
    (gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
    and they were fine as well. All eBPF helpers are now converted to use these
    macros, getting rid of a good chunk of all the raw castings.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Some minor misc cleanups, f.e. using sizeof(__u32) instead of hardcoding
    it; and in __bpf_skb_max_len(), I missed that we always have skb->dev valid
    anyway, so we can drop the unneeded test for dev. A few more other misc
    bits are addressed here as well.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

30 Jun, 2016

1 commit

  • Use smp_processor_id() for the generic helper bpf_get_smp_processor_id()
    instead of the raw variant. This allows for preemption checks when we
    have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only
    need to keep the raw variant for socket filters, but we can reuse the
    helper that is already there from cBPF side.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Apr, 2016

1 commit

  • This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument
    type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(),
    bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm()
    and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can
    also be removed since the verifier already makes sure we stay within bounds
    on stack buffers.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Mar, 2016

1 commit

  • Lots of places in the kernel use memcpy(buf, comm, TASK_COMM_LEN); the
    result is typically passed to print("%s", buf), and extra bytes after
    the terminating zero don't cause any harm.
    In bpf, however, the result of bpf_get_current_comm() is used as part
    of a map key and was causing spurious hash map mismatches.
    Use strlcpy() to guarantee a zero-terminated string.
    The bpf verifier checks that the output buffer is zero-initialized, so
    even for short task names the output buffer doesn't contain junk bytes.
    Note it's not a security concern, since kprobe+bpf is root only.

    Fixes: ffeedafbf023 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors")
    Reported-by: Tobias Waldekranz
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

08 Oct, 2015

1 commit

  • While recently arguing in a seccomp discussion that raw prandom_u32()
    access shouldn't be exposed to unprivileged user space, I forgot the
    fact that the SKF_AD_RANDOM extension actually already does it, and
    has for some time, in cBPF via commit 4cd3675ebf74 ("filter: added
    BPF random opcode").

    Since prandom_u32() is used in a lot of critical networking code,
    let's be more conservative and split their states. Furthermore,
    consolidate eBPF and cBPF prandom handlers to use the new internal
    PRNG. For eBPF, bpf_get_prandom_u32() was only accessible to
    privileged users, but should that change one day, we also don't want
    to leak raw sequences through things like eBPF maps.

    One thought was also to have per-bpf_prog states of our own, but due to ABI
    reasons this is not easily possible, i.e. the program code currently
    cannot access bpf_prog itself, and copying the rnd_state to/from the
    stack scratch space whenever a program uses the prng doesn't really
    seem worth the trouble and seems too hacky. If needed, taus113 could in such
    cases be implemented within eBPF using a map entry to keep the state
    space, or get_random_bytes() could become a second helper in cases where
    performance would not be critical.

    Both sides can trigger a one-time late init via prandom_init_once() on
    the shared state. Performance-wise, there should even be a tiny gain
    as bpf_user_rnd_u32() saves one function call. The PRNG needs to live
    inside the BPF core since kernels could have a NET-less config as well.

    Signed-off-by: Daniel Borkmann
    Acked-by: Hannes Frederic Sowa
    Acked-by: Alexei Starovoitov
    Cc: Chema Gonzalez
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

16 Jun, 2015

1 commit

  • eBPF programs attached to kprobes need to filter based on
    current->pid, uid and other fields, so introduce helper functions:

    u64 bpf_get_current_pid_tgid(void)
    Return: current->tgid << 32 | current->pid

    u64 bpf_get_current_uid_gid(void)
    Return: current_gid << 32 | current_uid

    bpf_get_current_comm(char *buf, int size_of_buf)
    stores current->comm into buf

    They can be used from programs attached to TC as well, to classify
    packets based on current task fields.

    Update tracex2 example to print histogram of write syscalls for each process
    instead of aggregated for all.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

01 Jun, 2015

2 commits

  • Besides others, move bpf_tail_call_proto to the remaining definitions
    of other protos, improve comments a bit (i.e. remove some obvious ones,
    where the code is already self-documenting, add objectives for others),
    simplify bpf_prog_array_compatible() a bit.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • As this is already exported from tracing side via commit d9847d310ab4
    ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might
    as well want to move it to the core, so also networking users can make
    use of it, e.g. to measure diffs for certain flows from ingress/egress.

    Signed-off-by: Daniel Borkmann
    Cc: Alexei Starovoitov
    Cc: Ingo Molnar
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

16 Mar, 2015

2 commits

  • This patch adds the possibility to obtain raw_smp_processor_id() in
    eBPF. Currently, this is only possible in classic BPF where commit
    da2033c28226 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
    facilities for this.

    Perhaps most importantly, this would also allow us to track per CPU
    statistics with eBPF maps, or to implement a poor-man's per CPU data
    structure through eBPF maps.

    An example function prototype looks like:

    u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
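
    Such a poor-man's per-CPU counter might be sketched as follows (the
    cpu_counters array map and the program body are assumptions for this
    example; helper bindings follow the function pointer style above):

```c
/* Hypothetical sketch: per-CPU hit counts in an eBPF array map keyed
 * by CPU id. */
static u32 (*get_smp_processor_id)(void) =
	(void *)BPF_FUNC_get_smp_processor_id;
static void *(*map_lookup_elem)(void *map, void *key) =
	(void *)BPF_FUNC_map_lookup_elem;

int count_per_cpu(struct __sk_buff *skb)
{
	u32 cpu = get_smp_processor_id();
	u64 *cnt = map_lookup_elem(&cpu_counters, &cpu);

	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	return 0;
}
```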

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This work is similar to commit 4cd3675ebf74 ("filter: added BPF
    random opcode") and adds a possibility for packet sampling in eBPF.

    Currently, this is only possible in classic BPF, where it is useful
    for combining sampling with f.e. packet sockets, and possible with
    tc as well.

    An example function prototype looks like:

    u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     


19 Nov, 2014

1 commit