14 Jul, 2018

9 commits

  • People noticed that the code matches on the IEEE 802.1ad (ETH_P_8021AD) ethertype,
    and this implies Q-in-Q or double tagged VLANs. Thus, we better parse
    the next VLAN header too. It is even marked as a TODO.

    This is relevant for real world use-cases, as XDP cpumap redirect can be
    used when the NIC RSS hashing is broken. E.g. the ixgbe driver HW cannot
    handle double tagged VLAN packets, and places everything into a single
    RX queue. Using cpumap redirect, users can redistribute traffic across
    CPUs to solve this, which is faster than the network stack's RPS solution.

    It is left as an exercise how to distribute the packets across CPUs. It
    would be convenient to use the RX hash, but that is not _yet_ exposed
    to XDP programs. For now, users can code their own hash, as I've demonstrated
    in the Suricata code (where Q-in-Q is handled correctly).

    Reported-by: Florian Maury
    Reported-by: Marek Majkowski
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
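
    The Q-in-Q handling described above can be sketched in plain C. This
    is a minimal userspace illustration, not the sample's actual XDP code:
    parse_l3_proto(), read_be16() and the macro names are hypothetical,
    and the frame is assumed to be a raw Ethernet frame in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_8021Q  0x8100u /* single VLAN tag (ETH_P_8021Q) */
#define ETHERTYPE_8021AD 0x88A8u /* outer Q-in-Q tag (ETH_P_8021AD) */
#define ETH_HDR_LEN      14u     /* dst MAC + src MAC + ethertype */
#define VLAN_HDR_LEN     4u      /* TCI + encapsulated ethertype */

/* Read a big-endian 16-bit value without alignment assumptions. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static int is_vlan(uint16_t proto)
{
	return proto == ETHERTYPE_8021Q || proto == ETHERTYPE_8021AD;
}

/*
 * Hypothetical helper: skip up to two VLAN headers, then report the
 * L3 ethertype and the offset of the L3 header. Returns 0 on success
 * and -1 if the frame is too short. Every access is bounds-checked
 * first, in the spirit of what the BPF verifier demands of XDP code.
 */
static int parse_l3_proto(const uint8_t *data, size_t len,
			  uint16_t *proto, size_t *l3_off)
{
	size_t off = ETH_HDR_LEN;
	uint16_t p;
	int i;

	assert(data && proto && l3_off);
	if (len < ETH_HDR_LEN)
		return -1;
	p = read_be16(data + 12);

	for (i = 0; i < 2 && is_vlan(p); i++) {
		if (len < off + VLAN_HDR_LEN)
			return -1;
		p = read_be16(data + off + 2); /* skip the 2-byte TCI */
		off += VLAN_HDR_LEN;
	}

	*proto = p;
	*l3_off = off;
	return 0;
}
```

    With both tags skipped, a hash over the L3/L4 fields at l3_off could
    then pick the destination CPU.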
     
  • Jakub Kicinski says:

    ====================
    This set is adding support for loading driver and offload XDP
    at the same time. This enables advanced use cases where some
    of the work is offloaded to the NIC and some is done by the host.
    Separate netlink attributes are added for each mode of operation.
    Driver callbacks for offload are cleaned up a little, including
    removal of .prog_attached flag.
    ====================

    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Split handling of offloaded and driver programs completely. Since
    offloaded programs always come with XDP_FLAGS_HW_MODE set, in reality
    there could be no sharing anyway: programs would only be installed
    in the driver or in hardware. Splitting the handling allows us to
    install programs in HW and in the driver at the same time.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • Add tests for having an XDP program attached in the driver and
    another one attached in HW simultaneously.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • Allow netdevsim to accept driver and offload attachment of XDP
    BPF programs at the same time.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • Split the query of HW-attached program from the software one.
    Introduce new .ndo_bpf command to query HW-attached program.
    This will allow drivers to install different programs in HW
    and SW at the same time. Netlink can now also carry multiple
    programs on dump (in which case mode will be set to
    XDP_ATTACHED_MULTI and user has to check per-attachment point
    attributes, IFLA_XDP_PROG_ID will not be present). We reuse
    IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size()
    doesn't need to be updated.

    Note that the installation side is still not there, since all
    drivers currently reject installing more than one program at
    a time.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
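
    The dump rule above (one program: report its mode and the catch-all
    ID; several programs: XDP_ATTACHED_MULTI and no catch-all ID) can be
    modelled as below. This is a sketch under assumptions, not the rtnetlink
    code: struct xdp_dump, report_one() and summarize() are hypothetical
    names, and the enum is only modelled on the XDP_ATTACHED_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative enum, modelled on the XDP_ATTACHED_* values. */
enum attached_mode {
	ATTACHED_NONE = 0,
	ATTACHED_DRV,
	ATTACHED_SKB,
	ATTACHED_HW,
	ATTACHED_MULTI,
};

/* Hypothetical summary of what a dump would carry. */
struct xdp_dump {
	enum attached_mode mode;
	int has_prog_id;   /* catch-all ID is present for one program only */
	uint32_t prog_id;
};

/* Fold one attachment point into the summary; id 0 means "not attached". */
static void report_one(struct xdp_dump *d, enum attached_mode mode, uint32_t id)
{
	assert(d != NULL);
	if (id == 0)
		return;
	if (d->mode != ATTACHED_NONE) {
		/* Second program seen: only per-mode attributes stay valid. */
		d->mode = ATTACHED_MULTI;
		d->has_prog_id = 0;
		d->prog_id = 0;
		return;
	}
	d->mode = mode;
	d->has_prog_id = 1;
	d->prog_id = id;
}

static struct xdp_dump summarize(uint32_t skb_id, uint32_t drv_id, uint32_t hw_id)
{
	struct xdp_dump d = { ATTACHED_NONE, 0, 0 };

	report_one(&d, ATTACHED_SKB, skb_id);
	report_one(&d, ATTACHED_DRV, drv_id);
	report_one(&d, ATTACHED_HW, hw_id);
	return d;
}
```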
     
  • Basic operations drivers perform during xdp setup and query can
    be moved to helpers in the core. Encapsulate program and flags
    into a structure and add helpers. Note that the structure is
    intended as the "main" program information source in the driver.
    Most drivers will additionally place the program pointer in their
    fast path or ring structures.

    The helpers don't have a huge impact now, but they will
    decrease the code duplication when programs can be installed
    in HW and driver at the same time. Encapsulating the basic
    operations in helpers will hopefully also reduce the number
    of changes to drivers which adopt them.

    Helpers could really be static inline, but they depend on
    definition of struct netdev_bpf which means they'd have
    to be placed in netdevice.h, an already 4500 line header.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
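
    A rough userspace model of the "program plus flags" structure and its
    helpers is shown below. All names here (struct attachment_info,
    attachment_query_id(), attachment_flags_ok(), attachment_setup()) are
    hypothetical stand-ins, and reference counting of the program is left
    out; the flag bits are assumed to match XDP_FLAGS_{SKB,DRV,HW}_MODE.

```c
#include <stddef.h>
#include <stdint.h>

/* Mode bits, assumed to mirror XDP_FLAGS_{SKB,DRV,HW}_MODE. */
#define FLAG_SKB_MODE (1u << 1)
#define FLAG_DRV_MODE (1u << 2)
#define FLAG_HW_MODE  (1u << 3)
#define FLAG_MODES    (FLAG_SKB_MODE | FLAG_DRV_MODE | FLAG_HW_MODE)

/* Stand-in for struct bpf_prog; only the ID matters here. */
struct prog {
	uint32_t id;
};

/* The "main" program information a driver would keep per attachment. */
struct attachment_info {
	struct prog *prog;
	uint32_t flags;
};

/* Query: report the ID of the attached program, or 0 if there is none. */
static uint32_t attachment_query_id(const struct attachment_info *info)
{
	return info->prog ? info->prog->id : 0;
}

/* A replacement must use the same mode bits as the program it replaces. */
static int attachment_flags_ok(const struct attachment_info *info,
			       uint32_t new_flags)
{
	if (info->prog && ((new_flags ^ info->flags) & FLAG_MODES))
		return 0;
	return 1;
}

/* Setup: remember the new program (NULL detaches) and its flags. */
static void attachment_setup(struct attachment_info *info,
			     struct prog *prog, uint32_t flags)
{
	info->prog = prog;
	info->flags = flags;
}
```

    A driver that can hold a program in HW and one in the driver would
    keep two such structures, one per attachment point.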
     
  • prog_attached of struct netdev_bpf should have been superseded
    by simply setting prog_id a long time ago, but we kept it around
    to allow offloading drivers to communicate attachment mode (drv
    vs hw). Subsequently drivers were also allowed to report back
    attachment flags (prog_flags), and since nowadays only programs
    attached with XDP_FLAGS_HW_MODE can get offloaded, we can tell
    the attachment mode from the flags the driver reports. Remove
    the prog_attached member.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
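
    Deriving the attachment mode from the reported flags, as argued
    above, might look like the following. This is a sketch only:
    mode_from_report() and the enum are hypothetical, and the HW bit is
    assumed to equal XDP_FLAGS_HW_MODE.

```c
#include <stdint.h>

#define FLAG_HW_MODE (1u << 3) /* assumed to mirror XDP_FLAGS_HW_MODE */

enum attach_mode {
	MODE_NONE,
	MODE_DRV,
	MODE_HW,
};

/*
 * A zero prog_id means nothing is attached, so no separate
 * "attached" flag is needed; the HW bit alone separates drv from hw.
 */
static enum attach_mode mode_from_report(uint32_t prog_id, uint32_t prog_flags)
{
	if (prog_id == 0)
		return MODE_NONE;
	return (prog_flags & FLAG_HW_MODE) ? MODE_HW : MODE_DRV;
}
```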
     
  • In preparation for simultaneous driver and hardware XDP
    support, add per-mode attributes. The catch-all IFLA_XDP_PROG_ID
    will still be reported, but user space can now also access the
    program ID in a new IFLA_XDP_<mode>_PROG_ID attribute.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     

13 Jul, 2018

24 commits

  • Russell King says:

    ====================
    Four further JIT compiler improvements for 32-bit ARM.
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Improve the 64-bit ALU implementation from:

    movw r8, #65532
    movt r8, #65535
    movw r9, #65535
    movt r9, #65535
    ldr r7, [fp, #-44]
    adds r7, r7, r8
    str r7, [fp, #-44]
    ldr r7, [fp, #-40]
    adc r7, r7, r9
    str r7, [fp, #-40]

    to:

    movw r8, #65532
    movt r8, #65535
    movw r9, #65535
    movt r9, #65535
    ldrd r6, [fp, #-44]
    adds r6, r6, r8
    adc r7, r7, r9
    strd r6, [fp, #-44]

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
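
    For readers less familiar with the adds/adc pair in the listing
    above: it is a 64-bit addition carried out on two 32-bit halves.
    The C model below is only an illustration (struct reg64 and
    add64_imm() are hypothetical names). The movw/movt pairs in the
    listing build 0xfffffffc and 0xffffffff, i.e. the immediate -4
    sign-extended to 64 bits.

```c
#include <stdint.h>

/* A 64-bit BPF register held as two 32-bit words, low word first. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

/* Add a sign-extended 32-bit immediate to a 64-bit register pair. */
static struct reg64 add64_imm(struct reg64 r, int32_t imm)
{
	uint32_t imm_lo = (uint32_t)imm;
	uint32_t imm_hi = imm < 0 ? 0xffffffffu : 0u; /* sign extension */
	struct reg64 out;

	out.lo = r.lo + imm_lo;                   /* adds: sets the carry */
	out.hi = r.hi + imm_hi + (out.lo < r.lo); /* adc: consumes it */
	return out;
}
```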
     
  • Improve the 64-bit store implementation from:

    ldr r6, [fp, #-8]
    str r8, [r6]
    ldr r6, [fp, #-8]
    mov r7, #4
    add r7, r6, r7
    str r9, [r7]

    to:

    ldr r6, [fp, #-8]
    str r8, [r6]
    str r9, [r6, #4]

    We leave the store as two separate STR instructions rather than using
    STRD as the store may not be aligned, and STR can handle misalignment.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Improve the 64-bit sign-extended immediate from:

    mov r6, #1
    str r6, [fp, #-52] ; 0xffffffcc
    mov r6, #0
    str r6, [fp, #-48] ; 0xffffffd0

    to:

    mov r6, #1
    mov r7, #0
    strd r6, [fp, #-52] ; 0xffffffcc

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Rather than writing each 32-bit half of the 64-bit immediate value
    separately when the register is on the stack:

    movw r6, #45056 ; 0xb000
    movt r6, #60979 ; 0xee33
    str r6, [fp, #-44] ; 0xffffffd4
    mov r6, #0
    str r6, [fp, #-40] ; 0xffffffd8

    arrange to use the double-word store when available instead:

    movw r6, #45056 ; 0xb000
    movt r6, #60979 ; 0xee33
    mov r7, #0
    strd r6, [fp, #-44] ; 0xffffffd4

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Russell King says:

    ====================
    This series improves the ARM BPF JIT compiler by:

    - enumerating the stack layout rather than using constants that happen
      to be multiples of four
    - rejig the BPF "register" accesses to use negative numbers instead of
      positive, which could be confused with register numbers in the bpf2a32
      array.
    - since we maintain the ARM FP register as a pointer to the top of our
      scratch space (or, with frame pointers enabled, a valid ARM frame
      pointer register), we can access our scratch space using FP, which is
      constant across all BPF programs, including tail-called programs.
    - use immediate forms of ARM instructions where possible, rather than
      first loading the immediate into an ARM register.
    - use load-with-shift instruction rather than separate shift instruction
      followed by load
    - avoid reloading index and array in the tail-call code
    - use double-word load/store instructions where available

    Version 2:

    - Fix ARMv5 test pointed out by Olof
    - Fix build error found by 0-day (adding an additional patch)
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Use double-word loads and stores where these instructions are
    supported by the CPU architecture.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Always use an odd/even register pair for our 64-bit registers, so that
    we're able to use the double-word load/store instructions in the future.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Rearranging the order of the initial tail call code a little allows us
    to avoid reloading the 'array' pointer.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Avoid reloading 'index' after we have validated it - it remains in
    tmp2[1] up to the point that we begin the code to index the pointer
    array, so with a little rearrangement of the registers, we can use
    the already loaded value.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Rather than pre-shifting the rm register for the ldr in the tail call,
    shift it in the load instruction. This eliminates one unnecessary
    instruction.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Rather than moving constants to a register and then using them in a
    subsequent instruction, use them directly in the desired instruction
    cutting out the "middle" register. This removes two instructions from
    the tail call code path.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Provide a version of the imm8m() function that the compiler can optimise
    when used with a constant expression.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
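
    As background for imm8m(): an ARM data-processing immediate is an
    8-bit value rotated right by an even amount, packed as a 4-bit
    rotation field above the 8 value bits. The sketch below shows the
    run-time search for such an encoding; it does not reproduce the
    constant-expression variant this commit adds, and the helper bodies
    are written for this example rather than copied from the JIT.

```c
#include <stdint.h>

/* Rotate left by n bits; n is reduced mod 32 so n == 0 is safe. */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	n &= 31;
	return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate right by n bits. */
static uint32_t ror32(uint32_t x, unsigned int n)
{
	return rol32(x, 32 - (n & 31));
}

/*
 * Return the 12-bit "rotation:imm8" encoding of x, or -1 if x cannot
 * be expressed as an 8-bit value rotated right by an even amount.
 */
static int imm8m(uint32_t x)
{
	unsigned int rot;

	for (rot = 0; rot < 16; rot++)
		if ((x & ~ror32(0xff, 2 * rot)) == 0)
			return (int)(rol32(x, 2 * rot) | (rot << 8));
	return -1;
}
```

    A value such as 0x101 needs nine contiguous bits and so has no
    encoding; the JIT then has to load it into a register first.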
     
  • Access the eBPF scratch space using the frame pointer rather than our
    stack pointer, as the offsets from the ARM frame pointer are constant
    across all eBPF programs.

    Since we no longer reference the scratch space registers from the stack
    pointer, this simplifies emit_push_r64() as it no longer needs to know
    how many words are pushed onto the stack.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Provide a couple of 64-bit register accessors, and use them where
    appropriate.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Many of the code paths need to have knowledge about whether a register
    is stacked or in a CPU register. Move this decision making to a pair
    of helper functions instead of having it scattered throughout the
    code.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • The decision about whether a BPF register is on the stack or in a CPU
    register is detected at the top BPF insn processing level, and then
    percolated throughout the remainder of the code. Since we now use
    negative register values to represent stacked registers, we can detect
    where a BPF register is stored without resorting to carrying this
    additional metadata through all code paths.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
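
    The idea of encoding "stacked" in the sign of the register value can
    be illustrated as follows. The slot names, STACK_OFFSET() and the
    small reg_map table are assumptions made for this sketch and do not
    claim to match the JIT's actual layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scratch layout: one enum entry per 32-bit stack slot. */
enum scratch_slot {
	SLOT_R2_HI,
	SLOT_R2_LO,
	SLOT_R3_HI,
	SLOT_R3_LO,
	SLOT_COUNT,
};

/* Slots live below the frame pointer, so every offset is negative. */
#define STACK_OFFSET(slot) (-4 - (slot) * 4)

/* ARM core registers are numbered 0..15; stacked registers are < 0. */
static bool is_stacked(int8_t reg)
{
	return reg < 0;
}

/* Illustrative mapping of two BPF registers to {hi, lo} locations. */
static const int8_t reg_map[2][2] = {
	{ 7, 6 },                                               /* in r7:r6 */
	{ STACK_OFFSET(SLOT_R2_HI), STACK_OFFSET(SLOT_R2_LO) }, /* on stack */
};
```

    Because the location is recoverable from the value itself, callers
    no longer need to pass a separate "is on stack" flag alongside it.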
     
  • Use negative numbers for eBPF registers that live on the stack.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Provide a set of load/store opcode generators that work with negative
    immediates as well as positive ones.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
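
    Why negative immediates need their own handling: the ARM LDR
    (immediate) encoding stores a 12-bit magnitude plus a separate U
    bit that selects add or subtract. A minimal encoder sketch follows;
    encode_ldr() and the macro names are hypothetical, and only the
    offset addressing form (no write-back) is covered.

```c
#include <assert.h>
#include <stdint.h>

#define COND_AL    0xe0000000u /* condition field: always */
#define LDR_IMM    0x05100000u /* LDR (immediate), P=1, W=0, U=0 */
#define LDST_U_BIT 0x00800000u /* set: add offset, clear: subtract */

/* Encode "ldr rt, [rn, #off]" for -4095 <= off <= 4095. */
static uint32_t encode_ldr(unsigned int rt, unsigned int rn, int off)
{
	uint32_t insn = COND_AL | LDR_IMM | (rn << 16) | (rt << 12);

	assert(rt < 16 && rn < 16);
	assert(off >= -4095 && off <= 4095);

	if (off >= 0)
		insn |= LDST_U_BIT | (uint32_t)off;
	else
		insn |= (uint32_t)(-off); /* magnitude only, U stays clear */
	return insn;
}
```

    For example, encode_ldr(7, 11, -44) yields the word for
    "ldr r7, [fp, #-44]", with fp being r11.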
     
  • Enumerate the contents of the JIT scratch stack layout used for storing
    some of the JIT's 64-bit registers, tail call counter and AX register.

    Signed-off-by: Russell King
    Signed-off-by: Daniel Borkmann

    Russell King
     
  • Quentin Monnet says:

    ====================
    The three patches in this series are related to the documentation for eBPF
    helpers. The first patch brings minor formatting edits to the documentation
    in include/uapi/linux/bpf.h, and the second one updates the related header
    file under tools/.

    The third patch adds a Makefile under tools/bpf for generating the
    documentation (man pages) about eBPF helpers. The targets defined in this
    file can also be called from the bpftool directory (please refer to
    relevant commit logs for details).
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Provide a new Makefile.helpers in tools/bpf, in order to build and
    install the man page for eBPF helpers. This Makefile is also included in
    the one used to build bpftool documentation, so that it can be called
    either on its own (cd tools/bpf && make -f Makefile.helpers) or from
    bpftool directory (cd tools/bpf/bpftool && make doc, or
    cd tools/bpf/bpftool/Documentation && make helpers).

    Makefile.helpers is not added directly to bpftool to avoid changing its
    Makefile too much (helpers are not 100% directly related to bpftool).
    But the possibility to build the page from bpftool directory makes us
    able to package the helpers man page with bpftool, and to install it
    along with bpftool documentation, so that the doc for helpers becomes
    easily available to developers through the "man" program.

    Cc: linux-man@vger.kernel.org
    Suggested-by: Daniel Borkmann
    Signed-off-by: Quentin Monnet
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Update with latest changes from include/uapi/linux/bpf.h header.

    Signed-off-by: Quentin Monnet
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Minor formatting edits for eBPF helpers documentation, including blank
    lines removal, fix of item list for return values in bpf_fib_lookup(),
    and missing prefix on bpf_skb_load_bytes_relative().

    Signed-off-by: Quentin Monnet
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     

12 Jul, 2018

7 commits

  • Jakub Kicinski says:

    ====================
    This series starts with two minor clean ups to test_offload.py
    selftest script.

    The next 11 patches extend the abilities of bpftool prog load
    beyond the simple cgroup use cases. Three new parameters are
    added:

    - type - allows specifying program type, independent of how
      code sections are named;
    - map - allows reusing existing maps, instead of creating a new
      map on every program load;
    - dev - offload/binding to a device.

    A number of changes to libbpf are required to accomplish the task.
    The logic mapping section names to program types is exposed. We should
    probably aim to use the libbpf program section naming everywhere.
    For reuse of maps we need to allow users to set the FD for a bpf map
    object in libbpf.

    Examples

    Load program my_xdp.o and pin it as /sys/fs/bpf/my_xdp, for xdp
    program type:

    $ bpftool prog load my_xdp.o /sys/fs/bpf/my_xdp \
        type xdp

    As above but for offload:

    $ bpftool prog load my_xdp.o /sys/fs/bpf/my_xdp \
        type xdp \
        dev netdevsim0

    Load program my_maps.o, but for the first map reuse map id 17,
    and for the map called "other_map" reuse pinned map /sys/fs/bpf/map0:

    $ bpftool prog load my_maps.o /sys/fs/bpf/prog \
        map idx 0 id 17 \
        map name other_map pinned /sys/fs/bpf/map0

    v3:
    - fix return codes in patch 5;
    - rename libbpf_prog_type_by_string() -> libbpf_prog_type_by_name();
    - fold file path into xattr in patch 8;
    - add patch 10;
    - use dup3() in patch 12;
    - depend on fd value in patch 12;
    - close old fd in patch 12.
    v2:
    - add compat for reallocarray().
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Add map parameter to prog load which will allow reuse of existing
    maps instead of creating new ones.

    We need feature detection and compat code for reallocarray, since
    it's not available in many libc versions.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • More advanced applications may want to only replace programs without
    destroying associated maps. Allow libbpf users to achieve that.
    Instead of always creating all of the maps at load time, expose to
    users an API to reconstruct the map object from already existing
    map.

    The map parameters are read from the kernel and replace the parameters
    of the ELF map. libbpf does not restrict the map replacement, i.e.
    the reused map does not have to be compatible with the ELF map
    definition. We rely on the verifier for checking the compatibility
    between maps and programs. The ELF map definition is completely
    overwritten by the information read from the kernel, to make sure
    libbpf's view of map object corresponds to the actual map.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Andrey Ignatov
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • reallocarray() is a safer variant of realloc() which checks for
    multiplication overflow in case of array allocation. Since it's
    not available in glibc < 2.26, import the kernel's overflow.h and
    add a static inline implementation when needed. Use feature
    detection to probe for the existence of reallocarray().

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Reviewed-by: Jiong Wang
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
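
    The overflow check that makes reallocarray() safer than a bare
    realloc(ptr, nmemb * size) can be sketched as below. This is not the
    compat code from the patch, which builds on the kernel's overflow.h;
    compat_reallocarray() is a hypothetical name and the check is done
    with a plain division instead.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Resize ptr to hold nmemb elements of size bytes each. Fails with
 * ENOMEM, leaving ptr untouched, if nmemb * size would overflow.
 */
static void *compat_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

    Without the check, an oversized nmemb would wrap around and the
    caller would receive a buffer far smaller than it asked for.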
     
  • libbpf_strerror() depends on XSI-compliant (POSIX) version of
    strerror_r(), which prevents us from using GNU-extensions in
    libbpf.c, like reallocarray() or dup3(). Move error printing
    code into a separate file to allow it to continue using POSIX
    strerror_r().

    No functional changes.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • bpf_prog_load() is a very useful helper but it doesn't give us full
    flexibility of modifying the BPF objects before loading. Open code
    bpf_prog_load() in bpftool so we can add extra logic in following
    commits.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski
     
  • Similarly to bpf_prog_load(), users of bpf_object__open() may need
    to specify the expected program type. Program type is needed at
    open to avoid the kernel version check for program types which don't
    require it.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Acked-by: Andrey Ignatov
    Signed-off-by: Daniel Borkmann

    Jakub Kicinski