27 Apr, 2018

38 commits

  • Up until now we largely assumed that we were interested in ETH_SS_STATS
    type of strings for all ethtool operations. This is about to change with
    the introduction of additional string sets, e.g. ETH_SS_PHY_STATS.
    Update all functions to take an appropriate stringset argument and act
    on it when it is different from ETH_SS_STATS for now.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • This is completely redundant with what netdev_set_default_ethtool_ops()
    does, we are always guaranteed to have a valid dev->ethtool_ops pointer,
    however, within that structure, not all function calls may be populated,
    so we still have to check them individually.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Add a new callback: get_ethtool_phy_stats() which allows network device
    drivers not making use of the PHY library to return PHY statistics.
    Update ethtool_get_phy_stats(), __ethtool_get_sset_count() and
    __ethtool_get_strings() accordingly to interrogate the network device
    about ETH_SS_PHY_STATS.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In order to make it possible for network device drivers that do not
    necessarily have a phy_device attached, but still report PHY statistics,
    have a preliminary refactoring consisting in creating helper functions
    that encapsulate the PHY device driver knowledge within PHYLIB.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The 'pppol2tp' procfs and 'l2tp/tunnels' debugfs files handle reference
    counting of sessions differently than for tunnels.

    For consistency, use the same mechanism for handling both sessions and
    tunnels. That is, drop the reference on the previous session just
    before looking up the next one (rather than in .show()). If necessary
    (if dump stops before *_next_session() returns NULL), drop the last
    reference in .stop().
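
    The reference handling described above can be modelled outside the
    kernel. The following is a minimal sketch in Python, not the l2tp code:
    the Session and Dumper classes and their method names are hypothetical
    stand-ins for the session objects and the seq_file private data.

```python
class Session:
    """Hypothetical refcounted object standing in for an l2tp session."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # reference held by the owning tunnel

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1


class Dumper:
    """Hypothetical stand-in for the per-dump iterator state."""

    def __init__(self, sessions):
        self.sessions = sessions
        self.index = 0
        self.current = None

    def next_session(self):
        # Drop the reference on the previous session just before
        # looking up the next one.
        if self.current is not None:
            self.current.put()
            self.current = None
        if self.index < len(self.sessions):
            self.current = self.sessions[self.index].get()
            self.index += 1
        return self.current

    def show(self):
        # show() only formats the current entry; it never touches refcounts.
        return "session " + self.current.name

    def stop(self):
        # The dump may stop before next_session() returned None, in which
        # case one reference is still held and must be dropped here.
        if self.current is not None:
            self.current.put()
            self.current = None
```

    With this shape, every reference taken during a dump is released
    either by the next lookup or by stop(), whether or not the dump runs
    to completion.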

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • After the introduction of a 128-bit node identity it may be difficult
    for a user to correlate between this identity and the generated node
    hash address.

    We now try to make this easier by introducing a new ioctl() call for
    fetching a node identity by using the hash value as key. This will
    be particularly useful when we extend some of the commands in the
    'tipc' tool, but we also expect regular user applications to need
    this feature.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-04-27

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Add extensive BPF helper description into include/uapi/linux/bpf.h
    and a new script bpf_helpers_doc.py which allows for generating a
    man page out of it. Thus, every helper in BPF now comes with proper
    function signature, detailed description and return code explanation,
    from Quentin.

    2) Migrate the BPF collect metadata tunnel tests from BPF samples over
    to the BPF selftests and further extend them with v6 vxlan, geneve
    and ipip tests, simplify the ipip tests, improve documentation and
    convert to bpf_ntoh*() / bpf_hton*() api, from William.

    3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
    access stack and packet memory. Extend this to allow such helpers
    to also use map values, which enabled use cases where value from
    a first lookup can be directly used as a key for a second lookup,
    from Paul.

    4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
    order to retrieve XFRM state information containing SPI, peer
    address and reqid values, from Eyal.

    5) Various optimizations in nfp driver's BPF JIT in order to turn ADD
    and SUB instructions with negative immediate into the opposite
    operation with a positive immediate such that nfp can better fit
    small immediates into instructions. Savings in instruction count
    up to 4% have been observed, from Jakub.

    6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
    and add support for dumping this through bpftool, from Jiri.

    7) Move the BPF sockmap samples over into BPF selftests instead since
    sockmap was rather a series of tests than sample anyway and this way
    this can be run from automated bots, from John.

    8) Follow-up fix for bpf_adjust_tail() helper in order to make it work
    with generic XDP, from Nikita.

    9) Some follow-up cleanups to BTF, namely, removing unused defines from
    BTF uapi header and renaming 'name' struct btf_* members into name_off
    to make it more clear they are offsets into string section, from Martin.

    10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since
    not run directly but invoked from test_sock_addr.sh, from Yonghong.

    11) Remove redundant ret assignment in sample BPF loader, from Wang.

    12) Add couple of missing files to BPF selftest's gitignore, from Anders.
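
    Item 5 above relies on the identity x + (-k) == x - k. A minimal
    sketch of that rewrite in Python, not the nfp JIT code: the function
    name and the (op, imm) tuple representation are assumptions made for
    illustration, as is the choice to leave INT32_MIN untouched because
    it has no positive 32-bit counterpart.

```python
INT32_MIN = -(2 ** 31)


def flip_negative_immediate(op, imm):
    """Turn ADD/SUB with a negative immediate into the opposite operation
    with a positive immediate; leave everything else unchanged."""
    if op not in ("add", "sub"):
        return op, imm
    if imm >= 0 or imm == INT32_MIN:
        # Nothing to gain, or the negation would not fit in 32 bits.
        return op, imm
    opposite = "sub" if op == "add" else "add"
    return opposite, -imm
```

    A small positive immediate is more likely to fit directly into an
    instruction encoding than its sign-extended negative form, which is
    the saving the item describes.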

    There are two trivial merge conflicts while pulling:

    1) Remove samples/sockmap/Makefile since all sockmap tests have been
    moved to selftests.
    2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
    file since git should ignore all of them.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • 2 redundant ret assignments removed:

    * 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
    label 'done'. No 'ret = 1' needed before the error jump.

    * After the '/* load programs */' part, if everything goes well, then
    the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
    If something goes wrong, 'ret' is set to non-zero, and the redundant
    'ret = 0' after the for clause would cause the error to be skipped.

    For example, if some BPF code cannot provide supported program types
    in ELF SEC("unknown"), the for clause will not call load_and_attach()
    to load the BPF code. 1 should be returned to callers instead of 0.
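
    The effect of the second redundant assignment can be reproduced with
    a small model. This is a sketch in Python under stated assumptions,
    not the sample loader itself: load_sections(), the
    keep_redundant_assignment switch and the prefix list are hypothetical,
    and load_and_attach() is reduced to a stub that always succeeds.

```python
SUPPORTED_PREFIXES = ("kprobe/", "tracepoint/", "xdp", "socket")  # illustrative subset


def load_and_attach(section):
    """Stub: pretend every supported section loads successfully."""
    return 0


def load_sections(sections, keep_redundant_assignment):
    ret = 1  # pessimistic default: error until a program is loaded
    for section in sections:
        if section.startswith(SUPPORTED_PREFIXES):
            ret = load_and_attach(section)
            if ret != 0:
                return ret  # corresponds to the jump to label 'done'
    if keep_redundant_assignment:
        ret = 0  # the removed line: hides the "nothing was loaded" case
    return ret
```

    With only an unsupported section such as "unknown", the loop never
    calls load_and_attach(), so the default error value survives only if
    the trailing assignment is absent.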

    Signed-off-by: Wang Sheng-Hui
    Signed-off-by: Daniel Borkmann

    Wang Sheng-Hui
     
  • Quentin Monnet says:

    ====================
    eBPF helper functions can be called from within eBPF programs to perform
    a variety of tasks that would be otherwise hard or impossible to do with
    eBPF itself. There is a growing number of such helper functions in the
    kernel, but documentation is scarce. The main user space header file
    does contain a short commented description of most helpers, but it is
    somewhat outdated and not complete. It is more a "cheat sheet" than a
    real documentation accessible to new eBPF developers.

    This commit attempts to improve the situation by replacing the existing
    overview for the helpers with a more developed description. Furthermore,
    a Python script is added to generate a manual page for eBPF helpers. The
    workflow is the following, and requires the rst2man utility:

    $ ./scripts/bpf_helpers_doc.py \
    --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
    $ man /tmp/bpf-helpers.7

    The objective is to keep all documentation related to the helpers in a
    single place, and to be able to generate from here a manual page that
    could be packaged in the man-pages repository and shipped with most
    distributions.

    Additionally, parsing the prototypes of the helper functions could
    hopefully be reused, with a different Printer object, to generate
    header files needed in some eBPF-related projects.

    Regarding the description of each helper, it comprises several items:

    - The function prototype.
    - A description of the function and of its arguments (except for a
    couple of cases, when there are no arguments and the return value
    makes the function usage really obvious).
    - A description of return values (if not void).
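
    To illustrate how such a three-part description could be extracted,
    here is a minimal sketch in Python. It is not bpf_helpers_doc.py: the
    comment layout in SAMPLE (prototype at the first level, "Description"
    and "Return" as indented section names) and the parse_helpers()
    function are assumptions made for illustration.

```python
SAMPLE = """\
 * void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
 *     Description
 *         Perform a lookup in *map* for an entry associated to *key*.
 *     Return
 *         Map value associated to *key*, or **NULL** if no entry was
 *         found.
 *
 * u64 bpf_ktime_get_ns(void)
 *     Description
 *         Return the time elapsed since system boot, in nanoseconds.
 *     Return
 *         Current *ktime*.
"""


def parse_helpers(text):
    """Split a comment block into prototype, Description and Return parts."""
    helpers = []
    section = None
    for raw in text.splitlines():
        line = raw.lstrip()
        if not line.startswith("*"):
            continue
        body = line[1:]
        if body.startswith(" "):
            body = body[1:]  # drop the single space after the '*'
        stripped = body.strip()
        if not stripped:
            continue  # blank comment line separates helpers
        if not body[0].isspace():
            # Unindented text is a new function prototype.
            helpers.append({"proto": stripped, "Description": [], "Return": []})
            section = None
        elif stripped in ("Description", "Return"):
            section = stripped
        elif section and helpers:
            helpers[-1][section].append(stripped)
    return helpers
```

    A different output format (RST, a header file, ...) would then only
    need a different consumer of the returned list.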

    Additional items such as the list of compatible eBPF program and map
    types for each helper, Linux kernel version that introduced the helper,
    GPL-only restriction, and commit hash could be added in the future, but
    it was decided on the mailing list to leave them aside for now.

    For several helpers, descriptions are inspired (at times, nearly copied)
    from the commit logs introducing them in the kernel--Many thanks to
    their respective authors! Some sentences were also adapted from comments
    from the reviews, thanks to the reviewers as well. Descriptions were
    completed as much as possible, the objective being to have something easily
    accessible even for people just starting with eBPF. There is probably a bit
    more work to do in this direction for some helpers.

    Some RST formatting is used in the descriptions (not in function
    prototypes, to keep them readable, but the Python script provided in
    order to generate the RST for the manual page does add formatting to
    prototypes, to produce something pretty) to get "bold" and "italics" in
    manual pages. Hopefully, the descriptions in the bpf.h file remain
    perfectly readable. Note that the few trailing white spaces are
    intentional: removing them would break paragraphs for rst2man.

    The descriptions should ideally be updated each time someone adds a new
    helper, or updates the behaviour (new socket option supported, ...) or
    the interface (new flags available, ...) of existing ones.

    To ease the review process, the documentation has been split into several
    patches.

    v3 -> v4:
    - Add a patch (#9) for newly added BPF helpers.
    - Add a patch (#10) to update UAPI bpf.h version under tools/.
    - Use SPDX tag in Python script.
    - Several fixes on man page header and footer, and helpers documentation.
    Please refer to individual patches for details.

    RFC v2 -> PATCH v3:
    Several fixes on man page header and footer, and helpers documentation.
    Please refer to individual patches for details.

    RFC v1 -> RFC v2:
    - Remove "For" (compatible program and map types), "Since" (minimal
    Linux kernel version required), "GPL only" sections and commit hashes
    for the helpers.
    - Add comment on top of the description list to explain how this
    documentation is supposed to be processed.
    - Update Python script accordingly (remove the same sections, and remove
    paragraphs on program types and GPL restrictions from man page
    header).
    - Split series into several patches.
    ====================

    Signed-off-by: Daniel Borkmann
    Cc: linux-doc@vger.kernel.org
    Cc: linux-man@vger.kernel.org

    Daniel Borkmann
     
  • Update tools/include/uapi/linux/bpf.h file in order to reflect the
    changes for BPF helper functions documentation introduced in previous
    commits.

    Signed-off-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions:

    Helper from Nikita:
    - bpf_xdp_adjust_tail()

    Helper from Eyal:
    - bpf_skb_get_xfrm_state()

    v4:
    - New patch (helpers did not exist yet for previous versions).

    Cc: Nikita V. Shirokov
    Cc: Eyal Birger
    Signed-off-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions, all
    written by John:

    - bpf_redirect_map()
    - bpf_sk_redirect_map()
    - bpf_sock_map_update()
    - bpf_msg_redirect_map()
    - bpf_msg_apply_bytes()
    - bpf_msg_cork_bytes()
    - bpf_msg_pull_data()

    v4:
    - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
    "his" to "this". Also add a paragraph on performance improvement over
    bpf_redirect() helper.

    v3:
    - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
    - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
    - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
    for generic XDP but supported on native XDP.
    - bpf_msg_pull_data(): Clarify comment about invalidated verifier
    checks.

    Cc: Jesper Dangaard Brouer
    Cc: John Fastabend
    Signed-off-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions:

    Helpers from Lawrence:
    - bpf_setsockopt()
    - bpf_getsockopt()
    - bpf_sock_ops_cb_flags_set()

    Helpers from Yonghong:
    - bpf_perf_event_read_value()
    - bpf_perf_prog_read_value()

    Helper from Josef:
    - bpf_override_return()

    Helper from Andrey:
    - bpf_bind()

    v4:
    - bpf_perf_event_read_value(): State that this helper should be
    preferred over bpf_perf_event_read().

    v3:
    - bpf_perf_event_read_value(): Fix time of selection for perf event type
    in description. Remove occurrences of "cores" to avoid confusion with
    "CPU".
    - bpf_bind(): Remove last paragraph of description, which was off topic.

    Cc: Lawrence Brakmo
    Cc: Yonghong Song
    Cc: Josef Bacik
    Cc: Andrey Ignatov
    Signed-off-by: Quentin Monnet
    Acked-by: Yonghong Song
    [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
    Acked-by: Andrey Ignatov
    [for bpf_bind()]
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions:

    Helper from Kaixu:
    - bpf_perf_event_read()

    Helpers from Martin:
    - bpf_skb_under_cgroup()
    - bpf_xdp_adjust_head()

    Helpers from Sargun:
    - bpf_probe_write_user()
    - bpf_current_task_under_cgroup()

    Helper from Thomas:
    - bpf_skb_change_head()

    Helper from Gianluca:
    - bpf_probe_read_str()

    Helpers from Chenbo:
    - bpf_get_socket_cookie()
    - bpf_get_socket_uid()

    v4:
    - bpf_perf_event_read(): State that bpf_perf_event_read_value() should
    be preferred over this helper.
    - bpf_skb_change_head(): Clarify comment about invalidated verifier
    checks.
    - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
    checks.
    - bpf_probe_write_user(): Add that dst must be a valid user space
    address.
    - bpf_get_socket_cookie(): Improve description by making clearer that
    the cookie belongs to the socket, and state that it remains stable for
    the life of the socket.

    v3:
    - bpf_perf_event_read(): Fix time of selection for perf event type in
    description. Remove occurrences of "cores" to avoid confusion with
    "CPU".

    Cc: Martin KaFai Lau
    Cc: Sargun Dhillon
    Cc: Thomas Graf
    Cc: Gianluca Borello
    Cc: Chenbo Feng
    Signed-off-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions, all
    written by Daniel:

    - bpf_get_hash_recalc()
    - bpf_skb_change_tail()
    - bpf_skb_pull_data()
    - bpf_csum_update()
    - bpf_set_hash_invalid()
    - bpf_get_numa_node_id()
    - bpf_set_hash()
    - bpf_skb_adjust_room()
    - bpf_xdp_adjust_meta()

    v4:
    - bpf_skb_change_tail(): Clarify comment about invalidated verifier
    checks.
    - bpf_skb_pull_data(): Clarify the motivation for using this helper or
    bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
    *skb*. Clarify comment about invalidated verifier checks.
    - bpf_csum_update(): Fix description of checksum (entire packet, not IP
    checksum). Fix a typo: "header" instead of "helper".
    - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
    - bpf_get_numa_node_id(): State that the helper is not restricted to
    programs attached to sockets.
    - bpf_skb_adjust_room(): Clarify comment about invalidated verifier
    checks.
    - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
    checks.

    Cc: Daniel Borkmann
    Signed-off-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions, all
    written by Daniel:

    - bpf_get_prandom_u32()
    - bpf_get_smp_processor_id()
    - bpf_get_cgroup_classid()
    - bpf_get_route_realm()
    - bpf_skb_load_bytes()
    - bpf_csum_diff()
    - bpf_skb_get_tunnel_opt()
    - bpf_skb_set_tunnel_opt()
    - bpf_skb_change_proto()
    - bpf_skb_change_type()

    v4:
    - bpf_get_prandom_u32(): Warn that the prng is not cryptographically
    secure.
    - bpf_get_smp_processor_id(): Fix a typo (case).
    - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
    being limited to cgroup v1, and to egress path.
    - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
    Add a note about usage with TC and advantage of clsact. Fix a typo in
    return value ("sdb" instead of "skb").
    - bpf_skb_load_bytes(): Make explicit loading large data loads it to the
    eBPF stack.
    - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
    bpf_l3|l4_csum_replace().
    - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
    metadata" mode, and example of this with Geneve.
    - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
    description.
    - bpf_skb_change_proto(): Mention that the main use case is NAT64.
    Clarify comment about invalidated verifier checks.

    v3:
    - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
    a note on the internal random state.
    - bpf_get_smp_processor_id(): Add description, including a note on the
    processor id remaining stable during program run.
    - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
    required to use the helper. Add a reference to related documentation.
    State that placing a task in net_cls controller disables cgroup-bpf.
    - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
    required to use this helper.
    - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.

    Cc: Daniel Borkmann
    Signed-off-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions, all
    written by Alexei:

    - bpf_get_current_pid_tgid()
    - bpf_get_current_uid_gid()
    - bpf_get_current_comm()
    - bpf_skb_vlan_push()
    - bpf_skb_vlan_pop()
    - bpf_skb_get_tunnel_key()
    - bpf_skb_set_tunnel_key()
    - bpf_redirect()
    - bpf_perf_event_output()
    - bpf_get_stackid()
    - bpf_get_current_task()

    v4:
    - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
    note on bpf_redirect_map() providing better performance. Replace "Save
    for" with "Except for".
    - bpf_skb_vlan_push(): Clarify comment about invalidated verifier
    checks.
    - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
    checks.
    - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
    mode, and example tunneling protocols with which it can be used.
    - bpf_skb_set_tunnel_key(): Add a reference to the description of
    bpf_skb_get_tunnel_key().
    - bpf_perf_event_output(): Specify that, and for what purpose, the
    helper can be used with programs attached to TC and XDP.

    v3:
    - bpf_skb_get_tunnel_key(): Change and improve description and example.
    - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
    - bpf_perf_event_output(): Fix first sentence of description. Delete
    wrong statement on context being evaluated as a struct pt_reg. Remove
    the long yet incomplete example.
    - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
    configurable.

    Cc: Alexei Starovoitov
    Signed-off-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Add documentation for eBPF helper functions to bpf.h user header file.
    This documentation can be parsed with the Python script provided in
    another commit of the patch series, in order to provide a RST document
    that can later be converted into a man page.

    The objective is to make the documentation easily understandable and
    accessible to all eBPF developers, including beginners.

    This patch contains descriptions for the following helper functions, all
    written by Alexei:

    - bpf_map_lookup_elem()
    - bpf_map_update_elem()
    - bpf_map_delete_elem()
    - bpf_probe_read()
    - bpf_ktime_get_ns()
    - bpf_trace_printk()
    - bpf_skb_store_bytes()
    - bpf_l3_csum_replace()
    - bpf_l4_csum_replace()
    - bpf_tail_call()
    - bpf_clone_redirect()

    v4:
    - bpf_map_lookup_elem(): Add "const" qualifier for key.
    - bpf_map_update_elem(): Add "const" qualifier for key and value.
    - bpf_map_delete_elem(): Add "const" qualifier for key.
    - bpf_skb_store_bytes(): Clarify comment about invalidated verifier
    checks.
    - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
    about bpf_csum_diff().
    - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
    note about bpf_csum_diff().
    - bpf_tail_call(): Bring minor edits to description.
    - bpf_clone_redirect(): Add a note about the relation with
    bpf_redirect(). Also clarify comment about invalidated verifier
    checks.

    v3:
    - bpf_map_lookup_elem(): Fix description of restrictions for flags
    related to the existence of the entry.
    - bpf_trace_printk(): State that trace_pipe can be configured. Fix
    return value in case an unknown format specifier is met. Add a note on
    kernel log notice when the helper is used. Edit example.
    - bpf_tail_call(): Improve comment on stack inheritance.
    - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.

    Cc: Alexei Starovoitov
    Signed-off-by: Quentin Monnet
    Acked-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • Remove previous "overview" of eBPF helpers from user bpf.h header.
    Replace it by a comment explaining how to process the new documentation
    (to come in following patches) with a Python script to produce RST, then
    man page documentation.

    Also add the aforementioned Python script under scripts/. It is used to
    process include/uapi/linux/bpf.h and to extract helper descriptions, to
    turn it into a RST document that can further be processed with rst2man
    to produce a man page. The script takes one "--filename" option,
    followed by the path to the header. If the script is launched from
    scripts/ in the kernel root directory, it should be able to find the
    location of the header to parse, and "--filename" is then optional.
    If it cannot find the file, then the option becomes mandatory.
    RST-formatted documentation is printed to standard output.

    Typical workflow for producing the final man page would be:

    $ ./scripts/bpf_helpers_doc.py \
    --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
    $ man /tmp/bpf-helpers.7

    Note that the tool kernel-doc cannot be used to document eBPF helpers,
    whose signatures are not available directly in the header files
    (pre-processor directives are used to produce them at the beginning of
    the compilation process).

    v4:
    - Also remove overviews for newly added bpf_xdp_adjust_tail() and
    bpf_skb_get_xfrm_state().
    - Remove vague statement about what helpers are restricted to GPL
    programs in "LICENSE" section for man page footer.
    - Replace license boilerplate with SPDX tag for Python script.

    v3:
    - Change license for man page.
    - Remove "for safety reasons" from man page header text.
    - Change "packets metadata" to "packets" in man page header text.
    - Move and fix comment on helpers introducing no overhead.
    - Remove "NOTES" section from man page footer.
    - Add "LICENSE" section to man page footer.
    - Edit description of file include/uapi/linux/bpf.h in man page footer.

    Signed-off-by: Quentin Monnet
    Signed-off-by: Daniel Borkmann

    Quentin Monnet
     
  • William Tu says:

    ====================
    This patch series provides an end-to-end eBPF tunnel test suite. A common
    topology is created below for all types of tunnels:

    Topology:
    ---------
       root namespace   |     at_ns0 namespace
                        |
        -----------     |     -----------
        | tnl dev |     |     | tnl dev |  (overlay network)
        -----------     |     -----------
        metadata-mode   |     native-mode
         with bpf       |
                        |
        ----------      |     ----------
        |  veth1  | --------- |  veth0  |  (underlay network)
        ----------    peer    ----------

    Device Configuration
    --------------------
    Root namespace with metadata-mode tunnel + BPF
    Device names and addresses:
    veth1 IPv4: 172.16.1.200, IPv6: 00::22 (underlay)
    tunnel dev 11, ex: gre11, IPv4: 10.1.1.200 (overlay)

    Namespace at_ns0 with native tunnel
    Device names and addresses:
    veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
    tunnel dev 00, ex: gre00, IPv4: 10.1.1.100 (overlay)

    End-to-end ping packet flow
    ---------------------------
    Most of the tests start by namespace creation, device configuration,
    then ping the underlay and overlay network. When doing 'ping 10.1.1.100'
    from root namespace, the following operations happen:
    1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
    2) Tnl device's egress BPF program is triggered and sets the tunnel
    metadata, with remote_ip=172.16.1.100 and others.
    3) Outer tunnel header is prepended and the packet is routed to veth1's
    egress.
    4) veth0's ingress queue receives the tunneled packet in namespace at_ns0.
    5) Tunnel protocol handler, e.g. vxlan_rcv, decapsulates the packet.
    6) The packet is forwarded to the overlay tnl dev.

    Test Cases
    -----------------------------
    Tunnel Type | BPF Programs
    -----------------------------
    GRE: gre_set_tunnel, gre_get_tunnel
    IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
    ERSPAN: erspan_set_tunnel, erspan_get_tunnel
    IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
    VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
    IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
    GENEVE: geneve_set_tunnel, geneve_get_tunnel
    IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
    IPIP: ipip_set_tunnel, ipip_get_tunnel
    IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
    ip6ip6_set_tunnel, ip6ip6_get_tunnel
    XFRM: xfrm_get_state
    ====================

    Signed-off-by: Daniel Borkmann

    Daniel Borkmann
     
  • Move the testsuite to
    selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}

    Signed-off-by: William Tu
    Signed-off-by: Daniel Borkmann

    William Tu
     
  • The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
    and samples/bpf/test_tunnel_bpf.sh to selftests. There are a couple
    of changes from the original:
    1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
    2) simplify the original ipip tests (remove iperf tests)
    3) improve documentation
    4) use bpf_ntoh* and bpf_hton* api

    In summary, 'test_tunnel_kern.o' contains the following bpf programs:
    GRE: gre_set_tunnel, gre_get_tunnel
    IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
    ERSPAN: erspan_set_tunnel, erspan_get_tunnel
    IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
    VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
    IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
    GENEVE: geneve_set_tunnel, geneve_get_tunnel
    IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
    IPIP: ipip_set_tunnel, ipip_get_tunnel
    IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
    ip6ip6_set_tunnel, ip6ip6_get_tunnel
    XFRM: xfrm_get_state

    Signed-off-by: William Tu
    Signed-off-by: Daniel Borkmann

    William Tu
     
  • When bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
    pointer, so it was pointing to the new "end of the packet". However, skb's
    len field wasn't properly modified, so on the wire the ethernet frame had
    its original (or even bigger, if adjust_head was used) size. This patch
    fixes that.

    Fixes: 198d83bb3 ("bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")
    Signed-off-by: Nikita V. Shirokov
    Signed-off-by: Daniel Borkmann

    Nikita V. Shirokov
     
  • Display the license "gpl" string in bpftool prog command, like:

    # bpftool prog list
    5: tracepoint name func tag 57cd311f2e27366b gpl
    loaded_at Apr 26/09:37 uid 0
    xlated 16B not jited memlock 4096B

    # bpftool --json --pretty prog show
    [{
    "id": 5,
    "type": "tracepoint",
    "name": "func",
    "tag": "57cd311f2e27366b",
    "gpl_compatible": true,
    "loaded_at": "Apr 26/09:37",
    "uid": 0,
    "bytes_xlated": 16,
    "jited": false,
    "bytes_memlock": 4096
    }
    ]
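
    As a rough sketch of how a script might consume the new field, the
    Python snippet below parses JSON shaped like the sample above and lists
    the ids of GPL-compatible programs. The helper name gpl_prog_ids is
    hypothetical, and the embedded sample is copied from this message rather
    than captured from a live system.

```python
import json

# Sample copied from the commit message; real input would come from
# running `bpftool --json prog show`.
SAMPLE = """
[{
    "id": 5,
    "type": "tracepoint",
    "name": "func",
    "tag": "57cd311f2e27366b",
    "gpl_compatible": true,
    "loaded_at": "Apr 26/09:37",
    "uid": 0,
    "bytes_xlated": 16,
    "jited": false,
    "bytes_memlock": 4096
}]
"""


def gpl_prog_ids(bpftool_json):
    """Return the ids of programs whose "gpl_compatible" field is true."""
    progs = json.loads(bpftool_json)
    # A missing field is treated as "not GPL compatible", which is what
    # output from a bpftool without this patch would look like.
    return [p["id"] for p in progs if p.get("gpl_compatible", False)]


print(gpl_prog_ids(SAMPLE))
```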

    Signed-off-by: Jiri Olsa
    Signed-off-by: Daniel Borkmann

    Jiri Olsa
     
  • Syncing the bpf.h uapi header with tools.

    Signed-off-by: Jiri Olsa
    Signed-off-by: Daniel Borkmann

    Jiri Olsa
     
  • Adding gpl_compatible flag to struct bpf_prog_info
    so it can be dumped via bpf_prog_get_info_by_fd and
    displayed via bpftool progs dump.

    Alexei noticed a 4-byte hole in struct bpf_prog_info,
    so we put the u32 flags field in there, and we can
    keep adding bit fields in there without breaking
    user space.
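
    The padding argument can be illustrated with ctypes. The two structs
    below are hypothetical two-member stand-ins, not the real
    struct bpf_prog_info, and the sketch assumes an ABI where 64-bit
    integers are 8-byte aligned. Under that assumption, placing a u32 bit
    field in the hole changes neither the struct size nor the offset of the
    following member.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical layout: a 32-bit member followed by a 64-bit member,
    # which leaves 4 bytes of padding between them.
    _fields_ = [
        ("some_u32", ctypes.c_uint32),
        ("some_u64", ctypes.c_uint64),
    ]


class After(ctypes.Structure):
    # Same layout with a 1-bit flag stored in a u32 unit that occupies
    # the former padding.
    _fields_ = [
        ("some_u32", ctypes.c_uint32),
        ("flag", ctypes.c_uint32, 1),
        ("some_u64", ctypes.c_uint64),
    ]


def layout(cls):
    """Return (total size, offset of the 64-bit member) for a struct."""
    return ctypes.sizeof(cls), cls.some_u64.offset


print(layout(Before), layout(After))
```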

    Signed-off-by: Jiri Olsa
    Signed-off-by: Daniel Borkmann

    Jiri Olsa
     
  • Willem de Bruijn says:

    ====================
    udp gso

    Segmentation offload reduces cycles/byte for large packets by
    amortizing the cost of protocol stack traversal.

    This patchset implements GSO for UDP. A process can concatenate and
    submit multiple datagrams to the same destination in one send call
    by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
    or passing an analogous cmsg at send time.
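
    A minimal userspace sketch of the socket option follows, in Python for
    brevity. It assumes UDP_SEGMENT has the value 103 (taken from the
    kernel's udp.h; the socket module may not export it), and the helper
    name send_gso_loopback is made up for illustration. On a kernel that
    includes this series one would expect the receiver to see datagrams of
    1000, 1000 and 500 bytes; where the option is rejected, the helper
    returns None.

```python
import socket

# Assumption: numeric value from the kernel's udp.h uapi header, used when
# the socket module does not provide the constant.
UDP_SEGMENT = getattr(socket, "UDP_SEGMENT", 103)


def send_gso_loopback(payload, gso_size, timeout=1.0):
    """Send payload in one call with UDP_SEGMENT set on the sender.

    Returns the sizes of the datagrams seen by a loopback receiver, or
    None if the option is rejected or the transfer fails.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
                socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
            rx.bind(("127.0.0.1", 0))
            rx.settimeout(timeout)
            # IPPROTO_UDP has the same numeric value as SOL_UDP on Linux.
            tx.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, gso_size)
            tx.sendto(payload, rx.getsockname())
            sizes = []
            while sum(sizes) < len(payload):
                sizes.append(len(rx.recv(65535)))
            return sizes
    except OSError:
        return None


print(send_gso_loopback(b"x" * 2500, 1000))
```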

    The stack will send the entire large (up to network layer max size)
    datagram through the protocol layer. At the GSO layer, it is broken
    up into individual segments. All receive the same network layer header
    and UDP src and dst port. All but the last segment have the same UDP
    header, but the last may differ in length and checksum.

    Initial results show a significant reduction in UDP cycles/byte.
    See the main patch for more details and benchmark results.

    udp
    876 MB/s 14873 msg/s 624666 calls/s
    11,205,777,429 cycles

    udp gso
    2139 MB/s 36282 msg/s 36282 calls/s
    11,204,374,561 cycles

    The patch set is broken down as follows:
    - patch 1 is a prerequisite: code rearrangement, noop otherwise
    - patch 2 implements the gso logic
    - patch 3 adds protocol stack support for UDP_SEGMENT
    - patch 4,5,7 are refinements
    - patch 6 adds the cmsg interface
    - patch 8..11 are tests

    This idea was presented previously at netconf 2017-2
    http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

    Changes v1 -> v2
    - Convert __udp_gso_segment to modify headers after skb_segment
    - Split main patch into two, one for gso logic, one for UDP_SEGMENT

    Changes RFC -> v1
    - MSG_MORE:
        fixed, by allowing checksum offload with corking if gso
    - SKB_GSO_UDP_L4:
        made independent from SKB_GSO_UDP
        and removed skb_is_ufo() wrapper
    - NETIF_F_GSO_UDP_L4:
        add to netdev_features_string
        and to netdev-features.txt
        add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value
    - UDP_MAX_SEGMENTS:
        introduce limit on number of segments per gso skb
        to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU
    - CHECKSUM_PARTIAL:
        test against missing feature after ndo_features_check
        if not supported return error, analogous to udp_send_check
    - MSG_ZEROCOPY: removed, deferred for now
    ====================
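
    To put a number on the "extreme case" mentioned for UDP_MAX_SEGMENTS,
    the arithmetic can be sketched as below. The three constant values are
    assumptions recalled from the kernel headers of that period, not taken
    from this message, and within_limit is a hypothetical helper rather
    than the check the kernel performs.

```python
IP_MAX_MTU = 0xFFFF       # assumed value of the kernel constant
IPV4_MIN_MTU = 68         # assumed; the IPv4 minimum MTU
UDP_MAX_SEGMENTS = 64     # assumed cap at the time of this series


def worst_case_segments(max_len=IP_MAX_MTU, min_mtu=IPV4_MIN_MTU):
    """Whole segments produced by the largest datagram at the smallest MTU."""
    return max_len // min_mtu


def within_limit(total_len, gso_size, limit=UDP_MAX_SEGMENTS):
    """True if total_len splits into at most `limit` segments of gso_size."""
    segments = -(-total_len // gso_size)  # ceiling division
    return segments <= limit


print(worst_case_segments())
print(within_limit(64000, 1000), within_limit(64001, 1000))
```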

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Send udp data between a source and sink, optionally with udp gso.
    The two processes are expected to be run on separate hosts.

    A script is included that runs them together over loopback in a
    single namespace for functionality testing.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Corked sockets take a different path to construct a udp datagram than
    the lockless fast path. Test this alternate path.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Connected sockets use path mtu instead of device mtu.

    Test this path by inserting a route mtu that is lower than the device
    mtu. Verify that the path mtu for the connection matches this lower
    number, then run the same test as in the connectionless case.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Validate udp gso, including edge cases (such as min/max gso sizes).

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Virtual devices such as tunnels and bonding can handle large packets.
    Only segment packets when reaching a physical or loopback device.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Allow specifying segment size in the send call.

    The new control message performs the same function as socket option
    UDP_SEGMENT while avoiding the extra system call.
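
    A sketch of the per-call variant from userspace, in Python. It assumes
    the control message carries the segment size as a native-endian 16-bit
    value at level SOL_UDP with type UDP_SEGMENT (value 103 assumed, as the
    socket module may not export it). The helper names gso_cmsg and
    sendmsg_gso are hypothetical.

```python
import socket
import struct

# Assumption: numeric value from the kernel's udp.h uapi header.
UDP_SEGMENT = getattr(socket, "UDP_SEGMENT", 103)


def gso_cmsg(gso_size):
    """Build the (level, type, data) ancillary item for one send call."""
    if not 0 < gso_size <= 0xFFFF:
        raise ValueError("gso_size must fit in an unsigned 16-bit value")
    # IPPROTO_UDP has the same numeric value as SOL_UDP on Linux.
    return (socket.IPPROTO_UDP, UDP_SEGMENT, struct.pack("H", gso_size))


def sendmsg_gso(sock, payload, addr, gso_size):
    """Send payload to addr, asking for gso_size segments on this call only."""
    return sock.sendmsg([payload], [gso_cmsg(gso_size)], 0, addr)


level, ctype, data = gso_cmsg(1400)
print(level, ctype, struct.unpack("H", data)[0])
```

    Compared with setsockopt, nothing is stored on the socket, so different
    send calls on the same socket can use different segment sizes.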

    [ Export udp_cmsg_send for ipv6. -DaveM ]

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • When sending large datagrams that are later segmented, store data in
    page frags to avoid copying from linear in skb_segment.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • skb_segment by default transfers allocated wmem from the gso skb
    to the tail of the segment list. This underreports real truesize
    of the list, especially if the tail might be dropped.

    Similar to tcp_gso_segment, update wmem_alloc with the aggregate
    list truesize and make each segment responsible for its own
    share by setting skb->destructor.

    Clear gso_skb->destructor prior to calling skb_segment to skip
    the default assignment to tail.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Support generic segmentation offload for udp datagrams. Callers can
    concatenate and send at once the payload of multiple datagrams with
    the same destination.

    To set segment size, the caller sets socket option UDP_SEGMENT to the
    length of each discrete payload. This value must be smaller than or
    equal to the relevant MTU.

    A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
    per send call basis.

    Total byte length may then exceed MTU. If not an exact multiple of
    segment size, the last segment will be shorter.
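
    The resulting datagram sizes can be modelled in a few lines of Python.
    This is only an illustration of the splitting rule described above, not
    the kernel code; the function name is made up, and limits such as the
    MTU check are left out.

```python
def gso_segment_sizes(total_len, gso_size):
    """Payload sizes of the datagrams produced from one large send."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    full, rest = divmod(total_len, gso_size)
    # Every segment carries gso_size bytes except a shorter final one
    # when total_len is not an exact multiple.
    return [gso_size] * full + ([rest] if rest else [])


print(gso_segment_sizes(2500, 1000))
print(gso_segment_sizes(3000, 1000))
```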

    The implementation adds a gso_size field to the udp socket, ip(v6)
    cmsg cookie and inet_cork structure to be able to set the value at
    setsockopt or cmsg time and to work with both lockless and corked
    paths.

    Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.

    tcp tso
    3197 MB/s 54232 msg/s 54232 calls/s
    6,457,754,262 cycles

    tcp gso
    1765 MB/s 29939 msg/s 29939 calls/s
    11,203,021,806 cycles

    tcp without tso/gso *
    739 MB/s 12548 msg/s 12548 calls/s
    11,205,483,630 cycles

    udp
    876 MB/s 14873 msg/s 624666 calls/s
    11,205,777,429 cycles

    udp gso
    2139 MB/s 36282 msg/s 36282 calls/s
    11,204,374,561 cycles

    [*] after reverting commit 0a6b2a1dc2a2
    ("tcp: switch to GSO being always on")

    Measured total system cycles ('-a') for one core while pinning both
    the network receive path and benchmark process to that core:

    perf stat -a -C 12 -e cycles \
    ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4

    Note the reduction in calls/s with GSO. Bytes per syscall
    increases from 1470 to 61818.
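
    The bytes-per-syscall figures follow from the throughput and calls/s
    columns of the udp and udp gso rows. The check below assumes "MB" means
    2^20 bytes, which is the reading that reproduces the quoted numbers;
    the inputs are rounded, so the result is approximate.

```python
MIB = 1 << 20  # assumption: the benchmark's "MB" is 2**20 bytes


def bytes_per_call(mb_per_s, calls_per_s):
    """Average payload bytes per send call, rounded down."""
    return mb_per_s * MIB // calls_per_s


print(bytes_per_call(876, 624666))   # udp row
print(bytes_per_call(2139, 36282))   # udp gso row
```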

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Implement generic segmentation offload support for udp datagrams. A
    follow-up patch adds support to the protocol stack to generate such
    packets.

    UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
    a large payload into a number of discrete UDP datagrams.

    The implementation adds a GSO type SKB_GSO_UDP_L4 to differentiate it
    from UFO (SKB_GSO_UDP).

    IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
    registered.

    [ Export __udp_gso_segment for ipv6. -DaveM ]

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • UDP segmentation offload needs access to inet_cork in the udp layer.
    Pass the struct to ip(6)_make_skb instead of allocating it on the
    stack in that function itself.

    This patch is a noop otherwise.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

26 Apr, 2018

2 commits

  • Merging net into net-next to help the bpf folks avoid
    some really ugly merge conflicts.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Jeff Kirsher says:

    ====================
    1GbE Intel Wired LAN Driver Updates 2018-04-25

    This series enables some ethtool and tc-flower filters to be offloaded
    to igb-based network controllers (i210 devices only). This is useful
    when the system configuration needs to steer certain kinds of traffic
    to a specific hardware queue.

    The first two patches in the series are bug fixes.

    The basis of this series is to export the internal API used to
    configure address filters, so that it can be used by ethtool, and to
    extend the functionality so that a source address can be handled.

    Then, we enable the tc-flower offloading implementation to re-use the
    same infrastructure as ethtool, and store its filters in the per-adapter
    "nfc" (Network Filter Config?) list. But for consistency, for
    destructive access they are separated, i.e. a filter added by
    tc-flower can only be removed by tc-flower, but ethtool can read them
    all.

    Only support for VLAN Prio, Source and Destination MAC Address, and
    Ethertype is enabled for now.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller