15 Feb, 2019

1 commit


13 Feb, 2019

1 commit

  • [ Upstream commit 5f30b2e823484ce6a79f2b59901b6351c15effa6 ]

    The return value of kzalloc() should always be checked, notably in
    example code that may be used as a reference. If an allocation fails in
    livepatch_fix1_dummy_alloc() or dummy_alloc(), the previous allocation
    is freed (thanks to Petr Mladek for catching this) and NULL is returned.
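
    The check-and-unwind pattern described above can be sketched in
    userspace C. This is only an illustration: calloc()/free() stand in
    for kzalloc()/kfree(), and the struct and function names are
    hypothetical, not the ones used in the livepatch samples.

```c
#include <stdlib.h>

/* Hypothetical object with a second, separately allocated member. */
struct dummy_sketch {
    int *leak;
};

/* Allocate the object and its member; undo the first allocation
 * if the second one fails, and return NULL in every failure case. */
struct dummy_sketch *dummy_alloc_sketch(void)
{
    struct dummy_sketch *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;

    d->leak = calloc(1, sizeof(*d->leak));
    if (!d->leak) {
        free(d);        /* free the previous allocation */
        return NULL;
    }
    return d;
}

/* Release both allocations; accepts NULL like free() does. */
void dummy_free_sketch(struct dummy_sketch *d)
{
    if (!d)
        return;
    free(d->leak);
    free(d);
}
```

    Callers then only need a single NULL check on the returned pointer.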

    Signed-off-by: Nicholas Mc Guire
    Fixes: 439e7271dc2b ("livepatch: introduce shadow variable API")
    Acked-by: Joe Lawrence
    Reviewed-by: Petr Mladek
    Acked-by: Miroslav Benes
    Signed-off-by: Jiri Kosina
    Signed-off-by: Sasha Levin

    Nicholas Mc Guire
     

26 Jan, 2019

1 commit

  • [ Upstream commit 5a863813216ce79e16a8c1503b2543c528b778b6 ]

    Currently, a kprobe_events failure is not handled properly.
    Because system() is called to write to kprobe_events indirectly,
    it cannot be determined whether an error comes from kprobe or from system().

    // buf = "echo '%c:%s %s' >> /s/k/d/t/kprobe_events"
    err = system(buf);
    if (err < 0) {
        printf("failed to create kprobe ..");
        return -1;
    }

    For example, when running the ./tracex7 sample on an ext4 partition,
    "echo p:open_ctree open_ctree >> /s/k/d/t/kprobe_events"
    makes system() return 256.
    => The error comes from kprobe, but it is not handled correctly.

    According to the system(3) man page, its return value is the
    termination status of the child shell; it is -1 only when system()
    itself fails, not when the command fails.

    This means the check in the snippet above (err < 0) does not
    catch the kprobe failure.

    ex) running ./tracex7 with ext4 env.
    # Current Output
    sh: echo: I/O error
    failed to open event open_ctree

    # Desired Output
    failed to create kprobe 'open_ctree' error 'No such file or directory'

    The problem is that it cannot be verified whether the error comes from
    the child process or from system() itself. Using write() directly makes
    the command failure visible, because every error is reported as -1.
    So I suggest using write() directly on 'kprobe_events' rather than
    calling system().
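
    A minimal sketch of the write()-based approach, assuming a
    hypothetical helper name and a caller-supplied path (this is not the
    code from the patch itself): any failure of open() or write() is
    returned as -1 with errno set, so the caller can print strerror(errno).

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one command string to a control file.
 * Returns 0 on success, -1 on failure with errno describing the cause. */
int write_event_cmd(const char *path, const char *cmd)
{
    ssize_t len = (ssize_t)strlen(cmd);
    ssize_t ret;
    int saved_errno;
    int fd;

    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;              /* errno set by open() */

    ret = write(fd, cmd, (size_t)len);
    saved_errno = errno;        /* close() may overwrite errno */
    close(fd);

    if (ret != len) {
        /* A short write is treated as an I/O error in this sketch. */
        errno = (ret < 0) ? saved_errno : EIO;
        return -1;
    }
    return 0;
}
```

    With this shape the error no longer passes through a child shell,
    so there is no wait status to decode.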

    Signed-off-by: Daniel T. Lee
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin

    Daniel T. Lee
     

11 Oct, 2018

1 commit

  • Some samples require headers installation, so commit 3fca1700c4c3
    ("kbuild: make samples really depend on headers_install") added
    such dependency in the top Makefile. However, UML fails to build
    with CONFIG_SAMPLES=y because UML does not support headers_install.

    Fixes: 3fca1700c4c3 ("kbuild: make samples really depend on headers_install")
    Reported-by: Kees Cook
    Cc: David Howells
    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

17 Aug, 2018

1 commit

  • It is common XDP practice to unload/detach the XDP bpf program
    when the XDP sample program is Ctrl-C interrupted (SIGINT) or
    killed (SIGTERM).

    The samples/bpf programs xdp_redirect_cpu and xdp_rxq_info
    forgot to trap signal SIGTERM (which is the default signal used
    by the kill command).

    This was discovered by Red Hat QA, whose automated scripts depend
    on killing the XDP sample program after a timeout period.
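
    Trapping both signals can be sketched as follows. The handler here
    only sets a flag; the names are hypothetical, and it is an assumption
    of this sketch that the program's main loop checks the flag and
    detaches the XDP program before exiting.

```c
#include <signal.h>

/* Set from the signal handler, polled by the (hypothetical) main loop. */
volatile sig_atomic_t exit_requested;

void on_exit_signal(int sig)
{
    (void)sig;
    exit_requested = 1;
}

/* Install the same handler for Ctrl-C (SIGINT) and kill (SIGTERM).
 * Returns 0 on success, -1 if either registration fails. */
int install_exit_handlers(void)
{
    if (signal(SIGINT, on_exit_signal) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, on_exit_signal) == SIG_ERR)
        return -1;
    return 0;
}
```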

    Fixes: fad3917e361b ("samples/bpf: add cpumap sample program xdp_redirect_cpu")
    Fixes: 0fca931a6f21 ("samples/bpf: program demonstrating access to xdp_rxq_info")
    Reported-by: Jean-Tsung Hsiao
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     

16 Aug, 2018

2 commits

  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     
  • Pull Kbuild updates from Masahiro Yamada:

    - verify depmod is installed before modules_install

    - support build salt in case build ids must be unique between builds

    - allow users to specify additional host compiler flags via HOST*FLAGS,
    and rename internal variables to KBUILD_HOST*FLAGS

    - update buildtar script to drop vax support, add arm64 support

    - update builddeb script for better debarch support

    - document the pit-fall of if_changed usage

    - fix parallel build of UML with O= option

    - make 'samples' target depend on headers_install to fix build errors

    - remove deprecated host-progs variable

    - add a new coccinelle script for refcount_t vs atomic_t check

    - improve double-test coccinelle script

    - misc cleanups and fixes

    * tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
    coccicheck: return proper error code on fail
    Coccinelle: doubletest: reduce side effect false positives
    kbuild: remove deprecated host-progs variable
    kbuild: make samples really depend on headers_install
    um: clean up archheaders recipe
    kbuild: add %asm-generic to no-dot-config-targets
    um: fix parallel building with O= option
    scripts: Add Python 3 support to tracing/draw_functrace.py
    builddeb: Add automatic support for sh{3,4}{,eb} architectures
    builddeb: Add automatic support for riscv* architectures
    builddeb: Add automatic support for m68k architecture
    builddeb: Add automatic support for or1k architecture
    builddeb: Add automatic support for sparc64 architecture
    builddeb: Add automatic support for mips{,64}r6{,el} architectures
    builddeb: Add automatic support for mips64el architecture
    builddeb: Add automatic support for ppc64 and powerpcspe architectures
    builddeb: Introduce functions to simplify kconfig tests in set_debarch
    builddeb: Drop check for 32-bit s390
    builddeb: Change architecture detection fallback to use dpkg-architecture
    builddeb: Skip architecture detection when KBUILD_DEBARCH is set
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-08-13

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Add driver XDP support for veth. This can be used in conjunction with
    redirect of another XDP program e.g. sitting on NIC so the xdp_frame
    can be forwarded to the peer veth directly without modification,
    from Toshiaki.

    2) Add a new BPF map type REUSEPORT_SOCKARRAY and prog type SK_REUSEPORT
    in order to provide more control and visibility on where a SO_REUSEPORT
    sk should be located, and the latter enables to directly select a sk
    from the bpf map. This also enables map-in-map for application migration
    use cases, from Martin.

    3) Add a new BPF helper bpf_skb_ancestor_cgroup_id() that returns the id
    of cgroup v2 that is the ancestor of the cgroup associated with the
    skb at the ancestor_level, from Andrey.

    4) Implement BPF fs map pretty-print support based on BTF data for regular
    hash table and LRU map, from Yonghong.

    5) Decouple the ability to attach BTF for a map from the key and value
    pretty-printer in BPF fs, and enable further support of BTF for maps for
    percpu and LPM trie, from Daniel.

    6) Implement a better BPF sample of using XDP's CPU redirect feature for
    load balancing SKB processing to remote CPU. The sample implements the
    same XDP load balancing as Suricata does which is symmetric hash based
    on IP and L4 protocol, from Jesper.

    7) Revert adding NULL pointer check with WARN_ON_ONCE() in __xdp_return()'s
    critical path as it is ensured that the allocator is present, from Björn.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Aug, 2018

1 commit


10 Aug, 2018

3 commits

  • This implements XDP CPU redirection load-balancing across available
    CPUs, based on hashing IP-pairs + L4-protocol. This is equivalent to
    the xdp-cpu-redirect feature in Suricata, which is inspired by the
    Suricata 'ippair' hashing code.

    An important property is that the hashing is flow symmetric, meaning
    that if the source and destination get swapped then the selected CPU
    will remain the same. This helps locality by placing both directions
    of a flow on the same CPU, in a forwarding/routing scenario.

    The hashing INITVAL (15485863, the 10^6th prime number) was fairly
    arbitrarily chosen, but experiments with kernel tree pktgen scripts
    (pktgen_sample04_many_flows.sh + pktgen_sample05_flow_per_thread.sh)
    showed this improved the distribution.
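
    The flow-symmetric property can be illustrated with a small sketch.
    It combines the two addresses with a commutative operation before
    hashing, so swapping them cannot change the result. The mixing
    function below is a generic 32-bit finalizer chosen for illustration,
    not the hash used by the sample, and the function names are
    hypothetical.

```c
#include <stdint.h>

#define INITVAL_SKETCH 15485863u   /* seed value quoted in the text */

/* Generic 32-bit bit mixer (illustrative stand-in for the real hash). */
uint32_t mix32_sketch(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Pick a CPU index for a flow. n_cpus must be non-zero.
 * saddr + daddr is commutative, which makes the result symmetric. */
uint32_t flow_to_cpu_sketch(uint32_t saddr, uint32_t daddr,
                            uint8_t l4_proto, uint32_t n_cpus)
{
    uint32_t key = saddr + daddr;
    uint32_t seed = INITVAL_SKETCH + l4_proto;

    return mix32_sketch(key ^ mix32_sketch(seed)) % n_cpus;
}
```

    Both directions of a connection, (A, B) and (B, A), map to the same
    CPU index, while a different L4 protocol changes the seed.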

    This patch also changes the default loaded XDP program to be this
    load-balancer. Based on feedback from different users, this seems to be
    the expected behavior of the sample xdp_redirect_cpu.

    Link: https://github.com/OISF/suricata/commit/796ec08dd7a63
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     
  • Adjusted function call API to take an initval. This allows the API
    user to set the initial value, as a seed. This could also be used for
    inputting the previous hash.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     
  • The teardown race in cpumap is really hard to reproduce. These changes
    make it easier to reproduce, for QA.

    The --stress-mode now has a case with a very small queue size of 8, which helps
    the teardown flush encounter a full queue, which results in calling the
    xdp_return_frame API in a non-NAPI protected context.

    Also increase MAX_CPUS, as my QA department has larger machines than me.

    Tested-by: Jean-Tsung Hsiao
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     

03 Aug, 2018

1 commit

  • The test_cgrp2_attach test covers bpf cgroup attachment code well,
    so let's re-use it for testing allocation/releasing of cgroup storage.

    The extension is pretty straightforward: the bpf program will use
    the cgroup storage to save the number of transmitted bytes.

    Expected output:
    $ ./test_cgrp2_attach2
    Attached DROP prog. This ping in cgroup /foo should fail...
    ping: sendmsg: Operation not permitted
    Attached DROP prog. This ping in cgroup /foo/bar should fail...
    ping: sendmsg: Operation not permitted
    Attached PASS prog. This ping in cgroup /foo/bar should pass...
    Detached PASS from /foo/bar while DROP is attached to /foo.
    This ping in cgroup /foo/bar should fail...
    ping: sendmsg: Operation not permitted
    Attached PASS from /foo/bar and detached DROP from /foo.
    This ping in cgroup /foo/bar should pass...
    ### override:PASS
    ### multi:PASS

    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann

    Roman Gushchin
     

27 Jul, 2018

4 commits


21 Jul, 2018

2 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-07-20

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Add sharing of BPF objects within one ASIC: this allows for reuse of
    the same program on multiple ports of a device, and therefore gains
    better code store utilization. On top of that, this now also enables
    sharing of maps between programs attached to different ports of a
    device, from Jakub.

    2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
    detections and unused variable exports, also from Jakub.

    3) First batch of RCU annotation fixes in prog array handling, i.e.
    there are several __rcu markers which are not correct as well as
    some of the RCU handling, from Roman.

    4) Two fixes in BPF sample files related to checking of the prog_cnt
    upper limit from sample loader, from Dan.

    5) Minor cleanup in sockmap to remove a set but not used variable,
    from Colin.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • All conflicts were trivial overlapping changes, so reasonably
    easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Jul, 2018

1 commit

  • Pull networking fixes from David Miller:
    "Lots of fixes, here goes:

    1) NULL deref in qtnfmac, from Gustavo A. R. Silva.

    2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.

    3) Lost completion messages in AF_XDP, from Magnus Karlsson.

    4) Correct bogus self-assignment in rhashtable, from Rishabh
    Bhatnagar.

    5) Fix regression in ipv6 route append handling, from David Ahern.

    6) Fix masking in __set_phy_supported(), from Heiner Kallweit.

    7) Missing module owner set in x_tables icmp, from Florian Westphal.

    8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.

    9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.

    10) Fix NULL deref when using chains in act_csum, from Davide Caratti.

    11) XDP_REDIRECT needs to check if the interface is up and whether the
    MTU is sufficient. From Toshiaki Makita.

    12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
    connections, from Lorenzo Colitti.

    13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
    full minute, delaying device unregister. From Eric Dumazet.

    14) Update MAC entries in the correct order in ixgbe, from Alexander
    Duyck.

    15) Don't leave partial mangles bpf program in jit_subprogs, from
    Daniel Borkmann.

    16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.

    17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.

    18) Use after free in tun XDP_TX, from Toshiaki Makita.

    19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.

    20) Don't reuse remainder of RX page when XDP is set in mlx4, from
    Saeed Mahameed.

    21) Fix window probe handling of TCP repair sockets, from Stefan
    Baranoff.

    22) Missing socket locking in smc_ioctl(), from Ursula Braun.

    23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.

    24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.

    25) Two spots in ipv6 do a rol32() on a hash value but ignore the
    result. Fixes from Colin Ian King"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
    tcp: identify cryptic messages as TCP seq # bugs
    ptp: fix missing break in switch
    hv_netvsc: Fix napi reschedule while receive completion is busy
    MAINTAINERS: Drop inactive Vitaly Bordug's email
    net: cavium: Add fine-granular dependencies on PCI
    net: qca_spi: Fix log level if probe fails
    net: qca_spi: Make sure the QCA7000 reset is triggered
    net: qca_spi: Avoid packet drop during initial sync
    ipv6: fix useless rol32 call on hash
    ipv6: sr: fix useless rol32 call on hash
    net: sched: Using NULL instead of plain integer
    net: usb: asix: replace mii_nway_restart in resume path
    net: cxgb3_main: fix potential Spectre v1
    lib/rhashtable: consider param->min_size when setting initial table size
    net/smc: reset recv timeout after clc handshake
    net/smc: add error handling for get_user()
    net/smc: optimize consumer cursor updates
    net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
    ipv6: ila: select CONFIG_DST_CACHE
    net: usb: rtl8150: demote allmulti message to dev_dbg()
    ...

    Linus Torvalds
     

18 Jul, 2018

2 commits


17 Jul, 2018

2 commits


14 Jul, 2018

1 commit

  • People noticed that the code matches on the IEEE 802.1ad (ETH_P_8021AD) ethertype,
    and this implies Q-in-Q or double tagged VLANs. Thus, we better parse
    the next VLAN header too. It is even marked as a TODO.
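
    Parsing up to two stacked VLAN tags can be sketched on a raw byte
    buffer as below. This is a userspace illustration with hypothetical
    names and locally defined constants, not the BPF code from the
    sample; it assumes each tag is a 2-byte TCI followed by the 2-byte
    encapsulated ethertype.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SK   14        /* dst MAC + src MAC + ethertype */
#define VLAN_HLEN_SK  4         /* TCI + encapsulated ethertype */
#define ETH_P_8021Q_SK   0x8100
#define ETH_P_8021AD_SK  0x88A8

/* Read a big-endian 16-bit value. */
uint16_t read_be16_sk(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Find the L3 protocol and offset, skipping at most two VLAN tags.
 * Returns 0 on success, -1 if the frame is too short. */
int parse_l3_sketch(const uint8_t *pkt, size_t len,
                    uint16_t *proto, size_t *l3_off)
{
    size_t off = ETH_HLEN_SK;
    uint16_t p;
    int i;

    if (len < off)
        return -1;
    p = read_be16_sk(pkt + 12);

    for (i = 0; i < 2; i++) {
        if (p != ETH_P_8021Q_SK && p != ETH_P_8021AD_SK)
            break;
        if (len < off + VLAN_HLEN_SK)   /* bounds check before reading */
            return -1;
        p = read_be16_sk(pkt + off + 2);
        off += VLAN_HLEN_SK;
    }

    *proto = p;
    *l3_off = off;
    return 0;
}
```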

    This is relevant for real world use-cases, as XDP cpumap redirect can be
    used when the NIC RSS hashing is broken. E.g. the ixgbe driver HW cannot
    handle double tagged VLAN packets, and places everything into a single
    RX queue. Using cpumap redirect, users can redistribute traffic across
    CPUs to solve this, which is faster than the network stack's RPS solution.

    It is left as an exercise how to distribute the packets across CPUs. It
    would be convenient to use the RX hash, but that is not _yet_ exposed
    to XDP programs. For now, users can code their own hash, as I've demonstrated
    in the Suricata code (where Q-in-Q is handled correctly).

    Reported-by: Florian Maury
    Reported-by: Marek Majkowski
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     

12 Jul, 2018

1 commit


10 Jul, 2018

1 commit

  • The below path error can occur:

    # ./xdp2skb_meta.sh --dev eth0 --list
    ./xdp2skb_meta.sh: line 61: /usr/sbin/tc: No such file or directory

    So just use command names instead of absolute paths of tc and ip.
    In addition, this allows callers to redefine the $TC and $IP paths.

    Fixes: 36e04a2d78d9 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Taeung Song
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Taeung Song
     

07 Jul, 2018

1 commit

  • Pull VFIO fixes from Alex Williamson:

    - Make vfio-pci IGD extensions optional via Kconfig (Alex Williamson)

    - Remove unused and soon to be removed map_atomic callback from mbochs
    sample driver, add unmap callback to avoid dmabuf leaks (Gerd
    Hoffmann)

    - Fix usage of get_user_pages_longterm() (Jason Gunthorpe)

    - Fix sample mbochs driver vm_operations_struct.fault return type
    (Souptick Joarder)

    * tag 'vfio-v4.18-rc4' of git://github.com/awilliam/linux-vfio:
    sample/vfio-mdev: Change return type to vm_fault_t
    vfio: Use get_user_pages_longterm correctly
    sample/mdev/mbochs: add mbochs_kunmap_dmabuf
    sample/mdev/mbochs: remove mbochs_kmap_atomic_dmabuf
    vfio/pci: Make IGD support a configurable option

    Linus Torvalds
     

05 Jul, 2018

5 commits

  • For untracked executables of samples/bpf, add this.

    Untracked files:
    (use "git add <file>..." to include in what will be committed)

    samples/bpf/cpustat
    samples/bpf/fds_example
    samples/bpf/lathist
    samples/bpf/load_sock_ops
    ...

    Signed-off-by: Taeung Song
    Acked-by: David S. Miller
    Signed-off-by: Daniel Borkmann

    Taeung Song
     
  • test_task_rename() and test_urandom_read()
    can fail during write() and read(),
    so check their results.

    Reviewed-by: David Laight
    Signed-off-by: Taeung Song
    Acked-by: David S. Miller
    Signed-off-by: Daniel Borkmann

    Taeung Song
     
  • To avoid the build warning below,
    use a new generate_load() that checks the return value.

    ignoring return value of ‘system’, declared with attribute warn_unused_result

    This also refactors the duplicate code of both
    test_perf_event_all_cpu() and test_perf_event_task().
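
    A hedged sketch of a wrapper that does not ignore the result of
    system(). The helper name is hypothetical and it goes further than
    a plain "< 0" check: it also decodes the wait status, so a command
    that exits non-zero is reported as a failure too. On Linux, a child
    exit code of 1 appears as 256 in the raw status value.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and report success only
 * if the shell exited normally with status 0. Returns 0 or -1. */
int run_cmd_checked(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;              /* system() itself failed */
    if (!WIFEXITED(status))
        return -1;              /* shell was terminated by a signal */
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```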

    Cc: Teng Qin
    Signed-off-by: Taeung Song
    Acked-by: David S. Miller
    Signed-off-by: Daniel Borkmann

    Taeung Song
     
  • This fixes build error regarding redefinition:

    CLANG-bpf samples/bpf/parse_varlen.o
    samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
    struct vlan_hdr {
    ^
    ./include/linux/if_vlan.h:38:8: note: previous definition is here

    So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h

    Signed-off-by: Taeung Song
    Acked-by: David S. Miller
    Signed-off-by: Daniel Borkmann

    Taeung Song
     
  • convert mbochs_region_vm_fault and mbochs_dmabuf_vm_fault
    to return vm_fault_t type.

    Signed-off-by: Souptick Joarder
    Signed-off-by: Alex Williamson

    Souptick Joarder
     

04 Jul, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2018-07-03

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) Various improvements to bpftool and libbpf, that is, bpftool build
    speed improvements, missing BPF program types added for detection
    by section name, ability to load programs from '.text' section is
    made to work again, and better bash completion handling, from Jakub.

    2) Improvements to nfp JIT's map read handling which allows for optimizing
    memcpy from map to packet, from Jiong.

    3) New BPF sample is added which demonstrates XDP in combination with
    bpf_perf_event_output() helper to sample packets on all CPUs, from Toke.

    4) Add a new BPF kselftest case for tracking connect(2) BPF hooks
    infrastructure in combination with TFO, from Andrey.

    5) Extend the XDP/BPF xdp_rxq_info sample code with a cmdline option to
    read payload from packet data in order to use it for benchmarking.
    Also for '--action XDP_TX' option implement swapping of MAC addresses
    to avoid drops on some hardware seen during testing, from Jesper.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jul, 2018

1 commit


29 Jun, 2018

3 commits

  • For ACLs implemented using either FIB rules or FIB entries, the BPF
    program needs the FIB lookup status to be able to drop the packet.
    Since the bpf_fib_lookup API has not reached a released kernel yet,
    change the return code to contain an encoding of the FIB lookup
    result and return the nexthop device index in the params struct.

    In addition, inform the BPF program of any post FIB lookup reason as
    to why the packet needs to go up the stack.

    The fib result for unicast routes must have an egress device, so remove
    the check that it is non-NULL.

    Signed-off-by: David Ahern
    Signed-off-by: Daniel Borkmann

    David Ahern
     
  • XDP_TX requires also changing the MAC-addrs, else some hardware
    may drop the TX packet before reaching the wire. This was
    observed with driver mlx5.

    If xdp_rxq_info is run with --action XDP_TX, the swapmac functionality
    is activated. It can also be enabled manually via the cmdline
    option --swapmac. This is practical when measuring the
    overhead of writing/updating payload for other action types.
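
    Swapping the MAC addresses amounts to exchanging the first two
    6-byte fields of the Ethernet header. A minimal userspace sketch on
    a raw frame buffer, with a hypothetical function name (the sample
    itself operates on XDP packet data):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN_SK 6

/* Exchange destination MAC (bytes 0-5) and source MAC (bytes 6-11).
 * Returns 0 on success, -1 if the frame is shorter than 12 bytes. */
int swap_mac_sketch(uint8_t *frame, size_t len)
{
    uint8_t tmp[MAC_LEN_SK];

    if (len < 2 * MAC_LEN_SK)
        return -1;

    memcpy(tmp, frame, MAC_LEN_SK);
    memcpy(frame, frame + MAC_LEN_SK, MAC_LEN_SK);
    memcpy(frame + MAC_LEN_SK, tmp, MAC_LEN_SK);
    return 0;
}
```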

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     
  • There is a cost associated with reading the packet data payload
    that this test ignored. Add option --read to allow enabling
    reading part of the payload.

    This sample/tool helps us analyse an issue observed with a NIC
    mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.

    With no_touch of data:

    Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
    XDP stats     CPU     pps          issue-pps
    XDP-RX CPU    0       14,465,157   0
    XDP-RX CPU    1       14,464,728   0
    XDP-RX CPU    2       14,465,283   0
    XDP-RX CPU    3       14,465,282   0
    XDP-RX CPU    4       14,464,159   0
    XDP-RX CPU    5       14,465,379   0
    XDP-RX CPU    total   86,789,992

    When not touching data, we observe that the CPUs have idle cycles.
    When reading data the CPUs are 100% busy in softirq.

    With reading data:

    Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
    XDP stats     CPU     pps          issue-pps
    XDP-RX CPU    0       9,620,639    0
    XDP-RX CPU    1       9,489,843    0
    XDP-RX CPU    2       9,407,854    0
    XDP-RX CPU    3       9,422,289    0
    XDP-RX CPU    4       9,321,959    0
    XDP-RX CPU    5       9,395,242    0
    XDP-RX CPU    total   56,657,828

    The effect seen above is a result of cache-misses occurring when
    more RXQs are being used. Based on perf-event observations, our
    conclusion is that the CPU's DDIO (Direct Data I/O) chooses to
    deliver packets into main memory, instead of L3-cache. We also
    found that this can be mitigated by either using fewer RXQs or by
    reducing the NIC's RX-ring size.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Jesper Dangaard Brouer
     

27 Jun, 2018

1 commit

  • Add an example program showing how to sample packets from XDP using the
    perf event buffer. The example userspace program just prints the ethernet
    header for every packet sampled.

    Reviewed-by: Jakub Kicinski
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Toke Høiland-Jørgensen
     

26 Jun, 2018

1 commit