16 Nov, 2019

1 commit

  • Add a few kernel functions with varying numbers of arguments
    and varying argument types and sizes for BPF trampoline testing,
    to cover different calling conventions.
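
    A sketch of the pattern (reconstructed from the series, so treat
    the exact bodies as approximate): each function is noinline so the
    trampoline has a stable attach point, and the argument lists grow
    in count and mix types of different sizes.

    int noinline bpf_fentry_test1(int a)
    {
            return a + 1;
    }

    int noinline bpf_fentry_test2(int a, u64 b)
    {
            return a + b;
    }

    int noinline bpf_fentry_test3(char a, int b, u64 c)
    {
            return a + b + c;
    }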

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20191114185720.1641606-9-ast@kernel.org

    Alexei Starovoitov
     

04 Nov, 2019

13 commits

  • traceroute6 output can be confusing, in that it shows the address
    that a router would use to reach the sender, rather than the address
    the packet used to reach the router.
    Consider this case:

     ------------------------ N2
      |                   |
    ------              ------  N3  ----
    | R1 |              | R2 |------|H2|
    ------              ------      ----
      |                   |
     ------------------------ N1
              |
             ----
             |H1|
             ----

    where H1's default route is through R1, and R1's default route is
    through R2 over N2.
    traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2.

    The script below can be used to reproduce this scenario.

    traceroute6 output without this patch:

    traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
    1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms
    2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms
    3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms

    traceroute6 output with this patch:

    traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
    1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms
    2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms
    3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms

    #!/bin/bash
    #
    #  ------------------------ N2
    #   |                   |
    # ------              ------  N3  ----
    # | R1 |              | R2 |------|H2|
    # ------              ------      ----
    #   |                   |
    #  ------------------------ N1
    #           |
    #          ----
    #          |H1|
    #          ----
    #
    # N1: 2000:101::/64
    # N2: 2000:102::/64
    # N3: 2000:103::/64
    #
    # R1's host part of address: 1
    # R2's host part of address: 2
    # H1's host part of address: 3
    # H2's host part of address: 4
    #
    # For example:
    # the IPv6 address of R1's interface on N2 is 2000:102::1/64
    #
    # Nets are implemented by macvlan interfaces (bridge mode) over
    # dummy interfaces.
    #

    # Create net namespaces
    ip netns add host1
    ip netns add host2
    ip netns add rtr1
    ip netns add rtr2

    # Create nets
    ip link add net1 type dummy; ip link set net1 up
    ip link add net2 type dummy; ip link set net2 up
    ip link add net3 type dummy; ip link set net3 up

    # Add interfaces to net1, move them to their namespaces
    ip link add link net1 dev host1net1 type macvlan mode bridge
    ip link set host1net1 netns host1
    ip link add link net1 dev rtr1net1 type macvlan mode bridge
    ip link set rtr1net1 netns rtr1
    ip link add link net1 dev rtr2net1 type macvlan mode bridge
    ip link set rtr2net1 netns rtr2

    # Add interfaces to net2, move them to their namespaces
    ip link add link net2 dev rtr1net2 type macvlan mode bridge
    ip link set rtr1net2 netns rtr1
    ip link add link net2 dev rtr2net2 type macvlan mode bridge
    ip link set rtr2net2 netns rtr2

    # Add interfaces to net3, move them to their namespaces
    ip link add link net3 dev rtr2net3 type macvlan mode bridge
    ip link set rtr2net3 netns rtr2
    ip link add link net3 dev host2net3 type macvlan mode bridge
    ip link set host2net3 netns host2

    # Configure interfaces and routes in host1
    ip netns exec host1 ip link set lo up
    ip netns exec host1 ip link set host1net1 up
    ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1
    ip netns exec host1 ip -6 route add default via 2000:101::1

    # Configure interfaces and routes in rtr1
    ip netns exec rtr1 ip link set lo up
    ip netns exec rtr1 ip link set rtr1net1 up
    ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1
    ip netns exec rtr1 ip link set rtr1net2 up
    ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2
    ip netns exec rtr1 ip -6 route add default via 2000:102::2
    ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1

    # Configure interfaces and routes in rtr2
    ip netns exec rtr2 ip link set lo up
    ip netns exec rtr2 ip link set rtr2net1 up
    ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1
    ip netns exec rtr2 ip link set rtr2net2 up
    ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2
    ip netns exec rtr2 ip link set rtr2net3 up
    ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3
    ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1

    # Configure interfaces and routes in host2
    ip netns exec host2 ip link set lo up
    ip netns exec host2 ip link set host2net3 up
    ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3
    ip netns exec host2 ip -6 route add default via 2000:103::2

    # Ping host2 from host1
    ip netns exec host1 ping6 -c5 2000:103::4

    # Traceroute host2 from host1
    ip netns exec host1 traceroute6 2000:103::4

    # Delete nets
    ip link del net3
    ip link del net2
    ip link del net1

    # Delete namespaces
    ip netns del rtr2
    ip netns del rtr1
    ip netns del host2
    ip netns del host1

    Signed-off-by: Francesco Ruggeri
    Original-patch-by: Honggang Xu
    Signed-off-by: David S. Miller

    Francesco Ruggeri
     
  • As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of
    small messages"), the current message bundling algorithm is
    inefficient: it can generate bundles containing only one payload
    message, which causes unnecessary overhead for both the sender and
    receiver.

    This commit re-designs the 'tipc_msg_make_bundle()' function (now
    named 'tipc_msg_try_bundle()') so that when the first message
    arrives, we just check it and keep a reference to it if it is
    suitable for bundling. The message buffer is put into the link
    backlog queue and processed as normal. Later, when another message
    arrives, we make a bundle with the first one if possible, and so on.
    This way, a bundle, if really needed, always consists of at least
    two payload messages. Otherwise, we let the first buffer go its own
    way without any bundling, reducing the overhead to zero.

    Moreover, since we now have both messages in hand, we can even
    optimize the 'tipc_msg_bundle()' function to bundle a very large
    message (size close to the MSS) with a small one, which the current
    algorithm cannot do, e.g. a 1400-byte message plus a 10-byte message
    with an MTU of 1500.
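
    A hedged sketch of the new decision flow (the names below are
    illustrative, not the actual TIPC symbols):

    /* First suitable message: only remember it; it still goes to the
     * backlog queue as normal. A bundle is created only when a second
     * message fits, so every bundle holds at least two payloads.
     */
    static bool try_bundle(struct sk_buff **held, struct sk_buff *skb,
                           u32 mss)
    {
            if (!*held) {
                    if (msg_size(buf_msg(skb)) < mss)
                            *held = skb;   /* candidate for bundling */
                    return false;          /* no bundle made yet */
            }
            if (msg_size(buf_msg(*held)) + msg_size(buf_msg(skb)) > mss)
                    return false;          /* would exceed MSS */
            bundle_append(*held, skb);     /* hypothetical helper */
            return true;
    }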

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
    primary address of the interface the packet was received on, even if
    the path goes through a secondary address. In the example:

                                  1.0.3.1/24
    ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
    |H1|--------------------------|R1|--------------------------|H2|
    ----            N1            ----            N2            ----

    where 1.0.3.1/24 is R1's primary address on N1, traceroute from
    H1 to H2 returns:

    traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
    1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms
    2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms

    After applying this patch, it returns:

    traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
    1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms
    2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms

    Original-patch-by: Bill Fenner
    Signed-off-by: Francesco Ruggeri
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Francesco Ruggeri
     
  • Use the specified functions to initialize resources.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Unlocking a mutex that is not locked is not allowed. Another
    kernel thread may be inside its critical section while we unlock
    it after setting user_feature fails.

    Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
    Cc: Paul Blakey
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When we destroy the flow tables, they may still contain flow_mask
    entries, so release the flow-mask structs as well.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • In most cases *index < ma->max and the flow-mask is not NULL.
    Add likely()/unlikely() annotations for performance, as illustrated
    below.
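
    A hedged illustration of the annotated fast path (field and helper
    names follow OVS conventions but are assumptions here):

    struct sw_flow_mask *mask;

    /* the common case: a valid cached index pointing at a live mask */
    if (likely(*index < ma->max)) {
            mask = rcu_dereference(ma->masks[*index]);
            if (likely(mask))
                    return masked_flow_lookup(ti, key, mask, n_mask_hit);
    }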

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Simplify the code and remove the unnecessary BUILD_BUG_ON.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • A full lookup on the flow table traverses the whole mask array.
    If the mask array is too large, the number of invalid flow-masks
    increases and performance drops.

    One bad case, for example (M means the flow-mask is valid and NULL
    means it has been deleted):

    +-------------------------------------------+
    | M | NULL | ...                 | NULL | M |
    +-------------------------------------------+

    In that case, without this patch, openvswitch traverses the whole
    mask array, because a valid flow-mask sits at the tail. This patch
    changes the way flow-masks are inserted and deleted, so that the
    mask array is kept as below, with no NULL holes. In the fast path,
    flow_lookup can then "break" out of the "for" loop (rather than
    "continue") when it hits a NULL flow-mask.

    "break"
    v
    +-------------------------------------------+
    | M | M | NULL |... | NULL | NULL|
    +-------------------------------------------+

    This patch doesn't optimize the slow or control path, which still
    uses ma->max to traverse. Slow path:
    * tbl_mask_array_realloc
    * ovs_flow_tbl_lookup_exact
    * flow_mask_find
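
    A hedged sketch of the resulting fast path (not the exact code):

    /* With a compacted mask array, the first NULL entry guarantees
     * that no valid mask follows it, so we can break instead of
     * scanning all ma->max slots.
     */
    for (i = 0; i < ma->max; i++) {
            struct sw_flow_mask *mask = rcu_dereference(ma->masks[i]);

            if (!mask)
                    break;    /* no holes: nothing valid after this */

            flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
            if (flow)         /* matched with this mask */
                    return flow;
    }
    return NULL;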

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the code to Linux upstream with small changes.

    Pravin B Shelar, says:
    | In case hash collision on mask cache, OVS does extra flow
    | lookup. Following patch avoid it.

    Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When creating and inserting a flow-mask, if there is no available
    slot, we realloc the mask array. When removing a flow-mask, we
    shrink the mask array if necessary.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the code to Linux upstream with small changes.

    Pravin B Shelar, says:
    | mask caches index of mask in mask_list. On packet recv OVS
    | need to traverse mask-list to get cached mask. Therefore array
    | is better for retrieving cached mask. This also allows better
    | cache replacement algorithm by directly checking mask's existence.

    Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The idea of this optimization comes from a patch committed to the
    openvswitch community in 2014 by Pravin B Shelar. In order to get
    high performance, I implement it again. Later patches will use it.

    Pravin B Shelar, says:
    | On every packet OVS needs to lookup flow-table with every
    | mask until it finds a match. The packet flow-key is first
    | masked with mask in the list and then the masked key is
    | looked up in flow-table. Therefore number of masks can
    | affect packet processing performance.
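
    A hedged sketch of the cache idea (the structure below is an
    assumption, not necessarily the exact upstream layout):

    /* Per-CPU cache mapping a packet's skb hash to the index of the
     * mask that matched it last time, so most packets try a single
     * mask before falling back to the full scan.
     */
    struct mask_cache_entry {
            u32 skb_hash;     /* flow hash of the packet */
            u32 mask_index;   /* index into the mask array */
    };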

    Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

03 Nov, 2019

2 commits

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2019-11-02

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 30 non-merge commits during the last 7 day(s) which contain
    a total of 41 files changed, 1864 insertions(+), 474 deletions(-).

    The main changes are:

    1) Fix long standing user vs kernel access issue by introducing
    bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel.

    2) Accelerated xskmap lookup, from Björn and Maciej.

    3) Support for automatic map pinning in libbpf, from Toke.

    4) Cleanup of BTF-enabled raw tracepoints, from Alexei.

    5) Various fixes to libbpf and selftests.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The only slightly tricky merge conflict was the netdevsim because the
    mutex locking fix overlapped a lot of driver reload reorganization.

    The rest were (relatively) trivial in nature.

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Nov, 2019

8 commits

  • Pull networking fixes from David Miller:

    1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

    2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

    3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

    4) Add an r8152 device ID, from Kazutoshi Noguchi.

    5) Missing include in ipv6's addrconf.c, from Ben Dooks.

    6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

    7) Several netdevice nesting depth fixes from Taehee Yoo.

    8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

    9) Fix jumbo packet handling in RXRPC, from David Howells.

    10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

    11) Fix DMA synchronization in gve driver, from Yangchun Fu.

    12) Several bpf offload fixes, from Jakub Kicinski.

    13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

    14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfeng Xiao.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
    net: fix installing orphaned programs
    net: cls_bpf: fix NULL deref on offload filter removal
    selftests: bpf: Skip write only files in debugfs
    selftests: net: reuseport_dualstack: fix uninitalized parameter
    r8169: fix wrong PHY ID issue with RTL8168dp
    net: dsa: bcm_sf2: Fix IMP setup for port different than 8
    net: phylink: Fix phylink_dbg() macro
    gve: Fixes DMA synchronization.
    inet: stop leaking jiffies on the wire
    ixgbe: Remove duplicate clear_bit() call
    Documentation: networking: device drivers: Remove stray asterisks
    e1000: fix memory leaks
    i40e: Fix receive buffer starvation for AF_XDP
    igb: Fix constant media auto sense switching when no cable is connected
    net: ethernet: arc: add the missed clk_disable_unprepare
    igb: Enable media autosense for the i350.
    igb/igc: Don't warn on fatal read failures when the device is removed
    tcp: increase tcp_max_syn_backlog max value
    net: increase SOMAXCONN to 4096
    netdevsim: Fix use-after-free during device dismantle
    ...

    Linus Torvalds
     
  • In this commit the XSKMAP entry lookup function used by the XDP
    redirect code is moved from the xskmap.c file to the xdp_sock.h
    header, so the lookup can be inlined from, e.g., the
    bpf_xdp_redirect_map() function.

    Further, __xsk_map_redirect() and __xsk_map_flush() are moved to
    xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush()
    functions.

    Finally, all the XDP socket functions were moved from linux/bpf.h to
    net/xdp_sock.h, where most of the XDP sockets functions are anyway.

    This yields a ~2% performance boost for the xdpsock "rx_drop"
    scenario.
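
    The moved lookup is small enough to live in the header; a sketch of
    its shape (close to, but not guaranteed to match, the final code):

    static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
                                                         u32 key)
    {
            struct xsk_map *m = container_of(map, struct xsk_map, map);

            if (key >= map->max_entries)  /* out of range: no socket */
                    return NULL;

            return READ_ONCE(m->xsk_map[key]);  /* lockless read */
    }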

    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com

    Björn Töpel
     
  • When a netdevice with offloaded BPF programs is destroyed,
    the programs are orphaned and removed from the program
    IDA - their IDs get released (the programs may remain
    accessible via existing open file descriptors and pinned
    files). After IDs are released they are set to 0.

    This confuses dev_change_xdp_fd() because it compares
    the __dev_xdp_query() result, where 0 means no program,
    with prog->aux->id, where 0 means orphaned.

    dev_change_xdp_fd() would have incorrectly returned success
    even though it had not installed the program.

    Since drivers already catch this case via bpf_offload_dev_match(),
    let them handle it. The error message drivers produce in this case
    ("program loaded for a different device") is in fact correct, as the
    orphaned program must indeed have been loaded for a different
    device.

    Fixes: c14a9f633d9e ("net: Don't call XDP_SETUP_PROG when nothing is changed")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Commit 401192113730 ("net: sched: refactor block offloads counter
    usage") missed the fact that either new prog or old prog may be
    NULL.

    Fixes: 401192113730 ("net: sched: refactor block offloads counter usage")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Historically, Linux tried to stick to RFCs 791, 1122, and 2003
    for IPv4 ID field generation.

    RFC 6864 made clear that no matter how hard we try,
    we cannot ensure uniqueness of the IP ID within the maximum
    lifetime for all datagrams with a given source
    address/destination address/protocol tuple.

    Linux uses a per-socket inet generator (inet_id), initialized
    at connection startup with an XOR of 'jiffies' and other
    fields that appear in the clear on the wire.

    Thiemo Nagel pointed out that this strategy is a privacy
    concern, as it provides 16 bits of entropy to fingerprint
    devices.

    Let's switch to a random starting point; this is just as
    good as far as RFC 6864 is concerned and does not leak
    anything critical.
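
    The exact call sites vary, but the shape of the change is simply
    (a hedged sketch):

    /* before: seeded from jiffies and fields visible on the wire */
    inet->inet_id = tp->write_seq ^ jiffies;

    /* after: a random starting point, equally valid per RFC 6864 */
    inet->inet_id = prandom_u32();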

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: Thiemo Nagel
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Taking over hw-learned entries is not a likely scenario, so restore
    the unlikely() use for the case of SW taking over externally learned
    entries.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If we setup the fdb flags prior to calling fdb_create() we can avoid
    two atomic bitops when learning a new entry.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If we modify br_fdb_update() to take flags directly we can get rid of
    one test and one atomic bitop in the learning path.
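
    A hedged sketch of the combined idea behind these fdb patches (the
    flag name is an assumption):

    unsigned long flags = 0;

    /* decide the flags up front... */
    if (added_by_user)
            flags |= BIT(BR_FDB_ADDED_BY_USER);

    /* ...so br_fdb_update() and fdb_create() can store them at entry
     * creation time instead of flipping bits atomically afterwards */
    br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, flags);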

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

01 Nov, 2019

10 commits

  • Now that there's no restriction from the DSA core side regarding
    the switch IDs and port numbers, only tag_8021q, which currently
    reserves 3 bits for the switch ID and 4 bits for the port number,
    imposes limits on these values. Update their descriptions to
    reflect that, as modeled below.
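
    A rough model of where the limits come from (the bit positions are
    illustrative assumptions, not the exact tag_8021q layout): packing
    both a switch ID and a port number into the 12-bit VLAN ID leaves
    3 bits for the switch (at most 8 switches) and 4 bits for the port
    (at most 16 ports).

    /* illustrative carving of a 12-bit VLAN ID */
    #define EX_8021Q_PORT(vid)    ((vid) & 0xf)         /* 4 bits: port   */
    #define EX_8021Q_SWITCH(vid)  (((vid) >> 4) & 0x7)  /* 3 bits: switch */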

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Because there is no longer a static array describing the links
    between switches, we have no reason to limit the index value
    set by the device tree.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The DSA fabric setup code has been simplified a lot so get rid of
    the dsa_tree_remove_switch, dsa_tree_add_switch and dsa_switch_add
    helpers, and keep the code simple with only the dsa_switch_probe and
    dsa_switch_remove functions.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Now that the DSA ports are listed in the switch fabric, there is
    no need to store the dsa_switch structures from the drivers in the
    fabric anymore. So get rid of the dst->ds static array.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The dsa_switch structure has no routing table specific data to setup,
    so the switch fabric can directly walk its ports and initialize its
    routing table from them.

    This allows us to remove the dsa_switch_setup_routing_table function.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Drivers do not use the ds->rtable static arrays anymore, so get rid
    of them.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Implement a new list of DSA links in the switch fabric itself, to
    provide an alternative to the ds->rtable static arrays.

    At the same time, provide a new dsa_routing_port() helper to abstract
    the usage of ds->rtable in drivers. If there's no port to reach a
    given device, return the first invalid port, ds->num_ports. This avoids
    potential signedness errors or the need to define special values.
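
    A sketch of the helper's contract (the link-list field names are
    assumptions):

    /* Return the local port used to reach 'device', or the first
     * invalid port (ds->num_ports) if no route exists.
     */
    static inline unsigned int dsa_routing_port(struct dsa_switch *ds,
                                                int device)
    {
            struct dsa_link *dl;

            list_for_each_entry(dl, &ds->dst->rtable, list)
                    if (dl->dp->ds == ds && dl->link_dp->ds->index == device)
                            return dl->dp->index;

            return ds->num_ports;
    }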

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • tcp_max_syn_backlog default value depends on memory size
    and TCP ehash size. Before this patch, the max value
    was 2048 [1], which is considered too small nowadays.

    Increase it to 4096 to match the recent SOMAXCONN change.

    [1] This is with TCP ehash size being capped to 524288 buckets.

    Signed-off-by: Eric Dumazet
    Cc: Willy Tarreau
    Cc: Yue Cao
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
    all the data for the last packet, it checks the last-packet flag on the
    whole packet - but this is wrong, since the last-packet flag is only set on
    the final subpacket of the last jumbo packet. This means that a call that
    receives its last packet in a jumbo packet won't complete properly.

    Fix this by having rxrpc_locate_data() determine the last-packet state of
    the subpacket it's looking at and passing that back to the caller rather
    than having the caller look in the packet header. The caller then needs to
    cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
    called again for this packet.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     
  • …rnel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Just two fixes:
    * HT operation is not allowed on channel 14 (Japan only)
    * netlink policy for nexthop attribute was wrong
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

31 Oct, 2019

6 commits

  • Extend struct tc_action with a new "tcfa_flags" field. Set the field
    in the tcf_idr_create() function and provide a new helper,
    tcf_idr_create_from_flags(), that derives the 'cpustats' boolean
    from the flags value. Update the individual hardware-offloaded
    actions' init() to pass their "flags" argument to the new helper in
    order to skip percpu stats allocation when the user requested that
    through flags, as sketched below.
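
    The new helper can be a thin wrapper; a sketch of its likely shape
    (treat the exact signature as an assumption):

    int tcf_idr_create_from_flags(struct tc_action_net *tn, u32 index,
                                  struct nlattr *est, struct tc_action **a,
                                  const struct tc_action_ops *ops, int bind,
                                  u32 flags)
    {
            /* percpu stats are allocated unless the flag opts out */
            return tcf_idr_create(tn, index, est, a, ops, bind,
                                  !(flags & TCA_ACT_FLAGS_NO_PERCPU_STATS),
                                  flags);
    }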

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extend TCA_ACT space with nla_bitfield32 flags. Add
    TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
    tcf_action_init_1() and pass resulting value as additional argument to
    a_o->init().

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Modify the stats update helper functions introduced in previous
    patches in this series to fall back to the regular
    tc_action->tcfa_{b|q}stats if cpu stats are not allocated for the
    action argument. If the regular non-percpu counters are in use,
    obtain the action's tcfa_lock while modifying them, as sketched
    below.
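
    A sketch of the fallback pattern for the bytes/packets counter (the
    qstats helpers follow the same shape):

    void tcf_action_update_bstats(struct tc_action *a, struct sk_buff *skb)
    {
            /* fast path: percpu counters need no lock */
            if (likely(a->cpu_bstats)) {
                    bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), skb);
                    return;
            }
            /* fallback: regular counters are protected by tcfa_lock */
            spin_lock(&a->tcfa_lock);
            bstats_update(&a->tcfa_bstats, skb);
            spin_unlock(&a->tcfa_lock);
    }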

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • A previous commit introduced helper functions for updating qstats
    and refactored a set of actions to use the helpers instead of
    modifying qstats directly. However, one of the affected actions
    exposes its qstats to skb_tc_reinsert(), which then modifies it.

    Refactor skb_tc_reinsert() to return an integer error code and not
    increment the overlimit qstats in case of error, and use the
    returned error code in tcf_mirred_act() to manually increment the
    overlimit counter with the new helper function.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extract common code that increments cpu_qstats counters into standalone act
    API functions. Change hardware offloaded actions that use percpu counter
    allocation to use the new functions instead of accessing cpu_qstats
    directly.

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extract common code that increments cpu_bstats counter into standalone act
    API function. Change hardware offloaded actions that use percpu counter
    allocation to use the new function instead of incrementing cpu_bstats
    directly.

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov