30 Jan, 2016

1 commit

  • There are cases where qdisc_dequeue_peeked can return NULL, and the result
    is dereferenced later on in the function.

    Similarly to the other qdisc dequeue functions, check whether the skb
    pointer is NULL and if it is, goto out.

    Signed-off-by: Bernie Harris
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Bernie Harris
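
The fix pattern described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: `toy_skb`, `toy_qdisc`, `toy_dequeue_peeked()` and `toy_dequeue()` are hypothetical stand-ins, not the kernel's types or functions. The point is that the NULL check sits before the first dereference of the dequeued pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct sk_buff / struct Qdisc. */
struct toy_skb {
    unsigned int len;
};

struct toy_qdisc {
    struct toy_skb *peeked;   /* packet stashed by an earlier peek, or NULL */
    unsigned int qlen;        /* packets currently queued */
    unsigned int bytes_sent;  /* bytes handed out so far */
};

/* Hypothetical analogue of a dequeue-peeked helper: it may return NULL. */
static struct toy_skb *toy_dequeue_peeked(struct toy_qdisc *q)
{
    struct toy_skb *skb = q->peeked;

    q->peeked = NULL;
    return skb;
}

/* Dequeue with the NULL check placed before the first dereference. */
static struct toy_skb *toy_dequeue(struct toy_qdisc *q)
{
    struct toy_skb *skb = toy_dequeue_peeked(q);

    if (skb == NULL)
        goto out;             /* nothing was dequeued, skip the accounting */

    q->bytes_sent += skb->len;
    q->qlen--;
out:
    return skb;
}
```

Calling `toy_dequeue()` on an empty toy qdisc returns NULL and leaves the counters untouched, which is the behaviour the commit message asks for.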
     

12 Jan, 2016

2 commits


11 Jan, 2016

2 commits

  • This work adds a generalization of the ingress qdisc as a qdisc holding
    only classifiers. The clsact qdisc works on ingress, but also on egress.
    In both cases, its execution happens without taking the qdisc lock, and
    the main difference for the egress part compared to prior version of [1]
    is that this can be applied with _any_ underlying real egress qdisc (also
    classless ones).

    Besides solving the use-case of [1], that is, allowing for more programmability
    on assigning skb->priority for the mqprio case that is supported by most
    popular 10G+ NICs, it also opens up a lot more flexibility for other tc
    applications. The main work on classification can already be done at clsact
    egress time if the use-case allows and state stored for later retrieval
    f.e. again in skb->priority with major/minors (which is checked by most
    classful qdiscs before consulting tc_classify()) and/or in other skb fields
    like skb->tc_index for some light-weight post-processing to get to the
    eventual classid in case of a classful qdisc. Another use case is that
    the clsact egress part allows to have a central egress counterpart to
    the ingress classifiers, so that classifiers can easily share state (e.g.
    in cls_bpf via eBPF maps) for ingress and egress.

    Currently, default setups like mq + pfifo_fast would require for this to
    use, for example, prio qdisc instead (to get a tc_classify() run) and to
    duplicate the egress classifier for each queue. With clsact, it allows
    for leaving the setup as is, it can additionally assign skb->priority to
    put the skb in one of pfifo_fast's bands and it can share state with maps.
    Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
    w/o the need to perform a skb_dst_force() to hold on to it any longer. In
    lwt case, we can also use this facility to setup dst metadata via cls_bpf
    (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
    that (case of IFF_NO_QUEUE devices, for example).

    The realization can be done without any changes to the scheduler core
    framework. All it takes is that we have two a-priori defined minors/child
    classes, where we can mux between ingress and egress classifier list
    (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
    dev->_tx to avoid extra cacheline miss for moderate loads). The egress
    part is modelled similarly to handle_ing() and patched to a noop in
    case the functionality is not used. Both handlers are now called
    sch_handle_ingress() and sch_handle_egress(), code sharing among the two
    doesn't seem practical as there are various minor differences in both
    paths, so that making them conditional in a single handler would rather
    slow things down.

    Full compatibility to ingress qdisc is provided as well. Since both
    piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
    per netdevice, and thus ingress qdisc specific behaviour can be retained
    for user space. This means, either a user does 'tc qdisc add dev foo ingress'
    and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
    alternative, where both ingress and egress classifiers can be configured
    as in the below example. The ingress qdisc supports attaching classifiers to any
    minor number whereas clsact has two fixed minors for muxing between the
    lists, therefore to not break user space setups, they are better done as
    two separate qdiscs.

    I decided to extend the sch_ingress module with clsact functionality so
    that commonly used code can be reused, the module is being aliased with
    sch_clsact so that it can be auto-loaded properly. Alternative would have been
    to add a flag when initializing ingress to alter its behaviour plus aliasing
    to a different name (as it's more than just ingress). However, the first would
    end up, based on the flag, choosing the new/old behaviour by calling different
    function implementations to handle each anyway, the latter would require to
    register ingress qdisc once again under different alias. So, this really begs
    to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
    by its own that share callbacks used by both.

    Example, adding qdisc:

    # tc qdisc add dev foo clsact
    # tc qdisc show dev foo
    qdisc mq 0: root
    qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc clsact ffff: parent ffff:fff1

    Adding filters (deleting etc. works analogously by specifying ingress/egress):

    # tc filter add dev foo ingress bpf da obj bar.o sec ingress
    # tc filter add dev foo egress bpf da obj bar.o sec egress
    # tc filter show dev foo ingress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
    # tc filter show dev foo egress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

    A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
    show an empty list for clsact. Either using the parent names (ingress/egress)
    or specifying the full major/minor will then show the related filter lists.

    Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

    [1] http://patchwork.ozlabs.org/patch/512949/

    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
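
The muxing between the two fixed minors described in the commit above can be sketched as follows. This is a user-space illustration, not the kernel code: the `TC_H_*` values are written as I recall them from include/uapi/linux/pkt_sched.h and should be checked against that header, while `toy_netdev`, `toy_filter_list` and `toy_clsact_find_list()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as recalled from include/uapi/linux/pkt_sched.h; please verify. */
#define TC_H_MAJ_MASK    0xFFFF0000U
#define TC_H_MIN_MASK    0x0000FFFFU
#define TC_H_MIN(h)      ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#define TC_H_CLSACT      0xFFFFFFF1U
#define TC_H_MIN_INGRESS 0xFFF2U
#define TC_H_MIN_EGRESS  0xFFF3U

/* Hypothetical stand-ins for the per-device classifier lists. */
struct toy_filter_list {
    const char *name;
};

struct toy_netdev {
    struct toy_filter_list ingress_cl_list;
    struct toy_filter_list egress_cl_list;
};

/* Pick the classifier list from the minor number of the class id. */
static struct toy_filter_list *toy_clsact_find_list(struct toy_netdev *dev,
                                                    uint32_t classid)
{
    switch (TC_H_MIN(classid)) {
    case TC_H_MIN_INGRESS:
        return &dev->ingress_cl_list;
    case TC_H_MIN_EGRESS:
        return &dev->egress_cl_list;
    default:
        return NULL;          /* clsact only knows the two fixed minors */
    }
}
```

A class id with any other minor selects no list in this sketch, which mirrors the statement that clsact has exactly two fixed minors whereas the ingress qdisc accepts any minor.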
     
  • Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
    can hide the ugly ifdef.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

06 Jan, 2016

1 commit

  • When a qdisc is using per cpu stats (currently just the ingress
    qdisc) only the bstats are being freed. This also frees the qstats.

    Fixes: b0ab6f92752b9f9d8 ("net: sched: enable per cpu qstats")
    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    John Fastabend
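
A minimal user-space sketch of the leak and its fix, assuming a toy structure with two separately allocated statistics blocks. `toy_qdisc`, `toy_alloc()`, `toy_free()` and `toy_qdisc_free_stats()` are hypothetical; the kernel uses per-cpu allocations rather than malloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a qdisc that keeps two separately allocated stat blocks. */
struct toy_qdisc {
    int uses_percpu_stats;
    void *cpu_bstats;
    void *cpu_qstats;
};

static int toy_live_allocs;   /* outstanding allocations, for leak checking */

static void *toy_alloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        toy_live_allocs++;
    return p;
}

static void toy_free(void *p)
{
    if (p != NULL) {
        toy_live_allocs--;
        free(p);
    }
}

/* Release both stat blocks; freeing only cpu_bstats would leak cpu_qstats. */
static void toy_qdisc_free_stats(struct toy_qdisc *q)
{
    if (!q->uses_percpu_stats)
        return;

    toy_free(q->cpu_bstats);
    toy_free(q->cpu_qstats);  /* the step the fix adds */
    q->cpu_bstats = NULL;
    q->cpu_qstats = NULL;
}
```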
     

16 Dec, 2015

1 commit

  • Stas Nichiporovich reported a regression in his HFSC qdisc setup
    on a non multi queue device.

    It turns out I mistakenly added a TCQ_F_NOPARENT flag on all qdisc
    allocated in qdisc_create() for non multi queue devices, which was
    rather buggy. I was clearly misled by the TCQ_F_ONETXQUEUE that is
    also set here for no good reason, since it only matters for the root
    qdisc.

    Fixes: 4eaf3b84f288 ("net_sched: fix qdisc_tree_decrease_qlen() races")
    Reported-by: Stas Nichiporovich
    Tested-by: Stas Nichiporovich
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Dec, 2015

1 commit

  • qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
    devices.

    One problem is that it updates sch->q.qlen and sch->qstats.drops
    on the mq/mqprio root qdisc, while it should not: Daniele
    reported underflow errors:
    [ 681.774821] PAX: sch->q.qlen: 0 n: 1
    [ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
    [ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1
    [ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
    [ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
    [ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
    [ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
    [ 681.774962] Call Trace:
    [ 681.774967] [] dump_stack+0x4c/0x7f
    [ 681.774970] [] report_size_overflow+0x34/0x50
    [ 681.774972] [] qdisc_tree_decrease_qlen+0x152/0x160
    [ 681.774976] [] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
    [ 681.774978] [] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
    [ 681.774980] [] __qdisc_run+0x4d/0x1d0
    [ 681.774983] [] net_tx_action+0xc2/0x160
    [ 681.774985] [] __do_softirq+0xf1/0x200
    [ 681.774987] [] run_ksoftirqd+0x1e/0x30
    [ 681.774989] [] smpboot_thread_fn+0x150/0x260
    [ 681.774991] [] ? sort_range+0x40/0x40
    [ 681.774992] [] kthread+0xe4/0x100
    [ 681.774994] [] ? kthread_worker_fn+0x170/0x170
    [ 681.774995] [] ret_from_fork+0x3e/0x70

    mq/mqprio have their own ways to report qlen/drops by folding stats on
    all their queues, with appropriate locking.

    A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
    without proper locking: concurrent qdisc updates could corrupt the list
    that qdisc_match_from_root() parses to find a qdisc given its handle.

    Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
    qdisc_tree_decrease_qlen() can use to abort its tree traversal
    as soon as it meets a child of a mq/mqprio qdisc.

    Second problem can be fixed by RCU protection.
    Qdisc are already freed after RCU grace period, so qdisc_list_add() and
    qdisc_list_del() simply have to use appropriate rcu list variants.

    A future patch will add a per struct netdev_queue list anchor, so that
    qdisc_tree_decrease_qlen() can have more efficient lookups.

    Reported-by: Daniele Fucini
    Signed-off-by: Eric Dumazet
    Cc: Cong Wang
    Cc: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Eric Dumazet
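
The first fix in the commit above (stop the upward walk at a flagged qdisc) can be sketched as below. This is a simplified model under stated assumptions: parents are plain pointers instead of handles resolved through qdisc_lookup(), and `TOY_F_NOPARENT`, `toy_qdisc` and `toy_tree_decrease_qlen()` are hypothetical names standing in for the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_F_NOPARENT 0x1u   /* hypothetical stand-in for TCQ_F_NOPARENT */

struct toy_qdisc {
    struct toy_qdisc *parent; /* NULL at the root */
    unsigned int flags;
    unsigned int qlen;
    unsigned int drops;
};

/*
 * Walk from sch towards the root, taking n packets off every ancestor.
 * The walk aborts at a qdisc flagged NOPARENT, so a mq/mqprio style root
 * above it is never touched.
 */
static void toy_tree_decrease_qlen(struct toy_qdisc *sch, unsigned int n)
{
    if (n == 0)
        return;

    while (sch->parent != NULL) {
        if (sch->flags & TOY_F_NOPARENT)
            break;

        sch = sch->parent;
        sch->qlen -= n;
        sch->drops += n;
    }
}
```

With the flag set on the child, the root's qlen of 0 is left alone instead of underflowing; without the flag, the ancestor is updated as before.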
     

09 Nov, 2015

2 commits


04 Nov, 2015

1 commit


20 Oct, 2015

1 commit


11 Oct, 2015

2 commits

  • selinux needs a few changes to accommodate the fact that SYNACK messages
    can be attached to a request socket, lacking sk_security pointer

    (Only syncookies are still attached to a TCP_LISTEN socket)

    Add a new sk_listener() helper, and use it in selinux and sch_fq

    Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
    Signed-off-by: Eric Dumazet
    Reported-by: kernel test robot
    Cc: Paul Moore
    Cc: Stephen Smalley
    Cc: Eric Paris
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Similar to commit c0afd9ce4d6a ("fq_codel: fix return value of fq_codel_drop()")
    ->drop() is supposed to return the number of bytes it dropped,
    but hhf_drop() returns the id of the bucket where it drops
    a packet from.

    Cc: Jamal Hadi Salim
    Cc: Terry Lam
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
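
The contract described above, that ->drop() reports bytes dropped rather than a bucket index, can be illustrated with a toy model. The bucket selection here (first non-empty bucket) is an assumption made for brevity and is not how hhf chooses a bucket; `toy_bucket` and `toy_drop()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PKTS 4

struct toy_bucket {
    unsigned int pkt_len[TOY_MAX_PKTS]; /* packet lengths, head at index 0 */
    unsigned int count;                 /* number of queued packets */
};

/*
 * Drop the head packet of the first non-empty bucket and return the
 * number of bytes dropped (0 if every bucket is empty). Returning the
 * bucket index instead would be the bug the commit fixes.
 */
static unsigned int toy_drop(struct toy_bucket *buckets, size_t nbuckets)
{
    size_t idx;
    unsigned int i, len;

    for (idx = 0; idx < nbuckets; idx++) {
        struct toy_bucket *b = &buckets[idx];

        if (b->count == 0)
            continue;

        len = b->pkt_len[0];
        for (i = 1; i < b->count; i++)
            b->pkt_len[i - 1] = b->pkt_len[i];
        b->count--;
        return len;           /* bytes dropped, not idx */
    }
    return 0;
}
```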
     

09 Oct, 2015

1 commit

  • The Kconfig currently controlling compilation of this code is:

    net/sched/Kconfig:menuconfig NET_SCHED
    net/sched/Kconfig: bool "QoS and/or fair queueing"

    ...meaning that it currently is not being built as a module by anyone.

    Let's remove the modular code that is essentially orphaned, so that
    when reading the driver there is no doubt it is builtin-only.

    Since module_init translates to device_initcall in the non-modular
    case, the init ordering remains unchanged with this commit. We can
    change to one of the other priority initcalls (subsys?) at any later
    date, if desired.

    We also delete the MODULE_LICENSE tag since all that information
    is already contained at the top of the file in the comments.

    Cc: Jamal Hadi Salim
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Paul Gortmaker
     

08 Oct, 2015

1 commit

  • Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding")
    the skb->sender_cpu needs to be cleared when moving from Rx
    to Tx, otherwise the kernel could crash.

    Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
    Cc: Eric Dumazet
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    WANG Cong
     

05 Oct, 2015

2 commits

  • Align with other tc actions.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • After commit 1ce87720d456 ("net: sched: make cls_u32 lockless")
    we began to release tc actions in a RCU callback. However,
    mirred action relies on RTNL lock to protect the global
    mirred_list, therefore we could have a race condition
    between RCU callback and netdevice event, which caused
    a list corruption as reported by Vinson.

    Instead of relying on RTNL lock, introduce a spinlock to
    protect this list.

    Note, in non-bind case, it is still called with RTNL lock,
    therefore should disable BH too.

    Reported-by: Vinson Lee
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

03 Oct, 2015

2 commits

  • Using routing realms as part of the classifier is quite useful, it
    can be viewed as a tag for one or multiple routing entries (think of
    an analogy to net_cls cgroup for processes), set by user space routing
    daemons or via iproute2 as an indicator for traffic classifiers and
    later on processed in the eBPF program.

    Unlike actions, the classifier can inspect device flags and enable
    netif_keep_dst() if necessary. tc actions don't have that possibility,
    but in case people know what they are doing, it can be used from there
    as well (e.g. via devs that must keep dsts by design anyway).

    If a realm is set, the handler returns the non-zero realm. User space
    can set the full 32bit realm for the dst.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • If a listen backlog is very big (to avoid syncookies), then
    the listener sk->sk_wmem_alloc is the main source of false
    sharing, as we need to touch it twice per SYNACK re-transmit
    and TX completion.

    (One SYN packet takes listener lock once, but up to 6 SYNACK
    are generated)

    By attaching the skb to the request socket, we remove this
    source of contention.

    Tested:

    listen(fd, 10485760); // single listener (no SO_REUSEPORT)
    16 RX/TX queue NIC
    Sustain a SYNFLOOD attack of ~320,000 SYN per second,
    Sending ~1,400,000 SYNACK per second.
    Perf profiles now show listener spinlock being next bottleneck.

    20.29% [kernel] [k] queued_spin_lock_slowpath
    10.06% [kernel] [k] __inet_lookup_established
    5.12% [kernel] [k] reqsk_timer_handler
    3.22% [kernel] [k] get_next_timer_interrupt
    3.00% [kernel] [k] tcp_make_synack
    2.77% [kernel] [k] ipt_do_table
    2.70% [kernel] [k] run_timer_softirq
    2.50% [kernel] [k] ip_finish_output
    2.04% [kernel] [k] cascade

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Sep, 2015

1 commit


25 Sep, 2015

1 commit

  • fw filter uses tp->root==NULL to check if it is the old method,
    so it doesn't need allocation at all in this case. This patch
    reverts the offending commit and adds some comments for old
    method to make it obvious.

    Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
    Reported-by: Akshat Kakkar
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

24 Sep, 2015

3 commits

  • Jamal suggested to further limit the currently allowed subset of opcodes
    that may be used by a direct action return code as the intention is not
    to replace the full action engine, but rather to have a minimal set that
    can be used in the fast-path on things like ingress for some features
    that cls_bpf supports.

    Classifiers can, of course, still be chained together that have direct
    action mode with those that have a full exec pass. For more complex
    scenarios that go beyond this minimal set here, the full tcf_exts_exec()
    path must be used.

    Suggested-by: Jamal Hadi Salim
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The binding to a particular classid was so far always mandatory for
    cls_bpf, but it doesn't need to be. Therefore, lift this restriction
    as similarly done in other classifiers.

    Only a couple of qdiscs make use of class from the tcf_result, others
    don't strictly care, so let the user choose his needs (those that read
    out class can handle situations where it could be NULL).

    An explicit check for tcf_unbind_filter() is also not needed here, as
    the previous r->class was 0, so the xchg() will return that and
    therefore a callback to the qdisc's unbind_tcf() is skipped.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • In commit 43388da42a49 ("cls_bpf: introduce integrated actions") we
    have added TCA_BPF_FLAGS. We can also retrieve this information from
    the prog, dump it back to user space as well. It's useful in tc when
    displaying/dumping filter info.

    Also, remove tp from cls_bpf_prog_from_efd(), came in as a conflict
    from a rebase and it's unused here (later work may add it along with
    a real user).

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

19 Sep, 2015

3 commits


18 Sep, 2015

3 commits

  • Memory placement in sch_dsmark is silly: it is better to place
    mask/value in the same cache line.

    Also, we can embed small arrays in the first cache line and
    remove a potential cache miss.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Existing bpf_clone_redirect() helper clones skb before redirecting
    it to RX or TX of destination netdev.
    Introduce bpf_redirect() helper that does that without cloning.

    Benchmarked with two hosts using 10G ixgbe NICs.
    One host is doing line rate pktgen.
    Another host is configured as:
    $ tc qdisc add dev $dev ingress
    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
        action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
    so it receives the packet on $dev and immediately xmits it on $dev + 1
    The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
    that does bpf_clone_redirect() and performance is 2.0 Mpps

    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
        action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
    which is using bpf_redirect() - 2.4 Mpps

    and using cls_bpf with integrated actions as:
    $ tc filter add dev $dev root pref 10 \
        bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
    performance is 2.5 Mpps

    To summarize:
    u32+act_bpf using clone_redirect - 2.0 Mpps
    u32+act_bpf using redirect - 2.4 Mpps
    cls_bpf using redirect - 2.5 Mpps

    For comparison linux bridge in this setup is doing 2.1 Mpps
    and ixgbe rx + drop in ip_rcv - 7.8 Mpps

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Often cls_bpf classifier is used with single action drop attached.
    Optimize this use case and let cls_bpf return both classid and action.
    For backwards compatibility reasons enable this feature under
    TCA_BPF_FLAG_ACT_DIRECT flag.

    Then more interesting programs like the following are easier to write:
    int cls_bpf_prog(struct __sk_buff *skb)
    {
        /* classify arp, ip, ipv6 into different traffic classes
         * and drop all other packets
         */
        switch (skb->protocol) {
        case htons(ETH_P_ARP):
            skb->tc_classid = 1;
            break;
        case htons(ETH_P_IP):
            skb->tc_classid = 2;
            break;
        case htons(ETH_P_IPV6):
            skb->tc_classid = 3;
            break;
        default:
            return TC_ACT_SHOT;
        }

        return TC_ACT_OK;
    }

    Joint work with Daniel Borkmann.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

02 Sep, 2015

1 commit


29 Aug, 2015

1 commit

  • Just some minor noise follow-up to address some stylistic issues of
    commit 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}").
    Accidentally v1 instead of v2 of that commit got applied, so this
    patch adds the relative diff.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

28 Aug, 2015

5 commits

  • David S. Miller
     
  • Now that noqueue qdisc can be attached just like any other qdisc, no
    special treatment is necessary anymore when attaching it as default
    qdisc.

    This change has the added benefit that 'tc qdisc show' prints noqueue
    instead of nothing for devices defaulting to noqueue.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     
  • This way users can attach noqueue just like any other qdisc using tc
    without having to mess with tx_queue_len first.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     
  • Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing
    tx_queue_len, it is safe to assume that if tx_queue_len is zero,
    dev->priv_flags always contains IFF_NO_QUEUE.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     
  • For classifiers getting invoked via tc_classify(), we always need an
    extra function call into tc_classify_compat(), as both are being
    exported as symbols and tc_classify() itself doesn't do much except
    handling of reclassifications when tp->classify() returned with
    TC_ACT_RECLASSIFY.

    CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
    all others use tc_classify(). When tc actions are being configured
    out in the kernel, tc_classify() effectively does nothing besides
    delegating.

    We could spare this layer and consolidate both functions. pktgen on
    single CPU constantly pushing skbs directly into the netif_receive_skb()
    path with a dummy classifier on ingress qdisc attached, improves
    slightly from 22.3Mpps to 23.1Mpps.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
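
The consolidated control flow described above, a single loop over the classifier chain that also handles reclassification, can be sketched in user space as follows. The action codes carry the numeric values I recall for TC_ACT_UNSPEC, TC_ACT_OK, TC_ACT_RECLASSIFY and TC_ACT_SHOT, and the loop limit of 4 is an assumption; `toy_tp`, `toy_pkt`, `toy_classify()` and the three example classifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Action codes; values as recalled for the matching TC_ACT_* constants. */
#define TOY_ACT_UNSPEC     (-1)
#define TOY_ACT_OK         0
#define TOY_ACT_RECLASSIFY 1
#define TOY_ACT_SHOT       2
#define TOY_MAX_REC_LOOP   4   /* assumed limit on reclassification rounds */

struct toy_pkt {
    int calls;    /* classifier invocations, for illustration */
    int classid;
};

struct toy_tp {
    int (*classify)(struct toy_pkt *pkt);
    const struct toy_tp *next;
};

/* One function walks the chain and restarts it on RECLASSIFY. */
static int toy_classify(const struct toy_tp *head, struct toy_pkt *pkt)
{
    const struct toy_tp *tp;
    int limit = 0;

reclassify:
    for (tp = head; tp != NULL; tp = tp->next) {
        int err = tp->classify(pkt);

        if (err == TOY_ACT_RECLASSIFY) {
            if (limit++ >= TOY_MAX_REC_LOOP)
                return TOY_ACT_SHOT;   /* give up instead of looping forever */
            goto reclassify;
        }
        if (err >= 0)
            return err;                /* a classifier reached a verdict */
    }
    return TOY_ACT_UNSPEC;             /* nothing matched */
}

/* Example classifiers used to exercise the loop. */
static int toy_no_match(struct toy_pkt *pkt)
{
    pkt->calls++;
    return TOY_ACT_UNSPEC;
}

static int toy_retry_once(struct toy_pkt *pkt)
{
    pkt->calls++;
    if (pkt->classid == 0) {
        pkt->classid = 7;
        return TOY_ACT_RECLASSIFY;
    }
    return TOY_ACT_OK;
}

static int toy_always_retry(struct toy_pkt *pkt)
{
    pkt->calls++;
    return TOY_ACT_RECLASSIFY;
}
```

Folding the restart logic into the same function as the chain walk is what removes the extra call layer that the commit message mentions.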
     

27 Aug, 2015

2 commits

  • Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
    with extra care taken to free bpf programs after rcu grace period.
    Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
    and final destruction is done from tc_action_ops->cleanup() callback that is
    called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
    bind and refcnt reach zero which is only possible when classifier is destroyed.
    Previous two patches fixed the last two classifiers (tcindex and rsvp) to
    call tcf_exts_destroy() from rcu callback.

    Similar to gact/mirred there is a race between prog->filter and
    prog->tcf_action. Meaning that the program being replaced may use
    previous default action if it happened to return TC_ACT_UNSPEC.
    act_mirred race between tcf_action and tcfm_dev is similar.
    In all cases the race is harmless.
    Long term we may want to improve the situation by replacing the whole
    tc_action->priv as single pointer instead of updating inner fields one by one.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
    rcu grace period.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov