03 Jan, 2018

1 commit

  • [ Upstream commit b59e6979a86384e68b0ab6ffeab11f0034fba82d ]

    Move static key increments to the beginning of the init function
    so they pair 1:1 with decrements in ingress/clsact_destroy,
    which is called in case ingress/clsact_init fails.
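
    The pairing rule can be illustrated with a small userspace model. This
    is only a sketch: the counter and every model_* name are hypothetical
    stand-ins, not the kernel's static key API, and the failure flag merely
    simulates a tcf block setup error.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the static key's enable count (hypothetical). */
int ingress_needed;

void model_inc_ingress_queue(void)
{
    ingress_needed++;
}

void model_dec_ingress_queue(void)
{
    /* An unpaired decrement would drive the count negative. */
    assert(ingress_needed > 0);
    ingress_needed--;
}

/* Destroy runs on normal teardown and also after a failed init. */
void model_ingress_destroy(void)
{
    model_dec_ingress_queue();
}

/*
 * Init takes the increment first, so every exit path, including the
 * failing one, leaves exactly one increment for destroy to undo.
 */
int model_ingress_init(bool block_setup_fails)
{
    model_inc_ingress_queue();
    if (block_setup_fails)
        return -1; /* the caller reacts by calling destroy */
    return 0;
}

/* Mirrors the caller's behaviour: a failed init is followed by destroy. */
int model_create(bool block_setup_fails)
{
    int err = model_ingress_init(block_setup_fails);

    if (err)
        model_ingress_destroy();
    return err;
}
```

    Had the increment come after the step that can fail, the destroy call
    on the error path would have decremented a count that was never raised.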

    Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
    Signed-off-by: Jiri Pirko
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jiri Pirko
     

26 Aug, 2017

1 commit

  • For TC classes, their ->get() and ->put() are always paired, and the
    reference counting is completely useless, because:

    1) For class modification and dumping paths, we already hold RTNL lock,
    so all of these ->get(),->change(),->put() are atomic.

    2) For filter binding/unbinding, we use a different reference counter
    from this one, and they should hold the RTNL lock too.

    3) For ->qlen_notify(), it is special because it is called on ->enqueue()
    path, but we already hold qdisc tree lock there, and we hold this
    tree lock when grafting or deleting the class too, so it should not be gone
    or changed until we release the tree lock.

    Therefore, this patch removes ->get() and ->put(), but:

    1) Adds a new ->find() to find the pointer to a class by classid, no
    refcnt.

    2) Moves the original class destroy upon the last refcnt into ->delete(),
    right after releasing tree lock. This is fine because the class is
    already removed from hash when holding the lock.

    For those who also use ->put() as ->unbind(), just rename them to reflect
    this change.
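
    A rough userspace sketch of the resulting shape, assuming a tiny
    fixed-size class table. The struct layouts and model_* names are
    invented for illustration; only the idea of a refcount-free lookup
    followed by destruction inside ->delete() comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NCLASSES 4

struct model_class {
    uint32_t classid; /* 0 marks an empty slot */
    int quantum;
};

struct model_qdisc {
    struct model_class classes[MODEL_NCLASSES];
};

/*
 * ->find() analogue: a plain lookup by classid. The class pointer is
 * returned as an unsigned long cookie, 0 meaning "not found". No
 * reference is taken, so there is nothing to put afterwards.
 */
unsigned long model_find(struct model_qdisc *q, uint32_t classid)
{
    size_t i;

    if (classid == 0)
        return 0;
    for (i = 0; i < MODEL_NCLASSES; i++)
        if (q->classes[i].classid == classid)
            return (unsigned long)&q->classes[i];
    return 0;
}

/*
 * ->delete() analogue: unlink and destroy in one call, since no
 * "last put" exists any more to trigger the destruction.
 */
int model_delete(struct model_qdisc *q, uint32_t classid)
{
    struct model_class *cl = (struct model_class *)model_find(q, classid);

    if (!cl)
        return -1;
    cl->classid = 0; /* unlink from the table */
    cl->quantum = 0; /* "destroy" the class state */
    return 0;
}
```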

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

12 Aug, 2017

1 commit


18 May, 2017

1 commit

  • Currently, the filter chains are directly put into the private structures
    of qdiscs. In order to be able to have multiple chains per qdisc and to
    allow filter chains sharing among qdiscs, there is a need for a common
    object that would hold the chains. This introduces such an object and calls
    it "tcf_block".

    Helpers to get and put the blocks are provided to be called from
    individual qdisc code. Also, the original filter_list pointers are left
    in qdisc privs to allow the entry into tcf_block processing without any
    added overhead of possible multiple pointer dereference on fast path.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

11 Feb, 2017

1 commit


08 Jun, 2016

1 commit

  • When offloading classifiers such as u32 or flower to hardware, and the
    qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
    since not all of them handle ingress, therefore we must leave those in
    software path. Add a .tcf_cl_offload() callback, so we can generically
    handle them, tested on ixgbe.

    Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
    Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
    Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Jan, 2016

1 commit

  • This work adds a generalization of the ingress qdisc as a qdisc holding
    only classifiers. The clsact qdisc works on ingress, but also on egress.
    In both cases, its execution happens without taking the qdisc lock, and
    the main difference for the egress part compared to prior version of [1]
    is that this can be applied with _any_ underlying real egress qdisc (also
    classless ones).

    Besides solving the use-case of [1], that is, allowing for more programmability
    on assigning skb->priority for the mqprio case that is supported by most
    popular 10G+ NICs, it also opens up a lot more flexibility for other tc
    applications. The main work on classification can already be done at clsact
    egress time if the use-case allows and state stored for later retrieval
    f.e. again in skb->priority with major/minors (which is checked by most
    classful qdiscs before consulting tc_classify()) and/or in other skb fields
    like skb->tc_index for some light-weight post-processing to get to the
    eventual classid in case of a classful qdisc. Another use case is that
    the clsact egress part allows to have a central egress counterpart to
    the ingress classifiers, so that classifiers can easily share state (e.g.
    in cls_bpf via eBPF maps) for ingress and egress.

    Currently, default setups like mq + pfifo_fast would require for this to
    use, for example, prio qdisc instead (to get a tc_classify() run) and to
    duplicate the egress classifier for each queue. With clsact, it allows
    for leaving the setup as is, it can additionally assign skb->priority to
    put the skb in one of pfifo_fast's bands and it can share state with maps.
    Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
    w/o the need to perform a skb_dst_force() to hold on to it any longer. In
    lwt case, we can also use this facility to setup dst metadata via cls_bpf
    (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
    that (case of IFF_NO_QUEUE devices, for example).

    The realization can be done without any changes to the scheduler core
    framework. All it takes is that we have two a-priori defined minors/child
    classes, where we can mux between ingress and egress classifier list
    (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
    dev->_tx to avoid extra cacheline miss for moderate loads). The egress
    part is modelled similarly to handle_ing() and patched to a noop in
    case the functionality is not used. Both handlers are now called
    sch_handle_ingress() and sch_handle_egress(), code sharing among the two
    doesn't seem practical as there are various minor differences in both
    paths, so that making them conditional in a single handler would rather
    slow things down.
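
    The muxing between the two lists can be sketched in userspace as
    follows. The minor values are what the uapi pkt_sched.h header is
    believed to define (TC_H_MIN_INGRESS and TC_H_MIN_EGRESS); they are
    redefined here under MODEL_ names and should be treated as assumptions,
    as should the simplified netdev struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the uapi header values; redefined for a standalone build. */
#define MODEL_TC_H_MIN_MASK    0x0000FFFFU
#define MODEL_TC_H_MIN_INGRESS 0xFFF2U
#define MODEL_TC_H_MIN_EGRESS  0xFFF3U

/* Hypothetical stand-in for the two per-device classifier lists. */
struct model_netdev {
    const char *ingress_cl_list;
    const char *egress_cl_list;
};

/*
 * Pick the classifier list from the minor part of the parent handle.
 * Any other minor is rejected, since clsact has exactly two children.
 */
const char **model_clsact_select(struct model_netdev *dev, uint32_t parent)
{
    switch (parent & MODEL_TC_H_MIN_MASK) {
    case MODEL_TC_H_MIN_INGRESS:
        return &dev->ingress_cl_list;
    case MODEL_TC_H_MIN_EGRESS:
        return &dev->egress_cl_list;
    default:
        return NULL;
    }
}
```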

    Full compatibility to ingress qdisc is provided as well. Since both
    piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
    per netdevice, and thus ingress qdisc specific behaviour can be retained
    for user space. This means, either a user does 'tc qdisc add dev foo ingress'
    and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
    alternative, where both ingress and egress classifiers can be configured
    as in the below example. ingress qdisc supports attaching classifier to any
    minor number whereas clsact has two fixed minors for muxing between the
    lists, therefore to not break user space setups, they are better done as
    two separate qdiscs.

    I decided to extend the sch_ingress module with clsact functionality so
    that commonly used code can be reused, the module is being aliased with
    sch_clsact so that it can be auto-loaded properly. Alternative would have been
    to add a flag when initializing ingress to alter its behaviour plus aliasing
    to a different name (as it's more than just ingress). However, the first would
    end up, based on the flag, choosing the new/old behaviour by calling different
    function implementations to handle each anyway, the latter would require to
    register ingress qdisc once again under different alias. So, this really begs
    to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
    by its own that share callbacks used by both.

    Example, adding qdisc:

    # tc qdisc add dev foo clsact
    # tc qdisc show dev foo
    qdisc mq 0: root
    qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc clsact ffff: parent ffff:fff1

    Adding filters (deleting, etc works analogous by specifying ingress/egress):

    # tc filter add dev foo ingress bpf da obj bar.o sec ingress
    # tc filter add dev foo egress bpf da obj bar.o sec egress
    # tc filter show dev foo ingress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
    # tc filter show dev foo egress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

    A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
    show an empty list for clsact. Either using the parent names (ingress/egress)
    or specifying the full major/minor will then show the related filter lists.

    Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

    [1] http://patchwork.ozlabs.org/patch/512949/

    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 May, 2015

1 commit

  • Ingress qdisc has no other purpose than calling into tc_classify()
    that executes attached classifier(s) and action(s).

    It has a 1:1 relationship to dev->ingress_queue. After having commit
    087c1a601ad7 ("net: sched: run ingress qdisc without locks") removed
    the central ingress lock, one major contention point is gone.

    The extra indirection layers however, are not necessary for calling
    into ingress qdisc. pktgen calling locally into netif_receive_skb()
    with a dummy u32, single CPU result on a Supermicro X10SLM-F, Xeon
    E3-1240: before ~21,1 Mpps, after patch ~22,9 Mpps.

    We can redirect the private classifier list to the netdev directly,
    without changing any classifier API bits (!) and execute on that from
    handle_ing() side. The __QDISC_STATE_DEACTIVATE test can be removed,
    ingress qdisc doesn't have a queue and thus dev_deactivate_queue()
    is also not applicable, ingress_cl_list provides similar behaviour.
    In other words, ingress qdisc acts like TCQ_F_BUILTIN qdisc.

    One next possible step is the removal of the dev's ingress (dummy)
    netdev_queue, and to only have the list member in the netdevice
    itself.

    Note, the filter chain is RCU protected and individual filter elements
    are being kfree'd by sched subsystem after RCU grace period. RCU read
    lock is being held by __netif_receive_skb_core().

    Joint work with Alexei Starovoitov.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

04 May, 2015

1 commit

  • TC classifiers/actions were converted to RCU by John in the series:
    http://thread.gmane.org/gmane.linux.network/329739/focus=329739
    and many follow on patches.
    This is the last patch from that series that finally drops
    ingress spin_lock.

    Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.

    In two cpu case when both cores are receiving traffic on the same
    device and go into the same ingress+u32 the performance jumps
    from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Jamal Hadi Salim
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

14 Apr, 2015

1 commit

  • Even if we make use of classifier and actions from the egress
    path, we're going into handle_ing() executing additional code
    on a per-packet cost for ingress qdisc, just to realize that
    nothing is attached on ingress.

    Instead, this can just be blinded out as a no-op entirely with
    the use of a static key. On input fast-path, we already make
    use of static keys in various places, e.g. skb time stamping,
    in RPS, etc. It makes sense to not waste time when we're assured
    that no ingress qdisc is attached anywhere.

    Enabling/disabling of that code path is being done via two
    helpers, namely net_{inc,dec}_ingress_queue(), that are being
    invoked under RTNL mutex when an ingress qdisc is being either
    initialized or destructed.
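
    The gating idea, reduced to a userspace sketch. A real static key
    patches the branch in the instruction stream rather than testing a
    counter, so this model only shows the control flow; all model_* names
    are hypothetical.

```c
#include <stdbool.h>

int model_key_count;      /* how many ingress qdiscs are attached */
int model_classify_calls; /* how often the slow path actually ran */

/* Called under the (modelled) RTNL lock on qdisc init. */
void model_net_inc_ingress_queue(void)
{
    model_key_count++;
}

/* Called under the (modelled) RTNL lock on qdisc destroy. */
void model_net_dec_ingress_queue(void)
{
    model_key_count--;
}

/*
 * Receive path: skip the ingress handling entirely while no ingress
 * qdisc exists anywhere. Returns true if classification ran.
 */
bool model_receive(void)
{
    if (model_key_count <= 0)
        return false;
    model_classify_calls++;
    return true;
}
```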

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

30 Sep, 2014

1 commit


14 Sep, 2014

1 commit

  • rcu'ify tcf_proto; this allows calling tc_classify() without holding
    any locks. Updaters are protected by RTNL.

    This patch prepares the core net_sched infrastructure for running
    the classifier/action chains without holding the qdisc lock; however,
    it does nothing to ensure cls_xxx and act_xxx types also work without
    locking. Additional patches are required to address the fallout.

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

14 Mar, 2014

1 commit


11 Jan, 2011

1 commit

  • HTB takes into account skb is segmented in stats updates.
    Generalize this to all schedulers.

    They should use qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets.

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.
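
    What "taking segmentation into account" amounts to can be sketched as
    below. The two structs are simplified, hypothetical stand-ins for the
    skb and the basic stats block; the rule modelled is that a segmented
    skb counts as one packet per segment while bytes are added once.

```c
#include <stdint.h>

/* Simplified stand-in for an skb: total length and GSO segment count. */
struct model_skb {
    uint32_t len;
    uint16_t gso_segs; /* 0 means the skb is not segmented */
};

/* Simplified stand-in for the basic byte/packet counters. */
struct model_bstats {
    uint64_t bytes;
    uint32_t packets;
};

/*
 * One helper for every scheduler: bytes grow by the skb length,
 * packets grow by the number of segments the skb will become.
 */
void model_bstats_update(struct model_bstats *b, const struct model_skb *skb)
{
    b->bytes += skb->len;
    b->packets += skb->gso_segs ? skb->gso_segs : 1;
}
```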

    Note: Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
    stab is setup on qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

06 Sep, 2009

1 commit

  • Some schedulers don't support creating, changing or deleting classes.
    Make the respective callbacks optionally and consistently return
    -EOPNOTSUPP for unsupported operations, instead of currently either
    -EOPNOTSUPP, -ENOSYS or no error.

    In case of sch_prio and sch_multiq, the removed operations additionally
    checked for an invalid class. This is not necessary since the class
    argument can only originate from ->get() or, in case of ->change(), is 0
    for creation of new classes, in which case ->change() incorrectly
    returned -ENOENT.

    As a side-effect, this patch fixes a possible (root-only) NULL pointer
    function call in sch_ingress, which didn't implement a so far mandatory
    ->delete() operation.
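
    The optional-callback pattern might look like this in a userspace
    model. The ops struct and the model_* helpers are hypothetical (the
    member is named del rather than delete only to keep the sketch
    portable); the point is that the core checks for a missing callback
    once, returns -EOPNOTSUPP, and never calls through a NULL pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical class operations; a NULL member means "unsupported". */
struct model_class_ops {
    int (*change)(unsigned int classid);
    int (*del)(unsigned int classid);
};

/* Core-side wrapper: one consistent error for a missing ->change(). */
int model_class_change(const struct model_class_ops *ops, unsigned int classid)
{
    if (!ops->change)
        return -EOPNOTSUPP;
    return ops->change(classid);
}

/* Core-side wrapper: guards the call that used to be unconditional. */
int model_class_delete(const struct model_class_ops *ops, unsigned int classid)
{
    if (!ops->del)
        return -EOPNOTSUPP;
    return ops->del(classid);
}

/* Example callback for a scheduler that does support deletion. */
int model_noop_delete(unsigned int classid)
{
    (void)classid;
    return 0;
}
```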

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

05 Sep, 2009

1 commit

  • If the parent qdisc doesn't support classes, use EOPNOTSUPP.
    If the parent class doesn't exist, use ENOENT. Currently EINVAL
    is returned in both cases.

    Additionally check whether grafting is supported and remove a now
    unnecessary graft function from sch_ingress.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

20 Jul, 2008

1 commit


02 Jul, 2008

1 commit


01 Feb, 2008

1 commit

  • Since the old policer code is gone, TC actions are needed for policing.
    The ingress qdisc can get packets directly from netif_receive_skb()
    in case TC actions are enabled or through netfilter otherwise, but
    since without TC actions there is no policer the only thing it actually
    does is count packets.

    Remove the netfilter support and always require TC actions.

    Signed-off-by: Patrick McHardy
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Patrick McHardy
     

29 Jan, 2008

15 commits


16 Oct, 2007

1 commit


31 Jul, 2007

1 commit


15 Jul, 2007

1 commit

  • The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
    remove the old code. The config option will be kept around to select
    the equivalent NET_CLS_ACT options for a short time to allow easier
    upgrades.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

11 Jul, 2007

1 commit


26 Apr, 2007

1 commit

  • Spring cleaning time...

    There seem to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. The most common is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger