04 Feb, 2017

1 commit

  • [ Upstream commit 0faa9cb5b3836a979864a6357e01d2046884ad52 ]

    Demonstrating the issue:

    .. add a drop action
    $sudo $TC actions add action drop index 10

    .. retrieve it
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 29 sec used 29 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... bug 1 above: reference is two.
    Reference is actually 1 but we forget to subtract 1.

    ... do a GET again and we see the same issue
    try a few times and nothing changes
    ~$ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 31 sec used 31 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... lets try to bind the action to a filter..
    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    ... and now a few GETs:
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 3 bind 1 installed 204 sec used 204 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 4 bind 1 installed 206 sec used 206 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 5 bind 1 installed 235 sec used 235 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    .... as can be observed the reference count keeps going up.

    After the fix

    $ sudo $TC actions add action drop index 10
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 4 sec used 4 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 6 sec used 6 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 32 sec used 32 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 33 sec used 33 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jamal Hadi Salim
     

30 Oct, 2016

1 commit


13 Oct, 2016

1 commit

  • Krister reported a kernel NULL pointer dereference after
    tcf_action_init_1() invokes a_o->init(), it is a race condition
    where one thread calling tcf_register_action() to initialize
    the netns data after putting act ops in the global list and
    the other thread searching the list and then calling
    a_o->init(net, ...).

    Fix this by moving the pernet ops registration before making
    the action ops visible. This is fine because: a) we don't
    rely on act_base in pernet ops->init(), b) in the worst case we
    have a fully initialized netns but ops is still not ready so
    new actions still can't be created.

    Reported-by: Krister Johansen
    Tested-by: Krister Johansen
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

21 Sep, 2016

1 commit

  • With the batch changes that translated transient actions into
    a temporary list lost in the translation was the fact that
    tcf_action_destroy() will eventually delete the action from
    the permanent location if the refcount is zero.

    Example of what broke:
    ...add a gact action to drop
    sudo $TC actions add action drop index 10
    ...now retrieve it, looks good
    sudo $TC actions get action gact index 10
    ...retrieve it again and find it is gone!
    sudo $TC actions get action gact index 10

    Fixes: 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
    Fixes: 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
    Fixes: f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")

    Acked-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

20 Sep, 2016

1 commit


18 Aug, 2016

3 commits

  • As pointed out by Jamal, an action could be shared by
    multiple filters, so we can't use list to chain them
    any more after we get rid of the original tc_action.
    Instead, we could just save pointers to these actions
    in tcf_exts, since they are refcount'ed, so convert
    the list to an array of pointers.

    The "ugly" part is the action API still accepts list
    as a parameter, I just introduce a helper function to
    convert the array of pointers to a list, instead of
    relying on the C99 feature to iterate the array.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • This list_del() for tc action is not needed actually,
    because we only use this list to chain bulk operations,
    therefore should not be carried for latter operations.

    Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • After refactoring tc_action into tcf_common, we no
    longer need to cleanup temporary "actions" in list,
    they are permanently stored in the hashtable.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

26 Jul, 2016

2 commits

  • After the previous patch, struct tc_action should be enough
    to represent the generic tc action, tcf_common is not necessary
    any more. This patch gets rid of it to make tc action code
    more readable.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

30 Jun, 2016

1 commit


16 Jun, 2016

2 commits


08 Jun, 2016

3 commits

  • Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
    agent [1] are problematic at scale :

    For each qdisc/class found in the dump, we currently lock the root qdisc
    spinlock in order to get stats. Sampling stats every 5 seconds from
    thousands of HTB classes is a challenge when the root qdisc spinlock is
    under high pressure. Not only the dumps take time, they also slow
    down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

    An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
    that might need the qdisc lock in fq_codel_dump_stats() and
    fq_codel_dump_class_stats()

    In v2 of this patch, I now use the Qdisc running seqcount to provide
    consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

    I also changed rate estimators to use the same infrastructure
    so that they no longer need to lock root qdisc lock.

    [1]
    http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

    Signed-off-by: Eric Dumazet
    Cc: Cong Wang
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: Kevin Athey
    Cc: Xiaotian Pei
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Signed-off-by: Jamal Hadi Salim
    Acked-by: Cong Wang

    Jamal Hadi Salim
     
  • Useful to know when the action was first used for accounting
    (and debugging)

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

27 Apr, 2016

1 commit


26 Feb, 2016

2 commits

  • Currently tc actions are stored in a per-module hashtable,
    therefore are visible to all network namespaces. This is
    probably the last part of the tc subsystem which is not
    aware of netns now. This patch makes them per-netns,
    several tc action API's need to be adjusted for this.

    The tc action API code is ugly due to historical reasons,
    we need to refactor that code in the future.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • We only release the memory of the hashtable itself, not its
    entries inside. This is not a problem yet since we only call
    it in module release path, and module is refcount'ed by
    actions. This would be a problem after we move the per module
    hinfo into per netns in the latter patch.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

27 Aug, 2015

1 commit


01 Aug, 2015

1 commit


31 Jul, 2015

1 commit

  • Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action
    outside"), we end up with a wrong reference count for a tc action.

    Test case 1:

    FOO="1,6 0 0 4294967295,"
    BAR="1,6 0 0 4294967294,"
    tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
    action bpf bytecode "$FOO"
    tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 1 bind 1
    tc actions replace action bpf bytecode "$BAR" index 1
    tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 1 ref 2 bind 1
    tc actions replace action bpf bytecode "$FOO" index 1
    tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 3 bind 1

    Test case 2:

    FOO="1,6 0 0 4294967295,"
    tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
    tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 1
    tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
    tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 2 bind 1
    tc actions add action drop index 1
    RTNETLINK answers: File exists [...]
    tc actions show action gact
    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 3 bind 1

    What happens is that in tcf_hash_check(), we check tcf_common for a given
    index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
    found an existing action. Now there are the following cases:

    1) We do a late binding of an action. In that case, we leave the
    tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
    handler. This is correctly handeled.

    2) We replace the given action, or we try to add one without replacing
    and find out that the action at a specific index already exists
    (thus, we go out with error in that case).

    In case of 2), we have to undo the reference count increase from
    tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
    do so because of the 'tcfc_bindcnt > 0' check which bails out early with
    an -EPERM error.

    Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an
    already classifier-bound action to drop the reference count (which could
    then become negative, wrap around etc), this restriction only accounts for
    invocations outside a specific action's ->init() handler.

    One possible solution would be to add a flag thus we possibly trigger
    the -EPERM ony in situations where it is indeed relevant.

    After the patch, above test cases have correct reference count again.

    Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

09 Jul, 2015

1 commit

  • Reuse existing percpu infrastructure John Fastabend added for qdisc.

    This patch adds a new cpustats parameter to tcf_hash_create() and all
    actions pass false, meaning this patch should have no effect yet.

    Signed-off-by: Eric Dumazet
    Cc: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 May, 2015

1 commit

  • Not used.

    pedit sets TC_MUNGED when packet content was altered, but all the core
    does is unset MUNGED again and then set OK2MUNGE.

    And the latter isn't tested anywhere. So lets remove both
    TC_MUNGED and TC_OK2MUNGE.

    Signed-off-by: Florian Westphal
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 Sep, 2014

3 commits

  • After previous patches to simplify qstats the qstats can be
    made per cpu with a packed union in Qdisc struct.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This removes the use of qstats->qlen variable from the classifiers
    and makes it an explicit argument to gnet_stats_copy_queue().

    The qlen represents the qdisc queue length and is packed into
    the qstats at the last moment before passnig to user space. By
    handling it explicitely we avoid, in the percpu stats case, having
    to figure out which per_cpu variable to put it in.

    It would probably be best to remove it from qstats completely
    but qstats is a user space ABI and can't be broken. A future
    patch could make an internal only qstats structure that would
    avoid having to allocate an additional u32 variable on the
    Qdisc struct. This would make the qstats struct 128bits instead
    of 128+32.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • In order to run qdisc's without locking statistics and estimators
    need to be handled correctly.

    To resolve bstats make the statistics per cpu. And because this is
    only needed for qdiscs that are running without locks which is not
    the case for most qdiscs in the near future only create percpu
    stats when qdiscs set the TCQ_F_CPUSTATS flag.

    Next because estimators use the bstats to calculate packets per
    second and bytes per second the estimator code paths are updated
    to use the per cpu statistics.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

25 Apr, 2014

1 commit

  • It is possible by passing a netlink socket to a more privileged
    executable and then to fool that executable into writing to the socket
    data that happens to be valid netlink message to do something that
    privileged executable did not intend to do.

    To keep this from happening replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
    Which act the same as the previous calls except they verify that the
    opener of the socket had the desired permissions as well.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 Feb, 2014

5 commits


22 Jan, 2014

2 commits


14 Jan, 2014

3 commits


07 Jan, 2014

2 commits