12 Apr, 2018

1 commit

  • [ Upstream commit 734549eb550c0c720bc89e50501f1b1e98cdd841 ]

    Fixes a bug in the tcf_dump_walker function that can cause some actions
    to not be reported when dumping a large number of actions. This issue
    became more aggrevated when cookies feature was added. In particular
    this issue is manifest when large cookie values are assigned to the
    actions and when enough actions are created that the resulting table
    must be dumped in multiple batches.

    The number of actions returned in each batch is limited by the total
    number of actions and the memory buffer size. With small cookies
    the numeric limit is reached before the buffer size limit, which avoids
    the code path triggering this bug. When large cookies are used buffer
    fills before the numeric limit, and the erroneous code path is hit.

    For example after creating 32 csum actions with the cookie
    aaaabbbbccccdddd

    $ tc actions ls action csum
    total acts 26

    action order 0: csum (tcp) action continue
    index 1 ref 1 bind 0
    cookie aaaabbbbccccdddd

    .....

    action order 25: csum (tcp) action continue
    index 26 ref 1 bind 0
    cookie aaaabbbbccccdddd
    total acts 6

    action order 0: csum (tcp) action continue
    index 28 ref 1 bind 0
    cookie aaaabbbbccccdddd

    ......

    action order 5: csum (tcp) action continue
    index 32 ref 1 bind 0
    cookie aaaabbbbccccdddd

    Note that the action with index 27 is omitted from the report.

    Fixes: 4b3550ef530c ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")"
    Signed-off-by: Craig Dillabaugh
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Craig Dillabaugh
     

09 Nov, 2017

1 commit

  • This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
    If we hold that refcnt, the netns can never be destroyed until
    all actions are destroyed by user, this breaks our netns design
    which we expect all actions are destroyed when we destroy the
    whole netns.

    Cc: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

03 Nov, 2017

2 commits

  • TC actions have been destroyed asynchronously for a long time,
    previously in a RCU callback and now in a workqueue. If we
    don't hold a refcnt for its netns, we could use the per netns
    data structure, struct tcf_idrinfo, after it has been freed by
    netns workqueue.

    Hold refcnt to ensure netns destroy happens after all actions
    are gone.

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
    Reported-by: Lucas Bates
    Tested-by: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • I forgot to acquire RTNL in tc_action_net_exit()
    which leads that action ops->cleanup() is not always
    called with RTNL. This usually is not a big deal because
    this function is called after all netns refcnt are gone,
    but given RTNL protects more than just actions, add it
    for safety and consistency.

    Also add an assertion to catch other potential bugs.

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
    Reported-by: Lucas Bates
    Tested-by: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

14 Sep, 2017

1 commit

  • Recent commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu") removed
    freeing in call_rcu, which changed already existing hard-to-hit
    race condition into 100% hit:

    [ 598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
    [ 598.607782] IP: tcf_action_destroy+0xc0/0x140

    Or:

    [ 40.858924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
    [ 40.862840] IP: tcf_generic_walker+0x534/0x820

    Fix this by storing the ops and use them directly for module_put call.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

13 Sep, 2017

1 commit

  • gen estimator has been rewritten in commit 1c0d32fde5bd
    ("net_sched: gen_estimator: complete rewrite of rate estimators"),
    the caller is no longer needed to wait for a grace period.
    So this patch gets rid of it.

    This also completely closes a race condition between action free
    path and filter chain add/remove path for the following patch.
    Because otherwise the nested RCU callback can't be caught by
    rcu_barrier().

    Please see also the comments in code.

    Cc: Jiri Pirko
    Cc: Jamal Hadi Salim
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Sep, 2017

1 commit

  • The new TC IDR code uses GFP_KERNEL under spin lock. Which leads
    to:

    [ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416
    [ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc
    [ 582.636939] 2 locks held by tc/3379:
    [ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv_msg+0x92e/0x1400
    [ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [] tcf_idr_create+0x2f0/0x8e0
    [ 582.662217] Preemption disabled at:
    [ 582.662222] [] tcf_idr_create+0x2f0/0x8e0
    [ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287
    [ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
    [ 582.691937] Call Trace:
    ...
    [ 582.742460] kmem_cache_alloc+0x286/0x540
    [ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450
    [ 582.753209] idr_get_free_cmn+0x627/0xf80
    ...
    [ 582.815525] idr_alloc_cmn+0x1a8/0x270
    ...
    [ 582.833804] tcf_idr_create+0x31b/0x8e0
    ...

    Try to preallocate the memory with idr_prealloc(GFP_KERNEL)
    (as suggested by Eric Dumazet), and change the allocation
    flags under spin lock.

    Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

31 Aug, 2017

1 commit

  • Typically, each TC filter has its own action. All the actions of the
    same type are saved in its hash table. But the hash buckets are too
    small that it degrades to a list. And the performance is greatly
    affected. For example, it takes about 0m11.914s to insert 64K rules.
    If we convert the hash table to IDR, it only takes about 0m1.500s.
    The improvement is huge.

    But please note that the test result is based on previous patch that
    cls_flower uses IDR.

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Chris Mi
     

10 Aug, 2017

1 commit

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect, we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Aug, 2017

1 commit


31 Jul, 2017

3 commits

  • This patch adds support for filtering based on time since last used.
    When we are dumping a large number of actions it is useful to
    have the option of filtering based on when the action was last
    used to reduce the amount of data crossing to user space.

    With this patch the user space app sets the TCA_ROOT_TIME_DELTA
    attribute with the value in milliseconds with "time of interest
    since now". The kernel converts this to jiffies and does the
    filtering comparison matching entries that have seen activity
    since then and returns them to user space.
    Old kernels and old tc continue to work in legacy mode since
    they dont specify this attribute.

    Some example (we have 400 actions bound to 400 filters); at
    installation time. Using updated when tc setting the time of
    interest to 120 seconds earlier (we see 400 actions):
    prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
    400

    go get some coffee and wait for > 120 seconds and try again:

    prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
    0

    Lets see a filter bound to one of these actions:
    ....
    filter pref 10 u32
    filter pref 10 u32 fh 800: ht divisor 1
    filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1)
    match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
    random type none pass val 0
    index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    ....

    that coffee took long, no? It was good.

    Now lets ping -c 1 127.0.0.2, then run the actions again:
    prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
    1

    More details please:
    prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
    random type none pass val 0
    index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    And the filter?

    filter pref 10 u32
    filter pref 10 u32 fh 800: ht divisor 1
    filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2)
    match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
    random type none pass val 0
    index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    Signed-off-by: Jamal Hadi Salim
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • When you dump hundreds of thousands of actions, getting only 32 per
    dump batch even when the socket buffer and memory allocations allow
    is inefficient.

    With this change, the user will get as many as possibly fitting
    within the given constraints available to the kernel.

    The top level action TLV space is extended. An attribute
    TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
    is set by the user indicating the user is capable of processing
    these large dumps. Older user space which doesnt set this flag
    doesnt get the large (than 32) batches.
    The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
    actions are put in a single batch. As such user space app knows how long
    to iterate (independent of the type of action being dumped)
    instead of hardcoded maximum of 32 thus maintaining backward compat.

    Some results dumping 1.5M actions below:
    first an unpatched tc which doesnt understand these features...

    prompt$ time -p tc actions ls action gact | grep index | wc -l
    1500000
    real 1388.43
    user 2.07
    sys 1386.79

    Now lets see a patched tc which sets the correct flags when requesting
    a dump:

    prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
    1500000
    real 178.13
    user 2.02
    sys 176.96

    That is about 8x performance improvement for tc app which sets its
    receive buffer to about 32K.

    Signed-off-by: Jamal Hadi Salim
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • Bug fix for an issue which has been around for about a decade.
    We got away with it because the enumeration was larger than needed.

    Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
    Suggested-by: Jiri Pirko
    Reviewed-by: Simon Horman
    Signed-off-by: Jamal Hadi Salim
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

14 Jul, 2017

1 commit


26 May, 2017

1 commit

  • tcf_chain_get() always creates a new filter chain if not found
    in existing ones. This is totally unnecessary when we get or
    delete filters, new chain should be only created for new filters
    (or new actions).

    Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

18 May, 2017

2 commits

  • Introduce new type of termination action called "goto_chain". This allows
    user to specify a chain to be processed. This action type is
    then processed as a return value in tcf_classify loop in similar
    way as "reclassify" is, only it does not reset to the first filter
    in chain but rather reset to the first filter of the desired chain.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Tp pointer will be needed by the next patch in order to get the chain.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

03 May, 2017

1 commit

  • Jump is now the only one using value action opcode. This is going to
    change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.

    This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
    bit check, not a value check.

    Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

25 Apr, 2017

1 commit

  • per discussion at netconf/netdev:
    When we have an action that is capable of branching (example a policer),
    we can achieve a continuation of the action graph by programming a
    "continue" where we find an exact replica of the same filter rule with a lower
    priority and the remainder of the action graph. When you have 100s of thousands
    of filters which require such a feature it gets very inefficient to do two
    lookups.

    This patch completes a leftover feature of action codes. Its time has come.

    Example below where a user labels packets with a different skbmark on ingress
    of a port depending on whether they have/not exceeded the configured rate.
    This mark is then used to make further decisions on some egress port.

    #rate control, very low so we can easily see the effect
    sudo $TC actions add action police rate 1kbit burst 90k \
    conform-exceed pipe/jump 2 index 10
    # skbedit index 11 will be used if the user conforms
    sudo $TC actions add action skbedit mark 11 ok index 11
    # skbedit index 12 will be used if the user does not conform
    sudo $TC actions add action skbedit mark 12 ok index 12

    #lets bind the user ..
    sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
    match ip dst 127.0.0.8/32 flowid 1:10 \
    action police index 10 \
    action skbedit index 11 \
    action skbedit index 12

    #run a ping -f and see what happens..
    #
    jhs@foobar:~$ sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
    filter pref 8 u32
    filter pref 8 u32 fh 800: ht divisor 1
    filter pref 8 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2800 success 1005)
    match 7f000008/ffffffff at 16 (success 1005 )
    action order 1: police 0xa rate 1Kbit burst 23440b mtu 2Kb action pipe/jump 2 overhead 0b
    ref 2 bind 1 installed 207 sec used 122 sec
    Action statistics:
    Sent 84420 bytes 1005 pkt (dropped 0, overlimits 721 requeues 0)
    backlog 0b 0p requeues 0

    action order 2: skbedit mark 11 pass
    index 11 ref 2 bind 1 installed 204 sec used 122 sec
    Action statistics:
    Sent 60564 bytes 721 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    action order 3: skbedit mark 12 pass
    index 12 ref 2 bind 1 installed 201 sec used 122 sec
    Action statistics:
    Sent 23856 bytes 284 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    Not bad, about 28% non-conforming packets..

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

22 Apr, 2017

1 commit


21 Apr, 2017

1 commit

  • Policing filters do not use the TCA_ACT_* enum and the tb[]
    nlattr array in tcf_action_init_1() doesn't get filled for
    them so we should not try to look for a TCA_ACT_COOKIE
    attribute in the then uninitialized array.
    The error handling in cookie allocation then calls
    tcf_hash_release() leading to invalid memory access later
    on.
    Additionally, if cookie allocation fails after an already
    existing non-policing filter has successfully been changed,
    tcf_action_release() should not be called, also we would
    have to roll back the changes in the error handling, so
    instead we now allocate the cookie early and assign it on
    success at the end.

    CVE-2017-7979
    Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies")
    Signed-off-by: Wolfgang Bumiller
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Wolfgang Bumiller
     

18 Apr, 2017

1 commit

  • Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
    for doit functions that call it directly.

    This is the first step to using extended error reporting in rtnetlink.
    >From here individual subsystems can be updated to set netlink_ext_ack as
    needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Apr, 2017

1 commit


27 Feb, 2017

2 commits

  • nla_memdup_cookie was overwriting err value, declared at function
    scope and earlier initialized with result of ->init(). At success
    nla_memdup_cookie() returns 0, and thus module refcnt decremented,
    although the action was installed.

    $ sudo tc actions add action pass index 1 cookie 1234
    $ sudo tc actions ls action gact

    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 0
    $
    $ lsmod
    Module Size Used by
    act_gact 16384 0
    ...
    $
    $ sudo rmmod act_gact
    [ 52.310283] ------------[ cut here ]------------
    [ 52.312551] WARNING: CPU: 1 PID: 455 at kernel/module.c:1113
    module_put+0x99/0xa0
    [ 52.316278] Modules linked in: act_gact(-) crct10dif_pclmul crc32_pclmul
    ghash_clmulni_intel psmouse pcbc evbug aesni_intel aes_x86_64 crypto_simd
    serio_raw glue_helper pcspkr cryptd
    [ 52.322285] CPU: 1 PID: 455 Comm: rmmod Not tainted 4.10.0+ #11
    [ 52.324261] Call Trace:
    [ 52.325132] dump_stack+0x63/0x87
    [ 52.326236] __warn+0xd1/0xf0
    [ 52.326260] warn_slowpath_null+0x1d/0x20
    [ 52.326260] module_put+0x99/0xa0
    [ 52.326260] tcf_hashinfo_destroy+0x7f/0x90
    [ 52.326260] gact_exit_net+0x27/0x40 [act_gact]
    [ 52.326260] ops_exit_list.isra.6+0x38/0x60
    [ 52.326260] unregister_pernet_operations+0x90/0xe0
    [ 52.326260] unregister_pernet_subsys+0x21/0x30
    [ 52.326260] tcf_unregister_action+0x68/0xa0
    [ 52.326260] gact_cleanup_module+0x17/0xa0f [act_gact]
    [ 52.326260] SyS_delete_module+0x1ba/0x220
    [ 52.326260] entry_SYSCALL_64_fastpath+0x1e/0xad
    [ 52.326260] RIP: 0033:0x7f527ffae367
    [ 52.326260] RSP: 002b:00007ffeb402a598 EFLAGS: 00000202 ORIG_RAX:
    00000000000000b0
    [ 52.326260] RAX: ffffffffffffffda RBX: 0000559b069912a0 RCX: 00007f527ffae367
    [ 52.326260] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b06991308
    [ 52.326260] RBP: 0000000000000003 R08: 00007f5280264420 R09: 00007ffeb4029511
    [ 52.326260] R10: 000000000000087b R11: 0000000000000202 R12: 00007ffeb4029580
    [ 52.326260] R13: 0000000000000000 R14: 0000000000000000 R15: 0000559b069912a0
    [ 52.354856] ---[ end trace 90d89401542b0db6 ]---
    $

    With the fix:

    $ sudo modprobe act_gact
    $ lsmod
    Module Size Used by
    act_gact 16384 0
    ...
    $ sudo tc actions add action pass index 1 cookie 1234
    $ sudo tc actions ls action gact

    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 0
    $
    $ lsmod
    Module Size Used by
    act_gact 16384 1
    ...
    $ sudo rmmod act_gact
    rmmod: ERROR: Module act_gact is in use
    $
    $ sudo /home/mrv/bin/tc actions del action gact index 1
    $ sudo rmmod act_gact
    $ lsmod
    Module Size Used by
    $

    Fixes: 1045ba77a ("net sched actions: Add support for user cookies")
    Signed-off-by: Roman Mashak
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Roman Mashak
     
  • When tc actions are loaded as a module and no actions have been installed,
    flushing them would result in actions removed from the memory, but modules
    reference count not being decremented, so that the modules would not be
    unloaded.

    Following is example with GACT action:

    % sudo modprobe act_gact
    % lsmod
    Module Size Used by
    act_gact 16384 0
    %
    % sudo tc actions ls action gact
    %
    % sudo tc actions flush action gact
    % lsmod
    Module Size Used by
    act_gact 16384 1
    % sudo tc actions flush action gact
    % lsmod
    Module Size Used by
    act_gact 16384 2
    % sudo rmmod act_gact
    rmmod: ERROR: Module act_gact is in use
    ....

    After the fix:
    % lsmod
    Module Size Used by
    act_gact 16384 0
    %
    % sudo tc actions add action pass index 1
    % sudo tc actions add action pass index 2
    % sudo tc actions add action pass index 3
    % lsmod
    Module Size Used by
    act_gact 16384 3
    %
    % sudo tc actions flush action gact
    % lsmod
    Module Size Used by
    act_gact 16384 0
    %
    % sudo tc actions flush action gact
    % lsmod
    Module Size Used by
    act_gact 16384 0
    % sudo rmmod act_gact
    % lsmod
    Module Size Used by
    %

    Fixes: f97017cdefef ("net-sched: Fix actions flushing")
    Signed-off-by: Roman Mashak
    Signed-off-by: Jamal Hadi Salim
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Roman Mashak
     

15 Feb, 2017

1 commit


26 Jan, 2017

1 commit

  • Introduce optional 128-bit action cookie.
    Like all other cookie schemes in the networking world (eg in protocols
    like http or existing kernel fib protocol field, etc) the idea is to save
    user state that when retrieved serves as a correlator. The kernel
    _should not_ intepret it. The user can store whatever they wish in the
    128 bits.

    Sample exercise(showing variable length use of cookie)

    .. create an accept action with cookie a1b2c3d4
    sudo $TC actions add action ok index 1 cookie a1b2c3d4

    .. dump all gact actions..
    sudo $TC -s actions ls action gact

    action order 0: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

    .. bind the accept action to a filter..
    sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

    ... send some traffic..
    $ ping 127.0.0.1 -c 3
    PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
    64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
    64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

18 Jan, 2017

1 commit


17 Jan, 2017

1 commit

  • Demonstrating the issue:

    .. add a drop action
    $sudo $TC actions add action drop index 10

    .. retrieve it
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 29 sec used 29 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... bug 1 above: reference is two.
    Reference is actually 1 but we forget to subtract 1.

    ... do a GET again and we see the same issue
    try a few times and nothing changes
    ~$ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 0 installed 31 sec used 31 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    ... lets try to bind the action to a filter..
    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    ... and now a few GETs:
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 3 bind 1 installed 204 sec used 204 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 4 bind 1 installed 206 sec used 206 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 5 bind 1 installed 235 sec used 235 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    .... as can be observed the reference count keeps going up.

    After the fix

    $ sudo $TC actions add action drop index 10
    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 4 sec used 4 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 1 bind 0 installed 6 sec used 6 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC qdisc add dev lo ingress
    $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
    u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 32 sec used 32 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    $ sudo $TC -s actions get action gact index 10

    action order 1: gact action drop
    random type none pass val 0
    index 10 ref 2 bind 1 installed 33 sec used 33 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

09 Jan, 2017

1 commit

  • Packets sent by the IFB device skip subsequent tc classification.
    A single bit governs this state. Move it out of tc_verd in
    anticipation of removing that __u16 completely.

    The new bitfield tc_skip_classify temporarily uses one bit of a
    hole, until tc_verd is removed completely in a follow-up patch.

    Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
    With that many options, little value in documenting it.

    Introduce a helper function to deduplicate the logic in the two
    sites that check this bit.

    The field tc_skip_classify is set only in IFB on skbs cloned in
    act_mirred, so original packet sources do not have to clear the
    bit when reusing packets (notably, pktgen and octeon).

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

06 Dec, 2016

1 commit

  • 1) Old code was hard to maintain, due to complex lock chains.
    (We probably will be able to remove some kfree_rcu() in callers)

    2) Using a single timer to update all estimators does not scale.

    3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
    is not supposed to work well)

    In this rewrite :

    - I removed the RB tree that had to be scanned in
    gen_estimator_active(). qdisc dumps should be much faster.

    - Each estimator has its own timer.

    - Estimations are maintained in net_rate_estimator structure,
    instead of dirtying the qdisc. Minor, but part of the simplification.

    - Reading the estimator uses RCU and a seqcount to provide proper
    support for 32bit kernels.

    - We reduce memory need when estimators are not used, since
    we store a pointer, instead of the bytes/packets counters.

    - xt_rateest_mt() no longer has to grab a spinlock.
    (In the future, xt_rateest_tg() could be switched to per cpu counters)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Oct, 2016

1 commit


13 Oct, 2016

1 commit

  • Krister reported a kernel NULL pointer dereference after
    tcf_action_init_1() invokes a_o->init(), it is a race condition
    where one thread calling tcf_register_action() to initialize
    the netns data after putting act ops in the global list and
    the other thread searching the list and then calling
    a_o->init(net, ...).

    Fix this by moving the pernet ops registration before making
    the action ops visible. This is fine because: a) we don't
    rely on act_base in pernet ops->init(), b) in the worst case we
    have a fully initialized netns but ops is still not ready so
    new actions still can't be created.

    Reported-by: Krister Johansen
    Tested-by: Krister Johansen
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

21 Sep, 2016

1 commit

  • With the batch changes that translated transient actions into
    a temporary list lost in the translation was the fact that
    tcf_action_destroy() will eventually delete the action from
    the permanent location if the refcount is zero.

    Example of what broke:
    ...add a gact action to drop
    sudo $TC actions add action drop index 10
    ...now retrieve it, looks good
    sudo $TC actions get action gact index 10
    ...retrieve it again and find it is gone!
    sudo $TC actions get action gact index 10

    Fixes: 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
    Fixes: 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
    Fixes: f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")

    Acked-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

20 Sep, 2016

1 commit


18 Aug, 2016

3 commits

  • As pointed out by Jamal, an action could be shared by
    multiple filters, so we can't use list to chain them
    any more after we get rid of the original tc_action.
    Instead, we could just save pointers to these actions
    in tcf_exts, since they are refcount'ed, so convert
    the list to an array of pointers.

    The "ugly" part is the action API still accepts list
    as a parameter, I just introduce a helper function to
    convert the array of pointers to a list, instead of
    relying on the C99 feature to iterate the array.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • This list_del() for tc action is not needed actually,
    because we only use this list to chain bulk operations,
    therefore should not be carried for latter operations.

    Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • After refactoring tc_action into tcf_common, we no
    longer need to cleanup temporary "actions" in list,
    they are permanently stored in the hashtable.

    Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
    Reported-by: Jamal Hadi Salim
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

26 Jul, 2016

2 commits

  • After the previous patch, struct tc_action should be enough
    to represent the generic tc action, tcf_common is not necessary
    any more. This patch gets rid of it to make tc action code
    more readable.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong