27 Jan, 2020

1 commit

  • The current implementations of ops->bind_class() are merely
    searching for classid and updating class in the struct tcf_result,
    without invoking either of cl_ops->bind_tcf() or
    cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
    like cbq use them to count filters too. This is why syzbot triggered
    the warning in cbq_destroy_class().

    In order to fix this, we have to call cl_ops->bind_tcf() and
    cl_ops->unbind_tcf() like the filter binding path. This patch does
    so by refactoring out two helper functions __tcf_bind_filter()
    and __tcf_unbind_filter(), which are lockless and accept a Qdisc
    pointer, then teaching each implementation to call them correctly.

    Note, we merely pass the Qdisc pointer as an opaque pointer to
    each filter, they only need to pass it down to the helper
    functions without understanding it at all.

    Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
    Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

16 Jun, 2019

1 commit

  • This config option makes only couple of lines optional.
    Two small helpers and an int in couple of cls structs.

    Remove the config option and always compile this in.
    This saves the user from unexpected surprises when he adds
    a filter with ingress device match which is silently ignored
    in case the config option is not set.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

28 Feb, 2019

1 commit

  • This reverts commit 31a998487641 ("net: sched: fw: don't set arg->stop in
    fw_walk() when empty")

    Cls API function tcf_proto_is_empty() was changed in commit
    6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
    to no longer depend on arg->stop to determine that classifier instance is
    empty. Instead, it adds dedicated arg->nonempty field, which makes the fix
    in fw classifier no longer necessary.

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     

23 Feb, 2019

1 commit

  • For tcindex filter, it is too late to initialize the
    net pointer in tcf_exts_validate(), as tcf_exts_get_net()
    requires a non-NULL net pointer. We can just move its
    initialization into tcf_exts_init(), which just requires
    an additional parameter.

    This makes the code in tcindex_alloc_perfect_hash()
    prettier.

    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

18 Feb, 2019

1 commit

  • Some classifiers set arg->stop in their implementation of tp->walk() API
    when empty. Most of classifiers do not adhere to that convention. Do not
    set arg->stop in fw_walk() to unify tp->walk() behavior among classifier
    implementations.

    Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex")
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     

13 Feb, 2019

2 commits

  • Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
    functions to track rtnl lock status. Extend users of these function in cls
    API to propagate rtnl lock status to them. This allows classifiers to
    obtain rtnl lock when necessary and to pass rtnl lock status to extensions
    and driver offload callbacks.

    Add flags field to tcf proto ops. Add flag value to indicate that
    classifier doesn't require rtnl lock.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Actions API is already updated to not rely on rtnl lock for
    synchronization. However, it need to be provided with rtnl status when
    called from classifiers API in order to be able to correctly release the
    lock when loading kernel module.

    Extend extension validation function with 'rtnl_held' flag which is passed
    to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
    API. No classifier is currently updated to support unlocked execution, so
    pass hardcoded 'true' flag parameter value.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

25 May, 2018

1 commit

  • Commit 05f0fe6b74db ("RCU, workqueue: Implement rcu_work") introduces
    new API's for dispatching work in a RCU callback. Now we can just
    switch to the new API's for tc filters. This could get rid of a lot
    of code.

    Cc: Tejun Heo
    Cc: "Paul E. McKenney"
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Jan, 2018

1 commit


20 Jan, 2018

4 commits


10 Nov, 2017

1 commit


09 Nov, 2017

1 commit

  • Hold netns refcnt before call_rcu() and release it after
    the tcf_exts_destroy() is done.

    Note, on ->destroy() path we have to respect the return value
    of tcf_exts_get_net(), on other paths it should always return
    true, so we don't need to care.

    Cc: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

30 Oct, 2017

1 commit

  • Several conflicts here.

    NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
    nfp_fl_output() needed some adjustments because the code block is in
    an else block now.

    Parallel additions to net/pkt_cls.h and net/sch_generic.h

    A bug fix in __tcp_retransmit_skb() conflicted with some of
    the rbtree changes in net-next.

    The tc action RCU callback fixes in 'net' had some overlap with some
    of the recent tcf_block reworking.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Oct, 2017

1 commit

  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

17 Oct, 2017

1 commit


01 Sep, 2017

1 commit

  • TC filters when used as classifiers are bound to TC classes.
    However, there is a hidden difference when adding them in different
    orders:

    1. If we add tc classes before its filters, everything is fine.
    Logically, the classes exist before we specify their ID's in
    filters, it is easy to bind them together, just as in the current
    code base.

    2. If we add tc filters before the tc classes they bind, we have to
    do dynamic lookup in fast path. What's worse, this happens all
    the time not just once, because on fast path tcf_result is passed
    on stack, there is no way to propagate back to the one in tc filters.

    This hidden difference hurts performance silently if we have many tc
    classes in hierarchy.

    This patch intends to close this gap by doing the reverse binding when
    we create a new class, in this case we can actually search all the
    filters in its parent, match and fixup by classid. And because
    tcf_result is specific to each type of tc filter, we have to introduce
    a new ops for each filter to tell how to bind the class.

    Note, we still can NOT totally get rid of those class lookup in
    ->enqueue() because cgroup and flow filters have no way to determine
    the classid at setup time, they still have to go through dynamic lookup.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

08 Aug, 2017

1 commit

  • Now we use 'unsigned long fh' as a pointer in every place,
    it is safe to convert it to a void pointer now. This gets
    rid of many casts to pointer.

    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

05 Aug, 2017

3 commits


22 Apr, 2017

2 commits

  • There is no need to NULL tp->root in ->destroy(), since tp is
    going to be freed very soon, and existing readers are still
    safe to read them.

    For cls_route, we always init its tp->root, so it can't be NULL,
    we can drop more useless code.

    Cc: Daniel Borkmann
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • We could have a race condition where in ->classify() path we
    dereference tp->root and meanwhile a parallel ->destroy() makes it
    a NULL. Daniel cured this bug in commit d936377414fa
    ("net, sched: respect rcu grace period on cls destruction").

    This happens when ->destroy() is called for deleting a filter to
    check if we are the last one in tp, this tp is still linked and
    visible at that time. The root cause of this problem is the semantic
    of ->destroy(), it does two things (for non-force case):

    1) check if tp is empty
    2) if tp is empty we could really destroy it

    and its caller, if cares, needs to check its return value to see if it
    is really destroyed. Therefore we can't unlink tp unless we know it is
    empty.

    As suggested by Daniel, we could actually move the test logic to ->delete()
    so that we can safely unlink tp after ->delete() tells us the last one is
    just deleted and before ->destroy().

    Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
    Cc: Roi Dayan
    Cc: Daniel Borkmann
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    WANG Cong
     

14 Apr, 2017

1 commit


20 Sep, 2016

1 commit


23 Aug, 2016

1 commit

  • After commit 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array")
    we do dynamic allocation in tcf_exts_init(), therefore we need
    to handle the ENOMEM case properly.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

25 Sep, 2015

1 commit

  • fw filter uses tp->root==NULL to check if it is the old method,
    so it doesn't need allocation at all in this case. This patch
    reverts the offending commit and adds some comments for old
    method to make it obvious.

    Fixes: 33f8b9ecdb15 ("net_sched: move tp->root allocation into fw_init()")
    Reported-by: Akshat Kakkar
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

10 Mar, 2015

1 commit

  • Kernel automatically creates a tp for each
    (kind, protocol, priority) tuple, which has handle 0,
    when we add a new filter, but it still is left there
    after we remove our own, unless we don't specify the
    handle (literally means all the filters under
    the tuple). For example this one is left:

    # tc filter show dev eth0
    filter parent 8001: protocol arp pref 49152 basic

    The user-space is hard to clean up these for kernel
    because filters like u32 are organized in a complex way.
    So kernel is responsible to remove it after all filters
    are gone. Each type of filter has its own way to
    store the filters, so each type has to provide its
    way to check if all filters are gone.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Mar, 2015

1 commit


10 Dec, 2014

2 commits


07 Oct, 2014

1 commit

  • Using the tcf_proto pointer 'tp' from inside the classifiers callback
    is not valid because it may have been cleaned up by another call_rcu
    occuring on another CPU.

    'tp' is currently being used by tcf_unbind_filter() in this patch we
    move instances of tcf_unbind_filter outside of the call_rcu() context.
    This is safe to do because any running schedulers will either read the
    valid class field or it will be zeroed.

    And all schedulers today when the class is 0 do a lookup using the
    same call used by the tcf_exts_bind(). So even if we have a running
    classifier hit the null class pointer it will do a lookup and get
    to the same result. This is particularly fragile at the moment because
    the only way to verify this is to audit the schedulers call sites.

    Reported-by: Cong Wang
    Signed-off-by: John Fastabend
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    John Fastabend
     

29 Sep, 2014

1 commit


17 Sep, 2014

1 commit

  • When allocating a new structure we also need to call tcf_exts_init
    to initialize exts.

    A follow up patch might be in order to remove some of this code
    and do tcf_exts_assign(). With this we could remove the
    tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
    As part of the future tcf_actions RCU series this will need to be
    done. For now fix the call here.

    Fixes e35a8ee5993ba81fd6c0 ("net: sched: fw use RCU")
    Signed-off-by: John Fastabend
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    John Fastabend
     

14 Sep, 2014

1 commit