21 Jun, 2018

1 commit

  • [ Upstream commit af5d01842fe1fbfb9f5e1c1d957ba02ab6f4569a ]

    When application fails to pass flags in netlink TLV for a new skbedit action,
    the kernel results in the following oops:

    [ 8.307732] BUG: unable to handle kernel paging request at 0000000000021130
    [ 8.309167] PGD 80000000193d1067 P4D 80000000193d1067 PUD 180e0067 PMD 0
    [ 8.310595] Oops: 0000 [#1] SMP PTI
    [ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw
    [ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357
    [ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140
    [ 8.316203] RSP: 0018:ffffa0718038f840 EFLAGS: 00010246
    [ 8.317123] RAX: 0000000000000001 RBX: 0000000000021100 RCX: 0000000000000000
    [ 8.319831] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000021100
    [ 8.321181] RBP: 0000000000000000 R08: 000000000004adf8 R09: 0000000000000122
    [ 8.322645] R10: 0000000000000000 R11: ffffffff9e5b01ed R12: 0000000000000000
    [ 8.324157] R13: ffffffff9e0d3cc0 R14: 0000000000000000 R15: 0000000000000000
    [ 8.325590] FS: 00007f591292e700(0000) GS:ffff8fcf5bc40000(0000) knlGS:0000000000000000
    [ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8.327987] CR2: 0000000000021130 CR3: 00000000180e6004 CR4: 00000000001606a0
    [ 8.329289] Call Trace:
    [ 8.329735] tcf_skbedit_init+0xa7/0xb0
    [ 8.330423] tcf_action_init_1+0x362/0x410
    [ 8.331139] ? try_to_wake_up+0x44/0x430
    [ 8.331817] tcf_action_init+0x103/0x190
    [ 8.332511] tc_ctl_action+0x11a/0x220
    [ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0
    [ 8.333902] ? _cond_resched+0x16/0x40
    [ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0
    [ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0
    [ 8.336178] netlink_rcv_skb+0xdb/0x110
    [ 8.336855] netlink_unicast+0x167/0x220
    [ 8.337550] netlink_sendmsg+0x2a7/0x390
    [ 8.338258] sock_sendmsg+0x30/0x40
    [ 8.338865] ___sys_sendmsg+0x2c5/0x2e0
    [ 8.339531] ? pagecache_get_page+0x27/0x210
    [ 8.340271] ? filemap_fault+0xa2/0x630
    [ 8.340943] ? page_add_file_rmap+0x108/0x200
    [ 8.341732] ? alloc_set_pte+0x2aa/0x530
    [ 8.342573] ? finish_fault+0x4e/0x70
    [ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0
    [ 8.344337] ? __sys_sendmsg+0x53/0x80
    [ 8.345040] __sys_sendmsg+0x53/0x80
    [ 8.345678] do_syscall_64+0x4f/0x100
    [ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 8.347206] RIP: 0033:0x7f591191da67
    [ 8.347831] RSP: 002b:00007fff745abd48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 8.349179] RAX: ffffffffffffffda RBX: 00007fff745abe70 RCX: 00007f591191da67
    [ 8.350431] RDX: 0000000000000000 RSI: 00007fff745abdc0 RDI: 0000000000000003
    [ 8.351659] RBP: 000000005af35251 R08: 0000000000000001 R09: 0000000000000000
    [ 8.352922] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
    [ 8.354183] R13: 00007fff745afed0 R14: 0000000000000001 R15: 00000000006767c0
    [ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00
    00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 53 30
    74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c
    [ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP: ffffa0718038f840
    [ 8.359770] CR2: 0000000000021130
    [ 8.360438] ---[ end trace 60c66be45dfc14f0 ]---

    The caller calls action's ->init() and passes pointer to "struct tc_action *a",
    which later may be initialized to point at the existing action, otherwise
    "struct tc_action *a" is still invalid, and therefore dereferencing it is an
    error as happens in tcf_idr_release, where refcnt is decremented.

    So in case of missing flags tcf_idr_release must be called only for
    existing actions.

    v2:
    - prepare patch for net tree

    Fixes: 5e1567aeb7fe ("net sched: skbedit action fix late binding")
    Signed-off-by: Roman Mashak
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Roman Mashak
     

09 Nov, 2017

1 commit

  • This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
    If we hold that refcnt, the netns can never be destroyed until
    all actions are destroyed by user, this breaks our netns design
    which we expect all actions are destroyed when we destroy the
    whole netns.

    Cc: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

03 Nov, 2017

1 commit

  • TC actions have been destroyed asynchronously for a long time,
    previously in a RCU callback and now in a workqueue. If we
    don't hold a refcnt for its netns, we could use the per netns
    data structure, struct tcf_idrinfo, after it has been freed by
    netns workqueue.

    Hold refcnt to ensure netns destroy happens after all actions
    are gone.

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
    Reported-by: Lucas Bates
    Tested-by: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 Aug, 2017

1 commit

  • Typically, each TC filter has its own action. All the actions of the
    same type are saved in its hash table. But the hash buckets are too
    small that it degrades to a list. And the performance is greatly
    affected. For example, it takes about 0m11.914s to insert 64K rules.
    If we convert the hash table to IDR, it only takes about 0m1.500s.
    The improvement is huge.

    But please note that the test result is based on previous patch that
    cls_flower uses IDR.

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Chris Mi
     

14 Apr, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

28 Oct, 2016

1 commit

  • The user may want to use only some bits of the skb mark in
    his skbedit rules because the remaining part might be used by
    something else.

    Introduce the "mask" parameter to the skbedit actor in order
    to implement such functionality.

    When the mask is specified, only those bits selected by the
    latter are altered really changed by the actor, while the
    rest is left untouched.

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Antonio Quartulli
     

26 Jul, 2016

1 commit

  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

05 Jul, 2016

2 commits


16 Jun, 2016

1 commit


08 Jun, 2016

3 commits


16 May, 2016

1 commit

  • The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
    because we no longer have a per-netns conntrack hash.

    The ip_gre.c conflict as well as the iwlwifi ones were cases of
    overlapping changes.

    Conflicts:
    drivers/net/wireless/intel/iwlwifi/mvm/tx.c
    net/ipv4/ip_gre.c
    net/netfilter/nf_conntrack_core.c

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2016

1 commit

  • The process below was broken and is fixed with this patch.

    //add a skbedit action and give it an instance id of 1
    sudo tc actions add action skbedit mark 10 index 1
    //create a filter which binds to skbedit action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
    match ip dst 17.0.0.1/32 flowid 1:10 action skbedit index 1

    Message before fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel

    Signed-off-by: Jamal Hadi Salim
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

27 Apr, 2016

1 commit


26 Feb, 2016

1 commit

  • Currently tc actions are stored in a per-module hashtable,
    therefore are visible to all network namespaces. This is
    probably the last part of the tc subsystem which is not
    aware of netns now. This patch makes them per-netns,
    several tc action API's need to be adjusted for this.

    The tc action API code is ugly due to historical reasons,
    we need to refactor that code in the future.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

09 Jul, 2015

1 commit

  • Reuse existing percpu infrastructure John Fastabend added for qdisc.

    This patch adds a new cpustats parameter to tcf_hash_create() and all
    actions pass false, meaning this patch should have no effect yet.

    Signed-off-by: Eric Dumazet
    Cc: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Feb, 2014

3 commits


22 Jan, 2014

1 commit


20 Jan, 2014

1 commit


14 Jan, 2014

1 commit


07 Jan, 2014

1 commit


28 Dec, 2013

1 commit

  • This is a bug fix. The existing code tries to kill many
    birds with one stone: Handling binding of actions to
    filters, new actions and replacing of action
    attributes. A simple test case to illustrate:

    XXXX
    moja@fe1:~$ sudo tc actions add action drop index 12
    moja@fe1:~$ actions get action gact index 12
    action order 1: gact action drop
    random type none pass val 0
    index 12 ref 1 bind 0
    moja@fe1:~$ sudo tc actions replace action ok index 12
    moja@fe1:~$ actions get action gact index 12
    action order 1: gact action drop
    random type none pass val 0
    index 12 ref 2 bind 0
    XXXX

    The above shows the refcounf being wrongly incremented on replace.
    There are more complex scenarios with binding of actions to filters
    that i am leaving out that didnt work as well...

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

21 Dec, 2013

1 commit

  • This patch fixes:
    1) pass mask rather than size to tcf_hashinfo_init()
    2) the cleanup should be in reversed order in mirred_cleanup_module()

    Reported-by: Eric Dumazet
    Fixes: 369ba56787d7469c0afd ("net_sched: init struct tcf_hashinfo at register time")
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

19 Dec, 2013

1 commit


10 Dec, 2013

1 commit


07 Dec, 2013

1 commit

  • Several files refer to an old address for the Free Software Foundation
    in the file header comment. Resolve by replacing the address with
    the URL so that we do not have to keep
    updating the header comments anytime the address changes.

    CC: John Fastabend
    CC: Alex Duyck
    CC: Marcel Holtmann
    CC: Gustavo Padovan
    CC: Johan Hedberg
    CC: Jamal Hadi Salim
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

06 Dec, 2013

1 commit


15 Jan, 2013

1 commit

  • Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
    and struct net pointer is not provided in the call chain. His original
    patch made use of current->nsproxy->net_ns to find the network namespace,
    but this fails to work correctly for userspace code that makes use of
    netlink sockets in different network namespaces. Instead, pass the
    "struct net *" down along the call chain to where it is needed.

    This version removes the ifb changes as Eric has submitted that patch
    separately, but is otherwise identical to the previous version.

    Signed-off-by: Benjamin LaHaise
    Tested-by: Eric Dumazet
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

02 Apr, 2012

1 commit


06 Jul, 2011

1 commit


20 Jan, 2011

1 commit


11 Jan, 2011

1 commit

  • HTB takes into account skb is segmented in stats updates.
    Generalize this to all schedulers.

    They should use qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.

    Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
    stab is setup on qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Aug, 2010

1 commit

  • We leak at least 32bits of kernel memory to user land in tc dump,
    because we dont init all fields (capab ?) of the dumped structure.

    Use C99 initializers so that holes and non explicit fields are zeroed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Oct, 2009

1 commit


26 Nov, 2008

1 commit