21 Oct, 2020

1 commit


06 Oct, 2020

1 commit


25 Sep, 2020

1 commit

  • All TC actions call tcf_idr_insert() for a new action at the end
    of their ->init(), so we can move the call to a central place,
    tcf_action_init_1().

    Once the action is inserted into the global IDR, a parallel
    process could free it immediately because its refcnt is still 1. We
    therefore cannot fail after the insertion, so the call has to come
    after the goto action validation, which avoids handling a failure
    case after insertion.

    This was found during code review and is not directly triggered by syzbot.
    It also prepares for the next patch.

    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

24 Aug, 2020

1 commit


21 Aug, 2020

1 commit


19 Aug, 2020

1 commit


04 Aug, 2020

1 commit

  • When openvswitch conntrack is offloaded with the act_ct action, fragmented
    packets are defragmented in the ingress tc act_ct action and then miss the
    next chain. The packet is passed to the openvswitch datapath without the
    mru, so the reassembled packet, now larger than the MTU, is dropped by the
    openvswitch output action:

    "kernel: net2: dropped over-mtu packet: 1528 > 1500"

    This patch adds the mru to tc_skb_ext for the defrag-then-miss case, and
    also adds the mru to qdisc_skb_cb. act_ct sets the mru in qdisc_skb_cb
    when the packet is defragmented. When the chain misses, the mru is set
    in tc_skb_ext, where the ovs datapath can read it.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: wenxu
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    wenxu
     

02 Aug, 2020

1 commit


01 Aug, 2020

1 commit


26 Jul, 2020

1 commit

  • The UDP reuseport conflict was a little bit tricky.

    The net-next code, via bpf-next, extracted the reuseport handling
    into a helper so that the BPF sk lookup code could invoke it.

    At the same time, the logic for reuseport handling of unconnected
    sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
    which changed the logic to carry on the reuseport result into the
    rest of the lookup loop if we do not return immediately.

    This requires moving the reuseport_has_conns() logic into the callers.

    While we are here, get rid of inline directives as they do not belong
    in foo.c files.

    The other changes were cases of more straightforward overlapping
    modifications.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Jul, 2020

1 commit

  • Defragmenting fragmented packets in tcf_ct_handle_fragments clears
    skb->cb, which also clears the qdisc_skb_cb. So the qdisc_skb_cb
    should be stored before defrag and restored afterwards.
    The patch also updates pkt_len once all fragments have been
    reassembled into one packet, so that the counters of the
    following actions are correct.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: wenxu
    Signed-off-by: David S. Miller

    wenxu
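
    The store-and-restore pattern described in this commit can be sketched in
    userspace C as follows. This is only an illustration under stated
    assumptions: struct pkt, struct sched_cb, fake_defrag() and
    handle_fragments() are hypothetical stand-ins, not the kernel's sk_buff,
    qdisc_skb_cb or tcf_ct_handle_fragments().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a scratch area plus a length. */
struct pkt {
	unsigned char cb[48];
	unsigned int len;
};

/* Hypothetical stand-in for the scheduler's view of the scratch area. */
struct sched_cb {
	unsigned int pkt_len;
	unsigned int classid;
};

/* Simulates a defrag step that reuses (and so wipes) the scratch area. */
static int fake_defrag(struct pkt *p, unsigned int reassembled_len)
{
	memset(p->cb, 0, sizeof(p->cb));
	p->len = reassembled_len;
	return 0;
}

static int handle_fragments(struct pkt *p, unsigned int reassembled_len)
{
	struct sched_cb saved;
	int err;

	assert(sizeof(saved) <= sizeof(p->cb));

	/* Store the scheduler state before defrag clobbers it. */
	memcpy(&saved, p->cb, sizeof(saved));

	err = fake_defrag(p, reassembled_len);
	if (err)
		return err;

	/* Restore it, then refresh pkt_len to the reassembled size. */
	saved.pkt_len = p->len;
	memcpy(p->cb, &saved, sizeof(saved));
	return 0;
}
```

    After handle_fragments() returns, the saved fields other than pkt_len are
    unchanged, while pkt_len reflects the reassembled packet, so later
    byte counters would see the full size.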
     

11 Jul, 2020

1 commit


08 Jul, 2020

2 commits

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     
  • When tcf_ct_act executes, tcf_lastuse_update should be called;
    otherwise the "used" stat is never updated:

    filter protocol ip pref 3 flower chain 0
    filter protocol ip pref 3 flower chain 0 handle 0x1
    eth_type ipv4
    dst_ip 1.1.1.1
    ip_flags frag/firstfrag
    skip_hw
    not_in_hw
    action order 1: ct zone 1 nat pipe
    index 1 ref 1 bind 1 installed 103 sec used 103 sec
    Action statistics:
    Sent 151500 bytes 101 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie 4519c04dc64a1a295787aab13b6a50fb

    Signed-off-by: wenxu
    Signed-off-by: David S. Miller

    wenxu
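
    For the fallthrough conversion in the first 08 Jul, 2020 commit, the
    following is a minimal userspace sketch of how such a pseudo-keyword
    can be used in a switch. The macro here is a local stand-in written for
    this example (the kernel carries its own definition), and stages_run()
    is a hypothetical function.

```c
#include <assert.h>

/* Local stand-in for a fallthrough pseudo-keyword: use the compiler
 * attribute when available, otherwise an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count how many stages run when starting at
 * a given stage; each case deliberately continues into the next. */
static int stages_run(int start)
{
	int n = 0;

	switch (start) {
	case 0:
		n++;
		fallthrough;
	case 1:
		n++;
		fallthrough;
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

    Unlike a comment, the statement form lets the compiler check that every
    intentional fall-through is marked, for example when building with
    -Wimplicit-fallthrough.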
     

04 Jul, 2020

1 commit

  • There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can be inconsistent based on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stop
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Toke Høiland-Jørgensen
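
    The idea of a helper that optionally skips all VLAN tags can be sketched
    in userspace C as below. This is a simplified illustration that walks a
    raw Ethernet frame; it is not the kernel's skb_protocol(), it ignores
    hardware-accelerated tags, and frame_protocol() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

static int is_vlan_ethertype(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the ethertype of a frame that starts at the destination MAC.
 * With skip_vlan set, every nested VLAN tag is skipped, so the result is
 * the same for single-tagged and QinQ frames. Returns 0 if truncated. */
static uint16_t frame_protocol(const uint8_t *frame, size_t len, int skip_vlan)
{
	size_t off = 12; /* two 6-byte MAC addresses precede the ethertype */
	uint16_t proto;

	assert(frame != NULL);
	if (len < off + 2)
		return 0;
	proto = read_be16(frame + off);
	if (!skip_vlan)
		return proto;

	while (is_vlan_ethertype(proto)) {
		off += 4; /* each tag: 2-byte TCI + 2-byte inner ethertype */
		if (len < off + 2)
			return 0;
		proto = read_be16(frame + off);
	}
	return proto;
}
```

    A caller that needs the IP header type (for ECN or diffserv handling)
    would pass skip_vlan = 1, while a caller that wants to match on the VLAN
    ethertype itself would pass 0.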
     

20 Jun, 2020

1 commit

  • This patch adds a dropped-frames counter to tc flower offloading.
    Reporting frames dropped in hardware is necessary for some actions.
    Actions such as police, and the upcoming stream gate action, can
    drop frames, and users need to see those drops. The stats update
    now reports how many packets hit the filter and how many of those
    packets were dropped.

    v2: Changes
    - Update commit comments as suggested by Jiri Pirko.

    Signed-off-by: Po Liu
    Reviewed-by: Simon Horman
    Reviewed-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Po Liu
     

16 Jun, 2020

1 commit

  • Currently, tcf_ct_flow_table_restore_skb is exported by the act_ct
    module, so modules using it have a hard dependency
    on act_ct and require it to be loaded all the time.

    This can lead to an unnecessary overhead on systems that do not
    use hardware connection tracking action (ct_metadata action) in
    the first place.

    To relax the hard-dependency between the modules, we unexport this
    function and make it a static inline one.

    Fixes: 30b0cf90c6dd ("net/sched: act_ct: Support restoring conntrack info on skbs")
    Signed-off-by: Alaa Hleihel
    Reviewed-by: Roi Dayan
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Alaa Hleihel
     

01 Jun, 2020

1 commit

  • xdp_umem.c had overlapping changes between the 64-bit math fix
    for the calculation of npgs and the removal of the zerocopy
    memory type which got rid of the chunk_size_nohdr member.

    The mlx5 Kconfig conflict is a case where we just take the
    net-next copy of the Kconfig entry dependency as it takes on
    the ESWITCH dependency by one level of indirection which is
    what the 'net' conflicting change is trying to ensure.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 May, 2020

1 commit


23 Apr, 2020

1 commit


26 Mar, 2020

1 commit


19 Mar, 2020

1 commit

  • Currently, on replace, the previous action instance params
    are swapped with newly allocated params. The old params are
    only freed (via kfree_rcu), without releasing the allocated
    ct zone template related to them.

    Call tcf_ct_params_free (via call_rcu) for the old params,
    so that the ct zone template is released as well.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Paul Blakey
    Signed-off-by: David S. Miller

    Paul Blakey
     

13 Mar, 2020

4 commits

  • Pass the zone's flow table instance on the flow action to the drivers.
    This allows drivers to register FT add/del/stats callbacks.

    Finally, enable hardware offload on the flow table instance.

    Signed-off-by: Paul Blakey
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • If a driver deleted an FT entry, an FT failed to offload, or the driver
    registered to the flow table after flows were already added, we still
    get packets in software.

    For those packets, while restoring the ct state from the flow table
    entry, refresh its hardware offload.

    Signed-off-by: Paul Blakey
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • Provide an API to restore the ct state pointer.

    This may be used by drivers to restore the ct state if they
    miss in tc chain after they already did the hardware connection
    tracking action (ct_metadata action).

    For example, consider the following rule on chain 0 that is in_hw,
    however chain 1 is not_in_hw:

    $ tc filter add dev ... chain 0 ... \
    flower ... action ct pipe action goto chain 1

    Packets of a flow offloaded by the driver (via nf flow table offload)
    hit this rule in hardware and are marked by the ct metadata action
    (mark, label, zone), which does the equivalent of the software ct action.
    When the packet then jumps to chain 1 in hardware, there is a miss.

    CT was already processed in hardware. Therefore, the driver's miss
    handling should restore the ct state on the skb, using the provided API,
    and continue the packet processing in chain 1.

    Signed-off-by: Paul Blakey
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • The NF flow table API associates a 5-tuple rule with an action list by
    calling the flow table type action() CB to fill the rule's actions.

    In action CB of act_ct, populate the ct offload entry actions with a new
    ct_metadata action. Initialize the ct_metadata with the ct mark, label and
    zone information. If ct nat was performed, then also append the relevant
    packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites).

    Drivers that offload the ft entries may match on the 5-tuple and perform
    the action list.

    Signed-off-by: Paul Blakey
    Reviewed-by: Jiri Pirko
    Reviewed-by: Edward Cree
    Signed-off-by: David S. Miller

    Paul Blakey
     

09 Mar, 2020

1 commit

  • Convert zones_lock spinlock to zones_mutex mutex,
    and struct (tcf_ct_flow_table)->ref to a refcount,
    so that control path can use regular GFP_KERNEL allocations
    from standard process context. This is more robust
    in case of memory pressure.

    The refcount is needed because tcf_ct_flow_table_put() can
    be called from RCU callback, thus in BH context.

    The issue was spotted by syzbot, as rhashtable_init()
    was called with a spinlock held, which is bad since GFP_KERNEL
    allocations can sleep.

    Note to developers : Please make sure your patches are tested
    with CONFIG_DEBUG_ATOMIC_SLEEP=y

    BUG: sleeping function called from invalid context at mm/slab.h:565
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9582, name: syz-executor610
    2 locks held by syz-executor610/9582:
    #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:72 [inline]
    #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x3f9/0xad0 net/core/rtnetlink.c:5437
    #1: ffffffff8a3961b8 (zones_lock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline]
    #1: ffffffff8a3961b8 (zones_lock){+...}, at: tcf_ct_flow_table_get+0xa3/0x1700 net/sched/act_ct.c:67
    Preemption disabled at:
    [] 0x0
    CPU: 0 PID: 9582 Comm: syz-executor610 Not tainted 5.6.0-rc3-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x188/0x20d lib/dump_stack.c:118
    ___might_sleep.cold+0x1f4/0x23d kernel/sched/core.c:6798
    slab_pre_alloc_hook mm/slab.h:565 [inline]
    slab_alloc_node mm/slab.c:3227 [inline]
    kmem_cache_alloc_node_trace+0x272/0x790 mm/slab.c:3593
    __do_kmalloc_node mm/slab.c:3615 [inline]
    __kmalloc_node+0x38/0x60 mm/slab.c:3623
    kmalloc_node include/linux/slab.h:578 [inline]
    kvmalloc_node+0x61/0xf0 mm/util.c:574
    kvmalloc include/linux/mm.h:645 [inline]
    kvzalloc include/linux/mm.h:653 [inline]
    bucket_table_alloc+0x8b/0x480 lib/rhashtable.c:175
    rhashtable_init+0x3d2/0x750 lib/rhashtable.c:1054
    nf_flow_table_init+0x16d/0x310 net/netfilter/nf_flow_table_core.c:498
    tcf_ct_flow_table_get+0xe33/0x1700 net/sched/act_ct.c:82
    tcf_ct_init+0xba4/0x18a6 net/sched/act_ct.c:1050
    tcf_action_init_1+0x697/0xa20 net/sched/act_api.c:945
    tcf_action_init+0x1e9/0x2f0 net/sched/act_api.c:1001
    tcf_action_add+0xdb/0x370 net/sched/act_api.c:1411
    tc_ctl_action+0x366/0x456 net/sched/act_api.c:1466
    rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5440
    netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478
    netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
    netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:672
    ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343
    ___sys_sendmsg+0x100/0x170 net/socket.c:2397
    __sys_sendmsg+0xec/0x1b0 net/socket.c:2430
    do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4403d9
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffd719af218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9
    RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000005 R09: 00000000004002c8
    R10: 0000000000000008 R11: 00000000000

    Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone")
    Signed-off-by: Eric Dumazet
    Cc: Paul Blakey
    Cc: Jiri Pirko
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Mar, 2020

2 commits

  • To make the filler functions more generic, use network
    relative skb pulling.

    Signed-off-by: Paul Blakey
    Acked-by: Marcelo Ricardo Leitner
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • When checking the protocol number, tcf_ct_flow_table_lookup() handles
    the flow as if it's always ipv4, while it can be ipv6.

    Instead, refactor the code to fetch the tcp header, if available,
    in the relevant family (ipv4/ipv6) filler function, and do the
    check on the returned tcp header.

    Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows")
    Signed-off-by: Paul Blakey
    Acked-by: Marcelo Ricardo Leitner
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     

04 Mar, 2020

3 commits

  • Offload nf conntrack processing by looking up the 5-tuple in the
    zone's flow table.

    The nf conntrack module will process the packets until a connection is
    in established state. Once in established state, the ct state pointer
    (nf_conn) will be restored on the skb from a successful ft lookup.

    Signed-off-by: Paul Blakey
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • Add a ft entry when connections enter an established state and delete
    the connections when they leave the established state.

    The flow table assumes ownership of the connection. In the following
    patch act_ct will look up the ct state from the FT. In future patches,
    drivers will register for callbacks for ft add/del events and will be
    able to use the information to offload the connections.

    Note that connection aging is managed by the FT.

    Signed-off-by: Paul Blakey
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     
  • Use the NF flow tables infrastructure for CT offload.

    Create a nf flow table per zone.

    Next patches will add FT entries to this table, and do
    the software offload.

    Signed-off-by: Paul Blakey
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paul Blakey
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
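
    For readers unfamiliar with the macro, the following is a minimal
    userspace sketch of what a sizeof_field()-style macro computes. The macro
    is redefined locally for the example, and struct example_hdr and
    label_size() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition for illustration: the size of one struct member,
 * obtained without needing an instance of the struct. The null pointer
 * is never dereferenced because sizeof does not evaluate its operand. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure used only for this example. */
struct example_hdr {
	unsigned char proto;
	unsigned short port;
	unsigned char label[16];
};

/* Size in bytes of the label member. */
static size_t label_size(void)
{
	return sizeof_field(struct example_hdr, label);
}
```

    A typical use is sizing a buffer or a copy to match one member, for
    example a memcpy() of exactly sizeof_field(struct example_hdr, label)
    bytes.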
     

05 Dec, 2019

1 commit

  • The act_ct TC module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The act_ct action doesn't have such capability.

    Like netfilter hook infrastructure, we should run through NAT twice to
    keep the symmetry.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Aaron Conole
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Aaron Conole
     

27 Nov, 2019

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Dynamic tick (nohz) updates, perhaps most notably changes to force
    the tick on when needed due to lengthy in-kernel execution on CPUs
    on which RCU is waiting.

    - Linux-kernel memory consistency model updates.

    - Replace rcu_swap_protected() with rcu_replace_pointer().

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
    security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
    bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
    fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
    drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
    drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
    x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
    rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
    rcu: Suppress levelspread uninitialized messages
    rcu: Fix uninitialized variable in nocb_gp_wait()
    rcu: Update descriptions for rcu_future_grace_period tracepoint
    rcu: Update descriptions for rcu_nocb_wake tracepoint
    rcu: Remove obsolete descriptions for rcu_barrier tracepoint
    rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
    workqueue: Convert for_each_wq to use built-in list check
    rcu: Several rcu_segcblist functions can be static
    rcu: Remove unused function hlist_bl_del_init_rcu()
    Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
    ...

    Linus Torvalds
     

22 Nov, 2019

1 commit

  • ct_policy and mpls_policy are parsed with nla_parse_nested(), which
    already does NL_VALIDATE_STRICT validation. Setting strict_start_type
    is therefore unnecessary, since its only purpose is to have some
    attributes parsed with NL_VALIDATE_STRICT.

    This patch removes it, and does the same for rtm_nh_policy, which
    is parsed by nlmsg_parse().

    Suggested-by: Jakub Kicinski
    Signed-off-by: Xin Long
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Xin Long
     

31 Oct, 2019

4 commits

  • Extend struct tc_action with new "tcfa_flags" field. Set the field in
    tcf_idr_create() function and provide new helper
    tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
    value. Update individual hardware-offloaded actions init() to pass their
    "flags" argument to new helper in order to skip percpu stats allocation
    when user requested it through flags.

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extend TCA_ACT space with nla_bitfield32 flags. Add
    TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
    tcf_action_init_1() and pass resulting value as additional argument to
    a_o->init().

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extract common code that increments cpu_qstats counters into standalone act
    API functions. Change hardware offloaded actions that use percpu counter
    allocation to use the new functions instead of accessing cpu_qstats
    directly.

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extract common code that increments cpu_bstats counter into standalone act
    API function. Change hardware offloaded actions that use percpu counter
    allocation to use the new function instead of incrementing cpu_bstats
    directly.

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov