15 Jan, 2020

1 commit


25 Dec, 2019

1 commit

  • The existing PUSH MPLS action inserts the MPLS header between the
    Ethernet header and the IP header. Though this behaviour is fine for
    L3 VPN, where an IP packet is encapsulated inside an MPLS tunnel, it
    does not meet the L2 VPN (L2 tunnelling) requirements. In L2 VPN the
    MPLS header should encapsulate the Ethernet packet.

    The new MPLS action ADD_MPLS inserts the MPLS header at the start of
    the packet or at the start of the L3 header, depending on the value of
    the l3 tunnel flag in the ADD_MPLS arguments.

    POP_MPLS action is extended to support ethertype 0x6558.

    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martin Varghese
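
The two insertion points described in this commit can be illustrated with a small byte-level model. This is only a sketch under stated assumptions: `mpls_lse` and `add_mpls` are hypothetical helper names, the frame is a plain untagged Ethernet frame, and nothing here mirrors the kernel's actual skb handling.

```python
import struct

ETH_HLEN = 14            # destination MAC + source MAC + EtherType
ETH_P_MPLS_UC = 0x8847   # EtherType for MPLS unicast


def mpls_lse(label: int, tc: int = 0, bos: int = 1, ttl: int = 64) -> bytes:
    """Pack one 32-bit label stack entry: label(20) | tc(3) | bos(1) | ttl(8)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (bos << 8) | ttl)


def add_mpls(frame: bytes, lse: bytes, l3_tunnel: bool) -> bytes:
    """Toy model of the two insertion points (hypothetical helper).

    l3_tunnel=True : the label goes between the Ethernet header and the
                     L3 payload, and the EtherType is rewritten to MPLS.
    l3_tunnel=False: the label goes in front of the whole Ethernet frame.
    """
    if l3_tunnel:
        eth = frame[:12] + struct.pack("!H", ETH_P_MPLS_UC)
        return eth + lse + frame[ETH_HLEN:]
    return lse + frame
```

In the L2 case the result starts with the label, so the original Ethernet frame is carried intact as the MPLS payload.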
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read -r file;
    do
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' "$file";
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
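
The whole-word substitution performed by the sed expression above can be tried out in isolation. The following is a minimal sketch, assuming Python's `re` word-boundary semantics are close enough to sed's `\b` for C identifiers; `rename_macro` is a hypothetical name used only for illustration.

```python
import re

# Same pattern as the sed expression: match the identifier as a whole word.
_FIELD_SIZEOF = re.compile(r"\bFIELD_SIZEOF\b")


def rename_macro(source: str) -> str:
    """Replace whole-word FIELD_SIZEOF with sizeof_field (illustrative helper)."""
    return _FIELD_SIZEOF.sub("sizeof_field", source)
```

Because `_` counts as a word character, identifiers such as `MY_FIELD_SIZEOF` are left untouched.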
     

05 Dec, 2019

2 commits

  • skb_mpls_push() did not update the ethertype of an Ethernet packet if
    the packet was originally received from a non-ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to
    port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
    not update the ethertype of the packet even though the previous
    push_eth action had added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4

    Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller

    Martin Varghese
     
  • The openvswitch module shares a common conntrack and NAT infrastructure
    exposed via netfilter. It's possible that a packet needs both SNAT and
    DNAT manipulation, due to e.g. tuple collision. Netfilter can support
    this because it runs through the NAT table twice - once on ingress and
    again after egress. The openvswitch module doesn't have such capability.

    Like netfilter hook infrastructure, we should run through NAT twice to
    keep the symmetry.

    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller

    Aaron Conole
     

03 Dec, 2019

1 commit

  • skb_mpls_pop() did not update the ethertype of an Ethernet packet if the
    packet was originally received from a non-ARPHRD_ETHER device.

    In the below OVS data path flow, since the device corresponding to port 7
    is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
    the ethertype of the packet even though the previous push_eth action had
    added an ethernet header to the packet.

    recirc_id(0),in_port(7),eth_type(0x8847),
    mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
    actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
    pop_mpls(eth_type=0x800),4

    Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
    Signed-off-by: Martin Varghese
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martin Varghese
     

02 Dec, 2019

2 commits

  • If we can't build the flow del notification, we can simply delete
    the flow, no need to crash the kernel. Still keep a WARN_ON to
    preserve debuggability.

    Note: the BUG_ON() predates the Fixes tag, but this change
    can be applied only after the mentioned commit.

    v1 -> v2:
    - do not leak an skb on error

    Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • All the callers of ovs_flow_cmd_build_info() already deal with
    the error return code correctly, so we can handle the error condition
    in a more graceful way. Still dump a warning to preserve
    debuggability.

    v1 -> v2:
    - clarify the commit message
    - clean the skb and report the error (DaveM)

    Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

27 Nov, 2019

1 commit

  • When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
    flow has no UFID, we can exceed the computed size, as
    ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
    attribute.
    Take the above into account when computing the flow command message
    size.

    Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.")
    Reported-by: Qi Jun Ding
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

16 Nov, 2019

1 commit

  • nla_put_u16()/nla_put_u32() already make sure that
    attrlen is aligned. The call tree is:

    nla_put_u16/nla_put_u32
    -> nla_put attrlen = sizeof(u16) or sizeof(u32)
    -> __nla_put attrlen
    -> __nla_reserve attrlen
    -> skb_put(skb, nla_total_size(attrlen))

    nla_total_size returns the total length of attribute
    including padding.

    Cc: Joe Stringer
    Cc: William Tu
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
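
The padding argument in this commit can be checked with a little arithmetic. The sketch below re-implements the alignment rule in Python, assuming the usual netlink constants (4-byte attribute header, 4-byte alignment); the function names echo the kernel helpers, but this is an illustration and not the kernel code.

```python
import struct

NLA_ALIGNTO = 4
NLA_HDRLEN = 4  # struct nlattr: u16 nla_len + u16 nla_type


def nla_align(length: int) -> int:
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload_len: int) -> int:
    """Header plus payload, including trailing padding."""
    return nla_align(NLA_HDRLEN + payload_len)


def nla_put(buf: bytearray, attr_type: int, payload: bytes) -> None:
    """Append one attribute; nla_len itself excludes the padding."""
    nla_len = NLA_HDRLEN + len(payload)
    buf.extend(struct.pack("=HH", nla_len, attr_type))
    buf.extend(payload)
    buf.extend(bytes(nla_total_size(len(payload)) - nla_len))
```

A u16 payload and a u32 payload both occupy 8 bytes in the message, so the buffer stays aligned after either put.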
     

15 Nov, 2019

1 commit

  • When using the kernel datapath, the upcall does not
    include the skb hash information. This causes
    problems, because the skb hash is important
    in the kernel stack. For example, the VXLAN module uses
    it to select the UDP source port. The tx queue selection
    may also use the hash.

    The hash is computed in different ways. It is random
    for a TCP socket, and it may be computed in hardware
    or in the software stack. Recalculating the hash is not easy.

    The hash of a TCP socket is computed as follows:
    tcp_v4_connect
    -> sk_set_txhash (is random)

    __tcp_transmit_skb
    -> skb_set_hash_from_sk

    There will be one upcall to ovs-vswitchd, without the skb
    hash information, for the first packet of a TCP
    session. The remaining packets are processed in the Open vSwitch
    module with the hash kept. If this TCP session is forwarded to
    the VXLAN module, the UDP source port of the first TCP packet
    differs from that of the remaining packets.

    TCP packets may come to Open vSwitch from the host or from containers.
    To fix this, we store the hash info in the upcall and restore the hash
    when the packet is sent back.

    +---------------+          +-------------------------+
    |   Docker/VMs  |          |     ovs-vswitchd        |
    +----+----------+          +-+--------------------+--+
         |                       ^                    |
         |                       |                    |
         |                       |  upcall            v restore packet hash (not recalculate)
         |                     +-+--------------------+--+
         |  tap netdev         |                         |   vxlan module
         +--------------->     +-->  Open vSwitch ko  +-->
           or internal type    |                         |
                               +-------------------------+

    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
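
Why a lost hash changes the outer UDP source port can be seen in a toy model. The sketch below assumes a simplified hash-to-port scaling and an illustrative port range; `udp_src_port` and `upcall` are hypothetical names, and a missing hash is modelled as 0 rather than as a recomputed flow hash.

```python
PORT_MIN, PORT_MAX = 32768, 61000  # assumed port range for this sketch


def udp_src_port(skb_hash: int) -> int:
    """Scale a 32-bit hash into [PORT_MIN, PORT_MAX)."""
    return ((skb_hash * (PORT_MAX - PORT_MIN)) >> 32) + PORT_MIN


def upcall(packet: dict, keep_hash: bool) -> dict:
    """Round-trip a packet through a pretend userspace handler."""
    msg = {"payload": packet["payload"]}
    if keep_hash:
        msg["hash"] = packet["hash"]  # carry the hash along with the upcall
    # Userspace decides what to do, then sends the packet back.
    return {"payload": msg["payload"], "hash": msg.get("hash", 0)}
```

With `keep_hash=True` the first packet of a session maps to the same source port as the packets that never left the kernel; with `keep_hash=False` it does not in this model.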
     

07 Nov, 2019

1 commit

  • The commit 69c51582ff786 ("dpif-netlink: don't allocate per
    thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
    changed the number of allocated sockets to just one per port
    by moving the socket array from a per handler structure to
    a per datapath one. In the kernel datapath, a vport will therefore
    have only one socket in most cases; if so, select it directly in
    the fast path.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
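
The fast-path shortcut described here amounts to skipping the hash-based choice when there is nothing to choose between. A minimal sketch, assuming a plain modulo as a stand-in for whatever distribution the kernel uses; `find_upcall_portid` is a hypothetical name.

```python
from typing import List


def find_upcall_portid(port_ids: List[int], skb_hash: int) -> int:
    """Pick the Netlink port ID that should receive an upcall."""
    if len(port_ids) == 1:
        # Common case after the userspace change: one socket per vport,
        # so return it without looking at the packet hash at all.
        return port_ids[0]
    # Otherwise spread upcalls across sockets (modulo is an assumption).
    return port_ids[skb_hash % len(port_ids)]
```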
     

06 Nov, 2019

1 commit


04 Nov, 2019

10 commits

  • Use the specified functions to initialize resources.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Unlocking a mutex that is not locked is not allowed.
    Another kernel thread may be in the critical section when
    we unlock the mutex after setting user_feature fails.

    Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
    Cc: Paul Blakey
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When we destroy the flow tables, they may still contain flow masks,
    so release the flow mask structs as well.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • In most cases *index < ma->max and the flow-mask is not NULL.
    Add likely()/unlikely() annotations for performance.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Simplify the code and remove the unnecessary BUILD_BUG_ON.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • A full lookup on the flow table traverses the whole mask array.
    If the mask array is large and the number of invalid (deleted)
    flow-masks grows, performance drops.

    One bad case, for example: M means the flow-mask is valid and NULL
    means the flow-mask has been deleted.

    +-------------------------------------------+
    | M | NULL | ...                  | NULL | M|
    +-------------------------------------------+

    In that case, without this patch, openvswitch traverses the whole
    mask array, because there is one flow-mask at the tail. This
    patch changes the way flow-masks are inserted and deleted so that the
    mask array is kept as shown below, without NULL holes. In the
    fast path, we can "break" out of the "for" loop (instead of "continue")
    in flow_lookup when we get a NULL flow-mask.

             "break"
                v
    +-------------------------------------------+
    | M | M |  NULL |...           | NULL | NULL|
    +-------------------------------------------+

    This patch does not optimize the slow or control path, which still use
    ma->max to traverse. Slow path:
    * tbl_mask_array_realloc
    * ovs_flow_tbl_lookup_exact
    * flow_mask_find

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the code to upstream Linux with minor changes.

    Pravin B Shelar, says:
    | In case hash collision on mask cache, OVS does extra flow
    | lookup. Following patch avoid it.

    Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • When creating and inserting flow-mask, if there is no available
    flow-mask, we realloc the mask array. When removing flow-mask,
    if necessary, we shrink mask array.

    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Port the code to upstream Linux with minor changes.

    Pravin B Shelar, says:
    | mask caches index of mask in mask_list. On packet recv OVS
    | need to traverse mask-list to get cached mask. Therefore array
    | is better for retrieving cached mask. This also allows better
    | cache replacement algorithm by directly checking mask's existence.

    Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • The idea of this optimization comes from a patch by
    Pravin B Shelar that was committed in the openvswitch
    community in 2014. In order to get high performance, I
    implement it again here. Later patches will use it.

    Pravin B Shelar, says:
    | On every packet OVS needs to lookup flow-table with every
    | mask until it finds a match. The packet flow-key is first
    | masked with mask in the list and then the masked key is
    | looked up in flow-table. Therefore number of masks can
    | affect packet processing performance.

    Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
    Signed-off-by: Tonghao Zhang
    Tested-by: Greg Rose
    Acked-by: William Tu
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
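
Several commits in this series revolve around the mask array: every lookup masks the packet key with each mask in turn, the array grows when it is full, and deletions keep it free of NULL holes so the fast path can stop at the first empty slot. The sketch below models those three ideas in one place. It is a simplified illustration: `FlowTable`, its fields, and the swap-with-last deletion are assumptions made for the example, not the kernel's data structures.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, ...]
Mask = Tuple[int, ...]


def apply_mask(key: Key, mask: Mask) -> Key:
    """Mask each field of the key."""
    return tuple(k & m for k, m in zip(key, mask))


class FlowTable:
    def __init__(self, size: int = 4) -> None:
        self.masks: List[Optional[Mask]] = [None] * size
        self.count = 0
        self.flows: Dict[Tuple[Mask, Key], str] = {}

    def add_flow(self, mask: Mask, key: Key, action: str) -> None:
        if mask not in self.masks:
            if self.count == len(self.masks):
                # Array is full: grow it (doubling is an assumption).
                self.masks.extend([None] * len(self.masks))
            self.masks[self.count] = mask  # always fill the first free slot
            self.count += 1
        self.flows[(mask, apply_mask(key, mask))] = action

    def del_mask(self, mask: Mask) -> None:
        i = self.masks.index(mask)
        self.count -= 1
        self.masks[i] = self.masks[self.count]  # move the last mask into the hole
        self.masks[self.count] = None
        self.flows = {k: v for k, v in self.flows.items() if k[0] != mask}

    def lookup(self, key: Key) -> Optional[str]:
        for mask in self.masks:
            if mask is None:
                break  # no holes, so no valid mask can follow
            action = self.flows.get((mask, apply_mask(key, mask)))
            if action is not None:
                return action
        return None
```

The cost of `lookup` grows with the number of valid masks it has to try, which is why the series also adds a mask cache in front of this loop.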
     

03 Nov, 2019

1 commit


26 Oct, 2019

1 commit

  • In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
    rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
    but there are a few paths calling rtnl_net_notifyid() from atomic
    context or from RCU critical sections. The latter also precludes the use
    of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
    call is wrong too, as it uses GFP_KERNEL unconditionally.

    Therefore, we need to pass the GFP flags as parameter and propagate it
    through function calls until the proper flags can be determined.

    In most cases, GFP_KERNEL is fine. The exceptions are:
    * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
    indirectly call rtnl_net_notifyid() from RCU critical section,

    * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
    parameter.

    Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
    by nlmsg_new(). The function is allowed to sleep, so better make the
    flags consistent with the ones used in the following
    ovs_vport_cmd_fill_info() call.

    Found by code inspection.

    Fixes: 9a9634545c70 ("netns: notify netns id events")
    Signed-off-by: Guillaume Nault
    Acked-by: Nicolas Dichtel
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Guillaume Nault
     

23 Oct, 2019

1 commit

  • syzbot found the following crash on:

    HEAD commit: 1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
    git tree: upstream
    console output: https://syzkaller.appspot.com/x/log.txt?x=148d3d1a600000
    kernel config: https://syzkaller.appspot.com/x/.config?x=30cef20daf3e9977
    dashboard link: https://syzkaller.appspot.com/bug?extid=13210896153522fe1ee5
    compiler: gcc (GCC) 9.0.0 20181231 (experimental)
    syz repro: https://syzkaller.appspot.com/x/repro.syz?x=136aa8c4600000
    C reproducer: https://syzkaller.appspot.com/x/repro.c?x=109ba792600000

    =====================================================================
    BUG: memory leak
    unreferenced object 0xffff8881207e4100 (size 128):
    comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
    hex dump (first 32 bytes):
    00 70 16 18 81 88 ff ff 80 af 8c 22 81 88 ff ff .p........."....
    00 b6 23 17 81 88 ff ff 00 00 00 00 00 00 00 00 ..#.............
    backtrace:
    [] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [] slab_post_alloc_hook mm/slab.h:522 [inline]
    [] slab_alloc mm/slab.c:3319 [inline]
    [] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
    [] kmalloc include/linux/slab.h:552 [inline]
    [] kzalloc include/linux/slab.h:748 [inline]
    [] ovs_vport_alloc+0x37/0xf0 net/openvswitch/vport.c:130
    [] internal_dev_create+0x24/0x1d0 net/openvswitch/vport-internal_dev.c:164
    [] ovs_vport_add+0x81/0x190 net/openvswitch/vport.c:199
    [] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
    [] ovs_dp_cmd_new+0x22f/0x410 net/openvswitch/datapath.c:1614
    [] genl_family_rcv_msg+0x2ab/0x5b0 net/netlink/genetlink.c:629
    [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
    [] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    [] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
    [] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    [] netlink_unicast+0x1ec/0x2d0 net/netlink/af_netlink.c:1328
    [] netlink_sendmsg+0x270/0x480 net/netlink/af_netlink.c:1917
    [] sock_sendmsg_nosec net/socket.c:637 [inline]
    [] sock_sendmsg+0x54/0x70 net/socket.c:657
    [] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
    [] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
    [] __do_sys_sendmsg net/socket.c:2365 [inline]
    [] __se_sys_sendmsg net/socket.c:2363 [inline]
    [] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363

    BUG: memory leak
    unreferenced object 0xffff88811723b600 (size 64):
    comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
    hex dump (first 32 bytes):
    01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 02 00 00 00 05 35 82 c1 .............5..
    backtrace:
    [] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [] slab_post_alloc_hook mm/slab.h:522 [inline]
    [] slab_alloc mm/slab.c:3319 [inline]
    [] __do_kmalloc mm/slab.c:3653 [inline]
    [] __kmalloc+0x169/0x300 mm/slab.c:3664
    [] kmalloc include/linux/slab.h:557 [inline]
    [] ovs_vport_set_upcall_portids+0x54/0xd0 net/openvswitch/vport.c:343
    [] ovs_vport_alloc+0x7f/0xf0 net/openvswitch/vport.c:139
    [] internal_dev_create+0x24/0x1d0 net/openvswitch/vport-internal_dev.c:164
    [] ovs_vport_add+0x81/0x190 net/openvswitch/vport.c:199
    [] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
    [] ovs_dp_cmd_new+0x22f/0x410 net/openvswitch/datapath.c:1614
    [] genl_family_rcv_msg+0x2ab/0x5b0 net/netlink/genetlink.c:629
    [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
    [] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    [] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
    [] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    [] netlink_unicast+0x1ec/0x2d0 net/netlink/af_netlink.c:1328
    [] netlink_sendmsg+0x270/0x480 net/netlink/af_netlink.c:1917
    [] sock_sendmsg_nosec net/socket.c:637 [inline]
    [] sock_sendmsg+0x54/0x70 net/socket.c:657
    [] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
    [] __sys_sendmsg+0x80/0xf0 net/socket.c:2356

    BUG: memory leak
    unreferenced object 0xffff8881228ca500 (size 128):
    comm "syz-executor032", pid 7015, jiffies 4294944622 (age 7.880s)
    hex dump (first 32 bytes):
    00 f0 27 18 81 88 ff ff 80 ac 8c 22 81 88 ff ff ..'........"....
    40 b7 23 17 81 88 ff ff 00 00 00 00 00 00 00 00 @.#.............
    backtrace:
    [] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [] slab_post_alloc_hook mm/slab.h:522 [inline]
    [] slab_alloc mm/slab.c:3319 [inline]
    [] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
    [] kmalloc include/linux/slab.h:552 [inline]
    [] kzalloc include/linux/slab.h:748 [inline]
    [] ovs_vport_alloc+0x37/0xf0 net/openvswitch/vport.c:130
    [] internal_dev_create+0x24/0x1d0 net/openvswitch/vport-internal_dev.c:164
    [] ovs_vport_add+0x81/0x190 net/openvswitch/vport.c:199
    [] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
    [] ovs_dp_cmd_new+0x22f/0x410 net/openvswitch/datapath.c:1614
    [] genl_family_rcv_msg+0x2ab/0x5b0 net/netlink/genetlink.c:629
    [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
    [] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    [] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
    [] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    [] netlink_unicast+0x1ec/0x2d0 net/netlink/af_netlink.c:1328
    [] netlink_sendmsg+0x270/0x480 net/netlink/af_netlink.c:1917
    [] sock_sendmsg_nosec net/socket.c:637 [inline]
    [] sock_sendmsg+0x54/0x70 net/socket.c:657
    [] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
    [] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
    [] __do_sys_sendmsg net/socket.c:2365 [inline]
    [] __se_sys_sendmsg net/socket.c:2363 [inline]
    [] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
    =====================================================================

    The function in net core, register_netdevice(), may fail with vport's
    destruction callback either invoked or not. After commit 309b66970ee2
    ("net: openvswitch: do not free vport if register_netdevice() is failed."),
    the duty to destroy vport is offloaded from the driver OTOH, which ends
    up in the memory leak reported.

    It is fixed by releasing the vport unless the device is registered
    successfully. To do that, the callback assignment is deferred until the
    device is registered.

    Reported-by: syzbot+13210896153522fe1ee5@syzkaller.appspotmail.com
    Fixes: 309b66970ee2 ("net: openvswitch: do not free vport if register_netdevice() is failed.")
    Cc: Taehee Yoo
    Cc: Greg Rose
    Cc: Eric Dumazet
    Cc: Marcelo Ricardo Leitner
    Cc: Ying Xue
    Cc: Andrey Konovalov
    Signed-off-by: Hillf Danton
    Acked-by: Pravin B Shelar
    [sbrivio: this was sent to dev@openvswitch.org and never made its way
    to netdev -- resending original patch]
    Signed-off-by: Stefano Brivio
    Reviewed-by: Greg Rose
    Signed-off-by: Jakub Kicinski

    Hillf Danton
     

21 Oct, 2019

1 commit


16 Oct, 2019

1 commit

  • The following script:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress protocol ip matchall \
    > action mpls push protocol mpls_uc label 0x355aa bos 1

    causes corruption of all IP packets transmitted by eth0. On TC egress, we
    can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
    operation will result in an overwrite of the first 4 octets in the packet
    L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
    error pattern is present also in the MPLS 'pop' operation. Fix this error
    in act_mpls data plane, computing 'mac_len' as the difference between the
    network header and the mac header (when not at TC ingress), and use it in
    MPLS 'push'/'pop' core functions.

    v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
    in skb_mpls_pop(), reported by kbuild test robot

    CC: Lorenzo Bianconi
    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Reviewed-by: Simon Horman
    Acked-by: John Hurley
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

06 Oct, 2019

1 commit

  • This patch allows attaching a conntrack helper to a confirmed conntrack
    entry. Currently, we can only attach an ALG helper to a conntrack entry
    when it is in the unconfirmed state. This patch enables a use case
    where we first commit a conntrack entry after it has passed some
    initial conditions. After that, the processing pipeline checks a
    couple more packets to determine whether the connection belongs to
    a particular application, and attaches an ALG helper to the connection
    at a later stage.

    Signed-off-by: Yi-Hung Wei
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

26 Sep, 2019

1 commit

  • The userspace openvswitch patch ("dpif-linux: Implement the API
    functions to allow multiple handler threads read upcall")
    changed the type of the UPCALL_PID attribute from U32 to UNSPEC,
    but left the kernel unchanged.

    After kernel commit 6e237d099fac ("netlink: Relax attr validation
    for fixed length types"), this bug is exposed by the warning
    below:

    [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

    Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's")
    Signed-off-by: Li RongQing
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Li RongQing
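
The mismatch behind this warning can be modelled roughly as a fixed-length check. The sketch below is an assumption-laden illustration: `validate` is a hypothetical function, the policy is reduced to two types, and the 16-byte payload (an array of four u32 port IDs) is an invented example.

```python
from typing import List

NLA_UNSPEC = "unspec"
NLA_U32 = "u32"
FIXED_LEN = {NLA_U32: 4}  # expected payload size for fixed-length types


def validate(attr_type: int, policy_type: str, payload_len: int) -> List[str]:
    """Return the warnings a fixed-length check would emit for one attribute."""
    warnings = []
    expected = FIXED_LEN.get(policy_type)
    if expected is not None and payload_len != expected:
        warnings.append(f"attribute type {attr_type} has an invalid length.")
    return warnings
```

Declaring the attribute as unspecified in the policy avoids the warning when userspace sends an array instead of a single u32.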
     

06 Sep, 2019

1 commit

  • Offloaded OvS datapath rules are translated one to one to tc rules,
    for example the following simplified OvS rule:

    recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

    Will be translated to the following tc rule:

    $ tc filter add dev dev1 ingress \
    prio 1 chain 0 proto ip \
    flower tcp ct_state -trk \
    action ct pipe \
    action goto chain 2

    Received packets will first travel through tc, and if they aren't stolen
    by it, like in the above rule, they will continue to the OvS datapath.
    Since we already did some actions (action ct in this case) which might
    modify the packets, and updated action stats, we would like to continue
    the processing with the correct recirc_id in OvS (here recirc_id(2))
    where we left off.

    To support this, introduce a new skb extension for tc, which
    will be used for translating tc chain to ovs recirc_id to
    handle these miss cases. Last tc chain index will be set
    by tc goto chain action and read by OvS datapath.

    Signed-off-by: Paul Blakey
    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Paul Blakey
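
The hand-off between tc and the OvS datapath described here can be pictured as a small piece of per-packet metadata. This is a minimal sketch, assuming a one-to-one mapping from tc chain index to recirc_id; `Skb`, `tc_goto_chain`, and `ovs_initial_recirc_id` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    payload: bytes
    tc_chain: Optional[int] = None  # stand-in for the tc skb extension


def tc_goto_chain(skb: Skb, chain: int) -> None:
    """Record the last chain index that tc jumped to."""
    skb.tc_chain = chain


def ovs_initial_recirc_id(skb: Skb) -> int:
    """Resume where tc left off, or start at recirc_id 0 if tc set nothing."""
    return skb.tc_chain if skb.tc_chain is not None else 0
```

For the example rule above, a packet that misses in tc after `goto chain 2` would start OvS processing at recirc_id(2).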
     

03 Sep, 2019

1 commit


29 Aug, 2019

2 commits

  • Only the first fragment in a datagram contains the L4 headers. When the
    Open vSwitch module parses a packet, it always sets the IP protocol
    field in the key, but can only set the L4 fields on the first fragment.
    The original behavior would not clear the L4 portion of the key, so
    garbage values would be sent in the key for "later" fragments. This
    patch clears the L4 fields in that circumstance to prevent sending those
    garbage values as part of the upcall.

    Signed-off-by: Justin Pettit
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Justin Pettit
     
  • When IP fragments are reassembled before being sent to conntrack, the
    key from the last fragment is used. Unless there are reordering
    issues, the last fragment received will not contain the L4 ports, so the
    key for the reassembled datagram won't contain them. This patch updates
    the key once we have a reassembled datagram.

    The handle_fragments() function works on L3 headers so we pull the L3/L4
    flow key update code from key_extract into a new function
    'key_extract_l3l4'. Then we add another new function
    ovs_flow_key_update_l3l4() and export it so that it is accessible by
    handle_fragments() for conntrack packet reassembly.

    Co-authored-by: Justin Pettit
    Signed-off-by: Greg Rose
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Greg Rose
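
Both fragment-related commits on this date hinge on which packets carry L4 headers. The sketch below models the first one: the key buffer is reused between packets, so a later fragment must explicitly zero the L4 fields. `FlowKey` and `key_extract` here are simplified, hypothetical stand-ins that only handle IPv4 fragment flags and a pair of 16-bit ports.

```python
import struct
from dataclasses import dataclass

IP_MF = 0x2000       # "more fragments" flag
IP_OFFSET = 0x1FFF   # fragment offset mask


@dataclass
class FlowKey:
    ip_proto: int = 0
    frag: str = "no"  # "no", "first" or "later"
    src_port: int = 0
    dst_port: int = 0


def key_extract(key: FlowKey, ip_proto: int, frag_off: int, l4: bytes) -> None:
    """Fill the key in place; the same key object is reused across packets."""
    key.ip_proto = ip_proto  # always known, even for later fragments
    if frag_off & IP_OFFSET:
        key.frag = "later"
        # No L4 header here: clear the fields instead of leaving stale values.
        key.src_port = 0
        key.dst_port = 0
        return
    key.frag = "first" if frag_off & IP_MF else "no"
    key.src_port, key.dst_port = struct.unpack("!HH", l4[:4])
```

After reassembly the full datagram does contain the L4 header, which is why the second commit re-runs the L3/L4 part of the extraction on it.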
     

28 Aug, 2019

1 commit


26 Aug, 2019

1 commit

  • This patch addresses a conntrack cache issue with timeout policy.
    Currently, we do not check if the timeout extension is set properly in the
    cached conntrack entry. Thus, after a packet recirculates from the
    conntrack action, the timeout policy is not applied properly. This patch
    fixes the aforementioned issue.

    Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
    Reported-by: kbuild test robot
    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

25 Aug, 2019

1 commit


07 Aug, 2019

2 commits